真实世界的pg——bit类型的底层存储分析

Home / Blog / Download / About me

真实世界的pg——bit类型的底层存储分析

Directory

这是一个只有0和1的世界。

对于数据库,大多数人的需求都是满足功能需求,支撑业务的正常和发展,简单的来说就是dml。而我想知道,一个字符串,在pg中是如何存储的。

bit类型是pg中存储2进制数据的一个数据类型,今天,来研究一下它的存储方式。

一、创建测试表

1.1 用python生成创建的sql

>>> print 'create table ('
create table (
>>> for i in range(1,83):
...     print 'c' + str(i) + ' bit(' + str(i) +'),';
... 
c1 bit(1),
c2 bit(2),
c3 bit(3),
c4 bit(4),
c5 bit(5),
c6 bit(6),
c7 bit(7),
c8 bit(8),
c9 bit(9),
...

1.2 建表

这里创建一个83个字段的t1表,同时也是83个长度的bit类型,83是随便取的数字。

pg用的11rc3

postgres=# select version();
                                                 version                                                  
----------------------------------------------------------------------------------------------------------
 PostgreSQL 11rc1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11), 64-bit
(1 row)

postgres=# \d t1
                 Table "public.t1"
 Column |  Type   | Collation | Nullable | Default 
--------+---------+-----------+----------+---------
 c1     | bit(1)  |           |          | 
 c2     | bit(2)  |           |          | 
 c3     | bit(3)  |           |          | 
 c4     | bit(4)  |           |          | 
 c5     | bit(5)  |           |          | 
 c6     | bit(6)  |           |          | 
 c7     | bit(7)  |           |          | 
 c8     | bit(8)  |           |          | 
 c9     | bit(9)  |           |          | 
 c10    | bit(10) |           |          | 
 c11    | bit(11) |           |          | 
 c12    | bit(12) |           |          | 
 c13    | bit(13) |           |          | 
 c14    | bit(14) |           |          | 
 c15    | bit(15) |           |          | 
 c16    | bit(16) |           |          | 
 c17    | bit(17) |           |          | 
 c18    | bit(18) |           |          | 
 c19    | bit(19) |           |          | 
 c20    | bit(20) |           |          | 
 c21    | bit(21) |           |          | 
 c22    | bit(22) |           |          | 
 c23    | bit(23) |           |          | 
 c24    | bit(24) |           |          | 
 c25    | bit(25) |           |          | 
 c26    | bit(26) |           |          | 
 c27    | bit(27) |           |          | 
 c28    | bit(28) |           |          | 
 c29    | bit(29) |           |          | 
 c30    | bit(30) |           |          | 
 c31    | bit(31) |           |          | 
 c32    | bit(32) |           |          | 
 c33    | bit(33) |           |          | 
 c34    | bit(34) |           |          | 
 c35    | bit(35) |           |          | 
 c36    | bit(36) |           |          | 
 c37    | bit(37) |           |          | 
 c38    | bit(38) |           |          | 
 c39    | bit(39) |           |          | 
 c40    | bit(40) |           |          | 
 c41    | bit(41) |           |          | 
 c42    | bit(42) |           |          | 
 c43    | bit(43) |           |          | 
 c44    | bit(44) |           |          | 
 c45    | bit(45) |           |          | 
 c46    | bit(46) |           |          | 
 c47    | bit(47) |           |          | 
 c48    | bit(48) |           |          | 
 c49    | bit(49) |           |          | 
 c50    | bit(50) |           |          | 
 c51    | bit(51) |           |          | 
 c52    | bit(52) |           |          | 
 c53    | bit(53) |           |          | 
 c54    | bit(54) |           |          | 
 c55    | bit(55) |           |          | 
 c56    | bit(56) |           |          | 
 c57    | bit(57) |           |          | 
 c58    | bit(58) |           |          | 
 c59    | bit(59) |           |          | 
 c60    | bit(60) |           |          | 
 c61    | bit(61) |           |          | 
 c62    | bit(62) |           |          | 
 c63    | bit(63) |           |          | 
 c64    | bit(64) |           |          | 
 c65    | bit(65) |           |          | 
 c66    | bit(66) |           |          | 
 c67    | bit(67) |           |          | 
 c68    | bit(68) |           |          | 
 c69    | bit(69) |           |          | 
 c70    | bit(70) |           |          | 
 c71    | bit(71) |           |          | 
 c72    | bit(72) |           |          | 
 c73    | bit(73) |           |          | 
 c74    | bit(74) |           |          | 
 c75    | bit(75) |           |          | 
 c76    | bit(76) |           |          | 
 c77    | bit(77) |           |          | 
 c78    | bit(78) |           |          | 
 c79    | bit(79) |           |          | 
 c80    | bit(80) |           |          | 
 c81    | bit(81) |           |          | 
 c82    | bit(82) |           |          | 
 c83    | bit(83) |           |          | 

postgres=# 

1.3 python生成插入的数据

这里直接用每个最大长度的2进制来插入,效果大概是这样:

>>> for i in range(1,84):
...   a=2**i
...   print 'insert into  t1(c'+str(i)+') values (B\''+ str(bin(a-1)).split('0b')[1] +'\');';
... 
insert into  t1(c1) values (B'1');
insert into  t1(c2) values (B'11');
insert into  t1(c3) values (B'111');
insert into  t1(c4) values (B'1111');
insert into  t1(c5) values (B'11111');
insert into  t1(c6) values (B'111111');
insert into  t1(c7) values (B'1111111');
insert into  t1(c8) values (B'11111111');
insert into  t1(c9) values (B'111111111');
insert into  t1(c10) values (B'1111111111');
insert into  t1(c11) values (B'11111111111');
insert into  t1(c12) values (B'111111111111');
insert into  t1(c13) values (B'1111111111111');
insert into  t1(c14) values (B'11111111111111');
insert into  t1(c15) values (B'111111111111111');
insert into  t1(c16) values (B'1111111111111111');
...

1.4 插入后的效果

postgres=# select c1,c2,c3,c4 from t1 limit 4;
 c1 | c2 | c3  |  c4  
----+----+-----+------
 1  |    |     | 
    | 11 |     | 
    |    | 111 | 
    |    |     | 1111
(4 rows)

postgres=# select count(*) from t1;
 count 
-------
    83
(1 row)

postgres=# 

二、pageinspect查看数据

postgres=# select t_ctid,t_bits,t_data from heap_page_items(get_raw_page('t1',0));
 t_ctid |                                          t_bits                                          |               t_data               
--------+------------------------------------------------------------------------------------------+------------------------------------
 (0,1)  | 1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 | \x0d0100000080
 (0,2)  | 0100000000000000000000000000000000000000000000000000000000000000000000000000000000000000 | \x0d02000000c0
 (0,3)  | 0010000000000000000000000000000000000000000000000000000000000000000000000000000000000000 | \x0d03000000e0
 (0,4)  | 0001000000000000000000000000000000000000000000000000000000000000000000000000000000000000 | \x0d04000000f0
 (0,5)  | 0000100000000000000000000000000000000000000000000000000000000000000000000000000000000000 | \x0d05000000f8
 (0,6)  | 0000010000000000000000000000000000000000000000000000000000000000000000000000000000000000 | \x0d06000000fc
 (0,7)  | 0000001000000000000000000000000000000000000000000000000000000000000000000000000000000000 | \x0d07000000fe
 (0,8)  | 0000000100000000000000000000000000000000000000000000000000000000000000000000000000000000 | \x0d08000000ff
 (0,9)  | 0000000010000000000000000000000000000000000000000000000000000000000000000000000000000000 | \x0f09000000ff80
 (0,10) | 0000000001000000000000000000000000000000000000000000000000000000000000000000000000000000 | \x0f0a000000ffc0
 (0,11) | 0000000000100000000000000000000000000000000000000000000000000000000000000000000000000000 | \x0f0b000000ffe0
 (0,12) | 0000000000010000000000000000000000000000000000000000000000000000000000000000000000000000 | \x0f0c000000fff0
 (0,13) | 0000000000001000000000000000000000000000000000000000000000000000000000000000000000000000 | \x0f0d000000fff8
 (0,14) | 0000000000000100000000000000000000000000000000000000000000000000000000000000000000000000 | \x0f0e000000fffc
 (0,15) | 0000000000000010000000000000000000000000000000000000000000000000000000000000000000000000 | \x0f0f000000fffe
 (0,16) | 0000000000000001000000000000000000000000000000000000000000000000000000000000000000000000 | \x0f10000000ffff
 (0,17) | 0000000000000000100000000000000000000000000000000000000000000000000000000000000000000000 | \x1111000000ffff80
 (0,18) | 0000000000000000010000000000000000000000000000000000000000000000000000000000000000000000 | \x1112000000ffffc0
 (0,19) | 0000000000000000001000000000000000000000000000000000000000000000000000000000000000000000 | \x1113000000ffffe0
 (0,20) | 0000000000000000000100000000000000000000000000000000000000000000000000000000000000000000 | \x1114000000fffff0
 (0,21) | 0000000000000000000010000000000000000000000000000000000000000000000000000000000000000000 | \x1115000000fffff8
 (0,22) | 0000000000000000000001000000000000000000000000000000000000000000000000000000000000000000 | \x1116000000fffffc
 (0,23) | 0000000000000000000000100000000000000000000000000000000000000000000000000000000000000000 | \x1117000000fffffe
 (0,24) | 0000000000000000000000010000000000000000000000000000000000000000000000000000000000000000 | \x1118000000ffffff
 (0,25) | 0000000000000000000000001000000000000000000000000000000000000000000000000000000000000000 | \x1319000000ffffff80
 (0,26) | 0000000000000000000000000100000000000000000000000000000000000000000000000000000000000000 | \x131a000000ffffffc0
 (0,27) | 0000000000000000000000000010000000000000000000000000000000000000000000000000000000000000 | \x131b000000ffffffe0
 (0,28) | 0000000000000000000000000001000000000000000000000000000000000000000000000000000000000000 | \x131c000000fffffff0
 (0,29) | 0000000000000000000000000000100000000000000000000000000000000000000000000000000000000000 | \x131d000000fffffff8
 (0,30) | 0000000000000000000000000000010000000000000000000000000000000000000000000000000000000000 | \x131e000000fffffffc
 (0,31) | 0000000000000000000000000000001000000000000000000000000000000000000000000000000000000000 | \x131f000000fffffffe
 (0,32) | 0000000000000000000000000000000100000000000000000000000000000000000000000000000000000000 | \x1320000000ffffffff

2.1 bit类型的真实存储结构

1)\x0d0100000080,可以看到结构由 1 byte的标志位 + 1 byte的长度 + 数据位组成。

2)标志位是13(0X0d)、15(0x0f)、17(0x11)这样的

3)标志位 = 数据位长度 + 5

4)每8个跨度增加1个字节(2个16进制位),即每个标志位的都有8个长度区间。第几个区间算法:长度%8

2.2 bit类型转换方法

1)每个区间的第一个数据为除以8

2)第二个数据除以4

3)第三个数据除以2

4)第四个数据就是本身

5)第五个数据除以8

6)第六个数据除以4

7)第七个数据除以2

8)第八个数据就是本身

2.3 验证

以上只是推算和猜测,现在来验证一下。

2.3.1 第一条记录 \x0d0100000080

1%8=1 所以这条记录属于第一个区间

第一个区间的数据除以8就是真实的数据了,换算一下就是8/8=1(16进制的8就是10进制的8)。

数据库里面验证

postgres=# select c1,c1::int from t1 where ctid='(0,1)';
 c1 | c1 
----+----
 1  |  1
(1 row)

postgres=# 

2.3.2 第二条记录 \x1320000000ffffffff

16进制20转换10进制后是32,32%8=0,也是第八个区间,数据是本身自己

ffffffff的10进制是4294967295

数据库里面验证

postgres=# select c32,c32::bigint from t1 where ctid='(0,32)';
               c32                |    c32     
----------------------------------+------------
 11111111111111111111111111111111 | 4294967295
(1 row)

postgres=# 

四、bit类型最大长度

mydb=# create table t2(c1 bit(80000000);
mydb(# \r
Query buffer reset (cleared).
mydb=# create table t2(c1 bit(80000000));
CREATE TABLE
mydb=# alter table t2 alter c1 type bit(200000000);
ERROR:  length for type bit cannot exceed 83886080
mydb=# alter table t2 alter c1 type bit(83886080);
ALTER TABLE
mydb=# alter table t2 alter c1 type bit(83886081);
ERROR:  length for type bit cannot exceed 83886080
mydb=# 

4.1 32位的2进制最大表示为4294967295

postgres=# select 2^32 -1;
  ?column?  
------------
 4294967295
(1 row)

4.2 83886080能表示最大长度呢?真的是超级大了

postgres=# select 2^83886080 -1;
ERROR:  value out of range: overflow
postgres=# select 2^83886080::bigint -1;
ERROR:  value out of range: overflow
postgres=# 

五、小结

5.1 以上只是纯猜测和推论,如有不当请指正

5.2 标志位为什么是13,15这样开头的还没想清楚

5.3 为什么每个数据区间都是8个跨度也没有想明白

5.4 如果和其他数据类型一起存储可能有所区别,尤其是varbit这里没有具体讨论了

5.5 研究这个有什么用?暂时可能没什么用,但就是想知道底层是如何存储这些0和1的