Pg_authid数据误删恢复

Home / Blog / Download / About me

Pg_authid数据误删恢复

Directory

在群里看到某网友问pg_authid删了可以恢复不,我想,没有备份的话很难了吧,权叔建议其他好的系统拷贝一个过去看看。再仔细一问,是delete,那可以试着用我的bbedp恢复看看。

一、pg_authid

这个系统表记录了的是role和权限,如果删了数据会报错没有role。当然,数据库的启停是没有影响的,没有用户都不能登录了。

postgres=# \d pg_authid
                        Table "pg_catalog.pg_authid"
     Column     |           Type           | Collation | Nullable | Default 
----------------+--------------------------+-----------+----------+---------
 oid            | oid                      |           | not null | 
 rolname        | name                     |           | not null | 
 rolsuper       | boolean                  |           | not null | 
 rolinherit     | boolean                  |           | not null | 
 rolcreaterole  | boolean                  |           | not null | 
 rolcreatedb    | boolean                  |           | not null | 
 rolcanlogin    | boolean                  |           | not null | 
 rolreplication | boolean                  |           | not null | 
 rolbypassrls   | boolean                  |           | not null | 
 rolconnlimit   | integer                  |           | not null | 
 rolpassword    | text                     | C         |          | 
 rolvaliduntil  | timestamp with time zone |           |          | 
Indexes:
    "pg_authid_oid_index" UNIQUE, btree (oid), tablespace "pg_global"
    "pg_authid_rolname_index" UNIQUE, btree (rolname), tablespace "pg_global"
Tablespace: "pg_global"

postgres=# 
postgres=# delete from pg_authid;
DELETE 12
postgres=# \q
[pg@whf307 soft]$

[pg@whf307 global]$ psql
psql: FATAL:  role "pg" does not exist

二、重新初始化一个库,拷贝文件

这种方法测试是可行的。新建一个目录来initdb,然后把对应的page拷贝过去,系统超级用户是有了,但是其他用户就根据业务来补了。也很麻烦。

三、bbedp来恢复

bbedp是我写的一个小工具,可以查看和修改表的page每一个字节。具体用法可以参考https://whf307.github.io/pg/BBED-for-PostgreSQL.html

这里说一下恢复思路。delete没有加条件的话只要没有被vacuum清理,数据还在的。我们只要修改t_xmax、t_infomask2和t_infomask三个标志位就好了。但是infomask没有参考。那试着新建一个用户来照着新用户的infomask来恢复。t_xmax改为0就好了。

3.1 关闭autovacuum

首先备份数据库。任何有风险的操作都应该进行备份。如果实在不行把pg_authid对应的物理文件先备份一下。

避免dead tuple被autovacuum清理先关掉vacuum

postgres=# show autovacuum;
 autovacuum 
------------
 on
(1 row)

postgres=# alter system set autovacuum=off;
ALTER SYSTEM
postgres=# show autovacuum;                
 autovacuum 
------------
 on
(1 row)

postgres=# \q
[pg@whf307 ~]$pg_ctl reload
server signaled
[pg@whf307 ~]$psql
psql (11.2)
Type "help" for help.

postgres=# show autovacuum;
 autovacuum 
------------
 off
(1 row)

postgres=# 

3.2 单用户模式新建用户sa

[pg@whf307 soft]$ postgres --single -D /oracle/soft/pg11_data postgres
2019-10-30 17:46:49.341 CST [18101] LOG:  database system was interrupted; last known up at 2019-10-30 17:46:12 CST
2019-10-30 17:46:49.353 CST [18101] LOG:  database system was not properly shut down; automatic recovery in progress
2019-10-30 17:46:49.355 CST [18101] LOG:  redo starts at 0/169B0F8
2019-10-30 17:46:49.355 CST [18101] LOG:  invalid record length at 0/169B980: wanted 24, got 0
2019-10-30 17:46:49.355 CST [18101] LOG:  redo done at 0/169B948
2019-10-30 17:46:49.355 CST [18101] LOG:  last completed transaction was at log time 2019-10-30 17:46:25.53714+08
2019-10-30 17:46:49.357 CST [18101] LOG:  checkpoint starting: end-of-recovery immediate
2019-10-30 17:46:49.363 CST [18101] LOG:  checkpoint complete: wrote 1 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.000 s, sync=0.000 s, total=0.007 s; sync files=1, longest=0.000 s, average=0.000 s; distance=2 kB, estimate=2 kB
2019-10-30 17:46:49.364 CST [18101] WARNING:  no roles are defined in this database system
2019-10-30 17:46:49.364 CST [18101] HINT:  You should immediately run CREATE USER "pg" SUPERUSER;.

PostgreSQL stand-alone backend 11.2
backend> create user sa;
2019-10-30 17:46:55.115 CST [18101] LOG:  statement: create user sa;

2019-10-30 17:46:55.116 CST [18101] LOG:  duration: 1.860 ms
backend> select * from pg_authid;
2019-10-30 17:46:58.274 CST [18101] LOG:  statement: select * from pg_authid;

         1: rolname     (typeid = 19, len = 64, typmod = -1, byval = f)
         2: rolsuper    (typeid = 16, len = 1, typmod = -1, byval = t)
         3: rolinherit  (typeid = 16, len = 1, typmod = -1, byval = t)
         4: rolcreaterole       (typeid = 16, len = 1, typmod = -1, byval = t)
         5: rolcreatedb (typeid = 16, len = 1, typmod = -1, byval = t)
         6: rolcanlogin (typeid = 16, len = 1, typmod = -1, byval = t)
         7: rolreplication      (typeid = 16, len = 1, typmod = -1, byval = t)
         8: rolbypassrls        (typeid = 16, len = 1, typmod = -1, byval = t)
         9: rolconnlimit        (typeid = 23, len = 4, typmod = -1, byval = t)
        10: rolpassword (typeid = 25, len = -1, typmod = -1, byval = f)
        11: rolvaliduntil       (typeid = 1184, len = 8, typmod = -1, byval = t)
        ----
         1: rolname = "sa"      (typeid = 19, len = 64, typmod = -1, byval = f)
         2: rolsuper = "f"      (typeid = 16, len = 1, typmod = -1, byval = t)
         3: rolinherit = "t"    (typeid = 16, len = 1, typmod = -1, byval = t)
         4: rolcreaterole = "f" (typeid = 16, len = 1, typmod = -1, byval = t)
         5: rolcreatedb = "f"   (typeid = 16, len = 1, typmod = -1, byval = t)
         6: rolcanlogin = "t"   (typeid = 16, len = 1, typmod = -1, byval = t)
         7: rolreplication = "f"        (typeid = 16, len = 1, typmod = -1, byval = t)
         8: rolbypassrls = "f"  (typeid = 16, len = 1, typmod = -1, byval = t)
         9: rolconnlimit = "-1" (typeid = 23, len = 4, typmod = -1, byval = t)
        ----
2019-10-30 17:46:58.276 CST [18101] LOG:  duration: 2.303 ms
可以看到只有sa用户。

3.3 bbed查看

查到pg_authid的物理文件绝对路径,新建file文件

vi file
/oracle/soft/pg11_data/global/1260

bbed查看,可以看到这个page有13行tuple,说明没有被vacuum清理,还有机会。

[pg@whf307 soft]$ ./bbedp 
Password: 

BBEDP: Release 0.1.0.0.0 - Limited Production on Wed Oct 30 17:59:09 2019

Copyright (c) 2018, 2019, whf307 and/or its affiliates.  All rights reserved.

************* !!! For PostgreSQL Internal Test only !!! *************** 

BBEDP> map
 File: /oracle/soft/pg11_data/global/1260
 Block: 0                                offset: 0
 -------------------------------------------------------------------
 struct  PageHeaderData , 24 bytes          @0

 struct  Linps , 52 bytes                   @24

 struct  Tuples , 1456 bytes                @76     

 struct  Pd_special , 0 bytes               @8191

 There are 13 tuples in the block

3.4 恢复超级用户

超级用户一般都是第一个tuple,如果不是按tupledata,按utf8转换16进制找就可以了。

BBEDP> p tuple 1
tuple 1
-----------------------------------------------------
lp_off        8080          @24            0x00d89f90
lp_flags      1             @24            0x00d89f90
lp_len        108           @24            0x00d89f90
t_xmin        1             @8080          0x01000000
t_xmax        588           @8084          0x4c020000
t_field3      0             @8088          0x00000000
t_ctid        (0,1)         @8092          0x000000000100
t_infomask2   8203          @8098          0x0b20
t_infomask    265           @8100          0x0901
t_hoff        32            @8102          0x20
bits8         0b11111111    @8103          0xff
tupledata                   @8104          0x010000000a000000706700000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000101010101010100ffffffff

这里tupledata中,0100这段后面的就是rolname,7076转换utf8就是pg。而且这里t_xmax不是0,说明是dead tuple。

查看新建的用户对应的数据

BBEDP> p tuple 13
tuple 13
-----------------------------------------------------
lp_off        6736          @72            0x00d89ba0
lp_flags      1             @72            0x00d89ba0
lp_len        108           @72            0x00d89ba0
t_xmin        587           @6736          0x4b020000
t_xmax        0             @6740          0x00000000
t_field3      0             @6744          0x00000000
t_ctid        (0,13)        @6748          0x000000000d00
t_infomask2   11            @6754          0x0b00
t_infomask    2313          @6756          0x0909
t_hoff        32            @6758          0x20
bits8         0b11111111    @6759          0xff
tupledata                   @6760          0x0100000014400000736100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000001000000ffffffff

这里,7361对应的就是sa。t_xmax是0,是正常的tuple。我们按照tuple 13去恢复tuple 1

BBEDP> m /x 0000 offset 8084
BBEDP> m /x 0b00 offset 8098
BBEDP> m /x 0909 offset 8100
BBEDP> p tuple 1
tuple 1
-----------------------------------------------------
lp_off        8080          @24            0x00d89f90
lp_flags      1             @24            0x00d89f90
lp_len        108           @24            0x00d89f90
t_xmin        1             @8080          0x01000000
t_xmax        0             @8084          0x00000000
t_field3      0             @8088          0x00000000
t_ctid        (0,1)         @8092          0x000000000100
t_infomask2   11            @8098          0x0b00
t_infomask    2313          @8100          0x0909
t_hoff        32            @8102          0x20
bits8         0b11111111    @8103          0xff
tupledata                   @8104          0x010000000a000000706700000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000101010101010100ffffffff

3.5 验证测试

这里正常起库测试。

[pg@whf307 soft]$ pg_ctl start
waiting for server to start....2019-10-30 18:03:35.517 CST [19514] LOG:  listening on IPv4 address "0.0.0.0", port 2345
2019-10-30 18:03:35.518 CST [19514] LOG:  could not create IPv6 socket for address "::": Address family not supported by protocol
2019-10-30 18:03:35.521 CST [19514] LOG:  listening on Unix socket "/oracle/soft/pg11_data/.s.PGSQL.2345"
2019-10-30 18:03:35.532 CST [19514] LOG:  redirecting log output to logging collector process
2019-10-30 18:03:35.532 CST [19514] HINT:  Future log output will appear in directory "log".
 done
server started
[pg@whf307 soft]$ psql
psql (11.2)
Type "help" for help.

postgres=# select * from pg_authid;
 rolname | rolsuper | rolinherit | rolcreaterole | rolcreatedb | rolcanlogin | rolreplication | rolbypassrls | rolconnlimit | rolpassword | rolvaliduntil 
---------+----------+------------+---------------+-------------+-------------+----------------+--------------+--------------+-------------+---------------
 pg      | t        | t          | t             | t           | t           | t              | t            |           -1 |             | 
 sa      | f        | t          | f             | f           | t           | f              | f            |           -1 |             | 
(2 rows)

测试是可以了,但是很尴尬。退出再登录就不行了。

[pg@whf307 soft]$ psql
psql: FATAL:  role "pg" does not exist

再去单用户模式确认

[pg@whf307 soft]$ postgres --single -D /oracle/soft/pg11_data postgres

PostgreSQL stand-alone backend 11.2
backend> select * from pg_authid;
2019-10-30 18:04:06.082 CST [19539] LOG:  statement: select * from pg_authid;

         1: rolname     (typeid = 19, len = 64, typmod = -1, byval = f)
         2: rolsuper    (typeid = 16, len = 1, typmod = -1, byval = t)
         3: rolinherit  (typeid = 16, len = 1, typmod = -1, byval = t)
         4: rolcreaterole       (typeid = 16, len = 1, typmod = -1, byval = t)
         5: rolcreatedb (typeid = 16, len = 1, typmod = -1, byval = t)
         6: rolcanlogin (typeid = 16, len = 1, typmod = -1, byval = t)
         7: rolreplication      (typeid = 16, len = 1, typmod = -1, byval = t)
         8: rolbypassrls        (typeid = 16, len = 1, typmod = -1, byval = t)
         9: rolconnlimit        (typeid = 23, len = 4, typmod = -1, byval = t)
        10: rolpassword (typeid = 25, len = -1, typmod = -1, byval = f)
        11: rolvaliduntil       (typeid = 1184, len = 8, typmod = -1, byval = t)
        ----
         1: rolname = "pg"      (typeid = 19, len = 64, typmod = -1, byval = f)
         2: rolsuper = "t"      (typeid = 16, len = 1, typmod = -1, byval = t)
         3: rolinherit = "t"    (typeid = 16, len = 1, typmod = -1, byval = t)
         4: rolcreaterole = "t" (typeid = 16, len = 1, typmod = -1, byval = t)
         5: rolcreatedb = "t"   (typeid = 16, len = 1, typmod = -1, byval = t)
         6: rolcanlogin = "t"   (typeid = 16, len = 1, typmod = -1, byval = t)
         7: rolreplication = "t"        (typeid = 16, len = 1, typmod = -1, byval = t)
         8: rolbypassrls = "t"  (typeid = 16, len = 1, typmod = -1, byval = t)
         9: rolconnlimit = "-1" (typeid = 23, len = 4, typmod = -1, byval = t)
        ----
         1: rolname = "sa"      (typeid = 19, len = 64, typmod = -1, byval = f)
         2: rolsuper = "f"      (typeid = 16, len = 1, typmod = -1, byval = t)
         3: rolinherit = "t"    (typeid = 16, len = 1, typmod = -1, byval = t)
         4: rolcreaterole = "f" (typeid = 16, len = 1, typmod = -1, byval = t)
         5: rolcreatedb = "f"   (typeid = 16, len = 1, typmod = -1, byval = t)
         6: rolcanlogin = "t"   (typeid = 16, len = 1, typmod = -1, byval = t)
         7: rolreplication = "f"        (typeid = 16, len = 1, typmod = -1, byval = t)
         8: rolbypassrls = "f"  (typeid = 16, len = 1, typmod = -1, byval = t)
         9: rolconnlimit = "-1" (typeid = 23, len = 4, typmod = -1, byval = t)
        ----
2019-10-30 18:04:06.084 CST [19539] LOG:  duration: 2.084 ms

有点奇怪,重启库第一次可以登录,退出再登录就不行了。而且但用户模式去看数据是恢复出来了的。我strace跟踪了一把也没发现什么。。。

重新梳理了一下思路,page中的数据肯定是在了,会不会是哪里遗漏了什么?索引呢?对,我直接改的page,索引还里面是没有这个记录的,数据不一致了。重建索引试试。

postgres=# reindex index pg_authid_oid_index;
REINDEX
postgres=# reindex index pg_authid_rolname_index;
REINDEX

postgres=# select * from pg_authid;
 rolname | rolsuper | rolinherit | rolcreaterole | rolcreatedb | rolcanlogin | rolreplication | rolbypassrls | rolconnlimit | rolpassword | rolvaliduntil 
---------+----------+------------+---------------+-------------+-------------+----------------+--------------+--------------+-------------+---------------
 pg      | t        | t          | t             | t           | t           | t              | t            |           -1 |             | 
 sa      | f        | t          | f             | f           | t           | f              | f            |           -1 |             | 
(2 rows)

postgres=# \q
[pg@whf307 global]$ psql
psql (11.2)
Type "help" for help.

postgres=# \q
[pg@whf307 global]$ psql
psql (11.2)
Type "help" for help.

果然,重建了索引就可以了,重启库也没有问题。如果要恢复其他用户,恢复pg一样的步骤,不再赘述。

四、结局

4.1 bbedp

这个工具我搞了很久,很强大,可以绕过数据库层面修改page,也修改了很多bug,我对他是很有信心的。这次一遍测试,一遍帮助网友恢复。

4.2 网友

网友这边的生产环境是数据恢复出来了。但是登陆的时候还是报错role不存在。当我建议网友重建索引的时候,网友说他新建库重新导数据了。。重建索引的明天再试。。。

就差那么一点了,可能人家生产环境急,也可能数据可以丢。

备份终于一切!