[转帖]postgresql 表和索引的膨胀简析

postgresql,索引,膨胀,简析 · 浏览次数 : 0

小编点评

## PostgreSQL Autovacuum Report This report provides an overview of the PostgreSQL autovacuum process and its impact on your database. **Key points:** * Autovacuum successfully removed 800,000 dead row versions from the `tmp_t0` table. * However, due to the presence of a long running transaction, only 800,000 row versions were removed from the `idx_tmp_t0` index. * The total time taken for autovacuum was 1.57 seconds. * The database is configured for SSD, which allows for efficient autovacuum handling due to its high performance. * The following warnings were encountered during autovacuum: * `\"tmp_t0\": truncated 13235 to 2568 pages` * `\"tmp_t0\": found 800,000 removable, 200,000 nonremovable row versions in 13235 out of 13235 pages` **Recommendations:** * Adjust the `autovacuum` parameters to balance performance and safety. * Monitor the database for long running transactions and potential dead tuples. * Investigate the cause of the truncated truncation error. **Further Analysis:** * Analyze the `pg_stat_activity` table to identify the reasons for long running transactions. * Analyze the `pg_stat_all_tables` table to identify which tables are the biggest culprits for dead tuples. * Implement strategies to prevent or mitigate these performance issues.

正文

postgresql 表和索引的膨胀是非常常见的，一方面是因为 autovacuum 清理标记为 dead tuple 的速度跟不上，另一方面也可能是由于长事物，未决事物，复制槽引起的。

#初始化数据

zabbix=# create table tmp_t0(c0 varchar(100),c1 varchar(100), c2 varchar(100));
CREATE TABLE
zabbix=# 
zabbix=# create index idx_tmp_t0 on tmp_t0(c0);
CREATE INDEX
zabbix=# 
zabbix=# create index idx_tmp_t0_1 on tmp_t0(c1);
CREATE INDEX
zabbix=#
zabbix=# insert into tmp_t0 select id::varchar,md5(id::varchar),md5(md5(id::varchar)) from generate_series(1,1000000) as id ;
INSERT 0 1000000
复制

查看大小

select pt.schemaname||'.'||pt.tablename,pg_relation_filepath(pt.schemaname||'.'||pt.tablename),
       pg_table_size(pt.schemaname||'.'||pt.tablename),
       pg_relation_size(pt.schemaname||'.'||pt.tablename),
       pg_total_relation_size(pt.schemaname||'.'||pt.tablename),
       pi.schemaname||'.'||pi.indexname,pg_relation_filepath(pi.schemaname||'.'||pi.indexname),
       pg_relation_size(pi.schemaname||'.'||pi.indexname),--指定的单个索引
       pg_indexes_size(pi.schemaname||'.'||pi.tablename) --表上的所有索引
 from pg_tables pt
      left outer join pg_indexes pi 
                   on pt.schemaname||'.'||pt.tablename = pi.schemaname||'.'||pi.tablename
where 1=1
  and pt.schemaname='public'
  and pt.tablename='tmp_t0'
;

   ?column?    | pg_relation_filepath | pg_table_size | pg_relation_size | pg_total_relation_size |      ?column?       | pg_relation_filepath | pg_relation_size | pg_indexes_size 
---------------+----------------------+---------------+------------------+------------------------+---------------------+----------------------+------------------+-----------------
 public.tmp_t0 | base/24579/24611     |     108470272 |        108421120 |              224714752 | public.idx_tmp_t0   | base/24579/24614     |         39215104 |       116244480
 public.tmp_t0 | base/24579/24611     |     108470272 |        108421120 |              224714752 | public.idx_tmp_t0_1 | base/24579/24615     |         77029376 |       116244480
(2 rows)

复制

zabbix=# select * from pg_stat_all_tables where relname = 'tmp_t0';

 relid | schemaname | relname | seq_scan | seq_tup_read | idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del | n_tup_hot_upd | n_live_tup | n_dead_tup | n_mod_since_analyze | last_vacuum | last_autovacuum | last_analyze |       last_autoanalyze        | vacuum_count | autovacuum_count | analyze_count | autoanalyze_count 
-------+------------+---------+----------+--------------+----------+---------------+-----------+-----------+-----------+---------------+------------+------------+---------------------+-------------+-----------------+--------------+-------------------------------+--------------+------------------+---------------+-------------------
 24611 | public     | tmp_t0  |        2 |            0 |        0 |             0 |   1000000 |         0 |         0 |             0 |    1000000 |          0 |                   0 |             |                 |              | 2018-09-19 15:22:10.987535+08 |            0 |                0 |             0 |                 1
(1 row)
复制

#删除操作

zabbix=# delete from tmp_t0 where c0::int4 > '200000';
DELETE 800000
复制

查看大小

select pt.schemaname||'.'||pt.tablename,pg_relation_filepath(pt.schemaname||'.'||pt.tablename),
       pg_table_size(pt.schemaname||'.'||pt.tablename),
       pg_relation_size(pt.schemaname||'.'||pt.tablename),
       pg_total_relation_size(pt.schemaname||'.'||pt.tablename),
       pi.schemaname||'.'||pi.indexname,pg_relation_filepath(pi.schemaname||'.'||pi.indexname),
       pg_relation_size(pi.schemaname||'.'||pi.indexname),--指定的单个索引
       pg_indexes_size(pi.schemaname||'.'||pi.tablename) --表上的所有索引
 from pg_tables pt
      left outer join pg_indexes pi 
                   on pt.schemaname||'.'||pt.tablename = pi.schemaname||'.'||pi.tablename
where 1=1
  and pt.schemaname='public'
  and pt.tablename='tmp_t0'
;

   ?column?    | pg_relation_filepath | pg_table_size | pg_relation_size | pg_total_relation_size |      ?column?       | pg_relation_filepath | pg_relation_size | pg_indexes_size 
---------------+----------------------+---------------+------------------+------------------------+---------------------+----------------------+------------------+-----------------
 public.tmp_t0 | base/24579/24611     |     108478464 |        108421120 |              224722944 | public.idx_tmp_t0   | base/24579/24614     |         39215104 |       116244480
 public.tmp_t0 | base/24579/24611     |     108478464 |        108421120 |              224722944 | public.idx_tmp_t0_1 | base/24579/24615     |         77029376 |       116244480
(2 rows)
复制

可以看出，即使删除了大量的数据，表和索引的大小依然没有发生变化。

zabbix=# select * from pg_stat_all_tables where relname = 'tmp_t0';

 relid | schemaname | relname | seq_scan | seq_tup_read | idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del | n_tup_hot_upd | n_live_tup | n_dead_tup | n_mod_since_analyze | last_vacuum |        last_autovacuum        | last_analyze |       last_autoanalyze        | vacuum_count | autovacuum_count | analyze_count | autoanalyze_count 
-------+------------+---------+----------+--------------+----------+---------------+-----------+-----------+-----------+---------------+------------+------------+---------------------+-------------+-------------------------------+--------------+-------------------------------+--------------+------------------+---------------+-------------------
 24611 | public     | tmp_t0  |        4 |      1800000 |        0 |             0 |   1000000 |         0 |    800000 |             0 |     200000 |     800000 |                   0 |             | 2018-09-19 15:26:13.424378+08 |              | 2018-09-19 15:24:15.300776+08 |            0 |                3 |             0 |                 2
(1 row)

zabbix=# select * from pg_class pc where pc.relname = 'tmp_t0';

 relname | relnamespace | reltype | reloftype | relowner | relam | relfilenode | reltablespace | relpages | reltuples | relallvisible | reltoastrelid | relhasindex | relisshared | relpersistence | relkind | relnatts | relchecks | relhasoids | relhaspkey | relhasrules | relhastriggers | relhassubclass | relrowsecurity | relforcerowsecurity | relispopulated | relreplident | relfrozenxid | relminmxid | relacl | reloptions 
---------+--------------+---------+-----------+----------+-------+-------------+---------------+----------+-----------+---------------+---------------+-------------+-------------+----------------+---------+----------+-----------+------------+------------+-------------+----------------+----------------+----------------+---------------------+----------------+--------------+--------------+------------+--------+------------
 tmp_t0  |         2200 |   24613 |         0 |       10 |     0 |       24611 |             0 |    13235 |     1e+06 |             0 |             0 | t           | f           | p              | r       |        3 |         0 | f          | f          | f           | f              | f              | f              | f                   | t              | d            |         1184 |          1 |        | 
(1 row)

复制

从 pg_stat_all_tables.autovacuum_count 也可以看出，即使发生了 autovacuum 也依然没有收缩空间(只有 vacuum full 才会收缩空间,类似ctas)。

手动 vacuum

zabbix=# vacuum verbose tmp_t0;
INFO:  vacuuming "public.tmp_t0"
INFO:  index "idx_tmp_t0" now contains 1000000 row versions in 4787 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "idx_tmp_t0_1" now contains 1000000 row versions in 9403 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "tmp_t0": found 0 removable, 1000000 nonremovable row versions in 13235 out of 13235 pages
DETAIL:  800000 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/0.16u sec elapsed 0.16 sec.
vacuum

zabbix=# select * from pg_stat_all_tables where relname = 'tmp_t0';

 relid | schemaname | relname | seq_scan | seq_tup_read | idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del | n_tup_hot_upd | n_live_tup | n_dead_tup | n_mod_since_analyze |          last_vacuum          |        last_autovacuum        | last_analyze |       last_autoanalyze        | vacuum_count | autovacuum_count | analyze_count | autoanalyze_count 
-------+------------+---------+----------+--------------+----------+---------------+-----------+-----------+-----------+---------------+------------+------------+---------------------+-------------------------------+-------------------------------+--------------+-------------------------------+--------------+------------------+---------------+-------------------
 24611 | public     | tmp_t0  |        4 |      1800000 |        0 |             0 |   1000000 |         0 |    800000 |             0 |     200000 |     800000 |                   0 | 2018-09-19 15:42:22.095405+08 | 2018-09-19 15:42:13.804102+08 |              | 2018-09-19 15:24:15.300776+08 |            1 |               19 |             0 |                 2
(1 row)
复制

注意这个提示 DETAIL: 800000 dead row versions cannot be removed yet.

这是因为在整个测试之前，另外一个毫不相干的数据库表上挂起了一个长事务，结束长事务后再 vacuum

zabbix=# vacuum verbose tmp_t0;
INFO:  vacuuming "public.tmp_t0"
INFO:  scanned index "idx_tmp_t0" to remove 800000 row versions
DETAIL:  CPU 0.00s/0.17u sec elapsed 0.19 sec
INFO:  scanned index "idx_tmp_t0_1" to remove 800000 row versions
DETAIL:  CPU 0.01s/0.49u sec elapsed 0.54 sec
INFO:  "tmp_t0": removed 800000 row versions in 10668 pages
DETAIL:  CPU 0.00s/0.02u sec elapsed 0.03 sec
INFO:  index "idx_tmp_t0" now contains 200000 row versions in 4787 pages
DETAIL:  800000 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "idx_tmp_t0_1" now contains 200000 row versions in 9403 pages
DETAIL:  800000 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "tmp_t0": found 800000 removable, 200000 nonremovable row versions in 13235 out of 13235 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.04s/1.40u sec elapsed 1.57 sec.
INFO:  "tmp_t0": truncated 13235 to 2568 pages
DETAIL:  CPU 0.02s/0.01u sec elapsed 0.07 sec
VACUUM
复制

注意这几个输出：

INFO:  "tmp_t0": removed 800000 row versions in 10668 pages
INFO:  "tmp_t0": found 800000 removable, 200000 nonremovable row versions in 13235 out of 13235 pages
复制

测试环境使用的是单个表，数据量又少，autovacuum 能及时的处理 dead tuple，实际的生产环境大多数是由于 autovacuum 的不及时造成的。

预防表的膨胀，可以从以下几个方便入手：

1、调整autovacuum的参数，默认值总是设置的很保守。对于配置好的机器(配置SSD)
可以参考 http://postgres.cn/docs/9.6/runtime-config-autovacuum.html 认真阅读

select *
  from pg_settings ps
 where 1=1
   and lower(ps.name) like '%autovacuum%'
 order by ps.name
;
复制

2、关注长事务

select extract(epoch FROM (clock_timestamp() - xact_start )) as longtrans,
       extract(epoch FROM (clock_timestamp() - query_start )) as longquery
  from pg_stat_activity 
 where 1=1
   and state <> 'idle'
;
复制

3、关注表的膨胀率

select schemaname||'.'||relname,
	   n_dead_tup,
	   n_live_tup,
	   coalesce(round(n_dead_tup * 100 / (case when n_live_tup + n_dead_tup = 0 then null else n_live_tup + n_dead_tup end ),2),0.00) as dead_tup_ratio
  from pg_stat_all_tables
 where 1=1
   and n_dead_tup >= 10000
 order by dead_tup_ratio desc
 limit 10
; 
复制

参考：
http://postgres.cn/docs/9.6/runtime-config-autovacuum.html