索引膨胀
对于索引,随着业务不断的增删改,会造成膨胀,尤其Btree索引,也会涉及索引分裂、合并等,导致索引访问效率降低、维护成本增加。另外,索引页的复用与HEAP PAGE不一样,因为索引的内容是有序结构,只有符合顺序的ITEM才能插入对应的PAGE中,不像HEAP TUPLE,只要有空间就可以插入。index page无论在任何位置,都不能从磁盘删除,因此索引变大后,不能回收空间,除非vacuum full。所以索引膨胀后,通常需要重建索引来回收索引空间。
此外,对于B树索引,新构建的索引比多次更新的索引访问速度稍快,因为逻辑上相邻的页面通常在新构建索引中也是物理上相邻的。为了提高访问速度,定期重新B_Tree 索引可能是值得的。
对于重建索引,REINDEX在任何情况下都可以安全方便地使用。默认情况下,该命令需要ACCESS EXCLUSIVE锁。可以使用CONCURRENTLY选项创建索引,该选项只需要SHARE UPDATE EXCLUSIV锁,不阻塞读写。
索引膨胀的原因:
1.大量删除发生后,导致索引页面稀疏,降低了索引使用效率。
2.长时间运行的事务,阻止了vacuum对表的清理工作,因而导致页面稀疏状态一直保持。
3.索引字段乱序写入,导致索引频繁分裂,使得索引页并不是百分百填满,所以膨胀。
查询获取每个表的行数、索引以及有关这些索引的一些信息:
- SELECT
- pg_class.relname,
- pg_size_pretty(pg_class.reltuples::bigint) AS rows_in_bytes,
- pg_class.reltuples AS num_rows,
- COUNT(*) AS total_indexes,
- COUNT(*) FILTER ( WHERE indisunique) AS unique_indexes,
- COUNT(*) FILTER ( WHERE indnatts = 1 ) AS single_column_indexes,
- COUNT(*) FILTER ( WHERE indnatts IS DISTINCT FROM 1 ) AS multi_column_indexes
- FROM
- pg_namespace
- LEFT JOIN pg_class ON pg_namespace.oid = pg_class.relnamespace
- LEFT JOIN pg_index ON pg_class.oid = pg_index.indrelid
- WHERE
- pg_namespace.nspname = 'public' AND
- pg_class.relkind = 'r'
- GROUP BY pg_class.relname, pg_class.reltuples
- ORDER BY pg_class.reltuples DESC;
- SELECT
- t.schemaname,
- t.tablename,
- c.reltuples::bigint AS num_rows,
- pg_size_pretty(pg_relation_size(c.oid)) AS table_size,
- psai.indexrelname AS index_name,
- pg_size_pretty(pg_relation_size(i.indexrelid)) AS index_size,
- CASE WHEN i.indisunique THEN 'Y' ELSE 'N' END AS "unique",
- psai.idx_scan AS number_of_scans,
- psai.idx_tup_read AS tuples_read,
- psai.idx_tup_fetch AS tuples_fetched
- FROM
- pg_tables t
- LEFT JOIN pg_class c ON t.tablename = c.relname
- LEFT JOIN pg_index i ON c.oid = i.indrelid
- LEFT JOIN pg_stat_all_indexes psai ON i.indexrelid = psai.indexrelid
- WHERE
- t.schemaname NOT IN ('pg_catalog', 'information_schema')
- ORDER BY 1, 2;
查找重复索引,查找具有相同列集、相同操作类、表达式和谓词的多个索引,但需要人为判断需要删除的重复项:
- SELECT pg_size_pretty(sum(pg_relation_size(idx))::bigint) as size,
- (array_agg(idx))[1] as idx1, (array_agg(idx))[2] as idx2,
- (array_agg(idx))[3] as idx3, (array_agg(idx))[4] as idx4
- FROM (
- SELECT indexrelid::regclass as idx, (indrelid::text ||E'\n'|| indclass::text ||E'\n'|| indkey::text ||E'\n'||coalesce(indexprs::text,'')||E'\n' || coalesce(indpred::text,'')) as key
- FROM pg_index) sub
- GROUP BY key HAVING count(*)>1
- ORDER BY sum(pg_relation_size(idx)) DESC;
查找未使用的索引:
- select
- indexrelid::regclass as index, relid::regclass as table
- from
- pg_stat_user_indexes
- JOIN pg_index USING (indexrelid)
- where
- idx_scan = 0 and indisunique is false;
有些场景,重建索引后,索引就变小了。通常这种情况是索引字段乱序写入,导致索引频繁分裂,使得索引页并不是百分百填满,密度低,索引页面浪费。
索引碎片模拟
- 乱序写入:
-
- test=# create table t_split(id int);
- CREATE TABLE
- test=# create index idx_split on t_split (id);
- CREATE INDEX
- test=# insert into t_split select random()*1000000 from generate_series(1,1000000);
- INSERT 0 1000000
- test=# \di+ idx_split
- List of relations
- Schema | Name | Type | Owner | Table | Size | Description
- --------+-----------+-------+--------+---------+-------+-------------
- public | idx_split | index | system | t_split | 30 MB |
-
- (1 row)
-
- 顺序写入:
-
- test=# truncate t_split ;
- TRUNCATE TABLE
- test=# \di+ idx_split
- List of relations
- Schema | Name | Type | Owner | Table | Size | Description
- --------+-----------+-------+--------+---------+------------+-------------
- public | idx_split | index | system | t_split | 8192 bytes |
-
- (1 row)
-
- test=# insert into t_split select generate_series(1,1000000);
- INSERT 0 1000000
- test=# \di+ idx_split
- List of relations
- Schema | Name | Type | Owner | Table | Size | Description
- --------+-----------+-------+--------+---------+-------+-------------
- public | idx_split | index | system | t_split | 22 MB |
-
- (1 row)
-
- 先写入数据,后建索引:
-
- test=# drop index idx_split ;
- DROP INDEX
- test=# create index idx_split on t_split (id);
- CREATE INDEX
- test=# \di+ idx_split
- List of relations
- Schema | Name | Type | Owner | Table | Size | Description
- --------+-----------+-------+--------+---------+-------+-------------
- public | idx_split | index | system | t_split | 22 MB |
- (1 row)
-
-
- 业务运行久了,不断的增删改,也会导致索引碎片:
-
- test=# create table test(id int);
- CREATE TABLE
- test=# insert into test values(generate_series(1,1000000));
- INSERT 0 1000000
- test=# create index idx_fragmented on test(id);
- CREATE INDEX
- CREATE EXTENSION kbstattuple;
- 刚刚创建的索引没有碎片:
- test=# \x
- Expanded display is on.
- test=# SELECT * FROM pgstatindex('idx_fragmented');
- -[ RECORD 1 ]------+---------
- version | 4
- tree_level | 2
- index_size | 22609920
- root_block_no | 289
- internal_pages | 11
- leaf_pages | 2748
- empty_pages | 0
- deleted_pages | 0
- avg_leaf_density | 89.93
- leaf_fragmentation | 0
-
-
- leaf_fragmentation的碎片率是33.33%:
- test=# insert into test values(generate_series(1,1000000));
- INSERT 0 1000000
- test=# SELECT * FROM pgstatindex('idx_fragmented');
- -[ RECORD 1 ]------+---------
- version | 4
- tree_level | 2
- index_size | 67846144
- root_block_no | 289
- internal_pages | 39
- leaf_pages | 8242
- empty_pages | 0
- deleted_pages | 0
- avg_leaf_density | 60.06
- leaf_fragmentation | 33.33
-
-
- reindex之后,即可回收空间,减少碎片。
-
- test=# reindex index idx_fragmented;
- REINDEX
- test=# \di+ idx_fragmented
- List of relations
- -[ RECORD 1 ]---------------
- Schema | public
- Name | idx_fragmented
- Type | index
- Owner | system
- Table | test
- Size | 43 MB
- Description |
-
- test=# SELECT * FROM pgstatindex('idx_fragmented');
- -[ RECORD 1 ]------+---------
- version | 4
- tree_level | 2
- index_size | 45236224
- root_block_no | 208
- internal_pages | 26
- leaf_pages | 5495
- empty_pages | 0
- deleted_pages | 0
- avg_leaf_density | 90.01
- leaf_fragmentation | 0
总结
通过以上方法监控索引膨胀,以及索引碎片情况,及时对索引reindex 进行碎片优化,建议不要在一个表上建太多索引,准确评估经常update的列和经常select的列,以便创建合适的索引。