PostgreSql VACUUM – dba cheat-sheet

17.12.2020

This article is inspired by an actual event that caused the overall application performance of one of our clients applications to drop dramatically due to bad PostgreSql database performance. This was caused by „deleted“ data not beeing actually deleted and therefore slowing down I/O ops on the db. During the analysis of the problem, we gained some knowledge about the PG-internal „vacuuming“ procedure. This knowledge will be shared with you in the following article.

The vacuum command can be used to cleanup table data that has been marked as „deleted“. It can be compared to „empty the trash“ on the OS. Normally, the vacuuming is performed periodically in the background automatically by the database (if configured correctly).

But: The automatic vacuuming is suspended as long as there are active connections to the data that could be vacuumed. This can cause data to not get deleted correctly and in result the diskspace used by the db to grow.

Therefore, vacuuming can also be triggered manually for the whole db or for individual tables by the following command:

VACUUM full <table_name (optional)>

Since vacuuming the whole db takes very long, it is recommended to vacuum individual tables. To check, which tables need to be vacuumed, the following command can be used:

SELECT schemaname, relname, last_autovacuum, n_dead_tup
FROM pg_stat_all_tables
WHERE schemaname in ('zone_trusted','zone_refined')
ORDER BY schemaname, relname;

This displays the date of the last autovacuum within the column „last_autovacuum“ and the number of potentially dead tuples (=tuples that will be removed with the next vacuuming) within the column „n_dead_tup“.

An example looks like:

schemaname relname last_autovacuum n_dead_tup
myschema table_a 2020-10-13 06:01:14.141055+00 705
myschema table_b 2020-12-15 04:09:40.142925+00 12872
myschema table_c 2020-12-15 09:27:13.611185+00 0
myschema table_d 2020-12-15 11:38:28.502629+00 37
myschema table_e 2020-12-15 11:40:29.094987+00 0

 

Additionally, the used table space of the tables can bee looked up with the following command (the parameter „nspname“ (= the name of the schema to include in the analysis) might need to be adapted)

SELECT nspname || '.' || relname AS "relation",
pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size",
pg_size_pretty(pg_relation_size(C.oid)) AS "table_size",
pg_size_pretty(pg_indexes_size(C.oid)) AS "index_size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname IN ('zone_trusted', 'zone_refined')
AND C.relkind <> 'i'
AND nspname !~ '^pg_toast'
ORDER BY pg_total_relation_size(C.oid) DESC;

An example output looks like:

relation total_size table_size index_size
table_a 7804 MB 6350 MB 1453 MB
table_b 3801 MB 3352 MB 448 MB
table_c 2508 MB 2306 MB 201 MB
table_d 2139 MB 2017 MB 121 MB
table_e 1501 MB 753 MB 748 MB

 

With this two measurements, the db-admin can decide, which tables shall be vacuumed manually. Criterias are: Has not been vacuumed for a long time and has dead tuples and is using a relevant amount of space (e.g. GB).

In the example used here, the table_b should be vacuumed since it has a relevant size and many dead tuples.

 

Mehr über Software Development erfahren

Zurück zur Übersicht

Kommentar verfassen

Deine E-Mail-Adresse wird nicht veröffentlicht. Erforderliche Felder sind mit * markiert

*Pflichtfelder

*