PostgreSql VACUUM – dba cheat-sheet
This article is inspired by an actual event that caused the overall application performance of one of our clients applications to drop dramatically due to bad PostgreSql database performance. This was caused by “deleted” data not beeing actually deleted and therefore slowing down I/O ops on the db. During the analysis of the problem, we gained some knowledge about the PG-internal “vacuuming” procedure. This knowledge will be shared with you in the following article.
The vacuum command can be used to cleanup table data that has been marked as “deleted”. It can be compared to “empty the trash” on the OS. Normally, the vacuuming is performed periodically in the background automatically by the database (if configured correctly).
But: The automatic vacuuming is suspended as long as there are active connections to the data that could be vacuumed. This can cause data to not get deleted correctly and in result the diskspace used by the db to grow.
Therefore, vacuuming can also be triggered manually for the whole db or for individual tables by the following command:
VACUUM full <table_name (optional)>
Since vacuuming the whole db takes very long, it is recommended to vacuum individual tables. To check, which tables need to be vacuumed, the following command can be used:
SELECT schemaname, relname, last_autovacuum, n_dead_tup
WHERE schemaname in ('zone_trusted','zone_refined')
ORDER BY schemaname, relname;
This displays the date of the last autovacuum within the column “last_autovacuum” and the number of potentially dead tuples (=tuples that will be removed with the next vacuuming) within the column “n_dead_tup”.
An example looks like:
Additionally, the used table space of the tables can bee looked up with the following command (the parameter “nspname” (= the name of the schema to include in the analysis) might need to be adapted)
SELECT nspname || '.' || relname AS "relation",
pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size",
pg_size_pretty(pg_relation_size(C.oid)) AS "table_size",
pg_size_pretty(pg_indexes_size(C.oid)) AS "index_size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname IN ('zone_trusted', 'zone_refined')
AND C.relkind <> 'i'
AND nspname !~ '^pg_toast'
ORDER BY pg_total_relation_size(C.oid) DESC;
An example output looks like:
|table_a||7804 MB||6350 MB||1453 MB|
|table_b||3801 MB||3352 MB||448 MB|
|table_c||2508 MB||2306 MB||201 MB|
|table_d||2139 MB||2017 MB||121 MB|
|table_e||1501 MB||753 MB||748 MB|
With this two measurements, the db-admin can decide, which tables shall be vacuumed manually. Criterias are: Has not been vacuumed for a long time and has dead tuples and is using a relevant amount of space (e.g. GB).
In the example used here, the table_b should be vacuumed since it has a relevant size and many dead tuples.