Alternatives to VACUUM FULL in Postgres

16.10.2023

If deleted data („dead tuples“) are not purged in time, tables continue to grow. With VACUUM FULL, this space can be freed again. However, this leads to a downtime of the table due to its EXCLUSIVE LOCK. Are there other tools available that support online running?

1) Autovacuum

VACUUM FULL should not be used as a problem solver by default. Autovacuum is intended for regular cleaning. It removes entries marked as deleted, thus freeing space for new entries in the table. Autovacuum is fully automatic and becomes active as soon as a certain threshold of changed data is exceeded.

Only in rare cases is a VACUUM FULL actually necessary. For example, when a large purge has been performed and 90% of the data has been deleted. In this case, Autovacuum will not be able to shrink the oversized table back to the space actually needed.

 

Exemplary use of autovacuum:

There is a global configuration for autovacuum:

select name, setting
from pg_settings 
where name in ('autovacuum', 'autovacuum_analyze_scale_factor', 'autovacuum_analyze_threshold');
name                                 |setting  |
-------------------------------------+---------+
autovacuum                           |on       |
autovacuum_analyze_scale_factor      |0.05     |
autovacuum_analyze_threshold         |50       |

Likewise, the values can be set at table level:

CREATE TABLE dummy_table (
	..
)
WITH (
	autovacuum_vacuum_scale_factor=0.001,
	autovacuum_vacuum_threshold=250,
	autovacuum_enabled=true
);

The last autovacuum run per table can be seen in the statistics:

select relname, last_autovacuum, n_live_tup, n_dead_tup
from pg_catalog.pg_stat_all_tables;
relname |last_autovacuum              |n_live_tup|n_dead_tup|
--------+-----------------------------+----------+----------+
dummy_1 |2023-08-10 07:04:53.803 +0200|       160|         0|
dummy_2 |2023-08-28 17:21:41.258 +0200|  25146898|     18504|
dummy_3 |2023-08-10 07:04:38.423 +0200|  10579558|       140|

Preconditions:

  • Autovacuum needs to be enabled
  • Threshhold must be configured

Needed Permissions: 

  • Requires table permissions.

Necessary storage space: 

  • Low.

Table Lock: 

  • Weak Lock: Interrupts the clean-up as soon as another process is blocked.

Execution:

  • Automated execution when threshold is reached.

2) Copy table manually

A very simple alternative that can be used on all systems is to manually copy the table. This involves copying the contents of the table to a new table, deleting the original table, and then renaming the new table to the name of the original table. It is important to be aware of this:

  • Permissions and indexes must also be copied.
  • For very large tables, copying should be done in batches.

Example of copying a table manually:

CREATE TABLE dummy_table_new
(
  dummy_field_1 int,
  dummy_field_2 int,
  ...
);

INSERT INTO dummy_table_new
SELECT * 
FROM dummy_table 
-- optional: 
WHERE ID >= insert_min_id 
LIMIT insert_max_amount
;
-- TODO: Copy Indexes and Permissions

DROP TABLE dummy_table CASCADE;
ALTER TABLE dummy_table_new RENAME TO dummy_table;

Preconditions:

  • SQL must be written specifically for the table.
  • For batch execution: A unique identifier is required.

Needed Permissions: 

  • Requires table permissions.

Necessary storage space: 

  • Additional disk space is required equal to the table size including indexes.

Table Lock: 

  • The table should be locked during this time, as new changes will not be taken into account when copying.

Execution:

  • Manual Execution

3) pg_repack

pg_repack performs the same functions as VACUUM FULL, but can be run online. It is a purely console-based tool that can be used to perform ad hoc cleanups.

Exemplary use of pg_repack:

pg_repack --host=localhost --port=6432 --username=dummy_user --dbname=dummy_db --table=dummy_table

INFO: repacking table "public.dummy_table"

Preconditions:

  • pg_repack extension needs to be installed.
  • Can only be run from the console. This must either be run directly on the DB host, or pg_repack must be additionally installed on a client host.

Needed Permissions: 

  • Superuser is needed.

Necessary storage space: 

  • Additional disk space is required equal to the table size including indexes.

Table Lock:

  • Short table locks.
  • In this case, other running transactions will be killed after a certain time (default 2 minutes). If this is not desired, it can be disabled with –no-kill-backend.

Execution:

  • Manual Execution

4) pg_squeeze

pg_squeeze is considered the successor to pg_repack. It is an automatic DB background job. It becomes active when the free space of a table exceeds a certain threshold.

Exemplary use of pg_sqeeze:

Execution at 22:30, always on Wednesdays and Fridays, when there is more than 30% empty space in the table:

INSERT INTO squeeze.tables (tabschema, tabname, schedule, free_space_extra)
VALUES ('public', 'foo', ('{30}', '{22}', NULL, NULL, '{3, 5}'), 30);

-- schedule: minutes, hours, days_of_month, months, days_of_week

Preconditions:

  • The pg_squeeze extension must be installed and enabled.
  • Table needs a primary key or unique constraint.

Needed Permissions: 

  • Must grant permissions on table squeeze.tables.

Necessary storage space: 

  • Additional disk space is required equal to the table size including indexes.

Table Lock:

  • Short table locks.
  • Maximum lock time can be limited with squeeze.max_xlock_time.

Execution:

  • Automatic execution, at a specified time, when the table_bloat threshold is reached.
  • Ad hoc execution is also possible.

In Summary

no Full Table Lock Automatic Execution
VACUUM FULL  ❌
AUTOVACUUM** ✅*
manual copy  ❌
pg_repack ✅*  ❌
pg_squeeze ✅*

* only short table locks
** Does not reduce the size of the table, only frees up space for new entries.

In summary, the combination of AUTOVACUUM and pg_squeeze is recommended, as it provides the most benefits.


Links:

 

Interested in more information about VACUUM? Here are all the posts:

Zurück zur Übersicht

Kommentar verfassen

Deine E-Mail-Adresse wird nicht veröffentlicht. Erforderliche Felder sind mit * markiert

*Pflichtfelder

*