This Python program performs vacuum/analyze/freeze actions on tables and materialized views based on user-supplied inputs.
(c) 2018-2023 SQLEXEC LLC
GNU V3 and MIT licenses are conveyed accordingly.
Bugs can be reported @ [email protected]
The first version of this program was created in 2018.
Program renamed from optimize_db.py to pg_vacuum.py (December 2020)
This program is useful for identifying and vacuuming tables. Most inputs are optional; when a parameter is not provided, it is either ignored or falls back to an internal default, which means you can override the internal defaults by specifying them on the command line. The latest version of pg_vacuum runs vacuums in parallel when maintenance workers are available. Here are the parameters (a sketch of the underlying stats query follows the list):
-H --host
host name
-d --dbname
database name
-p --dbport
database port
-U --dbuser
database user
-s --maxsize
max table size, in bytes, that will be considered (default: 1 TB)
-y --analyzemaxdays
tables whose last analyze is older than this many days are considered
-x --vacuummaxdays
tables whose last vacuum is older than this many days are considered
-t --mindeadtups
minimum dead tups before considering a vacuum
-z --minmodanalyzed
minimum tuples modified since the last analyze before considering an analyze
-b --maxtables
max number of tables to vacuum (default 9999)
-i --ignoreparts
ignore partitioned tables
-a --async
run async jobs ignoring thresholds
-m --schema
if provided, perform actions only on this schema
-c --check
only check stats on tables (no vacuums/analyzes performed)
-r --dryrun
do a dry run for analysis before actually running it.
-q --inquiry
show stats to validate run. Best used with dryrun. Values: "all" | "found"
-v --verbose
Used primarily for debugging
-l --debug
Used primarily for debugging queries
-k --disablepageskipping
forces vacuum to not skip pages that the visibility map says do not need vacuuming (i.e., VACUUM with DISABLE_PAGE_SKIPPING)
-g --orderbydate
Useful for prioritizing tables that haven't been vacuumed/analyzed the longest
-e --autotune
specifies the scale_factor to use for both vacuums and analyzes (range: 0.00001 to 0.2)
-f --freeze
perform a freeze if the XID age input value is in range (50000000 to 1500000000; enter the value without commas)
-n --nullsonly
Only consider tables with no vacuum or analyze history
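To see the raw numbers these thresholds are compared against, you can query PostgreSQL's statistics views directly. Below is a minimal sketch, assuming psycopg2 and illustrative connection values (host, database, and user are placeholders, not defaults of this tool), of the kind of pg_stat_user_tables query a tool like this evaluates:

import psycopg2

# Connection values are placeholders; adjust to your environment.
conn = psycopg2.connect(host="localhost", port=5432, dbname="testing", user="postgres")
cur = conn.cursor()
# pg_stat_user_tables carries the counters and timestamps the thresholds above
# are compared against: dead tuples, mods since analyze, last vacuum/analyze times.
cur.execute("""
    SELECT schemaname, relname, n_dead_tup, n_mod_since_analyze,
           last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
    FROM pg_stat_user_tables
    ORDER BY n_dead_tup DESC
""")
for row in cur.fetchall():
    print(row)
cur.close()
conn.close()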
Requirements:
- Python 2.7 or above
- Python packages: psycopg2
- Works on Linux and Windows
- PostgreSQL versions 9.6 and up
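psycopg2 is typically installed with pip; psycopg2-binary is the common choice when a C compiler and libpq headers are not available:

pip install psycopg2-binary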
Assumptions:
- A table is considered a FREEZE candidate only when it is within 25 million XIDs of reaching the wraparound threshold.
- By default, catalog tables are ignored unless specified explicitly with the --schema option.
- If passwords are required (authentication method is not trust), you must define credentials in the .pgpass (Linux) / pgpass.conf (Windows) file; see the example after this list.
- The fewer parameters you supply, the more wide-open the vacuum operation, i.e., more tables qualify.
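For reference, a .pgpass entry has the standard libpq form host:port:database:user:password, one connection per line, and the file must not be group- or world-readable (chmod 600 on Linux). The password below is a placeholder:

localhost:5432:testing:postgres:mysecretpassword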
Once you have autovacuum tuned for your specific SQL workload, it is usually good to combine it with nightly cron jobs to take some of the load off the autovacuum daemon. The following two pg_vacuum.py jobs are an example of a nightly setup: the first checks for tables with no history of vacuums or analyzes, and the second acts on tables based on how long ago the previous vacuum/analyze ran and on n_dead_tup/n_mod_since_analyze thresholds:
pg_vacuum.py -H localhost -d testing -p 5432 -U postgres --maxsize 1000000000 --nullsonly
pg_vacuum.py -H localhost -d testing -p 5432 -U postgres --maxsize 1000000000 -x 2 -y 20 -t 10000
Basically, always check for tables without vacuums and analyzes; this can happen on newly created tables or after a PG service crash, which invalidates all the vacuum stats. The second job makes sure we vacuum at least every 2 days once dead tuples reach our threshold, and analyzes tables that haven't been analyzed in the last 20 days. Example crontab entries for the two jobs follow.
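Hypothetical crontab entries for the two jobs (the install path, schedule, and log file are assumptions, not part of this tool's documentation):

# 01:30 nightly: catch tables with no vacuum/analyze history
30 1 * * * /usr/local/bin/pg_vacuum.py -H localhost -d testing -p 5432 -U postgres --maxsize 1000000000 --nullsonly >> /var/log/pg_vacuum.log 2>&1
# 02:30 nightly: time- and threshold-based pass
30 2 * * * /usr/local/bin/pg_vacuum.py -H localhost -d testing -p 5432 -U postgres --maxsize 1000000000 -x 2 -y 20 -t 10000 >> /var/log/pg_vacuum.log 2>&1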
NOTE: all examples shown are in --dryrun mode since this is a best practice before actually running the command.
Vacuum all tables that don't have any vacuums/analyzes. Only do tables less than 1 GB in size (--maxsize is given in bytes). Bypass partitioned tables. Dryrun first.
pg_vacuum.py -H localhost -d testing -p 5432 -U postgres --maxsize 1000000000 --nullsonly --ignoreparts --dryrun
Same as before but only do it for the first 50 tables.
pg_vacuum.py -H localhost -d testing -p 5432 -U postgres -s 1000000000 --nullsonly --ignoreparts --dryrun -b 50
Same as before but only do it for a specific schema.
pg_vacuum.py -H localhost -d testing -p 5432 -U postgres -s 1000000000 --nullsonly --ignoreparts --dryrun -b 50 --schema concept
Vacuum tables that haven't been vacuumed in 20 days and analyze tables that haven't been analyzed in 20 days. Dryrun first.
pg_vacuum.py -H localhost -d testing -p 5432 -U postgres -s 1000000000 -x 20 -y 20 --dryrun
Vacuum tables that have more than 1000 dead tuples and haven't been vacuumed in 20 days. Dryrun first.
pg_vacuum.py -H localhost -d testing -p 5432 -U postgres -s 1000000000 -x 20 -y 20 -t 1000 --dryrun
Run a check to get the overall status of vacuuming in the database.
pg_vacuum.py -H localhost -d testing -p 5432 -U postgres -s 1000000000 --check
Vacuum freeze tables that have reached 90% of the transaction wraparound threshold.
pg_vacuum.py -H localhost -d testing -p 5432 -U postgres -s 1000000000 --pctfreeze 90 --freeze --dryrun
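To double-check freeze candidates independently of this program, you can query each table's transaction ID age with the standard age(relfrozenxid) catalog function. A minimal psycopg2 sketch (connection values are placeholders):

import psycopg2

conn = psycopg2.connect(host="localhost", port=5432, dbname="testing", user="postgres")
cur = conn.cursor()
# age(relfrozenxid) reports how many XIDs old a table is; the closer it gets
# to the ~2 billion wraparound limit, the more urgent a freeze becomes.
cur.execute("""
    SELECT relname, age(relfrozenxid) AS xid_age
    FROM pg_class
    WHERE relkind IN ('r', 'm')
    ORDER BY xid_age DESC
    LIMIT 20
""")
for relname, xid_age in cur.fetchall():
    print("%s: %d" % (relname, xid_age))
cur.close()
conn.close()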
Vacuum/analyze tables whose dead-tuple (or modified-since-analyze) ratio exceeds the threshold factor. A worked example of the arithmetic follows the command.
pg_vacuum.py -H localhost -d testing -p 5432 -U postgres -s 1000000000 --autotune 0.1 --dryrun
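For intuition, this is the arithmetic a scale-factor rule implies (a sketch of the concept, not the program's exact code):

# With --autotune 0.1, a table qualifies once its dead tuples (or tuples
# modified since the last analyze) exceed 10% of its live tuples.
scale_factor = 0.1
n_live_tup = 1000000    # live tuples, as reported by pg_stat_user_tables
n_dead_tup = 120000     # dead tuples

threshold = scale_factor * n_live_tup    # 100000.0
print(n_dead_tup > threshold)            # True -> table qualifies for a vacuum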