Skip to content

Latest commit

 

History

History
116 lines (104 loc) · 5.96 KB

readme.md

File metadata and controls

116 lines (104 loc) · 5.96 KB

pg_vacuum

This python program does vacuum/analyze/freeze actions on tables and materialized views based on the user inputs provided.

(c) 2018-2023 SQLEXEC LLC
GNU V3 and MIT licenses are conveyed accordingly.
Bugs can be reported @ [email protected]

History

The first version of this program was created in 2018.
Program renamed from optimize_db.py to pg_vacuum.py (December 2020)

Overview

This program is useful to identify and vacuum tables. Most inputs are optional, and either an optional parameter is not used or a default value is used if not provided. That means you can override internal parameters by specifying them on the command line. The latest version of pg_vacuum incorporates parallel vacuuming if maintenance workers are available. Here are the parameters:
-H --host host name
-d --dbname database name
-p --dbport database port
-U --dbuser database user
-s --maxsize max table size in Gigabytes that will be considered (default, 1 TB)
-y --analyzemaxdays Analyzes older than this will be considered
-x --vacuummaxdays Vacuums older than this will be considered
-t --mindeadtups minimum dead tups before considering a vacuum
-z --minmodanalyzed minimum mod since analyzed tups before considering an analyze
-b --maxtables max number of tables to vacuum (default 9999)
-i --ignoreparts ignore partitioned tables
-a --async run async jobs ignoring thresholds
-m --schema if provided, perform actions only on this schema
-c --check Only Check stats on tables
-r --dryrun do a dry run for analysis before actually running it.
-q --inquiry show stats to validate run. Best used with dryrun. Values: "all" | "found"
-v --verbose Used primarily for debugging
-l --debug Used primarily for debugging queries
-k --disablepageskipping Used to force vacuum to not skip pages that the VM says are not needed
-g --orderbydate Useful for prioritizing tables that haven't been vacuumed/analyzed the longest
-e --autotune specifies scale_factor to use for both vaccums and analyzes (range: 0.00001 to 0.2)
-f --freeze perform freeze if xid age input value is in range (range: 50,000,000 - 1,500,000,000), no commas
-n --nullsonly Only consider tables with no vacuum or analyze history

Requirements

  1. python 2.7 or above
  2. python packages: psycopg2
  3. Works on Linux and Windows.
  4. PostgreSQL versions 9.6 and up

Assumptions

  1. Only when a table is within 25 million of reaching the wraparound threshold is it considered a FREEZE candidate.
  2. By default, catalog tables are ignored unless specified explicitly with the --schema option.
  3. If passwords are required (authentication <> trust), then you must define credentials in the .pgpass (linux)/pgpass.conf (windows) files.
  4. The less parameters you supply, the more wide-open the vacuum operation, i.e., more tables qualify

Vacuuming Best Practices

Once you have your autovacuum tuned for your specific SQL workload, it is usually good to combine that with some nightly cronjobs to take some of the load off of the autovacuum daemon. The following 2 pg_vacuum.py jobs are an example of one setup to run on a nightly basis in which the 1st one checks for tables with no history of vacuums or analyzes and the 2nd one does them based on how long since the previous ones were done and perhaps on n_dead_tups/n_mod_since_analyze thresholds:
pg_vacuum.py -H localhost -d testing -p 5432 -U postgres --maxsize 1000000000 --nullsonly
pg_vacuum.py -H localhost -d testing -p 5432 -U postgres --maxsize 1000000000 -x2 -y 20 -t 10000

Basically, always check for tables without vacuums and analyzes. This can happen on newly created tables or after a PG service crashed, invalidating all the vacuum stats. The second job just makes sure we do vacuuming at least every 2 days if dead tuples has reached out maximum. Do analyzes for tables that haven't been analyzed in the last 20 days.

Examples

NOTE: all examples shown are in --dryrun mode since this is a best practice before actually running the command.
Vacuum all tables that don't have any vacuums/analyzes. Only do tables less that 100MB in size. Bypass partitioned tables. Dryrun first.
pg_vacuum.py -H localhost -d testing -p 5432 -U postgres --maxsize 1000000000 --nullsonly --ignoreparts --dryrun

Same as before but only do it for the first 50 tables.
pg_vacuum.py -H localhost -d testing -p 5432 -U postgres -s 1000000000 --nullsonly --ignoreparts --dryrun -b 50

Same as before but only do it for a specific schema.
pg_vacuum.py -H localhost -d testing -p 5432 -U postgres -s 1000000000 --nullsonly --ignoreparts --dryrun -b 50 --schema concept

Vacuum tables that haven't been vacuumed in 10 days, 20 days for analyzes. Dryrun first.
pg_vacuum.py -H localhost -d testing -p 5432 -U postgres -s 1000000000 -x 20 -y 20 --dryrun

Vacuum tables that have more than 1000 dead tuples and haven't been vacuumed in 20 days. Dryrun first.
pg_vacuum.py -H localhost -d testing -p 5432 -U postgres -s 1000000000 -x 20 -y 20 -t 1000 --dryrun

Run a check to get the overall status of vacuuming in the database.
pg_vacuum.py -H localhost -d testing -p 5432 -U postgres -s 1000000000 --check

Vacuum Freeze tables that are at the 90% threshold for transaction wrap-around to kick in.
pg_vacuum.py -H localhost -d testing -p 5432 -U postgres -s 1000000000 --pctfreeze 90 --freeze --dryrun

Vacuum/analyze tables whose dead tups/tups since analyzed > threshold factor.
pg_vacuum.py -H localhost -d testing -p 5432 -U postgres -s 1000000000 --autotune 0.1 --dryrun