Scan and collect the minimal amount of data needed to identify potential problems in your PostgreSQL database, and then generate an analysis report using that data. This project provides two SQL scripts for users:

1. `gather.sql`: Gathers performance and configuration data from PostgreSQL databases.
2. `gather_report.sql`: Analyzes the collected data and generates detailed HTML reports.

Everything is SQL-only, leveraging the built-in features of `psql`, the command-line utility of PostgreSQL.
Over time, DBAs accumulate large collections of SQL statements for analyzing database problems. Maintaining such a large collection of SQL scripts and ensuring they all remain relevant is not easy, and functionality changes across PostgreSQL versions pose additional challenges for DBAs.

So, the first objective of this project is to provide a single SQL script that works on all currently supported PostgreSQL versions and on all platforms, without the user having to worry about whether any individual SQL statement fails.

There can also be highly restricted environments, such as cloud-provided "Database as a Service" offerings, or users may be connecting with a less privileged account.

So, the second objective is to be fail-safe. That means: try to collect the data, but if access is denied and a specific SQL statement fails, collect only the data that is allowed and provide analysis based on that. A complete failure is unacceptable.

The third objective is to provide accurate, pinpointed analysis after automatically analyzing the collected data. The report should also provide additional details that might be relevant for further analysis.

The fourth objective is to give users complete transparency into what is collected and allow them to mask any data, if required, before handing it over for analysis.
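As an illustration of that fourth objective: because the collected output is plain text (TSV), standard text tools can be used to review and redact it before it is shared. A minimal sketch, assuming the output was saved as `out.tsv` as in the commands shown later in this document, and using the made-up pattern `mycompany`:

```
# Locate occurrences of a sensitive string in the collected TSV.
grep -n 'mycompany' out.tsv

# Redact them in place before sharing ("mycompany" is a hypothetical pattern).
sed -i 's/mycompany/MASKED/g' out.tsv
```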
Supported PostgreSQL versions: 10, 11, 12, 13, 14, 15, 16 & 17

*The PostgreSQL client `psql` used for collecting the data should be at least version 11.

Older versions: For PostgreSQL 9.6 and older, please refer to the documentation page.
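To confirm that the client meets this requirement, check its version before collecting:

```
# Print the psql client version; it should report 11 or newer.
psql --version
```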
`pg_gather` offers the following benefits:

- `pg_gather` requires only the standard PostgreSQL command-line utility, `psql`, and no other libraries or executables.
- It works even in restricted environments, such as cloud-provided "Database as a Service" offerings, because it uses the standard `psql` command-line utility.
- It runs wherever `psql` is available, ensuring maximum portability. (Windows users, please see the Notes section below.)
- Data can be collected from a local or a remote machine, wherever `psql` is available.
- `pg_gather` collects data in Tab Separated Values (TSV) format, making it easy to review and audit the information before sharing it for analysis. Additional masking or trimming is also possible with simple steps.
- `pg_gather` can generate a report from the available information even if data collection is partial or fails due to permission issues, unavailable tables/views, or other reasons.
- The TSV output compresses well with `gzip` for easy transmission and storage, and `pg_gather` avoids redundancy in the collected data as much as possible.

To gather configuration and performance information, run the `gather.sql` script against the database using `psql`:
```
psql <connection_parameters_if_any> -X -f gather.sql > out.tsv
```
Or, alternatively, pipe to a compression utility to get a compressed output as follows:
```
psql <connection_parameters_if_any> -X -f gather.sql | gzip > out.tsv.gz
```
This script may take over 20 seconds to run because it contains sleeps/delays. We recommend running the script as a privileged user (such as a superuser or `rds_superuser`) or as an account with the `pg_monitor` privilege. The output file contains performance and configuration data for analysis.
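For example, instead of using a superuser, a less-privileged account can be granted the built-in `pg_monitor` role beforehand. A minimal sketch, where `gather_user` is a hypothetical role name:

```
# Run as a superuser; grants read access to the monitoring views.
# "gather_user" is a hypothetical example role.
psql -c 'GRANT pg_monitor TO gather_user;'
```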
Notes:

`pg_statistics` may produce errors during data collection, but these errors can be ignored.

Windows users: pgAdmin ships with `psql`, which can be used to run `pg_gather` against local or remote databases.
For example:
"C:\Program Files\pgAdmin 4\v4\runtime\psql.exe" -h pghost -U postgres -f gather.sql > out.tsv
If a specific statement is not applicable to your environment, edit the `gather.sql` script to replace the inapplicable lines with "NULL". For example, the following replaces the line that checks `pg_is_in_recovery()`:

```
sed -i -e 's/^CASE WHEN pg_is_in_recovery().*/NULL/' gather.sql
```
Container environments may not have the `curl` or `wget` utilities necessary to download `gather.sql`. Therefore, it is recommended to pipe the contents of the SQL file to `psql` instead:

```
cat gather.sql | docker exec -i <container> psql -X -f - > out.tsv
cat gather.sql | kubectl exec -i <PGpod> -- psql -X -f - > out.tsv
```
There could be requirements for collecting data continuously and repeatedly. `pg_gather` has a special lightweight mode for continuous data gathering, which is automatically enabled when it connects to the `template1` database. Please refer to the detailed documentation on continuous and repeated data collection.
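For instance, repeated collection could be scripted with a simple shell loop. This is only a sketch; the 60-second interval and the timestamped file names are arbitrary choices, not project defaults:

```
# Take a compressed snapshot every 60 seconds.
# Connecting to template1 enables pg_gather's lightweight mode.
while true; do
  psql -d template1 -X -f gather.sql | gzip > "out_$(date +%Y%m%d%H%M%S).tsv.gz"
  sleep 60
done
```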
The collected data can be imported into a PostgreSQL instance. This creates the required schema objects in the `public` schema of the database.

CAUTION: Avoid importing the data into critical environments/databases. A temporary PostgreSQL instance is preferable.
```
psql -f gather_schema.sql -f out.tsv
```
Deprecated usage of `sed`:

```
sed -e '/^Pager/d; /^Tuples/d; /^Output/d; /^SELECT pg_sleep/d; /^PREPARE/d; /^\s*$/d' out.tsv | psql -f gather_schema.sql -
```
An analysis report in HTML format can be generated from the imported data as follows.
```
psql -X -f gather_report.sql > GatherReport.html
```
You may use your favourite web browser to read the report.
NOTE: PostgreSQL version 13 or above is required to generate the analysis report.
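Putting the import and report steps together, the whole analysis can be run against a throwaway database. A sketch, where `scratchdb` is a hypothetical database created just for this purpose:

```
# Import the collected data, then render the HTML report.
# "scratchdb" is a hypothetical temporary database.
createdb scratchdb
psql -d scratchdb -f gather_schema.sql -f out.tsv
psql -d scratchdb -X -f gather_report.sql > GatherReport.html
```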
The steps for data analysis mentioned above seem simple (a single command each), but they require a PostgreSQL instance to import the data into. An alternative is to use the `generate_report.sh` script, which can spin up a PostgreSQL Docker container and automate the entire process. To use this script, you must place it in a directory containing the `gather_schema.sql` and `gather_report.sql` files.

The script spins up a Docker container, imports the output of `gather.sql` (out.tsv), and then generates an HTML report. The script expects at least one argument: the path to the `out.tsv` file.

There are two additional positional arguments: the second specifies the destination path of the HTML report, and the third, when set to `y`, keeps the Docker container around after the report is generated (see Example 2 below).
Example 1: Import data and generate an HTML file
```
$ ./generate_report.sh /tmp/out.tsv
...
Container 61fbc6d15c626b484bdf70352e94bbdb821971de1e00c6de774ca5cd460e8db3 deleted
Finished generating report in /tmp/out.tsv.html
```
Example 2: Import data, keep the container intact and generate the report in the specified location
```
$ ./generate_report.sh /tmp/out.tsv /tmp/custom-name.html y
...
Container df7b228a5a6a49586e5424e5fe7a2065d8be78e0ae3aa5cddd8658ee27f4790c left around
Finished generating report in /tmp/custom-name.html
```
By default, the `pg_gather` report uses the same timezone as the server from which the data was collected, because it considers the `log_timezone` parameter when generating the report. This default timezone setting helps in comparing PostgreSQL log entries with the `pg_gather` report.

However, this may not be the right timezone for some users, especially when cloud-hosted databases are used. `pg_gather` allows the user to set a custom timezone through the environment variable `PG_GATHER_TIMEZONE`, which overrides the default. For example:
```
export PG_GATHER_TIMEZONE='UTC'
```
Please use a timezone name or abbreviation available from the `pg_timezone_names` view.
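For example, valid values can be checked with a quick query against that view (illustrative only):

```
# List a few timezone names and abbreviations accepted by PG_GATHER_TIMEZONE.
psql -X -c "SELECT name, abbrev FROM pg_timezone_names WHERE name LIKE 'Asia/%' LIMIT 5;"
```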