This Ruby extension uses the actual PostgreSQL server source to parse SQL queries and return the internal PostgreSQL parsetree.
In addition the extension allows you to normalize queries (replacing constant values with ?) and parse these normalized queries into a parsetree again.
When you build this extension, it fetches a copy of the PostgreSQL server source and builds parts of it, and then statically links it into this extension.
This is slightly crazy, but is the only reliable way of parsing all valid PostgreSQL queries.
You can find further examples and a longer rationale here: https://pganalyze.com/blog/parse-postgresql-queries-in-ruby.html
gem install pg_query
Due to compiling parts of PostgreSQL, installation might take a while on slower systems. Expect up to 5 minutes.
PgQuery.parse("SELECT 1")
=> #<PgQuery:0x007fe92b27ea18
@parsetree=
[{"SELECT"=>
{"distinctClause"=>nil,
"intoClause"=>nil,
"targetList"=>
[{"RESTARGET"=>
{"name"=>nil,
"indirection"=>nil,
"val"=>{"A_CONST"=>{"val"=>1, "location"=>7}},
"location"=>7}}],
...}}],
@query="SELECT 1",
@warnings=[]>
parsed_query = PgQuery.parse("SELECT * FROM users")
=> #<PgQuery:0x007ff3e956c8b0
@parsetree=
[{"SELECT"=>{"distinctClause"=>nil,
"intoClause"=>nil,
"targetList"=>
[{"RESTARGET"=>
{"name"=>nil,
"indirection"=>nil,
"val"=>
{"COLUMNREF"=>
{"fields"=>[{"A_STAR"=>{}}],
"location"=>7}},
"location"=>7}}],
"fromClause"=>
[{"RANGEVAR"=>
{"schemaname"=>nil,
"relname"=>"users",
"inhOpt"=>2,
"relpersistence"=>"p",
"alias"=>nil,
"location"=>14}}],
...}}],
@query="SELECT * FROM users",
@warnings=[]>
# Modify the parse tree in some way
parsed_query.parsetree[0]['SELECT']['fromClause'][0]['RANGEVAR']['relname'] = 'other_users'
# Turn it into SQL again
parsed_query.deparse
=> "SELECT * FROM other_users"
Note: The deparsing feature is experimental and does not support outputting all SQL yet.
# Normalizing a query (like pg_stat_statements)
PgQuery.normalize("SELECT 1 FROM x WHERE y = 'foo'")
=> "SELECT ? FROM x WHERE y = ?"
# Parsing a normalized query
PgQuery.parse("SELECT ? FROM x WHERE y = ?")
=> #<PgQuery:0x007fb99455a438
@parsetree=
[{"SELECT"=>
{"distinctClause"=>nil,
"intoClause"=>nil,
"targetList"=>
[{"RESTARGET"=>
{"name"=>nil,
"indirection"=>nil,
"val"=>{"PARAMREF"=>{"number"=>0, "location"=>7}},
"location"=>7}}],
"fromClause"=>
[{"RANGEVAR"=>
{"schemaname"=>nil,
"relname"=>"x",
"inhOpt"=>2,
"relpersistence"=>"p",
"alias"=>nil,
"location"=>14}}],
"whereClause"=>
{"AEXPR"=>
{"name"=>["="],
"lexpr"=>{"COLUMNREF"=>{"fields"=>["y"], "location"=>22}},
"rexpr"=>{"PARAMREF"=>{"number"=>0, "location"=>26}},
"location"=>24}},
...}}],
@query="SELECT ? FROM x WHERE y = ?",
@warnings=[]>
PgQuery.parse("SELECT ? FROM x JOIN y USING (id) WHERE z = ?").tables
=> ["x", "y"]
PgQuery.parse("SELECT ? FROM x WHERE x.y = ? AND z = ?").filter_columns
=> [["x", "y"], [nil, "z"]]
PgQuery.parse("SELECT 1").fingerprint
=> "db76551255b7861b99bd384cf8096a3dd5162ab3"
PgQuery.parse("SELECT 2; --- comment").fingerprint
=> "db76551255b7861b99bd384cf8096a3dd5162ab3"
This gem is based on the latest stable PostgreSQL version, but applies a few patches to make this library work:
- 01_output_nodes_as_json.patch: Auto-generated outfuncs that outputs a parsetree as JSON (called through nodeToJSONString)
- 02_parse_replacement_char.patch: Modify scan.l/gram.y to support parsing normalized queries
- Known regression: Removed support for custom operators containing "?" (doesn't affect hstore/JSON/geometric operators)
- 03_regenerate_bison_flex_files.patch: Regenerate scan.c/gram.c to avoid bison/flex dependency on deployment
High-level unit tests for these patches are inside this library.
- Jack Danger Canty, for significantly improving deparsing
Copyright (c) 2015, pganalyze Team [email protected]
pg_query is licensed under the 3-clause BSD license, see LICENSE file for details.
Query normalization code:
Copyright (c) 2008-2015, PostgreSQL Global Development Group