-
Notifications
You must be signed in to change notification settings - Fork 85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support SELECT/INSERT/UPDATE/DELETE db operation + span names #1253
base: main
Are you sure you want to change the base?
Conversation
8f0e456
to
5340675
Compare
@@ -8,6 +8,8 @@ import ( | |||
"os" | |||
"strconv" | |||
|
|||
sql "github.com/xwb1989/sqlparser" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked through a couple sql parsing libraries, this isn't the most maintained... https://github.com/krasun/gosqlparser seems good but for some reason I don't think it supports *
in queries (??). For what we're doing now I think it's fine but this could use another look some day
89e64c7
to
7116a06
Compare
c809878
to
97ccfaf
Compare
|
||
if operation != "" { | ||
span.Attributes().PutStr(string(semconv.DBOperationNameKey), operation) | ||
span.SetName(operation) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we try to build the {db.query.summary}
or {db.operation.name} {target}
here given we have already parsed the query?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was originally doing db.query.summary
, but we don't have any of the values for {target}
(db.collection, db.namespace, or server address). So, it recommends just using the operation:
If a corresponding
{target}
value is not available for a specific operation, the instrumentation SHOULD omit the{target}
. For example, for an operation describing SQL query on an anonymous table likeSELECT * FROM (SELECT * FROM table) t
, span name should beSELECT
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The table might be defined in the query statement though, right? The query parsing library should be able to get this, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not that I could find, unfortunately :\ The first library I tried (https://github.com/krasun/gosqlparser) does have that, which is how I was building it at first. But that library breaks when it tries to parse a *
character (like SELECT * FROM
). Which makes no sense to me, but that's my understanding from trying to debug it and read that library's code. I'm open to add a TODO here for this though because maybe I'm just missing something
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should work, right?
package main
import (
"fmt"
"github.com/xwb1989/sqlparser"
)
// Parse takes a SQL query string and returns the parsed query statement type
// and table name, or an error if parsing failed.
func Parse(query string) (string, string, error) {
stmt, err := sqlparser.Parse(query)
if err != nil {
return "", "", fmt.Errorf("failed to parse query: %w", err)
}
switch stmt := stmt.(type) {
case *sqlparser.Select:
return "SELECT", getTableName(stmt.From), nil
case *sqlparser.Update:
return "UPDATE", getTableName(stmt.TableExprs), nil
case *sqlparser.Insert:
return "INSERT", stmt.Table.Name.String(), nil
case *sqlparser.Delete:
return "DELETE", getTableName(stmt.TableExprs), nil
default:
return "UNKNOWN", "", fmt.Errorf("unsupported statement type")
}
}
// getTableName extracts the table name from a SQL node.
func getTableName(node sqlparser.SQLNode) string {
switch tableExpr := node.(type) {
case sqlparser.TableName:
return tableExpr.Name.String()
case sqlparser.TableExprs:
for _, expr := range tableExpr {
if tableName, ok := expr.(*sqlparser.AliasedTableExpr); ok {
if name, ok := tableName.Expr.(sqlparser.TableName); ok {
return name.Name.String()
}
}
}
}
return ""
}
func main() {
queries := []string{
"SELECT * FROM users WHERE id = 1",
"SELECT id, name FROM users WHERE id = 1",
"INSERT INTO orders (id, amount) VALUES (1, 100)",
"UPDATE products SET price = 19.99 WHERE id = 10",
"DELETE FROM sessions WHERE expired = true",
"CREATE TABLE logs (id INT, message TEXT)",
}
for _, query := range queries {
fmt.Println("Query: ", query)
statement, table, err := Parse(query)
if err != nil {
fmt.Println("Error:", err)
continue
}
fmt.Printf("Statement: %s, Table: %s\n", statement, table)
}
}
$ go run .
Query: SELECT * FROM users WHERE id = 1
Statement: SELECT, Table: users
Query: SELECT id, name FROM users WHERE id = 1
Statement: SELECT, Table: users
Query: INSERT INTO orders (id, amount) VALUES (1, 100)
Statement: INSERT, Table: orders
Query: UPDATE products SET price = 19.99 WHERE id = 10
Statement: UPDATE, Table: products
Query: DELETE FROM sessions WHERE expired = true
Statement: DELETE, Table: sessions
Query: CREATE TABLE logs (id INT, message TEXT)
Error: unsupported statement type
FWIW, looking at vitess, it seems like it should be able to support more statement types in the table lookup (i.e. table create).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using "vitess.io/vitess/go/vt/sqlparser"
:
package main
import (
"fmt"
"vitess.io/vitess/go/vt/sqlparser"
)
// Parse takes a SQL query string and returns the parsed query statement type
// and table name(s), or an error if parsing failed.
func Parse(query string) (string, []string, error) {
p, err := sqlparser.New(sqlparser.Options{})
if err != nil {
return "", nil, fmt.Errorf("failed to create parser: %w", err)
}
stmt, err := p.Parse(query)
if err != nil {
return "", nil, fmt.Errorf("failed to parse query: %w", err)
}
var statementType string
var tables []string
switch stmt := stmt.(type) {
case *sqlparser.Select:
statementType = "SELECT"
tables = extractTables(stmt.From)
case *sqlparser.Update:
statementType = "UPDATE"
tables = extractTables(stmt.TableExprs)
case *sqlparser.Insert:
statementType = "INSERT"
tables = []string{stmt.Table.TableNameString()}
case *sqlparser.Delete:
statementType = "DELETE"
tables = extractTables(stmt.TableExprs)
case *sqlparser.CreateTable:
statementType = "CREATE TABLE"
tables = []string{stmt.Table.Name.String()}
case *sqlparser.AlterTable:
statementType = "ALTER TABLE"
tables = []string{stmt.Table.Name.String()}
case *sqlparser.DropTable:
statementType = "DROP TABLE"
for _, table := range stmt.FromTables {
tables = append(tables, table.Name.String())
}
case *sqlparser.CreateDatabase:
statementType = "CREATE DATABASE"
tables = []string{stmt.DBName.String()}
case *sqlparser.DropDatabase:
statementType = "DROP DATABASE"
tables = []string{stmt.DBName.String()}
case *sqlparser.TruncateTable:
statementType = "TRUNCATE TABLE"
tables = []string{stmt.Table.Name.String()}
default:
return "UNKNOWN", nil, fmt.Errorf("unsupported statement type")
}
return statementType, tables, nil
}
// extractTables extracts table names from a list of SQL nodes.
func extractTables(exprs sqlparser.TableExprs) []string {
var tables []string
for _, expr := range exprs {
switch tableExpr := expr.(type) {
case *sqlparser.AliasedTableExpr:
if name, ok := tableExpr.Expr.(sqlparser.TableName); ok {
tables = append(tables, name.Name.String())
}
}
}
return tables
}
func main() {
queries := []string{
"SELECT * FROM users WHERE id = 1",
"SELECT id, name FROM users WHERE id = 1",
"INSERT INTO users (id, name) VALUES (1, 'Alice')",
"UPDATE users SET name = 'Bob' WHERE id = 1",
"DELETE FROM users WHERE id = 1",
"CREATE TABLE users (id INT, name VARCHAR(100))",
"ALTER TABLE users ADD COLUMN age INT",
"DROP TABLE users",
"CREATE DATABASE test_db",
"DROP DATABASE test_db",
"TRUNCATE TABLE users",
}
for _, query := range queries {
fmt.Println("Query: ", query)
statement, tables, err := Parse(query)
if err != nil {
fmt.Printf("Error parsing query: %s\nQuery: %s\n\n", err, query)
continue
}
fmt.Printf("Statement: %s, Tables/DBs: %v\n", statement, tables)
}
}
$ go run .
Query: SELECT * FROM users WHERE id = 1
Statement: SELECT, Tables/DBs: [users]
Query: SELECT id, name FROM users WHERE id = 1
Statement: SELECT, Tables/DBs: [users]
Query: INSERT INTO users (id, name) VALUES (1, 'Alice')
Statement: INSERT, Tables/DBs: [users]
Query: UPDATE users SET name = 'Bob' WHERE id = 1
Statement: UPDATE, Tables/DBs: [users]
Query: DELETE FROM users WHERE id = 1
Statement: DELETE, Tables/DBs: [users]
Query: CREATE TABLE users (id INT, name VARCHAR(100))
Statement: CREATE TABLE, Tables/DBs: [users]
Query: ALTER TABLE users ADD COLUMN age INT
Statement: ALTER TABLE, Tables/DBs: [users]
Query: DROP TABLE users
Statement: DROP TABLE, Tables/DBs: [users]
Query: CREATE DATABASE test_db
Statement: CREATE DATABASE, Tables/DBs: [test_db]
Query: DROP DATABASE test_db
Statement: DROP DATABASE, Tables/DBs: [test_db]
Query: TRUNCATE TABLE users
Statement: TRUNCATE TABLE, Tables/DBs: [users]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to use vitess with your suggestion, looks good to me, ptal!
go.mod
Outdated
@@ -22,6 +22,7 @@ require ( | |||
github.com/hashicorp/go-version v1.7.0 | |||
github.com/pkg/errors v0.9.1 | |||
github.com/stretchr/testify v1.9.0 | |||
github.com/xwb1989/sqlparser v0.0.0-20180606152119-120387863bf2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we instead use the code this was based on: https://github.com/vitessio/vitess/tree/main/go/vt/sqlparser
It seems like this was a fork of that repository1, but it hasn't been maintained or synced since that fork.
Would it be too large of a dependency to just rely on vitess directly?
Footnotes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did think about using vitess directly, but it seemed like a lot more than what we needed. Since this is just an implementation detail, we could refactor it later if we want but I think this is the best option for right now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm concerned when looking through all the changes that have been applied to the upstream code over the past 6 years, there's a considerable amount of changes. Notably:
- There are many additional statement types that can be parsed: https://pkg.go.dev/vitess.io/vitess/go/vt/sqlparser#StatementType
- The statement type itself can print its name meaning we will not need to add an additional switch statement: https://pkg.go.dev/vitess.io/vitess/go/vt/sqlparser#StatementType.String
- The code is not owned by Google, it is licensed by a CNCF project
- Make Parser Memory Usage More Efficient vitessio/vitess#4174
- Misc fixes vitessio/vitess#2395 which addressed:
- [parser] use table_alias for ENGINE option in CREATE TABLE stmt vitessio/vitess#8307
Should we make our own fork of this pacakge?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we maybe get a better understanding of dependency size by looking at what the binary size is based on the possible dependencies?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll admit it looks like you've dug into it deeper than I did, hah. These are some great points so maybe that is the better approach. I'll try switching it to that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The binary size went from 30MB->37MB with vitess included, that's not small but I'm not personally concerned about it and it could be something we look to refactor later. wdyt?
d005d0f
to
cccbf66
Compare
{ | ||
"key": "db.query.text", | ||
"value": { | ||
"stringValue": "syntax error" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will follow up to parse error response codes from databases separately, and mark them as error spans appropriately. Don't want to grow this PR too much and this is equivalent to what we do now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like an exciting change. Mainly small comments outside of the compatibility policy issue.
Do we need to support older versions of Go? Would that break any downstream users of go.opentelemetry.io/auto
?
@@ -27,6 +27,7 @@ Alternatively, you can add support for additional or different configurations by | |||
| Environment variable | Description | Default value | | |||
|-------------------------------------|--------------------------------------------------------|---------------| | |||
| `OTEL_GO_AUTO_INCLUDE_DB_STATEMENT` | Sets whether to include SQL queries in the trace data. | | | |||
| `OTEL_GO_AUTO_INCLUDE_DB_OPERATION` | Sets whether to include SQL operation in the trace data. | | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| `OTEL_GO_AUTO_INCLUDE_DB_OPERATION` | Sets whether to include SQL operation in the trace data. | | | |
| `OTEL_GO_AUTO_PARSE_DB_STATEMENT` | Sets whether to parse the SQL statement for trace data. | | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not valid to have OTEL_GO_AUTO_INCLUDE_DB_OPERATION
without OTEL_GO_AUTO_INCLUDE_DB_STATEMENT
, right? If so we should probably point that out.
Also, should we state the exact attributes we'll add? i.e db.collection.name
db.operation.name
etc`
@@ -1,6 +1,6 @@ | |||
module go.opentelemetry.io/auto | |||
|
|||
go 1.22.7 | |||
go 1.23.2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, this might be problematic. This would violate our compatibility policy: https://github.com/open-telemetry/opentelemetry-go-instrumentation?tab=readme-ov-file#compatibility
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. this looks quite aggressive.
For reference, when I wanted to update the dependency of this repo I had to bump the go.mod to 1.22.7 which is what we require today (odigos-io/odigos@fb167f8), and we have quite a lot of other dependencies in Odigos and none of them required this relatively new version.
err := os.Setenv(IncludeDBOperationEnvVar, "true") | ||
assert.NoError(t, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
err := os.Setenv(IncludeDBOperationEnvVar, "true") | |
assert.NoError(t, err) | |
t.Setenv(IncludeDBOperationEnvVar, "true") |
} | ||
|
||
logger.Info("queryDb called") | ||
logger.Info(fmt.Sprintf("Query: %s", query)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: use the structured elements of the logger to handle this.
logger.Info(fmt.Sprintf("Query: %s", query)) | |
logger.Info("queryDB called", "query", query)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason for these changes? can you pleas add it to the PR description
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this all came in when I ran make update-licenses
(CI license check failed saying it needed updates). I agree it's weird that libbpf was also updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the libbpf script wasn't run.
Ref #743
From the semconv for db spans:
We don't have
collection
(table),namespace
, orserver.address
readily available (right now), but we can at least parse certain operations. We can also support more over time. So this gets us at least a couple useful span names besidesDB
(fallback for unsupported operations)This means we can also set
db.operation.name
with that same value when it is available.