Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support SELECT/INSERT/UPDATE/DELETE db operation + span names #1253

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

damemi
Copy link
Contributor

@damemi damemi commented Nov 7, 2024

Ref #743

From the semconv for db spans:

Database spans MUST follow the overall [guidelines for span names](https://github.com/open-telemetry/opentelemetry-specification/tree/v1.37.0/specification/trace/api.md#span).

The span name SHOULD be {db.query.summary} if a summary is available.

If no summary is available, the span name SHOULD be {db.operation.name} {target} provided that a (low-cardinality) db.operation.name is available (see below for the exact definition of the [{target}](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/database/database-spans.md#target-placeholder) placeholder).

If a (low-cardinality) db.operation.name is not available, database span names SHOULD default to the [{target}](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/database/database-spans.md#target-placeholder).

If neither {db.operation.name} nor {target} are available, span name SHOULD be {db.system}.

Semantic conventions for individual database systems MAY specify different span name format.

The {target} SHOULD describe the entity that the operation is performed against and SHOULD adhere to one of the following values, provided they are accessible:

db.collection.name SHOULD be used for data manipulation operations or operations on a database collection.
db.namespace SHOULD be used for operations on a specific database namespace.
server.address:server.port SHOULD be used for other operations not targeting any specific database(s) or collection(s)
If a corresponding {target} value is not available for a specific operation, the instrumentation SHOULD omit the {target}. For example, for an operation describing SQL query on an anonymous table like SELECT * FROM (SELECT * FROM table) t, span name should be SELECT.

We don't have collection (table), namespace, or server.address readily available (right now), but we can at least parse certain operations. We can also support more over time. So this gets us at least a couple useful span names besides DB (fallback for unsupported operations)

This means we can also set db.operation.name with that same value when it is available.

@damemi damemi requested a review from a team as a code owner November 7, 2024 19:37
@damemi damemi force-pushed the sql-semconv branch 3 times, most recently from 8f0e456 to 5340675 Compare November 7, 2024 19:53
@@ -8,6 +8,8 @@ import (
"os"
"strconv"

sql "github.com/xwb1989/sqlparser"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked through a couple sql parsing libraries, this isn't the most maintained... https://github.com/krasun/gosqlparser seems good but for some reason I don't think it supports * in queries (??). For what we're doing now I think it's fine but this could use another look some day

@damemi damemi force-pushed the sql-semconv branch 2 times, most recently from 89e64c7 to 7116a06 Compare November 8, 2024 19:49
@damemi damemi force-pushed the sql-semconv branch 3 times, most recently from c809878 to 97ccfaf Compare November 20, 2024 16:23

if operation != "" {
span.Attributes().PutStr(string(semconv.DBOperationNameKey), operation)
span.SetName(operation)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we try to build the {db.query.summary} or {db.operation.name} {target} here given we have already parsed the query?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was originally doing db.query.summary, but we don't have any of the values for {target} (db.collection, db.namespace, or server address). So, it recommends just using the operation:

If a corresponding {target} value is not available for a specific operation, the instrumentation SHOULD omit the {target}. For example, for an operation describing SQL query on an anonymous table like SELECT * FROM (SELECT * FROM table) t, span name should be SELECT.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The table might be defined in the query statement though, right? The query parsing library should be able to get this, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that I could find, unfortunately :\ The first library I tried (https://github.com/krasun/gosqlparser) does have that, which is how I was building it at first. But that library breaks when it tries to parse a * character (like SELECT * FROM). Which makes no sense to me, but that's my understanding from trying to debug it and read that library's code. I'm open to add a TODO here for this though because maybe I'm just missing something

Copy link
Contributor

@MrAlias MrAlias Nov 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should work, right?

package main

import (
	"fmt"

	"github.com/xwb1989/sqlparser"
)

// Parse takes a SQL query string and returns the parsed query statement type
// and table name, or an error if parsing failed.
func Parse(query string) (string, string, error) {
	stmt, err := sqlparser.Parse(query)
	if err != nil {
		return "", "", fmt.Errorf("failed to parse query: %w", err)
	}

	switch stmt := stmt.(type) {
	case *sqlparser.Select:
		return "SELECT", getTableName(stmt.From), nil
	case *sqlparser.Update:
		return "UPDATE", getTableName(stmt.TableExprs), nil
	case *sqlparser.Insert:
		return "INSERT", stmt.Table.Name.String(), nil
	case *sqlparser.Delete:
		return "DELETE", getTableName(stmt.TableExprs), nil
	default:
		return "UNKNOWN", "", fmt.Errorf("unsupported statement type")
	}
}

// getTableName extracts the table name from a SQL node.
func getTableName(node sqlparser.SQLNode) string {
	switch tableExpr := node.(type) {
	case sqlparser.TableName:
		return tableExpr.Name.String()
	case sqlparser.TableExprs:
		for _, expr := range tableExpr {
			if tableName, ok := expr.(*sqlparser.AliasedTableExpr); ok {
				if name, ok := tableName.Expr.(sqlparser.TableName); ok {
					return name.Name.String()
				}
			}
		}
	}
	return ""
}

func main() {
	queries := []string{
		"SELECT * FROM users WHERE id = 1",
		"SELECT id, name FROM users WHERE id = 1",
		"INSERT INTO orders (id, amount) VALUES (1, 100)",
		"UPDATE products SET price = 19.99 WHERE id = 10",
		"DELETE FROM sessions WHERE expired = true",
		"CREATE TABLE logs (id INT, message TEXT)",
	}
	for _, query := range queries {
		fmt.Println("Query: ", query)
		statement, table, err := Parse(query)
		if err != nil {
			fmt.Println("Error:", err)
			continue
		}
		fmt.Printf("Statement: %s, Table: %s\n", statement, table)
	}
}
$ go run .
Query:  SELECT * FROM users WHERE id = 1
Statement: SELECT, Table: users
Query:  SELECT id, name FROM users WHERE id = 1
Statement: SELECT, Table: users
Query:  INSERT INTO orders (id, amount) VALUES (1, 100)
Statement: INSERT, Table: orders
Query:  UPDATE products SET price = 19.99 WHERE id = 10
Statement: UPDATE, Table: products
Query:  DELETE FROM sessions WHERE expired = true
Statement: DELETE, Table: sessions
Query:  CREATE TABLE logs (id INT, message TEXT)
Error: unsupported statement type

FWIW, looking at vitess, it seems like it should be able to support more statement types in the table lookup (i.e. table create).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using "vitess.io/vitess/go/vt/sqlparser":

package main

import (
	"fmt"

	"vitess.io/vitess/go/vt/sqlparser"
)

// Parse takes a SQL query string and returns the parsed query statement type
// and table name(s), or an error if parsing failed.
func Parse(query string) (string, []string, error) {
	p, err := sqlparser.New(sqlparser.Options{})
	if err != nil {
		return "", nil, fmt.Errorf("failed to create parser: %w", err)
	}

	stmt, err := p.Parse(query)
	if err != nil {
		return "", nil, fmt.Errorf("failed to parse query: %w", err)
	}

	var statementType string
	var tables []string

	switch stmt := stmt.(type) {
	case *sqlparser.Select:
		statementType = "SELECT"
		tables = extractTables(stmt.From)
	case *sqlparser.Update:
		statementType = "UPDATE"
		tables = extractTables(stmt.TableExprs)
	case *sqlparser.Insert:
		statementType = "INSERT"
		tables = []string{stmt.Table.TableNameString()}
	case *sqlparser.Delete:
		statementType = "DELETE"
		tables = extractTables(stmt.TableExprs)
	case *sqlparser.CreateTable:
		statementType = "CREATE TABLE"
		tables = []string{stmt.Table.Name.String()}
	case *sqlparser.AlterTable:
		statementType = "ALTER TABLE"
		tables = []string{stmt.Table.Name.String()}
	case *sqlparser.DropTable:
		statementType = "DROP TABLE"
		for _, table := range stmt.FromTables {
			tables = append(tables, table.Name.String())
		}
	case *sqlparser.CreateDatabase:
		statementType = "CREATE DATABASE"
		tables = []string{stmt.DBName.String()}
	case *sqlparser.DropDatabase:
		statementType = "DROP DATABASE"
		tables = []string{stmt.DBName.String()}
	case *sqlparser.TruncateTable:
		statementType = "TRUNCATE TABLE"
		tables = []string{stmt.Table.Name.String()}
	default:
		return "UNKNOWN", nil, fmt.Errorf("unsupported statement type")
	}

	return statementType, tables, nil
}

// extractTables extracts table names from a list of SQL nodes.
func extractTables(exprs sqlparser.TableExprs) []string {
	var tables []string
	for _, expr := range exprs {
		switch tableExpr := expr.(type) {
		case *sqlparser.AliasedTableExpr:
			if name, ok := tableExpr.Expr.(sqlparser.TableName); ok {
				tables = append(tables, name.Name.String())
			}
		}
	}
	return tables
}

func main() {
	queries := []string{
		"SELECT * FROM users WHERE id = 1",
		"SELECT id, name FROM users WHERE id = 1",
		"INSERT INTO users (id, name) VALUES (1, 'Alice')",
		"UPDATE users SET name = 'Bob' WHERE id = 1",
		"DELETE FROM users WHERE id = 1",
		"CREATE TABLE users (id INT, name VARCHAR(100))",
		"ALTER TABLE users ADD COLUMN age INT",
		"DROP TABLE users",
		"CREATE DATABASE test_db",
		"DROP DATABASE test_db",
		"TRUNCATE TABLE users",
	}

	for _, query := range queries {
		fmt.Println("Query: ", query)
		statement, tables, err := Parse(query)
		if err != nil {
			fmt.Printf("Error parsing query: %s\nQuery: %s\n\n", err, query)
			continue
		}
		fmt.Printf("Statement: %s, Tables/DBs: %v\n", statement, tables)
	}
}
$ go run .
Query:  SELECT * FROM users WHERE id = 1
Statement: SELECT, Tables/DBs: [users]
Query:  SELECT id, name FROM users WHERE id = 1
Statement: SELECT, Tables/DBs: [users]
Query:  INSERT INTO users (id, name) VALUES (1, 'Alice')
Statement: INSERT, Tables/DBs: [users]
Query:  UPDATE users SET name = 'Bob' WHERE id = 1
Statement: UPDATE, Tables/DBs: [users]
Query:  DELETE FROM users WHERE id = 1
Statement: DELETE, Tables/DBs: [users]
Query:  CREATE TABLE users (id INT, name VARCHAR(100))
Statement: CREATE TABLE, Tables/DBs: [users]
Query:  ALTER TABLE users ADD COLUMN age INT
Statement: ALTER TABLE, Tables/DBs: [users]
Query:  DROP TABLE users
Statement: DROP TABLE, Tables/DBs: [users]
Query:  CREATE DATABASE test_db
Statement: CREATE DATABASE, Tables/DBs: [test_db]
Query:  DROP DATABASE test_db
Statement: DROP DATABASE, Tables/DBs: [test_db]
Query:  TRUNCATE TABLE users
Statement: TRUNCATE TABLE, Tables/DBs: [users]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to use vitess with your suggestion, looks good to me, ptal!

internal/pkg/instrumentation/bpf/database/sql/probe.go Outdated Show resolved Hide resolved
go.mod Outdated
@@ -22,6 +22,7 @@ require (
github.com/hashicorp/go-version v1.7.0
github.com/pkg/errors v0.9.1
github.com/stretchr/testify v1.9.0
github.com/xwb1989/sqlparser v0.0.0-20180606152119-120387863bf2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we instead use the code this was based on: https://github.com/vitessio/vitess/tree/main/go/vt/sqlparser

It seems like this was a fork of that repository1, but it hasn't been maintained or synced since that fork.

Would it be too large of a dependency to just rely on vitess directly?

Footnotes

  1. https://github.com/xwb1989/sqlparser?tab=readme-ov-file#notice

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did think about using vitess directly, but it seemed like a lot more than what we needed. Since this is just an implementation detail, we could refactor it later if we want but I think this is the best option for right now

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned when looking through all the changes that have been applied to the upstream code over the past 6 years, there's a considerable amount of changes. Notably:

Should we make our own fork of this pacakge?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we maybe get a better understanding of dependency size by looking at what the binary size is based on the possible dependencies?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll admit it looks like you've dug into it deeper than I did, hah. These are some great points so maybe that is the better approach. I'll try switching it to that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The binary size went from 30MB->37MB with vitess included, that's not small but I'm not personally concerned about it and it could be something we look to refactor later. wdyt?

@damemi damemi force-pushed the sql-semconv branch 2 times, most recently from d005d0f to cccbf66 Compare December 2, 2024 21:12
{
"key": "db.query.text",
"value": {
"stringValue": "syntax error"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will follow up to parse error response codes from databases separately, and mark them as error spans appropriately. Don't want to grow this PR too much and this is equivalent to what we do now

Copy link
Contributor

@MrAlias MrAlias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like an exciting change. Mainly small comments outside of the compatibility policy issue.

Do we need to support older versions of Go? Would that break any downstream users of go.opentelemetry.io/auto?

@@ -27,6 +27,7 @@ Alternatively, you can add support for additional or different configurations by
| Environment variable | Description | Default value |
|-------------------------------------|--------------------------------------------------------|---------------|
| `OTEL_GO_AUTO_INCLUDE_DB_STATEMENT` | Sets whether to include SQL queries in the trace data. | |
| `OTEL_GO_AUTO_INCLUDE_DB_OPERATION` | Sets whether to include SQL operation in the trace data. | |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| `OTEL_GO_AUTO_INCLUDE_DB_OPERATION` | Sets whether to include SQL operation in the trace data. | |
| `OTEL_GO_AUTO_PARSE_DB_STATEMENT` | Sets whether to parse the SQL statement for trace data. | |

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not valid to have OTEL_GO_AUTO_INCLUDE_DB_OPERATION without OTEL_GO_AUTO_INCLUDE_DB_STATEMENT, right? If so we should probably point that out.
Also, should we state the exact attributes we'll add? i.e db.collection.name db.operation.name etc`

@@ -1,6 +1,6 @@
module go.opentelemetry.io/auto

go 1.22.7
go 1.23.2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this might be problematic. This would violate our compatibility policy: https://github.com/open-telemetry/opentelemetry-go-instrumentation?tab=readme-ov-file#compatibility

Copy link
Contributor

@RonFed RonFed Dec 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. this looks quite aggressive.
For reference, when I wanted to update the dependency of this repo I had to bump the go.mod to 1.22.7 which is what we require today (odigos-io/odigos@fb167f8), and we have quite a lot of other dependencies in Odigos and none of them required this relatively new version.

Comment on lines +74 to +75
err := os.Setenv(IncludeDBOperationEnvVar, "true")
assert.NoError(t, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
err := os.Setenv(IncludeDBOperationEnvVar, "true")
assert.NoError(t, err)
t.Setenv(IncludeDBOperationEnvVar, "true")

}

logger.Info("queryDb called")
logger.Info(fmt.Sprintf("Query: %s", query))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: use the structured elements of the logger to handle this.

Suggested change
logger.Info(fmt.Sprintf("Query: %s", query))
logger.Info("queryDB called", "query", query))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for these changes? can you pleas add it to the PR description

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this all came in when I ran make update-licenses (CI license check failed saying it needed updates). I agree it's weird that libbpf was also updated

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the libbpf script wasn't run.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

3 participants