Add test cases for DB query parsing and sanitization #1923

alanwest · 2025-02-19T22:09:40Z

Towards #984

Opening this PR to start a discussion on how we want to format test cases for database query parsing and sanitization.

Each test case includes the following fields:

- A name identifying the test case.
- A query statement representing the input.
- db.system.name, indicating which dialect the test case targets. A value of other_sql indicates that the test case applies to all dialects.
- An array of valid db.query.text values.
  - For example, the specification states that IN clauses MAY be collapsed, so both IN (?) and IN (?,?,?) are valid.
- db.query.summary, the expected value of db.query.summary.
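
A minimal sketch of how an instrumentation's output could be checked against one test case in the flat layout described above. The test case itself is illustrative and not taken from the PR's file; the `query` key name and the `check_case` helper are assumptions.

```python
# Illustrative test case; the "query" key name is an assumption.
case = {
    "name": "in_clause",
    "query": "SELECT * FROM t WHERE id IN (1, 2, 3)",
    "db.system.name": "other_sql",
    # Any one of these sanitized forms is acceptable.
    "db.query.text": [
        "SELECT * FROM t WHERE id IN (?)",
        "SELECT * FROM t WHERE id IN (?, ?, ?)",
    ],
    "db.query.summary": "SELECT t",
}


def check_case(case: dict, actual_text: str, actual_summary: str) -> bool:
    """Return True if the actual values satisfy the test case.

    db.query.text passes if it equals any listed value;
    db.query.summary must match exactly.
    """
    return (
        actual_text in case["db.query.text"]
        and actual_summary == case["db.query.summary"]
    )
```

Because `db.query.text` is an array, an implementation that collapses the IN list and one that keeps a placeholder per value both pass the same case.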

For this first iteration I would like to settle on the format for presenting these test cases. In follow ups we can expand the test cases and also handle additional idiosyncrasies of other dialects.

alanwest · 2025-02-20T19:43:02Z

@trask @lmolkova I plan to attend the DB semconv meeting tomorrow so we have a chance to discuss this synchronously first.

lmolkova · 2025-03-10T02:32:38Z

docs/non-normative/db-sql-test-cases.json

@@ -0,0 +1,219 @@
+[
+    {
+        "name": "numeric_literal_integers",


would it be possible to add a small and informal description of this schema? Also, would it be possible to change it to differentiate inputs/outputs - e.g.

{ "test_name": "numeric_literal_integers", "input": { "query": "SELECT 12, -12, +12", "db.namespace": "test" }, "expected": { "span_name": "SELECT", "attributes": { "db.system.name": "other_sql", "db.query.text": "SELECT ?, ?, ?", "db.query.summary": "SELECT" } } }
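
A rough sketch of how a runner could consume the input/expected shape proposed above. The `sanitize_numeric_literals` function is a toy stand-in for a real sanitizer (it only handles optionally signed numeric literals and would mishandle expressions such as `a-1`), and `run_case` is a hypothetical helper name.

```python
import re

# Test case shaped like the proposal above.
case = {
    "test_name": "numeric_literal_integers",
    "input": {"query": "SELECT 12, -12, +12", "db.namespace": "test"},
    "expected": {
        "span_name": "SELECT",
        "attributes": {
            "db.system.name": "other_sql",
            "db.query.text": "SELECT ?, ?, ?",
            "db.query.summary": "SELECT",
        },
    },
}


def sanitize_numeric_literals(query: str) -> str:
    """Toy sanitizer: replace optionally signed numeric literals with '?'."""
    return re.sub(r"[+-]?\b\d+(?:\.\d+)?\b", "?", query)


def run_case(case: dict, sanitize) -> bool:
    """Feed the input query to the sanitizer and compare with the expectation."""
    expected_text = case["expected"]["attributes"]["db.query.text"]
    return sanitize(case["input"]["query"]) == expected_text
```

Keeping everything the instrumentation receives under `input` and everything it should emit under `expected` lets a runner like this stay generic across test cases.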

trask · 2025-03-10

> differentiate inputs/outputs

nice 👍

alanwest · 2025-03-10

Yes, I like this idea. A few things to consider, though:

db.system.name in this context is treated as an input rather than an output. Originally, I called this field dialects. The idea is that many test cases are applicable to multiple dialects. So, it would have a structure like the following:

{ "test_name": "numeric_literal_integers", "input": { "query": "SELECT 12, -12, +12", "dialects": [ "mssql", "postgres", "mysql" ] }, "expected": { "span_name": "SELECT", "attributes": { "db.system.name": "whatever the input dialect was", "db.query.text": "SELECT ?, ?, ?", "db.query.summary": "SELECT" } } }

When @trask and I discussed this, we toyed with the idea of using other_sql to indicate that a test case applies to all dialects, which simplifies things and makes the array unnecessary.

What's your idea regarding adding db.namespace as an input?
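
The other_sql convention discussed above could be sketched as a small filter that a test runner applies before executing a case. The `applies_to` helper name is hypothetical, and the dialect strings passed to it are only examples.

```python
def applies_to(case_system: str, target_system: str) -> bool:
    """Decide whether a test case should run for the system under test.

    A case marked "other_sql" applies to every dialect; otherwise the
    case's db.system.name must match the target exactly.
    """
    return case_system == "other_sql" or case_system == target_system
```

With this convention a single string field is enough; an array of dialects would only be needed for cases that apply to some dialects but not all.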

lmolkova · 2025-03-11

Having an array of dialects sounds like a good future possibility; I like the idea of using other_sql to keep things simple for now.

> What's your idea regarding adding db.namespace as an input?

I was thinking about edge cases where you'd run a query with a database as the target (e.g. list tables); then the span name would include the namespace. It can wait until we have a test case for it.

trask · 2025-03-12T21:13:27Z

docs/non-normative/database-test-cases/db-sql-test-cases.json

+            "db.query.text": [
+                "CREATE  TABLE MyTable (\n    ID NOT NULL IDENTITY(?,?) PRIMARY KEY\n)"
+            ],
+            "db.query.summary": "CREATE  TABLE MyTable"


while preserving case has benefits (some people prefer lower vs upper, and some systems might be case sensitive), do you think there's any downside to normalizing spaces?

alanwest · 2025-03-19

Right, forgot we discussed this. I normalized the whitespace in db.query.summary but left db.query.text alone. WDYT?
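
One simple way to normalize whitespace in a summary, as a sketch only; the PR does not say how the normalization is implemented, and `normalize_whitespace` is a hypothetical name.

```python
def normalize_whitespace(text: str) -> str:
    """Collapse every run of whitespace (spaces, tabs, newlines) into one space
    and strip leading and trailing whitespace. Letter case is preserved."""
    return " ".join(text.split())
```

Applied to the summary in the diff above, "CREATE  TABLE MyTable" becomes "CREATE TABLE MyTable". Note that this approach also flattens newlines, so applying it to db.query.text would change the query's layout.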

Add test cases for DB query parsing and sanitization

32a8f60

alanwest requested review from a team as code owners February 19, 2025 22:09

trask reviewed Feb 21, 2025

alanwest added 5 commits March 4, 2025 10:57

Remove collection and operation

caf7bf1

Rename file

9b00811

Dialects -> db.system.name

9505a76

summary -> db.query.summary

3a46775

sanitized -> db.query.text

02037fe

lmolkova reviewed Mar 10, 2025

alanwest added 3 commits March 10, 2025 15:09

Move to subfolder

a1ae043

Fix test case

673324f

input/expected

70469be

trask reviewed Mar 12, 2025

alanwest added 2 commits March 19, 2025 09:58

Merge remote-tracking branch 'upstream/main' into alanwest/db-test-cases

5d7a811

Normalize whitespace in db.query.summary

6674511
