Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add test cases for DB query parsing and sanitization #1923

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

alanwest
Copy link
Member

@alanwest alanwest commented Feb 19, 2025

Towards #984

Opening this PR to start a discussion for how we want to format test cases for database query parsing and sanitization.

Each test case includes the following fields

  • A name identifying the test case
  • A query statement representing the input
  • db.system.name indicates which dialect that the test case targets. A value of other_sql indicates it is applicable to all dialects.
  • An array of valid db.query.text values.
    • For example, the specification states that IN clauses MAY be collapsed. So, both IN (?) and IN (?,?,?) are valid.
  • db.query.summary represents what db.query.summary should equal

For this first iteration I would like to settle on the format for presenting these test cases. In follow ups we can expand the test cases and also handle additional idiosyncrasies of other dialects.

@alanwest alanwest requested review from a team as code owners February 19, 2025 22:09
@alanwest
Copy link
Member Author

@trask @lmolkova I plan to attend the DB semconv meeting tomorrow so we have a chance to discuss this synchronously first.

@@ -0,0 +1,219 @@
[
{
"name": "numeric_literal_integers",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be possible to add a small and informal description of this schema? Also, would it be possible to change it to differentiate inputs/outputs - e.g.

  {
    "test_name": "numeric_literal_integers",
    "input": { 
        "query": "SELECT 12, -12, +12",
        "db.namespace": "test"
     },
     "expected": {
       "span_name": "SELECT",
       "attributes": {
          "db.system.name": "other_sql",
          "db.query.text": "SELECT ?, ?, ?",
          "db.query.summary": "SELECT"
       }
  }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

differentiate inputs/outputs

nice 👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I like this idea. Few things to consider though

  1. db.system.name in this context is treated as an input rather than an output. Originally, I called this field dialects. The idea is that many test cases are applicable to multiple dialects. So, it would have a structure like following:
{
    "test_name": "numeric_literal_integers",
    "input": { 
        "query": "SELECT 12, -12, +12",
        "dialects": [
            "mssql",
            "postgres",
            "mysql"
        ]
     },
     "expected": {
       "span_name": "SELECT",
       "attributes": {
          "db.system.name": "whatever the input dialect was",
          "db.query.text": "SELECT ?, ?, ?",
          "db.query.summary": "SELECT"
       }
  }

When @trask and I discussed this, we toyed with this idea of using other_sql to indicate that a test case was applicable to all dialects simplifying things so the array was unnecessary.

  1. What your idea regarding adding db.namespace as an input?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

having an array of dialects sounds like a good future possibility, I like the idea of using other_sql to keep things simple now.

What your idea regarding adding db.namespace as an input?

I was thinking about edge cases when you'd run a query with database a target (e.g. list tables), then the span name would include the namespace. It can wait until we have a test-case for it.

"db.query.text": [
"CREATE TABLE MyTable (\n ID NOT NULL IDENTITY(?,?) PRIMARY KEY\n)"
],
"db.query.summary": "CREATE TABLE MyTable"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while preserving case has benefits (some people prefer lower vs upper, and some systems might be case sensitive), do you think there's any downside to normalizing spaces?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, forgot we discussed this. I normalized the whitespace in db.query.summary but left db.query.text alone. WDYT?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Awaiting SIG approval
Development

Successfully merging this pull request may close these issues.

3 participants