-
Notifications
You must be signed in to change notification settings - Fork 211
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add test cases for DB query parsing and sanitization #1923
base: main
Are you sure you want to change the base?
Conversation
@@ -0,0 +1,219 @@ | |||
[ | |||
{ | |||
"name": "numeric_literal_integers", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it be possible to add a small and informal description of this schema? Also, would it be possible to change it to differentiate inputs/outputs - e.g.
{
"test_name": "numeric_literal_integers",
"input": {
"query": "SELECT 12, -12, +12",
"db.namespace": "test"
},
"expected": {
"span_name": "SELECT",
"attributes": {
"db.system.name": "other_sql",
"db.query.text": "SELECT ?, ?, ?",
"db.query.summary": "SELECT"
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
differentiate inputs/outputs
nice 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I like this idea. Few things to consider though
db.system.name
in this context is treated as an input rather than an output. Originally, I called this fielddialects
. The idea is that many test cases are applicable to multiple dialects. So, it would have a structure like following:
{
"test_name": "numeric_literal_integers",
"input": {
"query": "SELECT 12, -12, +12",
"dialects": [
"mssql",
"postgres",
"mysql"
]
},
"expected": {
"span_name": "SELECT",
"attributes": {
"db.system.name": "whatever the input dialect was",
"db.query.text": "SELECT ?, ?, ?",
"db.query.summary": "SELECT"
}
}
When @trask and I discussed this, we toyed with this idea of using other_sql
to indicate that a test case was applicable to all dialects simplifying things so the array was unnecessary.
- What your idea regarding adding
db.namespace
as an input?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
having an array of dialects sounds like a good future possibility, I like the idea of using other_sql to keep things simple now.
What your idea regarding adding db.namespace as an input?
I was thinking about edge cases when you'd run a query with database a target (e.g. list tables), then the span name would include the namespace. It can wait until we have a test-case for it.
"db.query.text": [ | ||
"CREATE TABLE MyTable (\n ID NOT NULL IDENTITY(?,?) PRIMARY KEY\n)" | ||
], | ||
"db.query.summary": "CREATE TABLE MyTable" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while preserving case has benefits (some people prefer lower vs upper, and some systems might be case sensitive), do you think there's any downside to normalizing spaces?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, forgot we discussed this. I normalized the whitespace in db.query.summary
but left db.query.text
alone. WDYT?
Towards #984
Opening this PR to start a discussion for how we want to format test cases for database query parsing and sanitization.
Each test case includes the following fields
name
identifying the test casequery
statement representing the inputdb.system.name
indicates which dialect that the test case targets. A value ofother_sql
indicates it is applicable to all dialects.db.query.text
values.IN
clauses MAY be collapsed. So, bothIN (?)
andIN (?,?,?)
are valid.db.query.summary
represents whatdb.query.summary
should equalFor this first iteration I would like to settle on the format for presenting these test cases. In follow ups we can expand the test cases and also handle additional idiosyncrasies of other dialects.