-
Notifications
You must be signed in to change notification settings - Fork 354
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
URI Parser design change proposal for performance and stability #3193
base: main
Are you sure you want to change the base?
Conversation
public ExpressionNodeKind Kind => _parser[_index].Kind; | ||
public int GetIntValue() | ||
{ | ||
return int.TryParse(_parser[_index].Range.GetValue(_parser.Source)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming this is meant to be _parser._nodes[_index]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but the parser could have an indexer as well that forwards to _parser._nodes
.
```csharp | ||
interface IQueryNodeHandler<T> | ||
{ | ||
public T HandleIntNode(QueryNode intLiteralNode); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a huge perf overhead for these methods to take in more specific types? Like, it would be great if we could make this IntQueryNode
instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, we have to be very careful about this because we will need to constantly add support for async
, unsafe
, methods that require parameterized context, allows ref struct
, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on your suggestions, I will investigate the cost of adding more specific types on top QueryNode
for the different methods, e.g. Handle(IntQueryNode)
Where IntQueryNode
could be defined as
struct IntQueryNode
{
private QueryNode _node;
public int GetIntValue() => _node.GetIntValue();
}
Maybe it could be optimized to a ref struct itself or it could reference the expression parser directly if such optimizations prove meaningful and practical.
The key changes proposed: | ||
|
||
- Avoid premature string allocations. Reference segments in the source string using `ReadOnlySpan<char>` until the user explicitly asks for a `string`. | ||
- Store parsed nodes in array of structs instead of graph of objects. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to be clear about this aspect and the terminology that we are using. By not having a tree, this is more akin to the readers that we have. This is not really "parsing" anymore. There's nothing wrong with this, I'm more having an issue with the terminology, and the terminology is important to me because (as you note later in the document), we don't differentiate between the different kinds of notes (in the same way that the current reader doesn't): the caller is expected to understand what kinds of nodes can come next and needs to be aware of this based on the context of the previous nodes.
```csharp | ||
internal struct ExpressionNode | ||
{ | ||
public ExpressionNodeKind Kind { get; internal set; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we use class
es to model the enumeration instead of an enum
, we could have a visitor on those classes which would help implement the code that generates the ref struct
nodes that could be used in the handler that I suggested below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could compare both and see what the perf overhead is for each. But I assume the enum would have less overhead, it's smaller value, which could make it easier to pack the struct more densely and we don't have to cast.
Another I opted for an enum is that we could pack additional data in the enum. For example we could have a bit flag that tells us whether the item is a binary operator, another to tell us whether the string needs escaping or whether it could be returned as is (in which case we could return a span of the string to the user without allocating), etc. If we have an int enum we could use the lower 16 bits for the value and higher 16 bits for flags (we could also go for a smaller enum type depending on how many node kinds we'll have and whether alignment would make it worthwhile). Such flags could help optimize the common case. But I haven't yet done benchmarks to prove their worth.
@@ -0,0 +1,272 @@ | |||
# OData URI Parser design change for higher performance and reliability |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was brought up in the meeting that we have the whole URI string in memory, so we should leverage that for performance gains instead of trying to overly generalize. Have you considered what it looks like for someone to compose an OData URI? For example, in the OData Client, we compose the URI as the caller tells us what data they are requesting. In that situation, we actually don't have the full URI anywhere in memory. Are we thinking we will have an entirely separate AST model for the client code to leverage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The client can either consume or produce OData URI. When it consumes OData URI, the URI is available in full, in memory. When it produces the URI, for example when translating LINQ expressions to an OData request, it generates the URI on the fly using a StringBuilder, the output of the translation will be string that's in memory, but even if it wasn't, this URI generation is not the role of the parser. The parser as described in this doc (and as implemented by the existing ODataUriParser
) takes the URI as input, it does not generate it.
Issues
This pull request fixes #xxx.
Description
Briefly describe the changes of this pull request.
Checklist (Uncheck if it is not completed)
Additional work necessary
If documentation update is needed, please add "Docs Needed" label to the issue and provide details about the required document change in the issue.