New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

URI Parser design change proposal for performance and stability #3193

Draft

habbes wants to merge 5 commits into OData:main from habbes:uri-parser-rewrite-proposal

+274 −1

Contributor

habbes commented Mar 4, 2025

Issues

This pull request fixes #xxx.

Description

Briefly describe the changes of this pull request.

Checklist (Uncheck if it is not completed)

Test cases added
Build and test with one-click build and test script passed

Additional work necessary

If documentation update is needed, please add "Docs Needed" label to the issue and provide details about the required document change in the issue.

habbes added 5 commits

March 3, 2025 22:35


          Start URI parser rewrite proposal

3ff28c5


          Add notes on allocation-free parser

ea05d53


          AST generation

3b836a8


          Update title

16a5f7b


          Semantic binding

eb47505

habbes changed the title ~~URI Parser rewrite for performance and stability~~ URI Parser design change proposal for performance and stability

corranrogue9 reviewed

View reviewed changes

docs/uri-parser-rewriter-proposal.md

+                  public ExpressionNodeKind Kind => _parser[_index].Kind;
+                  public int GetIntValue()
+                  {
+                      return int.TryParse(_parser[_index].Range.GetValue(_parser.Source));

Contributor

corranrogue9 Mar 4, 2025

I'm assuming this is meant to be _parser._nodes[_index]?

Contributor Author

habbes Mar 13, 2025

Yes, but the parser could have an indexer as well that forwards to _parser._nodes.

corranrogue9 reviewed

View reviewed changes

docs/uri-parser-rewriter-proposal.md

+              ```csharp
+              interface IQueryNodeHandler<T>
+              {
+                  public T HandleIntNode(QueryNode intLiteralNode);

Contributor

corranrogue9 Mar 4, 2025

Is there a huge perf overhead for these methods to take in more specific types? Like, it would be great if we could make this IntQueryNode instead

Contributor

corranrogue9 Mar 4, 2025

Also, we have to be very careful about this because we will need to constantly add support for async, unsafe, methods that require parameterized context, allows ref struct, etc.

Contributor Author

habbes Mar 13, 2025

Based on your suggestions, I will investigate the cost of adding more specific types on top QueryNode for the different methods, e.g. Handle(IntQueryNode)

Where IntQueryNode could be defined as

struct IntQueryNode
{
     private QueryNode _node;
     public int GetIntValue() => _node.GetIntValue();
}

Maybe it could be optimized to a ref struct itself or it could reference the expression parser directly if such optimizations prove meaningful and practical.

corranrogue9 reviewed

View reviewed changes

docs/uri-parser-rewriter-proposal.md Show resolved Hide resolved

corranrogue9 reviewed

View reviewed changes

docs/uri-parser-rewriter-proposal.md

+              The key changes proposed:
+              - Avoid premature string allocations. Reference segments in the source string using `ReadOnlySpan<char>` until the user explicitly asks for a `string`.
+              - Store parsed nodes in array of structs instead of graph of objects.

Contributor

corranrogue9 Mar 4, 2025 •

edited

Loading

I want to be clear about this aspect and the terminology that we are using. By not having a tree, this is more akin to the readers that we have. This is not really "parsing" anymore. There's nothing wrong with this, I'm more having an issue with the terminology, and the terminology is important to me because (as you note later in the document), we don't differentiate between the different kinds of notes (in the same way that the current reader doesn't): the caller is expected to understand what kinds of nodes can come next and needs to be aware of this based on the context of the previous nodes.

corranrogue9 reviewed

View reviewed changes

docs/uri-parser-rewriter-proposal.md

+              ```csharp
+              internal struct ExpressionNode
+              {
+                  public ExpressionNodeKind Kind { get; internal set; }

Contributor

corranrogue9 Mar 4, 2025

If we use classes to model the enumeration instead of an enum, we could have a visitor on those classes which would help implement the code that generates the ref struct nodes that could be used in the handler that I suggested below

Contributor Author

habbes Mar 13, 2025 •

edited

Loading

We could compare both and see what the perf overhead is for each. But I assume the enum would have less overhead, it's smaller value, which could make it easier to pack the struct more densely and we don't have to cast.

Another I opted for an enum is that we could pack additional data in the enum. For example we could have a bit flag that tells us whether the item is a binary operator, another to tell us whether the string needs escaping or whether it could be returned as is (in which case we could return a span of the string to the user without allocating), etc. If we have an int enum we could use the lower 16 bits for the value and higher 16 bits for flags (we could also go for a smaller enum type depending on how many node kinds we'll have and whether alignment would make it worthwhile). Such flags could help optimize the common case. But I haven't yet done benchmarks to prove their worth.

corranrogue9 reviewed

View reviewed changes

docs/uri-parser-rewriter-proposal.md

		@@ -0,0 +1,272 @@
		# OData URI Parser design change for higher performance and reliability

Contributor

corranrogue9 Mar 4, 2025

It was brought up in the meeting that we have the whole URI string in memory, so we should leverage that for performance gains instead of trying to overly generalize. Have you considered what it looks like for someone to compose an OData URI? For example, in the OData Client, we compose the URI as the caller tells us what data they are requesting. In that situation, we actually don't have the full URI anywhere in memory. Are we thinking we will have an entirely separate AST model for the client code to leverage?

Contributor Author

habbes Mar 13, 2025 •

edited

Loading

The client can either consume or produce OData URI. When it consumes OData URI, the URI is available in full, in memory. When it produces the URI, for example when translating LINQ expressions to an OData request, it generates the URI on the fly using a StringBuilder, the output of the translation will be string that's in memory, but even if it wasn't, this URI generation is not the role of the parser. The parser as described in this doc (and as implemented by the existing ODataUriParser) takes the URI as input, it does not generate it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet