Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/expr parser #21

Merged
merged 5 commits into from
May 8, 2024
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
377 changes: 357 additions & 20 deletions Cargo.lock

Large diffs are not rendered by default.

3 changes: 3 additions & 0 deletions Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,9 @@ rust-version = "1.60"

[workspace.dependencies]
substreams-macro = { version = "0.5.13", path = "./substreams-macro" }
pest= "2.7.10"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pest= "2.7.10"
pest = "2.7.10"

pest_derive = "2.7.10"
rstest = "0.19.0"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this just for testing, if yes, should go in dev-dependencies bucket


[profile.release]
lto = true
Expand Down
3 changes: 3 additions & 0 deletions substreams/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,9 @@ prost = "0.11"
prost-types = "0.11"
substreams-macro = { workspace = true }
thiserror = "1"
pest= "2.7.10"
pest_derive = "2.7.10"
rstest = "0.19.0"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull from the workspace dependencies set

Suggested change
pest= "2.7.10"
pest_derive = "2.7.10"
rstest = "0.19.0"
pest = { workspace = true }
pest_derive = { workspace = true }
rstest = { workspace = true }


[build-dependencies]
prost-build = "0.11"
1 change: 1 addition & 0 deletions substreams/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -125,6 +125,7 @@ mod state;

pub mod key;
pub mod store;
pub mod parser;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like that it's called parser, parser doesn't say anything about what is being parsed.

Second, let's not publicly export the module, remove pub. We should only publicly expose the minimum we need for the library to be useful.


mod operation;

Expand Down
53 changes: 53 additions & 0 deletions substreams/src/parser.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
use pest::{iterators::Pair, Parser};
use pest_derive::Parser;

#[derive(Parser)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you check if it's possible to generate the code and commit it? It would be preferable leading to faster compiling for users since less time running the derive macro.

#[grammar = "parser_rule.pest"]
pub struct EParser;

pub fn evaluate_expression(keys: Vec<String>, input: &str) -> bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to receive a owned copy of a Vec<String>. We should accept variant of String like &str too. I would change to

Suggested change
pub fn evaluate_expression(keys: Vec<String>, input: &str) -> bool {
pub fn evaluate_expression<K: AsRef<str>, I: AsRef<str>>(keys: &[K], input: I) -> bool {

And this can be used now with different variations of String, &String and &str:

    evaluate_expression(&vec!["a", "b", "c"], "d");
    evaluate_expression(
        &vec!["a".to_string(), "a".to_string(), "a".to_string()],
        "d",
    );

    let keys = vec!["a".to_string(), "a".to_string(), "a".to_string()];
    let keys_ref = keys.iter().map(|x| x).collect::<Vec<&String>>();
    evaluate_expression(&keys_ref, "d");

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also find a more meaningful name for this function, it will be the only one exposed IMO so we better find it a very good clear name.

let successful_parse = EParser::parse(Rule::expression, input).unwrap();
return evaluate_rule(successful_parse.clone().into_iter().next().unwrap(), keys);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's avoid all .clone() in your code, there is a good chance that we can remove all of them.

}

fn evaluate_rule(pair: Pair<Rule>, keys: Vec<String>) -> bool {
match pair.as_rule() {
Rule::expression => {
let inner_pair = pair.into_inner().next().unwrap();
return evaluate_rule(inner_pair, keys);
}
Rule::or => {
let mut result = false;
for inner_pair in pair.into_inner() {
result = result || evaluate_rule(inner_pair, keys.clone());
}
return result;
},
Rule::and => {
let mut result = true;
for inner_pair in pair.into_inner() {
result = result && evaluate_rule(inner_pair, keys.clone());
}
return result;
},
Rule::value => {
let inner_pair = pair.into_inner().next().unwrap();
return evaluate_rule(inner_pair, keys);
}
Rule::keyterm => {
return keys.contains(&pair.as_str().to_string());
}
Rule::singleQuoteKeyTerm => {
return keys.contains(&pair.as_str().to_string().replace("'", ""));
}
Rule::doubleQuoteKeyTerm => {
return keys.contains(&pair.as_str().to_string().replace("\"", ""));
}
_ => {panic!("Unexpected rule encountered")}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to remove all the unwrap here and the panic. Transform the method to return a Result<bool, anyhow::Error>, all error code path should have a .context attached to the error so we now where it came from.






12 changes: 12 additions & 0 deletions substreams/src/parser_rule.pest
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
expression = { or }
or = { (and ~ (space? ~ ("||") ~ space? ~ and )* ) }
and = { (value ~ ((space ~ value) | (space? ~ ("&&") ~ space? ~ value ~ space?))* ) }
value = { (space? ~ singleQuoteKeyTerm ~ space?) | (space? ~ doubleQuoteKeyTerm ~ space?) | keyterm | (space? ~ "(" ~ space? ~ expression ~ space? ~ ")" ~ space?) }
keyterm = { (letters | digits | symbols)+ }
singleQuoteKeyTerm = { ("'") ~ (!"'" ~ ANY)+ ~ ("'")}
doubleQuoteKeyTerm = { ("\"") ~ (!"\"" ~ ANY)+ ~ ("\"")}

digits = _{ '0' .. '9' }
letters = _{ 'a' .. 'z' | 'A' .. 'Z' }
symbols = _{ "_" | ":" }
space = _{ (" " | "\t" | "\n" )+ }
12 changes: 12 additions & 0 deletions substreams/src/pb/mod.rs
Original file line number Diff line number Diff line change
@@ -1,2 +1,14 @@
pub mod sf {
pub mod substreams {
pub mod index {
// @@protoc_insertion_point(attribute:sf.substreams.index.v1)
pub mod v1 {
include!("sf.substreams.index.v1.rs");
// @@protoc_insertion_point(sf.substreams.index.v1)
}
}
}
}

#[path = "./sf.substreams.v1.rs"]
pub mod substreams;
8 changes: 8 additions & 0 deletions substreams/src/pb/sf.substreams.index.v1.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
// @generated
#[allow(clippy::derive_partial_eq_without_eq)]
#[derive(Clone, PartialEq, ::prost::Message)]
pub struct Keys {
#[prost(string, repeated, tag="1")]
pub keys: ::prost::alloc::vec::Vec<::prost::alloc::string::String>,
}
// @@protoc_insertion_point(module)
50 changes: 50 additions & 0 deletions substreams/tests/parser_test.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,50 @@
use rstest::rstest;
use substreams::parser::{evaluate_expression};

static TEST_KEYS: &[&str] = &["test", "test1", "test2", "test3", "test4", "test5", "test 6"];

#[rstest]
#[case(TEST_KEYS, "test", true)]
#[case(TEST_KEYS, "'test'", true)]
#[case(TEST_KEYS, "'test 6' || test7", true)]
#[case(TEST_KEYS, "'test_6' && test3", false)]
#[case(TEST_KEYS, "\"test 6\" || test7", true)]
#[case(TEST_KEYS, "\"test 6\" && test3", true)]

#[case(TEST_KEYS, "test1 || test", true)]
#[case(TEST_KEYS, "test1 || test6", true)]
#[case(TEST_KEYS, "test6 || test7", false)]

#[case(TEST_KEYS, "test1 || test || test2", true)]
#[case(TEST_KEYS, "test1 || test6 || test7", true)]
#[case(TEST_KEYS, "test6 || test7 || test8", false)]

#[case(TEST_KEYS, "test1 && test", true)]
#[case(TEST_KEYS, "test1 && test6", false)]
#[case(TEST_KEYS, "test6 && test7", false)]

#[case(TEST_KEYS, "test1 && test && test2", true)]
#[case(TEST_KEYS, "test1 && test2 && test7", false)]
#[case(TEST_KEYS, "test6 && test7 && test8", false)]

#[case(TEST_KEYS, "test1 test", true)]
#[case(TEST_KEYS, "test1 test6", false)]
#[case(TEST_KEYS, "test6 test7", false)]

#[case(TEST_KEYS, "(test1)", true)]
#[case(TEST_KEYS, "(test1 test6)", false)]

#[case(TEST_KEYS, "test1 test2 ", true)]
#[case(TEST_KEYS, "test1 && test2 ", true)]
#[case(TEST_KEYS, "test1 && test6", false)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you validate that this is parsed correctly? It parses ok, but the resulting expression should be checked.

In the Go version, I created a string representation of the tree structure, so it can be verified too.

#[case(TEST_KEYS, "(test1 || test3) && test6 ", false)]
#[case(TEST_KEYS, "(test1 || test6 || test7 ) && (test4 || test5) && test3 ", true)]

#[case(TEST_KEYS, "(test1 || test6 || test7) && (test4 || test5) && test3 ", true)]
#[case(TEST_KEYS, "(test1 && test6 && test7) || (test4 && test5) || test3 ", true)]

fn test_parse(#[case] keys: &[&str], #[case] input: &str, #[case] expected: bool) {
let keys: Vec<String> = keys.iter().map(|s| s.to_string()).collect();
assert_eq!(evaluate_expression(keys, input), expected);
}

Loading