
Feature Request: All Rules are "Top-Level" Functions #484

Open
rneswold opened this issue Feb 3, 2025 · 11 comments

Comments

@rneswold
Contributor

rneswold commented Feb 3, 2025

This is a great project and I've used it in a handful of Rust applications. Thanks!

There is one feature the OCaml tools (ocamllex and ocamlyacc) have that I found convenient. In the code that ocamlyacc generates, every grammar rule becomes a function you can call (useful if you need to parse a subset of the grammar).

For instance, I'm working on a project that uses a string to describe data acquisition. There's a "name" field, an optional range specification, optional field names, and a field for the event on which to sample. All of that is straightforward to implement. However, we also have an API that wants just the event portion and another API that wants just the device name.

We could have a struct with a bunch of Option fields, but I'd rather be able to have Event as a parameter to a function.

If this feature would be too disruptive, I wonder how others would solve this with the current tools.

@ltratt
Member

ltratt commented Feb 3, 2025

Right now, grmtools doesn't generate functions in this way. I guess it's possible to do so and to call "into the middle of" the LR statetable. I haven't thought about that before and it might need a bit of thought about exactly what it means.

In the interim, I think there is a (horrible) hack one can do: you can duplicate the grammar (including in a build.rs file), change the %start line, and output to a different file (or files).
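For concreteness, here's a hedged sketch of what that hack might look like in a build.rs, using the grmtools quickstart-style build API. The file names (daq.l, daq.y) and the duplicated grammar daq_event.y (identical except for its %start line) are hypothetical:

    // build.rs -- sketch only; requires the cfgrammar, lrlex and lrpar crates
    use cfgrammar::yacc::YaccKind;
    use lrlex::CTLexerBuilder;

    fn main() {
        // Full grammar.
        CTLexerBuilder::new()
            .lrpar_config(|ctp| {
                ctp.yacckind(YaccKind::Grmtools)
                    .grammar_in_src_dir("daq.y")
                    .unwrap()
            })
            .lexer_in_src_dir("daq.l")
            .unwrap()
            .build()
            .unwrap();

        // Duplicated grammar whose %start is the subset rule; it shares the
        // same lexer but generates a second, independently callable parser.
        CTLexerBuilder::new()
            .lrpar_config(|ctp| {
                ctp.yacckind(YaccKind::Grmtools)
                    .grammar_in_src_dir("daq_event.y")
                    .unwrap()
            })
            .lexer_in_src_dir("daq.l")
            .unwrap()
            .build()
            .unwrap();
    }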

@ratmice
Collaborator

ratmice commented Feb 3, 2025

It comes to mind that the horrible hack seems almost certain to produce 'unused rule' and 'unused token' warnings/errors. You'll likely need to set at least warnings_are_errors, and more than likely want to set show_warnings to false entirely.

@ltratt
Member

ltratt commented Feb 3, 2025

@ratmice Definitely! It would be good to do something nicer here, though an interesting question is what "unused" means if you have multiple start rules.

@rneswold
Contributor Author

rneswold commented Feb 3, 2025

What if I created a grammar for each portion and then a grammar that called the subsets? I'd have to know where the previous parsing ended to feed the next one. It'd be nicer to have it all in one module, but this might be doable...

@rneswold
Contributor Author

rneswold commented Feb 3, 2025

It comes to mind that the horrible hack seems almost certain to produce 'unused rule' and 'unused token' warnings/errors.

I would delete the unused rules in the "horrible hack". However, I was hoping to use the same lex file, so the "unused token" warnings would be a problem.

@ratmice
Collaborator

ratmice commented Feb 3, 2025

@ratmice Definitely! It would be good to do something nicer here, though an interesting question is what "unused" means if you have multiple start rules.

Interesting question, my inclination would be to define unused as unreachable from any start rule.

I'm assuming this is considering some sort of feature that lifts the (current) restriction that there is a single start rule, and makes some sort of parser entry point for each start rule?

So for the purposes of checking unused rules, each start rule would be treated as a production of an implicit rule, as in the following.

%start start1 start2
^: start1 | start2

I don't think it would be hard to modify the unused_symbols function that does these checks to work in that way, at least.
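A minimal sketch of that reachability-based definition of "unused", over toy data structures (this is not grmtools' actual unused_symbols code): the start rules are seeded as if they were productions of an implicit rule, and anything a traversal never reaches is unused.

```rust
use std::collections::{HashMap, HashSet};

// Toy grammar representation: rule name -> rules referenced in its productions.
// A rule is "unused" if no start rule can reach it.
fn unused_rules<'a>(
    grammar: &HashMap<&'a str, Vec<&'a str>>,
    starts: &[&'a str],
) -> HashSet<&'a str> {
    // Seeding the worklist with every start rule is equivalent to the
    // implicit `^: start1 | start2` rule described above.
    let mut reachable: HashSet<&str> = HashSet::new();
    let mut stack: Vec<&str> = starts.to_vec();
    while let Some(rule) = stack.pop() {
        if reachable.insert(rule) {
            if let Some(refs) = grammar.get(rule) {
                stack.extend(refs.iter().copied());
            }
        }
    }
    grammar
        .keys()
        .copied()
        .filter(|r| !reachable.contains(r))
        .collect()
}

fn main() {
    let mut g = HashMap::new();
    g.insert("start1", vec!["device"]);
    g.insert("start2", vec!["event"]);
    g.insert("device", vec![]);
    g.insert("event", vec![]);
    g.insert("orphan", vec![]);
    // Only "orphan" is unreachable from either start rule.
    println!("{:?}", unused_rules(&g, &["start1", "start2"]));
}
```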

@rneswold
Contributor Author

rneswold commented Feb 3, 2025

So for the purposes of checking unused rules, each start rule would be treated as a production of an implicit rule, as in the following.

%start start1 start2
^: start1 | start2

I don't think it would be hard to modify the unused_symbols function that does these checks to work in that way, at least.

This is nice because I really don't need every rule to be top-level. Out of the half-dozen sections of my string, I really only need three field parsers. But it might be too complicated to have some rules be top-level callable and others purely internal.

Also, each of these start targets is probably a different type (in my use, that's definitely the case.)

@ratmice
Collaborator

ratmice commented Feb 3, 2025

Also, each of these start targets is probably a different type (in my use, that's definitely the case.)

Ahh, yeah, there are definitely complexities with this multiple-start-rules idea; that one I hadn't considered.
Another is that, IIRC, the start rule is given index 0 by default, rather than something like rules.len(), which might be a more expandable location. But I still think unreachability seems like a reasonable interpretation of "unused".

(IIRC anyway; I couldn't remember off-hand exactly where to look to verify this 0-index rule.)

@ajuvercr

ajuvercr commented Feb 6, 2025

Is there a way to import .y files in other .y files?
By 'inverting' the dependencies, it would probably be prettier than copying the file many times.

Nvm I saw #110

@ltratt
Member

ltratt commented Feb 6, 2025

Composition in a general sense is a very hard problem. This issue (relative to my memory of #110) is more limited: it's asking to subset an existing grammar. My intuition is that subsets are always OK -- we just don't happen to support taking advantage of that right now.

For example, if we generated Rust code directly, specifically one Rust function per rule, I suspect this would fall out of the hat. Doing so isn't rocket science (I think it's what e.g. LALRPOP does), but someone has to put in the hard yards.
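As a rough illustration of the "one Rust function per rule" shape (a toy hand-written recursive-descent parser, not grmtools output): when each rule is its own function, every rule is automatically an independently callable entry point. The grammar here is invented: spec -> name ':' event, with names and events being alphanumeric identifiers.

```rust
// Each grammar rule is a top-level function returning the parsed value plus
// the remaining input, so callers can parse just a subset (e.g. an event).

fn parse_name(input: &str) -> Option<(&str, &str)> {
    let end = input
        .find(|c: char| !c.is_alphanumeric())
        .unwrap_or(input.len());
    if end == 0 {
        None
    } else {
        Some((&input[..end], &input[end..]))
    }
}

fn parse_event(input: &str) -> Option<(&str, &str)> {
    parse_name(input) // same token shape in this toy grammar
}

fn parse_spec(input: &str) -> Option<(&str, &str)> {
    let (name, rest) = parse_name(input)?;
    let rest = rest.strip_prefix(':')?;
    let (event, rest) = parse_event(rest)?;
    if rest.is_empty() { Some((name, event)) } else { None }
}

fn main() {
    // Full parse and subset parse share the same rule functions.
    assert_eq!(parse_spec("dev1:clock"), Some(("dev1", "clock")));
    assert_eq!(parse_event("clock"), Some(("clock", "")));
}
```

An LR parser would need the "call into the middle of the statetable" machinery discussed above to get the same property, which is why it doesn't fall out of the hat today.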

@rneswold
Contributor Author

rneswold commented Feb 6, 2025

I think what I'll do is use an enumeration. Something like:

enum DAQSpec {
    FullSpec {
        device_name: String,
        field: String,
        range: Option<Range<usize>>,
        event: Event,
    },
    DeviceSpec(String),
    EventSpec(Event),
}

The grammar can recognize when a subset is specified and return the smaller-scoped enum values. Then I can make some simple wrapper functions:

fn parse_event(spec: String) -> Option<Event> {
    // `parse` is a placeholder for the generated parser's entry point
    if let DAQSpec::EventSpec(ev) = parse(&spec) {
        Some(ev)
    } else {
        None
    }
}
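Filled out as a compilable sketch, with a stub parse standing in for the generated parser (the @-prefix convention and the parse function here are invented purely for illustration; the real parser would come from the grammar):

```rust
use std::ops::Range;

#[derive(Debug, PartialEq)]
struct Event(String);

#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum DAQSpec {
    FullSpec {
        device_name: String,
        field: String,
        range: Option<Range<usize>>,
        event: Event,
    },
    DeviceSpec(String),
    EventSpec(Event),
}

// Stub standing in for the generated parser: treats "@name" as an event
// spec and anything else as a device spec.
fn parse(spec: &str) -> DAQSpec {
    match spec.strip_prefix('@') {
        Some(ev) => DAQSpec::EventSpec(Event(ev.to_string())),
        None => DAQSpec::DeviceSpec(spec.to_string()),
    }
}

// Wrapper that only succeeds when the input was an event spec.
fn parse_event(spec: &str) -> Option<Event> {
    if let DAQSpec::EventSpec(ev) = parse(spec) {
        Some(ev)
    } else {
        None
    }
}

fn main() {
    assert_eq!(parse_event("@clock"), Some(Event("clock".to_string())));
    assert_eq!(parse_event("dev1"), None);
}
```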

Thanks for the discussion!
