-
Is this the right way to deal with userinfo backtracking?
I added the `userinfo_at` rule so that I can attach a control-class failure method to it that notices when backtracking happens. See the code snippet below (which I extracted from a working program, but didn't try to compile standalone). That is a lot of boilerplate for a relatively simple task :( I am not sure if I am using PEGTL the way it is intended to be used. 🤷‍♂️
Replies: 4 comments
-
Funnily enough, it works without changes to the grammar with a control rule on `one<'@'>`, but that is scary :)
-
First of all: Thank you for the feedback.

There is probably a lot that can be said about how to handle backtracking, but in the end there is no single right way, no one-size-fits-all solution. Depending on the grammar, you may or may not have "easy" hacks available to deal with it. I'll therefore focus on your specific problem, which is parsing URIs. Since the parser cannot know in advance whether the `userinfo` that just matched will be followed by an `@`, I'd suggest simply deferring the decision. I've modified the grammar in `contrib/uri.hpp`; you can use it without the need for a control class, like this:

```cpp
#include <tao/pegtl.hpp>
#include <tao/pegtl/contrib/uri.hpp>

#include <iostream>
#include <string>

namespace pegtl = tao::TAOCPP_PEGTL_NAMESPACE;

// Plain data holder; the constructor parses the given string.
struct URI
{
   std::string scheme;
   std::string authority;
   std::string userinfo;
   std::string host;
   std::string port;
   std::string path;
   std::string query;
   std::string fragment;

   explicit URI( const std::string& uri );
};

namespace uri
{
   // Generic action: store the matched text in the selected member of URI.
   template< std::string URI::*Field >
   struct bind
   {
      template< typename Input >
      static void apply( const Input& in, URI& uri )
      {
         uri.*Field = in.string();
      }
   };

   // clang-format off
   template< typename Rule > struct action : pegtl::nothing< Rule > {};

   template<> struct action< pegtl::uri::scheme > : bind< &URI::scheme > {};
   template<> struct action< pegtl::uri::authority > : bind< &URI::authority > {};
   // userinfo: see below
   template<> struct action< pegtl::uri::host > : bind< &URI::host > {};
   template<> struct action< pegtl::uri::port > : bind< &URI::port > {};
   template<> struct action< pegtl::uri::path_noscheme > : bind< &URI::path > {};
   template<> struct action< pegtl::uri::path_rootless > : bind< &URI::path > {};
   template<> struct action< pegtl::uri::path_absolute > : bind< &URI::path > {};
   template<> struct action< pegtl::uri::path_abempty > : bind< &URI::path > {};
   template<> struct action< pegtl::uri::query > : bind< &URI::query > {};
   template<> struct action< pegtl::uri::fragment > : bind< &URI::fragment > {};
   // clang-format on

   // opt_userinfo only succeeds with a non-empty match when the userinfo
   // is followed by '@', so no backtracking has to be undone here.
   template<>
   struct action< pegtl::uri::opt_userinfo >
   {
      template< typename Input >
      static void apply( const Input& in, URI& uri )
      {
         if( !in.empty() ) {
            std::cout << "USER: " << in.string() << "\n";
            // The match includes the trailing '@'; drop it.
            uri.userinfo = std::string( in.begin(), in.size() - 1 );
         }
      }
   };

   struct grammar : pegtl::must< pegtl::uri::URI >
   {
   };
}

URI::URI( const std::string& uri )
{
   pegtl::memory_input<> input( uri.data(), uri.size(), "uri" );
   pegtl::parse< uri::grammar, uri::action >( input, *this );
}

int main( int argc, char** argv )
{
   for( int i = 1; i < argc; ++i ) {
      std::cout << "Parsing " << argv[ i ] << std::endl;
      const URI uri( argv[ i ] );
      std::cout << "URI.scheme: " << uri.scheme << std::endl;
      std::cout << "URI.authority: " << uri.authority << std::endl;
      std::cout << "URI.userinfo: " << uri.userinfo << std::endl;
      std::cout << "URI.host: " << uri.host << std::endl;
      std::cout << "URI.port: " << uri.port << std::endl;
      std::cout << "URI.path: " << uri.path << std::endl;
      std::cout << "URI.query: " << uri.query << std::endl;
      std::cout << "URI.fragment: " << uri.fragment << std::endl;
   }
   return 0;
}
```

Does this work for you?
-
Clearly I still have a lot to learn about PEGTL. From an aesthetic point of view, the `- 1` feels a bit hackish, but I do know how to avoid that (with one more state and action), and I learned a new trick (`bind`) that avoids a lot of boilerplate, so I'm good for now. Thanks a lot!
-
Related to #46