Produce per-scraper-version, per-website parsing success statistics #21

Closed
jayaddison opened this issue Jul 23, 2020 · 1 comment
Labels
enhancement New feature or request

Comments

@jayaddison

Is your feature request related to a problem? Please describe.
The crawler and the underlying recipe-scrapers codebase will evolve over time and, as with any software project, bugs may be introduced or fixed along the way.

In addition, the content of recipe websites may change too, as sites reformat their content or rebrand their look-and-feel.

It would be useful to continuously track the performance of scraper versions against real recipe website content.

Describe the solution you'd like
It should be possible to record historical statistics regarding the success/failure rate of recipe content crawling.

It should be possible to break this down by crawler version, by recipe-scrapers version, by website and also by time interval.

This data should be exposed via the diagnostics service and made available via the corresponding diagnostics component of the frontend application.
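
As a rough illustration of the breakdown described above, here is a minimal sketch that groups crawl outcomes by crawler version, recipe-scrapers version, website and day, then computes a success rate per group. The `CrawlResult` record, its field names and the `success_rates` helper are all hypothetical; nothing here reflects how the crawler or the diagnostics service actually stores its data.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CrawlResult:
    # Hypothetical record of a single crawl attempt; field names are assumptions.
    crawler_version: str
    scraper_version: str
    domain: str
    crawled_at: datetime
    success: bool


def success_rates(results):
    """Return the success rate per (crawler version, scraper version, domain, day)."""
    counts = defaultdict(lambda: [0, 0])  # key -> [successes, total attempts]
    for r in results:
        key = (r.crawler_version, r.scraper_version, r.domain, r.crawled_at.date())
        counts[key][0] += int(r.success)
        counts[key][1] += 1
    return {key: ok / total for key, (ok, total) in counts.items()}
```

A daily bucket is used here only as an example of a time interval; a coarser or finer interval would just change how the key is derived from `crawled_at`.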

Describe alternatives you've considered
Real-time alerting on crawler failures (per-domain and overall) could also be beneficial, but is a slightly different use case and can be considered separately.

@jayaddison

Resolving in favour of openculinary/tardir#1.
