Web Scraping Library with Proxy Rotation #176

Open
armanist opened this issue Nov 5, 2024 · 1 comment

armanist commented Nov 5, 2024

No description provided.

armanist commented Nov 5, 2024

Develop a streamlined web scraping library with the following key functionality:

- Proxy Rotation: Automate proxy usage to avoid rate limits and IP blocks.
- Data Extraction: Allow customizable data extraction patterns (e.g., CSS selectors, XPath) for broad compatibility across sites.
- HTTP Requests: Use the existing HttpClient library to manage requests and responses.

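To make the "customizable data extraction patterns" idea concrete, here is a minimal sketch in Python using only the standard library. It is an illustration, not a proposed API: the `extract` function, the selector mapping, and the sample page are all hypothetical. It also assumes well-formed markup and relies on the limited XPath subset that `xml.etree.ElementTree` supports. Real-world HTML and CSS selectors would need a dedicated parser.

```python
import xml.etree.ElementTree as ET


def extract(markup: str, selectors: dict[str, str]) -> dict[str, list[str]]:
    """Apply each user-defined XPath selector and collect the matched text.

    Hypothetical helper: `selectors` maps a field name to an XPath
    expression from the subset that ElementTree understands.
    """
    root = ET.fromstring(markup.strip())  # assumes well-formed markup
    results: dict[str, list[str]] = {}
    for field, xpath in selectors.items():
        results[field] = [
            "".join(node.itertext()).strip() for node in root.findall(xpath)
        ]
    return results


# Made-up sample page, for illustration only.
PAGE = """
<html><body>
  <h1 class="title">Example product</h1>
  <ul>
    <li class="price">19.99</li>
    <li class="price">24.50</li>
  </ul>
</body></html>
"""

# Selectors are plain data, so callers can swap them per site or page type.
SELECTORS = {
    "title": ".//h1[@class='title']",
    "prices": ".//li[@class='price']",
}

data = extract(PAGE, SELECTORS)
print(data)
```

Keeping selectors as data rather than code is one way to get the modularity the issue asks for: a different page structure only needs a different mapping.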

Requirements:

- Proxy Rotation:
  - Design a ProxyManager component that rotates proxies based on predefined rules.
  - Allow users to set proxies manually or load from a list.
- Data Extraction:
  - Include support for flexible, user-defined selectors.
  - Allow modular selectors for various page structures and content types.
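
As a starting point for discussion, the ProxyManager requirement could be sketched roughly as follows in Python. Only the component name comes from this issue. The method names (`add`, `load`, `mark_failed`, `next`), the two rotation rules, and the proxy URLs are assumptions made for illustration. Wiring the chosen proxy into HttpClient is left out because its API is not described here.

```python
import random


class ProxyManager:
    """Hypothetical sketch: hands out proxies round-robin or at random."""

    def __init__(self, proxies=None, strategy="round_robin"):
        if strategy not in ("round_robin", "random"):
            raise ValueError(f"unknown strategy: {strategy}")
        self.strategy = strategy
        self._proxies = []    # insertion-ordered, no duplicates
        self._failed = set()  # proxies excluded from rotation
        self._index = 0       # round-robin call counter
        for proxy in proxies or []:
            self.add(proxy)

    def add(self, proxy):
        """Register a single proxy manually, ignoring duplicates."""
        if proxy not in self._proxies:
            self._proxies.append(proxy)

    def load(self, lines):
        """Load proxies from any iterable of lines (a list or an open file),
        skipping blank lines and lines starting with '#'."""
        for line in lines:
            line = line.strip()
            if line and not line.startswith("#"):
                self.add(line)

    def mark_failed(self, proxy):
        """Exclude a proxy, e.g. after it was blocked or timed out."""
        self._failed.add(proxy)

    def next(self):
        """Return the next usable proxy according to the chosen rule."""
        available = [p for p in self._proxies if p not in self._failed]
        if not available:
            raise RuntimeError("no usable proxies left")
        if self.strategy == "random":
            return random.choice(available)
        proxy = available[self._index % len(available)]
        self._index += 1
        return proxy


# Placeholder proxy URLs, for illustration only.
manager = ProxyManager(["http://proxy1.example:8080"])
manager.load(["http://proxy2.example:8080", "", "# comment", "http://proxy3.example:8080"])

rotation = [manager.next() for _ in range(4)]
manager.mark_failed("http://proxy2.example:8080")
after_failure = [manager.next() for _ in range(4)]
print(rotation, after_failure)
```

A caller would ask for `manager.next()` before each request and call `mark_failed` when a proxy gets blocked. Other "predefined rules" (weighting, per-domain cooldowns) could be added as further strategies.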
