A flexible web scraper with a user-friendly GUI, designed to fetch data based on user-provided CSS selectors from any website.
This web scraper is developed in Python, leveraging the Scrapy framework for data extraction and tkinter for the GUI.
- scrapy
- pandas
- urllib
- tkinter
- openpyxl
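Install the third-party modules with `pip install scrapy pandas openpyxl`. urllib is part of the Python standard library, and tkinter ships with most Python installations, so neither needs to be installed through pip.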
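Before launching the GUI, it can help to confirm that every module in the list above is importable. The snippet below is a minimal sketch and not part of the scraper itself; the `missing_modules` helper and the `REQUIRED_MODULES` constant are hypothetical names introduced here for illustration.

```python
from importlib.util import find_spec

# Module names taken from the dependency list above.
REQUIRED_MODULES = ["scrapy", "pandas", "urllib", "tkinter", "openpyxl"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are available.")
```

If tkinter is reported as missing on Linux, it usually has to be installed through the system package manager rather than pip.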
- User-friendly graphical interface.
- Customizable scraping using CSS selectors.
- Supports scraping various data types based on user input.
- Outputs data in `.txt` or `.xlsx` format.
- Automatically names output files based on the domain being scraped.
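As an illustration of domain-based file naming, the sketch below derives a file name from a URL with `urllib.parse`. The exact naming scheme the scraper uses is not documented here, so the `output_filename` helper, the stripping of a leading `www.`, the underscore-separated format, and the `output` fallback are all assumptions.

```python
from urllib.parse import urlparse


def output_filename(url, extension):
    """Build an output file name such as 'example_com.txt' from a URL."""
    # hostname is None when the URL has no scheme or network location.
    host = urlparse(url).hostname or "output"
    # Assumption: a leading "www." is dropped so the name stays short.
    if host.startswith("www."):
        host = host[len("www."):]
    # Assumption: dots in the domain become underscores.
    return host.replace(".", "_") + "." + extension.lstrip(".")


if __name__ == "__main__":
    print(output_filename("https://www.example.com/products?page=2", "txt"))
```

Under these assumptions, `https://www.example.com/products?page=2` with the `txt` extension maps to `example_com.txt`.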
- Clone this repository.
- Navigate to the directory containing the scraper script.
- Install the required Python modules.
- Run the script to open the GUI.
- Enter the target website's URL.
- Provide a descriptive name for the data you're extracting.
- Specify the CSS selector for the data you wish to extract.
- Choose the desired output format (either `.txt` or `.xlsx`).
- Click "Start Scraping" and wait for the process to finish.
- Check the script's directory for the output file.
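The steps above roughly correspond to two operations: applying a CSS selector to a page, and writing the matches to the chosen format. The sketch below shows one way this could look, using Scrapy's `Selector` and pandas' `DataFrame.to_excel`. It is not the scraper's actual code; `extract_items` and `save_items` are hypothetical helpers, and writing the data name as the first line of the `.txt` file is an assumption.

```python
from pathlib import Path


def extract_items(html, css_selector):
    """Return every match of css_selector in html as a list of strings."""
    # Imported here so save_items can be tried without Scrapy installed.
    from scrapy.selector import Selector

    return Selector(text=html).css(css_selector).getall()


def save_items(items, data_name, path):
    """Write items to path as .txt (one per line) or .xlsx (one column)."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".txt":
        # Assumption: the descriptive data name is written as a header line.
        lines = [data_name, *items]
        path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    elif suffix == ".xlsx":
        # pandas relies on openpyxl to write .xlsx files.
        import pandas as pd

        pd.DataFrame({data_name: items}).to_excel(path, index=False)
    else:
        raise ValueError(f"Unsupported output format: {suffix!r}")
    return path
```

For example, a selector such as `li.title::text` passed to `extract_items` returns the text of each matching element, and the resulting list can be handed to `save_items` together with the data name and an output path ending in `.txt` or `.xlsx`.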
Feel free to fork this project and enhance it. Pull requests are welcome. For major changes, please open an issue first to discuss what you'd like to change.
Thanks to the Scrapy and tkinter developers for their excellent tools.