🎉 Welcome to Pixabay Scraper! A versatile and user-friendly tool designed to effortlessly download high-quality photos and videos from Pixabay. 🎉
This program provides a menu-driven interface for choosing between scraping photos or videos from Pixabay. It combines the Pixabay API with Selenium browser automation to download high-quality media.
- ⌨️ Menu-Driven Interface: Select between photos and videos at the start of the program.
- ⬇️ High-Quality Downloads:
  - For videos, the scraper selects the best available resolution, excluding low-quality "tiny" options.
  - For photos, it targets `largeImageURL`.
- ♻️ Progress Handling: Automatically saves progress, allowing you to resume scraping sessions without losing your place.
- 📜 Detailed Logging: Utilizes a custom logger to provide comprehensive debug and activity logs, ensuring transparency and easy troubleshooting.
- 🔧 Highly Configurable: Easily adjust settings via a `config.json` file and environment variables, tailoring the scraper to your specific needs.
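The quality rules in the feature list can be sketched as follows. This is a minimal illustration rather than the scraper's actual code: it assumes the video renditions arrive as the `videos` object of a Pixabay API hit (keys such as `large`, `medium`, `small`, `tiny`, each with `url`, `width`, and `height`), and the helper names `pick_best_video` and `pick_photo_url` are hypothetical.

```python
from typing import Optional


def pick_best_video(videos: dict) -> Optional[dict]:
    """Return the highest-resolution rendition, skipping "tiny" and empty URLs.

    Hypothetical helper: assumes `videos` maps a quality name to a dict
    with "url", "width", and "height" keys.
    """
    candidates = [
        {"quality": name, **info}
        for name, info in videos.items()
        if name != "tiny" and info.get("url")  # a rendition may be missing
    ]
    if not candidates:
        return None
    # Rank the remaining renditions by pixel count.
    return max(candidates, key=lambda v: v.get("width", 0) * v.get("height", 0))


def pick_photo_url(hit: dict) -> Optional[str]:
    """Return the photo URL the README says the scraper targets."""
    return hit.get("largeImageURL")


# Made-up sample data for illustration only.
sample_videos = {
    "large": {"url": "", "width": 0, "height": 0},
    "medium": {"url": "https://example.com/medium.mp4", "width": 1920, "height": 1080},
    "small": {"url": "https://example.com/small.mp4", "width": 1280, "height": 720},
    "tiny": {"url": "https://example.com/tiny.mp4", "width": 640, "height": 360},
}

best = pick_best_video(sample_videos)
print(best["quality"], best["url"])
```

Ranking by pixel count instead of by key name keeps the choice sensible when a rendition has an empty URL.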
Before you begin, ensure you have the following:
- 🐍 Python 3.x: Make sure Python 3 or higher is installed.
- 🦊 Firefox Browser: The scraper relies on Firefox for browser automation.
- 🌐 GeckoDriver: WebDriver Manager automatically handles GeckoDriver installation.
- 📦 Required Python Packages:
  - Selenium
  - webdriver-manager
  - requests
  - tqdm
  - python-dotenv
  - ultraprint
  - ultraconfiguration
Install the dependencies using pip:
pip install -r requirements.txt
💡 Note: Ensure you have a valid `.env` file with your Pixabay API key.
Follow these steps to set up the Pixabay Scraper:
- 📁 Configuration:
  - Create a `config.json` file to customize the scraper's behavior. This file allows you to specify settings such as:
    - `target_downloads`: The number of media items to download.
    - `firefox_binary`: The path to your Firefox binary.
    - `download_format`: The file format for downloads.
    - `download_delay`: The delay between each download request.
  - Example `config.json`: `{ "target_downloads": 100, "firefox_binary": "/usr/bin/firefox", "download_format": "application/mp4", "download_delay": 2 }`
- 🔑 API Key Setup:
  - Obtain a Pixabay API key from the Pixabay website.
  - Create a `.env` file in the root directory of the project and add your API key: `API_KEY=your_pixabay_api_key`
- 📂 Folder Structure:
  - Video downloads will be stored in the `data/videos/video_files` directory.
  - Photo downloads will be stored in the `data/images/photo_files` directory.
  - Progress and metadata are stored in corresponding `progress.json` and `metadata.json` files within their respective directories.
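To make the setup steps concrete, here is a rough sketch of how the `config.json` settings and the `API_KEY` variable could be combined at startup. It uses only the standard library and is not the project's own loader (the README lists `ultraconfiguration` and `python-dotenv` for that). The function name `load_settings` and the `DEFAULTS` fallback are assumptions, with values copied from the example configuration. In a real run, `load_dotenv()` from `python-dotenv` would typically copy the `.env` entries into the environment first.

```python
import json
import os
from pathlib import Path

# Hypothetical fallback values, copied from the example config.json.
DEFAULTS = {
    "target_downloads": 100,
    "firefox_binary": "/usr/bin/firefox",
    "download_format": "application/mp4",
    "download_delay": 2,
}


def load_settings(config_path="config.json", env=None):
    """Merge config.json over the defaults and attach the API key.

    `env` defaults to os.environ; passing a plain dict makes testing easy.
    """
    env = os.environ if env is None else env
    settings = dict(DEFAULTS)

    path = Path(config_path)
    if path.exists():
        # Values from the file override the defaults key by key.
        settings.update(json.loads(path.read_text(encoding="utf-8")))

    api_key = env.get("API_KEY")
    if not api_key:
        raise RuntimeError("API_KEY is not set; add it to your .env file")
    settings["api_key"] = api_key
    return settings
```

Failing early on a missing key surfaces a configuration mistake before any browser session starts.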
- 💻 Open your terminal and navigate to the project directory.
- 🏃 Run the scraper: `python scrape.py`
- ☝️ When prompted, select the content type by typing `photos` or `videos` and press Enter.
- 🔒 For videos, if login is required, follow the on-screen instructions and press Enter once you've logged in.
- 🔎 The scraper will automatically scroll through the page, download media, and update its progress.
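The content-type prompt in the steps above maps naturally onto the two public Pixabay API endpoints, one for images and one for videos. The sketch below shows that mapping under assumptions: the function names are hypothetical, the `per_page` default is arbitrary, and the README does not say which requests the scraper actually issues.

```python
from urllib.parse import urlencode

# Public Pixabay API endpoints for image and video search.
ENDPOINTS = {
    "photos": "https://pixabay.com/api/",
    "videos": "https://pixabay.com/api/videos/",
}


def normalize_choice(raw: str) -> str:
    """Validate the menu answer, tolerating stray whitespace and case."""
    choice = raw.strip().lower()
    if choice not in ENDPOINTS:
        raise ValueError(f"expected 'photos' or 'videos', got {raw!r}")
    return choice


def build_search_url(choice: str, api_key: str, page: int = 1, per_page: int = 50) -> str:
    """Build a paginated search URL for the chosen content type."""
    params = {"key": api_key, "page": page, "per_page": per_page}
    return f"{ENDPOINTS[choice]}?{urlencode(params)}"


# Example: what a "videos" selection would request for the first page.
print(build_search_url(normalize_choice(" Videos "), "your_pixabay_api_key"))
```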
The scraper provides detailed logs to the console, allowing you to monitor its progress and troubleshoot any issues. You can customize the logging level in the configuration section.
- 🚫 Firefox Binary Not Found:
  - Ensure the correct path to your Firefox binary is specified in the `config.json` file.
- ⚠️ API Issues:
  - Verify that your API key is correct and that the Pixabay API is reachable.
- ⌛ Incomplete Downloads:
  - The scraper saves progress in `progress.json`, allowing you to resume interrupted sessions.
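For readers curious how resuming from `progress.json` might work, the following is one possible sketch. The file layout (a `downloaded_ids` list) and the function names are assumptions made for illustration; the scraper's real progress format may differ.

```python
import json
import os
from pathlib import Path


def load_progress(path):
    """Read the progress file, or start fresh if it does not exist yet."""
    p = Path(path)
    if not p.exists():
        return {"downloaded_ids": []}  # hypothetical schema
    return json.loads(p.read_text(encoding="utf-8"))


def save_progress(path, progress):
    """Write to a temporary file, then swap it in with os.replace.

    An interrupted run then leaves either the old file or the new one,
    never a half-written progress.json.
    """
    p = Path(path)
    p.parent.mkdir(parents=True, exist_ok=True)
    tmp = p.with_suffix(".tmp")
    tmp.write_text(json.dumps(progress, indent=2), encoding="utf-8")
    os.replace(tmp, p)


def mark_downloaded(path, media_id):
    """Record an item; return False if it was already downloaded."""
    progress = load_progress(path)
    if media_id in progress["downloaded_ids"]:
        return False
    progress["downloaded_ids"].append(media_id)
    save_progress(path, progress)
    return True
```

On restart, a loop over search results could call `mark_downloaded` after each successful download and skip any item for which it returns `False`.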
This tool is provided "as is", without any warranty. Use at your own risk.
😃 Happy Scraping! 😃