Kawai-Senpai/Pixabay-Fusion-Scraper

Pixabay Scraper 📷 📹

🎉 Welcome to Pixabay Scraper! A versatile and user-friendly tool designed to effortlessly download high-quality photos and videos from Pixabay. 🎉


👀 Overview

This program provides a menu-driven interface that lets you choose between scraping photos or videos from Pixabay. It combines the Pixabay API with Selenium browser automation to download media at the highest quality available.


✨ Features

  • ⌨️ Menu-Driven Interface: Select between photos and videos at the start of the program.
  • ⬇️ High-Quality Downloads:
    • For videos, the scraper intelligently selects the best available resolution, excluding low-quality "tiny" options.
    • For photos, it targets largeImageURL.
  • ♻️ Progress Handling: Automatically saves progress, allowing you to resume scraping sessions without losing your place.
  • 📜 Detailed Logging: Utilizes a custom logger to provide comprehensive debug and activity logs, ensuring transparency and easy troubleshooting.
  • 🔧 Highly Configurable: Easily adjust settings via a config.json file and environment variables, tailoring the scraper to your specific needs.
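
The quality rules above can be sketched roughly as follows. This is a minimal illustration, not the scraper's actual code: it assumes each video hit carries a `videos` dictionary keyed by rendition name (`large`, `medium`, `small`, `tiny`), as Pixabay video API responses do, and it assumes that "best available" means the first non-empty rendition in that order. The helper names `pick_video_url` and `pick_photo_url` are hypothetical.

```python
from typing import Optional

# Rendition names as they appear in Pixabay video API responses,
# ordered from highest to lowest quality. "tiny" is deliberately left out.
PREFERRED_RENDITIONS = ("large", "medium", "small")


def pick_video_url(hit: dict) -> Optional[str]:
    """Return the URL of the best non-tiny rendition, or None if there is none."""
    videos = hit.get("videos", {})
    for name in PREFERRED_RENDITIONS:
        url = videos.get(name, {}).get("url")
        if url:  # a rendition may be listed with an empty URL
            return url
    return None


def pick_photo_url(hit: dict) -> Optional[str]:
    """Return the largeImageURL field of a photo hit, if present."""
    return hit.get("largeImageURL") or None
```

A hit that only offers a `tiny` rendition yields `None` here, so the caller can skip it instead of saving a low-quality file.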

⚙️ Prerequisites

Before you begin, ensure you have the following:

  • 🐍 Python 3.x: Make sure a Python 3 interpreter is installed.

  • 🦊 Firefox Browser: The scraper relies on Firefox for browser automation.

  • 🌐 GeckoDriver: WebDriver Manager automatically handles GeckoDriver installation.

  • 📦 Required Python Packages:

    • Selenium
    • webdriver-manager
    • requests
    • tqdm
    • python-dotenv
    • ultraprint
    • ultraconfiguration

    Install the dependencies using pip:

    pip install -r requirements.txt

    💡 Note: Ensure you have a valid .env file with your Pixabay API key.


🔨 Setup

Follow these steps to set up the Pixabay Scraper:

  1. 📁 Configuration:

    • Create a config.json file to customize the scraper's behavior. This file allows you to specify settings such as:
      • target_downloads: The number of media items to download.
      • firefox_binary: The path to your Firefox binary.
      • download_format: The file format for downloads.
      • download_delay: The delay between each download request.

    Example config.json:

    {
        "target_downloads": 100,
        "firefox_binary": "/usr/bin/firefox",
        "download_format": "application/mp4",
        "download_delay": 2
    }
  2. 🔑 API Key Setup:

    • Obtain a Pixabay API key from the Pixabay website.
    • Create a .env file in the root directory of the project and add your API key:
    API_KEY=your_pixabay_api_key
  3. 📂 Folder Structure:

    • Video downloads will be stored in the data/videos/video_files directory.
    • Photo downloads will be stored in the data/images/photo_files directory.
    • Progress and metadata are stored in corresponding progress.json and metadata.json files within their respective directories.
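
As a rough sketch of how these settings might be read at startup: the snippet below merges `config.json` over fallback values and reads `API_KEY` from the environment. The fallback values simply mirror the example config above and are not the project's documented defaults. The function names `load_config` and `get_api_key` are hypothetical. In a real run, `load_dotenv()` from python-dotenv would typically be called first to copy the `.env` entries into the environment; it is left out here so the sketch needs only the standard library.

```python
import json
import os
from pathlib import Path

# Fallback values copied from the example config.json; these are assumptions,
# not the project's documented defaults.
DEFAULTS = {
    "target_downloads": 100,
    "firefox_binary": "/usr/bin/firefox",
    "download_format": "application/mp4",
    "download_delay": 2,
}


def load_config(path: str = "config.json") -> dict:
    """Merge config.json (if present) over the fallback values."""
    config = dict(DEFAULTS)
    file = Path(path)
    if file.exists():
        config.update(json.loads(file.read_text(encoding="utf-8")))
    return config


def get_api_key() -> str:
    """Read API_KEY from the environment and fail early if it is missing."""
    key = os.environ.get("API_KEY", "").strip()
    if not key:
        raise RuntimeError("API_KEY is not set; add it to your .env file")
    return key
```

Failing early on a missing key keeps the error close to its cause, rather than surfacing later as a rejected API request.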

🚀 Usage

  1. 💻 Open your terminal and navigate to the project directory.

  2. 🏃 Run the scraper:

    python scrape.py
  3. ☝️ When prompted, select the content type by typing photos or videos and press Enter.

  4. 🔒 For videos, if login is required, follow the on-screen instructions and press Enter once you've logged in.

  5. 🔎 The scraper will automatically scroll through the page, download media, and update its progress.
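
To illustrate how the `photos` / `videos` choice from step 3 could map onto the Pixabay API, here is a small sketch that builds a search URL for the selected content type. The two endpoints and the `key`, `q`, `page`, and `per_page` parameters are part of the public Pixabay API; the function name `build_search_url` and the idea that the scraper passes a search term are assumptions made for this example.

```python
from urllib.parse import urlencode

# Public Pixabay API endpoints for the two content types.
ENDPOINTS = {
    "photos": "https://pixabay.com/api/",
    "videos": "https://pixabay.com/api/videos/",
}


def build_search_url(choice: str, api_key: str, query: str,
                     page: int = 1, per_page: int = 50) -> str:
    """Validate the menu choice and return the matching API search URL."""
    kind = choice.strip().lower()
    if kind not in ENDPOINTS:
        raise ValueError("Please type 'photos' or 'videos'")
    params = {"key": api_key, "q": query, "page": page, "per_page": per_page}
    return f"{ENDPOINTS[kind]}?{urlencode(params)}"
```

Normalizing the input with `strip().lower()` means that entries such as `Videos` or ` photos ` are accepted at the prompt.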


📋 Logging

The scraper writes detailed logs to the console so you can monitor its progress and troubleshoot issues. You can adjust the logging level through the scraper's configuration.
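
For readers who want to see what a configurable console logger looks like, here is a minimal sketch. It uses Python's standard `logging` module as a stand-in, not the project's custom ultraprint-based logger, and the function name `make_logger` and the idea of passing the level as a string (for example from a config value) are assumptions.

```python
import logging

# Level names accepted by this sketch; anything else falls back to INFO.
LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
}


def make_logger(level_name: str = "INFO") -> logging.Logger:
    """Return a console logger set to the requested level."""
    logger = logging.getLogger("pixabay_scraper")
    logger.setLevel(LEVELS.get(level_name.upper(), logging.INFO))
    if not logger.handlers:  # avoid duplicate output on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

Calling `make_logger("debug").debug("starting scroll")` would print a timestamped line to the console, while the default `INFO` level would suppress it.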


🔧 Troubleshooting

  • 🚫 Firefox Binary Not Found:
    • Ensure the correct path to your Firefox binary is specified in the config.json file.
  • ⚠️ API Issues:
    • Verify that your API key is correct and that the Pixabay API is reachable.
  • Incomplete Downloads:
    • The scraper saves progress in progress.json, allowing you to resume interrupted sessions.
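
The resume behaviour can be pictured with the following sketch. The real layout of `progress.json` is not documented here, so the `downloaded_ids` key and the helper names `load_progress` and `save_progress` are purely illustrative. The sketch writes to a temporary file and then swaps it into place, so an interruption mid-write does not leave a truncated progress file.

```python
import json
import os
from pathlib import Path


def load_progress(path: str) -> set:
    """Return the set of already-downloaded IDs, or an empty set on first run."""
    file = Path(path)
    if not file.exists():
        return set()
    data = json.loads(file.read_text(encoding="utf-8"))
    return set(data.get("downloaded_ids", []))  # hypothetical key name


def save_progress(path: str, downloaded_ids: set) -> None:
    """Write progress to a temp file, then atomically replace the real one."""
    file = Path(path)
    file.parent.mkdir(parents=True, exist_ok=True)
    tmp = file.with_suffix(".tmp")
    tmp.write_text(
        json.dumps({"downloaded_ids": sorted(downloaded_ids)}),
        encoding="utf-8",
    )
    os.replace(tmp, file)
```

On restart, a scraper built this way would call `load_progress` once and skip any item whose ID is already in the returned set.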

📄 License

This tool is provided "as is", without any warranty. Use at your own risk.


😃 Happy Scraping! 😃