x.com (twitter) media scraper

Given a user ID, this program scrapes statuses that contain media files and downloads those files. Currently only images are supported. It uses Selenium, so no API knowledge is required, but it may break in the future if the site's markup changes.

Build

poetry install

Usage

On the initial run, cache login information:

x_media_scraper --cache-directory=cache login

In the Selenium window, log in to the website, then return to the terminal and press Enter.

You should now be able to use the scrape command, for example:

x_media_scraper --cache-directory=cache scrape --user=TWITTER_USER_ID --output-directory=out

selenium.common.exceptions.TimeoutException

At some point you will run into Elmo's notorious rate limiter. The website simply stops returning any meaningful data, and you get the above exception. In that case, run the application again and it will pick up where it left off. To force already-downloaded items to be downloaded again, delete the file cache/visited.sqlite3.
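
Re-running by hand gets tedious on large accounts, so the restart can be automated. The following is a minimal sketch, not part of this project: the helper name run_until_success, the attempt count, and the 15-minute delay are all illustrative assumptions, and it assumes the scraper exits with a non-zero status when the TimeoutException goes uncaught.

```python
import subprocess
import time
from typing import Sequence


def run_until_success(
    command: Sequence[str],
    max_attempts: int = 5,
    delay_seconds: float = 900.0,  # assumed wait; the real rate-limit window is unknown
) -> int:
    """Re-run `command` until it exits with status 0 or attempts run out.

    Returns the number of attempts used; raises RuntimeError if every
    attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        # check=False: a non-zero exit is handled below instead of raising.
        result = subprocess.run(command, check=False)
        if result.returncode == 0:
            return attempt
        if attempt < max_attempts:
            # Give the rate limiter time to reset before trying again.
            time.sleep(delay_seconds)
    raise RuntimeError(f"command still failing after {max_attempts} attempts")


# Hypothetical usage, mirroring the scrape command shown earlier:
# run_until_success([
#     "x_media_scraper", "--cache-directory=cache", "scrape",
#     "--user=TWITTER_USER_ID", "--output-directory=out",
# ])
```

Because progress is kept in the cache directory, each retry should continue from the last visited item rather than start over, as long as the same --cache-directory is passed every time.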

rufiorogue/xdotcom-media-scraper
