
This repository contains scripts that use BeautifulSoup to scrape journal articles and subject repositories for research purposes. Some are modifications of scripts I wrote for the IASGE project: https://gitlab.com/investigating-archiving-git/journal-scraping


GenevieveMilliken/Web_Scraping_for_Research


Web Scraping for Research

This is a work-in-progress repository; scripts will be added as they are completed.

Web scraping is one tool that can be added to the scholarly workflow. This repository contains Python scripts that use BeautifulSoup to scrape a variety of open access journals and subject repositories.

Please note that for some journals, specific search terms are used to build the URL so that the outputs stay small and targeted and serve as a proof of concept. To adapt a script, include search terms specific to your research question when building the main URL. You can derive that URL by running the journal's advanced search function and copying the resulting address.
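As a rough illustration of building a search URL from your own terms, here is a minimal sketch using only the standard library. The endpoint `https://journal.example.org/search` and the parameter names `query` and `page` are placeholders, not taken from any script in this repository; substitute whatever the journal's advanced search actually produces.

```python
from urllib.parse import urlencode

# Hypothetical advanced-search endpoint; copy the real one from the journal's
# advanced search results page.
BASE_URL = "https://journal.example.org/search"


def build_search_url(terms, page=1, base_url=BASE_URL):
    """Return a search URL for the given terms and results page."""
    # The parameter names "query" and "page" are assumptions for illustration.
    params = {"query": " ".join(terms), "page": page}
    return f"{base_url}?{urlencode(params)}"


url = build_search_url(["version control", "archiving"], page=2)
print(url)
```

Keeping the terms in one place like this makes it easier to swap in a different research question without touching the rest of a script.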

When possible, JSON outputs of metadata and/or full text are stored in each directory. For journals that host articles as PDFs, secondary scripts to download those PDFs are included. In these cases, running the first script results in a JSON output of article metadata and the second script uses that JSON output to download the PDFs.
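The two-step pattern described above might look roughly like the sketch below. It is not one of the repository's scripts: the sample HTML, the class names (`article`, `title`, `author`, `pdf`), the base URL, and the function names are all hypothetical, and a real journal page will need its own selectors.

```python
import json
from pathlib import Path
from urllib.parse import urljoin
from urllib.request import urlopen

from bs4 import BeautifulSoup

# Stand-in for a fetched results page; markup and class names are hypothetical.
SAMPLE_HTML = """
<div class="article">
  <h2 class="title">Archiving Git Repositories</h2>
  <span class="author">A. Author</span>
  <a class="pdf" href="/articles/101.pdf">PDF</a>
</div>
<div class="article">
  <h2 class="title">Scholarly Code Review</h2>
  <span class="author">B. Writer</span>
  <a class="pdf" href="/articles/102.pdf">PDF</a>
</div>
"""


def extract_metadata(html, base_url):
    """Step 1: parse a results page into a list of metadata dicts."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for block in soup.find_all("div", class_="article"):
        records.append({
            "title": block.find("h2", class_="title").get_text(strip=True),
            "author": block.find("span", class_="author").get_text(strip=True),
            # Resolve relative PDF links against the journal's base URL.
            "pdf_url": urljoin(base_url, block.find("a", class_="pdf")["href"]),
        })
    return records


def download_pdfs(metadata_path, out_dir):
    """Step 2: read the JSON written in step 1 and download each PDF."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    records = json.loads(Path(metadata_path).read_text(encoding="utf-8"))
    for record in records:
        # Name each file after the last path segment of its URL.
        target = out_dir / record["pdf_url"].rsplit("/", 1)[-1]
        with urlopen(record["pdf_url"]) as response:
            target.write_bytes(response.read())


records = extract_metadata(SAMPLE_HTML, "https://journal.example.org")
metadata_json = json.dumps(records, indent=2)
print(metadata_json)
```

To run the second step, save `metadata_json` to a file such as `metadata.json` and call `download_pdfs("metadata.json", "pdfs")`. The download is left uncalled here because the example URLs do not exist.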
