Scrapio is a lightweight and user-friendly web crawling and scraping library. The main goal of the project is to make scraping large amounts of similar data from the web easy and user-friendly. It can be useful for a wide range of applications, such as data mining, data processing, and archiving. After some time, I am going to make it a standalone service, which will work as an API.
At the moment, Scrapio works as a library that can be used to crawl and scrape data from the web. What it can do:
- Crawl all pages on a host and return all the links.
- Scrape text, image URLs, and links from the crawled pages.
- It leaves the choice of data output (CSV, JSON, etc.) up to you; see the sketch at the end of the walkthrough.
- It's free and quite powerful.
- Written in Go and concurrent; depending on network speed, it can crawl and scrape up to 2k pages/minute.
go get github.com/koshqua/scrapio
The crawler is easy to use. You just need to specify a starting URL, and it will crawl all the URLs on the host.
// Init a new crawler and give it a start URL; it doesn't have to be the base URL of the host.
cr := &crawler.Crawler{StartURL: "https://gulfnews.com/"}
// Start crawling.
// After some time I'm going to implement more configs for this func, like max results, etc.
cr.Crawl()
// Do something with the result; it's up to you.
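For example, to take a quick look at what was crawled, you can serialize the crawler itself with the standard library. This is a minimal sketch using encoding/json; the exported fields of the Crawler struct, and therefore the exact output shape, are an assumption here:
// Minimal sketch: dump the crawler's exported fields as indented JSON.
// Requires "encoding/json", "fmt", and "log" in your imports.
// Which fields appear in the output depends on how the library defines the Crawler struct.
data, err := json.MarshalIndent(cr, "", "  ")
if err != nil {
    log.Fatalln(err)
}
fmt.Println(string(data))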
The scraper uses the data structure produced by the crawler. Before initiating a scraper, you need to create a few selectors to assign to it. Selectors are simple CSS-like selectors.
// Create the selectors you want to scrape.
h2 := scraper.NewSelector("h2", true, true, true)
img := scraper.NewSelector("img", true, true, true)
p := scraper.NewSelector("p:first-of-type", true, true, true)
// Initiate a new scraper with the given selectors.
// The scraper depends on the crawler from the previous code snippet.
// It takes the crawled pages and builds a new structure with the selectors and scrape results.
sc := scraper.InitScraper(*cr, []scraper.Selector{h2, img, p})
// And just start scraping.
err := sc.Scrap()
if err != nil {
    log.Fatalln(err)
}
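Since the output format is up to you, one straightforward option is to write the scrape results out as JSON. Below is a minimal sketch, assuming you are happy to serialize the scraper struct's exported fields directly; the exact shape of the file depends on how the library defines them:
// Minimal sketch: persist the scraper's exported fields (selectors and scrape results)
// as an indented JSON file. Requires "encoding/json" and "os" in your imports.
// The resulting field names are an assumption; they come from the library's Scraper struct.
f, err := os.Create("results.json")
if err != nil {
    log.Fatalln(err)
}
defer f.Close()
enc := json.NewEncoder(f)
enc.SetIndent("", "  ")
if err := enc.Encode(sc); err != nil {
    log.Fatalln(err)
}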