feat: William: Google image scraper #312

Willi8910 · 2025-03-19T10:11:04Z

Hi, I just completed the task.

Approach
My approach is to use Selenium, an external dependency, to scrape the information. In short, it is web automation that opens a real web browser and captures the information from the rendered page. One challenge is Google's CAPTCHA, which limits access from bots and automation, but with customized browser settings for the automated session, Google treats it as a normal user.
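As a rough illustration of the approach above, here is a minimal sketch of opening Chrome through the selenium-webdriver gem with a few custom settings. The flag list, the method names (`search_url`, `build_driver`, `fetch_page_source`) and the search URL shape are assumptions for illustration; the settings actually used in this PR may differ, and no flag guarantees that Google will skip the CAPTCHA.

```ruby
require 'uri'

# Hypothetical flags: this is an assumed list, not the PR's actual settings.
CHROME_ARGUMENTS = [
  # Turns off the Blink feature that marks the session as automated
  # (the one behind navigator.webdriver).
  '--disable-blink-features=AutomationControlled',
  '--window-size=1366,768',
  '--lang=en-US'
].freeze

# Builds the Google search URL for a query string.
def search_url(query)
  "https://www.google.com/search?#{URI.encode_www_form(q: query)}"
end

# Opens a real Chrome session. selenium-webdriver is loaded lazily so the
# helpers above stay usable without the gem installed.
def build_driver
  require 'selenium-webdriver'
  options = Selenium::WebDriver::Chrome::Options.new
  CHROME_ARGUMENTS.each { |arg| options.add_argument(arg) }
  Selenium::WebDriver.for(:chrome, options: options)
end

# Loads the results page and returns its HTML; the browser is always closed.
def fetch_page_source(query)
  driver = build_driver
  driver.get(search_url(query))
  driver.page_source
ensure
  driver&.quit
end
```

Calling `fetch_page_source('Van Gogh Painting')` needs Chrome and the selenium-webdriver gem installed locally.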

Setup
Since the task has to be done in this repo and there is no requirement to use a framework, I wrote it in plain Ruby. Other dependencies are loaded with require.

I also separated the execution function from the main scraper logic to make the code more maintainable and easier to read.
The entry point is start_scraper.rb, which requires a query argument. Here is an example call:

ruby start_scraper.rb query="Van Gogh Painting"
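For readers wondering how a `query=...` argument like the one above can be read, here is a minimal sketch. The helper name `parse_query` and the error message are hypothetical, and the parsing in start_scraper.rb may be done differently.

```ruby
# Extracts the value of a key=value argument such as query="Van Gogh Painting".
# The shell strips the quotes, so Ruby receives the single argument
# 'query=Van Gogh Painting'.
def parse_query(argv)
  pair = argv.find { |arg| arg.start_with?('query=') }
  # Split only on the first '=' so the query itself may contain '='.
  value = pair&.split('=', 2)&.last.to_s.strip
  raise ArgumentError, 'usage: ruby start_scraper.rb query="<search terms>"' if value.empty?

  value
end
```

In an entry-point script this would be called as `parse_query(ARGV)`; a missing or empty query raises an ArgumentError instead of starting the browser.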

Main Logic
The requirements mention that we can scrape image results other than paintings. Those image results appear in the 'Image' field box, which is somewhat different from the 'artwork' box, so I split the logic into separate functions that scrape each box on its own. If a page has both, we fetch both.

Results go into the result directory, inside a subdirectory named after the search query.
There are three files in this directory: the expected array, the page source, and a screenshot. The page source and screenshot are the same as in the example. For the expected array, I put image results and artwork results into separate JSON fields.
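To make the output layout described above concrete, here is a minimal sketch of writing the expected array with separate fields. The file name `expected-array.json`, the field names `artworks` and `images`, the directory-naming rule and both helper names are assumptions; the PR's real names may differ.

```ruby
require 'json'
require 'fileutils'

# Turns a query into a directory-safe name: lowercases it, replaces each run
# of other characters with '_', and trims leading or trailing underscores.
def directory_name(query)
  query.downcase.gsub(/[^a-z0-9]+/, '_').gsub(/\A_+|_+\z/, '')
end

# Writes artwork and image results into separate JSON fields under
# <root>/<query directory>/ and returns the path of the written file.
def write_expected_array(query, artworks:, images:, root: 'result')
  dir = File.join(root, directory_name(query))
  FileUtils.mkdir_p(dir)
  path = File.join(dir, 'expected-array.json') # hypothetical file name
  File.write(path, JSON.pretty_generate('artworks' => artworks, 'images' => images))
  path
end
```

Keeping the two result types in separate fields lets a consumer read only the box it cares about, and an empty array signals that the page had no such box.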

Testing
I use RSpec for testing. The Google scraper test is an integration test: I saw that your example result test is also an integration test that fetches real data from the website and compares the results against it, so I followed that convention.
I also added a small test for the execution function to increase coverage.

Willi8910 added 3 commits March 19, 2025 17:46

feat: implement main function

4cf496a

feat: Example Result 1

d7195b7

feat: Add another example result

994f3c8
