C++ Webcrawler

My approach for this project was to first build a library responsible for the HTTP session with the server. Once I could exchange messages with the server, I built a second library that parses HTML into element classes. Using these two libraries, I built a web crawler that requests a page from a website and follows every link on the response page.
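The crawl loop described above can be sketched roughly as follows. This is a minimal illustration, not the code in webcrawler.cpp: `extractLinks` is a naive string scan standing in for the HTML element classes, and `fetch` is a hypothetical callback standing in for a request made through the HTTP library.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <unordered_set>
#include <vector>

// Naive href scan (assumption: double-quoted attribute values only).
// A stand-in for walking the anchor elements of a parsed document.
std::vector<std::string> extractLinks(const std::string& html) {
    std::vector<std::string> links;
    const std::string key = "href=\"";
    std::size_t pos = 0;
    while ((pos = html.find(key, pos)) != std::string::npos) {
        pos += key.size();
        std::size_t end = html.find('"', pos);
        if (end == std::string::npos) break;  // unterminated attribute
        links.push_back(html.substr(pos, end - pos));
        pos = end + 1;
    }
    return links;
}

// Breadth-first crawl starting at `start`. `fetch` is a hypothetical
// callback that returns the response body for a path.
std::vector<std::string> crawl(
    const std::string& start,
    const std::function<std::string(const std::string&)>& fetch) {
    assert(fetch && "fetch callback must be set");
    std::vector<std::string> order;             // pages in visit order
    std::unordered_set<std::string> seen{start};  // avoids revisiting pages
    std::queue<std::string> frontier;
    frontier.push(start);
    while (!frontier.empty()) {
        std::string path = frontier.front();
        frontier.pop();
        order.push_back(path);
        for (const std::string& link : extractLinks(fetch(path))) {
            // insert() reports whether the link was new
            if (seen.insert(link).second) frontier.push(link);
        }
    }
    return order;
}
```

The `seen` set is what keeps the crawler from looping forever when pages link back to each other.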

I used Wireshark to inspect the TLS errors I was getting because of how my HTTP requests were formatted. I also wrote a test program so that I could debug the HTML library without repeatedly pinging the server.

The main issue I ran into while implementing the HTTP library was correctly formatting the end of the message. I also struggled with reading the correct number of bytes from the socket. I fixed this by reading the HTTP header one byte at a time until the double return (the blank line that ends the header). After that, I check whether there is a Content-Length field and, if so, read that many bytes from the socket.
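A rough sketch of both ideas, terminating a request with a blank line and reading a response in two stages, might look like this. It is not the code in HTTPSSession.cpp: `ReadFn` is a hypothetical stand-in for the socket read call, and the function names are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the socket: fills buf with up to len bytes and
// returns the number of bytes read (0 or negative means closed or error).
using ReadFn = std::function<long(char*, std::size_t)>;

// Every header line ends in CRLF, and one extra CRLF ends the message head.
std::string buildGetRequest(const std::string& host, const std::string& path) {
    return "GET " + path + " HTTP/1.1\r\n"
           "Host: " + host + "\r\n"
           "\r\n";
}

// Read one byte at a time until the header terminator "\r\n\r\n" is seen.
std::string readHeader(const ReadFn& readSome) {
    std::string header;
    char c = 0;
    while (header.size() < 4 ||
           header.compare(header.size() - 4, 4, "\r\n\r\n") != 0) {
        if (readSome(&c, 1) != 1)
            throw std::runtime_error("connection closed inside header");
        header.push_back(c);
    }
    return header;
}

// Return the Content-Length value, or -1 if the field is absent.
// Field names are case-insensitive, so search a lowercased copy.
long contentLength(const std::string& header) {
    std::string lower(header);
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char ch) { return static_cast<char>(std::tolower(ch)); });
    const std::string key = "\r\ncontent-length:";
    std::size_t pos = lower.find(key);
    if (pos == std::string::npos) return -1;
    // stoul skips leading spaces and stops at the trailing CR
    return static_cast<long>(std::stoul(header.substr(pos + key.size())));
}

// Read exactly n body bytes, looping because one read may return fewer.
std::string readBody(const ReadFn& readSome, std::size_t n) {
    std::string body(n, '\0');
    std::size_t got = 0;
    while (got < n) {
        long r = readSome(&body[got], n - got);
        if (r <= 0) throw std::runtime_error("connection closed inside body");
        got += static_cast<std::size_t>(r);
    }
    assert(got == n);
    return body;
}
```

Reading the header byte by byte is slow but guarantees that no body bytes are consumed by accident, so the body read can start exactly where the header ends.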

Because Gradescope did not like my nested files last time, I have left them all in the root folder. Here is a guide to what is in each file.

Main File: webcrawler.cpp

HTTP Library: HTTP*

Main HTTP file: HTTPSSession.cpp

HTML: HTMLElement.cpp


tjswierzewski/HTTP-Webcrawler
