Avoid reading duplicate files
akariv committed Feb 20, 2025
1 parent bec67f0 commit 971fc03
Showing 1 changed file with 7 additions and 1 deletion.
8 changes: 7 additions & 1 deletion odds/backend/scanner/website/website_scanner.py
@@ -107,7 +107,13 @@ async def scrape(self, url: str) -> list[str]:
 # check content type to ensure it's html:
 content_type = r.headers.get('content-type', '').lower()
 if content_type.startswith('text/html'):
-content = r.text
+content_ = r.text
+content_hash = sha256(content_.encode()).hexdigest()
+content_hash_file = self.CACHE / f'{content_hash}.touch'
+if not content_hash_file.exists():
+content = content_
+content_hash_file.open('w').write(content_hash).close()
+content = content_
 final_url = str(r.url)
 with open(cache_file, 'w') as file:
 json.dump({
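
The added lines hash the response body and use a marker file named after the digest to detect a page that was already scraped, possibly under a different URL. A minimal standalone sketch of that idea follows. It is not the repository's code: `claim_content` and `cache_dir` are hypothetical names, and `cache_dir` stands in for `self.CACHE`, which is assumed here to be a `pathlib.Path` directory. One difference is deliberate: in Python a text file's `write()` returns the number of characters written, so chaining `.close()` onto it as the diff does would raise `AttributeError`; the sketch uses `Path.write_text`, which opens, writes, and closes the file in one call.

```python
import tempfile
from hashlib import sha256
from pathlib import Path
from typing import Optional


def claim_content(text: str, cache_dir: Path) -> Optional[str]:
    """Return text the first time it is seen, or None for a repeat.

    Hypothetical helper: a marker file named after the SHA-256 digest
    of the body records which contents were already processed.
    """
    digest = sha256(text.encode()).hexdigest()
    marker = cache_dir / f'{digest}.touch'
    if marker.exists():
        # Same body was already scraped, possibly under another URL.
        return None
    cache_dir.mkdir(parents=True, exist_ok=True)
    # write_text opens, writes, and closes the file in one call.
    marker.write_text(digest)
    return text


# Usage: the second identical body is reported as a duplicate.
with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / 'hashes'
    first = claim_content('<html>same page</html>', cache)
    second = claim_content('<html>same page</html>', cache)
    other = claim_content('<html>different page</html>', cache)
```

Returning `None` for a repeat lets the caller skip further processing of that page, which seems to be the intent of the commit title; whether the scanner should still record the URL for a duplicate body is a separate design choice not settled by the diff.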