Daily updates from leaked data search engines, aggregators and similar services.

notdls/known-breaches

Known Breaches

A compilation of breach information gathered from data aggregators, breach lookup services and similar sources. The repository also ships with a basic client-side HTML/JS search (now using sql.js) for the datasets. It can be hosted on any web server or run locally; alternatively, you can search the data online here.

Indexed Services

Service Name Breach Count Total Records Automatic Updates
BreachDirectory 4,416 27,859,679,523
Dehashed 1,071 16,133,955,377
Hashmob 2,664 4,733,006,982
HaveIBeenPwned 870 14,661,189,352
Leak-Lookup 4,425 27,958,155,583
LeakCheck.io 1,208 6,329,956,229
Leaked.Domains 307 10,390,618,831
ScatteredSecrets 4,669 Unavailable
BreachAware 3,685 6,117,426,271
BreachForums_Official_Index 1,006 16,041,441,468
BreachForums_Unofficial_Index 184 1,335,823,055
BreachNet.pw 662 3,220,636,003
Cit0day 23,562 1,215,545
Citadel.pw 475 Unavailable
databases.today 1,256 Unavailable
DataViper.io 8,225 14,699,175,144
DeepSearch 111 2,292,279,815
Hacked-Emails 16,497 10,767,621,128
HackNotice.com 50,967 11,194,593,862
Hashes.org 2,508 3,259,811,660
Keeper 2,939 Unavailable
LeakBase.pw 655 4,335,625,510
LeakCheck.net 780 Unavailable
leakfind 660 Unavailable
RaidForums 505 10,586,929,316
Siphon 765 Unavailable
Snusbase 673 Unavailable
vigilante-pw 6,413 6,263,886,497
WeLeakInfo_1 10,369 Unavailable
WeLeakInfo_2 945 Unavailable
xam 308 5,789,597

You can find the datasets in datasets/. Each file there contains data obtained from an individual provider, with the exception of combined.json, which is a compilation of all the data.
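As a quick way to inspect one of these files, the following minimal sketch loads a dataset and reports its top-level shape. It deliberately assumes nothing about the schema beyond the file being valid JSON; the helper names load_dataset and describe are illustrative and not part of this repository.

```python
import json
from pathlib import Path


def load_dataset(path):
    """Parse one dataset file and return the decoded JSON value."""
    with Path(path).open(encoding="utf-8") as handle:
        return json.load(handle)


def describe(data):
    """Summarise the top-level shape without assuming a schema."""
    if isinstance(data, (list, dict)):
        return f"{type(data).__name__} with {len(data)} top-level entries"
    return type(data).__name__


if __name__ == "__main__":
    # Path taken from this README; run from the repository root.
    path = Path("datasets") / "combined.json"
    if path.exists():
        print(describe(load_dataset(path)))
    else:
        print(f"{path} not found; run this from the repository root")
```

Checking the shape first makes it easier to decide how to iterate over the entries before writing anything that depends on specific field names.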

As vigilante.pw is currently down and has been for a while, the dataset from the following GitHub repository was used instead: https://github.com/wedataintelligence/Vigilante.pw.

If you have any suggestions feel free to create an issue or submit a PR :)

Usage

Updating the datasets

If you would like to run this yourself, the scraper needs two things to update every live source:

  • A FlareSolverr host (FLARESOLVERR_URL) - used to get around Cloudflare bot checks.
  • A Hashmob API key (HASHMOB_API_KEY) - used to query the Hashmob API for a list of official breaches.

Both values are expected to be passed as environment variables (FLARESOLVERR_URL and HASHMOB_API_KEY).

It is possible to run the script without them, but the generated datasets will be missing several live sources.
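Before starting a long scrape, it can help to check which of these variables are actually set. The sketch below is an illustration only, not the scraper's own code: the variable names are taken from the run command in this README, and the helper missing_env_vars is hypothetical.

```python
import os

# Names taken from the run command in this README; how the scraper itself
# reads them is an assumption.
REQUIRED_VARS = ("FLARESOLVERR_URL", "HASHMOB_API_KEY")


def missing_env_vars(names=REQUIRED_VARS, environ=None):
    """Return the names that are unset or empty in the environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing: " + ", ".join(missing) + " (some live sources may be skipped)")
    else:
        print("All expected variables are set.")
```

Passing the environment as a parameter keeps the check easy to test without touching the real process environment.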

To run the script, use the following command. It contacts all of the live providers, updates their datasets and regenerates the combined dataset.

$ FLARESOLVERR_URL=REPLACEME HASHMOB_API_KEY=REPLACEME python3 scraper.py

Viewing the datasets

A very simple static site is provided within this repository. You can drag and drop, clone or copy the repository to a web server and it should work as is. Alternatively, run a simple Python web server with python3 -m http.server and view the site at http://localhost:8000, or view the public instance here.

TODO

  • Normalize breach dates
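
One possible approach to this item, sketched under assumptions: try a small set of candidate formats and emit ISO 8601 dates. The formats listed here are hypothetical, since the date formats used by each provider are not documented in this README, and day-first ordering for slash-separated dates is a guess.

```python
from datetime import datetime

# Hypothetical input formats; the real datasets may use others.
# Slash-separated dates are assumed to be day-first.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y-%m", "%Y")


def normalize_date(raw):
    """Return an ISO 8601 date string, or None if no format matches.

    Partial dates (year or year-month) are padded to the first day
    of the period, which loses the fact that they were partial.
    """
    text = raw.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None
```

Returning None for unrecognised values, rather than raising, lets a caller keep the original string and flag it for manual review.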
