Skip to content

Latest commit

 

History

History
38 lines (26 loc) · 891 Bytes

README.md

File metadata and controls

38 lines (26 loc) · 891 Bytes

url_filter

This is a demonstration of 16-bit binary fused filters applied to URL filtering.

Usage:

make
./index data/top-1m.csv 

Possible output:

$ ./index data/top-1m.csv 
loaded 1000000 names
average length 23.1128 bytes/name

duplicated string reddog.microsoft.com
total volume 23112842 bytes
number of duplicates hashes 1
ratio of duplicates  hashes 0.000001

filter memory usage : 2261032 bytes (9.8 % of input)

filter memory usage : 18 bits/entry

false-positive rate 0.000010
Benchmarking queries:
binary_fuse16_contain          :   1.42 GB/s   22.4 Ma/s  44.60 ns/d 
Benchmarking construction speed
binary_fuse16_populate         :   1.15 GB/s   49.8 Ma/s  20.09 ns/d 

Reference

Thomas Mueller Graf, Daniel Lemire, Binary Fuse Filters: Fast and Smaller Than Xor Filters, Journal of Experimental Algorithmics 27, 2022