This repository implements an optimized multi-attribute filtering mechanism for Hierarchical Navigable Small World (HNSW) graphs used in Approximate Nearest Neighbor (ANN) search. The goal is to reduce query latency and improve filtering performance for datasets with large attribute spaces (e.g., 1000+ labels per point) by integrating:
- Bitsets: For efficient representation and evaluation of attributes.
- Roaring Bitmaps: For compact storage and high-performance operations on sparse attributes.
- Bitset-based filtering: Enables rapid evaluation of attribute matches using bitwise operations.
- Roaring Bitmap integration: Optimizes storage and set operations for sparse datasets.
- Dynamic query handling: Automatically switches between brute-force and ANN search based on filtered set size.
- HNSWLib enhancements: Extends HNSW traversal and filtering mechanisms for better performance.
- Represent attributes as bitsets and Roaring Bitmaps for efficient filtering.
- Integrate filtering logic into HNSWLib traversal algorithms.
- Benchmark performance improvements in query latency and memory consumption.
- Ensure high recall accuracy and robust performance across various datasets.
- C++ Compiler: GCC, Clang, or equivalent supporting C++17.
- CMake: For configuring and building the project.
- Ninja (optional): For faster builds (recommended).
-
Clone the repository:
git clone https://github.com/your-username/HNSW-Attribute-Filtering-Optimization.git
-
Navigate to the project directory:
cd HNSW-Attribute-Filtering-Optimization
-
Create and navigate to the build directory:
mkdir build && cd build
-
Build the project:
cmake .. -G "Ninja" # Use Ninja for faster builds, or replace with "Unix Makefiles". ninja # Or use 'make' if "Unix Makefiles" was selected.
After building, the executables will be available in the build
directory. You can run them as follows:
-
Run the Bitset Filtering Example:
./bitset_filter
-
Run the Linear Filtering Example:
./linear_filter
-
Run the Roaring Bitmap Filtering Example:
./roaring_filter
-
Run the Test Roaring Bitmap:
./test_roaring
Each filtering example writes its results to a corresponding text file in the build
directory:
bitset_filter_results.txt
linear_filter_results.txt
roaring_filter_results.txt
To evaluate the performance improvements of the optimized filtering:
- Prepare datasets with varying attribute distributions and save them in a supported format.
- Run the benchmark tool included in the project:
./benchmark [dataset_path]
- Customize the following in
config.h
:- Number of attributes per point: Define the size of the bitsets.
- Filtering thresholds: Set constraints for filtering.
- Traversal parameters: Adjust HNSW layer configurations and search heuristics.
Contributions to the project are welcome! To contribute:
- Fork this repository.
- Create a branch for your feature or bug fix:
git checkout -b feature/your-feature
- Commit your changes with clear messages:
git commit -m "Add detailed description of changes"
- Push your changes:
git push origin feature/your-feature
- Open a pull request, explaining your additions or fixes.
- The project builds upon the Hierarchical Navigable Small World Graphs algorithm. Read the foundational paper here: HNSW Paper.
- For efficient bitmap operations, we use the CRoaring Library.
This project is licensed under the MIT License. See the LICENSE file for details.
For inquiries or feedback, please contact:
- Ilyas Boudhaine: ilyasboudhaine1@gmail.com