Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Custom Faiss Base Image Build Code #7

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

Rajrahane
Copy link
Member

Description

Adds the faiss submodule and build scripts to create a custom faiss installed base image to be used by the remote-vector-index-build-service-worker.

Issues Resolved

Resolves #4

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Rajvaibhav Rahane <rrahane@amazon.com>
@Rajrahane
Copy link
Member Author

Fixed the build-docker-image.sh line 45 to chmod -R 777 .
initially was chmod 777 -R .

Copy link
Collaborator

@jed326 jed326 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to have some more formalized and automated testing here, as well as clean up the formatting of everything since we are targeting main with this PR

@@ -0,0 +1,105 @@
import sys
import os.path
# If you faiss python installed you should skip this line
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're trying to merge to main can we clean up this file a bit? For example the dead comments, print statements, etc.
Separately it would be good to add a lint checking CI action to this repo + some auto formatting similar to the spotless checks in Java.

Comment on lines 53 to 56
12. Now to test if everthing is build correctly or not run the below command
```
python faiss-test.py
```
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be run automatically as a CI action

@@ -0,0 +1,14 @@
import numpy as np

if __name__ == "__main__":
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we create a scripts package for files like this? Also it seems like this is a pre-requisite for running the faiss-test tests but I don't see that mentioned anywhere

## Creating the image
Below are the steps to create the image that can have a custom faiss python code build.
1. The basic docker file is present in this folder named Dockerfile.
2. Ensure that faiss is checked out before you trigger the build. The commit of faiss I have tested with is: 3c8dc4194907e9b911551d5a009468106f8b9c7f
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be a GH link

@Rajrahane Rajrahane disabled auto-merge February 19, 2025 23:42
@Rajrahane Rajrahane marked this pull request as draft February 27, 2025 19:02
Copy link
Collaborator

@navneet1v navneet1v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we make the folder name: custom-faiss-image

@@ -0,0 +1,3 @@
[submodule "custom-faiss-installed-image/faiss"]
path = custom-faiss-installed-image/faiss
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change this to faiss

Comment on lines 21 to 47
+/*
+Adding add function to support add_with_ids functionality from IndexIDMap index type.
+*/
+void GpuIndexCagra::add(idx_t n, const float* x) {
+ train(n, x);
+}
+
+
bool GpuIndexCagra::addImplRequiresIDs_() const {
return false;
};
diff --git a/faiss/gpu/GpuIndexCagra.h b/faiss/gpu/GpuIndexCagra.h
index d6fae29..d28bc6d 100644
--- a/faiss/gpu/GpuIndexCagra.h
+++ b/faiss/gpu/GpuIndexCagra.h
@@ -248,6 +248,8 @@ struct GpuIndexCagra : public GpuIndex {
/// Trains CAGRA based on the given vector data
void train(idx_t n, const float* x) override;

+ void add(idx_t n, const float* x) override;
+
/// Initialize ourselves from the given CPU index; will overwrite
/// all data in ourselves
void copyFrom(const faiss::IndexHNSWCagra* index);
--
2.25.1

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all this is not needed. as the changes are merged already in faiss

@@ -0,0 +1,45 @@
FROM nvidia/cuda:12.1.1-base-ubuntu22.04
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please check other image types rather than ubuntu. I think this base image has a lot of vulnerabilities. Better to pick the rockyLinux base image

https://hub.docker.com/r/nvidia/cuda/tags?name=base-rocky

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


if __name__ == "__main__":
d = 768
np.random.seed(1234)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why we need these files

import numpy as np
import time

from numpy import dtype
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why we are adding these files

Signed-off-by: Rajvaibhav Rahane <rrahane@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Build Faiss from source and create a base image for Faiss
3 participants