# Build Mega Service of ChatQnA on Xeon

This document outlines the deployment process for a ChatQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on an Intel Xeon server. The steps include Docker image creation, container deployment via Docker Compose, and service execution to integrate microservices such as `embedding`, `retriever`, `rerank`, and `llm`. We will publish the Docker images to Docker Hub soon, which will simplify the deployment process for this service.

## 🚀 Apply Xeon Server on AWS

To set up a Xeon server on AWS, start by creating an AWS account if you don't already have one. Then, head to the [EC2 Console](https://console.aws.amazon.com/ec2/v2/home) to begin the process. Within the EC2 service, select the Amazon EC2 M7i or M7i-flex instance type to leverage the power of 4th Generation Intel Xeon Scalable processors. These instances are optimized for high-performance computing and demanding workloads.

For detailed information about these instance types, refer to this [link](https://aws.amazon.com/ec2/instance-types/m7i/). Once you've chosen the appropriate instance type, proceed to configure your instance settings, including network configuration, security groups, and storage options.

After launching your instance, you can connect to it using SSH (for Linux instances) or Remote Desktop Protocol (RDP) (for Windows instances). From there, you'll have full access to your Xeon server, allowing you to install, configure, and manage your applications as needed.
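
For example, connecting to a Linux instance from a local terminal typically looks like the sketch below; the key file path, user name, and public DNS name are placeholders that depend on your key pair and AMI, so substitute your own values.

```bash
# Connect to the launched instance over SSH (placeholder key, user, and host).
chmod 400 ~/.ssh/my-xeon-key.pem
ssh -i ~/.ssh/my-xeon-key.pem ubuntu@ec2-xx-xx-xx-xx.compute-1.amazonaws.com
```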

## 🚀 Build Docker Images

First of all, build the Docker images locally and install the GenAIComps Python package.

```bash
git clone https://github.com/opea-project/GenAIComps.git
cd GenAIComps
pip install -r requirements.txt
pip install .
```
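
To double-check that the package was installed into the current environment, a quick import test works; this assumes the top-level module is named `comps`, matching the repository layout.

```bash
# Verify that the GenAIComps package is importable.
python -c "import comps; print('GenAIComps package imported successfully')"
```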

### 1. Build Embedding Image

```bash
docker build -t opea/gen-ai-comps:embedding-tei-server --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/embeddings/docker/Dockerfile .
```

### 2. Build Retriever Image

```bash
docker build -t opea/gen-ai-comps:retriever-redis-server --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/retrievers/langchain/docker/Dockerfile .
```

### 3. Build Rerank Image

```bash
docker build -t opea/gen-ai-comps:reranking-tei-xeon-server --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/reranks/docker/Dockerfile .
```

### 4. Build LLM Image

```bash
docker build -t opea/gen-ai-comps:llm-tgi-server --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llm/langchain/docker/Dockerfile .
```

### 5. Pull qna-rag-redis-server Image

```bash
docker pull intel/gen-ai-examples:qna-rag-redis-server
```

Then run the command `docker images`; you should see the following five Docker images:

1. `opea/gen-ai-comps:embedding-tei-server`
2. `opea/gen-ai-comps:retriever-redis-server`
3. `opea/gen-ai-comps:reranking-tei-xeon-server`
4. `opea/gen-ai-comps:llm-tgi-server`
5. `intel/gen-ai-examples:qna-rag-redis-server`
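
A quick way to confirm that all five images are present is to filter the output of `docker images`; the repository names below simply match the tags built and pulled above.

```bash
# List only the images built or pulled in the previous steps.
docker images | grep -E 'opea/gen-ai-comps|intel/gen-ai-examples'
```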

## 🚀 Start Microservices

### Setup Environment Variables

Since `docker_compose_xeon.yaml` consumes several environment variables, set them up in advance as shown below.

```bash
export http_proxy=${your_http_proxy}
export https_proxy=${your_http_proxy}
export EMBEDDING_MODEL_ID="BAAI/bge-large-en-v1.5"
export RERANK_MODEL_ID="BAAI/bge-reranker-large"
export LLM_MODEL_ID="m-a-p/OpenCodeInterpreter-DS-6.7B"
export TEI_EMBEDDING_ENDPOINT="http://${your_ip}:8090"
export TEI_RERANKING_ENDPOINT="http://${your_ip}:6060"
export TGI_LLM_ENDPOINT="http://${your_ip}:8008"
export REDIS_URL="redis://${your_ip}:6379"
export INDEX_NAME=${your_index_name}
export HUGGINGFACEHUB_API_TOKEN=${your_hf_api_token}
```
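
The `${your_ip}` placeholder should be an address of the host that the containers can reach. If you want to fill it in automatically, something like the following works on most Linux hosts; this is only a convenience sketch, and any reachable address of the machine will do.

```bash
# Use the host's first reported IP address for the ${your_ip} placeholder.
export your_ip=$(hostname -I | awk '{print $1}')
```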

### Start Microservice Docker Containers

```bash
docker compose -f docker_compose_xeon.yaml up -d
```
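
Before validating individual endpoints, it can help to confirm that all containers came up cleanly; the exact service names depend on `docker_compose_xeon.yaml`, so the output will vary.

```bash
# Show the status of every service defined in the compose file.
docker compose -f docker_compose_xeon.yaml ps
```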

### Validate Microservices

1. TEI Embedding Service

```bash
curl ${your_ip}:8090/embed \
  -X POST \
  -d '{"inputs":"What is Deep Learning?"}' \
  -H 'Content-Type: application/json'
```

2. Embedding Microservice

```bash
curl http://${your_ip}:6000/v1/embeddings \
  -X POST \
  -d '{"text":"hello"}' \
  -H 'Content-Type: application/json'
```

3. Retriever Microservice

```bash
curl http://${your_ip}:7000/v1/retrieval \
  -X POST \
  -d '{"text":"test","embedding":[1,1,...1]}' \
  -H 'Content-Type: application/json'
```

4. TEI Reranking Service

```bash
curl http://${your_ip}:6060/rerank \
  -X POST \
  -d '{"query":"What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' \
  -H 'Content-Type: application/json'
```

5. Reranking Microservice

```bash
curl http://${your_ip}:8000/v1/reranking \
  -X POST \
  -d '{"initial_query":"What is Deep Learning?", "retrieved_docs": [{"text":"Deep Learning is not..."}, {"text":"Deep learning is..."}]}' \
  -H 'Content-Type: application/json'
```

6. TGI Service

```bash
curl http://${your_ip}:8008/generate \
  -X POST \
  -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \
  -H 'Content-Type: application/json'
```

7. LLM Microservice

```bash
curl http://${your_ip}:9000/v1/chat/completions \
  -X POST \
  -d '{"text":"What is Deep Learning?"}' \
  -H 'Content-Type: application/json'
```

After validating all of the microservices above, you are ready to construct the mega-service. Before launching it, however, you must ingest data into the vector store.

## 🚀 Ingest Data Into Vector Database

```bash
docker exec -it qna-rag-redis-server bash
cd /ws
python ingest.py
```
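
If you want to confirm that the documents were actually written to the vector store, you can list the RediSearch indexes; this sketch assumes the stack uses Redis with the RediSearch module (implied by `REDIS_URL` and `INDEX_NAME`) and that `redis-cli` is available on the host.

```bash
# List vector indexes in Redis; the index named in $INDEX_NAME should appear.
redis-cli -u $REDIS_URL FT._LIST
```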

## 🚀 Construct Mega Service

Modify `initial_inputs` on line 34 of `chatqna.py`, then run the script to get the ChatQnA result from this mega service.

The intermediate results of each microservice are printed, so you can check their accuracy and make targeted modifications.

```bash
python chatqna.py
```