Commit 997e835

Support mega service on Xeon of ChatQnA (opea-project#111)
* support mega service on xeon of ChatQnA

Signed-off-by: letonghan <letong.han@intel.com>
1 parent 54c1508 commit 997e835

File tree

3 files changed: +344 -0 lines changed

ChatQnA/microservice/xeon/README.md

# Build Mega Service of ChatQnA on Xeon

This document outlines the deployment process for a ChatQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on an Intel Xeon server. The steps include Docker image creation, container deployment via Docker Compose, and service execution to integrate microservices such as `embedding`, `retriever`, `rerank`, and `llm`. We will publish the Docker images to Docker Hub soon, which will simplify the deployment process for this service.

## 🚀 Apply Xeon Server on AWS

To apply a Xeon server on AWS, start by creating an AWS account if you don't have one already. Then, head to the [EC2 Console](https://console.aws.amazon.com/ec2/v2/home) to begin the process. Within the EC2 service, select the Amazon EC2 M7i or M7i-flex instance type to leverage the power of 4th Generation Intel Xeon Scalable processors. These instances are optimized for high-performance computing and demanding workloads.

For detailed information about these instance types, you can refer to this [link](https://aws.amazon.com/ec2/instance-types/m7i/). Once you've chosen the appropriate instance type, proceed with configuring your instance settings, including network configurations, security groups, and storage options.

After launching your instance, you can connect to it using SSH (for Linux instances) or Remote Desktop Protocol (RDP) (for Windows instances). From there, you'll have full access to your Xeon server, allowing you to install, configure, and manage your applications as needed.
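If you prefer scripting the launch instead of clicking through the console, the AWS CLI can create an equivalent instance. The snippet below is only a sketch: the AMI ID, key pair, and security group are placeholders you must replace with your own values.

```bash
# Sketch only: replace the AMI ID, key pair, and security group with your own values
aws ec2 run-instances \
    --image-id ami-xxxxxxxxxxxxxxxxx \
    --instance-type m7i.4xlarge \
    --key-name your-key-pair \
    --security-group-ids sg-xxxxxxxx \
    --count 1
```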
## 🚀 Build Docker Images

First of all, you need to build the Docker images locally and install the GenAIComps Python package.

```bash
git clone https://github.com/opea-project/GenAIComps.git
cd GenAIComps
pip install -r requirements.txt
pip install .
```
### 1. Build Embedding Image

```bash
docker build -t opea/gen-ai-comps:embedding-tei-server --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/embeddings/docker/Dockerfile .
```

### 2. Build Retriever Image

```bash
docker build -t opea/gen-ai-comps:retriever-redis-server --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/retrievers/langchain/docker/Dockerfile .
```

### 3. Build Rerank Image

```bash
docker build -t opea/gen-ai-comps:reranking-tei-xeon-server --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/reranks/docker/Dockerfile .
```

### 4. Build LLM Image

```bash
docker build -t opea/gen-ai-comps:llm-tgi-server --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llm/langchain/docker/Dockerfile .
```

### 5. Pull qna-rag-redis-server Image

```bash
docker pull intel/gen-ai-examples:qna-rag-redis-server
```
Then run the command `docker images`, and you will see the following five Docker images:

1. `opea/gen-ai-comps:embedding-tei-server`
2. `opea/gen-ai-comps:retriever-redis-server`
3. `opea/gen-ai-comps:reranking-tei-xeon-server`
4. `opea/gen-ai-comps:llm-tgi-server`
5. `intel/gen-ai-examples:qna-rag-redis-server`
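As a quick sanity check (assuming you kept the default tags above), you can filter the local image list for just these entries:

```bash
# Lists only the images built or pulled in the previous steps (default tags assumed)
docker images | grep -E "opea/gen-ai-comps|intel/gen-ai-examples"
```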
## 🚀 Start Microservices

### Setup Environment Variables

Since `docker_compose_xeon.yaml` will consume some environment variables, you need to set them up in advance as below.

```bash
export http_proxy=${your_http_proxy}
export https_proxy=${your_http_proxy}
export EMBEDDING_MODEL_ID="BAAI/bge-large-en-v1.5"
export RERANK_MODEL_ID="BAAI/bge-reranker-large"
export LLM_MODEL_ID="m-a-p/OpenCodeInterpreter-DS-6.7B"
export TEI_EMBEDDING_ENDPOINT="http://${your_ip}:8090"
export TEI_RERANKING_ENDPOINT="http://${your_ip}:6060"
export TGI_LLM_ENDPOINT="http://${your_ip}:8008"
export REDIS_URL="redis://${your_ip}:6379"
export INDEX_NAME=${your_index_name}
export HUGGINGFACEHUB_API_TOKEN=${your_hf_api_token}
```
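`${your_ip}`, `${your_index_name}`, and `${your_hf_api_token}` above are placeholders you supply yourself. If it helps, one way to capture the host IP, assuming the first address reported by `hostname -I` is the one the services should be reachable on, is:

```bash
# Assumes the first address from `hostname -I` is the reachable host IP
export your_ip=$(hostname -I | awk '{print $1}')
```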
### Start Microservice Docker Containers

```bash
docker compose -f docker_compose_xeon.yaml up -d
```
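Once the stack is up, it can be worth confirming that every container reached a running state before validating the individual services. A simple check using standard Docker Compose commands (nothing specific to this project) is:

```bash
# Show the status of the containers defined in the compose file
docker compose -f docker_compose_xeon.yaml ps
# Or follow the logs of a single service, e.g. the TGI backend
docker compose -f docker_compose_xeon.yaml logs -f tgi_service
```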
### Validate Microservices

1. TEI Embedding Service

```bash
curl ${your_ip}:8090/embed \
  -X POST \
  -d '{"inputs":"What is Deep Learning?"}' \
  -H 'Content-Type: application/json'
```

2. Embedding Microservice

```bash
curl http://${your_ip}:6000/v1/embeddings \
  -X POST \
  -d '{"text":"hello"}' \
  -H 'Content-Type: application/json'
```

3. Retriever Microservice

```bash
curl http://${your_ip}:7000/v1/retrieval \
  -X POST \
  -d '{"text":"test","embedding":[1,1,...1]}' \
  -H 'Content-Type: application/json'
```
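Note that `[1,1,...1]` is an elided placeholder; the retriever expects a full vector whose length matches the embedding dimension of your index. A minimal sketch, assuming a 768-dimensional embedding space (adjust the length to match your model), is:

```bash
# Sketch: generate a dummy 768-dimensional vector and send it to the retriever
your_embedding=$(python3 -c "print([1.0] * 768)")
curl http://${your_ip}:7000/v1/retrieval \
  -X POST \
  -d "{\"text\":\"test\",\"embedding\":${your_embedding}}" \
  -H 'Content-Type: application/json'
```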
4. TEI Reranking Service

```bash
curl http://${your_ip}:6060/rerank \
  -X POST \
  -d '{"query":"What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' \
  -H 'Content-Type: application/json'
```

5. Reranking Microservice

```bash
curl http://${your_ip}:8000/v1/reranking \
  -X POST \
  -d '{"initial_query":"What is Deep Learning?", "retrieved_docs": [{"text":"Deep Learning is not..."}, {"text":"Deep learning is..."}]}' \
  -H 'Content-Type: application/json'
```

6. TGI Service

```bash
curl http://${your_ip}:8008/generate \
  -X POST \
  -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \
  -H 'Content-Type: application/json'
```

7. LLM Microservice

```bash
curl http://${your_ip}:9000/v1/chat/completions \
  -X POST \
  -d '{"text":"What is Deep Learning?"}' \
  -H 'Content-Type: application/json'
```

Following the validation of all aforementioned microservices, we are now prepared to construct a mega-service. However, before launching the mega-service, it's essential to ingest data into the vector store.
## 🚀 Ingest Data Into Vector Database

```bash
docker exec -it qna-rag-redis-server bash
cd /ws
python ingest.py
```
## 🚀 Construct Mega Service

Modify the `initial_inputs` passed to `self.service_builder.schedule()` in `chatqna.py` (the default question is "What is the revenue of Nike?"), then you will get the ChatQnA result of this mega service.
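Only that one line needs to change; a sketch of the edit, with a placeholder question of your choosing, is:

```python
# Inside MyServiceOrchestrator.schedule() in chatqna.py: replace the default question with your own
self.service_builder.schedule(initial_inputs={"text": "Your own question here"})
```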
All of the intermediate results will be printed for each microservice. Users can check the accuracy of the results to make targeted modifications.

```bash
python chatqna.py
```

ChatQnA/microservice/xeon/chatqna.py

# Copyright (c) 2024 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


from comps import RemoteMicroService, ServiceOrchestrator


class MyServiceOrchestrator:
    def __init__(self, port=8000):
        self.service_builder = ServiceOrchestrator(port=port)

    def add_remote_service(self):
        # Register the four already-running microservices by host, port, and endpoint.
        embedding = RemoteMicroService(
            name="embedding", host="10.165.57.68", port=6000, expose_endpoint="/v1/embeddings"
        )
        retriever = RemoteMicroService(
            name="retriever", host="10.165.57.68", port=7000, expose_endpoint="/v1/retrieval"
        )
        rerank = RemoteMicroService(name="rerank", host="10.165.57.68", port=8000, expose_endpoint="/v1/reranking")
        llm = RemoteMicroService(name="llm", host="10.165.57.68", port=9000, expose_endpoint="/v1/chat/completions")
        self.service_builder.add(embedding).add(retriever).add(rerank).add(llm)
        # Chain the services into a pipeline: embedding -> retriever -> rerank -> llm.
        self.service_builder.flow_to(embedding, retriever)
        self.service_builder.flow_to(retriever, rerank)
        self.service_builder.flow_to(rerank, llm)

    def schedule(self):
        # Feed the initial query into the pipeline and print every intermediate result.
        self.service_builder.schedule(initial_inputs={"text": "What is the revenue of Nike?"})
        self.service_builder.get_all_final_outputs()
        result_dict = self.service_builder.result_dict
        print(result_dict)


if __name__ == "__main__":
    service_orchestrator = MyServiceOrchestrator(port=9001)
    service_orchestrator.add_remote_service()
    service_orchestrator.schedule()
ChatQnA/microservice/xeon/docker_compose_xeon.yaml

# Copyright (c) 2024 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

version: "3.8"

services:
  # Redis Stack provides the vector database (plus RedisInsight on 8001).
  redis-vector-db:
    image: redis/redis-stack:7.2.0-v9
    container_name: redis-vector-db
    ports:
      - "6379:6379"
      - "8001:8001"
  # Data-preparation container used to ingest documents into Redis.
  qna-rag-redis-server:
    image: intel/gen-ai-examples:qna-rag-redis-server
    container_name: qna-rag-redis-server
    environment:
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
      REDIS_PORT: 6379
      EMBED_MODEL: BAAI/bge-base-en-v1.5
      REDIS_SCHEMA: schema_dim_768.yml
      VECTOR_DATABASE: REDIS
    ulimits:
      memlock:
        soft: -1 # Set memlock to unlimited (no soft or hard limit)
        hard: -1
    volumes:
      - ../redis:/ws
      - ../test:/test
    network_mode: "host"
  # TEI backend serving the embedding model.
  tei_embedding_service:
    image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.2
    container_name: tei_embedding_server
    ports:
      - "8090:80"
    volumes:
      - "./data:/data"
    shm_size: 1g
    environment:
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
    command: --model-id ${EMBEDDING_MODEL_ID}
  # Embedding microservice wrapping the TEI embedding endpoint.
  embedding:
    image: intel/gen-ai-comps:embedding-tei-server
    container_name: embedding-tei-server
    ports:
      - "6000:6000"
    ipc: host
    environment:
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      TEI_EMBEDDING_ENDPOINT: ${TEI_EMBEDDING_ENDPOINT}
    restart: unless-stopped
  # Retriever microservice backed by the Redis vector store.
  retriever:
    image: intel/gen-ai-comps:retriever-redis-server
    container_name: retriever-redis-server
    ports:
      - "7000:7000"
    ipc: host
    environment:
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      REDIS_URL: ${REDIS_URL}
      INDEX_NAME: ${INDEX_NAME}
    restart: unless-stopped
  # TEI backend serving the reranking model.
  tei_xeon_service:
    image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.2
    container_name: tei_xeon_server
    ports:
      - "8808:80"
    volumes:
      - "./data:/data"
    shm_size: 1g
    environment:
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
    command: --model-id ${RERANK_MODEL_ID}
  # Reranking microservice wrapping the TEI reranking endpoint.
  reranking:
    image: intel/gen-ai-comps:reranking-tei-xeon-server
    container_name: reranking-tei-xeon-server
    ports:
      - "8000:8000"
    ipc: host
    environment:
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      TEI_RERANKING_ENDPOINT: ${TEI_RERANKING_ENDPOINT}
      HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
    restart: unless-stopped
  # TGI backend serving the LLM.
  tgi_service:
    image: ghcr.io/huggingface/text-generation-inference:1.4
    container_name: tgi_service
    ports:
      - "8008:80"
    volumes:
      - "./data:/data"
    shm_size: 1g
    command: --model-id ${LLM_MODEL_ID}
  # LLM microservice wrapping the TGI endpoint.
  llm:
    image: intel/gen-ai-comps:llm-tgi-server
    container_name: llm-tgi-server
    ports:
      - "9000:9000"
    ipc: host
    environment:
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      TGI_LLM_ENDPOINT: ${TGI_LLM_ENDPOINT}
      HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
    restart: unless-stopped

networks:
  default:
    driver: bridge
