Commit 3d4bb02: add tutorial doc for semantic search with OpenAI embedding model (opensearch-project#1929)

Signed-off-by: Yaliang Wu <ylwu@amazon.com>
# Topic

This doc introduces how to build semantic search in Amazon managed OpenSearch with the [OpenAI embedding model](https://platform.openai.com/docs/guides/embeddings).

If you are not using Amazon OpenSearch, you can refer to [openai_connector_embedding_blueprint](https://github.com/opensearch-project/ml-commons/blob/2.x/docs/remote_inference_blueprints/openai_connector_embedding_blueprint.md) and [OpenSearch semantic search](https://opensearch.org/docs/latest/search-plugins/semantic-search/).

Note: Replace every placeholder prefixed with `your_` with your own value.
# Steps

## 0. Create OpenSearch cluster

Go to the AWS OpenSearch console UI and create an OpenSearch domain.

Copy the domain ARN, which will be used in later steps.
## 1. Create secret

Store your OpenAI API key in AWS Secrets Manager.

Use default values unless stated otherwise.

1. Choose the "Other type of secret" type.
2. Create a `my_openai_key` key-value pair with your OpenAI API key as the value.
3. On the next page, enter `my_test_openai_secret` as the secret name.

Copy the secret ARN, which will be used in later steps.
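If you prefer scripting over the console, the same secret can be created with boto3. This is a minimal sketch: the secret name `my_test_openai_secret` and key `my_openai_key` match the values above, and `your_openai_api_key` is a placeholder you must replace.

```python
import json


def build_secret_string(api_key: str) -> str:
    """Build the JSON secret body using the key name (my_openai_key) the connector expects."""
    return json.dumps({"my_openai_key": api_key})


if __name__ == "__main__":
    import boto3  # requires AWS credentials configured locally

    client = boto3.client("secretsmanager")
    response = client.create_secret(
        Name="my_test_openai_secret",
        SecretString=build_secret_string("your_openai_api_key"),
    )
    # The secret ARN is needed in later steps.
    print(response["ARN"])
```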
## 2. Create IAM role

To use the secret created in Step 1, we need to create an IAM role with permission to read the secret.
This IAM role will be configured in the connector. The connector will use this role to read the secret.

Go to the IAM console and create an IAM role `my_openai_secret_role` with:

- Custom trust policy:
```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "es.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```
- Permission:
```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "secretsmanager:GetSecretValue"
      ],
      "Effect": "Allow",
      "Resource": "your_secret_arn_created_in_step1"
    }
  ]
}
```

Copy the role ARN, which will be used in later steps.
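The console steps above can also be scripted with boto3. This sketch creates the role with the same trust policy and attaches the read-secret permission as an inline policy; the inline policy name `my_openai_secret_read_policy` is an arbitrary choice, and `your_secret_arn_created_in_step1` is a placeholder.

```python
import json

# Same trust and permission policies as shown above.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "es.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

PERMISSION_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": ["secretsmanager:GetSecretValue"],
            "Effect": "Allow",
            "Resource": "your_secret_arn_created_in_step1",
        }
    ],
}

if __name__ == "__main__":
    import boto3

    iam = boto3.client("iam")
    role = iam.create_role(
        RoleName="my_openai_secret_role",
        AssumeRolePolicyDocument=json.dumps(TRUST_POLICY),
    )
    # Attach the read-secret permission as an inline policy.
    iam.put_role_policy(
        RoleName="my_openai_secret_role",
        PolicyName="my_openai_secret_read_policy",
        PolicyDocument=json.dumps(PERMISSION_POLICY),
    )
    # The role ARN is needed in later steps.
    print(role["Role"]["Arn"])
```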
## 3. Configure IAM role in OpenSearch

### 3.1 Create IAM role for signing the create connector request

Generate a new IAM role specifically for signing your create connector request.

Create an IAM role `my_create_openai_connector_role` with:

- Custom trust policy. Note: `your_iam_user_arn` is the IAM user which will run `aws sts assume-role` in step 4.1.
```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "your_iam_user_arn"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```
- Permission:
```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": "your_iam_role_arn_created_in_step2"
    },
    {
      "Effect": "Allow",
      "Action": "es:ESHttpPost",
      "Resource": "your_opensearch_domain_arn_created_in_step0"
    }
  ]
}
```

Copy this role ARN, which will be used in later steps.

### 3.2 Map backend role

1. Log in to your OpenSearch Dashboards and navigate to the "Security" page in the left-hand menu.
2. Click "Roles" on the security page, then find the "ml_full_access" role and click it.
3. On the "ml_full_access" role detail page, click "Mapped users", then click "Manage mapping". Paste the IAM role ARN created in Step 3.1 into the backend roles field.
   Click "Map". The IAM role is now successfully configured in your OpenSearch cluster.

![Alt text](images/semantic_search/mapping_iam_role_arn.png)
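The same mapping can be applied through the Security plugin REST API instead of the Dashboards UI. A sketch, with two assumptions: the identity sending the request must already be allowed to change security configuration, and note that `PUT` replaces any existing mapping for the role rather than appending to it.

```python
def role_mapping_body(iam_role_arn: str) -> dict:
    """Request body for PUT _plugins/_security/api/rolesmapping/ml_full_access.

    Note: PUT replaces the existing mapping for the role entirely.
    """
    return {"backend_roles": [iam_role_arn]}


if __name__ == "__main__":
    import boto3
    import requests
    from requests_aws4auth import AWS4Auth

    host = 'your_amazon_opensearch_domain_endpoint_created_in_step0'
    region = 'your_amazon_opensearch_domain_region'
    credentials = boto3.Session().get_credentials()
    awsauth = AWS4Auth(credentials.access_key, credentials.secret_key,
                       region, 'es', session_token=credentials.token)
    r = requests.put(
        host + '/_plugins/_security/api/rolesmapping/ml_full_access',
        auth=awsauth,
        json=role_mapping_body('your_iam_role_arn_created_in_step3.1'),
        headers={'Content-Type': 'application/json'},
    )
    print(r.text)
```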
## 4. Create connector

Find more details on [connectors](https://opensearch.org/docs/latest/ml-commons-plugin/remote-models/connectors/).

### 4.1 Get temporary credentials of the role created in step 3.1
```
aws sts assume-role --role-arn your_iam_role_arn_created_in_step3.1 --role-session-name your_session_name
```

Configure the temporary credentials in `~/.aws/credentials` like this:

```
[default]
AWS_ACCESS_KEY_ID=your_access_key_of_role_created_in_step3.1
AWS_SECRET_ACCESS_KEY=your_secret_key_of_role_created_in_step3.1
AWS_SESSION_TOKEN=your_session_token_of_role_created_in_step3.1
```
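If you prefer Python to the AWS CLI, a sketch that assumes the role with boto3 and prints the credentials in the `~/.aws/credentials` format shown above (the role ARN and session name are placeholders):

```python
def format_credentials(creds: dict) -> str:
    """Render an STS credentials dict in the ~/.aws/credentials format above."""
    return (
        "[default]\n"
        f"AWS_ACCESS_KEY_ID={creds['AccessKeyId']}\n"
        f"AWS_SECRET_ACCESS_KEY={creds['SecretAccessKey']}\n"
        f"AWS_SESSION_TOKEN={creds['SessionToken']}\n"
    )


if __name__ == "__main__":
    import boto3

    sts = boto3.client("sts")
    response = sts.assume_role(
        RoleArn="your_iam_role_arn_created_in_step3.1",
        RoleSessionName="your_session_name",
    )
    # Paste this output into ~/.aws/credentials.
    print(format_credentials(response["Credentials"]))
```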
### 4.2 Create connector

Run this Python code with the temporary credentials configured in `~/.aws/credentials`:

```
import boto3
import requests
from requests_aws4auth import AWS4Auth

host = 'your_amazon_opensearch_domain_endpoint_created_in_step0'
region = 'your_amazon_opensearch_domain_region'
service = 'es'

credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, service, session_token=credentials.token)

path = '/_plugins/_ml/connectors/_create'
url = host + path

payload = {
  "name": "OpenAI embedding model connector",
  "description": "Connector for OpenAI embedding model",
  "version": "1.0",
  "protocol": "http",
  "credential": {
    "secretArn": "your_secret_arn_created_in_step1",
    "roleArn": "your_iam_role_arn_created_in_step2"
  },
  "parameters": {
    "model": "text-embedding-ada-002"
  },
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "url": "https://api.openai.com/v1/embeddings",
      "headers": {
        "Authorization": "Bearer ${credential.secretArn.my_openai_key}"
      },
      "request_body": "{ \"input\": ${parameters.input}, \"model\": \"${parameters.model}\" }",
      "pre_process_function": "connector.pre_process.openai.embedding",
      "post_process_function": "connector.post_process.openai.embedding"
    }
  ]
}

headers = {"Content-Type": "application/json"}

r = requests.post(url, auth=awsauth, json=payload, headers=headers)
print(r.text)
```

The script will output the connector id.

Sample output:
```
{"connector_id":"OBUSRI0BTaDH9c7tUxfU"}
```

Copy the connector id, which will be used in later steps.
## 5. Create model and test

Log in to your OpenSearch Dashboards, open Dev Tools, then run the following.

1. Create a model group
```
POST /_plugins/_ml/model_groups/_register
{
  "name": "OpenAI_embedding_model",
  "description": "Test model group for OpenAI embedding model"
}
```
Sample output:
```
{
  "model_group_id": "ORUSRI0BTaDH9c7t9heA",
  "status": "CREATED"
}
```

2. Register the model
```
POST /_plugins/_ml/models/_register
{
  "name": "OpenAI embedding model",
  "function_name": "remote",
  "description": "test embedding model",
  "model_group_id": "ORUSRI0BTaDH9c7t9heA",
  "connector_id": "OBUSRI0BTaDH9c7tUxfU"
}
```
Sample output:
```
{
  "task_id": "OhUTRI0BTaDH9c7tLhcv",
  "status": "CREATED",
  "model_id": "OxUTRI0BTaDH9c7tLhdE"
}
```

3. Deploy the model
```
POST /_plugins/_ml/models/OxUTRI0BTaDH9c7tLhdE/_deploy
```
Sample output:
```
{
  "task_id": "PkoTRI0BOhavBOmfkCmF",
  "task_type": "DEPLOY_MODEL",
  "status": "COMPLETED"
}
```

4. Predict
```
POST /_plugins/_ml/models/OxUTRI0BTaDH9c7tLhdE/_predict
{
  "parameters": {
    "input": ["hello world", "how are you"]
  }
}
```
Sample response:
```
{
  "inference_results": [
    {
      "output": [
        {
          "name": "sentence_embedding",
          "data_type": "FLOAT32",
          "shape": [
            1536
          ],
          "data": [
            -0.014907048,
            0.0013432145,
            -0.01851529,
            ...]
        },
        {
          "name": "sentence_embedding",
          "data_type": "FLOAT32",
          "shape": [
            1536
          ],
          "data": [
            -0.014011521,
            -0.0067330617,
            -0.011700075,
            ...]
        }
      ],
      "status_code": 200
    }
  ]
}
```
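The register and deploy calls above can also be issued from Python, reusing the same signed client as the connector script in step 4.2. A sketch; `your_model_group_id` and `your_connector_id` are the values returned in the previous steps, and the endpoint/region placeholders are as before.

```python
def register_model_body(model_group_id: str, connector_id: str) -> dict:
    """Body for POST /_plugins/_ml/models/_register, mirroring step 5.2."""
    return {
        "name": "OpenAI embedding model",
        "function_name": "remote",
        "description": "test embedding model",
        "model_group_id": model_group_id,
        "connector_id": connector_id,
    }


if __name__ == "__main__":
    import boto3
    import requests
    from requests_aws4auth import AWS4Auth

    host = 'your_amazon_opensearch_domain_endpoint_created_in_step0'
    region = 'your_amazon_opensearch_domain_region'
    credentials = boto3.Session().get_credentials()
    awsauth = AWS4Auth(credentials.access_key, credentials.secret_key,
                       region, 'es', session_token=credentials.token)
    headers = {'Content-Type': 'application/json'}

    r = requests.post(host + '/_plugins/_ml/models/_register', auth=awsauth,
                      json=register_model_body('your_model_group_id',
                                               'your_connector_id'),
                      headers=headers)
    model_id = r.json()['model_id']
    # Deploy the registered model.
    r = requests.post(host + f'/_plugins/_ml/models/{model_id}/_deploy',
                      auth=awsauth, headers=headers)
    print(r.text)
```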
## 6. Semantic search

### 6.1 Create ingest pipeline

Find more details: [ingest pipeline](https://opensearch.org/docs/latest/ingest-pipelines/)

```
PUT /_ingest/pipeline/my_openai_embedding_pipeline
{
  "description": "text embedding pipeline",
  "processors": [
    {
      "text_embedding": {
        "model_id": "your_embedding_model_id_created_in_step5",
        "field_map": {
          "text": "text_knn"
        }
      }
    }
  ]
}
```
### 6.2 Create k-NN index

Find more details: [k-NN index](https://opensearch.org/docs/latest/search-plugins/knn/knn-index/)

You should customize your k-NN index for better performance. Note that `dimension` must match the embedding model's output size: 1536 for `text-embedding-ada-002`, as seen in the sample response in Step 5.
```
PUT my_index
{
  "settings": {
    "index": {
      "knn.space_type": "cosinesimil",
      "default_pipeline": "my_openai_embedding_pipeline",
      "knn": "true"
    }
  },
  "mappings": {
    "properties": {
      "text_knn": {
        "type": "knn_vector",
        "dimension": 1536
      }
    }
  }
}
```
### 6.3 Ingest test data
```
POST /my_index/_doc/1000001
{
  "text": "hello world."
}
```
### 6.4 Search

Find more details: [neural search](https://opensearch.org/docs/latest/search-plugins/neural-search/).
```
POST /my_index/_search
{
  "query": {
    "neural": {
      "text_knn": {
        "query_text": "hello",
        "model_id": "your_embedding_model_id_created_in_step5",
        "k": 100
      }
    }
  },
  "size": "1",
  "_source": ["text"]
}
```
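Finally, the neural search in 6.4 can be run from Python with the same signed client, which is convenient once the flow is scripted end to end. A sketch; the model id and endpoint placeholders are assumptions you must fill in.

```python
def neural_query_body(query_text: str, model_id: str, k: int = 100) -> dict:
    """Search body matching the Dev Tools request in 6.4."""
    return {
        "query": {
            "neural": {
                "text_knn": {
                    "query_text": query_text,
                    "model_id": model_id,
                    "k": k,
                }
            }
        },
        "size": 1,
        "_source": ["text"],
    }


if __name__ == "__main__":
    import boto3
    import requests
    from requests_aws4auth import AWS4Auth

    host = 'your_amazon_opensearch_domain_endpoint_created_in_step0'
    region = 'your_amazon_opensearch_domain_region'
    credentials = boto3.Session().get_credentials()
    awsauth = AWS4Auth(credentials.access_key, credentials.secret_key,
                       region, 'es', session_token=credentials.token)
    r = requests.post(host + '/my_index/_search', auth=awsauth,
                      json=neural_query_body('hello',
                                             'your_embedding_model_id_created_in_step5'),
                      headers={'Content-Type': 'application/json'})
    print(r.text)
```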
