
Commit 02d9b72

Create workloads with default query or custom queries (opensearch-project#292)
Signed-off-by: Ian Hoang <hoangia@amazon.com>
1 parent b88e7f4 commit 02d9b72

7 files changed

+181
-5
lines changed

CREATE_WORKLOAD_GUIDE.md

+114
@@ -0,0 +1,114 @@
1+
# Create Workload Guide
2+
3+
This guide explores how users can use the `create-workload` subcommand in OpenSearch Benchmark to create a workload based on pre-existing data in a cluster.
4+
5+
### Create a Workload from Pre-Existing Indices in a Cluster
6+
7+
**Prerequisites:**
8+
* An OpenSearch cluster with data already ingested into an index. Ensure that the index has at least 1000 documents. If it has fewer, the workload is still created, but it cannot be run with `--test-mode`.
9+
* Ensure that your cluster is permissive.
10+
11+
Create a workload with the following command:
12+
```
13+
$ opensearch-benchmark create-workload \
14+
--workload="<WORKLOAD NAME>" \
15+
--target-hosts="<CLUSTER ENDPOINT>" \
16+
--client-options="basic_auth_user:'<USERNAME>',basic_auth_password:'<PASSWORD>'" \
17+
--indices="<INDICES TO GENERATE WORKLOAD FROM>" \
18+
--output-path="<LOCAL DIRECTORY PATH TO STORE WORKLOAD>"
19+
```
20+
Note that:
21+
* `--indices` accepts one or more indices as a comma-separated list.
22+
* If the cluster uses basic authentication and has TLS enabled, users need to provide the corresponding credentials and TLS settings through `--client-options`.
23+
24+
The following is example output from creating a workload from an index called `movies` that contains 2000 documents.
25+
26+
```
27+
____ _____ __ ____ __ __
28+
/ __ \____ ___ ____ / ___/___ ____ ___________/ /_ / __ )___ ____ _____/ /_ ____ ___ ____ ______/ /__
29+
/ / / / __ \/ _ \/ __ \\__ \/ _ \/ __ `/ ___/ ___/ __ \ / __ / _ \/ __ \/ ___/ __ \/ __ `__ \/ __ `/ ___/ //_/
30+
/ /_/ / /_/ / __/ / / /__/ / __/ /_/ / / / /__/ / / / / /_/ / __/ / / / /__/ / / / / / / / / /_/ / / / ,<
31+
\____/ .___/\___/_/ /_/____/\___/\__,_/_/ \___/_/ /_/ /_____/\___/_/ /_/\___/_/ /_/_/ /_/ /_/\__,_/_/ /_/|_|
32+
/_/
33+
34+
[INFO] You did not provide an explicit timeout in the client options. Assuming default of 10 seconds.
35+
[INFO] Connected to OpenSearch cluster [380d8fd64dd85b5f77c0ad81b0799e1e] version [1.1.0].
36+
37+
Extracting documents for index [movies] for test mode... 1000/1000 docs [100.0% done]
38+
Extracting documents for index [movies]... 2000/2000 docs [100.0% done]
39+
40+
[INFO] Workload movies has been created. Run it with: opensearch-benchmark --workload-path=/Users/hoangia/Desktop/workloads/movies
41+
42+
-------------------------------
43+
[INFO] SUCCESS (took 2 seconds)
44+
-------------------------------
45+
```
46+
47+
By default, a created workload contains the following operations, which run in this order:
48+
* **delete-index**: Deletes any pre-existing indices with the same name(s) as the indices provided in `--indices`
49+
* **create-index**: Creates the index with the same name(s) as the indices provided in `--indices`
50+
* **cluster-health**: Verifies that cluster health is green before proceeding with the ingestion
51+
* **bulk**: Ingests documents collected from the indices specified in `--indices`
52+
* **default**: Runs a match-all query on the index for a number of iterations
53+
54+
To invoke the newly created workload, run the following:
55+
```
56+
$ opensearch-benchmark execute_test \
57+
--pipeline="benchmark-only" \
58+
--workload-path="<PATH PRINTED BY THE CREATE-WORKLOAD COMMAND>" \
59+
--target-hosts="<CLUSTER ENDPOINT>" \
60+
--client-options="basic_auth_user:'<USERNAME>',basic_auth_password:'<PASSWORD>'"
61+
```
62+
63+
Users can optionally specify a subset of documents from the index or override the default `match_all` query. See the following sections for details.
64+
65+
### Adding Custom Queries
66+
Add `--custom-queries` to the `create-workload` command. This parameter takes the path to a JSON file. The queries in that file replace the default `match_all` query.
67+
68+
Requirements:
69+
* Ensure that the queries are well-formed, valid JSON.
70+
* Ensure that all queries are contained within a list. Exception: If providing only a single query, it does not have to be in a list.
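
The requirements above can be checked before running `create-workload`. The following is a minimal sketch, not part of OpenSearch Benchmark: `check_queries` is a hypothetical helper, and the list of required keys is an assumption inferred from the fields that the custom query template in this commit reads (`name`, `operation-type`, `body`).

```python
import json

# Keys the custom query template in this commit reads from each entry (assumed required).
REQUIRED_KEYS = ("name", "operation-type", "body")


def check_queries(text):
    """Hypothetical pre-check: return a list of problems found in a queries file."""
    try:
        data = json.loads(text)
    except ValueError as err:
        return [f"not valid JSON: {err}"]
    # A single query object is accepted without a surrounding list.
    if isinstance(data, dict):
        data = [data]
    if not isinstance(data, list):
        return ["top level must be a query object or a list of query objects"]
    problems = []
    for position, query in enumerate(data):
        if not isinstance(query, dict):
            problems.append(f"entry {position} is not a JSON object")
            continue
        for key in REQUIRED_KEYS:
            if key not in query:
                problems.append(f"entry {position} is missing '{key}'")
    return problems


good = '[{"name": "term", "operation-type": "search", "body": {"query": {"term": {"director": "Ian"}}}}]'
bad = '[{"name": "term", "body": {}}]'
print(check_queries(good))  # an empty list means no problems were found
print(check_queries(bad))
```

Returning a list of problems rather than raising on the first one lets a user fix every entry in a single pass.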
71+
72+
Continuing the previous example, suppose a user wants to replace the default query with the following two custom queries, stored in a JSON file.
73+
```
74+
[
75+
{
76+
"name": "default",
77+
"operation-type": "search",
78+
"body": {
79+
"query": {
80+
"match_all": {}
81+
}
82+
}
83+
},
84+
{
85+
"name": "term",
86+
"operation-type": "search",
87+
"body": {
88+
"query": {
89+
"term": {
90+
"director": "Ian"
91+
}
92+
}
93+
}
94+
}
95+
]
96+
```
97+
98+
To do this, the user provides the path to the JSON file through the `--custom-queries` parameter:
99+
```
100+
$ opensearch-benchmark create-workload \
101+
--workload="<WORKLOAD NAME>" \
102+
--target-hosts="<CLUSTER ENDPOINT>" \
103+
--client-options="basic_auth_user:'<USERNAME>',basic_auth_password:'<PASSWORD>'" \
104+
--indices="<INDICES TO GENERATE WORKLOAD FROM>" \
105+
--output-path="<LOCAL DIRECTORY PATH TO STORE WORKLOAD>" \
106+
--custom-queries="<JSON filepath containing queries>"
107+
```
108+
109+
### Common Errors
110+
When adding custom queries, users might see the following error if the queries are not valid JSON or are not contained in a list.
111+
```
112+
[INFO] You did not provide an explicit timeout in the client options. Assuming default of 10 seconds.
113+
[ERROR] Cannot create-workload. Ensure JSON schema is valid and queries are contained in a list: Extra data: line 9 column 2 (char 113)
114+
```
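
The `Extra data` text in the error above is the message Python's `json` module produces when valid JSON is followed by more content, which is what happens when several query objects are not wrapped in a list. The sketch below reproduces that condition outside of OpenSearch Benchmark. `try_parse` is a hypothetical helper, and the line and column numbers in the message depend on the input file.

```python
import json

# Two query objects separated by a comma but not wrapped in [ ].
not_in_a_list = '''
{"name": "default", "operation-type": "search", "body": {"query": {"match_all": {}}}},
{"name": "term", "operation-type": "search", "body": {"query": {"term": {"director": "Ian"}}}}
'''


def try_parse(text):
    """Return (data, None) on success or (None, message) on a JSON error."""
    try:
        return json.loads(text), None
    except ValueError as err:  # json.JSONDecodeError is a subclass of ValueError
        return None, str(err)


# Parsing stops after the first object and reports the rest as extra data.
_, message = try_parse(not_in_a_list)
print(message)

# Wrapping the same objects in a list makes the input valid.
data, message = try_parse("[" + not_in_a_list + "]")
print(len(data), message)
```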

README.md

+3
@@ -103,6 +103,9 @@ After the test execution, a summary report is written to the command line:
103103
[INFO] SUCCESS (took 2634 seconds)
104104
----------------------------------
105105

106+
Creating Your Own Workloads
107+
---------------------------
108+
For more information on how users can create their own workloads, see [the Create Workload Guide](./CREATE_WORKLOAD_GUIDE.md).
106109

107110
Getting help
108111
------------

osbenchmark/benchmark.py

+6
@@ -176,6 +176,11 @@ def add_workload_source(subparser):
176176
"--output-path",
177177
default=os.path.join(os.getcwd(), "workloads"),
178178
help="Workload output directory (default: workloads/)")
179+
create_workload_parser.add_argument(
180+
"--custom-queries",
181+
type=argparse.FileType('r'),
182+
help="Input JSON file to use containing custom workload queries that override the default match_all query"
183+
)
179184

180185
generate_parser = subparsers.add_parser("generate", help="Generate artifacts")
181186
generate_parser.add_argument(
@@ -904,6 +909,7 @@ def dispatch_sub_command(arg_parser, args, cfg):
904909
cfg.add(config.Scope.applicationOverride, "generator", "indices", args.indices)
905910
cfg.add(config.Scope.applicationOverride, "generator", "output.path", args.output_path)
906911
cfg.add(config.Scope.applicationOverride, "workload", "workload.name", args.workload)
912+
cfg.add(config.Scope.applicationOverride, "workload", "custom_queries", args.custom_queries)
907913
configure_connection_params(arg_parser, args, cfg)
908914

909915
workload_generator.create_workload(cfg)
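
Because `--custom-queries` is declared with `type=argparse.FileType('r')`, the parsed value is an already opened file object rather than a path string, and it is `None` when the flag is omitted. The following standalone sketch illustrates that behavior with a throwaway parser. The parser name and the temporary file are illustrative assumptions, not part of the commit.

```python
import argparse
import json
import os
import tempfile

# A throwaway parser that declares the flag the same way the commit does.
parser = argparse.ArgumentParser(prog="sketch")
parser.add_argument("--custom-queries", type=argparse.FileType("r"))

# Omitting the flag leaves the attribute as None.
print(parser.parse_args([]).custom_queries)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "queries.json")
    with open(path, "w") as handle:
        json.dump([{"name": "default"}], handle)
    # argparse opens the file while parsing; the caller is responsible for closing it.
    args = parser.parse_args(["--custom-queries", path])
    with args.custom_queries as queries:
        loaded = json.load(queries)

print(loaded)
```

One consequence of this design is that a missing or unreadable file is reported by argparse during argument parsing, before any workload generation code runs.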

osbenchmark/resources/workload.json.j2 osbenchmark/resources/base-workload.json.j2

+3-2
@@ -50,6 +50,7 @@
5050
"ingest-percentage": {{ingest_percentage | default(100)}}
5151
},
5252
"clients": {{bulk_indexing_clients | default(8)}}
53-
}
54-
]{% endraw %}
53+
},{% endraw -%}
54+
{% block queries %}{% endblock %}
55+
]
5556
}
osbenchmark/resources/custom-query-workload.json.j2
@@ -0,0 +1,13 @@
1+
{% extends "base-workload.json.j2" %}
2+
3+
{%- block queries -%}
4+
{% for query in custom_queries %}
5+
{
6+
"operation": {
7+
"name": "{{query.name}}",
8+
"operation-type": "{{query['operation-type']}}",
9+
"body": {{query.body | replace("'", '"') }}
10+
}
11+
}{% if not loop.last %},{% endif -%}
12+
{% endfor %}
13+
{%- endblock %}
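
The `replace("'", '"')` filter in the template above turns the text form of a Python dict into JSON by swapping quote characters. Assuming the body is rendered through Python's `str()`, this works for simple queries but can produce invalid JSON when a value contains an apostrophe or a boolean. The sketch below compares that approach with `json.dumps`. The example query body is made up for illustration.

```python
import json

# A hypothetical query body with an apostrophe in a value and a boolean flag.
body = {"query": {"match": {"title": "Ocean's Eleven"}}, "track_total_hits": True}

# Mirrors the quote swap in the template, assuming the dict is rendered with str().
swapped = str(body).replace("'", '"')


def is_valid_json(text):
    """Return True if the text parses as JSON."""
    try:
        json.loads(text)
        return True
    except ValueError:
        return False


print(is_valid_json(swapped))           # the quote swap breaks on this input
print(is_valid_json(json.dumps(body)))  # json.dumps escapes and serializes correctly
```

Inside a template, Jinja2's built-in `tojson` filter is one way to get the same serialization. Whether it suits this template would need to be checked against the generated workload files.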
osbenchmark/resources/default-query-workload.json.j2
@@ -0,0 +1,15 @@
1+
{% extends "base-workload.json.j2" %}
2+
3+
{% block queries %}
4+
{
5+
"operation": {
6+
"operation-type": "search",
7+
"body": {
8+
"query": {
9+
"match_all": {}
10+
}
11+
}
12+
},{% raw %}
13+
"clients": {{search_clients | default(8)}}{% endraw %}
14+
}
15+
{%- endblock %}

osbenchmark/workload_generator/workload_generator.py

+27-3
@@ -24,11 +24,12 @@
2424

2525
import logging
2626
import os
27+
import json
2728

2829
from opensearchpy import OpenSearchException
2930
from jinja2 import Environment, FileSystemLoader, select_autoescape
3031

31-
from osbenchmark import PROGRAM_NAME
32+
from osbenchmark import PROGRAM_NAME, exceptions
3233
from osbenchmark.client import OsClientFactory
3334
from osbenchmark.workload_generator import corpus, index
3435
from osbenchmark.utils import io, opts, console
@@ -61,6 +62,19 @@ def extract_mappings_and_corpora(client, output_path, indices_to_extract):
6162

6263
return indices, corpora
6364

65+
def process_custom_queries(custom_queries):
66+
if not custom_queries:
67+
return []
68+
69+
with custom_queries as queries:
70+
try:
71+
data = json.load(queries)
72+
if isinstance(data, dict):
73+
data = [data]
74+
except ValueError as err:
75+
raise exceptions.SystemSetupError(f"Ensure JSON schema is valid and queries are contained in a list: {err}")
76+
77+
return data
6478

6579
def create_workload(cfg):
6680
logger = logging.getLogger(__name__)
@@ -70,6 +84,9 @@ def create_workload(cfg):
7084
root_path = cfg.opts("generator", "output.path")
7185
target_hosts = cfg.opts("client", "hosts")
7286
client_options = cfg.opts("client", "options")
87+
unprocessed_custom_queries = cfg.opts("workload", "custom_queries")
88+
89+
custom_queries = process_custom_queries(unprocessed_custom_queries)
7390

7491
logger.info("Creating workload [%s] matching indices [%s]", workload_name, indices)
7592

@@ -89,12 +106,19 @@ def create_workload(cfg):
89106
template_vars = {
90107
"workload_name": workload_name,
91108
"indices": indices,
92-
"corpora": corpora
109+
"corpora": corpora,
110+
"custom_queries": custom_queries
93111
}
94112

113+
logger.info("Template Vars: %s", template_vars)
114+
95115
workload_path = os.path.join(output_path, "workload.json")
96116
templates_path = os.path.join(cfg.opts("node", "benchmark.root"), "resources")
97-
process_template(templates_path, "workload.json.j2", template_vars, workload_path)
117+
118+
if custom_queries:
119+
process_template(templates_path, "custom-query-workload.json.j2", template_vars, workload_path)
120+
else:
121+
process_template(templates_path, "default-query-workload.json.j2", template_vars, workload_path)
98122

99123
console.println("")
100124
console.info(f"Workload {workload_name} has been created. Run it with: {PROGRAM_NAME} --workload-path={output_path}")
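
The loading behavior of `process_custom_queries` in this diff can be exercised in isolation. The sketch below is a stand-in, not the committed function: `load_queries` is a hypothetical name, it lets `ValueError` propagate instead of raising `exceptions.SystemSetupError`, and it reads from in-memory `io.StringIO` objects instead of real files.

```python
import io
import json


def load_queries(handle):
    """Stand-in for process_custom_queries; parse errors propagate as ValueError."""
    if not handle:
        return []
    with handle as queries:
        data = json.load(queries)
    # A single query object is wrapped so callers always iterate over a list.
    if isinstance(data, dict):
        data = [data]
    return data


print(load_queries(None))
print(load_queries(io.StringIO('{"name": "default"}')))
# Only dicts are wrapped: other valid JSON values pass through unchanged.
print(load_queries(io.StringIO('"just a string"')))
```

The last call suggests that a file containing valid JSON that is neither an object nor a list is not rejected at this stage, so any problem would surface later during template rendering.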
