|
| 1 | +# Create Workload Guide |
| 2 | + |
| 3 | +This guide explains how to use the `create-workload` subcommand in OpenSearch Benchmark to create a workload from pre-existing data in a cluster. |
| 4 | + |
| 5 | +### Create a Workload from Pre-Existing Indices in a Cluster |
| 6 | + |
| 7 | +**Prerequisites:** |
| 8 | +* An OpenSearch cluster with data ingested into at least one index. Ensure that the index contains at least 1000 docs. With fewer docs, the workload is still created, but it cannot be run with `--test-mode`. |
| 9 | +* Ensure that your cluster is permissive. |
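One way to check the 1000-doc prerequisite before creating a workload is to look at the index's document count. The following is a minimal sketch, not part of OpenSearch Benchmark: it assumes the count comes from the OpenSearch `_count` API (`GET /<index>/_count`), and the helper names `supports_test_mode` and `fetch_count` are hypothetical. `fetch_count` also assumes an endpoint without authentication or TLS.

```python
import json
from urllib.request import urlopen

# Threshold taken from the prerequisite above.
MIN_DOCS_FOR_TEST_MODE = 1000


def supports_test_mode(count_response: dict, minimum: int = MIN_DOCS_FOR_TEST_MODE) -> bool:
    """Return True if a `_count` response reports at least `minimum` docs."""
    return int(count_response["count"]) >= minimum


def fetch_count(endpoint: str, index: str) -> dict:
    """Hypothetical helper: call GET /<index>/_count on an unauthenticated endpoint."""
    url = f"{endpoint.rstrip('/')}/{index}/_count"
    with urlopen(url) as response:
        return json.load(response)


# Offline example using a hand-written response shaped like a `_count` reply.
sample_response = {"count": 2000}
print(supports_test_mode(sample_response))
```

For a secured cluster, the request would need credentials and TLS handling that this sketch leaves out.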
| 10 | + |
| 11 | +Create a workload with the following command: |
| 12 | +``` |
| 13 | +$ opensearch-benchmark create-workload \ |
| 14 | +--workload="<WORKLOAD NAME>" \ |
| 15 | +--target-hosts="<CLUSTER ENDPOINT>" \ |
| 16 | +--client-options="basic_auth_user:'<USERNAME>',basic_auth_password:'<PASSWORD>'" \ |
| 17 | +--indices="<INDICES TO GENERATE WORKLOAD FROM>" \ |
| 18 | +--output-path="<LOCAL DIRECTORY PATH TO STORE WORKLOAD>" |
| 19 | +``` |
| 20 | +Note that: |
| 21 | +* `--indices` can be 1+ indices specified in a comma-separated list. |
| 22 | +* If the cluster uses basic authentication and has TLS enabled, users must provide the corresponding credentials and settings through `--client-options`. |
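When the command is assembled by a script, the comma-separated `--indices` value and the optional `--client-options` string can be built programmatically. The sketch below is an illustration under stated assumptions: `build_create_workload_command` is a hypothetical helper, the example values (`localhost:9200`, `ratings`, `/tmp/workloads`) are made up, and only the flags shown in the command above are used. TLS-related client options are deliberately omitted because this guide does not list them.

```python
import shlex


def build_create_workload_command(workload, target_hosts, indices, output_path,
                                  username=None, password=None):
    """Assemble the create-workload argv using only flags shown in this guide."""
    argv = [
        "opensearch-benchmark",
        "create-workload",
        f"--workload={workload}",
        f"--target-hosts={target_hosts}",
        # One or more indices, joined into a comma-separated list.
        f"--indices={','.join(indices)}",
        f"--output-path={output_path}",
    ]
    # Add basic-auth client options only when both credentials are given.
    if username is not None and password is not None:
        argv.append(
            f"--client-options=basic_auth_user:'{username}',"
            f"basic_auth_password:'{password}'"
        )
    return argv


# Hypothetical values for illustration only.
argv = build_create_workload_command(
    "movies", "localhost:9200", ["movies", "ratings"], "/tmp/workloads"
)
print(shlex.join(argv))
```

Returning a list rather than a single string lets the result be passed directly to `subprocess.run` without shell quoting concerns.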
| 23 | + |
| 24 | +The following is example output from creating a workload from an index called `movies` that contains 2000 docs. |
| 25 | + |
| 26 | +``` |
| 27 | + ____ _____ __ ____ __ __ |
| 28 | + / __ \____ ___ ____ / ___/___ ____ ___________/ /_ / __ )___ ____ _____/ /_ ____ ___ ____ ______/ /__ |
| 29 | + / / / / __ \/ _ \/ __ \\__ \/ _ \/ __ `/ ___/ ___/ __ \ / __ / _ \/ __ \/ ___/ __ \/ __ `__ \/ __ `/ ___/ //_/ |
| 30 | +/ /_/ / /_/ / __/ / / /__/ / __/ /_/ / / / /__/ / / / / /_/ / __/ / / / /__/ / / / / / / / / /_/ / / / ,< |
| 31 | +\____/ .___/\___/_/ /_/____/\___/\__,_/_/ \___/_/ /_/ /_____/\___/_/ /_/\___/_/ /_/_/ /_/ /_/\__,_/_/ /_/|_| |
| 32 | + /_/ |
| 33 | +
|
| 34 | +[INFO] You did not provide an explicit timeout in the client options. Assuming default of 10 seconds. |
| 35 | +[INFO] Connected to OpenSearch cluster [380d8fd64dd85b5f77c0ad81b0799e1e] version [1.1.0]. |
| 36 | +
|
| 37 | +Extracting documents for index [movies] for test mode... 1000/1000 docs [100.0% done] |
| 38 | +Extracting documents for index [movies]... 2000/2000 docs [100.0% done] |
| 39 | +
|
| 40 | +[INFO] Workload movies has been created. Run it with: opensearch-benchmark --workload-path=/Users/hoangia/Desktop/workloads/movies |
| 41 | +
|
| 42 | +------------------------------- |
| 43 | +[INFO] SUCCESS (took 2 seconds) |
| 44 | +------------------------------- |
| 45 | +``` |
| 46 | + |
| 47 | +By default, a created workload contains the following operations, which run in this order: |
| 48 | +* **delete-index**: Deletes any pre-existing indices with the same name(s) as the indices provided in `--indices` |
| 49 | +* **create-index**: Creates the index with the same name(s) as the indices provided in `--indices` |
| 50 | +* **cluster-health**: Verifies that cluster health is green before proceeding with the ingestion |
| 51 | +* **bulk**: Ingests documents collected from the indices specified in `--indices` |
| 52 | +* **default**: Runs a match-all query on the index for a number of iterations |
| 53 | + |
| 54 | +To invoke the newly created workload, run the following: |
| 55 | +``` |
| 56 | +$ opensearch-benchmark execute_test \ |
| 57 | +--pipeline="benchmark-only" \ |
| 58 | +--workload-path="<WORKLOAD PATH PRINTED BY THE CREATE-WORKLOAD COMMAND>" \ |
| 59 | +--target-hosts="<CLUSTER ENDPOINT>" \ |
| 60 | +--client-options="basic_auth_user:'<USERNAME>',basic_auth_password:'<PASSWORD>'" |
| 61 | +``` |
| 62 | + |
| 63 | +Users have the option to extract only a subset of documents from the index or to override the default match_all query. See the following sections for details. |
| 64 | + |
| 65 | +### Adding Custom Queries |
| 66 | +Add `--custom-queries` to the `create-workload` command. This parameter takes the path to a JSON file and replaces the default match_all query with the queries in that file. |
| 67 | + |
| 68 | +Requirements: |
| 69 | +* Ensure that the queries are valid, properly formatted JSON. |
| 70 | +* Ensure that all queries are contained in a list. Exception: a single query does not have to be in a list. |
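The two requirements above can be checked before passing a file to `--custom-queries`. The following is a rough sketch of such a pre-check, not OpenSearch Benchmark's own validation logic: `load_custom_queries` is a hypothetical helper that parses the text as JSON and wraps a single query in a list.

```python
import json


def load_custom_queries(text: str) -> list:
    """Parse custom queries and normalize them to a list of query objects."""
    data = json.loads(text)  # raises json.JSONDecodeError on invalid JSON
    if isinstance(data, dict):
        # A single query is allowed without a surrounding list.
        return [data]
    if isinstance(data, list) and all(isinstance(query, dict) for query in data):
        return data
    raise ValueError("expected a JSON object or a list of JSON objects")


single = '{"name": "default", "operation-type": "search", "body": {"query": {"match_all": {}}}}'
queries = load_custom_queries(single)
print(len(queries))
```

To check a file on disk, read it first, for example with `pathlib.Path(path).read_text()`, and pass the contents to the helper.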
| 71 | + |
| 72 | +Continuing the previous example, suppose a user wants to override the default query with the following two custom queries in a JSON file. |
| 73 | +``` |
| 74 | +[ |
| 75 | + { |
| 76 | + "name": "default", |
| 77 | + "operation-type": "search", |
| 78 | + "body": { |
| 79 | + "query": { |
| 80 | + "match_all": {} |
| 81 | + } |
| 82 | + } |
| 83 | + }, |
| 84 | + { |
| 85 | + "name": "term", |
| 86 | + "operation-type": "search", |
| 87 | + "body": { |
| 88 | + "query": { |
| 89 | + "term": { |
| 90 | + "director": "Ian" |
| 91 | + } |
| 92 | + } |
| 93 | + } |
| 94 | + } |
| 95 | +] |
| 96 | +``` |
| 97 | + |
| 98 | +To do this, the user provides the JSON file path to the `--custom-queries` parameter: |
| 99 | +``` |
| 100 | +$ opensearch-benchmark create-workload \ |
| 101 | +--workload="<WORKLOAD NAME>" \ |
| 102 | +--target-hosts="<CLUSTER ENDPOINT>" \ |
| 103 | +--client-options="basic_auth_user:'<USERNAME>',basic_auth_password:'<PASSWORD>'" \ |
| 104 | +--indices="<INDICES TO GENERATE WORKLOAD FROM>" \ |
| 105 | +--output-path="<LOCAL DIRECTORY PATH TO STORE WORKLOAD>" \ |
| 106 | +--custom-queries="<JSON filepath containing queries>" |
| 107 | +``` |
| 108 | + |
| 109 | +### Common Errors |
| 110 | +When adding custom queries, users might see the following error if the queries are not valid JSON or are not contained in a list. |
| 111 | +``` |
| 112 | +[INFO] You did not provide an explicit timeout in the client options. Assuming default of 10 seconds. |
| 113 | +[ERROR] Cannot create-workload. Ensure JSON schema is valid and queries are contained in a list: Extra data: line 9 column 2 (char 113) |
| 114 | +``` |
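The `Extra data` wording in the error above matches the message that Python's standard `json` module produces when a document contains more than one top-level value, such as two query objects that are not wrapped in a list. The sketch below reproduces that situation outside of OpenSearch Benchmark; `explain_json_error` is a hypothetical helper, and the sample strings are made up for illustration.

```python
import json


def explain_json_error(text: str):
    """Return a short description of the JSON error in `text`, or None if it parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # msg, lineno, and colno are attributes of json.JSONDecodeError.
        return f"{err.msg} at line {err.lineno}, column {err.colno}"
    return None


# Two query objects separated by a comma but not enclosed in a list.
broken = '{"name": "default"},\n{"name": "term"}'
# The same queries wrapped in a list parse cleanly.
fixed = "[" + broken + "]"

print(explain_json_error(broken))
print(explain_json_error(fixed))
```

The reported line and column point at the first character after the first complete JSON value, which is usually where the missing list brackets should be added.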