Commit adf5dcd

spbjss authored and Alex committed
Update readme to add more information (opensearch-project#81)
* Create JvmService instance on demand.
* Move the ml_parameters from XContent to the request parameters to avoid the conflict with search XContent input.
* Fix the security risks found by PenTest. 1. unhandled 500 server error. 2. Insecure Deserialization
* Remove unnecessory '*' from the welcome list of model deserializer.
* Update readme to add more information.
* Add developer guide to the document.
* Add documents for ml-commens.
* Sync the build scripts
* Remove the dependencies added to support Mleap.

Signed-off-by: Alex <pengsun@amazon.com>
Co-authored-by: Alex <pengsun@amazon.com>
1 parent 24a4c9c commit adf5dcd

4 files changed

+183
-106
lines changed

CONTRIBUTING.md

+3-102
@@ -1,103 +1,4 @@
1-
# Contributing Guidelines
1+
## Contributing to this Project
22

3-
Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional
4-
documentation, we greatly value feedback and contributions from our community.
5-
6-
Please read through this document before submitting any issues or pull requests to ensure we have all the necessary
7-
information to effectively respond to your bug report or contribution.
8-
9-
10-
## Reporting Bugs/Feature Requests
11-
12-
We welcome you to use the GitHub issue tracker to report bugs or suggest features.
13-
14-
When filing an issue, please check existing open, or recently closed, issues to make sure somebody else hasn't already
15-
reported the issue. Please try to include as much information as you can. Details like these are incredibly useful:
16-
17-
* A reproducible test case or series of steps
18-
* The version of our code being used
19-
* Any modifications you've made relevant to the bug
20-
* Anything unusual about your environment or deployment
21-
22-
23-
## Contributing via Pull Requests
24-
Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that:
25-
26-
1. You are working against the latest source on the *main* branch.
27-
2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already.
28-
3. You open an issue to discuss any significant work - we would hate for your time to be wasted.
29-
30-
To send us a pull request, please:
31-
32-
1. Fork the repository.
33-
2. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change.
34-
3. Ensure local tests pass.
35-
4. Commit to your fork using clear commit messages.
36-
5. Send us a pull request, answering any default questions in the pull request interface.
37-
6. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation.
38-
39-
GitHub provides additional document on [forking a repository](https://help.github.com/articles/fork-a-repo/) and
40-
[creating a pull request](https://help.github.com/articles/creating-a-pull-request/).
41-
42-
43-
## Finding contributions to work on
44-
Looking at the existing issues is a great way to find something to contribute on. As our projects, by default, use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any 'help wanted' issues is a great place to start.
45-
46-
47-
## Code of Conduct
48-
This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct).
49-
For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact
50-
opensource-codeofconduct@amazon.com with any additional questions or comments.
51-
52-
## Developer Certificate of Origin
53-
54-
OpenSearch is an open source product released under the Apache 2.0 license (see either [the Apache site](https://www.apache.org/licenses/LICENSE-2.0) or the [LICENSE.txt file](./LICENSE.txt)). The Apache 2.0 license allows you to freely use, modify, distribute, and sell your own products that include Apache 2.0 licensed software.
55-
56-
We respect intellectual property rights of others and we want to make sure all incoming contributions are correctly attributed and licensed. A Developer Certificate of Origin (DCO) is a lightweight mechanism to do that.
57-
58-
The DCO is a declaration attached to every contribution made by every developer. In the commit message of the contribution, the developer simply adds a `Signed-off-by` statement and thereby agrees to the DCO, which you can find below or at [DeveloperCertificate.org](http://developercertificate.org/).
59-
60-
```
61-
Developer's Certificate of Origin 1.1
62-
63-
By making a contribution to this project, I certify that:
64-
65-
(a) The contribution was created in whole or in part by me and I
66-
have the right to submit it under the open source license
67-
indicated in the file; or
68-
69-
(b) The contribution is based upon previous work that, to the
70-
best of my knowledge, is covered under an appropriate open
71-
source license and I have the right under that license to
72-
submit that work with modifications, whether created in whole
73-
or in part by me, under the same open source license (unless
74-
I am permitted to submit under a different license), as
75-
Indicated in the file; or
76-
77-
(c) The contribution was provided directly to me by some other
78-
person who certified (a), (b) or (c) and I have not modified
79-
it.
80-
81-
(d) I understand and agree that this project and the contribution
82-
are public and that a record of the contribution (including
83-
all personal information I submit with it, including my
84-
sign-off) is maintained indefinitely and may be redistributed
85-
consistent with this project or the open source license(s)
86-
involved.
87-
```
88-
We require that every contribution to OpenSearch is signed with a Developer Certificate of Origin. Additionally, please use your real name. We do not accept anonymous contributors nor those utilizing pseudonyms.
89-
90-
Each commit must include a DCO which looks like this
91-
92-
```
93-
Signed-off-by: Jane Smith <jane.smith@email.com>
94-
```
95-
You may type this line on your own when writing your commit messages. However, if your user.name and user.email are set in your git configs, you can use `-s` or `--signoff` to add the `Signed-off-by` line to the end of the commit message.
96-
97-
## Security issue notifications
98-
If you discover a potential security issue in this project we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public github issue.
99-
100-
101-
## Licensing
102-
103-
See the [LICENSE](LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution.
3+
OpenSearch is a community project that is built and maintained by people just like **you**.
4+
[This document](https://github.com/opensearch-project/.github/blob/main/CONTRIBUTING.md) explains how you can contribute to this and related projects.

DEVELOPER_GUIDE.md

+65
@@ -0,0 +1,65 @@
1+
- [Developer Guide](#developer-guide)
2+
- [Forking and Cloning](#forking-and-cloning)
3+
- [Install Prerequisites](#install-prerequisites)
4+
- [JDK 14](#jdk-14)
5+
- [Setup](#setup)
6+
- [Build](#build)
7+
- [Building from the command line](#building-from-the-command-line)
8+
- [Building from the IDE](#building-from-the-ide)
9+
- [Debugging](#debugging)
10+
11+
## Developer Guide
12+
13+
### Forking and Cloning
14+
15+
Fork this repository on GitHub, and clone locally with `git clone`.
16+
17+
### Install Prerequisites
18+
19+
#### JDK 14
20+
21+
OpenSearch components require Java 14 at a minimum to build. You must have JDK 14 installed, with the `JAVA_HOME` environment variable pointing to the Java home of that installation, e.g. `JAVA_HOME=/usr/lib/jvm/jdk-14`.
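As a quick sanity check of the prerequisite above, the following minimal sketch prints `JAVA_HOME` and the feature version of the JVM that runs it. The class name `JdkCheck` and the helper `isSupported` are hypothetical and not part of this repository. Note that it reports the JVM on your `PATH`, which may differ from the one `JAVA_HOME` points to.

```java
// Hypothetical helper, not part of ml-commons: checks the running JDK against
// the minimum version stated in this guide.
public class JdkCheck {

    // Minimum Java feature version required by the build, per this guide.
    static final int MIN_FEATURE_VERSION = 14;

    /** Returns true when the given feature version satisfies the minimum. */
    static boolean isSupported(int featureVersion) {
        return featureVersion >= MIN_FEATURE_VERSION;
    }

    public static void main(String[] args) {
        // Runtime.version().feature() is available since Java 10.
        int feature = Runtime.version().feature();
        System.out.println("JAVA_HOME = " + System.getenv("JAVA_HOME"));
        System.out.println("Running on Java " + feature
            + (isSupported(feature) ? " (meets the minimum)" : " (older than the minimum)"));
    }
}
```

On Java 11 or later, it can be run directly with `java JdkCheck.java`.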
22+
23+
### Setup
24+
25+
1. Clone the repository (see [Forking and Cloning](#forking-and-cloning))
26+
2. Make sure `JAVA_HOME` is pointing to a Java 14 JDK (see [Install Prerequisites](#install-prerequisites))
27+
3. Launch IntelliJ IDEA and choose Import Project.
28+
29+
### Build
30+
31+
This package uses the [Gradle](https://docs.gradle.org/current/userguide/userguide.html) build system. Gradle comes with excellent documentation that should be your first stop when trying to figure out how to operate or modify the build. We also use the OpenSearch build tools for Gradle. These tools are idiosyncratic and don't always follow the conventions and instructions for building regular Java code using Gradle. Not everything in this package will work the way it's described in the Gradle documentation. If you encounter such a situation, the OpenSearch build tools [source code](https://github.com/opensearch-project/OpenSearch/tree/main/buildSrc/src/main/groovy/org/opensearch/gradle) is your best bet for figuring out what's going on.
32+
33+
#### Building from the command line
34+
35+
1. `./gradlew build` builds and tests
36+
2. `./gradlew :run` launches a single node cluster with ml-commons plugin installed
37+
3. `./gradlew :integTest` launches a single node cluster with ml-commons plugin installed and runs all integration tests except security
38+
4. `./gradlew :integTest --tests="**.test execute foo"` runs a single integration test class or method
39+
5. `./gradlew spotlessApply` formats code. You can also import the formatting rules in `.eclipseformat.xml` into your IDE.
40+
41+
When launching a cluster using one of the above commands, logs are placed in `/build/cluster/run node0/opensearch-<version>/logs`. Though the logs are teed to the console, in practice it's best to check the actual log file.
42+
43+
#### Building from the IDE
44+
45+
Currently, the only IDE we support is IntelliJ IDEA. It's free, it's open source, it works. The Gradle tasks above can also be launched from IntelliJ's Gradle toolbar, and the extra parameters can be passed in via the Launch Configurations VM arguments.
46+
47+
#### Debugging
48+
49+
Sometimes it's useful to attach a debugger to either the OpenSearch cluster or the integ tests to see what's going on. When running unit tests, you can just hit 'Debug' from the IDE's gutter to debug the tests. To debug code running in an actual server, run:
50+
51+
```
52+
./gradlew :integTest --debug-jvm # to start a cluster and run integ tests
53+
# OR
54+
./gradlew :run --debug-jvm # to just start a cluster that can be debugged
55+
```
56+
57+
The OpenSearch server JVM will launch suspended and wait for a debugger to attach to `localhost:8000` before starting the OpenSearch server.
58+
59+
To debug code running in an integ test (which exercises the server from a separate JVM), run:
60+
61+
```
62+
./gradlew -Dtest.debug :integTest
63+
```
64+
65+
The test runner JVM will start suspended and wait for a debugger to attach to `localhost:5005` before running the tests.

LICENSE

+27
@@ -173,3 +173,30 @@
173173
defend, and hold each Contributor harmless for any liability
174174
incurred by, or claims asserted against, such Contributor by reason
175175
of your accepting any such warranty or additional liability.
176+
177+
END OF TERMS AND CONDITIONS
178+
179+
APPENDIX: How to apply the Apache License to your work.
180+
181+
To apply the Apache License to your work, attach the following
182+
boilerplate notice, with the fields enclosed by brackets "[]"
183+
replaced with your own identifying information. (Don't include
184+
the brackets!) The text should be enclosed in the appropriate
185+
comment syntax for the file format. We also recommend that a
186+
file or class name and description of purpose be included on the
187+
same "printed page" as the copyright notice for easier
188+
identification within third-party archives.
189+
190+
Copyright [yyyy] [name of copyright owner]
191+
192+
Licensed under the Apache License, Version 2.0 (the "License");
193+
you may not use this file except in compliance with the License.
194+
You may obtain a copy of the License at
195+
196+
http://www.apache.org/licenses/LICENSE-2.0
197+
198+
Unless required by applicable law or agreed to in writing, software
199+
distributed under the License is distributed on an "AS IS" BASIS,
200+
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
201+
See the License for the specific language governing permissions and
202+
limitations under the License.

README.md

+88-4
@@ -1,12 +1,96 @@
1-
## OpenSearch Machine Learning
1+
<img src="https://opensearch.org/assets/brand/SVG/Logo/opensearch_logo_default.svg" height="64px"/>
22

3-
Machine Learning Framework for OpenSearch is a new solution that make it easy to develop new machine learning feature. It allows engineers to leverage existing opensource machine learning algorithms and reduce the efforts to build any new machine learning feature. It also removes the necessity from engineers to manage the machine learning tasks which will help to speed the feature developing process.
3+
<!-- TOC -->
4+
5+
- [OpenSearch Machine Learning Commons](#opensearch-machine-learning-commons)
6+
- [Contributing](#contributing)
7+
- [Code of Conduct](#code-of-conduct)
8+
- [Security](#security)
9+
- [License](#license)
10+
- [Copyright](#copyright)
11+
12+
<!-- /TOC -->
13+
14+
## OpenSearch Machine Learning Commons
15+
16+
Machine Learning Commons for OpenSearch is a new solution that makes it easy to develop new machine learning features. It allows engineers to leverage existing open-source machine learning algorithms and reduces the effort needed to build any new machine learning feature. It also removes the need for engineers to manage machine learning tasks, which helps speed up feature development.
17+
18+
### Problem Statement
19+
20+
Building a new machine learning feature inside OpenSearch is still a significant challenge. The reasons include:
21+
22+
* **Disruption to OpenSearch Core features**. Machine learning is very computationally intensive, but there is currently no way to add dedicated computation resources in OpenSearch for machine learning jobs. These jobs therefore have to share the same resources as Core features such as indexing and searching. That can increase the latency of search requests and cause circuit breaker exceptions on memory usage. To address this, we have to carefully distribute models and limit the data size when running anomaly detection (AD) jobs. As more and more ML features are added to OpenSearch, this will become much harder to manage.
23+
* **Lack of support for machine learning algorithms.** Customers need more algorithms within OpenSearch; otherwise the data must first be exported outside of OpenSearch, for example to S3, to do the job, which adds cost and latency.
24+
* **Lack of a resource management mechanism between multiple machine learning jobs.** It's hard to coordinate resources across multiple features.
25+
26+
27+
Meanwhile, we see more and more machine learning features that need to be supported in OpenSearch to power end users’ business needs. For instance:
28+
29+
* **Forecasting**: Forecasting is very popular in time series data analysis. Although past data isn’t always an indicator of the future, it’s still a very powerful tool in some use cases, such as capacity planning to scale service hosts up or down in IT operations.
30+
* **Root Cause Analysis in DevOps**: Today some customers use OpenSearch for IT operations. Identifying the root cause of an outage or incident is becoming more and more complicated, since it requires gathering all the information in the ecosystem, such as logs, traces, and metrics. Machine learning techniques are a great fit for this problem: they can build topology models of the system automatically and help understand the similarity and causal relations between events.
31+
* **Machine Learning in SIEM**: SIEM (Security Information and Event Management) is another domain in OpenSearch. Machine learning is also very useful in SIEM: it helps facilitate security analytics, reduces the effort spent on sophisticated tasks, enables real-time threat analysis, and uncovers anomalies.
32+
33+
### Solution
34+
The solution is to introduce a new Machine Learning library inside the OpenSearch cluster. The major functionalities in this solution include:
35+
36+
* **Unified Client Interfaces:** Clients can use common interfaces for training and inference tasks, and then follow the algorithm interface to supply the right input parameters, such as input data and hyperparameters. A client library will be built for ease of use.
37+
* **ML Plugin:** The ML plugin will help to initiate the ML nodes, choose the right nodes and allocate resources for each request, manage machine learning tasks with monitoring and failure handling support, and store the model results. It will be the bridge for communication between the OpenSearch process and the ML engine.
38+
* **ML Engine**: This engine will host the ML algorithms. Java-based machine learning algorithms will be supported in the first release.
39+
40+
This solution makes it easy to develop new machine learning features. It allows engineers to leverage existing open-source machine learning algorithms and reduces the effort needed to build any new machine learning feature. It also removes the need for engineers to manage machine learning tasks, which helps speed up feature development.
41+
42+
### How to use it for new feature development
43+
44+
As mentioned above, new interfaces for both prediction and training will be provided to customers through REST APIs, and to other plugins through transport actions. Here are the transport actions for the prediction and training interfaces.
45+
46+
* Predict Transport Action for prediction job request
47+
```
48+
Request: {
49+
"algorithm": "ARIMA", //the name of algorithm
50+
"parameters": {"forecasts_en":10, "seasonal": true}, // parameters of the algorithm, can be null or empty
51+
"modelId":123, //the id for trained model.
52+
"inputData": [[1.0, 2, 3.1, true, "v1"],[1.1, 4, 5.2, false, "v2"]] // internal data frame interface
53+
}
54+
55+
Response: {
56+
"taskId": "123", //the id of the job request
57+
"status": "SUCCESS", // the job execution status
58+
"predictionResult": [[6.0],[7.0]] // internal data frame interface
59+
}
60+
```
61+
* Training Transport Action to start training job request - Async Interface
62+
```
63+
Request: {
64+
"algorithm": "ARIMA", //the name of algorithm
65+
"parameters": {"forecasts_en":10, "seasonal": true}, // parameters of the algorithm, can be null or empty
66+
"inputData": [[1.0, 2, 3.1, true, "v1"],[1.1, 4, 5.2, false, "v2"]] // internal data frame interface
67+
}
68+
69+
70+
Response: {
71+
"taskId": "123", //the id of the job request
72+
"status": "IN_PROGRESS" // the job execution status
73+
74+
}
75+
```
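
The two request shapes above differ only in `modelId`, and the training call returns immediately with a task id. A minimal Java sketch of those shapes follows. It is an illustration only: `TransportShapesSketch` and all of its methods are hypothetical names, not the ml-commons API, and the check that every data frame row has the same width is an assumption rather than something this README states.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the request/response shapes listed above.
// None of these names come from ml-commons itself.
public class TransportShapesSketch {

    /** Job states that appear in the README examples. */
    enum Status { IN_PROGRESS, SUCCESS }

    /** Builds a training request: algorithm name, optional parameters, input rows. */
    static Map<String, Object> trainingRequest(String algorithm,
                                               Map<String, Object> parameters,
                                               List<? extends List<?>> inputData) {
        if (algorithm == null || algorithm.isEmpty()) {
            throw new IllegalArgumentException("algorithm is required");
        }
        requireRectangular(inputData);
        Map<String, Object> request = new LinkedHashMap<>();
        request.put("algorithm", algorithm);
        // Parameters may be null or empty; normalize null to an empty map.
        request.put("parameters", parameters == null ? Map.of() : parameters);
        request.put("inputData", inputData);
        return request;
    }

    /** A predict request carries the same fields plus the id of a trained model. */
    static Map<String, Object> predictRequest(String algorithm,
                                              Map<String, Object> parameters,
                                              long modelId,
                                              List<? extends List<?>> inputData) {
        Map<String, Object> request = trainingRequest(algorithm, parameters, inputData);
        request.put("modelId", modelId);
        return request;
    }

    /** Assumption: every row of the data frame has the same number of columns. */
    static void requireRectangular(List<? extends List<?>> rows) {
        if (rows == null || rows.isEmpty()) {
            throw new IllegalArgumentException("inputData must contain at least one row");
        }
        int width = rows.get(0).size();
        for (List<?> row : rows) {
            if (row.size() != width) {
                throw new IllegalArgumentException("all rows must have " + width + " columns");
            }
        }
    }

    /** The asynchronous training call answers at once with a task id and IN_PROGRESS. */
    static Map<String, Object> acceptedResponse(String taskId) {
        Map<String, Object> response = new LinkedHashMap<>();
        response.put("taskId", taskId);
        response.put("status", Status.IN_PROGRESS.name());
        return response;
    }

    public static void main(String[] args) {
        List<List<Object>> frame = List.of(
            List.of(1.0, 2, 3.1, true, "v1"),
            List.of(1.1, 4, 5.2, false, "v2"));
        Map<String, Object> params = Map.of("forecasts_en", 10, "seasonal", true);

        System.out.println(trainingRequest("ARIMA", params, frame));
        System.out.println(predictRequest("ARIMA", params, 123L, frame));
        System.out.println(acceptedResponse("123"));
    }
}
```

Building the predict request on top of the training request keeps the shared fields in one place, which mirrors how the two examples above overlap.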
76+
77+
## Contributing
78+
79+
See [developer guide](DEVELOPER_GUIDE.md) and [how to contribute to this project](CONTRIBUTING.md).
80+
81+
## Code of Conduct
82+
83+
This project has adopted the [Amazon Open Source Code of Conduct](CODE_OF_CONDUCT.md). For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq), or contact [opensource-codeofconduct@amazon.com](mailto:opensource-codeofconduct@amazon.com) with any additional questions or comments.
484

585
## Security
686

7-
See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information.
87+
If you discover a potential security issue in this project we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public GitHub issue.
888

989
## License
1090

11-
See the [LICENSE](LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution.
91+
This project is licensed under the [Apache v2.0 License](LICENSE).
92+
93+
## Copyright
94+
95+
Copyright 2020-2021 Amazon.com, Inc. or its affiliates. All Rights Reserved.
1296
