Commit 754fe91

Initial setup of RandomCutForest
1 parent 765076d commit 754fe91

89 files changed: 14660 lines added, 7 removed

.gitignore (+2)

@@ -0,0 +1,2 @@
+build
+*.iml

LICENSE (+27)

@@ -173,3 +173,30 @@
 defend, and hold each Contributor harmless for any liability
 incurred by, or claims asserted against, such Contributor by reason
 of your accepting any such warranty or additional liability.
+
+END OF TERMS AND CONDITIONS
+
+APPENDIX: How to apply the Apache License to your work.
+
+To apply the Apache License to your work, attach the following
+boilerplate notice, with the fields enclosed by brackets "[]"
+replaced with your own identifying information. (Don't include
+the brackets!) The text should be enclosed in the appropriate
+comment syntax for the file format. We also recommend that a
+file or class name and description of purpose be included on the
+same "printed page" as the copyright notice for easier
+identification within third-party archives.
+
+Copyright [yyyy] [name of copyright owner]
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.

NOTICE (+1)

@@ -1 +1,2 @@
+RandomCutForest
 Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.

README.md (+81 -7)

@@ -1,13 +1,87 @@
-## My Project
+# Random Cut Forest
 
-TODO: Fill this README out!
+This package is an implementation of the Random Cut Forest probabilistic data
+structure. Random Cut Forests (RCFs) were originally developed at Amazon for use in an
+anomaly detection algorithm. After anomaly detection was launched, new
+algorithms based on RCFs were developed for density estimation, imputation,
+and forecasting. The goal of this library is to be easy to use and to strike
+a balance between efficiency and extensibility.
 
-Be sure to:
+The public interface to this package is the RandomCutForest class, which
+defines methods for anomaly detection, anomaly detection with attribution,
+density estimation, imputation, and forecasting.
 
-* Change the title in this README
-* Edit your repository description on GitHub
+## Basic operations
 
-## License
+To create a RandomCutForest instance with all parameters set to defaults:
 
-This project is licensed under the Apache-2.0 License.
+```java
+int dimensions = 1; // The number of dimensions in the input data, required
+RandomCutForest forest = RandomCutForest.defaultForest(dimensions);
+```
+
+To explicitly set optional parameters like the number of trees in the forest or
+the sample size, RandomCutForest provides a builder:
+
+```java
+RandomCutForest forest = RandomCutForest.builder()
+    .numberOfTrees(90)
+    .sampleSize(200)
+    .dimensions(2) // still required!
+    .lambda(0.2)
+    .randomSeed(123)
+    .storeSequenceIndexesEnabled(true)
+    .centerOfMassEnabled(true)
+    .build();
+```
+
+Typical usage of a forest is to compute a statistic on an input data point and
+then update the forest with that point in a loop.
+
+```java
+Supplier<double[]> input = ...;
+
+while (true) {
+    double[] point = input.get();
+    double score = forest.getAnomalyScore(point);
+    forest.update(point);
+    System.out.println("Anomaly Score: " + score);
+}
+```
+
+## Command-line usage
+
+For each algorithm included in this package there is a CLI application that can
+be used for experiments. These applications use `String::split` to read
+delimited data, and as such are **not intended for production use**. Instead,
+use these applications as example code and as a way to learn about the
+algorithms and their hyperparameters.
+
+You can build a local archive by running the Maven package command. Use the "excludedGroups" flag to disable the
+long-running "functional" tests, which take about 10 minutes to complete.
+
+```text
+% mvn package -DexcludedGroups=functional
+```
+
+You can then invoke an example CLI application by adding the superjar to your classpath. For example:
+
+```text
+% java -cp target/random-cut-forest-1.0.jar com.amazon.randomcutforest.runner.AnomalyScoreRunner --help
+Usage: java -cp RandomCutForest-1.0-super.jar com.amazon.randomcutforest.runner.AnomalyScoreRunner [options] < input_file > output_file
+
+Compute scalar anomaly scores from the input rows and append them to the output rows.
+
+Options:
+--delimiter, -d: The character or string used as a field delimiter. (default: ,)
+--header-row: Set to 'true' if the data contains a header row. (default: false)
+--number-of-trees, -n: Number of trees to use in the forest. (default: 100)
+--random-seed: Random seed to use in the Random Cut Forest (default: 42)
+--sample-size, -s: Number of points to keep in sample for each tree. (default: 256)
+--shingle-cyclic, -c: Set to 'true' to use cyclic shingles instead of linear shingles. (default: false)
+--shingle-size, -g: Shingle size to use. (default: 1)
+--window-size, -w: Window size of the sample or 0 for no window. (default: 0)
+
+--help, -h: Print this help message and exit.
+```
 
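The README added in this commit describes Random Cut Forest as a probabilistic data structure but does not show what a single random cut looks like. Below is a minimal sketch, assuming the cut rule from the published RCF construction: pick a dimension with probability proportional to the bounding box's extent in that dimension, then pick a cut value uniformly within that extent. `RandomCutSketch`, `Cut`, `chooseCut`, and `goesLeft` are hypothetical names for illustration only; they are not this package's API and say nothing about how the package stores or updates its trees.

```java
import java.util.Random;

// Hypothetical illustration of choosing one random cut; NOT this package's API.
final class RandomCutSketch {

    /** A cut sends points with point[dimension] <= value to the left side. */
    static final class Cut {
        final int dimension;
        final double value;

        Cut(int dimension, double value) {
            this.dimension = dimension;
            this.value = value;
        }
    }

    /** Chooses a cut for a non-empty set of points that are not all identical. */
    static Cut chooseCut(double[][] points, Random rng) {
        int d = points[0].length;
        double[] min = points[0].clone();
        double[] max = points[0].clone();
        for (double[] p : points) {
            for (int i = 0; i < d; i++) {
                min[i] = Math.min(min[i], p[i]);
                max[i] = Math.max(max[i], p[i]);
            }
        }

        // Sum the bounding-box extents and remember the last non-flat dimension.
        double total = 0.0;
        int lastNonFlat = -1;
        for (int i = 0; i < d; i++) {
            total += max[i] - min[i];
            if (max[i] > min[i]) {
                lastNonFlat = i;
            }
        }
        if (lastNonFlat < 0) {
            throw new IllegalArgumentException("all points are identical; no cut separates them");
        }

        // Draw r uniformly in [0, total) and walk the extents: dimension i is hit
        // with probability extent_i / total, and the leftover offset r is uniform
        // within that dimension's extent.
        double r = rng.nextDouble() * total;
        for (int i = 0; i < d; i++) {
            double extent = max[i] - min[i];
            if (extent == 0.0) {
                continue; // a flat dimension can never be chosen
            }
            if (r < extent || i == lastNonFlat) {
                // Clamp strictly below max to guard against floating-point drift.
                double value = Math.min(min[i] + r, Math.nextDown(max[i]));
                return new Cut(i, value);
            }
            r -= extent;
        }
        throw new AssertionError("unreachable: the last non-flat dimension always returns");
    }

    static boolean goesLeft(double[] point, Cut cut) {
        return point[cut.dimension] <= cut.value;
    }

    public static void main(String[] args) {
        // Dimension 1 is flat (always 5.0), so every cut lands in dimension 0.
        double[][] points = { {0.0, 5.0}, {1.0, 5.0}, {10.0, 5.0} };
        Cut cut = chooseCut(points, new Random(42));
        System.out.println("dimension=" + cut.dimension + ", value=" + cut.value);
    }
}
```

Clamping the cut strictly below the maximum keeps every cut useful: the point holding the minimum always goes left and the point holding the maximum always goes right, so each cut separates at least one point from the rest.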

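The command-line section of the README says the example runners read delimited data with `String::split` and are not meant for production. Below is a minimal sketch of that style of row handling, assuming every field in a row is numeric: split the line, parse the fields, score the resulting point, and append the score to the row. `DelimitedRowSketch`, `parseRow`, and `appendScore` are hypothetical helpers, not the runners' actual code, and the L1-norm scorer in `main` is only a stand-in for a forest.

```java
import java.util.Arrays;
import java.util.function.ToDoubleFunction;
import java.util.regex.Pattern;

// Hypothetical sketch of split-based row handling; NOT the package's runner code.
final class DelimitedRowSketch {

    /** Splits a row on a literal delimiter and parses every field as a double. */
    static double[] parseRow(String line, String delimiter) {
        // String.split treats its argument as a regex, so quote the delimiter to
        // make characters such as "|" or "." match literally. The -1 limit keeps
        // trailing empty fields instead of silently dropping them.
        String[] fields = line.split(Pattern.quote(delimiter), -1);
        double[] point = new double[fields.length];
        for (int i = 0; i < fields.length; i++) {
            point[i] = Double.parseDouble(fields[i].trim());
        }
        return point;
    }

    /** Scores the parsed row and appends the score as one more delimited field. */
    static String appendScore(String line, String delimiter, ToDoubleFunction<double[]> scorer) {
        double score = scorer.applyAsDouble(parseRow(line, delimiter));
        return line + delimiter + score;
    }

    public static void main(String[] args) {
        // Stand-in scorer for illustration only: the L1 norm of the point.
        ToDoubleFunction<double[]> l1 = p -> Arrays.stream(p).map(Math::abs).sum();
        System.out.println(appendScore("1.0|2.5", "|", l1)); // prints 1.0|2.5|3.5

        // Plain splitting has no notion of quoting: "\"1,000\",2" splits into
        // three fields, and parsing "\"1" throws NumberFormatException.
    }
}
```

`Pattern.quote` matters because `String.split` interprets its argument as a regular expression. The closing comment points at one plausible reason for the production warning: a split-based reader cannot handle quoted fields, so a value such as `"1,000"` breaks apart.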

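The `--shingle-size` and `--shingle-cyclic` options in the help output refer to shingling, which turns a stream of scalars into points built from the k most recent values, so a shingle of size k yields k-dimensional points. The sketch below assumes that a linear shingle shifts its contents to keep them in time order and that a cyclic shingle writes each new value into slot `t mod k` without shifting. Those semantics are an assumption about what the flags mean, and `ShingleSketch` is a hypothetical class, not this package's implementation.

```java
import java.util.Arrays;

// Hypothetical shingle builder for a one-dimensional stream. The linear/cyclic
// semantics below are ASSUMED, not taken from this package's implementation.
final class ShingleSketch {
    private final double[] values;
    private final boolean cyclic;
    private long count = 0; // number of values added so far

    ShingleSketch(int shingleSize, boolean cyclic) {
        if (shingleSize < 1) {
            throw new IllegalArgumentException("shingleSize must be >= 1");
        }
        this.values = new double[shingleSize];
        this.cyclic = cyclic;
    }

    void add(double x) {
        if (cyclic) {
            // Assumed cyclic mode: overwrite slot (t mod k) in place, no shifting.
            values[(int) (count % values.length)] = x;
        } else {
            // Assumed linear mode: shift left by one, newest value goes last.
            System.arraycopy(values, 1, values, 0, values.length - 1);
            values[values.length - 1] = x;
        }
        count++;
    }

    /** True once k values have been added, so every slot holds real data. */
    boolean isFull() {
        return count >= values.length;
    }

    /** Returns a copy of the current shingle as a k-dimensional point. */
    double[] get() {
        return values.clone();
    }

    public static void main(String[] args) {
        ShingleSketch linear = new ShingleSketch(3, false);
        ShingleSketch cyclic = new ShingleSketch(3, true);
        for (double x : new double[] {1, 2, 3, 4}) {
            linear.add(x);
            cyclic.add(x);
        }
        System.out.println(Arrays.toString(linear.get())); // [2.0, 3.0, 4.0]
        System.out.println(Arrays.toString(cyclic.get())); // [4.0, 2.0, 3.0]
    }
}
```

Once full, both modes hold the same values; only their positions differ. That difference matters to a forest, because each slot is a separate dimension on which cuts can be placed.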