|
1 |
| -## My Project |
| 1 | +# Random Cut Forest |
2 | 2 |
|
3 |
| -TODO: Fill this README out! |
| 3 | +This package is an implementation of the Random Cut Forest probabilistic data |
| 4 | +structure. Random Cut Forests (RCFs) were originally developed at Amazon for use in an
| 5 | +anomaly detection algorithm. After anomaly detection was launched, new |
| 6 | +algorithms based on RCFs were developed for density estimation, imputation, |
| 7 | +and forecasting. The goal of this library is to be easy to use and to strike |
| 8 | +a balance between efficiency and extensibility. |
4 | 9 |
|
5 |
| -Be sure to: |
| 10 | +The public interface to this package is the RandomCutForest class, which |
| 11 | +defines methods for anomaly detection, anomaly detection with attribution, |
| 12 | +density estimation, imputation, and forecasting. |
6 | 13 |
|
7 |
| -* Change the title in this README |
8 |
| -* Edit your repository description on GitHub |
| 14 | +## Basic operations |
9 | 15 |
|
10 |
| -## License |
| 16 | +To create a RandomCutForest instance with all parameters set to defaults: |
11 | 17 |
|
12 |
| -This project is licensed under the Apache-2.0 License. |
| 18 | +```java |
| 19 | +int dimensions = 1; // The number of dimensions in the input data, required |
| 20 | +RandomCutForest forest = RandomCutForest.defaultForest(dimensions); |
| 21 | +``` |
| 22 | + |
| 23 | +To explicitly set optional parameters such as the number of trees in the forest or
| 24 | +the sample size, RandomCutForest provides a builder:
| 25 | + |
| 26 | +```java |
| 27 | +RandomCutForest forest = RandomCutForest.builder() |
| 28 | + .numberOfTrees(90) |
| 29 | + .sampleSize(200) |
| 30 | + .dimensions(2) // still required! |
| 31 | + .lambda(0.2) |
| 32 | + .randomSeed(123) |
| 33 | + .storeSequenceIndexesEnabled(true) |
| 34 | + .centerOfMassEnabled(true) |
| 35 | + .build(); |
| 36 | +``` |
| 37 | + |
| 38 | +Typical usage of a forest is to compute a statistic on an input data point and |
| 39 | +then update the forest with that point in a loop. |
| 40 | + |
| 41 | +```java |
| 42 | +Supplier<double[]> input = ...; |
| 43 | + |
| 44 | +while (true) { |
| 45 | + double[] point = input.get(); |
| 46 | + double score = forest.getAnomalyScore(point); |
| 47 | + forest.update(point); |
| 48 | + System.out.println("Anomaly Score: " + score); |
| 49 | +} |
| 50 | +``` |
| 51 | + |
| 52 | +## Command-line usage |
| 53 | + |
| 54 | +For each algorithm included in this package there is a CLI application that can
| 55 | +be used for experiments. These applications use `String::split` to read |
| 56 | +delimited data, and as such are **not intended for production use**. Instead, |
| 57 | +use these applications as example code and as a way to learn about the |
| 58 | +algorithms and their hyperparameters. |
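To illustrate why `String::split` parsing is fine for experiments but fragile in production, here is a minimal sketch of that style of row parsing. `NaiveRowParser` and `parseRow` are hypothetical names, not part of this package, and the use of `Pattern.quote` and the `-1` limit are assumptions of this sketch; how the bundled runners actually call `split` is not shown in this README.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class NaiveRowParser {

    // Hypothetical helper: parse one delimited text line into a numeric point.
    // The delimiter is quoted so that characters like "|" are matched literally,
    // and the -1 limit keeps trailing empty fields instead of silently dropping them.
    public static double[] parseRow(String line, String delimiter) {
        String[] fields = line.split(Pattern.quote(delimiter), -1);
        double[] point = new double[fields.length];
        for (int i = 0; i < fields.length; i++) {
            // Throws NumberFormatException on empty or non-numeric fields.
            point[i] = Double.parseDouble(fields[i].trim());
        }
        return point;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parseRow("1.5, 2.0,-3", ",")));

        // A quoted field that contains the delimiter is broken apart,
        // because split knows nothing about CSV quoting rules.
        String[] fields = "\"1,000\",2.5".split(",");
        System.out.println("fields: " + fields.length); // 3 fields, not the intended 2
    }
}
```

Input with quoted fields, escaped delimiters, or a header row that is not skipped will either be split incorrectly or fail with a `NumberFormatException`, which is the kind of behavior a production reader would need to handle.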
| 59 | + |
| 60 | +You can build a local archive by running the Maven package command. Set the "excludedGroups" property to skip the
| 61 | +long-running "functional" tests, which take about 10 minutes to complete.
| 62 | + |
| 63 | +```text |
| 64 | +% mvn package -DexcludedGroups=functional |
| 65 | +``` |
| 66 | + |
| 67 | +You can then invoke an example CLI application by adding the superjar to your classpath. For example: |
| 68 | + |
| 69 | +```text |
| 70 | +% java -cp target/random-cut-forest-1.0.jar com.amazon.randomcutforest.runner.AnomalyScoreRunner --help |
| 71 | +Usage: java -cp RandomCutForest-1.0-super.jar com.amazon.randomcutforest.runner.AnomalyScoreRunner [options] < input_file > output_file |
| 72 | +
|
| 73 | +Compute scalar anomaly scores from the input rows and append them to the output rows. |
| 74 | +
|
| 75 | +Options: |
| 76 | + --delimiter, -d: The character or string used as a field delimiter. (default: ,) |
| 77 | + --header-row: Set to 'true' if the data contains a header row. (default: false) |
| 78 | + --number-of-trees, -n: Number of trees to use in the forest. (default: 100) |
| 79 | + --random-seed: Random seed to use in the Random Cut Forest (default: 42) |
| 80 | + --sample-size, -s: Number of points to keep in sample for each tree. (default: 256) |
| 81 | + --shingle-cyclic, -c: Set to 'true' to use cyclic shingles instead of linear shingles. (default: false) |
| 82 | + --shingle-size, -g: Shingle size to use. (default: 1) |
| 83 | + --window-size, -w: Window size of the sample or 0 for no window. (default: 0) |
| 84 | +
|
| 85 | + --help, -h: Print this help message and exit. |
| 86 | +``` |
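The `--shingle-size` option above refers to shingling. The sketch below shows the common definition of a linear shingle: a sliding window over the last few values of a series, with each window treated as one multi-dimensional point. `LinearShingleSketch` and `shingle` are hypothetical names used only for illustration; this is not the package's own shingling code, and cyclic shingles are not covered here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LinearShingleSketch {

    // Hypothetical helper: turn a scalar series into overlapping windows of
    // length shingleSize, ordered from oldest to newest value in each window.
    public static List<double[]> shingle(double[] series, int shingleSize) {
        if (shingleSize < 1) {
            throw new IllegalArgumentException("shingleSize must be at least 1");
        }
        List<double[]> shingles = new ArrayList<>();
        // The first full window ends at index shingleSize (exclusive).
        for (int end = shingleSize; end <= series.length; end++) {
            shingles.add(Arrays.copyOfRange(series, end - shingleSize, end));
        }
        return shingles;
    }

    public static void main(String[] args) {
        double[] series = {1, 2, 3, 4, 5};
        for (double[] window : shingle(series, 3)) {
            System.out.println(Arrays.toString(window));
        }
    }
}
```

Under this definition, a scalar series with a shingle size of 3 yields 3-dimensional points, so a forest consuming them would need `dimensions` set to 3. With the default shingle size of 1, each input row is used as-is.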
13 | 87 |
|