This is a heavily modified version of the Python Exceptional Model Mining implementation emm by MathynS.
Either run `.\setup.ps1` when pyenv-windows is installed on your machine with Python 3.12.0 set as the local or global pyenv version, or install the necessary packages (see `/requirements.txt`) in your Python 3.12.0 installation.
A good example of how to use A-EPM is the executable file `/main.py`.
Attribute | Type | Default | Notes | Options | Description |
---|---|---|---|---|---|
algorithm | str | 'apriori' | - | ('apriori', 'best_first') | Decide what algorithm to use to construct the subgroups |
width | int | None | When algorithm 'depth_first' is used: this, evaluation_threshold or both required. | - | Width parameter of the search: number of subgroups to keep before moving on to the next depth step |
depth | int | - | Required | - | Depth parameter of the search: number of iterations in the process; subgroups are described by at most depth attributes |
evaluation_metric | str or callable | - | Required | ('rw_norm', 'rw_norm_mode', 'rw_cov', 'lw_norm', 'pw_max') | Function to evaluate the subgroups with. Choose one of the metrics from our paper as a string, or supply your own evaluation function |
evaluation_threshold | float | None | Should be a positive float, except when using the 'rw_cov' metric. When algorithm 'depth_first' is used: this, width or both required. | - | Quality metric threshold used to prune subgroups after each depth step. |
frequency_threshold | float | None | Required when algorithm 'apriori' is used. | - | Frequency threshold used to prune subgroups after each depth step for the 'apriori' algorithm. Example: a frequency threshold of 0.2 means that the subgroup should cover at least 20% of the dataset |
n_bins | int | 8 | New bins are created at each depth step | - | For int or float columns of the dataset, not every distinct value is used to create subgroups. Values are divided into bins, and n_bins sets the number of bins |
bin_strategy | str | 'equidepth' | - | ('equidepth', 'equiwidth') | Method to create bins for int and float columns |
bin_subgroups | str | 'both' | - | ('both', 'per_bin', 'per_split') | When creating subgroups of bins, decide whether to make a subgroup on a split (e.g. x <= 5), a bin (e.g. 3 < x <= 5) or both. |
candidate_size | int | width^2 | - | - | Number of subgroups to keep in memory at each depth step while using the 'best_first' algorithm |
log_level | int | 50 | - | - | Logging level. With a log_level of 0, the found subgroups are shown in the console |
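
The binning and pruning parameters above (`n_bins`, `bin_strategy`, `bin_subgroups`, `frequency_threshold`) can be illustrated with a minimal standard-library sketch. It is not the code used by A-EPM: every function name below is hypothetical, the exact cut points the library picks may differ, and conditions are modelled simply as `(lower, upper]` tuples.

```python
# Illustrative sketch only: helper names are hypothetical, not part of A-EPM.

def equiwidth_edges(values, n_bins):
    """Interior cut points that split [min, max] into n_bins equally wide intervals."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_bins
    return [lo + step * i for i in range(1, n_bins)]

def equidepth_edges(values, n_bins):
    """Interior cut points chosen so each bin holds roughly the same number of rows."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * i) // n_bins - 1] for i in range(1, n_bins)]

def split_conditions(edges):
    """'per_split' style: one-sided conditions such as x <= 5, stored as (None, 5)."""
    return [(None, e) for e in edges]

def bin_conditions(edges):
    """'per_bin' style: intervals such as 3 < x <= 5, stored as (3, 5)."""
    bounds = [None] + list(edges) + [None]
    return list(zip(bounds[:-1], bounds[1:]))

def covers(cond, x):
    """True when x satisfies lower < x <= upper (None means unbounded)."""
    lower, upper = cond
    return (lower is None or x > lower) and (upper is None or x <= upper)

def coverage(cond, values):
    """Fraction of the rows covered by the condition."""
    return sum(covers(cond, v) for v in values) / len(values)

def prune_by_frequency(conds, values, frequency_threshold):
    """Keep only conditions covering at least frequency_threshold of the rows."""
    return [c for c in conds if coverage(c, values) >= frequency_threshold]

values = [1, 2, 3, 4, 10, 20, 50, 100]
edges = equidepth_edges(values, 4)  # [2, 4, 20]
# 'both': union of split and bin conditions, without duplicates
both = list(dict.fromkeys(split_conditions(edges) + bin_conditions(edges)))
kept = prune_by_frequency(both, values, 0.3)
print(edges, kept)
```

On skewed data such as `values`, equal-width bins leave most rows in the first bin, while equal-depth bins spread the rows evenly, which is one reason an equal-depth default is a sensible choice.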
This method only requires a single `df` argument with a dataset containing a `ranking` column.
Attribute | Type | Default | Description |
---|---|---|---|
descriptive_cols | str or list | All columns except ranking column | Single column or list of columns that can be used to create subgroups with |
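
The default for `descriptive_cols` described above can be sketched as follows. This is an assumption about the behaviour, not the library's code: `resolve_descriptive_cols` and its `ranking_col` parameter are hypothetical names used only for illustration.

```python
# Hypothetical helper mirroring the documented default; not part of A-EPM.

def resolve_descriptive_cols(columns, ranking_col, descriptive_cols=None):
    """Return the list of columns that may be used to describe subgroups."""
    if descriptive_cols is None:
        # Default: every column except the ranking column
        return [c for c in columns if c != ranking_col]
    if isinstance(descriptive_cols, str):
        # A single column name is wrapped in a list
        return [descriptive_cols]
    return list(descriptive_cols)

columns = ["age", "city", "ranking"]
print(resolve_descriptive_cols(columns, "ranking"))         # ['age', 'city']
print(resolve_descriptive_cols(columns, "ranking", "age"))  # ['age']
```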
This method has a single optional `subgroups_amount` argument expecting an `int`. When this method is called (after calling `load_data()` and `search()`), it will visualise the best min(`subgroups_amount`, #subgroups) subgroups.
Without `subgroups_amount`, all subgroups will be visualised.
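
The selection rule above can be sketched in a few lines. The function name, the `(description, score)` tuple layout, and the assumption that a higher score means a better subgroup are all illustrative choices, not taken from the A-EPM source.

```python
# Hypothetical sketch of the selection rule; not the library's implementation.

def select_for_visualisation(subgroups, subgroups_amount=None):
    """Pick the best subgroups from a list of (description, score) tuples."""
    # Assumption: a higher score means a better subgroup
    ranked = sorted(subgroups, key=lambda s: s[1], reverse=True)
    if subgroups_amount is None:
        return ranked  # no limit given: show everything
    return ranked[:min(subgroups_amount, len(ranked))]

found = [("age <= 30", 0.1), ("city == 'X'", 0.9), ("30 < age <= 50", 0.5)]
print(select_for_visualisation(found, 2))
print(len(select_for_visualisation(found, 10)))  # 3: capped at the number found
```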