feature_anomaly_detection

Purpose: find and count anomalies in feature correlations for shops
Path on the DS instance: /home/ubuntu/DusFolder/anomaly_research/fea_conv/feature_anomaly_detection

how to run main code

set the parameters in conv_config.py and run eshop_anomaly_count.py
OR
set the parameters in kpi_config.py and run eshop_anomaly_kpi.py
OR
set the parameters in kpi_config.py and run eshop_anomaly_kpi_basic.py for the fixed weekly report

parameters

ACCOUNT_ID - shop id
SLICE - batch size in sessions (eshop_anomaly_count only)
BEG_DATE - start date of the timeframe
END_DATE - end date of the timeframe
LQ - left quantile, i.e. the lower border for anomalies
RQ - right quantile, i.e. the upper border for anomalies
AN_BORD - minimum anomaly count required for selection into tops_kpi (eshop_anomaly_kpi only)
--to_sql - flag to export the result to the db
--no_sql - flag to skip the export to the db (default)
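
A minimal sketch of how these parameters and the two export flags could fit together. The config values, the date format, the use of fractions for LQ/RQ, and the `parse_args` helper are all assumptions for illustration; the real conv_config.py / kpi_config.py and scripts may differ.

```python
import argparse

# Hypothetical contents of kpi_config.py; every value below is a placeholder.
ACCOUNT_ID = 123           # shop id
BEG_DATE = "2023-01-01"    # start of the timeframe (date format is an assumption)
END_DATE = "2023-01-31"    # end of the timeframe
LQ = 0.05                  # left quantile, lower border (fraction is an assumption)
RQ = 0.95                  # right quantile, upper border
AN_BORD = 3                # minimum anomaly count for tops_kpi


def parse_args(argv=None):
    """Parse the export switch; not exporting (--no_sql) is the default."""
    parser = argparse.ArgumentParser()
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--to_sql", dest="to_sql", action="store_true")
    group.add_argument("--no_sql", dest="to_sql", action="store_false")
    parser.set_defaults(to_sql=False)
    return parser.parse_args(argv)
```

With this layout, `parse_args(["--to_sql"]).to_sql` is true and the result would be written to the db, while running without a flag keeps the output in the csv files only.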

result tables

eshop_anomaly_count.py :

anomaly_matrix.csv

marked anomalies in feature correlations. the matrix flags the periods whose correlation values fall outside the range set by the selected quantiles (LQ and RQ)
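
The quantile rule can be sketched as follows. This is an illustration only: the function names, the input layout (one list of per-period correlations per feature pair), the linear-interpolation quantile, and LQ/RQ given as fractions are assumptions, not the code of eshop_anomaly_count.py.

```python
def quantile(values, q):
    """Linear-interpolation quantile of a non-empty list, q in [0, 1]."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def mark_anomalies(corr_by_period, lq, rq):
    """Flag (1) periods whose correlation lies outside the LQ/RQ quantiles."""
    low = quantile(corr_by_period, lq)
    high = quantile(corr_by_period, rq)
    return [int(c < low or c > high) for c in corr_by_period]


def build_anomaly_matrix(corr_by_pair, lq, rq):
    """Apply the rule to every feature pair: {pair: [flag per period]}."""
    return {
        pair: mark_anomalies(series, lq, rq)
        for pair, series in corr_by_pair.items()
    }
```

Each feature pair gets its own thresholds here, so a pair with naturally volatile correlations is not flagged more often than a stable one.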

batch_counts.csv (sql data.feature_anomaly_batch_counts)

aggregated anomaly count for each period of fixed size
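
One possible reading of this aggregation, as a sketch: sum the 0/1 anomaly flags over all feature pairs for each period. The `batch_counts` function and the `{pair: [flag per period]}` input layout are hypothetical.

```python
def batch_counts(anomaly_matrix):
    """Sum the 0/1 anomaly flags over all feature pairs, period by period."""
    return [sum(flags) for flags in zip(*anomaly_matrix.values())]
```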

eshop_anomaly_kpi.py :

anomaly_table.csv (sql data.eshop_anomaly_table)

for each KPI in ['bounce_rate', 'conversion_rate', 'med_duration'] anomalies are identified both overall and broken down by channel

metric_lines.csv

attribute // lower threshold // upper threshold, where the thresholds are quantiles computed on the given dataset

tops_kpi.csv (sql data.eshop_anomaly_tops_kpi)

for each source the best MCID combination is derived for each of the 3 KPIs. the main condition is that an MCID combination has more than 10 sessions for the source. the second condition is that the source has 3 or more alternative MCID combinations
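
The two conditions can be sketched as a small filter. The row layout, the function name, the `higher_is_better` switch, and the choice to count alternatives after the session filter are assumptions made for illustration; the actual selection in eshop_anomaly_kpi.py may differ.

```python
from collections import defaultdict

MIN_SESSIONS = 10      # an MCID combination needs more than this many sessions
MIN_ALTERNATIVES = 3   # a source needs at least this many MCID combinations


def best_mcid_per_source(rows, kpi, higher_is_better=True):
    """rows: dicts with 'source', 'mcid', 'sessions' and one column per KPI."""
    by_source = defaultdict(list)
    for row in rows:
        # Main condition: more than 10 sessions for this source.
        if row["sessions"] > MIN_SESSIONS:
            by_source[row["source"]].append(row)
    pick = max if higher_is_better else min
    return {
        source: pick(candidates, key=lambda r: r[kpi])["mcid"]
        for source, candidates in by_source.items()
        # Second condition: the source has 3+ alternative MCID combinations.
        if len(candidates) >= MIN_ALTERNATIVES
    }
```

For a KPI such as bounce_rate, where lower values are presumably better, the sketch would be called with `higher_is_better=False`.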

cache_log.txt

results from tops_kpi.csv for the last day as text, together with the initial parameters

eshop_anomaly_kpi_basic.py :

same as eshop_anomaly_kpi.py but much faster
the timeframe is fixed to the previous week and AN_BORD = 0; the only parameters to change are ACCOUNT_ID, LQ and RQ.
for every source the top-5/bottom-5 candidates of each KPI are shown where possible
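
A sketch of the fixed settings described above. It assumes "previous week" means the last full Monday-to-Sunday week and that candidates are ranked by the raw KPI value; both function names are hypothetical.

```python
from datetime import date, timedelta


def previous_week(today):
    """Return (Monday, Sunday) of the calendar week before `today`."""
    this_monday = today - timedelta(days=today.weekday())
    return this_monday - timedelta(days=7), this_monday - timedelta(days=1)


def top_and_bottom(candidates, kpi, n=5):
    """Return the n highest and n lowest candidates by the given KPI.

    With fewer than n candidates both lists simply contain all of them.
    """
    ranked = sorted(candidates, key=lambda r: r[kpi], reverse=True)
    return ranked[:n], ranked[-n:]
```

Called with `date.today()`, `previous_week` would give the BEG_DATE and END_DATE that the weekly report uses instead of values from the config.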

tops_kpi.csv

same result as for eshop_anomaly_kpi.py, with the changes described above

cache_log.txt

the format differs substantially from the cache_log.txt produced by eshop_anomaly_kpi.py