From ea1c48e9a2741ed4951bb0c397966a0433033334 Mon Sep 17 00:00:00 2001
From: r-dube
Date: Tue, 23 Jan 2024 19:40:11 -0800
Subject: [PATCH] added peer reviewed research paper to readme

---
 README.md | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index ff9314c..7f34fb2 100644
--- a/README.md
+++ b/README.md
@@ -2,8 +2,17 @@ Documentation on the IDS analysis project is at https://r-dube.github.io/CICIDS/
 
 
 ### Technical Report
-There is also a technical report at http://dx.doi.org/10.13140/RG.2.2.25435.64809
+There is a technical report on the project at http://dx.doi.org/10.13140/RG.2.2.25435.64809
 
 **Title:** (Mis)use of the CICIDS 2017 Dataset in Information Security Research
 
 **Abstract:** The summarized traffic flow version of the CICIDS 2017 dataset created at the University of New Brunswick is popular in the information security data science research community. Typically, researchers use the summarized data to develop supervised machine learning models and test the classification performance of these models. In this paper, we explore the adequacy of the summarized data for high-performance classification. We show that machine learning models developed over summarized data are unlikely to have practical import. Finally, we postulate that researchers may have a higher probability of creating a useful system if they use raw (non-summarized) data.
+
+### Peer-reviewed Research Paper
+**URL:** https://www.researchgate.net/publication/376891771
+
+**Title:** Faulty use of the CIC-IDS 2017 dataset in information security research
+
+**Abstract:** The summarized traffic flow version of the Canadian Institute for Cybersecurity Intrusion Detection Evaluation dataset created at the University of New Brunswick in 2017 is popular in the information security data science research community. Typically, researchers use the summarized data to develop supervised machine learning models and test the classification performance of these models. In this paper, we explore the adequacy of the summarized data for high-performance classification. We show that machine learning models developed over summarized data are unlikely to have practical import. Finally, we postulate that researchers may have a higher probability of creating a useful system if they use raw (non-summarized) data.
+
+**Keywords:** Machine learning, Classification, Network security, Intrusion detection system, Network traffic analysis