added peer reviewed research paper to readme

r-dube · Jan 24, 2024 · ea1c48e
1 parent 68c51b3
Showing 1 changed file with 10 additions and 1 deletion.
diff --git a/README.md b/README.md
@@ -2,8 +2,17 @@
 Documentation on the IDS analysis project is at https://r-dube.github.io/CICIDS/
 
 ### Technical Report
-There is also a technical report at http://dx.doi.org/10.13140/RG.2.2.25435.64809
+There is a technical report on the project at http://dx.doi.org/10.13140/RG.2.2.25435.64809
 
 **Title:** (Mis)use of the CICIDS 2017 Dataset in Information Security Research
 
 **Abstract:** The summarized traffic flow version of the CICIDS 2017 dataset created at the University of New Brunswick is popular in the information security data science research community. Typically, researchers use the summarized data to develop supervised machine learning models and test the classification performance of these models. In this paper, we explore the adequacy of the summarized data for high-performance classification. We show that machine learning models developed over summarized data are unlikely to have practical import. Finally, we postulate that researchers may have a higher probability of creating a useful system if they use raw (non-summarized) data.
+
+### Peer-reviewed Research Paper
+**URL:** https://www.researchgate.net/publication/376891771
+
+**Title:** Faulty use of the CIC-IDS 2017 dataset in information security research
+
+**Abstract:** The summarized traffic flow version of the Canadian Institute for Cybersecurity Intrusion Detection Evaluation dataset created at the University of New Brunswick in 2017 is popular in the information security data science research community. Typically, researchers use the summarized data to develop supervised machine learning models and test the classification performance of these models. In this paper, we explore the adequacy of the summarized data for high-performance classification. We show that machine learning models developed over summarized data are unlikely to have practical import. Finally, we postulate that researchers may have a higher probability of creating a useful system if they use raw (non-summarized) data.
+
+**Keywords:** Machine learning, Classification, Network security, Intrusion detection system, Network traffic analysis