Commit
added peer reviewed research paper to readme
r-dube committed Jan 24, 2024
1 parent 68c51b3 commit ea1c48e
Showing 1 changed file with 10 additions and 1 deletion.
11 changes: 10 additions & 1 deletion README.md
@@ -2,8 +2,17 @@
 Documentation on the IDS analysis project is at https://r-dube.github.io/CICIDS/
 
 ### Technical Report
-There is also a technical report at http://dx.doi.org/10.13140/RG.2.2.25435.64809
+There is a technical report on the project at http://dx.doi.org/10.13140/RG.2.2.25435.64809
 
 **Title:** (Mis)use of the CICIDS 2017 Dataset in Information Security Research
 
 **Abstract:** The summarized traffic flow version of the CICIDS 2017 dataset created at the University of New Brunswick is popular in the information security data science research community. Typically, researchers use the summarized data to develop supervised machine learning models and test the classification performance of these models. In this paper, we explore the adequacy of the summarized data for high-performance classification. We show that machine learning models developed over summarized data are unlikely to have practical import. Finally, we postulate that researchers may have a higher probability of creating a useful system if they use raw (non-summarized) data.
+
+### Peer-reviewed Research Paper
+**URL:** https://www.researchgate.net/publication/376891771
+
+**Title:** Faulty use of the CIC-IDS 2017 dataset in information security research
+
+**Abstract:** The summarized traffic flow version of the Canadian Institute for Cybersecurity Intrusion Detection Evaluation dataset created at the University of New Brunswick in 2017 is popular in the information security data science research community. Typically, researchers use the summarized data to develop supervised machine learning models and test the classification performance of these models. In this paper, we explore the adequacy of the summarized data for high-performance classification. We show that machine learning models developed over summarized data are unlikely to have practical import. Finally, we postulate that researchers may have a higher probability of creating a useful system if they use raw (non-summarized) data.
+
+**Keywords:** Machine learning, Classification, Network security, Intrusion detection system, Network traffic analysis
