A Practitioner's Guide to the Performance of Deep-Learning Based Open Set Recognition Algorithms for Network Intrusion Detection Systems
- Clone the repository using the following command:
git clone https://github.com/bayegaspard/OpenSetPerf.git
- Make sure you do that from the dev branch.
- Download the Payload-Byte NIDS Dataset.
- Navigate to the root folder and place the downloaded CSV file in the dataset folder. The new structure will be dataset/Payload_data_CICIDS2017.csv for the CIC dataset and dataset/UNSW-NB15 for the UNSW-NB15 dataset.
- If you don't have pip3 installed, you can use the command below to install it.
sudo apt-get install python3-pip
- Navigate to the src directory using the command:
cd OpenSetPerf/src
- It is recommended to perform this test in a virtual environment. This step is optional.
pip3 install virtualenv
virtualenv opensetperf
source opensetperf/bin/activate
- Install required packages using the command below:
pip3 install -r requirements.txt
- Navigate up one directory, into the root directory of the repo:
cd ..
- You may need to create some folders in the Saves directory, namely OpenSetPerf/Saves/conf, OpenSetPerf/Saves/models, and OpenSetPerf/Saves/roc.
- Run the model using:
python3 src/main/main.py
- Get hyperparameter information by using:
python3 src/main/main.py -h
- Alternatively, you can run the model three times using:
chmod +x ./threeRuns.sh
./threeRuns.sh
- Saves and model outputs will be generated in the Saves folder.
- Edit the src/main/Config.py file to change the hyperparameters for the model. More information is in src/main/README.md. You can also set the hyperparameters with command line parameters; see python3 src/main/main.py -h for more details.
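The folder-creation step above can be scripted. The following is a minimal Python sketch, assuming it is run from the repository root. The helper name ensure_save_folders is hypothetical and not part of OpenSetPerf; the three folder names come from the step above.

```python
from pathlib import Path

# Folder names taken from the setup step above; the helper itself is hypothetical.
SAVE_SUBFOLDERS = ("conf", "models", "roc")


def ensure_save_folders(repo_root="."):
    """Create Saves/conf, Saves/models and Saves/roc under repo_root if missing."""
    created = []
    for name in SAVE_SUBFOLDERS:
        folder = Path(repo_root) / "Saves" / name
        # parents=True also creates Saves itself; exist_ok=True makes reruns harmless.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Calling ensure_save_folders() once from the repository root before the first run should be enough; because existing folders are left untouched, it is safe to call again.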
- requirements.txt
  - File containing the version numbers of the required external libraries.
- src
  - This is the folder that contains all of the code from this project.
  - main
    - Folder containing all of the central aspects for running the model.
    - main.py - The control file for the entire model.
    - Config.py - This is the file that controls all of the hyperparameters of the model.
    - Dataload.py - This file loads the whole dataset and splits it into chunks that the model can read.
    - FileHandeling.py - This file handles reading and writing files.
    - ModelStruct.py - This file defines the model and its structure, but it does not implement the different algorithms.
    - EndLayer.py - This file works with the folder CodeFromImplementations to implement each of the different algorithms.
    - plots.py - This file generates 4 png files of different matplotlib graphs.
    - GPU.py - This file helps run the model on different GPUs or move tensors from a GPU to the CPU.
    - GenerateImages - This file reads the save file and generates images in Saves/images/ to visualize the data.
    - helperFunctions.py - This file contains all other functions that are not contained in another file.
  - CodeFromImplementations
    - This is the code we used to implement the different algorithms, including:
      - OpenMax
      - Energy OOD
      - Competitive Overcomplete Output Layer (COOL)
      - Deep Open Classification (DOC)
      - OpenNet (iiMod), created from the equations in the related paper.
    - This code is modified as minimally as possible.
    - All of the implementations list a link at the top to where they were sourced from.
- Saves
  - This is the output folder that stores all of the metrics from the model.
  - It has many different types of files, such as:
    - Data/DataTest - These save the specific dataloaders from the last time the model was run, including the train/test split, so as not to contaminate the model if it is run again.
    - Scoresall.csv - This file saves the model state and all standard metrics from the model.
    - models - This folder stores the most trained models for each of the algorithms.
    - images - This is a folder that stores graphical representations of Scoresall.csv.
    - ROC_data - These are files that store the receiver operating characteristic data. They can be used to inform the choice of threshold. The first line is false positives, the second is true positives, and the third is the thresholds that achieve those positives.
  - The following files are still generated but are not used:
    - fscore - This saves the Config parameters and the associated f-score that those parameters achieved.
    - history/history{Algorithm} - These save all of the output measures from each of the algorithms after each epoch.
    - phase - Unused since a previous refactor. It used to save where in the model's training we last got to.
    - scores/scores{Algorithm} - Unused since a previous refactor. It is now unknown what is being stored.
    - hyperparam - Saves the Config from the last time the file was run.
    - unknown - Saves which classes were unknown the last time the file was run.
    - batch - Saves information about each batch that has been run. NOTE: this file can break if it is saved to too many times; you may need to delete it and allow it to regenerate.
- Tests
  - This is a folder of pytest tests to ensure the model works properly.
  - test_endlayer.py - Tests that the endlayer outputs correctly given a valid input.
  - test_fullRun.py - Runs the whole model on minimal settings with No-save mode so that it does not mess up the save file.
  - test_LongTests.py - Runs the code on the whole loop. It takes a long time, so it is not tested. It does not work.
  - test_loops.py - Tests if the loop functions in main.py work.
  - test_other.py - Tests things not in the other categories.
- datasets
  - We place the NIDS dataset in this folder.
  - Up to two more folders will automatically generate. If you get a warning that a file does not exist, try deleting the generated folders and allowing them to regenerate.
- build_number.txt
  - This is a number that is included in Saves/Scoresall.csv to indicate which version of the code was used to generate the outputs.
- threeRuns.sh
  - This is a simple and small shell script to run the model three times.
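The ROC_data entry in the Saves listing above describes three lines (false positives, true positives, thresholds), which suggests a simple way to choose a threshold. The sketch below is illustrative only. It assumes each line is comma-separated, which the listing does not state, and it picks the threshold that maximizes true positives minus false positives (Youden's J statistic), a common heuristic that is not necessarily what the authors used. Both function names are hypothetical.

```python
import csv


def load_roc_rows(path):
    """Read false positives, true positives and thresholds from the first three lines.

    Assumption: values on each line are comma-separated numbers.
    """
    with open(path, newline="") as handle:
        rows = [
            [float(value) for value in row if value.strip()]
            for row in csv.reader(handle)
            if row
        ]
    if len(rows) < 3:
        raise ValueError("expected three lines: false positives, true positives, thresholds")
    false_pos, true_pos, thresholds = rows[0], rows[1], rows[2]
    if not (len(false_pos) == len(true_pos) == len(thresholds)) or not thresholds:
        raise ValueError("the three lines must be non-empty and of equal length")
    return false_pos, true_pos, thresholds


def pick_threshold(false_pos, true_pos, thresholds):
    """Return the threshold with the largest (true positive - false positive) gap."""
    best = max(range(len(thresholds)), key=lambda i: true_pos[i] - false_pos[i])
    return thresholds[best]
```

To try it, pass the path of one of the generated ROC files to load_roc_rows and feed the three returned lists to pick_threshold. If the real files use a different delimiter or layout, only load_roc_rows needs to change.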
- The data located at Saves/Scoresall-ArchivedForPaper.csv is the data generated for a paper. Not all of the data in the file was used for the paper, as some of it is old. Look at the Version column: rows with version numbers greater than or equal to 422 were used with the src/main/Generateimages.py file to create the data for the paper.
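The version filter described above can be sketched with Python's standard csv module. This assumes the Version column holds numeric values. The function name rows_used_for_paper is hypothetical, and rows with a missing or non-numeric Version are skipped rather than guessed at.

```python
import csv


def rows_used_for_paper(csv_path, min_version=422):
    """Return the rows whose Version column is greater than or equal to min_version."""
    kept = []
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            try:
                version = float(row["Version"])
            except (KeyError, TypeError, ValueError):
                # Skip rows where Version is absent, empty, or not a number.
                continue
            if version >= min_version:
                kept.append(row)
    return kept
```

For example, rows_used_for_paper("Saves/Scoresall-ArchivedForPaper.csv") would return each qualifying row as a dictionary keyed by the column names in the file's header.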