HPCSys-Lab/ML-cybersec-students-material

Repository files navigation

Module MCS 016 - Machine Learning for Cyber-Security & Artificial Intelligence

Part 1: Fundamental concepts

  • Introduction
    • Examples
  • Hands-on: a small problem for students
  • (Break)
  • Decision Trees: Concept and characteristics, implementation, and an example of application
    • Information gain
    • Notion of supervised learning
    • Classification
    • Algorithms and implementation principles: J48, VFDT
  • Hands-on with cyber-security data
    • First exercise: without normalization
    • Normalization process
    • Examples using Weka with and without normalization
    • Analysis of results
  • Bayesian networks: Concept and characteristics, implementation, and an example of application
    • Probability and inference
    • Algorithms and implementation principles: Naive Bayes
    • Examples of application in cyber-security using Weka
  • Clustering: Concept and characteristics, implementation
    • Distance metrics
    • Notion of unsupervised learning
    • Algorithms and implementation principles: K-means, KNN (k-nearest neighbours, a distance-based supervised classifier)
    • Examples of application in cyber-security using Weka
      • Binarization
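
As an illustration of the Part 1 topics, the sketch below computes the information gain used to choose decision-tree splits and applies min-max normalization, one common way to rescale numeric attributes. It is a minimal, standard-library Python sketch, not the course's Weka material: the function names and the toy records (`protocol`, `syn_only`, `attack`/`normal`) are hypothetical. Note that C4.5, which J48 implements in Weka, ranks splits by gain ratio, i.e. this information gain divided by the split information.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting `rows` on a nominal `attribute`."""
    # Group the class labels by the value that the attribute takes in each row.
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    # Weighted average entropy of the child nodes after the split.
    remainder = sum(len(part) / len(labels) * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder


def min_max_normalize(values):
    """Rescale numeric values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical toy records: two nominal attributes and a class label per row.
rows = [
    {"protocol": "tcp", "syn_only": "yes"},
    {"protocol": "tcp", "syn_only": "yes"},
    {"protocol": "udp", "syn_only": "no"},
    {"protocol": "tcp", "syn_only": "no"},
]
labels = ["attack", "attack", "normal", "normal"]

print(information_gain(rows, labels, "syn_only"))  # 1.0: a perfect split
print(information_gain(rows, labels, "protocol"))  # about 0.31
print(min_max_normalize([10, 20, 30]))             # [0.0, 0.5, 1.0]
```

Splitting on `syn_only` separates the two classes perfectly (a gain of 1 bit), while `protocol` leaves a mixed `tcp` branch and gains only about 0.31 bits, so a tree learner using this criterion would test `syn_only` first.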

Part 2: Neural networks

  • Concept and characteristics, the perceptron model
  • Feed-Forward (FF) Networks, Multilayer Feed-Forward Networks
  • Back-propagation and learning
  • Deep learning
  • Algorithms and implementation principles: Linear regression, MLP
  • Implementation and an example of application in cyber-security using Weka
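
The perceptron model listed above can be illustrated with a short sketch of Rosenblatt's learning rule. This is a toy under stated assumptions, not the course implementation: it trains a single unit with a step activation on the logical AND function instead of cyber-security data, and the names `predict` and `train_perceptron` are hypothetical. An MLP stacks layers of such units and replaces the step with a differentiable activation so that back-propagation can compute gradients.

```python
def predict(weights, bias, x):
    """Step activation: output 1 if the weighted sum plus bias is positive."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, targets, learning_rate=0.1, max_epochs=20):
    """Perceptron rule: nudge the weights toward the target on every mistake."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, target in zip(samples, targets):
            error = target - predict(weights, bias, x)  # -1, 0 or +1
            if error:
                mistakes += 1
                weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
                bias += learning_rate * error
        if mistakes == 0:  # converged: every training sample is classified correctly
            break
    return weights, bias


# Toy, linearly separable problem (logical AND), standing in for real features.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 0, 0, 1]
weights, bias = train_perceptron(samples, targets)
print([predict(weights, bias, x) for x in samples])  # [0, 0, 0, 1]
```

Because AND is linearly separable, the loop stops once an epoch passes without mistakes; on a non-separable problem such as XOR a single perceptron never converges, which is the usual motivation for multilayer networks.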

Part 3: Stream mining

  • Concept and characteristics: data streams, single pass, concept drift
  • Implementation and an example of application in cyber-security
  • Algorithms and implementation principles: Novelty detection
  • Implementation and an example of application in cyber-security using MOA
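
The single-pass and novelty-detection ideas can be sketched with a very simple detector that keeps only a running mean and variance of one numeric feature and flags values more than three standard deviations away. This is a hedged illustration, not one of the novelty-detection algorithms covered with MOA: the class name, the 3-sigma threshold, the warm-up length and the packet-rate values are all assumptions.

```python
import math


class StreamingNoveltyDetector:
    """Single-pass z-score detector over a numeric stream (illustrative only).

    Running mean and variance are maintained with Welford's online algorithm,
    so each value is read once and memory use is constant.
    """

    def __init__(self, threshold=3.0, warmup=5):
        self.threshold = threshold  # how many standard deviations count as "novel"
        self.warmup = warmup        # values to observe before raising any flag
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0               # sum of squared deviations from the mean

    def std(self):
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def update(self, x):
        """Return True if x looks novel, then fold x into the running statistics."""
        novel = False
        if self.n >= self.warmup:
            sd = self.std()
            novel = sd > 0 and abs(x - self.mean) > self.threshold * sd
        # Every value updates the model, so slow drift is gradually absorbed.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return novel


# Hypothetical packets-per-second readings with one sudden burst.
rates = [100, 102, 98, 101, 99, 100, 103, 97, 500, 101]
detector = StreamingNoveltyDetector()
flags = [detector.update(r) for r in rates]
print(flags)  # only the burst (500) is flagged
```

Because every value is folded into the statistics, a gradual shift in the traffic level (concept drift) is slowly absorbed, but the burst itself also inflates the variance; dedicated stream learners treat novel points and drift more carefully.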

Part 4: Data mining and machine learning for cyber-security

  • Phases of knowledge discovery in databases (KDD)
  • Pre-processing: normalization, binarization
  • Data types for cybersecurity: packets, flows, log files
  • Public datasets for cybersecurity: CIC-IDS, CTU-13, Kyoto, ISCX-Botnet, ISCX-SlowDoS
  • Reducing false positives: the trade-off between precision and recall
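
The precision/recall trade-off can be made concrete by sweeping the alert threshold of a scoring detector. In this sketch the scores and labels are hypothetical and the helper names are made up for illustration; raising the threshold removes false positives (higher precision, lower false-positive rate) at the cost of missed attacks (lower recall).

```python
def confusion_counts(scores, is_attack, threshold):
    """Count TP, FP, TN, FN when every score >= threshold raises an alert."""
    tp = fp = tn = fn = 0
    for score, attack in zip(scores, is_attack):
        alert = score >= threshold
        if alert and attack:
            tp += 1
        elif alert:
            fp += 1  # false positive: alert on benign traffic
        elif attack:
            fn += 1  # false negative: missed attack
        else:
            tn += 1
    return tp, fp, tn, fn


def precision_recall_fpr(scores, is_attack, threshold):
    """Precision, recall and false-positive rate at one threshold (0.0 if undefined)."""
    tp, fp, tn, fn = confusion_counts(scores, is_attack, threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    false_positive_rate = fp / (fp + tn) if fp + tn else 0.0
    return precision, recall, false_positive_rate


# Hypothetical detector scores and ground truth for eight flows.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20]
is_attack = [True, True, False, True, False, False, True, False]

for threshold in (0.5, 0.85):
    print(threshold, precision_recall_fpr(scores, is_attack, threshold))
# 0.5 (0.6, 0.75, 0.5)
# 0.85 (1.0, 0.5, 0.0)
```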

Part 5: Hybrid learning

  • Ensembles
  • Bagging
  • Boosting
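
Bagging can be sketched in a few lines: train the same base learner on bootstrap replicates of the training set and combine the models by majority vote. Here the base learner is a one-feature decision stump, and the data, the feature and the helper names are hypothetical. Boosting differs in that models are trained sequentially, each one giving more weight to the examples its predecessors misclassified (as in AdaBoost).

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) examples uniformly with replacement."""
    return [rng.choice(data) for _ in data]


def train_stump(sample):
    """Decision stump on one numeric feature: predict 1 when x >= threshold.

    Every observed x is tried as a threshold; the one with the fewest
    training errors on this sample is kept.
    """
    best_threshold, best_errors = None, None
    for threshold, _ in sample:
        errors = sum((x >= threshold) != bool(y) for x, y in sample)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def bagging_fit(data, n_models=15, seed=0):
    """Train one stump per bootstrap replicate of the training data."""
    rng = random.Random(seed)
    return [train_stump(bootstrap_sample(data, rng)) for _ in range(n_models)]


def bagging_predict(thresholds, x):
    """Majority vote of the stumps (an odd n_models avoids ties)."""
    votes = Counter(int(x >= t) for t in thresholds)
    return votes.most_common(1)[0][0]


# Hypothetical data: (failed logins per minute, 1 = brute-force attempt).
data = [(1, 0), (2, 0), (3, 0), (4, 0), (5, 1), (6, 1), (7, 1), (8, 1)]
ensemble = bagging_fit(data)
print([bagging_predict(ensemble, x) for x in (1, 2, 7, 8)])  # [0, 0, 1, 1]
```

Each bootstrap replicate omits some examples, so the stumps disagree near the class boundary; averaging their votes is what reduces the variance of the individual models.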

Part 6: Tools

  • An overview of tools
    • Tools for learning/testing techniques and algorithms (WEKA, MOA)
    • Deep learning frameworks (PyTorch, TensorFlow, etc.)
    • Big Data frameworks (Hadoop, Spark, Flink)

Bibliography

  • Data Mining: Practical Machine Learning Tools and Techniques, 4th Edition. Ian H. Witten, Eibe Frank, Mark A. Hall, Christopher J. Pal. Morgan Kaufmann, 2017.
  • Machine Learning. Tom Mitchell. McGraw-Hill, 1997.

Scientific papers:

  • Buczak, Anna L., and Erhan Guven. "A survey of data mining and machine learning methods for cyber security intrusion detection." IEEE Communications Surveys & Tutorials 18.2 (2015): 1153-1176.
  • Viegas, Eduardo, et al. "BigFlow: Real-time and reliable anomaly-based intrusion detection for high-speed networks." Future Generation Computer Systems 93 (2019): 473-485.

Datasets