Hackathon: https://events.kaspersky.com/hackathon/
Authors: https://github.com/aguschin, https://github.com/canorbal, https://github.com/ohld
Check out our elective course on Data Mining at MIPT: https://github.com/vkantor/MIPT_Data_Mining_In_Action_2016
Multivariate time series classification ("normal" TS vs. TS with anomalies), based on the Tennessee Eastman Problem: http://users.abo.fi/khaggblo/RS/McAvoy.pdf
A detailed task description can be found in README.pdf.
Data can be downloaded from https://yadi.sk/d/LzWCsMmo3GvWrt
- Train an LSTM to predict the time series 10 ticks ahead, using "normal" TS as training data (lstm_baseline_nextstep.ipynb).
- Use the trained LSTM to predict all TS from Train and Test, and compute new features from prediction-error statistics.
- Train XGBoost in xgboost_baseline.ipynb (producing xgb_best_4_knn.csv).
- Train ExtraTrees in extratrees_baseline-window-lstm.ipynb (producing et_window_250_lstm.csv).
- Train KNN in KNN_baseline.ipynb (producing knn_best.csv and the final mixed submission knn_xgb_et_RANKS_FINAL_002.csv).
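The name of the final submission (knn_xgb_et_RANKS_FINAL_002.csv) suggests the three model outputs are blended by averaging ranks rather than raw scores. A minimal sketch of such a rank-based blend, with hypothetical example scores (the actual blending weights and code live in KNN_baseline.ipynb and may differ):

```python
import numpy as np

def rank_blend(predictions):
    """Blend model scores by averaging their ranks.

    Ranking makes the blend insensitive to each model's score scale:
    only the ordering of the test files matters.
    """
    # argsort of argsort turns scores into ranks 0..n-1 (ties broken arbitrarily)
    ranks = [np.argsort(np.argsort(p)) for p in predictions]
    blended = np.mean(ranks, axis=0)
    # rescale to [0, 1] so the blend can be read as an anomaly score
    return blended / (len(predictions[0]) - 1)

# Hypothetical per-file anomaly scores from the three models
knn = np.array([0.1, 0.9, 0.4, 0.7])
xgb = np.array([0.2, 0.8, 0.3, 0.9])
et  = np.array([0.0, 0.7, 0.5, 0.6])
blended = rank_blend([knn, xgb, et])
```

For proper tie handling, `scipy.stats.rankdata` could be used instead of the double `argsort`.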
All three models share the same features: various statistics computed over the columns of each file and over their derivatives. All columns of a file belong to the same recording and thus share the same label (1 for anomalous TS, 0 for "normal" TS). ExtraTrees additionally uses "error features" derived from the LSTM predictions.
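The shared features and the LSTM "error features" described above can be sketched as follows. This is a hypothetical reconstruction: the particular statistics (mean, std, min, max) and the error aggregation are assumptions, and the exact feature set used in the notebooks may differ.

```python
import numpy as np

def shared_features(ts):
    """Per-file features shared by KNN, XGBoost, and ExtraTrees.

    ts: array of shape (n_ticks, n_channels) holding one multivariate TS.
    Returns statistics over each column and its first derivative.
    """
    deriv = np.diff(ts, axis=0)  # first difference along the time axis
    parts = []
    for block in (ts, deriv):
        parts += [block.mean(axis=0), block.std(axis=0),
                  block.min(axis=0), block.max(axis=0)]
    return np.concatenate(parts)

def lstm_error_features(ts, predicted):
    """Extra features for ExtraTrees: statistics of the LSTM prediction error.

    predicted: the LSTM's forecast of ts, same shape as ts.
    """
    err = np.abs(ts - predicted)
    return np.concatenate([err.mean(axis=0), err.std(axis=0), err.max(axis=0)])
```

With these feature vectors computed per file, each classifier is trained on one row per file, labeled 0 or 1.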