GAMA: A Multi-graph-based Anomaly Detection Framework for Business Processes via Graph Neural Networks
This is the source code of our paper 'GAMA: A Multi-graph-based Anomaly Detection Framework for Business Processes via Graph Neural Networks'.
python main.py --mode eval --TF FAP
Two modes have been implemented:
- eval: Uses the anomalous event logs in the eventlogs folder to produce evaluation results (for reproducing the experiments).
- test: Detects anomalies in an event log in the 'xes' format and outputs the anomaly detection results (for practical application).
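The two flags shown in the command above could be parsed along the following lines. This is only a minimal sketch: the function name `build_parser`, the help strings, and the choice to make both flags required are assumptions for illustration, and the repository's actual main.py may be organized differently or accept further options.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: only the flag names and their values come from the README.
    parser = argparse.ArgumentParser(description="GAMA command-line sketch")
    parser.add_argument(
        "--mode",
        choices=["eval", "test"],
        required=True,
        help="eval: reproduce the experiments; test: detect anomalies in an XES log",
    )
    parser.add_argument(
        "--TF",
        choices=["AN", "PAV", "FAP"],
        required=True,
        help="teacher forcing style",
    )
    return parser


# Equivalent to: python main.py --mode eval --TF FAP
args = build_parser().parse_args(["--mode", "eval", "--TF", "FAP"])
print(args.mode, args.TF)
```

Restricting both flags with `choices` means a typo such as `--TF FPA` is rejected immediately instead of failing later in the run.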
Three teacher forcing (TF) styles have been implemented:
- AN: We consider that the current attribute value depends mainly on the current activity name. Therefore, at the current event, the ground truth activity name is used to guide the prediction of the probability distribution.
- PAV: We consider that the current attribute value depends mainly on the previous attribute value. Therefore, the previous ground truth attribute value is used to guide the prediction of the probability distribution.
- FAP: We consider that the current attribute value depends both on the current activity name and the previous attribute value. Therefore, the fusion of current ground truth activity name and the previous ground truth attribute value is used to guide the prediction of the probability distribution.
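The difference between the three styles comes down to which ground truth signal is fed to the predictor. The sketch below illustrates that selection on plain feature vectors. The function name `guidance_input` is hypothetical, and fusing by concatenation is an assumption made for illustration; the fusion used in GAMA itself may differ.

```python
from typing import List


def guidance_input(style: str,
                   activity: List[float],
                   prev_attribute: List[float]) -> List[float]:
    """Pick the ground truth signal that guides the attribute prediction."""
    if style == "AN":
        # Current activity name only.
        return list(activity)
    if style == "PAV":
        # Previous attribute value only.
        return list(prev_attribute)
    if style == "FAP":
        # Both signals; concatenation is an assumed stand-in for the fusion step.
        return list(activity) + list(prev_attribute)
    raise ValueError(f"unknown teacher forcing style: {style!r}")


activity_vec = [1.0, 0.0]   # toy encoding of the current activity name
prev_attr_vec = [0.5]       # toy encoding of the previous attribute value
for style in ("AN", "PAV", "FAP"):
    print(style, guidance_input(style, activity_vec, prev_attr_vec))
```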
Six commonly used real-life logs:
i) Billing: This log contains events that pertain to the billing of medical services provided by a hospital.
ii) Receipt: This log contains records of the receiving phase of the building permit application process in an anonymous municipality.
iii) Sepsis: This log contains events of sepsis cases from a hospital.
iv) RTFMP: This log contains events from an information system managing road traffic fines.
v) Permit: This log contains events related to travel permits (including all related events of relevant prepaid travel cost declarations and travel declarations).
vi) Declaration: This log contains events related to international travel declarations.
Eight synthetic logs: Paper, P2P, Small, Medium, Large, Huge, Gigantic, and Wide.
The summary of statistics for each event log is presented below:
Log | #Activities | #Traces | #Events | Max trace length | Min trace length | #Attributes | #Attribute values |
---|---|---|---|---|---|---|---|
Gigantic | 76-78 | 5000 | 28243-31989 | 11 | 3 | 1-4 | 70-363 |
Huge | 54 | 5000 | 36377-42999 | 11 | 5 | 1-4 | 69-340 |
Large | 42 | 5000 | 51099-56850 | 12 | 10 | 1-4 | 68-292 |
Medium | 32 | 5000 | 28416-31372 | 8 | 3 | 1-4 | 66-276 |
P2P | 13 | 5000 | 37941-42634 | 11 | 7 | 1-4 | 39-146 |
Paper | 14 | 5000 | 49839-54390 | 12 | 9 | 1-4 | 36-128 |
Small | 20 | 5000 | 42845-46060 | 10 | 7 | 1-4 | 39-144 |
Wide | 23-34 | 5000 | 29128-31228 | 7 | 5-6 | 1-4 | 53-264 |
Billing | 18 | 100000 | 451359 | 217 | 1 | 0 | 0 |
Receipt | 27 | 1434 | 8577 | 25 | 1 | 2 | 58 |
Sepsis | 16 | 1050 | 15214 | 185 | 3 | 1 | 26 |
RTFMP | 11 | 150370 | 561470 | 20 | 2 | 0 | 0 |
Permit | 51 | 7065 | 86581 | 90 | 3 | 2 | 10 |
Declaration | 34 | 6449 | 72151 | 27 | 3 | 2 | 10 |
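As a rough illustration of how the trace-related columns of this table can be derived, the sketch below computes them for a toy log represented as a list of traces, each a list of activity names. The function name `summarize` and the toy log are hypothetical, and the attribute columns are left out because they depend on how event attributes are stored.

```python
from typing import Dict, List


def summarize(traces: List[List[str]]) -> Dict[str, int]:
    """Compute basic statistics for a log given as lists of activity names."""
    lengths = [len(trace) for trace in traces]
    return {
        "activities": len({activity for trace in traces for activity in trace}),
        "traces": len(traces),
        "events": sum(lengths),
        "max_trace_length": max(lengths),
        "min_trace_length": min(lengths),
    }


# Toy log with three traces over the activities A, B and C.
toy_log = [["A", "B", "C"], ["A", "C"], ["A", "B", "B", "C"]]
stats = summarize(toy_log)
print(stats)
```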
Logs with artificial anomalies, at anomaly ratios ranging from 5% to 45%, are stored in the 'eventlogs' folder. File names follow the pattern log_name-anomaly_ratio-ID.
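A file name following this pattern can be split back into its three parts as sketched below. The helper `parse_log_name` and the example name `Paper-0.30-1` are hypothetical. The sketch expects the name without its file extension and keeps the ratio as a string, since the exact way the ratio is written in the file names is not stated here.

```python
from typing import NamedTuple


class LogName(NamedTuple):
    log_name: str
    anomaly_ratio: str  # kept as text; the exact ratio notation is an assumption
    run_id: str


def parse_log_name(stem: str) -> LogName:
    """Split 'log_name-anomaly_ratio-ID' (file name without extension)."""
    # Split from the right so a log name that itself contains '-' stays intact.
    parts = stem.rsplit("-", 2)
    if len(parts) != 3:
        raise ValueError(f"unexpected file name: {stem!r}")
    return LogName(*parts)


# Hypothetical example name, for illustration only.
parsed = parse_log_name("Paper-0.30-1")
print(parsed.log_name, parsed.anomaly_ratio, parsed.run_id)
```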
[Figure: Critical difference diagram over trace-level anomaly detection]
[Figure: Critical difference diagram over attribute-level anomaly detection]
F-scores over synthetic logs, where 'T' and 'A' denote trace-level and attribute-level anomaly detection, respectively.
Method | Paper (T) | Paper (A) | P2P (T) | P2P (A) | Small (T) | Small (A) | Medium (T) | Medium (A) | Large (T) | Large (A) | Huge (T) | Huge (A) | Gigantic (T) | Gigantic (A) | Wide (T) | Wide (A) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
OC-SVM | 0.498 | - | 0.480 | - | 0.522 | - | 0.446 | - | 0.480 | - | 0.446 | - | 0.462 | - | 0.460 | - |
Naive | 0.866 | - | 0.850 | - | 0.898 | - | 0.691 | - | 0.715 | - | 0.690 | - | 0.574 | - | 0.779 | - |
Sampling | 0.901 | - | 0.886 | - | 0.896 | - | 0.860 | - | 0.910 | - | 0.890 | - | 0.800 | - | 0.888 | - |
GAE | 0.472 | - | 0.559 | - | 0.468 | - | 0.449 | - | 0.530 | - | 0.429 | - | 0.434 | - | 0.561 | - |
DAE | 0.799 | 0.468 | 0.767 | 0.475 | 0.829 | 0.463 | 0.713 | 0.436 | 0.747 | 0.433 | 0.691 | 0.415 | 0.580 | 0.288 | 0.753 | 0.455 |
VAE | 0.828 | 0.190 | 0.655 | 0.212 | 0.788 | 0.219 | 0.637 | 0.230 | 0.772 | 0.201 | 0.589 | 0.213 | 0.495 | 0.181 | 0.640 | 0.230 |
LAE | 0.678 | 0.243 | 0.666 | 0.266 | 0.748 | 0.239 | 0.584 | 0.270 | 0.571 | 0.250 | 0.531 | 0.268 | 0.504 | 0.234 | 0.699 | 0.271 |
BINet | 0.543 | 0.330 | 0.557 | 0.342 | 0.566 | 0.358 | 0.521 | 0.319 | 0.549 | 0.333 | 0.526 | 0.331 | 0.525 | 0.320 | 0.551 | 0.345 |
GAMA-AN | 0.949 | 0.701 | 0.950 | 0.686 | 0.955 | 0.717 | 0.873 | 0.716 | 0.945 | 0.768 | 0.916 | 0.763 | 0.821 | 0.701 | 0.921 | 0.724 |
GAMA-PAV | 0.976 | 0.675 | 0.974 | 0.664 | 0.981 | 0.663 | 0.903 | 0.654 | 0.944 | 0.678 | 0.909 | 0.663 | 0.809 | 0.614 | 0.950 | 0.670 |
GAMA-FAP | 0.955 | 0.699 | 0.949 | 0.683 | 0.955 | 0.708 | 0.872 | 0.700 | 0.947 | 0.752 | 0.922 | 0.750 | 0.833 | 0.691 | 0.923 | 0.712 |
F-scores over real-life logs, where 'T' and 'A' denote trace-level and attribute-level anomaly detection, respectively.
Method | Billing (T) | Billing (A) | Receipt (T) | Receipt (A) | Sepsis (T) | Sepsis (A) | RTFMP (T) | RTFMP (A) | Permit (T) | Permit (A) | Declaration (T) | Declaration (A) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
OC-SVM | 0.340 | - | 0.464 | - | 0.415 | - | 0.507 | - | 0.405 | - | 0.449 | - |
Naive | 0.668 | - | 0.638 | - | 0.392 | - | 0.776 | - | 0.462 | - | 0.495 | - |
Sampling | 0.701 | - | 0.647 | - | 0.391 | - | 0.721 | - | 0.458 | - | 0.507 | - |
GAE | 0.385 | - | 0.420 | - | 0.391 | - | 0.341 | - | 0.386 | - | 0.406 | - |
DAE | 0.754 | 0.444 | 0.650 | 0.158 | 0.461 | 0.136 | 0.865 | 0.498 | 0.522 | 0.182 | 0.576 | 0.201 |
VAE | 0.731 | 0.435 | 0.524 | 0.134 | 0.448 | 0.172 | 0.813 | 0.517 | 0.484 | 0.188 | 0.476 | 0.180 |
LAE | 0.784 | 0.509 | 0.526 | 0.218 | 0.408 | 0.126 | 0.874 | 0.505 | 0.486 | 0.287 | 0.514 | 0.345 |
BINet | 0.621 | 0.442 | 0.575 | 0.416 | 0.435 | 0.192 | 0.744 | 0.493 | 0.641 | 0.423 | 0.678 | 0.499 |
GAMA-AN | 0.792 | 0.545 | 0.763 | 0.548 | 0.570 | 0.457 | 0.899 | 0.534 | 0.682 | 0.428 | 0.727 | 0.461 |
GAMA-PAV | 0.791 | 0.544 | 0.778 | 0.538 | 0.510 | 0.376 | 0.914 | 0.574 | 0.634 | 0.369 | 0.669 | 0.406 |
GAMA-FAP | 0.811 | 0.548 | 0.752 | 0.535 | 0.573 | 0.450 | 0.914 | 0.576 | 0.679 | 0.402 | 0.718 | 0.444 |
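For readers unfamiliar with the metric, the sketch below computes an F-score from binary labels, assuming the reported values are the usual F1 score (the harmonic mean of precision and recall) with 1 marking an anomaly. The function name `f1_score` and the toy labels are illustrative; how anomaly scores are thresholded into labels in the experiments is not covered here. The same formula would apply with one label per trace for 'T' and one label per attribute value for 'A'.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for binary labels, where 1 marks an anomaly and 0 a normal item."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels: 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
score = f1_score(y_true, y_pred)
print(round(score, 3))
```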
@article{guan2024gama,
title={GAMA: A multi-graph-based anomaly detection framework for business processes via graph neural networks},
author={Guan, Wei and Cao, Jian and Gu, Yang and Qian, Shiyou},
journal={Information Systems},
volume={124},
pages={102405},
year={2024},
publisher={Elsevier}
}