Commit

Minor changes to the readme and comments.
federicozaiter committed Feb 13, 2020
1 parent 9d34e8a commit b556cd9
Showing 2 changed files with 2 additions and 3 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -84,7 +84,7 @@ optional arguments:
trained features_dir input/output directory path
(default: None)
--logs_type logs_type
- Input type of logs. (default: ['original'])
+ Input type of logs. (default: ['open_Apache'])
--kfold kfold kfold crossvalidation (default: None)
--healthy_label healthy_label
the labels of unlabeled logs (default: ['unlabeled'])
@@ -313,7 +313,7 @@ High level overview of each of the experiments included in the repository.
It would compare PULearning+RandomForest with any other given anomaly detection algorithm. Using the given data, it would start with having only healthy logs on the unlabeled data and gradually increase this up to 10%. To test PULearning, run the following command in the home directory of this project:

```
- python -m LogClass.test_pu --logs_type "bgl_old" --raw_logs "./Data/RAS from Weibin/RAS_raw_label.dat" --binary_classifier regular --ratio 8 --step 1 --top_percentage 11 --kfold 3
+ python -m LogClass.test_pu --logs_type "bgl" --raw_logs "./Data/RAS from Weibin/RAS_raw_label.dat" --binary_classifier regular --ratio 8 --step 1 --top_percentage 11 --kfold 3
```

This would first preprocess the logs. Then, for each kfold iteration, it will perform feature extraction and force a 1:8 ratio of anomalous:healthy logs. Finally with a step of 1% it will go from 0% to 10% anomalous logs in the unlabeled set and compare the accuracy of both anomaly detection algorithms. If none specified it will default to a plain RF.
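
The experiment described in this README hunk can be sketched roughly as follows. This is a minimal illustration, not the code in `LogClass.test_pu`: the helper names `enforce_ratio`, `contamination_levels`, and `build_unlabeled` are hypothetical, and the mapping of `--step 1 --top_percentage 11` onto `range(0, 11, 1)` is an assumption that happens to match the "0% to 10%" wording.

```python
import random


def enforce_ratio(anomalous, healthy, ratio, seed=0):
    """Subsample healthy logs so there are at most `ratio` healthy per anomalous log."""
    rng = random.Random(seed)
    limit = ratio * len(anomalous)
    if len(healthy) <= limit:
        return list(healthy)
    return rng.sample(list(healthy), limit)


def contamination_levels(step, top_percentage):
    """Percentages to sweep; the upper bound is exclusive (assumed, not confirmed)."""
    return list(range(0, top_percentage, step))


def build_unlabeled(anomalous, healthy, percent, seed=0):
    """Hide anomalous logs among healthy ones so they form about `percent`% of the set."""
    rng = random.Random(seed)
    n_hidden = round(len(healthy) * percent / (100 - percent))
    n_hidden = min(n_hidden, len(anomalous))
    hidden = rng.sample(list(anomalous), n_hidden)
    return list(healthy) + hidden, n_hidden


# Toy data standing in for feature vectors of real logs.
anomalous = [f"a{i}" for i in range(100)]
healthy = enforce_ratio(anomalous, [f"h{i}" for i in range(2000)], ratio=8)

for percent in contamination_levels(step=1, top_percentage=11):
    unlabeled, n_hidden = build_unlabeled(anomalous, healthy, percent)
    # Here each anomaly detector would be trained and scored on `unlabeled`.
    print(percent, len(unlabeled), n_hidden)
```

Fixing the seed keeps the subsampling reproducible across k-fold iterations, which makes the accuracy comparison between the two detectors easier to interpret.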
1 change: 0 additions & 1 deletion preprocess/utils.py
@@ -21,7 +21,6 @@ def remove_parameters(msg):
msg = re.sub(re_sub_5, "", msg)
msg = re.sub(re_sub_6, " ", msg)
L = msg.split()
- # p = re.compile("[^(A-Za-z)]")
# Filtering strings that have non-letter tokens
new_msg = [k for k in L if not p.search(k)]
msg = " ".join(new_msg)
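
The token filter at the end of `remove_parameters` can be tried in isolation with the sketch below. It assumes `p` is the pattern shown in the commented-out line of this hunk (in the repository it is presumably defined elsewhere in `preprocess/utils.py`); the function name `keep_letter_tokens` is hypothetical. Note that inside a character class the parentheses are literal, so `[^(A-Za-z)]` matches any character that is neither a letter nor `(` or `)`.

```python
import re

# Assumption: same pattern as the commented-out line in the hunk.
p = re.compile("[^(A-Za-z)]")


def keep_letter_tokens(msg):
    """Drop every whitespace-separated token that contains a character matched by `p`."""
    tokens = msg.split()
    # A token survives only if it consists solely of letters and parentheses.
    return " ".join(k for k in tokens if not p.search(k))


print(keep_letter_tokens("error at node 42 (core) failed"))
# prints: error at node (core) failed
```

Tokens such as `42` are removed because they contain digits, while `(core)` is kept because parentheses belong to the excluded class.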