Minor changes to the readme and comments.

RayMusk · Feb 13, 2020 · b556cd9
1 parent 9d34e8a

Showing 2 changed files with 2 additions and 3 deletions.
diff --git a/README.md b/README.md
@@ -84,7 +84,7 @@ optional arguments:
                         trained features_dir input/output directory path
                         (default: None)
   --logs_type logs_type
-                        Input type of logs. (default: ['original'])
+                        Input type of logs. (default: ['open_Apache'])
   --kfold kfold         kfold crossvalidation (default: None)
   --healthy_label healthy_label
                         the labels of unlabeled logs (default: ['unlabeled'])
@@ -313,7 +313,7 @@ High level overview of each of the experiments included in the repository.
 It would compare PULearning+RandomForest with any other given anomaly detection algorithm. Using the given data, it would start with having only healthy logs on the unlabeled data and gradually increase this up to 10%. To test PULearning, run the following command in the home directory of this project: 
 
 ```
-python -m LogClass.test_pu --logs_type "bgl_old" --raw_logs "./Data/RAS from Weibin/RAS_raw_label.dat" --binary_classifier regular --ratio 8 --step 1 --top_percentage 11 --kfold 3
+python -m LogClass.test_pu --logs_type "bgl" --raw_logs "./Data/RAS from Weibin/RAS_raw_label.dat" --binary_classifier regular --ratio 8 --step 1 --top_percentage 11 --kfold 3
 ```
 
 This would first preprocess the logs. Then, for each kfold iteration, it will perform feature extraction and force a 1:8 ratio of anomalous:healthy logs. Finally with a step of 1% it will go from 0% to 10% anomalous logs in the unlabeled set and compare the accuracy of both anomaly detection algorithms. If none specified it will default to a plain RF. 

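The README paragraph in the hunk above describes a sweep from 0% to 10% anomalous logs in steps of 1%, with a forced 1:8 anomalous:healthy ratio. The following is only a rough sketch of how the `--top_percentage 11`, `--step 1` and `--ratio 8` values could map onto that description. Both helper names are hypothetical and are not taken from the LogClass code, and treating `top_percentage` as an exclusive upper bound is an assumption.

```python
def anomalous_percentages(top_percentage, step):
    """Hypothetical helper: percentages of anomalous logs to try in the unlabeled set.

    Assumes top_percentage is an exclusive upper bound, so 11 with step 1
    yields 0, 1, ..., 10, matching the "0% to 10%" wording in the README.
    """
    return list(range(0, top_percentage, step))


def healthy_count_for_ratio(n_anomalous, ratio):
    """Hypothetical helper: healthy logs needed for a 1:ratio anomalous:healthy split."""
    return n_anomalous * ratio


if __name__ == "__main__":
    print(anomalous_percentages(top_percentage=11, step=1))
    print(healthy_count_for_ratio(n_anomalous=100, ratio=8))
```

Under these assumptions, 100 anomalous logs would be paired with 800 healthy ones, and the sweep would visit eleven percentages.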
diff --git a/preprocess/utils.py b/preprocess/utils.py
@@ -21,7 +21,6 @@ def remove_parameters(msg):
     msg = re.sub(re_sub_5, "", msg)
     msg = re.sub(re_sub_6, " ", msg)
     L = msg.split()
-    # p = re.compile("[^(A-Za-z)]")
     # Filtering strings that have non-letter tokens
     new_msg = [k for k in L if not p.search(k)]
     msg = " ".join(new_msg)
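The line removed in this hunk was already a comment, so the filter still depends on a `p` that is defined outside the lines shown. As a minimal sketch of what that filter does, the snippet below assumes `p` is compiled from the same pattern that appeared in the removed comment. The function name `keep_letter_tokens` is hypothetical, and the regex substitutions earlier in `remove_parameters` are left out.

```python
import re

# Assumption: `p` uses the pattern from the removed comment and is compiled
# once at module level; the real definition is not visible in this diff.
p = re.compile("[^(A-Za-z)]")


def keep_letter_tokens(msg):
    """Drop whitespace-separated tokens containing any disallowed character."""
    tokens = msg.split()
    # Inside a character class, "(" and ")" are literals, so this pattern
    # permits letters and parentheses and rejects everything else.
    kept = [k for k in tokens if not p.search(k)]
    return " ".join(kept)


if __name__ == "__main__":
    print(keep_letter_tokens("connection from 10.0.0.1 closed (timeout)"))
    # prints: connection from closed (timeout)
```

Note that with this pattern a token such as `(timeout)` survives, because the parentheses are part of the allowed set rather than a grouping construct.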