- Run 6 methods on 6 datasets
- Add L-diversity, Classic Mondrian (no hierarchies), Datafly algorithm
- Make NCP loss a separate module
- Implement DM, CAVG metrics
- Implement classification models (basic classifier, clustering)
- Run experiment on 6 datasets x 6 methods x 2 ML models
- Finish report
- (Improvement) T-closeness method, Incognito Algorithm
- (Optional) Simple de-anonymization attack
To anonymize a dataset, run:
`python anonymize.py --method=<model_type> --k=<k-anonymity> --dataset=<dataset_name>`
- model_type: [mondrian | classic_mondrian | mondrian_ldiv | topdown | cluster | datafly]
- dataset_name: [adult | cahousing | cmc | mgm | informs | italia]
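
To cover every method and dataset combination, the command above can be generated in a loop. The following is a minimal sketch, not part of the repository: the helper name `build_commands` and the example value of `k` are assumptions, and only the flags documented above are used.

```python
from itertools import product

# Method and dataset names as listed in the usage notes above.
METHODS = ["mondrian", "classic_mondrian", "mondrian_ldiv",
           "topdown", "cluster", "datafly"]
DATASETS = ["adult", "cahousing", "cmc", "mgm", "informs", "italia"]


def build_commands(k_values):
    """Return one anonymize.py command (as an argv list) per
    method / dataset / k combination. Hypothetical helper."""
    return [
        ["python", "anonymize.py",
         f"--method={method}", f"--k={k}", f"--dataset={dataset}"]
        for method, dataset, k in product(METHODS, DATASETS, k_values)
    ]


# k=10 is an arbitrary example value, not a recommended setting.
commands = build_commands(k_values=[10])
```

Each entry in `commands` can be printed with `" ".join(cmd)` or passed to `subprocess.run(cmd, check=True)` to run the anonymization.
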
- Basic Mondrian, Top-Down Greedy, Cluster-based (https://github.com/fhstp/k-AnonML)
- L-Diversity (https://github.com/Nuclearstar/K-Anonymity, https://github.com/qiyuangong/Mondrian_L_Diversity)
- Classic Mondrian (https://github.com/qiyuangong/Mondrian)
- Datafly Algorithm (https://github.com/nazilkbahar/python-datafly)
- Normalized Certainty Penalty (NCP), from "Utility-Based Anonymization for Privacy Preservation with Less Information Loss"
- Discernibility Metric (DM) and Average Equivalence Class Size (CAVG), from "A Systematic Comparison and Evaluation of k-Anonymization Algorithms for Practitioners"
- Privacy in a Mobile-Social World
- Code and ideas are based on "k-Anonymity in Practice: How Generalisation and Suppression Affect Machine Learning Classifiers"
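
For readers unfamiliar with the NCP, DM, and CAVG metrics named above, the following is a minimal sketch of their usual definitions. It is not the code used in this repository: all function names are hypothetical, records are assumed to be tuples whose quasi-identifier values are already generalized, suppressed records are ignored in DM, and NCP is shown for a single numeric value only.

```python
from collections import Counter


def equivalence_class_sizes(records, qi_indices):
    """Count records per equivalence class, i.e. per distinct
    combination of (generalized) quasi-identifier values."""
    return Counter(tuple(rec[i] for i in qi_indices) for rec in records)


def discernibility(records, qi_indices):
    """DM: sum of squared equivalence class sizes (lower is better).
    Assumption: no suppressed records, so no extra penalty term."""
    sizes = equivalence_class_sizes(records, qi_indices)
    return sum(size ** 2 for size in sizes.values())


def average_class_size(records, qi_indices, k):
    """CAVG: (records / number of classes) / k; 1.0 is the ideal value."""
    sizes = equivalence_class_sizes(records, qi_indices)
    return len(records) / (len(sizes) * k)


def ncp_numeric(low, high, domain_low, domain_high):
    """NCP of one numeric value generalized to [low, high]:
    the interval width divided by the attribute's full range."""
    if domain_high == domain_low:
        return 0.0
    return (high - low) / (domain_high - domain_low)


# Toy data: (age range, zip prefix, sensitive value); QIs are columns 0 and 1.
records = [
    ("20-29", "130**", "flu"),
    ("20-29", "130**", "cold"),
    ("30-39", "148**", "flu"),
    ("30-39", "148**", "flu"),
]
dm = discernibility(records, qi_indices=[0, 1])            # 2^2 + 2^2 = 8
cavg = average_class_size(records, qi_indices=[0, 1], k=2)  # 4 / (2 * 2) = 1.0
ncp = ncp_numeric(20, 29, domain_low=20, domain_high=39)    # 9 / 19
```

A dataset-level NCP is typically obtained by averaging the per-value penalties over all records and quasi-identifier attributes; categorical attributes need a hierarchy-based variant that this sketch leaves out.
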