DPPred: An Effective Prediction Framework with Concise Discriminative Patterns

Publications

Please cite the following two papers if you are using our tools. Thanks!

Jingbo Shang, Meng Jiang, Wenzhu Tong, Jinfeng Xiao, Jian Peng, Jiawei Han. "DPPred: An Effective Prediction Framework with Concise Discriminative Patterns", accepted by IEEE Transactions on Knowledge and Data Engineering, Sept. 2017.
Jingbo Shang, Wenzhu Tong, Jian Peng, and Jiawei Han, "DPClass: An Effective but Concise Discriminative Patterns-Based Classification Framework", in Proc of 2016 SIAM Int. Conf. on Data Mining (SDM'16), Miami, FL, May 2016. [code] [slides]

Applications in Real-World Medical Dataset

Qian Cheng, Jingbo Shang, Joshua Juen, Jiawei Han and Bruce Schatz, "Mining Discriminative Patterns to Predict Health Status for Cardiopulmonary Patients", in Proc. of 2016 ACM Conf. on Bioinformatics, Computational Biology, and Health Informatics (ACM-BCB'16), Seattle, WA, Oct. 2016.

New Features

Comparing to DPClass, DPPred is now supporting two new features:

Regression & Multi-Class Classification.
Local Discriminative Pattern Discovery after Clustering.

Please find more details in this paper.

Requirements

This tool mainly requires

g++-4.8 or higher
matlab
python

The python libs that we are using

sklearn

We developed and tested it on Ubuntu 16.04.

The current executables require OpenMP, which does not come by default on OS X. To be able to run it on OS X, follow this stackoverflow post.

Simple Run

You could execute the code in the following way:

./run.sh <dataset_name> <task_type>

Example:

./run.sh adult classification
./run.sh bike regression

Local Patterns

If you are interested in local patterns, please use run_with_clustering.sh following the same format of run.sh.

Parameters

Overall, there are some parameters related to the pattern generation.

TOPK (default = 20) is the number of (global) discriminative patterns that you want to use in the prediction. For regression tasks or high-dimensional datasets, we recommend a larger value like 30.
MIN_SUP (default = 10) is the minimum number of training instances that should be contained in each leaf node in the random decision tree.
MAX_DEPTH (default = 6) is the maximum depth of the random decision tree, which is also the maximum length of patterns.
RANDOM_FEATURES (default = 4) is the number of random features will be tried for each node split in random decision trees.
RANDOM_POSITIONS (default = 8) is the number of random values will be tried for each selected feature during node split in random decision trees.
TREES (default = 100) is the number of trees.

If you are interested in local patterns, there are two more parameters:

CLUSTERS (default = 2) is the number of clusters you want to further investigate.
LOCAL_TOPK (default = 10) is the number of local discriminative patterns within each cluster.

We also provide K-Means as an alternative clustering method to the LDA.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
data		data
glmnet_matlab		glmnet_matlab
src		src
tools/lda/mallet-2.0.7		tools/lda/mallet-2.0.7
.gitignore		.gitignore
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
run.sh		run.sh
run_with_clustering.sh		run_with_clustering.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

DPPred: An Effective Prediction Framework with Concise Discriminative Patterns

Publications

Applications in Real-World Medical Dataset

New Features

Requirements

Simple Run

Local Patterns

Parameters

About

Releases

Packages

Languages

License

shangjingbo1226/DPPred

Folders and files

Latest commit

History

Repository files navigation

DPPred: An Effective Prediction Framework with Concise Discriminative Patterns

Publications

Applications in Real-World Medical Dataset

New Features

Requirements

Simple Run

Local Patterns

Parameters

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages