shangjingbo1226/DPPred

DPPred: An Effective Prediction Framework with Concise Discriminative Patterns

Publications

Please cite the following two papers if you are using our tools. Thanks!

Applications in Real-World Medical Dataset

New Features

Compared to DPClass, DPPred now supports two new features:

  • Regression & Multi-Class Classification.
  • Local Discriminative Pattern Discovery after Clustering.

Please find more details in this paper.

Requirements

This tool mainly requires

g++-4.8 or higher
matlab
python

The Python libraries that we use:

sklearn

We developed and tested it on Ubuntu 16.04.

The current executables require OpenMP, which is not included by default on OS X. To run the tool on OS X, follow this stackoverflow post.

Simple Run

You can run the code as follows:

./run.sh <dataset_name> <task_type>

Example:

./run.sh adult classification
./run.sh bike regression
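
For batch experiments, the same invocation can be scripted. The sketch below is a minimal Python wrapper, assuming it is run from the repository root where run.sh lives. The helper names build_run_command and run_dppred are hypothetical and not part of DPPred; only the ./run.sh <dataset_name> <task_type> argument order comes from this README.

```python
import subprocess


def build_run_command(dataset_name, task_type, script="./run.sh"):
    """Return the argument list for one DPPred run.

    Mirrors the documented form: ./run.sh <dataset_name> <task_type>.
    The helper itself is illustrative and not shipped with DPPred.
    """
    return [script, dataset_name, task_type]


def run_dppred(dataset_name, task_type, script="./run.sh"):
    """Run the script and raise CalledProcessError if it exits non-zero."""
    subprocess.run(build_run_command(dataset_name, task_type, script), check=True)


# Example (requires the repository checkout and its datasets):
# for dataset, task in [("adult", "classification"), ("bike", "regression")]:
#     run_dppred(dataset, task)
```

Passing script="./run_with_clustering.sh" reuses the same wrapper for the local-pattern pipeline, since that script takes the same arguments.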

Local Patterns

If you are interested in local patterns, use run_with_clustering.sh, which takes the same arguments as run.sh.

Parameters

The following parameters control pattern generation.

  • TOPK (default = 20) is the number of (global) discriminative patterns that you want to use in the prediction. For regression tasks or high-dimensional datasets, we recommend a larger value like 30.
  • MIN_SUP (default = 10) is the minimum number of training instances that should be contained in each leaf node in the random decision tree.
  • MAX_DEPTH (default = 6) is the maximum depth of the random decision tree, which is also the maximum length of patterns.
  • RANDOM_FEATURES (default = 4) is the number of random features that will be tried for each node split in random decision trees.
  • RANDOM_POSITIONS (default = 8) is the number of random values that will be tried for each selected feature during a node split in random decision trees.
  • TREES (default = 100) is the number of trees.
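
To make the interaction of these parameters concrete, here is a minimal Python sketch of how one random decision tree could be grown and its root-to-node paths read off as candidate patterns. It is not DPPred's implementation: the Gini split criterion, the uniform sampling of thresholds, and the names grow, gini, and support are all assumptions made for illustration. Only the parameter names, their defaults, and their stated meanings come from the list above. TREES and TOPK are not modelled here.

```python
import random
from collections import Counter

# Defaults copied from the README; everything else is illustrative.
MIN_SUP = 10
MAX_DEPTH = 6
RANDOM_FEATURES = 4
RANDOM_POSITIONS = 8


def gini(labels):
    """Gini impurity of a list of class labels (assumed split criterion)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def grow(rows, labels, rng, depth=0, path=()):
    """Grow one random tree; return every non-root node as a pattern.

    A pattern is a tuple of (feature_index, op, threshold) conditions,
    so its length equals the depth of the node it describes.
    """
    patterns = [path] if path else []
    if depth >= MAX_DEPTH or len(rows) < 2 * MIN_SUP:
        return patterns
    n_features = len(rows[0])
    best = None  # (weighted impurity, feature, threshold)
    for f in rng.sample(range(n_features), min(RANDOM_FEATURES, n_features)):
        values = [r[f] for r in rows]
        for _ in range(RANDOM_POSITIONS):
            t = rng.uniform(min(values), max(values))
            left = [y for r, y in zip(rows, labels) if r[f] < t]
            right = [y for r, y in zip(rows, labels) if r[f] >= t]
            if len(left) < MIN_SUP or len(right) < MIN_SUP:
                continue  # a child would hold fewer than MIN_SUP instances
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:
        return patterns
    _, f, t = best
    left_idx = [i for i, r in enumerate(rows) if r[f] < t]
    right_idx = [i for i, r in enumerate(rows) if r[f] >= t]
    for idx, op in ((left_idx, "<"), (right_idx, ">=")):
        patterns += grow([rows[i] for i in idx], [labels[i] for i in idx],
                         rng, depth + 1, path + ((f, op, t),))
    return patterns


def support(pattern, rows):
    """Number of rows that satisfy every condition in the pattern."""
    return sum(
        all((r[f] < t) if op == "<" else (r[f] >= t) for f, op, t in pattern)
        for r in rows
    )


# Synthetic data: 200 instances, 5 features, label depends on feature 0.
rng = random.Random(0)
rows = [[rng.random() for _ in range(5)] for _ in range(200)]
labels = [int(r[0] > 0.5) for r in rows]
patterns = grow(rows, labels, rng)
```

In this sketch, MAX_DEPTH bounds the pattern length because each level of the tree adds exactly one condition, and MIN_SUP guarantees that every extracted pattern covers at least that many training instances.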

If you are interested in local patterns, there are two more parameters:

  • CLUSTERS (default = 2) is the number of clusters you want to further investigate.
  • LOCAL_TOPK (default = 10) is the number of local discriminative patterns within each cluster.

We also provide K-Means as an alternative to LDA for clustering.
