
VariablesSelection

Variable selection: detecting important features among a large number of input variables.

The file VariableSelection.py provides a class "FeatureImportance" that bundles the most important feature selection models, making them convenient for users to call.

All 14 methods are: LASSO, ElasticNet, SCAD, Knockoff, RandomForest, AdaBoost, GradientBoosting, ExtraTrees, LassoNet, GradientLearning, GroupLasso, DeepLIFT, Layer-WiseRelevancePropagation, and SHAP.

Among these algorithms, LASSO, ElasticNet, SCAD, and GroupLasso are based on linear models; RandomForest, AdaBoost, GradientBoosting, and ExtraTrees are tree ensemble models; LassoNet and Layer-WiseRelevancePropagation combine neural networks with feature selection.
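The two families score features differently: linear models expose a coefficient per feature, while tree ensembles expose impurity-based importances. A minimal sketch of this contrast, using scikit-learn stand-ins (not this repository's own wrappers) on synthetic data where only the first two features matter:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
# Only the first two features drive the response.
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.1 * rng.normal(size=200)

# Linear route: nonzero LASSO coefficients mark the selected features.
lasso = Lasso(alpha=0.1).fit(X, y)
linear_scores = np.abs(lasso.coef_)

# Tree-ensemble route: impurity-based importances play the same role.
trees = ExtraTreesRegressor(n_estimators=100, random_state=0).fit(X, y)
tree_scores = trees.feature_importances_

print(np.argsort(linear_scores)[-2:])  # indices of the two strongest features
print(np.argsort(tree_scores)[-2:])
```

Both routes should rank features 0 and 1 on top here; the library's methods return analogous per-feature scores through a unified interface.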

Paper Links

Required Package Versions

knockpy==1.3.0
lassonet==0.0.14
numpy==1.24.4
group-lasso==1.5.0
matplotlib==3.7.2
torch==2.0.1
shap==0.42.1
statsmodels==0.13.5
captum==0.6.0

How to Use the Methods

Regardless of the method used, first instantiate the 'FeatureImportance' class.

To use LASSO, ElasticNet, SCAD, RandomForest, AdaBoost, ExtraTrees, GroupLasso:

filter=FeatureImportance(x,y,test_ratio=0.2,threshold=0,wanted_num=2,task='regression',scarler=None,times=10)
coef, total=filter.GetCoefficient1(filter.ExtraTreesModel,max_depth=5,estimator_num=100)   

To use GradientLearning, SHAP, Layer-WiseRelevancePropagation, DeepLIFT, Knockoff:

filter=FeatureImportance(x,y,test_ratio=0.001,threshold=0,wanted_num=2,task='regression',scarler=None,times=10)
coef, total=filter.GetCoefficient2(filter_fun=filter.GradientLearningFilter,eps=0.25,l1_lamda=0.5,kernel_type="Gaussian")

To use LassoNet:

filter=FeatureImportance(x,y,test_ratio=0.2,threshold=0,wanted_num=2,task='regression',scarler=None,times=10)
coef, total=filter.LassoNetModel(hidden_dims=(64,),M=10,plot=True)

'coef' is the importance score of each feature, and 'total' is the number of times the feature was chosen across all the experiments.

Additionally, we provide a C++ version of the gradient learning algorithm.

This file uses the Armadillo package; to build and run it, enter the following commands in a console:

 g++ gradientLearning.cpp -o gradientLearning -std=c++11 -O2 -larmadillo
./gradientLearning

Example

# create data
import numpy as np
n=200
p=50
xita=0.25
w=np.random.normal(loc=1,scale=1,size=(n,p))
u=np.random.normal(loc=1,scale=1,size=(n,p))
x=(w+xita*u)/(1+xita)
y=((2*x[:,0]-1)*(2*x[:,1]-1)).reshape((-1,1))

#execute feature selection 
filter=FeatureImportance(x,y,test_ratio=0.2,threshold=0,wanted_num=2,task='regression',scarler='MinMaxScaler',times=20)
coef, total=filter.GetCoefficient2(filter.SHAP,hidden_num=(12,),plot=True)

In filter.GetCoefficient1 or filter.GetCoefficient2, you need to pass a feature selection method of the 'FeatureImportance' class as the first parameter; the other parameters depend on the chosen feature selection method.
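The pattern above is a higher-order function: the driver receives a bound method and forwards any extra keyword arguments to it. A minimal self-contained sketch of the idea (class and method names here are illustrative, not the library's actual code):

```python
import numpy as np

class TinyFeatureImportance:
    """Toy illustration of the 'pass the selector as first argument' pattern."""

    def __init__(self, x, times=5, wanted_num=2):
        self.x = x
        self.times, self.wanted_num = times, wanted_num

    def variance_filter(self, **kwargs):
        # Stand-in selection method: score features by their variance.
        return np.var(self.x, axis=0)

    def get_coefficient(self, filter_fun, **kwargs):
        # Extra keyword arguments are forwarded to whichever method was passed.
        coef = np.zeros(self.x.shape[1])
        total = np.zeros(self.x.shape[1])
        for _ in range(self.times):
            scores = filter_fun(**kwargs)
            coef += scores
            total[np.argsort(scores)[-self.wanted_num:]] += 1
        return coef / self.times, total

np.random.seed(0)
x = np.random.normal(scale=[1.0, 3.0, 0.1], size=(100, 3))
fi = TinyFeatureImportance(x)
coef, total = fi.get_coefficient(fi.variance_filter)
```

In the real library the same shape applies: `filter.GetCoefficient2(filter.SHAP, hidden_num=(12,), plot=True)` passes `filter.SHAP` as `filter_fun` and forwards `hidden_num` and `plot` to it.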

Visualization

If the parameter 'plot' in filter.GetCoefficient1 or filter.GetCoefficient2 is set to True, the results are plotted.

The results of Knockoff can be visualized as: [figure]

The results of SHAP can be visualized as: [figure]

Visualization of LassoNet's hyperparameter tuning process: [figure]
