Skip to content

Latest commit

 

History

History
 
 

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 
 
 
 
 
 
 

HW2 Incoming Prediction

HomeWork description

Requirement

Dataset and Task Introduction

  • TASK: Binary Classification

    Dtermine whether a person makes over 50K a year

  • Dataset: ADULT

    Extraction was done by Barry Becker from the 1994 Census database.
    A set of reasonably clean records was extracted using the following conditions: ((AGE>16) && (AGI>100) && (AFNLWGT>1) && (HRSWK>0)).

  • Reference

Data Attribute Information

  • train.csv 、test.csv:
    age, workclass, fnlwgt, education, education num, marital-status, occupation
    relationship, race, sex, capital-gain, capital-loss, hours-per-week,
    native-country

  • make over 50K a year or not

  • For more details please check out Kaggle’s Description Page

Result

Probabilstic Generative Model

在这个模型里面假设数据集属于高斯分布,采用两个种类shared_sigma的模型,通过数据集算出各自的,然后直接带入公式求解

训练数据直接将其中的10%当作valid,最后valid accuracy为0.843366, test accuracy为0.843867,结果还是很不错的

Logistic Regression

实验过程中采取过Ada但是效果不如sgd,使用mini-batch加快速率,batch-size为32,一共进行300epoch,最终结果

随着epoch-time的增多,loss不断减小,最后valid accuracy为0.852858, test accuracy为0.852343,优于Probability Generative Model这是因为这个模型下不需要假设采样数据的分布

Neural Network

使用keras三层的fully-connected neural network,使用的loss是binary_crossentropy,activation='sigmoid',optimizer='adam',前两层都是600个units,batch-size=32, epoch-times=50
最后在valid set上的acc=0.9084, test-set上的acc为0.8426