HW2 Income Prediction

Homework description

Requirement

Dataset and Task Introduction

TASK: Binary Classification

Determine whether a person makes over 50K a year
Dataset: ADULT

Extraction was done by Barry Becker from the 1994 Census database.
A set of reasonably clean records was extracted using the following conditions: ((AGE>16) && (AGI>100) && (AFNLWGT>1) && (HRSWK>0)).
Reference

Data Attribute Information

train.csv, test.csv:
age, workclass, fnlwgt, education, education num, marital-status, occupation,
relationship, race, sex, capital-gain, capital-loss, hours-per-week,
native-country
Label: makes over 50K a year or not
For more details please check out Kaggle’s Description Page
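As a rough illustration of how rows with these attributes could be parsed, here is a minimal sketch using only the standard library. The column identifiers (for example `education_num`), the absence of a header row, and the `>50K` / `<=50K` label strings are assumptions based on the common ADULT layout, not details confirmed by this repository.

```python
import csv
import io

# Assumed column order, following the attribute list above (names are hypothetical).
COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "education_num",
    "marital_status", "occupation", "relationship", "race", "sex",
    "capital_gain", "capital_loss", "hours_per_week", "native_country",
    "income",
]
NUMERIC = {"age", "fnlwgt", "education_num",
           "capital_gain", "capital_loss", "hours_per_week"}


def read_rows(handle, has_header=False):
    """Parse CSV rows into dicts with integer numeric fields and a 0/1 label."""
    reader = csv.reader(handle, skipinitialspace=True)
    if has_header:
        next(reader, None)
    rows = []
    for fields in reader:
        if len(fields) != len(COLUMNS):
            continue  # skip blank or malformed lines
        row = dict(zip(COLUMNS, (f.strip() for f in fields)))
        for name in NUMERIC:
            row[name] = int(row[name])
        # Assumption: the label column holds ">50K" or "<=50K".
        row["label"] = 1 if row.pop("income").startswith(">50K") else 0
        rows.append(row)
    return rows
```

Categorical fields such as `workclass` are left as strings here; they would still need an encoding (for example one-hot) before being fed to any of the models below.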

Result

Probabilistic Generative Model

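This model assumes that the data of each class follows a Gaussian distribution, with the two classes sharing one covariance matrix (shared_sigma). The parameters $\mu_{1}, \mu_{2}, \Sigma$ are estimated from the dataset and then substituted directly into the closed-form formula.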
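For reference, the closed form in question can be written as below. This is the standard shared-covariance Gaussian result, not a transcription of gp.py; the symbols $N_{1}, N_{2}$ (class sample counts), $\Sigma_{1}, \Sigma_{2}$ (per-class covariances), $w$, $b$ and $\sigma$ are notation assumed here.

$$
\begin{aligned}
\Sigma &= \frac{N_{1}}{N_{1}+N_{2}}\,\Sigma_{1} + \frac{N_{2}}{N_{1}+N_{2}}\,\Sigma_{2} \\
P(C_{1} \mid x) &= \sigma(w^{\top}x + b), \qquad \sigma(z) = \frac{1}{1+e^{-z}} \\
w &= \Sigma^{-1}(\mu_{1} - \mu_{2}) \\
b &= -\tfrac{1}{2}\,\mu_{1}^{\top}\Sigma^{-1}\mu_{1} + \tfrac{1}{2}\,\mu_{2}^{\top}\Sigma^{-1}\mu_{2} + \ln\frac{N_{1}}{N_{2}}
\end{aligned}
$$

A sample is assigned to class 1 when $P(C_{1} \mid x) > 0.5$, which is equivalent to $w^{\top}x + b > 0$.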

10% of the training data is held out as the validation set. The final valid accuracy is 0.843366 and the test accuracy is 0.843867, which is a fairly good result.
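A minimal NumPy sketch of this procedure is shown below, assuming the features are already numeric and the labels are 0/1. The function names, the random 10% split, and the use of `pinv` to tolerate a singular covariance are illustrative choices, not necessarily what gp.py does.

```python
import numpy as np


def split_train_valid(X, y, valid_ratio=0.1, seed=0):
    """Shuffle and hold out a fraction of the samples for validation."""
    idx = np.random.default_rng(seed).permutation(len(X))
    n_valid = int(len(X) * valid_ratio)
    valid, train = idx[:n_valid], idx[n_valid:]
    return X[train], y[train], X[valid], y[valid]


def fit_shared_gaussian(X, y):
    """Estimate mu_1, mu_2 and a shared Sigma; return the (w, b) of the posterior."""
    X1, X2 = X[y == 1], X[y == 0]
    n1, n2 = len(X1), len(X2)
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    # Shared covariance: per-class covariances weighted by class size.
    cov1 = np.atleast_2d(np.cov(X1, rowvar=False, bias=True))
    cov2 = np.atleast_2d(np.cov(X2, rowvar=False, bias=True))
    sigma = (n1 * cov1 + n2 * cov2) / (n1 + n2)
    # Assumption: pinv is used so that a singular Sigma does not raise an error.
    sigma_inv = np.linalg.pinv(sigma)
    w = sigma_inv @ (mu1 - mu2)
    b = (-0.5 * mu1 @ sigma_inv @ mu1
         + 0.5 * mu2 @ sigma_inv @ mu2
         + np.log(n1 / n2))
    return w, b


def predict(X, w, b):
    """Return 1 where P(C_1 | x) > 0.5, i.e. where w.x + b > 0."""
    return (X @ w + b > 0).astype(int)
```

Accuracy on the held-out split can then be computed as `(predict(X_valid, w, b) == y_valid).mean()`.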

Logistic Regression

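Ada was tried during the experiments but performed worse than SGD. Mini-batches are used to speed up training, with a batch size of 32 and 300 epochs in total. Final result: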

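As the number of epochs increases, the loss keeps decreasing. The final valid accuracy is 0.852858 and the test accuracy is 0.852343, better than the Probabilistic Generative Model. This is because this model does not need to assume a distribution for the sampled data.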
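The training loop described here might look roughly like the following NumPy sketch of mini-batch SGD on the cross-entropy loss. The batch size and epoch count come from the text; the learning rate, zero initialization, and function names are assumptions, and lr.py may differ.

```python
import numpy as np


def sigmoid(z):
    # Clip the logits so that exp() cannot overflow.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))


def train_logistic_sgd(X, y, lr=0.1, batch_size=32, epochs=300, seed=0):
    """Mini-batch SGD for logistic regression; returns (w, b, per-epoch losses)."""
    rng = np.random.default_rng(seed)
    w, b = np.zeros(X.shape[1]), 0.0
    losses = []
    eps = 1e-12  # keeps log() finite
    for _ in range(epochs):
        order = rng.permutation(len(X))  # reshuffle every epoch
        for start in range(0, len(X), batch_size):
            batch = order[start:start + batch_size]
            # Gradient of the cross-entropy loss with respect to the logits.
            err = sigmoid(X[batch] @ w + b) - y[batch]
            w -= lr * X[batch].T @ err / len(batch)
            b -= lr * err.mean()
        p = sigmoid(X @ w + b)
        losses.append(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))
    return w, b, losses
```

The returned `losses` list holds the full-training-set loss after each epoch, which is the quantity one would plot to check that it decreases over time. Standardizing the numeric features beforehand usually helps SGD converge with a fixed learning rate.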

Neural Network

A three-layer fully-connected neural network built with keras, using loss binary_crossentropy, activation='sigmoid', optimizer='adam'. The first two layers each have 600 units, with batch-size=32 and epoch-times=50.
The final acc on the valid set is 0.9084, and the acc on the test set is 0.8426.
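A network with this configuration could be set up along the following lines. This is a sketch under assumptions: the single-unit output layer, the use of sigmoid on every layer, and the `input_dim` parameter are guesses not stated above, and nn.py may be organized differently. Keras is imported inside the function so the configuration constants can be inspected without TensorFlow installed.

```python
# Units per layer: the first two values come from the text,
# the single output unit is an assumption for binary classification.
LAYER_UNITS = [600, 600, 1]
FIT_KWARGS = {"batch_size": 32, "epochs": 50}


def build_model(input_dim):
    """Build and compile a three-layer fully-connected binary classifier."""
    from tensorflow import keras  # imported lazily; requires TensorFlow

    model = keras.Sequential()
    model.add(keras.Input(shape=(input_dim,)))
    for units in LAYER_UNITS:
        # Assumption: sigmoid is used on hidden and output layers alike.
        model.add(keras.layers.Dense(units, activation="sigmoid"))
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```

Training would then be a call such as `build_model(X_train.shape[1]).fit(X_train, y_train, validation_data=(X_valid, y_valid), **FIT_KWARGS)`, where the array names are placeholders.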
