HezhiWang committed Apr 16, 2017
2 parents 30cff86 + adb3129 commit 1e65ff3
Showing 2 changed files with 30 additions and 20 deletions.
31 changes: 30 additions & 1 deletion README.md
@@ -1 +1,30 @@
# Big_Data_project
# Big_Data_project Part 1

## Contributors
Hezhi Wang (hw1567)

Han Zhao (hz1411)

Jin U Bak (jub205)

## Dataset
The dataset was downloaded from the following links:

Dataset for 2009:
[https://data.cityofnewyork.us/Social-Services/new-311/9s88-aed8](https://data.cityofnewyork.us/Social-Services/new-311/9s88-aed8)

Dataset for 2010-present:
[https://data.cityofnewyork.us/Social-Services/311/wpe2-h2i5](https://data.cityofnewyork.us/Social-Services/311/wpe2-h2i5)

The two datasets are combined into one for analysis.

The dataset is also available on NYU HPC HDFS at **/user/jub205/311all.csv**

## Data Quality Issues
We first generated columns.txt, which lists the base type and semantic type of each column. It is available at **/user/jub205/columns.txt**.

To summarize data quality by counting the empty, missing, or invalid values in each column, sign in to dumbo and run:
```sh
$ spark-submit data_quality.py
```
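
This commit does not show `data_quality.py`, so the following is only a minimal sketch of the per-column counting described above, written in plain Python rather than Spark so that it runs anywhere. The function name `count_missing`, the `MISSING_TOKENS` set, and the sample column names are assumptions made for illustration, not taken from the project.

```python
import csv
import io
from collections import Counter

# Hypothetical tokens treated as "missing"; the real script may use other rules.
MISSING_TOKENS = {"", "n/a", "na", "null", "unspecified"}


def count_missing(csv_file):
    """Return {column name: number of empty/missing values} for a CSV file object."""
    reader = csv.DictReader(csv_file)
    # Start every column at zero so clean columns still appear in the result.
    counts = Counter({name: 0 for name in reader.fieldnames})
    for row in reader:
        for name in reader.fieldnames:
            value = row.get(name)
            # Short rows yield None; otherwise compare the normalized text.
            if value is None or value.strip().lower() in MISSING_TOKENS:
                counts[name] += 1
    return dict(counts)


# Illustrative sample only; column names are assumed, not read from 311all.csv.
sample = io.StringIO(
    "Unique Key,Complaint Type,City\n"
    "1,Noise,NEW YORK\n"
    "2,,N/A\n"
    "3,Heating,\n"
)
print(count_missing(sample))
```

On the full dataset the same per-column rule would typically be expressed as a Spark aggregation, which is presumably why the script is launched with `spark-submit`.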

19 changes: 0 additions & 19 deletions src/plot_by_date.py → src/plot_by_month.py
@@ -20,25 +20,6 @@
plt.ylabel('Number of Complaints')
plt.savefig('../plots/month.png')
f.close()
plt.clf()

f = open('../output/count_day.out','r')
xx = []
yy = []
for line in f.readlines():
    day = line.split('\t')[0]
    xx.append(day)
    yy.append(int(line.split('\t')[1]))
date = [datetime.strptime(x,'%Y-%m-%d') for x in xx]
plt.figure(figsize=[15,12])
l = pd.Series(data=yy,index=date)
l.plot()
plt.title('Number of Complaints per day')
plt.xlabel('Time')
plt.ylabel('Number of Complaints')
plt.savefig('../plots/day.png')
f.close()



