forked from udacity/AIPND-cn
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
1 parent c57d5f6, commit 4fd6bdc
Showing 2 changed files with 11,136 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,256 @@ | ||
{ | ||
"cells": [ | ||
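{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Exploring the Movie Dataset\n", | ||
"\n", | ||
"In this project, you will apply what you have learned and use functions from the `NumPy`, `Pandas`, `matplotlib`, and `seaborn` libraries to explore a movie dataset.\n", | ||
"\n", | ||
"If you run into problems, you can ask a teaching assistant or submit the project to get guidance.\n", | ||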
"\n", | ||
"---\n", | ||
"\n", | ||
"---" | ||
] | ||
}, | ||
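{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"collapsed": true | ||
}, | ||
"source": [ | ||
"## Section 1: Loading Libraries and Importing Data\n", | ||
"\n", | ||
"In this section, write code to complete the following tasks:\n", | ||
"\n", | ||
"1. Import the required libraries: `NumPy`, `Pandas`, `matplotlib`, and `seaborn`.\n", | ||
"2. Use `Pandas` to read the data in `tmdb-movies.csv` and store it as `movie_data`.\n", | ||
"3. Use the `.head()` method to view the first few rows of the data.\n", | ||
"4. Based on the data, pose two questions that will guide the rest of your exploration.\n", | ||
"\n", | ||
"Hints:\n", | ||
"1. Remember to use the notebook magic command `%matplotlib inline`; otherwise your plots may not be displayed in the notebook.\n", | ||
"2. Each question should be closely tied to **one** feature in the data, for example: how is the revenue of most movies distributed, or how is the popularity of most movies distributed?" | ||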
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"---\n", | ||
"\n", | ||
"**Task 1:** Complete the code as required." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"---\n", | ||
"\n", | ||
"**Task 2:** Based on the data above, pose two questions that will guide the rest of your exploration." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"\n", | ||
"- Question 1: (your answer here)\n", | ||
"\n", | ||
"- Question 2: (your answer here)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"---\n", | ||
"\n", | ||
"---\n", | ||
"\n", | ||
"## Section 2: Getting Summary Statistics\n", | ||
"\n", | ||
"After reading the data, we need some summary statistics for it, such as the maximum, minimum, mean, and median." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"\n", | ||
"\n", | ||
"**Task 3:** Write code to find out how many rows and how many columns the data has." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": true | ||
}, | ||
"outputs": [], | ||
"source": [] | ||
}, | ||
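{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"---\n", | ||
"\n", | ||
"**Task 4:** Compute some summary statistics for any two columns of the data, such as the maximum, minimum, mean, median, or standard deviation. You can use the `.describe()` method to get summary statistics for the whole table." | ||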
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": true | ||
}, | ||
"outputs": [], | ||
"source": [] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"---\n", | ||
"\n", | ||
"**Task 5:** How do the statistics you obtained above help you answer your two questions?" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"\n", | ||
"- Question 1: (your answer here)\n", | ||
"\n", | ||
"- Question 2: (your answer here)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"---\n", | ||
"\n", | ||
"---\n", | ||
"\n", | ||
"## Section 3: Plotting and Visualization\n", | ||
"\n", | ||
"Next, plot and visualize your data. Based on what you learned in the course, you can draw the following charts, depending on the data type:\n", | ||
"\n", | ||
"1. Bar chart\n", | ||
"2. Pie chart\n", | ||
"3. Histogram\n", | ||
"4. Scatter plot\n", | ||
"5. Line chart\n", | ||
"6. Box plot\n", | ||
"7. Heatmap\n", | ||
"8. Violin plot\n", | ||
"9. Rug plot\n", | ||
"10. Strip plot\n", | ||
"11. Stacked chart\n", | ||
"\n", | ||
"Now it's your turn to apply what you have learned and visualize the data!" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"**Task 6:** Based on your Question 1, create an appropriate visualization of a data feature and try to answer the question." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": true | ||
}, | ||
"outputs": [], | ||
"source": [] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"(Your answer)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"---\n", | ||
"\n", | ||
"**Task 7:** Based on your Question 2, create an appropriate visualization of a data feature and try to answer the question." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": true | ||
}, | ||
"outputs": [], | ||
"source": [] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"(Your answer)" | ||
] | ||
}, | ||
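{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"---\n", | ||
"\n", | ||
"**Task 8:** (Challenge) Choose a set of features and create a multivariate visualization. Multivariate visualizations can help reveal relationships in the data, for example the relationship between a movie's revenue and its popularity." | ||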
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": true | ||
}, | ||
"outputs": [], | ||
"source": [] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.6.1" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |
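
The notebook above leaves its code cells empty for the student. As a rough illustration of what Tasks 1, 3, and 4 ask for, here is a minimal sketch, not the project's reference solution. It assumes the CSV has `popularity` and `revenue` columns (both named in the task text). The `original_title` column, the sample rows, and the helper names `load_movies` and `summarize` are hypothetical and exist only so the example runs without `tmdb-movies.csv`.

```python
import io

import pandas as pd

# Hypothetical stand-in for tmdb-movies.csv: the rows and the
# original_title column are made up for illustration only.
SAMPLE_CSV = """original_title,popularity,revenue
Movie A,1.5,100
Movie B,0.5,300
Movie C,3.0,200
"""


def load_movies(source):
    """Read movie data from a file path or file-like object (Task 1)."""
    return pd.read_csv(source)


def summarize(df, columns):
    """Return min, max, mean, median and std for the chosen columns (Task 4)."""
    return df[columns].agg(["min", "max", "mean", "median", "std"])


# With the real data this would be load_movies("tmdb-movies.csv").
movie_data = load_movies(io.StringIO(SAMPLE_CSV))

# Task 1: look at the first few rows.
print(movie_data.head())

# Task 3: .shape is a (rows, columns) tuple.
n_rows, n_cols = movie_data.shape
print(f"{n_rows} rows, {n_cols} columns")

# Task 4: statistics for two columns of interest.
stats = summarize(movie_data, ["popularity", "revenue"])
print(stats)
```

Calling `movie_data.describe()` instead of `summarize` gives a similar table for every numeric column at once, with quartiles in place of an explicit median row.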