Add project 2 file
lolipopshock committed May 1, 2018
1 parent c57d5f6 commit 4fd6bdc
Showing 2 changed files with 11,136 additions and 0 deletions.
256 changes: 256 additions & 0 deletions P2_Explore_Movie_Dataset/Untitled.ipynb
@@ -0,0 +1,256 @@
{
"cells": [
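{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exploring the Movie Dataset\n",
"\n",
"In this project, you will apply what you have learned and use functions from the `NumPy`, `Pandas`, `matplotlib`, and `seaborn` libraries to explore a movie dataset.\n",
"\n",
"If you run into problems, you can ask a teaching assistant or submit the project to get feedback.\n",
"\n",
"---\n",
"\n",
"---"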
]
},
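{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"## Section 1: Loading Libraries and Importing Data\n",
"\n",
"In this section, you need to write code that completes the following tasks:\n",
"\n",
"1. Load the required libraries: `NumPy`, `Pandas`, `matplotlib`, and `seaborn`.\n",
"2. Use `Pandas` to read the data in `tmdb-movies.csv` and store it as `movie_data`.\n",
"3. Use the `.head()` method to view the first few rows of the data.\n",
"4. Based on the data you see, propose two questions to guide the rest of your exploration.\n",
"\n",
"Hints:\n",
"1. Remember to use the notebook magic command `%matplotlib inline`; otherwise your plots may not be displayed in the notebook.\n",
"2. Each question should be closely tied to **one** feature in the data, for example: how is the revenue (`revenue`) of most movies distributed, or how is the popularity (`popularity`) of most movies distributed?"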
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"**Task 1:** Complete the code as required."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"**Task 2:** Based on the data above, propose two questions to guide the rest of your exploration."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"- Question 1: (your answer here)\n",
"\n",
"- Question 2: (your answer here)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"---\n",
"\n",
"## Section 2: Getting Summary Statistics\n",
"\n",
"After reading the data, we need to obtain some summary statistics, such as the maximum, minimum, mean, and median."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"\n",
"**Task 3:** Write code to compute how many rows and how many columns the data has."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
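{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"**Task 4:** Get some summary statistics for any two columns of the data, such as the maximum, minimum, mean, median, or standard deviation. You can use the `.describe()` method to get summary statistics for the whole table."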
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"**Task 5:** How do the statistics obtained above help you answer the two questions you proposed?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"- Question 1: (your answer here)\n",
"\n",
"- Question 2: (your answer here)"
]
},
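{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"---\n",
"\n",
"## Section 3: Plotting and Visualization\n",
"\n",
"Next, you will plot and visualize your data. Based on what you learned in the course, you can choose among the following chart types depending on the type of data:\n",
"\n",
"1. Bar chart\n",
"2. Pie chart\n",
"3. Histogram\n",
"4. Scatter plot\n",
"5. Line chart\n",
"6. Box plot\n",
"7. Heatmap\n",
"8. Violin plot\n",
"9. Rug plot\n",
"10. Strip plot\n",
"11. Stacked chart\n",
"\n",
"Now it is your turn to apply what you have learned and visualize our data!"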
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Task 6:** Based on your Question 1, create an appropriate visualization of a data feature and try to answer your question."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"(Your answer to the question)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"**Task 7:** Based on your Question 2, create an appropriate visualization of a data feature and try to answer your question."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"(Your answer to the question)"
]
},
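{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"**Task 8:** (Challenge) Pick a group of features and create a multivariate visualization. Multivariate visualizations can help reveal relationships in the data, for example the relationship between a movie's revenue and its popularity."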
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.1"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
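
The notebook leaves every code cell empty. As a rough illustration of what Tasks 1, 3, 4, and 6 ask for, here is a minimal sketch, not the expected solution. It assumes that `tmdb-movies.csv` contains numeric `popularity` and `revenue` columns (the two features the notebook uses as examples). The helper `load_movie_data` and its small stand-in table are hypothetical and exist only so the sketch runs when the CSV is not present; the stand-in values are made up and are not real movie data.

```python
import os

import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; in Jupyter use `%matplotlib inline` instead
import matplotlib.pyplot as plt
import pandas as pd

CSV_PATH = "tmdb-movies.csv"  # file name given in Section 1


def load_movie_data(path=CSV_PATH):
    """Hypothetical helper: read the CSV if it exists, else return a stand-in frame."""
    if os.path.exists(path):
        return pd.read_csv(path)
    # Placeholder rows, NOT real values; present only so the sketch is runnable.
    return pd.DataFrame({
        "popularity": [0.5, 1.2, 3.4, 0.9],
        "revenue": [0, 1000000, 25000000, 500000],
    })


# Task 1: load the data and look at the first few rows.
movie_data = load_movie_data()
print(movie_data.head())

# Task 3: number of rows and columns.
n_rows, n_cols = movie_data.shape
print(n_rows, "rows,", n_cols, "columns")

# Task 4: summary statistics for two columns (count, mean, std, min, quartiles, max).
stats = movie_data[["popularity", "revenue"]].describe()
print(stats)

# Task 6 (one possible choice): a histogram of a single feature.
fig, ax = plt.subplots()
ax.hist(movie_data["popularity"].dropna(), bins=20)
ax.set_xlabel("popularity")
ax.set_ylabel("number of movies")
```

A histogram suits a question about how one numeric feature is distributed; a question about the relationship between two features (Task 8) would more naturally use a scatter plot of one column against the other.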