Commit

Minor typo's
GraemeMalcolm committed Aug 22, 2020
1 parent 17b6559 commit fe33f01
Showing 1 changed file with 7 additions and 7 deletions.
14 changes: 7 additions & 7 deletions 01 - Data Exploration.ipynb
@@ -274,7 +274,7 @@
"metadata": {},
"cell_type": "markdown",
"source": [
-"Note that in addition to the columns you specified, the DataFrame includes an *index* to unique identify each row. We could have specified the index explicitly, and assigned any kind of appropriate value (for example, an email address); but because we didn't specify an index, one has been created with a unique intege value for each row.\n",
+"Note that in addition to the columns you specified, the DataFrame includes an *index* to unique identify each row. We could have specified the index explicitly, and assigned any kind of appropriate value (for example, an email address); but because we didn't specify an index, one has been created with a unique integer value for each row.\n",
"\n",
"### Finding and filtering data in a DataFrame\n",
"\n",
@@ -331,7 +331,7 @@
"source": [
"Look carefully at the `iloc[0:5]` results, and compare them to the `loc[0:5]` results you obtained previously. Can you spot the difference?\n",
"\n",
-"The **loc** method returned rows with index values between *0* and *5* - which includes *0*, *1*, *2*, *3*, *4*, and *5* (six rows). However, the **iloc** method returns the rows in the specified range of *positions*, starting at 0 and returning the first rows.\n",
+"The **loc** method returned rows with index values between *0* and *5* - which includes *0*, *1*, *2*, *3*, *4*, and *5* (six rows). However, the **iloc** method returns the rows in the specified range of *positions*, starting at 0 and returning the first five rows.\n",
"\n",
"**iloc** identiies data values in a DataFrame by *position*, which extends beyond rows to columns. So for example, you can use it to find the values for the columns in positions 1 and 2 in row 0, like this:"
],
@@ -453,9 +453,9 @@
},
{
"source": [
-"Note that the new students include some null values, which when the DataFrame is retrieved show up as **NaN** (*not a number*). These values are used to indicate missing numeric data.\n",
+"Note that the new student records don't include values for all columns. When the DataFrame is retrieved, the missing values show up as **NaN** (*not a number*). These values are used to indicate missing numeric data.\n",
"\n",
-"So how would we know that the DataFrame contains missing values? Wellm you can use the **isnull** method to identify which individual values are null, like this:"
+"So how would we know that the DataFrame contains missing values? You can use the **isnull** method to identify which individual values are null, like this:"
],
"cell_type": "markdown",
"metadata": {}
@@ -507,7 +507,7 @@
"source": [
"So now that we've found the null values, what can we do about them?\n",
"\n",
-"One common approach is to *impute* replacement values. For example, if the number of study hours is missing, we could just assume that the student studioed for an average amount of time and replace the missing value with the mean study hours. To do this, we can use the **fillna** method, like this:"
+"One common approach is to *impute* replacement values. For example, if the number of study hours is missing, we could just assume that the student studied for an average amount of time and replace the missing value with the mean study hours. To do this, we can use the **fillna** method, like this:"
],
"cell_type": "markdown",
"metadata": {}
@@ -543,7 +543,7 @@
"source": [
"### Explore data in the DataFrame\n",
"\n",
-"Now that we've cleaned up the missing values, we're ready to explore the data in the DataFrame. Let's start by compating the mean study hours and grades."
+"Now that we've cleaned up the missing values, we're ready to explore the data in the DataFrame. Let's start by comparing the mean study hours and grades."
],
"cell_type": "markdown",
"metadata": {}
@@ -1228,7 +1228,7 @@
"metadata": {},
"cell_type": "markdown",
"source": [
-"The horizontal colored lines show the percentage of data within 1, 2, and 3 standard deviations of the mean (plus or minus).\n",
+"The horizontal lines show the percentage of data within 1, 2, and 3 standard deviations of the mean (plus or minus).\n",
"\n",
"In any normal distribution:\n",
"- Approximately 68.26% of values fall within one standard deviation from the mean.\n",
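
Note (not part of the commit): the notebook text touched in the hunks above describes the difference between `loc` and `iloc`, and the `isnull`/`fillna` pattern for missing values. The sketch below reproduces that behaviour on a small, hypothetical DataFrame. The names and values are assumptions made for illustration and are not the notebook's actual student data.

```python
import pandas as pd

# Hypothetical records; the default integer index (0..6) is created automatically.
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cy", "Dee", "Eve", "Flo", "Gus"],
    "StudyHours": [10.0, 11.5, 9.0, None, 12.0, None, 8.0],
})

# loc selects by index *label*, and both endpoints are included.
by_label = df.loc[0:5]

# iloc selects by *position*, and the end position is excluded.
by_position = df.iloc[0:5]

print(len(by_label), len(by_position))  # 6 5

# Count missing values per column.
missing = df.isnull().sum()

# Impute missing study hours with the column mean (mean() skips NaN by default).
filled = df["StudyHours"].fillna(df["StudyHours"].mean())
```

With the default integer index the labels and positions happen to coincide, which is why the only visible difference between the two slices is the extra row returned by `loc`.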