
Commit

Merge branch 'master' of https://github.com/Yorko/mlcourse.ai
datamove committed Feb 2, 2019
2 parents f52724d + 2e4faf2 commit c9f3ed6
Showing 2 changed files with 852 additions and 5 deletions.
@@ -24,11 +24,11 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"In the previous articles, you saw different algorithms for classification as well as techniques for how to properly validate and evaluate the quality of your models.\n",
"In previous articles, you saw different algorithms for classification as well as techniques that can be used to properly validate and evaluate the quality of your models.\n",
"\n",
"Now, suppose that you have chosen the best possible model for a particular problem and are struggling to further improve its accuracy. In this case, you would need to apply some more advanced machine learning techniques that are collectively referred to as *ensembles*.\n",
"\n",
"An *ensemble* is a set of elements that collectively contribute to a whole. A familiar example is a musical ensemble, which blends the sounds of several musical instruments to create a beautiful harmony, or architectural ensembles, which are a set of buildings designed as a unit. In ensembles, the (whole) harmonious outcome is more important than the performance of any individual part."
"An *ensemble* is a set of elements that collectively contribute to a whole. A familiar example is a musical ensemble, which blends the sounds of several musical instruments to create harmony, or architectural ensembles, which are a set of buildings designed as a unit. In ensembles, the (whole) harmonious outcome is more important than the performance of any individual part."
]
},
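To make the claim that the whole can beat any individual part a bit more concrete, here is a small back-of-the-envelope sketch (not part of the committed notebook); the 70% accuracy and the independence of the three hypothetical classifiers are assumptions chosen purely for illustration.

```python
# Illustration only: three hypothetical, independent classifiers, each 70% accurate.
p = 0.7
# A majority vote over 3 classifiers is correct when all three, or exactly two, are right.
majority_accuracy = p**3 + 3 * p**2 * (1 - p)
print(majority_accuracy)  # 0.784 -- higher than the 0.7 of any single model
```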
{
@@ -137,7 +137,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Looks like loyal customers make fewer calls to customer service than those who eventually left. Now, it might be a good idea to estimate the average number of customer service calls in each group. Since our dataset is small, we would not get a good estimate by simply calculating the mean of the original sample. We will be better off applying the bootstrap method. Let's generate 1000 new bootstrap samples from our original population and produce an interval estimate of the mean."
"Looks like loyal customers make fewer calls to customer service than those who eventually leave. Now, it might be a good idea to estimate the average number of customer service calls in each group. Since our dataset is small, we would not get a good estimate by simply calculating the mean of the original sample. We will be better off applying the bootstrap method. Let's generate 1000 new bootstrap samples from our original population and produce an interval estimate of the mean."
]
},
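One possible way to implement that bootstrap step (a sketch only, not the notebook's actual code cell): the `get_bootstrap_samples` and `stat_intervals` helpers below are illustrative names, and the two call-count arrays are made-up stand-ins for the loyal and churned customer groups.

```python
import numpy as np

def get_bootstrap_samples(data, n_samples, seed=0):
    """Draw n_samples resamples of `data`, each taken with replacement."""
    rng = np.random.RandomState(seed)
    indices = rng.randint(0, len(data), (n_samples, len(data)))
    return data[indices]

def stat_intervals(stat, alpha=0.05):
    """Percentile-based (1 - alpha) interval for a vector of bootstrap statistics."""
    return np.percentile(stat, [100 * alpha / 2.0, 100 * (1 - alpha / 2.0)])

# Made-up call counts standing in for the loyal and churned groups
loyal_calls = np.array([0, 1, 1, 2, 0, 1, 3, 0, 1, 2])
churn_calls = np.array([1, 4, 3, 5, 2, 4, 6, 3, 4, 5])

# 1000 bootstrap samples per group, then a 95% interval for the mean of each
loyal_means = [np.mean(sample) for sample in get_bootstrap_samples(loyal_calls, 1000)]
churn_means = [np.mean(sample) for sample in get_bootstrap_samples(churn_calls, 1000)]

print("Loyal, 95% interval for the mean:", stat_intervals(loyal_means))
print("Churned, 95% interval for the mean:", stat_intervals(churn_means))
```

With these stand-in numbers, the printed interval for the loyal group comes out narrower than the one for the churned group, which previews the observation about interval widths discussed a couple of cells below.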
{
@@ -192,7 +192,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"For the interpretation of confidence intervals, you can address [this](https://www.graphpad.com/guides/prism/7/statistics/stat_more_about_confidence_interval.htm?toc=0&printWindow) concise note or any course on statistics. It's not correct to say that a confidence interval contains 95% of values. Note that the interval for the loyal customers is narrower, which is reasonable since they make fewer calls (0, 1 or 2) in comparison with the churned clients who called until they became fed up and switched providers. "
"For the interpretation of confidence intervals, you can address [this](https://www.graphpad.com/guides/prism/7/statistics/stat_more_about_confidence_interval.htm?toc=0&printWindow) concise note or any course on statistics. It's not correct to say that a confidence interval contains 95% of values. Note that the interval for the loyal customers is narrower, which is reasonable since they make fewer calls (0, 1 or 2) in comparison with the churned clients who call until they are fed up and decide to switch providers. "
]
},
{
@@ -257,7 +257,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Bagging reduces the variance of a classifier by decreasing the difference in error when we train the model on different datasets. In other words, bagging prevents overfitting. The efficacy of bagging comes from the fact that the individual models are quite different due to the different training data and their errors cancel out during voting. Additionally, outliers are likely omitted in some of the training bootstrap samples.\n",
"Bagging reduces the variance of a classifier by decreasing the difference in error when we train the model on different datasets. In other words, bagging prevents overfitting. The efficiency of bagging comes from the fact that the individual models are quite different due to the different training data and their errors cancel each other out during voting. Additionally, outliers are likely omitted in some of the training bootstrap samples.\n",
"\n",
"The `scikit-learn` library supports bagging with meta-estimators `BaggingRegressor` and `BaggingClassifier`. You can use most of the algorithms as a base.\n",
"\n",
