Feature/upd poetry & re-run the whole book (Yorko#736)
* re-run all jupyter-book

* fix some of the errors
Yorko authored Feb 3, 2023
1 parent 9671c80 commit 8d21af1
Showing 10 changed files with 47 additions and 53 deletions.
2 changes: 1 addition & 1 deletion mlcourse_ai_jupyter_book/_config.yml
@@ -12,7 +12,7 @@ repository:
branch: main # Which branch of the repository should be used when creating links (optional)

execute:
execute_notebooks : cache
execute_notebooks : force
timeout: -1

# exclude some content
@@ -119,7 +119,7 @@ df.head()

It would be instructive to peek into the values of our variables.

Let's convert the data into *long* format and depict the value counts of the categorical features using [`factorplot()`](https://seaborn.pydata.org/generated/seaborn.factorplot.html).
Let's convert the data into *long* format and depict the value counts of the categorical features using [`catplot()`](https://seaborn.pydata.org/generated/seaborn.catplot.html).


```{code-cell} ipython3
@@ -137,6 +137,7 @@ df_uniques = (
sns.catplot(
x="variable", y="count", hue="value", data=df_uniques, kind="bar"
)
plt.xticks(rotation='vertical');
```

We can see that the target classes are balanced. That's great!
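
As a quick numeric cross-check of this claim, one can look at the class shares directly (a minimal sketch, assuming the target column is named `cardio`, as used in the plots in this notebook):

```python
# Shares of the target classes; two values close to 0.5 confirm the balance
df["cardio"].value_counts(normalize=True)
```
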
@@ -165,6 +166,7 @@ sns.catplot(
    data=df_uniques,
    kind="bar",
)
plt.xticks(rotation='vertical');
```

You can see that the distribution of cholesterol and glucose levels greatly differs by the value of the target variable. Is this a coincidence?
@@ -118,7 +118,7 @@ df.head()

It would be instructive to peek into the values of our variables.

Let's convert the data into *long* format and depict the value counts of the categorical features using [`factorplot()`](https://seaborn.pydata.org/generated/seaborn.factorplot.html).
Let's convert the data into *long* format and depict the value counts of the categorical features using [`catplot()`](https://seaborn.pydata.org/generated/seaborn.catplot.html).


```{code-cell} ipython3
@@ -135,7 +135,8 @@ df_uniques = (
sns.catplot(
x="variable", y="count", hue="value", data=df_uniques, kind="bar"
);
)
plt.xticks(rotation='vertical');
```

We can see that the target classes are balanced. That's great!
@@ -163,7 +164,8 @@ sns.catplot(
col="cardio",
data=df_uniques,
kind="bar"
);
)
plt.xticks(rotation='vertical');
```

You can see that the target variable greatly affects the distribution of cholesterol and glucose levels. Is this a coincidence?
@@ -165,6 +165,7 @@ For the interpretation of confidence intervals, you can address [this](https://w
Now that you've grasped the idea of bootstrapping, we can move on to *bagging*.

Suppose that we have a training set $\large X$. Using bootstrapping, we generate samples $\large X_1, \dots, X_M$. Now, for each bootstrap sample, we train its own classifier $\large a_i(x)$. The final classifier will average the outputs from all these individual classifiers. In the case of classification, this technique corresponds to voting:

$$\large a(x) = \frac{1}{M}\sum_{i = 1}^M a_i(x).$$
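
A minimal sketch of this averaging scheme (an illustration only: decision trees as the base classifiers and probability averaging with a 0.5 threshold are assumptions here, not something the text prescribes):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=17)
rng = np.random.RandomState(17)

M = 50  # number of bootstrap samples / base classifiers
probas = []
for _ in range(M):
    # bootstrap sample X_i: draw len(X) objects with replacement
    idx = rng.randint(0, len(X), size=len(X))
    a_i = DecisionTreeClassifier(random_state=17).fit(X[idx], y[idx])
    probas.append(a_i.predict_proba(X)[:, 1])

# a(x) = (1/M) * sum_i a_i(x): average the individual outputs, then threshold
bagged_prediction = (np.mean(probas, axis=0) > 0.5).astype(int)
```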

The picture below illustrates this algorithm:
@@ -69,6 +69,7 @@ The algorithm for constructing a random forest of $\large N$ trees goes as follows:
* For each split, we first randomly pick $\large m$ features from the $\large d$ original ones and then search for the next best split only among the subset.

The final classifier is defined by:

$$\large a(x) = \frac{1}{N}\sum_{k = 1}^N b_k(x)$$

We use the majority voting for classification and the mean for regression.
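
In scikit-learn, this whole procedure is available out of the box; a minimal usage sketch (the synthetic dataset and the parameter values are placeholders):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, random_state=17)

# N trees; at every split only a random subset of m = sqrt(d) features is considered
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=17)
rf.fit(X, y)

rf.predict(X[:5])        # majority voting over the trees
rf.predict_proba(X[:5])  # averaged class probabilities, as in a(x) above
# For regression, RandomForestRegressor averages the trees' numeric predictions instead
```
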
@@ -64,9 +64,9 @@ Note that by definition ${PI}^{(t)}=0$, if variable $X_j$ isn't in tree $t$.

Now, we can give the feature importance calculation for ensembles:
* not normalized:
$${PI}\left(X_j\right)=\frac{\sum_{t=1}^N {PI}^{(t)}(X_j)}{N}$$
${PI}\left(X_j\right)=\frac{\sum_{t=1}^N {PI}^{(t)}(X_j)}{N}$
* normalized by the standard deviation of the differences:
$$z_j=\frac{{PI}\left(X_j\right)}{\frac{\hat{\sigma}}{\sqrt{N}}}$$
$z_j=\frac{{PI}\left(X_j\right)}{\frac{\hat{\sigma}}{\sqrt{N}}}$
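
A compact sketch of these two quantities for a fitted forest (an illustration only: a held-out validation set stands in for the out-of-bag samples, and the error is the plain misclassification rate; both are simplifying assumptions):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=17)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=17)
rf = RandomForestClassifier(n_estimators=100, random_state=17).fit(X_train, y_train)

j = 0  # index of the feature X_j whose importance we measure
rng = np.random.RandomState(17)
per_tree_pi = []
for tree in rf.estimators_:
    base_error = np.mean(tree.predict(X_valid) != y_valid)
    X_perm = X_valid.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])  # permute the j-th column only
    perm_error = np.mean(tree.predict(X_perm) != y_valid)
    per_tree_pi.append(perm_error - base_error)   # PI^(t)(X_j)

N = len(per_tree_pi)
pi_j = np.mean(per_tree_pi)                              # PI(X_j), not normalized
z_j = pi_j / (np.std(per_tree_pi, ddof=1) / np.sqrt(N))  # normalized importance
```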

## 2. Illustrating permutation importance

@@ -288,7 +288,8 @@ Calculate the Adjusted Rand Index (`sklearn.metrics`) for the resulting clustering.
**Question 6:** <br>
Select all the correct statements. <br>

** Answer options:**
**Answer options:**

- According to ARI, KMeans handled clustering worse than Agglomerative Clustering
- For ARI, it does not matter which tags are assigned to the cluster, only the partitioning of instances into clusters matters
- In case of random partitioning into clusters, ARI will be close to zero
49 changes: 18 additions & 31 deletions mlcourse_ai_jupyter_book/book/topic07/topic7_pca_clustering.md
@@ -91,44 +91,31 @@ Let's start by loading all of the essential modules and trying out the iris example


```{code-cell} ipython3
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import pandas as pd
from sklearn import datasets, decomposition
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style="white")
from mpl_toolkits.mplot3d import Axes3D
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
from mpl_toolkits.mplot3d import Axes3D
from sklearn import datasets, decomposition
# Loading the dataset
# Load the iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target
# Let's create a beautiful 3d-plot
fig = plt.figure(1, figsize=(6, 5))
plt.clf()
ax = Axes3D(fig, rect=[0, 0, 0.95, 1], elev=48, azim=134)
plt.cla()
for name, label in [("Setosa", 0), ("Versicolour", 1), ("Virginica", 2)]:
    ax.text3D(
        X[y == label, 0].mean(),
        X[y == label, 1].mean() + 1.5,
        X[y == label, 2].mean(),
        name,
        horizontalalignment="center",
        bbox=dict(alpha=0.5, edgecolor="w", facecolor="w"),
    )
# Change the order of labels, so that they match
y_clr = np.choose(y, [1, 2, 0]).astype(np.float32)
ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=y_clr, cmap=plt.cm.nipy_spectral)
ax.w_xaxis.set_ticklabels([])
ax.w_yaxis.set_ticklabels([])
ax.w_zaxis.set_ticklabels([]);
# Plot the dataset in 3D ignoring Petal Width
fig = plt.figure(figsize=(10, 10))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X[:, 0], X[:, 1], X[:, 2],
           c=y, cmap='viridis', alpha=0.7)
ax.set_xlabel('Sepal Length')
ax.set_ylabel('Sepal Width')
ax.set_zlabel('Petal Length');
```

Now let's see how PCA will improve the results of a simple model that is not able to correctly fit all of the training data:
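
A minimal sketch of such a comparison (the shallow decision tree, the hold-out split, and the two-component PCA here are illustrative assumptions, not necessarily the notebook's exact setup):

```python
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

# Baseline: a shallow tree on the raw features
tree = DecisionTreeClassifier(max_depth=2, random_state=42).fit(X_train, y_train)
print("Raw features:", accuracy_score(y_test, tree.predict(X_test)))

# The same model on the first two principal components
pca = PCA(n_components=2).fit(X_train)
tree_pca = DecisionTreeClassifier(max_depth=2, random_state=42).fit(
    pca.transform(X_train), y_train
)
print("After PCA:", accuracy_score(y_test, tree_pca.predict(pca.transform(X_test))))
```
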
@@ -293,7 +280,7 @@ plt.show();

The main idea behind clustering is pretty straightforward. Basically, we say to ourselves, "I have these points here, and I can see that they organize into groups. It would be nice to describe these things more concretely, and, when a new point comes in, assign it to the correct group." This general idea encourages exploration and opens up a variety of algorithms for clustering.

<figure><img align="center" src="https://habrastorage.org/getpro/habr/post_images/8b9/ae5/586/8b9ae55861f22a2809e8b3a00ef815ad.png"><figcaption>*The examples of the outcomes from different algorithms from scikit-learn*</figcaption></figure>
<figure><img align="center" src="https://habrastorage.org/getpro/habr/post_images/8b9/ae5/586/8b9ae55861f22a2809e8b3a00ef815ad.png"><figcaption><it>The examples of the outcomes from different algorithms from scikit-learn</it></figcaption></figure>

The algorithms listed below do not cover all the clustering methods out there, but they are the most commonly used ones.

@@ -402,7 +389,7 @@ from sklearn.cluster import KMeans
```{code-cell} ipython3
inertia = []
for k in range(1, 8):
    kmeans = KMeans(n_clusters=k, random_state=1).fit(X)
    kmeans = KMeans(n_clusters=k, random_state=1, n_init='auto').fit(X)
    inertia.append(np.sqrt(kmeans.inertia_))
```
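
The natural next step is to plot how this quantity decays as the number of clusters grows and look for an "elbow" (a minimal plotting sketch):

```python
# inertia here already holds the square roots computed in the loop above
plt.plot(range(1, 8), inertia, marker="s")
plt.xlabel("$k$")
plt.ylabel("inertia")
plt.show()
```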

@@ -561,7 +548,7 @@ data = datasets.load_digits()
X, y = data.data, data.target
algorithms = []
algorithms.append(KMeans(n_clusters=10, random_state=1))
algorithms.append(KMeans(n_clusters=10, random_state=1, n_init='auto'))
algorithms.append(AffinityPropagation())
algorithms.append(
    SpectralClustering(n_clusters=10, random_state=1, affinity="nearest_neighbors")
@@ -261,7 +261,7 @@ This idea is implemented in the `OneHotEncoder` class from `sklearn.preprocessing`.


```{code-cell} ipython3
onehot_encoder = OneHotEncoder(sparse=False)
onehot_encoder = OneHotEncoder(sparse_output=False)
```
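
For instance, a minimal usage sketch on a made-up categorical column (note that `sparse_output` is the parameter name since scikit-learn 1.2; older releases call it `sparse`):

```python
import numpy as np

colors = np.array([["red"], ["green"], ["blue"], ["green"]])  # toy feature of shape (n, 1)
encoded = onehot_encoder.fit_transform(colors)

onehot_encoder.categories_  # [array(['blue', 'green', 'red'], dtype='<U5')]
encoded                     # one binary column per category, dense because sparse_output=False
```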


@@ -453,15 +453,15 @@ with open(os.path.join(PATH_TO_WRITE_DATA, "20news_test.vw"), "w") as vw_test_da
Now, we pass the created training file to Vowpal Wabbit. We solve the classification problem with a hinge loss function (linear SVM). The trained model will be saved in the `20news_model.vw` file:


```{code-cell} ipython3
```
#!vw -d $PATH_TO_WRITE_DATA/20news_train.vw \
# --loss_function hinge -f $PATH_TO_WRITE_DATA/20news_model.vw
```

VW prints a lot of interesting info while training (it can be suppressed with the `--quiet` parameter); see the [documentation](https://vowpalwabbit.org/docs/vowpal_wabbit/python/latest/tutorials/cmd_linear_regression.html#vowpal-wabbit-output) for an explanation of this diagnostic output. Note how the average loss drops while training. For loss computation, VW uses samples it has never seen before, so this measure is usually accurate. Now, we apply our trained model to the test set, saving predictions into a file with the `-p` flag:


```{code-cell} ipython3
```
#!vw -i $PATH_TO_WRITE_DATA/20news_model.vw -t -d $PATH_TO_WRITE_DATA/20news_test.vw \
# -p $PATH_TO_WRITE_DATA/20news_test_predictions.txt
```
@@ -477,13 +477,13 @@ with open(os.path.join(PATH_TO_WRITE_DATA, "20news_test_predictions.txt")) as pr
auc = roc_auc_score(test_labels, test_prediction)
roc_curve = roc_curve(test_labels, test_prediction)
with plt.xkcd():
plt.plot(roc_curve[0], roc_curve[1])
plt.plot([0, 1], [0, 1])
plt.xlabel("FPR")
plt.ylabel("TPR")
plt.title("test AUC = %f" % (auc))
plt.axis([-0.05, 1.05, -0.05, 1.05]);
plt.plot(roc_curve[0], roc_curve[1])
plt.plot([0, 1], [0, 1])
plt.xlabel("FPR")
plt.ylabel("TPR")
plt.title("test AUC = %f" % (auc))
plt.axis([-0.05, 1.05, -0.05, 1.05]);
```

The AUC value we get shows that we have achieved high classification quality.
@@ -524,12 +524,12 @@ We train Vowpal Wabbit in multiclass classification mode, passing the `oaa` parameter
Additionally, we can try automatic Vowpal Wabbit parameter tuning with [Hyperopt](https://github.com/hyperopt/hyperopt).


```{code-cell} ipython3
```
#!vw --oaa 20 $PATH_TO_WRITE_DATA/20news_train_mult.vw -f $PATH_TO_WRITE_DATA/ \
#20news_model_mult.vw --loss_function=hinge
```

```{code-cell} ipython3
```
#%%time
#!vw -i $PATH_TO_WRITE_DATA/20news_model_mult.vw -t -d $PATH_TO_WRITE_DATA/20news_test_mult.vw \
#-p $PATH_TO_WRITE_DATA/20news_test_predictions_mult.txt
@@ -901,7 +901,7 @@ We can fight non-stationarity using different approaches: various order differen

## Getting rid of non-stationarity and building SARIMA

Let's build an ARIMA model by walking through all the ~~circles of hell~~ stages of making a series stationary.
Let's build an ARIMA model by walking through all the *circles of hell* stages of making a series stationary.

Here is the code to render plots.

Expand Down Expand Up @@ -1555,7 +1555,7 @@ But, this victory is decieving, and it might not be the brightest idea to fit `x

# Conclusion

We discussed different time series analysis and prediction methods. Unfortunately, or maybe luckily, there is no single way to solve these kinds of problems. Methods developed in the 1960s (and some even in the beginning of the 21st century) are still popular, along with LSTMs and RNNs (not covered in this article). This is partially related to the fact that the prediction task, like any other data-related task, requires creativity in so many aspects and definitely requires research. In spite of the large number of formal quality metrics and approaches to parameter estimation, it is often necessary to try something different for each time series. Last but not least, the balance between quality and cost is important. As a good example, the SARIMA model can produce spectacular results after tuning but can require many hours of ~~tambourine dancing~~ time series manipulation, while a simple linear regression model can be built in 10 minutes and can achieve more or less comparable results.
We discussed different time series analysis and prediction methods. Unfortunately, or maybe luckily, there is no single way to solve these kinds of problems. Methods developed in the 1960s (and some even in the beginning of the 21st century) are still popular, along with LSTMs and RNNs (not covered in this article). This is partially related to the fact that the prediction task, like any other data-related task, requires creativity in so many aspects and definitely requires research. In spite of the large number of formal quality metrics and approaches to parameter estimation, it is often necessary to try something different for each time series. Last but not least, the balance between quality and cost is important. As a good example, the SARIMA model can produce spectacular results after tuning but can require many hours of *tambourine dancing* time series manipulation, while a simple linear regression model can be built in 10 minutes and can achieve more or less comparable results.

# Useful resources
