# ---
# jupyter:
#   jupytext:
#     formats: ipynb,py:percent
#     text_representation:
#       extension: .py
#       format_name: percent
#       format_version: '1.3'
#       jupytext_version: 1.16.2
#   kernelspec:
#     display_name: .venv
#     language: python
#     name: python3
# ---
# %% [markdown]
# # Basic usage of `skore`
# %% [markdown]
# ## Introduction
#
# This guide illustrates some of the main features that `skore` currently provides. `skore` is an open-source package that aims to enable data scientists to:
# 1. **Store** objects of different types from their Python code: Python lists and dictionaries, `numpy` arrays, `pandas` dataframes, `scikit-learn` fitted pipelines, `matplotlib` / `plotly` / `altair` figures, and more.
# 2. **Track** and **visualize** these stored objects on a user-friendly dashboard.
# 3. **Export** the dashboard to an HTML file.
#
# This notebook stores some items that were used to generate a `skore` report available at [this link](https://sylvaincom.github.io/files/probabl/skore/01_basic_usage.html).
# %% [markdown]
# Imports:
# %%
import altair as alt
import io
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import plotly.express as px
import PIL.Image  # import the submodule explicitly so that `PIL.Image.new` is available
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from skore import load
from skore.item import MediaItem
# %% [markdown]
# ## Initialize and use a Project
#
# From your shell, initialize a `skore` project, here named `project`, in your current working directory:
# ```bash
# python -m skore create "project"
# ```
# This will create a skore project directory named `project.skore` in the current directory.
#
# Now that you have created the `project.skore` folder (even though nothing has yet been stored), you can launch the UI.
#
# From your shell (in the same directory), start the UI locally:
# ```bash
# python -m skore launch project
# ```
# This will automatically open a browser at the UI's location.
#
# ---
# **NOTE**: You may already have a `project.skore` directory from a previous run. You can check for that using your shell:
# ```bash
# ls
# ```
# If you no longer need it, we recommend deleting this folder using your shell:
# ```bash
# rm -r project.skore
# ```
# This deletion must be done before the steps above, that is, before initializing the project and before launching the UI!
#
# ---
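# %% [markdown]
# If you prefer to stay in Python, the shell check and deletion described above can also be done with the standard library.
# The helper below (`remove_skore_project`, a hypothetical name that is not part of `skore`) is a minimal sketch. It assumes that a project is simply the `<name>.skore` directory created by `python -m skore create`.
# To avoid deleting anything real, it is only exercised here on a throwaway temporary directory. Call it on your working directory *before* creating the project.
# %%
import shutil
import tempfile
from pathlib import Path


def remove_skore_project(name, base_dir="."):
    """Delete `<base_dir>/<name>.skore` if it exists; return True if something was removed."""
    project_dir = Path(base_dir) / f"{name}.skore"
    if project_dir.is_dir():
        shutil.rmtree(project_dir)
        return True
    return False


# Demonstration on a temporary directory only, so the real `project.skore` is left untouched
with tempfile.TemporaryDirectory() as tmp_dir:
    (Path(tmp_dir) / "project.skore").mkdir()
    removed_once = remove_skore_project("project", base_dir=tmp_dir)   # directory existed
    removed_twice = remove_skore_project("project", base_dir=tmp_dir)  # already gone
print(removed_once, removed_twice)  # True False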
# %% [markdown]
# Now that the project exists, we can load it in our notebook so that we can read from and write to it:
# %%
project = load("project")
# %% [markdown]
# ### Storing an integer
# %% [markdown]
# Now, let us store our first object, for example an integer:
# %%
project.put("my_int", 3)
# %% [markdown]
# Here, the key of the object is `my_int` and its value is the integer 3.
#
# You can read it back from the project:
# %%
project.get("my_int")
# %% [markdown]
# Careful: as with a regular Python dictionary, the `put` method *overwrites* past data if you use a key that already exists!
# %%
project.put("my_int", 30_000)
# %% [markdown]
# Let us check the updated value:
# %%
project.get("my_int")
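# %% [markdown]
# If you want to avoid overwriting an existing key by accident, one option is to check the existing keys first.
# `put_if_absent` below is a hypothetical helper, not part of `skore`. It only assumes an object exposing the `list_item_keys()` and `put(key, value)` methods used in this notebook.
# To keep the cell self-contained, it is demonstrated on a tiny in-memory stand-in (`_DemoProject`, also hypothetical) rather than on the real project.
# %%
def put_if_absent(store, key, value):
    """Store `value` under `key` only if `key` is not already present.

    Returns True if the value was stored, False if the key already existed.
    """
    if key in store.list_item_keys():
        return False
    store.put(key, value)
    return True


class _DemoProject:
    """Hypothetical in-memory stand-in mimicking put / get / list_item_keys."""

    def __init__(self):
        self._items = {}

    def put(self, key, value):
        self._items[key] = value  # overwrites silently, like a dict

    def get(self, key):
        return self._items[key]

    def list_item_keys(self):
        return list(self._items)


demo_project = _DemoProject()
demo_project.put("my_int", 3)
stored = put_if_absent(demo_project, "my_int", 30_000)  # key exists, so nothing is stored
print(stored, demo_project.get("my_int"))  # False 3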
# %% [markdown]
# By using the `delete_item` method, you can also delete an object so that your `skore` UI does not become cluttered:
# %%
project.put("my_int_2", 10)
# %%
project.delete_item("my_int_2")
# %% [markdown]
# You can use `project.list_item_keys` to display all the keys in your project:
# %%
project.list_item_keys()
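# %% [markdown]
# Combining `list_item_keys` and `delete_item` makes it easy to clean up several related items at once, for example every key that starts with a given prefix.
# `delete_items_with_prefix` below is a hypothetical helper, not a `skore` function. It only assumes the two methods shown above and is demonstrated on a small in-memory stand-in (`_DemoKeyStore`, also hypothetical) instead of the real project.
# %%
def delete_items_with_prefix(store, prefix):
    """Delete every item whose key starts with `prefix`; return the deleted keys."""
    # Collect the matching keys first so we never delete while iterating
    to_delete = [key for key in store.list_item_keys() if key.startswith(prefix)]
    for key in to_delete:
        store.delete_item(key)
    return to_delete


class _DemoKeyStore:
    """Hypothetical dict-backed stand-in exposing put / delete_item / list_item_keys."""

    def __init__(self):
        self._items = {}

    def put(self, key, value):
        self._items[key] = value

    def delete_item(self, key):
        del self._items[key]

    def list_item_keys(self):
        return list(self._items)


demo_store = _DemoKeyStore()
for demo_key, demo_value in [("tmp_a", 1), ("tmp_b", 2), ("keep_me", 3)]:
    demo_store.put(demo_key, demo_value)
deleted_keys = delete_items_with_prefix(demo_store, "tmp_")
print(deleted_keys, demo_store.list_item_keys())  # ['tmp_a', 'tmp_b'] ['keep_me']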
# %% [markdown]
# ### Storing a string
# %% [markdown]
# We just stored an integer; now let us store some text using strings!
# %%
project.put("my_string", "Hello world!")
# %%
project.get("my_string")
# %% [markdown]
# `project.put` infers the type of the inserted object by default. For example, strings are assumed to be in Markdown format. Hence, you can customize the display of your text:
# %%
project.put(
    "my_string_2",
    (
        """Hello world!, **bold**, *italic*, `code`
```python
def my_func(x):
    return x+2
```
"""
    ),
)
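# %% [markdown]
# Because whitespace matters in Markdown (for example in nested lists or code blocks), it can be convenient to indent a long Markdown string along with your code. You can then strip the common indentation with `textwrap.dedent` from the standard library before storing it.
# This is only a suggestion for writing such strings. The resulting `my_markdown` string could then be passed to `project.put` as above.
# %%
import textwrap

# The backslash after the opening quotes avoids a leading empty line;
# dedent removes the 4-space margin shared by every line but keeps the extra nesting.
my_markdown = textwrap.dedent(
    """\
    ## Results
    - **bold** item
        - nested item (indentation preserved)
    """
)
print(my_markdown)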
# %% [markdown]
# Moreover, you can also explicitly tell `skore` the media type of an object, for example in HTML:
# %%
# Note we use `put_item` instead of `put`
project.put_item(
    "my_string_3",
    MediaItem.factory(
        "<p><h1>Title</h1> <b>bold</b>, <i>italic</i>, etc.</p>", media_type="text/html"
    ),
)
# %% [markdown]
# Note that the media type is only used by the UI, not in this notebook:
# %%
project.get("my_string_3")
# %% [markdown]
# You can also conveniently use Python f-strings:
# %%
x = 2
y = [1, 2, 3, 4]
project.put("my_string_4", f"The value of `x` is {x} and the value of `y` is {y}.")
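# %% [markdown]
# f-strings also accept format specifications, which helps keep stored numbers readable.
# Here is a small sketch that builds a Markdown table from a dictionary. The `demo_scores` values are made up for illustration only.
# %%
demo_scores = {"train": 0.91234, "test": 0.87654}

# One Markdown table row per entry, with scores rounded to 3 decimals by the `:.3f` spec
demo_rows = "\n".join(f"| {split} | {score:.3f} |" for split, score in demo_scores.items())
my_markdown_table = f"| split | score |\n|---|---|\n{demo_rows}"
print(my_markdown_table)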
# %% [markdown]
# ### Storing many kinds of data
# %% [markdown]
# Python list:
# %%
my_list = [1, 2, 3, 4]
project.put("my_list", my_list)
# %% [markdown]
# Python dictionary:
# %%
my_dict = {
    "company": "probabl",
    "year": 2023,
}
project.put("my_dict", my_dict)
# %% [markdown]
# NumPy array:
# %%
my_arr = np.random.randn(3, 3)
project.put("my_arr", my_arr)
# %% [markdown]
# Pandas data frame:
# %%
my_df = pd.DataFrame(np.random.randn(3, 3))
project.put("my_df", my_df)
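# %% [markdown]
# Note that `np.random.randn` draws new values at every run, so re-running the cells above stores different arrays under the same keys.
# If you want the stored data to be reproducible, one option is NumPy's seeded `Generator` API, sketched below. The seed value 0 is arbitrary.
# %%
import numpy as np
import pandas as pd

demo_rng = np.random.default_rng(seed=0)
my_reproducible_arr = demo_rng.standard_normal((3, 3))
my_reproducible_df = pd.DataFrame(demo_rng.standard_normal((3, 3)))

# A fresh generator with the same seed yields the same first draw
same_first_draw = np.array_equal(
    np.random.default_rng(seed=0).standard_normal((3, 3)), my_reproducible_arr
)
print(same_first_draw)  # True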
# %% [markdown]
# ### Data visualization
#
# Note that the dashboard supports interactive plots, for example those made with `altair` and `plotly`.
# %% [markdown]
# Matplotlib figures:
# %%
x = np.linspace(0, 2, 100)
fig, ax = plt.subplots(figsize=(5, 2.7), layout="constrained")
ax.plot(x, x, label="linear")
ax.plot(x, x**2, label="quadratic")
ax.plot(x, x**3, label="cubic")
ax.set_xlabel("x label")
ax.set_ylabel("y label")
ax.set_title("Simple Plot")
ax.legend()
plt.show()
project.put("my_figure", fig)
# %% [markdown]
# Altair charts:
# %%
num_points = 100
df_plot = pd.DataFrame(
    {"x": np.random.randn(num_points), "y": np.random.randn(num_points)}
)
my_altair_chart = (
    alt.Chart(df_plot)
    .mark_circle()
    .encode(x="x", y="y", tooltip=["x", "y"])
    .interactive()
    .properties(title="My title")
)
my_altair_chart.show()
project.put("my_altair_chart", my_altair_chart)
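# %% [markdown]
# An Altair chart is a declarative Vega-Lite specification rather than a rendered image. This is presumably what lets a browser-based viewer keep it interactive.
# You can inspect that specification with `Chart.to_dict()`. The sketch below uses a tiny made-up dataset.
# %%
import altair as alt
import pandas as pd

demo_chart = (
    alt.Chart(pd.DataFrame({"x": [0, 1, 2], "y": [0, 1, 4]}))
    .mark_circle()
    .encode(x="x", y="y")
    .properties(title="My title")
)
demo_spec = demo_chart.to_dict()  # plain Python dict following the Vega-Lite schema
print(demo_spec["$schema"])
print(demo_spec["title"])  # My title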
# %% [markdown]
# Plotly figures:
#
# > NOTE: Some users reported the following error when running the Plotly cells:
# > ```
# > ValueError: Mime type rendering requires nbformat>=4.2.0 but it is not installed
# > ```
# > This is a Plotly issue which is documented [here](https://github.com/plotly/plotly.py/issues/3285); to solve it, we recommend installing nbformat in your environment, e.g. with
# > ```sh
# > pip install --upgrade nbformat
# > ```
# %%
df = px.data.iris()
fig = px.scatter(df, x=df.sepal_length, y=df.sepal_width, color=df.species, size=df.petal_length)
fig.show()
project.put("my_plotly_fig", fig)
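# %% [markdown]
# If you hit the `nbformat` error mentioned above, you can check which version (if any) is installed from Python with the standard library's `importlib.metadata`.
# The helpers below are a minimal sketch. Their simple version parsing only looks at leading digits, which assumes plain releases such as `5.10.4`.
# %%
import re
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def meets_minimum(version_string, minimum=(4, 2, 0)):
    """Compare the leading numeric components of `version_string` against `minimum`."""
    parts = []
    for piece in version_string.split(".")[: len(minimum)]:
        match = re.match(r"\d+", piece)  # leading digits, e.g. "0" from "0rc1"
        parts.append(int(match.group()) if match else 0)
    parts += [0] * (len(minimum) - len(parts))  # treat "4.2" as "4.2.0"
    return tuple(parts) >= tuple(minimum)


nbformat_version = installed_version("nbformat")
print(nbformat_version, nbformat_version is not None and meets_minimum(nbformat_version))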
# %% [markdown]
# Animated plotly figures:
# %%
df = px.data.gapminder()
my_anim_plotly_fig = px.scatter(
    df, x="gdpPercap", y="lifeExp", animation_frame="year", animation_group="country",
    size="pop", color="continent", hover_name="country",
    log_x=True, size_max=55, range_x=[100, 100000], range_y=[25, 90],
)
my_anim_plotly_fig.show()
project.put("my_anim_plotly_fig", my_anim_plotly_fig)
# %% [markdown]
# PIL images:
# %%
my_pil_image = PIL.Image.new("RGB", (100, 100), color="red")
with io.BytesIO() as output:
    my_pil_image.save(output, format="png")
project.put("my_pil_image", my_pil_image)
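# %% [markdown]
# In the cell above, the PNG bytes written to the `BytesIO` buffer are discarded when the `with` block ends.
# If you want to keep the encoded image (for example to check its size or to store the raw bytes yourself), read the buffer with `getvalue()` before it is closed. The sketch below also decodes the bytes back to check the round trip.
# %%
import io

from PIL import Image

demo_image = Image.new("RGB", (100, 100), color="red")
with io.BytesIO() as buffer:
    demo_image.save(buffer, format="png")
    png_bytes = buffer.getvalue()  # must be read before the buffer is closed

# Decode the bytes again to check that nothing was lost
with Image.open(io.BytesIO(png_bytes)) as decoded:
    decoded_size = decoded.size
    decoded_pixel = decoded.convert("RGB").getpixel((0, 0))
print(decoded_size, decoded_pixel)  # (100, 100) (255, 0, 0)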
# %% [markdown]
# ### Scikit-learn models and pipelines
#
# As `skore` is developed by :probabl., the spin-off of scikit-learn, it treats scikit-learn models and pipelines as first-class citizens.
#
# First of all, you can store a scikit-learn model:
# %%
my_model = Lasso(alpha=2)
project.put("my_model", my_model)
# %% [markdown]
# You can also store scikit-learn pipelines:
# %%
my_pipeline = Pipeline(
    [("standard_scaler", StandardScaler()), ("lasso", Lasso(alpha=2))]
)
project.put("my_pipeline", my_pipeline)
# %% [markdown]
# Moreover, you can store fitted scikit-learn pipelines:
# %%
diabetes = load_diabetes()
X = diabetes.data[:150]
y = diabetes.target[:150]
my_pipeline.fit(X, y)
project.put("my_fitted_pipeline", my_pipeline)
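# %% [markdown]
# Since both unfitted and fitted pipelines can be stored, it can be useful to check which one you have before using it.
# Here is a sketch based on scikit-learn's `check_is_fitted`, applied to the pipeline's final step. The helper name `is_fitted` is hypothetical.
# %%
from sklearn.datasets import load_diabetes
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import Lasso
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.utils.validation import check_is_fitted


def is_fitted(pipeline):
    """Return True if the final step of `pipeline` has been fitted."""
    try:
        check_is_fitted(pipeline[-1])  # pipeline[-1] is the last estimator
    except NotFittedError:
        return False
    return True


demo_pipeline = Pipeline([("standard_scaler", StandardScaler()), ("lasso", Lasso(alpha=2))])
fitted_before = is_fitted(demo_pipeline)

demo_X, demo_y = load_diabetes(return_X_y=True)
demo_pipeline.fit(demo_X[:150], demo_y[:150])
fitted_after = is_fitted(demo_pipeline)
print(fitted_before, fitted_after)  # False True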
# %% [markdown]
# ---
# ## Cross-validation with skore
# %%
from sklearn import datasets, linear_model
from skore.cross_validate import cross_validate
diabetes = datasets.load_diabetes()
X = diabetes.data[:150]
y = diabetes.target[:150]
lasso = linear_model.Lasso()
cv_results = cross_validate(lasso, X, y, cv=3, project=project)
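# %% [markdown]
# For reference, scikit-learn's own `sklearn.model_selection.cross_validate` returns a dictionary of per-fold arrays (`fit_time`, `score_time` and `test_score` by default).
# The sketch below runs it on the same data. `skore.cross_validate` is called the same way above, but whether its return value has exactly this structure is an assumption worth checking with `cv_results.keys()`.
# %%
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_validate as sklearn_cross_validate

demo_X, demo_y = load_diabetes(return_X_y=True)
sklearn_cv_results = sklearn_cross_validate(Lasso(), demo_X[:150], demo_y[:150], cv=3)

for demo_name, demo_values in sorted(sklearn_cv_results.items()):
    print(demo_name, demo_values.shape)  # one value per fold, i.e. shape (3,)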
# %% [markdown]
# _Stay tuned for some new features!_
# %% [markdown]
# ---
# ## Manipulating the skore UI
#
# The following cells store some HTML strings in the project to provide more context in the resulting report.
# %%
project.put(
    "my_comment_1",
    "<p><h1>Welcome to skore!</h1></p><p><code>skore</code> allows data scientists to track and visualize objects from their Python code. This HTML document is actually a skore report generated using the <code>01_basic_usage.ipynb</code> example notebook and then exported to HTML!</p>",
)
# %%
project.put(
    "my_comment_2",
    "<p><h2>Integers</h2></p>",
)
# %%
project.put("my_comment_3", "<p><h2>Strings</h2></p>")
# %%
project.put(
    "my_comment_4",
    "<p><h2>Many kinds of data</h2></p>",
)
# %%
project.put(
    "my_comment_5",
    "<p><h2>Plots</h2></p>",
)
# %%
project.put("my_comment_6", "<p><h2>Scikit-learn models and pipelines</h2></p>")
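# %% [markdown]
# Hand-writing these HTML snippets makes it easy to mismatch tags, for example opening `<h2>` and closing `</h1>`.
# A small hypothetical helper (not part of `skore`) can generate headings consistently and escape the text with the standard library's `html.escape`. Its output could then be passed to `project.put` like the strings above.
# %%
import html


def heading_html(text, level=2):
    """Return an HTML heading of the given level with `text` safely escaped."""
    if not 1 <= level <= 6:
        raise ValueError("HTML headings go from <h1> to <h6>")
    return f"<h{level}>{html.escape(text)}</h{level}>"


print(heading_html("Scikit-learn models & pipelines"))
# <h2>Scikit-learn models &amp; pipelines</h2>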
# %%