
Commit 39e7f27

Add more documentation datasets (apple#251)
1 parent 28f01cc commit 39e7f27

15 files changed with 25 additions and 21 deletions

userguide/activity_classifier/README.md

+2 -2

@@ -8,7 +8,7 @@ The activity classifier in Turi Create creates a deep learning model capable of
 
 #### Introductory Example
 
-In this example we create a model to classify physical activities done by users of a handheld phone, using both accelerometer and gyroscope data. We will use data from the [HAPT experiment](http://archive.ics.uci.edu/ml/datasets/Smartphone-Based+Recognition+of+Human+Activities+and+Postural+Transitions) which contains recording sessions of multiple users, each performing certain physical activities. The performed activities are walking, climbing up stairs, climbing down stairs, sitting, standing, and laying.
+In this example we create a model to classify physical activities done by users of a handheld phone, using both accelerometer and gyroscope data. We will use data from the [HAPT experiment](http://archive.ics.uci.edu/ml/datasets/Smartphone-Based+Recognition+of+Human+Activities+and+Postural+Transitions) which contains recording sessions of multiple users, each performing certain physical activities.[<sup>1</sup>](../datasets.md) The performed activities are walking, climbing up stairs, climbing down stairs, sitting, standing, and laying.
 
 Sensor data can be collected at varying frequencies. In the HAPT dataset, the sensors were sampled at 50Hz each - meaning 50 times per second. However, most applications would want to show outputs to the user at larger intervals. We control the output prediction rate via the ```prediction_window``` parameter. For example, if we want to produce a prediction every 5 seconds, and the sensors are sampled at 50Hz - we would set the ```prediction_window``` to 250 (5 sec * 50 samples per second).
 
@@ -66,4 +66,4 @@ We've seen how we can quickly create an activity classifier given recorded sessi
 
 * [Advanced usage](advanced-usage.md)
 * [Deployment via Core ML](export_coreml.md)
-* [How does it work](how-it-works.md)
+* [How does it work](how-it-works.md)
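The `prediction_window` arithmetic in the activity classifier hunk (one prediction every 5 seconds at a 50 Hz sampling rate gives 250 samples) can be sketched as a small helper. The function name `samples_per_prediction` is hypothetical and not part of Turi Create; it only reproduces the multiplication the guide describes, with a guard against windows that would not cover a whole number of samples.

```python
def samples_per_prediction(seconds: float, sample_rate_hz: float) -> int:
    """Hypothetical helper: sensor samples covered by one prediction.

    Mirrors the guide's arithmetic: prediction_window = seconds * sampling rate.
    """
    samples = seconds * sample_rate_hz
    # A window must span a positive, whole number of samples.
    if samples <= 0 or samples != int(samples):
        raise ValueError("window must cover a positive whole number of samples")
    return int(samples)


# HAPT sensors were sampled at 50 Hz; we want one prediction every 5 seconds.
window = samples_per_prediction(5, 50)
print(window)  # 250
```

The resulting integer is the value the guide passes as `prediction_window`.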

userguide/activity_classifier/data-preperation.md

+2 -2

@@ -1,6 +1,6 @@
 # HAPT Data Preparation
 
-In this section we will see how to get the [HAPT experiment](http://archive.ics.uci.edu/ml/datasets/Smartphone-Based+Recognition+of+Human+Activities+and+Postural+Transitions) data into the SFrame format expected by the activity classifier.
+In this section we will see how to get the [HAPT experiment](http://archive.ics.uci.edu/ml/datasets/Smartphone-Based+Recognition+of+Human+Activities+and+Postural+Transitions) data into the SFrame format expected by the activity classifier.[<sup>1</sup>](../datasets.md)
 
 First we need to download the data from [here](http://archive.ics.uci.edu/ml/machine-learning-databases/00341/HAPT%20Data%20Set.zip) in zip format. The code below assumes the data was unzipped into a directory named `HAPT Data Set`. This folder contains 3 types of files - a file containing the performed activities for each experiment, files containing the collected accelerometer samples, and files containing the collected gyroscope samples.
 
@@ -93,4 +93,4 @@ data = data.remove_column('activity_id')
 data.save('hapt_data.sframe')
 ```
 
-To learn more about the expected input format of the activity classifier please visit the [advanced usage](advanced-usage.md) section.
+To learn more about the expected input format of the activity classifier please visit the [advanced usage](advanced-usage.md) section.

userguide/clustering/dbscan.md

+1 -1

@@ -50,7 +50,7 @@ advantages:
 
 To illustrate the basic usage of DBSCAN and how the results can differ from
 K-means, we simulate non-spherical, low-dimensional data using the scikit-learn
-datasets module.
+datasets module.[<sup>1</sup>](../datasets.md)
 
 ```python
 import turicreate as tc

userguide/clustering/kmeans.md

+3 -2

@@ -28,8 +28,9 @@ distance from point $$x$$ to center $$B$$ when assigning $$x$$ to a cluster.
 
 #### Basic Usage
 
-We illustrate usage of Turi Create K-means with a dataset used to classify
-schizophrenic subjects based on MRI scans. The original data consists of
+We illustrate usage of Turi Create K-means with the dataset from the [June
+2014 Kaggle competition to classify schizophrenic subjects based on MRI
+scans](https://www.kaggle.com/c/mlsp-2014-mri). Download **Train.zip** from the data tab.[<sup>1</sup>](../datasets.md) The original data consists of
 two sets of features: functional network connectivity (FNC) features and
 source-based morphometry (SBM) features, which we incorporate into a single
 [`SFrame`](https://apple.github.io/turicreate/docs/api/generated/turicreate.SFrame.html)

userguide/datasets.md

+2

@@ -0,0 +1,2 @@
+# User Guide Datasets
+Apple has provided links to certain datasets for reference purposes only and on an “as is” basis. You are solely responsible for your use of the datasets and for complying with applicable terms and conditions, including any use restrictions and attribution requirements. Apple shall not be liable for, and specifically disclaims any warranties, express or implied, in connection with, the use of the datasets, including any warranties of fitness for a particular purpose or non-infringement.

userguide/image_classifier/README.md

+6 -5

@@ -12,16 +12,16 @@ create a high quality image classifier model.
 
 #### Loading Data
 
-Suppose we have a dataset containing labeled cat and dog images.
+The [Kaggle Cats and Dogs Dataset](https://www.microsoft.com/en-us/download/details.aspx?id=54765) provides labeled cat and dog images.[<sup>1</sup>](../datasets.md) After downloading and decompressing the dataset, navigate to the main **kagglecatsanddogs** folder, which contains a **PetImages** subfolder.
 
 ```python
 import turicreate as tc
 
-# Load images
-data = tc.image_analysis.load_images('train', with_path=True)
+# Load images (Note: you can ignore 'Not a JPEG file' errors)
+data = tc.image_analysis.load_images('PetImages', with_path=True)
 
 # From the path-name, create a label column
-data['label'] = data['path'].apply(lambda path: 'dog' if 'dog' in path else 'cat')
+data['label'] = data['path'].apply(lambda path: 'dog' if '/Dog' in path else 'cat')
 
 # Save the data for future use
 data.save('cats-dogs.sframe')
@@ -44,7 +44,8 @@ data = tc.SFrame('cats-dogs.sframe')
 # Make a train-test split
 train_data, test_data = data.random_split(0.8)
 
-# Automatically picks the right model based on your data.
+# Automatically pick the right model based on your data.
+# Note: Because the dataset is large, model creation may take hours.
 model = tc.image_classifier.create(train_data, target='label')
 
 # Save predictions to an SArray
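The image classifier hunk changes the label test from `'dog' in path` to `'/Dog' in path`. A plausible reason, assuming the stored paths include the parent folder name **kagglecatsanddogs**: that folder name itself contains the substring `dog`, so the old test would label every image `dog`. The sketch below applies both rules to hypothetical example paths in plain Python (no Turi Create needed); the directory prefix `/home/user` is an illustrative assumption.

```python
# Hypothetical paths, assuming the PetImages/Cat and PetImages/Dog layout
# described in the guide; real paths depend on where the archive was unpacked.
paths = [
    "/home/user/kagglecatsanddogs/PetImages/Cat/0.jpg",
    "/home/user/kagglecatsanddogs/PetImages/Dog/0.jpg",
]


def old_label(path):
    # Rule before the commit: any 'dog' substring anywhere in the path means dog.
    return 'dog' if 'dog' in path else 'cat'


def new_label(path):
    # Rule after the commit: only images under a '/Dog' folder are dogs.
    return 'dog' if '/Dog' in path else 'cat'


print([old_label(p) for p in paths])  # ['dog', 'dog']
print([new_label(p) for p in paths])  # ['cat', 'dog']
```

Note that the `'/Dog'` test assumes forward-slash path separators; paths using backslashes would not match it.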

userguide/image_similarity/README.md

+1 -1

@@ -13,7 +13,7 @@ unsupervised.
 In this example, we use the [Caltech-101
 dataset](http://www.vision.caltech.edu/Image_Datasets/Caltech101/)
 which contains images objects belonging to 101 categories with about 40
-to 800 images per category.
+to 800 images per category.[<sup>1</sup>](../datasets.md)
 
 ```python
 import turicreate as tc

userguide/recommender/README.md

+1 -1

@@ -9,7 +9,7 @@ interaction data and use that model to make recommendations.
 Creating a recommender model typically requires a data set to use for
 training the model, with columns that contain the user IDs, the item
 IDs, and (optionally) the ratings. For this example, we use the [MovieLens
-dataset](https://grouplens.org/datasets/movielens/).
+20M dataset](https://grouplens.org/datasets/movielens/20m/).[<sup>1</sup>](../datasets.md)
 
 ```python
 import turicreate as tc
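The recommender hunk says training data needs user IDs, item IDs, and optionally ratings. As a minimal sketch, assuming a ratings file with the header `userId,movieId,rating,timestamp` (the layout of MovieLens 20M `ratings.csv`), the snippet below extracts those columns with the standard library. The rows are made-up illustrative values, and `read_interactions` is a hypothetical helper, not part of Turi Create.

```python
import csv
import io

# Made-up sample rows in an assumed MovieLens-style ratings layout.
SAMPLE = """userId,movieId,rating,timestamp
1,10,4.0,1000000000
1,20,2.5,1000000100
2,10,5.0,1000000200
"""


def read_interactions(text):
    """Return (user_id, item_id, rating) triples; rating is None if absent."""
    reader = csv.DictReader(io.StringIO(text))
    triples = []
    for row in reader:
        # The rating column is optional for recommender training data.
        raw = row.get("rating")
        rating = float(raw) if raw else None
        triples.append((row["userId"], row["movieId"], rating))
    return triples


triples = read_interactions(SAMPLE)
print(len(triples), triples[0])  # 3 ('1', '10', 4.0)
```

In the guide's Turi Create workflow, the same columns would be named through the `user_id`, `item_id`, and optional `target` arguments of `tc.recommender.create`.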

userguide/sframe/sframe-intro.md

+1 -1

@@ -13,7 +13,7 @@ A very common data format is the comma separated value (csv) file, which
 is what we'll use for these examples. We will use some preprocessed data from
 the
 [Million Song Dataset](https://labrosa.ee.columbia.edu/millionsong/) to
-aid our SFrame-related examples. The first table contains metadata
+aid our SFrame-related examples.[<sup>1</sup>](../datasets.md) The first table contains metadata
 about each song in the database. Here's how we load it into an SFrame:
 
 ```python

userguide/supervised-learning/boosted_trees_classifier.md

+1 -1

@@ -10,7 +10,7 @@ decision trees.
 
 ##### Introductory Example
 
-In this example, we will use the [Mushrooms dataset](https://archive.ics.uci.edu/ml/datasets/mushroom).
+In this example, we will use the [Mushrooms dataset](https://archive.ics.uci.edu/ml/datasets/mushroom).[<sup>1</sup>](../datasets.md)
 ```python
 import turicreate as tc
 
userguide/supervised-learning/boosted_trees_regression.md

+1 -1

@@ -51,7 +51,7 @@ The algorithm simply fit a new decision tree to the residual at each iteration.
 
 ##### Introductory Example
 
-In this example, we will use the [Mushrooms dataset](https://archive.ics.uci.edu/ml/datasets/mushroom).
+In this example, we will use the [Mushrooms dataset](https://archive.ics.uci.edu/ml/datasets/mushroom).[<sup>1</sup>](../datasets.md)
 
 ```python
 import turicreate as tc

userguide/supervised-learning/decision_tree_classifier.md

+1 -1

@@ -8,7 +8,7 @@ on decision trees.
 
 ##### Introductory Example
 
-In this example, we will use the [Mushrooms dataset](https://archive.ics.uci.edu/ml/datasets/mushroom).
+In this example, we will use the [Mushrooms dataset](https://archive.ics.uci.edu/ml/datasets/mushroom).[<sup>1</sup>](../datasets.md)
 ```python
 import turicreate as tc
 
userguide/supervised-learning/decision_tree_regression.md

+1 -1

@@ -11,7 +11,7 @@ for more details).
 
 ##### Introductory Example
 
-In this example, we will use the [Mushrooms dataset](https://archive.ics.uci.edu/ml/datasets/mushroom).
+In this example, we will use the [Mushrooms dataset](https://archive.ics.uci.edu/ml/datasets/mushroom).[<sup>1</sup>](../datasets.md)
 
 ```python
 import turicreate as tc

userguide/supervised-learning/random_forest_classifier.md

+1 -1

@@ -8,7 +8,7 @@ forests.
 
 ##### Introductory Example
 
-In this example, we will use the [Mushrooms dataset](https://archive.ics.uci.edu/ml/datasets/mushroom).
+In this example, we will use the [Mushrooms dataset](https://archive.ics.uci.edu/ml/datasets/mushroom).[<sup>1</sup>](../datasets.md)
 ```python
 import turicreate as tc
 
userguide/supervised-learning/random_forest_regression.md

+1 -1

@@ -24,7 +24,7 @@ forests, all the base models are constructed independently using a
 
 ##### Introductory Example
 
-In this example, we will use the [Mushrooms dataset](https://archive.ics.uci.edu/ml/datasets/mushroom).
+In this example, we will use the [Mushrooms dataset](https://archive.ics.uci.edu/ml/datasets/mushroom).[<sup>1</sup>](../datasets.md)
 
 ```python
 import turicreate as tc
