Skip to content

Commit

Permalink
docs: add docs for hub datasets (ultralytics#3212)
Browse files Browse the repository at this point in the history
Co-authored-by: Glenn Jocher <[email protected]>
  • Loading branch information
sergiuwaxmann and glenn-jocher authored Jun 17, 2023
1 parent a0ba8ef commit 62916b3
Show file tree
Hide file tree
Showing 2 changed files with 145 additions and 33 deletions.
164 changes: 136 additions & 28 deletions docs/hub/datasets.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,46 +6,154 @@ keywords: Ultralytics, HUB, Datasets, Upload, Visualize, Train, Custom Data, YAM

# HUB Datasets

## 1. Upload a Dataset
Ultralytics HUB datasets are a practical solution for managing and leveraging your custom datasets.

Ultralytics HUB datasets are just like YOLOv5 and YOLOv8 🚀 datasets, they use the same structure and the same label formats to keep
Once uploaded, datasets can be immediately utilized for model training. This integrated approach facilitates a seamless transition from dataset management to model training, significantly simplifying the entire process.

## Upload Dataset

Ultralytics HUB datasets are just like YOLOv5 and YOLOv8 🚀 datasets. They use the same structure and the same label formats to keep
everything simple.

When you upload a dataset to Ultralytics HUB, make sure to **place your dataset YAML inside the dataset root directory**
as in the example shown below, and then zip for upload to [https://hub.ultralytics.com](https://hub.ultralytics.com/). Your **dataset YAML, directory
and zip** should all share the same name. For example, if your dataset is called 'coco8' as in our
example [ultralytics/hub/example_datasets/coco8.zip](https://github.com/ultralytics/hub/blob/master/example_datasets/coco8.zip), then you should have a `coco8.yaml` inside your `coco8/` directory, which should zip to create `coco8.zip` for upload:
Before you upload a dataset to Ultralytics HUB, make sure to **place your dataset YAML file inside the dataset root directory** and that **your dataset YAML, directory and ZIP have the same name**, as shown in the example below, and then zip the dataset directory.

For example, if your dataset is called "coco8", as our [COCO8](https://docs.ultralytics.com/datasets/detect/coco8) example dataset, then you should have a `coco8.yaml` inside your `coco8/` directory, which will create a `coco8.zip` when zipped:

```bash
zip -r coco8.zip coco8
```

The [example_datasets/coco8.zip](https://github.com/ultralytics/hub/blob/master/example_datasets/coco8.zip) dataset in this repository can be downloaded and unzipped to see exactly how to structure your custom dataset.
You can download our [COCO8](https://github.com/ultralytics/hub/blob/master/example_datasets/coco8.zip) example dataset and unzip it to see exactly how to structure your dataset.

<p align="center">
<img width="80%" src="https://user-images.githubusercontent.com/26833433/201424843-20fa081b-ad4b-4d6c-a095-e810775908d8.png" title="COCO8" />
<img src="https://raw.githubusercontent.com/ultralytics/assets/main/docs/hub/datasets/hub_upload_dataset_1.jpg" alt="COCO8 Dataset Structure" width="80%" />
</p>

The dataset YAML is the same standard YOLOv5 and YOLOv8 YAML format. See
the [YOLOv5 and YOLOv8 Train Custom Data tutorial](https://docs.ultralytics.com/yolov5/tutorials/train_custom_data/) for full details.

```yaml
# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
path: # dataset root dir (leave empty for HUB)
train: images/train # train images (relative to 'path') 8 images
val: images/val # val images (relative to 'path') 8 images
test: # test images (optional)

# Classes
names:
0: person
1: bicycle
2: car
3: motorcycle
...
The dataset YAML is the same standard YOLOv5 and YOLOv8 YAML format.

!!! example "coco8.yaml"

```yaml
--8<-- "ultralytics/datasets/coco8.yaml"
```

After zipping your dataset, you should validate it before uploading it to Ultralytics HUB. Ultralytics HUB conducts the dataset validation check post-upload, so by ensuring your dataset is correctly formatted and error-free ahead of time, you can forestall any setbacks due to dataset rejection.

```py
from ultralytics.hub import check_dataset
check_dataset('path/to/coco8.zip')
```

After zipping your dataset, sign in to [Ultralytics HUB](https://bit.ly/ultralytics_hub) and click the Datasets tab.
Click 'Upload Dataset' to upload, scan and visualize your new dataset before training new YOLOv5 or YOLOv8 models on it!
Once your dataset ZIP is ready, navigate to the [Datasets](https://hub.ultralytics.com/datasets) page by clicking on the **Datasets** button in the sidebar.

![Ultralytics HUB screenshot of the Home page with an arrow pointing to the Datasets button in the sidebar](https://raw.githubusercontent.com/ultralytics/assets/main/docs/hub/datasets/hub_upload_dataset_2.jpg)

??? tip "Tip"

You can also upload a dataset directly from the [Home](https://hub.ultralytics.com/home) page.

![Ultralytics HUB screenshot of the Home page with an arrow pointing to the Upload Dataset card](https://raw.githubusercontent.com/ultralytics/assets/main/docs/hub/datasets/hub_upload_dataset_3.jpg)

Click on the **Upload Dataset** button on the top right of the page. This action will trigger the **Upload Dataset** dialog.

![Ultralytics HUB screenshot of the Dataset page with an arrow pointing to the Upload Dataset button](https://raw.githubusercontent.com/ultralytics/assets/main/docs/hub/datasets/hub_upload_dataset_4.jpg)

Upload your dataset in the _Dataset .zip file_ field.

You have the additional option to set a custom name and description for your Ultralytics HUB dataset.

When you're happy with your dataset configuration, click **Upload**.

![Ultralytics HUB screenshot of the Upload Dataset dialog with an arrow pointing to the Upload button](https://raw.githubusercontent.com/ultralytics/assets/main/docs/hub/datasets/hub_upload_dataset_5.jpg)

After your dataset is uploaded and processed, you will be able to access it from the Datasets page.

![Ultralytics HUB screenshot of the Datasets page with an arrow pointing to one of the datasets](https://raw.githubusercontent.com/ultralytics/assets/main/docs/hub/datasets/hub_upload_dataset_6.jpg)

You can view the images in your dataset grouped by splits (Train, Validation, Test).

![Ultralytics HUB screenshot of the Dataset page with an arrow pointing to the Images tab](https://raw.githubusercontent.com/ultralytics/assets/main/docs/hub/datasets/hub_upload_dataset_7.jpg)

??? tip "Tip"

Each image can be enlarged for better visualization.

![Ultralytics HUB screenshot of the Images tab inside the Dataset page with an arrow pointing to the expand icon](https://raw.githubusercontent.com/ultralytics/assets/main/docs/hub/datasets/hub_upload_dataset_8.jpg)

![Ultralytics HUB screenshot of the Images tab inside the Dataset page with one of the images expanded](https://raw.githubusercontent.com/ultralytics/assets/main/docs/hub/datasets/hub_upload_dataset_9.jpg)

Also, you can analyze your dataset by click on the **Overview** tab.

![Ultralytics HUB screenshot of the Dataset page with an arrow pointing to the Overview tab](https://raw.githubusercontent.com/ultralytics/assets/main/docs/hub/datasets/hub_upload_dataset_10.jpg)

Next, [train a model](./models.md) on your dataset.

![Ultralytics HUB screenshot of the Dataset page with an arrow pointing to the Train Model button](https://raw.githubusercontent.com/ultralytics/assets/main/docs/hub/datasets/hub_upload_dataset_11.jpg)

## Share Dataset

!!! info "Info"

Ultralytics HUB's sharing functionality provides a convenient way to share datasets with others. This feature is designed to accommodate both existing Ultralytics HUB users and those who have yet to create an account.

??? note "Note"

You have control over the general access of your datasets.

You can choose to set the general access to "Private", in which case, only you will have access to it. Alternatively, you can set the general access to "Unlisted" which grants viewing access to anyone who has the direct link to the dataset, regardless of whether they have an Ultralytics HUB account or not.

Navigate to the Dataset page of the dataset you want to share, open the dataset actions dropdown and click on the **Share** option. This action will trigger the **Share Dataset** dialog.

![Ultralytics HUB screenshot of the Dataset page with an arrow pointing to the Share option](https://raw.githubusercontent.com/ultralytics/assets/main/docs/hub/datasets/hub_share_dataset_1.jpg)

??? tip "Tip"

You can also share a dataset directly from the [Datasets](https://hub.ultralytics.com/datasets) page.

![Ultralytics HUB screenshot of the Datasets page with an arrow pointing to the Share option of one of the datasets](https://raw.githubusercontent.com/ultralytics/assets/main/docs/hub/datasets/hub_share_dataset_2.jpg)

Set the general access to "Unlisted" and click **Save**.

![Ultralytics HUB screenshot of the Share Dataset dialog with an arrow pointing to the dropdown and one to the Save button](https://raw.githubusercontent.com/ultralytics/assets/main/docs/hub/datasets/hub_share_dataset_3.jpg)

Now, anyone who has the direct link to your dataset can view it.

??? tip "Tip"

You can easily click on the dataset's link shown in the **Share Dataset** dialog to copy it.

![Ultralytics HUB screenshot of the Share Dataset dialog with an arrow pointing to the dataset's link](https://raw.githubusercontent.com/ultralytics/assets/main/docs/hub/datasets/hub_share_dataset_4.jpg)

## Edit Dataset

Navigate to the Dataset page of the dataset you want to edit, open the dataset actions dropdown and click on the **Edit** option. This action will trigger the **Update Dataset** dialog.

![Ultralytics HUB screenshot of the Dataset page with an arrow pointing to the Edit option](https://raw.githubusercontent.com/ultralytics/assets/main/docs/hub/datasets/hub_edit_dataset_1.jpg)

??? tip "Tip"

You can also edit a dataset directly from the [Datasets](https://hub.ultralytics.com/datasets) page.

![Ultralytics HUB screenshot of the Datasets page with an arrow pointing to the Edit option of one of the datasets](https://raw.githubusercontent.com/ultralytics/assets/main/docs/hub/datasets/hub_edit_dataset_2.jpg)

Apply the desired modifications to your dataset and then confirm the changes by clicking **Save**.

![Ultralytics HUB screenshot of the Update Dataset dialog with an arrow pointing to the Save button](https://raw.githubusercontent.com/ultralytics/assets/main/docs/hub/datasets/hub_edit_dataset_3.jpg)

## Delete Dataset

Navigate to the Dataset page of the dataset you want to delete, open the dataset actions dropdown and click on the **Delete** option. This action will delete the dataset.

![Ultralytics HUB screenshot of the Dataset page with an arrow pointing to the Delete option](https://raw.githubusercontent.com/ultralytics/assets/main/docs/hub/datasets/hub_delete_dataset_1.jpg)

??? tip "Tip"

You can also delete a dataset directly from the [Datasets](https://hub.ultralytics.com/datasets) page.

![Ultralytics HUB screenshot of the Datasets page with an arrow pointing to the Delete option of one of the datasets](https://raw.githubusercontent.com/ultralytics/assets/main/docs/hub/datasets/hub_delete_dataset_2.jpg)

??? note "Note"

If you change your mind, you can restore the dataset from the [Trash](https://hub.ultralytics.com/trash) page.

<img width="100%" alt="HUB Dataset Upload" src="https://user-images.githubusercontent.com/26833433/216763338-9a8812c8-a4e5-4362-8102-40dad7818396.png">
![Ultralytics HUB screenshot of the Trash page with an arrow pointing to the Restore option of one of the datasets](https://raw.githubusercontent.com/ultralytics/assets/main/docs/hub/datasets/hub_delete_dataset_3.jpg)
14 changes: 9 additions & 5 deletions ultralytics/hub/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -93,17 +93,21 @@ def get_export(model_id='', format='torchscript'):

def check_dataset(path='', task='detect'):
"""
Function for error-checking HUB dataset Zip file before upload
Function for error-checking HUB dataset Zip file before upload. It checks a dataset for errors before it is
uploaded to the HUB. Usage examples are given below.
Arguments
path: Path to data.zip (with data.yaml inside data.zip)
task: Dataset task. Options are 'detect', 'segment', 'pose', 'classify'.
Args:
path (str, optional): Path to data.zip (with data.yaml inside data.zip). Defaults to ''.
task (str, optional): Dataset task. Options are 'detect', 'segment', 'pose', 'classify'. Defaults to 'detect'.
Usage
Example:
```python
from ultralytics.hub import check_dataset
check_dataset('path/to/coco8.zip', task='detect') # detect dataset
check_dataset('path/to/coco8-seg.zip', task='segment') # segment dataset
check_dataset('path/to/coco8-pose.zip', task='pose') # pose dataset
```
"""
HUBDatasetStats(path=path, task=task).get_json()
LOGGER.info('Checks completed correctly ✅. Upload this dataset to https://hub.ultralytics.com/datasets/.')
Expand Down

0 comments on commit 62916b3

Please sign in to comment.