ultralytics 8.0.97 confusion matrix, windows, docs updates (ultralytics#2511)

Co-authored-by: Yonghye Kwon <[email protected]>
Co-authored-by: Dowon <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Laughing <[email protected]>
5 people authored May 9, 2023
1 parent 6ee3a9a commit d1107ca
Showing 138 changed files with 747 additions and 354 deletions.
2 changes: 2 additions & 0 deletions .github/workflows/publish.yml
@@ -23,6 +23,8 @@ jobs:
    steps:
      - name: Checkout code
        uses: actions/checkout@v3
        with:
          fetch-depth: "0"  # pulls all commits (needed for correct last-updated dates in Docs)
      - name: Set up Python environment
        uses: actions/setup-python@v4
        with:
6 changes: 5 additions & 1 deletion docs/README.md
@@ -1,3 +1,7 @@
---
description: Learn how to install the Ultralytics package in developer mode and build/serve locally using MkDocs. Deploy your project to your host easily.
---

# Ultralytics Docs

Ultralytics Docs are deployed to [https://docs.ultralytics.com](https://docs.ultralytics.com).
@@ -82,4 +86,4 @@ for your repository and updating the "Custom domain" field in the "GitHub Pages"
![196814117-fc16e711-d2be-4722-9536-b7c6d78fd167](https://user-images.githubusercontent.com/26833433/210150206-9e86dcd7-10af-43e4-9eb2-9518b3799eac.png)

For more information on deploying your MkDocs documentation site, see
the [MkDocs documentation](https://www.mkdocs.org/user-guide/deploying-your-docs/).
6 changes: 5 additions & 1 deletion docs/SECURITY.md
@@ -1,3 +1,7 @@
---
description: Learn how Ultralytics prioritizes security. Get insights into Snyk and GitHub CodeQL scans, and how to report security issues in YOLOv8.
---

# Security Policy

At [Ultralytics](https://ultralytics.com), the security of our users' data and systems is of utmost importance. To
@@ -25,4 +29,4 @@ reach out to us directly via our [contact form](https://ultralytics.com/contact)
via [[email protected]](mailto:[email protected]). Our security team will investigate and respond as soon
as possible.

We appreciate your help in keeping the YOLOv8 repository secure and safe for everyone.
3 changes: 3 additions & 0 deletions docs/datasets/classify/index.md
@@ -1,5 +1,6 @@
---
comments: true
description: Learn how torchvision organizes classification image datasets. Use this code to create and train models. CLI and Python code shown.
---

# Image Classification Datasets Overview
@@ -77,6 +78,7 @@ cifar-10-/
In this example, the `train` directory contains subdirectories for each class in the dataset, and each class subdirectory contains all the images for that class. The `test` directory has a similar structure. The `root` directory also contains other files that are part of the CIFAR10 dataset.
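
As a quick aside, this folder-per-class layout is exactly what torchvision's `ImageFolder` expects — a minimal sketch, assuming torchvision is installed and the path (hypothetical here) follows the structure above:

```python
from torchvision import datasets, transforms

# ImageFolder infers one class per subdirectory of the root folder
dataset = datasets.ImageFolder(
    root='cifar-10-/train',  # hypothetical path matching the layout above
    transform=transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()]),
)
print(dataset.classes)  # class names discovered from the subdirectory names
```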

## Usage

!!! example ""

=== "Python"
@@ -98,4 +100,5 @@ In this example, the `train` directory contains subdirectories for each class in
```

## Supported Datasets

TODO
1 change: 1 addition & 0 deletions docs/datasets/detect/coco.md
@@ -1,5 +1,6 @@
---
comments: true
description: Learn about the COCO dataset, designed to encourage research on object detection, segmentation, and captioning with standardized evaluation metrics.
---

# COCO Dataset
15 changes: 10 additions & 5 deletions docs/datasets/detect/index.md
@@ -1,5 +1,6 @@
---
comments: true
description: Learn about supported dataset formats for training YOLO detection models, including Ultralytics YOLO and COCO, in this Object Detection Datasets Overview.
---

# Object Detection Datasets Overview
@@ -15,11 +16,12 @@ The dataset format used for training YOLO detection models is as follows:
1. One text file per image: Each image in the dataset has a corresponding text file with the same name as the image file and the ".txt" extension.
2. One row per object: Each row in the text file corresponds to one object instance in the image.
3. Object information per row: Each row contains the following information about the object instance:
    - Object class index: An integer representing the class of the object (e.g., 0 for person, 1 for car, etc.).
    - Object center coordinates: The x and y coordinates of the center of the object, normalized to be between 0 and 1.
    - Object width and height: The width and height of the object, normalized to be between 0 and 1.

The format for a single row in the detection dataset file is as follows:

```
<object-class> <x> <y> <width> <height>
```
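
For instance, a hypothetical label file for an image containing one person and one car might look like this (values are illustrative only):

```
0 0.481 0.634 0.690 0.713
1 0.127 0.856 0.118 0.154
```
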
@@ -55,6 +57,7 @@ The `names` field is a list of the names of the object classes. The order of the
NOTE: Either `nc` or `names` must be defined. Defining both is not mandatory.

Alternatively, you can directly define class names like this:

```yaml
names:
  0: person
@@ -72,6 +75,7 @@ names: ['person', 'car']
```
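
Putting this together, a minimal complete dataset YAML might look like the sketch below (the paths and dataset name are illustrative, assuming the standard Ultralytics layout of a dataset root containing image folders):

```yaml
path: ../datasets/my-dataset  # hypothetical dataset root
train: images/train  # training images, relative to path
val: images/val  # validation images, relative to path
names:
  0: person
  1: car
```
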
## Usage

!!! example ""

    === "Python"
@@ -93,6 +97,7 @@ names: ['person', 'car']
```

## Supported Datasets

TODO

## Port or Convert label formats
@@ -103,4 +108,4 @@ TODO
```python
from ultralytics.yolo.data.converter import convert_coco

convert_coco(labels_dir='../coco/annotations/')
```
49 changes: 25 additions & 24 deletions docs/datasets/index.md
@@ -1,5 +1,6 @@
---
comments: true
description: Ultralytics provides support for various datasets to facilitate multiple computer vision tasks. Check out our list of main datasets and their summaries.
---

# Datasets Overview
@@ -10,48 +11,48 @@ Ultralytics provides support for various datasets to facilitate computer vision

Bounding box object detection is a computer vision technique that involves detecting and localizing objects in an image by drawing a bounding box around each object.

* [Argoverse](detect/argoverse.md): A dataset containing 3D tracking and motion forecasting data from urban environments with rich annotations.
* [COCO](detect/coco.md): A large-scale dataset designed for object detection, segmentation, and captioning with over 200K labeled images.
* [COCO8](detect/coco8.md): Contains the first 4 images from COCO train and COCO val, suitable for quick tests.
* [Global Wheat 2020](detect/globalwheat2020.md): A dataset of wheat head images collected from around the world for object detection and localization tasks.
* [Objects365](detect/objects365.md): A high-quality, large-scale dataset for object detection with 365 object categories and over 600K annotated images.
* [SKU-110K](detect/sku-110k.md): A dataset featuring dense object detection in retail environments with over 11K images and 1.7 million bounding boxes.
* [VisDrone](detect/visdrone.md): A dataset containing object detection and multi-object tracking data from drone-captured imagery with over 10K images and video sequences.
* [VOC](detect/voc.md): The Pascal Visual Object Classes (VOC) dataset for object detection and segmentation with 20 object classes and over 11K images.
* [xView](detect/xview.md): A dataset for object detection in overhead imagery with 60 object categories and over 1 million annotated objects.

## [Instance Segmentation Datasets](segment/index.md)

Instance segmentation is a computer vision technique that involves identifying and localizing objects in an image at the pixel level.

* [COCO](segment/coco.md): A large-scale dataset designed for object detection, segmentation, and captioning tasks with over 200K labeled images.
* [COCO8-seg](segment/coco8-seg.md): A smaller dataset for instance segmentation tasks, containing a subset of 8 COCO images with segmentation annotations.

## [Pose Estimation](pose/index.md)

Pose estimation is a technique used to determine the pose of the object relative to the camera or the world coordinate system.

* [COCO](pose/coco.md): A large-scale dataset with human pose annotations designed for pose estimation tasks.
* [COCO8-pose](pose/coco8-pose.md): A smaller dataset for pose estimation tasks, containing a subset of 8 COCO images with human pose annotations.

## [Classification](classify/index.md)

Image classification is a computer vision task that involves categorizing an image into one or more predefined classes or categories based on its visual content.

* [Caltech 101](classify/caltech101.md): A dataset containing images of 101 object categories for image classification tasks.
* [Caltech 256](classify/caltech256.md): An extended version of Caltech 101 with 256 object categories and more challenging images.
* [CIFAR-10](classify/cifar10.md): A dataset of 60K 32x32 color images in 10 classes, with 6K images per class.
* [CIFAR-100](classify/cifar100.md): An extended version of CIFAR-10 with 100 object categories and 600 images per class.
* [Fashion-MNIST](classify/fashion-mnist.md): A dataset consisting of 70,000 grayscale images of 10 fashion categories for image classification tasks.
* [ImageNet](classify/imagenet.md): A large-scale dataset for object detection and image classification with over 14 million images and 20,000 categories.
* [ImageNet-10](classify/imagenet10.md): A smaller subset of ImageNet with 10 categories for faster experimentation and testing.
* [Imagenette](classify/imagenette.md): A smaller subset of ImageNet that contains 10 easily distinguishable classes for quicker training and testing.
* [Imagewoof](classify/imagewoof.md): A more challenging subset of ImageNet containing 10 dog breed categories for image classification tasks.
* [MNIST](classify/mnist.md): A dataset of 70,000 grayscale images of handwritten digits for image classification tasks.

## [Multi-Object Tracking](track/index.md)

Multi-object tracking is a computer vision technique that involves detecting and tracking multiple objects over time in a video sequence.

* [Argoverse](detect/argoverse.md): A dataset containing 3D tracking and motion forecasting data from urban environments with rich annotations for multi-object tracking tasks.
* [VisDrone](detect/visdrone.md): A dataset containing object detection and multi-object tracking data from drone-captured imagery with over 10K images and video sequences.
22 changes: 13 additions & 9 deletions docs/datasets/pose/index.md
@@ -1,5 +1,6 @@
---
comments: true
description: Learn how to format your dataset for training YOLO models with Ultralytics YOLO format using our concise tutorial and example YAML files.
---

# Pose Estimation Datasets Overview
@@ -15,26 +16,26 @@ The dataset format used for training YOLO pose models is as follows:
1. One text file per image: Each image in the dataset has a corresponding text file with the same name as the image file and the ".txt" extension.
2. One row per object: Each row in the text file corresponds to one object instance in the image.
3. Object information per row: Each row contains the following information about the object instance:
    - Object class index: An integer representing the class of the object (e.g., 0 for person, 1 for car, etc.).
    - Object center coordinates: The x and y coordinates of the center of the object, normalized to be between 0 and 1.
    - Object width and height: The width and height of the object, normalized to be between 0 and 1.
    - Object keypoint coordinates: The keypoints of the object, normalized to be between 0 and 1.

Here is an example of the label format for a pose estimation task:

Format with Dim = 2

```
<class-index> <x> <y> <width> <height> <px1> <py1> <px2> <py2> ... <pxn> <pyn>
```

Format with Dim = 3

```
<class-index> <x> <y> <width> <height> <px1> <py1> <p1-visibility> <px2> <py2> <p2-visibility> ... <pxn> <pyn> <pn-visibility>
```

In this format, `<class-index>` is the index of the class for the object, `<x> <y> <width> <height>` are the coordinates of the bounding box, and `<px1> <py1> <px2> <py2> ... <pxn> <pyn>` are the coordinates of the keypoints, normalized to be between 0 and 1. The coordinates are separated by spaces.
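
For instance, a hypothetical Dim = 3 row for one object with two keypoints might look like this (values are illustrative; the visibility flag here follows the COCO convention, where 2 means labeled and visible):

```
0 0.573 0.471 0.263 0.441 0.540 0.322 2 0.607 0.319 2
```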

**Dataset file format**

@@ -62,14 +63,15 @@ The `names` field is a list of the names of the object classes. The order of the
NOTE: Either `nc` or `names` must be defined. Defining both is not mandatory.

Alternatively, you can directly define class names like this:

```yaml
names:
  0: person
  1: bicycle
```

(Optional) If the keypoints are symmetric, a `flip_idx` field is needed, e.g. for the left and right sides of a human body or face.
For example, suppose there are five keypoints of a facial landmark: [left eye, right eye, nose, left point of mouth, right point of mouth], and the original index is [0, 1, 2, 3, 4]; then `flip_idx` is [1, 0, 2, 4, 3] (just exchange the left-right indices, i.e. 0-1 and 3-4, and do not modify the others, like the nose in this example).

**Example**

@@ -86,6 +88,7 @@ flip_idx: [0, 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13, 16, 15]
```
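
To make the role of `flip_idx` concrete, here is a minimal sketch (not the Ultralytics implementation) of how left-right keypoints can be remapped when an image is flipped horizontally:

```python
import numpy as np

def hflip_keypoints(kpts: np.ndarray, flip_idx: list) -> np.ndarray:
    """kpts: (num_kpts, 2) array of normalized (x, y) keypoint coordinates."""
    flipped = kpts.copy()
    flipped[:, 0] = 1.0 - flipped[:, 0]  # mirror x around the image center
    return flipped[flip_idx]             # swap left/right keypoint slots

# Five hypothetical facial keypoints from the example above
kpts = np.array([[0.30, 0.40],   # left eye
                 [0.70, 0.40],   # right eye
                 [0.50, 0.55],   # nose
                 [0.40, 0.70],   # left mouth corner
                 [0.60, 0.70]])  # right mouth corner
print(hflip_keypoints(kpts, flip_idx=[1, 0, 2, 4, 3]))
```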

## Usage

!!! example ""

=== "Python"
@@ -107,6 +110,7 @@ flip_idx: [0, 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13, 16, 15]
```

## Supported Datasets

TODO

## Port or Convert label formats
@@ -117,4 +121,4 @@ TODO
```python
from ultralytics.yolo.data.converter import convert_coco

convert_coco(labels_dir='../coco/annotations/', use_keypoints=True)
```
12 changes: 8 additions & 4 deletions docs/datasets/segment/index.md
@@ -1,5 +1,6 @@
---
comments: true
description: Learn about the Ultralytics YOLO dataset format for segmentation models. Use YAML to train Detection Models. Convert COCO to YOLO format using Python.
---

# Instance Segmentation Datasets Overview
@@ -15,23 +16,24 @@ The dataset format used for training YOLO segmentation models is as follows:
1. One text file per image: Each image in the dataset has a corresponding text file with the same name as the image file and the ".txt" extension.
2. One row per object: Each row in the text file corresponds to one object instance in the image.
3. Object information per row: Each row contains the following information about the object instance:
    - Object class index: An integer representing the class of the object (e.g., 0 for person, 1 for car, etc.).
    - Object bounding coordinates: The bounding coordinates around the mask area, normalized to be between 0 and 1.

The format for a single row in the segmentation dataset file is as follows:

```
<class-index> <x1> <y1> <x2> <y2> ... <xn> <yn>
```

In this format, `<class-index>` is the index of the class for the object, and `<x1> <y1> <x2> <y2> ... <xn> <yn>` are the bounding coordinates of the object's segmentation mask. The coordinates are separated by spaces.

Here is an example of the YOLO dataset format for a single image with two object instances:

```
0 0.6812 0.48541 0.67 0.4875 0.67656 0.487 0.675 0.489 0.66
1 0.5046 0.0 0.5015 0.004 0.4984 0.00416 0.4937 0.010 0.492 0.0104
```

Note: Rows do not all have to be the same length; each object contributes as many coordinate pairs as its polygon has points.
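
Because rows can have different lengths, a label file is easiest to read token by token. A minimal parsing sketch (assuming the normalized polygon format above; the function name is hypothetical, not part of the Ultralytics API):

```python
def parse_segment_labels(path):
    """Parse a YOLO segmentation label file into (class_index, polygon) pairs."""
    objects = []
    with open(path) as f:
        for line in f:
            tokens = line.split()
            if not tokens:
                continue
            cls = int(tokens[0])
            coords = [float(t) for t in tokens[1:]]
            # remaining tokens are x1 y1 x2 y2 ... xn yn, normalized to [0, 1]
            polygon = list(zip(coords[0::2], coords[1::2]))
            objects.append((cls, polygon))
    return objects
```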

**Dataset file format**
Expand All @@ -56,6 +58,7 @@ The `names` field is a list of the names of the object classes. The order of the
NOTE: Either `nc` or `names` must be defined. Defining both is not mandatory.

Alternatively, you can directly define class names like this:

```yaml
names:
  0: person
@@ -73,6 +76,7 @@ names: ['person', 'car']
```

## Usage

!!! example ""

=== "Python"
@@ -103,4 +107,4 @@ names: ['person', 'car']
```python
from ultralytics.yolo.data.converter import convert_coco

convert_coco(labels_dir='../coco/annotations/', use_segments=True)
```