This repository collects the most relevant studies applying Deep Learning to polyp detection and classification in colonoscopy from a technical point of view, focusing on the low-level details needed to implement the DL models. First, each study is assigned to one of three categories: (i) polyp detection and localization, (ii) polyp classification, and (iii) simultaneous polyp detection and classification. Second, the public datasets available and the private datasets used in the studies are summarized. The third section covers technical aspects such as the Deep Learning architectures, the data augmentation techniques, and the libraries and frameworks used. Finally, the fourth section summarizes the performance metrics reported by each study.
Suggestions are welcome; please check the contribution guidelines before submitting a pull request.
Table of Contents:
- Research
- Datasets
- Deep Learning Models and Architectures
- Performance
- List of Acronyms and Abbreviations
- References and Further Reading
Study | Date | Endoscopy type | Imaging technology | Localization type | Multiple polyp | Real time |
---|---|---|---|---|---|---|
Tajbakhsh et al. 2014, Tajbakhsh et al. 2015 | Sept. 2014 / Apr. 2015 | Conventional | N/A | Bounding box | No | Yes |
Zhu R. et al. 2015 | Oct. 2015 | Conventional | N/A | Bounding box (16x16 patches) | Yes | No |
Park and Sargent 2016 | March 2016 | Conventional | NBI, WL | Bounding box | No | No |
Yu et al. 2017 | Jan. 2017 | Conventional | NBI, WL | Bounding box | No | No |
Zhang R. et al. 2017 | Jan. 2017 | Conventional | NBI, WL | No | No | No |
Yuan and Meng 2017 | Feb. 2017 | WCE | N/A | No | No | No |
Brandao et al. 2018 | Feb. 2018 | Conventional/WCE | N/A | Binary mask | Yes | No |
Zhang R. et al. 2018 | May 2018 | Conventional | WL | Bounding box | No | No |
Misawa et al. 2018 | June 2018 | Conventional | WL | No | Yes | No |
Zheng Y. et al. 2018 | July 2018 | Conventional | NBI, WL | Bounding box | Yes | Yes |
Shin Y. et al. 2018 | July 2018 | Conventional | WL | Bounding box | Yes | No |
Urban et al. 2018 | Sep. 2018 | Conventional | NBI, WL | Bounding box | No | Yes |
Mohammed et al. 2018, GitHub | Sep. 2018 | Conventional | WL | Binary mask | Yes | Yes |
Wang et al. 2018, Wang et al. 2018 | Oct. 2018 | Conventional | N/A | Binary mask | Yes | Yes |
Qadir et al. 2019 | Apr. 2019 | Conventional | NBI, WL | Bounding box | Yes | No |
Blanes-Vidal et al. 2019 | March 2019 | WCE | N/A | Bounding box | Yes | No |
Zhang X. et al. 2019 | March 2019 | Conventional | N/A | Bounding box | Yes | Yes |
Misawa et al. 2019 | June 2019 | Conventional | N/A | No | Yes | No |
Zhu X. et al. 2019 | June 2019 | Conventional | N/A | No | No | Yes |
Ahmad et al. 2019 | June 2019 | Conventional | WL | Bounding box | Yes | Yes |
Sornapudi et al. 2019 | June 2019 | Conventional/WCE | N/A | Binary mask | Yes | No |
Wittenberg et al. 2019 | Sept. 2019 | Conventional | WL | Binary mask | Yes | No |
Ma Y. et al. 2019 | Oct. 2019 | Conventional | N/A | Bounding box | Yes | No |
Study | Date | Endoscopy type | Imaging technology | Classes | Real time |
---|---|---|---|---|---|
Ribeiro et al. 2016 | Oct. 2016 | Conventional | WL | Neoplastic vs. Non-neoplastic | No |
Zhang R. et al. 2017 | Jan. 2017 | Conventional | NBI, WL | Adenoma vs. hyperplastic; resectable vs. non-resectable; adenoma vs. hyperplastic vs. serrated | No |
Byrne et al. 2017 | Oct. 2017 | Conventional | NBI | Adenoma vs. hyperplastic | Yes |
Komeda et al. 2017 | Dec. 2017 | Conventional | NBI, WL, Chromoendoscopy | Adenoma vs. non-adenoma | No |
Chen et al. 2018 | Feb. 2018 | Conventional | NBI | Neoplastic vs. hyperplastic | No |
Lui et al. 2019 | Apr. 2019 | Conventional | NBI, WL | Endoscopically curable lesions vs. endoscopically incurable lesions | No |
Kandel et al. 2019 | June 2019 | Conventional | N/A | Adenoma vs. hyperplastic vs. traditional serrated adenoma | No |
Cheng Tao Pu et al. 2020 | Feb. 2020 | Conventional | NBI, BLI | Modified Sano's (MS) classification: MS I - Hyperplastic, MS II - Low-grade tubular adenomas, MS IIo - Nondysplastic or low-grade sessile serrated adenoma/polyp (SSA/P), MS IIIa - Tubulovillous adenomas or villous adenomas or any high-grade colorectal lesion, MS IIIb - Invasive colorectal cancers | Yes |
Study | Date | Endoscopy type | Imaging technology | Localization type | Multiple polyp | Classes | Real time |
---|---|---|---|---|---|---|---|
Liu X. et al. 2019 | Oct. 2019 | Conventional | WL | Bounding box | Yes | Polyp vs. adenoma | No |
Dataset | References | Description | Format | Resolution (w x h) | Ground truth | Used in |
---|---|---|---|---|---|---|
CVC-ClinicDB | Bernal et al. 2015, https://polyp.grand-challenge.org/CVCClinicDB/ | 612 sequential WL images with polyps extracted from 31 sequences with 31 different polyps. | Image | 388 × 284 | Polyp locations (binary mask) | Brandao et al. 2018, Zheng et al. 2018, Shin Y. et al. 2018, Wang et al. 2018, Qadir et al. 2019, Sornapudi et al. 2019, Wittenberg et al. 2019 |
CVC-ColonDB | Bernal et al. 2012, Vázquez et al. 2017 | 380 sequential WL images with polyps extracted from 15 videos. | Image | 574 × 500 | Polyp locations (binary mask) | Tajbakhsh et al. 2015, Brandao et al. 2018, Zheng et al. 2018, Sornapudi et al. 2019 |
CVC-PolypHD | Bernal et al. 2012, Vázquez et al. 2017 | 56 WL images. | Image | 1920 × 1080 | Polyp locations (binary mask) | Sornapudi et al. 2019 |
ETIS-Larib | Silva et al. 2014, https://polyp.grand-challenge.org/EtisLarib/ | 196 WL images with polyps extracted from 34 sequences with 44 different polyps. | Image | 1225 × 966 | Polyp locations (binary mask) | Brandao et al. 2018, Zheng et al. 2018, Shin Y. et al. 2018, Ahmad et al. 2019, Sornapudi et al. 2019, Wittenberg et al. 2019 |
Kvasir-SEG | Pogorelov et al. 2017, https://datasets.simula.no/kvasir-seg | 1 000 polyp images. | Image | Various resolutions | Polyp locations (binary mask) | - |
ASU-Mayo Clinic Colonoscopy Video | Tajbakhsh et al. 2016, https://polyp.grand-challenge.org/AsuMayo/ | 38 small SD and HD video sequences: 20 training videos annotated with ground truth and 18 testing videos without ground truth annotations. WL and NBI. | Video | N/A | Polyp locations (binary mask) | Yu et al. 2017, Brandao et al. 2018, Zhang R. et al. 2018, Ahmad et al. 2019, Sornapudi et al. 2019, Wittenberg et al. 2019, Mohammed et al. 2018 |
CVC-ClinicVideoDB | Angermann et al. 2017 | 18 SD videos. | Video | 768 × 576 | Polyp locations (binary mask) | Shin Y. et al. 2018, Qadir et al. 2019 |
Colonoscopic Dataset | Mesejo et al. 2016, http://www.depeca.uah.es/colonoscopy_dataset/ | 76 short videos (both NBI and WL). | Video | 768 × 576 | Polyp classification (Hyperplastic vs. adenoma vs. serrated) | Zhang R. et al. 2017 |
Study | Patients | No. Images | No. Videos | No. Unique Polyps | Purpose | Comments |
---|---|---|---|---|---|---|
Tajbakhsh et al. 2015 | N/A | 35 000 (with polyps: 7 000; without polyps: 28 000) | 40 short videos (20 positive and 20 negative) | N/A | Polyp localization | - |
Zhu R. et al. 2015 | N/A | 180 | - | N/A | Polyp localization | - |
Park and Sargent 2016 | N/A | 652 (with polyps: 92) | 35 (20’ to 40’) | N/A | Polyp localization | - |
Ribeiro et al. 2016 | 66 to 86 | 85 to 126 | - | N/A | Polyp classification (neoplastic vs non-neoplastic) | 8 datasets by combining: (i) with or without staining mucosa, (ii) 4 acquisition modes (without CVC, i-Scan1, i-Scan2, i-Scan3). |
Zhang R. et al. 2017, Zheng et al. 2018 | N/A | 1 930 (without polyps: 1 104; hyperplastic: 263; adenomatous: 563) | - | 215 polyps (65 hyperplastic and 150 adenomatous) | Polyp classification (hyperplastic vs. adenomatous) | PWH Database. Images taken under either WL or NBI endoscopy. |
Yuan and Meng 2017 | 35 | 4 000 (normal WCE images: 3 000, of which 1 000 bubbles, 1 000 turbid, and 1 000 clear; polyp images: 1 000) | - | N/A | Polyp detection | - |
Byrne et al. 2017 | N/A | N/A | 388 | N/A | Polyp classification (hyperplastic vs. adenomatous) | |
Komeda et al. 2017 | N/A | 1 800 (adenomatous: 1 200; non-adenomatous: 600) | - | N/A | Polyp classification (adenomatous vs. non-adenomatous) | - |
Chen et al. 2018 | N/A | 2 441 (training: 1 476 neoplastic and 681 hyperplastic; testing: 188 neoplastic and 96 hyperplastic) | - | N/A | Polyp classification (hyperplastic vs. neoplastic) | - |
Misawa et al. 2018 | 73 | N/A | 546 (155 positive and 391 negative) | 155 | Polyp detection | - |
Urban et al. 2018 | > 2000 | 8 641 | - | 4 088 | Polyp localization | Used as training dataset. |
Urban et al. 2018 | N/A | 1 330 (with polyps: 672; without polyps: 658) | - | 672 | Polyp localization | Used as independent dataset for testing. |
Urban et al. 2018 | 9 | 44 947 (with polyps: 13 292; without polyps: 31 655) | 9 | 45 | Polyp localization | Used as independent dataset for testing. |
Urban et al. 2018 | 11 | N/A | 11 | 73 | Polyp localization | Used as independent dataset for testing with “deliberately more challenging colonoscopy videos”. |
Wang et al. 2018 | 1 290 | 5 545 (with polyps: 3 634; without polyps: 1 911) | - | N/A | Polyp localization | Used as training dataset. |
Wang et al. 2018 | 1 138 | 27 113 (with polyps: 5 541; without polyps: 21 572) | - | 1 495 | Polyp localization | Used as testing dataset. |
Wang et al. 2018 | 110 | - | 138 | 138 | Polyp localization | Used as testing dataset. |
Wang et al. 2018 | 54 | - | 54 | 0 | Polyp localization | Used as testing dataset. |
Lui et al. 2019 | N/A | 8 000 (curable lesions: 4 000; incurable lesions: 4 000) | - | Curable lesions: 159; incurable lesions: 493 | Polyp classification (endoscopically curable vs. incurable lesions) | Used as training dataset. This study is focused on larger endoscopic lesions with risk of submucosal invasion and lymphovascular permeation. |
Lui et al. 2019 | N/A | 567 | - | Curable: 56; incurable: 20 | Polyp classification (endoscopically curable vs. incurable lesions) | Used as testing dataset. This study is focused on larger endoscopic lesions with risk of submucosal invasion and lymphovascular permeation. |
Blanes-Vidal et al. 2019 | 255 | 11 300 (with polyps: 4 800; without polyps: 6 500) | N/A | 331 polyps (OC) and 375 (CCE) | Polyp localization | CCE: Colorectal capsule endoscopy. OC: conventional optical colonoscopy. |
Zhang X. et al. 2019 | 215 | 404 | - | N/A | Polyp localization | - |
Misawa et al. 2019 | N/A | 3 017 088 | - | 930 | Polyp detection | Used as training set. |
Misawa et al. 2019 | 64 (47 with polyps and 17 without polyps) | N/A | N/A | 87 | Polyp detection | Used as testing set. |
Kandel et al. 2019 | 552 | N/A | - | 963 | Polyp classification (hyperplastic, sessile serrated adenomas, adenomas) | |
Zhu X. et al. 2019 | 283 | 1 991 | - | N/A | Polyp detection | Adenomatous polyps. |
Ahmad et al. 2019 | N/A | 83 716 (with polyps: 14 634; without polyps: 69 082) | 17 | 83 | Polyp localization | White Light Images. |
Sornapudi et al. 2019 | N/A | 55 | N/A | 67 | Polyp localization | Wireless Capsule Endoscopy videos. Used as testing set. |
Sornapudi et al. 2019 | N/A | 1 800 (with polyps: 530; without polyps: 1 270) | 18 | N/A | Polyp localization | Wireless Capsule Endoscopy videos. Used as training set. |
Wittenberg et al. 2019 | N/A | 2 484 | - | 2 513 | Polyp localization | - |
Ma Y. et al. 2019 | 1 661 | 3 428 | - | N/A | Polyp localization | - |
Liu X. et al. 2019 | 2 000 | 8 000 (polyp: 872; adenoma: 1 210) | - | N/A | Polyp localization and classification (polyp vs. adenoma) | - |
Cheng Tao Pu et al. 2020 | N/A | 1 235 (MS I: 103; MS II: 429; MS IIo: 293; MS IIIa: 295; MS IIIb: 115) | - | N/A | Polyp classification (5 classes) | Australian (AU) dataset (NBI). Used as training set. |
Cheng Tao Pu et al. 2020 | N/A | 20 (MS I: 3; MS II: 5; MS IIo: 2; MS IIIa: 7; MS IIIb: 3) | - | N/A | Polyp classification (5 classes) | Japan (JP) dataset (NBI). Used as testing set. |
Cheng Tao Pu et al. 2020 | N/A | 49 (MS I: 9; MS II: 10; MS IIo: 10; MS IIIa: 11; MS IIIb: 9) | - | N/A | Polyp classification (5 classes) | Japan (JP) dataset (BLI). Used as testing set. |
Study | Task | Models | Framework | TL | Layers fine-tuned | Layers replaced | Output layer |
---|---|---|---|---|---|---|---|
Ribeiro et al. 2016 | Classification | AlexNet, GoogLeNet, Fast CNN, Medium CNN, Slow CNN, VGG16, VGG19 | - | ImageNet | N/A | Layers after last CNN layer | SVM |
Zhang R. et al. 2017 | Detection and classification | CaffeNet | - | ImageNet and Places205 | N/A | Tested connecting classifier to each convolutional layer (5 convolutional layers) | SVM (Poly, Linear, RBF, and Tanh) |
Chen et al. 2018 | Classification | Inception v3 | - | ImageNet | N/A | Last layer | FCL |
Misawa et al. 2018, Misawa et al. 2019 | Detection | C3D | - | N/A | N/A | N/A | N/A |
Zheng et al. 2018 | Localization | - | YOLOv1 | PASCAL VOC 2007 and 2012 | All | - | - |
Shin Y. et al. 2018 | Localization | Inception ResNet-v2 | Faster R-CNN with post-learning schemes | COCO | All | - | RPN and detector layers |
Urban et al. 2018 | Localization | ResNet-50, VGG16, VGG19 | - | ImageNet (also tested without TL) | All | Last layer | FCL |
Wang et al. 2018 | Localization | VGG16 | SegNet | N/A | N/A | N/A | N/A |
Wittenberg et al. 2019 | Localization | ResNet101 | Mask R-CNN | COCO | All (incrementally) | Last layer | FCL |
Ma Y. et al. 2019 | Localization | SSD Inception v2 | TensorFlow | N/A | N/A | - | - |
Liu X. et al. 2019 | Localization and classification | Faster R-CNN with Inception ResNet v2 | TensorFlow | COCO | All | - | - |
Study | Task | Based on | Highlights |
---|---|---|---|
Tajbakhsh et al. 2014, Tajbakhsh et al. 2015 | Localization | None | Combination of classic computer vision techniques (detection and location) with DL (correction of prediction). The ML method proposes candidate polyps. Then, three sets of multi-scale patches around the candidate are generated (color, shape and temporal). Each set of patches is fed to a corresponding CNN. Each CNN has 2 convolutional layers, 2 fully connected layers, and an output layer. The maximum score for each set of patches is computed and averaged. |
Zhu R. et al. 2015 | Localization | LeNet-5 | CNN fed with 32x32 images taken from patches generated via a sliding window of 16 pixels over the original images. The LeNet-5 network inspires the CNN architecture. ReLU used as activation function. Last two layers replaced with a cost-sensitive SVM. Positively selected patches are combined to generate the final output. |
Park and Sargent 2016 | Localization | None | Based on a previous work with no DL techniques. An initial quality assessment and preprocessing step filters and cleans images, and proposes candidate regions of interest (RoI). CNN replaces previous feature extractor. Three convolutional layers with two interspersed subsampling layers followed by a fully connected layer. A final step uses a Conditional Random Field (CRF) for RoI classification. |
Yu et al. 2017 | Localization | None | Two 3D-FCNs are used: (i) an offline network trained with a training dataset, and (ii) an online network initialized with the offline weights and updated every 60 frames with the video frames. Only the last two layers are updated. The last 16 frames are used for predicting each frame. The architecture has two convolutional layers, each followed by a pooling layer, then two groups of two convolutional layers, each group followed by a pooling layer, and finally two convolutional layers converted from fully connected layers. The outputs of the two networks are combined to generate the final output. |
Yuan and Meng 2017 | Detection | Stacked Sparse AutoEncoder (SSAE) | A modification of a Sparse AutoEncoder to include an image manifold constraint, named Stacked Sparse AutoEncoder with Image Manifold Constraint (SSAEIM). SSAEIM is built by stacking three SAEIM layers followed by an output layer. Image manifold information is used on each layer. |
Byrne et al. 2017 | Classification | Inception v3 | Last layer replaced with a fully connected layer. A credibility score is calculated for each frame with the current frame prediction and the credibility score of the previous frame. |
Komeda et al. 2017 | Classification | None | Two convolutional layers followed by a pooling layer each, followed by a final fully connected output layer. |
Brandao et al. 2018, Ahmad et al. 2019 | Localization | AlexNet, GoogLeNet, ResNet-50, ResNet-101, ResNet-152, VGG | Networks pre-trained with PASCAL VOC and ImageNet datasets were converted into fully convolutional networks by replacing the fully connected and scoring layers with a convolution layer. A final deconvolution layer produces an output with the same size as the input. A regularization operation is added between every convolutional and activation layer. VGG, ResNet-101 and ResNet-152 were also tested using shape-from-shading features. |
Zhang R. et al. 2018 | Localization | YOLO | Custom architecture RYCO that consists of two networks: 1. A regression-based deep learning detection model with residual learning (ResYOLO) to locate polyps in a frame. 2. A Discriminative Correlation Filter (DCF) based method called Efficient Convolution Operators (ECO) to track the detected polyps. The ResYOLO network detects new polyps in a frame, starting the polyp tracking. During tracking, both ResYOLO and the ECO tracker are used to determine the polyp location. Tracking stops when a confidence score calculated using the last frames falls under a threshold value. |
Urban et al. 2018 | Detection | None | Two custom CNNs are proposed. The first CNN is built only with convolutional, max-pooling and fully connected layers. The second CNN also includes batch normalization layers and inception modules. |
Urban et al. 2018 | Localization | YOLO | The 5 CNNs used for detection (two custom, VGG16, VGG19 and ResNet-50) are modified by replacing the fully connected layers with convolutional layers. The last layer has 5 filter maps whose outputs are spaced over a grid over the input image. Each grid cell predicts its confidence with a sigmoid unit, the position of the polyp relative to the grid cell center, and its size. The final output is the weighted sum of all the adjusted positions and size predictions, weighted with the confidences. |
Mohammed et al. 2018 | Detection | Y-Net | The framework consists of two fully convolutional encoder networks connected to a single decoder network that matches the encoder network resolution at each down-sampling operation. The network is trained with encoder-specific adaptive learning rates that update the parameters of the randomly initialized encoder with a larger step size than those of the encoder with pre-trained weights. The features of the two encoders are merged with the decoder network at each down-sampling path through sum-skip connections. |
Lui et al. 2019 | Classification | ResNet | Network with 5 convolutional layers and 2 fully connected layers but based on a pre-trained ResNet CNN backbone. |
Qadir et al. 2019 | Localization | None | Framework for false positive (FP) reduction is proposed. The framework adds a FP reduction unit to an RPN network. This unit exploits temporal dependencies between frames (forward and backward) to correct the output. Faster R-CNN and SSD RPNs were tested. |
Blanes-Vidal et al. 2019 | Localization | R-CNN with AlexNet | Several modifications done to AlexNet: - Last fully connected layer replaced to output two classes. - 5 convolutional and 3 fully connected layers were fine-tuned. - Max-Pooling kernels, ReLU activation function and dropout used to avoid overfitting and build robustness to intra-class deformations. - Stochastic gradient descent with momentum used as the optimization algorithm. |
Zhang X. et al. 2019 | Localization | SSD | SSD was modified to add three new pooling layers (Second-Max Pooling, Second-Min Pooling and Min-Pooling) and a new deconvolution layer whose features are concatenated to those from the Max-Pooling layer that are fed into the detection layer. Model was pre-trained on the ILSVRC CLS-LOC dataset. |
Kandel et al. 2019 | Classification | CapsNet | A convolutional layer followed by 7 convolutional capsule layers and finalized with a global average pool by capsule type. |
Sornapudi et al. 2019 | Localization | Mask R-CNN | The region proposal network (RPN) uses a Feature Pyramid Network with a ResNet backbone. ResNet-50 and ResNet-101 were used, improved by extracting features from 5 different levels of layers. ResNet networks were initialized with COCO and ImageNet. Additionally, 76 random balloon images from Flickr were used to fine-tune networks initialized with COCO. The regions proposed by the RPN were filtered before the ROIAlign layer. The ROIAlign layer is followed by a pixel probability mask network, comprised of 4 convolutional layers followed by a transposed convolutional layer and a final convolutional layer with a sigmoid activation function that generates the final output. All convolutional layers except final are built with ReLU activation function. |
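
The output stage described for the Urban et al. 2018 localization model (per-cell sigmoid confidence, position relative to the cell center, size, and a confidence-weighted combination) can be sketched roughly as follows. This is an illustration only, not the authors' code: `CellPrediction` and `combine_cells` are hypothetical names, and normalizing the weights so that they sum to 1 is an assumption, since the description above only says "weighted sum".

```python
import math
from typing import NamedTuple, Sequence, Tuple


class CellPrediction(NamedTuple):
    logit: float     # raw confidence score, before the sigmoid
    center_x: float  # grid cell center in image coordinates
    center_y: float
    offset_x: float  # predicted polyp position relative to the cell center
    offset_y: float
    width: float     # predicted polyp size
    height: float


def sigmoid(value: float) -> float:
    return 1.0 / (1.0 + math.exp(-value))


def combine_cells(cells: Sequence[CellPrediction]) -> Tuple[float, float, float, float]:
    """Return (x, y, width, height) as a confidence-weighted average over all cells.

    Assumption: the weights are normalized by their sum.
    """
    if not cells:
        raise ValueError("at least one grid cell is required")
    confidences = [sigmoid(cell.logit) for cell in cells]
    total = sum(confidences)

    def weighted(values: Sequence[float]) -> float:
        return sum(c * v for c, v in zip(confidences, values)) / total

    x = weighted([cell.center_x + cell.offset_x for cell in cells])
    y = weighted([cell.center_y + cell.offset_y for cell in cells])
    width = weighted([cell.width for cell in cells])
    height = weighted([cell.height for cell in cells])
    return x, y, width, height


if __name__ == "__main__":
    grid = [
        CellPrediction(2.0, 16.0, 16.0, 3.0, -1.0, 24.0, 20.0),
        CellPrediction(-1.0, 48.0, 16.0, -5.0, 0.0, 30.0, 22.0),
    ]
    print(combine_cells(grid))
```

Cells with a low confidence contribute little to the final box, so a single confident cell dominates the result.
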
Study | Rotation | Flipping | Shearing | Translation | Gaussian smoothing | Crop | Scale | Resize | Random brightness | Zooming | Saturation adjustment | Random contrast | Exposure adjustment | Histogram equalization |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Num. Studies | 17 | 12 | 5 | 3 | 4 | 5 | 3 | 2 | 4 | 2 | 1 | 1 | 1 | 1 |
Tajbakhsh et al. 2015 | X | X | X | X | X | |||||||||
Park and Sargent 2016 | X | X | ||||||||||||
Ribeiro et al. 2016 | X | X | ||||||||||||
Yu et al. 2017 | X | X | ||||||||||||
Byrne et al. 2017 | X | X | X | |||||||||||
Brandao et al. 2018 | X | |||||||||||||
Zhang R. et al. 2018 | X | X | X | X | X | |||||||||
Zheng et al. 2018 | X | |||||||||||||
Shin Y. et al. 2018 | X | X | X | X | X | X | ||||||||
Urban et al. 2018 | X | X | X | |||||||||||
Mohammed et al. 2018 | X | X | X | X | X | X | ||||||||
Qadir et al. 2019 | X | X | X | X | ||||||||||
Blanes-Vidal et al. 2019 | X | X | X | |||||||||||
Zhang X. et al. 2019 | X | |||||||||||||
Zhu X. et al. 2019 | X | X | X | |||||||||||
Sornapudi et al. 2019 | X | X | X | X | X | X | ||||||||
Wittenberg et al. 2019 | X | X | ||||||||||||
Ma Y. et al. 2019 | X | X | X | |||||||||||
Cheng Tao Pu et al. 2020 | X | X | X |
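
As a rough illustration of some of the transformations listed in the table above, the following sketch applies flipping, rotation, cropping and a brightness shift to a tiny grayscale image stored as nested lists. It is not taken from any of the studies: all function names are hypothetical, rotation is limited to 90-degree steps for simplicity, and a real pipeline would normally rely on an image-processing library.

```python
import random
from typing import List

Image = List[List[int]]  # grayscale image as rows of 0-255 pixel values


def flip_horizontal(image: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def rotate_90_clockwise(image: Image) -> Image:
    """Rotate by 90 degrees clockwise (a simplification of arbitrary-angle rotation)."""
    return [list(column) for column in zip(*image[::-1])]


def crop(image: Image, top: int, left: int, height: int, width: int) -> Image:
    """Extract a height x width window whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]


def shift_brightness(image: Image, delta: int) -> Image:
    """Add delta to every pixel, clamping the result to the 0-255 range."""
    return [[max(0, min(255, pixel + delta)) for pixel in row] for row in image]


def random_augment(image: Image, rng: random.Random) -> Image:
    """Apply a random flip, a random number of quarter turns and a brightness shift."""
    if rng.random() < 0.5:
        image = flip_horizontal(image)
    for _ in range(rng.randrange(4)):  # 0 to 3 quarter turns
        image = rotate_90_clockwise(image)
    return shift_brightness(image, rng.randint(-30, 30))


if __name__ == "__main__":
    sample = [[10, 20, 30], [40, 50, 60]]
    print(flip_horizontal(sample))
    print(rotate_90_clockwise(sample))
    print(random_augment(sample, random.Random(0)))
```

Passing an explicit `random.Random` instance keeps the augmentation reproducible, which helps when comparing training runs.
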
Framework/Library | # Studies | Used by |
---|---|---|
Caffe | 5 | Zhu X. et al. 2019, Yu et al. 2017, Brandao et al. 2018, Wang et al. 2018, Zhang X. et al. 2019 |
TensorFlow | 5 | Chen et al. 2018, Shin Y. et al. 2018, Mohammed et al. 2018, Ma Y. et al. 2019, Liu X. et al. 2019 |
Keras | 4 | Urban et al. 2018, Sornapudi et al. 2019, Wittenberg et al. 2019, Mohammed et al. 2018 |
C3D | 2 | Misawa et al. 2018, Misawa et al. 2019 |
MatConvNet (MATLAB) | 1 | Ribeiro et al. 2016 |
Note: Some performance metrics are not directly reported in the papers, but were derived using raw data or confusion matrices provided by them.
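
For reference, the derivation of metrics from a confusion matrix, or of F1 and F2 from a reported precision and recall, follows the standard definitions. The sketch below is a minimal illustration; the function names are hypothetical and are not part of this repository.

```python
from typing import Dict


def f_beta(precision: float, recall: float, beta: float) -> float:
    """F-beta score: beta > 1 weights recall more heavily than precision."""
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


def metrics_from_confusion_matrix(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Derive the metrics used in the tables below from raw counts.

    Raises ZeroDivisionError if a denominator is zero (e.g. no positive samples).
    """
    recall = tp / (tp + fn)        # sensitivity
    precision = tp / (tp + fp)     # positive predictive value (PPV)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "recall": recall,
        "precision": precision,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f_beta(precision, recall, 1.0),
        "f2": f_beta(precision, recall, 2.0),
    }


if __name__ == "__main__":
    # Precision and recall as reported for Yu et al. 2017 on ASU-Mayo.
    print(round(f_beta(0.881, 0.71, 1.0), 3))
    print(round(f_beta(0.881, 0.71, 2.0), 3))
```
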
Performance metrics on public and private datasets of all polyp detection and localization studies.
- The type of performance metric is given in parentheses: f = frame-based, p = polyp-based, and pa = patch-based.
- The dataset used is given in square brackets, where “P” stands for private.
- Performances marked with an * are reported on training datasets.
- AP stands for Average Precision.
Study | Recall (sensitivity) | Precision (PPV) | Specificity | Others | Manually selected images? |
---|---|---|---|---|---|
Tajbakhsh et al. 2015 | 70% (f) [P] | 63% (f) [P] | 90% (f) [P] | F1: 0.66, F2: 0.68 (f) [P] | No |
Zhu R. et al. 2015 | 79.44% (pa) [P] | N/A | 79.54% (pa) [P] | Acc: 79.53% (pa) [P] | Yes |
Park and Sargent 2016 | 86% (f) [P] * | - | 85% (f) [P] * | AUC: 0.86 (f) [P] * | Yes (on training) |
Yu et al. 2017 | 71% (f) [ASU-Mayo] | 88.1% (f) [ASU-Mayo] | N/A | F1: 0.786, F2: 0.739 (f) [ASU-Mayo] | No |
Zhang R. et al. 2017 | 97.6% (f) [P] | 99.4% (f) [P] | N/A | F1: 0.98, F2: 0.98, AUC: 1.00 (f) [P] | Yes |
Yuan and Meng 2017 | 98% (f) [P] * | 97% (f) [P] * | 99% (f) [P] * | F1: 0.98, F2: 0.98 (f) [P] | Yes |
Brandao et al. 2018 | ~90% (f) [ETIS-Larib]; ~90% (f) [CVC-ColonDB] | ~73% (f) [ETIS-Larib]; ~80% (f) [CVC-ColonDB] | N/A | F1: ~0.81, F2: ~0.86 (f) [ETIS-Larib]; F1: ~0.85, F2: ~0.88 (f) [CVC-ColonDB] | Yes |
Zhang R. et al. 2018 | 71.6% (f) [ASU-Mayo] | 88.6% (f) [ASU-Mayo] | 97% (f) [ASU-Mayo] | F1: 0.792, F2: 0.744 (f) [ASU-Mayo] | No |
Misawa et al. 2018 | 90% (f) [P]; 94% (p) [P] | 55.1% (f) [P]; 48% (p) [P] | 63.3% (f) [P]; 40% (p) [P] | F1: 0.68 (f), 0.63 (p); F2: 0.79 (f), 0.78 (p) [P]; Acc: 76.5% (f), 60% (p) [P] | No |
Zheng Y. et al. 2018 | 74% (f) [ETIS-Larib] | 77.4% (f) [ETIS-Larib] | N/A | F1: 0.757, F2: 0.747 (f) [ETIS-Larib] | Yes |
Shin Y. et al. 2018 | 80.3% (f) [ETIS-Larib]; 84.2% (f) [ASU-Mayo]; 84.3% (f) [CVC-ClinicVideoDB] | 86.5% (f) [ETIS-Larib]; 82.7% (f) [ASU-Mayo]; 89.7% (f) [CVC-ClinicVideoDB] | N/A | F1: 0.833, F2: 0.815 (f) [ETIS-Larib]; F1: 0.834, F2: 0.839 (f) [ASU-Mayo]; F1: 0.869, F2: 0.853 (f) [CVC-ClinicVideoDB] | Yes (ETIS-Larib); No (ASU-Mayo, CVC-ClinicVideoDB) |
Urban et al. 2018 | 93% (f) [P]; 100% (p) [P]; 93% (p) [P2] | 74% (f) [P]; 35% (p) [P]; 60% (p) [P2] | 93% (f) [P] | F1: 0.82, F2: 0.88 (f) [P]; F1: 0.52, F2: 0.73 (p) [P]; F1: 0.73, F2: 0.84 (p) [P2] | No |
Wang et al. 2018 | 88.24% (f) [CVC-ClinicDB]; 94.38% (f) [P (dataset A)]; 91.64% (f), 100% (p) [P (dataset C)] | 93.13% (f) [CVC-ClinicDB]; 81.85% (f) [P (dataset A)] | 95.40% (f) [P (dataset D)] | F1: 0.91, F2: 0.89 (f) [CVC-ClinicDB]; F1: 0.88, F2: 0.92, AUC: 0.984 (f) [P (dataset A)] | Yes (dataset A, CVC-ClinicDB); No (dataset C/D) |
Mohammed et al. 2018 | 84.4% (f) [ASU-Mayo] | 87.4% (f) [ASU-Mayo] | N/A | F1: 85.9%, F2: 85.0% (f) [ASU-Mayo] | No |
Qadir et al. 2019 | 81.51% (f) [CVC-ClinicVideoDB] | 87.51% (f) [CVC-ClinicVideoDB] | 84.26% (f) [CVC-ClinicVideoDB] | F1: 0.844, F2: 0.83 (f) [CVC-ClinicVideoDB] | No |
Blanes-Vidal et al. 2019 | 97.1% (f) [P] | 91.4% (f) [P] | 93.3% (f) [P] | Acc: 96.4%, F1: 0.94, F2: 0.95 (f) [P] | N/A (not clear in the paper) |
Zhang X. et al. 2019 | 76.37% (f) [P] | 93.92% (f) [P] | N/A | F1: 0.84, F2: 0.79 (f) [P] | Yes |
Misawa et al. 2019 | 86% (p) [P] | N/A | 74% (f) [P] | - | No |
Zhu X. et al. 2019 | 88.5% (f) [P] | N/A | 96.4% (f) [P] | - | No |
Ahmad et al. 2019 | 91.6% (f) [ETIS-Larib]; 84.5% (f) [P] | 75.3% (f) [ETIS-Larib] | 92.5% (f) [P] | F1: 0.83, F2: 0.88 (f) [ETIS-Larib] | Yes (ETIS-Larib); No (private) |
Sornapudi et al. 2019 | 91.64% (f) [CVC-ColonDB]; 78.12% (f) [CVC-PolypHD]; 80.29% (f) [ETIS-Larib]; 95.52% (f) [P] | 89.94% (f) [CVC-ColonDB]; 83.33% (f) [CVC-PolypHD]; 72.93% (f) [ETIS-Larib]; 98.46% (f) [P] | N/A | F1: 0.9073, F2: 0.9127 (f) [CVC-ColonDB]; F1: 0.8065, F2: 0.7911 (f) [CVC-PolypHD]; F1: 0.7643, F2: 0.7870 (f) [ETIS-Larib]; F1: 0.9667, F2: 0.9610 (f) [P] | Yes (CVC-ClinicDB, ColonDB, ETIS-Larib); No (WCE video) |
Wittenberg et al. 2019 | 86% (f) [CVC-ClinicDB]; 83% (f) [ETIS-Larib]; 93% (f) [P] | 80% (f) [CVC-ClinicDB]; 74% (f) [ETIS-Larib]; 86% (f) [P] | N/A | F1: 0.82, F2: 0.85 (f) [CVC-ClinicDB]; F1: 0.79, F2: 0.81 (f) [ETIS-Larib]; F1: 0.89, F2: 0.92 (f) [P] | Yes |
Ma Y. et al. 2019 | 93.67% (f) [P] | N/A | 98.36% (f) [P] | Accuracy: 96.04%, AP: 94.92% (f) [P] | Yes |
Performance metrics on public and private datasets of all polyp classification studies.
- The dataset used is given in square brackets, where “P” stands for private.
Study | Classes | Sensitivity | Specificity | PPV | NPV | Others | Polyp-level vs. frame-level | Dataset type |
---|---|---|---|---|---|---|---|---|
Zhang R. et al. 2017 | Adenoma vs. hyperplastic; resectable vs. non-resectable; adenoma vs. hyperplastic vs. serrated | 92% (resectable vs. non-resectable) [ColonoscopicDataset]; 87.6% (adenoma vs. hyperplastic) [P] | 89.9% (resectable vs. non-resectable) [ColonoscopicDataset]; 84.2% (adenoma vs. hyperplastic) [P] | 95.4% (resectable vs. non-resectable) [ColonoscopicDataset]; 87.30% (adenoma vs. hyperplastic) [P] | 84.9% (resectable vs. non-resectable) [ColonoscopicDataset]; 87.2% (adenoma vs. hyperplastic) [P] | Acc: 91.3% (resectable vs. non-resectable) [ColonoscopicDataset]; Acc: 86.7% (adenoma vs. serrated adenoma vs. hyperplastic) [ColonoscopicDataset]; Acc: 85.9% (adenoma vs. hyperplastic) [P] | frame | video (manually selected images) |
Byrne et al. 2017 | Adenoma vs. hyperplastic | 98% [P] | 83% [P] | 90% [P] | 97% [P] | - | polyp | unaltered video |
Chen et al. 2018 | Neoplastic vs. hyperplastic | 96.3% [P] | 78.1% [P] | 89.6% [P] | 91.5% [P] | N/A | frame | image dataset |
Lui et al. 2019 | Endoscopically curable lesions vs. endoscopically incurable lesions | 88.2% [P] | 77.9% [P] | 92.1% [P] | 69.3% [P] | Acc: 85.5% [P] | frame | image dataset |
Kandel et al. 2019 | Hyperplastic vs. serrated adenoma (near focus); hyperplastic vs. adenoma (far focus) | 57.14% (hyperplastic vs. serrated) [P]; 75.63% (hyperplastic vs. adenoma) [P] | 68.52% (hyperplastic vs. serrated) [P]; 63.79% (hyperplastic vs. adenoma) [P] | N/A | N/A | Acc: 67.21% (hyperplastic vs. serrated) [P]; Acc: 72.48% (hyperplastic vs. adenoma) [P] | frame | image dataset |
Cheng Tao Pu et al. 2020 | 5-class (I, II, IIo, IIIa, IIIb); adenoma (classes II + IIo + IIIa) vs. hyperplastic (class I) | 97% (adenoma vs. hyperplastic) [P: AU]; 100% (adenoma vs. hyperplastic) [P: JP-NBI]; 100% (adenoma vs. hyperplastic) [P: JP-BLI] | 51% (adenoma vs. hyperplastic) [P: AU]; 0% (adenoma vs. hyperplastic) [P: JP-NBI]; 0% (adenoma vs. hyperplastic) [P: JP-BLI] | 95% (adenoma vs. hyperplastic) [P: AU]; 82.4% (adenoma vs. hyperplastic) [P: JP-NBI]; 77.5% (adenoma vs. hyperplastic) [P: JP-BLI] | 63.5% (adenoma vs. hyperplastic) [P: AU]; - (adenoma vs. hyperplastic) [P: JP-NBI]; - (adenoma vs. hyperplastic) [P: JP-BLI] | AUC (5-class): 94.3% [P: AU]; AUC (5-class): 84.5% [P: JP-NBI]; AUC (5-class): 90.3% [P: JP-BLI]; Acc: 72.3% (5-class) [P: AU]; Acc: 59.8% (5-class) [P: JP-NBI]; Acc: 53.1% (5-class) [P: JP-BLI]; Acc: 92.7% (adenoma vs. hyperplastic) [P: AU]; Acc: 82.4% (adenoma vs. hyperplastic) [P: JP-NBI]; Acc: 77.5% (adenoma vs. hyperplastic) [P: JP-BLI] | frame | image dataset |
Performance metrics on public and private datasets of all simultaneous polyp detection and classification studies.
- The dataset used is given in square brackets, where “P” stands for private.
- APIoU stands for Average Precision and mAPIoU for Mean Average Precision (i.e. the mean of each class AP), calculated at the specified IoU (Intersection over Union) level.
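
To make the two definitions above concrete, the sketch below computes the IoU of two bounding boxes and the mean of a set of per-class AP values. The function names and the `(x_min, y_min, x_max, y_max)` box layout are assumptions chosen for illustration; computing the per-class AP itself (from the precision-recall curve at a given IoU threshold) is not shown.

```python
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed layout


def intersection_over_union(box_a: Box, box_b: Box) -> float:
    """Area of the overlap divided by the area of the union of two boxes."""
    inter_width = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_height = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_width * inter_height
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def mean_average_precision(ap_by_class: Dict[str, float]) -> float:
    """mAP as the unweighted mean of the per-class AP values."""
    if not ap_by_class:
        raise ValueError("at least one class is required")
    return sum(ap_by_class.values()) / len(ap_by_class)


if __name__ == "__main__":
    # A detection counts as correct at IoU level 0.5 if this value is >= 0.5.
    print(intersection_over_union((0, 0, 10, 10), (5, 0, 15, 10)))
    # Per-class AP values (in %) reported for Liu X. et al. 2019.
    print(mean_average_precision({"polyp": 83.39, "adenoma": 97.90}))
```
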
Study | Classes | AP | mAP | Manually selected images? |
---|---|---|---|---|
Liu X. et al. 2019 | Polyp vs. adenoma | Polyp: AP0.5 = 83.39% [P]; Adenoma: AP0.5 = 97.90% [P] | mAP0.5 = 90.645% [P] | Yes |
- AP: Average Precision.
- BLI: Blue Light Imaging.
- mAP: Mean Average Precision.
- NBI: Narrow Band Imaging.
- WCE: Wireless Capsule Endoscopy.
- WL: White Light.