
liushuaicare/satellite-image-deep-learning


Introduction

This document primarily lists resources for performing deep learning (DL) on satellite imagery. To a lesser extent, machine learning (ML) techniques such as random forests and stochastic gradient descent are also discussed, as are classical image processing techniques.

Top links

Table of contents

Datasets

Sentinel

Kaggle

Kaggle hosts several large satellite image datasets (> 1 GB). A list of general image datasets is here. A list of land-use datasets is here.

Kaggle - Deepsat - classification challenge

Each sample image is a 28x28 pixel patch consisting of 4 bands - red, green, blue and near infrared. The training and test labels are one-hot encoded 1x6 vectors. Data is supplied in .mat Matlab format. JPEG?

  • Sat4 500,000 image patches covering four broad land cover classes - barren land, trees, grassland, and a class consisting of all land cover classes other than the above three. Example notebook
  • Sat6 405,000 image patches, each of size 28x28 pixels and covering 6 land cover classes - barren land, trees, grassland, roads, buildings and water bodies.
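Since the labels are one-hot 1x6 vectors and patches are 28x28x4 arrays, decoding a label is a simple argmax. A minimal sketch with NumPy (the class index order below is an assumption to verify against the dataset documentation; reading the real .mat file would use something like `scipy.io.loadmat`):

```python
import numpy as np

# SAT-6 class names as listed above; the index order is an assumption.
CLASSES = ["barren land", "trees", "grassland", "roads", "buildings", "water"]

def decode_label(one_hot):
    """Turn a one-hot 1x6 label vector into its class name."""
    return CLASSES[int(np.argmax(one_hot))]

# Synthetic example: one 28x28 patch with 4 bands (R, G, B, NIR)
patch = np.zeros((28, 28, 4), dtype=np.uint8)
label = np.array([0, 0, 0, 0, 0, 1])  # one-hot: water

print(decode_label(label))  # water
```

For the real data, `scipy.io.loadmat("sat-6-full.mat")` should return a dict of arrays; the exact key names (e.g. `train_x`, `train_y`) should be checked against the dataset's documentation.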

Kaggle - Amazon from space - classification challenge

Kaggle - DSTL - segmentation challenge

Kaggle - Airbus Ship Detection Challenge

Kaggle - Draper - place images in order of time

  • https://www.kaggle.com/c/draper-satellite-image-chronology/data
  • Images are grouped into sets of five, each of which have the same setId. Each image in a set was taken on a different day (but not necessarily at the same time each day). The images for each set cover approximately the same area but are not exactly aligned.
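Since each set shares a setId, a first preprocessing step is to group filenames by that id. A minimal sketch in pure Python - the `setN_dayM.jpeg` naming pattern is an assumption, so check it against the actual competition files:

```python
from collections import defaultdict
from pathlib import Path

def group_by_set(filenames):
    """Group image filenames into their five-image sets by setId.

    Assumes names like 'set107_3.jpeg' (setId, underscore, day index);
    verify this pattern against the real data before relying on it.
    """
    sets = defaultdict(list)
    for name in filenames:
        set_id = Path(name).stem.split("_")[0]
        sets[set_id].append(name)
    # Sort within each set so images appear in filename (not time) order
    return {k: sorted(v) for k, v in sets.items()}

files = ["set1_2.jpeg", "set1_1.jpeg", "set2_1.jpeg"]
print(group_by_set(files))
```

Note the sorted filename order is not the chronological order - recovering that is the whole challenge.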

Kaggle - other

Alternative datasets

There are a variety of datasets suitable for land classification problems.

UC Merced

AWS datasets

Quilt

  • Several people have uploaded datasets to Quilt

Google Earth Engine

Weather Datasets

Online computing resources

Generally a GPU is required for DL. Google's Colab is free, but compute time is limited (12-hour sessions) and storage is somewhat non-persistent.

Kaggle

  • Free to use
  • GPU Kernels (may run for only 1 hour, which limits usefulness?)
  • Tensorflow, pytorch & fast.ai available
  • Advantage that many datasets are already available
  • Read

Clouderizer

  • https://clouderizer.com/
  • Clouderizer is a cloud computing management service: it takes care of installing the required packages on a cloud computing instance (such as Amazon AWS or Google Colab). Clouderizer is free for 200 hours per month (Robbie plan) and does not require a credit card to sign up.
  • Run projects locally, on cloud or both.
  • SSH terminal, Jupyter Notebooks and Tensorboard are securely accessible from Clouderizer Web Console.

AWS

Microsoft Azure

Google

  • ML engine - sklearn, tensorflow, keras
  • Colaboratory (notebooks with a GPU backend, free for 12 hours at a time)
  • Tensorflow available
  • pytorch can be installed, useful articles

Floydhub

  • https://www.floydhub.com/
  • Cloud GPUs
  • Jupyter Notebooks
  • Tensorboard
  • Version Control for DL
  • Deploy Models as REST APIs
  • Public Datasets

Paperspace

Crestle

Salamander

Interesting DL projects

RoboSat

RoboSat.Pink

DeepOSM

DeepNetsForEO - segmentation

Skynet-data

Production

Custom REST API

Tensorflow Serving

TensorFlow Serving makes it easy to deploy new algorithms and experiments, while keeping the same server architecture and APIs. Multiple models, or indeed multiple versions of the same model, can be served simultaneously. TensorFlow Serving comes with a scheduler that groups individual inference requests into batches for joint execution on a GPU.
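For reference, TF Serving's REST API accepts a POST to `/v1/models/<name>:predict` with a JSON body of the form `{"instances": [...]}` (port 8501 is the default REST port). A minimal sketch that just constructs the request, with an illustrative model name:

```python
import json

def predict_request(model_name, instances, host="localhost", port=8501):
    """Build the URL and JSON body for a TensorFlow Serving REST predict call."""
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances})
    return url, body

# "landcover" is a hypothetical model name
url, body = predict_request("landcover", [[0.1, 0.2, 0.3]])
print(url)
print(body)
```

Sending it is then one `requests.post(url, data=body)` away; the predictions come back under the `"predictions"` key of the response JSON.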

Floydhub

  • Allows exposing model via rest API

modeldepot

Image formats & catalogues

STAC - SpatioTemporal Asset Catalog
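At its core, a STAC Item is a GeoJSON Feature with a few extra STAC fields (`stac_version`, `id`, `properties.datetime`, `links`, `assets`). A minimal sketch with illustrative values, not taken from any real catalogue:

```python
import json

# A minimal STAC Item: a GeoJSON Feature plus STAC-specific fields.
# All values here are illustrative placeholders.
item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-scene",
    "geometry": {"type": "Point", "coordinates": [0.0, 0.0]},
    "bbox": [0.0, 0.0, 0.0, 0.0],
    "properties": {"datetime": "2019-01-01T00:00:00Z"},
    "links": [],
    "assets": {
        "thumbnail": {"href": "https://example.com/thumb.png", "type": "image/png"}
    },
}

print(json.dumps(item, indent=2))
```

The `assets` dict is where the actual imagery (e.g. a cloud optimised GeoTIFF per band) is linked; consult the STAC spec for the full list of required fields.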

State of the art

What are companies doing?

  • Overall trend to using AWS S3 backend for image storage. There are a variety of tools for exploring and having teams collaborate on data on S3, e.g. T4.
  • Just speculating, but a serverless pipeline appears to be where companies are headed for routine compute tasks, whilst providing a Jupyter notebook approach for custom analysis.
  • Cloud optimised geotiffs to become the standard?
  • DigitalGlobe have a cloud hosted Jupyter notebook platform called GBDX. Cloud hosting means they can guarantee the infrastructure supports their algorithms, and they appear to be close/closer to deploying DL. Tutorial notebooks here.
  • Planet have a Jupyter notebook platform which can be deployed locally and requires an API key (14 days free). They have a Python wrapper (2.7?!) for their REST API. They are mostly focused on classical & fast algorithms?

Interesting projects

Techniques

This section explores the different techniques (DL, ML & classical) people are applying to common problems in satellite imagery analysis. Classification problems are the most straightforward to address via DL; object detection is harder, and cloud detection harder still (a niche interest).

Land classification

Change detection

Image registration

Object detection

Cloud detection

  • A subset of the object detection problem, but surprisingly challenging
  • From this article on sentinelhub there are three popular classical algorithms that detect thresholds in multiple bands in order to identify clouds. In the same article they propose using semantic segmentation combined with a CNN as a cloud classifier (excellent review paper here), but state that this requires too much compute.
  • This article compares a number of ML algorithms: random forests, stochastic gradient descent, support vector machines and a Bayesian method.
  • DL..
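The classical approach boils down to per-band reflectance thresholds combined into a boolean mask. A toy NumPy sketch - the band choice and threshold values below are illustrative only, not from any published algorithm:

```python
import numpy as np

def naive_cloud_mask(blue, swir, blue_thresh=0.25, swir_thresh=0.16):
    """Flag pixels as cloud when reflectance exceeds per-band thresholds.

    A toy version of the classical multi-band threshold approach;
    the bands used and the threshold values are illustrative only.
    """
    return (blue > blue_thresh) & (swir > swir_thresh)

blue = np.array([[0.1, 0.4], [0.3, 0.05]])
swir = np.array([[0.1, 0.3], [0.2, 0.01]])
print(naive_cloud_mask(blue, swir))
```

Real detectors add more bands, band ratios and spatial post-processing, but the structure is the same: cheap elementwise tests, which is why they are so much lighter than a CNN.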

Super resolution

Pansharpening

Stereo imaging for terrain mapping & DEMs

NDVI - vegetation index
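NDVI is a classical (non-DL) index computed per pixel as (NIR - Red) / (NIR + Red), giving values in [-1, 1] with vegetation typically well above 0. A minimal NumPy sketch:

```python
import numpy as np

def ndvi(nir, red, eps=1e-9):
    """NDVI = (NIR - Red) / (NIR + Red), in [-1, 1].

    eps guards against division by zero over no-data pixels.
    """
    nir = np.asarray(nir, dtype="float64")
    red = np.asarray(red, dtype="float64")
    return (nir - red) / (nir + red + eps)

nir = np.array([0.6, 0.3])
red = np.array([0.2, 0.3])
print(ndvi(nir, red))  # approx [0.5, 0.0]
```

In practice the NIR and Red arrays would be read from the appropriate bands of the sensor (e.g. bands 8 and 4 for Sentinel-2), and the result is often thresholded to produce a vegetation mask.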

For fun

Useful References

