Skip to content

Commit

Permalink
Merge pull request BVLC#973 from shelhamer/tutorial-docs
Browse files Browse the repository at this point in the history
Document Caffe by a tour of key subjects + doxygen dev documentation
  • Loading branch information
shelhamer committed Sep 3, 2014
2 parents cdcf888 + c4b9ec5 commit 135786a
Show file tree
Hide file tree
Showing 33 changed files with 4,622 additions and 299 deletions.
2,335 changes: 2,335 additions & 0 deletions .Doxyfile

Large diffs are not rendered by default.

2 changes: 2 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -70,6 +70,8 @@ MANIFEST-*
docs/_site
docs/gathered
_site
doxygen
docs/dev

# Sublime Text settings
*.sublime-workspace
Expand Down
28 changes: 27 additions & 1 deletion Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -187,6 +187,26 @@ ALL_BUILD_DIRS := $(sort \
$(PROTO_BUILD_DIR) $(PROTO_BUILD_INCLUDE_DIR) $(PY_PROTO_BUILD_DIR) \
$(DISTRIBUTE_SUBDIRS))

##############################
# Set directory for Doxygen-generated documentation
##############################
DOXYGEN_CONFIG_FILE ?= ./.Doxyfile
# should be the same as OUTPUT_DIRECTORY in the .Doxyfile
DOXYGEN_OUTPUT_DIR ?= ./doxygen
DOXYGEN_COMMAND ?= doxygen
# All the files that might have Doxygen documentation.
DOXYGEN_SOURCES := $(shell find \
src/$(PROJECT) \
include/$(PROJECT) \
python/ \
matlab/ \
examples \
tools \
-name "*.cpp" -or -name "*.hpp" -or -name "*.cu" -or -name "*.cuh" -or \
-name "*.py" -or -name "*.m")
DOXYGEN_SOURCES += $(DOXYGEN_CONFIG_FILE)


##############################
# Configure build
##############################
Expand Down Expand Up @@ -301,7 +321,7 @@ SUPERCLEAN_EXTS := .so .a .o .bin .testbin .pb.cc .pb.h _pb2.py .cuo
##############################
# Define build targets
##############################
.PHONY: all test clean linecount lint lintclean tools examples $(DIST_ALIASES) \
.PHONY: all test clean docs linecount lint lintclean tools examples $(DIST_ALIASES) \
py mat py$(PROJECT) mat$(PROJECT) proto runtest \
superclean supercleanlist supercleanfiles warn everything

Expand All @@ -317,6 +337,12 @@ lint: $(EMPTY_LINT_REPORT)
lintclean:
@ $(RM) -r $(LINT_OUTPUT_DIR) $(EMPTY_LINT_REPORT) $(NONEMPTY_LINT_REPORT)

docs: $(DOXYGEN_OUTPUT_DIR)
@ cd ./docs ; ln -sfn ../$(DOXYGEN_OUTPUT_DIR)/html doxygen

$(DOXYGEN_OUTPUT_DIR): $(DOXYGEN_CONFIG_FILE) $(DOXYGEN_SOURCES)
$(DOXYGEN_COMMAND) $(DOXYGEN_CONFIG_FILE)

$(EMPTY_LINT_REPORT): $(LINT_OUTPUTS) | $(BUILD_DIR)
@ cat $(LINT_OUTPUTS) > $@
@ if [ -s "$@" ]; then \
Expand Down
1 change: 1 addition & 0 deletions docs/_config.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
markdown: kramdown
4 changes: 4 additions & 0 deletions docs/_layouts/default.html
Original file line number Diff line number Diff line change
@@ -1,6 +1,10 @@
<!doctype html>
<html>
<head>
<!-- MathJax -->
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="chrome=1">
<title>
Expand Down
2 changes: 2 additions & 0 deletions docs/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -43,6 +43,8 @@ Tested on Ubuntu, Red Hat, OS X.
BVLC provides ready-to-use models for non-commercial use.
* [Developing & Contributing](/development.html)<br />
Guidelines for development and contributing to Caffe.
* [API Documentation](/doxygen/)<br />
Developer documentation automagically generated from code comments.

### Examples

Expand Down
13 changes: 13 additions & 0 deletions docs/tutorial/convolution.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
---
layout: default
---
# Caffeinated Convolution

The Caffe strategy for convolution is to reduce the problem to matrix-matrix multiplication.
This linear algebra computation is highly-tuned in BLAS libraries and efficiently computed on GPU devices.

For more details read Yangqing's [Convolution in Caffe: a memo](https://github.com/Yangqing/caffe/wiki/Convolution-in-Caffe:-a-memo).

As it turns out, this same reduction was independently explored in the context of conv. nets by

> K. Chellapilla, S. Puri, P. Simard, et al. High performance convolutional neural networks for document processing. In Tenth International Workshop on Frontiers in Handwriting Recognition, 2006.
78 changes: 78 additions & 0 deletions docs/tutorial/data.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,78 @@
---
layout: default
---
# Data: Ins and Outs

Data flows through Caffe as [Blobs](net_layer_blob.html#blob-storage-and-communication).
Data layers load input and save output by converting to and from Blob to other formats.
Common transformations like mean-subtraction and feature-scaling are done by data layer configuration.
New input types are supported by developing a new data layer -- the rest of the Net follows by the modularity of the Caffe layer catalogue.

This data layer definition

layers {
name: "mnist"
# DATA layer loads leveldb or lmdb storage DBs for high-throughput.
type: DATA
# the 1st top is the data itself: the name is only convention
top: "data"
# the 2nd top is the ground truth: the name is only convention
top: "label"
# the DATA layer configuration
data_param {
# path to the DB
source: "examples/mnist/mnist_train_lmdb"
# type of DB: LEVELDB or LMDB (LMDB supports concurrent reads)
backend: LMDB
# batch processing improves efficiency.
batch_size: 64
}
# common data transformations
transform_param {
# feature scaling coefficient: this maps the [0, 255] MNIST data to [0, 1]
scale: 0.00390625
}
}

loads the MNIST digits.

**Tops and Bottoms**: A data layer makes **top** blobs to output data to the model.
It does not have **bottom** blobs since it takes no input.

**Data and Label**: a data layer has at least one top canonically named **data**.
For ground truth a second top can be defined that is canonically named **label**.
Both tops simply produce blobs and there is nothing inherently special about these names.
The (data, label) pairing is a convenience for classification models.

**Transformations**: data preprocessing is parametrized by transformation messages within the data layer definition.

layers {
name: "data"
type: DATA
[...]
transform_param {
scale: 0.1
mean_file_size: mean.binaryproto
# for images in particular horizontal mirroring and random cropping
# can be done as simple data augmentations.
mirror: 1 # 1 = on, 0 = off
# crop a `crop_size` x `crop_size` patch:
# - at random during training
# - from the center during testing
crop_size: 227
}
}

**Prefetching**: for throughput data layers fetch the next batch of data and prepare it in the background while the Net computes the current batch.

**Multiple Inputs**: a Net can have multiple inputs of any number and type. Define as many data layers as needed giving each a unique name and top. Multiple inputs are useful for non-trivial ground truth: one data layer loads the actual data and the other data layer loads the ground truth in lock-step. In this arrangement both data and label can be any 4D array. Further applications of multiple inputs are found in multi-modal and sequence models. In these cases you may need to implement your own data preparation routines or a special data layer.

*Improvements to data processing to add formats, generality, or helper utilities are welcome!*

## Formats

Refer to the layer catalogue of [data layers](layers.html#data-layers) for close-ups on each type of data Caffe understands.

## Deployment Input

For on-the-fly computation deployment Nets define their inputs by `input` fields: these Nets then accept direct assignment of data for online or interactive computation.
Empty file added docs/tutorial/fig/.gitignore
Empty file.
Binary file added docs/tutorial/fig/backward.jpg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/tutorial/fig/forward.jpg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/tutorial/fig/forward_backward.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/tutorial/fig/layer.jpg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/tutorial/fig/logreg.jpg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
37 changes: 37 additions & 0 deletions docs/tutorial/forward_backward.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
---
layout: default
---
# Forward and Backward

The forward and backward passes are the essential computations of a [Net](net_layer_blob.html).

<img src="fig/forward_backward.png" alt="Forward and Backward" width="480">

Let's consider a simple logistic regression classifier.

The **forward** pass computes the output given the input for inference.
In forward Caffe composes the computation of each layer to compute the "function" represented by the model.
This pass goes from bottom to top.

<img src="fig/forward.jpg" alt="Forward pass" width="320">

The data $x$ is passed through an inner product layer for $g(x)$ then through a softmax for $h(g(x))$ and softmax loss to give $f_W(x)$.

The **backward** pass computes the gradient given the loss for learning.
In backward Caffe reverse-composes the gradient of each layer to compute the gradient of the whole model by automatic differentiation.
This is back-propagation.
This pass goes from top to bottom.

<img src="fig/backward.jpg" alt="Backward pass" width="320">

The backward pass begins with the loss and computes the gradient with respect to the output $\frac{\partial f_W}{\partial h}$. The gradient with respect to the rest of the model is computed layer-by-layer through the chain rule. Layers with parameters, like the `INNER_PRODUCT` layer, compute the gradient with respect to their parameters $\frac{\partial f_W}{\partial W_{\text{ip}}}$ during the backward step.

These computations follow immediately from defining the model: Caffe plans and carries out the forward and backward passes for you.

- The `Net::Forward()` and `Net::Backward()` methods carry out the respective passes while `Layer::Forward()` and `Layer::Backward()` compute each step.
- Every layer type has `forward_{cpu,gpu}()` and `backward_{cpu,gpu}` methods to compute its steps according to the mode of computation. A layer may only implement CPU or GPU mode due to constraints or convenience.

The [Solver](solver.html) optimizes a model by first calling forward to yield the output and loss, then calling backward to generate the gradient of the model, and then incorporating the gradient into a weight update that attempts to minimize the loss. Division of labor between the Solver, Net, and Layer keep Caffe modular and open to development.

For the details of the forward and backward steps of Caffe's layer types, refer to the [layer catalogue](layers.html).

51 changes: 51 additions & 0 deletions docs/tutorial/index.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,51 @@
---
layout: default
---
# Caffe Tutorial

Caffe is a deep learning framework and this tutorial explains its philosophy, architecture, and usage.
This is a practical guide and framework introduction, so the full frontier, context, and history of deep learning cannot be covered here.
While explanations will be given where possible, a background in machine learning and neural networks is helpful.

## Philosophy

In one sip, Caffe is brewed for

- Expression: models and optimizations are defined as plaintext schemas instead of code.
- Speed: for research and industry alike speed is crucial for state-of-the-art models and massive data.
- Modularity: new tasks and settings require flexibility and extension.
- Openness: scientific and applied progress call for common code, reference models, and reproducibility.
- Community: academic research, startup prototypes, and industrial applications all share strength by joint discussion and development in a BSD-2 project.

and these principles direct the project.

## Tour

- [Nets, Layers, and Blobs](net_layer_blob.html): the anatomy of a Caffe model.
- [Forward / Backward](forward_backward.html): the essential computations of layered compositional models.
- [Loss](loss.html): the task to be learned is defined by the loss.
- [Solver](solver.html): the solver coordinates model optimization.
- [Layer Catalogue](layers.html): the layer is the fundamental unit of modeling and computation -- Caffe's catalogue includes layers for state-of-the-art models.
- [Interfaces](interfaces.html): command line, Python, and MATLAB Caffe.
- [Data](data.html): how to caffeinate data for model input.

For a closer look at a few details:

- [Caffeinated Convolution](convolution.html): how Caffe computes convolutions.

## Deeper Learning

There are helpful references freely online for deep learning that complement our hands-on tutorial.
These cover introductory and advanced material, background and history, and the latest advances.

The [Tutorial on Deep Learning for Vision](https://sites.google.com/site/deeplearningcvpr2014/) from CVPR '14 is a good companion tutorial for researchers.
Once you have the framework and practice foundations from the Caffe tutorial, explore the fundamental ideas and advanced research directions in the CVPR '14 tutorial.

A broad introduction is given in the free online draft of [Neural Networks and Deep Learning](http://neuralnetworksanddeeplearning.com/index.html) by Michael Nielsen. In particular the chapters on using neural nets and how backpropagation works are helpful if you are new to the subject.

These recent academic tutorials cover deep learning for researchers in machine learning and vision:

- [Deep Learning Tutorial](http://www.cs.nyu.edu/~yann/talks/lecun-ranzato-icml2013.pdf) by Yann LeCun (NYU, Facebook) and Marc'Aurelio Ranzato (Facebook). ICML 2013 tutorial.
- [LISA Deep Learning Tutorial](http://deeplearning.net/tutorial/deeplearning.pdf) by the LISA Lab directed by Yoshua Bengio (U. Montréal).

For an exposition of neural networks in circuits and code, check out [Understanding Neural Networks from a Programmer's Perspective](http://karpathy.github.io/neuralnets/) by Andrej Karpathy (Stanford).
68 changes: 68 additions & 0 deletions docs/tutorial/interfaces.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,68 @@
---
layout: default
---
# Interfaces

Caffe has command line, Python, and MATLAB interfaces for day-to-day usage, interfacing with research code, and rapid prototyping. While Caffe is a C++ library at heart and it exposes a modular interface for development, not every occasion calls for custom compilation. The cmdcaffe, pycaffe, and matcaffe interfaces are here for you.

## Command Line

The command line interface -- cmdcaffe -- is the `caffe` tool for model training, scoring, and diagnostics. Run `caffe` without any arguments for help. This tool and others are found in caffe/build/tools. (The following example calls require completing the LeNet / MNIST example first.)

**Training**: `caffe train` learns models from scratch, resumes learning from saved snapshots, and fine-tunes models to new data and tasks. All training requires a solver configuration through the `-solver solver.prototxt` argument. Resuming requires the `-snapshot model_iter_1000.solverstate` argument to load the solver snapshot. Fine-tuning requires the `-weights model.caffemodel` argument for the model initialization.

# train LeNet
caffe train -solver examples/mnist/lenet_solver.prototxt
# train on GPU 2
caffe train -solver examples/mnist/lenet_solver.prototxt -gpu 2
# resume training from the half-way point snapshot
caffe train -solver examples/mnist/lenet_solver.prototxt -snapshot examples/mnist/lenet_iter_5000.solverstate

For a full example of fine-tuning, see examples/finetuning_on_flickr_style, but the training call alone is

# fine-tune CaffeNet model weights for style recognition
caffe train -solver examples/finetuning_on_flickr_style/solver.prototxt -weights models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel

**Testing**: `caffe test` scores models by running them in the test phase and reports the net output as its score. The net architecture must be properly defined to output an accuracy measure or loss as its output. The per-batch score is reported and then the grand average is reported last.

#
# score the learned LeNet model on the validation set as defined in the model architeture lenet_train_test.prototxt
caffe test -model examples/mnist/lenet_train_test.prototxt -weights examples/mnist/lenet_iter_10000 -gpu 0 -iterations 100

**Benchmarking**: `caffe time` benchmarks model execution layer-by-layer through timing and synchronization. This is useful to check system performance and measure relative execution times for models.

# (These example calls require you complete the LeNet / MNIST example first.)
# time LeNet training on CPU for 10 iterations
caffe time -model examples/mnist/lenet_train_test.prototxt -iterations 10
# time a model architecture with the given weights on the first GPU for 10 iterations
# time LeNet training on GPU for the default 50 iterations
caffe time -model examples/mnist/lenet_train_test.prototxt -gpu 0

**Diagnostics**: `caffe device_query` reports GPU details for reference and checking device ordinals for running on a given device in multi-GPU machines.

# query the first device
caffe device_query -gpu 0

## Python

The Python interface -- pycaffe -- is the `caffe` module and its scripts in caffe/python. `import caffe` to load models, do forward and backward, handle IO, visualize networks, and even instrument model solving. All model data, derivatives, and parameters are exposed for reading and writing.

- `caffe.Net` is the central interface for loading, configuring, and running models. `caffe.Classsifier` and `caffe.Detector` provide convenience interfaces for common tasks.
- `caffe.SGDSolver` exposes the solving interface.
- `caffe.io` handles input / output with preprocessing and protocol buffers.
- `caffe.draw` visualizes network architectures.
- Caffe blobs are exposed as numpy ndarrays for ease-of-use and efficiency.

Tutorial IPython notebooks are found in caffe/examples: do `ipython notebook caffe/examples` to try them. For developer reference docstrings can be found throughout the code.

Compile pycaffe by `make pycaffe`. The module dir caffe/python/caffe should be installed in your PYTHONPATH for `import caffe`.

## MATLAB

The MATLAB interface -- matcaffe -- is the `caffe` mex and its helper m-files in caffe/matlab. Load models, do forward and backward, extract output and read-only model weights, and load the binaryproto format mean as a matrix.

A MATLAB demo is in caffe/matlab/caffe/matcaffe_demo.m

Note that MATLAB matrices and memory are in column-major layout counter to Caffe's row-major layout! Double-check your work accordingly.

Compile matcaffe by `make matcaffe`.
Loading

0 comments on commit 135786a

Please sign in to comment.