This is a torch implementation of the paper A Neural Algorithm of Artistic Style by Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge.
The paper presents an algorithm for combining the content of one image with the style of another image using convolutional neural networks. Here's an example that maps the artistic style of The Starry Night onto a night-time photograph of the Stanford campus:
Dependencies:
- torch7
- loadcaffe
Optional dependencies:
- CUDA 6.5+
- cudnn.torch
After installing dependencies, you'll need to run the following script to download the VGG model:
sh models/download_models.sh
Basic usage:
th neural_style.lua -style_image <image.jpg> -content_image <image.jpg>
Options:
- `-image_size`: Maximum side length (in pixels) of the generated image. Default is 512.
- `-gpu`: Zero-indexed ID of the GPU to use; for CPU mode set `-gpu` to -1.
Optimization options:
- `-content_weight`: How much to weight the content reconstruction term. Default is 8.0.
- `-style_weight`: How much to weight the style reconstruction term. Default is 1.0.
- `-num_iterations`: Default is 500.
- `-learning_rate`: Default is 10.0.
- `-momentum`: Default is 0.9.
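Conceptually, these weights scale the two terms of a single objective that is minimized by gradient descent with momentum. A rough sketch, assuming hypothetical `content_loss` and `style_loss` helpers that return a loss value and its gradient (the real script computes these via loss modules inserted into the network):

```lua
require 'optim'

-- Sketch: weighted sum of content and style terms, minimized with optim.sgd.
-- content_loss(x) and style_loss(x) are assumed helpers returning (value, gradient).
local content_weight, style_weight = 8.0, 1.0   -- the defaults listed above
local config = {learningRate = 10.0, momentum = 0.9}

local function feval(x)
  local c, grad_c = content_loss(x)
  local s, grad_s = style_loss(x)
  local loss = content_weight * c + style_weight * s
  local grad = grad_c:mul(content_weight):add(grad_s:mul(style_weight))
  return loss, grad
end

for t = 1, 500 do   -- -num_iterations
  optim.sgd(feval, img, config)
end
```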
Output options:
- `-output_image`: Name of the output image. Default is `out.png`.
- `-print_iter`: Print progress every `print_iter` iterations. Set to 0 to disable printing.
- `-save_iter`: Save the image every `save_iter` iterations. Set to 0 to disable saving intermediate results.
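Inside the optimization loop, the printing and saving options behave roughly like the sketch below; `deprocess` is a hypothetical helper that undoes the preprocessing (mean subtraction, channel ordering) before writing the image:

```lua
require 'image'

-- Sketch of -print_iter / -save_iter behaviour; t is the current iteration.
local function maybe_print(t, loss, print_iter)
  if print_iter > 0 and t % print_iter == 0 then
    print(string.format('Iteration %d, loss %f', t, loss))
  end
end

local function maybe_save(t, img, save_iter)
  if save_iter > 0 and t % save_iter == 0 then
    image.save(string.format('out_%d.png', t), deprocess(img))
  end
end
```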
Other options:
- `-proto_file`: Path to the `deploy.txt` file for the VGG Caffe model.
- `-model_file`: Path to the `.caffemodel` file for the VGG Caffe model.
- `-backend`: `nn` or `cudnn`. Default is `nn`. `cudnn` requires cudnn.torch.
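Under the hood, `-proto_file`, `-model_file`, and `-backend` correspond roughly to a loadcaffe call like the one below; the file paths are placeholders:

```lua
require 'loadcaffe'

-- Sketch: load the VGG Caffe model with the chosen backend ('nn' or 'cudnn').
local proto_file = 'models/VGG_deploy.prototxt'   -- placeholder path
local model_file = 'models/VGG.caffemodel'        -- placeholder path
local cnn = loadcaffe.load(proto_file, model_file, 'nn')

-- For GPU mode, the network and image tensors are moved to CUDA:
-- require 'cutorch'; require 'cunn'
-- cnn:cuda()
```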
On a GTX Titan X, running 1000 iterations of gradient descent with `-image_size=512` takes about 2 minutes. In CPU mode on an Intel Core i7-4790k, the same run takes around 40 minutes.
Most of the examples shown here were run for 2000 iterations, but with a bit of parameter tuning most images will
give good results within 1000 iterations.
Images are initialized with white noise and optimized using gradient descent with momentum.
We perform style reconstructions using the `conv1_1`, `conv2_1`, `conv3_1`, and `conv4_1` layers, and content reconstructions using the `conv2_1` layer. The losses at the four style reconstruction layers are weighted using hand-tuned weights; these are probably not optimal, but they seem to give good results for a variety of images.
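For reference, the style reconstruction term at each layer compares Gram matrices of feature maps, as in the paper. A minimal sketch (function names and the normalization constant are illustrative):

```lua
-- Gram matrix of a C x H x W feature map: correlations between channels.
local function gram_matrix(features)
  local C, H, W = features:size(1), features:size(2), features:size(3)
  local flat = features:contiguous():view(C, H * W)
  return torch.mm(flat, flat:t()):div(C * H * W)   -- normalization is illustrative
end

-- Style loss at one layer: squared distance between the Gram matrices of the
-- generated image's features and the style image's features, scaled by a weight.
local function style_loss(gen_features, style_features, weight)
  local G = gram_matrix(gen_features)
  local A = gram_matrix(style_features)
  return weight * torch.sum(torch.pow(G - A, 2))
end
```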