RBDN is an architecture for Generalized Deep Image to Image Regression that features:
- a memory-efficient recursive branched scheme with extensive parameter sharing that computes an early learnable multi-context representation of the input,
- end-to-end preservation of local correspondences from input to output and
- the ability to choose between context and locality depending on the task, as well as to apply a per-pixel multi-context non-linearity.
The core design principle behind RBDN was an analysis of the strengths and weaknesses of a wide range of diverse architectures, followed by an incremental modular construction with thorough empirical testing of each design decision.
Contents
- Architecture: Brief description of the RBDN architecture.
- Experimental Results
- Installation & Usage: Follow these guidelines to set up the RBDN code and reproduce all the results shown on this site and in the paper.
- License & Citation
- Acknowledgments
Architecture
Architecture of the proposed generic RBDN approach with 3 branches. The various branches extract features at multiple scales. Learnable upsampling with efficient parameter sharing is used to recursively upsample the activations of each branch until they merge with the POOL1 output, leading to a cheap multi-context representation of the input. This multi-context map is then subjected to a series of 9 convolutions, which supply ample non-linearity and automatically choose how much context is needed for the task at hand.
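To make the branch-and-merge data flow concrete, here is a minimal numpy sketch. Fixed average pooling and nearest-neighbor upsampling stand in for the network's learned convolution, pooling and learnable-upsampling layers, so only the shape bookkeeping of the multi-context map is meant to be faithful.

```python
# Minimal shape-flow sketch of the 3-branch multi-context representation
# (illustrative only: fixed resampling stands in for the learned layers).
import numpy as np

def avg_pool2x2(x):
    """2x2 average pooling over an (H, W, C) feature map."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def upsample2x(x):
    """Nearest-neighbor 2x upsampling (stand-in for learnable upsampling)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def multi_context(x, num_branches=3):
    """Each branch pools one level deeper, is recursively upsampled back
    to the POOL1 resolution, and is concatenated channel-wise."""
    pool1 = avg_pool2x2(x)                   # main-branch POOL1 output
    feats, branch = [pool1], pool1
    for _ in range(num_branches):
        branch = avg_pool2x2(branch)         # go one scale deeper
        up = branch
        while up.shape[0] < pool1.shape[0]:  # recursively upsample to POOL1 size
            up = upsample2x(up)
        feats.append(up)
    return np.concatenate(feats, axis=-1)    # cheap multi-context map

x = np.random.rand(64, 64, 3)
print(multi_context(x).shape)  # (32, 32, 12): POOL1 + 3 branches, 3 channels each
```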
Experimental Results
RBDN gives state-of-the-art performance on 3 diverse image-to-image regression tasks: Denoising, Relighting, Colorization.
Denoising
A single 3-branch RBDN model trained over a wide range of noise levels outperforms previously proposed noise-specific state-of-the-art models at every noise level.
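To make the blind-denoising training setup concrete, below is a minimal numpy sketch of generating (noisy, clean) training pairs with white Gaussian noise at levels drawn from the training range st_dev in [8, 50]; the uniform per-image sampling here is an illustrative assumption, not necessarily the paper's exact procedure.

```python
# Sketch: generating blind-denoising training pairs over a range of noise
# levels (the uniform sampling scheme is an illustrative assumption).
import numpy as np

def noisy_pair(clean, rng, sigma_range=(8, 50)):
    """clean: float image in [0, 255]. Returns a (noisy, clean) pair with
    white Gaussian noise at a level drawn uniformly from sigma_range."""
    sigma = rng.uniform(*sigma_range)
    noisy = clean + rng.normal(0.0, sigma, size=clean.shape)
    return np.clip(noisy, 0, 255), clean

rng = np.random.default_rng(0)
clean = rng.uniform(0, 255, size=(128, 128))
noisy, target = noisy_pair(clean, rng)
```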
Visual comparison of various denoising approaches on a test image from BSD300 with white Gaussian noise of st_dev=50.
Illustrating the capability of a single RBDN model to handle a range of noise levels (yellow box). Top Row: Noisy test image. Bottom Row: Denoised 3-branch RBDN result.
Illustrating RBDN’s ability to reliably denoise at st_dev=55, outside our training bounds (st_dev in [8, 50]). The 18-layer DnCNN (despite using st_dev=55 for training) is outperformed by our 9-layer RBDN. Red, Yellow, Green boxes show the PSNR.
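For reference, the PSNR values reported in these boxes follow the standard peak signal-to-noise ratio; a minimal sketch, assuming 8-bit images with peak value 255:

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images."""
    mse = np.mean((np.asarray(reference, float) - np.asarray(estimate, float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```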
Relighting
The goal is to render faces captured under various unknown lighting conditions into a fixed lighting condition. Odd Rows: Inputs. Even Rows: 3-branch RBDN outputs. Note that the model is trained exclusively on frontal face images with constrained illumination variations from CMU-MultiPie, yet still generalizes reasonably well to unconstrained face images in Janus-CS0 under a variety of poses, illuminations, expressions, occlusions and affordances (hats, glasses, etc.).
Analyzing RBDN with different numbers of branches for relighting a subject from the CMU-MultiPie validation set. Top Row: Input images (ground truth is the top-left image). Second Row: No branches (strong artifacts can be seen). Third to Sixth Rows: RBDN outputs for 1, 2, 3 and 4 branches respectively. Results improve as the number of branches increases up to 3 branches; the network starts overfitting at 4 branches.
Colorization
We first transform a color image into the YCbCr color space and predict the chroma (Cb, Cr) channels from the luminance (Y) channel input using RBDN. The input Y channel is then combined with the predicted Cb, Cr channels and converted back to RGB to yield the predicted color image. We denote this model as RBDN-YCbCr.
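A minimal sketch of this inference pipeline, using scikit-image for the color conversions; `predict_cbcr` is a hypothetical placeholder for the trained RBDN forward pass:

```python
# Sketch of the RBDN-YCbCr inference pipeline (predict_cbcr is a placeholder).
import numpy as np
from skimage import color

def predict_cbcr(y):
    # Placeholder for the RBDN forward pass: returns neutral chroma (128)
    # so that the sketch runs end to end.
    return np.full(y.shape + (2,), 128.0)

def colorize_ycbcr(rgb):
    """rgb: float image in [0, 1]. Returns the re-colorized RGB image."""
    ycbcr = color.rgb2ycbcr(rgb)
    y = ycbcr[..., 0]                  # luminance input to the network
    cbcr = predict_cbcr(y)             # predicted chroma channels
    out = np.dstack([y, cbcr[..., 0], cbcr[..., 1]])
    return np.clip(color.ycbcr2rgb(out), 0, 1)

print(colorize_ycbcr(np.random.rand(64, 64, 3)).shape)  # (64, 64, 3)
```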
Inspired by the recently proposed Colorful Colorizations approach, we train another RBDN model which takes as input the L channel of a color image in Lab space and tries to predict a 313-dimensional vector of probabilities for each pixel (corresponding to the 313 ab pairs resulting from quantizing the ab space with a grid size of 10). The problem is subsequently treated as multinomial classification and we use a softmax cross-entropy loss with class re-balancing. During inference, we use the annealed mean of the softmax distribution to obtain the predicted ab channels. We denote this model as RBDN-Lab.
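A minimal numpy sketch of the annealed-mean decoding step; the temperature T=0.38 follows the Colorful Colorizations paper, while the ab bin construction below is an illustrative assumption (the real model keeps only the 313 bins that fall inside the sRGB gamut):

```python
# Sketch of annealed-mean decoding for the ab classification head.
import numpy as np

# Illustrative ab bin centers on a grid of size 10; the real model retains
# only the 313 in-gamut bins out of this full grid.
grid = np.arange(-110, 120, 10, dtype=np.float64)
ab_bins = np.stack(np.meshgrid(grid, grid), -1).reshape(-1, 2)  # (529, 2)

def annealed_mean(probs, bins, T=0.38):
    """probs: (..., K) per-pixel softmax outputs; bins: (K, 2) ab centers.
    Re-sharpens the distribution with temperature T, then takes its mean,
    interpolating between the (spatially smooth) mean and (vivid) mode."""
    logits = np.log(np.clip(probs, 1e-12, None)) / T
    logits -= logits.max(axis=-1, keepdims=True)   # numerical stability
    q = np.exp(logits)
    q /= q.sum(axis=-1, keepdims=True)
    return q @ bins                                # expected (a, b) per pixel

probs = np.random.dirichlet(np.ones(len(ab_bins)), size=(4, 4))
print(annealed_mean(probs, ab_bins).shape)         # (4, 4, 2)
```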
Colorization results for images from the MS-COCO test set. The 3- and 4-branch RBDN-YCbCr models produce decent colorizations, but they are very dull and highly under-saturated. The colorizations of RBDN-Lab have a higher saturation and appear more colorful for all images.
Colorizing legacy black-and-white photos: comparing the 4-branch RBDN-Lab with the Colorful Colorizations model.
Installation & Usage
- Clone: Run `git clone -b master --single-branch https://github.com/venkai/RBDN.git`
- Setup: Go to the repository (`cd RBDN`) and run `./setup.sh`. This will fetch caffe, download pretrained caffe models for all 3 experiments (denoising/relighting/colorization) along with inference data, and set up the directory structure and symbolic links for all the training/inference scripts.
- Install Caffe: Note that `setup.sh` pulls 2 different branches of caffe into 2 separate directories: namely `caffe_colorization`, which is used for colorization, and `caffe_rbdn`, which is used for both the denoising and relighting experiments. Both these branches will eventually be merged with the master branch in venkai/caffe. However, for now you have to install both these caffe versions separately if you want to perform all 3 experiments.
- Data:
  - Inference data is automatically downloaded by `setup.sh`.
  - Training data/imglist for the relighting experiment can be downloaded from either of these mirrors: [1]/[2]. This downloads the file `multipie.tar.gz`. Move it to `./data/training` and run `tar xvzf multipie.tar.gz && rm multipie.tar.gz`.
  - The denoising/colorization experiments use the same training data/imglist: every single unresized train & validation image from both ImageNet ILSVRC2012 and MS-COCO2014 whose smallest spatial dimension is greater than 128 (~1.7 million images in total). You can simply download these datasets from their respective sources and place/symlink them within `./data/training/` without any preprocessing whatsoever. Place the appropriate imglist in `./data/training/imgset/train.txt`, with the image paths in `train.txt` being relative to `./data/training` (a sketch for generating such an imglist appears after this list).
  - Note that data folders are not tracked by git.
- Inference: Each experiment (denoising/relighting/colorization) has its own folder in `./inference` that contains an experiment-specific MATLAB inference script `get_pred.m`, which uses the Matcaffe interface to evaluate the pretrained models in `./models`. The script `./inference/run_matcaffe.sh` can be used to load caffe dependencies into `LD_LIBRARY_PATH` and then start MATLAB interactively.
- Training: Each experiment (denoising/relighting/colorization) has its own folder in `./training` that contains 2 key experiment-specific scripts:
  - `start_train.sh`: Starts training an RBDN model, either from scratch or from the most recent snapshot in the `snapshot` directory. You can pause training at any moment with `Ctrl+C` and the most recent snapshot will be saved as `./snapshot/trn_iter_[*].solverstate`. Running `./start_train.sh` again will automatically resume from that snapshot.
  - `run_bn.sh`: Takes the most recent snapshot in `./snapshot` and prepares it for inference by passing training data through the network and computing the global mean/variance for all the batch-normalization layers in the network (see the sketch after this list). The resulting inference-ready model is saved as `./tst_[ITER].caffemodel`, where `ITER` is the iteration corresponding to the most recent snapshot.
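As referenced in the Data section above, here is a minimal sketch for generating the denoising/colorization imglist: it walks `./data/training`, keeps images whose smallest spatial dimension exceeds 128, and writes paths relative to `./data/training` (the file-type filtering and directory layout are illustrative assumptions).

```python
# Sketch: build ./data/training/imgset/train.txt from images whose smallest
# spatial dimension exceeds 128 pixels (directory layout is illustrative).
import os
from PIL import Image

DATA_ROOT = "./data/training"
os.makedirs(os.path.join(DATA_ROOT, "imgset"), exist_ok=True)

with open(os.path.join(DATA_ROOT, "imgset", "train.txt"), "w") as f:
    for dirpath, _, files in os.walk(DATA_ROOT):
        for name in files:
            if not name.lower().endswith((".jpg", ".jpeg", ".png")):
                continue
            path = os.path.join(dirpath, name)
            with Image.open(path) as im:
                w, h = im.size
            if min(w, h) > 128:
                # Image paths must be relative to ./data/training
                f.write(os.path.relpath(path, DATA_ROOT) + "\n")
```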
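And here is a conceptual numpy stand-in for what `run_bn.sh` computes for each batch-normalization layer: dataset-wide mean and variance accumulated by streaming activations through the network (a sketch of the idea, not the actual caffe implementation).

```python
# Sketch: accumulate global per-channel mean/variance for a BN layer by
# streaming activation batches (conceptual stand-in for run_bn.sh).
import numpy as np

def global_bn_stats(batches):
    """batches: iterable of (N, C) activation arrays for one BN layer.
    Returns the dataset-wide per-channel mean and (biased) variance."""
    n, s, sq = 0, 0.0, 0.0
    for x in batches:
        n += x.shape[0]
        s += x.sum(axis=0)           # running sum
        sq += (x ** 2).sum(axis=0)   # running sum of squares
    mean = s / n
    return mean, sq / n - mean ** 2

batches = [np.random.randn(32, 8) for _ in range(10)]
mu, var = global_bn_stats(batches)
```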
License & Citation
RBDN is released under a variant of the BSD 2-Clause license.
If you find RBDN useful in your research, please consider citing our paper:
@article{santhanam2016generalized,
title={Generalized Deep Image to Image Regression},
author={Santhanam, Venkataraman and Morariu, Vlad I and Davis, Larry S},
journal={arXiv preprint arXiv:1612.03268},
year={2016}
}
Acknowledgments
- We would like to thank Yangqing Jia, Evan Shelhamer and the BVLC/BAIR team for creating & maintaining caffe, Richard Zhang for the colorization layers in caffe, and Hyeonwoo Noh, Seunghoon Hong and Dmytro Mishkin for several useful caffe layers, all of which were instrumental in creating RBDN.
- This research is based upon work supported by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via IARPA R&D Contract No. 2014-14071600012. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the ODNI, IARPA, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon.