Computation Layers¶

class ArgmaxLayer¶

Compute the arg-max along the channel dimension. This layer is used only in the test network to produce predicted classes, and it does not support back propagation.

tops¶
bottoms¶: Blob names for output and input.
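
As a rough illustration of what the arg-max over channels means for a single sample, the following Python sketch picks the channel with the largest score. The function name `argmax_over_channels` is hypothetical, and the 0-based index it returns is simply Python's convention; the index convention used by the layer itself is not stated here.

```python
def argmax_over_channels(scores):
    """Return the position of the largest value in a list of per-channel scores.

    Illustrative only: indices are 0-based because this is Python; ties are
    resolved in favour of the first maximum.
    """
    best = 0
    for c in range(1, len(scores)):
        if scores[c] > scores[best]:
            best = c
    return best


# Three channels (classes); the second one has the highest score.
predicted = argmax_over_channels([0.1, 0.7, 0.2])
```

Since the result is a discrete index, there is no useful derivative with respect to the input scores.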

class ChannelPoolingLayer¶

1D pooling over the channel dimension.

kernel¶: Default 1, pooling kernel size.

stride¶: Default 1, stride for pooling.

pad¶: Default (0,0), a 2-tuple specifying padding in the front and the end.

pooling¶: Default Pooling.Max(). Specify the pooling function to use.

tops¶
bottoms¶: Blob names for output and input.

class ConvolutionLayer¶

Convolution in the spatial dimensions.

kernel¶: Default (1,1), a 2-tuple specifying the width and height of the convolution filters.

stride¶: Default (1,1), a 2-tuple specifying the stride in the width and height dimensions, respectively.

pad¶: Default (0,0), a 2-tuple specifying the two-sided padding in the width and height dimensions, respectively.

n_filter¶: Default 1. Number of filters.

n_group¶: Default 1. Number of groups. This number must divide both n_filter and the number of channels in the input blob. The input blob is split along the channel dimension into n_group groups, and each group is processed independently with its own n_filter / n_group filters.

neuron¶: Default Neurons.Identity(), can be used to specify an activation function for the convolution outputs.

filter_init¶: Default XavierInitializer(). The initializer for the filters.

bias_init¶: Default ConstantInitializer(0). The initializer for the bias.

filter_regu¶: Default L2Regu(1), the regularizer for the filters.

bias_regu¶: Default NoRegu(), the regularizer for the bias.

filter_lr¶: Default 1.0. The local learning rate for the filters.

bias_lr¶: Default 2.0. The local learning rate for the bias.
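
The arithmetic behind n_group can be sketched in a few lines of Python. This is only an illustration of the divisibility rule and of the n_filter / n_group split described above; the helper `group_layout` is hypothetical, and assigning contiguous channel and filter ranges to each group is an assumption rather than something this reference specifies.

```python
def group_layout(n_channels, n_filter, n_group):
    """Return, per group, the (input channel indices, filter indices) it owns.

    Assumption: channels and filters are assigned to groups in contiguous
    blocks. Indices are 0-based Python indices.
    """
    if n_channels % n_group != 0 or n_filter % n_group != 0:
        raise ValueError("n_group must divide both n_filter and the channel count")
    ch_per_group = n_channels // n_group
    fl_per_group = n_filter // n_group
    layout = []
    for g in range(n_group):
        channels = list(range(g * ch_per_group, (g + 1) * ch_per_group))
        filters = list(range(g * fl_per_group, (g + 1) * fl_per_group))
        layout.append((channels, filters))
    return layout


# 4 input channels, 6 filters, 2 groups: each group sees 2 channels
# and owns 3 filters.
layout = group_layout(4, 6, 2)
```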

class CropLayer¶

Do image cropping. This layer is primarily used directly on top of a data layer, so back propagation is currently not implemented.

crop_size¶: A (width, height) tuple of the size of the cropped image.

random_crop¶: Default false. When enabled, place the cropping box at a random position instead of at the center. This is useful for producing random perturbations of the input images during training.

random_mirror¶: Default false. When enabled, randomly (with probability 0.5) mirror the input images (flip the width dimension).

tops¶
bottoms¶: Blob names for output and input.

class DropoutLayer¶

Dropout is typically used during training, and it has been demonstrated to be an effective regularizer for large-scale networks. Dropout operates by randomly “turning off” some responses. Specifically, the forward computation is

\[\begin{split}y = \begin{cases}\frac{x}{1-p} & u > p \\ 0 & u \leq p\end{cases}\end{split}\]

where \(u\) is a random number uniformly distributed in [0,1], and \(p\) is the ratio hyper-parameter. Note the surviving outputs are scaled by \(1/(1-p)\) so that \(\mathbb{E}[y] = x\).

ratio¶: The probability \(p\) of turning off a response. It can also be interpreted as the expected fraction of responses that are turned off.

bottoms¶: The names of the input blobs dropout operates on. Note this is an in-place layer, so there is no tops property. The output blobs will be the same as the input blobs.
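
The forward rule above can be written out as a minimal Python sketch. It is not the layer's implementation: the function `dropout_forward` and its `rng` parameter are hypothetical, and the sketch returns a new list rather than working in place. Each response is kept when \(u > p\) and scaled by \(1/(1-p)\), which keeps the expected output equal to the input.

```python
import random


def dropout_forward(xs, ratio, rng=random):
    """Apply the dropout forward rule to a list of responses.

    A response is kept (and scaled by 1 / (1 - ratio)) when the uniform
    draw u satisfies u > ratio; otherwise it is set to zero.
    Requires 0 <= ratio < 1.
    """
    scale = 1.0 / (1.0 - ratio)
    out = []
    for x in xs:
        u = rng.random()  # uniform draw in [0, 1)
        out.append(x * scale if u > ratio else 0.0)
    return out


# With ratio 0.5, every output is either 0 or twice the input, and the
# average over many responses stays close to the input value.
rng = random.Random(0)
ys = dropout_forward([1.0] * 10000, 0.5, rng)
mean_y = sum(ys) / len(ys)
```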

class ElementWiseLayer¶

Element-wise layer implements basic element-wise operations on inputs.

operation¶: Element-wise operation. Built-in operations are in module ElementWiseFunctors, including Add, Subtract, Multiply and Divide.

tops¶: Output blob names, only one output blob is allowed.

bottoms¶: Input blob names, count must match the number of inputs operation takes.

class InnerProductLayer¶

Densely connected linear layer. The output is computed as

\[y_i = \sum_j w_{ij}x_j + b_i\]

where \(w_{ij}\) are the weights and \(b_i\) are the biases.

output_dim¶: Output dimension of the linear map. The input dimension is determined automatically from the inputs.

weight_init¶: Default XavierInitializer(). Specify how the weights \(w_{ij}\) should be initialized.

bias_init¶: Default ConstantInitializer(0), initializing the bias \(b_i\) to 0.

weight_regu¶: Default L2Regu(1). Regularizer for the weights.

bias_regu¶: Default NoRegu(). Regularizer for the bias. Typically no regularization should be applied to the bias.

weight_lr¶: Default 1.0. The local learning rate for the weights.

bias_lr¶: Default 2.0. The local learning rate for the bias.

neuron¶: Default Neurons.Identity(), an optional activation function for the output of this layer.

tops¶
bottoms¶: Blob names for output and input.
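
For a single input vector, the linear map \(y_i = \sum_j w_{ij}x_j + b_i\) can be sketched in plain Python as follows. The function name and the nested-list weight layout (one row per output) are illustrative assumptions, and the optional neuron activation is left out.

```python
def inner_product_forward(w, x, b):
    """Compute y_i = sum_j w[i][j] * x[j] + b[i] for one input vector.

    w is a list of rows (one row per output dimension), x is the input
    vector, and b holds one bias per output dimension.
    """
    return [
        sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
        for row, b_i in zip(w, b)
    ]


# output_dim = 3, input dimension = 2 (inferred from the length of x).
w = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
x = [1.0, -1.0]
b = [0.5, 0.0, -0.5]
y = inner_product_forward(w, x, b)
```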

class LRNLayer¶

Local Response Normalization Layer. It performs normalization over local input regions via the following mapping

\[x \rightarrow y = \frac{x}{\left( \beta + (\alpha/n)\sum_{x_j\in N(x)}x_j^2 \right)^p}\]

Here \(\beta\) is the shift, \(\alpha\) is the scale, \(p\) is the power, and \(n\) is the size of the local neighborhood. \(N(x)\) denotes the local neighborhood of \(x\) of size \(n\) (including \(x\) itself). There are two types of local neighborhood:

LRNMode.AcrossChannel(): The local neighborhood is a region of shape (1, 1, \(k\), 1) centered at \(x\). In other words, the region extends across nearby channels (with zero padding if needed), but has no spatial extent. Here \(k\) is the kernel size, and \(n=k\) in this case.
LRNMode.WithinChannel(): The local neighborhood is a region of shape (\(k\), \(k\), 1, 1) centered at \(x\). In other words, the region extends spatially (in both the width and the height dimensions), again with zero padding when needed, but it does not extend across different channels. In this case \(n=k^2\).

kernel¶: Default 5, an integer indicating the kernel size. See \(k\) in the descriptions above.

scale¶: Default 1.

shift¶: Default 1 (yes, 1, not 0).

power¶: Default 0.75.

mode¶: Default LRNMode.AcrossChannel().

tops¶
bottoms¶: Names for output and input blobs. Only one input and one output blob are allowed.
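
To make the across-channel case concrete, here is a small Python sketch that applies the mapping above to the channel values at a single spatial location. The function `lrn_across_channel` is hypothetical, it assumes an odd kernel size so that the window is centered, and it treats zero padding as simply ignoring neighbors that fall outside the channel range (which contributes the same sum of squares).

```python
def lrn_across_channel(x, kernel=5, scale=1.0, shift=1.0, power=0.75):
    """Normalize each channel value by its neighborhood across channels.

    x holds the values of all channels at one spatial location.
    Implements y = x / (shift + (scale / n) * sum of squares) ** power,
    with n = kernel. Assumes an odd kernel size.
    """
    half = kernel // 2
    n = kernel
    out = []
    for c in range(len(x)):
        lo = max(0, c - half)           # clip the window at the boundaries;
        hi = min(len(x), c + half + 1)  # padded zeros would add nothing
        sum_sq = sum(v * v for v in x[lo:hi])
        out.append(x[c] / (shift + (scale / n) * sum_sq) ** power)
    return out


# Kernel 3, scale 3, shift 1, power 1 gives easy denominators:
# 1 + (1 + 4) = 6, 1 + (1 + 4 + 9) = 15, 1 + (4 + 9) = 14.
y = lrn_across_channel([1.0, 2.0, 3.0], kernel=3, scale=3.0, shift=1.0, power=1.0)
```

The default shift of 1 keeps the denominator at least 1, so an all-zero neighborhood never causes a division by zero.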

class PoolingLayer¶

2D pooling over the two image dimensions (width and height).

kernel¶: Default (1,1), a 2-tuple of integers specifying pooling kernel width and height, respectively.

stride¶: Default (1,1), a 2-tuple of integers specifying pooling stride in the width and height dimensions respectively.

pad¶: Default (0,0), a 2-tuple of integers specifying the padding in the width and height dimensions respectively. Paddings are two-sided, so a pad of (1,0) will pad one pixel on both the left and the right boundaries of an image.

pooling¶: Default Pooling.Max(). Specify the pooling operation to use.

tops¶
bottoms¶: Blob names for output and input.

class PowerLayer¶

Power layer performs element-wise operations as

\[y = (ax + b)^p\]

where \(a\) is scale, \(b\) is shift, and \(p\) is power. During back propagation, the following element-wise derivatives are computed:

\[\frac{\partial y}{\partial x} = pa(ax + b)^{p-1}\]

Power layer is implemented separately rather than as an element-wise layer because many special cases of the power layer can be computed more efficiently.

power¶: Default 1

scale¶: Default 1

shift¶: Default 0

tops¶
bottoms¶: Blob names for output and input.
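
The forward formula and its derivative can be checked against each other with a short Python sketch. The function names are hypothetical, the code handles one scalar at a time, and it does not attempt any of the special-case optimizations mentioned above.

```python
def power_forward(x, power=1.0, scale=1.0, shift=0.0):
    """y = (scale * x + shift) ** power for a single value."""
    return (scale * x + shift) ** power


def power_backward(x, power=1.0, scale=1.0, shift=0.0):
    """dy/dx = power * scale * (scale * x + shift) ** (power - 1)."""
    return power * scale * (scale * x + shift) ** (power - 1)


# With scale 2, shift 1, power 2 at x = 3: y = 7 ** 2 and dy/dx = 2 * 2 * 7.
y = power_forward(3.0, power=2.0, scale=2.0, shift=1.0)
dy_dx = power_backward(3.0, power=2.0, scale=2.0, shift=1.0)

# A central finite difference gives an independent estimate of the derivative.
h = 1e-6
numeric = (
    power_forward(3.0 + h, 2.0, 2.0, 1.0) - power_forward(3.0 - h, 2.0, 2.0, 1.0)
) / (2 * h)
```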

class ReshapeLayer¶

Reshape a blob. Can be useful if, for example, you want to make the flat output from an InnerProductLayer meaningful by assigning each dimension spatial information.

Internally there is no data copying going on. The total number of elements in the blob tensor after reshaping should be the same as the original blob tensor.

width¶: Default 1. The new width after reshaping.

height¶: Default 1. The new height after reshaping.

channels¶: Default 1. The new number of channels after reshaping.

tops¶
bottoms¶: Blob names for output and input.
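
The element-count requirement can be expressed as a simple check. The sketch below is illustrative only: `can_reshape` is a hypothetical helper, and it considers just the (width, height, channels) dimensions of a single sample.

```python
def can_reshape(old_shape, new_shape):
    """Return True when both shapes hold the same number of elements.

    Shapes are (width, height, channels) tuples for one sample.
    """
    def count(shape):
        total = 1
        for dim in shape:
            total *= dim
        return total

    return count(old_shape) == count(new_shape)


# A flat 256-dimensional output can be viewed as a 16 x 16 single-channel
# image, but not as a 10 x 10 image with 2 channels (200 elements).
ok = can_reshape((1, 1, 256), (16, 16, 1))
bad = can_reshape((1, 1, 256), (10, 10, 2))
```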

class SoftmaxLayer¶

Compute softmax over the channel dimension. The inputs \(x_1,\ldots,x_C\) are mapped as

\[\sigma(x_1,\ldots,x_C) = (\sigma_1,\ldots,\sigma_C) = \left(\frac{e^{x_1}}{\sum_j e^{x_j}},\ldots,\frac{e^{x_C}}{\sum_je^{x_j}}\right)\]
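
A minimal Python sketch of this mapping for the channel values of one sample is given below. Subtracting the maximum before exponentiating is a common numerical-stability trick that leaves the result unchanged; it is an addition for this illustration, not something stated in this reference, and the function name is hypothetical.

```python
import math


def softmax(xs):
    """Map x_1..x_C to e^{x_i} / sum_j e^{x_j}.

    The maximum is subtracted first so that large inputs do not overflow;
    the shift cancels between numerator and denominator.
    """
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


# The outputs are positive, sum to one, and preserve the input ordering.
probs = softmax([1.0, 2.0, 3.0])
```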

class SplitLayer¶

Split layer produces identical copies [1] of the input. The number of copies is determined by the length of the tops property. During back propagation, derivatives from all the output copies are added together and propagated down.

This layer is typically used as a helper to implement some more complicated layers.

bottoms¶: Input blob names, only one input blob is allowed.

tops¶: Output blob names. There should be more than one output blob.

[1]	All the data is shared, so no data is actually copied.
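
The forward and backward behavior of the split layer can be sketched in Python as follows. This is an illustration under simple assumptions: blobs are modelled as plain lists, the function names are hypothetical, and sharing is represented by returning references to the same list object.

```python
def split_forward(x, n_tops):
    """Return n_tops references to the same input; nothing is copied."""
    return [x] * n_tops


def split_backward(top_grads):
    """Add the gradients from all output copies element by element."""
    return [sum(parts) for parts in zip(*top_grads)]


x = [1.0, 2.0]
tops = split_forward(x, 3)

# Each consumer of a copy sends back its own gradient; the input receives
# their element-wise sum.
bottom_grad = split_backward([[1.0, 2.0], [0.5, 0.5], [1.0, -1.0]])
```

Summing the gradients follows from the chain rule: the input influences the loss through every copy, so the contributions add up.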