Operator Schemas

This file is automatically generated from the def files via this script. Do not modify it directly; edit the operator definitions instead.

Abs
Add
ArgMax
ArgMin
AveragePool
BatchNormalization
Cast
Ceil
Concat
Constant
Conv
ConvTranspose
Div
Dropout
Elu
Exp
Flatten
Floor
Gather
Gemm
GlobalAveragePool
GlobalMaxPool
LRN
LeakyRelu
Log
MatMul
Max
MaxPool
Min
Mul
Neg
OptimizedRNN
PRelu
Pad
Pow
RandomNormal
RandomNormalLike
RandomUniform
RandomUniformLike
Reciprocal
ReduceLogSumExp
ReduceMax
ReduceMean
ReduceMin
ReduceProd
ReduceSum
Relu
Reshape
Selu
Sigmoid
Slice
Softmax
Split
Sqrt
Squeeze
Sub
Sum
Tanh
Transpose
_experimental ATen
_experimental Caffe2ConvTranspose
_experimental ConstantFill
_experimental FC
_experimental GRUUnit
_experimental GivenTensorFill
_experimental Normalize
_experimental Scale
_experimental SpatialBN

Abs

Absolute takes one input data (Tensor) and produces one output data (Tensor) where the absolute value, y = abs(x), is applied to the tensor elementwise.

Inputs

X: Input tensor

Outputs

Y: Output tensor

Add

Performs element-wise binary addition (with limited broadcast support).

If necessary, the right-hand-side argument is broadcast to match the shape of the left-hand-side argument. When broadcasting is enabled, the second tensor can either be of size 1 (a scalar value) or have a shape that is a contiguous subset of the first tensor's shape. The start of the mutually equal shape is specified by the argument "axis"; if it is not set, suffix matching is assumed. 1-dim expansion doesn't work yet.

For example, the following tensor shapes are supported (with broadcast=1):

shape(A) = (2, 3, 4, 5), shape(B) = (,), i.e. B is a scalar
shape(A) = (2, 3, 4, 5), shape(B) = (5,)
shape(A) = (2, 3, 4, 5), shape(B) = (4, 5)
shape(A) = (2, 3, 4, 5), shape(B) = (3, 4), with axis=1
shape(A) = (2, 3, 4, 5), shape(B) = (2,), with axis=0

Attribute broadcast=1 needs to be passed to enable broadcasting.

Attributes

axis : int: If set, defines the broadcast dimensions. See doc for details.
broadcast : int: Pass 1 to enable broadcasting

Inputs

A: First operand, should share the type with the second operand.
B: Second operand. With broadcasting can be of smaller size than A. If broadcasting is disabled it should be of the same size.

Outputs

C: Result, has same dimensions and type as A
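
The shape rule described above can be sketched as a small check. This is only an illustration of the stated rule, not the operator's implementation; the function name `legacy_broadcast_shape_ok` is hypothetical, and treating a shape of `(1,)` as "size 1" is an assumption.

```python
def legacy_broadcast_shape_ok(shape_a, shape_b, axis=None):
    """Return True if shape_b may be broadcast to shape_a under the rule above."""
    shape_a, shape_b = tuple(shape_a), tuple(shape_b)
    # A tensor of size 1 (a scalar value) always broadcasts.
    if len(shape_b) == 0 or shape_b == (1,):
        return True
    if len(shape_b) > len(shape_a):
        return False
    # Without an axis, suffix matching is assumed.
    start = len(shape_a) - len(shape_b) if axis is None else axis
    if start < 0 or start + len(shape_b) > len(shape_a):
        return False
    # shape_b must equal a contiguous slice of shape_a starting at `start`.
    return shape_a[start:start + len(shape_b)] == shape_b


a = (2, 3, 4, 5)
print(legacy_broadcast_shape_ok(a, (4, 5)))          # suffix match
print(legacy_broadcast_shape_ok(a, (3, 4), axis=1))  # match starting at axis 1
print(legacy_broadcast_shape_ok(a, (3, 4)))          # no axis given, suffix does not match
```

The same check applies to the other element-wise binary operators that share this broadcasting description.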

ArgMax

Computes the indices of the max elements of the input tensor along the provided axis. The resulting tensor has the same rank as the input if keepdims equals 1. If keepdims equals 0, the reduced dimension is pruned from the resulting tensor. The type of the output tensor is integer.

Attributes

axis : int: The axis along which to compute the arg indices
keepdims : int: Whether to keep the reduced dimension; the default, 1, means the reduced dimension is kept.

Inputs

data: An input tensor.

Outputs

reduced: Reduced output tensor with integer data type.
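
As a rough illustration of how keepdims affects the output shape, here is a pure-Python sketch for a 2-D input stored as nested lists. The helper name `argmax_2d` is hypothetical, and picking the first index on ties is an assumption that the text above does not specify.

```python
def argmax_2d(data, axis, keepdims=1):
    """Indices of the max elements of a 2-D nested list along `axis`."""
    rows, cols = len(data), len(data[0])
    if axis == 0:
        # One index per column: the row that holds the column maximum.
        idx = [max(range(rows), key=lambda r: data[r][c]) for c in range(cols)]
        return [idx] if keepdims else idx
    if axis == 1:
        # One index per row: the column that holds the row maximum.
        idx = [max(range(cols), key=lambda c: row[c]) for row in data]
        return [[i] for i in idx] if keepdims else idx
    raise ValueError("axis must be 0 or 1 for a 2-D input")


data = [[1, 9, 3],
        [7, 2, 8]]
print(argmax_2d(data, axis=1, keepdims=1))  # rank preserved: shape (2, 1)
print(argmax_2d(data, axis=1, keepdims=0))  # reduced dimension pruned: shape (2,)
```

ArgMin follows the same pattern with `min` in place of `max`.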

ArgMin

Computes the indices of the min elements of the input tensor along the provided axis. The resulting tensor has the same rank as the input if keepdims equals 1. If keepdims equals 0, the reduced dimension is pruned from the resulting tensor. The type of the output tensor is integer.

Attributes

axis : int: The axis along which to compute the arg indices
keepdims : int: Whether to keep the reduced dimension; the default, 1, means the reduced dimension is kept.

Inputs

data: An input tensor.

Outputs

reduced: Reduced output tensor with integer data type.

AveragePool

AveragePool consumes an input tensor X and applies average pooling across the tensor according to kernel sizes, stride sizes, and pad lengths. Average pooling consists of averaging all values of a subset of the input tensor according to the kernel size and downsampling the data into the output tensor Y for further processing.

Attributes

auto_pad : string: auto_pad must be either SAME_UPPER, SAME_LOWER or VALID. SAME_UPPER and SAME_LOWER mean pad the input so that the output size matches the input. If the padding is an odd number, the extra padding is added at the end for SAME_UPPER and at the beginning for SAME_LOWER. VALID means no padding; therefore, the number of padding pixels is read from the pads attribute.
kernel_shape : list of ints: The size of the kernel along each axis.
pads : list of ints: Padding for the lower and upper side along each axis; each value can be any integer greater than or equal to 0. A value represents the number of pixels added to the lower or upper part of the corresponding axis. `pads` therefore has two values per axis: the first is the number of pixels added at the beginning of the axis and the second is the number of pixels added at the end of the axis.
strides : list of ints: Stride along each axis.

Inputs

X: Input data tensor from the previous operator; dimensions for the image case are (N x C x H x W), where N is the batch size, C is the number of channels, and H and W are the height and the width of the data. For the non-image case, the dimensions are in the form of (N x C x D1 x D2 ... Dn), where N is the batch size.

Outputs

Y: Output data tensor from average pooling across the input tensor. Dimensions will vary based on various kernel, stride, and pad sizes.
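
To make the dependence on kernel, stride, and pad sizes concrete, the sketch below uses the conventional floor-based output-size formula, which is an assumption here since the text above does not spell it out. The helper names `pooled_size` and `average_pool_1d` are hypothetical, and the 1-D pooling example uses no padding to avoid guessing how padded pixels enter the average.

```python
def pooled_size(in_size, kernel, stride=1, pad_begin=0, pad_end=0):
    """Output length along one axis (conventional floor-based formula, assumed)."""
    return (in_size + pad_begin + pad_end - kernel) // stride + 1


def average_pool_1d(values, kernel, stride=1):
    """Average each window of `kernel` values, moving by `stride`, without padding."""
    out_size = pooled_size(len(values), kernel, stride)
    return [sum(values[i * stride:i * stride + kernel]) / kernel
            for i in range(out_size)]


print(pooled_size(32, kernel=2, stride=2))
print(average_pool_1d([1, 2, 3, 4, 5, 6], kernel=2, stride=2))
```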

BatchNormalization

Carries out batch normalization as described in the paper https://arxiv.org/abs/1502.03167. Depending on the mode in which it is run, there are multiple cases for the number of outputs, which we list below:

Output case #1: Y, mean, var, saved_mean, saved_var (training mode)
Output case #2: Y (test mode)

Attributes

epsilon : float: The epsilon value to use to avoid division by zero.
is_test : int: If set to nonzero, run spatial batch normalization in test mode.
momentum : float: Factor used in computing the running mean and variance, e.g., running_mean = running_mean * momentum + mean * (1 - momentum)
spatial : int: Compute the mean and variance across all spatial elements or per feature.

Inputs

X: The input 4-dimensional tensor of shape NCHW or NHWC depending on the order parameter.
scale: The scale as a 1-dimensional tensor of size C to be applied to the output.
bias: The bias as a 1-dimensional tensor of size C to be applied to the output.
mean: The running mean (training) or the estimated mean (testing) as a 1-dimensional tensor of size C.
var: The running variance (training) or the estimated variance (testing) as a 1-dimensional tensor of size C.

Outputs (0 - ∞)

Y: The output 4-dimensional tensor of the same shape as X.
mean: The running mean after the BatchNormalization operator. Must be in-place with the input mean. Should not be used for testing.
var: The running variance after the BatchNormalization operator. Must be in-place with the input var. Should not be used for testing.
saved_mean: Saved mean used during training to speed up gradient computation. Should not be used for testing.
saved_var: Saved variance used during training to speed up gradient computation. Should not be used for testing.
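
A minimal per-channel sketch of the two computations mentioned above: test-mode normalization with fixed statistics, and the running-statistic update from the momentum attribute. The normalization formula follows the cited paper; the function names are hypothetical, and epsilon and momentum are passed explicitly because no defaults are stated here.

```python
import math


def batch_norm_test(x, scale, bias, mean, var, epsilon):
    """Normalize one channel's values with fixed statistics (test mode)."""
    inv_std = 1.0 / math.sqrt(var + epsilon)
    return [scale * (v - mean) * inv_std + bias for v in x]


def update_running(running, batch_stat, momentum):
    """Running-statistic update as written for the momentum attribute."""
    return running * momentum + batch_stat * (1 - momentum)


print(batch_norm_test([1.0, 3.0], scale=2.0, bias=0.5, mean=2.0, var=1.0, epsilon=1e-5))
print(update_running(running=1.0, batch_stat=3.0, momentum=0.9))
```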

Cast

The operator casts the elements of a given input tensor to a data type specified by the 'to' argument and returns an output tensor of the same size in the converted type. The 'to' argument must be one of the data types specified in the 'DataType' enum field in the TensorProto message. If the 'to' argument is not provided or is not one of the enumerated types in DataType, Caffe2 throws an Enforce error.

NOTE: Casting to and from strings is not supported yet.

Attributes

to : string: The data type to which the elements of the input tensor are cast. Must strictly be one of the types from the DataType enum in TensorProto.

Inputs

input: Input tensor to be cast.

Outputs

output: Output tensor with the same shape as input with type specified by the 'to' argument

Ceil

Ceil takes one input data (Tensor) and produces one output data (Tensor) where the ceiling function, y = ceil(x), is applied to the tensor elementwise.

Inputs

X: Input tensor

Outputs

Y: Output tensor

Concat

Concatenate a list of tensors into a single tensor

Attributes

axis : int: Which axis to concat on

Inputs (1 - ∞)

inputs...: List of tensors for concatenation

Outputs

concat_result: Concatenated tensor

Constant

A constant tensor.

Attributes

value : tensor: The value for the elements of the output tensor.

Inputs

Outputs

output: Output tensor containing the same value of the provided tensor.

Conv

The convolution operator consumes an input tensor and a filter, and computes the output.

Attributes

auto_pad : string: auto_pad must be either SAME_UPPER, SAME_LOWER or VALID. SAME_UPPER and SAME_LOWER mean pad the input so that the output size matches the input. If the padding is an odd number, the extra padding is added at the end for SAME_UPPER and at the beginning for SAME_LOWER. VALID means no padding; therefore, the number of padding pixels is read from the pads attribute.
dilations : list of ints: dilation value along each axis of the filter.
group : int: number of groups input channels and output channels are divided into
kernel_shape : list of ints: The shape of the convolution kernel.
pads : list of ints: Padding for the lower and upper side along each axis; each value can be any integer greater than or equal to 0. A value represents the number of pixels added to the lower or upper part of the corresponding axis. `pads` therefore has two values per axis: the first is the number of pixels added at the beginning of the axis and the second is the number of pixels added at the end of the axis.
strides : list of ints: stride along each axis.

Inputs (2 - 3)

X: Input data tensor from previous layer; has size (N x C x H x W), where N is the batch size, C is the number of channels, and H and W are the height and width. Note that this is for the 2D image. Otherwise the size is (N x D1 x D2 ... x Dn)
weights: The weight tensor that will be used in the convolutions; has size (M x C x kH x kW), where C is the number of channels, kH and kW are the height and width of the kernel, and M is the number of feature maps. For more than 2 dimensions, the kernel shape will be (M x C x k1 x k2 x ... x kn), where (k1 x k2 x ... x kn) is the dimension of the kernel
bias: Optional 1D bias to be added to the convolution, has size of M.

Outputs

Y: Output data tensor that contains the result of the convolution. The output dimensions are functions of the kernel size, stride size, and pad lengths.
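
The output dimensions can be estimated with the conventional convolution size formula, sketched below. The formula itself, the interleaved `pads` layout (begin, end per axis), and the helper names are assumptions for illustration; groups do not affect the output shape and are ignored.

```python
def conv_output_size(in_size, kernel, stride=1, pad_begin=0, pad_end=0, dilation=1):
    """Output length along one spatial axis (conventional formula, assumed)."""
    effective_kernel = dilation * (kernel - 1) + 1
    return (in_size + pad_begin + pad_end - effective_kernel) // stride + 1


def conv_output_shape(x_shape, w_shape, strides, pads, dilations):
    """Shape of Y for X of shape (N, C, D1..Dn) and weights (M, C, k1..kn).

    `pads` is assumed to be laid out as [begin_1, end_1, begin_2, end_2, ...].
    """
    n, m = x_shape[0], w_shape[0]
    spatial = []
    for i, in_size in enumerate(x_shape[2:]):
        spatial.append(conv_output_size(in_size, w_shape[2 + i], strides[i],
                                        pads[2 * i], pads[2 * i + 1], dilations[i]))
    return (n, m, *spatial)


print(conv_output_shape((1, 3, 224, 224), (64, 3, 7, 7),
                        strides=[2, 2], pads=[3, 3, 3, 3], dilations=[1, 1]))
```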

ConvTranspose

The convolution transpose operator consumes an input tensor and a filter, and computes the output.

Attributes

auto_pad : string: auto_pad must be either SAME_UPPER, SAME_LOWER or VALID. SAME_UPPER and SAME_LOWER mean pad the input so that the output size matches the input. If the padding is an odd number, the extra padding is added at the end for SAME_UPPER and at the beginning for SAME_LOWER. VALID means no padding; therefore, the number of padding pixels is read from the pads attribute.
dilations : list of ints: dilation value along each axis of the filter.
group : int: number of groups input channels and output channels are divided into
kernel_shape : list of ints: The shape of the convolution kernel.
output_shape : list of ints: The shape of the output.
pads : list of ints: Padding for the lower and upper side along each axis; each value can be any integer greater than or equal to 0. A value represents the number of pixels added to the lower or upper part of the corresponding axis. `pads` therefore has two values per axis: the first is the number of pixels added at the beginning of the axis and the second is the number of pixels added at the end of the axis.
strides : list of ints: stride along each axis.

Inputs (2 - 3)

X: Input data tensor from previous layer; has size (N x C x H x W), where N is the batch size, C is the number of channels, and H and W are the height and width. Note that this is for the 2D image. Otherwise the size is (N x D1 x D2 ... x Dn)
weights: The weight tensor that will be used in the convolutions; has size (C x M x kH x kW), where C is the number of channels, kH and kW are the height and width of the kernel, and M is the number of feature maps. For more than 2 dimensions, the kernel shape will be (C x M x k1 x k2 x ... x kn), where (k1 x k2 x ... x kn) is the dimension of the kernel
bias: Optional 1D bias to be added to the convolution, has size of M.

Outputs

Y: Output data tensor that contains the result of the convolution. The output dimensions are functions of the kernel size, stride size, and pad lengths.

Div

Performs element-wise binary division (with limited broadcast support).

If necessary, the right-hand-side argument is broadcast to match the shape of the left-hand-side argument. When broadcasting is enabled, the second tensor can either be of size 1 (a scalar value) or have a shape that is a contiguous subset of the first tensor's shape. The start of the mutually equal shape is specified by the argument "axis"; if it is not set, suffix matching is assumed. 1-dim expansion doesn't work yet.

For example, the following tensor shapes are supported (with broadcast=1):

shape(A) = (2, 3, 4, 5), shape(B) = (,), i.e. B is a scalar
shape(A) = (2, 3, 4, 5), shape(B) = (5,)
shape(A) = (2, 3, 4, 5), shape(B) = (4, 5)
shape(A) = (2, 3, 4, 5), shape(B) = (3, 4), with axis=1
shape(A) = (2, 3, 4, 5), shape(B) = (2,), with axis=0

Attribute broadcast=1 needs to be passed to enable broadcasting.

Attributes

axis : int: If set, defines the broadcast dimensions. See doc for details.
broadcast : int: Pass 1 to enable broadcasting

Inputs

A: First operand, should share the type with the second operand.
B: Second operand. With broadcasting can be of smaller size than A. If broadcasting is disabled it should be of the same size.

Outputs

C: Result, has same dimensions and type as A

Dropout

Dropout takes one input data (Tensor) and produces two Tensor outputs, output (Tensor) and mask (Tensor). Depending on whether it is in test mode or not, the output Y will either be a random dropout, or a simple copy of the input. Note that our implementation of Dropout does scaling in the training phase, so during testing nothing needs to be done.

Attributes

is_test : int: (int, default 0) if nonzero, run dropout in test mode where the output is simply Y = X.
ratio : float: (float, default 0.5) the ratio of random dropout

Inputs

data: The input data as Tensor.

Outputs (1 - 2)

output: The output.
mask: The output mask. If is_test is nonzero, this output is not filled.
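
The note about scaling in the training phase can be illustrated as follows. The mask-and-rescale scheme below (often called inverted dropout) is an assumption consistent with that note, not a statement of the actual implementation; the function name and the `rng` parameter are hypothetical, and `ratio` is assumed to be less than 1.

```python
import random


def dropout(data, ratio=0.5, is_test=0, rng=None):
    """Return (output, mask) for a flat list of values."""
    if is_test:
        # Test mode: Y = X and the mask is not filled.
        return list(data), None
    rng = rng or random.Random()
    keep = 1.0 - ratio
    # Keep each element with probability `keep`, then rescale the survivors
    # so that nothing needs to be done at test time.
    mask = [1 if rng.random() < keep else 0 for _ in data]
    output = [v * m / keep for v, m in zip(data, mask)]
    return output, mask


out, mask = dropout([2.0] * 8, ratio=0.5, rng=random.Random(0))
print(out, mask)
```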

Elu

Elu takes one input data (Tensor) and produces one output data (Tensor) where the function f(x) = alpha * (exp(x) - 1.) for x < 0, f(x) = x for x >= 0, is applied to the tensor elementwise.

Attributes

alpha : float: Coefficient of ELU default to 1.0.

Inputs

X: 1D input tensor

Outputs

Y: 1D output tensor

Exp

Calculates the exponential of the given input tensor, element-wise. This operation can be done in an in-place fashion too, by providing the same input and output blobs.

Inputs

input: Input tensor

Outputs

output: The exponential of the input tensor computed element-wise

Flatten

Flattens the input tensor into a 2D matrix. If the input tensor has shape (d_0, d_1, ... d_n) then the output will have shape (d_0 X d_1 ... X d_(axis-1), d_axis X d_(axis+1) ... X d_n).

Attributes

axis : int: (Default to 1) Indicate up to which input dimensions (exclusive) should be flattened to the outer dimension of the output

Inputs

input: A tensor of rank >= axis.

Outputs

output: A 2D tensor with the contents of the input tensor, with input dimensions up to axis flattened to the outer dimension of the output and remaining input dimensions flattened into the inner dimension of the output.
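
The output shape can be computed directly from the input shape and axis, as in this small sketch. The helper name `flatten_shape` is hypothetical; with axis=0 the outer dimension comes out as 1 here simply because an empty product is 1.

```python
import math


def flatten_shape(shape, axis=1):
    """2-D output shape: dimensions before `axis` collapse into the outer
    dimension, the remaining ones into the inner dimension."""
    return (math.prod(shape[:axis]), math.prod(shape[axis:]))


print(flatten_shape((2, 3, 4, 5)))          # default axis=1
print(flatten_shape((2, 3, 4, 5), axis=2))
```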

Floor

Floor takes one input data (Tensor) and produces one output data (Tensor) where the floor function, y = floor(x), is applied to the tensor elementwise.

Inputs

X: Input tensor

Outputs

Y: Output tensor

Gather

Given DATA tensor of rank r >= 1, and INDICES tensor of rank q, gather entries of the outer-most dimension of DATA indexed by INDICES, and concatenate them in an output tensor of rank q + (r - 1).

Example:

DATA  = [
    [1.0, 1.2],
    [2.3, 3.4],
    [4.5, 5.7],
]
INDICES = [
    [0, 1],
    [1, 2],
]
OUTPUT = [
    [
        [1.0, 1.2],
        [2.3, 3.4],
    ],
    [
        [2.3, 3.4],
        [4.5, 5.7],
    ],
]

Inputs

DATA: Tensor of rank r >= 1.
INDICES: Tensor of int32/int64 indices, of any rank q.

Outputs

OUTPUT: Tensor of rank q + (r - 1).
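
A short recursive sketch over nested lists shows where the output rank q + (r - 1) comes from: every index in INDICES is replaced by the corresponding entry of DATA's outer-most dimension. The function name `gather` is used only for illustration.

```python
def gather(data, indices):
    """Replace each integer in the nested `indices` with data[index]."""
    if isinstance(indices, list):
        return [gather(data, i) for i in indices]
    return data[indices]


DATA = [[1.0, 1.2], [2.3, 3.4], [4.5, 5.7]]
INDICES = [[0, 1], [1, 2]]
print(gather(DATA, INDICES))
```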

Gemm

General Matrix multiplication: https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms#Level_3

Compute Y = alpha * A * B + beta * C, where input tensor A has dimension (M X K), input tensor B has dimension (K X N), and input tensor C and output tensor Y have dimension (M X N). Input tensor C can be used in place as the output tensor Y. If attribute broadcast is non-zero, input tensor C will be broadcast to match the dimension requirement. A is transposed before the computation if attribute transA is non-zero; the same holds for B and transB.

Attributes

alpha : float: Scalar multiplier for the product of input tensors A * B
beta : float: Scalar multiplier for input tensor C
broadcast : int: Whether C should be broadcasted
transA : int: Whether A should be transposed
transB : int: Whether B should be transposed

Inputs

A: Input tensor A
B: Input tensor B
C: Input tensor C, can be inplace.

Outputs

Y: Output tensor.
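
The computation can be written out for nested lists as below. This is an unoptimized sketch that does not handle the broadcast attribute; the default values for `alpha` and `beta` are assumptions, since none are stated above, and the helper names are hypothetical.

```python
def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def gemm(a, b, c, alpha=1.0, beta=1.0, trans_a=0, trans_b=0):
    """Y = alpha * A * B + beta * C, with C already of shape (M, N)."""
    if trans_a:
        a = transpose(a)
    if trans_b:
        b = transpose(b)
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[alpha * sum(a[i][k] * b[k][j] for k in range(inner)) + beta * c[i][j]
             for j in range(cols)]
            for i in range(rows)]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = [[1, 1], [1, 1]]
print(gemm(A, B, C, alpha=2, beta=3))
```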

GlobalAveragePool

GlobalAveragePool consumes an input tensor X and applies average pooling across the values in the same channel. This is equivalent to AveragePool with kernel size equal to the spatial dimension of the input tensor.

Inputs

X: Input data tensor from the previous operator; dimensions for the image case are (N x C x H x W), where N is the batch size, C is the number of channels, and H and W are the height and the width of the data. For the non-image case, the dimensions are in the form of (N x C x D1 x D2 ... Dn), where N is the batch size.

Outputs

Y: Output data tensor from pooling across the input tensor. Dimensions will be N x C x 1 x 1

GlobalMaxPool

GlobalMaxPool consumes an input tensor X and applies max pooling across the values in the same channel. This is equivalent to MaxPool with kernel size equal to the spatial dimension of the input tensor.

Inputs

X: Input data tensor from the previous operator; dimensions for the image case are (N x C x H x W), where N is the batch size, C is the number of channels, and H and W are the height and the width of the data. For the non-image case, the dimensions are in the form of (N x C x D1 x D2 ... Dn), where N is the batch size.

Outputs

Y: Output data tensor from pooling across the input tensor. Dimensions will be N x C x 1 x 1

LRN

Local Response Normalization. It normalizes over local input regions. Each input value is divided by (bias+(alpha/size)*sum(xi^2 for every xi in the local region))^beta.

Attributes

alpha : float (required): Scaling parameter
beta : float (required): The exponent
bias : float: Default to 1
size : int (required): The number of channels to sum over

Inputs

X: Input tensor

Outputs

Y: Output tensor

LeakyRelu

LeakyRelu takes input data (Tensor) and an argument alpha, and produces one output data (Tensor) where the function f(x) = alpha * x for x < 0, f(x) = x for x >= 0, is applied to the data tensor elementwise.

Attributes

alpha : float: Coefficient of leakage

Inputs

X: Input tensor

Outputs

Y: Output tensor

Log

Calculates the natural log of the given input tensor, element-wise. This operation can be done in an in-place fashion too, by providing the same input and output blobs.

Inputs

input: Input tensor

Outputs

output: The natural log of the input tensor computed element-wise

MatMul

Matrix product that behaves like numpy.matmul: https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.matmul.html

Inputs

A: N-dimensional matrix A
B: N-dimensional matrix B

Outputs

Y: Matrix multiply results from A * B

Max

Element-wise max of each of the input tensors. The first input tensor can be used in-place as the output tensor, in which case the max will be done in place and results will be accumulated in input0. All inputs and outputs must have the same shape and data type.

Inputs (1 - ∞)

data_0: First of the input tensors. Can be inplace.

Outputs

max: Output tensor. Same dimension as inputs.

MaxPool

MaxPool consumes an input tensor X and applies max pooling across the tensor according to kernel sizes, stride sizes, and pad lengths. Max pooling consists of computing the max over all values of a subset of the input tensor according to the kernel size and downsampling the data into the output tensor Y for further processing.

Attributes

auto_pad : string: auto_pad must be either SAME_UPPER, SAME_LOWER or VALID. SAME_UPPER and SAME_LOWER mean pad the input so that the output size matches the input. If the padding is an odd number, the extra padding is added at the end for SAME_UPPER and at the beginning for SAME_LOWER. VALID means no padding; therefore, the number of padding pixels is read from the pads attribute.
dilations : list of ints: Dilation along each axis, 1 means no dilation.
kernel_shape : list of ints: The size of the kernel along each axis.
pads : list of ints: Padding for the lower and upper side along each axis; each value can be any integer greater than or equal to 0. A value represents the number of pixels added to the lower or upper part of the corresponding axis. `pads` therefore has two values per axis: the first is the number of pixels added at the beginning of the axis and the second is the number of pixels added at the end of the axis.
strides : list of ints: Stride along each axis.

Inputs

X: Input data tensor from the previous operator; dimensions for the image case are (N x C x H x W), where N is the batch size, C is the number of channels, and H and W are the height and the width of the data. For the non-image case, the dimensions are in the form of (N x C x D1 x D2 ... Dn), where N is the batch size.

Outputs

Y: Output data tensor from max pooling across the input tensor. Dimensions will vary based on various kernel, stride, and pad sizes.

Min

Element-wise min of each of the input tensors. The first input tensor can be used in-place as the output tensor, in which case the min will be done in place and results will be accumulated in input0. All inputs and outputs must have the same shape and data type.

Inputs (1 - ∞)

data_0: First of the input tensors. Can be inplace.

Outputs

max: Output tensor. Same dimension as inputs.

Mul

Performs element-wise binary multiplication (with limited broadcast support).

If necessary, the right-hand-side argument is broadcast to match the shape of the left-hand-side argument. When broadcasting is enabled, the second tensor can either be of size 1 (a scalar value) or have a shape that is a contiguous subset of the first tensor's shape. The start of the mutually equal shape is specified by the argument "axis"; if it is not set, suffix matching is assumed. 1-dim expansion doesn't work yet.

For example, the following tensor shapes are supported (with broadcast=1):

shape(A) = (2, 3, 4, 5), shape(B) = (,), i.e. B is a scalar
shape(A) = (2, 3, 4, 5), shape(B) = (5,)
shape(A) = (2, 3, 4, 5), shape(B) = (4, 5)
shape(A) = (2, 3, 4, 5), shape(B) = (3, 4), with axis=1
shape(A) = (2, 3, 4, 5), shape(B) = (2,), with axis=0

Attribute broadcast=1 needs to be passed to enable broadcasting.

Attributes

axis : int: If set, defines the broadcast dimensions. See doc for details.
broadcast : int: Pass 1 to enable broadcasting

Inputs

A: First operand, should share the type with the second operand.
B: Second operand. With broadcasting can be of smaller size than A. If broadcasting is disabled it should be of the same size.

Outputs

C: Result, has same dimensions and type as A

Neg

Neg takes one input data (Tensor) and produces one output data (Tensor) where the sign of each element is flipped, y = -x, elementwise.

Inputs

X: Input tensor

Outputs

Y: Output tensor

OptimizedRNN

Computes a stack of several RNNs in optimized fashion. This operator is usually implemented via CuDNN and thus most of the attributes and weights layout matches directly.

Attributes

cell_type : string (required)

Types of the cell: `relu`, `tanh`, `gru`, `lstm`

Equation definitions:
i - input gate
o - output gate
f - forget gate
z - update gate
r - reset gate
c - cell gate
h - hidden gate
t - time step (t-1 means previous time step)
Xi - input tensor
W[izrfcoh] - W parameter weight matrices for the corresponding gates
R[izrfcoh] - R parameter weight matrices for the corresponding gates
Wb[izrfcoh] - W parameter bias vectors for the corresponding gates
Rb[izrfcoh] - R parameter bias vectors for the corresponding gates
ReLU(X) - max(X, 0)
tanh(X) - hyperbolic tangent of X
sigmoid(X) - 1 / (1 + e^-X)
[C|H] - Cell/Hidden state

Equations:

relu
- Ht = ReLU(WiXt + RiHt-1 + Wbi + Rbi)

tanh
- Ht = tanh(WiXt + RiHt-1 + Wbi + Rbi)

lstm
- it = sigmoid(WiXt + RiHt-1 + Wbi + Rbi)
- ft = sigmoid(WfXt + RfHt-1 + Wbf + Rbf)
- ot = sigmoid(WoXt + RoHt-1 + Wbo + Rbo)
- ct = tanh(WcXt + RcHt-1 + Wbc + Rbc)
- C = ft * Ct-1 + it * ct
- H = ot * tanh(C)

gru
- zt = sigmoid(WzXt + RzHt-1 + Wbz + Rbz)
- rt = sigmoid(WrXt + RrHt-1 + Wbr + Rbr)
- ht = tanh(WhXt + rt * (RhHt-1 + Rbh) + Wbh)
- H = (1 - zt) * ht + zt * Ht-1

Note that for LSTM and for 2 out of 3 gates of GRU, there are duplicate biases for the gates (the model is overparametrized). This follows the CuDNN/TensorRT convention and makes the spec more uniform.

directions : int

Number of directions: 1 for unidirectional (default) and 2 for bidirectional

hidden_size : int

Number of neurons in the hidden layer

num_layers : int

Numbers of RNN layers in the stack, default 1

skip_input_transform : int

If set, skips linear transformation on the input of the first layer

Inputs (2 - 4)

weights

All parameters of the stack packed together in the opaque tensor. The size must be compatible with input attributes passed to the op.

The layout format is the one used by CuDNN and very similar to TensorRT:

The weight structure holds weights and biases for each layer of the network. Each parameter matrix is linearly appended after the previous parameter matrix without padding.

The order of matrices {K, L, D, R, N, C} is defined as:

K - type of the matrix: weight (first) or bias (second)
L - The number of layers in the RNN - num_layers
D - The direction of the layer: normal (first) or reverse (second). (in case of directions=2)
R - The type of the connection: input-hidden (first) or hidden-hidden (second)
N - The number of gates matrices in the RNN, dependent on the cell_type:
-- For relu or tanh there is one gate
-- For gru there are 3 gates ordered as reset, update, hidden
-- For lstm there are 4 gates ordered as input, forget, cell, output
C - The size of each matrix, which varies:
-- If the linear layer on the input is skipped (skip_input_transform=1), then for the first layer (L=1) the weight matrix (K=weight) on the input connection (R=input-hidden) is skipped, i.e. has 0 parameters in the list
-- For the first layer (L=1) weight matrix (K=weight) on input connection (R=input-hidden), dimensions are {hidden_size, input_size}
-- For other layers (L>1) weight matrix (K=weight) on input connection (R=input-hidden), dimensions are {hidden_size, directions * hidden_size}
-- For weight matrix (K=weight) on recurrent connection (R=hidden-hidden), dimensions are {hidden_size, hidden_size}
-- For all biases (K=bias), dimensions are {hidden_size}
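
Given the matrix sizes listed above, the total number of elements in the packed weights tensor can be estimated as in the following sketch. It assumes every layer and direction carries one input-hidden and one hidden-hidden matrix and bias per gate, as the K/R/N description suggests; the function name `optimized_rnn_weight_count` is hypothetical.

```python
GATES = {"relu": 1, "tanh": 1, "gru": 3, "lstm": 4}


def optimized_rnn_weight_count(cell_type, input_size, hidden_size,
                               num_layers=1, directions=1, skip_input_transform=0):
    """Total number of parameters packed into the opaque weights tensor."""
    gates = GATES[cell_type]
    total = 0
    for layer in range(1, num_layers + 1):
        if layer == 1:
            # A skipped input transform contributes 0 parameters.
            in_size = 0 if skip_input_transform else input_size
        else:
            in_size = directions * hidden_size
        per_direction = (
            gates * hidden_size * in_size        # input-hidden weights
            + gates * hidden_size * hidden_size  # hidden-hidden weights
            + 2 * gates * hidden_size            # input-hidden and hidden-hidden biases
        )
        total += directions * per_direction
    return total


print(optimized_rnn_weight_count("lstm", input_size=10, hidden_size=20))
```

A count like this can serve as a sanity check that a weights tensor is compatible with the attributes passed to the op.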

input

The input sequences packed (and potentially padded) into one 3-D tensor with the shape of `[seq_length, batch_size, input_size]`.

initial_h

Optional initial value of the hidden. If not specified - assumed to be 0. Dimensions `[num_layers * directions, batch_size, hidden_size]`

initial_c

For LSTM only: optional initial value of the cell. If not specified - assumed to be 0. Dimensions `[num_layers * directions, batch_size, hidden_size]`

Outputs (1 - 3)

output: The output 3-dim sequence.
output_h: Optional output value of the hidden. Same shape as initial_h
output_c: For LSTM only: optional output value of the cell. Same shape as initial_c

PRelu

PRelu takes input data (Tensor) and slope tensor as input, and produces one output data (Tensor) where the function f(x) = slope * x for x < 0, f(x) = x for x >= 0, is applied to the data tensor elementwise.

Inputs

X: Input tensor
Slope: Slope tensor. If `Slope` is of size 1, the value is shared across different channels

Outputs

Y: Output tensor

Pad

Pads the DATA tensor according to the given paddings, mode, and value.

Example: Insert 0 paddings to the beginning of the second dimension.

DATA  = [
    [1.0, 1.2],
    [2.3, 3.4],
    [4.5, 5.7],
]
paddings = [0, 0, 2, 0]

OUTPUT = [
    [0.0, 0.0, 1.0, 1.2],
    [0.0, 0.0, 2.3, 3.4],
    [0.0, 0.0, 4.5, 5.7],
]

Attributes

mode : string: Three modes: constant(default), reflect, edge
paddings : list of ints (required): List of integers indicating the padding sizes. The length of paddings should be double the input's dimension. The order should be axis_0_begin, axis_0_end, axis_1_begin, ..., axis_n_begin, axis_n_end, where n is the input's dimension.
value : float: One float, indicates the value to be filled, default is 0

Inputs

DATA: Input tensor.

Outputs

OUTPUT: Tensor after padding.
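
For a 2-D input and the constant mode, the paddings order described above can be applied as in this sketch. It covers only that one case; the function name `pad_2d_constant` is hypothetical. Applied to the example DATA, it yields the three padded rows shown in the example.

```python
def pad_2d_constant(data, paddings, value=0.0):
    """Constant-mode padding of a 2-D nested list.

    `paddings` is [axis_0_begin, axis_0_end, axis_1_begin, axis_1_end].
    """
    top, bottom, left, right = paddings
    width = left + len(data[0]) + right
    # Pad every existing row on the left and right (second dimension).
    rows = [[value] * left + list(row) + [value] * right for row in data]
    # Add full rows of `value` before and after (first dimension).
    return ([[value] * width for _ in range(top)]
            + rows
            + [[value] * width for _ in range(bottom)])


DATA = [[1.0, 1.2], [2.3, 3.4], [4.5, 5.7]]
print(pad_2d_constant(DATA, [0, 0, 2, 0]))
```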

Pow

Pow takes input data (Tensor) and exponent Tensor, and produces one output data (Tensor) where the function f(x) = x^exponent, is applied to the data tensor elementwise.

Inputs

X: Input tensor of any shape, base of the exponent.
Y: Input tensor of any shape broadcastable to X shape, the exponent component.

Outputs

Z: Output tensor (same size as X)

RandomNormal

Generate a tensor with random values drawn from a normal distribution. The shape of the tensor is specified by the shape argument and the parameters of the normal distribution are specified by mean and scale.

The data type is specified by the 'dtype' argument. The 'dtype' argument must be one of the data types specified in the 'DataType' enum field in the TensorProto message.

Attributes

dtype : int: The data type for the elements of the output tensor.
mean : float: The mean of the normal distribution.
scale : float: The standard deviation of the normal distribution.
seed : float: (Optional) Seed to the random generator, if not specified we will auto generate one.
shape : list of ints: The shape of the output tensor.

Inputs

Outputs

output: Output tensor of random values drawn from normal distribution

RandomNormalLike

Generate a tensor with random values drawn from a normal distribution. The shape of the tensor is computed from the input argument and the parameters of the normal distribution are specified by mean and scale.

The data type is specified by the 'dtype' argument. The 'dtype' argument must be one of the data types specified in the 'DataType' enum field in the TensorProto message.

Attributes

dtype : int: (Optional) The data type for the elements of the output tensor; if not specified, we will use the data type of the input tensor.
mean : float: The mean of the normal distribution.
scale : float: The standard deviation of the normal distribution.
seed : float: (Optional) Seed to the random generator, if not specified we will auto generate one.

Inputs

input: Input tensor to provide shape information.

Outputs

output: Output tensor of random values drawn from normal distribution

RandomUniform

Generate a tensor with random values drawn from a uniform distribution. The shape of the tensor is specified by the shape argument and the range by low and high.

The data type is specified by the 'dtype' argument. The 'dtype' argument must be one of the data types specified in the 'DataType' enum field in the TensorProto message.

Attributes

dtype : int: The data type for the elements of the output tensor.
high : float: Upper boundary of the output values.
low : float: Lower boundary of the output values.
seed : float: (Optional) Seed to the random generator, if not specified we will auto generate one.
shape : list of ints: The shape of the output tensor.

Inputs

Outputs

output: Output tensor of random values drawn from uniform distribution

RandomUniformLike

Generate a tensor with random values drawn from a uniform distribution. The shape of the tensor is computed from the input argument and the range by low and high.

The data type is specified by the 'dtype' argument. The 'dtype' argument must be one of the data types specified in the 'DataType' enum field in the TensorProto message.

Attributes

dtype : int: (Optional) The data type for the elements of the output tensor. If not specified, we will use the data type of the input tensor.
high : float: Upper boundary of the output values.
low : float: Lower boundary of the output values.
seed : float: (Optional) Seed to the random generator, if not specified we will auto generate one.

Inputs

input: Input tensor to provide shape information.

Outputs

output: Output tensor of random values drawn from uniform distribution
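
The "Like" behavior described above (shape taken from the input, dtype falling back to the input's dtype) can be sketched with NumPy. This is only an illustration: the helper name random_uniform_like is hypothetical, NumPy's generator stands in for whatever generator a backend uses, and casting the float seed to an integer is an assumption of this sketch.

```python
import numpy as np

def random_uniform_like(input_tensor, low, high, dtype=None, seed=None):
    """Hypothetical helper mirroring the RandomUniformLike description."""
    # NumPy seeds must be integers; casting the float seed is an assumption.
    rng = np.random.default_rng(None if seed is None else int(seed))
    # The dtype falls back to the input tensor's dtype when it is omitted.
    out_dtype = input_tensor.dtype if dtype is None else np.dtype(dtype)
    # The output shape is taken from the input tensor.
    return rng.uniform(low, high, size=input_tensor.shape).astype(out_dtype)

x = np.zeros((2, 3), dtype=np.float32)
y = random_uniform_like(x, low=-1.0, high=1.0, seed=7.0)
```

Passing the same seed twice yields the same tensor in this sketch, which is the usual reason to set the seed attribute.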

Reciprocal

Reciprocal takes one input data (Tensor) and produces one output data (Tensor) where the reciprocal, y = 1/x, is applied to the tensor elementwise.

Inputs

X: Input tensor

Outputs

Y: Output tensor

ReduceLogSumExp

Computes the log sum exponent of the input tensor's elements along the provided axes. The resulting tensor has the same rank as the input if keepdims equals 1. If keepdims equals 0, the resulting tensor has the reduced dimensions pruned.

The above behavior is similar to numpy, with the exception that numpy defaults keepdims to False instead of True.

Attributes

axes : list of ints: A list of integers, along which to reduce.
keepdims : int: Keep the reduced dimension or not; the default 1 means keep the reduced dimension.

Inputs

data: An input tensor.

Outputs

reduced: Reduced output tensor.
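
The effect of keepdims on the output rank can be illustrated with NumPy. This is a minimal sketch: the function name reduce_log_sum_exp is hypothetical, and the direct log(sum(exp(x))) form is used for clarity without any numerical stabilization.

```python
import numpy as np

def reduce_log_sum_exp(data, axes, keepdims=1):
    """Hypothetical reference: log(sum(exp(data))) over the given axes."""
    # keepdims defaults to 1 here, unlike NumPy's own default of False.
    return np.log(np.sum(np.exp(data), axis=tuple(axes), keepdims=bool(keepdims)))

data = np.zeros((2, 3, 4))
kept = reduce_log_sum_exp(data, axes=[1], keepdims=1)    # rank preserved
pruned = reduce_log_sum_exp(data, axes=[1], keepdims=0)  # reduced axis removed
```

The same keepdims convention applies to the other Reduce* operators listed in this file.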

ReduceMax

Computes the max of the input tensor's elements along the provided axes. The resulting tensor has the same rank as the input if keepdims equals 1. If keepdims equals 0, the resulting tensor has the reduced dimensions pruned.

The above behavior is similar to numpy, with the exception that numpy defaults keepdims to False instead of True.

Attributes

axes : list of ints: A list of integers, along which to reduce.
keepdims : int: Keep the reduced dimension or not; the default 1 means keep the reduced dimension.

Inputs

data: An input tensor.

Outputs

reduced: Reduced output tensor.

ReduceMean

Computes the mean of the input tensor's elements along the provided axes. The resulting tensor has the same rank as the input if keepdims equals 1. If keepdims equals 0, the resulting tensor has the reduced dimensions pruned.

The above behavior is similar to numpy, with the exception that numpy defaults keepdims to False instead of True.

Attributes

axes : list of ints: A list of integers, along which to reduce.
keepdims : int: Keep the reduced dimension or not; the default 1 means keep the reduced dimension.

Inputs

data: An input tensor.

Outputs

reduced: Reduced output tensor.

ReduceMin

Computes the min of the input tensor's elements along the provided axes. The resulting tensor has the same rank as the input if keepdims equals 1. If keepdims equals 0, the resulting tensor has the reduced dimensions pruned.

The above behavior is similar to numpy, with the exception that numpy defaults keepdims to False instead of True.

Attributes

axes : list of ints: A list of integers, along which to reduce.
keepdims : int: Keep the reduced dimension or not; the default 1 means keep the reduced dimension.

Inputs

data: An input tensor.

Outputs

reduced: Reduced output tensor.

ReduceProd

Computes the product of the input tensor's elements along the provided axes. The resulting tensor has the same rank as the input if keepdims equals 1. If keepdims equals 0, the resulting tensor has the reduced dimensions pruned.

The above behavior is similar to numpy, with the exception that numpy defaults keepdims to False instead of True.

Attributes

axes : list of ints: A list of integers, along which to reduce.
keepdims : int: Keep the reduced dimension or not; the default 1 means keep the reduced dimension.

Inputs

data: An input tensor.

Outputs

reduced: Reduced output tensor.

ReduceSum

Computes the sum of the input tensor's elements along the provided axes. The resulting tensor has the same rank as the input if keepdims equals 1. If keepdims equals 0, the resulting tensor has the reduced dimensions pruned.

The above behavior is similar to numpy, with the exception that numpy defaults keepdims to False instead of True.

Attributes

axes : list of ints: A list of integers, along which to reduce.
keepdims : int: Keep the reduced dimension or not; the default 1 means keep the reduced dimension.

Inputs

data: An input tensor.

Outputs

reduced: Reduced output tensor.

Relu

Relu takes one input data (Tensor) and produces one output data (Tensor) where the rectified linear function, y = max(0, x), is applied to the tensor elementwise.

Inputs

X: Input tensor

Outputs

Y: Output tensor

Reshape

Reshape the input tensor similar to numpy.reshape.

It takes a tensor as input and an argument shape. It outputs the reshaped tensor.

At most one dimension of the new shape can be -1. In this case, the value is inferred from the size of the tensor and the remaining dimensions. A dimension could also be 0, in which case the actual dimension value is copied from the corresponding dimension of the input tensor.

Attributes

shape : list of ints: New shape

Inputs

data: An input tensor.

Outputs

reshaped: Reshaped data.
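
The handling of the special values -1 and 0 can be sketched as a shape-resolution step. This sketch assumes a 0 entry takes the dimension at the same position in the input tensor's shape; the helper name resolve_shape is hypothetical and not part of any API.

```python
from math import prod

def resolve_shape(input_shape, shape):
    """Hypothetical helper: resolve 0 and -1 entries of a Reshape `shape`."""
    # Assumption: a 0 copies the dimension at the same position of the input shape.
    out = [input_shape[i] if d == 0 else d for i, d in enumerate(shape)]
    if out.count(-1) > 1:
        raise ValueError("at most one dimension may be -1")
    total = prod(input_shape)
    if -1 in out:
        # Infer the -1 entry from the element count and the remaining dimensions.
        known = prod(d for d in out if d != -1)
        if known == 0 or total % known != 0:
            raise ValueError("cannot infer the -1 dimension")
        out[out.index(-1)] = total // known
    elif prod(out) != total:
        raise ValueError("element count mismatch")
    return out

resolved = resolve_shape([2, 3, 4], [0, -1])
```

Here the first dimension is copied from the input and the second is inferred from the remaining 3 * 4 elements.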

Selu

Selu takes one input data (Tensor) and produces one output data (Tensor) where the scaled exponential linear unit function, y = gamma * (alpha * e^x - alpha) for x <= 0, y = gamma * x for x > 0, is applied to the tensor elementwise.

Attributes

alpha : float: Coefficient of SELU; defaults to 1.6732.
gamma : float: Coefficient of SELU; defaults to 1.0507.

Inputs

X: Input tensor

Outputs

Y: Output tensor
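
The piecewise formula above can be written for a single scalar as follows. This is a plain-Python sketch of the formula only; a real implementation would apply it elementwise to a tensor.

```python
import math

def selu(x, alpha=1.6732, gamma=1.0507):
    """Scalar SELU using the default coefficients listed above."""
    if x > 0:
        return gamma * x
    return gamma * (alpha * math.exp(x) - alpha)

values = [selu(v) for v in (-1.0, 0.0, 2.0)]
```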

Sigmoid

Sigmoid takes one input data (Tensor) and produces one output data (Tensor) where the sigmoid function, y = 1 / (1 + exp(-x)), is applied to the tensor elementwise.

Inputs

X: Input tensor

Outputs

Y: Output tensor

Slice

Produces a slice of the input tensor along multiple axes. Similar to numpy: https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html

Slice uses the axes, starts and ends attributes to specify the start and end index for each axis in the list of axes, and uses this information to slice the input data tensor. If a negative value is passed for any of the start or end indices, it represents the number of elements before the end of that dimension.

Example 1:

data = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
]
axes = [0, 1]
starts = [1, 0]
ends = [2, 3]

result = [
    [5, 6, 7],
]

Example 2:

data = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
]
starts = [0]
ends = [-1]

result = [
    [1, 2, 3, 4],
]

Attributes

axes : list of ints: Axes that `starts` and `ends` apply to. It's optional. If not present, will be treated as [0, 1, ..., len(`starts`) - 1].
ends : list of ints (required): Ending indices (exclusive) of corresponding axis in `axes`
starts : list of ints (required): Starting indices of corresponding axis in `axes`

Inputs

data: Tensor of data to extract slices from.

Outputs

output: Sliced data tensor.
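
The two examples above can be reproduced with NumPy indexing. This is a sketch under the assumption that Python slice semantics match the description for in-range and negative indices; the function name onnx_slice is hypothetical.

```python
import numpy as np

def onnx_slice(data, starts, ends, axes=None):
    """Hypothetical helper applying starts/ends along the listed axes."""
    if axes is None:
        # Default described above: [0, 1, ..., len(starts) - 1].
        axes = list(range(len(starts)))
    index = [slice(None)] * data.ndim
    for axis, start, end in zip(axes, starts, ends):
        # Negative values count from the end of the dimension, as in Python.
        index[axis] = slice(start, end)
    return data[tuple(index)]

data = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
example_1 = onnx_slice(data, starts=[1, 0], ends=[2, 3], axes=[0, 1])
example_2 = onnx_slice(data, starts=[0], ends=[-1])
```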

Softmax

The operator computes the softmax normalized values for each layer in the batch of the given input. The input is a 2-D tensor (Tensor) of size (batch_size x input_feature_dimensions). The output tensor has the same shape and contains the softmax normalized values of the corresponding input.

X does not need to explicitly be a 2D vector; rather, it will be coerced into one. For an arbitrary n-dimensional tensor X \in [a_0, a_1, ..., a_{k-1}, a_k, ..., a_{n-1}], where k is the axis provided, X will be coerced into a 2-dimensional tensor with dimensions [a_0 * ... * a_{k-1}, a_k * ... * a_{n-1}]. For the default case where axis=1, this means the X tensor will be coerced into a 2D tensor of dimensions [a_0, a_1 * ... * a_{n-1}], where a_0 is often the batch size. In this situation, we must have a_0 = N and a_1 * ... * a_{n-1} = D. Each of these dimensions must be matched correctly, or else the operator will throw errors.

Attributes

axis : int: (int) default to 1; describes the axis of the inputs when coerced to 2D; defaults to one because the 0th axis most likely describes the batch_size

Inputs

input: The input tensor that's coerced into a 2D matrix of size (NxD) as described above.

Outputs

output: The softmax normalized output values with the same shape as input tensor.
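
The coercion to 2D followed by a row-wise softmax can be sketched with NumPy. The function below is illustrative only; subtracting the row maximum is a common numerical-stability step and is not something the description above specifies.

```python
import numpy as np

def softmax(x, axis=1):
    """Hypothetical reference: coerce to 2D at `axis`, then normalize each row."""
    # Rows: a_0 * ... * a_{k-1}; columns: a_k * ... * a_{n-1}.
    n = int(np.prod(x.shape[:axis]))
    flat = x.reshape(n, -1)
    # Stability trick (an addition of this sketch): shift each row by its max.
    e = np.exp(flat - flat.max(axis=1, keepdims=True))
    # The output has the same shape as the input.
    return (e / e.sum(axis=1, keepdims=True)).reshape(x.shape)

x = np.arange(24, dtype=np.float64).reshape(2, 3, 4)
y = softmax(x, axis=1)
```

With axis=1 the (2, 3, 4) input is treated as 2 rows of 12 values, so each group of 12 outputs sums to 1.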

Split

Split a tensor into a list of tensors, along the specified 'axis'. The lengths of the split can be specified using the argument 'split' or the optional second input blob to the operator. Otherwise, the tensor is split into equal-sized parts.

Attributes

axis : int: Which axis to split on
split : list of ints: length of each output

Inputs (1 - 2)

input: The tensor to split
split: Optional list of output lengths (see also arg 'split')

Outputs (1 - ∞)

outputs...: One or more outputs forming list of tensors after splitting
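
Both modes (explicit output lengths and equal-sized parts) can be sketched with numpy.split. The helper name onnx_split and its num_outputs parameter are hypothetical; in a graph the number of parts would come from the number of outputs the node declares.

```python
import numpy as np

def onnx_split(tensor, axis, split=None, num_outputs=None):
    """Hypothetical helper: split by explicit lengths or into equal parts."""
    if split is None:
        # Equal-sized parts; num_outputs stands in for the node's output count.
        return np.split(tensor, num_outputs, axis=axis)
    if sum(split) != tensor.shape[axis]:
        raise ValueError("split lengths must sum to the size of the axis")
    # np.split expects cut positions, so convert lengths to cumulative offsets.
    offsets = np.cumsum(split)[:-1]
    return np.split(tensor, offsets, axis=axis)

x = np.arange(12).reshape(2, 6)
a, b, c = onnx_split(x, axis=1, split=[1, 2, 3])
halves = onnx_split(x, axis=1, num_outputs=2)
```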

Sqrt

Square root takes one input data (Tensor) and produces one output data (Tensor) where the square root, y = x^0.5, is applied to the tensor elementwise. If x is negative, then it will return NaN.

Inputs

X: Input tensor

Outputs

Y: Output tensor

Squeeze

Remove single-dimensional entries from the shape of a tensor. Takes a parameter axes with a list of axes to squeeze.

Attributes

axes : list of ints (required): List of non-negative integers indicating the dimensions to squeeze.

Inputs

data: Tensors with at least max(dims) dimensions.

Outputs

squeezed: Reshaped tensor with same data as input.

Sub

Performs element-wise binary subtraction (with limited broadcast support).

If necessary, the right-hand-side argument will be broadcast to match the shape of the left-hand-side argument. When broadcasting is specified, the second tensor can either be of size 1 (a scalar value) or have a shape that is a contiguous subset of the first tensor's shape. The start of the mutually equal shape is specified by the argument "axis"; if it is not set, suffix matching is assumed. 1-dim expansion doesn't work yet.

For example, the following tensor shapes are supported (with broadcast=1):

shape(A) = (2, 3, 4, 5), shape(B) = (,), i.e. B is a scalar
shape(A) = (2, 3, 4, 5), shape(B) = (5,)
shape(A) = (2, 3, 4, 5), shape(B) = (4, 5)
shape(A) = (2, 3, 4, 5), shape(B) = (3, 4), with axis=1
shape(A) = (2, 3, 4, 5), shape(B) = (2), with axis=0

Attribute broadcast=1 needs to be passed to enable broadcasting.

Attributes

axis : int: If set, defines the broadcast dimensions. See doc for details.
broadcast : int: Pass 1 to enable broadcasting

Inputs

A: First operand, should share the type with the second operand.
B: Second operand. With broadcasting can be of smaller size than A. If broadcasting is disabled it should be of the same size.

Outputs

C: Result, has same dimensions and type as A
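
The axis-based broadcast rule can be emulated in NumPy by padding B with trailing singleton dimensions. This is a sketch of the behavior with broadcast=1; the helper name broadcast_sub is hypothetical.

```python
import numpy as np

def broadcast_sub(a, b, axis=None):
    """Hypothetical helper: A - B with the axis-based broadcast rule above."""
    if b.ndim == 0:
        return a - b                      # scalar B
    if axis is None:
        axis = a.ndim - b.ndim            # suffix matching
    if a.shape[axis:axis + b.ndim] != b.shape:
        raise ValueError("shape(B) is not a contiguous subset of shape(A) at this axis")
    # Append singleton dimensions so NumPy aligns B starting at `axis`.
    trailing = a.ndim - axis - b.ndim
    return a - b.reshape(b.shape + (1,) * trailing)

a = np.ones((2, 3, 4, 5))
c1 = broadcast_sub(a, np.ones((4, 5)))            # suffix match
c2 = broadcast_sub(a, np.ones((3, 4)), axis=1)    # B aligned with dims 1 and 2
c3 = broadcast_sub(a, np.ones((2,)), axis=0)      # B aligned with dim 0
```

In every case the result keeps the shape of A, matching the description of output C.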

Sum

Element-wise sum of each of the input tensors. The first input tensor can be used in-place as the output tensor, in which case the sum will be done in place and results will be accumulated in input0. All inputs and outputs must have the same shape and data type.

Inputs (1 - ∞)

data_0: First of the input tensors. Can be inplace.

Outputs

sum: Output tensor. Same dimension as inputs.

Tanh

Calculates the hyperbolic tangent of the given input tensor element-wise. This operation can be done in an in-place fashion too, by providing the same input and output blobs.

Inputs

input: 1-D input tensor

Outputs

output: The hyperbolic tangent values of the input tensor computed element-wise

Transpose

Transpose the input tensor similar to numpy.transpose. For example, when perm=(1, 0, 2), given an input tensor of shape (1, 2, 3), the output shape will be (2, 1, 3).

Attributes

perm : list of ints: A list of integers. By default, reverse the dimensions, otherwise permute the axes according to the values given.

Inputs

data: An input tensor.

Outputs

transposed: Transposed output.
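
Since the operator is described as similar to numpy.transpose, the shape example above and the default behavior can be checked directly in NumPy. The variable names are illustrative.

```python
import numpy as np

x = np.zeros((1, 2, 3))
permuted = np.transpose(x, (1, 0, 2))  # explicit permutation (1, 0, 2)
reversed_dims = np.transpose(x)        # no permutation given: dimensions are reversed
```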

_experimental ATen

Experimental operator allowing ATen operations to be accessed directly from Caffe2, to allow for quick prototyping when ONNX is missing standard versions of an op.

Inputs (0 - ∞)

Outputs (0 - ∞)

_experimental Caffe2ConvTranspose

The transposed convolution consumes an input vector, the filter blob, and the bias blob, and computes the output. Note that other parameters, such as the stride and kernel size, or the pads' sizes in each direction are not necessary for input because they are provided by the ConvTransposeUnpoolOpBase operator. Various dimension checks are done implicitly, and the sizes are specified in the Input docs for this operator. As is expected, the filter is deconvolved with a subset of the image and the bias is added; this is done throughout the image data and the output is computed. As a side note on the implementation layout: conv_transpose_op_impl.h is the templated implementation of the conv_transpose_op.h file, which is why they are separate files.

Attributes

dilations : list of ints
group : int
kernel_shape : list of ints
pads : list of ints
strides : list of ints

Inputs

X: Input data blob from previous layer; has size (N x C x H x W), where N is the batch size, C is the number of channels, and H and W are the height and width. Note that this is for the NCHW usage. On the other hand, the NHWC Op has a different set of dimension constraints.
filter: The filter blob that will be used in the transposed convolution; has size (M x C x kH x kW), where C is the number of channels, and kH and kW are the height and width of the kernel.
bias: The 1D bias blob that is added through the convolution; has size (C)

Outputs

Y: Output data blob that contains the result of the transposed convolution. The output dimensions are functions of the kernel size, stride size, and pad lengths.

_experimental ConstantFill

The operator fills the elements of the output tensor with a constant value specified by the 'value' argument.

The data type is specified by the 'dtype' argument. The 'dtype' argument must be one of the data types specified in the 'DataType' enum field in the TensorProto message. If the 'dtype' argument is not provided, the data type of 'value' is used.

The output tensor shape is specified by the 'shape' argument. If the number of input is 1, the shape will be identical to that of the input at run time with optional additional dimensions appended at the end as specified by 'extra_shape' argument. In that case the 'shape' argument should not be set.

If input_as_shape is set to true, then the input should be a 1D tensor containing the desired output shape (the dimensions specified in extra_shape will also be appended).

NOTE: Currently, it supports data type of float, int32, int64, and bool.

Attributes

dtype : int: The data type for the elements of the output tensor. Strictly must be one of the types from DataType enum in TensorProto.
extra_shape : list of ints: The additional dimensions appended at the end of the shape indicated by the input blob. Cannot set the extra_shape argument when there is no input blob.
input_as_shape : int: 1D tensor containing the desired output shape. First input must be in CPU context.
shape : list of ints: The shape of the output tensor. Cannot set the shape argument and pass in an input at the same time.
value : float: The value for the elements of the output tensor.

Inputs (0 - 1)

input (optional): Input tensor (optional) to provide shape information.

Outputs

output: Output tensor of constant values specified by the 'value' argument; its type is specified by the 'dtype' argument
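
The rules for deriving the output shape (shape attribute, input shape plus extra_shape, or input_as_shape) can be sketched as follows. The helper constant_fill and its keyword names are hypothetical and only mirror the attributes listed above; NumPy dtypes stand in for the TensorProto DataType enum.

```python
import numpy as np

def constant_fill(value, dtype=None, shape=None, extra_shape=(),
                  input_tensor=None, input_as_shape=False):
    """Hypothetical helper mirroring the ConstantFill shape rules."""
    if input_tensor is None:
        if extra_shape:
            raise ValueError("extra_shape cannot be set without an input")
        if shape is None:
            raise ValueError("shape is required when there is no input")
        out_shape = tuple(shape)
    else:
        if shape is not None:
            raise ValueError("shape cannot be set together with an input")
        # The input either holds the shape itself or provides it via its own shape.
        base = tuple(int(d) for d in input_tensor) if input_as_shape else input_tensor.shape
        out_shape = base + tuple(extra_shape)
    # With dtype=None, NumPy uses the data type of `value`.
    return np.full(out_shape, value, dtype=dtype)

from_attr = constant_fill(2.5, shape=[2, 3])
from_input = constant_fill(1, input_tensor=np.zeros((4, 2)), extra_shape=[3])
from_values = constant_fill(0, input_tensor=np.array([5, 6]), input_as_shape=True)
```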

_experimental FC

Computes the result of passing an input vector X into a fully connected layer with 2D weight matrix W and 1D bias vector b. That is, the layer computes Y = X * W^T + b, where X has size (M x K), W has size (N x K), b has size (N), and Y has size (M x N), where M is often the batch size. NOTE: X does not need to explicitly be a 2D vector; rather, it will be coerced into one. For an arbitrary n-dimensional tensor X \in [a_0, a_1, ..., a_{k-1}, a_k, ..., a_{n-1}], where a_i \in N+ and k is the axis provided, X will be coerced into a 2-dimensional tensor with dimensions [a_0 * ... * a_{k-1}, a_k * ... * a_{n-1}]. For the default case where axis=1, this means the X tensor will be coerced into a 2D tensor of dimensions [a_0, a_1 * ... * a_{n-1}], where a_0 is often the batch size. In this situation, we must have a_0 = M and a_1 * ... * a_{n-1} = K. Lastly, even though b is a 1D vector of size N, it is copied/resized to be size (M x N) implicitly and added to each vector in the batch. Each of these dimensions must be matched correctly, or else the operator will throw errors.

Attributes

axis : int: (int32_t) default to 1; describes the axis of the inputs; defaults to one because the 0th axis most likely describes the batch_size
axis_w : int: (int32_t) default to 1; describes the axis of the weights; defaults to one because the 0th axis most likely describes the batch_size

Inputs

X: input tensor that's coerced into a 2D matrix of size (MxK) as described above
W: 2D blob of size (NxK) containing fully connected weight matrix
b: 1D blob containing bias vector

Outputs

Y: 2D output tensor
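
The computation Y = X * W^T + b with the coercion of X to 2D can be sketched in NumPy. This follows the formula in the description, with W of size (N x K); the axis_w attribute is not modeled, and the function name fc is illustrative.

```python
import numpy as np

def fc(x, w, b, axis=1):
    """Hypothetical reference for Y = X * W^T + b with X coerced to 2D."""
    m = int(np.prod(x.shape[:axis]))
    x2d = x.reshape(m, -1)      # (M x K)
    # W is (N x K) and b is (N,); b is broadcast across the M rows.
    return x2d @ w.T + b        # (M x N)

x = np.ones((2, 3, 4))          # coerced to (2 x 12)
w = np.ones((5, 12))
b = np.arange(5.0)
y = fc(x, w, b)
```

Each output row is 12 (the sum over K ones) plus the bias vector, which shows b being added to every vector in the batch.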

_experimental GRUUnit

GRUUnit computes the activations of a standard GRU, in a sequence-length aware fashion. Concretely, given the (fused) inputs X (TxNxD), the previous hidden state (NxD), and the sequence lengths (N), computes the GRU activations, avoiding computation if the input is invalid (as in, the value at X[t][n] >= seqLengths[n]).

Attributes

drop_states : int: Bool to determine if hidden state is zeroes or passed along for timesteps past the given sequence_length.

Inputs

hidden_prev: The previous GRU hidden state.
gates: Unactivated gate outputs from forget, update, and output gates, pre-activation.
seq_lengths: Array of sequence lengths. len(seq_lengths) should equal batch size N.
t: The timestep for this operation.

Outputs

hidden: The new GRU hidden state calculated by this op.

_experimental GivenTensorFill

Attributes

extra_shape : list of ints
input_as_shape : int
shape : list of ints
values : list of floats

Inputs (0 - 1)

shape: The shape of filled tensor

Outputs

X: The filled tensor

_experimental Normalize

Given a matrix, apply L2-normalization along the last dimension.

Inputs

input: Input matrix

Outputs

output: Matrix after normalization
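
L2-normalization along the last dimension can be sketched in NumPy as dividing each row by its Euclidean norm. This sketch does not guard against all-zero rows, and the helper name l2_normalize is hypothetical.

```python
import numpy as np

def l2_normalize(x):
    """Hypothetical helper: divide by the L2 norm taken over the last dimension."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

m = np.array([[3.0, 4.0], [1.0, 0.0]])
normalized = l2_normalize(m)
```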

_experimental Scale

Scale takes one input data (Tensor) and produces one output data (Tensor) whose value is the input data tensor scaled element-wise.

Attributes

scale : float: (float, default 1.0) the scale to apply.

Inputs

input: Input data to be scaled

Outputs

output: Output data after scaling

_experimental SpatialBN

Carries out batch normalization as described in the paper https://arxiv.org/abs/1502.03167. Depending on the mode it is being run, there are multiple cases for the number of outputs, which we list below:

Output case #1: Y, mean, var, saved_mean, saved_var (training mode)
Output case #2: Y (test mode)

Attributes

epsilon : float: The epsilon value to use to avoid division by zero.
is_test : int: If set to nonzero, run spatial batch normalization in test mode.
momentum : float: Factor used in computing the running mean and variance, e.g., running_mean = running_mean * momentum + mean * (1 - momentum)

Inputs

X: The input 4-dimensional tensor of shape NCHW.
scale: The scale as a 1-dimensional tensor of size C to be applied to the output.
bias: The bias as a 1-dimensional tensor of size C to be applied to the output.
mean: The running mean (training) or the estimated mean (testing) as a 1-dimensional tensor of size C.
var: The running variance (training) or the estimated variance (testing) as a 1-dimensional tensor of size C.

Outputs (0 - ∞)

Y: The output 4-dimensional tensor of the same shape as X.
mean: The running mean after the spatial BN operator. Must be in-place with the input mean. Should not be used for testing.
var: The running variance after the spatial BN operator. Must be in-place with the input var. Should not be used for testing.
saved_mean: Saved mean used during training to speed up gradient computation. Should not be used for testing.
saved_var: Saved variance used during training to speed up gradient computation. Should not be used for testing.
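
Test-mode normalization and the running-statistic update from the momentum attribute can be sketched in NumPy for an NCHW input. Placing epsilon inside the square root follows the usual formulation from the referenced paper and is an assumption here; both helper names are hypothetical.

```python
import numpy as np

def spatial_bn_test(x, scale, bias, mean, var, epsilon):
    """Hypothetical test-mode helper for an NCHW input."""
    per_channel = (1, -1, 1, 1)  # broadcast size-C vectors over N, H and W
    # Assumption: epsilon is added to the variance before the square root.
    x_hat = (x - mean.reshape(per_channel)) / np.sqrt(var.reshape(per_channel) + epsilon)
    return scale.reshape(per_channel) * x_hat + bias.reshape(per_channel)

def update_running(running, batch_stat, momentum):
    """Running-statistic update as given for the momentum attribute."""
    return running * momentum + batch_stat * (1 - momentum)

x = np.arange(16.0).reshape(1, 2, 2, 4)
mean = x.mean(axis=(0, 2, 3))
var = x.var(axis=(0, 2, 3))
y = spatial_bn_test(x, np.ones(2), np.zeros(2), mean, var, epsilon=1e-5)
new_mean = update_running(np.zeros(2), mean, momentum=0.9)
```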
