
Dropout

Frank Seide edited this page Aug 15, 2016 · 4 revisions

Dropout function.

Dropout (x)

Parameters

  • x: the input to apply the dropout function to

Note: the dropout rate is not a parameter to this function; it is specified in the SGD section instead.

Return Value

Dropout() will return the result of the dropout operation applied to the input. The result has the same tensor dimensions as the input.

Description

Dropout is a popular technique to improve the generalizability of models. During training, it randomly sets values to 0 with a given probability, called the dropout rate.

In CNTK's implementation, the remaining values that are not set to 0 are instead multiplied by (1 / (1 - dropout rate)). This way, the model parameters learned with dropout are directly applicable in inference. (If this were not done, the user would have to scale them manually before inference.)
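The scaling described above can be illustrated with a minimal sketch in plain Python. This is not CNTK code: the function name `dropout`, its `training` flag, and the flat-list input are assumptions made for illustration only. It shows why multiplying the surviving values by 1 / (1 - dropout rate) keeps the expected value of each element unchanged.

```python
import random


def dropout(values, rate, training=True, rng=random):
    """Illustrative inverted dropout over a flat list of floats.

    Hypothetical helper, not part of CNTK. Each value is zeroed with
    probability `rate`; the survivors are scaled by 1 / (1 - rate).
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        # At inference (or with a rate of 0) the input passes through unchanged.
        return list(values)
    scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else v * scale for v in values]


if __name__ == "__main__":
    random.seed(0)
    out = dropout([1.0] * 10000, rate=0.5)
    # Surviving entries are 2.0, so the average stays close to the input value 1.0.
    print(sum(out) / len(out))
```

Because the scaling is applied during training, nothing needs to be rescaled afterwards, which matches the statement that the learned parameters can be used directly in inference.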

In addition, you need to add a parameter dropoutRate to the SGD section to define the dropout rate. The rate is set in the SGD section, rather than as a parameter to Dropout() itself, so that training can start without dropout and enable it after a few epochs, which is a common scenario. To support this, dropoutRate is specified as a vector in which each value applies to a specific epoch.

When running inference, the Dropout() operation passes its input unmodified (it is a no-op).

Example

The following is a simple convolutional network with a dropout layer towards the end:

features = Input{...}
c = ConvolutionalLayer {32, (5:5), pad=true, activation=ReLU,
                        init="gaussian", initValueScale=0.0043} (features)
p = MaxPoolingLayer {(3:3), stride = (2:2)} (c)
h = DenseLayer {64, activation = ReLU, init = "gaussian", initValueScale = 12} (p)
d = Dropout (h)    #####
z  = LinearLayer {10, init = "gaussian", initValueScale = 1.5} (d)

The following is a corresponding entry in the SGD section, which uses no dropout for the first 3 epochs and then continues with a dropout rate of 50%. This example uses the asterisk (*) syntax to denote repetition:

SGD = {
    ...
    dropoutRate = 0*3:0.5
    ...
}
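To make the per-epoch meaning of this schedule concrete, here is a small Python sketch that expands a string such as 0*3:0.5 into the rate used for a given epoch. It is not CNTK's parser: the function name `dropout_rate_for_epoch` is hypothetical, epochs are assumed to be counted from zero, and only the simple `value` and `value*count` forms separated by colons are handled. Following the description above, the last value is assumed to apply to all remaining epochs.

```python
def dropout_rate_for_epoch(schedule, epoch):
    """Return the dropout rate for a zero-based epoch index.

    Hypothetical helper for illustration; handles only colon-separated
    items of the form "value" or "value*count".
    """
    rates = []
    for item in schedule.split(":"):
        value, _, count = item.partition("*")
        repeat = int(count) if count else 1
        rates.extend([float(value)] * repeat)
    # Assumption: the last listed value continues for all later epochs.
    return rates[min(epoch, len(rates) - 1)]


if __name__ == "__main__":
    for epoch in range(5):
        print(epoch, dropout_rate_for_epoch("0*3:0.5", epoch))
```

With the schedule from the example, the first three epochs get a rate of 0 and every later epoch gets 0.5.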