Problem with batch normalization in tensorflow
























I am having trouble understanding the implementation of batch normalization in TensorFlow. To illustrate, I have created a simple network with one input node, one hidden node, and one output node, and I run it on a single batch of size 2. My input x is a scalar feature with two values (i.e. a batch size of 2), one set to 0 and the other set to 1.



I run for one epoch, and write out the output from the hidden layer (before and after batch normalization) as well as the batch norm moving mean, variance, gamma, and beta.



Here is my code:



import tensorflow as tf
import numpy as np

N_HIDDEN_1 = 1
N_INPUT = 1
N_OUTPUT = 1

###########################################################

# DEFINE THE NETWORK

# Define placeholders for data that will be fed in during execution
x = tf.placeholder(tf.float32, (None, N_INPUT))
y = tf.placeholder(tf.float32, (None, N_OUTPUT))
lx = tf.placeholder(tf.float32)
training = tf.placeholder_with_default(False, shape=(), name='training')

# Hidden layer: dense -> batch norm -> relu
with tf.variable_scope('hidden1'):
    hidden_1 = tf.layers.dense(x, N_HIDDEN_1, activation=None, use_bias=False)
    bn_1 = tf.layers.batch_normalization(hidden_1, training=training, momentum=0.5)
    bn_1x = tf.nn.relu(bn_1)

# Output layer
with tf.variable_scope('output'):
    predx = tf.layers.dense(bn_1x, N_OUTPUT, activation=None, use_bias=False)
    pred = tf.layers.batch_normalization(predx, training=training, momentum=0.5)

###########################################################

# Define the cost function that is optimized when
# training the network and the optimizer

cost = tf.reduce_mean(tf.square(pred - y))

optimizer = tf.train.AdamOptimizer(learning_rate=lx).minimize(cost)

# Each call returns a list of the variables matching the given scope
bout1 = tf.global_variables('hidden1/batch_normalization/moving_mean:0')
bout2 = tf.global_variables('hidden1/batch_normalization/moving_variance:0')
bout3 = tf.global_variables('hidden1/batch_normalization/gamma:0')
bout4 = tf.global_variables('hidden1/batch_normalization/beta:0')

###########################################################

# Train network

init = tf.global_variables_initializer()
# Ops that update the batch norm moving mean and variance
extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)

with tf.Session() as sess:

    sess.run(init)

    # Create dummy data
    batchx = np.zeros((2, 1))
    batchy = np.zeros((2, 1))
    batchx[0, 0] = 0.0
    batchx[1, 0] = 1.0
    batchy[0, 0] = 3.0
    batchy[1, 0] = 4.0

    # One training step; the moving statistics are updated in the same run
    _, _ = sess.run([optimizer, extra_update_ops],
                    feed_dict={training: True, x: batchx, y: batchy, lx: 0.001})

    print('weight of hidden layer')
    W1 = np.array(sess.run(tf.global_variables('hidden1/dense/kernel:0')))
    W1x = np.sum(W1, axis=1)
    print(W1x)

    print()
    print('output from hidden layer, batch norm layer, and relu layer')
    hid1, b1, b1x = sess.run([hidden_1, bn_1, bn_1x],
                             feed_dict={training: False, x: batchx})
    print('hidden_1', hid1)
    print('bn_1', b1)
    print('bn_1x', b1x)

    print()
    print('batchnorm parameters')
    print('moving mean', sess.run(bout1))
    print('moving variance', sess.run(bout2))
    print('gamma', sess.run(bout3))
    print('beta', sess.run(bout4))


Here is the output I get when I run the code:



weight of hidden layer [[1.404974]]

output from hidden layer, batch norm layer, and relu layer
hidden_1 [[0. ]
[1.404974]]

bn_1 [[-0.40697935]
[ 1.215785 ]]

bn_1x [[0. ]
[1.215785]]

batchnorm parameters
moving mean [array([0.3514931], dtype=float32)]
moving variance [array([0.74709475], dtype=float32)]
gamma [array([0.999], dtype=float32)]
beta [array([-0.001], dtype=float32)]


I am puzzled by the resulting batchnorm parameters. In this particular case, the outputs from the hidden layer prior to applying the batch norm are the scalars 0 and 1.404974, but the batch norm moving mean is 0.3514931. This is with momentum = 0.5. It is not clear to me why the moving mean after 1 iteration is not exactly the average of 0 and 1.404974. I was under the impression that the momentum parameter would only kick in from the second batch on.



Any help would be much appreciated.










      tensorflow batch-normalization
















      asked Nov 9 at 22:26









      Prasad Kasibhatla























          1 Answer














Because you ran the optimizer in the same step, it's hard to see what is really happening inside: the hidden_1 values you are printing are not the ones that were used to update the batch norm statistics; they are the post-update values.



Anyway, I don't really see the issue:

Moving mean original value = 0.0
batch mean value = (1.404974 + 0.0) / 2.0 = ~0.7
Moving mean value = momentum * Moving mean original value + (1 - momentum) * batch mean value
= 0.5 * 0.0 + (1 - 0.5) * 0.7
= 0.35

Moving variance original value = 1.0
batch variance value = ~0.5
Moving variance value = momentum * Moving variance original value + (1 - momentum) * batch variance value
= 0.5 * 1.0 + (1 - 0.5) * 0.5
= 0.75
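The arithmetic above can be checked with a few lines of plain Python. This is only a sketch, not TensorFlow's implementation: the helper name `moving_average_update` is made up for illustration, the starting values 0.0 and 1.0 are the initial moving mean and variance used above, and the batch values come from the post-update weight printed in the question, so the results should land close to, but not exactly on, the reported 0.3514931 and 0.74709475.

```python
def moving_average_update(old, batch_stat, momentum):
    """Hypothetical helper: one exponential-moving-average step."""
    return momentum * old + (1.0 - momentum) * batch_stat


# Hidden-layer outputs printed in the question (post-update weight times x = 0 and 1)
hidden = [0.0, 1.404974]

batch_mean = sum(hidden) / len(hidden)
# Population (biased) variance, which is what the reported numbers are consistent with
batch_var = sum((h - batch_mean) ** 2 for h in hidden) / len(hidden)

# Initial moving mean 0.0 and moving variance 1.0, momentum 0.5 as in the question
moving_mean = moving_average_update(0.0, batch_mean, 0.5)
moving_var = moving_average_update(1.0, batch_var, 0.5)

print('moving mean', moving_mean)
print('moving variance', moving_var)
```

Both values come out within about 0.001 of the numbers reported in the question; the small remaining gap is expected if the statistics were computed from the pre-update weight.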

























• Thanks! I did not realize that the original values of the moving mean and variance are set to 0 and 1, respectively. I had assumed that the moving mean would come into play only from the second batch on, and therefore I was puzzled that the moving means did not match the true means for the first batch.
            – Prasad Kasibhatla
            Nov 10 at 21:18










• I have a quick follow-up question. What is the precision of the calculations in TensorFlow? I ask because when I set the momentum for the moving average to 0.0, I can only reproduce the batch mean (mean of output from hidden_1 in my case) to 2-3 decimal places; I expected to get much higher precision.
            – Prasad Kasibhatla
            Nov 12 at 23:31
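One possible explanation for that gap, in line with the answer's remark that the printed hidden_1 values are post-update values: the moving mean is computed from the activations before the optimizer step changes the weight. The rough check below uses only numbers printed in the question (from the momentum = 0.5 run) and assumes an initial moving mean of 0.0; the variable names are illustrative.

```python
momentum = 0.5
reported_moving_mean = 0.3514931   # printed in the question
post_update_weight = 1.404974      # printed in the question
learning_rate = 0.001              # value fed for lx in the question

# Invert the moving-average update, assuming the initial moving mean was 0.0
batch_mean_used = (reported_moving_mean - momentum * 0.0) / (1.0 - momentum)

# Inputs are 0 and 1 and the layer has no bias, so batch mean = weight * (0 + 1) / 2
pre_update_weight = 2.0 * batch_mean_used

weight_change = pre_update_weight - post_update_weight
print('inferred pre-update weight', pre_update_weight)
print('inferred weight change', weight_change)
```

The inferred weight change is about 0.001, the same size as the learning rate, which would account for agreement to only 2-3 decimal places rather than a float32 precision limit. Evaluating hidden_1 before running the training step and comparing against that would be one way to confirm this.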










          answered Nov 10 at 16:42









          Olivier Dehaene
