Problem with batch normalization in tensorflow
I am having trouble understanding the implementation of batch normalization in TensorFlow. To illustrate, I have created a simple network with one input node, one hidden node, and one output node, and I run it with 1 batch of batch size 2. My input x consists of a single scalar feature with 2 samples (i.e., a batch size of 2), one set to 0 and the other set to 1.
I run for one epoch, and write out the output from the hidden layer (before and after batch normalization) as well as the batch norm moving mean, variance, gamma, and beta.
Here is my code:
import tensorflow as tf
import numpy as np
N_HIDDEN_1 = 1
N_INPUT= 1
N_OUTPUT = 1
###########################################################
# DEFINE THE Network
# Define placeholders for data that will be fed in during execution
x = tf.placeholder(tf.float32, (None, N_INPUT))
y = tf.placeholder(tf.float32, (None, N_OUTPUT))
lx = tf.placeholder(tf.float32)  # learning rate
training = tf.placeholder_with_default(False, shape=(), name='training')
# Hidden layers with relu activation
with tf.variable_scope('hidden1'):
    hidden_1 = tf.layers.dense(x, N_HIDDEN_1, activation=None, use_bias=False)
    bn_1 = tf.layers.batch_normalization(hidden_1, training=training, momentum=0.5)
    bn_1x = tf.nn.relu(bn_1)
# Output layer
with tf.variable_scope('output'):
    predx = tf.layers.dense(bn_1x, N_OUTPUT, activation=None, use_bias=False)
    pred = tf.layers.batch_normalization(predx, training=training, momentum=0.5)
###########################################################
# Define the cost function that is optimized when
# training the network and the optimizer
cost = tf.reduce_mean(tf.square(pred-y))
optimizer = tf.train.AdamOptimizer(learning_rate=lx).minimize(cost)
bout1 = tf.global_variables('hidden1/batch_normalization/moving_mean:0')
bout2 = tf.global_variables('hidden1/batch_normalization/moving_variance:0')
bout3 = tf.global_variables('hidden1/batch_normalization/gamma:0')
bout4 = tf.global_variables('hidden1/batch_normalization/beta:0')
###########################################################
# Train network
init = tf.global_variables_initializer()
extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.Session() as sess:
    sess.run(init)
    # Create dummy data
    batchx = np.zeros((2,1))
    batchy = np.zeros((2,1))
    batchx[0,0] = 0.0
    batchx[1,0] = 1.0
    batchy[0,0] = 3.0
    batchy[1,0] = 4.0
    _, _ = sess.run([optimizer, extra_update_ops],
                    feed_dict={training: True, x: batchx, y: batchy, lx: 0.001})
    print('weight of hidden layer')
    W1 = np.array(sess.run(tf.global_variables('hidden1/dense/kernel:0')))
    W1x = np.sum(W1, axis=1)
    print(W1x)
    print()
    print('output from hidden layer, batch norm layer, and relu layer')
    hid1, b1, b1x = sess.run([hidden_1, bn_1, bn_1x],
                             feed_dict={training: False, x: batchx})
    print('hidden_1', hid1)
    print('bn_1', b1)
    print('bn_1x', b1x)
    print()
    print('batchnorm parameters')
    print('moving mean', sess.run(bout1))
    print('moving variance', sess.run(bout2))
    print('gamma', sess.run(bout3))
    print('beta', sess.run(bout4))
Here is the output I get when I run the code:
weight of hidden layer [[1.404974]]
output from hidden layer, batch norm layer, and relu layer
hidden_1 [[0. ]
[1.404974]]
bn_1 [[-0.40697935]
[ 1.215785 ]]
bn_1x [[0. ]
[1.215785]]
batchnorm parameters
moving mean [array([0.3514931], dtype=float32)]
moving variance [array([0.74709475], dtype=float32)]
gamma [array([0.999], dtype=float32)]
beta [array([-0.001], dtype=float32)]
I am puzzled by the resulting batch norm parameters. In this particular case, the outputs from the hidden layer prior to batch normalization are the scalars 0 and 1.404974, yet the batch norm moving mean is 0.3514931. This is with momentum = 0.5. It is not clear to me why the moving mean after one iteration is not exactly the average of 0 and 1.404974; I was under the impression that the momentum parameter would only kick in from the second batch on.
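For reference, the value I expected the moving mean to hold after this single batch is just the plain average of the two printed hidden-layer outputs (a quick check in plain Python):

# Average of the two hidden-layer outputs printed above -- the value I
# expected moving_mean to equal after one batch.
expected_mean = (0.0 + 1.404974) / 2.0
print(expected_mean)  # 0.702487, not the 0.3514931 that is actually stored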
Any help would be much appreciated.
tensorflow batch-normalization
1 Answer
Because you ran the optimizer, it's hard to know what is really happening inside: the hidden_1 values you are printing are not the ones that were used to update the batch norm statistics; they are the post-update values.
That said, I don't really see an issue:
Moving mean initial value = 0.0
Batch mean = (0.0 + 1.404974) / 2.0 ≈ 0.70
New moving mean = momentum * old moving mean + (1 - momentum) * batch mean
                = 0.5 * 0.0 + (1 - 0.5) * 0.70
                ≈ 0.35
Moving variance initial value = 1.0
Batch variance (biased, divide by N) = ((0.0 - 0.70)^2 + (1.404974 - 0.70)^2) / 2.0 ≈ 0.49
New moving variance = momentum * old moving variance + (1 - momentum) * batch variance
                    = 0.5 * 1.0 + (1 - 0.5) * 0.49
                    ≈ 0.75
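This update rule can be checked numerically with the values the script prints. Here is a minimal NumPy sketch; note that the printed hidden_1 activations come from the post-update weights, so the result only approximately matches the stored statistics:

import numpy as np

momentum = 0.5
moving_mean, moving_var = 0.0, 1.0   # TF initializes moving_mean to 0 and moving_variance to 1

# Hidden-layer outputs for the batch, as printed by the script
# (post-update values, hence the small mismatch with the stored statistics).
hidden = np.array([0.0, 1.404974])

batch_mean = hidden.mean()   # ~0.7025
batch_var = hidden.var()     # biased (divide-by-N) variance, ~0.4935

# Update applied by tf.layers.batch_normalization at each training step:
# new = momentum * old + (1 - momentum) * batch_statistic
moving_mean = momentum * moving_mean + (1 - momentum) * batch_mean
moving_var = momentum * moving_var + (1 - momentum) * batch_var

print(moving_mean, moving_var)   # ~0.351, ~0.747 vs. the reported 0.3514931, 0.74709475

To capture the exact activations that went into the update, one could fetch hidden_1 in the same sess.run call as the training step, e.g. sess.run([optimizer, extra_update_ops, hidden_1], ...), since that forward pass should use the pre-update weights.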
Thanks! I did not realize that the initial values of the moving mean and variance are set to 0 and 1, respectively. I had assumed that the moving average would only come into play from the second batch on, which is why I was puzzled that the moving mean did not match the true batch mean for the first batch.
– Prasad Kasibhatla, Nov 10 at 21:18
I have a quick follow-up question: what is the precision of the calculations in TensorFlow? I ask because when I set the momentum of the moving average to 0.0, I can only reproduce the batch mean (the mean of the hidden_1 output in my case) to 2-3 decimal places; I expected much higher precision.
– Prasad Kasibhatla, Nov 12 at 23:31