NaN in the expected values, even though masked, introduces NaN in weight matrix


To deal with missing data, I wrote the following model and ran it; the output is given below. The NaN expected values are masked by loss_0_where_nan, and the history shows that the loss is indeed evaluated to 0.0. Why does the training step on those values nonetheless introduce NaN into the weight matrices of both hidden and max_min_pred? I first suspected that the optimizer weights individual parameter updates by the output values, which might have been specific to Adadelta, but the same thing happens with SGD.




import keras
from keras.models import Model
from keras.optimizers import Adadelta
from keras.losses import mean_squared_error
from keras.layers import Input, Dense

import tensorflow as tf
import numpy

def loss_0_where_nan(loss_function, msg=""):
    def filtered_loss_function(y_true, y_pred):
        with_nans = loss_function(y_true, y_pred)
        nans = tf.is_nan(with_nans)
        filtered = tf.where(nans, tf.zeros_like(with_nans), with_nans)
        filtered = tf.Print(filtered,
                            [y_true, y_pred, nans, with_nans, filtered],
                            message=msg)
        return filtered
    return filtered_loss_function

input = Input(shape=(3,))

hidden = Dense(2)(input)
min_pred = Dense(1)(hidden)
max_min_pred = Dense(1)(hidden)

model = Model(inputs=[input],
              outputs=[min_pred, max_min_pred])

model.compile(
    optimizer=Adadelta(),
    loss=[loss_0_where_nan(mean_squared_error, "aux: "),
          loss_0_where_nan(mean_squared_error, "main: ")],
    loss_weights=[0.2, 1.0])

def random_values(n, missing=False):
    for i in range(n):
        x = numpy.random.random(size=(2, 3))
        _min = numpy.minimum(x[..., 0], x[..., 1])
        if missing:
            _max_min = numpy.full((len(x), 1), numpy.nan)
        else:
            _max_min = numpy.maximum(_min, x[..., 2]).reshape((-1, 1))
        # print(x, numpy.array(_min).reshape((-1, 1)), numpy.array(_max_min), sep="\n", end="\n\n")
        yield x, [numpy.array(_min).reshape((-1, 1)), numpy.array(_max_min)]

model.fit_generator(random_values(2, False),
                    steps_per_epoch=2,
                    verbose=False)
print("With missing")
history = model.fit_generator(random_values(1, True),
                              steps_per_epoch=1,
                              verbose=False)
print("Normal")
model.fit_generator(random_values(2, False),
                    steps_per_epoch=2,
                    verbose=False)

print(history.history)



Output:


main: [[0.29131493][0.769406676]][[-1.38235903][-3.32388687]][0 0][2.80118465 16.7550526][2.80118465 16.7550526]
aux: [[0.0422333851][0.0949674547]][[1.01466811][0.648737907]][0 0][0.945629239 0.306661695][0.945629239 0.306661695]
main: [[0.451149166][0.671600938]][[-2.46504498][-2.74316335]][0 0][8.50418854 11.6606159][8.50418854 11.6606159]
aux: [[0.451149166][0.355992794]][[0.893445313][0.917516708]][0 0][0.195625886 0.315309107][0.195625886 0.315309107]
With missing
aux: [[0.406784][0.44401589]][[0.852455556][1.23527527]][0 0][0.198623136 0.62609148][0.198623136 0.62609148]
main: [[nan][nan]][[-3.2140317][-2.22139478]][1 1][nan nan][0 0]
Normal
aux: [[0.490041673][0.00489727268]][[nan][nan]][1 1][nan nan][0 0]
main: [[0.867286][0.949406743]][[nan][nan]][1 1][nan nan][0 0]
aux: [[0.630184174][0.391073674]][[nan][nan]][1 1][nan nan][0 0]
main: [[0.630184174][0.391073674]][[nan][nan]][1 1][nan nan][0 0]
{'loss': [0.08247146010398865], 'dense_1_loss': [0.41235730051994324], 'dense_2_loss': [0.0]}




1 Answer



It seems like a problem similar to this TF issue about tf.where().



When y_true is nan, the gradient of filtered = tf.where(nans, tf.zeros_like(with_nans), with_nans) is computed as d/dw (filtered) = 1 * d/dw (tf.zeros_like) + 0 * d/dw (with_nans). Since d/dw (with_nans) is nan in this case, and 0 * nan is nan in floating-point arithmetic, the final gradient is 1 * 0 + 0 * nan = nan.
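The arithmetic above can be reproduced without TensorFlow. The following is a minimal sketch, not TensorFlow's implementation: it assumes a hypothetical one-weight model y_pred = w * x with a squared-error loss and applies the chain rule by hand. The name masked_loss_grad and its parameters are illustrative only.

```python
import math

def masked_loss_grad(w, x, y_true):
    """Hand-written gradient of where(isnan(loss), 0, loss) with respect to w.

    Assumed toy model (not from the question): y_pred = w * x,
    loss = (y_pred - y_true) ** 2.
    """
    y_pred = w * x
    loss = (y_pred - y_true) ** 2
    loss_is_nan = math.isnan(loss)

    # Backward through the selection: the incoming gradient 1.0 is routed
    # to the selected branch, so the loss branch receives 0.0 when masked.
    upstream_to_loss = 0.0 if loss_is_nan else 1.0

    # Backward through the squared error: this local derivative is nan
    # whenever y_true is nan.
    dloss_dw = 2.0 * (y_pred - y_true) * x

    # Chain rule: 0.0 * nan is nan, so the mask does not stop the nan.
    return upstream_to_loss * dloss_dw

print(masked_loss_grad(0.5, 2.0, 3.0))       # finite gradient
print(masked_loss_grad(0.5, 2.0, math.nan))  # nan, although the masked loss is 0
```

The forward value is masked correctly, but the backward pass still multiplies the zero by a nan local derivative, which matches the behaviour seen in the question.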





To avoid this issue, instead of setting the nan loss values to 0, you can set y_true to y_pred in order to get 0 loss values whenever y_true is nan.




def filtered_loss_function(y_true, y_pred):
    nans = tf.is_nan(y_true)
    masked_y_true = tf.where(nans, y_pred, y_true)
    filtered = loss_function(masked_y_true, y_pred)
    return filtered



Since filtered no longer depends on nan values (the values are masked out before entering the loss function), the gradients will not have nans.




>>> model.get_weights()
[array([[ 0.9761261 , -0.7472908 ],
        [-0.12295872,  0.39413464],
        [-0.16676795,  0.30844116]], dtype=float32),
 array([-0.00581209,  0.00300716], dtype=float32),
 array([[-0.31789184],
        [-0.87912357]], dtype=float32),
 array([0.00628144], dtype=float32),
 array([[-1.0932552 ],
        [ 0.11788104]], dtype=float32),
 array([0.00575602], dtype=float32)]
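Why substituting y_pred for the missing y_true keeps the gradient finite can also be checked by hand. This is a rough sketch under the same assumed toy model as before (a single weight, y_pred = w * x, squared error); safe_masked_loss_grad is a hypothetical name, and the code only mirrors the idea of the fix, not the Keras internals.

```python
import math

def safe_masked_loss_grad(w, x, y_true):
    """Loss and d(loss)/dw when a nan target is replaced by the prediction.

    Assumed toy model (not from the question): y_pred = w * x,
    loss = (y_pred - masked_y_true) ** 2.
    """
    y_pred = w * x
    missing = math.isnan(y_true)

    # Mask the target before it enters the loss, as in the answer's fix.
    masked_y_true = y_pred if missing else y_true

    # The substituted target follows y_pred (derivative x); a real target
    # is a constant (derivative 0).
    dmasked_dw = x if missing else 0.0

    residual = y_pred - masked_y_true
    loss = residual ** 2
    grad = 2.0 * residual * (x - dmasked_dw)
    return loss, grad

print(safe_masked_loss_grad(0.5, 2.0, 3.0))       # ordinary loss and gradient
print(safe_masked_loss_grad(0.5, 2.0, math.nan))  # zero loss, zero gradient
```

Because the nan is replaced before any arithmetic touches it, every intermediate value stays finite, so there is nothing for the backward pass to propagate.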


