Non-deterministic behavior of TensorFlow while_loop()
I have implemented an algorithm using TensorFlow's while_loop with large matrices, and I recently noticed strange behavior: I get different results on different runs, sometimes even nan values. I have spent some time narrowing down the problem and now have the following minimal example. I take a large matrix K of size 15000x15000 filled with ones and calculate K⁵u for a vector u filled with ones. After one iteration, I expect the result to be a vector filled with 15000. But this is not what happens.
import numpy as np
import tensorflow as tf

n = 15000
np_kernel_mat = np.ones((n, n), dtype=np.float32)
kernel_mat = tf.constant(np_kernel_mat)

# for debugging
def compare_kernel(kernel_matrix):
    print("AverageDifference:" + str(np.average(np.abs(np_kernel_mat - kernel_matrix))))
    print("AmountDifferent:" + str(np.count_nonzero(np.abs(np_kernel_mat - kernel_matrix))))
    return True

# body of the loop
def iterate(i, u):
    # for debugging
    with tf.control_dependencies(tf.py_func(compare_kernel, [kernel_mat], [tf.bool])):
        u = tf.identity(u)
    # multiply
    u = tf.matmul(kernel_mat, u)
    # check result and kernel
    u = tf.Print(u, [tf.count_nonzero(tf.abs(kernel_mat-np_kernel_mat))], "AmountDifferentKernel: ")
    u = tf.Print(u, [tf.count_nonzero(tf.abs(u-float(n)))], "AmountDifferentRes: ")
    i = i + 1
    return i, u

def cond(i, u):
    return tf.less(i, 5)

u0 = tf.fill((n, 1), 1.0, name='u0')
iu_0 = (tf.constant(0), u0)
iu_final = tf.while_loop(cond, iterate, iu_0, back_prop=False, parallel_iterations=1)
u_res = iu_final[1]

with tf.Session() as sess:
    kernel_mat_eval, u_res_eval = sess.run([kernel_mat, u_res])
    print(np.array_equal(kernel_mat_eval, np_kernel_mat))
Now running this I get the following output:
I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX TITAN X major: 5 minor: 2 memoryClockRate(GHz): 1.076
pciBusID: 0000:00:0f.0
totalMemory: 11.93GiB freeMemory: 11.81GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 11435 MB memory) -> physical GPU (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:00:0f.0, compute capability: 5.2)
minimal_example.py:25: RuntimeWarning: invalid value encountered in subtract
  print("AverageDifference:" + str(np.average(np.abs(np_kernel_mat - kernel_matrix))))
/usr/local/lib/python3.6/dist-packages/numpy/core/_methods.py:70: RuntimeWarning: overflow encountered in reduce
  ret = umr_sum(arr, axis, dtype, out, keepdims)
AverageDifference:nan
minimal_example.py:26: RuntimeWarning: invalid value encountered in subtract
  print("AmountDifferent:" + str(np.count_nonzero(np.abs(np_kernel_mat - kernel_matrix))))
AmountDifferent:4096
AmountDifferentKernel: [0]
AmountDifferentRes, DifferenceRes: [4][inf]
AverageDifference:nan
AmountDifferent:4096
AmountDifferentKernel: [0]
AmountDifferentRes, DifferenceRes: [15000][nan]
AverageDifference:nan
AmountDifferent:4096
AmountDifferentKernel: [0]
AmountDifferentRes, DifferenceRes: [15000][nan]
AverageDifference:nan
...
It is clear that from the second iteration on the result is no longer 15000, but that does not explain why the difference is nan. On CPU, everything works fine (the difference is then something like 2e08).
Now my questions are:
Why is the output of the Print op different from the output of the py_func print?
Why is the evaluation of the matrix again equal to the original matrix?
Why do I get different results over different runs?
Can someone reproduce this?
I am running this on Ubuntu 16.04, TensorFlow 1.8, numpy 1.14, python3.6.
The GPU is a GeForce GTX 1080.
NVRM version: NVIDIA UNIX x86_64 Kernel Module 390.48 Thu Mar 22 00:42:57 PDT 2018
GCC version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.9)
I cannot reproduce your results on TF 1.10.1 running on a GTX 1080 Ti (Ubuntu, python 3.6).
– P-Gn
Sep 20 '18 at 14:28
Neither can I with TF 1.8 (same environment).
– P-Gn
Sep 20 '18 at 15:19
I have updated to TF 1.10 and the problem seems to have disappeared: however, I still get large deviations from the expected result for large matrices (I changed the example to compare the result against the actual expected result of n^(i+1)).
– Lia Fiona
Sep 22 '18 at 10:46
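For reference, the expected values n^(i+1) mentioned in this comment can be sketched in plain Python and NumPy, without TensorFlow. This is only an illustration under the assumption that K and u are filled with ones as in the question; the names expected and f32_max are introduced here and do not appear in the original code.

```python
import numpy as np

n = 15000

# With K and u filled with ones, every entry of the result after
# iteration i (i = 0..4) should equal n**(i+1). Python ints are exact.
expected = [n ** (i + 1) for i in range(5)]

# Largest finite float32 value, to check that none of the expected
# values overflows the float32 range.
f32_max = float(np.finfo(np.float32).max)

for i, value in enumerate(expected):
    # Round the exact value to float32 and report the rounding error.
    as_f32 = float(np.float32(float(value)))
    print(i, value, "fits in float32:", value < f32_max,
          "rounding error:", abs(as_f32 - value))
```

All five values are far below the float32 maximum, so a correct float32 computation on this input should not overflow to inf or nan.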
@LiaFiona Okay, I can see that behavior now, both with GPU and CPU. It is definitely a precision problem; switching to float64 reduces it a lot. Note that the numbers you are computing in the last iteration are about 7.6×10²⁰, so an error of 1.3×10¹⁵ is "relatively small" (float32 is typically precise to about 7 decimal digits, but I guess the error accumulates through the iterations).
– jdehesa
Sep 25 '18 at 9:32
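The precision argument in this comment can be checked with a small NumPy sketch. It assumes the magnitudes quoted above (a final value of about 15000⁵ and an absolute error of 1.3×10¹⁵); the variable names are made up for illustration.

```python
import numpy as np

value = 15000.0 ** 5   # about 7.6e20, the magnitude of the last iteration
abs_err = 1.3e15       # absolute error quoted in the comment

# Gap between adjacent representable numbers at this magnitude.
ulp32 = float(np.spacing(np.float32(value)))
ulp64 = float(np.spacing(np.float64(value)))

# Relative error, compared with the float32 machine epsilon.
rel_err = abs_err / value
eps32 = float(np.finfo(np.float32).eps)

print("float32 spacing near value:", ulp32)
print("float64 spacing near value:", ulp64)
print("relative error:", rel_err)
print("relative error in units of float32 eps:", rel_err / eps32)
```

At this magnitude neighboring float32 values are on the order of 10¹³ apart, so the quoted error corresponds to only a few tens of rounding steps, which fits the idea of error accumulating over the iterations and over the 15000-term sums in each matmul.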
Just to drop by and say that this is a really well posed question. Thanks for taking the time to produce it.
– Jorge Leitão
Sep 17 '18 at 15:54