Non-deterministic behavior of TensorFlow while_loop()

I have implemented an algorithm using TensorFlow while_loop with large matrices, and I recently noticed strange behavior: I get different results on different runs, sometimes even nan values. I have spent some time narrowing down the problem and now have the following minimal example. I take a large matrix K of size 15000x15000 filled with ones, and then calculate K⁵u for the vector u filled with ones. After one iteration, I expect the result to be a vector filled with 15000. But this is not what happens.

    import numpy as np
    import tensorflow as tf

    n = 15000
    np_kernel_mat = np.ones((n, n), dtype=np.float32)
    kernel_mat = tf.constant(np_kernel_mat)

    # for debugging
    def compare_kernel(kernel_matrix):
        print("AverageDifference:" + str(np.average(np.abs(np_kernel_mat - kernel_matrix))))
        print("AmountDifferent:" + str(np.count_nonzero(np.abs(np_kernel_mat - kernel_matrix))))
        return True

    # body of the loop
    def iterate(i, u):
        # for debugging
        with tf.control_dependencies(tf.py_func(compare_kernel, [kernel_mat], [tf.bool])):
            u = tf.identity(u)
        # multiply
        u = tf.matmul(kernel_mat, u)
        # check result and kernel
        u = tf.Print(u, [tf.count_nonzero(tf.abs(kernel_mat-np_kernel_mat))], "AmountDifferentKernel: ")
        u = tf.Print(u, [tf.count_nonzero(tf.abs(u-float(n)))], "AmountDifferentRes: ")
        i = i + 1
        return i, u

    def cond(i, u):
        return tf.less(i, 5)

    u0 = tf.fill((n, 1), 1.0, name='u0')
    iu_0 = (tf.constant(0), u0)
    iu_final = tf.while_loop(cond, iterate, iu_0, back_prop=False, parallel_iterations=1)
    u_res = iu_final[1]

    with tf.Session() as sess:
        kernel_mat_eval, u_res_eval = sess.run([kernel_mat, u_res])
        print(np.array_equal(kernel_mat_eval, np_kernel_mat))

Now running this I get the following output:

    I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
    I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
    name: GeForce GTX TITAN X major: 5 minor: 2 memoryClockRate(GHz): 1.076
    pciBusID: 0000:00:0f.0
    totalMemory: 11.93GiB freeMemory: 11.81GiB
    I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
    I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
    I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
    I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
    I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 11435 MB memory) -> physical GPU (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:00:0f.0, compute capability: 5.2)
    minimal_example.py:25: RuntimeWarning: invalid value encountered in subtract
      print("AverageDifference:" + str(np.average(np.abs(np_kernel_mat - kernel_matrix))))
    /usr/local/lib/python3.6/dist-packages/numpy/core/_methods.py:70: RuntimeWarning: overflow encountered in reduce
      ret = umr_sum(arr, axis, dtype, out, keepdims)
    AverageDifference:nan
    minimal_example.py:26: RuntimeWarning: invalid value encountered in subtract
      print("AmountDifferent:" + str(np.count_nonzero(np.abs(np_kernel_mat - kernel_matrix))))
    AmountDifferent:4096
    AmountDifferentKernel: [0]
    AmountDifferentRes, DifferenceRes: [4][inf]
    AverageDifference:nan
    AmountDifferent:4096
    AmountDifferentKernel: [0]
    AmountDifferentRes, DifferenceRes: [15000][nan]
    AverageDifference:nan
    AmountDifferent:4096
    AmountDifferentKernel: [0]
    AmountDifferentRes, DifferenceRes: [15000][nan]
    AverageDifference:nan
    ...

It is clear that in the second iteration, the result is not 15000 anymore, but that doesn't explain why the difference is nan. On CPU, everything works fine (the difference is then something like 2e08).
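
For reference, the exact expected values can be checked with a small NumPy sketch. It assumes the same all-ones setup as in the question, but with a small n and float64 so that every product is exact. The helper name power_iterate is hypothetical and not part of the original script. After iteration i (counting from 0) every entry should equal n^(i+1), so the comparison against float(n) can only succeed in the first iteration. For n = 15000 the difference in the second iteration is n² - n, which is consistent with the value of roughly 2e08 seen on CPU.

```python
import numpy as np


def power_iterate(n, steps, dtype=np.float64):
    """Apply an all-ones n x n matrix to an all-ones vector `steps` times.

    Hypothetical helper for illustration; returns the vector after each step.
    """
    kernel = np.ones((n, n), dtype=dtype)
    u = np.ones((n, 1), dtype=dtype)
    history = []
    for _ in range(steps):
        u = kernel @ u
        history.append(u.copy())
    return history


# Small n keeps this cheap and exact; the question uses n = 15000.
n = 4
history = power_iterate(n, steps=5)
for i, u in enumerate(history):
    expected = float(n) ** (i + 1)
    print(i, u[0, 0], expected)

# For the question's n, the difference from float(n) in the second iteration:
n_question = 15000
diff_second_iteration = n_question ** 2 - n_question
print(diff_second_iteration)  # 224985000
```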

Now my questions are:
- Why is the output of the Print op different to the output of the py_func print?
- Why is the evaluation of the matrix again equal to the original matrix?
- Why do I get different results over different runs?
- Can someone reproduce this?

I am running this on Ubuntu 16.04, TensorFlow 1.8, NumPy 1.14, Python 3.6.
The GPU is a GeForce GTX 1080.

    NVRM version: NVIDIA UNIX x86_64 Kernel Module 390.48 Thu Mar 22 00:42:57 PDT 2018
    GCC version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.9)

Just to drop by and say that this is a really well posed question. Thanks for taking the time to produce it.

– Jorge Leitão
Sep 17 '18 at 15:54

I cannot reproduce your results on TF 1.10.1 running on a GTX 1080 Ti (Ubuntu, python 3.6).

– P-Gn
Sep 20 '18 at 14:28

Neither can I with TF 1.8 (same environment).

– P-Gn
Sep 20 '18 at 15:19

I have updated to TF 1.10 and the problem seems to have disappeared. However, I still get large deviations from the expected result for large matrices (I changed the example to compare the result against the actual expected result of n^(i+1)).

– Lia Fiona
Sep 22 '18 at 10:46

@LiaFiona Okay, I can see that behavior now, both with GPU and CPU. It is definitely a precision problem; switching to float64 reduces it a lot. Note that the numbers you are computing in the last iteration are about 7.6×10²⁰, so an error of 1.3×10¹⁵ is "relatively small" (float32 is typically precise to about 7 significant decimal digits, but I guess the error accumulates through the iterations).

– jdehesa
Sep 25 '18 at 9:32
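
To put those magnitudes in perspective, here is a rough NumPy sketch (not the original TensorFlow computation). It measures the spacing between adjacent representable values near n⁵ in float32 and float64, then expresses the quoted error of 1.3×10¹⁵ in units of that spacing. The helper scaled is a hypothetical scalar stand-in for the matrix product: it only shows how rounding after every step accumulates and ignores the summation order inside a real matmul.

```python
import numpy as np

n = 15000
exact = n ** 5  # exact Python integer, about 7.6e20

# Spacing between adjacent representable values near n**5.
ulp32 = float(np.spacing(np.float32(float(exact))))
ulp64 = float(np.spacing(np.float64(float(exact))))

reported_error = 1.3e15  # absolute error quoted in the comment above
print("float32 spacing near n**5:", ulp32)
print("float64 spacing near n**5:", ulp64)
print("reported error in float32 spacings:", reported_error / ulp32)
print("reported relative error:", reported_error / exact)


def scaled(dtype, steps=5):
    """Multiply 1 by n `steps` times, rounding to `dtype` after every step.

    Hypothetical simplification: a scalar stand-in for the repeated
    matrix-vector product, used only to show accumulated rounding.
    """
    u = dtype(1.0)
    for _ in range(steps):
        u = dtype(u * dtype(n))
    return float(u)


# Compare against the exact integer so the reference itself is not rounded.
rel_err32 = abs(int(scaled(np.float32)) - exact) / exact
rel_err64 = abs(int(scaled(np.float64)) - exact) / exact
print("relative error, float32:", rel_err32)
print("relative error, float64:", rel_err64)
```

Under these assumptions the quoted error corresponds to a few tens of float32 spacings at that magnitude, which fits the reading that it is small in relative terms.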
