cudnnLSTM won't restore into a cudnnCompatibleLSTM

I'm trying to train an elementary network on a GPU machine (AWS p3.2xlarge, Volta) with TF 1.9 / 1.10. Plain TensorFlow only, no Keras.



Based on the [rather limited] documentation, my aim is to train with a CudnnLSTM cell, save a checkpoint, and then restore it for inference on a CPU. For that, CudnnCompatibleLSTMCell looked like the way to go, since it is supposed to pick up the weights from the GPU-specific LSTM implementation.



I get the following error, no matter what I try:


NotFoundError (see above for traceback): Key caseTesting/testbed/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias not found in checkpoint
[[Node: caseTesting/testbed/save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_caseTesting/testbed/save/Const_0_0, caseTesting/testbed/save/RestoreV2/tensor_names, caseTesting/testbed/save/RestoreV2/shape_and_slices)]]



A related issue is that CudnnCompatibleLSTMCell and CudnnLSTM are not mathematically identical. I get different results for freshly initialized cells [initialized with a tf.constant() initializer, no save/restore involved]. CudnnLSTM also seems to depend on the random seed [dropout is zero], which suggests it does some internal variable initialization of its own, separating it from CudnnCompatibleLSTMCell.



Does anybody have a clue?




1 Answer
1



Some answers:



Assuming you've read the documentation on CudnnLSTM + CudnnCompatibleLSTMCell [sadly, it mostly lives in code comments]: the checkpoint keys written by CudnnLSTM are prefixed with the scope it was built under (by default cudnn_lstm), so the inference-side cells must be created under that same scope:


with tf.variable_scope("cudnn_lstm"):



Nothing happy to say about the mathematical inequivalence of CudnnLSTM and the standard LSTM. In particular, I'm not sure yet how to initialise the forget-gate bias to 1.0 (CudnnCompatibleLSTMCell matches cuDNN by using a forget bias of 0), although I'm sure this can be done with some hacking.
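One possible hack, assuming TF's usual [input, cell, forget, output] gate ordering in the concatenated bias vector (an assumption worth verifying against your TF version): after restoring, add 1.0 to the forget-gate slice of the bias. A pure-NumPy sketch of the slice arithmetic:

```python
import numpy as np

num_units = 4

# Stand-in for the restored bias vector: the four gate biases are stored
# concatenated as [input, cell, forget, output] (assumed order), so the
# full vector has length 4 * num_units.
bias = np.zeros(4 * num_units, dtype=np.float32)

# Add 1.0 to the forget-gate slice only.
bias[2 * num_units:3 * num_units] += 1.0

# In the TF graph this would be an assign on the restored bias variable
# (hypothetical variable name bias_var), e.g. run once after saver.restore():
#   forget_slice = bias_var[2 * num_units:3 * num_units]
#   sess.run(tf.assign(forget_slice, forget_slice + 1.0))
print(bias.reshape(4, num_units))
```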





Btw, the performance gain [on AWS p3.2xlarge] on my current architecture -- which has a significant DNN after the LSTM layers -- is not as dramatic as advertised. The CudnnLSTM-based network trains 2-3x faster than the CudnnCompatible one and only 1.5-2.5x faster than a BlockFused-based network. The standard LSTMCell performs slightly slower than the CudnnCompatible version.
– amitmi
Aug 29 at 9:51






