cudnnLSTM won't restore into a cudnnCompatibleLSTM

I'm trying to train an elementary network on a GPU machine (AWS p3.2xlarge, Volta) with TF 1.9 / 1.10. Plain TensorFlow only, no Keras.



Based on the [rather limited] documentation, my aim is to train with a CudnnLSTM cell, save a checkpoint, and then restore it for inference on a CPU. For that, CudnnCompatibleLSTMCell looked like the way to go, since it is supposed to pick up the weights from the GPU-specific LSTM implementation.



I get the following error, no matter what I try:


NotFoundError (see above for traceback): Key caseTesting/testbed/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias not found in checkpoint
[[Node: caseTesting/testbed/save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_caseTesting/testbed/save/Const_0_0, caseTesting/testbed/save/RestoreV2/tensor_names, caseTesting/testbed/save/RestoreV2/shape_and_slices)]]



A related issue is that CudnnCompatibleLSTMCell and CudnnLSTM are not mathematically identical. I get different results for freshly initialized cells [initialized with a tf.constant() initializer, no save/restore involved]. CudnnLSTM also seems to depend on the random seed [dropout is zero], which suggests it does some internal variable initialization of its own, separating it from CudnnCompatibleLSTMCell.



Does anybody have a clue?




1 Answer
1



Some answers:



Assuming you've read the documentation on CudnnLSTM + CudnnCompatibleLSTMCell [sadly, it mostly lives in code comments]: the checkpoint keys written by CudnnLSTM are prefixed with the scope it was built under (by default cudnn_lstm), so the inference-side cells must be created under that same scope:


with tf.variable_scope("cudnn_lstm"):



Nothing happy to say about the mathematical inequivalence of CudnnLSTM and the standard LSTM. In particular, I'm not sure yet how to initialise the forget-gate bias to 1.0 (CudnnCompatibleLSTMCell matches cuDNN by using a forget bias of 0), although I'm sure this can be done with some hacking.
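One possible hack, assuming TF's usual [input, cell, forget, output] gate ordering in the concatenated bias vector (an assumption worth verifying against your TF version): after restoring, add 1.0 to the forget-gate slice of the bias. A pure-NumPy sketch of the slice arithmetic:

```python
import numpy as np

num_units = 4

# Stand-in for the restored bias vector: the four gate biases are stored
# concatenated as [input, cell, forget, output] (assumed order), so the
# full vector has length 4 * num_units.
bias = np.zeros(4 * num_units, dtype=np.float32)

# Add 1.0 to the forget-gate slice only.
bias[2 * num_units:3 * num_units] += 1.0

# In the TF graph this would be an assign on the restored bias variable
# (hypothetical variable name bias_var), e.g. run once after saver.restore():
#   forget_slice = bias_var[2 * num_units:3 * num_units]
#   sess.run(tf.assign(forget_slice, forget_slice + 1.0))
print(bias.reshape(4, num_units))
```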





Btw, the performance gain [on AWS p3.2xlarge] on my current architecture -- which has a significant DNN after the LSTM layers -- is not as dramatic as advertised. The CudnnLSTM-based network trains 2-3x faster than the CudnnCompatible one and only 1.5-2.5x faster than a BlockFused-based network. The standard LSTMCell performs slightly slower than the CudnnCompatible version.
– amitmi
Aug 29 at 9:51






