Implementing conv layers in lstm network











I am trying to build an English-to-French translator. I have a basic model that works fairly well:



Average step time: 232.3



Final loss: 0.4969



Model:




Layer (type)                 Output Shape              Param #
=================================================================
embedding_1 (Embedding)      (None, 15, 336)           67200
_________________________________________________________________
lstm_1 (LSTM)                (None, 256)               607232
_________________________________________________________________
repeat_vector_1 (RepeatVecto (None, 21, 256)           0
_________________________________________________________________
dropout_1 (Dropout)          (None, 21, 256)           0
_________________________________________________________________
lstm_2 (LSTM)                (None, 21, 256)           525312
_________________________________________________________________
dropout_2 (Dropout)          (None, 21, 256)           0
_________________________________________________________________
time_distributed_1 (TimeDist (None, 21, 336)           86352
=================================================================
Total params: 1,286,096
Trainable params: 1,286,096
Non-trainable params: 0
_________________________________________________________________
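
As a sanity check, the "Param #" column in the summary above can be reproduced by hand. The following is a minimal sketch, assuming the usual Keras parameter formulas for Embedding, LSTM and Dense layers. The helper names are hypothetical and not part of Keras, and the input vocabulary size of 200 is inferred from 67200 / 336 rather than stated in the question.

```python
# Hypothetical helpers (not part of Keras) that reproduce the "Param #" column.

def embedding_params(input_dim, output_dim):
    # One output_dim-sized vector per vocabulary entry.
    return input_dim * output_dim

def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights and a bias.
    return 4 * ((input_dim + units) * units + units)

def dense_params(input_dim, units):
    # Weight matrix plus bias; TimeDistributed shares it across timesteps.
    return input_dim * units + units

# Sizes read off the summary: embedding width 336, two 256-unit LSTMs,
# 336 output classes. The vocabulary size 200 is an inferred assumption.
counts = [
    embedding_params(200, 336),  # embedding_1
    lstm_params(336, 256),       # lstm_1
    lstm_params(256, 256),       # lstm_2
    dense_params(256, 336),      # time_distributed_1
]
print(counts, sum(counts))  # [67200, 607232, 525312, 86352] 1286096
```

The total matches the 1,286,096 parameters reported by the summary.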


Python:



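from keras.models import Sequential
from keras.layers import Embedding, LSTM, RepeatVector, Dropout, TimeDistributed, Dense

model = Sequential()
# Embedding(input_dim, output_dim): the embedding width is set to fr_vocab_size here
model.add(Embedding(en_vocab_size, fr_vocab_size, input_length=en_max_len, mask_zero=True))
model.add(LSTM(256))  # encoder: returns only its final output vector
model.add(RepeatVector(fr_max_len))  # repeat that vector once per output timestep
model.add(Dropout(0.5))
model.add(LSTM(256, return_sequences=True))  # decoder: one output per timestep
model.add(Dropout(0.2))
model.add(TimeDistributed(Dense(fr_vocab_size, activation='softmax')))  # per-step word probabilities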


I then tried adding another LSTM layer and two 1D convolutional layers:



Average epoch time: 402s



Final loss: 1.0899



Layer (type)                 Output Shape              Param #
=================================================================
embedding_1 (Embedding)      (None, 15, 336)           67200
_________________________________________________________________
dropout_1 (Dropout)          (None, 15, 336)           0
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 15, 32)            32288
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 7, 32)             0
_________________________________________________________________
dropout_2 (Dropout)          (None, 7, 32)             0
_________________________________________________________________
conv1d_2 (Conv1D)            (None, 7, 16)             2064
_________________________________________________________________
max_pooling1d_2 (MaxPooling1 (None, 3, 16)             0
_________________________________________________________________
dropout_3 (Dropout)          (None, 3, 16)             0
_________________________________________________________________
lstm_1 (LSTM)                (None, 128)               74240
_________________________________________________________________
repeat_vector_1 (RepeatVecto (None, 21, 128)           0
_________________________________________________________________
dropout_4 (Dropout)          (None, 21, 128)           0
_________________________________________________________________
lstm_2 (LSTM)                (None, 21, 512)           1312768
_________________________________________________________________
dropout_5 (Dropout)          (None, 21, 512)           0
_________________________________________________________________
lstm_3 (LSTM)                (None, 21, 128)           328192
_________________________________________________________________
dropout_6 (Dropout)          (None, 21, 128)           0
_________________________________________________________________
time_distributed_1 (TimeDist (None, 21, 336)           43344
=================================================================
Total params: 1,860,096
Trainable params: 1,860,096
Non-trainable params: 0
_________________________________________________________________
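
The output shapes in the summary above show that, after the two convolution and pooling stages, the encoder LSTM receives only 3 timesteps of 16 features each, down from 15 timesteps of 336. A minimal sketch of that shape arithmetic follows, assuming stride-1 convolutions with padding='same' and the MaxPooling1D defaults (strides equal to pool_size, 'valid' padding). The helper names are hypothetical, not Keras functions.

```python
def conv1d_same_length(length):
    # padding='same' with stride 1 keeps the number of timesteps.
    return length

def max_pool_length(length, pool_size=2):
    # Assumed MaxPooling1D defaults: strides=pool_size, padding='valid'.
    return (length - pool_size) // pool_size + 1

def conv1d_params(kernel_size, in_channels, filters):
    # One kernel per filter spanning all input channels, plus one bias each.
    return kernel_size * in_channels * filters + filters

steps = [15]  # input length according to the summary
for layer in ("conv", "pool", "conv", "pool"):
    prev = steps[-1]
    steps.append(conv1d_same_length(prev) if layer == "conv" else max_pool_length(prev))

print(steps)  # [15, 15, 7, 7, 3]
print(conv1d_params(3, 336, 32), conv1d_params(4, 32, 16))  # 32288 2064
```

Both the timestep counts and the Conv1D parameter counts agree with the summary.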


Python:



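from keras.models import Sequential
from keras.layers import (Embedding, Conv1D, MaxPooling1D, LSTM, RepeatVector,
                          Dropout, TimeDistributed, Dense)

model = Sequential()
# Unlike the first model, mask_zero is not set on the embedding
model.add(Embedding(en_vocab_size, fr_vocab_size, input_length=en_max_len))
model.add(Dropout(0.2))
model.add(Conv1D(filters=32, kernel_size=3, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=2))  # halves the number of timesteps (rounding down)
model.add(Dropout(0.3))
model.add(Conv1D(filters=16, kernel_size=4, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=2))  # halves the number of timesteps again
model.add(Dropout(0.2))
model.add(LSTM(128))  # encoder: returns only its final output vector
model.add(RepeatVector(fr_max_len))  # repeat that vector once per output timestep
model.add(Dropout(0.2))
model.add(LSTM(256, return_sequences=True))  # first decoder layer
model.add(Dropout(0.5))
model.add(LSTM(256, return_sequences=True))  # second decoder layer
model.add(Dropout(0.2))
model.add(TimeDistributed(Dense(fr_vocab_size, activation='softmax')))  # per-step word probabilities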
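
The decoder sizes in this listing (two 256-unit LSTMs) do not match the summary, which reports output widths of 512 and 128 for lstm_2 and lstm_3. A minimal sketch, assuming the standard LSTM parameter formula, suggests the summary was produced by a variant with 512 and 128 units. The helper name is hypothetical, and which of the two versions was actually trained is not stated.

```python
def lstm_params(input_dim, units):
    # Standard LSTM count: four gates, each with input weights,
    # recurrent weights and a bias vector.
    return 4 * ((input_dim + units) * units + units)

# Decoder as written in the listing: LSTM(256) then LSTM(256), fed a 128-wide input.
as_listed = [lstm_params(128, 256), lstm_params(256, 256)]
# Decoder implied by the summary's output shapes: 512 units, then 128 units.
as_summarized = [lstm_params(128, 512), lstm_params(512, 128)]

print(as_listed)      # [394240, 525312]
print(as_summarized)  # [1312768, 328192]
```

Only the second pair reproduces the 1,312,768 and 328,192 parameters shown in the summary.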


The second model not only took longer to train and ended with a larger loss (1.0899 vs. 0.4969), it was also far less accurate. Why would this be? My assumption is that I implemented the convolutional layers incorrectly. What is the best way to add convolutional layers to a recurrent (LSTM) network?










Tags: keras, neural-network, conv-neural-network, lstm, rnn






asked Nov 12 '18 at 18:15 by zoecarver





















