Implementing conv layers in an LSTM network
I am trying to create an English-to-French translator. I have a basic model that works fairly well:
Average step time: 232.3s
Final loss: 0.4969
Model:
Layer (type) Output Shape Param #
=================================================================
embedding_1 (Embedding) (None, 15, 336) 67200
_________________________________________________________________
lstm_1 (LSTM) (None, 256) 607232
_________________________________________________________________
repeat_vector_1 (RepeatVecto (None, 21, 256) 0
_________________________________________________________________
dropout_1 (Dropout) (None, 21, 256) 0
_________________________________________________________________
lstm_2 (LSTM) (None, 21, 256) 525312
_________________________________________________________________
dropout_2 (Dropout) (None, 21, 256) 0
_________________________________________________________________
time_distributed_1 (TimeDist (None, 21, 336) 86352
=================================================================
Total params: 1,286,096
Trainable params: 1,286,096
Non-trainable params: 0
_________________________________________________________________
Python:
from keras.models import Sequential
from keras.layers import (Embedding, LSTM, RepeatVector, Dropout,
                          Dense, TimeDistributed)

model = Sequential()
model.add(Embedding(en_vocab_size, fr_vocab_size, input_length=en_max_len, mask_zero=True))
model.add(LSTM(256))
model.add(RepeatVector(fr_max_len))
model.add(Dropout(0.5))
model.add(LSTM(256, return_sequences=True))
model.add(Dropout(0.2))
model.add(TimeDistributed(Dense(fr_vocab_size, activation='softmax')))
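For reference, a minimal compile/train sketch for a model like this (X_en and y_fr_onehot are placeholder names for padded integer English sequences and one-hot French targets, not variables from my actual script):

model.compile(loss='categorical_crossentropy', optimizer='adam')
# X_en: (num_samples, en_max_len) integer-encoded English sentences
# y_fr_onehot: (num_samples, fr_max_len, fr_vocab_size) one-hot French targets
model.fit(X_en, y_fr_onehot, batch_size=64, epochs=20, validation_split=0.1)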
I then tried adding another LSTM layer and two 1D convolutional layers:
Average epoch time: 402s
Final loss: 1.0899
Layer (type) Output Shape Param #
=================================================================
embedding_1 (Embedding) (None, 15, 336) 67200
_________________________________________________________________
dropout_1 (Dropout) (None, 15, 336) 0
_________________________________________________________________
conv1d_1 (Conv1D) (None, 15, 32) 32288
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 7, 32) 0
_________________________________________________________________
dropout_2 (Dropout) (None, 7, 32) 0
_________________________________________________________________
conv1d_2 (Conv1D) (None, 7, 16) 2064
_________________________________________________________________
max_pooling1d_2 (MaxPooling1 (None, 3, 16) 0
_________________________________________________________________
dropout_3 (Dropout) (None, 3, 16) 0
_________________________________________________________________
lstm_1 (LSTM) (None, 128) 74240
_________________________________________________________________
repeat_vector_1 (RepeatVecto (None, 21, 128) 0
_________________________________________________________________
dropout_4 (Dropout) (None, 21, 128) 0
_________________________________________________________________
lstm_2 (LSTM) (None, 21, 512) 1312768
_________________________________________________________________
dropout_5 (Dropout) (None, 21, 512) 0
_________________________________________________________________
lstm_3 (LSTM) (None, 21, 128) 328192
_________________________________________________________________
dropout_6 (Dropout) (None, 21, 128) 0
_________________________________________________________________
time_distributed_1 (TimeDist (None, 21, 336) 43344
=================================================================
Total params: 1,860,096
Trainable params: 1,860,096
Non-trainable params: 0
_________________________________________________________________
Python:
from keras.models import Sequential
from keras.layers import (Embedding, Conv1D, MaxPooling1D, LSTM,
                          RepeatVector, Dropout, Dense, TimeDistributed)

model = Sequential()
model.add(Embedding(en_vocab_size, fr_vocab_size, input_length=en_max_len))
model.add(Dropout(0.2))
model.add(Conv1D(filters=32, kernel_size=3, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(Dropout(0.3))
model.add(Conv1D(filters=16, kernel_size=4, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(Dropout(0.2))
model.add(LSTM(128))
model.add(RepeatVector(fr_max_len))
model.add(Dropout(0.2))
model.add(LSTM(512, return_sequences=True))  # 512 units, matching lstm_2 in the summary above
model.add(Dropout(0.5))
model.add(LSTM(128, return_sequences=True))  # 128 units, matching lstm_3 in the summary above
model.add(Dropout(0.2))
model.add(TimeDistributed(Dense(fr_vocab_size, activation='softmax')))
You can see that the second model not only took longer to train and finished with a larger loss, but was also far less accurate. Why would this be? My assumption is that I implemented the convolutional layers incorrectly. What is the best way to implement a convolutional layer in a recurrent (LSTM) network?
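My suspicion is the pooling: the two MaxPooling1D layers compress the 15 input timesteps down to 3 before the encoder LSTM ever sees them, which may discard most of the word-order information. Would something like the following, which keeps padding='same' and drops the pooling so the LSTM still sees all 15 timesteps, be closer to the standard approach? (A sketch only; the filter count is a guess.)

from keras.models import Sequential
from keras.layers import (Embedding, Conv1D, Dropout, LSTM,
                          RepeatVector, Dense, TimeDistributed)

# Hypothetical variant: Conv1D as a local-feature extractor that preserves
# all en_max_len timesteps (no pooling between the conv and the LSTM).
model = Sequential()
model.add(Embedding(en_vocab_size, fr_vocab_size, input_length=en_max_len))
model.add(Conv1D(filters=256, kernel_size=3, padding='same', activation='relu'))
model.add(Dropout(0.2))
model.add(LSTM(256))
model.add(RepeatVector(fr_max_len))
model.add(LSTM(256, return_sequences=True))
model.add(Dropout(0.2))
model.add(TimeDistributed(Dense(fr_vocab_size, activation='softmax')))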
Tags: keras, neural-network, conv-neural-network, lstm, rnn