How to use own word embedding with pre-trained embedding like word2vec in Keras
I have a co-occurrence matrix stored in a CSV file which contains the relationship between words and emojis like this:
word emo1 emo2 emo3
w1 0.5 0.3 0.2
w2 0.8 0 0
w3 0.2 0.5 0.2
This co-occurrence matrix is huge: it has 1584755 rows and 621 columns. I have a Sequential() LSTM model in Keras where I use pre-trained (word2vec) word embeddings. Now I would like to use the co-occurrence matrix as another embedding layer. How can I do that? My current code is something like this:
model = Sequential()
model.add(Embedding(max_features, embeddings_dim, input_length=max_sent_len, weights=[embedding_weights]))
model.add(Dropout(0.25))
model.add(Convolution1D(nb_filter=nb_filter, filter_length=filter_length, border_mode='valid', activation='relu', subsample_length=1))
model.add(MaxPooling1D(pool_length=pool_length))
model.add(LSTM(embeddings_dim))
model.add(Dense(reg_dimensions))
model.add(Activation('sigmoid'))
model.compile(loss='mean_absolute_error', optimizer='adam')
model.fit(train_sequences, train_labels, nb_epoch=30, batch_size=16)
Also, if the co-occurrence matrix is sparse, what would be the best way to use it in the embedding layer?
1 Answer
You can use the Embedding layer and set your own weight matrix like this:
Embedding(n_in, n_out, trainable=False, weights=[weights])
If I understood you correctly, weights will be your co-occurrence matrix, n_in the number of rows, and n_out the number of columns.
You can find some more information and examples in this blog post.
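One possible way to combine this with the existing word2vec layer, sketched here under assumptions: if both matrices are frozen and their rows follow the same word-index order, stacking them column-wise gives a single weight matrix that a single Embedding layer can use. The sizes and random values below are hypothetical stand-ins, not data from the question.

```python
import numpy as np

# Hypothetical sizes for illustration; in the question these would be
# max_features, the word2vec dimension, and the 621 emoji columns.
vocab_size = 5
w2v_dim = 4
emoji_dim = 3

rng = np.random.default_rng(0)
# Stand-in for the pre-trained word2vec matrix (row i = vector of word index i).
w2v_weights = rng.normal(size=(vocab_size, w2v_dim)).astype("float32")
# Stand-in for the co-occurrence rows, assumed to be in the same word-index order.
cooc_weights = rng.random(size=(vocab_size, emoji_dim)).astype("float32")

# Stack the two matrices column-wise: each word now has one longer vector.
combined_weights = np.hstack([w2v_weights, cooc_weights])

# An embedding lookup is a row lookup, so this mimics what the layer
# would return for one sentence of word indices.
sentence = np.array([0, 3, 1, 1])
embedded = combined_weights[sentence]

print(combined_weights.shape)
print(embedded.shape)
```

Following the call form above, the combined matrix would then be passed as Embedding(vocab_size, w2v_dim + emoji_dim, trainable=False, weights=[combined_weights]). This only matches two separate layers if neither of them is meant to be fine-tuned.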
@AbuShoeb Then you have to load the csv in your code, either using pandas or numpy. There is no way to 'link' the file to the weights directly. – FlashTek
Sep 3 at 9:53
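A minimal sketch of that loading step, using the standard csv module with numpy. It assumes a comma-separated file with a header row, and a hypothetical word_index mapping (word to integer index, with 0 reserved for padding) taken from whatever tokenizer produced the training sequences. Only rows for words in that mapping are kept, so the full 1584755-row matrix never has to be held as one dense array.

```python
import csv
import io

import numpy as np

# Small stand-in for the CSV file; a real file would be opened with
# open(path, newline="") instead of io.StringIO.
csv_text = "word,emo1,emo2,emo3\nw1,0.5,0.3,0.2\nw2,0.8,0,0\nw3,0.2,0.5,0.2\n"

# Hypothetical word -> index mapping; index 0 is assumed to be padding.
word_index = {"w3": 1, "w1": 2, "unseen": 3}

reader = csv.reader(io.StringIO(csv_text))
header = next(reader)          # first row holds the column names
n_emojis = len(header) - 1     # every column except the word itself

# One row per word index; words missing from the CSV stay all-zero.
weights = np.zeros((len(word_index) + 1, n_emojis), dtype="float32")
for row in reader:
    idx = word_index.get(row[0])
    if idx is not None:        # skip words outside the model vocabulary
        weights[idx] = [float(v) for v in row[1:]]

print(weights.shape)
```

The resulting array has one row per index the model can see, so it can be passed as the weights of an Embedding layer whose n_in is weights.shape[0] and n_out is weights.shape[1].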
OK, thanks, but the co-occurrence matrix is stored in a CSV file. How can I link the file to the embedding? The blog post uses test and train data in the embedding, but my test and train data are different from the matrix.
– Abu Shoeb
Sep 3 at 9:45