How to use my own word embedding together with a pre-trained embedding like word2vec in Keras

I have a co-occurrence matrix stored in a CSV file which contains the relationship between words and emojis like this:

    word  emo1  emo2  emo3
    w1    0.5   0.3   0.2
    w2    0.8   0     0
    w3    0.2   0.5   0.2

The co-occurrence matrix is huge: it has 1584755 rows and 621 columns. I have a Sequential() LSTM model in Keras that uses a pre-trained (word2vec) word embedding. Now I would like to use the co-occurrence matrix as another embedding layer. How can I do that? My current code looks something like this:

    model = Sequential()
    model.add(Embedding(max_features, embeddings_dim,
                        input_length=max_sent_len,
                        weights=[embedding_weights]))
    model.add(Dropout(0.25))
    model.add(Convolution1D(nb_filter=nb_filter,
                            filter_length=filter_length,
                            border_mode='valid',
                            activation='relu',
                            subsample_length=1))
    model.add(MaxPooling1D(pool_length=pool_length))
    model.add(LSTM(embeddings_dim))
    model.add(Dense(reg_dimensions))
    model.add(Activation('sigmoid'))
    model.compile(loss='mean_absolute_error', optimizer='adam')
    model.fit(train_sequences, train_labels, nb_epoch=30, batch_size=16)

Also, if the co-occurrence matrix is sparse then what would be the best way to use it in the embedding layer?

1 Answer

You can use the Embedding layer and set your own weight matrix like this:

Embedding(n_in, n_out, trainable=False, weights=[weights])

If I understood you correctly, weights will be your co-occurrence matrix, n_in the number of rows, and n_out the number of columns.
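If the word2vec embedding should stay in the model as well, one possible approach (an assumption on my part, not something Keras requires) is to concatenate the two weight matrices column-wise and pass the result to a single Embedding layer. Below is a minimal NumPy sketch with small made-up matrices. The names `w2v_weights`, `cooc_weights`, and `combine_embeddings` are hypothetical, and both matrices are assumed to use the same row order, so that row i describes the same word in each.

```python
import numpy as np


def combine_embeddings(w2v_weights, cooc_weights):
    """Stack two per-word weight matrices side by side.

    Both inputs must have one row per vocabulary index, in the same order.
    """
    w2v_weights = np.asarray(w2v_weights, dtype="float32")
    cooc_weights = np.asarray(cooc_weights, dtype="float32")
    if w2v_weights.shape[0] != cooc_weights.shape[0]:
        raise ValueError("both matrices need the same number of rows")
    return np.hstack([w2v_weights, cooc_weights])


# Toy stand-ins: 3 words, 4 word2vec dimensions, 3 emoji columns.
w2v_weights = np.zeros((3, 4), dtype="float32")  # made-up values
cooc_weights = np.array([[0.5, 0.3, 0.2],
                         [0.8, 0.0, 0.0],
                         [0.2, 0.5, 0.2]], dtype="float32")

combined = combine_embeddings(w2v_weights, cooc_weights)
n_in, n_out = combined.shape  # rows = vocabulary size, columns = embedding size
```

The resulting `combined` matrix could then be passed as weights=[combined], with n_in and n_out taken from its shape. Keep in mind that trainable=False would then freeze both parts of the embedding.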

You can find some more information and examples in this blog post.

OK, thanks, but the co-occurrence matrix is stored in a CSV file. How can I link the file to the embedding? The blog post uses the test and train data in the embedding, but my test and train data are different from the matrix.
– Abu Shoeb
Sep 3 at 9:45

@AbuShoeb Then you have to load the csv in your code, either using pandas or numpy. There is no way to 'link' the file to the weights directly.
– FlashTek
Sep 3 at 9:53
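
A rough sketch of the loading step described in the comment above, using the standard csv module together with NumPy. Several things here are assumptions for illustration: the file layout (a comma-separated header row, then one word per row followed by its emoji values), the `word_index` mapping from words to Embedding row numbers, the function name `load_cooccurrence`, and the placeholder file name in the comment. Words that do not appear in the file keep an all-zero row.

```python
import csv
import io

import numpy as np


def load_cooccurrence(csv_file, word_index):
    """Build a (max index + 1, n_columns) weight matrix from an open CSV file."""
    reader = csv.reader(csv_file)
    header = next(reader)            # e.g. ["word", "emo1", "emo2", "emo3"]
    n_columns = len(header) - 1      # every column except the word itself
    n_rows = max(word_index.values()) + 1
    weights = np.zeros((n_rows, n_columns), dtype="float32")
    for row in reader:
        if not row:                  # skip blank lines
            continue
        word, values = row[0], row[1:]
        idx = word_index.get(word)
        if idx is not None:          # skip words the model never sees
            weights[idx] = [float(v) for v in values]
    return weights


# In-memory stand-in for the real file; with a file on disk this would be
# something like open("your_file.csv", newline="").
sample = io.StringIO(
    "word,emo1,emo2,emo3\n"
    "w1,0.5,0.3,0.2\n"
    "w2,0.8,0,0\n"
    "w3,0.2,0.5,0.2\n"
)
word_index = {"w1": 1, "w2": 2, "w3": 3}  # index 0 left free, e.g. for padding
weights = load_cooccurrence(sample, word_index)
```

The shape of `weights` then gives the n_in and n_out values for the Embedding layer. Note that a dense float32 matrix with 1584755 rows and 621 columns takes roughly 3.9 GB, so it has to fit in memory before it can be handed to the layer.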
