PyTorch n-to-1 LSTM does not learn anything




I am new to PyTorch and LSTMs, and I am trying to train a classification model that takes a sentence in which each word is encoded via word2vec (pre-trained vectors) and outputs one class after it has seen the full sentence. There are four different classes, and the sentences have variable length.



My code is running without errors, but it always predicts the same class, no matter how many epochs I train my model. So I think the gradients are not properly backpropagated. Here is my code:


class LSTM(nn.Module):
    def __init__(self, embedding_dim, hidden_dim, tagset_size):
        super(LSTM, self).__init__()
        self.hidden_dim = hidden_dim
        self.lstm = nn.LSTM(embedding_dim, hidden_dim)
        self.hidden2tag = nn.Linear(hidden_dim, tagset_size)
        self.hidden = self.init_hidden()

    def init_hidden(self):
        # The axes semantics are (num_layers, minibatch_size, hidden_dim)
        return (torch.zeros(1, 1, self.hidden_dim).to(device),
                torch.zeros(1, 1, self.hidden_dim).to(device))

    def forward(self, sentence):
        lstm_out, self.hidden = self.lstm(sentence.view(len(sentence), 1, -1), self.hidden)
        tag_space = self.hidden2tag(lstm_out.view(len(sentence), -1))
        tag_scores = F.log_softmax(tag_space, dim=1)
        return tag_scores

EMBEDDING_DIM = len(training_data[0][0][0])
HIDDEN_DIM = 256

model = LSTM(EMBEDDING_DIM, HIDDEN_DIM, 4)
model.to(device)
loss_function = nn.NLLLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

for epoch in tqdm(range(n_epochs)):
    for sentence, tag in tqdm(training_data):
        model.zero_grad()

        model.hidden = model.init_hidden()

        sentence_in = torch.tensor(sentence, dtype=torch.float).to(device)
        targets = torch.tensor([label_to_idx[tag]], dtype=torch.long).to(device)

        tag_scores = model(sentence_in)

        res = torch.tensor(tag_scores[-1], dtype=torch.float).view(1, -1).to(device)
        # I THINK THIS IS WRONG???
        print(res)      # tensor([[-10.6328, -10.6783, -10.6667, -0.0001]], device='cuda:0', grad_fn=<CopyBackwards>)
        print(targets)  # tensor([3], device='cuda:0')

        loss = loss_function(res, targets)

        loss.backward()
        optimizer.step()



The code is largely inspired by https://pytorch.org/tutorials/beginner/nlp/sequence_models_tutorial.html.
The difference is that the tutorial builds a sequence-to-sequence model, whereas I need a sequence-to-ONE model.



I am not sure what the problem is, but my guess is that the scores returned by the model contain a score for each tag, while my ground truth only contains the index of the correct class. How should this be handled correctly?
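For reference, this is how I understand the shapes that nn.NLLLoss expects: log-probabilities of shape (batch, num_classes) as input and class indices of shape (batch,) as target. A minimal, self-contained sketch (the values are made up):

import torch
import torch.nn as nn
import torch.nn.functional as F

loss_function = nn.NLLLoss()

# One sample (batch size 1) with four classes: log-probabilities + class index.
scores = F.log_softmax(torch.randn(1, 4), dim=1)  # shape (1, 4), one row of log-probs
target = torch.tensor([3], dtype=torch.long)      # shape (1,), index of the true class

loss = loss_function(scores, target)
print(loss.item())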



Or is the loss function maybe not the correct one for my use case? Also I am not sure if this is done correctly:


res = torch.tensor(tag_scores[-1], dtype=torch.float).view(1,-1).to(device)



By taking tag_scores[-1] I want to get the scores after the last word has been given to the network because tag_scores contains the scores after each step, if I understand correctly.
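If the goal is only the scores after the last word, I assume the last row of tag_scores can be sliced directly instead of being wrapped in a new torch.tensor(...); a minimal sketch of what I mean:

# Sketch: tag_scores has shape (seq_len, tagset_size), so the last row holds
# the log-probabilities after the whole sentence has been read.
res = tag_scores[-1].view(1, -1)   # shape (1, tagset_size), already on the right device
loss = loss_function(res, targets)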





And this is how I evaluate:


with torch.no_grad():
    preds = []
    gts = []

    for sentence, tag in tqdm(test_data):
        inputs = torch.tensor(sentence, dtype=torch.float).to(device)

        tag_scores = model(inputs)

        # find index with max value (this is the class to be predicted)
        pred = [j for j, v in enumerate(tag_scores[-1]) if v == max(tag_scores[-1])][0]

        print(pred, idx_to_label[pred], tag)
        preds.append(pred)
        gts.append(label_to_idx[tag])

print(f1_score(gts, preds, average='micro'))
print(classification_report(gts, preds))
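As a side note, I assume the arg-max list comprehension in the loop above could be replaced by torch.argmax, which does the same thing more directly:

# Sketch: predicted class = index of the highest score at the last time step.
pred = tag_scores[-1].argmax().item()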



EDIT:



When I shuffle the data before training, it seems to work. But why?



EDIT 2:



I think the reason shuffling is needed is that my training data contains the samples for each class in contiguous groups. When training on them one group after another, the model only sees a single class during the last N iterations and therefore only predicts that class. Another contributing factor might be that I am currently using mini-batches of only one sample, because I have not yet figured out how to use larger batch sizes.
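For completeness, this is the kind of per-epoch shuffling I mean, assuming training_data is a plain Python list of (sentence, tag) pairs:

import random

for epoch in tqdm(range(n_epochs)):
    random.shuffle(training_data)   # re-shuffle every epoch so the classes are interleaved
    for sentence, tag in tqdm(training_data):
        ...                         # same training step as above

For real mini-batches with variable-length sentences, torch.utils.data.DataLoader with shuffle=True together with nn.utils.rnn.pad_sequence / pack_padded_sequence seems to be the usual route, but I have not tried that yet.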





