'None' gradients in pytorch

I am trying to implement a simple MDN (mixture density network) that predicts the parameters of a distribution over a target variable instead of a point value, and then assigns probabilities to discrete bins of the point value. Narrowing down the issue, the code from which the None gradients spring is:


import numpy as np
import torch

# params
tte_bins = np.linspace(
start=0,
stop=399,
num=400,
dtype='float32'
).reshape(1, 1, -1)
bins = torch.tensor(tte_bins, dtype=torch.float32)
x_train = np.random.randn(1, 1024, 3)
y_labels = np.random.randint(low=0, high=399, size=(1, 1024))
y_train = np.eye(400)[y_labels]

# data
in_train = torch.tensor(x_train[0:1, :, :], dtype=torch.float)
in_train = (in_train - torch.mean(in_train)) / torch.std(in_train)
out_train = torch.tensor(y_train[0:1, :, :], dtype=torch.float)

# model
linear = torch.nn.Linear(in_features=3, out_features=2)
lin = linear(in_train)
preds = torch.exp(lin)

# intermediate values
alpha = torch.clamp(preds[0:1, :, 0:1], 0, 500)
beta = torch.clamp(preds[0:1, :, 1:2], 0, 100)

# probs
p1 = torch.exp(-torch.pow(bins / alpha, beta))
p2 = torch.exp(-torch.pow((bins + 1.0) / alpha, beta))
probs = p1 - p2

# loss
loss = torch.mean(torch.pow(out_train - probs, 2))

# gradients
loss.backward()
for p in linear.parameters():
    print(p.grad, 'gradient')



in_train has shape [1, 1024, 3], out_train has shape [1, 1024, 400], and bins has shape [1, 1, 400]. All the broadcasting etc. appears fine; the resulting tensors (like alpha/beta/loss) have the right shapes and the right values - there are simply no gradients.



Edit: added loss.backward() and x_train/y_train; now I have nans.







Can you add information about your input x_train and y_train?
– McLawrence
Aug 29 at 3:06







added example data, nans appear to come from elsewhere
– user2780519
Aug 29 at 3:21





You never use y_labels, and test is not defined. Your code should always be minimal and reproducible.
– McLawrence
Aug 29 at 3:23







The gradients explode when you compute p1 and p2. Using preds.sum().backward() still produces valid gradients. I do not know what you are trying to compute with your model. However, when computing the derivative of p1 with respect to alpha, for example, you get a multiplicative factor of bins**(beta), which will probably be very large.
– McLawrence
Aug 29 at 3:27
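(A rough sense of the scale of that factor, using made-up values in the same range as the code in the question:)

import torch

# d/d(alpha) of (bins / alpha) ** beta is -beta * (bins / alpha) ** beta / alpha,
# so the factor (bins / alpha) ** beta shows up in the chain rule; with bins up
# to 399 and a moderate beta it is already enormous
bins, alpha, beta = 399.0, 1.0, 5.0
print(torch.tensor(bins / alpha) ** beta)  # roughly 1.0e+13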







noted, fixed that test/y_labels
– user2780519
Aug 29 at 3:27





1 Answer



You simply forgot to compute the gradients. While you calculate the loss, you never tell pytorch with respect to which function it should calculate the gradients.





Simply adding


loss.backward()



to your code should fix the problem.
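As a minimal sketch of the difference (a stand-in layer and loss, not the model from the question):

import torch

layer = torch.nn.Linear(3, 2)
loss = layer(torch.randn(4, 3)).pow(2).mean()

print(layer.weight.grad)  # None - no backward pass has been run yet

loss.backward()           # backpropagate the scalar loss through the graph
print(layer.weight.grad)  # now a (2, 3) tensor of gradients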



Additionally, in your code some intermediate results like alpha are sometimes zero but are in a denominator when computing the gradient. This will lead to the nan results you observed.
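A tiny self-contained illustration of that mechanism (not the original model; the zero is made explicit here):

import torch

# when a zero ends up in a denominator during the backward pass, the chain rule
# multiplies 0 by inf and the gradient comes out as nan
alpha = torch.tensor(0.0, requires_grad=True)
p = torch.exp(-1.0 / alpha)   # forward pass is well defined: exp(-inf) == 0
p.backward()

print(p)           # 0.
print(alpha.grad)  # nan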







Added, except now I get all nans as the gradient output
– user2780519
Aug 29 at 3:04





