Custom loss function for sequential output in Keras




I need to write a custom loss for my Keras model. Because the loss has to be built from Keras functions so that backpropagation works automatically, I am not sure how to implement it, since it seems to require some looping operations.


Target[1*300] - [...0 0 0 1 0 0 0 0 0 1 0 0 0...]

Output[1*300] - [...0 0 1 0 0 0 0 0 0 0 1 0 0...]



When calculating the loss, I don't need an exact match.
Even if my output is off by +/- three places, I want it to be counted as a correct prediction.



For example, both of these should be considered as the right predictions -


Output[1*300] - [...0 0 1 0 0 0 0 0 0 0 1 0 0...]

Output[1*300] - [...0 1 0 0 0 0 0 0 0 0 0 1 0...]
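The acceptance rule above can be sketched in plain NumPy, outside any loss function. The helper name `count_matched` and the `tol` parameter are illustrative and not from the original post. The sketch assumes binary 0/1 vectors and counts a target 1 as matched when any output 1 lies within `tol` positions of it (so one output 1 may satisfy several nearby target 1s).

```python
import numpy as np

def count_matched(target, output, tol=3):
    """Count target 1s that have an output 1 within +/- tol positions."""
    target = np.asarray(target)
    output = np.asarray(output)
    out_idx = np.flatnonzero(output == 1)
    matched = 0
    for k in np.flatnonzero(target == 1):
        # A target 1 at position k is matched if some output 1 is close enough.
        if out_idx.size and np.min(np.abs(out_idx - k)) <= tol:
            matched += 1
    return matched

target = [0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0]
shifted_by_one = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
shifted_by_two = [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]

print(count_matched(target, shifted_by_one))  # both target 1s are matched
print(count_matched(target, shifted_by_two))  # both target 1s are matched
```

This version loops in Python, so it only illustrates the rule; it is not differentiable and cannot be used as a Keras loss as written.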



The code I have written so far:


import numpy as np
import tensorflow as tf

# TF 1.x graph-mode API (tf.placeholder, tf.Session)
tar = tf.placeholder(tf.float32, shape=(1, 10))
tar_unpacked = tf.unstack(tar)

pred = tf.placeholder(tf.float32, shape=(1, 10))
pred_unpacked = tf.unstack(pred)

for t in tar_unpacked:
    # Boolean mask of the positions where the target is 1
    result_tensor = tf.equal(t, 1)

    # Indexes of those positions (variable length)
    tar_ind = tf.where(result_tensor)

with tf.Session() as sess:
    print(sess.run([tar_ind], feed_dict={tar: np.asarray([[0, 0, 1, 0, 0, 0, 1, 0, 0, 0]]),
                                         pred: np.asarray([[0, 0, 1, 0, 0, 0, 1, 0, 0, 0]])}))



Now what I want to do next is generate valid indexes by adding each from


[-3,-2,-1,0,1,2,3]



to the elements of tar_ind and then compare the resulting indexes with pred_unpacked.





My naive loss would be 1 - (NUM_MATCHED/TOTAL)





But the problem is that tar_ind is a variable-length tensor, so I cannot loop over it for the next operation.
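One way around the variable-length index list is to never build it: widen every output 1 into a window of 7 positions and multiply by the target. The NumPy sketch below shows the idea for a single 1-D sequence; `naive_loss` and `tol` are illustrative names, the sequence is assumed to be at least `2 * tol + 1` long, and returning 0 when the target has no 1s is an assumption. In a graph framework the same widening could be expressed as a stride-1 max-pooling over the sequence, but that translation is not shown here.

```python
import numpy as np

def naive_loss(target, output, tol=3):
    """1 - NUM_MATCHED / TOTAL, computed without looping over indexes."""
    target = np.asarray(target, dtype=np.float32)
    output = np.asarray(output, dtype=np.float32)
    # Spread every output 1 over a window of 2 * tol + 1 positions.
    window = np.ones(2 * tol + 1, dtype=np.float32)
    near_output = np.convolve(output, window, mode="same") > 0
    total = target.sum()
    if total == 0:
        return 0.0  # assumption: no target events means nothing to miss
    # A target 1 counts as matched when it falls inside a widened window.
    num_matched = (target * near_output).sum()
    return float(1.0 - num_matched / total)

target = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
print(naive_loss(target, target))                          # exact match
print(naive_loss(target, [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]))  # one of two targets missed
```

Counting matches this way is piecewise constant in the output, so it is more useful as a metric than as a training loss.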





Update-1.



As suggested by @user36624, I tried the alternative approach of using tf.py_func to produce an updated y_pred, and then used the updated values for binary cross-entropy.





Since I implemented the function using py_func, it gives me this error: ValueError: An operation has None for the gradient. Please make sure that all of your ops have a gradient defined (i.e., are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.





He also suggested that I need to manually stop gradients, which I don't know how to do.


def specificity_loss_wrapper():
    def specificity_loss(y_true, y_pred):

        y_pred = tf.py_func(best_match, [y_true, y_pred], (tf.float32))

        y_pred = tf.stop_gradient(y_pred)
        y_pred.set_shape(y_true.get_shape())

        return K.binary_crossentropy(y_true, y_pred)

    return specificity_loss

spec_loss = specificity_loss_wrapper()



and


...
model.compile(loss=spec_loss, optimizer='adam', metrics=['accuracy'])
...



In my understanding, binary_crossentropy should be differentiable.





Thanks





Would you elaborate more on "discrepancy of +/- three places"? How do you define the discrepancy? And specify the labels shape as well, i.e. each label is an array of shape (300,) so the labels array would have a shape of (n_samples, 300)? Further, do the labels consist of only zeros and ones?
– today
Aug 30 at 13:34








I have time-series data, and each slot covers some N minutes in order, so even if my predictions are off by +/- 30 minutes, I would be fine with the results. The shape of a label is (1,300), i.e. (num_samples,300) for all inputs. In my example code I have used 10, but that is just for testing purposes.
– Nikhil Verma
Aug 30 at 13:39








Please be more specific and provide examples if you can. What is the "+/- 30 minutes"? Is it equivalent to one step in the 300 steps? I can't understand the exact definition of discrepancy yet: for example, if the k-th element of the target is 1, is any prediction with a 1 in any of the places k-3, k-2, k-1, ..., k+3 acceptable? And what if there is no 1 in those places? What is the value of the loss then? When you want to implement a loss function, you must first be able to define it mathematically, at least on paper.
– today
Aug 30 at 13:52





Using py_func to compute the loss will certainly not give you a gradient. This is unrelated to the ops listed in the error message you saw (ops without gradient: K.argmax, K.round, K.eval).
– pitfall
Sep 1 at 23:00







What I suggested is to modify y_true instead of y_pred. These two things seem to be equivalent, but they are not. Given y_pred_mod=f(y_pred) and loss=g(y_pred_mod, y_true), both functions f and g have to be differentiable to compute loss gradients w.r.t. y_pred. In contrast, given y_true_mod=f(y_true) and loss=g(y_pred, y_true_mod), only function g needs to be differentiable to compute loss gradients w.r.t. y_pred.
– pitfall
Sep 1 at 23:05
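
The distinction in this comment can be written out with the chain rule. The block only restates the comment's argument in symbols, where \hat{y} stands for y_pred and y for y_true; the notation is not from the original thread.

```latex
% Modifying the prediction: the gradient has to pass through f.
\mathcal{L} = g\bigl(f(\hat{y}),\, y\bigr)
\quad\Rightarrow\quad
\frac{\partial \mathcal{L}}{\partial \hat{y}}
  = \frac{\partial g}{\partial f}\,\frac{\partial f}{\partial \hat{y}}

% Modifying the target: f(y) is a constant with respect to \hat{y}.
\mathcal{L} = g\bigl(\hat{y},\, f(y)\bigr)
\quad\Rightarrow\quad
\frac{\partial \mathcal{L}}{\partial \hat{y}}
  = \frac{\partial g}{\partial \hat{y}}
```

In the first case a non-differentiable f (such as a py_func) leaves the gradient undefined; in the second case f never appears in the derivative.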







1 Answer



What you are suggesting is to compute


1. offsets = compute_index_offsets( y_true, y_pred )
2. loss = 1 - num(offsets <= 3)/total



I suggest solving it in a different way.


1. y_true_mod = DP_best_match( y_true, y_pred )
2. loss = 1 - num(y_true_mod==y_pred)/total



The advantage of modifying y_true is that it is equivalent to providing a new target value, and thus it is not a part of the model graph optimization or the loss computation.





What DP_best_match( y_true, y_pred ) should do is to modify y_true according to y_pred,





e.g. given


y_true[1*300] - [...0 0 0 1 0 0 0 0 0 1 0 0 0...]
y_pred[1*300] - [...0 0 1 0 0 0 0 0 0 0 1 0 0...]



then DP_best_match( y_true, y_pred ) should give the new target




y_true_mod[1*300] - [...0 0 1 0 0 0 0 0 0 0 1 0 0...]
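
The answer does not spell out DP_best_match, so the following is only a hedged NumPy sketch of what such a function might do for one sequence. It uses a greedy nearest-pair assignment rather than a real dynamic program, and the name `best_match`, the `tol` parameter, and the 0.5 `threshold` for turning predicted probabilities into 1s are all assumptions.

```python
import numpy as np

def best_match(y_true, y_pred, tol=3, threshold=0.5):
    """Greedy stand-in for DP_best_match on one 1-D sequence.

    Each target 1 is moved onto the closest unclaimed predicted 1 within
    +/- tol positions; targets with no nearby prediction stay in place.
    """
    y_true = np.asarray(y_true, dtype=np.float32)
    y_pred = np.asarray(y_pred, dtype=np.float32)
    true_idx = [int(k) for k in np.flatnonzero(y_true == 1)]
    pred_idx = [int(j) for j in np.flatnonzero(y_pred >= threshold)]
    # All (distance, target index, prediction index) pairs inside the tolerance,
    # closest pairs first.
    pairs = sorted((abs(j - k), k, j)
                   for k in true_idx for j in pred_idx if abs(j - k) <= tol)
    y_true_mod = np.zeros_like(y_true)
    used_true, used_pred = set(), set()
    for _, k, j in pairs:
        if k not in used_true and j not in used_pred:
            used_true.add(k)
            used_pred.add(j)
            y_true_mod[j] = 1.0  # move the target onto the prediction
    for k in true_idx:
        if k not in used_true:
            y_true_mod[k] = 1.0  # nothing nearby: keep the original target
    return y_true_mod

y_true = [0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0]
y_pred = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
print(best_match(y_true, y_pred))  # the 1s move from positions 3, 9 to 2, 10
```

Because exact hits have distance 0 and are assigned first, an unmatched target can never land on a position already claimed by another target.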



Note that DP_best_match( y_true, y_pred ) only modifies y_true to best match y_pred, so it is deterministic and has nothing to optimize; it needs no backpropagation. This means you need to manually stop gradients if you implement DP_best_match( y_true, y_pred ) in tf. Alternatively, you can implement it in numpy and wrap the function via tf.py_func, which might be easier.
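
A rough sketch of that wiring, assuming TensorFlow 1.x (where tf.py_func is available) and the Keras backend bundled with it. The names `best_match_row`, `best_match` and `make_tolerant_loss` are hypothetical, the matching is a greedy stand-in rather than a real DP, and the 0.5 threshold is an assumption. The point is that only y_true is replaced, while y_pred reaches the cross-entropy unchanged.

```python
import numpy as np

def best_match_row(t, p, tol=3, threshold=0.5):
    """Greedy nearest-pair matching for one sequence (not a true DP)."""
    true_idx = [int(k) for k in np.flatnonzero(t == 1)]
    pred_idx = [int(j) for j in np.flatnonzero(p >= threshold)]
    pairs = sorted((abs(j - k), k, j)
                   for k in true_idx for j in pred_idx if abs(j - k) <= tol)
    out = np.zeros(len(t), dtype=np.float32)
    used_true, used_pred = set(), set()
    for _, k, j in pairs:
        if k not in used_true and j not in used_pred:
            used_true.add(k)
            used_pred.add(j)
            out[j] = 1.0
    for k in true_idx:
        if k not in used_true:
            out[k] = 1.0  # no prediction nearby: keep the original target
    return out

def best_match(y_true, y_pred):
    """Batch version for tf.py_func: arrays of shape (n_samples, n_steps)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    rows = [best_match_row(t, p) for t, p in zip(y_true, y_pred)]
    return np.asarray(rows, dtype=np.float32)

def make_tolerant_loss():
    # Imported lazily so the NumPy helpers above work without TensorFlow.
    import tensorflow as tf  # assumption: TensorFlow 1.x
    K = tf.keras.backend

    def tolerant_loss(y_true, y_pred):
        # Build the shifted target outside the differentiable path.
        y_true_mod = tf.py_func(best_match, [y_true, y_pred], tf.float32)
        y_true_mod = tf.stop_gradient(y_true_mod)
        y_true_mod.set_shape(y_pred.get_shape())
        # y_pred itself is untouched, so gradients flow through it as usual.
        return K.mean(K.binary_crossentropy(y_true_mod, y_pred), axis=-1)

    return tolerant_loss
```

Under these assumptions the loss would be passed to Keras as model.compile(loss=make_tolerant_loss(), optimizer='adam'). This has not been verified against a specific TensorFlow release.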





A final remark: you should make sure the proposed loss function makes sense. To me, it makes more sense to use binary_crossentropy or mse after finding the best y_true_mod.
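
To illustrate the effect, the NumPy sketch below compares both losses against the original target and against a shifted one. The arrays are made-up illustrative values, `y_true_mod` is written by hand rather than computed, and the two loss functions are plain re-implementations of the usual formulas, not the Keras ones.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy; predictions are clipped away from 0 and 1."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    losses = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    return float(np.mean(losses))

def mse(y_true, y_pred):
    """Mean squared error."""
    return float(np.mean((y_true - y_pred) ** 2))

y_true = np.array([0, 0, 0, 1, 0, 0, 0], dtype=np.float64)
# Hand-written shifted target: the 1 is moved onto the predicted peak.
y_true_mod = np.array([0, 0, 1, 0, 0, 0, 0], dtype=np.float64)
y_pred = np.array([0.1, 0.1, 0.8, 0.2, 0.1, 0.1, 0.1])

print(binary_crossentropy(y_true, y_pred), binary_crossentropy(y_true_mod, y_pred))
print(mse(y_true, y_pred), mse(y_true_mod, y_pred))
```

With the shifted target, a prediction that is one place off is no longer penalized as a miss plus a false alarm, while both losses stay differentiable in y_pred.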







Hey, I intended to modify the loss after I was done with the matching part. I know subtracting from one does not make sense. Let me try tf.py_func.
– Nikhil Verma
Aug 31 at 7:44







Hey, can you check the update?
– Nikhil Verma
Aug 31 at 10:13


