Custom loss function for sequential output in Keras
I need to write a custom loss for my Keras model. Since the function must be written with Keras/TensorFlow ops for automatic backpropagation, I am not sure how to implement it, as it might require some looping operations -
Target[1*300] - [...0 0 0 1 0 0 0 0 0 1 0 0 0...]
Output[1*300] - [...0 0 1 0 0 0 0 0 0 0 1 0 0...]
What I need is that, while calculating the loss, I don't require an exact match.
Even if my output has a discrepancy of +/- three places, I want it to be considered a correct prediction.
For example, both of these should be considered as the right predictions -
Output[1*300] - [...0 0 1 0 0 0 0 0 0 0 1 0 0...]
Output[1*300] - [...0 1 0 0 0 0 0 0 0 0 0 1 0...]
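To make the rule concrete, here is a plain numpy version of the check I have in mind (just the definition, not something differentiable; matches_with_tolerance is only an illustrative name, and the vectors are shortened to the fragments shown above):

import numpy as np

def matches_with_tolerance(target, output, tol=3):
    # True if every 1 in `target` has a 1 in `output` within +/- tol positions
    t_idx = np.where(np.asarray(target) == 1)[0]
    o_idx = np.where(np.asarray(output) == 1)[0]
    return all(np.any(np.abs(o_idx - t) <= tol) for t in t_idx)

target = [0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0]
out_a  = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]   # 1s shifted by one place
out_b  = [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]   # 1s shifted by two places
print(matches_with_tolerance(target, out_a))  # True
print(matches_with_tolerance(target, out_b))  # True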
The code I have written so far -
import numpy as np
import tensorflow as tf

tar = tf.placeholder(tf.float32, shape=(1, 10))
tar_unpacked = tf.unstack(tar)

pred = tf.placeholder(tf.float32, shape=(1, 10))
pred_unpacked = tf.unstack(pred)

for t in tar_unpacked:
    result_tensor = tf.equal(t, 1)      # boolean mask of the 1-positions
    tar_ind = tf.where(result_tensor)   # indices of the 1s, shape (k, 1)

with tf.Session() as sess:
    print(sess.run([tar_ind],
                   feed_dict={tar: np.asarray([[0, 0, 1, 0, 0, 0, 1, 0, 0, 0]]),
                              pred: np.asarray([[0, 0, 1, 0, 0, 0, 1, 0, 0, 0]])}))
Now what I want to do next is generate the valid indexes by adding each offset from [-3, -2, -1, 0, 1, 2, 3] to the elements of tar_ind, and then compare those indexes against pred_unpacked.
My naive loss would be 1 - (NUM_MATCHED / TOTAL).
But the problem is that tar_ind is a variably sized tensor, and I cannot loop over it for the next operation.
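To spell out what I am trying to compute, here is a sketch of the intended math written with TF ops that continue the graph above (names like valid_ind and naive_loss are just illustrative, the length-10 toy shape is assumed, and this counting-based loss would not be differentiable anyway):

offsets = tf.constant([[-3, -2, -1, 0, 1, 2, 3]], dtype=tf.int64)   # shape (1, 7)
length = tf.cast(tf.shape(t)[0], tf.int64)                          # t from the loop above, shape (10,)
valid_ind = tf.clip_by_value(tar_ind + offsets, 0, length - 1)      # shape (k, 7), one row per 1 in tar
p = pred_unpacked[0]                                                 # the single prediction row, shape (10,)
hits = tf.reduce_any(tf.gather(p, valid_ind) > 0, axis=1)            # any predicted 1 within +/- 3?
num_matched = tf.reduce_sum(tf.cast(hits, tf.float32))
total = tf.cast(tf.shape(tar_ind)[0], tf.float32)
naive_loss = 1.0 - num_matched / tf.maximum(total, 1.0)              # 1 - (NUM_MATCHED / TOTAL)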
Update 1:
As suggested by @user36624, I tried the alternate approach of using tf.py_func to produce an updated y_pred, and then fed the updated values to binary cross-entropy.
Since I have implemented the function with py_func, I am getting the error: ValueError: An operation has None for the gradient. Please make sure that all of your ops have a gradient defined (i.e., are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
He also suggested that I need to manually stop the gradients, which I don't know how to do properly. My code so far:
import tensorflow as tf
from keras import backend as K

def specificity_loss_wrapper():
    def specificity_loss(y_true, y_pred):
        # best_match is my numpy function that adjusts y_pred (not shown here)
        y_pred = tf.py_func(best_match, [y_true, y_pred], tf.float32)
        y_pred = tf.stop_gradient(y_pred)
        y_pred.set_shape(y_true.get_shape())
        return K.binary_crossentropy(y_true, y_pred)
    return specificity_loss

spec_loss = specificity_loss_wrapper()
and
...
model.compile(loss=spec_loss, optimizer='adam', metrics=['accuracy'])
...
In my understanding, binary_crossentropy should be differentiable.
Thanks
Would you elaborate more on "discrepancy of +/- three places"? How do you define the discrepancy? And specify the labels shape as well, i.e. is each label an array of shape (300,), so the labels array would have a shape of (n_samples, 300)? Further, do the labels consist of only zeros and ones? – today Aug 30 at 13:34
I have time-series data, and each slot is for some N minutes in some order, so even if my predictions are off by +/- 30 minutes, I would be fine with the results. The shape of a label is (1, 300), i.e. (num_samples, 300) across all inputs. In my example code I have used 10, but that is just for testing purposes. – Nikhil Verma Aug 30 at 13:39
Please be more specific and provide examples if you can. What is the "+/- 30 minutes"? Is it equivalent to one step in the 300 steps? I can't understand the exact definition of the discrepancy yet: for example, if the k-th element of the target is 1, then is any prediction with a 1 in any of the k-3, k-2, k-1, ..., k+3 places acceptable? And what if there is no 1 in those places? What is the value of the loss then? When you want to implement a loss function, you must first be able to define it mathematically, at least on paper. – today Aug 30 at 13:52
Using py_func to compute the loss will not give you the gradient, for sure. This is unrelated to the error message that you saw: ops without gradient: K.argmax, K.round, K.eval. – pitfall Sep 1 at 23:00
What I suggested is to modify y_true instead of y_pred. These two things seem equivalent, but they are not. Given y_pred_mod = f(y_pred) and loss = g(y_pred_mod, y_true), both functions f and g have to be differentiable to compute the loss gradients w.r.t. y_pred. In contrast, given y_true_mod = f(y_true) and loss = g(y_pred, y_true_mod), only the function g needs to be differentiable to compute the loss gradients w.r.t. y_pred. – pitfall Sep 1 at 23:05
1 Answer
What you are suggesting is to compute
1. offsets = compute_index_offsets( y_true, y_pred )
2. loss = 1 - num(offsets <= 3)/total
I suggest solving it in an alternative way:
1. y_true_mod = DP_best_match( y_true, y_pred )
2. loss = 1 - num(y_true_mod==y_pred)/total
The advantage of modifying y_true is that it is equivalent to providing a new target value, and thus it is not a part of the model graph optimization or the loss computation.
What DP_best_match(y_true, y_pred) should do is to modify y_true according to y_pred,
e.g. given
y_true[1*300] - [...0 0 0 1 0 0 0 0 0 1 0 0 0...]
y_pred[1*300] - [...0 0 1 0 0 0 0 0 0 0 1 0 0...]
then DP_best_match(y_true, y_pred) should give the new target
y_true_mod[1*300] - [...0 0 1 0 0 0 0 0 0 0 1 0 0...]
Note, DP_best_match(y_true, y_pred) aims to modify y_true to best match y_pred, so it is deterministic and there is nothing to optimize. Thus there is no need for backpropagation through it. This means you need to manually stop gradients if you implement DP_best_match(y_true, y_pred) in tf. Otherwise, you can implement it in numpy and wrap the function via tf.py_func, which might be easier.
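For example, something along these lines. This is only a rough sketch of the idea: the greedy nearest-spike matching within +/- 3 and the 0.5 threshold on the predictions are my assumptions, and the exact DP_best_match logic is up to you.

import numpy as np
import tensorflow as tf
from keras import backend as K

def DP_best_match(y_true, y_pred, tol=3):
    # numpy sketch: move each 1 in y_true onto the nearest 1 in y_pred
    # if that 1 lies within +/- tol positions
    y_true_mod = np.array(y_true, copy=True)
    for row in range(y_true.shape[0]):
        t_idx = np.where(y_true[row] == 1)[0]
        p_idx = np.where(y_pred[row] > 0.5)[0]   # assumed threshold on predictions
        for t in t_idx:
            if p_idx.size == 0:
                continue
            nearest = p_idx[np.argmin(np.abs(p_idx - t))]
            if abs(nearest - t) <= tol:
                y_true_mod[row, t] = 0
                y_true_mod[row, nearest] = 1
    return y_true_mod.astype(np.float32)

def best_match_loss(y_true, y_pred):
    y_true_mod = tf.py_func(DP_best_match, [y_true, y_pred], tf.float32)
    y_true_mod = tf.stop_gradient(y_true_mod)    # target modification needs no backprop
    y_true_mod.set_shape(y_true.get_shape())
    return K.binary_crossentropy(y_true_mod, y_pred)

You can then compile with this loss as usual, e.g. model.compile(loss=best_match_loss, optimizer='adam').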
Final remark: you should make sure the proposed loss function makes sense. To me, it makes more sense to use binary_crossentropy or mse after finding the best y_true_mod.
Hey, I intended to modify the loss after I was done with the matching part. I know subtracting one does not make sense. Let me try tf.py_func. – Nikhil Verma Aug 31 at 7:44
Hey, can you check the update? – Nikhil Verma Aug 31 at 10:13