Custom loss function for sequential output in Keras

I need to write a custom loss for my Keras model. Because the loss has to be built from Keras functions so that backpropagation works automatically, I am not sure how to implement it, as it might require some looping operations:

Target[1*300] - [...0 0 0 1 0 0 0 0 0 1 0 0 0...]
Output[1*300] - [...0 0 1 0 0 0 0 0 0 0 1 0 0...]

What I need is that the loss does not require an exact match.
Even if my output is off by +/- three places, I want it to be counted as a correct prediction.

For example, both of these should be considered as the right predictions -

Output[1*300] - [...0 0 1 0 0 0 0 0 0 0 1 0 0...]
Output[1*300] - [...0 1 0 0 0 0 0 0 0 0 0 1 0...]

The code which I have written till now -

    import numpy as np
    import tensorflow as tf

    tar = tf.placeholder(tf.float32, shape=(1, 10))
    tar_unpacked = tf.unstack(tar)
    pred = tf.placeholder(tf.float32, shape=(1, 10))
    pred_unpacked = tf.unstack(pred)

    # Find the indices at which the target equals 1
    for t in tar_unpacked:
        result_tensor = tf.equal(t, 1)
        tar_ind = tf.where(result_tensor)

    with tf.Session() as sess:
        print(sess.run([tar_ind], feed_dict={
            tar: np.asarray([[0, 0, 1, 0, 0, 0, 1, 0, 0, 0]]),
            pred: np.asarray([[0, 0, 1, 0, 0, 0, 1, 0, 0, 0]])}))

What I want to do next is generate the valid indexes by adding each offset in [-3,-2,-1,0,1,2,3] to the elements of tar_ind, and then compare those indexes with pred_unpacked.

My naive loss would be 1 - (NUM_MATCHED/TOTAL)

But the problem is that tar_ind is a variably sized tensor, and I cannot loop over it for the next operation.
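One way to sidestep the variably sized index list is to work with a fixed-size mask instead. The sketch below is plain NumPy, not graph code, and its names (tolerance_mask, naive_loss) are made up for illustration. It also assumes TOTAL means the number of 1s in the target, which the question does not state, and that each row has at least 7 slots.

```python
import numpy as np

def tolerance_mask(vec, tol=3):
    """Mark every slot that lies within `tol` places of a 1 in `vec`."""
    window = np.ones(2 * tol + 1)
    # mode="same" keeps the output as long as the input (for len(vec) >= len(window))
    return (np.convolve(vec, window, mode="same") > 0).astype(np.float32)

def naive_loss(target, pred, tol=3):
    """1 - NUM_MATCHED / TOTAL, assuming TOTAL = number of 1s in the target."""
    pred_mask = tolerance_mask(pred, tol)   # slots "covered" by a predicted 1
    total = target.sum()
    matched = (target * pred_mask).sum()    # target 1s with a predicted 1 nearby
    return 1.0 - matched / total if total > 0 else 0.0

target = np.array([0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0], dtype=np.float32)
pred = np.array([0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0], dtype=np.float32)
print(naive_loss(target, pred))
```

Both target 1s have a predicted 1 one place away, so the loss for this pair is zero. The mask always has the same shape as the input, so no loop over a variable number of indexes is needed.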

Update-1.

As suggested by @user36624, I tried the alternative approach of using tf.py_func to produce an updated y_pred, and then used the updated values for binary cross-entropy.

Now that I have implemented the function using py_func, I get this error: ValueError: An operation has `None` for the gradient. Please make sure that all of your ops have a gradient defined (i.e., are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.

He also suggested that I need to stop gradients manually, which I don't know how to do.

    def specificity_loss_wrapper():
        def specificity_loss(y_true, y_pred):
            y_pred = tf.py_func(best_match, [y_true, y_pred], (tf.float32))
            y_pred = tf.stop_gradient(y_pred)
            y_pred.set_shape(y_true.get_shape())
            return K.binary_crossentropy(y_true, y_pred)
        return specificity_loss

    spec_loss = specificity_loss_wrapper()

and

    ...
    model.compile(loss=spec_loss, optimizer='adam', metrics=['accuracy'])
    ...

In my understanding, binary_crossentropy should be differentiable.

Thanks

Would you elaborate more on "discrepancy of +/- three places"? How do you define the discrepancy? And specify the labels shape as well, i.e. each label is an array of shape (300,) so the labels array would have a shape of (n_samples, 300)? Further, do the labels consist of only zeros and ones?
– today
Aug 30 at 13:34

I have time-series data in which each slot covers some N minutes, in order, so I would be fine with the results even if my predictions are off by +/- 30 minutes. The shape of a label is (1,300), i.e. (num_samples,300) for all inputs. In my example code I used 10, but that is just for testing purposes.
– Nikhil Verma
Aug 30 at 13:39

Please be more specific and provide examples if you can. What is the "+/- 30 minutes"? Is it equivalent to one step in the 300 steps? I can't understand the exact definition of discrepancy yet: for example, if the k-th element of the target is 1, is any prediction with a 1 in one of the places k-3, k-2, k-1, ..., k+3 acceptable? And what if there is no 1 in those places? What is the value of the loss then? When you want to implement a loss function, you must first be able to define it mathematically, at least on paper.
– today
Aug 30 at 13:52

Using py_func to compute the loss will certainly not give you a gradient. This has nothing to do with the ops listed in the error message that you saw (ops without gradient: K.argmax, K.round, K.eval).
– pitfall
Sep 1 at 23:00

What I suggested is to modify y_true instead of y_pred. These two things seem to be equivalent, but they are not. Given y_pred_mod=f(y_pred) and loss=g(y_pred_mod, y_true), both functions f and g have to be differentiable to compute loss gradients w.r.t. y_pred. In contrast, given y_true_mod=f(y_true) and loss=g(y_pred, y_true_mod), only function g needs to be differentiable to compute loss gradients w.r.t. y_pred.
– pitfall
Sep 1 at 23:05
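
The difference between the two cases can be written out with the chain rule. This is only a sketch using the f and g from the comment above, with the modified target treated as a constant with respect to y_pred:

```latex
% Case 1: the prediction is modified, so the gradient has to pass through f
\frac{\partial\,\mathrm{loss}}{\partial y_{\mathrm{pred}}}
  = \frac{\partial g(y_{\mathrm{pred\_mod}},\, y_{\mathrm{true}})}{\partial y_{\mathrm{pred\_mod}}}
    \cdot
    \frac{\partial f(y_{\mathrm{pred}})}{\partial y_{\mathrm{pred}}}

% Case 2: the target is modified, so f never appears in the gradient
\frac{\partial\,\mathrm{loss}}{\partial y_{\mathrm{pred}}}
  = \frac{\partial g(y_{\mathrm{pred}},\, y_{\mathrm{true\_mod}})}{\partial y_{\mathrm{pred}}}
```

In the first case a non-differentiable f (such as a py_func) breaks the product. In the second case f only produces a new constant target.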

1 Answer

What you are suggesting is to compute

1. offsets = compute_index_offsets( y_true, y_pred )
2. loss = 1 - num(offsets <= 3)/total

I suggest solving it in an alternative way.

1. y_true_mod = DP_best_match( y_true, y_pred )
2. loss = 1 - num(y_true_mod==y_pred)/total

The advantage of modifying y_true is that it is equivalent to providing a new target value, and thus it is not a part of the model graph optimization or the loss computation.

What DP_best_match( y_true, y_pred ) should do is to modify y_true according to y_pred, e.g. given

y_true[1*300] - [...0 0 0 1 0 0 0 0 0 1 0 0 0...]
y_pred[1*300] - [...0 0 1 0 0 0 0 0 0 0 1 0 0...]

then DP_best_match( y_true, y_pred ) should give the new target

y_true_mod[1*300] - [...0 0 1 0 0 0 0 0 0 0 1 0 0...]

Note that DP_best_match( y_true, y_pred ) aims to modify y_true to best match y_pred, so it is deterministic and there is nothing to optimize. Thus, it needs no backpropagation. This means you need to manually stop gradients if you implement DP_best_match( y_true, y_pred ) in tf. Otherwise, you can implement it in numpy and wrap the function via tf.py_func, which might be easier to implement.
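
A minimal NumPy sketch of such a matching function might look as follows. It is a greedy nearest-match stand-in, not the dynamic programming the name DP_best_match suggests. The function name best_match, the tol parameter and the 0.5 threshold used to binarise y_pred are all assumptions made for illustration.

```python
import numpy as np

def best_match(y_true, y_pred, tol=3, threshold=0.5):
    """Greedy stand-in for DP_best_match: move each target 1 onto the
    nearest predicted 1 within `tol` places, otherwise leave it in place."""
    y_true = np.asarray(y_true, dtype=np.float32)
    pred_bin = np.asarray(y_pred) >= threshold        # assumed binarisation
    y_true_mod = np.zeros_like(y_true)
    for row in range(y_true.shape[0]):                # one sample per row
        pred_idx = np.flatnonzero(pred_bin[row])
        for k in np.flatnonzero(y_true[row] == 1):
            target_pos = k
            if pred_idx.size:
                nearest = pred_idx[np.argmin(np.abs(pred_idx - k))]
                if abs(nearest - k) <= tol:
                    target_pos = nearest              # shift the label
            y_true_mod[row, target_pos] = 1.0
    return y_true_mod

y_true = [[0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0]]
y_pred = [[0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]]
print(best_match(y_true, y_pred))
```

For this pair the returned target has its 1s in the same places as y_pred. Because the matching is greedy, two target 1s can collapse onto the same predicted 1, which a real one-to-one matching would have to rule out. The result is float32, which is the output type tf.py_func would need to be told if the function is wrapped as described above.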

A final remark: you should make sure the proposed loss function makes sense. For me, it makes more sense to use binary_crossentropy or mse after finding the best y_true_mod.
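
To illustrate why a standard loss on y_true_mod behaves sensibly, here is a small NumPy sketch. The prediction values are made up, and binary_crossentropy below is a hand-written mean cross-entropy, not the Keras function.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy; predictions are clipped to avoid log(0)."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    per_slot = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    return float(np.mean(per_slot))

# Hypothetical soft predictions that peak one slot away from each label
y_pred = np.array([0.1, 0.8, 0.1, 0.1, 0.1, 0.1, 0.9, 0.1])
y_true = np.array([0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0])
# Labels after shifting each 1 onto the nearby prediction peak
y_true_mod = np.array([0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0])

print(binary_crossentropy(y_true, y_pred))      # off-by-one peaks are penalised
print(binary_crossentropy(y_true_mod, y_pred))  # smaller: the peaks count as hits
```

The loss is still an ordinary differentiable function of y_pred. Only the constant it is compared against has changed.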

Hey, I intended to modify the loss after I was done with the matching part. I know subtracting one does not make sense. Let me try tf.py_func.
– Nikhil Verma
Aug 31 at 7:44

Hey, can you check the update?
– Nikhil Verma
Aug 31 at 10:13
