Calculate function on array slices in a vectorized way
Let's say I have 1D numpy arrays X (features) and Y (binary classes), and a function f that takes two slices of X and Y and calculates a number. I also have an array of indices S by which I need to split X and Y. It is guaranteed that each slice will be non-empty. So my code looks like this:
def f(x_left, y_left, x_right, y_right):
    n = x_left.shape[0] + x_right.shape[0]
    lcond = y_left == 1
    rcond = y_right == 1
    hleft = 1 - ((y_left[lcond].shape[0])**2
                 + (y_left[~lcond].shape[0])**2) / n**2
    hright = 1 - ((y_right[rcond].shape[0])**2
                  + (y_right[~rcond].shape[0])**2) / n**2
    return -(x_left.shape[0] / n) * hleft - (x_right.shape[0] / n) * hright

results = np.empty(len(S))
for i in range(len(S)):
    results[i] = f(X[:S[i]], Y[:S[i]], X[S[i]:], Y[S[i]:])
The array results must contain the result of f for each split in S, so len(results) == len(S).
My question is: how can I perform these calculations in a vectorised way, using numpy, to make this code faster?
python arrays python-3.x numpy vectorization
(3) There's no way to magically vectorize using arbitrary functions. You have to implement your function itself so that you can use vectorized operations in it (typically, arithmetic operations on multidimensional arrays). What does your function do? – Andras Deak, Nov 12 '18 at 15:07
(1) I edited the code and question text. – Kurt Kurtov, Nov 12 '18 at 15:23
asked Nov 12 '18 at 15:05 by Kurt Kurtov, edited Nov 12 '18 at 18:07 by Andras Deak
1 Answer
First, let's make your function a bit more efficient. You are doing some unnecessary indexing operations: instead of y_left[lcond].shape[0] you just need lcond.sum(), or len(lcond.nonzero()[0]), which seems to be faster.
Here's an improved loopy version of your code (complete with dummy input):
import numpy as np

n = 1000
X = np.random.randint(0, n, n)
Y = np.random.randint(0, n, n)
S = np.random.choice(n//2, n)

def f2(x, y, s):
    """Same loopy solution as original, only faster"""
    n = x.size
    isone = y == 1
    lval = len(isone[:s].nonzero()[0])
    rval = len(isone[s:].nonzero()[0])
    hleft = 1 - (lval**2 + (s - lval)**2) / n**2
    hright = 1 - (rval**2 + (n - s - rval)**2) / n**2
    return - s / n * hleft - (n - s) / n * hright

def time_newloop():
    """Callable front-end for timing comparisons"""
    results = np.empty(len(S))
    for i in range(len(S)):
        results[i] = f2(X, Y, S[i])
    return results
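As a quick sanity check of the counting shortcut (with made-up dummy data, purely illustrative), all three ways of counting the ones agree:

```python
import numpy as np

rng = np.random.default_rng(0)
y_left = rng.integers(0, 2, 100)  # binary classes, like Y in the question
lcond = y_left == 1

# Three equivalent ways to count the number of ones in y_left
a = y_left[lcond].shape[0]   # original: build the subarray, take its length
b = lcond.sum()              # sum the boolean mask directly
c = len(lcond.nonzero()[0])  # count the indices of True entries

print(a == b == c)  # True
```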
The changes are fairly straightforward.
Now, it turns out that we can indeed vectorize your loops. For this we have to do the comparison against every element of S at the same time. We can do this by creating a 2d mask of shape (nS, n) (where S.size == nS) which cuts off the values up to the corresponding element of S. Here's how:
def f3(X, Y, S):
    """Vectorized solution working on all the data at the same time"""
    n = X.size
    leftmask = np.arange(n) < S[:, None]       # boolean, shape (nS, n)
    rightmask = ~leftmask                      # boolean, shape (nS, n)
    isone = Y == 1                             # shape (n,)
    lval = (isone & leftmask).sum(axis=1)      # shape (nS,)
    rval = (isone & rightmask).sum(axis=1)     # shape (nS,)
    hleft = 1 - (lval**2 + (S - lval)**2) / n**2
    hright = 1 - (rval**2 + (n - S - rval)**2) / n**2
    return - S / n * hleft - (n - S) / n * hright  # shape (nS,)

def time_vector():
    """Trivial front-end for fair timing"""
    return f3(X, Y, S)
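To see the broadcasting behind leftmask on a tiny example (values made up for illustration): the column vector S[:, None] of shape (nS, 1) is compared against np.arange(n) of shape (n,), which broadcasts to shape (nS, n).

```python
import numpy as np

S = np.array([1, 3])
n = 4

# (2, 1) compared against (4,) broadcasts to (2, 4):
# each row i is True exactly for the first S[i] positions
leftmask = np.arange(n) < S[:, None]
print(leftmask)
# [[ True False False False]
#  [ True  True  True False]]
```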
Defining your original solution to be run as time_orig(), we can check that the results are the same:
>>> np.array_equal(time_orig(), time_newloop()), np.array_equal(time_orig(), time_vector())
(True, True)
And the runtimes with the above random inputs:
>>> %timeit time_orig()
... %timeit time_newloop()
... %timeit time_vector()
19 ms ± 501 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
11.4 ms ± 214 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
3.93 ms ± 37.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
This means that the loopy version above is almost twice as fast as the original loopy version, and the vectorized version is another factor of three faster. Of course, the cost of the latter improvement is an increased memory footprint: instead of arrays of shape (n,) you now have arrays of shape (nS, n), which can get quite big if your input arrays are huge. But as they say, there's no free lunch: with vectorization you often trade runtime for memory.
answered Nov 12 '18 at 18:06 by Andras Deak