How to construct a list of lengths efficiently

Say I have a sorted list of integers

RandomInteger[1, 100000, 10000] // Sort // Short

I want to construct another list whose $m$-th element is the number of elements in the original list that are less than or equal to $m$:

Table[Length@Select[%, LessEqualThan[m]], m, 10000]

This is terribly inefficient, but for some reason I cannot come up with a better a approach. What's a better way to accomplish this? This seems to be a fairly standard exercise, so there should be plenty of duplicates, but I can find none.
I am probably missing a key word...

edited Nov 13 '18 at 12:30

Alexey Popkov

38.6k4108265

asked Nov 13 '18 at 1:00

AccidentalFourierTransform

5,16311042

1

$begingroup$
What do you want to do with the table? What's your original problem? It sounds like you want to build some kind of CDF table manually. Maybe an EmpiricalDistribution or BinCounts already can accomplish what you want?
$endgroup$
– Thies Heidecke
Nov 13 '18 at 13:08

add a comment |

Say I have a sorted list of integers

RandomInteger[1, 100000, 10000] // Sort // Short

I want to construct another list whose $m$-th element is the number of elements in the original list that are less than or equal to $m$:

Table[Length@Select[%, LessEqualThan[m]], m, 10000]

edited Nov 13 '18 at 12:30

Alexey Popkov

38.6k4108265

asked Nov 13 '18 at 1:00

AccidentalFourierTransform

5,16311042

1

$begingroup$
What do you want to do with the table? What's your original problem? It sounds like you want to build some kind of CDF table manually. Maybe an EmpiricalDistribution or BinCounts already can accomplish what you want?
$endgroup$
– Thies Heidecke
Nov 13 '18 at 13:08

add a comment |

Say I have a sorted list of integers

RandomInteger[1, 100000, 10000] // Sort // Short

I want to construct another list whose $m$-th element is the number of elements in the original list that are less than or equal to $m$:

Table[Length@Select[%, LessEqualThan[m]], m, 10000]

edited Nov 13 '18 at 12:30

Alexey Popkov

38.6k4108265

asked Nov 13 '18 at 1:00

AccidentalFourierTransform

5,16311042

Say I have a sorted list of integers

RandomInteger[1, 100000, 10000] // Sort // Short

I want to construct another list whose $m$-th element is the number of elements in the original list that are less than or equal to $m$:

Table[Length@Select[%, LessEqualThan[m]], m, 10000]

list-manipulation performance-tuning filtering

edited Nov 13 '18 at 12:30

Alexey Popkov

38.6k4108265

asked Nov 13 '18 at 1:00

AccidentalFourierTransform

5,16311042

edited Nov 13 '18 at 12:30

Alexey Popkov

38.6k4108265

asked Nov 13 '18 at 1:00

AccidentalFourierTransform

5,16311042

edited Nov 13 '18 at 12:30

Alexey Popkov

38.6k4108265

edited Nov 13 '18 at 12:30

Alexey Popkov

38.6k4108265

edited Nov 13 '18 at 12:30

Alexey Popkov

38.6k4108265

asked Nov 13 '18 at 1:00

AccidentalFourierTransform

5,16311042

asked Nov 13 '18 at 1:00

AccidentalFourierTransform

5,16311042

asked Nov 13 '18 at 1:00

AccidentalFourierTransform

5,16311042

1

$begingroup$
What do you want to do with the table? What's your original problem? It sounds like you want to build some kind of CDF table manually. Maybe an EmpiricalDistribution or BinCounts already can accomplish what you want?
$endgroup$
– Thies Heidecke
Nov 13 '18 at 13:08

add a comment |

1

$begingroup$
What do you want to do with the table? What's your original problem? It sounds like you want to build some kind of CDF table manually. Maybe an EmpiricalDistribution or BinCounts already can accomplish what you want?
$endgroup$
– Thies Heidecke
Nov 13 '18 at 13:08

What do you want to do with the table? What's your original problem? It sounds like you want to build some kind of CDF table manually. Maybe an EmpiricalDistribution or BinCounts already can accomplish what you want?

– Thies Heidecke
Nov 13 '18 at 13:08

add a comment |

4 Answers
4

active

oldest

votes

You can use the usual UnitStep + Total tricks:

r1 = Table[Total[UnitStep[m-s]], m,10000]; //AbsoluteTiming

r2 = Table[Length@Select[s,LessEqualThan[m]],m,10000];//AbsoluteTiming

r1 === r2

0.435358, Null

41.4357, Null

True

Update

As @J42161217 points out, you can take advantage of the fact that the data is sorted to speed things up. He used Differences. Here is a version that uses Nearest instead:

mincounts[s_] := With[
 
 unique = DeleteDuplicates@Nearest[s->"Element",s][[All,-1]],
 counts = Prepend[0] @ DeleteDuplicates@Nearest[s->"Index",s][[All,-1]]
 ,
 With[near = Nearest[unique->"Index", Range @ Length @ s][[All,1]],
 counts[[1+near-UnitStep[unique[[near]]-Range@Length@s-1]]]
 ]
]

Comparison:

SeedRandom[1];
s=RandomInteger[1,100000,10000]//Sort;

(* my first answer *)
r1 = Table[Total[UnitStep[m-s]], m,10000]; //AbsoluteTiming
(* J42161217's answer *)
r2 = Flatten[
 Join[
 Table[0, s[[1]] - 1], 
 Table[Table[i, Differences[s][[i]]], i, Length[Select[s, # <= 10000 &]]]
 ]
][[;;10000]]; // AbsoluteTiming
(* using Nearest *)
r3 = mincounts[s]; //AbsoluteTiming

r1 === r2 === r3

0.432897, Null

0.122198, Null

0.025923, Null

True

edited Nov 13 '18 at 4:11

answered Nov 13 '18 at 1:11

Carl Woll

70.5k394184

$begingroup$
Ah, great answer, as usual.
$endgroup$
– AccidentalFourierTransform
Nov 13 '18 at 1:19

$begingroup$
Can you please check my answer because my laptop is very slow
$endgroup$
– J42161217
Nov 13 '18 at 3:05

add a comment |

BinCounts and Accumulate combination is faster than all the methods posted so far:

r4 = Accumulate @ BinCounts[s, 1, 1 + 10000, 1]; // RepeatedTiming // First

0.00069

versus Henrik Schumacher's mySparseArray, Carl Woll's mincounts and J42161217's Differences-based method:

r5 = Accumulate[mySparseArray[Partition[s, 1] -> 1, s[[-1]], Total, 0][[
 1 ;; Length[s]]]
 ]; // RepeatedTiming // First

0.00081

r3 = mincounts[s]; // RepeatedTiming // First

0.016

r2 = Flatten[Join[Table[0, s[[1]] - 1], 
 Table[Table[i, Differences[s][[i]]], i, 
     Length[Select[s, # <= 10000 &]]]]][[;; 10000]]; // 
 RepeatedTiming // First

0.149

r2 == r3 == r4 == r5

True

edited Nov 13 '18 at 10:10

Henrik Schumacher

56.7k577157

answered Nov 13 '18 at 6:31

kglr

189k10205423

$begingroup$
Beat me to it - BinCounts is the way... +1
$endgroup$
– ciao
Nov 13 '18 at 6:49

1

$begingroup$
Hey @ciao, you are back?!!
$endgroup$
– kglr
Nov 13 '18 at 6:53

$begingroup$
Sorry @Henrik; thanks for the edit.
$endgroup$
– kglr
Nov 13 '18 at 10:23

1

$begingroup$
Short and fast. No need sorting.. Excellent!!... +1
$endgroup$
– Okkes Dulgerci
Nov 13 '18 at 15:14

$begingroup$
@kglr - Been here all along, just busy herding a couple of startups, but I read several times a week, and usually learn something new each time.
$endgroup$
– ciao
Nov 13 '18 at 22:42

add a comment |

I think this is at least x3 faster than Mr. Carl Woll's answer

Can anybody compare my timing?

r3 = Flatten[Join[Table[0, s[[1]] - 1], 
Table[Table[i, Differences[s][[i]]], i, 
 Length[Select[s, # <= 10000 &]]]]][[;;10000]]; // AbsoluteTiming

0.157123, Null

Using MapThread the same code is way faster

r6 = Flatten[
 Join[Table[0, s[[1]] - 1], 
 MapThread[
 Table, Range[t = Length[Select[s, # <= 10000 &]]], 
 Differences[s][[1 ;; t]]]]][[;; 10000]]; // AbsoluteTiming 

r6===r3

0.008387, Null

True

edited Nov 13 '18 at 12:57

answered Nov 13 '18 at 3:04

J42161217

3,968323

1

$begingroup$
These are the timing on my laptop r1=0.444354, Null,r2=39.456, Null,r3=0.157123, Null True
$endgroup$
– Okkes Dulgerci
Nov 13 '18 at 3:55

$begingroup$
Hey, thanks for checking. I will use your timing. Thanks for the confirmation
$endgroup$
– J42161217
Nov 13 '18 at 4:01

add a comment |

s = Sort[RandomInteger[1, 100000, 10000]];

Let us just imagine for the moment that the target list r is supposed to have length 100000 (we can truncate it afterwards). Then for each entry i in the list s, the list r needs to have a jump of height 1 at position i. The jumps are the "derivative" of r (in a discrete sense) and the antiderivative can be obtained with Accumulate. In order to get the list of jumps, we need additive matrix assembly.

This can be done with this function:

mySparseArray[rules_, dims_, f_: Total, background_: 0.] := 
 If[(Head[rules] === Rule) && (rules[[1]] === ),
 rules[[2]],
 With[spopt = SystemOptions["SparseArrayOptions"],
 Internal`WithLocalSettings[
 SetSystemOptions[
 "SparseArrayOptions" -> "TreatRepeatedEntries" -> f],
 SparseArray[rules, dims, background],
 SetSystemOptions[spopt]]
 ]
 ]

So, in total, r can be obtained as follows:

r4 = Accumulate[
 mySparseArray[Partition[s, 1] -> 1, s[[-1]], Total, 0][[1 ;; Length[s]]]
 ]; // RepeatedTiming // First

0.00055

For comparison:

r3 = Flatten[
 Join[Table[0, s[[1]] - 1], 
 Table[Table[i, Differences[s][[i]]], i, 
 Length[Select[s, # <= 10000 &]]]]][[;; 10000]]; // 
 RepeatedTiming // First
r3 == r4

0.116

True

edited Nov 13 '18 at 23:11

answered Nov 13 '18 at 6:32

Henrik Schumacher

56.7k577157

add a comment |

Your Answer

StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\$","\$"]]);
);
);
, "mathjax-editing");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "387"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmathematica.stackexchange.com%2fquestions%2f185874%2fhow-to-construct-a-list-of-lengths-efficiently%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

4 Answers
4

active

oldest

votes

4 Answers
4

active

oldest

votes

You can use the usual UnitStep + Total tricks:

r1 = Table[Total[UnitStep[m-s]], m,10000]; //AbsoluteTiming

r2 = Table[Length@Select[s,LessEqualThan[m]],m,10000];//AbsoluteTiming

r1 === r2

0.435358, Null

41.4357, Null

True

Update

As @J42161217 points out, you can take advantage of the fact that the data is sorted to speed things up. He used Differences. Here is a version that uses Nearest instead:

mincounts[s_] := With[
 
 unique = DeleteDuplicates@Nearest[s->"Element",s][[All,-1]],
 counts = Prepend[0] @ DeleteDuplicates@Nearest[s->"Index",s][[All,-1]]
 ,
 With[near = Nearest[unique->"Index", Range @ Length @ s][[All,1]],
 counts[[1+near-UnitStep[unique[[near]]-Range@Length@s-1]]]
 ]
]

Comparison:

SeedRandom[1];
s=RandomInteger[1,100000,10000]//Sort;

(* my first answer *)
r1 = Table[Total[UnitStep[m-s]], m,10000]; //AbsoluteTiming
(* J42161217's answer *)
r2 = Flatten[
 Join[
 Table[0, s[[1]] - 1], 
 Table[Table[i, Differences[s][[i]]], i, Length[Select[s, # <= 10000 &]]]
 ]
][[;;10000]]; // AbsoluteTiming
(* using Nearest *)
r3 = mincounts[s]; //AbsoluteTiming

r1 === r2 === r3

0.432897, Null

0.122198, Null

0.025923, Null

True

edited Nov 13 '18 at 4:11

answered Nov 13 '18 at 1:11

Carl Woll

70.5k394184

$begingroup$
Ah, great answer, as usual.
$endgroup$
– AccidentalFourierTransform
Nov 13 '18 at 1:19

$begingroup$
Can you please check my answer because my laptop is very slow
$endgroup$
– J42161217
Nov 13 '18 at 3:05

add a comment |

You can use the usual UnitStep + Total tricks:

r1 = Table[Total[UnitStep[m-s]], m,10000]; //AbsoluteTiming

r2 = Table[Length@Select[s,LessEqualThan[m]],m,10000];//AbsoluteTiming

r1 === r2

0.435358, Null

41.4357, Null

True

Update

As @J42161217 points out, you can take advantage of the fact that the data is sorted to speed things up. He used Differences. Here is a version that uses Nearest instead:

mincounts[s_] := With[
 
 unique = DeleteDuplicates@Nearest[s->"Element",s][[All,-1]],
 counts = Prepend[0] @ DeleteDuplicates@Nearest[s->"Index",s][[All,-1]]
 ,
 With[near = Nearest[unique->"Index", Range @ Length @ s][[All,1]],
 counts[[1+near-UnitStep[unique[[near]]-Range@Length@s-1]]]
 ]
]

Comparison:

SeedRandom[1];
s=RandomInteger[1,100000,10000]//Sort;

(* my first answer *)
r1 = Table[Total[UnitStep[m-s]], m,10000]; //AbsoluteTiming
(* J42161217's answer *)
r2 = Flatten[
 Join[
 Table[0, s[[1]] - 1], 
 Table[Table[i, Differences[s][[i]]], i, Length[Select[s, # <= 10000 &]]]
 ]
][[;;10000]]; // AbsoluteTiming
(* using Nearest *)
r3 = mincounts[s]; //AbsoluteTiming

r1 === r2 === r3

0.432897, Null

0.122198, Null

0.025923, Null

True

edited Nov 13 '18 at 4:11

answered Nov 13 '18 at 1:11

Carl Woll

70.5k394184

$begingroup$
Ah, great answer, as usual.
$endgroup$
– AccidentalFourierTransform
Nov 13 '18 at 1:19

$begingroup$
Can you please check my answer because my laptop is very slow
$endgroup$
– J42161217
Nov 13 '18 at 3:05

add a comment |

You can use the usual UnitStep + Total tricks:

r1 = Table[Total[UnitStep[m-s]], m,10000]; //AbsoluteTiming

r2 = Table[Length@Select[s,LessEqualThan[m]],m,10000];//AbsoluteTiming

r1 === r2

0.435358, Null

41.4357, Null

True

Update

As @J42161217 points out, you can take advantage of the fact that the data is sorted to speed things up. He used Differences. Here is a version that uses Nearest instead:

mincounts[s_] := With[
 
 unique = DeleteDuplicates@Nearest[s->"Element",s][[All,-1]],
 counts = Prepend[0] @ DeleteDuplicates@Nearest[s->"Index",s][[All,-1]]
 ,
 With[near = Nearest[unique->"Index", Range @ Length @ s][[All,1]],
 counts[[1+near-UnitStep[unique[[near]]-Range@Length@s-1]]]
 ]
]

Comparison:

SeedRandom[1];
s=RandomInteger[1,100000,10000]//Sort;

(* my first answer *)
r1 = Table[Total[UnitStep[m-s]], m,10000]; //AbsoluteTiming
(* J42161217's answer *)
r2 = Flatten[
 Join[
 Table[0, s[[1]] - 1], 
 Table[Table[i, Differences[s][[i]]], i, Length[Select[s, # <= 10000 &]]]
 ]
][[;;10000]]; // AbsoluteTiming
(* using Nearest *)
r3 = mincounts[s]; //AbsoluteTiming

r1 === r2 === r3

0.432897, Null

0.122198, Null

0.025923, Null

True

edited Nov 13 '18 at 4:11

answered Nov 13 '18 at 1:11

Carl Woll

70.5k394184

You can use the usual UnitStep + Total tricks:

r1 = Table[Total[UnitStep[m-s]], m,10000]; //AbsoluteTiming

r2 = Table[Length@Select[s,LessEqualThan[m]],m,10000];//AbsoluteTiming

r1 === r2

0.435358, Null

41.4357, Null

True

Update

As @J42161217 points out, you can take advantage of the fact that the data is sorted to speed things up. He used Differences. Here is a version that uses Nearest instead:

mincounts[s_] := With[
 
 unique = DeleteDuplicates@Nearest[s->"Element",s][[All,-1]],
 counts = Prepend[0] @ DeleteDuplicates@Nearest[s->"Index",s][[All,-1]]
 ,
 With[near = Nearest[unique->"Index", Range @ Length @ s][[All,1]],
 counts[[1+near-UnitStep[unique[[near]]-Range@Length@s-1]]]
 ]
]

Comparison:

SeedRandom[1];
s=RandomInteger[1,100000,10000]//Sort;

(* my first answer *)
r1 = Table[Total[UnitStep[m-s]], m,10000]; //AbsoluteTiming
(* J42161217's answer *)
r2 = Flatten[
 Join[
 Table[0, s[[1]] - 1], 
 Table[Table[i, Differences[s][[i]]], i, Length[Select[s, # <= 10000 &]]]
 ]
][[;;10000]]; // AbsoluteTiming
(* using Nearest *)
r3 = mincounts[s]; //AbsoluteTiming

r1 === r2 === r3

0.432897, Null

0.122198, Null

0.025923, Null

True

edited Nov 13 '18 at 4:11

answered Nov 13 '18 at 1:11

Carl Woll

70.5k394184

edited Nov 13 '18 at 4:11

answered Nov 13 '18 at 1:11

Carl Woll

70.5k394184

answered Nov 13 '18 at 1:11

Carl Woll

70.5k394184

answered Nov 13 '18 at 1:11

Carl Woll

70.5k394184

$begingroup$
Ah, great answer, as usual.
$endgroup$
– AccidentalFourierTransform
Nov 13 '18 at 1:19

$begingroup$
Can you please check my answer because my laptop is very slow
$endgroup$
– J42161217
Nov 13 '18 at 3:05

add a comment |

$begingroup$
Ah, great answer, as usual.
$endgroup$
– AccidentalFourierTransform
Nov 13 '18 at 1:19

$begingroup$
Can you please check my answer because my laptop is very slow
$endgroup$
– J42161217
Nov 13 '18 at 3:05

Ah, great answer, as usual.

– AccidentalFourierTransform
Nov 13 '18 at 1:19

Can you please check my answer because my laptop is very slow

– J42161217
Nov 13 '18 at 3:05

add a comment |

BinCounts and Accumulate combination is faster than all the methods posted so far:

r4 = Accumulate @ BinCounts[s, 1, 1 + 10000, 1]; // RepeatedTiming // First

0.00069

versus Henrik Schumacher's mySparseArray, Carl Woll's mincounts and J42161217's Differences-based method:

r5 = Accumulate[mySparseArray[Partition[s, 1] -> 1, s[[-1]], Total, 0][[
 1 ;; Length[s]]]
 ]; // RepeatedTiming // First

0.00081

r3 = mincounts[s]; // RepeatedTiming // First

0.016

r2 = Flatten[Join[Table[0, s[[1]] - 1], 
 Table[Table[i, Differences[s][[i]]], i, 
     Length[Select[s, # <= 10000 &]]]]][[;; 10000]]; // 
 RepeatedTiming // First

0.149

r2 == r3 == r4 == r5

True

edited Nov 13 '18 at 10:10

Henrik Schumacher

56.7k577157

answered Nov 13 '18 at 6:31

kglr

189k10205423

$begingroup$
Beat me to it - BinCounts is the way... +1
$endgroup$
– ciao
Nov 13 '18 at 6:49

1

$begingroup$
Hey @ciao, you are back?!!
$endgroup$
– kglr
Nov 13 '18 at 6:53

$begingroup$
Sorry @Henrik; thanks for the edit.
$endgroup$
– kglr
Nov 13 '18 at 10:23

1

$begingroup$
Short and fast. No need sorting.. Excellent!!... +1
$endgroup$
– Okkes Dulgerci
Nov 13 '18 at 15:14

$begingroup$
@kglr - Been here all along, just busy herding a couple of startups, but I read several times a week, and usually learn something new each time.
$endgroup$
– ciao
Nov 13 '18 at 22:42

add a comment |

BinCounts and Accumulate combination is faster than all the methods posted so far:

r4 = Accumulate @ BinCounts[s, 1, 1 + 10000, 1]; // RepeatedTiming // First

0.00069

versus Henrik Schumacher's mySparseArray, Carl Woll's mincounts and J42161217's Differences-based method:

r5 = Accumulate[mySparseArray[Partition[s, 1] -> 1, s[[-1]], Total, 0][[
 1 ;; Length[s]]]
 ]; // RepeatedTiming // First

0.00081

r3 = mincounts[s]; // RepeatedTiming // First

0.016

r2 = Flatten[Join[Table[0, s[[1]] - 1], 
 Table[Table[i, Differences[s][[i]]], i, 
     Length[Select[s, # <= 10000 &]]]]][[;; 10000]]; // 
 RepeatedTiming // First

0.149

r2 == r3 == r4 == r5

True

edited Nov 13 '18 at 10:10

Henrik Schumacher

56.7k577157

answered Nov 13 '18 at 6:31

kglr

189k10205423

$begingroup$
Beat me to it - BinCounts is the way... +1
$endgroup$
– ciao
Nov 13 '18 at 6:49

1

$begingroup$
Hey @ciao, you are back?!!
$endgroup$
– kglr
Nov 13 '18 at 6:53

$begingroup$
Sorry @Henrik; thanks for the edit.
$endgroup$
– kglr
Nov 13 '18 at 10:23

1

$begingroup$
Short and fast. No need sorting.. Excellent!!... +1
$endgroup$
– Okkes Dulgerci
Nov 13 '18 at 15:14

$begingroup$
@kglr - Been here all along, just busy herding a couple of startups, but I read several times a week, and usually learn something new each time.
$endgroup$
– ciao
Nov 13 '18 at 22:42

add a comment |

BinCounts and Accumulate combination is faster than all the methods posted so far:

r4 = Accumulate @ BinCounts[s, 1, 1 + 10000, 1]; // RepeatedTiming // First

0.00069

versus Henrik Schumacher's mySparseArray, Carl Woll's mincounts and J42161217's Differences-based method:

r5 = Accumulate[mySparseArray[Partition[s, 1] -> 1, s[[-1]], Total, 0][[
 1 ;; Length[s]]]
 ]; // RepeatedTiming // First

0.00081

r3 = mincounts[s]; // RepeatedTiming // First

0.016

r2 = Flatten[Join[Table[0, s[[1]] - 1], 
 Table[Table[i, Differences[s][[i]]], i, 
     Length[Select[s, # <= 10000 &]]]]][[;; 10000]]; // 
 RepeatedTiming // First

0.149

r2 == r3 == r4 == r5

True

edited Nov 13 '18 at 10:10

Henrik Schumacher

56.7k577157

answered Nov 13 '18 at 6:31

kglr

189k10205423

BinCounts and Accumulate combination is faster than all the methods posted so far:

r4 = Accumulate @ BinCounts[s, 1, 1 + 10000, 1]; // RepeatedTiming // First

0.00069

versus Henrik Schumacher's mySparseArray, Carl Woll's mincounts and J42161217's Differences-based method:

r5 = Accumulate[mySparseArray[Partition[s, 1] -> 1, s[[-1]], Total, 0][[
 1 ;; Length[s]]]
 ]; // RepeatedTiming // First

0.00081

r3 = mincounts[s]; // RepeatedTiming // First

0.016

r2 = Flatten[Join[Table[0, s[[1]] - 1], 
 Table[Table[i, Differences[s][[i]]], i, 
     Length[Select[s, # <= 10000 &]]]]][[;; 10000]]; // 
 RepeatedTiming // First

0.149

r2 == r3 == r4 == r5

True

edited Nov 13 '18 at 10:10

Henrik Schumacher

56.7k577157

answered Nov 13 '18 at 6:31

kglr

189k10205423

edited Nov 13 '18 at 10:10

Henrik Schumacher

56.7k577157

edited Nov 13 '18 at 10:10

Henrik Schumacher

56.7k577157

edited Nov 13 '18 at 10:10

Henrik Schumacher

56.7k577157

answered Nov 13 '18 at 6:31

kglr

189k10205423

answered Nov 13 '18 at 6:31

kglr

189k10205423

answered Nov 13 '18 at 6:31

kglr

189k10205423

$begingroup$
Beat me to it - BinCounts is the way... +1
$endgroup$
– ciao
Nov 13 '18 at 6:49

1

$begingroup$
Hey @ciao, you are back?!!
$endgroup$
– kglr
Nov 13 '18 at 6:53

$begingroup$
Sorry @Henrik; thanks for the edit.
$endgroup$
– kglr
Nov 13 '18 at 10:23

1

$begingroup$
Short and fast. No need sorting.. Excellent!!... +1
$endgroup$
– Okkes Dulgerci
Nov 13 '18 at 15:14

$begingroup$
@kglr - Been here all along, just busy herding a couple of startups, but I read several times a week, and usually learn something new each time.
$endgroup$
– ciao
Nov 13 '18 at 22:42

add a comment |

$begingroup$
Beat me to it - BinCounts is the way... +1
$endgroup$
– ciao
Nov 13 '18 at 6:49

1

$begingroup$
Hey @ciao, you are back?!!
$endgroup$
– kglr
Nov 13 '18 at 6:53

$begingroup$
Sorry @Henrik; thanks for the edit.
$endgroup$
– kglr
Nov 13 '18 at 10:23

1

$begingroup$
Short and fast. No need sorting.. Excellent!!... +1
$endgroup$
– Okkes Dulgerci
Nov 13 '18 at 15:14

$begingroup$
@kglr - Been here all along, just busy herding a couple of startups, but I read several times a week, and usually learn something new each time.
$endgroup$
– ciao
Nov 13 '18 at 22:42

Beat me to it - BinCounts is the way... +1

– ciao
Nov 13 '18 at 6:49

Hey @ciao, you are back?!!

– kglr
Nov 13 '18 at 6:53

Sorry @Henrik; thanks for the edit.

– kglr
Nov 13 '18 at 10:23

Short and fast. No need sorting.. Excellent!!... +1

– Okkes Dulgerci
Nov 13 '18 at 15:14

@kglr - Been here all along, just busy herding a couple of startups, but I read several times a week, and usually learn something new each time.

– ciao
Nov 13 '18 at 22:42

add a comment |

I think this is at least x3 faster than Mr. Carl Woll's answer

Can anybody compare my timing?

r3 = Flatten[Join[Table[0, s[[1]] - 1], 
Table[Table[i, Differences[s][[i]]], i, 
 Length[Select[s, # <= 10000 &]]]]][[;;10000]]; // AbsoluteTiming

0.157123, Null

Using MapThread the same code is way faster

r6 = Flatten[
 Join[Table[0, s[[1]] - 1], 
 MapThread[
 Table, Range[t = Length[Select[s, # <= 10000 &]]], 
 Differences[s][[1 ;; t]]]]][[;; 10000]]; // AbsoluteTiming 

r6===r3

0.008387, Null

True

edited Nov 13 '18 at 12:57

answered Nov 13 '18 at 3:04

J42161217

3,968323

1

$begingroup$
These are the timing on my laptop r1=0.444354, Null,r2=39.456, Null,r3=0.157123, Null True
$endgroup$
– Okkes Dulgerci
Nov 13 '18 at 3:55

$begingroup$
Hey, thanks for checking. I will use your timing. Thanks for the confirmation
$endgroup$
– J42161217
Nov 13 '18 at 4:01

add a comment |

I think this is at least x3 faster than Mr. Carl Woll's answer

Can anybody compare my timing?

r3 = Flatten[Join[Table[0, s[[1]] - 1], 
Table[Table[i, Differences[s][[i]]], i, 
 Length[Select[s, # <= 10000 &]]]]][[;;10000]]; // AbsoluteTiming

0.157123, Null

Using MapThread the same code is way faster

r6 = Flatten[
 Join[Table[0, s[[1]] - 1], 
 MapThread[
 Table, Range[t = Length[Select[s, # <= 10000 &]]], 
 Differences[s][[1 ;; t]]]]][[;; 10000]]; // AbsoluteTiming 

r6===r3

0.008387, Null

True

edited Nov 13 '18 at 12:57

answered Nov 13 '18 at 3:04

J42161217

3,968323

1

$begingroup$
These are the timing on my laptop r1=0.444354, Null,r2=39.456, Null,r3=0.157123, Null True
$endgroup$
– Okkes Dulgerci
Nov 13 '18 at 3:55

$begingroup$
Hey, thanks for checking. I will use your timing. Thanks for the confirmation
$endgroup$
– J42161217
Nov 13 '18 at 4:01

add a comment |

I think this is at least x3 faster than Mr. Carl Woll's answer

Can anybody compare my timing?

r3 = Flatten[Join[Table[0, s[[1]] - 1], 
Table[Table[i, Differences[s][[i]]], i, 
 Length[Select[s, # <= 10000 &]]]]][[;;10000]]; // AbsoluteTiming

0.157123, Null

Using MapThread the same code is way faster

r6 = Flatten[
 Join[Table[0, s[[1]] - 1], 
 MapThread[
 Table, Range[t = Length[Select[s, # <= 10000 &]]], 
 Differences[s][[1 ;; t]]]]][[;; 10000]]; // AbsoluteTiming 

r6===r3

0.008387, Null

True

edited Nov 13 '18 at 12:57

answered Nov 13 '18 at 3:04

J42161217

3,968323

I think this is at least x3 faster than Mr. Carl Woll's answer

Can anybody compare my timing?

r3 = Flatten[Join[Table[0, s[[1]] - 1], 
Table[Table[i, Differences[s][[i]]], i, 
 Length[Select[s, # <= 10000 &]]]]][[;;10000]]; // AbsoluteTiming

0.157123, Null

Using MapThread the same code is way faster

r6 = Flatten[
 Join[Table[0, s[[1]] - 1], 
 MapThread[
 Table, Range[t = Length[Select[s, # <= 10000 &]]], 
 Differences[s][[1 ;; t]]]]][[;; 10000]]; // AbsoluteTiming 

r6===r3

0.008387, Null

True

edited Nov 13 '18 at 12:57

answered Nov 13 '18 at 3:04

J42161217

3,968323

edited Nov 13 '18 at 12:57

answered Nov 13 '18 at 3:04

J42161217

3,968323

answered Nov 13 '18 at 3:04

J42161217

3,968323

answered Nov 13 '18 at 3:04

J42161217

3,968323

1

$begingroup$
These are the timing on my laptop r1=0.444354, Null,r2=39.456, Null,r3=0.157123, Null True
$endgroup$
– Okkes Dulgerci
Nov 13 '18 at 3:55

$begingroup$
Hey, thanks for checking. I will use your timing. Thanks for the confirmation
$endgroup$
– J42161217
Nov 13 '18 at 4:01

add a comment |

1

$begingroup$
These are the timing on my laptop r1=0.444354, Null,r2=39.456, Null,r3=0.157123, Null True
$endgroup$
– Okkes Dulgerci
Nov 13 '18 at 3:55

$begingroup$
Hey, thanks for checking. I will use your timing. Thanks for the confirmation
$endgroup$
– J42161217
Nov 13 '18 at 4:01

These are the timing on my laptop r1=0.444354, Null,r2=39.456, Null,r3=0.157123, Null True

– Okkes Dulgerci
Nov 13 '18 at 3:55

Hey, thanks for checking. I will use your timing. Thanks for the confirmation

– J42161217
Nov 13 '18 at 4:01

add a comment |

s = Sort[RandomInteger[1, 100000, 10000]];

This can be done with this function:

mySparseArray[rules_, dims_, f_: Total, background_: 0.] := 
 If[(Head[rules] === Rule) && (rules[[1]] === ),
 rules[[2]],
 With[spopt = SystemOptions["SparseArrayOptions"],
 Internal`WithLocalSettings[
 SetSystemOptions[
 "SparseArrayOptions" -> "TreatRepeatedEntries" -> f],
 SparseArray[rules, dims, background],
 SetSystemOptions[spopt]]
 ]
 ]

So, in total, r can be obtained as follows:

r4 = Accumulate[
 mySparseArray[Partition[s, 1] -> 1, s[[-1]], Total, 0][[1 ;; Length[s]]]
 ]; // RepeatedTiming // First

0.00055

For comparison:

r3 = Flatten[
 Join[Table[0, s[[1]] - 1], 
 Table[Table[i, Differences[s][[i]]], i, 
 Length[Select[s, # <= 10000 &]]]]][[;; 10000]]; // 
 RepeatedTiming // First
r3 == r4

0.116

True

edited Nov 13 '18 at 23:11

answered Nov 13 '18 at 6:32

Henrik Schumacher

56.7k577157

add a comment |

s = Sort[RandomInteger[1, 100000, 10000]];

This can be done with this function:

mySparseArray[rules_, dims_, f_: Total, background_: 0.] := 
 If[(Head[rules] === Rule) && (rules[[1]] === ),
 rules[[2]],
 With[spopt = SystemOptions["SparseArrayOptions"],
 Internal`WithLocalSettings[
 SetSystemOptions[
 "SparseArrayOptions" -> "TreatRepeatedEntries" -> f],
 SparseArray[rules, dims, background],
 SetSystemOptions[spopt]]
 ]
 ]

So, in total, r can be obtained as follows:

r4 = Accumulate[
 mySparseArray[Partition[s, 1] -> 1, s[[-1]], Total, 0][[1 ;; Length[s]]]
 ]; // RepeatedTiming // First

0.00055

For comparison:

r3 = Flatten[
 Join[Table[0, s[[1]] - 1], 
 Table[Table[i, Differences[s][[i]]], i, 
 Length[Select[s, # <= 10000 &]]]]][[;; 10000]]; // 
 RepeatedTiming // First
r3 == r4

0.116

True

edited Nov 13 '18 at 23:11

answered Nov 13 '18 at 6:32

Henrik Schumacher

56.7k577157

add a comment |

s = Sort[RandomInteger[1, 100000, 10000]];

This can be done with this function:

mySparseArray[rules_, dims_, f_: Total, background_: 0.] := 
 If[(Head[rules] === Rule) && (rules[[1]] === ),
 rules[[2]],
 With[spopt = SystemOptions["SparseArrayOptions"],
 Internal`WithLocalSettings[
 SetSystemOptions[
 "SparseArrayOptions" -> "TreatRepeatedEntries" -> f],
 SparseArray[rules, dims, background],
 SetSystemOptions[spopt]]
 ]
 ]

So, in total, r can be obtained as follows:

r4 = Accumulate[
 mySparseArray[Partition[s, 1] -> 1, s[[-1]], Total, 0][[1 ;; Length[s]]]
 ]; // RepeatedTiming // First

0.00055

For comparison:

r3 = Flatten[
 Join[Table[0, s[[1]] - 1], 
 Table[Table[i, Differences[s][[i]]], i, 
 Length[Select[s, # <= 10000 &]]]]][[;; 10000]]; // 
 RepeatedTiming // First
r3 == r4

0.116

True

edited Nov 13 '18 at 23:11

answered Nov 13 '18 at 6:32

Henrik Schumacher

56.7k577157

s = Sort[RandomInteger[1, 100000, 10000]];

This can be done with this function:

mySparseArray[rules_, dims_, f_: Total, background_: 0.] := 
 If[(Head[rules] === Rule) && (rules[[1]] === ),
 rules[[2]],
 With[spopt = SystemOptions["SparseArrayOptions"],
 Internal`WithLocalSettings[
 SetSystemOptions[
 "SparseArrayOptions" -> "TreatRepeatedEntries" -> f],
 SparseArray[rules, dims, background],
 SetSystemOptions[spopt]]
 ]
 ]

So, in total, r can be obtained as follows:

r4 = Accumulate[
 mySparseArray[Partition[s, 1] -> 1, s[[-1]], Total, 0][[1 ;; Length[s]]]
 ]; // RepeatedTiming // First

0.00055

For comparison:

r3 = Flatten[
 Join[Table[0, s[[1]] - 1], 
 Table[Table[i, Differences[s][[i]]], i, 
 Length[Select[s, # <= 10000 &]]]]][[;; 10000]]; // 
 RepeatedTiming // First
r3 == r4

0.116

True

edited Nov 13 '18 at 23:11

answered Nov 13 '18 at 6:32

Henrik Schumacher

56.7k577157

edited Nov 13 '18 at 23:11

answered Nov 13 '18 at 6:32

Henrik Schumacher

56.7k577157

answered Nov 13 '18 at 6:32

Henrik Schumacher

56.7k577157

answered Nov 13 '18 at 6:32

Henrik Schumacher

56.7k577157

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Mathematica Stack Exchange!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

Use MathJax to format equations. MathJax reference.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Dfyjkt