How to construct a list of lengths efficiently
$begingroup$
Say I have a sorted list of integers
RandomInteger[1, 100000, 10000] // Sort // Short
I want to construct another list whose $m$-th element is the number of elements in the original list that are less than or equal to $m$:
Table[Length@Select[%, LessEqualThan[m]], m, 10000]
This is terribly inefficient, but for some reason I cannot come up with a better a approach. What's a better way to accomplish this? This seems to be a fairly standard exercise, so there should be plenty of duplicates, but I can find none.
I am probably missing a key word...
list-manipulation performance-tuning filtering
$endgroup$
add a comment |
$begingroup$
Say I have a sorted list of integers
RandomInteger[1, 100000, 10000] // Sort // Short
I want to construct another list whose $m$-th element is the number of elements in the original list that are less than or equal to $m$:
Table[Length@Select[%, LessEqualThan[m]], m, 10000]
This is terribly inefficient, but for some reason I cannot come up with a better a approach. What's a better way to accomplish this? This seems to be a fairly standard exercise, so there should be plenty of duplicates, but I can find none.
I am probably missing a key word...
list-manipulation performance-tuning filtering
$endgroup$
1
$begingroup$
What do you want to do with the table? What's your original problem? It sounds like you want to build some kind of CDF table manually. Maybe anEmpiricalDistribution
orBinCounts
already can accomplish what you want?
$endgroup$
– Thies Heidecke
Nov 13 '18 at 13:08
add a comment |
$begingroup$
Say I have a sorted list of integers
RandomInteger[1, 100000, 10000] // Sort // Short
I want to construct another list whose $m$-th element is the number of elements in the original list that are less than or equal to $m$:
Table[Length@Select[%, LessEqualThan[m]], m, 10000]
This is terribly inefficient, but for some reason I cannot come up with a better a approach. What's a better way to accomplish this? This seems to be a fairly standard exercise, so there should be plenty of duplicates, but I can find none.
I am probably missing a key word...
list-manipulation performance-tuning filtering
$endgroup$
Say I have a sorted list of integers
RandomInteger[1, 100000, 10000] // Sort // Short
I want to construct another list whose $m$-th element is the number of elements in the original list that are less than or equal to $m$:
Table[Length@Select[%, LessEqualThan[m]], m, 10000]
This is terribly inefficient, but for some reason I cannot come up with a better a approach. What's a better way to accomplish this? This seems to be a fairly standard exercise, so there should be plenty of duplicates, but I can find none.
I am probably missing a key word...
list-manipulation performance-tuning filtering
list-manipulation performance-tuning filtering
edited Nov 13 '18 at 12:30
Alexey Popkov
38.6k4108265
38.6k4108265
asked Nov 13 '18 at 1:00
AccidentalFourierTransformAccidentalFourierTransform
5,16311042
5,16311042
1
$begingroup$
What do you want to do with the table? What's your original problem? It sounds like you want to build some kind of CDF table manually. Maybe anEmpiricalDistribution
orBinCounts
already can accomplish what you want?
$endgroup$
– Thies Heidecke
Nov 13 '18 at 13:08
add a comment |
1
$begingroup$
What do you want to do with the table? What's your original problem? It sounds like you want to build some kind of CDF table manually. Maybe anEmpiricalDistribution
orBinCounts
already can accomplish what you want?
$endgroup$
– Thies Heidecke
Nov 13 '18 at 13:08
1
1
$begingroup$
What do you want to do with the table? What's your original problem? It sounds like you want to build some kind of CDF table manually. Maybe an
EmpiricalDistribution
or BinCounts
already can accomplish what you want?$endgroup$
– Thies Heidecke
Nov 13 '18 at 13:08
$begingroup$
What do you want to do with the table? What's your original problem? It sounds like you want to build some kind of CDF table manually. Maybe an
EmpiricalDistribution
or BinCounts
already can accomplish what you want?$endgroup$
– Thies Heidecke
Nov 13 '18 at 13:08
add a comment |
4 Answers
4
active
oldest
votes
$begingroup$
You can use the usual UnitStep
+ Total
tricks:
r1 = Table[Total[UnitStep[m-s]], m,10000]; //AbsoluteTiming
r2 = Table[Length@Select[s,LessEqualThan[m]],m,10000];//AbsoluteTiming
r1 === r2
0.435358, Null
41.4357, Null
True
Update
As @J42161217 points out, you can take advantage of the fact that the data is sorted to speed things up. He used Differences
. Here is a version that uses Nearest
instead:
mincounts[s_] := With[
unique = DeleteDuplicates@Nearest[s->"Element",s][[All,-1]],
counts = Prepend[0] @ DeleteDuplicates@Nearest[s->"Index",s][[All,-1]]
,
With[near = Nearest[unique->"Index", Range @ Length @ s][[All,1]],
counts[[1+near-UnitStep[unique[[near]]-Range@Length@s-1]]]
]
]
Comparison:
SeedRandom[1];
s=RandomInteger[1,100000,10000]//Sort;
(* my first answer *)
r1 = Table[Total[UnitStep[m-s]], m,10000]; //AbsoluteTiming
(* J42161217's answer *)
r2 = Flatten[
Join[
Table[0, s[[1]] - 1],
Table[Table[i, Differences[s][[i]]], i, Length[Select[s, # <= 10000 &]]]
]
][[;;10000]]; // AbsoluteTiming
(* using Nearest *)
r3 = mincounts[s]; //AbsoluteTiming
r1 === r2 === r3
0.432897, Null
0.122198, Null
0.025923, Null
True
$endgroup$
$begingroup$
Ah, great answer, as usual.
$endgroup$
– AccidentalFourierTransform
Nov 13 '18 at 1:19
$begingroup$
Can you please check my answer because my laptop is very slow
$endgroup$
– J42161217
Nov 13 '18 at 3:05
add a comment |
$begingroup$
BinCounts
and Accumulate
combination is faster than all the methods posted so far:
r4 = Accumulate @ BinCounts[s, 1, 1 + 10000, 1]; // RepeatedTiming // First
0.00069
versus Henrik Schumacher's mySparseArray
, Carl Woll's mincounts
and J42161217's Differences
-based method:
r5 = Accumulate[mySparseArray[Partition[s, 1] -> 1, s[[-1]], Total, 0][[
1 ;; Length[s]]]
]; // RepeatedTiming // First
0.00081
r3 = mincounts[s]; // RepeatedTiming // First
0.016
r2 = Flatten[Join[Table[0, s[[1]] - 1],
Table[Table[i, Differences[s][[i]]], i,
Length[Select[s, # <= 10000 &]]]]][[;; 10000]]; //
RepeatedTiming // First
0.149
r2 == r3 == r4 == r5
True
$endgroup$
$begingroup$
Beat me to it - BinCounts is the way... +1
$endgroup$
– ciao
Nov 13 '18 at 6:49
1
$begingroup$
Hey @ciao, you are back?!!
$endgroup$
– kglr
Nov 13 '18 at 6:53
$begingroup$
Sorry @Henrik; thanks for the edit.
$endgroup$
– kglr
Nov 13 '18 at 10:23
1
$begingroup$
Short and fast. No need sorting.. Excellent!!... +1
$endgroup$
– Okkes Dulgerci
Nov 13 '18 at 15:14
$begingroup$
@kglr - Been here all along, just busy herding a couple of startups, but I read several times a week, and usually learn something new each time.
$endgroup$
– ciao
Nov 13 '18 at 22:42
add a comment |
$begingroup$
I think this is at least x3 faster than Mr. Carl Woll's answer
Can anybody compare my timing?
r3 = Flatten[Join[Table[0, s[[1]] - 1],
Table[Table[i, Differences[s][[i]]], i,
Length[Select[s, # <= 10000 &]]]]][[;;10000]]; // AbsoluteTiming
0.157123, Null
Using MapThread the same code is way faster
r6 = Flatten[
Join[Table[0, s[[1]] - 1],
MapThread[
Table, Range[t = Length[Select[s, # <= 10000 &]]],
Differences[s][[1 ;; t]]]]][[;; 10000]]; // AbsoluteTiming
r6===r3
0.008387, Null
True
$endgroup$
1
$begingroup$
These are the timing on my laptop r1=0.444354, Null,r2=39.456, Null,r3=0.157123, Null True
$endgroup$
– Okkes Dulgerci
Nov 13 '18 at 3:55
$begingroup$
Hey, thanks for checking. I will use your timing. Thanks for the confirmation
$endgroup$
– J42161217
Nov 13 '18 at 4:01
add a comment |
$begingroup$
s = Sort[RandomInteger[1, 100000, 10000]];
Let us just imagine for the moment that the target list r
is supposed to have length 100000
(we can truncate it afterwards). Then for each entry i
in the list s
, the list r
needs to have a jump of height 1
at position i
. The jumps are the "derivative" of r
(in a discrete sense) and the antiderivative can be obtained with Accumulate
. In order to get the list of jumps, we need additive matrix assembly.
This can be done with this function:
mySparseArray[rules_, dims_, f_: Total, background_: 0.] :=
If[(Head[rules] === Rule) && (rules[[1]] === ),
rules[[2]],
With[spopt = SystemOptions["SparseArrayOptions"],
Internal`WithLocalSettings[
SetSystemOptions[
"SparseArrayOptions" -> "TreatRepeatedEntries" -> f],
SparseArray[rules, dims, background],
SetSystemOptions[spopt]]
]
]
So, in total, r
can be obtained as follows:
r4 = Accumulate[
mySparseArray[Partition[s, 1] -> 1, s[[-1]], Total, 0][[1 ;; Length[s]]]
]; // RepeatedTiming // First
0.00055
For comparison:
r3 = Flatten[
Join[Table[0, s[[1]] - 1],
Table[Table[i, Differences[s][[i]]], i,
Length[Select[s, # <= 10000 &]]]]][[;; 10000]]; //
RepeatedTiming // First
r3 == r4
0.116
True
$endgroup$
add a comment |
Your Answer
StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
);
);
, "mathjax-editing");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "387"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmathematica.stackexchange.com%2fquestions%2f185874%2fhow-to-construct-a-list-of-lengths-efficiently%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
4 Answers
4
active
oldest
votes
4 Answers
4
active
oldest
votes
active
oldest
votes
active
oldest
votes
$begingroup$
You can use the usual UnitStep
+ Total
tricks:
r1 = Table[Total[UnitStep[m-s]], m,10000]; //AbsoluteTiming
r2 = Table[Length@Select[s,LessEqualThan[m]],m,10000];//AbsoluteTiming
r1 === r2
0.435358, Null
41.4357, Null
True
Update
As @J42161217 points out, you can take advantage of the fact that the data is sorted to speed things up. He used Differences
. Here is a version that uses Nearest
instead:
mincounts[s_] := With[
unique = DeleteDuplicates@Nearest[s->"Element",s][[All,-1]],
counts = Prepend[0] @ DeleteDuplicates@Nearest[s->"Index",s][[All,-1]]
,
With[near = Nearest[unique->"Index", Range @ Length @ s][[All,1]],
counts[[1+near-UnitStep[unique[[near]]-Range@Length@s-1]]]
]
]
Comparison:
SeedRandom[1];
s=RandomInteger[1,100000,10000]//Sort;
(* my first answer *)
r1 = Table[Total[UnitStep[m-s]], m,10000]; //AbsoluteTiming
(* J42161217's answer *)
r2 = Flatten[
Join[
Table[0, s[[1]] - 1],
Table[Table[i, Differences[s][[i]]], i, Length[Select[s, # <= 10000 &]]]
]
][[;;10000]]; // AbsoluteTiming
(* using Nearest *)
r3 = mincounts[s]; //AbsoluteTiming
r1 === r2 === r3
0.432897, Null
0.122198, Null
0.025923, Null
True
$endgroup$
$begingroup$
Ah, great answer, as usual.
$endgroup$
– AccidentalFourierTransform
Nov 13 '18 at 1:19
$begingroup$
Can you please check my answer because my laptop is very slow
$endgroup$
– J42161217
Nov 13 '18 at 3:05
add a comment |
$begingroup$
You can use the usual UnitStep
+ Total
tricks:
r1 = Table[Total[UnitStep[m-s]], m,10000]; //AbsoluteTiming
r2 = Table[Length@Select[s,LessEqualThan[m]],m,10000];//AbsoluteTiming
r1 === r2
0.435358, Null
41.4357, Null
True
Update
As @J42161217 points out, you can take advantage of the fact that the data is sorted to speed things up. He used Differences
. Here is a version that uses Nearest
instead:
mincounts[s_] := With[
unique = DeleteDuplicates@Nearest[s->"Element",s][[All,-1]],
counts = Prepend[0] @ DeleteDuplicates@Nearest[s->"Index",s][[All,-1]]
,
With[near = Nearest[unique->"Index", Range @ Length @ s][[All,1]],
counts[[1+near-UnitStep[unique[[near]]-Range@Length@s-1]]]
]
]
Comparison:
SeedRandom[1];
s=RandomInteger[1,100000,10000]//Sort;
(* my first answer *)
r1 = Table[Total[UnitStep[m-s]], m,10000]; //AbsoluteTiming
(* J42161217's answer *)
r2 = Flatten[
Join[
Table[0, s[[1]] - 1],
Table[Table[i, Differences[s][[i]]], i, Length[Select[s, # <= 10000 &]]]
]
][[;;10000]]; // AbsoluteTiming
(* using Nearest *)
r3 = mincounts[s]; //AbsoluteTiming
r1 === r2 === r3
0.432897, Null
0.122198, Null
0.025923, Null
True
$endgroup$
$begingroup$
Ah, great answer, as usual.
$endgroup$
– AccidentalFourierTransform
Nov 13 '18 at 1:19
$begingroup$
Can you please check my answer because my laptop is very slow
$endgroup$
– J42161217
Nov 13 '18 at 3:05
add a comment |
$begingroup$
You can use the usual UnitStep
+ Total
tricks:
r1 = Table[Total[UnitStep[m-s]], m,10000]; //AbsoluteTiming
r2 = Table[Length@Select[s,LessEqualThan[m]],m,10000];//AbsoluteTiming
r1 === r2
0.435358, Null
41.4357, Null
True
Update
As @J42161217 points out, you can take advantage of the fact that the data is sorted to speed things up. He used Differences
. Here is a version that uses Nearest
instead:
mincounts[s_] := With[
unique = DeleteDuplicates@Nearest[s->"Element",s][[All,-1]],
counts = Prepend[0] @ DeleteDuplicates@Nearest[s->"Index",s][[All,-1]]
,
With[near = Nearest[unique->"Index", Range @ Length @ s][[All,1]],
counts[[1+near-UnitStep[unique[[near]]-Range@Length@s-1]]]
]
]
Comparison:
SeedRandom[1];
s=RandomInteger[1,100000,10000]//Sort;
(* my first answer *)
r1 = Table[Total[UnitStep[m-s]], m,10000]; //AbsoluteTiming
(* J42161217's answer *)
r2 = Flatten[
Join[
Table[0, s[[1]] - 1],
Table[Table[i, Differences[s][[i]]], i, Length[Select[s, # <= 10000 &]]]
]
][[;;10000]]; // AbsoluteTiming
(* using Nearest *)
r3 = mincounts[s]; //AbsoluteTiming
r1 === r2 === r3
0.432897, Null
0.122198, Null
0.025923, Null
True
$endgroup$
You can use the usual UnitStep
+ Total
tricks:
r1 = Table[Total[UnitStep[m-s]], m,10000]; //AbsoluteTiming
r2 = Table[Length@Select[s,LessEqualThan[m]],m,10000];//AbsoluteTiming
r1 === r2
0.435358, Null
41.4357, Null
True
Update
As @J42161217 points out, you can take advantage of the fact that the data is sorted to speed things up. He used Differences
. Here is a version that uses Nearest
instead:
mincounts[s_] := With[
unique = DeleteDuplicates@Nearest[s->"Element",s][[All,-1]],
counts = Prepend[0] @ DeleteDuplicates@Nearest[s->"Index",s][[All,-1]]
,
With[near = Nearest[unique->"Index", Range @ Length @ s][[All,1]],
counts[[1+near-UnitStep[unique[[near]]-Range@Length@s-1]]]
]
]
Comparison:
SeedRandom[1];
s=RandomInteger[1,100000,10000]//Sort;
(* my first answer *)
r1 = Table[Total[UnitStep[m-s]], m,10000]; //AbsoluteTiming
(* J42161217's answer *)
r2 = Flatten[
Join[
Table[0, s[[1]] - 1],
Table[Table[i, Differences[s][[i]]], i, Length[Select[s, # <= 10000 &]]]
]
][[;;10000]]; // AbsoluteTiming
(* using Nearest *)
r3 = mincounts[s]; //AbsoluteTiming
r1 === r2 === r3
0.432897, Null
0.122198, Null
0.025923, Null
True
edited Nov 13 '18 at 4:11
answered Nov 13 '18 at 1:11
Carl WollCarl Woll
70.5k394184
70.5k394184
$begingroup$
Ah, great answer, as usual.
$endgroup$
– AccidentalFourierTransform
Nov 13 '18 at 1:19
$begingroup$
Can you please check my answer because my laptop is very slow
$endgroup$
– J42161217
Nov 13 '18 at 3:05
add a comment |
$begingroup$
Ah, great answer, as usual.
$endgroup$
– AccidentalFourierTransform
Nov 13 '18 at 1:19
$begingroup$
Can you please check my answer because my laptop is very slow
$endgroup$
– J42161217
Nov 13 '18 at 3:05
$begingroup$
Ah, great answer, as usual.
$endgroup$
– AccidentalFourierTransform
Nov 13 '18 at 1:19
$begingroup$
Ah, great answer, as usual.
$endgroup$
– AccidentalFourierTransform
Nov 13 '18 at 1:19
$begingroup$
Can you please check my answer because my laptop is very slow
$endgroup$
– J42161217
Nov 13 '18 at 3:05
$begingroup$
Can you please check my answer because my laptop is very slow
$endgroup$
– J42161217
Nov 13 '18 at 3:05
add a comment |
$begingroup$
BinCounts
and Accumulate
combination is faster than all the methods posted so far:
r4 = Accumulate @ BinCounts[s, 1, 1 + 10000, 1]; // RepeatedTiming // First
0.00069
versus Henrik Schumacher's mySparseArray
, Carl Woll's mincounts
and J42161217's Differences
-based method:
r5 = Accumulate[mySparseArray[Partition[s, 1] -> 1, s[[-1]], Total, 0][[
1 ;; Length[s]]]
]; // RepeatedTiming // First
0.00081
r3 = mincounts[s]; // RepeatedTiming // First
0.016
r2 = Flatten[Join[Table[0, s[[1]] - 1],
Table[Table[i, Differences[s][[i]]], i,
Length[Select[s, # <= 10000 &]]]]][[;; 10000]]; //
RepeatedTiming // First
0.149
r2 == r3 == r4 == r5
True
$endgroup$
$begingroup$
Beat me to it - BinCounts is the way... +1
$endgroup$
– ciao
Nov 13 '18 at 6:49
1
$begingroup$
Hey @ciao, you are back?!!
$endgroup$
– kglr
Nov 13 '18 at 6:53
$begingroup$
Sorry @Henrik; thanks for the edit.
$endgroup$
– kglr
Nov 13 '18 at 10:23
1
$begingroup$
Short and fast. No need sorting.. Excellent!!... +1
$endgroup$
– Okkes Dulgerci
Nov 13 '18 at 15:14
$begingroup$
@kglr - Been here all along, just busy herding a couple of startups, but I read several times a week, and usually learn something new each time.
$endgroup$
– ciao
Nov 13 '18 at 22:42
add a comment |
$begingroup$
BinCounts
and Accumulate
combination is faster than all the methods posted so far:
r4 = Accumulate @ BinCounts[s, 1, 1 + 10000, 1]; // RepeatedTiming // First
0.00069
versus Henrik Schumacher's mySparseArray
, Carl Woll's mincounts
and J42161217's Differences
-based method:
r5 = Accumulate[mySparseArray[Partition[s, 1] -> 1, s[[-1]], Total, 0][[
1 ;; Length[s]]]
]; // RepeatedTiming // First
0.00081
r3 = mincounts[s]; // RepeatedTiming // First
0.016
r2 = Flatten[Join[Table[0, s[[1]] - 1],
Table[Table[i, Differences[s][[i]]], i,
Length[Select[s, # <= 10000 &]]]]][[;; 10000]]; //
RepeatedTiming // First
0.149
r2 == r3 == r4 == r5
True
$endgroup$
$begingroup$
Beat me to it - BinCounts is the way... +1
$endgroup$
– ciao
Nov 13 '18 at 6:49
1
$begingroup$
Hey @ciao, you are back?!!
$endgroup$
– kglr
Nov 13 '18 at 6:53
$begingroup$
Sorry @Henrik; thanks for the edit.
$endgroup$
– kglr
Nov 13 '18 at 10:23
1
$begingroup$
Short and fast. No need sorting.. Excellent!!... +1
$endgroup$
– Okkes Dulgerci
Nov 13 '18 at 15:14
$begingroup$
@kglr - Been here all along, just busy herding a couple of startups, but I read several times a week, and usually learn something new each time.
$endgroup$
– ciao
Nov 13 '18 at 22:42
add a comment |
$begingroup$
BinCounts
and Accumulate
combination is faster than all the methods posted so far:
r4 = Accumulate @ BinCounts[s, 1, 1 + 10000, 1]; // RepeatedTiming // First
0.00069
versus Henrik Schumacher's mySparseArray
, Carl Woll's mincounts
and J42161217's Differences
-based method:
r5 = Accumulate[mySparseArray[Partition[s, 1] -> 1, s[[-1]], Total, 0][[
1 ;; Length[s]]]
]; // RepeatedTiming // First
0.00081
r3 = mincounts[s]; // RepeatedTiming // First
0.016
r2 = Flatten[Join[Table[0, s[[1]] - 1],
Table[Table[i, Differences[s][[i]]], i,
Length[Select[s, # <= 10000 &]]]]][[;; 10000]]; //
RepeatedTiming // First
0.149
r2 == r3 == r4 == r5
True
$endgroup$
BinCounts
and Accumulate
combination is faster than all the methods posted so far:
r4 = Accumulate @ BinCounts[s, 1, 1 + 10000, 1]; // RepeatedTiming // First
0.00069
versus Henrik Schumacher's mySparseArray
, Carl Woll's mincounts
and J42161217's Differences
-based method:
r5 = Accumulate[mySparseArray[Partition[s, 1] -> 1, s[[-1]], Total, 0][[
1 ;; Length[s]]]
]; // RepeatedTiming // First
0.00081
r3 = mincounts[s]; // RepeatedTiming // First
0.016
r2 = Flatten[Join[Table[0, s[[1]] - 1],
Table[Table[i, Differences[s][[i]]], i,
Length[Select[s, # <= 10000 &]]]]][[;; 10000]]; //
RepeatedTiming // First
0.149
r2 == r3 == r4 == r5
True
edited Nov 13 '18 at 10:10
Henrik Schumacher
56.7k577157
56.7k577157
answered Nov 13 '18 at 6:31
kglrkglr
189k10205423
189k10205423
$begingroup$
Beat me to it - BinCounts is the way... +1
$endgroup$
– ciao
Nov 13 '18 at 6:49
1
$begingroup$
Hey @ciao, you are back?!!
$endgroup$
– kglr
Nov 13 '18 at 6:53
$begingroup$
Sorry @Henrik; thanks for the edit.
$endgroup$
– kglr
Nov 13 '18 at 10:23
1
$begingroup$
Short and fast. No need sorting.. Excellent!!... +1
$endgroup$
– Okkes Dulgerci
Nov 13 '18 at 15:14
$begingroup$
@kglr - Been here all along, just busy herding a couple of startups, but I read several times a week, and usually learn something new each time.
$endgroup$
– ciao
Nov 13 '18 at 22:42
add a comment |
$begingroup$
Beat me to it - BinCounts is the way... +1
$endgroup$
– ciao
Nov 13 '18 at 6:49
1
$begingroup$
Hey @ciao, you are back?!!
$endgroup$
– kglr
Nov 13 '18 at 6:53
$begingroup$
Sorry @Henrik; thanks for the edit.
$endgroup$
– kglr
Nov 13 '18 at 10:23
1
$begingroup$
Short and fast. No need sorting.. Excellent!!... +1
$endgroup$
– Okkes Dulgerci
Nov 13 '18 at 15:14
$begingroup$
@kglr - Been here all along, just busy herding a couple of startups, but I read several times a week, and usually learn something new each time.
$endgroup$
– ciao
Nov 13 '18 at 22:42
$begingroup$
Beat me to it - BinCounts is the way... +1
$endgroup$
– ciao
Nov 13 '18 at 6:49
$begingroup$
Beat me to it - BinCounts is the way... +1
$endgroup$
– ciao
Nov 13 '18 at 6:49
1
1
$begingroup$
Hey @ciao, you are back?!!
$endgroup$
– kglr
Nov 13 '18 at 6:53
$begingroup$
Hey @ciao, you are back?!!
$endgroup$
– kglr
Nov 13 '18 at 6:53
$begingroup$
Sorry @Henrik; thanks for the edit.
$endgroup$
– kglr
Nov 13 '18 at 10:23
$begingroup$
Sorry @Henrik; thanks for the edit.
$endgroup$
– kglr
Nov 13 '18 at 10:23
1
1
$begingroup$
Short and fast. No need sorting.. Excellent!!... +1
$endgroup$
– Okkes Dulgerci
Nov 13 '18 at 15:14
$begingroup$
Short and fast. No need sorting.. Excellent!!... +1
$endgroup$
– Okkes Dulgerci
Nov 13 '18 at 15:14
$begingroup$
@kglr - Been here all along, just busy herding a couple of startups, but I read several times a week, and usually learn something new each time.
$endgroup$
– ciao
Nov 13 '18 at 22:42
$begingroup$
@kglr - Been here all along, just busy herding a couple of startups, but I read several times a week, and usually learn something new each time.
$endgroup$
– ciao
Nov 13 '18 at 22:42
add a comment |
$begingroup$
I think this is at least x3 faster than Mr. Carl Woll's answer
Can anybody compare my timing?
r3 = Flatten[Join[Table[0, s[[1]] - 1],
Table[Table[i, Differences[s][[i]]], i,
Length[Select[s, # <= 10000 &]]]]][[;;10000]]; // AbsoluteTiming
0.157123, Null
Using MapThread the same code is way faster
r6 = Flatten[
Join[Table[0, s[[1]] - 1],
MapThread[
Table, Range[t = Length[Select[s, # <= 10000 &]]],
Differences[s][[1 ;; t]]]]][[;; 10000]]; // AbsoluteTiming
r6===r3
0.008387, Null
True
$endgroup$
1
$begingroup$
These are the timing on my laptop r1=0.444354, Null,r2=39.456, Null,r3=0.157123, Null True
$endgroup$
– Okkes Dulgerci
Nov 13 '18 at 3:55
$begingroup$
Hey, thanks for checking. I will use your timing. Thanks for the confirmation
$endgroup$
– J42161217
Nov 13 '18 at 4:01
add a comment |
$begingroup$
I think this is at least x3 faster than Mr. Carl Woll's answer
Can anybody compare my timing?
r3 = Flatten[Join[Table[0, s[[1]] - 1],
Table[Table[i, Differences[s][[i]]], i,
Length[Select[s, # <= 10000 &]]]]][[;;10000]]; // AbsoluteTiming
0.157123, Null
Using MapThread the same code is way faster
r6 = Flatten[
Join[Table[0, s[[1]] - 1],
MapThread[
Table, Range[t = Length[Select[s, # <= 10000 &]]],
Differences[s][[1 ;; t]]]]][[;; 10000]]; // AbsoluteTiming
r6===r3
0.008387, Null
True
$endgroup$
1
$begingroup$
These are the timing on my laptop r1=0.444354, Null,r2=39.456, Null,r3=0.157123, Null True
$endgroup$
– Okkes Dulgerci
Nov 13 '18 at 3:55
$begingroup$
Hey, thanks for checking. I will use your timing. Thanks for the confirmation
$endgroup$
– J42161217
Nov 13 '18 at 4:01
add a comment |
$begingroup$
I think this is at least x3 faster than Mr. Carl Woll's answer
Can anybody compare my timing?
r3 = Flatten[Join[Table[0, s[[1]] - 1],
Table[Table[i, Differences[s][[i]]], i,
Length[Select[s, # <= 10000 &]]]]][[;;10000]]; // AbsoluteTiming
0.157123, Null
Using MapThread the same code is way faster
r6 = Flatten[
Join[Table[0, s[[1]] - 1],
MapThread[
Table, Range[t = Length[Select[s, # <= 10000 &]]],
Differences[s][[1 ;; t]]]]][[;; 10000]]; // AbsoluteTiming
r6===r3
0.008387, Null
True
$endgroup$
I think this is at least x3 faster than Mr. Carl Woll's answer
Can anybody compare my timing?
r3 = Flatten[Join[Table[0, s[[1]] - 1],
Table[Table[i, Differences[s][[i]]], i,
Length[Select[s, # <= 10000 &]]]]][[;;10000]]; // AbsoluteTiming
0.157123, Null
Using MapThread the same code is way faster
r6 = Flatten[
Join[Table[0, s[[1]] - 1],
MapThread[
Table, Range[t = Length[Select[s, # <= 10000 &]]],
Differences[s][[1 ;; t]]]]][[;; 10000]]; // AbsoluteTiming
r6===r3
0.008387, Null
True
edited Nov 13 '18 at 12:57
answered Nov 13 '18 at 3:04
J42161217J42161217
3,968323
3,968323
1
$begingroup$
These are the timing on my laptop r1=0.444354, Null,r2=39.456, Null,r3=0.157123, Null True
$endgroup$
– Okkes Dulgerci
Nov 13 '18 at 3:55
$begingroup$
Hey, thanks for checking. I will use your timing. Thanks for the confirmation
$endgroup$
– J42161217
Nov 13 '18 at 4:01
add a comment |
1
$begingroup$
These are the timing on my laptop r1=0.444354, Null,r2=39.456, Null,r3=0.157123, Null True
$endgroup$
– Okkes Dulgerci
Nov 13 '18 at 3:55
$begingroup$
Hey, thanks for checking. I will use your timing. Thanks for the confirmation
$endgroup$
– J42161217
Nov 13 '18 at 4:01
1
1
$begingroup$
These are the timing on my laptop r1=0.444354, Null,r2=39.456, Null,r3=0.157123, Null True
$endgroup$
– Okkes Dulgerci
Nov 13 '18 at 3:55
$begingroup$
These are the timing on my laptop r1=0.444354, Null,r2=39.456, Null,r3=0.157123, Null True
$endgroup$
– Okkes Dulgerci
Nov 13 '18 at 3:55
$begingroup$
Hey, thanks for checking. I will use your timing. Thanks for the confirmation
$endgroup$
– J42161217
Nov 13 '18 at 4:01
$begingroup$
Hey, thanks for checking. I will use your timing. Thanks for the confirmation
$endgroup$
– J42161217
Nov 13 '18 at 4:01
add a comment |
$begingroup$
s = Sort[RandomInteger[1, 100000, 10000]];
Let us just imagine for the moment that the target list r
is supposed to have length 100000
(we can truncate it afterwards). Then for each entry i
in the list s
, the list r
needs to have a jump of height 1
at position i
. The jumps are the "derivative" of r
(in a discrete sense) and the antiderivative can be obtained with Accumulate
. In order to get the list of jumps, we need additive matrix assembly.
This can be done with this function:
mySparseArray[rules_, dims_, f_: Total, background_: 0.] :=
If[(Head[rules] === Rule) && (rules[[1]] === ),
rules[[2]],
With[spopt = SystemOptions["SparseArrayOptions"],
Internal`WithLocalSettings[
SetSystemOptions[
"SparseArrayOptions" -> "TreatRepeatedEntries" -> f],
SparseArray[rules, dims, background],
SetSystemOptions[spopt]]
]
]
So, in total, r
can be obtained as follows:
r4 = Accumulate[
mySparseArray[Partition[s, 1] -> 1, s[[-1]], Total, 0][[1 ;; Length[s]]]
]; // RepeatedTiming // First
0.00055
For comparison:
r3 = Flatten[
Join[Table[0, s[[1]] - 1],
Table[Table[i, Differences[s][[i]]], i,
Length[Select[s, # <= 10000 &]]]]][[;; 10000]]; //
RepeatedTiming // First
r3 == r4
0.116
True
$endgroup$
add a comment |
$begingroup$
s = Sort[RandomInteger[1, 100000, 10000]];
Let us just imagine for the moment that the target list r
is supposed to have length 100000
(we can truncate it afterwards). Then for each entry i
in the list s
, the list r
needs to have a jump of height 1
at position i
. The jumps are the "derivative" of r
(in a discrete sense) and the antiderivative can be obtained with Accumulate
. In order to get the list of jumps, we need additive matrix assembly.
This can be done with this function:
mySparseArray[rules_, dims_, f_: Total, background_: 0.] :=
If[(Head[rules] === Rule) && (rules[[1]] === ),
rules[[2]],
With[spopt = SystemOptions["SparseArrayOptions"],
Internal`WithLocalSettings[
SetSystemOptions[
"SparseArrayOptions" -> "TreatRepeatedEntries" -> f],
SparseArray[rules, dims, background],
SetSystemOptions[spopt]]
]
]
So, in total, r
can be obtained as follows:
r4 = Accumulate[
mySparseArray[Partition[s, 1] -> 1, s[[-1]], Total, 0][[1 ;; Length[s]]]
]; // RepeatedTiming // First
0.00055
For comparison:
r3 = Flatten[
Join[Table[0, s[[1]] - 1],
Table[Table[i, Differences[s][[i]]], i,
Length[Select[s, # <= 10000 &]]]]][[;; 10000]]; //
RepeatedTiming // First
r3 == r4
0.116
True
$endgroup$
add a comment |
$begingroup$
s = Sort[RandomInteger[1, 100000, 10000]];
Let us just imagine for the moment that the target list r
is supposed to have length 100000
(we can truncate it afterwards). Then for each entry i
in the list s
, the list r
needs to have a jump of height 1
at position i
. The jumps are the "derivative" of r
(in a discrete sense) and the antiderivative can be obtained with Accumulate
. In order to get the list of jumps, we need additive matrix assembly.
This can be done with this function:
mySparseArray[rules_, dims_, f_: Total, background_: 0.] :=
If[(Head[rules] === Rule) && (rules[[1]] === ),
rules[[2]],
With[spopt = SystemOptions["SparseArrayOptions"],
Internal`WithLocalSettings[
SetSystemOptions[
"SparseArrayOptions" -> "TreatRepeatedEntries" -> f],
SparseArray[rules, dims, background],
SetSystemOptions[spopt]]
]
]
So, in total, r
can be obtained as follows:
r4 = Accumulate[
mySparseArray[Partition[s, 1] -> 1, s[[-1]], Total, 0][[1 ;; Length[s]]]
]; // RepeatedTiming // First
0.00055
For comparison:
r3 = Flatten[
Join[Table[0, s[[1]] - 1],
Table[Table[i, Differences[s][[i]]], i,
Length[Select[s, # <= 10000 &]]]]][[;; 10000]]; //
RepeatedTiming // First
r3 == r4
0.116
True
$endgroup$
s = Sort[RandomInteger[1, 100000, 10000]];
Let us just imagine for the moment that the target list r
is supposed to have length 100000
(we can truncate it afterwards). Then for each entry i
in the list s
, the list r
needs to have a jump of height 1
at position i
. The jumps are the "derivative" of r
(in a discrete sense) and the antiderivative can be obtained with Accumulate
. In order to get the list of jumps, we need additive matrix assembly.
This can be done with this function:
mySparseArray[rules_, dims_, f_: Total, background_: 0.] :=
If[(Head[rules] === Rule) && (rules[[1]] === ),
rules[[2]],
With[spopt = SystemOptions["SparseArrayOptions"],
Internal`WithLocalSettings[
SetSystemOptions[
"SparseArrayOptions" -> "TreatRepeatedEntries" -> f],
SparseArray[rules, dims, background],
SetSystemOptions[spopt]]
]
]
So, in total, r
can be obtained as follows:
r4 = Accumulate[
mySparseArray[Partition[s, 1] -> 1, s[[-1]], Total, 0][[1 ;; Length[s]]]
]; // RepeatedTiming // First
0.00055
For comparison:
r3 = Flatten[
Join[Table[0, s[[1]] - 1],
Table[Table[i, Differences[s][[i]]], i,
Length[Select[s, # <= 10000 &]]]]][[;; 10000]]; //
RepeatedTiming // First
r3 == r4
0.116
True
edited Nov 13 '18 at 23:11
answered Nov 13 '18 at 6:32
Henrik SchumacherHenrik Schumacher
56.7k577157
56.7k577157
add a comment |
add a comment |
Thanks for contributing an answer to Mathematica Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmathematica.stackexchange.com%2fquestions%2f185874%2fhow-to-construct-a-list-of-lengths-efficiently%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
1
$begingroup$
What do you want to do with the table? What's your original problem? It sounds like you want to build some kind of CDF table manually. Maybe an
EmpiricalDistribution
orBinCounts
already can accomplish what you want?$endgroup$
– Thies Heidecke
Nov 13 '18 at 13:08