How to construct a list of lengths efficiently










19












$begingroup$


Say I have a sorted list of integers



RandomInteger[1, 100000, 10000] // Sort // Short


I want to construct another list whose $m$-th element is the number of elements in the original list that are less than or equal to $m$:



Table[Length@Select[%, LessEqualThan[m]], m, 10000]


This is terribly inefficient, but for some reason I cannot come up with a better a approach. What's a better way to accomplish this? This seems to be a fairly standard exercise, so there should be plenty of duplicates, but I can find none.
I am probably missing a key word...










share|improve this question











$endgroup$







  • 1




    $begingroup$
    What do you want to do with the table? What's your original problem? It sounds like you want to build some kind of CDF table manually. Maybe an EmpiricalDistribution or BinCounts already can accomplish what you want?
    $endgroup$
    – Thies Heidecke
    Nov 13 '18 at 13:08















19












$begingroup$


Say I have a sorted list of integers



RandomInteger[1, 100000, 10000] // Sort // Short


I want to construct another list whose $m$-th element is the number of elements in the original list that are less than or equal to $m$:



Table[Length@Select[%, LessEqualThan[m]], m, 10000]


This is terribly inefficient, but for some reason I cannot come up with a better a approach. What's a better way to accomplish this? This seems to be a fairly standard exercise, so there should be plenty of duplicates, but I can find none.
I am probably missing a key word...










share|improve this question











$endgroup$







  • 1




    $begingroup$
    What do you want to do with the table? What's your original problem? It sounds like you want to build some kind of CDF table manually. Maybe an EmpiricalDistribution or BinCounts already can accomplish what you want?
    $endgroup$
    – Thies Heidecke
    Nov 13 '18 at 13:08













19












19








19


7



$begingroup$


Say I have a sorted list of integers



RandomInteger[1, 100000, 10000] // Sort // Short


I want to construct another list whose $m$-th element is the number of elements in the original list that are less than or equal to $m$:



Table[Length@Select[%, LessEqualThan[m]], m, 10000]


This is terribly inefficient, but for some reason I cannot come up with a better a approach. What's a better way to accomplish this? This seems to be a fairly standard exercise, so there should be plenty of duplicates, but I can find none.
I am probably missing a key word...










share|improve this question











$endgroup$




Say I have a sorted list of integers



RandomInteger[1, 100000, 10000] // Sort // Short


I want to construct another list whose $m$-th element is the number of elements in the original list that are less than or equal to $m$:



Table[Length@Select[%, LessEqualThan[m]], m, 10000]


This is terribly inefficient, but for some reason I cannot come up with a better a approach. What's a better way to accomplish this? This seems to be a fairly standard exercise, so there should be plenty of duplicates, but I can find none.
I am probably missing a key word...







list-manipulation performance-tuning filtering






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 13 '18 at 12:30









Alexey Popkov

38.6k4108265




38.6k4108265










asked Nov 13 '18 at 1:00









AccidentalFourierTransformAccidentalFourierTransform

5,16311042




5,16311042







  • 1




    $begingroup$
    What do you want to do with the table? What's your original problem? It sounds like you want to build some kind of CDF table manually. Maybe an EmpiricalDistribution or BinCounts already can accomplish what you want?
    $endgroup$
    – Thies Heidecke
    Nov 13 '18 at 13:08












  • 1




    $begingroup$
    What do you want to do with the table? What's your original problem? It sounds like you want to build some kind of CDF table manually. Maybe an EmpiricalDistribution or BinCounts already can accomplish what you want?
    $endgroup$
    – Thies Heidecke
    Nov 13 '18 at 13:08







1




1




$begingroup$
What do you want to do with the table? What's your original problem? It sounds like you want to build some kind of CDF table manually. Maybe an EmpiricalDistribution or BinCounts already can accomplish what you want?
$endgroup$
– Thies Heidecke
Nov 13 '18 at 13:08




$begingroup$
What do you want to do with the table? What's your original problem? It sounds like you want to build some kind of CDF table manually. Maybe an EmpiricalDistribution or BinCounts already can accomplish what you want?
$endgroup$
– Thies Heidecke
Nov 13 '18 at 13:08










4 Answers
4






active

oldest

votes


















22












$begingroup$

You can use the usual UnitStep + Total tricks:



r1 = Table[Total[UnitStep[m-s]], m,10000]; //AbsoluteTiming

r2 = Table[Length@Select[s,LessEqualThan[m]],m,10000];//AbsoluteTiming

r1 === r2



0.435358, Null



41.4357, Null



True




Update



As @J42161217 points out, you can take advantage of the fact that the data is sorted to speed things up. He used Differences. Here is a version that uses Nearest instead:



mincounts[s_] := With[

unique = DeleteDuplicates@Nearest[s->"Element",s][[All,-1]],
counts = Prepend[0] @ DeleteDuplicates@Nearest[s->"Index",s][[All,-1]]
,
With[near = Nearest[unique->"Index", Range @ Length @ s][[All,1]],
counts[[1+near-UnitStep[unique[[near]]-Range@Length@s-1]]]
]
]


Comparison:



SeedRandom[1];
s=RandomInteger[1,100000,10000]//Sort;

(* my first answer *)
r1 = Table[Total[UnitStep[m-s]], m,10000]; //AbsoluteTiming
(* J42161217's answer *)
r2 = Flatten[
Join[
Table[0, s[[1]] - 1],
Table[Table[i, Differences[s][[i]]], i, Length[Select[s, # <= 10000 &]]]
]
][[;;10000]]; // AbsoluteTiming
(* using Nearest *)
r3 = mincounts[s]; //AbsoluteTiming

r1 === r2 === r3



0.432897, Null



0.122198, Null



0.025923, Null



True







share|improve this answer











$endgroup$












  • $begingroup$
    Ah, great answer, as usual.
    $endgroup$
    – AccidentalFourierTransform
    Nov 13 '18 at 1:19










  • $begingroup$
    Can you please check my answer because my laptop is very slow
    $endgroup$
    – J42161217
    Nov 13 '18 at 3:05


















15












$begingroup$

BinCounts and Accumulate combination is faster than all the methods posted so far:



r4 = Accumulate @ BinCounts[s, 1, 1 + 10000, 1]; // RepeatedTiming // First 



0.00069




versus Henrik Schumacher's mySparseArray, Carl Woll's mincounts and J42161217's Differences-based method:



r5 = Accumulate[mySparseArray[Partition[s, 1] -> 1, s[[-1]], Total, 0][[
1 ;; Length[s]]]
]; // RepeatedTiming // First



 0.00081




r3 = mincounts[s]; // RepeatedTiming // First



0.016




r2 = Flatten[Join[Table[0, s[[1]] - 1], 
Table[Table[i, Differences[s][[i]]], i,
    Length[Select[s, # <= 10000 &]]]]][[;; 10000]]; //
RepeatedTiming // First



0.149




r2 == r3 == r4 == r5



True







share|improve this answer











$endgroup$












  • $begingroup$
    Beat me to it - BinCounts is the way... +1
    $endgroup$
    – ciao
    Nov 13 '18 at 6:49






  • 1




    $begingroup$
    Hey @ciao, you are back?!!
    $endgroup$
    – kglr
    Nov 13 '18 at 6:53










  • $begingroup$
    Sorry @Henrik; thanks for the edit.
    $endgroup$
    – kglr
    Nov 13 '18 at 10:23






  • 1




    $begingroup$
    Short and fast. No need sorting.. Excellent!!... +1
    $endgroup$
    – Okkes Dulgerci
    Nov 13 '18 at 15:14










  • $begingroup$
    @kglr - Been here all along, just busy herding a couple of startups, but I read several times a week, and usually learn something new each time.
    $endgroup$
    – ciao
    Nov 13 '18 at 22:42


















12












$begingroup$

I think this is at least x3 faster than Mr. Carl Woll's answer

Can anybody compare my timing?



r3 = Flatten[Join[Table[0, s[[1]] - 1], 
Table[Table[i, Differences[s][[i]]], i,
Length[Select[s, # <= 10000 &]]]]][[;;10000]]; // AbsoluteTiming



0.157123, Null




Using MapThread the same code is way faster



r6 = Flatten[
Join[Table[0, s[[1]] - 1],
MapThread[
Table, Range[t = Length[Select[s, # <= 10000 &]]],
Differences[s][[1 ;; t]]]]][[;; 10000]]; // AbsoluteTiming

r6===r3



0.008387, Null



True







share|improve this answer











$endgroup$








  • 1




    $begingroup$
    These are the timing on my laptop r1=0.444354, Null,r2=39.456, Null,r3=0.157123, Null True
    $endgroup$
    – Okkes Dulgerci
    Nov 13 '18 at 3:55











  • $begingroup$
    Hey, thanks for checking. I will use your timing. Thanks for the confirmation
    $endgroup$
    – J42161217
    Nov 13 '18 at 4:01


















6












$begingroup$

s = Sort[RandomInteger[1, 100000, 10000]];


Let us just imagine for the moment that the target list r is supposed to have length 100000 (we can truncate it afterwards). Then for each entry i in the list s, the list r needs to have a jump of height 1 at position i. The jumps are the "derivative" of r (in a discrete sense) and the antiderivative can be obtained with Accumulate. In order to get the list of jumps, we need additive matrix assembly.



This can be done with this function:



mySparseArray[rules_, dims_, f_: Total, background_: 0.] := 
If[(Head[rules] === Rule) && (rules[[1]] === ),
rules[[2]],
With[spopt = SystemOptions["SparseArrayOptions"],
Internal`WithLocalSettings[
SetSystemOptions[
"SparseArrayOptions" -> "TreatRepeatedEntries" -> f],
SparseArray[rules, dims, background],
SetSystemOptions[spopt]]
]
]


So, in total, r can be obtained as follows:



r4 = Accumulate[
mySparseArray[Partition[s, 1] -> 1, s[[-1]], Total, 0][[1 ;; Length[s]]]
]; // RepeatedTiming // First



0.00055




For comparison:



r3 = Flatten[
Join[Table[0, s[[1]] - 1],
Table[Table[i, Differences[s][[i]]], i,
Length[Select[s, # <= 10000 &]]]]][[;; 10000]]; //
RepeatedTiming // First
r3 == r4



0.116



True







share|improve this answer











$endgroup$












    Your Answer





    StackExchange.ifUsing("editor", function ()
    return StackExchange.using("mathjaxEditing", function ()
    StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
    StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
    );
    );
    , "mathjax-editing");

    StackExchange.ready(function()
    var channelOptions =
    tags: "".split(" "),
    id: "387"
    ;
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function()
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled)
    StackExchange.using("snippets", function()
    createEditor();
    );

    else
    createEditor();

    );

    function createEditor()
    StackExchange.prepareEditor(
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader:
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    ,
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    );



    );













    draft saved

    draft discarded


















    StackExchange.ready(
    function ()
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmathematica.stackexchange.com%2fquestions%2f185874%2fhow-to-construct-a-list-of-lengths-efficiently%23new-answer', 'question_page');

    );

    Post as a guest















    Required, but never shown

























    4 Answers
    4






    active

    oldest

    votes








    4 Answers
    4






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    22












    $begingroup$

    You can use the usual UnitStep + Total tricks:



    r1 = Table[Total[UnitStep[m-s]], m,10000]; //AbsoluteTiming

    r2 = Table[Length@Select[s,LessEqualThan[m]],m,10000];//AbsoluteTiming

    r1 === r2



    0.435358, Null



    41.4357, Null



    True




    Update



    As @J42161217 points out, you can take advantage of the fact that the data is sorted to speed things up. He used Differences. Here is a version that uses Nearest instead:



    mincounts[s_] := With[

    unique = DeleteDuplicates@Nearest[s->"Element",s][[All,-1]],
    counts = Prepend[0] @ DeleteDuplicates@Nearest[s->"Index",s][[All,-1]]
    ,
    With[near = Nearest[unique->"Index", Range @ Length @ s][[All,1]],
    counts[[1+near-UnitStep[unique[[near]]-Range@Length@s-1]]]
    ]
    ]


    Comparison:



    SeedRandom[1];
    s=RandomInteger[1,100000,10000]//Sort;

    (* my first answer *)
    r1 = Table[Total[UnitStep[m-s]], m,10000]; //AbsoluteTiming
    (* J42161217's answer *)
    r2 = Flatten[
    Join[
    Table[0, s[[1]] - 1],
    Table[Table[i, Differences[s][[i]]], i, Length[Select[s, # <= 10000 &]]]
    ]
    ][[;;10000]]; // AbsoluteTiming
    (* using Nearest *)
    r3 = mincounts[s]; //AbsoluteTiming

    r1 === r2 === r3



    0.432897, Null



    0.122198, Null



    0.025923, Null



    True







    share|improve this answer











    $endgroup$












    • $begingroup$
      Ah, great answer, as usual.
      $endgroup$
      – AccidentalFourierTransform
      Nov 13 '18 at 1:19










    • $begingroup$
      Can you please check my answer because my laptop is very slow
      $endgroup$
      – J42161217
      Nov 13 '18 at 3:05















    22












    $begingroup$

    You can use the usual UnitStep + Total tricks:



    r1 = Table[Total[UnitStep[m-s]], m,10000]; //AbsoluteTiming

    r2 = Table[Length@Select[s,LessEqualThan[m]],m,10000];//AbsoluteTiming

    r1 === r2



    0.435358, Null



    41.4357, Null



    True




    Update



    As @J42161217 points out, you can take advantage of the fact that the data is sorted to speed things up. He used Differences. Here is a version that uses Nearest instead:



    mincounts[s_] := With[

    unique = DeleteDuplicates@Nearest[s->"Element",s][[All,-1]],
    counts = Prepend[0] @ DeleteDuplicates@Nearest[s->"Index",s][[All,-1]]
    ,
    With[near = Nearest[unique->"Index", Range @ Length @ s][[All,1]],
    counts[[1+near-UnitStep[unique[[near]]-Range@Length@s-1]]]
    ]
    ]


    Comparison:



    SeedRandom[1];
    s=RandomInteger[1,100000,10000]//Sort;

    (* my first answer *)
    r1 = Table[Total[UnitStep[m-s]], m,10000]; //AbsoluteTiming
    (* J42161217's answer *)
    r2 = Flatten[
    Join[
    Table[0, s[[1]] - 1],
    Table[Table[i, Differences[s][[i]]], i, Length[Select[s, # <= 10000 &]]]
    ]
    ][[;;10000]]; // AbsoluteTiming
    (* using Nearest *)
    r3 = mincounts[s]; //AbsoluteTiming

    r1 === r2 === r3



    0.432897, Null



    0.122198, Null



    0.025923, Null



    True







    share|improve this answer











    $endgroup$












    • $begingroup$
      Ah, great answer, as usual.
      $endgroup$
      – AccidentalFourierTransform
      Nov 13 '18 at 1:19










    • $begingroup$
      Can you please check my answer because my laptop is very slow
      $endgroup$
      – J42161217
      Nov 13 '18 at 3:05













    22












    22








    22





    $begingroup$

    You can use the usual UnitStep + Total tricks:



    r1 = Table[Total[UnitStep[m-s]], m,10000]; //AbsoluteTiming

    r2 = Table[Length@Select[s,LessEqualThan[m]],m,10000];//AbsoluteTiming

    r1 === r2



    0.435358, Null



    41.4357, Null



    True




    Update



    As @J42161217 points out, you can take advantage of the fact that the data is sorted to speed things up. He used Differences. Here is a version that uses Nearest instead:



    mincounts[s_] := With[

    unique = DeleteDuplicates@Nearest[s->"Element",s][[All,-1]],
    counts = Prepend[0] @ DeleteDuplicates@Nearest[s->"Index",s][[All,-1]]
    ,
    With[near = Nearest[unique->"Index", Range @ Length @ s][[All,1]],
    counts[[1+near-UnitStep[unique[[near]]-Range@Length@s-1]]]
    ]
    ]


    Comparison:



    SeedRandom[1];
    s=RandomInteger[1,100000,10000]//Sort;

    (* my first answer *)
    r1 = Table[Total[UnitStep[m-s]], m,10000]; //AbsoluteTiming
    (* J42161217's answer *)
    r2 = Flatten[
    Join[
    Table[0, s[[1]] - 1],
    Table[Table[i, Differences[s][[i]]], i, Length[Select[s, # <= 10000 &]]]
    ]
    ][[;;10000]]; // AbsoluteTiming
    (* using Nearest *)
    r3 = mincounts[s]; //AbsoluteTiming

    r1 === r2 === r3



    0.432897, Null



    0.122198, Null



    0.025923, Null



    True







    share|improve this answer











    $endgroup$



    You can use the usual UnitStep + Total tricks:



    r1 = Table[Total[UnitStep[m-s]], m,10000]; //AbsoluteTiming

    r2 = Table[Length@Select[s,LessEqualThan[m]],m,10000];//AbsoluteTiming

    r1 === r2



    0.435358, Null



    41.4357, Null



    True




    Update



    As @J42161217 points out, you can take advantage of the fact that the data is sorted to speed things up. He used Differences. Here is a version that uses Nearest instead:



    mincounts[s_] := With[

    unique = DeleteDuplicates@Nearest[s->"Element",s][[All,-1]],
    counts = Prepend[0] @ DeleteDuplicates@Nearest[s->"Index",s][[All,-1]]
    ,
    With[near = Nearest[unique->"Index", Range @ Length @ s][[All,1]],
    counts[[1+near-UnitStep[unique[[near]]-Range@Length@s-1]]]
    ]
    ]


    Comparison:



    SeedRandom[1];
    s=RandomInteger[1,100000,10000]//Sort;

    (* my first answer *)
    r1 = Table[Total[UnitStep[m-s]], m,10000]; //AbsoluteTiming
    (* J42161217's answer *)
    r2 = Flatten[
    Join[
    Table[0, s[[1]] - 1],
    Table[Table[i, Differences[s][[i]]], i, Length[Select[s, # <= 10000 &]]]
    ]
    ][[;;10000]]; // AbsoluteTiming
    (* using Nearest *)
    r3 = mincounts[s]; //AbsoluteTiming

    r1 === r2 === r3



    0.432897, Null



    0.122198, Null



    0.025923, Null



    True








    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited Nov 13 '18 at 4:11

























    answered Nov 13 '18 at 1:11









    Carl WollCarl Woll

    70.5k394184




    70.5k394184











    • $begingroup$
      Ah, great answer, as usual.
      $endgroup$
      – AccidentalFourierTransform
      Nov 13 '18 at 1:19










    • $begingroup$
      Can you please check my answer because my laptop is very slow
      $endgroup$
      – J42161217
      Nov 13 '18 at 3:05
















    • $begingroup$
      Ah, great answer, as usual.
      $endgroup$
      – AccidentalFourierTransform
      Nov 13 '18 at 1:19










    • $begingroup$
      Can you please check my answer because my laptop is very slow
      $endgroup$
      – J42161217
      Nov 13 '18 at 3:05















    $begingroup$
    Ah, great answer, as usual.
    $endgroup$
    – AccidentalFourierTransform
    Nov 13 '18 at 1:19




    $begingroup$
    Ah, great answer, as usual.
    $endgroup$
    – AccidentalFourierTransform
    Nov 13 '18 at 1:19












    $begingroup$
    Can you please check my answer because my laptop is very slow
    $endgroup$
    – J42161217
    Nov 13 '18 at 3:05




    $begingroup$
    Can you please check my answer because my laptop is very slow
    $endgroup$
    – J42161217
    Nov 13 '18 at 3:05











    15












    $begingroup$

    BinCounts and Accumulate combination is faster than all the methods posted so far:



    r4 = Accumulate @ BinCounts[s, 1, 1 + 10000, 1]; // RepeatedTiming // First 



    0.00069




    versus Henrik Schumacher's mySparseArray, Carl Woll's mincounts and J42161217's Differences-based method:



    r5 = Accumulate[mySparseArray[Partition[s, 1] -> 1, s[[-1]], Total, 0][[
    1 ;; Length[s]]]
    ]; // RepeatedTiming // First



     0.00081




    r3 = mincounts[s]; // RepeatedTiming // First



    0.016




    r2 = Flatten[Join[Table[0, s[[1]] - 1], 
    Table[Table[i, Differences[s][[i]]], i,
        Length[Select[s, # <= 10000 &]]]]][[;; 10000]]; //
    RepeatedTiming // First



    0.149




    r2 == r3 == r4 == r5



    True







    share|improve this answer











    $endgroup$












    • $begingroup$
      Beat me to it - BinCounts is the way... +1
      $endgroup$
      – ciao
      Nov 13 '18 at 6:49






    • 1




      $begingroup$
      Hey @ciao, you are back?!!
      $endgroup$
      – kglr
      Nov 13 '18 at 6:53










    • $begingroup$
      Sorry @Henrik; thanks for the edit.
      $endgroup$
      – kglr
      Nov 13 '18 at 10:23






    • 1




      $begingroup$
      Short and fast. No need sorting.. Excellent!!... +1
      $endgroup$
      – Okkes Dulgerci
      Nov 13 '18 at 15:14










    • $begingroup$
      @kglr - Been here all along, just busy herding a couple of startups, but I read several times a week, and usually learn something new each time.
      $endgroup$
      – ciao
      Nov 13 '18 at 22:42















    15












    $begingroup$

    BinCounts and Accumulate combination is faster than all the methods posted so far:



    r4 = Accumulate @ BinCounts[s, 1, 1 + 10000, 1]; // RepeatedTiming // First 



    0.00069




    versus Henrik Schumacher's mySparseArray, Carl Woll's mincounts and J42161217's Differences-based method:



    r5 = Accumulate[mySparseArray[Partition[s, 1] -> 1, s[[-1]], Total, 0][[
    1 ;; Length[s]]]
    ]; // RepeatedTiming // First



     0.00081




    r3 = mincounts[s]; // RepeatedTiming // First



    0.016




    r2 = Flatten[Join[Table[0, s[[1]] - 1], 
    Table[Table[i, Differences[s][[i]]], i,
        Length[Select[s, # <= 10000 &]]]]][[;; 10000]]; //
    RepeatedTiming // First



    0.149




    r2 == r3 == r4 == r5



    True







    share|improve this answer











    $endgroup$












    • $begingroup$
      Beat me to it - BinCounts is the way... +1
      $endgroup$
      – ciao
      Nov 13 '18 at 6:49






    • 1




      $begingroup$
      Hey @ciao, you are back?!!
      $endgroup$
      – kglr
      Nov 13 '18 at 6:53










    • $begingroup$
      Sorry @Henrik; thanks for the edit.
      $endgroup$
      – kglr
      Nov 13 '18 at 10:23






    • 1




      $begingroup$
      Short and fast. No need sorting.. Excellent!!... +1
      $endgroup$
      – Okkes Dulgerci
      Nov 13 '18 at 15:14










    • $begingroup$
      @kglr - Been here all along, just busy herding a couple of startups, but I read several times a week, and usually learn something new each time.
      $endgroup$
      – ciao
      Nov 13 '18 at 22:42













    15












    15








    15





    $begingroup$

    BinCounts and Accumulate combination is faster than all the methods posted so far:



    r4 = Accumulate @ BinCounts[s, 1, 1 + 10000, 1]; // RepeatedTiming // First 



    0.00069




    versus Henrik Schumacher's mySparseArray, Carl Woll's mincounts and J42161217's Differences-based method:



    r5 = Accumulate[mySparseArray[Partition[s, 1] -> 1, s[[-1]], Total, 0][[
    1 ;; Length[s]]]
    ]; // RepeatedTiming // First



     0.00081




    r3 = mincounts[s]; // RepeatedTiming // First



    0.016




    r2 = Flatten[Join[Table[0, s[[1]] - 1], 
    Table[Table[i, Differences[s][[i]]], i,
        Length[Select[s, # <= 10000 &]]]]][[;; 10000]]; //
    RepeatedTiming // First



    0.149




    r2 == r3 == r4 == r5



    True







    share|improve this answer











    $endgroup$



    BinCounts and Accumulate combination is faster than all the methods posted so far:



    r4 = Accumulate @ BinCounts[s, 1, 1 + 10000, 1]; // RepeatedTiming // First 



    0.00069




    versus Henrik Schumacher's mySparseArray, Carl Woll's mincounts and J42161217's Differences-based method:



    r5 = Accumulate[mySparseArray[Partition[s, 1] -> 1, s[[-1]], Total, 0][[
    1 ;; Length[s]]]
    ]; // RepeatedTiming // First



     0.00081




    r3 = mincounts[s]; // RepeatedTiming // First



    0.016




    r2 = Flatten[Join[Table[0, s[[1]] - 1], 
    Table[Table[i, Differences[s][[i]]], i,
        Length[Select[s, # <= 10000 &]]]]][[;; 10000]]; //
    RepeatedTiming // First



    0.149




    r2 == r3 == r4 == r5



    True








    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited Nov 13 '18 at 10:10









    Henrik Schumacher

    56.7k577157




    56.7k577157










    answered Nov 13 '18 at 6:31









    kglrkglr

    189k10205423




    189k10205423











    • $begingroup$
      Beat me to it - BinCounts is the way... +1
      $endgroup$
      – ciao
      Nov 13 '18 at 6:49






    • 1




      $begingroup$
      Hey @ciao, you are back?!!
      $endgroup$
      – kglr
      Nov 13 '18 at 6:53










    • $begingroup$
      Sorry @Henrik; thanks for the edit.
      $endgroup$
      – kglr
      Nov 13 '18 at 10:23






    • 1




      $begingroup$
      Short and fast. No need sorting.. Excellent!!... +1
      $endgroup$
      – Okkes Dulgerci
      Nov 13 '18 at 15:14










    • $begingroup$
      @kglr - Been here all along, just busy herding a couple of startups, but I read several times a week, and usually learn something new each time.
      $endgroup$
      – ciao
      Nov 13 '18 at 22:42
















    • $begingroup$
      Beat me to it - BinCounts is the way... +1
      $endgroup$
      – ciao
      Nov 13 '18 at 6:49






    • 1




      $begingroup$
      Hey @ciao, you are back?!!
      $endgroup$
      – kglr
      Nov 13 '18 at 6:53










    • $begingroup$
      Sorry @Henrik; thanks for the edit.
      $endgroup$
      – kglr
      Nov 13 '18 at 10:23






    • 1




      $begingroup$
      Short and fast. No need sorting.. Excellent!!... +1
      $endgroup$
      – Okkes Dulgerci
      Nov 13 '18 at 15:14










    • $begingroup$
      @kglr - Been here all along, just busy herding a couple of startups, but I read several times a week, and usually learn something new each time.
      $endgroup$
      – ciao
      Nov 13 '18 at 22:42















    $begingroup$
    Beat me to it - BinCounts is the way... +1
    $endgroup$
    – ciao
    Nov 13 '18 at 6:49




    $begingroup$
    Beat me to it - BinCounts is the way... +1
    $endgroup$
    – ciao
    Nov 13 '18 at 6:49




    1




    1




    $begingroup$
    Hey @ciao, you are back?!!
    $endgroup$
    – kglr
    Nov 13 '18 at 6:53




    $begingroup$
    Hey @ciao, you are back?!!
    $endgroup$
    – kglr
    Nov 13 '18 at 6:53












    $begingroup$
    Sorry @Henrik; thanks for the edit.
    $endgroup$
    – kglr
    Nov 13 '18 at 10:23




    $begingroup$
    Sorry @Henrik; thanks for the edit.
    $endgroup$
    – kglr
    Nov 13 '18 at 10:23




    1




    1




    $begingroup$
    Short and fast. No need sorting.. Excellent!!... +1
    $endgroup$
    – Okkes Dulgerci
    Nov 13 '18 at 15:14




    $begingroup$
    Short and fast. No need sorting.. Excellent!!... +1
    $endgroup$
    – Okkes Dulgerci
    Nov 13 '18 at 15:14












    $begingroup$
    @kglr - Been here all along, just busy herding a couple of startups, but I read several times a week, and usually learn something new each time.
    $endgroup$
    – ciao
    Nov 13 '18 at 22:42




    $begingroup$
    @kglr - Been here all along, just busy herding a couple of startups, but I read several times a week, and usually learn something new each time.
    $endgroup$
    – ciao
    Nov 13 '18 at 22:42











    12












    $begingroup$

    I think this is at least x3 faster than Mr. Carl Woll's answer

    Can anybody compare my timing?



    r3 = Flatten[Join[Table[0, s[[1]] - 1], 
    Table[Table[i, Differences[s][[i]]], i,
    Length[Select[s, # <= 10000 &]]]]][[;;10000]]; // AbsoluteTiming



    0.157123, Null




    Using MapThread the same code is way faster



    r6 = Flatten[
    Join[Table[0, s[[1]] - 1],
    MapThread[
    Table, Range[t = Length[Select[s, # <= 10000 &]]],
    Differences[s][[1 ;; t]]]]][[;; 10000]]; // AbsoluteTiming

    r6===r3



    0.008387, Null



    True







    share|improve this answer











    $endgroup$








    • 1




      $begingroup$
      These are the timing on my laptop r1=0.444354, Null,r2=39.456, Null,r3=0.157123, Null True
      $endgroup$
      – Okkes Dulgerci
      Nov 13 '18 at 3:55











    • $begingroup$
      Hey, thanks for checking. I will use your timing. Thanks for the confirmation
      $endgroup$
      – J42161217
      Nov 13 '18 at 4:01















    12












    $begingroup$

    I think this is at least x3 faster than Mr. Carl Woll's answer

    Can anybody compare my timing?



    r3 = Flatten[Join[Table[0, s[[1]] - 1], 
    Table[Table[i, Differences[s][[i]]], i,
    Length[Select[s, # <= 10000 &]]]]][[;;10000]]; // AbsoluteTiming



    0.157123, Null




    Using MapThread the same code is way faster



    r6 = Flatten[
    Join[Table[0, s[[1]] - 1],
    MapThread[
    Table, Range[t = Length[Select[s, # <= 10000 &]]],
    Differences[s][[1 ;; t]]]]][[;; 10000]]; // AbsoluteTiming

    r6===r3



    0.008387, Null



    True







    share|improve this answer











    $endgroup$








    • 1




      $begingroup$
      These are the timing on my laptop r1=0.444354, Null,r2=39.456, Null,r3=0.157123, Null True
      $endgroup$
      – Okkes Dulgerci
      Nov 13 '18 at 3:55











    • $begingroup$
      Hey, thanks for checking. I will use your timing. Thanks for the confirmation
      $endgroup$
      – J42161217
      Nov 13 '18 at 4:01













    12












    12








    12





    $begingroup$

    I think this is at least x3 faster than Mr. Carl Woll's answer

    Can anybody compare my timing?



    r3 = Flatten[Join[Table[0, s[[1]] - 1], 
    Table[Table[i, Differences[s][[i]]], i,
    Length[Select[s, # <= 10000 &]]]]][[;;10000]]; // AbsoluteTiming



    0.157123, Null




    Using MapThread the same code is way faster



    r6 = Flatten[
    Join[Table[0, s[[1]] - 1],
    MapThread[
    Table, Range[t = Length[Select[s, # <= 10000 &]]],
    Differences[s][[1 ;; t]]]]][[;; 10000]]; // AbsoluteTiming

    r6===r3



    0.008387, Null



    True







    share|improve this answer











    $endgroup$



    I think this is at least x3 faster than Mr. Carl Woll's answer

    Can anybody compare my timing?



    r3 = Flatten[Join[Table[0, s[[1]] - 1], 
    Table[Table[i, Differences[s][[i]]], i,
    Length[Select[s, # <= 10000 &]]]]][[;;10000]]; // AbsoluteTiming



    0.157123, Null




    Using MapThread the same code is way faster



    r6 = Flatten[
    Join[Table[0, s[[1]] - 1],
    MapThread[
    Table, Range[t = Length[Select[s, # <= 10000 &]]],
    Differences[s][[1 ;; t]]]]][[;; 10000]]; // AbsoluteTiming

    r6===r3



    0.008387, Null



    True








    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited Nov 13 '18 at 12:57

























    answered Nov 13 '18 at 3:04









    J42161217J42161217

    3,968323




    3,968323







    • 1




      $begingroup$
      These are the timing on my laptop r1=0.444354, Null,r2=39.456, Null,r3=0.157123, Null True
      $endgroup$
      – Okkes Dulgerci
      Nov 13 '18 at 3:55











    • $begingroup$
      Hey, thanks for checking. I will use your timing. Thanks for the confirmation
      $endgroup$
      – J42161217
      Nov 13 '18 at 4:01












    • 1




      $begingroup$
      These are the timing on my laptop r1=0.444354, Null,r2=39.456, Null,r3=0.157123, Null True
      $endgroup$
      – Okkes Dulgerci
      Nov 13 '18 at 3:55











    • $begingroup$
      Hey, thanks for checking. I will use your timing. Thanks for the confirmation
      $endgroup$
      – J42161217
      Nov 13 '18 at 4:01







    1




    1




    $begingroup$
    These are the timing on my laptop r1=0.444354, Null,r2=39.456, Null,r3=0.157123, Null True
    $endgroup$
    – Okkes Dulgerci
    Nov 13 '18 at 3:55





    $begingroup$
    These are the timing on my laptop r1=0.444354, Null,r2=39.456, Null,r3=0.157123, Null True
    $endgroup$
    – Okkes Dulgerci
    Nov 13 '18 at 3:55













    $begingroup$
    Hey, thanks for checking. I will use your timing. Thanks for the confirmation
    $endgroup$
    – J42161217
    Nov 13 '18 at 4:01




    $begingroup$
    Hey, thanks for checking. I will use your timing. Thanks for the confirmation
    $endgroup$
    – J42161217
    Nov 13 '18 at 4:01











    6












    $begingroup$

    s = Sort[RandomInteger[1, 100000, 10000]];


    Let us just imagine for the moment that the target list r is supposed to have length 100000 (we can truncate it afterwards). Then for each entry i in the list s, the list r needs to have a jump of height 1 at position i. The jumps are the "derivative" of r (in a discrete sense) and the antiderivative can be obtained with Accumulate. In order to get the list of jumps, we need additive matrix assembly.



    This can be done with this function:



    mySparseArray[rules_, dims_, f_: Total, background_: 0.] := 
    If[(Head[rules] === Rule) && (rules[[1]] === ),
    rules[[2]],
    With[spopt = SystemOptions["SparseArrayOptions"],
    Internal`WithLocalSettings[
    SetSystemOptions[
    "SparseArrayOptions" -> "TreatRepeatedEntries" -> f],
    SparseArray[rules, dims, background],
    SetSystemOptions[spopt]]
    ]
    ]


    So, in total, r can be obtained as follows:



    r4 = Accumulate[
    mySparseArray[Partition[s, 1] -> 1, s[[-1]], Total, 0][[1 ;; Length[s]]]
    ]; // RepeatedTiming // First



    0.00055




    For comparison:



    r3 = Flatten[
    Join[Table[0, s[[1]] - 1],
    Table[Table[i, Differences[s][[i]]], i,
    Length[Select[s, # <= 10000 &]]]]][[;; 10000]]; //
    RepeatedTiming // First
    r3 == r4



    0.116



    True







    share|improve this answer











    $endgroup$

















      6












      $begingroup$

      s = Sort[RandomInteger[1, 100000, 10000]];


      Let us just imagine for the moment that the target list r is supposed to have length 100000 (we can truncate it afterwards). Then for each entry i in the list s, the list r needs to have a jump of height 1 at position i. The jumps are the "derivative" of r (in a discrete sense) and the antiderivative can be obtained with Accumulate. In order to get the list of jumps, we need additive matrix assembly.



      This can be done with this function:



      mySparseArray[rules_, dims_, f_: Total, background_: 0.] := 
      If[(Head[rules] === Rule) && (rules[[1]] === ),
      rules[[2]],
      With[spopt = SystemOptions["SparseArrayOptions"],
      Internal`WithLocalSettings[
      SetSystemOptions[
      "SparseArrayOptions" -> "TreatRepeatedEntries" -> f],
      SparseArray[rules, dims, background],
      SetSystemOptions[spopt]]
      ]
      ]


      So, in total, r can be obtained as follows:



      r4 = Accumulate[
      mySparseArray[Partition[s, 1] -> 1, s[[-1]], Total, 0][[1 ;; Length[s]]]
      ]; // RepeatedTiming // First



      0.00055




      For comparison:



      r3 = Flatten[
      Join[Table[0, s[[1]] - 1],
      Table[Table[i, Differences[s][[i]]], i,
      Length[Select[s, # <= 10000 &]]]]][[;; 10000]]; //
      RepeatedTiming // First
      r3 == r4



      0.116



      True







      share|improve this answer











      $endgroup$















        6












        6








        6





        $begingroup$

        s = Sort[RandomInteger[1, 100000, 10000]];


        Let us just imagine for the moment that the target list r is supposed to have length 100000 (we can truncate it afterwards). Then for each entry i in the list s, the list r needs to have a jump of height 1 at position i. The jumps are the "derivative" of r (in a discrete sense) and the antiderivative can be obtained with Accumulate. In order to get the list of jumps, we need additive matrix assembly.



        This can be done with this function:



        mySparseArray[rules_, dims_, f_: Total, background_: 0.] := 
        If[(Head[rules] === Rule) && (rules[[1]] === ),
        rules[[2]],
        With[spopt = SystemOptions["SparseArrayOptions"],
        Internal`WithLocalSettings[
        SetSystemOptions[
        "SparseArrayOptions" -> "TreatRepeatedEntries" -> f],
        SparseArray[rules, dims, background],
        SetSystemOptions[spopt]]
        ]
        ]


        So, in total, r can be obtained as follows:



        r4 = Accumulate[
        mySparseArray[Partition[s, 1] -> 1, s[[-1]], Total, 0][[1 ;; Length[s]]]
        ]; // RepeatedTiming // First



        0.00055




        For comparison:



        r3 = Flatten[
        Join[Table[0, s[[1]] - 1],
        Table[Table[i, Differences[s][[i]]], i,
        Length[Select[s, # <= 10000 &]]]]][[;; 10000]]; //
        RepeatedTiming // First
        r3 == r4



        0.116



        True







        share|improve this answer











        $endgroup$



        s = Sort[RandomInteger[1, 100000, 10000]];


        Let us just imagine for the moment that the target list r is supposed to have length 100000 (we can truncate it afterwards). Then for each entry i in the list s, the list r needs to have a jump of height 1 at position i. The jumps are the "derivative" of r (in a discrete sense) and the antiderivative can be obtained with Accumulate. In order to get the list of jumps, we need additive matrix assembly.



        This can be done with this function:



        mySparseArray[rules_, dims_, f_: Total, background_: 0.] := 
        If[(Head[rules] === Rule) && (rules[[1]] === ),
        rules[[2]],
        With[spopt = SystemOptions["SparseArrayOptions"],
        Internal`WithLocalSettings[
        SetSystemOptions[
        "SparseArrayOptions" -> "TreatRepeatedEntries" -> f],
        SparseArray[rules, dims, background],
        SetSystemOptions[spopt]]
        ]
        ]


        So, in total, r can be obtained as follows:



        r4 = Accumulate[
        mySparseArray[Partition[s, 1] -> 1, s[[-1]], Total, 0][[1 ;; Length[s]]]
        ]; // RepeatedTiming // First



        0.00055




        For comparison:



        r3 = Flatten[
        Join[Table[0, s[[1]] - 1],
        Table[Table[i, Differences[s][[i]]], i,
        Length[Select[s, # <= 10000 &]]]]][[;; 10000]]; //
        RepeatedTiming // First
        r3 == r4



        0.116



        True








        share|improve this answer














        share|improve this answer



        share|improve this answer








        edited Nov 13 '18 at 23:11

























        answered Nov 13 '18 at 6:32









        Henrik SchumacherHenrik Schumacher

        56.7k577157




        56.7k577157



























            draft saved

            draft discarded
















































            Thanks for contributing an answer to Mathematica Stack Exchange!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid


            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.

            Use MathJax to format equations. MathJax reference.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function ()
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmathematica.stackexchange.com%2fquestions%2f185874%2fhow-to-construct-a-list-of-lengths-efficiently%23new-answer', 'question_page');

            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            𛂒𛀶,𛀽𛀑𛂀𛃧𛂓𛀙𛃆𛃑𛃷𛂟𛁡𛀢𛀟𛁤𛂽𛁕𛁪𛂟𛂯,𛁞𛂧𛀴𛁄𛁠𛁼𛂿𛀤 𛂘,𛁺𛂾𛃭𛃭𛃵𛀺,𛂣𛃍𛂖𛃶 𛀸𛃀𛂖𛁶𛁏𛁚 𛂢𛂞 𛁰𛂆𛀔,𛁸𛀽𛁓𛃋𛂇𛃧𛀧𛃣𛂐𛃇,𛂂𛃻𛃲𛁬𛃞𛀧𛃃𛀅 𛂭𛁠𛁡𛃇𛀷𛃓𛁥,𛁙𛁘𛁞𛃸𛁸𛃣𛁜,𛂛,𛃿,𛁯𛂘𛂌𛃛𛁱𛃌𛂈𛂇 𛁊𛃲,𛀕𛃴𛀜 𛀶𛂆𛀶𛃟𛂉𛀣,𛂐𛁞𛁾 𛁷𛂑𛁳𛂯𛀬𛃅,𛃶𛁼

            ữḛḳṊẴ ẋ,Ẩṙ,ỹḛẪẠứụỿṞṦ,Ṉẍừ,ứ Ị,Ḵ,ṏ ṇỪḎḰṰọửḊ ṾḨḮữẑỶṑỗḮṣṉẃ Ữẩụ,ṓ,ḹẕḪḫỞṿḭ ỒṱṨẁṋṜ ḅẈ ṉ ứṀḱṑỒḵ,ḏ,ḊḖỹẊ Ẻḷổ,ṥ ẔḲẪụḣể Ṱ ḭỏựẶ Ồ Ṩ,ẂḿṡḾồ ỗṗṡịṞẤḵṽẃ ṸḒẄẘ,ủẞẵṦṟầṓế

            ⃀⃉⃄⃅⃍,⃂₼₡₰⃉₡₿₢⃉₣⃄₯⃊₮₼₹₱₦₷⃄₪₼₶₳₫⃍₽ ₫₪₦⃆₠₥⃁₸₴₷⃊₹⃅⃈₰⃁₫ ⃎⃍₩₣₷ ₻₮⃊⃀⃄⃉₯,⃏⃊,₦⃅₪,₼⃀₾₧₷₾ ₻ ₸₡ ₾,₭⃈₴⃋,€⃁,₩ ₺⃌⃍⃁₱⃋⃋₨⃊⃁⃃₼,⃎,₱⃍₲₶₡ ⃍⃅₶₨₭,⃉₭₾₡₻⃀ ₼₹⃅₹,₻₭ ⃌