Cluster analysis algorithm for identifying line clusters on a map

























I have a reasonably large set of (r,g,b)-colored data points with (x,y)-coordinates that looks like this:



enter image description here



Before committing them to my database, I'd like to automatically identify all point clusters (most of which look like lines) and assign a category to each colored point according to the cluster it belongs to.



According to the scikit-learn roadmap, I should be using either MeanShift or Gaussian mixture models, but I'd like to know whether any available solution also takes into account that nearby points with similar colors are more likely to belong to the same cluster.



I have access to a GPU, so any kind of solution is welcome, even one based on deep learning.




I tried @mcdowella's answer and it worked surprisingly well. I ran it over the higher-dimensional version of these points (which were generated through t-SNE) using the HDBSCAN Robust Single Linkage implementation, and it approximated many lines without any parameter tuning.



enter image description here















    I don't think this is the right place to ask this kind of question. Maybe the statistics Stack Exchange would be more appropriate?

    – Mitchel Paulin
    Nov 10 '18 at 18:01






















python algorithm machine-learning scikit-learn deep-learning














asked Nov 10 '18 at 17:56 by Ruan
edited Nov 10 '18 at 20:30 by Ruan



















1 Answer














I would try single-linkage clustering (https://en.wikipedia.org/wiki/Single-linkage_clustering). It tends to follow lines, which is sometimes even a disadvantage for people who want nice, compact, rounded clusters and instead get straggling spaghetti (there is a nice picture on p. 7 of https://www.stat.cmu.edu/~cshalizi/350/lectures/08/lecture-08.pdf).
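As a rough illustration of the idea, here is a minimal sketch using SciPy's hierarchical clustering with `method="single"`. To let color influence the grouping, it simply stacks the (x, y) coordinates with the (r, g, b) values into one feature vector. That is one possible way to do it, not something the approach requires. The helper name `single_linkage_labels`, the `color_weight` and `distance_threshold` parameters, and the toy data are all assumptions made for this example and would need tuning on real data.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def single_linkage_labels(xy, rgb, color_weight=1.0, distance_threshold=0.5):
    """Single-linkage clustering on joint position + color features.

    color_weight and distance_threshold are illustrative knobs,
    not values taken from the question.
    """
    xy = np.asarray(xy, dtype=float)
    rgb = np.asarray(rgb, dtype=float)
    # Stack position and weighted color so that points which are both
    # nearby and similarly colored end up close in feature space.
    features = np.hstack([xy, color_weight * rgb])
    # Single linkage merges clusters based on their closest pair of points,
    # which is what lets clusters grow along chains and lines.
    tree = linkage(features, method="single")
    # Cut the dendrogram at a fixed distance to obtain flat cluster labels.
    return fcluster(tree, t=distance_threshold, criterion="distance")


# Toy data (hypothetical): two parallel line segments with different colors.
t = np.linspace(0.0, 10.0, 101)
line_a = np.column_stack([t, np.zeros_like(t)])
line_b = np.column_stack([t, np.full_like(t, 2.0)])
xy = np.vstack([line_a, line_b])
rgb = np.vstack([
    np.tile([1.0, 0.0, 0.0], (101, 1)),  # red line
    np.tile([0.0, 0.0, 1.0], (101, 1)),  # blue line
])

labels = single_linkage_labels(xy, rgb)
print(len(set(labels)), "clusters found")
```

Scaling the color columns relative to the coordinates (here via `color_weight`) controls how strongly a color difference keeps two nearby points apart. Note that plain single linkage computes all pairwise merges, so for a large point set a density-based variant may scale better.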
















answered Nov 10 '18 at 18:30 by mcdowella


























