Python Remove duplicates from csv if value in column duplicated











I am trying to write a CSV parser so that if the same name appears more than once in the name column, the later row is deleted. For example:



[['CSE_MAIN\LC-CSEWS61', 'DEREGISTERED', '2018-04-18-192446'],
 ['CSE_MAIN\IT-Laptop12', 'DEREGISTERED', '2018-03-28-144236'],
 ['CSE_MAIN\LC-CSEWS61', 'DEREGISTERED', '2018-03-28-144236']]


The last row should be deleted because it has the same name as the first one.



What I wrote is:



file2 = str(sys.argv[2])
print ("The first file is:" + file2)
reader2 = csv.reader (open(file2))
with open("result2.csv",'wb') as result2:
    wtr2= csv.writer( result2 )
    for r in reader2:
        wtr2.writerow( (r[0], r[6], r[9] ))
newreader2 = csv.reader (open("result2.csv"))
sortedlist2 = sorted(newreader2, key=lambda col: col[2] , reverse = True)
for i in range(len(sortedlist2)):
    for j in range(len(sortedlist2)-1):
        if (sortedlist2[i][0] == sortedlist2[j+1][0] and sortedlist2[i][1]!=sortedlist2[j+1][1]):
            if(sortedlist2[i][1]>sortedlist2[j+1][1]):
                del sortedlist2[i][0-2]
            else:
                del sortedlist2[j+1][0-2]


Thanks.







python csv parsing






edited Nov 11 '18 at 16:25 by Ajax1234
asked Nov 11 '18 at 12:05 by Rosen












  • You are deleting list entries (del sortedlist2[i]), but the modified list is never written to a new file. Print sortedlist2 to see what it contains.

    – user2853437
    Nov 11 '18 at 12:35
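
A small sketch of what the comment points at, using the sample rows from the question. Note that the question's code actually runs `del sortedlist2[i][0-2]`; since `0-2` is ordinary arithmetic and evaluates to `-2`, that statement removes one field from a row rather than the row itself. The names `field_deleted` and `row_deleted` are illustrative only and do not appear in the original code.

```python
rows = [
    ["CSE_MAIN\\LC-CSEWS61", "DEREGISTERED", "2018-04-18-192446"],
    ["CSE_MAIN\\IT-Laptop12", "DEREGISTERED", "2018-03-28-144236"],
    ["CSE_MAIN\\LC-CSEWS61", "DEREGISTERED", "2018-03-28-144236"],
]

# 0-2 evaluates to -2, so this deletes the second-to-last field
# of the last row; the row itself stays in the list.
field_deleted = [list(r) for r in rows]
del field_deleted[2][0-2]

# Deleting by row index removes the whole row.
row_deleted = [list(r) for r in rows]
del row_deleted[2]

print(field_deleted[2])   # the last row, now with two fields
print(len(row_deleted))   # number of rows left
```

Either way, the change lives only in the in-memory list until it is written back out with a `csv.writer`.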

















2 Answers
If you want to use the csv module, a dict is probably the easiest bet:

>>> {x[0]:x[1:] for x in list(csv.reader(open('bla')))[::-1]}
{'CSE_MAIN\\LC-CSEWS61': ['DEREGISTERED', '2018-04-18-192446'], 'CSE_MAIN\\IT-Laptop12': ['DEREGISTERED', '2018-03-28-144236']}

The reversal ([::-1]) makes sure the first occurrence of a key is kept instead of the last, since later entries overwrite earlier ones in the dict. A better option, though it takes more lines, would probably be:

res = {}
for a,b,c in csv.reader(open('bla')):
    if a not in res:
        res[a]=(b,c)

Then you have a "clean" dict, with no need for the two passes the one-liner makes.
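
The same "keep the first occurrence" idea can be sketched so that whole rows are written to a new CSV file. This is only an illustration under assumptions: it targets Python 3 (the question's `'wb'` mode suggests Python 2), the helper names `keep_first_by_key` and `dedupe_csv` are made up for this example, and the name is assumed to be in column 0.

```python
import csv

def keep_first_by_key(rows, key_index=0):
    """Yield each row whose key column has not been seen before."""
    seen = set()
    for row in rows:
        if not row:
            continue  # csv.reader yields [] for blank lines
        key = row[key_index]
        if key not in seen:
            seen.add(key)
            yield row

def dedupe_csv(in_path, out_path, key_index=0):
    """Copy in_path to out_path, dropping later rows that repeat a key."""
    # newline="" is what the csv module documentation recommends
    # for files passed to csv.reader and csv.writer.
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        csv.writer(dst).writerows(keep_first_by_key(csv.reader(src), key_index))
```

Calling `dedupe_csv("result2.csv", "deduped.csv")` would then keep one row per name, in the original order. Using a set of seen keys needs only a single pass and does not require sorting first.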







edited Nov 11 '18 at 12:29
answered Nov 11 '18 at 12:24 by kabanus























Try with pandas:

import pandas as pd
df = pd.read_csv('path/name_file.csv', header=None)  # no header row, so the columns are numbered 0, 1, 2
df = df.drop_duplicates(subset=[0])  # 0 is the column to compare; the first occurrence is kept
df.to_csv('New_file.csv', index=False, header=False)  # save to csv

This removes every row whose value in column 0 has already appeared, keeping the first occurrence.

If you only need to delete a specific row, you can use the drop method.

# Your file after loading it with pandas (print(df)):
                      0             1                  2
0   CSE_MAIN\LC-CSEWS61  DEREGISTERED  2018-04-18-192446
1  CSE_MAIN\IT-Laptop12  DEREGISTERED  2018-03-28-144236
2   CSE_MAIN\LC-CSEWS61  DEREGISTERED  2018-03-28-144236

For example, suppose you need to delete row 2:

df.drop(2, axis=0, inplace=True)  # axis=0 means rows; with axis=1 it would drop a column

Output:

                      0             1                  2
0   CSE_MAIN\LC-CSEWS61  DEREGISTERED  2018-04-18-192446
1  CSE_MAIN\IT-Laptop12  DEREGISTERED  2018-03-28-144236
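
To check the behaviour without touching any files, here is a minimal sketch that loads the three sample rows from the question out of an in-memory string. It assumes the file has no header row (hence `header=None`) and that the name is in column 0; the variable names `raw` and `deduped` are illustrative.

```python
import io
import pandas as pd

# Sample rows from the question, held in memory instead of a file.
raw = (
    "CSE_MAIN\\LC-CSEWS61,DEREGISTERED,2018-04-18-192446\n"
    "CSE_MAIN\\IT-Laptop12,DEREGISTERED,2018-03-28-144236\n"
    "CSE_MAIN\\LC-CSEWS61,DEREGISTERED,2018-03-28-144236\n"
)

# header=None makes pandas number the columns 0, 1, 2.
df = pd.read_csv(io.StringIO(raw), header=None)

# keep="first" (the default) keeps the earliest row for each name
# and drops later rows that repeat it.
deduped = df.drop_duplicates(subset=[0], keep="first")
print(deduped)
```

Passing `keep="last"` instead would retain the final row for each name, which may be useful if the rows are ordered oldest to newest.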






edited Nov 11 '18 at 14:47
answered Nov 11 '18 at 12:17 by Rudolf Morkovskyi












• Thanks for the response, but I meant that a row should be deleted only if its name is repeated. In my example, pandas should iterate through the names and delete the second row only when a name is duplicated.

  – Rosen
  Nov 11 '18 at 13:30

• Mmm... so be it)

  – Rudolf Morkovskyi
  Nov 11 '18 at 14:45

• Oh) I think you need to change 1 to 0 in df.drop_duplicates([0]). I fixed it. Try again.

  – Rudolf Morkovskyi
  Nov 11 '18 at 14:47

















