Python Remove duplicates from csv if value in column duplicated

I am trying to write a CSV parser so that if the same name appears more than once in the name column, the line with the second occurrence is deleted. For example:

[['CSE_MAIN\LC-CSEWS61', 'DEREGISTERED', '2018-04-18-192446'],
 ['CSE_MAIN\IT-Laptop12', 'DEREGISTERED', '2018-03-28-144236'],
 ['CSE_MAIN\LC-CSEWS61', 'DEREGISTERED', '2018-03-28-144236']]

I need the last line to be deleted because it has the same name as the first one.

What I wrote is:

file2 = str(sys.argv[2])
print("The first file is:" + file2)
reader2 = csv.reader(open(file2))
with open("result2.csv", 'wb') as result2:
    wtr2 = csv.writer(result2)
    for r in reader2:
        wtr2.writerow((r[0], r[6], r[9]))
newreader2 = csv.reader(open("result2.csv"))
sortedlist2 = sorted(newreader2, key=lambda col: col[2], reverse=True)
for i in range(len(sortedlist2)):
    for j in range(len(sortedlist2) - 1):
        if (sortedlist2[i][0] == sortedlist2[j+1][0] and sortedlist2[i][1] != sortedlist2[j+1][1]):
            if (sortedlist2[i][1] > sortedlist2[j+1][1]):
                del sortedlist2[i][0-2]
            else:
                del sortedlist2[j+1][0-2]

Thanks.

asked Nov 11 '18 at 12:05 by Rosen
edited Nov 11 '18 at 16:25 by Ajax1234

You are deleting list entries (del sortedlist2[i]), but the result is never written to a new file. Print sortedlist2 to see what it contains.

– user2853437
Nov 11 '18 at 12:35

python csv parsing


2 Answers

If you want to use the csv module, a dict is probably the easiest bet:

>>> {x[0]: x[1:] for x in list(csv.reader(open('bla')))[::-1]}
{'CSE_MAIN\\LC-CSEWS61': ['DEREGISTERED', '2018-04-18-192446'], 'CSE_MAIN\\IT-Laptop12': ['DEREGISTERED', '2018-03-28-144236']}

The reversal ([::-1]) makes sure the first occurrence of a key is the one that ends up in the dict, instead of the last. A better option, though it takes more lines, would probably be:

import csv

res = {}
for a, b, c in csv.reader(open('bla')):
    if a not in res:  # keep only the first row seen for each name
        res[a] = (b, c)

Then you have a "clean" dict, with no need for two iterations as in the one-liner.
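If the goal is to write the de-duplicated rows back to a file rather than keep them in a dict, the same first-occurrence check can be sketched as below. This is only a minimal sketch, assuming Python 3 and that the name sits in the first column; the function names `dedupe_rows` and `dedupe_csv` are hypothetical and not part of the csv module.

```python
import csv


def dedupe_rows(rows, key_index=0):
    """Yield each row whose key column has not been seen before."""
    seen = set()
    for row in rows:
        if not row:
            continue  # skip blank lines
        key = row[key_index]
        if key not in seen:
            seen.add(key)
            yield row


def dedupe_csv(src_path, dst_path, key_index=0):
    """Copy src_path to dst_path, dropping rows with a repeated key."""
    # newline='' is what the csv module documentation recommends for file objects
    with open(src_path, newline='') as src, open(dst_path, 'w', newline='') as dst:
        csv.writer(dst).writerows(dedupe_rows(csv.reader(src), key_index))
```

Because `dedupe_rows` is a generator over any iterable of rows, it can be tried on an in-memory list first and then pointed at a `csv.reader` once the result looks right.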

answered Nov 11 '18 at 12:24 by kabanus
edited Nov 11 '18 at 12:29


Try with pandas:

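import pandas as pd
df = pd.read_csv('path/name_file.csv', header=None)  # no header row, so the columns are labeled 0, 1, 2
df = df.drop_duplicates([0])  # compare rows on column 0 and keep the first occurrence
df.to_csv('New_file.csv', index=False, header=False)  # save to CSV without the index or column labels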

This removes every row whose value in column 0 duplicates an earlier row, keeping the first occurrence.

If you simply need to delete a specific row, you can use the drop method.

# Your file after loading it with pandas (print(df)):
                      0             1                  2
0   CSE_MAIN\LC-CSEWS61  DEREGISTERED  2018-04-18-192446
1  CSE_MAIN\IT-Laptop12  DEREGISTERED  2018-03-28-144236
2   CSE_MAIN\LC-CSEWS61  DEREGISTERED  2018-03-28-144236

For example, suppose you need to delete row 2.

df.drop(2, axis=0, inplace=True)  # axis=0 means rows; axis=1 would mean columns

Output:

                      0             1                  2
0   CSE_MAIN\LC-CSEWS61  DEREGISTERED  2018-04-18-192446
1  CSE_MAIN\IT-Laptop12  DEREGISTERED  2018-03-28-144236
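To see drop_duplicates act only on repeated names, without hard-coding a row number, here is a small self-contained sketch using the three sample rows from the question. It assumes the file has no header row (hence `header=None`). The second variant, which keeps the most recent row per name, is an assumption about what the asker might want, and it relies on the timestamps being zero-padded strings so that text order matches chronological order.

```python
import io

import pandas as pd

# Sample data copied from the question; there is no header row.
raw = (
    "CSE_MAIN\\LC-CSEWS61,DEREGISTERED,2018-04-18-192446\n"
    "CSE_MAIN\\IT-Laptop12,DEREGISTERED,2018-03-28-144236\n"
    "CSE_MAIN\\LC-CSEWS61,DEREGISTERED,2018-03-28-144236\n"
)

# header=None labels the columns 0, 1, 2 instead of consuming the first row.
df = pd.read_csv(io.StringIO(raw), header=None)

# Keep the first row for each distinct value in column 0.
first_seen = df.drop_duplicates(subset=[0], keep="first")

# Assumed variant: keep the most recent row per name by sorting on the
# timestamp column (as text, newest first) before dropping duplicates.
newest = df.sort_values(2, ascending=False).drop_duplicates(subset=[0], keep="first")
```

With this sample the two variants keep the same rows, because the first occurrence of the repeated name also carries the newest timestamp; on other data they can differ.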

answered Nov 11 '18 at 12:17 by Rudolf Morkovskyi
edited Nov 11 '18 at 14:47

Thanks for the response, but I meant that a row should be deleted only if its name is repeated. In my example, pandas should iterate through the names and delete the second occurrence only when a name is duplicated.

– Rosen
Nov 11 '18 at 13:30

Mmm... so be it)

– Rudolf Morkovskyi
Nov 11 '18 at 14:45

Oh) I think you need to change 1 to 0 in df.drop_duplicates([0]). I fixed it. Try again.

– Rudolf Morkovskyi
Nov 11 '18 at 14:47
