Python Remove duplicates from csv if value in column duplicated
I am trying to write a CSV parser so that, if the same name appears more than once in the name column, the later row with that name is deleted. For example:
    [['CSE_MAIN\LC-CSEWS61', 'DEREGISTERED', '2018-04-18-192446'],
     ['CSE_MAIN\IT-Laptop12', 'DEREGISTERED', '2018-03-28-144236'],
     ['CSE_MAIN\LC-CSEWS61', 'DEREGISTERED', '2018-03-28-144236']]
I need the last line to be deleted because it has the same name as the first one.
What I wrote is:
    file2 = str(sys.argv[2])
    print ("The first file is:" + file2)
    reader2 = csv.reader (open(file2))
    with open("result2.csv",'wb') as result2:
        wtr2= csv.writer( result2 )
        for r in reader2:
            wtr2.writerow( (r[0], r[6], r[9] ))
    newreader2 = csv.reader (open("result2.csv"))
    sortedlist2 = sorted(newreader2, key=lambda col: col[2] , reverse = True)
    for i in range(len(sortedlist2)):
        for j in range(len(sortedlist2)-1):
            if (sortedlist2[i][0] == sortedlist2[j+1][0] and sortedlist2[i][1]!=sortedlist2[j+1][1]):
                if(sortedlist2[i][1]>sortedlist2[j+1][1]):
                    del sortedlist2[i][0-2]
                else:
                    del sortedlist2[j+1][0-2]
Thanks.
python csv parsing
You are deleting list entries (del sortedlist2[i]), but the result is never written to a new file. Print sortedlist2 to see what is in it.
– user2853437
Nov 11 '18 at 12:35
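To illustrate the comment above, here is a minimal sketch, assuming the three example rows from the question, of what `del sortedlist2[i][0-2]` actually does and of one way to get de-duplicated rows into a file. The names `damaged`, `kept`, and `out` are made up for this illustration, and `io.StringIO` stands in for a real output file.

```python
import csv
import io

rows = [
    ['CSE_MAIN\\LC-CSEWS61', 'DEREGISTERED', '2018-04-18-192446'],
    ['CSE_MAIN\\IT-Laptop12', 'DEREGISTERED', '2018-03-28-144236'],
    ['CSE_MAIN\\LC-CSEWS61', 'DEREGISTERED', '2018-03-28-144236'],
]

# `0-2` is simply the integer -2, so this deletes the second-to-last
# field of one row. The row itself stays in the outer list.
damaged = [list(r) for r in rows]
del damaged[2][0-2]
print(damaged[2])

# Building a new list avoids deleting from a list while indexing into it.
seen = set()
kept = []
for row in rows:
    if row[0] not in seen:      # first time this name appears
        seen.add(row[0])
        kept.append(row)

# The filtered rows only reach a file once they are written explicitly.
# With a real file, open it with open(path, 'w', newline='') in Python 3.
out = io.StringIO()
csv.writer(out).writerows(kept)
print(out.getvalue())
```

Printing the intermediate lists, as the comment suggests, is probably the quickest way to see that the original loop never removes a whole row.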
edited Nov 11 '18 at 16:25 by Ajax1234
asked Nov 11 '18 at 12:05 by Rosen
2 Answers
If you want to use the csv module, a dict is probably the easiest bet:
    >>> {x[0]: x[1:] for x in list(csv.reader(open('bla')))[::-1]}
    {'CSE_MAIN\\LC-CSEWS61': ['DEREGISTERED', '2018-04-18-192446'], 'CSE_MAIN\\IT-Laptop12': ['DEREGISTERED', '2018-03-28-144236']}
The reversal ([::-1]) makes sure the first occurrence of a key is selected, instead of the last. The better, though longer, option would probably be:
    res = {}
    for a, b, c in csv.reader(open('bla')):
        if a not in res:
            res[a] = (b, c)
Then you have a "clean" dict and no need for two iterations like the one-liner.
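As a rough end-to-end sketch of the same first-occurrence idea, the following reads rows, keeps the first row seen for each name, and writes the result back out as CSV. The helper name `first_rows_by_key` and its `key_index` parameter are hypothetical, and `io.StringIO` objects stand in for the input and output files. Note that the `'wb'` mode used in the question is the Python 2 convention; in Python 3, csv files are opened in text mode with `newline=''`.

```python
import csv
import io


def first_rows_by_key(lines, key_index=0):
    """Yield each CSV row whose key column has not been seen before."""
    seen = set()
    for row in csv.reader(lines):
        if not row:                 # skip blank lines
            continue
        key = row[key_index]
        if key not in seen:
            seen.add(key)
            yield row


# Stand-in for open('input.csv', newline='') with the question's sample rows.
sample = io.StringIO(
    "CSE_MAIN\\LC-CSEWS61,DEREGISTERED,2018-04-18-192446\n"
    "CSE_MAIN\\IT-Laptop12,DEREGISTERED,2018-03-28-144236\n"
    "CSE_MAIN\\LC-CSEWS61,DEREGISTERED,2018-03-28-144236\n"
)
result = list(first_rows_by_key(sample))

# Stand-in for open('output.csv', 'w', newline='').
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(result)
print(out.getvalue())
```

Using a set of seen names rather than a dict keeps whole rows intact and streams the input in a single pass, which may matter for larger files.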
Try with pandas:
    import pandas as pd
    # header=None: the file has no header row, so columns are labelled 0, 1, 2.
    df = pd.read_csv('path/name_file.csv', header=None)
    df = df.drop_duplicates([0])  # 0 is the column to compare on.
    df.to_csv('New_file.csv', header=False, index=False)  # save to csv
This drops every row whose value in column 0 (the first column) has already appeared, keeping the first occurrence.
If you simply need to delete a specific row, you can use the drop method.
    # Your file after loading it with pandas (print(df)):
                         0             1                  2
    0   CSE_MAINLC-CSEWS61  DEREGISTERED  2018-04-18-192446
    1  CSE_MAINIT-Laptop12  DEREGISTERED  2018-03-28-144236
    2   CSE_MAINLC-CSEWS61  DEREGISTERED  2018-03-28-144236
For example, suppose you need to delete row 2:
    df.drop(2, axis=0, inplace=True)  # axis=0 means rows; axis=1 would mean columns.
Output:
                         0             1                  2
    0   CSE_MAINLC-CSEWS61  DEREGISTERED  2018-04-18-192446
    1  CSE_MAINIT-Laptop12  DEREGISTERED  2018-03-28-144236
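A small self-contained sketch of the pandas route, under a few assumptions: the file has no header row, the column names `name`, `status`, and `timestamp` are invented here for readability, and the newest entry per name is the one to keep (mirroring the descending sort in the question). `io.StringIO` stands in for the CSV file.

```python
import io

import pandas as pd

# Stand-in for the CSV file, using the question's three sample rows.
raw = io.StringIO(
    "CSE_MAIN\\LC-CSEWS61,DEREGISTERED,2018-04-18-192446\n"
    "CSE_MAIN\\IT-Laptop12,DEREGISTERED,2018-03-28-144236\n"
    "CSE_MAIN\\LC-CSEWS61,DEREGISTERED,2018-03-28-144236\n"
)

# header=None tells pandas the first line is data, not column names.
df = pd.read_csv(raw, header=None, names=["name", "status", "timestamp"])

# Timestamps in this YYYY-MM-DD-HHMMSS form sort correctly as plain strings.
newest_first = df.sort_values("timestamp", ascending=False)

# keep="first" retains the first row per name, here the newest one.
deduped = newest_first.drop_duplicates(subset="name", keep="first")

# index=False and header=False keep the output in the input's shape.
csv_text = deduped.to_csv(index=False, header=False)
print(csv_text)
```

If the original row order should be preserved instead, skipping the sort and calling `drop_duplicates` directly keeps the first occurrence of each name as it appears in the file.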
Thanks for the response, but I meant that a row should be deleted only if its name is duplicated. In my example, pandas should iterate through the names and delete the second row only when a name is duplicated.
– Rosen
Nov 11 '18 at 13:30
Mmm... so be it)
– Rudolf Morkovskyi
Nov 11 '18 at 14:45
Oh) I think you need to change 1 to 0 in df.drop_duplicates([0]). I fixed it. Try again.
– Rudolf Morkovskyi
Nov 11 '18 at 14:47
csv/dict answer: answered Nov 11 '18 at 12:24 by kabanus, edited Nov 11 '18 at 12:29
pandas answer: answered Nov 11 '18 at 12:17 by Rudolf Morkovskyi, edited Nov 11 '18 at 14:47