Iterating through tuples in a list




Let's say I have a list made up of tuples:


stList = [('NJ', 'Burlington County', '12/21/2017'),
('NJ', 'Burlington County', '12/21/2017'),
('NJ', 'Burlington County', '12/21/2017'),
('VA', 'Frederick County', '2/13/2018'),
('MD', 'Montgomery County', '8/7/2017'),
('NJ', 'Burlington County', '12/21/2017'),
('NC', 'Lee County', '1/14/2018'),
('NC', 'Alamance County', '11/28/2017'),]



I want to iterate through each item (tuple) and, if it already exists elsewhere in the list, remove it from stList.




for item in stList:
    if item in stList:
        stList.remove(item)



This doesn't exactly work. Basically, when I run this, if any item in the tuple is also in the list, it removes that item, so I get this:


[('NJ', 'Burlington County', '12/21/2017'),
('VA', 'Frederick County', '2/13/2018'),
('NJ', 'Burlington County', '12/21/2017'),
('NC', 'Alamance County', '11/28/2017')]



What is a better way to approach this?




1 Answer



Tuples with all entries matching will be considered equal.


>>> ('NJ', 'Burlington County', '12/21/2017') == ('NJ', 'Burlington County', '12/21/2017')
True

>>> ('NJ', 'Burlington County', '12/21/2017') == ('NJ', 'Burlington County', '1/21/2017')
False



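Removing items from a list while you are iterating over it, however, can give unexpected results: list.remove deletes the first matching element and shifts everything after it to the left, so the loop skips items. It is safer to build a new list instead.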
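To make the skipping visible, here is a minimal sketch that reruns the loop from the question on the same data and records which items the loop actually reaches. The names demo and visited are hypothetical and only used for illustration.

```python
# Same data as stList in the question, under a hypothetical name.
demo = [
    ('NJ', 'Burlington County', '12/21/2017'),
    ('NJ', 'Burlington County', '12/21/2017'),
    ('NJ', 'Burlington County', '12/21/2017'),
    ('VA', 'Frederick County', '2/13/2018'),
    ('MD', 'Montgomery County', '8/7/2017'),
    ('NJ', 'Burlington County', '12/21/2017'),
    ('NC', 'Lee County', '1/14/2018'),
    ('NC', 'Alamance County', '11/28/2017'),
]

visited = []  # items the loop body actually sees
for item in demo:
    visited.append(item)
    # Always true here: `item` was just read from `demo`.
    if item in demo:
        # Removes the first equal element and shifts the rest left, so the
        # element that slides into the current slot is never visited.
        demo.remove(item)

print(len(visited))  # 4, not 8: half of the items were skipped
print(demo)          # the same four leftover tuples shown in the question
```

The loop ends early because the list keeps shrinking underneath the iterator, which is why a duplicate 'NJ' entry survives.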



Here are a few options.


seen = set()
result = []
for item in stList:
    # Tuples are hashable, so they can be looked up directly in `seen`.
    if item not in seen:
        seen.add(item)
        result.append(item)

stList = result



Another possibility is


seen = set()
# Use a list to preserve ordering. Change to a set if that does not matter.
first_seen = []
for i, item in enumerate(stList):
    if item not in seen:
        seen.add(item)
        first_seen.append(i)

stList = [stList[i] for i in first_seen]



Edit
On second thought, the second option is no better than the first unless you need the indices for some other reason (e.g., they can be reused for another task). In the first option, result stores references to the tuples rather than copies, so it uses roughly the same memory as storing indices into stList.
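As a small illustration of the point about references, the sketch below runs the first option on a shortened, hypothetical list (st_list) and checks that every entry of result is the very same object as its first occurrence in the input, not a copy.

```python
# Shortened, hypothetical sample data in the same shape as stList.
st_list = [
    ('NJ', 'Burlington County', '12/21/2017'),
    ('VA', 'Frederick County', '2/13/2018'),
    ('NJ', 'Burlington County', '12/21/2017'),
    ('MD', 'Montgomery County', '8/7/2017'),
]

seen = set()
result = []
for item in st_list:
    if item not in seen:
        seen.add(item)
        result.append(item)  # appends a reference to the existing tuple

# `is` tests object identity: each kept entry is the original tuple object.
same_objects = all(r is st_list[st_list.index(r)] for r in result)
print(same_objects)  # True
print(len(result))   # 3
```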



If the order of the items does not matter, you can simply convert the list to a set and back:

stList = list(set(stList))



If you just want an iterable and have no need to index stList, then you can even keep it as a set object.
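One caveat with the set approach is that a set does not keep the original order. The sketch below, on a shortened hypothetical list (st_list), contrasts it with dict.fromkeys, which is not part of the answer above but is a common order-preserving one-liner, assuming Python 3.7 or later where dicts keep insertion order.

```python
# Shortened, hypothetical sample data in the same shape as stList.
st_list = [
    ('NJ', 'Burlington County', '12/21/2017'),
    ('NJ', 'Burlington County', '12/21/2017'),
    ('VA', 'Frederick County', '2/13/2018'),
    ('MD', 'Montgomery County', '8/7/2017'),
    ('NJ', 'Burlington County', '12/21/2017'),
]

# Duplicates are dropped, but the iteration order is arbitrary.
unique_unordered = set(st_list)

# dict keys are unique and keep insertion order (Python 3.7+),
# so this keeps the first occurrence of each tuple in its original position.
unique_ordered = list(dict.fromkeys(st_list))

print(len(unique_unordered))  # 3
print(unique_ordered)
```

Both approaches rely on the tuples being hashable, which holds here because they contain only strings.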







This worked very well. I agree now that trying to remove items from a list that I am iterating through is a bad idea. So, the seen is just a temporary holding spot for the comparison. Very nice; thanks!

– gwydion93
Sep 14 '18 at 0:50








@gwydion93 Yes. seen is a set that tracks the items encountered so far. If you have more knowledge of, or control over, the data structure, this step can usually be improved, for example by working at the bit level or writing a custom hash function. If the problem is unconstrained and has no specific structure, it is generally hard to do away with something like the seen variable.

– lightalchemist
Sep 14 '18 at 1:01





