Check if Object attribute is present in list of Object
I have an object with different attributes and a list that contains those objects.
Before adding an object to the list, I'd like to check if an attribute of this new object is present in the list.
This attribute is unique, so this is done to make sure that every object in the list is unique.
I would do something like this:
for post in stream:
    if post.post_id not in post_list:
        post_list.append(post)
    else:
        # Find old post in the list and replace it
But obviously line 2 doesn't work, as I'm comparing the post_id to the object list.
@Peter: take into account that that won't preserve order! list(OrderedDict.fromkeys(stream)) would keep the inputs in order of first-seen id.
– Martijn Pieters♦
Sep 3 at 16:21
Ah yeah, to be fair I assumed from the way he'd worded it order didn't matter, hadn't really given it much thought though :P
– Peter
Sep 3 at 17:01
2 Answers
Keep a separate set to which you add the attribute, and against which you can then test the next value:
ids_seen = set()
for post in stream:
    if post.post_id not in ids_seen:
        post_list.append(post)
        ids_seen.add(post.post_id)
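As a minimal, self-contained sketch of the loop above: the `Post` dataclass, its `body` field, and the sample `stream` are made up for illustration. Only `post_id`, `stream`, `post_list`, and `ids_seen` come from the question and answer.

```python
from dataclasses import dataclass


@dataclass
class Post:
    # Hypothetical stand-in for the asker's class; only post_id is from the question.
    post_id: int
    body: str


# Made-up sample input with one repeated id.
stream = [Post(1, "first"), Post(2, "second"), Post(1, "first, edited")]

post_list = []
ids_seen = set()
for post in stream:
    if post.post_id not in ids_seen:  # average O(1) membership test on the set
        post_list.append(post)
        ids_seen.add(post.post_id)

print([(p.post_id, p.body) for p in post_list])
```

Note that this variant keeps the first post seen for each id and skips later duplicates; it does not replace the old post.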
Another option is to create an ordered dict first, with the ids as keys:
from collections import OrderedDict

posts = OrderedDict((post.post_id, post) for post in stream)
post_list = list(posts.values())
This will keep the most recently seen post reference for a given id, but you'll still have unique ids only.
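To make the "most recently seen" behaviour concrete, here is a small sketch. The `Post` namedtuple, its `body` field, and the sample data are assumptions for illustration, not part of the question.

```python
from collections import OrderedDict, namedtuple

# Hypothetical stand-in for the asker's class; only post_id is from the question.
Post = namedtuple("Post", ["post_id", "body"])

# Made-up sample input: id 1 appears twice.
stream = [Post(1, "first"), Post(2, "second"), Post(1, "first, edited")]

# A repeated key overwrites the stored value but keeps its original position.
posts = OrderedDict((post.post_id, post) for post in stream)
post_list = list(posts.values())

print(post_list)
```

Here the later post with id 1 replaces the earlier one while staying in the first slot, which is close to the "find old post and replace it" branch in the question.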
If ordering isn't important, just use a regular dictionary comprehension:
posts = {post.post_id: post for post in stream}
post_list = list(posts.values())
If you are using Python 3.6 or newer, then the order will be preserved anyway as the CPython implementation was updated to retain input order, and in Python 3.7 this feature became part of the language specification.
Whatever you do, don't use a separate list to test the post_id against, as that takes O(N) time each time you check whether the id is present, where N is the number of items in your stream in the end. Combined with N such checks, that approach would take O(N**2) quadratic time, meaning that for every 10-fold increase in the number of input items, it would take 100 times longer to process them all.
But when using a set or dictionary, testing if the id is already there only takes O(1) constant time, so checks are cheap. That makes a full processing loop take O(N) linear time, meaning that it'll take time directly proportional to how many input items you have.
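A rough way to see the difference is to time both strategies. The sketch below uses plain integers in place of posts, and the function names and input size are arbitrary choices for illustration; actual timings depend on the machine, so none are quoted here.

```python
import timeit


def dedupe_with_list(ids):
    seen = []  # membership test scans the list: O(N) per check
    out = []
    for i in ids:
        if i not in seen:
            seen.append(i)
            out.append(i)
    return out


def dedupe_with_set(ids):
    seen = set()  # membership test is a hash lookup: O(1) on average
    out = []
    for i in ids:
        if i not in seen:
            seen.add(i)
            out.append(i)
    return out


ids = list(range(2000))  # made-up worst case: every id is unique

list_time = timeit.timeit(lambda: dedupe_with_list(ids), number=3)
set_time = timeit.timeit(lambda: dedupe_with_set(ids), number=3)
print(f"list: {list_time:.4f}s  set: {set_time:.4f}s")
```

Increasing the size of `ids` tenfold should make the list version roughly 100 times slower and the set version roughly 10 times slower.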
This should work
for post in stream:
    if post.post_id not in [post.post_id for post in post_list]:
        post_list.append(post)
This is incredibly inefficient, as not in has to do a full scan of the list you generated. You also re-generate the post_id list each iteration. The combination is a killer, making this an O(N**2) quadratic approach where only O(N) linear time is ever needed. For 1000 items, your approach will take on the order of 1 million steps where an O(N) linear approach needs about 1000.
– Martijn Pieters♦
Sep 3 at 16:17
So yes, it should work, if you have the patience. This rapidly sucks up time as the input sequence becomes longer (on the order of 100 million steps for 10k items, 10 billion steps for 100k items, etc.).
– Martijn Pieters♦
Sep 3 at 16:20
If you have control over the post class, you could create a __hash__ method and use the value of post_id there, then just do post_list = set(stream)
– Peter
Sep 3 at 16:15
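A sketch of what that comment suggests, under the assumption that you can edit the class. The `Post` class body and sample data here are hypothetical. Note that `__hash__` alone is not enough: sets and dicts also compare with `__eq__`, which defaults to identity, so both are defined below.

```python
from collections import OrderedDict


class Post:
    # Hypothetical class; only the post_id attribute comes from the question.
    def __init__(self, post_id, body):
        self.post_id = post_id
        self.body = body

    def __eq__(self, other):
        # Sets and dicts use __eq__ together with __hash__ to detect duplicates.
        if not isinstance(other, Post):
            return NotImplemented
        return self.post_id == other.post_id

    def __hash__(self):
        return hash(self.post_id)


# Made-up sample input: id 2 appears twice.
stream = [Post(2, "second"), Post(1, "first"), Post(2, "second, edited")]

unique_unordered = set(stream)  # unique by post_id, order not guaranteed
unique_ordered = list(OrderedDict.fromkeys(stream))  # first-seen order

print([p.post_id for p in unique_ordered])
```

Both variants keep the first object seen for each id. Making equality depend only on `post_id` affects every comparison between posts, so this is a design choice to weigh rather than a drop-in fix.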