Is there any benefit of working with an Iterator over a List
Is there any benefit to manipulating an Iterator rather than a List?
I need to know whether concatenating two Iterators is better than concatenating two Lists.
In short, what is the fundamental difference between working with an iterator and working with the actual collection?
1 Answer
An Iterator isn't an actual data structure, although it behaves similarly to one. It is just a traversal pointer into some actual data structure. Thus, unlike an actual data structure, an Iterator can't "go back," that is, access old elements. Once you've gone through an Iterator, you're done.
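As a minimal sketch of the single-pass behaviour described above (the object and value names are made up for illustration), the snippet below drains an Iterator once and then checks that nothing is left, while a List can be traversed repeatedly:

```scala
object SinglePassDemo {
  // An Iterator is consumed as it is traversed.
  val it: Iterator[Int] = Iterator(1, 2, 3)
  val firstPass: List[Int] = it.toList // reads every element
  val exhausted: Boolean = !it.hasNext // true: nothing left to read

  // A List is a real collection and can be traversed as often as needed.
  val list: List[Int] = List(1, 2, 3)
  val sumOnce: Int = list.sum
  val sumAgain: Int = list.sum
}
```

If the elements are needed more than once, they have to be copied into a collection first (for example with toList), which is exactly the cost the question is asking about.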
What's cool about Iterator is that you can give it a map, filter, or other transformation, and instead of actually modifying any existing data structure, it will apply the transformation the next time you ask for an element.
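A small sketch of that laziness, assuming a hypothetical call counter to make the evaluation order visible: the function passed to map does not run when the pipeline is built, only when an element is requested.

```scala
object LazyMapDemo {
  // Counts how often the mapping function actually runs (illustration only).
  var calls: Int = 0

  val source: Iterator[Int] = Iterator(1, 2, 3, 4)

  // Building the pipeline does no work yet.
  val doubledEvens: Iterator[Int] =
    source.filter(_ % 2 == 0).map { x => calls += 1; x * 2 }

  val callsBeforeNext: Int = calls    // still 0: nothing has been evaluated
  val first: Int = doubledEvens.next() // pulls 2 through the filter and doubles it
  val callsAfterNext: Int = calls     // 1: exactly one element was transformed
}
```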
"Concatenating" two Iterators creates a new Iterator that wraps both of them.
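To illustrate the difference, here is a rough comparison (names are illustrative). Concatenating Iterators with ++ only builds a wrapper that reads from the first and then the second; concatenating immutable Lists with ::: builds a new List, which as far as I know copies the cells of the left operand and shares the right one.

```scala
object ConcatDemo {
  val a: Iterator[Int] = Iterator(1, 2)
  val b: Iterator[Int] = Iterator(3, 4)

  // No element is read or copied here; `both` just wraps `a` and `b`.
  val both: Iterator[Int] = a ++ b
  val drained: List[Int] = both.toList // elements are read only now

  val xs: List[Int] = List(1, 2)
  val ys: List[Int] = List(3, 4)

  // Builds a new List up front; cost grows with the length of `xs`.
  val joined: List[Int] = xs ::: ys
}
```

Both approaches yield the same elements in the same order; they differ in when the work happens and in whether the result can be traversed again.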
On the other hand, Lists are actual collections and can be re-traversed.
It's not so much about advantages vs. disadvantages in an abstract sense. It's about the differences, which only become advantages and disadvantages depending on what you want to do.
Streams are also lazy, but unlike Iterators they can be retraversed. The question is, what is it you want to do?
– ubadub
Sep 16 '18 at 18:22
Actually, I was just wondering because I have to concatenate lists in my Spark code, and Spark itself works with iterators. When you finish a mapPartitions, you should return an iterator over your data, so I transform back to an iterator. I initially convert the Spark iterator to a list because I need to batch the processing of each partition anyway. By batching I mean I have a service that takes a batch of data and produces an output; I can't process element by element. The reason I am concatenating is that the batch processing is applied to groups within the partition.
– MaatDeamon
Sep 16 '18 at 18:43
In short, I transform the iterator over a partition into a list, group it, and apply a batch process to each group. The output of each batch process is actually a string of many JSON objects that I convert back to a list. Hence the output of the batch process for each group is a list. Then I concatenate all of them and produce an iterator, as Spark requires. I was just wondering whether I should put some effort into avoiding lists and stay in iterator land as much as possible. Granted, converting to a list traverses the whole partition, but that happens only once at the beginning, so the cost should be negligible.
– MaatDeamon
Sep 16 '18 at 18:48
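For what it's worth, the workflow described in the comment above could stay in iterator land if the groups are fixed-size chunks. This is only a sketch under that assumption: callBatchService is a hypothetical stand-in for the real batch service, and processPartition is a made-up helper name. Grouping by key instead of by chunk would still require materializing the partition.

```scala
object BatchDemo {
  // Hypothetical stand-in for the external batch service:
  // takes a batch of inputs and returns the outputs for that batch.
  def callBatchService(batch: Seq[String]): Seq[String] =
    batch.map(_.toUpperCase)

  // Chunks the partition lazily, calls the service once per chunk, and
  // flattens the per-batch outputs back into a single Iterator.
  // At most one chunk of `batchSize` inputs is held in memory at a time.
  def processPartition(rows: Iterator[String], batchSize: Int): Iterator[String] =
    rows.grouped(batchSize).flatMap(callBatchService)
}
```

A function with this shape (Iterator in, Iterator out) is what Spark's mapPartitions expects, so something like rdd.mapPartitions(rows => BatchDemo.processPartition(rows, 100)) should work, with the batch size of 100 being an arbitrary placeholder.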
A major difference is that an entire List must exist in memory, whereas an Iterator can generate one element at a time on the fly.
– Seth Tisue
Sep 17 '18 at 4:46
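A short sketch of that point (names are illustrative): an Iterator can describe an unbounded sequence, because each element is produced only when requested, whereas the equivalent List could never be built.

```scala
object OnTheFlyDemo {
  // An unbounded source; no List could hold all of these elements.
  val naturals: Iterator[Int] = Iterator.from(1)

  // Only the five requested elements are ever generated and squared.
  val firstSquares: List[Int] = naturals.map(n => n * n).take(5).toList
}
```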
So in short, the only advantage is that it is lazy?
– MaatDeamon
Sep 16 '18 at 18:20