Comparison of these two algorithms?

Comparison of these two algorithms?



So I'm presented with a problem that states. "Determine if a string contains all unique characters"



So I wrote up this solution that adds each character to a set, but if the character already exists it returns false.


private static boolean allUniqueCharacters(String s)

Set<Character> charSet = new HashSet<Character>();
for (int i = 0; i < s.length(); i++)
char currentChar = s.charAt(i);
if (!charSet.contains(currentChar))
charSet.add(currentChar);

else
return false;



return true;




According to the book I am reading this is the "optimal solution"


public static boolean isUniqueChars2(String str)
if (str.length() > 128)
return false;

boolean char_set = new boolean[128];

for (int i = 0; i < str.length(); i++)
int val = str.charAt(i);

if (char_set[val])
return false;

char_set[val] = true;


return true;



My question is, is my implementation slower than the one presented? I assume it is, but if a Hash look up is O(1) wouldn't they be the same complexity?



Thank you.





Well, the book "solution" fails in Java because characters are 16 bits and can have a value from 0 to 65535, so a 128-byte array won't work.
– Jim Garrison
Aug 21 at 7:43


0


65535





They are of same complexity. Not necessarily of same speed. Carrying 10 vases up a staircase has the same complexity as carrying 10 pianos up a staircase, but the effort involved is somewhat different. Hash O(1) is slower than array O(1), given that hash needs to, at least, calculate the hash. Also, hash is amortised O(1), not true O(1); it does not matter complexity-wise, but it does affect speed.
– Amadan
Aug 21 at 7:43






Apples and oranges. Your implementation is more general since it doesn't assume this is pure ascii. One performance improvement you can do is getting rid of the contains operation and simply comparing the size of the set and the string. As soon as they start to differ, return false.
– Jilles van Gurp
Aug 21 at 8:54






I suggest throwing the book away if the author thinks there are only 128 characters
– phuclv
Aug 21 at 9:33





To be fair your algorithm calculates the hash twice for the contains() and the add(). The add() method returns whether the element was already contained so no need for the extra contains() check.
– Nathan Adams
Aug 21 at 9:43




6 Answers
6



As Amadan said in the comments, the two solutions have the same time complexity O(n) because you have a for loop looping through the string, and you do constant time operations in the for loop. This means that the time it takes to run your methods increases linearly with the length of the string.



Note that time complexity is all about how the time it takes changes when you change the size of the input. It's not about how fast it is with data of the same size.



For the same string, the "optimal" solution should be faster because sets have some overheads over arrays. Handling arrays is faster than handling sets. However, to actually make the "optimal" solution work, you would need an array of length 2^16. That is how many different char values there are. You would also need to remove the check for a string longer than 128.


char



This is one of the many examples of the tradeoff between space and time. If you want it to go faster, you need more space. If you want to save space, you have to go slower.



Both algorithms have time complexity of O(N). The difference is in their space complexity.



The book's solution will always require storage for 128 characters - O(1), while your solution's space requirement will vary linearly according to the input - O(N).


O(1)


O(N)



The book's space requirement is based on an assumed character set with 128 characters. But this may be rather problematic (and not scalable) given the likelihood of needing different character sets.





in an interview would my solution be unacceptable or trivial?
– fsdff
Aug 21 at 7:54





@fsdff that will depend entirely on what the interviewer expects, so it's very subjective. But your solution seems very normal/common to me.
– ernest_k
Aug 21 at 7:57





I don't see how the OP's solution would be O(N). When all possible keys have been entered in the hashmap, it will not grow anymore. The required space is O(1) as for the array.
– Yves Daoust
Aug 21 at 8:23





@YvesDaoust You're assuming some character set, that makes sense... except that the point of complexity analysis is getting an idea of worst-case scenarios. And whether the character set is finite or not, it's a fact in this case that space complexity depends on input, and no explicit mention was made about the size of the character set.
– ernest_k
Aug 21 at 8:30





@ernest_k: whatever the alphabet size, let M, there is no difference between the space complexities: both O(M) (or O(M Log M) if you want to be picky), contrary to what you say. And no dependency on the size of the input.
– Yves Daoust
Aug 21 at 8:33




The hashmap is in theory acceptable, but is a waste.



A hashmap is built over an array (so it is certainly more costly than an array), and collision resolution requires extra space (at least the double of the number of elements). In addition, any access requires the computation of the hash and possibly the resolution of collisions.



This adds a lot of overhead in terms of space and time, compared to a straight array.



Also note that it is kind of folklore that a hash table has an O(1) behavior. The worst case is much poorer, accesses can take up to O(N) time for a table of size N.



As a final remark, the time complexity of this algorithm is O(1) because you conclude false at worse when N>128.





While the worst case for individual accesses to a hash table is worse than O(1), this is entirely irrelevant here because the algorithm performs Θ(n) accesses, not a single one. Unless a piss-poor hash algorithm or collision avoidance scheme is used, the expected runtime of the algorithm is very much still O(n). And I’m curious how you intend to improve the asymptotic space complexity, which you’ve omitted to explain in your answer.
– Konrad Rudolph
Aug 21 at 13:58






@KonradRudolph: for instance a hash function returning a constant. In this question, there is nothing that can be improved asymptotically. I am just saying that hashing brings no practical benefit, on the opposite.
– Yves Daoust
Aug 21 at 14:03





But this isn’t the hash function being used here, and this isn’t how the complexity of the hash table operations are usually (or usefully) calculated. But OK, you say hashing brings no practical benefits; in that case please enlighten me how you’d implement this algorithm for a Unicode alphabet.
– Konrad Rudolph
Aug 21 at 14:08






@KonradRudolph: the question is about ASCII.
– Yves Daoust
Aug 21 at 14:17





It was my impression that the comments under the question established that the question was not about ASCII, and that, rather, the answer given in the book is simply wrong. At any rate, ASCII contains 127 valid characters, not 128 so at best the answer in the book is still incompetent.
– Konrad Rudolph
Aug 21 at 14:21




Your algorithm is also O(1). You can think about complexity like how my algorithm will react to the change in amount of elements processed. Therefore O(n) and O(2n) are effectively equal.


O(1)


how my algorithm will react to the change in amount of elements processed


O(n)


O(2n)



People are talking about O notation as growth rate here





Not O(1). Definitely. Unless there is an upper bound on input size.
– user202729
Aug 21 at 9:27



Your solution is could indeed be slower than the book's solution. Firstly, a hash lookup ideally has a constant time lookup. But, the retrieval of the object will not be if there are multiple hash collisions. Secondly, even if it is constant time lookup, there is usually significant overhead involved in executing the hash code function as compared to looking up an element in an array by index. That's why you may want to go with the array lookup. However, if you start to deal with non-ASCII Unicode characters, then you might not want to go with the array approach due to the significant amount of space overhead.



The bottleneck of your implementation is, that a set has a lookup (and insert) complexity* of O(log k), while the array has a lookup complexity in O(1).


O(log k)


O(1)



This sounds like your algorithm must be much worse. But in fact it is not, as k is bounded by 128 (else the reference implementation would be wrong and produce a out-of-bounds error) and can be treated as a constant. This makes the set lookup O(1) as well with a bit bigger constants than the array lookup.


k


128


O(1)



* assuming a sane implementation as tree or hashmap. The hashmap time complexity is in general not constant, as filling it up needs log(n) resize operations to avoid the increase of collisions which would lead to linear lookup time, see e.g. here and here for answers on stackoverflow.


*


log(n)



This article even explains that java 8 by itself converts a hashmap to a binary tree (O(n log n) for the converstion, O(log n) for the lookup) before its lookup time degenerates to O(n) because of too many collisions.


O(n log n)


O(log n)


O(n)





I moved this discussion into chat.
– allo
Aug 21 at 14:50







By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

Popular posts from this blog

𛂒𛀶,𛀽𛀑𛂀𛃧𛂓𛀙𛃆𛃑𛃷𛂟𛁡𛀢𛀟𛁤𛂽𛁕𛁪𛂟𛂯,𛁞𛂧𛀴𛁄𛁠𛁼𛂿𛀤 𛂘,𛁺𛂾𛃭𛃭𛃵𛀺,𛂣𛃍𛂖𛃶 𛀸𛃀𛂖𛁶𛁏𛁚 𛂢𛂞 𛁰𛂆𛀔,𛁸𛀽𛁓𛃋𛂇𛃧𛀧𛃣𛂐𛃇,𛂂𛃻𛃲𛁬𛃞𛀧𛃃𛀅 𛂭𛁠𛁡𛃇𛀷𛃓𛁥,𛁙𛁘𛁞𛃸𛁸𛃣𛁜,𛂛,𛃿,𛁯𛂘𛂌𛃛𛁱𛃌𛂈𛂇 𛁊𛃲,𛀕𛃴𛀜 𛀶𛂆𛀶𛃟𛂉𛀣,𛂐𛁞𛁾 𛁷𛂑𛁳𛂯𛀬𛃅,𛃶𛁼

Edmonton

Crossroads (UK TV series)