Find first value not in std::set min-max

Find first value not in std::set<int> min-max



How do I find the first value h that is not in std::set<int> ids such that result is clamped to [0, *ids.rbegin() + 1] range.


h


std::set<int> ids


[0, *ids.rbegin() + 1]



I see this is rather simple problem, but I didn't find any matching question yet. Basically I want the inverted set of ids so that I can use them.


ids



So far I have following:


#incldue <set>
std::set<int> ids;
int h = 0;
for(; !ids.empty() && (h <= *ids.rbegin() + 1); ++h)
if(!ids.count(h))
break;


// h is now smallest value not in ids.



I suspect this be improved further such that e.g the loop is not required?



@edit: to clarify what values are in the set: In my use case the value generated by the algorithm is inserted into the set. I should have really said std::set<unsigned int>. I'm happy to so much discussion done on this question!


std::set<unsigned int>






With sorted std::vector, you would have random access and can check middle element and then check only left or right side. but you cannot access random element from a std::set...

– Jarod42
Sep 6 '18 at 20:02


std::vector


std::set






Btw, you can iterate on the std::set instead of doing linear look-up.

– Jarod42
Sep 6 '18 at 20:04


std::set






Don't you just need int h = ids.begin() - 1;?

– NathanOliver
Sep 6 '18 at 20:07


int h = ids.begin() - 1;






@NathanOliver: Wouldn't that result in -1 if 0 was in the set?

– Fred Larson
Sep 6 '18 at 20:14


-1


0






@NathanOliver: this would not detect 3 in a set 0,1,2,4,5

– Stephan Lechner
Sep 6 '18 at 20:18


3


0,1,2,4,5




7 Answers
7



Since a std::set's elements are sorted, you can use std::adjacent_find.


std::set


std::adjacent_find


std::adjacent_find(set.begin(), set.end(), (int a, int b) return a+1 != b; );



This will return an iterator to the first element, a, which is not followed by the value a+1. Or set.end() if there is no such value.


a


a+1


set.end()



Sample usage:


std::set<int> ids -2, -1, 0, 1, 2, 4, 5, 6 ;

// This code assumes a non-empty set, since the range in the question requires same
if ( *ids.begin() > 0 )

// Special case for [0
std::cout << 0;

else

auto res = std::adjacent_find(ids.begin(),
ids.end(),
(int a, int b) return a+1 != b; );
if ( res == ids.end() )

// Special case for *ids.rbegin() + 1]
std::cout << *ids.rbegin() + 1;

else

// Print the value that should have followed this one.
std::cout << *res + 1;




Output:


3






Elegant approach.

– Stephan Lechner
Sep 6 '18 at 20:22






Thank you, @StephanLechner. That particular algo seems to be one that gets lost in the back of the drawer and forgotten.

– Drew Dormann
Sep 6 '18 at 20:30






For 1, 2, 3, 5, you return 4 instead of 0.

– Jarod42
Sep 6 '18 at 20:41


1, 2, 3, 5


4


0






From range [0, *ids.rbegin() + 1] or simply [0, ids.size()], it is guarantied that set has at least one missing number.

– Jarod42
Sep 6 '18 at 20:53


[0, *ids.rbegin() + 1]


[0, ids.size()]


set






Not sure negative numbers have to be handled, but if it is -3, 0, 1, 2, 4 return -2 instead of 3 (as I understand expectation). And if it is not, your sample is misleading.

– Jarod42
Sep 6 '18 at 20:59



-3, 0, 1, 2, 4


3


std::set<int> ids;
int h = 0;
for( auto id : ids )
if( id != h )
break;
h++;






Results with 0 when -1 and 0 are in set.

– Öö Tiib
Sep 7 '18 at 4:48




std::set is not optimized for this problem.


std::set



Naive approaches both give you O(n) performance and bad O(n) performance because you are walking a node based datastructure.



What you want is a sorted "leapable" datastructure (be it some kind of tree, or a skip list) where the size of the leaps is recorded.



Then you can track the delta-id and the size of the leap; if the leap is smaller than the id difference, there is an empty id in there. If not, there is no empty id in there.



None of the associative containers in std keep track of the information you need, and retrofitting them isn't practical as you are not given access to the "leap" based iteration or structure.



With such a datastructure, insert and remove is O(lgn) as is "find first unused id". Without it, find first unused id is O(n).



That involves some heavy codesmithing. We can do almost as good, with higher constant costs, by simply storing a set of ranges and wrapping it to guarantee that the ranges don't overlap.


struct range
int base = 0;
int length = 0;
friend bool operator<( range lhs, range rhs )
if (lhs.base != rhs.base) return lhs.base < rhs.base;
return lhs.length < rhs.length;

bool operator()(int x) const
ERROR( if (length < 0) std::cout << "bad lengthn"; )
ERROR( if (base < 0) std::cout << "bad basen"; )
return (x>=base) && (x<(base+length));

;



range is a half-open interval going from [base, base+length). Thus [x,0) is an empty range for all x and [x,1) just contains x.


range


[base, base+length)


[x,0)


x


[x,1)


x



It is ordered by base. If you ask for the lower bound of an ordered collection on [x,0)


base


[x,0)



Now we make a std::set<range> and wrap it up:


std::set<range>


struct id_set
bool taken(int x) const;
int find_unused() const;
void take(int x);
void recycle(int x);
int take_unused()
auto r = find_unused();
take(r);
return r;

std::size_t count() const
std::size_t r = 0;
for (auto e:state)
r += e.length;
ERROR( if (r!=counter) std::cout << "Counter failuren"; )
return r;

private:
std::set<range> state;
using iterator = std::set<range>::iterator;
iterator get_interval(int x) const;
std::size_t counter = 0;
;

id_set::iterator id_set::get_interval(int x) const
auto start = state.lower_bound( x,0 );
if (start != state.end())
if ((*start)(x))
return start;

if (start == state.begin() )
return state.end();

auto prev = std::prev(start);
if ((*prev)(x))
return prev;

return state.end();

bool id_set::taken(int x) const
return get_interval(x) != state.end();


int id_set::find_unused() const
auto it = state.begin();
if (it == state.end()) return 0;
auto r = it->base + it->length; // we ensure these intervals are never adjacent; thus the element after the first interval is untaken
ERROR( if (taken(r)) std::cout << "find_unused failedn"; )
return r;


void id_set::take(int x)
// return an id:
void id_set::recycle(int x)
auto it = get_interval(x);
if (it == state.end()) return; // nothing to do
--counter;

// create two intervals, one before and one after:
auto lhs = *it;
lhs.length = x-lhs.base;
auto rhs = *it;
rhs.base = x+1;
rhs.length -= lhs.length+1;
// remove this interval:
state.erase(it);
// insert either non-zero length interval:
if (lhs.length > 0)
state.insert(lhs);
if (rhs.length > 0)
state.insert(rhs);
ERROR( if (taken(x)) std::cout << "recycle failedn"; )



there are probably typos above. But the core idea is that take and recycle are both O(lgn) operations, as is find_unused. Thus take_unused is also O(lgn).


take


recycle


find_unused


take_unused



Live example



The elements in the set are sorted. Hence, when you iterate the elements, you just have to detect the first "leak" (e.g. if the value of the set is larger than a linear increasing integral value). You can define where to start, e.g. you could write h=1 if you want to ignore value 0, and also the corner case of a set of continuous numbers is covered (result is then max+1):


h=1


0


int main()
std::set<int> mySet = 5,4,6,2,1,0;
int h=0;
for(auto val : mySet)
if (val != h)
break;

h++;


cout << "not in mySet:" << h << endl;

return 0;



Your version is O(n log(n)).

You can have it in linear time (O(n)):


O(n log(n))


O(n)


int h = 0;
for (const int e : ids)
if (e != h)
return h;

++h;

return h;



If ids can contains negative numbers, then change the loop to:


ids


int h = 0;
for (auto it = ids.lower_bound(0); it != ids.end(); ++it)
if (*it != h)
return h;

++h;

return h;



With sorted vector, you can even reduce complexity to O(log(n)).


O(log(n))



If you insist on using std::set to store a set of integers, then unfortunately linear traversal (suggested in other answers) is the best option.


std::set



With a small modification to your search tree you can achieve O(log n) search for the smallest missing integer while retaining all the other properties - you just need to store the amount of entries smaller than the key for each tree node. Not sure whether you can implement it without rewriting the whole std::set though.

UPD: turns out you can - check the answer by @Yakk - Adam Nevraumont


O(log n)


std::set



In fact, by caching the smallest value after each operation on your container, you can get it to "find" your number in O(1), which is asymptotically unbeatable. Of course, it will not gain you any actual performance improvement.


O(1)



Finally, if there are any additional limitations placed upon your integers, you can sometimes come up with some clever trick (usually happens in coding tasks from the training websites) - e.g. if you know that there is exactly one number missing in a range, you can calculate it arithmetically in O(1).


O(1)



I already accepted @Drew Dormann's answer. However I modified the algo such that I can to allow inserting the values into std::list<int> keeping the list sorted.
I also made it a generic template function:


std::list<int>


#include <algorithm>
#include <list>
#include <iterator>
#include <iostream>
// Make smallest sequential val that is not in c.
// returns [val, *c.rbegin() + 1] and suitable iterator to insert val into c.
template<typename Cont>
void make_minimal_id(const Cont & c, typename Cont::value_type & val,
typename Cont::const_iterator & x)

if( c.empty())
x = c.begin();
else if ( *c.begin() > val )
x = c.begin();
val = *x - 1;
else
// "one algo that gets lost in the back of the drawer and forgotten."
x = std::adjacent_find(c.begin(), c.end(),
(typename Cont::value_type a, typename Cont::value_type b)
return a + 1 != b; );
if ( x == c.end() )
val = *c.rbegin() + 1;
else
val = *x++ + 1;



// pretty print container
template<typename C>
void print_cont(const C & c)
std::cout << std::accumulate(std::next(c.begin()), c.end(),
std::to_string(c.front()), // start with first element
(std::string a, int b)
return a + ", " + std::to_string(b); ) << std::endl;


int main()
std::list<unsigned int> ids = 3, 4, 5, 6, 9 ;
print_cont(ids);
for(int i = 0; i < 6; ++i)
unsigned int id = 0;
std::list<unsigned int>::iterator x;
make_minimal_id(ids, id, x);
ids.insert(x, id);
print_cont(ids);

// prints:
//3, 4, 5, 6, 9
//2, 3, 4, 5, 6, 9
//1, 2, 3, 4, 5, 6, 9
//0, 1, 2, 3, 4, 5, 6, 9
//0, 1, 2, 3, 4, 5, 6, 7, 9
//0, 1, 2, 3, 4, 5, 6, 7, 8, 9
//0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
return 0;



make_minimal_id() now works for any sorted container and value. :-)


make_minimal_id()



Thanks for contributing an answer to Stack Overflow!



But avoid



To learn more, see our tips on writing great answers.



Required, but never shown



Required, but never shown




By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

Popular posts from this blog

𛂒𛀶,𛀽𛀑𛂀𛃧𛂓𛀙𛃆𛃑𛃷𛂟𛁡𛀢𛀟𛁤𛂽𛁕𛁪𛂟𛂯,𛁞𛂧𛀴𛁄𛁠𛁼𛂿𛀤 𛂘,𛁺𛂾𛃭𛃭𛃵𛀺,𛂣𛃍𛂖𛃶 𛀸𛃀𛂖𛁶𛁏𛁚 𛂢𛂞 𛁰𛂆𛀔,𛁸𛀽𛁓𛃋𛂇𛃧𛀧𛃣𛂐𛃇,𛂂𛃻𛃲𛁬𛃞𛀧𛃃𛀅 𛂭𛁠𛁡𛃇𛀷𛃓𛁥,𛁙𛁘𛁞𛃸𛁸𛃣𛁜,𛂛,𛃿,𛁯𛂘𛂌𛃛𛁱𛃌𛂈𛂇 𛁊𛃲,𛀕𛃴𛀜 𛀶𛂆𛀶𛃟𛂉𛀣,𛂐𛁞𛁾 𛁷𛂑𛁳𛂯𛀬𛃅,𛃶𛁼

Edmonton

Crossroads (UK TV series)