Heap implementation for numeric types

Heap implementation for numeric types



I am trying to improve the quality of my code as well as trying to study the Heap data structure. I have implemented a minHeap (Heap in which minimum value nodes have higher priority). Ref: link1, link2, link3. This is the commented code:


Heap


#include <iostream>
#include <vector>
#include <limits>

// Class for throwing `out of range` exceptions
class OutofRange

std::string error = "The index is out of range.";
public:
OutofRange()



std::string what()

return error;

;

// Class for MinHeap. The `arr` holds the heap elements. The `capacity` indicates total
// allocated space for the heap. The `length` indicates size of the heap.
template<typename T>
class MinHeap

T* arr;
size_t capacity;
size_t length;
public:
MinHeap(size_t);
T get_min();
void heapify(int);
void insert(T);
T delete_min();
T delete_by_index(int);
void print_heap();
;

// Constructor for the heap
template<typename T>
MinHeap<T>::MinHeap(size_t n)

arr = new T[n];
capacity = n;
length = 0;


// Return the minimum node (root)
template<typename T>
T MinHeap<T>::get_min()

if(length > 0)

return arr[0];

else

std::cout << "Error: Invalid access" << std::endl;
throw OutofRange;



// Function to put a node at the right position if it has the possibility to move up
// in the heap
template<typename T>
void MinHeap<T>::heapify(int start)

for(int i = start; i > 0; i = (i - 1) / 2)

if(arr[i] < arr[(i - 1) / 2])

std::swap(arr[i], arr[(i - 1) / 2]);

else

return;





// Inserts nodes to heap
template<typename T>
void MinHeap<T>::insert(T data)

// Check if capacity is reached
if(length == capacity)

std::cout << "Error: Heap capacity reached" << std::endl;
return;


// Insert the node
arr[length] = data;
this -> heapify(length);
length++;


// Deletes the minimum value node (root)
template<typename T>
T MinHeap<T>::delete_min()

// Check if heap has any elements/ nodes
if(length == 0)

std::cout << "Error: Invalid Access" << std::endl;
throw OutofRange;


// Replace minimum node (root) with the last node
T del_min = arr[0];
std::swap(arr[0], arr[length - 1]);
arr[length - 1] = (T)NULL;
length--;
// Bring the new root to correct position based on value
for(size_t i = 0; i < length - 1 && length > 0;)

int min_child;
if(2 * i + 1 > length - 1)

break;

if(2 * i + 2 > length - 1)

min_child = 2 * i + 1;

else

min_child = arr[2 * i + 1] <= arr[2 * i + 2] ? 2 * i + 1 : 2 * i + 2;

if(arr[i] > arr[min_child])

std::swap(arr[i], arr[min_child]);

i = min_child;

return del_min;


// Deletes a node by index
template<typename T>
T MinHeap<T>::delete_by_index(int index)

int del_val = arr[index];
T min_val = std::numeric_limits<T>::min();
arr[index] = min_val;
this -> heapify(index);
this -> delete_min();
return del_val;


// Print the heap
template<typename T>
void MinHeap<T>::print_heap()

std::cout << "The Min Heap: " << std::endl;
if(length == 0)

std::cout << "Empty" << std::endl;

for(size_t i = 0; i < length; i++)

std::cout << arr[i] << "t";

std::cout << std::endl;


int main()

try

MinHeap<int> h110;
int del_val;
std::vector<int> input = 35, 33, 42, 10, 14, 19, 27, 44, 26, 31;
// Insert input values to the heap
for(size_t i = 0; i < input.size(); i++)

h1.insert(input[i]);
h1.print_heap();


// Check if minimum value/ highest priority can be accessed
std::cout << "Current Root: " << h1.get_min() << std::endl;

// Delete by index test case
del_val = h1.delete_by_index(2);
std::cout << "Delete by index: " << del_val << std::endl;
h1.print_heap();

// Delete the minimum value
for(size_t i = 0; i < input.size(); i++)

del_val = h1.delete_min();
std::cout << "Deleted value: " << del_val << std::endl;
h1.print_heap();


// Uncomment to test exception in deletion if delete_by_index test case is not present.
//del_val = h1.delete_min();
//std::cout << "Deleted value: " << del_val << std::endl;

// Uncomment to test exception in accessing minimum value(root)
//h1.get_min();

// Catch the out of range exceptions
catch(OutofRange& e)

std::cout << e.what() << std::endl;

return 0;



Are heaps used for data types other than numeric? Even though my implementation uses templates, it does depend on numeric_limits. I am assuming that if the data type is not numeric, it can converted to a custom data type like shown below:


numeric_limits


struct custom_type

T val;
int priority;
custom_type(T v, int p)

val = v;
priority = p;

// Overloads for comparison operators which utilize priority field.
/*
-------
-------
*/
;



Is there a better way to handle out of range errors without using exceptions and not returning a value such as INT_MAX, INT_MIN or zero?


out of range


INT_MAX


INT_MIN



What could be other possible improvements in the code?





I realize that implementing a heap is a good learning exercise, but if you ever want a better implementation: std::make_heap et al. are pretty good.
– Rakete1111
Aug 21 at 8:47


std::make_heap




3 Answers
3



Memory management



You never delete arr;, which leaks memory. Not a good thing! There are multiple ways to fix this:


delete arr;



Adding correct calls to delete arr; in the right places (remember exceptions, assignments and so on!), which is rather bug-prone.


delete arr;



Also, new T[size] default-constructs size objects of type T in the contiguous memory. This requires T to be default constructible and might be expensive, depending on T and size.


new T[size]


size


T


T


T


size



Use a std::unique_ptr<T> arr instead. This will automatically clean up memory, but you still need to keep track of size, capacity etc.


std::unique_ptr<T> arr



Still has the default-construction issue from above.



Use a std::unique_ptr<std::aligned_storage<sizeof(T), alignof(T)>>. Similar to above, but without the default-construction issue. Requires placement new, though.


std::unique_ptr<std::aligned_storage<sizeof(T), alignof(T)>>



Use a std::vector<T> (or similar container) and let it handle all that memory management stuff. Easiest solution, and hard to screw up. It also makes adding additional features easier, like growing capacity.


std::vector<T>



I'd suggest using the last option, unless you have some really strong reasons for not using std::vector.


std::vector



Design



At the very lowest level, one could extract the comparison operation and the actual container and write the heap logic independently of those. This would allow to reuse the same implementation for many different orderings (min-heap, max-heap, ...) and different underlying storage.


heap



heapify is an internal helper function, and thus shouldn't be accessible to the public.


heapify



print_heap might be a handy debugging utility function, but doesn't seem to fit into the finished product.


print_heap



How would someone effectively use delete_by_index? An outsider doesn't have access to MinHeap::arr, so there isn't a good way to determine which indices might contain a value that ought to be removed.


delete_by_index


MinHeap::arr



Most of the remaining public facing member functions have names that differ from the usual standard library ones. Examples:


public


get_min


top


front


delete_min


pop


insert


push


emplace



A revised class design could look something like this:


template<typename T, typename Container = std::vector<T>, typename Comparer = std::less<typename Container::value_type>>
class heap
Container storage;
Comparer compare;

public:
heap() = default;

void push(const T& value);
void push(T&& value);

template<typename... Args>
void emplace(Args&&... args);

void pop();

const T& top() const; // could return T, but would be expensive for large T

typename Container::size_type size() const;
bool empty() const;

private:
void heapify();
void bubble_down(typename Container::iterator pos); // could be std::size_t or std::ptrdiff_t instead
;



Some of those member functions could be marked noexcept (or conditionally noexcept), depending on their actual implementation.


noexcept


noexcept



Implementation



An "out of range" exception is so common that it got included in the <stdexcept> standard library header: std::out_of_range. This could be used instead of OutOfRange.


<stdexcept>


std::out_of_range


OutOfRange



There are lots of random debug messages which get printed to std::cout. Either they aren't needed at all, or an exception would convey much better to the caller that something went wrong.


std::cout



For example, instead of std::cout << "Error: Heap capacity reached" << std::endl; return; (taken from MinHeap::insert), throwing an exception like throw std::out_of_range "Max heap capacity reached!" ; would tell the caller that something went wrong.


std::cout << "Error: Heap capacity reached" << std::endl; return;


MinHeap::insert


throw std::out_of_range "Max heap capacity reached!" ;



(T)NULL might not work for non-numeric types T. If a default value is wanted, T could be used instead.


(T)NULL


T


T



In many cases there is no need for the costly heapify operation. A simpler bubble_down operation (like that loop inside delete_min) that only checks one branch would often suffice, reducing the complexity of those calls from $mathcalO(n)$ to $mathcalO(log n)$.


heapify


bubble_down


delete_min



In delete_min, there are two patterns that are repeated quite a lot: 2 * i + 1 and 2 * i + 2. Would it hurt to give them names, like left_child and right_child?


delete_min


2 * i + 1


2 * i + 2


left_child


right_child



The implementer for type T might provide its own free-standing swap(T&, T&) function, which usually is better than std::swap(T&, T&). To use that one if it exists, you could add using std::swap; at the top of each function where std::swap is used, and then replace all calls to std::swap with just swap. (That way, if a type-specific swap function exists, it will be a better match than std::swap. std::swap will be used as fallback.)


T


swap(T&, T&)


std::swap(T&, T&)


using std::swap;


std::swap


std::swap


swap


swap


std::swap


std::swap



The bubble-down loop in delete_min can exit early if the condition arr[i] > arr[min_child] is not true.


delete_min


arr[i] > arr[min_child]



Try to std::move values instead of copying them if a strict copy isn't necessary. (Hint: None are in this implementation.) This allows the heap to contain move-only types (like std::unique_ptr), which cannot be copied.


std::move


std::unique_ptr



Q & A



There are many uses for heaps (or as the standard library calls them: Priority queues), and more often than not they don't operate on pure numbers. Most likely you'll have some objects ordered by a priority data member or by some timestamp. Relying on std::numeric_limits thus isn't a good idea.


priority


std::numeric_limits



For deleting a random element, it would be better to swap it with the last element, and then bubble that one up or down as needed.



If I understand the question correctly, you seem to want to ask "How can I fail gracefully if I have to return a value?". The simple answer is: You can't. If you need to return a value, that value needs to come from somewhere, and special values like INT_MIN, INT_MAX, 0, -1 or T just aren't expressive enough since they introduce ambiguities and require special checks by the caller.


INT_MIN


INT_MAX


0


-1


T



In some cases, if "no result" is an actually expected return value, you can use std::optional<T> for that, which makes those checks more explicit.


std::optional<T>



In other cases, there is no need for returning a value. For example, delete_min doesn't need to return a value (the same value could have been retrieved by calling get_min beforehand). If delete_mins return type got changed to void, it could fail gracefully (not throwing an exception) if the heap were empty.


delete_min


get_min


delete_min


void



Generally, try to not return values from state modifiers (e.g. insertion or deletinon operations), unless those values are an indirect result of those state modifiers themselves (e.g. an iterator pointing to the newly inserted element/to the element after the removed element).



See above.





@skr_robo: I hope I guessed the intent behind the second question correctly. If not, please let me know, so I can better address it.
– hoffmale
Aug 21 at 5:53





Thank you for taking time to write a very detailed answer. I will correct them one by one and post additional questions/ comments if required. Your answer to the second question was spot on. So, thank you!
– skr_robo
Aug 21 at 8:55



An addendum to hoffmale's excellent answer (read and upvote that one first!):



Prefer to initialize members, rather than assign them during construction:


template<typename T>
MinHeap<T>::MinHeap(size_t n)
: arrnew T[n],
capacityn,
length0




That's clearer, and allows compiler warnings such as -Weffc++ let you know if you missed any.


-Weffc++



Consider providing a constructor that accepts a std::initializer_list, which can be much more efficient than adding elements one at a time:


std::initializer_list


template<typename T>
MinHeap<T>::MinHeap(std::initializer_list<T> values)
: MinHeap(values.size())

std::copy(values.begin(), values.end(), arr);
length = values.size();
heapify(0);



std::size_t is consistently misspelt. Either add using std::size_t;, or (better, for a header) write it in full, throughout.


std::size_t


using std::size_t;



get_min() and print_heap() shouldn't modify the heap, so declare them const.


get_min()


print_heap()


const



No need to explicitly qualify calls to other members using this-> - that's just unnecessary clutter.


this->



If you must print error messages, write them to the error stream, rather than mixing them into the output:


catch(OutofRange& e)

std::cerr << e.what() << std::endl;



delete_by_index assigns a T to int in this line:


delete_by_index


T


int


int del_val = arr[index];



That looks like it was meant to be


const T del_val = arr[index];



or perhaps simply


const auto del_val = arr[index];



Also, this function accepts its index as an int, where std::size_t is more appropriate - that's also the case for heapify.


int


std::size_t


heapify



Once delete_by_index() is fixed to not need std::numeric_limits, then T is no longer constrained to be an arithmetic type, and we should be able to use any type with a working < operator (e.g. std::string) as the value type. In fact, the test program (which doesn't exercise delete_by_index()) works with std::string once the other minor bugs are fixed.


delete_by_index()


std::numeric_limits


T


<


std::string


delete_by_index()


std::string



In the test program, let's make input a const value, and use range-based for to iterate over it:


input


const


for


auto const input = "35", "33", "42", "10", "14",
"19", "27", "44", "26", "31";


// Insert input values to the heap
for (auto const& i: input)

h1.insert(i);
h1.print_heap();



// Delete the minimum value
for (auto const& i: input)

del_val = h1.delete_min();
std::clog << "Deleted value: " << del_val << std::endl;
h1.print_heap();



Finally, consider using a unit-test framework to exercise the code and check that the correct results or exceptions are produced. Done correctly, that can give a more complete test suite, allowing refactoring to be done with higher confidence against introducing new bugs. And you don't have to inspect reams of output to be sure that the tests have passed.





Thank you for pointing out some excellent points. I will take care of them one by one and post additional questions/ comments if required.
– skr_robo
Aug 21 at 8:57





I didn't put too much effort into fixing delete_by_index since I cannot see an easy way to get that index in any meaningful manner. However, your suggestions should be nice for a remove or remove_if like function. // I'm careful with general advice against this-> inside templates, since it might actually be required in some cases (e.g. refering to a member of a dependent base class). Not the case here, just FYI.
– hoffmale
Aug 21 at 9:41


delete_by_index


remove


remove_if


this->





@hoffmale: Likewise, I only looked at delete_by_index when it failed to compile (with T = std::string) - I don't see that it has any value to a user. Thanks for the reminder of where this-> can be required (I generally lean on the compiler to alert me to those).
– Toby Speight
Aug 21 at 9:45


delete_by_index


T = std::string


this->


class OutofRange {
std::string error = "The index is out of range.";
public:
OutofRange()



If no user-declared constructors of any kind are provided for a class type (struct, class, or union), the compiler will always declare a default constructor as an inline public member. If that implicitly-declared default constructor is odr-used, then the compiler will implicitly provide a definition with the same effect as a user-defined constructor with an empty body and empty initializer list. That is, if your default constructor does nothing, either =default the user-provided explicit declaration or omit the explicit-declaration and let the compiler implicitly generate it.


struct


class


union


inline public


=default


// Implicitly-declared default constructor
class OutofRange
std::string error = "The index is out of range.";
public:
std::string what() return error;


// Explicitly-declared default constructor that's been =default'd
class OutofRange
std::string error = "The index is out of range.";
public:
OutofRange() = default;
std::string what() return error;



The standard library provides plenty of exception types (<stdexcept>) you could be using.


<stdexcept>



Exception Hierarchy



If you want to specify your own type, derive from std::logic_error or std::runtime_error or any of their derived types. For your program, std::out_of_range already exists, so you could use that. Maybe you want something more specific, like popped_on_empty,


std::logic_error


std::runtime_error


std::out_of_range


popped_on_empty


struct popped_on_empty : public std::out_of_range
using std::out_of_range::out_of_range;
;


public:
void heapify(int);



Should this be part of the public api?


template<typename T>
MinHeap<T>::MinHeap(size_t n)

arr = new T[n];
capacity = n;
length = 0;



Make sure you namespace-qualify names. size_t is not guaranteed by the standard to exist while std::size_t is.


size_t


std::size_t



Don't rely on other headers to transitively include files your code is dependent on. Always include the exact header that you need. (<cstddef> for std::size_t, <string> for std::string)


<cstddef>


std::size_t


<string>


std::string



Your minHeap takes ownership of a dynamic allocation. You should follow the rule of five. Basically, whenever your class is managing some resource or responsibility, you should consider how the default compiler-generated behavior of the five special member functions operate on that object. The five special member functions you must consider are the copy constructor, move constructor, copy assignment operator, move assignment operator, and destructor.


minHeap



A better approach is to use a container to manage the memory for you. If the value semantics of a container behaves correctly with the compiler generated member functions, then you do not have to define them yourself. This is the considered the rule of zero.



When constructing objects, prefer initialization over assignment. If you have a constant value you are initializing a data member to (like length with zero), use the in-class member initializers.


length


class MinHeap {
T* arr;
std::size_t capacity = 0;
std::size_t length = 0;



If you are taking want to initialize a value at construction, use the constructor member initializer list. If a value is initialized at the data member and in the constructor init list, the compiler will initialize the value at the constructor init list.


MinHeap(std::size_t n)
: arr(new T[n]), capacity(n), length(0)

~MinHeap()
delete arr;


MinHeap(const MinHeap& other);
MinHeap(MinHeap&& other);
MinHeap& operator=(const MinHeap& other);
MinHeap& operator=(MinHeap&& other);



Avoid calling new/delete explicitly. Use std::make_unique<T> to create a std::unique_ptr<T> to manage the lifetime of arr. You still need to consider the five special member functions.


new


delete


std::make_unique<T>


std::unique_ptr<T>


arr


T MinHeap<T>::get_min() {
if(length > 0)
return arr[0];
else
std::cout << "Error: Invalid access" << std::endl;
throw OutofRange;



Should get_min() return a const T&? Should get_min() return the min for a const MinHeap? Consider applying const-correctness on member functions that only inspect and not mutate.


get_min()


const T&


get_min()


const MinHeap


const



Do not use else after the flow control has been interrupted by a jump statement (return, break, continue, and goto).


else


return


break


continue


goto



There is no need to print an error message here. Any information you want to inform the callee should be passed back through the exception. Let the callee report the error information to the console if they want to. Use a logging library if you need to track events.



Avoid std::endl. Passing the stream manipulator std::endl inserts a new line character into the stream and then flushes the stream. When working with buffered streams, this is often an unexpected side effect. Just stream the end line character ('n' or "n"). If you want to flush the stream as well, explicitly state your intent by streaming std::flush.


std::endl


std::endl


'n'


"n"


std::flush


template<typename T>
void MinHeap<T>::insert(T data) {
// Check if capacity is reached
if(length == capacity)
/* ... */

// Insert the node
arr[length] = data;
this -> heapify(length);
length++;



Don't state in comments what can be clearly stated in code. Compilers do not read comments. The added verbosity is not as precise as code and is not updated nearly as often/consistently. Comments should state what is done, not what is supposed to be done.


template<typename T>
void MinHeap<T>::insert(T data) {
// sort the underlying container using <, percolating the inserted element up the heap
// until it is inorder with its parent.



length is an unsigned integral type of at least 32-bits (std::size_t). MinHeap<T>::heapify(int) is expecting an a signed integral type. You have a narrowing conversion from std::size_t to int which could result in a possible loss of data.


length


std::size_t


MinHeap<T>::heapify(int)


std::size_t


int


template<typename T> T MinHeap<T>::delete_min() {
/* ... */
T del_min = arr[0];
std::swap(arr[0], arr[length - 1]);
arr[length - 1] = (T)NULL;



You have an accessor (MinHeap<T>::get_min()) that returns the minimum in the heap. Does MinHeap<T>::delete_min() also need to provide the minimum?


MinHeap<T>::get_min()


MinHeap<T>::delete_min()



NULL is a macro variable that represents the value 0. You are doing a C-style cast of 0 to type T. If T is a non-integral type, bad things will happen.


NULL


T


T



If std::swap is not specialized for T, you'll have a problem. You can read more about the std two-step here.


std::swap


T


for(size_t i = 0; i < length - 1 && length > 0;) {
int min_child;



All of your math in this loop has implicit conversions between std::size_t i and int min_child; from changing signedness to possible precision loss to signed/unsigned comparison.


std::size_t i


int min_child


template<typename T> void MinHeap<T>::print_heap() /* ... */



Should callers have access to the internal heapified array? I'm not convinced it's necessary. If you take the adapter approach for your heap class, in which some other container owns the memory, you can make the container a protected member accessible through inheritance.



1) Are heaps used for data types other than numeric?



Yes. Anything that can be ordered can be heap sorted. Uses that immediately spring to mind are:



Instead of using an extreme value to represent the priority, consider using a comparator that forces the value to sift up as the minimum (always returns true as being less than its parent).



2) Is there a better way to handle out of range errors without using exceptions and not returning a value such as INT_MAX, INT_MIN or zero?



There are many ways to handle the unexpected situation of popping an empty list. Here are a couple defensive design approaches that checks the preconditions and informs the user of an error:


nullopt


std::optional


std::variant


std::expected


expected



The caller is free to ignore the result type unless the [[nodiscard]] attribute is declared (C++17).


[[nodiscard]]



Alternatively, there is the design-by-contract approach - There is an assumption that all of the client components that invoke an operation will meet the preconditions specified as required for that operation. The pop operation requires the container not be empty and calling pop on an empty heap will attempt to pop as if there were an element there. The expected result is undefined behavior. Formal systems to enforce contracts are coming to C++. Boost.Contracts is currently in development. C++20 will provide expects, ensures, and assert attributes.


expects


ensures


assert



3) What could be other possible improvements in the code?



Besides what is listed, support for move operations. allocator, comparator, projections, container requirement conformance.





While I like the design-by-contract approach, I don't think it's the correct one to use for public interfaces. Blindly "trusting" user input gets us all kinds of security problems (SQL injections, Heartbleed, ...) or undefined behavior if unchecked. It might be a good choice to use internally, but that requires rigorous discipline to always check pre-conditions at the callsite (if necessary).
– hoffmale
Aug 21 at 10:20






@Snowhawk: Thank you for drawing attention to many details including std::endl, use of libraries for std::string and std::size_t , and the answer for error handling.
– skr_robo
Aug 21 at 10:35





@hoffmale it's the approach used in the standard library. std::vector::front on an empty vector is simply UB. The nice thing about UB is that you can have different semantics e.g. throwing in a debug build, vs no checking in release
– Caleth
Aug 21 at 12:37


std::vector::front





@Caleth: This approach is used in parts of the standard library, true. However, that's mostly in the name of "zero overhead" performance, and not ease of use. The tradeoff is spending thousands of programmer hours in debugging instead. // Having different semantics between debug and release builds can be a real pain, especially if there is a bug only showing in release mode or if some developer on a project starts relying on debug mode error-checking "features". (Basically, there are two different sets of contracts in debug and release mode.)
– hoffmale
Aug 22 at 8:58






By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

Popular posts from this blog

𛂒𛀶,𛀽𛀑𛂀𛃧𛂓𛀙𛃆𛃑𛃷𛂟𛁡𛀢𛀟𛁤𛂽𛁕𛁪𛂟𛂯,𛁞𛂧𛀴𛁄𛁠𛁼𛂿𛀤 𛂘,𛁺𛂾𛃭𛃭𛃵𛀺,𛂣𛃍𛂖𛃶 𛀸𛃀𛂖𛁶𛁏𛁚 𛂢𛂞 𛁰𛂆𛀔,𛁸𛀽𛁓𛃋𛂇𛃧𛀧𛃣𛂐𛃇,𛂂𛃻𛃲𛁬𛃞𛀧𛃃𛀅 𛂭𛁠𛁡𛃇𛀷𛃓𛁥,𛁙𛁘𛁞𛃸𛁸𛃣𛁜,𛂛,𛃿,𛁯𛂘𛂌𛃛𛁱𛃌𛂈𛂇 𛁊𛃲,𛀕𛃴𛀜 𛀶𛂆𛀶𛃟𛂉𛀣,𛂐𛁞𛁾 𛁷𛂑𛁳𛂯𛀬𛃅,𛃶𛁼

ャフサォクコ ケウ,コ,ワ メ,ロスョノ゙,クネ,フムカヤヲニ,エコ゚ツ ウイオン゙ケワサネォキモュキォウイノンコチ゚メヌナイゥフュ,カヒウネェ ネ,ホノケ,ムュキ ッボーミュハ,チ ツス ィ メウイマヤ,゙ウチ ヅ ロ,ォジヌェ ャヌット ェ,マャ,チナエヒネソキツテ トホヲヲミーァ

Node.js puppeteer - Use values from array in a loop to cycle through pages