Project 4: Merge Sort, Heaps, and HashMaps

(Return to homepage)

In Project 4, you'll implement Merge Sort, plus a selection of data structures: Heaps, Priority Queues (bonus!), and HashMaps.

Merge Sort is a well-known, optimal O(n log n) sort that uses the merge technique to join two sorted lists in O(n) time. Merge Sort builds on the observation that an empty or single-element list is already sorted. Empty and single-element lists therefore form the base case of the recursion, and the recursive case merges the results of the two recursive calls, propagating larger and larger sorted lists up the stack until the whole list is sorted.
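To make the recursion concrete, here is a minimal sketch of the technique (the signatures and names here are illustrative; the starter code's method stubs may differ):

```java
import java.util.ArrayList;
import java.util.List;

public class MergeSortSketch {
    // Merges two already-sorted lists into one sorted list in O(n) time.
    static <T extends Comparable<T>> List<T> merge(List<T> left, List<T> right) {
        List<T> result = new ArrayList<>();
        int i = 0, j = 0;
        while (i < left.size() && j < right.size()) {
            if (left.get(i).compareTo(right.get(j)) <= 0) {
                result.add(left.get(i++));
            } else {
                result.add(right.get(j++));
            }
        }
        // At most one of these loops adds any leftover elements.
        while (i < left.size())  result.add(left.get(i++));
        while (j < right.size()) result.add(right.get(j++));
        return result;
    }

    // Base case: empty or single-element lists are already sorted.
    // Recursive case: sort each half, then merge the two sorted halves.
    static <T extends Comparable<T>> List<T> mergeSort(List<T> list) {
        if (list.size() <= 1) {
            return list;
        }
        int mid = list.size() / 2;
        List<T> left  = mergeSort(new ArrayList<>(list.subList(0, mid)));
        List<T> right = mergeSort(new ArrayList<>(list.subList(mid, list.size())));
        return merge(left, right);
    }
}
```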

Heaps are a data structure optimized for storing and retrieving minimum values (or maximum values, for a max-heap). A heap can be thought of as a binary tree with two properties: every parent node is less than or equal to its children, and the tree is complete. That is, when elements are appended to the heap, they are inserted in the deepest row of the tree, at the leftmost open position in that row.
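For a quick feel for the min-first behavior you'll be implementing, Java's built-in java.util.PriorityQueue (itself backed by a binary min-heap) makes a handy reference:

```java
import java.util.PriorityQueue;

public class HeapDemo {
    public static void main(String[] args) {
        // java.util.PriorityQueue is backed by a binary min-heap.
        PriorityQueue<Integer> heap = new PriorityQueue<>();
        heap.add(5);
        heap.add(1);
        heap.add(3);
        // Elements come out in ascending order regardless of insertion order.
        System.out.println(heap.poll()); // 1
        System.out.println(heap.poll()); // 3
        System.out.println(heap.poll()); // 5
    }
}
```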

HashMaps are ideal for associating data, or in other words, creating lookup tables from data to data. HashMaps are efficient due to hashing functions that convert arbitrary input to deterministic but uniformly distributed integer outputs. As integer indexing in an array is O(1), a HashMap with an O(1) hashing function enables us to look up values associated with keys in O(1) time. However, because hashing has collisions that can cause distinct keys to map to the same index (bucket), we store an additional Linked List for every bucket in the array, which is appended to whenever keys share an index (without necessarily being equal keys).
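As a toy illustration of how keys land in buckets (the bucket count and the Math.floorMod guard against negative hash codes are illustrative, not the starter code's):

```java
public class BucketDemo {
    public static void main(String[] args) {
        int buckets = 8; // a hypothetical table size
        String[] keys = { "apple", "banana", "cherry", "date" };
        for (String key : keys) {
            // floorMod keeps the index non-negative even for negative hashCodes.
            int bucket = Math.floorMod(key.hashCode(), buckets);
            System.out.println(key + " -> bucket " + bucket);
        }
    }
}
```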

Java determines the hash of each Object using its public int hashCode() method. Java provides hashCode() for free for all library types; for custom types, you must override the hashCode() function just as you would override toString().

As a general rule of thumb, when two Objects are .equals() to each other, they should have the same hashCode(). However, it is fine for two Objects that are not .equals() to each other to have the same hashCode() as well. What matters is that the hashCode() values of unequal Objects are sufficiently random so that their table indices (buckets) will be random, and their entries are spread across different buckets in the table.
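For example, a hypothetical custom type can satisfy this contract by hashing exactly the fields that equals() compares:

```java
import java.util.Objects;

// A hypothetical custom type demonstrating the equals()/hashCode() contract.
public class Point {
    private final int x;
    private final int y;

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof Point)) {
            return false;
        }
        Point p = (Point) other;
        return x == p.x && y == p.y;
    }

    // Equal Points must produce equal hash codes, so hash the same
    // fields that equals() compares.
    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }
}
```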

Time for battle. Get the Project 4 starter code here! (JUnit dependencies included)

The project code should work out-of-the-box for IntelliJ (running and debugging), but if you're having trouble getting it working, try these IntelliJ setup instructions.

Merge Sort

Your implementation of Merge Sort will be a two-step process: first, you'll implement merging, and then you'll call the merge function recursively to form Merge Sort. Both static methods below are in the algorithms.Sorting class.

Heaps

The heaps.MinHeap class contains method stubs for a minimum Heap data structure. The Heap will be backed by an ArrayList<T>, storing the root node at index 1. The mathematical formulas for computing child and parent indices are provided for free.
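For reference, the standard index arithmetic for a heap with its root at index 1 looks like this (a sketch only; use the formulas provided in the starter code):

```java
public class HeapIndexMath {
    // Standard formulas when the root is stored at index 1.
    static int parent(int i)     { return i / 2; }       // integer division
    static int leftChild(int i)  { return 2 * i; }
    static int rightChild(int i) { return 2 * i + 1; }

    public static void main(String[] args) {
        // For the node at index 3: parent 1, children 6 and 7.
        System.out.println(parent(3) + " " + leftChild(3) + " " + rightChild(3));
    }
}
```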

Heaps are excellent backing structures for Priority Queues. The heaps.PriorityQueue<T> class is a bonus class, backed by a MinHeap, which you may optionally implement for up to 10 bonus points on the project.

HashMaps

The hashing.HashMap<K, V> class contains method stubs for a HashMap data structure that maps keys of type K to values of type V. The HashMap will use the hashing-with-buckets technique, storing an array of LinkedList<Pair<K, V>>. Keys that hash to the same bucket will store their values as individual Pair<K, V> elements on the Linked List at that array index (bucket).

To determine which bucket a key belongs to, you can use the provided int getSlot(K) method, which computes hashCode() % table.length. Following Java's convention, a single null key is permitted; it always hashes to slot 0.
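Putting it together, a lookup might look roughly like the fragment below. This is a sketch meant to live inside the HashMap class, not the required implementation; the table field and the Pair accessors getKey()/getValue() are assumptions, and the starter code's names may differ.

```java
// Sketch of get(), assuming the provided getSlot(K) and a Pair<K, V>
// type with getKey()/getValue() accessors (hypothetical names).
public V get(K key) {
    int slot = getSlot(key);                     // which bucket?
    for (Pair<K, V> entry : table[slot]) {       // walk that bucket's chain
        // Objects.equals handles the single permitted null key.
        if (java.util.Objects.equals(entry.getKey(), key)) {
            return entry.getValue();
        }
    }
    return null;                                 // key absent
}
```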

To avoid performance degradation when the HashMap is too full, the HashMap has a load factor that, when surpassed, causes the HashMap to expand. The current load factor is computed as (number of key-value pairs) / (array length). When the load factor exceeds the hardcoded double LOAD_FACTOR = 0.75, the HashMap should be expanded by relocating all the key-value pairs to their new buckets.
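An expansion pass might be sketched as follows. Again, this is illustrative and assumes the same hypothetical fields as the lookup sketch above; the key point is that every pair must be rehashed, since a key's slot depends on table.length.

```java
// Sketch of expansion, triggered when size / table.length > LOAD_FACTOR.
// Doubles the table and rehashes every pair into its new bucket.
private void expand() {
    LinkedList<Pair<K, V>>[] old = table;
    table = new LinkedList[old.length * 2];  // unchecked, as usual for generic arrays
    for (int i = 0; i < table.length; i++) {
        table[i] = new LinkedList<>();
    }
    for (LinkedList<Pair<K, V>> bucket : old) {
        for (Pair<K, V> entry : bucket) {
            // getSlot now uses the doubled table.length, so slots change.
            table[getSlot(entry.getKey())].add(entry);
        }
    }
}
```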

Running Your Code + Bonus Points

As with the previous projects, this project contains a full test suite (see the test package) that will evaluate your code.

As usual, your project score is based on the percentage of tests you pass, contingent on the correctness of your submission. The Priority Queue class is a bonus and can be implemented for an additional 10 bonus points on the project, for a total possible score of 110. You are eligible to receive bonus points even if other parts of the project are incomplete.

A run script for the entire project test suite is included as run.bat (Windows) or run.sh (Mac/Linux). This will compile your entire project and produce a run report with your estimated grade, including bonus points. Your entire project must compile for the script to work, and your entire project must compile to receive a grade after submission.

You are welcome to modify the tests for debugging purposes, though keep in mind your project will be graded using an unmodified copy of the test suite.

Submission Instructions

See the submissions page for details. Have questions or need help? Post on Piazza!

Critical Analysis Questions

These questions aren't graded, but are similar to those you may see on an exam. You are welcome to discuss these with classmates offline or on Piazza.

(Return to homepage)