Monday, December 1, 2008

Update on the Kompimi C Data Structure Library: One of the most useful common collections is the associative array. Today I completed work on two of the three data structures planned to support this abstraction, the hash table and the "dynamic array set", which is simply a dynamic array that uses linear search for lookup, intended for very small collections and as a performance baseline.

Performance results are shown below. The first tests insertion into a hash table preallocated to be large enough to hold all elements, the second tests lookup in a full hash table, and the third tests inserting while growing the hash table from its smallest size. The results are relatively unsurprising; the naive dynamic array set is almost never worthwhile, even on very small lists, except for growing, since it does not need to redo insertions when growing. Direct chaining performs particularly poorly in the growing test, since it needs to free up the memory for the old chains and allocate new ones whenever the table is resized (coallesced hashing may do better here), and because resizing open addressing hash tables by a factor of 2 has good cache performance (each bucket maps to only 2 new buckets). Time per operation is constant for a while, but increases as the table becomes too large to fit in the L2 cache. Quadratic probing is probably the best choice across the board.

The slowdown for growing versus insertion into a preallocated array ranges from 1.5x to 3x, and is worst for small lists - this is why the dynamic array set dominates here. There might be an interesting way to speed up resizing.





All content in this document is granted into the public domain. Where this is not legally possible, the copyright owner releases all rights. This notice may be modified or removed.