Showing posts with label Data Structure. Show all posts
Showing posts with label Data Structure. Show all posts

Monday, December 1, 2008

Update on the Kompimi C Data Structure Library: One of the most useful common collections is the associative array. Today I completed work on two of the three data structures planned to support this abstraction, the hash table and the "dynamic array set", which is simply a dynamic array that uses linear search for lookup, intended for very small collections and as a performance baseline.

Performance results are shown below. The first tests insertion into a hash table preallocated to be large enough to hold all elements, the second tests lookup in a full hash table, and the third tests inserting while growing the hash table from its smallest size. The results are relatively unsurprising; the naive dynamic array set is almost never worthwhile, even on very small lists, except for growing, since it does not need to redo insertions when growing. Direct chaining performs particularly poorly in the growing test, since it needs to free up the memory for the old chains and allocate new ones whenever the table is resized (coallesced hashing may do better here), and because resizing open addressing hash tables by a factor of 2 has good cache performance (each bucket maps to only 2 new buckets). Time per operation is constant for a while, but increases as the table becomes too large to fit in the L2 cache. Quadratic probing is probably the best choice across the board.

The slowdown for growing versus insertion into a preallocated array ranges from 1.5x to 3x, and is worst for small lists - this is why the dynamic array set dominates here. There might be an interesting way to speed up resizing.





All content in this document is granted into the public domain. Where this is not legally possible, the copyright owner releases all rights. This notice may be modified or removed.

Thursday, November 20, 2008

C Data Structure Library: list is done

Today I finished my list implementation, based on unrolled doubly-linked linked lists with external storage on the C Data Structure Library. As a relatively complex data structure, it includes deep invariant checking in debug mode, and has a good set of unit tests to stress it. I built a release version for performance testing and compared it against a simple doubly-linked list doing similar operations. Preliminary results seem to indicate the complexity was worthwhile:
  • An insert at the end requires only 29 ns on average, as compared to 110 ns for a standard doubly-linked list.
  • An iteration over a million-element list requires only 3.7 ms, as compared to 9.5 ms per a standard doubly-linked list.
It's a good thing I did the perf test though - a few critical functions had to be converted into macros to get this performance (before, it was slower than doubly-linked lists with similarly inlined calls).

After this I varied ELEMENTS_PER_LIST_NODE a bit to examine how it affects the performance of these simple tests. The default value is 25. At the minimum, 2, it required about 75 ns per insert and 7 ms per iteration. Around 18, iteration time drops to 2.9 ms. For particularly large values, like 128, iteration drops to 2.2 ms and insert drop to 24.4 ns, but other benchmarks like insert_after would probably show a loss on these.

Update: Improvements to the perf test framework eliminated some of the overhead of the framework itself, which makes a difference for the very short insertion test. Currently inserts are down to 23.5 ns, compared to 98 ns for the standard doubly linked list (4.1x speedup).

All content in this document is granted into the public domain. Where this is not legally possible, the copyright owner releases all rights. This notice may be modified or removed.

Tuesday, November 18, 2008

Back and working on the C Data Structure Library

After a long hiatus I'm getting back in the game and looking to finish up the Kompimi C Data Structure Library, the first Kompimi public domain component. C is a language that has traditionally suffered from a reinvent-the-wheel problem due to its dearth of basic data structures, and this module aims to supply some modern, well thought-out, fully tested data structures for this purpose.

Recent changes include:
  • Modifying the build to deploy all derived files into a single directory. It's a good general policy to separate derived files from source files, so that they can be easily cleaned up and so new source files can be easily identified.
  • Documenting the interfaces for each module using Doxygen doc comments. Although Doxygen is not a public domain tool, it is optional and installed separately. These docs will be deployed at Kompimi's website as well, when kompimi.org goes up.
  • I'm working on a new list module, based on unrolled doubly-linked linked lists with external storage. This will provide a great generic solution for programs that do a lot of insertion and deletion.
More to come!

All content in this document is granted into the public domain. Where this is not legally possible, the copyright owner releases all rights. This notice may be modified or removed.

Wednesday, July 9, 2008

Kompimi C Data Structure Library: Dynamic Arrays

Although it's been slow going, the first Kompimi component is in development - the Kompimi C Data Structure Library. Its Sourceforge page is here:

http://sourceforge.net/projects/kompimi-cdsl/

So far I've implemented dynamic arrays, also known as vectors in C++, a vector permitting insertions and deletions that automatically grows to accomodate new elements. The interface was designed to include all the functionality of C++ vectors. The interface still needs some better documentation, and I need to adopt a consistent doc format that can be used to generate pretty interface web pages.

This container is very generally useful, and also surprisingly subtle in its implementation - ensuring that a sequence of insert/remove operations at the end of the array takes amortized constant time requires not only geometric expansion, but also requires that the expansion and unexpansion ratios be chosen in a way that prevents continual reallocation in an insert/remove/insert/remove/... sequence.

More concepts in the design document here:

http://sourceforge.net/docman/display_doc.php?docid=114694&group_id=231603

Preliminary unit tests are included. When this interface is thoroughly documented, next comes the next most important data structure: the list. I plan to implement this with unrolled linked lists to make it efficient.

All content in this document is granted into the public domain. Where this is not legally possible, the copyright owner releases all rights. This notice may be modified or removed.