Note: This website is archived. For up-to-date information about D projects and development, please visit wiki.dlang.org.

A comparison of Hash table implementations in D

Validity of pointer returned by "in" operator.

The D programming language (DPL) has the very convenient operator "in", which returns a pointer to a value ( ValueType* val = key in AA; ). This is useful both for detecting the presence of a key and for optionally copying from, or accessing in place, the original stored value through the pointer without copying it. Ideally, Associative Array (AA) implementations and other storage collections should not relocate a stored value once a pointer to it has been returned by "in", since references to it may be held outside the table.
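A minimal illustration of the operator, using the builtin AA:

```d
import std.stdio;

void main()
{
    int[string] aa = ["apple": 3, "pear": 5];

    // "in" yields a pointer to the stored value, or null if the key is absent.
    int* p = "apple" in aa;
    if (p !is null)
    {
        *p += 1;    // modify the stored value in place, no second lookup needed
    }
    assert(aa["apple"] == 4);
    assert(("missing" in aa) is null);
}
```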

The builtin implementation of AA is immune to invalidating the pointer even after calling AA.remove(key), since the garbage collector (GC) will not delete the node until all references to it have vanished. The "in" operator can be implemented by any class or struct type, and so the behaviour of alternate implementations is limited only by imagination. Other AA implementations may invalidate the value pointer, so careful code should check whether a given implementation makes the result of "in" only transiently valid.

AA implementations immune to this problem store both the key and its value in a separately allocated memory block, keep a pointer to that block in the table, and leave deletion of the block to the GC.

A non-immune AA implementation may, for performance reasons, store the node data directly in the primary hash table, where its location may change.

Other features of AA implementations

The principle of Associative Arrays is to convert a key into a hash value as an index for a fast random access lookup into an array. All the implementations so far examined use a large primary table. The table entry index is calculated as hash % table.length. Table sizes are commonly selected from a table of prime numbers. This seems to allow better distribution of the table index calculation, as the hash values and table length never have a common factor.

The Java HashMap uses powers of 2, as does the original pyDict in the dsource aa project. The Java HashMap also applies a simple bit-randomising rehash function on top of the key hash result, to ensure better randomisation of bits for not-so-good hash values. Having a prime number table size removes the requirement for such a rehash function.
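The two indexing styles can be sketched as follows. The shift constants in the power-of-2 version are in the style of an older Java HashMap and are illustrative only:

```d
// Table index calculation for the two sizing strategies discussed above.

// Prime-sized table: plain modulo suffices, since the hash value and the
// table length never share a common factor.
size_t primeIndex(size_t hash, size_t tableLength)
{
    return hash % tableLength;
}

// Power-of-2 table: a supplemental bit-randomising rehash spreads weak
// hash values before the low bits are masked off (constants illustrative).
size_t pow2Index(size_t hash, size_t tableLength)
{
    hash ^= (hash >> 20) ^ (hash >> 12);
    hash ^= (hash >> 7) ^ (hash >> 4);
    return hash & (tableLength - 1);    // tableLength must be a power of 2
}
```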

Implementations commonly define a Node data structure, which encapsulates the key, value association, and most often caches the hash value of the key as well.

If the primary table stores Node pointers, then the Node structure contains one or more pointers to itself.

If the primary table stores the Node data structure directly, the Node data is relocatable.

A hybrid storage model can embed the Node data structure directly in the table, and also chain pointers to Node structures to handle table index collisions.
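As a sketch, the storage models above might define nodes like this. The type and field names are illustrative assumptions, not those of any benchmarked implementation:

```d
alias uint Key;       // illustrative key/value types for the sketch
alias uint Value;

struct ChainNode      // table of pointers: node links to the next collision
{
    hash_t     hash;  // cached hash of the key
    Key        key;
    Value      value;
    ChainNode* next;  // null-terminated single linked list (hashlist style)
}

struct FlatNode       // table of records: relocatable, no pointers to itself
{
    hash_t hash;
    Key    key;
    Value  value;     // may be moved wholesale on a resize/rehash
}

// A hybrid (hashtablist style) would use FlatNode-like records in the
// primary table, each also carrying a ChainNode* head for collisions.
```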

Most implementations allow a loading ratio to be specified, and allow an initial capacity to be set, so as to avoid needless table resize and rehash. The DPL builtin AA does not have this facility, and instead grows by a factor of 4 each time the loading ratio is exceeded.

Comparison of implementations

AA implementations differ in how table index collisions are handled. This can be done using a tree or list starting at the table entry (Chaining), or by looking for another empty table entry using a hash-value-dependent algorithm (Open addressing).

To quote from Wikipedia (Wikipedia Hash Collision) on hash collision: "The primary disadvantage of open addressing is primary and secondary clustering, in which searches may access long sequences of used buckets that contain items with different hash addresses; items with one hash address can thus lengthen searches for items with other hash addresses."

The implementations using open addressing also have to specially mark table entries when their keys are removed, as having a special status of being formerly occupied, since the entry may be on the lookup path of one or more sequences of buckets for items with different hash addresses.

Wikipedia also notes, regarding the use of trees for chaining:

"Instead of a list, one can use any other data structure that supports the required operations. By using a self-balancing tree, for example, the theoretical worst-case time of a hash table can be brought down to O(log n) rather than O(n). However, this approach is only worth the trouble and extra memory cost if long delays must be avoided at all costs (e.g. in a real-time application), or if one expects to have many entries hashed to the same slot (e.g. if one expects extremely non-uniform or even malicious key distributions)."

So it might take rather extreme test cases to show a performance advantage to using trees.

AA using balanced tree.

The current version of D2 (dmd 2.042) uses a binary tree of nodes. The benchmarked implementation here is referred to as "hashtree". Each node, in addition to key, value and optional hash, has a left and right NodePtr. The decision for left or right insertion and search is based on ordered comparison of first the hash value and, if the hash values are equal, of the keys themselves. An advantage of the tree is that memory can be better utilised. The default implementation uses a loading factor of 4, meaning that when full there are four times as many nodes as the primary hash table size.

AA using single linked list.

The tango library hashmap uses a single linked list for table entries which collide to the same index. A new implementation of this was done by simplifying the hashtree to use a single linked list. This is the hashlist implementation. Each node contains one NodePtr, which points to the next node or is null. Also a variant of this was created to store the initial Node record directly in the primary table, called the hashtablist. Although the primary table becomes much larger, for low load factors the availability of preallocated memory directly in the table is a performance advantage.

AA using open addressing

The open addressing variant tested here is called the hashptr implementation. In hashptr, the Node record contains no pointers to itself, and the table stores a pointer to the Node. The next table entry to try is found by a multiply-and-add sequence starting from the hash value, equivalent to a pseudo-random sequence from a linear congruential generator. Because hashptr uses the table only, it must resize and rehash well before the number of nodes reaches the table length (at 0.6 of table length), whereas the list and tree implementations resize when the number of nodes exceeds 0.7 of table length. The tree and list implementations use barely 50% of the table entries, because collided nodes are linked. As a result the hashptr table is only 35% full for the nodes = 1101008 test. The calculated table size is at least 1101008 / 0.6 = 1835014. The next highest prime number in the primes list is 3145739, so optimal hashing is accompanied by use of more memory. When comparing results of different implementations, the number of nodes may be the same, but the table size, the closeness to full capacity, and the significance of the table capacity may differ.
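A sketch of such a probe sequence; the multiplier and increment here are illustrative, since the actual hashptr constants are not given in the text:

```d
// One multiply-and-add probe step, as in a linear congruential generator.
// Constants a = 5, c = 1 are illustrative only.
size_t nextProbe(size_t index, size_t tableLength)
{
    return (index * 5 + 1) % tableLength;
}

// A lookup starts at hash % tableLength and walks the probe sequence until
// it finds the key, or an empty slot (key absent). A "formerly occupied"
// marker must be skipped rather than treated as empty, as described below.
```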

There is also a pydict implementation, which is a variant of open addressing that stores the Node record directly in the hash table.

Benchmark program

Example of command line usage: aatest-r.exe n=30 m=1101008 -i hashlist -t uu -l 4

n=<number of runs>

m=<number of keys>

-g <data file path> ; generate a data file, must also include n, m and -t options

-d <data file path> ; read a data file, uses the stored n, m and -t options

-l <load ratio> ; use depends on implementation, value between 0.5 and 16.0

-t <key value type>

uu uint[uint]
su uint[string]
cs uint[WrapString]
cu uint[WrapUint]
us string[uint]

-i implementation

builtin The DPL builtin AA
hashtree Template version of the builtin AA, with more features. Array of node pointers
hashlist Single linked list version. Array of node pointers
hashptr Open addressing version, linear congruential sequence. Array of node pointers
hashtablist Direct Node storage plus single link chaining. Array of node records
pydict Direct Node storage and open addressing. Array of node records
builtin-list Single linked list aaA.d druntime replacement. Runtime optimized for integral keys. Capacity, load ratio management. Hash table statistics.

Hash collision distributions

With a reasonable distribution of hash values, the chain length at any table entry is not expected to be large. In fact the majority of chains have only 1 - 3 nodes. The following distributions of list lengths were obtained using a hashlist table with 1101008 nodes, where 1 more node would trigger a rehash (number of nodes at 0.7 of table length).

uint[uint], Table size 1572869, nodes = 1101008, occupied table entries = 50.41%

list length   %      cumulative %
1 49.77 49.77
2 34.83 84.60
3 12.09 96.69
4 2.80 98.49
5 0.45 99.94
6 0.05 99.99
7 0.00 100
8 0.00

uint[string], Table size 1572869, nodes = 1101008, occupied table entries = 48.97%

list length   %      cumulative %
1 48.08 48.08
2 32.62 80.70
3 12.19 92.88
4 3.82 96.70
5 1.45 98.15
6 0.81 98.96
7 0.48 99.44
8 0.27 99.71
9 0.17 99.88
10 0.07 99.96
11 0.03 99.99
12 0.00 100
13 0.00
14 0.00

Lookup performance comparisons.

Because of the differing implementation strategies, the performance needs to be compared over a range of load factors. Implementations with open addressing have poor performance as load factor increases over 0.6.

Note the builtin implementation always uses a default maximum loading of 4, but while growing it is somewhere between 0.5 and 4.

The performance factor most often of concern for AA is the lookup time. The AA datatype is often used in insert once, lookup many times mode.

Lookup times, using various loads and implementations,

n=30, m=1,101,009

Occupied % is for chained types (hashtree, hashlist, hashtablist).

The value for hashptr and pydict was 35% for the same table size of 3,145,739.

Builtin was retro-fitted with a statistic function, and was at a load ratio of 1 for 1,101,009 nodes.

This is used to establish an approximate equivalent performance point between hashtree and builtin.

To normalize and relativize the times taken, a timed task assigned 1 million random uint values to an array of size 1024, repeated in sequence, followed by randomized swaps from the same buffer. The time for this task is checked for stability before each run; runs varying by more than 1 standard deviation are removed. The average time is used to normalize the result. The normalizing task takes about 0.11 sec on my machine, so multiply by this for raw times.

uint[WrapString]

WrapString is a class containing a string and a custom hash function.

Load 0.5 1 2 4 8 16
% Occupied* 29.5 50.4 75.4 94.0 99.7 100
Table Size 3,145,739 1,572,869 786,433 393,241 196,613 98,317
builtin 3.63
hashtree 3.55 3.60 3.83 4.27 5.03 7.54
hashlist 3.61 3.68 4.11 4.69 5.95 8.69
hashptr 3.72
hashtablist 3.45 3.54 3.86 4.52 6.00 8.32
pydict 4.57
builtin-list 3.59 3.64 3.85 4.42 6.03 8.11

The WrapString custom hash function uses 31 as a multiplier instead of 11, as does the Java hash implementation. This seems to give slightly better hash key distributions for these test keysets.

The trends in the normalized times reflect the implementation strategies. Under low load, with a correspondingly large table, the collision chains are short, and the times are close to those of the open addressing model (hashptr), whether or not a pointer to the node is stored in the primary table.

The hashptr implementation performs quite well, but no better than a chained implementation under equivalent load conditions, whatever the index sequence algorithm used. The best performer at low load for lookups was the hashtablist, which did not become worse than the hashtree until a loading of 4. Because it stores nodes directly in the table, inserts and cleanup are also notably faster for hashtablist under loadings of 0.5, 1 and 2. The pydict uses both direct node storage (only) and open addressing, but its performance was disappointing. Performance appeared to be good for pydict until I changed it so that it stopped using the pointer hash and compare, and called the class toHash and opCmp, like the others.

As the load ratio increases, the list implementations take progressively longer. The hashtree implementation degrades the least because of its balanced tree lookup, compared to the linear search for the list chains.

uint[uint]

Raw uint keys and values have good performance.

Load 0.5 1 2 4 8 16
% Occupied* 29.5 50.3 75.4 93.9 99.6 100
Table Size 3,145,739 1,572,869 786,433 393,241 196,613 98,317
builtin 2.21
hashtree 1.78 2.05 2.32 2.69 3.23 4.03
hashlist 1.85 2.01 2.26 2.79 3.99 6.18
hashptr 1.98
hashtablist 1.57 1.75 2.05 2.53 3.65 5.93
pydict 1.61
builtin-list 1.69 1.69 1.93 2.46 3.42 5.93

At the performance levels shown here, little tweaks can make a difference, such as the struct wrappers for hashtree, hashlist, hashtablist and hashptr all calling the implementation getNode directly. Again, load levels of 4 and greater are necessary to gain an advantage from the balanced trees in hashtree. At lower loads there is an advantage for the direct node storage of hashtablist. The hashptr open addressing implementation is a little slower, even when the table is only 35% occupied.

The slightly disappointing performance of builtin is due to the fact that it is a generic AA implementation, and does not take advantage of special templates for integer types, as the hash* implementations do. It still stores a separate hash, and has the extra overhead of calls on TypeInfo for getHash and compare.

The builtin-list is designed as a single linked list AA drop-in replacement for the D runtime aaA.d. It checks the TypeInfo of the key and stores the key in the space for the hash value if it is positively identified as a type with an integral-equivalent bit representation.

uint[string]

Load 0.5 1 2 4 8 16
% Occupied* 28.6 49 74 94.0 99.6 100
Table Size 3,145,739 1,572,869 786,433 393,241 196,613 98,317
builtin 5.36
hashtree 5.30 5.29 5.62 5.86 6.47 7.19
hashlist 5.19 5.36 5.60 6.34 7.43 9.97
hashptr 5.47
hashtablist 4.45 4.53 4.91 5.66 7.00 9.41
pydict 5.10
builtin-list 5.38

In case there was curiosity about using strings directly as keys instead of a class wrapper: performance is nearly 2 times worse at a low load ratio, and the difference surprisingly does affect lookups. At a high load ratio the additive overhead is not so noticeable.

The principal difference is that the hash value must be recalculated for every lookup. The string key sizes vary randomly in length from 1 to 20 characters, and the overhead of recalculating a key hash, and doing a compare or two, accumulates over each lookup. The overhead is substantially reduced when the hash value is cached in the WrapString class, although even there an opCmp must check for an exact match.
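A sketch of a hash-caching wrapper in the spirit of WrapString follows. The real benchmark class is not shown in the text, so the names and details here are assumptions:

```d
// Illustrative hash-caching string wrapper; not the benchmarked WrapString.
class WrapString
{
    string s;
    hash_t cachedHash;

    this(string s)
    {
        this.s = s;
        hash_t h = 0;
        foreach (c; s)
            h = h * 31 + c;    // 31 multiplier, as recommended in the text
        cachedHash = h;        // computed once, reused on every lookup
    }

    override hash_t toHash() { return cachedHash; }

    override int opCmp(Object o)
    {
        auto other = cast(WrapString) o;
        // order by cached hash first, then confirm with an exact compare
        if (cachedHash != other.cachedHash)
            return cachedHash < other.cachedHash ? -1 : 1;
        return s < other.s ? -1 : (s > other.s ? 1 : 0);
    }
}
```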

If the key / value is reversed to get a string[uint], the performance is that of uint[uint].

string[uint]

Load 0.5 1 2 4 8 16
% Occupied* 28.6 49 74 94.0 99.6 100
Table Size 3,145,739 1,572,869 786,433 393,241 196,613 98,317
builtin 2.06
hashtree 1.65 1.78 2.03 2.35
hashlist 1.72 1.87 2.46 2.78
hashptr 1.80
hashtablist 1.43 1.55 1.86 2.40
pydict
builtin-list 1.58

The biggest determinant of AA performance.

The biggest determinant of lookup performance is the nature and number of key comparisons done by the AA implementation, and how directly accessible the associated data is.

Having a good key strategy is more important than which particular AA implementation.

Wrapping string keys in a class and caching the hash value can improve performance by removing the need for a full hash calculation for each lookup key. Having shorter string keys will help the complete key compare.

All AA implementations perform better under lower load ratios, which is a ratio of the number of nodes to the table size.

Under reasonable load ratios of 0.5 to 2, the best performer was the hashtablist. It probably works better because, for a large fraction of lookups, the first hash hit lands on the actual record in the primary table. AA implementations which store a pointer to the node in the table (hashlist and hashtree) require at least one further memory lookup to reach the start of a chain. The hashtablist will probably never be used as a builtin AA type because its entries may be relocated during change operations, and the speed advantage is probably not enough to outweigh this.

An open addressing implementation, hashptr, which avoids relocation issues by storing a pointer in the primary table, and uses a linear congruentially generated sequence, was no better than single linked list chaining for managing hash collisions, and wastes considerable primary table space, because the load ratio must always be less than 1 (or about 0.6 for reasonable performance, avoiding long lookup sequences).

The builtin AA type for DPL is designed to cope gracefully with heavy load ratios, but the implementation does not allow the user to specify the load ratio, nor does the default table grow and rehash strategy ever achieve a load ratio greater than 3. Its resize occurs automatically when the number of nodes becomes equal to 4 times the primary table size. Consequently the overhead of the extra pointer for each node is never a great advantage, and closely equal performance would be obtained from a single linked list implementation used the same way. The results might differ if keys with a poor hash distribution were used.

The default string hash uses a character multiplier of 11. Experience with the WrapString class suggests a multiplier of 31 would be better (as used in Java).

The implementations tested here were given capacity and load management, plus a simple hash statistic distribution reporting function. Each one has a structure wrapper to aid the drop-in replacement for the DPL builtin AA, and so can improve flexibility. They all use the TypeInfo facility for templated key and value management.

The builtin AA, whatever its underlying implementation, could be given more functionality and flexibility by adding .capacity, .loadRatio and .statistics properties, similar to those demonstrated with the test implementations.

A single linked list with these properties (not integrated into D compiler) has been uploaded to be the current version of dsource\projects\aa\trunk\druntime\aaA.d

The biggest determinant of performance is still key choice and management, followed by AA load management. Actual implementations should try to help the first two as much as possible.