|
-// TODO: Rewrite performance tests using pyperf.
-// TODO: Group similar functionality.
-// TODO: Check refcounts when calling into hash and comparison functions.
-// TODO: Check allocation and cleanup.
-// TODO: Subinterpreter support.
-// TODO: Docstrings and stubs.
-// TODO: GC support.
|
+// For background on the hashtable design first implemented in AutoMap, see the following:
+// https://github.com/brandtbucher/automap/blob/b787199d38d6bfa1b55484e5ea1e89b31cc1fa72/automap.c#L12
|
-/*******************************************************************************
-
-Our use cases differ significantly from Python's general-purpose dict type, even
-when setting aside the whole immutable/grow-only and contiguous-integer-values
-stuff.
-
-What we don't care about:
-
- - Memory usage. Python's dicts are used literally everywhere, so a tiny
-   reduction in the footprint of the average dict results in a significant gain
-   for *all* Python programs. We are happy to instead trade a few extra bytes
-   of RAM for a more cache-friendly hash table design. Since we don't store
-   values, we are still close to the same size on average!
-
- - Worst-case performance. Again, Python's dicts are used for literally
-   everything, so they need to be able to gracefully handle lots of hash
-   collisions, whether resulting from bad hash algorithms, heterogeneous keys
-   with badly-combining hash algorithms, or maliciously-formed input. We can
-   safely assume that our use cases don't need to worry about these issues, and
-   instead choose lookup and collision resolution strategies that utilize cache
-   lines more effectively. This extends to lookups for nonexistent keys as
-   well; we can assume that if our users are looking for something, they know
-   that it's probably there.
-
-What we do care about:
-
- - Creation and update time. This is *by far* the most expensive operation you
-   do on a mapping. More on this below.
-
- - The speed of lookups that result in hits. This is what the mapping is used
-   for, so it *must* be good. More on this below.
-
- - Iteration order and speed. You really can't beat a Python list or tuple
-   here, so we can just store the keys in one of them to avoid reinventing the
-   wheel. We use a list, since it allows us to grow more efficiently.
-
-So what we need is a hash table that's easy to insert into and easy to scan.
-
-Here's how it works. A vanilla Python dict of the form:
-
-{a: 0, b: 1, c: 2}
-
-...basically looks like this (assume the hashes are 3, 6, and 9):
-
-Indices: [-, 2, -, 0, -, -, 1, -]
-
-Hashes:  [3, 6, 9, -, -]
-Keys:    [a, b, c, -, -]
-Values:  [0, 1, 2, -, -]
-
-It's pretty standard; keys, values, and cached hashes are stored in insertion
-order, and their offsets are placed in the Indices table at position
-HASH % TABLE_SIZE. Though it's not needed in this example, collisions are
-resolved by jumping around the table according to the following recurrence:
-
-NEXT_INDEX = (5 * CURRENT_INDEX + 1 + (HASH >>= 5)) % TABLE_SIZE
-
-This is good in the face of bad hash algorithms, but is sorta expensive. It's
-also unable to utilize cache lines at all, since the probe sequence is
-effectively random (the recurrence is borrowed from linear congruential
-pseudorandom number generators)!
-
|
-To contrast, the same table looks something like this for us:
-
-Indices: [-, 2, 1, 0, -, -, -, -, -, -, -, -, -, -, -, -, -, -, -]
-Hashes:  [-, 9, 6, 3, -, -, -, -, -, -, -, -, -, -, -, -, -, -, -]
-
-Keys: [a, b, c]
-
-Right away you can see that we don't need to store the values, because they
-match the indices (by design).
-
-Notice that even though we allocated enough space in our table for 19 entries,
-we still insert them into an initial position HASH % 4 (so hashes 3, 6, and 9
-land at slots 3, 2, and 1). This leaves the whole 15-element tail chunk of the
-table free for colliding keys. So, what's a good collision-resolution strategy?
-
-NEXT_INDEX = CURRENT_INDEX + 1
-
-It's just a sequential scan! That means *every* collision-resolution lookup is
-hot in L1 cache (and can even be predicted and speculatively executed). The
-indices and hashes are actually interleaved for better cache locality as well.
-
-We repeat this scan 15 times. We don't even have to worry about wrapping around
-the edge of the table during this part, since we've left enough free space
-(equal to the number of scans) to safely run over the end. It's wasteful for a
-small example like this, but for more realistic sizes it's just about perfect.
-
|
-We then jump to another spot in the table using a version of the recurrence
-above:
-
-NEXT_INDEX = (5 * (CURRENT_INDEX - 15) + 1 + (HASH >>= 1)) % TABLE_SIZE
-
-...and repeat the whole thing over again. This collision-resolution strategy is
-similar to what Python's sets do, so we still handle some nasty collisions and
-missing keys well.
-
|
-There are a couple of other tricks that we use (like globally caching integer
-objects from value lookups), but the hardware-friendly hash table design is
-what really gives us our awesome performance.
-
-*******************************************************************************/
 # include <math.h>
 # define PY_SSIZE_T_CLEAN
 # include "Python.h"
|