@@ -129,6 +129,103 @@ from here on::
129129 :class:`~ua_parser.caching.Local`, which is also caching-related,
130130 and serves to use thread-local caches rather than a shared cache.
131131
132+ Builtin Resolvers
133+ =================
134+
135+ .. list-table::
136+    :header-rows: 1
137+    :stub-columns: 1
138+
139+    * -
140+      - speed
141+      - portability
142+      - memory use
143+      - safety
144+    * - ``regex``
145+      - great
146+      - good
147+      - bad
148+      - great
149+    * - ``re2``
150+      - good
151+      - bad
152+      - good
153+      - good
154+    * - ``basic``
155+      - terrible
156+      - great
157+      - great
158+      - great
159+
160+ ``regex``
161+ ---------
162+
163+ The ``regex`` resolver is a bespoke effort, part of the `uap-rust
164+ <https://github.com/ua-parser/uap-rust>`_ sibling project. It is built
165+ on `rust-regex <https://github.com/rust-lang/regex>`_ and `a bespoke
166+ regex-prefiltering implementation
167+ <https://github.com/ua-parser/uap-rust/tree/main/regex-filtered>`_.
168+ It:
169+
170+ - Is the fastest available resolver, usually beating ``re2`` by a
171+   significant margin (when that is even available).
172+ - Is fully controlled by the project, and thus can be built for all
173+   interpreters and platforms supported by pyo3 (currently: CPython,
174+   PyPy, and GraalPy, on Linux, macOS and Windows, Intel and ARM). It is
175+   also built as a CPython abi3 wheel and should thus suffer from no
176+   compatibility issues with new releases.
177+ - Is built entirely out of safe Rust code, so its safety risks lie
178+   entirely in ``regex`` and ``pyo3``.
179+ - Has one major drawback: it is a lot more memory intensive than
180+   the other resolvers, because ``regex`` tends to trade memory for
181+   speed (~155MB high water mark on a real-world dataset).
182+
183+ If available, it is the default resolver, without a cache.
184+
185+ ``re2``
186+ -------
187+
188+ The ``re2`` resolver is built atop the widely used `google-re2
189+ <https://github.com/google/re2>`_ via its built-in Python bindings.
190+ It:
191+
192+ - Is extremely fast, though around 80% slower than ``regex`` on
193+   real-world data.
194+ - Is only compatible with CPython, and uses pure API wheels, so it
195+   needs a different release for each CPython version, for each OS, for
196+   each architecture.
197+ - Is built entirely in C++, but by experienced Google developers.
198+ - Is more memory intensive than the pure-Python ``basic`` resolver,
199+   but quite slim all things considered (~55MB high water mark on a
200+   real-world dataset).
201+
202+ If available, it is the second-preferred resolver, without a cache.
203+
204+ ``basic``
205+ ---------
206+
207+ The ``basic`` resolver is a naive linear traversal of all rules, using
208+ the standard library's ``re``. It:
209+
210+ - Is *extremely* slow, about 10x slower than ``re2`` on CPython;
211+   PyPy's and GraalPy's regex implementations do *not* like the
212+   workload and are behind CPython by a factor of 3 to 4.
213+ - Has perfect compatibility, with the caveat above, by virtue of being
214+   built entirely out of standard library code.
215+ - Is basically as safe as Python software can be by virtue of being
216+   just Python, with the native code being the standard library's.
217+ - Is the slimmest resolver at about 40MB.
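
As a rough illustration of what "naive linear traversal" means, here is a minimal sketch. It is not the actual ``basic`` resolver: the ``Rule`` type, the three patterns and ``resolve_linear`` are hypothetical, and the real rule set is far larger, which is why trying every rule in turn is so slow.

```python
import re
from typing import NamedTuple, Optional, Tuple


class Rule(NamedTuple):
    """Hypothetical rule: a compiled pattern and the family it reports."""
    pattern: "re.Pattern[str]"
    family: str


# Illustrative, heavily simplified rules (assumption, not the real rule set).
RULES = [
    Rule(re.compile(r"Firefox/(\d+)"), "Firefox"),
    Rule(re.compile(r"Chrome/(\d+)"), "Chrome"),
    Rule(re.compile(r"Version/(\d+).*Safari/"), "Safari"),
]


def resolve_linear(ua: str) -> Optional[Tuple[str, str]]:
    """Try every rule in order and return the first match, if any."""
    for rule in RULES:
        match = rule.pattern.search(ua)
        if match:
            return rule.family, match.group(1)
    # No rule matched: the cost was one regex search per rule.
    return None
```

The worst case (no match) pays for every rule in the list, so the cost grows linearly with the number of rules.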
218+
219+ This is caveated by a hard requirement to use caches, which make it
220+ workably faster on real-world datasets (if still nowhere near
221+ *uncached* ``re2`` or ``regex``) but increase its memory requirement
222+ significantly. For example, using "sieve" and a cache size of 20000 on
223+ a real-world dataset, it is about 4x slower than ``re2`` for about the
224+ same memory requirements.
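
The effect of putting a cache in front of a slow resolver can be sketched with the standard library alone. This is only an illustration: ``functools.lru_cache`` is an LRU stand-in rather than the "sieve" policy mentioned above, and ``slow_resolve`` and its two patterns are hypothetical. The size of 2000 mirrors the default cache size quoted in this section.

```python
import re
from functools import lru_cache
from typing import Optional

# Hypothetical stand-in for a slow, uncached traversal of all rules.
_PATTERNS = [re.compile(p) for p in (r"Firefox/(\d+)", r"Chrome/(\d+)")]


def slow_resolve(ua: str) -> Optional[str]:
    """Linear scan over the patterns; returns the matched token or None."""
    for pattern in _PATTERNS:
        match = pattern.search(ua)
        if match:
            return match.group(0)
    return None


@lru_cache(maxsize=2000)  # LRU eviction, an assumption; not the sieve policy
def cached_resolve(ua: str) -> Optional[str]:
    """Same result as slow_resolve, but repeated inputs skip the scan."""
    return slow_resolve(ua)
```

Every cached entry keeps its user agent string and result alive, which is where the extra memory goes; ``cached_resolve.cache_info()`` reports hits and misses if you want to measure the benefit on your own data.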
225+
226+ It is the fallback and least preferred resolver, with a medium
227+ (currently 2000 entries) cache by default.
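
The preference order described in the three sections above (``regex``, then ``re2``, then ``basic``) can be sketched as follows. This is not the library's own selection code: ``pick_resolver`` is hypothetical, and the module names ``ua_parser_rs`` and ``re2`` are assumptions about which importable modules back the first two resolvers.

```python
from importlib.util import find_spec
from typing import Callable, List, Optional, Tuple

# (resolver name, module it needs); module names are assumptions.
PREFERENCE: List[Tuple[str, Optional[str]]] = [
    ("regex", "ua_parser_rs"),
    ("re2", "re2"),
    ("basic", None),  # pure standard library, always available
]


def _installed(module: str) -> bool:
    """True if a top-level module can be found without importing it."""
    return find_spec(module) is not None


def pick_resolver(is_available: Callable[[str], bool] = _installed) -> str:
    """Return the name of the most preferred resolver that is usable."""
    for name, module in PREFERENCE:
        if module is None or is_available(module):
            return name
    raise RuntimeError("unreachable: 'basic' has no requirement")
```

Passing ``is_available`` explicitly makes the fallback easy to exercise, for example ``pick_resolver(lambda module: False)`` always falls back to ``"basic"``.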
228+
132229 Writing Custom Resolvers
133230 ========================
134231