Add lazy builtin matchers (with a separately compiled file), as well
as loading json or yaml files using lazy matchers.
Lazy matchers are very much a tradeoff: they improve import speed (and
memory consumption until triggered), but slow down run speed, possibly
dramatically:
- importing the package itself takes ~36ms
- importing the lazy matchers takes ~36ms (this includes the package
import, so the matchers themselves add ~0) and ~70kB RSS
- importing the eager matchers takes ~97ms and ~780kB RSS
- triggering the instantiation of the lazy matchers adds ~800kB RSS
- running bench on the sample file using the lazy matcher has
700~800ms overhead compared to the eager matchers
While the lazy matchers are less costly across the board until they're
used, benching the sample file triggers the loading of *every* regex
(likely due to matching failures). This has a 700~800ms overhead over
eager matchers, and increases the RSS by ~800kB (on top of the original
70kB).
Thus lazy matchers are not a great default for the basic parser.
Though they might be a good opt-in if the user only ever uses one of
the domains (especially if it's not the devices one as that's by far
the largest).
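As a rough illustration of why a failed match ends up loading every
regex, here is a minimal sketch of a lazy matcher. The names
(LazyMatcher, first_match, compiled_count) and the three patterns are
hypothetical and are not the ones used by this change; the sketch only
assumes that a matcher compiles its regex on first use and that
matchers are tried in order until one succeeds.

```python
import re
from typing import Iterable, Optional


class LazyMatcher:
    """Hypothetical matcher that compiles its regex on first use."""

    # Class-wide counter, only here to make the laziness observable.
    compiled_count = 0

    def __init__(self, pattern: str) -> None:
        self.pattern = pattern
        self._regex: Optional[re.Pattern] = None

    @property
    def regex(self) -> re.Pattern:
        # Compilation is deferred until the matcher is actually tried.
        if self._regex is None:
            self._regex = re.compile(self.pattern)
            LazyMatcher.compiled_count += 1
        return self._regex

    def search(self, ua: str) -> Optional[re.Match]:
        return self.regex.search(ua)


def first_match(matchers: Iterable[LazyMatcher], ua: str) -> Optional[LazyMatcher]:
    """Try matchers in order and return the first one that matches."""
    for matcher in matchers:
        if matcher.search(ua):
            return matcher
    # Reaching this point means every matcher was tried, and compiled.
    return None


# Illustrative patterns only, not taken from the real regexes file.
matchers = [
    LazyMatcher(p)
    for p in (r"Firefox/(\d+)", r"Chrome/(\d+)", r"Safari/(\d+)")
]
```

In this sketch, a user agent that hits the first pattern compiles a
single regex, while one that matches nothing walks the whole list and
compiles all of them, which is the behaviour the bench appears to
trigger with the basic parser.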
With the re2 parser however, only 156 of the 1162 regexes get
evaluated, leading to a minor CPU overhead of 20~30ms (1% of bench
time) and a more reasonable memory overhead. Thus use the lazy matcher
for the re2 parser.
On the more net-negative but relatively minor side of things, the
pregenerated lazy matchers file adds 120k to the on-disk requirements
of the library, and ~25k to the wheel archive. This is also what the
_regexes and _matchers precompiled files do. pyc files seem to be even
bigger (~130k) so the tradeoff is dubious even if they are slightly
faster.
Fixes #171, fixes #173