Commit 5e2ef78
authored
Split type-checking into interface and implementation in parallel workers (#21119)
The general idea is very straightforward: when doing type-checking, we
first type-check only module top-levels and those functions/methods that
define/infer externally visible variables. Then we write cache and send
new interface hash back to coordinator to unblock more SCCs early. This
makes parallel type-checking ~25% faster.
However, this simple idea surfaced multiple quirks and old hacks. I
address some of them in this PR, but I decided to handle the rest in
follow up PR(s) to limit the size of this one.
First, important implementation details:
* On each `select()` call, coordinator collects all responses, both
interface and implementation ones, and processes them as a single batch.
This simplifies reasoning and shouldn't affect performance.
* We need to write indirect dependencies to a separate cache file, since
they are only known after processing function bodies. I combine them
together with error messages in files called `foo.meta_ex.ff`. Not 100%
sure about the name, couldn't find anything more meaningful.
* Overload signatures are now processed as part of the top-level in type
checker. This is a big change, but it is unavoidable and it didn't cause
any problems with the daemon.
* Initializers (default values of function arguments) are now processed
as part of the top-levels (to match runtime semantics). Btw
@hauntsaninja you optimized them away in some cases, I am not sure this
is safe in presence of walrus, see e.g. `testWalrus`.
* `local_definitions()` now do not yield methods of classes nested in
functions. We add such methods to both symbol table of their actual
class, and to the module top-level symbol table, thus causing
double-processing.
Now some smaller things I already fixed:
* We used to have _three_ scoping systems to track current class in type
checker. One existed purely for the purpose of `TypeForm` support. I
think two is enough, so I deleted the last one.
* `AwaitableGenerator` return type wrapping used to happen during
processing of function body, which is obviously wrong.
* Invalid function redefinitions sometimes caused duplicate errors in
case of partial types/deferrals. Now they should not, as I explicitly
skip them after emitting first error.
* Some generated methods were not marked as such. Now they are.
Finally, some remaining problems and how I propose to address them in
followups:
* Narrowing of final global variables is not preserved in functions
anymore, see `testNarrowingOfFinalPersistsInFunctions`. Supporting this
will be tricky/expensive, it would require preserving binder state at
the point of each function definition, and restoring it later. IMO this
is a relatively niche edge case, and we can simply "un-support" it
(there is a simple workaround, add an assert in function body). To be
clear, there are no problems with a much more common use of this
feature: preserving narrowing in _nested_ functions/lambdas.
* Support for `--disallow-incomplete-defs` in plugins doesn't work, see
`testDisallowIncompleteDefsAttrsPartialAnnotations`. I think this should
be not hard to fix (with some dedicated cleaner support). I can do this
in a follow-up PR soon.
* Around a dozen incremental tests are skipped in parallel mode because
order of error messages is more unstable now (which is expected). To be
clear, we still group errors per module, but order of modules is much
less predictable now. If there are no objections, I am going to ignore
order of modules when comparing errors in incremental tests in a
follow-up PR.
* When inferred type variable variance is not ready, we fall back to
covariance, see `testPEP695InferVarianceNotReadyWhenNeeded`. However,
when processing function/method bodies in a later phase, variance is
ready more often. Although this is an improvement, it creates an
inconsistency between parallel mode, and regular mode. I propose to
address this by making the two-phase logic default even without parallel
checking, see below.
* Finally, there are few edge cases with `--local-partial-types` when
behavior is different in parallel mode, see e.g.
`testLocalPartialTypesWithGlobalInitializedToNone`. Again the new
behavior is IMO clearly better. However, it again creates an
inconsistency with non-parallel mode. I propose to address this by
enabling two-phase (interface then implementation) checking whenever
`--local-partial-types` is enabled (globally, not per-file), even
without parallel checking. Since `--local-partial-types` will be default
behavior soon (and hopefully the only behavior at some point), this will
allow us to avoid discrepancies between parallel and regular checking.1 parent c51a81e commit 5e2ef78
30 files changed
Lines changed: 720 additions & 327 deletions
File tree
- misc
- mypy
- build_worker
- test
- test-data/unit
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
18 | 18 | | |
19 | 19 | | |
20 | 20 | | |
21 | | - | |
| 21 | + | |
22 | 22 | | |
23 | 23 | | |
24 | 24 | | |
| |||
69 | 69 | | |
70 | 70 | | |
71 | 71 | | |
| 72 | + | |
72 | 73 | | |
73 | 74 | | |
74 | 75 | | |
| |||
115 | 116 | | |
116 | 117 | | |
117 | 118 | | |
118 | | - | |
| 119 | + | |
| 120 | + | |
| 121 | + | |
| 122 | + | |
| 123 | + | |
| 124 | + | |
| 125 | + | |
| 126 | + | |
| 127 | + | |
| 128 | + | |
| 129 | + | |
| 130 | + | |
119 | 131 | | |
120 | 132 | | |
121 | 133 | | |
| |||
0 commit comments