Skip to content

Commit 5e2ef78

Browse files
authored
Split type-checking into interface and implementation in parallel workers (#21119)
The general idea is very straightforward: when doing type-checking, we first type-check only module top-levels and those functions/methods that define/infer externally visible variables. Then we write cache and send new interface hash back to coordinator to unblock more SCCs early. This makes parallel type-checking ~25% faster. However, this simple idea surfaced multiple quirks and old hacks. I address some of them in this PR, but I decided to handle the rest in follow up PR(s) to limit the size of this one. First, important implementation details: * On each `select()` call, coordinator collects all responses, both interface and implementation ones, and processes them as a single batch. This simplifies reasoning and shouldn't affect performance. * We need to write indirect dependencies to a separate cache file, since they are only known after processing function bodies. I combine them together with error messages in files called `foo.meta_ex.ff`. Not 100% sure about the name, couldn't find anything more meaningful. * Overload signatures are now processed as part of the top-level in type checker. This is a big change, but it is unavoidable and it didn't cause any problems with the daemon. * Initializers (default values of function arguments) are now processed as part of the top-levels (to match runtime semantics). Btw @hauntsaninja you optimized them away in some cases, I am not sure this is safe in presence of walrus, see e.g. `testWalrus`. * `local_definitions()` now do not yield methods of classes nested in functions. We add such methods to both symbol table of their actual class, and to the module top-level symbol table, thus causing double-processing. Now some smaller things I already fixed: * We used to have _three_ scoping systems to track current class in type checker. One existed purely for the purpose of `TypeForm` support. I think two is enough, so I deleted the last one. * `AwaitableGenerator` return type wrapping used to happen during processing of function body, which is obviously wrong. * Invalid function redefinitions sometimes caused duplicate errors in case of partial types/deferrals. Now they should not, as I explicitly skip them after emitting first error. * Some generated methods were not marked as such. Now they are. Finally, some remaining problems and how I propose to address them in followups: * Narrowing of final global variables is not preserved in functions anymore, see `testNarrowingOfFinalPersistsInFunctions`. Supporting this will be tricky/expensive, it would require preserving binder state at the point of each function definition, and restoring it later. IMO this is a relatively niche edge case, and we can simply "un-support" it (there is a simple workaround, add an assert in function body). To be clear, there are no problems with a much more common use of this feature: preserving narrowing in _nested_ functions/lambdas. * Support for `--disallow-incomplete-defs` in plugins doesn't work, see `testDisallowIncompleteDefsAttrsPartialAnnotations`. I think this should be not hard to fix (with some dedicated cleaner support). I can do this in a follow-up PR soon. * Around a dozen incremental tests are skipped in parallel mode because order of error messages is more unstable now (which is expected). To be clear, we still group errors per module, but order of modules is much less predictable now. If there are no objections, I am going to ignore order of modules when comparing errors in incremental tests in a follow-up PR. * When inferred type variable variance is not ready, we fall back to covariance, see `testPEP695InferVarianceNotReadyWhenNeeded`. However, when processing function/method bodies in a later phase, variance is ready more often. Although this is an improvement, it creates an inconsistency between parallel mode, and regular mode. I propose to address this by making the two-phase logic default even without parallel checking, see below. * Finally, there are few edge cases with `--local-partial-types` when behavior is different in parallel mode, see e.g. `testLocalPartialTypesWithGlobalInitializedToNone`. Again the new behavior is IMO clearly better. However, it again creates an inconsistency with non-parallel mode. I propose to address this by enabling two-phase (interface then implementation) checking whenever `--local-partial-types` is enabled (globally, not per-file), even without parallel checking. Since `--local-partial-types` will be default behavior soon (and hopefully the only behavior at some point), this will allow us to avoid discrepancies between parallel and regular checking.
1 parent c51a81e commit 5e2ef78

30 files changed

Lines changed: 720 additions & 327 deletions

misc/diff-cache.py

Lines changed: 14 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -18,7 +18,7 @@
1818
from librt import base64
1919
from librt.internal import ReadBuffer, WriteBuffer
2020

21-
from mypy.cache import CacheMeta
21+
from mypy.cache import CacheMeta, CacheMetaEx
2222
from mypy.metastore import FilesystemMetadataStore, MetadataStore, SqliteMetadataStore
2323
from mypy.util import json_dumps, json_loads
2424

@@ -69,6 +69,7 @@ def normalize_meta(meta: CacheMeta) -> None:
6969
7070
Zero out mtimes and sort dependencies deterministically.
7171
"""
72+
# TODO: handle dep_hashes here and in relevant parts below.
7273
meta.mtime = 0
7374
meta.data_mtime = 0
7475
meta.dependencies, meta.suppressed, meta.dep_prios, meta.dep_lines = sort_deps(
@@ -115,7 +116,18 @@ def load(cache: MetadataStore, s: str) -> Any:
115116
return data
116117
normalize_meta(meta)
117118
return serialize_meta_ff(meta, version_prefix)
118-
if s.endswith((".data.ff", ".err.ff")):
119+
if s.endswith(".meta_ex.ff"):
120+
buf = ReadBuffer(data)
121+
meta = CacheMetaEx.read(buf)
122+
if meta is None:
123+
# Can't deserialize. Fall back to raw bytes as above
124+
return data
125+
meta.dependencies.sort()
126+
meta.suppressed.sort()
127+
outbuf = WriteBuffer()
128+
meta.write(outbuf)
129+
return outbuf.getvalue()
130+
if s.endswith(".data.ff"):
119131
return data
120132
obj = json_loads(data)
121133
if s.endswith(".meta.json"):

0 commit comments

Comments
 (0)