Skip to content

Send early errors to workers#20822

Open
ilevkivskyi wants to merge 2 commits intopython:masterfrom
ilevkivskyi:send-errors-to-worers
Open

Send early errors to workers#20822
ilevkivskyi wants to merge 2 commits intopython:masterfrom
ilevkivskyi:send-errors-to-worers

Conversation

@ilevkivskyi
Copy link
Member

@ilevkivskyi ilevkivskyi commented Feb 16, 2026

Ref #933

When parallel parsing is ready, we will not run load_graph() in each worker. So, we need to send early errors (mostly import errors) to the relevant workers. Couple notes here:

  • Now only_once messages are shown only once per worker. It is hard to fix, and I don't think we should really bother. It doesn't affect correctness, and may only slightly increase noise. Btw they were never truly only_once, when reading errors from cache we can get multiple occurrences of the same error/note.
  • While I am at it, I switch few more error to the standard file.py: error: some message format.

@github-actions

This comment has been minimized.

@ilevkivskyi ilevkivskyi marked this pull request as ready for review February 16, 2026 13:45
@github-actions
Copy link
Contributor

Diff from mypy_primer, showing the effect of this PR on open source code:

attrs (https://github.com/python-attrs/attrs)
- mypy: can't read file 'tests/typing_example.py': No such file or directory
+ mypy: error: cannot read file 'tests/typing_example.py': No such file or directory

@ilevkivskyi
Copy link
Member Author

ilevkivskyi commented Feb 16, 2026

Btw @hauntsaninja mypy_primer output looks like we aren't really checking attrs now (but it is probably important). To clear, that error is a blocker.

Copy link
Collaborator

@JukkaL JukkaL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some questions, mostly about things that I don't fully understand. Otherwise looks good.

try:
for id in scc.mod_ids:
state = graph[id]
# Extra step below is needed only because we are using local graph.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment wasn't clear to me. I'm not sure which extra step this refers to (parse_file or the state.tree is None check?).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It refers to the whole if/else block. It can be avoided if we would not call load_graph() in the worker.

# Extra step below is needed only because we are using local graph.
# TODO: clone options when switching to coordinator graph.
if state.tree is None:
state.parse_file()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we parsing the file here? Is this related to error processing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we want to get the side effect of setting ignored/skipped lines.

source_set = BuildSourceSet(sources)
cached_read = fscache.read
errors = Errors(options, read_source=lambda path: read_py_file(path, cached_read))
# Record import errors so that they sn be replayed by the workers.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: sn

# We don't need parsed trees in coordinator process, we parse only to
# compute dependencies.
state.tree = None
if not temporary:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we only have the del statement behind the not temporary condition?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we need the tree until it is used by the caller who created temporary state to check for is_partial_stub_package, I will delete it there to make it more clear.

util.hard_exit(0)


def serve(server: IPCServer, ctx: ServerContext) -> None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated: Add a docstring here?


for id in graph:
manager.import_map[id] = set(graph[id].dependencies + graph[id].suppressed)
manager.import_map[id] = graph[id].dependencies_set
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change related to error processing or the earlier changes to how we process suppressed modules?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a minor perf optimization (we never check for imports of suppressed deps after graph is loaded). I was scrolling past this line and decided it is not worth a separate PR.

# Cache for transitive dependency check (expensive).
self.transitive_deps_cache: dict[tuple[int, int], bool] = {}
# Resolved paths for each module in build.
self.path_by_id: dict[str, str] = {}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this because we now delete modules in the middle of a build, or is there another reason for maintaing this here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is mostly for convenience, to avoid passing graph around just for the purpose of mapping id to xpath. In most cases we identify modules by id, but in error-tracking-related code it is always xpath.

def __init__(self, *, graph: Graph, missing_modules: set[str]) -> None:
self.graph = graph
self.missing_modules = missing_modules
self.from_cache = {mod_id for mod_id in graph if graph[mod_id].meta}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we might calculate this on demand using graph, and it wouldn't be much slower than a set lookup. Why are we storing this separately here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The receiver will not be able to do this after we stop calling load_graph() there. This is because we don't serialize State.meta (it would double the size). The worker only needs to know whether the state is from cache or was already parsed by coordinator. I will try to restructure some code in other places to make this more obvious.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants