Commit b5b6f7f

Merge pull request #915 from dhellmann/graph-subset
feat(graph): add subset command to extract package-related subgraphs
2 parents c1c9de6 + aeda662 commit b5b6f7f

4 files changed

Lines changed: 413 additions & 2 deletions

Lines changed: 123 additions & 0 deletions
@@ -0,0 +1,123 @@
+Extracting Graph Subsets
+========================
+
+The ``fromager graph subset`` command extracts a focused subgraph containing only the dependencies and dependents of a specific package. This is useful for understanding the impact scope of a particular package, debugging specific dependency issues, or creating smaller, more manageable graphs for analysis.
+
+Basic Usage
+-----------
+
+To extract a subset graph for a specific package:
+
+.. code-block:: bash
+
+   fromager graph subset <graph-file> <package-name>
+
+Example
+-------
+
+Using the example graph file from the e2e test, let's extract a subset for the ``keyring`` package:
+
+.. code-block:: bash
+
+   fromager graph subset e2e/build-parallel/graph.json keyring
+
+This command outputs a JSON graph containing:
+
+- The ``keyring`` package itself
+- All packages that depend on ``keyring``, directly or transitively (dependents)
+- All packages that ``keyring`` depends on, directly or transitively (dependencies)
+- The ROOT node if ``keyring`` is a top-level dependency
+
+The resulting subset will include packages like:
+
+- ``keyring==25.6.0`` (the target package)
+- ``imapautofiler==1.14.0`` (depends on keyring)
+- ``jaraco-classes==3.4.0`` (keyring dependency)
+- ``jaraco-context==6.0.1`` (keyring dependency)
+- ``jaraco-functools==4.1.0`` (keyring dependency)
+- And their transitive dependencies
+
+Version Filtering
+-----------------
+
+You can limit the subset to a specific version of the target package using the ``--version`` flag:
+
+.. code-block:: bash
+
+   fromager graph subset e2e/build-parallel/graph.json setuptools --version 80.8.0
+
+This is particularly useful when dealing with packages that have multiple versions in the graph, allowing you to focus on the relationships of a specific version.
+
+File Output
+-----------
+
+Save the subset graph to a file instead of printing to stdout:
+
+.. code-block:: bash
+
+   fromager graph subset e2e/build-parallel/graph.json jinja2 -o jinja2-subset.json
+
+The output file will be in the same JSON format as the original graph file and can be used as input to other ``fromager graph`` commands.
+
+Use Cases
+---------
+
+**Debugging Dependency Issues**
+   When a specific package is causing build problems, extract its subset to focus on just the relevant dependencies without the noise of the full graph.
+
+**Impact Analysis**
+   Before upgrading or removing a package, understand what other packages would be affected by examining its dependents.
+
+**Creating Focused Build Graphs**
+   Generate smaller graphs for specific components of your application, making it easier to understand and manage complex dependency trees.
+
+**Documentation and Communication**
+   Create focused dependency diagrams for specific packages when documenting or explaining system architecture to team members.
+
+**Performance Optimization**
+   When working with very large dependency graphs, extract subsets to improve performance of analysis tools and reduce memory usage.
+
+Example Workflow
+----------------
+
+Here's a typical workflow for investigating a package's dependencies:
+
+.. code-block:: bash
+
+   # Extract subset for a problematic package
+   fromager graph subset my-project-graph.json problematic-package -o debug-subset.json
+
+   # Visualize the subset
+   fromager graph to-dot debug-subset.json -o debug-subset.dot
+   dot -Tpng debug-subset.dot -o debug-subset.png
+
+   # Analyze why specific dependencies appear
+   fromager graph why debug-subset.json some-unexpected-dependency
+
+This workflow helps you quickly isolate and understand issues within a complex dependency tree.
+
+Output Format
+-------------
+
+The subset command preserves the original graph structure and format. The output is a valid dependency graph that:
+
+- Maintains all edge relationships between included nodes
+- Preserves requirement specifications and constraint information
+- Can be used as input to other graph commands
+- Is compatible with existing fromager workflows
+
+Error Handling
+--------------
+
+The command will report an error if:
+
+- The specified package is not found in the graph
+- The specified version of a package is not found
+- The graph file is invalid or corrupted
+
+Example error output:
+
+.. code-block:: bash
+
+   $ fromager graph subset e2e/build-parallel/graph.json nonexistent-package
+   Error: Package nonexistent-package not found in graph
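
Reviewer sketch (not part of the commit): the selection rule described in the new document can be illustrated on a toy graph. This is a minimal sketch that assumes a plain dictionary of edges rather than fromager's ``DependencyGraph``; the package names echo the documentation example, but the edges, the ``unrelated-app`` entry, and the helper names ``reachable`` and ``subset`` are hypothetical.

```python
from collections import deque

# Hypothetical toy graph: each key depends on the listed packages.
# The edges are illustrative only and are not taken from the e2e graph file.
DEPENDS_ON = {
    "ROOT": ["imapautofiler", "unrelated-app"],
    "imapautofiler": ["keyring", "jinja2"],
    "keyring": ["jaraco-classes", "jaraco-functools"],
    "jaraco-classes": ["more-itertools"],
    "jaraco-functools": ["more-itertools"],
    "jinja2": ["markupsafe"],
    "unrelated-app": ["markupsafe"],
    "more-itertools": [],
    "markupsafe": [],
}


def reachable(start, neighbors):
    """Return every node reachable from start by following neighbors."""
    seen = set()
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in neighbors.get(current, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def subset(target, depends_on):
    """Target plus everything above it (dependents) and below it (dependencies)."""
    # Invert the edges so dependents can be walked with the same helper.
    required_by = {name: [] for name in depends_on}
    for parent, children in depends_on.items():
        for child in children:
            required_by[child].append(parent)
    return {target} | reachable(target, required_by) | reachable(target, depends_on)


keyring_subset = subset("keyring", DEPENDS_ON)
print(sorted(keyring_subset))
```

In this toy graph ``jinja2`` is left out even though ``imapautofiler`` depends on it: only the target's own dependencies are followed downward, and dependents contribute only themselves and their ancestors.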

docs/how-tos/graph-commands/index.rst

Lines changed: 3 additions & 2 deletions
@@ -9,17 +9,18 @@ All examples use the sample graph file ``e2e/build-parallel/graph.json`` which c
    :maxdepth: 1
    :glob:
 
-   [uvw]*
+   [euvw]*
 
 Overview of Graph Commands
 --------------------------
 
 The ``fromager graph`` command group provides several subcommands for analyzing dependency graphs:
 
+- ``subset``: Extract a focused subgraph containing only dependencies and dependents of a specific package
 - ``why``: Understand why a package appears in the dependency graph
 - ``to-dot``: Convert graph to DOT format for visualization with Graphviz
 - ``explain-duplicates``: Analyze multiple versions of packages in the graph
 - ``to-constraints``: Convert graph to constraints file format
 - ``migrate-graph``: Convert old graph formats to the current format
 
-These tools help you understand complex dependency relationships, debug unexpected dependencies, and create visual representations of your build requirements.
+These tools help you understand complex dependency relationships, debug unexpected dependencies, create focused subgraphs for analysis, and create visual representations of your build requirements.
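
Reviewer sketch (not part of the commit): the toctree pattern change from ``[uvw]*`` to ``[euvw]*`` widens the set of document names picked up by ``:glob:``. The snippet below approximates that matching with Python's ``fnmatch`` module. Sphinx uses its own matcher, so treat this as an approximation, and note that the document names are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical document names; only the leading letter matters here.
doc_names = ["extract-subset", "using-why", "visualizing", "index"]

old_pattern = "[uvw]*"
new_pattern = "[euvw]*"

# A bracket expression matches exactly one character from the set,
# and the trailing * matches the rest of the name.
old_matches = [name for name in doc_names if fnmatchcase(name, old_pattern)]
new_matches = [name for name in doc_names if fnmatchcase(name, new_pattern)]

print(old_matches)
print(new_matches)
```

Only names starting with ``e`` are newly matched; ``index`` stays excluded under both patterns.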

src/fromager/commands/graph.py

Lines changed: 177 additions & 0 deletions
@@ -459,6 +459,183 @@ def why(
         find_why(graph, node, depth, 0, requirement_type)
 
 
+@graph.command()
+@click.option(
+    "-o",
+    "--output",
+    type=clickext.ClickPath(),
+    help="Output file path for the subset graph",
+)
+@click.option(
+    "--version",
+    type=clickext.PackageVersion(),
+    help="Limit subset to specific version of the package",
+)
+@click.argument(
+    "graph-file",
+    type=str,
+)
+@click.argument("package-name", type=str)
+@click.pass_obj
+def subset(
+    wkctx: context.WorkContext,
+    graph_file: str,
+    package_name: str,
+    output: pathlib.Path | None,
+    version: Version | None,
+) -> None:
+    """Extract a subset of a build graph related to a specific package.
+
+    Creates a new graph containing only nodes that depend on the specified package
+    and the dependencies of that package. By default includes all versions of the
+    package, but can be limited to a specific version with --version.
+    """
+    try:
+        graph = DependencyGraph.from_file(graph_file)
+        subset_graph = extract_package_subset(graph, package_name, version)
+
+        if output:
+            with open(output, "w") as f:
+                subset_graph.serialize(f)
+        else:
+            subset_graph.serialize(sys.stdout)
+    except ValueError as e:
+        raise click.ClickException(str(e)) from e
+
+
+def extract_package_subset(
+    graph: DependencyGraph,
+    package_name: str,
+    version: Version | None = None,
+) -> DependencyGraph:
+    """Extract a subset of the graph containing nodes related to a specific package.
+
+    Creates a new graph containing:
+    - All nodes matching the package name (optionally filtered by version)
+    - All nodes that depend on the target package (dependents)
+    - All dependencies of the target package
+
+    Args:
+        graph: The source dependency graph
+        package_name: Name of the package to extract subset for
+        version: Optional version to filter target nodes
+
+    Returns:
+        A new DependencyGraph containing only the related nodes
+
+    Raises:
+        ValueError: If package not found in graph
+    """
+    # Find target nodes matching the package name
+    target_nodes = graph.get_nodes_by_name(package_name)
+    if version:
+        target_nodes = [node for node in target_nodes if node.version == version]
+
+    if not target_nodes:
+        version_msg = f" version {version}" if version else ""
+        raise ValueError(f"Package {package_name}{version_msg} not found in graph")
+
+    # Collect all related nodes
+    related_nodes: set[str] = set()
+
+    # Add target nodes
+    for node in target_nodes:
+        related_nodes.add(node.key)
+
+    # Traverse up to find dependents (what depends on our package)
+    visited_up: set[str] = set()
+    for target_node in target_nodes:
+        _collect_dependents(target_node, related_nodes, visited_up)
+
+    # Traverse down to find dependencies (what our package depends on)
+    visited_down: set[str] = set()
+    for target_node in target_nodes:
+        _collect_dependencies(target_node, related_nodes, visited_down)
+
+    # Create new graph with only related nodes
+    subset_graph = DependencyGraph()
+    _build_subset_graph(graph, subset_graph, related_nodes)
+
+    return subset_graph
+
+
+def _collect_dependents(
+    node: DependencyNode,
+    related_nodes: set[str],
+    visited: set[str],
+) -> None:
+    """Recursively collect all nodes that depend on the given node."""
+    if node.key in visited:
+        return
+    visited.add(node.key)
+
+    for parent_edge in node.parents:
+        parent_node = parent_edge.destination_node
+        related_nodes.add(parent_node.key)
+        _collect_dependents(parent_node, related_nodes, visited)
+
+
+def _collect_dependencies(
+    node: DependencyNode,
+    related_nodes: set[str],
+    visited: set[str],
+) -> None:
+    """Recursively collect all dependencies of the given node."""
+    if node.key in visited:
+        return
+    visited.add(node.key)
+
+    for child_edge in node.children:
+        child_node = child_edge.destination_node
+        related_nodes.add(child_node.key)
+        _collect_dependencies(child_node, related_nodes, visited)
+
+
+def _build_subset_graph(
+    source_graph: DependencyGraph,
+    target_graph: DependencyGraph,
+    included_nodes: set[str],
+) -> None:
+    """Build the subset graph with only the included nodes and their edges."""
+    # First pass: add all included nodes
+    for node_key in included_nodes:
+        source_node = source_graph.nodes[node_key]
+        if node_key == ROOT:
+            continue  # ROOT is already created in the new graph
+
+        # Add the node to target graph
+        target_graph._add_node(
+            req_name=source_node.canonicalized_name,
+            version=source_node.version,
+            download_url=source_node.download_url,
+            pre_built=source_node.pre_built,
+            constraint=source_node.constraint,
+        )
+
+    # Second pass: add edges between included nodes
+    for node_key in included_nodes:
+        source_node = source_graph.nodes[node_key]
+        for child_edge in source_node.children:
+            child_key = child_edge.destination_node.key
+            # Only add edge if both parent and child are in the subset
+            if child_key in included_nodes:
+                child_node = child_edge.destination_node
+                target_graph.add_dependency(
+                    parent_name=source_node.canonicalized_name
+                    if source_node.canonicalized_name
+                    else None,
+                    parent_version=source_node.version
+                    if source_node.canonicalized_name
+                    else None,
+                    req_type=child_edge.req_type,
+                    req=child_edge.req,
+                    req_version=child_node.version,
+                    download_url=child_node.download_url,
+                    pre_built=child_node.pre_built,
+                    constraint=child_node.constraint,
+                )
+
+
 def find_why(
     graph: DependencyGraph,
     node: DependencyNode,
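
Reviewer sketch (not part of the commit): ``_collect_dependents`` and ``_collect_dependencies`` recurse once per node, so a very long dependency chain could in principle approach Python's default recursion limit. If that ever becomes a concern, the same traversal can be written with an explicit stack. The sketch below uses hypothetical stand-in classes ``Node`` and ``Edge`` that mirror only the attributes the commit relies on (``key``, ``parents``, ``children``, ``destination_node``); ``link`` and ``collect`` are illustrative names, not fromager APIs.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Edge:
    # Stand-in edge: only the attribute the traversal reads.
    destination_node: Node


@dataclass
class Node:
    # Stand-in node: a key plus parent and child edge lists.
    key: str
    parents: list[Edge] = field(default_factory=list)
    children: list[Edge] = field(default_factory=list)


def link(parent: Node, child: Node) -> None:
    """Record one dependency edge in both directions."""
    parent.children.append(Edge(child))
    child.parents.append(Edge(parent))


def collect(start: Node, direction: str) -> set[str]:
    """Collect keys reachable through 'parents' or 'children' without recursion."""
    related: set[str] = set()
    visited: set[str] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node.key in visited:
            continue
        visited.add(node.key)
        for edge in getattr(node, direction):
            related.add(edge.destination_node.key)
            stack.append(edge.destination_node)
    return related


# Hypothetical four-node chain: ROOT -> app -> lib -> util.
root, app, lib, util = Node("ROOT"), Node("app==1.0"), Node("lib==2.0"), Node("util==3.0")
link(root, app)
link(app, lib)
link(lib, util)

dependents = collect(lib, "parents")
dependencies = collect(lib, "children")
print(sorted(dependents), sorted(dependencies))
```

As in the recursive helpers, the start node's own key is not added by the traversal; the caller is expected to add the target keys separately.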
