Skip to content

Several more atom mapping improvements#839

Merged
kfir4444 merged 21 commits into
mainfrom
atom_mapping_n
Jun 21, 2026
Merged

Several more atom mapping improvements#839
kfir4444 merged 21 commits into
mainfrom
atom_mapping_n

Conversation

@kfir4444

Copy link
Copy Markdown
Collaborator

This PR improves several aspects of atom mappings.
two key fixes:

  1. Stopping conformer generation in ARCSpecies._scissors. This is a major issue that improves both accuracy and performence:
    1. Performence - less RDKit calls
    2. Accuracy - Generating conformers distorts the XYZ of the scissored products.
      Moreover, this is critical for cyclic species, since debugging showed we did not include the original XYZ in the ARCSpecies, which caused errors.
  2. the identify_superimposable_candidates, which was being called $\mathcal{O}(N^2)$ times and reduces to $\mathcal{O}(N)$. More calls are not required, and does not increase accuracy. (There is some more math behind it, and it can be discussed offline @alongd).

Other fixes added, and more generalization of AM testing (manually verified)

@kfir4444

Copy link
Copy Markdown
Collaborator Author

Added a few more commits:

  1. Mapping Performance Optimization (arc/family/family.py): Significantly improved atom-mapping performance (achieving a ~2.2x speedup) by implementing a global ReactionFamily cache, refactoring initialization to load only necessary leaf-level entries, and introducing robust recursive regex parsing for RMG group files.
  2. Robust Heuristics Testing (arc/job/adapters/ts/heuristics_test.py): Updated the get_new_zmat_2_map test to use atom isomorphism verification instead of rigid dictionary comparisons. This ensures tests pass even when valid atom mappings differ in order due to molecular symmetry.
  3. Reaction Family Detection Logic (arc/reaction/reaction.py): Optimized the ARCReaction class to cache family determination results more effectively, including a flag to prevent redundant searches for reactions where no family is found.
  4. Species Test Correction (arc/species/species_test.py): Adjusted the test_scissors verification logic to correctly handle isomorphism checks for cyclic species fragments.

@codecov

codecov Bot commented Mar 28, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 63.03%. Comparing base (96893fc) to head (a56bd2c).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #839      +/-   ##
==========================================
- Coverage   63.05%   63.03%   -0.03%     
==========================================
  Files         114      114              
  Lines       38069    38178     +109     
  Branches     9967     9990      +23     
==========================================
+ Hits        24005    24065      +60     
- Misses      11175    11221      +46     
- Partials     2889     2892       +3     
Flag Coverage Δ
functionaltests 63.03% <ø> (-0.03%) ⬇️
unittests 63.03% <ø> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Comment thread arc/family/family.py Fixed
@kfir4444 kfir4444 force-pushed the atom_mapping_n branch 2 times, most recently from b3dfac2 to 2be1521 Compare June 6, 2026 20:07
@kfir4444

kfir4444 commented Jun 6, 2026

Copy link
Copy Markdown
Collaborator Author

The previous error in CI was caused by get_entries incorrectly including intermediate Logical OR groups (e.g., OR{X_H, Xrad_H}) in the result dictionary when recursing through the RMG database hierarchy. Fixed now.

@alongd alongd left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Please see some comments

Comment thread arc/mapping/engine.py
pairs.append((react, prod))
p_cuts.pop(idx)
found = False
for res in res1:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can res1 be None? Shall we protect against it?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment wasn't addressed. take a look?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it was see L#1193

Comment thread arc/mapping/engine.py
Comment thread arc/family/family.py Outdated
Comment thread arc/family/family.py
Comment thread arc/species/species.py Outdated
Comment thread arc/mapping/engine.py Outdated
Comment thread arc/family/family.py Outdated
from typing import TYPE_CHECKING
from __future__ import annotations

from typing import Dict, List, Optional, Tuple, TYPE_CHECKING

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need those in Python 3.14. remove (just keep TYPE_CHECKING as in the original) and rebuild the env?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to read more about the new way python deals with typing. For now, I'm reverting these.

Comment thread arc/family/family.py Outdated
Comment thread arc/species/species_test.py Dismissed
Comment thread arc/species/species_test.py Dismissed
Comment thread arc/family/family.py Outdated
Comment thread arc/family/family.py Outdated
break
return specific_entries
groups_str = "\n" + "".join(groups_as_lines)
# Split by entry( but keep the delimiter-ish part

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see the comment, seems broken? "("

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that's the regex expression (L942)

@kfir4444 kfir4444 force-pushed the atom_mapping_n branch 2 times, most recently from 970bd46 to 10d6f26 Compare June 14, 2026 13:46
Comment thread arc/common.py Dismissed
alongd
alongd previously approved these changes Jun 21, 2026

@alongd alongd left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

kfir4444 added 7 commits June 21, 2026 08:29
…ns and multiplicity

When a bond in a cyclic molecule is broken (ring-opening), the resulting fragment must have new radical electrons assigned to the atoms that were part of the severed bond. This commit fixes _scissors() to:
1. Detect ring-opening cases (len(mol_splits) == 1).
2. Assign radical electrons to the cut atoms based on their missing valency.
3. Update the fragment's multiplicity.
4. Set keep_mol=True and provide final_xyz to the new ARCSpecies to ensure proper initialization.
This prevents ValueError and SpeciesError during molecule perception when mapping reactions involving ring-opening/closing.
Added a check for None results from cut_species_based_on_atom_indices in map_rxn. If scission fails for reactants or products, the function now attempts to fall back to the next dictionary template (if available)
or returns None with a logged error, preventing a TypeError when calling update_xyz on a None object.
Updated benzene scission tests to allow for multiple valid atom mappings due to the symmetry of the molecule. Replacing assertEqual with assertIn ensures the tests pass if any of the chemically equivalent valid maps are returned.
This was also verified with the NEB, achieving the same TS with both AMs.
This commit introduces several optimizations to the atom mapping algorithm:
1.  identify_superimposable_candidates: Reduced algorithmic complexity from O(N^2) to O(N) by starting graph traversal from only the first heavy atom. For connected molecular graphs, this is sufficient to find all valid mappings (including symmetries) and significantly reduces redundant DFS calls.
2.  prune_identical_dicts: Fixed a logic bug where dictionaries were incorrectly pruned if they shared a single key-value pair. Now correctly uses exact dictionary equality.
3.  pairing_reactants_and_products_for_mapping: Pre-calculates resonance structures for all reactant fragments once before the pairing loop, avoiding redundant O(R*P) computations.
4.  copy_species_list_for_mapping: Replaced the expensive spc.copy() (which uses dictionary serialization) with a lighter direct ARCSpecies instantiation to reduce overhead during the mapping process.
safe generate_resonance_structures_safely
…erception

Modified ARCSpecies.__init__ to skip the expensive mol_from_xyz() call (which performs molecule perception from 3D coordinates) when a valid Molecule object is already provided and the keep_mol flag is set. This significantly reduces initialization overhead during scission and atom mapping operations where the molecular graph is already known.
kfir4444 added 14 commits June 21, 2026 08:29
To facilitate new scissors logic
- Implement global ReactionFamily caching to avoid redundant instantiation.
- Refactor ReactionFamily initialization to selectively load only leaf-level entries
  identified from the template, significantly reducing object creation overhead.
- Improve RMG group file parsing robustness and performance using pre-compiled
  regex and recursive entry extraction for OR complexes.
- Refine product generation logic to use pre-loaded groups, avoiding repeated
  parsing and object creation during mapping.
…on families

- Fix generate_bimolecular_products to correctly handle swapped reactant orders relative to the family template while maintaining original molecule sequence for product generation.
- Fix descent_complex_group to robustly split OR complexes by stripping whitespace from labels.
- Fix typo logger.Error -> logger.error in ARCReaction multiplicity property.
- Refine selective entry loading to traverse and preserve all necessary intermediate nodes for OR complex resolution.
ARC attempts to define a Z-matrix dihedral angle by locating a 4-atom sequence  n  -  C  -  B  -  A  that is not linear.
  If the candidate atom  A  is terminal (only connected to  B ) and the angle  A  -  B  -  C  is linear, the code attempts to retrieve a dummy atom on  B  from  connectivity  by  checking for the symbol  'X' :
  ```python
    x_neighbor = [neighbor for neighbor in b_neighbors if xyz['symbols'][neighbor] == 'X'][0]
```
However, in cases like  C#CC=CC=C  isomerization where the molecular builder has not yet placed/needed a dummy atom on  B  (e.g.  C1  has no dummy atom because the angle  C1  -  C2  -  C3  is not linear, and  C0  has not yet been fully integrated when defining  C1 ), the dummy atom  'X'  does not exist in  b_neighbors . This resulted in an empty list
  slice  [0]  raising  IndexError: list index out of range , crashing the Z-matrix construction and preventing the reaction mapping engine from finding the atom map.
wrapping the dummy atom lookup and mapping checks in a  try-except  block for  (IndexError, ValueError) . If the dummy atom cannot be resolved on  B , the search now cleanly breaks from the inner loop to evaluate other candidate atoms instead of crashing.
Greedy algorithms cost in success rate. Change to a backtracking logic.
Due to a large amount of them during mapping procedures.
The tests are hard-coded, therefore failures will occure in symmetric species, unless less rigidity is added and all cases accounted for.
@kfir4444 kfir4444 merged commit 2d687fe into main Jun 21, 2026
7 of 8 checks passed
@kfir4444 kfir4444 deleted the atom_mapping_n branch June 21, 2026 07:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants