Several more atom mapping improvements#839
Conversation
|
Added a few more commits:
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #839 +/- ##
==========================================
- Coverage 63.05% 63.03% -0.03%
==========================================
Files 114 114
Lines 38069 38178 +109
Branches 9967 9990 +23
==========================================
+ Hits 24005 24065 +60
- Misses 11175 11221 +46
- Partials 2889 2892 +3
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
0efe1ce to
52d3fca
Compare
52d3fca to
0c62c03
Compare
b3dfac2 to
2be1521
Compare
|
The previous error in CI was caused by |
alongd
left a comment
There was a problem hiding this comment.
Thanks! Please see some comments
| pairs.append((react, prod)) | ||
| p_cuts.pop(idx) | ||
| found = False | ||
| for res in res1: |
There was a problem hiding this comment.
can res1 be None? Shall we protect against it?
There was a problem hiding this comment.
I think this comment wasn't addressed. take a look?
There was a problem hiding this comment.
I think it was see L#1193
| from typing import TYPE_CHECKING | ||
| from __future__ import annotations | ||
|
|
||
| from typing import Dict, List, Optional, Tuple, TYPE_CHECKING |
There was a problem hiding this comment.
We don't need those in Python 3.14. remove (just keep TYPE_CHECKING as in the original) and rebuild the env?
There was a problem hiding this comment.
I need to read more about the new way python deals with typing. For now, I'm reverting these.
| break | ||
| return specific_entries | ||
| groups_str = "\n" + "".join(groups_as_lines) | ||
| # Split by entry( but keep the delimiter-ish part |
There was a problem hiding this comment.
see the comment, seems broken? "("
There was a problem hiding this comment.
No, that's the regex expression (L942)
970bd46 to
10d6f26
Compare
…ns and multiplicity When a bond in a cyclic molecule is broken (ring-opening), the resulting fragment must have new radical electrons assigned to the atoms that were part of the severed bond. This commit fixes _scissors() to: 1. Detect ring-opening cases (len(mol_splits) == 1). 2. Assign radical electrons to the cut atoms based on their missing valency. 3. Update the fragment's multiplicity. 4. Set keep_mol=True and provide final_xyz to the new ARCSpecies to ensure proper initialization. This prevents ValueError and SpeciesError during molecule perception when mapping reactions involving ring-opening/closing.
Added a check for None results from cut_species_based_on_atom_indices in map_rxn. If scission fails for reactants or products, the function now attempts to fall back to the next dictionary template (if available) or returns None with a logged error, preventing a TypeError when calling update_xyz on a None object.
Updated benzene scission tests to allow for multiple valid atom mappings due to the symmetry of the molecule. Replacing assertEqual with assertIn ensures the tests pass if any of the chemically equivalent valid maps are returned. This was also verified with the NEB, achieving the same TS with both AMs.
This commit introduces several optimizations to the atom mapping algorithm: 1. identify_superimposable_candidates: Reduced algorithmic complexity from O(N^2) to O(N) by starting graph traversal from only the first heavy atom. For connected molecular graphs, this is sufficient to find all valid mappings (including symmetries) and significantly reduces redundant DFS calls. 2. prune_identical_dicts: Fixed a logic bug where dictionaries were incorrectly pruned if they shared a single key-value pair. Now correctly uses exact dictionary equality. 3. pairing_reactants_and_products_for_mapping: Pre-calculates resonance structures for all reactant fragments once before the pairing loop, avoiding redundant O(R*P) computations. 4. copy_species_list_for_mapping: Replaced the expensive spc.copy() (which uses dictionary serialization) with a lighter direct ARCSpecies instantiation to reduce overhead during the mapping process. safe generate_resonance_structures_safely
…erception Modified ARCSpecies.__init__ to skip the expensive mol_from_xyz() call (which performs molecule perception from 3D coordinates) when a valid Molecule object is already provided and the keep_mol flag is set. This significantly reduces initialization overhead during scission and atom mapping operations where the molecular graph is already known.
To facilitate new scissors logic
- Implement global ReactionFamily caching to avoid redundant instantiation. - Refactor ReactionFamily initialization to selectively load only leaf-level entries identified from the template, significantly reducing object creation overhead. - Improve RMG group file parsing robustness and performance using pre-compiled regex and recursive entry extraction for OR complexes. - Refine product generation logic to use pre-loaded groups, avoiding repeated parsing and object creation during mapping.
…on families - Fix generate_bimolecular_products to correctly handle swapped reactant orders relative to the family template while maintaining original molecule sequence for product generation. - Fix descent_complex_group to robustly split OR complexes by stripping whitespace from labels. - Fix typo logger.Error -> logger.error in ARCReaction multiplicity property. - Refine selective entry loading to traverse and preserve all necessary intermediate nodes for OR complex resolution.
ARC attempts to define a Z-matrix dihedral angle by locating a 4-atom sequence n - C - B - A that is not linear.
If the candidate atom A is terminal (only connected to B ) and the angle A - B - C is linear, the code attempts to retrieve a dummy atom on B from connectivity by checking for the symbol 'X' :
```python
x_neighbor = [neighbor for neighbor in b_neighbors if xyz['symbols'][neighbor] == 'X'][0]
```
However, in cases like C#CC=CC=C isomerization where the molecular builder has not yet placed/needed a dummy atom on B (e.g. C1 has no dummy atom because the angle C1 - C2 - C3 is not linear, and C0 has not yet been fully integrated when defining C1 ), the dummy atom 'X' does not exist in b_neighbors . This resulted in an empty list
slice [0] raising IndexError: list index out of range , crashing the Z-matrix construction and preventing the reaction mapping engine from finding the atom map.
wrapping the dummy atom lookup and mapping checks in a try-except block for (IndexError, ValueError) . If the dummy atom cannot be resolved on B , the search now cleanly breaks from the inner loop to evaluate other candidate atoms instead of crashing.
Greedy algorithms cost in success rate. Change to a backtracking logic.
Due to a large amount of them during mapping procedures.
The tests are hard-coded, therefore failures will occure in symmetric species, unless less rigidity is added and all cases accounted for.
This PR improves several aspects of atom mappings.
two key fixes:
ARCSpecies._scissors. This is a major issue that improves both accuracy and performence:Moreover, this is critical for cyclic species, since debugging showed we did not include the original XYZ in the ARCSpecies, which caused errors.
identify_superimposable_candidates, which was being calledOther fixes added, and more generalization of AM testing (manually verified)