Skip to content

Fix Xygeni parser deduplicating repeated SAST/Secrets findings in the same file#15003

Open
lmrb-1968 wants to merge 2 commits into
DefectDojo:bugfixfrom
xygeni:fix/xygeni-duplicate-occurrences
Open

Fix Xygeni parser deduplicating repeated SAST/Secrets findings in the same file#15003
lmrb-1968 wants to merge 2 commits into
DefectDojo:bugfixfrom
xygeni:fix/xygeni-duplicate-occurrences

Conversation

@lmrb-1968

@lmrb-1968 lmrb-1968 commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

⚠️ Pre-Approval check ⚠️

This is a bug fix for an existing parser (the Xygeni parser added in #14769), which falls under the "Acceptable examples" in CONTRIBUTING (bug fix for an existing parser), so no separate pre-approval issue is filed.

Description

The Xygeni parser deduplicated away legitimate findings when the same secret value or code pattern appeared more than once in a single file, so only one occurrence showed up after import.

Root cause: Xygeni reuses a single uniqueHash across every occurrence of the same value in a file (it hashes the value, not the location), while giving each occurrence a distinct issueId that encodes the file path and line. The SAST and Secrets scan types deduplicate on unique_id_from_tool, which was set to uniqueHash — so occurrences after the first were marked as duplicates and hidden.

Fix: for SAST and Secrets, key unique_id_from_tool on the per-occurrence issueId (falling back to uniqueHash if issueId is absent), and keep uniqueHash as vuln_id_from_tool so it still groups occurrences of the same value. Each occurrence is now its own Finding.

SCA is intentionally left unchanged: there uniqueHash (CVE#package:version) is unique per finding, while issueId (SCA.CVE-…) collides across packages, so uniqueHash remains the correct dedup key.

No model, settings, or migration changes.

Test results

Extended unittests/tools/test_xygeni_parser.py:

  • Updated the existing SAST/Secrets/dispatch assertions to the new unique_id_from_tool (issueId) values and added vuln_id_from_tool == uniqueHash checks.
  • Added test_secrets_repeated_in_same_file_stay_distinct — a secret repeated on lines 9 and 29 of one file now yields two findings with distinct unique_id_from_tool.
  • Added test_sast_repeated_detector_in_same_file_stays_distinct — a detector flagged on four lines now yields four distinct findings.

Against the checked-in fixtures the parser now produces 61/61 distinct ids for Secrets and 501/501 for SAST (previously several collapsed). Ruff (0.15.15) passes on all changed files.

Documentation

Updated docs/content/supported_tools/parsers/file/xygeni.md — the Deduplication section now shows the per-scan-type unique_id_from_tool / vuln_id_from_tool mapping and explains why SAST/Secrets key on issueId while SCA keys on uniqueHash.

Checklist

  • Make sure to rebase your PR against the very latest dev. (Rebased onto the latest bugfix, the correct base for a bug fix.)
  • Bugfixes should be submitted against the bugfix branch.
  • Give a meaningful name to your PR, as it may end up being used in the release notes.
  • Your code is Ruff compliant (see ruff.toml).
  • Your code is python 3.13 compliant.
  • If this is a new feature and not a bug fix, you've included the proper documentation. (Not a new feature; docs updated anyway.)
  • Model changes must include the necessary migrations in the dojo/db_migrations folder. (No model changes.)
  • Add applicable tests to the unit tests.
  • Add the proper label to categorize your PR. (bugfix)

Dedup on the per-occurrence issueId instead of uniqueHash (which Xygeni reuses
across occurrences of the same value), keeping uniqueHash as vuln_id_from_tool.
SCA is unchanged. Adds regression tests and updates the parser docs.

@valentijnscholten valentijnscholten left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into this and the PR. The changes does mean that users will have to take a one time "hit" as existing findings will be closed when a reimport occurs where the unique_id_from_tool will now no longer match. Do you see any way around it? If not we'll need some notes in the upgrade notes for 2.59.1 (version number to be determined).
Another approach would be to create a V2 parser like we have done in other places. My feeling is for this parser it would be OK to take the one time hit.

@valentijnscholten valentijnscholten added this to the 2.59.2 milestone Jun 14, 2026
@lmrb-1968

Copy link
Copy Markdown
Contributor Author

Thanks, @valentijnscholten. I don't see a clean way around the one-time hit. The old findings were stored with unique_id_from_tool set to uniqueHash, and there's no data on existing findings to recompute the correct issueId from, so any migration would be guessing. Whatever the new key is, it won't line up with what's already in the database, so the first reimport after upgrading will close the old findings and create the corrected ones. After that, reimports match normally.

I agree the one-time hit is acceptable here rather than a V2 parser, since the old behavior was hiding real findings. I've added an upgrade note to docs/content/releases/os_upgrading/2.59.2.md , as milestone marked was 2.59.2, describing the reimport behavior (one-time close/recreate for SAST and Secrets tests, SCA unaffected). Happy to adjust the wording or move it if you'd prefer it land in a different version's notes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants