Fix Xygeni parser deduplicating repeated SAST/Secrets findings in the same file by lmrb-1968 · Pull Request #15003 · DefectDojo/django-DefectDojo

lmrb-1968 · 2026-06-12T18:22:33Z

⚠️ Pre-Approval check ⚠️

This is a bug fix for an existing parser (the Xygeni parser added in #14769), which falls under the "Acceptable examples" in CONTRIBUTING (bug fix for an existing parser), so no separate pre-approval issue is filed.

Description

The Xygeni parser deduplicated away legitimate findings when the same secret value or code pattern appeared more than once in a single file, so only one occurrence showed up after import.

Root cause: Xygeni reuses a single uniqueHash across every occurrence of the same value in a file (it hashes the value, not the location), while giving each occurrence a distinct issueId that encodes the file path and line. The SAST and Secrets scan types deduplicate on unique_id_from_tool, which was set to uniqueHash — so occurrences after the first were marked as duplicates and hidden.

Fix: for SAST and Secrets, key unique_id_from_tool on the per-occurrence issueId (falling back to uniqueHash if issueId is absent), and keep uniqueHash as vuln_id_from_tool so it still groups occurrences of the same value. Each occurrence is now its own Finding.

SCA is intentionally left unchanged: there uniqueHash (CVE#package:version) is unique per finding, while issueId (SCA.CVE-…) collides across packages, so uniqueHash remains the correct dedup key.

No model, settings, or migration changes.

Test results

Extended unittests/tools/test_xygeni_parser.py:

Updated the existing SAST/Secrets/dispatch assertions to the new unique_id_from_tool (issueId) values and added vuln_id_from_tool == uniqueHash checks.
Added test_secrets_repeated_in_same_file_stay_distinct — a secret repeated on lines 9 and 29 of one file now yields two findings with distinct unique_id_from_tool.
Added test_sast_repeated_detector_in_same_file_stays_distinct — a detector flagged on four lines now yields four distinct findings.

Against the checked-in fixtures the parser now produces 61/61 distinct ids for Secrets and 501/501 for SAST (previously several collapsed). Ruff (0.15.15) passes on all changed files.

Documentation

Updated docs/content/supported_tools/parsers/file/xygeni.md — the Deduplication section now shows the per-scan-type unique_id_from_tool / vuln_id_from_tool mapping and explains why SAST/Secrets key on issueId while SCA keys on uniqueHash.

Checklist

Make sure to rebase your PR against the very latest dev. (Rebased onto the latest bugfix, the correct base for a bug fix.)
Bugfixes should be submitted against the bugfix branch.
Give a meaningful name to your PR, as it may end up being used in the release notes.
Your code is Ruff compliant (see ruff.toml).
Your code is python 3.13 compliant.
If this is a new feature and not a bug fix, you've included the proper documentation. (Not a new feature; docs updated anyway.)
Model changes must include the necessary migrations in the dojo/db_migrations folder. (No model changes.)
Add applicable tests to the unit tests.
Add the proper label to categorize your PR. (bugfix)

Dedup on the per-occurrence issueId instead of uniqueHash (which Xygeni reuses across occurrences of the same value), keeping uniqueHash as vuln_id_from_tool. SCA is unchanged. Adds regression tests and updates the parser docs.

valentijnscholten

Thanks for looking into this and the PR. The changes does mean that users will have to take a one time "hit" as existing findings will be closed when a reimport occurs where the unique_id_from_tool will now no longer match. Do you see any way around it? If not we'll need some notes in the upgrade notes for 2.59.1 (version number to be determined).
Another approach would be to create a V2 parser like we have done in other places. My feeling is for this parser it would be OK to take the one time hit.

lmrb-1968 · 2026-06-15T16:31:02Z

Thanks, @valentijnscholten. I don't see a clean way around the one-time hit. The old findings were stored with unique_id_from_tool set to uniqueHash, and there's no data on existing findings to recompute the correct issueId from, so any migration would be guessing. Whatever the new key is, it won't line up with what's already in the database, so the first reimport after upgrading will close the old findings and create the corrected ones. After that, reimports match normally.

I agree the one-time hit is acceptable here rather than a V2 parser, since the old behavior was hiding real findings. I've added an upgrade note to docs/content/releases/os_upgrading/2.59.2.md , as milestone marked was 2.59.2, describing the reimport behavior (one-time close/recreate for SAST and Secrets tests, SCA unaffected). Happy to adjust the wording or move it if you'd prefer it land in a different version's notes.

lmrb-1968 requested review from Maffooch and mtesauro as code owners June 12, 2026 18:22

github-actions Bot added docs unittests parser labels Jun 12, 2026

valentijnscholten reviewed Jun 14, 2026

View reviewed changes

valentijnscholten added this to the 2.59.2 milestone Jun 14, 2026

docs(xygeni): add 2.59.2 upgrade note for reimport behavior

baa6700

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix Xygeni parser deduplicating repeated SAST/Secrets findings in the same file#15003

Fix Xygeni parser deduplicating repeated SAST/Secrets findings in the same file#15003
lmrb-1968 wants to merge 2 commits into
DefectDojo:bugfixfrom
xygeni:fix/xygeni-duplicate-occurrences

lmrb-1968 commented Jun 12, 2026 •

edited

Loading

Uh oh!

valentijnscholten left a comment

Uh oh!

lmrb-1968 commented Jun 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

lmrb-1968 commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

⚠️ Pre-Approval check ⚠️

Uh oh!

valentijnscholten left a comment

Choose a reason for hiding this comment

Uh oh!

lmrb-1968 commented Jun 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

lmrb-1968 commented Jun 12, 2026 •

edited

Loading