feat: replace rank_packages() with cumulative-coverage impact scoring [CM-1228]#4204

Open
mbani01 wants to merge 5 commits into main from feat/impact_score_v2

mbani01 (Contributor) commented Jun 12, 2026

This pull request updates the package criticality ranking system to use a new cumulative coverage-based approach and simplifies both the database schema and the worker script. The changes remove the previous weighted ranking and arbitrary top-N caps in favor of a more robust and interpretable method, and update the code to match the new logic and schema.

Database schema and ranking logic changes:

  • The package_criticality_spotlight table now uses a package_id foreign key instead of text-based identifiers, improving join performance and data integrity.
  • The rank_packages() function is rewritten to use cumulative-coverage scoring: a package is critical when it falls within the smallest set of top packages that together account for a specified percentage (the coverage cutoff) of any signal's ecosystem total (e.g., downloads, dependents), and impact is the average of (1 − cumulative coverage) across signals. The function signature changes accordingly.
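Written out for one ecosystem, the scoring rule amounts to the following. Here x_k(q) is package q's value for signal k, r_k(q) is its position when packages are sorted by that signal in descending order, and c is the coverage cutoff. These symbols, and the reading of "smallest prefix that reaches the cutoff" as "the share accumulated before the package is still below c", are assumptions of this sketch, not notation taken from the migration.

```latex
\begin{aligned}
S_k &= \sum_{q} x_k(q), \qquad K = \{\, k : S_k > 0 \,\},\\
C_k(p) &= \frac{1}{S_k} \sum_{q \,:\, r_k(q) \le r_k(p)} x_k(q),\\
\text{is\_critical}(p) &\iff \exists\, k \in K :\ C_k(p) - \frac{x_k(p)}{S_k} < c,\\
\text{impact}(p) &= \frac{1}{|K|} \sum_{k \in K} \bigl(1 - C_k(p)\bigr).
\end{aligned}
```

For example, with c = 0.9 and download counts 80, 15 and 5, the shares accumulated before each package are 0, 0.8 and 0.95, so the first two packages are critical on downloads and the third is not. Spotlight rows can still force is_critical independently of this rule.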

Worker script updates:

  • The worker script (run-impact.ts) now accepts --cutoff and --ecosystems arguments instead of weights and top-N caps, aligning with the new ranking method.
  • The script's output and query parameters are updated to reflect the new function signature and return values.

Note

High Risk
Rewrites criticality scoring and spotlight schema, which will change which packages are flagged critical and their impact/rank across the product.

Overview
Replaces package criticality ranking with cumulative-coverage scoring and aligns spotlight overrides and the on-demand worker with the new model.

rank_packages() drops weighted PERCENT_RANK and per-ecosystem top-N JSON caps. It now takes coverage_cutoff (default 90%) and optional ecosystems, scores downloads / direct / transitive dependents per ecosystem, marks is_critical when a package sits in the smallest prefix that reaches the cutoff for any signal, sets impact to the average of (1 − cumulative_share) across signals with non-zero ecosystem totals, and ranks by impact. It updates packages directly and returns processed_rows (replacing separate scored/ranked counts). Spotlight rows still force is_critical via join.
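The in-memory sketch below mirrors those scoring steps for a single ecosystem so the behavior is easy to check by hand. It is not the migration's SQL: the row shape (PackageSignals), the function name scoreEcosystem, tie-breaking by id, and a cutoff expressed as a fraction (0.9 for 90%) are all assumptions, and the spotlight override is left out.

```typescript
// Hypothetical per-package signal counts within one ecosystem.
interface PackageSignals {
  id: number
  downloads: number
  directDependents: number
  transitiveDependents: number
}

interface ScoredPackage {
  id: number
  isCritical: boolean
  impact: number
  rank: number
}

type SignalKey = 'downloads' | 'directDependents' | 'transitiveDependents'
const SIGNALS: SignalKey[] = ['downloads', 'directDependents', 'transitiveDependents']

function scoreEcosystem(rows: PackageSignals[], cutoff = 0.9): ScoredPackage[] {
  const critical = new Set<number>()
  const impactSum = new Map<number, number>()
  let activeSignals = 0

  for (const key of SIGNALS) {
    const total = rows.reduce((acc, r) => acc + r[key], 0)
    if (total === 0) continue // signals with a zero ecosystem total are ignored
    activeSignals++

    // Largest first; ties broken by id (an assumption, not from the PR).
    const sorted = [...rows].sort((a, b) => b[key] - a[key] || a.id - b.id)
    let running = 0
    for (const r of sorted) {
      // Inside the smallest prefix reaching the cutoff iff the share
      // accumulated before this package is still below the cutoff.
      if (running / total < cutoff) critical.add(r.id)
      running += r[key]
      const cumulativeShare = running / total
      impactSum.set(r.id, (impactSum.get(r.id) ?? 0) + (1 - cumulativeShare))
    }
  }

  const scored: ScoredPackage[] = rows.map((r) => ({
    id: r.id,
    isCritical: critical.has(r.id),
    impact: activeSignals > 0 ? (impactSum.get(r.id) ?? 0) / activeSignals : 0,
    rank: 0,
  }))
  // Rank 1 = highest impact.
  scored.sort((a, b) => b.impact - a.impact || a.id - b.id)
  scored.forEach((s, i) => {
    s.rank = i + 1
  })
  return scored
}
```

Skipping zero-total signals keeps an ecosystem with, say, no transitive-dependency data from dividing by zero or diluting every package's impact.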

package_criticality_spotlight switches from (ecosystem, namespace, name) to a package_id FK with a unique index on package_id for indexed joins.

run-impact.ts CLI drops --w-* and --top-n in favor of --cutoff and --ecosystems, and logs processed_rows from the new function signature.
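Since --cutoff replaces several weight flags, it is worth validating before it reaches the SQL call. A minimal sketch follows; parseCutoff is a hypothetical helper, and because the PR text only says "default 90%", the assumption that the value is a fraction in (0, 1] rather than a percentage may not match rank_packages().

```typescript
// Hypothetical helper: reads --cutoff from an argv array and validates it.
// Assumes the cutoff is a fraction in (0, 1]; "90" is rejected, "0.9" accepted.
function parseCutoff(argv: string[], fallback = 0.9): number {
  const idx = argv.indexOf('--cutoff')
  if (idx === -1) return fallback
  const raw: string | undefined = argv[idx + 1]
  // A missing value or another flag in the value slot is an error.
  if (raw === undefined || raw.startsWith('--')) {
    throw new Error('--cutoff expects a value')
  }
  const value = Number(raw)
  if (!Number.isFinite(value) || value <= 0 || value > 1) {
    throw new Error(`--cutoff expects a number in (0, 1], got ${raw}`)
  }
  return value
}
```

Failing fast here avoids running a full ranking pass with a cutoff that silently marks everything, or nothing, as critical.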

Reviewed by Cursor Bugbot for commit 8348b06.

Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
@mbani01 mbani01 self-assigned this Jun 12, 2026
Copilot AI review requested due to automatic review settings June 12, 2026 12:06
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

ALTER TABLE package_criticality_spotlight
ADD COLUMN package_id bigint NOT NULL REFERENCES packages(id),
DROP COLUMN name,
DROP COLUMN namespace;

Spotlight migration missing backfill

High Severity

The migration adds package_id as NOT NULL and drops namespace and name in one step, with no UPDATE to populate package_id from existing (ecosystem, namespace, name) rows first. On any database that already has package_criticality_spotlight rows, the migration fails and spotlight overrides cannot be migrated.

Reviewed by Cursor Bugbot for commit fdd2765.
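In plain PostgreSQL terms, a common way to make such a migration safe is to add package_id as nullable, fill it with an UPDATE ... FROM packages joined on (ecosystem, namespace, name), handle any rows that did not match, and only then run ALTER COLUMN package_id SET NOT NULL and drop the old columns. The sketch below models just the matching step in memory so the failure mode is visible: every row left in unresolved would violate NOT NULL. The row shapes and the resolveSpotlight name are hypothetical, and treating a NULL namespace as matching only another NULL (what IS NOT DISTINCT FROM does in SQL) is an assumption about how unscoped packages are stored.

```typescript
// Hypothetical shapes of an old spotlight row and a packages row.
interface LegacySpotlightRow {
  ecosystem: string
  namespace: string | null
  name: string
}
interface PackageRow {
  id: number
  ecosystem: string
  namespace: string | null
  name: string
}

// Resolve each legacy spotlight row to a package id before package_id
// becomes NOT NULL. A null namespace only matches a null namespace.
function resolveSpotlight(spotlight: LegacySpotlightRow[], packages: PackageRow[]) {
  // JSON keeps null and "" distinct, unlike naive string concatenation.
  const key = (e: string, ns: string | null, n: string) => JSON.stringify([e, ns, n])
  const index = new Map<string, number>()
  for (const p of packages) index.set(key(p.ecosystem, p.namespace, p.name), p.id)

  const resolved: number[] = []
  const unresolved: LegacySpotlightRow[] = []
  for (const s of spotlight) {
    const id = index.get(key(s.ecosystem, s.namespace, s.name))
    if (id === undefined) unresolved.push(s)
    else resolved.push(id)
  }
  return { resolved, unresolved }
}
```

Checking that unresolved is empty (or deciding to delete those rows) before enforcing NOT NULL is what turns a hard migration failure into an explicit data decision.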

last_rank_pass_at = NOW(),
last_synced_at = NOW()
FROM final
WHERE p.id = final.id;

Spotlight overrides scope reduced

Medium Severity

Spotlight forcing of is_critical now happens only inside the final join for packages that were ranked in the same pass. The previous rank_packages() ran a separate spotlight UPDATE for all matching packages, so partial --ecosystems runs (or packages skipped when signal totals are zero) can leave spotlight entries without is_critical = TRUE.

Reviewed by Cursor Bugbot for commit fdd2765.
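The ordering the comment describes for the previous function, rank first and then force every spotlight package to critical in a separate step, can be modeled in memory as below. applyPass and the Map-based stand-in for the packages table are hypothetical; the point is that the spotlight loop does not depend on which packages the current pass ranked.

```typescript
// Hypothetical in-memory stand-in for packages.is_critical, keyed by package id.
type CriticalFlags = Map<number, boolean>

// Apply this pass's ranking, then force every spotlight package to critical,
// whether or not it was ranked in this pass (e.g. a partial --ecosystems run).
function applyPass(
  current: CriticalFlags,
  rankedThisPass: CriticalFlags,
  spotlightIds: readonly number[],
): CriticalFlags {
  const next = new Map(current) // leave the input untouched
  rankedThisPass.forEach((isCritical, id) => {
    next.set(id, isCritical)
  })
  // package_id is a FK, so every spotlight id refers to an existing package.
  for (const id of spotlightIds) next.set(id, true)
  return next
}
```

In the sketch, package 4 below is outside the ranked set but still ends up critical, which is the behavior the comment says the join-only approach loses.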

Copilot AI left a comment

Pull request overview

Updates the packages criticality ranking pipeline to switch from weighted percentile ranking + top-N caps to a cumulative-coverage based scoring model, and adjusts both storage (spotlight overrides) and the packages_worker CLI accordingly.

Changes:

  • Replaces rank_packages() SQL logic with a cumulative-coverage approach and changes its signature to (coverage_cutoff, ecosystems), returning processed_rows.
  • Refactors package_criticality_spotlight to use a package_id FK instead of (ecosystem, namespace, name) matching.
  • Updates run-impact.ts CLI to accept --cutoff and --ecosystems and call the new SQL signature.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

  • services/apps/packages_worker/src/criticality/run-impact.ts: Updates the worker CLI parsing/logging and SQL invocation to match the new rank_packages() signature.
  • backend/src/osspckgs/migrations/V1781262276__rank_packages_cumulative_coverage.sql: Alters the spotlight schema and replaces rank_packages() with cumulative-coverage scoring plus a bulk update of the packages criticality fields.

Comment on lines +27 to 31
function parseListArg(flag: string): string[] | null {
  // Returns null when the flag is absent; otherwise splits the following argv entry on commas.
  const idx = process.argv.indexOf(flag)
  if (idx === -1) return null
  return process.argv[idx + 1].split(',').map((s) => s.trim())
}
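The excerpt reads process.argv[idx + 1] unconditionally, so a trailing --ecosystems with no value makes .split throw a TypeError on undefined, and inputs like "npm,,pypi" yield empty entries. A hardened variant is sketched below; parseListArgFrom is a hypothetical name, and it takes argv as a parameter so it can be tested without touching process.argv.

```typescript
// Hypothetical hardened variant of parseListArg.
// Returns null when the flag is absent; throws on a missing or empty value.
function parseListArgFrom(argv: string[], flag: string): string[] | null {
  const idx = argv.indexOf(flag)
  if (idx === -1) return null
  const raw: string | undefined = argv[idx + 1]
  // Reject a missing value or another flag sitting in the value slot.
  if (raw === undefined || raw.startsWith('--')) {
    throw new Error(`${flag} expects a comma-separated value`)
  }
  // Trim entries and drop empties produced by stray commas.
  const items = raw
    .split(',')
    .map((s) => s.trim())
    .filter((s) => s.length > 0)
  if (items.length === 0) throw new Error(`${flag} must list at least one value`)
  return items
}
```

Throwing on an empty list, rather than returning null, keeps a typo like `--ecosystems ","` from silently widening a run to every ecosystem.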
Comment thread services/apps/packages_worker/src/criticality/run-impact.ts
…available signals

Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
Copilot AI review requested due to automatic review settings June 12, 2026 15:43
mbani01 and others added 2 commits June 12, 2026 16:45
Signed-off-by: Mouad BANI <mouad-mb@outlook.com>

cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 4 total unresolved issues (including 3 from previous reviews).

Reviewed by Cursor Bugbot for commit 8348b06.

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

Comment thread services/apps/packages_worker/src/criticality/run-impact.ts
Comment thread services/apps/packages_worker/src/criticality/run-impact.ts
Comment thread services/apps/packages_worker/src/criticality/run-impact.ts Outdated
@mbani01 mbani01 requested a review from joanagmaia June 12, 2026 16:02
@mbani01 mbani01 changed the title feat: replace rank_packages() with cumulative-coverage impact scoring feat: replace rank_packages() with cumulative-coverage impact scoring [CM-1228] Jun 12, 2026
Labels

None yet

Projects

None yet

3 participants