[Feature] : API ENDPOINTS PR 4 : Samples Endpoints#1133

Open
pulk17 wants to merge 7 commits into CCExtractor:master from pulk17:api-pr4-samples

@pulk17 pulk17 commented Jun 24, 2026

Contributor

Please prefix your pull request with one of the following: [FEATURE] [FIX] [IMPROVEMENT].

In raising this pull request, I confirm the following (please check boxes):

  • I have read and understood the contributors guide.
  • I have checked that another pull request for this purpose does not exist.
  • I have considered, and confirmed that this submission will be valuable to others.
  • I accept that this submission may not be used, and the pull request closed at the will of the maintainer.
  • I give this submission freely, and claim no ownership to its content.

My familiarity with the project is as follows (check one):

  • I have never used the project.
  • I have used the project briefly.
  • I have used the project extensively, but have not contributed previously.
  • I am an active contributor to the project.

Feature: Samples Endpoints (PR 4/6)

Executive Summary

⚠️ Note for Reviewers: This PR is stacked on top of PR 3 (#). Please review PR 3 first.
Because this PR builds on the foundation laid by the previous PRs, GitHub currently shows the combined file changes. Once PR 3 is merged into master, this PR will automatically update to show only the Samples-specific files.

This Pull Request is Part 4 of 6 in the initiative to introduce a fully-featured JSON REST API.

This PR introduces the Samples Endpoints, responsible for querying individual media samples and their historical CI tracking records, as well as exposing the active RegressionTest catalogue.

(Note: Baseline Approval endpoints operate directly on test result data, so they have been grouped into the upcoming PR 5 (Results Data) to keep each review focused.)

Architectural Additions & Enhancements

1. Samples & Regression Endpoints (mod_api/routes/samples.py)

Mounted at /api/v1/runs and /api/v1/samples, this PR exposes the core media tracking logic:

  • GET /runs/<run_id>/samples: Returns a paginated list of all samples processed within a specific CI run. Uses the status determination engine built in PR 3 to compute pass/fail/missing states per file.
  • GET /runs/<run_id>/samples/<sample_id>: Retrieves detailed telemetry for a single sample within a run, including exit codes, execution times, and raw standard output/error logs.
  • GET /samples: Retrieves a paginated registry of all known media samples in the system.
  • GET /samples/<sample_id>: Fetches metadata for a specific media sample file.
  • GET /samples/<sample_id>/history: An analytical endpoint that traces the execution history of a single sample across historical CI runs, showing regressions and performance over time. Related rows are batch-loaded to avoid N+1 queries.
  • GET /regression-tests: Lists all registered regression tests, with filters for active or inactive tests (?active=true) and for tag (?tag=...).
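
As a rough illustration of how the ?active= and ?tag= filters on GET /regression-tests might combine with offset pagination, here is a minimal, dependency-free sketch. The PR does not show the actual handler, so the RegressionTest dataclass, parse_bool, list_regression_tests, the default limit, and the response keys are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegressionTest:
    """Hypothetical stand-in for the real RegressionTest model."""
    id: int
    active: bool
    tags: tuple = ()


def parse_bool(raw):
    """Map a query-string value such as 'true' to a bool; None means 'no filter'."""
    if raw is None:
        return None
    return raw.strip().lower() in ("1", "true", "yes")


def list_regression_tests(tests, active=None, tag=None, offset=0, limit=50):
    """Apply the optional filters first, then slice out one page by offset."""
    matched = [
        t for t in tests
        if (active is None or t.active == active)
        and (tag is None or tag in t.tags)
    ]
    page = matched[offset:offset + limit]
    # Envelope keys are assumptions, not the API's documented shape.
    return {"total": len(matched), "offset": offset, "limit": limit, "items": page}
```

Filtering before slicing keeps the total stable across pages, so a client can walk the result set by increasing offset without seeing the same test twice.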

2. Schemas & Serialization (mod_api/schemas/samples.py)

Implements marshmallow schemas (SampleSchema, RegressionTestSchema, RunSampleSchema) that serialize complex SQLAlchemy object graphs without leaking internal database IDs or relationships that clients should not see.
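
The idea behind schema-based serialization is that only explicitly declared fields reach the response. The sketch below shows that allow-list principle with a plain dataclass and dict rather than marshmallow, so it runs without dependencies. SampleRow, its field names, PUBLIC_SAMPLE_FIELDS, and dump_sample are hypothetical and do not reflect the actual SampleSchema.

```python
from dataclasses import dataclass


@dataclass
class SampleRow:
    """Hypothetical stand-in for a SQLAlchemy sample row."""
    id: int
    sha: str
    extension: str
    original_name: str
    upload_user_id: int  # assumed internal foreign key that should not leak


# Only the fields named here are ever copied into the output.
PUBLIC_SAMPLE_FIELDS = ("id", "sha", "extension", "original_name")


def dump_sample(row):
    """Serialize by explicit allow-list, so new model columns stay private by default."""
    return {name: getattr(row, name) for name in PUBLIC_SAMPLE_FIELDS}
```

With an allow-list, adding a column to the model does not change the API response until someone deliberately exposes it.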

3. Structural Test Refactoring (tests/base.py)

Centralized repetitive token generation and database test object generation (setup_admin_user_and_test) into BaseTestCase, eliminating significant duplication across the test suite and speeding up future test additions.
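
To show the shape of this refactoring, here is a minimal sketch of a shared base class using unittest. Only the names BaseTestCase and setup_admin_user_and_test come from this PR. The signature, return value, and in-memory dictionaries are assumptions standing in for the real database fixtures.

```python
import secrets
import unittest


class BaseTestCase(unittest.TestCase):
    """Sketch of a shared base class; the real one works against the database."""

    def setUp(self):
        # Hypothetical in-memory stores standing in for database tables.
        self.users = {}
        self.tests = {}
        self.tokens = {}

    def setup_admin_user_and_test(self, email="admin@example.com"):
        """Create an admin user, one test record, and an API token in one call."""
        user_id = len(self.users) + 1
        self.users[user_id] = {"email": email, "role": "admin"}
        test_id = len(self.tests) + 1
        self.tests[test_id] = {"owner": user_id}
        token = secrets.token_hex(16)
        self.tokens[token] = user_id
        return user_id, test_id, token


class SamplesEndpointTest(BaseTestCase):
    """Example subclass: one helper call replaces the repeated setup code."""

    def test_fixture_is_ready(self):
        user_id, test_id, token = self.setup_admin_user_and_test()
        self.assertEqual(self.users[user_id]["role"], "admin")
        self.assertEqual(self.tokens[token], user_id)
        self.assertEqual(self.tests[test_id]["owner"], user_id)
```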

Testing & Quality Assurance

  • Comprehensive Domain Coverage: The test suite now sits at 145 passing tests.
  • Historical Tracing Tested: The complex _process_history_entries logic, which aggregates statuses across historical runs, is tested to guard against N+1 query regressions.
  • Tag Filtering Tests Added: New tests cover edge cases in tag filtering, checking that results contain no duplicates and that offset pagination returns the correct pages.
  • Linting & Type Safety: All CI checks pass. isort, pydocstyle, pycodestyle, and mypy pass on all introduced files, both locally and remotely.
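
One common way to guard against N+1 query regressions is to count the statements a code path issues and assert an upper bound. The sketch below does this with the standard-library sqlite3 module and its trace callback. It is not the PR's test code: the run/result tables and all function names are hypothetical, and the real suite runs against SQLAlchemy models.

```python
import sqlite3


def make_db():
    """Build a tiny in-memory database with five runs and one result each."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE run (id INTEGER PRIMARY KEY);
        CREATE TABLE result (id INTEGER PRIMARY KEY, run_id INTEGER, status TEXT);
        """
    )
    conn.executemany("INSERT INTO run (id) VALUES (?)", [(i,) for i in range(1, 6)])
    conn.executemany(
        "INSERT INTO result (run_id, status) VALUES (?, ?)",
        [(i, "pass" if i % 2 else "fail") for i in range(1, 6)],
    )
    conn.commit()
    return conn


def count_selects(conn, fn):
    """Run fn(conn) and return its result plus the number of SELECTs it issued."""
    seen = []
    conn.set_trace_callback(seen.append)
    try:
        result = fn(conn)
    finally:
        conn.set_trace_callback(None)
    selects = sum(1 for sql in seen if sql.lstrip().upper().startswith("SELECT"))
    return result, selects


def history_n_plus_one(conn):
    """One query for the runs, then one more query per run."""
    runs = [row[0] for row in conn.execute("SELECT id FROM run ORDER BY id")]
    return {
        run_id: [s for (s,) in conn.execute(
            "SELECT status FROM result WHERE run_id = ? ORDER BY id", (run_id,))]
        for run_id in runs
    }


def history_batched(conn):
    """One query for the runs, then a single IN (...) query for all results."""
    runs = [row[0] for row in conn.execute("SELECT id FROM run ORDER BY id")]
    if not runs:
        return {}
    grouped = {run_id: [] for run_id in runs}
    placeholders = ",".join("?" * len(runs))
    query = (
        "SELECT run_id, status FROM result "
        f"WHERE run_id IN ({placeholders}) ORDER BY id"
    )
    for run_id, status in conn.execute(query, runs):
        grouped[run_id].append(status)
    return grouped
```

A regression test can then assert that the batched path returns the same data as the naive one while its query count stays constant as the number of runs grows.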

📝 Known Caveats & Design Decisions

In response to review feedback, the following items were intentionally left as-is as conscious design tradeoffs:

  1. In-Memory Rate Limiting: Sufficient for our current single-process deployment. A centralized Redis store is deferred until horizontal scaling is required.
  2. Log Pagination Cursor as Line Offset: Performant enough at our scale and avoids the complexity of byte-offset tracking. A hard cap of 10,000,000 lines prevents malicious sizing.
  3. Status Filtering Loads Full Tables: Filtering by status pulls rows into memory to derive the status rather than pushing that logic into complex SQL. The performance cost is negligible at our maximum dataset sizes.
  4. N+1 Lazy Loads in list_run_samples: The endpoint lazy-loads test results instead of issuing one large eager join. With a limit of roughly 300 tests, the extra queries are acceptable.
  5. Storage Status Checks Avoid blob.exists(): We rely on the database state to report storage_status as 'ok' or 'degraded', avoiding an expensive synchronous network call to Google Cloud Storage.
  6. Timing Attack Concerns: Both bcrypt (mod_auth) and Argon2 (api_token) use secure, constant-time verification.
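
To make the first tradeoff concrete, here is a minimal sketch of an in-memory rate limiter. The PR does not say which algorithm it uses, so the sliding-window approach, the InMemoryRateLimiter class, and its parameters are assumptions for illustration only.

```python
import threading
import time
from collections import defaultdict, deque


class InMemoryRateLimiter:
    """Sliding-window limiter whose state lives in this process only."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock  # injectable so tests can control time
        self._hits = defaultdict(deque)
        self._lock = threading.Lock()

    def allow(self, key):
        """Return True and record the request if key is under its limit."""
        now = self._clock()
        with self._lock:
            hits = self._hits[key]
            # Drop timestamps that have aged out of the window.
            while hits and now - hits[0] >= self.window_seconds:
                hits.popleft()
            if len(hits) >= self.max_requests:
                return False
            hits.append(now)
            return True
```

Because the counters live in one process's memory, two worker processes would each enforce their own limit independently. That is why a shared store only becomes necessary once the deployment scales horizontally.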

Next Steps

Following the review and merge of this PR, PR 5 (Results Data & Baseline Approvals) will be submitted. It will expose endpoints for fetching diff lines, raw test outputs, and approving new algorithmic baselines.

@pulk17 pulk17 changed the title [Feature] : API ENDPOINTS PR : Samples Endpoints [Feature] : API ENDPOINTS PR 4 : Samples Endpoints Jun 24, 2026
@pulk17 pulk17 force-pushed the api-pr4-samples branch 3 times, most recently from 0b4d808 to 60e7582 Compare June 25, 2026 17:53
@pulk17 pulk17 force-pushed the api-pr4-samples branch 7 times, most recently from d3e08e7 to 150f466 Compare June 25, 2026 19:13
@pulk17 pulk17 force-pushed the api-pr4-samples branch from 150f466 to f46ae6b Compare June 25, 2026 19:30
