[Feature] : API ENDPOINTS PR 4 : Samples Endpoints by pulk17 · Pull Request #1133 · CCExtractor/sample-platform

pulk17 · 2026-06-24T11:21:36Z

Please prefix your pull request with one of the following: [FEATURE] [FIX] [IMPROVEMENT].

In raising this pull request, I confirm the following (please check boxes):

I have read and understood the contributors guide.
I have checked that another pull request for this purpose does not exist.
I have considered, and confirmed that this submission will be valuable to others.
I accept that this submission may not be used, and the pull request closed at the will of the maintainer.
I give this submission freely, and claim no ownership to its content.

My familiarity with the project is as follows (check one):

I have never used the project.
I have used the project briefly.
I have used the project extensively, but have not contributed previously.
I am an active contributor to the project.

Feature: Samples Endpoints (PR 4/6)

Executive Summary

⚠️ Note for Reviewers: This PR is stacked on top of PR 3 (#). Please review PR 3 first.
Because this PR builds on the previous PRs, GitHub currently shows the combined file changes. Once PR 3 is merged into master, this PR will automatically update to show only the Samples-specific files.

This Pull Request is Part 4 of 6 in the initiative to introduce a fully-featured JSON REST API.

This PR introduces the Samples Endpoints, responsible for querying individual media samples and their historical CI tracking records, as well as exposing the active RegressionTest catalogue.

Note: Baseline Approval endpoints operate directly on test result data, so they have been grouped into the upcoming PR 5 (Results Data) to keep each review focused.

Architectural Additions & Enhancements

1. Samples & Regression Endpoints (`mod_api/routes/samples.py`)

The routes in this module are mounted at /api/v1/runs and /api/v1/samples and expose the core media tracking logic:

GET /runs/<run_id>/samples: Returns a paginated list of all samples processed within a specific CI run. Uses the status determination engine built in PR 3 to compute pass/fail/missing states per file.
GET /runs/<run_id>/samples/<sample_id>: Returns detailed telemetry for a single sample within a run, including exit codes, execution times, and raw standard output/error logs.
GET /samples: Returns a paginated registry of all known media samples in the system.
GET /samples/<sample_id>: Returns metadata for a specific media sample file.
GET /samples/<sample_id>/history: Traces the execution history of a single sample across historical CI runs, surfacing regressions and performance over time. Related records are batch-loaded to avoid N+1 queries.
GET /regression-tests: Lists all registered regression tests, with optional filters for active/inactive tests (?active=true) and for tags (?tag=...).
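
As a rough illustration of how the `?active=` and `?tag=` filters might combine with offset pagination, the sketch below filters an in-memory list. The `RegressionTest` dataclass, its fields, the example tag values, and the `list_regression_tests` helper are assumptions made for this example; they are not the PR's actual models or route code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RegressionTest:
    """Hypothetical stand-in for the real model; field names are assumed."""
    id: int
    active: bool
    tags: List[str] = field(default_factory=list)


def list_regression_tests(tests, active: Optional[bool] = None,
                          tag: Optional[str] = None,
                          offset: int = 0, limit: int = 50):
    """Apply the optional filters, then slice out one page."""
    matched = [
        t for t in tests
        if (active is None or t.active == active)
        and (tag is None or tag in t.tags)
    ]
    page = matched[offset:offset + limit]
    return {"total": len(matched), "offset": offset, "items": page}


tests = [
    RegressionTest(1, True, ["dvb"]),
    RegressionTest(2, False, ["dvb", "teletext"]),
    RegressionTest(3, True, ["teletext"]),
]
# Equivalent in spirit to GET /regression-tests?active=true&tag=teletext
result = list_regression_tests(tests, active=True, tag="teletext")
```

Leaving both filters unset returns every test, so the same helper can serve the unfiltered listing.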

2. Schemas & Serialization (`mod_api/schemas/samples.py`)

Implements robust marshmallow schemas (SampleSchema, RegressionTestSchema, RunSampleSchema) to safely serialize complex SQLAlchemy object graphs without leaking internal database IDs or relationships where inappropriate.
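
The idea of serializing only an explicit set of public fields can be sketched without marshmallow. The plain-Python function below is not the PR's `SampleSchema`; the `Sample` class, its attribute names, and the choice of which fields are public are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Fields a client is allowed to see; everything else stays server-side.
PUBLIC_SAMPLE_FIELDS = ("id", "sha", "extension", "original_name")


@dataclass
class Sample:
    """Hypothetical model; attribute names are assumed."""
    id: int
    sha: str
    extension: str
    original_name: str
    upload_id: int      # internal foreign key, deliberately not exposed
    storage_path: str   # internal detail, deliberately not exposed


def serialize_sample(sample: Sample) -> dict:
    """Copy only whitelisted attributes into the response payload."""
    return {name: getattr(sample, name) for name in PUBLIC_SAMPLE_FIELDS}


payload = serialize_sample(
    Sample(7, "abc123", "ts", "clip.ts", 42, "/internal/abc123.ts")
)
```

An explicit whitelist means a column added to the model later is not exposed until someone opts it in.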

3. Structural Test Refactoring (`tests/base.py`)

Centralized repetitive token generation and database test object generation (setup_admin_user_and_test) into BaseTestCase, eliminating significant duplication across the test suite and speeding up future test additions.
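
A minimal sketch of this kind of centralization, using only `unittest`, might look as follows. The method name `setup_admin_user_and_test` comes from the description above, but its body, its return value, and the in-memory dictionaries standing in for the database are assumptions for this example.

```python
import io
import unittest


class BaseTestCase(unittest.TestCase):
    """Shared fixtures live here so individual test classes stay short."""

    def setUp(self):
        # In-memory stand-ins for the database and token store (assumed).
        self.users = {}
        self.tokens = {}

    def setup_admin_user_and_test(self):
        """Create an admin user plus a token and return the token string."""
        self.users["admin"] = {"role": "admin"}
        token = "test-token-admin"
        self.tokens[token] = "admin"
        return token


class SamplesEndpointTest(BaseTestCase):
    def test_admin_token_is_registered(self):
        # One call replaces the setup each test class used to repeat.
        token = self.setup_admin_user_and_test()
        self.assertEqual(self.tokens[token], "admin")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(SamplesEndpointTest)
outcome = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
```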

Testing & Quality Assurance

Comprehensive Domain Coverage: The test suite now sits at 145 passing tests.
Historical Tracing Tested: The _process_history_entries logic, which aggregates statuses across historical runs, is covered by tests that guard against N+1 query regressions.
Tag Filtering Tests Added: Cover edge cases, including duplicate results and correct offset pagination.
Linting & Type Safety: isort, pydocstyle, pycodestyle, and mypy pass on all introduced files, both locally and in CI.
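
One way to guard against N+1 regressions is to count the SQL statements a code path issues. The sketch below does this with the standard-library `sqlite3` module rather than the project's SQLAlchemy setup; the `result` table, its columns, and the `load_history` helper are hypothetical and only illustrate the batch-loading idea.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE result (run_id INTEGER, sample_id INTEGER, status TEXT);
    INSERT INTO result VALUES (1, 10, 'pass'), (2, 10, 'fail'), (3, 10, 'pass');
""")

executed = []
conn.set_trace_callback(executed.append)  # record every SQL statement run


def load_history(connection, sample_id, run_ids):
    """Fetch results for all runs of one sample in a single IN (...) query."""
    placeholders = ",".join("?" for _ in run_ids)
    rows = connection.execute(
        f"SELECT run_id, status FROM result "
        f"WHERE sample_id = ? AND run_id IN ({placeholders})",
        [sample_id, *run_ids],
    ).fetchall()
    by_run = defaultdict(list)
    for run_id, status in rows:
        by_run[run_id].append(status)
    return dict(by_run)


history = load_history(conn, 10, [1, 2, 3])
# A per-run loop would issue three SELECTs here; batching should issue one.
select_count = sum(
    1 for sql in executed if sql.lstrip().upper().startswith("SELECT")
)
```

Asserting on `select_count` in a test makes the query budget explicit, so a later change that reintroduces per-run queries fails loudly.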

📝 Known Caveats & Design Decisions

To address some of the review feedback, the following items were intentionally left as-is, representing conscious design tradeoffs:

In-Memory Rate Limiting: Sufficient for our current single-process deployment. A centralized Redis store is deferred until horizontal scaling is required.
Log Pagination Cursor as Line Offset: Performant enough at our scale and avoids byte-offset tracking complexity. A hard cap of 10,000,000 lines guards against abusive request sizes.
Status Filtering Loads Full Tables: Filtering by status pulls rows into memory to derive the status rather than pushing that logic into complex DB-level SQL. The performance cost is negligible at our maximum dataset sizes.
N+1 Lazy Loads in list_run_samples: The endpoint lazy-loads test results instead of using one large eager join. With the limit of roughly 300 tests, the extra queries are acceptable.
Storage Status Checks Avoid blob.exists(): We rely on the database state to determine storage_status: 'ok' or 'degraded' without an expensive synchronous network call to Google Cloud Storage.
Timing Attack Speculation: Both bcrypt (mod_auth) and Argon2 (api_token) use secure, constant-time verifications.
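
The line-offset cursor mentioned in the caveats above can be sketched as follows. This is a simplified illustration, not the PR's implementation: the `read_log_page` helper, its parameters, and the choice to apply the 10,000,000 cap to the cursor value are assumptions.

```python
from itertools import islice

MAX_LINE_OFFSET = 10_000_000  # hard cap quoted in the caveats (assumed to bound the cursor)


def read_log_page(lines, cursor: int = 0, limit: int = 100):
    """Return (page, next_cursor); next_cursor is None on the last page."""
    if cursor < 0 or cursor > MAX_LINE_OFFSET:
        raise ValueError("cursor out of range")
    # Fetch one extra line to learn whether another page exists.
    window = list(islice(lines, cursor, cursor + limit + 1))
    page = window[:limit]
    next_cursor = cursor + limit if len(window) > limit else None
    return page, next_cursor


log = [f"line {i}" for i in range(5)]
first, cursor = read_log_page(log, 0, 2)
second, cursor2 = read_log_page(log, cursor, 2)
last, end = read_log_page(log, cursor2, 2)
```

Because the cursor is a plain line count, a client can resume from any page without the server tracking byte positions, at the cost of re-skipping earlier lines on each request.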

Next Steps

Following the review and merge of this PR, PR 5 (Results Data & Baseline Approvals) will be submitted. It will expose endpoints for fetching diff lines, raw test outputs, and approving new algorithmic baselines.

sonarqubecloud · 2026-06-25T19:31:21Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

PR 1: Foundation and scaffolding

ae5de4f

pulk17 requested review from canihavesomecoffee and thealphadollar as code owners June 24, 2026 11:21

pulk17 changed the title ~~[Feature] : API ENDPOINTS PR : Samples Endpoints~~ [Feature] : API ENDPOINTS PR 4 : Samples Endpoints Jun 24, 2026

pulk17 added 4 commits June 25, 2026 15:54

Fix isort failure in mod_api/__init__.py

beb4fe9

PR 2: Auth and Token Management Endpoints

32e2fbc

PR 3: System Routes and Run Execution Endpoints

a114116

PR 4: Samples Endpoints

825c5fa

pulk17 force-pushed the api-pr4-samples branch 3 times, most recently from 0b4d808 to 60e7582 Compare June 25, 2026 17:53

Fix PR 4 feedback issues

9490a09

pulk17 force-pushed the api-pr4-samples branch 7 times, most recently from d3e08e7 to 150f466 Compare June 25, 2026 19:13

Merge branch 'master' into api-pr4-samples

f46ae6b

pulk17 force-pushed the api-pr4-samples branch from 150f466 to f46ae6b Compare June 25, 2026 19:30

pulk17 wants to merge 7 commits into CCExtractor:master from pulk17:api-pr4-samples
