[Feature]: API ENDPOINTS PR 1: Foundation and scaffolding #1130

Open
pulk17 wants to merge 3 commits into CCExtractor:master from pulk17:api-pr1-scaffolding

@pulk17 pulk17 commented Jun 24, 2026

Contributor

[Feature]

In raising this pull request, I confirm the following (please check boxes):

  • I have read and understood the contributors guide.
  • I have checked that another pull request for this purpose does not exist.
  • I have considered, and confirmed that this submission will be valuable to others.
  • I accept that this submission may not be used, and the pull request closed at the will of the maintainer.
  • I give this submission freely, and claim no ownership to its content.

My familiarity with the project is as follows (check one):

  • I have never used the project.
  • I have used the project briefly.
  • I have used the project extensively, but have not contributed previously.
  • I am an active contributor to the project.

Feature: API Foundation and Core Infrastructure (PR 1/6)

Executive Summary

This Pull Request represents Part 1 of 6 in the strategic initiative to introduce a fully-featured, spec-driven JSON REST API for the CCExtractor Sample Platform (superseding the monolithic approach in #1117).

The objective of this specific PR is to establish the foundational architecture, core infrastructure, and shared utilities required to support the API endpoints. By isolating the scaffolding into a dedicated PR, we ensure that the underlying security, validation, routing, and data models can be reviewed thoroughly without the noise of endpoint-specific business logic.

Note to Reviewers: This PR intentionally does not introduce any route endpoints. Endpoints will be introduced sequentially in upcoming PRs. Consequently, integration tests that specifically target middleware behavior via HTTP endpoint requests have been deferred to PRs 2 and 3, where their respective test targets are introduced.


Architectural Additions & Enhancements

1. Blueprint & Application Wiring

  • mod_api Initialization: Introduces a new Flask Blueprint mounted globally at /api/v1.
  • Application Bootstrapping: Updates run.py to seamlessly integrate the new API blueprint alongside the existing server-rendered HTML modules (mod_test, mod_sample, etc.) without causing namespace collisions or regressions.

2. Robust Middleware Infrastructure

The middleware stack is designed to intercept and process all incoming API requests before they reach the route handlers, ensuring global security and standardization:

  • Authentication (auth.py):
    • Implements stateless Bearer Token authentication via the Authorization header.
    • Introduces @require_scope and @require_roles decorators to enforce granular, principle-of-least-privilege access control at the endpoint level.
  • Global Error Handling (error_handler.py):
    • Replaces default Flask HTML error pages with standardized, structured JSON responses for all API routes.
    • Safely catches unhandled exceptions (500s) and routing errors (404/405) that bubble up to the application level.
  • Rate Limiting (rate_limit.py):
    • Implements a fixed-window rate-limiting algorithm.
    • Differentiates limits based on endpoint sensitivity (e.g., stricter limits for token generation vs. general GET requests) and keys limits by either IP address or active token.
    • Injects standard X-RateLimit-* headers into responses.
    • Known Limitation (Per-Process Limits): Rate limiting state is currently stored in-memory per process. With multiple Gunicorn workers (e.g., 4), a client whose requests are spread across all workers can reach up to the per-process limit multiplied by the number of workers (e.g., a 120 req/min limit allows up to 480 req/min globally). A ticket will be filed to transition to Redis if exact global limits are required.
    • Known Limitation (Fixed Window): The algorithm uses a fixed window, meaning a client could theoretically burst requests at the boundary of a window, temporarily doubling the effective rate.
  • Security Headers (security.py):
    • Automatically appends best-practice security headers (Strict-Transport-Security, Content-Security-Policy, X-Content-Type-Options, X-Frame-Options) to harden API responses against common web vulnerabilities.
  • Request Validation (validation.py):
    • Provides declarative decorators (@validate_path_id, @validate_offset_pagination, etc.) to sanitize and validate incoming parameters.
    • Handles timezone normalization (naive vs. aware datetime harmonization) safely.
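
The fixed-window behaviour and its boundary-burst limitation described above can be illustrated with a minimal, standard-library sketch. This is not the code in rate_limit.py: the class name FixedWindowLimiter, the injectable clock, and the exact header names (X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset) are assumptions made for illustration, and the sketch has no eviction or memory cap.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """Hypothetical per-process fixed-window counter keyed by IP or token id."""

    def __init__(self, limit: int, window_seconds: int,
                 clock: Callable[[], float] = time.time) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        # key -> (window index, requests accepted in that window)
        self._counters: Dict[str, Tuple[int, int]] = {}

    def check(self, key: str) -> Tuple[bool, Dict[str, str]]:
        """Return (allowed, headers) for one request from `key`."""
        window = int(self.clock() // self.window_seconds)
        stored_window, count = self._counters.get(key, (window, 0))
        if stored_window != window:
            count = 0  # a new window started, so the counter resets
        allowed = count < self.limit
        if allowed:
            count += 1
        self._counters[key] = (window, count)
        headers = {
            "X-RateLimit-Limit": str(self.limit),
            "X-RateLimit-Remaining": str(self.limit - count),
            "X-RateLimit-Reset": str((window + 1) * self.window_seconds),
        }
        return allowed, headers


def demo_boundary_burst() -> int:
    """Count requests accepted within two seconds that straddle a window edge."""
    fake_now = [59.0]
    limiter = FixedWindowLimiter(limit=120, window_seconds=60,
                                 clock=lambda: fake_now[0])
    accepted = sum(limiter.check("203.0.113.7")[0] for _ in range(120))
    fake_now[0] = 61.0  # two seconds later, but already in the next window
    accepted += sum(limiter.check("203.0.113.7")[0] for _ in range(120))
    return accepted
```

With a 120 req/min limit, demo_boundary_burst() accepts every request in both batches, which is the doubling described under "Known Limitation (Fixed Window)". Because the counters live in a plain dict inside one process, each Gunicorn worker would hold its own copy, which is the per-process limitation noted above.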

3. Data Models and Serialization Schemas

  • API Tokens (models/api_token.py):
    • Introduces the ApiToken SQLAlchemy model for tracking API access.
    • Implements secure, one-way token hashing (using argon2-cffi) ensuring raw tokens are never stored in the database.
  • Common Schemas (schemas/common.py):
    • Establishes Marshmallow schemas for consistent pagination and error reporting (PaginationSchema, ErrorResponseSchema), ensuring a predictable contract for API consumers.
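
As a rough illustration of one-way token storage, the sketch below issues a random token, keeps only a salted digest plus a short lookup prefix, and verifies a presented token against that record. It deliberately swaps in PBKDF2-HMAC-SHA256 from the standard library as a stand-in for argon2-cffi so that it runs without third-party packages. The names StoredToken, issue_token, verify_token, the 8-character prefix, and the iteration count are all hypothetical and not taken from models/api_token.py.

```python
import hashlib
import hmac
import secrets
from typing import NamedTuple, Tuple

PREFIX_LEN = 8        # assumed length of the non-secret lookup prefix
ITERATIONS = 100_000  # assumed work factor for the stand-in KDF


class StoredToken(NamedTuple):
    """What a database row might hold; the raw token is never part of it."""
    prefix: str
    salt: bytes
    digest: bytes


def _derive(raw_token: str, salt: bytes) -> bytes:
    # Stand-in for argon2: PBKDF2-HMAC-SHA256 from the standard library.
    return hashlib.pbkdf2_hmac("sha256", raw_token.encode(), salt, ITERATIONS)


def issue_token() -> Tuple[str, StoredToken]:
    """Return the raw token (shown to the user once) and the record to store."""
    raw = secrets.token_urlsafe(32)
    salt = secrets.token_bytes(16)
    return raw, StoredToken(raw[:PREFIX_LEN], salt, _derive(raw, salt))


def verify_token(raw_token: str, stored: StoredToken) -> bool:
    """Re-derive the digest and compare it in constant time."""
    return hmac.compare_digest(_derive(raw_token, stored.salt), stored.digest)
```

A caller would hand the raw value to the client exactly once and persist only the StoredToken fields; a later request is checked with verify_token(presented_token, row).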

4. Core Services & Shared Utilities

  • Status Derivation (services/status.py):
    • Standardizes the highly complex logic required to determine the status of a specific sample or a complete CI run.
    • Fix included: Resolves a critical bug where tests producing no output were silently marked as "passed" by explicitly comparing expected baseline outputs against actual result files.
  • Response Formatting (utils.py):
    • Provides unified helper functions (paginated_response, cursor_paginated_response) to guarantee identical JSON structures across all list endpoints.
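
The "no output silently passes" fix can be pictured with a small sketch. It assumes, purely for illustration, that a test's expected and actual outputs are each represented as a mapping from an output id to a content hash; the real data model in services/status.py is not shown in this PR description. The function naive_status shows one way such a bug can arise (only inspecting the result rows that exist), and derive_test_status shows the baseline-driven comparison. All names here are hypothetical.

```python
from enum import Enum
from typing import Mapping


class Status(str, Enum):
    PASS = "pass"
    FAIL = "fail"


def naive_status(expected: Mapping[str, str],
                 actual: Mapping[str, str]) -> Status:
    """Buggy variant: with no result rows, the loop never finds a mismatch."""
    for output_id, actual_hash in actual.items():
        if expected.get(output_id) != actual_hash:
            return Status.FAIL
    return Status.PASS


def derive_test_status(expected: Mapping[str, str],
                       actual: Mapping[str, str]) -> Status:
    """Baseline-driven variant: every expected output must exist and match."""
    for output_id, expected_hash in expected.items():
        actual_hash = actual.get(output_id)
        if actual_hash is None:
            return Status.FAIL  # expected output was never produced
        if actual_hash != expected_hash:
            return Status.FAIL  # output produced but differs from baseline
    return Status.PASS
```

Iterating over the expected baseline rather than over the produced results is the design choice that matters: a test that writes nothing has an empty actual mapping, so only the second function reports a failure.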

5. Database Migration

  • d4f8e2a1b3c7_.py: Creates the api_token table and adds a github_login column to the user table.
    • The github_login column is bundled here (rather than in a separate migration) because PR 2 (Auth Endpoints) references it during token creation, and splitting it into its own migration would create a dependency ordering conflict between the two PRs.

Deployment Safety — is_dummy_row detection: The production database was queried to confirm no legacy sentinel rows exist that could be missed by the updated detection logic:

SELECT COUNT(*) FROM test_result_file
WHERE (test_id = -1 OR regression_test_id = -1)
  AND NOT (regression_test_output_id = -1 AND got = 'error');
-- Result: 0

Testing & Quality Assurance

This foundational layer is fully tested and enforces strict code-quality standards:

  • Unit Testing: All 26 foundational tests (test_models_api_token.py, test_services_status.py, test_utils.py) pass successfully.
  • Legacy Compatibility: Existing nose2 test suite passes with zero regressions.
  • Linting & Type Safety: Achieves 100% compliance across isort, pydocstyle, pycodestyle, and mypy (strict mode).

Next Steps

Following the approval and merge of this PR, PR 2 (Auth & Token Management Endpoints) will be submitted, which will build upon this foundation to expose the /auth/tokens endpoints and corresponding integration tests.

@cfsmp3 cfsmp3 left a comment

Contributor

Note the two high complexity errors (validation.py and status.py), also see Claude review below.

Reading the review, H1 clearly must be addressed before we can merge, or things will break.

The rest could be addressed later in the stack (and possibly already is).

But let's make sure we can merge and test each PR in order but not at the same time, i.e. merging this diff shouldn't break the system, even if by itself it doesn't do anything useful (since it's scaffolding).

Claude review follows:

HIGH (blocker):

  • H1 — ApiToken model added with NO migration. The api_token table won't exist on real MySQL. Tests pass only because tests/base.py does create_all from models → masks the gap. PR2's auth endpoints break at runtime. #1117 had this migration (d4f8e2a1b3c7); dropped in the split. Must add it, chained off master head c8f3a2b1d4e5.

MEDIUM (carryover, code unchanged):

  • ~770 lines of security middleware merge untested — and with no routes yet, the before_request hooks never even fire in this PR. Deferred to PR2/3. Risk noted.
  • Rate limiter unbounded memory (no hard cap, eviction only every 100 req).
  • Auth timing oracle (no-candidate path skips argon2 verify → leaks whether a prefix exists).
  • _get_client_ip comment wrong (ProxyFix means remote_addr is from XFF).

LOW/NIT: run-level missing→fail path untested; stale is_dummy_row "DEPLOYMENT PREREQUISITE" docstring; N+1 footgun in the batch_get_run_data wrappers; pytest conftest.py inert under nose2; generic 429 handler hardcodes wrong limits.

@pulk17 pulk17 force-pushed the api-pr1-scaffolding branch 8 times, most recently from 8e813fc to 9950853 Compare June 25, 2026 09:57
@pulk17 pulk17 force-pushed the api-pr1-scaffolding branch from 9950853 to beb4fe9 Compare June 25, 2026 10:25