You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Model Engine OnPrem Support and vLLM 0.11.1 + Model Engine Integration Fixes (#744)
* add support for on-prem
* clean up on-prem artificats
* add back comments from initial code
* fix lint
* use ecr image repo:tag directly
* fix: isort import ordering
* fix: remove unused infra_config import
* fix: mypy type annotation errors
* fix: remove type annotation causing mypy no-redef error
* fix: mypy type errors in s3_utils.py and io.py - use botocore.config.Config
* fix: mypy typeddict-item errors - use broad type ignore
* fix: update test mocks to use get_s3_resource from s3_utils
* test: add unit tests for s3_utils, onprem_docker_repository, and onprem_queue_endpoint_resource_delegate
* style: format test files with black
* refactor: use filesystem_gateway abstraction for S3 operations
Address review feedback to use the filesystem_gateway abstraction layer
instead of directly calling get_s3_client. This ensures on-prem S3
configuration logic is properly encapsulated in the gateway.
Changes:
- Add head_object, delete_object, list_objects methods to S3FilesystemGateway
- Update S3FileStorageGateway to use self.filesystem_gateway for all S3 ops
- Remove direct import of get_s3_client from s3_file_storage_gateway
* fix: deduplicate S3 client config by using centralized s3_utils
Refactor open_wrapper to use get_s3_client from s3_utils instead of
duplicating the on-prem S3 configuration logic. This ensures a single
source of truth for S3 client creation across the codebase.
* fix: add pagination to list_objects to handle >1000 objects
S3 list_objects_v2 returns max 1000 objects per request. Use paginator
to iterate through all pages and return complete results. Without this
fix, directories with >1000 files would silently return truncated results.
* fix: make OnPremDockerRepository.get_image_url consistent with ECR/ACR
Include docker_repo_prefix in image URL to match behavior of ECR and ACR
implementations. Also change image_exists logging from warning to debug
to reduce log noise on every deployment.
Updated tests to mock infra_config and verify prefix handling.
* refactor: add explicit on-prem branches in dependencies.py for clarity
Add explicit elif branches for on-prem cloud provider to make it clear
that S3-based gateways are intentionally used for on-prem (with MinIO
configuration applied via s3_utils). This improves code readability
and makes the on-prem support more discoverable.
* feat: implement Redis LLEN for queue depth in OnPremQueueEndpointResourceDelegate
Replace hardcoded queue depth with actual Redis LLEN call to enable
proper autoscaling based on queue metrics. Falls back to 0 gracefully
if Redis client is unavailable.
- Add optional redis_client parameter to constructor
- Implement lazy Redis client initialization
- Add tests for both with and without Redis scenarios
* fix: replace mutable default argument with None in _get_client
Using {} as a default argument is a Python anti-pattern that can cause
subtle bugs since the same dict instance is shared across calls.
Use Optional[Dict] = None pattern instead.
* refactor: extract inline import to module-level helper function
Move the infra_config import from inside the validator to a module-level
helper function _is_onprem_deployment(). This improves testability,
avoids repeated import overhead on each validation call, and follows
Python best practices for imports.
* fix: reduce excessive debug logging in s3_utils
Replace per-call debug logs with a one-time info log when S3 is
configured for on-prem. This prevents log spam from debug messages
firing on every S3 client creation.
- Extract common on-prem config to _get_onprem_client_kwargs helper
- Add _s3_config_logged flag to log endpoint only once
- Add return type annotations to get_s3_client and get_s3_resource
- Update tests to reset logging flag between tests
* chore: remove unused TYPE_CHECKING import
Clean up unused import left over from refactoring the inline import.
* fix: make Dockerfile multi-arch compatible for ARM/AMD64
Use architecture detection to download the correct binaries for
aws-iam-authenticator and kubectl. This enables building the image
for both ARM64 (Mac M1/M2) and AMD64 (CI/production) platforms.
* style: fix black formatting in test_onprem_queue_endpoint_resource_delegate
* fix: restore AWS_PROFILE env var fallback in s3_utils
The original code checked os.getenv('AWS_PROFILE') as a fallback when
no aws_profile kwarg was provided. This was accidentally removed during
refactoring, breaking S3 operations in CI where AWS_PROFILE may be set
via environment variable.
Restores the original behavior for AWS deployments while maintaining
the new on-prem path.
* fix: correct isort ordering in s3_filesystem_gateway.py
* fix: use Literal type for s3 addressing_style to satisfy mypy
* Onprem Compatibility Change
---------
Co-authored-by: Tarun <tarun.ravikumar@scale.com>
RUN curl -Lo /bin/aws-iam-authenticator https://github.com/kubernetes-sigs/aws-iam-authenticator/releases/download/v0.5.9/aws-iam-authenticator_0.5.9_linux_amd64
0 commit comments