
Commit ecc8ff4

Model Engine OnPrem Support and vLLM 0.11.1 + Model Engine Integration Fixes (#744)
* add support for on-prem
* clean up on-prem artifacts
* add back comments from initial code
* fix lint
* use ecr image repo:tag directly
* fix: isort import ordering
* fix: remove unused infra_config import
* fix: mypy type annotation errors
* fix: remove type annotation causing mypy no-redef error
* fix: mypy type errors in s3_utils.py and io.py - use botocore.config.Config
* fix: mypy typeddict-item errors - use broad type ignore
* fix: update test mocks to use get_s3_resource from s3_utils
* test: add unit tests for s3_utils, onprem_docker_repository, and onprem_queue_endpoint_resource_delegate
* style: format test files with black
* refactor: use filesystem_gateway abstraction for S3 operations

  Address review feedback to use the filesystem_gateway abstraction layer instead of directly calling get_s3_client. This ensures on-prem S3 configuration logic is properly encapsulated in the gateway.

  Changes:
  - Add head_object, delete_object, list_objects methods to S3FilesystemGateway
  - Update S3FileStorageGateway to use self.filesystem_gateway for all S3 ops
  - Remove direct import of get_s3_client from s3_file_storage_gateway
* fix: deduplicate S3 client config by using centralized s3_utils

  Refactor open_wrapper to use get_s3_client from s3_utils instead of duplicating the on-prem S3 configuration logic. This ensures a single source of truth for S3 client creation across the codebase.
* fix: add pagination to list_objects to handle >1000 objects

  S3 list_objects_v2 returns at most 1000 objects per request. Use a paginator to iterate through all pages and return complete results. Without this fix, directories with >1000 files would silently return truncated results.
* fix: make OnPremDockerRepository.get_image_url consistent with ECR/ACR

  Include docker_repo_prefix in the image URL to match the behavior of the ECR and ACR implementations. Also change image_exists logging from warning to debug to reduce log noise on every deployment. Updated tests to mock infra_config and verify prefix handling.
* refactor: add explicit on-prem branches in dependencies.py for clarity

  Add explicit elif branches for the on-prem cloud provider to make it clear that S3-based gateways are intentionally used for on-prem (with MinIO configuration applied via s3_utils). This improves code readability and makes the on-prem support more discoverable.
* feat: implement Redis LLEN for queue depth in OnPremQueueEndpointResourceDelegate

  Replace the hardcoded queue depth with an actual Redis LLEN call to enable proper autoscaling based on queue metrics. Falls back to 0 gracefully if the Redis client is unavailable.
  - Add optional redis_client parameter to the constructor
  - Implement lazy Redis client initialization
  - Add tests for both with- and without-Redis scenarios
* fix: replace mutable default argument with None in _get_client

  Using {} as a default argument is a Python anti-pattern that can cause subtle bugs, since the same dict instance is shared across calls. Use the Optional[Dict] = None pattern instead.
* refactor: extract inline import to module-level helper function

  Move the infra_config import from inside the validator to a module-level helper function _is_onprem_deployment(). This improves testability, avoids repeated import overhead on each validation call, and follows Python best practices for imports.
* fix: reduce excessive debug logging in s3_utils

  Replace per-call debug logs with a one-time info log when S3 is configured for on-prem. This prevents log spam from debug messages firing on every S3 client creation.
  - Extract common on-prem config into a _get_onprem_client_kwargs helper
  - Add an _s3_config_logged flag to log the endpoint only once
  - Add return type annotations to get_s3_client and get_s3_resource
  - Update tests to reset the logging flag between tests
* chore: remove unused TYPE_CHECKING import

  Clean up an unused import left over from refactoring the inline import.
* fix: make Dockerfile multi-arch compatible for ARM/AMD64

  Use architecture detection to download the correct binaries for aws-iam-authenticator and kubectl. This enables building the image for both ARM64 (Mac M1/M2) and AMD64 (CI/production) platforms.
* style: fix black formatting in test_onprem_queue_endpoint_resource_delegate
* fix: restore AWS_PROFILE env var fallback in s3_utils

  The original code checked os.getenv('AWS_PROFILE') as a fallback when no aws_profile kwarg was provided. This was accidentally removed during refactoring, breaking S3 operations in CI where AWS_PROFILE may be set via environment variable. Restores the original behavior for AWS deployments while keeping the new on-prem path.
* fix: correct isort ordering in s3_filesystem_gateway.py
* fix: use Literal type for s3 addressing_style to satisfy mypy
* Onprem Compatibility Change

---------

Co-authored-by: Tarun <tarun.ravikumar@scale.com>
1 parent b45e236 commit ecc8ff4

39 files changed

Lines changed: 1541 additions & 179 deletions

charts/model-engine/templates/_helpers.tpl

Lines changed: 23 additions & 1 deletion
```diff
@@ -256,6 +256,10 @@ env:
 - name: ABS_CONTAINER_NAME
   value: {{ .Values.azure.abs_container_name }}
 {{- end }}
+{{- if .Values.s3EndpointUrl }}
+- name: S3_ENDPOINT_URL
+  value: {{ .Values.s3EndpointUrl | quote }}
+{{- end }}
 {{- end }}

 {{- define "modelEngine.syncForwarderTemplateEnv" -}}
@@ -342,9 +346,27 @@ env:
   value: "/workspace/model-engine/model_engine_server/core/configs/config.yaml"
 {{- end }}
 - name: CELERY_ELASTICACHE_ENABLED
-  value: "true"
+  value: {{ .Values.celeryElasticacheEnabled | default true | quote }}
 - name: LAUNCH_SERVICE_TEMPLATE_FOLDER
   value: "/workspace/model-engine/model_engine_server/infra/gateways/resources/templates"
+{{- if .Values.s3EndpointUrl }}
+- name: S3_ENDPOINT_URL
+  value: {{ .Values.s3EndpointUrl | quote }}
+{{- end }}
+{{- if .Values.redisHost }}
+- name: REDIS_HOST
+  value: {{ .Values.redisHost | quote }}
+- name: REDIS_PORT
+  value: {{ .Values.redisPort | default "6379" | quote }}
+{{- end }}
+{{- if .Values.celeryBrokerUrl }}
+- name: CELERY_BROKER_URL
+  value: {{ .Values.celeryBrokerUrl | quote }}
+{{- end }}
+{{- if .Values.celeryResultBackend }}
+- name: CELERY_RESULT_BACKEND
+  value: {{ .Values.celeryResultBackend | quote }}
+{{- end }}
 {{- if .Values.redis.auth}}
 - name: REDIS_AUTH_TOKEN
   value: {{ .Values.redis.auth }}
```
model-engine/Dockerfile

Lines changed: 13 additions & 6 deletions
```diff
@@ -21,13 +21,20 @@ RUN apt-get update && apt-get install -y \
     telnet \
     && rm -rf /var/lib/apt/lists/*

-RUN curl -Lo /bin/aws-iam-authenticator https://github.com/kubernetes-sigs/aws-iam-authenticator/releases/download/v0.5.9/aws-iam-authenticator_0.5.9_linux_amd64
-RUN chmod +x /bin/aws-iam-authenticator
+# Install aws-iam-authenticator (architecture-aware)
+RUN ARCH=$(dpkg --print-architecture) && \
+    if [ "$ARCH" = "arm64" ]; then \
+        curl -Lo /bin/aws-iam-authenticator https://github.com/kubernetes-sigs/aws-iam-authenticator/releases/download/v0.5.9/aws-iam-authenticator_0.5.9_linux_arm64; \
+    else \
+        curl -Lo /bin/aws-iam-authenticator https://github.com/kubernetes-sigs/aws-iam-authenticator/releases/download/v0.5.9/aws-iam-authenticator_0.5.9_linux_amd64; \
+    fi && \
+    chmod +x /bin/aws-iam-authenticator

-# Install kubectl
-RUN curl -LO "https://dl.k8s.io/release/v1.23.13/bin/linux/amd64/kubectl" \
-    && chmod +x kubectl \
-    && mv kubectl /usr/local/bin/kubectl
+# Install kubectl (architecture-aware)
+RUN ARCH=$(dpkg --print-architecture) && \
+    curl -LO "https://dl.k8s.io/release/v1.23.13/bin/linux/${ARCH}/kubectl" && \
+    chmod +x kubectl && \
+    mv kubectl /usr/local/bin/kubectl

 # Pin pip version
 RUN pip install pip==24.2
```
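The Dockerfile change relies on `dpkg --print-architecture` returning `arm64` or `amd64`. As a hedged sketch (not part of the PR), the same mapping can be expressed in Python for tooling that builds the download URLs outside Docker; `deb_arch` is a hypothetical helper:

```python
import platform
from typing import Optional


def deb_arch(machine: Optional[str] = None) -> str:
    """Map `uname -m`-style machine names to Debian architecture names."""
    machine = machine or platform.machine()
    mapping = {
        "x86_64": "amd64",
        "amd64": "amd64",
        "aarch64": "arm64",
        "arm64": "arm64",
    }
    return mapping.get(machine, machine)


# Mirror the kubectl URL pattern used in the Dockerfile above.
kubectl_url = f"https://dl.k8s.io/release/v1.23.13/bin/linux/{deb_arch()}/kubectl"
```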

model-engine/model_engine_server/api/dependencies.py

Lines changed: 32 additions & 28 deletions
```diff
@@ -94,6 +94,9 @@
 from model_engine_server.infra.gateways.resources.live_endpoint_resource_gateway import (
     LiveEndpointResourceGateway,
 )
+from model_engine_server.infra.gateways.resources.onprem_queue_endpoint_resource_delegate import (
+    OnPremQueueEndpointResourceDelegate,
+)
 from model_engine_server.infra.gateways.resources.queue_endpoint_resource_delegate import (
     QueueEndpointResourceDelegate,
 )
@@ -114,6 +117,7 @@
     FakeDockerRepository,
     LiveTokenizerRepository,
     LLMFineTuneRepository,
+    OnPremDockerRepository,
     RedisModelEndpointCacheRepository,
     S3FileLLMFineTuneEventsRepository,
     S3FileLLMFineTuneRepository,
@@ -223,6 +227,8 @@ def _get_external_interfaces(
     queue_delegate: QueueEndpointResourceDelegate
     if CIRCLECI:
         queue_delegate = FakeQueueEndpointResourceDelegate()
+    elif infra_config().cloud_provider == "onprem":
+        queue_delegate = OnPremQueueEndpointResourceDelegate()
     elif infra_config().cloud_provider == "azure":
         queue_delegate = ASBQueueEndpointResourceDelegate()
     else:
@@ -232,7 +238,8 @@ def _get_external_interfaces(

     inference_task_queue_gateway: TaskQueueGateway
     infra_task_queue_gateway: TaskQueueGateway
-    if CIRCLECI:
+    if CIRCLECI or infra_config().cloud_provider == "onprem":
+        # On-prem uses Redis-based task queues
         inference_task_queue_gateway = redis_24h_task_queue_gateway
         infra_task_queue_gateway = redis_task_queue_gateway
     elif infra_config().cloud_provider == "azure":
@@ -274,16 +281,15 @@ def _get_external_interfaces(
         monitoring_metrics_gateway=monitoring_metrics_gateway,
         use_asyncio=(not CIRCLECI),
     )
-    filesystem_gateway = (
-        ABSFilesystemGateway()
-        if infra_config().cloud_provider == "azure"
-        else S3FilesystemGateway()
-    )
-    llm_artifact_gateway = (
-        ABSLLMArtifactGateway()
-        if infra_config().cloud_provider == "azure"
-        else S3LLMArtifactGateway()
-    )
+    filesystem_gateway: FilesystemGateway
+    llm_artifact_gateway: LLMArtifactGateway
+    if infra_config().cloud_provider == "azure":
+        filesystem_gateway = ABSFilesystemGateway()
+        llm_artifact_gateway = ABSLLMArtifactGateway()
+    else:
+        # AWS uses S3, on-prem uses MinIO (S3-compatible)
+        filesystem_gateway = S3FilesystemGateway()
+        llm_artifact_gateway = S3LLMArtifactGateway()
     model_endpoints_schema_gateway = LiveModelEndpointsSchemaGateway(
         filesystem_gateway=filesystem_gateway
     )
@@ -323,23 +329,18 @@ def _get_external_interfaces(
     cron_job_gateway = LiveCronJobGateway()

     llm_fine_tune_repository: LLMFineTuneRepository
+    llm_fine_tune_events_repository: LLMFineTuneEventsRepository
     file_path = os.getenv(
         "CLOUD_FILE_LLM_FINE_TUNE_REPOSITORY",
         hmi_config.cloud_file_llm_fine_tune_repository,
     )
     if infra_config().cloud_provider == "azure":
-        llm_fine_tune_repository = ABSFileLLMFineTuneRepository(
-            file_path=file_path,
-        )
+        llm_fine_tune_repository = ABSFileLLMFineTuneRepository(file_path=file_path)
+        llm_fine_tune_events_repository = ABSFileLLMFineTuneEventsRepository()
     else:
-        llm_fine_tune_repository = S3FileLLMFineTuneRepository(
-            file_path=file_path,
-        )
-        llm_fine_tune_events_repository = (
-            ABSFileLLMFineTuneEventsRepository()
-            if infra_config().cloud_provider == "azure"
-            else S3FileLLMFineTuneEventsRepository()
-        )
+        # AWS uses S3, on-prem uses MinIO (S3-compatible)
+        llm_fine_tune_repository = S3FileLLMFineTuneRepository(file_path=file_path)
+        llm_fine_tune_events_repository = S3FileLLMFineTuneEventsRepository()
     llm_fine_tuning_service = DockerImageBatchJobLLMFineTuningService(
         docker_image_batch_job_gateway=docker_image_batch_job_gateway,
         docker_image_batch_job_bundle_repo=docker_image_batch_job_bundle_repository,
@@ -350,16 +351,19 @@ def _get_external_interfaces(
         docker_image_batch_job_gateway=docker_image_batch_job_gateway
     )

-    file_storage_gateway = (
-        ABSFileStorageGateway()
-        if infra_config().cloud_provider == "azure"
-        else S3FileStorageGateway()
-    )
+    file_storage_gateway: FileStorageGateway
+    if infra_config().cloud_provider == "azure":
+        file_storage_gateway = ABSFileStorageGateway()
+    else:
+        # AWS uses S3, on-prem uses MinIO (S3-compatible)
+        file_storage_gateway = S3FileStorageGateway()

     docker_repository: DockerRepository
     if CIRCLECI:
         docker_repository = FakeDockerRepository()
-    elif infra_config().docker_repo_prefix.endswith("azurecr.io"):
+    elif infra_config().cloud_provider == "onprem":
+        docker_repository = OnPremDockerRepository()
+    elif infra_config().cloud_provider == "azure":
         docker_repository = ACRDockerRepository()
     else:
         docker_repository = ECRDockerRepository()
```
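The branch ordering in `dependencies.py` now keys everything off `cloud_provider` instead of sniffing `docker_repo_prefix` for `azurecr.io`. A small sketch of the selection logic, with string labels standing in for the repository classes:

```python
def docker_repository_for(cloud_provider: str, circleci: bool = False) -> str:
    # Branch order mirrors _get_external_interfaces: CI fake first, then
    # explicit on-prem, then Azure, with AWS/ECR as the default.
    if circleci:
        return "FakeDockerRepository"
    if cloud_provider == "onprem":
        return "OnPremDockerRepository"
    if cloud_provider == "azure":
        return "ACRDockerRepository"
    return "ECRDockerRepository"
```

Keeping on-prem as an explicit branch (rather than folding it into the default) is what makes the support discoverable, per the commit message.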

model-engine/model_engine_server/common/config.py

Lines changed: 21 additions & 7 deletions
```diff
@@ -70,12 +70,13 @@ class HostedModelInferenceServiceConfig:
     user_inference_tensorflow_repository: str
     docker_image_layer_cache_repository: str
     sensitive_log_mode: bool
-    # Exactly one of the following three must be specified
+    # Exactly one of the following must be specified for Redis cache
     cache_redis_aws_url: Optional[str] = None  # also using this to store sync autoscaling metrics
     cache_redis_azure_host: Optional[str] = None
     cache_redis_aws_secret_name: Optional[str] = (
         None  # Not an env var because the redis cache info is already here
     )
+    cache_redis_onprem_url: Optional[str] = None  # For on-prem Redis (e.g., redis://redis:6379/0)
     sglang_repository: Optional[str] = None

     @classmethod
@@ -90,21 +91,34 @@ def from_yaml(cls, yaml_path):

     @property
     def cache_redis_url(self) -> str:
+        # On-prem Redis support (explicit URL, no cloud provider dependency)
+        if self.cache_redis_onprem_url:
+            return self.cache_redis_onprem_url
+
+        cloud_provider = infra_config().cloud_provider
+
+        # On-prem: support REDIS_HOST env var fallback
+        if cloud_provider == "onprem":
+            if self.cache_redis_aws_url:
+                logger.info("On-prem deployment using cache_redis_aws_url")
+                return self.cache_redis_aws_url
+            redis_host = os.getenv("REDIS_HOST", "redis")
+            redis_port = getattr(infra_config(), "redis_port", 6379)
+            return f"redis://{redis_host}:{redis_port}/0"
+
         if self.cache_redis_aws_url:
-            assert infra_config().cloud_provider == "aws", "cache_redis_aws_url is only for AWS"
+            assert cloud_provider == "aws", "cache_redis_aws_url is only for AWS"
             if self.cache_redis_aws_secret_name:
                 logger.warning(
                     "Both cache_redis_aws_url and cache_redis_aws_secret_name are set. Using cache_redis_aws_url"
                 )
             return self.cache_redis_aws_url
         elif self.cache_redis_aws_secret_name:
-            assert (
-                infra_config().cloud_provider == "aws"
-            ), "cache_redis_aws_secret_name is only for AWS"
-            creds = get_key_file(self.cache_redis_aws_secret_name)  # Use default role
+            assert cloud_provider == "aws", "cache_redis_aws_secret_name is only for AWS"
+            creds = get_key_file(self.cache_redis_aws_secret_name)
             return creds["cache-url"]

-        assert self.cache_redis_azure_host and infra_config().cloud_provider == "azure"
+        assert self.cache_redis_azure_host and cloud_provider == "azure"
         username = os.getenv("AZURE_OBJECT_ID")
         token = DefaultAzureCredential().get_token("https://redis.azure.com/.default")
         password = token.token
```
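The precedence in the new on-prem path of `cache_redis_url` can be isolated into a pure function. A sketch, assuming the same order (explicit on-prem URL first, then `cache_redis_aws_url` if set, then a `REDIS_HOST`/port fallback); the function name is illustrative:

```python
import os
from typing import Optional


def onprem_cache_redis_url(
    onprem_url: Optional[str] = None,
    aws_style_url: Optional[str] = None,
    redis_port: int = 6379,
) -> str:
    # Explicit on-prem URL wins outright.
    if onprem_url:
        return onprem_url
    # Existing AWS-style URL field is honored if populated.
    if aws_style_url:
        return aws_style_url
    # Otherwise fall back to REDIS_HOST (default "redis") and the port.
    host = os.getenv("REDIS_HOST", "redis")
    return f"redis://{host}:{redis_port}/0"
```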

model-engine/model_engine_server/core/aws/roles.py

Lines changed: 10 additions & 1 deletion
```diff
@@ -119,12 +119,21 @@ def session(role: Optional[str], session_type: SessionT = Session) -> SessionT:

     :param:`session_type` defines the type of session to return. Most users will use
     the default boto3 type. Some users required a special type (e.g aioboto3 session).
+
+    For on-prem deployments without AWS profiles, pass role=None or role=""
+    to use default credentials from environment variables (AWS_ACCESS_KEY_ID, etc).
     """
     # Do not assume roles in CIRCLECI
     if os.getenv("CIRCLECI"):
         logger.warning(f"In circleci, not assuming role (ignoring: {role})")
         role = None
-    sesh: SessionT = session_type(profile_name=role)
+
+    # Use profile-based auth only if role is specified
+    # For on-prem with MinIO, role will be None or empty - use env var credentials
+    if role:
+        sesh: SessionT = session_type(profile_name=role)
+    else:
+        sesh: SessionT = session_type()  # Uses default credential chain (env vars)
     return sesh
```
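The role-handling change generalizes to any session factory: pass `profile_name` only when a profile is actually set, otherwise let the default credential chain resolve env vars. A sketch with `session_type` standing in for `boto3.Session`:

```python
from typing import Any, Callable, Optional


def make_session(role: Optional[str], session_type: Callable[..., Any]) -> Any:
    # A truthiness check covers both None and "" (on-prem/MinIO deployments
    # that set AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY but have no profile).
    if role:
        return session_type(profile_name=role)
    return session_type()
```

Note that `session_type(profile_name=None)` is not equivalent to `session_type()` for all factories, which is why the explicit branch matters.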
model-engine/model_engine_server/core/aws/storage_client.py

Lines changed: 5 additions & 0 deletions
```diff
@@ -1,3 +1,4 @@
+import os
 import time
 from typing import IO, Callable, Iterable, Optional, Sequence

@@ -20,6 +21,10 @@


 def sync_storage_client(**kwargs) -> BaseClient:
+    # Support for MinIO/on-prem S3-compatible storage
+    endpoint_url = os.getenv("S3_ENDPOINT_URL")
+    if endpoint_url and "endpoint_url" not in kwargs:
+        kwargs["endpoint_url"] = endpoint_url
     return session(infra_config().profile_ml_worker).client("s3", **kwargs)  # type: ignore
```
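The `sync_storage_client` change is an instance of a general kwargs-injection pattern: apply an env-var default without clobbering an explicit caller value. Isolated sketch (the helper name is illustrative):

```python
import os
from typing import Any, Dict


def apply_s3_endpoint_override(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    # Inject S3_ENDPOINT_URL (e.g. a MinIO address) only when the env var is
    # set and the caller has not already chosen an endpoint.
    endpoint_url = os.getenv("S3_ENDPOINT_URL")
    if endpoint_url and "endpoint_url" not in kwargs:
        return {**kwargs, "endpoint_url": endpoint_url}
    return kwargs
```

The `"endpoint_url" not in kwargs` guard is what keeps explicit caller configuration authoritative over the environment.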
model-engine/model_engine_server/core/celery/app.py

Lines changed: 21 additions & 10 deletions
```diff
@@ -531,17 +531,28 @@ def _get_backend_url_and_conf(
         backend_url = get_redis_endpoint(1)
     elif backend_protocol == "s3":
         backend_url = "s3://"
-        if aws_role is None:
-            aws_session = session(infra_config().profile_ml_worker)
+        if infra_config().cloud_provider == "aws":
+            if aws_role is None:
+                aws_session = session(infra_config().profile_ml_worker)
+            else:
+                aws_session = session(aws_role)
+            out_conf_changes.update(
+                {
+                    "s3_boto3_session": aws_session,
+                    "s3_bucket": s3_bucket,
+                    "s3_base_path": s3_base_path,
+                }
+            )
         else:
-            aws_session = session(aws_role)
-        out_conf_changes.update(
-            {
-                "s3_boto3_session": aws_session,
-                "s3_bucket": s3_bucket,
-                "s3_base_path": s3_base_path,
-            }
-        )
+            logger.info(
+                "Non-AWS deployment, using environment variables for S3 backend credentials"
+            )
+            out_conf_changes.update(
+                {
+                    "s3_bucket": s3_bucket,
+                    "s3_base_path": s3_base_path,
+                }
+            )
     elif backend_protocol == "abs":
         backend_url = f"azureblockblob://{os.getenv('ABS_ACCOUNT_NAME')}"
     else:
```
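The reworked S3-backend branch only attaches a boto3 session for AWS; for other providers the conf carries just bucket and base path, and credentials come from the environment. A sketch of that decision, with `make_session` standing in for the real session helper:

```python
from typing import Any, Callable, Dict


def s3_backend_conf(
    cloud_provider: str,
    s3_bucket: str,
    s3_base_path: str,
    make_session: Callable[[], Any],
) -> Dict[str, Any]:
    conf: Dict[str, Any] = {"s3_bucket": s3_bucket, "s3_base_path": s3_base_path}
    if cloud_provider == "aws":
        # Only AWS gets an explicit boto3 session; non-AWS (e.g. on-prem
        # MinIO) relies on env-var credentials picked up by the client.
        conf["s3_boto3_session"] = make_session()
    return conf
```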
Lines changed: 72 additions & 0 deletions
@@ -0,0 +1,72 @@ (new file)

```yaml
# On-premise deployment configuration
# This configuration file provides defaults for on-prem deployments
# Many values can be overridden via environment variables

cloud_provider: "onprem"
env: "production"  # Can be: production, staging, development, local
k8s_cluster_name: "onprem-cluster"
dns_host_domain: "ml.company.local"
default_region: "us-east-1"  # Placeholder for compatibility with cloud-agnostic code

# ====================
# Object Storage (MinIO/S3-compatible)
# ====================
s3_bucket: "model-engine"
# S3 endpoint URL - can be overridden by S3_ENDPOINT_URL env var
# Examples: "https://minio.company.local", "http://minio-service:9000"
s3_endpoint_url: ""  # Set via S3_ENDPOINT_URL env var if not specified here
# MinIO requires path-style addressing (bucket in URL path, not subdomain)
s3_addressing_style: "path"

# ====================
# Redis Configuration
# ====================
# Redis is used for:
# - Celery task queue broker
# - Model endpoint caching
# - Inference autoscaling metrics
redis_host: ""  # Set via REDIS_HOST env var (e.g., "redis.company.local" or "redis-service")
redis_port: 6379
# Whether to use Redis as Celery broker (true for on-prem)
celery_broker_type_redis: true

# ====================
# Celery Configuration
# ====================
# Backend protocol: "redis" for on-prem (not "s3" or "abs")
celery_backend_protocol: "redis"

# ====================
# Database Configuration
# ====================
# Database connection settings (credentials from environment variables)
# DB_HOST, DB_PORT, DB_NAME, DB_USER, DB_PASSWORD
db_host: "postgres"  # Default hostname, can be overridden by DB_HOST env var
db_port: 5432
db_name: "llm_engine"
db_engine_pool_size: 20
db_engine_max_overflow: 10
db_engine_echo: false
db_engine_echo_pool: false
db_engine_disconnect_strategy: "pessimistic"

# ====================
# Docker Registry Configuration
# ====================
# Docker registry prefix for container images
# Examples: "registry.company.local", "harbor.company.local/ml-platform"
# Leave empty if using full image paths directly
docker_repo_prefix: "registry.company.local"

# ====================
# Monitoring & Observability
# ====================
# Prometheus server address for metrics (optional)
# prometheus_server_address: "http://prometheus:9090"

# ====================
# Not applicable for on-prem (kept for compatibility)
# ====================
ml_account_id: "onprem"
profile_ml_worker: "default"
profile_ml_inference_worker: "default"
```