Environment details
- OS type and version: managed Agent Engine runtime (deployed from macOS)
- Python version: 3.11 (runtime python_spec.version=3.11; reproduced on the 3.11 deploy path)
- google-cloud-aiplatform version: 1.156.0
- Region: us-central1
Summary
Deploying a reasoning engine (Agent Engine) whose deployment_spec contains secret-backed environment variables — secret_env (Terraform) / env_vars with a {"secret": SECRET_ID, "version": ...} value (SDK) — creates the engine successfully (done: true, no error) but it never starts any running instances. stream_query then fails with:
FAILED_PRECONDITION: ... does not have running instances. It's likely that it does not have a valid 'spec.package_spec' configuration.
An identical deployment with the same value passed as a plain env var (no secret reference) starts instances and serves normally. The only difference between "works" and "0 instances" is the presence of secretEnv in the deploymentSpec.
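To make the "only difference" concrete at the API level, the sketch below shows how the two deployments would differ in the deploymentSpec. It assumes the SDK turns a plain string value into an `env` entry (`{"name", "value"}`) and a `{"secret", "version"}` value into a `secretEnv` entry with a nested `secretRef`. The `secretEnv` name comes from this report; `env` and `secretRef` are assumed from the v1 REST resource, and `to_deployment_spec` is a hypothetical helper, not SDK code.

```python
from typing import Any, Dict, List, Union

# An env var value is either a plain string or a secret reference
# of the form {"secret": SECRET_ID, "version": VERSION}.
EnvValue = Union[str, Dict[str, str]]


def to_deployment_spec(env_vars: Dict[str, EnvValue]) -> Dict[str, Any]:
    """Hypothetical approximation of how env_vars lands in deploymentSpec.

    Assumed field names: "env" for plain values and "secretEnv" (with a
    nested "secretRef") for secret-backed values.
    """
    env: List[Dict[str, Any]] = []
    secret_env: List[Dict[str, Any]] = []
    for name, value in env_vars.items():
        if isinstance(value, dict):
            # Secret-backed value: becomes a secretEnv entry.
            secret_env.append({
                "name": name,
                "secretRef": {"secret": value["secret"], "version": value["version"]},
            })
        else:
            # Plain value: passed through as a regular env entry.
            env.append({"name": name, "value": value})
    spec: Dict[str, Any] = {}
    if env:
        spec["env"] = env
    if secret_env:
        spec["secretEnv"] = secret_env
    return spec


control = to_deployment_spec({"MY_SECRET": "PLAIN_VALUE"})
repro = to_deployment_spec({"MY_SECRET": {"secret": "my-secret", "version": "latest"}})
print(sorted(control), sorted(repro))  # → ['env'] ['secretEnv']
```

Under this assumption, the control spec never carries a `secretEnv` key, and the repro spec carries nothing else.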
Steps to reproduce
- Create a Secret Manager secret in the same project (one enabled version).
- Grant roles/secretmanager.secretAccessor on it to the runtime service account (and, per the docs, to the Vertex AI Service Agent service-PROJECT_NUMBER@gcp-sa-aiplatform.iam.gserviceaccount.com and the Reasoning Engine Service Agent service-PROJECT_NUMBER@gcp-sa-aiplatform-re.iam.gserviceaccount.com).
- Deploy an agent with a secret-backed env var (code below).
- Wait for the create LRO to finish. It succeeds (done: true, no error).
- Call stream_query → FAILED_PRECONDITION ... does not have running instances.
- Deploy the same agent with the value as a plain env var instead → instances start and it serves.
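The IAM grant in the second step can be enumerated from the project number. This sketch only builds the member strings and calls no API. `secret_accessor_members` is a hypothetical helper, and `RUNTIME_SA_EMAIL` is a placeholder for whatever runtime service account the engine uses. Each resulting member could then be granted with `gcloud secrets add-iam-policy-binding SECRET_ID --member=MEMBER --role=roles/secretmanager.secretAccessor`.

```python
from typing import List


def secret_accessor_members(project_number: str, runtime_sa: str) -> List[str]:
    """IAM members to grant roles/secretmanager.secretAccessor on the secret."""
    # Service agent address patterns as given in the reproduction steps.
    vertex_ai_agent = f"service-{project_number}@gcp-sa-aiplatform.iam.gserviceaccount.com"
    reasoning_engine_agent = (
        f"service-{project_number}@gcp-sa-aiplatform-re.iam.gserviceaccount.com"
    )
    accounts = [runtime_sa, vertex_ai_agent, reasoning_engine_agent]
    # dict.fromkeys drops duplicates while keeping order, in case the
    # runtime SA is itself one of the service agents.
    return [f"serviceAccount:{sa}" for sa in dict.fromkeys(accounts)]


for member in secret_accessor_members("PROJECT_NUMBER", "RUNTIME_SA_EMAIL"):
    print(member)
```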
Code example
import vertexai
from vertexai import agent_engines

vertexai.init(project="PROJECT", location="us-central1",
              staging_bucket="gs://STAGING_BUCKET")

# `app` is any AdkApp / custom ReasoningEngine. The trigger is purely the
# secret-backed env var, not the agent code.

# CONTROL: same value passed as a plain env var; starts running instances, serves fine.
agent_engines.create(
    app, requirements=[...], display_name="control",
    env_vars={"MY_SECRET": "SAME_VALUE_AS_SECRET"},
)

# REPRO: created successfully, but 0 running instances.
agent_engines.create(
    app, requirements=[...], display_name="repro-secret",
    env_vars={"MY_SECRET": {"secret": "my-secret", "version": "latest"}},
)
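For the A/B/C comparison described under the diagnosis, generating every variant from one base keeps them identical except for how MY_SECRET is supplied. A sketch, assuming the same `env_vars` shapes as the code above; `bisect_variants`, the variant display names, and the pinned version value are hypothetical.

```python
from typing import Dict, Union

# Either a plain string or {"secret": SECRET_ID, "version": VERSION}.
EnvValue = Union[str, Dict[str, str]]


def bisect_variants(name: str, plain_value: str, secret_id: str,
                    pinned_version: str) -> Dict[str, Dict[str, EnvValue]]:
    """Map a hypothetical display name to the env_vars for each variant."""
    return {
        # Control: same value, no secret reference.
        "control": {name: plain_value},
        # Secret reference resolved to the latest enabled version.
        "repro-secret-latest": {name: {"secret": secret_id, "version": "latest"}},
        # Secret reference pinned to a numeric version.
        "repro-secret-pinned": {name: {"secret": secret_id, "version": pinned_version}},
    }


variants = bisect_variants("MY_SECRET", "SAME_VALUE_AS_SECRET", "my-secret", "1")
for display_name, env_vars in variants.items():
    # Each pair would be passed as:
    # agent_engines.create(app, requirements=[...],
    #                      display_name=display_name, env_vars=env_vars)
    print(display_name, env_vars)
```

Because all variants share the same key set and differ only in the value shape, any difference in running instances points at the secret reference rather than the agent code.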
Stack trace (stream_query against the secret_env engine)
google.api_core.exceptions.FailedPrecondition: 400 The requested resource [projects/.../locations/us-central1/reasoningEngines/...] does not have running instances. It's likely that it does not have a valid 'spec.package_spec' configuration. Please update the resource with a valid 'spec.package_spec' and then try again.
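When scripting the repro, it helps to tell this specific failure apart from other 400s. A minimal sketch that matches on the message text quoted above; the wording is an assumption (it is not a documented contract and may change), and `failing_engine` is a hypothetical helper. With the SDK, the input string would come from `str(exc)` for a caught `google.api_core.exceptions.FailedPrecondition`.

```python
import re
from typing import Optional

# Hypothetical matcher for the FAILED_PRECONDITION text quoted above.
# \s+ tolerates line breaks inside the message.
_NO_INSTANCES = re.compile(
    r"\[(?P<resource>projects/[^\]]+/reasoningEngines/[^\]]+)\]"
    r"\s+does not have running\s+instances"
)


def failing_engine(message: str) -> Optional[str]:
    """Return the engine resource name if `message` is the 0-instances error, else None."""
    match = _NO_INSTANCES.search(message)
    return match.group("resource") if match else None


msg = ("400 The requested resource [projects/PROJECT/locations/us-central1/"
       "reasoningEngines/ENGINE_ID] does not have running instances.")
print(failing_engine(msg))  # → projects/PROJECT/locations/us-central1/reasoningEngines/ENGINE_ID
```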
Diagnosis already performed (rules out the usual causes)
With Secret Manager DATA_READ audit logging enabled and a controlled A/B/C bisect of otherwise-identical engines:
- The secret value is read successfully at instance startup by the runtime service account (AccessSecretVersion, code=OK, zero denials).
- The application starts normally: runtime logs show "Started server process", "Application startup complete", and "Uvicorn running on http://0.0.0.0:8080". The only anomaly is that the platform never marks the instance as running, and no request or health-probe logs follow startup.
- Independent of IAM grants: reproduced with secretAccessor granted to the runtime SA, the Reasoning Engine Service Agent (gcp-sa-aiplatform-re), and the Vertex AI Service Agent (gcp-sa-aiplatform). With all three grants fully propagated, there are still 0 instances. Audit logs show that only the runtime SA ever reads the secret; the platform service agents never attempt a read.
- Independent of secret version format: "latest" and a pinned numeric version both fail identically.
- Independent of Python version (3.11) and deploy tool: reproduced via both the vertexai SDK and Terraform's google_vertex_ai_reasoning_engine secret_env.
- The control (otherwise identical engine, same value passed as a plain env var) → running instances.
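The audit-log checks above (who read the secret, and whether any read was denied) can be repeated offline from exported log entries. A sketch, assuming entries in the standard Cloud Audit Logs JSON shape (`protoPayload.methodName`, `protoPayload.authenticationInfo.principalEmail`, `protoPayload.status.code`, with a missing or zero code treated as OK), for example as written by `gcloud logging read --format=json`. `summarize_secret_reads` is a hypothetical helper.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Tuple

# Suffix of the fully qualified method name,
# e.g. "google.cloud.secretmanager.v1.SecretManagerService.AccessSecretVersion".
ACCESS_METHOD = "AccessSecretVersion"


def summarize_secret_reads(entries: Iterable[Dict[str, Any]]) -> Tuple[Counter, Counter]:
    """Count AccessSecretVersion calls per principal, split into OK and denied."""
    ok: Counter = Counter()
    denied: Counter = Counter()
    for entry in entries:
        payload = entry.get("protoPayload", {})
        if not payload.get("methodName", "").endswith(ACCESS_METHOD):
            continue  # ignore other Secret Manager calls
        principal = payload.get("authenticationInfo", {}).get("principalEmail", "<unknown>")
        # A missing status or status.code == 0 is treated as success.
        code = payload.get("status", {}).get("code", 0)
        if code:
            denied[principal] += 1
        else:
            ok[principal] += 1
    return ok, denied
```

In terms of this sketch, the result reported above corresponds to `ok` containing only the runtime SA and `denied` being empty.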
Expected: a reasoning engine with secret-backed env vars should start instances, since the secret is readable and the app boots.
Actual: zero running instances whenever secretEnv is present in the deployment_spec, regardless of IAM/version/Python/tooling.
Related: #5647 — same surface symptom (secret-backed env vars in Agent Engine), but that report was resolved via the custom-service-account / secretAccessor IAM fix. This issue is distinct: the IAM fix is explicitly ruled out above (secret is read OK by the runtime SA; granting all relevant service agents still yields 0 instances).