
Commit fdc70b9

Use ARM64 TEI images, move groupRef to spec level
Replace the ARM64 emulation workaround with the now-published cpu-arm64-latest
image. Move groupRef from spec.config to spec level in all VirtualMCPServer
examples to match the current CRD. Address remaining PR review feedback.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent a673ec1 commit fdc70b9
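The groupRef relocation described in the commit message amounts to the following before/after fragment (the group name is taken from the guide example below; surrounding fields are omitted):

```yaml
# Before: groupRef nested under spec.config
spec:
  config:
    groupRef:
      name: my-tools

# After this commit: groupRef at spec level, matching the current CRD
spec:
  groupRef:
    name: my-tools
```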

2 files changed: 23 additions & 43 deletions

docs/toolhive/guides-vmcp/optimizer.mdx (6 additions, 31 deletions)
````diff
@@ -146,37 +146,10 @@ are:
 For the complete field reference, see the
 [EmbeddingServer CRD specification](../reference/crd-spec.md#apiv1alpha1embeddingserver).
 
-:::warning[ARM64 compatibility]
+:::tip[ARM64 support]
 
-The default TEI CPU images depend on Intel MKL, which is x86_64-only. Native
-ARM64 support has been merged upstream but is not yet included in a published
-release. Track the
-[TEI GitHub repository](https://github.com/huggingface/text-embeddings-inference)
-for updates on ARM64 image availability.
-
-In the meantime, you can run the amd64 image under emulation on ARM64 nodes. If
-you are using Docker Desktop, you must first disable the containerd image store
-(**Settings > General > uncheck "Use containerd for pulling and storing
-images" > Apply & Restart**). Without this, `kind load docker-image` silently
-fails because the containerd store preserves multi-arch manifest indexes that
-kind cannot import. See
-[kind#3795](https://github.com/kubernetes-sigs/kind/issues/3795) for details.
-
-Then pull the amd64 image and load it into your cluster:
-
-```bash
-docker pull --platform linux/amd64 \
-  ghcr.io/huggingface/text-embeddings-inference:cpu-1.7
-kind load docker-image \
-  ghcr.io/huggingface/text-embeddings-inference:cpu-1.7
-```
-
-The `kind load` command is specific to kind. For other cluster distributions,
-use the equivalent image-loading mechanism (for example, `ctr images import` for
-containerd, or push the image to a registry your cluster can pull from).
-
-Then, pin the image in your EmbeddingServer so the operator uses the pre-pulled
-tag instead of the default `cpu-latest`:
+The default TEI image (`cpu-latest`) is x86_64-only. If you are running on ARM64
+nodes (for example, Apple Silicon), override the image in your EmbeddingServer:
 
 ```yaml title="embedding-server.yaml"
 apiVersion: toolhive.stacklok.dev/v1alpha1
@@ -185,7 +158,7 @@ metadata:
   name: my-embedding
   namespace: toolhive-system
 spec:
-  image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.7
+  image: ghcr.io/huggingface/text-embeddings-inference:cpu-arm64-latest
 ```
 
 :::
@@ -294,6 +267,8 @@ metadata:
   name: full-vmcp
   namespace: toolhive-system
 spec:
+  groupRef:
+    name: my-tools
   embeddingServerRef:
     name: full-embedding
   groupRef:
````
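Assembled from the two hunks above, the guide's ARM64 EmbeddingServer example now reads as follows (the `kind` line falls outside the hunks and is assumed from the resource name):

```yaml
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: EmbeddingServer # assumed; this line is outside the diff hunks shown
metadata:
  name: my-embedding
  namespace: toolhive-system
spec:
  image: ghcr.io/huggingface/text-embeddings-inference:cpu-arm64-latest
```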

docs/toolhive/tutorials/mcp-optimizer.mdx (17 additions, 12 deletions)
```diff
@@ -31,8 +31,7 @@ Server (vMCP) and an EmbeddingServer for semantic tool search.
 - How to create an MCPGroup with multiple backend MCP servers
 - How to deploy an EmbeddingServer for semantic search
 - How to create a VirtualMCPServer with the optimizer enabled
-- How to connect your AI client to the optimized endpoint and verify it exposes
-  only `find_tool` and `call_tool`
+- How to connect your AI client to the optimized endpoint
 
 ## About MCP Optimizer
 
@@ -90,15 +89,13 @@ Before starting this tutorial, make sure you have:
 - An MCP client (Visual Studio Code with GitHub Copilot is used in this
   tutorial)
 
-:::warning[ARM64 compatibility]
+:::tip[ARM64 support]
 
-The default text embeddings inference (TEI) images depend on Intel MKL, which is
-x86_64-only. Native ARM64 support has been merged upstream but is not yet
-included in a published release. If you are using Apple Silicon or any other
-ARM64 nodes (including kind on macOS), you can run the amd64 image under
-emulation as a workaround. See the
+The default TEI image is x86_64-only. If you are running on ARM64 nodes (for
+example, Apple Silicon with kind), set the `image` field in your EmbeddingServer
+to use the ARM64 image. See
 [EmbeddingServer resource](../guides-vmcp/optimizer.mdx#embeddingserver-resource)
-section for the required steps, including a Docker Desktop configuration change.
+for details.
 
 :::
 
@@ -245,15 +242,15 @@ metadata:
   namespace: toolhive-system
 spec:
   # highlight-start
+  groupRef:
+    name: optimizer-demo
   embeddingServerRef:
     name: optimizer-embedding
   # highlight-end
   incomingAuth:
     type: anonymous
   serviceType: ClusterIP
   config:
-    groupRef:
-      name: optimizer-demo
     aggregation:
       conflictResolution: prefix
       conflictResolutionConfig:
@@ -350,6 +347,14 @@ To check your token savings, send this prompt to your AI client:
 
 - "How many tokens did I save using MCP Optimizer?"
 
+:::note
+
+With only two backend MCP servers and a small number of tools, the optimizer may
+report minimal or no token savings. The benefit becomes more significant as you
+add more backends and tools to your MCPGroup.
+
+:::
+
 ## Clean up
 
 Remove the local workload and delete the Kubernetes resources when you're done:
@@ -384,7 +389,7 @@ kind delete cluster --name toolhive
 ## Related information
 
 - [Optimize tool discovery](../guides-vmcp/optimizer.mdx) - full parameter
-  reference, high availability, and ARM64 workaround details
+  reference, high availability, and ARM64 support details
 - [Optimizing LLM context](../concepts/tool-optimization.mdx) - background on
   tool filtering and context pollution
 - [Virtual MCP Server overview](../concepts/vmcp.mdx) - conceptual overview of
```
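Put together from the tutorial hunk above, the highlighted VirtualMCPServer spec now reads as follows (fields above `namespace` and below `conflictResolution` fall outside the hunk and are omitted):

```yaml
spec:
  # highlight-start
  groupRef:
    name: optimizer-demo
  embeddingServerRef:
    name: optimizer-embedding
  # highlight-end
  incomingAuth:
    type: anonymous
  serviceType: ClusterIP
  config:
    aggregation:
      conflictResolution: prefix
```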
