- You need the NVIDIA proprietary GPU driver on the host with CUDA 12.x support, plus the NVIDIA Container Toolkit.
- Recommended driver branch: R550+ (or at least R535+) so it’s compatible with CUDA 12.x images used by Milvus GPU and NIM.
- Also required: the NVIDIA Container Toolkit configured with Docker so the `nvidia` runtime is available.
- On WSL2/Windows: install the Windows NVIDIA driver with WSL2 GPU support and enable GPU support for Docker Desktop/WSL. The error "WSL environment detected but no adapters were found" means no GPU is exposed to WSL/Docker.
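As a quick sanity check before starting any GPU containers, a sketch like the following can confirm both prerequisites. The `check_gpu_ready` helper name is illustrative, not part of any NVIDIA tooling:

```shell
# Hedged sketch: host readiness check for the driver and container runtime.
# `check_gpu_ready` is an illustrative helper, not an official tool.
check_gpu_ready() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    echo "driver: found"
  else
    echo "driver: missing (install an R535+/R550+ driver)"
  fi
  # `docker info` lists registered runtimes; `nvidia` should appear after
  # running `nvidia-ctk runtime configure --runtime=docker`.
  if docker info 2>/dev/null | grep -qi nvidia; then
    echo "runtime: nvidia runtime registered"
  else
    echo "runtime: nvidia runtime not found (configure nvidia-container-toolkit)"
  fi
}
check_gpu_ready
```

Both lines should report success before you bring up GPU-backed services.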
Option A: Use pgvector (CPU) instead of Milvus GPU and keep LLM/embeddings on NVIDIA AI Endpoints:
- Set the vector DB to pgvector by overriding the environment variable `APP_VECTORSTORE_NAME=pgvector`.
- Start only the pgvector service via profiles:

  ```shell
  export NVIDIA_API_KEY=YOUR_KEY   # required for NVIDIA AI Endpoints
  export APP_VECTORSTORE_NAME=pgvector
  docker compose --profile pgvector up -d --build
  ```

  This avoids the Milvus GPU container entirely and uses CPU Postgres+pgvector.

Option B: Keep Milvus but switch to the CPU image:
- Edit `RAG/examples/local_deploy/docker-compose-vectordb.yaml`:
  - Change `image: milvusdb/milvus:v2.4.15-gpu` to `milvusdb/milvus:v2.4.15`.
  - Remove the `deploy.resources.reservations.devices` block.
- Then run `docker compose up -d --build`. Don't enable the NIM microservices profiles (`local-nim`, `nemo-retriever`) unless you have a GPU; each of those profiles reserves GPUs.
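After those edits, the Milvus service definition might look roughly like this. This is a sketch following standard Compose syntax, not the exact upstream file; the service name and surrounding fields are illustrative:

```yaml
services:
  milvus:
    image: milvusdb/milvus:v2.4.15   # CPU tag; was milvusdb/milvus:v2.4.15-gpu
    # The deploy.resources.reservations.devices block that requested an
    # NVIDIA GPU is removed entirely for the CPU image.
```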
- Install an NVIDIA driver R550+ (or R535+) on the host; verify that `nvidia-smi` works.
- Install and configure the NVIDIA Container Toolkit:

  ```shell
  sudo apt-get install -y nvidia-container-toolkit
  sudo nvidia-ctk runtime configure --runtime=docker && sudo systemctl restart docker
  ```
- On WSL2, ensure the Windows NVIDIA driver with WSL support is installed and Docker Desktop has GPU enabled.
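Inside WSL2 you can check whether the GPU is actually exposed. A minimal sketch, assuming the standard WSL2 paravirtualization device `/dev/dxg` (the `wsl_gpu_exposed` helper name is illustrative):

```shell
# Hedged sketch: /dev/dxg is the GPU paravirtualization device that WSL2
# creates when the Windows NVIDIA driver with WSL support is installed.
# If it is absent, containers cannot see the GPU and you get the
# "no adapters were found" error.
wsl_gpu_exposed() {
  if [ -e /dev/dxg ]; then
    echo "GPU exposed to WSL"
  else
    echo "GPU not exposed to WSL (check the Windows driver and Docker Desktop GPU setting)"
  fi
}
wsl_gpu_exposed
```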
This example deploys a basic RAG pipeline for chat Q&A and serves inferencing from an NVIDIA API Catalog endpoint. You do not need a GPU on your machine to run this example.
| Model | Embedding | Framework | Vector Database | File Types |
|---|---|---|---|---|
| meta/llama3-70b-instruct | nvidia/nv-embedqa-e5-v5 | LangChain | Milvus | TXT, PDF, MD |
- Complete the common prerequisites.
- Export your NVIDIA API key as an environment variable:

  ```shell
  export NVIDIA_API_KEY="nvapi-<...>"
  ```
- Start the containers:

  ```shell
  cd RAG/examples/basic_rag/langchain/
  docker compose up -d --build
  ```

  Example Output

  ```
  ✔ Network nvidia-rag            Created
  ✔ Container rag-playground      Started
  ✔ Container milvus-minio        Started
  ✔ Container chain-server        Started
  ✔ Container milvus-etcd         Started
  ✔ Container milvus-standalone   Started
  ```
- Confirm the containers are running:

  ```shell
  docker ps --format "table {{.ID}}\t{{.Names}}\t{{.Status}}"
  ```

  Example Output

  ```
  CONTAINER ID   NAMES               STATUS
  39a8524829da   rag-playground      Up 2 minutes
  bfbd0193dbd2   chain-server        Up 2 minutes
  ec02ff3cc58b   milvus-standalone   Up 3 minutes
  6969cf5b4342   milvus-minio        Up 3 minutes (healthy)
  57a068d62fbb   milvus-etcd         Up 3 minutes (healthy)
  ```

- Open a web browser and access http://localhost:8090 to use the RAG Playground.
Refer to Using the Sample Web Application for information about uploading documents and using the web interface.
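If the page does not load, you can check whether the playground port answers from the shell. A small sketch; the `playground_status` helper name is illustrative, with 8090 taken from the URL above:

```shell
# Hedged sketch: probe the RAG Playground port (8090, per the URL above).
# `playground_status` is an illustrative helper, not part of the project.
playground_status() {
  port="${1:-8090}"
  # --max-time keeps the check snappy; a refused connection yields code 000.
  code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 2 "http://localhost:${port}/" || true)"
  if [ "$code" = "200" ]; then
    echo "up"
  else
    echo "down (HTTP ${code})"
  fi
}
playground_status 8090
```

A `down` result usually means the `rag-playground` container is still starting or failed; check `docker ps` output again.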
- Vector Database Customizations
- Stop the containers by running `docker compose down`.
