diff --git a/docs/customize/model-providers/more/vllm.mdx b/docs/customize/model-providers/more/vllm.mdx
index 3d15982dbb7..9c9fce67ce4 100644
--- a/docs/customize/model-providers/more/vllm.mdx
+++ b/docs/customize/model-providers/more/vllm.mdx
@@ -3,7 +3,7 @@ title: "vLLM"
 description: "Configure vLLM's high-performance inference library with Continue for chat, autocomplete, and embeddings, including setup instructions for Llama3.1, Qwen2.5-Coder, and Nomic Embed models"
 ---
 
-vLLM is an open-source library for fast LLM inference which typically is used to serve multiple users at the same time. It can also be used to run a large model on multiple GPU:s (e.g. when it doesn´t fit in a single GPU). Run their OpenAI-compatible server using `vllm serve`. See their [server documentation](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html) and the [engine arguments documentation](https://docs.vllm.ai/en/latest/usage/engine_args.html).
+vLLM is an open-source library for fast LLM inference which typically is used to serve multiple users at the same time. It can also be used to run a large model on multiple GPU:s (e.g. when it doesn´t fit in a single GPU). Run their OpenAI-compatible server using `vllm serve`. See their [server documentation](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html) and the [engine arguments documentation](https://docs.vllm.ai/en/latest/configuration/engine_args.html).
 
 ```shell
 vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct