vLLM MCP

High-throughput open-source LLM serving with PagedAttention, exposed via MCP.

Categories: AI & ML Tools, Self-hosted, AI
Source: vllm-project
Author: vllm-project
Repository: https://github.com/vllm-project/vllm

Installation

Deploy vLLM on your GPU hardware, then expose its OpenAI-compatible endpoint to your tooling via MCP.
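
As a minimal sketch of what "OpenAI-compatible endpoint" means in practice, the snippet below queries a locally running vLLM server with the standard OpenAI Python client. It assumes vLLM is already serving a model on its default port 8000 (for example via `vllm serve <model>`); the model name, port, and API key are placeholders, not part of this listing.

```python
# Sketch: calling a self-hosted vLLM server through its OpenAI-compatible API.
# Assumes vLLM is already running and listening on localhost:8000.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM's OpenAI-compatible endpoint
    api_key="EMPTY",                      # vLLM accepts any key unless one is configured
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder; must match the served model
    messages=[{"role": "user", "content": "Summarize what PagedAttention does."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```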

Use Cases

  • Self-hosted LLM inference
  • Batch generation (see the sketch after this list)
  • Streaming completions
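
For the batch generation use case, a minimal sketch using vLLM's offline Python API is shown below. It assumes the vllm package is installed and a supported GPU is available; the model name and prompts are placeholders.

```python
# Sketch: offline batch generation with vLLM's Python API.
from vllm import LLM, SamplingParams

prompts = [
    "Explain PagedAttention in one sentence.",
    "List three benefits of self-hosted LLM inference.",
]
sampling_params = SamplingParams(temperature=0.7, max_tokens=64)

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # placeholder model
outputs = llm.generate(prompts, sampling_params)      # generates the whole batch in one call

for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```

Streaming completions use the same OpenAI-compatible endpoint shown under Installation, by passing stream=True to the chat completion call.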

Tags

llm-serving, open-source, gpu, inference

Need Implementation Help?

We can integrate vLLM MCP into your production stack, configure authentication and access policies, and ship a maintainable MCP setup.

View implementation service