AMD Making It Easier To Install vLLM For ROCm
Deploying vLLM for LLM inference and serving on NVIDIA hardware can be as easy as pip3 install vllm. Beautifully simple just as many of the AI/LLM Python libraries can deploy straight-away and typically "just work" on NVIDIA. Running vLLM atop AMD Radeon/Instinct hardware though has traditionally meant either compiling vLLM from source yourself or AMD's recommended approach of using Docker containers that contain pre-built versions of vLLM. Finally there is now a blessed Python wheel for making it easier to install vLLM without Docker and leveraging ROCm...