CVE-2026-53923

vLLM GGUF Kernels: int64_t to int truncation of tensor dimensions causes GPU buffer overflow

Assigner: GitHub_M
Reserved: 2026-06-11
Published: 2026-06-22
Updated: 2026-06-23

vLLM is an inference and serving engine for large language models (LLMs). In versions 0.5.5 up to, but not including, 0.23.1rc0, the GGUF dequantize kernels (csrc/quantization/gguf/gguf_kernel.cu) truncate tensor dimensions from int64_t to int, so only part of the tensor is processed. The output tensor is allocated at full size with torch::empty, which leaves its memory uninitialized, but the dequantize CUDA kernel writes only the truncated number of elements. The unwritten portion of the output tensor retains whatever was previously in GPU memory. In multi-tenant inference deployments, this residual memory may contain tensor data from other users' inference requests, which constitutes information disclosure. The issue is fixed in 0.23.1rc0.
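The truncation described above can be illustrated with a minimal host-side C++ sketch. This is not the vLLM kernel code and not the actual fix: the function names narrow_unchecked, narrow_checked, and unwritten_elements are hypothetical, and the sketch assumes a platform with a 32-bit int where out-of-range conversions wrap modulo 2^32 (guaranteed since C++20, and the usual behavior of mainstream compilers before that).

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical helper: mimics passing an int64_t element count to a
// parameter declared as int. Counts above INT_MAX lose their high bits.
int narrow_unchecked(std::int64_t n) {
    return static_cast<int>(n);
}

// Hypothetical guard: rejects counts that do not fit in int instead of
// silently truncating them. One possible mitigation, not the upstream patch.
int narrow_checked(std::int64_t n) {
    if (n < 0 || n > std::numeric_limits<int>::max()) {
        throw std::out_of_range("element count does not fit in int");
    }
    return static_cast<int>(n);
}

// For a buffer of `allocated` elements (allocated >= 0), returns how many
// elements stay unwritten if the worker only sees the truncated count.
std::int64_t unwritten_elements(std::int64_t allocated) {
    const int processed = narrow_unchecked(allocated);
    // A non-positive truncated count is treated as "nothing processed".
    const std::int64_t written = processed > 0 ? processed : 0;
    assert(written <= allocated);
    return allocated - written;
}
```

Under these assumptions, a count of 2^32 + 1024 narrows to 1024, so a buffer of that size would have 2^32 elements left holding stale memory, while narrow_checked would throw for the same input. Using a 64-bit index type throughout the kernel avoids the narrowing entirely and is the other obvious design option.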

Metrics

CVSS Vector: CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:P/VC:L/VI:L/VA:N/SC:N/SI:N/SA:N
CVSS Score: 5.3

Exploitability Metrics
Attack Vector	Network
Attack Complexity	Low
Attack Requirements	None
Privileges Required	None
User Interaction	Passive

Vulnerable System Impact Metrics
Confidentiality	Low
Integrity	Low
Availability	None

Subsequent System Impact Metrics
Confidentiality	None
Integrity	None
Availability	None

Product Status

Vendor	vllm-project
Product	vllm
Versions	Version >= 0.5.5, < 0.23.1rc0 is affected

Problem Types

CWE-681: Incorrect Conversion between Numeric Types
CWE-200: Exposure of Sensitive Information to an Unauthorized Actor