CVE-2026-34760 PUBLISHED

vLLM: Downmix Implementation Differences as Attack Vectors Against Audio AI Models

Assigner: GitHub_M
Reserved: 30.03.2026 Published: 02.04.2026 Updated: 03.04.2026

vLLM is an inference and serving engine for large language models (LLMs). In versions 0.5.5 up to but not including 0.18.0, audio is downmixed to mono through Librosa, whose to_mono defaults to numpy.mean (an unweighted average of the channels), whereas the international standard ITU-R BS.775-4 specifies a weighted downmixing algorithm. As a result, the audio a human hears (e.g., through headphones or regular speakers) can differ from the audio processed by AI models whose infrastructure downmixes via Librosa, such as vLLM and Transformers. This issue has been patched in version 0.18.0.
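The discrepancy described above can be illustrated with a minimal sketch. It assumes a five-channel signal in the order L, R, C, Ls, Rs, and a set of weights (0.7071, 0.7071, 1.0, 0.5, 0.5) chosen here for illustration as a weighted mono downmix; consult ITU-R BS.775-4 for the normative coefficients. The names `mean_downmix`, `weighted_downmix`, and `tone_amplitude` are hypothetical helpers, not vLLM or Librosa APIs. The unweighted mean stands in for the numpy.mean behaviour the advisory attributes to to_mono.

```python
import numpy as np

SR = 16_000                      # sample rate in Hz (illustrative)
t = np.arange(SR) / SR           # one second of sample times

benign = 0.5 * np.sin(2 * np.pi * 220.0 * t)    # content meant for the listener
hidden = 0.5 * np.sin(2 * np.pi * 1000.0 * t)   # content aimed at the model

# Assumed channel order: L, R, C, Ls, Rs.
# The hidden tone is placed so that it cancels under the weighted downmix
# (-1.0 * hidden + 0.5 * hidden + 0.5 * hidden == 0) but not under a plain mean.
audio = np.stack([benign, benign, -hidden, hidden, hidden])

# ASSUMPTION: illustrative weights for a weighted mono downmix.
ASSUMED_WEIGHTS = np.array([0.7071, 0.7071, 1.0, 0.5, 0.5])


def mean_downmix(y: np.ndarray) -> np.ndarray:
    """Unweighted average over the channel axis (shape: channels x samples)."""
    return np.mean(y, axis=0)


def weighted_downmix(y: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Weighted sum over the channel axis."""
    return weights @ y


def tone_amplitude(x: np.ndarray, freq: float) -> float:
    """Estimate the amplitude of a sine at `freq` by projecting onto it."""
    ref = np.sin(2 * np.pi * freq * t)
    return float(2.0 * np.dot(x, ref) / len(x))


model_view = mean_downmix(audio)                         # what the model would process
listener_view = weighted_downmix(audio, ASSUMED_WEIGHTS) # what a weighted downmix yields

print("hidden tone in mean downmix:    ", round(tone_amplitude(model_view, 1000.0), 4))
print("hidden tone in weighted downmix:", round(tone_amplitude(listener_view, 1000.0), 4))
```

Under these assumptions the 1000 Hz tone survives the unweighted mean at one fifth of its original amplitude, while it cancels in the weighted downmix. That gap between the two signals is what the advisory describes.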

Metrics

CVSS 3.1

CVSS Vector: CVSS:3.1/AV:N/AC:H/PR:L/UI:N/S:U/C:N/I:H/A:L
CVSS Score: 5.9

Attack Vector	Network	Scope	Unchanged
Attack Complexity	High	Confidentiality Impact	None
Privileges Required	Low	Integrity Impact	High
User Interaction	None	Availability Impact	Low
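As a cross-check of the score above, the following sketch recomputes the CVSS 3.1 base score from the vector, using the metric weights and the Roundup function from the public CVSS 3.1 specification. It covers only the Scope: Unchanged case and only the metric values that appear in this vector; the function names are illustrative.

```python
import math

# CVSS 3.1 metric weights for the values used in this vector (Scope: Unchanged).
AV_NETWORK = 0.85
AC_HIGH = 0.44
PR_LOW_UNCHANGED = 0.62
UI_NONE = 0.85
C_NONE = 0.0
I_HIGH = 0.56
A_LOW = 0.22


def roundup(value: float) -> float:
    """Round up to one decimal place as defined in the CVSS 3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_scope_unchanged(av, ac, pr, ui, c, i, a) -> float:
    """Base score for Scope: Unchanged vectors."""
    iss = 1 - (1 - c) * (1 - i) * (1 - a)
    impact = 6.42 * iss
    exploitability = 8.22 * av * ac * pr * ui
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


score = base_score_scope_unchanged(
    AV_NETWORK, AC_HIGH, PR_LOW_UNCHANGED, UI_NONE, C_NONE, I_HIGH, A_LOW
)
print(score)  # 5.9
```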

Product Status

Vendor	vllm-project
Product	vllm
Versions	Version >= 0.5.5, < 0.18.0 is affected
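A minimal sketch of checking an installed version string against the affected range follows. It assumes plain dotted numeric versions such as "0.17.1"; pre-release or local version suffixes are not handled, and `is_affected` is a hypothetical helper rather than part of vLLM.

```python
def parse_version(version: str) -> tuple:
    """Parse a plain dotted numeric version, padded to three components."""
    parts = [int(p) for p in version.split(".")]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


FIRST_AFFECTED = (0, 5, 5)    # inclusive lower bound from the advisory
FIRST_PATCHED = (0, 18, 0)    # exclusive upper bound from the advisory


def is_affected(version: str) -> bool:
    """True if the version falls in the range >= 0.5.5 and < 0.18.0."""
    return FIRST_AFFECTED <= parse_version(version) < FIRST_PATCHED


for v in ("0.5.4", "0.5.5", "0.17.1", "0.18.0"):
    print(v, is_affected(v))
```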

Problem Types

CWE-20: Improper Input Validation