Exploring the Qwen3.5 family: from small to massive
Alibaba’s team has released Qwen3.5, the latest generation of open-weight large language and multimodal models. This series pushes the boundaries of performance and efficiency, enabling high-end capabilities on dramatically reduced compute budgets. The release aligns with an industry-wide pivot toward efficient, deployable AI: models that deliver advanced reasoning, coding, agentic behavior, and native multimodality while fitting on consumer hardware, edge devices, servers with modest resources, or even local/privacy-focused setups.
Qwen3.5 spans a broad family of sizes and architectures, from ultra-compact dense models under 1 billion parameters to massive sparse MoE flagships exceeding 300 billion total parameters. This tiered lineup lets developers match models precisely to their needs for latency, throughput, memory footprint, cost, and capability.
At the lightweight end, the Qwen3.5 Small series includes four models: 0.8B, 2B, 4B, and 9B parameters. Released in early March 2026 (completing the family rollout that began in mid-February), these are optimized for on-device and edge deployment: smartphones, IoT devices, embedded systems, and privacy-sensitive local inference.
They achieve remarkable efficiency through architectural choices like hybrid attention (Gated Delta Networks for linear-time scaling) and techniques that minimize VRAM usage. Even the 9B model runs smoothly on modest consumer GPUs or high-end mobile hardware. All small models inherit native multimodality and a 262,144-token context window, making long-document processing and extended conversations feasible locally.
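To make the linear-time claim concrete, here is a minimal toy sketch of a gated delta-rule recurrence of the kind behind Gated DeltaNet-style hybrid attention. It is a simplification for illustration only (scalar gates, one token at a time); the actual Qwen3.5 kernels and gating parameterization are not published here and will differ. The key property it demonstrates is that per-token cost depends only on the state size, not on how many tokens came before.

```python
import numpy as np

def gated_delta_step(S, k, v, alpha, beta):
    """One recurrent step of a (simplified) gated delta rule.

    S     : (d_k, d_v) fast-weight state carried across tokens
    k, v  : key/value vectors for the current token
    alpha : scalar forget gate in (0, 1]
    beta  : scalar write strength in (0, 1]
    Cost is O(d_k * d_v) per token, independent of sequence length --
    this is what gives linear-time scaling over long contexts.
    """
    # Erase the value currently bound to key k, then write the new one.
    S = alpha * (S - beta * np.outer(k, k @ S)) + beta * np.outer(k, v)
    return S

def read(S, q):
    # Query the state: analogous to the attention output for token q.
    return S.T @ q

d_k, d_v = 8, 4
S = np.zeros((d_k, d_v))
k = np.zeros(d_k); k[0] = 1.0          # unit-norm key
v = np.arange(d_v, dtype=float)
S = gated_delta_step(S, k, v, alpha=1.0, beta=1.0)
print(np.allclose(read(S, k), v))       # True: exact recall of the stored value
```

Because the state `S` is a fixed-size matrix rather than a growing key/value cache, memory stays constant as the sequence grows, which is why such layers suit VRAM-constrained edge devices.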
The 9B variant stands out as the strongest small-model performer, closing much of the gap with far larger models in reasoning, logical problem-solving, and instruction following – thanks in part to extensive post-training reinforcement learning.
A core breakthrough in Qwen3.5 is its native multimodal architecture. Unlike many prior systems that retrofit vision encoders onto pretrained language models, Qwen3.5 integrates vision and language from the pre-training stage onward (early fusion). This unified training produces a cohesive representation space for text, images, diagrams, charts, screenshots, and documents.
The result is superior performance on visual understanding tasks: document layout analysis, chart/table interpretation, diagram reasoning, fine-grained OCR, visual question answering, and multimodal agent behaviors (e.g., understanding and acting on screen content).
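The early-fusion idea above can be sketched in a few lines: image patches and text tokens are embedded into the same width and concatenated into one sequence before any transformer layer sees them. The embedders below are toy stand-ins (random projections, hypothetical dimensions), not the real tokenizer or vision patchifier.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 16  # illustrative embedding width, not a published spec

def embed_text(tokens):
    # Toy embedding lookup: (n_tokens,) ids -> (n_tokens, d_model)
    table = rng.normal(size=(1000, d_model))
    return table[tokens]

def embed_image_patches(image, patch=4):
    # Split an (H, W) image into flattened patches, project each to d_model.
    H, W = image.shape
    p = image.reshape(H // patch, patch, W // patch, patch)
    p = p.transpose(0, 2, 1, 3).reshape(-1, patch * patch)
    W_proj = rng.normal(size=(patch * patch, d_model))
    return p @ W_proj

# Early fusion: one interleaved sequence feeds a single transformer stack,
# so text and vision share a representation space from layer 0 onward.
text = embed_text(np.array([5, 17, 42]))            # e.g. "describe this chart"
img  = embed_image_patches(rng.normal(size=(8, 8)))  # 4 patches from an 8x8 image
seq = np.concatenate([text, img], axis=0)
print(seq.shape)  # (7, 16): 3 text tokens + 4 image patches
```

Contrast this with retrofitted designs, where a separately pretrained vision encoder's outputs are adapter-projected into a frozen language model's space after the fact.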
In the flagship and medium MoE models, only a small subset of parameters activates per token:
- Qwen3.5-397B-A17B (flagship): 397 billion total parameters, about 17 billion activated.
- Qwen3.5-122B-A10B: 122 billion total, about 10 billion activated.
- Qwen3.5-35B-A3B: 35 billion total, about 3 billion activated.
This sparsity enables high-end multimodal reasoning and agentic performance at inference costs and speeds far closer to those of much smaller dense models – reportedly around 60% cheaper, with up to 8× higher throughput on large workloads, than the prior generation.
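The mechanism behind those total-vs-activated numbers is top-k expert routing: a gate scores all experts per token but only the best k run. The sketch below is a generic toy top-k router, not Qwen3.5's actual routing code; expert counts and dimensions are made up for illustration.

```python
import numpy as np

def topk_route(x, W_gate, k=2):
    """Pick the top-k experts for one token, softmax-weighting their outputs."""
    logits = x @ W_gate                        # (n_experts,) gate scores
    idx = np.argsort(logits)[-k:]              # indices of the k best experts
    w = np.exp(logits[idx] - logits[idx].max())
    return idx, w / w.sum()

rng = np.random.default_rng(0)
d, n_experts, k = 32, 64, 2                    # illustrative sizes only
W_gate = rng.normal(size=(d, n_experts))
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]  # toy expert FFNs

x = rng.normal(size=d)                         # one token's hidden state
idx, w = topk_route(x, W_gate, k)
y = sum(wi * (x @ experts[i]) for i, wi in zip(idx, w))

# Only k of n_experts expert matmuls run for this token, so the activated
# fraction of expert weights is roughly k / n_experts (here 2/64).
print(len(idx), y.shape)
```

All expert weights must still be stored (hence the large total-parameter counts), but per-token FLOPs track the activated subset, which is how a 397B-total model can run at roughly 17B-activated cost.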
Qwen3.5 leverages large-scale post-training reinforcement learning, including multi-agent simulation environments with progressively harder, real-world-inspired tasks. This sharpens instruction following, multi-step planning, tool use, and structured-output adherence, reduces hallucinations, and improves adaptability in agentic scenarios (coding agents, visual agents, long-horizon reasoning).
The series dramatically expands linguistic coverage to 201 languages and dialects, with special emphasis on low-resource languages – advancing truly inclusive, culturally aware AI.
All models feature a native 262,144-token context window (262K), sufficient for entire codebases, lengthy documents, multi-turn conversations, or complex multi-document reasoning. Hosted/API variants (e.g., Qwen3.5-Plus on Alibaba Cloud Model Studio) extend this to 1 million tokens.
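A 262K window has real memory consequences: for standard attention, the KV cache grows linearly with sequence length. The back-of-envelope calculator below uses illustrative layer/head counts (not published Qwen3.5 specs) to show the order of magnitude involved.

```python
# Back-of-envelope KV-cache size for a 262,144-token context window.
# The layer/head counts below are ILLUSTRATIVE assumptions, not real specs.
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    # 2 tensors (K and V) per layer; fp16/bf16 = 2 bytes per element.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

gib = kv_cache_bytes(262_144, n_layers=32, n_kv_heads=8, head_dim=128) / 2**30
print(f"{gib:.0f} GiB")  # 32 GiB for this hypothetical configuration
```

Numbers like this are why long-context models lean on grouped-query attention and linear-attention hybrids: shrinking (or eliminating) per-token KV state is what makes a 262K window practical outside large clusters.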
Available under permissive open licenses (primarily Apache 2.0) on Hugging Face, ModelScope, and GitHub, Qwen3.5 empowers developers and enterprises worldwide to build more capable, efficient, and accessible AI applications: from mobile assistants and edge analytics to powerful cloud agents and research frontiers.