AI/ML News

Stay updated with the latest news and articles on artificial intelligence and machine learning

OpenAI released its most capable open models

This week OpenAI unveiled two open-weight language models – gpt-oss-120b and gpt-oss-20b. These models are designed to bring powerful reasoning capabilities, flexible tool use, and developer-level customization to a broader audience, all under the permissive Apache 2.0 license.

Unlike the proprietary GPT-4 and GPT-4o models, which are hosted exclusively on OpenAI's cloud infrastructure, the gpt-oss models are available for anyone to download and run locally or through a variety of deployment platforms, enabling lower-latency on-device inference and greater control over data.

The gpt-oss-120b and gpt-oss-20b models are engineered to perform well on reasoning-intensive tasks while remaining resource-efficient. The flagship 120b model contains 117 billion parameters but activates only 5.1 billion per token thanks to a Mixture-of-Experts (MoE) architecture, making it possible to run the model on a single 80 GB GPU. Meanwhile, the 20b version contains roughly 21 billion parameters with 3.6 billion active per token, requiring just 16 GB of memory – ideal for consumer laptops and edge devices.
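The memory figures above can be sanity-checked with back-of-the-envelope arithmetic. A rough sketch, assuming the released MXFP4 quantization (4-bit weights, about 0.5 bytes per parameter) and ignoring activation and KV-cache overhead:

```python
# Back-of-the-envelope weight-memory estimate for MXFP4-quantized models.
# These are rough illustrative figures, not official hardware requirements.

BYTES_PER_PARAM_MXFP4 = 0.5  # 4-bit weights ~= 0.5 bytes each

def weight_footprint_gb(total_params_billion: float) -> float:
    """Approximate weight memory in GB for a given parameter count."""
    return total_params_billion * 1e9 * BYTES_PER_PARAM_MXFP4 / 1e9

# gpt-oss-120b: 117B total parameters -> ~58.5 GB of weights,
# leaving headroom for activations on a single 80 GB GPU.
print(weight_footprint_gb(117))  # 58.5

# gpt-oss-20b: ~21B total parameters -> ~10.5 GB of weights,
# consistent with the stated 16 GB memory budget.
print(weight_footprint_gb(21))   # 10.5
```

Note that only 5.1B (or 3.6B) parameters participate in each forward pass, which cuts compute per token, but all expert weights still have to reside in memory.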

Both models support 128,000-token context windows, chain-of-thought (CoT) reasoning at low, medium, and high effort levels, and structured output formats. They also support tool use such as Python code execution and web search – essential for powering agentic workflows.
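As a concrete illustration, OpenAI's release notes describe selecting the reasoning effort with a single sentence in the system message. A minimal sketch of a chat request for a locally served gpt-oss model (the model name and the local endpoint mentioned in the comment are illustrative assumptions, not confirmed specifics):

```python
# Sketch: choosing chain-of-thought effort for a locally hosted gpt-oss
# model via an OpenAI-style chat payload. The effort level is set with a
# sentence in the system message; model name and endpoint are assumptions.
import json

payload = {
    "model": "gpt-oss-20b",
    "messages": [
        # "Reasoning: low" / "Reasoning: medium" / "Reasoning: high"
        # trades response latency for reasoning depth.
        {"role": "system", "content": "Reasoning: high"},
        {"role": "user", "content": "Factor 391 into primes."},
    ],
}

# e.g. POST this JSON to an OpenAI-compatible server such as
# http://localhost:11434/v1/chat/completions (Ollama's default port).
print(json.dumps(payload, indent=2))
```

Because the models speak an OpenAI-compatible chat format, the same payload shape works across the deployment platforms listed later in the article.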

Trained using OpenAI’s most advanced techniques, including high-compute reinforcement learning, supervised fine-tuning, and a post-training alignment process, the gpt-oss models share a developmental lineage with OpenAI’s o-series models (e.g., o3, o4-mini).

The models rely on Rotary Positional Embeddings (RoPE), locally banded sparse attention, and grouped multi-query attention to balance inference speed and performance. Pre-training focused on STEM, programming, and general knowledge, with tokenization based on o200k_harmony – a superset of the tokenizer used by GPT-4o, which has also been open-sourced.
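To give a sense of the first of these components, here is a minimal NumPy sketch of Rotary Positional Embeddings: each pair of channels is rotated by an angle that grows with token position, so query–key dot products end up depending on relative position. This is illustrative only; the exact implementation in gpt-oss may differ.

```python
# Minimal RoPE sketch: rotate channel pairs by position-dependent angles.
import numpy as np

def rope(x: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Apply rotary embeddings to x of shape (seq_len, dim), dim even."""
    seq_len, dim = x.shape
    half = dim // 2
    # One rotation frequency per channel pair, geometrically spaced.
    freqs = base ** (-np.arange(half) / half)           # (half,)
    angles = np.arange(seq_len)[:, None] * freqs[None]  # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # Standard 2-D rotation applied pair-wise.
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)

x = np.random.default_rng(0).normal(size=(4, 8))
y = rope(x)
# Rotations preserve vector norms at every position.
print(np.allclose(np.linalg.norm(y, axis=-1), np.linalg.norm(x, axis=-1)))
```

Because rotation preserves norms, RoPE injects positional information without changing the magnitude of the query and key vectors, one reason it composes well with long context windows.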

OpenAI emphasizes that safety has been foundational in the development of these open models. The company filtered pre-training data to avoid exposure to high-risk topics (e.g., chemical, biological, nuclear domains) and used deliberative alignment and instruction hierarchies to enhance robustness against adversarial prompts.

To simulate worst-case misuse scenarios, OpenAI adversarially fine-tuned the models on sensitive domains like cybersecurity and biology. However, even with deliberate attempts to “weaponize” the models using their own training stack, the models failed to reach high-risk capability levels as defined in OpenAI’s Preparedness Framework. Independent reviews confirmed these findings.

Additionally, OpenAI has launched a Red Teaming Challenge with a $500,000 prize pool to further surface any novel safety vulnerabilities, encouraging the global AI community to collaborate in stress-testing the models.

The models are freely available on Hugging Face, quantized in MXFP4 for efficient performance. OpenAI has also released reference tooling for inference in PyTorch and on Apple Metal, along with harmony format renderers in Python and Rust.

Deployment partners include major platforms like Azure, AWS, Hugging Face, Vercel, Ollama, llama.cpp, and more. On the hardware front, collaboration with NVIDIA, AMD, Cerebras, and Groq ensures optimized support across devices.

Microsoft is also bringing GPU-optimized local versions of gpt-oss-20b to Windows via ONNX Runtime, available through Foundry Local and the AI Toolkit for Visual Studio Code.

Despite their capabilities, gpt-oss models are text-only and lack multimodal features such as image or audio understanding. Their hallucination rates also remain significantly higher than those of OpenAI's proprietary reasoning models: gpt-oss-120b hallucinated on 49% of PersonQA benchmark responses, compared to 16% for o1.

With gpt-oss, OpenAI is reopening the door to transparent, decentralized AI development at scale. Balancing powerful capabilities with safety-conscious architecture, these models empower researchers, startups, and developers to explore, fine-tune, and innovate with world-class language models.