    Z.AI Introduces GLM-5.1: An Open-Weight 754B Agentic Model That Achieves SOTA on SWE-Bench Pro and Sustains 8-Hour Autonomous Execution
    AI News


    April 8, 2026
    Z.AI, the AI platform developed by the team behind the GLM model family, has released GLM-5.1 — its next-generation flagship model built specifically for agentic engineering. Unlike models optimized for clean, single-turn benchmarks, GLM-5.1 is designed for agentic tasks, with significantly stronger coding capabilities than its predecessor: it achieves state-of-the-art performance on SWE-Bench Pro and leads GLM-5 by a wide margin on NL2Repo (repo generation) and Terminal-Bench 2.0 (real-world terminal tasks).

    Architecture: DSA, MoE, and Asynchronous RL

    Before diving into what GLM-5.1 can do, it’s worth understanding what it’s built on — because the architecture is meaningfully different from a standard dense transformer.

    GLM-5 adopts DSA to significantly reduce training and inference costs while maintaining long-context fidelity. The model uses a glm_moe_dsa architecture: a Mixture-of-Experts (MoE) model combined with DSA. For AI devs evaluating whether to self-host, this matters: MoE models activate only a subset of their parameters per forward pass, which can make inference significantly more efficient than a comparably sized dense model, though they require specific serving infrastructure.
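    The efficiency argument can be made concrete with a back-of-the-envelope calculation. The sketch below is purely illustrative — the article does not publish GLM-5.1's expert count or top-k routing degree, so the numbers in the usage note are assumptions, not official specifications.

```python
def moe_active_fraction(num_experts: int, top_k: int,
                        expert_params: float, shared_params: float) -> float:
    """Fraction of a MoE model's parameters touched per token.

    Only top_k of num_experts expert blocks run per forward pass;
    shared layers (attention, embeddings, router) always run.
    Parameter counts can be in any consistent unit (e.g. billions).
    """
    total = shared_params + num_experts * expert_params
    active = shared_params + top_k * expert_params
    return active / total
```

    With hypothetical numbers — say 160 experts of 4B parameters each, 8 active per token, and 40B of shared parameters — only about 11% of the weights participate in each forward pass, which is where the inference savings over a dense model of the same total size come from.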

    On the training side, GLM-5 implements a new asynchronous reinforcement learning infrastructure that drastically improves post-training efficiency by decoupling generation from training. Novel asynchronous agent RL algorithms further improve RL quality, enabling the model to learn from complex, long-horizon interactions more effectively. This is what allows the model to handle agentic tasks with the kind of sustained judgment that single-turn RL training struggles to produce.
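    The core idea of decoupling generation from training can be sketched with a toy producer/consumer pattern. This is not Z.AI's actual infrastructure — just a minimal stdlib illustration of why decoupling helps: rollout generation (slow, bursty agent trajectories) and gradient updates proceed concurrently instead of blocking each other.

```python
import queue
import threading

# Bounded buffer between "actors" that generate rollouts and a
# "learner" that consumes them; neither side waits for the other
# beyond the buffer's backpressure.
buffer: queue.Queue = queue.Queue(maxsize=64)

def actor(n_rollouts: int) -> None:
    """Stand-in for asynchronous trajectory generation."""
    for i in range(n_rollouts):
        buffer.put({"rollout": i})  # an agent trajectory would go here
    buffer.put(None)                # sentinel: generation finished

def learner() -> int:
    """Stand-in for the training loop; returns update count."""
    steps = 0
    while (item := buffer.get()) is not None:
        steps += 1                  # a gradient update would go here
    return steps
```

    In a synchronous setup, the learner would idle while each long-horizon rollout completes; with the queue in between, generation throughput and training throughput are independent, which is the efficiency win the asynchronous RL infrastructure targets.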

    The Plateau Problem GLM-5.1 is Solving

    To understand what makes GLM-5.1 different at inference time, it helps to understand a specific failure mode in LLMs used as agents. Previous models — including GLM-5 — tend to exhaust their repertoire early: they apply familiar techniques for quick initial gains, then plateau. Giving them more time doesn’t help.

    This is a structural limitation for any developer trying to use an LLM as a coding agent. The model applies the same playbook it knows, hits a wall, and stops making progress regardless of how long it runs. GLM-5.1, by contrast, is built to stay effective on agentic tasks over much longer horizons. The model handles ambiguous problems with better judgment and stays productive over longer sessions. It breaks complex problems down, runs experiments, reads results, and identifies blockers with real precision. By revisiting its reasoning and revising its strategy through repeated iteration, GLM-5.1 sustains optimization over hundreds of rounds and thousands of tool calls.

    Sustained performance requires more than a larger context window. The model must maintain goal alignment over extended execution, reducing strategy drift, error accumulation, and ineffective trial and error; this is what enables truly autonomous execution on complex engineering tasks.

    Benchmarks: Where GLM-5.1 Stands

    On SWE-Bench Pro, GLM-5.1 achieves a score of 58.4, outperforming GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro and setting a new state-of-the-art result.

    The broader benchmark profile shows a well-rounded model:
    • AIME 2026: 95.3
    • HMMT Nov. 2025: 94.0
    • HMMT Feb. 2026: 82.6
    • GPQA-Diamond (graduate-level science reasoning): 86.2
    • CyberGym: 68.7, a substantial jump from GLM-5’s 48.3
    • BrowseComp: 68.0
    • τ³-Bench: 70.6
    • MCP-Atlas (Public Set): 71.8, particularly relevant given MCP’s growing role in production agent systems
    • Terminal-Bench 2.0: 63.5, rising to 66.5 when evaluated with Claude Code as the scaffolding

    Across 12 representative benchmarks covering reasoning, coding, agents, tool use, and browsing, GLM-5.1 demonstrates a broad and well-balanced capability profile. This shows that GLM-5.1 is not a single-metric improvement — it advances simultaneously across general intelligence, real-world coding, and complex task execution.

    In terms of overall positioning, GLM-5.1’s general capability and coding performance are broadly aligned with Claude Opus 4.6.

    8-Hour Sustained Execution: What That Actually Means

    The most important difference in GLM-5.1 is its capacity for long-horizon task execution. GLM-5.1 can work autonomously on a single task for up to 8 hours, completing the full process from planning and execution to testing, fixing, and delivery.

    For developers building autonomous agents, this changes the scope of what’s possible. Rather than orchestrating a model over dozens of short-lived tool calls, you can hand GLM-5.1 a complex objective and let it run a complete ‘experiment–analyze–optimize’ loop autonomously.
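    The ‘experiment–analyze–optimize’ loop can be pictured as a simple harness. The sketch below is a generic illustration, not Z.AI's agent scaffold: `propose` stands in for the model editing code and issuing tool calls, and `evaluate` stands in for running a test suite or benchmark.

```python
import random

def optimize(evaluate, propose, rounds: int = 200):
    """Minimal experiment-analyze-optimize harness: each round proposes a
    candidate, measures it, and keeps the best result seen so far."""
    best, best_score = None, float("-inf")
    for _ in range(rounds):
        candidate = propose(best)      # agent: revise strategy from best-so-far
        score = evaluate(candidate)    # environment: run tests / benchmark
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score
```

    The GLM-5.1 claim is about the quality of `propose` over time: earlier models effectively stop producing better candidates after the first few rounds, while GLM-5.1 is trained to keep generating productive revisions across hundreds of rounds and thousands of tool calls.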

    The concrete engineering demonstrations make this tangible: GLM-5.1 can build a complete Linux desktop environment from scratch in 8 hours; perform 178 rounds of autonomous iteration on a vector database task and improve performance to 1.5× the initial version; and optimize a CUDA kernel, increasing speedup from 2.6× to 35.7× through sustained tuning.

    That CUDA kernel result is notable for ML engineers: improving a kernel from 2.6× to 35.7× speedup through autonomous iterative optimization is a level of depth that would take a skilled human engineer significant time to replicate manually.
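    To put a number on that: relative to the 2.6× starting point, the autonomous tuning delivered a further ~13.7× improvement on its own.

```python
# Speedups reported in the article, both measured against the
# unoptimized baseline kernel.
initial_speedup = 2.6
final_speedup = 35.7

# Additional gain contributed by the autonomous iteration itself.
further_gain = final_speedup / initial_speedup
print(round(further_gain, 1))  # 13.7
```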

    Model Specifications and Deployment

    GLM-5.1 is a 754-billion-parameter MoE model released under the MIT license on HuggingFace. It operates with a 200K context window and supports up to 128K maximum output tokens — both important for long-horizon tasks that need to hold large codebases or extended reasoning chains in memory.
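    For agent builders, those two limits translate into a simple pre-flight budget check. A minimal sketch — note that whether output tokens count against the context window is provider-specific; this assumes they do, which is the conservative reading.

```python
CONTEXT_WINDOW = 200_000  # tokens (from the article)
MAX_OUTPUT = 128_000      # tokens (from the article)

def request_fits(prompt_tokens: int, output_tokens: int) -> bool:
    """Rough check that a request stays within the published limits.

    Conservative assumption: prompt and generated output share the
    context window. Exact accounting may differ per serving stack.
    """
    return (output_tokens <= MAX_OUTPUT
            and prompt_tokens + output_tokens <= CONTEXT_WINDOW)
```

    For example, a 60K-token repository snapshot leaves room for the full 128K output budget, but a 100K-token prompt does not.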

    GLM-5.1 supports thinking mode (offering multiple thinking modes for different scenarios), streaming output, function calling, context caching, structured output, and MCP for integrating external tools and data sources.

    For local deployment, the following open-source frameworks support GLM-5.1: SGLang (v0.5.10+), vLLM (v0.19.0+), xLLM (v0.8.0+), Transformers (v0.5.3+), and KTransformers (v0.5.3+).
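    A self-hosted setup with vLLM might look like the sketch below. The HuggingFace repo id and tensor-parallel degree are assumptions for illustration (the article names neither), and the import is deferred because vLLM is a heavy optional dependency.

```python
def load_glm51(model_id: str = "zai-org/GLM-5.1", tp: int = 16):
    """Offline-inference loader via vLLM (v0.19.0+ per the article).

    model_id and tp are illustrative placeholders, not official values;
    a 754B MoE model requires multi-GPU tensor parallelism in practice.
    """
    from vllm import LLM, SamplingParams  # deferred: heavy dependency
    llm = LLM(model=model_id, tensor_parallel_size=tp, trust_remote_code=True)
    params = SamplingParams(temperature=0.6, max_tokens=4096)
    return llm, params
```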

    For API access, the model is available through Z.AI’s API platform. Getting started requires installing zai-sdk via pip and initializing a ZaiClient with your API key.
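    A minimal call through the SDK might look like this. The model identifier "glm-5.1" and the ZAI_API_KEY environment-variable name are assumptions; check the platform docs for the exact values. The import is deferred so the snippet stays self-contained without the SDK installed.

```python
import os

def ask_glm(prompt: str) -> str:
    """One-shot chat completion via zai-sdk (pip install zai-sdk).

    Assumptions: model id "glm-5.1" and the ZAI_API_KEY env var are
    illustrative, not confirmed by the article.
    """
    from zai import ZaiClient  # deferred: optional dependency
    client = ZaiClient(api_key=os.environ["ZAI_API_KEY"])
    resp = client.chat.completions.create(
        model="glm-5.1",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```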

    Key Takeaways

    • GLM-5.1 sets a new state-of-the-art on SWE-Bench Pro with a score of 58.4, outperforming GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro — making it one of the strongest publicly benchmarked models for real-world software engineering tasks at the time of release.
    • The model is built for long-horizon autonomous execution, capable of working on a single complex task for up to 8 hours — running experiments, revising strategies, and iterating across hundreds of rounds and thousands of tool calls without human intervention.
    • GLM-5.1 uses a MoE + DSA architecture trained with asynchronous reinforcement learning, which reduces training and inference costs compared to dense transformers while maintaining long-context fidelity — a meaningful consideration for teams evaluating self-hosting.
    • It is open-weight under the MIT license (754B parameters, 200K context window, 128K max output tokens) and supports local deployment via SGLang, vLLM, xLLM, Transformers, and KTransformers, as well as API access through the Z.AI platform with OpenAI SDK compatibility.
    • GLM-5.1 goes beyond coding — it also shows strong improvements in front-end prototyping, artifact generation, and office productivity tasks (Word, Excel, PowerPoint, PDF), positioning it as a general-purpose foundation for both agentic systems and high-quality content workflows.
