Local LLMs Show Significant Gains in Agentic Coding
Back in May last year, I wrote the following in my blog post Balancing Agentic AI with Traditional Engineering:
“As someone who prefers running AI locally for reasons like privacy and control, I believe achieving an agentic flow like the one shown isn’t possible yet with local AI setups. However, I hope this may change in the future.”
This has changed.
In discussions, I often hear that local models are not really usable. Granted, cloud-based models are much larger and run on far more powerful hardware, which often leads to better results. Over the past few months, however, local models have improved significantly, to the point where agentic workflows can now be executed successfully with them.
When you combine a clear goal with step-by-step instructions on “how” to achieve it, you often get surprisingly good results.
To illustrate this approach, I took inspiration from Brian Moore’s intriguing AI WORLD CLOCKS test, in which multiple AI models are given the same simple prompt to code a clock and consistently produce beautifully flawed results.
I use the words “beautifully flawed” deliberately: watching these clocks has an almost meditative effect; poetically speaking, they perform a hypnotic dance of code gone awry.
Test Setup
Compared to the original test, my approach differs slightly:
- I do not restrict token usage
- I use a spec and a plan
I have added the specification and the plan at the end of this post in case you want to use them for your own experiments.
I ran five OpenCode instances side by side. From left to right: the cloud models DeepSeek-V3.1, GPT-5, Kimi K2, and Claude Sonnet 4.5, accessed via OpenRouter; and, in the right-most position, GLM-4.7-Flash (6-bit quantized, MLX variant), served locally via LM Studio. I then let them run and compared the results and how long each took.
The tests were run on a 2023 MacBook Pro with an M2 Max chip (400 GB/s memory bandwidth) and 96 GB of unified memory. A PC with a high-end discrete GPU can typically run this faster thanks to its higher VRAM bandwidth, provided the model fits into VRAM.
Results of the First Run
I was surprised that all the results looked good right from the start. The execution times are interesting: GPT-5 took 3 minutes and 23 seconds, compared to 4 minutes and 38 seconds for the local model. The costs are as expected but still worth noting: GPT-5 cost 31 cents and Claude Sonnet 4.5 cost 14 cents, while the Chinese open-weight models DeepSeek-V3.1 and Kimi K2 cost only 1 and 2 cents, respectively, just a fraction of that.
The number of tokens used and the associated costs are shown in the screenshot at the top of each OpenCode instance.
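Expressed as ratios, using only the first-run figures above (a single run, so the numbers are indicative rather than a benchmark):

```latex
\begin{aligned}
\frac{t_{\text{local}}}{t_{\text{GPT-5}}} &= \frac{4\,\text{min}\ 38\,\text{s}}{3\,\text{min}\ 23\,\text{s}} = \frac{278\,\text{s}}{203\,\text{s}} \approx 1.37 \\
\frac{\text{cost}_{\text{GPT-5}}}{\text{cost}_{\text{DeepSeek}}} &= \frac{31\ \text{cents}}{1\ \text{cent}} = 31,
\qquad
\frac{\text{cost}_{\text{GPT-5}}}{\text{cost}_{\text{Kimi K2}}} = \frac{31\ \text{cents}}{2\ \text{cents}} = 15.5
\end{aligned}
```

In other words, the local model needed roughly 37% more time than GPT-5, while GPT-5 cost 15.5 to 31 times as much as the two open-weight cloud models.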
Results of the Second Run
Let’s do a second run.
This time, the tick marks on the clocks from DeepSeek, GPT-5, and Claude Sonnet 4.5 are slightly offset. Kimi K2 gets the ticks right but leaves too little space between the digital display and the clock face (admittedly, the spec doesn’t say anything about this). So in this round, the local LLM surprisingly performs best.
LLMs are not deterministic; further runs would naturally yield additional differences.
Observations and Conclusions
I had already noticed a clear improvement with the Qwen-3 series models, and for my use cases, GLM-4.7-Flash, released on January 19, 2026, improves on that further. What these models have in common is their MoE (Mixture of Experts) architecture: GLM-4.7-Flash has 30B parameters in total, of which approximately 3B are active per token during inference. Like Qwen-3, which comes from Alibaba, GLM-4.7 is developed by a Chinese company, namely Zhipu AI (Z.ai).
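The MoE design is also what makes local inference practical on hardware like mine. A rough back-of-envelope estimate, assuming 6-bit weights (0.75 bytes per parameter) and decoding that is bound by memory bandwidth, while ignoring the KV cache, quantization scales, and compute:

```latex
\begin{aligned}
\text{weights resident in memory} &\approx 30 \times 10^{9} \times 0.75\,\text{B} = 22.5\,\text{GB} \\
\text{weights read per token} &\approx 3 \times 10^{9} \times 0.75\,\text{B} = 2.25\,\text{GB} \\
\text{decode ceiling (MoE)} &\approx \frac{400\,\text{GB/s}}{2.25\,\text{GB}} \approx 178\ \text{tokens/s} \\
\text{decode ceiling (dense 30B, same quantization)} &\approx \frac{400\,\text{GB/s}}{22.5\,\text{GB}} \approx 18\ \text{tokens/s}
\end{aligned}
```

These are theoretical upper bounds, not measurements; real throughput is noticeably lower. The point is the ratio: because the router picks different experts from token to token, all 30B parameters must stay in memory, but only about a tenth of them is read per token, which gives roughly tenfold headroom over a dense model of the same size while the whole model still fits comfortably into 96 GB.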
GLM-4.7-Flash is quite popular: its Hugging Face page shows that it has already been downloaded 650,000 times in the 11 days since its release.
I have occasionally observed tool access issues, although less frequently than with Gemini Pro a year ago. Overall, it feels quite similar to the frontier coding models of that time, yet it is significantly more effective and can even run locally on consumer hardware. A welcome and promising development.
Conclusions:
- Local models can operate agentically and deliver strong results.
- Even among cloud models, open-weight models are significantly cheaper for many tasks and by no means inferior.
- There’s no reason not to choose the most suitable model for each task; doing so can lower costs dramatically.
- The effort required to switch models is minimal, usually just a simple command.
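At the API level there is a simple reason why switching is cheap: OpenRouter and LM Studio both expose OpenAI-compatible chat completions endpoints, so a switch mostly means changing the base URL, the model ID, and whether an API key is sent. The following is a minimal sketch of that idea outside of OpenCode. The model IDs (in particular the local identifier glm-4.7-flash) and the helper names buildChatRequest and chat are illustrative assumptions, and LM Studio is assumed to serve on its default port 1234.

```javascript
// Hypothetical targets: the base URLs are the usual defaults, but the
// model IDs are assumptions; check OpenRouter's model list and the
// identifier LM Studio shows for your downloaded model.
const TARGETS = {
  gpt5: {
    baseURL: "https://openrouter.ai/api/v1",
    model: "openai/gpt-5",
    needsKey: true,
  },
  local: {
    baseURL: "http://localhost:1234/v1", // LM Studio's default server port
    model: "glm-4.7-flash", // hypothetical local model identifier
    needsKey: false,
  },
};

// Build an OpenAI-style chat completions request for a target.
// Only the base URL, the model ID, and the auth header differ.
function buildChatRequest(target, prompt, apiKey) {
  const headers = { "Content-Type": "application/json" };
  if (target.needsKey) {
    if (!apiKey) throw new Error("API key required for this target");
    headers.Authorization = `Bearer ${apiKey}`;
  }
  return {
    url: `${target.baseURL}/chat/completions`,
    init: {
      method: "POST",
      headers,
      body: JSON.stringify({
        model: target.model,
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}

// Send the request; needs a runtime with a global fetch (e.g. Node 18+).
async function chat(target, prompt, apiKey) {
  const { url, init } = buildChatRequest(target, prompt, apiKey);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content;
}
```

For example, chat(TARGETS.local, "Draw an analog clock") talks to the local model, while chat(TARGETS.gpt5, "Draw an analog clock", process.env.OPENROUTER_API_KEY) sends the same request to GPT-5 via OpenRouter (the environment variable name is likewise hypothetical).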
Screencast
For those interested, here is the screencast of the first run. Unlike many of my screencasts, it is not sped up, so you can get a real sense of how the models compare.
This also allows you to see the output of the models in the OpenCode instances.
Details
As promised above, here are the spec.md and plan.md files. They could certainly be improved, but the question is how much effort one wants to invest in the specification; as you can see, they already lead to good results.
Since the results are not deterministic anyway, I still believe it’s best to treat the code as the source of truth and iterate from there. The purpose of this exercise, however, is a different one: to demonstrate that local models have advanced significantly and are now a viable option for tasks of moderate complexity.
The mental model is roughly this: the spec describes the goal in a largely tech-stack-agnostic manner, while the plan outlines the technology stack and the “how”.
Specification - Analog Clock
Goal
An analog clock showing real-time system time, accompanied by a toggleable 12h/24h digital display.
Requirements
- Atmosphere: Modern dark theme with a monospaced aesthetic.
- Clock Face: Circular with a distinct border and centered layout.
- Markers:
  - 60 ticks in total.
  - Hour markers are more prominent (different color/size) than minute markers.
  - Dynamically generated to ensure perfect alignment.
  - Ticks should be aligned along the circular edge of the analog clock.
- Hands:
  - Hour and Minute hands: Distinct thickness and lengths; neutral light color.
  - Second hand: Thin; highlighted with a contrasting color.
- Digital Display: Positioned above the clock; shows time with AM/PM or 24h format based on state.
- Control: Toggle button at the bottom to switch between 12h and 24h digital formats.
Deliverables
- index.html: Skeleton and containers.
- styles.css: Visual styling and positioning.
- clock.js: Dynamic initialization and time-tracking logic.
Functional Rules
- Movement: Hands update every second based on system time.
- Logic: Analog hands reflect smooth positioning (including fractional offsets for hour/minute hands).
- Out of Scope: Alarms, date display, multiple themes.
The breakdown into individual tasks could be kept separately, but I often just include it in plan.md. This generally doesn’t affect the outcome and is simply a matter of preference.
Implementation Plan
Tech Stack: HTML5, CSS3, Vanilla JS.
Tasks:
- Setup: Minimal index.html, styles.css, and clock.js.
- Layout (HTML): Centralized container with elements for the digital display, clock face (containing hands and a container for ticks), and a toggle button.
- Styling (CSS):
  - Flexbox centering for the main container.
  - Circular clock face with a distinct border.
  - Hands:
    - Hour/Minute hands with distinct thickness and length.
    - Thin second hand with a highlight color.
    - Centering: Use left: calc(50% - width/2) for precise horizontal positioning.
    - transform-origin: Set to center bottom for hands to rotate around the clock center.
  - Ticks:
    - Container: Set to top: 0; left: 0; width: 100%; height: 100% to cover the clock face.
    - Centering: Use left: calc(50% - width/2) for precise horizontal positioning.
    - transform-origin: Set to [half-width] [inner-radius] (e.g., 1px 146px for a 2px tick on a 300px face with 4px border) to rotate around the clock center.
  - Digital Display: Styled box with monospaced font.
- Logic (JS):
  - Generate 60 tick marks dynamically (differentiating hour and minute ticks).
    - Position ticks by applying rotate() to the style.transform property.
  - Interval-based updates (1s) for system time.
  - Rotation calculation:
    - Seconds: 6° per second.
    - Minutes: 6° per minute + fractional second offset.
    - Hours: 30° per hour + fractional minute offset.
  - Update clock hands by applying rotate() to the style.transform property (ensure CSS-based centering is not overridden).
  - 12h/24h formatting logic for the digital display.
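To make the plan’s Logic (JS) tasks concrete, here is a minimal sketch of what clock.js could look like. It is not the output of any of the tested models. The element IDs (ticks, hour-hand, minute-hand, second-hand, digital, toggle), the 300px face with a 4px border, the 2px/4px tick widths, and the assumption that the face uses box-sizing: border-box are all hypothetical and would have to match index.html and styles.css. The pure helpers run without a browser; the DOM wiring only starts when a document exists.

```javascript
// Hand angles in degrees, clockwise from 12 o'clock, per the plan's rules.
function handAngles(date) {
  const s = date.getSeconds();
  const m = date.getMinutes();
  const h = date.getHours() % 12;
  return {
    seconds: s * 6,          // 6° per second
    minutes: m * 6 + s / 10, // 6° per minute + 0.1° per elapsed second
    hours: h * 30 + m / 2,   // 30° per hour + 0.5° per elapsed minute
  };
}

// Digital display text: 24h ("15:30:45") or 12h ("03:30:45 PM").
function formatTime(date, use24h) {
  const pad = (n) => String(n).padStart(2, "0");
  const h = date.getHours();
  const rest = `${pad(date.getMinutes())}:${pad(date.getSeconds())}`;
  if (use24h) return `${pad(h)}:${rest}`;
  const h12 = h % 12 === 0 ? 12 : h % 12;
  return `${pad(h12)}:${rest} ${h < 12 ? "AM" : "PM"}`;
}

// transform-origin that makes a tick rotate around the clock center.
// Assumes box-sizing: border-box on the face, so the inner radius is
// (faceSize - 2 * border) / 2, matching the plan's 300px/4px -> 146px.
function tickOrigin(faceSize, border, tickWidth) {
  const innerRadius = (faceSize - 2 * border) / 2;
  return `${tickWidth / 2}px ${innerRadius}px`;
}

// Hypothetical dimensions; must match styles.css.
const FACE_SIZE = 300;
const BORDER = 4;

function initClock() {
  const tickContainer = document.getElementById("ticks");
  for (let i = 0; i < 60; i++) {
    const isHour = i % 5 === 0;
    const tick = document.createElement("div");
    tick.className = isHour ? "tick hour" : "tick minute";
    // Hypothetical widths: 4px hour ticks, 2px minute ticks.
    tick.style.transformOrigin = tickOrigin(FACE_SIZE, BORDER, isHour ? 4 : 2);
    tick.style.transform = `rotate(${i * 6}deg)`;
    tickContainer.appendChild(tick);
  }

  const hands = {
    hours: document.getElementById("hour-hand"),
    minutes: document.getElementById("minute-hand"),
    seconds: document.getElementById("second-hand"),
  };
  const display = document.getElementById("digital");
  let use24h = false;

  function update() {
    const now = new Date();
    const angles = handAngles(now);
    // Only rotate() goes into transform; horizontal centering stays in CSS
    // via left: calc(50% - width/2), so it is not overridden.
    for (const key of Object.keys(hands)) {
      hands[key].style.transform = `rotate(${angles[key]}deg)`;
    }
    display.textContent = formatTime(now, use24h);
  }

  document.getElementById("toggle").addEventListener("click", () => {
    use24h = !use24h;
    update();
  });
  update();
  setInterval(update, 1000); // 1s interval-based updates
}

if (typeof document !== "undefined") {
  document.addEventListener("DOMContentLoaded", initClock);
}
```

For 15:30:45, handAngles returns 270° for the second hand, 184.5° for the minute hand, and 105° for the hour hand, which shows the fractional offsets the spec asks for.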