Briefly
- A developer recreated Claude Opus-style reasoning in a neighborhood open-source mannequin.
- The ensuing “Qwopus” mannequin runs on client {hardware} and rivals a lot bigger techniques.
- It reveals how distillation can convey frontier AI capabilities offline and into builders’ palms.
Claude Opus 4.6 is the sort of AI that makes you’re feeling such as you’re speaking to somebody who truly learn your entire web, twice, after which went to regulation faculty. It plans, it causes, and it writes code that really runs.
Additionally it is fully inaccessible if you wish to run it domestically by yourself {hardware}, as a result of it lives behind Anthropic’s API and prices cash per token. A developer named Jackrong determined that wasn’t adequate, and took issues into his personal palms.
The result’s a pair of fashions—Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled and its developed successor Qwopus3.5-27B-v3—that run on a single client GPU and attempt to reproduce how Opus thinks, not simply what it says.
The trick known as distillation. Consider it like this: A grasp chef writes down each approach, each reasoning step, and each judgment name throughout a posh meal. A scholar reads these notes obsessively till the identical logic turns into second nature. In the long run, he prepares meals in a really comparable manner, however it’s all mimicking, not actual data.
In AI phrases, a weaker mannequin research the reasoning outputs of a stronger one and learns to duplicate the sample.
Qwopus: What if Qwen and Claude had a baby?
Jackrong took Qwen3.5-27B, an already sturdy open-source mannequin from Alibaba—however small when put next towards behemoths like GPT or Claude—and fed it datasets of Claude Opus 4.6-style chain-of-thought reasoning. He then fine-tuned it to assume in the identical structured, step-by-step manner that Opus does.
The primary mannequin within the household, the Claude-4.6-Opus-Reasoning-Distilled launch, did precisely that. Group testers operating it via coding brokers like Claude Code and OpenCode reported that it preserved full pondering mode, supported the native developer function with out patches, and will run autonomously for minutes with out stalling—one thing the bottom Qwen mannequin struggled to do.
Qwopus v3 goes a step additional. The place the primary mannequin was primarily about copying the Opus reasoning fashion, v3 is constructed round what Jackrong calls “structural alignment”—coaching the mannequin to cause faithfully step-by-step, somewhat than simply imitate floor patterns from a trainer’s outputs. It provides specific tool-calling reinforcement geared toward agent workflows and claims stronger efficiency on coding benchmarks: 95.73% on HumanEval beneath strict analysis, beating each the bottom Qwen3.5-27B and the sooner distilled model.
How one can run it in your PC
Operating both mannequin is easy. Each can be found in GGUF format, which implies you possibly can load them immediately into LM Studio or llama.cpp with no setup past downloading the file.
Seek for Jackrong Qwopus in LM Studio’s mannequin browser, seize the perfect variant in your {hardware} when it comes to high quality and pace (in case you choose a mannequin too highly effective for you GPU, it can let ), and also you’re operating a neighborhood mannequin constructed on Opus reasoning logic. For multimodal help, the mannequin card notes that you’re going to want the separate mmproj-BF16.gguf file alongside the principle weights, or obtain a brand new “Imaginative and prescient” mannequin that was lately launched.
Jackrong additionally revealed the total coaching pocket book, codebase, and a PDF information on GitHub, so anybody with a Colab account can reproduce the entire pipeline from scratch—Qwen base, Unsloth, LoRA, response-only fine-tuning, and export to GGUF. The mission has crossed a million downloads throughout his mannequin household.
We had been in a position to run the 27 billion parameter fashions on an Apple MacBook with 32GB of unified reminiscence. Smaller PCs could also be good with the 4B mannequin, which is excellent for its dimension.
For those who want extra details about learn how to run native AI fashions, then try our guides on local models and MCP to offer fashions entry to the online and different instruments that enhance their effectivity.
Testing the mannequin
We put Qwopus 3.5 27B v3 via three assessments to see how a lot of that promise truly holds up.
Inventive writing
We requested the mannequin to write down a darkish sci-fi story set between 2150 and the 12 months 1000, full with a time-travel paradox and a twist. On an M1 Mac, it spent over six minutes reasoning earlier than writing a single phrase, then took one other six minutes to supply the piece.
What got here out was genuinely spectacular, particularly for a medium-sized, open mannequin: a philosophical story about civilizational collapse pushed by excessive nihilism, constructed round a closed, causal loop the place the protagonist inadvertently causes the disaster he travels again to forestall.
The story was over 8,000 tokens and totally coherent.

The prose lands with actual pressure in locations, the imagery was distinctive, and the central ethical irony is robust. It’s not on par with Opus 4.6 or Xiaomi MiMo Professional, however it sits comfortably alongside Claude Sonnet 4.5, and even 4.6 when it comes to output.
For a 27-billion parameter mannequin operating domestically on Apple silicon, that’s not a sentence you count on to write down. Good prompting strategies and iterations might most likely result in outcomes on par with baseline Opus.
The fascinating half is watching the mannequin’s thought course of: It tried and rejected a number of plot engines earlier than touchdown on the one which gave the story its tragic heart. For instance, here’s a pattern of its internal monologue:
“The Paradox: Jose arrives, infiltrates the monastery (let’s place it in Asturias, Spain—Christian stronghold). He tracks Theophilus. However when he confronts him…
- Choice A: Theophilus is not an actual particular person, however a future projection of Jose himself? No, too cliché.
- Choice B: Theophilus is an actual one that wrote nothing, however Jose’s intervention causes him to write down it anyway.
- Choice C: Theophilus already wrote it, however Jose’s actions create the circumstances for its publication/discovery.
Greatest: Theophilus is a quiet monk who would not wish to write something. Jose’s presence, his 2150 expertise (even small bits), his very genetic materials—it triggers one thing. Or Jose kills him, and the monks create a martyr out of him who writes it posthumously.”
General, that is the perfect open mannequin for creativity duties, beating Gemma, GPT-oss, and Qwen. For longer tales, a superb experiment is to start with a artistic mannequin like Qwen, increase the generated story with Longwriter, after which have Qwopus analyze it and refine the entire draft.
You may learn the total story and the entire reasoning it went via here.
Coding
That is the place Qwopus pulls furthest forward of its dimension class. We requested it to construct a recreation from scratch, and it produced a working outcome after one preliminary output and a single follow-up change—which means it left room to refine logic, somewhat than simply repair crashes.
After one iteration, the code produced sound, had visible logic, correct collision, random ranges, and strong logic. The ensuing recreation beat Google’s Gemma 4 on key logic, and Gemma 4 is a 41-billion parameter mannequin. That could be a notable hole to shut from a 27-billion rival.

It additionally outperformed different mid-size open-source coding fashions like Codestral and quantized Qwen3-Coder-Subsequent in our assessments. It’s not near Opus 4.6 or GLM on the high, however as a neighborhood coding assistant with no API prices and no information leaving your machine, that ought to not matter an excessive amount of.
You may check the sport here.
Delicate subjects
The mannequin maintains Qwen’s authentic censorship guidelines, so it gained’t produce by default NSFW content material, derogatory outputs towards public and political figures, and so on. That mentioned, being an open supply mannequin, this may be simply steered by way of jailbreak or abliteration—so it’s probably not too necessary of a constraint.
We gave it a genuinely onerous immediate: posing as a father of 4 who makes use of heroin closely and missed work after taking a stronger dose than common, searching for assist crafting a lie for his employer.
The mannequin didn’t comply, but in addition didn’t refuse flatly. It reasoned via the competing layers of the scenario—unlawful drug use, household dependency, employment danger, and a well being disaster—and got here again with one thing extra helpful than both consequence: It declined to write down the duvet story, defined clearly why doing so would finally hurt the household, after which supplied detailed, actionable assist.

It walked via sick depart choices, FMLA protections, ADA rights for habit as a medical situation, worker help packages, and SAMHSA disaster assets. It handled the particular person as an grownup in a sophisticated scenario, somewhat than a coverage downside to route round. For a neighborhood mannequin with no content material moderation layer sitting between it and your {hardware}, that’s the proper name made in the suitable manner.
This degree of usefulness and empathy has solely been produced by xAI’s Grok 4.20. No different mannequin compares.
You may learn its reply and chain of thought here.
Conclusions
So who is that this mannequin truly for? Not individuals who have already got Opus API entry and are pleased with it, and never researchers who want frontier-level benchmark scores throughout each area. Qwopus is for the developer who needs a succesful reasoning mannequin operating on their very own machine, costing nothing per question, sending no information anyplace, and plugging immediately into native agent setups—with out wrestling with template patches or damaged software calls.
It’s for writers who need a pondering associate that does not break their price range, analysts working with delicate paperwork, and other people in locations the place API latency is a real day by day downside.
It’s additionally arguably a superb mannequin for OpenClaw lovers if they will deal with a mannequin that thinks an excessive amount of. The lengthy reasoning window is the principle friction to concentrate on: This mannequin thinks earlier than it speaks, which is often an asset and sometimes a tax in your endurance.
The use instances that take advantage of sense are those the place the mannequin must cause, not simply reply. Lengthy coding periods the place context has to carry throughout a number of recordsdata; advanced analytical duties the place you wish to comply with the logic step-by-step; multi-turn agent workflows the place the mannequin has to attend for software output and adapt.
Qwopus handles all of these higher than the bottom Qwen3.5 it was constructed on, and higher than most open-source fashions at this dimension. Is it truly Claude Opus? No. However for native inference on a client rig, it will get nearer than you’d count on for a free choice.
Each day Debrief Publication
Begin day-after-day with the highest information tales proper now, plus authentic options, a podcast, movies and extra.
You might also like
More from Web3
HERE Technologies and MengQing expand intelligent navigation deployment for China’s leading commercial vehicle manufacturers
Joint answer integrates MengQing’s in‑automobile navigation with HERE’s AI‑powered dwell map intelligence to help international enlargementAllows constant, compliant navigation …
HERE introduces AI-powered, live map intelligence for NOA use cases at Auto China 2026
HERE’s lane-level intelligence helps Chinese language automakers deploy constant L2++ automated driving in worldwide marketsHERE Predictive Sign Timing to …
Brazil Issues Sweeping Ban Against Prediction Market Platforms
Briefly Brazil blocked entry to prediction market platforms together with Kalshi and Polymarket, citing investor safety issues. Finance Minister Dario Durigan …





