I bookmarked a GMKtec EVO-X2 listing in October last year. 128GB Ryzen AI MAX+ 395, listed at $2,099. I closed the tab, told myself I’d think about it for a week, and went to bed.

Six months later I checked again. The exact same SKU is now $3,299. That’s not a typo. The “rampocalypse” (LPDDR5 prices spiking, AI demand, take your pick) has tacked nearly 60% onto the original price. Corsair quietly raised their AI Workstation 300 by $1,100. Reddit threads on r/LocalLLaMA are full of people kicking themselves for not buying when these things first launched.

So here’s the thing. AMD just announced their own in-house Halo Box at AI Dev Day, ships in June. Every mini PC vendor on the planet is now slapping “Ryzen AI MAX+ 395” on something. Every YouTube video says it’s the local LLM machine you’ve been waiting for. And it… kind of is. But there are landmines, and the “obvious” buying advice is wrong for most people.

I spent the last two weeks reading through Reddit threads, Phoronix coverage, and actual product specs to figure out what’s worth buying right now. This is what I landed on.

A Ryzen AI MAX+ 395 mini PC running a local LLM

TL;DR — my picks for local LLM mini PCs in 2026:

  • Flagship: GMKtec EVO-X2 128GB ($3,299) if you need 70B models and local-only workloads

  • Sweet spot for most people: Beelink SER10 MAX HX 470 ($1,799) for 13B to 27B models

  • Budget: Beelink SER9 ($859) or origimagic A3 ($609) for 7B to 13B models

What’s a Strix Halo and why is r/LocalLLaMA losing its mind

For context: Strix Halo is AMD’s codename for the Ryzen AI MAX+ 395 platform. It’s a laptop-class APU that AMD also sells into desktops and mini PCs. The selling point isn’t the CPU. It’s the unified memory architecture: up to 128GB of LPDDR5x soldered alongside the package, addressable by both the CPU and the integrated Radeon 8060S GPU at roughly 256 GB/s.

For local LLMs, that’s the magic spec. Most mid-range gaming PCs have 16 to 24GB of VRAM total. A 70B parameter model in 4-bit quantization needs ~40GB. Without unified memory you’re stuck either buying a stack of used RTX 3090s (loud, hot, two PCIe slots, ~$1,500) or accepting that the model lives partly in system RAM and runs at glacial speed.
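The math behind that ~40GB is just parameter count times bits per weight. A quick sketch you can adapt (the 4.5 bits/weight is my approximation of a typical Q4_K_M GGUF; KV cache and runtime buffers come on top):

```python
def weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Rough size of an LLM's weights alone at a given quantization.
    KV cache and runtime buffers add more on top, growing with context."""
    return params_billions * bits_per_weight / 8  # billions of params x bytes/weight = GB

print(f"70B @ ~4.5 bpw (Q4-ish): ~{weights_gb(70, 4.5):.0f} GB")  # ~39 GB
print(f"70B @ ~8.5 bpw (Q8-ish): ~{weights_gb(70, 8.5):.0f} GB")  # ~74 GB
print(f"27B @ ~4.5 bpw (Q4-ish): ~{weights_gb(27, 4.5):.0f} GB")  # ~15 GB
```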

Strix Halo flips the math. 128GB unified means you can load a 70B model and still have room left for the OS, your text editor, and a half-dozen Docker containers. And it does it in a 2.5L mini PC enclosure that draws ~140W under load.

That’s the pitch. Now the reality.

The price has more than doubled

This is the part nobody is leading with. Six months ago you could get a 128GB Strix Halo box for around $1,500-$1,800. Today that same machine is $3,000+. There’s a Reddit thread from this week where one buyer flat-out says “I got mine for $2,000 in October. Same Amazon listing is now $3,299.”

Model                        RAM     Oct 2025 launch     May 2026
GMKtec EVO-X2                128GB   ~$2,099             $3,299
Corsair AI Workstation 300   128GB   ~$2,299             $3,399
Framework Desktop            128GB   $1,999 (preorder)   ~$3,100
Beelink GTR9 Pro             128GB   ~$1,899             $3,299

Pricing pulled from current Amazon listings on May 1, 2026, plus historical references in r/LocalLLaMA threads. Your mileage will vary depending on the week, but the trend is everywhere.

So when you read a “best Strix Halo mini PC of 2026” post that quotes prices from a launch review, double-check Amazon before you get excited.

Two gotchas before you click buy

The 120W cap on AMD eGPUs. This one only shows up if you read deep into the FEVM FAEX1 review threads. The FEVM and a couple of the MINISFORUM SKUs include an Oculink port, which lets you attach an external GPU. Sounds great. Pair a Strix Halo box with a used RTX 4090 and you’ve got a real local AI rig, right?

Sort of. There’s a BIOS limitation on most current Strix Halo boards: any AMD discrete GPU connected over Oculink (or M.2 to PCIe riser) gets capped at 120W. Doesn’t matter if it’s a 7900 XTX, a 6700 XT, or even an old Vega 64. All limited.

NVIDIA cards are reportedly fine, mostly. But anyone hoping to bolt a cheap AMD card onto a Strix Halo box for extra inference muscle is going to have a bad time. From one r/MiniPCs review:

I tried my 7900xtx, 120w limited. Tried my 6700xt, 120w limited. Even tried my OG Vega 64… 120w limited. Tried my best friend’s 4090, not limited.

MINISFORUM is allegedly working on a BIOS fix for their boards. AMD has been silent. If you’re buying Strix Halo specifically to attach an eGPU, wait for the fix or buy a different platform.
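Side note if you end up debugging this yourself: on Linux the amdgpu driver exposes each card’s board power limit through standard hwmon files, so you can read the cap directly instead of eyeballing wattage under load. A minimal sketch, assuming the stock sysfs layout and the eGPU attached:

```python
from pathlib import Path

# amdgpu publishes the board power limit via the standard hwmon
# interface, in microwatts. A capped eGPU should read ~120 W here.
for cap_file in Path("/sys/class/drm").glob("card*/device/hwmon/hwmon*/power1_cap"):
    watts = int(cap_file.read_text()) / 1_000_000
    print(f"{cap_file.parents[3].name}: {watts:.0f} W")
```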

The memory bandwidth ceiling. The 256 GB/s number is real, but for context, an Apple M5 Ultra hits ~800 GB/s on its unified memory and a used RTX 3090 hits ~936 GB/s on its 24GB of VRAM. Strix Halo has roughly a quarter to a third of the bandwidth of the alternatives.

For inference on big models that fit entirely in unified memory, that’s still way better than nothing. Token generation is bandwidth-bound (each new token streams essentially all the weights through memory once), so you’ll get usable token rates on a 70B Q4 model, which a 24GB GPU can’t even load. Prompt processing (the slow part where the model chews through your input before it starts generating) is compute-bound instead, and the 8060S has far less raw compute than a dedicated GPU, so long prompts are where Strix Halo is noticeably slower.

Translation: for chat-style use cases with shorter prompts, Strix Halo feels great. For document-stuffing RAG workflows or long-context coding agents, the prompt processing wait will frustrate you. AMD’s Medusa Halo is supposed to roughly double the bandwidth, but that’s late 2027 at the earliest.
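The generation side is easy to sanity-check with a standard back-of-envelope rule: single-stream decoding reads roughly all the weights once per token, so tokens/sec can’t exceed memory bandwidth divided by model size. Real throughput lands well under this ceiling, but it makes the gap concrete:

```python
def gen_ceiling_tok_s(bandwidth_gb_s: float, model_gb: float) -> float:
    """Upper bound on single-stream generation: every new token streams
    essentially all the weights through memory once."""
    return bandwidth_gb_s / model_gb

MODEL_GB = 40  # ~70B at Q4
for name, bw in [("Strix Halo", 256), ("M5 Ultra", 800), ("RTX 3090", 936)]:
    print(f"{name:11s} <= {gen_ceiling_tok_s(bw, MODEL_GB):4.1f} tok/s")
# The 3090 row is theoretical anyway: 40GB doesn't fit in 24GB of VRAM.
```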

Should you actually buy one?

Let me address the elephant in the room. There’s an old r/LocalLLaMA comment that I’ve thought about a lot:

It will virtually ALWAYS be cheaper per token to run Kimi in a giant warehouse running constantly at 90% capacity than it is to run a local version that will be idle 90% of the time.

The math is brutal. A $3,299 mini PC, even amortized over three years, comes to about $90 per month before electricity. For $90 a month I can run roughly 18,000 average chat exchanges through the Claude or Gemini API at current rates, or sit comfortably under the Claude Pro $20 plan with room left over. Local inference is rarely cheaper if you’re being honest with yourself about how much you actually use it.
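Here’s that arithmetic, with the API rate as my assumption (a blended ~$5 per million tokens; swap in whatever your provider actually charges):

```python
hardware = 3299                 # 128GB Strix Halo box, USD
months = 36                     # three-year amortization
monthly = hardware / months     # ~$92/month before electricity

tokens_per_exchange = 1_000     # ~500 in + ~500 out for an average chat turn
rate_per_million = 5.00         # assumed blended API rate, USD per million tokens
cost_per_exchange = tokens_per_exchange / 1e6 * rate_per_million

print(f"${monthly:.0f}/month buys ~{monthly / cost_per_exchange:,.0f} exchanges/month")
```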

The case for local hardware is not economic. It’s two other things:

  1. Privacy. Some workloads (medical letters, legal docs, personal journaling, anything covered by an NDA) genuinely cannot leave your network. For those, the question isn’t “is local cheaper” but “is local possible.” Strix Halo is the cheapest path to “yes.”

  2. The mental model shift. This one I find personally compelling. Another r/LocalLLaMA quote, slightly paraphrased: “Owning the hardware shifts your relationship from ‘every prompt costs me money’ to ‘I have free tokens, let me try this.’” If you’re the kind of person who hesitates to send a message to Claude because you’re watching the API meter tick, having local inference unlocks a different way of working. I noticed this with my own setup. I throw way more half-baked ideas at a local model than I ever did at a paid API.

If neither of those applies to you, save the $3,299. Use Claude Pro or one of the free-tier API rotations and put the money into something else.

My picks at each tier

OK, you’ve made it this far and you still want to buy. Here’s what I’d actually pick.

Tier 1: the full Strix Halo flagship

GMKtec EVO-X2 (Ryzen AI MAX+ 395) — from $2,349 (96GB) to $3,299 (128GB)

GMKtec EVO-X2 Ryzen AI MAX+ 395 mini PC

This is the unit most r/LocalLLaMA buyers settled on. Ryzen AI MAX+ 395, LPDDR5x-8000, 1TB or 2TB NVMe, dual 2.5G LAN, USB4. The 96GB SKU at $2,349 handles everything up to a 30B Q8 model and a 70B Q4 with some squeezing. The 128GB SKU at $3,299 gives you the full 70B headroom with memory to spare.

The downside is the price tag and the fact that it’s loud under sustained load. Not jet-engine loud, but you’ll know when it’s working. If you’re putting it in a bedroom, get a Beelink GTR9 Pro 128GB instead: same chip, similar price ($3,299), better thermals at the cost of a chunkier enclosure.

If you’d rather drop down to the Ryzen AI 9 HX 470 tier (cheaper, more expansion, less unified memory headroom), the MINISFORUM AI X1 Pro-470 with 32GB DDR5 + 1TB SSD is the cleanest mid-flagship pick at $1,359. You give up the 128GB unified ceiling but you keep the strong NPU and the wider expansion options.

MINISFORUM AI X1 Pro-470 mini PC

Tier 2: the smarter buy for most people

Beelink SER10 MAX (Ryzen AI 9 HX 470) — $1,799

Beelink launched this as their “OpenClaw edition” and the branding is silly, but the hardware is genuinely the sweet spot for 2026. 86 TOPS combined NPU+iGPU, 32GB DDR5, 1TB NVMe. The HX 470 is a refresh of the HX 370 with about 10% more performance and an updated NPU. You can run 13B models comfortably and 27B models in Q4 (Qwen 3.6 27B fits, and runs well).

For my money this is the box I’d actually buy if I were starting from scratch. It’s half the price of a 128GB Strix Halo, runs the models that are realistically useful for a developer’s day-to-day work, and it doubles as a perfectly good OpenClaw host when you’re not running inference. If you’re new to local LLMs, this is where I’d start.

The MINISFORUM AI X1 Pro-470 mentioned above at $1,359 is the closest direct competitor if you want to save $400 and don’t need the Beelink chassis.

Tier 3: the budget tier that still does real work

Beelink SER9 (Ryzen 7 H 255, 32GB) — $859

Beelink SER9 Ryzen 7 H 255 mini PC

Eight cores, 32GB LPDDR5, 1TB NVMe. No NPU, no AI branding, just a solid 2024-vintage chip in a quiet enclosure. The Radeon 780M iGPU is enough for running 7B and 13B models in Q4 at usable speed using llama.cpp. You won’t run 27B models comfortably and you definitely won’t run 70B. But for a lot of practical use cases (code completion with Qwen 2.5 Coder 7B, simple chat with Gemma 4 4B, RAG over a small document set with Llama 3.2 3B), the H 255 is more than enough.
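To make “usable speed” concrete, here’s a minimal llama-cpp-python sketch of the kind of 7B setup I mean. The GGUF filename is a placeholder (any Q4_K_M quant from Hugging Face works), and you’ll want a Vulkan or ROCm build of the library so the 780M actually does the work:

```python
from llama_cpp import Llama  # pip install llama-cpp-python

llm = Llama(
    model_path="./qwen2.5-coder-7b-instruct-q4_k_m.gguf",  # placeholder path
    n_gpu_layers=-1,  # offload every layer to the iGPU
    n_ctx=8192,
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a Python function that deduplicates a list while preserving order."}]
)
print(out["choices"][0]["message"]["content"])
```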

I keep recommending the SER9 because it nails the 80/20. Most people who think they need Strix Halo actually need this plus a Claude API subscription for the heavy stuff. Read my home lab guide if you’re building out for general homelab duty rather than dedicated inference.

If you want something even cheaper, the origimagic A3 (Ryzen 7 8745HS, 32GB DDR5, 1TB SSD) at $609 is the best price-to-performance pick I’ve seen this month. The 8745HS is roughly equivalent to the H 255 for inference workloads, and the upgradeable DDR5 (versus soldered LPDDR5) means you can take it to 64GB later if you start playing with bigger models.

origimagic A3 Ryzen 7 8745HS mini PC

The software side: Lemonade SDK is worth knowing about

If you do go AMD, AMD’s Lemonade SDK is the open-source local-AI server they’ve been pushing as their answer to Ollama. Last week’s 10.3 release ditched Electron for Tauri and shrunk the binary from 100MB to 9MB. They added an “OmniRouter” that automatically picks between CPU, integrated GPU, and NPU back-ends for each request, and they made it easy to switch between ROCm 7.2 stable, ROCm 7.12 preview, and TheRock nightly builds.

It’s not as polished as Ollama yet. The Linux AppImage on Ubuntu 26.04 still has Wayland issues. But it’s the only project getting day-zero support for new AMD hardware, including Strix Halo’s NPU. If you’re buying Ryzen AI specifically because of the NPU, Lemonade is the path that actually uses it. Plain Ollama will fall back to CPU/iGPU and ignore the NPU entirely on most setups.

For now my honest recommendation is: install both. Ollama for the polished experience and broad model availability, Lemonade for benchmarking what your NPU can actually do. They don’t conflict.
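Running both is painless in practice because each exposes an OpenAI-compatible endpoint, so the same client code points at either. Ollama’s port below is the stock default; the Lemonade URL is my assumption, so check its docs:

```python
from openai import OpenAI

# Same client, different base_url -- swap local back-ends freely.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")  # Ollama default
# client = OpenAI(base_url="http://localhost:8000/api/v1", api_key="unused")  # Lemonade (assumed)

resp = client.chat.completions.create(
    model="qwen2.5-coder:7b",  # any model you've pulled locally
    messages=[{"role": "user", "content": "In one paragraph: what does unified memory buy an LLM?"}],
)
print(resp.choices[0].message.content)
```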

What I’d do if I were starting today

If I were buying my first local-LLM mini PC tomorrow, knowing what I know now:

  • If I had $1,500 to spend: Beelink SER10 MAX HX 470 ($1,799 is close enough, and it dips to $1,499 around Cyber Week). Run Qwen 3.6 27B Q4 for coding, Gemma 4 9B for chat. Pair with a Claude API subscription for anything that needs frontier capability.

  • If I had $3,500 to spend and privacy mattered: GMKtec EVO-X2 128GB. Full local stack, 70B models, no cloud dependency. Worth it specifically for the workloads where the data can’t leave your network.

  • If I had $700 and was just curious: origimagic A3 plus a free week of Claude Pro. Run Llama 3.2 3B and Qwen 2.5 Coder 7B locally for the “every prompt is free” experience, lean on Claude for anything serious.

  • What I would not do: spend $3,299 on a 128GB Strix Halo box right now if I weren’t already certain I needed local-only inference. The price will come down once the AMD-branded Halo Box ships in June, and the next-gen Medusa Halo (late 2027) will roughly double memory bandwidth at hopefully sane prices.

I might be wrong about the price coming down. Tariffs and RAM inflation have a way of making predictions look stupid. But buying at peak hype almost never works out.

Drop a comment with what you’re running locally and how it’s holding up. I’m especially curious about the GTR9 Pro 128GB thermals if anyone has one in a bedroom setup.

Happy inferencing! 🚀

Last updated: May 2026. Pricing shifts weekly thanks to the LPDDR5 supply chaos. Verify before you buy.