← all editions
EDITION 09 · INFERENCE & SERVING2026·06·065 min readlinks verified live

Inference & serving — the open-weights stack — what's accelerating

Round two on local AI. The first special edition covered the engines — ds4, llmfit, omlx, NemoClaw, QwenPaw. This one is the layer above them: the serving front-ends, control panels and fully-local agents that turn an open-weights model into something you actually use. Smaller, more honest list — most of this bucket is local-adjacent rather than true serving, and it's labelled that way below.

↑144/day
fastest climber
in the edition
3
picks that
earned a slot
live
counts pulled
at publish
5min
to read the
whole edition
01

Top mover

★ TOP MOVER
open-webui/open-webuiUSEPython▲ 144.2 /day★ 140,280

The de-facto front-end for self-hosted models — a polished UI that speaks Ollama and the OpenAI API alike. At 140k stars still adding ~144/day, it's the serving layer most open-weights deployments end up sitting behind. The engine gets the headlines; this is what users actually look at.

Who needs itanyone running Ollama or a local OpenAI-compatible endpoint who wants a real interface, not a curl loop.

---

02

The serving + open-weights layer

Fosowl/agenticSeekUSEPython▲ 56.2 /day★ 26,466

A fully local "Manus" — an autonomous agent that thinks, browses and codes with no APIs and no monthly bill, paying only in electricity. It's the demand-side proof for this whole stack: people want agentic behaviour running entirely on open weights they host themselves.

Who needs itprivacy-first users who want an autonomous assistant with nothing leaving the machine.
1Panel-dev/1PanelUSEGo▲ 25.2 /day★ 35,772

A modern open-source VPS control panel with native AI-agent support — run Ollama models and deploy agents from a managed UI. The interesting move is infrastructure tooling treating local model-serving as a first-class workload rather than a bolt-on.

Who needs itself-hosters who want to run open-weights models alongside the rest of their server stack from one panel.

---

03

Local-adjacent, not serving

labelled honestly

Three fast climbers in this bucket aren't really inference/serving and shouldn't pad the list: tobi/qmd (⭐26,179 · ↑146.3/day · TypeScript) — actually the highest-velocity repo here — is an all-local CLI search engine for your docs and notes; iOfficeAI/AionUi (⭐27,698 · ↑91.4/day · TypeScript) is a local desktop client for OpenClaw, Claude Code, Codex and 20+ CLIs; PDFMathTranslate/PDFMathTranslate (⭐34,565 · ↑54.2/day · Python) is a layout-preserving PDF translator that can call Ollama. All local-first, none of them a serving engine — flagged so the ranking stays straight.

> The honest read: after removing the engines already covered in edition #1 and the off-theme tooling, the genuine open-weights serving layer is thin this round. open-webui dominates because there isn't yet a crowded field of credible self-hosted serving front-ends — a gap worth watching.

---

04

How this was made

Live GitHub pull, bucketed by inference/local-runtime keywords, each repo verified not-archived and pushed recently, ranked by stars/day, then curated for substance — and de-duplicated against the prior local-inference special edition so nothing repeats. Star counts pulled at publish — they move daily; re-verify before reposting.

1 · pull the firehose, verify live2 · bucket by keyword3 · rank by stars/day4 · separate signal from noise, by hand

Accelbrief · catch acceleration, not stars · all editions

1 · pull the firehose, verify live2 · bucket by keyword3 · rank by stars/day4 · separate signal from noise, by hand

Catch the next breakout before it trends.

The fastest-accelerating open-source AI, curated and called. One read a week. Free.

Join 8,400+ engineers · free · no spam
You're in. The next edition lands in your inbox.