

If it gets you started with local models, by all means go ahead, their onboarding is the easiest and it works. Also a lot of 3rd party stuff uses it as a first class citizen allowing you to try out other things (e.g. Open WebUI) easily as you explore what’s possible. Currently try the Qwen 3.6 and Gemma4 models as best bang for buck, somewhere there’s a does it fit in my machine website that can help (search for it).
That said, basically all roads in local LLM lead to llama.cpp, which gets the innovations first and then others copy their homework. Ollama (looks like they’re angling to go commercial) for a long time used it internally without attribution, now they use a bodged up engine of their own that is less performant and almost certainly a copy (possibly vibe coded) of llama.cpp. They heavily encourage using their own models / quantizations and don’t let you play with a lot of parameters without a lot of friction (possibly because they’re not implemented yet, but who knows, low transparency). You get the picture, wannabe techbros. That’s off the top of my head, search for more authoritative sources.
After you’ve gotten the hang of things, have a look at llama-swap which just wraps llama.cpp, lemonade if you’re on AMD, vLLM for nvidia, LM Studio for mac.


Hmmf, nasty, but labor intensive. Is it working on backscatter ? because your devices shouldn’t be responding much (beyond ping / authentication query level).
Also, at that point they can just use whatever fits in a van, radar, IR scanners, who knows what, fucking X-rays maybe, don’t know that they’d bother with this.
Avoiding it being deployed at scale to everybody’s router might be more important.