There’s another thread which seems mainly to be about paying rent to a big tech LLM provider. Those models are big and powerful.
That’s alright, but I thought this thread could be some chat about local models, i.e. ones that run on your desk, on your workhorse computer. I think this is a very interesting area; not many people talk to me about it, but it has some interesting properties:
They are airgapped.
Local models don’t change. Haskellers love immutability.
Local models don’t charge per token. They can run on your CPU or GPU.
Local models are generally less capable because our hardware is less capable.
Because local models are small, one has to be a bit more clever, but there are fun things like GBNF grammars (which are more effective on smaller models).
In business, small models are increasingly used for narrow tasks instead of big models. Reportedly.
Local models are probably the future?
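For context on the GBNF point above: llama.cpp lets you constrain sampling with a grammar, so the model literally cannot emit tokens outside it. A minimal sketch in llama.cpp’s GBNF syntax (the `root` rule is the entry point; the rest here is purely illustrative):

```
# force the model to answer with exactly "yes" or "no"
root ::= "yes" | "no"
```

With small models this kind of constraint does a lot of the heavy lifting, since you stop relying on the model to freestyle valid output.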
So this thread is about that and what people have tried. I have one example that I’ll add as a reply.
I’ve been dabbling with llama3.2 (a 3B model) via llama.cpp’s server locally on a MacBook Pro M4 Max. I didn’t add an Alternative instance (yet) to support disjunction in the grammar, but this direction appeals to me.
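To make the Alternative idea concrete, here’s a hypothetical sketch (the names `Gram`, `lit`, `renderGram` are invented for illustration, not from the repo): a tiny GBNF fragment builder where `<|>` becomes grammar disjunction.

```haskell
-- Hypothetical sketch: <|> from Alternative maps to "|" in GBNF.
import Control.Applicative (Alternative (..))
import Data.List (intercalate)

newtype Gram a = Gram { alternatives :: [String] }

instance Functor Gram where
  fmap _ (Gram fs) = Gram fs

instance Applicative Gram where
  pure _ = Gram [""]
  -- sequencing two fragments concatenates them with a space
  Gram fs <*> Gram gs =
    Gram [unwords (filter (not . null) [f, g]) | f <- fs, g <- gs]

instance Alternative Gram where
  empty = Gram []
  -- disjunction collects alternatives; rendered with "|" below
  Gram fs <|> Gram gs = Gram (fs ++ gs)

-- a quoted terminal: lit "yes" renders as "yes" (with the quotes GBNF wants)
lit :: String -> Gram String
lit s = Gram [show s]

renderGram :: Gram a -> String
renderGram (Gram fs) = "root ::= " ++ intercalate " | " fs
```

So `renderGram (lit "yes" <|> lit "no")` produces `root ::= "yes" | "no"`, which you could hand straight to llama.cpp.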
This repo uses two packages I just vibed (sse-conduit and llm-conduit), so don’t pay much attention to those.
When I next get a few cycles I thought I’d try out a few of the cases I aspirationally listed in the readme. I did run a basic test of “list conceptual entities in use in this code”, and it was producing accurate results.
i’ve been working on an llm library (for multiple providers and models, including llamacpp, but it should be easy enough to add anything else you might need; if you’re looking for something specific just let me know). if you’re interested in how this looks integrated, check out here
but the gist of it: it basically translates [Message provider] into a provider-specific request, and translates the response back into [Message provider], where Message is a GADT and provider can carry capability constraints (like: supports tools, reasoning, etc)
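A hypothetical sketch of that shape (all names here are invented for illustration, not the library’s actual API): capability classes gate which Message constructors are available for a given provider tag.

```haskell
{-# LANGUAGE GADTs #-}
-- Invented names for illustration: capability classes constrain
-- which constructors a provider's message list may contain.
class SupportsTools p
class SupportsReasoning p

-- a provider tag (empty data declaration)
data LlamaCpp
instance SupportsTools LlamaCpp

data Message p where
  UserMsg      :: String -> Message p
  AssistantMsg :: String -> Message p
  -- only constructible for providers that support tool calls
  ToolCall     :: SupportsTools p => String -> String -> Message p
  -- only constructible for providers that expose reasoning traces
  Reasoning    :: SupportsReasoning p => String -> Message p

-- translating [Message p] to a provider-specific request would
-- pattern match here; this stub just tags the role
roleOf :: Message p -> String
roleOf UserMsg{}      = "user"
roleOf AssistantMsg{} = "assistant"
roleOf ToolCall{}     = "tool"
roleOf Reasoning{}    = "reasoning"
```

The nice part of this encoding is that `Reasoning … :: Message LlamaCpp` simply doesn’t typecheck unless you add a `SupportsReasoning LlamaCpp` instance, so capability mismatches are compile errors rather than runtime surprises.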
as for local models, i’ve been quite surprised how capable glm-4.5-air actually is, as well as qwen3-coder, but the limitations are real, especially when it comes to producing haskell code