Local AI Server Guide

One thing I’d maybe suggest (and this is kinda just from messing around with this stuff myself) is looking at how some real tools are already doing this. Like, for example, Brave (https://brave.com/) and Sigma Browser (https://www.sigmabrowser.com/). They’re obviously not “local AI servers” or anything, but they’re kinda pushing in the same direction. More stuff happening locally, less data being shipped off somewhere, more user control by default (at least that’s the idea).

I’ve been playing with both a bit, and it’s interesting because you can kinda feel the difference? Like it’s not just about features, it’s more about how much you trust what’s going on under the hood… if that makes sense.

Might actually be worth trying them out and using them as examples in the guide. Just so it’s not all theoretical setups that break the second you update something (which… happens a lot, lol). Also yeah, I’d personally be really interested in the “how do you actually live with this” part. Not just install → done, but what happens a week later when something stops working and you have no idea why

Those are good examples. I’m not sure how Brave does it now but when I tried Leo I had to hook it into an inference engine using the OpenAI API standard. It worked fine with KoboldCpp but I didn’t get much use out of it. I tend not to use those kinds of features that are tacked on top of products that I use regularly.

Honestly what I use local AI for is

  1. I serve a UI on a web page that I can access from outside (anyone can, actually) and use that, or
  2. I write bespoke applications / scripts to do one specific task and hook it into my server via API calls

So, the reason I didn’t mention the application aspect of it is because it is just either too generic (web page with chat interface) or it is too personalized (a script that calls an image generation model to convert photos into layered vinyl transfers is current thing I am doing with it).

In this case I think your experience would be valuable, so I encourage you to write it up if you want to.

And yeah, I know that these kinds of guides go out of date by the time I have hit the ‘post’ button. I hope it deals with things in a way which works even if the specific hardware examples or models specifics aren’t useful. The way of approaching ‘how to find cheap hardware on ebay’ or ‘this is how you should think about what the model files are composed of’ are general enough to get a foot in the door and go from there.

As far as what happens when something stops working – I can’t really answer that because it is troubleshooting and how to find practical solutions in a large information space. I wish I knew how to teach that, but what usually happens is I get frustrated that others don’t think the way I do about the approach and so I think it is either kind of innate or more likely I just suck at teaching it. But the main this I would advise is go on github and search the issues and if you can’t find your problem there then post it. Actively maintained projects will have the dev(s) check in regularly.