Without AVX2 you will need to compile it. Don't worry, it's super easy.
git clone https://github.com/LostRuins/koboldcpp
cd koboldcpp
make -j 8 LLAMA_CUBLAS=1
Change the 8 to the number of CPU cores you have. It should only take a few minutes to compile.
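If you'd rather not hard-code the core count, you can let `nproc` supply it (this is a generic GNU coreutils trick, not anything koboldcpp-specific):

```shell
# Build with one job per core the machine reports
make -j"$(nproc)" LLAMA_CUBLAS=1
```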
The reason you need to do this is that with your CPU limitations, it is going to be difficult to get a precompiled engine that will be optimal for you. This isn’t specific to any particular engine; I would advise doing it regardless, and it is pretty painless to do with kobold because they use make.
By compiling it on your machine, the build will automatically detect the instruction sets and optimizations your CPU supports and compile them into the binary libraries.
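If you're curious which SIMD instruction sets your CPU actually exposes, you can check the kernel's flags before building (a generic Linux check; the flag names listed here are just the common ones a build like this cares about):

```shell
# List the SIMD-related CPU feature flags the kernel reports (Linux only)
grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' \
  | grep -E '^(avx|avx2|fma|f16c|sse4_2)$' \
  || echo "none of the common SIMD flags found"
```

On a CPU without AVX2 you'd see the older flags listed but not `avx2`, which is exactly the situation where a locally compiled binary beats a generic precompiled one.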
There is no reason to run it in Docker on a local server because it doesn't install anything, and it doesn't require an environment or any specific runtimes beyond what you would need to run your system anyway.
If you run into issues with the CUDA libraries and compiling, go through the steps in the guide in the last post to install the CUDA toolkit.
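A quick way to check whether the CUDA toolkit is actually visible to the build (a generic sanity check, not a step from that guide):

```shell
# Check for the CUDA compiler on PATH; the LLAMA_CUBLAS=1 build needs it
if command -v nvcc >/dev/null 2>&1; then
    nvcc --version | tail -n 2
else
    echo "nvcc not found - install the CUDA toolkit and re-run make"
fi
```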
I have some DepthAI cameras from a previous project which have a TPU built in, similar to Coral's. They do a decent job of image detection but are kind of a pain in the ass to program.
Ahh... people, mostly. The GPUs then do face rec on top of that, but people detection for the timeline is what the TPUs handle.
I'll probably try to make a kind of "presence" filter for Home Assistant operations based on people detection, at least for indoor cameras. Outdoor cameras seem to be too unpredictable/chaotic to get it working reliably. Spiders crawling across the camera FOV trigger people events, etc.
I recommend llama-server, from llama.cpp. You can find it on GitHub. Koboldcpp has some real quality-of-life additions, but that's personal preference. Most people are happy with llama.cpp. It has a built-in web GUI, but if you want to do things like MCP I recommend witsy as a client.
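If you go the llama-server route, launching it looks roughly like this (the model path and port here are placeholders, and flags can shift between versions, so check `--help` on your build):

```shell
# Hypothetical invocation; substitute your own .gguf model file
./llama-server -m ./models/your-model.gguf --port 8080 -c 4096
# then open http://localhost:8080 for the built-in web UI
```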
Ugh, this is the absolute worst time to have to RMA my Gigabyte RTX 4090. I sent it in a couple of weeks ago, but I’m not holding out much hope. Even if they do actually agree there is an issue I doubt they have any stock to replace it, even refurbished.
My guess is they plug it in, power it up, run it for 30 seconds, then return it to me as “working”.
Lol, I insured it for $3K when I sent it in. I was half-hoping UPS lost it. Crazy that a used two-year-old card is selling for double what I paid new.
I heard back from Gigabyte today. They hooked it up and ran 3DMark for an hour with no crashes and want to send it back to me. The problem is that the card only crashes when playing games. Multiple games crash with a graphics card fault within ~10-20 minutes of playing. It happens in multiple systems, with multiple sets of drivers, on Win 10 and 11, and the crashes immediately resolve when I swap the card out for an RTX 5070 with no other system or software changes.
I was able to convince the CSR on the phone to add some notes to try and get their tech support department to actually test the card in some modern games. The rep told me that the repair/RMA department techs aren't allowed/able to install games on their rigs... rigs built to test high-end cards that are 90% used for gaming.
My hunch is that it's a memory or memory controller issue coupled with Nvidia-specific features like DLSS. I was never able to find a benchmark that could stress the card in the same way that a modern game will, especially at high resolutions (I have a 5120 x 2160 monitor). Even FurMark's stress test ran stable.
It's also not an overheating issue (unless the card's temperature sensors are faulty). Power use, fan speeds, and temperatures all had normal-looking curves, and max reported temps were in the low 70s (°C).
A 1200W Be Quiet Pure Power 12. I also tested with a friend's 1000W power supply (I don't remember the brand, but it was powering a 5080 without issue) and the crashes persisted.