Home Assistant with local Ollama AI

So I’ve been working with various home automation solutions to make my home run more efficiently and control things in intelligent ways. I run both Google Home and Amazon Alexa systems, and I find I need to put both in each room because one is good at certain things and the other is good at other things.

For example, Google Home allows you to create rooms and assign devices to them; Amazon Alexa does not. That means I can assign a Google Home device to a room along with the lights and other devices in it, and then address those devices in a much simpler way. I can say “hey Google, turn off the lights” and it will turn off any light in the same room as the Home device I’m speaking to. Alexa basically shits the bed if you try to do that with an Echo device.

Alexa Echo hardware is just better at hearing you speak; the microphones and voice processing are simply better. Alexa will also whisper back at you if you whisper to it, which is kind of a critical feature if you’re trying to get Alexa to do something in the middle of the night and don’t want “OK I HAVE DONE THE THING YOU ASKED FOR. BY THE WAY YOU CAN ORDER MORE DISHSOAP OR SCHEDULE YOUR PET’S EUTHANASIA BY SIMPLY SAYING ADD DISHSOAP TO MY SHOPPING CART OR KILL MY PET PLEASE.” bellowing out at volume 8 and waking up the whole house.

That said, Home Assistant has come out with its own voice hardware:

This is a preview edition and I bought 5. It has a speaker output, which I highly recommend using with a proper speaker unless you don’t mind a tinny, barely audible built-in speaker. Audio limitations aside, it’s a self-contained hardware device that puts me on the path to going totally local for home automation. I will say this though… without a local AI conversational agent, the system is very, very dumb and borderline useless.

Setting up Ollama itself in a Docker container that used my two GPUs was not super easy, but it was up and running in a few hours. Adding the Ollama integration to Home Assistant was dead simple. Getting wyoming-whisper set up as a separate service in a Docker container so it could also use my GPUs was easy as well. What has not been easy is figuring out how to improve the text-to-speech engine’s speed; it’s currently the main bottleneck in how quickly replies come back.
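For reference, here’s roughly what that container setup looks like. This is a minimal sketch that assumes the NVIDIA Container Toolkit is already installed; image names, ports, and volumes follow the stock Ollama and rhasspy/wyoming-whisper docs, but double-check them (and the GPU story for whisper) before copying.

```
# Ollama with access to both GPUs, serving its API on the default port 11434
docker run -d --gpus=all \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama ollama/ollama

# Wyoming Whisper (speech-to-text) on the default Wyoming port 10300.
# Note: the stock image decodes on CPU; GPU decoding needs a CUDA-capable
# build of the image, so check the wyoming-whisper docs for that variant.
docker run -d --gpus=all \
  -v whisper-data:/data \
  -p 10300:10300 \
  --name whisper rhasspy/wyoming-whisper \
  --model small-int8 --language en
```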

What I find super interesting about having an AI connected to a voice assistant is how you can change its behavior simply by telling it to, in plain language. That’s how ChatGPT works, yes, but after years of diddling with config files, hunting for ways to change code to use updated libraries, and updating packages just to make things work, being able to simply tell the AI to change how it outputs data, and having it do it, was a breath of fresh air. Case in point: the time.

When I first connected everything together, my first voice query was simple: what time is it? The output (after several seconds of waiting) was “The current time is thirteen thirty three.” This is obviously because the conversation agent output “The current time is 13:33.” and the text-to-speech engine is not intelligent; it just converts text to speech verbatim. So, under the Home Assistant voice agent instructions, there are some basic AI agent staging instructions:

You are a voice assistant for Home Assistant.
Answer questions about the world truthfully.
Answer in plain text. Keep it simple and to the point.

I added one line to it:

Your text output is read by a text to speech processor which is not intelligent, so you will need to change numerical data like times and dates from numbers to spelled out words. For example, a current time of 13:33 should be changed to “one thirty three pm”.

After hitting save, I asked the voice assistant what time it was, and it replied “The current time is one thirty five pm”.
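The same effect is easy to reproduce against the Ollama API directly, outside of Home Assistant. The model name and the hard-coded time below are just examples (in Home Assistant the current time comes from the assistant’s own context, not the prompt); the point is that the system message alone changes how the number is rendered before it ever reaches the TTS engine.

```
# Ask Ollama's chat API directly, with the formatting rule as a system message
curl -s http://localhost:11434/api/chat -d '{
  "model": "llama3.1:8b",
  "stream": false,
  "messages": [
    {"role": "system", "content": "Your text output is read by a text to speech processor which is not intelligent, so spell out numerical data like times and dates. For example, 13:33 should become \"one thirty three pm\"."},
    {"role": "user", "content": "The current time is 13:35. What time is it?"}
  ]
}'
```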

This is like all the 1980s and 1990s computer / AI interactions in a nutshell… the computer has a voice interface with pretty good NLP but it’s still a computer and needs instructions to clarify and tweak responses. I’m pretty sure I could have just instructed the model to output data properly through the voice assistant, but the context window would expire and it would not last. Still, this is pretty amazing to me, even after having used chatGPT for so long now… somehow running it locally and basically just talking to it to change behavior … that’s amazing.

4 Likes

I have an Amazon Echo in every room.

You can set up groups, so for example, all the lights in my loungeroom can be controlled by the word “loungeroom”. The individual lights I have numbered (closest to the front of the house is lowest), so “loungeroom 1”, etc.

I then have another group for downstairs lights (which groups all lights downstairs obviously), another for upstairs, another for outside, another for all of them collectively.

I then do the same with ceiling fans, and other appliances.

I find it works much better than the Google room model because, as you point out, the Echos will hear you better.

It takes a bit of setting up but is pretty easy.

Then you have routines, so saying “good night” turns off all the lights, turns on the bedroom lights (which dim out over 10 minutes), turns off all fans and aircons, turns on the bedroom fan, locks the deadbolts, and so forth.

Ah yeah, I did have groups at one time, but setting them up was kind of a pain (for me), and then I went and swapped out a ton of stuff for better / smarter / more reliable / locally controllable sans-cloud hardware and all that config went bye-bye… never set it up again.

Oh yeah! The routines are actually a huge help… very configurable and you can cascade events very easily. My number one routine use case is voicing events… so I have a door open sensor on the back gate, and if anyone opens it then 5 different Echo devices all say “The back gate is open” in unison. That feature is very handy indeed.

As I pare down I will consider bothering with groups… but something still doesn’t sit right with having to reference the group… like having to say “lights off in master bedroom” seems silly when you’re sitting there in the master bedroom. It’s like saying “I’m going to brush my teeth by putting my toothbrush in my own mouth now” instead of just doing it :slight_smile:

I really am just hoping I can eventually move purely to Home Assistant with local AI… the AI is so much smarter and more helpful… and I’m hoping to figure out a way to add web search and personal RAG data integration to make it even smarter.

1 Like

Dude, I have one of these sitting here waiting for me to do exactly what you did.
Oddly, I’m in the middle of taking an AI course on Udemy and we are setting up and using all kinds of LLMs both local and API based.

Gonna read through this after I’m done painting trim.

1 Like

That looks awesome! The case is 3D printed? It says waitlist… I guess I should join!

1 Like

You have NO IDEA how badly I want to do this. I doubt my skills, and I’d want a decent server to run it on, but I want it SO BADLY, I might just find a way.

Even a gaming laptop with a so-so GPU could totally do it.

1 Like

Yeah, I’m the king of saving old junk computers. But I’ve also been looking for an excuse to screw around with a server.

I totally don’t need it, but… I want one.

1 Like

If you’re in the market for a server, I personally would go for a wall-mount 2U rack and a 1U or 2U server for it, instead of a typical vertical desktop box or something. The mount is cheap as hell, like $30.

https://www.amazon.com/s?k=2u+wall+mount+rack

Check the derailment thread. I’ve got 20-ish minutes left on an auction. Need advice.

So may I propose an alternate solution?

The new M4 Mac mini is an absolute beast and handles LLMs decently well.

The base model is absolutely fine and there are third party options for upgrading the hard drive thankfully.

I suggest looking into it because the price point is pretty silly given the value.

2 Likes

I’m firmly in the absorbing-info-and-planning stage, i.e. the “cheap to change my mind now” stage.

With regard to this:

I’m looking at P40 GPUs. Pros: double the VRAM, about the same cost, physically designed to fit the R720. Cons: slightly older tech, doesn’t really do FP16, which may cause limitations.

Basically it looks like the 3060 has a slight edge in speed, and the P40 has an edge in the size of model it can hold.

Currently leaning toward P40. Still reading and learning though.

Thoughts?

I skipped the P40 because of the lower CUDA core count and the older architecture. If you look at the NVIDIA driver set, the CUDA toolchain, and things like PyTorch support, they move on from older architectures pretty quickly. With the rate of advancement in this area, I’d expect the older architectures won’t stay supported for much longer.

I know close to fuckall about ML performance besides that it likes VRAM. Would Quadro RTX 4000s be of any use? I’d imagine the 8GB is the limiting factor there. But I do have two kicking around somewhere… and I think a 3060 as well, if not.

2 Likes

Yup, same here. I’m on a (google / read / question) loop until I can get enough footing to sound like I know what I’m talking about.

I do want a bunch of VRAM though, so the Quadro RTX 4000s would probably be less useful. I might end up with a 3060, but I don’t have a firm choice yet; still learning by leaps and bounds. I hadn’t even truly considered doing this last week.

That is a solid point, which makes me wonder how long the R720 is good for, or even the 3060. Basically everything ages out in tech, but how long until it’s functionally useless? Still, it’s probably better not to start too close to the back side of the envelope.

Gonna think some more, AND simultaneously try to avoid paralysis by analysis.

2 Likes

So basically it’s pretty difficult to get a grasp on everything, but there are CUDA cores, tensor cores, and VRAM, and they are each important for different reasons.

CUDA cores: (embedded video)

Tensor cores: (embedded video)

VRAM is where your model has to get loaded so your GPU can crunch the shit you throw at it. Larger models with more parameters need more VRAM. Effectively, if you want to load a model that is larger than the VRAM available, you can forget it, because performance tanks so hard it’s basically useless. I’m running two 12GB cards, and some systems like PyTorch can split a model and load it into both, but that’s pretty specialized. Where I find value in having two cards is that I can have Open WebUI using one model that takes up 10GB of VRAM while I tinker with the Home Assistant integration using a different model that loads 8GB at the same time.
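One way to make that split explicit (a sketch, not necessarily how my setup is wired): run two Ollama containers and pin each one to a single GPU with Docker’s device selection. Container names, volumes, and the second host port here are made up for illustration.

```
# GPU 0: serves the model Open WebUI talks to
docker run -d --gpus device=0 \
  -v ollama-gpu0:/root/.ollama \
  -p 11434:11434 \
  --name ollama-webui ollama/ollama

# GPU 1: serves the model Home Assistant's Ollama integration uses,
# exposed on a second host port so the two APIs don't collide
docker run -d --gpus device=1 \
  -v ollama-gpu1:/root/.ollama \
  -p 11435:11434 \
  --name ollama-ha ollama/ollama
```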

If you watch the videos above you will start to see why it is very difficult to assess LLM performance based only on things like CUDA core count… the different architectures drastically change what a CUDA core means.

The best way to figure out performance is to check out NVIDIA or any number of other sites that now directly compare GPUs against each other in terms of TPS (tokens per second). Or you can ask ChatGPT to compare the cards you’re interested in :slight_smile:
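If you already have a card on hand, Ollama can also give you a rough local number: running a prompt with --verbose prints timing stats after the response, including the eval rate in tokens per second. The model name here is just an example.

```
# --verbose makes ollama print load time, prompt eval rate, and eval rate
# (tokens per second) once the response finishes
ollama run llama3.1:8b --verbose "Explain in two sentences why VRAM matters for LLM inference."
```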

1 Like

The M4 was built with LLMs in mind. You can actually daisy-chain them with Thunderbolt cables to group the cores. I’ll have to find the video.

2 Likes

Yeah, the Thunderbolt cables are awesome! But your underlying LLM / AI software still needs to support it, just FYI.

2 Likes

Is there a way to run the Home Assistant / AI combo entirely locally?
i.e., I don’t want it accessing the web and, even more importantly, I don’t want it to be accessible from the web…

Also, TrueNAS Scale has an app for both Home Assistant and Ollama; has anyone tried running them from there?

3 Likes