Home assistant with local ollama AI

Absolutely dumb-o question. I’m getting ready to start setting up HA and realized I should really have at least one thing to control. Looking at Sengled Zigbee light bulbs / hub.

Do the RGB have any actual use beyond playing with colors till the “new shiny toy” vibe wears off? Or should I just get white?

p.s. There’s more out on the net about this than I can digest, but it all boils down to pitching this or that brand bulb. You know, “Best Bulbs of 2025” or “Top 10 Easiest to Install RGB Bulbs”. Nothing on whether the concept itself is useful.

I have a bunch of Sengled RGB bulbs. They are reliable, have good colors, and are affordable.

I use them every night, I’ll put them into a dimmer mood light setup while I watch TV right before bed. Much more pleasant than a warm white IMO. I definitely recommend them.

Note that they, like all other bulbs in their class, are not work lights. If you have a hobby space you’ll still want dedicated bulbs for nice bright light.

Also, unlike other bulbs, the Sengled bulbs don’t act as repeaters. Repeaters are essential in a good/reliable Zigbee network. Don’t think of them as extenders; the repeaters work together to form a mesh.

If you go this route, I recommend picking up some ZigBee outlet relays. These generally act as repeaters and will help your network in addition to being useful for controlling other electronics. Some have energy monitoring capabilities.

I also recommend skipping the Sengled hub and getting a USB Zigbee 3.0 adapter. Home Assistant has great integrations with them and it will be a better experience. Also be aware that if you get the USB dongle, you should also get a USB extension cable to move the dongle a few feet from your computer; they don’t like being close to WiFi sources.

Fuck me with the forgetting to reply.

I was shocked at how many devices and things HA found on its own right after booting up… though I do have a bunch of dumb shit like my Samsung clothes washing machine telling me when it’s done with a pop-up on my Samsung TV… that kind of thing… so I already had some smart home stuff set up… it was just a surprise it found a lot of these things… it even found my Synology NAS devices, and once I gave it my admin pw (HA is local of course) I had all kinds of metrics and data from it at my HA disposal.

I’d say it’s likely most people have one or two smart appliances / gadgets even if they don’t know it. I, however, have actively avoided them. I got nothin. Washer/Dryer are old school driven by a mechanical timer switch. Thermostats are bimetal on/off. I even went out of my way to find a “dumb” tv.

The only things I have that are connectable are my PC, phone (which I won’t even let update), server (destined to be offline), and 1 Android tablet that stays in its box and only gets used at college when I need to enroll each semester, and I’ve already enrolled in my last semester, so it’ll be gone soon.

The beauty is, now that I have the ability to keep it all local, I can build the system from nothing, in a deliberate planned method.


Correct me if I’m wrong here, but I think part of the problem with RAGs at the moment is that they try to make information instantly accessible to the LLM by creating a kind of mini-model matrix of the content and then “searching” through that matrix to find the answer. Other problems exist as well, such as document formatting issues, OCR issues for PDFs, etc., but all these problems seem to come to a head because the RAG is trying to matrix the data for “instant recall”.

Because most LLMs are powerful enough to do NLP pretty damn well, what I would love is a “job queue” type of approach where the LLM manually reviews documents instead, sort of like how it works through web search results. This way the LLM can evaluate the content of the document like it’s making a summary, then decide if it fits the query criteria and, if not, discard it. If so, include it in the list of possible hits, then carefully discard it from the working context window so it can work the next document (or email or whatever). Work more like Commander Data searching quickly through data:

data looking through data

Put the task in a queue for me to check on later. I don’t mind if it’s not generating instant responses to my query… I care about accuracy and thoroughness. Every single RAG I set up was meant to look through my accounting data… PDFs and emails… and I had a single simple PDF from Home Depot with a specific dollar amount on it… and every single model I used, with every single variation of RAG, when asked “Do you have any receipts from Home Depot for $248.41?”, came back with 5 potential hits, none of them the correct one. I tried other targets too, from emails that were just plain text exports from my IMAP mailbox… still could not find jack shit. It disappointed me so hard I almost shut down my LLM server completely just so I didn’t have to feed power to a useless hunk of digital garbage. Maybe the problem was me though… actually it’s very likely it was me… I don’t know what I’m doing… but still it was disappointing.

Is there any chance an LLM could be set up to chunk through specific documents and text data, every single time I ask a query, to find relevant targets with high accuracy, without needing to try to navigate the nuanced annoying process of setting up a RAG?

What RAG is doing is chunking the document, creating embeddings (what I think you mean by mini-model matrix), and putting them into a vector database. It then creates embeddings for your query and performs a similarity search; if a chunk passes a certain threshold, it adds that chunk and possibly the surrounding chunks to the context.
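Under the hood, the retrieval step boils down to something like this (a minimal sketch, not your actual pipeline: the embedding model name, the sample chunks, and the threshold value are all assumptions, and it assumes Ollama is running locally on its default port with an embedding model pulled):

```python
# Sketch of what a RAG retriever does: embed chunks, embed the query, compare.
# Assumes a local Ollama instance with an embedding model pulled, e.g.:
#   ollama pull nomic-embed-text
import math
import requests

OLLAMA_EMBED = "http://localhost:11434/api/embeddings"  # default Ollama port
EMBED_MODEL = "nomic-embed-text"                        # assumption: any embedding model works

def embed(text: str) -> list[float]:
    """Ask Ollama for an embedding vector of the given text."""
    resp = requests.post(OLLAMA_EMBED, json={"model": EMBED_MODEL, "prompt": text})
    resp.raise_for_status()
    return resp.json()["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical pre-chunked documents (a real setup keeps these in a vector DB).
chunks = [
    "Receipt: Home Depot, total $248.41, paid by Visa",
    "Email from Amazon: your order has shipped",
]
chunk_vectors = [(c, embed(c)) for c in chunks]

query = "Do you have any receipts from Home Depot for $248.41?"
query_vector = embed(query)

# Rank chunks by similarity and keep the ones above a threshold.
THRESHOLD = 0.5  # assumption: tuning this is exactly the fiddly part you're describing
hits = sorted(((cosine(query_vector, v), c) for c, v in chunk_vectors), reverse=True)
for score, chunk in hits:
    if score >= THRESHOLD:
        print(f"{score:.3f}  {chunk}")
```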

I think the problem you are having is caused by the nature of embeddings themselves. They are meant to create a number representation of an idea or a concept, not a specific thing. When looking for ‘When was Home Depot founded’ you would find chunks about that concept, return them, and find the date in one of them. I have no idea how a receipt is turned into an embedding, since it is a list of separate numbers with the name of the store, the clerk, etc. along with them, but I imagine it isn’t going to match well with your question because there is no real concept for a receipt with a specific amount of money.

Maybe a hybrid of embeddings, keywords, and your idea would work, where you do a broad match for both embedding similarity and plain text words and then feed a large portion of the matching text to the LLM along with the ‘is it relevant’ instruction and see what you get from that. What do you think?

Sorry, I edited out a portion of my reply before submitting and missed that it was actually required to make the last paragraph relevant. The gist of it was that your idea is basically submitting text and the input with the instruction ‘determine if this text is relevant to this question’.
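In code, that instruction amounts to roughly this (a sketch, assuming a local Ollama instance on its default port; the model tag is a placeholder for whatever 8B model you have pulled):

```python
# Sketch of the 'determine if this text is relevant to this question' check.
import requests

OLLAMA_CHAT = "http://localhost:11434/api/chat"
MODEL = "llama3.1:8b"  # assumption: any instruct model that fits in VRAM

def is_relevant(document_text: str, question: str) -> bool:
    """Ask the model to judge relevance; expect a bare YES or NO back."""
    prompt = (
        "Determine if the following text is relevant to the question.\n"
        f"Question: {question}\n"
        f"Text:\n{document_text}\n"
        "Answer with exactly one word: YES or NO."
    )
    resp = requests.post(OLLAMA_CHAT, json={
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    })
    resp.raise_for_status()
    answer = resp.json()["message"]["content"].strip().upper()
    return answer.startswith("YES")

# Example: the Home Depot receipt test from earlier in the thread.
print(is_relevant("Home Depot receipt, total $248.41",
                  "Do you have any receipts from Home Depot for $248.41?"))
```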

Oh yeah vector… matrix… you understood what I was getting at. Still, that totally sucks. It puts the job of understanding context into the chunker and rag processor. Fiddling with chunking size and all that crap for different types of data seems like a nightmare for accuracy and false negatives.

Would it be so hard to have the LLM literally read through a document in its entirety and determine if the content was relevant to my query in real time? I’m not interested in making the vector better, I want to be able to leverage large context windows in order to empower the LLM to determine context and content directly… even if it takes a much longer amount of time to do this. Am I correct that this would result in much higher accuracy in finding the information I’m looking for in the weeds of my file repository?

What kind of hardware are you working with?

I have an R720 server with 2 multi-core CPUs, 280GB RAM, and two 3060 12GB GPUs. Ubuntu OS.

This is kind of what I mean… the way “research” works on OpenAI … it sucks in information on demand… interprets it… moves forward through it…

No RAG… no vectorizing… no vector matching… raw NLP on raw data… and yes it is slow AF… but it is much much more accurate. I want this for my local document / data stores… start up some threads… get them working… queue jobs if you need to… spit out what I am looking for.

With 2x3060s you will have 24GB of VRAM, which can fit a decent 8B model and about 32-64K of context. That would be like 1 or 2 chapters of a book per inference run. I don’t know how many documents you have, but if it sounds reasonable to go through all of them in this manner for every search, I’m sure that is doable.
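For rough sizing, a token is on the order of 4 characters of English text, so a 32K-token window is very roughly 120K characters. A small sketch of splitting a document to fit (the constants below are assumptions for sizing, not measurements):

```python
# Rough sizing helper: split a document into pieces that fit one context window each.
# Uses the ~4 characters per token rule of thumb; real tokenizers vary.

CONTEXT_TOKENS = 32_000      # assumption: what fits alongside an 8B model in 24GB
RESERVED_TOKENS = 2_000      # leave room for the question and the model's answer
CHARS_PER_TOKEN = 4          # crude heuristic

def split_for_context(text: str) -> list[str]:
    """Split text into chunks that should each fit one inference run."""
    max_chars = (CONTEXT_TOKENS - RESERVED_TOKENS) * CHARS_PER_TOKEN
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

# Example: a 300K-character document would need about 3 passes.
doc = "x" * 300_000
print(len(split_for_context(doc)), "inference runs for this document")
```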

I guess the question is how… how would one set up a system to do this? Any expertise or advice rendered would be greatly appreciated 🙂

Just another reason to go local. Looks like I’ll be replacing my thermostat with something controllable through Matter or Zigbee.

That depends on whether you want to do this through the UI layer or the inference layer. In the UI layer you could add a keyword or something that if detected would call a function to go through your documents and send them to the model with your query. If you went for the inference layer you would use tool calling to have the model call a function that would do the same thing. To make it more efficient I would do an initial BM25 search to narrow the documents down to only a few. Are you interested in doing this? It would require some coding.
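Something like this is the shape of it (an untested sketch: it assumes the rank_bm25 package, a local Ollama instance on its default port, a placeholder model tag, and a hypothetical folder of plain-text exports; real PDFs and emails would need a text-extraction step first):

```python
# Sketch: BM25 narrows the pile, then the model reads each candidate in full.
# Assumes: pip install rank_bm25 requests, a local Ollama instance,
# and plain-text files in a hypothetical folder (PDF/email extraction not shown).
from pathlib import Path

import requests
from rank_bm25 import BM25Okapi

DOCS_DIR = Path.home() / "accounting"          # assumption: plain-text exports live here
OLLAMA_CHAT = "http://localhost:11434/api/chat"
MODEL = "llama3.1:8b"                          # assumption: placeholder model tag
TOP_N = 10                                     # how many BM25 candidates the model reads

def ask_model(prompt: str) -> str:
    resp = requests.post(OLLAMA_CHAT, json={
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    })
    resp.raise_for_status()
    return resp.json()["message"]["content"]

def search(question: str) -> list[tuple[Path, str]]:
    """Cheap keyword pass first, then the model reads each candidate and judges relevance."""
    paths = sorted(DOCS_DIR.glob("**/*.txt"))
    texts = [p.read_text(errors="ignore") for p in paths]

    # 1. BM25 keyword narrowing so the model only has to read a handful of documents.
    bm25 = BM25Okapi([t.lower().split() for t in texts])
    scores = bm25.get_scores(question.lower().split())
    ranked = sorted(zip(scores, paths, texts), key=lambda x: x[0], reverse=True)[:TOP_N]

    # 2. Slow, thorough pass: the model reads each candidate in full.
    hits = []
    for _, path, text in ranked:
        verdict = ask_model(
            "Determine if this document is relevant to the question.\n"
            f"Question: {question}\n"
            f"Document:\n{text}\n"
            "Answer YES or NO, then one sentence explaining why."
        )
        if verdict.strip().upper().startswith("YES"):
            hits.append((path, verdict))    # keep the path as a citation
    return hits

if __name__ == "__main__":
    for path, why in search("Do you have any receipts from Home Depot for $248.41?"):
        print(path, "->", why)
```

The UI-layer and inference-layer versions would both call something shaped like `search()`; the only difference is whether a keyword in the chat triggers it or the model invokes it as a tool.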

Yes, I would love to walk through this. I’m not familiar with Python but have coding experience… ooooolllldddd stuff… desktop application development for Windows… PHP… ASP… Basic… VB… VB.NET… but for a worthy cause I’m ready to take a crack at Python.


Which engine and frontend are you running at the moment? Ollama and Openwebui? I assume it is in a docker? How do you connect to it, locally and/or remote via reverse proxy? Does anything depend on this server right now? Anything else important to know? We can also do this over email if you prefer.

I’m open to blank slating this from scratch. I’d prefer to do it here so others may follow along and benefit. I am using Docker, as the host is running other dockerised services as well as a couple of guest OSes on VirtualBox, including a Home Assistant OS machine. The host also has two Coral TPU modules installed (PCIe modules).

Please. I know I’ll learn something, even if it is just fancy new words that I don’t (yet) understand.

To be specific:

Yes, the Openwebui docker, which has its own Ollama engine bundled in a multi-service docker… but I am willing to punt it and start over with anything.

Locally only. For remote access I’m running pfSense with certificate based OpenVPN configured… OpenVPN clients on phones, Viscosity clients on laptops.

Yes, this host is running other services via Docker and VirtualBox guest OS machines, including Home Assistant OS. I am experimenting with Frigate as well, and have two Coral TPU mini PCIe modules installed via PCIe slot converters.

Goals

  • Access an LLM service over a web ui of some sort in order to allow myself and employees (via separate logins with access permissions for various data stores) to access accounting documents to find receipts and other relevant information… more like an advanced search system than anything.

  • The ability to assist with accounting mysteries would be a huge plus. For example, Amazon often groups purchases together when charging credit cards, making it very hard to match up order totals with card charges. Having even a little bit of investigative fervor and being able to see a charge for $10 with two Amazon orders placed around the same time for $8 and $2 and making that connection might be helpful (a brute-force matching sketch follows this list)… but at this point I would settle for a simple local search engine that can actually pull straightforward data to the foreground.

  • The system presents relevant data (citations) with links or paths to the relevant documents (like openwebui RAGs can do) so myself or the employee can simply click the document link to review and confirm.
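For the charge-grouping example above, the matching step itself can be brute-forced before any LLM gets involved (a sketch with made-up amounts; in practice the numbers would come from the parsed receipts and emails):

```python
# Sketch: match a card charge against combinations of order totals.
from itertools import combinations

def find_matching_orders(charge: float, order_totals: list[float],
                         max_group: int = 4) -> list[tuple[float, ...]]:
    """Return every combination of up to max_group orders that sums to the charge."""
    matches = []
    for size in range(1, max_group + 1):
        for combo in combinations(order_totals, size):
            if abs(sum(combo) - charge) < 0.005:   # tolerate float rounding
                matches.append(combo)
    return matches

# Example from the goal above: a $10 charge covering an $8 and a $2 order.
print(find_matching_orders(10.00, [8.00, 2.00, 5.49, 12.99]))
# -> [(8.0, 2.0)]
```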

Stretch goals

  • Be able to answer questions about connected comms logs (emails, chats, etc.) such as “Who did I talk to about XYZ recently, like in the last 18 months?” or “How many people contacted me about ABC?”

  • Manage “long term” memory conversationally like “Remember I put my winter gloves in box 5” or “My name is Amal Graafstra, you better remember that or I’ll jam a paperclip into the cooling fan of one of your 3060s”… stuff like that.

I realize there might be better approaches for solving certain things like email analysis vs straight up accounting data digging… but ultimately the golden rule for me here is that in any case, speed is far far less of a concern than accuracy. I would rather a job take 4 hours to complete and have all the correct relevant data presented than it take 30 seconds but miss important data or produce gibberish. If it turns out a system can be built that is accurate and does what I want, I’ll sink cash into buying a GPU rack and rebuild it with screamin hardware.