What you're proposing is essentially a combination of deciding how to package and cache the data, plus some prompt engineering. I admit I don't really know what you mean by 'NLP' in that context; I'm self-taught and sometimes don't know the proper terms for things.
If this is something you want to pursue, the code I wrote so far should be a good foundation. If you want, you can use an LLM to break down what it does and help with the unfamiliar syntax, but with your background I don’t see it being a huge challenge for you.
This is where my ignorance really shines, but what I mean is: the tool that's looking for "Home Depot receipts that amount to $300" might look at the first chunk and see that it has something to do with Home Depot, maybe even that it looks like it might be a receipt. So when it's done processing that chunk, it makes a note off to the side like "this chunk appears somewhat relevant to the query because it looks like it may be the start of a Home Depot receipt." For chunk two, the note generated might say "I didn't find any reference to Home Depot, but I did find an amount of $300, which is exactly what the query is looking for." Now you have two notes about the chunks and their relevance to the original query, and the tool can evaluate those notes to decide whether this particular document fits the query. I think a tool evaluating two notes about a document that say "this is a Home Depot receipt" and "I found an amount of $300" would clearly flag that document as extremely relevant.
Again, I'm ignorant about how this stuff works, but to me that seems like a really good approach for documents that are so large they need to be chunked.
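To make the idea above concrete, here is a minimal sketch of the two-step "note per chunk, then judge the notes" flow. All function names and prompt wording here are my own illustration, not the actual tool's code; `llm` stands in for whatever LLM API call the tool would make.

```python
# Step 1: ask the model for a short relevance note about each chunk.
# Step 2: judge the whole document from the notes, not the raw chunks.
# `llm` is a hypothetical callable: prompt string in, response string out.

def note_for_chunk(llm, query: str, chunk: str) -> str:
    """Generate a one- or two-sentence relevance note for a single chunk."""
    return llm(
        f"Query: {query}\n\nChunk:\n{chunk}\n\n"
        "In a sentence or two, note anything in this chunk "
        "that seems relevant to the query."
    )

def evaluate_document(llm, query: str, chunks: list[str]) -> str:
    """Decide relevance for the whole document using only the per-chunk notes."""
    notes = [note_for_chunk(llm, query, c) for c in chunks]
    summary = "\n".join(f"Note {i}: {n}" for i, n in enumerate(notes))
    return llm(
        f"Query: {query}\n\nNotes about each chunk:\n{summary}\n\n"
        "Based only on these notes, is this document relevant? Answer yes or no."
    )
```

The nice property is that the final judgment call only ever sees the short notes, so a document of any length fits in one evaluation prompt.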
If you can't read JSON: it's just a text format for passing structured data between APIs and services. Each chunk index is a separate page in this case, because the tools had to convert the pages to images in order for the LLM to read them (it's easier/faster/arguably better to send an image of the text to a modern vision-capable LLM than to run OCR on it and pass the extracted text along). The 'score' is assigned by the tool after evaluating each document/image and is a relevance score between 0 and 10. If the score is below 5, the document is marked as irrelevant and is not included in the data object, as seen in the link.
Sorry, I just realized that when I say 'document' I actually mean 'chunk of document', because the tool treats each chunk as its own data object, just like a document; so if any chunk in a document is deemed relevant for any reason, the relevant information gets passed along. As you can see in the data linked in my previous comment, it tends to cast a decently wide net when looking for relevant data. Sorry for the confusion; having worked on the thing, I tend to forget that other people don't know what I know and just throw around a bunch of assumed knowledge.
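Here's a small sketch of the scoring/filtering behavior described above. The field names (`chunks`, `chunk_index`, `score`, `note`) and the sample values are my guesses at the shape of the linked data, not a copy of it; the only detail taken from the description is the below-5 cutoff.

```python
import json

# Hypothetical example of what the tool's output JSON might look like.
raw = json.dumps({
    "chunks": [
        {"chunk_index": 0, "score": 8, "note": "looks like a Home Depot receipt"},
        {"chunk_index": 1, "score": 3, "note": "no relevant content found"},
        {"chunk_index": 2, "score": 6, "note": "total of $300 mentioned"},
    ]
})

THRESHOLD = 5  # per the description: scores below 5 are marked irrelevant

data = json.loads(raw)
relevant = [c for c in data["chunks"] if c["score"] >= THRESHOLD]
# Chunks 0 and 2 survive the filter; chunk 1 (score 3) is dropped,
# but the document as a whole still passes because at least one chunk did.
```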
Since discovering Frigate's face recognition and semantic search features (I tried Frigate a while ago but never got it working), Immich's image search, and a couple of other Home Assistant image recognition integrations, I decided to build out a "new" dedicated AI server to handle some of the workload.
Compute is on par with the RTX 3060s in my other server, but at half the wattage and with more VRAM per GPU. With three cards I get 48GB of VRAM total. Also, the T4 cards are Dell-compatible, meaning they work directly with the iDRAC for reporting, IPMI events, thermal fan control, etc.
Total cost (so far) has been around $3500, but that's because both RAM and GPUs are absolutely disgustingly overpriced right now. I probably paid at least $1000 over what the market rate was just a couple of years ago.
Now I'll be able to start testing and exploring more of the LLM integrations I've wanted to try, like RAG, document deep dives, etc., without fretting over VRAM being taken away from the "real" workloads that run on a daily basis.