6 Months Later…
I’m re-engaging the project. Basically, I’ve gotten my life a little (a lot, really) more focused, and I’m gonna go at this at a more sensible, well-planned pace.
I reformatted and reinstalled Debian 12, Ollama, and chunks of whisper / piper / etc. Currently I can run Ollama with Qwen2.5 and do the normal typed interaction in the terminal window. I can also (unrelated to the LLM) interact with voice commands. But it’s dumb. Like brick dumb. It does exactly two things, in fact.
- Using the wake word “Sweetie”, it will parrot back whatever you said to it. “Sweetie, How do you feel?” = “I heard you say, How do you feel?”
- It will tell you the time on the system clock. “Sweetie, what time is it?” = “It is five oh seven.”
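For flavor, the whole “two skills” brain boils down to something like this rough Python sketch (the function and variable names are mine for illustration, not the actual code):

```python
import datetime

WAKE_WORD = "sweetie"

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = {2: "twenty", 3: "thirty", 4: "forty", 5: "fifty"}

def say_number(n):
    """Spell out 0-59 as words so the TTS never sees a digit."""
    if n < 20:
        return ONES[n]
    word = TENS[n // 10]
    return word if n % 10 == 0 else f"{word} {ONES[n % 10]}"

def spoken_time(now):
    """Render e.g. 17:07 the way it gets spoken: 'five oh seven'."""
    hour = now.hour % 12 or 12
    if now.minute == 0:
        return f"{say_number(hour)} o'clock"
    minute = f"oh {say_number(now.minute)}" if now.minute < 10 else say_number(now.minute)
    return f"{say_number(hour)} {minute}"

def handle(utterance, now=None):
    """Tiny two-skill dispatcher: answer the time question, parrot everything else."""
    text = utterance.strip()
    if not text.lower().startswith(WAKE_WORD):
        return None  # no wake word, stay silent
    command = text[len(WAKE_WORD):].lstrip(", ").strip()
    if "what time is it" in command.lower():
        return f"It is {spoken_time(now or datetime.datetime.now())}."
    return f"I heard you say, {command}"
```

The returned string would then get handed to piper for speech.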
A great deal of time was spent trying to keep it from locking up whenever the server would sleep or the screen would shut off. Also on preventing it from reading out extraneous b.s. like punctuation, and on getting numbers spoken in word format. Still dumb, but much better spoken.
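The “well spoken” part is mostly scrubbing the text before the TTS ever sees it. A minimal sketch of the idea (these regexes are mine, not the actual filter):

```python
import re

def clean_for_tts(text):
    """Strip characters a TTS engine would otherwise read aloud or stumble over:
    markdown formatting marks, then any leftover doubled-up whitespace."""
    text = re.sub(r"[*_`#>|]", "", text)   # markdown / formatting characters
    text = re.sub(r"\s{2,}", " ", text)    # collapse leftover whitespace
    return text.strip()
```

Digits would get converted to words separately (see the `say_number` idea above) rather than stripped.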
Now, where I’m going.
I need to: A) connect the voice interface to the LLM so it can do more conversationally, and B) revisit the hardware to try to make replies more “snappy”. Right now I’m focusing on the hardware.
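For step A, Ollama already exposes an HTTP API on localhost:11434, so wiring a whisper transcript to the LLM is mostly one POST to /api/generate. A stdlib-only sketch (function names are mine, error handling omitted):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_request(transcript, model="qwen2.5"):
    """Wrap a transcribed utterance into an Ollama /api/generate payload.
    stream=False keeps it simple: one JSON reply instead of a token stream."""
    return {"model": model, "prompt": transcript, "stream": False}

def ask_llm(transcript):
    """Send the transcript to the local Ollama server and return its reply text."""
    data = json.dumps(build_request(transcript)).encode("utf-8")
    req = urllib.request.Request(OLLAMA_URL, data=data,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

The reply would then go through the TTS cleanup and out to piper, same as the canned responses do now.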
I currently have the R720, a 2080 Ti (modded to 22GB of VRAM), 24x8GB of RAM at 1333MHz (192GB), and dual E5-2630 v2 processors. After a long chat with ChatGPT, I’ve decided to make the following changes.
- E5-2630 v2 to E5-2667 v2.
- 24x8GB RAM @ 1333MHz to 8x16GB RAM @ 1600MHz (128GB). ChatGPT also strongly recommended 2Rx4 (dual-rank) RAM.
The processor change bumps my clock speed way up (2.6GHz base to 3.3GHz; the 2667 v2 is actually 8 cores per socket vs. the 2630 v2’s 6). My AI setup simply doesn’t need or use piles of extra cores, and CPU clock speed looks to be a bottleneck.
The benefit of the faster RAM should also be obvious, and just like with the CPU, I simply don’t need that many gigabytes. But there’s a quirk specific to the R720 that ought to be mentioned: while it will accept and run 1600MHz RAM, if you install too many DIMMs it draws more power than the chassis budgets for, and a power-saving strategy kicks in that underclocks the RAM to 1333MHz. As it stands, I probably don’t even need half of what I’m going to put in there, but as long as it doesn’t drag the clock speed down, why not?
One potential further mod.
I’m considering (mostly a money issue) putting in a second GPU, but not to expand the LLM’s VRAM pool. I’m thinking of getting an unmodded 2080 Ti or a 3060 and assigning specific duties to each GPU. Basically I’d keep the LLM on my 22GB GPU and segregate everything else that would benefit from GPU usage (whisper, piper, and so on) onto the second GPU. Because that work needs speed but not much VRAM, I wouldn’t bother getting something with more than 8-12GB.
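Assuming both cards end up visible to CUDA, the split itself is cheap to do: most CUDA apps, Ollama included, honor CUDA_VISIBLE_DEVICES, so each process can be pinned to one card. A hypothetical sketch (the device numbering is an assumption, check nvidia-smi for the real IDs):

```shell
# Keep the LLM on the big-VRAM card (assumed to be device 0, the 22GB 2080 Ti):
CUDA_VISIBLE_DEVICES=0 ollama serve

# Push speech-to-text (and anything else GPU-hungry) to the second card:
CUDA_VISIBLE_DEVICES=1 whisper recording.wav --model medium
```

Each process only “sees” the card it’s been assigned, so neither one can eat into the LLM’s VRAM.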
I guess it’s worth saying that I don’t have Home Assistant, my USB Zigbee module, or my Home Assistant Voice Preview module installed at all. I’m currently using a RodeNT mic and a set of computer speakers. They have to be used together because the R720 doesn’t have an audio jack, and the only way to hook the speakers up was to pass them through the RodeNT.
The processors get here at the end of the week; the RAM should be here next week. Then I’m gonna hafta get back down into the terminal and start putting the software together. First smart voice, then Home Assistant, then building out the Home Assistant peripherals.