Hacker News | bachittle's comments

Did you also try Forgejo? If so, what are the differences between the two? I didn't even know GitLab had a self-hosted option. I assume it's probably better for Enterprise-grade projects, and dealing with CI/CD, actions, etc. But for smaller projects that just have issues and PRs and minor test suites, I assume Forgejo is the better lightweight option.

Yeah, I tried hosting Forgejo, and the first issue I found was that it crashed some of the time with our large monorepo, and getting actions/runners up and running was proving time-consuming. I really did like how lightweight it was, monolith-wise. GitLab has a lot more architecture behind it, but the documentation is very good at describing how to configure it for your needs.

I think Forgejo would work fine for smaller projects and teams. We really wanted to stop worrying about GitHub going down and leaving us unable to do CD, as well as get away from a lot of the Actions zero-days happening.

And yes, it's self-hosted and free! You can run a reference implementation pretty easily with non-production components (i.e., they won't back up or scale well).


I have enjoyed using Forgejo over GitHub for local work. The features that GitHub has that plain Git does not include a nice web renderer for markdown and code, issues and pull requests with comments, and project kanban boards. It's nice to have an alternative for local usage if GitHub ever goes down, or just for private projects. Especially nice with agentic workflows, because agents can port issues, PRs, etc. back and forth between GitHub and Forgejo.

Coqui TTS is actually deprecated; the company shut down. I have a voice assistant that uses GPT-5.4 and Opus 4.6 via the subsidized Codex and Claude Code plans, and it uses STT and TTS from mlx-audio so those portions are locally hosted: https://github.com/Blaizzy/mlx-audio

Here are the models I found work well:

- Qwen ASR and TTS are really good. Qwen ASR is faster than OpenAI Whisper on Apple Silicon in my tests. And the TTS model has voice cloning support, so you can give it any voice you want. Qwen ASR is my default.

- Chatterbox Turbo also does voice cloning TTS and is more efficient to run than Qwen TTS. Chatterbox Turbo is my default.

- Kitten TTS is good as a small model, better than Kokoro.

- Soprano TTS is surprisingly really good for a small model, but it has glitches that prevent it from being my default

But overall the mlx-audio library makes it really easy to try different models and see which ones I like.
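For anyone wanting to try these, mlx-audio exposes CLI entry points for both directions. A hedged sketch of the invocation shape (module paths follow the repo README as I remember it, and the model ids are illustrative, so check the repo before relying on either):

```shell
# Assumed mlx-audio CLI shape; model ids are examples, not recommendations.
# Text-to-speech:
python -m mlx_audio.tts.generate --model mlx-community/Kokoro-82M --text "Hello from my Mac"
# Speech-to-text on a recording:
python -m mlx_audio.stt.generate --model mlx-community/whisper-large-v3-turbo --audio input.wav
```

Swapping models is mostly a matter of changing the `--model` id, which is what makes side-by-side comparison easy.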


Do you know which HA integration I would use if I want to try out Qwen 3 ASR in HA? Some screenshots in the OP reference Qwen 3 ASR for STT but I can't seem to find any reference to which integration I'd use.

If you want your comments to sound more human — stop using em dashes everywhere. LLMs love them — along with neat structure, “furthermore”-style transitions, and perfectly balanced paragraphs.

Humans write a bit messier — commas, short sentences, abrupt turns.


I think em-dashes were once a reliable indicator (though never proof), but recent models have been fine-tuned to use them much less. Lots of recent AI-generated writing I've seen doesn't have em-dashes. Meanwhile, I've heard many people say that they naturally use em-dashes and were already afraid, or are now afraid, of being accused of using AI; so ironically this rumor may be causing people to use their own voice less.


Before, I naturally used hyphens as if they were em-dashes. The kerfuffle over LLM use of em-dashes motivated me to figure out how to type them properly (and configure my system to make that easier). Now I even go over old writing to fix the hyphens.


The RTX 5090 only has 32 GB of VRAM. So the tradeoff is: NVIDIA gives blazing speed in a tiny memory pool, while Apple Silicon gives a larger memory pool at moderate speed.


Or, there's the DGX Spark, which effectively neutralizes both of these trade-offs, and is the same price as the RTX 5090.


For reference, the DGX Spark sits at 273 GB/s of memory bandwidth.
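Memory bandwidth roughly caps single-stream LLM decode speed, since generating each token streams the full set of active weights through memory. A back-of-the-envelope sketch (the 1792 GB/s figure for the RTX 5090 is an approximate public spec, not a benchmark, and the bound ignores KV cache and overlap):

```python
# Rough upper bound: tokens/sec ≈ memory bandwidth / bytes read per token,
# where bytes per token ≈ size of the active weights.
def max_decode_tps(bandwidth_gb_s: float, model_size_gb: float) -> float:
    return bandwidth_gb_s / model_size_gb

# Approximate figures: RTX 5090 ~1792 GB/s, DGX Spark 273 GB/s (per this thread).
for name, bw in [("RTX 5090", 1792.0), ("DGX Spark", 273.0)]:
    print(f"{name}: ~{max_decode_tps(bw, 20.0):.0f} tok/s ceiling for a 20 GB model")
```

So for models that fit in 32 GB the 5090's bandwidth dominates, while the Spark's advantage is only that larger models fit at all.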


It's not 5090 performance though.


Nothing stops you from plugging in a 5090. Nvidia ships ARM64 GPU drivers.


So, what were we talking about even then in the thread?


I'm running a local voice agent on a Mac Mini M4. Qwen ASR for STT and Qwen TTS on Apple Silicon via MLX, Claude for the LLM. No API costs besides the Claude subscription but the interesting part is the LLM is agentic because it's using Claude Code. It reads and writes files, spawns background agents, controls devices, all through voice.

The insights about VAD and streaming pipelines in this thread are exactly what I'm looking at for v2. Moving to a WebSocket streaming pipeline with proper voice activity detection would close the latency gap significantly, even with local models.
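Even before reaching for WebRTC VAD or Silero, a simple energy-gate VAD over fixed PCM frames captures the idea. A minimal sketch in stdlib Python (the threshold is an arbitrary placeholder you'd tune against your mic's noise floor):

```python
import math
import struct

def frame_rms(pcm16: bytes) -> float:
    """RMS energy of a frame of 16-bit little-endian mono PCM."""
    n = len(pcm16) // 2
    samples = struct.unpack(f"<{n}h", pcm16[: n * 2])
    return math.sqrt(sum(s * s for s in samples) / max(n, 1))

def is_speech(pcm16: bytes, threshold: float = 500.0) -> bool:
    # Placeholder threshold; real pipelines also add hangover/smoothing
    # so a single quiet frame doesn't cut off the end of an utterance.
    return frame_rms(pcm16) > threshold

# 20 ms frames at 16 kHz: silence vs. a loud square wave
silence = struct.pack("<320h", *([0] * 320))
loud = struct.pack("<320h", *([8000, -8000] * 160))
print(is_speech(silence), is_speech(loud))  # → False True
```

In a streaming setup you'd run this per WebSocket audio chunk and start/stop the STT model on speech onset/offset, which is where the latency win comes from.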


Do you think it would be possible in the future to add developer settings to enable or disable certain features, or to switch to more lightweight sandboxing methods like Apple Seatbelt?


Yup, it uses the Apple Virtualization framework. That means I can't use Claude Cowork within my own VMs; that's how I found out it was running one, because it caused a nested-VM error. All it does is limit functionality, take extra space, and cause lag. A better sandbox environment would be Apple Seatbelt, which is what OpenAI uses, but even that isn't perfect: https://qht.co/item?id=44283454


I don't have an opinion on how they should handle nested VMs, but I very much disagree that Seatbelt is better. Claude Code (aka `claude`) uses it, and it's barely good for anything.

Out of curiosity, why are you running Cowork inside a VM in the first place? What does that get you that letting Cowork use its own VM wouldn’t?


Seatbelt is largely undocumented.


OpenAI's Codex CLI was able to use it effectively, so at least AI knows how to use it. Still, it's deprecated and unmaintained; Apple needs to ship something new soon.
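For the curious, Seatbelt policies are small Scheme-like profiles passed to the deprecated `sandbox-exec` tool; this is a config fragment from memory rather than any official reference, and a real profile on modern macOS typically needs more allow rules (dyld, system frameworks, metadata reads) before anything runs:

```shell
# Hypothetical minimal deny-by-default Seatbelt (SBPL) profile; macOS only.
sandbox-exec -p '(version 1)
  (deny default)
  (allow process-fork)
  (allow process-exec (literal "/bin/ls"))
  (allow file-read* (subpath "/bin") (subpath "/usr/lib") (subpath "/System"))' \
  /bin/ls /bin
```

The "largely undocumented" complaint is fair: the SBPL language mostly has to be reverse-engineered from Apple's own shipped `.sb` profiles.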


just ask AI to document it


Not sure why you're getting downvoted. This is totally reasonable.


I've been running something similar for a few months, which is a voice-first interface for Claude Code running on a local Flask server. Instead of texting from my phone, I just talk to it. It spawns agents in tmux sessions, manages context with handoff notes between sessions, and has a card display for visual output.

The remote control feature is cool but the real unlock for me was voice. Typing on a phone is a terrible interface for coding conversations. Speaking is surprisingly natural for things like "check the test output" or "what did that agent do while I was away."

The tmux crowd in this thread is right that SSH + tmux gets you 90% of the way there. But adding voice on top changes the interaction model. You stop treating it like a terminal and start treating it like a collaborator.
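The tmux part is straightforward to script from a voice command handler. A hedged sketch of how such a handler could spawn a detached agent session (the session name, prompt, and log path are made up for illustration):

```python
import shlex

def spawn_agent_cmd(session: str, prompt: str, logfile: str) -> list[str]:
    """Build the argv for spawning a detached tmux session running an agent.

    The inner command shells out to `claude -p` (non-interactive prompt mode)
    and captures output to a log the voice UI can read back later.
    """
    inner = f"claude -p {shlex.quote(prompt)} > {shlex.quote(logfile)}"
    return ["tmux", "new-session", "-d", "-s", session, inner]

cmd = spawn_agent_cmd("agent-tests", "check the test output", "/tmp/agent-tests.log")
print(cmd)
```

You'd hand the resulting argv to `subprocess.run`, then later use `tmux capture-pane` or the log file to answer "what did that agent do while I was away."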

Here is a demo of it controlling my smart lights: https://www.youtube.com/watch?v=HFmp9HFv50s


Is this the same continue that was for running local AI coding agents? Interesting rebrand.


That's us! I figure others will wonder the same, so we wrote about what exactly we're doing here: https://blog.continue.dev/from-extension-to-mission-control

tl;dr

- a _lot_ of people still use the VS Code extension and so we're still putting energy toward keeping it polished (this becomes easier with checks : ))

- our checks product is powered by an open-source CLI (we think this is important), which we recommend for JetBrains users

- the general goal is the same: we start by building tools for ourselves, share them with people in a way that avoids creating walled gardens, and aim to amplify developers (https://amplified.dev)

