marques576's comments

marques576 · 2026-01-08T11:33:29

Some features:

169M parameters

Streaming support

Zero-shot voice cloning

0.25 RTF (real-time factor) on CPU, meaning it generates 30 seconds of audio in 7.5 seconds

Requires 3-12 seconds of reference audio for voice cloning

Apache 2.0 license
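
For anyone unsure how the RTF figure above translates into wall-clock time, here is a minimal sketch of the arithmetic. It assumes the usual definition (synthesis time divided by audio duration); the function names are hypothetical and are not part of the model's API.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent generating / duration of the generated audio.

    Values below 1.0 mean generation is faster than playback.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def synthesis_time(audio_seconds: float, rtf: float) -> float:
    """Estimated wall-clock seconds needed to generate audio_seconds of audio."""
    return audio_seconds * rtf


# The numbers quoted in the feature list: 30 s of audio at 0.25 RTF.
print(synthesis_time(30.0, 0.25))    # 7.5
print(real_time_factor(7.5, 30.0))   # 0.25
```

At 0.25 RTF the model runs four times faster than real time, which is what leaves headroom for streaming on CPU.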

The model was trained on a single L40S GPU. It’s not SOTA in most cases, can be a bit unstable, and sometimes fails to capture voice likeness.

marques576 · on Oct 11, 2024

Such a talented guy!