marques576's comments

Some features:

169M parameters

Streaming support

Zero-shot voice cloning

0.25 RTF (real-time factor) on CPU, meaning it generates 30 seconds of audio in 7.5 seconds

Requires 3-12 seconds of reference audio for voice cloning

Apache 2.0 license
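
For anyone unfamiliar with RTF: it is generation time divided by the duration of the audio produced, so anything below 1.0 is faster than real time. A rough sketch of the arithmetic behind the 30 s / 7.5 s figure above (the function names are my own for illustration, not part of the model's API):

```python
def real_time_factor(generation_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent generating / duration of the generated audio."""
    return generation_seconds / audio_seconds


def generation_time(audio_seconds: float, rtf: float) -> float:
    """Estimated seconds needed to synthesize `audio_seconds` of audio."""
    return audio_seconds * rtf


# Figures quoted in the feature list: 0.25 RTF, 30 seconds of audio.
print(generation_time(30.0, 0.25))   # 7.5
print(real_time_factor(7.5, 30.0))   # 0.25
```

Actual timings will depend on the CPU, so treat the 0.25 figure as a ballpark.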

The model was trained on a single L40S GPU. It’s not SOTA in most cases, can be a bit unstable, and sometimes fails to capture voice likeness.


Such a talented guy!

