Any-to-any singing voice conversion (SVC) aims to transfer a target singer’s timbre to other songs using a short voice sample. Many SVC methods have achieved impressive results with diffusion models, but these models often suffer from high latency because sampling requires many iterative inference steps. In this paper, we propose LCM-SVC, a latent diffusion model (LDM) accelerated with latent consistency distillation (LCD) to improve inference speed. Building on the significant improvements in timbre decoupling and sound quality offered by the LDM, we distill a pre-trained LDM to enable one-step or multi-step inference while maintaining high performance. Results show that our method significantly reduces inference time while largely preserving sound quality and timbre similarity.
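To make "one-step or multi-step inference" concrete, here is a minimal sketch of the generic multistep consistency sampling loop. It is not the LCM-SVC implementation. The names `consistency_fn` and `toy_consistency_fn`, the latent shape, and the linear beta schedule are assumptions chosen for illustration, and conditioning on content, pitch, and timbre is omitted.

```python
import numpy as np

def multistep_consistency_sample(consistency_fn, shape, timesteps, alphas, sigmas, rng):
    """Few-step sampling with a distilled consistency function (generic sketch).

    consistency_fn(z_t, t) returns an estimate of the clean latent z_0
    (hypothetical signature). `timesteps` is a decreasing list of indices,
    e.g. [999] for one step or [999, 749, 499, 249] for four steps.
    """
    z = rng.standard_normal(shape)         # start from pure noise
    z0 = consistency_fn(z, timesteps[0])   # a single call already gives a clean estimate
    for t in timesteps[1:]:
        noise = rng.standard_normal(shape)
        z_t = alphas[t] * z0 + sigmas[t] * noise  # re-noise the estimate to level t
        z0 = consistency_fn(z_t, t)               # denoise again in one call
    return z0

# Assumed forward-process schedule (linear betas, 1000 steps).
betas = np.linspace(1e-4, 0.02, 1000)
alpha_bar = np.cumprod(1.0 - betas)
alphas, sigmas = np.sqrt(alpha_bar), np.sqrt(1.0 - alpha_bar)

calls = []  # records the timestep of every "network" call

def toy_consistency_fn(z_t, t):
    # Toy stand-in for the distilled network: the data is a point mass at 0.5,
    # so the exact consistency function maps every noisy input to 0.5.
    calls.append(t)
    return np.full_like(z_t, 0.5)

rng = np.random.default_rng(0)
one_step = multistep_consistency_sample(toy_consistency_fn, (4, 8), [999], alphas, sigmas, rng)
four_step = multistep_consistency_sample(
    toy_consistency_fn, (4, 8), [999, 749, 499, 249], alphas, sigmas, rng
)
```

The number of network calls equals the number of entries in `timesteps`, so the four-step variant costs four calls where a standard diffusion sampler would need many more. In a real system each extra step gives the model a chance to refine its previous estimate.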
Source | Target | Conversion | |
---|---|---|---|
| | DiffSVC, So-VITS-SVC, CoMoSVC | LCM-SVC-T, LCM-SVC-4, LCM-SVC-2, LCM-SVC-1 |