LCM-SVC Demo

LCM-SVC: Latent Diffusion Model Based Singing Voice Conversion with Inference Acceleration via Latent Consistency Distillation

Abstract

Any-to-any Singing Voice Conversion (SVC) aims to transfer a target singer’s timbre to other songs using a short voice sample. Many SVC methods have achieved impressive results using diffusion models, which, however, often suffer from high latency caused by their many inference steps. In this paper, we propose LCM-SVC, a latent diffusion model (LDM) based SVC method whose inference is accelerated by latent consistency distillation (LCD). Leveraging the significant improvements in timbre decoupling and sound quality offered by the LDM, we distill a pre-trained LDM, enabling one-step or multi-step inference while maintaining high performance. Results show that our method significantly reduces inference time while largely preserving sound quality and timbre similarity.
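
To make the "one-step or multi-step inference" idea concrete, here is a minimal sketch of the generic multistep sampling loop used with consistency models. It is not the LCM-SVC implementation: the noise levels `EPS` and `T_MAX`, the two-dimensional `TARGET` latent, and the function `toy_consistency_fn` are all hypothetical stand-ins, and the page does not specify the actual sampler, noise schedule, or conditioning inputs. The toy function plays the role of a distilled network and is exact only for a data distribution concentrated on a single point.

```python
import math
import random

EPS = 0.002            # smallest noise level (assumed value)
T_MAX = 80.0           # largest noise level (assumed value)
TARGET = [1.0, -2.0]   # hypothetical "clean latent" for the toy model


def toy_consistency_fn(x, t):
    """Stand-in for a distilled network: map a noisy latent at noise
    level t to a clean estimate. Satisfies f(x, EPS) == x."""
    scale = EPS / t
    return [c + scale * (xi - c) for xi, c in zip(x, TARGET)]


def consistency_sample(fn, timesteps, dim, rng):
    """Multistep consistency sampling.

    `timesteps` is a decreasing list of noise levels; its length is the
    number of network calls (1 gives one-step inference).
    """
    t0 = timesteps[0]
    # Start from pure noise at the largest noise level and denoise once.
    x = [rng.gauss(0.0, t0) for _ in range(dim)]
    x = fn(x, t0)
    for t in timesteps[1:]:
        # Re-noise the current estimate to level t, then denoise again.
        std = math.sqrt(t * t - EPS * EPS)
        noisy = [xi + std * rng.gauss(0.0, 1.0) for xi in x]
        x = fn(noisy, t)
    return x


rng = random.Random(0)
one_step = consistency_sample(toy_consistency_fn, [T_MAX], 2, rng)
four_step = consistency_sample(toy_consistency_fn, [T_MAX, 20.0, 5.0, 1.0], 2, rng)
print("1 step :", one_step)
print("4 steps:", four_step)
```

In this sketch the number of entries in `timesteps` is the number of model evaluations, which is the quantity that distillation aims to reduce compared with the many iterative steps of a standard diffusion sampler.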


Seen-to-Seen Singing Voice Conversion

Four audio samples, each presented as:

Source | Target | Conversion (DiffSVC, So-VITS-SVC, CoMoSVC, LCM-SVC-T, LCM-SVC-4, LCM-SVC-2, LCM-SVC-1)

Unseen-to-Unseen Singing Voice Conversion

Four audio samples, each presented as:

Source | Target | Conversion (DiffSVC, So-VITS-SVC, CoMoSVC, LCM-SVC-T, LCM-SVC-4, LCM-SVC-2, LCM-SVC-1)