
SiFiSinger: Source-filter based Singing Voice Synthesizer with Variational Autoencoder and Adversarial Learning


Abstract

This paper presents an advanced end-to-end singing voice synthesis (SVS) system based on the source-filter mechanism, which directly translates lyrical and melodic cues into expressive, high-fidelity, human-like singing. Similar to other systems such as VISinger2, the proposed system uses training paradigms evolved from VITS and incorporates elements such as the fundamental pitch (F0) predictor and the waveform generation decoder. A critical challenge is that coupling mel-spectrogram features with F0 information may introduce additional errors during F0 prediction. To address it, this paper proposes two primary solutions. First, we leverage mel-cepstrum (mcep) features to decouple the intertwined mel-spectrogram and F0 characteristics. Second, inspired by neural source-filter models, we introduce source excitation signals as the representation of F0 in the SVS system, aiming to capture pitch nuances more accurately. In addition, differentiable mcep and F0 losses are employed as supervision for the waveform decoder to improve the prediction accuracy of the speech envelope and pitch in the generated speech. Extensive experiments on the Opencpop dataset demonstrate that the proposed model surpasses its predecessor VISinger2 in synthesis quality and intonation accuracy.
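To make the idea of a source excitation signal concrete, the following is a minimal sketch in the style of neural source-filter models: a sine wave that follows the F0 contour in voiced frames and low-level noise in unvoiced frames. It is not the SiFiSinger implementation. The function name `sine_excitation`, the sample rate, the hop length, and the amplitude and noise settings are all assumptions chosen for illustration.

```python
import numpy as np

def sine_excitation(f0_frames, sample_rate=24000, hop_length=240,
                    amplitude=0.1, noise_std=0.003, seed=0):
    """Sketch of an NSF-style excitation signal (all defaults are assumed).

    f0_frames: frame-level F0 in Hz, with 0 marking unvoiced frames.
    Returns a 1-D array of len(f0_frames) * hop_length samples.
    """
    rng = np.random.default_rng(seed)
    f0_frames = np.asarray(f0_frames, dtype=np.float64)

    # Upsample frame-level F0 to sample level by repeating each value.
    f0 = np.repeat(f0_frames, hop_length)
    voiced = f0 > 0

    # Instantaneous phase is the running sum of per-sample frequency steps.
    phase = 2.0 * np.pi * np.cumsum(f0 / sample_rate)

    noise = rng.normal(0.0, 1.0, size=f0.shape)
    # Voiced: sine at the target pitch plus a small amount of noise.
    voiced_signal = amplitude * np.sin(phase) + noise_std * noise
    # Unvoiced: noise only, scaled to roughly a third of the sine amplitude.
    unvoiced_signal = (amplitude / 3.0) * noise

    return np.where(voiced, voiced_signal, unvoiced_signal)


if __name__ == "__main__":
    # Ten voiced frames at 220 Hz followed by five unvoiced frames.
    excitation = sine_excitation([220.0] * 10 + [0.0] * 5)
    print(excitation.shape)
```

In a full system, a signal like this would be fed to the waveform decoder as the pitch representation in place of a raw F0 value. How SiFiSinger conditions its decoder on the excitation is not specified in the abstract.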


[Figure: system architecture]

System Demo1 Demo2
GT (Recording)
VISinger2
SiFiSinger-as
SiFiSinger-ds
SiFiSinger


System Demo3 Demo4
GT (Recording)
VISinger2
SiFiSinger-as
SiFiSinger-ds
SiFiSinger


System Demo5 Demo6
GT (Recording)
VISinger2
SiFiSinger-as
SiFiSinger-ds
SiFiSinger


System Demo7 Demo8
GT (Recording)
VISinger2
SiFiSinger-as
SiFiSinger-ds
SiFiSinger


System Demo9 Demo10
GT (Recording)
VISinger2
SiFiSinger-as
SiFiSinger-ds
SiFiSinger


System Demo11 Demo12
GT (Recording)
VISinger2
SiFiSinger-as
SiFiSinger-ds
SiFiSinger


System Demo13 Demo14
GT (Recording)
VISinger2
SiFiSinger-as
SiFiSinger-ds
SiFiSinger