SiFiSinger: A High-Fidelity End-to-End Singing Voice Synthesizer Based on Source-Filter Model


Abstract

This paper presents an advanced end-to-end singing voice synthesis (SVS) system based on the source-filter mechanism, which directly translates lyrical and melodic cues into expressive, high-fidelity, human-like singing. Similar to other systems such as VISinger2, the proposed system uses training paradigms evolved from VITS and incorporates elements such as a fundamental pitch (F0) predictor and a waveform generation decoder. A critical challenge is that coupling mel-spectrogram features with F0 information may introduce additional errors during F0 prediction. To address it, this paper proposes two primary solutions. First, we leverage mel-cepstrum (mcep) features to decouple the intertwined mel-spectrogram and F0 characteristics. Second, inspired by neural source-filter models, we introduce source excitation signals as the representation of F0 in the SVS system, aiming to capture pitch nuances more accurately. In addition, differentiable mcep and F0 losses supervise the waveform decoder to improve the prediction accuracy of the speech envelope and pitch in the generated speech. Extensive experiments on the Opencpop dataset demonstrate that our proposed model surpasses its predecessor, VISinger2, in synthesis quality and intonation accuracy.
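To make the idea of a source excitation signal concrete, here is a minimal sketch of how a sine-based excitation can be derived from a frame-level F0 contour, in the general style of neural source-filter models. It is an illustration under stated assumptions, not the implementation used in SiFiSinger: the function name `sine_excitation`, the hop size, the sample rate, the amplitude, and the noise levels are all hypothetical placeholders.

```python
import numpy as np


def sine_excitation(f0_frames, hop_size=512, sample_rate=44100,
                    amplitude=0.1, noise_std=0.003, seed=0):
    """Build a sample-level excitation from frame-level F0 in Hz (0 = unvoiced).

    All default values are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    f0_frames = np.asarray(f0_frames, dtype=np.float64)

    # Upsample by holding each frame's F0 for hop_size samples.
    f0 = np.repeat(f0_frames, hop_size)
    voiced = f0 > 0

    # Instantaneous phase is the running sum of the per-sample frequency.
    phase = 2.0 * np.pi * np.cumsum(f0 / sample_rate)
    noise = rng.standard_normal(f0.shape)

    # Voiced samples: sine plus a little noise. Unvoiced samples: noise only.
    return np.where(
        voiced,
        amplitude * np.sin(phase) + noise_std * noise,
        (amplitude / 3.0) * noise,
    )


# Two unvoiced frames followed by voiced frames at 220 Hz and 440 Hz.
excitation = sine_excitation([0.0, 0.0, 220.0, 220.0, 440.0, 440.0])
```

Because the phase is accumulated sample by sample, the sine stays continuous when F0 changes between frames, which is what lets an explicit excitation carry pitch information separately from spectral-envelope features such as mcep.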


[Fig. 1: SiFiSinger architecture]

(We apologize for an oversight in the camera-ready version of our paper. A minor error occurred in Fig. 1 of the document, where the position of the gradient cutoff symbol was incorrect. This has now been corrected in the figure displayed on this webpage.)

System Demo1 Demo2
GT (Recording)
VISinger2
SiFiSinger-as
SiFiSinger-ds
SiFiSinger


System Demo3 Demo4
GT (Recording)
VISinger2
SiFiSinger-as
SiFiSinger-ds
SiFiSinger


System Demo5 Demo6
GT (Recording)
VISinger2
SiFiSinger-as
SiFiSinger-ds
SiFiSinger


System Demo7 Demo8
GT (Recording)
VISinger2
SiFiSinger-as
SiFiSinger-ds
SiFiSinger


System Demo9 Demo10
GT (Recording)
VISinger2
SiFiSinger-as
SiFiSinger-ds
SiFiSinger


System Demo11 Demo12
GT (Recording)
VISinger2
SiFiSinger-as
SiFiSinger-ds
SiFiSinger


System Demo13 Demo14
GT (Recording)
VISinger2
SiFiSinger-as
SiFiSinger-ds
SiFiSinger