FeatherWave: An efficient high-fidelity neural vocoder with multi-band linear prediction

Paper: arXiv

Authors: Qiao Tian, Zewang Zhang, Heng Lu, Ling-Hui Chen, Shan Liu

Abstract: In this paper, we propose FeatherWave, a variant of the WaveRNN vocoder that combines multi-band signal processing with linear predictive coding. LPCNet, a recently proposed neural vocoder that exploits the linear predictive characteristics of the speech signal within the WaveRNN architecture, can generate high-quality speech faster than real time on a single CPU core. However, LPCNet is still not efficient enough for online speech generation tasks. To address this issue, we adopt multi-band linear predictive coding for the WaveRNN vocoder. The multi-band method enables the model to generate several speech samples in parallel in one step, which significantly improves the efficiency of speech synthesis. The proposed model with 4 sub-bands needs less than 1.6 GFLOPS for speech generation. In our experiments, it generates 24 kHz high-fidelity audio 9x faster than real time on a single CPU, which is much faster than the LPCNet vocoder. Furthermore, our subjective listening test shows that FeatherWave generates speech of better quality than LPCNet.
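The two ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy signal, and the coefficients are hypothetical, and the step count assumes each sub-band is downsampled by the number of bands, which the abstract does not state explicitly.

```python
def lpc_predict(history, coeffs):
    """Predict the next sample as a weighted sum of the most recent samples.

    history: past samples, oldest first.
    coeffs:  prediction coefficients a_1..a_M, where a_1 weights the most
             recent sample (hypothetical values, chosen for illustration).
    """
    order = len(coeffs)
    recent = history[-order:][::-1]  # most recent sample first
    return sum(a * s for a, s in zip(coeffs, recent))


def autoregressive_steps(sample_rate_hz, num_bands, seconds=1.0):
    """Sequential steps needed when each step emits one sample per sub-band.

    Assumption: every sub-band runs at sample_rate_hz / num_bands.
    """
    return int(sample_rate_hz * seconds) // num_bands


# Toy signal that follows s[t] = 2*s[t-1] - s[t-2] (a straight line).
signal = [0.0, 1.0, 2.0, 3.0]
coeffs = [2.0, -1.0]

prediction = lpc_predict(signal, coeffs)  # 2*3.0 - 1*2.0
excitation = 4.0 - prediction             # residual left for a network to model

full_band_steps = autoregressive_steps(24000, 1)
four_band_steps = autoregressive_steps(24000, 4)
```

Under these assumptions, a 4-band model takes a quarter as many sequential steps per second of 24 kHz audio as a full-band model, which is the source of the speed-up the abstract describes; the linear predictor removes the easily predicted part of each sample so that only the residual has to be modelled.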

Samples synthesized from ground-truth acoustic features

MoL WaveNet, LPCNet, FeatherWave (ours)
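1: Come hit the streets with the editor of 魔都捉歪记 and see what sparks fly between "In the Name of the People" and the foreigners!
2: As the eel struggled and died, Sulli dubbed over the video: "Ah, save me, wow, save me."
3: This reduces the thermal forces on the steam drum wall caused by temperature differences, thereby improving the steam drum's operating conditions.
4: The two giant pandas Xing Ya and Wu Wen departed by plane for the Netherlands at 2:35 p.m. on April 12.
5: "She used to be such an obedient, well-behaved child!" Yu Xiuying said, shaking her head vigorously.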

Text-to-speech with FeatherWave

Mel spectrograms were generated by an improved Tacotron model (to be published soon).

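1: Workout finished. This workout lasted 35 minutes and covered 4.5 kilometers at an average pace of 7.2. See you next workout!
2: In spring I sleep, unaware of the dawn; everywhere I hear the birds singing. Last night came the sound of wind and rain; who knows how many blossoms fell.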
3: A o, a o ei. A si de a si de, a si de ge de ge de. A si de a si de ge dou. A o, a o ei, a si de a si de, a si de ge de ge de, a si de a si de ge dou.
4: Countdown: 20 seconds, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0