About Page
This page compares audio samples from all methods evaluated in the article:
- FedSpeech: the proposed method
- GT: the ground-truth audio in VCTK
- GT (Mel + PWG): the ground-truth audio converted into mel-spectrograms and then back to audio using Parallel WaveGAN (PWG) [1]
- Scratch: learning each task independently from scratch
- Finetune: fine-tuning from a randomly selected previous model, repeating the process 5 times (for task 1, Finetune is equivalent to Scratch)
- FedAvg [2]: aggregating local information (e.g., gradients or model parameters) to train a global model
- CPG [3]: a parameter isolation method used in continual learning
- FedSpeech - SM: FedSpeech without the selective mask
- FedSpeech - GPM: FedSpeech without the gradual pruning mask
- FedSpeech - GPM - SM: FedSpeech without both masks
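To illustrate the aggregation step behind the FedAvg baseline listed above, here is a minimal sketch in plain Python. It is not the code used in the article: the function name `fed_avg`, the dictionary-of-lists parameter layout, and the choice to weight each client by its local sample count (as in the original FedAvg formulation [2]) are assumptions made for this example.

```python
from typing import Dict, List

# Hypothetical parameter layout: parameter name -> flat list of floats.
Params = Dict[str, List[float]]


def fed_avg(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Average client parameters, weighting each client by its sample count."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    averaged: Params = {}
    # Assumes every client holds the same parameter names and shapes.
    for name, reference in client_params[0].items():
        averaged[name] = [
            sum(p[name][i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(len(reference))
        ]
    return averaged


if __name__ == "__main__":
    # Two toy clients; the second has three times as much local data.
    clients = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]
    print(fed_avg(clients, [1, 3]))
```

In this toy setup the global parameters are pulled toward the client with more data, which is the intended effect of sample-count weighting.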
Audio Quality and Speaker Similarity Experiments
Example 1
1st speaker in the paper
Text: The Greeks used to imagine that it was a sign from the gods to foretell war or heavy rain.
GT | GT (Mel + PWG) | Multi-task |
---|---|---|
FedSpeech | CPG | FedAvg |
---|---|---|
Scratch | Finetune |
---|---|
Example 2
4th speaker in the paper
Text: The difference in the rainbow depends considerably upon the size of the drops, and the width of the colored band increases as the size of the drops increases.
GT | GT (Mel + PWG) | Multi-task |
---|---|---|
FedSpeech | CPG | FedAvg |
---|---|---|
Scratch | Finetune |
---|---|
Example 3
9th speaker in the paper
Text: Six spoons of fresh snow peas, five thick slabs of blue cheese, and maybe a snack for her brother Bob.
GT | GT (Mel + PWG) | Multi-task |
---|---|---|
FedSpeech | CPG | FedAvg |
---|---|---|
Scratch | Finetune |
---|---|
Audio Samples for Ablation Studies
Example 1
1st speaker in the paper
Text: The Greeks used to imagine that it was a sign from the gods to foretell war or heavy rain.
FedSpeech |
---|
FedSpeech - SM | FedSpeech - GPM | FedSpeech - GPM - SM |
---|---|---|
Example 2
4th speaker in the paper
Text: The difference in the rainbow depends considerably upon the size of the drops, and the width of the colored band increases as the size of the drops increases.
FedSpeech |
---|
FedSpeech - SM | FedSpeech - GPM | FedSpeech - GPM - SM |
---|---|---|
Example 3
9th speaker in the paper
Text: Six spoons of fresh snow peas, five thick slabs of blue cheese, and maybe a snack for her brother Bob.
FedSpeech |
---|
FedSpeech - SM | FedSpeech - GPM | FedSpeech - GPM - SM |
---|---|---|
References
[1] Ryuichi Yamamoto, Eunwoo Song, and Jae-Min Kim. Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram. In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 6199–6203. IEEE, 2020.
[2] Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics, pages 1273–1282. PMLR, 2017.
[3] Ching-Yi Hung, Cheng-Hao Tu, Cheng-En Wu, Chien-Hung Chen, Yi-Ming Chan, and Chu-Song Chen. Compacting, picking and growing for unforgetting continual learning. In Advances in Neural Information Processing Systems, pages 13669–13679, 2019.