About Page

This page compares audio samples from all methods covered in the article:

  • FedSpeech: The proposed method
  • GT: The ground-truth audio in VCTK
  • GT (Mel + PWG): The ground-truth audio converted to mel-spectrograms and then back to audio using Parallel WaveGAN [1] (PWG)
  • Scratch: Learning each task independently from scratch
  • Finetune: Fine-tuning from a randomly selected previous model, with the process repeated 5 times (for task 1, Finetune is equivalent to Scratch)
  • FedAvg [2]: Aggregating local information (e.g., gradients or model parameters) to train a global model
  • CPG [3]: A parameter isolation method used in continual learning
  • FedSpeech - SM: FedSpeech with the selective mask removed
  • FedSpeech - GPM: FedSpeech with the gradual pruning mask removed
  • FedSpeech - GPM - SM: FedSpeech with both masks removed
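The FedAvg baseline in the list above aggregates local model parameters into a global model. As a rough illustration only, the aggregation step could be sketched as a data-size-weighted average of client parameters. This is not the article's implementation: the function name `fedavg`, the flat-list parameter layout, and the example values are all assumptions made for the sketch.

```python
from typing import Dict, List, Sequence

# Hypothetical layout: each client's model is a dict of named flat parameter lists.
Params = Dict[str, List[float]]


def fedavg(client_params: Sequence[Params], client_sizes: Sequence[int]) -> Params:
    """Average client parameters, weighting each client by its local data size."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one data size per client and at least one client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")

    averaged: Params = {}
    for name in client_params[0]:
        length = len(client_params[0][name])
        # Weighted sum over clients for every scalar in this parameter.
        averaged[name] = [
            sum(p[name][i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Illustrative values only: two clients holding 1 and 3 local samples.
clients = [
    {"w": [1.0, 2.0], "b": [0.0]},
    {"w": [3.0, 4.0], "b": [1.0]},
]
sizes = [1, 3]
global_params = fedavg(clients, sizes)
print(global_params)  # {'w': [2.5, 3.5], 'b': [0.75]}
```

In this sketch the client with more data pulls the global parameters toward its own values, which is the usual weighting in federated averaging [2].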

Audio Quality and Speaker Similarity Experiments

Example 1

1st speaker in the paper

Text: The Greeks used to imagine that it was a sign from the gods to foretell war or heavy rain.

Samples: GT | GT (Mel + PWG) | Multi-task | FedSpeech | CPG | FedAvg | Scratch | Finetune

Example 2

4th speaker in the paper

Text: The difference in the rainbow depends considerably upon the size of the drops, and the width of the colored band increases as the size of the drops increases.

Samples: GT | GT (Mel + PWG) | Multi-task | FedSpeech | CPG | FedAvg | Scratch | Finetune

Example 3

9th speaker in the paper

Text: Six spoons of fresh snow peas, five thick slabs of blue cheese, and maybe a snack for her brother Bob.

Samples: GT | GT (Mel + PWG) | Multi-task | FedSpeech | CPG | FedAvg | Scratch | Finetune

Audio Samples for Ablation Studies

Example 1

1st speaker in the paper

Text: The Greeks used to imagine that it was a sign from the gods to foretell war or heavy rain.

Samples: FedSpeech | FedSpeech - SM | FedSpeech - GPM | FedSpeech - GPM - SM

Example 2

4th speaker in the paper

Text: The difference in the rainbow depends considerably upon the size of the drops, and the width of the colored band increases as the size of the drops increases.

Samples: FedSpeech | FedSpeech - SM | FedSpeech - GPM | FedSpeech - GPM - SM

Example 3

9th speaker in the paper

Text: Six spoons of fresh snow peas, five thick slabs of blue cheese, and maybe a snack for her brother Bob.

Samples: FedSpeech | FedSpeech - SM | FedSpeech - GPM | FedSpeech - GPM - SM

References

[1] Ryuichi Yamamoto, Eunwoo Song, and Jae-Min Kim. Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram. In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 6199–6203. IEEE, 2020.

[2] Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics, pages 1273–1282. PMLR, 2017.

[3] Ching-Yi Hung, Cheng-Hao Tu, Cheng-En Wu, Chien-Hung Chen, Yi-Ming Chan, and Chu-Song Chen. Compacting, picking and growing for unforgetting continual learning. In Advances in Neural Information Processing Systems, pages 13669–13679, 2019.