FedSpeech - Index

About Page

This page compares audio samples from all methods discussed in the paper:

FedSpeech The proposed method
GT The ground-truth audio from VCTK
GT (Mel + PWG) The ground-truth audio converted to mel-spectrograms and then back to audio using Parallel WaveGAN [1] (PWG)
Scratch Learning each task independently from scratch
Finetune Fine-tuning from a randomly selected previous model, with the process repeated 5 times (for task 1, Finetune is equivalent to Scratch)
FedAvg [2] Aggregating local information (e.g., gradients or model parameters) to train a global model
CPG [3] A parameter isolation method used in continual learning
FedSpeech - SM FedSpeech with the selective mask removed
FedSpeech - GPM FedSpeech with the gradual pruning mask removed
FedSpeech - GPM - SM FedSpeech with both masks removed
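
For readers unfamiliar with the FedAvg baseline listed above, the aggregation step can be illustrated with a minimal sketch. This is not the code used for the samples on this page: the function name `fedavg`, the plain-list parameter layout, and the weighting by local sample count are assumptions made for illustration, following the general idea of averaging client parameters described in [2].

```python
from typing import Dict, List


def fedavg(
    client_params: List[Dict[str, List[float]]],
    client_sizes: List[int],
) -> Dict[str, List[float]]:
    """Average client parameters, weighting each client by its sample count.

    Hypothetical helper for illustration only; parameters are flat lists
    of floats keyed by parameter name.
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    averaged: Dict[str, List[float]] = {}
    for name in client_params[0]:
        length = len(client_params[0][name])
        # Weighted sum over clients for each scalar entry, then normalize.
        averaged[name] = [
            sum(p[name][i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Two hypothetical clients holding 1 and 3 local samples respectively.
global_params = fedavg(
    [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}],
    [1, 3],
)
print(global_params)
```

In this sketch the client with more data pulls the global parameters toward its own values, which is the behaviour that distinguishes weighted averaging from a plain mean.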

Audio Quality and Speaker Similarity Experiments

Example 1

1st speaker in the paper

Text: The Greeks used to imagine that it was a sign from the gods to foretell war or heavy rain.

GT	GT (Mel + PWG)	Multi-task

FedSpeech	CPG	FedAvg

Scratch	Finetune

Example 2

4th speaker in the paper

Text: The difference in the rainbow depends considerably upon the size of the drops, and the width of the colored band increases as the size of the drops increases.

GT	GT (Mel + PWG)	Multi-task

FedSpeech	CPG	FedAvg

Scratch	Finetune

Example 3

9th speaker in the paper

Text: Six spoons of fresh snow peas, five thick slabs of blue cheese, and maybe a snack for her brother Bob.

GT	GT (Mel + PWG)	Multi-task

FedSpeech	CPG	FedAvg

Scratch	Finetune

Audio Samples for Ablation Studies

Example 1

1st speaker in the paper

Text: The Greeks used to imagine that it was a sign from the gods to foretell war or heavy rain.

FedSpeech

FedSpeech - SM	FedSpeech - GPM	FedSpeech - GPM - SM

Example 2

4th speaker in the paper

Text: The difference in the rainbow depends considerably upon the size of the drops, and the width of the colored band increases as the size of the drops increases.

FedSpeech

FedSpeech - SM	FedSpeech - GPM	FedSpeech - GPM - SM

Example 3

9th speaker in the paper

Text: Six spoons of fresh snow peas, five thick slabs of blue cheese, and maybe a snack for her brother Bob.

FedSpeech

FedSpeech - SM	FedSpeech - GPM	FedSpeech - GPM - SM

References

[1] Ryuichi Yamamoto, Eunwoo Song, and Jae-Min Kim. Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram. In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 6199–6203. IEEE, 2020.

[2] Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics, pages 1273–1282. PMLR, 2017.

[3] Ching-Yi Hung, Cheng-Hao Tu, Cheng-En Wu, Chien-Hung Chen, Yi-Ming Chan, and Chu-Song Chen. Compacting, picking and growing for unforgetting continual learning. In Advances in Neural Information Processing Systems, pages 13669–13679, 2019.

FedSpeech: Federated Text-to-Speech with Continual Learning
