Disentangled Speaker Representations in Neural Text-to-Speech Synthesis, based on Facebook's VoiceLoop model. I use four architectures: