Here's a recent project [0] where you can train the model with 10s of audio and then convert any text to speech (all doable in the browser). I tried the Google Colab demo [1], and its performance fluctuates with the training audio sample you give it, so you might need some trial and error to find the sweet spot.
Also, the model is not saved in the browser with Colab, so you might want to run it locally to save it eventually (if it comes to that).
All the best mate!
[0] Main repo: https://github.com/CorentinJ/Real-Time-Voice-Cloning
[1] Google Colab repo to try it out: https://github.com/CorentinJ/Real-Time-Voice-Cloning/blob/ma...