You may be interested in Jasper [1]. It works with many different TTS and STT engines, and its API is also really easy to use.
[1] https://jasperproject.github.io/
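
To give a rough idea of what "easy" means here: as far as I recall from the developer docs, a Jasper module is a plain Python file that exposes a `WORDS` list, an `isValid(text)` function, and a `handle(text, mic, profile)` function that replies through `mic.say()`. Treat the sketch below as an illustration under that assumption and check the names against the current docs. The greeting itself and the `first_name` profile key are just examples.

```python
import re

# Words this module wants the STT engine to be able to recognize.
WORDS = ["HELLO"]


def isValid(text):
    """Return True if the transcribed text should be routed to this module."""
    return bool(re.search(r"\bhello\b", text, re.IGNORECASE))


def handle(text, mic, profile):
    """Reply to the user; mic.say() is assumed to speak via the configured TTS engine."""
    # "first_name" is an assumed profile key; fall back to a generic greeting.
    name = profile.get("first_name", "there")
    mic.say("Hello, %s!" % name)
```

Because the module only talks to the `mic` object it is handed, the STT and TTS backends can be swapped in the configuration without touching module code.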