I wonder why they don't mention the audio quality of the output. As far as I know, the best models work on magnitude spectrograms and have trouble recreating the phase information, so the phase has to be estimated afterwards with sub-par algorithms like Griffin-Lim.
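
To make the phase issue concrete, here is a toy sketch of my own (not from the article, and not Griffin-Lim itself). It assumes a made-up 8-sample frame and a naive DFT, and shows that two different waveforms can share exactly the same magnitude spectrum, which is why magnitudes alone don't pin down the audio:

```python
import cmath
import math


def dft(signal):
    """Naive O(n^2) discrete Fourier transform, for illustration only."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def magnitudes(signal):
    """Magnitude spectrum: the phase of each DFT bin is thrown away."""
    return [abs(c) for c in dft(signal)]


# Hypothetical toy frame: a short decaying click.
frame = [1.0, 0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0]
# The same samples, circularly delayed by three positions.
shifted = frame[-3:] + frame[:-3]

# Compare the two magnitude spectra bin by bin.
same_magnitude = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(magnitudes(frame), magnitudes(shifted))
)

print("magnitude spectra match:", same_magnitude)
print("waveforms identical:", frame == shifted)
```

The delay only changes the phase of each bin, so a model that predicts magnitudes can't tell these two frames apart. As I understand it, Griffin-Lim tries to recover a plausible phase by iterating between the time and frequency domains, but that is an estimate rather than the original phase.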