
I have an open source web service for rapidly recording lots of text prompts to FLAC: https://speech.talonvoice.com (right now the live site prompts for single words because I’m trying to build single-word training data, but the prompts can be any length)

You can set it up yourself with a bit of Python knowledge from this branch: https://github.com/talonvoice/noise/tree/speech-dataset

There are keyboard shortcuts - up/down/space to move through the list and record quickly.

If you want to use it on arbitrary text prompts, you can modify this function to return each line from a text file: https://github.com/talonvoice/noise/blob/speech-dataset/serv...
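As a rough sketch of what that change could look like (the name load_prompts and the file layout are my own assumptions here, not what the repo's function is actually called or how it is wired in), reading one prompt per line might be as simple as:

```python
from pathlib import Path


def load_prompts(path):
    """Return one prompt per non-empty line of a UTF-8 text file.

    Hypothetical helper: the real function in the repo may have a
    different name and signature, so adapt this to fit it.
    """
    text = Path(path).read_text(encoding="utf-8")
    # Strip surrounding whitespace and skip blank lines.
    return [line.strip() for line in text.splitlines() if line.strip()]
```

You would then return that list from the function linked above instead of the built-in word list.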

If you use this, before recording too much, do some test recordings and make sure they sound ok. Web audio can be unreliable in some browsers.

The uploaded files are named after the short name, so make sure you can map each short name back to its original text prompt, e.g. with string_to_shortname().
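One way to keep that mapping around is to build a lookup table before you start recording. This is only a sketch: placeholder_shortname below is a made-up stand-in so the example runs on its own, and it does not reproduce what the repo's string_to_shortname() does. Pass the real function in as the shortname argument.

```python
import re


def placeholder_shortname(text):
    # Hypothetical stand-in, NOT the repo's string_to_shortname():
    # lowercase the prompt and collapse non-alphanumeric runs to "_".
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")


def build_shortname_index(prompts, shortname=placeholder_shortname):
    """Map each short name back to the prompt it came from.

    Raises ValueError if two different prompts share a short name,
    since their recordings could not be told apart by filename.
    """
    index = {}
    for prompt in prompts:
        key = shortname(prompt)
        if key in index and index[key] != prompt:
            raise ValueError(
                f"short name {key!r} maps to both {index[key]!r} and {prompt!r}"
            )
        index[key] = prompt
    return index
```

Failing early on a collision is deliberate: it is much cheaper to find out before recording than after.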

If you aren’t easily able to do this yourself, I’d be happy to spin up an instance of it for you with text prompts of your choosing.



Somewhat OT question: after taking a quick look at this I stumbled on the eye-tracking video you made using pops to click... and I'm curious, can eye trackers not detect and report blinking?

Also, I noted the VLC demo says it doesn't use DNS! That's awesome...


I can detect blinking, yes. However, your eyelids have very small muscles that are not meant to be consciously controlled all day, and I don’t recommend straining them. Eyelids twitching from muscle strain is rather uncomfortable (speaking from experience).

The VLC demo was using macOS speech recognition. In the beta I’m now shipping my own engine and trained models based on Facebook’s wav2letter, which is going pretty well.



