Hacker News

The bottleneck is the annotations: there's no easy way to annotate "emotions" on the scale of data needed to have the model learn the necessary verbal tics.

In contrast, the image data used for image generation models is, in most cases, very thoroughly annotated with the intended content.



Oh yeah, the annotations are lacking compared to images. Again from the academic side, I think one solution could be to recruit theater majors who are just learning about 'verbing their lines' and set up a collaboration between CS and Theater to produce a proof-of-concept dataset (since an acting class won't have more than 20-30 students in it). You'd need significantly more annotations, but you'd now have some labels to ascribe to texts with context, since it's a dialogue involving one or more individuals.


I wonder how theatre students will feel about helping to train an AI to produce theatrical TTS? Artists seem pretty mad about their work being used to automate artwork.


There is a lot of video content with audio. We can train a facial expression classification model to detect the speaker's emotion (we can also use a multimodal model to take the language context into consideration).

Another potential source of data is voice acting scripts for animation. I always thought the storyboards of films/animations could be great annotated training data, but there seem to be no open datasets, probably because of copyright issues.


Just run an LLM in sentiment analysis mode to annotate.


That doesn't factor in line delivery. You can have the words say/mean one thing (e.g. "I'm fine.") and the delivery say/mean another (defensive, distraught, etc.).

It also does not account for where stresses, emphasis, pauses, etc. are placed to enhance the delivery of a given text.
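
As a rough sketch of that gap: the word lists below are a toy stand-in for a text-only sentiment model, and the `DeliveryAnnotation` fields are names I made up for illustration, not taken from any real tool or dataset. The point is only that two very different readings of the same line get the same text label.

```python
from dataclasses import dataclass, field

# Toy word lists standing in for a text-only sentiment model (assumption).
POSITIVE_WORDS = {"fine", "good", "great"}
NEGATIVE_WORDS = {"bad", "awful", "terrible"}


def text_sentiment(text: str) -> str:
    """Label a line using only its words, ignoring how it is spoken."""
    words = {w.strip(".,!?\"'").lower() for w in text.split()}
    if words & POSITIVE_WORDS:
        return "positive"
    if words & NEGATIVE_WORDS:
        return "negative"
    return "neutral"


@dataclass
class DeliveryAnnotation:
    """Hypothetical per-line record of what the audio adds to the text."""
    text: str
    delivery: str                                        # e.g. "defensive"
    stressed_words: list = field(default_factory=list)   # indices of stressed words
    pauses_after: dict = field(default_factory=dict)     # word index -> pause in seconds


# Two readings of the same words with different deliveries.
readings = [
    DeliveryAnnotation("I'm fine.", "defensive", stressed_words=[1]),
    DeliveryAnnotation("I'm fine.", "distraught", pauses_after={0: 0.6}),
]

# The text-only labeler cannot tell the two readings apart.
labels = {text_sentiment(r.text) for r in readings}
```

Here `labels` ends up with a single entry even though the two `delivery` values differ, so the stress and pause information has to come from somewhere other than the transcript.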

How do you get sentiment analysis to properly annotate an audiobook that has a dramatic reading, or something akin to the narration of the Game of Thrones or Harry Potter books, where the narrators switch characters, accents, and mannerisms to portray the written content?



