Tech starts about a third of the way down the article (ctrl+f "clean speech")
> My lab was the first, in 2001, to design such a filter, which labels sound streams as dominated by either speech or noise. With this filter, we would later develop a machine-learning program that separates speech from other sounds based on a few distinguishing features, such as amplitude (loudness), harmonic structure (the particular arrangement of tones), and onset (when a particular sound begins relative to others).
> Next, we trained the deep neural network to use these 85 attributes to distinguish speech from noise.
> One important refinement along the way was to build a second deep neural network that would be fed by the first one and fine-tune its results. While that first network had focused on labeling attributes within each individual time-frequency unit, the second network would examine the attributes of several units near a particular one
> Even people with normal hearing were able to better understand noisy sentences, which means our program could someday help far more people than we originally anticipated
> There are, of course, limits to the program’s abilities. For example, in our samples, the type of noise that obscured speech was still quite similar to the type of noise the program had been trained to classify. To function in real life, a program will need to quickly learn to filter out many types of noise, including types different from the ones it has already encountered
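The quoted pipeline — label each time-frequency unit as speech or noise, then refine each label using neighbouring units via a second network — can be sketched in miniature. Below is a toy stand-in (a simple energy threshold replaces the trained deep networks, and all function names and thresholds are invented for illustration), just to show the two-stage structure the article describes:

```python
import numpy as np

def stft_mag(signal, frame=256, hop=128):
    """Magnitude spectrogram: rows = frequency bins, cols = time frames.
    Each cell is one 'time-frequency unit' in the article's terms."""
    window = np.hanning(frame)
    frames = []
    for start in range(0, len(signal) - frame + 1, hop):
        frames.append(np.abs(np.fft.rfft(signal[start:start + frame] * window)))
    return np.array(frames).T  # shape: (freq_bins, time_frames)

def first_stage_mask(mag, threshold_db=-20.0):
    """Toy per-unit classifier: call a unit 'speech' when its energy is
    within threshold_db of the strongest unit. This stands in for the
    first deep network, which labels each unit in isolation."""
    db = 20.0 * np.log10(mag + 1e-12)
    return (db > db.max() + threshold_db).astype(float)

def second_stage_refine(mask, context=1):
    """Toy second stage: re-label each unit by majority vote over its
    neighbourhood, mimicking the refinement network that examines the
    attributes of several nearby units rather than one in isolation."""
    f, t = mask.shape
    refined = np.zeros_like(mask)
    for i in range(f):
        for j in range(t):
            f0, f1 = max(0, i - context), min(f, i + context + 1)
            t0, t1 = max(0, j - context), min(t, j + context + 1)
            refined[i, j] = 1.0 if mask[f0:f1, t0:t1].mean() > 0.5 else 0.0
    return refined
```

Applying the resulting binary mask to the spectrogram and resynthesizing is what recovers the "clean speech" estimate; the real system replaces both threshold and vote with trained networks over the 85 extracted attributes.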
As someone with a cochlear implant who lives with the consequences of overly clever programmers who thought they'd "help" by filtering out noise and volume and whatever else... I really wish they wouldn't. This is a technology that makes me so angry some days that I sometimes wonder if it was worth getting implanted, even though I know it was.
This is something I do wonder about, in this context. I don't have a CI myself, but my 6 year old son does, and I am somewhat concerned that he is-or-might be experiencing partial sound "blindness" (meaning: sure speech processing is adequate but there are surely some things that are processed away). I have a fair amount of experience in music/sound-recording environments and it makes me somewhat sad for him that he's still "missing out" (although obviously this is outweighed by the fact that he can actually hear and communicate now, but I'm sure you get what I mean).
I'd get into the area myself if I were in any way useful with DSP code or C++...
May I ask, were your hearing issues (leading to the CI) a recent thing, or long-term? My main interest here is about using machine learning to assist people who do not know sign-language to understand signers rather than to "improve" the actual hearing process (because - personally - my S/L skills are abysmal).
Interesting idea on using ML to help non-signers understand sign language. In this thread's context, the ML is designed to help people hear better. In visual contexts (which sign language lives in) would this hypothetical ML help low vision or blind people see better?
Because people with 20/20 vision just need a sign language dictionary handy and some patience.
> Interesting idea on using ML to help non-signers understand sign language.
The dream for me is something like Google Glass with an app that can subtitle spoken, written, and signed language.
> Because people with 20/20 vision just need a sign language dictionary handy and some patience.
I would think a LOT of patience... the easiest way at that point would just be to have the other person fingerspell or write what they're saying; if you're watching something where that's not possible, then the dictionary will just be an exercise in frustration.
Yeah, well that would be one way of handling it, but unfortunately the real world has terrible issues with not impeding my progress on that front. Not that I'm anti-learning, at all, but - personally - I'm fighting a losing battle against learning German, Swiss-German and Swiss-German Sign-language whilst also being a walking-talking-english-lesson :D
Taking the slow way, with dictionary in hand, is, as you point out, an exercise in frustration (especially if the talker/signer is 6 years old).
Yes, I share your dream of something google-glass-like that can add subtitles. There are people working on this (mostly in the UAE, if memory serves). Interesting times ahead - hopefully I won't have to wait long, otherwise I'll have to do it myself and that really would take a while ;)
Long term; I have progressive sensorineural hearing loss that was noticed when I was about two and had reached "profound" levels by the time I was about ten. Implanted in my left ear at 16.
I actually made the decision to be a part of the normal public school system and never learned sign language, partially out of stubbornness, so I can't really help with SL-related questions, though I'd be more than glad to answer hearing/CI-related questions.
Which one have you got? I know what you mean with the "helpful" bullshit. My conventional hearing aid, on my left ear, has this "smart" mode where it tries to detect speech vs noise, and change the volume or the directionality of the microphone to compensate. You end up with this wildly fluctuating volume all the time where it feels like stationary objects are coming at you. I had them turn that feature off asap.
On the other hand, my cochlear implant (right side), has a directional microphone that's actually incredibly useful in noisy situations. Combined with the directional mic on the hearing aid, I can actually hear almost as well as a normal person in a crowded bar, after 20+ years of avoiding them because of how impossible they were to cope with.
I strongly recommend it if you can get it - the Nucleus Freedom 6. I'm saving my pennies up to get a second one.
I have the Advanced Bionics Harmony BTE. Since my implant is AB, I wouldn't be able to get the Nucleus Freedom 6.
I have an in-ear mic, which does wonders for reducing surrounding noises and also for letting me use a phone normally, but my main issue is with the software itself; I've had issues with it since implantation and they've always been pooh-poohed by audiologists at Hopkins, Tokyo University, and Toranomon. The biggest problem is that it seems to operate on some kind of averaging system -- when there's a noise that's louder than the recent average, everything just cuts out for a few seconds. This is especially noticeable in the morning, when I've just woken up and am trying to get to work, but cars and trains etc. are making noise, so my hearing cuts in and out constantly, which not only drives me up the wall but gives me a terrible headache.
I have had so many problems with the implant in general that are just brushed off as "well, you're unusual." It won't even stay on my head without me putting a few extra magnets on the headpiece.
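The "cuts out after a loud noise" behaviour described above is exactly what a naive averaging-based gain control produces. A purely illustrative toy (this is not any manufacturer's actual algorithm, and the parameter names are invented):

```python
import numpy as np

def averaging_agc(signal, target=0.1, alpha=0.999):
    """Toy automatic gain control: track a running average of loudness
    and scale the output toward a fixed target level. A loud burst
    inflates the average, so quieter sound arriving right after the
    burst gets attenuated until the average decays back down -- the
    'everything cuts out for a few seconds' effect."""
    avg = target
    out = np.empty_like(signal)
    for i, x in enumerate(signal):
        avg = alpha * avg + (1.0 - alpha) * abs(x)
        out[i] = x * (target / max(avg, 1e-9))
    return out
```

With `alpha` close to 1, the average decays slowly, so the suppression window after a passing car or train lasts for seconds rather than milliseconds.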
> overly clever programmers who thought they'd "help" by filtering out noise and volume and whatever else
I used to work at a CI manufacturer. Just thought you should know that this isn't what happens: all the features are developed by researchers or experts before even getting to the engineers, and go through clinical trials (usually several) to prove their effectiveness. They don't, and can't, add new sound processing on a whim to be 'clever'.
If the sound processing on your device is making you frustrated you should definitely discuss it with your audiologist. A lot of features can be configured or disabled completely.
> I used to work at a CI manufacturer. Just thought you should know that this isn't what happens
I didn't actually think it was; it's just a bit of annoyed snark because this has severely impacted my QOL for a decade. :)
> If the sound processing on your device is making you frustrated you should definitely discuss it with your audiologist. A lot of features can be configured or disabled completely.
Tried it several times; she was unconvinced that it was actually affecting me. Got a new audiologist; she was convinced that the setting couldn't be changed. Moved to Japan, got a new audiologist: convinced that I'm imagining it and doesn't think the setting can be changed anyway.
I hear that a long time ago, people used to be able to buy the adapters to program their processors themselves.
That last part seems to really put a damper on things, since the problem the author describes is that a person with a hearing aid requires speakers to take turns. Apparently, when people speak together, the multiple voices clash. Even if the hearing aid amplifies voices only, that problem remains.
Still, cooler than a lot of things Deep Learning is being applied to these days.