It is. I fully expect that at some point we'll get in-game MMORPG player characters that are emotively very effective. The immersion will be pretty intense at that point, especially in VR, but even in a third-person view like WoW's, using a webcam to process your facial expressions.
Another interesting application would be "zoom" meetings where everyone is shown around the table or in the audience and it processes their emotive state in real time. That could help speakers engage with the audience in a better way.
Of course, the "bad" uses of this tech are pretty out there too, from porn apps to ripping people off with a fake "zoom" call from a relative.
> Another interesting application would be "zoom" meetings where everyone is shown around the table or in the audience and it processes their emotive state in real time. That could help speakers engage with the audience in a better way.
See Permutation City[0]. Another application is selectively masking reactions. There are some interesting aspects that play out with respect to this, along with some other interesting tech that people will now view as near sci-fi. VR identity cloning is a common sci-fi plot, not specifically a Permutation City thing. Great book, highly recommend.
The big roadblock to commercialisation for the moment is the original capture — for the paper they used a 110-camera capture rig under ideal lighting conditions.
In the above podcast Zuckerberg mentions that in the future people will be able to use their phones to do that same capture, but I don’t think that tech is coming next year.
I wonder if there’ll be an interim period where people who want high-quality avatars will have to book an appointment.
I didn’t expect Gaussian splats to be so good at approximating geometry. It’s cool when you see a new foundational approach to something that’s been done a certain way for decades.
The original NeRF paper had a voxel-like grid, but instead of regular voxels with a single color per grid cell, they had an MLP in each cell that models the color as a function of position and angle.
The literal definition of a NeRF in the original paper is a continuous, non-discretized vector field, paired with a sampling-based render function.
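To make the "continuous field plus sampling-based render function" idea concrete, here is a rough sketch, not the paper's code. `toy_field` is a made-up stand-in for the learned network (a solid ball of constant density and color), and the renderer takes evenly spaced midpoint samples along one ray for simplicity; the paper's sampling strategy is more elaborate.

```python
import math

def toy_field(position, direction):
    """Hypothetical stand-in for the learned field: returns (density, rgb).

    A real NeRF would query a trained MLP here; this toy version is a
    unit ball of constant density and color, ignoring view direction.
    """
    x, y, z = position
    inside = x * x + y * y + z * z < 1.0
    sigma = 5.0 if inside else 0.0
    return sigma, (1.0, 0.5, 0.2)

def render_ray(field, origin, direction, near, far, n_samples):
    """Accumulate color along one ray by sampling the continuous field."""
    step = (far - near) / n_samples
    transmittance = 1.0          # fraction of light not yet absorbed
    color = [0.0, 0.0, 0.0]
    for i in range(n_samples):
        t = near + (i + 0.5) * step          # midpoint of the i-th interval
        point = tuple(o + t * d for o, d in zip(origin, direction))
        sigma, rgb = field(point, direction)
        alpha = 1.0 - math.exp(-sigma * step)  # opacity of this interval
        weight = transmittance * alpha
        for k in range(3):
            color[k] += weight * rgb[k]
        transmittance *= 1.0 - alpha
    return tuple(color), transmittance

# One ray through the ball, one ray that misses it.
hit_color, hit_t = render_ray(toy_field, (0.0, 0.0, -3.0), (0.0, 0.0, 1.0), 0.0, 6.0, 60)
miss_color, miss_t = render_ray(toy_field, (5.0, 0.0, -3.0), (0.0, 0.0, 1.0), 0.0, 6.0, 60)
```

The point is that nothing in the field itself is a grid: the field can be queried at any position, and discretization only shows up in where the renderer chooses to sample.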
Most of the time they aren't that good at approximating geometry. They are good at approximating the "appearance" of geometry. However, many regularization techniques and priors can be introduced to make the Gaussian splatting technique better at geometry approximation.
The rotating avatars have some uncanniness to them because the eye gaze doesn't fixate on a target as the avatar moves around. I think slowly rotating the camera around the model would have worked better. Perhaps also add some background elements so it's clear that the camera is moving, not the avatar.
I'm hoping this technique can be used in video games because it's significantly better than what we have now.
It will get updated. This is pretty common in research papers. They're just getting the link attached because it's easier to update the repo than the paper.
The actual rotating avatar videos still have extremely poor approximations of human musculature, especially at the eyes and jaw (because these are naturally hollow surface meshes).
Is there research to overlay these models on more representative facial muscles?
People can lie, and people can send emails that look like they're from your boss. People now know deepfakes are a thing and are wary of suspicious online meetings where their boss acts differently than usual, etc. It's not as big of a threat as people want to make it out to be.
People routinely fall for lies and get phished from those emails that look like they're from your boss. Every year there are a handful of high profile tech companies that get hacked because someone you would think should know better falls for a phishing scam. I think this is a bigger threat than people are making it out to be.
People get their company accounts compromised all the time.
It's one thing to get a poorly worded email from your CFO asking for company bank info, but it's a whole other thing to be asked over a Zoom video call by someone you think is the right person but who is actually a fake Gaussian splat avatar.
There's already precedent for scammers doing similar things using voice deepfaking over phone calls. This could be a whole new level of phishing.
Not really. There are many things named after Gauss, but a "Gaussian" almost always means a probability distribution / density function that is very well understood and defined (and common).
Seems like they just mean the vector version of a Gaussian function: f(r) = exp(-r•r). Basically a "bell curve" except in 3D so it's a ball that's dense at the center and dies off. Then the optimizer might learn to produce an intensity, offset, and width for each point in a cloud, so the A,B,C for A*f((r-B)/C) at each point or something.
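As a quick sketch of that guess (and only of that guess), here is the isotropic form A*f((r-B)/C) written out, with a cloud of blobs simply summed. The function names and the summing are my own illustration; the actual splatting method uses anisotropic Gaussians with more per-point parameters and blends them differently.

```python
import math

def gaussian(r):
    """f(r) = exp(-r.r) for a 3D vector r given as a tuple."""
    return math.exp(-sum(x * x for x in r))

def blob(r, intensity, center, width):
    """A * f((r - B) / C): one ball-shaped Gaussian blob.

    intensity is A, center is B (a 3D tuple), width is C (a scalar).
    """
    scaled = tuple((x - b) / width for x, b in zip(r, center))
    return intensity * gaussian(scaled)

def density(r, blobs):
    """Sum the contribution of every (A, B, C) blob in the cloud at r."""
    return sum(blob(r, a, b, c) for a, b, c in blobs)

# A tiny two-point "cloud": each entry is (A, B, C).
cloud = [
    (2.0, (0.0, 0.0, 0.0), 1.0),
    (1.0, (3.0, 0.0, 0.0), 0.5),
]
at_first_center = density((0.0, 0.0, 0.0), cloud)
```

Each blob peaks at its own center with value A and falls off with distance, faster when C is small, which is the "dense at the center and dies off" picture.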
https://shunsukesaito.github.io/rgca/
Here's the discussion for it (empty as of right now): https://news.ycombinator.com/item?id=38554537