Gaussian Head Avatar: Ultra High-Fidelity Head Avatar via Dynamic Gaussians (yuelangx.github.io)
175 points by phil9l on Dec 8, 2023 | 44 comments


This is excellent! A similar paper on avatars from Meta was published two days ago too:

https://shunsukesaito.github.io/rgca/

Here's the discussion for it (empty as of right now): https://news.ycombinator.com/item?id=38554537


It is. I fully expect that at some point we'll get "in game" MMORPG player characters that are emotively very effective. The immersion level will be pretty intense at that point, especially in VR, but even a third-person view like WoW could use a webcam to process your facial expression.

Another interesting application would be "zoom" meetings where everyone is shown around the table or in the audience and it processes their emotive state in real time. That could help speakers engage with the audience in a better way.

Of course the "bad" uses of this tech are pretty out there too, from porn apps to ripping people off with a fake "zoom" call from a relative.


> Another interesting application would be "zoom" meetings where everyone is shown around the table or in the audience and it processes their emotive state in real time. That could help speakers engage with the audience in a better way.

See Permutation City[0]. Another application is actually masking reactions selectively. There are some interesting aspects that play out with respect to this and some other tech that people will now view as near sci-fi. The VR identity cloning is a common sci-fi plot, not specifically a Permutation City thing. Great book, highly recommend.

[0] https://en.wikipedia.org/wiki/Permutation_City


Star Citizen has a very basic version of FoIP; it does help convey emotive state, even if it's a little janky.

Add in stuff like head tracking and you get a very natural feeling to interacting. Give me haptics and I'll be super happy.

Hand gestures go a long way too


I think the tech in that paper was demoed for this podcast: https://youtu.be/MVYrJJNdrEg

The big roadblock to commercialisation for the moment is the original capture — for the paper they used a 110-camera capture rig under ideal lighting conditions.

In the above podcast Zuckerberg mentions that in the future people will be able to use their phones to do that same capture, but I don’t think that tech is coming next year.

I wonder if there’ll be an interim period where people who want high-quality avatars will have to book an appointment.


There was also this project that was posted a couple days ago: https://blog.metaphysic.ai/controllable-deepfakes-with-gauss...

Why they call a virtual avatar a deepfake beats me though....


> Why they call a virtual avatar a deepfake beats me though....

What's the difference? In both cases you're simulating somebody's face in a way that doesn't actually require having the original to drive it.


If you can "wear" the face of someone else then that seems like deepfake territory.


I didn’t expect Gaussian splats to be so good at approximating geometry. It’s cool when you see a new foundational approach to something that’s been done a certain way for decades.


Someone mentioned previously that Gaussian splats are the ideal extension of AI image generation into 3D, and they might have turned out to be correct.


They’re basically voxels after we abandon Cartesian 3-dimensionality (which imposes cubes)


More like point clouds since they are not constrained to a grid.

NeRF is more like voxels though.


What do you mean, NeRF is like voxels? It is very much NOT like voxels.


The original NeRF paper had a voxel-like grid, but instead of regular voxels with a single color per grid cell, they had an MLP in each cell that models the color as a function of position and angle.


The literal definitions of a NeRF and its rendering function in the original paper are a continuous, non-discretized vector field and a sampling-based rendering function.
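Roughly, it works like the sketch below (a paraphrase in NumPy, not the authors' code; the `mlp` call is a stand-in for the trained network returning density and color): you query one continuous function at sampled positions along each camera ray and alpha-composite the results, with no grid or cells anywhere.

    import numpy as np

    def render_ray(mlp, origin, direction, t_near=0.0, t_far=4.0, n_samples=64):
        # Sample continuous depths along the ray -- no voxel grid involved.
        t = np.linspace(t_near, t_far, n_samples)
        points = origin + t[:, None] * direction            # (n_samples, 3)

        # One network for the whole scene, queried at arbitrary positions;
        # returns density sigma and view-dependent RGB per sample (assumed API).
        sigma, rgb = mlp(points, direction)                  # (n,), (n, 3)

        # Standard volume-rendering quadrature (alpha compositing).
        delta = np.append(np.diff(t), 1e10)                  # spacing between samples
        alpha = 1.0 - np.exp(-sigma * delta)
        trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
        weights = trans * alpha
        return (weights[:, None] * rgb).sum(axis=0)          # final pixel color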


I'm sorry, you're right. I got confused with some other papers like Plenoxels that use voxels.


Most of the time they aren't that good at approximating geometry. They are good at approximating the "appearance" of geometry. However, many regularization techniques and priors can be introduced to make the Gaussian splatting technique better at geometry approximation.


The rotating avatars have some uncanniness to them due to the eye gaze not fixating on a target as the camera moves around. I think slowly rotating the camera around the model would have worked better. Perhaps also add some background elements so it's clear that the avatar isn't moving, the camera is.

I'm hoping this technique can be used in video games because it's significantly better than what we have now.


Why would they even include a "Code" link if it's just an empty GitHub repo with a README?


It will get updated. This is pretty common in research papers. They're just getting the link attached because it's easier to update the repo than the paper.


Stills look great, fidelity is there ...

The actual rotating avatar videos still have extremely poor approximations of human musculature, especially at the eyes and jaw (because these are hollow surface meshes, naturally).

Is there research to overlay these models on more representative facial muscles?


And now:
- the 4-hr Work Week toolkit is complete.
- deepfakes are now just regular fakes.


When can we send our gaussian heads to the Zoom meeting?


Excited for the next generation of Personas for the Vision Pro, etc :-)


How can we trust photos now? Fiction or reality, it's becoming harder to differentiate


I think that point has come and gone. Not sure how society will adapt - I really hope smart people are out there working on this sort of stuff.


At which point in the future (or past) will it be accepted theory that the singularity already happened some 20 years ago?


We will get a whole new level of fake news.

Online meetings for important things are now insecure because you can't be sure the other people are who they claim to be.


People can lie, and people can send emails that look like they're from your boss. People now know deepfakes are a thing and have some immunity against trusting suspicious online meetings where your boss acts differently than usual. Etc. It's not as big of a threat as people want to make it out to be.


People routinely fall for lies and get phished by those emails that look like they're from your boss. Every year a handful of high-profile tech companies get hacked because someone you would think should know better falls for a phishing scam. I think this is a bigger threat than people are making it out to be.


We will get "recordings" where people plot the great reset.

We already got fake audio of news anchors apologizing for years of lies.

We will get a lot more of that.


Security for a video call comes from the user account, not visual verification.


People get their company accounts compromised all the time.

It's one thing to get a poorly worded email from your CFO asking for company bank info, but it's a whole other thing to be asked over a Zoom video call by someone you think is the right person but who is actually a fake Gaussian splat avatar.

There's already precedent for scammers doing similar things using voice deepfaking over phone calls. This could be a whole new level of phishing.


Should be, but people can easily be fooled.


"represented by controllable 3D Gaussians"

They just assume everyone knows what they mean with "3D Gaussian".


I don't think a research paper is meant to be understood by everyone, and I imagine the authors don't have that expectation either.


No, they assume their peers do.


It is academia ;-)


I actually have the same gripe. Unfortunately there is a long history of academic fields naming things like this (there's an entire Wikipedia page of things named after Gauss: https://en.wikipedia.org/wiki/List_of_things_named_after_Car...)


Not really. There are many things named after Gauss, but a "Gaussian" almost always means a probability distribution / density function that is very well understood and defined (and common).


Seems like they just mean the vector version of a Gaussian function: f(r) = exp(-r•r). Basically a "bell curve" except in 3D so it's a ball that's dense at the center and dies off. Then the optimizer might learn to produce an intensity, offset, and width for each point in a cloud, so the A,B,C for A*f((r-B)/C) at each point or something.


This is in fact what the optimizer does. At least in the original paper, the model learns to scale and rotate the Gaussians.
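A minimal sketch of evaluating one such anisotropic 3D Gaussian (parameter names are mine, not from the paper): the covariance is built from a learned rotation and per-axis scales, and the density falls off with the Mahalanobis distance from the learned center.

    import numpy as np
    from scipy.spatial.transform import Rotation

    def splat_density(x, center, quat, scales, opacity):
        # Covariance = R S S^T R^T, with R from a learned quaternion and
        # S a diagonal matrix of learned per-axis widths.
        R = Rotation.from_quat(quat).as_matrix()
        S = np.diag(scales)
        cov = R @ S @ S.T @ R.T
        d = np.atleast_2d(x) - center                            # offset from mean
        m = np.einsum('ij,jk,ik->i', d, np.linalg.inv(cov), d)   # squared Mahalanobis distance
        return opacity * np.exp(-0.5 * m)                        # learned opacity scales the falloff

    # Example: a blob stretched along x, rotated 45 degrees about z.
    quat = Rotation.from_euler('z', 45, degrees=True).as_quat()
    print(splat_density(np.array([0.1, 0.0, 0.0]),
                        center=np.zeros(3), quat=quat,
                        scales=np.array([0.3, 0.05, 0.05]), opacity=0.8))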


Take a linear algebra course or read a textbook before trying to read and understand cutting edge ML research!


Rude!



