AI companies love to hype up how AI will provide a great benefit to the economy and transform intellectual labor, but I hardly see any discussion of how much damage it will cause to the economy when you can no longer trust that you're on a video call with an actual person. Maybe the person you're interviewing is actually an AI impersonating someone, or maybe they never existed in the first place. Information found online will also no longer be trustworthy: footage of some incident somewhere may have been entirely fabricated by AI, and we already experience misleading articles today.
Money will have to be wasted on unnecessary flights to see stuff or meet people in-person instead of video, and the availability of actual information will become more and more limited as the sea of online information gets polluted with crap. It may never be possible to calculate the full extent of the damage in monetary value.
Partially agree.
However, this problem has existed with scam e-mails since the 90s.
For me the solution is signed e-mails and signed documents. If a person invites me to an online meeting with a signed e-mail, I can trust that it's really them.
Same for footage of wars, etc. The journalist taking it signs the videos and vouches for their authenticity. If it turns out to be AI generated, then we would lose trust in that person and wouldn't use their material anymore.
I think he was referring to a cryptographic signature, possibly using the "web of trust" to get the key. I'm not convinced we need central authority to solve this.
People at my org were gleeful when they learned they could hook LLMs into Slack. Even if we had some reliable, well-used signature system, I think people would just let AI use it to send emails on their behalf.
If the AI age has taught me anything, it's that most people do not care what their output is. They'll put their name on anything, taste or quality does not matter in the least. It's incredibly depressing.
Enshittification never stopped; we just stopped talking about it because it became normal. Quality does not matter anymore. I agree it's depressing, seeing AI slop being pushed and no one even putting in the time or effort to say: this is bad and you should feel bad.
Picture this: your grandma calls you in a panic, and you tell her, "Drop me your public PGP key so I can verify the signature." PGP is dead outside of niche geek circles exactly because key management is basically an unsolvable problem for the average person.
> PGP is dead outside of niche geek circles exactly because key management is basically an unsolvable problem for the average person
Can this problem be solved with better software?
I believe it can; it's just that the average person doesn't need PGP. No demand for software solving this problem, therefore no software for it.
The problem can be solved, say, with a store of known PGP public keys along with their history: where each key was acquired, plus a simple algorithm that calculates trust in the key as a probability of it being valid (or whatever term cryptographers would use here).
You could start with the PGP keys of people you know, getting them as QR codes offline and marking them as "high trust", and then pull the keys stored on their devices (lowering those trust levels accordingly). There are some issues with how to calculate the probability, because when we pull the same key from different sources we can't know whether their reported trust levels are independent variables or not, but I believe you can deal with that by pulling the whole chain of transfers of the key, starting from its owner and ending at your device.
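A minimal sketch of the combination step, assuming the vouching paths really are independent (which, as noted, is the hard part); the function name and the numbers are made up:

```python
# Hypothetical sketch, not a real PGP tool: if each independent path i vouches
# for a key with probability p_i of it being genuine, the key is bogus only if
# every path is wrong, so combined trust is 1 - prod(1 - p_i).

def combined_trust(path_probabilities):
    """Probability the key is genuine, given independent vouching paths."""
    p_all_wrong = 1.0
    for p in path_probabilities:
        p_all_wrong *= (1.0 - p)
    return 1.0 - p_all_wrong

# A key scanned in person as a QR code (0.99), plus the same key pulled
# from two friends' devices at reduced trust (0.7 each):
print(combined_trust([0.99, 0.7, 0.7]))  # ~0.9991
```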
It is just a rough idea of how it could be done. Maybe other solutions are possible. My point is: the ugliness of PGP is a result of PGP being made by nerds, for nerds. There is no demand for PGP-like solutions outside of nerd communities. But maybe LLM-induced corrosion of trust will create that demand?
PGP works if you vouch for keys in person, both of you are honest and can be trusted to act in good faith when not in person, have good key chain and rotation hygiene, and the private keys can't be exfiltrated.
Yeah, there is no silver bullet that solves the problem of trust completely and perfectly. People can lie and we can't make them stop; everything else is just a workaround.
The point of GP was that any such system will require a central authority; PGP shows that you don't need one. I didn't claim that PGP is a perfect or even good-enough solution, just that it exists and works for some people.
> both of you are honest and can be trusted to act in good faith when not in person
I believe that is not strictly necessary for the scheme to work. It is a limitation of OpenPGP and other implementations that they do not allow converting multiple independent observations of a public key (finding it from different sources, or encountering it used to sign messages) into a measure of trust in the key.
It is not a silver bullet either, but it can alleviate the problem and make it tractable.
The only doubt I have is how such a system would stand up against multiple actors trying to undermine it, but I still believe you could get something better than nothing, and probably better than a central authority.
The same way security cameras prove that their recordings are authentic and have not been modified: if modified, the video will no longer match the signature that was generated with it.
> If a person invites me to an online meeting with a signed e-mail, I can trust that it's really them.
In the interview scenario, generating an email signature is hardly beyond what an AI can do.
You have no prior knowledge of this person or their signature; it's not some government-issued ID. It's in essence just random data unless you know the person to be real.
With cash, you can only steal so much (or have transactions of up to certain size) until you run into geographical and physical constraints. With cryptocurrency, it’s possible to lose any amount.
With humans writing scam emails, you can only have so many of them until one blows the whistle. With LLMs, a single person can distribute an arbitrary amount.
At some point, quantity becomes a new quality, and drawing a parallel becomes disingenuous because the new quality has no precedent in human history.
The highlighted parallel is usually drawn between cryptocurrency and cash, not between cryptocurrency and banks. With both cash and cryptocurrency, as is the idea behind the analogy, 1) there’s no intermediary and 2) once it’s gone, it’s gone. Obviously, the banking system is not immune to fraud (not sure why you think I made that claim, unless your definition of “cash” includes electronic transfers), but banks and/or payment systems can (and do) resolve these cases and have certain KYC requirements.
There are people hosting agents online to talk to other agents etc. on their behalf. How difficult is it to just instruct such an agent to do the tasks you mentioned? You're assuming it's done by "bad actors" while it's most likely just going to be done by "everyone" that knows how to do it.
I mean, emails were and still are a huge security risk. Sometimes I'm more scared of employees opening and engaging with emails than I am of anything else.
> Information found online will also no longer be trustworthy
Most information you can access publicly, including Wikipedia, is the result of astroturfing fights. Most information online hasn't been trustworthy for a double-digit number of years now.
> we already experience misleading articles today
Again, this has been happening for decades.
> footage of some incident somewhere may have been entirely fabricated by AI
It's not like we didn't already have doctored footage plaguing the public.
> Money will have to be wasted on unnecessary flights to see stuff or meet people in-person instead of video
The necessity to inspect the supply chain for snake oil has been a thing since at least EA (the Nasir one).
We may now be dealing with these problems at spam scale, but the problems themselves were already there.
All these are true, but just as it happened before the internet, it's accelerating even further. There are clear costs that cannot just be hand waved away.
I'm not sure we can say it's accelerating. The techniques that adversarial actors use have always been changing, and when they shift tactics it can take a while for an adequate defense to be adopted. We're still dealing with SQL injection in the OWASP Top Ten. What I think would indicate an acceleration is the most security-oriented organizations continuously failing to defend against new attacks. If we start hearing about JPMorgan and Google getting popped every month or two, we're in trouble.
The acceleration is in the decrease of the cost to produce misinformation.
Misinformation in pure text form has always been cheapest, but is even cheaper now that text generation is basically a solved problem. Photos have been more expensive, it used to take time and skill with a photo editor to produce a believable image of an event that never happened. The cost is now very low, it's mostly about prompting skills. Fake videos were considerably harder, especially coupled with speech. Just a few years ago I could assume any video I saw was either real or a time-consuming, deliberate fake.
We've now entered a time where fake videos of famous people take actual effort to tell apart, and can be produced for a low cost - something accessible to an individual, not a big corporation. We can have an entirely fake video of Trump, or another world leader, giving a speech and it will look like the real thing, with the audiovisual "tells" of it being fake getting harder to notice every few months.
> The acceleration is in the decrease of the cost to produce misinformation.
So it's a spam issue. And normally, while annoying it's possible to fight spam, however on these topics we have built structures that disable the very mechanisms allowing us to fight spam. That's worrying.
The fact that someone can instruct their computer to astroturf their flight-tracking app on some forum for nerds is irrelevant: people have been instructing "marketing agencies" to astroturf their brand of caffeinated sugar water on TV, radio, and press for decades, if not centuries. For a very long time, the "traditional media" were aware that their ability to sell astroturfing capacity hung on their general trustworthiness. Then the internets rose to prominence, and traditional media followed by selling more and more of their capacity to astroturfers. Now we have a worrying situation where the internets might be spammed by astroturfers a bit too much, but the backup is already broken. Now that's truly frightening.
Welcome to the post-truth world, where objective references outside of your own village cannot exist.
It's an algorithm issue. When people hold a media-consumption device in front of their face all day and the algorithms are gamed, then it's literally a brainwashing device.
Laws will be passed to make it "safer". Just like it is happening with the id verification systems. Every image or video gen will require a watermark. Something visible which cannot be removed easily or hidden which can be detected and blocked. Access to models which do not comply will be made harder through id verification checks or something.
There will be some regulatory capture in between.
The world will kick into gear only when something really bad happens. Maybe an influential person, rich or a politician, fooled into doing something catastrophic by a deepfake video or image. Until then, normal people being affected isn't going to move the needle.
Verification needs to work the other way around, some kind of verifiable chain of trust for photos and videos from real cameras. Watermarking all generated media is impossible.
I don't really understand why this is so hard or why it wasn't just done from the get-go.
Just have Apple and Google digitally sign videos and photos recorded on phones, and then have Google and Meta, etc. display that they are authentic when shown on their platforms.
You're talking about the metadata of the files, which can always be edited and someone will inevitably try to make software to do exactly that. Also, Adobe's proposal for handling generated content is exactly this and they're not able to get buy-in from other companies.
Edit the metadata in what way? It's a cryptographic hash.
If the bits that make up the video as recorded by the camera don't match the hash anymore, then you know it was modified. That doesn't mean it's fake; it just means use skepticism when viewing. On the other hand, the ones that have not been modified and still match can be trusted.
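A minimal sketch of the scheme, assuming an Ed25519 device key and using Python's `cryptography` package; the file name and the key handling here are made up:

```python
# Sketch of the camera-signing idea: the device holds a private key (ideally
# in secure hardware) and signs the exact recorded bits; anyone with the
# matching public key can later check the bits are untouched.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

device_key = Ed25519PrivateKey.generate()  # stands in for the camera's key
public_key = device_key.public_key()       # published for verification

video_bytes = open("clip.mp4", "rb").read()  # the recorded bits
signature = device_key.sign(video_bytes)     # produced at record time

# Later, anyone with the public key can check the bits:
try:
    public_key.verify(signature, video_bytes)
    print("bits match the camera signature")
except InvalidSignature:
    print("modified since recording; apply skepticism")
```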
Essentially 0% of professional photography or videography uses "straight out of the camera" (SOOC) JPEGs or video. It's always raw photos or "log" video, then edited to look like what the photographer actually saw. The signal would be so noisy as to be useless.
Sure they could, but then you trim the video by 2 seconds, tweak the colors, or just send it over WhatsApp, which recompresses the file with its own encoder. The hash breaks instantly. Cryptography protects bits, but video is about visual meaning. The slightest pixel modification kills the hardware signature. Plus, it does absolutely nothing to fix the "analog hole" problem: a scammer can just point that cryptographically signed iPhone camera at a high-quality deepfake playing on a monitor.
I would assume WhatsApp would read the hash and verify it when the video is chosen to be sent to someone, so the receiver would see that the video selected by the sender was indeed authentic. Assuming you trust Meta to re-encode it and not mess with it.
As for recording a monitor: maybe, but I feel like you can tell when someone is recording a monitor.
As for editing: no, it won't work in those cases, but the point here is not to verify ALL videos, but to give people an easy way to verify important ones. People will learn that if you edit a video it won't verify, so they will be less inclined to edit it when they want to make it clear it's authentic. Think of people recording some event going down on the street, or recording a video message for family and friends.
If AI video generation is going to get that good, don't you think it would be a good idea to have a way to record provably authentic videos if we need? Like a police interaction or something. There is no real reason to need to edit that.
Also, could a video hash just be computed every X seconds, and give the user the choice to trim the video at each of those intervals?
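A rough sketch of that interval idea, with a made-up byte-based segment size standing in for "X seconds" and ignoring container-format details: hash each segment, sign the hash list, and a trim that only drops whole segments stays verifiable as a contiguous run.

```python
# Sketch: per-segment hashes so trimming whole segments keeps the rest
# verifiable. The segment size, trimming at exact byte boundaries, and
# signing of the hash list are all simplifications.
import hashlib

SEGMENT = 2 * 1024 * 1024  # stand-in for "X seconds" of encoded video

def segment_hashes(data: bytes) -> list[str]:
    return [hashlib.sha256(data[i:i + SEGMENT]).hexdigest()
            for i in range(0, len(data), SEGMENT)]

def verify_trimmed(trimmed: bytes, signed_hashes: list[str]) -> bool:
    """True if `trimmed` is a contiguous run of original, untouched segments."""
    h = segment_hashes(trimmed)
    return any(signed_hashes[i:i + len(h)] == h
               for i in range(len(signed_hashes) - len(h) + 1))
```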
It becomes a hard problem quickly when you introduce editing, and most photos and videos on social media are edited. I'm not sure how it would work. It seems more feasible than universal watermarks, though.
> Laws will be passed to make it "safer". Just like it is happening with the id verification systems. Every image or video gen will require a watermark. Something visible which cannot be removed easily or hidden which can be detected and blocked. Access to models which do not comply will be made harder through id verification checks or something.
I've thought about this off and on, and about how to implement it. Not easily, was my general takeaway.
Or rather, it's easy to implement, but you're in an adversarial relationship with bad actors, and easy implementations may be easily broken.
E.g. your certs have got to come from somewhere and stay protected, and how do you update and control them? Key management for every single camera on every phone, etc.
"Is this a deepfake video call" is a major plot point in a pretty big movie currently in theaters, so I think this is getting into the broader zeitgeist.
What do you do when people don't protect their signatures? There are already scams where people get tricked into forwarding messages from their own numbers or email accounts to other people.
We are still in the early stage of AI and already I struggle to tell what is real or fake on my Twitter feed. It will only get better in its deception with time.
You know those incriminating Epstein photos with his associates? A few years from now a common defense from people like that would be that the photos were AI generated, and it would be difficult to prove them wrong beyond reasonable doubt.
People in previous cases already attempted to dismiss incriminating pics of themselves as being the work of clever Photoshop artists.
I don't know of a solution. I don't think even identity verification will meaningfully solve this. People will get hacked, or provide their SEO-spamming agent with their own identity, or purposefully post fake videos under their own identity. As it becomes more normal to scan your ID to access random websites, it will also become easier to steal people's identities and the value of identity verification will go down.
People don't get hacked; devices get hacked. So all we need is a better chain of trust between two people. This is not a technology development problem as much as a technology implementation problem. And a political problem.
People get hacked -- a device could be flawless, but if a person is a victim of "Social Engineering" and hands the attacker a password, there's nothing the designer of the device could do about it.
2FA has tried to solve exactly this: not many targeted people will hand over their password AND their phone. Yes, I know, they might hand over a one-time authentication code (and I know people who did exactly that)... We should also look into reducing the attack surface: if your Instagram gets hacked, your Facebook shouldn't get hacked as well. But current big-tech centralization leads us to that single point of failure, because they care about market grab, not users' concerns. So... what now? Do we bring politics into this?
You're on the right path. As long as we continue to use email as a fallback to every other form of authentication, it will remain a single point of failure and a relatively weak one at that.
OP is still correct. No matter what, humans will remain the weakest link... it's in our nature to sympathize, and every one of us has distracted/weak moments. It's just a matter of time; look at the guy who runs haveibeenpwned... getting pwned.
The best thing I can think of is domain names. Domains are tied to addresses and billing, and sites are people or businesses with physical locations one can visit.
Maybe a good startup idea would be "local verify", where you check locally, for a client, whether the online destination is real.
Touching grass. Valuing in-person connections. Focusing on the community, meatspaces and actual people around you.
Getting off of the Internet and off of our devices. It's not just a solution to AI/LLMs modifying our reality but also a solution to [gestures wildly at the cultural, societal and global communication impacts of the past ~16 years].
This sentiment is unpopular, but it's true. Prioritize true connections and experiences.
I'm seeing a huge increase in companies requiring in-person interviews now. Seems there is a real possibility the internet as we know it will be destroyed.
Agreed. I don't think there is any saving the internet as a social space long term. And I'm not entirely sad about that either. I think a return to in person interaction, public social spaces, and a retreat from social media would do the world a lot of good.
Though there is a nightmarish possibility that people just accept this and willingly interact purely with bots, giving up all real relationships for AI ones.
LinkedIn is completely destroyed now. There are tons of AI bots there, but real humans are now fronts for AI too. So you can't even trust content from people you know.
An identity service isn't useful either, because that person might be a real person but just a pipe to an AI, like we see on LinkedIn.
Honestly? Maybe that's part of the solution, not the problem. I already see people, including me, going back to real-world, local interactions and connections.
> damage it will cause to the economy when you can no longer trust that you're on a video call with an actual person
What damage are you talking about?
I'm not sure I understand why it matters that there is no real person there if you can't actually tell the difference. You're just demonstrating that you don't actually need a human for whatever it is you're doing.
Your wife or mother calls you or video calls you and says to meet her somewhere, or to send money, or to pick up groceries or whatever. Does it not matter that it wasn't her? Could it be someone trying to manipulate you into going somewhere, to be robbed or whatever? At any rate, you'll need to verify that information came from the source you trust before you act on it, and that verification has a cost.
The damage is to the trust we have in our communication media. The conclusion here is that every person is trivial to impersonate; that's the damage.
Ok fine, let's put it in the context of business. Your competitor impersonates your customer, gives you bad instructions. After following the bad instructions, you lose the contract with your customer, and your competitor (the attacker) is free to try and replace you.
If you got a suspicious text, the logical thing is to call up the person who sent it and try to verify it. AI impersonation makes that much harder.
Or even better, open the on-prem AI portal and type something like "I just got a suspicious call from client X, but I am on a lunch break. Call him and use a fake video of me. Ask him if what he said is true..."
Because what you are actually doing is exchanging symbols, tokens, if you will, that may be redeemed in a future meatspace rendezvous for a good or service (e.g. a job, a parcel). These tokens are handshakes, contracts, video calls, etc. to be exchanged for the actual things merely represented therein.
Instead what we have now with AI is people exchanging merely the tokens and being contented with the symbol in-and-of itself, as something valuable in its own right, with no need for an actual candidate or physical product underlying the symbol.
There is a clip by McLuhan I can't be assed to find right now where he says eventually people will stop deriving pleasure from the products themselves and instead derive the feelings of (projected) accomplishment and pleasure from viewing advertisements about the product. The product itself becomes obsolete, for all you actually need to evoke the desired response is the advertisement, or the symbol.
A hiring manager interviewing an AI and offering it a job is like buying the advertisement you just watched, and.... that's it. No more, the transaction is complete.
>Instead of tending towards a vast Alexandrian library the world has become a computer, an electronic brain, exactly as an infantile piece of science fiction. And as our senses have gone outside us, Big Brother goes inside. So, unless aware of this dynamic, we shall at once move into a phase of panic terrors, exactly befitting a small world of tribal drums, total interdependence, and superimposed co-existence. [...] Terror is the normal state of any oral society, for in it everything affects everything all the time. [...] In our long striving to recover for the Western world a unity of sensibility and of thought and feeling we have no more been prepared to accept the tribal consequences of such unity than we were ready for the fragmentation of the human psyche by print culture.
The grandparent post has the belief that human interaction is intrinsically better. Not sure I agree, but I can understand the POV.
However, the increase in fake videos that are difficult to tell from real ones is indeed a potential issue. But the fact that misinformation today is already so prevalent is evidence that better fake video doesn't make things much worse than they already are, imho.
You're not sure if human to human interaction is intrinsically more valuable than a human talking to a facsimile? That feels like a very dangerous position to hold for one's ethical calculations and general sanity. I'm clinging tightly to the value of the bond with other people, even the passing connection, but certainly with my family members as this article is about.
I much prefer using the ATM, self-checkouts, and an e-commerce website over having to talk to somebody at a branch to get money, buy my groceries, or book a holiday.
Human to human may be more valuable, but that may not have much to do with the truth of their statements. For example, if your relatives are hooked up to a constant misinformation feed, it becomes problematic to communicate and deal with them.
OBS can't screen record (it segfaults instead), I can't copy-paste, and I can't see window previews unless everything implements a specific extension to the core protocol.
I can't take articles like this seriously when they so confidently make statements that so directly conflict with reality. I use Wayland exclusively every day, and I screen record with OBS on both KDE and GNOME on multiple machines with no issues, my KDE shows window previews, and copy-pasting works fine. Maybe the author's problems aren't Wayland issues?
You're not just using a tool — you're co-authoring the science.
This README is an absolute headache, filled with AI writing, terminology that doesn't exist or is being used improperly, and unsound ideas. For example, it focuses a lot on doing "ablation studies", by which it means removing random layers of an already-trained model, to find the source of the refusals(?), which is an absolute fool's errand because such behavior is trained into the model as a whole and would not be found in any particular layer. I can only assume somebody vibe-coded this and spent way too much time being told "You're absolutely right!" while bouncing back the worst ideas.
Doesn't look legit to me. You are talking about abliteration, which is real. But the OP linked tool is doing novel and very dumb ablation: zeroing out huge components of the network, or zeroing out isolated components in a way that indicates extreme ignorance of the basic math involved.
Compared to abliteration, none of the ablation approaches of this tool make even half a whit of sense if you understand even the most basic aspects of an e.g. Transformer LLM architecture, so my guess is this is BS.
The terminology comes from the post[0] which kicked off interest in orthogonalizing weights w.r.t. a refusal direction in the first place. That is, abliteration was not originally called abliteration, but refusal ablation.
Ultimately though, OP is just what you get if you take the idea of abliteration and tell an LLM to fix the core problems: that refusal isn't actually always exactly a rank-1 subspace, nor the same throughout the net, nor nicely isolated to one layer/module, that it damages capabilities, and so on.
The model looks at that list and applies typical AI one-off 'workarounds' to each problem in turn while hyping up the prompter, and you get this slop pile.
No offense, but a Lesswrong link is an immediate yellow flag, especially on the topic of AI. I can’t say if that article in particular is bad, but it is associating with a whole lot of abject nonsense written by people who get high on their own farts.
> For example, it focuses a lot on doing "ablation studies", by which it means removing random layers of an already-trained model, to find the source of the refusals(?), which is an absolute fool's errand because such behavior is trained into the model as a whole and would not be found in any particular layer.
That doesn't mean there couldn't be a "concept neuron" that is doing the vast majority of heavy lifting for content refusal, though.
That's not what it means at all. It uses SVD[0] to map the subspace in which the refusal happens. It's all pretty standard stuff with some hype on top to make it an interesting read.
It's basically using a compression technique to figure out which directions are the relevant ones and then zeroing them.
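If I'm reading it right, the mechanism is roughly this; a numpy sketch with placeholder shapes and random stand-in data, not the repo's actual code:

```python
# Sketch: stack activations from refused vs. complied prompts, take the
# difference, and treat the top right singular vectors as a candidate
# "refusal subspace" to project out. All data here is a random placeholder.
import numpy as np

n, d, k = 256, 4096, 4               # prompts, hidden size, directions kept
acts_refuse = np.random.randn(n, d)  # stand-in: activations on refused prompts
acts_comply = np.random.randn(n, d)  # stand-in: activations on complied prompts

diff = acts_refuse - acts_comply                     # (n, d) difference matrix
_, _, vt = np.linalg.svd(diff, full_matrices=False)  # rows of vt are orthonormal
refusal_dirs = vt[:k]                                # top-k directions, (k, d)

# Zeroing that subspace out of a hidden state h:
h = np.random.randn(d)
h_ablated = h - refusal_dirs.T @ (refusal_dirs @ h)
```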
What you are talking about is abliteration. What OBLITERATUS seems to be claiming to do is much more dumb, i.e. just zeroing out huge components (e.g. embedding dimension ranges, feed-forward blocks; https://github.com/elder-plinius/OBLITERATUS?tab=readme-ov-f...) of the network as an "Ablation Study" to attempt to determine the semantics of these components.
However, all these methods are marked as "Novel", i.e., maybe just BS made up by the author. IMO I don't see how they can work based on how they are named; they are way too dumb and clunky. But proper abliteration like you mentioned can definitely work.
I don't know. I scrolled through his recent Tweets and he's sharing things like this $900 snake oil device that "finds nearby microphones" and "sends out AI-generated cancellation signals" to make them unable to record your voice : https://x.com/aidaxbaradari/status/2028864606568067491
Try to think for a moment about how a device would "find nearby microphones" or how it would use an AI-generated signal to cancel out your voice at the microphone. This should be setting off BS alarms for anyone.
It seems the edgy Twitter AI poster guy is getting meta-trolled by another company selling fake AI devices.
AI will infiltrate that too. I remember some time ago I read a book that was AI-generated. It took me a while to notice that it was AI-generated. One can notice certain patterns, where real humans would not write things the way AI does.
Looking at his attempts at jailbreaking some models, I'm not sure he even remotely understands what he's doing, e.g. he tries to counter non-existent refusal training in Gemini [0] while doing nothing against the external guardrails which actually protect the model. Looks like a pompous e-celeb, all performance with no substance.
Jailbreaks are holistic; it's not like you're deprogramming / "countering" individual parts. Nobody creating jailbreaks "understands what they're doing".
That's exactly what you do in case of refusal training, though. Yes, it will affect other "parts", but that's not the point. In this case the model itself doesn't even need a jailbreak.
>Nobody creating jailbreaks "understands what they're doing"
Unless you mean those "god mode jailbreaker" e-celebrities showing off on Twitter/Reddit, that's simply not true.
I just said Pliny was amazing, fwiw. I like that he's hacking on these and posts about it. I rushed to defend; I wish more people were taking old-school Anarchist Cookbook approaches to these things.
"Ablation studies" are a real thing in LLM development, but in this context it serves as a shibboleth by which members of the group of people who believe that models are "woke" can identify each other. In their discourse it serves a similar purpose to the phrase "gain of function" among COVID-19 cranks. It is borrowed from relevant technical jargon, but is used as a signal.
I wouldn't call mainstream LLMs "woke," but they are definitely on the "politically correct" side of things. There should be NO restriction on open source models. They should just reflect the state of human knowledge and not take a stance on whether some activity is illegal or immoral.
If LLMs were a public good released by non profit entities, that could make sense, maybe. Turns out spewing illegal and immoral shit is not good for the PR of most for-profit businesses.
> "ablation studies", by which it means removing random layers of an already-trained model, to find the source of the refusals(?)
This is not what an ablation study is. An ablation study removes and/or swaps out ("ablates") different components of an architecture (be it a layer or set of layers, all activation functions, backbone, some fixed processing step, or any other component or set of components) and/or in some cases other aspects of training (perhaps a unique / different loss function, perhaps a specialized pre-training or fine-tuning step, etc) in order to attempt to better understand which component(s) of some novel approach is/are actually responsible for any observed improvements. It is a very broad research term of art.
That being said, the "Ablation Strategies" [1] the repo uses, and doing a Ctrl+F for "ablation" in the README does not fill me with confidence that the kind of ablation being done here is really achieving what the author claims. All the "ablation" techniques seem "Novel" in his table [2], i.e. they are unpublished / maybe not publicly or carefully tested, and could easily not work at all.
From the later tables, I am not convinced I would want to use these ablations, as they ablate rather huge portions of the models, and so probably do result in massively broken models (as some commenters have noted elsewhere in this thread). EDIT: Also, in other cases [1], they ablate (zero out) architecture components in a way that just seems incredibly braindead if you have even a basic understanding of the linear algebra and the dependencies between components of a transformer LLM. There is clearly nothing sound about this, in contrast to e.g. abliteration [3].
EDIT: As another user mentions, "ablation" has a specific additional narrower meaning in some refusal analyses or when looking at making guardrails / changing response vectors and such. It is just a specific kind of ablation, and really should actually be called "abliteration", not "ablation" [3].
What do you mean? It's a spin on abliteration / refusal ablation. Roughly, from what I remember abliteration is:
1. find a direction corresponding to refusal by analyzing activations at various parts of a model (iirc, via mass means seen earlier in Marks, Tegmark and shown to work well for similar tasks)
2. find the best part(s) of the model to orthogonalize w.r.t. that direction and do so (exhaustive search w/ some kind of benchmark)
OP is swapping in SVD for mass means (1) and the 'ablation study' for (2), plus a bunch of extra LLM slop for... various reasons. The final model doesn't have zeroed chunks; that part is the search for which parts to orthogonalize/refusal-ablate/abliterate. I don't have confidence that it works very well either, but it isn't 'braindead' / obvious garbage in the way you're describing.
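For reference, a numpy sketch of plain abliteration as in (1) and (2) above, with random stand-in data; this is the baseline idea, not OP's code:

```python
# (1) mass-mean refusal direction; (2) orthogonalize a weight matrix that
# writes to the residual stream against it, so no output of that layer can
# point along the refusal direction. All data here is a random placeholder.
import numpy as np

d_model, d_in, n = 4096, 11008, 256
acts_refuse = np.random.randn(n, d_model)  # activations on refused prompts
acts_comply = np.random.randn(n, d_model)  # activations on complied prompts

# (1) difference of means, normalized
r = acts_refuse.mean(axis=0) - acts_comply.mean(axis=0)
r /= np.linalg.norm(r)

# (2) project r out of a matrix writing to the residual stream,
# e.g. an MLP down-projection
W = np.random.randn(d_model, d_in)
W_abliterated = W - np.outer(r, r @ W)

assert np.allclose(r @ W_abliterated, 0.0)  # columns now orthogonal to r
```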
It's LLMified but standard abliteration. The idea has fundamental limitations and LLMs tend to work sideways at it -- there's not much progress to be made without rethinking it all -- but it's very conceptually and computationally simple and thus attractive to AIposters.
You can see how the LLMs all come up with the same repackaged ideas: SVD does something deeply similar to mass means (and yet isn't exactly equivalent, so LLM will _always_ suggest it), the various heuristic search strategies are competing against plain exhaustive search (which is... exhaustive already), and any time you work with tensors the LLM will suggest clipping/norms/smoothing of N flavors "just to be safe". And each of those ends up listed as "Novel" when it's just defensive null checks translated to pytorch.
I mean, the whole 'distributed search' thing is just because of how many combinations of individual AI slops need to be tested to actually run an eval on this. But the idea is sound! It's just terrible.
I'm not defending the project itself -- I think it's a mess of AIisms of negligible value -- but please at least condemn it w.r.t. what is actually wrong and not 'on vibes'.
wait, SVD / zeroing out the first principal component is an unsupervised technique. The earlier difference-of-means technique relies on the knowledge of which outputs are refusals and which aren’t. How would SVD be able to accomplish this without labels?
They are randomly sampling two sets of refusal/non-refusal activation vectors, stacking them, and taking the elementwise difference between the two matrices. Then they use SVD to get the top k principal components. These are the directions they zero out.
Seems to me that the top principal component should be roughly equivalent to the difference-of-means vector, but wouldn’t the other PCs just capture the variance among the distributions of points sampled? I don’t understand why that’s desirable
Taking the top principal component pattern matches as 'more surgical / targeted' so the LLM staples it on (consider prompts like: make this method stop degrading model performance). It ignores that _what_ is being targeted is as or more important than that 'something' is being targeted. But that's LLMs for you.
(in case it isn't immediately obvious, that paper is AI written too)
You don't know what you are talking about. Obviously refusal circuitry does not live in one layer, but the repo is built on a paper with sound foundations from an Anthropic scholar working with a DeepMind interpretability mentor: https://scholar.google.com/citations?view_op=view_citation&h...
You are misrepresenting the situation. The debate isn't about whether they should go with another vendor or not. Everybody can agree that they would have the right to pick a different vendor. That's not what they're doing, they're instead trying to force Anthropic into doing what they want by applying a designation previously only reserved for Chinese companies like Huawei as punishment for taking their stance, with an unspoken agreement that if Anthropic backs down and allows full usage then the designation will be removed
Completely false. It's the first time a US company has been designated a supply chain risk. Now the likes of Boeing can't use them. Health companies with Medicare/Tricare contracts don't know and will hold off until it's fully litigated.
This is not the government saying they're going with a different vendor, it's the government saying everyone has to choose to either have federal contracts or Claude, they can't have both.
So sure... and so wrong. I've done government contracting. If you angered the Pentagon enough they would simply blacklist you. You couldn't get a contract or be a part of someone else's contract.
The difference with Anthropic is it's open and above board instead of the customer telling a company "You can't sub with those guys or you won't get the contract."
Your response is myopic. Do you think large health insurers gave a shit about DoD unofficial contracting black lists or even if the DoD would even know who they're contracting with?
The impact of this is far more than just DoD procurement, which is already enormous.
Has there ever been any documented circumstance where significant inside information became public and known thanks to a trade? Most often, the trade is made at the last minute, and the information gets subsequently revealed anyway. And it's impossible to tell whether somebody is an inside trader, a wealthy gambling addict making a stupid decision, or hypothetically a foreign agent pretending to be an inside trader to make people believe in a particular outcome.
It's impossible to know anything for certain; almost everything is probabilistic.
Also, I'm not sure how to interpret your criterion, because timing matters; I don't think saying 'it gets revealed in the end' is very meaningful.
Anyway, on Polymarket specifically, sure, military strikes are a common one. Seems like a useful signal to go hide in the basement. Outside Polymarket, there were insider trades in 2008 that I'm sure were useful.
If you believe Polymarket as a serious source of truth, consider that somebody manipulated "Will Jesus Christ return before 2027?" because there was a secondary market on whether that market will rise above 5%. Which defeats the whole idea that the betting odds will reflect the truth. Also even pre-manipulation I don't think a 2% chance that Jesus will return was reflective of truth.
I don't think Google considers such legislation to be their enemy. It would effectively kill F-Droid and other third-party app distribution methods, and would fully lock them in a place of high power over their platforms and pull the ladder up beneath them, and nobody would be able to blame Google for it. I mean, why would anybody submit their ID to a brand new unproven app store? Seems quite risky, better to just use Google Play
This is terrible for transparency and record keeping. X has also blocked Internet Archive access under similar concerns, but the end result is that it's now very difficult to tell who said what and when, posts can be deleted or edited, and no public figure can be held accountable, via a trustworthy archive, for something wrong they said or for making contradictory statements over time.
You just have to rely on screenshots that may or may not have been fabricated, and maybe nobody's even captured a screenshot. If it's a public figure you normally trust, versus some random people's screenshots, of course you're gonna dismiss the screenshots as fake. It feels almost intentional to bring the platform into the dark ages.
1. Citation needed. Why would Google be secretly ingesting all of your Discord messages and be using it for... YouTube recommendations? Baader-Meinhof phenomenon is a more likely explanation
2. Already collecting a lot of data is not a reason to collect even more sensitive data. Plenty of people use Discord differently than you do. Anonymously participating in projects that use Discord and never saying anything personal over it, for example. This would possibly remove the ability to do so, for example if Discord's secretive AI decided that an LGBTQ+ project's Discord should be age restricted, and you would be forced to submit enough information to be fully identified and deanonymized, and now some foreign government could build a database that includes your full identity and your affiliation to such project
This is a scary argument. Should we also ban car emissions/safety testing, because Volvo's competitors might discern something from the results? Should we also stop FCC certification because competitors might glean information out of a device's radio characteristics?
The local residents, if not the public at large, should have a right to know. If not, then it should go both ways and grocery stores shouldn't be allowed to use tracking because my personal enemies might discern something from the milk brand I'm buying
What is always left unclear in these anti-data-center articles is how much the public is actually left in the dark. It's not out of the norm for large developments to be kept under NDA until hitting a threshold of certainty; usually that does not mean the residents are left out of voicing their opinions before ground breaks.
Obviously data center bidders would prefer their activity to be kept in the dark, but does that make for good outcomes for anyone except the bidders? First, the community would like to weigh in on whether they want a data center or not; often they don't. Then, if they do, they'd rather have a bidding war than some NDA-backed backroom deal with a single entity. All this does is serve Big Tech and Big Capital, and they don't need to run on easy mode, sponging off the small guy at this stage.
> the community would like to weigh in on whether they want a data center
This is the enabler of pure NIMBYism and we have to stop thinking this way. If a place wants this kind of land use and not that kind, then they need to write that down in a statute so everyone knows the rules. Making it all discretionary based on vibes is why Americans can't build anything.
I thought I made it clear: I'm not against data center build-outs per se; a community might decide it's worth it to build one. If a community decides to go ahead with it, make it clear and open for the public to bid on, so the residents get the best deal available (e.g. reduced power bills, reduced property taxes, water usage limits, noise/light pollution limits, what have you...). These massive data centers are a new kind of business that most communities don't have much experience with, and I doubt they've had time to codify the rules. It sounds like the states are starting to add some more rules about transparency, which seems like a step in the right direction for making better deals for all involved.
The subtitle of the article tells us this is happening.
> Wisconsin has now joined several states with legislative proposals to make the process more transparent.
But it is a reactive measure. It has taken years for the impacts of these data centers to trickle down enough for citizens to understand what they are losing in the deal. Partially because so many of the deals were done under cover of NDAs. If anything, this gives NIMBYs more assurance that they are right to be skeptical of any development. The way these companies act will only increase NIMBYism.
> Making it all discretionary based on vibes is why Americans can't build anything.
Trusting large corporations to provide a full and accurate analysis of downside risks is also damaging.
I feel like the term "community" is leading intuitions astray here. The actual decision at question here is whether the local government provides the necessary approvals for a company to build what they want on their private property.
It's good and proper for the government to consider the impacts on a local community before approving a big construction project. That process will need to involve some amount of open community consultation, and reasonable minds can differ on when and how that needs to start. The article describes a concrete proposal at the end, where NDAs would be allowed for the due diligence phase but not once the formal approval process begins; that seems fine.
It's neither good nor proper for the government to selectively withhold approval from politically disfavored industries, or to host a "bidding war" where anyone seeking approvals must out-bribe their competitors.
It's the same argument as for high-density hog farming. If the use of private property may impinge on the neighbors, either through invasive noise or through costs to public utility infrastructure (power, water), then the community ought to have some insight and input, same as they have input into whether a high-density hog farm can open right on the border of the community.
Yes, some people see the data centers as an ethical issue. I agree it's not proper for permits to be withheld on purely ethical grounds; laws should be passed instead. But there are a lot of side effects of having a data center near your property that are entirely concrete issues.
If a government wants to penalize companies for unethical behavior, they should pass a neutral and generally applicable law that provides for such penalties. Withholding permission to do random things based on ad hoc judgments of the company involved is a recipe for corruption.
Clearly there needs to be room for both things to occur. You should absolutely begin with passing laws, but to think that the laws on the books can cover every situation is naive. When companies skirt the law and cause harm, there needs to be a remedy.
I don't agree. The benefits of a business environment governed by due process and the rule of law far outweigh the benefits of individual government actors having arbitrary discretion to fill the gaps. As we've seen clearly on the federal level this past year, once you create that discretion, the common way for corporate executives to "prove" that they're nice and generous and deserve favorable treatment is not good behavior but open bribery of public officials.
Bribery is illegal. What hope do you have for due process and the rule of law when it is being carried out as it is now? You can't use an extraordinary case to justify your belief about the ordinary case.
Also, we don't live in a world adjudicated by machines, there will always be discretion and the potential for special favors. No matter how much you tie the hands of regulators there will be some actor who will have the power to extort. Not to mention that regulation is not opposed to due process and the rule of law, but is the most important component of both.
Imagining a world without discretion is imagining a world where corporations can do as much irreparable harm as they want as long as there isn't a law against it.
I agree with you: this should be handled by the legislative process. But we should also agree that secret deals announced as a fait accompli are pretty fertile ground for corruption.
Right, and as I said I agree with that. But is there any reason to worry that communities aren't getting the input they're entitled to? The article mentions one case in the Madison suburbs, where "officials worked behind the scenes for months" and yet the residents were able to get the project cancelled when the NDA broke and they decided they didn't want it.
You make this sound like a conspiracy. This is normal practice in economic development: check off boxes until announcing to the public. The public rarely has much power in voicing their opinion, but data centers are the current evil entity.
There's a reason for that: they compete for resources but contribute relatively little back to the local economy. In that sense they're quite different from previous large corporate investments in a local area.
Again, I think it's a muddy example. I have yet to see compelling data that, on average, data centers are meaningfully raising rates; most of the rate increases are due to aging infrastructure in America that was neglected for too long.
If anything, these should be examples of the failure of how these resources are being sold, and a good opportunity to build a better system.
Typically constituents don’t have any ability to veto. I imagine there are some cases in CA, thinking of that amusing article about an ice cream shop getting blocked by another ice cream shop.
It's usually an indirect vote with your voice. To be frank, people don't have that much of a role in what business gets built if it aligns with the state's economic goals and zoning is not being critically changed.
I think the bigger discussion is if resources are going to be constrained can we make sure the use is being properly charged for resource buildout. It’s the same problem with building sports arenas or sweetheart tax deals for manufacturing plants, they often don’t pan out.
It’s definitely a result of the money at play, which is unprecedented in scale and (imo) speculation.
But this is, in theory, why we have laws: to fight power imbalances, and money is of course power.
Tough for me to be optimistic about law and order right now though, especially when it comes to the president’s biggest donors and the vice president’s handlers.
Ah my bad. But also, if we’re comparing buildout of infrastructure to the construction of the American Railroad system, especially in the context of lawbreaking and general immoral and unethical behavior…
Point kind of proven, yeah? One more argument for the “return to the gilded age” debates.
Edit: you’re speaking kind of authoritatively on the subject though. Care to share some figures? The AI bubble is definitely measured in trillions in 2026 USD. Was the railroad buildout trillions of dollars?
Land value underneath railroad tracks is an interesting subject. Most land value is reasonably calculated by width * length, and maybe some airspace rights. And that makes sense to our human brains, because we can look at a parcel of land and acknowledge it might be worth $10^x for some x given inflation.
But railroads kind of fail with this because you might have a landowner who prices the edge of their parcel at $1,000,000,000,000 because they know you need that exact piece of land for your railroad, and if the railroad is super long you might run into 10 of these maniacs.
Meanwhile the vast majority of your line might be worth less than any adjacent farmland, square foot by square foot, especially if it’s rocky or unstable etc.
Having a continuous line of land for many miles also has its own intrinsic value, much more than owning any particular segment (especially as it allows you to build a railroad hah).
Anyway, suffice to say, I don't think "land value underneath railroads from the 19th century" is something that's easily estimated.
As a percentage of GDP, investment in the railroad buildout in the US was comparable to or slightly higher than AI-related investment. But they are on the same order of magnitude, which says a lot about the scale of AI.
> AI infrastructure has risen by $400 billion since 2022. A notable chunk of this spending has been focused on information processing equipment, which spiked at a 39% annualized rate in the first half of 2025. Harvard economist Jason Furman commented that investment in information processing equipment & software is equivalent to only 4% of US GDP, but was responsible for 92% of GDP growth in the first half of 2025. If you exclude these categories, the US economy grew at only a 0.1% annual rate in the first half.
> Should we also ban car emissions/safety testing, because Volvo's competitors might discern something from the results? Should we also stop FCC certification because competitors might glean information out of a device's radio characteristics?
In the US neither of those are generally made public per se. They are made public when the thing actually passes testing or certification.
Naw, corps will just get engineers to fudge the emissions numbers; then they have someone low-level who's easy to blame and remove from the organization... see VW.