
> the people with physical access is separate from the people with knowledge of [...]

Welcome to the brave new world of troubleshooting. This will seriously bite us one day.



I like how FB decided to send "ramenporn" as their spokesperson.


A particular facet I love of the internet era is journalists reporting serious events while having to use the completely absurd usernames...

"A Facebook engineer in the response team, ramenporn..."


I remember some huge DDOS attacks like a decade ago, and people were speculating who could be behind it. The three top theories were Russian intelligence, the Mossad, and this guy on 4chan who claimed to have a Botnet doing it.

That was the start of living in the future for me.


4chan is disturbingly resourceful at times. I have heard them described as weaponized autism.


Ya, on hn it's merely productized.


That's a pretty accurate description of the site, lol.

On a side-note, I think you'll enjoy some of the videos by the YouTube 'Internet Historian' on 4chan:

* https://www.youtube.com/watch?v=SvjwXhCNZcU

* https://www.youtube.com/watch?v=HiTqIyx6tBU


My favorite example of this is when I saw references to "Goatse Security" on the front page of the Wall Street Journal


This felt like something straight out of a postmodern novel during the whole WSB press rodeo, where some of the usernames being read out on TV were somewhere between absurd and repulsive.

Loved it.


I believe that's the exact reason behind the pattern of horrifying usernames on reddit and imgur. It's magnificent in its surrealness.


Exactly. Lately I'm constantly having déjà vu from Vernor Vinge's Rainbows End.


>journalists reporting serious events

A facet I don't love is journalism devolving to reposting unverified, anonymous reddit posts.


"Discussed in Hacker News, the user that goes by the 'huevosabio' handle, stated as a fact that..."


‘He was then subsequently attacked by “OverTheCounterIvermectin” for his tweets on transgender bathrooms from several months ago’.


The problem with tweets on transgender bathrooms is that you can be attacked for them by either side at any point in the future, so the user OverTheCounterIvermectin should have known better.


I got quoted as noir_lord in the press.

My bbs handle from 30 years ago.


Immortality.


I'm worried about that person. I doubt Facebook will look kindly on breaking incident news being shared on reddit.


Apparently Facebook HQ didn't like how ramenporn handled the situation. His account has been deleted, as well as all his messages about the incident.


his account is active, only the incident comments were deleted


> [Reddit logo] u/ramenporn: deleted

> This user has deleted their account.


At least that department at Facebook is still working!


There never was a ramenporn.


That ramenporn got engagement through hate speech.


They work at facebook. Can’t imagine they have any illusions regarding their privacy/anonymity.


Curious what the internal "privacy" limitations are. Certainly FB must track the mapping from reddit usernames to FB accounts, even if they don't actually display it. It just makes sense.


Thanks to the GDPR at least that's easy to verify for European users.


That said, it will be interesting to read their post-mortem next year and compare it with what ramenporn wrote.


lol no one cares. we're all laughing about this too (all of us except the networks people at least...)


I hope you won't have to delete your account too :)


Well, seems like FB took down his posts...


This is why so many teams fight back against the audit findings:

"The information systems office did not enforce logical access to the system in accordance with role-based access policies."

Invariably, you want your best people to have full access to all systems.


Well, you want the right people to have access. If you're a small shop or act like one, that's your "top" techs.

If you're a mature larger company, that's the team leads in your networking area on the team that deal with that service area (BGP routing, or routers in general).

Most likely Facebook et al. management never understood this could happen because it's "never been a problem before".


I can't fathom how they didn't plan for this. In any business of size, you have to change configuration remotely on a regular basis, and can easily lock yourself out on a regular basis. Every single system has a local user with a random password that we can hand out for just this kind of circumstance...
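One way to implement that kind of escape hatch is a per-host break-glass account. A minimal sketch, with hypothetical helper names and stdlib-only hashing (a real deployment would use crypt-style password hashing on the host and seal the plaintext in an offline vault or printed envelope):

```python
# Sketch: generate a unique break-glass credential per host. The plaintext
# goes to offline storage for hand-out during an outage; only the hash is
# pushed to the host's local account database.
import hashlib
import secrets

def generate_breakglass(hostname: str) -> dict:
    password = secrets.token_urlsafe(24)  # ~192 bits of entropy
    # A real host would use crypt/yescrypt; sha256 keeps the sketch stdlib-only.
    digest = hashlib.sha256(password.encode()).hexdigest()
    return {
        "host": hostname,
        "user": "breakglass",
        "password": password,  # -> offline vault / sealed envelope
        "hash": digest,        # -> host's local account database
    }

cred = generate_breakglass("edge-router-01")
```

The point is that the credential works with purely local authentication, so it survives exactly the kind of remote-access wipeout being discussed.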


Organizational complexity grows super-linearly; in general, the number of people a company can hire per unit time is either constant or grows linearly.

Google once had a very quiet big emergency that was, ironically(1), initiated by one of their internal disaster-recovery tests. There's a giant high-security database containing the 'keys to the kingdom', as it were... Passwords, salts, etc. that cannot be represented as one-time pads and therefore are potentially dangerous magic numbers for folks to know. During disaster recovery once, they attempted to confirm that if the system had an outage, it would self-recover.

It did not.

This tripped a very quiet panic at Google because while the company would tick along fine for a while without access to the master password database, systems would, one by one, fail out if people couldn't get to the passwords that had to be occasionally hand-entered to keep them running. So a cross-continent panic ensued, because restarting the database required access to two keycards for NORAD-style simultaneous activation. One was in the wallet of an executive who was on vacation and had to be flown back to the datacenter to plug it in. The other was stored in a safe built into the floor of a datacenter, and the combination to that safe was... in the password database. They hired a local safecracker to drill it open, fetched the keycard, double-keyed the initialization machines to reboot the database, and the outside world was none the wiser.

(1) I say "ironically," but the actual point of their self-testing is to cause these kinds of disruptions before chance does. They aren't generally supposed to cause user-facing disruption; sometimes they do. Management frowns on disruption in general, but when it's due to disaster recovery testing, they attach to that frown the grain of salt that "Because this failure-mode existed, it would have occurred eventually if it didn't occur today."


That's not quite how it happened. ;)

<shameless plug> We used this story as the opening of "Building Secure and Reliable Systems" (chapter 1). You can check it out for free at https://sre.google/static/pdf/building_secure_and_reliable_s... (size warning: 9 MB). </shameless plug>


Thanks for telling this story as it was more amusing than my experiences of being locked in a security corridor with a demagnetised access card, looooong ago.


what if the executive had been pick-pocketed


EDIT: I had mis-remembered this part of the story. ;) What was stored in the executive's brain was the combination to a second floor safe in another datacenter that held one of the two necessary activation cards. Whether they were able to pass it to the datacenter over a secure / semi-secure line or flew back to hand-deliver the combination I do not remember.

If you mean "Would the pick-pocket have access to valuable Google data," I think the answer is "No, they still don't have the key in the safe on the other continent."

If you mean "Would the pick-pocket have created a critical outage at Google that would have required intense amounts of labor to recover from," I don't know because I don't know how many layers of redundancy their recovery protocols had for that outage. It's possible Google came within a hair's breadth of "Thaw out the password database from offline storage, rebuild what can be rebuilt by hand, and inform a smaller subset of the company that some passwords are now just gone and they'll have to recover on their own" territory.


> I can't fathom how they didn't plan for this

Maybe because they were planning for a million other possible things to go wrong, likely with higher probability than this. And busy with each day's pressing matters.


Anyone who has actually worked in the field can tell you that a deploy or config change going wrong, at some point, and wiping out your remote access / ability to deploy over it is incredibly, crazy likely.


That someone will win the lottery is also incredibly likely. That a given person will win the lottery is, on the other hand, vanishingly unlikely. That a given config change will go wrong in a given way is ... eh, you see where I'm going with this


Right, which is why you just roll in protection for all manner of config changes by taking pains to ensure there are always whitelists, local users, etc. with secure(ly stored) credentials available for use if something goes wrong; rather than assuming your config changes will be perfect.
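That pattern is essentially what "commit confirmed" does on Juniper gear: apply the change, then auto-revert unless reachability is confirmed within a deadline. A minimal sketch, assuming hypothetical `apply_config` / `rollback` / `check_reachability` hooks supplied by whatever platform you actually run:

```python
# Sketch of a commit-confirmed wrapper: the change only survives if the
# box stays reachable; otherwise it rolls itself back, so a bad config
# push can't permanently lock you out.
import time

def commit_confirmed(apply_config, rollback, check_reachability,
                     deadline_s=300, poll_s=5,
                     clock=time.monotonic, sleep=time.sleep):
    apply_config()
    start = clock()
    while clock() - start < deadline_s:
        if check_reachability():
            return True   # change confirmed; keep it
        sleep(poll_s)
    rollback()            # never confirmed: revert automatically
    return False
```

Injecting `clock` and `sleep` is just for testability; the logic is the standard dead-man's-switch approach to remote config changes.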


I'm not sure it's possible to speculate in a way which is generic over all possible infrastructures. You'll also hit the inevitable tradeoff of security (which tends towards minimal privilege, aka single points of failure) vs reliability (which favours 'escape hatches' such as you mentioned, which tend to be very dangerous from a security standpoint).


Absolutely, and I'd even call it a rite of passage to lock yourself out in some way, having worked in a couple of DCs for three years. Low-level tooling like iLO/iDRAC can sure help out with those, but is often ignored or too heavily abstracted away.


A config change gone bad?

That’s like failure scenarios 101. That should be the second on the list, after “code change gone bad”.


Exactly! Obviously they have extremely robust testing and error catching on things like code deploys: how many times do you think they deploy new code a day? And at least in my experience, their error rate is somewhere below 1%.

Clearly something about their networking infrastructure is not as robust.


Right? Especially on global scale. Something doesn't add up!


Curious/unfortunate timing. The day after a whistleblower documentary, and with a long list of other legal challenges and issues incoming.


Haha sure. They were too busy implementing php compilers to figure out that "whole DR DNS thing"

rotflmao. I'd remove Facebook from my resume.


Most likely they did plan for this. Then, something happened that the failsafe couldn't handle. E.g. if something overwrites /etc/passwd, having a local user won't help. I'm not saying that specific thing happened here -- it's actually vanishingly unlikely -- but your plan can't cover every contingency.


Agreed. It's also worth mentioning that at the end of every cloud is real physical hardware, which is decidedly less flexible than cloud: if you lock yourself out of a physical switch or router, you have far fewer options.


In risk management cultures where consequences from failures are much, much higher, the saying goes that “failsafe systems fail by failing to be failsafe”. Explicit accounting for scenarios where the failsafe fails is a requirement. Great truths of the 1960s to be relearned, I guess.


Another Monday morning at a boring datacenter job; I bet they weren't even there yet at 8:30 when the phones started ringing.


You mean the VOIP phones that could no longer receive incoming calls?


Assuming anyone can actually look up the phone numbers to call.


There should be 24/7 on-site rotations. I wonder if physical presence was cut on account of COVID?


phones? how lame.


It certainly wasn't the Messenger.


Phones - the old, analogue, direct cable ones - were self-sustaining, and kept running even when there was a power cut in the house.


yes, indeed. Reliability. That's so 20th century. #lame.

(Actually not lame at all in my eyes)


This sounds like something that might have been done with security in mind. Although generally speaking, remote hands don't have to be elite hackors.


Have you ever tried to remotely troubleshoot THROUGH another person?!


My company runs copies of all our internal services in air-gapped data centers for special customers. The operators are just people with security clearance who have some technical skills. They have no special knowledge of our service inner workings. We (the dev team) aren’t allowed to see screenshots or get any data back. So yeah, I have done that sort of troubleshooting many times. It’s very reminiscent of helping your grandma set up her printer over the phone.


And this is why we should build our critical systems in a way that can be debugged on the phone... With your grandma.


We try to write our ops manuals in a way that my grandma could follow but we don’t always succeed. :)


For all the hours I spent on the phone spelling out grep, ls, cd, pwd, raging that we didn't keep nano instead of fucking vim (and I'm a vim person)... I could have stayed young and been solving real customer problems, not typing by proxy on a fucking keyboard with a 5s delay because a colleague is lost in the middle of nowhere, can't remember what file he just deleted, and the system doesn't start anymore. Your software is fragile, just shite.


Yes. Depending on the person, it can either go extremely well or extremely poorly. Getting someone else to point a camera at the screen helps.


Yes, and it works if both parties are able to communicate using precise language. The onus is on the remote SME to exactly articulate steps, and on the local hands to exactly follow instructions and pause for clarifications when necessary.


Yeah. Do what you have to.

Sometimes the DR plan isn't so much "I have to have a working key" as "I have to know who gets there first with a working key", and break glass might be literal.


Not OP, but many times. Really makes you think hard about log messages after an upset customer has to read them line by line over the phone.

One was particularly painful, as it was a "funny" log message I had added to the code for when something went wrong. Lesson learned: never put funny / stupid / goofy fail messages in the logs. You will regret it sooner or later.
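One mitigation for phone-relayed logs is to prefix every message with a short, stable error code, so the person on the line only has to read "E-1043" aloud instead of a whole sentence. A minimal sketch; the codes and messages here are made up for illustration:

```python
# Sketch: stable error codes in front of every log line. The catalog maps
# short codes to fixed descriptions; context is appended as key=value pairs.
import logging

ERRORS = {
    "E-1042": "config push rejected by peer",
    "E-1043": "BGP session to upstream lost",
}

def format_error(code: str, **ctx) -> str:
    detail = " ".join(f"{k}={v}" for k, v in sorted(ctx.items()))
    return f"{code} {ERRORS[code]} {detail}".rstrip()

def log_error(code: str, **ctx) -> None:
    logging.error(format_error(code, **ctx))
```

A side benefit: the code catalog doubles as an index into the runbook, so the remote engineer can jump straight to the right procedure.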


folks with physical access are also denied. source - https://twitter.com/YourAnonOne/status/1445100431181598723


FWIW that's not the original source, just some twitter account reposting info shared by someone else. See this sub-thread: https://news.ycombinator.com/item?id=28750888


IT: "Please do this fix."

Person 1: "I can't, I don't have physical access."

IT: "Please do this fix."

Person 2: "I can't, I don't have digital access."

Why? It's [IT's?] policy.


Let me guess, it is tied to FB systems which are down. That would be hilarious.


This is not new; this is everyday life with helping hands, on-duty engineers, L2-L3 levels telling people with physical access which commands to run, etc. etc. etc.


Then you have security issues like this, where someone impersonates a client with helping hands and drains your exchange's hot wallet:

https://www.huffpost.com/archive/ca/entry/canadian-bitcoins-...


The places I've seen this at had specific verification codes for this. One had a simple static code per person that the hands-on guys looked up in a physical binder on their desk. Very disaster proof.

The other ones had a system on the internal network in which they looked you up, called back on your company phone and asked for a passphrase the system showed them. Probably more secure but requires those systems to be working.
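A sketch of a lookup-free variant of that second scheme: both the remote engineer and the on-site tech derive the same short code from a pre-shared secret and the current time window (essentially TOTP, RFC 6238), so verification still works when the internal lookup system is down. All names here are illustrative:

```python
# Sketch: HMAC-based time-windowed verification code. Both sides hold the
# same pre-shared secret; a 5-minute window gives the tech time to call back.
import hashlib
import hmac
import struct
import time

def verify_code(secret: bytes, step_s: int = 300, now=None) -> str:
    counter = int((time.time() if now is None else now) // step_s)
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha256).digest()
    offset = mac[-1] & 0x0F                      # dynamic truncation, TOTP-style
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return f"{code % 1_000_000:06d}"             # six spoken-friendly digits
```

Like the binder of static codes, this keeps working with no network; unlike the binder, a stolen code expires after one window.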


This is not a real datacenter case but ordinary social engineering. On the datacenter side you have many more security checks, plus much of the time the helping hands and engineers are part of the same company, using internal communication tools, etc., so they are on the same logical footprint anyhow.


Satellite telecommunication issues might seriously shut down whole regions if they occur.


I don't think so. I bet nobody is ever going to make that mistake at FB again after today.


I think it's the same with supply chains.


It just bit FB.


like today! xD



