One of the most fascinating breach analyses I've ever read.
Reading between the lines, I sense the client didn't 100% trust Mr. Bogdanov in the beginning, and certainly knew there was exfiltration of some kind. Perhaps they had done a quick check of the same stats they guided the author toward. "Check for extra bits" seems like a great place to start if you don't know exactly what you're looking for.
Their front-end architecture seemed quite locked down and security-conscious: just a kernel + Go binary running as init, plain ol' NFS for config files, firewalls everywhere, bastion hosts for internal networks, etc. So already the client must have suspected the attack was of significant sophistication. Who was better equipped to do this than their brilliant annual security consultant?
Which is completely understandable to me, as this hack is of such unbelievable sophistication that it resembles a Neal Stephenson plot. Since the author did not actually commit the crime, and in fact is a brilliant security researcher, everything worked out.
> So already the client must have suspected the attack was of significant sophistication. Who was better equipped to do this than their brilliant annual security consultant?
If you suspected your security consultant, what would be the point of slipping them tiny hints about what you've found? If they're the source of the intrusion, they already know. If they're not the source of the intrusion, why fear them when you've already been compromised? Also, if you suspected the consultant, why hire them to do the security review?
I suspect the real reason is probably simpler: they have strong personal or financial incentives to "not have known" about the intrusion before the researcher discovered it.
I agree there's nothing to rule out your theory. Likely we will never know. But then why authorize sharing the story?
Specifically I don't think the owner thought it was likely, just a concern he couldn't shake. Probably he relaxed as soon as the consultant didn't make excuses, and tackled the job—extracting the binary from an unlinked inode is definitely not showing reluctance. Pure speculation, of course.
Hi, author of the blog post. This is correct - keeping PII protected has always been their concern, but recent breaches in theirs and other industries (including some they heard of that were not publicized) made them even more concerned.
I don't know if this is realistic in any way, but I've seen lots of Murder, She Wrote episodes where the criminal only gets caught because they become involved in the investigation some way and accidentally reveal knowledge that only the attacker could possibly know. This strategy necessitates hiding secret information so it can be revealed later by the attacker.
> This is hardly reducing the attack surface compared to a good distro with the usual userspace.
Run `tcpdump -n 'tcp and port 80'` on your frontend host and you'll still see PHP exploit attempts from 15 years ago. Not every ghost who knocks is an APT. A singleton Go binary running on a Linux kernel with no local storage is objectively a smaller attack surface than a service running in a container with /bin/sh, running on a vhost with a full OS, running on a physical host with thousands of sleeping VMs—the state of many, many websites and APIs today.
No, you have to understand what is really part of the attack surface and what the attacker wants.
For example, on a properly built system with a single application running with its own user the attacker might have no practical benefit at all in doing a privilege escalation to root.
> running on a physical host with thousands of sleeping VMs
This is a strawman. A shared hypervisor opens another attack surface and was not part of the discussion.
Look, my friend, we will have to disagree on this. What exploits will attack this setup from the front? A Linux networking or other syscall RCE, a Go compiler intrinsic RCE, a vulnerability in the app code, or a vulnerability in a third party library. All of which exist in the common OS-hosted scenario, in addition to everything else, plus you have both your container and your OS to worry about (e.g. openssl).
EDIT: Anyway, I'd like to thank Mr. Bogdanov and his client for sharing this story—it's just fascinating.
Sounds like a pretty nice way to get around having to constantly patch minor CVEs in base OS/distributions to maintain compliance - cut out the OS entirely.
No, it's not. You can deploy a very minimal Linux while also keeping the services that are actually good for security, like logging, IDS/IPS, certification compliance tooling, monitoring.
Unless you are running unnecessary daemons exposed on the Internet, 99% of the attack surface is from your application and the kernel itself.
Superb work. The "who" of attribution is more likely related to the actual PII they were after than any signature you'll get in the code. Seems like a lot of effort and risk of their malware being discovered for PII, rather than using it as an injection point into those users' machines. I rarely hear security people talk about why a system was targeted, and once you have that, you can know what to look for, inject canaries to test, etc.
It should be pretty easy for someone to differentiate between the Chinese people and the Chinese government.
Meanwhile, can you prove that this "innate xenophobia" is present in every human to an extent that it's actually relevant, and that this particular instance of suggesting that the malware is Chinese in origin meaningfully exacerbates it?
Moreover, China is a geopolitical rival to the United States, India, and other countries that constitute a majority of HN readers. Information like this is interesting from that viewpoint.
Threat modelling to develop useful risk mitigation requires that system owners essentially do a means/motive/opportunity test on the valuable data they have. The motive piece includes nation states as actors, and that matters in terms of how much recourse you are going to have against an attacker.
However, I'd propose a new convention that any unattributed attacks and example threat scenarios of nation states should use Canada as the default threat actor, because nobody would believe it or be offended.
Lol, no, s/he likely wouldn't, but s/he'll argue it's different because Trump didn't make any negative statements about them, so it's impossible to be xenophobic against them.
To prove my point, s/he had no problem with the top-level comment from 6 hrs ago: "mossad gonna mossad".
Ah, the value is in saying "the thing you're saying reads like this to an uninformed person". If the interpretation is correct, it reinforces the communication style chosen. If the interpretation is incorrect and the writer is aiming at this audience it is evidence against.
For instance, sometimes I say something to a friend and they misunderstand what I intend. The feedback on the misunderstanding permits me to recalibrate my communication and it helps them receive the right information.
I am not claiming that it was "the Chinese". I'm claiming that saying "Chinese APT" reads to me like this.
I guess it depends on when we talk about it, but it certainly matters whether it is the janitor / secret hacker in the building or someone somewhere against whom you have no legal recourse.
Conspiracy theory: the fact the POC insisted on the writer checking out the traffic suggests they knew about (or were suspicious of) the fact that PII was being leaked.
Probably, but is that a conspiracy theory so much as an insurance policy? Being able to competently complete that sort of nightmare investigation is probably why the investigator was re-hired annually.
A packet capture of the config files would show something was up to anyone suspicious, but knowing what to do about it is a completely different story.
The 'conspiracy' part of my conspiracy theory is not that they hired a security consultant, but that they explicitly guided him to the exact hardware[1] with the correct metric to detect it[2] asking him to test for a surprisingly accurate hypothetical[3], even going so far as to temporarily deny the suggestion of the person they're paying to do this work[4]. This is weirdly specific assuming they had no knowledge of the compromise.
Of course, I have no non-circumstantial evidence and this could all be a coincidence, which is why my comment is prefixed with "conspiracy theory".
1: "However, he asked me to first look at their cluster of reverse gateways / load balancers"
2: Active analysis would likely have been less effective at finding the issue, given the self-destruct feature
3: "Specifically he wanted to know if I could develop a methodology for testing if an attacker has gained access to the gateways and is trying to access PII"
4: "I couldn't SSH into the host (no SSH), so I figured we will have to add some kind of instrumentation to the GO app. Klaus still insisted I start by looking at the traffic before (red) and after the GW (green)"
Perhaps "the guy responsible for building the kernel" noticed his laptop was compromised. Then they'd know of a theoretical possibility of a compromise.
Not wanting to instrument the Go app could be an operational concern.
It sounded to me like they had a suspicion and specifically wanted the contractor to use his expertise in a limited way that would catch if the suspicion was right.
Perhaps they had noticed the programs restarting and, when trying to debug, triggered it.
#4 is a reasonable request. If the client wants to verify the lower-level ops instead of the higher-level application and deployment, the instrumentation would be counterproductive. That could happen if he was thinking something along the lines of "there's a guy here who compiles his own kernel on a personal laptop, I wonder what impact this has".
The other ones could be explained by him being afraid of leaking PII, and most PII being on that system.
Yes. That would be the person within the org handling the organization's relationship with the contractor, setting up their access, answering questions, guiding, propagating results, etc.
I'm trying to find the lesson in here about how to prevent this kind of incident in the first place. The nearest I can find is: don't build any production binaries on your personal machine.
Reproducible builds can go a long way, along with a diverse set of build servers which are automatically compared. Whether you use your personal machine or a CI system there's still the risk of it being compromised (though your personal machine is probably at a little more risk of that since personal machines tend to have a lot more software running on them than CI systems or production machines).
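To make "automatically compared" concrete, here's a minimal sketch, assuming each independent build server drops its artifact under artifacts/<builder>/app (the paths and names are placeholders, not anyone's actual setup):

```python
# compare_builds.py - a hedged sketch, not a drop-in tool.
# Assumes each independent build server has produced the same artifact
# (e.g. a Go binary built with -trimpath and a pinned toolchain) and
# copied it to artifacts/<builder-name>/app. All names are illustrative.
import hashlib
import sys
from pathlib import Path

def sha256(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def main() -> int:
    digests = {
        builder.name: sha256(builder / "app")
        for builder in Path("artifacts").iterdir()
        if (builder / "app").is_file()
    }
    for name, digest in sorted(digests.items()):
        print(f"{name}: {digest}")
    if len(set(digests.values())) != 1:
        print("MISMATCH: builds are not reproducible or a builder is compromised")
        return 1
    print("all builders agree")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```

With genuinely reproducible builds, any mismatch means either the build isn't reproducible yet or one of the builders has been tampered with.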
I'm paranoid, and I'd have considered the efforts described here to be pretty secure. I'll say the only counter to this grade of threat is constant monitoring, by a varied crew of attentive, inventive, and interested people. Even then, there's probably going to be a lot of luck needed.
The kind of eyes that can spot the hinky pattern while watching that monitor are the vital ingredient, and that's not something I can quantify. Or even articulate well.
I think you may have misinterpreted that part of the post - my understanding is that the Linux laptop that was being used was compromised, and there was a 3 month gap when that developer switched to a Windows machine before that became compromised too. Specifically it would be fascinating to learn whether the Windows host was compromised or if it was only the Linux VM.
> The developer machine was compromised in a deeper level (rootkit?)
Unlikely; that would not have taken 3 months.
> The developer installs a particular application in each Linux box
Possible, but also unlikely; as long as the VM wasn't used for other things, this also wouldn't have taken 3 months.
> The developer installs a particular application in each Linux box
There probably is, but it probably has nothing to do with this exploit. For the same reasons as mentioned above.
My guess is that it was a targeted attack against that developer, and there is a good chance the first and second attacks used different attack vectors, hence the 3-month gap.
My guess would be persistence in other parts of their network used to get the credentials of that developer in some way. Perhaps some internal webapp; perhaps credential reuse with some other system; perhaps malware installed in some development tool or script that the developer would pull from some other company system and run on their machine. Perhaps even phishing, which is much more likely to succeed if you have compromised some actual coworkers' machine and can send the malware through whatever messaging system you use internally.
(Assuming that the system on itself is designed with security in mind.)
The reasons are manifold but include:
- attacks against developer systems are often not considered, or considered less, in security planning
- many of the techniques you can use to harden a server conflict with development workflows
- there are a lot of tools you likely run on dev systems which add a large (supply-chain) attack surface (you can avoid this by always running everything in a container, including your language server and the core of your IDE's auto-completion features).
Some examples:
- docker group membership giving pseudo-root access
- the dev user has sudo rights, so a keylogger can gain root access
- build scripts of more or less any build tool (e.g. npm, maven plugins, etc.)
- locking down code execution on writable hard drives is not feasible (or is bypassed by python, node, java, bash).
- various selinux options messing up dev or debug tools
- various kernel hardening flags preventing certain debugging tools/approaches
- preventing LD_PRELOAD breaks applications and/or test suites
I think a big difference between build machines and dev machines, at least in principle, is that you can lock down the network access of the build machine, whereas developers are going to want to access arbitrary sites on the internet.
A build machine may need to download software dependencies, but ideally those would come from an internal mirror/cache of packages, which should be not just more secure but also quicker and more resilient to network failures.
Interestingly, this is grist for the mill of things we are currently thinking about. We're in the process of scaling up security and compliance procedures, so we have a lot of things on the table, like segregation of duties, privileged access workstations, and build and approval processes.
The path with the least overall headaches is to fully de-privilege all systems humans have access to during regular, non-emergency situations. One of those principles would be that software compiled on a workstation is automatically disqualified from deployment, and no human should even be able to push something into a repository the infra can deploy from.
Maybe I should even push container-based builds further and put up a possible project to just destroy and rebuild CI workers every 24 hours. But that will make a lot of build engineers sad.
Do note that "least headaches" does not mean "easy".
This is why I always insist on branches being protected at the VCS server level so that no code can sneak in without other's approval - the idea is that even if your machine is compromised, the worst it can do is commit malicious code to a branch and open a PR where it'll get caught during code review, as opposed to sneakily (force?) pushing itself to master.
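For GitHub-hosted repos, that server-side protection can also be scripted against the branch-protection REST endpoint; a rough sketch (owner, repo, and token are placeholders, and the field names should be double-checked against GitHub's current docs rather than trusted from memory):

```python
# protect_branch.py - illustrative only; field names follow GitHub's
# branch-protection REST endpoint as I recall it, so verify against
# the current API docs before relying on this.
import os
import requests

OWNER, REPO, BRANCH = "example-org", "example-repo", "master"  # placeholders

resp = requests.put(
    f"https://api.github.com/repos/{OWNER}/{REPO}/branches/{BRANCH}/protection",
    headers={
        "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
        "Accept": "application/vnd.github+json",
    },
    json={
        "required_pull_request_reviews": {"required_approving_review_count": 1},
        "enforce_admins": True,            # no bypassing review, even for admins
        "required_status_checks": None,
        "restrictions": None,
        "allow_force_pushes": False,       # a compromised laptop can't rewrite history
    },
    timeout=30,
)
resp.raise_for_status()
print("branch protection applied")
```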
If you use cloud services that offer automated builds you can push the trust onto the provider by building things in a standard (docker/ami) image with scripts in the same repository as the code, cloned directly to the build environment.
If you roll your own build environment then automate the build process for it and recreate it from scratch fairly often. Reinstall the OS from a trusted image, only install the build tools, generate new ssh keys that only belong to the build environment each time, and if the build is automated enough just delete the ssh keys after it's running. Rebuild it again if you need access for some reason. Don't run anything but the builds on the build machines to reduce the attack surface, and make it as self contained as possible, e.g. pull from git, build, sign, upload to a repository. The repository should only have write access from the build server.
Verify signatures before installing/running binaries.
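A sketch of that pull-build-sign-upload flow, assuming a Go service like the one in the article; the repo URL, commit, key ID, package path, and artifact host below are all invented placeholders:

```python
# build_and_sign.py - hedged sketch of an ephemeral build-box workflow.
# Everything here (URLs, commit, key ID, package path, artifact host) is
# a placeholder; the point is the shape: pin a commit, build clean, sign,
# push to the artifact repo, throw the box away.
import subprocess
import tempfile
from pathlib import Path

REPO = "git@example.internal:frontend/gateway.git"   # placeholder
COMMIT = "0123456789abcdef0123456789abcdef01234567"  # pin an exact commit
SIGNING_KEY = "build@example.internal"               # key living only on the build box

def run(*cmd, cwd=None):
    print("+", " ".join(cmd))
    subprocess.run(cmd, cwd=cwd, check=True)

with tempfile.TemporaryDirectory() as workdir:
    src = Path(workdir) / "src"
    out = Path(workdir) / "gateway"
    run("git", "clone", REPO, str(src))
    run("git", "checkout", COMMIT, cwd=src)
    # Reproducible-ish Go build: strip paths; the toolchain is pinned in the image.
    run("go", "build", "-trimpath", "-o", str(out), "./cmd/gateway", cwd=src)
    run("gpg", "--batch", "--yes", "--local-user", SIGNING_KEY,
        "--detach-sign", str(out))
    # Upload binary + signature; only this box has write access to the repo.
    run("rsync", str(out), str(out) + ".sig",
        "artifacts@repo.example.internal:/releases/")
```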
> If you use cloud services that offer automated builds you can push the trust onto the provider by building things in a standard (docker/ami) image with scripts in the same repository as the code, cloned directly to the build environment.
And I guess, for those super-critical builds, don't rely on anything but the distro repos or upstream downloads for tooling?
Because if you deploy your own build tools from your own infra, you risk tainting the chain of trust with binaries from your own tainted infra again. I'm aware of the trusting-trust issue, but compromising the signed gcc copy in Debian's repositories would be much harder than compromising some copy of a proprietary compiler in my own (possibly compromised) binary repository.
> And I guess, for those super-critical builds, don't rely on anything but the distro repos or upstream downloads for tooling?
You can build more tooling by building it in the trusted build environment using trusted tools. Not everything has to be a distro package, but the provenance of each binary needs to be verifiable. That can include building your own custom tools from a particular commit hash that you trust.
I believe the commenter meant that only the build server should be able to write to the build artifact repository, so ”write access from the build server” would be correct.
That cannot be the right lesson, because there's no inherent reason "personal machine" is any less safe than "building cluster" or whatever you have around. Yes, in practice it often is less secure to a degree, so it's not a useless rule, but it's not a solution either.
If it's solved some way, it's by reproducible builds and automatic binary verification. People are doing a lot of work on the first, but I think we'll need both.
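On the verification half, even a crude gate helps; a sketch of checking a detached signature before deploying (gpg invocation only, fingerprint pinning left as a comment):

```python
# verify_before_deploy.py - sketch of the "automatic binary verification"
# half: refuse to deploy anything whose detached signature doesn't verify.
# A real version should also pin the expected signing-key fingerprint
# (e.g. by parsing gpg's --status-fd output); this only checks that some
# key in the local keyring signed it.
import subprocess
import sys

def verified(binary: str, signature: str) -> bool:
    # gpg exits non-zero if the signature is bad or the key is unknown.
    result = subprocess.run(["gpg", "--verify", signature, binary])
    return result.returncode == 0

if __name__ == "__main__":
    binary, signature = sys.argv[1], sys.argv[2]
    if not verified(binary, signature):
        sys.exit(f"refusing to deploy {binary}: signature check failed")
    print(f"{binary}: signature OK")
```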
> there's no inherent reason "personal machine" is any less safe than "building cluster" or whatever you have around
Sure there is! I browse the internet a lot on my dev machines, and this exposes me to bugs in browsers and document viewers. And if I do get compromised, my desktop is so complex and runs so many services that the compromise is unlikely to be detected. So all an attacker needs is one zero day, once.
Compare this to a CI with infra-as-code, like Github Actions. If the build process gets compromised, it only matters until the next re-build. Even if you get hit by a supply-chain attack once (for example), once it is discovered all the attacker's footholds disappear! And even if they get the developers' keys, it is not easy to persist -- they have to make commits, and those can be noticed and undone.
(Of course if your "building cluster" is a bunch of traditional machines which are never reformatted and which many developers have root access to, then they are not that much more secure. But you don't have to do it that way.)
You rebuild your build cluster with what image? Where do the binaries there come from? And what machine rebuilds the machines?
Securing the machines themselves is a process of adding up always decreasing marginal gains until you say "enough", but the asymptote is never towards a fully secure cluster. That ceiling on how secure you can get is clearly suboptimal.
Besides, the ops people's personal machines have a bunch of high access permissions that can permanently destroy any security you can invent. That isn't any less true if your ops people work for Microsoft instead of you.
I mentioned "github actions" for a reason. You give up lots of control when you use them. In exchange, you get "crowd immunity" -- the hope that if there is a vulnerability, it will affect so many people that (1) you are not going to be the easiest target and (2) someone, somewhere will notice it.
Your build actions happen all in the docker images/ephemeral VMs. You use images directly distributed by the corresponding project, for example you may start directly from Canonical's Ubuntu image. The "runners" are provided by Github, and managed by Microsoft's security team as well. The only thing that you actually control is a 50-line YAML file in your git repo, and people will look at it any time they want to add a new feature.
Yes, if someone hacks Microsoft's ops people, they can totally mess up my day. But would they? Every use of a zero-day carries some risk, so if attackers do get access to those systems, they're much more likely to go for some sort of high-value, easy-money target like a cryptocurrency exchange. Plus, I am pretty sure that Microsoft actually has solid security practices, like automatic deployments, 2FA everywhere, logging, auditing, etc... They are not going to have a file on a CI/CD machine that differs from the one in Git, like OP's system did!
The APTs do not have magical powers; they buy from the same exploit market as everyone else.
Let's say my organization (which is not very well known) has an exploitable bug. What are the chances that someone will discover it? Pretty close to none; the hole can sit there for many years waiting for an APT to come and exploit it.
Now imagine the Github runner or the default Ubuntu image has an exploitable bug. What are the chances it will last long? Not very high. In a few months, someone will discover it and either report or exploit it. Then it will be fixed and no longer helpful for APT threat actors.
Remember, the situation described in the post only occurred because they used binary images that only a few people could look at. Generating a kernel binary on someone's laptop is easy to subvert in an undetectable way, but how do you subvert a Dockerfile stored in a Git repo without it being obvious?
Use a PaaS like Heroku or Google App Engine, with builds deployed from CI. All the infrastructure-level attack surface is defended by professionals who at least have a fighting chance.
I feel reasonably competent at defending my code from attackers. The stuff that runs underneath it, no way.
If you double your defences, you double the cost, but advanced attackers still get what they want. "Ratchet up defences" doesn't mean simply doing things a bit more correctly; it requires you to hire many expensive people to do lots of things that you didn't do before. This article is a good example - the company as described seems to have a very good (and expensive) level of security already, the vast majority of companies are much less secure, and it still wasn't sufficient.
And if you increase your defences so much that you're actually somewhat protected from an advanced attacker, you're very, very far on the security vs usability tradeoff, to get there is an organization-wide effort that (unlike simple security basics/best practices) makes doing things more difficult and slows down your business. You do it only if you really have to, which is not the case for most organizations - as we can see from major breaches e.g. SolarWinds, the actual consequences of getting your systems owned are not that large, companies get some bad PR and some costs, but it seems that prevention would cost more and still would be likely to fail against a sufficiently determined attacker.
Build everything on a secured CI/CD system, keep things patched, monitor traffic egress especially with PII, manual review of code changes, especially for sensitive things
This is truly the stuff of nightmares, and I'm definitely going to review our CI/CD infrastructure with this in mind. I'm eagerly awaiting learning what the initial attack vector was.
If people didn't allow macros in Excel, stayed in read-only mode in Word and only opened sandboxed PDFs (convert to images in sandbox, OCR result, stitch back together), we would see a sharp decline in successful breaches. But that would be boring.
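For the PDF part, a rough sketch of the convert-to-images idea, assuming poppler's pdftoppm and the img2pdf package are available (the OCR step is left out, and in practice the rendering would run inside a throwaway sandbox or container):

```python
# flatten_pdf.py - hedged sketch: re-render a PDF as plain raster pages so
# no scripting, embedded files, or exotic PDF features survive.
# Assumes pdftoppm (poppler-utils) and the img2pdf package are installed;
# run the rendering step inside a disposable sandbox in real life.
import subprocess
import sys
import tempfile
from pathlib import Path

import img2pdf

def flatten(untrusted_pdf: str, safe_pdf: str) -> None:
    with tempfile.TemporaryDirectory() as tmp:
        # Render every page to a PNG at 150 dpi; the renderer is the only
        # thing that ever parses the untrusted file.
        subprocess.run(
            ["pdftoppm", "-png", "-r", "150", untrusted_pdf, f"{tmp}/page"],
            check=True,
        )
        pages = sorted(str(p) for p in Path(tmp).glob("page-*.png"))
        # Stitch the rasterized pages back into a dumb, image-only PDF.
        with open(safe_pdf, "wb") as out:
            out.write(img2pdf.convert(pages))

if __name__ == "__main__":
    flatten(sys.argv[1], sys.argv[2])
```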
So the attacker has to have exploits in every pdf reader app on linux? Since it is not Adobe only and there are quite a few. Or maybe a common backend engine (mupdf and poppler)...
Yeah, I suspect that rather a lot of the options use the same libraries; https://en.wikipedia.org/wiki/Poppler_(software) claims that poppler is used by Evince, LibreOffice 4.x, and Okular (among others).
This has advantages and disadvantages; yes, if there is a security hole in it, it likely affects everything that uses it. But it also means it gets use-case tested more thoroughly, at a minimum. Ideally, all "stakeholders" would have a vested interest in doing reviews of their own, or perhaps pooling money to have the code scrutinized.
An attacker doesn’t need every attack to work every time. One breach is usually enough to get into your system, so long as they can get access to the right machine.
I heard a story from years ago that security researchers tried leaving USB thumb drives in various bank branches to see what would happen. They put autorun scripts on the drives so they would phone home when plugged in. Some 60% of them were plugged in (mostly into bank computers).
The attacker obviously does not need to have exploits in every pdf reader app on linux; it needs an exploit in a single pdf reader app out of all those which someone in your organization is using. If 99% of your employees are secure but 1% are not, you're vulnerable. Perhaps there's a receptionist in your Elbonian[1] branch on an outdated, unpatched computer, and that's as good an entry point into your network as any other, with possibilities for lateral movement to their boss or an IT support person's account and onwards from there. In this particular case, a developer's Linux machine was the point of persistence where the malware got inserted into their server builds; however, most likely that machine wasn't the first point of entrance into their systems.
Remember how Adobe removed Flash support from Acrobat a couple of years back? Attacks like this are why. Well, and Flash had other issues, too.
I'm not sure when you started using PDFs (I remember mid-90s when my Dad told me about this cool new document format that would standardize formats across platforms, screen and paper!), but hardly anything is static any more.
The nexus of unsafe programming languages and exploit markets, where for the right price you can purchase undisclosed bugs basically ready to use. Modern offensive security is essentially like shopping at Ikea.
This is the kind of content I come to HN for! I don't get to do a lot of low level stuff these days, and my forensics skills are almost non-existent, so it's really nice to see the process laid out. Heck, just learning of binwalk and scapy (which I'd heard of, but never looked into) was nice.
Consider the possibility that its fiction. Would you be upset? I wouldn't, perhaps a bit disappointed not to learn more. This certainly fits into "worthy of itself".
Please change the posting title to match the article title and disambiguate between APT (Advanced Persistent Threats, the article subject) and Apt (the package manager).
Thanks, I don’t work in security but I use APT a lot.
I thought it was an unfunny joke? Like ... APT provides some of those packages?
Ok. That makes more sense.
The author did a good job at making that readable. Is it often like that?
You're right... what an annoying namespace collision. On the other hand, stylizing software as Initial Caps is much more acceptable than stylizing non-software acronyms that way, so it would still be less misleading to change the capitalization.
Poster here.
Do you think I need to edit the title?
This title was funny to me, but probably just because I am a security guy and I know what an APT is.
Who is such a hot target and can take such an independent attitude, even to allowing this to be published? If this had been a bank, they'd have had to report to regulators, and likely we'd have heard none of these details for years, if ever. Same for almost anything else big enough to be a target that I can think of offhand.
Idk. While banks have to report on this, they are (as far as I know) still free to publicize details.
We normally don't hear about these things not because they can't speak about them, but because they don't want to (bad press).
My guess is that it's a company which takes security relatively seriously, but isn't necessarily very big.
> hot target [..] else big enough to be a target
I don't think you need to be that big to be a valid target for an attack of this kind, nor do I think this attack is on a level where "only the most experienced/best hackers" could have pulled it off.
I mean, we don't know how the dev laptop was infected, but given that it took them 3 months to reinfect it, I would say it most likely wasn't a state actor or similar.
I think you're right that it's medical. The author calls out that PII was the target. Sure, there's PII in Defense/Fintech/Government, but it's probably not the target in those sectors, and PII doesn't have the same spotlight on it as in the medical world (e.g. HIPAA & GDPR).
Are you saying that, for example, the addresses of military generals and spies are less of a target for hackers than the addresses of medical patients? While there are laws to protect medical information, I think all governments care more about protecting national security information.
Ah, good point! No, I was not saying that at all, and thank you for pointing that out.
When I was thinking of "defense", I was thinking of the defense contractors who are designing/building things like the next-gen weapons, radar, vehicles, and the like. In that context, when it comes to what they can exfiltrate, I think attackers probably prioritize the details & designs over PII. Just a guess though.
Not just vaccines, but basically all your data, including billing and disease history. Perfect for both scamming and extortion.
Keep in mind that you actually want your medical provider to have that data, so they can treat you with respect to your medical history, without killing you in the process.
True. However, reading between the lines, the exfiltration "project" was targeted (i.e. one-off), skilled and long. I would put the cost anywhere between 1 megabuck and 10 megabucks. Given risks and dubious monetization, I would assume the "sponsor" demands at least a 10x ROI.
How about psychiatric data from the area around Washington DC? Hospitals/practices that are frequented by New York CEO-types? I can picture that being quite valuable to the right parties.
One thing I didn't get is this magical PII thing. How does the author look at a random network packet -- nay, just packet headers -- and assign a PII:true/false label? I think many corporations would sacrifice the right hand of a sysadmin if that was the way to get this tech.
The article just says:
> I wrote a small python program to scan the port 80 traffic capture and create a mapping from each four-tuple TLS connection to a boolean - True for connection with PII and False for all others.
Is it just matching against a list of source IPs? And perhaps the source port, to determine whether it comes from e.g. a network drive (NFS in this case)? Not sure what he uses the full four-tuple for, if this is the answer in the first place. It's very hand-wavy for what is an integral part of finding the intrusion and kind of a holy grail in other situations as well.
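For what it's worth, the mechanics of the four-tuple mapping itself are straightforward; here's a hedged guess at its shape using scapy. The PII heuristic is exactly the part the article doesn't explain, so the looks_like_pii stub below is pure assumption:

```python
# tuple_map.py - sketch of mapping each TCP connection four-tuple to a
# PII boolean, in the spirit of the article's "small python program".
# How the author actually decided PII-vs-not is not described; the
# looks_like_pii() stub here is a placeholder assumption.
from scapy.all import rdpcap, IP, TCP

def looks_like_pii(pkt) -> bool:
    # Placeholder: e.g. traffic to/from backends known to serve PII,
    # or connections the client had already flagged upstream.
    return False

def connection_map(pcap_path: str) -> dict:
    conns = {}
    for pkt in rdpcap(pcap_path):
        if IP in pkt and TCP in pkt:
            key = (pkt[IP].src, pkt[TCP].sport, pkt[IP].dst, pkt[TCP].dport)
            conns[key] = conns.get(key, False) or looks_like_pii(pkt)
    return conns

if __name__ == "__main__":
    for four_tuple, has_pii in connection_map("red_side.pcap").items():
        print(four_tuple, has_pii)
```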
Amazon and Microsoft also have their own offerings, but can be quite expensive for network packets (and pretty slow).
Most projects / teams will use some basic regular expressions to capture basics like SSN, credit card numbers or phone numbers. They’re typically just strings of a specific length. More difficult if you’re doing addresses, names, etc.
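A toy version of that regex approach, with a Luhn check added to weed out random digit strings that happen to look like card numbers (the patterns are deliberately simplistic, not production DLP):

```python
# naive_pii_scan.py - sketch of the basic regex approach (SSNs, card
# numbers, phone numbers). Real DLP products are fancier; this only
# shows the shape, and the patterns are intentionally crude.
import re

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD = re.compile(r"\b(?:\d[ -]?){13,19}\b")
PHONE = re.compile(r"\b\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}\b")

def luhn_ok(digits: str) -> bool:
    # The Luhn checksum weeds out most random 13-19 digit strings.
    nums = [int(c) for c in digits][::-1]
    total = sum(nums[0::2]) + sum(sum(divmod(2 * d, 10)) for d in nums[1::2])
    return total % 10 == 0

def scan(text: str) -> dict:
    cards = [m.group().replace(" ", "").replace("-", "")
             for m in CARD.finditer(text)]
    return {
        "ssn": SSN.findall(text),
        "card": [c for c in cards if luhn_ok(c)],
        "phone": PHONE.findall(text),
    }

if __name__ == "__main__":
    print(scan("call 555-867-5309, card 4111 1111 1111 1111"))
```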
That's great for you but... how's that relevant to the article? The author never speaks of using this sort of thing.
I saw these regex matchers in school but don't understand them. They go off all day long because one in a dozen numbers match a valid credit card number, even in the lab environment the default setup was clearly unusable. But perhaps more my point: who'd ever upload the stolen data plaintext anyhow? Unencrypted connections have not been the default for stolen data since... the 80s? If your developers are allowed to do rsync/scp/ftps/sftp/https-post/https-git-smart-protocol then so can I, and if they can't do any of the above then they can't do their work. Adding a mitm proxy is, aside from a SPOF waiting to happen, also very easily circumvented. You'd have to reject anything that looks high in entropy (so much for git clone and sending PDFs) and adding a few null bytes to avoid that trigger is also peanuts.
These appliances are snakeoil as far as I've seen. But then I very rarely see our customers use this sort of stuff, and when I do it's usually trivial to circumvent (as I invariably have to to do my work).
Now the repository you linked doesn't use regexes, it uses "a cutting edge pre-trained deep learning model, used to efficiently identify sensitive data". Cool. But I don't see any stats from real world traffic, and I also don't see anyone adding custom python code onto their mitm box to match this against gigabits of traffic. Is this a product that is relevant here, or more of a tech demo that works on example files and could theoretically be adapted? Either way, since it's irrelevant to what the author did, I'm not even sure if this is just spam.
> One thing I didn't get is this magical PII thing. How does the author look at a random network packet -- nay, just packet headers -- and assign a PII:true/false label? I think many corporations would sacrifice the right hand of a sysadmin if that was the way to get this tech.
Check out Amazon Macie or Microsoft Presidio, or try actually using the library I linked?
It’s usually used in a constrained way, in no way perfect. But it helps investigators track suspected cases of data exfiltration. You can pull something that looks suspect (say a credit card) and compare against an internal dataset and see if it’s legit.
In the repo I linked you can see the citation for an earlier model on synthetic and real world datasets:
My guess was that traffic containing PII was flagged in some way such that it was visible in the pre-GW traffic the researcher had access to. That was the point of linking up the pre-gateway and post-gateway packets. I'm not sure how common such setups are.
What's even more incredible to me is that the researcher somehow recreated exactly the same / correct traffic pattern on their local testing setup, so that they were able to compare the traffic with the production environment to detect that there was a problem. How would you do this?
I'm not even sure what the "time" variable is on the graphs. Response time? (It also seems weird that there's any PII on port 80, but that's an unrelated issue.)
> What's even more incredible to me is that the researcher somehow recreated exactly the same / correct traffic pattern on their local testing setup, so that they were able to compare the traffic with the production environment to detect that there was a problem.
Yeah, that's another thing that has me confused, but I figured one thing at a time...
Thanks for the response, that pre-set PII flag does sound plausible, though it's odd that they'd never mention it and mention a 'four-tuple' instead (sounds like they're trying to use terms not everyone knows? Idk, maybe it's more well-known than it seems to me).
Yes, that was the part where I got lost. It seems he skipped some details about that so it's not clear from the article how that was done. I can't imagine capturing the encrypted data got him that.
This observation is way too casual imo:
"We noticed a 3 month gap about 5 month ago, and it corresponded with the guy moving the kernel build from a Linux laptop to a new Windows laptop with a VirtualBox VM in it for compiling the kernel. It looks as if it took the attackers three months to gain access back into the box and into the VM build."
If the attackers are able to brute-force their way into OS engineers' / sysadmins' work PCs, then that should probably be the headline. The rest is just about not being found.
Maybe if you are a business oriented person. But reading through the analysis, I felt like the researcher seriously enjoyed the hunt and the "not being found" part.
> On March 21, 2021, CNA determined that it sustained a sophisticated cybersecurity attack. The attack caused a network disruption and impacted certain CNA systems, including corporate email. Upon learning of the incident, we immediately engaged a team of third-party forensic experts to investigate and determine the full scope of this incident, which is ongoing.