Linux kernel bugs seem to have the greatest required level of expertise to debug. I've been programming for a few years, and kernel debugging is definitely not in my toolkit, and I know very few people who would know how. Currently, I'm facing a wifi kernel bug that keeps spitting out call traces to the system log, that I have absolutely no idea what to do with.
Wifi is even worse than, e.g., the bug in the article because often it's some registers that are off, or some firmware bug, or even a chip bug - so you are basically helpless without intimate knowledge of the chip in question (available only under NDA, if at all). Much respect to anyone hacking on wifi (or anything else) in the kernel!
And if you're lucky, then the wifi chip will helpfully corrupt your RAM with its terrible DMA mismanagement in a way that allows you to notice that the memory you were expecting to contain your own values instead contains the ESSID of the hospital across the road from you: https://mjg59.dreamwidth.org/11235.html
I remember back in 2005 or so Intel was dicking around with ipw3945 shipping a userspace regulatory blob. Linux kernel devs didn't like this and, I guess, wouldn't merge the driver. Intel rewrote the driver into iwlwifi (iwl3945), which just loaded microcode out of the firmware directory.
However, the NDA compartmentalization issue would cause bugs. For instance, I guess the wifi microcode devs implemented hardware queuing but the Intel kernel driver devs at the time didn't know this, so they wrote their own queues in software. Users dealt with weird errors for some time because the driver devs literally couldn't ask the division that wrote the microcode how the hell it worked. I'm amazed anything with wifi works. For a time around then I just threw up my hands and used OpenBSD's reverse-engineered driver because it was more reliable and simpler to understand.
I once tried to chase a bug in a wifi driver on NetBSD and soon gave up as I was just completely lost. I also remember buying a book on driver development, hoping there would be some decent info on the 802.11 stack, and was mildly amused to find a brief mention at the end of a chapter describing Ethernet saying "sorry wifi is too complicated to cover".
Since that and about half a dozen other wifi kernel issues, I've been kinda fascinated by wifi and wifi stacks in the kernel, but it appears to be some black magic power not available to the masses.
For what it's worth, a counterpoint from an ex-kernel maintainer who now does web stuff and distributed systems: kernel programming was usually easier for me than your standard full-stack web developer stuff. The API surface of a web stack is so large -- juggling things like a JS framework, CSS, nuances of HTTP sessions, web security, databases, load balancers, etc.
Most single-machine kernel bugs are going to be easier to figure out, once you learn some kernel APIs, of which there aren't so many.
> Linux kernel bugs seem to have the greatest required level of expertise to debug.
Some of it, yeah. Honestly, I spent a lot of time looking at it and figured out how to fix/tweak minor crap.
Main thing I've been working on, on and off over the years, is CPU scheduler issues. Back when I was a high schooler my music kept skipping whenever my desktop box came under any substantial load. Turns out desktop workloads are way different from the sorts of workloads most of the devs are paid to work on. The situation's improved but honestly still sucks with the stock scheduler.
>Currently, I'm facing a wifi kernel bug that keeps spitting out call traces to the system log, that I have absolutely no idea what to do with.
iwlwifi? If you're on 5.6.0, it shipped broken. I think the fix landed in 5.6.1, so anything newer should work.
Traces are nice. At least you have some pointer to where to start.
1) You can google for the functions in the trace (to see if someone had a similar issue and it's solved already). You can add site:lkml.org to narrow it down to the Linux kernel mailing list.
2) You can go to http://lxr.free-electrons.com/ and search for the functions in the trace and look around the code to see what the issue might be. The trace will have some more info about the kind of issue (WARN, NULL pointer dereference, etc.)
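If you'd rather not pick the function names out by hand, here is a rough Python sketch of step 1. It assumes the usual `name+0xoffset/0xsize` frame format you see in dmesg or the journal, and it only builds query strings to paste into a search engine. The helper names (`trace_functions`, `search_queries`) and the sample trace are made up for illustration; they are not from any real driver or tool.

```python
import re

# A stack frame in a kernel trace usually looks like:
#   [  105.123470]  some_function+0x1a2/0x5c0 [module]
# This pattern captures the symbol name in front of "+0x.../0x...".
# Assumption: lower-case hex offsets, as printed by the kernel.
FRAME_RE = re.compile(r"([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")

def trace_functions(log_text):
    """Return function names found in the log, in order, without duplicates."""
    names = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match and match.group(1) not in names:
            names.append(match.group(1))
    return names

def search_queries(functions, limit=3):
    """Build one mailing-list search query per function, for the first few frames."""
    return [f'site:lkml.org "{name}"' for name in functions[:limit]]

if __name__ == "__main__":
    # Invented sample trace; the symbols below do not exist in the kernel.
    sample = """\
[  105.123456] Call Trace:
[  105.123460]  ? hypothetical_helper+0x12/0x40
[  105.123470]  example_rx_handler+0x1a2/0x5c0 [examplewifi]
[  105.123480]  example_rx_handler+0x1a2/0x5c0 [examplewifi]
"""
    for query in search_queries(trace_functions(sample)):
        print(query)
```

The topmost frames are usually the most specific to the failure, which is why the sketch only queries the first few. The same names can be pasted into the source browser from step 2.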