• 0 Posts
  • 19 Comments
Joined 1 year ago
Cake day: June 15th, 2023





  • It’s a proprietary config file. I think it’s a list of rules to forbid certain behaviours on the system. Presumably it’s downloaded by some userland service, but it has to be parsed by the kernel driver. I think the files get loaded OK, but the driver crashes when iterating over an array of pointers. Possibly these are the rules and some have uninitialised pointers, but this is speculation based on some kernel dumps on Twitter. So the bug probably existed in the kernel driver for quite a while, but they pushed a (somehow) malformed config file that triggered the crash.
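
To illustrate the kind of failure speculated about above, here is a minimal Rust sketch of a rule table where a bad entry becomes an explicit empty slot instead of an uninitialised pointer. Everything in it is an assumption for illustration: the real channel file format is proprietary, and the `Rule` struct, the `id:pattern` line format, and both function names are made up.

```rust
/// Hypothetical rule record; not the real channel file layout.
#[derive(Debug, PartialEq)]
struct Rule {
    id: u32,
    pattern: String,
}

/// Parse a made-up "id:pattern" line format into a table of optional rules.
/// A malformed line becomes `None` rather than a dangling pointer.
fn parse_rules(input: &str) -> Vec<Option<Rule>> {
    input
        .lines()
        .map(|line| {
            // `?` returns None from the closure if the line has no ':'.
            let (id, pattern) = line.split_once(':')?;
            let id = id.trim().parse::<u32>().ok()?;
            if pattern.is_empty() {
                return None;
            }
            Some(Rule { id, pattern: pattern.to_string() })
        })
        .collect()
}

/// Walk the table and collect the ids of valid rules.
/// Empty slots are skipped; nothing is dereferenced unchecked.
fn valid_rule_ids(table: &[Option<Rule>]) -> Vec<u32> {
    table.iter().flatten().map(|rule| rule.id).collect()
}
```

The point of the design is that the type system forces the iterating code to handle the "no rule here" case, which is the case that appears to have been missed in the driver.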


  • For this Channel File, yes. I don’t know what the failure rate is - this article mentions 40-70%, but there could well be a lot of variance between different companies’ machines.

    The driver has presumably had this bug for some time, but they’ve never had a channel file trigger it before. I can’t find any good information on how they deploy these channel files other than that they push several changes per day. One would hope these are always run on a diverse set of test machines to validate there’s no impact to functionality, but only they know the procedure there. It might vary based on how urgent a mitigation is or how invasive it’ll be, though they could just be winging it. It’d be interesting to find out exactly how this all went down.




  • It’s not that clear-cut a problem. There seem to be two elements: the kernel driver had a memory safety bug, and a definitions file was deployed incorrectly, triggering the bug. The kernel driver definitely deserves a lot of scrutiny, and static analysis should have told them this bug existed. The live updates are a bit different since this is a real-time response system. If malware starts actively exploiting a software vulnerability, they can’t wait for distribution maintainers to package their mitigation; it has to be deployed ASAP. They certainly should roll out definitions progressively and monitor for anything anomalous, but it has to be quick or the malware could beat them to it.

    This is more a code safety issue than CI/CD strategy. The bug was in the driver all along, but it had never been triggered before so it passed the tests and got rolled out to everyone. Critical code like this ought to be written in memory safe languages like Rust.
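
For what a progressive roll-out with monitoring could look like, here is a rough Rust sketch. It is not how CrowdStrike deploys anything; the stage sizes, the crash-rate threshold, and the names `Rollout` and `staged_rollout` are all hypothetical.

```rust
/// Outcome of a staged rollout (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum Rollout {
    Completed,
    /// Index of the stage whose crash rate exceeded the threshold.
    HaltedAt(usize),
}

/// `stages` holds (machines_updated, machines_crashed) as reported per stage,
/// ordered from the smallest canary group to the full fleet.
/// The rollout halts at the first stage whose crash rate is above `max_crash_rate`.
fn staged_rollout(stages: &[(u32, u32)], max_crash_rate: f64) -> Rollout {
    for (i, &(updated, crashed)) in stages.iter().enumerate() {
        if updated == 0 {
            continue; // nothing deployed in this stage, nothing to judge
        }
        let rate = f64::from(crashed) / f64::from(updated);
        if rate > max_crash_rate {
            return Rollout::HaltedAt(i);
        }
    }
    Rollout::Completed
}
```

For example, `staged_rollout(&[(100, 0), (1_000, 400)], 0.001)` halts at the second stage, before the update reaches anyone else. The stages can be minutes apart rather than days, so this does not have to conflict with getting a mitigation out quickly.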



  • This doesn’t really answer my question, but CrowdStrike do explain a bit here: https://www.crowdstrike.com/blog/technical-details-on-todays-outage/

    These channel files are configuration for the driver and are pushed several times a day. It seems the driver can take a page fault if certain conditions are met. A mistake in a config file triggered this condition and put a lot of machines into a BSOD bootloop.

    I think it makes sense that this was a preexisting bug in the driver which was triggered by an erroneous config. What I still don’t know is if these channel updates have a staged deployment (presumably driver updates do), and what fraction of machines that got the bad update actually had a BSOD.

    Anyway, they should rewrite it in Rust.



  • wouldn’t changing it just end up performative

    Exactly. Sidereal time does get rid of time zones and leap years, but it’s still referenced to a single physical object and relies on an arbitrary choice of start point. So it doesn’t create some perfect cosmic time standard.

    The international date line doesn’t help since that’s just 180° offset from Greenwich itself.

    The point of standards is that they can be followed by everyone. The AD/BC epoch is fine. The Greenwich meridian is fine. UTC is fine. Changing them would cause so much disruption that it cannot be worth it.

    Daylight savings can go die in a ditch though.


  • doesn’t change

    Citation needed.

    Do you use leap seconds to stay in sync with earth’s rotation? When would they be applied? How would spacefarers be notified of these updates?

    Also, what meridian do you choose for this ‘universal’ time? Is it still Greenwich? Because that’s peak colonial baggage.


  • Is there any reason to keep the existing set-up? If it’s just one drive, you could replace it with another and install Alma or something fresh. Then you could copy over whatever config the old system had to get up and running again. You could swap to the old drive if you needed to revert. If you have a spare machine, you could stand up the fresh setup side-by-side with the old one before swapping over.



  • Microsoft PowerToys has a pseudo-tiling wm for Windows. There are loads of new options on Linux so while few people from the total population are using them, I think they’re growing.

    I’m sure you could get by without a terminal on modern desktop-oriented distros. Windows has its own weirdness, like having to manually edit the registry. Just because there’s a GUI for that doesn’t make it a better user experience. A ton of issues are basically unfixable by users on Windows and Mac. I’m not decompiling their kernel to figure out why sleep is so flaky. Linux is much more reliable.