I understand that part of the "magic" behind the M1 is how it has some cores that are "performance" cores and other cores that are highly efficient "low power" cores.
My question is, how much of the sublime performance of M1 Macs comes from macOS being fine-tuned to take advantage of these two different types of cores?
If you simply get the bare minimum of NetBSD booting on an M1, will it not achieve nearly the same performance unless the OS is fine-tuned to schedule properly across the "performance" cores and the "efficient low power" cores?
I remember reading a recent article [0] about how future Intel chips plan to have similar "perf" and "low power" cores, and part of the presentation included someone from Microsoft saying they spent lots of time on the Windows team making sure Windows could schedule across these properly. So I wonder how much work it really takes.
ARM big.LITTLE[1] SoCs have been a thing for about a decade now, and most operating systems have schedulers that take advantage of each set of cores. macOS isn't doing anything special that Linux et al. aren't doing.
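As a concrete illustration of that on Linux: arm64 big.LITTLE systems publish each core's relative compute capacity in sysfs, which is the same signal the scheduler's energy-aware placement works from. A minimal sketch to read it (the `cpu_capacity` files simply don't exist on most x86 machines, hence the fallback):

```python
import glob
import os

def cpu_capacities():
    """Return {cpu_name: capacity} from sysfs; empty if this platform
    doesn't publish per-core capacities (typical on x86)."""
    caps = {}
    for path in glob.glob("/sys/devices/system/cpu/cpu[0-9]*/cpu_capacity"):
        cpu = os.path.basename(os.path.dirname(path))  # e.g. "cpu0"
        with open(path) as f:
            caps[cpu] = int(f.read().strip())
    return caps

print(cpu_capacities() or "no heterogeneous-capacity info exposed here")
```

On a big.LITTLE SoC this prints a mix of values (the big cores report the largest capacity); on a homogeneous machine it prints the fallback string.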
> macOS isn't doing anything special that Linux et al. aren't doing.
MacOS isn't doing anything Linux and others aren't doing, or MacOS isn't doing anything those others can't do?
That is, do we actually know how well tuned MacOS is for these cores and their capabilities, or is that an assumption? I thought I had read there were some specific instructions in the chip that were either new to it or were more aggressively used by MacOS to get additional energy savings or performance gains.
I don't know of anything really magical but for years Apple has been steadily pushing apps towards APIs that give the OS a lot of latitude to manage energy [1]. Grand Central Dispatch, AVFoundation, etc. Then on iOS BackgroundTasks etc (and iPhones have had little cores for quite a while now). I would imagine a lot of that experience transfers to macOS.
The centralized + draconian approach they take has a lot of problems, but it does help with sweeping changes like this.
Care to share what special things macOS is doing? Because according to Apple's documentation, it doesn't seem like they're doing anything special when it comes to heterogeneous multiprocessing and scheduling that Linux hasn't been doing for quite some time.
At a high level, yes, but at a much lower level it's another story.
When you manufacture your own chips and write your own OS, there are no limits on micro-tuning. You can design them to work together, instead of making the compromises you usually have to.
This kind of tuning might never land in the Linux kernel for being too chip-specific.
Apple has also moved a lot of driver code to another layer, so it doesn't need to live in the kernel, for example.
The core of the argument is the design, and the power to manufacture and update all the parts (device, firmware, drivers, OS). You can design them to work flawlessly together in the bigger picture. You can leave properties out of the kernel to be handled by OS apps. You can build hardware submodules, such as the DCP interface on M1 Macs, a main topic of discussion for Asahi Linux (https://asahilinux.org/2021/08/progress-report-august-2021/). You can add your own instructions for your own purposes, something that is hard to add to the Linux kernel.
In theory, you might be able to do the same with the Linux kernel, but in practice driver development and the rest rely on reverse engineering, black-box testing, or written specs without access to the source code. How time-consuming is that compared to Apple's position? Isn't it more likely that mainline kernel code is accepted when it works, not when it is perfectly optimized? You can't rely on some OS app handling something when Apple has full control over that.
Android and iOS are a better example of this. I'll post a few links which might give an idea.
> You can leave properties out of the kernel to be handled by OS apps.
Doable on Linux as well. If this were better for performance, it most likely would already have been implemented. Besides, this isn't concrete. By concrete I mean things that are known to be implemented on the M1 that, for example, Asahi won't be able to replicate.
This is just hardware. Even so, this example is a non-starter, since the DCP will be supported by Linux.
> You can add your own instructions for your own purposes, something that is hard to add to the Linux kernel.
It's actually not hard. It's trivial if you add compiler support (which Apple most likely would, for, you guessed it, LLVM). There are actually some custom instructions on the M1, AFAIK, mostly used to run x86 code more efficiently.
The top four points are pure hardware. Point 5 is about specific design decisions made in Android, which doesn't mean anything here. Point 7 even says that the most likely source of a performance increase would be custom co-processors, which again is pure hardware. I'm not sure what this link is supposed to achieve, but its arguments are opposed to Apple being better because of software/hardware magic.
This link again mentions the design decisions that make Android less responsive. The main culprit mentioned for why Android uses more RAM is that vendor builds of Android carry a lot more bloat. This has nothing to do with a magic hardware/software combo. This is vendor Android being trash.
If that is the case, though, I wouldn't be surprised if newer Linux and BSD releases gain additional support for per-core-type performance scheduling and the optimizations therein.
It's not entirely new; remember that pretty much all ARM processors that aren't MCUs have big.LITTLE. But there is no doubt additional work to be done in the area.
This answer seems optimistic. But unless you have a single CPU-bound thread of execution with no parallelism, and no other tasks needing runtime, having more cores, even little ones, seems like a win.
Even just pedestrian clock and interrupt processing could exploit the other cores. Or keyboard and mouse handling, whatever. Playing an MP3 while you compile? That other core sure would save the compiler some context switches...
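On Linux that kind of split can even be requested explicitly from user space. A rough sketch using the Linux-only affinity calls (the core IDs here are hypothetical; which numbers map to the little cores varies by SoC):

```python
import os

def pin_to_cpus(cpus):
    """Restrict the current process to the given set of core IDs,
    e.g. to keep a background decoder off the cores a compile is
    using. Returns the resulting affinity set, or None where the
    API isn't available (e.g. macOS)."""
    try:
        os.sched_setaffinity(0, set(cpus))
        return os.sched_getaffinity(0)
    except (AttributeError, OSError):
        return None

print(pin_to_cpus({0}))
```

In practice you'd usually let the scheduler decide and only hint via priorities; hard affinity is the blunt-force version of the idea.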
My Ubuntu VM on my Mac Mini gets outstanding performance which validates the point that macOS isn't essential for the performance. I'm sure however that macOS is very helpful in ensuring the power efficiency on laptops.
Same here. For many years, a MacBook Air 11 was my daily driver. After some time, I wiped Mac OS X and ran a minimal Linux configuration: XMonad, Emacs, Firefox and XTerm.
With a few tweaks, mostly those suggested by powertop, my battery life was indistinguishable from Mac OS. Which is impressive, given that Safari is known to be heavily optimized for low energy usage. I guess I compensated for that with a simpler graphics stack that generated fewer CPU wakeups.
I’m with you up through XMonad, Emacs, and XTerm, but... Firefox? Right now I’m struggling with an attempt to use a circa-2015 Dell XPS 13 as a Linux-based daily driver, and Firefox is nigh unusable with even only a few tabs. 4GB of RAM apparently doesn’t go far enough; swap degrades performance even with an SSD, but turning it off just means stuff dies. I’d love to find out I just set things up wrong, but I’m shocked to discover I was getting better performance out of Windows.
I wonder what else you're running on that machine. I have an i5 X201 from 2010 with 2GB of ram (and an SSD), and I regularly push it with 50-ish Firefox tabs.
However, I'm using i3 instead of GNOME, and Void instead of Debian et al.
It's Lubuntu, so I think the desktop is LXQt; I'd assume it's not that.
About the only unusual thing I can think of is that I'm trying to use dropbox. It dies periodically, so maybe it's hungry, but even without it running, less than a dozen FF tabs can bog down the machine (and I gave up on Chromium entirely).
Would totally welcome any tips from people confident I can do better.
Honestly, I wouldn't discount the desktop or the OS. On a fresh reboot, htop shows my cpu usage across the four cores as 0, 0.7, 0, and 1.3%. That's not a lot of background activity.
I won't claim to have late-model-Ryzen performance; there is an SSD performance hit when the machine uses some of the 16GB of swap I gave it. The website data has to go somewhere. But I haven't found it to become unusable, except when I restore and load all my tabs simultaneously. After it's all downloaded, though, pulling web pages out of swap is pretty fast.
Personally, I found the best things for performance were an SSD, i3+void, and a ton of swap space. Pretty much in that order.
Edit: I looked up the processors of the two machines. Ironically, all else being equal, that X201 is a full 20% faster than yours (2.2 vs 2.66).
macOS/iOS have APIs for marking jobs as background, and those jobs will run on the slow cores. And these APIs are used, AFAIK. I'm not sure if widely used Windows or Linux software routinely marks its threads as background jobs. I know that I never did that in my own software.
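For comparison, the closest portable knob outside Apple's QoS classes is plain POSIX niceness, which is a much blunter instrument. A minimal sketch (just an illustration, not Apple's API):

```python
import os

def run_as_background(work):
    # Demote the whole process to the lowest unprivileged priority.
    # This only hints at CPU scheduling; it carries none of the extra
    # semantics (I/O throttling, timer coalescing, little-core
    # placement) that a macOS background QoS class implies.
    os.nice(19)
    return work()

print(run_as_background(lambda: sum(i * i for i in range(1000))))
```

Note that niceness applies per process (or per thread via other calls), and once raised it can't be lowered again without privileges.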
On both macOS and Linux, process scheduling goes further than just niceness. macOS in particular has a concept of process priorities[1] and I/O policies, and the OS itself defines special priorities and policies for background processes.
Apple's system developers definitely deserve a lot of credit for optimising iOS / macOS Big Sur for their ARM hardware platform. If we could run another OS on it, it would be evident that part of the performance boost of Apple's M1 ARM processor is due to the optimised software it runs.
I used an intel macbook pro with an older version of Mac Os that had lots of background processes and features disabled just to get the performance I wanted.
My M1 Air was noticeably faster even with spotlight indexing and a massive build inside a virtual machine out of the box.
The system software (OS) is highly optimised for the M1 and thus adds greatly to its performance. Note that Apple has been developing iOS / iPadOS on the ARM platform for many years now.
[0] https://www.pcworld.com/article/3629502/intels-alder-lake-wh...