Apple M1 Ultra - 2nd highest multicore score, lost to 64-core AMD Threadripper.

TheReal1980
22 hours ago, FakeKGB said:

Fair, but I doubt they know what AMD CPUs are.

They do know. macOS even accidentally shipped at one point with AMD APU drivers and AMD CPU IDs. Apple had clearly been testing first- and second-gen Zen chips internally but probably concluded it was not worth the effort, given they were moving to their own architecture soon after (properly supporting AMD chips to the extent macOS supported Intel would be a lot of work; custom work in the scheduler and system libraries like Accelerate would need to be hand-tuned). Worth remembering that the silicon industry is very fluid with staff: most experienced CPU engineers in the industry have worked at Intel, AMD, and Apple during their careers. This also applies to the chip teams within Apple, whose members have all worked at other silicon vendors in the past.

 


On 3/19/2022 at 6:54 AM, LAwLz said:

Not sure what exactly you are referring to, but Arm has native instructions for easily converting between the looser memory model of regular Arm and the stricter memory model of x86.

 

Arm, not Apple, has created several instructions specifically for this purpose. Anyone making Arm cores can implement them if they want. They're not Apple-specific.

ARMv8 does not include TSO memory modes (these are not different instructions; they are different modes the CPU is set into that affect all memory operations while in that mode). Sure, if you make your own Arm cores you can add any instructions/modes you want, but this is not part of the required ARMv8 spec (and is not present on any other ARMv8 CPU). Yes, the ARM spec does not forbid you from adding this mode, but the ARM spec does not forbid you from adding any mode you like.

Given that developers cannot toggle this mode themselves (it is limited to the kernel), the runtime environment that runs within the mode does not need to comply with any of the ARM spec, since the only code that runs within it is the translated Rosetta 2 executables.


On 3/20/2022 at 12:19 AM, hishnash said:

ARMv8 does not include TSO memory modes (these are not different instructions; they are different modes the CPU is set into that affect all memory operations while in that mode). Sure, if you make your own Arm cores you can add any instructions/modes you want, but this is not part of the required ARMv8 spec (and is not present on any other ARMv8 CPU). Yes, the ARM spec does not forbid you from adding this mode, but the ARM spec does not forbid you from adding any mode you like.

Given that developers cannot toggle this mode themselves (it is limited to the kernel), the runtime environment that runs within the mode does not need to comply with any of the ARM spec, since the only code that runs within it is the translated Rosetta 2 executables.

It's been a few years since I looked into this, but my understanding of TSO is that it is just Apple's way of enabling developers to easily access the standard Arm features/instructions that were designed for x86 memory consistency. A lot of them were introduced in the Armv8.3 extension. One example is LDAPR, which I believe has been supported in standard Arm cores, such as the A77, since 2019.

 

 

Hell, even ARMv7 had somewhat of a strong memory order mode. It was called "strongly ordered memory" back then. In ARMv8 it is called "Device-nGnRnE", the most restrictive device memory type.

 

But even without this memory ordering mode, chances are Apple's M1 would still be really fast at translating x86 code. Having strong memory ordering is nice, but it's not really needed, and in the cases where it is needed you can just throw in some barriers. Most programs will likely work just fine regardless of the memory mode on the M1.
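To make the "just throw in some barriers" point a bit more concrete, here is a toy sketch of the classic message-passing test. It is only an illustration under simplifying assumptions, not how Rosetta 2 or any real CPU works: the op tuples, the `"tso"`/`"weak"` model names and the helper functions are all made up for this example, and the `"tso"` branch is only valid for this particular test (no thread does a store followed by a load).

```python
from itertools import permutations

# Toy ops (hypothetical encoding): ("W", var) stores 1 to var,
# ("R", var) loads var, ("F", None) is a full barrier.
WRITER = [("W", "data"), ("W", "flag")]
READER = [("R", "flag"), ("R", "data")]
WRITER_FENCED = [("W", "data"), ("F", None), ("W", "flag")]
READER_FENCED = [("R", "flag"), ("F", None), ("R", "data")]


def allowed_orders(ops, model):
    """Return the per-thread execution orders this toy model permits."""
    if model == "tso":
        # TSO only lets a later load pass an earlier store. Neither thread in
        # this test has that pattern, so program order is kept as-is.
        return [tuple(op for op in ops if op[0] != "F")]
    # "weak": ops may be reordered freely, but never across a barrier.
    segments, current = [], []
    for op in ops:
        if op[0] == "F":
            segments.append(current)
            current = []
        else:
            current.append(op)
    segments.append(current)
    orders = [()]
    for segment in segments:
        orders = [done + tuple(p) for done in orders for p in permutations(segment)]
    return orders


def interleavings(a, b):
    """Yield every way to merge two op sequences, keeping each one's order."""
    if not a or not b:
        yield a + b
        return
    for rest in interleavings(a[1:], b):
        yield (a[0],) + rest
    for rest in interleavings(a, b[1:]):
        yield (b[0],) + rest


def outcomes(writer, reader, model):
    """Collect every (flag, data) pair the reader can observe."""
    seen = set()
    for w in allowed_orders(writer, model):
        for r in allowed_orders(reader, model):
            for schedule in interleavings(w, r):
                mem = {"data": 0, "flag": 0}
                regs = {}
                for kind, var in schedule:
                    if kind == "W":
                        mem[var] = 1
                    else:
                        regs[var] = mem[var]
                seen.add((regs["flag"], regs["data"]))
    return seen
```

The interesting outcome is `(1, 0)`: the reader sees the flag set but still reads stale data. In this sketch it can only show up in the `"weak"` model without barriers; the `"tso"` model and the fenced `"weak"` variant both rule it out, which is the sense in which barriers can stand in for a stronger memory mode.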


On 3/9/2022 at 9:44 AM, Spindel said:

At a power draw of 65-70 W on the CPU

It definitely draws more than that at peak load. Closer to 250W but that’s also the whole computer, including the graphics. 

Laptop: 2019 16" MacBook Pro i7, 512GB, 5300M 4GB, 16GB DDR4 | Phone: iPhone 13 Pro Max 128GB | Wearables: Apple Watch SE | Car: 2007 Ford Taurus SE | CPU: R7 5700X | Mobo: ASRock B450M Pro4 | RAM: 32GB 3200 | GPU: ASRock RX 5700 8GB | Case: Apple PowerMac G5 | OS: Win 11 | Storage: 1TB Crucial P3 NVME SSD, 1TB PNY CS900, & 4TB WD Blue HDD | PSU: Be Quiet! Pure Power 11 600W | Display: LG 27GL83A-B 1440p @ 144Hz, Dell S2719DGF 1440p @144Hz | Cooling: Wraith Prism | Keyboard: G610 Orion Cherry MX Brown | Mouse: G305 | Audio: Audio Technica ATH-M50X & Blue Snowball | Server: 2018 Core i3 Mac mini, 128GB SSD, Intel UHD 630, 16GB DDR4 | Storage: OWC Mercury Elite Pro Quad (6TB WD Blue HDD, 12TB Seagate Barracuda, 1TB Crucial SSD, 2TB Seagate Barracuda HDD)

12 minutes ago, DrMacintosh said:

It definitely draws more than that at peak load. Closer to 250W but that’s also the whole computer, including the graphics. 

As I said later in this thread I too would expect it to draw more (200-220 W) when you fully load CPU+GPU. But the GB test in OP only strains the CPU part of the SoC so that’s the number that is interesting when comparing to the artificial suns that are x86 CPUs.


This thing can reach 3080/3090, and they haven't even unleashed the big guns yet.

 

Gotta wonder what the upcoming Mac Pro would look like...

Desktop

Y4M1-II: AMD Ryzen 9-5900X | Asrock RX 6900XT Phantom Gaming D | Gigabyte RTX 4060 low profile | 64GB G.Skill Ripjaws V | 2TB Samsung 980 Pro + 4TB 870 EVO + 4TB SanDisk Ultra 3D + 8TB WD Black + 4TB WD Black HDD | Lian Li O11 Dynamic XL-X | Antec ST1000 1000W 80+ Titanium | MSI Optix MAG342CQR | BenQ EW3270U | Kubuntu

-------------------------------

Mobile devices

Kuroneko: Lenovo ThinkPad X1 Yoga 4th (Intel i7-10510U | 16GB RAM | 1TB SSD)


4 hours ago, YamiYukiSenpai said:

This thing can reach 3080/3090, and they haven't even unleashed the big guns yet.

 

Gotta wonder what the upcoming Mac Pro would look like...

While impressive, it can reach those cards when they’re constrained to half of their power envelope. 

MacBook Pro 16 i9-9980HK - Radeon Pro 5500m 8GB - 32GB DDR4 - 2TB NVME

iPhone 12 Mini / Sony WH-1000XM4 / Bose Companion 20


9 hours ago, YamiYukiSenpai said:

This thing can reach 3080/3090, and they haven't even unleashed the big guns yet.

 

Gotta wonder what the upcoming Mac Pro would look like...

*In ideal conditions, and only in certain slices of the power draw range for certain applications.

 

It is a good system, but Apple's claims of "fastest pc chip" don't hold water. A bit disappointing, considering I was just coming around to trusting their graphs. 

 

Despite the slightly disappointing GPU results, I am still optimistic, since the really awesome efficiency might allow for really cool form factors, which the 3090 or a Xeon won't allow.


On 3/19/2022 at 1:02 AM, FakeKGB said:

That's likely because Intel Mac users have no hecking idea what AMD even is. They know Intel is older, so if Apple Silicon > Intel, they're happy.

Dude, there are computer-illiterate Mac users, but there are plenty of people who use Macs and still know about the world outside of their Macintosh. Don't make such generalizations.


Interesting tidbit about the Ultra GPU.

 

Apparently it only draws around 30 W when for example running Blender. GB compute tops at around 50 W.

 

GFXBench apparently uses more power.

 

But it’s interesting that there seem to be some software issues in utilizing the full potential of the GPU.

 

I hope we will get some more in-depth investigation of this by the likes of AnandTech or some other outlet.


1 hour ago, Spindel said:

Apparently it only draws around 30 W when for example running Blender.

It's used in Blender? I thought the version with M1 GPU support wasn't out yet?

 

Edit:

Oh it came out on the 9th, very interesting.


Just now, leadeater said:

It's used in Blender? I thought the version with M1 GPU support wasn't out yet?

It has been out for about a week.


3 hours ago, Spindel said:

I hope we will get some more in-depth investigation of this by the likes of AnandTech or some other outlet.

You can attach Xcode debugging tools to Blender to inspect the GPU activity, and it's quite clear it's not doing that great a job even on the M1 Max.

This is a snapshot of about 3 seconds of work during the BMW GPU render. It should be solid orange, but you can see there are empty areas in the perf state, indicating nothing was running at all, and the little green segments show that the task that was dispatched was low enough priority that it was running in a low-power state.
[Image: GPU perf state timeline]

Also, if you look at the GPU counters that give an indication of how much cache, ALU, etc. are limiting the operations, you can see the shaders are doing a very poor job of saturating the GPU:
[Image: GPU counters]

In fact it's even worse than that: if you look at the shader breakdown you can see massive segments of unused time.

[Image: shader breakdown]

From looking at these graphs I would say more than 70% of the time is unused, and the time that is used has a very poor fill rate. The fact that there are any gaps between these scheduled calls is a big red flag: the system should be queueing up compute tasks well in advance so that there is a task ready to work on before the current task finishes (that is what Blender does for CUDA).
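As a rough illustration of why those gaps matter, here is a toy dispatch model. It is only a sketch with made-up numbers, not a measurement of Blender, Cycles or Metal; the function name and its parameters are hypothetical. It assumes the CPU needs a fixed time to prepare and submit each task, and that only `queue_depth` tasks may be submitted but unfinished at once.

```python
def gpu_utilization(num_tasks, gpu_ms, cpu_ms, queue_depth):
    """Fraction of wall time the GPU is busy in a toy dispatch model.

    gpu_ms: GPU time per task; cpu_ms: CPU time to prepare and submit a task;
    queue_depth: max tasks in flight (1 means wait for each task to finish
    before preparing the next one).
    """
    finish_times = []  # GPU completion time of each task so far
    cpu_free = 0.0     # when the CPU can start preparing the next task
    gpu_free = 0.0     # when the GPU finishes its current work
    for i in range(num_tasks):
        # The CPU has to wait for a queue slot: task (i - queue_depth) must be done.
        slot_free = finish_times[i - queue_depth] if i >= queue_depth else 0.0
        submit = max(cpu_free, slot_free) + cpu_ms
        cpu_free = submit
        # The GPU starts the task once it is submitted and the GPU is idle.
        start = max(submit, gpu_free)
        gpu_free = start + gpu_ms
        finish_times.append(gpu_free)
    return (num_tasks * gpu_ms) / gpu_free
```

With the invented values of 3 ms of GPU work and 1 ms of CPU work per task, a queue depth of 1 leaves the GPU busy only 75% of the time, because every task pays the full CPU round trip. Raising the depth to 2 lets the CPU prepare the next task while the current one runs, and utilisation in this model gets close to 100%.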


1 hour ago, hishnash said:

You can attach Xcode debugging tools to Blender to inspect the GPU activity, and it's quite clear it's not doing that great a job even on the M1 Max.

This is a snapshot of about 3 seconds of work during the BMW GPU render. It should be solid orange, but you can see there are empty areas in the perf state, indicating nothing was running at all, and the little green segments show that the task that was dispatched was low enough priority that it was running in a low-power state.
[Image: GPU perf state timeline]

Also, if you look at the GPU counters that give an indication of how much cache, ALU, etc. are limiting the operations, you can see the shaders are doing a very poor job of saturating the GPU:
[Image: GPU counters]

In fact it's even worse than that: if you look at the shader breakdown you can see massive segments of unused time.

[Image: shader breakdown]

From looking at these graphs I would say more than 70% of the time is unused, and the time that is used has a very poor fill rate. The fact that there are any gaps between these scheduled calls is a big red flag: the system should be queueing up compute tasks well in advance so that there is a task ready to work on before the current task finishes (that is what Blender does for CUDA).

The question then is: where does the problem lie?

 

Is it the Metal API or the implementation of the API?

 

As I said, this problem is not specific to Blender alone. In any case, either Apple will need to issue a fix to their drivers and/or the API, or the developers need to fix their implementation.
 


22 minutes ago, Spindel said:

Is it the Metal API or the implementation of the API?

It's not an issue with the API; it is possible to saturate these GPUs with compute workloads. Maybe RT workloads can't fully fill the ALUs, as they are mostly IO-bound unless you have costly shaders, but they should at least be pre-scheduling shader calls in advance rather than waiting for the shader to complete, calling back to the CPU, and then issuing the next one.

This is an issue with Blender not having been optimised yet. They only just got full Metal feature support into Cycles in Blender 3.1. I expect they were focusing on ensuring they could render all the object and shader types (Cycles supports quite a complex set of features when you look at it: volumetrics, multiple types of hair, particles, and then the entire node-based shader graph).

You should not start optimising until you have all these features supported. I expect over the next few releases we will see some rather large performance improvements as they tune shaders to maximise usage and as they tune the dispatch so that the GPU is not sitting around doing nothing waiting for the next task.


On 3/9/2022 at 8:35 PM, Alex Atkin UK said:

Pinch of salt indeed.

 

My Macbook Pro scores higher than my 9900K, almost as high as my 5950X, but in real-world desktop use both of those "feel" faster, they're just more responsive in general.

 

E.g. macOS is terrible when dealing with network drives; I feel like I'm back in the 90s with how long it can take. Linux and Windows will open the drive in less than a second, macOS can take minutes, and I've performed every single tweak I can find online to both macOS and Samba. It's not WiFi either: my wired Mac Mini M1 takes just as long, and hard-wiring the MacBook Pro makes zero difference either.

IDK, I have constant issues with Windows File Explorer crashing when not connected to the network. It takes multiple minutes for that garbage thing to suggest the troubleshooter, and then it fixes itself. macOS connects super fast, though it's a bit tedious.


On 3/26/2022 at 5:02 PM, Just that Mario said:

IDK, I have constant issues with Windows File Explorer crashing when not connected to the network. It takes multiple minutes for that garbage thing to suggest the troubleshooter, and then it fixes itself. macOS connects super fast, though it's a bit tedious.

I wish I knew what was up with MacOS, same problem on a Mac Mini and Macbook Pro - same server on Linux or Windows is absolutely fine.

Another annoying thing is macOS unmounts the drives whenever Samba is restarted on the server, so I regularly wake up the Mac to a "drives disconnected" message because my server updates at 3am automatically. Windows, on the other hand, doesn't care; it probably DOES still technically disconnect, but it sensibly just reconnects in the background so you'd never know. Why bother the end user about this unless the reconnection fails? I don't need to know, and I certainly shouldn't have to waste time going back into Finder to reconnect the drives. It has more than once caused resuming work from a network drive to fail outright, and it is problematic with the software I use (Topaz Video Enhance AI) because once that happens it doesn't seem to recover even once reconnected.

Granted, that's partly a flaw in that software, but it's one that wouldn't happen if macOS didn't behave so oddly.
 

Router:  Intel N100 (pfSense) + GL.iNet GL-X3000/ Spitz AX WiFi6: Zyxel NWA210AX (1.7Gbit peak at 160Mhz)
WiFi5: Ubiquiti NanoHD OpenWRT (~500Mbit at 80Mhz) Switches: Netgear MS510TXUP, MS510TXPP, GS110EMX
ISPs: Zen Full Fibre 900 (~930Mbit down, 115Mbit up) + Three 5G (~800Mbit down, 115Mbit up)
Upgrading Laptop/Desktop CNVIo WiFi 5 cards to PCIe WiFi6e/7


5 minutes ago, Alex Atkin UK said:


Granted, that's partly a flaw in that software, but it's one that wouldn't happen if macOS didn't behave so oddly.
 

I can’t say I’ve had the issues you’ve described here myself, but that said I never use Windows for hosting SMB shares (I only ever use NAS appliances or Linux Samba). I do know others who’ve had similar issues though.

 

 It’s unlikely to help with the disconnect issue but does following this help at all?


1 minute ago, Paul Thexton said:

I can’t say I’ve had the issues you’ve described here myself, but that said I never use Windows for hosting SMB shares (I only ever use NAS appliances or Linux Samba). I do know others who’ve had similar issues though.

 

 It’s unlikely to help with the disconnect issue but does following this help at all?

Not using Windows; I'm using Fedora and made all the recommended tweaks to smb.conf, which improved the initial display of the directory contents, but it's still painfully slow pulling in the rest of the listing when you scroll down.


2 minutes ago, Alex Atkin UK said:

Not using Windows; I'm using Fedora and made all the recommended tweaks to smb.conf, which improved the initial display of the directory contents, but it's still painfully slow pulling in the rest of the listing when you scroll down.

Just found this too, but I don’t know if Samba supports multichannel; I've never needed to look into it.

 

https://support.apple.com/en-us/HT212277


39 minutes ago, Paul Thexton said:

 It’s unlikely to help with the disconnect issue but does following this help at all?

Yes, that is what at least allowed the initial directory listing to appear almost immediately; before, it would hang even doing that.


3 hours ago, Paul Thexton said:

Just found this too, but I don’t know if Samba supports multichannel; I've never needed to look into it.

 

https://support.apple.com/en-us/HT212277

Samba supports both SMB Multi-Channel and SMB Direct. Not sure if they have transitioned from "experimental" or not; I last looked at these for Samba a few years ago. Either way, both work, and I have gotten both to work personally.


6 hours ago, Alex Atkin UK said:

Yes, that is what at least allowed the initial directory listing to appear almost immediately; before, it would hang even doing that.

I’m out of ideas then. May sound silly, but have you raised a Feedback Assistant ticket with Apple? They don’t always reply to them, but the more people use it to tell them there’s a problem, the better the slim chance that an engineer who cares will see it.


23 hours ago, Paul Thexton said:

I’m out of ideas then. May sound silly, but have you raised a Feedback Assistant ticket with Apple? They don’t always reply to them, but the more people use it to tell them there’s a problem, the better the slim chance that an engineer who cares will see it.

It seems to have been going on for a long time.


@Alex Atkin UK it definitely has. I’ve just never run into it myself and I’ve never quite understood why. I can only assume it’s down to how I tend to manage folder structures on my shared drives, because there’s not much else I tend to do differently.

 

 

