Introducing the DPU, Processing In Ram

porina
Quote

The idea behind In-Memory Processing, or ‘Processing In-Memory’, is that a number of those simple integer or floating point operations should be done while the memory is still in DRAM – no need to cart it over to the CPU, do the operation, and then send it back. If the data can stay there and be updated, this saves time and power without affecting the result. Alternatively, perhaps compute on the CPU can be reduced if results are sent back out to main memory and a final XOR is applied to the data in memory. That frees up the main CPU core to do other compute related things, or reduces the effective memory bandwidth should it be a limiting factor.

https://www.anandtech.com/show/14750/hot-chips-31-analysis-inmemory-processing-by-upmem

 

Where can we add more processing potential? Let the ram handle some of the simpler calculations. In this implementation, a Data Processing Unit (DPU) is one "core" per 64MB of ram. It is capable of some functions, but won't replace your CPU any time soon. Potential benefits include saving some ram bandwidth and reducing the energy needed for the processing tasks it can handle. The designers claim it does not add much manufacturing cost or power over standard ram. It does require a recode, so it isn't a drop-in-and-run enhancement. Initially it will likely be targeted by applications specifically (re)coded to take advantage of it. Maybe future generations will get advanced and generic enough that it could reach consumer level gear.
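To make the data-movement argument a bit more concrete, here is a toy Python model. It is only a sketch and not UPMEM's API: `cpu_xor`, `dpu_xor`, `dpus_for` and the 2-byte command size are made up for illustration. Only the 64MB-per-DPU figure comes from the description above.

```python
BANK_BYTES = 64 * 1024 * 1024  # one DPU per 64MB of ram, as described above


def cpu_xor(bank: bytearray, key: int) -> int:
    """Conventional path: each byte goes to the CPU and back again."""
    moved = 0
    for i in range(len(bank)):
        value = bank[i]        # ram -> CPU
        bank[i] = value ^ key  # CPU -> ram
        moved += 2
    return moved


def dpu_xor(bank: bytearray, key: int) -> int:
    """Hypothetical in-memory path: the XOR happens next to the data."""
    for i in range(len(bank)):
        bank[i] ^= key         # stays inside the memory chip in this model
    return 2                   # assumed command size: 1 opcode byte + 1 key byte


def dpus_for(total_bytes: int) -> int:
    """Number of 64MB banks (and so DPUs) needed to cover total_bytes."""
    return -(-total_bytes // BANK_BYTES)  # ceiling division


if __name__ == "__main__":
    a = bytearray(range(256)) * 4   # 1KB test buffer
    b = bytearray(a)
    print(cpu_xor(a, 0x5A), dpu_xor(b, 0x5A), a == b)
    print(dpus_for(8 * 1024**3))
```

For the 1KB buffer the model counts 2048 bytes crossing the bus on the CPU path against 2 command bytes on the in-memory path, with identical contents afterwards. Real hardware will have its own overheads, so treat the ratio as the shape of the argument rather than a measurement.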

Gaming system: R7 7800X3D, Asus ROG Strix B650E-F Gaming Wifi, Thermalright Phantom Spirit 120 SE ARGB, Corsair Vengeance 2x 32GB 6000C30, RTX 4070, MSI MPG A850G, Fractal Design North, Samsung 990 Pro 2TB, Acer Predator XB241YU 24" 1440p 144Hz G-Sync + HP LP2475w 24" 1200p 60Hz wide gamut
Productivity system: i9-7980XE, Asus X299 TUF mark 2, Noctua D15, 64GB ram (mixed), RTX 3070, NZXT E850, GameMax Abyss, Samsung 980 Pro 2TB, random 1080p + 720p displays.
Gaming laptop: Lenovo Legion 5, 5800H, RTX 3070, Kingston DDR4 3200C22 2x16GB 2Rx8, Kingston Fury Renegade 1TB + Crucial P1 1TB SSD, 165 Hz IPS 1080p G-Sync Compatible

Interesting. Sounds like something that will be picked up in enterprise and supercomputer stuff for a while before we get our hands on it though.

Current LTT F@H Rank: 90    Score: 2,503,680,659    Stats

Yes, I have 9 monitors.

My main PC (Hybrid Windows 10/Arch Linux):

OS: Arch Linux w/ XFCE DE (VFIO-Patched Kernel) as host OS, windows 10 as guest

CPU: Ryzen 9 3900X w/PBO on (6c 12t for host, 6c 12t for guest)

Cooler: Noctua NH-D15

Mobo: Asus X470-F Gaming

RAM: 32GB G-Skill Ripjaws V @ 3200MHz (12GB for host, 20GB for guest)

GPU: Guest: EVGA RTX 3070 FTW3 ULTRA Host: 2x Radeon HD 8470

PSU: EVGA G2 650W

SSDs: Guest: Samsung 850 evo 120 GB, Samsung 860 evo 1TB Host: Samsung 970 evo 500GB NVME

HDD: Guest: WD Caviar Blue 1 TB

Case: Fractal Design Define R5 Black w/ Tempered Glass Side Panel Upgrade

Other: White LED strip to illuminate the interior. Extra fractal intake fan for positive pressure.

 

unRAID server (Plex, Windows 10 VM, NAS, Duplicati, game servers):

OS: unRAID 6.11.2

CPU: Ryzen R7 2700x @ Stock

Cooler: Noctua NH-U9S

Mobo: Asus Prime X470-Pro

RAM: 16GB G-Skill Ripjaws V + 16GB Hyperx Fury Black @ stock

GPU: EVGA GTX 1080 FTW2

PSU: EVGA G3 850W

SSD: Samsung 970 evo NVME 250GB, Samsung 860 evo SATA 1TB 

HDDs: 4x HGST Deskstar NAS 4TB @ 7200RPM (3 data, 1 parity)

Case: SilverStone GD08B

Other: Added 3x Noctua NF-F12 intake, 2x Noctua NF-A8 exhaust, Inatek 5 port USB 3.0 expansion card with usb 3.0 front panel header

Details: 12GB ram, GTX 1080, USB card passed through to windows 10 VM. VM's OS drive is the SATA SSD. Rest of resources are for Plex, Duplicati, Spaghettidetective, Nextcloud, and game servers.

So mining on a HDD isn't that much of a stretch after all...

"We also blind small animals with cosmetics.
We do not sell cosmetics. We just blind animals."

 

"Please don't mistake us for Equifax. Those fuckers are evil"

 

This PSA brought to you by Equifacks.
PMSL

So basically... a math co-processor, separate from the CPU.

Where have I heard this before?

PLEASE QUOTE ME IF YOU ARE REPLYING TO ME

Desktop Build: Ryzen 7 2700X @ 4.0GHz, AsRock Fatal1ty X370 Professional Gaming, 48GB Corsair DDR4 @ 3000MHz, RX5700 XT 8GB Sapphire Nitro+, Benq XL2730 1440p 144Hz FS

Retro Build: Intel Pentium III @ 500 MHz, Dell Optiplex G1 Full AT Tower, 768MB SDRAM @ 133MHz, Integrated Graphics, Generic 1024x768 60Hz Monitor


 

31 minutes ago, rcmaehl said:

So basically... a math co-processor, separate from the CPU.

Where have I heard this before?

Co-processors aren't new; even today you could see a GPU as a co-pro to the CPU. The "new" thing here is sticking it inside the ram. The operations themselves aren't remarkable, but by working in ram you cut out a lot of the transfers between ram and CPU.

What kind of useful calculations though? If the CPU has enough bandwidth, could you compress/encrypt on the fly? Though I'm not sure if it's worth lifting that off 1 CPU core...

1 minute ago, TechyBen said:

What kind of useful calculations though? If the CPU has enough bandwidth, could you compress/encrypt on the fly? Though I'm not sure if it's worth lifting that off 1 CPU core...

You'll have to look up the source article for a more detailed list, but there is a variety of operations. A major point of this is there isn't enough bandwidth to go around. There hasn't been for many years and it is getting worse, not better, as cores increase far faster than ram bandwidth. Ram bottlenecks are a major limiting factor in many compute heavy scenarios. Anything you can do to offload some of that can help, with the promise of doing more work, less moving data around, at lower power.
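A quick back-of-envelope sketch of why bandwidth per core shrinks as core counts grow. The dual channel DDR4-3200 setup and the core counts are just example numbers I picked, not anything from the article; the only fixed input is that a standard DDR channel is 64 bits (8 bytes) wide.

```python
def peak_bandwidth_gbs(channels: int, mega_transfers: int, bus_bytes: int = 8) -> float:
    """Theoretical peak in GB/s: channels x MT/s x bytes per transfer."""
    return channels * mega_transfers * bus_bytes / 1000


def per_core_gbs(channels: int, mega_transfers: int, cores: int) -> float:
    """Peak bandwidth split evenly across cores (an idealised assumption)."""
    return peak_bandwidth_gbs(channels, mega_transfers) / cores


if __name__ == "__main__":
    # Example only: same dual channel DDR4-3200 memory, growing core counts.
    for cores in (4, 8, 16):
        print(cores, "cores:", round(per_core_gbs(2, 3200, cores), 1), "GB/s each")
```

These are theoretical peaks, so real sustained figures will be lower, but the trend is the point: the same memory shared by more cores leaves less for each.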

 

For sure, it will be extremely niche in use, so don't expect to see it in consumer level devices any time soon.

Hm, interesting. If it can offload cycles from the CPU without any issues, good.

| Ryzen 7 7800X3D | AM5 B650 Aorus Elite AX | G.Skill Trident Z5 Neo RGB DDR5 32GB 6000MHz C30 | Sapphire PULSE Radeon RX 7900 XTX | Samsung 990 PRO 1TB with heatsink | Arctic Liquid Freezer II 360 | Seasonic Focus GX-850 | Lian Li Lanccool III | Mousepad: Skypad 3.0 XL / Zowie GTF-X | Mouse: Zowie S1-C | Keyboard: Ducky One 3 TKL (Cherry MX-Speed-Silver)Beyerdynamic MMX 300 (2nd Gen) | Acer XV272U | OS: Windows 11 |

15 hours ago, Mira Yurizaki said:

So who tells the DPU what to do?

Yes, I am very curious. I am guessing it does something like this:

 

- CPU sends a compute instruction to the RAM

- RAM answers with the location of the result

- CPU sends 2 INTs in 1 cycle

- CPU reads the memory address.

 

I am unsure how they are syncing so the CPU knows when the result is computed. Also I assume not all computation is sent to the RAM either. There must be some sort of sharing. Very interesting subject. The document talks of only 500 MHz, and that's still pretty good for the performance they are getting. I guess in the first iteration of this, people will run something like 2 sticks of standard RAM paired with 2 sticks of this special RAM.
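One way the guessed steps and the sync question could fit together is a status word that the CPU polls. This is purely a sketch of that guess in Python, not how UPMEM actually does it: `Mailbox`, `ToyDpu`, `cpu_offload` and the `IDLE`/`BUSY`/`DONE` states are all hypothetical names.

```python
from dataclasses import dataclass

IDLE, BUSY, DONE = range(3)


@dataclass
class Mailbox:
    """Hypothetical control words living at a fixed address in the DIMM."""
    status: int = IDLE
    opcode: str = ""
    a: int = 0
    b: int = 0
    result: int = 0


class ToyDpu:
    OPS = {"add": lambda a, b: a + b, "xor": lambda a, b: a ^ b}

    def __init__(self) -> None:
        self.mailbox = Mailbox()

    def step(self) -> None:
        """One slice of DPU work: run a pending command, then flag completion."""
        mb = self.mailbox
        if mb.status == BUSY:
            mb.result = self.OPS[mb.opcode](mb.a, mb.b)
            mb.status = DONE


def cpu_offload(dpu: ToyDpu, opcode: str, a: int, b: int, max_polls: int = 10) -> int:
    mb = dpu.mailbox
    mb.opcode, mb.a, mb.b = opcode, a, b  # CPU writes the command and operands
    mb.status = BUSY                      # ...then signals "go"
    for _ in range(max_polls):
        dpu.step()                        # stand-in for the DPU running on its own clock
        if mb.status == DONE:             # CPU polls the status word
            mb.status = IDLE
            return mb.result              # CPU reads the result from memory
    raise TimeoutError("DPU did not finish in time")


if __name__ == "__main__":
    print(cpu_offload(ToyDpu(), "add", 2, 3))
```

Polling is the simplest option to show; an interrupt or a batched "launch many, then wait" scheme would avoid the CPU spinning, and I don't know which the real part uses.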

4 minutes ago, Franck said:

Yes, I am very curious. I am guessing it does something like this:

 

- CPU sends a compute instruction to the RAM

- RAM answers with the location of the result

- CPU sends 2 INTs in 1 cycle

- CPU reads the memory address.

Look at it the other way around. Data lives in ram. In conventional usage, it has to go to the CPU, get worked on, then returned to the ram. If you can do some of the work locally in ram, you cut the CPU out. The processing in ram still has to be controlled and synchronised by the CPU, but the processed data doesn't need to immediately go to CPU when done.
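A rough Python sketch of that split, with the CPU only launching work and collecting small results. Everything here is a stand-in: `dpu_count` and `host_count` are hypothetical names, and the thread pool just imitates many banks working at once.

```python
from concurrent.futures import ThreadPoolExecutor


def dpu_count(bank: bytes, needle: int) -> int:
    """Work done 'inside' one bank: scan locally, return a single number."""
    return bank.count(needle)


def host_count(banks: list[bytes], needle: int) -> int:
    """CPU side: launch every bank, wait for all of them, combine the results."""
    if not banks:
        return 0
    with ThreadPoolExecutor(max_workers=len(banks)) as pool:
        partials = list(pool.map(dpu_count, banks, [needle] * len(banks)))
    return sum(partials)


if __name__ == "__main__":
    banks = [bytes([1, 2, 1]), bytes([3, 1]), bytes([0])]
    print(host_count(banks, 1))
```

The CPU still decides what runs and when it is finished, but in this model only one integer per bank travels back instead of the bank contents.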

 

4 minutes ago, Franck said:

Also I assume not all computation is sent to the RAM either.

The computations available to the DPU are relatively limited compared to CPU, so will only be a subset.

 

4 minutes ago, Franck said:

Must be some sort of sharing.

It requires re-coding so I assume it is managed in software to ensure things happen in the right order.

 

4 minutes ago, Franck said:

I guess in the first iteration of this, people will run something like 2 sticks of standard RAM paired with 2 sticks of this special RAM.

These will probably go in dedicated servers/workstations running specific software, and ram will be the same type throughout. 

I'm guessing UPMEM uses the side channels that DIMM slots have to issue commands to the DPU, since obviously there's not going to be an update to x86 to include instructions that could do something like "store, then modify".

 

Still, there's a question of security since memory the CPU is supposed to own can be modified outside of the CPU's control.

16 hours ago, thorhammerz said:

?

 

DAEDALUS (2018 Refit) - Processor: AMD Ryzen 5 - 1600 @ 3.7Ghz // Cooler: Cooler Master Hyper 212 LED Turbo Black Edition // Motherboard: Asus RoG Strix B350-F Gaming // Graphics Card: Gigabyte GTX 1060 Windforce 6GB GDDR5 // Memory: 2 x 8GB DDR4 Corsair LPX Vengeance 3000Mhz // Storage: WD Green - 250GB M.2 SATA SSD (Boot Drive and Programs), SanDisk Ultra II 120GB (GTA V), WD Elements 1TB External Drive (Steam Library) // Power Supply: Cooler Master Silent Pro 700W // Case: BeQuiet Silentbase 600 with SilentWings Mk.2 Internal Fans // Peripherals: VicTop Mechanical Gaming Keyboard & VicTsing 7200 DPI Wired Gaming Mouse

 

PROMETHEUS (2018 Refit) - Processor: Intel Core i5-3470 @ 3.2Ghz // Cooler: Cooler Master 212 EVO // Motherboard: Foxconn 2ABF // Graphics Card: ATI Radeon HD 5450 (For Diagnostic Testing Only) // Memory: 2 x 4GB DDR3 Mushkin Memory // Storage: 10TB of Various Storage Drives // Power Supply: Corsair 600W // Case: Bitfenix Nova Midi Tower - Black

 

SpeedTest Results - Having Trouble Finding a Decent PSU? - Check the PSU Tier List!

7 minutes ago, Mira Yurizaki said:

Still, there's a question of security since memory the CPU is supposed to own can be modified outside of the CPU's control.

Is it possible to write to ram without going through the CPU? If you have to go through CPU, then by definition it is under CPU control.

2 minutes ago, porina said:

Is it possible to write to ram without going through the CPU? If you have to go through CPU, then by definition it is under CPU control.

It may still go through the CPU, but that doesn't mean the thing ultimately working on the data can't be a rogue agent that tells the CPU everything is fine when in fact, everything is on fire.

1 minute ago, porina said:

Is it possible to write to ram without going through the CPU? If you have to go through CPU, then by definition it is under CPU control.

By definition, something has to dictate what gets written where, and what data is operated on when (and where). If not the CPU itself, then a co or embedded processor/controller somewhere along the way.

1 minute ago, Mira Yurizaki said:

It may still go through the CPU, but that doesn't mean the thing ultimately working on the data can't be a rogue agent that tells the CPU everything is fine when in fact, everything is on fire.

To my understanding, these DPUs don't have any persistent storage, or even an OS. If you maintain security on your CPU, things should remain in a consistent state. 

Just now, porina said:

To my understanding, these DPUs don't have any persistent storage, or even an OS. If you maintain security on your CPU, things should remain in a consistent state. 

The DPUs still have to have a program to run. So unless it's baked in ROM and mathematically verified to not have issues, you can still have a rogue agent.

2 minutes ago, Mira Yurizaki said:

The DPUs still have to have a program to run. So unless it's baked in ROM and mathematically verified to not have issues, you can still have a rogue agent.

Maybe I'm missing something. The CPU has to give those instructions to the DPU. So if the CPU is in good state, the DPU should be in good state. If you get malware on CPU, all bets are off.

4 minutes ago, Mira Yurizaki said:

The DPUs still have to have a program to run. So unless it's baked in ROM and mathematically verified to not have issues, you can still have a rogue agent.

I am assuming that, programming wise, we will have a specific instruction set for these. So I guess that if security is a risk, we will be the ones managing what can be computed there or not. We will make sure not to do critical stuff there IF there is any issue security wise.

Wait. Just realised... AMD were offering complete bit level (IIRC) encryption in ram... so these chips would have to have the access codes and also use them, or be run on systems without it. :(

Also adds even more problems if running software/services in VMs, as I could see cross talk/access similar to Spectre or RowHammer going full on nuclear on some tech like this. XD

2 minutes ago, porina said:

Maybe I'm missing something. The CPU has to give those instructions to the DPU. So if the CPU is in good state, the DPU should be in good state. If you get malware on CPU, all bets are off.

The DPU has to have something to interpret those instructions. It's very unlikely that whatever interprets them is anything low level like how a CPU does it. Instead it would interpret commands via firmware or something. If the part that determines how the DPU responds to those instructions can be accessed and overwritten, then there's an attack vector in the DPU.

 

It's the same thing with Intel's IME. It lives in the chipset which you would think is commanded by the CPU. But because the chipset (and the IME by extension) is an independent processing unit that handles data outside of the CPU's bounds, it can be compromised to no longer be trustworthy with whatever the CPU sends it.

3 minutes ago, TechyBen said:

Wait. Just realised... AMD were offering complete bit level (IIRC) encryption in ram... so these chips would have to have the access codes and also use them, or be run on systems without it. :(

Also adds even more problems if running software/services in VMs, as I could see cross talk/access similar to Spectre or RowHammer going full on nuclear on some tech like this. XD

In the first instance I think this may be applied to wholly owned servers since the initial use cases are more focused. If it gains sufficient popularity then maybe it gets expanded into cloud servers but they have more time to work that bit out.

 

2 minutes ago, Mira Yurizaki said:

It's the same thing with Intel's IME. It lives in the chipset which you would think is commanded by the CPU. But because the chipset (and the IME by extension) is an independent processing unit that handles data outside of the CPU's bounds, it can be compromised to no longer be trustworthy with whatever the CPU sends it.

I think we're in general agreement, but the argument hinges on whether there is a way to access the DPU without going through the CPU, which was an earlier question of mine. IME can be accessed externally to the CPU (there were network based attacks previously, I think), hence it provides another exposed point that can be attacked.
