
What is Multithreaded file copy?

In several videos about their 100 gig networking and liqid badgerden, they talked about Cho-EZ copy, a "multithreaded" file copier. How can a file copy be multithreaded when it's all handled in hardware by DMA controllers? I've only rarely heard of software-based file copying, and I've always heard it is many times slower than Direct Memory Access anyway.


It basically copies multiple files at the same time, since there are limits on how fast a single stream of data can go, even with DMA.

Look up the /MT option in robocopy, which copies with multiple threads (8 by default).
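If it helps to see the "multiple files at the same time" idea in code, here is a rough Python sketch. It is not how robocopy or Cho-EZ copy is implemented; `copy_tree_parallel` and its `threads` parameter are names made up for this example, and it skips empty directories.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def copy_tree_parallel(src_dir, dst_dir, threads=8):
    """Copy every file under src_dir into dst_dir, several files at a time.

    Hypothetical helper for illustration only.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    files = [p for p in src_dir.rglob("*") if p.is_file()]

    def copy_one(src):
        dst = dst_dir / src.relative_to(src_dir)
        # exist_ok=True keeps this safe when two threads need the same folder.
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copies the data plus timestamps/permissions
        return dst

    # Each worker copies whole files; up to `threads` copies run at once.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(copy_one, files))
```

Something like `copy_tree_parallel("photos", "backup/photos", threads=16)` would then keep up to 16 file copies in flight. Whether that is actually faster depends on the drives and the network in between.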


Let's say you have a 10 GB file you want to copy.

Your program will initialize a certain number of threads (let's say 10 in this case).

Essentially, what we'll do is divide the file into fixed-size chunks and assign a thread to copy each chunk. To be clear, we won't actually move data around to divide it; we just give each thread different start and end bounds in the file. With a 10 GB file and 10 threads, each chunk is 1 GB, so the thread that copies the first chunk gets the bounds 0-1 GB, thread 2 gets 1-2 GB, thread 3 gets 2-3 GB, etc.
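The chunking described above can be sketched in Python roughly like this. It is only an illustration under my own assumptions: `chunked_copy`, `copy_range` and the 1 MiB `BLOCK` size are made-up names and values, not anything taken from a real copy tool.

```python
import os
from concurrent.futures import ThreadPoolExecutor

BLOCK = 1024 * 1024  # assumed size of each read/write inside a chunk (1 MiB)


def copy_range(src, dst, start, end):
    """Copy bytes [start, end) of src to the same offsets in dst."""
    # Each thread opens its own handles so the file positions are independent.
    with open(src, "rb") as fin, open(dst, "r+b") as fout:
        fin.seek(start)
        fout.seek(start)
        remaining = end - start
        while remaining > 0:
            data = fin.read(min(BLOCK, remaining))
            if not data:
                break  # source turned out shorter than expected
            fout.write(data)
            remaining -= len(data)


def chunked_copy(src, dst, threads=10):
    """Split src into up to `threads` byte ranges and copy them in parallel."""
    size = os.path.getsize(src)
    # Pre-size the destination so every thread can write at its own offset.
    with open(dst, "wb") as f:
        f.truncate(size)
    if size == 0:
        return []
    chunk = -(-size // threads)  # ceiling division
    bounds = [(s, min(s + chunk, size)) for s in range(0, size, chunk)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # list() waits for all chunks and re-raises any worker exception.
        list(pool.map(lambda b: copy_range(src, dst, *b), bounds))
    return bounds
```

Nothing gets moved around to "divide" the file here. The only thing each thread receives is a `(start, end)` pair, and the last chunk is simply shorter when the size doesn't divide evenly.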

For spinning drives, a multithreaded copy is probably going to be slower, because the drive has to keep moving the read head back and forth between chunks, and seeking is the slowest part of any read/write operation (could be wrong; if so please correct me). For SSDs, however, a multithreaded copy can be very helpful.

DMA does help speed up this process, but it isn't a substitute for multithreaded file copy.

To see why, let's take a high-level look at copying data from one machine to another.

Source Drive -> Local Memory -> Send over Network -> Remote Memory -> Remote Drive.

Now, at first glance you might think "okay, that's pretty straightforward, how are you actually improving on that?"

Well, the issue here is the local memory. When you do a read operation, what you're basically doing is providing three things: a memory buffer where you need the data, the file, and the number of bytes you want. Without DMA, when the data is first read from the drive into memory, it doesn't land in that buffer. Instead it gets copied into another buffer, then into another buffer, then another one, and possibly only then into the buffer you provided (but there could be more buffers). The same process happens in reverse when you write data, and pretty much the same thing happens when sending data over the network (the only difference is that the data ends up in the NIC rather than the drive). All this copying of course makes the process much slower.
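To make the "buffer, file, number of bytes" part concrete, here is a small Python sketch of a copy loop that hands the same buffer to every read. It only controls the buffers on the program's side; what the OS, the drivers and the hardware do underneath (including DMA) isn't visible from here. `copy_with_one_buffer` and `buf_size` are names made up for the example.

```python
def copy_with_one_buffer(src, dst, buf_size=1024 * 1024):
    """Copy src to dst, reusing one caller-provided buffer for every read."""
    buf = bytearray(buf_size)  # the buffer we hand to each read call
    view = memoryview(buf)     # lets us slice the buffer without copying it
    total = 0
    # buffering=0 skips Python's own intermediate buffer layer.
    with open(src, "rb", buffering=0) as fin, \
         open(dst, "wb", buffering=0) as fout:
        while True:
            n = fin.readinto(buf)  # "fill this buffer from this file"
            if not n:
                break              # end of file
            written = 0
            while written < n:     # a raw write may accept fewer bytes
                written += fout.write(view[written:n])
            total += n
    return total
```

The point of `readinto` is that the data lands in the buffer the caller supplied, rather than in a fresh object that then has to be copied again.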


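DMA simply cuts out as many of those unnecessary copies and buffers as possible: the drive controller or NIC reads and writes the memory where the calling program needs the data directly, instead of the CPU shuttling it along. What it doesn't do is keep more than one request going at a time. A simple single-threaded copy still reads a piece, waits, writes it, waits, and so on; several threads keep several reads and writes in flight at once, which is where the extra speed comes from.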



-> Moved to Storage Devices

