
Which language is faster for file manipulation?

Neftex

Hi, I'd like to make my own program for file searching and comparison. What I'm thinking of doing is: find all files on the system -> check for files with the same names -> hash them and compare.

Basically, I want to deduplicate files based on name/hash and manually inspect those with a different hash.

Right now I'm deciding between C# and Java, as those are the ones I'm most comfortable with. Which one would be faster?
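For reference, the first two steps of that plan (find files, then group them by name) might look roughly like this in Java. This is only a sketch under assumptions: the class and method names (`NameGrouper`, `sameNameGroups`) are made up for illustration, and `Files.walk` aborts with an exception on directories it cannot read, so a real whole-system scan would probably need `Files.walkFileTree` with a visitor that skips failures.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not from the thread: groups regular files by file name.
final class NameGrouper {

    /** Maps each file name that occurs more than once under root to all paths carrying it. */
    static Map<String, List<Path>> sameNameGroups(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            Map<String, List<Path>> byName = paths
                    .filter(Files::isRegularFile)
                    .collect(Collectors.groupingBy(
                            p -> p.getFileName().toString(),
                            TreeMap::new,
                            Collectors.toList()));
            // Names that occur only once cannot be duplicates, so drop them.
            byName.values().removeIf(group -> group.size() < 2);
            return byName;
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        sameNameGroups(root).forEach((name, group) ->
                System.out.println(name + " -> " + group));
    }
}
```

Only the groups this returns would need to go on to the hashing step.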

MSI GX660 + i7 920XM @ 2.8GHz + GTX 970M + Samsung SSD 830 256GB


Assembly would be the fastest. Then C#, which is translated almost straight into assembly (simplifying). Java runs on a virtual machine (sort of), so it can be much slower, but recent additions let you code more specifically for a platform.

For something like sorting files, a C variant would probably be best.


Assembly is not faster than C regarding file manipulation, since you'll basically use the same OS methods in both languages. C# and Java are notably slower due to their levels of abstraction.

Write in C.


44 minutes ago, r4tch3t said:

For something like sorting files, a C variant would probably be best.

I would agree: C or maybe C++. Java and C# both have too much overhead, so both of them are a bad choice.


While I haven't looked into the Java side of things, I poked around on the interwebs regarding C#. And at the moment I feel it's disingenuous to claim C# is horribly slower than C/C++:

 

https://stackoverflow.com/questions/686483/c-sharp-vs-c-big-performance-difference/686617#686617

http://journal.stuffwithstuff.com/2009/01/03/debunking-c-vs-c-performance/

https://www.codeproject.com/Tips/860631/Native-Cplusplus-and-Csharp-Which-is-the-Fastest-P

https://www.linkedin.com/pulse/c-faster-than-depends-coding-idom-joe-ellsworth/

 

The short of it is: a language doesn't have a speed, its implementation does.

 

My take for the OP: write the application in a language you're comfortable with and get it working first. If you've designed it well and written the code so that you can pick it up easily after three months of not looking at it, then moving to another language will be quicker if you're ever unsatisfied with the performance and believe there are greener pastures elsewhere.


2 hours ago, Neftex said:

hash them and compare

Just to clarify: hash the filenames or the contents? It makes a significant difference. In the latter case, the hashing will be the bulk of the work, not the file manipulation.


1 hour ago, Unimportant said:

Just to clarify: hash the filenames or the contents? It makes a significant difference. In the latter case, the hashing will be the bulk of the work, not the file manipulation.

Hashing filenames wouldn't really make sense; I'm going to hash the contents.


The differences in speed between languages will be less than the differences in speed between algorithms. Pick a good algorithm, and then go from there.
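One example of an algorithmic choice that would matter in any language: two files with different sizes cannot have identical contents, so they can go straight to the "manually inspect" pile without being hashed at all. A minimal sketch in Java, assuming a group of same-named paths has already been collected (`SizePrefilter` and `hashCandidates` are hypothetical names, not anything from this thread):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: decides which same-named files are worth hashing.
final class SizePrefilter {

    /** Keeps only paths whose byte size is shared with at least one other path in the group. */
    static List<Path> hashCandidates(List<Path> sameNameGroup) throws IOException {
        Map<Long, List<Path>> bySize = new HashMap<>();
        for (Path p : sameNameGroup) {
            // Files.size reads metadata only; the file contents are not touched.
            bySize.computeIfAbsent(Files.size(p), k -> new ArrayList<>()).add(p);
        }
        List<Path> candidates = new ArrayList<>();
        for (List<Path> sameSize : bySize.values()) {
            if (sameSize.size() > 1) {
                candidates.addAll(sameSize);
            }
        }
        return candidates;
    }
}
```

Whether this saves much depends on how many same-named files actually differ in size, so treat it as something to measure rather than a guaranteed win.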



12 hours ago, Blade of Grass said:

The differences in speed between languages will be less than the differences in speed between algorithms. Pick a good algorithm, and then go from there.

Not to mention that the primary bottleneck will be read times anyway. Different languages may have more or less overhead, but reading all files on a system takes a ton of disk reads, which are not fast regardless of which language is doing the reading. The only difference might be how fast a given language processes a file once it's loaded in memory. If you are doing some crazy processing on it there, then maybe language will matter, but I'm willing to bet that the performance here is very much bottlenecked on disk reads.

 

(Edit: this does depend on the size of the files, though. If you are only searching for a specific file type, and there aren't that many and they aren't that big, then loading into memory won't be as big of a deal.)
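If you'd rather measure than bet, a rough way to check where the time goes for a single file is to time the read and the hash separately. A sketch in Java, with assumptions: `ReadVsHash` is a made-up name, SHA-256 is just a stand-in for whatever hash ends up being used, and reading the whole file into memory is only sensible for test files that fit comfortably in RAM.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical measurement helper, for illustration only.
final class ReadVsHash {

    /** Returns {nanoseconds spent reading, nanoseconds spent hashing} for one file. */
    static long[] time(Path file) throws IOException, NoSuchAlgorithmException {
        long start = System.nanoTime();
        byte[] data = Files.readAllBytes(file);            // disk (or OS cache) read
        long afterRead = System.nanoTime();
        MessageDigest.getInstance("SHA-256").digest(data); // pure CPU work on in-memory data
        long afterHash = System.nanoTime();
        return new long[] { afterRead - start, afterHash - afterRead };
    }
}
```

Keep in mind that a file read a second time usually comes from the operating system's cache rather than the disk, so repeated runs on the same file will make the read look much cheaper than it is on a cold scan.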



@reniat AFAIK there's some kind of file table in the filesystem, so I thought of reading that to get all files (not sure if it contains filenames), so that shouldn't be too slow. The slowest things will probably be searching for the same filename and hashing.

 

Thinking about it, I guess I should be looking for the language with the fastest hashing and string comparison.


30 minutes ago, Neftex said:

Thinking about it, I guess I should be looking for the language with the fastest hashing and string comparison.

I don't think that's necessary, considering hashing is a well-defined algorithm (also make sure you know which hashing algorithm is being used) and string compares tend to be as basic as possible. So both are likely optimized well enough that there's no practical difference between the two languages.

 

But again, I would suggest just getting the application written first in the language you're most comfortable with. If you have issues regarding performance, then we can start looking at other avenues.
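On the "know which hashing algorithm is being used" point: in Java the algorithm is named explicitly when you ask the standard library for a digest. A minimal sketch, assuming SHA-256 (the thread never settles on an algorithm) and using a made-up class name, `ContentHasher`. It streams the file through a fixed buffer, so large files don't have to fit in memory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: hashes file contents with an explicitly named algorithm.
final class ContentHasher {

    /** Returns the SHA-256 digest of the file's contents as lowercase hex. */
    static String sha256Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        byte[] buffer = new byte[64 * 1024]; // buffer size is an arbitrary choice
        try (InputStream in = Files.newInputStream(file)) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read); // feed only the bytes actually read
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        for (String arg : args) {
            System.out.println(sha256Hex(Paths.get(arg)) + "  " + arg);
        }
    }
}
```

Two files with the same name are duplicates (for practical purposes) when these strings match; the ones that don't match are the ones to inspect by hand.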

