dvds: md5sum/ sha256: represent all data?

Alir · November 25, 2016

I'm on Linux and one of its pros over Windows is that you can easily create hashes of the contents of files and dvds.

My question is, are these hashes 100% accurate and representative of all data on there? Is it possible for malware to be hiding somewhere behind the curtain? On other storage devices malware can still potentially reside inside the firmware if you perform a format. Do discs have something similar to firmware? Meaning that a dvd could still be malicious even if its hash is authentic?

alpenwasser · November 26, 2016

Just to be sure I understand your question right: Are you asking if an install medium image you can download from a linux distro site can be compromised? Or just any old sort of data?

In the case of install media: It depends a bit on the source of your image and reference checksum. If the install medium can be compromised, it's not entirely unlikely that the website has been compromised as well (after all, how else does a malicious party upload a compromised image onto the server in the first place?), in which case they can just supply you with the "correct" checksum for the compromised image. You grab the image, you checksum it, you compare with the reference, you think it's all good, and you're screwed.

In general, if you want two different files to have the same hash (called a hash collision), you'll need to:

create your malicious payload
embed it into the image you want to compromise
make sure that the image with the payload has the same checksum as the image by itself

Particularly the last step can get tricky, though it's not impossible. It depends a bit on the hash; MD5 is comparatively weak and hash collisions have been demonstrated if I remember right (haven't read up on it in a while, sorry). Other hash algorithms like SHA1 (used by git) or better are more difficult. However, getting that hash collision is still not impossible, just pretty difficult.

If the last step fails, and you have the checksum for the clean file, you will notice the corruption. Otherwise, you're shit-out-of-luck.

The only way to really be sure that you're not installing anything compromised on your system is to download the source code and manually review it (a practical impossibility, of course) before compiling. Checksums are useful for making sure that your download went fine and your image hasn't been downloaded incompletely or corrupted during the download, or, if you feel so inclined, for verifying the integrity of the data on your HDD (ZFS uses checksums internally for that), but they are not a means of ensuring perfect security IMHO (which doesn't exist anyway).
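
As a rough illustration of the download check described above, here is a minimal Python sketch. It assumes the reference digest was copied from a source you trust; `sha256_of_file` and `matches_reference` are made-up helper names, not part of any tool mentioned in this thread.

```python
import hashlib

def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large images do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_reference(path, reference_hex):
    """Compare the file's digest with a published reference (case-insensitive)."""
    return sha256_of_file(path) == reference_hex.strip().lower()
```

A match only tells you that the file equals whatever the reference describes; it says nothing about whether the reference itself is trustworthy.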

I'm not a security expert though, could be that I've overlooked something. But this is what comes to mind off the top of my head.

Alir · November 26, 2016

Ultimately, my concern is whether rogue or malicious software on the computer could write malware to a disc while I'm burning an ISO to it, so that after burning and finalising, the data on the disc differs from the image file.

For example, HDDs and USB sticks have firmware that can be compromised, and there are other hidden areas (on SSDs, for example). Is that the case with DVDs as well? Can data be written to a place on the disc that is not covered when you hash it?

I want to burn an image to a disc, hash the disc, and compare that with the hash of my ISO file, but I'm wondering whether hashing has some loophole whereby not all of the data actually gets hashed.

Yeah, paranoia, but it's legitimate paranoia. I need discs I can trust the integrity of.

Alir · November 26, 2016

Any idea?

alpenwasser · November 26, 2016

Ah, so your basic concern is:

Download iso
Verify w/ checksum. iso is clean (as said, this could already be unreliable if the website has been compromised and a false checksum provided)
Malware which is already somewhere on your computer injects a payload into the iso after you have checksummed it
You then burn that compromised iso to a disk
You do a checksum of the entire disk, but it shows up with the same checksum as originally because the malware payload is somewhere which you can't access on the disk

This seems difficult, but not entirely impossible at first glance (again, disclaimer: I'm no security expert). The difference between an optical disk and a USB drive is that a USB drive actually has firmware (which, as you rightly point out, can be compromised). An optical disk doesn't really have that, it just carries data and metadata as far as I know. But: your optical drive of course has firmware. And theoretically, that firmware could be compromised, though I'm not sure how difficult that would be.
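
One practical wrinkle when comparing a burned disc against its ISO: reading the raw device can return more bytes than the image holds (padding at the end, for instance), so the two digests may differ even when the burn was fine. A minimal Python sketch, assuming a readable device node; the function names are illustrative only, and a path such as `/dev/sr0` is just a typical example, not something taken from this thread:

```python
import hashlib
import os

def sha256_of_prefix(path, length, chunk_size=1024 * 1024):
    """Hash exactly the first `length` bytes of `path` (a file or device node)."""
    digest = hashlib.sha256()
    remaining = length
    with open(path, "rb") as handle:
        while remaining > 0:
            chunk = handle.read(min(chunk_size, remaining))
            if not chunk:
                raise IOError("source ended before %d bytes were read" % length)
            digest.update(chunk)
            remaining -= len(chunk)
    return digest.hexdigest()

def disc_matches_image(device_path, iso_path):
    """True if the first len(ISO) bytes read from the device hash like the ISO."""
    iso_size = os.path.getsize(iso_path)
    return sha256_of_prefix(device_path, iso_size) == sha256_of_prefix(iso_path, iso_size)
```

Something like `disc_matches_image("/dev/sr0", "image.iso")` would then be the comparison. Note that this only covers the bytes the drive chooses to return, which is exactly the limit discussed here.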

Absolute security doesn't really exist. There will come a point at which you will have to say "Alright, this is good enough.", unless you intend to re-design a CPU from scratch, along with everything else in your PC (it's just as possible that your hardware is compromised, after all, if we're being truly paranoid; what if there's a backdoor in our networking chips?). For this particular problem, if you wanted to be about as sure as you can reasonably get, I would recommend reading up on the various CD and DVD standards from reliable sources, see how the data is stored on those optical media, and whether or not this would provide an opportunity to inject a payload in an undetectable manner.

Alir · November 26, 2016

Didn't really answer my question but many thanks for trying.

I'm not really that concerned about malware compromising the ISO after I have checked it. What matters is whether malware can infect my DVD before it is finalised, so that I end up with both my ISO and the malware burned to the disc. If the ISO gets compromised after I initially checked it, it doesn't matter: I would verify the disc on a couple of computers and compare the checksum with the ones on the official website and on third-party sites. The chances of them all being compromised are nil. Unless the actual ISO itself has malicious code in it, which, as you point out, is the limit where paranoia starts negatively affecting productivity.
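
Checking the disc against several independent sources, as described above, boils down to something like this minimal Python sketch. The function name is made up for illustration, and the digests would come from sha256sum runs on different machines plus the published values:

```python
def all_digests_agree(digests):
    """True only if at least two digests were given and every one is identical."""
    # Normalise whitespace and case so copy-pasted values compare cleanly.
    normalized = {d.strip().lower() for d in digests}
    return len(digests) >= 2 and len(normalized) == 1
```

Agreement across sources only helps if the sources are really independent of each other.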

I'm trying to burn the common ISOs that I otherwise keep having to write to the same USB stick. The thing with USB sticks is that they can be compromised easily even after you've 'burned' an ISO to one. At the hardware level, discs can't be compromised after they've been finalised, since there's nowhere else for malware to be saved to.

I'm just wondering whether malware can hide itself in such a way that it would not be detected by a checksum.

mariushm · November 26, 2016

MD5 is pretty much done. It's possible and fairly easy to create files with specific MD5 hashes. So someone with bad intentions could download a genuine ISO, add a file to it with some malware and then add some additional harmless file like a txt file in the iso and manipulate the bytes in that text file so that when a software calculates the md5 hash of the whole iso file, you get the old md5 hash.

I don't know of a particular piece of software that does something like this, but there is some out there. Pulling off this hack can take anywhere from a few hours to a few days, depending on how much computing power you have (video cards help).

See https://en.wikipedia.org/wiki/MD5

SHA-1 is also vulnerable

SHA-2 is more complex and requires more computation, and I'm not aware of any flaw in the algorithm that would speed up generating files with the same SHA-256 hash.

However,

alpenwasser · November 26, 2016

1 hour ago, Alir said:

I'm just wondering whether malware can hide itself in such a way that it would not be detected by a checksum.

Fundamentally, yes. See: http://www.mscs.dal.ca/~selinger/md5collision/

But: Assuming your initial iso file is clean, I'm not sure how practical it is for a malware to infect it on your machine. Attacks using hash collisions which I've read about so far were written with specific files in mind, files which were known to the attacker at the time when the malware was written (if I'm wrong, feel free to correct me). As @mariushm said, this step can take hours to days, depending on the files (for MD5, that is). This is no problem when you write a malware for a known iso, then send it onto its way via its attack vector.

But: If your initial iso is clean, but the malware is instead locally on your machine and targets the iso, this seems to become significantly less practical for an attacker. They would have to run all those calculations locally on your machine. If they want that done fast, they will use lots of resources, which would mean you would suddenly see your CPU usage spike, thus (hopefully) becoming suspicious. And even if you don't notice, if that takes several hours, you'll probably have burnt the image to disk before the malware has done its job. And if they try to be stealthy and run with low CPU usage, the chances of you burning the iso before they're done become even higher.

There's an example program for creating MD5 hash collisions on the website I linked above. I'm currently running it with a very small example, and that seems to be taking about half an hour (20% done after six minutes). I'm curious to see how it scales with bigger files; will report back once that's done (might take a few hours though). If it doesn't matter much how big the file is, the problem is much bigger than if it takes longer for big files (since iso files tend to be pretty sizeable and my first test program was only a few kilobytes).

alpenwasser · November 27, 2016

@Alir Alright, I let the md5 collision generator run over night, results are as follows:

For a 15 kilobyte input file, it took 36 minutes to generate the hash collision.
For a 33 megabyte input file, it took 3 hours and 40 minutes.

Granted, this is just one way to generate MD5 collisions; maybe there are faster ones out there. But at least based on these results, if the iso you downloaded was clean, infecting it and generating the data needed to generate a hash collision locally on your machine does not really seem practical to me, because iso files tend to be significantly larger than the files I tested with, so the malware would likely need several days to do its work, at which point you'd have already burnt the disk I presume.
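
The "several days" estimate can be reproduced with a back-of-the-envelope extrapolation from the two measurements above. A rough Python sketch, assuming (and this is purely an assumption) that run time grows linearly with input size; two data points cannot confirm that, and the names here are illustrative:

```python
def fit_line(p1, p2):
    """Return (slope, intercept) of the straight line through two (x, y) points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1

# Measurements from the post above: (input size in MB, run time in minutes).
SMALL = (0.015, 36.0)    # 15 kilobytes took 36 minutes
LARGE = (33.0, 220.0)    # 33 megabytes took 3 hours 40 minutes

def estimated_days(size_mb):
    """Linearly extrapolated run time in days for an input of `size_mb` megabytes."""
    slope, intercept = fit_line(SMALL, LARGE)
    return (intercept + slope * size_mb) / (60 * 24)
```

Under that assumption, `estimated_days(700)` comes out at a bit under three days for a CD-sized image, and `estimated_days(4700)` at roughly two and a half weeks for a single-layer DVD image.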

The software could probably be optimized some more to make it faster. It only runs on a single core at the moment. But if you allow it to utilize more cores, I would expect that you as a suspicious and paranoid user would notice that your machine is suddenly being heavily loaded with some software which you don't know. Heck, even if it's allowed to load a single core to the max, that would already make me very suspicious (I tend to keep a close eye on CPU usage in general). So if the malware wanted to do its work undetected, it would need to throttle CPU usage quite a lot, at which point it might take weeks to infect a CD-ROM-sized iso file and generate the appropriate hash collision data.

Alir · November 27, 2016

I don't think you guys have quite understood my concern. I'm not concerned about hash collisions. I use sha256sum when I can, which includes checking DVDs and ISOs. It's cryptographically secure.

I'm concerned about whether malware can be saved onto a DVD in such a way or in such a place so that when you do create a hash, the malware does not get hashed and so you think the DVD is clean when it really isn't. It's mostly a question of hardware and how the DVD works. Which I'm assuming you guys don't have knowledge of?

alpenwasser · November 27, 2016

Well, I already went into this in my above posts: I recommend reading up on the various CD and DVD standards. If you want to be sure about this, you will need to truly, properly understand those standards, or more precisely: How data is written to, stored on and retrieved from the disks. And no, I've not done this, because I no longer have an optical drive anywhere. But when in doubt, always go to the primary source.

Also, as said: If your burner/reader has a compromised firmware, then it can do pretty much whatever it wants without you noticing. When you burn the disk, it can embed a payload without telling you, and then when you read the disk (in the same drive, of course), it can ignore the payload and only give you the data you think should be there (thus giving you a correct checksum), while doing something else with the payload. This would probably no longer work on a drive with a clean firmware (again, depending on how the standard is implemented). How the firmware gets infected is another question though. So, if you cannot trust your burner, then you're screwed.
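
To make the firmware point concrete, here is a toy Python simulation. It is not real drive firmware, and the class names are invented for illustration. It only shows that a hash covers the bytes a drive returns, not necessarily the bytes physically on the medium:

```python
import hashlib

class HonestDrive:
    """Returns every byte stored on the (simulated) medium."""
    def __init__(self, medium):
        self.medium = medium

    def read_all(self):
        return self.medium

class HidingDrive(HonestDrive):
    """Simulated compromised firmware: never reports bytes past `visible_length`."""
    def __init__(self, medium, visible_length):
        super().__init__(medium)
        self.visible_length = visible_length

    def read_all(self):
        return self.medium[:self.visible_length]

def digest_via(drive):
    """SHA-256 of whatever the given drive object chooses to hand back."""
    return hashlib.sha256(drive.read_all()).hexdigest()

image = b"clean image bytes"
tampered_medium = image + b"hidden payload"
```

Here `digest_via(HidingDrive(tampered_medium, len(image)))` equals the digest of the clean image, while `digest_via(HonestDrive(tampered_medium))` does not, which is why reading the disc in a second, trusted drive is a meaningful check.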

Do you generate these iso files locally, or do you download them?

As a starting point, I'd suggest reading up on some of this stuff, and then maybe go from there (copied from Wikipedia):

SFF ATAPI/MMC
- Mount Rainier (packet writing)
- Mount Fuji (layer jump recording)
Rainbow Books
File systems
- ISO 9660
  - Joliet
  - Romeo
  - Rock Ridge / SUSP
  - El Torito
  - Apple ISO 9660 Extensions
- Universal Disk Format (UDF)
- ISO 13490
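
As one concrete thing to look for in the ISO 9660 material: the Primary Volume Descriptor records how large the volume claims to be, which tells you how many bytes of the disc the filesystem accounts for. A minimal Python sketch, assuming the primary descriptor is the first one at sector 16 (the common layout) and 2048-byte sectors; the function name is made up, and UDF-only discs are not handled:

```python
import struct

SECTOR_SIZE = 2048
PVD_OFFSET = 16 * SECTOR_SIZE  # ISO 9660 volume descriptors start at sector 16

def iso9660_volume_size(path):
    """Return the volume size in bytes declared by the Primary Volume Descriptor."""
    with open(path, "rb") as handle:
        handle.seek(PVD_OFFSET)
        descriptor = handle.read(SECTOR_SIZE)
    # Byte 0 is the descriptor type (1 = primary), bytes 1-5 the "CD001" identifier.
    if (len(descriptor) < SECTOR_SIZE or descriptor[0:1] != b"\x01"
            or descriptor[1:6] != b"CD001"):
        raise ValueError("no ISO 9660 Primary Volume Descriptor at sector 16")
    # Volume space size (in logical blocks): little-endian copy at offset 80.
    (block_count,) = struct.unpack_from("<I", descriptor, 80)
    # Logical block size: little-endian copy at offset 128.
    (block_size,) = struct.unpack_from("<H", descriptor, 128)
    return block_count * block_size
```

Comparing this value with the size of the ISO file and with the number of bytes the drive returns is one way to spot data sitting beyond what the filesystem declares, though it still relies on the drive reporting honestly.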

Alir · November 27, 2016

Thanks. I download them. Mostly, if not only, Windows and Linux ISOs. But I verify them from more than one source. I think that's sufficient.

Thanks for the links.
