Sorry, I think this is going to be long, so here's a kind of TL;DR: I'm looking for info on SSDs. Specifically, is the deterioration I'm seeing on my SSD normal, and did formatting the drive actually fix the bad sectors and read errors I was getting? Along with that, how much should I expect from a Samsung RMA, or are they just going to send it back saying it's fine?

So the full story...
I've been having random problems with my computer (which I built myself 2 years ago) for the past year or so. These include: Windows failing to load and never reaching the login screen; random crashes (no BSOD, though there probably should have been one); File Explorer randomly crashing; graphics suddenly going crazy; monitors not being detected; what looks like Windows crashing without a BSOD; and a couple of actual BSODs. I've gone through all the troubleshooting steps I could think of or find online as each problem came up: monitored the temperatures of the CPU/GPU/SSDs; reseated the CPU/GPU/memory/SSDs/power cables; reinstalled graphics drivers; updated drivers for all components; checked for and repaired corrupted files and Windows files; etc. Those steps seemed to work, but then I'd run into something else failing anywhere from a week to a month later. Only recently did it dawn on me to check my boot drive, which I hadn't done in a long while. Given the problems I've been having, I should have done that much earlier, and I should have seriously considered reinstalling Windows. I'll be honest, it sounded like a hassle at the time, but I'll be doing it very soon regardless.

Back to my boot drive: it's a Samsung 980 Pro 2TB SSD that I bought in late 2021 (yes, it was part of the batch with the bad firmware, which I did eventually update, but I wasn't aware of the issue until the drive had roughly 6 to 12 months of use on it). When I went looking for problems with the drive, I tried running Samsung Magician scans. The SMART info always said the drive was healthy, and neither Samsung Magician nor CrystalDiskInfo reported any problems. I now know not to trust that completely...
Of the 4 scans in the Samsung Magician software:
the "short scan" and "short SMART self-test" finished with nothing flagged,
the "full scan" showed about 10 bad sectors, and when I tried the "recovery" function it just errored out and told me to check that the drive was properly connected and try again,
the "extended SMART self-test" errored out very early with a "defects detected" message and pointed me to Samsung's own troubleshooting page, which says to replace the SSD.


When I went to clone the data to another SSD (Crucial P5 Plus 2TB), both Samsung's and Crucial's cloning tools refused to clone the drive, each stating that it ran into read errors, which I assume were on the bad sectors. Eventually I found that DiskGenius would at least go through with the cloning. However, it said it couldn't read a few parts (again, I believe those are the bad sectors) and that the clone would have errors because of that. (Having booted from that clone: yes, it's acting about the same as the old drive, and I've already run into a couple of random problems.)

So I was planning to RMA the Samsung SSD, as it is still under warranty. As part of that, I formatted the SSD to make it at least a little harder to pull any data off it. From googling, most people seem to just trust that Samsung won't go through any of it, but it felt better than sending it in as is (it's not super sensitive info, but it's a boot drive, so it's not nothing either). Afterwards I ran the same scans again to get an update and to screenshot some of the drive data, so I don't get screwed by receiving some worn-out replacement that isn't comparable to this one in health. Now no bad sectors and no read errors are showing anymore.

In comes my reason for posting. Does this mean it's "fixed"?
I'll be honest, I don't really trust the SSD, and I don't know if an RMA will help with that. From my quick googling, reviews of Samsung's RMA service seem less than ideal, and some make it sound like they'll take the drive in, run the same short tests, see no errors, and send it back as "good", so I'm not sure about that either.
So mostly I would like to know a few things:

  1. Should I trust that this "format" has fixed things?
    1. I don't know much about how SSDs work in this regard, so right now I don't trust that the SSD won't just start having errors again. That could be down to me not knowing how SSDs deal with bad sectors, so some enlightenment on that end would be nice.
  2. Does anyone have experience with Samsung's SSD RMAs? If so, should I be concerned? In a few Google searches, some people said Samsung just formatted the drive and sent it back, which suggests they might do nothing for me and say it's fine.



If it helps, this is the most recent SMART data from the drive:
[Screenshot: SMART data for the drive, image_2024-07-14_031234232.png]
(Just a note: the unsafe shutdowns should mostly be from the Windows loading errors I mentioned earlier. I didn't realize it had happened quite that often. Basically the BIOS would load, Windows would start loading (I assume), and then the screen would go black sometime before the login screen should appear and just sit there until I did a hard shutdown. That never felt right, and it probably should have prompted me to reinstall Windows, but I didn't.)

Also I don't know that it's relevant but specs for the computer
Processor: Intel i7-12700K
RAM: DDR4 2x16gb 3600 MHz
Motherboard: ASUS TUF Gaming z690-Plus wifi d4
GPU: NVIDIA GeForce RTX 3080 Ti
OS: Windows 11 Pro 23H2
Drives:
   - Boot Drive: Samsung SSD 980 Pro 2TB (drive in question giving issues)
   - Other Drive: Crucial P5 Plus 2TB (about 2 years old)


SSDs contain a bunch of spare blocks so they can map out failing ones. Often a weakening block is caught while ECC can still correct it, and the data is silently moved to another location on refresh. It sounds like in this case it got too bad for that to happen. Once the drive has identified bad areas, they get taken out of use, so in that sense it is "fixed".
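
To make the remapping idea a bit more concrete, here is a rough toy model in Python. It is only a sketch of the concept described above, not how any real controller firmware works; the class and method names (`ToyRemapTable`, `retire`, `spares_left`) are made up for illustration.

```python
class ToyRemapTable:
    """Toy model of logical-to-physical block remapping with a spare pool.

    Hypothetical illustration only; real SSD firmware is far more complex.
    """

    def __init__(self, user_blocks: int, spare_blocks: int) -> None:
        # Logical block i starts out mapped to physical block i.
        self.mapping = {lba: lba for lba in range(user_blocks)}
        # Spares are physical blocks beyond the user-visible range.
        self.spares = list(range(user_blocks, user_blocks + spare_blocks))
        # Physical blocks that have been taken out of use.
        self.retired = set()

    def retire(self, lba: int) -> bool:
        """Move a failing logical block onto a spare.

        Returns False (and changes nothing) if no spares are left.
        """
        if not self.spares:
            return False
        self.retired.add(self.mapping[lba])
        self.mapping[lba] = self.spares.pop(0)
        return True

    def spares_left(self) -> int:
        """Number of spare physical blocks still available."""
        return len(self.spares)
```

In this model the host keeps using the same logical block number after `retire`, which is why a drive can look "fixed" from the outside once the bad areas are mapped out. When `spares_left()` reaches zero, further failures can no longer be hidden.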

 

I've seen this on a handful of SSDs. It seems to be more of a problem on lower-end ones like the really cheap Kingston SATA or WD Green SATA drives. On those I've had data rot on the order of a year. A full wipe and they were fine again, but then I got rid of them as I didn't trust them any more.

 

Another time was when I accidentally cooked an SSD. I forget exactly what it was, either a Samsung or Crucial M.2. It was placed under a GPU and remained above its rated working temperature for a sustained period. I only found out when the install became corrupted, and investigation showed the SSD was having a really bad time. Again, after a full wipe it was fine when kept out of the heat.

 

I can't speak on the warranty side. If you use it again, I'd suggest more frequent full surface scans to keep an eye on things. At the least it gives the drive a chance to identify potentially degrading areas sooner.

Gaming system: R7 7800X3D, Asus ROG Strix B650E-F Gaming Wifi, Thermalright Phantom Spirit 120 SE ARGB, Corsair Vengeance 2x 32GB 6000C30, MSI Ventus 3x OC RTX 5070 Ti, MSI MPG A850G, Fractal Design North, Samsung 990 Pro 2TB, Alienware AW3225QF (32" 240 Hz OLED)
Productivity system: i9-7980XE, Asus X299 TUF mark 2, Noctua D15, 64GB ram (mixed), RTX 4070 FE, NZXT E850, GameMax Abyss, Samsung 980 Pro 2TB, iiyama ProLite XU2793QSU-B6 (27" 1440p 100 Hz)
Gaming laptop: Lenovo Legion 5, 5800H, RTX 3070, Kingston DDR4 3200C22 2x16GB 2Rx8, Kingston Fury Renegade 1TB + Crucial P1 1TB SSD, 165 Hz IPS 1080p G-Sync Compatible


Yes, I've recently run into multiple cases of SSDs having read errors or slowdowns, but after a secure erase and reformat they were all good to go again.

 

My theory is that the flash cells leak charge, so data that is stored and never changes, like OS files (whose cells never get rewritten for months or years), becomes difficult to read and eventually unreadable.

Secure erase forces erasing everything, which means anything you subsequently put back on is freshly written and will be good again for whatever amount of time it takes for the charge to leak.

My main PC's Corsair SSD became very slow to read in some areas after less than a year in service. Since I caught it before it caused actual read errors, I could image it, secure erase it, and write the image back as is. It's been running fine for months since.

F@H
Desktop: i9-13900K, ASUS Z790-E, 64GB DDR5-6000 CL36, RTX3080, 2TB MP600 Pro XT, 2TB SX8200Pro, 2x16TB Ironwolf RAID0, Corsair HX1200, Antec Vortex 360 AIO, Thermaltake Versa H25 TG, Samsung 4K curved 49" TV, 23" secondary, Mountain Everest Max

Mobile SFF rig: i9-9900K, Noctua NH-L9i, Asrock Z390 Phantom ITX-AC, 32GB, GTX1070, 2x1TB SX8200Pro RAID0, 2x5TB 2.5" HDD RAID0, Athena 500W Flex (Noctua fan), Custom 4.7l 3D printed case

 

Asus Zenbook UM325UA, Ryzen 7 5700u, 16GB, 1TB, OLED

 

GPD Win 2


3 hours ago, Kilrah said:

My theory is that the flash cells leak charge, so data that's stored on them and doesn't change (so these cells never get rewritten for months/years) like OS files become difficult to read and eventually become unreadable. 

This generally matches my observations, except I don't think a secure erase is needed. The key part seems to be that the controller needs to identify potentially weak cells, which it can do from how much ECC correction is needed. Maybe there are other things it can do too. Once it knows which parts are in the danger zone, it can either reallocate them early, or it will happen naturally when the data is rewritten.

 

Occasionally doing a full disk read is a good check on the state of the drive. I like HDDScan which, although it is an HDD-era tool, reports access times for reads, so problem areas that take large fractions of a second or even multiple seconds will stick out.
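
If you want to see the idea behind that kind of scan, here is a minimal Python sketch that reads a file (or a raw device path) sequentially and records which chunks were slow or failed. It is not what HDDScan does internally; the function name `timed_read_scan`, the 1 MiB chunk size, and the 0.5 s threshold are my own assumptions. Reading a raw device usually needs admin/root rights, and on Windows raw device reads have alignment requirements I haven't handled here.

```python
import time


def timed_read_scan(path, chunk_size=1024 * 1024, slow_threshold_s=0.5):
    """Read `path` sequentially and report slow or unreadable chunks.

    Returns a list of (offset, seconds, error) tuples:
      - slow chunk:   (offset, elapsed_seconds, None)
      - failed chunk: (offset, None, error_message)
    Thresholds are illustrative assumptions, not vendor guidance.
    """
    problems = []
    offset = 0
    # buffering=0 avoids Python-level buffering; the OS page cache can
    # still serve recently read data, which hides slow areas.
    with open(path, "rb", buffering=0) as f:
        while True:
            start = time.perf_counter()
            try:
                data = f.read(chunk_size)
            except OSError as exc:
                # Record the failure and skip past the unreadable chunk.
                problems.append((offset, None, str(exc)))
                offset += chunk_size
                f.seek(offset)
                continue
            elapsed = time.perf_counter() - start
            if not data:
                break  # end of file or device
            if elapsed >= slow_threshold_s:
                problems.append((offset, elapsed, None))
            offset += len(data)
    return problems
```

Something like `timed_read_scan("some_large_file.bin")` returns an empty list when every chunk reads quickly. Results are only meaningful if the data is not already sitting in the OS cache, so a scan right after a reboot is more telling.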


6 hours ago, porina said:

The key parts seem to be the controller needs to identify potentially weak cells, which it can do by how much ECC is needed.

In my cases it never resulted in the amount of spare blocks going down, so there were no actually defective blocks identified.


5 minutes ago, Kilrah said:

In my cases it never resulted in the amount of spare blocks going down, so there were no actually defective blocks identified.

Can you see the actual number of blocks kept as spare? The "raw" SMART value feels like a normalised value more than a count, although I can't be sure.


Depends on the drive, some will show it and some won't.


The raw value is usually the literal count (some software will show it as a % if the count is 100, and usually there are literally 100 of them).
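
For NVMe drives like the 980 Pro, the health log defined by the NVMe spec reports "Available Spare" as a normalised percentage alongside a threshold, not a block count. Here is a small Python sketch for pulling the relevant fields out of a report that has already been parsed from JSON. The key names are what I understand smartmontools' `smartctl --json` output to use, so treat them as an assumption and check against your own output; the function names are made up.

```python
import json


def summarize_nvme_health(report: dict) -> dict:
    """Pick spare/error fields out of a parsed smartctl-style JSON report.

    Key names below are assumed from smartmontools JSON output; missing
    keys simply come back as None.
    """
    log = report.get("nvme_smart_health_information_log", {})
    spare = log.get("available_spare")
    threshold = log.get("available_spare_threshold")
    return {
        "available_spare_pct": spare,
        "spare_threshold_pct": threshold,
        # True only when both values are present and spare is under threshold.
        "spare_below_threshold": (
            spare is not None and threshold is not None and spare < threshold
        ),
        "percentage_used": log.get("percentage_used"),
        "media_errors": log.get("media_errors"),
        "unsafe_shutdowns": log.get("unsafe_shutdowns"),
    }


def summarize_nvme_health_json(text: str) -> dict:
    """Same as above, but starting from the raw JSON text."""
    return summarize_nvme_health(json.loads(text))
```

You would feed it the output of something like `smartctl --json -a` run against the drive (device naming differs between operating systems). A non-zero `media_errors` count while `available_spare_pct` is still 100 would fit the observation above that unreadable data does not always show up as lost spare blocks.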

 

As for data rot on SSDs, it's real, but the better SSDs record when data was written and will refresh it after some predetermined time (if they have power and a clock). The first to do this was Samsung with the 840 and 840 Evo (the first commercial TLC SSDs, which had problems reading months-old data).
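
A very rough way to picture that age-based refresh, as a Python sketch. This is purely illustrative: the function name, the per-block timestamp dictionary, and the age limit are all assumptions of mine, and I have no idea how any vendor's firmware actually schedules this.

```python
def blocks_due_for_refresh(written_at, now, max_age_s):
    """Return ids of blocks whose data is older than `max_age_s`, oldest first.

    written_at: dict mapping block id -> time the data was last written
                (seconds, same clock as `now`). Hypothetical bookkeeping.
    """
    due = [(ts, blk) for blk, ts in written_at.items() if now - ts > max_age_s]
    # Sorting by timestamp puts the oldest (most at-risk) data first.
    return [blk for ts, blk in sorted(due)]
```

A drive doing something like this would rewrite the returned blocks to fresh locations while powered on, which is also why a drive that sits unpowered for a long time doesn't get the benefit.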

   
 
 
 
CPU : Intel 14gen i7-14700K
COOLER :  Thermalright Peerless Assassin 120 White + thermaltake toughfan 12 white + Thermal Grizzly - CPU Contact Frame Intel 13./14. +  Coollaboratory Liquid Ultra
GPU : MSI RTX 2070 Armor @GPU 2050MHz Mem 8200MHz -> USB C 10Gb/s cable 2m -> Unitek 4x USB HUB 10 Gb/s (Y-HB08003)
MOBO : MSI MEG Z690 UNIFY
RAM :  Corsair VENGEANCE DDR5 RAM 64 GB (2 x 32 GB) 6400 MHz CL32 (CMK64GX5M2B6400C32)
SSD : Intel Optane 905P 960GB U.2 (OS) + 2 x WD SN850X 4TB + 2 x PNY CS3140 2TB + ASM2824 PCIe switch -> 4 x Plextor M8PeG 1TB + flexiDOCK MB014SP-B -> Crucial MX500 2TB + GoodRam Iridium PRO 960GB + Samsung 850 Pro 512GB
HDD : WD White 18TB WD180EDFZ + SATA port multiplier adp6st0-j05 (JMB575) ->  WD Gold 8TB WD8002FRYZ + WD Gold 4TB WD4002FYYZ + WD Red PRO 4TB WD4001FFSX + WD Green 2TB WD20EARS
EXTERNAL
HDD/SSD : 
XT-XINTE LM906 (JMS583) -> Plextor M8PeG 1TB + WD My Passport slim 1TB + LaCie Porsche Design Mobile Drive 1TB USB-C + Zalman ZM-VE350 -> Goodram IRDM PRO 240GB
PSU :  Super Flower leadex platinum 750 W biały -> Bitfenix alchemy extensions białe/białe + AsiaHorse 16AWG White 
UPS :  CyberPower CP1500EPFCLCD -> Brennenstuhl primera-line 8 -> Brennenstuhl primera-line 10
LCD :  LG 32UD59-B + LG flatron IPS236 -> Silverstone SST-ARM11BC
CASE :  Fractal R5 Biały + Lian Li BZ-H06A srebrny + 6 x Thermaltake toughfan 14 white + Thermalright TL-B8W
SPEAKERS :  Aune S6 Pro -> Topping PA3-B -> Polk S20e black -> Monoprice stand 16250
HEADPHONES :  TOSLINK 2m -> Aune S6 Pro -> 2 x Monoprice Premier 1.8m 16AWG 3-pin XLR -> Monoprice Monolith THX AAA 887 -> 4-pin XLR na 2 x 3.5mm 16 cores OCC 2m Cable -> HiFiMAN Edition XS -> sheepskin pads + 4-pin XLR na 2 x 2.5mm ABLET silver 2m  Cable -> Monoprice Monolith M1060 + Brainwavz HM100 -> Brainwavz sheepskin oval pads + Wooden double Ɪ Stand + Audio-Technica ATH-MSR7BK -> sheepskin pads + Multibrackets MB1893 + Sennheiser Momentum 3 +  Philips Fidelio X2HR/00 + JBL J88 White
MIC :  Tonor TC30 -> Mozos SB38
KEYBOARD : Corsair STRAFE RGB Cherry MX Silent (EU) + Glorious PC Gaming Race Stealth Slim - Full Size Black + PQI MyLockey
MOUSE :  Logitech MX ERGO + 2 x Logitech MX Performance + Logitech G Pro wireless + Logitech G Pro Gaming -> Hotline Games 2.0 Plus + Corsair MM500 3xl + Corsair MM300 Extended + Razer goliathus control
CONTROLLERS :  Microsoft xbox series x controller pc (1VA-00002) -> brainwavz audio Controller Holder UGC2 + Microsoft xbox 360 wireless black + Ravcore Javelin
NET :  Intel x520-DA2 -> 2 x FTLX8571D3BCV-IT + 2 x ASUS ZenWiFi Pro XT12
NAS :  Qnap TS-932X-2G -> Noctua NF-P14s redux 1200 PWM -> Kingston 16GB 2400Mhz CL14 (HX424S14IB/16) -> 9 x Crucial MX500 2TB ->  2 x FTLX8571D3BCV-IT -> 2 x Digitus (DK-HD2533-05/3)

If the new firmware and a format resolved the issue, great; see how it behaves. If you continue to experience odd behavior because it ran the old firmware for so long, then maybe RMA it. I buy my stuff from a local retailer, so anything can easily be RMA'd, though.

I've built multiple PCs with the 980 Pro and all are fine. They weren't even running the latest firmware back then and were fine. I don't think they were from the very first batch.

My rig has a few 990 Pros, which run fine.

| Ryzen 7 7800X3D | AM5 B650 Aorus Elite AX | G.Skill Trident Z5 Neo RGB DDR5 32GB 6000MHz C30 | Sapphire PULSE Radeon RX 7900 XTX | Samsung 990 PRO 1TB with heatsink | Arctic Liquid Freezer II 360 | Seasonic Focus GX-850 | Lian Li Lanccool III | Zowie GTF-X | Mouse: Vaxee XE wired | Keyboard: Ducky One 3 TKL (Cherry MX-Speed-Silver)Beyerdynamic MMX 300 (2nd Gen) | LG 32GS95UV-B OLED 4K 240Hz / 1080p 480Hz dual-mode | OS: Windows 11 |
