SSHD health check methods?

So I have a Seagate SSHD ST1000LM014, which I got all the way back in 2014 or something.

It has 8GB of NAND flash on board, hence "solid-state hybrid drive". That was meant to speed up programs and booting, which worked at the time.

 

But now it's just my data drive, with 43k power-on hours and a start/stop count of 12k.

 

I'm wondering if there is a way to check whether the NAND is still good.

I assume if the NAND went bad, the HDD controller would just revert the drive back to a standard 1TB laptop drive...?

 

*I'm considering taking that drive out and replacing it with a much newer (2018) 2TB laptop hard drive I have lying around; maybe the 2TB drive would be faster too. (The SSHD would then store unimportant data, either back in the laptop or as an external drive.)

 

**CrystalDiskInfo shows the SSHD status as Good: 0 Reallocated Sectors (05), 0 Reported Uncorrectable Errors (BB), 0 Current Pending Sector Count (C5) and 0 Uncorrectable Sector Count (C6).


3 minutes ago, da na said:

If the NAND went bad the whole drive is toast, unfortunately. 

Back in the day I read that if the HDD controller sees the NAND is dead, it will just continue to operate the drive as a standard hard drive. The drive still has a separate 64MB DRAM cache (a bit small, since the 2TB laptop hard drive has 128MB of DRAM).

image.png.2520780364b2836f196dcf8169ff6652.png


6 minutes ago, Supersonicwolfe said:

Back in the day I read that if the HDD controller sees the NAND is dead, it will just continue to operate the drive as a standard hard drive. The drive still has a separate 64MB DRAM cache (a bit small, since the 2TB laptop hard drive has 128MB of DRAM).

image.png.2520780364b2836f196dcf8169ff6652.png

I'm not quite sure how accurate that is. The ROM of the hard drive is stored in NAND (best source) so if the NAND dies the drive will be unable to even start, and there's nothing you can do about it without getting into serious repair. I did some digging and several sources seem to corroborate. You might be best off replacing the drive.


5 minutes ago, da na said:

I'm not quite sure how accurate that is. The ROM of the hard drive is stored in NAND (best source) so if the NAND dies the drive will be unable to even start, and there's nothing you can do about it without getting into serious repair. I did some digging and several sources seem to corroborate. You might be best off replacing the drive.

Thanks for that!

Yeah, it seems they store the ROM on the NAND too, so if the NAND is dead the drive won't even start up. I did some quick digging too, since the one you linked is a newer model.

 

But I wonder: what if the NAND still works but the MLC is so worn that it puts itself into read-only mode? Would that still allow the HDD to work?

34 minutes ago, da na said:

If the NAND went bad the whole drive is toast, unfortunately. 

HWiNFO should show NAND health of SSHDs.

Oh, and I checked with HWiNFO: it doesn't seem to show it anywhere, unlike on an SSD, where it reports the drive's remaining life.


If the drive ever enters read only mode then immediately pick up a new drive and start copying data to the new one to ensure minimal data loss (I think)


9 hours ago, Babybee3329 said:

If the drive ever enters read only mode then immediately pick up a new drive and start copying data to the new one to ensure minimal data loss (I think)

Welp, I guess you missed the whole thing about the drive being an SSHD. It's a 1TB hard drive with 8GB of MLC NAND on it that acts as a fast cache, similar to how Optane worked, on a smaller scale. The user has no control over what's on the NAND; the HDD controller copies the most frequently used data onto the NAND for faster I/O.

 

Besides, as far as I know there is no way to see the health status of the MLC NAND, hence I'm asking whether there is a way to check it.


Post the whole CrystalDiskInfo SMART table here.

SSHDs are known for reliability problems, as the NAND wears out fast (nonexistent wear-leveling algorithm and small NAND capacity). Usually, once the NAND fails, the whole HDD is already dead...

   
 
 
 
Spoiler
CPU : Intel 14gen i7-14700K
COOLER :  Thermalright Peerless Assassin 120 White + thermaltake toughfan 12 white + Thermal Grizzly - CPU Contact Frame Intel 13./14. +  Coollaboratory Liquid Ultra
GPU : MSI RTX 2070 Armor @GPU 2050MHz Mem 8200MHz -> USB C 10Gb/s cable 2m -> Unitek 4x USB HUB 10 Gb/s (Y-HB08003)
MOBO : MSI MEG Z690 UNIFY
RAM :  Corsair VENGEANCE DDR5 RAM 64 GB (2 x 32 GB) 6400 MHz CL32 (CMK64GX5M2B6400C32)
SSD : Intel Optane 905P 960GB U.2 (OS) + 2 x WD SN850X 4TB + 2 x PNY CS3140 2TB + ASM2824 PCIe switch -> 4 x Plextor M8PeG 1TB + flexiDOCK MB014SP-B -> Crucial MX500 2TB + GoodRam Iridium PRO 960GB + Samsung 850 Pro 512GB
HDD : WD White 18TB WD180EDFZ + SATA port multiplier adp6st0-j05 (JMB575) ->  WD Gold 8TB WD8002FRYZ + WD Gold 4TB WD4002FYYZ + WD Red PRO 4TB WD4001FFSX + WD Green 2TB WD20EARS
EXTERNAL
HDD/SSD : 
XT-XINTE LM906 (JMS583) -> Plextor M8PeG 1TB + WD My Passport slim 1TB + LaCie Porsche Design Mobile Drive 1TB USB-C + Zalman ZM-VE350 -> Goodram IRDM PRO 240GB
PSU :  Super Flower leadex platinum 750 W biały -> Bitfenix alchemy extensions białe/białe + AsiaHorse 16AWG White 
UPS :  CyberPower CP1500EPFCLCD -> Brennenstuhl primera-line 8 -> Brennenstuhl primera-line 10
LCD :  LG 32UD59-B + LG flatron IPS236 -> Silverstone SST-ARM11BC
CASE :  Fractal R5 Biały + Lian Li BZ-H06A srebrny + 6 x Thermaltake toughfan 14 white + Thermalright TL-B8W
SPEAKERS :  Aune S6 Pro -> Topping PA3-B -> Polk S20e black -> Monoprice stand 16250
HEADPHONES :  TOSLINK 2m -> Aune S6 Pro -> 2 x Monoprice Premier 1.8m 16AWG 3-pin XLR -> Monoprice Monolith THX AAA 887 -> 4-pin XLR na 2 x 3.5mm 16 cores OCC 2m Cable -> HiFiMAN Edition XS -> sheepskin pads + 4-pin XLR na 2 x 2.5mm ABLET silver 2m  Cable -> Monoprice Monolith M1060 + Brainwavz HM100 -> Brainwavz sheepskin oval pads + Wooden double Ɪ Stand + Audio-Technica ATH-MSR7BK -> sheepskin pads + Multibrackets MB1893 + Sennheiser Momentum 3 +  Philips Fidelio X2HR/00 + JBL J88 White
MIC :  Tonor TC30 -> Mozos SB38
KEYBOARD : Corsair STRAFE RGB Cherry MX Silent (EU) + Glorious PC Gaming Race Stealth Slim - Full Size Black + PQI MyLockey
MOUSE :  Logitech MX ERGO + 2 x Logitech MX Performance + Logitech G Pro wireless + Logitech G Pro Gaming -> Hotline Games 2.0 Plus + Corsair MM500 3xl + Corsair MM300 Extended + Razer goliathus control
CONTROLLERS :  Microsoft xbox series x controller pc (1VA-00002) -> brainwavz audio Controller Holder UGC2 + Microsoft xbox 360 wireless black + Ravcore Javelin
NET :  Intel x520-DA2 -> 2 x FTLX8571D3BCV-IT + 2 x ASUS ZenWiFi Pro XT12
NAS :  Qnap TS-932X-2G -> Noctua NF-P14s redux 1200 PWM -> Kingston 16GB 2400Mhz CL14 (HX424S14IB/16) -> 9 x Crucial MX500 2TB ->  2 x FTLX8571D3BCV-IT -> 2 x Digitus (DK-HD2533-05/3)

4 hours ago, kokosnh said:

Post the whole CrystalDiskInfo SMART table here.

SSHDs are known for reliability problems, as the NAND wears out fast (nonexistent wear-leveling algorithm and small NAND capacity). Usually, once the NAND fails, the whole HDD is already dead...

Yep, and it has been well used for almost 10 years. (I checked: I bought it in 2013-9 and it was my OS drive until 2016; since then it has always been used as a secondary drive with some programs on it.)

Not knowing the NAND status is one of the reasons I'm considering taking it out and swapping in my 2TB secondary laptop drive, which has been working fine. (I no longer use that laptop much, so a 1TB drive would be more than enough.)

 

As a matter of fact, it just gave me a heart attack. I tried running a CrystalDiskMark benchmark at the 8GiB setting; it didn't finish the random writes and the drive became unresponsive, not even responding after a restart. It did come back after I turned the PSU off and on again, and nothing in the SMART data from CrystalDiskInfo indicated what happened.

 

image.png.9a71c2a9558978e8a18d64b5ce01f0fd.png

 


There are some command timeout errors, but I wonder if they are increasing (the raw value holds 3 different numbers in hex, so it's hard to read).
Do you have a CrystalDiskInfo screenshot from before? Or is this one from before?


There's nothing else alarming in SMART from what I can read, but there are no attributes that indicate NAND health...

   
 
 
 

12 minutes ago, kokosnh said:

There are some command timeout errors, but I wonder if they are increasing (the raw value holds 3 different numbers in hex, so it's hard to read).
Do you have a CrystalDiskInfo screenshot from before? Or is this one from before?


There's nothing else alarming in SMART from what I can read, but there are no attributes that indicate NAND health...

Oh, it's coming out. Once I have cloned my 2TB laptop drive, it's going to replace the SSHD. The SSHD will live out its remaining life as an external drive.

 

The command timeout count is certainly high, the highest in my PC actually. There is a 4TB desktop drive that also has a high command timeout count, but I know that drive has problems, and it only stores redundant information; I won't even flinch if it just dies. (Of my other drives, a 1TB desktop drive shows 1, an 8TB IronWolf shows 3, and the 2TB laptop drive shows 0.)

 

On the other hand, I don't think command timeout (BC) is a reliable failure indicator anyway. Some vendors don't report it, and I also have a relatively new 1TB 2.5inch HDD (serving as external drive in a crappy enclosure) reporting 200020003.


There are 3 different counters in that one big value, which is why it looks so big (it's just 659-686 errors).
 

And it's hard to tell anything from a single value; you have to see how it changes over time to learn anything remotely useful.
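
To see whether the counters are actually growing, two readings of the raw value taken some time apart can be compared. Below is a minimal sketch, assuming the Seagate layout of three 16-bit counters packed into the hex raw value (lowest 16 bits = total timeouts). The function names are hypothetical, and the second reading is invented purely for illustration; it is not from the drive in this thread.

```python
def split_counters(raw_hex):
    """Split a hex raw value into three 16-bit counters (assumed layout)."""
    value = int(raw_hex, 16)
    return (
        value & 0xFFFF,          # total command timeouts
        (value >> 16) & 0xFFFF,  # commands that took > 5 s
        (value >> 32) & 0xFFFF,  # commands that took > 7.5 s
    )


def counter_growth(earlier_hex, later_hex):
    """Return how much each counter increased between two readings."""
    names = ("timeouts_total", "over_5s", "over_7_5s")
    before = split_counters(earlier_hex)
    after = split_counters(later_hex)
    return {name: a - b for name, b, a in zip(names, before, after)}


# First reading is the value posted in this thread; the second is made up.
print(counter_growth("0293029502AE", "0293029502B0"))
```

A reading that stays flat over weeks of use is far less worrying than one that keeps climbing, whatever the absolute number is.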

 

   
 
 
 

7 minutes ago, kokosnh said:

There are 3 different counters in that one big value, which is why it looks so big (it's just 659-686 errors).
 

And it's hard to tell anything from a single value; you have to see how it changes over time to learn anything remotely useful.

 

Nice, I learnt something.

Still, looks like this topic is a dead end.

 

As I already intended, I will probably err on the side of caution and take it out. The power-on time is approaching 5 years in total, and it has been almost 10 years since purchase (and I almost never just put it aside). It's probably still on the flat bottom of the bathtub curve, which is already a pretty good run for an HDD. Plus, failing a CrystalDiskMark test and needing a hard power cycle isn't exactly confidence-inspiring...


You can change the values to decimal by going to Function > Advanced Feature > Raw Values > 10 [DEC].


22 hours ago, Ryker Robb said:

You can change the values to decimal by going to Function > Advanced Feature > Raw Values > 10 [DEC].

That would make the data less useful, because for Seagate the [188] Command Timeout raw value is a set of 3 separate values.

 

According to Seagate:

3.11 Attribute ID 188: Command Timeout Count

Normalized Command Timeout Count = 100 – Command Timeout Count .

This attribute tracks the number of command time outs as defined by an active command being interrupted by a HRESET and COMRESET or SRST or another command

 

The normalized value is only computed when the number of commands is in the range 10^3 to 10^4 . The CommandCount and ErroCount are cleared when Number Of Commands reaches 10^4. The error count used to compute normalized value is not reported in attribute Raw value. It is reported in vendor info area of Attribute sector, bytes 474:475. If Command Timeout Count is > 99, normalize value of 1 is reported. The initial Worst Value is set to 0xFD as a special case.

Raw Usage

Raw [1 – 0] = Total # of command timeouts, with Max hold of FFFFh

Raw [3 – 2] = Total # of commands with > 5 second completion, including those > 7.5 seconds

Raw [5 – 4] = Total # of commands with > 7.5 second completion

 

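So the raw value (0293029502AE) splits, from Raw [5 – 4] down to Raw [1 – 0], into

0x0293 0x0295 0x02AE

Which is 659 commands over 7.5 seconds, 661 commands over 5 seconds, and 686 command timeouts in total.

The "relatively new 1TB 2.5inch HDD (serving as external drive in a crappy enclosure)" reporting 200020003

would be 0x0002 0x0002 0x0003, i.e. 2, 2, 3:

3 timeouts in total and 2 commands over 5 seconds, both of which also exceeded 7.5 seconds (so presumably 1 timeout under 5 seconds).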
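
As a rough sketch of the decoding above, the three 16-bit counters can be pulled out of the raw value with a few shifts and masks. The function name `decode_command_timeout` and the dictionary keys are made up for illustration, and the code assumes the raw value is displayed in hex with the most significant byte first (so the rightmost four hex digits are Raw [1 – 0]).

```python
def decode_command_timeout(raw_hex):
    """Decode a Seagate attribute 188 raw value given as a hex string.

    Assumes the layout quoted above: three 16-bit counters packed into
    a 48-bit value, with Raw[1-0] in the lowest 16 bits.
    """
    value = int(raw_hex, 16)
    return {
        "timeouts_total": value & 0xFFFF,     # Raw[1-0]
        "over_5s": (value >> 16) & 0xFFFF,    # Raw[3-2], includes > 7.5 s
        "over_7_5s": (value >> 32) & 0xFFFF,  # Raw[5-4]
    }


# The two raw values mentioned in this thread.
for raw in ("0293029502AE", "200020003"):
    print(raw, decode_command_timeout(raw))
```

Leading zeros don't matter here, so the shorter value 200020003 decodes the same way as 000200020003 would.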

 

And this value by itself isn't particularly useful, given that some vendors like WD don't even report it on their external drives...


38 minutes ago, Supersonicwolfe said:

That would make the data less useful, because for Seagate the [188] Command Timeout raw value is a set of 3 separate values.

In that case, try switching to 10[DEC]-2byte.
