I've been doing some reorganization of my computers recently: moved some drives from my main rig to my fileserver, etc. I wanted to quantify the health of the drives so I didn't store stuff on them if they were about to fail. So I installed the smartmontools package and ran smartctl on each of my drives. Every single one of them has "pre-fail" conditions on it, but some of them have the exact same values for things like Spin_Retry_Count. Two of my drives have a value of 100 and a threshold of 97, but the rest of the pre-fail values aren't anywhere NEAR their thresholds. My question is this: how "scared" should I be with these values? How can something be "pre-fail" if the value has never changed from 100? (As in ID #10.) Here's the worst one I have. Sure, my drives are old, but I've never actually had a drive failure. I'm generally pretty gentle on my hardware.

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   116   099   006    Pre-fail  Always       -       106119037
  3 Spin_Up_Time            0x0003   098   097   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       709
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   100   253   030    Pre-fail  Always       -       557271
  9 Power_On_Hours          0x0032   087   087   000    Old_age   Always       -       11834
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   037   020    Old_age   Always       -       204
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   084   000    Old_age   Always       -       22
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   065   057   045    Old_age   Always       -       35 (Min/Max 20/35)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       1
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       49
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       399
194 Temperature_Celsius     0x0022   035   043   000    Old_age   Always       -       35 (0 10 0 0 0)
195 Hardware_ECC_Recovered  0x001a   054   053   000    Old_age   Always       -       106119037
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

Also, what's up with ID #7? I thought the "value" counted DOWN until it got to the threshold. How can the "worst" value be higher than the current value? Aren't higher numbers better? (Temperature is obviously the exception.) 

Link to comment
https://linustechtips.com/topic/670994-smart-status-scaring-me-linux-smartctl/

The actual value on SMART data is the RAW_VALUE column.

 

I summon @Captain_WD, he may be able to help with parsing that SMART reading.


12 minutes ago, Energycore said:

The actual value on SMART data is the RAW_VALUE column.

 

I summon @Captain_WD, he may be able to help with parsing that SMART reading.

Actually... I figured it out. I read the ****ing manual. ;) As Linus would say "That's why you call tech support, because you always figure it out when you're on hold."

 

The "TYPE" column tells you what kind of failure an attribute predicts (no crap...). An attribute counts as failed when its normalized VALUE is less than or equal to its THRESH. In other words, if a "Pre-fail" attribute is at or below its threshold, then the disk is about to fail spectacularly. If an "Old_age" attribute is at or below its threshold, then the disk is just REALLY old, and probably should be replaced. So "Pre-fail" is only the category of the attribute, not its current state, which is why ID #10 can be tagged "Pre-fail" while still sitting at 100. In the case of the drive above, it's actually ok.
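For anyone who wants to apply that rule to a whole table at once, here is a minimal Python sketch. It assumes the ten-column layout that `smartctl -A` printed above; the names `SmartAttr`, `parse_attributes` and `assess` are made up for this example, and the status strings just imitate what smartctl shows in its WHEN_FAILED column.

```python
from typing import List, NamedTuple


class SmartAttr(NamedTuple):
    attr_id: int
    name: str
    value: int      # normalized VALUE column
    worst: int      # normalized WORST column
    thresh: int     # THRESH column
    attr_type: str  # "Pre-fail" or "Old_age"
    raw: str        # RAW_VALUE column, kept as text


def parse_attributes(text: str) -> List[SmartAttr]:
    """Parse attribute rows; the header and any non-table lines are skipped."""
    attrs = []
    for line in text.splitlines():
        # Split into at most 10 fields so a raw value with spaces stays whole.
        fields = line.split(None, 9)
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attrs.append(SmartAttr(
            attr_id=int(fields[0]),
            name=fields[1],
            value=int(fields[3]),
            worst=int(fields[4]),
            thresh=int(fields[5]),
            attr_type=fields[6],
            raw=fields[9].strip(),
        ))
    return attrs


def assess(attr: SmartAttr) -> str:
    """Compare the normalized values against the threshold."""
    if attr.value <= attr.thresh:
        return "FAILING_NOW"
    if attr.worst <= attr.thresh:
        return "In_the_past"
    return "-"


SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
190 Airflow_Temperature_Cel 0x0022   065   057   045    Old_age   Always       -       35 (Min/Max 20/35)
"""

for attr in parse_attributes(SAMPLE):
    print(attr.attr_id, attr.name, attr.attr_type, assess(attr))
```

Every row in the sample comes out as `-`, which matches the "it's actually ok" conclusion: the TYPE only matters once an attribute reaches its threshold.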

 

Phew...

 

However, this has prodded me to set up S.M.A.R.T. monitoring on my fileserver. I guess I'll go figure that out now...
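One rough way to script a periodic check is to look at smartctl's exit status, which the smartctl man page documents as a bitmask. The sketch below decodes that bitmask; the bit meanings are paraphrased from the man page as I understand it, and `decode_exit_status`, `check_drive` and the `/dev/sda` path are just illustrative names, not anything the thread settled on. The smartd daemon that ships with smartmontools is the usual tool for this job, so treat this as a quick sanity check rather than a replacement.

```python
import subprocess
from typing import List

# Exit status bits, paraphrased from the smartctl man page.
EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed or device did not identify",
    2: "a SMART or ATA command failed, or a checksum error was found",
    3: "SMART status check returned DISK FAILING",
    4: "pre-fail attributes found at or below threshold",
    5: "some attributes were at or below threshold in the past",
    6: "device error log contains records of errors",
    7: "self-test log contains records of errors",
}


def decode_exit_status(status: int) -> List[str]:
    """Return a message for every bit that is set in the exit status."""
    return [msg for bit, msg in EXIT_BITS.items() if status & (1 << bit)]


def check_drive(device: str) -> List[str]:
    """Run smartctl on one device (usually needs root) and decode the result."""
    proc = subprocess.run(
        ["smartctl", "-H", "-A", device],
        capture_output=True,
        text=True,
    )
    return decode_exit_status(proc.returncode)


# Decoding works without touching any hardware:
for message in decode_exit_status(0b01000000):
    print(message)
```

On a real machine you would call something like `check_drive("/dev/sda")` from a cron job and send yourself a mail if the returned list is not empty.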


Actually, apparently one of my drives has some unreadable sectors, which smart thinks is a bad thing, even though it's nowhere near its threshold value. Maybe I'll replace that drive...

 

I've gotten two "mails" about it. Linux mail... not sure what it's actually called.


On 4.10.2016 at 7:38 PM, corrado33 said:

~snip~

 

Hi there :)

 

Normalized values aren't really the most accurate thing to look at when checking a drive's health. What you want to check are the raw values as they show you the actual counts of the different attributes. 

Judging by what you posted the drive does appear to have some issues due to the values of IDs #7 and #188. It may not be critical but I'd keep an eye on that drive just to be on the safe side.
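To pull out just the raw counts, a small Python sketch like the one below can help. It assumes the same ten-column `smartctl -A` layout as in the first post. The function name `nonzero_raw` and the watch list are my own choices for illustration: IDs 7 and 188 are the ones called out above, while 5, 197 and 198 are added as an assumption because they relate to reallocated and unreadable sectors. Raw value formats can differ between vendors, so a large number is a prompt to look closer, not a verdict.

```python
import re
from typing import Dict, Iterable


def nonzero_raw(text: str, watch_ids: Iterable[int]) -> Dict[int, int]:
    """Return {attribute ID: leading raw number} for watched IDs with raw > 0."""
    wanted = set(watch_ids)
    found = {}
    for line in text.splitlines():
        # At most 10 fields, so a raw value containing spaces stays in one piece.
        fields = line.split(None, 9)
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id = int(fields[0])
        if attr_id not in wanted:
            continue
        # Take only the leading integer, e.g. "35" from "35 (Min/Max 20/35)".
        match = re.match(r"\d+", fields[9].strip())
        if match and int(match.group()) > 0:
            found[attr_id] = int(match.group())
    return found


SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   100   253   030    Pre-fail  Always       -       557271
188 Command_Timeout         0x0032   100   084   000    Old_age   Always       -       22
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""

# Hypothetical watch list; adjust it to whatever attributes you care about.
WATCH = [5, 7, 188, 197, 198]
print(nonzero_raw(SAMPLE, WATCH))
```

With the rows from the original post, only IDs 7 and 188 come back as non-zero, which lines up with the two attributes mentioned above.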

What is the drive's brand and model? I would also use a manufacturer's tool to run some diagnostics and verify those results by using different tools to get those S.M.A.R.T. readings. 

 

Post back if you have any questions! 

 

Thanks @Energycore for mentioning! 

 

Captain_WD. 

WDC Representative, http://www.wdc.com/ 
