
Help! Getting my 8TB drive working with H200


Hi everyone,

 

I recently upgraded the PERC 6 RAID controller in my Dell R710 to an H200, which I've flashed to IT mode. My current drive setup is:

(2x) 1TB hard drives in RAID1

(1x) 2TB hard drive

(1x) 8TB hard drive which I purchased from eBay at the same time as the H200

 

I can boot into Proxmox fine and see all 4 disks. However, when I click "Initialize Disk with GPT" on the 8TB, I get "command '/sbin/sgdisk /dev/sdb -U R' failed: exit code 2".

Also, when I try to use fdisk I get "fdisk: cannot open /dev/sdb: Input/output error". I've also booted an Ubuntu live image and get the same error with fdisk.

 

The drive was advertised as "new", but I guess it could be faulty.

 

Does anyone have any ideas?


2 minutes ago, Jelly-Monster said:

Does anyone have any ideas?

Have you tried the disk without the RAID-controller?

Hand, n. A singular instrument worn at the end of the human arm and commonly thrust into somebody’s pocket.


2 minutes ago, WereCatf said:

Have you tried the disk without the RAID-controller?

Unfortunately I haven't got another way to test it. I could order a SATA to SAS adapter and try it in my gaming PC.


24 minutes ago, Jelly-Monster said:

Unfortunately I haven't got another way to test it. I could order a SATA to SAS adapter and try it in my gaming PC.

Try unplugging the 8TB entirely then swapping the 2TB over to the port the 8TB was in and see if the 2TB stops working.

Main Rig:-

Ryzen 7 3800X | Asus ROG Strix X570-F Gaming | 16GB Team Group Dark Pro 3600Mhz | Corsair MP600 1TB PCIe Gen 4 | Sapphire 5700 XT Pulse | Corsair H115i Platinum | WD Black 1TB | WD Green 4TB | EVGA SuperNOVA G3 650W | Asus TUF GT501 | Samsung C27HG70 1440p 144hz HDR FreeSync 2 | Ubuntu 20.04.2 LTS |

 

Server:-

Intel NUC running Server 2019 + Synology DSM218+ with 2 x 4TB Toshiba NAS Ready HDDs (RAID0)


41 minutes ago, Master Disaster said:

Try unplugging the 8TB entirely then swapping the 2TB over to the port the 8TB was in and see if the 2TB stops working.

I've swapped both, the 8TB is now /dev/sdc and still the same input/output error.


26 minutes ago, Jelly-Monster said:

I've swapped both, the 8TB is now /dev/sdc and still the same input/output error.

Then it's more than likely a faulty drive; however, it would still be good to test it outside of the RAID controller just to be 100% sure.


2 minutes ago, Master Disaster said:

Then it's more than likely a faulty drive; however, it would still be good to test it outside of the RAID controller just to be 100% sure.

Yeah, I'm thinking the same.  I've got a SATA to SAS adapter coming next week, so I guess we'll see what happens.


Check your kernel logs (dmesg) and look at /proc/partitions to see whether the machine is actually recognising the drive properly and whether it's producing any errors.
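As a rough illustration of the /proc/partitions check, here is a small Python sketch (parse_partitions is a hypothetical helper, not part of any tool). It relies on the third column of /proc/partitions being the size in 1 KiB blocks, so a healthy 8TB drive should come out at roughly 8 TB; a missing entry or a wildly wrong size would point at a detection problem rather than at the partitioning tools.

```python
from __future__ import annotations

from pathlib import Path


def parse_partitions(text: str) -> dict[str, int]:
    """Map device name -> size in bytes, parsed from /proc/partitions text.

    The third column of /proc/partitions is the size in 1 KiB blocks.
    """
    sizes: dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the "major minor #blocks name" header and blank lines.
        if len(fields) != 4 or not fields[0].isdigit():
            continue
        _major, _minor, blocks, name = fields
        sizes[name] = int(blocks) * 1024
    return sizes


if __name__ == "__main__":
    proc = Path("/proc/partitions")
    if proc.exists():  # only present on Linux
        for name, size in parse_partitions(proc.read_text()).items():
            print(f"{name:12} {size / 10**12:8.2f} TB")
```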

We have multiple H200s/H310s at work (in actual Dell servers), and I have one at home reflashed to stock LSI firmware in my home media server; all of them happily talk to 8TB drives. It's very unlikely it's a controller issue.

 

Oddly enough, I purchased an 8TB SAS drive for my home setup last year from eBay, which was pretty much DOA. The drive spun up and was detected by Linux, but you couldn't do anything with it and it just spewed out errors. I was going to return it to the eBay seller, then realised it was still under warranty with WD, so I sent it back to them and they replaced it, no probs.


17 minutes ago, Aragorn- said:

Check your kernel logs (dmesg) and look at /proc/partitions to see whether the machine is actually recognising the drive properly and whether it's producing any errors.

We have multiple H200s/H310s at work (in actual Dell servers), and I have one at home reflashed to stock LSI firmware in my home media server; all of them happily talk to 8TB drives. It's very unlikely it's a controller issue.

 

Oddly enough, I purchased an 8TB SAS drive for my home setup last year from eBay, which was pretty much DOA. The drive spun up and was detected by Linux, but you couldn't do anything with it and it just spewed out errors. I was going to return it to the eBay seller, then realised it was still under warranty with WD, so I sent it back to them and they replaced it, no probs.

Good shout. I'm no good at reading the logs, though... it does seem to be spewing up some errors:

 

[14624.392973] mpt2sas_cm0: log_info(0x3112043b): originator(PL), code(0x12), sub_code(0x043b)
[14624.393010] sd 0:0:2:0: [sdc] Unaligned partial completion (resid=113928, sector_sz=512)
[14624.393016] sd 0:0:2:0: [sdc] tag#3096 CDB: Read(32)
[14624.393021] sd 0:0:2:0: [sdc] tag#3096 CDB[00]: 7f 00 00 00 00 00 00 18 00 09 20 00 00 00 00 00
[14624.393025] sd 0:0:2:0: [sdc] tag#3096 CDB[10]: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00
[14624.393033] sd 0:0:2:0: [sdc] tag#3096 FAILED Result: hostbyte=DID_ABORT driverbyte=DRIVER_SENSE
[14624.393038] sd 0:0:2:0: [sdc] tag#3096 Sense Key : Illegal Request [current]
[14624.393043] sd 0:0:2:0: [sdc] tag#3096 Add. Sense: Logical block guard check failed
[14624.393051] sd 0:0:2:0: [sdc] tag#3096 CDB: Read(32)
[14624.393058] sd 0:0:2:0: [sdc] tag#3096 CDB[00]: 7f 00 00 00 00 00 00 18 00 09 20 00 00 00 00 00
[14624.393062] sd 0:0:2:0: [sdc] tag#3096 CDB[10]: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00
[14624.393066] blk_update_request: protection error, dev sdc, sector 0 op 0x0:(READ) flags 0x0 phys_seg 17 prio class 0

 

No idea what this means though?
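For anyone else puzzling over these lines, the sketch below decodes their two numeric parts. The "CDB[00]" and "CDB[10]" labels are hex byte offsets (0x00 and 0x10), so the two rows together form one 32-byte READ(32) command. The field offsets follow my reading of the SCSI SBC READ(32) layout and the mpt2sas log_info bit layout, so treat them as assumptions to double-check; the function names are hypothetical. Decoded, the request is a read of 256 blocks starting at LBA 0 with RDPROTECT set to 1, meaning the drive was asked to verify protection information, and it answered "Logical block guard check failed". As far as I know, the Linux sd driver uses 32-byte READ/WRITE commands for drives formatted with Type 2 protection, so the Read(32) opcode is itself a clue.

```python
def decode_mpt_loginfo(log_info: int) -> dict:
    """Split an mpt2sas log_info word into its bit fields.

    Assumed layout: bits 31-28 bus type (3 = SAS), 27-24 originator,
    23-16 code, 15-0 sub-code.
    """
    originators = {0: "IOP", 1: "PL", 2: "IR"}
    originator = (log_info >> 24) & 0xF
    return {
        "bus_type": (log_info >> 28) & 0xF,
        "originator": originators.get(originator, hex(originator)),
        "code": (log_info >> 16) & 0xFF,
        "sub_code": log_info & 0xFFFF,
    }


def decode_read32(cdb_hex: str) -> dict:
    """Decode the main fields of a SCSI READ(32) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)  # whitespace between bytes is ignored
    if len(cdb) != 32 or cdb[0] != 0x7F:
        raise ValueError("expected a 32-byte variable-length (0x7F) CDB")
    return {
        "service_action": int.from_bytes(cdb[8:10], "big"),    # 0x0009 = READ(32)
        "rdprotect": cdb[10] >> 5,                              # non-zero: check PI
        "lba": int.from_bytes(cdb[12:20], "big"),
        "expected_ref_tag": int.from_bytes(cdb[20:24], "big"),
        "transfer_length": int.from_bytes(cdb[28:32], "big"),   # in logical blocks
    }


if __name__ == "__main__":
    print(decode_mpt_loginfo(0x3112043B))
    # The "CDB[00]" and "CDB[10]" rows from dmesg, joined into one string.
    cdb = ("7f 00 00 00 00 00 00 18 00 09 20 00 00 00 00 00 "
           "00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00")
    print(decode_read32(cdb))
```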


5 minutes ago, Jelly-Monster said:

Good shout. I'm no good at reading the logs, though... it does seem to be spewing up some errors:

 


No idea what this means though?

Quote

The sector size of the block layer is 512 bytes, but integrity interval size might be different (in case of 4K block size of the media). At the initiator side the virtual start sector is the one that was originally submitted by the block layer (512 bytes) for the Reftag usage. The initiator converts the Reftag to integrity interval units and sends it to the target. So the target virtual start sector should be calculated at integrity interval units. prepare_fn() and complete_fn() don't remap correctly the Reftag when using incorrect units of the virtual start sector, which leads to the following protection error at the device:

"blk_update_request: protection error, dev sdb, sector 2048 op 0x0:(READ) flags 0x10000 phys_seg 1 prio class 0"

To fix that, set the seed in integrity interval units.

Might help? I've got zero experience with SAS, so apologies if it's irrelevant.
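The unit conversion that quote describes is easy to get wrong, so here is a minimal Python illustration (ref_tag_seed is a hypothetical helper, not the kernel's code). Linux numbers sectors in 512-byte units, while the reference tag counts integrity intervals, so on media with 4 KiB intervals the quoted sector 2048 corresponds to interval 256; seeding the tag with 2048 instead would make the reference tag checks fail. The 32-bit mask reflects the width of the reference tag field.

```python
SECTOR_SIZE = 512  # Linux block-layer sectors are always 512 bytes


def ref_tag_seed(sector: int, interval: int) -> int:
    """Convert a 512-byte sector number into integrity-interval units.

    Hypothetical helper for illustration. The reference tag field is
    32 bits wide, so the result is truncated to 32 bits.
    """
    if interval <= 0 or interval % SECTOR_SIZE:
        raise ValueError("interval must be a positive multiple of 512")
    return (sector * SECTOR_SIZE // interval) & 0xFFFFFFFF


if __name__ == "__main__":
    # Sector 2048 from the quoted error message.
    print(ref_tag_seed(2048, 512))   # 2048: 512-byte intervals, units match
    print(ref_tag_seed(2048, 4096))  # 256: 4 KiB intervals
```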


Google some of the errors and see what you find...

 

This post for instance:

https://serverfault.com/questions/971722/dmesg-full-of-i-o-errors-smart-ok-four-disks-affected

 

Suggests a similar error caused by a bad cable.

 

I would look at the logs from boot time and make sure the controller sees the drive with the correct capacity. I would also check /proc/partitions to confirm the capacity is correct. You say you've flashed it, so there shouldn't be any issues, but I'm pretty sure older LSI 2008 firmwares had issues with drives over 2TB, and from memory the IT flashing process is a multi-stage affair that involves flashing older versions, then newer ones.

 

I believe the mpt2sas driver prints the firmware version when it initialises the card:

root@anduin:~# dmesg | grep mpt2sas |grep FW
[    2.944049] mpt2sas_cm0: LSISAS2008: FWVersion(19.00.00.00), ChipRevision(0x03), BiosVersion(07.37.00.00)

From memory, 20 is the newest; I'm running 19. But part of the reflashing process requires you to flash version 7 or something along the way.
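If you want to script that check, a small Python sketch along these lines pulls the FWVersion out of the mpt2sas initialisation line (firmware_version is a hypothetical helper). Versions are compared as tuples of integers so that 20.00.07.00 sorts after 19.00.00.00; the threshold of 20 comes from the recollection above, not from an official support matrix.

```python
import re

# Matches e.g. "FWVersion(19.00.00.00)" in the mpt2sas init message.
FW_RE = re.compile(r"FWVersion\((\d+)\.(\d+)\.(\d+)\.(\d+)\)")


def firmware_version(dmesg_line: str):
    """Return the FWVersion as a tuple of ints, or None if the line has none."""
    match = FW_RE.search(dmesg_line)
    return tuple(int(part) for part in match.groups()) if match else None


if __name__ == "__main__":
    line = ("mpt2sas_cm0: LSISAS2008: FWVersion(19.00.00.00), "
            "ChipRevision(0x03), BiosVersion(07.37.00.00)")
    fw = firmware_version(line)
    print(fw)                   # (19, 0, 0, 0)
    print(fw >= (20, 0, 0, 0))  # False: older than version 20
```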

 

Similarly, when it detects the drive you'll get output like this:

root@anduin:~# dmesg | grep "sd 0:0:0:0"
[    3.588604] sd 0:0:0:0: Attached scsi generic sg0 type 0
[    3.596850] sd 0:0:0:0: [sda] 11721045168 512-byte logical blocks: (6.00 TB/5.46 TiB)
[    3.624694] sd 0:0:0:0: [sda] Write Protect is off
[    3.659857] sd 0:0:0:0: [sda] Mode Sense: f7 00 10 08
[    3.678043] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
[    3.755552] sd 0:0:0:0: [sda] Attached SCSI disk

You can remove the 0:0:0:0 bit in the command to show all the drives.
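The capacity in brackets is just the logical block count multiplied by the block size. A Python sketch that redoes that arithmetic from such a line (capacity_from_dmesg is a hypothetical helper, and its regular expression assumes the line format shown above):

```python
import re

# Matches e.g. "[sda] 11721045168 512-byte logical blocks"
BLOCKS_RE = re.compile(r"\[(\w+)\] (\d+) (\d+)-byte logical blocks")


def capacity_from_dmesg(line: str):
    """Return (device, size in bytes) from an sd 'logical blocks' line, or None."""
    match = BLOCKS_RE.search(line)
    if not match:
        return None
    dev, blocks, block_size = match.group(1), int(match.group(2)), int(match.group(3))
    return dev, blocks * block_size


if __name__ == "__main__":
    line = ("sd 0:0:0:0: [sda] 11721045168 512-byte logical blocks: "
            "(6.00 TB/5.46 TiB)")
    dev, size = capacity_from_dmesg(line)
    print(dev, f"{size / 10**12:.2f} TB", f"{size / 2**40:.2f} TiB")
    # sda 6.00 TB 5.46 TiB
```

By the same arithmetic, a nominal 8 TB drive should show up as about 8.00 TB / 7.28 TiB; a much smaller figure would suggest the controller is not seeing the drive correctly.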

 

If you like, do a fresh boot and then post up the whole boot log and we can look through it.


12 hours ago, Aragorn- said:

This post for instance:

https://serverfault.com/questions/971722/dmesg-full-of-i-o-errors-smart-ok-four-disks-affected

 

Suggests a similar error caused by a bad cable.

I did think about the cables; the Dell-branded cables are so expensive that I got some from AliExpress. But I've moved the drive into a known working slot and still get the same issue. Do you think it could still be a bad cable?

 

12 hours ago, Aragorn- said:

I would look at the logs from boot time and make sure the controller sees the drive with the correct capacity. I would also check /proc/partitions to confirm the capacity is correct. You say you've flashed it, so there shouldn't be any issues, but I'm pretty sure older LSI 2008 firmwares had issues with drives over 2TB, and from memory the IT flashing process is a multi-stage affair that involves flashing older versions, then newer ones.

 

I believe the mpt2sas driver prints the firmware version when it initialises the card:


root@anduin:~# dmesg | grep mpt2sas |grep FW
[    2.944049] mpt2sas_cm0: LSISAS2008: FWVersion(19.00.00.00), ChipRevision(0x03), BiosVersion(07.37.00.00)

From memory, 20 is the newest; I'm running 19. But part of the reflashing process requires you to flash version 7 or something along the way.

I'm running firmware version 20.00.07.00-IT, and the SAS topology shows the 8TB drive. I've attached screen captures that might help.

 

12 hours ago, Aragorn- said:

 

Similarly, when it detects the drive you'll get output like this:


root@anduin:~# dmesg | grep "sd 0:0:0:0"
[    3.588604] sd 0:0:0:0: Attached scsi generic sg0 type 0
[    3.596850] sd 0:0:0:0: [sda] 11721045168 512-byte logical blocks: (6.00 TB/5.46 TiB)
[    3.624694] sd 0:0:0:0: [sda] Write Protect is off
[    3.659857] sd 0:0:0:0: [sda] Mode Sense: f7 00 10 08
[    3.678043] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
[    3.755552] sd 0:0:0:0: [sda] Attached SCSI disk

You can remove the 0:0:0:0 bit in the command to show all the drives.

 

If you like, do a fresh boot and then post up the whole boot log and we can look through it.

I've attached the logs from a fresh boot, plus the output of the grep "sd" command. Hopefully they make more sense to you.

 

I'll do a bit more Googling of the errors in the meantime.

Attachments: 8TB.jpg, H200.jpg, DMESG, DMESG GREP SD


I've potentially found the solution! Whilst playing around with smartctl, I noticed the drive was formatted with Type 2 protection. Not really knowing what this is, I did a bit of Googling and found this:

http://talesinit.blogspot.com/2015/11/formatted-with-type-2-protection-huh.html

 

I'm currently in the process of formatting (it's at 1.26%), which feels like progress.
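For reference, the check and fix can be sketched as below. This assumes smartctl reports protection information with a line of the form "Formatted with type N protection", and that the reformat is done with sg_format from sg3_utils, where --fmtpinfo=0 asks for a format without protection information; the linked blog post describes the author's actual procedure, and both helper names here are hypothetical. The sketch only prints the command rather than running it, because a low-level format wipes the drive and can take many hours on an 8TB disk.

```python
import re

PROT_RE = re.compile(r"Formatted with type (\d) protection")


def protection_type(smartctl_output: str) -> int:
    """Return the protection type reported by smartctl, or 0 if none is reported."""
    match = PROT_RE.search(smartctl_output)
    return int(match.group(1)) if match else 0


def reformat_command(device: str) -> list:
    """Build (but do not run) an sg_format command that low-level formats
    the drive with protection information disabled. DESTROYS ALL DATA."""
    return ["sg_format", "--format", "--fmtpinfo=0", device]


if __name__ == "__main__":
    # Example text; on a real system this would come from smartctl run on the drive.
    sample = "Formatted with type 2 protection\n"
    if protection_type(sample) != 0:
        print("Would run:", " ".join(reformat_command("/dev/sdX")))
```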


Obscure for sure. I guess these drives came out of an enterprise SAN-type system. I ran into a similar issue about 15 years ago with some recovered Fibre Channel disks, but those actually reported 520-byte sectors to the OS, and the disk tools clearly reported the sector size as the issue. Things have evolved, but the errors have got more obscure.

Fingers crossed that sorts it for you :)
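For context on the "guard check" error earlier in the thread: with T10 protection information, each 512-byte logical block carries an extra 8 bytes on the media, namely a 2-byte guard tag (a CRC-16 of the block data), a 2-byte application tag and a 4-byte reference tag. So a PI-formatted drive still reports 512-byte logical blocks but stores 520 bytes per block, which is how a drive pulled from an array can look normal to the OS yet fail reads with protection errors. The Python sketch below computes the guard with the CRC-16/T10-DIF parameters (polynomial 0x8BB7, initial value 0, no reflection, no final XOR); the helper names are hypothetical, and it illustrates the format rather than what the drive firmware runs.

```python
import struct


def crc16_t10dif(data: bytes) -> int:
    """CRC-16/T10-DIF: poly 0x8BB7, init 0, MSB-first, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x8BB7) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def protection_info(block: bytes, app_tag: int, ref_tag: int) -> bytes:
    """Build the 8-byte PI field for one 512-byte block:
    2-byte guard (CRC), 2-byte application tag, 4-byte reference tag, big-endian."""
    if len(block) != 512:
        raise ValueError("expected a 512-byte logical block")
    return struct.pack(">HHI", crc16_t10dif(block), app_tag, ref_tag)


if __name__ == "__main__":
    print(hex(crc16_t10dif(b"123456789")))  # 0xd0db, the standard check value
    pi = protection_info(bytes(512), app_tag=0, ref_tag=0)
    print(512 + len(pi))                     # 520 bytes stored per block
```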


21 hours ago, Aragorn- said:

Obscure for sure. I guess these drives came out of an enterprise SAN-type system. I ran into a similar issue about 15 years ago with some recovered Fibre Channel disks, but those actually reported 520-byte sectors to the OS, and the disk tools clearly reported the sector size as the issue. Things have evolved, but the errors have got more obscure.

Fingers crossed that sorts it for you :)

Yeah, that's what I'm thinking. I bought the drive from eBay, and although it was advertised as new, it didn't come in sealed packaging.

 

The format took about 30 hours, and thankfully worked. 

 

Thanks everyone for the help :)


On 6/26/2020 at 8:54 PM, Aragorn- said:

Obscure for sure. I guess these drives came out of an enterprise SAN-type system. I ran into a similar issue about 15 years ago with some recovered Fibre Channel disks, but those actually reported 520-byte sectors to the OS, and the disk tools clearly reported the sector size as the issue. Things have evolved, but the errors have got more obscure.

Fingers crossed that sorts it for you :)

Beware of enterprise disks made for storage systems.

Some time ago I had a chance to play a bit with several disk enclosures full of disks that had been used in an HP EVA8400 storage system.

I had them connected to an x86 server running Linux.

I got it all working with ZFS, and all was good.

But there was one interesting thing: when the disks encountered read errors, they would just send a soft error warning and actually return WRONG data to the OS! I guess the EVA8400 firmware knows how to handle this properly, but connecting the same disks to an x86 server gave this result.

Luckily, with RAID6 and ZFS, ZFS itself noticed the checksum errors (yay for checksums!) and corrected them.
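A minimal Python sketch of why end-to-end checksums catch this: ZFS keeps a checksum for each block in the block pointer that references it and verifies it on every read, so data the disk silently returns wrong no longer matches. The sketch uses Fletcher-4, which as far as I know is the OpenZFS default for checksum=on; it assumes little-endian 32-bit words and omits ZFS's byte-swapped and vectorised variants, so it is an illustration, not ZFS's code.

```python
import struct

MASK64 = (1 << 64) - 1


def fletcher4(data: bytes) -> tuple:
    """Fletcher-4 style checksum: four 64-bit running sums over 32-bit words.

    Assumes little-endian words and a length that is a multiple of 4 bytes.
    """
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4")
    a = b = c = d = 0
    for (word,) in struct.iter_unpack("<I", data):
        a = (a + word) & MASK64
        b = (b + a) & MASK64
        c = (c + b) & MASK64
        d = (d + c) & MASK64
    return a, b, c, d


if __name__ == "__main__":
    good = bytes(range(16))
    stored = fletcher4(good)        # checksum saved when the block was written
    bad = bytearray(good)
    bad[5] ^= 0x01                  # disk silently returns one flipped bit
    print(fletcher4(bytes(bad)) == stored)  # False: the corruption is detected
```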
