Need some Linux NIC help, please (connection dropping randomly)

Sarra

I have a Ryzen 9 3900XT-based machine with a TP-Link TX-401 10GbE card in it. It is connected to a 10GbE network switch through a short (3') Cat6A cable. The OS is Kubuntu 22.04.

 

Randomly, while I'm using it, the network will slow way down and then disconnect. After a few seconds, it will reconnect on its own. It does this truly at random: it can disconnect while I'm idling or while I'm actually trying to use the internet.

 

I have already swapped the cable. I also have a second machine with an identical card, running Windows, and it doesn't disconnect. Should I swap cards and see if the problem follows the card, or is there some way I can check diagnostics in Kubuntu? I'm pretty new at Linux networking, so please keep that in mind.

 

Heh, the irony is that the old Windows machine's onboard NIC used to do this, with a different cable, different switch, long ago...

"Don't fall down the hole!" ~James, 2022

 

"If you have a monitor, look at that monitor with your eyeballs." ~ Jake, 2022


5 hours ago, Sarra said:

or is there some way I can check diagnostics in Kubuntu?

Can you keep a terminal open running "sudo dmesg -wH" and leave it running until the issue occurs? Logs from the moment of the drop would be a great help.

This NIC seems to have an Aquantia chip on it, so you can filter the logs with "sudo dmesg | grep atlantic" to see only the driver-specific messages (atlantic is the name of the Linux driver for these chips). You could also try updating the firmware on your NIC.
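If the full kernel log is too noisy, something like the following could narrow it down to link events from the driver. This is only a sketch: the function name atlantic_link_events is made up for illustration, and it assumes the driver's messages contain the words "atlantic" and "link", which is a guess about the log wording.

```shell
# Hypothetical helper: reads kernel log text on stdin and keeps only the
# lines that mention both the atlantic driver and a link event.
# Assumption: the driver's link messages contain the word "link".
atlantic_link_events() {
    grep -i 'atlantic' | grep -i 'link'
}

# Example usage (needs root, so it is only shown as a comment):
#   sudo dmesg -T | atlantic_link_events
```

The -T flag makes dmesg print wall-clock timestamps, which makes it easier to match a log line to the moment the connection dropped.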

FX6300 @ 4.2GHz | Gigabyte GA-78LMT-USB3 R2 | Hyper 212x | 3x 8GB + 1x 4GB @ 1600MHz | Gigabyte 2060 Super | Corsair CX650M | LG 43UK6520PSA
ASUS X550LN | i5 4210u | 12GB
Lenovo N23 Yoga


I'd also advise installing lm_sensors (the "lm-sensors" package on Ubuntu-based distros), as it can monitor the temperature of the card. You then run "sensors" and it shows something like:

enp7s0-pci-0700
Adapter: PCI adapter
PHY Temperature:  +76.9°C  
MAC Temperature:  +73.8°C  

 

The Aquantia cards run toasty, and some people have reported that a badly installed heatsink causes overheating on some of them.
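To log just the two numbers rather than the whole report, a small filter over the "sensors" output might look like this. It is a sketch that assumes the labels are exactly "PHY Temperature" and "MAC Temperature", as in the output above; nic_temps is a made-up name.

```shell
# Hypothetical helper: reads `sensors` output on stdin and prints one line
# per NIC temperature, e.g. "PHY 76.9".
# Assumption: the labels match the sample output shown in this thread.
nic_temps() {
    awk '/^(PHY|MAC) Temperature:/ {
        t = $3                      # third field looks like "+76.9°C"
        gsub(/[^0-9.-]/, "", t)     # strip the sign, degree symbol and unit
        print $1, t
    }'
}

# Example usage:
#   sensors | nic_temps
```

For a live view of the full report, "watch -n 5 sensors" refreshes it every five seconds.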

 

Ironically, my ASUS XG-C100C v1 (Aquantia AQC107) card would crash Windows 11 networking entirely under load, but it works fine in Linux. I think Linux handles some of this card's odd quirks better:

[15136.441556] pcieport 0000:00:1c.4: AER: Corrected error message received from 0000:07:00.0
[15136.441587] atlantic 0000:07:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[15136.441594] atlantic 0000:07:00.0:   device [1d6a:07b1] error status/mask=00000001/0000a000
[15136.441601] atlantic 0000:07:00.0:    [ 0] RxErr                  (First)
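If you want to know how often these corrected errors show up, they can be tallied per device. A rough sketch, assuming the log wording matches the lines above (other kernel versions may phrase it differently); aer_corrected_counts is a hypothetical helper name.

```shell
# Hypothetical helper: reads kernel log text on stdin and prints
# "<pci-address> <count>" for every device that reported corrected AER errors.
# Assumption: the message wording is the same as in the log excerpt above.
aer_corrected_counts() {
    awk '/AER: Corrected error message received from/ { n[$NF]++ }
         END { for (d in n) print d, n[d] }' | sort
}

# Example usage (needs root, so it is only shown as a comment):
#   sudo dmesg | aer_corrected_counts
```

A count that climbs quickly on the NIC's address would be a reason to reseat the card or try another PCIe slot.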

 

There are also instructions out there for forcing the stock AQC107 firmware onto the card. It is usually newer than what the AIBs provide, and it made mine more reliable until a Windows 11 update broke everything.

Router:  Intel N100 (pfSense) WiFi6: Zyxel NWA210AX (1.7Gbit peak at 160Mhz)
WiFi5: Ubiquiti NanoHD OpenWRT (~500Mbit at 80Mhz) Switches: Netgear MS510TXUP, MS510TXPP, GS110EMX
ISPs: Zen Full Fibre 900 (~930Mbit down, 115Mbit up) + Three 5G (~800Mbit down, 115Mbit up)
Upgrading Laptop/Desktop CNVIo WiFi 5 cards to PCIe WiFi6e/7


7 hours ago, igormp said:

Can you keep a terminal open running "sudo dmesg -wH" and leave it running until the issue occurs? Logs from the moment of the drop would be a great help.

This NIC seems to have an Aquantia chip on it, so you can filter the logs with "sudo dmesg | grep atlantic" to see only the driver-specific messages. You could also try updating the firmware on your NIC.

All that it shows is the atlantic link changing from 10000 to 0 and back again. It also shows some failed audits involving a printer spooler, which might explain why it sometimes has problems printing (the Linux box is a print server).

 

6 hours ago, Alex Atkin UK said:

I'd also advise installing lm_sensors as this can monitor the temperature of the card.  You then run "sensors" and it shows something like:

enp7s0-pci-0700
Adapter: PCI adapter
PHY Temperature:  +76.9°C  
MAC Temperature:  +73.8°C  

 


Temps are lower than what you posted: 71°C and 68°C respectively.

 

I'm somewhat confused: my Windows machine shows a Marvell AQtion driver installed for it... I'm guessing Windows is using whatever driver it thinks will work, because Windows.

 

As much as I want to swap cards, the Windows machine is so old, with a failing BIOS chip, that it may not boot up again; it's a real struggle to get it running. D: I either need a new board for that machine, or a board and a CPU. The Linux machine is its replacement, but I need lots of SSDs first, and I have no job atm, so it's hard to justify spending money right now.


20 minutes ago, Sarra said:

All that it shows is the atlantic link changing from 10000 to 0 and back again.

Maybe try to update the firmware.
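Before updating, it is worth noting which firmware the card currently runs. "ethtool -i" reports it in a "firmware-version" line, and the sketch below just pulls that line out. The interface name enp7s0 is a placeholder (check "ip link" for yours), and firmware_version is a made-up helper name.

```shell
# Hypothetical helper: reads `ethtool -i <interface>` output on stdin and
# prints only the value of the "firmware-version" field.
firmware_version() {
    awk -F': ' '$1 == "firmware-version" { print $2 }'
}

# Example usage (enp7s0 is a placeholder interface name):
#   ethtool -i enp7s0 | firmware_version
```

Writing the version down first makes it easy to confirm afterwards that the update actually took.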


1 hour ago, igormp said:

Maybe try to update the firmware.

I went to check the cable, and as soon as I touched it where it plugs into the card, the connection dropped and the lights on the card went out. I think it's a bad cable or, more likely, a bad RJ45 jack on the card.
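A software cross-check for this kind of wiggle test: Linux keeps a per-interface counter of link up/down transitions in /sys/class/net/<interface>/carrier_changes. The sketch below prints how much that counter moved over a time window. The interface name, the carrier_delta name, and the SYSFS_NET override (there only so the function can be tested against a fake directory) are assumptions for illustration.

```shell
# Root of the per-interface sysfs tree; overridable only to make testing easy.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# Hypothetical helper: print how many link up/down transitions the given
# interface saw during the given number of seconds (default 30).
carrier_delta() {
    local iface seconds before after
    iface=$1
    seconds=${2:-30}
    # Read the counter, wait, then read it again; fail if it is unreadable.
    before=$(cat "$SYSFS_NET/$iface/carrier_changes") || return 1
    sleep "$seconds"
    after=$(cat "$SYSFS_NET/$iface/carrier_changes") || return 1
    echo $((after - before))
}

# Example usage (enp7s0 is a placeholder interface name):
#   carrier_delta enp7s0 60
```

Start it, touch the cable during the window, and a non-zero result means the link really did drop rather than the lights just flickering.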

 

I'll keep testing. I can't find my short Cat6A cables right now; I know I have several more of them, but I have no idea where they are. I'll try swapping the cable again, and I'll grab a magnifier to see if I can spot a problem with the jack itself.


Just to completely finish this thread: I pulled a hair out of the RJ45 jack on the NIC, and I haven't had a single network drop since.

 

Thanks for the help, I did learn some more about Linux from this, so thank you again. :3
