PFsense nightmare

I've recently put together a pfsense router, repurposing a Fujitsu Primergy something something server. It's a Bulldozer-era Opteron server with some DDR3 RAM, an SSD, and an add-on 4-port gigabit NIC. Before PFsense, this PC did nothing for months, and before that it ran Windows Server 2012 R2 I believe, non-stop for 6 months or whatever the trial period was (probably more, since it doesn't shut down automatically when the trial expires :P). No issues.

 

Some 10 days ago, I installed PFsense on it: other than next-next-next, set WAN port, set LAN port, set the remaining ports, I did nothing to it. Hooked WAN to the ISP router, everything else to ethernet jacks around my house, and boom, everything seemed to be working. I should stress at this point (after many Google dead ends :P) that I have not installed any add-on, plug-in, you name it. It's pure download, Rufus to USB, install, plug cables.

 

The problems started a few days later: every now and then (and it can be zero times for hours, and then 6 times in 20 minutes) I would lose connectivity on my ethernet connections. Internet is working (wifi is still going through the ISP thing), but I don't have internet, nor access to the PFsense interface, on ethernet. If I just wait, the connection comes back, both to the PFsense and to the internet (didn't time it, but it takes anywhere from a few seconds to a minute). Once it's back, it can work just fine or do the same thing 2 minutes later.

I'd love to start digging more into how to configure PFsense and test what I can do with it, but so far I can't get past the most basic: working stably. It's driving me crazy and I don't know how to diagnose it, as I can't reproduce it on purpose, and I don't do anything to fix it. It just comes and goes at will.

 

Did anyone encounter similar issues? What could possibly be the reason?

In case it matters, my current layout is:

 

ISP modem

>>single ethernet>>

PFsense

>>LAGG (2 ethernet in LACP)>>

Cisco Switch from hell

>> 2 ethernet in LACP to main computer

>> single ethernet to another computer

 

The PFsense is also connected directly to two other ethernet jacks (no LAGG) elsewhere in the house, currently unused.

 

Before installing PFsense, the layout was:

 

ISP modem

>>single ethernet>>

Cisco Switch from hell

>> 2 ethernet in LACP to main computer

>> single ethernet to another computer

 

and was working with no issues.

 

Any suggestions? o.O


2 hours ago, SpaceGhostC2C said:

Cisco Switch from hell

What is this, exactly, and what does your LACP configuration look like?

PC : 3600 · Crosshair VI WiFi · 2x16GB RGB 3200 · 1080Ti SC2 · 1TB WD SN750 · EVGA 1600G2 · Define C 


2 hours ago, beersykins said:

What is this, exactly, and what does your LACP configuration look like?

Catalyst Express 500G, the 500G-12TC version I believe (8+4 ports configuration).

 

Switch side, the relevant "Etherchannel" configuration looks like this:

 

[Screenshot: Catalyst Express 500 EtherChannel configuration]

(The "critical error" is the switch not always detecting the fan spinning, despite the fan always spinning).

 

On the router's side, it looks like:

[Screenshot: PFsense-side configuration]

 

All IPs are set statically. The LAGG is set as the "LAN" connection. Since my post, I have made some changes: I wanted to see if connections not going through the switch were also down, and I found out DHCP was not working at all for them (configured as "OPT1" and "OPT2"). I have since disabled DHCP in LAN (all IPs downstream will be set statically anyway) and enabled it in the OPT ones. They are currently getting valid IPs, but DNS is still broken, whether I try Resolver or Forwarder services (changing between them does have consequences on which static DNS I have to set on the computers connected to the switch, but those not going through the switch don't get working DNS addresses through DHCP regardless of what I set).

This is all beside the original problem, maybe? I haven't had a new episode since the last (DHCP-related) changes, but I haven't used the computers for long either.


5 hours ago, SpaceGhostC2C said:

The problems started a few days later: every now and then (and it can be zero times for hours, and then 6 times in 20 minutes) I would lose connectivity on my ethernet connections. Internet is working (wifi is still going through the ISP thing), but I don't have internet, nor access to the PFsense interface, on ethernet. If I just wait, the connection comes back, both to the PFsense and to the internet (didn't time it, but it takes anywhere from a few seconds to a minute). Once it's back, it can work just fine or do the same thing 2 minutes later.

I'd love to start digging more into how to configure PFsense and test what I can do with it, but so far I can't get past the most basic: working stably. It's driving me crazy and I don't know how to diagnose it, as I can't reproduce it on purpose, and I don't do anything to fix it. It just comes and goes at will.

 

Did anyone encounter similar issues? What could possibly be the reason?

Do you happen to be using a Realtek NIC?

Hand, n. A singular instrument worn at the end of the human arm and commonly thrust into somebody’s pocket.


4 minutes ago, WereCatf said:

Do you happen to be using a Realtek NIC?

I'll try to find out. The added card is Fujitsu branded, but the controller itself was obviously made by Intel/Broadcom/Realtek. Also need to check on the onboard one (that goes to the ISP modem).


2 minutes ago, SpaceGhostC2C said:

I'll try to find out. The added card is Fujitsu branded, but the controller itself was obviously made by Intel/Broadcom/Realtek. Also need to check on the onboard one (that goes to the ISP modem).

I'm asking, because what you're describing sounds kinda similar to an issue I had with Pfsense and Realtek NIC.



7 minutes ago, WereCatf said:

I'm asking, because what you're describing sounds kinda similar to an issue I had with Pfsense and Realtek NIC.

OK, found the add-in card, it seems to be an Intel I350-AM4 Gigabit Ethernet Controller

https://sp.ts.fujitsu.com/dmsp/Publications/public/ds-eth-ctrl-4x1Gbit-PCIe-x4-D3045-Cu.pdf

 

although I got it under the D2745-a11 part number.

 

Regardless, I hope your story had a happy ending just in case :P 


Just now, SpaceGhostC2C said:

Regardless, I hope your story had a happy ending just in case :P 

Not really. I found an alternative driver for the Realtek NICs that I am using, but it's just a stopgap-measure. Gonna have to replace my entire Pfsense-box with something else, because it's one of those tiny SBCs without any PCI-E-slot or anything that I could use to put in Intel NICs. *groan*

2 minutes ago, SpaceGhostC2C said:

OK, found the add-in card, it seems to be an Intel I350-AM4 Gigabit Ethernet Controller

https://sp.ts.fujitsu.com/dmsp/Publications/public/ds-eth-ctrl-4x1Gbit-PCIe-x4-D3045-Cu.pdf

Log into Pfsense's command-line and check what dmesg says, when you're having connection-issues. dmesg is a tool that outputs the kernel's logs, so the last few lines during one of these connection-breakages might give a hint.



11 minutes ago, WereCatf said:

Not really. I found an alternative driver for the Realtek NICs that I am using, but it's just a stopgap-measure. Gonna have to replace my entire Pfsense-box with something else, because it's one of those tiny SBCs without any PCI-E-slot or anything that I could use to put in Intel NICs. *groan*

Log into Pfsense's command-line and check what dmesg says, when you're having connection-issues. dmesg is a tool that outputs the kernel's logs, so the last few lines during one of these connection-breakages might give a hint.

 

Well, I just went to try the command and the last few lines of the output look like:

lagg0: IPv6 addresses on igb2 have been removed before adding it as a member to prevent IPv6 address scope violation.
lagg0: link state changed to DOWN
lagg0: IPv6 addresses on igb3 have been removed before adding it as a member to prevent IPv6 address scope violation.
bge0: link state changed to DOWN
pflog0: promiscuous mode enabled
igb2: link state changed to UP
lagg0: link state changed to UP
bge0: link state changed to UP
igb3: link state changed to UP
igb1: link state changed to UP
igb2: Interface stopped DISTRIBUTING, possible flapping
igb3: Interface stopped DISTRIBUTING, possible flapping
igb1: link state changed to DOWN
igb1: link state changed to UP
igb3: Interface stopped DISTRIBUTING, possible flapping
igb2: Interface stopped DISTRIBUTING, possible flapping
igb3: Interface stopped DISTRIBUTING, possible flapping
igb2: Interface stopped DISTRIBUTING, possible flapping
igb3: Interface stopped DISTRIBUTING, possible flapping
igb2: Interface stopped DISTRIBUTING, possible flapping
igb3: Interface stopped DISTRIBUTING, possible flapping
igb2: Interface stopped DISTRIBUTING, possible flapping
igb1: link state changed to DOWN
igb1: link state changed to UP

 

igb2 and igb3 are the LAGG members. The lagg0 itself doesn't go down, but... 🤔

No recent incidents, though.
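For anyone who wants to tally these messages rather than eyeball them, here is a minimal Python sketch that counts link-state changes and "stopped DISTRIBUTING" messages per interface. It assumes the log lines have exactly the shape quoted above; the `summarize` function and the regular expressions are made up for illustration and are not part of pfSense or FreeBSD. The sample is a subset of the quoted output.

```python
import re
from collections import Counter

# A few lines copied from the dmesg output quoted in this thread.
SAMPLE = """\
igb2: link state changed to UP
lagg0: link state changed to UP
igb2: Interface stopped DISTRIBUTING, possible flapping
igb3: Interface stopped DISTRIBUTING, possible flapping
igb1: link state changed to DOWN
igb1: link state changed to UP
"""

# Assumed line shapes, based only on the excerpt above.
LINK_RE = re.compile(r"^(\w+): link state changed to (UP|DOWN)$")
FLAP_RE = re.compile(r"^(\w+): Interface stopped DISTRIBUTING, possible flapping$")


def summarize(log_text):
    """Return (link_changes, flaps) counters keyed by interface name."""
    link_changes = Counter()  # keys are (interface, "UP" or "DOWN")
    flaps = Counter()         # keys are interface names
    for raw in log_text.splitlines():
        line = raw.strip()
        match = LINK_RE.match(line)
        if match:
            link_changes[(match.group(1), match.group(2))] += 1
            continue
        match = FLAP_RE.match(line)
        if match:
            flaps[match.group(1)] += 1
    return link_changes, flaps


links, flaps = summarize(SAMPLE)
for iface, count in sorted(flaps.items()):
    print(f"{iface}: {count} 'stopped DISTRIBUTING' message(s)")
for (iface, state), count in sorted(links.items()):
    print(f"{iface}: {count} change(s) to {state}")
```

Feeding it the saved output of `dmesg` after an episode would show whether the messages pile up on the LAGG members only, which is what the excerpt above suggests.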

 

 

Meanwhile, Broadcom + Intel confirmed through pciconf -lv

igb0@pci0:2:0:0:	class=0x020000 card=0x11a81734 chip=0x150e8086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82580 Gigabit Network Connection'
    class      = network
    subclass   = ethernet
igb1@pci0:2:0:1:	class=0x020000 card=0x11a81734 chip=0x150e8086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82580 Gigabit Network Connection'
    class      = network
    subclass   = ethernet
igb2@pci0:2:0:2:	class=0x020000 card=0x11a81734 chip=0x150e8086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82580 Gigabit Network Connection'
    class      = network
    subclass   = ethernet
igb3@pci0:2:0:3:	class=0x020000 card=0x11a81734 chip=0x150e8086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82580 Gigabit Network Connection'
    class      = network
    subclass   = ethernet
bge0@pci0:3:0:0:	class=0x020000 card=0x11aa1734 chip=0x169214e4 rev=0x01 hdr=0x00
    vendor     = 'Broadcom Limited'
    device     = 'NetLink BCM57780 Gigabit Ethernet PCIe'
    class      = network
    subclass   = ethernet
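As a rough illustration of reading this output programmatically, the Python sketch below parses text in the `pciconf -lv` layout shown above into a dictionary per interface. It assumes the layout is exactly as quoted (a header line followed by indented `key = value` lines); `parse_pciconf` and `split_chip_id` are hypothetical helper names. The `chip` field packs the PCI device ID in the upper 16 bits and the vendor ID in the lower 16 bits.

```python
import re

# Two entries copied from the pciconf -lv output quoted above.
SAMPLE = """\
igb0@pci0:2:0:0:\tclass=0x020000 card=0x11a81734 chip=0x150e8086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82580 Gigabit Network Connection'
    class      = network
    subclass   = ethernet
bge0@pci0:3:0:0:\tclass=0x020000 card=0x11aa1734 chip=0x169214e4 rev=0x01 hdr=0x00
    vendor     = 'Broadcom Limited'
    device     = 'NetLink BCM57780 Gigabit Ethernet PCIe'
    class      = network
    subclass   = ethernet
"""

# Header line: "<name>@pci<selector>  ... chip=0x........ ..."
HEADER_RE = re.compile(r"^(\w+)@pci\S*\s+.*\bchip=(0x[0-9a-fA-F]+)")
# Detail line: indented "key = value", value optionally single-quoted.
FIELD_RE = re.compile(r"^\s+(\w+)\s*=\s*'?(.*?)'?\s*$")


def parse_pciconf(text):
    """Map each interface name to its chip ID and descriptive fields."""
    devices = {}
    current = None
    for line in text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            current = {"chip": header.group(2)}
            devices[header.group(1)] = current
            continue
        field = FIELD_RE.match(line)
        if field and current is not None:
            current[field.group(1)] = field.group(2)
    return devices


def split_chip_id(chip):
    """Split a chip value such as '0x150e8086' into (vendor_id, device_id)."""
    value = int(chip, 16)
    return value & 0xFFFF, value >> 16


for name, info in parse_pciconf(SAMPLE).items():
    vendor_id, device_id = split_chip_id(info["chip"])
    print(f"{name}: {info['vendor']} / {info['device']} "
          f"(vendor 0x{vendor_id:04x}, device 0x{device_id:04x})")
```

Worth noting when reading the result: the driver reports the add-in card's ports as 82580, not the I350 named in the datasheet linked earlier, which may simply reflect the different part number.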

 

 


Just now, SpaceGhostC2C said:

Well, I just went to try the command and the last few lines of the output look like

Interestingly, that's the same thing I was seeing with my Realtek NIC; the driver wasn't handling things properly and the NIC kept crashing. I know Realtek NICs have issues with Pfsense, but I was under the impression that Intel NICs are fine.

 

I don't really have any good advice on what to do now, other than maybe getting some different Intel NICs on eBay or whatever and seeing if they work better?



2 minutes ago, WereCatf said:

I don't really have any good advice on what to do now, other than maybe getting some different Intel NICs on eBay or whatever and seeing if they work better?

Sadly, I bought these things in the happy days before moving from an ebay-intensive country to a non-ebay country... :( 

There may still be something in the local used market. In the meantime, though, at least I have a lead to follow.

Or I can work on DHCP handing out the correct DNS... 9_9

 

Thanks for the help!


Can you try without the LAGG?  Do you even need it?

Router:  Intel N100 (pfSense) WiFi6: Zyxel NWA210AX (1.7Gbit peak at 160Mhz)
WiFi5: Ubiquiti NanoHD OpenWRT (~500Mbit at 80Mhz) Switches: Netgear MS510TXUP, MS510TXPP, GS110EMX
ISPs: Zen Full Fibre 900 (~930Mbit down, 115Mbit up) + Three 5G (~800Mbit down, 115Mbit up)
Upgrading Laptop/Desktop CNVIo WiFi 5 cards to PCIe WiFi6e/7


6 hours ago, Alex Atkin UK said:

Can you try without the LAGG?  Do you even need it?

To be honest, I wouldn't even have anything other than a cable from my PC to the ISP's modem if I stuck to what I need :P This whole project is about learning and doing as much as I can, though.

 

I can try without it as a way to diagnose the issue. I left a laptop plugged into one of the ports outside the LAGG>Switch path (so just PFsense -> Laptop), to check if that connection is up when the LAGG+Switch one falls. It hasn't happened again since I deactivated DHCP on it, but I won't claim victory yet as I see no link between what I did and the issue (plus 24 hours isn't that long to test).

 

If/when it happens again, I'll let you know how it goes!


If you continue to have issues, it sounds odd but I'd recommend putting in a different SSD/boot media, reinstalling pfSense and restoring your config. I've seen weird things happen with pfSense if the boot disk isn't rock solid, especially relating to the Web UI and services shutting down and restarting some time later.


  • 2 weeks later...

Small update: I haven't experienced (fingers crossed) random episodes of losing connection for a few days, but I have detected some consistent behavior related to the same problem:

  • When starting up the PC connected with LACP to the switch, I quickly get access to the switch, but access all the way to the PFSense box takes considerably longer (before using PFSense, it did take longer than normal to obtain full internet access, but considerably less than it takes now)
  • If the second PC connected (single cable) to the switch is then switched on, I lose connectivity to the PFSense router again: basically, the link between the switch and the router breaks down, so neither PC on "this" side of the switch has access to the internet or the router itself, but they do have access to the switch web interface and they can see each other through it.

In all cases, just by waiting the link router-switch comes back up and we all have access everywhere, but it would seem that every time a PC connects to the switch and seeks a path to the outside through the router, the router drops the link to the switch for a couple minutes. I haven't tried yet if I can reproduce this also by starting up a computer connected directly to the router (as tests that ruin connections are better left for when I'm not working :P).
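One way to pin down how long these drops last, and to line them up against the system log, is to ping the router at a fixed interval and record when it stops and starts answering. The Python sketch below is only an illustration under stated assumptions: `probe`, `monitor` and `outage_windows` are hypothetical helpers, the router address in the usage comment is a placeholder, and the only ping option used is the packet count (`-n` on Windows, `-c` elsewhere).

```python
import platform
import subprocess
import time
from datetime import datetime


def probe(host, timeout_s=5):
    """Send one ping and report whether the host answered."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def monitor(host, interval_s, rounds):
    """Collect (timestamp, reachable) samples, one per interval."""
    samples = []
    for _ in range(rounds):
        samples.append((datetime.now(), probe(host)))
        time.sleep(interval_s)
    return samples


def outage_windows(samples):
    """Turn samples into (start, end) outage windows.

    start is the first failed sample of an outage; end is the first
    successful sample after it, or None if the run ended while down.
    """
    windows = []
    down_since = None
    for timestamp, reachable in samples:
        if not reachable and down_since is None:
            down_since = timestamp
        elif reachable and down_since is not None:
            windows.append((down_since, timestamp))
            down_since = None
    if down_since is not None:
        windows.append((down_since, None))
    return windows


# Example usage (placeholder address, replace with the router's LAN IP):
# for start, end in outage_windows(monitor("192.168.1.1", 2, 900)):
#     print("down from", start, "until", end)
```

Running it from a PC behind the switch and from one plugged directly into the router at the same time would show whether only the LAGG path drops.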

 

System log's only new entries after the episode are the infamous

 

igb3: Interface stopped DISTRIBUTING, possible flapping
igb2: Interface stopped DISTRIBUTING, possible flapping

 
