
PFSense VLAN Bizarre Issue (ARP and DHCP but nothing else)

brwainer

This post is meant to share a bizarre issue that we haven’t encountered before, for the amusement of anyone who is like me and enjoys such things. If anyone has knowledgeable suggestions I’m all ears, but we’ve already written this issue off as not worth spending additional time on.

The firewall in question is an existing install with 579 days of uptime. At the beginning of this tale it was on 2.3.2, and during troubleshooting it was upgraded to 2.4.4_1. The hardware is a Supermicro C2758 with 4 NICs; I don’t know more than that right now. PFSense is installed on bare metal.

On to the issue: we had reason to add new VLANs to the firewall’s LAN port. We created the VLANs, assigned interfaces and static IPs, and created basic “allow all from XXX net” firewall rules on each one. And then we tried to reach things in the subnets... fail. VLANs tagged properly on switches? Check. Devices actually on the IPs we think they’re on? Check: things within these subnets/VLANs can ping each other, but not the firewall, and the firewall can’t ping them. What about ARP? That is working. DHCP? Clients in the VLANs can get IPs from the DHCP server on PFSense. ARP and DHCP but nothing else usually means a firewall rule issue, right? We remade the firewall rules (and made sure everything was applied): no dice. Changed the firewall rules to “Allow all from any”: no dice. Disabled the firewall completely with “pfctl -d”: still can’t get anything through.

The existing VLANs did and do work fine, but all of the new ones we create have the issue above. Our only next step is to back up, wipe, and reinstall, but sadly there isn’t the budget or time allotted for that, as this isn’t a local system for us. Our decision is that this one is just stuck with the VLANs it has until the customer wants to upgrade their whole network, which would include a new firewall. They’ll have to eventually, because this is a Starwood hotel and PFSense (even Netgate appliances) isn’t Marriott approved.

Hopefully this issue provided some amusement and a puzzle to think about.

Looking to buy GTX690, other multi-GPU cards, or single-slot graphics cards: 

 


Try turning it off and on again ?

 

Sort of sounds firewally related, odd that DHCP would be working. Do the VLAN interface IPs ping locally on the firewall? Can other devices in different, existing VLANs, ping the newly created VLAN interface IPs?


5 hours ago, leadeater said:

Try turning it off and on again ?

 

Sort of sounds firewally related, odd that DHCP would be working. Do the VLAN interface IPs ping locally on the firewall? Can other devices in different, existing VLANs, ping the newly created VLAN interface IPs?

Under normal circumstances, if you create a new interface (VLAN or regular) without defining any firewall rules, then DHCP will work - just one of the quirks of PFSense. Your other questions are good ones and I’ll test today.

Looking to buy GTX690, other multi-GPU cards, or single-slot graphics cards: 

 


9 hours ago, leadeater said:

Do the VLAN interface IPs ping locally on the firewall? Can other devices in different, existing VLANs, ping the newly created VLAN interface IPs?

Yes, and Yes.

Looking to buy GTX690, other multi-GPU cards, or single-slot graphics cards: 

 


Assuming you have created multiple VLAN interfaces on the same network adapter on the pfSense side, does DHCP addressing work on each of the respective interfaces, or are devices getting DHCP from the original default interface that existed prior to the changes?

 

Can you get a screenshot of the interfaces you have assigned and their respective firewall rules per interface? (Remove any info you don't want made public.)

Please quote or tag me if you need a reply


4 hours ago, Falconevo said:

Assuming you have created multiple VLAN interfaces on the same network adapter on the pfSense side, does DHCP addressing work on each of the respective interfaces, or are devices getting DHCP from the original default interface that existed prior to the changes?

 

Can you get a screenshot of the interfaces you have assigned and their respective firewall rules per interface? (Remove any info you don't want made public.)

Devices in the new VLANs get IPs from the proper DHCP server for that VLAN/subnet, not from the original default one. Their MACs show up in the ARP table on the firewall with the assigned IPs, along with an ARP entry for the firewall's own local IP for each VLAN. The firewall also responds to ARP requests from the clients in the VLAN.

 

I'm not going to collect screenshots, but I ask that you trust me based on my prior postings here that I know a thing or two about setting up a new VLAN in PFSense. We've already removed the new VLANs from the system and there isn't any justification for more company time to be spent on this, as it has been looked at by many engineers including one from a partner company that exclusively uses PFSense.

Looking to buy GTX690, other multi-GPU cards, or single-slot graphics cards: 

 


So I see a lot of talk on creating VLANs on the firewall but what about on the switches coming off the firewall? Are they access or trunks?


45 minutes ago, mynameisjuan said:

So I see a lot of talk on creating VLANs on the firewall but what about on the switches coming off the firewall? Are they access or trunks?

They are all HP switches, so there isn’t such a thing as access or trunk ports; each port is simply tagged or untagged in each VLAN. Every new VLAN was created on the switches before doing so on the firewall. We copied the config of a VLAN similar to what the new one should be. On the ports between switches and to the PFSense’s LAN, every VLAN is tagged. We know that the VLAN config on the switches must be OK because we are getting ARP through.

Looking to buy GTX690, other multi-GPU cards, or single-slot graphics cards: 

 


18 minutes ago, brwainer said:

We know that the VLAN config on the switches must be OK because we are getting ARP through.

Not necessarily. 

 

If you cannot ping the VLAN interface on the switch from the VLAN on the firewall, then there is possibly a VLAN issue somewhere. I have little experience with HP switches, but I do know their tagging is set up oddly, with explicit tagging per VLAN on a per-port basis.

 

-Can the two VLAN interfaces ping each other?

-Can you try pinging the VLAN interface on the firewall from a PC in the same VLAN and with wireshark see if you are getting an ARP response?


6 hours ago, mynameisjuan said:

Not necessarily. 

 

If you cannot ping the VLAN interface on the switch from the VLAN on the firewall, then there is possibly a VLAN issue somewhere. I have little experience with HP switches, but I do know their tagging is set up oddly, with explicit tagging per VLAN on a per-port basis.

 

-Can the two VLAN interfaces ping each other?

-Can you try pinging the VLAN interface on the firewall from a PC in the same VLAN and with wireshark see if you are getting an ARP response?

Others who troubleshot this issue and I have extensive HP experience, and we know the VLANs were correct on the switches. And how would you explain the firewall knowing (having an entry in its ARP table) a MAC for the IP we placed on the switch by static assignment (not DHCP), and vice versa with the switch learning the router’s MAC, other than that ARP packets are getting through? This is not an IPv6 network, so there isn’t another protocol at work. We also saw this behavior with devices connected downstream of the switches, like a wireless controller.

 

Two devices other than the firewall in a new VLAN can ping each other. Pings to or from the firewall on a new VLAN fail. We don’t have the facility here to do packet capture from the non-firewall endpoint (no remotely manageable client computer), but the logic above indicates that ARP responses are fine.

Looking to buy GTX690, other multi-GPU cards, or single-slot graphics cards: 

 


34 minutes ago, brwainer said:

And how would you explain the firewall knowing (having an entry in its ARP table) a MAC for the IP we placed on the switch by static assignment (not DHCP), and vice versa with the switch learning the router’s MAC, other than that ARP packets are getting through?

With improper or missing tagging, the firewall will still receive frames if it is allowed to accept them. Those frames are processed at layer 2 before the rules, then added to the ARP table at layer 3, THEN routed, and only then do the rules apply. Again, what you are seeing is usually the result of missing tagging.

 

34 minutes ago, brwainer said:

Two devices other than the firewall in a new VLAN can ping each other.

Well yeah, firewall is not involved. Same subnet.

 

34 minutes ago, brwainer said:

Pings to or from the firewall on a new VLAN fail.

This is where I am saying the VLAN issue lies

 

34 minutes ago, brwainer said:

but the logic above indicates that ARP responses are fine

But it doesn't. If you send an ARP request and the broadcast makes it to the firewall on the wrong VLAN, sure, you are going to get an ARP entry on the firewall, but that doesn't mean the device is getting an ARP response. All you need is ANY PC whatsoever on that NEW VLAN running Wireshark to see what traffic you are receiving. It would seriously tell you exactly what's going on.
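For anyone wondering what "tagged" looks like inside a captured frame, here is a minimal Python sketch of reading the 802.1Q VLAN ID from raw Ethernet bytes. The function name `vlan_id` and both sample frames (including the made-up source MAC) are hypothetical and built only for illustration; nothing here was captured from the network in this thread. Keep in mind that many NICs and drivers strip the tag before a capture tool sees it, so a capture on an untagged client port will normally show no tag at all.

```python
import struct

def vlan_id(frame):
    """Return the 802.1Q VLAN ID of an Ethernet frame, or None if untagged."""
    if len(frame) < 18:
        return None
    # Bytes 12-13 hold the TPID (0x8100 for 802.1Q), bytes 14-15 the TCI.
    tpid, tci = struct.unpack("!HH", frame[12:16])
    if tpid != 0x8100:
        return None
    # The VLAN ID is the low 12 bits of the TCI.
    return tci & 0x0FFF

# Hypothetical frames: broadcast destination and a made-up source MAC.
dst = bytes.fromhex("ffffffffffff")
src = bytes.fromhex("020000000001")
# Tagged: 802.1Q tag for VLAN 100 followed by EtherType 0x0806 (ARP).
tagged = dst + src + struct.pack("!HHH", 0x8100, 100, 0x0806)
# Untagged: EtherType 0x0806 directly after the MACs, padded with 4 bytes.
untagged = dst + src + struct.pack("!H", 0x0806) + bytes(4)

print(vlan_id(tagged))    # 100
print(vlan_id(untagged))  # None
```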


1 hour ago, mynameisjuan said:

With improper or missing tagging, the firewall will still receive frames if it is allowed to accept them. Those frames are processed at layer 2 before the rules, then added to the ARP table at layer 3, THEN routed, and only then do the rules apply. Again, what you are seeing is usually the result of missing tagging.

 

Well yeah, firewall is not involved. Same subnet.

 

This is where I am saying the VLAN issue lies

 

But it doesn't. If you send an ARP request and the broadcast makes it to the firewall on the wrong VLAN, sure, you are going to get an ARP entry on the firewall, but that doesn't mean the device is getting an ARP response. All you need is ANY PC whatsoever on that NEW VLAN running Wireshark to see what traffic you are receiving. It would seriously tell you exactly what's going on.

If ARP packets are getting from one VLAN to another we have a serious issue on our hands.... that’s exactly what a VLAN is supposed to prevent from happening. Since you don’t seem to believe me, here’s the relevant switch config (vlan 936 works, vlan 100 doesn’t):

int 25 name Switch02
int 26 name PFSense_LAN
vlan 100
   tag 25-26
vlan 936
   tag 25-26

 

This is about as basic of a situation as we can have - we manage some hotels with hundreds of different VLANs and subnets in place, and while human mistakes can happen, there isn’t anything wrong with the VLAN config on the switch.
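The "same tagging on both VLANs" claim can also be checked mechanically rather than by eye. Below is a minimal Python sketch that assumes the simplified config syntax pasted above (`vlan N` followed by `tag PORTS`); real HP config output looks different, and the helper names `expand_ports` and `tagged_ports` are made up for this example.

```python
CONFIG = """\
int 25 name Switch02
int 26 name PFSense_LAN
vlan 100
tag 25-26
vlan 936
tag 25-26
"""

def expand_ports(spec):
    """Expand a port spec such as '25-26' or '1,50' into a set of ints."""
    ports = set()
    for part in spec.split(","):
        if "-" in part:
            low, high = part.split("-")
            ports.update(range(int(low), int(high) + 1))
        else:
            ports.add(int(part))
    return ports

def tagged_ports(config):
    """Map each VLAN ID to the set of ports tagged in that VLAN."""
    result = {}
    current = None
    for line in config.splitlines():
        words = line.split()
        if not words:
            continue
        if words[0] == "vlan":
            current = int(words[1])
            result.setdefault(current, set())
        elif words[0] == "tag" and current is not None:
            result[current] |= expand_ports(words[1])
    return result

vlans = tagged_ports(CONFIG)
# The working VLAN (936) and the failing VLAN (100) should match.
print(vlans[100] == vlans[936])  # True
```

This only confirms that the two VLANs are tagged on the same ports in the pasted snippet; it says nothing about what the switch is actually doing on the wire.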

Looking to buy GTX690, other multi-GPU cards, or single-slot graphics cards: 

 


19 hours ago, brwainer said:

Devices in the new VLANs get IPs from the proper DHCP server for that VLAN/subnet, not from the original default one. Their MACs show up in the ARP table on the firewall with the assigned IPs, along with an ARP entry for the firewall's own local IP for each VLAN. The firewall also responds to ARP requests from the clients in the VLAN.

 

I'm not going to collect screenshots, but I ask that you trust me based on my prior postings here that I know a thing or two about setting up a new VLAN in PFSense. We've already removed the new VLANs from the system and there isn't any justification for more company time to be spent on this, as it has been looked at by many engineers including one from a partner company that exclusively uses PFSense.

I don't doubt you are familiar with VLANs. I had an issue some years ago where VLANs were not being passed through correctly between a Realtek interface and an HP ProCurve 2920 (I think). I ended up replacing the interface with an Intel 1G card, which used the em* driver, and the issue cleared immediately. I didn't even recreate the VLAN interfaces in pf; I simply reassigned them to the new controller and everything came up.

I was having similar issues, although back then on pf 2.2: unable to access the default gateway regardless of the NAT or firewall config used. As soon as the interface was replaced, the problem went away.

Also, I am assuming the config is like this:

WAN > PF > (trunk port with configured VLANs allowed) > HP switch > (switch port * VLAN 1, 2 or 3 etc. as an access port) > client

The port that pf is plugged into needs to be a trunk port with every VLAN you are attempting to pass in its trunk VLAN list.

Also check that the HP doesn't have the arp-protect feature enabled; that has caused me trouble in the past as well. I avoid HP switches these days, so I rarely come across these issues :D

Please quote or tag me if you need a reply


1 hour ago, Falconevo said:

I don't doubt you are familiar with VLANs. I had an issue some years ago where VLANs were not being passed through correctly between a Realtek interface and an HP ProCurve 2920 (I think). I ended up replacing the interface with an Intel 1G card, which used the em* driver, and the issue cleared immediately. I didn't even recreate the VLAN interfaces in pf; I simply reassigned them to the new controller and everything came up.

I was having similar issues, although back then on pf 2.2: unable to access the default gateway regardless of the NAT or firewall config used. As soon as the interface was replaced, the problem went away.

Also, I am assuming the config is like this:

WAN > PF > (trunk port with configured VLANs allowed) > HP switch > (switch port * VLAN 1, 2 or 3 etc. as an access port) > client

The port that pf is plugged into needs to be a trunk port with every VLAN you are attempting to pass in its trunk VLAN list.

Also check that the HP doesn't have the arp-protect feature enabled; that has caused me trouble in the past as well. I avoid HP switches these days, so I rarely come across these issues :D

I would hope that a Supermicro C2758 platform’s onboard NICs are all Intel, but I haven’t looked it up. But if it is a NIC issue, we are replacing it with a Cisco ASA or a Watchguard - which is happening regardless next time they have any budget for network work.

 

Yes, the config is basically what you have assumed. Yes, on the port going to the PFSense LAN all VLANs are tagged - the equivalent of a trunk port. The HP doesn’t have arp-protect, and during troubleshooting we also disabled some of the other security settings on it. We ended up with the barest config that still handled the VLANs that worked plus the ones we were testing.

Looking to buy GTX690, other multi-GPU cards, or single-slot graphics cards: 

 


36 minutes ago, brwainer said:

I would hope that a Supermicro C2758 platform’s onboard NICs are all Intel, but I haven’t looked it up. But if it is a NIC issue, we are replacing it with a Cisco ASA or a Watchguard - which is happening regardless next time they have any budget for network work.

  

Yes, the config is basically what you have assumed. Yes, on the port going to the PFSense LAN all VLANs are tagged - the equivalent of a trunk port. The HP doesn’t have arp-protect, and during troubleshooting we also disabled some of the other security settings on it. We ended up with the barest config that still handled the VLANs that worked plus the ones we were testing.

That SM board has the i354 network controller on it which had a load of problems in earlier versions of pfSense from the 2.1 era.  Does the interface use the ix driver or the igb driver on your installation?

 

I went through my emails and a KB I have updated sporadically over the years, and I have a command in there from an issue with the ix driver. It was so long ago that I cannot recall why it is in the pf debugging notes, but it relates to VLAN hardware filtering. It was just sitting on its own at the bottom of a KB I wrote.

 

ifconfig ix0 -vlanhwfilter

 

Give the option a try and see if the problem goes away. Replace ix0 with the interface that actually carries the VLANs, of course, as ix0 may be your WAN interface.

Please quote or tag me if you need a reply


7 hours ago, brwainer said:

If ARP packets are getting from one VLAN to another we have a serious issue on our hands.... that’s exactly what a VLAN is supposed to prevent from happening.

I know that; that's why I am saying it's a possible VLAN misconfiguration.

 

7 hours ago, brwainer said:

int 25 name Switch02
int 26 name PFSense_LAN
vlan 100
   tag 25-26
vlan 936
   tag 25-26

And what ports are the untagged ports? Which of those are you trying to ping from?


9 hours ago, mynameisjuan said:

I know that; that's why I am saying it's a possible VLAN misconfiguration.

 

And what ports are the untagged ports? Which of those are you trying to ping from?

We are testing from the switches themselves as well as a wireless controller.

 

Switch01:
int 25 name Switch02
int 26 name PFSense_LAN
vlan 100
   tag 25-26
   ip address 10.0.3.130 255.255.255.128
vlan 101
   tag 25-26
   ip address dhcp
vlan 936
   tag 25-26
   ip address 10.0.1.50 255.255.255.0

Switch02:
int 1 name WLC
int 50 name Switch01
vlan 100
   tag 50
   ip address 10.0.3.131 255.255.255.128
vlan 101
   tag 50
   untag 1
   no ip address
vlan 936
   tag 50
   ip address 10.0.1.51 255.255.255.0

Looking to buy GTX690, other multi-GPU cards, or single-slot graphics cards: 

 


20 minutes ago, brwainer said:

We are testing from the switches themselves as well as a wireless controller.

 

Switch01:

int 25 name Switch02
int 26 name PFSense_LAN
vlan 100
   tag 25-26
   ip address 10.0.3.130 255.255.255.128

 

So from switch 1, if you ping 10.0.3.129 with source 10.0.3.130, do you get any response at all? If not, how is VLAN 100 set up on PFSense?


32 minutes ago, mynameisjuan said:

So from switch 1, if you ping 10.0.3.129 with source 10.0.3.130, do you get any response at all? If not, how is VLAN 100 set up on PFSense?

Correct, there is no response at all. PFSense is configured with VLAN 100 as 10.0.3.129/25, and has a proper firewall rule on the VLAN (accept all from XXX net, also tried accept all from any). Pings from PFSense to either switch get no response, and pings from the switches to PFSense get no response. And yet PFSense has each switch’s MAC in its ARP table for 10.0.3.130 and 10.0.3.131, and they were dynamically learned. The switches likewise have learned the firewall’s MAC for 10.0.3.129.
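To rule out a simple addressing or netmask mismatch, the three addresses can be sanity-checked with Python's standard `ipaddress` module. This is only a sketch using the addresses and masks quoted in this thread; the variable names are arbitrary.

```python
import ipaddress

# Firewall VLAN 100 interface, as configured on PFSense.
firewall = ipaddress.ip_interface("10.0.3.129/25")

# Switch VLAN 100 addresses, written with the dotted netmask from the switch config.
switches = [
    ipaddress.ip_interface("10.0.3.130/255.255.255.128"),
    ipaddress.ip_interface("10.0.3.131/255.255.255.128"),
]

print(firewall.network)  # 10.0.3.128/25
for sw in switches:
    # Both checks should be True if everything sits in the same /25.
    print(sw.ip, sw.ip in firewall.network, sw.network == firewall.network)
```

All three addresses land in 10.0.3.128/25, so the failed pings are not explained by a subnet mismatch.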

Looking to buy GTX690, other multi-GPU cards, or single-slot graphics cards: 

 


15 hours ago, Falconevo said:

That SM board has the i354 network controller on it which had a load of problems in earlier versions of pfSense from the 2.1 era.  Does the interface use the ix driver or the igb driver on your installation?

 

I went through my emails and a KB I have updated sporadically over the years, and I have a command in there from an issue with the ix driver. It was so long ago that I cannot recall why it is in the pf debugging notes, but it relates to VLAN hardware filtering. It was just sitting on its own at the bottom of a KB I wrote.

 

ifconfig ix0 -vlanhwfilter

 

Give the option a try and see if the problem goes away. Replace ix0 with the interface that actually carries the VLANs, of course, as ix0 may be your WAN interface.

 

15 hours ago, Falconevo said:

Did a quick Google for the command I had, because I clearly didn't think it up myself in the KB, and came across the Netgate debugging page for the Intel interfaces:

 

https://www.netgate.com/docs/pfsense/hardware/tuning-and-troubleshooting-network-cards.html#Intel_ix.284.29_Cards


The system uses the igb drivers, not the ix drivers. TSO and LRO are disabled already. Thanks for mentioning that though, we didn’t know about the possible ix driver vlan issue.

Looking to buy GTX690, other multi-GPU cards, or single-slot graphics cards: 

 


1 hour ago, brwainer said:

Correct, there is no response at all. PFSense is configured with VLAN 100 as 10.0.3.129/25, and has a proper firewall rule on the VLAN (accept all from XXX net, also tried accept all from any). Pings from PFSense to either switch get no response, and pings from the switches to PFSense get no response. And yet PFSense has each switch’s MAC in its ARP table for 10.0.3.130 and 10.0.3.131, and they were dynamically learned. The switches likewise have learned the firewall’s MAC for 10.0.3.129.

hmmm.... with show mac-address vlan 100 on switch 1 does it show PFsense's MAC? 


4 hours ago, brwainer said:

 

The system uses the igb drivers, not the ix drivers. TSO and LRO are disabled already. Thanks for mentioning that though, we didn’t know about the possible ix driver vlan issue.

The igb driver uses it as well, but I certainly haven't seen the issue on that driver. You can check the supported interface options by running 'ifconfig' over SSH on the pfSense box.

This is output from one of mine using the igb driver, which shows that hardware VLAN tagging is active on that driver as well.

 

igb3: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=6503bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCS

 

Maybe test with the following;

 

ifconfig igb0 -vlanhwfilter -tso

 

Worth a shot :)
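If you want to compare the options before and after running that command, here is a minimal Python sketch that pulls the capability names out of saved `ifconfig` text. The function name `interface_options` is made up for this example, and the sample is the truncated output pasted above, so a flag missing from it proves nothing about the real interface.

```python
import re

# Truncated sample copied from the post above; the options list is cut off.
SAMPLE = (
    "igb3: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500\n"
    "        options=6503bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCS\n"
)

def interface_options(ifconfig_text):
    """Return the capability names listed on the first 'options=' line."""
    # Tolerate a missing closing '>' so truncated pastes still parse.
    match = re.search(r"options=[0-9a-fA-F]+<([^>\n]*)", ifconfig_text)
    if match is None:
        return []
    return [name for name in match.group(1).split(",") if name]

opts = interface_options(SAMPLE)
print("VLAN_HWTAGGING" in opts)  # True
print("VLAN_HWFILTER" in opts)   # False here, but only because the sample is cut off
```

Run it against the full output from the interface that carries the VLANs, once before and once after the change, to confirm the flag actually toggled.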

Please quote or tag me if you need a reply


This topic is now closed to further replies.
