First of all, the problem is taken care of, so that part isn’t important. The thing is, I’m pretty good on the whole R&S thing, but I can’t come up with a good explanation as to why this happened the way it did. So, looking for suggestions.
The setup:
A number of devices on a basic class C (/24) subnet. Devices include .1, .6, .7, .8, .9, and .18. The gateway is .254. All devices are on the same switch.
Now, I ping .18 from outside the subnet and my ping times are <1ms. I ping .254 and my ping times are <1ms. I ping anything else, and my ping times vary from 40ms to 500ms to timeouts.
From inside the subnet, all the ping times are <1ms.
As part of “I don’t know what the hell is going on so let’s see if we can figure out all the weirdness,” I disabled the port going to the device at .1. And here’s where it got interesting: from outside the subnet I was no longer able to ping .1 (obviously), nor could I ping .6, .7, .8, or .9. I could still ping .18 and .254.
Now, the actual problem was that some fool (not me! yay!) configured the .6, .7, .8, and .9 devices with .1 as their gateway. Once those were corrected to .254 as the gateway everything was happy and good all around. So, problem solved.
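(For what it's worth, here's roughly how I'd double-check each host's default route after a fix like that. This assumes Linux boxes with the iproute2 "ip" tool, which may or may not match what these devices actually run.)

```python
# Hedged sketch: print the host's current default route (Linux + iproute2 only).
import subprocess

out = subprocess.run(["ip", "route", "show", "default"],
                     capture_output=True, text=True, check=True)
print(out.stdout.strip())   # want to see: "default via <our subnet>.254 dev ..."
```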
The layer two stuff, sure, that all makes sense. Pings between hosts inside the subnet never needed to leave the switch, so all of that behaved as expected; the gateway is irrelevant for that traffic.
But, here’s the thing that I can’t figure out. How the hell were devices responding to pings from off the subnet when they had the incorrect default gateway? They should have received the ping, seen that the source was off their subnet, ARP’d the (wrong) gateway, and sent the echo reply to .1. Something like the sketch below, as I understand it.
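This is only a toy illustration of that reply-path decision; the prefix and the off-subnet address are placeholders (I'm only giving host octets above), not the real ones.

```python
from ipaddress import ip_address, ip_network

# Placeholder addressing: 192.0.2.0/24 stands in for our class C,
# .1 is the misconfigured gateway, 198.51.100.7 is some off-subnet pinger.
LOCAL_NET = ip_network("192.0.2.0/24")
WRONG_GATEWAY = ip_address("192.0.2.1")

def next_hop_for_reply(source_ip: str):
    """Which address does the host ARP for when it sends the echo reply?"""
    src = ip_address(source_ip)
    if src in LOCAL_NET:
        return src            # on-subnet source: reply directly, gateway never involved
    return WRONG_GATEWAY      # off-subnet source: hand the reply to the (wrong) gateway

print(next_hop_for_reply("192.0.2.18"))    # .18 - stays on the switch, works fine
print(next_hop_for_reply("198.51.100.7"))  # .1  - reply goes to .1... and then what?
```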
And then… as far as I know, the .1 device should have just dropped that ICMP packet. Which obviously didn’t happen.
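The only explanation I can come up with is that .1 wasn't actually dropping them, so one thing I still want to check (assuming the .1 box is a Linux machine, which I'm not sure of) is whether it has IP forwarding turned on, which would make it route those replies onward instead of eating them. Rough sketch, run on the .1 device itself:

```python
# Rough check, only valid if the .1 device is a Linux box.
from pathlib import Path

flag = Path("/proc/sys/net/ipv4/ip_forward").read_text().strip()
if flag == "1":
    print("forwarding ON  - .1 would route the replies onward, not drop them")
else:
    print("forwarding OFF - replies sent to .1 really should have been dropped")
```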
Ideas?
-Joe