Fool me once, shame on you. Fool me twice, and I will write a blog post to remember.
For a few days I've been experiencing intermittent networking problems. Every couple of minutes, the iSCSI and FCoE connections seemed to break for a brief moment. This made the filesystem quite unhappy. I added another path to the iSCSI target, over legacy IPv4, but it didn't help. And because all paths were disappearing at the same moment, the multipath device was failing, too.
So, iSCSI-over-IPv6, iSCSI-over-IPv4 and FCoE were crumbling. Clearly, the network was at fault (as it always is!). Then it hit me. I had seen a bug like this before. It manifested a bit differently because of a different driver (r8169 vs tg3), but even without hints in dmesg I recognized the problem.
You see, an Ethernet port at gigabit speed is quite power-hungry. Limiting the speed to 100 Mbps can reduce power draw, especially with multi-port and multi-gigabit NICs. The tuned utility takes advantage of this fact. If you put tuned into the powersave profile, like my misbehaving station was, and you have dynamic_tuning = 1 in the configuration file…
During idle periods, tuned dropped my NIC to 100 Mbps. When bandwidth usage rose, tuned flipped the network card back to gigabit speed. The brief moment of layer-1 renegotiation was enough to disturb the connection to the storage target.
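If you want to catch this kind of speed flipping in the act, a small watcher can help. The following is only a sketch, not something from my original debugging session: it assumes a Linux box where the negotiated speed is exposed in /sys/class/net/<iface>/speed, and the function names (read_link_speed, find_speed_changes, watch) are made up for illustration.

```python
import time
from pathlib import Path


def read_link_speed(iface, sysfs_root="/sys/class/net"):
    """Return the negotiated link speed in Mb/s, or None if it is unknown.

    Reading the sysfs file can fail (for example while the link is down),
    and some drivers report -1 for "unknown"; both cases map to None.
    """
    try:
        value = int((Path(sysfs_root) / iface / "speed").read_text().strip())
    except (OSError, ValueError):
        return None
    return value if value > 0 else None


def find_speed_changes(samples):
    """Given (timestamp, speed) pairs, return (timestamp, old, new) per change."""
    changes = []
    previous = None
    first = True
    for stamp, speed in samples:
        if not first and speed != previous:
            changes.append((stamp, previous, speed))
        previous = speed
        first = False
    return changes


def watch(iface, interval=1.0):
    """Poll the interface forever and print every speed transition."""
    previous = read_link_speed(iface)
    print(f"{iface}: initial speed {previous}")
    while True:
        time.sleep(interval)
        current = read_link_speed(iface)
        if current != previous:
            stamp = time.strftime("%H:%M:%S")
            print(f"{stamp} {iface}: {previous} -> {current}")
            previous = current
```

Calling watch("eth0") (the interface name is a placeholder) and leaving it running should print a line every time the link moves between gigabit, 100 Mbps and "unknown". If those lines coincide with the storage hiccups, you have your culprit.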
Disabling dynamic tuning instantly restored the network to a rock-solid state.
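To check whether a machine is exposed to this, you can look at the dynamic_tuning setting in tuned's main configuration file. The sketch below assumes the file uses plain "key = value" lines with "#" comments and that "1" (or "true") means enabled; the helper name dynamic_tuning_enabled is hypothetical, and the parsing is deliberately simplistic.

```python
def dynamic_tuning_enabled(config_text):
    """Return True or False for the dynamic_tuning setting, None if it is not set.

    Assumes simple "key = value" lines; anything after "#" is a comment.
    If the key appears more than once, the last occurrence wins.
    """
    result = None
    for raw_line in config_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and padding
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "dynamic_tuning":
            result = value.lower() in ("1", "true")
    return result
```

On the systems I know, the file lives at /etc/tuned/tuned-main.conf, so you would feed the function the contents of that file; treat the path as an assumption and check your distribution. If it returns True, setting dynamic_tuning = 0 and restarting the tuned service should stop the speed flipping.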