LVS Latency Issues
Background and Symptoms
- Two setups, both RHEL 6.2 with the same kernel and the same version of the LVS stack: one, L3rq, comprising three Sun boxen (one director and two real servers); t'other, Incline, comprising three Dell boxes (one director and two real servers).
- Everything works fine on L3rq.
- On Incline SSH X11 tunnelling is awful: xterm starts slowly; emacs takes as long to start as MS Windows does to patch (ok, not that long. . . ).
- Not a bandwidth issue: scp of a large file back from the real nodes to my desktop is fine.
- Assume it is latency.
- The only difference is the hardware, and therefore the Ethernet drivers.
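The latency hypothesis can be tested rather than assumed. X11 over SSH is chatty: starting xterm or emacs involves many small request/reply exchanges, whereas an scp is essentially one bulk stream, so a path can show fine throughput and still feel sluggish interactively. Below is a minimal sketch in Python 3 that times many one-byte TCP round trips. The function names are hypothetical, and pointing `measure_rtts` at the real setup assumes some echo-style service reachable through the director, which the setup above is not stated to have; the demo only uses a throwaway local echo server on loopback.

```python
import socket
import statistics
import threading
import time


def start_local_echo_server():
    """Start a throwaway TCP echo server on 127.0.0.1 (demo only)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    srv.listen()

    def serve():
        while True:
            conn, _ = srv.accept()
            with conn:
                conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
                while True:
                    data = conn.recv(4096)
                    if not data:
                        break  # client closed the connection
                    conn.sendall(data)

    threading.Thread(target=serve, daemon=True).start()
    return srv.getsockname()  # (host, port)


def measure_rtts(host, port, count=100, payload=b"x"):
    """Return `count` round-trip times (seconds) for tiny echo exchanges."""
    rtts = []
    with socket.create_connection((host, port), timeout=5) as s:
        # Disable Nagle so each tiny write is sent immediately.
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for _ in range(count):
            start = time.perf_counter()
            s.sendall(payload)
            received = b""
            while len(received) < len(payload):
                chunk = s.recv(len(payload) - len(received))
                if not chunk:
                    raise ConnectionError("peer closed connection")
                received += chunk
            rtts.append(time.perf_counter() - start)
    return rtts


def summarize(rtts):
    """Median and worst-case round-trip time in milliseconds."""
    return {"median_ms": statistics.median(rtts) * 1e3, "max_ms": max(rtts) * 1e3}


if __name__ == "__main__":
    host, port = start_local_echo_server()
    print(summarize(measure_rtts(host, port, count=200)))
```

The useful comparison is L3rq against Incline over the same kind of path; the absolute numbers from the loopback demo say nothing about either NIC.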
Fix
- Sun: Nvidia NICs, forcedeth driver;
- Dell: Broadcom cards (groan), bnx2 driver.
- What can ethtool tell us? On Sun h/w:

      prompt> ethtool -k em1
      Offload parameters for eth3:
      rx-checksumming: on
      tx-checksumming: on
      scatter-gather: on
      tcp-segmentation-offload: on
      udp-fragmentation-offload: off
      generic-segmentation-offload: on
      generic-receive-offload: off
      large-receive-offload: off

  On Dell h/w:

      prompt> ethtool -k em1
      Offload parameters for em2:
      rx-checksumming: on
      tx-checksumming: on
      scatter-gather: on
      tcp-segmentation-offload: on
      udp-fragmentation-offload: off
      generic-segmentation-offload: on
      generic-receive-offload: on
      large-receive-offload: off
- The only difference is generic-receive-offload (GRO): off on the Sun, on on the Dell. So I turned it off on the Dell hardware:

      ethtool -K em1 gro off
      ethtool -K em2 gro off

  and instant cure!
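The comparison and the fix can be scripted. The following is a minimal sketch, assuming Python 3 on the boxes and root for the `-K` step: it parses `ethtool -k` output, diffs the offload settings of two hosts, and applies and then re-checks `gro off`. The function names are hypothetical; the default interface names em1/em2 are simply the ones used in the commands above.

```python
import subprocess
import sys


def parse_offloads(text):
    """Turn `ethtool -k IFACE` output into {feature: "on"/"off"}."""
    features = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip()
        if not sep or not value:
            continue  # header ("Offload parameters for em1:") or blank line
        # Newer ethtool versions may append "[fixed]"; keep only the state.
        features[key.strip()] = value.split()[0]
    return features


def diff_offloads(a, b):
    """Features whose state differs between two hosts: {feature: (a, b)}."""
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


def read_offloads(iface):
    """Run `ethtool -k IFACE` on this machine and parse the result."""
    out = subprocess.check_output(["ethtool", "-k", iface], universal_newlines=True)
    return parse_offloads(out)


def gro_off_commands(ifaces):
    """The `ethtool -K IFACE gro off` command for each interface."""
    return [["ethtool", "-K", iface, "gro", "off"] for iface in ifaces]


def main(ifaces):
    for cmd in gro_off_commands(ifaces):
        subprocess.check_call(cmd)  # needs root
    for iface in ifaces:
        state = read_offloads(iface).get("generic-receive-offload")
        print("%s: generic-receive-offload %s" % (iface, state))


if __name__ == "__main__":
    # Defaults to the interface names used above; pass others as arguments.
    main(sys.argv[1:] or ["em1", "em2"])
```

To compare hosts, save the `ethtool -k` output from each box and feed both texts through `parse_offloads` and `diff_offloads`; for the two outputs quoted above, the only difference reported is generic-receive-offload (off on the Sun, on on the Dell). Settings changed with `ethtool -K` are not persistent: they are lost on reboot or driver reload, so they need reapplying at boot by whatever mechanism the site uses.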
Kudos
To:
Un-Kudos
To Red Hat/Broadcom for getting the card/driver wrong!