I *think* I read somewhere that some stacks will set the PUSH bit on every outgoing segment if TCP_NODELAY is set.

Not the Linux one. And besides, PUSH only affects whether a blocking read returns after the first packet or waits for further packets; no client stack in the world will return only *part* of the data in its input buffer just because one packet had the PUSH flag set. PUSH is not a TCP record boundary; there is no such thing.
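Since TCP has no record boundaries, an application that needs them has to put them in the byte stream itself. Here is a minimal sketch of one common way to do that, a 4-byte length prefix, in Python. The names recv_exactly, send_message and recv_message are made up for illustration and are not part of any library.

```python
import socket
import struct

def recv_exactly(sock, n):
    """Read exactly n bytes, however the stack chooses to chunk them."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            # recv() returning b"" means the peer closed the connection.
            raise ConnectionError("peer closed before the full message arrived")
        buf.extend(chunk)
    return bytes(buf)

def send_message(sock, payload):
    """Send one message as a 4-byte big-endian length followed by the payload."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)

def recv_message(sock):
    """Receive one length-prefixed message, independent of packet boundaries."""
    (length,) = struct.unpack("!I", recv_exactly(sock, 4))
    return recv_exactly(sock, length)
```

With this, the receiver gets whole messages back whether the sender's writes arrive one per packet, coalesced, or split, and neither PUSH nor TCP_NODELAY enters into it.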

If you are only receiving one packet per read call on Windows, it's because you're calling read faster than the packets are arriving off the wire.
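If you want to see this for yourself, here is a rough loopback experiment in Python. It is only a sketch: count_reads is a made-up helper, the message count and size are arbitrary, and the number of reads you get depends on timing and on the stack, so no particular figure is promised. The only thing guaranteed is that the total byte count matches what was sent.

```python
import socket
import threading
import time

def count_reads(read_delay, messages=50, size=16):
    """Send small writes with TCP_NODELAY; return (total_bytes, recv_calls)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    listener.listen(1)
    sender = socket.create_connection(listener.getsockname())
    sender.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    receiver, _ = listener.accept()
    listener.close()

    def send_all():
        for _ in range(messages):
            sender.sendall(b"x" * size)  # one small write per message
        sender.close()

    writer = threading.Thread(target=send_all)
    writer.start()

    total = reads = 0
    while True:
        time.sleep(read_delay)           # simulate a slower or faster reader
        chunk = receiver.recv(4096)
        if not chunk:                    # b"" means the sender closed
            break
        total += len(chunk)
        reads += 1
    writer.join()
    receiver.close()
    return total, reads
```

Calling count_reads(0.0) and then count_reads(0.01) and comparing the second element of each result should show the slower reader getting several writes per read, because data that has already arrived is handed over in one go regardless of how it was packetised.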

Peter