Re: [microblaze-uclinux] TCP transmit performance
Hi,
Hmm... I also optimised my user application and moved some more kernel code into BRAM, so I cannot cleanly separate how much gain each change brings.
What I can compare directly is the time spent in xenet_FifoSend(), which dropped from 600us to about 250us thanks to no skb_alloc, no memcpy, the new assembler checksum calculation, and just one call of skb_free() instead of two.
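For reference, the checksum that the assembler routine computes is the standard Internet checksum (RFC 1071). The following is a minimal portable C sketch of that algorithm, not the driver's code and not the assembler version mentioned above; the function name inet_checksum is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reference Internet checksum (RFC 1071), for illustration only.
 * Treats the buffer as big-endian 16-bit words and returns the
 * one's-complement of their one's-complement sum.
 * The 32-bit accumulator is sufficient for packet-sized buffers
 * (up to 64 KiB); it is not meant for arbitrarily large inputs. */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    assert(data != NULL || len == 0);

    /* Add up all complete 16-bit words. */
    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }

    /* An odd trailing byte is padded with a zero low byte. */
    if (len == 1)
        sum += (uint32_t)data[0] << 8;

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);

    return (uint16_t)~sum;
}
```

A hand-optimised version typically gains its speed by summing 32 bits at a time and unrolling the loop, but it has to produce the same result as this reference.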
A memcpy (done by the latest assembler version) of one packet (1460 bytes, src 32-bit aligned, dst 16-bit aligned) now takes 150us here. memmove() should take approximately as long, and its use has been cut down in FifoRecvHandler().
My user app relies on the Nagle algorithm to reduce the number of ACK responses and calls the socket function send() 3 times in a row, with 220KB, 80KB and 10KB. That takes 320ms to send here. Before, I had 670ms for the same total size, but that was partly due to many more calls of send() with much smaller portions, which added thread-switching time.
Each call of send() is processed internally on the fly in tcp_sendmsg(), which works in a loop. Each loop cycle allocates a packet management struct (skb) plus a buffer for 1460 bytes of user data and the headers, copies the data into it (150us), queues the packet and tries to send it. Roughly every second attempt results in an actual transmission. The average time of one loop cycle is 1100us, of which xenet_FifoSend() accounts for 250us.
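The per-packet alloc-copy-queue pattern described above can be illustrated in user space. This is only a sketch under stated assumptions: struct seg, struct seg_queue, queue_segments() and free_segments() are hypothetical stand-ins, not kernel structures or functions, and the real tcp_sendmsg() does considerably more (headers, cloning, transmission attempts).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MSS 1460  /* user-data bytes per packet, as measured in this thread */

/* Hypothetical stand-in for an skb: one queued segment. */
struct seg {
    struct seg *next;
    size_t len;
    unsigned char data[MSS];
};

struct seg_queue {
    struct seg *head;
    struct seg *tail;
    size_t count;
};

/* Split buf into MSS-sized segments: one allocation and one copy per
 * packet. Returns 0 on success, -1 if an allocation fails (segments
 * queued so far remain in the queue). */
int queue_segments(struct seg_queue *q, const unsigned char *buf, size_t len)
{
    assert(q != NULL && (buf != NULL || len == 0));

    while (len > 0) {
        size_t n = len < MSS ? len : MSS;
        struct seg *s = malloc(sizeof *s);   /* per-packet allocation */

        if (s == NULL)
            return -1;
        memcpy(s->data, buf, n);             /* per-packet copy */
        s->len = n;
        s->next = NULL;
        if (q->tail != NULL)
            q->tail->next = s;
        else
            q->head = s;
        q->tail = s;
        q->count++;
        buf += n;
        len -= n;
    }
    return 0;
}

/* Free every queued segment; returns how many were freed. */
size_t free_segments(struct seg_queue *q)
{
    size_t freed = 0;

    assert(q != NULL);
    while (q->head != NULL) {
        struct seg *next = q->head->next;
        free(q->head);
        q->head = next;
        freed++;
    }
    q->tail = NULL;
    q->count = 0;
    return freed;
}
```

Assuming 1KB = 1024 bytes, the 310KB sent by the three send() calls splits into 218 such segments, so every microsecond of per-packet overhead is paid 218 times.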
I've seen that my PC answers every second packet with an ACK packet, which is additionally processed in the RX soft-IRQ handler. This handler also cleans up sent packets and intermittently skb_free's about 30 of them at once, which each time interrupts sending for about 5ms because freeing takes about 150us per packet. This freeing seems to clean up the packets that were cloned for possible retransmissions.
As you can see, most of the time the EMAC is idle while Linux is doing packet management and dynamic memory alloc/free. I know my EMAC FPGA module has a FIFO of 8KB, and I wish I knew how to set a larger packet size (probably via the MTU size). That way the number of packets would be smaller and the management overhead per portion of user data would be proportionally smaller. So, looking at the EMAC workload, I could still imagine doubling the TCP/IP send throughput.
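To get a feeling for how much a larger packet size could help, here is a rough back-of-the-envelope model built from the figures in this thread. It ASSUMES that only the 150us copy scales with the payload size and that the remaining 950us of each loop cycle is fixed per-packet overhead; that is a simplification, not a measurement, and the function names cycle_us() and send_time_us() are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Figures taken from the measurements in this thread. */
#define REF_PAYLOAD   1460u  /* user-data bytes per packet today */
#define REF_CYCLE_US  1100u  /* average tcp_sendmsg() loop cycle */
#define REF_COPY_US    150u  /* memcpy of one 1460-byte payload  */

/* Estimated loop-cycle time for a given payload size.
 * ASSUMPTION: only the copy scales with the payload; everything else
 * in the cycle is treated as a fixed per-packet cost. */
uint32_t cycle_us(uint32_t payload)
{
    uint32_t fixed = REF_CYCLE_US - REF_COPY_US;

    assert(payload > 0);
    return fixed + (payload * REF_COPY_US) / REF_PAYLOAD;
}

/* Estimated loop time to send total_bytes with the given payload size. */
uint32_t send_time_us(uint32_t total_bytes, uint32_t payload)
{
    uint32_t packets;

    assert(payload > 0);
    packets = (total_bytes + payload - 1) / payload;  /* round up */
    return packets * cycle_us(payload);
}
```

Under these assumptions, 310KB (317440 bytes) needs about 240ms of loop time with 1460-byte payloads and about 136ms with 2920-byte payloads, roughly a factor of 1.76. The checksum and the FIFO write also grow with the packet size, so the real gain would likely be smaller, and whether the EMAC and the link partner accept frames above the standard 1500-byte Ethernet MTU is a separate question.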
CU, F@lk
___________________________
microblaze-uclinux mailing list
microblaze-uclinux@xxxxxxxxxxxxxx
Project Home Page : http://www.itee.uq.edu.au/~jwilliams/mblaze-uclinux
Mailing List Archive : http://www.itee.uq.edu.au/~listarch/microblaze-uclinux/