
Re: [microblaze-uclinux] xenet_FifoSend(struct sk_buff *orig_skb,struct net_device *dev)



Hi Falk,

Brettschneider Falk wrote:

John Williams wrote:
If you can measure the time taken in other parts of FifoSend, my guess is a large part is the unavoidable memcpy / skb_copy_and_csum_dev() calls.
unfortunately, yes. A short hack for avoiding the alloc showed the time hasn't changed significantly. :-( Though memory fragmentation may be reduced in hard-threaded systems.

If you alloc the skb in the driver init function, and free it in driver shutdown, then it will not live forever (if e.g. loaded as a module).
good idea.

I saw the function pointer to xenet_FifoSend() is stored as dev->hard_start_xmit. Which higher layer uses it? Do you think it's worth to have a look if the address offset 2 can be fixed?

That's the primary entry point for the upper network layer to send a packet via this device. The alignment requirement is deeply ingrained into the kernel. The problem comes about because Ethernet frame headers are 14 bytes long, which is not a multiple of 4, but the kernel expects (or at least assumes) that IP header fields are word aligned.
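
To make the arithmetic concrete, here is a small sketch in C. The helper name ip_header_misalignment is made up for illustration and is not a kernel or driver function; it only assumes the standard 14-byte Ethernet header with no VLAN tag.

```c
#include <stdint.h>

/* Standard Ethernet header: dst MAC (6) + src MAC (6) + EtherType (2). */
enum { ETH_HEADER_LEN = 14 };

/*
 * Hypothetical helper (illustration only): how many bytes past a 32-bit
 * word boundary the IP header lands when the frame starts at frame_addr.
 */
static unsigned ip_header_misalignment(uintptr_t frame_addr)
{
    return (unsigned)((frame_addr + ETH_HEADER_LEN) & 0x3u);
}
```

With a word-aligned frame (say 0x100) the IP header ends up 2 bytes off a word boundary; starting the frame at offset 2 (0x102) puts the IP header on a word boundary, which is why the frame data itself shows up at address offset 2.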

You can skip the memcpy and enable unaligned exception handling in the kernel instead. Then every time the kernel looks into a packet header, you take an unaligned exception, and it ends up being no faster than just doing the memcpy. If you had lots of really long packets, maybe unaligned exceptions would be faster, but in general I saw no real value.

This is why Xilinx implemented the DRE (data realignment engine) in the PLB and OPB Ethernet controllers (in SGDMA mode only), and why the SDMA engines in MPMC can do non-word-aligned DMA transfers. It is much easier and faster to do this in hardware than to take the CPU exceptions all the time.

There is one possibility you might try: the FifoSend (and receive) calls use the lower-level (level 0) Xilinx drivers to actually write the data to the hardware FIFOs on the device. These don't handle unaligned pointers.

You can tweak these routines based on the pointer alignment, and handle unaligned ptrs without the exception hit. Basically you do the realignment yourself in software, but ensure you only do full word memory accesses.

In dodgy 7am pseudo code:

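// unaligned src ptr (2 bytes into a word); shifts assume big-endian
u8 *src = (u8 *)0x2;

// aligned dst ptr
u32 *dst = (u32 *)0x100;

// round src down to the enclosing word boundary
u32 *s1 = (u32 *)((unsigned long)src & ~0x3UL);

u32 tmp1, tmp2;

tmp1 = s1[0];
while (nwords--)	// nwords: number of 32-bit words to write (placeholder)
{
	tmp2 = s1[1];
	// low half of the previous word, high half of the next one
	*dst++ = (tmp1 << 16) | (tmp2 >> 16);
	s1++;
	tmp1 = tmp2;
}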
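
For anyone who wants to try this outside the driver, the pseudo code above can be fleshed out into a self-contained C sketch. This is only an illustration under a few assumptions: realign_copy and host_is_big_endian are made-up names (not Xilinx or kernel functions), the byte offset is generalised from 2 to 0..3, and the shift direction is picked at run time so the same code can be checked on a little-endian host as well as a big-endian target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: detect the byte order of the machine we run on. */
static int host_is_big_endian(void)
{
    const uint16_t probe = 0x0102;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 0x01;
}

/*
 * Hypothetical helper: write nwords 32-bit words to the word-aligned dst,
 * taking the data from `offset` bytes (0..3) into the word-aligned buffer
 * src_words. Only full-word loads and stores are issued.
 * When offset != 0 the loop also reads src_words[nwords], so the source
 * buffer must hold at least nwords + 1 words.
 */
static void realign_copy(uint32_t *dst, const uint32_t *src_words,
                         unsigned offset, size_t nwords)
{
    assert(offset < 4);

    if (offset == 0) {
        /* Already aligned: plain word copy. */
        for (size_t i = 0; i < nwords; i++)
            dst[i] = src_words[i];
        return;
    }

    const unsigned shift = 8 * offset;   /* 8, 16 or 24 bits */
    const int big = host_is_big_endian();
    uint32_t cur = src_words[0];

    for (size_t i = 0; i < nwords; i++) {
        uint32_t next = src_words[i + 1];
        /* Merge the tail of the current word with the head of the next. */
        dst[i] = big ? (cur << shift) | (next >> (32 - shift))
                     : (cur >> shift) | (next << (32 - shift));
        cur = next;
    }
}
```

In a FIFO write routine the store to dst[i] would presumably become the write to the FIFO data register; the one-word read past the end of the payload is the price of never issuing a partial-word access.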


Maybe worth a try.

Cheers,

John
___________________________
microblaze-uclinux mailing list
microblaze-uclinux@xxxxxxxxxxxxxx
Project Home Page : http://www.itee.uq.edu.au/~jwilliams/mblaze-uclinux
Mailing List Archive : http://www.itee.uq.edu.au/~listarch/microblaze-uclinux/