Re: [microblaze-uclinux] xenet_FifoSend(struct sk_buff *orig_skb, struct net_device *dev)
Hi again,
I guess it depends on the size of the buffers being checksummed, i.e. the
fixed setup overhead versus the time spent in the inner loop. Any idea of
the average size of the buffers being worked on?
For the last version of the code I sent, my inner loop is 8 cycles long and
the gcc-produced version is 12 cycles. On big buffers, the time should
therefore approach about 2/3 (8/12) of the gcc version. Included inline
below is a version that does it in 7 cycles; yeah, I'm playing around <G>.
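To make the overhead-versus-loop trade-off concrete, here is a minimal C cost model, assuming each call costs a fixed setup plus a per-word loop cost. The per-word figures (12 cycles for gcc, 8 and 7 for the assembler loops) come from the message; the setup cost and the names csum_cycles() and time_fraction() are hypothetical and only illustrative.

```c
#include <assert.h>

/* Hypothetical cost model: one call = fixed setup + per-word loop cost. */
static unsigned long csum_cycles(unsigned long setup, unsigned long per_word,
                                 unsigned long words)
{
    return setup + per_word * words;
}

/* Fraction of the gcc loop's time taken by a hand-written loop, assuming
 * (hypothetically) that both versions pay the same setup cost. */
static double time_fraction(unsigned long setup, unsigned long asm_per_word,
                            unsigned long gcc_per_word, unsigned long words)
{
    unsigned long gcc = csum_cycles(setup, gcc_per_word, words);
    assert(gcc > 0); /* avoid dividing by zero for an empty, free call */
    return (double)csum_cycles(setup, asm_per_word, words) / (double)gcc;
}
```

For example, with an assumed 20-cycle setup, a one-word buffer gives time_fraction(20, 8, 12, 1) = 28/32 = 0.875 of the gcc time, while a 1500-byte Ethernet payload (375 words) gives 3020/4520, about 0.668, already close to the 2/3 limit.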
I'm not sure how such things are viewed in the Linux world: whether or
not the maintainability of C trumps putting such (unnecessary) assembler
code in the kernel. I'm also playing with an assembler memcpy at the
moment, on the theory that it may be asked to move large blocks more often
than do_csum is asked to work on large blocks. Let me know if there's any
interest.
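One common way a hand-tuned memcpy gains speed on large blocks is to move 32-bit words instead of bytes when the pointers allow it. The following is a minimal C sketch of that idea only, not the assembler memcpy mentioned above; word_memcpy() is a hypothetical name, and, like kernel code built with -fno-strict-aliasing, it assumes word accesses through cast pointers are acceptable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical word_memcpy(): copies 32-bit words while source and
 * destination are both 4-byte aligned, then copies the remaining bytes.
 * Like memcpy(), it does not handle overlapping regions. */
static void *word_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(n == 0 || (dst != NULL && src != NULL));

    /* Fast path: only when both pointers are word aligned. */
    if ((((uintptr_t)d | (uintptr_t)s) & 3u) == 0) {
        uint32_t *dw = (uint32_t *)d;
        const uint32_t *sw = (const uint32_t *)s;
        for (; n >= 4; n -= 4)
            *dw++ = *sw++;
        d = (unsigned char *)dw;
        s = (const unsigned char *)sw;
    }

    /* Byte tail, or the whole copy when the pointers are misaligned. */
    while (n--)
        *d++ = *s++;
    return dst;
}
```

The word loop moves four bytes per load/store pair, so its advantage grows with block size, which is the same reasoning as for the checksum loop.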
Regards,
Jim Law
########## new optimized inner csum loop ##########
2: # when we get here, there is more than one word to be summed
addi r11,r11,-1 # word count = word count - 1
addi r0,r0,0 # clear carry for add with carry in loop
beqi r11,4f # if no more words to do don't enter loop
5: lwi r4,r7,0 # temp = *word address
addc r3,r3,r4 # csum = csum + temp + carry
addik r11,r11,-1 # word count = word count - 1, don't disturb carry
bneid r11,5b # loop back if still some words to do
addik r7,r7,4 # word address++, don't disturb carry - IN DELAY SLOT
4: # deal with last (possibly partial) word
lwi r4,r7,0 # temp = *word address
and r4,r4,r10 # temp = temp & endmask
addc r3,r3,r4 # csum = csum + temp + carry
#############################################
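For readers who don't speak MicroBlaze assembler, here is a minimal C model of what the loop above computes, assuming r3 holds the running sum, r7 the word address, r11 the word count and r10 the end mask. The function name csum_words() is hypothetical. The snippet leaves the final carry in the carry flag; folding it back in at the end is an assumption about what the surrounding code does next.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C model of the inner csum loop above.
 * p       : word address (r7)
 * words   : word count (r11); the code assumes at least one word
 * endmask : mask for the last, possibly partial, word (r10)
 * csum    : running sum on entry (r3)
 * Each add also adds the carry out of the previous add, like addc. */
static uint32_t csum_words(const uint32_t *p, size_t words, uint32_t endmask,
                           uint32_t csum)
{
    uint32_t carry = 0;                 /* addi r0,r0,0 clears the carry */
    assert(words >= 1);

    /* Label 5: all full words except the last one. */
    for (size_t i = 0; i + 1 < words; i++) {
        uint64_t t = (uint64_t)csum + p[i] + carry;
        csum = (uint32_t)t;
        carry = (uint32_t)(t >> 32);
    }

    /* Label 4: last word, masked with endmask before adding. */
    uint64_t t = (uint64_t)csum + (p[words - 1] & endmask) + carry;
    csum = (uint32_t)t;
    carry = (uint32_t)(t >> 32);

    /* Assumed follow-up: end-around carry. Folding twice covers the case
     * where adding the carry itself overflows. */
    uint64_t f = (uint64_t)csum + carry;
    return (uint32_t)f + (uint32_t)(f >> 32);
}
```

The carry chain is what makes the loop cheap: each word costs one load and one addc, and the count and address updates use addik so they leave the carry untouched.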
----- Original Message -----
From: "Brettschneider Falk" <fbrettschneider@xxxxxxxxxxxxxxx>
To: <microblaze-uclinux@xxxxxxxxxxxxxx>
Sent: Monday, April 28, 2008 12:01 PM
Subject: RE: [microblaze-uclinux] xenet_FifoSend(struct sk_buff *orig_skb, struct net_device *dev)
Hi Jim,
Jim Law wrote:
I had a look at the do_csum() routine in the
arch/microblaze/lib/checksum.c file and produced an assembler
version in the hopes of providing some speedup.
Thanks very much. It seems to me, though, that mb-gcc already produces
almost perfect assembler, since computing the checksum still takes about
100 us. That's only at first sight, though; I'm going to compare them in
more detail later...
Cheers, Falk
___________________________
microblaze-uclinux mailing list
microblaze-uclinux@xxxxxxxxxxxxxx
Project Home Page : http://www.itee.uq.edu.au/~jwilliams/mblaze-uclinux
Mailing List Archive :
http://www.itee.uq.edu.au/~listarch/microblaze-uclinux/