From: Siva Velusamy <siva.velusamy@xxxxxxxxx>
Reply-To: microblaze-uclinux@xxxxxxxxxxxxxx
To: microblaze-uclinux@xxxxxxxxxxxxxx
Subject: Re: [microblaze-uclinux] Jtag Debug
Date: Wed, 26 Oct 2005 12:17:55 -0700
Andy -
You are missing the barrel shifter in your MHS. Enabling it should improve
performance quite a bit. The FPU is enabled, but you have to make sure your
code uses single-precision arithmetic: define all constants as 2.0F and not
2.0 (which defaults to double), use float types, etc. You should be able to
confirm that you are actually using the FPU by grepping for FP instructions
in your ELF file. Run

    mb-objdump -d fft.elf

and search for fadd, frsub, fmul.
If you don't see those instructions, then you are not using the FPU.
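
To illustrate the single-precision point, here is a minimal sketch in C. The
names (cplx_f, cmul_f, halve_single, halve_promoted, check_expression_types)
are made up for this example and are not taken from your FFT code. It shows
how an unsuffixed constant turns a float multiply into a double one, which a
single-precision FPU cannot execute directly and which would then typically
fall back to software emulation.

```c
#include <assert.h>

/* Hypothetical complex type that stays in single precision throughout. */
typedef struct { float re, im; } cplx_f;

/* Complex multiply using only float operands, so every multiply, add and
 * subtract is a single-precision operation. */
static cplx_f cmul_f(cplx_f a, cplx_f b)
{
    cplx_f r;
    r.re = a.re * b.re - a.im * b.im;
    r.im = a.re * b.im + a.im * b.re;
    return r;
}

/* 0.5F is a float constant: the multiply is done in single precision. */
static float halve_single(float x)
{
    return x * 0.5F;
}

/* 0.5 is a double constant: x is converted to double, the multiply is a
 * double multiply, and the result is converted back to float. */
static float halve_promoted(float x)
{
    return x * 0.5;
}

/* The type of each expression shows the promotion directly. */
static void check_expression_types(void)
{
    float x = 1.0F;
    assert(sizeof(x * 0.5F) == sizeof(float));
    assert(sizeof(x * 0.5) == sizeof(double));
}
```

Both halve functions return the same value; the difference is only in the
instructions generated. The same care applies to math library calls: C99
provides sinf and cosf as the float counterparts of sin and cos.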
-Siva
Xilinx, Inc.
On 10/26/05, Abot Botbot <dejanigma@xxxxxxxxxxx> wrote:
>
> Paul,
>
> Yeah, the design was compiled in EDK using BSB. I didn't want to get into
> specifics right away, but I can give you a little perspective. The
> algorithm I'm using computes a floating-point FFT on a 2x64-point array
> (real, imag). This was originally running on a TI DSP chip, but we want to
> reduce the form factor and eliminate a lot of the other hardware with one
> FPGA. This operation is run as emulated floating-point math on the 150 MHz
> DSP, which does several thousand FFTs per second. On the SP3 and the ML401
> designs running the same code (at 50 MHz and 100 MHz respectively), the
> same algorithm results in about one FFT per second on both platforms.
> Granted, we're comparing apples and oranges, but the difference in speed
> is phenomenal: several orders of magnitude. Now that you have a little
> perspective, maybe you have a better idea where the problem is. The past
> month has been spent just trying to get around these massive performance
> hindrances. Adding the FPU didn't give any noticeable increase in speed.
> The designs are running out of external memory (SDRAM) with CacheLink over
> FSL to the MicroBlaze. Granted, it takes dozens of clock cycles for every
> read and write, at least in the worst cases, but that still doesn't add up
> to the performance I see. Some people have asked why we are using the
> MicroBlaze for DSP at all, but I don't see any architectural restrictions
> on doing emulated floating-point math at a decent speed. Since the clock
> speeds are different and the memory access times are longer, I would
> expect to have maybe 10% of the speed of the 150 MHz DSP, but I get 0.001%
> or worse... If you need any further info I can go on all day :D.. Any
> insight you can give would be invaluable. In the final implementation I
> expect to have dedicated hardware for doing the FFTs, but the MicroBlaze
> should still be able to do it decently, right?
>
> Thanks,
> Andy