
Re: [microblaze-uclinux] Jtag Debug



Andy -

Your MHS is missing the barrel shifter; enabling it should improve performance quite a bit. The FPU is enabled, but you have to make sure your code actually uses single-precision arithmetic: write all constants as 2.0F rather than 2.0 (which defaults to double), use float types, and so on. You can confirm that you are really using the FPU by grepping for floating-point instructions in the disassembly of your ELF file.

mb-objdump -d fft.elf  and search for fadd, frsub, fmul

If you don't see those instructions, then you are not using the FPU.
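
As a minimal sketch of the single-precision point (the type and function names below are hypothetical and not taken from Andy's code), a radix-2 butterfly that stays entirely in float might look like this. The F suffix matters because 0.5 is a double literal, so x * 0.5 promotes the whole multiply to double, which a single-precision FPU cannot execute in hardware.

```c
/* Hypothetical names for illustration; not from the original FFT source. */
typedef struct { float re; float im; } cplx_f;

/* Stays in single precision: 0.5F is a float literal. */
float half_f(float x)
{
    return x * 0.5F;
}

/* Same arithmetic with a double literal: x is promoted to double and the
 * multiply is done in double precision. */
double half_promoted(float x)
{
    return x * 0.5;
}

/* Radix-2 butterfly using only float operands:
 *   t = b * w;  b = a - t;  a = a + t
 */
void butterfly_f(cplx_f *a, cplx_f *b, cplx_f w)
{
    float tr = b->re * w.re - b->im * w.im;
    float ti = b->re * w.im + b->im * w.re;

    b->re = a->re - tr;
    b->im = a->im - ti;
    a->re = a->re + tr;
    a->im = a->im + ti;
}
```

Assuming hardware floating point is enabled in the compiler options, the disassembly of butterfly_f should contain fmul and fadd/frsub, while half_promoted should instead show calls into the software floating-point library.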

-Siva

Xilinx, Inc.

On 10/26/05, Abot Botbot <dejanigma@xxxxxxxxxxx> wrote:
Paul,

  Yeah, the design was compiled in EDK using BSB. I didn't want to get into
specifics right away, but I can give you a little perspective. The algorithm
I'm using computes a floating point FFT on a 2x64-point array (real, imag).
This was originally running on a TI DSP chip, but we want to reduce the form
factor and eliminate a lot of the other hardware with one FPGA. This
operation is run as emulated floating point math on the 150 MHz DSP, which does
several thousand FFTs per second. On the SP3 and the ML401 designs running
the same code (at 50 MHz and 100 MHz respectively), the same algorithm
results in about one FFT per second, on both platforms. Granted we're
comparing apples and oranges, but the difference in speed is phenomenal,
several orders of magnitude. Now that you have a little perspective, maybe
you have a better idea where the problem is. The past month has been spent
just trying to get around these massive performance hindrances. Adding the
FPU didn't give any noticeable increase in speed. The designs are running out
of external memory (SDRAM) with CacheLink over FSL to the MicroBlaze.
Granted it takes dozens of clock cycles for every read and write, or at
least in the worst cases, but that still doesn't add up to the performance I
see. Some people have asked why we are using the MicroBlaze for DSP at all,
but I don't see any architecture restrictions to doing emulated floating
point math at a decent speed. Since the clock speeds are different and the
memory access times are longer, I would immediately expect to have maybe 10%
of the speed of the 150 MHz DSP, but I get .001% or worse... If you need any
further info I can go on all day :D.. Any insight you can give would be
invaluable. In the final implementation I expect to have dedicated hardware
for doing the FFTs, but the MicroBlaze should still be able to do it
decently, right?
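
To put a rough number on "doesn't add up", here is a back-of-the-envelope sketch. It assumes a plain radix-2 FFT, which the message does not state, and the function names are hypothetical. A 64-point radix-2 FFT has (64/2) * log2(64) = 192 butterflies, so at one FFT per second the cycle budget per butterfly is simply the clock rate divided by 192.

```c
/* Assumption: radix-2 FFT with n a power of two. Names are hypothetical. */

/* Number of butterflies in an n-point radix-2 FFT: (n/2) * log2(n). */
unsigned long radix2_butterflies(unsigned long n)
{
    unsigned long stages = 0;
    unsigned long m;

    for (m = n; m > 1; m >>= 1)
        stages++;               /* counts log2(n) for power-of-two n */

    return (n / 2) * stages;
}

/* Clock cycles spent per butterfly at a measured FFT rate (integer division). */
unsigned long cycles_per_butterfly(unsigned long clock_hz,
                                   unsigned long ffts_per_sec,
                                   unsigned long n)
{
    return clock_hz / (ffts_per_sec * radix2_butterflies(n));
}
```

With the figures above, cycles_per_butterfly(50000000UL, 1, 64) comes to roughly 260,000 cycles per butterfly on the 50 MHz design and about twice that at 100 MHz. Even at dozens of cycles per memory access, a butterfly is only a handful of multiplies and adds, so a budget of that size suggests something other than raw memory latency is dominating.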

Thanks,
Andy