
Re: [microblaze-uclinux] Re: multiprocessing



On Tue, 2004-04-06 at 19:10, John Williams wrote:
> Hi Matthew, Greg
> 
> I'm side-stepping the fork/vfork stuff... :-)

	I'm done with it, too :).


> Greg Ungerer wrote:
> 
> > Matthew Rubenstein wrote:
> >>
> 
> >> I'm more excited
> >> about a MicroBlaze program that can retrieve FPGA config files from disk
> >> or memory, reconfig the remaining unallocated gates, and call those
> >> functions during execution. 
> 
> I'm basically doing this right now.  The mechanics are easy, the hard 
> part is actually generating and manipulating the partial bitstreams.  I 
> can pull bitstreams from memory file systems, network file systems, 
> webservers, whatever.  Linux doesn't care, that's why it's so powerful.
> 
> My microblaze board currently has a device node /dev/self ... I cat 
> bitstreams into it, and microblaze reconfigures itself...
> 
> But... even Xilinx, who know more about their devices, tools and data 
> structures than anyone else, are still working on things that are almost 
> trivial in the software analogue, like runtime relocation and so on.
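
[Editorial sketch] The "cat a bitstream into /dev/self" workflow described above amounts to copying bytes from a file into a device node. Below is a minimal C sketch of that copy, assuming the driver accepts a plain byte stream through write(). Only the /dev/self node name comes from the post; the function names, the buffer size, and the return convention are illustrative assumptions, and nothing here describes the actual driver.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Write the whole buffer, retrying on short writes and EINTR. */
static int write_all(int fd, const unsigned char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Copy a bitstream file into an existing device node (or any writable
 * file). Returns the number of bytes copied, or -1 on error.
 * Hypothetical helper: the name and interface are assumptions.
 */
long load_bitstream(const char *bitstream_path, const char *dev_path)
{
    unsigned char buf[4096];
    long total = 0;
    int in, out;

    in = open(bitstream_path, O_RDONLY);
    if (in < 0)
        return -1;
    out = open(dev_path, O_WRONLY);
    if (out < 0) {
        close(in);
        return -1;
    }
    for (;;) {
        ssize_t n = read(in, buf, sizeof buf);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            total = -1;
            break;
        }
        if (n == 0)
            break;              /* end of bitstream file */
        if (write_all(out, buf, (size_t)n) < 0) {
            total = -1;
            break;
        }
        total += n;
    }
    close(in);
    if (close(out) < 0)
        total = -1;
    return total;
}
```

A caller would use something like load_bitstream("decoder.bit", "/dev/self"), where decoder.bit is a placeholder file name. Because the source is opened with an ordinary open(), it can live on any mounted file system, which is the point made above about Linux not caring where the bitstream comes from.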

	Runtime relocation is for iterative processing, which has a linear
address space. Why not just allow the gate network to be routed relative
to itself? As long as there's room on the chips, within the right
perimeter shape, for more gates to be configured, does the geometry need
to be changed, retaining the topology? If the "virgin" gates have varied
embedded features in their geometry, then that's a perfect algorithm to
run on the MicroBlaze: revise the geometry while retaining the topology.
Such an enabling app would change the FPGA world.


> >> Even if the program just starts up, gets the
> >> latest version of the glue logic for its neighboring support chips, and
> >> continues its execution only in its uniprocessing MicroBlaze CPU thread,
> >> the savings and efficiency of downloadable, onchip glue logic will be
> >> superior to hardcoded CPUs in many cutting edge applications.     
> 
> I'm very interested in ideas like spawning new processors for new 
> threads/tasks, but sometimes doubt if the overhead will be worth the 
> gain.  In ultra low power applications it might be, but then if you are 
> talking ultra-low power, you probably aren't using FPGAs (yet).  It's no 
> secret that current-gen FPGAs have a power consumption problem.

	I'm talking about a one-shot download/install of the glue FPGAs, on
booting uClinux, surviving until the next upgrade. Not every time any
app runs - why bother, when the neighboring support chips don't change,
don't require new glue? Something like this could make automotive
intelligence upgrades much more affordable, from eliminating time spent
pulling boards, to reducing the skillset needed to upgrade (jack in
client bench computer, upload new glue & apps, reboot, rather than
mechanical upgrading), to geographically centralizing the techs and
actually fixing over the Internet with downloads. Upgrade your "crashed"
car over your phone by the roadside! And of course other industries
would follow suit.


> It's got to be something a bit more left-field than that...

	I've got it, coach, I've got it!


> >> Even 
> >> more exciting, loading multiprocessing logic that can be called
> >> from a single function call in the uniprocessing program offers to
> >> balance the best of both worlds of uni/multiprocessing, in the same
> >> application. Then the architectural opportunities for runtime feedback
> >> to the gates configuration offer a new level of intelligence to these
> >> programs, while leveraging all the applicable techniques and software
> >> from the last generations of hardcoded hardware. 
> 
> I agree whole-heartedly, but until we find ways of managing the 
> complexity, ideas like this will remain stuck on the workbenches of 
> laboratories around the world..

	Calling multiproc gate networks from iterated core apps *is* the way to
manage the complexity: Port a uClinux/ARM app to uClinux/MicroBlaze.
Migrate some parallelizable functions from the iterated instructions to
parallel gates. Test. Migrate some more. Test. Keep it up until there's
just an iterated loop, like main(), and maybe some other linear
functions, like device access control logic.
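
[Editorial sketch] One way to picture the "migrate, test, migrate some more" loop is to keep each parallelizable function behind a single call that can be rebound from a software implementation to a hardware-backed one. The C sketch below is only an illustration of that idea: the dot-product workload, all function names, and the stub standing in for a gate-level implementation are assumptions, not anything described in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Common signature shared by the software and "hardware" versions. */
typedef int32_t (*dot_fn)(const int16_t *a, const int16_t *b, size_t n);

/* Reference software implementation: the code as ported to the CPU. */
static int32_t dot_sw(const int16_t *a, const int16_t *b, size_t n)
{
    int32_t acc = 0;
    size_t i;

    for (i = 0; i < n; i++)
        acc += (int32_t)a[i] * b[i];
    return acc;
}

/*
 * Stand-in for a gate-level implementation. A real one would hand the
 * operands to configured logic; this stub only counts calls and reuses
 * the software result so both paths can be compared during testing.
 */
static unsigned hw_calls;

static int32_t dot_hw_stub(const int16_t *a, const int16_t *b, size_t n)
{
    hw_calls++;
    return dot_sw(a, b, n);
}

/* The implementation currently bound; starts out as pure software. */
static dot_fn dot_impl = dot_sw;

/* Rebind the call; passing NULL falls back to software. */
void dot_bind(dot_fn impl)
{
    dot_impl = impl ? impl : dot_sw;
}

/* The one function the rest of the application calls. */
int32_t dot(const int16_t *a, const int16_t *b, size_t n)
{
    return dot_impl(a, b, n);
}
```

The application only ever calls dot(). Each migration step swaps the binding with dot_bind(), reruns the same tests against the software result, and falls back with dot_bind(NULL) if the new implementation misbehaves.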

	And an executable flowchart is a way to "program" new apps which are
obviously decomposable into non/parallel schematic blocks, for direction
to the appropriate compiler/synthesizer. That manages the complexity,
like an IDL compiler makes even CORBA development manageable.


> Some people think the answer is a new architecture, like PACT/XPP and 
> the one from Quicksilver, coarse-grained devices designed from the 
> outset to support dynamic reconfig.  Indeed, designed to *depend* on 
> dynamic reconfig.  Perhaps some blend of the coarse- and fine-grained 
> logic will be the answer, I'm not sure.

	uCL/MB is that new architecture. It's probably even overkill compared
with a Virtex+PPC, but it's past the boundary of the old threshold,
rather than straddling it.


> >> For example, a program
> >> running in MicroBlaze has to decode an audio stream. It reads the
> >> header, detects the format, then loads the decoder for that format from
> >> disk to the gates. 
> 
> Sure, this is possible with current tools, processes and devices.  Not 
> easy, but possible.

	Well, it's not possible to get the same performance:$ for multiple streams
(7.1 audio, +video, +subtitles, etc) from an iterated processor as from
an FPGA. But cost in $ and time to market for porting the whole app
to FPGA is prohibitive. And keeping all the codecs in FPGA while only
one is used, perhaps for the lifetime of the installed unit, without
knowing which codec is the one for that unit in advance, is cost
prohibitive. ASICs can't do it, either. And with Internet downloadable
codecs, or media-objects encapsulating bundled codecs, something like
MB/uCL is the only way to go. The demand is there.
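
[Editorial sketch] The "read the header, detect the format, load the decoder" step quoted above can be illustrated in C as a lookup from well-known magic bytes to a per-format decoder bitstream. The magic values (RIFF/WAVE, OggS, fLaC, ID3) are the standard ones for those containers; the enum, the function names, and every file path are hypothetical and chosen only for the example.

```c
#include <stddef.h>
#include <string.h>

enum stream_format {
    FMT_UNKNOWN,
    FMT_WAV,
    FMT_OGG,
    FMT_FLAC,
    FMT_MP3_ID3
};

/* Identify a stream from the first bytes of its header. */
enum stream_format detect_format(const unsigned char *hdr, size_t len)
{
    if (len >= 12 && memcmp(hdr, "RIFF", 4) == 0 &&
        memcmp(hdr + 8, "WAVE", 4) == 0)
        return FMT_WAV;
    if (len >= 4 && memcmp(hdr, "OggS", 4) == 0)
        return FMT_OGG;
    if (len >= 4 && memcmp(hdr, "fLaC", 4) == 0)
        return FMT_FLAC;
    if (len >= 3 && memcmp(hdr, "ID3", 3) == 0)
        return FMT_MP3_ID3;   /* MP3 file that starts with an ID3v2 tag */
    return FMT_UNKNOWN;
}

/*
 * Map a format to the partial bitstream holding its decoder.
 * The paths are placeholders; returns NULL when no decoder is known.
 */
const char *decoder_bitstream_for(enum stream_format fmt)
{
    switch (fmt) {
    case FMT_WAV:     return "/lib/decoders/wav.bit";
    case FMT_OGG:     return "/lib/decoders/ogg.bit";
    case FMT_FLAC:    return "/lib/decoders/flac.bit";
    case FMT_MP3_ID3: return "/lib/decoders/mp3.bit";
    default:          return NULL;
    }
}
```

When the next track arrives in a different format, the program would call detect_format() on its header again and load the bitstream named by decoder_bitstream_for() over the previous decoder, so only one codec occupies gates at a time.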


> >> Realtime DSP becomes possible, and the benefits that
> >> Intel crowed about for "NSP" (Native Signal Processing) actually arrive
> >> intact. Next track is in a different format? Just overwrite the decoder
> >> gates with that format's decoder from a disk file. And the rest of the
> >> program, which doesn't vary with IO conditions, is a direct port of a
> >> free/open PowerPC app. (I shiver with excitement :).
> 
> Absolutely.  We are on the same page :)
> 
> > For me this seems a little too easy. I figure we can almost do this now.
> > Loading things on demand that we already know about (we know in advance
> > what codecs will be used in the above example). It would be much more
> > interesting if we could be clever about this on the fly.

	See above. If it's too easy, let's just do it to get it out of the way;
no one else thinks it's easy, and we'd blow their minds.


> I gave a seminar recently on some of this stuff I'm doing, and made 
> reference to possible analogies between software and hardware.  40/50 
> years ago when people programmed bootloaders with dip switches and push 
> buttons, they might have considered something like loadable kernel 
> modules and run time code relocation to be impossible.  Or if not 
> impossible, so overwhelmingly complex to never be useful.  What about 
> complex software systems that run on interpreted languages?
> 
> Are we at the same point in hardware now?  We are starting to see what 
> could be done, and the tools and devices are beginning to let us 
> experiment with it.
> 
> What is the hardware analog of an interpreted language?  What is the 
> analog of a byte-code language like Java?  Object orientation?  What 
> about portability?  Projects like MPGA (meta PGA) start to look more and 
> more interesting when you think like this...

	We are at the point of Intel's 4004 in 1971. We're looking at the apps
from the point of view of the limitations of the last generation (like
nifty ICs), and the excitement about leaving them behind with a new
generation of manufacturing integration and programming opportunities.
But this is a qualitative leap, not merely a scaled up quantitative step.
We can think in terms of what people want to do that these devices can
be made to do, rather than just what we can do, and then hunting for
someone who wants it.


> Back to Greg's point, I think we need to constantly ask ourselves, 
> "Beyond being cool, is it useful?"  I truly hope and usually believe the 
> answer to be "yes", but it's our job to prove it.
> 
> It can be helpful to play "let's pretend".  Let's pretend that dynamic 
> hardware module placement is trivial.  Let's pretend that I can turn a 
> VHDL description of a circuit into real logic, on the fly, self-hosted 
> on a microblaze system.
> 
> Now what?
> 
> What does it buy me?  What can I do now, that I couldn't do before? 
> What problems can I now solve, that were intractable previously?

	How about the above codec on-the-fly app, but for *recording*? Analyze
the incoming data, build a compression algorithm customized for that
signal dataset in gates, compress the data, bundle the compression gates
with the compressed dataset. Infeasible with really expensive, fast
iterative processors. Intractable for cheap, slow iterative processors.
Doable with MB/uCL.


> Is it enough to rely on the evolutionary progress in silicon technology? 
>   Will the world's ASIC vendors give up and let Xilinx and Altera spend 
> the dollars chasing Moore's Law, and we'll just use the abundance of 
> gates in a wasteful manner, the same way that we do with memory and 
> cycles in software?

	Yes, it is. Yes, we will. Waste is a sin only in an economy of
scarcity. Why kill only what you can eat, if no one misses the prey? I'd
die every day, if I could ;).


On Tue, 2004-04-06 at 23:07, Greg Ungerer wrote: 

> Hi John,
> > John Williams wrote:
> > >> Well for one reason because now I can put on 1 chip all the peripherals
> > >> that I need to use X chips for now. Instead of laying down a CPU, a
> > >> network engine, etc, I can lay down a single FPGA that is everything
> > >> on one. One day it may even be cheaper :-)
> > > 
> > > 
> > > I think Greg has captured the reality of FPGA-based systems at the 
> > > moment.  Xilinx' marketing line for Spartan-3 is "Make it your ASIC". 
> > > For me that says it all, about where they want this device to live.
> > 
> > Makes sense really. This is the market "niche" that they have
> > grown from.

	And the market where all the money still is. That money is feeding
development of the brave new world we've been promising for 15 years,
but which now seems to be dawning on a landscape missing only tools.


> > Of course we know it can do so much more :-)

	That's why Xilinx doesn't compete with the developers like you, who
create the demand for their chips by feeding demand for apps.


> > >> Or maybe I can lay down 4 or 8 or more Microblaze cores on a single
> > >> FPGA and do something clever with traditional SMP.
> > > 
> > > 
> > > Hey Greg, I delivered your Microblaze SMP hardware already remember?! :)
> > > 
> > > This would be the first SMP uClinux system right?  Let's do it!
> > 
> > It would. We had better get it finished before someone else
> > comes along with another SMP setup :-)
> > 
> > Greg

	Imagine a Beowulf cluster of these... And why not? That's what Linux
open source apps are for.


> John
> ___________________________
> microblaze-uclinux mailing list
> microblaze-uclinux@itee.uq.edu.au
> Project Home Page : http://www.itee.uq.edu.au/~jwilliams/mblaze-uclinux
> Mailing List Archive : http://www.itee.uq.edu.au/~listarch/microblaze-uclinux/
-- 


(C) Matthew Rubenstein