
RE: [microblaze-uclinux] kernel BUG at sched.c:687!



Hi,

> -----Original Message-----
> From: owner-microblaze-uclinux@xxxxxxxxxxxxxx
> [mailto:owner-microblaze-uclinux@xxxxxxxxxxxxxx]On Behalf Of George
> Smith
> Sent: Friday, February 17, 2006 4:47 PM
> To: MicroBlazeUcLinux
> Subject: RE: [microblaze-uclinux] kernel BUG at sched.c:687!
> 
> 
>  hi
>  I had the same crash when I ran "heavy" Ethernet traffic with the smsc
> driver. I assumed it was a driver problem, but hadn't had a chance to get
> back to it. Does your crash happen under heavy load?
I don't think CPU load is the problem, because the program keeps a permanent
load in the calculating thread anyway. What I have noticed is that a very high
number of thread switches and ISR calls makes it hit the bad time frame much
earlier. Here that happens mainly when I additionally use the interrupt-driven
USB 2.0 driver to transfer data, which means many more thread switches and
IRQs: several thread switches and switches into the driver occur within a
single millisecond.

Here's an example log (without USB transfer, but it still crashed). Each line
shows the microsecond counter and the thread name (IO = main thread polling
the drivers; IP = img proc thread; IPS = img proc starter). What I expect to
see next is the exit handler of the img proc thread (registered with
pthread_cleanup_push()) followed by the call to pthread_join(), but that
doesn't happen; instead Linux crashes in the scheduler again. :-(

My guess is that an ISR in the driver interrupts Linux during a thread switch
and confuses the scheduler while it is carrying out the pthread kill procedure.
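For reference, the user-space sequence described above (an exit handler registered with pthread_cleanup_push(), the thread killed on timeout, then pthread_join()) can be sketched as below. This is a minimal sketch under assumptions: it assumes the thread is stopped with pthread_cancel() under the default deferred cancellation type (the original program's kill code isn't shown), and the names worker, on_kill and cancel_after_timeout are hypothetical. It only illustrates the expected ordering; it does not reproduce the kernel BUG.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>
#include <unistd.h>

/* Set by the cleanup handler; read only after pthread_join(), which
   synchronizes with the terminated thread. */
static int cleanup_ran = 0;

/* Cleanup ("exit") handler: runs when the thread is cancelled while it is
   pushed, e.g. to release buffers or unlock mutexes the worker holds. */
static void on_kill(void *arg)
{
    (void)arg;
    cleanup_ran = 1;
}

/* Hypothetical stand-in for the image-processing thread. */
static void *worker(void *arg)
{
    (void)arg;
    pthread_cleanup_push(on_kill, NULL);
    for (;;)
        sleep(1);             /* sleep() is a cancellation point */
    pthread_cleanup_pop(0);   /* unreachable, but must pair with push */
    return NULL;
}

/* Start the worker, cancel it after timeout_ms, and reap it.
   Returns 1 if the thread was cancelled and its handler ran. */
static int cancel_after_timeout(long timeout_ms)
{
    pthread_t tid;
    void *res = NULL;
    struct timespec ts = { timeout_ms / 1000, (timeout_ms % 1000) * 1000000L };
    int rc;

    cleanup_ran = 0;
    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return 0;
    nanosleep(&ts, NULL);        /* stand-in for the controller's timeout */
    rc = pthread_cancel(tid);    /* deferred: acted on at a cancellation point */
    assert(rc == 0);
    pthread_join(tid, &res);     /* returns only after the handler has run */
    return res == PTHREAD_CANCELED && cleanup_ran;
}
```

Because pthread_cleanup_push() is not itself a cancellation point, a cancel request that arrives early stays pending until sleep(), so the handler is always registered before it can fire.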

---snip---
681824  : IO : next aquBuf (07010707)
681931  : IO : TRG enabled
682064  : IO : trigger
682198  : IO : onTrigger(1)
682326  : IO : TRG disabled
682833  : IO : waiting on poll()...
683422  : IP : calling img_proc_run()
691583  : IO : returned from poll() 0x00000002
691783  : IO : IRQ: aquFinished
691966  : IO : waiting on poll()...
694744  : IO : returned from poll() 0x00000004
694934  : IO : onFexResult(1) (07010707)
695126  : IO : startImgProc(1)
695459  : IO : waiting on poll()...
696299  : IP : awaking
699041  : IO : returned from poll() 0x00000200
699232  : IO : timer max. output (1)
699330  : IO : WARNING: timeout for img proc thread, buf=1
699588  : IO : killing img proc thread.
699771  : IO : onResultHandling(1)
700007  : IO : save(1)
700603  : IP : exit handler: priv_img_proc_onKill
701165  : IO : waiting on poll()...
703811  : IPS: returned from pthread_join
707384  : IPS: calling pthread_join
709727  : IO : returned from poll() 0x00000010
709938  : IO : output-pins off
710101  : IO : next aquBuf (07070107)
710212  : IO : TRG enabled
710348  : IO : trigger
710483  : IO : onTrigger(2)
710612  : IO : TRG disabled
710817  : IO : waiting on poll()...
711383  : IP : calling img_proc_run()
719863  : IO : returned from poll() 0x00000002
720069  : IO : IRQ: aquFinished
720244  : IO : waiting on poll()...
723193  : IO : returned from poll() 0x00000004
723384  : IO : onFexResult(2) (07070107)
723577  : IO : startImgProc(2)
723907  : IO : waiting on poll()...
724743  : IP : awaking
726074  : IO : returned from poll() 0x00000020
726308  : IO : waiting on poll()...
727333  : IO : returned from poll() 0x00000400
727526  : IO : timer max. output (2)
727625  : IO : WARNING: timeout for img proc thread, buf=2
727882  : IO : killing img proc thread.
728064  : IO : onResultHandling(2)
728302  : IO : save(2)
374491  : IO : waiting on poll()...
375092  : IO : returned from poll() 0x00000030
375299  : IO : output-pins off
375467  : IO : next aquBuf (07070701)
375578  : IO : TRG enabled
375715  : IO : trigger
375850  : IO : onTrigger(3)
375979  : IO : TRG disabled
376195  : IO : waiting on poll()...
kernel BUG at sched.c:687!

-----------------
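The IO thread in the log alternates between "waiting on poll()..." and "returned from poll() 0x...". A minimal sketch of such a wait follows, assuming a single file descriptor opened on the driver and POLLIN as the event of interest. The driver's real event masks aren't given in this thread, and the hex values in the log may well be the program's own status bits rather than poll() revents; wait_for_event is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>   /* pipe(), write(), close() for a hardware-free test */

/* Wait up to timeout_ms (-1 = forever) for fd to report an event.
   Returns the revents mask, 0 on timeout, or -1 on error. */
static int wait_for_event(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int n;

    assert(timeout_ms >= -1);
    pfd.fd = fd;
    pfd.events = POLLIN;      /* assumption: the driver signals events as "readable" */
    pfd.revents = 0;

    do {
        n = poll(&pfd, 1, timeout_ms);
    } while (n < 0 && errno == EINTR);  /* a signal interrupted the wait: retry */

    if (n < 0)
        return -1;
    return n == 0 ? 0 : pfd.revents;
}
```

Without hardware, the read end of a pipe() can stand in for the driver: wait_for_event() returns 0 while the pipe is empty and a mask with POLLIN set once a byte has been written. Note that retrying after EINTR restarts the full timeout; a stricter loop would track the remaining time.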

Cheers,
F@lk


> 
> gesmith
> 
> 
> On Fri, 2006-02-17 at 09:08, Brettschneider Falk wrote:
> > Hi,
> > 
> > As soon as I strip it down to a test case, I don't hit the bad time frame
> > anymore.
> > The kernel is from mid-November 2005, after you changed the entry.S
> > stuff. I haven't seen any changes in arch/microblaze/kernel since then, and
> > sched.c hasn't been changed in ages.
> > 
> > Now I sometimes also see the output:
> > kernel BUG at sched.c:562!
> > 
> > Furthermore, I've seen the program crash where the last output was logging
> > from a function of my program that made no sense at that point, instead
> > of a thread switch.
> > 
> > Cheers,
> > F@lk
> > 
> > 
> > > -----Original Message-----
> > > From: owner-microblaze-uclinux@xxxxxxxxxxxxxx
> > > [mailto:owner-microblaze-uclinux@xxxxxxxxxxxxxx]On Behalf Of John
> > > Williams
> > > Sent: Friday, February 17, 2006 9:02 AM
> > > To: microblaze-uclinux@xxxxxxxxxxxxxx
> > > Subject: Re: [microblaze-uclinux] kernel BUG at sched.c:687!
> > > 
> > > 
> > > Hi Falk,
> > > 
> > > If you can produce a minimal test case that demonstrates the behaviour,
> > > I'll take a look at it. I realise this may be tricky, but isolating it
> > > is half the battle in fixing it.
> > > 
> > > I believe the platform context switch code to be correct. There was a
> > > bug fix that I checked in in Nov/Dec last year; I assume you are
> > > running the latest kernel sources?
> > > 
> > > Regards,
> > > 
> > > John
> > > 
> > > Brettschneider Falk wrote:
> > > > Hi,
> > > > I have 4 pthreads with SCHED_RR and different priorities, and when a
> > > > certain thread is killed due to a timeout, I sometimes get
> > > > 	kernel BUG at sched.c:687!
> > > > and otherwise usually just a Linux crash. This only happens if many
> > > > context switches occur while that pthread is being killed. All threads
> > > > make heavy use of mutexes and semaphores, and one of them uses poll()
> > > > to wait for events from a kernel driver.
> > > > 
> > > > Are you sure the MicroBlaze platform code for context switches is
> > > > really OK?
> > > > Otherwise, I wish we were using kernel 2.6.
> > > > 
> > > > At one point I thought I had worked around such scheduler problems by
> > > > using pthread_join() to catch a dying thread, but now I often hit a
> > > > time frame where that doesn't help either.
> > > > 
> > > > I've been trying out workaround ideas for the last 5 days, but now I'm
> > > > stuck again. *sigh*
> > > > 
> > > > Cheers
> > > > F@lk
> > > > ___________________________
> > > > microblaze-uclinux mailing list
> > > > microblaze-uclinux@xxxxxxxxxxxxxx
> > > > Project Home Page : 
> > http://www.itee.uq.edu.au/~jwilliams/mblaze-uclinux
> > > Mailing List Archive :
> > http://www.itee.uq.edu.au/~listarch/microblaze-uclinux/
> > 
> -- 
> 
> George Smith
> VP Engineering
> Linear Acoustic, Inc
> 
> 
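The original report describes four SCHED_RR pthreads with different priorities. As a sketch only (Linux/glibc assumed; init_rr_attr is a hypothetical helper), the attributes for one such thread can be prepared as below. On Linux the default inherit-scheduler setting is PTHREAD_INHERIT_SCHED, in which case the policy and priority stored in the attribute object are ignored, so PTHREAD_EXPLICIT_SCHED is set explicitly. Actually creating a thread with these attributes usually requires real-time privileges and fails with EPERM otherwise.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Fill *attr so a thread created with it runs under SCHED_RR at the given
   priority, clamped to the range the system reports.
   Returns 0 on success or an errno-style error code. */
static int init_rr_attr(pthread_attr_t *attr, int prio)
{
    int lo = sched_get_priority_min(SCHED_RR);
    int hi = sched_get_priority_max(SCHED_RR);
    struct sched_param sp;
    int rc;

    assert(attr != NULL);
    if (prio < lo) prio = lo;
    if (prio > hi) prio = hi;
    sp.sched_priority = prio;

    if ((rc = pthread_attr_init(attr)) != 0)
        return rc;
    /* Policy is set before the priority so the priority is checked
       against the SCHED_RR range, not the default policy's. */
    if ((rc = pthread_attr_setinheritsched(attr, PTHREAD_EXPLICIT_SCHED)) != 0 ||
        (rc = pthread_attr_setschedpolicy(attr, SCHED_RR)) != 0 ||
        (rc = pthread_attr_setschedparam(attr, &sp)) != 0) {
        pthread_attr_destroy(attr);
        return rc;
    }
    return 0;
}
```

The attribute object would then be passed to pthread_create() and destroyed afterwards; giving each of the four threads its own priority is just a matter of calling the helper with different values.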