Re: [microblaze-uclinux] kernel BUG at sched.c:687!
Hi,
I also get "kernel BUG at sched.c:562" after the system has run for a few
hours. Unfortunately I couldn't characterize it, as it appears randomly
and I couldn't reproduce the problem with a test case.
How can I debug the system when it hangs at this point? Does the magic
SysRq key work on uClinux?
What typically causes this "kernel BUG at sched.c:562"?
Thanks
- Prasad
On 2/20/06, Brettschneider Falk <fbrettschneider@xxxxxxxxxxxxxxxxx> wrote:
> Hi,
>
> > -----Original Message-----
> > From: owner-microblaze-uclinux@xxxxxxxxxxxxxx
> > [mailto:owner-microblaze-uclinux@xxxxxxxxxxxxxx]On Behalf Of George
> > Smith
> > Sent: Friday, February 17, 2006 4:47 PM
> > To: MicroBlazeUcLinux
> > Subject: RE: [microblaze-uclinux] kernel BUG at sched.c:687!
> >
> >
> > hi
> > I had the same crash when I ran "heavy" ethernet traffic with the smsc
> > driver. I was assuming this was a driver problem, but haven't had a chance
> > to get back to it. Does your crash happen under a heavy load?
> I think the CPU load isn't the problem, because the program has a permanent
> load in the calculating thread here. In my experience, a very high number of
> thread switches and ISR calls triggers the bad time frame much earlier,
> mainly when I additionally use the interrupt-driven USB 2.0 driver to
> transfer data. This means many more thread switches and IRQs: several thread
> switches and switches to the driver happen within 1 millisecond.
> Here's one example of logging (without USB transfer, but still with the
> crash). You can see the microsecond counter and the corresponding thread name
> (IO = main thread polling the drivers; IP = img proc thread; IPS = img proc
> starter). The expected behaviour is that I should see the pthread exit
> handler (see pthread_cleanup_push()) of the img proc thread coming up next,
> followed by the call to pthread_join(), but that doesn't happen; instead
> Linux crashes in the scheduler again. :-(
> I could imagine that an ISR in the driver interrupts Linux during a
> thread switch and confuses the scheduler, which is just then trying to
> carry out the pthread killing procedure.
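As a point of reference for the sequence described above (exit handler first,
then pthread_join() returning), here is a minimal user-space sketch in C. It is
not Falk's code: worker(), on_kill() and kill_and_join() are hypothetical
names, and it assumes the thread is killed with pthread_cancel() under the
default deferred cancellation, which the messages do not actually state.

```c
#include <pthread.h>
#include <time.h>

/* Set by the cleanup handler so the caller can see that it ran.
 * Hypothetical stand-in for the "exit handler" seen in the log. */
static int cleanup_ran = 0;

static void on_kill(void *arg)
{
    (void)arg;
    cleanup_ran = 1;
}

/* Hypothetical worker: runs until it is cancelled. */
static void *worker(void *arg)
{
    struct timespec nap = { 0, 1000000 };   /* 1 ms */

    (void)arg;
    pthread_cleanup_push(on_kill, NULL);
    for (;;) {
        pthread_testcancel();       /* explicit cancellation point */
        nanosleep(&nap, NULL);      /* also a cancellation point */
    }
    pthread_cleanup_pop(0);         /* never reached; must pair with push */
    return NULL;
}

/* Starts the worker, cancels it and joins it.
 * Returns 0 if the thread ended by cancellation and the handler ran. */
static int kill_and_join(void)
{
    pthread_t tid;
    void *result = NULL;

    cleanup_ran = 0;
    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;
    if (pthread_cancel(tid) != 0)
        return -1;
    if (pthread_join(tid, &result) != 0)    /* waits for the handler too */
        return -1;
    return (result == PTHREAD_CANCELED && cleanup_ran) ? 0 : -1;
}
```

Calling kill_and_join() in a loop from a small main() might be one starting
point for a stripped-down test case, although nothing here suggests it would
reproduce the scheduler BUG by itself.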
>
> ---snip---
> 681824 : IO : next aquBuf (07010707)
> 681931 : IO : TRG enabled
> 682064 : IO : trigger
> 682198 : IO : onTrigger(1)
> 682326 : IO : TRG disabled
> 682833 : IO : waiting on poll()...
> 683422 : IP : calling img_proc_run()
> 691583 : IO : returned from poll() 0x00000002
> 691783 : IO : IRQ: aquFinished
> 691966 : IO : waiting on poll()...
> 694744 : IO : returned from poll() 0x00000004
> 694934 : IO : onFexResult(1) (07010707)
> 695126 : IO : startImgProc(1)
> 695459 : IO : waiting on poll()...
> 696299 : IP : awaking
> 699041 : IO : returned from poll() 0x00000200
> 699232 : IO : timer max. output (1)
> 699330 : IO : WARNING: timeout for img proc thread, buf=1
> 699588 : IO : killing img proc thread.
> 699771 : IO : onResultHandling(1)
> 700007 : IO : save(1)
> 700603 : IP : exit handler: priv_img_proc_onKill
> 701165 : IO : waiting on poll()...
> 703811 : IPS: returned from pthread_join
> 707384 : IPS: calling pthread_join
> 709727 : IO : returned from poll() 0x00000010
> 709938 : IO : output-pins off
> 710101 : IO : next aquBuf (07070107)
> 710212 : IO : TRG enabled
> 710348 : IO : trigger
> 710483 : IO : onTrigger(2)
> 710612 : IO : TRG disabled
> 710817 : IO : waiting on poll()...
> 711383 : IP : calling img_proc_run()
> 719863 : IO : returned from poll() 0x00000002
> 720069 : IO : IRQ: aquFinished
> 720244 : IO : waiting on poll()...
> 723193 : IO : returned from poll() 0x00000004
> 723384 : IO : onFexResult(2) (07070107)
> 723577 : IO : startImgProc(2)
> 723907 : IO : waiting on poll()...
> 724743 : IP : awaking
> 726074 : IO : returned from poll() 0x00000020
> 726308 : IO : waiting on poll()...
> 727333 : IO : returned from poll() 0x00000400
> 727526 : IO : timer max. output (2)
> 727625 : IO : WARNING: timeout for img proc thread, buf=2
> 727882 : IO : killing img proc thread.
> 728064 : IO : onResultHandling(2)
> 728302 : IO : save(2)
> 374491 : IO : waiting on poll()...
> 375092 : IO : returned from poll() 0x00000030
> 375299 : IO : output-pins off
> 375467 : IO : next aquBuf (07070701)
> 375578 : IO : TRG enabled
> 375715 : IO : trigger
> 375850 : IO : onTrigger(3)
> 375979 : IO : TRG disabled
> 376195 : IO : waiting on poll()...
> kernel BUG at sched.c:687!
>
> -----------------
>
> Cheers,
> F@lk
>
>
> >
> > gesmith
> >
> >
> > On Fri, 2006-02-17 at 09:08, Brettschneider Falk wrote:
> > > Hi,
> > >
> > > as soon as I strip it down to a test case I don't hit the bad time frame
> > > anymore.
> > > The kernel is from the middle of November 05, after you changed the entry.S
> > > stuff. I haven't seen changes in arch/microblaze/kernel after that. The file
> > > sched.c hasn't been changed in ages.
> > >
> > > Now I sometimes also see the output:
> > > kernel BUG at sched.c:562!
> > >
> > > Furthermore I've seen crashes of the program where the last output was
> > > logging from a function of my program that made no logical sense at that
> > > point, instead of the expected thread switch.
> > >
> > > Cheers,
> > > F@lk
> > >
> > >
> > > > -----Original Message-----
> > > > From: owner-microblaze-uclinux@xxxxxxxxxxxxxx
> > > > [mailto:owner-microblaze-uclinux@xxxxxxxxxxxxxx]On Behalf Of John
> > > > Williams
> > > > Sent: Friday, February 17, 2006 9:02 AM
> > > > To: microblaze-uclinux@xxxxxxxxxxxxxx
> > > > Subject: Re: [microblaze-uclinux] kernel BUG at sched.c:687!
> > > >
> > > >
> > > > Hi Falk,
> > > >
> > > > > If you can produce a minimal test case that demonstrates the behaviour,
> > > > > I'll take a look at it. I realise this may be tricky, but isolating it
> > > > > is half the battle, before fixing it.
> > > > >
> > > > > I believe the platform context switch code to be correct. There was a
> > > > > bug fix that I checked in in Nov/Dec last year - I assume you are
> > > > > running on latest kernel sources?
> > > >
> > > > Regards,
> > > >
> > > > John
> > > >
> > > > Brettschneider Falk wrote:
> > > > > Hi,
> > > > > > I have 4 pthreads with SCHED_RR and different priorities, and when a
> > > > > > certain thread is killed due to a timeout, I sometimes get
> > > > > > kernel BUG at sched.c:687!
> > > > > > and usually just a Linux crash. This only happens if many context
> > > > > > switches occur during the killing of that pthread. All threads heavily
> > > > > > use mutexes and semaphores, and one of them uses poll() to wait for
> > > > > > events from a kernel driver.
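For readers unfamiliar with the setup described above, a thread that requests
SCHED_RR at a fixed priority is typically configured through a pthread_attr_t.
The following is only a sketch: make_rr_attr() is a hypothetical helper and is
not taken from Falk's program. On Linux, actually creating a thread with such
attributes generally requires real-time scheduling privileges, so the sketch
stops at preparing the attributes.

```c
#include <pthread.h>
#include <sched.h>
#include <string.h>

/* Hypothetical helper: initialises *attr so that a thread created with it
 * requests the SCHED_RR policy at the given priority.
 * Returns 0 on success, otherwise the error number from the failing call. */
static int make_rr_attr(pthread_attr_t *attr, int priority)
{
    struct sched_param param;
    int err;

    err = pthread_attr_init(attr);
    if (err != 0)
        return err;

    /* Without PTHREAD_EXPLICIT_SCHED the new thread would inherit the
     * creating thread's policy and the settings below would be ignored. */
    err = pthread_attr_setinheritsched(attr, PTHREAD_EXPLICIT_SCHED);
    if (err == 0)
        err = pthread_attr_setschedpolicy(attr, SCHED_RR);
    if (err == 0) {
        memset(&param, 0, sizeof(param));
        param.sched_priority = priority;
        err = pthread_attr_setschedparam(attr, &param);
    }
    if (err != 0)
        pthread_attr_destroy(attr);
    return err;
}
```

The valid priority range can be queried with sched_get_priority_min(SCHED_RR)
and sched_get_priority_max(SCHED_RR); the prepared attribute is then passed as
the second argument of pthread_create().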
> > > > >
> > > > > > Are you sure the microblaze's platform code for context switches is
> > > > > > really OK?
> > > > > > Otherwise I wish we used kernel 2.6.
> > > > > >
> > > > > > I once thought I had worked around such scheduler problems by using
> > > > > > pthread_join to catch a dying thread, but now I often hit a time frame
> > > > > > where that doesn't help either.
> > > > > >
> > > > > > I've played around with some workaround ideas for the last 5 days, but
> > > > > > now I'm stranded again. *sigh*
> > > > >
> > > > > Cheers
> > > > > F@lk
> > > > > > ___________________________
> > > > > > microblaze-uclinux mailing list
> > > > > > microblaze-uclinux@xxxxxxxxxxxxxx
> > > > > > Project Home Page : http://www.itee.uq.edu.au/~jwilliams/mblaze-uclinux
> > > > > > Mailing List Archive : http://www.itee.uq.edu.au/~listarch/microblaze-uclinux/
> > >
> > --
> >
> > George Smith
> > VP Engineering
> > Linear Acoustic, Inc
> >
> >