RE: [microblaze-uclinux] kernel BUG at sched.c:687!
Hi,
> -----Original Message-----
> From: owner-microblaze-uclinux@xxxxxxxxxxxxxx
> [mailto:owner-microblaze-uclinux@xxxxxxxxxxxxxx]On Behalf Of George
> Smith
> Sent: Friday, February 17, 2006 4:47 PM
> To: MicroBlazeUcLinux
> Subject: RE: [microblaze-uclinux] kernel BUG at sched.c:687!
>
>
> hi
> I had the same crash when I ran "heavy" Ethernet traffic with the smsc
> driver. I was assuming this was a driver problem, but hadn't got a chance
> to get back to it. Does your crash happen under a heavy load?
I don't think CPU load is the problem, because the calculating thread keeps
the CPU permanently loaded here anyway. What I have seen is that a very high
number of thread switches and ISR calls triggers the bad time frame much
earlier. Here that is mainly the case when I additionally use the
interrupt-driven USB 2.0 driver to transfer data, which means many more
thread switches and IRQs. Several thread switches and switches into the
driver happen within a single millisecond.
Here's one example of the logging (without USB transfer, but still with the
crash). Each line shows the microsecond counter and the name of the thread
(IO = main thread polling the drivers; IP = img proc thread; IPS = img proc
starter). The expected behaviour is that the pthread exit handler (see
pthread_cleanup_push()) of the img proc thread comes up next, followed by
the call to pthread_join(), but that doesn't happen; instead Linux crashes
in the scheduler again. :-(
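For readers who want to reproduce the expected order outside the real
application, here is a minimal sketch of the kill sequence. It assumes the
thread is killed with pthread_cancel() (the thread above only says
"killing"), and on_kill(), img_proc_thread() and run_kill_sequence() are
hypothetical stand-ins, not the real program's functions.

```c
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* Set by the cleanup handler so the caller can verify that it ran. */
static volatile int cleanup_ran = 0;

/* Hypothetical stand-in for the exit handler seen in the log. */
static void on_kill(void *arg)
{
    (void)arg;
    cleanup_ran = 1;
    printf("IP : exit handler\n");
}

/* Hypothetical worker standing in for the img proc thread. */
static void *img_proc_thread(void *arg)
{
    (void)arg;
    pthread_cleanup_push(on_kill, NULL);
    for (;;) {
        /* Placeholder for long-running work; sleep() is a POSIX
         * cancellation point, so a pending cancel is acted on here. */
        sleep(1);
    }
    /* Never reached, but push and pop must pair in the same scope. */
    pthread_cleanup_pop(0);
    return NULL;
}

/* Assumption: the "kill" is a pthread_cancel() followed by pthread_join().
 * Returns 0 if the handler ran and the thread ended by cancellation,
 * 1 if the sequence was not as expected, -1 if a pthread call failed. */
int run_kill_sequence(void)
{
    pthread_t tid;
    void *result = NULL;

    cleanup_ran = 0;
    if (pthread_create(&tid, NULL, img_proc_thread, NULL) != 0)
        return -1;

    printf("IO : killing img proc thread.\n");
    if (pthread_cancel(tid) != 0)
        return -1;

    /* With deferred cancellation the request stays pending until the
     * worker reaches sleep(), i.e. after the handler has been pushed. */
    if (pthread_join(tid, &result) != 0)
        return -1;
    printf("IPS: returned from pthread_join\n");

    return (cleanup_ran && result == PTHREAD_CANCELED) ? 0 : 1;
}
```

Calling run_kill_sequence() from a main() should print the kill message,
then the exit handler, then the return from pthread_join(), which is the
order seen in the first iteration of the log. It says nothing about the
kernel side; it only pins down what user space expects to happen.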
I could imagine that an ISR in the driver interrupts Linux during a thread
switch and confuses the scheduler while it is trying to carry out the
pthread killing procedure.
---snip---
681824 : IO : next aquBuf (07010707)
681931 : IO : TRG enabled
682064 : IO : trigger
682198 : IO : onTrigger(1)
682326 : IO : TRG disabled
682833 : IO : waiting on poll()...
683422 : IP : calling img_proc_run()
691583 : IO : returned from poll() 0x00000002
691783 : IO : IRQ: aquFinished
691966 : IO : waiting on poll()...
694744 : IO : returned from poll() 0x00000004
694934 : IO : onFexResult(1) (07010707)
695126 : IO : startImgProc(1)
695459 : IO : waiting on poll()...
696299 : IP : awaking
699041 : IO : returned from poll() 0x00000200
699232 : IO : timer max. output (1)
699330 : IO : WARNING: timeout for img proc thread, buf=1
699588 : IO : killing img proc thread.
699771 : IO : onResultHandling(1)
700007 : IO : save(1)
700603 : IP : exit handler: priv_img_proc_onKill
701165 : IO : waiting on poll()...
703811 : IPS: returned from pthread_join
707384 : IPS: calling pthread_join
709727 : IO : returned from poll() 0x00000010
709938 : IO : output-pins off
710101 : IO : next aquBuf (07070107)
710212 : IO : TRG enabled
710348 : IO : trigger
710483 : IO : onTrigger(2)
710612 : IO : TRG disabled
710817 : IO : waiting on poll()...
711383 : IP : calling img_proc_run()
719863 : IO : returned from poll() 0x00000002
720069 : IO : IRQ: aquFinished
720244 : IO : waiting on poll()...
723193 : IO : returned from poll() 0x00000004
723384 : IO : onFexResult(2) (07070107)
723577 : IO : startImgProc(2)
723907 : IO : waiting on poll()...
724743 : IP : awaking
726074 : IO : returned from poll() 0x00000020
726308 : IO : waiting on poll()...
727333 : IO : returned from poll() 0x00000400
727526 : IO : timer max. output (2)
727625 : IO : WARNING: timeout for img proc thread, buf=2
727882 : IO : killing img proc thread.
728064 : IO : onResultHandling(2)
728302 : IO : save(2)
374491 : IO : waiting on poll()...
375092 : IO : returned from poll() 0x00000030
375299 : IO : output-pins off
375467 : IO : next aquBuf (07070701)
375578 : IO : TRG enabled
375715 : IO : trigger
375850 : IO : onTrigger(3)
375979 : IO : TRG disabled
376195 : IO : waiting on poll()...
kernel BUG at sched.c:687!
-----------------
Cheers,
F@lk
>
> gesmith
>
>
> On Fri, 2006-02-17 at 09:08, Brettschneider Falk wrote:
> > Hi,
> >
> > as soon as I strip it down to a test case I don't hit the bad time frame
> > anymore.
> > The kernel is from the middle of November 05, after you changed the
> > entry.S stuff. I haven't seen changes in arch/microblaze/kernel after
> > that. The file sched.c hasn't been changed in ages.
> >
> > Now I sometimes also see the output:
> > kernel BUG at sched.c:562!
> >
> > Furthermore, I've seen crashes of the program where the last output was
> > logging from a function of my program that made no logical sense at that
> > point, instead of a thread switch.
> >
> > Cheers,
> > F@lk
> >
> >
> > > -----Original Message-----
> > > From: owner-microblaze-uclinux@xxxxxxxxxxxxxx
> > > [mailto:owner-microblaze-uclinux@xxxxxxxxxxxxxx]On Behalf Of John
> > > Williams
> > > Sent: Friday, February 17, 2006 9:02 AM
> > > To: microblaze-uclinux@xxxxxxxxxxxxxx
> > > Subject: Re: [microblaze-uclinux] kernel BUG at sched.c:687!
> > >
> > >
> > > Hi Falk,
> > >
> > > If you can produce a minimal test case that demonstrates the behaviour,
> > > I'll take a look at it. I realise this may be tricky, but isolating it
> > > is half the battle, before fixing it.
> > >
> > > I believe the platform context switch code to be correct. There was a
> > > bug fix that I checked in in Nov/Dec last year - I assume you are
> > > running on latest kernel sources?
> > >
> > > Regards,
> > >
> > > John
> > >
> > > Brettschneider Falk wrote:
> > > > Hi,
> > > > I have 4 pthreads with SCHED_RR and different priorities, and when a
> > > > certain thread is killed due to a timeout, I sometimes get
> > > > kernel BUG at sched.c:687!
> > > > and usually just a Linux crash. This only happens if many context
> > > > switches happen during the killing of that pthread. All threads
> > > > heavily use mutexes and semaphores, and one of them uses poll() to
> > > > wait for events of a kernel driver.
> > > >
> > > > Are you sure the microblaze's platform code for context switches is
> > > > really OK?
> > > > Otherwise I wish we used kernel 2.6.
> > > >
> > > > Once I thought I had worked around such scheduler problems by using
> > > > pthread_join to catch a dying thread, but now I often hit a time
> > > > frame where that doesn't help either.
> > > >
> > > > I've played around with some ideas for a workaround for the last 5
> > > > days, but now I'm stranded again. *sigh*
> > > >
> > > > Cheers
> > > > F@lk
> > > > ___________________________
> > > > microblaze-uclinux mailing list
> > > > microblaze-uclinux@xxxxxxxxxxxxxx
> > > > Project Home Page :
> > http://www.itee.uq.edu.au/~jwilliams/mblaze-uclinux
> > > Mailing List Archive :
> > http://www.itee.uq.edu.au/~listarch/microblaze-uclinux/
> >
> --
>
> George Smith
> VP Engineering
> Linear Acoustic, Inc
>
___________________________
microblaze-uclinux mailing list
microblaze-uclinux@xxxxxxxxxxxxxx
Project Home Page : http://www.itee.uq.edu.au/~jwilliams/mblaze-uclinux
Mailing List Archive : http://www.itee.uq.edu.au/~listarch/microblaze-uclinux/