
Re: [microblaze-uclinux] Patch for stability under heavy IRQ load



Hi John,

I'm going to apply the patch since I have this problem, but I don't understand
why you say r12 was not restored. I'm working with an older version of
entry.S, but I was sure this was done, so I downloaded the latest entry.S, and
it does restore r12, as I thought:
 
TRAP_STATE_RESTORER->RESTORE_CALL_CLOBBERED_REGS_NO_RVAL->RESTORE_CALL_CLOBBERED_REGS_AFTER_RVAL

I have not had much time lately, but I had found that the problem always
occurred when returning from sys_execve and, at first sight, without signals
interfering.

I'll report back with my results.

Thanks and regards. 

On Monday 28 August 2006 00:37, John Williams wrote:
> Hi everyone,
>
> In the course of debugging some issues for a PetaLogix customer, I've
> identified a few subtle kernel bugs in the signal handling code that were
> causing hangs/bad behaviour under very high interrupt loads (1 kHz and
> greater).
>
> Several people have reported such issues in the past, but it took a week of
> focussed debugging to track down exactly what was wrong.
>
> To that end, I'd appreciate it if those who have experienced issues could
> please try the attached patch, which will go cleanly over any recent CVS
> kernel. It would also be helpful for any others to try it, and make sure it
> doesn't break anything for you (shouldn't do, but the more testing the
> better!).
>
> For those interested, the bug manifested itself in two ways:
>
> 1. If an interrupt occurred between the delivery of a signal to a process,
> and the actual handling of that signal (two separate things in the kernel),
> then the signal handler would be called at an address that was 4 bytes
> "short" of the proper address.  This would typically cause a stack
> corruption, which sometimes was fatal, and sometimes not.
>
> The fix was to unify handling of return address offsets between interrupts
> and system calls - by adding 4 to the "old PC" address on entry to a
> syscall, the return from interrupts and system calls could be unified to
> have the same offset - "rtbd r14, 0" and "rtid r14, 0".
>
> 2. In the syscall restart code, the syscall number was being saved back
> onto the stack frame in the saved "r12" slot, however r12 was not actually
> being restored in the syscall return path.  It still worked due to a subtle
> interaction with bug #1 above, again except under high IRQ loads when there
> was a narrow interrupt window that could cause problems.
>
> Cheers,
>
> John
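
For anyone following along, here is a rough C model of the offset mismatch described in point 1 above. It is only a sketch and not the kernel code: all the function names are made up, and the "old" behaviour (syscall exit adding 4, interrupt exit adding 0, and the signal setup compensating by 4) is an assumption inferred from the description, not read from entry.S.

```c
#include <stdint.h>

/* Which kernel exit path resumes the user process (hypothetical names). */
enum exit_path { EXIT_SYSCALL, EXIT_INTERRUPT };

/* ASSUMED old scheme: the syscall exit adds 4 to the saved PC,
 * the interrupt exit adds 0. */
static uint32_t old_resume_pc(uint32_t saved_pc, enum exit_path path)
{
    return saved_pc + (path == EXIT_SYSCALL ? 4u : 0u);
}

/* ASSUMED old signal setup: stores the handler address minus 4,
 * expecting the syscall exit to add the 4 back. */
static uint32_t old_signal_pc(uint32_t handler)
{
    return handler - 4u;
}

/* Unified scheme from the post: on syscall entry, 4 is added to the
 * "old PC" before it is saved. */
static uint32_t new_syscall_entry_pc(uint32_t trap_pc)
{
    return trap_pc + 4u;
}

/* Unified scheme: both exits use offset 0
 * ("rtbd r14, 0" and "rtid r14, 0"), so the path no longer matters. */
static uint32_t new_resume_pc(uint32_t saved_pc, enum exit_path path)
{
    (void)path;
    return saved_pc;
}

/* Unified scheme: the signal setup can store the handler address as-is. */
static uint32_t new_signal_pc(uint32_t handler)
{
    return handler;
}
```

In this model, old_resume_pc(old_signal_pc(h), EXIT_INTERRUPT) lands at h - 4, which matches the "4 bytes short" symptom, while new_resume_pc(new_signal_pc(h), ...) gives h on either exit path.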

-- 
Alejandro Lucero
Director Técnico
+34 665 68 71 68
Valencia (SPAIN)
www.os3sl.com

___________________________
microblaze-uclinux mailing list
microblaze-uclinux@xxxxxxxxxxxxxx
Project Home Page : http://www.itee.uq.edu.au/~jwilliams/mblaze-uclinux
Mailing List Archive : http://www.itee.uq.edu.au/~listarch/microblaze-uclinux/