[microblaze-uclinux] bad page fault kernel panics
Hi,
I'm using the PetaLogix subversion snapshot with MMU support enabled on
the Spartan-3A DSP 1800 Board with MicroBlaze 7.10.d, and I keep
running into bad page fault panics at seemingly random times -
sometimes during boot, sometimes later. I have since found a way
to force this panic by running
# while [ -d / ] ; do ls ; done
which ends in said bad page fault panic sooner or later (usually
within the first minute of running).
Registers, Stack and Call Trace look like this:
Stack:
c04cce68 c04d43a0 c04cced0 c7df52b4 00000000 00000000 00000000 c0227a7c
00000000 c0227b4c 00000001 00000007 00010000 00000001 c01ee55c 00000001
c0226000 00000000 00000008 00000000 00000000 c0019ba4 00000000 00000000
Call Trace:
[<c0019ba4>] update_process_times+0x94/0xa0
[<c00098c0>] account_system_time+0x10/0x16c
[<c0001c20>] timer_interrupt+0x60/0x98
[<c002cf64>] handle_IRQ_event+0x54/0xb4
[<c0001c2c>] timer_interrupt+0x6c/0x98
[<c002d070>] __do_IRQ+0xac/0x140
[<c0068334>] new_inode+0xc/0x9c
[<c002cf64>] handle_IRQ_event+0x54/0xb4
[<c00019fc>] do_IRQ+0x3c/0x94
[<c00880c0>] pid_revalidate+0x24/0x10c
[<c0001bfc>] timer_interrupt+0x3c/0x98
[<c0059428>] do_lookup+0x84/0x1e4
[<c0005ec4>] irq_call+0x0/0x8
[<c005b2f0>] __link_path_walk+0x844/0xee8
[<c005b394>] __link_path_walk+0x8e8/0xee8
[<c0089d08>] proc_pid_lookup+0x30/0x1d4
[<c00f6e4c>] number+0x2c0/0x3f4
[<c00f7a98>] __umodsi3+0x8c/0xbc
[<c005b2f0>] __link_path_walk+0x844/0xee8
[<c00f730c>] vsnprintf+0x38c/0x6dc
[<c0001bfc>] timer_interrupt+0x3c/0x98
[<c005ba94>] link_path_walk+0x100/0x28c
[<c0019b84>] update_process_times+0x74/0xa0
[<c00f7730>] sprintf+0x30/0x44
[<c0001c2c>] timer_interrupt+0x6c/0x98
[<c0088a1c>] proc_self_follow_link+0x28/0x54
[<c006855c>] touch_atime+0xc0/0x150
[<c005ad64>] __link_path_walk+0x2b8/0xee8
[<c005ba94>] link_path_walk+0x100/0x28c
[<c00074d4>] do_page_fault+0x2e0/0x444
[<c005bea0>] do_path_lookup+0xac/0x2a0
[<c0005ae8>] page_fault_instr_trap+0x1e8/0x1f0
[<c0056cc4>] do_execve+0x3c/0x238
[<c005cdf0>] __path_lookup_intent_open+0x64/0xd8
[<c005cdc4>] __path_lookup_intent_open+0x38/0xd8
[<c005cf00>] path_lookup_open+0x8/0x1c
[<c0056694>] open_exec+0x28/0x108
[<c0056cd8>] do_execve+0x50/0x238
[<c0005240>] sys_execve_wrapper+0x0/0x10
[<c0056cc4>] do_execve+0x3c/0x238
[<c0001f9c>] sys_execve+0x54/0x94
[<c0001f6c>] sys_execve+0x24/0x94
[<c0004f78>] sc+0x10/0x18
[<c0004f78>] sc+0x10/0x18
Oops: kernel access of bad area, sig: 11
Registers dump: mode=1
r1=C02279CC, r2=00000000, r3=00000861, r4=0000010F
r5=00000007, r6=00000800, r7=C02279E4, r8=00000018
r9=00000001, r10=C0226000, r11=000041AA, r12=C0003AF0
r13=00000000, r14=C022796C, r15=C0005AE8, r16=49CAF500
r17=00000001, r18=00000000, r19=00000007, r20=4817FFF4
r21=00000000, r22=00000000, r23=00000000, r24=00000001
r25=00000000, r26=FFFFFFFF, r27=00000002, r28=0000000A
r29=00000003, r30=0000000E, r31=C03287B0, rPC=C0003B08
msr=000041AA, ear=0000010F, esr=000000B2, fsr=C6AA9E80
Kernel panic - not syncing: Aiee, killing interrupt handler!
<0>Rebooting in 120 seconds..
The Program Counter is always pointing to the same address, which is
inside the _unaligned_data_exception code as objdump shows:
c0003af0 <_unaligned_data_exception>:
c0003af0: a50303e0 andi r8, r3, 992
c0003af4: 65080002 bsrli r8, r8, 2
c0003af8: a4c30400 andi r6, r3, 1024
c0003afc: be260068 bneid r6, 104 // c0003b64
c0003b00: a4c30800 andi r6, r3, 2048
c0003b04 <ex_lw_vm>:
c0003b04: be060034 beqid r6, 52 // c0003b38
c0003b08: e0a40000 lbui r5, r4, 0 <--- here it is
c0003b0c: b000c01e imm -16354
c0003b10: 30c0ce40 addik r6, r0, -12736
c0003b14: f0a60000 sbi r5, r6, 0
c0003b18: e0a40001 lbui r5, r4, 1
c0003b1c: f0a60001 sbi r5, r6, 1
c0003b20: e0a40002 lbui r5, r4, 2
c0003b24: f0a60002 sbi r5, r6, 2
c0003b28: e0a40003 lbui r5, r4, 3
c0003b2c: f0a60003 sbi r5, r6, 3
c0003b30: b8100020 brid 32 // c0003b50
c0003b34: e8660000 lwi r3, r6, 0
I'm a bit confused: the ESR value is 0xb2, which indicates a Data TLB
Miss Exception, but the PC is stuck inside the handler function for the
Unaligned Data Exception. That in turn might be caused by the 0x10f in
r4, which shouldn't be a valid address anyway (other times I got 0xff,
0x10b, 0x102, 0xffffff79 etc. in r4).
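To make the register values easier to read, here is a minimal Python sketch that decodes an ESR value. The three field masks are taken directly from the andi instructions in the handler listing above (992, 1024, 2048); the position of the exception cause field (low five bits) and the two cause codes are assumptions based on my reading of the MicroBlaze reference guide, and decode_esr is just a hypothetical helper name. Decoding r3=0x861 from the dump gives an unaligned word load, and the values left in r8 (0x18) and r6 (0x800) match what the first handler instructions would compute from it. That would be consistent with an unaligned access at 0x10f being trapped first, and the handler's own lbui on that address then raising the TLB miss reported in the ESR, but this is only an interpretation.

```python
# Field masks copied from the handler's own andi instructions.
ESR_RX_MASK = 0x3E0  # andi r8, r3, 992   (register field)
ESR_S_MASK = 0x400   # andi r6, r3, 1024  (store flag)
ESR_W_MASK = 0x800   # andi r6, r3, 2048  (word flag)

# Assumption: the exception cause sits in the low five bits of ESR.
ESR_EC_MASK = 0x1F

# Assumption: cause codes as I read them in the MicroBlaze reference guide.
EXCEPTION_CAUSES = {
    0x01: "unaligned data access",
    0x12: "data TLB miss",
}


def decode_esr(esr):
    """Split an ESR value into the fields used by the handler (hypothetical helper)."""
    cause = esr & ESR_EC_MASK
    return {
        "cause": cause,
        "cause_name": EXCEPTION_CAUSES.get(cause, "other"),
        "rx": (esr & ESR_RX_MASK) >> 5,
        "is_store": bool(esr & ESR_S_MASK),
        "is_word": bool(esr & ESR_W_MASK),
    }


# esr=000000B2 from the register dump.
current = decode_esr(0xB2)
# r3=00000861 from the register dump, which looks like an earlier ESR value.
original = decode_esr(0x861)

# What the first handler instructions would leave in r8 and r6 for r3=0x861.
r8_expected = (0x861 & ESR_RX_MASK) >> 2
r6_expected = 0x861 & ESR_W_MASK

if __name__ == "__main__":
    print("ESR 0xb2 :", current)
    print("r3 0x861 :", original)
    print("r8 = %#x, r6 = %#x" % (r8_expected, r6_expected))
```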
After the 120 seconds, when the kernel calls its reboot function, the
Stack and Call Trace look like the following, where you can see the
exception handler functions:
Stack:
c001ed7c c01ad2c4 00003c96 00001998 00003330 00008000 00000000 c000cc18
c000cbec 4817fff4 00000000 c03287b0 c0227934 0000000b c0010d6c c01ae8d8
00000078 c0375ac8 00000000 00005000 00000000 c02277c4 c0226000 0000000b
Call Trace:
[<c001ed7c>] emergency_restart+0xc/0x20
[<c000cc18>] panic+0x154/0x1dc
[<c000cbec>] panic+0x128/0x1dc
[<c0010d6c>] do_exit+0x624/0x93c
[<c0004060>] die+0x90/0x98
[<c000404c>] die+0x7c/0x98
[<c00071e8>] bad_page_fault+0xcc/0xd8
[<c0003b08>] ex_lw_vm+0x4/0x34
[<c0007328>] do_page_fault+0x134/0x444
[<c0003b08>] ex_lw_vm+0x4/0x34
[<c0005ae8>] page_fault_instr_trap+0x1e8/0x1f0
[<c00081dc>] task_running_tick+0x17c/0x2a8
[<c0003af0>] _unaligned_data_exception+0x0/0x14
[<c0005ae8>] page_fault_instr_trap+0x1e8/0x1f0
[<c0003b08>] ex_lw_vm+0x4/0x34
[<c0019ba4>] update_process_times+0x94/0xa0
[<c00098c0>] account_system_time+0x10/0x16c
[<c0001c20>] timer_interrupt+0x60/0x98
...
So my guess is that the actual panic is triggered by a missing entry in
the __ex_table section for that address (0xc0003b08), but I doubt that
this is the real cause: as mentioned above, the values in r4 don't look
like valid addresses. However, the Call Trace always starts with an
interrupt handling function. Currently it is always some timer interrupt
handler, but I also had a PS/2 keyboard driver, and pressing some keys
forced this panic as well, showing the keyboard ISR functions. To me it
looks like some bad interaction between page fault handling and
interrupts, but that's just another guess.
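For readers not familiar with the __ex_table mechanism, the following Python sketch models the general idea as I understand it from the generic Linux fault path: a sorted table of (faulting instruction, fixup) pairs is searched for the faulting PC, and only if no entry is found does the kernel end up in bad_page_fault. It is not the MicroBlaze port's actual code; the table contents and the function names search_extable and kernel_fault_outcome are hypothetical and only there for illustration.

```python
import bisect


def search_extable(table, pc):
    """Return the fixup address for pc, or None if pc has no entry.

    table is a list of (insn_addr, fixup_addr) tuples sorted by insn_addr.
    """
    keys = [insn for insn, _ in table]
    i = bisect.bisect_left(keys, pc)
    if i < len(table) and table[i][0] == pc:
        return table[i][1]
    return None


def kernel_fault_outcome(table, pc):
    """Model what happens to a kernel-mode fault at pc (simplified)."""
    fixup = search_extable(table, pc)
    if fixup is not None:
        return f"resume at {fixup:#x}"
    return "bad_page_fault"


# Hypothetical table contents; the real entries are not shown in this post.
HYPOTHETICAL_TABLE = [
    (0xC0100000, 0xC0100040),
    (0xC0100010, 0xC0100048),
]

if __name__ == "__main__":
    # The faulting PC from the register dump has no entry in this model table.
    print(kernel_fault_outcome(HYPOTHETICAL_TABLE, 0xC0003B08))
    # An address that does have an entry is redirected to its fixup code.
    print(kernel_fault_outcome(HYPOTHETICAL_TABLE, 0xC0100010))
```

In this model, adding an entry for 0xc0003b08 would only change how the fault is reported or recovered from; it would not explain why r4 holds an address like 0x10f in the first place.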
So my question is whether this is a known issue, or at least whether
someone else can reproduce it, or whether the problem might lie in my
hardware configuration. Any hint is welcome. If more information is
needed, let me know.
Thanks in advance,
Sven