Re: [microblaze-uclinux] Kernel module driver crash



If this is not too far off topic, here are some simple steps to trigger the errors I was talking about.

1) transfer some data from the PC to the board
2) transfer some data from the board to the PC

For 1) I run, on the board: "while true; do wget http://<server_addr>/image.ub; done"
For 2) I run, on the PC: "while true; do rm <somefile>*; wget http://<board_addr>/<somefile>; done"

I use boa as the web server on the board and put the <somefile> I want to transfer in /home/httpd (a symlink should suffice).
The two tasks run simultaneously.
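
The two loops above could be wrapped in small shell functions, one per machine. This is only a sketch under stated assumptions: the function names, the FETCH override and the pass-count argument are illustrative additions that the original one-liners do not have, and SERVER_ADDR, BOARD_ADDR and FILE_NAME stand in for <server_addr>, <board_addr> and <somefile>.

```shell
#!/bin/sh
# Illustrative sketch of the two transfer loops (POSIX sh).
# Assumptions: SERVER_ADDR, BOARD_ADDR and FILE_NAME are placeholders for
# <server_addr>, <board_addr> and <somefile>; FETCH and the pass count are
# hypothetical conveniences that the original one-liners do not have.
FETCH="${FETCH:-wget}"   # download command, overridable for a dry run

# Task 1, run on the board: fetch image.ub from the PC "$1" times.
pc_to_board() {
    passes=0
    while [ "$passes" -lt "$1" ]; do
        $FETCH "http://${SERVER_ADDR:?}/image.ub"
        passes=$((passes + 1))
    done
}

# Task 2, run on the PC: fetch the file served by boa on the board "$1" times.
# As in the original command, old copies are removed before each download
# (GNU wget would otherwise save somefile.1, somefile.2, ...).
board_to_pc() {
    passes=0
    while [ "$passes" -lt "$1" ]; do
        rm -f "${FILE_NAME:?}"*
        $FETCH "http://${BOARD_ADDR:?}/${FILE_NAME}"
        passes=$((passes + 1))
    done
}
```

With the variables set, calling "pc_to_board 1000" on the board and "board_to_pc 1000" on the PC starts the two tasks; setting FETCH=echo prints the URLs instead of downloading, which is handy for checking the addresses first.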

After some time of activity I get, for example:

Data bus error exception in kernel mode.                                        
Oops: bus exception, sig: 7                                                     
 Registers dump: mode=47AA0938                                                  
 r1=0000007C, r2=00000005, r3=00000076, r4=00000000                             
 r5=FFFFFFFC, r6=47AA0938, r7=00000005, r8=4F444946                             
 r9=434F4E4E, r10=47AA092C, r11=45435449, r12=00000004                          
 r13=000005A0, r14=46EAB024, r15=46EA1470, r16=00000000                         
 r17=46EAB038, r18=00000016, r19=46EA1778, r20=46EA15B4                         
 r21=47AA08DF, r22=47AA0000, r23=00000088, r24=00000000                         
 r25=47AA08CA, r26=47AA0000, r27=47AA0938, r28=47AA0000                         
 r29=0000007C, r30=00000000, r31=47AA08CA, rPC=46EA5028                         
 msr=46EA4BF8, ear=46EA4B44, esr=46EA52FC, fsr=47AA092C                         
Oops: Exception in kernel mode, sig: 4                                          
 Registers dump: mode=47EFFFC0                                                  
 r1=00000000, r2=4420DE68, r3=4420DE68, r4=4420DE68                             
 r5=00000000, r6=47FE2000, r7=55CD3FFF, r8=00000000                             
 r9=00000000, r10=000000F8, r11=00000000, r12=000005A0                          
 r13=000000F8, r14=47EE5608, r15=47FEBC14, r16=00000082                         
 r17=47FEBC28, r18=3BDF211A, r19=47D95820, r20=00000001                         
 r21=4400FC9C, r22=7E3FCD80, r23=44208BF4, r24=47FEBCC8                         
 r25=00000000, r26=440018BC, r27=FFFFFFFF, r28=47FE3EA0                         
 r29=440018C8, r30=00000001, r31=00000000, rPC=00000000                         
 msr=47FE3EA0, ear=47FEBC14, esr=47FEBC14, fsr=44004250                         
Kernel panic - not syncing: Attempted to kill init!                             
 <0>Rebooting in 120 seconds..Machine restart...                                
                                                                                
Stack:                                                                          
  47fe3d20 00000004 44018818 00000000 440ca64c 440190c8 44193244 0000402f       
  009f7ef6 00008000 00003f90 0000009f 44009904 0000119e ffffffff 00000001       
  00000000 00000010 0001d4bf 4400c9b0 441945a8 00000078 44209520 00000000       
Call Trace:                                                                     
[<44018818>] atomic_notifier_call_chain+0x8/0x1c                                
[<440ca64c>] bust_spinlocks+0x58/0x78                                           
[<440190c8>] emergency_restart+0xc/0x20                                         
[<44009904>] panic+0x180/0x218                                                  
[<4400c9b0>] do_exit+0x360/0x8cc                                                
[<440032f8>] die+0x68/0x70                                                      
[<440032e4>] die+0x54/0x70                                                      
[<44003370>] _exception+0x70/0x80                                               
[<44003424>] full_exception+0xa4/0x1a8                                          
[<44116db8>] netif_receive_skb+0x278/0x284                                      
[<44004250>] _interrupt+0x110/0x118                                             
[<44024004>] hrtimer_get_remaining+0x34/0x8c                                    
[<44116e84>] process_backlog+0xc0/0x1c4                                         
[<4410e9dc>] __alloc_skb+0x60/0x14c                                             
[<4411701c>] net_rx_action+0x94/0x164                                           
[<440effb8>] RecvHandler+0xec/0x1b8                                             
[<440eff80>] RecvHandler+0xb4/0x1b8                                             
[<4400fab4>] __do_softirq+0x28/0x3c                                             
[<44002ecc>] handle_other_ex+0x38/0x88                                          
[<4400fa58>] __do_softirq2+0xb8/0xec                                            
[<4400fc9c>] irq_exit+0x40/0x54                                                 
[<440018bc>] do_IRQ+0x40/0x98                                                   
[<440018c8>] do_IRQ+0x4c/0x98                                                   
[<44004250>] _interrupt+0x110/0x118                                             
[<4400fab4>] __do_softirq+0x28/0x3c                                             
[<4402b158>] handle_IRQ_event+0x54/0xb8                                         

Or, on another run:

BUG: soft lockup detected on CPU#0!                                             
                                                                                
Stack:                                                                          
  469f3b38 00000000 00000000 00000000 0006f38d 44014030 44196730 00000000       
  00000000 00000000 00000001 469f2000 00000000 47bc02e4 4401407c 00000000       
  00000001 00000000 00000000 469f3b80 47bc02e4 44001c78 00000000 00000000       
Call Trace:                                                                     
[<44014030>] run_local_timers+0x18/0x2c                                         
[<4401407c>] update_process_times+0x38/0xa4                                     
[<44001c78>] timer_interrupt+0x54/0x8c                                          
[<4402b158>] handle_IRQ_event+0x54/0xb8                                         
[<4400fb14>] do_softirq+0x4c/0x60                                               
[<4402b250>] __do_IRQ+0x94/0x12c                                                
[<4400fc9c>] irq_exit+0x40/0x54                                                 
[<440018bc>] do_IRQ+0x40/0x98                                                   
[<440018c8>] do_IRQ+0x4c/0x98                                                   
[<4400fc9c>] irq_exit+0x40/0x54                                                 
[<44004250>] _interrupt+0x110/0x118                                             
[<440018c8>] do_IRQ+0x4c/0x98                                                   
[<4402d5bc>] add_to_page_cache+0x134/0x144                                      
[<4402d4ec>] add_to_page_cache+0x64/0x144                                       
[<4402d5bc>] add_to_page_cache+0x134/0x144                                      
[<4402f768>] generic_file_buffered_write+0x168/0x6d0                            
[<4402f774>] generic_file_buffered_write+0x174/0x6d0                            
[<4402ffe4>] __generic_file_aio_write_nolock+0x314/0x600                        
[<440df4d4>] n_tty_receive_buf+0x528/0xff4                                      
[<44006184>] __wake_up+0x20/0x48                                                
[<440df4ac>] n_tty_receive_buf+0x500/0xff4                                      
[<4410f2e4>] __kfree_skb+0x8c/0x124                                             
[<4413da64>] tcp_recvmsg+0x560/0x840                                            
[<4413d554>] tcp_recvmsg+0x50/0x840                                             
[<440560f0>] file_update_time+0xb8/0xf4                                         
[<4402ffac>] __generic_file_aio_write_nolock+0x2dc/0x600                        
[<44030488>] generic_file_aio_write+0x84/0x17c                                  
[<440407b0>] do_sync_write+0xb8/0x108                                           
[<440ded78>] opost+0xcc/0x208                                                   
[<44040934>] vfs_write+0x134/0x140                                              
[<440e0cd4>] write_chan+0x0/0x390                                               
[<440db35c>] tty_write+0x214/0x25c                                              
[<440207b4>] autoremove_wake_function+0x0/0x48                                  
[<44040934>] vfs_write+0x134/0x140                                              
[<440408a4>] vfs_write+0xa4/0x140                                               
[<44040a40>] sys_write+0x54/0xac                                                
[<44004730>] work_pending+0xc/0x3c                                              
[<44004758>] work_pending+0x34/0x3c                                             
[<44004758>] work_pending+0x34/0x3c                                             

Others have confirmed this same behaviour in the past, but the question was never resolved.
I could not find an easy way to trigger the errors on demand; with the method described above it takes a few hours before a crash shows up.
If anybody could confirm this behaviour again, maybe we could narrow the problem down to a particular configuration or setting.

Anyway, thanks again for your time.
Giulio Mazzoleni

On Mon, 27/04/2009 at 15.24 +0200, Giulio Mazzoleni wrote:
> Hi Wendy,
> you are right.
> 
> Furthermore, if the init function is defined as "int
> init_module(void)" (and the call to module_init is removed), the
> compiler puts it in the .text section instead of the .init section and
> the errors disappear.
> 
> I still wonder whether there is any relation between this kind of
> error and the ones I get during normal operation (they seem to be
> triggered more frequently under heavy network activity).
> The error messages printed by the kernel are the same, so I was hoping...
> 
> Giulio
> 
> On Fri, 24/04/2009 at 18.20 +1000, Wendy Liang wrote:
> > Hi Giulio and Ian,
> > 
> > By dumping the .ko, I found that init_module can only jump to
> > the top of the .text section of the .ko.
> > 
> > If testd/c/b() are called more than once, the compiler generates
> > executable code for testd/c/b/a() in the .text section of the .ko,
> > in the same order as they are defined.
> > 
> > Otherwise, because they are all static functions, the compiler will
> > by default generate executable code only for testa() in the .text
> > section of the .ko.
> > 
> > We are still investigating why it cannot jump to testa() when
> > it is not at the top of the .text section.
> > 
> > Regards,
> > Wendy
> 
> 
> 
> 
> ___________________________
> microblaze-uclinux mailing list
> microblaze-uclinux@xxxxxxxxxxxxxx
> Project Home Page : http://www.itee.uq.edu.au/~jwilliams/mblaze-uclinux
> Mailing List Archive : http://www.itee.uq.edu.au/~listarch/microblaze-uclinux/
> 