Monthly Archives: August 2013

Tracking down a Microblaze problem.

Part of the installation instructions for the ELLCC cross development suite is compiling and running bzip2 for each target processor, as described here. This is working for all the current targets except Microblaze.

Part of the test is to run bzip2 against known files and compare the results to pre-compressed result files. The first test run for Microblaze generates an error:

Doing 6 tests (3 compress, 3 uncompress) ...
If there's a problem, things might stop at this point.
../../../bin/qemu-microblaze ./bzip2.microblaze -1  < sample1.ref > sample1.rb2

bzip2.microblaze: couldn't allocate enough memory
        Input file = (stdin), output file = (stdout)
make[1]: *** [test] Error 1

To try to debug the problem, I fired up QEMU in debug mode on the Microblaze binary:

dev% cd ~/ellcc/test/src/bzip2-1.0.6/
dev% ~/ellcc/bin/qemu-microblaze -g 1234 ./bzip2.microblaze -1 < sample1.ref > sample1.rb2

This causes QEMU to wait for debug commands by listening on port 1234. The bzip2 executable is held up from running. Then I start up GDB to debug the program:

dev% ~/ellcc/bin/ecc-gdb bzip2.microblaze

Now I can connect to the program:

(gdb) target remote :1234
Remote debugging using :1234
0x100000f0 in _start ()

The program has been paused at the symbol _start, which is the low level entry point of all programs. For the Microblaze, it is in the file crt1.s. This entry point sets up the basic environment for the program and enters the start up code, ultimately entering the main() function.

Looking at the bzip2 sources, I find the message “couldn’t allocate enough memory” In the outOfMemory() function in bzip2.c. To track down the problem, I need to find out the origin of that call. I’ll set a breakpoint there, continue the and see:

(gdb) break outOfMemory
Breakpoint 1 at 0x10004db4: file bzip2.c, line 877.

From here I can use the GDB “continue” command (shortened to “c”) to get to the outOfMemory() breakpoint:

(gdb) c

Breakpoint 1, outOfMemory () at bzip2.c:877
877        showFileNames();

Great! I’ll use the GDB “where” command to see the stack trace, and find out how the program got here:

(gdb) where
#0  outOfMemory () at bzip2.c:877

Not so great. For some reason there is no stack trace. This will make debugging this problem quite a bit harder. All I can find by looking at the source is that there is a call to malloc() that is failing.

I guess there are two options here. First I could try to debug why there isn’t a stack trace and be able to use GDB to get more information, or second I could try to debug the malloc() problem. I’ll file a bug report on the GDB problem and try to track the malloc() bug down with the debug capabilities that I have.

I’m not sure if malloc() is failing in general, or if it is being given an invalid input argument. Let’s try restarting the program and setting a breakpoint on the call to malloc().

The call to malloc() is in a function called myMalloc():

(gdb) list myMalloc
1702    /*---------------------------------------------*/
1703    static 
1704    void *myMalloc ( Int32 n )
1705    {
1706       void* p;
1708       p = malloc ( (size_t)n );
1709       if (p == NULL) outOfMemory ();

Let’s set a breakpoint on line 1708:

(gdb) break bzip2.c:1708
Breakpoint 1 at 0x10006e5c: file bzip2.c, line 1708.
(gdb) c

Breakpoint 1, myMalloc (n=268724702) at bzip2.c:1708
1708       p = malloc ( (size_t)n );
(gdb) print n
$1 = 268724702

That doesn’t look good. We’re trying to allocate way too much memory. Maybe a disassembly and register dump will help:

(gdb) disas
Dump of assembler code for function myMalloc:
   0x10006e48 <+0>:     addik   r1, r1, -20
   0x10006e4c <+4>:     swi     r15, r1, 0
   0x10006e50 <+8>:     swi     r19, r1, 4
   0x10006e54 <+12>:    add     r19, r1, r0
   0x10006e58 <+16>:    swi     r5, r19, 24
=> 0x10006e5c <+20>:    addik   r1, r1, -8
   0x10006e60 <+24>:    imm     2
   0x10006e64 <+28>:    brlid   r15, -25884     // 0x10030948 
   0x10006e68 <+32>:    swi     r5, r19, 12
   0x10006e6c <+36>:    addik   r1, r1, 8
   0x10006e70 <+40>:    lwi     r5, r19, 12
   0x10006e74 <+44>:    swi     r3, r19, 8
   0x10006e78 <+48>:    bneid   r3, 28  // 0x10006e94 <$tmp596>
   0x10006e7c <+52>:    swi     r5, r19, 16
   0x10006e80 <+56>:    brid    8       // 0x10006e88 <$BB43_1>
   0x10006e84 <+60>:    or      r0, r0, r0
   0x10006e88 <+0>:     brlid   r15, -8472      // 0x10004d70 
   0x10006e8c <+4>:     addik   r1, r1, -4
   0x10006e90 <+8>:     addik   r1, r1, 4
   0x10006e94 <+0>:     lwi     r3, r19, 8
   0x10006e98 <+4>:     add     r1, r19, r0
   0x10006e9c <+8>:     lwi     r19, r1, 4
---Type  to continue, or q  to quit---q
(gdb) info regi
r0             0x0      0
r1             0xf6ffeb38       0xf6ffeb38
r2             0x0      0
r3             0x0      0
r4             0xf6ffecac       -150999892
r5             0x8      8
r6             0xf6ffeeb8       -150999368
r7             0x4      4
r8             0x0      0
r9             0x0      0
r10            0x1003e278       268690040
r11            0xae     174
r12            0xae     174
r13            0x0      0
r14            0x100305a4       268633508
r15            0x10006e14       268463636
r16            0x0      0
r17            0x0      0
r18            0x0      0
r19            0xf6ffeb38       -151000264
r20            0x0      0
r21            0x0      0
r22            0x0      0
---Type  to continue, or q  to quit---

Interesting. Dusting off the Microblaze application binary interface doc, I see that parameters are passed to functions in registers r5-r10, r3 and r4 are used to return values from functions, r15 is the return address register, r1 is the stack pointer, and r19 is the frame pointer. It looks as if eight bytes are wanted, because r5 contains 8. I suspect that the value printed for n is wrong because at this point n has not been saved to memory.

As an aside, the Microblaze has something called branch delay slots. The brlid instruction used to call malloc() is “branch and link immediate with delay”. The swi instruction following the brlid instruction is in the delay slot. It will be executed when the brlid instruction is executed and will be done by the time malloc() is entered. This instruction is saving n to memory, which is why the value of n appears wrong.

Let’s take a look at where myMalloc() is being called by disassembling at the return address contained in r15:

(gdb) disas 0x10006e14
Dump of assembler code for function mkCell:
   0x10006e00 <+0>:     addik   r1, r1, -12
   0x10006e04 <+4>:     swi     r15, r1, 0
   0x10006e08 <+8>:     swi     r19, r1, 4
   0x10006e0c <+12>:    add     r19, r1, r0
   0x10006e10 <+16>:    addik   r1, r1, -8
   0x10006e14 <+20>:    brlid   r15, 52 // 0x10006e48 
   0x10006e18 <+24>:    addik   r5, r0, 8       // 0x8
   0x10006e1c <+28>:    addik   r1, r1, 8
   0x10006e20 <+32>:    swi     r3, r19, 8
   0x10006e24 <+36>:    swi     r0, r3, 0
   0x10006e28 <+40>:    lwi     r3, r19, 8
   0x10006e2c <+44>:    swi     r0, r3, 4
   0x10006e30 <+48>:    lwi     r3, r19, 8
   0x10006e34 <+52>:    add     r1, r19, r0
   0x10006e38 <+56>:    lwi     r19, r1, 4
   0x10006e3c <+60>:    lwi     r15, r1, 0
   0x10006e40 <+64>:    rtsd    r15, 8
   0x10006e44 <+68>:    addik   r1, r1, 12
End of assembler dump.

Sure enough, the delay slot following the call to myMalloc() loads the first parameter register, r5, with 8.

Time to start stepping into malloc() going on. Three stepi commands get us into malloc():

(gdb) stepi
0x10006e60      1708       p = malloc ( (size_t)n );
(gdb) stepi
0x10006e64      1708       p = malloc ( (size_t)n );
(gdb) stepi
0x10006e68      1708       p = malloc ( (size_t)n );
(gdb) stepi
malloc (n=0) at src/malloc/malloc.c:335
335     {

A little stepping around in malloc() gives something suspicious:

(gdb) step
bin_index_up (x=268442384) at src/malloc/malloc.c:128
128     {
(gdb) step
129             x = x / SIZE_ALIGN - 1;
(gdb) step
130             if (x <= 32) return x;
(gdb) print x
$4 = 4143967032
(gdb) step
132     }
(gdb) print x
$5 = 4143967032

Maybe the divide is being done incorrectly? Let's disassemble bin_index_up():

(gdb) disas bin_index_up
Dump of assembler code for function bin_index_up:
   0x10030e10 <+0>:     addik   r1, r1, -40
   0x10030e14 <+4>:     swi     r15, r1, 0
   0x10030e18 <+8>:     swi     r19, r1, 4
   0x10030e1c <+12>:    add     r19, r1, r0
   0x10030e20 <+16>:    addik   r3, r0, 4       // 0x4
   0x10030e24 <+20>:    andi    r3, r3, 31
   0x10030e28 <+24>:    addik   r4, r5, 0
   0x10030e2c <+28>:    addk    r6, r3, r0
   0x10030e30 <+32>:    swi     r5, r19, 44
   0x10030e34 <+36>:    swi     r6, r19, 20
   0x10030e38 <+40>:    beqid   r3, 40  // 0x10030e60 <$BB2_5>
   0x10030e3c <+44>:    swi     r4, r19, 24
   0x10030e40 <+48>:    lwi     r3, r19, 20
   0x10030e44 <+52>:    lwi     r4, r19, 24
   0x10030e48 <+56>:    srl     r4, r4
   0x10030e4c <+60>:    addik   r3, r3, -1
   0x10030e50 <+64>:    addk    r5, r3, r0
   0x10030e54 <+68>:    swi     r5, r19, 20
   0x10030e58 <+72>:    bneid   r3, -24 // 0x10030e40 
   0x10030e5c <+76>:    swi     r4, r19, 24
   0x10030e60 <+0>:     lwi     r3, r19, 24
   0x10030e64 <+4>:     addik   r4, r0, -1
=> 0x10030e68 <+8>:     addk    r3, r3, r4
   0x10030e6c <+12>:    lwi     r4, r19, 44
   0x10030e70 <+16>:    swi     r4, r19, 12
   0x10030e74 <+20>:    swi     r3, r19, 12
   0x10030e78 <+24>:    addik   r5, r0, 32      // 0x20
   0x10030e7c <+0>:     cmpu    r3, r5, r3
   0x10030e80 <+4>:     bgtid   r3, 28  // 0x10030e9c <$tmp38>


It turns out the SIZE_ALIGN is 16. So the "divide" is done by using shifts. Stepping through bin_index_up() shows the actual value is 16 / 16 - 1. The result is 0 as expected. A red herring. Time to dig further.

After a little investigation it turns out that malloc() is failing only for larger allocations. The musl library malloc tries to use mmap() to allocate larger blocks of memory. The interesting thing about this is that malloc calls a function called __mmap() that takes six parameters, the last of which is of type off_t, a 64 bit value. Six parameters are interesting. The Microblaze ABI reserves 6 registers to pass parameters with any addition parameter data saved in the stack by the caller. Here is the disassembly of where __mmap is called:

(gdb) disas 0x100309f8
Dump of assembler code for function $tmp5:
   0x100309cc <+0>:     addik   r1, r1, -32
   0x100309d0 <+4>:     swi     r0, r19, 28
   0x100309d4 <+8>:     addik   r7, r0, 3       // 0x3
   0x100309d8 <+12>:    addik   r8, r0, 34      // 0x22
   0x100309dc <+16>:    addik   r4, r0, -1
   0x100309e0 <+20>:    addk    r5, r0, r0
   0x100309e4 <+24>:    swi     r5, r19, 76
   0x100309e8 <+28>:    addk    r6, r3, r0
   0x100309ec <+32>:    addk    r9, r4, r0
   0x100309f0 <+36>:    lwi     r10, r19, 76
   0x100309f4 <+40>:    imm     0
   0x100309f8 <+44>:    brlid   r15, 8784       // 0x10032c48 <__mmap>
   0x100309fc <+48>:    swi     r4, r19, 80
   0x10030a00 <+52>:    addik   r1, r1, 32
   0x10030a04 <+56>:    swi     r3, r19, 52
   0x10030a08 <+60>:    lwi     r4, r19, 80
End of assembler dump.

Here is the beginning of the __mmap() code:

(gdb) disas
Dump of assembler code for function __mmap:
   0x10032c48 <+0>:     addik   r1, r1, -92
   0x10032c4c <+4>:     swi     r15, r1, 0
   0x10032c50 <+8>:     swi     r19, r1, 4
   0x10032c54 <+12>:    add     r19, r1, r0
   0x10032c58 <+16>:    addik   r3, r0, -4096
   0x10032c5c <+20>:    and     r3, r10, r3
   0x10032c60 <+24>:    lwi     r4, r19, 120
=> 0x10032c64 <+28>:    andi    r11, r4, 4095
   0x10032c68 <+32>:    or      r3, r11, r3

Here sre some of the register values at the point of failure:

(gdb) info regi
r0             0x0      0
r1             0xf6ffd3e4       0xf6ffd3e4
r2             0x0      0
r3             0x0      0
r4             0xf6ffd4d8       -151005992
r5             0x0      0
r6             0x62000  401408

The bold lines in the two assembly snippets above are where the last parameter is passed and accessed by the caller and callee. Notice that the caller allocates space on the stack (32 bytes) as a save area for the parameters to be passed. This is room for the six registers used to pass parameters and an additional four bytes for the extra space needed by the 64 bit parameter six.

The problem is that the r0 register (which always contains zero) is stored at an offset of 28 from the frame pointer, not at offset 28 from the stack pointer. When it is accessed at offset 120 in _mmap(), an uninitialized value is used. Ouch.

Time to go digging around in the Microblaze code generator. I'll follow up with another post when I've fixed it.