Part of the installation instructions for the ELLCC cross development suite is compiling and running bzip2 for each target processor, as described here. This is working for all the current targets except Microblaze.
Part of the test is to run bzip2 against known files and compare the results to pre-compressed result files. The first test run for Microblaze generates an error:
Doing 6 tests (3 compress, 3 uncompress) ... If there's a problem, things might stop at this point. ../../../bin/qemu-microblaze ./bzip2.microblaze -1 < sample1.ref > sample1.rb2 bzip2.microblaze: couldn't allocate enough memory Input file = (stdin), output file = (stdout) make[1]: *** [test] Error 1
To try to debug the problem, I fired up QEMU in debug mode on the Microblaze binary:
dev% cd ~/ellcc/test/src/bzip2-1.0.6/ dev% ~/ellcc/bin/qemu-microblaze -g 1234 ./bzip2.microblaze -1 < sample1.ref > sample1.rb2
This causes QEMU to wait for debug commands by listening on port 1234. The bzip2 executable is held up from running. Then I start up GDB to debug the program:
dev% ~/ellcc/bin/ecc-gdb bzip2.microblaze
Now I can connect to the program:
(gdb) target remote :1234 Remote debugging using :1234 0x100000f0 in _start () (gdb)
The program has been paused at the symbol _start, which is the low level entry point of all programs. For the Microblaze, it is in the file crt1.s. This entry point sets up the basic environment for the program and enters the start up code, ultimately entering the main() function.
Looking at the bzip2 sources, I find the message “couldn’t allocate enough memory” In the outOfMemory() function in bzip2.c. To track down the problem, I need to find out the origin of that call. I’ll set a breakpoint there, continue the and see:
(gdb) break outOfMemory Breakpoint 1 at 0x10004db4: file bzip2.c, line 877. (gdb)
From here I can use the GDB “continue” command (shortened to “c”) to get to the outOfMemory() breakpoint:
(gdb) c Continuing. Breakpoint 1, outOfMemory () at bzip2.c:877 877 showFileNames(); (gdb)
Great! I’ll use the GDB “where” command to see the stack trace, and find out how the program got here:
(gdb) where #0 outOfMemory () at bzip2.c:877 (gdb)
Not so great. For some reason there is no stack trace. This will make debugging this problem quite a bit harder. All I can find by looking at the source is that there is a call to malloc() that is failing.
I guess there are two options here. First I could try to debug why there isn’t a stack trace and be able to use GDB to get more information, or second I could try to debug the malloc() problem. I’ll file a bug report on the GDB problem and try to track the malloc() bug down with the debug capabilities that I have.
I’m not sure if malloc() is failing in general, or if it is being given an invalid input argument. Let’s try restarting the program and setting a breakpoint on the call to malloc().
The call to malloc() is in a function called myMalloc():
(gdb) list myMalloc 1700 1701 1702 /*---------------------------------------------*/ 1703 static 1704 void *myMalloc ( Int32 n ) 1705 { 1706 void* p; 1707 1708 p = malloc ( (size_t)n ); 1709 if (p == NULL) outOfMemory (); (gdb)
Let’s set a breakpoint on line 1708:
(gdb) break bzip2.c:1708 Breakpoint 1 at 0x10006e5c: file bzip2.c, line 1708. (gdb) c Continuing. Breakpoint 1, myMalloc (n=268724702) at bzip2.c:1708 1708 p = malloc ( (size_t)n ); (gdb) print n $1 = 268724702 (gdb)
That doesn’t look good. We’re trying to allocate way too much memory. Maybe a disassembly and register dump will help:
(gdb) disas Dump of assembler code for function myMalloc: 0x10006e48 <+0>: addik r1, r1, -20 0x10006e4c <+4>: swi r15, r1, 0 0x10006e50 <+8>: swi r19, r1, 4 0x10006e54 <+12>: add r19, r1, r0 0x10006e58 <+16>: swi r5, r19, 24 => 0x10006e5c <+20>: addik r1, r1, -8 0x10006e60 <+24>: imm 2 0x10006e64 <+28>: brlid r15, -25884 // 0x100309480x10006e68 <+32>: swi r5, r19, 12 0x10006e6c <+36>: addik r1, r1, 8 0x10006e70 <+40>: lwi r5, r19, 12 0x10006e74 <+44>: swi r3, r19, 8 0x10006e78 <+48>: bneid r3, 28 // 0x10006e94 <$tmp596> 0x10006e7c <+52>: swi r5, r19, 16 0x10006e80 <+56>: brid 8 // 0x10006e88 <$BB43_1> 0x10006e84 <+60>: or r0, r0, r0 0x10006e88 <+0>: brlid r15, -8472 // 0x10004d70 0x10006e8c <+4>: addik r1, r1, -4 0x10006e90 <+8>: addik r1, r1, 4 0x10006e94 <+0>: lwi r3, r19, 8 0x10006e98 <+4>: add r1, r19, r0 0x10006e9c <+8>: lwi r19, r1, 4 ---Type to continue, or q to quit---q Quit (gdb) info regi r0 0x0 0 r1 0xf6ffeb38 0xf6ffeb38 r2 0x0 0 r3 0x0 0 r4 0xf6ffecac -150999892 r5 0x8 8 r6 0xf6ffeeb8 -150999368 r7 0x4 4 r8 0x0 0 r9 0x0 0 r10 0x1003e278 268690040 r11 0xae 174 r12 0xae 174 r13 0x0 0 r14 0x100305a4 268633508 r15 0x10006e14 268463636 r16 0x0 0 r17 0x0 0 r18 0x0 0 r19 0xf6ffeb38 -151000264 r20 0x0 0 r21 0x0 0 r22 0x0 0 ---Type to continue, or q to quit--- Quit (gdb)
Interesting. Dusting off the Microblaze application binary interface doc, I see that parameters are passed to functions in registers r5-r10, r3 and r4 are used to return values from functions, r15 is the return address register, r1 is the stack pointer, and r19 is the frame pointer. It looks as if eight bytes are wanted, because r5 contains 8. I suspect that the value printed for n is wrong because at this point n has not been saved to memory.
As an aside, the Microblaze has something called branch delay slots. The brlid instruction used to call malloc() is “branch and link immediate with delay”. The swi instruction following the brlid instruction is in the delay slot. It will be executed when the brlid instruction is executed and will be done by the time malloc() is entered. This instruction is saving n to memory, which is why the value of n appears wrong.
Let’s take a look at where myMalloc() is being called by disassembling at the return address contained in r15:
(gdb) disas 0x10006e14 Dump of assembler code for function mkCell: 0x10006e00 <+0>: addik r1, r1, -12 0x10006e04 <+4>: swi r15, r1, 0 0x10006e08 <+8>: swi r19, r1, 4 0x10006e0c <+12>: add r19, r1, r0 0x10006e10 <+16>: addik r1, r1, -8 0x10006e14 <+20>: brlid r15, 52 // 0x10006e480x10006e18 <+24>: addik r5, r0, 8 // 0x8 0x10006e1c <+28>: addik r1, r1, 8 0x10006e20 <+32>: swi r3, r19, 8 0x10006e24 <+36>: swi r0, r3, 0 0x10006e28 <+40>: lwi r3, r19, 8 0x10006e2c <+44>: swi r0, r3, 4 0x10006e30 <+48>: lwi r3, r19, 8 0x10006e34 <+52>: add r1, r19, r0 0x10006e38 <+56>: lwi r19, r1, 4 0x10006e3c <+60>: lwi r15, r1, 0 0x10006e40 <+64>: rtsd r15, 8 0x10006e44 <+68>: addik r1, r1, 12 End of assembler dump. (gdb)
Sure enough, the delay slot following the call to myMalloc() loads the first parameter register, r5, with 8.
Time to start stepping into malloc() going on. Three stepi commands get us into malloc():
(gdb) stepi 0x10006e60 1708 p = malloc ( (size_t)n ); (gdb) stepi 0x10006e64 1708 p = malloc ( (size_t)n ); (gdb) stepi 0x10006e68 1708 p = malloc ( (size_t)n ); (gdb) stepi malloc (n=0) at src/malloc/malloc.c:335 335 { (gdb)
A little stepping around in malloc() gives something suspicious:
(gdb) step bin_index_up (x=268442384) at src/malloc/malloc.c:128 128 { (gdb) step 129 x = x / SIZE_ALIGN - 1; (gdb) step 130 if (x <= 32) return x; (gdb) print x $4 = 4143967032 (gdb) step 132 } (gdb) print x $5 = 4143967032
Maybe the divide is being done incorrectly? Let's disassemble bin_index_up():
(gdb) disas bin_index_up Dump of assembler code for function bin_index_up: 0x10030e10 <+0>: addik r1, r1, -40 0x10030e14 <+4>: swi r15, r1, 0 0x10030e18 <+8>: swi r19, r1, 4 0x10030e1c <+12>: add r19, r1, r0 0x10030e20 <+16>: addik r3, r0, 4 // 0x4 0x10030e24 <+20>: andi r3, r3, 31 0x10030e28 <+24>: addik r4, r5, 0 0x10030e2c <+28>: addk r6, r3, r0 0x10030e30 <+32>: swi r5, r19, 44 0x10030e34 <+36>: swi r6, r19, 20 0x10030e38 <+40>: beqid r3, 40 // 0x10030e60 <$BB2_5> 0x10030e3c <+44>: swi r4, r19, 24 0x10030e40 <+48>: lwi r3, r19, 20 0x10030e44 <+52>: lwi r4, r19, 24 0x10030e48 <+56>: srl r4, r4 0x10030e4c <+60>: addik r3, r3, -1 0x10030e50 <+64>: addk r5, r3, r0 0x10030e54 <+68>: swi r5, r19, 20 0x10030e58 <+72>: bneid r3, -24 // 0x10030e400x10030e5c <+76>: swi r4, r19, 24 0x10030e60 <+0>: lwi r3, r19, 24 0x10030e64 <+4>: addik r4, r0, -1 => 0x10030e68 <+8>: addk r3, r3, r4 0x10030e6c <+12>: lwi r4, r19, 44 0x10030e70 <+16>: swi r4, r19, 12 0x10030e74 <+20>: swi r3, r19, 12 0x10030e78 <+24>: addik r5, r0, 32 // 0x20 0x10030e7c <+0>: cmpu r3, r5, r3 0x10030e80 <+4>: bgtid r3, 28 // 0x10030e9c <$tmp38>
It turns out the SIZE_ALIGN is 16. So the "divide" is done by using shifts. Stepping through bin_index_up() shows the actual value is 16 / 16 - 1. The result is 0 as expected. A red herring. Time to dig further.
After a little investigation it turns out that malloc() is failing only for larger allocations. The musl library malloc tries to use mmap() to allocate larger blocks of memory. The interesting thing about this is that malloc calls a function called __mmap() that takes six parameters, the last of which is of type off_t, a 64 bit value. Six parameters are interesting. The Microblaze ABI reserves 6 registers to pass parameters with any addition parameter data saved in the stack by the caller. Here is the disassembly of where __mmap is called:
(gdb) disas 0x100309f8 Dump of assembler code for function $tmp5: 0x100309cc <+0>: addik r1, r1, -32 0x100309d0 <+4>: swi r0, r19, 28 0x100309d4 <+8>: addik r7, r0, 3 // 0x3 0x100309d8 <+12>: addik r8, r0, 34 // 0x22 0x100309dc <+16>: addik r4, r0, -1 0x100309e0 <+20>: addk r5, r0, r0 0x100309e4 <+24>: swi r5, r19, 76 0x100309e8 <+28>: addk r6, r3, r0 0x100309ec <+32>: addk r9, r4, r0 0x100309f0 <+36>: lwi r10, r19, 76 0x100309f4 <+40>: imm 0 0x100309f8 <+44>: brlid r15, 8784 // 0x10032c48 <__mmap> 0x100309fc <+48>: swi r4, r19, 80 0x10030a00 <+52>: addik r1, r1, 32 0x10030a04 <+56>: swi r3, r19, 52 0x10030a08 <+60>: lwi r4, r19, 80 End of assembler dump. (gdb)
Here is the beginning of the __mmap() code:
(gdb) disas Dump of assembler code for function __mmap: 0x10032c48 <+0>: addik r1, r1, -92 0x10032c4c <+4>: swi r15, r1, 0 0x10032c50 <+8>: swi r19, r1, 4 0x10032c54 <+12>: add r19, r1, r0 0x10032c58 <+16>: addik r3, r0, -4096 0x10032c5c <+20>: and r3, r10, r3 0x10032c60 <+24>: lwi r4, r19, 120 => 0x10032c64 <+28>: andi r11, r4, 4095 0x10032c68 <+32>: or r3, r11, r3
Here sre some of the register values at the point of failure:
(gdb) info regi r0 0x0 0 r1 0xf6ffd3e4 0xf6ffd3e4 r2 0x0 0 r3 0x0 0 r4 0xf6ffd4d8 -151005992 r5 0x0 0 r6 0x62000 401408
The bold lines in the two assembly snippets above are where the last parameter is passed and accessed by the caller and callee. Notice that the caller allocates space on the stack (32 bytes) as a save area for the parameters to be passed. This is room for the six registers used to pass parameters and an additional four bytes for the extra space needed by the 64 bit parameter six.
The problem is that the r0 register (which always contains zero) is stored at an offset of 28 from the frame pointer, not at offset 28 from the stack pointer. When it is accessed at offset 120 in _mmap(), an uninitialized value is used. Ouch.
Time to go digging around in the Microblaze code generator. I'll follow up with another post when I've fixed it.