Part of the installation instructions for the ELLCC cross development suite is compiling and running bzip2 for each target processor, as described here. This is working for all the current targets except Microblaze.
Part of the test is to run bzip2 against known files and compare the results to pre-compressed result files. The first test run for Microblaze generates an error:
Doing 6 tests (3 compress, 3 uncompress) ...
If there's a problem, things might stop at this point.
../../../bin/qemu-microblaze ./bzip2.microblaze -1 < sample1.ref > sample1.rb2
bzip2.microblaze: couldn't allocate enough memory
Input file = (stdin), output file = (stdout)
make[1]: *** [test] Error 1
To try to debug the problem, I fired up QEMU in debug mode on the Microblaze binary:
dev% cd ~/ellcc/test/src/bzip2-1.0.6/
dev% ~/ellcc/bin/qemu-microblaze -g 1234 ./bzip2.microblaze -1 < sample1.ref > sample1.rb2
This causes QEMU to wait for debug commands by listening on port 1234. The bzip2 executable is held up from running. Then I start up GDB to debug the program:
dev% ~/ellcc/bin/ecc-gdb bzip2.microblaze
Now I can connect to the program:
(gdb) target remote :1234
Remote debugging using :1234
0x100000f0 in _start ()
(gdb)
The program has been paused at the symbol _start, which is the low level entry point of all programs. For the Microblaze, it is in the file crt1.s. This entry point sets up the basic environment for the program and enters the start up code, ultimately entering the main() function.
Looking at the bzip2 sources, I find the message “couldn’t allocate enough memory” In the outOfMemory() function in bzip2.c. To track down the problem, I need to find out the origin of that call. I’ll set a breakpoint there, continue the and see:
(gdb) break outOfMemory
Breakpoint 1 at 0x10004db4: file bzip2.c, line 877.
(gdb)
From here I can use the GDB “continue” command (shortened to “c”) to get to the outOfMemory() breakpoint:
(gdb) c
Continuing.
Breakpoint 1, outOfMemory () at bzip2.c:877
877 showFileNames();
(gdb)
Great! I’ll use the GDB “where” command to see the stack trace, and find out how the program got here:
(gdb) where
#0 outOfMemory () at bzip2.c:877
(gdb)
Not so great. For some reason there is no stack trace. This will make debugging this problem quite a bit harder. All I can find by looking at the source is that there is a call to malloc() that is failing.
I guess there are two options here. First I could try to debug why there isn’t a stack trace and be able to use GDB to get more information, or second I could try to debug the malloc() problem. I’ll file a bug report on the GDB problem and try to track the malloc() bug down with the debug capabilities that I have.
I’m not sure if malloc() is failing in general, or if it is being given an invalid input argument. Let’s try restarting the program and setting a breakpoint on the call to malloc().
The call to malloc() is in a function called myMalloc():
(gdb) list myMalloc
1700
1701
1702 /*---------------------------------------------*/
1703 static
1704 void *myMalloc ( Int32 n )
1705 {
1706 void* p;
1707
1708 p = malloc ( (size_t)n );
1709 if (p == NULL) outOfMemory ();
(gdb)
Let’s set a breakpoint on line 1708:
(gdb) break bzip2.c:1708
Breakpoint 1 at 0x10006e5c: file bzip2.c, line 1708.
(gdb) c
Continuing.
Breakpoint 1, myMalloc (n=268724702) at bzip2.c:1708
1708 p = malloc ( (size_t)n );
(gdb) print n
$1 = 268724702
(gdb)
That doesn’t look good. We’re trying to allocate way too much memory. Maybe a disassembly and register dump will help:
(gdb) disas
Dump of assembler code for function myMalloc:
0x10006e48 <+0>: addik r1, r1, -20
0x10006e4c <+4>: swi r15, r1, 0
0x10006e50 <+8>: swi r19, r1, 4
0x10006e54 <+12>: add r19, r1, r0
0x10006e58 <+16>: swi r5, r19, 24
=> 0x10006e5c <+20>: addik r1, r1, -8
0x10006e60 <+24>: imm 2
0x10006e64 <+28>: brlid r15, -25884 // 0x10030948
0x10006e68 <+32>: swi r5, r19, 12
0x10006e6c <+36>: addik r1, r1, 8
0x10006e70 <+40>: lwi r5, r19, 12
0x10006e74 <+44>: swi r3, r19, 8
0x10006e78 <+48>: bneid r3, 28 // 0x10006e94 <$tmp596>
0x10006e7c <+52>: swi r5, r19, 16
0x10006e80 <+56>: brid 8 // 0x10006e88 <$BB43_1>
0x10006e84 <+60>: or r0, r0, r0
0x10006e88 <+0>: brlid r15, -8472 // 0x10004d70
0x10006e8c <+4>: addik r1, r1, -4
0x10006e90 <+8>: addik r1, r1, 4
0x10006e94 <+0>: lwi r3, r19, 8
0x10006e98 <+4>: add r1, r19, r0
0x10006e9c <+8>: lwi r19, r1, 4
---Type to continue, or q to quit---q
Quit
(gdb) info regi
r0 0x0 0
r1 0xf6ffeb38 0xf6ffeb38
r2 0x0 0
r3 0x0 0
r4 0xf6ffecac -150999892
r5 0x8 8
r6 0xf6ffeeb8 -150999368
r7 0x4 4
r8 0x0 0
r9 0x0 0
r10 0x1003e278 268690040
r11 0xae 174
r12 0xae 174
r13 0x0 0
r14 0x100305a4 268633508
r15 0x10006e14 268463636
r16 0x0 0
r17 0x0 0
r18 0x0 0
r19 0xf6ffeb38 -151000264
r20 0x0 0
r21 0x0 0
r22 0x0 0
---Type to continue, or q to quit---
Quit
(gdb)
Interesting. Dusting off the Microblaze application binary interface doc, I see that parameters are passed to functions in registers r5-r10, r3 and r4 are used to return values from functions, r15 is the return address register, r1 is the stack pointer, and r19 is the frame pointer. It looks as if eight bytes are wanted, because r5 contains 8. I suspect that the value printed for n is wrong because at this point n has not been saved to memory.
As an aside, the Microblaze has something called branch delay slots. The brlid instruction used to call malloc() is “branch and link immediate with delay”. The swi instruction following the brlid instruction is in the delay slot. It will be executed when the brlid instruction is executed and will be done by the time malloc() is entered. This instruction is saving n to memory, which is why the value of n appears wrong.
Let’s take a look at where myMalloc() is being called by disassembling at the return address contained in r15:
(gdb) disas 0x10006e14
Dump of assembler code for function mkCell:
0x10006e00 <+0>: addik r1, r1, -12
0x10006e04 <+4>: swi r15, r1, 0
0x10006e08 <+8>: swi r19, r1, 4
0x10006e0c <+12>: add r19, r1, r0
0x10006e10 <+16>: addik r1, r1, -8
0x10006e14 <+20>: brlid r15, 52 // 0x10006e48
0x10006e18 <+24>: addik r5, r0, 8 // 0x8
0x10006e1c <+28>: addik r1, r1, 8
0x10006e20 <+32>: swi r3, r19, 8
0x10006e24 <+36>: swi r0, r3, 0
0x10006e28 <+40>: lwi r3, r19, 8
0x10006e2c <+44>: swi r0, r3, 4
0x10006e30 <+48>: lwi r3, r19, 8
0x10006e34 <+52>: add r1, r19, r0
0x10006e38 <+56>: lwi r19, r1, 4
0x10006e3c <+60>: lwi r15, r1, 0
0x10006e40 <+64>: rtsd r15, 8
0x10006e44 <+68>: addik r1, r1, 12
End of assembler dump.
(gdb)
Sure enough, the delay slot following the call to myMalloc() loads the first parameter register, r5, with 8.
Time to start stepping into malloc() going on. Three stepi commands get us into malloc():
(gdb) stepi
0x10006e60 1708 p = malloc ( (size_t)n );
(gdb) stepi
0x10006e64 1708 p = malloc ( (size_t)n );
(gdb) stepi
0x10006e68 1708 p = malloc ( (size_t)n );
(gdb) stepi
malloc (n=0) at src/malloc/malloc.c:335
335 {
(gdb)
A little stepping around in malloc() gives something suspicious:
(gdb) step
bin_index_up (x=268442384) at src/malloc/malloc.c:128
128 {
(gdb) step
129 x = x / SIZE_ALIGN - 1;
(gdb) step
130 if (x <= 32) return x;
(gdb) print x
$4 = 4143967032
(gdb) step
132 }
(gdb) print x
$5 = 4143967032
Maybe the divide is being done incorrectly? Let's disassemble bin_index_up():
(gdb) disas bin_index_up
Dump of assembler code for function bin_index_up:
0x10030e10 <+0>: addik r1, r1, -40
0x10030e14 <+4>: swi r15, r1, 0
0x10030e18 <+8>: swi r19, r1, 4
0x10030e1c <+12>: add r19, r1, r0
0x10030e20 <+16>: addik r3, r0, 4 // 0x4
0x10030e24 <+20>: andi r3, r3, 31
0x10030e28 <+24>: addik r4, r5, 0
0x10030e2c <+28>: addk r6, r3, r0
0x10030e30 <+32>: swi r5, r19, 44
0x10030e34 <+36>: swi r6, r19, 20
0x10030e38 <+40>: beqid r3, 40 // 0x10030e60 <$BB2_5>
0x10030e3c <+44>: swi r4, r19, 24
0x10030e40 <+48>: lwi r3, r19, 20
0x10030e44 <+52>: lwi r4, r19, 24
0x10030e48 <+56>: srl r4, r4
0x10030e4c <+60>: addik r3, r3, -1
0x10030e50 <+64>: addk r5, r3, r0
0x10030e54 <+68>: swi r5, r19, 20
0x10030e58 <+72>: bneid r3, -24 // 0x10030e40
0x10030e5c <+76>: swi r4, r19, 24
0x10030e60 <+0>: lwi r3, r19, 24
0x10030e64 <+4>: addik r4, r0, -1
=> 0x10030e68 <+8>: addk r3, r3, r4
0x10030e6c <+12>: lwi r4, r19, 44
0x10030e70 <+16>: swi r4, r19, 12
0x10030e74 <+20>: swi r3, r19, 12
0x10030e78 <+24>: addik r5, r0, 32 // 0x20
0x10030e7c <+0>: cmpu r3, r5, r3
0x10030e80 <+4>: bgtid r3, 28 // 0x10030e9c <$tmp38>
It turns out the SIZE_ALIGN is 16. So the "divide" is done by using shifts. Stepping through bin_index_up() shows the actual value is 16 / 16 - 1. The result is 0 as expected. A red herring. Time to dig further.
After a little investigation it turns out that malloc() is failing only for larger allocations. The musl library malloc tries to use mmap() to allocate larger blocks of memory. The interesting thing about this is that malloc calls a function called __mmap() that takes six parameters, the last of which is of type off_t, a 64 bit value. Six parameters are interesting. The Microblaze ABI reserves 6 registers to pass parameters with any addition parameter data saved in the stack by the caller. Here is the disassembly of where __mmap is called:
(gdb) disas 0x100309f8
Dump of assembler code for function $tmp5:
0x100309cc <+0>: addik r1, r1, -32
0x100309d0 <+4>: swi r0, r19, 28
0x100309d4 <+8>: addik r7, r0, 3 // 0x3
0x100309d8 <+12>: addik r8, r0, 34 // 0x22
0x100309dc <+16>: addik r4, r0, -1
0x100309e0 <+20>: addk r5, r0, r0
0x100309e4 <+24>: swi r5, r19, 76
0x100309e8 <+28>: addk r6, r3, r0
0x100309ec <+32>: addk r9, r4, r0
0x100309f0 <+36>: lwi r10, r19, 76
0x100309f4 <+40>: imm 0
0x100309f8 <+44>: brlid r15, 8784 // 0x10032c48 <__mmap>
0x100309fc <+48>: swi r4, r19, 80
0x10030a00 <+52>: addik r1, r1, 32
0x10030a04 <+56>: swi r3, r19, 52
0x10030a08 <+60>: lwi r4, r19, 80
End of assembler dump.
(gdb)
Here is the beginning of the __mmap() code:
(gdb) disas
Dump of assembler code for function __mmap:
0x10032c48 <+0>: addik r1, r1, -92
0x10032c4c <+4>: swi r15, r1, 0
0x10032c50 <+8>: swi r19, r1, 4
0x10032c54 <+12>: add r19, r1, r0
0x10032c58 <+16>: addik r3, r0, -4096
0x10032c5c <+20>: and r3, r10, r3
0x10032c60 <+24>: lwi r4, r19, 120
=> 0x10032c64 <+28>: andi r11, r4, 4095
0x10032c68 <+32>: or r3, r11, r3
Here sre some of the register values at the point of failure:
(gdb) info regi
r0 0x0 0
r1 0xf6ffd3e4 0xf6ffd3e4
r2 0x0 0
r3 0x0 0
r4 0xf6ffd4d8 -151005992
r5 0x0 0
r6 0x62000 401408
The bold lines in the two assembly snippets above are where the last parameter is passed and accessed by the caller and callee. Notice that the caller allocates space on the stack (32 bytes) as a save area for the parameters to be passed. This is room for the six registers used to pass parameters and an additional four bytes for the extra space needed by the 64 bit parameter six.
The problem is that the r0 register (which always contains zero) is stored at an offset of 28 from the frame pointer, not at offset 28 from the stack pointer. When it is accessed at offset 120 in _mmap(), an uninitialized value is used. Ouch.
Time to go digging around in the Microblaze code generator. I'll follow up with another post when I've fixed it.