Using ELLCC to Cross Debug an ARM Application

I’m looking at replacing my Linux port of the NetBSD standard library with musl, another library with a BSD-like license. For the past couple of days I’ve been doing a feasibility study on musl, running it through my regression tests for x86_64 and things look very good.

Today I decided to test ARM, and the first regression test failed. If you’re not familiar with ELLCC, it is a cross-development tool chain that uses ecc (based on clang/LLVM) as the compiler. As part of my regression testing, I compile some of the NetBSD user-land utilities and run them using QEMU. When I ran the test of the cat program, it failed:

cat ../../../../../src/bin/cat/testinput | ./cat | cmp ../../../../../src/bin/cat/testinput || exit 1
stdout: Bad file number
cmp: EOF on -

Strange. How could stdout have a bad file number? I simplified the test case and found that

~/ellcc/bin/qemu-arm cat < ../../../../../src/bin/cat/testinput

also failed.

I decided to fire up gdb on the simplified test. To do this, I started QEMU with the option to listen for the debugger on port 1234.

~/ellcc/bin/qemu-arm -g 1234 cat < ../../../../../src/bin/cat/testinput

In another window, I started the debugger:

[~/ellcc/test/obj/musl/linux/bin/cat] main% ~/ellcc/bin/ecc-gdb cat
GNU gdb (GDB) 7.4
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu".
For bug reporting instructions, please see:
...
Reading symbols from /home/rich/ellcc/test/obj/musl/linux/bin/cat/cat...done.
(gdb) set arch arm
The target architecture is assumed to be arm
(gdb) target remote :1234
Remote debugging using :1234
[New Remote target]
[Switching to Remote target]
0x0000807c in _start ()
(gdb) break cat.c:294
(gdb) break cat.c:310
Breakpoint 2 at 0x8cec: file ../../../../../src/bin/cat/cat.c, line 310.
(gdb) c
Continuing.

Breakpoint 1, raw_cat (rfd=0) at ../../../../../src/bin/cat/cat.c:294
294 wfd = fileno(stdout);
(gdb) next
295 if (buf == NULL) {
(gdb) print wfd
$1 = 1
(gdb) c
Continuing.

Breakpoint 2, raw_cat (rfd=0) at ../../../../../src/bin/cat/cat.c:310
310 if ((nw = write(wfd, buf + off, (size_t)nr)) < 0)
(gdb) print wfd
$2 = 0
(gdb)

Well, that is certainly a puzzling result! What could have changed wfd? Looking at the source of cat, the only thing that could have changed it is the call to fstat(). What if the struct stat definition doesn't match what QEMU (or even ARM Linux) thinks it should be? If the definition is too small, fstat() would write past the end of the structure, and it is quite possible that the struct stat used is right beneath the wfd variable on the stack.

Let's check that hypothesis. I'll set a breakpoint right at the fstat() call:

(gdb) set arch arm
The target architecture is assumed to be arm
(gdb) target remote :1234
Remote debugging using :1234
[New Remote target]
[Switching to Remote target]
0x0000807c in _start ()
(gdb) break cat.c:298
Breakpoint 1 at 0x8c40: file ../../../../../src/bin/cat/cat.c, line 298.
(gdb) c
Continuing.

Breakpoint 1, raw_cat (rfd=0) at ../../../../../src/bin/cat/cat.c:298
298 if (fstat(wfd, &sbuf) == 0 &&
(gdb) print wfd
$1 = 1
(gdb) next
303 if (buf == NULL) {
(gdb) print wfd
$2 = 0
(gdb) print sbuf
$3 = {st_dev = 10, __st_dev_padding = 0, __st_ino_truncated = 8, st_mode = 8576, st_nlink = 1, st_uid = 500,
st_gid = 5, st_rdev = 34821, __st_rdev_padding = 0, st_size = 0, st_blksize = 0, st_blocks = 1024,
st_atim = {tv_sec = 0, tv_nsec = 0}, st_mtim = {tv_sec = 1338126579, tv_nsec = 0}, st_ctim = {
tv_sec = 1338126579, tv_nsec = 0}, st_ino = 1336216504}
(gdb)

This is interesting. The sbuf structure looks like it is set incorrectly. st_nlink is 1, which is good for stdout. st_uid is 500, which is my user id. But st_blksize should be 1024, and that value got moved to st_blocks. st_atim (the file access time) is zero, and st_ino should be 8 like __st_ino_truncated. It looks like the struct stat definition used by musl for the ARM is incorrect.

I snooped around a little bit and found the problem. The stat struct was defined as:

struct stat
{
dev_t st_dev;
int __st_dev_padding;
long __st_ino_truncated;
mode_t st_mode;
nlink_t st_nlink;
uid_t st_uid;
gid_t st_gid;
dev_t st_rdev;
int __st_rdev_padding;
off_t st_size;
blksize_t st_blksize;

blkcnt_t st_blocks;
struct timespec st_atim;
struct timespec st_mtim;
struct timespec st_ctim;
ino_t st_ino;
};

It turned out that some padding was missing. I modified it to be

struct stat
{
dev_t st_dev;
int __st_dev_padding;
long __st_ino_truncated;
mode_t st_mode;
nlink_t st_nlink;
uid_t st_uid;
gid_t st_gid;
dev_t st_rdev;
int __st_rdev_padding[2];
off_t st_size;
blksize_t st_blksize;
int __st_rdev_padding2[1];
blkcnt_t st_blocks;
struct timespec st_atim;
struct timespec st_mtim;
struct timespec st_ctim;
ino_t st_ino;
};

and voila! The cat was happy again.

Breakpoint 1, raw_cat (rfd=0) at ../../../../../src/bin/cat/cat.c:298
298 if (fstat(wfd, &sbuf) == 0 &&
(gdb) next
303 if (buf == NULL) {
(gdb) print sbuf
$1 = {st_dev = 10, __st_dev_padding = 0, __st_ino_truncated = 8, st_mode = 8576, st_nlink = 1, st_uid = 500,
st_gid = 5, st_rdev = 34821, __st_rdev_padding = {0, 0}, st_size = 0, st_blksize = 1024,
__st_rdev_padding2 = {0}, st_blocks = 0, st_atim = {tv_sec = 1338132494, tv_nsec = 0}, st_mtim = {
tv_sec = 1338132494, tv_nsec = 0}, st_ctim = {tv_sec = 1336216504, tv_nsec = 0}, st_ino = 8}