Category Archives: ELLCC

Building and using the ELLCC cross development tool chain.

Cross Building tcsh with ELLCC

Now that ELLCC can build itself, it’s time to do some testing of other packages. I thought tcsh (a widely used command shell for *nix) would be a good choice, because it would exercise a bunch of system calls and standard library functions on all the targets.

I created a tcsh directory under the ELLCC test/src directory and got the latest tcsh sources.

I made a modified build script and a Makefile to drive the overall configure and build, because I wanted an easy way to create executables for all the targets. The build script creates a directory for each target and runs the tcsh configure program in that directory:

#!/bin/sh
# ELLCC build script.

# Get the staging directory.
prefix=`cd ../../..; pwd`

# Figure out the compilers to use.
. $prefix/build-setup $*

echo Configured to $WHY.
echo C compiler: $cc $CFLAGS
echo C++ compiler: $cxx $CXXFLAGS
echo In: build$builddir

if [ "x$arg1" != "x" ] ; then
    # Build for a single target.
    targets=$arg1
fi

# Configure for all active targets in the target list.
for t in $targets; do
  t=`basename $t -elf`
  if [ -e $prefix/libecc/mkscripts/targets/$t/setup.mk ] ; then
    echo Configuring for $t-$os
    mkdir -p build-$t-$os
    make DIR=build-$t-$os CC=$cc CXX=$cxx AR=$ar TARGET=$t OS=$os \
        target=$t haslibs=$haslibs \
        bindir=$bindir prefix=$prefix build=$build \
        configure || exit 1

    make -C build-$t-$os || exit 1
  fi
done

The build script uses a simple Makefile to get the proper build parameters for each target:

-include $(prefix)/libecc/mkscripts/targets/$(TARGET)/setup.mk

ifneq ($(TARGET),$(build))
  HOST=--host=$(TARGET)-$(OS)
  BUILD=--build=$(build)-$(OS)
else
  HOST=
  BUILD=
endif

ifneq ($(CC),gcc)
  ifeq ($(haslibs),yes)
    CFLAGS=$(CFLAGS.$(TARGET))
    CXXFLAGS=$(CXXFLAGS.$(TARGET))
  endif
endif

configure:
        cd $(DIR) ; \
        ../src/configure \
        CC=$(CC) CFLAGS="$(CFLAGS)" \
        CXX=$(CXX) CXXFLAGS="$(CXXFLAGS)" \
        --bindir=$(bindir) --prefix=$(prefix) \
        $(HOST) $(BUILD) $(TARGETS)

clean:
        rm -fr build-*

Now building is as simple as:

[~/ellcc/test/src/tcsh] dev% ./build

This does all the configures and makes. The file command shows what each binary is:

[~/ellcc/test/src/tcsh] dev% file */tcsh
build-armeb-linux/tcsh:      ELF 32-bit MSB executable, ARM, version 1, statically linked, BuildID[sha1]=0x078219cbae606a2a0f587d2ef1ec08cb2f74507d, not stripped
build-arm-linux/tcsh:        ELF 32-bit LSB executable, ARM, version 1, statically linked, BuildID[sha1]=0x1e9b7097aeabe9d87bab6a6c58299d5e9ad8420b, not stripped
build-i386-linux/tcsh:       ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, BuildID[sha1]=0xe22f2010af0036439a4291a813fe52281fc0a854, not stripped
build-microblaze-linux/tcsh: ELF 32-bit MSB executable, version 1 (SYSV), statically linked, not stripped
build-mipsel-linux/tcsh:     ELF 32-bit LSB executable, MIPS, MIPS-I version 1, statically linked, BuildID[sha1]=0x38cea1a3ab4fbc8499d49c9db3d81fb61856694d, not stripped
build-mips-linux/tcsh:       ELF 32-bit MSB executable, MIPS, MIPS-I version 1, statically linked, BuildID[sha1]=0xfb15ef6c9556283827968236f14b182f23ecc1e0, not stripped
build-ppc-linux/tcsh:        ELF 32-bit MSB executable, PowerPC or cisco 4500, version 1 (SYSV), statically linked, BuildID[sha1]=0x728ba20e60011cb9e4daff2961c285fa689e23aa, not stripped
build-x86_64-linux/tcsh:     ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, BuildID[sha1]=0xbd05c401b4275cb101304547a070d55dd296227c, not stripped
[~/ellcc/test/src/tcsh] dev%

The size command gives a comparison of the executable sizes for each target:

[~/ellcc/test/src/tcsh] dev% size */tcsh
   text    data     bss     dec     hex filename
 852268   11940   84232  948440   e78d8 build-armeb-linux/tcsh
 850044   11940   84232  946216   e7028 build-arm-linux/tcsh
 836222   11932   83960  932114   e3912 build-i386-linux/tcsh
1211936   11944   84104 1307984  13f550 build-microblaze-linux/tcsh
 963356   11984   84088 1059428  102a64 build-mipsel-linux/tcsh
 963536   11984   84088 1059608  102b18 build-mips-linux/tcsh
 892092   13364   84352  989808   f1a70 build-ppc-linux/tcsh
 878438   15800   89840  984078   f040e build-x86_64-linux/tcsh
[~/ellcc/test/src/tcsh] dev%

It is interesting that the Microblaze text section is so much larger than the rest: about 1.21 MB, compared with 0.84 to 0.96 MB for the other targets.
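Here is a small C sketch that puts numbers on that. It compares the Microblaze text size from the size output above with each of the other targets. The percent_larger() and report() helpers are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Text-section sizes copied from the size output above. */
struct text_size {
    const char *target;
    unsigned long text;
};

static const struct text_size sizes[] = {
    { "armeb",      852268 },
    { "arm",        850044 },
    { "i386",       836222 },
    { "microblaze", 1211936 },   /* index 3 */
    { "mipsel",     963356 },
    { "mips",       963536 },
    { "ppc",        892092 },
    { "x86_64",     878438 },
};

/* Hypothetical helper: how much bigger a is than b, in whole percent
   (integer division truncates). */
static unsigned long percent_larger(unsigned long a, unsigned long b)
{
    assert(a >= b && b > 0);
    return (a - b) * 100 / b;
}

/* Print how much larger the Microblaze text is than each other target's. */
static void report(void)
{
    const unsigned long mb = sizes[3].text;

    for (size_t i = 0; i < sizeof sizes / sizeof sizes[0]; i++) {
        if (sizes[i].text != mb)
            printf("microblaze vs %-6s: +%lu%%\n", sizes[i].target,
                   percent_larger(mb, sizes[i].text));
    }
}
```

By this arithmetic, the Microblaze text is about 25% larger than MIPS and more than 40% larger than ARM and i386.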

I’ll try to run one of the executables.

[~/ellcc/test/src/tcsh] dev% ~/ellcc/bin/qemu-ppc build-ppc-linux/tcsh
[~/ellcc/test/src/tcsh] dev% echo $version
tcsh 6.18.01 (Astron) 2012-02-14 (ppc-apple-linux) options wide,nls,dl,al,kan,rh,color,filec
[~/ellcc/test/src/tcsh] dev%

Sure enough, it is a ppc-apple-linux build (!?!). Fun.

Building Lua for ARM with ELLCC

Today someone mentioned Lua, the scripting language, and said it is a nice embeddable language that is well suited to small embedded systems. I decided to see how well ELLCC could build Lua for an ARM Linux target. You can build the ELLCC compiler package by following the instructions here.

I downloaded the latest Lua tarball from their download page and was off to the races. I extracted Lua into my ~/ellcc directory:

[~] dev% cd ~/ellcc
[~/ellcc] dev% tar xvfpz Downloads/lua-5.2.2.tar.gz

I made a few small changes to Lua’s configuration so that it compiles with ecc for ARM and no longer depends on readline:

[~] dev% diff -r -c lua-5.2.2 ellcc/lua-5.2.2/
diff -r -c lua-5.2.2/src/luaconf.h ellcc/lua-5.2.2/src/luaconf.h
*** lua-5.2.2/src/luaconf.h     2013-03-16 16:10:18.000000000 -0500
--- ellcc/lua-5.2.2/src/luaconf.h       2013-11-08 09:59:56.000000000 -0600
***************
*** 43,49 ****
  #if defined(LUA_USE_LINUX)
  #define LUA_USE_POSIX
  #define LUA_USE_DLOPEN                /* needs an extra library: -ldl */
! #define LUA_USE_READLINE      /* needs some extra libraries */
  #define LUA_USE_STRTODHEX     /* assume 'strtod' handles hex formats */
  #define LUA_USE_AFORMAT               /* assume 'printf' handles 'aA' specifiers */
  #define LUA_USE_LONGLONG      /* assume support for long long */
--- 43,49 ----
  #if defined(LUA_USE_LINUX)
  #define LUA_USE_POSIX
  #define LUA_USE_DLOPEN                /* needs an extra library: -ldl */
! // #define LUA_USE_READLINE   /* needs some extra libraries */
  #define LUA_USE_STRTODHEX     /* assume 'strtod' handles hex formats */
  #define LUA_USE_AFORMAT               /* assume 'printf' handles 'aA' specifiers */
  #define LUA_USE_LONGLONG      /* assume support for long long */
diff -r -c lua-5.2.2/src/Makefile ellcc/lua-5.2.2/src/Makefile
*** lua-5.2.2/src/Makefile      2012-12-27 04:51:43.000000000 -0600
--- ellcc/lua-5.2.2/src/Makefile        2013-11-08 21:34:36.494043682 -0600
***************
*** 6,14 ****
  # Your platform. See PLATS for possible values.
  PLAT= none
  
! CC= gcc
  CFLAGS= -O2 -Wall -DLUA_COMPAT_ALL $(SYSCFLAGS) $(MYCFLAGS)
  LDFLAGS= $(SYSLDFLAGS) $(MYLDFLAGS)
  LIBS= -lm $(SYSLIBS) $(MYLIBS)
  
  AR= ar rcu
--- 6,16 ----
  # Your platform. See PLATS for possible values.
  PLAT= none
  
! CC= /home/rich/ellcc/bin/ecc
  CFLAGS= -O2 -Wall -DLUA_COMPAT_ALL $(SYSCFLAGS) $(MYCFLAGS)
+ CFLAGS+= -target arm-ellcc-linux-eabi -mcpu=armv6z -mfpu=vfp -mfloat-abi=softfp
  LDFLAGS= $(SYSLDFLAGS) $(MYLDFLAGS)
+ LDFLAGS+= -target arm-ellcc-linux-eabi
  LIBS= -lm $(SYSLIBS) $(MYLIBS)
  
  AR= ar rcu
***************
*** 103,109 ****
  generic: $(ALL)
  
  linux:
!       $(MAKE) $(ALL) SYSCFLAGS="-DLUA_USE_LINUX" SYSLIBS="-Wl,-E -ldl -lreadline"
  
  macosx:
        $(MAKE) $(ALL) SYSCFLAGS="-DLUA_USE_MACOSX" SYSLIBS="-lreadline"
--- 105,111 ----
  generic: $(ALL)
  
  linux:
!       $(MAKE) $(ALL) SYSCFLAGS="-DLUA_USE_LINUX" SYSLIBS="-Wl,-E -ldl"
  
  macosx:
        $(MAKE) $(ALL) SYSCFLAGS="-DLUA_USE_MACOSX" SYSLIBS="-lreadline"
[~] dev% 

Then I just typed:

[~/ellcc] dev% cd lua-5.2.2/
[~/ellcc/lua-5.2.2] dev% make linux

I now have my ARM lua executable:

[~/ellcc/lua-5.2.2] dev% file src/lua
src/lua: ELF 32-bit LSB executable, ARM, version 1, statically linked, BuildID[sha1]=0x635737dc31e9d493f06b8f9fa5d2e7e3c1fe93ee, not stripped
[~/ellcc/lua-5.2.2] dev% 

I can run it with QEMU, since I use an x86_64 Linux box and don’t have ARM hardware handy:

[~/ellcc/lua-5.2.2] dev% ~/ellcc/bin/qemu-arm src/lua
Lua 5.2.2  Copyright (C) 1994-2013 Lua.org, PUC-Rio
> 

I then went to Lua’s live demo page and cut and pasted an example program:

> -- hello.lua
-- the first program in every language

io.write("Hello world, from ",_VERSION,"!\n")> > > 
Hello world, from Lua 5.2!
> 

Nice!

Using ELLCC to Build Itself, Including Canadian Cross Builds

The ELLCC tool chain can now completely build itself. Currently the build has been tested only on Linux systems. The steps I’ll use here are:

  1. Build ELLCC using the system compiler (bootstrap build)
  2. Build it again with itself (sanity build)
  3. Cross build it for another target machine (show off build)

ELLCC supports several target processors:

  • arm – little endian ARM
  • armeb – big endian ARM
  • i386 – x86 in 32 bit mode
  • microblaze – the Xilinx soft-core processor for FPGAs
  • mips – big endian Mips
  • mipsel – little endian Mips
  • ppc – the Power PC in 32 bit mode
  • x86_64 – x86 in 64 bit mode

The compiler we’ll be building can target all of them.

Change into a working directory and check out the latest ELLCC top of tree (TOT):

[~] dev% svn checkout http://ellcc.org/svn/ellcc/trunk ellcc

I check out into a directory called ellcc, but the name isn’t important. In this example, I checked out the files into my home directory. Now we can do the initial bootstrap build:

[~] dev% cd ellcc
[~/ellcc] dev% ./build

The build takes quite a while because it builds ecc (the clang/LLVM based C/C++ compiler), GNU binutils for the assemblers, linker, and utilities, GDB for debugging, and QEMU for all the target processors for testing purposes. It also builds a complete set of run-time libraries:

  • libc++ for the C++ standard library
  • libc++ABI for C++ run-time support
  • libunwind for C++ exception handling
  • musl for the C standard library
  • compiler-rt for low-level support routines
  • ncurses for terminal support
  • zlib for compression/decompression

All of these libraries are built for all the supported target processors.

When the bootstrap build is finished, you’ll get a message like this:

Please run the build script again to bootstrap ecc.
This may be done a few times:
1. ecc is built with itself (compiled with gcc) and libecc.
2. ecc is built with itself (compiled with itself) and libecc.

Running the build again builds the ELLCC tools with themselves to complete the bootstrap, so just run the same build command again:

[~/ellcc] dev% ./build

This time you won’t get the message above, since ELLCC has been completely bootstrapped.

Now for the fun part, a Canadian cross build. In this step, we’ll use our newly built compiler to build a compiler that runs on a different machine. For this kind of build, the ELLCC build rules don’t bother compiling QEMU, since it is only used for ELLCC development. We can also skip building the libraries, since they have already been built. I’ll do a build for ARM:

[~/ellcc] dev% ./build arm

When this build completes, we’ll have a new directory, bin-arm-linux, populated with the ARM executables:

[~/ellcc] dev% ls bin-arm-linux/
arm-elf-as*     ecc-gdb*      i386-elf-as*       llvm-extract*     macho-dump*
bugpoint*       ecc-gprof*    llc*               llvm-link*        microblaze-elf-as*
clang-check*    ecc-ld*       lli*               llvm-mc*          mips-elf-as*
clang-format*   ecc-ld.bfd*   lli-child-target*  llvm-mcmarkup*    not*
clang-tblgen*   ecc-nm*       llvm-ar*           llvm-nm*          opt*
ecc*            ecc-objcopy*  llvm-as*           llvm-objdump*     ppc64-elf-as*
ecc++@          ecc-objdump*  llvm-bcanalyzer*   llvm-ranlib@      ppc-elf-as*
ecc-addr2line*  ecc-ranlib*   llvm-config*       llvm-readobj*     sparc-elf-as*
ecc-ar*         ecc-readelf*  llvm-config-host*  llvm-rtdyld*      x86_64-elf-as*
ecc-as*         ecc-size*     llvm-cov*          llvm-size*
ecc-c++filt*    ecc-strings*  llvm-diff*         llvm-stress*
ecc-elfedit*    ecc-strip*    llvm-dis*          llvm-symbolizer*
ecc-embedspu*   FileCheck*    llvm-dwarfdump*    llvm-tblgen*

[~/ellcc] dev% file bin-arm-linux/ecc
bin-arm-linux/ecc: ELF 32-bit LSB executable, ARM, version 1, statically linked, BuildID[sha1]=0x9ff616a316ab010b46062f7fc1dff554ee7a6db8, not stripped
[~/ellcc] dev% ~/ellcc/bin/qemu-arm bin-arm-linux/ecc -v
clang version 3.4 (trunk)
Target: arm-unknown-linux-gnu
Thread model: posix
Selected GCC installation: 
[~/ellcc] dev%

Very cool!

A Major ELLCC Milestone: Building a Completely Non-GNU C++ Program for Linux

ELLCC (pronounced “elk”), the embedded compiler collection based on clang/LLVM, has reached a major milestone: the ability to build C++ programs for several target processors using only libraries with non-GNU licenses.

ELLCC incorporates a C/C++ compiler based on clang/LLVM (ecc). The currently supported target processors are ARM (both little and big endian), i386, Microblaze (the Xilinx soft-core processor), Mips (both big and little endian), PowerPC (32 bit only for now), and x86_64.

The first test case, using the LLVM lit test framework, looks pretty simple:

// Compile and run for every target.
// RUN: %armexx -o %t %s && %armrun %t  | FileCheck -check-prefix=CHECK %s
// RUN: %armebexx -o %t %s && %armebrun %t | FileCheck -check-prefix=CHECK %s
// RUN: %i386exx -o %t %s && %i386run %t | FileCheck -check-prefix=CHECK %s
// RUN: %microblazeexx -o %t %s && %microblazerun %t | FileCheck -check-prefix=CHECK %s
// RUN: %mipsexx -o %t %s && %mipsrun %t | FileCheck -check-prefix=CHECK %s
// RUN: %mipselexx -o %t %s && %mipselrun %t | FileCheck -check-prefix=CHECK %s
// RUN: %ppcexx -o %t %s && %ppcrun %t | FileCheck -check-prefix=CHECK %s
// FAIL: %ppc64exx -o %t %s && %ppc64run %t | FileCheck -check-prefix=CHECK %s
// RUN: %x86_64exx -o %t %s && %x86_64run %t | FileCheck -check-prefix=CHECK %s
// CHECK: foo.i = 10
// CHECK: bye
#include <cstdio>

class Foo {
    int i;
public:
    Foo(int i) : i(i) { }
    int get() { return i; }
    ~Foo() { printf("bye\n"); }
};

int main(int argc, char** argv)
{
    Foo foo(10);
    printf("foo.i = %d\n", foo.get());
}

It does look pretty simple, but it represents:

  • ecc, built from (nearly) current clang/LLVM sources
  • libc++ from the LLVM project built using ecc
  • libc++ABI from the LLVM project built using ecc
  • libunwind for handling non-static destructors and exceptions built using ecc
  • musl for the standard C library built using ecc
  • compiler-rt for low level processor support built using ecc

I have some more cleanup to do, but after that the next step will be a self-hosted ELLCC.

Tracking down a Microblaze problem.

Part of the installation instructions for the ELLCC cross development suite is compiling and running bzip2 for each target processor, as described here. This is working for all the current targets except Microblaze.

Part of the test is to run bzip2 against known files and compare the results to pre-compressed result files. The first test run for Microblaze generates an error:

Doing 6 tests (3 compress, 3 uncompress) ...
If there's a problem, things might stop at this point.
 
../../../bin/qemu-microblaze ./bzip2.microblaze -1  < sample1.ref > sample1.rb2

bzip2.microblaze: couldn't allocate enough memory
        Input file = (stdin), output file = (stdout)
make[1]: *** [test] Error 1

To try to debug the problem, I fired up QEMU in debug mode on the Microblaze binary:

dev% cd ~/ellcc/test/src/bzip2-1.0.6/
dev% ~/ellcc/bin/qemu-microblaze -g 1234 ./bzip2.microblaze -1 < sample1.ref > sample1.rb2

This makes QEMU listen on port 1234 for debugger commands. The bzip2 executable is held at its first instruction until a debugger connects and lets it run. Then I start up GDB to debug the program:

dev% ~/ellcc/bin/ecc-gdb bzip2.microblaze

Now I can connect to the program:

(gdb) target remote :1234
Remote debugging using :1234
0x100000f0 in _start ()
(gdb) 

The program has been paused at the symbol _start, which is the low-level entry point of all programs. For the Microblaze, it is in the file crt1.s. This entry point sets up the basic environment for the program and enters the startup code, which ultimately calls the main() function.

Looking at the bzip2 sources, I find the message “couldn’t allocate enough memory” in the outOfMemory() function in bzip2.c. To track down the problem, I need to find out where that call comes from. I’ll set a breakpoint there, continue, and see what happens:

(gdb) break outOfMemory
Breakpoint 1 at 0x10004db4: file bzip2.c, line 877.
(gdb)

From here I can use the GDB “continue” command (shortened to “c”) to get to the outOfMemory() breakpoint:

(gdb) c
Continuing.

Breakpoint 1, outOfMemory () at bzip2.c:877
877        showFileNames();
(gdb) 

Great! I’ll use the GDB “where” command to see the stack trace, and find out how the program got here:

(gdb) where
#0  outOfMemory () at bzip2.c:877
(gdb)

Not so great. For some reason there is no stack trace. This will make debugging this problem quite a bit harder. All I can find by looking at the source is that there is a call to malloc() that is failing.

I guess there are two options here: I could try to find out why there isn’t a stack trace, so that GDB can give me more information, or I could go after the malloc() problem directly. I’ll file a bug report on the GDB problem and try to track the malloc() bug down with the debug capabilities that I have.

I’m not sure if malloc() is failing in general, or if it is being given an invalid input argument. Let’s try restarting the program and setting a breakpoint on the call to malloc().

The call to malloc() is in a function called myMalloc():

(gdb) list myMalloc
1700
1701
1702    /*---------------------------------------------*/
1703    static 
1704    void *myMalloc ( Int32 n )
1705    {
1706       void* p;
1707
1708       p = malloc ( (size_t)n );
1709       if (p == NULL) outOfMemory ();
(gdb)

Let’s set a breakpoint on line 1708:

(gdb) break bzip2.c:1708
Breakpoint 1 at 0x10006e5c: file bzip2.c, line 1708.
(gdb) c
Continuing.

Breakpoint 1, myMalloc (n=268724702) at bzip2.c:1708
1708       p = malloc ( (size_t)n );
(gdb) print n
$1 = 268724702
(gdb) 

That doesn’t look good. We’re trying to allocate way too much memory. Maybe a disassembly and register dump will help:

(gdb) disas
Dump of assembler code for function myMalloc:
   0x10006e48 <+0>:     addik   r1, r1, -20
   0x10006e4c <+4>:     swi     r15, r1, 0
   0x10006e50 <+8>:     swi     r19, r1, 4
   0x10006e54 <+12>:    add     r19, r1, r0
   0x10006e58 <+16>:    swi     r5, r19, 24
=> 0x10006e5c <+20>:    addik   r1, r1, -8
   0x10006e60 <+24>:    imm     2
   0x10006e64 <+28>:    brlid   r15, -25884     // 0x10030948 
   0x10006e68 <+32>:    swi     r5, r19, 12
   0x10006e6c <+36>:    addik   r1, r1, 8
   0x10006e70 <+40>:    lwi     r5, r19, 12
   0x10006e74 <+44>:    swi     r3, r19, 8
   0x10006e78 <+48>:    bneid   r3, 28  // 0x10006e94 <$tmp596>
   0x10006e7c <+52>:    swi     r5, r19, 16
   0x10006e80 <+56>:    brid    8       // 0x10006e88 <$BB43_1>
   0x10006e84 <+60>:    or      r0, r0, r0
   0x10006e88 <+0>:     brlid   r15, -8472      // 0x10004d70 
   0x10006e8c <+4>:     addik   r1, r1, -4
   0x10006e90 <+8>:     addik   r1, r1, 4
   0x10006e94 <+0>:     lwi     r3, r19, 8
   0x10006e98 <+4>:     add     r1, r19, r0
   0x10006e9c <+8>:     lwi     r19, r1, 4
---Type <return> to continue, or q <return> to quit---q
Quit
(gdb) info regi
r0             0x0      0
r1             0xf6ffeb38       0xf6ffeb38
r2             0x0      0
r3             0x0      0
r4             0xf6ffecac       -150999892
r5             0x8      8
r6             0xf6ffeeb8       -150999368
r7             0x4      4
r8             0x0      0
r9             0x0      0
r10            0x1003e278       268690040
r11            0xae     174
r12            0xae     174
r13            0x0      0
r14            0x100305a4       268633508
r15            0x10006e14       268463636
r16            0x0      0
r17            0x0      0
r18            0x0      0
r19            0xf6ffeb38       -151000264
r20            0x0      0
r21            0x0      0
r22            0x0      0
---Type <return> to continue, or q <return> to quit---
Quit
(gdb) 

Interesting. Dusting off the Microblaze application binary interface doc, I see that:

  • parameters are passed to functions in registers r5-r10
  • r3 and r4 are used to return values from functions
  • r15 is the return address register
  • r1 is the stack pointer
  • r19 is the frame pointer

It looks as if eight bytes are wanted, because r5 contains 8. I suspect that the value printed for n is wrong because at this point n has not yet been saved to memory.

As an aside, the Microblaze has branch delay slots. The brlid instruction used to call malloc() is “branch and link immediate with delay”. The swi instruction following the brlid is in its delay slot: it executes along with the branch and is done by the time malloc() is entered. That swi is what saves n to memory, and because it hasn’t executed yet at the breakpoint, GDB shows a stale value for n.
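The ordering can be modeled in a few lines of C. This sketch is a toy interpreter with a made-up instruction set, not real Microblaze encodings. It relies on two behaviors shown in the listings in this post: brlid saves the address of the call in the link register, and the return (rtsd r15, 8) resumes two instructions past the call, skipping both the call and its delay slot. Both branches are modeled with one delay slot.

```c
#include <assert.h>
#include <stddef.h>

/* A toy instruction set, not real Microblaze encodings. */
enum op { OP_MARK, OP_CALL, OP_RET, OP_HALT };

struct insn {
    enum op op;
    int arg;            /* MARK: value to record; CALL: target index */
};

/*
 * Run prog with one branch delay slot: after a CALL or RET, the next
 * instruction still executes before control reaches the new address.
 * Like brlid, CALL saves its own index as the link; like rtsd r15, 8,
 * RET resumes at link + 2. Returns the number of MARK values recorded.
 */
static size_t run(const struct insn *prog, size_t len,
                  int *trace, size_t max_trace)
{
    size_t pc = 0, npc = 1, link = 0, n = 0;

    for (int steps = 0; pc < len && steps < 1000; steps++) {
        const struct insn in = prog[pc];
        size_t next = npc + 1;              /* default: fall through */

        switch (in.op) {
        case OP_MARK:
            if (n < max_trace)
                trace[n++] = in.arg;
            break;
        case OP_CALL:
            link = pc;                      /* brlid r15: save the call's address */
            next = (size_t)in.arg;          /* takes effect after the delay slot */
            break;
        case OP_RET:
            next = link + 2;                /* skip the call and its delay slot */
            break;
        case OP_HALT:
            return n;
        }
        pc = npc;                           /* delay slot (or next insn) runs next */
        npc = next;
    }
    return n;
}

static const struct insn demo[] = {
    { OP_CALL, 4 },     /* 0: call the "callee" */
    { OP_MARK, 1 },     /* 1: delay slot, like swi r5, r19, 12 */
    { OP_MARK, 2 },     /* 2: first instruction after the return */
    { OP_HALT, 0 },     /* 3 */
    { OP_MARK, 4 },     /* 4: callee body */
    { OP_RET,  0 },     /* 5: return */
    { OP_MARK, 6 },     /* 6: delay slot of the return */
};
```

The recorded order is 1, 4, 6, 2. Each delay-slot instruction runs before its branch takes effect, which is why a breakpoint placed before the call sees memory that the delay slot has not yet written.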

Let’s take a look at where myMalloc() is being called by disassembling at the return address contained in r15:

(gdb) disas 0x10006e14
Dump of assembler code for function mkCell:
   0x10006e00 <+0>:     addik   r1, r1, -12
   0x10006e04 <+4>:     swi     r15, r1, 0
   0x10006e08 <+8>:     swi     r19, r1, 4
   0x10006e0c <+12>:    add     r19, r1, r0
   0x10006e10 <+16>:    addik   r1, r1, -8
   0x10006e14 <+20>:    brlid   r15, 52 // 0x10006e48 
   0x10006e18 <+24>:    addik   r5, r0, 8       // 0x8
   0x10006e1c <+28>:    addik   r1, r1, 8
   0x10006e20 <+32>:    swi     r3, r19, 8
   0x10006e24 <+36>:    swi     r0, r3, 0
   0x10006e28 <+40>:    lwi     r3, r19, 8
   0x10006e2c <+44>:    swi     r0, r3, 4
   0x10006e30 <+48>:    lwi     r3, r19, 8
   0x10006e34 <+52>:    add     r1, r19, r0
   0x10006e38 <+56>:    lwi     r19, r1, 4
   0x10006e3c <+60>:    lwi     r15, r1, 0
   0x10006e40 <+64>:    rtsd    r15, 8
   0x10006e44 <+68>:    addik   r1, r1, 12
End of assembler dump.
(gdb)

Sure enough, the delay slot following the call to myMalloc() loads the first parameter register, r5, with 8.

Time to start stepping into malloc() to see what’s going on. Four stepi commands get us into malloc():

(gdb) stepi
0x10006e60      1708       p = malloc ( (size_t)n );
(gdb) stepi
0x10006e64      1708       p = malloc ( (size_t)n );
(gdb) stepi
0x10006e68      1708       p = malloc ( (size_t)n );
(gdb) stepi
malloc (n=0) at src/malloc/malloc.c:335
335     {
(gdb) 

A little stepping around in malloc() gives something suspicious:

(gdb) step
bin_index_up (x=268442384) at src/malloc/malloc.c:128
128     {
(gdb) step
129             x = x / SIZE_ALIGN - 1;
(gdb) step
130             if (x <= 32) return x;
(gdb) print x
$4 = 4143967032
(gdb) step
132     }
(gdb) print x
$5 = 4143967032

Maybe the divide is being done incorrectly? Let's disassemble bin_index_up():

(gdb) disas bin_index_up
Dump of assembler code for function bin_index_up:
   0x10030e10 <+0>:     addik   r1, r1, -40
   0x10030e14 <+4>:     swi     r15, r1, 0
   0x10030e18 <+8>:     swi     r19, r1, 4
   0x10030e1c <+12>:    add     r19, r1, r0
   0x10030e20 <+16>:    addik   r3, r0, 4       // 0x4
   0x10030e24 <+20>:    andi    r3, r3, 31
   0x10030e28 <+24>:    addik   r4, r5, 0
   0x10030e2c <+28>:    addk    r6, r3, r0
   0x10030e30 <+32>:    swi     r5, r19, 44
   0x10030e34 <+36>:    swi     r6, r19, 20
   0x10030e38 <+40>:    beqid   r3, 40  // 0x10030e60 <$BB2_5>
   0x10030e3c <+44>:    swi     r4, r19, 24
   0x10030e40 <+48>:    lwi     r3, r19, 20
   0x10030e44 <+52>:    lwi     r4, r19, 24
   0x10030e48 <+56>:    srl     r4, r4
   0x10030e4c <+60>:    addik   r3, r3, -1
   0x10030e50 <+64>:    addk    r5, r3, r0
   0x10030e54 <+68>:    swi     r5, r19, 20
   0x10030e58 <+72>:    bneid   r3, -24 // 0x10030e40 
   0x10030e5c <+76>:    swi     r4, r19, 24
   0x10030e60 <+0>:     lwi     r3, r19, 24
   0x10030e64 <+4>:     addik   r4, r0, -1
=> 0x10030e68 <+8>:     addk    r3, r3, r4
   0x10030e6c <+12>:    lwi     r4, r19, 44
   0x10030e70 <+16>:    swi     r4, r19, 12
   0x10030e74 <+20>:    swi     r3, r19, 12
   0x10030e78 <+24>:    addik   r5, r0, 32      // 0x20
   0x10030e7c <+0>:     cmpu    r3, r5, r3
   0x10030e80 <+4>:     bgtid   r3, 28  // 0x10030e9c <$tmp38>

 

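It turns out that SIZE_ALIGN is 16, so the "divide" is done with shifts: the loop from <+48> to <+76> shifts r4 right by one bit, four times. Stepping through bin_index_up() shows that the actual computation is 16 / 16 - 1, and the result is 0 as expected. The value GDB printed for x was a red herring. Time to dig further.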
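Here is a C sketch showing that the shift loop in the disassembly computes the same thing as x / SIZE_ALIGN - 1 when SIZE_ALIGN is 16. The function names are made up for this illustration. Only the arithmetic mirrors the first step of bin_index_up(), and the comments map each line to the instruction it stands for.

```c
#include <assert.h>
#include <stddef.h>

#define SIZE_ALIGN 16   /* the value found for musl's malloc here */

/* The first step of bin_index_up() as written in C. */
static size_t first_step_div(size_t x)
{
    return x / SIZE_ALIGN - 1;
}

/*
 * The same step as the generated code does it: SIZE_ALIGN is 2^4, so the
 * divide becomes a loop of four one-bit logical right shifts, and the
 * "- 1" becomes an add of -1.
 */
static size_t first_step_shifts(size_t x)
{
    unsigned count = 4;             /* addik r3, r0, 4 */
    while (count != 0) {            /* beqid / bneid on r3 */
        x >>= 1;                    /* srl r4, r4 */
        count--;                    /* addik r3, r3, -1 */
    }
    return x + (size_t)-1;          /* addik r4, r0, -1 ; addk r3, r3, r4 */
}
```

For x = 16, both versions return 0, which matches what stepping through the real code showed.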

After a little investigation it turns out that malloc() fails only for larger allocations. The musl library’s malloc() uses mmap() to allocate larger blocks of memory. The notable thing is that malloc() calls a function named __mmap() that takes six parameters, the last of which is of type off_t, a 64 bit value. That matters because the Microblaze ABI reserves six registers (r5-r10) for passing parameters, and any additional parameter data is passed on the stack by the caller. With a 64 bit sixth parameter, the arguments need seven 32-bit words, so one word has to go on the stack. Here is the disassembly of where __mmap() is called:

(gdb) disas 0x100309f8
Dump of assembler code for function $tmp5:
   0x100309cc <+0>:     addik   r1, r1, -32
   0x100309d0 <+4>:     swi     r0, r19, 28
   0x100309d4 <+8>:     addik   r7, r0, 3       // 0x3
   0x100309d8 <+12>:    addik   r8, r0, 34      // 0x22
   0x100309dc <+16>:    addik   r4, r0, -1
   0x100309e0 <+20>:    addk    r5, r0, r0
   0x100309e4 <+24>:    swi     r5, r19, 76
   0x100309e8 <+28>:    addk    r6, r3, r0
   0x100309ec <+32>:    addk    r9, r4, r0
   0x100309f0 <+36>:    lwi     r10, r19, 76
   0x100309f4 <+40>:    imm     0
   0x100309f8 <+44>:    brlid   r15, 8784       // 0x10032c48 <__mmap>
   0x100309fc <+48>:    swi     r4, r19, 80
   0x10030a00 <+52>:    addik   r1, r1, 32
   0x10030a04 <+56>:    swi     r3, r19, 52
   0x10030a08 <+60>:    lwi     r4, r19, 80
End of assembler dump.
(gdb) 

Here is the beginning of the __mmap() code:

(gdb) disas
Dump of assembler code for function __mmap:
   0x10032c48 <+0>:     addik   r1, r1, -92
   0x10032c4c <+4>:     swi     r15, r1, 0
   0x10032c50 <+8>:     swi     r19, r1, 4
   0x10032c54 <+12>:    add     r19, r1, r0
   0x10032c58 <+16>:    addik   r3, r0, -4096
   0x10032c5c <+20>:    and     r3, r10, r3
   0x10032c60 <+24>:    lwi     r4, r19, 120
=> 0x10032c64 <+28>:    andi    r11, r4, 4095
   0x10032c68 <+32>:    or      r3, r11, r3

Here are some of the register values at the point of failure:

(gdb) info regi
r0             0x0      0
r1             0xf6ffd3e4       0xf6ffd3e4
r2             0x0      0
r3             0x0      0
r4             0xf6ffd4d8       -151005992
r5             0x0      0
r6             0x62000  401408

The swi r0, r19, 28 in the caller and the lwi r4, r19, 120 in __mmap() are where the last parameter word is stored by the caller and read by the callee. Notice that the caller allocates 32 bytes on the stack as a save area for the parameters to be passed. That covers the six registers used to pass parameters (24 bytes), four more bytes for the second half of the 64 bit sixth parameter, and the four bytes at offset 0 that come before the parameter slots.
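The 32 bytes can be checked with a little arithmetic. The C sketch below computes the parameter layout under three assumptions drawn from the listings in this post. First, r5-r10 carry the first six 32-bit parameter words. Second, every parameter word has a 4-byte home slot starting at offset 4 of the caller’s outgoing area: myMalloc() stores r5 at r19 + 24 in a 20-byte frame, which is its caller’s r1 + 4. Third, a 64 bit off_t takes two consecutive words. The names are illustrative only and are not part of any ELLCC or musl API.

```c
#include <assert.h>

/* Where one 32-bit parameter word ends up under the assumed convention. */
struct arg_loc {
    int in_register;    /* 1: passed in a register, 0: only in memory */
    int reg;            /* r5..r10 when in_register, else -1 */
    int sp_offset;      /* home slot in the caller's outgoing area, from r1 */
};

#define FIRST_ARG_REG   5   /* r5 */
#define NUM_ARG_REGS    6   /* r5-r10 */
#define RESERVED_BYTES  4   /* offset 0 of the outgoing area is not a parameter slot */

/* Location of the word_index'th 32-bit parameter word (0-based). */
static struct arg_loc arg_location(int word_index)
{
    struct arg_loc loc;

    loc.in_register = word_index < NUM_ARG_REGS;
    loc.reg = loc.in_register ? FIRST_ARG_REG + word_index : -1;
    loc.sp_offset = RESERVED_BYTES + 4 * word_index;
    return loc;
}

/* Bytes the caller reserves below r1 for a call passing `words` words. */
static int outgoing_area_bytes(int words)
{
    return RESERVED_BYTES + 4 * words;
}
```

For __mmap(), the seven words are addr, len, prot, flags, fd, and the two halves of off. The last word therefore lands at r1 + 28 and the area is 32 bytes. mkCell()’s one-word call to myMalloc() gets the 8 bytes seen in its addik r1, r1, -8.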

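The problem is that the caller stores r0 (which always contains zero) at offset 28 from its frame pointer (r19), not at offset 28 from the stack pointer (r1). When __mmap() reads offset 120 from its own frame pointer, which is the caller’s r1 + 28, it gets an uninitialized value. Ouch. Since the andi r11, r4, 4095 that follows checks the low 12 bits of that word, __mmap() presumably sees a garbage, non-page-aligned offset and rejects it, which would explain why malloc() returns NULL.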
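The mismatch is easiest to see when both accesses are measured from the same reference point. This C sketch expresses addresses as byte offsets from the caller’s r1 at the moment of the call, using the constants from the two listings. It assumes that the caller’s r19 equals its r1 before the 32-byte adjustment, as in the mkCell() and myMalloc() prologues shown earlier.

```c
#include <assert.h>

/* Constants copied from the two disassembly listings. */
#define CALL_AREA     32    /* caller: addik r1, r1, -32 */
#define STORE_OFFSET  28    /* caller: swi r0, r19, 28 */
#define CALLEE_FRAME  92    /* __mmap: addik r1, r1, -92 ; add r19, r1, r0 */
#define LOAD_OFFSET   120   /* __mmap: lwi r4, r19, 120 */

/* All results are byte offsets from the caller's r1 at the call. */

/* Where the caller's store really went (assumes r19 == r1 + 32 here). */
static int caller_store_addr(void)
{
    int caller_r19 = CALL_AREA;
    return caller_r19 + STORE_OFFSET;
}

/* Where the seventh parameter word belongs: r1 + 28. */
static int intended_store_addr(void)
{
    return STORE_OFFSET;
}

/* Where __mmap() reads it: its r19 is the caller's r1 - 92. */
static int callee_load_addr(void)
{
    int callee_r19 = -CALLEE_FRAME;
    return callee_r19 + LOAD_OFFSET;
}
```

The callee reads r1 + 28, which is exactly the slot the seventh parameter word belongs in. The caller, however, wrote to r1 + 60.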

Time to go digging around in the Microblaze code generator. I'll follow up with another post when I've fixed it.

ELLCC is an easy to use Cross Compiler

One thing that I probably haven’t made clear enough is that ELLCC is designed to be an easy-to-use cross development environment. Let’s say you need to generate code that runs on various Linux targets. In the gcc world, you’d have to either build a separate copy of the gcc tool chain for each target or find pre-built binary packages for each target individually. Life is much simpler in the ELLCC world.

Let’s say you have a program you want to build:

#include <stdio.h>

int main()
{
    printf("hello world\n");
}

On a Linux host you can use ecc to build in the normal way:

[~] main% ecc hello.c
[~] main% ./a.out 
hello world
[~] main% file a.out
a.out: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, not stripped
[~] main%

Now, let’s say you want to target an ARM processor. The only difference is that you specify the target on the command line:

[~] main% ecc -target arm-ellcc-linux hello.c
[~] main% qemu-arm a.out
hello world
[~] main% file a.out
a.out: ELF 32-bit LSB executable, ARM, version 1, statically linked, not stripped
[~] main%

Same for the Power PC:

[~] main% ecc -target ppc-ellcc-linux hello.c
[~] main% qemu-ppc a.out
hello world
[~] main% file a.out
a.out: ELF 32-bit MSB executable, PowerPC or cisco 4500, version 1 (SYSV), statically linked, not stripped
[~] main%

Note that I’m using QEMU’s Linux user mode emulation to run the programs for non-native targets. The results would be the same if you were to download the executables to your favorite smart phone.
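When you are working with several targets, it can help to have a program report what it was built for. Here is a small C sketch that uses the compiler’s predefined architecture and byte-order macros. __x86_64__, __i386__, __arm__, __mips__, __powerpc__ and __BYTE_ORDER__ are predefined by both GCC and clang. __microblaze__ is an assumption for the Microblaze target.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Architecture this file was compiled for, from predefined macros.
   __microblaze__ is assumed, not verified against ecc. */
static const char *compiled_arch(void)
{
#if defined(__x86_64__)
    return "x86_64";
#elif defined(__i386__)
    return "i386";
#elif defined(__arm__)
    return "arm";
#elif defined(__mips__)
    return "mips";
#elif defined(__powerpc__)
    return "ppc";
#elif defined(__microblaze__)
    return "microblaze";
#else
    return "unknown";
#endif
}

/* Byte order as the compiler sees it. */
static int compiled_big_endian(void)
{
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    return 1;
#else
    return 0;
#endif
}

/* Byte order as the running machine (or QEMU) actually behaves. */
static int runtime_big_endian(void)
{
    uint32_t word = 1;
    unsigned char first;

    memcpy(&first, &word, 1);   /* lowest-addressed byte of the word */
    return first == 0;
}
```

Compiled with ecc -target ppc-ellcc-linux and run under qemu-ppc, compiled_arch() would be expected to return "ppc", and the two byte-order checks should both report big-endian.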

ELLCC and the musl Standard C Library

ELLCC, my clang/LLVM based cross development tool chain for ARM, Microblaze, Mips, Power PC, and x86, now incorporates musl as its standard C library for Linux. musl is an MIT-licensed, highly POSIX-compliant library offering high performance and a small footprint. I spent several weeks evaluating musl before deciding to use it in ELLCC. The clarity and consistency of its code base and the quality of its design convinced me that musl would be an ideal addition to ELLCC. If you’re looking for a non-GPL library solution, I highly recommend musl.

Using ELLCC to Cross Debug an ARM Application

I’m looking at replacing my Linux port of the NetBSD standard library with musl, another library with a BSD-like license. For the past couple of days I’ve been doing a feasibility study on musl, running it through my x86_64 regression tests, and things look very good.

Today I decided to test ARM, and the first regression test failed. If you’re not familiar with ELLCC, it is a cross development tool chain that uses ecc (based on clang/LLVM) as the compiler. As part of my regression testing, I compile some of the NetBSD user-land utilities and run them using QEMU. When I ran the test for cat, it failed:

cat ../../../../../src/bin/cat/testinput | ./cat | cmp ../../../../../src/bin/cat/testinput || exit 1
stdout: Bad file number
cmp: EOF on -

Strange. How could stdout have a bad file number? I simplified the test case and found that

~/ellcc/bin/qemu-arm cat < ../../../../../src/bin/cat/testinput

also failed.

I decided to fire up gdb on the simplified test. To do this, I started QEMU with the option to listen for the debugger on port 1234.

~/ellcc/bin/qemu-arm -g 1234 cat < ../../../../../src/bin/cat/testinput

In another window, I started the debugger:

[~/ellcc/test/obj/musl/linux/bin/cat] main% ~/ellcc/bin/ecc-gdb cat
GNU gdb (GDB) 7.4
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu".
For bug reporting instructions, please see:
...
Reading symbols from /home/rich/ellcc/test/obj/musl/linux/bin/cat/cat...done.
(gdb) set arch arm
The target architecture is assumed to be arm
(gdb) target remote :1234
Remote debugging using :1234
[New Remote target]
[Switching to Remote target]
0x0000807c in _start ()
(gdb) break cat.c:294
(gdb) break cat.c:310
Breakpoint 2 at 0x8cec: file ../../../../../src/bin/cat/cat.c, line 310.
(gdb) c
Continuing.

Breakpoint 1, raw_cat (rfd=0) at ../../../../../src/bin/cat/cat.c:294
294 wfd = fileno(stdout);
(gdb) next
295 if (buf == NULL) {
(gdb) print wfd
$1 = 1
(gdb) c
Continuing.

Breakpoint 2, raw_cat (rfd=0) at ../../../../../src/bin/cat/cat.c:310
310 if ((nw = write(wfd, buf + off, (size_t)nr)) < 0)
(gdb) print wfd
$2 = 0
(gdb)

Well, that is certainly a puzzling result! What could have changed wfd? Looking at the source of cat, the only thing between the two breakpoints that could have done it is the call to fstat(). What if the struct stat definition doesn't match what QEMU (or even ARM Linux) thinks it should be? If the kernel writes more bytes than the library's struct stat holds, the excess spills into whatever sits next to sbuf on the stack, and it is quite possible that wfd is right there.

Let's check that hypothesis. I'll set a breakpoint right at the fstat() call:

(gdb) set arch arm
The target architecture is assumed to be arm
(gdb) target remote :1234
Remote debugging using :1234
[New Remote target]
[Switching to Remote target]
0x0000807c in _start ()
(gdb) break cat.c:298
Breakpoint 1 at 0x8c40: file ../../../../../src/bin/cat/cat.c, line 298.
(gdb) c
Continuing.

Breakpoint 1, raw_cat (rfd=0) at ../../../../../src/bin/cat/cat.c:298
298 if (fstat(wfd, &sbuf) == 0 &&
(gdb) print wfd
$1 = 1
(gdb) next
303 if (buf == NULL) {
(gdb) print wfd
$2 = 0
(gdb) print sbuf
$3 = {st_dev = 10, __st_dev_padding = 0, __st_ino_truncated = 8, st_mode = 8576, st_nlink = 1, st_uid = 500,
st_gid = 5, st_rdev = 34821, __st_rdev_padding = 0, st_size = 0, st_blksize = 0, st_blocks = 1024,
st_atim = {tv_sec = 0, tv_nsec = 0}, st_mtim = {tv_sec = 1338126579, tv_nsec = 0}, st_ctim = {
tv_sec = 1338126579, tv_nsec = 0}, st_ino = 1336216504}
(gdb)

This is interesting. The sbuf structure looks like it was filled in incorrectly. st_nlink is 1, which is right for stdout, and st_uid is 500, which is my user ID. But st_blksize should be 1024, and that value ended up in st_blocks; st_atim (the file access time) is empty; and st_ino should be 8, like __st_ino_truncated, yet it holds 1336216504, which looks more like a timestamp than an inode number. It looks like the struct stat definition musl uses for ARM is incorrect.

I snooped around a little and found the problem. struct stat was defined as:

struct stat
{
    dev_t st_dev;
    int __st_dev_padding;
    long __st_ino_truncated;
    mode_t st_mode;
    nlink_t st_nlink;
    uid_t st_uid;
    gid_t st_gid;
    dev_t st_rdev;
    int __st_rdev_padding;
    off_t st_size;
    blksize_t st_blksize;
    blkcnt_t st_blocks;
    struct timespec st_atim;
    struct timespec st_mtim;
    struct timespec st_ctim;
    ino_t st_ino;
};

It turned out that some padding was missing. I changed the definition to:

struct stat
{
    dev_t st_dev;
    int __st_dev_padding;
    long __st_ino_truncated;
    mode_t st_mode;
    nlink_t st_nlink;
    uid_t st_uid;
    gid_t st_gid;
    dev_t st_rdev;
    int __st_rdev_padding[2];   /* was a single int */
    off_t st_size;
    blksize_t st_blksize;
    int __st_rdev_padding2[1];  /* added: padding before st_blocks */
    blkcnt_t st_blocks;
    struct timespec st_atim;
    struct timespec st_mtim;
    struct timespec st_ctim;
    ino_t st_ino;
};

and voila! The cat was happy again.

Breakpoint 1, raw_cat (rfd=0) at ../../../../../src/bin/cat/cat.c:298
298 if (fstat(wfd, &sbuf) == 0 &&
(gdb) next
303 if (buf == NULL) {
(gdb) print sbuf
$1 = {st_dev = 10, __st_dev_padding = 0, __st_ino_truncated = 8, st_mode = 8576, st_nlink = 1, st_uid = 500,
st_gid = 5, st_rdev = 34821, __st_rdev_padding = {0, 0}, st_size = 0, st_blksize = 1024,
__st_rdev_padding2 = {0}, st_blocks = 0, st_atim = {tv_sec = 1338132494, tv_nsec = 0}, st_mtim = {
tv_sec = 1338132494, tv_nsec = 0}, st_ctim = {tv_sec = 1336216504, tv_nsec = 0}, st_ino = 8}

Augmenting the Test Suite

ELLCC has a few simple (for now) regression tests that are run after every new build. See this page for more information. A primary goal of ELLCC is to support a POSIX environment for both Linux and standalone (bare metal) embedded systems. To that end, I’ve recently started looking at open source test suites that check for POSIX compliance. One that I’ve found is the Open POSIX Test Suite.

I’ve looked at it a bit, and it seems to be a good starting point for testing the ELLCC libraries. Unfortunately, it hasn’t been updated since 2005, so it is a little out of date.

I’ve reworked it a little so it plays better with a more modern Linux release (Fedora 16). I’m currently building the test suite with GCC and plan to build it with the ecc compiler as a next step. (When building this way, ecc uses the host’s header files and libraries, so it is much like using clang/LLVM directly.) After that, I plan to build and test using libecc, the ELLCC standard libraries based on the NetBSD sources. If you’d like to follow the progress, I’m keeping the updated source in the ELLCC source repository.