What is ELLCC Exactly?
ELLCC is a set of tools and files that let you create programs. A set of tools and files like this is often called a toolchain. The programs you create can run on a PC or on a development board like the Raspberry Pi. You run ELLCC on a host. A host is a computer that typically has a keyboard and display that you can use to write and build your programs. When you build your program, you can tell ELLCC what your target is. A target is a computer on which you’ll run your program. Often, the host and target are the same, and that’s the default way ELLCC works: you write, build, and run your program on one computer.
ELLCC can also work as a cross compiler. A cross compiler, or cross toolchain, can be used to write programs on one computer that will be deployed on another, typically a small system with limited resources. A great example is the Raspberry Pi mentioned above. While you can run ELLCC on the Pi, building large programs will take much longer than if you were to build them on a larger PC. If the host and target are both running Linux, you can even build and run the program on the host for testing and send it to the target when you’re reasonably confident that it is going to work.
Toolchain Basics
When you write a program you’re typically writing it in what’s called a high level language. A high level language is a human readable language like C or C++.1 Before you can run your program on the target, it has to be translated into something that the target can understand, which is often called machine code. Machine code is a string of zeros and ones that is placed in the target’s memory and then executed. Machine code is made up of instructions, which are very low level bit patterns that the target understands. The machine code of different targets is usually very different. This means that the toolchain has to know the details of each target’s instruction set to translate the high level language to the target’s instructions.
There are two main steps that a typical toolchain uses to make an executable program out of high level program source files: compiling and linking. These steps are described in the following sections.
Compiling a Source File
The first step is to translate each source file into a relocatable object file, usually named after the source file with a .o extension, so that main.c is translated into main.o. A relocatable object file is a file containing the machine code that implements the functions that the source file contains. In addition, the relocatable object file contains information about what functions and variables the source file defines and uses, by name. This information is kept in a part of the .o file called the symbol table. The “relocatable” part of the name means that there is also information in the file that allows the machine code to be placed anywhere in the target’s memory.
Here’s an example with hello.c:
[test@main ~]$ cat hello.c
#include <stdio.h>
int main()
{
    printf("hello world\n");
}
[test@main ~]$ ecc -c hello.c
[test@main ~]$
The -c option tells ecc to compile hello.c to an object file which by default will be named hello.o. You can examine the contents of hello.o with tools that come with ELLCC. The first one we’ll try is ecc-nm, which prints out symbol table information:
[test@main ~]$ ecc-nm hello.o
0000000000000000 T main
                 U printf
[test@main ~]$
This shows that hello.o has two symbols in its table: main and printf. The 0000000000000000 is the offset of the symbol main in the object file and the T says main is defined and is in the text section. There are three main sections in an object file. The text section contains executable code and read only data, the data section contains writable initialized data, and the bss section contains the global and static variables that are not explicitly initialized. The symbol main is in the text section since it names a function that will be executed. Notice that the printf symbol does not have an offset and is marked with a U. This means that printf is not defined by the hello.o object file but is needed by it.
You can see the machine code in the object file by disassembling it with ecc-objdump:
[test@main ~]$ ecc-objdump -d hello.o
hello.o:        file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <main>:
   0: 55                      push   %rbp
   1: 48 89 e5                mov    %rsp,%rbp
   4: 48 83 ec 10             sub    $0x10,%rsp
   8: 48 bf 00 00 00 00 00    movabs $0x0,%rdi
   f: 00 00 00
  12: b0 00                   mov    $0x0,%al
  14: e8 00 00 00 00          callq  19 <main+0x19>
  19: 31 c9                   xor    %ecx,%ecx
  1b: 89 45 fc                mov    %eax,-0x4(%rbp)
  1e: 89 c8                   mov    %ecx,%eax
  20: 48 83 c4 10             add    $0x10,%rsp
  24: 5d                      pop    %rbp
  25: c3                      retq
[test@main ~]$
Notice that the symbol main is at the beginning of the object file and that the instruction at offset 14 is the call to printf. The e8 is the callq opcode and the four zero bytes after it are where the relative offset to printf will be placed when it is finally known.
As mentioned earlier, the machine code for different targets is usually very different, which is why we need a cross compiler to build a program that will run on a target that is different from our host. Here’s an example of cross building hello.c for a 64-bit ARM target and running ecc-objdump on the object file:
[test@main ~]$ ecc -c hello.c -target arm64v8-linux
[test@main ~]$ ecc-objdump -d hello.o
hello.o:        file format elf64-littleaarch64
Disassembly of section .text:
0000000000000000 <main>:
   0: d10083ff    sub   sp, sp, #0x20
   4: a9017bfd    stp   x29, x30, [sp, #16]
   8: 910043fd    add   x29, sp, #0x10
   c: 90000000    adrp  x0, 0 <main>
  10: 91000000    add   x0, x0, #0x0
  14: 94000000    bl    0 <printf>
  18: 2a1f03e8    mov   w8, wzr
  1c: b81fc3a0    stur  w0, [x29, #-4]
  20: 2a0803e0    mov   w0, w8
  24: a9417bfd    ldp   x29, x30, [sp, #16]
  28: 910083ff    add   sp, sp, #0x20
  2c: d65f03c0    ret
[test@main ~]$
Linking Object Files into a Program
The second step in building a program is to take all the object files and put them together to make a program. The linker does this job. The linker takes all the object files in your program, looks for any unresolved symbols, like printf in the example above, and tries to resolve them. The linker resolves symbols by first looking at all the object files in your program to see what symbols are defined and using those definitions. What happens if the linker can’t resolve all the symbols in the object files you’ve created? That’s where libraries come in.
An include file, like stdio.h in the example above, is just a file that declares things that you use in your programs so the compiler can check that you’re using them correctly. Some include files, like stdio.h, are provided by a toolchain to declare things that are available in libraries that come with the toolchain. A library is just a collection of object files that provide the definitions of functions like printf so your program can use them. When your program’s object files have unresolved references, the linker will look in one or more libraries to see if the needed symbols can be resolved. If so, the definitions in the libraries will be used to resolve the symbols.
Most of the time you don’t run the linker directly. A common way to build a program is to use the compiler to run the linker, like this:
[test@main ~]$ ecc -c hello.c
[test@main ~]$ ecc -o hello hello.o
[test@main ~]$ ./hello
hello world
[test@main ~]$
When you provide .o files to the compiler it sends them to the linker with instructions on which libraries to use. You can see this by invoking ecc with the -v option:
[test@main ~]$ ecc -o hello hello.o -v
ecc version 2017-07-29 (http://ellcc.org) based on clang version 6.0.0 (trunk 309487)
Target: x86_64-ellcc-linux
Thread model: posix
InstalledDir: /home/test/ellcc/bin
 "/home/test/ellcc/bin/ecc-ld" -nostdlib -L/home/test/ellcc/libecc/lib/x86_64-linux -m elf_x86_64 --build-id --hash-style=gnu --eh-frame-hdr --gc-sections --defsym __dso_handle=42 -o hello -e _start -Bdynamic -dynamic-linker /home/test/ellcc/libecc/lib/x86_64-linux/libc.so /home/test/ellcc/libecc/lib/x86_64-linux/Scrt1.o hello.o -( -lc -lcompiler-rt -)
[test@main ~]$
Notice that the linker, ecc-ld, is invoked with a bunch of options. The -L option tells the linker where to look for the libraries for the target and the -l options tell the linker which libraries to search. -lc says to search the standard C library where printf and all the other standard C functions reside. If you use #include files that declare functions not in the standard C library, you may have to add -l options to the ecc command line. For example, using #include <curses.h> and the functions it declares would require -lcurses at link time:
[test@main ~]$ ecc -o hello hello.o -lcurses -v
ecc version 2017-07-29 (http://ellcc.org) based on clang version 6.0.0 (trunk 309487)
Target: x86_64-ellcc-linux
Thread model: posix
InstalledDir: /home/test/ellcc/bin
 "/home/test/ellcc/bin/ecc-ld" -nostdlib -L/home/test/ellcc/libecc/lib/x86_64-linux -m elf_x86_64 --build-id --hash-style=gnu --eh-frame-hdr --gc-sections --defsym __dso_handle=42 -o hello -e _start -Bdynamic -dynamic-linker /home/test/ellcc/libecc/lib/x86_64-linux/libc.so /home/test/ellcc/libecc/lib/x86_64-linux/Scrt1.o hello.o -lcurses -( -lc -lcompiler-rt -)
[test@main ~]$
ecc places any -l options you give it after any object files on the command line but before the standard C library’s -lc. The placement is important because the linker processes object files and libraries in command line order in a single pass. If the curses library were placed after the standard library and curses used functions from the standard library that were not already resolved, the linker would not go back and try to resolve them. It would just report them as unresolved.2
Running ecc-objdump on the executable program gives:
[test@main ~]$ ecc-objdump -d hello
hello:  file format elf64-x86-64
Disassembly of section .plt:
0000000000400430 <.plt>:
  400430: ff 35 d2 0b 20 00       pushq  0x200bd2(%rip)        # 601008 <_GLOBAL_OFFSET_TABLE_+0x8>
  400436: ff 25 d4 0b 20 00       jmpq   *0x200bd4(%rip)        # 601010 <_GLOBAL_OFFSET_TABLE_+0x10>
  40043c: 0f 1f 40 00             nopl   0x0(%rax)
0000000000400440 <printf@plt>:
  400440: ff 25 d2 0b 20 00       jmpq   *0x200bd2(%rip)        # 601018 <printf>
  400446: 68 00 00 00 00          pushq  $0x0
  40044b: e9 e0 ff ff ff          jmpq   400430 <.plt>
0000000000400450 <__libc_start_main@plt>:
  400450: ff 25 ca 0b 20 00       jmpq   *0x200bca(%rip)        # 601020 <__libc_start_main>
  400456: 68 01 00 00 00          pushq  $0x1
  40045b: e9 d0 ff ff ff          jmpq   400430 <.plt>
Disassembly of section .text:
0000000000400460 <_start>:
  400460: 48 31 ed                xor    %rbp,%rbp
  400463: 48 89 e7                mov    %rsp,%rdi
  400466: 48 8d 35 13 0a 20 00    lea    0x200a13(%rip),%rsi        # 600e80 <_DYNAMIC>
  40046d: 48 83 e4 f0             and    $0xfffffffffffffff0,%rsp
  400471: e8 00 00 00 00          callq  400476 <_start_c>
0000000000400476 <_start_c>:
  400476: 50                      push   %rax
  400477: 8b 37                   mov    (%rdi),%esi
  400479: 48 8d 57 08             lea    0x8(%rdi),%rdx
  40047d: 48 8d 3d 1c 00 00 00    lea    0x1c(%rip),%rdi        # 4004a0 <main>
  400484: 48 8b 0d 65 0b 20 00    mov    0x200b65(%rip),%rcx        # 600ff0 <_init>
  40048b: 4c 8b 05 66 0b 20 00    mov    0x200b66(%rip),%r8        # 600ff8 <_fini>
  400492: 45 31 c9                xor    %r9d,%r9d
  400495: e8 b6 ff ff ff          callq  400450 <__libc_start_main@plt>
  40049a: 66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
00000000004004a0 <main>:
  4004a0: 55                      push   %rbp
  4004a1: 48 89 e5                mov    %rsp,%rbp
  4004a4: 48 83 ec 10             sub    $0x10,%rsp
  4004a8: 48 bf c6 04 40 00 00    movabs $0x4004c6,%rdi
  4004af: 00 00 00
  4004b2: b0 00                   mov    $0x0,%al
  4004b4: e8 87 ff ff ff          callq  400440 <printf@plt>
  4004b9: 31 c9                   xor    %ecx,%ecx
  4004bb: 89 45 fc                mov    %eax,-0x4(%rbp)
  4004be: 89 c8                   mov    %ecx,%eax
  4004c0: 48 83 c4 10             add    $0x10,%rsp
  4004c4: 5d                      pop    %rbp
  4004c5: c3                      retq
[test@main ~]$
Notice that the code at main has been relocated to address 4004a0 and edited by the linker at addresses 4004a8 and 4004b4, where the address of the string and the relative offset to printf@plt have been filled in. In this example, hello has been linked to use the standard C library as a shared library. A shared library is a library that is loaded along with the program at run time by the dynamic linker. That’s why the code in the .plt section exists. If the program had been linked with the -static option, hello would have been created as a statically linked file. In that case the ecc-objdump output would have been much larger, as it would contain the code for the printf function and any functions that printf called.
Other interesting code in the output is at the beginning of the .text section with the symbol _start. This is called the startup code. The startup code changes depending on the environment in which the program will be run. In this case, the startup code is pulled in from the file Scrt1.o, which is the startup code for a shared library program on Linux. It was specified on the linker command line above. In addition, the -e _start option given on the link line tells the linker to mark the _start symbol as the program entry point.
A simplified description of what the startup code is doing here is that it makes sure the stack pointer is aligned and calls __libc_start_main to initialize the C library, execute any constructors, and finally call main.