Adding Networking Support to ELK: Part 2

In a previous post I described how I started to add LwIP networking support to ELK, along with some of the design decisions that guide ELK development. In this post, I’ll describe how socket-related system calls are added to ELK and how they interface with the LwIP network stack and the rest of the ELK modules.

A minimal source file that allows LwIP to be brought in to ELK at link time looks like this:

#include "config.h"
#include "kernel.h"
#include "lwip/tcpip.h"

// Make LwIP networking a select-able feature.
FEATURE_CLASS(lwip_network, network)

// Define socket related system calls.
ELK_CONSTRUCTOR()
{
}

// Start up LwIP.
C_CONSTRUCTOR()
{
  tcpip_init(0, 0);
}

Almost all symbols in an ELK select-able module are static. The only external linkage symbols in this module are defined by the FEATURE_CLASS() macro. In this case, the symbols __elk_lwip_network and __elk_feature_network are defined. The first symbol is used to pull this module in at link time. The second symbol is used to cause an error if another (currently non-existent) network module is linked in at the same time.

ELK uses two phases of constructor functions to perform system initialization. ELK_CONSTRUCTOR() functions are called at system start up before the C library is initialized. These functions are typically used to initialize system call definitions. C_CONSTRUCTOR() functions are normal constructors, called after the C library has been initialized but before main(). No system calls are defined in this example, but the LwIP initialization function is called in the C_CONSTRUCTOR() phase.

There are several system calls that are specific to sockets and networking. I’ll create stub functions for all of them now. Even though I’m concentrating on getting ELK running on an ARM target right now, I always build ELK for all targets. My first attempt to create a stub handler for accept4() failed to compile for the i386:

#include <sys/socket.h>

#include "config.h"
#include "kernel.h"
#include "syscalls.h"
#include "lwip/tcpip.h"
#include "crt1.h"

// Make LwIP networking a select-able feature.
FEATURE_CLASS(lwip_network, network)

static int sys_accept4(int sockfd, struct sockaddr *addr, socklen_t *addrlen,
                       int flags)
{
  return -ENOSYS;
}

// Define socket related system calls.
ELK_CONSTRUCTOR()
{
  SYSCALL(accept4);
}

// Start up LwIP.
C_CONSTRUCTOR()
{
  tcpip_init(0, 0);
}

It turns out that the i386 socket calls (and perhaps other targets) all go through one system call called SYS_socketcall. When I modified my source file like this:

#include <sys/socket.h>

#include "config.h"
#include "kernel.h"
#include "syscalls.h"
#include "lwip/tcpip.h"
#include "crt1.h"

// Make LwIP networking a select-able feature.
FEATURE_CLASS(lwip_network, network)

static int sys_accept4(int sockfd, struct sockaddr *addr, socklen_t *addrlen,
                       int flags)
{
  return -ENOSYS;
}

#ifdef SYS_socketcall
static int sys_socketcall(int call, unsigned long *args)
{
  long arg[6];

  // Get the call arguments.
  copyin(arg, args, sizeof(arg));

  switch (call) {
  case __SC_accept4:
    return sys_accept4(arg[0], (struct sockaddr *)arg[1],
                       (socklen_t *)arg[2], arg[3]);

  default:
    return -ENOSYS;
  }
}
#endif

// Define socket related system calls.
ELK_CONSTRUCTOR()
{
#ifndef SYS_socketcall
  SYSCALL(accept4);
#else
  SYSCALL(socketcall);
#endif
}

// Start up LwIP.
C_CONSTRUCTOR()
{
  tcpip_init(0, 0);
}

I’ll follow a similar pattern for the rest of the system calls. The result, with all the socket functions stubbed in, is here. Note that another little wrinkle is that at least some of the Linux ports, in this case the x86_64 port, don’t have the recv() and send() system calls. They use recvfrom() and sendto() instead.

Now I’ll start adding meat to the empty system call framework. I’ll start with the socket() system call, since it is the only way to get a socket, and several design decisions need to be made about how it will interface with the rest of ELK. Normal LwIP socket descriptors index into a static array of structures holding the state of open sockets. This means that all threads that use sockets share a single socket namespace, and that namespace is separate from the normal ELK file descriptor namespace. That is the first thing that has to change. To do that, I have to integrate LwIP sockets into the ELK VFS (Virtual File System) module. That is, creating a LwIP socket should result in the creation of a socket vnode, and all subsequent operations on the socket should go through that vnode. The beauty of this approach is that socket descriptors and file descriptors will share the same namespace, and other operations that should be legal on sockets, like read() and write() (and select() when it gets implemented), will just work on all types of file descriptors as long as the low level support exists in the vnode implementation.

As I was delving into the implementation of the socket system call handling code, I realized that LwIP only implements the AF_INET and AF_INET6 domains. Since I’d like to support other domains, especially AF_UNIX, I decided to split the LwIP interface code out of the socket system call handling code. If I can set up the interfaces correctly, this should also allow other networking stacks to be dropped in in place of LwIP. So now I have network.c with the generic system call handling code and lwip_network.c with the LwIP interface glue.

Now’s the time to add error handling code to the system call handlers. This is really a stalling tactic while I’m thinking about how to integrate sockets into the existing virtual file system framework, but I suspect I’ll get some insights as I flesh out the generic code. After some thought and feverish coding, I came up with something that feels reasonable. I implemented the getsockopt() and setsockopt() system calls, which started giving me a better idea of what I need for socket integration. I’ve started to implement unix_network.c, which will implement the AF_UNIX (AF_LOCAL) domain. It looks like most of the code for the socket interface will be in network.c, with callbacks to the individual domain handlers where the semantics differ between domains. One of the interesting parts of the evolving design is that support for the various domain handlers can be specified at link time and eventually will be available as loadable modules.

Now I have much of the socket infrastructure done. I can create a socket file and open it, and use the socket() and socketpair() system calls. setsockopt() and getsockopt() are implemented and can be used to manipulate information in a generic socket structure. Although socket files exist in the normal virtual file system namespace, I had to add support for vnodes that don't live in that namespace for non-file sockets. All socket operations go through a vnode, however, and socket file descriptors and regular file descriptors are indistinguishable. My first little test program looks like this:

[~/ellcc/examples/socket] dev% cat main.c
/* Simple socket tests.
 */
#include <sys/socket.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>

int main(int argc, char **argv)
{
  int sv[2];
  int s = socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
  if (s < 0) {
    printf("socketpair() failed: %s\n", strerror(errno));
    exit(1);
  }

  s = write(sv[0], "hello world\n", sizeof( "hello world\n"));
  if (s < 0) {
    printf("write() failed: %s\n", strerror(errno));
  }
  char buffer[100];
  s = read(sv[1], buffer, 1);
  if (s < 0) {
    printf("read() failed: %s\n", strerror(errno));
  }

  s = mknod("/socket", S_IFSOCK|S_IRWXU, 0);
  if (s < 0) {
    printf("mknod() failed: %s\n", strerror(errno));
  }

  int fd = open("/socket", O_RDWR);
  if (fd < 0) {
    printf("open() failed: %s\n", strerror(errno));
  }
  s = read(fd, buffer, 1);
  if (s < 0) {
    printf("read() failed: %s\n", strerror(errno));
  }
}

Here is the result of running the program:

[~/ellcc/examples/socket] dev% make run
Preprocessing elkconfig.cfg
Compiling main.c
Linking socket
Running socket
enter 'control-A x' to exit QEMU
audio: Could not init `oss' audio driver
write() failed: Protocol not supported
read() failed: Protocol not supported
read() failed: Protocol not supported

Not bad for a day's work. The errors on the read() and write() calls are expected because I haven't yet implemented the read and write buffers for AF_UNIX sockets, but the fact that the error returned is EPROTONOSUPPORT shows that the socket infrastructure is working as it's supposed to.

I've taken a little break to think about how I'd like to implement socket buffers. I think I'd like a design that

  • Allocates buffers a page (usually 4K) at a time.
  • Is coded to be shared between the different socket domains.
  • Expands and contracts as needed.
  • Is as simple as possible.

I'm thinking that one approach would be to use two arrays of page pointers, each empty initially. The empty arrays would reside in the socket structure and be limited in size by a kernel compile time constant, maybe 64 entries each. This would allow a maximum 262,144 bytes for each of the buffers given 4K pages. The size would be controlled by the send and receive buffer size socket options, so sockets could be configured to use less memory in a memory constrained system.

It's been a busy day. I've implemented socketpair() and bind() for AF_UNIX sockets. The buffering scheme seems to be working well, but I have to think a bit more about how and when the buffer size can be reduced. I'm currently implementing listen() so I can move on to connect() and accept(). The nice thing about concentrating on AF_UNIX sockets first is that it is giving me a good feel for what I'll need to implement AF_INET using the LwIP stack.

It turns out that listen() is interesting. It is supposed to set up a backlog queue of pending connections, but what should that look like? I can see what it might look like for AF_UNIX sockets, but it is unclear what it should look like for remote connections. I guess my plan of implementing AF_UNIX first needs a slight diversion: I'm going to switch back to LwIP and see what a connection queue looks like there to try to come up with a common solution.

It turns out that interfacing LwIP with the current interface is pretty easy. I've had to make one change to the LwIP sources so far. I had to make the type of the socket member of the netconn structure definable at compile time. I needed this because sockets need to be represented by their socket structure pointer, not by a simple integer socket descriptor since socket file descriptors share the same descriptor namespace in different processes. I changed the declaration of the socket member in lwip/api.h to

#if LWIP_SOCKET
  LWIP_SOCKET_TYPE(socket);
#endif /* LWIP_SOCKET */

and added this definition in lwip/opt.h:

#ifndef LWIP_SOCKET_TYPE
#define LWIP_SOCKET_TYPE(name)          int name
#endif

In my lwipopts.h file I overrode the definition with

#define LWIP_SOCKET_TYPE(name) struct { int name; void *priv; }

This anonymous structure replaces the previous definition of "int socket;" with both an int and a pointer. I use the pointer to keep the higher level socket pointer for lwip_interface.c.

I've finished the first phase of LwIP integration. A few simple tests work, like in examples/socket/main.c. Next comes more extensive testing. I'm pretty happy with the way both AF_UNIX and AF_INET handling is implemented. It looks like it will be easy to drop in different stacks for these protocols or for other protocols as necessary.

Part 3 of this little saga will be about adding an Ethernet driver, so I can talk to something beyond 127.0.0.1.