Monthly Archives: February 2014

Interesting Linux Kernel Bug Uncovered by musl

ELLCC uses the very cool musl standard C library as a replacement for the normal Linux standard library. The latest version of ELLCC was not able to compile itself on an x86_64 Fedora 20 Linux system, and I was stumped for a while trying to track down the problem. It was weird: self-hosting worked on a 32-bit Linux system (Fedora 19) but failed on a 64-bit system. Furthermore, self-hosting failed only when ELLCC was compiled with itself and linked with musl, not when it was compiled with itself and linked with glibc.

Fortunately, Rich Felker of musl fame shed some light on the issue. Here’s an edited IRC log:

04:34:00 PM - rdp: OK. malloc fails on my x86_64 linux after about 65527 4K allocations with musl malloc(). glibc malloc() doesn't, probably because it reverts to mmap() if brk fails. Yet I don't see any resource limits set. The glibc brk() also fails after about 64K allocations.
04:37:52 PM - dalias: rdp, oh, we've seen this before
04:37:57 PM - dalias: it's a kernel bug with some optional kernel feature
04:38:21 PM - dalias: it keeps the kernel from merging adjacent vma's, so you end up with 64k pages each as their own tiny vma
04:38:36 PM - rdp: Excellent.
04:38:38 PM - dalias: it would happen if we used mmap too
04:38:50 PM - dalias: the reason it doesn't affect glibc is that they allocate huge amounts at a time
04:38:58 PM - rdp: Ah.
04:38:59 PM - dalias: and thereby waste memory if the program doesn't actually need much
04:39:17 PM - dalias: i'll try to find the option
04:39:21 PM - rdp: Any work around?
04:39:25 PM - rdp: OK. Thanks.
04:40:20 PM - dalias: CONFIG_MEM_SOFT_DIRTY
04:40:23 PM - dalias: turn it off
04:40:28 PM - dalias: there might be a way to do it at runtime
04:40:42 PM - dalias: or you could increase the limit on # of vma's
04:40:50 PM - dalias: but basically this option wastes MASSIVE amounts of ram
04:40:57 PM - dalias: by refusing to merge vma's
04:41:47 PM - dalias: it's a hack to make process checkpointing (save and restore running processes) more efficient
04:42:01 PM - dalias: by better tracking what has changed
04:42:50 PM - dalias: i don't see a way to turn it off
04:42:55 PM - dalias: check /proc/$pid/maps tho
04:43:08 PM - dalias: you should see a separate line for each page (i.e. 64k lines)
04:43:16 PM - dalias: if this is the issue that's affecting you
04:43:39 PM - rdp: I do.
04:43:51 PM - dalias: ok then this is the issue
04:43:57 PM - dalias: you can just up the limit if you want
04:44:04 PM - dalias: /proc/sys/vm/max_map_count
04:44:09 PM - dalias: but again this is expensive
04:44:16 PM - dalias: you want to disable CONFIG_MEM_SOFT_DIRTY
04:44:21 PM - dalias: and we really need to report this bug to the kernel folks
04:44:25 PM - dalias: i don't think they're aware of it
04:45:11 PM - rdp: dalias: Thanks.
04:46:47 PM - rdp: dalias: is it x86_64 specific? Not on i386?
04:49:13 PM - dalias: rdp, i think it may be
04:50:26 PM - dalias:
04:50:28 PM - feepbot: Analyzing cause of performance regression with different kernel version - Stack Overflow
04:51:38 PM - dalias: the accepted answer tracked down the cause of the soft_dirty bug and seems to cover how to fix it
04:53:02 PM - rdp: gotta love stackoverflow
05:47:19 PM - dalias: rdp, haha with regard to that SO answer:
05:47:26 PM - dalias: Finally fixed in Linux 3.13.3 and Linux 3.12.11, released 2014-02-13. – osgx 21 hours ago
05:57:32 PM - rdp: dalias: :-)
07:00:00 PM - dalias: rdp, i think it would be worth adding the issue you had to the faq on the wiki
07:00:51 PM - dalias: with a link to the stack overflow question/answer and information that it's fixed in 3.13.3, and that you can work around it by turning off CONFIG_MEM_SOFT_DIRTY (good fix) or increasing max_map_count (expensive fix)
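
The check Rich suggests in the log (look for a separate line per page in /proc/$pid/maps) can be automated. Here is a minimal sketch in Python, assuming a Linux /proc filesystem and 4K pages. The helper names count_vmas, single_page_vmas and read_maps are made up for illustration and are not part of musl or ELLCC.

```python
from pathlib import Path


def count_vmas(maps_text: str) -> int:
    """Count mappings in the text of a /proc/<pid>/maps file (one VMA per line)."""
    return sum(1 for line in maps_text.splitlines() if line.strip())


def single_page_vmas(maps_text: str, page_size: int = 4096) -> int:
    """Count mappings that cover exactly one page (page_size is an assumption)."""
    count = 0
    for line in maps_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The first field is the address range, e.g. "01000000-01001000" (hex).
        start, end = (int(addr, 16) for addr in fields[0].split("-"))
        if end - start == page_size:
            count += 1
    return count


def read_maps(pid: str = "self") -> str:
    """Read the maps file of a process; requires Linux and suitable permissions."""
    return Path(f"/proc/{pid}/maps").read_text()
```

For a process hit by this problem, count_vmas(read_maps(pid)) should come out close to the value in /proc/sys/vm/max_map_count, with most of the mappings being single pages.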

For now, I got around the problem by using Rich’s expensive fix option (as superuser):

echo 128000 > /proc/sys/vm/max_map_count
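
Before raising the limit it can be useful to see what it currently is. On the kernels I know of the default is 65530, which lines up with the roughly 65527 allocations in the log, but treat that number as an assumption about your system. A small sketch in Python; parse_limit, headroom and read_limit are hypothetical helper names.

```python
from pathlib import Path

# Same file that the echo command writes to.
MAX_MAP_COUNT = Path("/proc/sys/vm/max_map_count")


def parse_limit(text: str) -> int:
    """Parse the contents of max_map_count, a single decimal number."""
    return int(text.strip())


def headroom(vma_count: int, limit: int) -> int:
    """Number of additional mappings a process can create before hitting the limit."""
    return limit - vma_count


def read_limit(path: Path = MAX_MAP_COUNT) -> int:
    """Read the current system-wide limit; requires Linux."""
    return parse_limit(path.read_text())
```

With the numbers from this post, headroom(65527, 65530) leaves room for only 3 more mappings, while headroom(65527, 128000) leaves 62473.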

Why didn’t ELLCC linked with glibc fail? Somebody considered it a bug at one point, but the glibc maintainers disagreed, I guess.
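Rich pointed out that my guess about why the glibc malloc() doesn’t fail (that it falls back to mmap() when brk fails) is probably wrong. As he says in the log, the same thing would happen with mmap; glibc avoids it by allocating much larger amounts at a time. Either way, it is still a kernel bug.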