| JEMALLOC(3) | Library Functions Manual | JEMALLOC(3) |
jemalloc, malloc.conf - the default system allocator
Standard C Library (libc, -lc)
const char * _malloc_options;
The jemalloc allocator is a general-purpose
concurrent malloc(3)
implementation specifically designed to be scalable on modern
multi-processor systems. It is the default user space system allocator in
NetBSD.
When the first call is made to one of the memory
allocation routines such as
malloc() or
realloc(),
various flags that affect the workings of the allocator are set or reset.
These are described below.
The “name” of the file referenced by the symbolic
link named /etc/malloc.conf, the value of the
environment variable MALLOC_OPTIONS, and the string
pointed to by the global variable _malloc_options will
be interpreted, in that order, character by character as flags.
Most flags are single letters. Uppercase letters indicate that the behavior is set, or on, and lowercase letters mean that the behavior is not set, or off. The following options are available.
[...] madvise() system call.
J  [...] malloc(), realloc() will be initialized to 0xa5. All memory returned by free(), realloc() will be initialized to 0x5a. This is intended for debugging and will impact performance negatively.
[...] NULL pointer instead of a valid pointer. (The default behavior is to make a minimal allocation and return a pointer to it.) This option is provided for System V compatibility. This option is incompatible with the X option.
X  [...] stderr and cause the program to drop core (using abort(3)). This option should be set at compile time by including the following in the source code:
_malloc_options = "X";
Z  [...] malloc(), realloc() will be initialized to 0. Note that this initialization only happens once for each byte, so realloc() does not zero memory that was previously allocated. This is intended for debugging and will impact performance negatively.
Extra care should be taken when enabling any of the options in production environments. The A, J, and Z options are intended for testing and debugging. An application which changes its behavior when these options are used is flawed.
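The flag rules described above (three sources read in order, each string interpreted character by character, uppercase turning a behavior on and lowercase turning it off) can be sketched as follows. This is an illustration only, not the allocator's own code: the struct, its field names, and both functions are hypothetical, and only the A, J, X, and Z letters mentioned on this page are handled.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical option record covering only flags named on this page. */
struct example_opts {
	bool abort_on_warning;	/* A */
	bool junk_fill;		/* J */
	bool abort_on_failure;	/* X */
	bool zero_fill;		/* Z */
};

/*
 * Apply one flag string, character by character.  Uppercase turns a
 * behavior on, lowercase turns it off.  Characters this sketch does not
 * know about are skipped.
 */
static void
example_apply_flags(struct example_opts *o, const char *flags)
{
	if (flags == NULL)
		return;
	for (const char *p = flags; *p != '\0'; p++) {
		bool on = isupper((unsigned char)*p) != 0;

		switch (toupper((unsigned char)*p)) {
		case 'A': o->abort_on_warning = on; break;
		case 'J': o->junk_fill = on; break;
		case 'X': o->abort_on_failure = on; break;
		case 'Z': o->zero_fill = on; break;
		default: break;
		}
	}
}

/*
 * Apply the three sources in the documented order: the name the
 * /etc/malloc.conf link points to, then MALLOC_OPTIONS, then
 * _malloc_options.  Because each string is simply processed after the
 * previous one, a later source can undo a flag set by an earlier one.
 */
static void
example_apply_all(struct example_opts *o, const char *link_name,
    const char *env_value, const char *compiled_in)
{
	example_apply_flags(o, link_name);
	example_apply_flags(o, env_value);
	example_apply_flags(o, compiled_in);
}
```

With the inputs "AJ", "j", and "X", this sketch ends with A and X on and J off, since the lowercase j from the second source clears what the first source set.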
The jemalloc allocator uses multiple
arenas in order to reduce lock contention for threaded programs on
multi-processor systems. This works well with regard to threading
scalability, but incurs some costs. There is a small fixed per-arena
overhead, and additionally, arenas manage memory completely independently of
each other, which means a small fixed increase in overall memory
fragmentation. These overheads are not generally an issue, given the number
of arenas normally used. Note that using substantially more arenas than the
default is not likely to improve performance, mainly due to reduced cache
performance. However, it may make sense to reduce the number of arenas if an
application does not make much use of the allocation functions.
Memory is conceptually broken into equal-sized chunks, where the chunk size is a power of two that is greater than the page size. Chunks are always aligned to multiples of the chunk size. This alignment makes it possible to find metadata for user objects very quickly.
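To illustrate why this alignment makes metadata lookup cheap: when the chunk size is a power of two and chunks are aligned to it, the chunk containing any address can be found with a single mask. The sketch below assumes a 1 MiB chunk size purely for illustration; the macro and function names are hypothetical.

```c
#include <stdint.h>

/* Assumed for illustration only: 1 MiB chunks (a power of two). */
#define EXAMPLE_CHUNK_SIZE	((uintptr_t)1 << 20)

/* Round an address down to the start of the chunk that contains it. */
static uintptr_t
example_chunk_base(uintptr_t addr)
{
	return addr & ~(EXAMPLE_CHUNK_SIZE - 1);
}

/* Byte offset of an address within its chunk. */
static uintptr_t
example_chunk_offset(uintptr_t addr)
{
	return addr & (EXAMPLE_CHUNK_SIZE - 1);
}
```

No search is involved: the cost is one bitwise operation regardless of how many chunks exist.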
User objects are broken into three categories according to size: small, large, and huge.
Small and large objects are managed by arenas; huge objects are managed separately in a single data structure that is shared by all threads. Huge objects are used by applications infrequently enough that this single data structure is not a scalability issue.
Each chunk that is managed by an arena tracks its contents in a page map as runs of contiguous pages (unused, backing a set of small objects, or backing one large object). The combination of chunk alignment and chunk page maps makes it possible to determine all metadata regarding small and large allocations in constant time.
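A constant-time lookup of this kind might look like the sketch below. It assumes 1 MiB chunks and 4 KiB pages, and it keeps the page map in a separate hypothetical structure for simplicity; the real allocator's layout is not described on this page, so every name here is illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed sizes, for illustration only. */
#define EXAMPLE_CHUNK_SHIFT	20	/* 1 MiB chunks */
#define EXAMPLE_PAGE_SHIFT	12	/* 4 KiB pages */
#define EXAMPLE_PAGES_PER_CHUNK	\
	(1u << (EXAMPLE_CHUNK_SHIFT - EXAMPLE_PAGE_SHIFT))

/* The three page states named in the text. */
enum example_page_use {
	EXAMPLE_UNUSED,		/* page is free */
	EXAMPLE_SMALL_RUN,	/* page backs a set of small objects */
	EXAMPLE_LARGE_RUN	/* page backs one large object */
};

/* Hypothetical per-chunk page map: one entry per page. */
struct example_chunk_map {
	enum example_page_use use[EXAMPLE_PAGES_PER_CHUNK];
};

/* Index of the page that holds addr, relative to its chunk. */
static size_t
example_page_index(uintptr_t addr)
{
	uintptr_t offset =
	    addr & (((uintptr_t)1 << EXAMPLE_CHUNK_SHIFT) - 1);

	return (size_t)(offset >> EXAMPLE_PAGE_SHIFT);
}

/* Constant-time lookup: one mask, one shift, one array access. */
static enum example_page_use
example_lookup(const struct example_chunk_map *map, uintptr_t addr)
{
	return map->use[example_page_index(addr)];
}
```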
Small objects are managed in groups by page runs. Each run maintains a bitmap that tracks which regions are in use. Allocation requests can be grouped as follows.
Allocations are packed tightly together, which can be an issue for multi-threaded applications. If you need to assure that allocations do not suffer from cache line sharing, round your allocation requests up to the nearest multiple of the cache line size.
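The rounding suggested above can be sketched as follows. The 64-byte cache line size is an assumption; the correct value depends on the target CPU. Both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed cache line size; use the real value for the target CPU. */
#define EXAMPLE_CACHE_LINE	((size_t)64)

/* Round size up to the nearest multiple of the cache line size. */
static size_t
example_round_to_cache_line(size_t size)
{
	return (size + EXAMPLE_CACHE_LINE - 1) & ~(EXAMPLE_CACHE_LINE - 1);
}

/* Allocate with the request rounded up, refusing sizes that would wrap. */
static void *
example_padded_malloc(size_t size)
{
	if (size > SIZE_MAX - (EXAMPLE_CACHE_LINE - 1))
		return NULL;
	return malloc(example_round_to_cache_line(size));
}
```

A caller would use example_padded_malloc() in place of malloc() for objects that are written frequently by different threads, and release them with free() as usual.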
The first thing to do is to set the A option. This option forces a coredump (if possible) at the first sign of trouble, rather than the normal policy of trying to continue if at all possible.
It is probably also a good idea to recompile the program with suitable options and symbols for debugger support.
If the program starts to give unusual results, coredump or generally behave differently without emitting any of the messages mentioned in the next section, it is likely because it depends on the storage being filled with zero bytes. Try running it with the Z option set; if that improves the situation, this diagnosis has been confirmed. If the program still misbehaves, the likely problem is accessing memory outside the allocated area.
Alternatively, if the symptoms are not easy to reproduce, setting the J option may help provoke the problem. In truly difficult cases, the U option, if supported by the kernel, can provide a detailed trace of all calls made to these functions.
Unfortunately, jemalloc does not provide
much detail about the problems it detects; the performance impact for
storing such information would be prohibitive. There are a number of
allocator implementations available on the Internet which focus on detecting
and pinpointing problems by trading performance for extra sanity checks and
detailed diagnostics.
The following environment variables affect the execution of the allocation functions:
MALLOC_OPTIONS
If the environment variable MALLOC_OPTIONS is set,
the characters it contains will be interpreted as flags to the allocation
functions.
To dump core whenever a problem occurs:
ln -s 'A' /etc/malloc.conf
To specify in the source that a program does no return value checking on calls to these functions:
_malloc_options = "X";
If any of the memory allocation/deallocation functions detect an
error or warning condition, a message will be printed to file descriptor
STDERR_FILENO. Errors will result in the process
dumping core. If the A option is set, all warnings are
treated as errors.
The _malloc_message variable allows the
programmer to override the function which emits the text strings forming the
errors and warnings if for some reason the stderr
file descriptor is not suitable for this. Please note that doing anything
which tries to allocate memory in this function is likely to result in a
crash or deadlock.
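A replacement message function that respects this restriction might look like the sketch below, which uses only write(2) and never calls stdio or the allocator. The four-string signature is an assumption and is not stated on this page, so check the declaration of _malloc_message on the target system before assigning such a function to it. The names example_log_fd, example_write_str, and example_message are hypothetical.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical destination descriptor; standard error by default. */
static int example_log_fd = STDERR_FILENO;

/* Write one string using write(2) only: no stdio, no allocation. */
static void
example_write_str(int fd, const char *s)
{
	size_t left = strlen(s);

	while (left > 0) {
		ssize_t n = write(fd, s, left);

		if (n <= 0)
			return;	/* give up on any error */
		s += n;
		left -= (size_t)n;
	}
}

/* Assumed hook shape: four string fragments emitted in order. */
static void
example_message(const char *p1, const char *p2, const char *p3,
    const char *p4)
{
	example_write_str(example_log_fd, p1);
	example_write_str(example_log_fd, p2);
	example_write_str(example_log_fd, p3);
	example_write_str(example_log_fd, p4);
}
```

Keeping the descriptor in a plain static variable lets the program redirect messages, for example to a log file opened at startup, without the hook itself doing any work that could allocate.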
All messages are prefixed by
“⟨progname⟩:
(malloc)”.
emalloc(3), malloc(3), memory(3), memoryallocators(9)
Jason Evans, A Scalable Concurrent malloc(3) Implementation for FreeBSD, http://people.freebsd.org/~jasone/jemalloc/bsdcan2006/jemalloc.pdf, April 16, 2006, BSDCan 2006.
Poul-Henning Kamp, Malloc(3) revisited, Proceedings of the FREENIX Track: 1998 USENIX Annual Technical Conference, USENIX Association, http://www.usenix.org/publications/library/proceedings/usenix98/freenix/kamp.pdf, June 15-19, 1998.
Paul R. Wilson, Mark S. Johnstone, Michael Neely, and David Boles, Dynamic Storage Allocation: A Survey and Critical Review, University of Texas at Austin, ftp://ftp.cs.utexas.edu/pub/garbage/allocsrv.ps, 1995.
The jemalloc allocator became the default
system allocator first in FreeBSD 7.0 and then in
NetBSD 5.0. In both systems it replaced the older
so-called “phkmalloc” implementation.
Jason Evans <jasone@canonware.com>
| June 21, 2011 | NetBSD 11.0 |