On a pure allocation pattern, the old malloc started to take a bad
performance hit on the search that precedes an sbrk, because of the
wilderness preservation problem: we search through every block in the
free list before the sbrk, even though none of them will satisfy the
request, since they're probably all little chunks left over from the
last sbrk chunk. So our performance degrades to that of the 4.1
malloc. Yech. If we preserved the wilderness, we wouldn't have these
little chunks. That's a good enough reason to reverse the philosophy
and cut blocks from the start of a free block, not the end.

To minimize pointer munging, the block pointer p has to point to the
end of the block (i.e. the end tag), so that cutting from the start
leaves p, and the free-list links stored just below it, where they
are. NEXT becomes (p-1)->next, PREV becomes (p-2)->prev, and the start
tag becomes (p + 1 - p->size)->size, since a block of p->size words
whose last word is p begins at p + 1 - p->size. The free logic is
reversed: merging with the preceding block moves the end tag, and so
the free-list entry, to the freed block and needs a pointer shuffle,
while merging with the following block keeps that block's end tag and
doesn't. (This has the additional advantage that realloc is more likely
to grow into a free area, since the remainder of the block it was cut
from lies right after it.)

Did this. malloc and free stayed about as big and complex, but realloc
and memalign simplified a fair bit. Now much faster on the pure
allocation pattern, and on any pattern where allocation dominates,
since there is less fragmentation. Also much less wastage.
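A minimal sketch of the end-tag layout, to make the pointer arithmetic concrete. It is not the code in malloc.c: the one-word `Word` union, the `INUSE` bit kept in the top of each tag (so the start tag is found from the masked size rather than the raw `p->size`), the three-word list header, and the names `carve_front`, `alloc_words`, `free_blk` and `init_arena` are all hypothetical, and the first-fit search is only an assumption. Sizes are in words and include both tags. It shows that carving from the front rewrites only two tags, and that in free only the preceding-block merge moves a list entry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical one-word cell: either a boundary tag or a free-list link. */
typedef union word {
    size_t size;              /* block size in words (tags included) | INUSE */
    union word *next;
    union word *prev;
} Word;

#define INUSE    (~(~(size_t)0 >> 1))     /* hypothetical: top bit = allocated */
#define SIZE(p)  ((p)->size & ~INUSE)
/* p always points at the END tag of a block. */
#define NEXT(p)  (((p) - 1)->next)        /* forward link: word below end tag */
#define PREV(p)  (((p) - 2)->prev)        /* backward link: two words below   */
#define START(p) ((p) + 1 - SIZE(p))      /* start tag: first word of block   */
#define MINFREE  4                        /* start tag, prev, next, end tag   */

/* Write size n (plus the in-use bit) into both tags of the block ending at p. */
static void set_tags(Word *p, size_t n, size_t inuse) {
    p->size = n | inuse;
    (p + 1 - n)->size = n | inuse;
}

static void link_after(Word *h, Word *p) {    /* insert p after h */
    NEXT(p) = NEXT(h);
    PREV(p) = h;
    PREV(NEXT(h)) = p;
    NEXT(h) = p;
}

static void unlink_blk(Word *p) {
    NEXT(PREV(p)) = NEXT(p);
    PREV(NEXT(p)) = PREV(p);
}

/* Carve n words off the START of free block p. The remainder keeps end tag p,
   so its links and list position are untouched: only two tags change. */
static Word *carve_front(Word *p, size_t n) {
    size_t rest = SIZE(p) - n;
    Word *blk = START(p) + n - 1;             /* end tag of allocated piece */
    assert(rest >= MINFREE);
    set_tags(blk, n, INUSE);
    set_tags(p, rest, 0);
    return blk;
}

/* Hypothetical first-fit search; a real malloc would sbrk on failure. */
static Word *alloc_words(Word *h, size_t n) {
    if (n < MINFREE) n = MINFREE;             /* a freed block must hold links */
    for (Word *p = NEXT(h); p != h; p = NEXT(p)) {
        size_t sz = SIZE(p);
        if (sz < n) continue;
        if (sz - n >= MINFREE) return carve_front(p, n);
        unlink_blk(p);                        /* too small to split: take it all */
        set_tags(p, sz, INUSE);
        return p;
    }
    return NULL;
}

/* Free the block ending at p, coalescing with free neighbours. */
static void free_blk(Word *h, Word *p) {
    size_t n = SIZE(p);
    Word *before = START(p) - 1;              /* end tag of preceding block */
    int prev_free = !(before->size & INUSE);
    int next_free = !((p + 1)->size & INUSE); /* start tag of following block */

    if (prev_free)
        n += SIZE(before);
    if (next_free) {                          /* following end tag stays: no shuffle */
        Word *after = p + SIZE(p + 1);
        n += SIZE(after);
        if (prev_free)
            unlink_blk(before);               /* two list entries become one */
        set_tags(after, n, 0);
    } else if (prev_free) {                   /* end tag moves to p: shuffle links */
        NEXT(p) = NEXT(before);
        PREV(p) = PREV(before);
        NEXT(PREV(p)) = p;
        PREV(NEXT(p)) = p;
        set_tags(p, n, 0);
    } else {
        set_tags(p, n, 0);
        link_after(h, p);
    }
}

/* a[0] and a[len-1] are INUSE sentinels; one free block lies between them.
   h is the last word of a header: Word head[3]; h = &head[2]. */
static Word *init_arena(Word *a, size_t len, Word *h) {
    Word *p = &a[len - 2];
    a[0].size = INUSE;
    a[len - 1].size = INUSE;
    NEXT(h) = h;
    PREV(h) = h;
    set_tags(p, len - 2, 0);
    link_after(h, p);
    return p;
}
```

For example, after `Word arena[64], head[3]; Word *h = &head[2]; init_arena(arena, 64, h);`, a call to `alloc_words(h, 10)` returns a block occupying `arena[1]` through `arena[10]`, while the remainder's end tag stays at `&arena[62]` and never leaves the free list.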

Also trimmed down the search loop in malloc.c, making it much smaller,
simpler, and a mite faster.
