This avoids unnecessarily copying the canary when doing a realloc from a
small size to a large size. It also avoids trying to copy a non-existent
canary out of a zero-size allocation, which are memory protected.
This avoids making a huge number of getrandom system calls during
initialization. The init CSPRNG is unmapped before initialization
finishes and these are still reseeded from the OS. The purpose of the
independent CSPRNGs is simply to avoid the massive performance hit of
synchronization and there's no harm in doing it this way.
Keeping around the init CSPRNG and reseeding from it would defeat the
purpose of reseeding, and it isn't a measurable performance issue since
it can just be tuned to reseed less often.
The tiny performance cost might as well be accepted now because this
will be needed for Android Q. It's also quite possible that some apps
make use of the features based on this including malloc_info.
The initial implementation was a temporary hack rather than a serious
implementation of random arena selection. It may still make sense to
offer it but it should be implemented via the CSPRNG instead of this
silly hack. It would also make sense to offer dynamic load balancing,
particularly with sched_getcpu().
This results in a much more predictable spread across arenas. This is
one place where randomization probably isn't a great idea because it
makes the benefits of arenas unpredictable in programs not creating a
massive number of threads. The security benefits of randomization for
this are also quite small. It's not certain that randomization is even a
net win for security since it's not random enough and can result in a
more interesting mix of threads in the same arena for an attacker if
they're able to attempt multiple attacks.
This wouldn't be worth using even if it had a whole bunch of heuristics
like ignoring expressions in static_assert, ignoring repeated patterns
like assigning different things to sequential array indexes, etc.
This further aligns the code style with the rest of the project and
fixes the clang-tidy readability-isolate-declaration lint triggered by
declaring all of these variables together.
This extends the size class scheme used for slab allocations to large
allocations. This drastically improves performance for many real world
programs using incremental realloc growth instead of using proper growth
factors. There are 4 size classes for every doubling in size, resulting
in a worst case of ~20% extra virtual memory being reserved and a huge
increase in performance for pathological cases. For example, growing
from 4MiB to 8MiB by calling realloc in increments of 32 bytes will only
need to do work beyond looking up the size 4 times instead of 1024 times
with 4096 byte granularity.