Primary memory (RAM) is not treated as a single block by the Linux kernel. Instead, it is divided into zones, which live inside nodes. The zones vary depending on the architecture; we will consider only 64-bit systems for learning purposes. Nodes are how NUMA (Non-Uniform Memory Access) is implemented in Linux. As a quick approximation, think of one node per CPU socket. Each node is, in turn, divided into zones.
Some of the zones are:
DMA – Exists for historical reasons. This zone cannot address anything beyond the first 16 MiB of RAM.
DMA32 – Exists only in 64-bit Linux. It covers the memory below 4 GiB (above the DMA zone) and is there for devices that can only use 32-bit addresses when doing DMA. Remember the 4 GB limitation of a non-PAE kernel? It is the same limit: 32 bits of address can reach at most 4 GB.
Normal – On 64-bit systems, whatever RAM lies beyond the first 4 GiB goes to this zone, and almost all kernel operations use memory from it.
Zones are used in the preferential order Normal > DMA32 > DMA. If the Normal zone has free memory, the kernel takes it from there; if it has none, the kernel falls back to DMA32, and then to DMA.
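The fallback order can be pictured with a toy model. This is only a sketch of the preference order described above: the function name pick_zone and the page counts are made up for illustration, and the real allocator also weighs watermarks, reserves, and allocation flags, which are ignored here.

```python
# Toy model of zone fallback; NOT the kernel's real allocator logic.
FALLBACK_ORDER = ["Normal", "DMA32", "DMA"]


def pick_zone(free_pages, pages_needed, fallback_order=FALLBACK_ORDER):
    """Return the first zone in preference order with enough free pages, or None."""
    for zone in fallback_order:
        if free_pages.get(zone, 0) >= pages_needed:
            return zone
    return None


# Hypothetical free-page counts per zone.
free = {"Normal": 0, "DMA32": 5000, "DMA": 3000}
print(pick_zone(free, 16))  # Normal is exhausted here, so the request falls back
```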
Quite a lot of detail about the system's nodes, zones, and the state of their memory can be found in the proc files /proc/pagetypeinfo, /proc/zoneinfo, /proc/<pid>/numa_maps, and /proc/buddyinfo. Each needs an explanation of its own, which can be found at https://www.kernel.org/doc/Documentation/filesystems/proc.txt. That document describes /proc/buddyinfo as "Kernel memory allocator information". But what is a kernel memory allocator, or in other words, who is a buddy and what is a buddy allocator? The buddy allocator is the kernel's page allocator: it manages free memory in blocks of 2^order contiguous pages, splitting a larger block into two "buddies" when a smaller one is needed and merging free buddies back together. The buddyinfo file is used primarily for diagnosing memory fragmentation issues. That leaves yet another question.
What is memory fragmentation and why is it an issue?
The longer a Linux system runs without a reboot, and the more it keeps allocating and de-allocating pages, the more fragmented its memory becomes. The kernel may not always be able to defragment enough memory in time for a request of a given size. If that happens, larger contiguous chunks of memory cannot be allocated even though enough free memory is available in total. This is called external memory fragmentation, and the /proc/buddyinfo file lets you view the current fragmentation state of your memory, as shown further below. Without proper memory allocation, the system won't be able to launch new processes, especially those which require large allocations.
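To make "enough free memory, but not contiguous" concrete, here is a minimal sketch. The 16-frame memory map and the helper name longest_free_run are invented for illustration; real page frames are tracked very differently.

```python
def longest_free_run(frames):
    """Length of the longest run of consecutive free (True) page frames."""
    best = current = 0
    for is_free in frames:
        current = current + 1 if is_free else 0
        best = max(best, current)
    return best


# Hypothetical memory map: True = free frame, False = frame in use.
# Every second frame is free, so half of memory is free in total.
frames = [True, False] * 8

print("free frames:", sum(frames))
print("largest contiguous free run:", longest_free_run(frames))
```

A request for two adjacent pages fails on this map even though half of the frames are free, which is exactly the situation described above.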
Interpreting the output of /proc/buddyinfo
Let's keep it simple. For a detailed explanation, read the following paragraphs, which are taken from a referenced page. If you see plenty of non-zero numbers on the right side of the buddyinfo output, there is no problem: large contiguous blocks are available. If most of the right-hand columns are zero, compact the memory so that scattered free pages are merged into higher-order blocks.
Using the buddy algorithm, each column of numbers represents the number of free blocks of that order which are available.
The kernel's basic unit of allocatable memory is the 4 KiB page (many stats are reported by page count instead of memory size in KiB). The kernel also keeps track of larger contiguous blocks of pages because sometimes kernel code wants, say, a contiguous 64 KiB block of memory. /proc/buddyinfo shows you how many such free chunks there are for each allocation 'order'. A chunk of a given order is 2^order pages, i.e. order 0 is a single page, order 1 is 2 pages (8 KiB), order 2 is 4 pages (16 KiB), and so on. So when /proc/buddyinfo reports, for example:
# cat /proc/buddyinfo
Node 0, zone      DMA      a      4      3      4      3      2      1      0      1      1      2
Node 0, zone    DMA32      7     20      2      4      6      4      3      4      6      5    369
Node 0, zone   Normal   1046    527    128     36     17      5     26     40     13     16     94
This means that in the DMA zone on this machine the first column gives the number of free single 4 KiB pages, followed by 4 two-page (8 KiB) chunks, 3 16 KiB chunks, and so on, all the way up to 2 1024-page (4 MiB) chunks. The DMA32 and Normal zones on this machine are in pretty good shape.
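The per-order arithmetic is easy to script. The sketch below parses buddyinfo-style text and totals the free memory per zone, assuming 4 KiB pages as in the text. The function names parse_buddyinfo and free_kib are my own, and SAMPLE holds just the DMA32 and Normal rows from the example output.

```python
PAGE_SIZE_KIB = 4  # assumption: 4 KiB pages, as in the text


def parse_buddyinfo(text):
    """Map (node, zone) -> list of free-block counts, indexed by order."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        # Expected shape: Node 0, zone Normal 1046 527 ...
        if len(fields) < 5 or fields[0] != "Node" or fields[2] != "zone":
            continue
        node = int(fields[1].rstrip(","))
        zone = fields[3]
        result[(node, zone)] = [int(n) for n in fields[4:]]
    return result


def free_kib(counts):
    """Total free memory in KiB: each order-n block holds 2**n pages."""
    return sum(count * (2 ** order) * PAGE_SIZE_KIB
               for order, count in enumerate(counts))


SAMPLE = """\
Node 0, zone    DMA32      7     20      2      4      6      4      3      4      6      5    369
Node 0, zone   Normal   1046    527    128     36     17      5     26     40     13     16     94
"""

for (node, zone), counts in parse_buddyinfo(SAMPLE).items():
    print(f"node {node} zone {zone}: {free_kib(counts)} KiB free")
```

On a live system you would feed it the real file instead, for example parse_buddyinfo(open("/proc/buddyinfo").read()).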
In fact, having a disproportionate number of order 0 pages free is generally a danger sign, since order 0 pages stay free on their own only when the kernel can't merge them into higher-order free chunks. Lots of order 0 pages thus mean lots of fragmentation, where the kernel can't even find two adjacent aligned pages to merge into an 8 KiB order 1 chunk.
How to defragment the memory?
1. Depending on your luck (if it doesn't crash the kernel), when you write 1 to /proc/sys/vm/compact_memory, all zones are compacted so that free memory is available in contiguous blocks as far as possible.
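A rough way to turn this rule of thumb into numbers is sketched below. Both helpers are my own illustration, not kernel metrics, and the "fragmented" row is invented; the "healthy" row is the Normal zone from the sample output.

```python
def order0_share(counts):
    """Fraction of all free pages that sit in order-0 (single-page) blocks."""
    total_pages = sum(count << order for order, count in enumerate(counts))
    return counts[0] / total_pages if total_pages else 0.0


def largest_free_order(counts):
    """Highest order with at least one free block, or -1 if nothing is free."""
    for order in range(len(counts) - 1, -1, -1):
        if counts[order] > 0:
            return order
    return -1


healthy = [1046, 527, 128, 36, 17, 5, 26, 40, 13, 16, 94]  # Normal row from the sample
fragmented = [9000, 120, 3, 0, 0, 0, 0, 0, 0, 0, 0]        # hypothetical zone

for name, counts in (("healthy", healthy), ("fragmented", fragmented)):
    print(name, f"order-0 share={order0_share(counts):.3f}",
          f"largest free order={largest_free_order(counts)}")
```

A high order-0 share together with a low largest free order is the pattern to worry about; where exactly to draw the line is a judgment call.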
echo 1 > /proc/sys/vm/compact_memory
OR
sysctl vm.compact_memory=1
2. It can also be triggered on a per-node basis by writing any value to /sys/devices/system/node/nodeN/compact where N is the node ID to be compacted.
3. You can also make the kernel more or less willing to compact by tuning vm.extfrag_threshold (/proc/sys/vm/extfrag_threshold); a lower threshold makes compaction more likely. The kernel compares this threshold against the fragmentation index, which you can read from /sys/kernel/debug/extfrag/extfrag_index. Index values tending towards 0 imply allocations would fail due to lack of memory, and values towards 1000 imply failures are due to fragmentation.
That said, it is important to note that applications have access to virtual memory only, and this article is about kernel memory allocation.
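The steps above can be sketched as a small helper. Treat it as an illustration under assumptions: the function names are mine, the index is taken on the 0 to 1000 scale used above, -1 is assumed to mean the allocation would currently succeed, and 500 is the default threshold as I recall it from the kernel's sysctl documentation, so check your own system. Writing to the real trigger file needs root.

```python
def compaction_may_help(frag_index, threshold=500):
    """Interpret a fragmentation index on the 0..1000 scale.

    Assumptions: a negative index means the allocation would succeed anyway,
    and compaction is skipped when the index is at or below the threshold.
    """
    if frag_index < 0:
        return False
    return frag_index > threshold


def request_compaction(path="/proc/sys/vm/compact_memory"):
    """Write 1 to the compaction trigger file (needs root on a real system)."""
    with open(path, "w") as f:
        f.write("1\n")


for index in (-1, 100, 900):
    print(index, "->", "compact" if compaction_may_help(index) else "skip")
```

For a single node, the same helper can be pointed at /sys/devices/system/node/nodeN/compact, with N replaced by the node ID.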