forked from torvalds/linux
percpu: update the header comment and pcpu_build_alloc_info comments
The header comment for percpu memory is a little hard to parse and is not super clear about how the first chunk is managed. This adds a little more clarity to the situation.

There is also quite a bit of tricky logic in the pcpu_build_alloc_info. This adds a restructure of a comment to add a little more information. Unfortunately, you will still have to piece together a handful of other comments too, but should help direct you to the meaningful comments.

Signed-off-by: Dennis Zhou <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
1 parent 6b9b6f3, commit 9c01516
Showing 1 changed file with 32 additions and 26 deletions.
@@ -4,36 +4,35 @@
  * Copyright (C) 2009 SUSE Linux Products GmbH
  * Copyright (C) 2009 Tejun Heo <[email protected]>
  *
- * This file is released under the GPLv2.
+ * This file is released under the GPLv2 license.
  *
- * This is percpu allocator which can handle both static and dynamic
- * areas. Percpu areas are allocated in chunks. Each chunk is
- * consisted of boot-time determined number of units and the first
- * chunk is used for static percpu variables in the kernel image
- * (special boot time alloc/init handling necessary as these areas
- * need to be brought up before allocation services are running).
- * Unit grows as necessary and all units grow or shrink in unison.
- * When a chunk is filled up, another chunk is allocated.
+ * The percpu allocator handles both static and dynamic areas. Percpu
+ * areas are allocated in chunks which are divided into units. There is
+ * a 1-to-1 mapping for units to possible cpus. These units are grouped
+ * based on NUMA properties of the machine.
  *
  *  c0                           c1                         c2
  *  -------------------          -------------------        ------------
  * | u0 | u1 | u2 | u3 |        | u0 | u1 | u2 | u3 |      | u0 | u1 | u
  *  -------------------  ......  -------------------  ....  ------------
  *
- * Allocation is done in offset-size areas of single unit space. Ie,
- * an area of 512 bytes at 6k in c1 occupies 512 bytes at 6k of c1:u0,
- * c1:u1, c1:u2 and c1:u3. On UMA, units corresponds directly to
- * cpus. On NUMA, the mapping can be non-linear and even sparse.
- * Percpu access can be done by configuring percpu base registers
- * according to cpu to unit mapping and pcpu_unit_size.
- *
- * There are usually many small percpu allocations many of them being
- * as small as 4 bytes. The allocator organizes chunks into lists
- * according to free size and tries to allocate from the fullest one.
- * Each chunk keeps the maximum contiguous area size hint which is
- * guaranteed to be equal to or larger than the maximum contiguous
- * area in the chunk. This helps the allocator not to iterate the
- * chunk maps unnecessarily.
+ * Allocation is done by offsets into a unit's address space. Ie., an
+ * area of 512 bytes at 6k in c1 occupies 512 bytes at 6k in c1:u0,
+ * c1:u1, c1:u2, etc. On NUMA machines, the mapping may be non-linear
+ * and even sparse. Access is handled by configuring percpu base
+ * registers according to the cpu to unit mappings and offsetting the
+ * base address using pcpu_unit_size.
+ *
+ * There is special consideration for the first chunk which must handle
+ * the static percpu variables in the kernel image as allocation services
+ * are not online yet. In short, the first chunk is structure like so:
+ *
+ * <Static | [Reserved] | Dynamic>
+ *
+ * The static data is copied from the original section managed by the
+ * linker. The reserved section, if non-zero, primarily manages static
+ * percpu variables from kernel modules. Finally, the dynamic section
+ * takes care of normal allocations.
  *
  * Allocation state in each chunk is kept using an array of integers
  * on chunk->map. A positive value in the map represents a free

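The addressing rule and the first chunk layout described in the new header comment can be illustrated with a small userspace model. This is a sketch only, not kernel code: `struct chunk_model`, `area_addr()`, and `first_chunk_region_of()` are hypothetical names, and the model assumes units sit back to back in memory, which the comment itself says need not hold on NUMA machines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one chunk. Assumption: units are laid out
 * linearly, one after another, which is not guaranteed on NUMA. */
struct chunk_model {
    uintptr_t base;     /* address of unit 0 */
    size_t unit_size;   /* stands in for pcpu_unit_size */
    unsigned nr_units;  /* one unit per possible cpu */
};

/* Address of the copy of an area at offset `off` that belongs to `unit`.
 * The same offset is valid in every unit of the chunk. */
static uintptr_t area_addr(const struct chunk_model *c, unsigned unit, size_t off)
{
    assert(unit < c->nr_units);
    assert(off < c->unit_size);
    return c->base + (uintptr_t)unit * c->unit_size + off;
}

/* Regions of the first chunk in layout order: <Static | [Reserved] | Dynamic>. */
enum first_chunk_region { REGION_STATIC, REGION_RESERVED, REGION_DYNAMIC };

/* Classify a unit offset within the first chunk. reserved_size may be 0,
 * in which case the dynamic region starts right after the static data. */
static enum first_chunk_region first_chunk_region_of(size_t off,
                                                     size_t static_size,
                                                     size_t reserved_size)
{
    if (off < static_size)
        return REGION_STATIC;
    if (off < static_size + reserved_size)
        return REGION_RESERVED;
    return REGION_DYNAMIC;
}
```

With a 32k unit size, the 512-byte area at offset 6k from the comment's example lands at `base + 6k` in u0 and at `base + 32k + 6k` in u1, so consecutive copies are exactly one unit apart in this linear model.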
@@ -43,6 +42,12 @@
  * Chunks can be determined from the address using the index field
  * in the page struct. The index field contains a pointer to the chunk.
  *
+ * These chunks are organized into lists according to free_size and
+ * tries to allocate from the fullest chunk first. Each chunk maintains
+ * a maximum contiguous area size hint which is guaranteed to be equal
+ * to or larger than the maximum contiguous area in the chunk. This
+ * helps prevent the allocator from iterating over chunks unnecessarily.
+ *
  * To use this allocator, arch code should do the following:
  *
  * - define __addr_to_pcpu_ptr() and __pcpu_ptr_to_addr() to translate

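The role of the contiguous size hint can be shown with a simplified selection routine. This is a hedged sketch, not the allocator's implementation: the kernel comment describes lists ordered by free size, while this model uses a plain array and a linear scan, and `struct chunk_info` and `pick_chunk()` are hypothetical names. The point it illustrates is that, because the hint is never smaller than the true largest free area, a chunk whose hint is below the request can be skipped without inspecting its map.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-chunk summary used only for this illustration. */
struct chunk_info {
    size_t free_size;    /* total free bytes in the chunk */
    size_t contig_hint;  /* upper bound on the largest contiguous free area */
};

/* Return the index of the fullest chunk (smallest free_size) that might
 * satisfy a request of `size` bytes, or -1 if no chunk can. */
static int pick_chunk(const struct chunk_info *chunks, int n, size_t size)
{
    int best = -1;

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        /* The hint is an upper bound, so this chunk cannot fit the request. */
        if (chunks[i].contig_hint < size)
            continue;
        if (best < 0 || chunks[i].free_size < chunks[best].free_size)
            best = i;
    }
    return best;
}
```

A chunk returned here is only a candidate: since the hint may overestimate, a real allocator would still have to scan the chosen chunk's map and fall back to another chunk if the scan fails.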
@@ -1842,6 +1847,7 @@ static struct pcpu_alloc_info * __init pcpu_build_alloc_info(
 	 */
 	min_unit_size = max_t(size_t, size_sum, PCPU_MIN_UNIT_SIZE);
 
+	/* determine the maximum # of units that can fit in an allocation */
 	alloc_size = roundup(min_unit_size, atom_size);
 	upa = alloc_size / min_unit_size;
 	while (alloc_size % upa || (offset_in_page(alloc_size / upa)))

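The units-per-allocation (upa) search in this hunk can be replayed in userspace. The hunk cuts off before the loop body, so the sketch below assumes the body simply decrements `upa`; that assumption, the fixed 4096-byte page size, and the helper names `roundup_sz()` and `max_units_per_alloc()` are not taken from the diff. `offset_in_page(x)` is modeled as `x % MODEL_PAGE_SIZE`.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u  /* assumption: 4 KiB pages */

/* Round x up to the next multiple of y (userspace stand-in for roundup()). */
static size_t roundup_sz(size_t x, size_t y)
{
    return ((x + y - 1) / y) * y;
}

/* Largest number of units that evenly divides one allocation while keeping
 * each unit page aligned. Assumes the elided loop body is `upa--`. */
static size_t max_units_per_alloc(size_t min_unit_size, size_t atom_size)
{
    size_t alloc_size, upa;

    assert(min_unit_size > 0);
    /* A page-multiple atom_size guarantees the loop stops at upa == 1. */
    assert(atom_size > 0 && atom_size % MODEL_PAGE_SIZE == 0);

    alloc_size = roundup_sz(min_unit_size, atom_size);
    upa = alloc_size / min_unit_size;
    while (alloc_size % upa || (alloc_size / upa) % MODEL_PAGE_SIZE)
        upa--;
    return upa;
}
```

For example, with a 2 MiB atom and a 40 KiB minimum unit, the initial estimate is 51 units, but 51 does not divide 2 MiB evenly, so the search walks down to 32 units of 64 KiB each.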
@@ -1868,9 +1874,9 @@ static struct pcpu_alloc_info * __init pcpu_build_alloc_info(
 	}
 
 	/*
-	 * Expand unit size until address space usage goes over 75%
-	 * and then as much as possible without using more address
-	 * space.
+	 * Wasted space is caused by a ratio imbalance of upa to group_cnt.
+	 * Expand the unit_size until we use >= 75% of the units allocated.
+	 * Related to atom_size, which could be much larger than the unit_size.
 	 */
 	last_allocs = INT_MAX;
 	for (upa = max_upa; upa; upa--) {

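The "ratio imbalance of upa to group_cnt" mentioned in the new comment can be made concrete with a little arithmetic. The loop body is not part of this hunk, so the following is an illustrative calculation under stated assumptions rather than the kernel's selection logic: each group is assumed to receive whole allocations of `upa` units, and `struct upa_usage`, `usage_for_upa()`, and `used_percent()` are hypothetical names.

```c
#include <assert.h>

/* Hypothetical summary of how a given upa fits a set of groups. */
struct upa_usage {
    unsigned allocs;  /* allocations needed across all groups */
    unsigned wasted;  /* units allocated but not mapped to any cpu */
};

/* Assumption: every group gets ceil(group_cnt / upa) whole allocations,
 * so leftover units in a group's last allocation are wasted. */
static struct upa_usage usage_for_upa(const unsigned *group_cnt, int nr_groups,
                                      unsigned upa)
{
    struct upa_usage u = { 0, 0 };

    assert(upa > 0);
    for (int g = 0; g < nr_groups; g++) {
        unsigned this_allocs = (group_cnt[g] + upa - 1) / upa;

        u.allocs += this_allocs;
        u.wasted += this_allocs * upa - group_cnt[g];
    }
    return u;
}

/* Share of allocated units that are actually used, in whole percent. */
static unsigned used_percent(struct upa_usage u, unsigned upa)
{
    unsigned total = u.allocs * upa;

    assert(total > 0);
    return 100u * (total - u.wasted) / total;
}
```

With two groups of 4 cpus each, `upa = 3` needs 4 allocations and wastes 4 of 12 units (66% used, below the 75% mark in the comment), whereas `upa = 4` needs 2 allocations and wastes nothing.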