doc: Add clarifications about allocations, and when chap is applicable
timboddy committed Jun 4, 2018
1 parent f8fb2d4 commit 1682649
Showing 1 changed file with 19 additions and 12 deletions.
31 changes: 19 additions & 12 deletions USERGUIDE.md
@@ -44,7 +44,16 @@ At present this has only been tested on Linux, with the `chap` binary built for
At the time of this writing, the only process image file formats supported by `chap` are little-endian 32 bit ELF cores and little-endian 64 bit ELF cores, both of which are expected to be complete. Run `chap` without any arguments to get a current list of supported process image file formats.

### Supported Memory Allocators
At present the only memory allocator for which `chap` will be able to find allocations in the process image is the version of malloc used by glibc on Linux.
At present the only memory allocator for which `chap` will be able to find allocations in the process image is the version of malloc used by glibc on Linux. Even lacking support for jemalloc or tcmalloc there are many processes for which `chap` is useful, because many processes use glibc malloc, including many processes that are mostly using C++, C, Python, or Java code. The relevance in the case of Python or Java is mostly related to the use of native libraries.

A quick way to determine whether `chap` is likely to be useful for your process is to gather a core (for example, using gcore) then open chap and use **count allocations**. If the count is non-zero, `chap` is applicable.

```
-bash-4.1$ chap core.33190
chap> count allocations
734 allocations use 0x108900 (1,083,648) bytes.
chap>
```

### How to Start and Stop `chap`
Start `chap` from the command line, with the core file path as the only argument. Commands will be read by `chap` from standard input, typically one command per line. Interactive use is terminated by typing ctrl-d to terminate standard input.
@@ -55,27 +64,24 @@ To get a list of the commands, type "help<enter>" from the `chap` prompt. Doing
### General Concepts
Most of the commands in `chap` operate on sets. Normally, for any set that is understood by chap one can count it (find out how many there are and possibly get some aggregate value such as the total size), summarize it (provide summary information about the set), list it (give simple information about each member such as address and size), enumerate it (give an identifier, such as an address, for each member) and show it (list each member and dump the contents). See the help command in chap for a current list of verbs (count, list, show ...) and for which sets are supported for a given verb.

Most of the sets that one can identify with `chap` are related to **allocations**, which roughly correspond to memory ranges made available by memory allocation functions, such as malloc, in response to requests. Allocations are considered **used** or **free**, where **used** allocations are ones that have not been freed since they last were made available by the allocator. One can run any of the commands listed above (count, list ...) on **used**, the set of used allocations, **free**, the set of free allocations, or **allocations**, which includes all of them. If a given type is recognizable by a **signature**, as described in [Allocation Signatures] or by a **pattern**, as described in [Allocation Patterns](#allocation-patterns) Types], one can further restrict any given set to contain only instances of that type. A very small set that is sometimes of interest is "allocation *address*" which is non-empty only if there is an allocation that contains the given address. Any specified allocation set can also be restricted in various other ways, such as constraining the size. Use the help command, for example, "help count used", for details.
Most of the sets that one can identify with `chap` are related to **allocations**, which roughly correspond to memory ranges made available by memory allocation functions, such as malloc, in response to requests. Allocations are considered **used** or **free**, where **used** allocations are ones that have not been freed since they last were made available by the allocator. One can run any of the commands listed above (count, list ...) on **used**, the set of used allocations, **free**, the set of free allocations, or **allocations**, which includes all of them. If a given type is recognizable by a [signature](#allocation-signatures) or by a [pattern](#allocation-patterns), one can further restrict any given set to contain only instances of that type. A very small set that is sometimes of interest is "allocation *address*" which is non-empty only if there is an allocation that contains the given address. Any specified allocation set can also be restricted in various other ways, such as constraining the size. Use the help command, for example, **help count used**, for details.

Other interesting sets available in `chap` are related to how various allocations are referenced. For now this document will not provide a through discussion of references to allocations but will briefly address how `chap` understands such references to allocations. From the perspective of `chap` a reference to an allocation is a pointer-sized value, either in a register or at a pointer-aligned location in memory, that points somewhere within the allocation. Note that under these rules, `chap` currently often identifies things as references that really aren't, for example, because the given register or memory region is not really currently live. It is also possible for certain programs, for example ones that put pointers in misaligned places such as in fields of packed structures, but this in general is easy to fix by constraining programs not to do that. Given an address within an allocation one can look at the **outgoing** allocations (meaning the used allocations referenced by the specified allocation) or the **incoming** allocations (meaning the allocations that reference the specified allocation). Use the help command, for example, "help list incoming" or "help show exactincoming", or "help summarize outgoing" for details of some of the information one can gather about references to allocations.
Other interesting sets available in `chap` are related to how various allocations are referenced. For now this document will not provide a thorough discussion of references to allocations but will briefly address how `chap` understands such references to allocations. From the perspective of `chap` a reference to an allocation is a pointer-sized value, either in a register or at a pointer-aligned location in memory, that points somewhere within the allocation. Note that under these rules, `chap` currently often identifies things as references that really aren't, for example, because the given register or memory region is not really currently live. It is also possible for `chap` to miss real references in certain programs, for example ones that put pointers in misaligned places such as in fields of packed structures, but this in general is easy to fix by constraining programs not to do that. Given an address within an allocation one can look at the **outgoing** allocations (meaning the used allocations referenced by the specified allocation) or the **incoming** allocations (meaning the allocations that reference the specified allocation). Use the help command, for example, **help list incoming** or **help show exactincoming**, or **help summarize outgoing** for details of some of the information one can gather about references to allocations.
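
The reference rule described above can be illustrated with a minimal Python sketch. This is not how `chap` is implemented; the function name `find_references`, its parameters, and the sample addresses are hypothetical, and the sketch assumes a little-endian process image and a sorted list of non-overlapping allocations. It scans a memory range at pointer-aligned offsets only and reports each pointer-sized value that lands anywhere inside an allocation.

```python
import struct
from bisect import bisect_right


def find_references(memory, base_address, allocations, pointer_size=8):
    """Return (referrer_address, allocation_start) pairs.

    memory        -- bytes of the region being scanned (hypothetical input)
    base_address  -- virtual address of memory[0]
    allocations   -- sorted, non-overlapping list of (start, size) tuples
    pointer_size  -- 8 for a 64 bit image, 4 for a 32 bit image
    """
    starts = [start for start, _ in allocations]
    fmt = "<Q" if pointer_size == 8 else "<I"  # little-endian, as assumed above
    found = []
    # Only pointer-aligned locations are considered, mirroring the rule in the text.
    first = (-base_address) % pointer_size
    for offset in range(first, len(memory) - pointer_size + 1, pointer_size):
        (value,) = struct.unpack_from(fmt, memory, offset)
        index = bisect_right(starts, value) - 1
        if index >= 0:
            start, size = allocations[index]
            # A value pointing anywhere within the allocation counts as a reference.
            if start <= value < start + size:
                found.append((base_address + offset, start))
    return found


# Hypothetical data: two small allocations and a region holding three words.
allocations = [(0x601010, 0x18), (0x601030, 0x18)]
region = struct.pack("<QQQ", 0x601018, 0x0, 0x700000)
references = find_references(region, 0x7FFE0000, allocations)

# A pointer stored at a misaligned offset (as in a packed structure) is missed.
packed = b"\x00" * 4 + struct.pack("<Q", 0x601010) + b"\x00" * 4
missed = find_references(packed, 0x7FFE0000, allocations)
```

In this sketch the interior pointer `0x601018` is reported as a reference to the allocation at `0x601010`, while the misaligned pointer in `packed` is not found. A stale value in dead memory would still be reported, which is the source of the false references mentioned above.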

References from outside of dynamically allocated memory (for example, from the stack or registers for a thread or from statically allocated memory) are of interest because they help clarify how a given allocation is used. A used allocation that is directly referenced from outside of dynamically allocated memory is considered to be an **anchor point**, and the reference itself is considered to be an **anchor**. Any **anchor point** or any used allocation referenced by that **anchor point** is considered to be **anchored**, as are any used allocations referenced by any **anchored** allocations. A **used allocation** that is not **anchored** is considered to be **leaked**. A **leaked** allocation that is not referenced by any other **leaked** allocation is considered to be **unreferenced**. Try "help count leaked" or "help summarize unreferenced" for some examples.
References from outside of dynamically allocated memory (for example, from the stack or registers for a thread or from statically allocated memory) are of interest because they help clarify how a given allocation is used. A used allocation that is directly referenced from outside of dynamically allocated memory is considered to be an **anchor point**, and the reference itself is considered to be an **anchor**. Any **anchor point** or any used allocation referenced by that **anchor point** is considered to be **anchored**, as are any used allocations referenced by any **anchored** allocations. A **used allocation** that is not **anchored** is considered to be **leaked**. A **leaked** allocation that is not referenced by any other **leaked** allocation is considered to be **unreferenced**. Try **help count leaked** or **help summarize unreferenced** for some examples.
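
The definitions of **anchored**, **leaked** and **unreferenced** amount to a reachability computation over the graph of used allocations. The following Python sketch shows one way to express them; it is an illustration under assumed inputs, not the algorithm used inside `chap`, and the name `classify` and the sample graph are hypothetical.

```python
from collections import deque


def classify(used, references, anchor_points):
    """Split used allocations into anchored, leaked and unreferenced sets.

    used          -- set of used allocation identifiers
    references    -- dict mapping an allocation to the allocations it references
    anchor_points -- allocations referenced directly from outside dynamically
                     allocated memory (stack, registers, static memory)
    """
    anchored = set()
    queue = deque(a for a in anchor_points if a in used)
    anchored.update(queue)
    # Anything reachable from an anchor point through used allocations is anchored.
    while queue:
        current = queue.popleft()
        for target in references.get(current, ()):
            if target in used and target not in anchored:
                anchored.add(target)
                queue.append(target)
    # A used allocation that is not anchored is leaked.
    leaked = used - anchored
    # A leaked allocation referenced by some *other* leaked allocation is not unreferenced.
    referenced_by_leaked = {
        target
        for source in leaked
        for target in references.get(source, ())
        if target != source
    }
    unreferenced = leaked - referenced_by_leaked
    return anchored, leaked, unreferenced


# Hypothetical graph: A is referenced from a stack, C and E are reachable from nowhere.
used = {"A", "B", "C", "D", "E"}
references = {"A": {"B"}, "C": {"D"}, "D": {"D"}}
anchored, leaked, unreferenced = classify(used, references, {"A"})
```

Here `D` is leaked but not unreferenced, because the leaked allocation `C` points to it. Freeing the unreferenced allocations first is usually the natural starting point when analyzing a leak.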

Many of the remaining commands are related to redirection of output (try "help redirect") or input (try "help source") or related to trying to reduce the number of commands needed to traverse the graph (try "help enumerate chain"). This will be documented better some time in the next few weeks. If there is something that you need to understand sooner than that, and the needed information happens not to be available from the help command within chap, feel free to file an issue stating what you would like to be addressed in the documentation.
Many of the remaining commands are related to redirection of output (try **help redirect**) or input (try **help source**) or related to trying to reduce the number of commands needed to traverse the graph (try **help enumerate chain**). This is being documented rather gradually. If there is something that you need to understand sooner than that, and the needed information happens not to be available from the help command within chap, feel free to file an issue stating what you would like to be addressed in the documentation.

## Allocations

An *allocation*, from the perspective of `chap` is a contiguous region of virtual memory that was made available to the caller by an allocation function or is currently reserved as writable memory by the process for that purpose. At present the only allocations recognized by chap are those associated with libc malloc, and so made available to the caller by malloc(), calloc() or realloc() and freed by free() or realloc(). At present, regions of memory made available by other means, such as direct use of mmap(), are not considered allocations.
An **allocation**, from the perspective of `chap`, is a contiguous region of virtual memory that was made available to the caller by an allocation function or is currently reserved as writable memory by the process for that purpose. At present the only allocations recognized by `chap` are those associated with libc malloc, and so made available to the caller by malloc(), calloc() or realloc() and freed by free() or realloc(). At present, regions of memory made available by other means, such as direct use of mmap(), are not considered allocations.


### Used Allocations
A *used allocation* is an *allocation* that was never given back to the allocator. From the perspective of `chap`, this explicitly excludes regions of memory that are used for book-keeping about the allocation but does include the region starting from the address returned by the caller and including the full contiguous region that the caller might reasonably modify. This region may be larger than the size requested at allocation, because the allocation function is always free to return more bytes than were requested.

We can show all the used allocations from `chap`:

A **used allocation** is an **allocation** that was never given back to the allocator. From the perspective of `chap`, this explicitly excludes regions of memory that are used for book-keeping about the allocation but does include the region starting from the address returned to the caller and including the full contiguous region that the caller might reasonably modify. This region may be larger than the size requested at allocation, because the allocation function is always free to return more bytes than were requested.

### Free Allocations
A *free allocation* is a range of writable memory that can be used to satisfy allocation requests. It is worthwhile to understand these regions because typically memory is requested from the operating system in multiples of 4K pages, which are subdivided into allocations. It is more common than not that when an allocation gets freed, it just gets given back to the allocator but the larger region containing that allocation just freed cannot yet be returned to the operating system.
A **free allocation** is a range of writable memory that can be used to satisfy allocation requests. It is worthwhile to understand these regions because typically memory is requested from the operating system in multiples of 4K pages, which are subdivided into allocations. It is more common than not that when an allocation gets freed, it just gets given back to the allocator but the larger region containing that allocation just freed cannot yet be returned to the operating system.

### An Example of Used and Free Allocations

@@ -116,6 +122,7 @@ Used allocation at 601010 of size 18

## References

Aside from allocations, the most important thing to understand about `chap` is what it considers to be a **reference**. Basically a **reference** is any contiguous region of memory or
### Real References

### False References
@@ -267,7 +274,7 @@ In such a situation, favoring the last arena used by a given thread can become a

Suppose that same operation later happens on another thread that, based on the last allocation done by that thread attempts and succeeds to use a different arena throughout the course of that operation. Potentially that thread can cause the process to grow by 40MB to satisfy requests on this other arena, even though the arena used the previous time the operation occurred already has 40MB free. I am oversimplifying a bit but one can see that if this should eventually happen on all 20 of the arenas, the overhead in free allocations might approach 20 * 40MB = 800 MB.

There is no actual leak associated with the above, but the process memory size can be much larger than one might see without (4) being true, and an observer gets the false impression of unbounded growth because the way in which an arena is selected makes it possible that it may take a very long time before the piggish operation in question uses any particular arena. Each time the piggish operation happens on an arena where it had never happened, that arena grows and so the process grows.
There is no actual leak associated with the above, but the process memory size can be much larger than one might see without the libc characteristic of preferring to use the arena most recently used by the current thread, and an observer gets the false impression of unbounded growth because the way in which an arena is selected makes it possible that it may take a very long time before the piggish operation in question uses any particular arena. Each time the piggish operation happens on an arena where it had never happened, that arena grows and so the process grows. The **describe allocation** command can add insight into this situation because it allows one to easily spot a discrepancy in the sizes of the arenas or in the number of bytes used by free allocations associated with each arena.
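
The growth pattern described above can be made concrete with a small model. The Python sketch below is a deliberate oversimplification and does not reflect real glibc arena selection: it assumes each run of the piggish operation stays on one arena, needs a fixed peak (40 MB, as in the text), and that an arena retains that much free memory after its first run. The name `free_overhead_after_runs` and the sample sequences are hypothetical.

```python
def free_overhead_after_runs(arena_sequence, peak_mb=40):
    """Return the modeled free-allocation overhead (in MB) after each run.

    arena_sequence -- arena index used by each successive run of the operation
    peak_mb        -- memory the operation needs at its peak, then frees
    """
    arenas_grown = set()
    overhead = []
    for arena in arena_sequence:
        # The process grows only the first time the operation lands on an arena;
        # later runs on the same arena reuse that arena's free allocations.
        arenas_grown.add(arena)
        overhead.append(len(arenas_grown) * peak_mb)
    return overhead


# Runs that mostly reuse arena 0 but occasionally land on a new arena.
growth = free_overhead_after_runs([0, 0, 3, 0, 7])

# Worst case from the text: the operation eventually runs on all 20 arenas.
worst_case = free_overhead_after_runs(range(20))[-1]
```

In this model the overhead rises in steps that can be far apart in time, which looks like slow unbounded growth to an observer, yet it is capped at the number of arenas times the peak size.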

TODO: Provide examples of the specific case where we can find and eliminate the piggish operation (one finding it by looking at free allocations and one gathering a core at the point that the arena grows).
