==================
american fuzzy lop
==================

  Written and maintained by Michal Zalewski <[email protected]>

  Copyright 2013, 2014 Google Inc. All rights reserved.
  Released under terms and conditions of Apache License, Version 2.0.

  For new versions and additional information, check out:
  http://lcamtuf.coredump.cx/afl/

1) Challenges of guided fuzzing
-------------------------------

Fuzzing is one of the most powerful strategies for identifying security issues
in real-world software. Unfortunately, it also offers fairly shallow coverage,
because many of the mutations needed to reach new code paths are exceedingly
unlikely to be hit purely by chance.

There have been numerous attempts to solve this problem by augmenting the
process with additional information about the behavior of the tested code.
These techniques can be divided into three broad groups:

  - Simple coverage maximization. This approach boils down to trying to find
    initial test cases that offer diverse code coverage in the targeted
    application - and then fuzzing that corpus using conventional strategies.

  - Dynamic control flow analysis. A more sophisticated technique that
    leverages instrumented binaries and taint tracking to identify mutations
    that will hopefully trigger new internal states within the tested program.

  - Static analysis / symbolic execution. Uses mathematical models to reason
    about the relationship between inputs and program states before actually
    running the code.

The first technique is surprisingly powerful when used to pre-select initial
test cases from a massive corpus of valid data. Unfortunately, such corpora
are not always available. On top of this, coverage measurements provide only
a very simplistic view of the internal state of the program, making them less
suited for guiding the fuzzing process later on.

The latter two techniques are extremely promising in experimental settings,
but in real-world applications, they frequently suffer from reliability
problems or irreducible complexity. Most of the high-value targets have
enough internal states and possible execution paths to make such tools fall
apart and perform strictly worse than their traditional counterparts.

2) The afl-fuzz approach
------------------------

American Fuzzy Lop is a brute-force fuzzer coupled with an exceedingly simple
but rock-solid instrumentation-guided genetic algorithm. It uses an enhanced
form of edge coverage to easily detect subtle, local-scale changes to program
control flow, without being bogged down by complex comparisons between
multiple long-winded execution paths.

The overall algorithm can be summed up as:

  1) Load user-supplied initial test cases into the queue,

  2) Take next input file from the queue,

  3) Attempt to trim the test case to the smallest size that doesn't alter
     the observed behavior of the program,

  4) Repeatedly mutate the file using a balanced and well-researched variety
     of traditional fuzzing strategies,

  5) If any of the generated mutations resulted in a new state transition
     recorded by the instrumentation, add mutated output as a new entry in
     the queue.

  6) Go to 2.

The discovered test cases are also periodically culled to eliminate ones that
have been made obsolete by newer, higher-coverage finds.

The strategies mentioned in step 4 are fairly straightforward, but go well
beyond tools such as zzuf and honggfuzz and lead to additional finds; this is
discussed in more detail at http://goo.gl/SoZJ47.
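To give a rough feel for the loop above, here is a deliberately tiny toy
program written for this document. It is only an illustration: the fixed-size
queue, the single-byte "coverage" signal, and the stand-in target below are
made up for the example, whereas the real afl-fuzz relies on the edge
coverage recorded by its compile-time instrumentation, plus the trimming and
culling steps omitted here.

  /* Toy sketch of the queue-driven loop from section 2 - not afl-fuzz code.
     "Coverage" is faked as one byte derived from the input; real AFL uses an
     instrumentation-populated edge map. Trimming (step 3) is omitted. */

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  #define MAX_QUEUE 1024

  struct testcase { unsigned char buf[64]; size_t len; };

  static struct testcase queue[MAX_QUEUE];
  static int queue_len;
  static unsigned char seen[256];   /* stand-in for a coverage bitmap */

  /* Stand-in target: collapses the input to a single "program state" byte. */
  static unsigned char run_target(const struct testcase *tc) {
    unsigned char state = 0;
    for (size_t i = 0; i < tc->len; i++)
      state ^= (unsigned char)(tc->buf[i] + i);
    return state;
  }

  int main(void) {
    /* Step 1: seed the queue with one user-supplied test case. */
    memcpy(queue[0].buf, "seed", 4);
    queue[0].len = 4;
    queue_len = 1;

    for (int iter = 0; iter < 100000; iter++) {
      /* Step 2: take the next input from the queue (round-robin here). */
      struct testcase tc = queue[iter % queue_len];

      /* Step 4: mutate it - a single random bit flip in this toy. */
      tc.buf[rand() % tc.len] ^= (unsigned char)(1u << (rand() % 8));

      /* Step 5: keep the mutant if it reached a not-yet-seen state. */
      unsigned char cov = run_target(&tc);
      if (!seen[cov] && queue_len < MAX_QUEUE) {
        seen[cov] = 1;
        queue[queue_len++] = tc;
      }
      /* Step 6: go back to step 2. */
    }

    printf("queue grew to %d entries\n", queue_len);
    return 0;
  }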
As a side result of the fuzzing process, the tool creates a small,
self-contained corpus of interesting test cases. These are extremely useful
for seeding other, labor- or resource-intensive testing regimes - for
example, for stress-testing browsers, office applications, or graphics
suites.

The fuzzer is thoroughly tested to deliver coverage far superior to blind
fuzzing or coverage-only tools without the need to dial in any settings or
adjust any knobs.

3) Instrumenting programs for use with AFL
------------------------------------------

Instrumentation is injected by a companion tool that works as a drop-in
replacement for gcc or clang in any standard build process for third-party
code. The instrumentation has a fairly modest performance impact; in
conjunction with other optimizations implemented by afl-fuzz, most programs
can be fuzzed as fast or even faster than possible with traditional tools.

The correct way to recompile the target program will vary depending on the
specifics of the build process, but a common approach may be:

  $ CC=/path/to/afl/afl-gcc ./configure
  $ make clean all

For C++ programs, you will want:

  $ CXX=/path/to/afl/afl-g++ ./configure

The clang wrapper is used in a similar manner, by invoking afl-clang or
afl-clang++.

When testing libraries, it is essential to either link the tested executable
against a static version of the instrumented library, or to set the right
LD_LIBRARY_PATH. Usually, the simplest option is just:

  $ CC=/path/to/afl/afl-gcc ./configure --disable-shared

Setting AFL_HARDEN=1 when calling 'make' will cause the CC wrapper to
automatically enable code hardening options that make it easier to detect
simple memory bugs, at the cost of a <5% performance drop.

Oh: when using ASAN, see the notes_for_asan.txt file for important caveats.

4) Choosing initial test cases
------------------------------

To operate correctly, the fuzzer requires one or more input files containing
typical input normally processed by the targeted application. There are two
basic rules:

  - Keep the files small. Under 1 kB is ideal, although not strictly
    necessary. For a discussion of why size *really* matters, see
    perf_tips.txt.

  - Use multiple test cases only if they are fundamentally different from
    each other. There is no point in using fifty different vacation photos
    to fuzz an image library.

You can find quite a few good examples of starting files in the testcases/
subdirectory that comes with this tool.

If a large corpus of data is available for screening, you may want to use
the afl-showmap utility to compare instrumentation output and reject
redundant files. See experimental/minimization_script/ for an example of how
to implement this.

5) Fuzzing instrumented binaries
--------------------------------

The fuzzing process itself is carried out by the afl-fuzz utility. The
program requires an input directory containing one or more initial test
cases, plus a path to the binary to test.

For tested programs that accept input directly from stdin, the usual syntax
may be:

  $ ./afl-fuzz -i input_dir -o output_dir /path/to/program [...params...]

For programs that need to read input from a specific file, the appropriate
path can be specified via the -f flag:

  $ ./afl-fuzz [...] -f testme.txt /path/to/program testme.txt

It is possible to fuzz non-instrumented code using the -n flag. This gives
you a fairly traditional fuzzer with a couple of nice testing strategies.
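To make the stdin workflow concrete, here is a small, hypothetical target
written for this document; the planted abort() and the file names below are
placeholders rather than anything shipped with AFL:

  /* vuln.c - toy stdin target with a deliberately planted crash. */
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  int main(void) {
    char buf[64];
    size_t n = fread(buf, 1, sizeof(buf) - 1, stdin);
    buf[n] = '\0';

    /* Planted bug: fault only on a specific prefix, so the instrumentation
       sees a distinct path for the fuzzer to discover. */
    if (n >= 4 && memcmp(buf, "FUZZ", 4) == 0)
      abort();   /* recorded under crashes/ as a SIGABRT test case */

    return 0;
  }

A session against it might look like this, assuming input_dir holds at least
one small seed file:

  $ /path/to/afl/afl-gcc -o vuln vuln.c
  $ ./afl-fuzz -i input_dir -o output_dir ./vuln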
You can use -t and -m to override the default timeout and memory limit for
the executed process; this is seldom necessary, except perhaps for video
decoders.

The fuzzing process will continue until you press Ctrl-C. See the
status_screen.txt file for information on how to interpret the status screen
and monitor the health of the process. Tips for optimizing the performance
of the process are discussed in perf_tips.txt.

Note that the fuzzer starts by meticulously performing an array of
deterministic fuzzing steps, which can take many hours or days. If you want
more traditional behavior akin to zzuf or honggfuzz, use the -d option to
skip that and get quick but less systematic and less in-depth results right
away.

6) Interpreting output
----------------------

The fuzzer keeps going until aborted with Ctrl-C or killed with SIGINT or
SIGTERM. The progress screen provides various useful stats. To understand
what's going on, be sure to scan through the status_screen.txt file first.

There are three subdirectories created within the output directory and
updated in real time:

  - queue/   - test cases for every distinctive execution path, plus all the
               starting files given by the user. This is, in effect, the
               synthesized corpus mentioned in section 2.

  - hangs/   - unique test cases that cause the tested program to time out.
               Note that the default timeouts are fairly aggressive to keep
               things moving fast.

  - crashes/ - unique test cases that cause the tested program to receive a
               fatal signal (e.g., SIGSEGV, SIGILL, SIGABRT). The entries are
               grouped by the received signal.

Crashes and hangs are considered "unique" if the associated execution paths
involve any state transitions not seen in previously-recorded faults. If a
single bug can be reached in a multitude of ways, the counts may end up
getting inflated early on in the process, but this should quickly taper off.

The file names for crashes and hangs should let you correlate them with the
parent, non-faulting queue entries. This should help with debugging.

The queue/ subdirectory can also be used to resume aborted jobs; simply do:

  $ ./afl-fuzz -i old_output_dir/queue -o new_output_dir [...etc...]

Although the fuzzer does not perform any additional analysis of the
discovered crashes, the coverage-based grouping usually makes it easy to
triage new finds manually - or to examine them with a simple GDB script. One
such script is provided in experimental/crash_triage/.

7) Parallelized fuzzing
-----------------------

For tips on how to fuzz a common target on multiple cores or multiple
networked machines, please refer to the parallel_fuzzing.txt file included
with the source code of American Fuzzy Lop.

8) Known limitations & areas for improvement
--------------------------------------------

Here are some of the most important caveats for AFL:

  - The fuzzer is optimized for compact data formats, such as images and
    other multimedia. It is less suited for human-readable formats with
    particularly verbose, redundant verbiage - say, XHTML or JavaScript. In
    such cases, template- or ABNF-based generators tend to fare better. Of
    course, if you want to modify the code to generate syntax-aware
    mutations, go ahead! You'd want to start with fuzz_one() in afl-fuzz.c.

  - As with any other brute-force tool, the fuzzer offers limited coverage
    if encryption, checksums, cryptographic signatures, or compression are
    used to wholly wrap the actual data format to be tested.
    To work around this, you may need to comment out the relevant checks in
    the tested programs, or use a wrapper that postprocesses the data
    generated by afl-fuzz. As a simple example, a patch for libpng to bypass
    CRC checksums is provided in
    experimental/libpng_no_checksum/libpng-nocrc.patch.

  - The included instrumentation (afl-as.h) currently supports x86. If you
    are feeling adventurous, an experimental ARM port can be found in
    experimental/arm_support/, too - but it's pretty brittle at this point.

  - Instrumentation of binary-only code is theoretically possible, but not
    supported today. Leveraging Pin or DynamoRIO may be a simple approach.

  - There are some unfortunate trade-offs with ASAN and 64-bit binaries.
    This isn't due to any specific fault of afl-fuzz; see notes_for_asan.txt
    for more.

9) Special thanks
-----------------

Many of the improvements to afl-fuzz wouldn't be possible without feedback,
bug reports, or patches from:

  - Jann Horn,
  - Hanno Boeck,
  - Felix Groebert,
  - Jakub Wilk,
  - Richard W. M. Jones,
  - Alexander Cherepanov,
  - Tom Ritter,
  - Hovik Manucharyan,
  - Sebastian Roschke,
  - Eberhard Mattes,
  - Padraig Brady,
  - Ben Laurie,
  - @dronesec.

Thank you!

10) Contact
-----------

Questions? Concerns? Bug reports? The author can usually be reached at
<[email protected]>.