yangke/cluehunter

cluehunter

ClueHunter is an auxiliary tool for reverse data flow analysis starting from a crash point. It generates a data flow graph from a gdb debug log (at the C source code level). It takes manually specified sink variables that cause the crash on the last line and performs interprocedural analysis on the log trace. To obtain the automatic debug trace, the robot_dbg.exp tool in ClueHunter requires the program under debug to be compiled with debug information and with its intermediate files kept (gcc -g -save-temps options). At the current stage of development, only command-line programs are supported. Please consult the detailed documentation for more information.
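To make the idea of reverse data flow analysis concrete, here is a minimal toy sketch. It is not ClueHunter's implementation: the trace format (one simple C assignment per executed line), the regular expressions, and the function name `reverse_data_flow` are all assumptions made for illustration. Starting from a sink variable, it walks the trace backwards and records which earlier assignments feed into it.

```python
import re

# Hypothetical, simplified trace format: one C assignment per executed line.
# "=(?!=)" avoids treating a comparison such as "a == b" as an assignment.
ASSIGNMENT = re.compile(r"^\s*([A-Za-z_]\w*)\s*=(?!=)\s*(.+?);?\s*$")
# Identifiers that are not immediately followed by "(" (i.e. not function names).
IDENTIFIER = re.compile(r"[A-Za-z_]\w*\b(?!\s*\()")


def reverse_data_flow(trace, sinks):
    """Walk the trace backwards; return (line_no, target, sources) edges."""
    tainted = set(sinks)
    edges = []
    for line_no in range(len(trace), 0, -1):
        match = ASSIGNMENT.match(trace[line_no - 1])
        if not match:
            continue
        target, rhs = match.groups()
        if target not in tainted:
            continue
        sources = IDENTIFIER.findall(rhs)
        # This assignment defines the tainted value, so stop tracking the
        # target and start tracking the variables it was computed from.
        tainted.discard(target)
        tainted.update(sources)
        edges.append((line_no, target, sources))
    return edges


# Toy trace, oldest line first (illustrative only).
trace = [
    "size = read_u16(buf);",
    "offset = 4;",
    "length = size - offset;",
    "count = 7;",
]
for edge in reverse_data_flow(trace, ["length"]):
    print(edge)
```

The unrelated assignment to `count` is skipped, while the lines defining `length`, `offset`, and `size` are reported. A real analysis also has to handle pointers, struct fields, and function calls, which this sketch ignores.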

## Quick Start Cookbook

### Install

ClueHunter depends on graphviz to generate the picture from the dot file.

#### For Ubuntu

sudo apt-get install git
sudo apt-get install graphviz
git clone https://github.com/yangke/cluehunter.git

That's done.
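If you want to confirm that graphviz is usable before running an analysis, a small check like the following may help. It is only a sketch: it assumes that the graphviz `dot` executable on your PATH is what gets used for rendering, and the helper name `graphviz_available` is hypothetical.

```python
import shutil


def graphviz_available():
    """Return True when the graphviz `dot` executable is found on PATH."""
    return shutil.which("dot") is not None


if graphviz_available():
    print("graphviz found: dot files can be rendered")
else:
    print("graphviz missing: install it with 'sudo apt-get install graphviz'")
```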

#### Start Funny

First, make sure the C program under analysis is compiled by gcc with the -g -save-temps options. In most cases you can specify this in the configure step like this:

$./configure CFLAGS="-g -save-temps" CXXFLAGS="-g -save-temps" --prefix=$YOUR_INSTALL_PATH 

Otherwise you may have to change the Makefile. Then modify line 15 of cluehunter/robot_dbg.exp to fit your own debugging scenario. Here is an example for testing the executable program swf2xml in swfmill-0.3.3:

spawn gdb --args swfmill swf2xml exploit_it_to_crash

The input file exploit_it_to_crash will cause the crash of swf2xml.

Then use robot_dbg.exp to debug your program automatically. It executes the gdb next command on lines that contain a library or system call site, and the gdb step command in all other cases. Copy robot_dbg.exp into the directory that contains the binary executable swf2xml and the exploit input exploit_it_to_crash, so that the earlier command (spawn gdb -q --args swfmill swf2xml exploit_it_to_crash) is valid.

swfmill-0.3.3_install_bin_path$ls
... exploit_it_to_crash ... robot_dbg.exp ... swf2xml ...
swfmill-0.3.3_install_bin_path$./robot_dbg.exp
...
swfmill-0.3.3_install_bin_path$ls
... exploit_it_to_crash ... gdb.txt ... robot_dbg.exp ... swf2xml ...
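The step/next rule that robot_dbg.exp follows can be sketched as below. This is an illustration in Python, not the Expect script itself: the set `LIBRARY_CALLS` and the function `choose_gdb_command` are hypothetical, and the real script may recognize library and system call sites differently.

```python
import re

# Hypothetical list of library/system functions; the criteria used by the
# real robot_dbg.exp may differ.
LIBRARY_CALLS = {
    "printf", "malloc", "free", "memcpy", "strlen", "fopen", "fread", "fclose",
}
# An identifier directly followed by "(" is treated as a call site.
CALL_SITE = re.compile(r"\b([A-Za-z_]\w*)\s*\(")


def choose_gdb_command(source_line):
    """Return "next" for lines calling a known library function, else "step"."""
    called = set(CALL_SITE.findall(source_line))
    return "next" if called & LIBRARY_CALLS else "step"


print(choose_gdb_command("buf = malloc(length);"))       # library call
print(choose_gdb_command("parse_tag(reader, length);"))  # project function
```

Using next on library calls keeps the trace out of code for which no source is available, while step follows calls into the program's own functions so that the interprocedural flow is recorded.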

Everything is now in place: the debug trace gdb.txt has been written alongside the other files. We can then use cluehunter.py to analyze this trace.

python cluehunter.py -t path_to/gdb.txt\
      -vs length -ps N -o . -n telescope -l 1

This command uses the trace in gdb.txt to perform reverse data flow analysis for the variable length. The sensitive crash data, length itself, is marked as tainted. The access pattern of length, 'N', means direct access. The other pattern, '*', means that the pointer must be dereferenced to reach the sensitive sink data we care about. Note that * must be quoted with "" or '' on the command line.

The command makes ClueHunter output telescope.dot and use graphviz to generate telescope.svg beside it. -vs, -ps and -t are the three mandatory options; they specify the names of the sink variables, their access patterns, and the trace to analyze, respectively. The -o option specifies the output directory. The -l option specifies the redundancy level of the parsed trace: 0 removes only the line redundancy within the same function, and 1 removes both the inner-function and the inter-function redundancy.
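When cluehunter.py is driven from another script, building the argument list programmatically avoids the shell-quoting pitfall around *. The sketch below uses only the options described above; the helper name `build_command` is hypothetical, and the check that there is exactly one pattern per variable is an assumption about how the two lists pair up.

```python
import shlex

VALID_PATTERNS = {"N", "*"}


def build_command(trace, variables, patterns, output_dir=".",
                  name=None, level=None, project_dir=None):
    """Build an argument list for cluehunter.py (illustrative helper)."""
    # Assumption: each sink variable has exactly one access pattern.
    if len(variables) != len(patterns):
        raise ValueError("need exactly one pattern per sink variable")
    unknown = set(patterns) - VALID_PATTERNS
    if unknown:
        raise ValueError("unknown access patterns: %s" % sorted(unknown))
    cmd = ["python", "cluehunter.py", "-t", trace,
           "-vs", *variables, "-ps", *patterns, "-o", output_dir]
    if name is not None:
        cmd += ["-n", name]
    if level is not None:
        cmd += ["-l", str(level)]
    if project_dir is not None:
        cmd += ["-m", project_dir]
    return cmd


cmd = build_command("path_to/gdb.txt", ["buf"], ["*"], name="telescope", level=1)
# shlex.join shows the shell-safe form, with * quoted.
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd)` bypasses the shell entirely, so * needs no quoting there; quoting only matters when the command is typed or pasted into a shell.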

If you want to analyze variables on a specific trace line, use the -i option. For example, -i -1 specifies the last line in trace.txt, and -i -2 specifies the second-to-last line. You can also use positive line numbers: -i 100 means line 100 of trace.txt. Note that the lines referred to here are the lines of the parsed intermediate file trace.txt. The last line (-i -1) in trace.txt corresponds to the last non-empty line above the error message Program received ... in gdb.txt.
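The indexing convention for -i can be illustrated with a short sketch. Both helpers, `resolve_trace_index` and `last_line_before_crash`, are hypothetical names written for this explanation and are not part of ClueHunter; the sample gdb.txt lines are made up.

```python
def resolve_trace_index(index, line_count):
    """Map a -i style index (1-based, or negative from the end) to a 0-based position."""
    if index == 0 or abs(index) > line_count:
        raise IndexError("trace line %d is out of range" % index)
    return index - 1 if index > 0 else line_count + index


def last_line_before_crash(gdb_lines):
    """Return the last non-empty line above the first 'Program received' message."""
    for position, line in enumerate(gdb_lines):
        if line.startswith("Program received"):
            for candidate in reversed(gdb_lines[:position]):
                if candidate.strip():
                    return candidate
            return None
    return None


# Made-up log excerpt, for illustration only.
gdb_lines = [
    "41\t  length = size - offset;",
    "42\t  memcpy(dst, src, length);",
    "",
    "Program received signal SIGSEGV, Segmentation fault.",
]
print(resolve_trace_index(-1, 100))  # position of the last of 100 lines
print(last_line_before_crash(gdb_lines))
```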

ClueHunter can analyze function calls introduced by macros by expanding them. It queries the preprocessed *.i files generated by gcc's -save-temps option. To use this feature, you must specify the path of the compiled C project that corresponds to the log trace under analysis. The feature is disabled by default; use -m to specify the compiled C project path.
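To show why the *.i files are useful for macro expansion, here is a minimal sketch of looking up the preprocessed form of a source line. It relies on the linemarkers that the gcc preprocessor writes into its output (lines of the form `# linenum "filename" flags`). The function `index_preprocessed` and the sample text are assumptions for illustration; ClueHunter's own query logic may work differently.

```python
import re

# A gcc linemarker says that the following line came from `filename`
# at line `linenum`.
LINEMARKER = re.compile(r'^#\s+(\d+)\s+"([^"]*)"')


def index_preprocessed(text):
    """Map (source_file, source_line) to the preprocessed lines emitted for it."""
    index = {}
    current_file, current_line = None, 0
    for raw in text.splitlines():
        marker = LINEMARKER.match(raw)
        if marker:
            current_line = int(marker.group(1))
            current_file = marker.group(2)
            continue
        if current_file is not None and raw.strip():
            index.setdefault((current_file, current_line), []).append(raw)
        # Every non-marker line, blank or not, advances the source line.
        current_line += 1
    return index


# Made-up excerpt of a .i file: in demo.c line 4, a macro was expanded
# into a call to get_len().
SAMPLE = "\n".join([
    '# 1 "demo.c"',
    '# 1 "demo.h" 1',
    "int get_len(void *p);",
    '# 2 "demo.c" 2',
    "",
    "int main(void) {",
    "  int length = (get_len(buf) - 4);",
    "}",
])
index = index_preprocessed(SAMPLE)
print(index[("demo.c", 4)])
```

With such an index, a line reported in the gdb trace can be matched against its expanded text, which exposes function calls that were hidden behind a macro in the original source.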

Here is an executable test command that analyzes the trace gdb-swfmill-0.3.3.txt provided in the test module.

python cluehunter.py -t test/gdb_logs/swfmill-0.3.3/gdb-swfmill-0.3.3.txt\
      -vs length -ps 'N' -o . -n telescope -l 1 -m test/gdb_logs/swfmill-0.3.3/swfmill-0.3.3

## Complete Usage

usage: cluehunter.py [-h] -ps PATTERNS [PATTERNS ...] -vs VARIABLES
                     [VARIABLES ...] [-l LEVEL] -t TRACE [-o OUTPUT_PATH]
                     [-m C_PROJECT_DIR] [-n NAME] [-d | -v | -q]
                     
optional arguments:
  -h, --help            show this help message and exit
  -l LEVEL, --level LEVEL
                        Redundancy level of the parsing. 0 means just remove
                        inline or innner function redundancy; 1 means remove
                        both of the inline and interprocedural reduandancy.
  -t TRACE, --trace TRACE
                        The file path of gdb trace log, for example,
                        ./gdb.txt. This log should be generated by
                        robot_dbg.exp.
  -o OUTPUT_PATH, --output-directory OUTPUT_PATH
                        The output directory in which .dot and .png files will
                        be dumped in this path.
  -m C_PROJECT_DIR, --c-project-dir C_PROJECT_DIR
                        The C project directory with the .i files maked by gcc
                        '-save-temps' option. Usually the we add this flags
                        during configure: ./configure CFLAGS='-g -save-temps'.
  -n NAME, --name NAME  The prefix name of the generated .dot and .png files.
  -d, --debug           Enable debug output.
  -v, --verbose         Increase verbosity.
  -q, --quiet           Be quiet during processing.

sinks:
  -ps PATTERNS [PATTERNS ...], --patterns PATTERNS [PATTERNS ...]
                        Specify the access pattern list of the sink
                        identifiers. Patterns must be "*" or "N" separated
                        with blanks. "N" means direct access, "*" means this
                        is a pointer of the cared data.
  -vs VARIABLES [VARIABLES ...], --variables VARIABLES [VARIABLES ...]
                        Specify the identifier name of the sink variables.
                        Example:"father->baby.toy"

About

Find clues of program crash: a data flow tracker based on gdb log.
