Skip to content

This is the development tree. For downloads please see:

License

Unknown, Unknown licenses found

Licenses found

Unknown
LICENSE.md
Unknown
COPYING
Notifications You must be signed in to change notification settings

thinrope/bulk_extractor

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

codecov

bulk_extractor is a high-performance C++ program that scans a disk image, a file, or a directory of files and extracts information such as email addresses, JPEGs and JSON snippets without parsing the file system or file system structures. The results are stored in feature files or carved into stand-alone files that can be easily inspected, parsed, or processed with automated tools. bulk_extractor also creates histograms of features that it finds, as features that are more common tend to be more important.

bulk_extractor probes every block of data to see if it contains bytes that can be decompressed or otherwise decoded. If so, the decoded data are recursively re-examined. As a result, bulk_extractor can find things like BASE64-encoded JPEGs and compressed JSON objects that traditional carving tools miss.

This is the bulk_extractor 2.0 development branch! For information about the bulk_extractor update, please see Release 2.0 roadmap.

Building bulk_extractor

To build bulk_extractor in Linux or Mac OS:

  1. Make sure required packages have been installed. You can do this by going into the etc/ directory and looking for a script that installs the necessary packages for your platform.

  2. Then run these commands:

./configure
make
make install

For detailed instructions on installing packages and building bulk_extractor, read the wiki page here: https://github.com/simsong/bulk_extractor/wiki/Installing-bulk_extractor

The Windows version of bulk_extractor must be built on Fedora using mingw.

For more information on bulk_extractor, visit: https://forensicswiki.xyz/wiki/index.php?title=Bulk_extractor

Tested Configurations

This release of bulk_extractor requires C++17 and has been tested to compile on the following platforms:

  • Amazon Linux as of 2019-11-09
  • Fedora 32
  • Ubuntu 20.04LTS
  • MacOS 11.5.2

To configure your operating system, please run the appropriate scripts in the etc/ directory.

RECOMMENDED CITATION

If you are writing a scientific paper and using bulk_extractor, please cite it with:

Garfinkel, Simson, Digital media triage with bulk data analysis and bulk_extractor. Computers and Security 32: 56-72 (2013)

@article{10.5555/2748150.2748581,
author = {Garfinkel, Simson L.},
title = {Digital Media Triage with Bulk Data Analysis and Bulk_extractor},
year = {2013},
issue_date = {February 2013},
publisher = {Elsevier Advanced Technology Publications},
address = {GBR},
volume = {32},
number = {C},
issn = {0167-4048},
journal = {Comput. Secur.},
month = feb,
pages = {56–72},
numpages = {17},
keywords = {Digital forensics, Bulk data analysis, bulk_extractor, Stream-based forensics, Windows hibernation files, Parallelized forensic analysis, Optimistic decompression, Forensic path, Margin, EnCase}
}

BULK_EXTRACTOR 2.0 STATUS REPORT

bulk_extractor 2.0 is now operational for development use. It requires C++17 to compile. I am keeping be13_api and dfxml as a modules that are included, python-style, rather than making them stand-alone libraries that are linked against.

The project took longer than anticipated. In addition to updating to C++17, I used this as an opportunity for massive code refactoring and general increase in reliability.

About

This is the development tree. For downloads please see:

Resources

License

Unknown, Unknown licenses found

Licenses found

Unknown
LICENSE.md
Unknown
COPYING

Stars

Watchers

Forks

Packages

No packages published

Languages

  • C++ 53.5%
  • Shell 14.1%
  • Python 12.8%
  • C 5.8%
  • HTML 4.7%
  • M4 3.3%
  • Other 5.8%