Welcome to bulk_extractor.
Note: bulk_extractor version 2.0 is now under development. For information, please see the Release 2.0 roadmap in the release-2.0-dev branch.
To build bulk_extractor on Linux or Mac OS:
- Make sure the required packages have been installed. The etc/ directory contains scripts that install the necessary packages; run the one for your platform.
- Then run these commands:
./configure
make
make install
For detailed instructions on installing packages and building bulk_extractor, read the wiki page here: https://github.com/simsong/bulk_extractor/wiki/Installing-bulk_extractor
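The three build commands above can be wrapped in a short script that stops at the first failing step. This is only a sketch: `DRY_RUN`, `run_step` and `build_bulk_extractor` are hypothetical names introduced here and are not part of the bulk_extractor source tree. With the default `DRY_RUN=1` the script prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only: DRY_RUN, run_step and build_bulk_extractor are hypothetical
# names made up for this example; they do not come from bulk_extractor.

# Default to a dry run so nothing is built or installed by accident.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run_step() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Run the three documented steps in order; "&&" stops at the first failure.
build_bulk_extractor() {
    run_step ./configure &&
    run_step make &&
    run_step make install
}

build_bulk_extractor
```

Run it from the top of the source tree with `DRY_RUN=0` to execute the steps for real. Depending on the install prefix, `make install` may need elevated privileges.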
The Windows version of bulk_extractor must be built on Fedora.
To download the Windows installer and/or other releases of bulk_extractor, visit the downloads page here: http://digitalcorpora.org/downloads/bulk_extractor
For more information on bulk_extractor, visit: https://forensicswiki.xyz/wiki/index.php?title=Bulk_extractor
This release of bulk_extractor has been tested to compile on the following platforms:
- Amazon Linux as of 2019-11-09
- Fedora 32
- Ubuntu 16.04LTS
- Ubuntu 18.04LTS
To configure your operating system, please run the appropriate scripts in the etc/ directory.
If you are writing a scientific paper and using bulk_extractor, please cite it with:
Garfinkel, Simson. Digital media triage with bulk data analysis and bulk_extractor. Computers and Security 32: 56-72 (2013)
@article{10.5555/2748150.2748581,
author = {Garfinkel, Simson L.},
title = {Digital Media Triage with Bulk Data Analysis and Bulk_extractor},
year = {2013},
issue_date = {February 2013},
publisher = {Elsevier Advanced Technology Publications},
address = {GBR},
volume = {32},
number = {C},
issn = {0167-4048},
journal = {Comput. Secur.},
month = feb,
pages = {56–72},
numpages = {17},
keywords = {Digital forensics, Bulk data analysis, bulk_extractor, Stream-based forensics, Windows hibernation files, Parallelized forensic analysis, Optimistic decompression, Forensic path, Margin, EnCase}
}
I continue to port bulk_extractor, tcpflow, be13_api and dfxml to modern C++. After surveying the standards I’ve decided to go with C++17 rather than C++14, as support for C++17 is now widespread. (I probably don’t need C++20.) I am sticking with autotools, although there seems to be a strong reason to move to CMake. I am keeping be13_api and dfxml as modules that are included, Python-style, rather than making them stand-alone libraries that are linked against. I’m not 100% sure that’s the correct decision, though.
The project is taking longer than anticipated because I am also doing a general code refactoring. The main thing that is taking time is figuring out how to detangle all of the C++ objects having to do with parser options and configuration.
Given that tcpflow and bulk_extractor both use be13_api, my attention has shifted to using tcpflow to get be13_api operational, as it is a simpler program. I’m about three quarters of the way through now. I anticipate having something finished before the end of 2020.
--- Simson Garfinkel, October 18, 2020