ForgerySleuth: Empowering Multimodal Large Language Models for Image Manipulation Detection and Analyzing
In this work, we explored the potential of multimodal large language models for the image manipulation detection task. We constructed ForgeryAnalysis, a dataset of forgery analysis text annotations; each entry was first generated by GPT-4o and then reviewed by experts. The proposed data engine, ForgeryAnalyst, enables the creation of the larger-scale ForgeryAnalysis-PT dataset for pre-training. We also proposed ForgerySleuth, which leverages a multimodal large language model to perform comprehensive clue fusion and generates segmentation outputs that indicate the specific tampered regions. More details about our work can be found in the paper.
conda create --name <env> --file requirements.txt
You can use the data engine ForgeryAnalyst-llava-13B to automatically annotate forgery analysis text for images that already have tampered region masks:
python run_engine.py --model-path Zhihao18/ForgeryAnalyst-llava-13B --image-path <path_to_image> --mask-path <path_to_mask> --manipulation-type <manipulation_type> --output-path <path_to_save_output>
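The command above annotates a single image. To annotate a whole folder, a small wrapper can call `run_engine.py` once per image. The sketch below is only illustrative and assumes a hypothetical layout: the mask for `<stem>.<ext>` is `<mask_dir>/<stem>.png`, each analysis is written to `<output_dir>/<stem>.txt`, and `--output-path` accepts a file path. The helper names `build_engine_commands` and `run_commands` are ours, and `"splicing"` in the example is a placeholder, since the accepted `--manipulation-type` values are not listed here.

```python
import subprocess
import sys
from pathlib import Path

# Image extensions to pick up; adjust to match your dataset.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".bmp"}


def build_engine_commands(image_dir, mask_dir, output_dir, manipulation_type,
                          model_path="Zhihao18/ForgeryAnalyst-llava-13B",
                          mask_suffix=".png"):
    """Build one run_engine.py invocation per image that has a matching mask.

    Hypothetical layout: the mask for <stem>.<ext> is <mask_dir>/<stem><mask_suffix>,
    and each analysis is written to <output_dir>/<stem>.txt.
    """
    image_dir, mask_dir, output_dir = Path(image_dir), Path(mask_dir), Path(output_dir)
    commands = []
    for image_path in sorted(image_dir.iterdir()):
        if not image_path.is_file() or image_path.suffix.lower() not in IMAGE_EXTS:
            continue
        mask_path = mask_dir / f"{image_path.stem}{mask_suffix}"
        if not mask_path.is_file():
            continue  # no tampered-region mask, so the engine cannot annotate it
        commands.append([
            sys.executable, "run_engine.py",
            "--model-path", model_path,
            "--image-path", str(image_path),
            "--mask-path", str(mask_path),
            "--manipulation-type", manipulation_type,
            "--output-path", str(output_dir / f"{image_path.stem}.txt"),
        ])
    return commands


def run_commands(commands):
    """Run each command in turn, creating output folders and stopping at the first failure."""
    for cmd in commands:
        output_path = Path(cmd[cmd.index("--output-path") + 1])
        output_path.parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(cmd, check=True)


# Example (hypothetical paths and manipulation type):
# run_commands(build_engine_commands("data/images", "data/masks", "data/analysis", "splicing"))
```

Each call starts a new Python process, which presumably reloads the 13B model every time; for large batches it may be worth checking whether `run_engine.py` can process several images in one run.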
To keep the training data consistent, you can use ShareCaptioner to generate detailed captions for authentic images and then organize them in the Chain-of-Clues format.
python run_sharecaptioner.py --model-path Lin-Chen/ShareCaptioner --image-path <path_to_image> --output-path <path_to_save_output>
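The step of organizing a caption in the Chain-of-Clues format is not spelled out here. As a rough sketch only, the function below wraps a ShareCaptioner caption into a structured text for an authentic image: a verdict, the caption as the description, and a "no traces" entry per clue section. It assumes `run_sharecaptioner.py` writes the caption as plain text; the section names, verdict wording, and helper names (`caption_to_chain_of_clues`, `convert_caption_file`) are hypothetical placeholders, not the template used to build ForgeryAnalysis.

```python
from pathlib import Path

# Placeholder section names: substitute the real Chain-of-Clues headings.
PLACEHOLDER_SECTIONS = ("Semantic content", "Visual details", "Low-level traces")


def caption_to_chain_of_clues(caption, sections=PLACEHOLDER_SECTIONS,
                              no_trace_text="No signs of manipulation were found."):
    """Wrap a caption in a hypothetical Chain-of-Clues layout for an authentic image."""
    caption = " ".join(caption.split())  # collapse newlines and repeated spaces
    if not caption:
        raise ValueError("caption is empty")
    lines = ["Conclusion: authentic.", f"Description: {caption}"]
    # One entry per clue section, each stating that no traces were found.
    for index, name in enumerate(sections, start=1):
        lines.append(f"{index}. {name}: {no_trace_text}")
    return "\n".join(lines)


def convert_caption_file(caption_path, output_path):
    """Read a plain-text caption file and write its structured version."""
    caption = Path(caption_path).read_text(encoding="utf-8")
    Path(output_path).write_text(caption_to_chain_of_clues(caption) + "\n", encoding="utf-8")
```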
Tips: You can download ShareCaptioner in advance and use `local_files_only=True` to force the use of local weights, avoiding potential network issues.
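A minimal sketch of this tip, assuming ShareCaptioner is loaded through the Hugging Face Transformers `Auto*` classes with `trust_remote_code=True` (we have not checked how `run_sharecaptioner.py` loads it). `snapshot_download` from `huggingface_hub` fetches the weights once; afterwards `local_files_only=True` makes `from_pretrained` read only local files (a local directory or the Hugging Face cache). The helper names and the `config.json` check are our own assumptions.

```python
from pathlib import Path

REPO_ID = "Lin-Chen/ShareCaptioner"


def download_sharecaptioner(local_dir):
    """Download all ShareCaptioner files once, while a network is available."""
    from huggingface_hub import snapshot_download  # imported lazily
    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)


def resolve_model_path(local_dir):
    """Prefer a pre-downloaded copy (detected by its config.json) over the Hub ID."""
    if (Path(local_dir) / "config.json").is_file():
        return str(local_dir)
    return REPO_ID


def load_sharecaptioner(model_path):
    """Load tokenizer and model from local files only (local directory or HF cache)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer  # imported lazily
    tokenizer = AutoTokenizer.from_pretrained(
        model_path, trust_remote_code=True, local_files_only=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_path, trust_remote_code=True, local_files_only=True)
    return tokenizer, model
```

Passing the downloaded directory as `--model-path` will likely work if the script forwards it to `from_pretrained`, but this is unverified.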
The ForgeryAnalysis-PT dataset consists of forgery analysis texts automatically generated by our data engine, ForgeryAnalyst. The dataset corresponds to two publicly available image manipulation detection datasets: CASIA2 and MIML. Each entry in the dataset provides forgery analysis for a corresponding tampered image, including clues and explanations structured in a Chain-of-Clues format.
Before using this dataset, download the original CASIA2 and MIML datasets from their respective public repositories, since ForgeryAnalysis-PT relies on them for the corresponding tampered images.
The tampering analysis for each image is saved as a .txt file with the same name as the tampered image in the original CASIA2 and MIML datasets. You can download this dataset from the following link: Google Drive.
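To load ForgeryAnalysis-PT alongside the original images, each tampered image can be matched to its analysis file by name. This sketch assumes "same name" means the same file stem (`example.jpg` pairs with `example.txt`) and that images and `.txt` files sit in two flat directories; adjust the paths for the actual CASIA2 and MIML layouts. `pair_images_with_analyses` is a hypothetical helper.

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".bmp"}


def pair_images_with_analyses(image_dir, analysis_dir):
    """Match each image with the ForgeryAnalysis-PT .txt file of the same stem.

    Returns (pairs, missing): pairs is a list of (image_path, analysis_text),
    and missing lists image file names that have no analysis file.
    """
    analyses = {p.stem: p for p in Path(analysis_dir).glob("*.txt")}
    pairs, missing = [], []
    for image_path in sorted(Path(image_dir).iterdir()):
        if not image_path.is_file() or image_path.suffix.lower() not in IMAGE_EXTS:
            continue
        analysis_path = analyses.get(image_path.stem)
        if analysis_path is None:
            missing.append(image_path.name)
        else:
            pairs.append((image_path, analysis_path.read_text(encoding="utf-8")))
    return pairs, missing
```

Images listed in `missing` have no analysis file; for example, authentic images, since the dataset only covers tampered ones.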
The ForgeryAnalysis-PT dataset is freely available for academic research and development. However, you must respect the terms and conditions of the original datasets, CASIA2 and MIML.
We used several publicly available and widely used image manipulation detection (IMD) datasets to evaluate the performance of IMD methods. You can access the original repositories and download the data through the following links:
Dataset | Paper | Download URL |
---|---|---|
Columbia | Detecting Image Splicing Using Geometry Invariants And Camera Characteristics Consistency | https://www.ee.columbia.edu/ln/dvmm/downloads/authsplcuncmp |
CASIA | CASIA image tampering detection evaluation database | [Unofficial] https://github.com/namtpham/casia1groundtruth; [Unofficial] https://github.com/namtpham/casia2groundtruth |
Coverage | COVERAGE - A Novel Database for Copy-move Forgery Detection | https://github.com/wenbihan/coverage |
NIST16 | MFC Datasets: Large-Scale Benchmark Datasets for Media Forensic Challenge Evaluation | https://mfc.nist.gov/users/sign_in |
IMD20 | IMD2020: A Large-Scale Annotated Dataset Tailored for Detecting Manipulated Images | https://staff.utia.cas.cz/novozada/db |
COCOGlide | TruFor: Leveraging all-round clues for trustworthy image forgery detection and localization | https://github.com/grip-unina/TruFor?tab=readme-ov-file#cocoglide-dataset |
If you find this project useful for your research and applications, please cite using this BibTeX:
@misc{sun2024forgerysleuth,
title={ForgerySleuth: Empowering Multimodal Large Language Models for Image Manipulation Detection},
author={Sun, Zhihao and Jiang, Haoran and Chen, Haoran and Cao, Yixin and Qiu, Xipeng and Wu, Zuxuan and Jiang, Yu-Gang},
publisher={arXiv:2411.19466},
year={2024},
url={https://arxiv.org/abs/2411.19466},
}
- This work is built upon LLaVA, LISA, and SAM.
- During dataset creation and model evaluation, we used ChatGPT and ShareCaptioner.