Phishpedia: A Hybrid Deep Learning Based Approach to Visually Identify Phishing Webpages


This is the official implementation of "Phishpedia: A Hybrid Deep Learning Based Approach to Visually Identify Phishing Webpages" (USENIX Security '21).
Existing reference-based phishing detectors:
- ❌ Lack of interpretability
- ❌ Lack of generalization performance in the wild
- ❌ Lack of a large-scale phishing benchmark dataset
The contributions of our paper:
- ✅ We propose Phishpedia, a phishing identification system with high identification accuracy and low runtime overhead that outperforms the relevant state-of-the-art identification approaches.
- ✅ Our system provides explainable annotations, which increase users' confidence in the model's predictions.
- ✅ We conducted a phishing discovery experiment on emerging domains fed from CertStream and discovered 1,704 real phishing webpages, of which 1,133 are zero-days.

Framework

Input: A URL and its screenshot
Output: Phish/Benign, Phishing target

Step 1: Pass the screenshot to the Deep Object Detection Model to get the predicted logos and inputs (the inputs are used only for explanation, not for later prediction)
Step 2: Pass the predicted logos to the Deep Siamese Model
- If the Siamese model reports no target, return Benign, None
- Otherwise the Siamese model reports a target, return Phish, Phishing target
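
The two steps above can be sketched as a small decision function. This is only an illustration of the control flow, not the code in phishpedia.py: `identify`, `detect_logos`, and `match_brand` are hypothetical names, the two models are passed in as plain callables, and returning Benign when no logo is detected is an assumption. How the URL enters the decision is not spelled out above, so here it is simply forwarded to the matcher.

```python
from typing import Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in screenshot pixels


def identify(
    url: str,
    screenshot_path: str,
    detect_logos: Callable[[str], List[Box]],
    match_brand: Callable[[str, str, List[Box]], Optional[str]],
) -> Tuple[str, Optional[str]]:
    """Return ("Phish", target) or ("Benign", None) for one webpage."""
    # Step 1: the object detector proposes logo regions on the screenshot.
    logo_boxes = detect_logos(screenshot_path)
    if not logo_boxes:
        # Assumption: with no logo to compare, there is no target to report.
        return "Benign", None

    # Step 2: the Siamese matcher compares the logos against the target list.
    target = match_brand(url, screenshot_path, logo_boxes)
    if target is None:
        return "Benign", None
    return "Phish", target
```

Passing the models in as callables keeps the sketch runnable with stand-in functions, for example `identify(url, "shot.png", lambda p: [], lambda u, p, b: None)`.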

Project structure

- logo_recog.py: Deep Object Detection Model
- logo_matching.py: Deep Siamese Model 
- configs.yaml: Configuration file
- phishpedia.py: Main script

Instructions

Requirements:

Anaconda must be installed; please refer to the official installation guide: https://docs.anaconda.com/free/anaconda/install/index.html

Create a local clone of Phishpedia

git clone https://github.com/lindsey98/Phishpedia.git

Setup

chmod +x ./setup.sh
./setup.sh

conda activate phishpedia

Run in bash

python phishpedia.py --folder <folder you want to test e.g. ./datasets/test_sites>

The testing folder should be in the structure of:

test_site_1
|__ info.txt (Write the URL)
|__ shot.png (Save the screenshot)
test_site_2
|__ info.txt (Write the URL)
|__ shot.png (Save the screenshot)
......
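
Before running the main script, it can help to check that every site folder matches this layout. The helper below is a minimal sketch and not part of the repository; `find_incomplete_sites` is a hypothetical name, and it only checks that the two files exist, not that their contents are valid.

```python
from pathlib import Path
from typing import List


def find_incomplete_sites(test_root: str) -> List[str]:
    """Return the names of site folders missing info.txt or shot.png."""
    required = ("info.txt", "shot.png")
    incomplete = []
    for site_dir in sorted(Path(test_root).iterdir()):
        if not site_dir.is_dir():
            continue  # ignore stray files at the top level
        if any(not (site_dir / name).is_file() for name in required):
            incomplete.append(site_dir.name)
    return incomplete
```

For example, `find_incomplete_sites("./datasets/test_sites")` returns an empty list when every folder contains both files.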

Miscellaneous

In our paper, we also implement several phishing detection and identification baselines.
The logo targetlist described in our paper includes 181 brands; we have further expanded the targetlist to include 277 brands in this code repository.
For the phishing discovery experiment, we obtain the feed from CertStream phish_catcher and lower the score threshold to 40 to process more suspicious websites; readers can refer to their repo for details.
We use Scrapy for website crawling.
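
The effect of the lowered threshold can be illustrated with a small filter. This is a sketch under assumptions: it takes `(domain, score)` pairs that have already been scored, it does not reproduce phish_catcher's scoring, and `select_suspicious` and the inclusive comparison are hypothetical choices made here.

```python
from typing import Iterable, List, Tuple

SCORE_THRESHOLD = 40  # value stated above, lowered to process more websites


def select_suspicious(
    scored: Iterable[Tuple[str, int]],
    threshold: int = SCORE_THRESHOLD,
) -> List[str]:
    """Keep the domains whose score reaches the threshold (inclusive here)."""
    return [domain for domain, score in scored if score >= threshold]
```

A lower threshold keeps more domains for crawling and screenshotting, at the cost of processing more benign websites.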

Citation

If you find our work useful in your research, please consider citing our paper by:

@inproceedings{lin2021phishpedia,
  title={Phishpedia: A Hybrid Deep Learning Based Approach to Visually Identify Phishing Webpages},
  author={Lin, Yun and Liu, Ruofan and Divakaran, Dinil Mon and Ng, Jun Yang and Chan, Qing Zhou and Lu, Yiwen and Si, Yuxuan and Zhang, Fan and Dong, Jin Song},
  booktitle={30th {USENIX} Security Symposium ({USENIX} Security 21)},
  year={2021}
}

Contacts

If you have any issues running our code, you can raise an issue or send an email to [email protected], [email protected], and [email protected]
