
Official Implementation of "Phishpedia: A Hybrid Deep Learning Based Approach to Visually Identify Phishing Webpages" USENIX'21



Phishpedia: A Hybrid Deep Learning Based Approach to Visually Identify Phishing Webpages


Paper | Website | Video | Dataset | Citation

  • This is the official implementation of "Phishpedia: A Hybrid Deep Learning Based Approach to Visually Identify Phishing Webpages" (USENIX Security '21): link to paper, link to our website, link to our dataset.

  • Limitations of existing reference-based phishing detectors:

    • ❌ Lack of interpretability
    • ❌ Lack of generalization performance in the wild
    • ❌ Lack of a large-scale phishing benchmark dataset
  • The contributions of our paper:

    • ✅ We propose Phishpedia, a phishing identification system with high identification accuracy and low runtime overhead that outperforms the relevant state-of-the-art identification approaches.
    • ✅ Our system provides explainable annotations, which increase users' confidence in the model's predictions.
    • ✅ We conducted a phishing discovery experiment on emerging domains fed from CertStream and discovered 1,704 real phishing webpages, of which 1,133 are zero-day.

Framework

Input: A URL and its screenshot

Output: Phish/Benign, Phishing target

  • Step 1: Pass the screenshot to the Deep Object Detection Model to obtain the predicted logos and inputs (the inputs are not used for later prediction, only for explanation).

  • Step 2: Pass the predicted logos to the Deep Siamese Model.

    • If the Siamese model reports no target, return Benign, None.
    • Otherwise, the Siamese model reports a target: return Phish, Phishing target.
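
The two-step decision flow above can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's actual code: `classify`, `detect_logos`, `match_brand`, and the `Verdict` type are hypothetical names, and the detector and matcher are passed in as plain callables so the sketch runs without any model weights. It mirrors only the two steps as listed and does not show how the URL is used.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical types for illustration; the repository's own interfaces may differ.
Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in screenshot pixels


@dataclass
class Verdict:
    label: str             # "Phish" or "Benign"
    target: Optional[str]  # matched brand, or None when benign


def classify(
    screenshot_path: str,
    detect_logos: Callable[[str], List[Box]],
    match_brand: Callable[[str, Box], Optional[str]],
) -> Verdict:
    """Step 1: detect logo boxes. Step 2: ask the matcher for a target brand."""
    for box in detect_logos(screenshot_path):
        brand = match_brand(screenshot_path, box)
        if brand is not None:
            # The matching step reported a target.
            return Verdict("Phish", brand)
    # No detected logo matched a brand in the target list.
    return Verdict("Benign", None)


# Stub models standing in for the real detector and matcher.
verdict = classify(
    "shot.png",
    detect_logos=lambda path: [(10, 10, 120, 60)],
    match_brand=lambda path, box: "ExampleBrand",
)
print(verdict)
```

Passing the two models in as callables keeps the control flow testable on its own; in practice they would wrap whatever logo_recog.py and logo_matching.py expose.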

Project structure

- logo_recog.py: Deep Object Detection Model
- logo_matching.py: Deep Siamese Model 
- configs.yaml: Configuration file
- phishpedia.py: Main script

Instructions

Requirements:

  1. Create a local clone of Phishpedia
git clone https://github.com/lindsey98/Phishpedia.git
  2. Setup
chmod +x ./setup.sh
./setup.sh
conda activate phishpedia
  3. Run in bash
python phishpedia.py --folder <folder you want to test e.g. ./datasets/test_sites>

The testing folder should be in the structure of:

test_site_1
|__ info.txt (Write the URL)
|__ shot.png (Save the screenshot)
test_site_2
|__ info.txt (Write the URL)
|__ shot.png (Save the screenshot)
......
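
Before running the main script, it can help to check that every site folder follows the layout above. The helper below is a small sketch for that purpose and is not part of the repository: `find_incomplete_sites` and `REQUIRED_FILES` are hypothetical names, and the only facts taken from this README are the two file names, info.txt and shot.png.

```python
from pathlib import Path
from typing import Dict, List

# File names taken from the folder layout described above.
REQUIRED_FILES = ("info.txt", "shot.png")


def find_incomplete_sites(folder: str) -> Dict[str, List[str]]:
    """Map each site subfolder to the required files it is missing."""
    problems: Dict[str, List[str]] = {}
    for site in sorted(Path(folder).iterdir()):
        if not site.is_dir():
            continue  # ignore stray files at the top level
        missing = [name for name in REQUIRED_FILES if not (site / name).is_file()]
        if missing:
            problems[site.name] = missing
    return problems
```

Calling `find_incomplete_sites("./datasets/test_sites")` returns an empty dictionary when every subfolder contains both files; otherwise each key is a folder name and each value lists what is missing.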

Miscellaneous

  • In our paper, we also implement several phishing detection and identification baselines; see here.
  • The logo targetlist described in our paper includes 181 brands; in this code repository we have expanded it to 277 brands.
  • For the phishing discovery experiment, we obtain the feed from CertStream phish_catcher and lower the score threshold to 40 in order to process more suspicious websites. Readers can refer to their repo for details.
  • We use Scrapy for website crawling.

Citation

If you find our work useful in your research, please consider citing our paper:

@inproceedings{lin2021phishpedia,
  title={Phishpedia: A Hybrid Deep Learning Based Approach to Visually Identify Phishing Webpages},
  author={Lin, Yun and Liu, Ruofan and Divakaran, Dinil Mon and Ng, Jun Yang and Chan, Qing Zhou and Lu, Yiwen and Si, Yuxuan and Zhang, Fan and Dong, Jin Song},
  booktitle={30th {USENIX} Security Symposium ({USENIX} Security 21)},
  year={2021}
}

Contacts

If you have any issues running our code, you can raise an issue or send an email to [email protected], [email protected], and [email protected]
