Rawformer Implementation for Anti-Spoofing

This is my own implementation of the Rawformer model from "Leveraging Positional-Related Local-Global Dependency for Synthetic Speech Detection" (Xiaohui Liu, Meng Liu, Longbiao Wang, Kong Aik Lee, Hanyi Zhang, Jianwu Dang).
WARNING

  • This code may not exactly match what is described in the paper. If you find a bug and want to fix it, please open a pull request.
  • Using pre-emphasis for preprocessing showed superior performance (see the sketch below).
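
For reference, here is a minimal sketch of first-order pre-emphasis in PyTorch. The function name and the coefficient 0.97 (a common default) are my own illustrative choices and may differ from what this repository actually uses:

```python
import torch

def pre_emphasis(waveform: torch.Tensor, coeff: float = 0.97) -> torch.Tensor:
    """Apply first-order pre-emphasis: y[t] = x[t] - coeff * x[t-1].

    `coeff=0.97` is a common default, not necessarily the value used here.
    Operates on the last (time) dimension, so batched input also works.
    """
    return torch.cat(
        [waveform[..., :1], waveform[..., 1:] - coeff * waveform[..., :-1]],
        dim=-1,
    )
```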

Rawformer-S vs Rawformer-L vs SE-Rawformer

In the paper, the authors develop three variants of Rawformer: Rawformer-S, Rawformer-L, and SE-Rawformer. I implemented all of these models, using only 1-dimensional positional encoding. N is the number of Conv2D-based blocks and M is the number of Transformer encoders.

  • Rawformer-S
    • N = 4
    • M = 2
    • Conv2D-based Block - same as a ResNet block used in AASIST
  • Rawformer-L
    • N = 6
    • M = 3
    • Conv2D-based Block - same as a ResNet block used in AASIST
  • SE-Rawformer
    • N = 4
    • M = 2
    • Conv2D-based Block - same as Rawformer-S, except the last three blocks are replaced with Res-SERes2Net blocks
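
The sketch below shows how these pieces fit together: N Conv2D-based blocks, a learnable 1-dimensional positional encoding, and M Transformer encoder layers. It is a minimal illustration under my own assumptions (the module names, channel sizes, pooling, and classifier head are placeholders), not the exact implementation in this repository:

```python
import torch
import torch.nn as nn


class ConvBlock(nn.Module):
    """Stand-in for one Conv2D-based (ResNet-style) block from AASIST."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.SELU(),
            nn.MaxPool2d((1, 3)),  # downsample along the time axis
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class RawformerSketch(nn.Module):
    """N Conv2D-based blocks -> 1-D positional encoding -> M Transformer encoders."""
    def __init__(self, N: int = 4, M: int = 2, d_model: int = 64,
                 n_heads: int = 4, max_len: int = 4096):
        super().__init__()
        chans = [1] + [d_model] * N
        self.blocks = nn.Sequential(
            *[ConvBlock(chans[i], chans[i + 1]) for i in range(N)]
        )
        # Learnable 1-dimensional positional encoding over the flattened sequence.
        self.pos_emb = nn.Parameter(torch.zeros(1, max_len, d_model))
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=M)
        self.classifier = nn.Linear(d_model, 2)  # bona fide vs. spoofed

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, freq, time) feature map from the raw-waveform front end.
        x = self.blocks(x)                      # (batch, d_model, F', T')
        x = x.flatten(2).transpose(1, 2)        # (batch, seq_len, d_model)
        x = x + self.pos_emb[:, : x.size(1)]    # assumes seq_len <= max_len
        x = self.encoder(x)                     # local-global dependency modeling
        return self.classifier(x.mean(dim=1))   # mean-pool, then classify


model_s = RawformerSketch(N=4, M=2)  # Rawformer-S-style configuration
model_l = RawformerSketch(N=6, M=3)  # Rawformer-L-style configuration
```

For an SE-Rawformer-style model, the last three ConvBlock instances in this sketch would be replaced with Res-SERes2Net blocks, as described in the list above.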
