
Research project to diarize and transcribe legal speech in U.S. Ninth Circuit Court (ca9) proceedings using a model trained on U.S. Supreme Court audio files. This repo is part of ongoing research at NYU's Center for Data Science.


akuz91/LegalSpeech

 
 


LegalSpeech

DS-GA 1006 - New York University

  • Jeffrey Tumminia (jt2565)
  • Amanda Kuznecov (anr431)
  • Sophia Tsilerides (smt570)
  • Ilana Weinstein (igw212)

Overview

We find it painfully obvious that our legal system is extremely inconsistent. Historically, this has simply been accepted as an artifact of human nature (perhaps correctly). We believe that data science finally offers a means to identify weaknesses in the system and improve its ability to perform consistently and fairly. It opens avenues for quantitative analysis of bias in judges' rulings, which in turn allows the issues to be communicated more clearly to public officials and the general public. Dr. Kaufman's work centers on searching for causal relations in noisy political data, an area we consider under-studied. Our hope is to contribute to improving a system that affects the lives of so many individuals in our country.

Please refer to our paper and video for more information.

Folder & File Descriptions

speech-to-text: Producing text from audio files using Google's Speech-to-Text API
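
As a rough illustration of this step, the sketch below assumes the `google-cloud-speech` Python client, a recording already uploaded to Cloud Storage, and 16 kHz LINEAR16 audio. The URI argument, the sample rate, the timeout, and the helper names `join_transcripts` and `transcribe_gcs` are hypothetical and are not taken from the notebooks in this folder.

```python
from types import SimpleNamespace


def join_transcripts(response):
    """Concatenate the top alternative of each result into one transcript string."""
    pieces = []
    for result in response.results:
        if result.alternatives:  # a result may carry no alternatives
            pieces.append(result.alternatives[0].transcript.strip())
    return " ".join(pieces)


def transcribe_gcs(gcs_uri, sample_rate_hertz=16000):
    """Transcribe a long recording stored in Cloud Storage (needs credentials)."""
    # Imported lazily so join_transcripts works without the client installed.
    from google.cloud import speech

    client = speech.SpeechClient()
    audio = speech.RecognitionAudio(uri=gcs_uri)
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=sample_rate_hertz,  # assumed; must match the file
        language_code="en-US",
    )
    # Recordings longer than about a minute go through the asynchronous call.
    operation = client.long_running_recognize(config=config, audio=audio)
    response = operation.result(timeout=3600)
    return join_transcripts(response)


# Offline check with a stand-in object shaped like a recognition response.
fake_response = SimpleNamespace(
    results=[
        SimpleNamespace(alternatives=[SimpleNamespace(transcript=" may it please the court ")]),
        SimpleNamespace(alternatives=[]),
        SimpleNamespace(alternatives=[SimpleNamespace(transcript="thank you counsel")]),
    ]
)
demo_text = join_transcripts(fake_response)
```

Keeping the transcript assembly separate from the network call makes it possible to check the post-processing without credentials or billing.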

Web Scraping ca9: Using the Selenium and BeautifulSoup packages to scrape the ca9 website for case information.
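
The parsing half of such a scraper can be sketched as turning an HTML table into one record per case. To stay runnable without extra packages, this sketch swaps in the standard library's `html.parser` as a stand-in for BeautifulSoup, and it reads a hard-coded fragment instead of page source fetched with Selenium. The table layout, column names, and sample row are hypothetical; the real ca9 markup may differ.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # cells of the row being read, or None outside <tr>
        self._cell = None  # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_case_table(html):
    """Return one dict per data row, keyed by the cells of the first row."""
    parser = TableRowParser()
    parser.feed(html)
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical page fragment, made up for illustration only.
SAMPLE_HTML = """
<table>
  <tr><th>Case Number</th><th>Case Title</th><th>Hearing Date</th></tr>
  <tr><td>00-00001</td><td>Doe v. Roe</td><td>01/01/2020</td></tr>
</table>
"""

cases = parse_case_table(SAMPLE_HTML)
```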

SCOTUS: Working with U.S. Supreme Court audio files from the Oyez API for consumption by the UIS-RNN model.

Diarization: Diarizing full court proceedings using the Reference Dependent Speaker Verification (RDSV) method.
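
One way to picture reference-based diarization: assuming each audio segment and each known speaker has already been reduced to a fixed-length embedding, every segment can be assigned to the reference speaker with the highest cosine similarity. The sketch below illustrates that general idea only. The toy embeddings, the `threshold` value, and all function names are hypothetical, and this is not the RDSV implementation in this folder.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def label_segments(segments, references, threshold=0.5):
    """Assign each (start, end, embedding) segment to the closest reference speaker.

    Segments whose best similarity falls below `threshold` are labeled "unknown".
    """
    labeled = []
    for start, end, embedding in segments:
        best_name, best_score = "unknown", threshold
        for name, ref in references.items():
            score = cosine_similarity(embedding, ref)
            if score >= best_score:
                best_name, best_score = name, score
        labeled.append((start, end, best_name))
    return labeled


def merge_adjacent(labeled):
    """Merge consecutive segments that carry the same speaker label."""
    merged = []
    for start, end, name in labeled:
        if merged and merged[-1][2] == name:
            merged[-1] = (merged[-1][0], end, name)
        else:
            merged.append((start, end, name))
    return merged


# Toy 2-D embeddings; real speaker embeddings have many more dimensions.
references = {"judge": [1.0, 0.0], "attorney": [0.0, 1.0]}
segments = [
    (0.0, 2.0, [0.9, 0.1]),
    (2.0, 4.0, [0.8, 0.2]),
    (4.0, 6.0, [0.1, 0.9]),
    (6.0, 8.0, [-1.0, -1.0]),
]
timeline = merge_adjacent(label_segments(segments, references))
```

Here the first two segments merge into a single judge turn, the third goes to the attorney, and the last stays unlabeled because it matches neither reference well enough.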

Languages

  • Jupyter Notebook 91.9%
  • Python 7.8%
  • Shell 0.3%