YOHO (You Only Hear Once) 👂☝️


Introduction

YOHO does object detection in street view and provides audible scene descriptions. With YOHO, a visually impaired person can take out their smartphone, put in their earbuds, and take a walk down the street while their phone tells them about their surroundings.

YOHO was created via transfer learning: a YOLOv2 architecture pre-trained on the COCO dataset was fine-tuned on the Berkeley Deep Drive dataset, and an inference engine - Tell a Vision - was placed on top of the model to turn its predictions into audible output.
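
The actual training code lives in the notebook in this repository; the sketch below only illustrates the general transfer-learning pattern described above: load a COCO-pre-trained detector, freeze the backbone, and retrain a new detection head on the Berkeley Deep Drive classes. Since YOLOv2 is not shipped with torchvision, a COCO-pre-trained Faster R-CNN is used here as a stand-in, and the class count is an assumption.

```python
# Illustrative sketch only - not the repository's training code.
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

NUM_BDD_CLASSES = 11  # 10 Berkeley Deep Drive object classes + background (assumption)

# Load a detector pre-trained on COCO (stand-in for the COCO-pre-trained YOLOv2).
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

# Freeze the pre-trained backbone so only the new head is trained at first.
for param in model.backbone.parameters():
    param.requires_grad = False

# Replace the classification head with one sized for the new label set.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, NUM_BDD_CLASSES)

# Fine-tune only the parameters that still require gradients.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=1e-3, momentum=0.9)
```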

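Tell a Vision's exact interface is not shown in this README; the sketch below only illustrates the idea behind the audible-output stage, turning one frame's detections into a spoken sentence with pyttsx3. The detection format and the wording are assumptions.

```python
# Illustrative sketch of the audible-output idea - not the Tell a Vision engine itself.
import pyttsx3

def describe(detections):
    """detections: list of (label, x_center) tuples, x_center in [0, 1] image coordinates."""
    if not detections:
        return "Nothing detected."
    parts = []
    for label, x_center in detections:
        side = "on your left" if x_center < 0.5 else "on your right"
        parts.append(f"a {label} {side}")
    return "There is " + ", and ".join(parts) + "."

engine = pyttsx3.init()                      # uses the platform's text-to-speech backend
sentence = describe([("car", 0.2), ("person", 0.7)])
engine.say(sentence)                         # queue the utterance
engine.runAndWait()                          # block until speech finishes
```
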
This is the architecture of the system:

Links

You may also download the notebook and the report from this repository.

Contact

If you have any questions or recommendations, you can reach out to me at [email protected].