Kim's local files for the 2013 computational linguistics Multi-University Research Initiative at Carnegie Mellon University. This repository contains the custom files used to crawl the web for Kinyarwanda text with the Apache Nutch web crawler. The project yielded a corpus of 5.5 million tokens and 2.5k types. Summer 2013, Carnegie Mellon University, Language Technologies Institute.
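For readers unfamiliar with the token/type distinction used in the counts above: tokens are running words, and types are distinct word forms. The sketch below is illustrative only and is not taken from this repository. It assumes lowercased, whitespace-split text, and the function name `count_tokens_and_types` and the sample string are hypothetical.

```python
def count_tokens_and_types(text: str) -> tuple[int, int]:
    """Return (token count, type count) for a piece of text.

    Assumption: tokens are lowercased, whitespace-separated strings.
    The tokenization used for the project's own counts is not documented here.
    """
    tokens = text.lower().split()
    types = set(tokens)  # distinct word forms
    return len(tokens), len(types)


# Hypothetical sample, not drawn from the crawled corpus.
sample = "Muraho neza muraho"
n_tokens, n_types = count_tokens_and_types(sample)
print(n_tokens, n_types)
```

A real pipeline would likely also strip punctuation and markup left over from crawled pages before counting.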
spasarok/kinyarwandaCrawler
About
A Kinyarwanda web crawler