Folders and files

Charlie
Dodds2014
Dreams
LCohen
MSCOCO
RedCircle
SUBTLEX-US
authorship/C50
csv_data
json_data
linguistlist
lyrics
scripts
xml_data
README.md

Data

This folder contains data that we use in the course and/or that you can use to play around and test some of the skills that you have learnt. It also contains some of the scripts that were used to get the data.

Overview

The authorship folder contains the C50 corpus that can be used to train and test automatic authorship detection systems. It can be downloaded here.
The baby_names folder contains baby names from Social Security applications in the USA (names downloaded from here, names_by_state downloaded from here).
Charlie contains one simple text file with a snippet from Roald Dahl's 'Charlie and the Chocolate Factory'.
The concreteness folder contains concreteness ratings downloaded from here.
The Dodds2014 folder contains sentiment scores for 100,000 words across 10 languages. It was downloaded from here.
The dreams folder contains 10 text files describing dreams of Vickie, a 10-year-old girl. These texts were downloaded from DreamBank.
linguistlist is a collection of messages from the Linguist List. They were downloaded from here using get_linguist_data.py. All data is gzipped, except for this example.
MSCOCO contains image annotations provided by Microsoft Research. These were downloaded from here.
presidential_debate_2016 contains a CSV file with transcripts of the 2016 (vice-)presidential debates from 26 September to 9 October. They were downloaded from here.
RedCircle contains a text file with the ebook "The Adventure of the Red Circle" by Arthur Conan Doyle, downloaded from here.
Trump-Facebook is a TSV file containing Facebook statuses posted by Donald Trump. The dataset was downloaded from here. It was created by Max Woolf, using this script.
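
Several of the datasets above are delimited text (CSV or TSV), and the linguistlist data is gzipped. As a rough starting point, the sketch below shows one way to read both kinds of file using only the Python standard library. The function names and the file paths in the example are hypothetical, and UTF-8 encoding is an assumption; check the actual file names and encoding in your copy of the folder.

```python
import csv
import gzip
from pathlib import Path


def read_delimited(path, delimiter=","):
    """Return the rows of a CSV/TSV file as a list of dicts keyed by the header row."""
    # Assumption: the file is UTF-8 encoded and its first row is a header.
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter=delimiter))


def read_gzipped_text(path):
    """Return the decompressed contents of a gzipped text file as a string."""
    # mode="rt" makes gzip decode the bytes to text while reading.
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        return handle.read()


if __name__ == "__main__":
    # Hypothetical paths: substitute the real file names from the folder.
    tsv_path = Path("Trump-Facebook/statuses.tsv")
    gz_path = Path("linguistlist/messages.txt.gz")

    if tsv_path.exists():
        rows = read_delimited(tsv_path, delimiter="\t")
        print(len(rows), "rows read from", tsv_path)

    if gz_path.exists():
        text = read_gzipped_text(gz_path)
        print(text[:200])
```

For a CSV file such as the one in presidential_debate_2016, the same `read_delimited` helper can be called with its default comma delimiter.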
