Python scripts for Retrosheet data downloading and parsing.
- Chadwick 0.6.2: http://chadwick.sourceforge.net/
- Python 2.5+ (Python 3 compatibility unknown, sorry)
- sqlalchemy: http://www.sqlalchemy.org/
- [if using postgres] psycopg2 Python package (dependency for sqlalchemy)
python download.py [-y <4-digit-year> | --year <4-digit-year>]
The scripts/download.py script downloads Retrosheet data. Edit the config.ini file to configure which types of files should be downloaded. Optionally, set the year to download with the command-line argument.
- download > dl_eventfiles determines whether Retrosheet Event Files are downloaded. These are the only files that parse.py can process at this time.
- download > dl_gamelogs determines whether Retrosheet Game Logs are downloaded. parse.py cannot process these at this time.
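The following is a minimal sketch, not the actual code of download.py, of how the year option and the two download flags could be read. It assumes Python 3's configparser, a [download] section with boolean-style values, and the hypothetical helper names read_download_flags and parse_year; the real config.ini may be laid out differently.

```python
import argparse
import configparser

# Hypothetical config.ini contents; the real file may use other keys or values.
SAMPLE_CONFIG = """
[download]
dl_eventfiles = True
dl_gamelogs = False
"""


def read_download_flags(config_text):
    """Return (dl_eventfiles, dl_gamelogs) as booleans from INI text."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # Missing keys fall back to False so nothing is downloaded by accident.
    eventfiles = parser.getboolean("download", "dl_eventfiles", fallback=False)
    gamelogs = parser.getboolean("download", "dl_gamelogs", fallback=False)
    return eventfiles, gamelogs


def parse_year(argv):
    """Parse the optional -y/--year argument; return None when it is absent."""
    arg_parser = argparse.ArgumentParser(description="Download Retrosheet data.")
    arg_parser.add_argument("-y", "--year", type=int, default=None,
                            help="4-digit year to download")
    args = arg_parser.parse_args(argv)
    if args.year is not None and not 1000 <= args.year <= 9999:
        arg_parser.error("year must have four digits")
    return args.year
```

For example, parse_year(["-y", "2014"]) yields the integer 2014, and read_download_flags(SAMPLE_CONFIG) reports that only event files are enabled.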
python parse.py [-y <4-digit-year>]
After the files have been downloaded, parse them into SQL with parse.py.
- Create a database called retrosheet (or whatever you like).
- Add the schema to the database with the included SQL script (the .postgres.sql one works nicely with PostgreSQL, the other with MySQL).
- Configure config.ini with the appropriate ENGINE, USER, HOST, PASSWORD, and DATABASE values and the download directory. If you're using postgres, you can optionally define SCHEMA.
  - Valid values for ENGINE are valid sqlalchemy engines, e.g. 'mysql', 'postgresql', or 'sqlite'.
  - If your server is configured to allow passwordless connections, you don't need to define USER and PASSWORD.
  - If you are using sqlite3, database in the config should be the path to your database file.
  - Specify the directory that Retrosheet files are downloaded to. It must exist before the script runs.
- Run parse.py to parse the files and insert the data into the database (optionally use -y YYYY to import just one year).
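As a rough illustration of how the config values above map onto a sqlalchemy connection URL, here is a small sketch. The function name build_database_url is hypothetical and this is not necessarily how parse.py builds its connection; it only follows sqlalchemy's documented URL shape (engine://user:password@host/database, or sqlite:///path for SQLite).

```python
from urllib.parse import quote_plus


def build_database_url(engine, database, user=None, password=None, host=None):
    """Assemble a sqlalchemy-style URL from config.ini-like values."""
    if engine == "sqlite":
        # For SQLite, the database value is a file path; no user or host applies.
        return "sqlite:///" + database
    credentials = ""
    if user:
        credentials = quote_plus(user)
        if password:
            # Escape characters such as '@' that would otherwise break the URL.
            credentials += ":" + quote_plus(password)
        credentials += "@"
    return "{0}://{1}{2}/{3}".format(engine, credentials, host or "", database)


# Passwordless setups simply omit user and password.
print(build_database_url("postgresql", "retrosheet", host="localhost"))
```

The resulting string can be passed to sqlalchemy's create_engine. The SCHEMA setting is not part of the URL and would need to be handled separately.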
GitHub user jeffcrow made many fixes and additions, and added sqlite support.
If you're using PostgreSQL (and you should be), you can get a dump of all data up through 2014 (warning: 502MB) here