Merge pull request #5 from nrbergeron/master
Added ability to download Retrosheet Game Logs
Showing 4 changed files with 126 additions and 46 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,25 +1,60 @@ | ||
PY-RETROSHEET | ||
============= | ||
|
||
Python scripts for Retrosheet data downloading and parsing. | ||
|
||
YE REQUIREMENTS | ||
- | ||
Chadwick 0.6.2 http://chadwick.sourceforge.net/ | ||
python 2.5+ (don't know about 3.0, sorry) | ||
sqlalchemy: http://www.sqlalchemy.org/ | ||
[if using postgres] psycopg2 python package (dependency for sqlalchemy) | ||
|
||
1. create database called <code>retrosheet</code> (or whatever) | ||
2. add schema to the database w/ the included SQL script (the .postgres.sql one works nicely w/ PG, the other w/ MySQL) | ||
3. configure the file <code>db.ini</code> with your appropriate ENGINE, USER, HOST, PASSWORD, DATABASE values - if yer using postgres, you can optionally define SCHEMA and download directory | ||
* valid values for ENGINE are valid sqlalchemy engines e.g. 'mysql', 'postgresql' or 'sqlite' | ||
* if you have your server configured to allow passwordless connections, you don't need to define USER and PASSWORD | ||
* if you are using sqlite3, 'database' in the config should be the path to your database file | ||
* specify directory for retrosheet files to be downloaded to, needs to exist before script runs | ||
4. run <code>download.py</code> to download the files from retrosheet's servers (optionally use <code>-y XXXX</code> to get only a certain year) | ||
5. run <code>parse.py</code> to parse the files and insert the data into the database. (optionally use <code>-y XXXX</code> to import just one year) | ||
--------------- | ||
|
||
- Chadwick 0.6.2 http://chadwick.sourceforge.net/ | ||
|
||
- python 2.5+ (don't know about 3.0, sorry) | ||
|
||
- sqlalchemy: http://www.sqlalchemy.org/ | ||
|
||
- [if using postgres] psycopg2 python package (dependency for sqlalchemy) | ||
|
||
USAGE | ||
----- | ||
|
||
### Download | ||
|
||
python download.py [-y <4-digit-year> | --year <4-digit-year>] | ||
|
||
The `scripts/download.py` script downloads Retrosheet data. Edit the config.ini file to configure what types of files should be downloaded. Optionally set the year to download via the command line argument. | ||
|
||
- `download` > `dl_eventfiles` determines if Retrosheet Event Files should be downloaded or not. These are the only files that can be processed by `parse.py` at this time. | ||
|
||
- `download` > `dl_gamelogs` determines if Retrosheet Game Logs should be downloaded or not. These are not able to be processed by `parse.py` at this time. | ||
|
||
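A minimal sketch of how the download options described above could be read. This is not taken from `download.py`. Only the `download` section name, the `dl_eventfiles` and `dl_gamelogs` keys, and the `-y`/`--year` flags come from the README. The `True`/`False` value format, the helper names, and the use of Python 3's `configparser` (named `ConfigParser` on Python 2) are assumptions.

```python
import argparse
import configparser

# Hypothetical config.ini contents. Only the section and key names come from
# the README; the True/False value format is an assumption.
SAMPLE_CONFIG = """
[download]
dl_eventfiles = True
dl_gamelogs = False
"""


def read_download_flags(text):
    """Return which file types should be downloaded, per the [download] section."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "eventfiles": parser.getboolean("download", "dl_eventfiles"),
        "gamelogs": parser.getboolean("download", "dl_gamelogs"),
    }


def parse_year(argv):
    """Parse the optional -y/--year argument; None means every available year."""
    ap = argparse.ArgumentParser(description="Download Retrosheet data")
    ap.add_argument("-y", "--year", type=int, default=None,
                    help="4-digit year to download")
    return ap.parse_args(argv).year


flags = read_download_flags(SAMPLE_CONFIG)
year = parse_year(["-y", "2014"])
print(flags, year)
```

With a real file you would call `parser.read("config.ini")` instead of `read_string`, and pass `sys.argv[1:]` to `parse_year`.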
### Parse into SQL | ||
|
||
python parse.py [-y <4-digit-year>] | ||
|
||
After the files have been downloaded, parse them into SQL with `parse.py`. | ||
|
||
1. Create database called `retrosheet` (or whatever). | ||
|
||
2. Add schema to the database w/ the included SQL script (the .postgres.sql one works nicely w/ PG, the other w/ MySQL) | ||
|
||
3. Configure the file `config.ini` with your appropriate `ENGINE`, `USER`, `HOST`, `PASSWORD`, and `DATABASE` values - if you're using postgres, you can optionally define `SCHEMA` and download directory | ||
|
||
- Valid values for `ENGINE` are valid sqlalchemy engines, e.g. 'mysql', 'postgresql', or 'sqlite'. | ||
|
||
- If you have your server configured to allow passwordless connections, you don't need to define `USER` and `PASSWORD`. | ||
|
||
- If you are using sqlite3, `database` in the config should be the path to your database file. | ||
|
||
- Specify the directory that Retrosheet files are downloaded to; it must exist before the script runs. | ||
|
||
4. Run `parse.py` to parse the files and insert the data into the database. (Optionally use `-y YYYY` to import just one year.) | ||
|
||
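To illustrate how the `ENGINE`, `USER`, `HOST`, `PASSWORD`, and `DATABASE` values from step 3 fit together, here is a rough sketch that assembles them into a SQLAlchemy-style database URL. The `build_db_url` helper is hypothetical and is not necessarily how `parse.py` builds its connection. It covers the passwordless case and the sqlite case, where `DATABASE` is a file path.

```python
def build_db_url(engine, database, user=None, password=None, host=None):
    """Assemble a SQLAlchemy-style URL from config values (hypothetical helper)."""
    if engine == "sqlite":
        # For sqlite, 'database' is the path to the database file.
        return "sqlite:///" + database

    credentials = ""
    if user:
        credentials = user
        if password:
            credentials += ":" + password
        credentials += "@"

    # USER and PASSWORD may be omitted for passwordless connections.
    return "%s://%s%s/%s" % (engine, credentials, host or "", database)


print(build_db_url("postgresql", "retrosheet", "me", "secret", "localhost"))
print(build_db_url("postgresql", "retrosheet", host="localhost"))
print(build_db_url("sqlite", "retrosheet.db"))
```

The resulting string is the kind of value that `sqlalchemy.create_engine` accepts. This sketch does not escape special characters in the password, which a real URL would need.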
YE GRATITUDE | ||
- | ||
------------ | ||
|
||
Github user jeffcrow made many fixes and additions and added sqlite support | ||
|
||
JUST THE DATA | ||
- | ||
If you're using PostgreSQL (and you should be), you can get a dump of all data up through 2014 (warning: 502MB) [here](https://www.dropbox.com/s/03c3zyk91c2yfuw/retrosheet.sql.gz | ||
) | ||
------------- | ||
|
||
If you're using PostgreSQL (and you should be), you can get a dump of all data up through 2014 (warning: 502MB) [here](https://www.dropbox.com/s/03c3zyk91c2yfuw/retrosheet.sql.gz) |