This repository contains:
- bash scripts to download SIPP raw data, and Stata do-files to read the data into .dta format
- modified versions of the do-files as supplied by NBER. The modification is small: it only allows the scripts to be called one after the other inside a loop.
- You need a Unix/macOS machine to run the bash scripts (I think).
- The scripts download the core and topical modules from SIPP.
- The Stata data for a typical panel takes about 7GB.
- The entire raw data takes 50GB.
- The structure of the files online is not uniform; file names change from one panel to the next, so don't expect uniform filenames in the bash scripts either.
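Since the raw data alone takes 50GB, and each panel adds roughly 7GB more in Stata format, it may be worth checking free disk space before running the setup scripts. The following is a minimal sketch that is not part of this repository; `has_free_gb` is a hypothetical helper name, and it assumes a `df` that supports the POSIX `-P` and `-k` flags (as on Linux and macOS).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository):
# return 0 if the filesystem holding directory $1 has at least $2 GB free.
has_free_gb() {
  local dir="$1" need_gb="$2" avail_kb
  # -P: POSIX output format, one line per filesystem
  # -k: sizes in 1024-byte blocks; column 4 of line 2 is "Available"
  avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
  # Fail (return non-zero) if df produced nothing, or if space is short
  [ -n "$avail_kb" ] && [ "$avail_kb" -ge $(( need_gb * 1024 * 1024 )) ]
}
```

For example, `has_free_gb "$HOME" 60 || echo "less than 60GB free in $HOME"` checks your home directory, which exists even before `~/datasets/SIPP` is created. The 60GB threshold is only an illustration; adjust it to the panels you plan to download.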
Proceed like this:
- Run each of the `setup_SIPPyy.sh` scripts with `./setup_SIPPyy.sh` on your command line. This should download all the raw data into a folder structure I set up at `~/datasets/SIPP`. You can change that location by changing the variable `dest` inside the scripts.
- Run each of the `makeyyyy.do` scripts in Stata. This should read the raw data into Stata format and apply the labels.
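The first step, running every setup script, can also be done in one go. This is a minimal sketch, assuming all `setup_SIPPyy.sh` scripts sit together in one directory and can run unattended; `run_all_setups` is a hypothetical helper, not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run every setup_SIPP*.sh in a directory, one after
# the other, stopping at the first script that fails.
# The body is a subshell ( ... ), so the cd and shopt changes below do not
# leak into the caller's shell, and "exit" only ends the subshell.
run_all_setups() (
  cd "${1:-.}" || exit 1
  shopt -s nullglob   # no matching scripts -> empty loop, not a literal pattern
  for script in setup_SIPP*.sh; do
    echo "running $script"
    # "bash ./script" works even if the executable bit is not set
    bash "./$script" || { echo "failed: $script" >&2; exit 1; }
  done
)
```

Usage: `run_all_setups /path/to/this/repo`. The scripts run in glob order, which is alphabetical rather than chronological by panel; that should only matter if one script depended on another, which this sketch does not assume.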
The bash scripts check whether a file already exists, to avoid downloading large files again. If you think anything went wrong during the process, just delete the corresponding folder in `~/datasets/SIPP` and rerun the script to recreate it.
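The skip-if-exists check described above can be sketched roughly as follows. This is a hypothetical illustration, not the code in the setup scripts: the function name `fetch_once` is made up, and the real scripts may use a different downloader or test.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not the repository's code): download URL $1 to file $2
# unless $2 already exists and is non-empty.
fetch_once() {
  local url="$1" target="$2"
  if [ -s "$target" ]; then
    echo "skip: $target already exists"
    return 0
  fi
  mkdir -p "$(dirname "$target")"
  # Download to a temporary name first, so an interrupted transfer never
  # leaves a file that a later run would mistake for a finished one.
  # -f: fail on HTTP errors, -L: follow redirects, -o: output file
  if curl -fL -o "$target.part" "$url"; then
    mv "$target.part" "$target"
  else
    rm -f "$target.part"
    return 1
  fi
}
```

A plain existence check cannot tell a complete file from a truncated one, which is why deleting a panel's folder (for example `rm -rf ~/datasets/SIPP/<panel folder>`, with the placeholder replaced by the actual folder name) and rerunning its setup script is a reasonable recovery step. The `.part` rename in this sketch is one way to reduce that risk.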
Each panel takes about 7GB in Stata format.
The usual disclaimer applies: I do not guarantee anything about this software. I hope it is useful; feel free to use it.
- The original do-files are all copyright NBER. Please see the link above.