Automatically download and upload data for the Numerai machine learning competition.
This library is a Python client for the Numerai API. It allows downloading the
training data, uploading predictions, and accessing user and submission
information. Some parts of the code were taken from numerflow by ChristianSch.
Visit his wiki if you need further information on the reverse engineering process.
If you encounter a problem or have suggestions, feel free to open an issue.
- Obtain a copy of this API.
  - If you do not plan on contributing to this repository, download a release.
    - Navigate to releases.
    - Download the latest version.
    - Extract with `unzip` or `tar` as necessary.
  - If you do plan on contributing, clone this repository instead.
- `cd` into the API directory (defaults to `numerapi`, but make sure not to go into the sub-directory also named `numerapi`).
- `pip install -e .`
See `example.py`. You can run it as `./example.py`.
Parameters and return values are given with Python types. Dictionary keys are
given in quotes; other names to the left of colons are for reference
convenience only. In particular, `list`s of `dict`s have names for the `dict`s;
these names will not show up in the actual data, only the actual `dict` data
itself.
- `email` (`str`, optional): email of user account
  - will prompt for this value if not supplied
- `password` (`str`, optional): password of user account
  - will prompt for this value if not supplied
  - prompting is recommended for security reasons
- `prompt_for_mfa` (`bool`, optional): indication of whether to prompt for MFA code
  - only necessary if MFA is enabled for user account
- `user_credentials` (`dict`): credentials for logged-in user
  - `"username"` (`str`)
  - `"access_token"` (`str`)
  - `"refresh_token"` (`str`)
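
The prompting behaviour described for `email` and `password` can be sketched as follows. This is a minimal illustration, not the library's own code: `resolve_credentials`, `input_fn`, and `getpass_fn` are hypothetical names introduced here, and the two callables are injectable only so the helper can be exercised without a terminal.

```python
import getpass


def resolve_credentials(email=None, password=None,
                        input_fn=input, getpass_fn=getpass.getpass):
    """Return (email, password), prompting for whichever value is missing.

    Hypothetical helper for illustration; it is not part of this library.
    """
    if email is None:
        email = input_fn("email: ")
    if password is None:
        # getpass reads the password without echoing it to the terminal,
        # which is why prompting is preferable to hard-coding the value.
        password = getpass_fn("password: ")
    return email, password
```

Prompting keeps the password out of source files and shell history, which is presumably the security reason the parameter description refers to.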
- `dest_path` (`str`, optional, default: `.`): destination folder for the dataset
- `unzip` (`bool`, optional, default: `True`): indication of whether the training data should be unzipped
- `success` (`bool`): indication of whether the current dataset was successfully downloaded
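
To illustrate how `dest_path` and `unzip` might interact, here is a rough sketch of the extraction step only. It assumes the dataset arrives as a zip archive that is already on disk; `extract_dataset` and `archive_path` are hypothetical names, and the actual download logic is not shown.

```python
import os
import zipfile


def extract_dataset(archive_path, dest_path=".", unzip=True):
    """Extract archive_path into dest_path when unzip is True.

    Hypothetical helper for illustration. Returns True on success and
    False if the archive is missing or is not a valid zip file.
    """
    if not unzip:
        return True  # nothing to do; the archive stays as downloaded
    try:
        os.makedirs(dest_path, exist_ok=True)
        with zipfile.ZipFile(archive_path) as archive:
            archive.extractall(dest_path)
    except (OSError, zipfile.BadZipFile):
        return False
    return True
```

Returning a boolean instead of raising mirrors the `success` return value documented above.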
- `all_competitions` (`list`): information about all competitions
  - `competition` (`dict`)
    - `"_id"` (`int`)
    - `"dataset_id"` (`str`)
    - `"start_date"` (`str (datetime)`)
    - `"end_date"` (`str (datetime)`)
    - `"paid"` (`bool`)
    - `"leaderboard"` (`list`)
      - `submission` (`dict`)
        - `"concordant"` (`dict`)
          - `"pending"` (`bool`)
          - `"value"` (`bool`)
        - `"earnings"` (`dict`)
          - `"career"` (`dict`)
            - `"nmr"` (`str`)
            - `"usd"` (`str`)
          - `"competition"` (`dict`)
            - `"nmr"` (`str`)
            - `"usd"` (`str`)
        - `"logloss"` (`dict`)
          - `"consistency"` (`int`)
          - `"validation"` (`float`)
        - `"original"` (`dict`)
          - `"pending"` (`bool`)
          - `"value"` (`bool`)
        - `"submission_id"` (`str`)
        - `"username"` (`str`)
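
The nested layout above can be walked with plain loops. The sketch below is only an illustration: `leaderboard_rows` is a hypothetical helper, and `sample` contains made-up values shaped like the documented structure. Note that the reference names `competition` and `submission` do not appear as keys in the data.

```python
def leaderboard_rows(all_competitions):
    """Yield (competition_id, username, validation_logloss) per submission."""
    for competition in all_competitions:
        for submission in competition["leaderboard"]:
            yield (competition["_id"],
                   submission["username"],
                   submission["logloss"]["validation"])


# Made-up sample data that follows the documented layout.
sample = [
    {
        "_id": 1,
        "dataset_id": "abc",
        "start_date": "2017-01-01T00:00:00.000Z",
        "end_date": "2017-01-08T00:00:00.000Z",
        "paid": True,
        "leaderboard": [
            {
                "concordant": {"pending": False, "value": True},
                "earnings": {
                    "career": {"nmr": "0.0", "usd": "0.0"},
                    "competition": {"nmr": "0.0", "usd": "0.0"},
                },
                "logloss": {"consistency": 75, "validation": 0.69},
                "original": {"pending": False, "value": True},
                "submission_id": "s1",
                "username": "alice",
            }
        ],
    }
]

rows = list(leaderboard_rows(sample))
```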
- `competition` (`dict`): information about requested competition
  - `"_id"` (`int`)
  - `"dataset_id"` (`str`)
  - `"start_date"` (`str (datetime)`)
  - `"end_date"` (`str (datetime)`)
  - `"paid"` (`bool`)
  - `"leaderboard"` (`list`)
    - `submission` (`dict`)
      - `"concordant"` (`dict`)
        - `"pending"` (`bool`)
        - `"value"` (`bool`)
      - `"earnings"` (`dict`)
        - `"career"` (`dict`)
          - `"nmr"` (`str`)
          - `"usd"` (`str`)
        - `"competition"` (`dict`)
          - `"nmr"` (`str`)
          - `"usd"` (`str`)
      - `"logloss"` (`dict`)
        - `"consistency"` (`int`)
        - `"validation"` (`float`)
      - `"original"` (`dict`)
        - `"pending"` (`bool`)
        - `"value"` (`bool`)
      - `"submission_id"` (`str`)
      - `"username"` (`str`)
- `username` (`str`): user for which earnings are requested
- `round_ids` (`np.ndarray(int)`): IDs of each round for which there are earnings
- `earnings` (`np.ndarray(float)`): earnings for each round
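
Because `round_ids` and `earnings` are parallel arrays, pairing them by position gives a per-round lookup. A minimal sketch, assuming both sequences have the same length and order; `earnings_by_round` is a hypothetical helper, and it accepts plain lists as well as NumPy arrays since it only iterates over its arguments.

```python
def earnings_by_round(round_ids, earnings):
    """Map each round ID to the earnings reported for that round."""
    if len(round_ids) != len(earnings):
        raise ValueError("round_ids and earnings must have the same length")
    # int()/float() convert NumPy scalars to built-in Python numbers.
    return {int(r): float(e) for r, e in zip(round_ids, earnings)}


# Made-up values for illustration.
per_round = earnings_by_round([60, 61, 62], [0.0, 1.5, 2.25])
total = sum(per_round.values())
```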
- `username` (`str`): user for which scores are being requested
- `validation_scores` (`np.ndarray(float)`): logloss validation scores
- `consistency_scores` (`np.ndarray(float)`): logloss consistency scores
- `round_ids` (`np.ndarray(int)`): IDs of the rounds for which there are scores
- `username` (`str`): name of requested user
- `user` (`dict`): information about the requested user
  - `"_id"` (`str`)
  - `"username"` (`str`)
  - `"assignedEthAddress"` (`str`)
  - `"created"` (`str (datetime)`)
  - `"earnings"` (`float`)
  - `"followers"` (`int`)
  - `"rewards"` (`list`)
    - `reward` (`dict`)
      - `"_id"` (`int`)
      - `"amount"` (`float`)
      - `"earned"` (`float`)
      - `"nmr_earned"` (`str`)
      - `"start_date"` (`str (datetime)`)
      - `"end_date"` (`str (datetime)`)
  - `"submissions"` (`dict`)
    - `"results"` (`list`)
      - `result` (`dict`)
        - `"_id"` (`str`)
        - `"competition"` (`dict`)
          - `"_id"` (`str`)
          - `"start_date"` (`str (datetime)`)
          - `"end_date"` (`str (datetime)`)
        - `"competition_id"` (`int`)
        - `"created"` (`str (datetime)`)
        - `"id"` (`str`)
        - `"username"` (`str`)
- `username` (`str`): user for which submission is requested
- `round_id` (`int`, optional): round for which submission is requested
  - if no `round_id` is supplied, the submission for the current round will be retrieved
- `username` (`str`): user for which submission is requested
- `submission_id` (`str`): ID of submission for which data was found
- `logloss_val` (`float`): amount of logloss for given submission
- `logloss_consistency` (`float`): consistency of given submission
- `career_usd` (`float`): amount of USD earned by given user
- `career_nmr` (`float`): amount of NMR earned by given user
- `concordant` (`bool` OR `dict` (see note)): whether given submission is concordant
  - for rounds before 64, this was only a boolean, but from 64 on, it is a dict which indicates whether concordance is still being computed
- `original` (`bool` OR `dict` (see note)): whether given submission is original
  - for rounds before 64, this was only a boolean, but from 64 on, it is a dict which indicates whether originality is still being computed
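
Since `concordant` and `original` may be either a `bool` or a `dict` depending on the round, calling code may want to normalize them first. The sketch below assumes the `dict` form carries the `"pending"` and `"value"` keys shown in the leaderboard layout; `normalize_flag` is a hypothetical helper, not part of this library.

```python
def normalize_flag(flag):
    """Return (value, pending) for a `concordant` or `original` entry.

    Older rounds report a plain bool, which is treated as never pending.
    Assumes the dict form has "value" and "pending" keys.
    """
    if isinstance(flag, dict):
        return bool(flag["value"]), bool(flag["pending"])
    return bool(flag), False
```

With this in place, a caller can check `pending` before trusting `value`, regardless of which round the submission belongs to.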
- `file_path` (`str`): path to CSV of predictions
  - should already contain the file name (e.g. `"path/to/file/prediction.csv"`)
- `success` (`bool`): indicator of whether the upload succeeded
- Uploading a prediction shortly before a new dataset is released may result in a `400 Bad Request`. If this happens, wait for the new dataset and attempt to upload again.
- Uploading too many predictions in a certain amount of time will result in a `429 Too Many Requests`.
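
One way to cope with the rate limit described above is to back off and retry. The sketch below is an assumption-laden illustration: `upload_fn` is a hypothetical callable that takes a file path and returns an HTTP status code, and the wait times are arbitrary. A `400` is returned immediately, because retrying only makes sense once the new dataset is available.

```python
import time


def upload_with_retry(upload_fn, file_path, max_attempts=3,
                      wait_seconds=60, sleep_fn=time.sleep):
    """Call upload_fn(file_path), retrying only on 429 responses.

    upload_fn is a hypothetical callable returning an HTTP status code.
    Returns the last status code seen.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status = upload_fn(file_path)
        if status == 429 and attempt < max_attempts:
            # Wait longer after each rate-limited attempt.
            sleep_fn(wait_seconds * attempt)
            continue
        return status
```

Passing `sleep_fn` explicitly makes it possible to try the helper out without actually waiting.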