Suggested changes to the queryset documentation in viewsforecasting
#16
Comments
@SofiaNordenving @chandlervincentwilliams Adding you for information now, and so that you can follow up on these points while I'm away if they have not been implemented before then.
@chandlervincentwilliams I forgot to mention (2) above the other day; could you please implement this as well? @hhegre @chandlervincentwilliams @SofiaNordenving, any thoughts on how to implement (3) above? The core idea is to keep a record of when we ingested each dataset, so that we can meet our data providers' attribution policies.
Some comments on the above points:
(1, 2) Excellent, thanks!
(3) Sounds very reasonable. Would it then appear in the same database table as the raw/source variable it concerns? If so, perhaps we can make it standard practice to include this column in our queryset definitions (especially for data that will be fed into models and/or input data to be shared in the API)? That way, it always gets included when we prepare and share data, making it much easier to adhere to the providers' terms of use.
Also, is it possible for us to specify the date of access per data entry/row to ensure full transparency, given that we most often don't update entire datasets when we ingest new data, but only the latest subsets thereof?
This would be particularly important for GED data, since UCDP updates their Candidate data records when new information becomes available, but we don't update the previously ingested data until we load the next GED data. This was a huge issue for our Ethiopia forecasts when the war broke out: once the media blackout eased and UCDP updated their early records from the war, "our" data differed from theirs by hundreds of fatalities per month, resulting in really low forecasts even though better data had become available. Transparency on this would be incredibly helpful in explaining our forecasts.
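To make the per-row access date concrete, here is a minimal sketch of what stamping each row at ingestion could look like. It is not how our ingestion actually works: the `ingest` and `rows_accessed_before` helpers, the `(country_id, month_id)` key, and all ids, counts and dates are hypothetical and only illustrate the idea.

```python
from datetime import date


def ingest(table, batch, accessed_on):
    """Upsert `batch` into `table`, stamping every touched row with `accessed_on`.

    Both arguments are dicts keyed by (country_id, month_id); the values in
    `batch` are fatality counts. Rows absent from `batch` keep their old stamp.
    """
    for key, fatalities in batch.items():
        table[key] = {"fatalities": fatalities, "accessed_on": accessed_on}
    return table


def rows_accessed_before(table, cutoff):
    """Return the keys of rows whose values were last fetched before `cutoff`."""
    return sorted(key for key, row in table.items() if row["accessed_on"] < cutoff)


# Made-up ids, counts and dates, for illustration only.
table = {}
ingest(table, {(1, 491): 120, (1, 492): 95}, date(2020, 12, 15))
# A later load that only brings in the newest month; older rows are untouched.
ingest(table, {(1, 493): 310}, date(2021, 1, 15))

print(rows_accessed_before(table, date(2021, 1, 1)))
```

With a stamp per row, the two months loaded in December are reported as not refreshed since then, which is the kind of information that would have helped explain the Ethiopia discrepancy.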
I don't know exactly where this belongs, but I think it is related to (3). It came out of a discussion with Malika this morning, and can be summarised as follows. When we query data for model creation, we need to apply a time lag to cover the months for which we are missing data; for GED this is 1 month. The problem with much of our data now is that it has not been updated even though the provider has released new data, so without checking what is actually in there we don't know what the lag should be. I don't think it is realistic for us to keep all the data up to date with the providers (many of them don't have a consistent update schedule). It would be great if this documentation recorded not only when each variable was ingested, but also the last month for which it has data.
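As a sketch of how a recorded "last month with data" could drive the lag, assuming months are stored as a running integer index and that the lag is simply the distance from that month to the month we forecast from: the `required_lag` function, the dataset keys and the month numbers below are all hypothetical.

```python
def required_lag(last_data_month, origin_month):
    """Number of months between the last month with data and the forecast origin."""
    if last_data_month > origin_month:
        raise ValueError("data extends past the forecast origin")
    return origin_month - last_data_month


# Hypothetical metadata: last month index with ingested data, per dataset.
last_data_month = {"ged": 500, "slow_provider": 490}
origin_month = 501

lags = {name: required_lag(month, origin_month)
        for name, month in last_data_month.items()}
print(lags)  # {'ged': 1, 'slow_provider': 11}
```

The point is that the lag is derived from what is actually in the database rather than assumed from the provider's nominal schedule.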
Excellent idea. Some of this information belongs at the dataset level rather than the predictor level (e.g. access date, last month with updated data). Somewhere in our GitHub system there should be a list of each "dataset" (e.g. ACLED, WDI, VDEM, ...) with this meta-information, and a link to the individual features we extract and ingest from each.
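One possible shape for such a list, sketched as a small registry with one record per dataset and a lookup from feature to dataset. The `DatasetRecord` fields, the feature names, and all dates and month numbers are placeholders, not our actual variables or ingestion dates.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class DatasetRecord:
    name: str                 # dataset label, e.g. "ACLED"
    accessed_on: date         # when the dataset was last ingested
    last_data_month: int      # last month index that has data
    features: list = field(default_factory=list)  # features extracted from it


# Placeholder entries; dates, months and feature names are invented.
REGISTRY = [
    DatasetRecord("ACLED", date(2021, 1, 15), 492, ["acled_example_count"]),
    DatasetRecord("WDI", date(2020, 9, 1), 480,
                  ["wdi_example_gdp", "wdi_example_pop"]),
]


def dataset_for(feature, registry=REGISTRY):
    """Return the dataset record that a given feature was extracted from."""
    for record in registry:
        if feature in record.features:
            return record
    raise KeyError(f"no dataset registered for feature {feature!r}")


record = dataset_for("wdi_example_pop")
print(record.name, record.accessed_on.isoformat(), record.last_data_month)
```

Keeping the dataset-level fields in one place and pointing each feature at its dataset avoids repeating the access date on every predictor.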
viewsforecasting repo. This was done to make it easier for external users to navigate the content of this repo and more easily understand what they're looking at (and prevent us from having to explain this in writing repeatedly). As requested, this has been pushed to the documentation branch of viewsforecasting.
- UPDATE: Structure has been approved by HH; Chandler will implement it.
viewsforecasting (the tables showing the variables that go into each queryset in the fatalities002 sub-models), in order for externals to consult these files without further instruction from us:
- UPDATE: Structure has been approved by HH, but has yet to be implemented.