The CSV jobs generate the summary tables and then attempt to run the reports if all the other data is there.
The HAR jobs generate the non-summary tables and then attempt to run the reports if all the other data is there.
So the last job to upload the data should run the reports, because at that point all 4 sets of tables are there.
The other 3 jobs only do the imports and fail on the report generation as not all the tables are there.
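The "run the reports only if all four tables are there" check could be sketched roughly as follows. This is not the code from the actual scripts: `table_exists` and `all_tables_ready` are hypothetical names, and the sketch assumes `bq show` exits non-zero when a table is missing.

```shell
# Hypothetical wrapper: succeeds if the BigQuery table exists.
# Assumes `bq show` returns a non-zero exit status for a missing table.
table_exists() {
  bq show "$1" > /dev/null 2>&1
}

# Succeeds only if all four tables for the given crawl date
# (e.g. 2022_01_01) are present; fails on the first missing one.
all_tables_ready() {
  local date="$1" dataset client
  for dataset in summary_pages pages; do
    for client in desktop mobile; do
      table_exists "httparchive:${dataset}.${date}_${client}" || return 1
    done
  done
  return 0
}
```

A job would then call something like `all_tables_ready "$YYYY_MM_DD" && sql/generate_reports.sh ...` after its own import, so whichever job finishes last is the one that triggers the reports. Note that the argument must be the bare date with no client suffix.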
Running this shows the completion date of each upload:
bq show "httparchive:summary_pages.${YYYY_MM_DD}_desktop" | head -5
bq show "httparchive:summary_pages.${YYYY_MM_DD}_mobile" | head -5
bq show "httparchive:pages.${YYYY_MM_DD}_desktop" | head -5
bq show "httparchive:pages.${YYYY_MM_DD}_mobile" | head -5
The results are summarised below:
| dataset | completion date |
| --- | --- |
| httparchive:summary_pages.2022_01_01_desktop | 19 Jan 01:04:59 |
| httparchive:summary_pages.2022_01_01_mobile | 25 Jan 22:16:00 |
| httparchive:pages.2022_01_01_desktop | 24 Jan 16:54:34 |
| httparchive:pages.2022_01_01_mobile | 25 Jan 07:16:24 |
So the last job to complete was the summary pages import for mobile (25 Jan 22:16:00), which means it should have kicked off the reports.
However the logs show this:
Attempting to generate reports...
The BigQuery tables for 2022_01_01_mobile are not available.
This is because the date passed to the sql/generate_reports.sh script is 2022_01_01_mobile instead of 2022_01_01. This is due to a bug in the sync_csv.sh script that sets this to the _date_client (for other reasons in the script).
The net effect is that if the mobile CSV/summary pages job finishes last, the reports are not generated automatically. If any of the other three jobs finishes last, the reports are generated automatically.
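One way to avoid passing the client-suffixed value to the reports script is to strip the suffix first. The sketch below only illustrates the idea and is not the actual fix to `sync_csv.sh`: `strip_client_suffix`, `date_client` and `report_date` are hypothetical names.

```shell
# Hypothetical helper: remove a trailing _desktop or _mobile suffix,
# leaving the bare crawl date. Values without a suffix pass through unchanged.
strip_client_suffix() {
  local value="$1"
  value="${value%_desktop}"
  value="${value%_mobile}"
  printf '%s\n' "$value"
}

date_client="2022_01_01_mobile"
report_date="$(strip_client_suffix "$date_client")"
echo "$report_date"   # prints 2022_01_01
```

The `${value%suffix}` expansion removes the suffix only when it is present at the end of the string, so the same call is safe for the desktop, mobile and already-bare cases.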
Will submit a fix for this, and rerun the reports.
Hopefully this whole hacky script will be rewritten soon, but this is a simple fix for now.
For context: the January reports did not run automatically. This happens every so often, and I have rerun them manually each time, but it has bugged me and I think I've finally figured out why. All four of the import jobs described above are run from cron.