Reports have not generated for January 2022 #155

Closed
tunetheweb opened this issue Jan 27, 2022 · 3 comments
@tunetheweb
Member

So the January reports have not run. This happens every so often and I've just run them manually, but it's bugged me, and I think I've finally figured out why.

We run the following in the cron:

$ crontab -l
0 15 * * * /bin/bash -l -c 'cd /home/igrigorik/code && ./sync_csv.sh `date +\%b_1_\%Y`'  >> /var/log/HAimport.log 2>&1
0  8 * * * /bin/bash -l -c 'cd /home/igrigorik/code && ./sync_csv.sh mobile_`date +\%b_1_\%Y`'  >> /var/log/HAimport.log 2>&1
0 10 * * * /bin/bash -l -c 'cd /home/igrigorik/code && ./sync_har.sh chrome' >> /var/log/HA-import-har-chrome.log 2>&1
0 11 * * * /bin/bash -l -c 'cd /home/igrigorik/code && ./sync_har.sh android' >> /var/log/HA-import-har-android.log 2>&1
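
For anyone reading the crontab cold: the CSV entries build their argument from `date +\%b_1_\%Y`, and the percent signs are backslash-escaped because cron treats an unescaped `%` in the command field as a newline. Here is a minimal sketch of what that expression expands to. The `crawl_label` function is hypothetical (it is not part of sync_csv.sh), and it assumes GNU date for the `-d` option.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from sync_csv.sh): prints the label that the cron
# entry's `date +\%b_1_\%Y` would produce for a given day.
# Assumes GNU date (for -d); LC_ALL=C keeps the month abbreviation in English.
crawl_label() {
  local day="${1:-today}"
  LC_ALL=C date -d "$day" +%b_1_%Y
}

crawl_label "2022-01-27"                    # Jan_1_2022
echo "mobile_$(crawl_label "2022-01-27")"   # mobile_Jan_1_2022
```

So on any day in January 2022 the desktop job is called with `Jan_1_2022` and the mobile job with the same label prefixed by `mobile_`.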

The CSV jobs generate the summary tables and then attempt to run the reports if all the other data is there.
The HAR jobs generate the non-summary tables and then attempt to run the reports if all the other data is there.

So the last job to upload the data should run the reports, because at that point all 4 sets of tables are there.
The other 3 jobs only do the imports and fail on the report generation as not all the tables are there.
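
The "last job in runs the reports" behaviour described above could be sketched roughly as follows. This is not the actual code from the sync scripts: the function names and the `TABLE_EXISTS_CMD` hook are hypothetical, and it assumes `bq show` exits non-zero when a table does not exist.

```shell
#!/usr/bin/env bash
# Sketch only: function names and TABLE_EXISTS_CMD are hypothetical.

# Returns 0 if the table exists. Defaults to `bq show` (assumed to exit
# non-zero for a missing table); set TABLE_EXISTS_CMD to dry-run without bq.
table_exists() {
  ${TABLE_EXISTS_CMD:-bq show} "$1" > /dev/null 2>&1
}

# Succeeds only when all four tables for the crawl date (YYYY_MM_DD) exist:
# summary_pages and pages, each for desktop and mobile.
all_tables_ready() {
  local crawl_date="$1" dataset client
  for dataset in summary_pages pages; do
    for client in desktop mobile; do
      table_exists "httparchive:${dataset}.${crawl_date}_${client}" || return 1
    done
  done
}
```

Each of the four jobs would call something like `all_tables_ready 2022_01_01` after its own import, and only the job that finishes last would get a success and go on to generate the reports.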

Running this shows the completion date of each upload:

	bq show "httparchive:summary_pages.${YYYY_MM_DD}_desktop" | head -5
	bq show "httparchive:summary_pages.${YYYY_MM_DD}_mobile" | head -5
	bq show "httparchive:pages.${YYYY_MM_DD}_desktop" | head -5
	bq show "httparchive:pages.${YYYY_MM_DD}_mobile" | head -5

The results are summarised below:

Dataset                                        Upload completed
httparchive:summary_pages.2022_01_01_desktop   19 Jan 01:04:59
httparchive:summary_pages.2022_01_01_mobile    25 Jan 22:16:00
httparchive:pages.2022_01_01_desktop           24 Jan 16:54:34
httparchive:pages.2022_01_01_mobile            25 Jan 07:16:24

So the last job to complete was the mobile summary pages import, which means it should have kicked off the reports.

However the logs show this:

Attempting to generate reports...
The BigQuery tables for 2022_01_01_mobile are not available.

This is because the date passed to the sql/generate_reports.sh script is 2022_01_01_mobile instead of 2022_01_01. This is due to a bug in the sync_csv.sh script that sets this to the _date_client (for other reasons in the script).
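
To illustrate the kind of change needed (the actual fix is not shown in this thread), here is a minimal sketch that strips a trailing client suffix so the report step gets a bare `YYYY_MM_DD`. The `report_date` function is hypothetical and is not taken from sync_csv.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not the actual fix: removes a trailing _desktop or
# _mobile so the report step receives a bare YYYY_MM_DD.
report_date() {
  local table_suffix="$1"
  table_suffix="${table_suffix%_desktop}"
  table_suffix="${table_suffix%_mobile}"
  printf '%s\n' "$table_suffix"
}

report_date "2022_01_01_mobile"   # 2022_01_01
report_date "2022_01_01"          # 2022_01_01 (already bare, unchanged)
```

Keeping the client-suffixed value for the import and deriving the bare date only for the report call would leave the rest of the script's behaviour alone.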

The net effect is that if the mobile CSV/summary pages job finishes last, the reports are not generated automatically. If any of the other three jobs finishes last, they are generated automatically.

Will submit a fix for this, and rerun the reports.

Hopefully this whole hacky script will be rewritten soon, but this is a simple fix for now.

@tunetheweb
Member Author

Fix merged and reports rerunning now...

@rviscomi
Member

Oh wow good investigation and great catch!

@tunetheweb
Member Author

All reports have been rerun successfully.
