httparchive.latest.summary_requests_desktop/mobile not updated #76

foolip · 2019-07-22T11:21:42Z

The summary_pages_desktop table was updated on July 1, but the summary_requests_mobile table hasn't been updated since May 1.

As a result, the two tables can't be used together by joining on pageid. Instead, I'm having to use JSON_EXTRACT(payload, '$._contentType') AS contentType on the full requests table.
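To illustrate what that workaround reads from each row, here is a minimal Python sketch of the same lookup on a single payload string. It assumes the payload is a JSON object with a top-level _contentType key; the function name and sample payload are hypothetical, and the sketch returns the decoded Python value rather than reproducing BigQuery's exact return format.

```python
import json


def extract_content_type(payload):
    # Hypothetical helper: parse the request payload JSON and read its
    # top-level "_contentType" key, returning None if the key is absent.
    return json.loads(payload).get("_contentType")


# Invented sample payload, for illustration only.
sample = '{"_contentType": "text/html", "_gzip_save": 0}'
print(extract_content_type(sample))  # text/html
```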

Context: I'm updating the HTTP Archive for web compat decision-making doc.


rviscomi · 2019-07-22T14:55:55Z

The most recent runs of the scheduled queries for desktop/mobile summary requests have failed with this error:

Job 226352634162:scheduled_query_5d1b264f-0000-22bc-a112-f4f5e80d17d0 (table summary_requests_mobile) failed with error INVALID_ARGUMENT: Cannot read field '_gzip_save' of type STRING as INT64; JobID: 226352634162:scheduled_query_5d1b264f-0000-22bc-a112-f4f5e80d17d0

_gzip_save is type STRING in 2019_06_01_desktop, so the wildcard query is failing. I'll convert that field to INTEGER and rerun the scheduled queries.
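The failure comes from one field having different types across the tables matched by the wildcard. A minimal Python sketch of how such mismatches could be spotted before running the query is below. The function name is hypothetical, and the schemas are a simplified stand-in: only the STRING type of _gzip_save in 2019_06_01_desktop comes from this thread, while the INTEGER type for 2019_05_01_desktop is assumed from the error message. Fetching real schemas from BigQuery is left out.

```python
from collections import defaultdict


def find_type_mismatches(schemas):
    """Return fields whose declared type differs across tables.

    schemas maps table name -> {field name: type name}.
    The result maps field name -> {type name: sorted list of tables}.
    """
    seen = defaultdict(lambda: defaultdict(list))
    for table, fields in schemas.items():
        for field, field_type in fields.items():
            seen[field][field_type].append(table)
    # Keep only fields declared with more than one type.
    return {
        field: {t: sorted(tables) for t, tables in types.items()}
        for field, types in seen.items()
        if len(types) > 1
    }


# Hypothetical, heavily trimmed schemas (assumed for illustration).
schemas = {
    "2019_05_01_desktop": {"pageid": "INTEGER", "_gzip_save": "INTEGER"},
    "2019_06_01_desktop": {"pageid": "INTEGER", "_gzip_save": "STRING"},
}
print(find_type_mismatches(schemas))
# {'_gzip_save': {'INTEGER': ['2019_05_01_desktop'], 'STRING': ['2019_06_01_desktop']}}
```

Any field that shows up in the result would need to be converted to a single type before a wildcard query over those tables can read it consistently.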

rviscomi · 2019-07-22T15:38:27Z

A few other tables have this type mismatch due to HTTPArchive/httparchive.org#135, so getting the scheduled query running will take a bit more work.

Instead I'll update the Dataflow pipeline to handle the copying of the latest tables. The July crawl is critical so I'll wait until that's done to make the changes.

For now I've manually copied the 2019_06_01 summary_requests tables into the latest dataset, so your queries should be working now.

foolip · 2019-07-29T08:54:11Z

Thanks @rviscomi! Looking forward to the July data :)

rviscomi · 2020-06-03T21:06:19Z

Latest summary_requests should be generated properly thanks to HTTPArchive/httparchive.org#203.

I'll create a new issue to track handling latest table creation from the Dataflow pipeline.

rviscomi added the bug label Jul 22, 2019

rviscomi closed this as completed Jun 3, 2020

rviscomi mentioned this issue Jun 3, 2020

Update the "latest" tables from Dataflow #81 (Closed)
