sync_csv.sh points to http://httparchive.org for the downloads folder containing CSV data. Since the launch of the new website, the new host for this data is https://legacy.httparchive.org. The URLs need to be updated in this script to point to the appropriate server.
This caused a pipeline failure during the 2018_04_01 crawl. The script had to be fixed manually and the pipeline restarted before the crawl could complete.
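A minimal sketch of the change, assuming the script builds its download URLs from a single host prefix. The `BASE_URL` variable and the `pages_url` helper are hypothetical names used for illustration, and the real sync_csv.sh may be structured differently. The file name pattern is taken from the URLs in the import log.

```shell
#!/bin/sh
# Hypothetical sketch: BASE_URL and pages_url are illustrative names,
# not necessarily what sync_csv.sh actually uses.
set -eu

# New host for the downloads folder (previously httparchive.org).
BASE_URL="https://legacy.httparchive.org"

# Print the pages CSV URL for an archive name such as "mobile_Apr_1_2018".
pages_url() {
  printf '%s/downloads/httparchive_%s_pages.csv.gz\n' "$BASE_URL" "$1"
}

pages_url "mobile_Apr_1_2018"
pages_url "Apr_1_2018"
```

Keeping the host in one variable means a future move of the downloads folder would need a one-line change rather than an edit to every URL in the script.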
root@worker:~/code# tail /var/log/HAimport.log
Processing Apr_1_2018, mobile: 1, archive: mobile_Apr_1_2018
Downloading data for mobile_Apr_1_2018
https://httparchive.org/downloads/httparchive_mobile_Apr_1_2018_pages.csv.gz:
2018-04-13 08:00:02 ERROR 404: NOT FOUND.
Pages data for Apr_1_2018 is missing, exiting
Processing Apr_1_2018, mobile: 0, archive: Apr_1_2018
Downloading data for Apr_1_2018
https://httparchive.org/downloads/httparchive_Apr_1_2018_pages.csv.gz:
2018-04-13 15:00:02 ERROR 404: NOT FOUND.
Pages data for Apr_1_2018 is missing, exiting