Crawl data from https://www.formula1.com
The project has two folders:
crawling-f1: The crawler tool. It uses the tabletojson library. In a real project I would fork the library, but here I modified its code directly to suit my needs, so the modified module is included in the commit.
nest-f1: API server built with NestJS.
Node.js and MongoDB are required.
I use Docker to run MongoDB. See docker-compose.yml at
\vrillar-f1\crawling-f1\docker-compose.yml
Note that the MongoDB address mongodb://127.0.0.1:27017 is hardcoded.
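If the hardcoded address ever gets in the way, one option is to read it from the environment and fall back to the current value. This is only a sketch: the MONGODB_URI variable name and the resolveMongoUri helper are assumptions for illustration, and the project does not read them today.

```typescript
// The address the project currently hardcodes.
const DEFAULT_MONGO_URI = "mongodb://127.0.0.1:27017";

// Hypothetical helper: returns the MONGODB_URI value from the given
// environment map (an assumed variable name), or the hardcoded default
// when the variable is missing or blank.
function resolveMongoUri(env: Record<string, string | undefined>): string {
  const fromEnv = env["MONGODB_URI"]?.trim();
  return fromEnv ? fromEnv : DEFAULT_MONGO_URI;
}
```

In a Node script it could be called as resolveMongoUri(process.env), so that running without the variable behaves exactly as it does now.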
Install the modules and compile the TypeScript
cd crawling-f1
npm install
npx tsc
Crawling takes two steps. First, create a Job, which is saved to the database so that its status can be checked. Second, run the Job, which performs the crawl and updates its status to completed. There are two tables of data, and they must be crawled in sequence.
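The create-then-run flow can be sketched as below. This is not the project's code: JobStore is an in-memory stand-in for the MongoDB collection, and the type, field, and function names are all hypothetical. The work callback is synchronous to keep the sketch short, whereas a real crawl would be asynchronous.

```typescript
type JobStatus = "pending" | "completed";

interface CrawlJob {
  id: number;
  target: string; // what to crawl, e.g. a year or page (hypothetical field)
  status: JobStatus;
}

// In-memory stand-in for the database collection that holds jobs.
class JobStore {
  private jobs: CrawlJob[] = [];

  // Step 1: create a job and save it with status "pending".
  create(target: string): CrawlJob {
    const job: CrawlJob = { id: this.jobs.length + 1, target, status: "pending" };
    this.jobs.push(job);
    return job;
  }

  // Jobs that have not been completed yet.
  pending(): CrawlJob[] {
    return this.jobs.filter((j) => j.status === "pending");
  }

  markCompleted(id: number): void {
    const job = this.jobs.find((j) => j.id === id);
    if (job) job.status = "completed";
  }
}

// Step 2: run every pending job and mark it completed only after its work
// returns without throwing. Returns the number of jobs completed.
function runPendingJobs(store: JobStore, work: (job: CrawlJob) => void): number {
  let done = 0;
  for (const job of store.pending()) {
    work(job);
    store.markCompleted(job.id);
    done += 1;
  }
  return done;
}
```

Because the status is updated only after the work succeeds, a run that fails partway leaves the remaining jobs pending, and running again picks up where it stopped.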
In folder crawling-f1
Create Job
node .\dist\src\createDataRacesJob.js
Run Job
node .\dist\src\crawRacesInfo.js
When the message all Done appears, the job has finished.
Then run the next job
In folder crawling-f1
Create Job
node .\dist\src\createDataRacesResultJob.js
Run Job
node .\dist\src\crawRacesResult.js
When the message all Done appears, the job has finished.
The database should now contain the raceresults data. This is the data that the API analyzes and retrieves. I have also exported it to a JSON file: vrillar-f1\nest-f1\nest-f1.rayceresults.json
cd nest-f1
npm install
npm start
Open http://localhost:3000/api#/ to use the Swagger interface, which makes it easy to interact with the API.
Overview of race results by year
Overview of racing results by year
Overview of racing teams by year
Detailed results by year and race name.
Race names can be obtained from the API /Raceresults/{year}/races
Detailed results by year and driver name.
Driver names can be obtained from the API /Raceresults/{year}/drives
Detailed results by year and team name.
Team names can be obtained from the API /Raceresults/{year}/teams
Returns a team's rank by year.
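The three name lookups listed above share one URL shape, so a client could build them with a small helper. This is a sketch under assumptions: listEndpoint is a hypothetical function, and it assumes the routes are served from the server root (http://localhost:3000 with no extra prefix), which the Swagger page can confirm.

```typescript
// The three list endpoints named above; "drives" is the path as documented.
type ListKind = "races" | "drives" | "teams";

// Hypothetical helper: builds the URL of a name-list endpoint for a year.
// Trailing slashes on baseUrl are removed so the result has no "//".
function listEndpoint(baseUrl: string, year: number, kind: ListKind): string {
  const root = baseUrl.replace(/\/+$/, "");
  return `${root}/Raceresults/${year}/${kind}`;
}
```

For example, listEndpoint("http://localhost:3000", 2023, "teams") gives the URL to request for the 2023 team names, which can then be passed to the detailed-results endpoints.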