CrackDetect (Machine-learning approach for real-time assessment of road pavement service life based on vehicle fleet data)

Repository containing code for the project Machine-learning approach for real-time assessment of road pavement service life based on vehicle fleet data. Complete pipeline including data preprocessing, feature extraction, model training and prediction. An overview of the project can be found in the user manual.

Results

Our results are found in reports/figures/our_model_results/.

Quickstart

Clone this repository git clone https://github.com/rreezN/CrackDetect.git.
(Optional) Create Virtual environment in powershell. Note this project requires python >= 3.10.
Install requirements pip install -r requirements.txt.
Download the data from sciencedata.dk, unzip it and place it in the data folder (see data section).
Call wandb disabled if you have not set-up a suitable wandb project. (This project and entity information has been hard-coded into src\train_hydra_mr.py as the wandb.init() command.)
Run python src/main.py all

This will run all steps of the pipeline, from data preprocessing to model prediction. At the end a plot will appear that shows our (FleetYeeters) results and the results from the newly trained model. It will extract features using a Hydra model from all signals except location signals. The main.py script is setup to recreate our results, and thus all arguments are pre specified.

It is possible to call main.py with individual steps, or beginning at a certain step. To call individual steps, you would replace all with the desired step. Possible steps are:

python src/main.py [all, make_data, extract_features, train_model, predict_model, validate_model]

Additionally, if you wish to start from a specific step, skipping the steps before, you can add the --begin-from argument, i.e. if you wish to start from predict_model you would call:

python src/main.py --begin-from predict_model

If you wish to go through each step manually with your own arguments, call each script directly with its own arguments:

Create dataset with python src/data/make_dataset.py all
Extract features with python src/data/feature_extraction
Train model with python src/train_hydra_mr.py
Predict using trained model with python src/predict_model.py
See results in reports/figures/model_results

Installation

(Back to top)

Clone this repository

git clone https://github.com/rreezN/CrackDetect.git

Install requirements

Note: This project requires python > 3.10 to run

There are two options for installing requirements. If you wish to setup a dedicated python virtual environment for the project, follow the steps in Virtual environment in powershell. If not, then simply run the following command, and all python modules required to run the project will be installed

python -m pip install -r requirements.txt

Virtual environment in powershell

Have python >= 3.10

CD to CrackDetect cd CracDetect
python -m venv fleetenv -- Create environment
Set-ExecutionPolicy -Scope CurrentUser RemoteSigned -- Change execution policy if necessary (to be executed in powershell)
.\fleetenv\Scripts\Activate.ps1 -- Activate venv
python -m pip install -U pip setuptools wheel
python -m pip install -r requirements.txt
...
Profit

To activate venv in powershell:

.\fleetenv\Scripts\Activate.ps1

Usage

(Back to top)

There are several steps in the pipeline of this project. Detailed explanations of each step, and how to use them in code can be found in notebooks in notebooks/.

Downloading the data

(Back to top)

The data is made available at sciencedata.dk.

Once downloaded it should be unzipped and placed in the empty data/ folder. The file structure should be as follows:

data
- raw
  - AutoPi_CAN
    - platoon_CPH1_HH.hdf5
    - platoon_CPH1_VH.hdf5
    - read_hdf5_platoon.m
    - read_hdf5.m
    - readme.txt
    - visualize_hdf5.m
  - gopro
    - car1
      - GH012200
        
        GH012200_HERO8 Black-ACCL.csv
        
        GH012200_HERO8 Black-GPS5.csv
        
        GH012200_HERO8 Black-GYRO.csv
      - ...
    - car3
      - ...
  - ref_data
    - cph1_aran_hh.csv
    - cph1_aran_vh.csv
    - cph1_fric_hh.csv
    - cph1_fric_vh.csv
    - cph1_iri_mpd_rut_hh.csv
    - cph1_iri_mpd_rut_vh.csv
    - cph1_zp_hh.csv
    - cph1_zp_vh.csv

Preprocessing the data

(Notebook) (Back to top)

The data goes through several preprocessing steps before it is ready for use in the feature extractor.

Convert
Validate
Segment
Matching
Resampling
KPIs

To run all preprocessing steps

python src/data/make_dataset.py all

A single step can be run by changing all to the desired step (e.g. matching). You can also run from a step to the end by calling, e.g. from (including) validate:

python src/data/make_dataset.py --begin_from validate

The main data preprocessing script is found in src/data/make_dataset.py. It has the following arguments and default parameters

mode all
--begin-from (False)
--skip-gopro (False)
--speed-threshold 5
--time-threshold 10
--verbose (False)

Feature extraction

(Notebook) (Back to top)

There are two feature extractors implemented in this repository: HYDRA and MultiRocket. They are found in src/models/hydra and src/models/multirocket.

The main feature extraction is found in src/data/feature_extraction.py. It has the following arguments and default parameters

--cols acc.xyz_0 acc.xyz_1 acc.xyz_2
--all_cols (default False)
--all_cols_wo_location (default False)
--feature_extractor both (choices: multirocket, hydra, both)
--mr_num_features 50000
--hydra_k 8
--hydra_g 64
--subset None
--name_identifier (empty string)
--folds 5
--seed 42

To extract features using HYDRA and MultiRocket, call

python src/data/feature_extraction.py

The script will automatically set up the feature extractors based on the amount of cols (1 = univariate, >1 = multivariate). The features will be stored in data/processed/features.hdf5, along with statistics used to standardize during training and prediction. Features and statistics will be saved under feature extractors based on their names as defined in the model scripts.

The structure of the HDF5 features file can be seen below

You can print the structure of your own features.hdf5 file with src/data/check_hdf5.py by calling

python src/data/check_hdf5.py

check_hdf5 has the following arguments and defaults

--file_path data/processed/features.hdf5
--limit 3
--summary (False)

Model training

(Notebook) (Back to top)

A simple model has been implemented in src/models/hydramr.py. The model training script is implemented in src/train_hydra_mr.py. It has the following arguments and default parameters

--epochs 50
--batch_size 32
--lr 1e-3
--feature_extractors HydraMV_8_64
--name_identifier (empty string)
--folds 5
--model_name HydraMRRegressor
--weight_decay 0.0
--hidden_dim 64
--project_name hydra_mr_test (for wandb)
--dropout 0.5
--model_depth 0
--batch_norm (False)

To train the model using Hydra on a multivariate dataset call

python src/train_hydra_mr.py

The trained model will be saved in models/, along with the best model during training (based on validation loss) for each fold. The training curves are saved in reports/figures/model_results.

Trained models will be saved based on the name in their model file scripts.

Predicting

(Notebook) (Back to top)

To predict using the trained model use the script src/predict_model.py. It has the following arguments and default parameters

--model models/HydraMRRegressor/HydraMRRegressor.pt
--data data/processed/features.hdf5
--name_identifier (empty string)
--data_type test
--batch_size 32
--plot_during (False)
--fold -1
--save_predictions (False)

To run the script on the saved HydraMRRegressor model, call

python src/predict_model.py

The model will predict on the specified data set, plot the predictions and save them in reports/figures/model_results.

Points of Interest (POI)

Finally the predictions is translated to the actual locations in the real world; the process can be seen in the top figure of the README. To investigate it one can run the Points of Interest Notebook notebooks/POI.ipynb.

Credits

Supervisors

Asmus Skar
Tommy Sonne Alstrøm

LiRA-CD: An open-source dataset for road condition modelling and research

Asmus Skar, Anders M. Vestergaard, Thea Brüsch, Shahrzad Pour, Ekkart Kindler, Tommy Sonne Alstrøm, Uwe Schlotz, Jakob Elsborg Larsen, Matteo Pettinari

Asmus Skar, Anders M. Vestergaard, Thea Brüsch, Shahrzad Pour, Ekkart Kindler, Tommy Sonne Alstrøm, Uwe Schlotz, Jakob Elsborg Larsen, Matteo Pettinari. 2023. LiRA-CD: An open-source dataset for road condition modelling and research. Technical University of Denmark: Geotechnics & GeologyDepartment of Environmental and Resource Engineering, Cognitive Systems, Department of Applied Mathematics and Computer Science, Software Systems Engineering Sweco Danmark A/S Danish Road Directorate.

HYDRA: Competing convolutional kernels for fast and accurate time series classification

Angus Dempster and Daniel F. Schmidt and Geoffrey I. Webb

Angus Dempster and Daniel F. Schmidt and Geoffrey I. Webb. HYDRA: Competing convolutional kernels for fast and accurate time series classification. 2022.

MultiRocket: Multiple pooling operators and transformations for fast and effective time series classification

Tan, Chang Wei and Dempster, Angus and Bergmeir, Cristoph and Webb, Geoffrey I

Tan, Chang Wei and Dempster, Angus and Bergmeir, Christoph and Webb, Geoffrey I. 2021. MultiRocket: Multiple pooling operators and transformations for fast and effective time series classification.

Created using mlops_template, a cookiecutter template for getting started with Machine Learning Operations (MLOps).

License

(Back to top)

See LICENSE

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Name		Name	Last commit message	Last commit date
Latest commit History 558 Commits
data		data
models		models
notebooks		notebooks
reports		reports
src		src
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
LICENSE		LICENSE
README.md		README.md
User_manual.pdf		User_manual.pdf
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
sweep_config.yaml		sweep_config.yaml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

CrackDetect (Machine-learning approach for real-time assessment of road pavement service life based on vehicle fleet data)

Results

Quickstart

Table of Contents

Installation

Virtual environment in powershell

Usage

Downloading the data

Preprocessing the data

Feature extraction

Model training

Predicting

Points of Interest (POI)

Credits

License

About

Contributors 5

Languages

License

rreezN/CrackDetect

Folders and files

Latest commit

History

Repository files navigation

CrackDetect (Machine-learning approach for real-time assessment of road pavement service life based on vehicle fleet data)

Results

Quickstart

Table of Contents

Installation

Virtual environment in powershell

Usage

Downloading the data

Preprocessing the data

Feature extraction

Model training

Predicting

Points of Interest (POI)

Credits

License

About

Topics

Resources

License

Stars

Watchers

Forks

Contributors 5

Languages