
Exploring Insights from National Collision Database (NCDB) Open Data from Transport Canada


rodmel/Transport-Accident-Analysis


# Transport Accident Analysis

##### Exploring Insights from National Collision Database (NCDB) Open Data from Transport Canada

### Introduction

The Government of Canada's Open Data Portal provides an open dataset containing 13 years of records of all police-reported motor vehicle collisions on public roads in Canada, from 1999 to 2011. The provinces and territories supply the data to the federal government, which combines them into a national-level collision database.

This project performs a statistical analysis of the collision dataset in R and aims to answer the following questions:

  1. Which types of vehicle collisions show a high trend of fatalities and injuries from 1999 to 2011?
  2. Which passenger seat position in Light Duty Vehicles is the safest / most dangerous?
  3. Do drivers of different genders have different accident patterns?
  4. Is there an age range of drivers that is more likely to be involved in single-vehicle accidents?
  5. In single-vehicle accidents, are young male drivers more dangerous than female drivers?
  6. Which types of accidents are more frequent under different road surface conditions (e.g. dry, wet, snowy, and icy)?
  7. Which types of accidents are more frequent under different road alignments (e.g. straight, curved, hill, and gradient)?
  8. Which roadway configurations (e.g. intersection, ramp) and weather conditions (e.g. rain, snow) have a high frequency of collisions?
  9. Which months of the year usually have a high collision rate?
  10. Which days of the week and times of day have a high collision rate?

### Dataset

The source dataset, `NCDB_1999_to_2011.csv`, was downloaded in May 2015 from the National Collision Database 1999 to 2011 (NCDB). It has about 4.9 million observations (4,900,590) with 22 attributes, and is 309 MB in size.
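Because the file is large, one quick way to confirm that a fresh download matches the figures above is to stream it once and count the data rows and header columns. The sketch below is in Python rather than the R used by this project, and the helper names are hypothetical; it only produces the two numbers to compare against the 4,900,590 observations and 22 attributes reported here.

```python
import csv


def summarize_rows(lines):
    """Return (number of data rows, number of header columns) for CSV text lines."""
    reader = csv.reader(lines)
    header = next(reader, [])          # first row holds the attribute names
    n_rows = sum(1 for _ in reader)    # count the remaining rows without storing them
    return n_rows, len(header)


def summarize_csv(path):
    """Stream the file from disk so the ~309 MB CSV is never held in memory at once."""
    with open(path, newline="") as f:
        return summarize_rows(f)


# Hypothetical usage; the path assumes the file sits in the working directory:
# print(summarize_csv("NCDB_1999_to_2011.csv"))
```

Counting CSV records rather than raw lines keeps the check correct even if a quoted field ever contains a line break.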

Each record has collision-level, vehicle-level and person-level data elements, as tabulated below. The tables also indicate which attributes are used in this analysis.

##### Collision level data elements

| No. | Attribute | Size | Description | Used in Analysis |
|---|---|---|---|---|
| 1 | C_YEAR | 4 | Year in which the collision occurred | |
| 2 | C_MNTH | 2 | Month in which the collision occurred | |
| 3 | C_WDAY | 1 | Day of the week the collision occurred | |
| 4 | C_HOUR | 2 | Collision hour | |
| 5 | C_SEV | 1 | Collision severity | |
| 6 | C_VEHS | 2 | Number of vehicles involved in collision | |
| 7 | C_CONF | 2 | Collision configuration | |
| 8 | C_RCFG | 2 | Roadway configuration | |
| 9 | C_WTHR | 1 | Weather condition | |
| 10 | C_RSUR | 1 | Road surface | |
| 11 | C_RALN | 1 | Road alignment | |
| 12 | C_TRAF | 2 | Traffic control | |

##### Vehicle level data elements

| No. | Attribute | Size | Description | Used in Analysis |
|---|---|---|---|---|
| 13 | V_ID | 2 | Vehicle sequence number | |
| 14 | V_TYPE | 2 | Vehicle type | |
| 15 | V_YEAR | 4 | Vehicle model year | |

##### Person level data elements

| No. | Attribute | Size | Description | Used in Analysis |
|---|---|---|---|---|
| 16 | P_ID | 2 | Person sequence number | |
| 17 | P_SEX | 1 | Person sex | |
| 18 | P_AGE | 2 | Person age | |
| 19 | P_PSN | 2 | Person position | |
| 20 | P_ISEV | 1 | Injury severity | |
| 21 | P_SAFE | 2 | Safety device used | |
| 22 | P_USER | 1 | Road user class | |
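Since every attribute name carries its level as a prefix (C_, V_, P_) together with a fixed SIZE, the three tables translate directly into a small schema that can drive column selection and basic sanity checks. This is a Python sketch with hypothetical helper names, assuming SIZE is the maximum number of characters a coded value occupies in the CSV.

```python
# Attribute -> (data level, SIZE), transcribed from the three tables above.
# ASSUMPTION: SIZE is the maximum number of characters a coded value occupies.
SCHEMA = {
    "C_YEAR": ("collision", 4), "C_MNTH": ("collision", 2),
    "C_WDAY": ("collision", 1), "C_HOUR": ("collision", 2),
    "C_SEV": ("collision", 1), "C_VEHS": ("collision", 2),
    "C_CONF": ("collision", 2), "C_RCFG": ("collision", 2),
    "C_WTHR": ("collision", 1), "C_RSUR": ("collision", 1),
    "C_RALN": ("collision", 1), "C_TRAF": ("collision", 2),
    "V_ID": ("vehicle", 2), "V_TYPE": ("vehicle", 2), "V_YEAR": ("vehicle", 4),
    "P_ID": ("person", 2), "P_SEX": ("person", 1), "P_AGE": ("person", 2),
    "P_PSN": ("person", 2), "P_ISEV": ("person", 1), "P_SAFE": ("person", 2),
    "P_USER": ("person", 1),
}


def columns_for(level):
    """List the attributes at one data level ("collision", "vehicle", "person"), in table order."""
    return [name for name, (lvl, _) in SCHEMA.items() if lvl == level]


def oversized_fields(record):
    """Return the known attributes in `record` whose value is wider than its declared SIZE."""
    return [name for name, value in record.items()
            if name in SCHEMA and len(value) > SCHEMA[name][1]]
```

For example, `columns_for("vehicle")` gives the three vehicle-level attribute names, and `oversized_fields` flags a row whose `C_WDAY` value is longer than one character.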

### Data Dictionary

For detailed information on all possible values and the meaning of each attribute, refer to the data dictionary.


### Approach


##### Approach Details

- Load the dataset into a data frame named "NCDB".
- Create a subset containing only the attributes used for the particular hypothesis.
- For each attribute, convert to NA the values that are unknown or not applicable and, in some cases, values that are to be excluded from the analysis.
- For applicable attributes, define the factor levels and their descriptive labels as given in the [data dictionary](Data_Dictionary.md).
- Create another subset containing only complete cases (observations with no NA values), which filters out the observations that are not meant to be included.
- Tabulate the data and print a snapshot of the tables.
- Generate graphs from the tabulated tables.
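The steps above (subset, convert to NA, label, keep complete cases, tabulate) can be sketched as follows. This is a Python illustration, not the project's R code, and every name in it is hypothetical. It assumes, from our reading of the data dictionary, that codes made only of the letters U, X or N (such as "U", "XX", "NNNN") mark unknown, not-provided or not-applicable values, and that C_SEV uses 1 for a collision with at least one fatality and 2 for a non-fatal injury collision; check both against the data dictionary before relying on them.

```python
import csv
from collections import Counter

# ASSUMPTION: descriptive labels for C_SEV (verify against the data dictionary).
SEVERITY_LABELS = {"1": "Fatal", "2": "Non-fatal injury"}


def is_missing(value):
    """True for empty strings and codes made only of U, X or N (ASSUMED missing-value codes)."""
    return value == "" or (value[0] in "UXN" and set(value) == {value[0]})


def load_subset(lines, columns):
    """Keep only the attributes needed for one hypothesis, mapping missing codes to None (NA)."""
    reader = csv.DictReader(lines)
    return [
        {col: (None if is_missing(row[col]) else row[col]) for col in columns}
        for row in reader
    ]


def apply_labels(rows, column, labels):
    """Replace coded values with descriptive labels; codes without a label become None."""
    return [{**row, column: labels.get(row[column])} for row in rows]


def complete_cases(rows):
    """Keep only records with no missing values (similar to R's complete.cases())."""
    return [row for row in rows if all(v is not None for v in row.values())]


def tabulate(rows, *columns):
    """Count records for each combination of values in `columns`."""
    return Counter(tuple(row[c] for c in columns) for row in rows)


# Usage with a toy, made-up sample (not real NCDB records). For the real file,
# pass open("NCDB_1999_to_2011.csv", newline="") instead of the list.
sample = [
    "C_YEAR,C_SEV,P_SEX,P_AGE\n",
    "1999,1,M,34\n",
    "1999,2,F,UU\n",
    "2000,2,U,51\n",
    "2000,2,M,19\n",
]
rows = load_subset(sample, ["C_YEAR", "C_SEV", "P_SEX"])
rows = apply_labels(rows, "C_SEV", SEVERITY_LABELS)
rows = complete_cases(rows)
table = tabulate(rows, "C_YEAR", "C_SEV")
print(sorted(table.items()))
```

Subsetting before the complete-cases filter matters: in the toy sample, the unknown `P_AGE` in the second row does not drop that record because `P_AGE` is not part of this hypothesis, while the unknown `P_SEX` in the third row does.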

###### Reproducibility of Results

Each hypothesis comes with its own source code and can be reproduced independently for future verification and improvement.


### Results
| No. | Research Question | Result | Source Code |
|---|---|---|---|
| 1 | Which types of vehicle collisions show a high trend of fatalities and injuries from 1999 to 2011? | Results | Source Code |
| 2 | Which passenger seat position in Light Duty Vehicles is the safest / most dangerous? | Results | Source Code |
| 3 | Do drivers of different genders have different accident patterns? | Results | Source Code |
| 4 | Is there an age range of drivers that is more likely to be involved in single-vehicle accidents? | Results | Source Code |
| 5 | In single-vehicle accidents, are young male drivers more dangerous than female drivers? | Results | Source Code |
| 6 | Which types of accidents are more frequent under different road surface conditions (e.g. dry, wet, snowy, and icy)? | Results | Source Code |
| 7 | Which types of accidents are more frequent under different road alignments (e.g. straight, curved, hill, and gradient)? | Results | Source Code |
| 8 | Which roadway configurations (e.g. intersection, ramp) and weather conditions (e.g. rain, snow) have a high frequency of collisions? | Results | Source Code |
| 9 | Which months of the year usually have a high collision rate? | Results | Source Code |
| 10 | Which days of the week and times of day have a high collision rate? | Results | Source Code |

### Conclusion

From more than a decade of police-reported motor vehicle collisions, we can uncover rich and useful insights into accident trends and patterns. One positive finding is a declining trend in fatal and injury-related collisions over the years.

Still, the vast majority of accidents happen in good weather, on ideal road surfaces and road alignments, which suggests that many human-controllable factors could still be improved.

This study may help transport authorities, road and vehicle engineers, insurers, and government policy makers further reduce the vehicle accident rate. Because it explores public open data that is limited in sensitive attributes, it also shows the potential for uncovering useful information and the importance of collecting attributes that may become very valuable over time. For example, for this dataset, geolocation could support investigating the road configuration or alignment of accident-prone zones under different weather conditions. As another example, providing the car make/model and the fault type of each accident might give an early indication of commonly faulty vehicle parts.
