#Transport Accident Analysis
#####Exploring Insights from National Collision Database (NCDB) Open Data from Transport Canada
The Government of Canada's Open Data Portal provides an open dataset containing 13 years of records of all police-reported motor vehicle collisions on public roads in Canada, from 1999 to 2011. The data are supplied by the provinces and territories to the federal government and combined into a national-level collision database.
This project performs a statistical analysis of the collision dataset using R and aims to answer the following questions:
- Which types of vehicular collisions show a high trend of fatalities and injuries from 1999 to 2011?
- Which passenger seat position in Light Duty Vehicles is the safest/most dangerous?
- Do drivers of different genders have different accident patterns?
- Is there a certain age range of drivers that is more likely to be involved in single-vehicle accidents?
- In single-vehicle accidents, are young male drivers more dangerous than young female drivers?
- Which types of accidents are more frequent under various road surface conditions (e.g. dry, wet, snowy and icy)?
- Which types of accidents are more frequent on various road alignments (e.g. straight, curved, hill and gradient)?
- Which roadway configurations (e.g. intersection, ramp) and weather conditions (e.g. rain, snow) have a high frequency of collisions?
- Which month of the year usually has a high collision rate?
- Which day of the week and time of day have a high collision rate?
The source dataset, NCDB_1999_to_2011.csv, was downloaded in May 2015 from the National Collision Database 1999 to 2011 (NCDB). It has 4.9 million (4,900,590) observations with 22 attributes (309 MB).
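
As a rough illustration of loading the file, the sketch below reads a tiny temporary CSV that stands in for NCDB_1999_to_2011.csv. The three columns and their sample values (including `C_MNTH` and the unknown codes `UU`, `NN` and `UUUU`) are assumptions made for the example, not an excerpt from the real data. Reading every column as character is one way to keep zero-padded codes and letter codes intact; the project's own loading code may differ.

```r
# Hypothetical three-column stand-in for NCDB_1999_to_2011.csv.
path <- tempfile(fileext = ".csv")
writeLines(c("C_MNTH,V_TYPE,V_YEAR",
             "01,01,1990",
             "UU,NN,UUUU"), path)

# colClasses = "character" keeps leading zeros ("01") and
# non-numeric codes ("UU") exactly as they appear in the file.
toy <- read.csv(path, colClasses = "character")

dim(toy)  # number of observations, number of attributes
str(toy)  # every column is stored as character
```

For the real file, the same `read.csv()` call would be pointed at NCDB_1999_to_2011.csv, and `dim()` can then be compared against the observation and attribute counts stated above.
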
Each record has collision-level, vehicle-level and person-level data elements, as tabulated below. The tables also indicate which attributes are used in this analysis.
#####Collision level data elements
#####Vehicle level data elements
No. | ATTRIBUTE | SIZE | DESCRIPTION | Used in Analysis |
---|---|---|---|---|
13 | V_ID | 2 | Vehicle sequence number | |
14 | V_TYPE | 2 | Vehicle type | |
15 | V_YEAR | 4 | Vehicle model year | |
#####Person level data elements
###DATA DICTIONARY: For more detailed information on the possible values and meaning of each attribute, refer to the [data dictionary](Data_Dictionary.md).
#####Approach Details:
- Load the dataset into a data frame named "NCDB".
- Create a subset by selecting only the attributes used in the particular hypothesis.
- For each attribute, convert to NA the values that are unknown or not applicable and, in some cases, other values that are excluded from the analysis.
- For applicable attributes, define the factor levels and descriptive labels as given in the [data dictionary](Data_Dictionary.md).
- Create another subset containing only the complete cases (observations with no NA values), which filters out the observations that are not meant to be included.
- Tabulate the data and print a snapshot of the tables.
- Generate the graphs from the tabulated data.
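
The steps above can be sketched in base R as follows. This is a minimal illustration, not the project's actual source code: the toy data frame stands in for the full dataset, and the attribute names `C_MNTH` and `C_SEV`, the list of codes treated as unknown or not applicable, and the factor labels are all assumptions that should be checked against the data dictionary.

```r
# Toy stand-in for the real data; the actual analysis would load
# NCDB_1999_to_2011.csv into the data frame NCDB instead.
NCDB <- data.frame(
  C_MNTH = c("01", "01", "07", "UU", "12", "07"),
  C_SEV  = c("2",  "1",  "2",  "2",  "U",  "2"),
  V_TYPE = c("01", "01", "14", "01", "NN", "01"),
  stringsAsFactors = FALSE
)

# 1. Keep only the attributes needed for this question.
sub <- NCDB[, c("C_MNTH", "C_SEV")]

# 2. Recode unknown / not-applicable values to NA
#    (this code list is an assumption, see the data dictionary).
na_codes <- c("U", "UU", "N", "NN", "X", "XX", "Q", "QQ")
sub[] <- lapply(sub, function(x) replace(x, x %in% na_codes, NA))

# 3. Define factor levels and descriptive labels (labels are assumed).
sub$C_MNTH <- factor(sub$C_MNTH, levels = sprintf("%02d", 1:12),
                     labels = month.abb)
sub$C_SEV  <- factor(sub$C_SEV, levels = c("1", "2"),
                     labels = c("Fatal", "Injury"))

# 4. Keep only complete cases (rows with no NA values).
complete <- sub[complete.cases(sub), ]

# 5. Tabulate and print a snapshot of the table.
tab <- table(Month = complete$C_MNTH, Severity = complete$C_SEV)
print(tab)

# 6. Graph the tabulated counts: one group of bars per month.
barplot(t(tab), beside = TRUE, legend.text = TRUE,
        xlab = "Month", ylab = "Collision records")
```

Recoding to NA before defining the factors means `complete.cases()` drops every excluded record in one step, so the tabulation and the graph only ever see values that belong in the analysis.
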
######Reproducibility of Results:
Each hypothesis includes its source code, so its results can be reproduced independently for future verification and improvement.
### Results
No. | Research Question | Result | Source Code |
---|---|---|---|
1 | Which types of vehicular collisions show a high trend of fatalities and injuries from 1999 to 2011? | Results | Source Code |
2 | Which passenger seat position in Light Duty Vehicles is the safest/most dangerous? | Results | Source Code |
3 | Do drivers of different genders have different accident patterns? | Results | Source Code |
4 | Is there a certain age range of drivers that is more likely to be involved in single-vehicle accidents? | Results | Source Code |
5 | In single-vehicle accidents, are young male drivers more dangerous than young female drivers? | Results | Source Code |
6 | Which types of accidents are more frequent under various road surface conditions (e.g. dry, wet, snowy and icy)? | Results | Source Code |
7 | Which types of accidents are more frequent on various road alignments (e.g. straight, curved, hill and gradient)? | Results | Source Code |
8 | Which roadway configurations (e.g. intersection, ramp) and weather conditions (e.g. rain, snow) have a high frequency of collisions? | Results | Source Code |
9 | Which month of the year usually has a high collision rate? | Results | Source Code |
10 | Which day of the week and time of day have a high collision rate? | Results | Source Code |
###Conclusion
From 13 years of police-reported motor vehicle collisions, we can uncover useful insights into accident trends and patterns. One positive finding is a declining trend in fatal and injury-related collisions over the years.
Still, the vast majority of accidents happen in good weather and under good road surface and alignment conditions, which suggests that many human-controllable factors can still be improved.
This study may help transport authorities, road and car engineers, insurers and government policy makers to further reduce the vehicular accident rate. Because it explores a public open dataset that is limited in sensitive attributes, it also shows both the potential of such data for uncovering useful information and the importance of collecting attributes that may become valuable over time. For example, geo-location data could prompt investigation of road configuration or alignment in accident-prone zones under different weather conditions. Similarly, recording the car make/model and the fault type of each accident could give an early indication of commonly faulty vehicle parts.