Introduction
Data Pre-processing:
Data Cleaning: filled in missing values, smoothed noisy data, and identified and removed outliers (e.g. rail accidents were removed).
Data Integration: integrated multiple files: road accident details, people involved in each accident, and demographic details.
Data Reduction: only fatality cases were considered.
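The three pre-processing steps above could be sketched roughly as follows. This is a minimal illustration in plain Python, not the pipeline actually used: the field names (`case_id`, `type`, `fatalities`, `age`), the sample records, and the choice of the median to fill missing ages are all assumptions made for the example.

```python
from statistics import median

# Hypothetical records standing in for the accident and person files.
accidents = [
    {"case_id": 1, "type": "road", "fatalities": 1},
    {"case_id": 2, "type": "rail", "fatalities": 2},
    {"case_id": 3, "type": "road", "fatalities": 0},
    {"case_id": 4, "type": "road", "fatalities": 3},
]
persons = [
    {"case_id": 1, "age": 22},
    {"case_id": 3, "age": 40},
    {"case_id": 4, "age": None},  # missing value
    {"case_id": 4, "age": 19},
]

def clean(accidents, persons):
    """Cleaning: drop rail accidents and fill missing ages with the median (assumed rule)."""
    road = [a for a in accidents if a["type"] != "rail"]
    known_ages = [p["age"] for p in persons if p["age"] is not None]
    fill = median(known_ages)
    filled = [{**p, "age": fill if p["age"] is None else p["age"]} for p in persons]
    return road, filled

def integrate(accidents, persons):
    """Integration: join each person to the accident with the same case_id."""
    by_case = {a["case_id"]: a for a in accidents}
    return [{**by_case[p["case_id"]], **p} for p in persons if p["case_id"] in by_case]

def reduce_to_fatal(rows):
    """Reduction: keep only rows from accidents with at least one fatality."""
    return [r for r in rows if r["fatalities"] > 0]

road, people = clean(accidents, persons)
fatal_rows = reduce_to_fatal(integrate(road, people))
```

With the sample data, the rail accident and the non-fatal accident drop out, and the person with a missing age receives the median of the known ages.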
Data Analysis
The 16-25 age group is most often involved in road accidents.
The number of crashes increases on weekends.
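Counts like these can be produced by bucketing each crash by age group and by day type. The sketch below uses a handful of made-up crashes covering a single week; the age bins other than 16-25 and the per-day normalisation are assumptions for the example.

```python
from collections import Counter
from datetime import date

# Hypothetical (crash date, driver age) pairs covering one week.
crashes = [
    (date(2024, 1, 6), 19),   # Saturday
    (date(2024, 1, 6), 23),   # Saturday
    (date(2024, 1, 7), 34),   # Sunday
    (date(2024, 1, 8), 21),   # Monday
    (date(2024, 1, 10), 52),  # Wednesday
]

def age_group(age):
    """Bucket an age; only the 16-25 bin comes from the slides."""
    if age < 16:
        return "under 16"
    if age <= 25:
        return "16-25"
    return "over 25"

by_age = Counter(age_group(age) for _, age in crashes)

# date.weekday() returns 5 for Saturday and 6 for Sunday.
by_day_type = Counter(
    "weekend" if day.weekday() >= 5 else "weekday" for day, _ in crashes
)

# Compare per-day averages: one week has 5 weekdays but only 2 weekend days.
per_day = {
    "weekend": by_day_type["weekend"] / 2,
    "weekday": by_day_type["weekday"] / 5,
}
```

Dividing by the number of days of each type matters: raw totals would understate the weekend effect, because there are fewer weekend days than weekdays.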
Data Analysis
The number of accidents does not increase with the speed of the vehicle.
Dependency Network
Drugs and drinking directly predict fatalities.
Dependency Network
Drugs predict fatalities.
Drinking and drugs predict injury severity.
Cluster Analysis
Six different clusters
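Cluster profile (variable name and some row labels lost in extraction):
States       Population (All)   Cluster 1   Cluster 2
(no label)   61787              26730       14705
Mean         57.85              44.57       67.42
Deviation    22.42              28.57       16.21
Other states listed: Not Applicable, missing
Values not attributable to a column: 1980 4170 3280 1730 370 1670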
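For a numeric attribute, the mean and deviation figures in a cluster profile can be computed per cluster and for the whole population. The sketch below is one way to do that; the sample values are invented, and the use of the population standard deviation (`pstdev`) is an assumption, since the slides do not say which form the mining tool reports.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical (cluster label, attribute value) pairs.
rows = [(1, 40.0), (1, 50.0), (2, 60.0), (2, 70.0), (2, 80.0)]

def profile(rows):
    """Return size, mean and deviation for each cluster and for all rows."""
    groups = defaultdict(list)
    for label, value in rows:
        groups[label].append(value)
    groups["all"] = [value for _, value in rows]
    return {
        label: {
            "size": len(values),
            "mean": mean(values),
            "deviation": pstdev(values),  # population form; an assumption
        }
        for label, values in groups.items()
    }

table = profile(rows)
```

The `"all"` entry plays the role of the Population (All) column, so each cluster can be compared against the overall figures.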
Conclusion
Texas, California and Florida are the three most unsafe places.
Cluster 1 has the highest number of accidents, whereas Cluster 4 has the lowest.
Although drugs and drinking were not a major factor in accidents on weekdays, the number of accidents involving drinking and drugs increased on weekends.
The majority of accidents occur on highways.