Classification of autonomous vehicle crash severity: Solving the problems of imbalanced datasets and small sample size.
Kuo, Pei-Fen; Hsu, Wei-Ting; Lord, Dominique; Putra, I Gede Brawiswa.
Affiliation
  • Kuo PF; Department of Geomatics, National Cheng Kung University, Taiwan. Electronic address: z10608024@email.ncku.edu.tw.
  • Hsu WT; Department of Geomatics, National Cheng Kung University, Taiwan.
  • Lord D; Zachry Department of Civil and Environmental Engineering, Texas A&M University, USA.
  • Putra IGB; Department of Geomatics, National Cheng Kung University, Taiwan.
Accid Anal Prev ; 205: 107666, 2024 Sep.
Article in En | MEDLINE | ID: mdl-38901160
ABSTRACT
Only a few studies have examined how environmental factors and road features relate to autonomous vehicle (AV) crash severity levels, and none have focused on data limitation problems such as small sample sizes, imbalanced datasets, and high-dimensional features. To address these problems, we analyzed an AV crash dataset (2019 to 2021) from the California Department of Motor Vehicles (CA DMV), which included 266 collision reports (51 of them involving injuries). We added external environmental variables by collecting various points of interest (POIs) and roadway features from OpenStreetMap (OSM) and Data San Francisco (SF). Random Over-Sampling Examples (ROSE) and the Synthetic Minority Over-Sampling Technique (SMOTE) were used to balance the dataset and, at the same time, expand it to address the small sample size problem. Mutual information, random forest, and XGBoost were used to address the high-dimensional feature selection problem caused by including many types of POIs as predictor variables. Because existing studies do not follow a consistent procedure, we compared applying feature selection first with applying data balancing first. Our results showed that AV crash severity levels are related to vehicle manufacturer, vehicle damage level, collision type, vehicle movement, the parties involved in the crash, speed limit, and some types of POIs (areas near transportation, entertainment venues, public places, schools, and medical facilities). Both resampling methods and all three data preprocessing methods improved model performance, and the model that used SMOTE with data balancing applied first performed best. The results suggest that over-sampling and feature selection can improve model prediction performance and identify new factors related to AV crash severity levels.
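The over-sampling idea named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, which the abstract does not describe. It shows SMOTE-style interpolation between a minority sample and one of its nearest minority neighbors, written in plain Python. The function names `smote` and `balance`, the two-feature toy data, and the parameter `k=5` are all assumptions made for illustration. Only the class counts (215 non-injury, 51 injury) mirror the abstract.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbor = rng.choice(others[:k])
        # Pick a random point on the segment from base to neighbor.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


def balance(X, y, minority_label, k=5, seed=0):
    """Over-sample the minority class until both classes have equal size."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_new = max((len(X) - len(minority)) - len(minority), 0)
    new_points = smote(minority, n_new, k=k, seed=seed)
    return X + new_points, y + [minority_label] * len(new_points)


# Toy data (made-up feature values): 215 non-injury and 51 injury records.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(215)]
X += [[data_rng.gauss(2, 1), data_rng.gauss(2, 1)] for _ in range(51)]
y = [0] * 215 + [1] * 51

X_bal, y_bal = balance(X, y, minority_label=1)
print(y_bal.count(0), y_bal.count(1))  # prints: 215 215
```

In this sketch the balanced dataset would then be passed to a feature-selection step, which corresponds to the "data balancing first" ordering the abstract reports as performing best. Swapping the two steps gives the alternative ordering.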

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Accidents, Traffic Limits: Humans Country/Region as subject: North America Language: En Journal: Accid Anal Prev / Accid. anal. prev / Accident analysis and prevention Publication year: 2024 Document type: Article

...