ABSTRACT
Vulnerable Road Users (VRUs), such as pedestrians and bicyclists, are at a higher risk of being involved in crashes with motor vehicles, and such crashes are more likely to result in severe injuries or fatalities. Signalized intersections are a major safety concern for VRUs due to their complex dynamics, emphasizing the need to understand how these road users interact with motor vehicles and to deploy evidence-based safety countermeasures. Given the infrequency of VRU-related crashes, identifying conflicts between VRUs and motorized vehicles as surrogate safety indicators offers an alternative approach. Automatically detecting these conflicts using a video-based system is a crucial step in developing smart infrastructure to enhance VRU safety; however, further research is required to improve the reliability and accuracy of such systems. Building upon a study conducted by the Pennsylvania Department of Transportation (PennDOT), which utilized a video-based event monitoring system to assess VRU and motor vehicle interactions at fifteen signalized intersections in Pennsylvania, this research aims to evaluate the reliability of automatically generated surrogates in predicting confirmed conflicts without human supervision, employing data-driven models such as logistic regression and tree-based algorithms. The surrogate data used for this analysis include automatically collectable variables, such as vehicular and VRU speeds, movements, and post-encroachment time, as well as manually collected variables, such as signal states, lighting, and weather conditions. To address data scarcity challenges, synthetic data augmentation techniques are used to balance the dataset and enhance model robustness. The findings highlight the varying importance and impact of specific surrogates in predicting true conflicts, with some surrogates proving more informative than others.
Additionally, the research examines how the significant variables differ between bicycle and pedestrian conflicts. These findings can assist transportation agencies in collecting the right types of data to help prioritize infrastructure investments, such as bike lanes and crosswalks, and evaluate their effectiveness.
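The surrogate-based conflict prediction described above can be illustrated with a minimal logistic-regression sketch. The coefficients and feature values below are hypothetical, chosen only to show how a shorter post-encroachment time (PET) combined with a higher vehicle speed translates into a higher predicted conflict probability; they are not estimates from the PennDOT dataset.

```python
import math

# Hypothetical logistic-regression coefficients (illustrative only,
# NOT fitted to the PennDOT data): intercept, PET (seconds), vehicle speed (mph).
B0, B_PET, B_SPEED = -1.0, -0.8, 0.05

def conflict_probability(pet_s, speed_mph):
    """Predicted probability that a surrogate event is a confirmed conflict."""
    z = B0 + B_PET * pet_s + B_SPEED * speed_mph
    return 1.0 / (1.0 + math.exp(-z))

# A short PET at higher speed scores higher than a long PET at lower speed.
risky = conflict_probability(pet_s=1.0, speed_mph=35)
benign = conflict_probability(pet_s=5.0, speed_mph=20)
```

In a fitted model, the sign and magnitude of each coefficient would indicate how informative that surrogate is, which is the comparison the abstract's findings address.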
Subject(s)
Accidents, Traffic; Bicycling; Pedestrians; Video Recording; Humans; Bicycling/injuries; Accidents, Traffic/prevention & control; Accidents, Traffic/statistics & numerical data; Reproducibility of Results; Walking/injuries; Pennsylvania; Environment Design; Safety; Motor Vehicles
ABSTRACT
The American Association of State Highway and Transportation Officials' Highway Safety Manual (HSM) includes a collection of safety performance functions (SPFs), or statistical models, to estimate the expected crash frequency of roadway segments, intersections, and interchanges. These models are applied in several steps of the safety management process, including screening the road network for opportunities to improve safety and evaluating the performance of safety countermeasure deployments. The SPFs in the HSM are generally estimated using negative binomial regression modeling. In some instances, they are estimated using annual crash frequency and site-specific (e.g., traffic volume) data, while in other instances they are estimated using aggregate crash frequency and site-specific data. This paper explores the differences that result from estimating SPFs using aggregate versus disaggregate data, applying the same methods used to estimate the SPFs in the HSM. A synthetic dataset was first used to conduct these comparisons; these data were generated in a manner consistent with the properties of the negative binomial distribution. Then, an observational dataset from Pennsylvania was used to compare the SPFs from both aggregate and disaggregate data. The results show that SPFs estimated using the panel (disaggregate) data and the aggregated data yield similar model coefficients, although some differences sometimes arise. However, the overdispersion parameter obtained using each dataset can differ significantly. These differences result in systematic biases in calculations of expected crash frequency when Empirical Bayes adjustments are applied, which, as the paper demonstrates, could lead to different outcomes in a network screening exercise. Overall, these results reveal that aggregating crash data might result in biased SPF outputs and lead to inconsistent Empirical Bayes adjustments.
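The Empirical Bayes adjustment at the center of this comparison can be sketched as follows. The SPF functional form and coefficient values below are hypothetical placeholders, not values from the HSM or the paper, but the weighting formula, in which the overdispersion parameter k controls how much weight the SPF prediction receives relative to the observed count, follows the standard HSM approach. The sketch shows why a biased estimate of k shifts the EB-adjusted expected crash frequency even when the SPF coefficients agree.

```python
import math

def spf_prediction(aadt, years, b0=-5.0, b1=0.6):
    """Hypothetical negative-binomial SPF: expected crashes over the study period.
    Coefficients b0, b1 are illustrative placeholders."""
    return years * math.exp(b0) * aadt ** b1

def eb_estimate(predicted, observed, k):
    """Empirical Bayes expected crash frequency.
    k is the overdispersion parameter; a larger k puts less weight on the SPF."""
    w = 1.0 / (1.0 + k * predicted)
    return w * predicted + (1.0 - w) * observed

# Same site, same SPF prediction and observed count, two estimates of k:
pred, obs = 5.0, 8.0
eb_low_k = eb_estimate(pred, obs, k=0.2)    # w = 0.5, EB = 6.5
eb_high_k = eb_estimate(pred, obs, k=1.0)   # w = 1/6, EB = 7.5
```

Because network screening ranks sites by EB-adjusted frequency, the one-crash gap between the two estimates above, produced solely by the differing k, is the mechanism by which aggregation-induced overdispersion bias can reorder a screening list.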