Date of Award

Summer 8-19-2022

Level of Access Assigned by Author

Open-Access Thesis

Degree Name

Master of Science (MS)

Department

Civil Engineering

Advisor

Mohammadali Shirazi

Second Committee Member

Per Garder

Third Committee Member

Eric Landis

Abstract

Crash data are often highly dispersed; they may also include a large number of zero observations or have a long tail. The traditional Negative Binomial (NB) model cannot model such data properly. To overcome this issue, the Negative Binomial-Lindley (NB-L) model has been proposed as an alternative to the NB model for analyzing data with these characteristics. Research studies have shown that the NB-L model outperforms the NB model when data include numerous zero observations or have a long tail. In addition, crash data are often collected from sites with different spatial or temporal characteristics. Therefore, it is not unusual to assume that crash data are drawn from multiple subpopulations. Finite mixture models are powerful tools that can account for underlying subpopulations and capture population heterogeneity. This thesis first documented the derivations and characteristics of the finite mixture NB-L (FMNB-L) model for analyzing data generated from heterogeneous subpopulations with many zero observations and a long tail. The application of the model to identifying subpopulations was demonstrated with a simulation study. The FMNB-L model was then used to analyze crashes on Texas four-lane freeways. These data had unique characteristics: they were highly dispersed, with many locations having a very large number of crashes and a significant number of locations having zero crashes. Multiple goodness-of-fit metrics were used to compare the FMNB-L model with the NB, NB-L, and finite mixture NB models. The FMNB-L model identified two subpopulations in the data. The results showed a significantly better fit for the FMNB-L model than for the other models analyzed.
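As an illustration of the kind of data such a simulation study works with, the following is a minimal sketch of drawing counts from a two-component finite mixture of NB-L distributions. It assumes one common hierarchical form of the NB-L (an NB count whose mean is scaled by a Lindley-distributed term, with the Lindley written as a mixture of two gamma distributions); the thesis's exact parameterization is not stated in this abstract. The function names, component weights, and parameter values are hypothetical.

```python
import math
import random


def sample_lindley(theta, rng):
    """Draw from Lindley(theta) using its two-component gamma mixture form."""
    # With probability theta / (1 + theta) the shape is 1, otherwise 2; rate is theta.
    shape = 1.0 if rng.random() < theta / (1.0 + theta) else 2.0
    return rng.gammavariate(shape, 1.0 / theta)  # second argument is the scale


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the moderate means used here."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def sample_nbl(mu, phi, theta, rng):
    """One NB-L count: NB with mean eps * mu, where eps ~ Lindley(theta)."""
    eps = sample_lindley(theta, rng)
    # NB as a gamma-Poisson mixture with mean m = eps * mu and variance m + m**2 / phi.
    lam = rng.gammavariate(phi, eps * mu / phi)
    return sample_poisson(lam, rng)


def sample_fmnbl(weights, components, rng):
    """Pick a latent component, then draw an NB-L count from it."""
    k = rng.choices(range(len(weights)), weights=weights)[0]
    mu, phi, theta = components[k]
    return k, sample_nbl(mu, phi, theta, rng)


rng = random.Random(0)
# Hypothetical two-component setup: (mu, phi, theta) per component.
components = [(0.5, 1.0, 2.0), (6.0, 2.0, 1.0)]
draws = [sample_fmnbl([0.7, 0.3], components, rng) for _ in range(5000)]
zero_share = sum(1 for _, y in draws if y == 0) / len(draws)
print(f"share of zero counts: {zero_share:.2f}")
```

In a simulation study, the latent labels returned alongside the counts would be hidden from the fitted model and used only afterwards to check whether the subpopulations were recovered.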

In addition, differences in various temporal and spatial factors cause model coefficients to vary among different groups of observations. A grouped random parameters model is one strategy to account for such unobserved heterogeneity. This thesis proposed a grouped random parameters negative binomial-Lindley (G-RPNB-L) model, along with its derivations and applications, to account for unobserved heterogeneity in crash data with many zero observations. First, a simulation study was designed to illustrate the proposed model. The simulation study showed that the proposed model can correctly estimate the coefficients. Then, an empirical dataset from Maine was used to show the application of the proposed model. It was found that the impacts of the weather variables “Days with precipitation greater than 1.0 inch” and “Days with temperature less than 32°F” varied across Maine counties. The proposed model was also compared with the NB, NB-L, and grouped random parameters NB (G-RPNB) models using different goodness-of-fit metrics. The proposed G-RPNB-L model showed a superior fit compared to the other models.
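The grouping idea can be sketched as follows: each group (for example, a county) gets its own coefficient drawn from a common distribution, and every site in that group shares it. This is a minimal sketch that assumes a normal distribution for the group coefficients and a log link for the crash mean; neither assumption is stated in this abstract, and the function names, group labels, and values are hypothetical.

```python
import math
import random


def draw_group_coefficients(groups, mean, sd, rng):
    """One coefficient per group, drawn from an assumed Normal(mean, sd)."""
    return {g: rng.gauss(mean, sd) for g in groups}


def site_means(sites, intercept, group_beta):
    """Expected count per site under an assumed log link.

    sites is a list of (group, x) pairs; all sites in a group share one slope.
    """
    return [math.exp(intercept + group_beta[g] * x) for g, x in sites]


rng = random.Random(0)
# Hypothetical groups and covariate values (e.g. days with heavy precipitation).
beta = draw_group_coefficients(["A", "B", "C"], mean=0.05, sd=0.02, rng=rng)
sites = [("A", 3.0), ("A", 10.0), ("B", 3.0), ("C", 7.0)]
for (group, x), m in zip(sites, site_means(sites, -1.0, beta)):
    print(group, x, round(m, 3))
```

These site means would then feed the count distribution (here, the NB-L), so that the effect of a weather variable is allowed to differ from one group to the next while staying constant within a group.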
