
A new covariate-dependent zero-truncated bivariate Poisson model is proposed in this paper within the generalized linear model framework. A marginal-conditional approach is used to construct the bivariate model. The proposed model, its estimation procedure, and tests for goodness of fit and for under- (or over-) dispersion are presented and applied to road safety data. The two correlated outcome variables considered in this study are the number of cars involved in an accident and the number of casualties for a given number of cars.

Count data analysis plays an important role in applied statistics across many fields. When the observed outcomes are counts and the aim is to estimate covariate effects on them, the covariate-dependent bivariate Poisson (BVP) model is a natural choice. Outcomes observed on the same subject are expected to be correlated. Such data arise in many fields, for example traffic accidents, health sciences, economics, social sciences and environmental studies. A typical example of such dependence is the number of traffic accidents and the number of injuries or fatalities during a specified period. In some situations, however, the outcomes may be truncated, because zero counts are not observed or are missing for one or both outcomes. For example, in a sample drawn from hospital admission records, zero counts for accidents and for length of stay are not available. Another example is data on the number of traffic accidents, the related injuries or fatalities, and the associated risk factors collected from records, where zero counts are naturally unavailable. For instance, the road safety data on the data.gov.uk website provide detailed information about the conditions of personal injury road accidents in Great Britain, including the types of vehicles involved and the consequential casualties on public roads, along with other background information. Only accidents involving personal injury that are reported to the police using the accident reporting form are recorded. Damage-only accidents with no human casualties, and accidents on private roads or in car parks, are not included, which generates zero-truncated count data. To investigate the effect of risk factors on outcomes of this type, zero-truncated BVP regression is an appropriate model.

Campbell [

Studies on the covariate dependent zero-truncated BVP model are scarce. Different techniques of the parameter estimation of BVP distribution are presented in [

In this section, the bivariate Poisson model without zero truncation is presented. For simplicity, we follow the notation used in […]. Let $Y_1$ be the number of accidents at a specific location in a given interval, which has a Poisson distribution with density

and the corresponding link function is

If the number of casualties, $Y_2$, given $Y_1 = y_1$ accidents occurring in the $j$-th time interval is Poisson with parameter

and the corresponding link function is

Then following [
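The non-truncated marginal-conditional construction can be sketched in Python as follows. This is an illustration under an explicit assumption, not the paper's code: we take $Y_1 \sim \text{Poisson}(\lambda_1)$ and, given $Y_1 = y_1$, $Y_2 \sim \text{Poisson}(y_1\lambda_2)$ (a conditional mean proportional to $y_1$), with log links $\ln\lambda_k = x'\beta_k$. The function names and coefficient values are hypothetical.

```python
import math

def poisson_pmf(y: int, mu: float) -> float:
    """P(Y = y) for Y ~ Poisson(mu); mu = 0 is treated as a point mass at zero."""
    if mu == 0.0:
        return 1.0 if y == 0 else 0.0
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))

def log_link_rates(x, beta1, beta2):
    """Hypothetical log links: ln(lam1) = x'beta1 and ln(lam2) = x'beta2."""
    eta1 = sum(xi * b for xi, b in zip(x, beta1))
    eta2 = sum(xi * b for xi, b in zip(x, beta2))
    return math.exp(eta1), math.exp(eta2)

def bvp_pmf(y1: int, y2: int, lam1: float, lam2: float) -> float:
    """Joint pmf as marginal x conditional, assuming Y2 | Y1 = y1 ~ Poisson(y1 * lam2)."""
    return poisson_pmf(y1, lam1) * poisson_pmf(y2, y1 * lam2)

# Example: intercept plus one covariate, with made-up coefficients.
lam1, lam2 = log_link_rates([1.0, 0.5], [0.2, 0.3], [-0.5, 0.1])
total = sum(bvp_pmf(a, b, lam1, lam2) for a in range(60) for b in range(200))
print(total)  # probabilities over a wide grid sum to (numerically) 1
```

Under this assumption, no accidents implies no casualties, so $P(Y_1 = 0, Y_2 > 0) = 0$.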

Under zero truncation, $Y_1$ is observed only conditional on $Y_1 > 0$. Thus, we have the conditional probability mass function

Now, using Equation (1), the zero-truncated Poisson probability mass function for $Y_1$ is

Then the exponential form of the mass function is

The mean and variance can be shown as
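For reference, the standard zero-truncated Poisson results for $Y_1$ (pmf $e^{-\lambda_1}\lambda_1^{y_1}/[y_1!(1-e^{-\lambda_1})]$ for $y_1 \ge 1$, mean $\lambda_1/(1-e^{-\lambda_1})$) can be checked numerically with the short sketch below. The helper names and the value of $\lambda_1$ are ours; the variance is written as $m - m^2 e^{-\lambda_1}$ with $m$ the truncated mean, which is algebraically equal to $\lambda_1/(1-e^{-\lambda_1}) - \lambda_1^2 e^{-\lambda_1}/(1-e^{-\lambda_1})^2$.

```python
import math

def zt_poisson_pmf(y: int, lam: float) -> float:
    """P(Y = y | Y > 0) for Y ~ Poisson(lam), defined for y = 1, 2, ..."""
    if y < 1:
        return 0.0
    # -expm1(-lam) evaluates 1 - exp(-lam) accurately even for small lam
    return math.exp(y * math.log(lam) - lam - math.lgamma(y + 1)) / -math.expm1(-lam)

def zt_poisson_mean(lam: float) -> float:
    return lam / -math.expm1(-lam)

def zt_poisson_var(lam: float) -> float:
    m = zt_poisson_mean(lam)
    return m - m * m * math.exp(-lam)

lam1 = 1.3  # hypothetical value of lambda_1
ys = range(1, 200)
num_mean = sum(y * zt_poisson_pmf(y, lam1) for y in ys)
num_var = sum(y * y * zt_poisson_pmf(y, lam1) for y in ys) - num_mean ** 2
print(num_mean, zt_poisson_mean(lam1))
print(num_var, zt_poisson_var(lam1))
```

Note that the truncated variance is always smaller than the truncated mean, so a zero-truncated Poisson count is underdispersed relative to an ordinary Poisson even before any extra dispersion is considered.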

Similarly, the zero-truncated conditional distribution of $Y_2$ given $Y_1 = y_1$ is

Then the zero-truncated conditional Poisson distribution is

The exponential form of Equation (9) can be shown as

Then the mean and variance are
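The exponential form can be checked numerically. Under the same illustrative assumption as before, that $Y_2$ given $Y_1 = y_1$ is Poisson with mean $\mu = y_1\lambda_2$, the zero-truncated conditional pmf can be written as $\exp\{y_2\theta - b(\theta) - \ln y_2!\}$ with $\theta = \ln\mu$ and $b(\theta) = \ln(e^{e^\theta} - 1)$. As in any canonical exponential family, $b'(\theta)$ and $b''(\theta)$ should reproduce the truncated mean and variance; the sketch checks this with finite differences. Names and values are hypothetical.

```python
import math

def zt_cond_pmf(y2: int, y1: int, lam2: float) -> float:
    """P(Y2 = y2 | Y1 = y1, Y2 > 0), assuming Y2 | Y1 = y1 ~ Poisson(y1 * lam2)."""
    if y2 < 1:
        return 0.0
    mu = y1 * lam2
    return math.exp(y2 * math.log(mu) - mu - math.lgamma(y2 + 1)) / -math.expm1(-mu)

def b(theta: float) -> float:
    """Cumulant ln(exp(e^theta) - 1), written as mu + ln(1 - e^-mu) for stability."""
    mu = math.exp(theta)
    return mu + math.log(-math.expm1(-mu))

def zt_mean_var(mu: float):
    """Closed-form mean and variance of a zero-truncated Poisson with parent mean mu."""
    m = mu / -math.expm1(-mu)
    return m, m - m * m * math.exp(-mu)

y1, lam2 = 2, 0.6                 # hypothetical conditioning value and rate
theta = math.log(y1 * lam2)
h = 1e-4
d1 = (b(theta + h) - b(theta - h)) / (2 * h)                # approximates b'(theta)
d2 = (b(theta + h) - 2 * b(theta) + b(theta - h)) / h ** 2   # approximates b''(theta)
mean, var = zt_mean_var(y1 * lam2)
print(d1, mean)
print(d2, var)
```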

Now, using the zero-truncated marginal and conditional distributions derived above, the joint distribution of the zero-truncated bivariate Poisson (ZTBVP) model can be obtained as follows

The ZTBVP distribution in Equation (12) can be written in bivariate exponential form as

where the link functions are
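Putting the two zero-truncated pieces together, the joint ZTBVP probability is the product of the truncated marginal of $Y_1$ and the truncated conditional of $Y_2$ given $Y_1$, so the log-likelihood is the sum of the two log terms over records. A sketch, assuming the log links $\ln\lambda_1 = x'\beta_1$ and $\ln(y_1\lambda_2) = \ln y_1 + x'\beta_2$ (our illustrative reading of the conditional mean, not a transcription of the paper's link functions); all names and coefficients are hypothetical.

```python
import math

def log_zt_poisson(y: int, mu: float) -> float:
    """log P(Y = y | Y > 0) for Y ~ Poisson(mu), y >= 1."""
    return y * math.log(mu) - mu - math.lgamma(y + 1) - math.log(-math.expm1(-mu))

def ztbvp_logpmf(y1, y2, x, beta1, beta2):
    """Joint log-pmf: truncated marginal of Y1 plus truncated conditional of Y2 given Y1 = y1."""
    eta1 = sum(xi * b for xi, b in zip(x, beta1))
    eta2 = sum(xi * b for xi, b in zip(x, beta2))
    mu1 = math.exp(eta1)          # assumed: ln(lam1) = x'beta1
    mu2 = y1 * math.exp(eta2)     # assumed: ln(y1 * lam2) = ln(y1) + x'beta2
    return log_zt_poisson(y1, mu1) + log_zt_poisson(y2, mu2)

def ztbvp_loglik(records, beta1, beta2):
    """Log-likelihood over records of the form (y1, y2, x)."""
    return sum(ztbvp_logpmf(y1, y2, x, beta1, beta2) for y1, y2, x in records)

x = [1.0, 1.0]                            # intercept and one binary covariate
beta1, beta2 = [0.3, -0.1], [-0.6, 0.2]   # made-up coefficients
total = sum(math.exp(ztbvp_logpmf(a, c, x, beta1, beta2))
            for a in range(1, 60) for c in range(1, 300))
records = [(1, 1, [1.0, 0.0]), (2, 1, [1.0, 1.0]), (3, 2, [1.0, 1.0])]
print(total, ztbvp_loglik(records, beta1, beta2))
```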

The log-likelihood function is

The estimating equations are

and

Then the score vector is

The second derivatives are:

The observed information matrix is

and the approximate variance-covariance matrix for

where
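Because the joint log-likelihood is a sum of a term in $\beta_1$ (marginal) and a term in $\beta_2$ (conditional), the two parameter vectors can be estimated separately. The sketch below fits each part by Newton-Raphson, using the score $X'(y - m)$ and the information $X'\,\mathrm{diag}(v)\,X$, where $m$ and $v$ are the zero-truncated mean and variance; with a canonical log link the observed and expected information coincide, and the inverse information at convergence gives the approximate variance-covariance matrix. It assumes the same links as the previous sketches (including $\ln y_1$ as an offset for the conditional part), uses NumPy, and checks itself on simulated data; the function names, true coefficients and seed are hypothetical.

```python
import numpy as np

def _zt_moments(mu):
    """Zero-truncated Poisson mean m and variance v for parent means mu."""
    m = mu / -np.expm1(-mu)
    return m, m - m**2 * np.exp(-mu)

def fit_zt_poisson(X, y, offset=None, tol=1e-10, max_iter=100):
    """Newton-Raphson for a zero-truncated Poisson GLM with log link and optional offset."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    off = np.zeros(len(y)) if offset is None else np.asarray(offset, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        m, v = _zt_moments(np.exp(off + X @ beta))
        score = X.T @ (y - m)               # U(beta) = X'(y - m)
        info = X.T @ (X * v[:, None])       # I(beta) = X' diag(v) X
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    _, v = _zt_moments(np.exp(off + X @ beta))
    cov = np.linalg.inv(X.T @ (X * v[:, None]))   # approximate var-cov of beta-hat
    return beta, cov

def fit_ztbvp(X, y1, y2):
    """Fit the marginal (beta1) and conditional (beta2) parts; ln(y1) enters as an offset."""
    return fit_zt_poisson(X, y1), fit_zt_poisson(X, y2, offset=np.log(y1))

def rzt_poisson(rng, mu):
    """Draw zero-truncated Poisson values by redrawing zeros from the same means."""
    y = rng.poisson(mu)
    zero = y == 0
    while zero.any():
        y[zero] = rng.poisson(mu[zero])
        zero = y == 0
    return y

# Self-check on simulated data (hypothetical design and coefficients).
rng = np.random.default_rng(2016)
n = 20000
X = np.column_stack([np.ones(n), rng.integers(0, 2, n)])
true_b1, true_b2 = np.array([0.3, -0.2]), np.array([-0.6, 0.3])
y1 = rzt_poisson(rng, np.exp(X @ true_b1))
y2 = rzt_poisson(rng, y1 * np.exp(X @ true_b2))
(b1, cov1), (b2, cov2) = fit_ztbvp(X, y1, y2)
print(b1, np.sqrt(np.diag(cov1)))
print(b2, np.sqrt(np.diag(cov2)))
```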

We can use likelihood ratio tests for testing

To test for independence, we can compare the zero-truncated bivariate model with the corresponding model under independence. The independence model can be shown as
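For nested models, such as the reduced and full models compared in the application, the likelihood ratio statistic is $2(\ell_{\text{full}} - \ell_{\text{reduced}})$, referred to a chi-square distribution with degrees of freedom equal to the number of restricted parameters. A minimal sketch using SciPy's `chi2.sf`; plugging in the marginal/conditional log-likelihoods reported in the application (−26708.6 reduced, −26453.01 full, 26 restricted coefficients) gives a statistic of about 511.2, consistent with the reported 511.1 up to rounding of the log-likelihoods. The function name is hypothetical.

```python
from scipy.stats import chi2

def lr_test(loglik_reduced: float, loglik_full: float, df: int):
    """Likelihood ratio statistic 2(l_full - l_reduced) and its chi-square p-value."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2.sf(stat, df)

stat, p = lr_test(-26708.6, -26453.01, df=26)
print(f"LR = {stat:.2f}, p = {p:.3g}")
```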

The deviance measures the difference between the log-likelihood evaluated at the observed values (the saturated model) and the log-likelihood evaluated at the fitted values. Let

and

After some algebra, we obtain the deviance as
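The closed-form deviance above is not reproduced here. As a generic illustration of the definition, the sketch below computes the deviance of a zero-truncated Poisson fit as twice the gap between the saturated and fitted log-likelihoods, where the saturated parent mean $\tilde\mu$ solves $\tilde\mu/(1-e^{-\tilde\mu}) = y$; for $y = 1$ the saturated contribution is the limit $0$ as $\tilde\mu \to 0$. Under the marginal-conditional factorization, a bivariate deviance would be the sum of the marginal and conditional parts. It relies on SciPy's `brentq` root finder; the function names are hypothetical.

```python
import math
from scipy.optimize import brentq

def zt_kernel(y: int, mu: float) -> float:
    """ZTP log-likelihood contribution without the ln(y!) term."""
    return y * math.log(mu) - mu - math.log(-math.expm1(-mu))

def saturated_kernel(y: int) -> float:
    """Contribution at the saturated fit, where the truncated mean equals y."""
    if y == 1:
        return 0.0  # limit as mu -> 0, where P(Y = 1 | Y > 0) -> 1
    # truncated mean rises from 1 (mu -> 0) and exceeds y at mu = y, so the root is bracketed
    mu_sat = brentq(lambda mu: mu / -math.expm1(-mu) - y, 1e-12, float(y))
    return zt_kernel(y, mu_sat)

def zt_deviance(ys, fitted_mus) -> float:
    """Deviance 2 * sum[l(saturated) - l(fitted)] for a zero-truncated Poisson fit."""
    return 2.0 * sum(saturated_kernel(y) - zt_kernel(y, mu) for y, mu in zip(ys, fitted_mus))

ys = [1, 2, 4]
print(zt_deviance(ys, [1.2, 1.5, 3.0]))    # positive for an imperfect fit
mu3 = brentq(lambda mu: mu / -math.expm1(-mu) - 3, 1e-12, 3.0)
print(zt_deviance([3], [mu3]))             # essentially 0 at the saturated value
```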

We can use the following goodness-of-fit test proposed by Islam and Chowdhury (2015).

where,

The presence of overdispersion or underdispersion may affect the standard errors of the parameter estimates and, hence, the significance of the estimates. The goodness-of-fit test in Equation (26) is modified to test for overdispersion or underdispersion. The method of moments estimator suggested by [

Using the mean, variance and correction factor as shown in [

and then using these values we can estimate

Then the test for dispersion

where … $Y_1$ and $Y_2$, respectively. $T_2$ is also asymptotically distributed as
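The method-of-moments estimator and the exact form of $T_2$ referred to above are not reproduced here. As a generic stand-in, a common moment estimator of a dispersion factor is Pearson's statistic divided by its residual degrees of freedom, $\hat\phi = \sum_i (y_i - m_i)^2/v_i \,/\,(n-p)$, where $\hat\phi < 1$ points to underdispersion and $\hat\phi > 1$ to overdispersion; model-based standard errors are then commonly rescaled by $\sqrt{\hat\phi}$. This is a swapped-in quasi-likelihood technique, not necessarily the authors' estimator, and the names and toy numbers are hypothetical.

```python
import math

def pearson_dispersion(ys, means, variances, n_params: int) -> float:
    """Moment estimate of a dispersion factor: Pearson X^2 / (n - p)."""
    x2 = sum((y - m) ** 2 / v for y, m, v in zip(ys, means, variances))
    return x2 / (len(ys) - n_params)

def rescale_se(se: float, phi: float) -> float:
    """Quasi-likelihood style adjustment of a model-based standard error."""
    return se * math.sqrt(phi)

# Toy example: observations vary much less than the model variance implies.
ys = [1, 2, 2, 1, 2, 2]
means = [1.6] * 6
variances = [0.96] * 6
phi = pearson_dispersion(ys, means, variances, n_params=1)
print(phi, rescale_se(0.034, phi))
```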

The models proposed in this paper are illustrated using the road safety data published by the Department for Transport, United Kingdom. This data set is publicly available for download from the UK government website (http://data.gov.uk/dataset/road-accidents-safety-data). It includes information about the conditions of personal injury road accidents in Great Britain and the consequential casualties on public roads. Background information on vehicle types, location, road conditions and driver demographics, among others, is also available. The data set contains a total of 1,494,275 accident records spanning 2005 to 2013. We selected a random sample of 14,005 accident records, approximately 1 percent of all accident records. The outcome variables considered are the total number of vehicles involved in the accident ($Y_1$) and the number of casualties ($Y_2$). Because of small frequencies, values of five or more were coded as five for both outcomes. The risk factors are sex of the driver (0 = female; 1 = male), area (0 = urban; 1 = rural), two dummy variables for accident severity (fatal = 1, else 0; serious = 1, else 0; slight severity is the reference category), light condition (daylight = 1; others = 0) and eight dummy variables for the years 2006 to 2013, with 2005 as the reference category.

The average numbers of vehicles involved and of casualties per accident are 1.83 and 1.37, with standard deviations of 0.75 and 0.92, respectively.

Number of Vehicles ($Y_1$) / Number of Casualties ($Y_2$) | 1 | 2 | 3 | 4 | 5+ | Total |
---|---|---|---|---|---|---|
1 | 3721 | 379 | 3 | 39 | 11 | 4225 |
2 | 6091 | 1561 | 75 | 122 | 89 | 8304 |
3 | 681 | 286 | 441 | 44 | 37 | 1182 |
4 | 93 | 64 | 134 | 22 | 13 | 225 |
5+ | 31 | 12 | 33 | 8 | 8 | 69 |
Total | 10617 | 2302 | 693 | 235 | 158 | 14005 |
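The reported means can be recomputed from the margins of the cross-tabulation. The sketch below uses the Total row and column, treats the open 5+ category as exactly 5 (an assumption, so the resulting variances need not match the reported standard deviations), and prints each variance-to-mean ratio; it approximately reproduces the reported means of 1.83 and 1.37, and ratios well below 1, the value for an untruncated Poisson count, are in line with the underdispersion noted in the text.

```python
def mean_var(freqs, values):
    """Mean and (population) variance of a frequency table."""
    n = sum(freqs)
    mean = sum(f * v for f, v in zip(freqs, values)) / n
    var = sum(f * (v - mean) ** 2 for f, v in zip(freqs, values)) / n
    return mean, var

values = [1, 2, 3, 4, 5]                     # category "5+" treated as 5 (assumption)
vehicles = [4225, 8304, 1182, 225, 69]       # row totals (Y1)
casualties = [10617, 2302, 693, 235, 158]    # column totals (Y2)

results = {name: mean_var(freqs, values)
           for name, freqs in [("vehicles", vehicles), ("casualties", casualties)]}
for name, (m, v) in results.items():
    print(f"{name}: mean={m:.3f}, variance={v:.3f}, variance/mean={v / m:.3f}")
```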

We observe that both the number of vehicles involved in accidents and the number of casualties are heavily underdispersed, as displayed in

Variables | N | Number of Vehicles: Mean | Number of Vehicles: SD | Number of Casualties: Mean | Number of Casualties: SD |
---|---|---|---|---|---|
Sex of Driver | | | | | |
Male | 9948 | 1.83 | 0.78 | 1.37 | 0.98 |
Female | 4057 | 1.85 | 0.66 | 1.38 | 0.76 |
Accident Severity | | | | | |
Fatal | 173 | 1.94 | 2.63 | 2.15 | 4.01 |
Serious | 1913 | 1.70 | 0.74 | 1.45 | 0.92 |
Slight | 11919 | 1.85 | 0.68 | 1.35 | 0.79 |
Area | | | | | |
Urban | 5213 | 1.85 | 0.90 | 1.49 | 1.17 |
Rural | 8792 | 1.82 | 0.64 | 1.30 | 0.72 |
Light Condition | | | | | |
Daylight | 10347 | 1.87 | 0.75 | 1.35 | 0.90 |
Others | 3658 | 1.73 | 0.73 | 1.42 | 0.96 |
Years | | | | | |
2005 | 1855 | 1.86 | 0.73 | 1.39 | 0.79 |
2006 | 1768 | 1.86 | 0.72 | 1.37 | 0.81 |
2007 | 1727 | 1.84 | 0.70 | 1.38 | 0.99 |
2008 | 1608 | 1.80 | 0.73 | 1.37 | 0.83 |
2009 | 1567 | 1.83 | 0.71 | 1.39 | 0.82 |
2010 | 1489 | 1.81 | 0.63 | 1.38 | 0.78 |
2011 | 1368 | 1.86 | 1.10 | 1.40 | 1.57 |
2012 | 1357 | 1.82 | 0.68 | 1.32 | 0.73 |
2013 | 1266 | 1.83 | 0.67 | 1.31 | 0.75 |

Variables | Estimate | S.E. | p-value | Adjusted S.E. | Adjusted p-value |
---|---|---|---|---|---|
$Y_1$: Constant | 0.280 | 0.034 | 0.000 | 0.017 | 0.000 |
Sex of Driver | −0.017 | 0.019 | 0.355 | 0.009 | 0.066 |
Area | −0.030 | 0.018 | 0.091 | 0.009 | 0.001 |
Fatal severity | −0.101 | 0.082 | 0.218 | 0.041 | 0.014 |
Serious severity | −0.166 | 0.027 | 0.000 | 0.014 | 0.000 |
Light Condition | 0.140 | 0.021 | 0.000 | 0.010 | 0.000 |
Year 2006 | −0.001 | 0.033 | 0.980 | 0.017 | 0.959 |
Year 2007 | −0.014 | 0.034 | 0.666 | 0.017 | 0.390 |
Year 2008 | −0.060 | 0.035 | 0.083 | 0.017 | 0.001 |
Year 2009 | −0.034 | 0.035 | 0.320 | 0.017 | 0.047 |
Year 2010 | −0.047 | 0.035 | 0.187 | 0.018 | 0.009 |
Year 2011 | −0.021 | 0.036 | 0.565 | 0.018 | 0.252 |
Year 2012 | −0.042 | 0.036 | 0.248 | 0.018 | 0.021 |
Year 2013 | −0.023 | 0.037 | 0.526 | 0.018 | 0.207 |
$Y_2$: Constant | −0.637 | 0.049 | 0.000 | 0.029 | 0.000 |
Sex of Driver | −0.058 | 0.029 | 0.049 | 0.018 | 0.001 |
Area | −0.375 | 0.027 | 0.000 | 0.016 | 0.000 |
Fatal severity | 0.654 | 0.080 | 0.000 | 0.048 | 0.000 |
Serious severity | 0.266 | 0.036 | 0.000 | 0.022 | 0.000 |
Light Condition | −0.231 | 0.029 | 0.000 | 0.018 | 0.000 |
Year 2006 | −0.042 | 0.051 | 0.415 | 0.031 | 0.175 |
Year 2007 | −0.051 | 0.052 | 0.326 | 0.031 | 0.102 |
Year 2008 | −0.034 | 0.053 | 0.519 | 0.032 | 0.283 |
Year 2009 | 0.029 | 0.052 | 0.579 | 0.031 | 0.356 |
Year 2010 | 0.017 | 0.054 | 0.748 | 0.032 | 0.593 |
Year 2011 | −0.030 | 0.055 | 0.590 | 0.033 | 0.370 |
Year 2012 | −0.151 | 0.058 | 0.009 | 0.035 | 0.000 |
Year 2013 | −0.186 | 0.060 | 0.002 | 0.036 | 0.000 |
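The adjusted columns appear consistent with scaling each standard error by the square root of an estimated dispersion factor: taking the full-model values 0.252 (for $Y_1$) and 0.361 (for $Y_2$) from the model statistics table, assumed here to be the dispersion estimates, $0.034\sqrt{0.252} \approx 0.017$ and $0.049\sqrt{0.361} \approx 0.029$ reproduce the adjusted standard errors of the two constants. Below is a minimal sketch of that adjustment with two-sided normal (Wald) p-values; the function name is hypothetical, and small differences from the table can arise because the printed estimates and standard errors are rounded.

```python
import math

def adjusted_wald(estimate: float, se: float, phi: float):
    """Rescale a standard error by sqrt(phi); return (adjusted SE, two-sided p-value)."""
    se_adj = se * math.sqrt(phi)
    z = estimate / se_adj
    p = math.erfc(abs(z) / math.sqrt(2.0))   # equals 2 * (1 - Phi(|z|))
    return se_adj, p

se_c1, _ = adjusted_wald(0.280, 0.034, 0.252)           # Y1 constant
se_c2, _ = adjusted_wald(-0.637, 0.049, 0.361)          # Y2 constant
se_area, p_area = adjusted_wald(-0.030, 0.018, 0.252)   # Y1 area
print(round(se_c1, 3), round(se_c2, 3), round(se_area, 3), round(p_area, 3))
```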

The summary results of estimation and tests for the different models (the proposed model based on the marginal-conditional approach and the model based on both marginals) are presented in the table of model statistics. The goodness-of-fit test, $T_1$, indicates an adequate fit for the proposed model, although only marginally (p-value = 0.064). The test for underdispersion reveals a significant deviation from equidispersion in both variables, as observed from $T_2$ (p-value < 0.001). Adjustments are made for underdispersion, and the results are shown in the table of parameter estimates.

Model Statistics | Reduced Model | Full Model |
---|---|---|
Marginal/Conditional | | |
Log likelihood | −26708.6 | −26453.01 |
AIC | 53421.1 | 52962.02 |
BIC | 53433.7 | 52922.61 |
Deviance | 10593.89 | 10465.07 |
$T_1$ (D.F., p-value) | 17.45 (10, 0.065) | 17.48 (10, 0.064) |
$T_2$ (D.F., p-value) | 68.45 (10, 0.000) | 69.35 (10, 0.000) |
Dispersion estimate, $Y_1$ | 0.255 | 0.252 |
Dispersion estimate, $Y_2$ | 0.377 | 0.361 |
LR (D.F., p-value) | 511.1 (26, 0.000) | |
Marginal/Marginal | | |
Log likelihood | −27235.59 | −26999.44 |
AIC | 54475.20 | 54054.90 |
BIC | 54490.28 | 54266.21 |
Deviance | 11584.13 | 11322.42 |
$T_1$ (D.F., p-value) | 18.48 (10, 0.048) | 19.01 (10, 0.040) |
$T_2$ (D.F., p-value) | 71.21 (10, 0.000) | 73.56 (10, 0.000) |
Dispersion estimate, $Y_1$ | 0.255 | 0.252 |
Dispersion estimate, $Y_2$ | 0.372 | 0.363 |
LR (D.F., p-value) | 1563.7 (26, 0.000) | |
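The information criteria follow the usual definitions $\mathrm{AIC} = -2\ell + 2k$ and $\mathrm{BIC} = -2\ell + k\ln n$. As a quick check, with $n = 14005$ and assuming $k = 28$ parameters for the full model (a constant plus 13 covariate terms for each outcome, matching the 26 degrees of freedom of the LR test) and $k = 2$ for the reduced model, the sketch below reproduces the marginal/marginal AIC and BIC entries to within rounding. The function name is hypothetical.

```python
import math

def aic_bic(loglik: float, k: int, n: int):
    """Akaike and Bayesian information criteria from a maximized log-likelihood."""
    return -2.0 * loglik + 2.0 * k, -2.0 * loglik + k * math.log(n)

n = 14005
k_full = 2 * (1 + 13)      # assumed: constant + 13 covariate terms per outcome
aic_full, bic_full = aic_bic(-26999.44, k_full, n)   # marginal/marginal, full model
aic_red, bic_red = aic_bic(-27235.59, 2, n)          # marginal/marginal, reduced model
print(f"full: AIC={aic_full:.2f}, BIC={bic_full:.2f}")
print(f"reduced: AIC={aic_red:.2f}, BIC={bic_red:.2f}")
```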

A zero-truncated bivariate generalized linear model for count data is proposed in this paper. The model is based on the marginal-conditional bivariate model for count data proposed by Islam and Chowdhury (2015). A covariate-dependent bivariate generalized linear model is presented, and canonical link functions are used to estimate the parameters of the Poisson distribution. The usefulness of the proposed model is demonstrated using road safety data published by the Department for Transport, United Kingdom. The proposed ZTBVP model can easily accommodate a varying number of covariates for the two outcomes. The joint distribution factorizes into a marginal and a conditional distribution, which makes the estimation problem easier.

We gratefully acknowledge that this study was supported by HEQEP sub-project 3293, the University Grants Commission of Bangladesh and the World Bank. The data set was obtained from police-reported road accident statistics (STATS19), Department for Transport (http://data.gov.uk/dataset/road-accidents-safety-data).

Rafiqul I. Chowdhury, M. Ataharul Islam (2016) Zero Truncated Bivariate Poisson Model: Marginal-Conditional Modeling Approach with an Application to Traffic Accident Data. Applied Mathematics, 07, 1589-1598. doi: 10.4236/am.2016.714137