
On the distribution theory of over-dispersion


An overview of the evolution of probability models for over-dispersion is given looking at their origins, motivation, first main contributions, important milestones and applications. A specific class of models called the Waring and generalized Waring models will be a focal point. Their advantages relative to other classes of models and how they can be adapted to handle multivariate data and temporally evolving data will be highlighted.

1. Introduction

Data analysts often have to deal with data that exhibit a variability that differs from what they expect on the basis of the hypothesized model. The phenomenon is known as overdispersion if the observed variability exceeds the expected variability, or underdispersion if it is lower than expected.

Such differences between observed and nominal variances can be interpreted as brought about by failures of some of the basic assumptions of the model. These can be classified by the mechanism leading to them. As summarized by Xekalaki ([2006]), in traditional experimental contexts, they may be caused by deviations from the hypothesized structure of the population, due to lack of independence between individual item responses, contagion, clustering, and heterogeneity. In observational study contexts, on the other hand, they are the result of the method of ascertainment, which can lead to partial distortion of the observations. In both contexts, the observed value x no longer represents an observation on the original variable X, but constitutes an observation on a random variable Y whose distribution (the observed distribution) is a distorted version of the distribution of X (original distribution).

Such practical situations have been noticed for over a century (e.g. Lexis [1879]; Student [1919]). The Lexis ratio appears to be the first statistic suggested for testing for the presence of over- or under-dispersion relative to a binomial hypothesized model in populations structured in clusters. Also, for count data, Fisher ([1950]) considered using the sample index of dispersion for testing the appropriateness of a Poisson distribution for an observed variable Y.
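As a minimal sketch of the sample index of dispersion mentioned above, the following Python snippet computes the variance-to-mean ratio of a set of counts together with the statistic (n − 1)s²/x̄, which under a Poisson hypothesis is commonly referred to a chi-square distribution with n − 1 degrees of freedom. The function name and the data are hypothetical and serve only as an illustration.

```python
from statistics import mean, variance


def index_of_dispersion(counts):
    """Return (ratio, statistic) = (s^2 / mean, (n - 1) * s^2 / mean)."""
    m = mean(counts)
    s2 = variance(counts)  # unbiased sample variance
    ratio = s2 / m
    return ratio, (len(counts) - 1) * ratio


# Hypothetical accident counts for ten individuals (illustration only).
counts = [0, 1, 0, 2, 7, 0, 1, 0, 5, 0]
ratio, stat = index_of_dispersion(counts)
print(ratio, stat)  # a ratio well above 1 points towards overdispersion
```

A ratio close to 1 is what a Poisson model would lead one to expect; values clearly above or below 1 suggest over- or under-dispersion respectively.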

The paper is structured as follows. Section 2 introduces the reader to the various approaches to modelling overdispersion in the case of traditional experimental contexts. Section 3 highlights approaches in the case of observational study contexts. Section 4 focuses on the case of heterogeneous populations followed by Sections 5 and 6, which look into a particular type of distribution, the generalized Waring distribution, and its relevance in the context of applications under the various scenaria leading to over-dispersion mentioned above. Through the prism of these scenaria, a bivariate version of it is also presented, and its use in applied contexts is discussed in Section 7. A multivariate version of it is also given, and its application potential is outlined in Section 8. Finally, Sections 9 and 10 present a model for temporally evolving data, the multivariate generalized Waring process, and an application illustrating its practical potential.

As the field of accident studies has received much attention, and various theories have been developed for the interpretation of factors underlying an accident situation, most of the models will be presented in accident or actuarial data analysis contexts. Of course the results can be adapted in a great variety of situations with appropriate parameter interpretations so that they can be applied in several other fields ranging from economics, inventory control and insurance through to demometry, biometry, psychometry and web access modeling, as the case is with the application discussed in Section 10.

2. Modelling over- or under-dispersion in traditional experimental contexts

One important implication of using single parameter distributions such as the Poisson distribution to analyse data, often ignored by data analysts, is that the variance is determined by the mean, a relation that breaks down in the presence of overdispersion. If this is ignored in practice, any form of statistical inference may suffer from low efficiency, although for modest amounts of overdispersion this may not be the case (Cox [1983]). So, insight into the mechanisms that induce over- (or under-) dispersion is required when dealing with such data. Such insight can be gained by looking at the above-mentioned potential triggering sources as classified by Xekalaki ([2006]).

2.1 Lack of independence between individual responses

In accident study related contexts, where one is interested in the total number of reported accidents $Y = \sum_{i=1}^{n} Y_i$ among a total number $n$ of accidents that actually occurred, suppose that accidents are reported with equal probabilities $p = P(Y_i = 1) = 1 - P(Y_i = 0)$, but not independently ($\mathrm{Cor}(Y_i, Y_j) = \rho \neq 0$). The mean of $Y$ will still be $E(Y) = np$, but its variance will be

$$V(Y) = V\left(\sum_{i=1}^{n} Y_i\right) = np(1-p) + 2\binom{n}{2}\rho p(1-p) = np(1-p)\{1 + \rho(n-1)\},$$

which exceeds that anticipated under a hypothesized independent trial binomial model if $\rho > 0$ (over-dispersion) and is exceeded by it if $\rho < 0$ (under-dispersion).
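The variance expression above can be checked numerically. The sketch below, with hypothetical function names and illustrative parameter values, compares the closed form np(1 − p){1 + ρ(n − 1)} with the sum of all variances and pairwise covariances of n equicorrelated Bernoulli indicators.

```python
def correlated_binomial_variance(n, p, rho):
    """Closed form: n p (1 - p) {1 + rho (n - 1)}."""
    return n * p * (1 - p) * (1 + rho * (n - 1))


def variance_from_covariances(n, p, rho):
    """Sum of Var(Y_i) over i plus Cov(Y_i, Y_j) over all ordered pairs i != j."""
    v = p * (1 - p)  # variance of a single Bernoulli(p) indicator
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += v if i == j else rho * v
    return total


n, p, rho = 10, 0.3, 0.2  # illustrative values only
closed = correlated_binomial_variance(n, p, rho)
summed = variance_from_covariances(n, p, rho)
binomial = n * p * (1 - p)  # variance under independence
print(closed, summed, binomial)
```

With a positive ρ the first two values agree and exceed the binomial variance; setting ρ = 0 recovers the binomial case.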

2.2 Contagion

Another common reason for a variance differing from what is anticipated is the failure of the assumption that the probability of the occurrence of an event in a very short interval is constant. This framework is the classical contagion model (Greenwood and Yule [1920]; Xekalaki [1983a]).

In data modelling problems faced by actuaries, for example, this model postulates that initially all individuals have the same probability of incurring an accident, but later this probability changes with each accident sustained. It is assumed, specifically, that initially none of the individuals has had an accident (e.g. new drivers or persons who are just beginning a new type of work), but later the probability with which a person with $Y = y$ accidents by time $t$ will have another accident in the time period from $t$ to $t + dt$ is of the form $(k + my)dt$. This leads to the negative binomial as the distribution of $Y$ with p.f.

$$P(Y = y) = \binom{k/m + y - 1}{y} e^{-kt}\left(1 - e^{-mt}\right)^{y}, \quad y = 0, 1, 2, \ldots,$$

with $\mu = E(Y) = k(e^{mt} - 1)/m$ and $V(Y) = ke^{mt}(e^{mt} - 1)/m = \mu e^{mt}$.
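A quick numerical check of the contagion model's moments can be sketched as follows. The code assumes the negative binomial form with index k/m and success probability exp(−mt), which is the form consistent with the mean and variance stated above; the function name and parameter values are illustrative.

```python
from math import exp, lgamma, log


def contagion_pmf(y, k, m, t):
    """P(Y = y): negative binomial with index k/m and success probability exp(-m t)."""
    r = k / m
    log_binom = lgamma(r + y) - lgamma(r) - lgamma(y + 1)
    return exp(log_binom - k * t + y * log(1.0 - exp(-m * t)))


k, m, t = 1.5, 0.5, 1.0  # illustrative values only
probs = [contagion_pmf(y, k, m, t) for y in range(400)]
total = sum(probs)
mean = sum(y * p for y, p in enumerate(probs))
var = sum(y * y * p for y, p in enumerate(probs)) - mean ** 2

mu = k * (exp(m * t) - 1.0) / m  # mean stated in the text
print(total, mean, mu, var, mu * exp(m * t))
```

The variance-to-mean ratio equals exp(mt), which is greater than 1 for any t > 0, so the contagion mechanism always yields overdispersion relative to the Poisson case.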

2.3 Clustering

A frequently overlooked clustered structure of the population may also induce over- or under-dispersion.

In an accident context again, an accident is regarded as a cluster of injuries:

The number $Y$ of injuries incurred by persons involved in $N$ accidents can naturally be thought of as expressed by the sum $Y = Y_1 + Y_2 + \ldots + Y_N$ of the numbers $Y_i$ of injuries resulting from the $i$-th accident, assumed to be i.i.d. independently of the total number of accidents $N$, with mean $\mu$ and variance $\sigma^2$. In this case, $E(Y) = E\left(\sum_{i=1}^{N} Y_i\right) = \mu E(N)$ and $V(Y) = V\left(\sum_{i=1}^{N} Y_i\right) = \sigma^2 E(N) + \mu^2 V(N)$.

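So, when $N$ is a Poisson variable with mean $E(N) = \theta = V(N)$, the last relationship gives $V(Y) = \theta(\sigma^2 + \mu^2)$, so that the variance of $Y$ exceeds or falls short of the Poisson variance $\theta$ (overdispersion or underdispersion) according as $\sigma^2 + \mu^2$ is greater or less than 1.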
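The two moment formulas for a random sum can be sketched in a few lines. The function name and the numerical values below are hypothetical; the first example uses Bernoulli(0.5) summands, for which the random sum of a Poisson number of terms is again Poisson, so its mean and variance coincide.

```python
def random_sum_moments(mu, sigma2, mean_n, var_n):
    """Mean and variance of Y = Y_1 + ... + Y_N, Y_i i.i.d. and independent of N."""
    return mu * mean_n, sigma2 * mean_n + mu ** 2 * var_n


theta = 2.0  # Poisson number of accidents: E(N) = V(N) = theta

# Bernoulli(0.5) injuries per accident: mu = 0.5, sigma2 = 0.25
mean_a, var_a = random_sum_moments(0.5, 0.25, theta, theta)

# Hypothetical heavier injury counts per accident: mu = 2, sigma2 = 1
mean_b, var_b = random_sum_moments(2.0, 1.0, theta, theta)

print(mean_a, var_a)  # 1.0 1.0
print(mean_b, var_b)  # 4.0 10.0
```

In the second example σ² + μ² = 5 exceeds 1 and the variance of Y (10) is well above θ = 2.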

The first such model was introduced by Cresswell and Froggatt ([1963]) in a different accident context whereby each person is liable to spells of weak performance during which all of the person’s accidents occur. So, if the number $N$ of spells in a unit time period is Poisson distributed with mean $\theta$, and within spells a person can have 0 accidents with probability $1 + m\log p$, $0 < m \le -1/\log p$, $0 < p < 1$, and $n$ accidents ($n \ge 1$) with probability $m(1-p)^n/n$ (so that the within-spell probabilities sum to one), the observed distribution of accidents is the negative binomial distribution with probability function $P(Y = y) = \binom{\theta m + y - 1}{y} p^{\theta m}(1-p)^{y}$, $y = 0, 1, 2, \ldots$. This model, known in the literature as the spells model, can also lead to other forms of overdispersed distributions (e.g. Xekalaki [1983a], [1984a]).

2.4 Heterogeneity

Assuming a homogeneous population when in fact the population is heterogeneous, i.e., when its individuals have constant but unequal probabilities of sustaining an event, can also lead to overdispersion. In this case, each member of the population has its own value of the parameter $\theta$ and probability density function $f(\cdot\,; \theta)$.

So, with $\theta$ regarded as the inhomogeneity parameter and varying from individual to individual according to any continuous, discrete, or finite step distribution $G(\cdot)$ of mean $\mu$ and variance $\sigma^2$, one is led to an observed distribution for $Y$ with probability density function $f_Y(y) = E_G(f(y; \theta)) = \int_\Theta f(y; \theta)\,dG(\theta)$, where $\Theta$ is the parameter space. Models of this type are known as mixtures. (For details on their application in the statistical literature see e.g. Karlis and Xekalaki [2003]; McLachlan and Peel [2001]; Titterington [1990]). Under such models, the variance of $Y$ consists of two additive components, one representing the variance part due to the variability of $\theta$ and one due to the inherent variability of $Y$ if $\theta$ did not vary, i.e., $V(Y) = V(E(Y|\theta)) + E(V(Y|\theta))$. This offers an explanation as to why mixture models are often referred to as overdispersion models.

It should be noted that a similar idea forms the basis for analysis-of-variance (ANOVA) models, where the total variability can be split into additive components, the ‘between groups’ and the ‘within groups’ components. In the case of the Poisson (θ) distribution, we have in particular that V(Y) = E(θ) + V(θ). Based on the fact that in this case, the factorial moments of Y coincide with the moments of θ about the origin, Carriere ([1993]) proposed a test of the hypothesis that a Poisson mixture fits a data set.
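The decomposition V(Y) = E(θ) + V(θ) for Poisson mixtures can be illustrated with a small exact computation. The sketch below uses a hypothetical two-point mixing distribution (θ = 1 or θ = 4 with equal weights); the names are chosen for illustration only.

```python
from math import exp, lgamma, log


def poisson_pmf(y, theta):
    return exp(-theta + y * log(theta) - lgamma(y + 1))


# Hypothetical finite step mixing distribution for theta.
thetas, weights = [1.0, 4.0], [0.5, 0.5]


def mixture_pmf(y):
    return sum(w * poisson_pmf(y, th) for w, th in zip(weights, thetas))


probs = [mixture_pmf(y) for y in range(120)]
mean_y = sum(y * p for y, p in enumerate(probs))
var_y = sum(y * y * p for y, p in enumerate(probs)) - mean_y ** 2

mean_theta = sum(w * th for w, th in zip(weights, thetas))
var_theta = sum(w * th ** 2 for w, th in zip(weights, thetas)) - mean_theta ** 2
print(mean_y, var_y, mean_theta + var_theta)
```

The variance of the mixture exceeds its mean by exactly V(θ), which is the overdispersion component attributable to heterogeneity.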

Mixed Poisson distributions were first introduced by Greenwood and Woods ([1919]) in the context of accident studies. Assuming that an individual’s accident experience $Y|\theta$ is Poisson distributed with parameter $\theta$ varying from individual to individual according to a gamma distribution with mean $\mu$ and index parameter $\mu/\gamma$, they obtained a negative binomial distribution for $Y$ with probability function

$$P(Y = y) = \binom{\mu/\gamma + y - 1}{y}\left(\frac{\gamma}{1+\gamma}\right)^{y}(1+\gamma)^{-\mu/\gamma}, \quad y = 0, 1, 2, \ldots,$$

and with mean and variance given respectively by $E(Y) = \mu$ and $V(Y) = \mu(1 + \gamma)$, where $\gamma$ represents the over-dispersion parameter.
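As a rough numerical illustration of this mixture, the sketch below integrates the Poisson probability against a gamma density with mean μ and index μ/γ (i.e. scale γ) by the midpoint rule and compares the result with the negative binomial probability function. The integration limits, the step count and the parameter values are assumptions made for the example.

```python
from math import exp, gamma, lgamma, log


def nb_pmf(y, mu, g):
    """Negative binomial with index mu/g, mean mu and variance mu (1 + g)."""
    r = mu / g
    log_binom = lgamma(r + y) - lgamma(r) - lgamma(y + 1)
    return exp(log_binom + y * log(g / (1.0 + g)) - r * log(1.0 + g))


def gamma_poisson_pmf(y, mu, g, upper=80.0, n=20000):
    """Midpoint-rule approximation of the gamma mixture of Poisson(theta)."""
    r = mu / g
    h = upper / n
    total = 0.0
    for i in range(n):
        th = (i + 0.5) * h
        poisson = exp(-th + y * log(th) - lgamma(y + 1))
        density = th ** (r - 1.0) * exp(-th / g) / (gamma(r) * g ** r)
        total += poisson * density
    return total * h


mu, g = 3.0, 1.5  # illustrative values only
max_diff = max(abs(gamma_poisson_pmf(y, mu, g) - nb_pmf(y, mu, g)) for y in range(6))
print(max_diff)
```

The two sets of probabilities agree up to the discretisation error of the quadrature.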

The mixed Poisson process was popularised in the actuarial literature by Dubourdieu ([1938]), and the gamma mixed case was treated by Thyrion ([1969]).

Numerous other mixtures have since then been proposed in the literature for interpreting overdispersion in data, such as binomial mixtures (e.g. Tripathi et al. [1994]), negative binomial mixtures (e.g., Xekalaki [1983a], [c], [1984a]; Irwin [1975]), normal mixtures (e.g. Andrews and Mallows [1974]) and exponential mixtures (e.g. Jewell [1982]). Discrete Poisson mixtures with finite step distributions for the Poisson parameter θ have also been proposed, the interest being on creating clusters of data by grouping the observations on Y according to some criterion (cluster analysis). The number of clusters can be decided on the basis of a testing procedure for the number of components in the finite mixture (Karlis and Xekalaki [1999]).

2.4.1 Heterogeneity in mixture models treating the parameter θ as the dependent variable in a regression model

Heterogeneity in models with explanatory variables can be modelled by assuming that $Y$ has a parameter $\theta$ varying from individual to individual according to some regression model $\theta = \eta(x; \beta) + \varepsilon$, where $x$ is a vector of explanatory variables, $\beta$ is a vector of regression coefficients, $\eta$ is a function of a known form and $\varepsilon$ has some known distribution. Such models are known in the literature as random effect models and have been extensively studied within the broad family of Generalized Linear Models. As a simple example in the case of a single covariate, say $X$, consider data $Y_i$, $i = 1, 2, \ldots, n$ coming from a Poisson population with mean $\theta$ determined by $\log\theta = \alpha + \beta x + \varepsilon$ for some constants $\alpha$, $\beta$ and with $\varepsilon$ having a distribution with mean 0 and variance, say, $\phi$. In this case, the marginal distribution of $Y$ is no longer the Poisson distribution. It is a mixed Poisson distribution, with some mixing distribution $g(\cdot)$ clearly depending on the distribution of $\varepsilon$. In particular, $Y \mid t \sim \mathrm{Poisson}(t e^{\alpha + \beta x})$ with $t \sim g(t)$, where $t = e^{\varepsilon}$.

Negative Binomial and Poisson Inverse Gaussian regression models have also been proposed as overdispersed alternatives to the Poisson regression model (e.g. Lawless [1987]; Dean et al. [1989]; Xue and Deddens [1992]). In the case of a two finite step distribution, the finite Poisson mixture regression model of Wang et al. ([1996]) results. The similarity of the mixture representation and the random effects one is discussed in Hinde and Demetrio ([1998]).

In meta-analysis contexts, overdispersion (or underdispersion) refers to variance inflation (or deflation) relative to that anticipated by the fixed effects model. Two possible causes of such phenomena are a population structure in clusters or mixing resulting in a compound distribution. Kulinskaya and Olkin ([2014]) proposed approaching the problem of specification of a random effects model in meta-analysis in terms of a multiplicative model for the distribution of the effect size parameters that allows inflation or deflation. The model considered was motivated by overdispersion induced by intra-class correlation in the model assumed for the distribution of the $i$-th effect size estimate. In particular, the variance of the estimator $\hat\theta_i$ of the effect size parameter $\theta_i$ in the $i$-th study is assumed to be of the form $\sigma^2_{\hat\theta_i} = \{1 + \alpha(n_i)\gamma\}\sigma_i^2$, where $\alpha(n_i)$ are some known functions of the sample sizes $n_i$, $\sigma_i^2$ is the within-study variance of the $i$-th study, $i = 1, 2, \ldots, k$, and $\gamma$ is interpreted as an intra-class correlation parameter.

2.4.2 Estimation and testing for overdispersion under mixture models

The structure of mixture models, including random effect models, entails different forms of variance-to-mean relationships. So, viewing the mean and variance of $Y$ as represented by $E(Y) = \mu(\beta)$ and $V(Y) = \sigma^2(\mu(\beta), \lambda)$ respectively for some parameters $\beta$, $\lambda$, a number of estimation approaches have been proposed in the literature based on moment methods (e.g. Breslow [1990]; Lawless [1987]; Moore [1986]) and quasi or pseudo likelihood methods (e.g. Davidian and Carroll [1988]; McCullagh and Nelder [1989]; Nelder and Pregibon [1987]). The above representation for the mean and variance of $Y$ also allows estimation in the case of multiplicative overdispersion as in McCullagh and Nelder ([1989]).

Testing for the presence of overdispersion or underdispersion, on the other hand, can be done by means of asymptotic arguments. Let $f(y; \theta)$ denote the density function of a random variable $Y$ in the initial model. Cox ([1983]) showed that, under regularity conditions, the density of $y$ in the overdispersed model, $f_Y(y)$, admits a representation of the form

$$f_Y(y) = E_\Theta\{f(y; \theta)\} = f(y; \mu_\theta) + \frac{1}{2}\sigma_\theta^2\,\frac{\partial^2 f(y; \mu_\theta)}{\partial\mu_\theta^2} + O(1/n),$$

where $\mu_\theta = E(\theta)$, $\sigma_\theta^2 = V(\theta)$ and $\Theta$ is the parameter space. This in turn implies that $f_Y(y)$ can be put in the form $f(y; \mu_\theta)(1 + \varepsilon h(y, \phi_\theta))$, where

$$h(y, \phi_\theta) = \left\{\frac{\partial\log f(y; \mu_\theta)}{\partial\mu_\theta}\right\}^2 + \frac{\partial^2\log f(y; \mu_\theta)}{\partial\mu_\theta^2}.$$

This representation entails overdispersion if ε > 0, underdispersion if ε < 0 and, of course, none of these complications if ε = 0. Cox ([1983]) suggested a testing procedure for the hypothesis ε = 0, which can be regarded as a general version of standard dispersion tests.

2.5 Zero adjusted models

It is interesting to note that another aspect of the population structure that is often responsible for the phenomenon of over-dispersion or under-dispersion is the presence of an excess or a scant number of zeros. Though the models discussed in Sections 2.3 and 2.4 may capture over-dispersion or under-dispersion rather well, they cannot capture excess or scarcity of zeros. In the literature, this question has been addressed by two types of models known as zero-inflated (or zero-deflated) models, and hurdle models. A unified representation of the models is provided by $f(y; \omega) = \omega I_{\{0\}}(y) + (1 - \omega)f_Y(y)$, where $Y$ is the count variable, $I_{\{0\}}(\cdot)$ is the indicator function and $\omega$ is a constant. Values of $\omega$ in (0, 1) render a hurdle model if $f_Y(0) = 0$ and a zero-inflated model if $f_Y(0) \neq 0$, while negative values of it render a zero-deflated model.

Obviously, ω can be interpreted as the proportion of excess zeros in the case of the first two models, and the above representation explains why they can be regarded as having a dual nature. They are (finite) mixtures, which account for heterogeneity, while at the same time they capture a population structure in two clusters. However, in the case ω < 0 (zero-deflation), the model ceases to admit a mixture interpretation.
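The unified zero-adjusted representation can be sketched for a Poisson count distribution as follows. The function names are hypothetical, and the admissibility check for negative ω (the adjusted probability of zero must stay non-negative) is an assumption added here for the sake of a valid probability function.

```python
from math import exp, lgamma, log


def poisson_pmf(y, theta):
    return exp(-theta + y * log(theta) - lgamma(y + 1))


def zero_adjusted_pmf(y, omega, theta):
    """f(y; omega) = omega * I{0}(y) + (1 - omega) * f_Y(y), f_Y Poisson(theta)."""
    if omega >= 1.0 or omega + (1.0 - omega) * poisson_pmf(0, theta) < 0.0:
        raise ValueError("omega outside the admissible range")
    return omega * (1.0 if y == 0 else 0.0) + (1.0 - omega) * poisson_pmf(y, theta)


def dispersion_ratio(omega, theta, y_max=150):
    probs = [zero_adjusted_pmf(y, omega, theta) for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum(y * y * p for y, p in enumerate(probs)) - mean ** 2
    return var / mean


theta = 2.0  # illustrative value
inflated = dispersion_ratio(0.3, theta)    # zero-inflated
deflated = dispersion_ratio(-0.1, theta)   # zero-deflated
print(inflated, deflated)
```

For a Poisson base distribution the variance-to-mean ratio works out to 1 + ωθ, so zero inflation (ω > 0) produces overdispersion and zero deflation (ω < 0) produces underdispersion.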

Zero-inflated and hurdle models have mostly been used for Poisson, generalized Poisson or negative binomial count distributions in various contexts (e.g. Ridout et al. [2001]; Gupta et al. [2004]; Famoye and Singh [2006]). Gupta et al. ([1996]) proposed a zero-adjusted generalized Poisson distribution and studied the effect of not using an adjusted model for zero-inflation or -deflation when the occurrence of zeroes differs from the anticipated one. Reviews of such models can be found in Ridout et al. ([1998]), Gschlößl and Czado ([2008]) and Ngatchou-Wandji and Paris ([2011]).

3. Over- or under-dispersion in observational study contexts - the effect of the method of ascertainment

Often, in connection with data collection based on observation or on recording values as produced by nature, the original distribution may not be reproduced due to various reasons. These may lead to partial destruction or partial enhancement (augmentation) of observations. The models that have been introduced to deal with such situations are respectively known as damage models, introduced by Rao ([1963]), and generating models, introduced by Panaretos ([1983]). The distortion mechanism is usually assumed to be manifested through the conditional distribution of the resulting random variable $Y$ given the value of the original random variable $X$. Hence, the resulting (observed) distribution is a distorted version of the original distribution that can be represented as a mixture of the distortion mechanism. In particular, in the case of damage,

$$P(Y = r) = \sum_{n=r}^{\infty} P(Y = r \mid X = n)\,P(X = n), \quad r = 0, 1, 2, \ldots,$$

while, in the case of enhancement,

$$P(Y = r) = \sum_{n=1}^{r} P(Y = r \mid X = n)\,P(X = n), \quad r = 1, 2, \ldots.$$
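The damage representation can be illustrated with a binomial distortion mechanism acting on a Poisson original distribution, a combination for which the observed distribution is known to be Poisson with mean θp. The sketch below evaluates the sum over n numerically; the truncation point, function names and parameter values are assumptions of the example.

```python
from math import comb, exp, lgamma, log


def poisson_pmf(n, theta):
    return exp(-theta + n * log(theta) - lgamma(n + 1))


def damaged_pmf(r, theta, p, n_max=200):
    """P(Y = r) = sum over n >= r of Binomial(r; n, p) * Poisson(n; theta)."""
    return sum(
        comb(n, r) * p ** r * (1.0 - p) ** (n - r) * poisson_pmf(n, theta)
        for n in range(r, n_max + 1)
    )


theta, p = 3.0, 0.4  # illustrative values: each accident is reported with probability p
observed = [damaged_pmf(r, theta, p) for r in range(6)]
thinned = [poisson_pmf(r, theta * p) for r in range(6)]
print(observed)
print(thinned)
```

Both lists agree to numerical precision, showing how under-reporting with a constant probability rescales the Poisson mean.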

Various forms of distributions have been considered for the distortion mechanism in the above two cases. In the case of damage, the most popular forms have been the binomial distribution (Rao [1963]) and mixtures on p of the binomial distribution (e.g. Panaretos [1982]; Xekalaki and Panaretos [1983]), whenever damage can be regarded as additive (Y = X − U, U independent of Y), or the uniform distribution in (0, x) (e.g. Dimaki and Xekalaki [1990], [1996]; Xekalaki [1984b]), whenever damage can be regarded as multiplicative (Y = [RX], R independent of X and uniformly distributed in (0, 1)). The latter case has also been considered in the context of continuous distributions by Krishnaji ([1970]). The generating model was introduced and studied by Panaretos ([1983]).

Both the generating model and the damage model offer a perceptive approach in actuarial contexts where one is interested in modelling the distributions of the numbers of accidents, of the damage claims, and of the claimed amounts. These models become relevant due to the fact that people have in general a tendency to under-report their accidents, so that the reported (observed) number Y is less than or equal to the actual number X (Y ≤ X), but tend to over-report damages incurred by them, so that the reported damage Y is greater than or equal to the true damage X (Y ≥ X).

Another type of distortion is induced by the adoption of a sampling scheme that assigns to the units in the original distribution unequal probabilities of inclusion in the sample. As a result, the value $x$ of $X$ is observed with a frequency that noticeably differs from that anticipated under the original density function $f_X(x; \theta)$. It represents an observation on a random variable $Y$ whose probability distribution is the result of adjusting the probabilities of the anticipated distribution through weighting them with the probability with which the value $x$ of $X$ is included in the sample. So, if this probability is proportional to some weight function $w(x; \beta)$, $\beta \in \mathbb{R}$, the recorded value $x$ is a value of $Y$ having density function $f_Y(x; \theta, \beta) = w(x; \beta)f_X(x; \theta)/E(w(X; \beta))$.

Distributions of this type are known as weighted distributions (see, e.g. Cox [1962]; Fisher [1934]; Patil and Ord [1976]; Rao [1985]). For $w(x; \beta) = x$, these are known as size biased distributions. In actuarial data modelling contexts again, the weight function can represent reporting bias. In the context of reporting accidents or placing damage claims, for example, it can have a value that is directly or inversely proportional to the size $x$ of $X$, the actual number of incurred accidents or the actual size of the incurred damage. The functions $w(x; \beta) = x$ and $w(x; \beta) = \beta^x$ ($\beta > 1$ or $\beta < 1$) are plausible choices. So, for example, in the case of a Poisson ($\theta$) distributed $X$, these lead to distributions for $Y$ that are of Poisson type. In particular, the weight function $w(x; \beta) = x$ leads to a shifted Poisson distribution with probability function $P(Y = x) = e^{-\theta}\theta^{x-1}/(x-1)!$, $x = 1, 2, \ldots$, while the choice $w(x; \beta) = \beta^x$ leads to a Poisson distribution $P(Y = x) = e^{-\theta\beta}(\theta\beta)^x/x!$, $x = 0, 1, \ldots$. Under the first assumption for $w(x; \beta)$, the mean of the observed variable $Y$ is $1 + \theta$ while its variance remains $\theta$, so that $Y$ is underdispersed relative to a Poisson distribution with the same mean; under the second assumption the variance is $\theta\beta$, which exceeds that of $X$ for $\beta > 1$ (overdispersion) and falls below it for $\beta < 1$ (underdispersion).
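The two weighted Poisson cases discussed above can be verified numerically. In the sketch below the second weight function is read as β raised to the power x, the form consistent with the Poisson(θβ) result; the helper names, the truncation point and the parameter values are assumptions of the example.

```python
from math import exp, lgamma, log


def poisson_pmf(x, theta):
    return exp(-theta + x * log(theta) - lgamma(x + 1))


def weighted_pmf(weight, theta, x_max=200):
    """Probabilities w(x) f_X(x) / E[w(X)] for x = 0, ..., x_max, f_X Poisson(theta)."""
    raw = [weight(x) * poisson_pmf(x, theta) for x in range(x_max + 1)]
    norm = sum(raw)  # numerical approximation of E[w(X)]
    return [v / norm for v in raw]


theta, beta = 2.5, 1.4  # illustrative values only
size_biased = weighted_pmf(lambda x: x, theta)
power_weighted = weighted_pmf(lambda x: beta ** x, theta)

mean_sb = sum(x * p for x, p in enumerate(size_biased))
var_sb = sum(x * x * p for x, p in enumerate(size_biased)) - mean_sb ** 2
print(mean_sb, var_sb)  # mean close to 1 + theta, variance close to theta
```

The size biased probabilities coincide with those of a Poisson(θ) variable shifted by one, and the power-weighted probabilities coincide with those of a Poisson(θβ) variable.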

4. Looking closer into the case of heterogeneity

Assuming a specific form for the distribution of the population that generated a data set implies that the mean to variance relation is the one given by this distribution, e.g. a mean to variance ratio equal to unity for the Poisson distribution. As has become obvious from the above, however, this relationship rarely holds in real data sets. Flexible families have therefore been sought in the literature by allowing the parameter θ of the original distribution to vary according to a distribution with probability density function, say g(·).

As mentioned before, a density function $f_X(\cdot)$ is a mixture on the parameter $\theta$ of the density function $f(\cdot\,; \theta)$ with some mixing distribution $G_\theta(\cdot)$, which can be continuous, discrete or a finite step distribution, if it can be written in the form $f_X(x) = E_G(f(x; \theta)) = \int_\Theta f(x; \theta)\,dG(\theta)$, where $\Theta$ is the parameter space. An appropriate choice of a mixing distribution allows its parameter to vary and acts as a means of “loosening” the structure of the initial model, thus offering more realistic interpretations of the mechanisms that generated the data.

A large number of Poisson mixtures have been developed. (For an extensive review, see Karlis and Xekalaki [2003], [2005]). The derivation of the negative binomial distribution as a mixture of the Poisson distribution with a gamma distribution as the mixing distribution, originally obtained by Greenwood and Yule ([1920]), constitutes a typical example. Mixtures of the negative binomial distribution have also been widely used in connection with applications in a plethora of fields. These include the Yule distribution (Yule [1924]; Irwin [1941]; Xekalaki [1983c], [1984b]), the Waring distribution (Irwin [1963]) and the generalized Waring distribution (Irwin [1968], [1975]; Xekalaki [1981], [1983a], [1984a]), which contains the Yule distribution and the Waring distribution as special cases.

In what follows, we focus on the generalized Waring distribution and its relevance in accident data modeling contexts.

5. The generalized Waring distribution

This was introduced by Irwin ([1968]) in connection to biological data and later was shown by him to arise as an accident distribution (Irwin [1975]). It is the distribution with probability generating function given by

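$$G(s) = \frac{\rho_{(k)}}{(a+\rho)_{(k)}}\,{}_2F_1(a, k; a+k+\rho; s), \quad a, k, \rho > 0,$$

with ${}_2F_1(a, b; c; z)$ denoting the Gauss hypergeometric function $\sum_{r=0}^{\infty} a_{(r)}b_{(r)}z^r/(c_{(r)}\,r!)$, where $h_{(l)} = \Gamma(h+l)/\Gamma(h)$, $h > 0$, $l \in \mathbb{R}$.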

Irwin’s starting point was Waring’s expansion (hence the distribution’s name) given by $\frac{1}{x - a} = \sum_{r=0}^{\infty} \frac{a_{(r)}}{x_{(r+1)}}$, which he then generalized to $\frac{1}{(x - a)_{(k)}} = \sum_{r=0}^{\infty} \frac{a_{(r)}k_{(r)}}{x_{(k+r)}}\frac{1}{r!}$, $a, k > 0$.

Hence, by multiplying both sides by $\rho_{(k)}$, where $\rho = x - a > 0$, the successive terms of the resulting series could be regarded as defining a probability function, which he termed the generalized Waring distribution with parameters $a$, $k$, $\rho$. In particular, the probability function of this distribution is given by

$$p_r = \frac{\rho_{(k)}}{(a+\rho)_{(k)}}\,\frac{a_{(r)}k_{(r)}}{(a+k+\rho)_{(r)}}\,\frac{1}{r!}, \quad a, k, \rho > 0, \quad r = 0, 1, 2, \ldots,$$

where $h_{(l)} = \Gamma(h+l)/\Gamma(h)$.
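A minimal sketch of how the generalized Waring probabilities might be computed is given below. It starts from p_0 = ρ_(k)/(a + ρ)_(k) and uses the ratio of successive terms implied by the ascending factorials. The function name, the truncation point and the parameter values are assumptions of the example, and the check of the mean against ak/(ρ − 1) (for ρ > 1) relies on a standard property of the distribution that is not stated in the text above.

```python
from math import exp, lgamma


def ugwd_pmf(a, k, rho, r_max):
    """Probabilities p_0, ..., p_{r_max} of the generalized Waring distribution."""
    # p_0 = rho_(k) / (a + rho)_(k), with h_(l) = Gamma(h + l) / Gamma(h)
    p = exp(lgamma(rho + k) - lgamma(rho) + lgamma(a + rho) - lgamma(a + rho + k))
    probs = [p]
    for r in range(r_max):
        # p_{r+1} / p_r = (a + r)(k + r) / {(a + k + rho + r)(r + 1)}
        p *= (a + r) * (k + r) / ((a + k + rho + r) * (r + 1))
        probs.append(p)
    return probs


a, k, rho = 2.0, 3.0, 6.0  # illustrative values only
probs = ugwd_pmf(a, k, rho, 5000)
total = sum(probs)
mean = sum(r * p for r, p in enumerate(probs))
print(total, mean, a * k / (rho - 1.0))
```

Because the distribution has a heavy (power-law type) tail, small values of ρ would require far more terms, or an analytic tail correction, before the truncated sums stabilise.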

Notwithstanding the complexity of its structure, this distribution was shown to offer an insightful tool in the interpretation of accident data, as will be seen below. Among its aspects that can be of practical value is that, as shown by Xekalaki ([1983b]), it is a discrete self-decomposable distribution in Steutel and van Harn’s ([1979]) sense, hence infinitely divisible, implying that its probability generating function can be put in the form $G(s) = \exp\left\{-\lambda\int_s^1 \frac{1 - g(u)}{1 - u}\,du\right\}$, where $\lambda = p_1/p_0$ and $g(\cdot)$ denotes the probability generating function of the distribution with probability function satisfying the recurrence relation

q n =λ n ak + ρ a + k + ρ ak a + k + ρ + n j = 0 n 1 q j n j a + k + ρ + n 1 j / a + n 1 j k + n 1 j

6. The generalized Waring distribution in relation to accident theory

The hypotheses that have formed the basis of investigations into the occurrence of accidents for almost a century are

  (i) Pure chance, giving rise to the Poisson distribution.

  (ii) True contagion, i.e. the hypothesis that initially all individuals have the same probability of incurring an accident but that this probability is modified by each accident sustained.

  (iii) Apparent contagion (heterogeneity), i.e. the hypothesis that individuals have constant but unequal probabilities of having an accident - the resultant distribution being a compound Poisson distribution (“accident proneness” model).

  (iv) The “Spells” Model, i.e. each person is liable to periods of time during which the person’s performance is weak (spells). All of the person’s accidents occur within those spells. The numbers of accidents within different spells are independent and independent of the number of spells.

As already seen, the negative binomial distribution can be given both an accident proneness and a “spells” interpretation in the context of accident theory, in terms of a gamma mixed Poisson distribution and of a Poisson distribution generalized by a logarithmic distribution (Kemp [1967]), respectively.

Therefore, a good fit of the negative binomial is no help at all in distinguishing among the “proneness”, “contagion” and “spells” hypotheses. This is known as the discrimination problem between the compounded, contagion and generalized models for the negative binomial distribution and has been discussed by Arbous and Kerrich ([1951]); Bates and Neyman ([1952]); Gurland ([1959]) and Cane ([1974], [1977]). For an extensive bibliography on the accident hypotheses mentioned, see Kemp ([1970]).

6.1 Irwin’s “Proneness” model

As evident, in all three of the above models, the data are treated as if the individuals under observation were exposed to equal environmental risk, a fact criticized by Irwin ([1968]), who suggested a three-parameter distribution, which he called the “univariate generalized Waring distribution” (UGWD). He derived this distribution in a framework that allows separately for random factors, differences in the exposure of individuals to external risk of accident, and differences in proneness.

In particular, his model assumes a non homogeneous population with respect to personal and environmental attributes affecting the occurrence of accidents.

Let the distribution of the number, $X$, of accidents for individuals of equal proneness $\nu$ and of equal exposure to external risk of accident $\lambda|\nu$ (i.e. $\lambda$ for given $\nu$) have probability generating function

$$G_{X|\lambda}(s) = \exp\{(\lambda|\nu)(s - 1)\}$$

in a unit time interval (0, 1). If the distributions of λ|ν and ν in the population at risk can be described by the probability density functions (pdf)

$$\nu^{-k}\exp(-\lambda/\nu)\,\lambda^{k-1}/\Gamma(k), \quad \nu, k > 0$$

and

$$\Gamma(a+\rho)\,\nu^{a-1}(1+\nu)^{-(a+\rho)}/\{\Gamma(\rho)\Gamma(a)\}, \quad a, \rho > 0$$

respectively, the pgf of the resulting distribution of accidents will be $\rho_{(k)}\,{}_2F_1(a, k; a+k+\rho; s)/(a+\rho)_{(k)}$, i.e. the univariate generalized Waring distribution with parameters $a$, $k$ and $\rho$, which will be denoted by UGWD($a$, $k$; $\rho$). Here, ${}_2F_1(a, b; c; z)$ denotes the Gauss hypergeometric function $\sum_{r=0}^{\infty} a_{(r)}b_{(r)}z^r/(c_{(r)}\,r!)$, where $h_{(l)} = \Gamma(h+l)/\Gamma(h)$, $h > 0$, $l \in \mathbb{R}$. For more information about the UGWD the reader is referred to the work of Irwin ([1963], [1968], [1975]); Xekalaki ([1981]) and the references therein and Xekalaki ([1983a]).
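The three-stage structure of the proneness model can be illustrated by simulation. The sketch below draws ν as a ratio of two independent gamma variables (which has the second density above), then λ given ν from a gamma distribution with index k and scale ν, and averages the conditional Poisson probabilities of 0 and 1 accidents. These Monte Carlo estimates are compared with p_0 = ρ_(k)/(a + ρ)_(k) and p_1 = p_0 ak/(a + k + ρ). The function name, seed, sample size and parameter values are assumptions of the example.

```python
import random
from math import exp


def simulate_p0_p1(a, k, rho, n_draws=100000, seed=12345):
    """Monte Carlo estimates of P(X = 0) and P(X = 1) under the proneness model."""
    rng = random.Random(seed)
    s0 = s1 = 0.0
    for _ in range(n_draws):
        # nu has density Gamma(a+rho) nu^(a-1) (1+nu)^-(a+rho) / {Gamma(rho) Gamma(a)}
        nu = rng.gammavariate(a, 1.0) / rng.gammavariate(rho, 1.0)
        lam = rng.gammavariate(k, nu)  # lambda | nu: gamma with index k and scale nu
        s0 += exp(-lam)                # P(X = 0 | lambda) for a Poisson(lambda) count
        s1 += lam * exp(-lam)          # P(X = 1 | lambda)
    return s0 / n_draws, s1 / n_draws


a, k, rho = 2.0, 3.0, 6.0  # illustrative values only
p0_hat, p1_hat = simulate_p0_p1(a, k, rho)

p0 = (6.0 * 7.0 * 8.0) / (8.0 * 9.0 * 10.0)  # rho_(k) / (a + rho)_(k) for these values
p1 = p0 * a * k / (a + k + rho)
print(p0_hat, p0, p1_hat, p1)
```

Averaging conditional probabilities rather than simulated counts keeps the estimator's variance small and avoids the need for a Poisson sampler.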

6.2 The “Contagion” model

Xekalaki ([1983a]) extended the assumptions of the classical contagion model developed by Greenwood and Yule ([1920]) by considering a population of individuals exposed to varying accident risk.

In particular, assume that at time $t = 0$ none of the individuals has had an accident. This would be true, for example, of a population of new drivers or of individuals just beginning a new type of work. Suppose that during the time period from $t$ to $t + dt$ a person with $x$ accidents by time $t$ can incur another accident with a probability of $\{(k + x)/(1 + \lambda t)\}\lambda\,dt$ (independent of the times of the previous accidents), where $k$ is a positive constant and $\lambda$ refers to the individual’s risk exposure. At $t = 0$, since $x = 0$, the probability of an accident is $k\lambda\,dt$. Hence, what the model basically assumes is that, initially, the probability of having an accident is not the same for each individual, but depends on the external conditions; later, the probability is also affected by the number of preceding accidents. Under these assumptions and if differences in the exposure to accident risk can be thought of as governed by a distribution with probability density function given by $\Gamma(a+\rho)\,\nu^{a-1}(1+\nu)^{-(a+\rho)}/\{\Gamma(\rho)\Gamma(a)\}$, the final distribution of accidents over a unit period of time turns out to be UGWD($a$, $k$; $\rho$).

The above derivation of the generalized Waring distribution closely relates to a modeling approach whereby the distribution of accident occurrences in a time interval $(0, t)$ is regarded as underpinned by a stochastic process and, in particular, by a pure birth process $\{X_t,\ t = 0, 1, 2, \ldots\}$, where the probability of a person incurring an accident in $(t, t + \delta t)$, having had $x$ accidents by time $t$, is $P(X_{t+\delta t} = x + 1 \mid X_t = x) = f_\lambda(x, t)\delta t + o(\delta t)$.

Irwin ([1941]), followed later by Arbous and Kerrich ([1951]), derived the negative binomial distribution on this hypothesis by solving the associated Kolmogorov forward differential equations by a method due to McKendrick ([1925]). Specifically, assuming that during the time period from $t$ to $t + dt$ individuals can have 0 accidents with probability $1 - f_\lambda(x, t)dt$, 1 accident with probability $f_\lambda(x, t)dt$ and more than 1 accident with probability 0, he solved the resulting system of Kolmogorov forward difference-differential equations

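$$\frac{\partial}{\partial t}P_\lambda(0, t) = -f_\lambda(0, t)P_\lambda(0, t), \qquad \frac{\partial}{\partial t}P_\lambda(x, t) = -f_\lambda(x, t)P_\lambda(x, t) + f_\lambda(x-1, t)P_\lambda(x-1, t), \quad x \ge 1,$$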

in terms of a single difference-differential equation involving the probability generating function $G_\lambda(s; t)$ of $X_t$ given by

$$\frac{\partial}{\partial t}G_\lambda(s; t) = (s - 1)\sum_{x=0}^{\infty} s^x f_\lambda(x, t)P_\lambda(x, t),$$

where $G_\lambda(s; t) = \sum_{x=0}^{\infty} P_\lambda(x, t)s^x$. (He obtained this equation by multiplying the $i$-th equation of the system by $s^{i-1}$, $i = 1, 2, \ldots$ and summing the resulting equations).

Assuming further that $f_\lambda(x, t) = \lambda(k + mx)$, k, m > 0, and subject to the initial conditions $G_\lambda(1; t) = G_\lambda(s; 0) = 1$, he obtained for the distribution of accidents

$$G_\lambda(s;t) = \{e^{\lambda m t} - s(e^{\lambda m t} - 1)\}^{-k/m},$$

i.e. the probability generating function of the negative binomial distribution with parameters k/m and $(1 - e^{-\lambda m t})^{-1}$.
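As a numerical illustration of this result, the following Python sketch integrates the forward equations with $f_\lambda(x,t) = \lambda(k + mx)$ by a classical fourth-order Runge-Kutta scheme and compares the outcome with the negative binomial probabilities implied by the generating function above. The function names, the truncation point `x_max`, the number of steps and the parameter values are illustrative assumptions and are not part of Irwin's derivation.

```python
import math


def negative_binomial_pmf(x, lam, k, m, t):
    """P(X_t = x) implied by G(s; t) = {e^(lam*m*t) - s(e^(lam*m*t) - 1)}^(-k/m)."""
    r = k / m                      # shape parameter k/m
    p = math.exp(-lam * m * t)     # e^(-lam*m*t), requires t > 0
    log_pmf = (math.lgamma(r + x) - math.lgamma(r) - math.lgamma(x + 1)
               + r * math.log(p) + x * math.log1p(-p))
    return math.exp(log_pmf)


def solve_forward_equations(lam, k, m, t, x_max=40, steps=2000):
    """Integrate dP(x,t)/dt = -f(x)P(x,t) + f(x-1)P(x-1,t) starting from P(0,0) = 1."""
    rate = [lam * (k + m * x) for x in range(x_max + 1)]

    def derivative(prob):
        out = [-rate[0] * prob[0]]
        for x in range(1, x_max + 1):
            out.append(-rate[x] * prob[x] + rate[x - 1] * prob[x - 1])
        return out

    def shifted(prob, slope, factor):
        return [p + factor * d for p, d in zip(prob, slope)]

    prob = [1.0] + [0.0] * x_max   # nobody has had an accident at t = 0
    h = t / steps
    for _ in range(steps):
        s1 = derivative(prob)
        s2 = derivative(shifted(prob, s1, 0.5 * h))
        s3 = derivative(shifted(prob, s2, 0.5 * h))
        s4 = derivative(shifted(prob, s3, h))
        prob = [p + h * (a + 2 * b + 2 * c + d) / 6
                for p, a, b, c, d in zip(prob, s1, s2, s3, s4)]
    return prob


if __name__ == "__main__":
    lam, k, m, t = 0.5, 2.0, 1.0, 1.0   # illustrative values only
    numeric = solve_forward_equations(lam, k, m, t)
    for x in range(5):
        print(x, numeric[x], negative_binomial_pmf(x, lam, k, m, t))
```

Because the process is a pure birth process, the probability of x accidents does not depend on states above x, so truncating the system at `x_max` does not affect the lower-order probabilities.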

Relaxing Irwin’s implicit assumption that all individuals were exposed to the same accident risk, Xekalaki ([1981]) treated the parameter λ as referring to a variable risk exposure according to an exponential distribution with density $a e^{-a\lambda}$, a > 0, and obtained the generalized Waring distribution as the accident distribution. In particular,

$$\begin{aligned} G_{X_t}(s) &= a\int_0^\infty e^{-a\lambda}\{e^{\lambda m t} - s(e^{\lambda m t} - 1)\}^{-k/m}\,d\lambda \\ &= \frac{a(1-s)^{-k/m}}{mt}\int_0^\infty e^{-\lambda(a+kt)/(mt)}\Big\{1 - \frac{s}{s-1}\,e^{-\lambda}\Big\}^{-k/m}\,d\lambda \\ &= \frac{a(1-s)^{-k/m}}{mt}\,\frac{\Gamma\{(a+kt)/(mt)\}}{\Gamma\{1 + (a+kt)/(mt)\}}\;{}_2F_1\Big(\frac{k}{m}, \frac{a+kt}{mt}; \frac{a+kt}{mt} + 1; \frac{s}{s-1}\Big) \\ &= \frac{a}{a+kt}\;{}_2F_1\Big(\frac{k}{m}, 1; \frac{a}{mt} + \frac{k}{m} + 1; s\Big), \end{aligned}$$

which is the probability generating function of the UGWD(k/m, 1; a/(mt)).

This model was considered by Panaretos ([1989]) for the description of the evolution of surnames. Faddy ([1997]) provided a unifying approach to under- and over-dispersion relative to the Poisson distribution within a scheme of a similar nature, which generalizes the simple Poisson process that underpins the Poisson distribution. He demonstrated that any count distribution can be obtained by a suitable choice of $f_\lambda(x, t)$ and provided an expression for the system of Kolmogorov forward differential equations in terms of a matrix-exponential function.

Finally, Winkelmann ([1995]) looked at under- and over-dispersion using renewal theory by exploring the link between duration dependence and dispersion. He demonstrated that discrepancies between observed and nominal variances are conveyed by a hazard function of the waiting times that is not constant, but instead is a decreasing function of time inducing over-dispersion or an increasing function of time inducing under-dispersion.

6.3 The “Spells” model

Further, Xekalaki ([1983a]) considered a variant of the “spells” model due to Cresswell and Froggatt ([1963]) that rejects the presence of proneness and contagion.

Assume that every individual is liable to spells and that the number of spells in a given time period (0, t) is a Poisson variable with parameter θt, θ > 0. Suppose that no accidents occur outside spells and that the probability of an accident within a spell depends on the risk exposure of the particular individual. In particular, suppose that within a spell a person can have

0 accidents with probability $1 - m\log(1+\lambda)$, or n accidents (n ≥ 1) with probability $m\{\lambda/(1+\lambda)\}^{n}/n$,

0 < m < 1/log(1 + λ), λ > 0, where λ is the external risk parameter for the given individual. Assume further that the numbers of accidents arising out of different spells are independent and independent of the number of spells. Then, if differences in the risk exposure can be described by a beta distribution of the second kind with probability density function $\Gamma(a+\rho)\,\nu^{a-1}(1+\nu)^{-(a+\rho)}/\{\Gamma(\rho)\Gamma(a)\}$, a, ρ > 0, the resulting accident distribution will have probability generating function given by

$$\frac{\rho_{(a)}}{(\rho+\theta m t)_{(a)}}\;{}_2F_1(a, \theta m t; a + \theta m t + \rho; s).$$

Hence, in a unit time period, the number of accidents follows the UGWD(a, θm; ρ).

It is worth noticing that the form of the distribution of λ in the last two models is more general than that considered by the proneness model. It is, however, a reasonable choice as it implies a beta distribution of the first kind (Pearson Type I) for the parameter q = λ/(1 + λ) of the negative binomial distribution of X|λ.
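The negative binomial form of X|λ under the spells model can be illustrated numerically. The sketch below assumes the within-spell probabilities stated above, builds the distribution of the total number of accidents over a Poisson(θt) number of spells with the standard Panjer recursion for compound Poisson sums, and compares it with the negative binomial distribution having generating function $(1 + \lambda - \lambda s)^{-\theta m t}$. The function names and parameter values are hypothetical and serve only as an illustration.

```python
import math


def spell_accident_pmf(n, m, lam):
    """Number of accidents within one spell for a person with risk exposure lam."""
    if n == 0:
        return 1.0 - m * math.log(1.0 + lam)
    return m * (lam / (1.0 + lam)) ** n / n


def compound_poisson_pmf(n_max, theta, t, m, lam):
    """Panjer recursion for the total over a Poisson(theta * t) number of spells."""
    mean_spells = theta * t
    f = [spell_accident_pmf(j, m, lam) for j in range(n_max + 1)]
    g = [math.exp(mean_spells * (f[0] - 1.0))]          # P(no accidents)
    for n in range(1, n_max + 1):
        total = sum(j * f[j] * g[n - j] for j in range(1, n + 1))
        g.append(mean_spells * total / n)
    return g


def negative_binomial_pmf(n, shape, lam):
    """pmf with generating function (1 + lam - lam * s)^(-shape)."""
    q = lam / (1.0 + lam)
    log_pmf = (math.lgamma(shape + n) - math.lgamma(shape) - math.lgamma(n + 1)
               - shape * math.log(1.0 + lam) + n * math.log(q))
    return math.exp(log_pmf)


if __name__ == "__main__":
    theta, t, m, lam = 1.2, 1.0, 0.9, 0.8   # requires 0 < m < 1/log(1 + lam)
    recursion = compound_poisson_pmf(10, theta, t, m, lam)
    for n, value in enumerate(recursion):
        print(n, value, negative_binomial_pmf(n, theta * m * t, lam))
```

Mixing this conditional distribution over a beta distribution of the second kind for λ is the step that leads to the UGWD(a, θm; ρ) for a unit time period.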

6.4 Deciding about the underlying model

It is evident from the above that three completely different sets of hypotheses give rise to exactly the same form of distribution and that, while the UGWD may be a plausible model if accident proneness is accepted as an established fact, a satisfactory fit of it is not to be taken as evidence for the validity of the proneness hypothesis. How can we then discriminate?

Statisticians have always been excited to look for ways of discriminating among different models that give rise to the same distribution. Most attempts seem to have been concentrated on distinguishing between the proneness and contagion models generating the negative binomial distribution. The papers by Bates and Neyman ([1952]) and Bates ([1955]) cover part of the work that has been done on the subject, though they primarily focus on distinguishing between different forms of contagion. Shaw and Sichel’s ([1971]) attempt was on proving or disproving proneness by ranking individual accident performance on a scale based on their average interval between successive accidents. However, the first systematic study on how one can discriminate between the proneness and contagion models of the negative binomial distribution appears to be that by Cane ([1974]).

She showed, however, that one cannot distinguish between the two models, even with knowledge of the time sequence of accidents. In particular, the conditional distribution of the times $t_i$, i = 1, 2, …, n, at which accidents occurred in a time period (0, T) is the same in both cases, namely that of an ordered sample from a uniform distribution over (0, T) with probability density function $n!\,T^{-n}$. In fact, this is the case for any compound Poisson accident distribution whose compounding distribution has finite moments (Cane [1977]), hence also for the UGWD(a, k; ρ).

This implies that the availability of information on the times of the occurrence of accidents is not sufficient to guide one’s choice between the proneness and contagion models.

However, as demonstrated by Xekalaki ([1983a]), there appears to exist a possibility in the framework of the Spells model. Consider, in particular, the problem of finding the joint distribution of the times $t_i$, i = 1, 2, …, n, of accidents incurred by individuals with n accidents in a unit period of time under the spells model. For fixed λ, accidents occur as events in a generalized Poisson process: $X_t = \sum_{i=1}^{N_t} Y_i$, $N_t \sim \text{Poisson}(\theta t)$, where θ > 0, t ≥ 0 and the $Y_i$ are identically and independently distributed with probability density function given by $\Gamma(a+\rho)\,\nu^{a-1}(1+\nu)^{-(a+\rho)}/\{\Gamma(\rho)\Gamma(a)\}$, a, ρ > 0. Consequently, the required probability function can be written as

$$\int_0^\infty (1+\lambda)^{-\theta m(1 - t_n)} \prod_{i=1}^{n} \lambda m\theta\,(1+\lambda)^{-\theta m(t_i - t_{i-1}) - 1}\,dt_i\,dH(\lambda),$$

with H(·) denoting the distribution function of the beta distribution of the second kind defined as above. Hence, the required probability is

$$\frac{(\theta m)^{n}\,\rho_{(a)}\,a_{(n)}}{(\theta m + \rho)_{(a+n)}}\,dt_1 \cdots dt_n.$$

Therefore, conditional on n accidents during a time period from 0 to 1, the joint pdf of $t_i$, i = 1, 2, …, n, is $n!\,(\theta m)^{n}/(\theta m)_{(n)}$.

The obtained form differs from that arising under the proneness and contagion models. This fact is in itself very interesting as far as establishing the presence of spells is concerned, as it implies the following: if an observed accident distribution of the UGWD type has arisen from the spells model, the time intervals $(0, t_i)$, i = 1, 2, …, n, given a total of n accidents, will be jointly distributed with the above density function. Any departure from this distribution is, then, evidence against the spells model. Of course, if on the available evidence one has to reject this form in favor of that obtained by Cane, then one is faced again with the question: “proneness or contagion?” This cannot be answered by studying the distribution of $t_i$.

6.5 What does Irwin’s accident model offer beyond a good fit to the data?

The innovation brought by Irwin’s accident proneness model does not merely lie in the better fit it provides to accident data, but in the possibility of partitioning the total variance ($\sigma^2$) into three additive components due to proneness ($\sigma_\nu^2$), liability ($\sigma_\lambda^2$) and randomness ($\sigma_R^2$), thus,

$$\sigma^2 = \sigma_\lambda^2 + k^2\sigma_\nu^2 + \sigma_R^2,$$

where

$$\sigma_\lambda^2 = ak(a+1)(\rho-1)^{-1}(\rho-2)^{-1}, \quad \sigma_\nu^2 = a(a+\rho-1)(\rho-1)^{-2}(\rho-2)^{-1}, \quad \sigma_R^2 = ak(\rho-1)^{-1}, \quad \sigma^2 = ak(a+\rho-1)(k+\rho-1)(\rho-1)^{-2}(\rho-2)^{-1}.$$
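The additivity of the three components can be checked numerically. The following sketch evaluates the components for given a, k and ρ > 2 and compares their sum both with the closed-form total variance and with the variance computed from a truncated sum over the probability function. It assumes the UGWD(a, k; ρ) probability function in the form $\rho_{(k)}\,a_{(x)}k_{(x)}/\{(a+\rho)_{(k)}(a+k+\rho)_{(x)}\,x!\}$; the function names, the truncation point and the parameter values are illustrative choices.

```python
import math


def log_rising(h, l):
    """log of h_(l) = Gamma(h + l) / Gamma(h)."""
    return math.lgamma(h + l) - math.lgamma(h)


def ugwd_pmf(x, a, k, rho):
    """P(X = x) for the UGWD(a, k; rho) in the assumed form."""
    log_p = (log_rising(rho, k) - log_rising(a + rho, k)
             + log_rising(a, x) + log_rising(k, x)
             - log_rising(a + k + rho, x) - math.lgamma(x + 1))
    return math.exp(log_p)


def variance_components(a, k, rho):
    """Liability, proneness and random components (requires rho > 2)."""
    sigma2_lambda = a * k * (a + 1) / ((rho - 1) * (rho - 2))
    sigma2_nu = a * (a + rho - 1) / ((rho - 1) ** 2 * (rho - 2))
    sigma2_random = a * k / (rho - 1)
    return sigma2_lambda, sigma2_nu, sigma2_random


def total_variance(a, k, rho):
    """Closed-form total variance of the UGWD(a, k; rho)."""
    return a * k * (a + rho - 1) * (k + rho - 1) / ((rho - 1) ** 2 * (rho - 2))


def truncated_variance(a, k, rho, x_max=5000):
    """Variance from the probability function, truncated at x_max."""
    probs = [ugwd_pmf(x, a, k, rho) for x in range(x_max + 1)]
    mean = sum(x * p for x, p in enumerate(probs))
    second = sum(x * x * p for x, p in enumerate(probs))
    return second - mean ** 2


if __name__ == "__main__":
    a, k, rho = 2.0, 3.0, 8.0   # illustrative values only
    s_lambda, s_nu, s_random = variance_components(a, k, rho)
    print(s_lambda + k ** 2 * s_nu + s_random, total_variance(a, k, rho),
          truncated_variance(a, k, rho))
```

Exchanging a and k leaves the total variance unchanged but alters the individual components, which is the identifiability issue discussed next.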

There is still, however, a problem due to the fact that the UGWD(a, k; ρ) is symmetrical in a and k (UGWD(a, k; ρ) ≡ UGWD(k, a; ρ)). Hence, although one may consider that $\sigma_\lambda^2 + k^2\sigma_\nu^2$ represents the variance component due to all non-random factors, the mathematics alone cannot determine whether $\sigma_\lambda^2$ represents the liability component and $k^2\sigma_\nu^2$ the proneness component or vice versa. As a consequence, distinguishable estimates for the non-random variance components $\sigma_\lambda^2$ and $\sigma_\nu^2$ cannot be obtained unless subjective judgement is made. This problem was addressed by Xekalaki ([1984a]) with the introduction of her bivariate form of the generalized Waring distribution.

7. The bivariate generalized Waring distribution

Generalizing further Irwin’s ([1963]) generalization of Waring’s expansion, we have for k, m, a > 0,

$$\frac{1}{(x-a)_{(k+m)}} = \sum_{r=0}^{\infty}\sum_{\ell=0}^{\infty} a_{(r+\ell)}\,\frac{(-1)^{r+\ell}}{r!\,\ell!}\,\Delta^{r}\frac{1}{x_{(k)}}\,\Delta^{\ell}\frac{1}{(x+k+r)_{(m)}} = \sum_{r=0}^{\infty}\sum_{\ell=0}^{\infty} \frac{a_{(r+\ell)}\,k_{(r)}\,m_{(\ell)}}{x_{(k+m+r+\ell)}}\,\frac{1}{r!}\,\frac{1}{\ell!}.$$

If x > a, the above series is convergent. Then, letting ρ = x − a > 0 and multiplying both sides by $\rho_{(k+m)}$ leads to a double series of positive terms converging to unity. The general term of the series therefore can be regarded as defining a bivariate discrete probability distribution with probability function

$$p_{r,\ell} = \frac{\rho_{(k+m)}}{(a+\rho)_{(k+m)}}\,\frac{a_{(r+\ell)}\,k_{(r)}\,m_{(\ell)}}{(a+k+m+\rho)_{(r+\ell)}}\,\frac{1}{r!}\,\frac{1}{\ell!}, \quad a, k, m, \rho > 0, \quad r, \ell = 0, 1, 2, \ldots$$

In the remainder of the paper, we refer to this distribution as the bivariate generalized Waring distribution with parameters a, k, m and ρ and we denote it by BGWD(a; k, m; ρ).
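A short numerical sketch may help to make the definition concrete. It evaluates the BGWD(a; k, m; ρ) probability function on a truncated grid, checks that the probabilities sum to approximately one, and checks that summing over the second argument reproduces a univariate distribution of the form UGWD(a, k; ρ). The univariate probability function used for the comparison, the truncation point and the parameter values are assumptions made for illustration only.

```python
import math


def log_rising(h, l):
    """log of h_(l) = Gamma(h + l) / Gamma(h)."""
    return math.lgamma(h + l) - math.lgamma(h)


def bgwd_pmf(r, l, a, k, m, rho):
    """p_{r,l} of the BGWD(a; k, m; rho)."""
    log_p = (log_rising(rho, k + m) - log_rising(a + rho, k + m)
             + log_rising(a, r + l) + log_rising(k, r) + log_rising(m, l)
             - log_rising(a + k + m + rho, r + l)
             - math.lgamma(r + 1) - math.lgamma(l + 1))
    return math.exp(log_p)


def ugwd_pmf(x, a, k, rho):
    """Assumed univariate form: UGWD(a, k; rho)."""
    log_p = (log_rising(rho, k) - log_rising(a + rho, k)
             + log_rising(a, x) + log_rising(k, x)
             - log_rising(a + k + rho, x) - math.lgamma(x + 1))
    return math.exp(log_p)


def marginal_of_first(r, a, k, m, rho, l_max=200):
    """Sum p_{r,l} over l = 0..l_max (truncated marginal of the first count)."""
    return sum(bgwd_pmf(r, l, a, k, m, rho) for l in range(l_max + 1))


if __name__ == "__main__":
    a, k, m, rho = 2.0, 1.5, 1.0, 6.0   # illustrative values only
    total = sum(bgwd_pmf(r, l, a, k, m, rho)
                for r in range(201) for l in range(201))
    print("total mass on the truncated grid:", total)
    for r in range(4):
        print(r, marginal_of_first(r, a, k, m, rho), ugwd_pmf(r, a, k, rho))
```

Working on the log scale with `math.lgamma` avoids overflow in the ascending factorials for large arguments.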

7.1 The BGWD in relation to accident theory

Assume that individuals of proneness ν and liability $\lambda_i|\nu$ for a period i of observation incur, over two non-overlapping time periods, accidents X, Y according to a double Poisson distribution $G_{X,Y|\lambda_1,\lambda_2,\nu}(s,t) = \exp\{(\lambda_1|\nu)(s-1) + (\lambda_2|\nu)(t-1)\}$, $\lambda_1, \lambda_2 > 0$. Assume further that the liability parameters $\lambda_1|\nu$, $\lambda_2|\nu$ are independently gamma distributed with densities $\{\Gamma(\theta_i)\,\nu^{\theta_i}\}^{-1} e^{-\lambda_i/\nu}\lambda_i^{\theta_i - 1}$, $\theta_1 \equiv k$, $\theta_2 \equiv m$, ν > 0, whence for individuals with the same proneness ν, but varying liabilities, the numbers of occurring accidents over the two periods are jointly distributed as the double negative binomial with probability generating function

$$G_{X,Y|\nu}(s,t) = \{1 + \nu(1-s)\}^{-k}\{1 + \nu(1-t)\}^{-m}.$$

Letting now the proneness parameter ν be beta distributed with density function $\Gamma(a+\rho)\,\nu^{a-1}(1+\nu)^{-(a+\rho)}/\{\Gamma(\rho)\Gamma(a)\}$, a, ρ > 0, the probability generating function of the joint distribution of accidents over the two periods takes the form

$$G_{X,Y}(s,t) = \frac{\Gamma(\rho+a)}{\Gamma(\rho)\Gamma(a)}\int_0^{+\infty} \nu^{a-1}(1+\nu)^{-(a+\rho)}\{1+\nu(1-s)\}^{-k}\{1+\nu(1-t)\}^{-m}\,d\nu = \frac{\rho_{(k+m)}}{(a+\rho)_{(k+m)}}\,F_1(a; k, m; a+k+m+\rho; s, t) \sim \text{BGWD}(a; k, m; \rho),$$

where $F_1(a; b, c; d; u, v) = \sum_{r,s=0}^{\infty} a_{(r+s)}\,b_{(r)}\,c_{(s)}\,u^{r}v^{s}/\{d_{(r+s)}\,r!\,s!\}$ is Appell’s hypergeometric series and $h_{(l)} = \Gamma(h+l)/\Gamma(h)$, h > 0, $l \in R$.

Regarding separate estimation of the contribution of proneness, liability and randomness in a given accident situation over a period of observation whenever proneness is accepted as an established fact, Xekalaki ([1984a]) showed that rearranging the observed distribution in two non-overlapping sub-intervals and fitting the BGWD(a; k, m; ρ) to the resulting bivariate accident distribution does enable separate estimation of the variance components. This is demonstrated in Table 1.

Table 1 Estimators of the components of the variance of the generalized Waring distribution

Further models leading to the BGWD, provided by Xekalaki ([1984c]), supply the framework within which one can also obtain the BGWD as an accident distribution under the contagion and the spells accident theories.

8. The multivariate generalized Waring distribution

The n-variate version of the generalized Waring distribution introduced and studied by Xekalaki ([1986]) is also obtained as an inverse factorial distribution. Its probability generating function is given by

$$G(\bar t) = \frac{\rho_{(\sum k_i)}}{(a+\rho)_{(\sum k_i)}}\,F_D\Big(a; k_1, \ldots, k_n; a + \sum_{i=1}^{n} k_i + \rho; \bar t\Big)$$

with $F_D(a; \beta_1, \ldots, \beta_n; \gamma; \bar t)$ denoting Lauricella’s hypergeometric function given by

$$F_D(a; \beta_1, \ldots, \beta_n; \gamma; \bar t) = \sum_{r_1, \ldots, r_n} \frac{a_{(\sum r_i)}}{\gamma_{(\sum r_i)}}\prod_{i=1}^{n} (\beta_i)_{(r_i)}\,\frac{t_i^{r_i}}{r_i!}.$$

Its probability function is given by

$$P_{\bar r} \equiv P(\bar X = \bar r) = \frac{\rho_{(\sum k_i)}}{(a+\rho)_{(\sum k_i)}}\,\frac{a_{(\sum r_i)}\,(k_1)_{(r_1)} \cdots (k_n)_{(r_n)}}{(a+\rho+\sum k_i)_{(\sum r_i)}\,r_1! \cdots r_n!}, \quad r_i = 0, 1, 2, \ldots;\ i = 1, \ldots, n,$$

and its probabilities are related by the following first order recurrences, which facilitate their computation

$$\frac{P_{l_1, l_2, \ldots, l_{h-1}, l_h + 1, l_{h+1}, \ldots, l_n}}{P_{l_1, l_2, \ldots, l_n}} = \frac{\big(a + \sum_{i=1}^{n} l_i\big)(k_h + l_h)}{\big(a + \rho + \sum_{i=1}^{n} k_i + \sum_{i=1}^{n} l_i\big)(l_h + 1)}, \quad l_i = 0, 1, 2, \ldots;\ i, h = 1, 2, \ldots, n.$$
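How such a recurrence can be used in practice is sketched below. Starting from $P_{0,\ldots,0} = \rho_{(\sum k_i)}/(a+\rho)_{(\sum k_i)}$, the code fills a table of probabilities on a finite grid, one first-order step at a time, and the result can be compared with direct evaluation of the probability function. The ratio used is the one obtained by dividing two adjacent values of the probability function given above; the function names, grid size and parameter values are hypothetical.

```python
import math
from itertools import product


def log_rising(h, l):
    """log of h_(l) = Gamma(h + l) / Gamma(h)."""
    return math.lgamma(h + l) - math.lgamma(h)


def mgwd_pmf(r, a, ks, rho):
    """Direct evaluation of P(X = r) for the n-variate generalized Waring distribution."""
    total_k, total_r = sum(ks), sum(r)
    log_p = (log_rising(rho, total_k) - log_rising(a + rho, total_k)
             + log_rising(a, total_r)
             + sum(log_rising(k_i, r_i) for k_i, r_i in zip(ks, r))
             - log_rising(a + rho + total_k, total_r)
             - sum(math.lgamma(r_i + 1) for r_i in r))
    return math.exp(log_p)


def mgwd_table_by_recurrence(a, ks, rho, r_max):
    """Fill P(l_1, ..., l_n) for 0 <= l_i <= r_max using first-order recurrences."""
    n = len(ks)
    total_k = sum(ks)
    origin = (0,) * n
    table = {origin: math.exp(log_rising(rho, total_k)
                              - log_rising(a + rho, total_k))}
    # product() visits cells in lexicographic order, so the predecessor
    # obtained by lowering one coordinate has always been computed already.
    for cell in product(range(r_max + 1), repeat=n):
        if cell == origin:
            continue
        h = next(i for i, value in enumerate(cell) if value > 0)
        prev = cell[:h] + (cell[h] - 1,) + cell[h + 1:]
        s = sum(prev)
        ratio = ((a + s) * (ks[h] + prev[h])
                 / ((a + rho + total_k + s) * (prev[h] + 1)))
        table[cell] = table[prev] * ratio
    return table


if __name__ == "__main__":
    a, ks, rho = 2.0, (1.0, 0.5, 1.5), 3.0   # illustrative values only
    table = mgwd_table_by_recurrence(a, ks, rho, r_max=3)
    for cell in [(0, 0, 0), (1, 0, 2), (3, 3, 3)]:
        print(cell, table[cell], mgwd_pmf(cell, a, ks, rho))
```

The recurrence needs only a handful of multiplications per cell, whereas direct evaluation calls the log-gamma function several times for every cell.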

An interesting aspect of the bivariate and multivariate versions of the generalized Waring distribution is that their marginal distributions (conditional and unconditional) as well as their convolution are of the same form (UGWD’s), properties that exhibit a symmetry analogous to that existing in the case of the multivariate normal distribution. Further, the generalized Waring distribution is self-decomposable (Xekalaki [1983b]).

9. The Generalized Waring Process (gWp)

Looking into how temporally evolving data from the wide spectrum of application contexts that can reasonably be viewed from the perspective of the frameworks discussed in Sections 6, 7 and 8 can be treated, Xekalaki and Zografi ([2008]) defined and studied the generalized Waring process. In establishing its definition, the structural properties of both the bivariate and the multivariate versions of the generalized Waring distribution played a significant role. This process, analogously to the case of Poisson and Pólya processes, which can be obtained as limiting cases of it, was shown to be a Markov process.

Let {N(t), t ≥ 0} be a counting process. This is said to be a generalized Waring process with parameters a, k, ρ > 0, denoted by gWp(a, k; ρ), if (i) N(0) = 0, (ii) N(t) is a Markov process, and (iii) N(t + h) − N(t) has the generalized Waring distribution with parameters a, kh; ρ for h > 0, t ≥ 0. The process starts at 0, it has stationary increments and

$$P(N(t) = n) = \frac{\rho_{(kt)}}{(\rho+a)_{(kt)}}\,\frac{a_{(n)}\,(kt)_{(n)}}{(a+\rho+kt)_{(n)}}\,\frac{1}{n!},$$

i.e., N(t) has a generalized Waring distribution with parameters a, kt; ρ.

The transition probabilities of the generalized Waring process are given by

$$\begin{aligned} p_{m,n}(s, s+t) &= P(N(s+t) = n \mid N(s) = m) = \frac{\Gamma(a+n)}{\Gamma(a+m)}\,\frac{(kt)_{(n-m)}}{(n-m)!}\,\frac{(\rho+ks)_{(a+m)}}{(\rho+ks+kt)_{(a+n)}}, \\ p_{0,n}(0, t) &= P(N(t) = n \mid N(0) = 0) = \frac{\rho_{(kt)}}{(\rho+a)_{(kt)}}\,\frac{a_{(n)}\,(kt)_{(n)}}{(a+\rho+kt)_{(n)}}\,\frac{1}{n!} = P(N(t) = n), \end{aligned}$$

with the last equality indicating that the generalized Waring process is a non-homogeneous Markov process. Its mean and variance are respectively

$$E(N(t)) = \frac{akt}{\rho-1} \quad \text{and} \quad Var(N(t)) = \frac{akt(\rho+kt-1)(\rho+a-1)}{(\rho-1)^{2}(\rho-2)}.$$

Note that since the generalized Waring process is a stationary process and its mean is of the form E[N(t)] = ηt, the above formula implies that its intensity is η = ak/(ρ − 1). Its variance can be split into three additive components, thus

$$Var(N(t)) = \sigma_{\Lambda_t}^2 + (kt)^{2}\sigma_\nu^2 + \sigma_R^2$$

with the liability and random components dependent on time. In particular,

$$\sigma_{\Lambda_t}^2 = akt(a+1)(\rho-1)^{-1}(\rho-2)^{-1}; \quad \sigma_\nu^2 = a(a+\rho-1)(\rho-1)^{-2}(\rho-2)^{-1}; \quad \sigma_R^2 = akt(\rho-1)^{-1}.$$
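The transition probabilities and the mean given above lend themselves to a simple numerical check. The sketch below evaluates $p_{m,n}(s, s+t)$ on the log scale, verifies one instance of the Chapman-Kolmogorov relation that any Markov transition function must satisfy, and compares a truncated sum for the mean of N(t) with akt/(ρ − 1). The function names, the truncation point and the parameter values are illustrative assumptions.

```python
import math


def log_rising(h, l):
    """log of h_(l) = Gamma(h + l) / Gamma(h)."""
    return math.lgamma(h + l) - math.lgamma(h)


def transition_prob(m, n, s, t, a, k, rho):
    """P(N(s + t) = n | N(s) = m) for the gWp(a, k; rho), with t > 0."""
    if n < m:
        return 0.0
    log_p = (math.lgamma(a + n) - math.lgamma(a + m)
             + log_rising(k * t, n - m) - math.lgamma(n - m + 1)
             + log_rising(rho + k * s, a + m)
             - log_rising(rho + k * s + k * t, a + n))
    return math.exp(log_p)


def marginal_pmf(n, t, a, k, rho):
    """P(N(t) = n), i.e. the transition probability from state 0 at time 0."""
    return transition_prob(0, n, 0.0, t, a, k, rho)


def chapman_kolmogorov_gap(m, n, s, t, u, a, k, rho):
    """Difference between the two sides of the Chapman-Kolmogorov relation."""
    two_step = sum(transition_prob(m, j, s, t, a, k, rho)
                   * transition_prob(j, n, s + t, u, a, k, rho)
                   for j in range(m, n + 1))
    one_step = transition_prob(m, n, s, t + u, a, k, rho)
    return two_step - one_step


def truncated_mean(t, a, k, rho, n_max=3000):
    """Mean of N(t) from the marginal probabilities, truncated at n_max."""
    return sum(n * marginal_pmf(n, t, a, k, rho) for n in range(n_max + 1))


if __name__ == "__main__":
    a, k, rho = 2.0, 1.5, 5.0   # illustrative values only
    print(chapman_kolmogorov_gap(1, 5, 0.5, 1.0, 2.0, a, k, rho))
    print(truncated_mean(2.0, a, k, rho), a * k * 2.0 / (rho - 1))
```

The dependence of `transition_prob` on both s and t, rather than on t alone, is what the non-homogeneity of the process amounts to in computational terms.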

9.1 The generalized Waring process in an accident proneness context

We consider a population which is inhomogeneous with respect to personal and environmental attributes affecting the occurrence of accidents. The terms “accident proneness” and “accident liability” are again used to refer respectively to a person’s predisposition to accidents, and to a person’s exposure to external risk of accident with the conditional distribution of the random variable λ given ν describing differences in external risk factors among individuals. Liability fluctuations over a time interval (t, t + h) depend on the length h of the interval and are described by a distribution for λ|ν with probability density function $\lambda^{kh-1}e^{-\lambda/(\nu h)}(\nu h)^{-kh}/\Gamma(kh)$. Allowing further the parameter ν to have a beta distribution of the second kind with parameters a and ρ and density function ϕ given by

$\phi(\nu) = \Gamma(a+\rho)\,\nu^{a-1}(1+\nu)^{-(a+\rho)}/[\Gamma(a)\Gamma(\rho)]$, a, ρ > 0, we obtain for the distribution of the number of accidents N(t):

$$P(N(t+h) - N(t) = n) = \frac{\rho_{(kh)}}{(a+\rho)_{(kh)}}\,\frac{a_{(n)}\,(kh)_{(n)}}{(a+\rho+kh)_{(n)}}\,\frac{1}{n!}$$

and

$$P(N(t) = n) = P_n(t) = \frac{\rho_{(kt)}}{(a+\rho)_{(kt)}}\,\frac{a_{(n)}\,(kt)_{(n)}}{(a+\rho+kt)_{(n)}}\,\frac{1}{n!}, \quad n = 0, 1, \ldots$$

So, the process arising in the context of this model satisfies the defining conditions of the generalized Waring process.

9.2 The generalized Waring process in the context of a spells model

Xekalaki and Zografi ([2008]) showed that the generalized Waring process could also be used in modeling temporally evolving data in the context of a spells model. Assume again that each person is liable to spells and that no accidents can occur outside spells. Let S(t), t = 0, 1, 2, …, the number of spells up to a given moment t, be a homogeneous Poisson process with rate k/m, k > 0, let the number $X_i$ of accidents within a spell i be a random variable with a logarithmic series distribution with parameters m and ν and probability function given by $P(X_i = n) = (m/n)\{\nu/(1+\nu)\}^{n}$, n ≥ 1, with $P(X_i = 0) = 1 - m\log(1+\nu)$, ν > 0, 0 < m < 1/log(1 + ν), and let the numbers of accidents arising out of different spells be independent and independent of the number S(t) of spells. Here ν is regarded as the external risk parameter, too, which they assumed to vary according to a beta distribution of the second kind with parameters a and ρ and probability density function given by $\Gamma(a+\rho)\,\nu^{a-1}(1+\nu)^{-(a+\rho)}/[\Gamma(a)\Gamma(\rho)]$, a, ρ > 0. They then showed that the above framework leads to a process conforming with the postulates of the generalized Waring process, thus demonstrating its potential application in the context of the Spells model.

10. An application: modeling the counting process {N(s), s > 0} associated with the access pattern of a web site

As an illustration of the application potential of the generalized Waring process in other fields by appropriately adjusting the concepts and terminology used in this paper so as to have natural interpretations, we outline an example of a model for temporally evolving data on web access patterns provided by Xekalaki and Zografi ([2008]).

In this context, {N(s), s > 0} is the counting process associated with the access pattern of a web site, where, for any t > 0, N(t) represents the number of visits that the web pages on this particular site get within the interval (0, t). Note that the generalized Waring distribution was used by Ajiferuke et al. ([2004]) to fit observed website visitation data for a given period, i.e., to model counts $N(t_0)$ of web visits on a given fixed time interval $(0, t_0)$.

Apart from chance, visits to a web site can be regarded as affected by the intrinsic appeal of the particular site to web users (corresponding to proneness) as well as by exogenous factors (corresponding to external factors) such as links provided by other sites to the particular site, how well the site is advertised, etc.

Let ν denote the intrinsic factors and λ|ν the exogenous factors. Then, assuming that N(t)|λ follows a Poisson(λ(t)) distribution, where λ(t) = λt with λ|ν following a gamma distribution with density $\lambda^{kt-1}e^{-\lambda/(\nu t)}(\nu t)^{-kt}/\Gamma(kt)$, and with ν following a beta distribution of the second kind with density $\Gamma(a+\rho)\,\nu^{a-1}(1+\nu)^{-(a+\rho)}/[\Gamma(a)\Gamma(\rho)]$, a, ρ > 0, the unconditional distribution of N(t) is the GWD(a, kt; ρ), i.e. the process {N(t), t ≥ 0} is a generalized Waring process.

10.1 The data

The log files representing the hits on an e-shop site for the period from March 31, 2006 to April 30, 2006 have been used to fit this model. (A log file typically contains information on the times of visits per IP address per day). On the basis of such log files, the visits per day made by each of 468 IP addresses to a web site during the above period were enumerated yielding 468 paths of visits N i (t j ) made by IP address i up to and including time t j denoted by {N i (t j ), i = 1, 2, …, 468; j = 1, 2, …, 31}.

Moment estimates of the parameters of the generalized Waring process were obtained employing an estimation procedure for spatial point process data known in the literature as the centered reduced moment method. The method, introduced and studied by Ripley ([1976], [1977]), utilizes the intensity of the process and the mean number of further points within distance s of an arbitrary point of the process. In particular, the method utilizes the moment estimators $E(\hat N(s)) = \hat\mu_1 = \hat\eta s = ns/h$, $E(\hat N^{2}(s)) = \hat\mu_2 = X/n^{2}$, $E(\hat N^{3}(s)) = \hat\mu_3 = (Z - X)/n^{3}$ with $X = \sum_{i=1}^{n}\sum_{j \ne i}\phi_s^{2}(x_i, x_j)$, $Z = \sum_{i=1}^{n}\sum_{j \ne i}\phi_s(x_i, x_j)\sum_{k \ne i}\phi_s(x_i, x_k)$, where the quantities involved in the above equations represent weights defined, for each value $x_i$ in the collection of points $\{x_i : i = 1, 2, \ldots, n\}$ of the process within a time interval of length h, as follows: for each $x_i$ in $\{x_i : i = 1, 2, \ldots, n\}$ and a given s > 0, consider the interval of center $x_i$ and length s and assign to every point $x_j$, j ≠ i, in this interval the weight $\phi_s(x_i, x_j) = \omega(x_i, x_j)^{-1}$, where $\omega(x_i, x_j)$ is the number of other points $\{x_k, k \ne i, k \ne j\}$ of the process that are included in the interval of length $|x_i - x_j|$ and center $x_i$ (see also Diggle and Chetwynd [1991]; Chetwynd and Diggle [1998], among others). The standard errors of the parameter estimators thus obtained can in principle be determined by simulation, but the associated computations are formidable. Approximation formulas exist only for the case of the homogeneous planar Poisson process, while, for the class of stationary Cox processes, there is no obvious way to obtain estimable expressions, as noted by Chetwynd and Diggle ([1998]).

The observed paths were compared to the corresponding time series of simulated realizations of the generalized Waring process over the same time segment. For each IP address, 100 simulated realizations of the gWp(a, k; ρ) were obtained and each of the observed time series paths was compared to the corresponding simulated ones. On average, the realizations of the generalized Waring process exhibited structural characteristics recognizably similar to those of the paths of the observed time series. For illustration purposes, the path of the observed time series associated with one of the IP addresses considered is presented in Figure 1. In the graph, the path is overlaid with a sample of three of the 100 corresponding simulated realizations of the gWp(a, k; ρ). Inspection of the graph provides a visual appreciation of the degree of similarity in the structural characteristics of the paths of the observed and the realized time series.

Figure 1

Observed and simulated paths of the gWp(3.87, 0.83; 4.21) corresponding to the selected IP address (Xekalaki and Zografi [2008]).
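For readers who wish to experiment with realizations of this kind, the following Python sketch simulates daily paths of a gWp(a, k; ρ). It is not the simulation procedure of Xekalaki and Zografi ([2008]), which is not described here. It assumes the mixed Poisson representation in which ν is drawn once per path from a beta distribution of the second kind (generated as a ratio of independent gamma variables) and, given ν, the daily increments are independent Poisson counts whose means are gamma distributed with shape k and scale ν, so that N(t) has the GWD(a, kt; ρ) distribution. All function names are hypothetical; the parameter values are those reported in the caption of Figure 1.

```python
import random


def sample_poisson(rng, mean):
    """Poisson variate: count unit-rate exponential arrivals falling in [0, mean]."""
    count, elapsed = 0, rng.expovariate(1.0)
    while elapsed <= mean:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def simulate_gwp_path(rng, a, k, rho, n_days):
    """One path N(1), ..., N(n_days) under the assumed mixing representation."""
    # nu ~ beta distribution of the second kind with parameters a and rho.
    nu = rng.gammavariate(a, 1.0) / rng.gammavariate(rho, 1.0)
    path, level = [], 0
    for _ in range(n_days):
        # Liability for a unit interval: gamma with shape k and scale nu.
        liability = rng.gammavariate(k, nu)
        level += sample_poisson(rng, liability)
        path.append(level)
    return path


def mean_final_count(rng, a, k, rho, n_days, n_paths):
    """Average of N(n_days) over n_paths simulated paths."""
    total = sum(simulate_gwp_path(rng, a, k, rho, n_days)[-1]
                for _ in range(n_paths))
    return total / n_paths


if __name__ == "__main__":
    rng = random.Random(2008)            # fixed seed for reproducibility
    a, k, rho = 3.87, 0.83, 4.21         # values from the caption of Figure 1
    for _ in range(3):
        print(simulate_gwp_path(rng, a, k, rho, n_days=31))
    print(mean_final_count(rng, a, k, rho, 31, 2000), a * k * 31 / (rho - 1))
```

Under this representation the average of N(31) over many paths should be close to akt/(ρ − 1) with t = 31, which offers a basic sanity check on the simulated realizations.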

Following Lewis ([1972]), Brillinger ([1978]) and Andersen et al. ([1993]), the closeness of the observed and realized time series was also checked using diagnostic plots based on the inverse-intensity residuals computed for each value $x_j$ in the collection of points $\{x_j : j = 1, 2, \ldots, n\}$ of the process, given by $R_{\hat\theta}(B_j, \eta^{-1}) = \sum_{x_i \in B_j}\hat\eta(x_i)^{-1} - \int_{B_j} I_{R^+}(\hat\eta(x))\,dx$, where $B_j = (0, x_j)$, $\hat\theta = (\hat a, \hat k, \hat\rho)$, $\hat\eta(x) = \eta(x, \hat\theta)$ is the fitted intensity and $I_{R^+}$ is the indicator function. These plots exhibit similar results. The plot corresponding to the data associated with the IP address considered is shown in Figure 2.

Figure 2

Plot of inverse-intensity residuals corresponding to the selected IP address (Xekalaki and Zografi [2008]).


  • Ajiferuke, I, Wolfram, D, Xie, H: Modelling website visitation and resource usage characteristics by IP address data. In: Julien, H, Thompson, S (eds.). Proceedings of the 32nd Annual Conference of the Canadian Association for Information Science, Manitoba, Canada (2004). Available at:

    Google Scholar 

  • Andersen P, Borgan Ø, Gill R, Keiding N: Statistical Models based on Counting Processes. Springer, New York; 1993.

    Book  MATH  Google Scholar 

  • Andrews DF, Mallows CL: Scale mixtures of normal distributions. J. R. Stat. Soc. B 1974, 36: 99–102.

    MathSciNet  MATH  Google Scholar 

  • Arbous AG, Kerrich JE: Accident statistics and the concept of accident proneness. Biometrics 1951, 7: 340–432. 10.2307/3001656

    Article  Google Scholar 

  • Bates GE: Joint distributions of time intervals for the occurrence of successive accidents in a generalized Polya sheme. Ann. Math. Stat. 1955, 26: 705–720. 10.1214/aoms/1177728429

    Article  MathSciNet  MATH  Google Scholar 

  • Bates GE, Neyman J: Contributions to the theory of accident proneness. U. Calif. Publ. Stat. 1952, 1: 215–275.

    MathSciNet  MATH  Google Scholar 

  • Breslow N: Tests of hypotheses in overdispersed poisson regression and other quasi-likelihood models. J. Am. Stat. Assoc. 1990, 85: 565–571. 10.1080/01621459.1990.10476236

    Article  Google Scholar 

  • Brillinger D: Comparative Aspects of the Study of Ordinary Time Series and of Point Processes. In Developments in Statistics. Edited by: Krishnaiah PR. Academic Press, New York; 1978:33–133.

    Google Scholar 

  • Cane VR: The concept of accident proneness. B. Inst. Math. Bulgarian Acad. Sci. 1974, 15: 183–188.

    MathSciNet  MATH  Google Scholar 

  • Cane VR: A class of non-identifiable stochastic models. J. Appl. Probab. 1977, 14: 475–782. 10.2307/3213450

    Article  MathSciNet  MATH  Google Scholar 

  • Carriere J: Nonparametric tests for mixed poisson distributions. Insur. Math. Econ. 1993, 12: 3–8. 10.1016/0167-6687(93)90994-Z

    Article  MathSciNet  MATH  Google Scholar 

  • Chetwynd AG, Diggle PJ: On estimating the reduced second moment measure of a stationary spatial point process. Aust. N.Z. J. Stat. 1998, 40(1):11–15. 10.1111/1467-842X.00002

    Article  MathSciNet  MATH  Google Scholar 

  • Cox DR: Renewal Theory. Barnes & Noble, New York; 1962.

    MATH  Google Scholar 

  • Cox DR: Some remarks on overdispersion. Biometrika 1983, 70: 269–274. 10.1093/biomet/70.1.269

    Article  MathSciNet  MATH  Google Scholar 

  • Cresswell WL, Froggatt P: The Causation of Bus Driver Accidents. Oxford University Press, London; 1963.

    Google Scholar 

  • Davidian M, Carroll RJ: A note on extended quasi-likelihood. J. R. Stat. Soc. B 1988, 50: 74–82.

    MathSciNet  Google Scholar 

  • Dean CB, Lawless J, Willmot GE: A mixed poisson-inverse Gaussian regression model. Can. J. Stat. 1989, 17: 171–182. 10.2307/3314846

    Article  MathSciNet  MATH  Google Scholar 

  • Diggle PJ, Chetwynd AG: Second- order analysis of spatial clustering for inhomogeneous populations. Biometrics 1991, 47: 1155–1163. 10.2307/2532668

    Article  Google Scholar 

  • Dimaki C, Xekalaki E: Identifiability of income distributions in the context of damage and generating models. Commun. Stat. A-Theor. 1990, 19(8):2757–2766. 10.1080/03610929008830346

    Article  MathSciNet  MATH  Google Scholar 

  • Dimaki C, Xekalaki E: Additive and multiplicative distortion of observations: some characteristic properties. J. Appl. Stat. Sc. 1996, 5(2/3):113–127.

    MathSciNet  MATH  Google Scholar 

  • Dubourdieu J: Remarques Relatives sur la Théorie Mathématique de l’Assurance-Accidents. Bull. Trim. Inst. Actuaries Fran. 1938, 44: 79–146.

    MATH  Google Scholar 

  • Faddy MJ: Extending poisson process modelling and analysis of count data. Biometrical J. 1997, 39(4):431–440. 10.1002/bimj.4710390405

    Article  MATH  Google Scholar 

  • Famoye F, Singh KP: Zero-inflated generalized poisson regression model with an application to domestic violence. Data. J. Data Sci. 2006, 4: 117–130.

    Google Scholar 

  • Fisher RA: The effects of methods of ascertainment upon the estimation of frequencies. Ann. Eugenic. 1934, 6: 13–25. 10.1111/j.1469-1809.1934.tb02105.x

    Article  Google Scholar 

  • Fisher RA: The significance of deviations from expectation in a poisson series. Biometrics 1950, 6: 17–24. 10.2307/3001420

    Article  Google Scholar 

  • Greenwood M, Woods HM: On the Incidence of Industrial Accidents upon Individuals with Special Reference to Multiple Accidents. In Report of the Industrial Fatigue Research Board, 4. His Majesty’s Stationary Office, London; 1919:1–28.

    Google Scholar 

  • Greenwood M, Yule GU: An inquiry into the nature of frequency distributions representative of multiple happenings with particular reference to the occurrence of multiple attack of disease or repeated accidents. J. R. Stat. Soc. 1920, 83: 255–279. 10.2307/2341080

    Article  Google Scholar 

  • Gschlößl S, Czado C: Modelling count data with overdispersion and spatial effects. Stat. Pap. 2008, 49: 531–552. 10.1007/s00362-006-0031-6

    Article  MathSciNet  MATH  Google Scholar 

  • Gupta PL, Gupta RC, Tripathi RC: Analysis of zero adjusted count data. Comput. Stat. Data An. 1996, 23: 207–218. 10.1016/S0167-9473(96)00032-1

    Article  MATH  Google Scholar 

  • Gupta PL, Gupta RC, Tripathi RC: Score test for zero inflated generalized poisson regression model. Commun. Stat. A-Theor. 2004, 33: 47–64. 10.1081/STA-120026576

    Article  MathSciNet  MATH  Google Scholar 

  • Gurland J: Some applications of the negative binomial and other contagious distributions. Am. J. Public Health 1959, 49: 1388–1399. 10.2105/AJPH.49.10.1388

    Article  Google Scholar 

  • Hinde J, Demetrio CGB: Overdispersion: models and estimation. Comput. Stat. Data An. 1998, 27: 151–170. 10.1016/S0167-9473(98)00007-3

    Article  MATH  Google Scholar 

  • Irwin JO: Discussion on chambers and Yule’s paper. J. R. Stat. Soc. Supplement 1941, 7: 101–109.

    Google Scholar 

  • Irwin JO: The place of mathematics in medical and biological statistics. J. R. Stat. Soc. A 1963, 126: 1–44. 10.2307/2982445

    Article  Google Scholar 

  • Irwin JO: The generalized Waring distribution applied to accident theory. J. R. Stat. Soc. A 1968, 131: 205–225. 10.2307/2343842

    Article  MathSciNet  Google Scholar 

  • Irwin JO: The Generalized Waring Distribution. J. R. Stat. Soc. A 1975, 138: 18–31.

    Article  MathSciNet  Google Scholar 

  • Jewell N: Mixtures of exponential distributions. Ann. Stat. 1982, 10: 479–484. 10.1214/aos/1176345789

    Article  MathSciNet  MATH  Google Scholar 

  • Karlis D, Xekalaki E: On testing for the number of components in a mixed poisson model. Ann. I. Stat. Math. 1999, 51: 149–162. 10.1023/A:1003839420071

    Article  MathSciNet  MATH  Google Scholar 

  • Karlis D, Xekalaki E: Mixtures Everywhere. In Stochastic Musings: Perspectives from the Pioneers of the Late 20th Century. Edited by: Panaretos J. Laurence Erlbaum, USA; 2003:78–95.

    Google Scholar 

  • Karlis D, Xekalaki E: Mixed poisson distributions. Int. Stat. Rev. 2005, 73(1):35–58. 10.1111/j.1751-5823.2005.tb00250.x

    Article  MATH  Google Scholar 

  • Kemp CD: On a contagious distribution suggested for accident data. Biometrics 1967, 23: 241–255. 10.2307/2528159

    Article  Google Scholar 

  • Kemp CD: “Accident proneness” and discrete distribution theory. In Random Counts in Scientific Work, Vol.2. Edited by: Patil GP. State College: Pennsylvania State University Press, University Park, USA; 1970:41–65.

    Google Scholar 

  • Krishnaji N: Characterization of the Pareto distribution through a model of under-reported incomes. Econometrica 1970, 38: 251–255. 10.2307/1913007

    Article  Google Scholar 

  • Kulinskaya E, Olkin I: An overdispersion model in meta-analysis. Stat. Model. 2014, 14(1):49–76. 10.1177/1471082X13494616

    Article  MathSciNet  Google Scholar 

  • Lawless JF: Negative binomial and mixed poisson regression. Can. J. Stat. 1987, 15: 209–225. 10.2307/3314912

    Article  MathSciNet  MATH  Google Scholar 

  • Lewis PAW: Recent results in the statistical analysis of univariate point processes. In Stochastic Point Processes. Edited by: Lewis PAW. Wiley, New York; 1972:1–54.

    Google Scholar 

  • Lexis W: Über die Theorie der Statilität Statistischer Reichen. Jahrb. Nationalökon. u. Statist. 1879, 32: 60–98.

    Google Scholar 

  • McCullagh P, Nelder JA: Generalized Linear Models. 2nd edition. Chapman & Hall, London; 1989.

  • McKendrick AG: The applications of mathematics to medical problems. Proc. Edinb. Math. Soc. 1925, 44: 98–130.

  • McLachlan GJ, Peel D: Finite Mixture Models. Wiley, New York; 2001.

  • Moore DF: Asymptotic properties of moment estimators for overdispersed counts and proportions. Biometrika 1986, 73: 583–588. 10.1093/biomet/73.3.583

  • Nelder JA, Pregibon D: An extended quasi-likelihood function. Biometrika 1987, 74: 221–232. 10.1093/biomet/74.2.221

  • Ngatchou-Wandji J, Paris C: On the zero-inflated count models with application to modelling annual trends in incidences of some occupational allergic diseases in France. J. Data Sci. 2011, 9: 639–659.

  • Panaretos J: An extension of the damage model. Metrika 1982, 29: 189–194. 10.1007/BF01893378

  • Panaretos J: A generating model involving Pascal and logarithmic series distributions. Commun. Stat. A-Theor. 1983, 12(7):841–848. 10.1080/03610928308828499

  • Panaretos J: On the evolution of surnames. Int. Stat. Rev. 1989, 57(2):161–167. 10.2307/1403384

  • Patil GP, Ord JK: On size-biased sampling and related form-invariant weighted distributions. Sankhya 1976, 38: 48–61.

  • Rao CR: On discrete distributions arising out of methods of ascertainment. Sankhya 1963, A25: 311–324.

  • Rao CR: Weighted Distributions Arising Out of Methods of Ascertainment. In A Celebration of Statistics, Chapter 24. Edited by: Atkinson AC, Fienberg SE. Springer-Verlag, New York; 1985:543–569. 10.1007/978-1-4613-8560-8_24

  • Ridout M, Demetrio CGB, Hinde J: Models for count data with many zeros. Invited paper presented at the Nineteenth International Biometric Conference, Cape Town, South Africa; 1998.

  • Ridout M, Hinde J, Demetrio CGB: A score test for testing zero inflated Poisson regression model against zero inflated negative binomial alternatives. Biometrics 2001, 57: 219–223. 10.1111/j.0006-341X.2001.00219.x

  • Ripley BD: The second-order analysis of stationary point processes. J. Appl. Probab. 1976, 13: 255–266. 10.2307/3212829

  • Ripley BD: Modelling spatial patterns (with Discussion). J. R. Stat. Soc. B 1977, 39: 172–212.

  • Shaw L, Sichel HS: Accident Proneness. Pergamon Press, Oxford; 1971.

  • Steutel FW, van Harn K: Discrete analogues of self-decomposability and stability. Ann. Prob. 1979, 7: 893–899. 10.1214/aop/1176994950

  • Student: An explanation of deviations from Poisson’s law in practice. Biometrika 1919, 12: 211–215. 10.2307/2331767

  • Thyrion P: Extension of the collective risk theory. Skand. Aktuarietidskrift 1969, 52(Supplement):84–98.

  • Titterington DM: Some recent research in the analysis of mixture distributions. Statistics 1990, 21: 619–641. 10.1080/02331889008802274

  • Tripathi R, Gupta R, Gurland J: Estimation of parameters in the beta binomial model. Ann. I. Stat. Math. 1994, 46: 317–331. 10.1007/BF01720588

  • Wang P, Puterman M, Cockburn I, Le N: Mixed Poisson regression models with covariate dependent rates. Biometrics 1996, 52: 381–400. 10.2307/2532881

  • Winkelmann R: Duration dependence and dispersion in count-data models. J. Bus. Econ. Stat. 1995, 13(4):467–474.

  • Xekalaki E: Chance mechanisms for the univariate generalized Waring distribution and related characterizations. In Statistical Distributions in Scientific Work, Vol. 4 (Models, Structures and Characterizations). Edited by: Taillie C, Patil GP, Baldessari B. Reidel, Dordrecht; 1981:157–171. 10.1007/978-94-009-8549-0_12

  • Xekalaki E: The univariate generalized Waring distribution in relation to accident theory: proneness, spells or contagion? Biometrics 1983a, 39(3):887–895. 10.2307/2531324

  • Xekalaki E: Infinite divisibility, completeness and regression properties of the univariate generalized Waring distribution. Ann. I. Stat. Math. A 1983, 35: 279–289. 10.1007/BF02480983

  • Xekalaki E: A property of the Yule distribution and its applications. Commun. Stat. A-Theor. 1983, 12(10):1181–1189. 10.1080/03610928308828523

  • Xekalaki E: The bivariate generalized Waring distribution and its application to accident theory. J. R. Stat. Soc. A 1984, 147(3):488–498. 10.2307/2981580

  • Xekalaki E: Linear regression and the Yule distribution. J. Econometrics 1984, 24(1):397–403. 10.1016/0304-4076(84)90061-7

  • Xekalaki E: Models leading to the bivariate generalized Waring distribution. Utilitas Math. 1984, 25: 263–290.

  • Xekalaki E: The multivariate generalized Waring distribution. Commun. Stat. A-Theor. 1986, 15(3):1047–1064. 10.1080/03610928608829168

  • Xekalaki E: Under- and overdispersion. Enc. Act. Sci. 2006, 3.

  • Xekalaki E, Panaretos J: Identifiability of compound Poisson distributions. Scand. Actuar. J. 1983, 66: 39–45. 10.1080/03461238.1983.10408688

  • Xekalaki E, Zografi M: The generalized Waring process and its application. Commun. Stat. A-Theor. 2008, 37(12):1835–1854. 10.1080/03610920801893707

  • Xue D, Deddens J: Overdispersed negative binomial models. Commun. Stat. A-Theor. 1992, 21: 2215–2226. 10.1080/03610929208830908

  • Yule GU: A mathematical theory of evolution based on the conclusions of Dr. J.C. Willis, F.R.S. Philos. T. R. Soc. B 1924, 213: 21–87. 10.1098/rstb.1925.0002


The authoress would like to thank the associate editor and the referees for their constructive comments.

This paper is an extended version of the authoress’ invited plenary presentation at the International Conference on Statistical Distributions and Applications, Michigan, USA, October 10–12, 2013.

Author information


Corresponding author

Correspondence to Evdokia Xekalaki.

Additional information

Competing interests

The authoress declares that she has no competing interests.

Rights and permissions

Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits use, duplication, adaptation, distribution, and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Xekalaki, E. On the distribution theory of over-dispersion. J Stat Distrib App 1, 19 (2014).
