 Review
 Open Access
On the distribution theory of overdispersion
Journal of Statistical Distributions and Applications volume 1, Article number: 19 (2014)
Abstract
An overview of the evolution of probability models for overdispersion is given looking at their origins, motivation, first main contributions, important milestones and applications. A specific class of models called the Waring and generalized Waring models will be a focal point. Their advantages relative to other classes of models and how they can be adapted to handle multivariate data and temporally evolving data will be highlighted.
1. Introduction
Data analysts often have to deal with data that exhibit a variability that differs from what they expect on the basis of the hypothesized model. The phenomenon is known as overdispersion if the observed variability exceeds the expected variability, or underdispersion if it is lower than expected.
Such differences between observed and nominal variances can be interpreted as brought about by failures of some of the basic assumptions of the model. These can be classified by the mechanism leading to them. As summarized by Xekalaki ([2006]), in traditional experimental contexts, they may be caused by deviations from the hypothesized structure of the population, due to lack of independence between individual item responses, contagion, clustering, and heterogeneity. In observational study contexts, on the other hand, they are the result of the method of ascertainment, which can lead to partial distortion of the observations. In both contexts, the observed value x no longer represents an observation on the original variable X, but constitutes an observation on a random variable Y whose distribution (the observed distribution) is a distorted version of the distribution of X (original distribution).
Such practical situations have been noticed for over a century (e.g. Lexis [1879]; Student [1919]). The Lexis ratio appears to be the first statistic suggested for testing for the presence of over- or underdispersion relative to a hypothesized binomial model in populations structured in clusters. Also, for count data, Fisher ([1950]) considered using the sample index of dispersion for testing the appropriateness of a Poisson distribution for an observed variable Y.
The paper is structured as follows. Section 2 introduces the reader to the various approaches to modelling overdispersion in the case of traditional experimental contexts. Section 3 highlights approaches in the case of observational study contexts. Section 4 focuses on the case of heterogeneous populations, followed by Sections 5 and 6, which look into a particular type of distribution, the generalized Waring distribution, and its relevance in the context of applications under the various scenarios leading to overdispersion mentioned above. Through the prism of these scenarios, a bivariate version of it is also presented, and its use in applied contexts is discussed in Section 7. A multivariate version of it is also given, and its application potential is outlined in Section 8. Finally, Sections 9 and 10 present a model for temporally evolving data, the multivariate generalized Waring process, and an application illustrating its practical potential.
As the field of accident studies has received much attention, and various theories have been developed for the interpretation of factors underlying an accident situation, most of the models will be presented in accident or actuarial data analysis contexts. Of course, the results can be adapted to a great variety of situations with appropriate parameter interpretations, so that they can be applied in several other fields ranging from economics, inventory control and insurance through to demometry, biometry, psychometry and web access modeling, as is the case with the application discussed in Section 10.
2. Modelling over- or under-dispersion in traditional experimental contexts
One important implication of using single parameter distributions such as the Poisson distribution to analyse data, often ignored by data analysts, is that the variance is determined by the mean, a relation that breaks down in the presence of overdispersion. If this is ignored in practice, statistical inference may suffer from low efficiency, although for modest amounts of overdispersion this may not be the case (Cox [1983]). So, insight into the mechanisms that induce over- (or under-) dispersion is required when dealing with such data. Such insight can be gained by looking at the abovementioned potential triggering sources as classified by Xekalaki ([2006]).
2.1 Lack of independence between individual responses
In accident study related contexts, where one is interested in the total number of reported accidents $Y=\sum_{i=1}^{n}Y_{i}$ out of a total number n of accidents that actually occurred, when accidents are reported with equal probabilities p = P(Y_{ i } = 1) = 1 − P(Y_{ i } = 0), but not independently (Cor(Y_{ i }, Y_{ j }) = ρ ≠ 0), the mean of Y will still be E(Y) = np, but its variance will be $V(Y)=V\left(\sum_{i=1}^{n}Y_{i}\right)=np(1-p)+2\binom{n}{2}\rho p(1-p)=np(1-p)\left(1+\rho(n-1)\right)$, which exceeds that anticipated under a hypothesized independent trial binomial model if ρ > 0 (overdispersion) and is exceeded by it if ρ < 0 (underdispersion).
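The variance formula above can be checked with a few lines of arithmetic. The following is a minimal sketch; the function names and the values of n, p and ρ are illustrative choices, not taken from the paper.

```python
def variance_from_covariances(n: int, p: float, rho: float) -> float:
    """Add the n variances and the n(n-1) covariances of equicorrelated indicators."""
    single = p * (1 - p)                      # variance of one Bernoulli(p) indicator
    return n * single + n * (n - 1) * rho * single


def variance_closed_form(n: int, p: float, rho: float) -> float:
    """Closed form np(1-p)(1 + rho(n-1)) quoted in the text."""
    return n * p * (1 - p) * (1 + rho * (n - 1))


n, p = 20, 0.3                                # hypothetical values
binomial_variance = n * p * (1 - p)
for rho in (-0.05, 0.0, 0.10):                # rho must be at least -1/(n-1)
    ratio = variance_closed_form(n, p, rho) / binomial_variance
    print(f"rho={rho:+.2f}  variance relative to binomial: {ratio:.2f}")
```

A ratio above one corresponds to overdispersion relative to the binomial model and a ratio below one to underdispersion.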
2.2 Contagion
Another common reason for a variance differing from what is anticipated is the failure of the assumption that the probability of the occurrence of an event in a very short interval is constant. This framework is the classical contagion model (Greenwood and Yule [1920]; Xekalaki [1983a]).
In data modelling problems faced by actuaries, for example, this model postulates that initially all individuals have the same probability of incurring an accident, but later this probability is changed by each accident sustained. It is assumed, specifically, that initially none of the individuals has had an accident (e.g. new drivers or persons who are just beginning a new type of work), but later the probability with which a person with Y = y accidents by time t will have another accident in the time period from t to t + dt is of the form (k + my)dt. This leads to the negative binomial as the distribution of Y with p.f. $P(Y=y)=\binom{k/m+y-1}{y}e^{-kt}\left(1-e^{-mt}\right)^{y}$ with μ = E(Y) = k(e^{mt} − 1)/m, and V(Y) = ke^{mt}(e^{mt} − 1)/m = μe^{mt}.
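As a numerical sanity check of the contagion result, the sketch below evaluates a negative binomial probability function with index k/m and success probability e^{−mt} and compares its mean and variance with k(e^{mt} − 1)/m and μe^{mt}. The parameterization and the parameter values are assumptions made for illustration.

```python
from math import exp, lgamma, log


def contagion_pmf(y: int, k: float, m: float, t: float) -> float:
    """Negative binomial pmf with index k/m and success probability exp(-m*t)."""
    index = k / m
    log_coeff = lgamma(index + y) - lgamma(index) - lgamma(y + 1)
    return exp(log_coeff - k * t + y * log(1 - exp(-m * t)))


def contagion_moments(k: float, m: float, t: float, terms: int = 2000):
    """Total mass, mean and variance from a truncated sum over the pmf."""
    probs = [contagion_pmf(y, k, m, t) for y in range(terms)]
    total = sum(probs)
    mean = sum(y * q for y, q in enumerate(probs))
    var = sum((y - mean) ** 2 * q for y, q in enumerate(probs))
    return total, mean, var


k, m, t = 1.0, 0.5, 1.0                       # hypothetical values
total, mean, var = contagion_moments(k, m, t)
print(total, mean, k * (exp(m * t) - 1) / m)  # mean against k(e^{mt} - 1)/m
print(var / mean, exp(m * t))                 # variance-to-mean ratio against e^{mt}
```

The variance-to-mean ratio e^{mt} exceeds one for any t > 0, so the contagion mechanism always produces overdispersion relative to the Poisson model.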
2.3 Clustering
A frequently overlooked clustered structure of the population may also induce over- or under-dispersion.
In an accident context again, an accident is regarded as a cluster of injuries:
The number Y of injuries incurred by persons involved in N accidents can naturally be thought of as expressed by the sum Y = Y_{1} + Y_{2} + … + Y_{ N } of the numbers Y_{ i } of injuries resulting from the i-th accident, assumed to be i.i.d. with mean μ and variance σ^{2}, independently of the total number of accidents N. In this case, $E(Y)=E\left(\sum_{i=1}^{N}Y_{i}\right)=\mu E(N)$ and $V(Y)=V\left(\sum_{i=1}^{N}Y_{i}\right)=\sigma^{2}E(N)+\mu^{2}V(N)$.
So, when N is a Poisson variable with mean E(N) = θ = V(N), the last relationship leads to overdispersion or underdispersion according as σ^{2} + μ^{2} is greater or less than 1.
The first such model was introduced by Cresswell and Froggatt ([1963]) in a different accident context whereby each person is liable to spells of weak performance during which all of the person’s accidents occur. So, if the number N of spells in a unit time period is Poisson distributed with mean θ, and within a spell a person can have 0 accidents with probability 1 + m log p, 0 < m < −1/log p, 0 < p < 1, and n accidents (n ≥ 1) with probability m(1 − p)^{n}/n, the observed distribution of accidents is the negative binomial distribution with probability function $P(Y=y)=\binom{\theta m+y-1}{y}p^{\theta m}(1-p)^{y}$. This model, known in the literature as the spells model, can also lead to other forms of overdispersed distributions (e.g. Xekalaki [1983a], [1984a]).
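The spells construction can be verified numerically. The sketch below builds the distribution of a Poisson number of spells with logarithmic-type accident counts per spell using the Panjer recursion for compound Poisson sums (a standard actuarial technique, not one the paper mentions) and compares it with the negative binomial probabilities. It assumes the zero-accident probability within a spell is 1 + m log p, the value that makes the within-spell probabilities sum to one; all parameter values are hypothetical.

```python
from math import exp, lgamma, log


def spell_pmf(n: int, m: float, p: float) -> float:
    """Accidents within one spell: m(1-p)^n / n for n >= 1, remaining mass at 0."""
    if n == 0:
        return 1 + m * log(p)                 # assumed so that the pmf sums to one
    return m * (1 - p) ** n / n


def compound_poisson_pmf(theta: float, severity, size: int) -> list:
    """Panjer recursion for Y = Y_1 + ... + Y_N with N ~ Poisson(theta)."""
    f = [severity(n) for n in range(size)]
    g = [exp(-theta * (1 - f[0]))]
    for s in range(1, size):
        g.append(theta / s * sum(j * f[j] * g[s - j] for j in range(1, s + 1)))
    return g


def negative_binomial_pmf(y: int, index: float, p: float) -> float:
    log_coeff = lgamma(index + y) - lgamma(index) - lgamma(y + 1)
    return exp(log_coeff + index * log(p) + y * log(1 - p))


theta, m, p = 2.0, 0.8, 0.4                   # hypothetical; needs m < -1/log(p)
spells = compound_poisson_pmf(theta, lambda n: spell_pmf(n, m, p), 60)
nb = [negative_binomial_pmf(y, theta * m, p) for y in range(60)]
max_gap = max(abs(u - v) for u, v in zip(spells, nb))
print(max_gap)
```

Agreement up to rounding error supports reading the negative binomial here as a Poisson distribution generalized by a logarithmic-type distribution.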
2.4 Heterogeneity
Assuming a homogeneous population when in fact the population is heterogeneous, i.e., when its individuals have constant, but unequal probabilities of sustaining an event can also lead to overdispersion. In this case, each member of the population has its own value of the parameter θ and probability density function f(⋅ ; θ).
So, with θ regarded as the inhomogeneity parameter and varying from individual to individual according to any continuous, discrete, or finite step distribution G(⋅) of mean μ and variance σ^{2}, one is led to an observed distribution for Y with probability density function f_{ Y }(y) = E_{ G }(f(y; θ)) = ∫ _{ Θ }f(y; θ)dG(θ), where Θ is the parameter space. Models of this type are known as mixtures. (For details on their application in the statistical literature see e.g. Karlis and Xekalaki [2003]; McLachlan and Peel [2001]; Titterington [1990]). Under such models, the variance of Y consists of two additive components, one representing the variance part due to the variability of θ and one due to the inherent variability of Y if θ did not vary, i.e., V(Y) = V(E(Y | θ)) + E(V(Y | θ)). This offers an explanation as to why mixture models are often referred to as overdispersion models.
It should be noted that a similar idea forms the basis for analysis-of-variance (ANOVA) models, where the total variability can be split into additive components, the ‘between groups’ and the ‘within groups’ components. In the case of the Poisson (θ) distribution, we have in particular that V(Y) = E(θ) + V(θ). Based on the fact that in this case, the factorial moments of Y coincide with the moments of θ about the origin, Carriere ([1993]) proposed a test of the hypothesis that a Poisson mixture fits a data set.
Mixed Poisson distributions were first introduced by Greenwood and Woods ([1919]) in the context of accident studies. Assuming that an individual’s accident experience Y | θ is Poisson distributed with parameter θ varying from individual to individual according to a gamma distribution with mean μ and index parameter μ/γ, they obtained a negative binomial distribution for Y with probability function $P(Y=y)=\binom{\mu/\gamma+y-1}{y}\left\{\gamma/(1+\gamma)\right\}^{y}(1+\gamma)^{-\mu/\gamma}$ and with mean and variance given respectively by E(Y) = μ and V(Y) = μ(1 + γ), where γ represents the overdispersion parameter.
The mixed Poisson process was popularised in the actuarial literature by Dubourdieu ([1938]), and the gamma mixed case was treated by Thyrion ([1969]).
Numerous other mixtures have since then been proposed in the literature for interpreting overdispersion in data, such as binomial mixtures (e.g. Tripathi et al. [1994]), negative binomial mixtures (e.g., Xekalaki [1983a], [1983c], [1984a]; Irwin [1975]), normal mixtures (e.g. Andrews and Mallows [1974]) and exponential mixtures (e.g. Jewell [1982]). Discrete Poisson mixtures with finite step distributions for the Poisson parameter θ have also been proposed, the interest being in creating clusters of data by grouping the observations on Y according to some criterion (cluster analysis). The number of clusters can be decided on the basis of a testing procedure for the number of components in the finite mixture (Karlis and Xekalaki [1999]).
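The gamma mixing step can be reproduced numerically. The sketch below integrates the Poisson probability function against a gamma density with mean μ and index μ/γ (hence scale γ) using a simple midpoint rule, and compares the result with the negative binomial probability function quoted above. The quadrature settings and parameter values are illustrative assumptions.

```python
from math import exp, lgamma, log


def integrand(theta: float, y: int, mu: float, gamma: float) -> float:
    """Poisson(y; theta) times the gamma density with index mu/gamma and scale gamma."""
    index = mu / gamma
    log_poisson = -theta + y * log(theta) - lgamma(y + 1)
    log_density = (index - 1) * log(theta) - theta / gamma - lgamma(index) - index * log(gamma)
    return exp(log_poisson + log_density)


def mixed_pmf(y: int, mu: float, gamma: float, upper: float = 60.0, steps: int = 20000) -> float:
    """Midpoint-rule approximation of the mixture integral over (0, upper)."""
    h = upper / steps
    return h * sum(integrand((i + 0.5) * h, y, mu, gamma) for i in range(steps))


def negative_binomial_pmf(y: int, mu: float, gamma: float) -> float:
    index = mu / gamma
    log_coeff = lgamma(index + y) - lgamma(index) - lgamma(y + 1)
    return exp(log_coeff + y * log(gamma / (1 + gamma)) - index * log(1 + gamma))


mu, gamma = 3.0, 1.0                          # hypothetical values
max_gap = max(abs(mixed_pmf(y, mu, gamma) - negative_binomial_pmf(y, mu, gamma))
              for y in range(6))
print(max_gap)
```

With these values E(θ) = μ = 3 and V(θ) = μγ = 3, so V(Y) = E(θ) + V(θ) = μ(1 + γ) = 6, in line with the variance decomposition for Poisson mixtures.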
2.4.1 Heterogeneity in mixture models treating the parameter θ as the dependent variable in a regression model
Heterogeneity in models with explanatory variables can be modelled, by assuming that Y has a parameter θ varying from individual to individual according to some regression model θ = η(x; β) + ε, where x is a vector of explanatory variables, β is a vector of regression coefficients, η is a function of a known form and ε has some known distribution. Such models are known in the literature as random effect models and have been extensively studied within the broad family of Generalized Linear Models. As a simple example in the case of a single covariate, say X, consider data Y_{ i } , i = 1, 2, … , n coming from a Poisson population with mean θ determined by log θ = α + βx + ε for some constants α, β and with ε having a distribution with mean 0 and variance say ϕ. In this case, the marginal distribution of Y is no longer the Poisson distribution. It is a mixed Poisson distribution, with some mixing distribution g(⋅) clearly depending on the distribution of ε. In particular, $Y\sim \mathit{Poisson}\left(t{e}^{\left(\alpha +\mathit{\beta x}\right)}\right)\underset{t}{\wedge}g\left(t\right)$ where t = e^{ε}.
Negative Binomial and Poisson Inverse Gaussian regression models have also been proposed as overdispersed alternatives to the Poisson regression model (e.g. Lawless [1987]; Dean et al. [1989]; Xue and Deddens [1992]). In the case of a two-point finite step distribution, the finite Poisson mixture regression model of Wang et al. ([1996]) results. The similarity of the mixture representation and the random effects one is discussed in Hinde and Demetrio ([1998]).
In meta-analysis contexts, overdispersion (or underdispersion) refers to variance inflation (or deflation) relative to that anticipated by the fixed effects model. Two possible causes of such phenomena are a population structure in clusters or mixing resulting in a compound distribution. Kulinskaya and Olkin ([2014]) proposed approaching the problem of specification of a random effects model in meta-analysis in terms of a multiplicative model for the distribution of the effect size parameters that allows inflation or deflation. The model considered was motivated by overdispersion induced by intraclass correlation in the model assumed for the distribution of the ith effect size estimate. In particular, the variance of the estimator $\widehat{\theta}_{i}$ of the effect size parameter θ_{ i } in the ith study is assumed to be of the form $\sigma_{\widehat{\theta}_{i}}^{2}=\left(1+\alpha(n_{i})\gamma\right)\sigma_{i}^{2}$, where α(n_{ i }) are some known functions of the sample sizes $n_{i}$, $\sigma_{i}^{2}$ is the within-study variance of the ith study, i = 1, 2, …, k, and γ is interpreted as an intraclass correlation parameter.
2.4.2 Estimation and testing for overdispersion under mixture models
The structure of mixture models, including random effect models, entails different forms of variance-to-mean relationships. So, viewing the mean and variance of Y as represented by E(Y) = μ(β), and V(Y) = σ^{2}(μ(β), λ) respectively for some parameters β, λ, a number of estimation approaches have been proposed in the literature based on moment methods (e.g. Breslow [1990]; Lawless [1987]; Moore [1986]) and quasi or pseudo likelihood methods (e.g. Davidian and Carroll [1988]; McCullagh and Nelder [1989]; Nelder and Pregibon [1987]). The above representation for the mean and variance of Y also allows estimation in the case of multiplicative overdispersion as in McCullagh and Nelder ([1989]).
Testing for the presence of overdispersion or underdispersion, on the other hand, can be done by means of asymptotic arguments. Let f(y; θ) denote the density function of a random variable Y in the initial model. Cox ([1983]) showed that, under regularity conditions, the density of Y in the overdispersed model, f_{ Y }(y), admits a representation of the form $f_{Y}(y)=E_{\Theta}\left(f(y;\theta)\right)=f(y;\mu_{\theta})+\frac{1}{2}\sigma_{\theta}^{2}\frac{\partial^{2}f(y;\mu_{\theta})}{\partial\mu_{\theta}^{2}}+O(1/n)$, where $\mu_{\theta}=E(\theta)$, $\sigma_{\theta}^{2}=V(\theta)$ and Θ is the parameter space. This in turn implies that f_{ Y }(y) can be put in the form f(y; μ_{ θ })(1 + εh(y, ϕ_{ θ })), where $h(y,\varphi_{\theta})=\left[\frac{\partial\log f(y;\mu_{\theta})}{\partial\mu_{\theta}}\right]^{2}+\frac{\partial^{2}\log f(y;\mu_{\theta})}{\partial\mu_{\theta}^{2}}$.
This representation entails overdispersion if ε > 0, underdispersion if ε < 0 and, of course, none of these complications if ε = 0. Cox ([1983]) suggested a testing procedure for the hypothesis ε = 0, which can be regarded as a general version of standard dispersion tests.
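As a worked illustration (not taken from the paper), the function h can be written out for the Poisson case, writing μ for μ_{ θ }. The result shows why a test of ε = 0 resembles the classical comparison of the sample variance with the sample mean.

```latex
% Poisson case: f(y;\mu) = e^{-\mu}\mu^{y}/y!
\log f(y;\mu) = -\mu + y\log\mu - \log y!, \qquad
\frac{\partial \log f}{\partial \mu} = \frac{y-\mu}{\mu}, \qquad
\frac{\partial^{2} \log f}{\partial \mu^{2}} = -\frac{y}{\mu^{2}},
\\[4pt]
h(y,\mu) = \frac{(y-\mu)^{2}}{\mu^{2}} - \frac{y}{\mu^{2}}
         = \frac{(y-\mu)^{2} - y}{\mu^{2}}, \qquad
E\{h(Y,\mu)\} = \frac{\mu - \mu}{\mu^{2}} = 0 .
```

Since h has zero expectation under the Poisson model, the perturbed density f(y; μ)(1 + εh) still sums to one, and positive ε shifts weight toward values of y with (y − μ)^{2} > y, i.e. toward larger squared deviations.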
2.5 Zero adjusted models
It is worth noting that another aspect of the population structure that is often responsible for overdispersion or underdispersion is the presence of an excess or a scarcity of zeros. Though the models discussed in Sections 2.3 and 2.4 may capture overdispersion or underdispersion rather well, they cannot capture an excess or scarcity of zeros. In the literature, this question has been addressed by two types of models known as zero-inflated (or zero-deflated) models and hurdle models. A unified representation of the models is provided by f(y; ω) = ωI_{{0}}(y) + (1 − ω)f_{ Y }(y), where Y is the count variable, I_{{0}}(⋅) is the indicator function of the set {0} and ω is a constant. Values of ω in (0, 1) yield a hurdle model when f_{ Y }(0) = 0 and a zero-inflated model when f_{ Y }(0) ≠ 0, while negative values of ω yield a zero-deflated model.
Obviously, ω can be interpreted as the proportion of excess zeros in the case of the first two models, and the above representation explains why these can be regarded as having a dual nature. They are (finite) mixtures, which account for heterogeneity, while at the same time capturing a population structure in two clusters. However, in the case ω < 0 (zero-deflation), the model ceases to admit a mixture interpretation.
Zero-inflated and hurdle models have mostly been used for Poisson, generalized Poisson or negative binomial count distributions in various contexts (e.g. Ridout et al. [2001]; Gupta et al. [2004]; Famoye and Singh [2006]). Gupta et al. ([1996]) proposed a zero-adjusted generalized Poisson distribution and studied the effect of not using an adjusted model for zero-inflation or deflation when the occurrence of zeroes differs from the anticipated one. Reviews of such models can be found in Ridout et al. ([1998]), Gschlößl and Czado ([2008]) and Ngatchou-Wandji and Paris ([2011]).
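The unified representation is easy to experiment with. The following sketch uses a Poisson(θ) base distribution (an illustrative choice, as are the function names and parameter values) and computes the variance-to-mean ratio for a zero-inflated and a zero-deflated case. For a Poisson base this ratio works out to 1 + ωθ.

```python
from math import exp, lgamma, log


def poisson_pmf(y: int, theta: float) -> float:
    return exp(-theta + y * log(theta) - lgamma(y + 1))


def zero_adjusted_pmf(y: int, omega: float, theta: float) -> float:
    """f(y; omega) = omega * I{y = 0} + (1 - omega) * f_Y(y) with a Poisson base."""
    return omega * (1.0 if y == 0 else 0.0) + (1 - omega) * poisson_pmf(y, theta)


def dispersion_ratio(omega: float, theta: float, terms: int = 200) -> float:
    """Variance-to-mean ratio computed from a truncated sum over the pmf."""
    probs = [zero_adjusted_pmf(y, omega, theta) for y in range(terms)]
    mean = sum(y * q for y, q in enumerate(probs))
    var = sum((y - mean) ** 2 * q for y, q in enumerate(probs))
    return var / mean


theta = 2.0                                   # hypothetical Poisson mean
# omega < 0 is admissible only while omega + (1 - omega) * f_Y(0) stays non-negative
for omega in (0.3, 0.0, -0.1):
    print(omega, dispersion_ratio(omega, theta), 1 + omega * theta)
```

Zero inflation (ω > 0) gives a ratio above one and zero deflation (ω < 0) a ratio below one. A hurdle model would require a base distribution with no mass at zero, which this sketch does not cover.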
3. Over- or under-dispersion in observational study contexts: the effect of the method of ascertainment
Often, in connection with data collection based on observation or on recording values as produced by nature, the original distribution may not be reproduced due to various reasons. These may lead to partial destruction or partial enhancement (augmentation) of observations. The models that have been introduced to deal with such situations are respectively known as damage models, introduced by Rao ([1963]), and generating models, introduced by Panaretos ([1983]). The distortion mechanism is usually assumed to be manifested through the conditional distribution of the resulting random variable Y given the value of the original random variable X. Hence, the resulting (observed) distribution is a distorted version of the original distribution that can be represented as a mixture of the distortion mechanism. In particular, in the case of damage, $P(Y=r)=\sum_{n=r}^{\infty}P(Y=r\mid X=n)P(X=n),\ r=0,1,2,\dots$, while, in the case of enhancement, $P(Y=r)=\sum_{n=1}^{r}P(Y=r\mid X=n)P(X=n),\ r=1,2,\dots$.
Various forms of distributions have been considered for the distortion mechanism in the above two cases. In the case of damage, the most popular forms have been the binomial distribution (Rao [1963]) and mixtures on p of the binomial distribution (e.g. Panaretos [1982]; Xekalaki and Panaretos [1983]) whenever damage can be regarded as additive (Y = X − U, U independent of Y), or forms based on the uniform distribution in (0, x) (e.g. Dimaki and Xekalaki [1990], [1996]; Xekalaki [1984b]) whenever damage can be regarded as multiplicative (Y = [RX], R independent of X and uniformly distributed in (0, 1)). The latter case has also been considered in the context of continuous distributions by Krishnaji ([1970]). The generating model was introduced and studied by Panaretos ([1983]).
Both the generating model and the damage model offer a perceptive approach in actuarial contexts where one is interested in modelling the distributions of the numbers of accidents, of the damage claims, and of the claimed amounts. These models become relevant due to the fact that people in general have a tendency to under-report their accidents, so that the reported (observed) number Y is less than or equal to the actual number X (Y ≤ X), but tend to over-report damages incurred by them, so that the reported damage Y is greater than or equal to the true damage X (Y ≥ X).
Another type of distortion is induced by the adoption of a sampling scheme that assigns to the units in the original distribution unequal probabilities of inclusion in the sample. As a result, the value x of X is observed with a frequency that noticeably differs from that anticipated under the original density function f_{ X }(x; θ). It represents an observation on a random variable Y whose probability distribution is the result of adjusting the probabilities of the anticipated distribution through weighting them with the probability with which the value x of X is included in the sample. So, if this probability is proportional to some weight function w(x; β), β ∈ R, the recorded value x is a value of Y having density function f_{ Y }(x; θ, β) = w(x; β)f_{ X }(x; θ)/E(w(X; β)).
Distributions of this type are known as weighted distributions (see, e.g. Cox [1962]; Fisher [1934]; Patil and Ord [1976]; Rao [1985]). For w(x; β) = x, these are known as size biased distributions. In actuarial data modelling contexts again, the weight function can represent reporting bias. In the context of reporting accidents or placing damage claims, for example, it can have a value that is directly or inversely analogous to the size x of X, the actual number of incurred accidents or the actual size of the incurred damage. The functions w(x; β) = x and w(x; β) = β^{x} (β > 1 or β < 1) are plausible choices. So, for example, in the case of a Poisson (θ) distributed X, these lead to distributions for Y that are of Poisson type. In particular, the weight function w(x; β) = x leads to a shifted Poisson distribution with probability function P(Y = x) = e^{− θ}θ^{x − 1}/(x − 1) !, x = 1, 2, …, while the choice w(x; β) = β^{x} leads to a Poisson distribution P(Y = x) = e^{− θβ}(θβ)^{x}/x !, x = 0, 1, …. Under the first assumption for w(x; β), the mean of the observed variable Y is 1 + θ and exceeds that of X, while its variance remains θ; under the second assumption the variance is θβ, implying overdispersion relative to X for β > 1 or underdispersion for β < 1.
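The damage-model sum can be evaluated directly. The sketch below takes a Poisson(θ) original distribution and a binomial survival mechanism with retention probability p, both illustrative choices with hypothetical values, and compares the observed distribution with a Poisson(θp) distribution, which is what the well-known thinning property of the Poisson law predicts.

```python
from math import comb, exp, lgamma, log


def poisson_pmf(n: int, theta: float) -> float:
    return exp(-theta + n * log(theta) - lgamma(n + 1))


def binomial_pmf(r: int, n: int, p: float) -> float:
    """P(Y = r | X = n) when each of the n original units survives with probability p."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


def observed_pmf(r: int, theta: float, p: float, terms: int = 200) -> float:
    """Truncated version of sum_{n >= r} P(Y = r | X = n) P(X = n)."""
    return sum(binomial_pmf(r, n, p) * poisson_pmf(n, theta)
               for n in range(r, r + terms))


theta, p = 4.0, 0.6                           # hypothetical values
max_gap = max(abs(observed_pmf(r, theta, p) - poisson_pmf(r, theta * p))
              for r in range(12))
print(max_gap)
```

Other survival mechanisms can be explored by swapping `binomial_pmf` for a different conditional distribution.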
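The two weighted Poisson forms quoted above can be confirmed numerically from the general definition w(x; β)f_X(x; θ)/E(w(X; β)). This is a minimal sketch with hypothetical values of θ and β and a truncated support.

```python
from math import exp, lgamma, log


def poisson_pmf(x: int, theta: float) -> float:
    return exp(-theta + x * log(theta) - lgamma(x + 1))


def weighted_pmf(weight, base_pmf, size: int) -> list:
    """Normalize w(x) f_X(x) over the truncated support 0, 1, ..., size - 1."""
    raw = [weight(x) * base_pmf(x) for x in range(size)]
    total = sum(raw)                          # approximates E(w(X))
    return [value / total for value in raw]


theta, beta, size = 2.5, 1.4, 150             # hypothetical values
base = lambda x: poisson_pmf(x, theta)

size_biased = weighted_pmf(lambda x: x, base, size)
power_weighted = weighted_pmf(lambda x: beta**x, base, size)

# w(x) = x should give the Poisson shifted by one; w(x) = beta^x a Poisson(theta * beta)
gap_shifted = max(abs(size_biased[x] - poisson_pmf(x - 1, theta)) for x in range(1, 40))
gap_scaled = max(abs(power_weighted[x] - poisson_pmf(x, theta * beta)) for x in range(40))
print(gap_shifted, gap_scaled)
```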
4. Looking closer into the case of heterogeneity
Assuming a specific form for the distribution of the population that generated a data set implies that the mean-to-variance relation is fixed by this distribution, e.g. the Poisson distribution has a mean-to-variance ratio equal to unity. As has become obvious from the above, this relationship rarely holds in real data sets. Flexible families have therefore been sought in the literature by allowing the parameter θ of the original distribution to vary according to a distribution with probability density function, say g(⋅).
As mentioned before, a density function f_{ X }(⋅) is a mixture on the parameter θ of the distribution function f(⋅ ; θ) with some mixing distribution G_{ θ }(⋅), which can be continuous, discrete or a finite step distribution, if it can be written in the form f_{ X }(x) = E_{ G }(f(x; θ)) = ∫_{ Θ } f(x; θ)dG(θ), where Θ is the parameter space. An appropriate choice of a mixing distribution allows its parameter to vary and acts as a means of “loosening” the structure of the initial model, thus offering more realistic interpretations of the mechanisms that generated the data.
A large number of Poisson mixtures have been developed. (For an extensive review, see Karlis and Xekalaki [2003], [2005]). The derivation of the negative binomial distribution as a mixture of the Poisson distribution with a gamma distribution as the mixing distribution, originally obtained by Greenwood and Yule ([1920]), constitutes a typical example. Mixtures of the negative binomial distribution have also been widely used in connection with applications in a plethora of fields. These include the Yule distribution (Yule [1924]; Irwin [1941]; Xekalaki [1983c], [1984b]), the Waring distribution (Irwin [1963]) and the generalized Waring distribution (Irwin [1968], [1975]; Xekalaki [1981], [1983a], [1984a]), which contains the Yule distribution and the Waring distribution as special cases.
In what follows, we focus on the generalized Waring distribution and its relevance in accident data modeling contexts.
5. The generalized Waring distribution
The generalized Waring distribution was introduced by Irwin ([1968]) in connection with biological data and was later shown by him to arise as an accident distribution (Irwin [1975]). It is the distribution with probability generating function given by
$G(s)=\frac{\rho_{(k)}}{(a+\rho)_{(k)}}\,{}_{2}F_{1}(a,k;a+k+\rho;s)$,
with _{2}F_{1}(a, b; c; z) denoting the Gauss hypergeometric function $\sum_{r=0}^{\infty}\left\{a_{(r)}b_{(r)}z^{r}\right\}/\left\{c_{(r)}r!\right\}$, where h_{(l)} = Γ(h + l)/Γ(h), h > 0, l ∈ R.
Irwin’s starting point was Waring’s expansion (hence the distribution’s name) given by $\frac{1}{x-a}=\sum_{r=0}^{\infty}\frac{a_{(r)}}{x_{(r+1)}}$, which he then generalized to $\frac{1}{(x-a)_{(k)}}=\sum_{r=0}^{\infty}\frac{a_{(r)}k_{(r)}}{x_{(k+r)}}\frac{1}{r!},\ a,k>0$.
Hence, by multiplying both sides by ρ_{(k)}, where ρ = x − a > 0, the successive terms of the resulting series could be regarded as defining a probability function, which he termed the generalized Waring distribution with parameters a, k, ρ. In particular, the probability function of the generalized Waring distribution with parameters a, k, ρ is given by
$P(X=r)=\frac{\rho_{(k)}}{(a+\rho)_{(k)}}\frac{a_{(r)}k_{(r)}}{(a+k+\rho)_{(r)}}\frac{1}{r!},\ r=0,1,2,\dots$,
where h_{(l)} = Γ(h + l)/Γ(h).
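For computation, the probability function obtained from the generalized expansion above (multiply by ρ_{(k)} and set x = a + ρ) is conveniently evaluated on the log scale. The sketch below does this with log-gamma functions. The closed forms used for comparison, mean ak/(ρ − 1) for ρ > 1 and variance ak(a + ρ − 1)(k + ρ − 1)/{(ρ − 1)^{2}(ρ − 2)} for ρ > 2, are standard properties of the distribution that are not stated in the text above, and the parameter values are hypothetical.

```python
from math import exp, lgamma


def log_rising(h: float, l: float) -> float:
    """log of h_(l) = Gamma(h + l) / Gamma(h)."""
    return lgamma(h + l) - lgamma(h)


def ugwd_pmf(x: int, a: float, k: float, rho: float) -> float:
    """rho_(k) a_(x) k_(x) / {(a + rho)_(k) (a + k + rho)_(x) x!}."""
    log_p = (log_rising(rho, k) - log_rising(a + rho, k)
             + log_rising(a, x) + log_rising(k, x)
             - log_rising(a + k + rho, x) - lgamma(x + 1))
    return exp(log_p)


def ugwd_moments(a: float, k: float, rho: float, terms: int = 20000):
    """Truncated sums; the tail decays like x^-(rho + 1), so rho should not be small."""
    probs = [ugwd_pmf(x, a, k, rho) for x in range(terms)]
    total = sum(probs)
    mean = sum(x * q for x, q in enumerate(probs))
    var = sum((x - mean) ** 2 * q for x, q in enumerate(probs))
    return total, mean, var


a, k, rho = 2.0, 3.0, 6.0                     # hypothetical parameter values
total, mean, var = ugwd_moments(a, k, rho)
print(total, mean, a * k / (rho - 1))
print(var, a * k * (a + rho - 1) * (k + rho - 1) / ((rho - 1) ** 2 * (rho - 2)))
```

Because the tail is polynomial rather than geometric, truncated sums converge slowly when ρ is close to the smallest value for which the moment in question exists.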
Notwithstanding the complexity of its structure, this distribution was shown to offer an insightful tool in the interpretation of accident data as will be seen below. Among its aspects that can be of practical value is that, as shown by Xekalaki ([1983b]), it is a discrete self-decomposable distribution in Steutel and van Harn’s ([1979]) sense, hence infinitely divisible, implying that its probability generating function can be put in the form $G(s)=\exp\left\{-\lambda\int_{s}^{1}\frac{1-g(u)}{1-u}\,du\right\}$, where λ = p_{1}/p_{0} and g(⋅) denotes the probability generating function of the distribution with probability function satisfying the recurrence relation
6. The generalized Waring distribution in relation to accident theory
The hypotheses that have formed the basis of investigations into the occurrence of accidents for almost a century are

(i) Pure chance, giving rise to the Poisson distribution.
(ii) True contagion, i.e. the hypothesis that initially all individuals have the same probability of incurring an accident but that this probability is modified by each accident sustained.
(iii) Apparent contagion (heterogeneity), i.e. the hypothesis that individuals have constant but unequal probabilities of having an accident, the resultant distribution being a compound Poisson distribution (“accident proneness” model).
(iv) The “Spells” model, i.e. each person is liable to periods of time during which the person’s performance is weak (spells). All of the person’s accidents occur within those spells. The numbers of accidents within different spells are independent and independent of the number of spells.
As already seen, the negative binomial distribution can be given both an accident proneness and a “spells” interpretation in the context of accident theory, in terms of a gamma mixed Poisson distribution and a Poisson distribution generalized by a logarithmic distribution (Kemp [1967]), respectively.
Therefore, a good fit of the negative binomial is no help at all in distinguishing among the “proneness”, “contagion” and “spells” hypotheses. This is known as the discrimination problem between the compounded, contagion and generalized models for the negative binomial distribution and has been discussed by Arbous and Kerrich ([1951]); Bates and Neyman ([1952]); Gurland ([1959]) and Cane ([1974], [1977]). For an extensive bibliography on the accident hypotheses mentioned, see Kemp ([1970]).
6.1 Irwin’s “Proneness” model
As is evident, in all three of the above models, the data are treated as if the individuals under observation were exposed to equal environmental risk, a fact criticized by Irwin ([1968]), who suggested a three-parameter distribution, which he called the “univariate generalized Waring distribution” (UGWD). He derived this distribution in a framework that allows separately for random factors, differences in the exposure of individuals to external risk of accident, and differences in proneness.
In particular, his model assumes a non homogeneous population with respect to personal and environmental attributes affecting the occurrence of accidents.
Let the distribution of the number, X, of accidents for individuals of equal proneness ν, and of equal exposure to external risk of accident λ | ν (i.e. λ for given ν), have probability generating function
in a unit time interval (0, 1). If the distributions of λ | ν and ν in the population at risk can be described by the probability density functions (pdf)
and
respectively, the pgf of the resulting distribution of accidents will be ρ_{(k)} _{2}F_{1}(a, k; a + k + ρ; s)/(a + ρ)_{(k)}, i.e. the univariate generalized Waring distribution with parameters a, k and ρ, which will be denoted by UGWD(a, k; ρ). Here, _{2}F_{1}(a, b; c; z) denotes the Gauss hypergeometric function $\sum_{r=0}^{\infty}\left\{a_{(r)}b_{(r)}z^{r}\right\}/\left\{c_{(r)}r!\right\}$, where h_{(l)} = Γ(h + l)/Γ(h), h > 0, l ∈ R. For more information about the UGWD the reader is referred to the work of Irwin ([1963], [1968], [1975]); Xekalaki ([1981]) and the references therein and Xekalaki ([1983a]).
6.2 The “Contagion” model
Xekalaki ([1983a]), extended the assumptions of the classical contagion model developed by Greenwood and Yule ([1920]) by considering a population of individuals exposed to varying accident risk.
In particular, assume that at time t = 0 none of the individuals has had an accident. This would be true if, for example, with a population of new drivers or of individuals just beginning a new type of work. Suppose that during the time period from t to t + dt a person with x accidents by time t can incur another accident with a probability of {(k + x)/(1 + λt)}λdt (independent of the times of the previous accidents), where k is a positive constant and λ refers to the individual’s risk exposure. At t = 0, since x = 0, the probability of an accident is kλdt. Hence, what the model basically assumes is that, initially, the probability of having an accident is not the same for each individual, but depends on the external conditions; later, the probability is also affected by the number of preceding accidents. Under these assumptions and if differences in the exposure to accident risk can be thought of as governed by a distribution with probability density function given by {Γ(a + ρ)v^{a − 1}(1 + ν)^{− (a + ρ)}}/{Γ(ρ)Γ(a)}, the final distribution of accidents over a unit period of time turns out to be UGWD(a, k; ρ).
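The final mixing step of this derivation can be checked numerically. The sketch below rests on an assumption that the text does not derive: conditional on the exposure λ, the birth process with rate (k + x)λ/(1 + λt) is taken to give, at t = 1, a negative binomial count with index k and success probability 1/(1 + λ) (the Pólya process result). This conditional distribution is then integrated over the exposure density quoted above, using the substitution b = λ/(1 + λ), and compared with the UGWD(a, k; ρ) probabilities. Function names and parameter values are hypothetical.

```python
from math import exp, lgamma, log


def log_rising(h: float, l: float) -> float:
    """log of h_(l) = Gamma(h + l) / Gamma(h)."""
    return lgamma(h + l) - lgamma(h)


def ugwd_pmf(x: int, a: float, k: float, rho: float) -> float:
    return exp(log_rising(rho, k) - log_rising(a + rho, k)
               + log_rising(a, x) + log_rising(k, x)
               - log_rising(a + k + rho, x) - lgamma(x + 1))


def mixed_contagion_pmf(x: int, a: float, k: float, rho: float, steps: int = 20000) -> float:
    """Midpoint rule over b = lam / (1 + lam), which follows a Beta(a, rho) law.

    Assumed conditional law given lam: negative binomial, C(k+x-1, x) b^x (1-b)^k.
    """
    log_const = (lgamma(k + x) - lgamma(k) - lgamma(x + 1)
                 + lgamma(a + rho) - lgamma(a) - lgamma(rho))
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        b = (i + 0.5) * h
        total += exp(log_const + (x + a - 1) * log(b) + (k + rho - 1) * log(1 - b))
    return h * total


a, k, rho = 2.0, 3.0, 6.0                     # hypothetical parameter values
max_gap = max(abs(mixed_contagion_pmf(x, a, k, rho) - ugwd_pmf(x, a, k, rho))
              for x in range(8))
print(max_gap)
```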
The above derivation of the generalized Waring distribution closely relates to a modeling approach whereby the distribution of accident occurrences in a time interval (0, t) is regarded as underpinned by a stochastic process and, in particular, by a pure birth process {X_{ t }, t = 0, 1, 2, …} where the probability that a person incurs an accident in (t, t + δt), having had x accidents by time t, is P(X_{t + δt} = x + 1 | X_{ t } = x) = f_{ λ }(x, t)δt + o(δt).
Irwin ([1941]), followed later by Arbous and Kerrich ([1951]), derived the negative binomial distribution on this hypothesis by solving the associated Kolmogorov forward differential equations by a method due to McKendrick ([1925]). Specifically, assuming that during the time period from t to t + dt individuals can have 0 accidents with probability 1 − f_{ λ }(x, t)dt, 1 accident with probability f_{ λ }(x, t)dt and > 1 accidents with probability 0, he solved the resulting system of Kolmogorov forward difference-differential equations
in terms of a single difference-differential equation involving the probability generating function G_{ λ }(s; t) of X_{ t } given by
where ${G}_{\lambda}\left(s;t\right)={\displaystyle \sum _{x=0}^{\infty}{P}_{\lambda}\left(x,t\right){s}^{x}}$. (He obtained this equation by multiplying the ith equation of the system by s^{i − 1}, i = 1, 2, … and summing the resulting equations).
Assuming further that f_{ λ }(x, t) = λ(k + mx), k, m > 0 and subject to the initial conditions G_{ λ }(1; t) = G_{ λ }(s; 0) = 1, he obtained for the distribution of accidents
i.e. the probability generating function of the negative binomial distribution with parameters k/m and (1 − e^{− λmt})^{− 1}.
Relaxing Irwin’s implicit assumption that all individuals were exposed to the same accident risk, Xekalaki ([1981]) treated the parameter λ as referring to a variable risk exposure according to an exponential distribution with density ae^{− aλ}, a > 0 and obtained the generalized Waring distribution as the accident distribution. In particular,
which is the probability generating function of the $\mathit{UGWD}\left(\frac{k}{m},1;\frac{a}{\mathit{mt}}\right)$.
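The mixing step can be checked numerically. The sketch below assumes that, for fixed λ, the negative binomial arising from the rate f_{ λ }(x, t) = λ(k + mx) has index k/m and zero-class probability e^{− λmt}k/m-th power, i.e. "success" probability u = e^{− λmt}; under an exponential density ae^{− aλ} for λ, u then has density ρu^{ρ − 1} on (0, 1) with ρ = a/(mt). The function names, the midpoint integration rule and the parameter values are illustrative choices only.

```python
import math

def nb_pmf(x, r, p):
    """Negative binomial pmf: Gamma(r + x) / (Gamma(r) x!) * p^r * (1 - p)^x."""
    log_c = math.lgamma(r + x) - math.lgamma(r) - math.lgamma(x + 1)
    return math.exp(log_c + r * math.log(p) + x * math.log1p(-p))

def ugwd_pmf(x, a, k, rho):
    """UGWD(a, k; rho) pmf: rho_(k) a_(x) k_(x) / ((a + rho)_(k) (a + k + rho)_(x) x!)."""
    lg = math.lgamma
    log_p = (lg(rho + k) - lg(rho) - lg(a + rho + k) + lg(a + rho)
             + lg(a + x) - lg(a) + lg(k + x) - lg(k)
             - lg(a + k + rho + x) + lg(a + k + rho) - lg(x + 1))
    return math.exp(log_p)

def mixed_pmf(x, k, m, a, t, steps=20000):
    """Midpoint rule for the exponential mixture, after substituting u = exp(-lambda m t)."""
    rho = a / (m * t)
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        total += rho * u ** (rho - 1.0) * nb_pmf(x, k / m, u)
    return total * h

# Hypothetical values: k = 2, m = 1, a = 3, t = 1, so the target is UGWD(2, 1; 3).
k, m, a, t = 2.0, 1.0, 3.0, 1.0
pairs = [(mixed_pmf(x, k, m, a, t), ugwd_pmf(x, k / m, 1.0, a / (m * t))) for x in range(6)]
```

Each pair in `pairs` holds the numerically mixed probability and the corresponding UGWD(k/m, 1; a/(mt)) probability, which should agree up to the integration error.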
This model was considered by Panaretos ([1989]) for the description of the evolution of surnames. Faddy ([1997]) provided a unifying approach to under- and overdispersion relative to the Poisson distribution within a scheme of a similar nature, which generalizes the simple Poisson process that underpins the Poisson distribution. He demonstrated that any count distribution can be obtained by a suitable choice of f_{ λ }(x, t) and provided an expression for the system of Kolmogorov forward differential equations in terms of a matrix-exponential function.
Finally, Winkelmann ([1995]) looked at under- and overdispersion using renewal theory by exploring the link between duration dependence and dispersion. He demonstrated that discrepancies between observed and nominal variances are conveyed by a hazard function of the waiting times that is not constant, but instead is a decreasing function of time inducing overdispersion or an increasing function of time inducing underdispersion.
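The link between duration dependence and dispersion can be illustrated by simulation. The sketch below uses gamma waiting times with unit mean as one convenient choice (a shape below 1 gives a decreasing hazard, a shape above 1 an increasing hazard, and shape 1 the exponential case); the horizon, the number of replications and the seed are arbitrary illustrative settings, and the code is not a reproduction of Winkelmann's own analysis.

```python
import random

def count_events(shape, horizon, rng):
    """Number of renewals in (0, horizon] with gamma(shape) waiting times of mean 1."""
    t, n = 0.0, 0
    while True:
        t += rng.gammavariate(shape, 1.0 / shape)  # scale 1/shape gives mean 1
        if t > horizon:
            return n
        n += 1

def dispersion_index(shape, horizon=20.0, reps=4000, seed=1):
    """Sample variance-to-mean ratio of the simulated counts."""
    rng = random.Random(seed)
    counts = [count_events(shape, horizon, rng) for _ in range(reps)]
    mean = sum(counts) / reps
    var = sum((c - mean) ** 2 for c in counts) / (reps - 1)
    return var / mean

d_decreasing = dispersion_index(0.5)  # decreasing hazard
d_constant = dispersion_index(1.0)    # constant hazard (Poisson counts)
d_increasing = dispersion_index(2.0)  # increasing hazard
```

Under these settings the index is expected to lie clearly above 1 for the decreasing hazard, near 1 for the constant hazard and clearly below 1 for the increasing hazard.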
6.3 The “Spells” model
Further, Xekalaki ([1983a]) considered a variant of the “spells” model due to Cresswell and Froggatt ([1963]) that rejects the presence of proneness and contagion.
Assume that every individual is liable to spells and that the number of spells in a given time period (0, t) is a Poisson variable with parameter θt, θ > 0. Suppose that no accidents occur outside spells and that the probability of an accident within a spell depends on the risk exposure of the particular individual. In particular, suppose that within a spell a person can have 0 accidents with probability 1 − m log(1 + λ) and n accidents (n ≥ 1) with probability (m/n){λ/(1 + λ)}^{n},
0 < m < 1/log(1 + λ), λ > 0, where λ is the external risk parameter for the given individual. Assume further that the numbers of accidents arising out of different spells are independent and independent of the number of spells. Then, if differences in the risk exposure can be described by a beta distribution of the second kind with probability density function {Γ(a + ρ)ν^{a − 1}(1 + ν)^{− (a + ρ)}}/{Γ(ρ)Γ(a)}, a, ρ > 0, the resulting accident distribution will have probability generating function given by
Hence, in a unit time period, the number of accidents follows the UGWD(a, θm; ρ).
It is worth noticing that the form of the distribution of λ in the last two models is more general than that considered by the proneness model. It is, however, a reasonable choice as it implies a beta distribution of the first kind (Pearson Type I) for the parameter q = λ/(1 + λ) of the negative binomial distribution of X | λ.
6.4 Deciding about the underlying model
It is evident from the above that three completely different sets of hypotheses give rise to exactly the same form of distribution and that, while the UGWD may be a plausible model if accident proneness is accepted as an established fact, a satisfactory fit of it is not to be taken as evidence for the validity of the proneness hypothesis. How can we then discriminate?
Statisticians have always been excited to look for ways of discriminating among different models that give rise to the same distribution. Most attempts seem to have been concentrated on distinguishing between the proneness and contagion models generating the negative binomial distribution. The papers by Bates and Neyman ([1952]) and Bates ([1955]) cover part of the work that has been done on the subject, though they primarily focus on distinguishing between different forms of contagion. Shaw and Sichel’s ([1971]) attempt was on proving or disproving proneness by ranking individual accident performance on a scale based on their average interval between successive accidents. However, the first systematic study on how one can discriminate between the proneness and contagion models of the negative binomial distribution appears to be that by Cane ([1974]).
She demonstrated, however, that one cannot distinguish between the two models, even with knowledge of the time sequence of accidents. In particular, she showed that the conditional distribution of the times t_{ i }, i = 1, 2, …, n, at which accidents occurred in a time period (0, T) is the same in both cases, namely that of an ordered sample from a uniform distribution over (0, T) with probability density function n! T^{− n}. In fact, this is the case for any compound Poisson accident distribution whose compounding distribution has finite moments (Cane [1977]), hence also for the UGWD(a, k; ρ).
This implies that the availability of information on the times of the occurrence of accidents is not sufficient to guide one’s choice between the proneness and contagion models.
However, as demonstrated by Xekalaki ([1983a]), there appears to exist a possibility in the framework of the Spells model. Consider, in particular, the problem of finding the joint distribution of the times t_{ i }, i = 1, 2, …, n of accidents by individuals with n accidents in a unit period of time under the spells model. For fixed λ, accidents occur as events in a generalized Poisson process: $X\left(t\right)=\sum_{i=1}^{N\left(t\right)}Y_{i},\quad N\left(t\right)\sim \mathit{Poisson}\left(\theta t\right)$, where θ > 0, t ≥ 0 and Y_{ i } are identically and independently distributed with probability density function given by {Γ(a + ρ)ν^{a − 1}(1 + ν)^{− (a + ρ)}}/{Γ(ρ)Γ(a)}, a, ρ > 0. Consequently, the required probability function can be written as $\int_{0}^{\infty}\left(1+\lambda\right)^{-\theta m\left(1-t_{n}\right)}\left[\prod_{i=1}^{n}\left\{\lambda m\theta\left(1+\lambda\right)^{-\theta m\left(t_{i}-t_{i-1}\right)-1}dt_{i}\right\}\right]dH\left(\lambda\right)$ (with t_{0} = 0), with H(⋅) denoting the distribution function of the beta distribution of the second kind defined as above. Hence, the required probability is $\left\{\frac{\left(\theta m\right)^{n}\rho_{\left(a\right)}a_{\left(n\right)}}{\left(\theta m+\rho\right)_{\left(a+n\right)}}\right\}dt_{1}\dots dt_{n}$. Therefore, conditional on n accidents during a time period from 0 to 1, the joint pdf of t_{ i }, i = 1, 2, …, n, is n! (θm)^{n}/(θm)_{(n)}.
The obtained form differs from that arising under the proneness and contagion models. This fact is in itself very interesting as far as establishing the presence of spells is concerned, as it implies the following: if an observed accident distribution of the UGWD type has arisen from the spells model, the time intervals (0, t_{ i }), i = 1, 2, …, n, given a total of n accidents, will be jointly distributed with the above density function. Any departure from this distribution is, then, evidence against the spells model. Of course, if on the available evidence one has to reject this form in favor of that obtained by Cane, then one is faced again with the question: “proneness or contagion?” This cannot be answered by studying the distribution of t_{ i }.
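The last step of the spells-model argument is a ratio of two closed forms, and can be checked numerically. The sketch below divides the constant (θm)^{n}ρ_{(a)}a_{(n)}/(θm + ρ)_{(a + n)} by the UGWD(a, θm; ρ) probability of n accidents and compares the result with n!(θm)^{n}/(θm)_{(n)}. The function names and the parameter values are hypothetical, and `theta_m` stands for the product θm.

```python
import math

def log_rising(h, l):
    """log of the ascending factorial h_(l) = Gamma(h + l) / Gamma(h)."""
    return math.lgamma(h + l) - math.lgamma(h)

def joint_constant(n, a, rho, theta_m):
    """(theta m)^n rho_(a) a_(n) / (theta m + rho)_(a + n)."""
    return math.exp(n * math.log(theta_m) + log_rising(rho, a)
                    + log_rising(a, n) - log_rising(theta_m + rho, a + n))

def ugwd_pmf(n, a, k, rho):
    """UGWD(a, k; rho) probability of n accidents."""
    return math.exp(log_rising(rho, k) - log_rising(a + rho, k)
                    + log_rising(a, n) + log_rising(k, n)
                    - log_rising(a + k + rho, n) - math.lgamma(n + 1))

def conditional_density(n, a, rho, theta_m):
    """Joint constant divided by P(n accidents) under UGWD(a, theta m; rho)."""
    return joint_constant(n, a, rho, theta_m) / ugwd_pmf(n, a, theta_m, rho)

def stated_form(n, theta_m):
    """n! (theta m)^n / (theta m)_(n)."""
    return math.exp(math.lgamma(n + 1) + n * math.log(theta_m) - log_rising(theta_m, n))

# Hypothetical parameter values for illustration.
a, rho, theta_m = 1.7, 3.2, 0.8
checks = [(conditional_density(n, a, rho, theta_m), stated_form(n, theta_m)) for n in range(1, 7)]
```

For n ≥ 2 the value n!(θm)^{n}/(θm)_{(n)} is strictly smaller than n!, the constant density of an ordered uniform sample on (0, 1), which is the sense in which the two forms differ.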
6.5 What does Irwin’s accident model offer beyond a good fit to the data?
The innovation brought by Irwin’s accident proneness model does not merely lie in the better fit it provides to accident data, but in the possibility of partitioning the total variance (σ^{2}) into three additive components due to proneness $\left({\sigma}_{\nu}^{2}\right)$, liability $\left({\sigma}_{\lambda}^{2}\right)$ and randomness $\left({\sigma}_{R}^{2}\right)$ thus,
where
There is still, however, a problem due to the fact that the UGWD(a, k; ρ) is symmetrical in a and k (UGWD(a, k; ρ) ∼ UGWD(k, a; ρ)). Hence, although one may consider that ${\sigma}_{\lambda}^{2}+{k}^{2}{\sigma}_{\nu}^{2}$ represents the variance component due to all nonrandom factors, the mathematics alone cannot determine whether ${\sigma}_{\lambda}^{2}$ represents the liability component and ${k}^{2}{\sigma}_{\nu}^{2}$ the proneness component or vice versa. As a consequence, distinguishable estimates for the nonrandom variance components ${\sigma}_{\lambda}^{2}$ and ${\sigma}_{\nu}^{2}$ cannot be obtained unless subjective judgement is made. This problem was addressed by Xekalaki ([1984a]) with the introduction of her bivariate form of the generalized Waring distribution.
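The identifiability problem can be made concrete with a small calculation. The sketch below assumes the hierarchy of the proneness model (Poisson counts given liability λ, gamma liability with shape k and scale ν given proneness ν, and a beta distribution of the second kind with parameters a and ρ for ν) and obtains the three components from the law of total variance; these expressions are derived here under that assumption rather than copied from the original display, and they require ρ > 2. Function names and parameter values are illustrative.

```python
def beta2_moments(a, rho):
    """Mean and variance of a beta distribution of the second kind (needs rho > 2)."""
    mean = a / (rho - 1.0)
    var = a * (a + rho - 1.0) / ((rho - 1.0) ** 2 * (rho - 2.0))
    return mean, var

def variance_components(a, k, rho):
    """Proneness, liability and random components under the assumed hierarchy."""
    mean_nu, var_nu = beta2_moments(a, rho)
    second_moment_nu = var_nu + mean_nu ** 2
    random_part = k * mean_nu              # E[lambda]
    liability_part = k * second_moment_nu  # E[Var(lambda | nu)] = k E[nu^2]
    proneness_part = k ** 2 * var_nu       # Var(E[lambda | nu]) = k^2 Var(nu)
    return proneness_part, liability_part, random_part

def ugwd_variance(a, k, rho):
    """Variance of the UGWD(a, k; rho), symmetric in a and k."""
    return a * k * (a + rho - 1.0) * (k + rho - 1.0) / ((rho - 1.0) ** 2 * (rho - 2.0))

# Hypothetical fitted values; the fit cannot tell (a, k) from (k, a).
a, k, rho = 1.2, 3.5, 5.0
split_ak = variance_components(a, k, rho)
split_ka = variance_components(k, a, rho)
```

Both labelings reproduce the same total variance and the same random component, but they divide the non-random part differently between proneness and liability, which is why the univariate fit alone cannot separate the two.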
7. The bivariate generalized Waring distribution
Generalizing further Irwin’s ([1963]) generalization of Waring’s expansion, we have for k, m, a > 0,
If x > a, the above series is convergent. Then, letting ρ = x − a > 0 and multiplying both sides by ρ_{(k + m)} leads to a double series of positive terms converging to unity. The general term of the series therefore can be regarded as defining a bivariate discrete probability distribution with probability function
In the remainder of the paper, we refer to this distribution as the bivariate generalized Waring distribution with parameters a, k, m and ρ and we denote it by BGWD(a; k, m; ρ).
7.1 The BGWD in relation to accident theory
Assume that individuals of proneness ν and liability λ_{ i } | ν for a period i of observation incur, over two nonoverlapping time periods, accidents X, Y according to a double Poisson distribution $G_{\left(X,Y\right)|\lambda_{1},\lambda_{2},\nu}\left(s,t\right)=\exp\left\{\left(\lambda_{1}|\nu\right)\left(s-1\right)+\left(\lambda_{2}|\nu\right)\left(t-1\right)\right\},\quad \lambda_{1},\lambda_{2}>0$. Assume further that the liability parameters λ_{1} | ν, λ_{2} | ν are independently gamma distributed with densities $\left\{\Gamma\left(\theta_{i}\right)\nu^{\theta_{i}}\right\}^{-1}e^{-\lambda_{i}/\nu}\lambda_{i}^{\theta_{i}-1},\quad \theta_{1}\equiv k,\quad \theta_{2}\equiv m,\quad \nu>0$, whence for individuals with the same proneness ν, but varying liabilities, the numbers of occurring accidents over the two periods are jointly distributed as the double negative binomial with probability generating function
Letting now the proneness parameter ν be beta distributed with density function {Γ(a + ρ)ν^{a − 1}(1 + ν)^{− (a + ρ)}}/{Γ(ρ)Γ(a)}, a, ρ > 0, the probability generating function of the joint distribution of accidents over the two periods takes the form
where $F_{1}\left(a;b,c;d;u,v\right)=\sum_{r,s=0}^{\infty}\left\{a_{\left(r+s\right)}b_{\left(r\right)}c_{\left(s\right)}u^{r}v^{s}\right\}/\left\{d_{\left(r+s\right)}r!s!\right\}$ is Appell’s hypergeometric series and h_{(l)} = Γ(h + l)/Γ(h), h > 0, l ∈ R.
Regarding separate estimation of the contribution of proneness, liability and randomness in a given accident situation over a period of observation whenever proneness is accepted as an established fact, Xekalaki ([1984a]) showed that rearranging the observed distribution in two nonoverlapping subintervals and fitting the BGWD(a; k, m; ρ) to the resulting bivariate accident distribution does enable separate estimation of the variance components. This is demonstrated in Table 1.
Further models leading to the BGWD, provided by Xekalaki ([1984c]), furnish the framework within which one can also obtain the BGWD as an accident distribution under the contagion and the spells accident theories.
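A brief numerical sketch may help fix ideas about the BGWD(a; k, m; ρ). The probability function used below is the one obtained by mixing the double negative binomial of the proneness model over a beta distribution of the second kind for ν, namely ρ_{(k + m)}a_{(x + y)}k_{(x)}m_{(y)}/{(a + ρ)_{(k + m)}(a + k + m + ρ)_{(x + y)}x!y!}; it is written out here as a working assumption, and the function names, truncation limits and parameter values are illustrative only.

```python
import math

def log_rising(h, l):
    """log of the ascending factorial h_(l) = Gamma(h + l) / Gamma(h)."""
    return math.lgamma(h + l) - math.lgamma(h)

def bgwd_pmf(x, y, a, k, m, rho):
    """Assumed BGWD(a; k, m; rho) probability of (x, y) accidents in the two periods."""
    log_p = (log_rising(rho, k + m) - log_rising(a + rho, k + m)
             + log_rising(a, x + y) + log_rising(k, x) + log_rising(m, y)
             - log_rising(a + k + m + rho, x + y)
             - math.lgamma(x + 1) - math.lgamma(y + 1))
    return math.exp(log_p)

def ugwd_pmf(x, a, k, rho):
    """UGWD(a, k; rho) probability of x accidents."""
    return math.exp(log_rising(rho, k) - log_rising(a + rho, k)
                    + log_rising(a, x) + log_rising(k, x)
                    - log_rising(a + k + rho, x) - math.lgamma(x + 1))

# Hypothetical parameter values; rho is taken fairly large so that truncation is harmless.
a, k, m, rho = 1.5, 2.0, 0.7, 6.0
total_mass = sum(bgwd_pmf(x, y, a, k, m, rho) for x in range(150) for y in range(150))
marginal_at_3 = sum(bgwd_pmf(3, y, a, k, m, rho) for y in range(2000))
```

With these values the truncated double sum is close to 1, and the marginal probability of three accidents in the first period agrees with the UGWD(a, k; ρ) value, in line with the marginals being of the univariate generalized Waring form.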
8. The multivariate generalized Waring distribution
The n-variate version of the generalized Waring distribution introduced and studied by Xekalaki ([1986]) is also obtained as an inverse factorial distribution. Its probability generating function is given by
with $F_{D}\left(a;\beta_{1},\dots,\beta_{n};\gamma;\underline{t}\right)$ denoting Lauricella’s hypergeometric function given by
The probability function of it is given by
and its probabilities are related by the following first order recurrences, which facilitate their computation
An interesting aspect of the bivariate and multivariate versions of the generalized Waring distribution is that their marginal distributions (conditional and unconditional) as well as their convolution are of the same form (UGWDs), properties that exhibit a symmetry analogous to that existing in the case of the multivariate normal distribution. Further, the generalized Waring distribution is self-decomposable (Xekalaki [1983b]).
9. The Generalized Waring Process (gWp)
Looking into how temporally evolving data from the wide spectrum of application contexts that can reasonably be viewed from the perspective of the frameworks discussed in Sections 6, 7 and 8 can be treated, Xekalaki and Zografi ([2008]) defined and studied the generalized Waring process. In establishing its definition, the structural properties of both the bivariate and the multivariate versions of the generalized Waring distribution played a significant role. This process, analogously to the case of Poisson and Pólya processes, which can be obtained as limiting cases of it, was shown to be a Markov process.
Let {N(t), t ≥ 0} be a counting process. This is said to be a generalized Waring process with parameters a, k, ρ > 0, denoted by gWp(a, k; ρ), if (i) N(0) = 0, (ii) N(t) is a Markov process, and (iii) N(t + h) − N(t) has the generalized Waring distribution with parameters a, kh; ρ for h > 0, t ≥ 0. The process starts at 0, it has stationary increments and
i.e., N(t) has a generalized Waring distribution with parameters a, kt; ρ.
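Since N(t) follows the UGWD(a, kt; ρ), its mean is akt/(ρ − 1) for ρ > 1, i.e. linear in t. The sketch below checks this numerically by summing x P(N(t) = x) with the probabilities generated by their first-order recurrence; the function name, the truncation length and the parameter values are illustrative assumptions.

```python
import math

def ugwd_mean_from_pmf(a, k, rho, terms=20000):
    """Truncated sum of x * P(X = x) for X ~ UGWD(a, k; rho), using the pmf recurrence."""
    # P(X = 0) = rho_(k) / (a + rho)_(k)
    p = math.exp(math.lgamma(rho + k) - math.lgamma(rho)
                 - math.lgamma(a + rho + k) + math.lgamma(a + rho))
    mean = 0.0
    for x in range(terms):
        mean += x * p
        # P(X = x + 1) / P(X = x) = (a + x)(k + x) / ((a + k + rho + x)(x + 1))
        p *= (a + x) * (k + x) / ((a + k + rho + x) * (x + 1))
    return mean

# Hypothetical process parameters.
a, k, rho = 2.0, 0.8, 6.0
eta = a * k / (rho - 1.0)  # mean per unit time
means = {t: ugwd_mean_from_pmf(a, k * t, rho) for t in (0.5, 1.0, 2.0, 5.0)}
```

For each t in `means` the numerically summed mean should match eta * t up to truncation error.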
The transition probabilities of the generalized Waring process are given by
with the last equality indicating that the generalized Waring process is a nonhomogeneous Markov process. Its mean and variance are respectively
Note that since the generalized Waring process is a stationary process and its mean is of the form E[N(t)] = ηt, the above formula implies that its intensity is η = ak/(ρ − 1). Its variance can be split into three additive components, thus
with the liability and random components dependent on time. In particular,
9.1 The generalized Waring process in an accident proneness context
We consider a population which is inhomogeneous with respect to personal and environmental attributes affecting the occurrence of accidents. The terms “accident proneness” and “accident liability” are again used to refer respectively to a person’s predisposition to accidents and to a person’s exposure to external risk of accident, with the conditional distribution of the random variable λ given ν describing differences in external risk factors among individuals. Liability fluctuations over a time interval (t, t + h) depend on the length h of the interval and are described by a distribution for λ | ν with probability density function λ^{kh − 1}e^{− λ/(νh)}(νh)^{− kh}/Γ(kh). Allowing further the parameter ν to have a beta distribution of the second kind with parameters a and ρ and density function ϕ given by
ϕ(ν) = Γ(a + ρ)ν^{a − 1}(1 + ν)^{− (a + ρ)}/[Γ(a)Γ(ρ)], a, ρ > 0, we obtain for the distribution of the number of accidents N(t):
and
So, the process arising in the context of this model, satisfies the defining conditions of the generalized Waring process.
9.2 The generalized Waring process in the context of a spells model
Xekalaki and Zografi ([2008]) showed that the generalized Waring process could also be used in modeling temporally evolving data in the context of a spells model. Assume again that each person is liable to spells and that no accidents can occur outside spells. Let S(t), t = 0, 1, 2, …, the number of spells up to a given moment t, be a homogeneous Poisson process with rate k/m, k > 0, the number X_{ i } of accidents within a spell i be a random variable with a logarithmic series distribution with parameters m and ν and probability function given by $P\left(X_{i}=n\right)=\frac{m}{n}\left(\frac{\nu}{1+\nu}\right)^{n},\quad n\ge 1$, with P(X_{ i } = 0) = 1 − m log(1 + ν), ν > 0, 0 < m < 1/log(1 + ν), and the numbers of accidents arising out of different spells be independent and independent of the number S(t) of spells. Here ν is again regarded as the external risk parameter, which they assumed to vary according to a beta distribution of the second kind with parameters a and ρ and probability density function given by Γ(a + ρ)ν^{a − 1}(1 + ν)^{− (a + ρ)}/[Γ(a)Γ(ρ)], a, ρ > 0. They then showed that the above framework leads to a process conforming with the postulates of the generalized Waring process, thus demonstrating its potential application in the context of the Spells model.
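The first step of this construction, for a fixed value of ν, can be checked numerically: a Poisson number of spells with mean (k/m)t, each contributing a logarithmic-series number of accidents, should give a negative binomial count with index kt and pgf (1 + ν − νs)^{− kt}. The sketch below computes the compound distribution with the Panjer recursion, which is a standard device chosen here for convenience and not taken from the original paper; function names and parameter values are hypothetical.

```python
import math

def log_series_pmf(n, m, nu):
    """P(X_i = n): 1 - m log(1 + nu) for n = 0, (m / n) (nu / (1 + nu))^n for n >= 1."""
    if n == 0:
        return 1.0 - m * math.log(1.0 + nu)
    return m * (nu / (1.0 + nu)) ** n / n

def compound_poisson_pmf(n_max, rate, severity):
    """Panjer recursion for a Poisson(rate) sum of i.i.d. counts with pmf severity(j)."""
    f = [severity(j) for j in range(n_max + 1)]
    g = [math.exp(-rate * (1.0 - f[0]))]
    for n in range(1, n_max + 1):
        g.append(rate / n * sum(j * f[j] * g[n - j] for j in range(1, n + 1)))
    return g

def neg_binomial_pmf(n, r, nu):
    """Negative binomial with index r and pgf (1 + nu - nu s)^(-r)."""
    q = nu / (1.0 + nu)
    return math.exp(math.lgamma(r + n) - math.lgamma(r) - math.lgamma(n + 1)
                    + n * math.log(q) + r * math.log(1.0 - q))

# Hypothetical values; the model needs 0 < m < 1 / log(1 + nu).
k, m, nu, t = 1.2, 0.5, 1.5, 3.0
accidents = compound_poisson_pmf(30, (k / m) * t, lambda j: log_series_pmf(j, m, nu))
```

Mixing the resulting negative binomial over the beta distribution of the second kind for ν then yields the UGWD(a, kt; ρ) for N(t).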
10. An application: modeling the counting process {N(s), s > 0} associated with the access pattern of a web site
As an illustration of the application potential of the generalized Waring process in other fields by appropriately adjusting the concepts and terminology used in this paper so as to have natural interpretations, we outline an example of a model for temporally evolving data on web access patterns provided by Xekalaki and Zografi ([2008]).
In this context, {N(s), s > 0} is the counting process associated with the access pattern of a web site, where, for any t > 0, N(t) represents the number of visits that the web pages on this particular site get within the interval (0, t). Note that the generalized Waring distribution was cited in Ajiferuke et al. ([2004]) as used by them to fit observed website visitation data for a given period, i.e., to model counts N(t_{0}) of web visits on a given fixed time interval (0, t_{0}).
Except for chance, visits to a web site can be regarded as affected by the intrinsic appeal of the particular site to web users (corresponding to proneness) as well as by exogenous factors (corresponding to liability) such as links provided by other sites to the particular site, how well the site is advertised, etc.
Let ν denote the intrinsic factors and λ | ν the exogenous factors. Assuming that N(t) | λ follows a Poisson(λ(t)) distribution, where λ(t) = λt, with λ | ν following a gamma distribution with density λ^{kt − 1}e^{− λ/(νt)}(νt)^{− kt}/Γ(kt), and with ν following a beta distribution of the second kind with density Γ(a + ρ)ν^{a − 1}(1 + ν)^{− (a + ρ)}/[Γ(a)Γ(ρ)], a, ρ > 0, the unconditional distribution of N(t) is the GWD(a, kt; ρ), i.e. the process {N(t), t ≥ 0} is a generalized Waring process.
10.1 The data
The log files representing the hits on an e-shop site for the period from March 31, 2006 to April 30, 2006 have been used to fit this model. (A log file typically contains information on the times of visits per IP address per day). On the basis of such log files, the visits per day made by each of 468 IP addresses to a web site during the above period were enumerated yielding 468 paths of visits N_{ i }(t_{ j }) made by IP address i up to and including time t_{ j } denoted by {N_{ i }(t_{ j }), i = 1, 2, …, 468; j = 1, 2, …, 31}.
Moment estimates of the parameters of the generalized Waring process were obtained employing an estimation procedure for spatial point process data termed in the literature the centered reduced moment method. The method, introduced and studied by Ripley ([1976], [1977]), utilizes the intensity of the process and the mean number of further points within distance s of an arbitrary point of the process. In particular, the method utilizes the moment estimators $E\left(\widehat{N}\left(s\right)\right)=\widehat{\mu}_{1}=\widehat{\eta}s=ns/h,\quad E\left(\widehat{N}^{2}\left(s\right)\right)=\widehat{\mu}_{2}=X/n_{\left(2\right)},\quad E\left(\widehat{N}^{3}\left(s\right)\right)=\widehat{\mu}_{3}=\left(Z-X\right)/n_{\left(3\right)}$ with $X=\sum_{i=1}^{n}\sum_{j\ne i}\varphi_{s}^{2}\left(x_{i},x_{j}\right),\quad Z=\sum_{i=1}^{n}\left(\sum_{j\ne i}\varphi_{s}\left(x_{i},x_{j}\right)\right)\left(\sum_{k\ne i}\varphi_{s}\left(x_{i},x_{k}\right)\right)$, where the quantities involved in the above equations represent weights defined, for each value x_{ i } in the collection of points {x_{ i } : i = 1, 2, …, n} of the process within a time interval of length h, as follows: For each x_{ i } in {x_{ i } : i = 1, 2, …, n} and a given s > 0, consider the interval of center x_{ i } and length s and assign to every point x_{ j }, j ≠ i in this interval the weight ϕ_{ s }(x_{ i }, x_{ j }) = ω(x_{ i }, x_{ j })^{− 1}, where ω(x_{ i }, x_{ j }) is the number of other points {x_{ k }, k ≠ i, k ≠ j} of the process that are included in the interval of length |x_{ i } − x_{ j }| and center x_{ i } (see also Diggle and Chetwynd [1991]; Chetwynd and Diggle [1998], among others).
The standard errors of the parameter estimators thus obtained can in principle be determined by simulation, but the associated computations are formidable. Approximation formulas exist only for the case of homogeneous planar Poisson processes, while, for the class of stationary Cox processes, there is no obvious way to obtain estimable expressions, as noted by Chetwynd and Diggle ([1998]).
The observed paths were compared to the corresponding time series of simulated realizations of the generalized Waring process over the same time segment. For each IP address, 100 simulated realizations of the gWp(a, k; ρ) were obtained and each of the observed time series paths was compared to the corresponding simulated ones. On average, the realizations of the generalized Waring process exhibited structural characteristics recognizably similar to those of the paths of the observed time series. For illustration purposes, the path of the observed time series associated with one of the IP addresses considered is presented in Figure 1. In the graph, the observed path is overlaid with a sample of three of the 100 corresponding simulated realizations of the gWp(a, k; ρ). Inspection of the graph provides a visual appreciation of the degree of similarity in the structural characteristics of the observed and the simulated paths.
Following Lewis ([1972]), Brillinger ([1978]) and Andersen et al. ([1993]), the closeness of the observed and realized time series was also checked using diagnostic plots based on the inverse-intensity residuals computed for each value x_{ j } in the collection of points {x_{ j } : j = 1, 2, …, n} of the process, given by $R_{\widehat{\theta}}\left(B_{j},\eta^{-1}\right)=\sum_{x_{i}\in B_{j}}\widehat{\eta}\left(x_{i}\right)^{-1}-\int_{B_{j}}I_{R^{+}}\left(\widehat{\eta}\left(x\right)\right)dx$, where $B_{j}=\left(0,x_{j}\right)$, $\widehat{\theta}=\left(\widehat{a},\widehat{k},\widehat{\rho}\right)$ is the vector of parameter estimates, $\widehat{\eta}\left(x\right)=\eta\left(x,\widehat{\theta}\right)$ is the fitted intensity and $I_{R^{+}}\left(\cdot\right)$ is the indicator function. These plots exhibit similar results. The plot corresponding to the data associated with the IP address considered is shown in Figure 2.
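For a fitted intensity that is constant and positive, such residuals reduce to a very simple expression. The sketch below assumes the usual form of the inverse-intensity residual (the sum of reciprocal fitted intensities over the points in B_{ j } minus the length of the part of B_{ j } where the fitted intensity is positive) and a constant fitted intensity `eta_hat`, so that the residual for B_{ j } = (0, x_{ j }) is the number of points in B_{ j } divided by `eta_hat`, minus x_{ j }. The function name and the event times are hypothetical and are not the data analysed in the paper.

```python
def inverse_intensity_residuals(points, eta_hat):
    """Residual for each B_j = (0, x_j): (number of points in B_j) / eta_hat - x_j."""
    if eta_hat <= 0:
        raise ValueError("eta_hat must be positive")
    pts = sorted(points)
    residuals = []
    for x_j in pts:
        n_in_b = sum(1 for x in pts if 0 < x < x_j)  # points strictly inside (0, x_j)
        residuals.append(n_in_b / eta_hat - x_j)
    return residuals

# Hypothetical visit times (in days) and a hypothetical fitted intensity of one visit per day.
visit_times = [0.4, 1.1, 2.9, 3.0, 5.2]
residuals = inverse_intensity_residuals(visit_times, eta_hat=1.0)
```

Plotting such residuals against x_{ j } gives a diagnostic of the kind described: values that stay close to zero indicate that the fitted intensity tracks the observed accumulation of points.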
References
Ajiferuke, I, Wolfram, D, Xie, H: Modelling website visitation and resource usage characteristics by IP address data. In: Julien, H, Thompson, S (eds.). Proceedings of the 32nd Annual Conference of the Canadian Association for Information Science, Manitoba, Canada (2004). Available at: www.caisacsi.ca/proceedings/2004/ajiferuke_2004.pdf
Andersen P, Borgan Ø, Gill R, Keiding N: Statistical Models based on Counting Processes. Springer, New York; 1993.
Andrews DF, Mallows CL: Scale mixtures of normal distributions. J. R. Stat. Soc. B 1974, 36: 99–102.
Arbous AG, Kerrich JE: Accident statistics and the concept of accident proneness. Biometrics 1951, 7: 340–432. 10.2307/3001656
Bates GE: Joint distributions of time intervals for the occurrence of successive accidents in a generalized Polya scheme. Ann. Math. Stat. 1955, 26: 705–720. 10.1214/aoms/1177728429
Bates GE, Neyman J: Contributions to the theory of accident proneness. U. Calif. Publ. Stat. 1952, 1: 215–275.
Breslow N: Tests of hypotheses in overdispersed Poisson regression and other quasi-likelihood models. J. Am. Stat. Assoc. 1990, 85: 565–571. 10.1080/01621459.1990.10476236
Brillinger D: Comparative Aspects of the Study of Ordinary Time Series and of Point Processes. In Developments in Statistics. Edited by: Krishnaiah PR. Academic Press, New York; 1978:33–133.
Cane VR: The concept of accident proneness. B. Inst. Math. Bulgarian Acad. Sci. 1974, 15: 183–188.
Cane VR: A class of non-identifiable stochastic models. J. Appl. Probab. 1977, 14: 475–482. 10.2307/3213450
Carriere J: Nonparametric tests for mixed Poisson distributions. Insur. Math. Econ. 1993, 12: 3–8. 10.1016/01676687(93)90994Z
Chetwynd AG, Diggle PJ: On estimating the reduced second moment measure of a stationary spatial point process. Aust. N.Z. J. Stat. 1998, 40(1):11–15. 10.1111/1467842X.00002
Cox DR: Renewal Theory. Barnes & Noble, New York; 1962.
Cox DR: Some remarks on overdispersion. Biometrika 1983, 70: 269–274. 10.1093/biomet/70.1.269
Cresswell WL, Froggatt P: The Causation of Bus Driver Accidents. Oxford University Press, London; 1963.
Davidian M, Carroll RJ: A note on extended quasi-likelihood. J. R. Stat. Soc. B 1988, 50: 74–82.
Dean CB, Lawless J, Willmot GE: A mixed Poisson-inverse Gaussian regression model. Can. J. Stat. 1989, 17: 171–182. 10.2307/3314846
Diggle PJ, Chetwynd AG: Second order analysis of spatial clustering for inhomogeneous populations. Biometrics 1991, 47: 1155–1163. 10.2307/2532668
Dimaki C, Xekalaki E: Identifiability of income distributions in the context of damage and generating models. Commun. Stat. A-Theor. 1990, 19(8):2757–2766. 10.1080/03610929008830346
Dimaki C, Xekalaki E: Additive and multiplicative distortion of observations: some characteristic properties. J. Appl. Stat. Sc. 1996, 5(2/3):113–127.
Dubourdieu J: Remarques Relatives sur la Théorie Mathématique de l’Assurance-Accidents. Bull. Trim. Inst. Actuaries Fran. 1938, 44: 79–146.
Faddy MJ: Extending Poisson process modelling and analysis of count data. Biometrical J. 1997, 39(4):431–440. 10.1002/bimj.4710390405
Famoye F, Singh KP: Zero-inflated generalized Poisson regression model with an application to domestic violence data. J. Data Sci. 2006, 4: 117–130.
Fisher RA: The effects of methods of ascertainment upon the estimation of frequencies. Ann. Eugenic. 1934, 6: 13–25. 10.1111/j.14691809.1934.tb02105.x
Fisher RA: The significance of deviations from expectation in a Poisson series. Biometrics 1950, 6: 17–24. 10.2307/3001420
Greenwood M, Woods HM: On the Incidence of Industrial Accidents upon Individuals with Special Reference to Multiple Accidents. In Report of the Industrial Fatigue Research Board, 4. His Majesty’s Stationery Office, London; 1919:1–28.
Greenwood M, Yule GU: An inquiry into the nature of frequency distributions representative of multiple happenings with particular reference to the occurrence of multiple attack of disease or repeated accidents. J. R. Stat. Soc. 1920, 83: 255–279. 10.2307/2341080
Gschlößl S, Czado C: Modelling count data with overdispersion and spatial effects. Stat. Pap. 2008, 49: 531–552. 10.1007/s0036200600316
Gupta PL, Gupta RC, Tripathi RC: Analysis of zero adjusted count data. Comput. Stat. Data An. 1996, 23: 207–218. 10.1016/S01679473(96)000321
Gupta PL, Gupta RC, Tripathi RC: Score test for zero inflated generalized Poisson regression model. Commun. Stat. A-Theor. 2004, 33: 47–64. 10.1081/STA120026576
Gurland J: Some applications of the negative binomial and other contagious distributions. Am. J. Public Health 1959, 49: 1388–1399. 10.2105/AJPH.49.10.1388
Hinde J, Demetrio CGB: Overdispersion: models and estimation. Comput. Stat. Data An. 1998, 27: 151–170. 10.1016/S01679473(98)000073
Irwin JO: Discussion on Chambers and Yule’s paper. J. R. Stat. Soc. Supplement 1941, 7: 101–109.
Irwin JO: The place of mathematics in medical and biological statistics. J. R. Stat. Soc. A 1963, 126: 1–44. 10.2307/2982445
Irwin JO: The generalized Waring distribution applied to accident theory. J. R. Stat. Soc. A 1968, 131: 205–225. 10.2307/2343842
Irwin JO: The Generalized Waring Distribution. J. R. Stat. Soc. A 1975, 138: 18–31.
Jewell N: Mixtures of exponential distributions. Ann. Stat. 1982, 10: 479–484. 10.1214/aos/1176345789
Karlis D, Xekalaki E: On testing for the number of components in a mixed Poisson model. Ann. I. Stat. Math. 1999, 51: 149–162. 10.1023/A:1003839420071
Karlis D, Xekalaki E: Mixtures Everywhere. In Stochastic Musings: Perspectives from the Pioneers of the Late 20th Century. Edited by: Panaretos J. Lawrence Erlbaum, USA; 2003:78–95.
Karlis D, Xekalaki E: Mixed Poisson distributions. Int. Stat. Rev. 2005, 73(1):35–58. 10.1111/j.17515823.2005.tb00250.x
Kemp CD: On a contagious distribution suggested for accident data. Biometrics 1967, 23: 241–255. 10.2307/2528159
Kemp CD: “Accident proneness” and discrete distribution theory. In Random Counts in Scientific Work, Vol.2. Edited by: Patil GP. State College: Pennsylvania State University Press, University Park, USA; 1970:41–65.
Krishnaji N: Characterization of the Pareto distribution through a model of underreported incomes. Econometrica 1970, 38: 251–255. 10.2307/1913007
Kulinskaya E, Olkin I: An overdispersion model in meta-analysis. Stat. Model. 2014, 14(1):49–76. 10.1177/1471082X13494616
Lawless JF: Negative binomial and mixed Poisson regression. Can. J. Stat. 1987, 15: 209–225. 10.2307/3314912
Lewis PAW: Recent results in the statistical analysis of univariate point processes. In Stochastic Point Processes. Edited by: Lewis PAW. Wiley, New York; 1972:1–54.
Lexis W: Über die Theorie der Stabilität statistischer Reihen. Jahrb. Nationalökon. u. Statist. 1879, 32: 60–98.
McCullagh P, Nelder JA: Generalized Linear Models. 2nd edition. Chapman & Hall, London; 1989.
McKendrick AG: The applications of mathematics to medical problems. Proc. Edinb. Math. Soc. 1925, 44: 98–130.
McLachlan JA, Peel D: Finite Mixture Models. Wiley, New York; 2001.
Moore DF: Asymptotic properties of moment estimators for overdispersed counts and proportions. Biometrika 1986, 23: 583–588. 10.1093/biomet/73.3.583
Nelder JA, Pregibon D: An extended quasi-likelihood function. Biometrika 1987, 74: 221–232. 10.1093/biomet/74.2.221
Ngatchou-Wandji J, Paris C: On the zero-inflated count models with application to modelling annual trends in incidences of some occupational allergic diseases in France. J. Data Sci. 2011, 9: 639–659.
Panaretos J: An extension of the damage model. Metrika 1982, 29: 189–194. 10.1007/BF01893378
Panaretos J: A generating model involving Pascal and logarithmic series distributions. Commun. Stat. A-Theor. 1983, 12(7):841–848. 10.1080/03610928308828499
Panaretos J: On the evolution of surnames. Int. Stat. Rev. 1989, 57(2):161–167. 10.2307/1403384
Patil GP, Ord JK: On size-biased sampling and related form-invariant weighted distributions. Sankhya 1976, 38: 48–61.
Rao CR: On discrete distributions arising out of methods of ascertainment. Sankhya 1963, A25: 311–324.
Rao CR: Weighted Distributions Arising Out of Methods of Ascertainment. In A Celebration of Statistics, Chapter 24. Edited by: Atkinson AC, Fienberg SE. Springer-Verlag, New York; 1985:543–569. 10.1007/978-1-4613-8560-8_24
Ridout M, Demetrio CGB, Hinde J: Models for count data with many zeros. Invited paper presented at the Nineteenth International Biometric Conference, Cape Town, South Africa; 1998.
Ridout M, Hinde J, Demetrio CGB: A score test for testing a zero-inflated Poisson regression model against zero-inflated negative binomial alternatives. Biometrics 2001, 57: 219–223. 10.1111/j.0006-341X.2001.00219.x
Ripley BD: The second-order analysis of stationary point processes. J. Appl. Probab. 1976, 13: 255–266. 10.2307/3212829
Ripley BD: Modelling spatial patterns (with Discussion). J. R. Stat. Soc. B 1977, 39: 172–212.
Shaw L, Sichel HS: Accident Proneness. Pergamon Press, Oxford; 1971.
Steutel FW, van Harn K: Discrete analogues of self-decomposability and stability. Ann. Prob. 1979, 7: 893–899. 10.1214/aop/1176994950
Student: An explanation of deviations from Poisson’s law in practice. Biometrika 1919, 12: 211–215. 10.2307/2331767
Thyrion P: Extension of the collective risk theory. Skand. Aktuarietidskrift 1969, 52(Supplement):84–98.
Titterington DM: Some recent research in the analysis of mixture distributions. Statistics 1990, 21: 619–641. 10.1080/02331889008802274
Tripathi R, Gupta R, Gurland J: Estimation of parameters in the beta binomial model. Ann. I. Stat. Math. 1994, 46: 317–331. 10.1007/BF01720588
Wang P, Puterman M, Cockburn I, Le N: Mixed Poisson regression models with covariate dependent rates. Biometrics 1996, 52: 381–400. 10.2307/2532881
Winkelmann R: Duration dependence and dispersion in count-data models. J. Bus. Econ. Stat. 1995, 13(4):467–474.
Xekalaki E: Chance mechanisms for the univariate generalized Waring distribution and related characterizations. In Statistical Distributions in Scientific Work, Vol. 4 (Models, Structures and Characterizations). Edited by: Taillie C, Patil GP, Baldessari B. Reidel, Dordrecht; 1981:157–171. 10.1007/978-94-009-8549-0_12
Xekalaki E: The univariate generalized Waring distribution in relation to accident theory: proneness, spells or contagion? Biometrics 1983a, 39(3):887–895. 10.2307/2531324
Xekalaki E: Infinite divisibility, completeness and regression properties of the univariate generalized Waring distribution. Ann. I. Stat. Math. A 1983, 35: 279–289. 10.1007/BF02480983
Xekalaki E: A property of the Yule distribution and its applications. Commun. Stat. A-Theor. 1983, 12(10):1181–1189. 10.1080/03610928308828523
Xekalaki E: The bivariate generalized Waring distribution and its application to accident theory. J. R. Stat. Soc. A 1984, 147(3):488–498. 10.2307/2981580
Xekalaki E: Linear regression and the Yule distribution. J. Econometrics 1984, 24(1):397–403. 10.1016/0304-4076(84)90061-7
Xekalaki E: Models leading to the bivariate generalized Waring distribution. Utilitas Math. 1984, 25: 263–290.
Xekalaki E: The multivariate generalized Waring distribution. Commun. Stat. A-Theor. 1986, 15(3):1047–1064. 10.1080/03610928608829168
Xekalaki E: Under- and Overdispersion. Enc. Act. Sci. 2006, 3. [http://onlinelibrary.wiley.com/doi/10.1002/9780470012505.tau003/abstract]
Xekalaki E, Panaretos J: Identifiability of compound Poisson distributions. Scand. Actuar. J. 1983, 66: 39–45. 10.1080/03461238.1983.10408688
Xekalaki E, Zografi M: The generalized Waring process and its application. Commun. Stat. A-Theor. 2008, 37(12):1835–1854. 10.1080/03610920801893707
Xue D, Deddens J: Overdispersed negative binomial models. Commun. Stat. A-Theor. 1992, 21: 2215–2226. 10.1080/03610929208830908
Yule GU: A mathematical theory of evolution based on the conclusions of J.C. Willis, F.R.S. Philos. T. R. Soc. B 1924, 213: 21–87. 10.1098/rstb.1925.0002
Acknowledgement
The authoress would like to thank the associate editor and the referees for their constructive comments.
This paper is an extended version of the authoress’ invited plenary presentation at the International Conference on Statistical Distributions and Applications, Michigan, USA, October 10–12, 2013.
Competing interests
The authoress declares that she has no competing interests.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0), which permits use, duplication, adaptation, distribution, and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Xekalaki, E. On the distribution theory of overdispersion. J Stat Distrib App 1, 19 (2014). doi:10.1186/s40488-014-0019-z
Keywords
 Heterogeneity
 Contagion
 Clustering
 Spells model
 Accident proneness
 Mixtures
 Zero-adjusted models
 Biased sampling
 Generalized Waring distribution
 Generalized Waring process