- Research
- Open Access
A flexible distribution class for count data
- Kimberly F. Sellers^{1},
- Andrew W. Swift^{2} and
- Kimberly S. Weems^{3}
https://doi.org/10.1186/s40488-017-0077-0
© The Author(s) 2017
- Received: 23 December 2016
- Accepted: 11 September 2017
- Published: 26 September 2017
The Correction to this article has been published in Journal of Statistical Distributions and Applications 2017 4:23
Abstract
The Poisson, geometric and Bernoulli distributions are special cases of a flexible count distribution, namely the Conway-Maxwell-Poisson (CMP) distribution – a two-parameter generalization of the Poisson distribution that can accommodate data over- or under-dispersion. This work further generalizes the ideas of the CMP distribution by considering sums of CMP random variables to establish a flexible class of distributions that encompasses the Poisson, negative binomial, and binomial distributions as special cases. This sum-of-Conway-Maxwell-Poissons (sCMP) class captures the CMP and its special cases, as well as the classical negative binomial and binomial distributions. Through simulated and real data examples, we demonstrate this model’s flexibility, encompassing several classical distributions as well as other count data distributions containing significant data dispersion.
Keywords
- Conway-Maxwell-Poisson (CMP)
- Negative binomial
- Poisson
- Binomial
- Geometric
- Bernoulli
- Over-dispersion
- Under-dispersion
Mathematics Subject Classification
- 60E05
- 62F10
Introduction
This dispersion index motivates considering the negative binomial distribution as a viable option for addressing data over-dispersion. In fact, this distribution is a popular choice for modeling over-dispersion in various statistical methods (e.g. regression (Hilbe 2008)) and is well studied, with computational support in many software packages (e.g. SAS and R). The negative binomial distribution, however, is unable to address data under-dispersion, as demonstrated in Eq. (1). This result further illustrates that the Poisson GOF is the boundary case of the negative binomial distribution; the Poisson distribution is known to be the limiting case of the negative binomial distribution where n→∞.
Naturally, the Bernoulli(p _{∗}) distribution is a special case of the binomial distribution where b=1. The associated mean and variance of this random variable equal E(Y)=bp _{∗} and Var(Y)=bp _{∗}(1−p _{∗}), respectively; thus the goodness-of-fit index for dispersion is GOF=1−p _{∗}≤1. The Poisson, negative binomial, and binomial distributions are popular, classical tools for modeling count data of a particular (in)finite form. What is most interesting about these distributions is that they each represent sums of other classical distributions, namely the Poisson, geometric, and Bernoulli distributions, respectively. The Poisson, geometric and Bernoulli distributions are themselves special cases of the Conway-Maxwell-Poisson (CMP) distribution, a two-parameter flexible count distribution that generalizes the Poisson distribution to accommodate data over- or under-dispersion. This work introduces the sum of CMP random variables to establish a flexible class of distributions that encompasses the Poisson, geometric, Bernoulli, negative binomial, binomial, and CMP distributions as special cases.
The paper is outlined as follows. Section 2 acquaints the reader with the CMP distribution in order to motivate and introduce the sum-of-Conway-Maxwell-Poissons (sCMP) class in Section 3, including discussion of the statistical properties associated with this larger class of count distributions. Section 4 addresses parameter estimation and statistical computing procedures. Section 5 illustrates the flexibility of this class of distributions via simulated and real data examples. Finally, Section 6 concludes the manuscript with discussion.
The Conway-Maxwell-Poisson distribution
Well-known distributions associated with the Conway-Maxwell-Poisson (CMP) distribution for special cases of λ and ν
Case | Z(λ,ν) | pmf | Distribution |
---|---|---|---|
ν=1 | \(e^{\lambda}\) | \(P(X=x) = \frac{e^{-\lambda}\lambda^{x}}{x!}, \ x=0,1,2,\ldots\) | Poisson(λ) |
ν=0, λ<1 | \(\frac{1}{1-\lambda}\) | \(P(X=x) = (1-\lambda)\lambda^{x}, \ x=0,1,2,\ldots\) | Geom(1−λ) |
ν→∞ | \(1+\lambda\) | \(P(X=0) = \frac{1}{1+\lambda}; \ P(X=1) = \frac{\lambda}{1+\lambda}\) | Bernoulli\(\left(\frac{\lambda}{1+\lambda}\right)\) |
where the approximation holds for ν≤1 or λ>10^{ ν } (Sellers et al. 2011); see Minka et al. (2003) for details. More generally, the associated moment generating function of X is \(\mathrm {M}_{X}(t) = \frac {Z(\lambda e^{t}, \nu)}{Z(\lambda, \nu)}\), from which the higher moments can be obtained for X.
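The special cases listed in the table above can be checked numerically. The following is a minimal sketch, assuming the standard CMP form \(P(X=x) = \lambda^{x} / \left((x!)^{\nu} Z(\lambda,\nu)\right)\) with \(Z(\lambda,\nu)=\sum_{j\ge 0}\lambda^{j}/(j!)^{\nu}\); the function name `cmp_pmf` and the truncation length `support` are illustrative choices, not part of the original work.

```python
import math

def cmp_pmf(lam, nu, support=200):
    """CMP(lam, nu) probabilities on 0..support-1.

    The infinite sum Z(lam, nu) is truncated at `support` terms, so this is
    an approximation that is only sensible when the omitted tail is negligible.
    """
    # Work on the log scale: log weight = x*log(lam) - nu*log(x!).
    logw = [x * math.log(lam) - nu * math.lgamma(x + 1) for x in range(support)]
    top = max(logw)
    w = [math.exp(v - top) for v in logw]
    total = sum(w)
    return [v / total for v in w]

# Reference pmfs for the first few support points.
poisson = [math.exp(-2.0) * 2.0 ** x / math.factorial(x) for x in range(6)]
geometric = [(1 - 0.4) * 0.4 ** x for x in range(6)]

print(cmp_pmf(2.0, 1.0)[:6], poisson)      # nu = 1: Poisson(lam)
print(cmp_pmf(0.4, 0.0)[:6], geometric)    # nu = 0, lam < 1: Geom(1 - lam)
print(cmp_pmf(2.0, 60.0)[:2])              # large nu: mass concentrates on {0, 1}
```

Working with log-weights avoids overflow in \((x!)^{\nu}\) for large ν; a large finite ν (here 60) is used only as a stand-in for the limit ν→∞.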
The linear relation among probabilities of two consecutive values is achieved when ν=1, i.e. given data equi-dispersion associated with the Poisson(λ) model. Meanwhile, for ν=0 and λ<1 (i.e. the geometric distribution with success probability 1−λ), we confirm that the ratio between probabilities of two consecutive values is constant, equaling \(\frac {1}{\lambda }>1\).
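The two special cases of the consecutive-probability ratio can be verified with a short sketch. It assumes the standard CMP identity \(P(X=x-1)/P(X=x) = x^{\nu}/\lambda\), in which the normalizing constant cancels; `consecutive_ratio` is a hypothetical helper name.

```python
import math

def consecutive_ratio(x, lam, nu):
    """P(X = x-1) / P(X = x) for CMP(lam, nu), x >= 1; Z(lam, nu) cancels."""
    log_w_prev = (x - 1) * math.log(lam) - nu * math.lgamma(x)      # log of lam^(x-1) / ((x-1)!)^nu
    log_w_curr = x * math.log(lam) - nu * math.lgamma(x + 1)        # log of lam^x / (x!)^nu
    return math.exp(log_w_prev - log_w_curr)

for x in range(1, 6):
    # nu = 1 gives the linear form x / lam; nu = 0 gives the constant 1 / lam.
    print(x, consecutive_ratio(x, 2.0, 1.0), consecutive_ratio(x, 0.5, 0.0))
```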
The CMP distribution has quickly grown in popularity because of its ability to model count data in a flexible manner. Methodological developments are vast, including works in distribution theory (Sellers 2012; Sellers and Shmueli 2013; Borges et al. 2014), regression analysis (Sellers and Shmueli 2009; 2010; Sellers and Raim 2016), control chart theory (Sellers 2012; Saghir and Lin 2014a; 2014b), stochastic processes (Zhu et al. 2017), and multivariate data analysis (Sellers et al. 2016). The model has further been applied for various data problems including fitting word lengths (Wimmer et al. 1994), modeling online sales (Boatwright et al. 2003; Borle et al. 2006) and customer behavior (Borle et al. 2007), analyzing traffic accident data (Lord et al. 2008), and for use as a disclosure limitation procedure to protect individual privacy (Kadane et al. 2006). See Sellers et al. (2011) for additional overview and discussion.
The sum of Conway-Maxwell-Poissons (sCMP) class of distributions and its statistical properties
The sum of m independent and identically distributed (iid) CMP variables leads to what will be termed a sum of Conway-Maxwell-Poissons (sCMP) (λ,ν,m) class of distributions. Theorem 1 defines the three-parameter structure for some generalized rate parameter (λ), dispersion parameter (ν), and number of underlying CMP random variables (m).
Theorem 1.
where \({y \choose x_{1} \cdots x_{m}} = \frac {y!}{x_{1}! \cdots x_{m}!}\) is the multinomial coefficient.
Proof
□
The sCMP(λ,ν,m) class encompasses the Poisson distribution with rate parameter μ _{∗}=m λ (for ν=1), negative binomial(m,1−λ) distribution (for ν=0 and λ<1), and Binomial(m,p) distribution \(\left (\text {as } \ \nu \rightarrow \infty \ \ \text {with success probability}\ \ p=\frac {\lambda }{\lambda + 1}\right)\) as special cases. Further, for m=1, the sCMP(λ,ν,m=1) is the CMP(λ,ν) distribution. Accordingly, the sCMP class further captures the special case distributions of the CMP model: a geometric distribution with success probability p=1−λ when m=1, ν=0, and λ<1; and a Bernoulli distribution with success probability \(p_{\ast} = \frac {\lambda }{1+\lambda }\) when m=1 and ν→∞.
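These special cases can be illustrated directly from the definition of the class as a sum of m iid CMP variables. The sketch below builds the sCMP pmf by repeated convolution of a truncated CMP pmf rather than through the closed form of Theorem 1; the helper names (`cmp_pmf`, `convolve`, `scmp_pmf`) and the truncation length are assumptions made for illustration.

```python
import math

def cmp_pmf(lam, nu, support):
    """Truncated, renormalised CMP(lam, nu) pmf on 0..support-1."""
    logw = [x * math.log(lam) - nu * math.lgamma(x + 1) for x in range(support)]
    top = max(logw)
    w = [math.exp(v - top) for v in logw]
    total = sum(w)
    return [v / total for v in w]

def convolve(p, q, size):
    """First `size` terms of the convolution of two pmfs."""
    out = [0.0] * size
    for i, a in enumerate(p[:size]):
        for j, b in enumerate(q[:size - i]):
            out[i + j] += a * b
    return out

def scmp_pmf(lam, nu, m, support=150):
    """pmf of the sum of m iid CMP(lam, nu) variables on 0..support-1."""
    base = cmp_pmf(lam, nu, support)
    pmf = base
    for _ in range(m - 1):
        pmf = convolve(pmf, base, support)
    return pmf

m = 3
pois = [math.exp(-4.5) * 4.5 ** y / math.factorial(y) for y in range(10)]   # Poisson(m * 1.5)
negbin = [math.comb(y + m - 1, y) * 0.6 ** m * 0.4 ** y for y in range(10)] # NB(m, 1 - 0.4)
binom = [math.comb(m, y) * (2 / 3) ** y * (1 / 3) ** (m - y) for y in range(m + 1)]

print(scmp_pmf(1.5, 1.0, m)[:4], pois[:4])
print(scmp_pmf(0.4, 0.0, m)[:4], negbin[:4])
print(scmp_pmf(2.0, 60.0, m)[:4], binom)   # nu = 60 approximates nu -> infinity
```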
where the ratio of sums drops out in the special case where m=1 (i.e. one CMP(λ,ν) random variable); clearly, this produces the special case shown in Eq. (6). For the special case where ν=1, \(\gamma _{Y,y} = \frac {y}{m\lambda }\), which is the linear form property of the Poisson random variable with parameter m λ (i.e. the distribution of the sum of m Poisson random variables). Meanwhile, for ν=0, \(\gamma _{Y,y} = \frac {y}{\lambda (m+y-1)}\), namely the form associated with a negative binomial distribution (i.e. the sum of m geometric random variables). Equation (8) implies that the sCMP model has a mode at 0 when γ _{ Y,y }>1, i.e. \(\lambda < y^{\nu } \cdot \frac {\sum _{\stackrel {a_{1},\ldots,a_{m} = 0} {a_{1} + \ldots + a_{m} = y-1} }^{y-1} {y-1 \choose a_{1}, \cdots, a_{m}}^{\nu } }{\sum _{\stackrel {b_{1},\ldots,b_{m} = 0} {b_{1} + \ldots + b_{m} = y} }^{y} {y \choose b_{1}, \cdots, b_{m}}^{\nu }}\). In particular, \(\gamma _{Y,1} = \frac {P(Y = 0)}{P(Y = 1)}= \frac {1}{m\lambda }\), thus sCMP(λ,ν,m) models where \(\lambda < \frac {1}{m}\) have a mode at 0. Figure 2 displays the sCMP(λ=0.25,ν,m) distributions for ν=0.5,1,5,30 and m=1,2,3,5. Given that \(\lambda = 0.25 = \frac {1}{4}\), we expect sCMP distributions where m<4 to have the mode at 0. This is illustrated accordingly in Fig. 2; sCMP(λ=0.25,ν,m) distributions for m=2,3 and any ν≥0 have the mode at 0, while the sCMP(λ=0.25,ν,m=5) distribution has the mode at 1 for all ν≥0.
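The relation \(\gamma_{Y,1} = \frac{1}{m\lambda}\) can be checked for the settings shown in Fig. 2. The sketch below is illustrative only: it uses the fact that Y=0 requires all m summands to equal 0, while Y=1 requires exactly one summand to equal 1, and it relies on a truncated CMP pmf (`cmp_pmf` and `gamma_y1` are hypothetical helper names).

```python
import math

def cmp_pmf(lam, nu, support=200):
    """Truncated, renormalised CMP(lam, nu) pmf on 0..support-1."""
    logw = [x * math.log(lam) - nu * math.lgamma(x + 1) for x in range(support)]
    top = max(logw)
    w = [math.exp(v - top) for v in logw]
    total = sum(w)
    return [v / total for v in w]

def gamma_y1(lam, nu, m):
    """P(Y = 0) / P(Y = 1) for Y the sum of m iid CMP(lam, nu) variables."""
    p = cmp_pmf(lam, nu)
    p_y0 = p[0] ** m                    # every summand equals 0
    p_y1 = m * p[0] ** (m - 1) * p[1]   # exactly one summand equals 1
    return p_y0 / p_y1

lam = 0.25
for nu in (0.5, 1, 5, 30):
    for m in (1, 2, 3, 5):
        ratio = gamma_y1(lam, nu, m)
        print(f"nu={nu}, m={m}: P(0)/P(1)={ratio:.4f}, P(0) > P(1): {ratio > 1}")
```

The ratio does not depend on ν, which is consistent with the statement that the behaviour at 0 for λ=0.25 is governed by m alone. Only the comparison between P(Y=0) and P(Y=1) is checked here, not the full mode.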
which is the mgf of a sCMP (λ,ν,m _{1}+m _{2}) distribution, therefore Y _{1}+Y _{2} has a sCMP (λ,ν,m _{1}+m _{2}) distribution. This result is logically sound because Y _{1} and Y _{2} respectively represent the sum of m _{1} and m _{2} iid CMP (λ,ν) random variables; thus, Y _{1}+Y _{2} defines the sum of m _{1}+m _{2} iid CMP random variables, which precisely has a sCMP(λ,ν,m _{1}+m _{2}) distribution. This distinction is key between the CMP distribution and the larger sCMP class – the CMP distribution does not have the invariance property under addition.
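The closure of the sCMP class under addition (for common λ and ν) can also be seen numerically. This is a sketch under the same truncation assumptions as any finite-support computation; the parameter values and helper names are arbitrary illustrations.

```python
import math

def cmp_pmf(lam, nu, support):
    """Truncated, renormalised CMP(lam, nu) pmf on 0..support-1."""
    logw = [x * math.log(lam) - nu * math.lgamma(x + 1) for x in range(support)]
    top = max(logw)
    w = [math.exp(v - top) for v in logw]
    total = sum(w)
    return [v / total for v in w]

def convolve(p, q, size):
    """First `size` terms of the convolution of two pmfs."""
    out = [0.0] * size
    for i, a in enumerate(p[:size]):
        for j, b in enumerate(q[:size - i]):
            out[i + j] += a * b
    return out

def scmp_pmf(lam, nu, m, support=120):
    """pmf of the sum of m iid CMP(lam, nu) variables on 0..support-1."""
    base = cmp_pmf(lam, nu, support)
    pmf = base
    for _ in range(m - 1):
        pmf = convolve(pmf, base, support)
    return pmf

lam, nu, m1, m2 = 0.8, 0.7, 2, 3
y1 = scmp_pmf(lam, nu, m1)             # Y1 ~ sCMP(lam, nu, m1)
y2 = scmp_pmf(lam, nu, m2)             # Y2 ~ sCMP(lam, nu, m2), independent of Y1
summed = convolve(y1, y2, len(y1))     # distribution of Y1 + Y2
direct = scmp_pmf(lam, nu, m1 + m2)    # sCMP(lam, nu, m1 + m2)

max_gap = max(abs(a - b) for a, b in zip(summed, direct))
print("largest pointwise difference:", max_gap)
```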
3.1 Moments of the distribution
One can differentiate Eq. (9) to obtain the moments of the sCMP model, with the help of the following relation.
Theorem 2.
Proof
This proof is straightforward, given the differentiation formula for exponential functions. □
This result proves helpful in showing that the sCMP(λ,ν,m) has mean E(Y)=mE(X) and variance V(Y)=mV(X), where E(X) and V(X) (provided in Eqs. (4)-(5), respectively) are the mean and variance of a CMP(λ,ν) random variable X.
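The moment relations E(Y)=mE(X) and V(Y)=mV(X) can be confirmed by computing moments directly from the pmfs. The following sketch does not use the moment generating function; it simply sums over a truncated support, with arbitrary illustrative parameter values and hypothetical helper names.

```python
import math

def cmp_pmf(lam, nu, support):
    """Truncated, renormalised CMP(lam, nu) pmf on 0..support-1."""
    logw = [x * math.log(lam) - nu * math.lgamma(x + 1) for x in range(support)]
    top = max(logw)
    w = [math.exp(v - top) for v in logw]
    total = sum(w)
    return [v / total for v in w]

def scmp_pmf(lam, nu, m, support=120):
    """pmf of the sum of m iid CMP(lam, nu) variables, by repeated convolution."""
    base = cmp_pmf(lam, nu, support)
    pmf = base
    for _ in range(m - 1):
        nxt = [0.0] * support
        for i, a in enumerate(pmf):
            for j in range(support - i):
                nxt[i + j] += a * base[j]
        pmf = nxt
    return pmf

def mean_var(pmf):
    """Mean and variance of a pmf given as a list indexed by the count value."""
    mean = sum(y * p for y, p in enumerate(pmf))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(pmf))
    return mean, var

lam, nu, m = 1.5, 0.8, 4
ex, vx = mean_var(scmp_pmf(lam, nu, 1))   # CMP moments E(X), V(X)
ey, vy = mean_var(scmp_pmf(lam, nu, m))   # sCMP moments E(Y), V(Y)
print(ey, m * ex)
print(vy, m * vx)
```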
3.2 Introducing the generalized Conway-Maxwell-Binomial (gCMB) distribution
for \(r \in \mathcal {Z}^{+}\), p∈(0,1), and \(\nu \in \mathcal {R}\) such that ν=1 produces the usual binomial(r,p) distribution, ν>1 corresponds to data under-dispersion relative to a binomial distribution while ν<1 corresponds to data over-dispersion relative to the binomial(r,p) model. Extreme distribution cases hold where, for ν→∞, the pmf is concentrated at the point, rp and, for ν→−∞, it is concentrated at 0 or r (Borges et al. 2014).
Parameter estimation and statistical computing
Optimal sCMP(λ,ν,m) models are determined by comparing potential conditional sCMP models where m is assumed known and identifying the conditional model with the largest log-likelihood value. Section 5 illustrates this procedure via simulated and real data examples.
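The conditional-on-m selection procedure can be sketched as follows. This is not the authors' implementation: a coarse grid search stands in for a proper numerical optimizer, the counts in `data` are hypothetical toy values, and the grids are restricted to a region where truncating the CMP support at 40 terms is harmless.

```python
import math

def cmp_pmf(lam, nu, support):
    """Truncated, renormalised CMP(lam, nu) pmf on 0..support-1."""
    logw = [x * math.log(lam) - nu * math.lgamma(x + 1) for x in range(support)]
    top = max(logw)
    w = [math.exp(v - top) for v in logw]
    total = sum(w)
    return [v / total for v in w]

def scmp_pmf(lam, nu, m, support=40):
    """pmf of the sum of m iid CMP(lam, nu) variables, by repeated convolution."""
    base = cmp_pmf(lam, nu, support)
    pmf = base
    for _ in range(m - 1):
        nxt = [0.0] * support
        for i, a in enumerate(pmf):
            for j in range(support - i):
                nxt[i + j] += a * base[j]
        pmf = nxt
    return pmf

def log_lik(data, lam, nu, m):
    """sCMP(lam, nu, m) log-likelihood of the observed counts."""
    pmf = scmp_pmf(lam, nu, m)
    return sum(math.log(pmf[y]) for y in data)

def fit_given_m(data, m, lam_grid, nu_grid):
    """Crude conditional fit: best (log-lik, lam, nu) on a grid for fixed m."""
    return max((log_lik(data, lam, nu, m), lam, nu)
               for lam in lam_grid for nu in nu_grid)

# Hypothetical toy counts (NOT data from the paper).
data = [1, 2, 2, 3, 2, 1, 2, 3, 2, 2, 1, 3, 2, 2, 3]
lam_grid = [0.25 * k for k in range(1, 13)]   # 0.25, 0.50, ..., 3.00
nu_grid = [0.5 * k for k in range(1, 9)]      # 0.5, 1.0, ..., 4.0

# Fit each conditional model, then keep the m with the largest log-likelihood.
fits = {m: fit_given_m(data, m, lam_grid, nu_grid) for m in (1, 2, 3, 4)}
best_m = max(fits, key=lambda m: fits[m][0])
for m, (ll, lam, nu) in fits.items():
    print(f"m={m}: logL={ll:.3f} at lam={lam}, nu={nu}")
print("selected m:", best_m)
```

Because each conditional model has the same number of free parameters (λ and ν), ranking by log-likelihood and ranking by AIC or BIC give the same ordering across values of m.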
Statistical computing for the Poisson and negative binomial distributions is conducted in R (R Core Team 2017) via the function fitdistr, contained in the MASS package (Venables and Ripley 2002). This package uses an alternative parametrization for the negative binomial model, namely θ=n and \(\mu =\frac {n(1-p)}{p}\); hence we can back-solve for \(p=\frac {\theta }{\mu +\theta }\). Estimates for θ and μ are reported in the discussions provided in Section 5.
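The conversion between the two negative binomial parametrizations is a one-line calculation in each direction. A minimal sketch, with hypothetical function names:

```python
def nb_mean(n, p):
    """Mean mu = n(1 - p) / p of a negative binomial(n, p) distribution."""
    return n * (1 - p) / p

def nb_success_prob(mu, theta):
    """Back-solve p = theta / (mu + theta) from the (mu, theta) parametrization."""
    return theta / (mu + theta)

# Round trip with the simulation setting n = 3, p = 1/3.
mu = nb_mean(3, 1 / 3)
print(mu, nb_success_prob(mu, 3))

# Applied to the fitted values reported for the simulated NB data in Section 5.
print(nb_success_prob(5.3001, 3.5599))
```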
Examples
5.1 Simulation study
Simulated data example
Model | Quantity | Simulated data: Bin(b=3, p _{∗}=0.667) | Simulated data: Pois(μ _{∗}=6) | Simulated data: NB(n=3, p=0.333) |
---|---|---|---|---|
Pois. | \(\hat {\mu }_{*}\) (SE) | 2.0000 (0.1414) | 6.1100 (0.2472) | 5.3000 (0.2302) |
Pois. | log(L) | -144.8109 | -228.1811 | -288.0710 |
Pois. | AIC | 291.6218 | 458.3623 | 578.1419 |
Pois. | BIC | 294.2270 | 460.9675 | 580.7471 |
NB | \(\hat {\mu }\) (SE) | 1.9999 (0.1419) | 6.1100 (0.2503) | 5.3001 (0.3632) |
NB | \(\hat {\theta }\) (SE) | 276.5396 (394.3489) | 239.7812 (421.0122) | 3.5599 (0.8564) |
NB | log(L) | -145.0563 | -228.3236 | -258.7486 |
NB | AIC | 294.1126 | 460.6472 | 521.4971 |
NB | BIC | 299.3229 | 465.8575 | 526.7075 |
CMP/sCMP(m=1) | \(\hat {\lambda }\) (SE) | 18.7071 (8.9855) | 6.9145 (2.1193) | 1.5576 (0.2079) |
CMP/sCMP(m=1) | \(\hat {\nu }\) (SE) | 3.3931 (0.5024) | 1.0653 (0.1603) | 0.3150 (0.0708) |
CMP/sCMP(m=1) | log(L) | -123.2624 | -228.0950 | -260.3649 |
CMP/sCMP(m=1) | AIC | 250.5248 | 460.1900 | 524.7298 |
CMP/sCMP(m=1) | BIC | 255.7351 | 465.4003 | 529.9402 |
sCMP(m=2) | \(\hat {\lambda }\) (SE) | 4.2531 (0.9494) | 3.4046 (0.8152) | 0.9309 (0.1109) |
sCMP(m=2) | \(\hat {\nu }\) (SE) | 4.2854 (0.4998) | 1.0838 (0.1826) | 0.1674 (0.0825) |
sCMP(m=2) | log(L) | -120.8816 | -228.0722 | -259.3193 |
sCMP(m=2) | AIC | 245.7632 | 460.1444 | 522.6386 |
sCMP(m=2) | BIC | 250.9735 | 465.3547 | 527.8489 |
sCMP(m=3) | \(\hat {\lambda }\) (SE) | 2.0000 (0.2450) | 2.2683 (0.4822) | 0.6709 (0.0576) |
sCMP(m=3) | \(\hat {\nu }\) (SE) | 33.6942 (12536.57) | 1.1093 (0.2127) | 0.0392 (0.0698) |
sCMP(m=3) | log(L) | -116.2486 | -228.0418 | -258.8683 |
sCMP(m=3) | AIC | 236.4972 | 460.0836 | 521.7366 |
sCMP(m=3) | BIC | 241.7075 | 465.2939 | 526.9469 |
sCMP(m=4) | \(\hat {\lambda }\) (SE) | 1.0000 (0.1000) | 1.7044 (0.3382) | 0.5700 (0.0486) |
sCMP(m=4) | \(\hat {\nu }\) (SE) | 32.5126 (12322.57) | 1.1381 (0.2469) | 0.0000 (0.0826) |
sCMP(m=4) | log(L) | -124.7123 | -228.0175 | -258.8470 |
sCMP(m=4) | AIC | 253.4246 | 460.0350 | 521.6940 |
sCMP(m=4) | BIC | 258.6349 | 465.2453 | 526.9043 |
For model comparison via AIC, Burnham and Anderson (2002) suggest considering Δ _{ i }=AIC _{ i }−AIC _{min}, where AIC _{min} is the minimum of the model AIC values being compared, thus inferring that the best model has Δ=0 and the other models have Δ>0. Model comparisons are thus determined via these difference measures in that “models having Δ _{ i }≤2 have substantial support (evidence), those in which 4≤Δ _{ i }≤7 have considerably less support, and models having Δ _{ i }>10 have essentially no support” in comparison with the best model; see p. 70-71 of Burnham and Anderson (2002). We will apply this approach for model comparison accordingly, and can analogously apply this method using BIC.
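As a worked illustration of these difference measures, the sketch below computes Δ _{ i } from the AIC values reported for the simulated binomial data and attaches the quoted support bands. The function name `support_label` is hypothetical, and differences that fall between the quoted bands are labelled as such rather than assigned to a band.

```python
# AIC values for the simulated Bin(b=3, p*=0.667) data, as reported in the simulation table.
aic = {
    "Poisson": 291.6218,
    "NB": 294.1126,
    "CMP/sCMP(m=1)": 250.5248,
    "sCMP(m=2)": 245.7632,
    "sCMP(m=3)": 236.4972,
    "sCMP(m=4)": 253.4246,
}

def support_label(delta):
    """Map an AIC difference to the bands quoted from Burnham and Anderson (2002)."""
    if delta <= 2:
        return "substantial support"
    if 4 <= delta <= 7:
        return "considerably less support"
    if delta > 10:
        return "essentially no support"
    return "between the quoted bands"

aic_min = min(aic.values())
deltas = {model: value - aic_min for model, value in aic.items()}
for model, delta in sorted(deltas.items(), key=lambda item: item[1]):
    print(f"{model:15s} delta = {delta:8.4f}  ({support_label(delta)})")
```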
The sCMP class of distributions appears to offer a consistent ability to properly model all of the simulated classical data structures. What is interesting to see is the distribution’s resulting parameter estimations as m increases. For the binomial example (i.e. the case of extreme under-dispersion), we see that λ decreases and ν increases for m≤3. While that pattern does not continue for m=4, we see that the log-likelihood value is maximized (and the AIC and BIC values minimized) with the sCMP(m=3) case. We see that the sCMP(\(\hat {\lambda } = 2.0000\), \(\hat {\nu } = 33.6942\), m=3) distribution is the best model, when compared with the other considered distributions. In fact, all other models produce a difference Δ that associates with considerably less support to essentially no support.
The binomial case can be viewed as the summation of three Bernoulli trials, thus we expect the corresponding sCMP estimates to be \(\hat {\lambda } \approx 2\) and \(\hat {\nu } \ge 30\); recall that the special CMP case that corresponds with a Bernoulli distribution occurs when ν→∞ with probability \(\frac {\lambda }{1+\lambda }\), where empirical evidence shows that dispersion parameter estimation is sufficiently achieved when \(\hat {\nu } \approx 30\) or more (see Sellers et al. (2016) and Sellers and Raim (2016) for examples). In fact, for the simulated binomial dataset, we obtain \(\hat {\lambda } = 2.0000\), \(\hat {\nu } = 33.6942\); the obtained estimate for ν implies extreme under-dispersion, thus we have sufficient evidence implying that the estimates approximate a Bernoulli distribution with success probability \(\hat {p}_{\ast } = \frac {2.0000}{1 + 2.0000} = 0.6667\). The sCMP(m=3) distribution best models the binomial data, producing the largest log-likelihood (log(L)=−116.2486), and the smallest AIC and BIC (236.4972 and 241.7075, respectively). In comparison, the Poisson and negative binomial models produce comparable log-likelihoods (both of which are considerably lower than those from the sCMP class) because they are unable to effectively model the under-dispersion present in this dataset. The large negative binomial parameter (\(\hat {\theta } = 276.5396\)) shows that the model is converging to a Poisson model (i.e. towards data equi-dispersion) to fit these data. While the CMP model is able to recognize the dataset as being under-dispersed (\(\hat {\nu } = 3.3931 > 1\)), the form of the distribution still limits the amount of model flexibility it can address.
For the Poisson example, we see that all of the considered models perform comparably well. While the best model is naturally Poisson, this is true more so because the distribution only requires estimating one parameter. All of the models considered produced log-likelihoods equaling approximately −228, thus the associated difference measures imply that the other models (in particular, the sCMP class of distributions) show substantial support for model consideration. The negative binomial estimates (\(\hat {\theta }=239.7812\) and \(\hat {\mu }=6.1100\)) demonstrate the convergence of the negative binomial distribution to the Poisson model as θ→∞ in order to address the limiting case of analyzing equi-dispersed data.
Because the simulation reflects a Poisson(6) dataset, we expect to obtain sCMP parameter estimates \(\hat {\lambda }\approx 6/m\) and \(\hat {\nu }\approx 1\) for all m=1,2,3,4. The obtained estimates for λ and ν are consistently larger than their projected values where \(\hat {\nu }\) increases slightly with m. The corresponding parameter standard errors, however, suggest that none of these estimates is statistically significantly different from their hypothesized values.
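The comparison between the estimates and their hypothesized values can be made explicit with a simple standardized difference, (estimate − hypothesized value)/SE. The sketch below uses the estimates reported for the simulated Poisson data; treating |z|<1.96 as "not significantly different" is an assumption made here for illustration, since the text does not state the exact criterion used.

```python
true_mu = 6.0  # the simulated data are Poisson(6)

# m: (lam_hat, SE(lam_hat), nu_hat, SE(nu_hat)), from the Pois column of the simulation table.
fits = {
    1: (6.9145, 2.1193, 1.0653, 0.1603),
    2: (3.4046, 0.8152, 1.0838, 0.1826),
    3: (2.2683, 0.4822, 1.1093, 0.2127),
    4: (1.7044, 0.3382, 1.1381, 0.2469),
}

z_scores = {}
for m, (lam_hat, se_lam, nu_hat, se_nu) in fits.items():
    z_lam = (lam_hat - true_mu / m) / se_lam   # hypothesized lambda = 6 / m
    z_nu = (nu_hat - 1.0) / se_nu              # hypothesized nu = 1
    z_scores[m] = (z_lam, z_nu)
    print(f"m={m}: z_lambda={z_lam:.3f}, z_nu={z_nu:.3f}")
```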
For the negative binomial example, we see that the sCMP class of distributions again performs well in estimating the form of the simulated dataset. The true parameter values associated with the negative binomial model imply that \(\mu =\frac {n(1-p)}{p} = 6\) and θ=3. The negative binomial distribution (\(\hat {\theta }=3.5599\), \(\hat {\mu }=5.3001\)) is the best model among the distributions considered (AIC = 521.4971); however, the sCMP class of distributions performs better as m increases (among the values of m considered). Larger values for m were not considered here because the sCMP models for m=3,4 produce approximately equal log-likelihood values, thus likewise producing comparable AIC and BIC values; this makes sense because the negative binomial estimate is \(3< \hat {\theta }=3.5599 <4\). Meanwhile, even for the sCMP models where m=1,2, the difference in AIC when compared with the best model still implies that these models show considerable support.
With the sCMP class of distributions, we see that \(\hat {\nu }\) decreases as m increases. Interestingly here, because we know the data are simulated from a negative binomial(n=3,p=0.333) distribution, we expect the sCMP(m=3) distribution to produce estimates \(\hat {\lambda } \approx 0.667\) and \(\hat {\nu } \approx 0\). In fact, the observed estimates (\(\hat {\lambda }=0.6709\) and \(\hat {\nu }=0.0392\)) are within one standard error of the projected estimates. The CMP (i.e. the sCMP(m=1)) model does reasonably well, as evidenced by the resulting log-likelihood and AIC values (−260.3649 and 524.7298, respectively); the CMP estimated dispersion parameter, \(\hat {\nu } = 0.3150\), indicates recognized over-dispersion in the dataset. The Poisson model is the worst performer (with log(L) = −288.0710) because of its constraining equi-dispersion requirement.
5.2 Under-dispersed real data example: word count
Word count model comparisons
Quantity | CMP/sCMP(m=1) | sCMP(m=2) | sCMP(m=3) | sCMP(m=4) |
---|---|---|---|---|
\(\hat {\lambda }\) (SE) | 1.8897 (0.4219) | 0.9120 (0.1511) | 0.5385 (0.0652) | 0.3559 (0.0404) |
\(\hat {\nu }\) (SE) | 2.1033 (0.3858) | 3.7750 (1.0049) | 3.0900 (15045) | 29.7650 (13118) |
log(L) | -118.319 | -117.327 | -117.331 | -118.521 |
AIC | 240.638 | 238.655 | 238.662 | 241.041 |
BIC | 245.848 | 243.865 | 243.873 | 246.252 |
We consider the Poisson, negative binomial, and sCMP(m) models where m=1,2,3,4 to describe the data distribution; Bailey (1990) previously considered a binomial model to describe the data. Table 3 provides the sCMP parameter estimates and standard errors (in parentheses), along with the log-likelihood, AIC, and BIC values for model comparison. The sCMP(m=2) model is the optimal choice, producing a log-likelihood equaling −117.327, and AIC and BIC equaling 238.6546 and 243.8649, respectively. Because this dataset is under-dispersed (with a sample mean and sample variance equaling 1.05 and 0.654, respectively), all models considered from the sCMP family outperform the Poisson and negative binomial models. The Poisson model produces an estimated mean and standard error of 1.0500 (0.1025), with log-likelihood −123.2741. The negative binomial model meanwhile produces estimates (\(\hat {\theta } = 269.9607\) (702.1046), \(\hat {\mu }=1.0500\) (0.1027), log(L)=−123.3487) comparable to the Poisson. Because the data are under-dispersed, the negative binomial model can only perform as well as the Poisson model. As demonstrated, the estimated size parameter is large and the estimated mean equals that from the Poisson model.
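The under-dispersion claim follows from the ratio of the sample variance to the sample mean, the same variance-to-mean index used for the classical distributions in the Introduction. A small sketch using the summary statistics quoted above (`dispersion_index` is a hypothetical helper name):

```python
def dispersion_index(mean, variance):
    """Variance-to-mean ratio: 1 for equi-dispersion, <1 under-, >1 over-dispersion."""
    return variance / mean

# Word-count data: sample mean 1.05 and sample variance 0.654, as reported in the text.
index = dispersion_index(1.05, 0.654)
label = "under-dispersed" if index < 1 else "over-dispersed" if index > 1 else "equi-dispersed"
print(f"dispersion index = {index:.3f} ({label})")
```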
Figure 3 provides the empirical and estimated distributions for these data based on the various considered models, including the estimated binomial frequencies provided in Bailey (1990). This figure confirms the results provided in Table 3, namely that the sCMP(m=2) best represents the shape of the observed distribution for the number of occurrences of an article in 10-word samples from Macaulay’s ‘Essay on Milton’. In particular, we see the small estimated sCMP(m=2) frequency associated with more than 3 articles; recall that no sample was observed to contain more than 3 articles. Meanwhile, the number of occurrences as determined via the Poisson and negative binomial are visibly over- or under-estimated, including a sizable estimated frequency associated with more than 3 articles. The estimated frequencies determined by the binomial distribution, while better than those from the Poisson and negative binomial models, still deviate considerably in comparison to the sCMP class. Finally, while Table 3 shows that the sCMP(m=3) distribution performs comparably well, we nonetheless determine the sCMP(\(\hat {\lambda }=0.9120, \hat {\nu }=3.7750, m=2\)) model to be the best choice to estimate the observed distribution, based on the resulting estimated frequencies shown in Fig. 3.
5.3 Over-dispersed real data example: fetal lamb movement
Fetal lamb summary statistic information
Summary statistic | (a) 5-second data value | (b) 15-second data value |
---|---|---|
No. of obs. | 225 | 75 |
Minimum | 0.000 | 0.000 |
1st Quartile | 0.000 | 0.000 |
Median | 0.000 | 1.000 |
Mean | 0.382 | 1.147 |
3rd Quartile | 1.000 | 2.000 |
Maximum | 7.000 | 12.000 |
Variance | 0.693 | 3.694 |
Std. Deviation | 0.832 | 1.922 |
5-second fetal lamb data model comparisons
Model | Est. | SE | log(L) | AIC | BIC |
---|---|---|---|---|---|
Poisson | \(\hat {\mu }_{*} = 0.382\) | (0.041) | -195.4933 | 394.9866 | 401.8188 |
Geom | \(\hat {p} = 0.723\) | (0.025) | -183.3791 | 368.7583 | 372.1744 |
NB | \(\hat {\mu } = 0.382\) | (0.053) | | | |
NB | \(\hat {\theta } = 0.587\) | (0.200) | -182.3702 | 368.7404 | 375.5726 |
CMP/sCMP(m=1) | \(\hat {\lambda }_{*} = 0.277\) | (0.040) | | | |
CMP/sCMP(m=1) | \(\hat {\nu }_{*} = 0.000\) | (0.264) | -183.3791 | 370.7582 | 377.5904 |
sCMP(m=2) | \(\hat {\lambda } = 0.160\) | (0.019) | | | |
sCMP(m=2) | \(\hat {\nu } = 0.000\) | (0.263) | -186.3873 | 376.7746 | 383.6068 |
sCMP(m=3) | \(\hat {\lambda } = 0.113\) | (0.013) | | | |
sCMP(m=3) | \(\hat {\nu } = 0.000\) | (0.294) | -188.3006 | 380.6012 | 387.4334 |
sCMP(m=4) | \(\hat {\lambda } = 0.087\) | (0.009) | | | |
sCMP(m=4) | \(\hat {\nu } = 0.000\) | (0.213) | -189.5569 | 383.1138 | 389.9460 |
The sCMP class consistently recognizes this real count distribution to be extremely over-dispersed (\(\hat {\nu }=0.000\) and \(\hat {\lambda }<1\)), implying that the sCMP class interprets the data as being represented as sums of size m from geometrically distributed data with some success probability, \(1-\hat {\lambda }\). In fact, the estimates for λ decrease as m increases yet the corresponding log-likelihood value decreases, thus providing a sense of the contour of the larger log-likelihood space that is determined by λ, ν, and m. Because m is a natural number, we find that the optimal sCMP(m) class for modeling the 5-second fetal lamb dataset occurs for m=1, i.e. the CMP distribution with \(\hat {\lambda }=0.277\) and \(\hat {\nu }=0.000\). Continuing with this logic, however, we recognize that one should thus consider the special case of a geometric distribution (i.e. the CMP distribution where ν=0) with approximate success probability \(\hat {p}=1 - 0.277 = 0.723\). Indeed, estimating the observed count distribution via a geometric model produces the estimated success probability \(\hat {p}=0.723\) (with standard error 0.025). While this estimation procedure determines a geometric model to be the best model within the sCMP class, the negative binomial distribution is another viable model, as determined by the criteria of Burnham and Anderson (2002); see Table 5.
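The geometric estimates can be approximately reproduced from the summary statistics alone. The sketch below assumes the standard closed-form maximum likelihood estimate for a geometric distribution on {0, 1, 2, …}, \(\hat{p} = 1/(1+\bar{y})\), with large-sample standard error \(\hat{p}\sqrt{(1-\hat{p})/n}\); the inputs are the rounded means from the summary table, so small discrepancies from the reported values are expected. The function name `geometric_mle` is hypothetical.

```python
import math

def geometric_mle(sample_mean, n):
    """Closed-form geometric fit (support 0, 1, 2, ...) from the sample mean and size."""
    p_hat = 1.0 / (1.0 + sample_mean)
    se = p_hat * math.sqrt((1.0 - p_hat) / n)   # from Fisher information n / (p^2 (1 - p))
    log_l = n * (math.log(p_hat) + sample_mean * math.log(1.0 - p_hat))
    return p_hat, se, log_l

# Rounded summary statistics from the fetal lamb table: (mean, number of observations).
five_sec = geometric_mle(0.382, 225)
fifteen_sec = geometric_mle(1.147, 75)
print("5-second : p_hat=%.4f, SE=%.4f, logL=%.3f" % five_sec)
print("15-second: p_hat=%.4f, SE=%.4f, logL=%.3f" % fifteen_sec)
```

The closed form makes the link to the CMP fit visible: with \(\hat{\nu}=0\), the CMP estimate satisfies \(1-\hat{\lambda} = \hat{p}\).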
15-second fetal lamb data model comparisons
Model | Est. | SE | log(L) | AIC | BIC |
---|---|---|---|---|---|
Poisson | \(\hat {\mu }_{*} = 1.147\) | (0.124) | -131.3450 | 264.690 | 267.010 |
Geom | \(\hat {p} = 0.466\) | (0.039) | -111.2206 | 224.441 | 226.759 |
NB | \(\hat {\mu } = 1.147\) | (0.195) | | | |
NB | \(\hat {\theta } = 0.767\) | (0.251) | -110.9094 | 225.819 | 230.454 |
CMP/sCMP(m=1) | \(\hat {\lambda }_{*} = 0.534\) | (0.093) | | | |
CMP/sCMP(m=1) | \(\hat {\nu }_{*} = 0.000\) | (0.161) | -111.2206 | 226.442 | 231.076 |
sCMP(m=2) | \(\hat {\lambda } = 0.364\) | (0.065) | | | |
sCMP(m=2) | \(\hat {\nu } = 0.000\) | (0.259) | -114.2996 | 232.599 | 237.234 |
sCMP(m=3) | \(\hat {\lambda } = 0.277\) | (0.071) | | | |
sCMP(m=3) | \(\hat {\nu } = 0.000\) | (0.554) | -116.9413 | 237.882 | 242.518 |
sCMP(m=4) | \(\hat {\lambda } = 0.223\) | (0.071) | | | |
sCMP(m=4) | \(\hat {\nu } = 0.000\) | (0.554) | -118.9119 | 241.824 | 246.459 |
Table 6 again displays an interesting trend with respect to the sCMP class estimators. Again, the estimates for λ decrease as m increases, while the dispersion parameter is consistently estimated as \(\hat {\nu } = 0\) (indicating consideration of an appropriate negative binomial model structure). Further, as m increases, the corresponding log-likelihood associated with each of the models decreases. Hence, the optimal sCMP model is again the CMP model (i.e. sCMP when m=1). Meanwhile, the CMP model with estimated dispersion parameter \(\hat {\nu }=0\) again suggests considering a geometric model with success probability \(1-\hat {\lambda } = 0.466\). In fact, the estimated success probability for the geometric model is \(\hat {p}=0.466 \ (0.039)\). The negative binomial model slightly outperforms the CMP model, although both models perform comparably well, based on their respective AIC values (Burnham and Anderson 2002). The slight outperformance of the negative binomial model relative to the CMP/geometric model (based on log-likelihood comparisons) stems from the negative binomial estimation procedure’s allowance for real θ, thus obtaining a more precise estimation of the data over-dispersion. However, because we recognize that this special case of the sCMP class where m=1 and ν=0 corresponds to a geometric model, the geometric model is deemed better than the negative binomial model, given the reduction in the number of estimated parameters and thus the reduced AIC and BIC (224.441 and 226.759 for the geometric model, versus 225.819 and 230.454 for the negative binomial model); see Table 6.
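The effect of the parameter count on these criteria can be reproduced from the reported log-likelihoods. The sketch below uses the usual definitions AIC = 2k − 2 log(L) and BIC = k log(n) − 2 log(L), with k=1 for the geometric model, k=2 for the negative binomial model, and n=75 observations for the 15-second data; the function names are hypothetical.

```python
import math

def aic(log_l, k):
    """Akaike information criterion for a model with k estimated parameters."""
    return 2 * k - 2 * log_l

def bic(log_l, k, n):
    """Bayesian information criterion for k parameters and n observations."""
    return k * math.log(n) - 2 * log_l

n = 75  # number of 15-second intervals
models = {
    "Geom": (-111.2206, 1),   # (reported log-likelihood, number of parameters)
    "NB": (-110.9094, 2),
}
for name, (log_l, k) in models.items():
    print(f"{name:4s} AIC = {aic(log_l, k):.3f}, BIC = {bic(log_l, k, n):.3f}")
```

The negative binomial model has the larger log-likelihood, but the penalty for its second parameter outweighs that gain under both criteria.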
Notice that the sCMP(m=3) parameter estimates associated with the 15-second fetal lamb data equal the sCMP(m=1)/CMP parameter estimates for the 5-second fetal lamb example. This result is logically sound, given the means by which the sCMP distribution is derived; conducting estimations over an interval that is three times its original period is akin to “summing” the three CMP random variables to consider the sCMP model.
This example demonstrates the suitability of the sCMP class to serve as an exploratory tool for count data modeling. For over-dispersed data examples, the negative binomial distribution is generally expected to be a good model to describe the distribution. The sCMP class of distributions contains the negative binomial (and geometric) distribution as a special case; accordingly, it is not necessarily expected for the sCMP distribution to outperform simpler distributions but rather to demonstrate that the sCMP distribution offers insights regarding model considerations. Indeed, applying the sCMP model to these over-dispersed examples motivated consideration of the geometric distribution, which turned out to be an optimal model. Accordingly, while one may not consider the geometric distribution to be a viable model a priori, the sCMP showed why the geometric model is viable.
Discussion
The sum-of-Conway-Maxwell-Poissons (sCMP) class of distributions is a flexible construct for modeling count data that captures several well-known distributions as special cases: the Poisson, negative binomial, binomial, geometric, Bernoulli, and Conway-Maxwell-Poisson (CMP). Just as the CMP distribution bridges the gap between the Poisson, geometric, and Bernoulli distributions through the addition of a dispersion parameter, the sCMP distribution sums over m CMP random variables, producing an encompassing distributional form that has an even greater containment of numerous count distributions.
The provided examples illustrate the flexibility of the sCMP class for handling over- or under-dispersed data. These examples, however, consider only the marginal distribution through unconditional means and variances (and hence unconditional dispersion), thus the true significance of the sCMP class is subdued. In actuality, it is not necessarily straightforward to determine if observed dispersion is true or “apparent”. In a regression setting, for example, dispersion is measured via conditional means and variances, and exploratory data analysis may not detect the true complexity of the data (Sellers and Shmueli 2013). Under such circumstances, the sCMP class can aid with detecting dispersion when a more sophisticated approach is required.
As noted in the over-dispersed data example, we are limited in our ability to estimate m because it is a natural number. We opt for this formulation as it holds true to the form that generalizes the construction of the three special case models (negative binomial, Poisson, and binomial) as sums of their respective special case distributions associated with the CMP distribution (namely, the geometric, Poisson, and Bernoulli models). For example, the negative binomial pmf is often described as the probability of observing y failures before the nth success in a series of Bernoulli trials, or as a sum of n geometric random variables. Yet, the negative binomial distribution can alternatively be derived via a Poisson-gamma mixture, in which case the parameter n is a real number. As Hilbe (2008) notes, “there is no compelling mathematical reason to limit this parameter to integers” (page 82). Future work will consider broadening the sCMP formulation to likewise allow for real-valued m, along with any associated implications of such a definition.
While we estimate the standard errors of the parameter estimates via the approximate information matrix as described in Section 4, the sampling distributions associated with λ and ν are known to possess skewness (Sellers and Shmueli 2013). Thus, an alternative approach is non-parametric bootstrapping. To compute parameter estimates and associated variation in this manner, one can (for example) randomly draw 1000 samples with replacement from the data using the boot package (Canty and Ripley 2015) in R (R Core Team 2017).
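The resampling idea can be sketched without the boot package. The example below is an illustration only: it uses Python's standard library instead of R, a hypothetical toy sample rather than any dataset from this paper, and the closed-form geometric estimate \(\hat{p}=1/(1+\bar{y})\) as a simple stand-in for the sCMP estimates of λ and ν, which would require a numerical fit inside each replicate.

```python
import random
import statistics

def geometric_p_hat(sample):
    """Closed-form geometric estimate p = 1 / (1 + mean); a stand-in statistic."""
    return 1.0 / (1.0 + statistics.mean(sample))

def bootstrap(sample, statistic, n_boot=1000, seed=0):
    """Non-parametric bootstrap: recompute the statistic on resamples drawn with replacement."""
    rng = random.Random(seed)   # fixed seed so the sketch is reproducible
    reps = []
    for _ in range(n_boot):
        resample = rng.choices(sample, k=len(sample))
        reps.append(statistic(resample))
    return reps

# Hypothetical toy counts (NOT data from the paper).
toy = [0, 1, 1, 2, 0, 1, 3, 1, 0, 2, 1, 1, 0, 4, 1, 2, 0, 1, 1, 2]

reps = bootstrap(toy, geometric_p_hat)
boot_se = statistics.stdev(reps)
ordered = sorted(reps)
lower, upper = ordered[24], ordered[974]   # rough 2.5% and 97.5% percentiles of 1000 replicates

print("estimate:", geometric_p_hat(toy))
print("bootstrap SE:", boot_se)
print("rough 95% percentile interval:", (lower, upper))
```

A percentile interval does not rely on symmetry of the sampling distribution, which is the motivation given above for considering the bootstrap when the estimators are skewed.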
Notes
Declarations
Acknowledgements
Support for Kimberly Sellers was provided in part by the American Statistical Association (ASA)/National Science Foundation (NSF)/Census Research Program, U. S. Census Bureau Contract #YA1323-14-SE-0122. Support for Andrew W. Swift was provided by a grant from the Simons Foundation (#359536). Support for Kimberly S. Weems was provided in part by the NSF grant #1700235. The authors thank the reviewers and Drs. Darcy S. Morris (U.S. Census Bureau) and Derek Young (University of Kentucky) for helpful comments regarding the manuscript.
Authors’ contributions
KFS conceived the study. KFS, AS, and KSW contributed to the statistical methods, computational development, and analyses. All authors read and approved the final manuscript.
Competing interests
The authors declare no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
References
- Bailey, BJR: A model for function word counts. J. R. Stat. Soc. Ser. C. 39(1), 107–114 (1990).
- Boatwright, P, Borle, S, Kadane, JB: A model of the joint distribution of purchase quantity and timing. J. Am. Stat. Assoc. 98, 564–572 (2003).
- Borges, P, Rodrigues, J, Balakrishnan, N, Bazán, J: A COM-Poisson type generalization of the binomial distribution and its properties and applications. Stat. Probab. Lett. 87, 158–166 (2014).
- Borle, S, Boatwright, P, Kadane, JB: The timing of bid placement and extent of multiple bidding: An empirical investigation using ebay online auctions. Stat. Sci. 21(2), 194–205 (2006).
- Borle, S, Dholakia, U, Singh, S, Westbrook, R: The impact of survey participation on subsequent behavior: An empirical investigation. Mark. Sci. 26(5), 711–726 (2007).
- Burnham, KP, Anderson, DR: Model Selection and Multimodel Inference. Springer, New York (2002).
- Canty, A, Ripley, B: Boot: Bootstrap Functions. 1.3-15 edn (2015). http://cran.r-project.org/web/packages/boot/index.html.
- Casella, G, Berger, RL: Statistical Inference, Second Edition. Duxbury, Pacific Grove (2002).
- Conway, RW, Maxwell, WL: A queuing model with state dependent service rates. J. Ind. Eng. 12, 132–136 (1962).
- Gilbert, P, Varadhan, R: NumDeriv: Accurate Numerical Derivatives. 2016.8-1 edn (2016). https://cran.r-project.org/web/packages/numDeriv/index.html.
- Guttorp, P: Stochastic Modeling of Scientific Data. Chapman & Hall/CRC, Boca Raton (1995).
- Hilbe, JM: Negative Binomial Regression. Cambridge University Press, United Kingdom (2008).
- Kadane, JB, Krishnan, R, Shmueli, G: A data disclosure policy for count data based on the COM-Poisson distribution. Manag. Sci. 52(10), 1610–1617 (2006).
- Kadane, JB: Sums of possibly associated Bernoulli variables: The Conway-Maxwell-Binomial distribution. Bayesian Anal. 11(2), 403–420 (2016).
- Lord, D, Guikema, SD, Geedipally, SR: Application of the Conway-Maxwell-Poisson generalized linear model for analyzing motor vehicle crashes. Accid. Anal. Prev. 40(3), 1123–1134 (2008).
- Minka, TP, Shmueli, G, Kadane, JB, Borle, S, Boatwright, P: Computing with the COM-Poisson distribution. Technical Report 776, Dept. of Statistics, Carnegie Mellon University (2003). http://www.stat.cmu.edu/tr/tr776/tr776.pdf.
- R Core Team: R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna (2017). https://www.R-project.org/.
- Saghir, A, Lin, Z: Cumulative sum charts for monitoring the COM-Poisson processes. Comput. Ind. Eng. 68, 65–77 (2014).
- Saghir, A, Lin, Z: A flexible and generalized exponentially weighted moving average control chart for count data. Qual. Reliab. Eng. Int. 30(8), 1427–1443 (2014).
- Sellers, KF, Shmueli, G, Borle, S: The COM-Poisson model for count data: a survey of methods and applications. Appl. Stoch. Model. Bus. Ind. 28, 104–116 (2011).
- Sellers, K: A distribution describing differences in count data containing common dispersion levels. Adv. Appl. Stat. Sci. 7(3), 35–46 (2012).
- Sellers, KF, Shmueli, G: A regression model for count data with observation-level dispersion. In: Booth, JG (ed.) Proceedings of the 24th International Workshop on Statistical Modelling, pp. 337–344. Cornell University Press, Ithaca (2009).
- Sellers, KF, Shmueli, G: Data dispersion: Now you see it... now you don’t. Commun. Stat. Theory Methods. 42, 1–14 (2013).
- Sellers, KF: A generalized statistical control chart for over- or under-dispersed data. Qual. Reliab. Eng. Int. 28(1), 59–65 (2012).
- Sellers, KF, Raim, A: A flexible zero-inflated model to address data dispersion. Comput. Stat. Data Anal. 99, 68–80 (2016).
- Sellers, KF, Shmueli, G: A flexible regression model for count data. Ann. Appl. Stat. 4(2), 943–961 (2010).
- Sellers, KF, Morris, DS, Balakrishnan, N: Bivariate Conway-Maxwell-Poisson distribution: Formulation, properties, and inference. J. Multivar. Anal. 150, 152–168 (2016).
- Shmueli, G, Minka, TP, Kadane, JB, Borle, S, Boatwright, P: A useful distribution for fitting discrete data: revival of the Conway-Maxwell-Poisson distribution. Appl. Stat. 54, 127–142 (2005).
- Venables, WN, Ripley, BD: Modern Applied Statistics with S. 4th edn. Springer, New York (2002).
- Wimmer, G, Köhler, R, Frotjahn, R, Altmann, G: Towards a theory of word length distribution. J. Quant. Linguist. 1(1), 98–106 (1994).
- Zhu, L, Sellers, KF, Morris, DS, Shmuéli, G: Bridging the gap: A generalized stochastic process for count data. Am. Stat. 71(1), 71–80 (2017).