A flexible univariate moving average time-series model for dispersed count data

Sellers, Kimberly F.; Arab, Ali; Melville, Sean; Cui, Fanyu

doi:10.1186/s40488-021-00115-2

Research
Open access
Published: 21 February 2021

Kimberly F. Sellers ORCID: orcid.org/0000-0001-6516-0548^1,2,
Ali Arab¹,
Sean Melville¹ &
Fanyu Cui¹

Journal of Statistical Distributions and Applications volume 8, Article number: 1 (2021)

Abstract

Al-Osh and Alzaid (1988) consider a Poisson moving average (PMA) model to describe the relation among integer-valued time series data; this model, however, is constrained by the underlying equi-dispersion assumption for count data (i.e., that the variance and the mean are equal). This work instead introduces a flexible integer-valued moving average model for count data that contain over- or under-dispersion via the Conway-Maxwell-Poisson (CMP) distribution and related distributions. This first-order sum-of-Conway-Maxwell-Poissons moving average (SCMPMA(1)) model offers a generalizable construct that includes the PMA (among others) as a special case. We highlight the SCMPMA model properties and illustrate its flexibility via simulated data examples.

Introduction

Integer-valued thinning-based models have been proposed to model time series data represented as counts. Al-Osh and Alzaid (1988) introduce a generally defined integer-valued moving average (INMA) process as an analog to the moving average (MA) model for continuous data, which assumes an underlying Gaussian distribution. This INMA process instead utilizes a thinning operator that maintains an integer-valued range of possible outcomes. To form such a model, they consider the “survivals” of independent and identically distributed (iid) non-negative integer-valued random innovations to maintain and ensure discrete data outcomes (Weiss 2021). Al-Osh and Alzaid (1988) particularly consider a first-order Poisson moving average (PMA(1)), i.e. a stationary sequence U_t of the form U_t=γ∘ε_t−1+ε_t where {ε_t} is a sequence of iid Poisson(η) random variables and $(\gamma \circ \epsilon) = \sum _{i=1}^{\epsilon } B_{i}$ for a sequence of iid Bernoulli(γ) random variables {B_i} independent of {ε_t}. By design, the PMA(1) is an INMA whose maximum stay time in the sequence is two time units. Consequently, consecutive terms of {U_t} are dependent, while the two components of each U_t, namely ε_t and γ∘ε_t−1, are independent.

Given the PMA(1) structure,

$$\begin{array}{@{}rcl@{}} E(U_{t}) = Var(U_{t}) = (1 + \gamma)\eta, \end{array} $$

(1)

and the covariance of consecutive variables Cov(U_t−1,U_t)=γη; this implies that the correlation is

$$\begin{array}{@{}rcl@{}} \rho_{U}(r) = Corr(U_{t-r}, U_{t}) = \left\{ \begin{array}{lll} \frac{\gamma}{1+\gamma} & r=1 \\ 0 & r> 1. \end{array} \right. \end{array} $$

(2)
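Eqs. (1)-(2) can be checked empirically. The following is a minimal sketch, not code from the original work: it simulates a PMA(1) series with binomial thinning using only the Python standard library. The function names (`poisson_draw`, `thin`, `simulate_pma1`, `sample_moments`) and the values γ=0.4, η=2 are illustrative assumptions.

```python
import math
import random


def poisson_draw(rng, rate):
    """Draw one Poisson(rate) variate by Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def thin(rng, gamma, count):
    """Binomial thinning: survivors among `count` iid Bernoulli(gamma) trials."""
    return sum(1 for _ in range(count) if rng.random() < gamma)


def simulate_pma1(gamma, eta, length, seed=0):
    """Simulate U_t = gamma o eps_{t-1} + eps_t with iid Poisson(eta) innovations."""
    rng = random.Random(seed)
    eps = [poisson_draw(rng, eta) for _ in range(length + 1)]
    return [thin(rng, gamma, eps[t - 1]) + eps[t] for t in range(1, length + 1)]


def sample_moments(series):
    """Return the sample mean, variance, and lag-1 autocorrelation."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series) / n
    cov1 = sum((series[i] - mean) * (series[i + 1] - mean) for i in range(n - 1)) / n
    return mean, var, cov1 / var


mean, var, rho1 = sample_moments(simulate_pma1(gamma=0.4, eta=2.0, length=50000))
# Theory from Eqs. (1)-(2): mean = var = (1 + 0.4) * 2 = 2.8 and rho1 = 0.4 / 1.4.
print(mean, var, rho1)
```

The sample mean and variance should both sit near (1+γ)η, and the lag-one autocorrelation near γ/(1+γ), up to simulation noise.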

Meanwhile, the probability generating function (pgf) of U_t is $\Phi _{U_{t}} (u) = e^{-\eta (1+\gamma)(1-u)}$, the joint pgf of {U₁,…,U_r} is $\Phi _{r}(u_{1}, \ldots, u_{r}) = \exp \left (-\eta \left [r + \gamma - (1-\gamma)\sum _{i=1}^{r}u_{i} - \gamma (u_{1} + u_{r}) - \gamma \sum _{i=1}^{r-1}u_{i} u_{i+1}\right ]\right)$ (which implies that time reversibility holds for the PMA), and the pgf of $T_{U,r}= \sum _{i=1}^{r}U_{i}$ is

$$\Phi_{T_{U,r}}(u)=\exp\left(-\eta\left[(1-\gamma)r+2\gamma\right](1-u)-\eta \gamma(r-1)(1-u^{2})\right).$$

Al-Osh and Alzaid (1988) note that T_U,r does not have a Poisson distribution, which is in contrast to the standard (Gaussian) MA(1) process, whose partial sums remain Gaussian. The conditional mean and variance of U_t+1 given U_t=u are both linear in u, namely

$$\begin{array}{@{}rcl@{}} E(U_{t+1} \mid U_{t}= u) &=& \eta + \gamma u/(1+\gamma), \text{ and} \end{array} $$

(3)

$$\begin{array}{@{}rcl@{}} Var(U_{t+1} \mid U_{t}= u) &=& \eta + \gamma u/(1+\gamma)^{2}. \end{array} $$

(4)
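One way to see Eqs. (3)-(4) is sketched below; this short derivation is added for illustration under the PMA(1) assumptions above and is not reproduced from Al-Osh and Alzaid (1988). Because ε_t∼Poisson(η) and γ∘ε_t−1∼Poisson(γη) are independent and sum to U_t, the innovation ε_t given U_t=u is Binomial(u, 1/(1+γ)). Thinning ε_t and applying the laws of total expectation and total variance gives

$$\begin{array}{@{}rcl@{}} E(\gamma \circ \epsilon_{t} \mid U_{t}=u) &=& \gamma E(\epsilon_{t} \mid U_{t}=u) = \frac{\gamma u}{1+\gamma}, \\ Var(\gamma \circ \epsilon_{t} \mid U_{t}=u) &=& \gamma(1-\gamma) E(\epsilon_{t} \mid U_{t}=u) + \gamma^{2} Var(\epsilon_{t} \mid U_{t}=u) \\ &=& \frac{\gamma(1-\gamma) u}{1+\gamma} + \frac{\gamma^{3} u}{(1+\gamma)^{2}} = \frac{\gamma u}{(1+\gamma)^{2}}. \end{array} $$

Adding the independent innovation ε_t+1∼Poisson(η), which contributes η to both the mean and the variance, recovers Eqs. (3) and (4).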

The PMA is a natural choice for modeling an integer-valued process, in part because of its tractability (Al-Osh and Alzaid 1988). This model, however, is limited by its constraining equi-dispersion property, i.e. the assumption that the mean and variance of the underlying process are equal. Real data do not generally conform to this construct (Hilbe 2014; Weiss 2018); they usually display over-dispersion relative to the Poisson model (i.e. the variance is greater than the mean). Integer-valued data that express under-dispersion relative to the Poisson model (i.e. the variance is less than the mean) are, however, surfacing with greater frequency. Accordingly, it would be fruitful to instead consider a flexible time series model that can accommodate data over- and/or under-dispersion.

Alzaid and Al-Osh (1993) introduce a first-order generalized Poisson moving average (GPMA(1)) process as an alternative to the PMA. The associated model has the form,

$$\begin{array}{@{}rcl@{}} W_{t}=Q^{*}_{t}\left(\epsilon^{*}_{t-1}\right)+\epsilon^{*}_{t}, & t = 0, 1, 2, \ldots, \end{array} $$

(5)

where $\left \{\epsilon ^{*}_{t}\right \}$ is a sequence of iid generalized Poisson GP(μ^∗,θ) random variables, and $\{Q^{*}_{t}(\cdot)\}$ is a sequence of quasi-binomial QB(p^∗,θ/μ^∗,·) random operators independent of $\{\epsilon ^{*}_{t}\}$. As with the PMA, W_t+r and W_t are independent for |r|>1. The marginal distribution of W_t is GP((1+p^∗)μ^∗,θ). Recognizing the relationship between moving average and autoregressive models, Alzaid and Al-Osh (1993) equate terms in this GPMA(1) model to their first-order generalized Poisson autoregressive (GPAR(1)) counterpart,

$$\begin{array}{@{}rcl@{}} W_{t}=Q_{t}(W_{t-1})+\epsilon_{t}, & t = 0, 1, 2, \ldots, \end{array} $$

(6)

where {ε_t} is a sequence of iid GP(qμ,θ) random variables where q=1−p, and {Q_t(·)} is a sequence of QB(p,θ/μ,·) random operators, independent of {ε_t}; i.e., they let μ=(1+p^∗)μ^∗ and $p = \frac {p^{*}}{1+p^{*}}$. The bivariate pgf of W_t+1 and W_t can thus be represented as

$$\begin{array}{@{}rcl@{}} \Phi_{W_{t+1}, W_{t}}(u_{1},u_{2}) &=& \exp\left[\mu^{*}\left(A_{\theta}(u_{1})+A_{\theta}(u_{2})-2\right)+\mu^{*}p^{*}\left(A_{\theta}(u_{1}u_{2})-1\right)\right] \end{array} $$

(7)

$$\begin{array}{@{}rcl@{}} &=& \exp\left[\mu q\left(A_{\theta}(u_{1})+A_{\theta}(u_{2})-2\right)+\mu p\left(A_{\theta}(u_{1}u_{2})-1\right)\right], \end{array} $$

(8)

where A_θ(s) is the inverse function that satisfies A_θ(se^−θ(s−1))=s; see Alzaid and Al-Osh (1993). This substitution in Eq. (7) to obtain Eq. (8) further illustrates the relationship between the GPMA(1) and GPAR(1) models such that they have the same joint pgf. Eq. (8) and the related GPAR work of Alzaid and Al-Osh (1993) therefore show that

$$\begin{array}{@{}rcl@{}} E(W_{t} \mid W_{t-1} =w)=pw+\frac{q\mu}{1-\theta}. \end{array} $$

(9)

The joint pgf of ($W_{t}, W_{t-1}, \dots, W_{t-r+1}$) is given by

$$\begin{array}{@{}rcl@{}} \Phi(u_{1}, \dots, u_{r})= \exp\left[\mu q\sum_{i=1}^{r}(A_{\theta}(u_{i})-1)+\mu p\sum_{i=1}^{r-1}\left(A_{\theta}(u_{i} u_{i+1})-1\right)\right]. \end{array} $$

(10)

From the joint pgf, we see that the GPMA(1) is also time-reversible, because it has the same dynamics if time is reversed. Further, the pgf associated with the total counts occurring during time lag r (i.e. $T_{W,r}=\sum _{i=1}^{r}W_{t-r+i}$) is $\Phi _{T_{W,r}}(u)=\exp \left [\mu q r(A_{\theta }(u)-1) + \mu p(r-1)(A_{\theta }(u^{2})-1)\right ]$. Alzaid and Al-Osh (1993) note that this result extends the analogous PMA result to the broader GPMA(1) model. Finally, the GPMA autocorrelation function is

$$\rho_{W}(r) = \text{Corr}(W_{t}, W_{t+r}) = \left\{ \begin{array}{ll} {p} & |r|=1 \\ 0 & |r|> 1, \end{array} \right. $$

where p=p^∗(1+p^∗)⁻¹; by definition, ρ_W(r)∈[0,0.5] (Alzaid and Al-Osh 1993).

Even though the GPMA can be considered to model over- or under-dispersed count time series, it may not be a viable option for count data that express extreme under-dispersion; see, e.g. Famoye (1993). This work instead introduces another alternative for modeling integer-valued time series data. The subsequent writing proceeds as follows. We first provide background regarding the probability distributions that motivate the development of our flexible INMA model. Then, we introduce the SCMPMA(1) model to the reader and discuss its statistical properties. The subsequent section illustrates the model flexibility through simulated and real data examples. Finally, the manuscript concludes with discussion.

Motivating distributions

While the above constructs show increased ability and improvement towards modeling integer-valued time series data with various forms of dispersion, each of the models suffers from respective limitations. In order to develop and describe our SCMPMA(1), we first introduce its underlying motivating distributions: the CMP distribution and its generalized sum-of-CMPs distribution (sCMP), as well as the Conway-Maxwell-Binomial (CMB) along with a generalized CMB (gCMB) distribution.

The Conway-Maxwell-Poisson distribution and its generalization

The Conway-Maxwell-Poisson (CMP) distribution (introduced by Conway and Maxwell (1962), and revived by Shmueli et al. (2005)) is a viable count distribution that generalizes the Poisson distribution in light of potential data dispersion. The CMP probability mass function (pmf) takes the form

$$ P(X=x \mid \lambda, \nu) = \frac{\lambda^{x}}{(x!)^{\nu} \zeta(\lambda,\nu)}, \;\;\; x=0,1,2,\ldots, $$

(11)

for a random variable X, where λ=E(X^ν)≥0, ν≥0 is the associated dispersion parameter, and $\zeta (\lambda, \nu) = \sum _{s=0}^{\infty } \frac {\lambda ^{s}}{(s!)^{\nu }}$ is the normalizing constant. The CMP distribution includes three well-known distributions as special cases, namely the Poisson (ν=1), geometric (ν=0, λ<1), and Bernoulli $\left (\nu \rightarrow \infty \text { with success probability } \frac {\lambda }{1+\lambda } \right)$ distributions.

The associated pgf of X is $\Phi _{X}(u) = E(u^{X}) = \frac {\zeta (\lambda u, \nu)}{\zeta (\lambda, \nu)}$, and its moment generating function (mgf) is $\mathrm {M}_{X}(u) = E(e^{Xu}) = \frac {\zeta (\lambda e^{u}, \nu)}{\zeta (\lambda, \nu)}$. The moments can meanwhile be represented recursively as

$$\begin{array}{@{}rcl@{}} E(X^{g+1}) = \left\{ \begin{array}{ll} \lambda E\left[(X+1)^{1-\nu}\right], & g=0\\ \lambda \frac{\partial}{\partial\lambda} E(X^{g}) + E(X)E(X^{g}), &g>0. \end{array} \right. \end{array} $$

(12)

In particular, the expected value and variance can be written in the form and approximated respectively as

$$\begin{array}{*{20}l} E(X) &= \frac{\partial \ln \zeta(\lambda, \nu)}{\partial \ln \lambda} \approx \lambda^{1/\nu} - \frac{\nu - 1}{2\nu}, \end{array} $$

(13)

$$\begin{array}{*{20}l} Var(X) &= \frac{\partial E(X)}{\partial\ln\lambda} \approx \frac{1}{\nu} \lambda^{1 / \nu}, \end{array} $$

(14)

where the approximations are especially good for ν≤1 or λ>10^ν (Shmueli et al. 2005). This distribution is a member of the exponential family, where the joint pmf of the random sample x=(x₁,…,x_N) is

$$\begin{array}{@{}rcl@{}} P(\boldsymbol{x} \mid \lambda, \nu) &=& \frac{\prod_{i=1}^{N} \lambda^{x_{i}}}{\left(\prod_{i=1}^{N} x_{i}!\right)^{\nu}}\cdot \zeta^{-N}(\lambda, \nu) = \lambda^{S_{1}} \exp(-\nu S_{2})\zeta^{-N}(\lambda, \nu), \end{array} $$

where $S_{1} = \sum _{i=1}^{N}x_{i}$ and $S_{2} = \sum _{i=1}^{N} \log (x_{i}!)$ are joint sufficient statistics for λ and ν. Further, because the CMP distribution belongs to the exponential family, the conjugate prior distribution has the form $h(\lambda, \nu) = \lambda ^{a-1} e^{-\nu b} \zeta ^{-c}(\lambda, \nu) \delta (a,b,c)$, where λ>0, ν≥0, and δ(a,b,c) is a normalizing constant such that $\delta ^{-1}(a,b,c) = \int _{0}^{\infty } \int _{0}^{\infty } \lambda ^{a-1} e^{-b \nu } \zeta ^{-c}(\lambda, \nu) d\lambda d\nu < \infty $.
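The CMP quantities above are straightforward to evaluate numerically. The sketch below is an illustration rather than the authors' implementation: it truncates the series for ζ(λ,ν) at a fixed number of terms (an assumption that is only adequate when the omitted tail is negligible, e.g. moderate λ, or λ<1 when ν=0) and compares the resulting mean and variance with the approximations in Eqs. (13)-(14). The function names are hypothetical.

```python
import math


def cmp_pmf(lam, nu, terms=200):
    """CMP(lam, nu) pmf on 0..terms-1; zeta(lam, nu) is truncated to `terms` terms."""
    # Work on the log scale so that (x!)**nu cannot overflow.
    logs = [x * math.log(lam) - nu * math.lgamma(x + 1) for x in range(terms)]
    peak = max(logs)
    weights = [math.exp(v - peak) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]


def cmp_mean_var(lam, nu, terms=200):
    """Mean and variance computed directly from the truncated pmf."""
    pmf = cmp_pmf(lam, nu, terms)
    mean = sum(x * p for x, p in enumerate(pmf))
    var = sum((x - mean) ** 2 * p for x, p in enumerate(pmf))
    return mean, var


def cmp_mean_var_approx(lam, nu):
    """Closed-form approximations from Eqs. (13)-(14); requires nu > 0."""
    root = lam ** (1.0 / nu)
    return root - (nu - 1) / (2 * nu), root / nu


for lam, nu in [(4.0, 1.0), (3.0, 0.5), (0.5, 0.0)]:
    exact = cmp_mean_var(lam, nu)
    approx = cmp_mean_var_approx(lam, nu) if nu > 0 else None
    print(lam, nu, exact, approx)
```

For ν=1 the numerical moments reduce to the Poisson values (mean and variance both λ), and for ν=0 with λ<1 they reduce to the geometric values λ/(1−λ) and λ/(1−λ)².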

Meanwhile, letting $X_{*} = \sum _{i=1}^{n} X_{i}$ for iid random variables X_i∼CMP(λ,ν),i=1,…,n, we say that X_∗ is distributed as a sum-of-CMPs [denoted sCMP(λ,ν,n)] variable, and has the pmf

$$\begin{array}{@{}rcl@{}} P(X_{*} = x_{*}) = \frac{\lambda^{x_{*}}}{({x_{*}}!)^{\nu} \zeta^{n}(\lambda, \nu)} \sum_{\stackrel {a_{1},\ldots,a_{n} = 0} {a_{1} + \ldots + a_{n} = {x_{*}}} }^{{x_{*}}} {{x_{*}} \choose a_{1}, \hspace{0.05in} \cdots, \hspace{0.05in} a_{n}}^{\nu},\ {x_{*}}=0,1,2,\ldots, \end{array} $$

where ζⁿ(λ,ν) is the nth power of ζ(λ,ν), and ${{x_{*}} \choose a_{1}, \hspace {0.05in} \cdots, \hspace {0.05in} a_{n}} = \frac {{x_{*}}!}{a_{1}! \cdots a_{n}!}$ is a multinomial coefficient. The sCMP(λ,ν,n) distribution encompasses the Poisson distribution with rate parameter nλ (for ν=1), negative binomial(n,1−λ) distribution (for ν=0 and λ<1), and Binomial(n,p) distribution $\left (\text {as}\ \nu \rightarrow \infty \text { with success probability} p=\frac {\lambda }{\lambda + 1}\right)$ as special cases. Further, for n=1, the sCMP(λ,ν,n=1) is simply the CMP(λ,ν) distribution.

The mgf and pgf for a sCMP(λ,ν,n) random variable X_∗ are

$$M_{X_{*}}(t) = \left(\frac{\zeta(\lambda e^{t}, \nu)}{\zeta(\lambda, \nu)} \right)^{n} \text{ and}\ \Phi_{X_{*}}(t) = \left(\frac{\zeta(\lambda t, \nu)}{\zeta(\lambda, \nu)} \right)^{n},$$

respectively; accordingly, the sCMP(λ,ν,n) distribution has mean E(X_∗)=nE(X) and variance V(X_∗)=nV(X), where E(X) and V(X) are defined in Eqs. (13)-(14), respectively. Invariance under addition holds for two independent sCMP distributions with the same rate and dispersion parameters. See Sellers et al. (2017) for additional information regarding the sCMP distribution.
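Because a sCMP(λ,ν,n) variable is a sum of n iid CMP(λ,ν) variables, its pmf can also be obtained by n-fold convolution instead of the multinomial sum. The following sketch assumes a truncated support (the `terms` argument, a hypothetical tuning choice) and checks the three special cases named above for λ=0.5 and n=3; ν=35 is used as a stand-in for ν→∞.

```python
import math


def cmp_pmf(lam, nu, terms=100):
    """CMP(lam, nu) pmf on 0..terms-1 (truncated normalizing constant)."""
    logs = [x * math.log(lam) - nu * math.lgamma(x + 1) for x in range(terms)]
    peak = max(logs)
    weights = [math.exp(v - peak) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]


def convolve(p, q):
    """Convolution of two equal-length pmfs, dropping mass beyond the grid."""
    out = [0.0] * len(p)
    for i, p_i in enumerate(p):
        for j in range(len(p) - i):
            out[i + j] += p_i * q[j]
    return out


def scmp_pmf(lam, nu, n, terms=100):
    """sCMP(lam, nu, n) pmf as the n-fold convolution of CMP(lam, nu)."""
    base = cmp_pmf(lam, nu, terms)
    pmf = base
    for _ in range(n - 1):
        pmf = convolve(pmf, base)
    return pmf


poisson_case = scmp_pmf(0.5, 1.0, 3)    # should match Poisson(1.5)
negbin_case = scmp_pmf(0.5, 0.0, 3)     # should match negative binomial(3, 0.5)
binomial_case = scmp_pmf(0.5, 35.0, 3)  # should be close to Binomial(3, 1/3)
print(poisson_case[:4], negbin_case[:4], binomial_case[:4])
```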

The Conway-Maxwell-Binomial distribution and its generalization

The Conway-Maxwell-Binomial distribution of Kadane (2016) (also known as the Conway-Maxwell-Poisson-Binomial distribution by Borges et al. (2014)) is a three-parameter generalization of the Binomial distribution. Denoted CMB(d,p,ν), it has pmf

$$\begin{array}{@{}rcl@{}} P(Y=y) = \frac{{d \choose y}^{\nu} p^{y}(1-p)^{d-y}}{\chi(p,\nu, d)}, \text{ } y = 0, \ldots, d \end{array} $$

(15)

for some random variable Y where $0 \le p \le 1, \nu \in \mathbb {R}$, and $\chi (p, \nu, d) = \sum _{y=0}^{d} {d \choose y}^{\nu } p^{y}(1-p)^{d-y}$ is the associated normalizing constant. The Binomial(d,p) distribution is the special case of the CMB(d,p,ν) where ν=1. Meanwhile, ν>1 (ν<1) corresponds to under-dispersion (over-dispersion) relative to the Binomial distribution. For ν→∞, the pmf concentrates where the binomial coefficient is largest, i.e. at d/2 for even d (and on the two integers nearest d/2 for odd d) while, for ν→−∞, the pmf is concentrated at 0 or d. For independent X_i∼CMP(λ_i,ν), i=1,2, the conditional distribution of X₁ given that X₁+X₂=d is a $\text {CMB}\left (d, \frac {\lambda _{1}}{\lambda _{1}+\lambda _{2}}, \nu \right)$ distribution.
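The CMB pmf in Eq. (15) and its link to the CMP distribution can be verified numerically. The sketch below is illustrative only (the function names are hypothetical): it evaluates the CMB pmf directly and compares it with the conditional distribution of X₁ given X₁+X₂=d for independent CMP variables that share ν.

```python
import math


def cmb_pmf(d, p, nu):
    """CMB(d, p, nu) pmf on 0..d, following Eq. (15)."""
    weights = [math.comb(d, y) ** nu * p ** y * (1 - p) ** (d - y) for y in range(d + 1)]
    chi = sum(weights)  # normalizing constant chi(p, nu, d)
    return [w / chi for w in weights]


def cmp_conditional_pmf(lam1, lam2, nu, d):
    """Distribution of X1 given X1 + X2 = d for independent CMP(lam_i, nu) variables."""
    # The zeta normalizers cancel in the conditional distribution.
    joint = [
        lam1 ** y / math.factorial(y) ** nu * lam2 ** (d - y) / math.factorial(d - y) ** nu
        for y in range(d + 1)
    ]
    total = sum(joint)
    return [w / total for w in joint]


def variance(pmf):
    """Variance of a pmf supported on 0..len(pmf)-1."""
    mean = sum(y * q for y, q in enumerate(pmf))
    return sum((y - mean) ** 2 * q for y, q in enumerate(pmf))


d, p = 10, 0.3
for nu in (0.5, 1.0, 2.0):
    # nu = 1 reproduces the Binomial variance d * p * (1 - p).
    print(nu, variance(cmb_pmf(d, p, nu)))

# Conditional CMP distribution versus CMB(d, lam1 / (lam1 + lam2), nu).
print(cmp_conditional_pmf(1.5, 3.5, 1.7, 8)[:3], cmb_pmf(8, 0.3, 1.7)[:3])
```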

The pgf and mgf of Y have the form,

$$\begin{array}{@{}rcl@{}} \Phi_{Y}(u) = E\left(u^{Y}\right) = \frac{\tau\left(\frac{up}{1-p}, \nu, d\right)}{\tau\left(\frac{p}{1-p}, \nu, d\right)} \hspace{.25in} \text{and} \hspace{.25in} M_{Y}(u) = \frac{\tau\left(\frac{pe^{u}}{1-p}, \nu, d\right)}{\tau\left(\frac{p}{1-p}, \nu, d\right)}, \end{array} $$

(16)

respectively, where $\tau (\theta _{*}, \nu, d) = \sum _{y=0}^{d} {d \choose y}^{\nu } \theta _{*}^{y}$ for some θ_∗. The CMB distribution is a member of the exponential family whose joint pmf of the random sample y={y₁,…,y_N} is

$$\begin{array}{@{}rcl@{}} P(\boldsymbol{y} \mid p, \nu) & \propto & (1-p)^{dN} (d!)^{N\nu} \prod_{i=1}^{N} \left(\frac{p}{1-p} \right)^{y_{i}} \frac{1}{[y_{i}!(d-y_{i})!]^{\nu}}\\ & \propto & \exp\left(S_{*1} \log \left(\frac{p}{1-p}\right) - \nu S_{*2} \right), \end{array} $$

where $S_{*1} = \sum _{i=1}^{N}y_{i}$ and $S_{*2} = \sum _{i=1}^{N} \log [y_{i}! (d-y_{i})!]$ are the joint sufficient statistics for p and ν. Further, its existence as a member of the exponential family implies that a conjugate prior family exists of the form,

$$h(\theta_{*}, \nu) = \theta_{*}^{a-1}e^{-\nu b}\omega^{-c}(\theta_{*}, \nu)\psi(a,b,c), \hspace{.25in} 0 < \theta_{*} < \infty, \text{} 0 < \nu < \infty, $$

where $\omega (\theta _{*}, \nu) = \sum _{y=0}^{d} \theta _{*}^{y}/[y!(d-y)!]^{\nu }, \psi ^{-1}(a,b,c) = \int _{0}^{\infty } \int _{0}^{\infty } \theta _{*}^{a-1} e^{-\nu b} \omega ^{-c}(\theta _{*}, \nu)d\theta _{*} d\nu < \infty $ (Kadane 2016).

Sellers et al. (2017) further introduce a generalized Conway-Maxwell-Binomial (gCMB) distribution whose pmf is

$$ \begin{aligned} P(Z=z) \propto {s \choose z}^{\nu} p^{z} (1-p)^{s-z} {\left[{\sum_{\stackrel {a_{1},\ldots,a_{n_{1}} = 0} {a_{1} + \ldots + a_{n_{1}} = z} }^{z} {z \choose {a_{1}, \hspace{0.05in} \dots, \hspace{0.05in} a_{n_{1}}} }^{\nu}} \right] } {\left[{\sum_{\stackrel {b_{1}, \ldots, b_{n_{2}} = 0} {b_{1} + \ldots + b_{n_{2}} = s-z} }^{s-z} {s-z \choose {b_{1}, \hspace{0.05in} \dots, \hspace{0.05in} b_{n_{2}}} }^{\nu}} \right] } \end{aligned} $$

(17)

for a random variable Z with parameters (p,ν,s,n₁,n₂). As with the conditional probability of a CMP random variable given the sum of it and another independent CMP random variable sharing the same dispersion parameter, a special case of a gCMB distribution can be derived as the conditional distribution of X_∗1, given the sum X_∗1+X_∗2=d for independent sCMP random variables, X_∗i∼ sCMP (λ_i,ν,n_i),i=1,2; the resulting distribution is analogously a gCMB$\left (\frac {\lambda _{1}}{\lambda _{1}+\lambda _{2}},\nu, d, n_{1}, n_{2}\right)$ distribution. The gCMB distribution contains several special cases, including the CMB (d,p,ν) distribution (for n₁=n₂=1); the Binomial (d,p) distribution (when n₁=n₂=1 and ν=1); and, for λ₁=λ₂=λ, the hypergeometric distribution when ν→∞ and the negative hypergeometric distribution when ν=0 and λ<1.
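The gCMB pmf in Eq. (17) can be evaluated without enumerating every multinomial term. The sketch below is an illustration with hypothetical function names: it uses the identity that the sum of ν-th powers of multinomial coefficients equals (z!)^ν times an n-fold convolution of 1/(a!)^ν, and is intended for small s only, since the factorials are raised to floating-point powers.

```python
import math


def multinomial_power_sum(total, parts, nu):
    """Sum of multinomial(total; a_1..a_parts)**nu over a_1 + ... + a_parts = total."""
    base = [math.factorial(a) ** (-nu) for a in range(total + 1)]
    acc = base
    for _ in range(parts - 1):
        acc = [sum(acc[i] * base[k - i] for i in range(k + 1)) for k in range(total + 1)]
    return math.factorial(total) ** nu * acc[total]


def gcmb_pmf(p, nu, s, n1, n2):
    """gCMB(p, nu, s, n1, n2) pmf on 0..s, following Eq. (17)."""
    weights = [
        math.comb(s, z) ** nu * p ** z * (1 - p) ** (s - z)
        * multinomial_power_sum(z, n1, nu)
        * multinomial_power_sum(s - z, n2, nu)
        for z in range(s + 1)
    ]
    total = sum(weights)
    return [w / total for w in weights]


# n1 = n2 = 1 reduces to CMB(s, p, nu); with nu = 1 and p = 1/2 the pmf is
# Binomial(s, n1 / (n1 + n2)), i.e. ordinary binomial thinning.
print(gcmb_pmf(0.4, 1.8, 6, 1, 1))
print(gcmb_pmf(0.5, 1.0, 6, 1, 2))
```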

First-order sCMP time series models

This section highlights two first-order models for discrete time series data that have a sCMP marginal distribution, namely the first-order sCMP autoregressive (SCMPAR(1)) model, and a first-order sCMP moving average (SCMPMA(1)) model with the same marginal distribution structure.

First-order sCMP autoregressive (SCMPAR(1)) model

Sellers et al. (2020) introduce a first-order sCMP autoregressive (SCMPAR(1)) model to describe count data correlated in time that express over- or under-dispersion. Building on the sCMP and gCMB distributions (as described in the “Motivating distributions” section, with more detail available in Sellers et al. (2017)), the sCMP distribution models the marginals of the first-order integer-valued autoregressive (INAR(1)) process as

$$\begin{array}{@{}rcl@{}} X_{t}=C_{t}(X_{t-1})+\epsilon_{t} \hspace{.3in} t=1, 2, \ldots, \end{array} $$

(18)

where ε_t∼sCMP(λ,ν,n₂), and {C_t(∙): t=1,2,…} is a sequence of independent gCMB$\left (\frac {1}{2}, \nu, \bullet, n_{1}, n_{2}\right)$ operators, independent of {ε_t}. This flexible INAR(1) model contains the first-order Poisson autoregressive (PAR(1)) as described in several references (Al-Osh and Alzaid 1987; McKenzie 1988; Weiss 2008), and the first-order binomial auto-regressive model of Al-Osh and Alzaid (1991) as special cases. It likewise contains an INAR(1) model that allows for negative binomial marginals with a thinning operator whose pmf is negative hypergeometric.

The SCMPAR(1) model is yet another special case of the infinitely divisible convolution-closed class of first-order autoregressive (AR(1)) models described in Joe (1996), and satisfies the Markov property with the transition probability,

$$ \begin{aligned} P(X_{t}=x_{t} \mid X_{t-1}=x_{t-1}) &= \sum_{k=0}^{\min(x_{t}, x_{t-1})} \frac{ {x_{t-1} \choose k}^{\nu} \left[ \sum_{\stackrel {a_{1},\ldots,a_{n_{1}} = 0} {a_{1} + \ldots + a_{n_{1}} = k} }^{k} {k \choose {a_{1},\dots,a_{n_{1}}} }^{\nu} \right] \left[ \sum_{\stackrel {b_{1},\ldots,b_{n_{2}} = 0} {b_{1} + \ldots + b_{n_{2}} = x_{t-1}-k} }^{x_{t-1}-k} {x_{t-1}-k \choose {b_{1},\dots,b_{n_{2}}} }^{\nu} \right] }{\sum_{\stackrel {c_{1},\ldots,c_{{n_{1}}+{n_{2}}} = 0} {c_{1} + \ldots + c_{{n_{1}}+{n_{2}}} = x_{t-1}} }^{x_{t-1}} {x_{t-1} \choose {c_{1}, \dots, c_{{n_{1}}+{n_{2}}}}}^{\nu}} \\ & \times \frac{{\lambda}^{{x_{t}}-k}}{[(x_{t}-k)!]^{\nu} \zeta^{n_{2}}({\lambda}, \nu)} {\sum_{\stackrel {d_{1}, \ldots, d_{n_{2}} = 0} {d_{1} + \ldots + d_{n_{2}} = x_{t}-k} }^{x_{t}-k} {x_{t} - k \choose {d_{1}, \dots, d_{n_{2}}} }^{\nu} }. \end{aligned} $$

(19)

The SCMPAR(1) process is an ergodic Markov chain; thus X_t has a unique stationary sCMP (λ,ν,n₁+n₂) distribution. The joint pgf associated with the SCMPAR(1) model is

$$\begin{array}{@{}rcl@{}} {}\phi_{X_{t+1},X_{t}} (u,l) = \frac{\zeta^{n_{2}}(\lambda u, \nu)}{\zeta^{n_{2}}(\lambda, \nu)} \frac{\zeta^{n_{1}}(\lambda ul, \nu)}{\zeta^{n_{1}}(\lambda, \nu)} \frac{\zeta^{n_{2}}(\lambda l, \nu)}{\zeta^{n_{2}}(\lambda, \nu)} = \frac{\left(\zeta(\lambda u, \nu)\zeta(\lambda l, \nu)\right)^{n_{2}}\zeta^{n_{1}}(\lambda ul, \nu)}{\zeta^{n_{1} + 2{n_{2}}}(\lambda, \nu)}, \end{array} $$

(20)

where the pgf is symmetric in u and l, and hence the joint distribution of X_t+1 and X_t is time reversible. The regression form for the SCMPAR(1) process can be determined, and the general autocorrelation function for the process {X_t} is $\rho _{r} = \text {Corr}(X_{t}, X_{t-r}) = \left (\frac {n_{1}}{n_{1}+n_{2}}\right)^{r}$ for r=0,1,2,…. Parameter estimation can be conducted via conditional maximum likelihood with statistical computation tools (e.g. in R); see Sellers et al. (2020) for details.

Introducing the sCMPMA(1) model

Motivated by the SCMPAR(1) model of Sellers et al. (2020), we introduce a first-order sum-of-CMPs moving average (SCMPMA(1)) process X_t by

$$\begin{array}{@{}rcl@{}} X_{t} = C^{*}_{t}\left(\epsilon^{*}_{t-1}\right) + \epsilon^{*}_{t}, \hspace{.3in} t=1, 2, \ldots, \end{array} $$

(21)

where $\{\epsilon ^{*}_{t}\}$ is a sequence of iid sCMP (λ,ν,m₁+m₂) random variables and $\{C^{*}_{t}(\bullet)\}$ is a sequence of independent gCMB (1/2,ν,∙,m₁,m₂) operators independent of $\{\epsilon ^{*}_{t}\}$. By definition, X_t is a stationary process with the sCMP (λ,ν,2m₁+m₂) distribution, and X_t+r and X_t are independent for |r|>1. While this model can analogously be viewed as a special case of the infinitely divisible convolution-closed class of discrete MA models (Joe 1996), unlike the SCMPAR(1) process, the SCMPMA(1) process is not Markovian.

The autocorrelation between X_t and X_t+1 is

$$\begin{array}{@{}rcl@{}} \rho_{1} = \text{Corr}(X_{t}, X_{t+1}) &=& \frac{\text{Cov}\left(C^{*}_{t}\left(\epsilon^{*}_{t-1}\right) + \epsilon^{*}_{t}, C^{*}_{t+1}\left(\epsilon^{*}_{t}\right) + \epsilon^{*}_{t+1}\right)}{\sqrt{\text{Var}(X_{t})\text{Var}(X_{t+1})}}\\ &=& \frac{\text{Cov}\left(\epsilon^{*}_{t}, C^{*}_{t+1}\left(\epsilon^{*}_{t}\right)\right)}{\sqrt{\text{Var}(X_{t})\text{Var}(X_{t+1})}} \quad \text{by the independence assumptions}, \end{array} $$

where $C^{*}_{t+1}(\epsilon ^{*}_{t}) = \sum _{i=1}^{m_{1}} Y_{i}$ and $\epsilon ^{*}_{t} = \sum _{i=1}^{m_{1}+m_{2}} Y_{i}$, respectively, are sCMP (λ,ν,m₁) and sCMP (λ,ν,m₁+m₂) random variables; i.e. each sCMP random variable can be viewed as respective sums of iid CMP (λ,ν) random variables, Y_i. Thus,

$$\begin{array}{@{}rcl@{}} \text{Cov}\left(\epsilon^{*}_{t}, C^{*}_{t+1}\left(\epsilon^{*}_{t}\right)\right) = \text{Cov}\left(\sum_{i=1}^{m_{1}+m_{2}} Y_{i}, \sum_{i=1}^{m_{1}} Y_{i}\right) = \text{Var}\left(\sum_{i=1}^{m_{1}} Y_{i}\right) = m_{1}\text{Var}(Y), \end{array} $$

where, without loss of generality, we let Y denote any of the iid Y_i random variables. Meanwhile, because {X_t} is a sCMP (λ,ν,2m₁+m₂) distributed stationary process, we can likewise represent $\text {Var}(X_{t}) = \text {Var}\left (\sum _{i=1}^{2m_{1}+m_{2}} Y_{i}\right) = \sum _{i=1}^{2m_{1}+m_{2}} \text {Var}(Y_{i}) = (2m_{1} + m_{2})\text {Var}(Y)$ for all t. We therefore find that

$$\begin{array}{@{}rcl@{}} \rho_{1} = \text{Corr}\left(X_{t}, X_{t+1}\right) = \frac{\text{Cov}\left(\epsilon^{*}_{t}, C^{*}_{t+1}\left(\epsilon^{*}_{t}\right)\right)}{\sqrt{\text{Var}\left(X_{t}\right)\text{Var}(X_{t+1})}} = \frac{m_{1}\text{Var}(Y)}{(2m_{1}+m_{2})\text{Var}(Y)} = \frac{m_{1}}{2m_{1}+m_{2}}. \end{array} $$

(22)

Because m₁ and m₂ are positive integers, the one-step correlation satisfies 0<ρ₁<0.5, with the bounds approached (but not attained) as m₂/m₁→∞ and m₂/m₁→0, respectively. In particular, for m₁=m₂, we have the special case where ρ₁=1/3. Meanwhile, ρ_r=0 for all r>1 because, by definition of the SCMPMA(1) model assumptions, there is no dependent structure between X_t and X_t+r for r>1.
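Eq. (22) can be checked by simulation. The sketch below is an illustrative assumption rather than the authors' code: it uses the representation from the derivation above, in which ε*_t is a sum of m₁+m₂ iid CMP(λ,ν) components and the thinned term is the sum of the first m₁ of them, and it draws CMP variates by inverse-cdf sampling on a truncated support. All function names and parameter values are hypothetical.

```python
import bisect
import itertools
import math
import random


def cmp_cdf(lam, nu, terms=100):
    """Cumulative CMP(lam, nu) probabilities on 0..terms-1 (truncated support)."""
    logs = [x * math.log(lam) - nu * math.lgamma(x + 1) for x in range(terms)]
    peak = max(logs)
    weights = [math.exp(v - peak) for v in logs]
    total = sum(weights)
    return list(itertools.accumulate(w / total for w in weights))


def simulate_scmpma1(lam, nu, m1, m2, length, seed=0):
    """Simulate X_t = C*_t(eps*_{t-1}) + eps*_t via shared CMP components."""
    rng = random.Random(seed)
    cdf = cmp_cdf(lam, nu)

    def draw_cmp():
        # Inverse-cdf draw; min() guards against rounding in the last cdf entry.
        return min(bisect.bisect_left(cdf, rng.random()), len(cdf) - 1)

    # Row t holds the m1 + m2 CMP components whose sum is eps*_t.
    rows = [[draw_cmp() for _ in range(m1 + m2)] for _ in range(length + 1)]
    # The first m1 components of eps*_{t-1} play the role of C*_t(eps*_{t-1}).
    return [sum(rows[t - 1][:m1]) + sum(rows[t]) for t in range(1, length + 1)]


def lag1_autocorrelation(series):
    """Sample lag-1 autocorrelation."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series) / n
    cov = sum((series[i] - mean) * (series[i + 1] - mean) for i in range(n - 1)) / n
    return cov / var


for m1, m2 in [(1, 1), (1, 2), (2, 1)]:
    series = simulate_scmpma1(lam=0.5, nu=2.0, m1=m1, m2=m2, length=40000)
    # Second printed value is the theoretical rho_1 = m1 / (2*m1 + m2).
    print(m1, m2, lag1_autocorrelation(series), m1 / (2 * m1 + m2))
```

The estimated lag-one autocorrelation should fall close to m₁/(2m₁+m₂) for each pair, regardless of the dispersion parameter ν.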

Recall from the “The Conway-Maxwell-Poisson distribution and its generalization” section that $\Phi _{G}(w) \,=\, \left (\frac {\zeta (\lambda w, \nu)}{\zeta (\lambda, \nu)} \right)^{\pi }$ is the pgf for a sCMP(λ,ν,π) distributed random variable, (say) G. Using this knowledge along with Eq. (21), the joint pgf can be derived as

$$\begin{array}{@{}rcl@{}} \phi_{X_{t+1},X_{t}} (u,l) &=& E\left(u^{X_{t+1}} l^{X_{t}}\right)\\ &=&E\left(u^{C^{*}\left(\epsilon^{*}_{t}\right) + \epsilon^{*}_{t+1}} l^{X_{t}}\right)\\ &=&E\left(u^{C^{*}\left(\epsilon^{*}_{t}\right) + \epsilon^{*}_{t+1}} l^{X_{t} - C^{*}\left(\epsilon^{*}_{t}\right)} l^{C^{*}\left(\epsilon^{*}_{t}\right)}\right)\\ &=& E\left((u l)^{C^{*}\left(\epsilon^{*}_{t}\right)} u^{\epsilon^{*}_{t+1}} l^{X_{t} - C^{*}\left(\epsilon^{*}_{t}\right)} \right) \text{ where}\ X_{t} - C^{*}\left(\epsilon^{*}_{t}\right) \stackrel{d}{=} \epsilon^{*}_{t}\\ &=&E\left((u l)^{C^{*}\left(\epsilon^{*}_{t}\right)}\right) E\left(u^{\epsilon^{*}_{t+1}}\right)E\left(l^{\epsilon^{*}_{t}}\right) \text{ by independence}\\ &=& \phi_{C^{*}\left(\epsilon^{*}_{t}\right)} (u l) \phi_{\epsilon^{*}_{t+1}}(u) \phi_{\epsilon^{*}_{t}} (l) \\ &=& \left(\frac{\zeta(\lambda u l, \nu)}{\zeta(\lambda, \nu)} \right)^{m_{1}} \left(\frac{\zeta(\lambda u, \nu)}{\zeta(\lambda, \nu)} \right)^{m_{1} + m_{2}} \left(\frac{\zeta(\lambda l, \nu)}{\zeta(\lambda, \nu)} \right)^{m_{1} + m_{2}}\\ &=& \frac{\left(\zeta(\lambda u, \nu)\zeta(\lambda l, \nu)\right)^{m_{1}+m_{2}}\left(\zeta(\lambda u l, \nu)\right)^{m_{1}}}{\left(\zeta(\lambda, \nu)\right)^{3m_{1} + 2m_{2}}}, \end{array} $$

(23)

where Eq. (23) is equivalent to Eq. (20) (i.e. the SCMPMA(1) process is comparable to the SCMPAR(1) process) when m₁=n₁=n₂−m₂. Given this comparison, we can easily determine the conditional mean E(X_t+1∣X_t=x) and conditional variance Var(X_t+1∣X_t=x). Eq. (23) further demonstrates that the SCMPMA(1) model is time-reversible.
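The stated relationship between Eqs. (20) and (23) can be confirmed numerically. In the sketch below (hypothetical function names, with ζ truncated to a fixed number of terms as an assumption), the SCMPMA(1) joint pgf is compared with the SCMPAR(1) joint pgf under n₁=m₁ and n₂=m₁+m₂, and its symmetry and sCMP(λ,ν,2m₁+m₂) marginal are checked.

```python
import math


def zeta(lam, nu, terms=200):
    """Truncated CMP normalizing constant zeta(lam, nu), for lam > 0."""
    return sum(math.exp(s * math.log(lam) - nu * math.lgamma(s + 1)) for s in range(terms))


def joint_pgf_scmpma(u, l, lam, nu, m1, m2):
    """Last line of Eq. (23)."""
    numerator = (zeta(lam * u, nu) * zeta(lam * l, nu)) ** (m1 + m2) * zeta(lam * u * l, nu) ** m1
    return numerator / zeta(lam, nu) ** (3 * m1 + 2 * m2)


def joint_pgf_scmpar(u, l, lam, nu, n1, n2):
    """Right-hand side of Eq. (20)."""
    numerator = (zeta(lam * u, nu) * zeta(lam * l, nu)) ** n2 * zeta(lam * u * l, nu) ** n1
    return numerator / zeta(lam, nu) ** (n1 + 2 * n2)


def scmp_pgf(w, lam, nu, n):
    """pgf of a sCMP(lam, nu, n) random variable."""
    return (zeta(lam * w, nu) / zeta(lam, nu)) ** n


lam, nu, m1, m2 = 0.5, 2.0, 1, 2
u, l = 0.3, 0.8
ma = joint_pgf_scmpma(u, l, lam, nu, m1, m2)
ar = joint_pgf_scmpar(u, l, lam, nu, n1=m1, n2=m1 + m2)
print(ma, ar)  # equal when n1 = m1 and n2 = m1 + m2
print(ma, joint_pgf_scmpma(l, u, lam, nu, m1, m2))  # symmetric in (u, l)
# Setting l = 1 recovers the sCMP(lam, nu, 2*m1 + m2) marginal pgf.
print(joint_pgf_scmpma(u, 1.0, lam, nu, m1, m2), scmp_pgf(u, lam, nu, 2 * m1 + m2))
```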

Parameter estimation via maximum likelihood (ML) is a difficult task with INMA models given the complex form of the underlying distributions. Even a conditional least squares approach does not appear to be feasible “because of the thinning operators, unless randomization is used” (Brännäs and Hall 2001). We therefore instead consider the following ad hoc procedure for parameter estimation. Given a data set with an observed lag-one correlation ρ₁, we first propose values for $m_{1}, m_{2} \in \mathbb {N}$ that satisfy the constraint $\rho _{1} \approx \frac {m_{1}}{2m_{1} + m_{2}}$. Given m₁ and m₂, and recognizing that X_t is stationary with a sCMP(λ,ν,2m₁+m₂) distribution, we proceed with ML estimation to determine $\hat {\lambda }$ and $\hat {\nu }$ as described in Zhu et al. (2017) for conducting sCMP(λ,ν,s=2m₁+m₂) parameter estimation with regard to a CMP process over an interval of length s≥1. The corresponding variation for $\hat {\lambda }$ and $\hat {\nu }$ can be quantified via the Fisher information matrix or nonparametric bootstrapping. While the sampling distribution for $\hat {\lambda }$ is approximately symmetric, the sampling distribution for $\hat {\nu }$ is considerably right-skewed; hence analysts are advised to quantify estimator variation via nonparametric bootstrapping. While this procedure is a means to an end, it succeeds only in determining an appropriate distributional form for the data; it does not fully address the nature of the time series.
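The two steps of this ad hoc procedure can be sketched as follows. This is an illustration under stated assumptions, not the routine of Zhu et al. (2017): the candidate search is limited to a small hypothetical range `max_m`, the likelihood treats the observations as iid sCMP draws (so it ignores serial dependence, as noted above), and a coarse grid search stands in for a numerical optimizer. The `counts` list is made up purely to exercise the code.

```python
import math


def candidate_orders(rho_hat, max_m=2):
    """Step 1: rank (m1, m2) pairs by |m1 / (2*m1 + m2) - rho_hat|."""
    pairs = [(m1, m2) for m1 in range(1, max_m + 1) for m2 in range(1, max_m + 1)]
    return sorted(pairs, key=lambda pair: abs(pair[0] / (2 * pair[0] + pair[1]) - rho_hat))


def scmp_log_pmf(lam, nu, s, terms=60):
    """Log pmf of sCMP(lam, nu, s) on 0..terms-1 via s-fold convolution of CMP(lam, nu)."""
    logs = [x * math.log(lam) - nu * math.lgamma(x + 1) for x in range(terms)]
    peak = max(logs)
    base = [math.exp(v - peak) for v in logs]
    total = sum(base)
    base = [w / total for w in base]
    pmf = base
    for _ in range(s - 1):
        pmf = [sum(pmf[i] * base[k - i] for i in range(k + 1)) for k in range(terms)]
    return [math.log(q) if q > 0 else float("-inf") for q in pmf]


def scmp_loglik(data, lam, nu, s):
    """Marginal log-likelihood treating the observations as iid sCMP(lam, nu, s)."""
    log_pmf = scmp_log_pmf(lam, nu, s)
    return sum(log_pmf[x] for x in data)


def grid_fit(data, s, lam_grid, nu_grid):
    """Step 2: coarse grid search standing in for a numerical ML optimizer."""
    return max(((lam, nu) for lam in lam_grid for nu in nu_grid),
               key=lambda pair: scmp_loglik(data, pair[0], pair[1], s))


counts = [0, 1, 2, 1, 3, 2, 1, 0, 2, 1]  # made-up counts, for illustration only
lam_grid = [0.1 * i for i in range(1, 16)]
nu_grid = [0.5 * j for j in range(1, 7)]
for m1, m2 in candidate_orders(0.292)[:3]:
    s = 2 * m1 + m2
    lam_hat, nu_hat = grid_fit(counts, s, lam_grid, nu_grid)
    print((m1, m2), lam_hat, nu_hat, scmp_loglik(counts, lam_hat, nu_hat, s))
```

In practice a gradient-based or Nelder-Mead optimizer would replace the grid, and bootstrap resampling of the data would supply the interval estimates.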

Data examples

To illustrate the flexibility of our INMA model, we consider various data simulations and a real data example. The respective details and associated commentary follow.

Simulated data examples

Table 1 reports the estimated mean, variance, and autocorrelation that result from various data simulations of SCMPMA(1) data given parameters (λ,ν,m₁,m₂). In all examples, we let λ=0.5, m₁,m₂∈{1,2}, and ν∈{0,0.5,1,2,35}, where ν=0 captures the case of extreme over-dispersion, ν=1 denotes equi-dispersion, and ν=35 serves as a computational stand-in for the limiting case of utmost under-dispersion, ν→∞.

Table 1 Estimated mean, variance, and autocorrelation for various SCMPMA(1) data simulations of length 10,000 given parameters, (λ,ν,m₁,m₂); λ=0.5 for all simulations

For all examples, we find that the associated mean and variance compare with each other as expected, i.e. the variance is greater than the mean when ν<1 (i.e. the data are over-dispersed), the variance and mean are approximately equal when ν=1 (i.e. equi-dispersion holds), and the variance is less than the mean (i.e. the data are under-dispersed) when ν>1. In particular, we can easily verify that the three special case models perform as expected. For the Poisson cases (ν=1), we expect the mean and variance to both equal (2m₁+m₂)λ, while the binomial cases (i.e. ν→∞ and $p=\frac {\lambda }{\lambda +1}$) produce a mean equal to $(2m_{1} +m_{2})\frac {\lambda }{\lambda +1}$ and variance equaling $(2m_{1}+m_{2}) \frac {\lambda }{\lambda +1} \left (1-\frac {\lambda }{\lambda +1}\right)$, and the negative binomial cases (ν=0 with p=1−λ) have a mean of $\frac {(2m_{1} +m_{2})\lambda }{1-\lambda }$ and variance equaling $\frac {(2m_{1} +m_{2})\lambda }{(1-\lambda)^{2}}$. In fact, even with the ν→∞ case approximated by letting ν=35, we still obtain reasonable estimates for the mean and variance for all of the associated cases of m₁ and m₂.
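The theoretical benchmarks quoted in this paragraph are simple to tabulate. The sketch below (hypothetical function names) evaluates the three special-case formulas for λ=0.5 and, as a cross-check on the ν=35 approximation, computes the sCMP moments numerically from a truncated CMP pmf. It does not reproduce the simulated values in Table 1.

```python
import math


def special_case_moments(lam, m1, m2):
    """Theoretical (mean, variance) of X_t for the three special cases, n = 2*m1 + m2."""
    n = 2 * m1 + m2
    p = lam / (lam + 1)
    return {
        "negative binomial (nu = 0)": (n * lam / (1 - lam), n * lam / (1 - lam) ** 2),
        "Poisson (nu = 1)": (n * lam, n * lam),
        "binomial (nu -> infinity)": (n * p, n * p * (1 - p)),
    }


def scmp_moments(lam, nu, n, terms=200):
    """Numerical (mean, variance) of sCMP(lam, nu, n): n times the CMP(lam, nu) moments."""
    logs = [x * math.log(lam) - nu * math.lgamma(x + 1) for x in range(terms)]
    peak = max(logs)
    weights = [math.exp(v - peak) for v in logs]
    total = sum(weights)
    mean = sum(x * w for x, w in enumerate(weights)) / total
    var = sum((x - mean) ** 2 * w for x, w in enumerate(weights)) / total
    return n * mean, n * var


lam = 0.5
for m1, m2 in [(1, 1), (1, 2), (2, 1), (2, 2)]:
    # The last tuple uses nu = 35 and should sit close to the binomial entry.
    print((m1, m2), special_case_moments(lam, m1, m2), scmp_moments(lam, 35.0, 2 * m1 + m2))
```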

For each {m₁,m₂} pair, the mean and variance both decrease as ν increases while, for all of the considered examples, we obtain estimated correlation values $\hat {\rho }$ that approximately equal the true correlation, ρ. In particular, for those cases where m₁=m₂, we obtain $\hat {\rho } \approx 1/3$ as expected (see Eq. (22)).

Real data example: IP address counts

Weiss (2007) considers a modified dataset regarding the number of unique IP addresses that access the University of Würzburg Department of Statistics' webpages in 240 two-minute intervals. Collected on November 29, 2005 (from 10:00:00 to 18:00:00), these data have an associated mean and variance equaling 1.286 and 1.205, respectively. Weiss (2007) considers a PAR(1) model, noting that “the empirical partial autocorrelation function indicates that a first order [autoregressive] model may be an appropriate choice” with $\hat {\rho }_{1}=0.292$; Sellers et al. (2020), following suit, consider a SCMPAR(1) model as a flexible alternative to the PAR(1) model. The ACF and PACF plots of these data, however, do not clearly distinguish between a first-order autoregressive model and a first-order moving average model; see Fig. 1a-b. Further, because the data express apparent under- to equi-dispersion, we consider the SCMPMA(1) as an illustrative model for analysis.

We perform ML estimation assuming various combinations for (m₁,m₂) (i.e. {(1,1), (1,2), (2,2)}) because the associated correlations bracket the observed correlation, $0.25 = \frac {1}{4} < \hat {\rho }_{1} < \frac {1}{3} \approx 0.33$. Table 2 contains the resulting parameter estimates for λ and ν, along with the respective Akaike Information Criterion (AIC). While the SCMPMA(1) model with m₁=m₂=2 has the lowest AIC among the four models considered, all of these models produce approximately equal AIC values (approximately 695.2), where increasing m₁ and m₂ values are associated with decreasing $\hat {\lambda }$ and increasing $\hat {\nu }$. This makes sense because the resulting estimates rely solely on the assumed underlying sCMP (λ,ν,2m₁+m₂) distributional form for the data.

Table 2 Estimated parameters, the 95% confidence intervals for λ and ν derived from nonparametric bootstrapping, and Akaike Information Criterion (AIC) values for various SCMPMA(1) models for the IP data

The dispersion estimates in Table 2 are all greater than 1, thus implying a perceived level of data under-dispersion. These results naturally stem from the reported mean of the data (1.286) being greater than its corresponding variance (1.205). Their associated 95% confidence intervals (determined via nonparametric bootstrapping; also supplied in Table 2), however, are sufficiently wide that they contain ν=1. This suggests that the apparent data under-dispersion is not statistically significant, and instead that the data can be analyzed via the Al-Osh and Alzaid (1988) PMA(1) model. It is further striking to see that the widths of the respective 95% confidence intervals associated with the dispersion parameter increase with the size of the underlying sCMP(λ,ν,2m₁+m₂) model. This is an artifact of the (s)CMP distribution, namely that the sampling distribution of $\hat {\nu }$ is right-skewed (as discussed in Zhu et al. (2017)). This approach supports the PMA(1) model, for which Eqs. (1)-(2) imply the parameter estimates $\hat {\gamma } \approx 0.4124$ and $\hat {\eta } \approx 0.9105$. Thus, we benefit from the SCMPMA(1) as a tool for parsimonious model determination.
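The PMA(1) estimates quoted above follow from inverting Eqs. (1)-(2): ρ₁=γ/(1+γ) gives γ=ρ₁/(1−ρ₁), and E(U_t)=(1+γ)η gives η=mean/(1+γ). A minimal sketch of this method-of-moments step, with a hypothetical function name, is:

```python
def pma1_moment_estimates(mean, rho1):
    """Invert Eqs. (1)-(2): rho1 = gamma / (1 + gamma) and mean = (1 + gamma) * eta."""
    gamma = rho1 / (1 - rho1)
    eta = mean / (1 + gamma)
    return gamma, eta


gamma_hat, eta_hat = pma1_moment_estimates(mean=1.286, rho1=0.292)
print(round(gamma_hat, 4), round(eta_hat, 4))  # 0.4124 0.9105
```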

Discussion

This work utilizes the sCMP distribution of Sellers et al. (2017) to develop an SCMPMA(1) model that serves as a flexible moving average time series model for discrete data where data dispersion is present. The SCMPMA(1) model captures the PMA(1), as well as versions of a negative binomial and binomial MA(1) structure, respectively, as special cases. This model, along with the flexible SCMPAR(1), can be used further to derive broader auto-regressive moving average (ARMA) and auto-regressive integrated moving average (ARIMA) models based on the sCMP distribution.

The SCMPMA(1) shares many properties with the analogous SCMPAR(1) model by Sellers et al. (2020). The presented models rely on predefining discrete values (i.e. m₁,m₂ for the SCMPMA(1)) for parameter estimation. As done in Sellers et al. (2017) and Sellers and Young (2019), we utilize a profile likelihood approach where, given m₁ and m₂, we estimate the remaining model coefficients and then identify that collection of parameter estimates that produces the largest likelihood, thus identifying these parameter estimates as the MLEs. While this profile likelihood approach is acceptable as demonstrated in other applications, directly estimating m₁,m₂ along with the other SCMPMA(1) model estimates would likewise prove beneficial, as would redefining the model to allow for real-valued estimators for m₁ and m₂. These generalizations and estimation approaches can be explored in future work.
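The profile likelihood idea can be sketched as follows. This is a minimal illustration, not the authors' implementation: it treats the observations as independent draws from the sCMP(λ,ν,2m₁+m₂) marginal distribution, computes the sCMP probabilities as the n-fold convolution of CMP(λ,ν) probabilities truncated at a hypothetical cutoff `kmax`, and replaces a numerical optimizer with a coarse grid search. The data, grids, and function names are all hypothetical.

```python
import math


def cmp_pmf(lam, nu, kmax):
    """CMP(lam, nu) probabilities on 0..kmax (normalised over that range)."""
    logw = [k * math.log(lam) - nu * math.lgamma(k + 1) for k in range(kmax + 1)]
    top = max(logw)
    w = [math.exp(v - top) for v in logw]
    total = sum(w)
    return [v / total for v in w]


def scmp_pmf(lam, nu, n, kmax):
    """sCMP(lam, nu, n) on 0..kmax as the n-fold convolution of CMP(lam, nu)."""
    base = cmp_pmf(lam, nu, kmax)
    out = base
    for _ in range(n - 1):
        nxt = [0.0] * (kmax + 1)
        for i, pi in enumerate(out):
            for j in range(kmax + 1 - i):
                nxt[i + j] += pi * base[j]
        out = nxt
    return out


def profile_fit(data, combos, lam_grid, nu_grid, kmax=30):
    """For each fixed (m1, m2), maximise the log-likelihood over the grid."""
    fits = {}
    for m1, m2 in combos:
        n = 2 * m1 + m2  # number of CMP components in the assumed marginal
        best = None
        for lam in lam_grid:
            for nu in nu_grid:
                pmf = scmp_pmf(lam, nu, n, kmax)
                ll = sum(math.log(max(pmf[x], 1e-300)) for x in data)
                if best is None or ll > best["loglik"]:
                    best = {"lam": lam, "nu": nu, "loglik": ll}
        best["aic"] = 2 * 2 - 2 * best["loglik"]  # two free parameters
        fits[(m1, m2)] = best
    return fits


# Hypothetical counts, used only to exercise the code
data = [0, 1, 2, 1, 0, 3, 1, 2, 1, 0, 2, 1, 4, 1, 0, 1, 2, 2, 0, 1]
lam_grid = [0.1 * i for i in range(1, 16)]
nu_grid = [0.5 + 0.25 * i for i in range(9)]

fits = profile_fit(data, [(1, 1), (1, 2), (2, 2)], lam_grid, nu_grid)
best_combo = min(fits, key=lambda c: fits[c]["aic"])
for combo, fit in fits.items():
    print(combo, fit)
print("selected (m1, m2):", best_combo)
```

Because every candidate has the same two free parameters (λ and ν), ranking by AIC in this sketch is equivalent to ranking by the maximised log-likelihood. A finer grid or a numerical optimizer would be needed for estimates of practical accuracy.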

Simulated data examples illustrate that the SCMPMA(1) model can obtain unbiased estimates, and the model demonstrates potential for accurate forecasts given data containing any measure of data dispersion. The real data illustration, however, highlights the complexities that come with parameter estimation. While we present a means towards achieving this goal, this approach does not perform strongly with regard to prediction and forecasting. It nonetheless serves as a starting point for parameter estimation that we will continue to investigate in future work. Moreover, the flexibility of the SCMPMA(1) aids in determining a parsimonious model form as appropriate.

Availability of data and materials

Simulated data can vary given the generation process. Simulation code can be supplied upon request. The IP data set was obtained from Dr. Christian Weiss of Helmut Schmidt University.

Abbreviations

AR(1): First-order autoregressive
ARIMA: Auto-regressive integrated moving average
ARMA: Auto-regressive moving average
CMB: Conway-Maxwell-Binomial
CMP: Conway-Maxwell-Poisson
gCMB: Generalized Conway-Maxwell-Binomial
GPAR(1): First-order generalized Poisson autoregressive
GPMA(1): First-order generalized Poisson moving average
INAR(1): First-order integer-valued autoregressive
INMA: Integer-valued moving average
MA: Moving average
mgf: Moment generating function
PAR(1): First-order Poisson autoregressive
pgf: Probability generating function
PMA: Poisson moving average
PMA(1): First-order Poisson moving average
QB: Quasi-binomial
sCMP: Sum-of-Conway-Maxwell-Poisson
SCMPAR(1): First-order sum-of-Conway-Maxwell-Poisson autoregressive
SCMPMA(1): First-order sum-of-Conway-Maxwell-Poissons moving average

References

Al-Osh, M. A., Alzaid, A. A.: First-order integer valued autoregressive (INAR(1)) process. J. Time Ser. Anal. 8(3), 261–275 (1987).
Al-Osh, M. A., Alzaid, A. A.: Integer-valued moving average (INMA) process. Stat. Pap. 29(1), 281–300 (1988).
Al-Osh, M. A., Alzaid, A. A.: Binomial autoregressive moving average models. Commun. Stat. Stoch. Model. 7(2), 261–282 (1991).
Alzaid, A. A., Al-Osh, M. A.: Some autoregressive moving average processes with generalized Poisson marginal distributions. Ann. Inst. Stat. Math. 45(2), 223–232 (1993).
Borges, P., Rodrigues, J., Balakrishnan, N., Bazán, J.: A COM-Poisson type generalization of the binomial distribution and its properties and applications. Stat. Probab. Lett. 87, 158–166 (2014).
Brännäs, K., Hall, A.: Estimation in integer-valued moving average models. Appl. Stoch. Model. Bus. Ind. 17, 277–291 (2001).
Conway, R. W., Maxwell, W. L.: A queuing model with state dependent service rates. J. Ind. Eng. 12, 132–136 (1962).
Famoye, F.: Restricted generalized Poisson regression model. Commun. Stat. Theory Methods. 22(5), 1335–1354 (1993).
Hilbe, J. M.: Modeling Count Data. Cambridge University Press, New York, NY (2014).
Joe, H.: Time series models with univariate margins in the convolution-closed infinitely divisible class. J. Appl. Probab. 33(3), 664–677 (1996).
Kadane, J. B.: Sums of possibly associated Bernoulli variables: The Conway-Maxwell-Binomial distribution. Bayesian Anal. 11(2), 403–420 (2016).
McKenzie, E.: ARMA models for dependent sequences of Poisson counts. Adv. Appl. Probab. 20(4), 822–835 (1988).
Sellers, K. F., Peng, S. J., Arab, A.: A flexible univariate autoregressive time-series model for dispersed count data. J. Time Ser. Anal. 41(3), 436–453 (2020). https://doi.org/10.1111/jtsa.12516.
Sellers, K. F., Swift, A. W., Weems, K. S.: A flexible distribution class for count data. J. Stat. Distrib. Appl. 4(22), 1–21 (2017). https://doi.org/10.1186/s40488-017-0077-0.
Sellers, K. F., Young, D. S.: Zero-inflated sum of Conway-Maxwell-Poissons (ZISCMP) regression. J. Stat. Comput. Simul. 89(9), 1649–1673 (2019).
Shmueli, G., Minka, T. P., Kadane, J. B., Borle, S., Boatwright, P.: A useful distribution for fitting discrete data: revival of the Conway-Maxwell-Poisson distribution. Appl. Stat. 54, 127–142 (2005).
Weiss, C. H.: Controlling correlated processes of Poisson counts. Qual. Reliab. Eng. Int. 23(6), 741–754 (2007).
Weiss, C. H.: Thinning operations for modeling time series of counts–a survey. Adv. Stat. Anal. 92, 319–341 (2008).
Weiss, C. H.: An Introduction to Discrete-Valued Time Series. John Wiley & Sons, Inc., Hoboken, NJ (2018).
Weiss, C. H.: Stationary count time series models. Wiley Interdiscip. Rev. Comput. Stat. 13(1), 1502 (2021). https://doi.org/10.1002/wics.1502.
Zhu, L., Sellers, K. F., Morris, D. S., Shmuéli, G.: Bridging the gap: A generalized stochastic process for count data. Am. Stat. 71(1), 71–80 (2017).

Acknowledgements

This paper is released to inform interested parties of research and to encourage discussion. The views expressed are those of the authors and not necessarily those of the U.S. Census Bureau. SM and FC thank the Georgetown Undergraduate Research Opportunities Program (GUROP) for their support. All authors thank Dr. Christian Weiss for use of the IP dataset, and the reviewers for their feedback and comments.

Funding

SM was funded in part by the GUROP.

Author information

Authors and Affiliations

Department of Mathematics and Statistics, Georgetown University, Washington, DC, USA
Kimberly F. Sellers, Ali Arab, Sean Melville & Fanyu Cui
Center for Statistical Research and Methodology Division, U.S. Census Bureau, Washington, DC, USA
Kimberly F. Sellers

Contributions

KFS developed the research idea. All authors contributed towards the literature review, theoretical developments, and statistical computing. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Kimberly F. Sellers.

Ethics declarations

Competing interests

No authors have competing interests relating to this work.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Sellers, K.F., Arab, A., Melville, S. et al. A flexible univariate moving average time-series model for dispersed count data. J Stat Distrib App 8, 1 (2021). https://doi.org/10.1186/s40488-021-00115-2

Received: 22 April 2020
Accepted: 26 January 2021
Published: 21 February 2021
DOI: https://doi.org/10.1186/s40488-021-00115-2
