# A flexible univariate moving average time-series model for dispersed count data

## Abstract

Al-Osh and Alzaid (1988) consider a Poisson moving average (PMA) model to describe the relation among integer-valued time series data; this model, however, is constrained by the underlying equi-dispersion assumption for count data (i.e., that the variance and the mean are equal). This work instead introduces a flexible integer-valued moving average model for count data that contain over- or under-dispersion via the Conway-Maxwell-Poisson (CMP) distribution and related distributions. This first-order sum-of-Conway-Maxwell-Poissons moving average (SCMPMA(1)) model offers a generalizable construct that includes the PMA (among others) as a special case. We highlight the SCMPMA model properties and illustrate its flexibility via simulated data examples.

## Introduction

Integer-valued thinning-based models have been proposed to model time series data represented as counts. Al-Osh and Alzaid (1988) introduce a generally defined integer-valued moving average (INMA) process as an analog to the moving average (MA) model for continuous data, which assumes an underlying Gaussian distribution. This INMA process instead utilizes a thinning operator that maintains an integer-valued range of possible outcomes. To form such a model, they consider the “survivals” of independent and identically distributed (iid) non-negative integer-valued random innovations to maintain and ensure discrete data outcomes (Weiss 2021). Al-Osh and Alzaid (1988) particularly consider a first-order Poisson moving average (PMA(1)), i.e. a stationary sequence $$U_{t}$$ of the form $$U_{t} = \gamma \circ \epsilon_{t-1} + \epsilon_{t}$$, where $$\{\epsilon_{t}\}$$ is a sequence of iid Poisson(η) random variables and $$(\gamma \circ \epsilon) = \sum _{i=1}^{\epsilon } B_{i}$$ for a sequence of iid Bernoulli(γ) random variables $$\{B_{i}\}$$ independent of $$\{\epsilon_{t}\}$$. By design, the PMA(1) is an INMA whose maximum stay time in the sequence is two time units. Consequently, consecutive values of $$U_{t}$$ are dependent, while the two components of $$U_{t}$$, namely $$\epsilon_{t}$$ and $$(\gamma \circ \epsilon_{t-1})$$, are independent.

Given the PMA(1) structure,

$$\begin{array}{@{}rcl@{}} E(U_{t}) = Var(U_{t}) = (1 + \gamma)\eta, \end{array}$$
(1)

and the covariance of consecutive variables Cov(Ut−1,Ut)=γη; this implies that the correlation is

$$\begin{array}{@{}rcl@{}} \rho_{U}(r) = Corr(U_{t-r}, U_{t}) = \left\{ \begin{array}{lll} \frac{\gamma}{1+\gamma} & r=1 \\ 0 & r> 1. \end{array} \right. \end{array}$$
(2)

Meanwhile, the probability generating function (pgf) of Ut is $$\Phi _{U_{t}} (u) = e^{-\eta (1+\gamma)(1-u)}$$, the joint pgf of {U1,…,Ur} is $$\Phi _{r}(u_{1}, \ldots, u_{r}) = \exp \left (-\eta \left [r + \gamma - (1-\gamma)\sum _{i=1}^{r}u_{i} - \gamma (u_{1} + u_{r}) - \gamma \sum _{i=1}^{r-1}u_{i} u_{i+1}\right ]\right)$$ (which implies that time reversibility holds for the PMA), and the pgf of $$T_{U,r}= \sum _{i=1}^{r}U_{i}$$ is

$$\Phi_{T_{U,r}}(u)=\exp\left(-\eta\left[(1-\gamma)r+2\gamma\right](1-u)-\eta \gamma(r-1)(1-u^{2})\right).$$

Al-Osh and Alzaid (1988) note that TU,r does not have a Poisson distribution, which is in contrast to the standard MA(1) process. The conditional mean and variance of Ut+1 given Ut=u are both linear in Ut, namely

$$\begin{array}{@{}rcl@{}} E(U_{t+1} \mid U_{t}= u) &=& \eta + \gamma u/(1+\gamma), \text{ and} \end{array}$$
(3)
$$\begin{array}{@{}rcl@{}} Var(U_{t+1} \mid U_{t}= u) &=& \eta + \gamma u/(1+\gamma)^{2}. \end{array}$$
(4)
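The moment results in Eqs. (1)-(2) can be checked empirically. The following is a minimal simulation sketch, not the authors' code: it assumes binomial thinning implemented as a sum of Bernoulli(γ) draws and uses Knuth's multiplication method for Poisson draws (adequate for small rates only). The function names `poisson_draw`, `binomial_thin`, `simulate_pma1`, and `sample_moments` are hypothetical helpers introduced for illustration.

```python
import math
import random


def poisson_draw(rng, rate):
    """Draw one Poisson(rate) variate via Knuth's method (suitable for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def binomial_thin(rng, gamma, count):
    """Binomial thinning: number of survivors among `count` iid Bernoulli(gamma) trials."""
    return sum(1 for _ in range(count) if rng.random() < gamma)


def simulate_pma1(eta, gamma, length, seed=0):
    """Simulate U_t = (gamma o eps_{t-1}) + eps_t with iid Poisson(eta) innovations."""
    rng = random.Random(seed)
    previous = poisson_draw(rng, eta)
    series = []
    for _ in range(length):
        current = poisson_draw(rng, eta)
        series.append(binomial_thin(rng, gamma, previous) + current)
        previous = current
    return series


def sample_moments(series):
    """Return the sample mean, variance, and lag-1 autocorrelation."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series) / n
    cov1 = sum((series[i] - mean) * (series[i + 1] - mean) for i in range(n - 1)) / (n - 1)
    return mean, var, cov1 / var


if __name__ == "__main__":
    eta, gamma = 2.0, 0.5
    mean, var, rho1 = sample_moments(simulate_pma1(eta, gamma, 50000))
    # Theory: mean = var = (1 + gamma) * eta and rho(1) = gamma / (1 + gamma).
    print(mean, var, rho1, (1 + gamma) * eta, gamma / (1 + gamma))
```

With η=2 and γ=0.5, the sample mean and variance should both be close to 3 and the lag-1 autocorrelation close to 1/3, up to simulation error.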

The PMA is a natural choice for modeling an integer-valued process, in part because of its tractability (Al-Osh and Alzaid 1988). This model, however, is limited by its constraining equi-dispersion property, i.e. the assumption that the mean and variance of the underlying process are equal. Real data do not generally conform to this construct (Hilbe 2014; Weiss 2018); they usually display over-dispersion relative to the Poisson model (i.e. the variance is greater than the mean). Integer-valued data that are under-dispersed relative to the Poisson (i.e. the variance is less than the mean) are, however, surfacing with greater frequency. Accordingly, it would be fruitful to instead consider a flexible time series model that can accommodate data over- and/or under-dispersion.

Alzaid and Al-Osh (1993) introduce a first-order generalized Poisson moving average (GPMA(1)) process as an alternative to the PMA. The associated model has the form,

$$\begin{array}{@{}rcl@{}} W_{t}=Q^{*}_{t}\left(\epsilon^{*}_{t-1}\right)+\epsilon^{*}_{t}, & t = 0, 1, 2, \ldots, \end{array}$$
(5)

where $$\left \{\epsilon ^{*}_{t}\right \}$$ is a sequence of iid generalized Poisson $$\text{GP}(\mu^{*},\theta)$$ random variables, and $$\{Q^{*}_{t}(\cdot)\}$$ is a sequence of quasi-binomial $$\text{QB}(p^{*},\theta/\mu^{*},\cdot)$$ random operators independent of $$\{\epsilon ^{*}_{t}\}$$. As with the PMA, Wt+r and Wt are independent for |r|>1. The marginal distribution of Wt is $$\text{GP}((1+p^{*})\mu^{*},\theta)$$. Recognizing the relationship between moving average and autoregressive models, Alzaid and Al-Osh (1993) equate terms in this GPMA(1) model to their first-order generalized Poisson autoregressive (GPAR(1)) counterpart,

$$\begin{array}{@{}rcl@{}} W_{t}=Q_{t}(W_{t-1})+\epsilon_{t}, & t = 0, 1, 2, \ldots, \end{array}$$
(6)

where {εt} is a sequence of iid GP(qμ,θ) random variables with q=1−p, and {Qt(·)} is a sequence of QB(p,θ/μ,·) random operators, independent of {εt}; i.e., they let $$\mu = (1+p^{*})\mu^{*}$$ and $$p = \frac {p^{*}}{1+p^{*}}$$. The bivariate pgf of Wt+1 and Wt can thus be represented as

$$\begin{array}{@{}rcl@{}} \Phi_{W_{t+1}, W_{t}}(u_{1},u_{2}) &=& \exp\left[\mu^{*}\left(A_{\theta}(u_{1})+A_{\theta}(u_{2})-2\right)+\mu^{*}p^{*}\left(A_{\theta}(u_{1}u_{2})-1\right)\right] \end{array}$$
(7)
$$\begin{array}{@{}rcl@{}} &=& \exp\left[\mu q\left(A_{\theta}(u_{1})+A_{\theta}(u_{2})-2\right)+\mu p\left(A_{\theta}(u_{1}u_{2})-1\right)\right], \end{array}$$
(8)

where $$A_{\theta}(s)$$ is the inverse function that satisfies $$A_{\theta}\left(s e^{-\theta(s-1)}\right) = s$$; see Alzaid and Al-Osh (1993). Substituting these relations into Eq. (7) to obtain Eq. (8) further illustrates the relationship between the GPMA(1) and GPAR(1) models, namely that they have the same joint pgf. Eq. (8) and the related GPAR work of Alzaid and Al-Osh (1993) therefore show that

$$\begin{array}{@{}rcl@{}} E(W_{t} \mid W_{t-1} =w)=pw+\frac{q\mu}{1-\theta}. \end{array}$$
(9)
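As a quick check of the step from Eq. (7) to Eq. (8), note that $$\mu = (1+p^{*})\mu^{*}$$ and $$p = \frac{p^{*}}{1+p^{*}}$$ give $$q = 1 - p = \frac{1}{1+p^{*}}$$, so the two coefficients in the exponent match term by term:

$$\mu q = (1+p^{*})\mu^{*} \cdot \frac{1}{1+p^{*}} = \mu^{*}, \qquad \mu p = (1+p^{*})\mu^{*} \cdot \frac{p^{*}}{1+p^{*}} = \mu^{*}p^{*}.$$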

The joint pgf of ($$W_{t}, W_{t-1}, \dots, W_{t-r+1}$$) is given by

$$\begin{array}{@{}rcl@{}} \Phi(u_{1}, \dots, u_{r})= \exp\left[\mu q\sum_{i=1}^{{r}}(A_{\theta}(u_{i})-1)+\mu p\sum_{i=1}^{{r}}\left(A_{\theta}(u_{i} u_{i+1})-1\right)\right]. \end{array}$$
(10)

From the joint pgf, we see that the GPMA(1) is also time-reversible, because it has the same dynamics if time is reversed. Further, the pgf associated with the total counts occurring during time lag r (i.e. $$T_{W,r}=\sum _{i=1}^{r}W_{t-r+i}$$) is $$\Phi _{T_{w,r}}(u)=\exp \left [\mu q r(A_{\theta }(u)-1) + \mu p(r-1)(A_{\theta }(u^{2})-1)\right ]$$. Alzaid and Al-Osh (1993) note that this result extends the analogous PMA result to the broader GPMA(1) model. Finally, the GPMA autocorrelation function is

$$\rho_{W}(r) = \text{Corr}(W_{t}, W_{t+r}) = \left\{ \begin{array}{ll} {p} & |r|=1 \\ 0 & |r|> 1, \end{array} \right.$$

where $$p = p^{*}(1+p^{*})^{-1}$$; by definition, $$\rho_{W}(r) \in [0, 0.5]$$ (Alzaid and Al-Osh 1993).

Even though the GPMA can be considered to model over- or under-dispersed count time series, it may not be a viable option for count data that express extreme under-dispersion; see, e.g. Famoye (1993). This work instead introduces another alternative for modeling integer-valued time series data. The subsequent writing proceeds as follows. We first provide background regarding the probability distributions that motivate the development of our flexible INMA model. Then, we introduce the SCMPMA(1) model to the reader and discuss its statistical properties. The subsequent section illustrates the model flexibility through simulated and real data examples. Finally, the manuscript concludes with discussion.

## Motivating distributions

While the above constructs show increased ability and improvement towards modeling integer-valued time series data with various forms of dispersion, each of the models suffers from respective limitations. In order to develop and describe our SCMPMA(1), we first introduce its underlying motivating distributions: the CMP distribution and its generalized sum-of-CMPs distribution (sCMP), as well as the Conway-Maxwell-Binomial (CMB) along with a generalized CMB (gCMB) distribution.

### The Conway-Maxwell-Poisson distribution and its generalization

The Conway-Maxwell-Poisson (CMP) distribution (introduced by Conway and Maxwell (1962), and revived by Shmueli et al. (2005)) is a viable count distribution that generalizes the Poisson distribution in light of potential data dispersion. The CMP probability mass function (pmf) takes the form

$$P(X=x \mid \lambda, \nu) = \frac{\lambda^{x}}{(x!)^{\nu} \zeta(\lambda,\nu)}, \;\;\; x=0,1,2,\ldots,$$
(11)

for a random variable X, where $$\lambda = E(X^{\nu}) \ge 0$$, $$\nu \ge 0$$ is the associated dispersion parameter, and $$\zeta (\lambda, \nu) = \sum _{s=0}^{\infty } \frac {\lambda ^{s}}{(s!)^{\nu }}$$ is the normalizing constant. The CMP distribution includes three well-known distributions as special cases, namely the Poisson (ν=1), geometric (ν=0,λ<1), and Bernoulli $$\left (\nu \rightarrow \infty \text { with probability } \frac {\lambda }{1+\lambda } \right)$$ distributions.

The associated pgf of X is $$\Phi _{X}(u) = E(u^{X}) = \frac {\zeta (\lambda u, \nu)}{\zeta (\lambda, \nu)}$$, and its moment generating function (mgf) is $$\mathrm {M}_{X}(u) = E(e^{Xu}) = \frac {\zeta (\lambda e^{u}, \nu)}{\zeta (\lambda, \nu)}$$. The moments can meanwhile be represented recursively as

$$\begin{array}{@{}rcl@{}} E(X^{g+1}) = \left\{ \begin{array}{ll} \lambda E\left[(X+1)^{1-\nu}\right], & g=0\\ \lambda \frac{\partial}{\partial\lambda} E(X^{g}) + E(X)E(X^{g}), &g>0. \end{array} \right. \end{array}$$
(12)

In particular, the expected value and variance can be written exactly, and approximated, respectively as

$$\begin{array}{*{20}l} E(X) &= \frac{\partial \ln \zeta(\lambda, \nu)}{\partial \ln \lambda} \approx \lambda^{1/\nu} - \frac{\nu - 1}{2\nu}, \end{array}$$
(13)
$$\begin{array}{*{20}l} Var(X) &= \frac{\partial E(X)}{\partial\ln\lambda} \approx \frac{1}{\nu} \lambda^{1 / \nu}, \end{array}$$
(14)

where the approximations are especially good for ν≤1 or $$\lambda > 10^{\nu}$$ (Shmueli et al. 2005). This distribution is a member of the exponential family, where the joint pmf of the random sample $$\boldsymbol{x} = (x_{1}, \ldots, x_{N})$$ is

$$\begin{array}{@{}rcl@{}} P(\boldsymbol{x} \mid \lambda, \nu) &=& \frac{\prod_{i=1}^{N} \lambda^{x_{i}}}{\left(\prod_{i=1}^{N} x_{i}!\right)^{\nu}}\cdot \zeta^{-N}(\lambda, \nu) = \lambda^{S_{1}} \exp(-\nu S_{2})\zeta^{-N}(\lambda, \nu), \end{array}$$

where $$S_{1} = \sum _{i=1}^{N}x_{i}$$ and $$S_{2} = \sum _{i=1}^{N} \log (x_{i}!)$$ are joint sufficient statistics for λ and ν. Further, because the CMP distribution belongs to the exponential family, the conjugate prior distribution has the form $$h(\lambda, \nu) = \lambda^{a-1} e^{-\nu b} \zeta^{-c}(\lambda, \nu)\delta(a,b,c)$$, where λ>0, ν≥0, and δ(a,b,c) is a normalizing constant such that $$\delta ^{-1}(a,b,c) = \int _{0}^{\infty } \int _{0}^{\infty } \lambda ^{a-1} e^{-b \nu } \zeta ^{-c}(\lambda, \nu) d\lambda d\nu < \infty$$.
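To make the CMP pmf in Eq. (11) and the moment approximations in Eqs. (13)-(14) concrete, the sketch below evaluates the pmf numerically by truncating the infinite series for ζ(λ,ν). The truncation point (`terms=200`) is an assumption that is adequate for the small parameter values used here but not in general (e.g. ν=0 with λ near 1); the helper names `cmp_pmf`, `mean_var`, and `cmp_approx` are hypothetical.

```python
import math


def cmp_pmf(lam, nu, terms=200):
    """Truncated CMP(lam, nu) pmf on x = 0, ..., terms-1, computed on the log scale."""
    log_w = [x * math.log(lam) - nu * math.lgamma(x + 1) for x in range(terms)]
    shift = max(log_w)  # subtract the maximum to avoid overflow
    weights = [math.exp(v - shift) for v in log_w]
    total = sum(weights)  # proportional to the truncated normalizing constant
    return [w / total for w in weights]


def mean_var(pmf):
    """Mean and variance of a pmf supported on 0, 1, 2, ..."""
    mean = sum(x * p for x, p in enumerate(pmf))
    var = sum((x - mean) ** 2 * p for x, p in enumerate(pmf))
    return mean, var


def cmp_approx(lam, nu):
    """Approximate mean and variance from Eqs. (13)-(14); requires nu > 0."""
    root = lam ** (1.0 / nu)
    return root - (nu - 1) / (2 * nu), root / nu


if __name__ == "__main__":
    # Special cases: Poisson (nu = 1), geometric (nu = 0), near-Bernoulli (large nu).
    for lam, nu in [(3.0, 1.0), (0.5, 0.0), (0.5, 35.0)]:
        print(lam, nu, mean_var(cmp_pmf(lam, nu)))
    # Over-dispersed case where the approximations are expected to be good.
    print(mean_var(cmp_pmf(4.0, 0.5)), cmp_approx(4.0, 0.5))
```

The three special cases should reproduce the Poisson(3) moments, the geometric moments λ/(1−λ) and λ/(1−λ)², and a mean near λ/(1+λ), respectively.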

Meanwhile, letting $$X_{*} = \sum _{i=1}^{n} X_{i}$$ for iid random variables $$X_{i} \sim \text{CMP}(\lambda, \nu)$$, i=1,…,n, we say that $$X_{*}$$ is distributed as a sum-of-CMPs [denoted sCMP(λ,ν,n)] variable, and has the pmf

$$\begin{array}{@{}rcl@{}} P(X_{*} = x_{*}) = \frac{\lambda^{x_{*}}}{({x_{*}}!)^{\nu} \zeta^{n}(\lambda, \nu)} \sum_{\stackrel {a_{1},\ldots,a_{n} = 0} {a_{1} + \ldots + a_{n} = {x_{*}}} }^{{x_{*}}} {{x_{*}} \choose a_{1}, \hspace{0.05in} \cdots, \hspace{0.05in} a_{n}}^{\nu},\ {x_{*}}=0,1,2,\ldots, \end{array}$$

where $$\zeta^{n}(\lambda, \nu)$$ is the nth power of ζ(λ,ν), and $${{x_{*}} \choose a_{1}, \hspace {0.05in} \cdots, \hspace {0.05in} a_{n}} = \frac {{x_{*}}!}{a_{1}! \cdots a_{n}!}$$ is a multinomial coefficient. The sCMP(λ,ν,n) distribution encompasses the Poisson distribution with rate parameter nλ (for ν=1), negative binomial(n,1−λ) distribution (for ν=0 and λ<1), and Binomial(n,p) distribution $$\left (\text {as}\ \nu \rightarrow \infty \text { with success probability } p=\frac {\lambda }{\lambda + 1}\right)$$ as special cases. Further, for n=1, the sCMP(λ,ν,n=1) is simply the CMP(λ,ν) distribution.

The mgf and pgf for a sCMP(λ,ν,n) random variable $$X_{*}$$ are

$$M_{X_{*}}(t) = \left(\frac{\zeta(\lambda e^{t}, \nu)}{\zeta(\lambda, \nu)} \right)^{n} \text{ and}\ \Phi_{X_{*}}(t) = \left(\frac{\zeta(\lambda t, \nu)}{\zeta(\lambda, \nu)} \right)^{n},$$

respectively; accordingly, the sCMP(λ,ν,n) has mean $$E(X_{*}) = nE(X)$$ and variance $$V(X_{*}) = nV(X)$$, where E(X) and V(X) are the CMP mean and variance given in Eqs. (13)-(14), respectively. Invariance under addition holds for two independent sCMP distributions with the same rate and dispersion parameters. See Sellers et al. (2017) for additional information regarding the sCMP distribution.
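Because a sCMP(λ,ν,n) variable is a sum of n iid CMP(λ,ν) variables, its pmf can also be obtained numerically as an n-fold convolution of the CMP pmf, which avoids enumerating the multinomial sum directly. The sketch below illustrates this under the assumption that truncating the support at `terms` points is harmless for the parameter values used; `cmp_pmf`, `convolve`, and `scmp_pmf` are hypothetical helper names.

```python
import math


def cmp_pmf(lam, nu, terms=120):
    """Truncated CMP(lam, nu) pmf on x = 0, ..., terms-1."""
    log_w = [x * math.log(lam) - nu * math.lgamma(x + 1) for x in range(terms)]
    shift = max(log_w)
    weights = [math.exp(v - shift) for v in log_w]
    total = sum(weights)
    return [w / total for w in weights]


def convolve(p, q):
    """Convolution of two pmfs of equal length, truncated to that same length."""
    size = len(p)
    out = [0.0] * size
    for i, p_i in enumerate(p):
        if p_i == 0.0:
            continue
        for j in range(size - i):
            out[i + j] += p_i * q[j]
    return out


def scmp_pmf(lam, nu, n, terms=120):
    """sCMP(lam, nu, n) pmf as the n-fold convolution of the CMP(lam, nu) pmf."""
    base = cmp_pmf(lam, nu, terms)
    pmf = base
    for _ in range(n - 1):
        pmf = convolve(pmf, base)
    return pmf


if __name__ == "__main__":
    lam, n = 0.5, 3
    print(scmp_pmf(lam, 1.0, n)[0], math.exp(-n * lam))            # Poisson(n * lam)
    print(scmp_pmf(lam, 0.0, n)[0], (1 - lam) ** n)                # negative binomial(n, 1 - lam)
    print(scmp_pmf(lam, 35.0, n)[0], (1 - lam / (1 + lam)) ** n)   # approx. Binomial(n, lam / (1 + lam))
```

Each printed pair compares P(X∗=0) from the convolution with the value implied by the corresponding special case.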

### The Conway-Maxwell-Binomial distribution and its generalization

The Conway-Maxwell-Binomial distribution of Kadane (2016) (also known as the Conway-Maxwell-Poisson-Binomial distribution by Borges et al. (2014)) is a three-parameter generalization of the Binomial distribution. Denoted as CMB(d,p,ν) distributed, its pmf is

$$\begin{array}{@{}rcl@{}} P(Y=y) = \frac{{d \choose y}^{\nu} p^{y}(1-p)^{d-y}}{\chi(p,\nu, d)}, \text{ } y = 0, \ldots, d \end{array}$$
(15)

for some random variable Y where $$0 \le p \le 1, \nu \in \mathbb {R}$$, and $$\chi (p, \nu, d) = \sum _{y=0}^{d} {d \choose y}^{\nu } p^{y}(1-p)^{d-y}$$ is the associated normalizing constant. The Binomial(d,p) distribution is the special case of the CMB(d,p,ν) where ν=1. Meanwhile, ν>(<)1 corresponds to under-dispersion (over-dispersion) relative to the Binomial distribution. For ν→∞, the pmf is concentrated on the point dp while, for ν→−∞, the pmf is concentrated at 0 or d. For independent $$X_{i} \sim \text{CMP}(\lambda_{i}, \nu)$$, i=1,2, the conditional distribution of X1 given that X1+X2=d has a $$\text {CMB}\left (d, \frac {\lambda _{1}}{\lambda _{1}+\lambda _{2}}, \nu \right)$$ distribution.
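The role of ν in Eq. (15) can be seen by evaluating the CMB pmf directly, since its support is finite. The following sketch (with the hypothetical helper names `cmb_pmf` and `pmf_variance`) computes the pmf and compares its variance with the Binomial variance dp(1−p) for a few values of ν; it assumes 0<p<1.

```python
import math


def cmb_pmf(d, p, nu):
    """CMB(d, p, nu) pmf on y = 0, ..., d, following Eq. (15)."""
    weights = [math.comb(d, y) ** nu * p ** y * (1 - p) ** (d - y) for y in range(d + 1)]
    total = sum(weights)  # normalizing constant chi(p, nu, d)
    return [w / total for w in weights]


def pmf_variance(pmf):
    """Variance of a pmf supported on 0, 1, 2, ..."""
    mean = sum(y * q for y, q in enumerate(pmf))
    return sum((y - mean) ** 2 * q for y, q in enumerate(pmf))


if __name__ == "__main__":
    d, p = 10, 0.5
    binomial_variance = d * p * (1 - p)
    for nu in (0.5, 1.0, 2.0):
        # nu < 1: over-dispersed; nu = 1: Binomial; nu > 1: under-dispersed.
        print(nu, pmf_variance(cmb_pmf(d, p, nu)), binomial_variance)
```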

The pgf and mgf of Y have the form,

$$\begin{array}{@{}rcl@{}} \Phi_{Y}(u) = E\left(u^{Y}\right) = \frac{\tau\left(\frac{up}{1-p}, \nu, d\right)}{\tau\left(\frac{p}{1-p}, \nu, d\right)} \hspace{.25in} \text{and} \hspace{.25in} M_{Y}(u) = \frac{\tau\left(\frac{pe^{u}}{1-p}, \nu, d\right)}{\tau\left(\frac{p}{1-p}, \nu, d\right)}, \end{array}$$
(16)

respectively, where $$\tau (\theta _{*}, \nu, d) = \sum _{y=0}^{d} {d \choose y}^{\nu } \theta _{*}^{y}$$ for some $$\theta_{*}$$. The CMB distribution is a member of the exponential family whose joint pmf of the random sample $$\boldsymbol{y} = \{y_{1}, \ldots, y_{N}\}$$ is

$$\begin{array}{@{}rcl@{}} P(\boldsymbol{y} \mid p, \nu) & \propto & (1-p)^{dN} \prod_{i=1}^{N} \left(\frac{p}{1-p} \right)^{y_{i}} \frac{d!^{N\nu}}{[y_{i}!(d-y_{i})!]^{\nu}}\\ & \propto & \exp\left(S_{*1} \log \left(\frac{p}{1-p}\right) - \nu S_{*2} \right), \end{array}$$

where $$S_{*1} = \sum _{i=1}^{N}y_{i}$$ and $$S_{*2} = \sum _{i=1}^{N} \log [y_{i}! (d-y_{i})!]$$ are the joint sufficient statistics for p and ν. Further, its existence as a member of the exponential family implies that a conjugate prior family exists of the form,

$$h(\theta_{*}, \nu) = \theta_{*}^{a-1}e^{-\nu b}\omega^{-c}(\theta_{*}, \nu)\psi(a,b,c), \hspace{.25in} 0 < \theta_{*} < \infty, \text{} 0 < \nu < \infty,$$

where $$\omega (\theta _{*}, \nu) = \sum _{y=0}^{d} \theta _{*}^{y}/[y!(d-y)!]^{\nu }, \psi ^{-1}(a,b,c) = \int _{0}^{\infty } \int _{0}^{\infty } \theta _{*}^{a-1} e^{-\nu b} \omega ^{-c}(\theta _{*}, \nu)d\theta _{*} d\nu < \infty$$ (Kadane 2016).

Sellers et al. (2017) further introduce a generalized Conway-Maxwell-Binomial (gCMB) distribution whose pmf is

$$\begin{aligned} P(Z=z) \propto {s \choose z}^{\nu} p^{z} (1-p)^{s-z} {\left[{\sum_{\stackrel {a_{1},\ldots,a_{n_{1}} = 0} {a_{1} + \ldots + a_{n_{1}} = z} }^{z} {z \choose {a_{1}, \hspace{0.05in} \dots, \hspace{0.05in} a_{n_{1}}} }^{\nu}} \right] } {\left[{\sum_{\stackrel {b_{1}, \ldots, b_{n_{2}} = 0} {b_{1} + \ldots + b_{n_{2}} = s-z} }^{s-z} {s-z \choose {b_{1}, \hspace{0.05in} \dots, \hspace{0.05in} b_{n_{2}}} }^{\nu}} \right] } \end{aligned}$$
(17)

for a random variable Z with parameters (p,ν,s,n1,n2). As with the conditional probability of a CMP random variable given the sum of it and another independent CMP random variable sharing the same dispersion parameter, a special case of a gCMB distribution can be derived as the conditional distribution of X1, given the sum X1+X2=d for independent sCMP random variables, $$X_{i} \sim \text{sCMP}(\lambda_{i}, \nu, n_{i})$$, i=1,2; the resulting distribution is analogously a gCMB$$\left (\frac {\lambda _{1}}{\lambda _{1}+\lambda _{2}},\nu, d, n_{1}, n_{2}\right)$$ distribution. The gCMB distribution contains several special cases, including the CMB(d,p,ν) distribution (for n1=n2=1); the Binomial(d,p) distribution (when n1=n2=1 and ν=1); and, for λ1=λ2=λ, the hypergeometric distribution when ν→∞ and the negative hypergeometric distribution when ν=0 and λ<1.

## First-order sCMP time series models

This section highlights two first-order models for discrete time series data that have a sCMP marginal distribution, namely the first-order sCMP autoregressive (SCMPAR(1)) model, and a first-order SCMP moving average (SCMPMA(1)) model with the same marginal distribution structure.

### First-order sCMP autoregressive (SCMPAR(1)) model

Sellers et al. (2020) introduce a first-order sCMP autoregressive (SCMPAR(1)) model to describe count data correlated in time that express over- or under-dispersion. Based on the sCMP and gCMB distributions, respectively (as described in the “Motivating distributions” section with more detail available in Sellers et al. (2017)), we use the sCMP distribution to model the marginals of the first-order integer-valued autoregressive (INAR(1)) process as

$$\begin{array}{@{}rcl@{}} X_{t}=C_{t}(X_{t-1})+\epsilon_{t} \hspace{.3in} t=1, 2, \ldots, \end{array}$$
(18)

where $$\epsilon_{t} \sim \text{sCMP}(\lambda, \nu, n_{2})$$, and {Ct(∙): t=1,2,…} is a sequence of independent gCMB$$\left (\frac {1}{2}, \nu, \bullet, n_{1}, n_{2}\right)$$ operators, independent of {εt}. This flexible INAR(1) model contains the first-order Poisson autoregressive (PAR(1)) as described in several references (Al-Osh and Alzaid 1987; McKenzie 1988; Weiss 2008), and the first-order binomial auto-regressive model of Al-Osh and Alzaid (1991) as special cases. It likewise contains an INAR(1) model that allows for negative binomial marginals with a thinning operator whose pmf is negative hypergeometric.

The SCMPAR(1) model is yet another special case of the infinitely divisible convolution-closed class of first-order autoregressive (AR(1)) models described in Joe (1996), and satisfies the Markov property with the transition probability,

$$\begin{aligned} P(X_{t} = x_{t} \mid X_{t-1} = x_{t-1}) &= \sum_{k=0}^{\min(x_{t}, x_{t-1})} \frac{ {x_{t-1} \choose k}^{\nu} \left[ \sum_{\stackrel {a_{1},\ldots,a_{n_{1}} = 0} {a_{1} + \ldots + a_{n_{1}} = k} }^{k} {k \choose {a_{1},\dots,a_{n_{1}}} }^{\nu} \right] \left[ \sum_{\stackrel {b_{1},\ldots,b_{n_{2}} = 0} {b_{1} + \ldots + b_{n_{2}} = x_{t-1}-k} }^{x_{t-1}-k} {x_{t-1}-k \choose {b_{1},\dots,b_{n_{2}}} }^{\nu} \right] }{\sum_{\stackrel {c_{1},\ldots,c_{{n_{1}}+{n_{2}}} = 0} {c_{1} + \ldots + c_{{n_{1}}+{n_{2}}} = x_{t-1}} }^{x_{t-1}} {x_{t-1} \choose c_{1}, \ldots, c_{{n_{1}}+{n_{2}}}}^{\nu}} \\ & \times \frac{{\lambda}^{{x_{t}}-k}}{[(x_{t}-k)!]^{\nu} \zeta^{n_{2}}({\lambda}, \nu)} {\sum_{\stackrel {d_{1}, \ldots, d_{n_{2}} = 0} {d_{1} + \ldots + d_{n_{2}} = x_{t}-k} }^{x_{t}-k} {x_{t} - k \choose {d_{1}, \dots, d_{n_{2}}} }^{\nu} }. \end{aligned}$$
(19)

The SCMPAR(1) model has an ergodic Markov chain, thus Xt has a stationary sCMP (λ,ν,n1+n2) distribution that is unique. The joint pgf associated with the SCMPAR(1) model is

$$\begin{array}{@{}rcl@{}} {}\phi_{X_{t+1},X_{t}} (u,l) = \frac{\zeta^{n_{2}}(\lambda u, \nu)}{\zeta^{n_{2}}(\lambda, \nu)} \frac{\zeta^{n_{1}}(\lambda ul, \nu)}{\zeta^{n_{1}}(\lambda, \nu)} \frac{\zeta^{n_{2}}(\lambda l, \nu)}{\zeta^{n_{2}}(\lambda, \nu)} = \frac{\left(\zeta(\lambda u, \nu)\zeta(\lambda l, \nu)\right)^{n_{2}}\zeta^{n_{1}}(\lambda ul, \nu)}{\zeta^{n_{1} + 2{n_{2}}}(\lambda, \nu)}, \end{array}$$
(20)

where the pgf is symmetric in u and l, and hence the joint distribution of Xt+1 and Xt is time reversible. The regression form for the SCMPAR(1) process can be determined, and the general autocorrelation function for the process {Xt} is $$\rho _{r} = \text {Corr}(X_{t}, X_{t-r}) = \left (\frac {n_{1}}{n_{1}+n_{2}}\right)^{r}$$ for r=0,1,2,…. Parameter estimation can be conducted via conditional maximum likelihood with statistical computation tools (e.g. in R); see Sellers et al. (2020) for details.

### Introducing the sCMPMA(1) model

Motivated by the SCMPAR(1) model of Sellers et al. (2020), we introduce a first-order sum-of-CMPs moving average (SCMPMA(1)) process Xt by

$$\begin{array}{@{}rcl@{}} X_{t} = C^{*}_{t}\left(\epsilon^{*}_{t-1}\right) + \epsilon^{*}_{t}, \hspace{.3in} t=1, 2, \ldots, \end{array}$$
(21)

where $$\{\epsilon ^{*}_{t}\}$$ is a sequence of iid sCMP(λ,ν,m1+m2) random variables and $$\{C^{*}_{t}(\bullet)\}$$ is a sequence of independent gCMB(1/2,ν,∙,m1,m2) operators independent of $$\{\epsilon ^{*}_{t}\}$$. By definition, Xt is a stationary process with the sCMP(λ,ν,2m1+m2) distribution, and Xt+r and Xt are independent for |r|>1. While this model can analogously be viewed as a special case of the infinitely divisible convolution-closed class of discrete MA models (Joe 1996), unlike the SCMPAR(1) process, the SCMPMA(1) process is not Markovian.

The autocorrelation between Xt and Xt+1 is

$$\begin{array}{@{}rcl@{}} \rho_{1} = \text{Corr}(X_{t}, X_{t+1}) &=& \frac{\text{Cov}\left(C^{*}_{t}\left(\epsilon^{*}_{t-1}\right) + \epsilon^{*}_{t}, C^{*}_{t+1}\left(\epsilon^{*}_{t}\right) + \epsilon^{*}_{t+1}\right)}{\sqrt{\text{Var}(X_{t})\text{Var}(X_{t+1})}}\\ &=& \frac{\text{Cov}\left(\epsilon^{*}_{t}, C^{*}_{t+1}\left(\epsilon^{*}_{t}\right)\right)}{\sqrt{\text{Var}(X_{t})\text{Var}(X_{t+1})}} \quad \text{by the independence assumptions}, \end{array}$$

where $$C^{*}_{t+1}(\epsilon ^{*}_{t}) = \sum _{i=1}^{m_{1}} Y_{i}$$ and $$\epsilon ^{*}_{t} = \sum _{i=1}^{m_{1}+m_{2}} Y_{i}$$, respectively, are sCMP (λ,ν,m1) and sCMP (λ,ν,m1+m2) random variables; i.e. each sCMP random variable can be viewed as respective sums of iid CMP (λ,ν) random variables, Yi. Thus,

$$\begin{array}{@{}rcl@{}} \text{Cov}\left(\epsilon^{*}_{t}, C^{*}_{t+1}\left(\epsilon^{*}_{t}\right)\right) = \text{Cov}\left(\sum_{i=1}^{m_{1}+m_{2}} Y_{i}, \sum_{i=1}^{m_{1}} Y_{i}\right) = \text{Var}\left(\sum_{i=1}^{m_{1}} Y_{i}\right) = m_{1}\text{Var}(Y), \end{array}$$

where, without loss of generality, we let Y denote any of the iid Yi random variables. Meanwhile, because {Xt} is a sCMP (λ,ν,2m1+m2) distributed stationary process, we can likewise represent $$\text {Var}(X_{t}) = \text {Var}\left (\sum _{i=1}^{2m_{1}+m_{2}} Y_{i}\right) = \sum _{i=1}^{2m_{1}+m_{2}} \text {Var}(Y_{i}) = (2m_{1} + m_{2})\text {Var}(Y)$$ for all t. We therefore find that

$$\begin{array}{@{}rcl@{}} \rho_{1} = \text{Corr}\left(X_{t}, X_{t+1}\right) = \frac{\text{Cov}\left(\epsilon^{*}_{t}, C^{*}_{t+1}\left(\epsilon^{*}_{t}\right)\right)}{\sqrt{\text{Var}\left(X_{t}\right)\text{Var}(X_{t+1})}} = \frac{m_{1}\text{Var}(Y)}{(2m_{1}+m_{2})\text{Var}(Y)} = \frac{m_{1}}{2m_{1}+m_{2}}. \end{array}$$
(22)

Because m1,m2≥1, the lag-one correlation satisfies 0<ρ1<0.5. In particular, for m1=m2, we have the special case where ρ1=1/3. Meanwhile, ρk=0 for all k>1 because, by the SCMPMA(1) model assumptions, there is no dependence between Xt and Xt+r for r>1.
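The representation used in the derivation above, in which each innovation is a sum of m1+m2 iid CMP(λ,ν) variables and the thinned part is the sum of the first m1 of them, suggests a direct way to simulate the process and check Eq. (22). The sketch below is one such illustration, not the authors' implementation: it assumes a truncated CMP support (`terms=200`) and samples by inverse-cdf lookup; `cmp_cdf`, `simulate_scmpma1`, and `lag1_corr` are hypothetical helper names.

```python
import bisect
import math
import random
from itertools import accumulate


def cmp_cdf(lam, nu, terms=200):
    """Cumulative distribution of a truncated CMP(lam, nu) on 0, ..., terms-1."""
    log_w = [x * math.log(lam) - nu * math.lgamma(x + 1) for x in range(terms)]
    shift = max(log_w)
    weights = [math.exp(v - shift) for v in log_w]
    total = sum(weights)
    return list(accumulate(w / total for w in weights))


def simulate_scmpma1(lam, nu, m1, m2, length, seed=0):
    """Simulate X_t = C*_t(eps*_{t-1}) + eps*_t via sums of iid CMP components."""
    rng = random.Random(seed)
    cdf = cmp_cdf(lam, nu)

    def draw_block():
        # m1 + m2 iid CMP draws; the first m1 form the part carried to the next step.
        ys = [min(bisect.bisect_left(cdf, rng.random()), len(cdf) - 1)
              for _ in range(m1 + m2)]
        return sum(ys[:m1]), sum(ys)

    carried, _ = draw_block()
    series = []
    for _ in range(length):
        next_carried, innovation = draw_block()
        series.append(carried + innovation)
        carried = next_carried
    return series


def lag1_corr(series):
    """Sample lag-1 autocorrelation."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series) / n
    cov = sum((series[i] - mean) * (series[i + 1] - mean) for i in range(n - 1)) / (n - 1)
    return cov / var


if __name__ == "__main__":
    for m1, m2 in [(1, 1), (1, 2), (2, 2)]:
        xs = simulate_scmpma1(0.5, 2.0, m1, m2, 40000)
        print((m1, m2), lag1_corr(xs), m1 / (2 * m1 + m2))
```

For each (m1,m2) pair, the sample lag-1 autocorrelation should be close to m1/(2m1+m2) regardless of ν, up to simulation error.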

Recall from the “The Conway-Maxwell-Poisson distribution and its generalization” section that $$\Phi _{G}(w) \,=\, \left (\frac {\zeta (\lambda w, \nu)}{\zeta (\lambda, \nu)} \right)^{\pi }$$ is the pgf for a sCMP(λ,ν,π) distributed random variable, (say) G. Using this knowledge along with Eq. (21), the joint pgf can be derived as

$$\begin{array}{@{}rcl@{}} \phi_{X_{t+1},X_{t}} (u,l) &=& E\left(u^{X_{t+1}} l^{X_{t}}\right)\\ &=&E\left(u^{C^{*}\left(\epsilon^{*}_{t}\right) + \epsilon^{*}_{t+1}} l^{X_{t}}\right)\\ &=&E\left(u^{C^{*}\left(\epsilon^{*}_{t}\right) + \epsilon^{*}_{t+1}} l^{X_{t} - C^{*}\left(\epsilon^{*}_{t}\right)} l^{C^{*}\left(\epsilon^{*}_{t}\right)}\right)\\ &=& E\left((u l)^{C^{*}\left(\epsilon^{*}_{t}\right)} u^{\epsilon^{*}_{t+1}} l^{X_{t} - C^{*}\left(\epsilon^{*}_{t}\right)} \right) \text{ where}\ X_{t} - C^{*}\left(\epsilon^{*}_{t}\right) \stackrel{d}{=} \epsilon^{*}_{t}\\ &=&E\left((u l)^{C^{*}\left(\epsilon^{*}_{t}\right)}\right) E\left(u^{\epsilon^{*}_{t+1}}\right)E\left(l^{\epsilon^{*}_{t}}\right) \text{ by independence}\\ &=& \phi_{C^{*}\left(\epsilon^{*}_{t}\right)} (u l) \phi_{\epsilon^{*}_{t+1}}(u) \phi_{\epsilon^{*}_{t}} (l) \\ &=& \left(\frac{\zeta(\lambda u l, \nu)}{\zeta(\lambda, \nu)} \right)^{m_{1}} \left(\frac{\zeta(\lambda u, \nu)}{\zeta(\lambda, \nu)} \right)^{m_{1} + m_{2}} \left(\frac{\zeta(\lambda l, \nu)}{\zeta(\lambda, \nu)} \right)^{m_{1} + m_{2}}\\ &=& \frac{\left(\zeta(\lambda u, \nu)\zeta(\lambda l, \nu)\right)^{m_{1}+m_{2}}\left(\zeta(\lambda u l, \nu)\right)^{m_{1}}}{\left(\zeta(\lambda, \nu)\right)^{3m_{1} + 2m_{2}}}, \end{array}$$
(23)

where Eq. (23) is equivalent to Eq. (20) (i.e. the SCMPMA(1) process is comparable to the SCMPAR(1) process) when $$m_{1} = n_{1} = n_{2} - m_{2}$$. Given this comparison, we can easily determine the conditional mean $$E(X_{t+1} \mid X_{t} = x)$$ and conditional variance $$\text{Var}(X_{t+1} \mid X_{t} = x)$$. Eq. (23) further demonstrates that the SCMPMA(1) model is time-reversible.
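The correspondence between the two joint pgfs follows from matching the exponents of each factor in Eqs. (20) and (23):

$$n_{1} = m_{1}, \qquad n_{2} = m_{1} + m_{2}, \qquad n_{1} + 2n_{2} = m_{1} + 2(m_{1} + m_{2}) = 3m_{1} + 2m_{2},$$

where the first equality matches the power of $$\zeta(\lambda u l, \nu)$$, the second matches the power of $$\zeta(\lambda u, \nu)\zeta(\lambda l, \nu)$$, and the third confirms that the denominators then agree automatically.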

Parameter estimation via maximum likelihood (ML) is a difficult task with INMA models given the complex form of the underlying distributions. Even a conditional least squares approach does not appear to be feasible “because of the thinning operators, unless randomization is used” (Brännäs and Hall 2001). We therefore instead consider the following ad hoc procedure for parameter estimation. Given a data set with an observed correlation ρ1, we first propose values for $$m_{1}, m_{2} \in \mathbb {N}$$ that satisfy the constraint, $$\rho _{1} \approx \frac {m_{1}}{2m_{1} + m_{2}}$$. Given m1 and m2 and recognizing that Xt is stationary with a sCMP(λ,ν,2m1+m2) distribution, we proceed with ML estimation to determine $$\hat {\lambda }$$ and $$\hat {\nu }$$ as described in Zhu et al. (2017) for conducting sCMP(λ,ν,s=2m1+m2) parameter estimation with regard to a CMP process over an interval of length s≥1. The corresponding variation for $$\hat {\lambda }$$ and $$\hat {\nu }$$ can be quantified via the Fisher information matrix or nonparametric bootstrapping. While the sampling distribution for $$\hat {\lambda }$$ is approximately symmetric, the sampling distribution for $$\hat {\nu }$$ is considerably right-skewed; hence analysts are advised to quantify estimator variation via nonparametric bootstrapping. While this procedure is a means to an end, it only succeeds in determining an appropriate distributional form for the data; it does not fully address the nature of the time series.
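The first step of this ad hoc procedure, proposing (m1,m2) pairs whose implied correlation m1/(2m1+m2) is close to the observed lag-1 correlation, can be sketched as a simple enumeration. The search bound `max_m` and the function name `candidate_orders` are assumptions made for illustration; the source does not prescribe how the candidate pairs are generated.

```python
def candidate_orders(rho_hat, max_m=2):
    """Rank (m1, m2) pairs by how closely m1 / (2*m1 + m2) matches rho_hat.

    Returns a list of (m1, m2, implied_rho) tuples, closest match first.
    """
    scored = []
    for m1 in range(1, max_m + 1):
        for m2 in range(1, max_m + 1):
            implied = m1 / (2 * m1 + m2)
            scored.append((abs(implied - rho_hat), m1, m2, implied))
    scored.sort()  # ties are broken by the smaller (m1, m2)
    return [(m1, m2, implied) for _, m1, m2, implied in scored]


if __name__ == "__main__":
    for m1, m2, implied in candidate_orders(0.292):
        print(m1, m2, round(implied, 4))
```

Each candidate pair would then be passed to a separate sCMP(λ,ν,2m1+m2) ML fit, which is not shown here.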

## Data examples

To illustrate the flexibility of our INMA model, we consider various data simulations and a real data example. The respective details and associated commentary follow.

### Simulated data examples

Table 1 reports the estimated mean, variance, and autocorrelation that result from various data simulations of SCMPMA(1) data given parameters (λ,ν,m1,m2). In all examples, we let λ=0.5, $$m_{1}, m_{2} \in \{1,2\}$$, and $$\nu \in \{0, 0.5, 1, 2, 35\}$$, where ν=0 captures the case of extreme over-dispersion, ν=1 denotes equi-dispersion, and ν=35 serves as a computational stand-in for the limiting case of utmost under-dispersion, ν→∞.

For all examples, we find that the associated mean and variance compare with each other as expected, i.e. the variance is greater than the mean when ν<1 (i.e. the data are over-dispersed), the variance and mean are approximately equal when ν=1 (i.e. equi-dispersion holds), and the variance is less than the mean (i.e. the data are under-dispersed) when ν>1. In particular, we can easily verify that the three special case models perform as expected. For the Poisson cases (ν=1), we expect the mean and variance to both equal (2m1+m2)λ, while the binomial cases (i.e. ν→∞ and $$p=\frac {\lambda }{\lambda +1}$$) produce a mean equal to $$(2m_{1} +m_{2})\frac {\lambda }{\lambda +1}$$ and variance equaling $$(2m_{1}+m_{2}) \frac {\lambda }{\lambda +1} \left (1-\frac {\lambda }{\lambda +1}\right)$$, and the negative binomial cases (ν=0 with p=1−λ) have a mean of $$\frac {(2m_{1} +m_{2})\lambda }{1-\lambda }$$ and variance equaling $$\frac {(2m_{1} +m_{2})\lambda }{(1-\lambda)^{2}}$$. In fact, even with the ν→∞ case approximated by letting ν=35, we still obtain reasonable estimates for the mean and variance for all of the associated cases of m1 and m2.

For each {m1,m2} pair, the mean and variance both decrease as ν increases while, for all of the considered examples, we obtain estimated correlation values $$\hat {\rho }$$ that approximately equal the true correlation, ρ. In particular, for those cases where m1=m2, we obtain $$\hat {\rho } \approx 1/3$$ as expected (see Eq. (22)).

### Real data example: IP address counts

Weiss (2007) considers a modified dataset regarding the number of unique IP addresses that access the University of Würzburg Department of Statistics' webpages in 240 two-minute intervals. Collected on November 29, 2005 (from 10:00:00 to 18:00:00), these data have an associated mean and variance equaling 1.286 and 1.205, respectively. Weiss (2007) considers a PAR(1) model, noting that “the empirical partial autocorrelation function indicates that a first order [autoregressive] model may be an appropriate choice” with $$\hat {\rho }_{1}=0.292$$; Sellers et al. (2020), following suit, consider a SCMPAR(1) model as a flexible alternative to the PAR(1) model. The ACF and PACF plots of these data, however, do not clearly distinguish between a first-order autoregressive and a first-order moving average model; see Fig. 1a-b. Further, recognizing that the data express apparent under- to equi-dispersion, we consider the SCMPMA(1) as an illustrative model for analysis.

We perform ML estimation assuming various combinations for (m1,m2) (i.e. {(1,1), (1,2), (2,2)}), as the lag-1 correlations implied by these values bracket the observed correlation, $$0.25 = \frac {1}{4} < \hat {\rho }_{1} < \frac {1}{3} \approx 0.33$$. Table 2 contains the resulting parameter estimates for λ and ν, along with the respective Akaike Information Criterion (AIC). While the SCMPMA(1) model with m1=m2=2 has the lowest AIC among the four models considered, all of these models produce approximately equal AIC values (approximately 695.2), where increasing m1 and m2 values are associated with decreasing $$\hat {\lambda }$$ and increasing $$\hat {\nu }$$. This makes sense because the resulting estimates rely solely on the assumed underlying sCMP(λ,ν,2m1+m2) distributional form for the data.

The dispersion estimates in Table 2 are all greater than 1, thus implying a perceived level of data under-dispersion. These results naturally stem from the reported mean of the data (1.286) being greater than its corresponding variance (1.205). Their associated 95% confidence intervals (determined via nonparametric bootstrapping; also supplied in Table 2), however, are sufficiently wide that they contain ν=1. This suggests that the apparent data under-dispersion is not statistically significant, and thus that the data can be analyzed via the Al-Osh and Alzaid (1988) PMA(1) model. It is further striking to see that the respective 95% confidence intervals associated with the dispersion parameter widen with the size of the underlying sCMP(λ,ν,2m1+m2) model. This is an artifact of the (s)CMP distribution, namely that the sampling distribution of $$\hat{\nu}$$ is right-skewed (as discussed in Zhu et al. (2017)). This approach confirms interest in the PMA(1) model, where Eqs. (1)-(2) imply that the associated estimated parameters are $$\hat {\gamma } \approx 0.4124$$ and $$\hat {\eta } \approx 0.9105$$. Thus, we benefit from the SCMPMA(1) as a tool for parsimonious model determination.
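The PMA(1) estimates quoted above follow from inverting Eqs. (1)-(2): the lag-1 correlation γ/(1+γ) gives γ=ρ1/(1−ρ1), and the mean (1+γ)η then gives η. The short sketch below reproduces this arithmetic from the reported sample mean and lag-1 correlation; the function name `pma1_moment_estimates` is a hypothetical helper.

```python
def pma1_moment_estimates(sample_mean, rho1):
    """Moment-based PMA(1) estimates from Eqs. (1)-(2).

    rho1 = gamma / (1 + gamma)    =>  gamma = rho1 / (1 - rho1)
    mean = (1 + gamma) * eta      =>  eta = mean / (1 + gamma)
    """
    gamma = rho1 / (1.0 - rho1)
    eta = sample_mean / (1.0 + gamma)
    return gamma, eta


if __name__ == "__main__":
    gamma_hat, eta_hat = pma1_moment_estimates(sample_mean=1.286, rho1=0.292)
    print(round(gamma_hat, 4), round(eta_hat, 4))
```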

## Discussion

This work utilizes the sCMP distribution of Sellers et al. (2017) to develop a SCMPMA(1) model that serves as a flexible moving average time series model for discrete data where data dispersion is present. The SCMPMA(1) model captures the PMA(1), as well as versions of a negative binomial and binomial MA(1) structure, respectively, as special cases. This model, along with the flexible SCMPAR(1), can further be used to derive broader auto-regressive moving average (ARMA) and auto-regressive integrated moving average (ARIMA) models based on the sCMP distribution.

The SCMPMA(1) shares many properties with the analogous SCMPAR(1) model by Sellers et al. (2020). The presented models rely on predefining discrete values (i.e. m1,m2 for the SCMPMA(1)) for parameter estimation. As done in Sellers et al. (2017) and Sellers and Young (2019), we utilize a profile likelihood approach where, given m1 and m2, we estimate the remaining model coefficients and then identify that collection of parameter estimates that produces the largest likelihood, thus identifying these parameter estimates as the MLEs. While this profile likelihood approach is acceptable as demonstrated in other applications, directly estimating m1,m2 along with the other SCMPMA(1) model estimates would likewise prove beneficial, as would redefining the model to allow for real-valued estimators for m1 and m2. These generalizations and estimation approaches can be explored in future work.
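The profile likelihood idea described above can be sketched generically: fix the discrete index, maximize the likelihood over the remaining parameters, and keep the index whose conditional fit attains the largest likelihood. The example below is only a minimal illustration under a loudly stated assumption: it swaps in an iid Binomial(m, p) likelihood as a stand-in, with m playing the role of the predefined discrete values (m1, m2). It is not the sCMP or SCMPMA(1) likelihood, and the function names are hypothetical.

```python
import math


def binomial_loglik(data, m, p):
    """Log-likelihood of iid Binomial(m, p) counts (stand-in model only)."""
    return sum(
        math.log(math.comb(m, x)) + x * math.log(p) + (m - x) * math.log(1.0 - p)
        for x in data
    )


def fit_given_index(data, m):
    """Conditional MLE of p for a fixed discrete index m, with its log-likelihood."""
    p_hat = sum(data) / (m * len(data))
    if m < max(data) or not 0.0 < p_hat < 1.0:
        return -math.inf, None  # this index cannot have produced the data
    return binomial_loglik(data, m, p_hat), p_hat


def profile_mle(data, candidates):
    """Return (loglik, index, p_hat) for the candidate with the largest likelihood."""
    fits = [(*fit_given_index(data, m), m) for m in candidates]
    loglik, p_hat, m = max(fits, key=lambda fit: fit[0])
    return loglik, m, p_hat


# Toy counts, invented purely for illustration.
counts = [0, 1, 2, 1, 3, 2, 1, 0, 2, 1]
loglik, m_hat, p_hat = profile_mle(counts, candidates=[3, 4, 5, 6])
print("selected index:", m_hat, "conditional MLE:", p_hat, "log-likelihood:", loglik)
```

For the SCMPMA(1), the candidate set would instead hold (m1, m2) pairs, and the conditional fit would maximize the model likelihood over λ and ν numerically.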

Simulated data examples illustrate that the SCMPMA(1) model can obtain unbiased estimates, and the model demonstrates potential for accurate forecasts given data containing any measure of data dispersion. The real data illustration, however, highlights the complexities that come with parameter estimation. While we present a means towards achieving this goal, this approach does not perform especially strongly with regard to prediction and forecasting. It nonetheless serves as a starting point for parameter estimation that we will continue to investigate in future work. Moreover, the flexibility of the SCMPMA(1) aids in determining a parsimonious model form as appropriate.

## Availability of data and materials

Simulated data can vary given the generation process. Simulation code can be supplied upon request. The IP data set was obtained from Dr. Christian Weiss of Helmut Schmidt University.

## Abbreviations

AR(1): First-order autoregressive

ARIMA: Auto-regressive integrated moving average

ARMA: Auto-regressive moving average

CMB: Conway-Maxwell-Binomial

CMP: Conway-Maxwell-Poisson

gCMB: Generalized Conway-Maxwell-Binomial

GPAR(1): First-order generalized Poisson autoregressive

GPMA(1): First-order generalized Poisson moving average

INAR(1): First-order integer-valued autoregressive

INMA: Integer-valued moving average

MA: Moving average

mgf: Moment generating function

PAR(1): First-order Poisson autoregressive

pgf: Probability generating function

PMA: Poisson moving average

PMA(1): First-order Poisson moving average

QB: Quasi-binomial

sCMP: Sum-of-Conway-Maxwell-Poisson

SCMPAR(1): First-order sum-of-Conway-Maxwell-Poisson autoregressive

SCMPMA(1): First-order sum-of-Conway-Maxwell-Poissons moving average

## References

1. Al-Osh, M. A., Alzaid, A. A.: First-order integer valued autoregressive (INAR(1)) process. J. Time Ser. Anal. 8(3), 261–275 (1987).

2. Al-Osh, M. A., Alzaid, A. A.: Integer-valued moving average (INMA) process. Stat. Pap. 29(1), 281–300 (1988).

3. Al-Osh, M. A., Alzaid, A. A.: Binomial autoregressive moving average models. Commun. Stat. Stoch. Model. 7(2), 261–282 (1991).

4. Alzaid, A. A., Al-Osh, M. A.: Some autoregressive moving average processes with generalized Poisson marginal distributions. Ann. Inst. Stat. Math. 45(2), 223–232 (1993).

5. Borges, P., Rodrigues, J., Balakrishnan, N., Bazán, J.: A COM-Poisson type generalization of the binomial distribution and its properties and applications. Stat. Probab. Lett. 87, 158–166 (2014).

6. Brännäs, K., Hall, A.: Estimation in integer-valued moving average models. Appl. Stoch. Model. Bus. Ind. 17, 277–291 (2001).

7. Conway, R. W., Maxwell, W. L.: A queuing model with state dependent service rates. J. Ind. Eng. 12, 132–136 (1962).

8. Famoye, F.: Restricted generalized Poisson regression model. Commun. Stat. Theory Methods. 22(5), 1335–1354 (1993).

9. Hilbe, J. M.: Modeling Count Data. Cambridge University Press, New York, NY (2014).

10. Joe, H.: Time series models with univariate margins in the convolution-closed infinitely divisible class. J. Appl. Probab. 33(3), 664–677 (1996).

11. Kadane, J. B.: Sums of possibly associated Bernoulli variables: The Conway-Maxwell-Binomial distribution. Bayesian Anal. 11(2), 403–420 (2016).

12. McKenzie, E.: ARMA models for dependent sequences of Poisson counts. Adv. Appl. Probab. 20(4), 822–835 (1988).

13. Sellers, K. F., Peng, S. J., Arab, A.: A flexible univariate autoregressive time-series model for dispersed count data. J. Time Ser. Anal. 41(3), 436–453 (2020). https://doi.org/10.1111/jtsa.12516.

14. Sellers, K. F., Swift, A. W., Weems, K. S.: A flexible distribution class for count data. J. Stat. Distrib. Appl. 4(22), 1–21 (2017). https://doi.org/10.1186/s40488-017-0077-0.

15. Sellers, K. F., Young, D. S.: Zero-inflated sum of Conway-Maxwell-Poissons (ZISCMP) regression. J. Stat. Comput. Simul. 89(9), 1649–1673 (2019).

16. Shmueli, G., Minka, T. P., Kadane, J. B., Borle, S., Boatwright, P.: A useful distribution for fitting discrete data: revival of the Conway-Maxwell-Poisson distribution. Appl. Stat. 54, 127–142 (2005).

17. Weiss, C. H.: Controlling correlated processes of Poisson counts. Qual. Reliab. Eng. Int. 23(6), 741–754 (2007).

18. Weiss, C. H.: Thinning operations for modeling time series of counts–a survey. Adv. Stat. Anal. 92, 319–341 (2008).

19. Weiss, C. H.: An Introduction to Discrete-Valued Time Series. John Wiley & Sons, Inc., Hoboken, NJ (2018).

20. Weiss, C. H.: Stationary count time series models. Wiley Interdiscip. Rev. Comput. Stat. 13(1), 1502 (2021). https://doi.org/10.1002/wics.1502.

21. Zhu, L., Sellers, K. F., Morris, D. S., Shmueli, G.: Bridging the gap: A generalized stochastic process for count data. Am. Stat. 71(1), 71–80 (2017).

## Acknowledgements

This paper is released to inform interested parties of research and to encourage discussion. The views expressed are those of the authors and not necessarily those of the U.S. Census Bureau. SM and FC thank the Georgetown Undergraduate Research Opportunities Program (GUROP) for their support. All authors thank Dr. Christian Weiss for use of the IP dataset, and the reviewers for their feedback and comments.

## Funding

SM was funded in part by the GUROP.

## Author information

### Contributions

KFS developed the research idea. All authors contributed towards the literature review, theoretical developments, and statistical computing. The author(s) read and approved the final manuscript.

### Corresponding author

Correspondence to Kimberly F. Sellers.

## Ethics declarations

### Competing interests

No authors have competing interests relating to this work. 