A flexible univariate moving average time-series model for dispersed count data

Al-Osh and Alzaid (1988) consider a Poisson moving average (PMA) model to describe the relation among integer-valued time series data; this model, however, is constrained by the underlying equi-dispersion assumption for count data (i.e., that the variance and the mean are equal). This work instead introduces a flexible integer-valued moving average model for count data that contain over- or under-dispersion via the Conway-Maxwell-Poisson (CMP) distribution and related distributions. This first-order sum-of-Conway-Maxwell-Poissons moving average (SCMPMA(1)) model offers a generalizable construct that includes the PMA (among others) as a special case. We highlight the SCMPMA model properties and illustrate its flexibility via simulated data examples.


Introduction
Integer-valued thinning-based models have been proposed to model time series data represented as counts. Al-Osh and Alzaid (1988) introduce a generally defined integer-valued moving average (INMA) process as an analog to the moving average (MA) model for continuous data, which assumes an underlying Gaussian distribution. The INMA process instead utilizes a thinning operator that maintains an integer-valued range of possible outcomes. To form such a model, they consider the "survivals" of independent and identically distributed (iid) non-negative integer-valued random innovations, thereby ensuring discrete data outcomes (Weiss 2021). Al-Osh and Alzaid (1988) particularly consider a first-order Poisson moving average (PMA(1)), i.e. a stationary sequence U_t of the form U_t = γ ∘ ε_{t−1} + ε_t, where {ε_t} is a sequence of iid Poisson(η) random variables and γ ∘ ε = Σ_{i=1}^{ε} B_i for a sequence of iid Bernoulli(γ) random variables {B_i} independent of {ε_t}. By design, the PMA(1) is an INMA whose maximum stay time in the sequence is two time units. Consequently, consecutive observations U_{t−1} and U_t are dependent, while U_{t+r} and U_t are independent for |r| > 1. The marginal distribution of U_t is Poisson((1 + γ)η), and the covariance of consecutive variables is Cov(U_{t−1}, U_t) = γη; this implies that the lag-one correlation is ρ_1 = γ/(1 + γ). Meanwhile, the probability generating function (pgf) of U_t is Π_{U_t}(u) = e^{−η(1+γ)(1−u)}, and the joint pgf of {U_1, . . . , U_r} is symmetric in its arguments, which implies that time reversibility holds for the PMA. Al-Osh and Alzaid (1988) note that the total count T_{U,r} = Σ_{t=1}^{r} U_t does not have a Poisson distribution, in contrast to the standard MA(1) process. The conditional mean and variance of U_{t+1} given U_t = u are both linear in u. The PMA is a natural choice for modeling an integer-valued process, in part because of its tractability (Al-Osh and Alzaid 1988). This model, however, is limited by its constraining equi-dispersion property, i.e.
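As a concrete check of these properties, the following short simulation (a sketch; the parameter values η = 3 and γ = 0.4 are illustrative) generates a PMA(1) path via binomial thinning and examines the equi-dispersed Poisson(η(1 + γ)) marginal and the lag-one correlation γ/(1 + γ):

```python
import math
import random

random.seed(1)

def rpois(eta):
    """Poisson(eta) draw via Knuth's product-of-uniforms method."""
    L, k, p = math.exp(-eta), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

def simulate_pma1(eta, gamma, T):
    """Simulate U_t = gamma o eps_{t-1} + eps_t, where the thinning
    gamma o eps counts the Bernoulli(gamma) 'survivors' among eps events."""
    eps = [rpois(eta) for _ in range(T + 1)]
    u = []
    for t in range(1, T + 1):
        thinned = sum(random.random() < gamma for _ in range(eps[t - 1]))
        u.append(thinned + eps[t])
    return u

u = simulate_pma1(eta=3.0, gamma=0.4, T=100_000)
mean = sum(u) / len(u)
var = sum((x - mean) ** 2 for x in u) / len(u)
cov = sum((a - mean) * (b - mean) for a, b in zip(u, u[1:])) / (len(u) - 1)
# Theory: marginal is Poisson(eta*(1+gamma)) = Poisson(4.2), so the sample
# mean and variance should agree (equi-dispersion), and the lag-one
# correlation cov/var should sit near gamma/(1+gamma) = 2/7.
print(round(mean, 2), round(var, 2), round(cov / var, 3))
```

The sample mean and variance land near 4.2 and the lag-one correlation near 2/7 ≈ 0.286, matching the stated moments.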
the assumption that the mean and variance of the underlying process are equal. Real data do not generally conform to this construct (Hilbe 2014; Weiss 2018); they usually display over-dispersion relative to the Poisson model (i.e. the variance is greater than the mean), although integer-valued data that express under-dispersion relative to the Poisson (i.e. the variance is less than the mean) are surfacing with greater frequency. Accordingly, it would be fruitful to instead consider a flexible time series model that can accommodate data over- and/or under-dispersion. Alzaid and Al-Osh (1993) introduce a first-order generalized Poisson moving average (GPMA(1)) process as an alternative to the PMA. The associated model has the form W_t = Q*_t(ε*_{t−1}) + ε*_t, where {ε*_t} is a sequence of iid generalized Poisson GP(μ*, θ) random variables, and {Q*_t(·)} is a sequence of quasi-binomial QB(p*, θ/μ*, ·) random operators independent of {ε*_t}. As with the PMA, W_{t+r} and W_t are independent for |r| > 1. The marginal distribution of W_t is GP((1 + p*)μ*, θ). Recognizing the relationship between moving average and autoregressive models, Alzaid and Al-Osh (1993) equate terms in this GPMA(1) model to their first-order generalized Poisson autoregressive (GPAR(1)) counterpart, in which {ε_t} is a sequence of iid GP(qμ, θ) random variables with q = 1 − p, and {Q_t(·)} is a sequence of QB(p, θ/μ, ·) random operators, independent of {ε_t}; i.e., they let μ = (1 + p*)μ* and p = p*/(1 + p*). The bivariate pgf of W_{t+1} and W_t can then be expressed in terms of the function A_θ, where A_θ(s) is the inverse function satisfying A_θ(s e^{−θ(s−1)}) = s; see Alzaid and Al-Osh (1993). This substitution further illustrates the relationship between the GPMA(1) and GPAR(1) models, namely that they share the same bivariate pgf. This equivalence and the related GPAR work of Alzaid and Al-Osh (1993) likewise show that the joint pgf of (W_t, W_{t−1}, . . .
, W_{t−r+1}) is available in closed form (Alzaid and Al-Osh 1993). From the joint pgf, we see that the GPMA(1) is also time-reversible, i.e. it has the same dynamics when time is reversed. Further, Alzaid and Al-Osh (1993) derive the pgf associated with the total count occurring during a time lag of r units, noting that this result extends the analogous PMA result to the broader GPMA(1) model; they likewise supply the GPMA autocorrelation function. Even though the GPMA can be considered to model over- or under-dispersed count time series, it may not be a viable option for count data that express extreme under-dispersion; see, e.g. Famoye (1993). This work instead introduces another alternative for modeling integer-valued time series data. The remainder of the manuscript proceeds as follows. We first provide background regarding the probability distributions that motivate the development of our flexible INMA model. Then, we introduce the SCMPMA(1) model and discuss its statistical properties. The subsequent section illustrates the model's flexibility through simulated and real data examples. Finally, the manuscript concludes with a discussion.

Motivating distributions
While the above constructs show improved ability to model integer-valued time series data with various forms of dispersion, each of these models suffers from respective limitations. In order to develop and describe our SCMPMA(1) model, we first introduce its underlying motivating distributions: the CMP distribution and its generalized sum-of-CMPs (sCMP) distribution, as well as the Conway-Maxwell-Binomial (CMB) distribution along with a generalized CMB (gCMB) distribution.

The Conway-Maxwell-Poisson distribution and its generalization
The Conway-Maxwell-Poisson (CMP) distribution (introduced by Conway and Maxwell (1962), and revived by Shmueli et al. (2005)) is a viable count distribution that generalizes the Poisson distribution in light of potential data dispersion. The CMP probability mass function (pmf) for a random variable X takes the form

P(X = x) = λ^x / [(x!)^ν ζ(λ, ν)],  x = 0, 1, 2, . . . ,

where λ = E(X^ν) ≥ 0, ν ≥ 0 is the associated dispersion parameter, and ζ(λ, ν) = Σ_{s=0}^∞ λ^s/(s!)^ν is the normalizing constant. The CMP distribution includes three well-known distributions as special cases, namely the Poisson (ν = 1), geometric (ν = 0, λ < 1), and Bernoulli (ν → ∞, with success probability λ/(1 + λ)) distributions. The associated pgf of X is Π_X(u) = E(u^X) = ζ(λu, ν)/ζ(λ, ν), and its moment generating function is M_X(t) = E(e^{tX}) = ζ(λe^t, ν)/ζ(λ, ν). The moments can meanwhile be represented recursively as

E(X^{r+1}) = λ E[(X + 1)^{1−ν}] for r = 0, and
E(X^{r+1}) = λ (∂/∂λ) E(X^r) + E(X) E(X^r) for r > 0.

In particular, the expected value and variance can be written in the form

E(X) = ∂ log ζ(λ, ν)/∂ log λ,   (13)
V(X) = ∂² log ζ(λ, ν)/∂(log λ)²,   (14)

and approximated respectively as E(X) ≈ λ^{1/ν} − (ν − 1)/(2ν) and V(X) ≈ λ^{1/ν}/ν, where the approximations are especially good for ν ≤ 1 or λ > 10^ν (Shmueli et al. 2005). This distribution is a member of the exponential family, where Σ_{i=1}^n x_i and Σ_{i=1}^n log(x_i!) from the joint pmf of a random sample are joint sufficient statistics for λ and ν; its membership in the exponential family further implies that a conjugate prior family exists. For X_1, . . . , X_n iid CMP(λ, ν) random variables, we say that X* = Σ_{i=1}^n X_i is distributed as a sum-of-CMPs [denoted sCMP(λ, ν, n)] random variable, and has the pmf

P(X* = x*) = λ^{x*} / [(x*!)^ν ζ^n(λ, ν)] Σ_{a_1+···+a_n = x*} (x* choose a_1, · · · , a_n)^ν,  x* = 0, 1, 2, . . . ,

where ζ^n(λ, ν) is the nth power of ζ(λ, ν), and (x* choose a_1, · · · , a_n) = x*!/(a_1! · · · a_n!) is a multinomial coefficient. The sCMP(λ, ν, n) distribution encompasses the Poisson distribution with rate parameter nλ (for ν = 1), the negative binomial(n, 1 − λ) distribution (for ν = 0 and λ < 1), and the Binomial(n, p) distribution as ν → ∞ with success probability p = λ/(λ + 1) as special cases. Further, for n = 1, the sCMP(λ, ν, n = 1) is simply the CMP(λ, ν) distribution.
The mgf and pgf for a sCMP(λ, ν, n) random variable X* are M_{X*}(t) = [ζ(λe^t, ν)/ζ(λ, ν)]^n and Π_{X*}(u) = [ζ(λu, ν)/ζ(λ, ν)]^n, respectively; accordingly, the sCMP(λ, ν, n) distribution has mean E(X*) = nE(X) and variance V(X*) = nV(X), where E(X) and V(X) are defined in Eqs. (13)-(14), respectively. Closure under addition holds for two independent sCMP distributions with the same rate and dispersion parameters. See Sellers et al. (2017) for additional information regarding the sCMP distribution.
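The CMP and sCMP quantities above are straightforward to compute numerically. The sketch below (illustrative Python, with the infinite series ζ(λ, ν) truncated at K terms) builds the CMP pmf via a stable recursion and obtains the sCMP pmf as an n-fold convolution, confirming the Poisson special case and the relation E(X*) = nE(X):

```python
import math

def cmp_weights(lam, nu, K):
    """Terms lam^s / (s!)^nu for s = 0..K-1, built by the stable
    recursion w_s = w_{s-1} * lam / s**nu (avoids huge factorials)."""
    w = [1.0]
    for s in range(1, K):
        w.append(w[-1] * lam / s ** nu)
    return w

def cmp_pmf(lam, nu, K=200):
    """CMP(lam, nu) pmf on {0,...,K-1}; K truncates zeta(lam, nu)."""
    w = cmp_weights(lam, nu, K)
    Z = sum(w)  # truncated normalizing constant zeta(lam, nu)
    return [wi / Z for wi in w]

def scmp_pmf(lam, nu, n, K=200):
    """sCMP(lam, nu, n) pmf as the n-fold convolution of CMP(lam, nu)."""
    base = cmp_pmf(lam, nu, K)
    out = base
    for _ in range(n - 1):
        new = [0.0] * K
        for i, a in enumerate(out):
            for j, b in enumerate(base):
                if i + j < K:
                    new[i + j] += a * b
        out = new
    return out

def mean_of(pmf):
    return sum(s * p for s, p in enumerate(pmf))

# Special cases: nu = 1 recovers Poisson(lam), and the sCMP with nu = 1
# is Poisson(n*lam), so the means satisfy E(X*) = n E(X).
p1 = cmp_pmf(2.0, 1.0)        # CMP(2, 1) = Poisson(2)
p3 = scmp_pmf(2.0, 1.0, 3)    # sCMP(2, 1, 3) = Poisson(6)
```

The geometric special case (ν = 0, λ < 1) can be verified the same way, e.g. `cmp_pmf(0.5, 0.0, 400)` matches (1 − λ)λ^s.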


The Conway-Maxwell-Binomial distribution and its generalization
The Conway-Maxwell-Binomial (CMB) distribution of Kadane (2016) (also known as the Conway-Maxwell-Poisson-Binomial distribution of Borges et al. (2014)) is a three-parameter generalization of the Binomial distribution. For a CMB(d, p, ν) distributed random variable Y, the pmf is

P(Y = y) = (d choose y)^ν p^y (1 − p)^{d−y} / S(p, ν),  y = 0, 1, . . . , d,

where S(p, ν) = Σ_{j=0}^d (d choose j)^ν p^j (1 − p)^{d−j} is the normalizing constant. For ν → ∞, the pmf concentrates its mass at the center of the support while, for ν → −∞, the pmf is concentrated at 0 or d.
The CMB distribution is a member of the exponential family, where Σ_{i=1}^N y_i and Σ_{i=1}^N log (d choose y_i) from the joint pmf of the random sample y = {y_1, . . . , y_N} are the joint sufficient statistics for p and ν. Further, its existence as a member of the exponential family implies that a conjugate prior family exists. Sellers et al. (2017) further introduce a generalized Conway-Maxwell-Binomial (gCMB) distribution for a random variable Z with parameters (p, ν, s, n_1, n_2). As with the conditional probability of a CMP random variable given the sum of it and another independent CMP random variable sharing the same dispersion parameter, a special case of the gCMB distribution can be derived as the conditional distribution of X*_1 given the sum X*_1 + X*_2 = d for independent sCMP random variables X*_i ∼ sCMP(λ_i, ν, n_i), i = 1, 2; the resulting distribution is analogously a gCMB(λ_1/(λ_1 + λ_2), ν, d, n_1, n_2) distribution. The gCMB distribution contains several special cases, including the CMB(d, p, ν) distribution (for n_1 = n_2 = 1); the Binomial(d, p) distribution (when n_1 = n_2 = 1 and ν = 1); and, for λ_1 = λ_2 = λ, the hypergeometric distribution when ν → ∞ and the negative hypergeometric distribution when ν = 0 and λ < 1.
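For intuition, the CMB pmf can be computed directly. The following sketch assumes the weight form C(d, y)^ν p^y (1 − p)^{d−y} implied by the CMP conditional construction above; the parameter values are illustrative:

```python
import math

def cmb_pmf(d, p, nu):
    """CMB(d, p, nu) pmf: weights C(d,y)^nu * p^y * (1-p)^(d-y),
    normalized over y = 0..d."""
    w = [math.comb(d, y) ** nu * p ** y * (1 - p) ** (d - y)
         for y in range(d + 1)]
    S = sum(w)  # normalizing constant S(p, nu)
    return [wi / S for wi in w]

# nu = 1 recovers the Binomial(d, p) pmf exactly.
b = cmb_pmf(6, 0.5, 1.0)
# A large nu concentrates the mass centrally (here y = 3 for d = 6).
c = cmb_pmf(6, 0.5, 40.0)
```

With d = 6 and p = 1/2, the ν = 40 case places nearly all of its probability on y = 3, illustrating the ν → ∞ concentration.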

First-order sCMP time series models
This section highlights two first-order models for discrete time series data that have a sCMP marginal distribution, namely the first-order sCMP autoregressive (SCMPAR(1)) model, and a first-order SCMP moving average (SCMPMA(1)) model with the same marginal distribution structure.
First-order sCMP autoregressive (SCMPAR(1)) model
Sellers et al. (2020) introduce a first-order sCMP autoregressive (SCMPAR(1)) model to describe count data correlated in time that express over- or under-dispersion. Based on the sCMP and gCMB distributions, respectively (as described in the "Motivating distributions" section with more detail available in Sellers et al. (2017)), we use the sCMP distribution to model the marginals of the first-order integer-valued autoregressive (INAR(1)) process as

X_t = C_t(X_{t−1}) + ε_t,

where ε_t ∼ sCMP(λ, ν, n_2), and {C_t(•) : t = 1, 2, . . .} is a sequence of independent gCMB(1/2, ν, •, n_1, n_2) operators, independent of {ε_t}. This flexible INAR(1) model contains the first-order Poisson autoregressive (PAR(1)) model as described in several references (Al-Osh and Alzaid 1987; McKenzie 1988; Weiss 2008), and the first-order binomial autoregressive model of Al-Osh and Alzaid (1991) as special cases. It likewise contains an INAR(1) model that allows for negative binomial marginals with a thinning operator whose pmf is negative hypergeometric.

Introducing the SCMPMA(1) model
Motivated by the SCMPAR(1) model of Sellers et al. (2020), we introduce a first-order sum-of-CMPs moving average (SCMPMA(1)) process X_t defined by

X_t = C*_t(ε*_{t−1}) + ε*_t,   (21)

where {ε*_t} is a sequence of iid sCMP(λ, ν, m_1 + m_2) random variables and {C*_t(•)} is a sequence of independent gCMB(1/2, ν, •, m_1, m_2) operators, independent of {ε*_t}. By definition, X_t is a stationary process with the sCMP(λ, ν, 2m_1 + m_2) distribution, and X_{t+r} and X_t are independent for |r| > 1. While this model can analogously be viewed as a special case of the infinitely divisible convolution-closed class of discrete MA models (Joe 1996), unlike the SCMPAR(1) process, the SCMPMA(1) process is not Markovian.
The autocorrelation between X_t and X_{t+1} follows by noting that C*_{t+1}(ε*_t) and ε*_t are sCMP(λ, ν, m_1) and sCMP(λ, ν, m_1 + m_2) random variables, respectively; i.e. each sCMP random variable can be viewed as a respective sum of iid CMP(λ, ν) random variables Y_i. Thus, Cov(X_t, X_{t+1}) = Cov(ε*_t, C*_{t+1}(ε*_t)) = m_1 Var(Y), where, without loss of generality, we let Y denote any of the iid Y_i random variables. Meanwhile, because {X_t} is a sCMP(λ, ν, 2m_1 + m_2) distributed stationary process, we can likewise represent Var(X_t) = (2m_1 + m_2) Var(Y) for all t. We therefore find that

ρ_1 = Corr(X_t, X_{t+1}) = m_1 / (2m_1 + m_2).   (22)

Because m_1, m_2 ≥ 1, the one-step range of possible correlation values is 0 < ρ_1 < 0.5. In particular, for m_1 = m_2, we have the special case where ρ_1 = 1/3. Meanwhile, ρ_k = 0 for all k > 1 because, by definition of the SCMPMA(1) model assumptions, there is no dependence structure between X_t and X_{t+r} for |r| > 1. Recall from "The Conway-Maxwell-Poisson distribution and its generalization" section that G_π(w) = [ζ(λw, ν)/ζ(λ, ν)]^π is the pgf for a sCMP(λ, ν, π) distributed random variable, (say) G. Using this knowledge along with Eq. (21), the joint pgf can be derived as

Π(u, v) = G_{m_1+m_2}(u) G_{m_1+m_2}(v) G_{m_1}(uv),   (23)

where Eq. (23) is equivalent to Eq. (20) (i.e. the SCMPMA(1) process is comparable to the SCMPAR(1) process) when m_1 = n_1 = n_2 − m_2. Given this comparison, we can easily determine the conditional mean E(X_{t+1} | X_t = x) and conditional variance Var(X_{t+1} | X_t = x). Eq. (23) further demonstrates that the SCMPMA(1) model is time-reversible, since the joint pgf is symmetric in u and v. Parameter estimation via maximum likelihood (ML) is a difficult task with INMA models given the complex form of the underlying distributions. Even a conditional least squares approach does not appear to be feasible "because of the thinning operators, unless randomization is used" (Brännäs and Hall 2001). We therefore instead consider the following ad hoc procedure for parameter estimation.
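The component-wise view used in the correlation derivation also suggests a simple way to simulate the process: represent each ε*_t as m_1 + m_2 iid CMP(λ, ν) components, so that the gCMB survival operator applied to ε*_{t−1} matches, in distribution (jointly with ε*_{t−1}), the sum of its first m_1 components. The sketch below (illustrative Python; the parameter values are arbitrary) uses this representation and checks the lag-one correlation m_1/(2m_1 + m_2) empirically:

```python
import random

def make_cmp_sampler(lam, nu, K=400, seed=None):
    """Inverse-CDF sampler for CMP(lam, nu), truncating the support at K."""
    rng = random.Random(seed)
    w = [1.0]
    for s in range(1, K):
        w.append(w[-1] * lam / s ** nu)   # lam^s / (s!)^nu, recursively
    Z = sum(w)
    cdf, acc = [], 0.0
    for wi in w:
        acc += wi / Z
        cdf.append(acc)
    def draw():
        u = rng.random()
        for s, F in enumerate(cdf):
            if u <= F:
                return s
        return K - 1
    return draw

def simulate_scmpma1(lam, nu, m1, m2, T, seed=7):
    """Simulate the SCMPMA(1) via its iid-CMP component representation:
    eps*_t is a sum of m1+m2 iid CMP(lam, nu) components, and the gCMB
    survival of eps*_{t-1} equals (in distribution, jointly with it) the
    sum of the first m1 components."""
    draw = make_cmp_sampler(lam, nu, seed=seed)
    comps = [[draw() for _ in range(m1 + m2)] for _ in range(T + 1)]
    return [sum(comps[t][:m1]) + sum(comps[t + 1]) for t in range(T)]

x = simulate_scmpma1(lam=1.5, nu=1.2, m1=1, m2=1, T=100_000)
mean = sum(x) / len(x)
num = sum((a - mean) * (b - mean) for a, b in zip(x, x[1:]))
den = sum((a - mean) ** 2 for a in x)
rho1 = num / den  # should fall near m1/(2*m1 + m2) = 1/3
```

With m_1 = m_2 = 1, the empirical lag-one correlation lands near 1/3, as Eq. (22) indicates.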
Given a data set with an observed lag-one correlation ρ̂_1, we first propose values for m_1, m_2 ∈ N that satisfy the constraint ρ̂_1 ≈ m_1/(2m_1 + m_2). Given m_1 and m_2, and recognizing that X_t is stationary with a sCMP(λ, ν, 2m_1 + m_2) distribution, we proceed with ML estimation to determine λ̂ and ν̂ as described in Zhu et al. (2017) for conducting sCMP(λ, ν, s = 2m_1 + m_2) parameter estimation with regard to a CMP process over an interval of length s ≥ 1. The corresponding variation for λ̂ and ν̂ can be quantified via the Fisher information matrix or nonparametric bootstrapping. While the sampling distribution for λ̂ is approximately symmetric, the sampling distribution for ν̂ is considerably right-skewed; hence analysts are advised to quantify estimator variation via nonparametric bootstrapping. While this is a means to an end, it only determines an appropriate distributional form for the data; it does not fully address the time series structure of the data.
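This two-step ad hoc procedure can be sketched as follows (illustrative Python: the coarse parameter grids stand in for a proper numerical optimizer, and the simulated Poisson data merely provide a sanity check on the ν = 1 special case):

```python
import math
import random

def cmp_pmf(lam, nu, K=80):
    """Truncated CMP(lam, nu) pmf via the recursion w_s = w_{s-1}*lam/s^nu."""
    w = [1.0]
    for s in range(1, K):
        w.append(w[-1] * lam / s ** nu)
    Z = sum(w)
    return [wi / Z for wi in w]

def scmp_pmf(lam, nu, n, K=80):
    """sCMP(lam, nu, n) pmf by n-fold convolution of CMP(lam, nu)."""
    base = cmp_pmf(lam, nu, K)
    out = base
    for _ in range(n - 1):
        new = [0.0] * K
        for i, a in enumerate(out):
            for j, b in enumerate(base):
                if i + j < K:
                    new[i + j] += a * b
        out = new
    return out

def choose_orders(rho_hat, max_m=4):
    """Step 1: pick (m1, m2) whose implied rho_1 = m1/(2*m1+m2) is
    closest to the observed lag-one correlation."""
    return min(((m1, m2) for m1 in range(1, max_m + 1)
                for m2 in range(1, max_m + 1)),
               key=lambda m: abs(m[0] / (2 * m[0] + m[1]) - rho_hat))

def fit_marginal(data, m1, m2, lam_grid, nu_grid):
    """Step 2: grid-search ML for (lam, nu) under the stationary
    sCMP(lam, nu, 2*m1+m2) marginal distribution."""
    best = None
    for lam in lam_grid:
        for nu in nu_grid:
            pmf = scmp_pmf(lam, nu, 2 * m1 + m2)
            ll = sum(math.log(max(pmf[v], 1e-300)) for v in data)
            if best is None or ll > best[0]:
                best = (ll, lam, nu)
    return best

# Sanity check: sCMP(1, 1, 3) is Poisson(3); draw data from it by
# inverse-CDF sampling and confirm the two-step fit recovers the truth.
random.seed(0)
pmf = scmp_pmf(1.0, 1.0, 3)
cdf, acc = [], 0.0
for pv in pmf:
    acc += pv
    cdf.append(acc)
def draw():
    u = random.random()
    for s, F in enumerate(cdf):
        if u <= F:
            return s
    return len(cdf) - 1
data = [draw() for _ in range(1500)]
m1, m2 = choose_orders(0.33)          # rho_hat near 1/3 -> m1 = m2 = 1
ll, lam_hat, nu_hat = fit_marginal(data, m1, m2,
                                   lam_grid=[0.5, 1.0, 2.0],
                                   nu_grid=[0.3, 1.0, 3.0])
```

As the text cautions, this only recovers the marginal distributional form; the fitted (λ̂, ν̂) say nothing further about the serial dependence beyond the chosen (m_1, m_2).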

Data examples
To illustrate the flexibility of our INMA model, we consider various data simulations and a real data example. The subsections below contain the respective details and associated commentary. Table 1 reports the estimated mean, variance, and autocorrelation that result from the simulations. For the special cases (i.e. ν = 0, 1, 35, where ν = 35 sufficiently represents performance as ν → ∞), we likewise provide the expected/true mean and variance. Along with the estimated autocorrelation ρ̂ that results from the data, the table reports the true autocorrelation ρ_1 = m_1/(2m_1 + m_2) for all {m_1, m_2} and any ν (Eq. (22)). For all examples, we find that the associated mean and variance compare with each other as expected, i.e. the variance is greater than the mean when ν < 1 (i.e. the data are over-dispersed), the variance and mean are approximately equal when ν = 1 (i.e. equi-dispersion holds), and the variance is less than the mean (i.e. the data are under-dispersed) when ν > 1. In particular, we can easily verify that the three special case models perform as expected. For the Poisson cases (ν = 1), we expect the mean and variance to both equal (2m_1 + m_2)λ; the binomial cases (i.e. ν → ∞ and p = λ/(λ + 1)) produce a mean equal to (2m_1 + m_2)λ/(λ + 1) and variance equaling (2m_1 + m_2)[λ/(λ + 1)][1 − λ/(λ + 1)]; and the negative binomial cases (ν = 0 with p = 1 − λ) have a mean of (2m_1 + m_2)λ/(1 − λ) and variance equaling (2m_1 + m_2)λ/(1 − λ)². In fact, even with the ν → ∞ case approximated by letting ν = 35, we still obtain reasonable estimates for the mean and variance for all of the associated cases of m_1 and m_2.

Simulated data examples
For each {m_1, m_2} pair, the mean and variance both decrease as ν increases while, for all of the considered examples, we obtain estimated correlation values ρ̂ that approximately equal the true correlation ρ_1. In particular, for those cases where m_1 = m_2, we obtain ρ̂ ≈ 1/3 as expected (see Eq. (22)).

Real data example
For the real data application, an earlier analysis (2007) considers a PAR(1) model, noting that "the empirical partial autocorrelation function indicates that a first order [autoregressive] model may be an appropriate choice" with ρ̂_1 = 0.292; Sellers et al. (2020), following suit, consider a SCMPAR(1) model as a flexible alternative to the PAR(1) model. The ACF and PACF plots of these data, however, do not clearly distinguish between considering a first-order autoregressive or a moving average model; see Fig. 1a-b. Further, recognizing that the data express apparent under- to equi-dispersion, we therefore consider the SCMPMA(1) as an illustrative model for analysis. We perform ML estimation assuming various combinations for (m_1, m_2) (i.e. {(1, 1), (1, 2), (2, 2)}) as these values contain the observed correlation, 0.25 = 1/4 < ρ̂_1 < 1/3 ≈ 0.33. Table 2 contains the resulting parameter estimates for λ and ν, along with the respective Akaike Information Criterion (AIC) values. While the SCMPMA(1) model with m_1 = m_2 = 2 has the lowest AIC among the models considered, all of these models produce approximately equal AIC values (i.e. 695.2), where increasing m_1 and m_2 values associate with decreasing λ̂ and increasing ν̂. This makes sense because the resulting estimates rely solely on the assumed underlying sCMP(λ, ν, 2m_1 + m_2) distributional form for the data.
The dispersion estimates in Table 2 are all greater than 1, thus implying a perceived level of data under-dispersion. These results naturally stem from the reported mean of the data (1.286) being greater than its corresponding variance (1.205). Their associated 95% confidence intervals (determined via nonparametric bootstrapping; also supplied in Table 2), however, are sufficiently wide that they contain ν = 1. This suggests that the apparent data under-dispersion is not statistically significant, thus instead suggesting that the data can be analyzed via the Al-Osh and Alzaid (1988) PMA(1) model. It is further striking to see that the respective 95% confidence intervals associated with the dispersion parameter widen with the size of the underlying sCMP(λ, ν, 2m_1 + m_2) model. This is an artifact of the (s)CMP distribution, namely that the sampling distribution of ν̂ is right-skewed (as discussed in Zhu et al. (2017)). This approach thus confirms interest in the PMA(1) model.

Discussion
This work utilizes the sCMP distribution of Sellers et al. (2017) to develop a SCMPMA(1) model that serves as a flexible moving average time series model for discrete data where data dispersion is present. The SCMPMA(1) model captures the PMA(1), as well as versions of a negative binomial and a binomial MA(1) structure, respectively, as special cases. This model, along with the flexible SCMPAR(1), can further be used to derive broader autoregressive moving average (ARMA) and autoregressive integrated moving average (ARIMA) models based on the sCMP distribution. The SCMPMA(1) shares many properties with the analogous SCMPAR(1) model of Sellers et al. (2020). The presented models rely on predefining discrete values (i.e. m_1, m_2 for the SCMPMA(1)) for parameter estimation. As done in Sellers et al. (2017) and Sellers and Young (2019), we utilize a profile likelihood approach where, given m_1 and m_2, we estimate the remaining model coefficients and then identify the collection of parameter estimates that produces the largest likelihood, thus identifying these parameter estimates as the MLEs. While this profile likelihood approach is acceptable as demonstrated in other applications, directly estimating m_1 and m_2 along with the other SCMPMA(1) model estimates would likewise prove beneficial, as would redefining the model to allow for real-valued analogs of m_1 and m_2. These generalizations and estimation approaches can be explored in future work.
Simulated data examples illustrate that the SCMPMA(1) model can obtain unbiased estimates, and the model demonstrates potential for accurate forecasts given data containing any measure of dispersion. The real data illustration, however, highlights the complexities that come with parameter estimation. While we nonetheless present a means toward achieving this goal, this approach does not perform especially strongly with regard to prediction and forecasting. It nonetheless serves as a starting point for parameter estimation that we will continue to investigate in future work. Moreover, the flexibility of the SCMPMA(1) aids in determining a parsimonious model form as appropriate.