2.1 Application of the generalized beta to sea ice extents
As mentioned above, while Arctic sea ice is declining overall, there are considerable regional differences in trends and in seasonal ice extent. The regions denoted here as "core" Arctic sea regions (namely the central Arctic Ocean, the Canadian Archipelago, and Hudson Bay) retain partial ice cover throughout the year, and show complete ice cover in winter months. For example, in 2011, the central Arctic had readings of 7.158 mn km² throughout January to March, while the Canadian Archipelago had readings of 0.751 mn km² for January through to April. Figure 2 shows monthly extent totals for the central Arctic Ocean during 2007-11 and illustrates maximum inflation in winter. The trend in these three regions is similar to that in the Arctic Ocean considered as an aggregate, namely stronger declines in summer extent.
A time series representation needs to express the annual reversion to complete cover (maximum recurrence) in winter months, together with the irregular trend for declining extent in non-winter months. The application of a generalized form of the beta density is motivated by the fact that the observed ice extents y can be seen as ratios r = y/d to a maximum possible extent d, though substantive interest is in trends in extents y. It is important that the bounded nature of the response is included in any model. Another possibility might be some form of truncated sampling mechanism (e.g. a log-normal for extent readings y with ceiling d) but this precludes any analysis of the factors producing seasonal extremes.
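Consider the beta distribution on (0,1), with density function given by
$$p(z \mid a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\, z^{a-1}(1-z)^{b-1}, \qquad 0 < z < 1,$$
with $a > 0$, $b > 0$. An alternative representation (Ospina and Ferrari 2010) involves mean and precision parameters $(\mu,\phi)$, where $a = \mu\phi$, $b = (1-\mu)\phi$, namely
$$p(z \mid \mu, \phi) = \frac{\Gamma(\phi)}{\Gamma(\mu\phi)\,\Gamma((1-\mu)\phi)}\, z^{\mu\phi-1}(1-z)^{(1-\mu)\phi-1},$$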
with $\phi > 0$, and $0 < \mu < 1$. This form facilitates separate modelling of mean and variance trends (Huang and Oosterlee 2008). Because the beta density is continuous, it assigns zero probability mass to the extreme values 0 and 1 for any $(a,b)$, so zero-inflated or one-inflated versions of the beta need to be applied when such values are observed (Ospina and Ferrari 2010). Let $g = 0$ or $1$; then inflation at either boundary is achieved by the mechanism
$$p(z \mid \alpha_g, \mu, \phi) = \begin{cases} \alpha_g, & z = g, \\ (1-\alpha_g)\,\mathrm{Beta}(z \mid \mu\phi, (1-\mu)\phi), & 0 < z < 1, \end{cases}$$
where $\alpha_g$ is an inflation probability.
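As a minimal sketch of the mean-precision form and the boundary-inflation mechanism just described, the following Python fragment evaluates the mixed density. The function names and the numerical integration helper are hypothetical, introduced only for illustration, and the parameter values used in any call are not taken from the paper.

```python
import math

def beta_logpdf_mean_precision(z, mu, phi):
    """Log density of Beta(a, b) at z in (0, 1), with a = mu*phi and b = (1-mu)*phi."""
    a, b = mu * phi, (1.0 - mu) * phi
    log_norm = math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1.0) * math.log(z) + (b - 1.0) * math.log1p(-z)

def inflated_beta_pdf(z, g, alpha_g, mu, phi):
    """Point mass alpha_g at the boundary g (0 or 1); scaled beta density on (0, 1)."""
    if z == g:
        return alpha_g
    if 0.0 < z < 1.0:
        return (1.0 - alpha_g) * math.exp(beta_logpdf_mean_precision(z, mu, phi))
    return 0.0

def midpoint_integral(f, lo, hi, n=20000):
    """Midpoint-rule integral of f over (lo, hi); a hypothetical checking helper."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))
```

With an inflation probability of 0.25 at g = 1, the continuous part should integrate to about 0.75 over (0,1), which gives a quick check that the two components form a proper distribution.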
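The generalized beta is obtained by extending the support interval to an arbitrary bounded interval $(c,d)$ (with $c < d$) via a linear transformation $y = c + z(d - c)$ (Pham-Gia and Duong 1989), so that
$$p(y \mid a, b, c, d) = \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\, \frac{(y-c)^{a-1}(d-y)^{b-1}}{(d-c)^{a+b-1}}, \qquad c < y < d,$$
with equivalent representation
$$p(y \mid \mu, \phi, c, d) = \frac{\Gamma(\phi)}{\Gamma(\mu\phi)\,\Gamma((1-\mu)\phi)}\, \frac{(y-c)^{\mu\phi-1}(d-y)^{(1-\mu)\phi-1}}{(d-c)^{\phi-1}}, \qquad (1)$$
with mean $c + (d - c)\mu$, and variance $(d - c)^2\mu(1 - \mu)/(\phi + 1)$.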
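The stated mean and variance can be checked numerically. The sketch below, with hypothetical function names, evaluates the generalized beta density on $(c,d)$ in its mean-precision form and compares midpoint-rule moments with the closed-form expressions; it is an illustration under these assumptions rather than the paper's implementation.

```python
import math

def gen_beta_pdf(y, mu, phi, c, d):
    """Generalized beta density on (c, d) in mean-precision form; zero outside (c, d)."""
    if not (c < y < d):
        return 0.0
    a, b = mu * phi, (1.0 - mu) * phi
    log_norm = math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
    log_kernel = (a - 1.0) * math.log(y - c) + (b - 1.0) * math.log(d - y)
    return math.exp(log_norm + log_kernel - (phi - 1.0) * math.log(d - c))

def gen_beta_moments(mu, phi, c, d):
    """Closed-form mean and variance quoted in the text."""
    mean = c + (d - c) * mu
    var = (d - c) ** 2 * mu * (1.0 - mu) / (phi + 1.0)
    return mean, var

def numeric_moments(mu, phi, c, d, n=20000):
    """Midpoint-rule total mass, mean and variance of the density on (c, d)."""
    h = (d - c) / n
    ys = [c + (i + 0.5) * h for i in range(n)]
    ps = [gen_beta_pdf(y, mu, phi, c, d) for y in ys]
    total = h * sum(ps)
    mean = h * sum(y * p for y, p in zip(ys, ps))
    var = h * sum((y - mean) ** 2 * p for y, p in zip(ys, ps))
    return total, mean, var
```

For example, `numeric_moments(0.6, 8.0, 0.0, 7.158)` can be compared with `gen_beta_moments(0.6, 8.0, 0.0, 7.158)`; the values 0.6 and 8.0 are arbitrary illustrative choices, while 7.158 is the central Arctic maximum quoted in the text.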
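For the generalized beta in (1), inflation needs to be applied for values occurring at the boundary points, $y = c$ or $y = d$ (these may be termed minimum and maximum inflation respectively). The maximum inflated version of the generalized beta is particularly relevant to the sea ice application and has the form
$$p(y \mid \alpha_d, \mu, \phi, c, d) = \begin{cases} \alpha_d, & y = d, \\ (1-\alpha_d)\, p(y \mid \mu, \phi, c, d), & c < y < d, \end{cases}$$
where $p(y \mid \mu, \phi, c, d)$ is the generalized beta density in (1) and $\alpha_d$ is the probability of maximum inflation.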
In the generalized beta applied to core Arctic regions, $c = 0$ while $d$ is the maximum winter extent (namely $d_1 = 7.158$ mn km² for the central Arctic Ocean, $d_2 = 0.751$ mn km² for the Canadian Archipelago, and $d_3 = 1.233$ mn km² for Hudson Bay).
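To make the maximum-inflated generalized beta concrete, the following sketch simulates readings with a point mass at $d$ and evaluates the corresponding log-likelihood, with $c = 0$ by default as in the core-region application. The function names, the seed, and the values chosen for $\alpha_d$, $\mu$ and $\phi$ are assumptions for illustration; only $d = 7.158$ comes from the text.

```python
import math
import random

def max_inflated_loglik(ys, alpha_d, mu, phi, d, c=0.0):
    """Log-likelihood: log(alpha_d) when y == d, else log(1 - alpha_d) + log GB density."""
    a, b = mu * phi, (1.0 - mu) * phi
    log_norm = math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
    total = 0.0
    for y in ys:
        if y == d:
            total += math.log(alpha_d)
        else:
            total += (math.log1p(-alpha_d) + log_norm
                      + (a - 1.0) * math.log(y - c)
                      + (b - 1.0) * math.log(d - y)
                      - (phi - 1.0) * math.log(d - c))
    return total

def simulate_max_inflated(n, alpha_d, mu, phi, d, c=0.0, seed=0):
    """Draw n readings: exactly d with probability alpha_d, else a rescaled beta draw."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        if rng.random() < alpha_d:
            out.append(d)
        else:
            z = rng.betavariate(mu * phi, (1.0 - mu) * phi)
            out.append(c + z * (d - c))
    return out
```

In simulated data of this kind, the share of readings equal to $d$ estimates $\alpha_d$, and the log-likelihood is higher at an inflation probability near that share than at one far from it.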
While summer minimum extents in the central Arctic and Canadian Archipelago remain well in excess of zero, those in Hudson Bay are becoming relatively small, e.g. $y = 0.025$ mn km² in September 2010. This raises the possibility of needing to represent both maximum and minimum inflation in the generalized beta. This can be handled by the mechanism
$$p(y \mid \alpha_c, \alpha_d, \mu, \phi, c, d) = \begin{cases} \alpha_c, & y = c, \\ \alpha_d, & y = d, \\ (1-\alpha_c-\alpha_d)\, p(y \mid \mu, \phi, c, d), & c < y < d, \end{cases}$$
where the vector of probabilities $(\alpha_c, \alpha_d, 1 - \alpha_c - \alpha_d)$ should be modelled using a multinomial logistic.
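A multinomial logistic for the three probabilities can be sketched as below. The choice of the non-inflated category as baseline (linear predictor fixed at zero) and the names `eta_c` and `eta_d` for the two linear predictors are assumptions made for illustration; the text does not specify the parameterization.

```python
import math

def inflation_probabilities(eta_c, eta_d):
    """Map two linear predictors to (alpha_c, alpha_d, 1 - alpha_c - alpha_d).

    The non-inflated category is taken as baseline with linear predictor 0.
    Subtracting the largest predictor before exponentiating avoids overflow.
    """
    m = max(eta_c, eta_d, 0.0)
    e_c = math.exp(eta_c - m)
    e_d = math.exp(eta_d - m)
    e_0 = math.exp(-m)
    denom = e_c + e_d + e_0
    return e_c / denom, e_d / denom, e_0 / denom
```

This construction guarantees that the three probabilities are positive and sum to one for any real-valued predictors, which a pair of separate logit regressions for the minimum and maximum inflation probabilities would not.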
2.2 Generalized beta time series regression with maximum inflation for sea ice extents
Let $\{\mu_t, \phi_t, \alpha_{dt}\}$ denote the series of parameters underlying the $y_t$ series, $m = m_t$ represent the month that observation $t$ corresponds to, and $s = s_t$ represent the year corresponding to observation $t$ (e.g. $s = 2$ in 1980 for observations $t = 13,\ldots,24$ and $s = 31$ for observations $t = 361,\ldots,372$). A parsimonious time series model is sought (Ledolter and Abraham 1981), combining close fit with low predictive variability, especially for cross-validatory and out-of-sample predictions. As discussed in Section 4, these aspects of fit are assessed using a posterior predictive fit criterion (Laud and Ibrahim 1995). A parsimonious model for the level of the series is expressed by a logit regression in $\mu_t$,
$$\mathrm{logit}(\mu_t) = \Delta_{ms} + \vartheta_m,$$
where $\Delta_{ms}$ represents trend for each combination of month $m$ and year $s$, and $\vartheta_m$ represents seasonal effects.
Two options for the trend are considered. One option is a stochastic trend, with random variation around a central linear trend. To allow for steeper declines in some months, a discrete mixture is implemented via
(2)
with $\rho_m \sim \mathrm{Bern}(\pi_m)$ being binary indicators. With the constraint $\delta_{11} < \delta_{01}$, $\varphi_{1s}$ represents the stronger downward trend.
The other form of trend assumption (deterministic) involves a linear trend in years $s_t$ combined with a short term AR1 lag effect in extents $y_t$. The latter represents carry over between successive months; for example, if September extent is relatively low in a particular year, then October extent may also be relatively low. The linear trend may vary between months and by broad sub-period. For example, Comiso et al. (2008) report a stronger decline during 1996-2007 than 1979-96. Here we consider three sub-periods $p = 1,\ldots,3$ of 12 years, including out-of-sample years (2012-14), namely 1979-1990, 1991-2002, and 2003-2014. The trend parameter for a particular time point is chosen by a month-specific discrete mixture between guide linear trend parameters, specific to broad period, $\Gamma_{0p}$ and $\Gamma_{1p}$, with $\Gamma_{1p} < \Gamma_{0p}$. The AR1 lag effect is also taken to vary by month, with normal priors for each monthly lag parameter. Thus for month $m = m_t$, and year $s_t$,
where the linear trend $\gamma_{1m}$ for month $m = m_t$ in period $p$ is chosen using a discrete mixture
Remaining aspects of the model are applicable across different representations of trend. Seasonal (monthly) effects are represented by a Fourier series (Höhle and Paul 2008),
(3)
where $\omega = 2\pi/M$, with $M = 12$, and $J_1$ is the number of harmonics. To allow for changing precision it is assumed that
(4)
namely a linear trend (varying by month) in year units. For example, Stroeve et al. (2012) find evidence of increased variability in overall Arctic sea ice extent, especially in summer months.
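The Fourier representation of the seasonal effects in (3) can be sketched as follows. The sketch assumes the common form in which each harmonic $j = 1,\ldots,J_1$ contributes a sine and a cosine term in $j\omega m$; the coefficient names `sin_coefs` and `cos_coefs` are hypothetical and are not the paper's notation.

```python
import math

def seasonal_effect(m, sin_coefs, cos_coefs, M=12):
    """Seasonal effect for month m as a sum of J1 harmonics, with omega = 2*pi/M.

    sin_coefs[j-1] and cos_coefs[j-1] are the assumed coefficients of
    sin(j*omega*m) and cos(j*omega*m); J1 is the length of the coefficient lists.
    """
    omega = 2.0 * math.pi / M
    return sum(a * math.sin(j * omega * m) + b * math.cos(j * omega * m)
               for j, (a, b) in enumerate(zip(sin_coefs, cos_coefs), start=1))
```

Under this form the effects repeat with period $M = 12$ and sum to zero over the twelve months when $J_1 < 12$, so they describe deviations around the trend without requiring twelve free monthly parameters.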
To represent extreme data (complete winter coverage), a logit regression is used to model the probabilities $\alpha_{dt}$ of maximum inflation, with form
A trend element in the inflation probability is not included as it would be confounded with the trend model in the mean.
2.3 Other generalized beta applications
While the application here focuses on sea ice extent in a time series setting, the generalized inflated beta, with mechanisms or regressions for both extreme and non-extreme observations, has potential applications in other settings where the observations can be regarded as ratios $r_i = y_i/d_i$ of actual extents to a maximum extent $d_i$, but substantive interest is in the extents $y_i$. The extents may be expressed, inter alia, in spatial units (e.g. areas in millions of square kilometres) or time units (e.g. durations in hours). As an example with time extents, one might analyse hours with cloud cover $y_i$ in relation to daylight hours $d_i$, while a spatial application might consider desertified extents $y_i$ in relation to total area extents $d_i$.
Data of this form can be considered as a form of compositional data, and widely used methods (Aitchison and Egozcue 2005, Butler and Glasbey 2008) focus on the ratios $r_i$, or more specifically the log ratios. Spatial applications involving compositional data and zero inflation have been described by Leininger et al. (2013), but these also focus on the ratios.
However, for policy purposes, the interest may be in trends or patterns in the extents themselves (i.e. the y-data), rather than in the ratios, as is the case with the sea ice extents. Alternatively stated, "compositions provide information only about the relative magnitudes of the compositional components and so interpretations involving absolute values... cannot be justified" (Aitchison and Egozcue 2005, p. 839). Thus in the case of desertification (e.g. Zhao et al. 2010), the substantive focus may be on spreading desertification, implying analysis of desertified extents $y_i$. Some areas may be totally desertified with $y_i = d_i$ (maximum inflation). Regression modelling of desertification extents would then need to include a mechanism or regression describing maximum inflation, as would spatial forecasting (or interpolation) of desertified extents in situations where comprehensive assessment of desertification status is only available for some area units.
The generalized beta density with maximum or zero inflation might potentially be extended to Dirichlet density applications, and to generalized inflated Dirichlet densities parallel to equation (1), where there are more than two categories and where extreme observations can occur. For example, with three categories the observations would be $(y_{1i}, y_{2i}, y_{3i})$, with $y_{1i} + y_{2i} + y_{3i} = d_i$ and maximum inflation occurring when any $y_{ki} = d_i$. The inflation probabilities can be modelled using a multinomial logistic. For example, in the sea ice application, one may distinguish by sea-ice type (e.g. Fissel et al. 2011) between perennial multi-year sea ice and first-year ice, so that sub-region observations become $(y_1, y_2, y_3)$ for area covered by multi-year ice, area covered by first-year ice, and area without ice cover respectively.
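The bookkeeping for a three-category version can be sketched as below. The sketch assumes that the three extents sum to the total extent $d_i$ of the unit, which is how the sea-ice example (multi-year ice, first-year ice, no ice) is read here; the function name and the tolerance are hypothetical.

```python
def classify_composition(ys, d, tol=1e-9):
    """Return (ratios, k), where ratios are y_k / d and k is the 1-based index of a
    category at maximum inflation (y_k equal to d), or None if no category is.

    Assumes the category extents sum to the total extent d.
    """
    if abs(sum(ys) - d) > tol:
        raise ValueError("category extents are assumed to sum to d")
    ratios = [y / d for y in ys]
    inflated = [k for k, y in enumerate(ys, start=1) if abs(y - d) <= tol]
    return ratios, (inflated[0] if inflated else None)
```

For instance, a unit of total extent 1.233 with no ice cover at all would be recorded as `(0.0, 0.0, 1.233)` and flagged as maximum inflation in the third category, whereas a unit with all three extents positive would be handled by the continuous Dirichlet-type component.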