Alternative approaches for econometric modeling of panel data using mixture distributions
Judex Hyppolite^{1}
https://doi.org/10.1186/s4048801700663
© The Author(s) 2017
Received: 4 January 2017
Accepted: 11 July 2017
Published: 1 August 2017
Abstract
The economic researcher is sometimes confronted with panel datasets that come from a population made of a finite number of subpopulations. Within each subpopulation the individuals may also be heterogeneous according to some unobserved characteristics. A good understanding of the behavior of the observed individuals may then require the ability to identify the groups to which they belong and to study their behavior across groups and within groups. This may not be a complicated exercise when a group indicator variable is available in the dataset. However, such a variable may not be included in the dataset; as a result, the econometrician is forced to work with the marginal distribution of the observed response variable, which takes the form of a mixture distribution.
One can model a given response variable with a variety of mixture distributions. In this paper, I present several related mixture models. The most flexible one is an extension of the model by Kim et al. (2008) to the panel data setting.
I have reviewed the estimation of some of these models by the Expectation-Maximization (EM) algorithm. The intent is to exploit the nice convergence properties of this algorithm when it is difficult to find good starting values for a Newton-type algorithm. I have also discussed how to compare these models and ultimately identify the one that provides the best fit to the dataset under investigation. As an application I examine the investment behavior of U.S. manufacturing firms.
Keywords
Panel data; Mixture of distributions; Hidden Markov models; Heterogeneity
Introduction
To model the heterogeneity of economic agents I present a series of panel data mixture models of increasing flexibility and complexity, and show how they can be used to handle at least two types of heterogeneity: heterogeneity with respect to group membership, and heterogeneity with respect to within-group differences in individual characteristics. I have also reviewed the methods of estimation of some of these models via the Expectation-Maximization algorithm. The objective is to take advantage of the nice convergence properties of this algorithm when it is difficult to find good starting values for a Newton-type algorithm. I have also reviewed some statistical tests that can be used to choose the best model among those discussed in this paper.
Heterogeneity is an important problem faced by the statistician or the econometrician trying to infer the behavior of economic agents from available datasets. Economic decision makers are heterogeneous in their characteristics, and they usually operate in heterogeneous (different) environments. As a result, their behavior generates data whose distributions are sometimes difficult to approximate with traditional single-component econometric models. To deal with this problem, economists often divide their sample into groups using observed variables such as time (in time series) or other individual characteristics (in time series and longitudinal data). The groups obtained this way are usually static and may differ from alternative groups obtained using different observed variables.
While this strategy may allow researchers to draw some useful conclusions, it is less attractive than an approach that uses multiple characteristics to determine group membership. It is also less flexible than an approach that allows an individual to change group membership depending on the evolution of his characteristics and of the conditions that he is facing. Lastly, it is much less flexible than an approach that offers a unified (one-step) way to make inference about both group membership and behavior. Mixture of distribution models offer such flexibility. These models are justified not only in theory, because they offer a nice way to model heterogeneity, but also in practice, since they can be used to provide a semiparametric approximation to the nonstandard distributions of some economic variables at a reasonable cost (McLachlan and Peel 2000). Mixtures of distributions are in fact at the crossroads between parametric and nonparametric families of distributions. They are parametric because each component distribution usually belongs to a parametric family of distributions, and they are nonparametric because it is possible to provide a very good approximation to the distribution of some variables by increasing the number of components of the mixture (Fink 2007).
Among economic variables whose study can benefit from the application of mixture distributions, one can cite firms' investment, household consumption, money demand, household use of healthcare, etc. Finite mixture distributions are commonly used in econometrics, mainly in cross-sectional and time series analyses. Following Hamilton (1988), some versions of the hidden Markov models have been used extensively in macroeconometrics to model business cycle fluctuations under the name of Markov switching regression models. Nevertheless, applications of mixtures of distributions in the panel data setting appear to be limited. In many cases the panel dataset is treated almost the same way as a cross section. In some rare cases, as in Deb and Trivedi (2013), the dependence of the observations within each unit is modeled using individual-specific effects. However, if the panel dataset is viewed as a collection of time series, it is not difficult to extend the hidden Markov models used in time series analysis to the panel data setting. This is the point of view adopted in this paper and also by Asea and Blomberg (1998), Atman (2007) and Maruotti (2007). The most flexible models presented in this paper extend the time series model by Kim et al. (2008). I allow the Markov chains to be time-inhomogeneous and nonstationary, and I introduce within-group heterogeneity in the component distributions using the specification by Mundlak (1978). These models are closely related to those of Atman (2007) and Maruotti (2007). A related set of models applied to panel count data can be found in Trivedi and Hyppolite (2012).
The models
Several alternative mixture distributions can be used to model the bivariate process formed by an economic agent's decision and its group membership. In the following sections, nine such models are described, from the simplest to the most complex. All of the models are assumed to have two components, but the extension to more than two components is not difficult.
The models can be used to study several different economic phenomena, such as household consumption under financial constraints, firms' investment under financial constraints, household demand for money, household use of healthcare, etc. In what follows I use the example of investment choices under financial constraints to motivate the specifications.
A finite mixture model with constant mixing proportions (\(\mathcal {M}_{1}\))
This is a classical finite mixture of distributions with constant weights π and 1−π.
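For concreteness, here is one way to write \(\mathcal{M}_{1}\), assuming normal components whose means are linear in the covariates, as the notation \(\phi(x_{it}\beta,\sigma)\) used in the bootstrap test later in the paper suggests; this is an illustrative reconstruction, not a transcription of the paper's own equation.

$$f(y_{it}\mid x_{it};\boldsymbol{\theta})=\pi\,\frac{1}{\sigma_{1}}\phi\left(\frac{y_{it}-x_{it}\boldsymbol{\beta}_{1}}{\sigma_{1}}\right)+(1-\pi)\,\frac{1}{\sigma_{2}}\phi\left(\frac{y_{it}-x_{it}\boldsymbol{\beta}_{2}}{\sigma_{2}}\right),\qquad \boldsymbol{\theta}=\left(\pi,\boldsymbol{\beta}_{1},\sigma_{1},\boldsymbol{\beta}_{2},\sigma_{2}\right)$$

where \(\phi\) is the standard normal density. With the latent indicator \(w_{it}\) equal to 1 if observation \((i,t)\) comes from component 1 and 0 otherwise, the corresponding complete-data log-likelihood would be

$$\log L^{c}(\boldsymbol{\theta})=\sum_{i=1}^{N}\sum_{t=1}^{T_{i}}\left\{w_{it}\log\left[\frac{\pi}{\sigma_{1}}\phi\left(\frac{y_{it}-x_{it}\boldsymbol{\beta}_{1}}{\sigma_{1}}\right)\right]+(1-w_{it})\log\left[\frac{1-\pi}{\sigma_{2}}\phi\left(\frac{y_{it}-x_{it}\boldsymbol{\beta}_{2}}{\sigma_{2}}\right)\right]\right\}$$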
Parameters Estimation
The parameters of the preceding model can be estimated using maximum likelihood.
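Since \(W_{it}\) is missing, maximizing the marginal likelihood appears to be the most natural estimation approach. However, the Expectation-Maximization (EM) algorithm (Dempster et al. 1977) offers a much simpler alternative. The algorithm alternates between an expectation step, in which the complete-data log-likelihood is averaged over the conditional distribution of the missing indicator \(W_{it}\) given the observed data and the current parameter values, and a maximization step, in which this expected complete-data log-likelihood is maximized.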

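E-step (Expectation step)
During this step, the intermediate quantity$$Q\left(\boldsymbol{\theta},\boldsymbol{\theta}^{\prime}\right)=\mathbb{E}_{w_{it}}\left[\log L^{c}(\boldsymbol{\theta})\,\middle|\,\boldsymbol{y};\boldsymbol{\theta}^{\prime}\right]$$ is computed, where \(L^{c}(\boldsymbol{\theta})\) is the complete-data likelihood and the expectation is taken with respect to the conditional distribution of the missing indicators given the observed data \(\boldsymbol{y}\), evaluated at the current parameter value \(\boldsymbol{\theta}^{\prime}\).
M-step (Maximization step)
During this step, the following maximization problem is solved:$$\begin{aligned} &\hat{\boldsymbol{\theta}}=\underset{\boldsymbol{\theta}}{\text{argmax}}\ {Q}\left(\boldsymbol{\theta},\boldsymbol{\theta}^{\prime}\right)\\ &\qquad \text{Subject to:}\\ &\qquad \text{Various appropriate constraints.} \end{aligned} $$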
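Under the assumption of two normal components with means \(x_{it}\boldsymbol{\beta}_{j}\) and standard deviations \(\sigma_{j}\), and writing \(\phi_{j,it}=\sigma_{j}^{-1}\phi\left((y_{it}-x_{it}\boldsymbol{\beta}_{j})/\sigma_{j}\right)\), both steps have closed forms for \(\mathcal{M}_{1}\). This is the standard derivation, sketched here for illustration rather than copied from the paper. The E-step replaces \(w_{it}\) by its posterior probability

$$\tau_{it}=\mathbb{E}\left(w_{it}\mid y_{it};\boldsymbol{\theta}^{\prime}\right)=\frac{\pi^{\prime}\phi^{\prime}_{1,it}}{\pi^{\prime}\phi^{\prime}_{1,it}+(1-\pi^{\prime})\phi^{\prime}_{2,it}}$$

where primes denote evaluation at \(\boldsymbol{\theta}^{\prime}\), and the M-step reduces to a sample proportion and two weighted least squares problems:

$$\hat{\pi}=\frac{\sum_{i=1}^{N}\sum_{t=1}^{T_{i}}\tau_{it}}{\sum_{i=1}^{N}T_{i}},\qquad \hat{\boldsymbol{\beta}}_{1}=\left(\sum_{i,t}\tau_{it}x_{it}^{\prime}x_{it}\right)^{-1}\sum_{i,t}\tau_{it}x_{it}^{\prime}y_{it},\qquad \hat{\sigma}_{1}^{2}=\frac{\sum_{i,t}\tau_{it}\left(y_{it}-x_{it}\hat{\boldsymbol{\beta}}_{1}\right)^{2}}{\sum_{i,t}\tau_{it}}$$

with \(1-\tau_{it}\) in place of \(\tau_{it}\) for \(\hat{\boldsymbol{\beta}}_{2}\) and \(\hat{\sigma}_{2}^{2}\). The constraint \(0\leq\pi\leq 1\) is then satisfied automatically.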
1. Choose initial values \(\boldsymbol{\theta}^{0}=\left(\pi^{0},\boldsymbol{\beta}_{1}^{0},\sigma_{1}^{0},\boldsymbol{\beta}_{2}^{0},\sigma_{2}^{0}\right)\).
2. Compute \(\mathbb{E}(w_{it}\mid y_{it};\boldsymbol{\theta}^{0})\) for each observation.
3. Substitute \(\mathbb{E}(w_{it}\mid y_{it};\boldsymbol{\theta}^{0})\) for \(w_{it}\) in the complete-data log-likelihood.
4. Find new values for the parameters \(\boldsymbol{\theta}^{1}=\left(\pi^{1},\boldsymbol{\beta}_{1}^{1},\sigma_{1}^{1},\boldsymbol{\beta}_{2}^{1},\sigma_{2}^{1}\right)\) by maximizing the resulting expected complete-data log-likelihood.
5. Compute \(\text{error}=\frac{\left|L\left(\boldsymbol{\theta}^{1}\right)-L\left(\boldsymbol{\theta}^{0}\right)\right|}{\left|L\left(\boldsymbol{\theta}^{0}\right)\right|}\), where \(L\) denotes the marginal log-likelihood.
6. If the error is higher than a chosen tolerance level, return to step 2 with \(\boldsymbol{\theta}^{0}\) replaced by the latest estimates.
7. Otherwise, stop; the last estimates are taken as the maximum likelihood estimates.
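The seven steps can be sketched in Python for model \(\mathcal{M}_{1}\). This is a minimal illustration, assuming two normal components with linear means and a constant weight π, with all \((i,t)\) observations pooled as \(\mathcal{M}_{1}\) treats them as independent. It uses only the standard library; the function names (`em_mixture_regression`, `wls`, `solve`) and the simulated design are hypothetical. The stopping rule is the relative change in the log-likelihood, and the returned log-likelihood trace can be checked for the monotonicity property discussed next.

```python
import math
import random


def normal_pdf(y, mean, sd):
    """Density of N(mean, sd^2) evaluated at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def wls(X, y, w):
    """Weighted least squares: coefficients and weighted residual variance."""
    k = len(X[0])
    xtx = [[sum(wi * xi[a] * xi[b] for xi, wi in zip(X, w)) for b in range(k)]
           for a in range(k)]
    xty = [sum(wi * xi[a] * yi for xi, yi, wi in zip(X, y, w)) for a in range(k)]
    beta = solve(xtx, xty)
    rss = sum(wi * (yi - sum(bj * xj for bj, xj in zip(beta, xi))) ** 2
              for xi, yi, wi in zip(X, y, w))
    return beta, rss / sum(w)


def em_mixture_regression(X, y, theta, tol=1e-8, max_iter=500):
    """EM for a two-component normal mixture of regressions with constant weight pi.

    X holds covariate rows pooled over units and periods, y the responses and
    theta = (pi, beta1, sigma1, beta2, sigma2) the starting values.
    Returns the final parameters and the log-likelihood at each iteration.
    """
    pi, b1, s1, b2, s2 = theta
    trace = []
    for _ in range(max_iter):
        # E-step: posterior probability that each observation is in component 1
        tau, loglik = [], 0.0
        for xi, yi in zip(X, y):
            f1 = pi * normal_pdf(yi, sum(b * x for b, x in zip(b1, xi)), s1)
            f2 = (1.0 - pi) * normal_pdf(yi, sum(b * x for b, x in zip(b2, xi)), s2)
            tau.append(f1 / (f1 + f2))
            loglik += math.log(f1 + f2)
        trace.append(loglik)
        # Stop when the relative change in the log-likelihood falls below tol
        if len(trace) > 1 and abs(trace[-1] - trace[-2]) <= tol * abs(trace[-2]):
            break
        # M-step: closed-form updates of pi, (beta1, sigma1) and (beta2, sigma2)
        pi = sum(tau) / len(tau)
        b1, v1 = wls(X, y, tau)
        b2, v2 = wls(X, y, [1.0 - t for t in tau])
        s1, s2 = math.sqrt(v1), math.sqrt(v2)
    return (pi, b1, s1, b2, s2), trace


# Usage on simulated data (hypothetical design: intercept plus one regressor)
random.seed(1)
X, y = [], []
for _ in range(400):
    x = random.uniform(-1.0, 1.0)
    if random.random() < 0.3:
        y.append(1.0 + 2.0 * x + random.gauss(0.0, 0.5))   # component 1
    else:
        y.append(-1.0 + 0.5 * x + random.gauss(0.0, 0.2))  # component 2
    X.append([1.0, x])
theta_hat, trace = em_mixture_regression(X, y, (0.5, [0.5, 1.0], 1.0, [-0.5, 0.0], 1.0))
```

Because each M-step is an exact maximization, a decrease anywhere in `trace` would point to a coding error.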
This algorithm is attractive not only because it provides an intuitive interpretation of the estimation, but also because of its monotone and global convergence properties. It has been proved (McLachlan and Krishnan 1997) that the log-likelihood is nondecreasing at each consecutive iteration. This property is very useful for detecting programming errors, since a decrease in the log-likelihood between two iterations signals a bug. Moreover, the global convergence property (convergence to a stationary point of the likelihood from a wide range of starting values, though not necessarily to the global maximum) allows for more flexibility in the choice of starting values than is possible with a Newton-type algorithm.
However, the EM algorithm is criticized not only because it converges slowly (its rate of convergence is only linear), but also because it does not automatically supply an estimate of the covariance matrix of the parameters (McLachlan and Krishnan 1997). The Hessian needed to estimate the information matrix in the maximum likelihood setting is not used in the computations. Several solutions to this problem have been proposed in the literature; the most notable one is provided by Louis (1982).
Note that according to this model the probability that an economic agent belongs to a given group remains the same every period. In a dynamic economic environment this assumption is too restrictive. For example, the financial status of a firm cannot be determined by flipping a coin; it is more likely to depend on the firm's performance, its characteristics and the economic conditions it is facing. Thus, several observed variables should help in determining group membership, and a more realistic model should allow for covariate-dependent mixing proportions.
A finite mixture model with smoothly varying mixing proportions (\(\mathcal {M}_{2}\))
z _{ it } is a row vector of covariates that impact the probability for an agent i to belong to a certain group and γ is a column vector of parameters.
Parameters Estimation
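The EM algorithm can be implemented as before, with the constant π replaced by the covariate-dependent mixing proportions in the E-step. Because these proportions depend only on γ, the M-step separates into the component-parameter updates of \(\mathcal{M}_{1}\) and a separate maximization with respect to γ.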
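The γ part of the M-step can be sketched as a logistic regression with fractional responses, assuming a logit link \(\pi_{it}=\exp(z_{it}\gamma)/(1+\exp(z_{it}\gamma))\); the link function is an assumption here, since the paper's expression for the mixing proportion is not reproduced in this section. Given the E-step probabilities \(\tau_{it}\), γ maximizes \(\sum_{i,t}[\tau_{it}\log\pi_{it}+(1-\tau_{it})\log(1-\pi_{it})]\), which is concave in γ, so Newton-Raphson is a natural choice. The helper names are hypothetical.

```python
import math


def sigmoid(v):
    """Numerically stable logistic function exp(v) / (1 + exp(v))."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def update_gamma(Z, tau, gamma, tol=1e-10, max_iter=50):
    """Newton-Raphson for the gamma part of the M-step under an assumed logit link.

    Z: covariate rows z_it; tau: E-step probabilities of component 1;
    gamma: starting value (e.g. the previous EM iterate).
    """
    k = len(gamma)
    gamma = list(gamma)
    for _ in range(max_iter):
        p = [sigmoid(sum(g * z for g, z in zip(gamma, zi))) for zi in Z]
        # Score sum (tau - p) z and information sum p (1 - p) z z'
        score = [sum((t - q) * zi[a] for zi, t, q in zip(Z, tau, p)) for a in range(k)]
        info = [[sum(q * (1.0 - q) * zi[a] * zi[b] for zi, q in zip(Z, p))
                 for b in range(k)] for a in range(k)]
        step = solve(info, score)
        gamma = [g + s for g, s in zip(gamma, step)]
        if max(abs(s) for s in step) < tol:
            break
    return gamma


# Check: if tau equals logistic probabilities at some gamma, that gamma is recovered
true_gamma = [0.4, -1.3]
Z = [[1.0, -2.0 + 0.1 * j] for j in range(41)]
tau = [sigmoid(true_gamma[0] + true_gamma[1] * zi[1]) for zi in Z]
gamma_hat = update_gamma(Z, tau, [0.0, 0.0])
```

Starting Newton-Raphson from the previous EM iterate of γ usually needs only a few steps per EM iteration.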
An Endogenous Switching Regression Model (\(\mathcal {M}_{3}\))
When \(\sigma_{\varepsilon 1}=\sigma_{\varepsilon 2}=0\), that is, when the errors of the component equations are uncorrelated with the error of the group-membership equation, the preceding likelihood is the same as in the previous model and the weights add up to one. Xiaoqiang and Schiantarelli (1998), Hovakimian and Titman (2006) and Almeida and Campello (2007) use classical econometric methods to estimate the preceding endogenous switching regression model with fixed effects.
This model can also be estimated with the EM algorithm, but the intermediate EM quantity is no longer separable in the parameters, which makes this approach less appealing than the direct maximization of the log of the marginal likelihood. Maximizing \(Q(\boldsymbol{\theta},\boldsymbol{\theta}^{\prime})\) at each iteration is potentially as computationally involved as the one-step maximization of the marginal likelihood. However, if one has difficulty finding good starting values for a Newton-type algorithm, one can still benefit from the nice convergence properties of the EM algorithm via the simpler model \(\mathcal{M}_{2}\). As indicated before, if the correlations between the components and the group membership equation are zero, \(\mathcal{M}_{3}\) is identical to \(\mathcal{M}_{2}\); as a result, the latter provides very good starting values for the former. One just has to apply the EM algorithm to \(\mathcal{M}_{2}\) and use the solution, with the correlations set to zero, as starting values for \(\mathcal{M}_{3}\).
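To make the starting-value argument concrete, here is a sketch of one observation's density in a two-regime endogenous switching regression. It assumes, as in the classical switching regression with unknown regimes, that regime 1 occurs when \(z_{it}\gamma+u_{it}>0\) with \(u_{it}\sim N(0,1)\), and that the regime-\(j\) error has standard deviation \(\sigma_{j}\) and correlation \(\rho_{j}\) with \(u_{it}\). This parameterization and the function names are assumptions, not the paper's own expression. With \(\rho_{1}=\rho_{2}=0\) the density collapses to a two-component mixture with probit weights \(\Phi(z_{it}\gamma)\), which is why \(\mathcal{M}_{2}\) estimates (with a probit link) and zero correlations make natural starting values. For nonzero correlations the two "weights" no longer add up to one, yet the density still integrates to one.

```python
import math


def phi(v):
    """Standard normal density."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


def Phi(v):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-v / math.sqrt(2.0))


def switching_density(y, mu1, s1, mu2, s2, zg, rho1, rho2):
    """Density of y under the assumed two-regime switching regression.

    mu_j: regime means (e.g. x_it beta_j); s_j: regime standard deviations;
    zg: index z_it gamma of the selection equation; rho_j: corr(error_j, u).
    """
    e1, e2 = (y - mu1) / s1, (y - mu2) / s2
    # P(u > -zg | e1) and P(u <= -zg | e2), using u | e_j ~ N(rho_j e_j, 1 - rho_j^2)
    w1 = Phi((zg + rho1 * e1) / math.sqrt(1.0 - rho1 ** 2))
    w2 = 1.0 - Phi((zg + rho2 * e2) / math.sqrt(1.0 - rho2 ** 2))
    return w1 * phi(e1) / s1 + w2 * phi(e2) / s2


# With zero correlations the model is a mixture with probit weights Phi(zg)
zg = 0.2
mix = Phi(zg) * phi(0.3) + (1.0 - Phi(zg)) * phi((0.3 - 1.0) / 0.5) / 0.5
dens0 = switching_density(0.3, 0.0, 1.0, 1.0, 0.5, zg, 0.0, 0.0)

# With nonzero correlations the density still integrates to one (trapezoid rule)
h = 0.001
grid = [-15.0 + h * k for k in range(30001)]
vals = [switching_density(v, 0.0, 1.0, 1.0, 0.5, zg, 0.6, -0.4) for v in grid]
total = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
```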
An Endogenous Switching Regression Model with Random Effect (\(\mathcal {M}_{4}\))
This specification of the firm-specific effect is interesting because in practice one expects some of the exogenous variables to be correlated with the agent's unobserved characteristics, which may also contain a random component. Moreover, the use of two different random effects, one for each component distribution, allows the data to dictate whether or not the agents who fall more often in a given group have the same unobserved specific characteristics as those who fall most of the time in the other group. When ζ _{1} and ζ _{2} equal zero one obtains the usual random effect specification.
Using the MATLAB function mherzo.m written by Zhang and Jin (1996), I have generated 9-point one-dimensional Gauss-Hermite weights \(w_{r}\) and nodes \(z_{r}\) (\(r=1,\ldots,9\)).
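The MATLAB routine mherzo.m is not reproduced here. As a stand-in, the sketch below computes Gauss-Hermite nodes as the roots of the Hermite polynomial \(H_{n}\), located by bracketing and bisection, with the standard weights \(w_{r}=2^{n-1}n!\sqrt{\pi}/\left(n^{2}H_{n-1}(z_{r})^{2}\right)\). It then integrates out a normal random effect through the change of variable \(b=\sqrt{2}\,s\,z\), so that \(\mathbb{E}[g(b)]\approx\pi^{-1/2}\sum_{r}w_{r}\,g(\sqrt{2}\,s\,z_{r})\) for \(b\sim N(0,s^{2})\). The single random intercept shared by all periods of a unit, and all function names, are illustrative assumptions rather than the paper's exact specification.

```python
import math


def hermite(n, x):
    """Physicists' Hermite polynomials (H_n(x), H_{n-1}(x)) for n >= 1."""
    h_prev, h = 1.0, 2.0 * x
    for k in range(1, n):
        h_prev, h = h, 2.0 * x * h - 2.0 * k * h_prev
    return h, h_prev


def gauss_hermite(n, lo=-10.0, hi=10.0, grid=20001):
    """Nodes and weights with sum_r w_r g(z_r) ~ integral of g(z) exp(-z^2) dz."""
    step = (hi - lo) / grid  # an odd number of intervals keeps z = 0 off the grid
    nodes = []
    a, fa = lo, hermite(n, lo)[0]
    for k in range(1, grid + 1):
        b = lo + k * step
        fb = hermite(n, b)[0]
        if fa * fb < 0.0:  # sign change: bisect down to machine precision
            x0, x1, f0 = a, b, fa
            for _ in range(100):
                mid = 0.5 * (x0 + x1)
                fm = hermite(n, mid)[0]
                if f0 * fm <= 0.0:
                    x1 = mid
                else:
                    x0, f0 = mid, fm
            nodes.append(0.5 * (x0 + x1))
        a, fa = b, fb
    const = 2.0 ** (n - 1) * math.factorial(n) * math.sqrt(math.pi) / n ** 2
    weights = [const / hermite(n, z)[1] ** 2 for z in nodes]
    return nodes, weights


def expect_normal(g, sd, nodes, weights):
    """Approximate E[g(b)] for b ~ N(0, sd^2) using b = sqrt(2) * sd * z."""
    total = sum(w * g(math.sqrt(2.0) * sd * z) for z, w in zip(nodes, weights))
    return total / math.sqrt(math.pi)


def normal_pdf(y, mean, sd):
    """Density of N(mean, sd^2) evaluated at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def unit_likelihood(ys, means, sigma, sd_b, nodes, weights):
    """Likelihood of one unit's series with a random intercept b integrated out."""
    def conditional(b):
        out = 1.0
        for y, m in zip(ys, means):
            out *= normal_pdf(y, m + b, sigma)
        return out
    return expect_normal(conditional, sd_b, nodes, weights)


z9, w9 = gauss_hermite(9)
# With one period the integral has the closed form N(mean, sigma^2 + sd_b^2)
approx = unit_likelihood([0.3], [0.0], 1.0, 0.5, z9, w9)
exact = normal_pdf(0.3, 0.0, math.sqrt(1.0 + 0.25))
```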
One should note that Gaussian quadrature, and numerical integration methods in general, suffer from the curse of dimensionality: the number of function evaluations required to approximate the integral to a given degree of accuracy increases exponentially with the dimension of the integral (a product rule with 9 nodes per dimension needs \(9^{d}\) evaluations in dimension \(d\)). Monte Carlo integration or a monomial rule may be less costly. González et al. (2006) show that Monte Carlo and quasi-Monte Carlo methods can not only reduce computation time but also provide better accuracy in the case of logistic regressions.
Alternatively, one can use the h-likelihood method of Lee and Nelder (1996) and bypass the computation of the integral. In this case the random effects are treated as additional parameters that are estimated along with the other parameters. For panel data with a large number of units, this method significantly increases the number of parameters to be estimated.
A Hidden Markov Model (\(\mathcal {M}_{5}\))
where 1 ^{′} is a column vector of ones.
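The likelihood expression of \(\mathcal{M}_{5}\) is not reproduced in this section. As an illustration, the sketch below evaluates an HMM log-likelihood for one unit with the scaled forward recursion and sums over units, assuming normal state-specific densities, initial state probabilities \(\pi_{j}\) and a constant transition matrix \(P_{kl}=\Pr(w_{it}=l\mid w_{i(t-1)}=k)\). Rescaling each forward vector to sum to one avoids numerical underflow on long series; the log-likelihood is then the sum of the logs of the scaling constants. Function names are hypothetical.

```python
import math


def normal_pdf(y, mean, sd):
    """Density of N(mean, sd^2) evaluated at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def hmm_loglik_unit(ys, means, sds, init, P):
    """Log-likelihood of one unit's series under an HMM (scaled forward recursion).

    means[t][j]: mean of y_t in state j (e.g. x_it beta_j); sds[j]: state standard
    deviations; init[j]: initial state probabilities; P[k][l] = Pr(l at t | k at t-1).
    """
    S = len(init)
    alpha = [init[j] * normal_pdf(ys[0], means[0][j], sds[j]) for j in range(S)]
    c = sum(alpha)
    loglik = math.log(c)
    alpha = [a / c for a in alpha]
    for t in range(1, len(ys)):
        # Predict the state distribution at t, then weight by the state densities
        pred = [sum(alpha[k] * P[k][l] for k in range(S)) for l in range(S)]
        alpha = [pred[l] * normal_pdf(ys[t], means[t][l], sds[l]) for l in range(S)]
        c = sum(alpha)
        loglik += math.log(c)
        alpha = [a / c for a in alpha]  # rescale to avoid underflow
    return loglik


def hmm_loglik_panel(panel, sds, init, P):
    """Panel log-likelihood: units are independent, so unit log-likelihoods add up."""
    return sum(hmm_loglik_unit(ys, means, sds, init, P) for ys, means in panel)


# If every row of P equals init, the HMM reduces to an i.i.d. two-component mixture
ys = [0.1, -0.4, 1.2]
means = [[0.0, 1.0]] * 3
ll_hmm = hmm_loglik_unit(ys, means, [0.5, 1.0], [0.3, 0.7], [[0.3, 0.7], [0.3, 0.7]])
ll_mix = sum(math.log(0.3 * normal_pdf(v, 0.0, 0.5) + 0.7 * normal_pdf(v, 1.0, 1.0))
             for v in ys)
ll_panel = hmm_loglik_panel([(ys, means), (ys, means)], [0.5, 1.0], [0.3, 0.7],
                            [[0.3, 0.7], [0.3, 0.7]])
```

The reduction to a mixture when the rows of the transition matrix coincide is the nesting exploited when \(\mathcal{M}_{1}\) is tested against \(\mathcal{M}_{5}\).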
Parameters Estimation (EM Algorithm)

1. Choose initial values \(\boldsymbol{\theta}_{0}\).
2. Compute \(p(w_{it}=j\mid\boldsymbol{\Im}_{iT_{i}};\boldsymbol{\theta}_{0})\) and \(p(w_{i(t-1)}=k,w_{it}=l\mid\boldsymbol{\Im}_{iT_{i}};\boldsymbol{\theta}_{0})\) for each observation.
3. Substitute the computed probabilities in the intermediate EM quantity \(\mathcal{Q}(\boldsymbol{\theta},\boldsymbol{\theta}_{0})\).
4. Solve$$\begin{aligned} {~}&\boldsymbol{\theta}_{1}=\underset{\boldsymbol{\theta}}{\text{argmax}}\ \mathcal{Q}(\boldsymbol{\theta},\boldsymbol{\theta}_{0})\\ &\text{subject to:}\\ &\sum_{j=1}^{2}\pi_{j}=1\\ &\sum_{l=1}^{2}P_{kl}=1;k=1,2\\ &0\leq\pi_{j}\leq 1, j=1,2\\ &0\leq P_{kl}\leq 1, k,l=1,2. \end{aligned} $$
5. Return to step 2 after replacing \(\boldsymbol{\theta}_{0}\) by \(\boldsymbol{\theta}_{1}\).
6. Continue until convergence.
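Steps 2 to 4 can be sketched for the case where the normal state densities are held fixed, so that only the initial probabilities and the transition matrix are updated; this partial M-step still cannot decrease the likelihood. The smoothed probabilities come from the scaled forward-backward recursions, and normalizing the expected transition counts row by row enforces the constraints of step 4. In a full implementation the emission parameters would also be updated by weighted least squares with the smoothed probabilities as weights. Names and data are hypothetical.

```python
import math


def normal_pdf(y, mean, sd):
    """Density of N(mean, sd^2) evaluated at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def forward_backward(ys, means, sds, init, P):
    """Scaled forward-backward pass for one unit.

    Returns gamma[t][j] = p(w_t = j | all y of the unit), xi[t][k][l] =
    p(w_{t-1} = k, w_t = l | all y of the unit) for t >= 1, and the log-likelihood.
    """
    S, T = len(init), len(ys)
    dens = [[normal_pdf(ys[t], means[t][j], sds[j]) for j in range(S)] for t in range(T)]
    alpha, scale = [], []
    for t in range(T):
        if t == 0:
            a = [init[j] * dens[0][j] for j in range(S)]
        else:
            a = [sum(alpha[-1][k] * P[k][l] for k in range(S)) * dens[t][l]
                 for l in range(S)]
        scale.append(sum(a))
        alpha.append([v / scale[-1] for v in a])
    beta = [[1.0] * S for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(P[k][l] * dens[t + 1][l] * beta[t + 1][l] for l in range(S))
                   / scale[t + 1] for k in range(S)]
    gamma = [[alpha[t][j] * beta[t][j] for j in range(S)] for t in range(T)]
    xi = [None] + [[[alpha[t - 1][k] * P[k][l] * dens[t][l] * beta[t][l] / scale[t]
                     for l in range(S)] for k in range(S)] for t in range(1, T)]
    return gamma, xi, sum(math.log(c) for c in scale)


def update_init_and_transitions(panel, sds, init, P):
    """One EM update of the initial probabilities and transition matrix (emissions fixed).

    panel is a list of (ys, means) pairs, one per unit. Returns the new initial
    probabilities, the new transition matrix and the log-likelihood at the old values.
    """
    S = len(init)
    init_num = [0.0] * S
    trans_num = [[0.0] * S for _ in range(S)]
    total = 0.0
    for ys, means in panel:
        gamma, xi, ll = forward_backward(ys, means, sds, init, P)
        total += ll
        for j in range(S):
            init_num[j] += gamma[0][j]
        for t in range(1, len(ys)):
            for k in range(S):
                for l in range(S):
                    trans_num[k][l] += xi[t][k][l]
    new_init = [v / len(panel) for v in init_num]
    new_P = [[v / sum(row) for v in row] for row in trans_num]
    return new_init, new_P, total


panel = [([0.1, 1.3, 1.1, -0.2], [[0.0, 1.0]] * 4),
         ([0.9, 1.2, 0.2], [[0.0, 1.0]] * 3)]
sds = [0.5, 0.5]
init, P = [0.5, 0.5], [[0.8, 0.2], [0.3, 0.7]]
new_init, new_P, ll_old = update_init_and_transitions(panel, sds, init, P)
ll_new = sum(forward_backward(ys, m, sds, new_init, new_P)[2] for ys, m in panel)
```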
One main drawback of the HMM with a constant transition matrix is that the probability for a firm to move from one state to another does not depend on any observable, which is unrealistic for the reasons discussed in the case of the first model.
HMM Model with Time dependent Transition Matrix (\(\mathcal {M}_{6}\))
This is a time-heterogeneous transition matrix. This matrix is different from the specifications in Asea and Blomberg (1998), Atman (2007) and Maruotti (2007). It is also possible to use a probit or logit model for each row of the transition matrix. In fact, when the Markov chain has more than two states, a multinomial probit or logit model would be the most convenient choice. However, for a chain with two states, the current specification appears to be better, since it involves a smaller number of parameters and offers a nice way to test for time dependence by testing the hypothesis λ=0.
Parameters Estimation
Hidden Markov Model with Time Varying Transition Matrix and Random Effects (\(\mathcal {M}_{7}\))
The integrals are computed using GaussHermite quadrature.
Hidden Markov Model with Time Varying Transition Matrix and endogeneity (\(\mathcal {M}_{8}\))
The last distributional assumptions make the states of the Markov chains and the response variable \(y_{it}\) interdependent. The resulting model is an extension to the panel data setting of a modified version of the model by Kim et al. (2008). The transition matrix of the current model uses fewer parameters, and the correlations between the state-indicator variable and the component distributions are allowed to be different.
Parameters estimation
Hidden Markov Model with Time Varying Transition Matrix, endogeneity and random effects (\(\mathcal {M}_{9}\))
Model Identification
The parameters of the models presented above are not automatically identified. In theory, the log-likelihoods are all unbounded (for instance, the likelihood grows without bound when a component mean fits a single observation exactly and that component's standard deviation tends to zero), so a maximum likelihood estimator may not exist. The models also all suffer from nonidentification due to label switching: the log-likelihood is invariant under permutations of the components, which makes it difficult to distinguish the unconstrained component from the constrained one.
As suggested in the literature (Fruhwirth-Schnatter 2006), this identification problem can be solved by imposing a set of constraints. These constraints may come from economic theory. In the case of firms' physical investment, one may be tempted to argue that a firm that has no trouble financing its investment activities should have a higher investment-to-capital ratio than when it has trouble obtaining funds, ceteris paribus. However, economic theory can only support the idea that a constrained firm is likely to choose a rate of investment below its optimal rate. Given the heterogeneity of the firms, it is possible that the majority of constrained firms have a higher optimal rate of investment than the unconstrained ones. As a result, the previous constraint could be misleading. Thus, identification constraints should be chosen with care.
Another identification problem arises when a mixture with too many components is fitted (overfitting). If the dataset is generated by a single component, attempting to fit a mixture of two components may produce a component with a very small number of observations. In the case of a mixture with constant mixing proportions, the weight of that component will be very close to zero. As a consequence, the log-likelihood will be approximately the same for any choice of the parameters associated with that component.
Another issue that makes the identification of the parameters of these models difficult is the fact that the log-likelihoods are generally multimodal. Since the optimizers used to maximize the log-likelihood can only find local maxima, the parameter estimates will be highly dependent on the starting values. To deal with this problem, the log-likelihood maximization will be repeated several times with different starting values, and the parameter estimates will be taken to be the vector of estimates that corresponds to the highest log-likelihood, provided that it does not have the characteristics of a spurious maximizer. Each time, the starting values are generated using either the K-means clustering algorithm (MacQueen 1967; Fink 2007) or a random classification scheme where each observation is assigned to one group by flipping a fair coin. Note that the K-means algorithm does not produce the same classification at each run, since the initial assignments are random. With these procedures, I try to increase the probability of finding a vector of starting values that falls in the basin of attraction of the highest log-likelihood.
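The multi-start strategy can be sketched as follows, assuming the starting partitions are built from a scalar variable (for example the response) and that a user-supplied `fit` routine turns a partition into a fitted model with its log-likelihood. The two-cluster K-means (Lloyd's algorithm), the alternation between K-means and coin-flip starts, the sort-by-σ labeling rule and all names are illustrative choices, not the paper's code; screening for spurious maximizers is left to the caller.

```python
import random


def kmeans_two(y, rng, max_iter=100):
    """Two-cluster K-means (Lloyd's algorithm) on a scalar variable.

    The two initial centres are distinct observations drawn at random, so repeated
    runs can give different partitions. Returns a 0/1 label per observation.
    """
    centres = rng.sample(sorted(set(y)), 2)
    labels = None
    for _ in range(max_iter):
        new = [0 if abs(v - centres[0]) <= abs(v - centres[1]) else 1 for v in y]
        if new == labels:
            break
        labels = new
        for j in (0, 1):
            members = [v for v, g in zip(y, labels) if g == j]
            if members:
                centres[j] = sum(members) / len(members)
    return labels


def coin_flip(n, rng):
    """Random classification: each observation is assigned to a group by a fair coin."""
    return [rng.randint(0, 1) for _ in range(n)]


def multistart(y, fit, n_starts=10, seed=0):
    """Fit from several starting partitions and keep the highest log-likelihood.

    fit(y, labels) -> (loglik, params) is user-supplied (e.g. EM started from the
    group-wise estimates implied by labels).
    """
    rng = random.Random(seed)
    best = None
    for s in range(n_starts):
        labels = kmeans_two(y, rng) if s % 2 == 0 else coin_flip(len(y), rng)
        loglik, params = fit(y, labels)
        if best is None or loglik > best[0]:
            best = (loglik, params)
    return best


def order_components(components):
    """One possible labeling rule against label switching: sort by decreasing sigma."""
    return sorted(components, key=lambda c: c["sigma"], reverse=True)


def toy_fit(y, labels):
    """Stand-in for an EM fit: minus the within-group sum of squares."""
    wss = 0.0
    for j in (0, 1):
        grp = [v for v, g in zip(y, labels) if g == j]
        if grp:
            m = sum(grp) / len(grp)
            wss += sum((v - m) ** 2 for v in grp)
    return -wss, labels


y = [0.0, 0.1, -0.1, 5.0, 5.2, 4.9]
best_loglik, best_labels = multistart(y, toy_fit, n_starts=6, seed=3)
```

Any labeling rule such as `order_components` is itself an identification constraint, so the caution above about choosing constraints applies to it as well.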
Inferences
Inferences will be based on the asymptotic properties of the maximum likelihood estimator. As discussed in the previous section, the likelihoods of the models presented in this paper do not have a global maximum. However, for model \(\mathcal{M}_{1}\), Kiefer (1978) has shown that it is possible to find a closed set containing the true value of the parameter vector in which there exists a unique consistent estimator. One requirement for this set is that it does not contain π=0, π=1, σ _{2}=0, or σ _{1}=0. That estimator is asymptotically normal with a covariance matrix equal to the inverse of the information matrix. Choi and Zhou (2002) proved similar results for a class of models with covariate-dependent mixing proportions.
Douc and Mathias (2001) prove the consistency and the asymptotic normality of the maximum likelihood estimator of a general hidden Markov model for both stationary and nonstationary Markov chains. The asymptotic covariance is, as usual, the inverse of the information matrix.
Robust Standard errors
If the sample is relatively small one can alternatively use parametric or nonparametric bootstrap. In the nonparametric case an appropriate resampling method is Moving Blocks Bootstrap as described in Cameron and Trivedi (2005). Nevertheless, for the hidden Markov models where the time series properties of the data are very important resampling among the units as proposed by Kapetanios (2008) may even be more appropriate.
I should note that for the models considered in this paper bootstrapping requires some care. The likelihoods being potentially multimodal the highest local maximum may not be reached at each repetition.
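Resampling among units can be sketched as follows: whole unit histories are drawn with replacement, so the within-unit time series dependence is preserved. This is a minimal illustration of unit-level (nonparametric) resampling only, not of the moving blocks scheme; `estimate` is a hypothetical user-supplied routine, and for mixture or hidden Markov fits it should use several starting values and a fixed labeling rule in every replication.

```python
import math
import random


def resample_units(panel, rng):
    """Draw N units with replacement, keeping each unit's time series intact."""
    return [panel[rng.randrange(len(panel))] for _ in range(len(panel))]


def bootstrap_se(panel, estimate, n_boot=199, seed=0):
    """Standard deviation of estimate() across unit-level bootstrap samples.

    estimate(panel) returns a list of floats. For multimodal likelihoods each
    replication may otherwise stop at a lower local maximum or swap the components.
    """
    rng = random.Random(seed)
    draws = [estimate(resample_units(panel, rng)) for _ in range(n_boot)]
    k = len(draws[0])
    means = [sum(d[j] for d in draws) / n_boot for j in range(k)]
    return [math.sqrt(sum((d[j] - means[j]) ** 2 for d in draws) / (n_boot - 1))
            for j in range(k)]


def pooled_mean(panel):
    """Toy estimator: mean of all observations in the panel."""
    return [sum(sum(u) for u in panel) / sum(len(u) for u in panel)]


panel = [[0.1, 0.4, 0.2], [1.3, 1.1], [0.7, 0.9, 1.0, 0.8]]
se = bootstrap_se(panel, pooled_mean, n_boot=199, seed=1)
```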
Statistical tests
The statistical tests that will be considered have four objectives: 1) to determine the number of components of the mixtures, 2) to choose the best mixture among the models with a given number of components, 3) to test for endogeneity and 4) to test for random effects.
As stated in McLachlan and Peel (2000) choosing the number of components for a mixture is difficult. The preceding authors provide a long discussion about this issue in their book. One important problem is that in some cases one may not be able to find evidence that favors a model of a given number of components over another model that contains more or fewer components. In such situations they advocate choosing the model with the smaller number of components.
For the applications targeted in this paper the possible number of components will be inferred from economic theory. The main issue will then be how to find the distribution of the chosen test statistic under the null hypothesis.
P(2) is the space of positive definite matrices of dimension 2.
In all cases the null hypothesis falls on the boundary of the parameter space, as can be seen from Eq. (38) to Eq. (47). As a consequence, the regularity conditions used to derive the asymptotic distribution of the likelihood ratio test break down. Note also that under H _{0} the parameters of the component distribution with zero mixing proportion are not identifiable. The asymptotic distribution of the likelihood ratio statistic is therefore not the usual χ ^{2} distribution. For example, in the case of a one-component versus a two-component binomial distribution, Chernoff and Lander (1995) show that the distribution of twice the logarithm of the likelihood ratio is a mixture of three distributions, two of which are χ ^{2}. Goffinet and Loisel (1992) found similar nonstandard results. A review of these issues can be found in McLachlan and Peel (2000).
1. Compute the maximum likelihood estimates \((\hat{\boldsymbol{\beta}},\hat{\sigma})\) for the one-component model.
2. Generate a sample \(y_{it}^{*},t=1,\ldots,T_{i},i=1,\ldots,N\) from \(\phi(x_{it}\hat{\boldsymbol{\beta}},\hat{\sigma})\), keeping the observed covariates fixed.
3. Use \(y_{it}^{*}\) and the other covariates to obtain \((\boldsymbol{\beta}_{m},\sigma_{m})\) for the one-component model and \(\boldsymbol{\theta}_{m}\) for the alternative two-component model.
4. Use these parameters to compute the likelihood ratio statistic \(t_{m}\).
5. Repeat steps 2 to 4 999 times to obtain a sequence \(\{t_{m}\}_{m=1}^{999}\).
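The procedure can be organized as a small driver. The list stops once the bootstrap statistics are generated; a common final step is to compare the observed statistic with them through the p-value \((1+\#\{t_{m}\geq t_{\text{obs}}\})/(B+1)\), where \(B\) is the number of bootstrap samples. The statistic is taken here to be twice the log-likelihood ratio, and the three fitting and simulation routines are hypothetical user-supplied callables; the stand-ins at the bottom only check the plumbing and are not real mixture fits.

```python
import random


def bootstrap_lrt(y, fit_one, fit_two, simulate_one, n_boot=999, seed=0):
    """Parametric bootstrap for one- versus two-component mixtures.

    fit_one(y) -> (loglik, params) fits the one-component model;
    fit_two(y) -> loglik fits the two-component model (ideally from several starts);
    simulate_one(params, rng) draws responses from the fitted one-component model,
    keeping the covariates fixed.
    """
    rng = random.Random(seed)
    ll_one, params = fit_one(y)
    t_obs = 2.0 * (fit_two(y) - ll_one)
    t_boot = []
    for _ in range(n_boot):
        y_star = simulate_one(params, rng)
        ll_one_star, _ = fit_one(y_star)
        t_boot.append(2.0 * (fit_two(y_star) - ll_one_star))
    # Bootstrap p-value, counting the observed sample as one of n_boot + 1 draws
    p_value = (1 + sum(t >= t_obs for t in t_boot)) / (n_boot + 1)
    return t_obs, t_boot, p_value


# Plumbing check with stand-in functions
def fit_one(y):
    return -sum(v * v for v in y), None


def fit_two(y):
    return -0.5 * sum(v * v for v in y)


def simulate_one(params, rng):
    return [rng.gauss(0.0, 1.0) for _ in range(5)]


t_obs, t_boot, p = bootstrap_lrt([10.0, 10.0, 10.0], fit_one, fit_two, simulate_one,
                                 n_boot=99)
```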
The alternative hypothesis is that at least one of the correlation coefficients is different from zero. As can be seen from Eq. (40), the boundary problem no longer exists, and twice the log-likelihood ratio statistic has an asymptotic chi-square distribution with two degrees of freedom. Alternatively, the test can be conducted using t-statistics.
The preceding matrix is positive semidefinite, and H _{0} falls on the boundary of the parameter spaces Ω _{4} and Ω _{7}. As before, the distribution of the likelihood ratio statistic is not the usual χ ^{2} distribution. Stram and Lee (1994) studied this problem for one-component linear models and showed that the asymptotic distribution of the likelihood ratio statistic is a mixture of chi-square distributions.
In both cases the rows of the transition matrices are the same under the null hypothesis, and the usual asymptotic χ ^{2} null distribution of the likelihood ratio statistic is valid. In the case where λ=0 under the null hypothesis, a t-test is also appropriate.
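The tail probabilities involved can be computed with the standard library. In the simplest boundary case, a single variance parameter tested at zero, the result of Stram and Lee (1994) gives a 50:50 mixture of a point mass at zero (\(\chi^{2}_{0}\)) and \(\chi^{2}_{1}\), so the p-value is half the usual \(\chi^{2}_{1}\) tail probability. The sketch also gives the \(\chi^{2}_{2}\) tail used for the endogeneity test with two correlation coefficients. It covers only these two cases; testing a 2×2 covariance matrix at zero, as in \(\mathcal{M}_{4}\) and \(\mathcal{M}_{7}\), involves a different chi-square mixture that is not implemented here. Function names are hypothetical.

```python
import math


def chi2_sf_1(t):
    """Upper tail probability P(chi^2_1 > t)."""
    return math.erfc(math.sqrt(t / 2.0)) if t > 0 else 1.0


def chi2_sf_2(t):
    """Upper tail probability P(chi^2_2 > t) = exp(-t / 2)."""
    return math.exp(-t / 2.0) if t > 0 else 1.0


def pvalue_boundary_one_variance(t):
    """p-value under 0.5 * chi^2_0 + 0.5 * chi^2_1 (chi^2_0 is a point mass at zero).

    Appropriate only for testing a single variance parameter at zero.
    """
    return 0.5 * chi2_sf_1(t) if t > 0 else 1.0
```

At \(t=2.7055\), for example, the naive \(\chi^{2}_{1}\) p-value is 0.10 while the boundary-corrected one is 0.05, so ignoring the boundary makes the test conservative.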
Application: Firms’ investment and financing constraints
The basic intertemporal investment model by Hayashi (1982) assumes that a firm chooses the level of its next period capital stock by maximizing the expected discounted value of dividends. In reality, it is not always possible for certain firms to finance the level of investment that maximizes profit. This situation may arise because of the existence of information asymmetry between the firm’s managers and the potential suppliers of funds. Without the ability to evaluate accurately the profitability of the firm’s projects, the suppliers of funds may be unwilling to finance the firm’s investment or they may be willing to supply only a fraction of the funds needed by the firm. As a result, investment may not be financed to the level that is optimal in the absence of constraints. One way of accounting for this issue is by adding a borrowing constraint to the Hayashi (1982) model (Adda and Cooper 2003). The Euler equation from the resulting model would imply two different relationships between investment and its determinants depending on whether the constraint is binding or not. If this model is a good approximation for a firm’s investment behavior, at each point in time the firm will fall in one of two groups: the group of firms that are financially constrained (borrowing constraint is binding) and the group of firms that are not financially constrained. Since the observed data do not generally include any variable that indicates group membership, this setting is well suited for the use of finite mixture models of the kinds presented in this paper.
Variable definitions

| Variable | Definition |
|---|---|
| INVESTMENT | Investment ÷ (beginning-of-period capital stock) |
| GROWTH OPPORTUNITIES | (Market value of the firm's asset − common equity and deferred taxes) ÷ (Book value) |
| CASHFLOWS | (Income before extraordinary items + Depreciation) ÷ (Beginning-of-period capital) |
| LOGBOOKASSET | Log(Value of assets adjusted for inflation) |
| SHORTTERMDEBT | (Short-term debt) ÷ (Firm's assets) |
| LONGTERMDEBT | (Long-term debt) ÷ (Firm's assets) |
| FINANCIAL SLACK | (Cash and short-term investments) ÷ (Previous-year assets) |
| DUMMYDIVPAYOUT | Equal to 1 if the firm pays dividends, 0 otherwise |
| DUMMYBONDRATING | Equal to 1 if the firm has a bond rating, 0 otherwise |
| COVERAGE RATIO | Interest ÷ (Earnings before interest) |
| ASSET SALES | (Sales of property, plant and equipment) ÷ (Beginning-of-period capital) |
Bootstrap likelihood ratio tests of one component versus two show that a two-component distribution is favored over a one-component distribution in all considered cases.
Likelihood ratio tests comparing the two-component models

| | \(\mathcal{M}_{1}\) vs \(\mathcal{M}_{2}\) | \(\mathcal{M}_{2}\) vs \(\mathcal{M}_{3}\) | \(\mathcal{M}_{1}\) vs \(\mathcal{M}_{5}\) | \(\mathcal{M}_{2}\) vs \(\mathcal{M}_{6}\) | \(\mathcal{M}_{5}\) vs \(\mathcal{M}_{6}\) |
|---|---|---|---|---|---|
| Likelihood ratio | 1833.337 | 19.504 | 1890.253 | 767.080 | 710.163 |
| Degrees of freedom | 7.000 | 2.000 | 9.000 | 2.000 | 7.000 |
| p-value | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
Estimates of the parameters of the component distributions for models \(\mathcal{M}_{1}\), \(\mathcal{M}_{3}\), and \(\mathcal{M}_{8}\)

| Model | Component | | CF _{ t } ^{a} | CF _{ t−1} | AS _{ t+1} ^{b} | AS _{ t } | AS _{ t−1} | σ | π | ρ |
|---|---|---|---|---|---|---|---|---|---|---|
| \(\mathcal{M}_{1}\) | 1 | Estimate | 0.274 | 0.004 | 0.180 | 0.238 | 0.125 | 0.281 | 0.333 | ___ |
| | | Std. error | 0.012 | 0.003 | 0.055 | 0.061 | 0.055 | 0.003 | 0.005 | ___ |
| | 2 | Estimate | 0.113 | 0.016 | 0.023 | 0.042 | 0.027 | 0.073 | 0.667 | ___ |
| | | Std. error | 0.005 | 0.002 | 0.013 | 0.015 | 0.015 | 0.001 | 0.005 | ___ |
| \(\mathcal{M}_{3}\) | 1 | Estimate | 0.136 | 0.040 | 0.024 | 0.049 | 0.032 | 0.071 | | 0.031 |
| | | Std. error | 0.006 | 0.004 | 0.013 | 0.020 | 0.018 | 0.001 | | 0.050 |
| | 2 | Estimate | 0.231 | 0.006 | 0.143 | 0.156 | 0.145 | 0.281 | | 0.146 |
| | | Std. error | 0.010 | 0.003 | 0.055 | 0.052 | 0.051 | 0.003 | | 0.050 |
| \(\mathcal{M}_{8}\) | 1 | Estimate | 0.225 | 0.004 | 0.137 | 0.133 | 0.124 | 0.291 | 0.491 | 0.447 |
| | | Std. error | 0.012 | 0.003 | 0.257 | 0.577 | 0.571 | 0.006 | 0.387 | 0.126 |
| | 2 | Estimate | 0.139 | 0.037 | 0.024 | 0.049 | 0.036 | 0.072 | 0.509 | 0.293 |
| | | Std. error | 0.006 | 0.005 | 0.022 | 0.033 | 0.033 | 0.001 | 0.387 | 0.093 |
The most important result here is that the HMM \(\mathcal{M}_{5}\) strongly outperforms the mixture model \(\mathcal{M}_{1}\), and the same is true of the HMM \(\mathcal{M}_{6}\) versus the mixture model \(\mathcal{M}_{2}\). Moreover, the test statistic on λ (see the transition matrix of model \(\mathcal{M}_{6}\)) is large (31.25), which implies that this parameter is significantly different from 0 and reinforces the idea that firms' financial states are time-dependent. Of the two hidden Markov models, the likelihood ratio test reveals that the better one is the one that allows for a covariate-dependent transition matrix.
The results of the likelihood ratio tests are also confirmed by the information criteria AIC and BIC, since the most flexible hidden Markov model shows the lowest values. Moreover, these criteria make it possible to compare the non-nested models \(\mathcal{M}_{3}\) and \(\mathcal{M}_{6}\), as well as \(\mathcal{M}_{3}\) and \(\mathcal{M}_{5}\). Even though the hidden Markov models do not account for endogeneity, they fit the data much better than the endogenous mixture generally used in the literature. Neglecting time dependence is then more problematic than neglecting endogeneity.
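For reference, the criteria are \(\text{AIC}=-2\log L+2k\) and \(\text{BIC}=-2\log L+k\log n\), where \(k\) is the number of free parameters and smaller values are preferred. Which \(n\) to use in a panel (the number of observations \(\sum_{i}T_{i}\) or the number of units \(N\)) is not stated in this section; the sketch assumes the total number of observations. The fitted values below are hypothetical and are not results from the paper.

```python
import math


def aic(loglik, k):
    """Akaike information criterion; k is the number of free parameters."""
    return -2.0 * loglik + 2.0 * k


def bic(loglik, k, n_obs):
    """Bayesian information criterion; n_obs is assumed to be the total number of observations."""
    return -2.0 * loglik + k * math.log(n_obs)


def rank_models(fits, n_obs):
    """Model names sorted by BIC (best first); fits maps a name to (loglik, k)."""
    return sorted(fits, key=lambda name: bic(fits[name][0], fits[name][1], n_obs))


# Hypothetical fitted values, not results from the paper
fits = {"A": (-1200.0, 12), "B": (-1150.0, 20), "C": (-1180.0, 14)}
order = rank_models(fits, n_obs=5000)
```

Because the criteria only need each model's maximized log-likelihood and parameter count, they apply equally to nested and non-nested pairs.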
Nevertheless, even though it is clear that a two-component distribution fits the data better, it is not obvious which component should be labeled as financially constrained. So, to interpret the results in Table 3, I first need some criteria to label the components. For this purpose I adopt identification criteria from the literature. Financially unconstrained firms are expected to be big, old and not highly leveraged; they are expected to pay dividends regularly and to have a bond rating; and they may face lower growth opportunities and may be less interested in carrying large cash balances. The justification of these criteria is reviewed in Hovakimian and Titman (2006). Applying these criteria to models \(\mathcal{M}_{2}\) and \(\mathcal{M}_{3}\), one can identify the financially constrained component as the one with the largest standard deviation. The same is true for model \(\mathcal{M}_{6}\).
The results then suggest that investment is more responsive to cash flow and asset sales in the financially constrained state, as reported by Hovakimian and Titman (2006). Since the standard deviation of investment is much higher for the financially constrained group (0.28 versus 0.08), one can conclude that the change in fixed capital investment is much more volatile for firms that spend a long time in the financially constrained state.
The financially unconstrained state appears to be quite persistent. While a firm that is currently constrained is more likely to stay in that state next period, it also has a significant probability (43%) of becoming unconstrained.
Even though the Markov chain was not assumed to be stationary, the estimated transition matrices clearly admit a stationary distribution. The stationary probability vector associated with the second transition matrix is \((p_{1},p_{2})=(0.29,0.71)\), which means that in the long run a higher proportion of the observations (71%) is expected to be classified as unconstrained. This is consistent with the estimated mixing proportions for model \(\mathcal {M}_{1}\). Similar results are obtained for the exogenous HMM with a covariate-dependent transition matrix.
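For a two-state chain the stationary vector has a closed form: solving \(\pi = \pi P\) with \(\pi_1 + \pi_2 = 1\) gives \(\pi_1 p_{12} = \pi_2 p_{21}\), hence \(\pi_1 = p_{21}/(p_{12} + p_{21})\). The sketch below applies this formula and checks that one step of the chain leaves \(\pi\) unchanged. The transition matrix is a hypothetical example, not an estimate from this paper.

```python
def stationary_two_state(P):
    """Stationary distribution of a two-state transition matrix P (rows sum to 1).

    Solves pi = pi P with pi_1 + pi_2 = 1, i.e. pi_1 = p_21 / (p_12 + p_21).
    """
    p12, p21 = P[0][1], P[1][0]
    if p12 + p21 == 0:
        raise ValueError("chain never switches states; stationary distribution not unique")
    pi1 = p21 / (p12 + p21)
    return pi1, 1.0 - pi1

# Hypothetical transition matrix (state 1 = constrained, state 2 = unconstrained);
# illustrative values only.
P = [[0.60, 0.40],
     [0.20, 0.80]]
pi = stationary_two_state(P)
# Stationarity check: one step of the chain, pi P, should reproduce pi.
one_step = [pi[0] * P[0][j] + pi[1] * P[1][j] for j in range(2)]
print(pi, one_step)
```

Here \(\pi = (1/3, 2/3)\): even though the hypothetical constrained state is fairly persistent, the chain spends two thirds of the time in the unconstrained state because it leaves that state only half as often.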
Conclusion
I have presented nine alternative mixture models that may be useful for making inferences from economic panel data sets. I have also reviewed the maximum likelihood estimation of six of them via the well-known Expectation-Maximization (EM) algorithm, and discussed a series of tests that can be used to identify which of the proposed models fits the data best.
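To make the EM idea concrete, the sketch below runs EM on a deliberately simplified case that is not one of the paper's models as specified: a pooled two-component mixture of normal linear regressions with a single regressor, ignoring time dependence, random effects and endogeneity. The E-step computes each observation's posterior probability of belonging to component 1; the M-step updates the mixing weight and fits one weighted least-squares regression per component. The simulated data and starting values are hypothetical.

```python
import math
import random

def log_normal_pdf(y, mean, sd):
    """Log density of N(mean, sd^2) evaluated at y."""
    return -0.5 * ((y - mean) / sd) ** 2 - math.log(sd * math.sqrt(2.0 * math.pi))

def weighted_ols(x, y, w):
    """Weighted least squares fit of y = a + b*x; returns (a, b, sd)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    a = ybar - b * xbar
    var = sum(wi * (yi - a - b * xi) ** 2 for wi, xi, yi in zip(w, x, y)) / sw
    return a, b, math.sqrt(var)

def em_two_component(x, y, params, pi1, max_iter=500, tol=1e-8):
    """EM for a two-component mixture of normal linear regressions.

    params = [(a1, b1, sd1), (a2, b2, sd2)] are starting values and pi1 is
    the starting mixing weight of component 1.
    """
    loglik_old = -math.inf
    for _ in range(max_iter):
        # E-step: log joint densities, then posterior membership probabilities tau.
        tau, loglik = [], 0.0
        for xi, yi in zip(x, y):
            l1 = math.log(pi1) + log_normal_pdf(yi, params[0][0] + params[0][1] * xi, params[0][2])
            l2 = math.log(1.0 - pi1) + log_normal_pdf(yi, params[1][0] + params[1][1] * xi, params[1][2])
            m = max(l1, l2)
            lse = m + math.log(math.exp(l1 - m) + math.exp(l2 - m))  # log-sum-exp
            tau.append(math.exp(l1 - lse))
            loglik += lse
        # M-step: closed-form weight update and one weighted regression per component.
        pi1 = sum(tau) / len(tau)
        params = [weighted_ols(x, y, tau), weighted_ols(x, y, [1.0 - t for t in tau])]
        # loglik is evaluated at the parameters that entered this iteration.
        if loglik - loglik_old < tol:
            break
        loglik_old = loglik
    return params, pi1, loglik

# Simulated pooled data (hypothetical): 40% of observations follow y = 2x + noise (sd 0.3),
# the rest follow y = 1 - x + noise (sd 0.1).
random.seed(0)
x, y = [], []
for _ in range(2000):
    xi = random.uniform(0.0, 2.0)
    if random.random() < 0.4:
        yi = 2.0 * xi + random.gauss(0.0, 0.3)
    else:
        yi = 1.0 - xi + random.gauss(0.0, 0.1)
    x.append(xi)
    y.append(yi)

start = [(0.0, 1.5, 0.5), (0.8, -0.5, 0.5)]
params, pi1, loglik = em_two_component(x, y, start, 0.5)
print(params, pi1)
```

Each iteration cannot decrease the log-likelihood, which is the convergence property that makes EM attractive when good starting values for a Newton-type method are hard to find; the hidden Markov versions replace the E-step with forward-backward recursions.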
Estimation of the hidden Markov models with random effects may be time-consuming because, for each unit, the likelihood contribution at each point in time depends on all of that unit's previous observations; moreover, this likelihood has to be recomputed for each vector of quadrature abscissae or each vector of draws of the random effects. Programming the log-likelihood in FORTRAN or C rather than MATLAB or R may reduce the computation time significantly, but bootstrap tests may still take a long time. Nevertheless, the models considered in this paper are very flexible and can account for several potential sources of heterogeneity in panel data.
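The cost described above can be seen in a sketch of the likelihood for one unit. The scaled forward algorithm below carries a vector of filtered state probabilities through time, so each period's contribution depends on the unit's whole past, and the simulated version reruns the entire recursion for every draw of the random effect. The Gaussian emissions, the additive random intercept, and the use of simulation draws (rather than quadrature abscissae) are assumptions made for illustration; the parameter values are hypothetical.

```python
import math
import random

def log_normal_pdf(y, mean, sd):
    """Log density of N(mean, sd^2) at y."""
    return -0.5 * ((y - mean) / sd) ** 2 - math.log(sd * math.sqrt(2.0 * math.pi))

def unit_loglik(y, init, P, means, sds, shift=0.0):
    """Log-likelihood of one unit's series under a Gaussian HMM (scaled forward algorithm).

    init[k] = P(S_1 = k), P[j][k] = P(S_t = k | S_{t-1} = j), and state k emits
    N(means[k] + shift, sds[k]^2). The forward vector alpha carries information from
    all earlier periods, so every period's contribution depends on the unit's past.
    """
    K = len(init)
    loglik = 0.0
    alpha = None
    for t, yt in enumerate(y):
        dens = [math.exp(log_normal_pdf(yt, means[k] + shift, sds[k])) for k in range(K)]
        if t == 0:
            pred = list(init)
        else:
            # One-step-ahead state probabilities given y_1..y_{t-1}.
            pred = [sum(alpha[j] * P[j][k] for j in range(K)) for k in range(K)]
        alpha = [pred[k] * dens[k] for k in range(K)]
        c = sum(alpha)  # c = f(y_t | y_1, ..., y_{t-1})
        loglik += math.log(c)
        alpha = [a / c for a in alpha]  # rescale to avoid underflow
    return loglik

def simulated_unit_loglik(y, init, P, means, sds, re_sd, draws):
    """Average the unit's likelihood over draws z of a random intercept b = re_sd * z."""
    logs = [unit_loglik(y, init, P, means, sds, shift=re_sd * z) for z in draws]
    m = max(logs)
    return m + math.log(sum(math.exp(l - m) for l in logs) / len(logs))

# Hypothetical two-state example (illustrative values only).
init = [0.5, 0.5]
P = [[0.7, 0.3], [0.2, 0.8]]
means, sds = [0.0, 0.1], [0.3, 0.1]
y = [0.1, -0.2, 0.5, 0.3]
random.seed(1)
draws = [random.gauss(0.0, 1.0) for _ in range(200)]
print(unit_loglik(y, init, P, means, sds), simulated_unit_loglik(y, init, P, means, sds, 0.05, draws))
```

With \(R\) draws, \(N\) units, \(T\) periods and \(K\) states, one likelihood evaluation costs on the order of \(R \cdot N \cdot T \cdot K^2\) operations, and a bootstrap test repeats the whole maximization for every bootstrap sample.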
Finally, as an application, I used the models without random effects to study how the investment behavior of firms differs when they are financially constrained and when they are not, and to learn about the process that governs the evolution of a firm’s financial status over time.
Declarations
Acknowledgement
I am grateful to Professor Pravin Trivedi for all the help and support he provided in completing this project. I also thank two anonymous referees for their suggestions.
Funding
I did not receive any financial support in writing this paper.
Competing interests
I declare that I have no competing interests in publishing this paper.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
References
Adda, J, Cooper, R: Dynamic Economics: Quantitative Methods and Applications. The MIT Press, Cambridge (2003).
Almeida, H, Campello, M: Financial constraints, asset tangibility and corporate investment. Rev. Financ. Stud. 20(5), 1429–1460 (2007).
Altman, RM: Mixed hidden Markov models: An extension of the hidden Markov model to the longitudinal data setting. J. Am. Stat. Assoc. 102(477), 201–210 (2007).
Asea, PK, Blomberg, B: Lending cycles. J. Econ. 83, 89–128 (1998).
Cameron, AC, Trivedi, P: Microeconometrics: Methods and Applications. Cambridge University Press, Cambridge (2005).
Cappé, O, Moulines, E, Rydén, T: Inference in Hidden Markov Models. Springer, New York (2005).
Chernoff, H, Lander, E: Asymptotic distribution of the likelihood ratio test that a mixture of two binomials is a single binomial. J. Stat. Plan. Infer. 43, 19–40 (1995).
Choi, KC, Zhou, X: Large sample properties of mixture models with covariates for competing risks. J. Multivar. Anal. 82, 331–366 (2002).
Davison, AC, Hinkley, DV: Bootstrap Methods and Their Application. Cambridge University Press, Cambridge (1997).
Deb, P, Trivedi, P: Finite mixture for panels with fixed effects. J. Econ. Methods. 2, 31–35 (2013).
Dempster, AP, Laird, NM, Rubin, DB: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. 39(1), 1–38 (1977).
Douc, R, Matias, C: Asymptotics of the maximum likelihood estimator for general hidden Markov models. Bernoulli. 3, 381–420 (2001).
Fink, GA: Markov Models for Pattern Recognition: From Theory to Applications. 1st ed. Springer-Verlag, New York (2007).
Frühwirth-Schnatter, S: Finite Mixture and Markov Switching Models. Springer-Verlag, New York (2006).
Goffinet, B, Loisel, P: Testing in normal mixture models when the proportions are known. Biometrika. 79, 842–846 (1992).
González, J, Tuerlinckx, F, Boeck, PD, Cools, R: Numerical integration in logistic-normal models. Comput. Stat. Data Anal. 51, 1525–1548 (2006).
Hamilton, JD: Rational-expectations econometric analysis of changes in regime: An investigation of the term structure of interest rates. J. Econ. Dyn. Control. 12, 385–423 (1988).
Hayashi, F: Tobin’s marginal q and average q: A neoclassical interpretation. Econometrica. 50, 215–224 (1982).
Hovakimian, G, Titman, S: Corporate investment with financial constraints: Sensitivity of investment to funds from voluntary asset sales. J. Money Credit Bank. 38(2), 357–374 (2006).
Jäckel, P: A note on multivariate Gauss-Hermite quadrature (2005). http://www.btinternet.com/pjaeckel/ANoteOnMultivariateGaussHermiteQuadrature.pdf. Accessed 14 Nov 2009.
Kapetanios, G: A bootstrap procedure for panel data sets with many cross-sectional units. Econ. J. 11, 377–395 (2008).
Kiefer, NM: Discrete parameter variation: Efficient estimation of a switching regression model. Econometrica. 46, 427–434 (1978).
Kim, CJ, Piger, J, Startz, R: Estimation of Markov regime-switching regression models with endogenous switching. J. Econ. 143, 263–273 (2008).
Lee, Y, Nelder, JA: Hierarchical generalized linear models. J. R. Stat. Soc. 58, 619–678 (1996).
Louis, TA: Finding the observed information matrix when using the EM algorithm. J. R. Stat. Soc. 44, 226–233 (1982).
MacDonald, IL, Zucchini, W: Hidden Markov and Other Models for Discrete-valued Time Series. Chapman & Hall, Boca Raton (1997).
MacQueen, JB: Some methods for classification and analysis of multivariate observations. In: Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability, pp. 281–297. University of California Press, Berkeley (1967).
Maddala, GS: Limited-Dependent and Qualitative Variables in Econometrics. Cambridge University Press, Cambridge (1999).
Maruotti, A: Hidden Markov Models for Longitudinal Data. PhD thesis. Università degli Studi di Roma (2007).
McLachlan, G, Krishnan, T: The EM Algorithm and Extensions. Wiley-Interscience, New York (1997).
McLachlan, G, Peel, D: Finite Mixture Models. 1st ed. Wiley-Interscience, New York (2000).
Mundlak, Y: On the pooling of time series and cross section data. Econometrica. 46, 69–85 (1978).
Rabiner, LR: A tutorial on hidden Markov models and selected applications in speech recognition. In: Proceedings of the IEEE, vol. 77, pp. 257–286 (1989).
Stram, DO, Lee, JW: Variance components testing in the longitudinal mixed effects model. Biometrics. 50, 1171–1177 (1994).
Trivedi, P, Hyppolite, J: Alternative approaches for econometric analysis of panel count data using dynamic latent class models (with application to doctor visits data). Health Economics. 21, 101–128 (2012).
Wooldridge, JM: Econometric Analysis of Cross Section and Panel Data. The MIT Press, Cambridge (2002).
Xiaoqiang, H, Schiantarelli, F: Investment and capital markets imperfections: A switching regression approach using U.S. firm panel data. Rev. Econ. Stat. 80(3), 466–479 (1998).
Zhang, S, Jin, J: Computation of Special Functions. Wiley-Interscience, New York (1996).