Multivariate distributions of correlated binary variables generated by pair-copulas
Journal of Statistical Distributions and Applications volume 8, Article number: 4 (2021)
Abstract
Correlated binary data are prevalent in a wide range of scientific disciplines, including healthcare and medicine. The generalized estimating equations (GEEs) and the multivariate probit (MP) model are two of the popular methods for analyzing such data. However, both methods have some significant drawbacks. The GEEs may not have an underlying likelihood and the MP model may fail to generate a multivariate binary distribution with specified marginals and bivariate correlations. In this paper, we study multivariate binary distributions that are based on D-vine pair-copula models as a superior alternative to these methods. We elucidate the construction of these binary distributions in two and three dimensions with numerical examples. For higher dimensions, we provide a method of constructing a multidimensional binary distribution with specified marginals and equicorrelated correlation matrix. We present a real-life data analysis to illustrate the application of our results.
Introduction
In clinical trials and research studies in health care and medicine, the endpoint of the observed data most often consists of correlated binary observations. The generalized estimating equation (GEE), introduced by Liang and Zeger (1986), has been the common statistical tool for analyzing such data. However, this method has several drawbacks. One drawback is that it uses an ambiguously defined working correlation to model the dependence in the binary observations, which could lead to misleading conclusions (Sabo and Chaganty, 2010). Another drawback is that it is a non-likelihood approach, in the sense that it does not have an underlying joint distribution for the correlated binary observations. Other alternatives to GEEs for the analysis of correlated binary data are Markov chains (MCs) and multivariate probit (MP) models. A contrasting study of the first-order MC model and the MP model was presented by Yang and Chaganty (2014). They showed that both models are asymptotically efficient, and discussed situations where one is preferable over the other.
In recent years, due to their success in other disciplines, copulas have been used to develop likelihood-based methods as another alternative to GEEs. Some researchers have combined copulas with MC models. Escarela et al. (2009) used a Gaussian copula to construct conditional probabilities in Markov chain models in the context of longitudinal binary data. The copula-based bivariate probit models were generalized by Winkelmann (2012), replacing the Gaussian distribution by Frank and Clayton copulas. Radice et al. (2016) introduced nonlinear regression models, where non-Gaussian copulas were used to deal with the dependence between binary responses. Smith et al. (2010) showed that longitudinal continuous data can be modeled by a D-vine pair-copula, and later extended their work to the discrete case using a Bayesian framework in Smith and Khaled (2012). A Gaussian copula model for integer-valued ARMA structured time series data with or without covariates was developed by Lennon (2016). Panagiotelis et al. (2017) introduced two algorithms for optimizing vine structure and pair-copula selection for discrete regular vine copulas. One of these algorithms uses a modified Akaike information criterion and the other uses predictive scores with cross-validation.
In this paper we study multivariate binary distributions generated by D-vine pair-copula models. These models are relatively easy to implement since they use only bivariate copulas, and flexible because they allow different types of bivariate copulas to model different types of dependence in the conditional distributions. We will see that the D-vine pair-copula model has some advantages over the MP model.
The organization of this paper is as follows. We first describe the construction of bivariate and trivariate binary distributions using bivariate Gaussian, Clayton, Frank, and Gumbel copulas in “Construction of vine pair-copula binary distributions” section. We discuss comparisons between pair-copula models and the multivariate probit (MP) model in “Comparison of pair-copula and MP models” section, together with a numerical example where the vine pair-copula model overcomes the difficulties associated with the MP model. In “Extensions to four and higher dimensions” section we discuss extensions to four and higher dimensions and present a method of constructing a multivariate binary distribution with specified marginals and equicorrelated structure. In “Parameter estimation” section, we discuss parameter estimation by maximum likelihood for grouped data. “Data analysis” section contains an analysis of a real-life data set. We end the paper with some discussion in “Discussion” section.
Construction of vine pair-copula binary distributions
In this section, we illustrate methods of constructing multivariate binary distributions with specified correlations using vine pair-copula methods. The major advantage of these methods is that the multivariate distribution can be constructed using only bivariate copulas. The method is computationally feasible, flexible, and can accommodate various types of dependence because the bivariate copula need not be the same for various bivariate marginal and conditional distributions. We start with the simplest cases, the bivariate and trivariate distributions, and then show how they can be extended to higher dimensions, focusing on the special correlation structures useful in longitudinal and clustered binary data analysis.
Bivariate binary distributions
Consider first the case of two binary variables. Let Y=(Y_{1}, Y_{2}), where the subscripts 1 and 2 may indicate two sequential time points. Assume that E(Y_{i})=p_{i} for i=1,2. According to the theorem of Sklar (1959), the joint CDF of Y using a copula function C is given by F(y_{1},y_{2})=C(F_{1}(y_{1}),F_{2}(y_{2})), where F_{1} and F_{2} are CDFs of the univariate binary distributions of Y_{1} and Y_{2} respectively. Following Panagiotelis et al. (2012), we can recover the joint probability mass function (PMF) of Y from the CDF as
\[P(Y_{1}=y_{1},Y_{2}=y_{2})=C(F_{1}(y_{1}),F_{2}(y_{2}))-C(F_{1}(y_{1}-1),F_{2}(y_{2}))-C(F_{1}(y_{1}),F_{2}(y_{2}-1))+C(F_{1}(y_{1}-1),F_{2}(y_{2}-1)), \tag{1}\]
where F_{i}(−1)=0.
The C(u_{1},u_{2}; θ) could be any copula C_{i},1≤i≤5, given in Table 1. The copula parameter θ is the correlation coefficient γ for the Gaussian copula, and it is α for the Clayton, Frank, and Gumbel copulas.
Selecting y_{i}=0,1 and noting that F_{i}(0)=P(Y_{i}=0)=1−p_{i}=q_{i} for i=1,2, Eq. (1) simplifies to the probabilities given in Table 2.
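The four cell probabilities that follow from the copula value C(q_{1},q_{2}; θ) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, the Frank and Clayton copulas are written in their usual one-parameter forms (assumed here to match Table 1), and the parameter value α=2 is chosen only for illustration.

```python
import math

def frank_copula(u, v, alpha):
    """Frank copula, usual one-parameter form (alpha != 0); assumed parameterization."""
    num = math.expm1(-alpha * u) * math.expm1(-alpha * v)
    return -math.log1p(num / math.expm1(-alpha)) / alpha

def clayton_copula(u, v, alpha):
    """Clayton copula, usual one-parameter form (alpha > 0); assumed parameterization."""
    return (u ** -alpha + v ** -alpha - 1.0) ** (-1.0 / alpha)

def bivariate_binary_pmf(p1, p2, copula, theta):
    """Return {(y1, y2): probability} for Bernoulli margins with means p1, p2."""
    q1, q2 = 1.0 - p1, 1.0 - p2
    c = copula(q1, q2, theta)  # P(Y1=0, Y2=0) = C(q1, q2)
    return {
        (0, 0): c,
        (0, 1): q1 - c,
        (1, 0): q2 - c,
        (1, 1): 1.0 - q1 - q2 + c,
    }

def binary_correlation(pmf, p1, p2):
    """Pearson correlation of the two binary variables."""
    return (pmf[(1, 1)] - p1 * p2) / math.sqrt(p1 * (1 - p1) * p2 * (1 - p2))

# Illustrative margins and parameter values (hypothetical choices).
pmf_frank = bivariate_binary_pmf(0.8, 0.7, frank_copula, 2.0)
pmf_clayton = bivariate_binary_pmf(0.8, 0.7, clayton_copula, 2.0)
for name, pmf in (("Frank", pmf_frank), ("Clayton", pmf_clayton)):
    cells = {k: round(v, 4) for k, v in pmf.items()}
    print(name, cells, "corr =", round(binary_correlation(pmf, 0.8, 0.7), 4))
```

Whatever copula is plugged in, the margins are preserved by construction; only the dependence between the two variables changes.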
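Given ρ=Corr(Y_{1},Y_{2}), if we use the Gaussian copula C_{1} in Table 2, the parameter θ=γ can be obtained by solving the equation
\[\rho=\frac{C_{1}(p_{1},p_{2};\gamma)-p_{1}p_{2}}{\sqrt{p_{1}q_{1}p_{2}q_{2}}}, \tag{2}\]
since P(Y_{1}=1,Y_{2}=1)=1−q_{1}−q_{2}+C_{1}(q_{1},q_{2}; γ)=C_{1}(p_{1}, p_{2}; γ).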
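As a rough sketch of how γ might be found numerically, the code below evaluates the bivariate Gaussian copula with a one-dimensional integral and then solves for γ by bisection, using the fact that the binary correlation (P(Y_{1}=1,Y_{2}=1)−p_{1}p_{2})/√(p_{1}q_{1}p_{2}q_{2}) increases with γ. The integration scheme (Simpson's rule), the bisection, and all function names are assumptions made for this illustration; the paper does not say how the equation was solved.

```python
import math
from statistics import NormalDist

_N = NormalDist()

def gaussian_copula(u, v, gamma, n=400):
    """Bivariate Gaussian copula C(u, v; gamma) for -1 < gamma < 1.

    Uses Phi2(a, b; g) = Phi(a) Phi(b)
        + (1 / 2pi) * int_0^{asin g} exp(-(a^2 - 2ab sin t + b^2) / (2 cos^2 t)) dt,
    with the integral evaluated by composite Simpson's rule on n (even) intervals.
    """
    a, b = _N.inv_cdf(u), _N.inv_cdf(v)
    upper = math.asin(gamma)
    h = upper / n

    def f(t):
        return math.exp(-(a * a - 2 * a * b * math.sin(t) + b * b)
                        / (2 * math.cos(t) ** 2))

    s = f(0.0) + f(upper)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(i * h)
    return u * v + s * h / (6 * math.pi)

def binary_corr(gamma, p1, p2):
    """Correlation of the two binary variables implied by latent parameter gamma."""
    q1, q2 = 1 - p1, 1 - p2
    p11 = 1 - q1 - q2 + gaussian_copula(q1, q2, gamma)
    return (p11 - p1 * p2) / math.sqrt(p1 * q1 * p2 * q2)

def solve_gamma(rho, p1, p2, tol=1e-8):
    """Bisection for gamma; binary_corr is increasing in gamma."""
    lo, hi = -0.999999, 0.999999
    if not binary_corr(lo, p1, p2) <= rho <= binary_corr(hi, p1, p2):
        raise ValueError("rho is outside the range attainable for these margins")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if binary_corr(mid, p1, p2) < rho:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

gamma12 = solve_gamma(0.5, 0.8, 0.7)
print(round(gamma12, 3))
```

With p_{1}=0.8, p_{2}=0.7 and ρ=0.5, the solver returns a value close to the γ_{12}=0.752 used in case 5 of the trivariate examples.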
Trivariate binary distributions
In this section we extend the pair-copula method to construct three-dimensional binary distributions. Let Y=(Y_{1}, Y_{2}, Y_{3}) be a vector of three correlated binary random variables. Note that
\[P(Y_{1}=y_{1},Y_{2}=y_{2},Y_{3}=y_{3})=P(Y_{2}=y_{2})\,P(Y_{1}=y_{1},Y_{3}=y_{3}\mid Y_{2}=y_{2}). \tag{3}\]
The above equation shows that the three-dimensional distribution can be obtained by constructing the bivariate conditional distribution of (Y_{1}, Y_{3}) given Y_{2}=y_{2}. To this end we introduce some notation. We first construct bivariate distributions for (Y_{1}, Y_{2}) and (Y_{2}, Y_{3}) selecting bivariate copulas C_{12}(u_{1}, u_{2}) and C_{23}(u_{1}, u_{2}) from Table 2. For notational convenience we omit the copula parameter here and in some formulas later.
Let \(q_{10}=P(Y_{1}=0\mid Y_{2}=0)=\frac {C_{12}(q_{1}, q_{2})}{q_{2}}\) and \(p_{10}=1-q_{10}=1-\frac {C_{12}(q_{1}, q_{2})}{q_{2}}\). Thus Y_{1} | Y_{2}=0 is distributed as Bernoulli with mean p_{10}. Similarly, Y_{3} | Y_{2}=0 is distributed as Bernoulli with mean p_{30}, where \(p_{30}=1-\frac {C_{23}(q_{2}, q_{3})}{q_{2}}\). We also have that Y_{1} | Y_{2}=1 is Bernoulli with mean \(p_{11}=1-\frac {q_{1}-C_{12}(q_{1}, q_{2})}{p_{2}}\), and Y_{3} | Y_{2}=1 is Bernoulli with mean \(p_{31}=1-\frac {q_{3}-C_{23}(q_{2}, q_{3})}{p_{2}}\).
Table 3 shows the conditional distributions of (Y_{1}, Y_{3}) | Y_{2}=0 and (Y_{1}, Y_{3}) | Y_{2}=1. Finally, from Eq. (3) and using the conditional distributions we can get the joint trivariate PMF as given in Table 4.
We give six numerical examples to see that different copulas and parameter values give rise to different trivariate binary distributions. All these six distributions have the same marginal means p_{1}=0.8,p_{2}=0.7 and p_{3}=0.6. The choice of the copulas and parameter values are summarized in Table 5 for these six cases.
We give details of the calculations only for case 5; the others are done similarly. In this case we start with Table 2 using the Gaussian copula with γ_{12}=0.752, γ_{23}=0.607, and p=(0.8, 0.7, 0.6). The resulting PMFs of the bivariate binary variables are in Table 6.
Now, to construct the conditional PMFs of the bivariate variables we need the marginal parameters of the conditional Bernoulli variables Y_{1} | Y_{2}=y_{2} and Y_{3} | Y_{2}=y_{2} for y_{2}=0,1. These are
\[q_{10}=\frac{C_{12}(q_{1},q_{2})}{q_{2}}=0.5057,\quad q_{30}=\frac{C_{23}(q_{2},q_{3})}{q_{2}}=0.6990,\quad q_{11}=\frac{q_{1}-C_{12}(q_{1},q_{2})}{p_{2}}=0.0690,\quad q_{31}=\frac{q_{3}-C_{23}(q_{2},q_{3})}{p_{2}}=0.2719.\]
Then, p_{10}=1−0.5057=0.4943, p_{30}=1−0.6990=0.3010, p_{11}=1−0.0690=0.931 and p_{31}=1−0.2719=0.7281. The Frank copula is used for the conditional distributions, with parameters \(\alpha _{13\mid Y_{2}=0}=0.95, \alpha _{13\mid Y_{2}=1}=0.85\). The PMFs of the conditional bivariate binary variables are calculated according to Table 3, resulting in the values given in Table 7.
Since P(Y_{1}=y_{1},Y_{2}=y_{2},Y_{3}=y_{3})=P(Y_{2}=y_{2})×P(Y_{1}=y_{1},Y_{3}=y_{3} | Y_{2}=y_{2}), the last step is to multiply the values in Table 7 by P(Y_{2}=y_{2}) to get the joint trivariate probability. For example, P(Y_{1}=0,Y_{2}=0,Y_{3}=0)=0.3782×0.3=0.1135 and P(Y_{1}=1, Y_{2}=1, Y_{3}=0)=0.2475×0.7=0.1732. The three-dimensional joint binary distributions for the six cases are given in Table 8.
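The case 5 calculation can be retraced with the following sketch, which uses Gaussian copulas for the pairs (1,2) and (2,3) and Frank copulas for (Y_{1}, Y_{3}) given Y_{2}. The function names, the numerical integration used for the Gaussian copula, and the usual one-parameter form of the Frank copula are assumptions of this illustration rather than details taken from the paper.

```python
import math
from statistics import NormalDist

_N = NormalDist()

def gaussian_copula(u, v, gamma, n=400):
    """Bivariate Gaussian copula via a one-dimensional Simpson integral."""
    a, b = _N.inv_cdf(u), _N.inv_cdf(v)
    upper = math.asin(gamma)
    h = upper / n

    def f(t):
        return math.exp(-(a * a - 2 * a * b * math.sin(t) + b * b)
                        / (2 * math.cos(t) ** 2))

    s = f(0.0) + f(upper) + sum((4 if i % 2 else 2) * f(i * h) for i in range(1, n))
    return u * v + s * h / (6 * math.pi)

def frank_copula(u, v, alpha):
    """Frank copula, usual one-parameter form (assumed parameterization)."""
    num = math.expm1(-alpha * u) * math.expm1(-alpha * v)
    return -math.log1p(num / math.expm1(-alpha)) / alpha

def two_by_two(qa, qb, c):
    """PMF of two binary variables with P(first=0)=qa, P(second=0)=qb, P(0,0)=c."""
    return {(0, 0): c, (0, 1): qa - c, (1, 0): qb - c, (1, 1): 1 - qa - qb + c}

def trivariate_pmf(p, c12, c23, c13_given):
    """c12, c23: functions (u, v) -> copula value; c13_given[y2]: copula used when Y2 = y2."""
    q1, q2, q3 = (1 - x for x in p)
    p2 = p[1]
    k12, k23 = c12(q1, q2), c23(q2, q3)
    # P(Y1 = 0 | Y2 = y2) and P(Y3 = 0 | Y2 = y2)
    q1_given = {0: k12 / q2, 1: (q1 - k12) / p2}
    q3_given = {0: k23 / q2, 1: (q3 - k23) / p2}
    pmf = {}
    for y2 in (0, 1):
        a, b = q1_given[y2], q3_given[y2]
        cond = two_by_two(a, b, c13_given[y2](a, b))  # (Y1, Y3) | Y2 = y2
        for (y1, y3), pr in cond.items():
            pmf[(y1, y2, y3)] = (q2 if y2 == 0 else p2) * pr
    return pmf

# Case 5 inputs as stated in the text.
pmf = trivariate_pmf(
    (0.8, 0.7, 0.6),
    lambda u, v: gaussian_copula(u, v, 0.752),
    lambda u, v: gaussian_copula(u, v, 0.607),
    {0: lambda u, v: frank_copula(u, v, 0.95),
     1: lambda u, v: frank_copula(u, v, 0.85)},
)
for cell in sorted(pmf):
    print(cell, round(pmf[cell], 4))
```

Passing the copulas as functions keeps the construction flexible: a different family can be used for each pair and for each value of the conditioning variable without changing `trivariate_pmf`.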
Comparison of pair-copula and MP models
In this section, we will compare the probability mass functions generated by the pair-copula methods and by the multivariate probit model. We will see that even if we use bivariate Gaussian copulas, these two methods yield different probability mass functions. Furthermore, the pair-copula method is successful in cases where the MP model fails to generate a probability mass function with specified univariate marginals and correlations.
Multivariate probit (MP) model
Let Y=(Y_{1},Y_{2},…,Y_{m}) be a vector of binary random variables. The multivariate probit model assumes that associated with the vector Y there is a latent vector Z=(Z_{1},Z_{2},…,Z_{m}), which is distributed as multivariate normal (MVN), such that Y_{t}=1 if Z_{t}>0, and Y_{t}=0 if Z_{t}≤0. Assume Z_{t}=μ_{t}+ε_{t}, where ε=(ε_{1},…,ε_{m}) is MVN (0, R). Then, p_{t}=P(Y_{t}=1)=P(Z_{t}>0)=P(μ_{t}+ε_{t}>0)=Φ(μ_{t}), and q_{t}=(1−p_{t})=Φ(−μ_{t}). The joint PMF of Y=(Y_{1},Y_{2},…,Y_{m}) is given by
where D_{t}=(−∞,μ_{t}) if y_{t}=1, and D_{t}=(μ_{t},∞) if y_{t}=0. For example, for m=3 we have
where Φ_{3}(ε ; R) is the CDF of trivariate standard normal with correlation matrix R.
Distributions generated by pair-copula and MP models
Since the MP model relies on the Gaussian distribution, for a fair comparison we will use Gaussian copulas for the bivariate and conditional distributions in the pair-copula construction. Consider the case of two dimensions. In this case, taking C_{12} as the bivariate Gaussian copula, the PMF as given in Table 2 is P(Y_{1}=0,Y_{2}=0)=C_{12}(q_{1},q_{2};γ)=Φ_{2}(Φ^{−1}(q_{1}),Φ^{−1}(q_{2});γ)=Φ_{2}(−μ_{1},−μ_{2};γ), which is identical to the probability under the MP model. Therefore, the probability distributions are the same for two dimensions. In three dimensions, for the MP model we have
From Table 4, we see that for the pair-copula model
With bivariate Gaussian copulas we have q_{i}=Φ(−μ_{i}) for i=1,2,3 and
where ε=(ε_{1}, ε_{2}, ε_{3}) is distributed as a standard trivariate normal with correlation matrix R=(γ_{ij}). The quantities q_{10} and q_{30} are the same as the corresponding values for the probit model. Taking C_{130}(u_{1},u_{2}) as bivariate Gaussian copula with correlation γ_{130}, Eq. (6) is equivalent to
Clearly the quantity (8) is not equal to (5) since the parameter γ_{130} can be any value in (−1, 1) and need not be related to R. Thus the PMF of the pair-copula model is different from that of the MP model.
An advantage of the pair-copula method over the MP model
In this section, we give an example to show that the pair-copula method is useful for constructing multivariate binary distributions with specified marginals and correlation structure in cases where the multivariate probit model breaks down. Assume the marginal means are given by the vector p=(0.2, 0.3, 0.2). For the equicorrelated structure the feasible range of the correlation parameter ρ is (−0.25, 0.7638); see Theorem 1 in Chaganty and Joe (2006). For the value ρ=0.76, the latent correlation matrix obtained by solving Eq. (2) for all pairs is
which is not positive definite and thus the MP method does not give a PMF for the binary variables. However, the pair-copula method generates a PMF for the binary variables with specified marginal means and equicorrelated structure. The input values needed to calculate the PMF are listed in Table 9.
Proceeding as in “Trivariate binary distributions” section, the resulting three dimensional distribution is given in Table 10. We can check that this distribution has the specified marginal means p_{1}=0.2,p_{2}=0.3, and p_{3}=0.2, and equicorrelated structure with ρ=0.76.
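The positive-definiteness check on the latent correlation matrix can be sketched as follows. For each pair, the latent correlation γ is found by bisection so that the implied binary correlation equals ρ=0.76, and the determinant of the resulting 3×3 matrix is then computed. The numerical scheme and the function names are assumptions of this illustration. The paper reports that the matrix is not positive definite, which would show up here as a negative determinant; because ρ=0.76 lies close to the upper bound 0.7638, the determinant is expected to be small in magnitude and sensitive to numerical accuracy.

```python
import math
from statistics import NormalDist

_N = NormalDist()

def gaussian_copula(u, v, gamma, n=2000):
    """Bivariate Gaussian copula via a one-dimensional Simpson integral."""
    a, b = _N.inv_cdf(u), _N.inv_cdf(v)
    upper = math.asin(gamma)
    h = upper / n

    def f(t):
        return math.exp(-(a * a - 2 * a * b * math.sin(t) + b * b)
                        / (2 * math.cos(t) ** 2))

    s = f(0.0) + f(upper) + sum((4 if i % 2 else 2) * f(i * h) for i in range(1, n))
    return u * v + s * h / (6 * math.pi)

def binary_corr(gamma, p1, p2):
    """Correlation of two binary variables under a Gaussian copula."""
    q1, q2 = 1 - p1, 1 - p2
    p11 = 1 - q1 - q2 + gaussian_copula(q1, q2, gamma)
    return (p11 - p1 * p2) / math.sqrt(p1 * q1 * p2 * q2)

def latent_corr(rho, p1, p2, tol=1e-10):
    """Bisection for the latent correlation that reproduces binary correlation rho."""
    lo, hi = -0.999999, 0.999999
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if binary_corr(mid, p1, p2) < rho:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

p = (0.2, 0.3, 0.2)
rho = 0.76
g12 = latent_corr(rho, p[0], p[1])
g13 = latent_corr(rho, p[0], p[2])
g23 = latent_corr(rho, p[1], p[2])

# Determinant of the 3x3 correlation matrix with off-diagonals g12, g13, g23.
det = 1 + 2 * g12 * g13 * g23 - g12 ** 2 - g13 ** 2 - g23 ** 2
print("latent correlations:", g12, g13, g23)
print("determinant:", det)
```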
Extensions to four and higher dimensions
The pair-copula method for three dimensions described in “Trivariate binary distributions” section can be extended to construct four- or higher-dimensional multivariate binary distributions. The foundations of these higher-dimensional extensions have been laid out in Joe (1996; 1997). In a pioneering work, Bedford and Cooke (2002) showed how to use graphical models consisting of vines with trees and edges. The edges in a given tree become the nodes of the next tree. The vine structures not only help in enumerating and organizing numerous decompositions of a multivariate distribution but also facilitate models for different types of dependence for the marginal and conditional distributions. In recent years several articles have been published in the literature on vine pair-copula models; see Kurowicka and Joe (2011). The papers by Min and Czado (2010), Gruber and Czado (2015), and Dalla Valle et al. (2018) discuss Bayesian inference for these vine pair-copula models. The lecture notes by Czado (2019) and the article by Brechmann and Schepsmeier (2013) discuss the practical implementation of vine copulas using the R software. The two most popular vines are the canonical C-vine and the drawable D-vine; see Czado (2019) and Joe (2014). An application of the C-vine for analyzing familial data is in Deng and Chaganty (2021). In this paper, we focus on the D-vine, which is a natural candidate for analyzing longitudinal data consisting of an ordered sequence of variables.
Figure 1 shows the nested tree structure of the D-vine for m variables. There are (m−1) trees and for the ith tree there are m−i+1 nodes, represented by rectangular boxes. In the case of m=4, the D-vine consists of 3 trees. For the pair-copula construction we will need bivariate copulas for the pairs (12), (23), and (34) in tree 1. Since we are dealing with binary variables, for tree 2 we will need two bivariate copulas for constructing the conditional distribution of (13|2) and another two for constructing (24|3). The final tree requires the construction of the conditional distribution of (14|23), which in turn requires four bivariate copulas for the four possible values of the conditioning variables 2 and 3. The joint PMF in four dimensions is given by
The probability P(Y_{1}=y_{1},Y_{4}=y_{4} | Y_{2}=y_{2},Y_{3}=y_{3}) requires p_{100}=P(Y_{1}=1 | Y_{2}=0,Y_{3}=0), p_{101}=P(Y_{1}=1 | Y_{2}=0,Y_{3}=1),…, p_{411}=P(Y_{4}=1 | Y_{2}=1,Y_{3}=1), which can be obtained from the bivariate and trivariate distributions constructed as in “Construction of vine pair-copula binary distributions” section.
Binary distributions with structured correlation matrices
To allow parsimonious modeling, multivariate binary distributions with structured correlation matrices are normally employed in the analysis of longitudinal or clustered binary data. The two most popular structured correlation matrices are autoregressive of order one (AR(1)) and equicorrelated. Yang and Chaganty (2014) have outlined a method of constructing a multivariate binary distribution with AR(1) structure, and here we focus on the equicorrelated structure.
In “An advantage of the pair-copula method over the MP model” section, we gave an example of a three-dimensional binary distribution with specified marginals and equicorrelated structure. The pair-copula method with bivariate Gaussian copulas can be used to generate higher-dimensional multivariate binary distributions with specified marginals and equicorrelated structure. This requires specification of the partial correlations Corr(Y_{1},Y_{3} | Y_{2}), Corr(Y_{2},Y_{4} | Y_{3}),..., Corr(Y_{m−2},Y_{m} | Y_{m−1}) for tree 2; Corr(Y_{1},Y_{4} | Y_{2},Y_{3}),..., Corr(Y_{m−3},Y_{m} | Y_{m−2},Y_{m−1}) for tree 3; ...; Corr(Y_{1},Y_{m} | Y_{2},...,Y_{m−1}) for tree m−1. For binary variables, these partial correlations depend on the values of the conditioning variables. To simplify matters we set Corr(Y_{i},Y_{i+k} | Y_{i+1},...,Y_{i+k−1})=ρ/(1+(k−1)ρ). The motivation for this assumption comes from the result that for the equicorrelated structure, the partial correlation ρ_{i,i+k|i+1,…,i+k−1} equals ρ/(1+(k−1)ρ), as shown in the Appendix. The corresponding parameter γ_{i,i+k|i+1,…,i+k−1} of the bivariate Gaussian copula can be obtained by solving Eq. (2) using the two conditional probabilities p_{i|i+1,…,i+k−1} and p_{i+k|i+1,…,i+k−1}. In the next section we give a numerical example to illustrate this method for dimension m=4.
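The closed form ρ/(1+(k−1)ρ) can be cross-checked against the standard recursion for partial correlations. The sketch below assumes an equicorrelated matrix, in which every partial correlation given a conditioning set of a fixed size takes the same value, so the recursion involves a single number per step. The function names are hypothetical.

```python
import math

def partial_corr_by_recursion(rho, n_cond):
    """Partial correlation of two variables given n_cond others, equicorrelated case.

    Applies r_{ij|S+k} = (r_{ij|S} - r_{ik|S} r_{jk|S})
                         / sqrt((1 - r_{ik|S}^2)(1 - r_{jk|S}^2)),
    where all three inputs equal the same value r by exchangeability.
    """
    r = rho
    for _ in range(n_cond):
        r = (r - r * r) / math.sqrt((1 - r * r) * (1 - r * r))
    return r

def partial_corr_closed_form(rho, k):
    """Value used in the text for variables k apart in the D-vine (k - 1 conditioned)."""
    return rho / (1 + (k - 1) * rho)

rho = 0.4
for k in range(1, 5):
    print(k, partial_corr_by_recursion(rho, k - 1), partial_corr_closed_form(rho, k))
```

For ρ=0.4 both routes give 0.4/1.4 for k=2 and 0.4/1.8 for k=3, the two conditioning depths that arise when m=4.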
Numerical example of equicorrelated binary distribution
Assuming the marginal means are p=(0.26, 0.36, 0.25, 0.24), the feasible range of the correlation parameter ρ is (−0.3244, 0.7492) for the equicorrelated structure. For ρ=0.4, the distribution, calculated using the input values in Table 11, is presented in Table 12.
We can check the marginal means of the distribution in Table 12 are P(Y_{1}=1)=0.26,P(Y_{2}=1)=0.36,P(Y_{3}=1)=0.25,P(Y_{4}=1)=0.24, and further the distribution has an equicorrelated structure with ρ=0.4.
Parameter estimation
In this section, we discuss estimation of the parameters via maximum likelihood estimation (MLE) for the D-vine pair-copula model with bivariate Gaussian distributions. Suppose that there are n independent subjects, and there are m repeated binary observations on each subject. Thus the data consist of binary vectors y_{i}=(y_{i1},y_{i2},⋯,y_{im}) of dimension m. Let p_{j} be the marginal probability of y_{ij}, assumed to be the same for all i. There are 2^{m} possible combinations for y_{i}. For instance, when m=4, we have 16 combinations, that is, y_{i}=(0,0,0,0), or (0,0,0,1), or (0,0,1,0),⋯, or (1,1,1,1). The n observations can be grouped into 2^{m} counts. Assume the number of (0,⋯,0) vectors is n_{1}, the number of (0,⋯,1) is n_{2}, and so on, up to the number of (1,⋯,1), which is \(n_{2^{m}}\). Using this notation, the log-likelihood, ℓ(θ), for the D-vine pair-copula model for a sample of n independent observations is given by
where the parameter θ consists of marginal probabilities and copula parameters that are functions of correlations between the binary variables. For instance, for the two-dimensional example shown in Table 2, the log-likelihood is
where C_{1} is the bivariate Gaussian copula. The maximum likelihood estimates of the parameters are obtained by maximizing (10) using the optimization routine “L-BFGS-B” by Byrd et al. (1995), which allows box constraints. The standard errors of the parameters are obtained from the Hessian matrix at the optimized values using the “Richardson” method of the function “hessian” in the R package “numDeriv” by Gilbert and Varadhan (2012).
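A minimal sketch of the grouped-data log-likelihood in two dimensions is given below. It sums count × log(cell probability) over the four cells of Table 2 under a Gaussian copula. To stay within the Python standard library, this sketch fixes the margins at the sample proportions and maximizes over γ with a golden-section search; this is a simplification swapped in for illustration, not the L-BFGS-B routine the paper uses. The counts are hypothetical and the function names are invented for this example.

```python
import math
from statistics import NormalDist

_N = NormalDist()

def gaussian_copula(u, v, gamma, n=200):
    """Bivariate Gaussian copula via a one-dimensional Simpson integral."""
    a, b = _N.inv_cdf(u), _N.inv_cdf(v)
    upper = math.asin(gamma)
    h = upper / n

    def f(t):
        return math.exp(-(a * a - 2 * a * b * math.sin(t) + b * b)
                        / (2 * math.cos(t) ** 2))

    s = f(0.0) + f(upper) + sum((4 if i % 2 else 2) * f(i * h) for i in range(1, n))
    return u * v + s * h / (6 * math.pi)

def loglik(gamma, q1, q2, counts):
    """counts = (n00, n01, n10, n11) for cells (y1, y2) = (0,0), (0,1), (1,0), (1,1)."""
    c = gaussian_copula(q1, q2, gamma)
    probs = (c, q1 - c, q2 - c, 1 - q1 - q2 + c)
    return sum(n * math.log(pr) for n, pr in zip(counts, probs))

def golden_section_max(f, lo, hi, tol=1e-7):
    """Maximizer of a unimodal function on [lo, hi]."""
    g = (math.sqrt(5) - 1) / 2
    x1, x2 = hi - g * (hi - lo), lo + g * (hi - lo)
    f1, f2 = f(x1), f(x2)
    while hi - lo > tol:
        if f1 < f2:
            lo, x1, f1 = x1, x2, f2
            x2 = lo + g * (hi - lo)
            f2 = f(x2)
        else:
            hi, x2, f2 = x2, x1, f1
            x1 = hi - g * (hi - lo)
            f1 = f(x1)
    return 0.5 * (lo + hi)

counts = (30, 10, 20, 40)              # hypothetical grouped data
total = sum(counts)
q1_hat = (counts[0] + counts[1]) / total   # sample proportion of Y1 = 0
q2_hat = (counts[0] + counts[2]) / total   # sample proportion of Y2 = 0
gamma_hat = golden_section_max(lambda g: loglik(g, q1_hat, q2_hat, counts), -0.95, 0.95)
print("gamma_hat =", round(gamma_hat, 4))
```

In two dimensions the model has as many free parameters as the 2×2 table has free cells, so the fitted cell probabilities should reproduce the observed proportions; in higher dimensions with structured correlations this is no longer the case and all parameters would be optimized jointly.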
Data analysis
Here we present a real-life data analysis to illustrate the application of the D-vine pair-copula with bivariate Gaussian distributions. We also compare the results with the MP model and the model that ignores the correlation between the variables.
Drug response data
These data were first reported by Grizzle et al. (1969). Here 46 subjects were treated with three drugs 1, 2 and 3, and their responses were recorded as 0 for unfavorable or 1 for favorable. For example, (0, 0, 0) stands for unfavorable responses to all three drugs. We assume the three binary responses are equicorrelated with correlation parameter ρ. The maximum likelihood estimates (MLE) of the marginal probabilities p_{1}, p_{2} and p_{3} and of ρ, together with standard errors (SE), are presented in Table 13.
The estimate of ρ is close to zero for both the MP and D-vine pair-copula models. The estimates and standard errors for the D-vine independent copula model are listed in the last two columns of Table 13. The D-vine independent copula model has the minimum AIC and seems to be a good choice for these data.
Discussion
In recent years vine pair-copula models have become popular for analyzing dependent multivariate data. However, understanding and using these models for discrete data, and in particular for binary data, can pose a challenge to the practitioner. In this paper, we have illustrated the pair-copula construction of binary distributions in two and three dimensions in a way that makes it accessible to the practitioner. In three dimensions using bivariate Gaussian copulas, we have shown that the probability mass function generated by the pair-copula differs from the mass function of the multivariate probit (MP) model. We gave a numerical example where the MP model fails but the pair-copula method can still generate a mass function with specified marginals and correlations. For four and higher dimensions we provided a method of constructing a multivariate binary distribution with specified marginals and equicorrelated structure using the D-vine pair-copula method. We discussed the maximum likelihood estimation of the parameters for grouped multivariate binary data and provided a real-life data analysis. Future work involves including covariates in these models.
Appendix
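Consider the equicorrelated structure given by \(R=(1-\rho)I_{m}+\rho \, e_{m}\,e_{m}^{T}\), with parameter ρ. Here I_{m} is the identity matrix of dimension m and e_{m} is an m×1 column vector of ones. From formula (2.19), page 40 in Joe (2014), the partial correlation is given by
\[\rho_{1,m|2,\ldots,m-1}=\frac{\rho-\rho^{2}\,e_{m-2}^{T}R_{11}^{-1}e_{m-2}}{1-\rho^{2}\,e_{m-2}^{T}R_{11}^{-1}e_{m-2}}, \tag{12}\]
where \(R_{11}=(1-\rho)I_{m-2}+\rho \, e_{m-2}\,e_{m-2}^{T}\). Using the formula in Example 4.1 of Chaganty (1997), we have
\[R_{11}^{-1}=\frac{1}{1-\rho}\left[I_{m-2}-\frac{\rho}{1+(m-3)\rho}\,e_{m-2}\,e_{m-2}^{T}\right].\]
Since \(e_{m-2}^{T}\,e_{m-2}=(m-2)\) we get
\[e_{m-2}^{T}R_{11}^{-1}e_{m-2}=\frac{m-2}{1+(m-3)\rho}. \tag{13}\]
Substituting (13) in (12) and simplifying we get
\[\rho_{1,m|2,\ldots,m-1}=\frac{\rho}{1+(m-2)\rho}. \tag{14}\]
The constant (m−2) in the denominator of (14) represents the number of conditioning variables. More generally, for the equicorrelated structure the partial correlation ρ_{i,i+k|i+1,…,i+k−1}=ρ/(1+(k−1)ρ) for any 1≤i≤(m−k), 1≤k≤(m−1).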
Availability of data and materials
Interested readers can contact the first author.
Abbreviations
AIC: Akaike information criterion
AR(1): Autoregressive of order one
CDF: Cumulative distribution function
GEE: Generalized estimating equations
MC: Markov chains
MLE: Maximum likelihood estimation
MP: Multivariate probit
MVN: Multivariate normal
PMF: Probability mass function
SE: Standard error
References
Bedford, T., Cooke, R. M.: Vines–a new graphical model for dependent random variables. Ann. Stat. 30(4), 1031–1068 (2002).
Brechmann, E., Schepsmeier, U.: Modeling dependence with C- and D-vine copulas: the R package CDVine. J. Stat. Softw., 52 (2013). https://doi.org/10.18637/jss.v052.i03.
Byrd, R. H., Lu, P., Nocedal, J., Zhu, C.: A limited memory algorithm for bound constrained optimization. SIAM J. Sci. Comput. 16(5), 1190–1208 (1995).
Chaganty, N. R.: An alternative approach to the analysis of longitudinal data via generalized estimating equations. J. Stat. Plan. Inf. 63(1), 39–54 (1997).
Chaganty, N. R., Joe, H.: Range of correlation matrices for dependent bernoulli random variables. Biometrika. 93(1), 197–206 (2006).
Czado, C.: Analyzing Dependent Data with Vine Copulas: A Practical Guide With R. Springer International Publishing, Lecture Notes in Statistics (2019).
Dalla Valle, L., Leisen, F., Rossini, L.: Bayesian nonparametric conditional copula estimation of twin data. J. Roy. Stat. Soc. C: Appl. Stat. 67, 523–548 (2018).
Deng, Y., Chaganty, N. R.: Pair-copula models for analyzing family data. J. Stat. Theory Pract. 15(1), 13 (2021).
Escarela, G., Perez-Ruiz, L. C., Bowater, R. J.: A copula-based Markov chain model for the analysis of binary longitudinal data. J. Appl. Stat. 36(6), 647–657 (2009).
Gilbert, P., Varadhan, R.: numDeriv: Accurate Numerical Derivatives. R Package (2012). http://CRAN.R-project.org/package=numDeriv.
Grizzle, J. E., Starmer, C. F., Koch, G. G.: Analysis of categorical data by linear models. Biometrics, 489–504 (1969).
Gruber, L., Czado, C.: Sequential bayesian model selection of regular vine copulas. Bayesian Anal. 10, 937–963 (2015).
Joe, H.: Families of m-variate distributions with given margins and m(m−1)/2 bivariate dependence parameters. Lecture Notes–Monograph Series, vol. 28. Institute of Mathematical Statistics, Hayward (1996).
Joe, H.: Multivariate Models and Multivariate Dependence Concepts. Chapman & Hall/CRC, London (1997).
Joe, H.: Dependence modeling with copulas. Chapman and Hall/CRC, London (2014).
Kurowicka, D., Joe, H.: Dependence modeling: vine copula handbook. World scientific, Singapore (2011).
Lennon, H.: Gaussian copula modelling for integer-valued time series. PhD thesis. The University of Manchester (United Kingdom) (2016).
Liang, K. Y., Zeger, S. L.: Longitudinal data analysis using generalized linear models. Biometrika. 73(1), 13–22 (1986).
Min, A., Czado, C.: Bayesian inference for multivariate copulas using pair-copula constructions. J. Financ. Econ. 8, 511–546 (2010).
Panagiotelis, A., Czado, C., Joe, H.: Pair copula constructions for multivariate discrete data. J. Am. Stat. Assoc. 107(499), 1063–1072 (2012).
Panagiotelis, A., Czado, C., Joe, H., Stöber, J.: Model selection for discrete regular vine copulas. Comput. Stat. Data Anal. 106, 138–152 (2017).
Radice, R., Marra, G., Wojtyś, M.: Copula regression spline models for binary outcomes. Stat. Comput. 26(5), 981–995 (2016).
Sabo, R. T., Chaganty, N. R.: What can go wrong when ignoring correlation bounds in the use of generalized estimating equations. Stat. Med. 29(24), 2501–2507 (2010).
Sklar, M.: Fonctions de répartition à n dimensions et leurs marges. Publ. Inst. Stat. Univ. Paris. 8, 229–231 (1959).
Smith, M., Min, A., Almeida, C., Czado, C.: Modeling longitudinal data using a pair-copula decomposition of serial dependence. J. Am. Stat. Assoc. 105(492), 1467–1479 (2010).
Smith, M. S., Khaled, M. A.: Estimation of copula models with discrete margins via bayesian data augmentation. J. Am. Stat. Assoc. 107(497), 290–303 (2012).
Winkelmann, R.: Copula bivariate probit models: with an application to medical expenditures. Health Econ. 21(12), 1444–1455 (2012).
Yang, W., Chaganty, N. R.: A contrasting study of likelihood methods for the analysis of longitudinal binary data. Commun. Stat. Theory Methods. 43(14), 3027–3046 (2014).
Acknowledgements
We thank the associate editor and two referees whose constructive comments on an earlier version resulted in an improved presentation.
Funding
There is no funding support for the research work.
Contributions
All authors have contributed equally to the work. The author(s) read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Cite this article
Lin, H., Chaganty, N.R. Multivariate distributions of correlated binary variables generated by pair-copulas. J Stat Distrib App 8, 4 (2021). https://doi.org/10.1186/s40488-021-00118-z
Keywords
 D-vine
 Multivariate binary distributions
 Multivariate probit model
 Pair-copulas