# Mean and variance of ratios of proportions from categories of a multinomial distribution

*Journal of Statistical Distributions and Applications*
**volume 5**, Article number: 2 (2018)

## Abstract

A ratio distribution is the probability distribution of the ratio of two random variables, each usually having a known distribution. Results are currently available for ratios in which the random variables follow (not necessarily the same) Gaussian, Cauchy, binomial or uniform distributions. In this paper, we consider the case where the random variables in the ratio are joint binomial components of a multinomial distribution. We derive formulae for the mean and variance of this ratio distribution using a simple Taylor-series approach and also a more complex approach that uses a slight modification of the original ratio. We show that the more complex approach yields better results with simulated data. The presented results can be directly applied in the computation of confidence intervals for ratios of multinomial proportions.

**AMS Subject Classification:** 62E20

## 1 Introduction

Combinations of random variables (e.g., sums, products, ratios) regularly occur in many scientific areas. Particularly useful is the ratio of two random variables. For example, plant scientists use the ratio of leaf area to total plant weight (leaf area ratio) in the plant growth analysis (Poorter and Garnier 1996), and geneticists use the ratio of total genetic diversity distributed among populations to total genetic diversity in the pooled populations as a measure of population differentiation (Culley et al. 2002). The ratio of two fluorescent signals has several applications in fluorescence microscopy, e.g., estimating the DNA sequence copy number as a function of chromosomal location (Piper et al. 1995), and there are many (dimensionless) ratios employed in engineering (Mekic et al. 2012). In case of categorical data (i.e., from a binomial or multinomial distribution), there are numerous applications of ratios as well in consumer preference studies, election poll results, quality control, epidemiology, and so on.

Formally, a ratio distribution is a probability distribution constructed as the distribution of the ratio of two random variables, each having another (known) distribution. More particularly, given two random variables *Y*_{1} and *Y*_{2}, the distribution of the random variable *Z* that is formed as the ratio *Z*=*Y*_{1}/*Y*_{2} is a ratio distribution. When using ratio distributions for theoretical and practical purposes, it is helpful to know their mean and variance, preferably in a computationally efficient form. In the case that *Y*_{1} and *Y*_{2} follow normal distributions, and \(\mu _{Y_{2}}=0\), the distribution of *Z* is known as the Cauchy distribution (Geary 1930; Fieller 1932; Hinkley 1969; Korhonen and Narula 1989; Marsaglia 2006). Other authors have addressed ratios of binomial proportions (also known as relative risk) (Koopman 1984; Bonett and Price 2006; Price and Bonett 2008), ratios of uniform distributions (Sakamoto 1943), Student’s *t* distributions (Press 1969), Weibull and gamma distributions (Basu and Lochner 1971; Provost 1989; Nadarajah and Kotz 2006), beta distributions (Pham-Gia 2000), Laplace and Bessel distributions (Nadarajah 2005; Nadarajah and Kotz 2005) and others. General notes on the product and ratio of two (not necessarily normal) random variables can also be found in Frishman (1971) and Van Kempen and Van Vliet (2000).

In our paper, we consider a ratio involving two or more random variables that jointly have a multinomial distribution. This situation is similar to relative risk or risk ratio which is the ratio of the probability of an event occurring (for example, developing a disease or being injured) in an exposed group to the probability of the event occurring in a comparison, non-exposed group. However, while the probabilities in the risk ratio are independent (in the sense that they describe two independent events in two independent groups), in our case, the probabilities are tied together through the covariance between multinomial categories. These ratios serve as a common framework for opinion polls, statistical quality control, and consumer preference studies. Confidence intervals for the odds ratio, which can be easily calculated, if the standard deviation is known, are especially important for applications. Nelson (1972) presented estimates, confidence intervals, and hypothesis tests for the odds ratio in trinomial distributions. Piegorsch and Richwine (2001) examined some types of confidence intervals in the context of analysis of genetic mutant spectra. Quesenberry and Hurst (1964) and Goodman (1965) explored methods for obtaining a set of simultaneous confidence intervals for the probabilities of a multinomial distribution. A comparison of performance of various confidence intervals also appeared in Alghamdi (2015); Aho and Bowyer (2015). To the best of our knowledge, however, there has been no analytical treatment of the ratio of multinomial proportions including derivations for formulae for the mean and variance of such a ratio.

A ratio between two or more random variables that jointly have a multinomial distribution also arises in the trending field of the non-invasive prenatal testing of common fetal aneuploidies such as trisomy of the 13^{th}, 18^{th} or 21^{st} chromosome (Chiu et al. 2008; Sehnert et al. 2011; Lau et al. 2012; Minarik et al. 2015). We are currently working on implementation of this model into laboratory practice, and this paper represents a mathematical background of our work. In this paper, we discuss two solutions to the problem of mean and variance of the said ratio. More particularly, we derive asymptotic formulae for the mean and variance of the random variable *Z*=*Y*_{1}/*Y*_{2}, where \(Y_{1}=\sum _{k\in I} X_{k}\) and \(Y_{2}=\sum _{k\in J} X_{k}\), *I,J*⊂{1,...,*r*} and *I*∩*J*=*∅*, are sums of random variables *X*_{1},...,*X*_{r} which together have a joint multinomial distribution.

## 2 Solution by Taylor series

There is a simple solution to the mean and variance of the ratio of multinomial proportions that can be derived by using the Taylor series. Formally, let a set of random variables *X*_{1},...,*X*_{r} have a probability function

\[P(X_{1}=x_{1},\ldots,X_{r}=x_{r}) = \frac{n!}{x_{1}!\cdots x_{r}!}\,p_{1}^{x_{1}}\cdots p_{r}^{x_{r}},\]

where *x*_{i} are non-negative integers such that \(\sum x_{i} = n\) and *p*_{i} are constants with *p*_{i}>0 and \(\sum p_{i}=1\). The joint distribution of *X*_{1},...,*X*_{r} is known as the multinomial distribution. Let *u,v*∈{0,1}^{r} be two binary vectors such that \(\sum u_{i}>0\), \(\sum v_{i}>0\) and *u*_{i}*v*_{i}=0 for all *i*. We define

\[Z_{0} = \frac{u\cdot X}{v\cdot X},\]

where · represents a scalar product and *X*=(*X*_{1},...,*X*_{r}). Without loss of generality, we will restrict our explorations to *r*=3 and *Z*_{0}=*X*_{1}/*X*_{2}. This holds because the choice vectors *u,v* select no common *X*_{i}; thus, the *X*_{i}s can be grouped into three disjoint sets: 1) *X*_{i}s selected by *u*, 2) *X*_{i}s selected by *v*, and 3) all others.

Also, the reader will note that the ratio *Z*_{0}=*X*_{1}/*X*_{2} can be viewed as a ratio of absolute quantities as well as a ratio of fractions or probabilities because *Z*_{0}=(*X*_{1}/*n*)/(*X*_{2}/*n*).
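The reduction to three categories can be illustrated with a minimal Python sketch. The function names, the five-category probability vector and the count vector below are hypothetical and chosen only for illustration; the sketch relies on the standard fact that merging categories of a multinomial vector again gives a multinomial vector whose probabilities are the corresponding sums.

```python
def collapse_to_trinomial(p, u, v):
    """Collapse r category probabilities into (p1, p2, p3) using binary selectors u and v."""
    if any(ui and vi for ui, vi in zip(u, v)):
        raise ValueError("u and v must select disjoint categories")
    p1 = sum(pi for pi, ui in zip(p, u) if ui)  # categories selected by u
    p2 = sum(pi for pi, vi in zip(p, v) if vi)  # categories selected by v
    return p1, p2, 1.0 - p1 - p2                # everything else


def ratio(x, u, v):
    """Z0 = (u . x) / (v . x); returns None when the denominator is zero."""
    num = sum(xi * ui for xi, ui in zip(x, u))
    den = sum(xi * vi for xi, vi in zip(x, v))
    return num / den if den else None


# Hypothetical example with r = 5 categories.
p = [0.1, 0.2, 0.3, 0.15, 0.25]
u = [1, 0, 1, 0, 0]
v = [0, 1, 0, 0, 1]
p1, p2, p3 = collapse_to_trinomial(p, u, v)

x = [3, 4, 6, 2, 5]                              # one observed count vector, n = 20
n = sum(x)
z_counts = ratio(x, u, v)                        # ratio of absolute counts
z_props = ratio([xi / n for xi in x], u, v)      # ratio of proportions
print(p1, p2, p3, z_counts, z_props)
```

Up to floating-point rounding, the ratio computed from counts and the ratio computed from proportions agree, since the factor 1/*n* cancels.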

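Before we proceed any further, observe that because of the possible zero in the denominator of *Z*_{0}, the mean and variance of the ratio *Z*_{0} have no exact analytical solution. A workaround for this problem is to rewrite this ratio using a function that does not have a singularity. Let *Z*_{0}=*f*(*X*_{1},*X*_{2})=*X*_{1}/*X*_{2} be a function of two random variables. Then, with \(\mu =\left (\mu _{X_{1}}, \mu _{X_{2}}\right)\), we can use the second-order Taylor series around *μ* to approximate the function *f* as

\[f(X_{1},X_{2}) \approx f(\mu) + f'_{X_{1}}(\mu)(X_{1}-\mu_{X_{1}}) + f'_{X_{2}}(\mu)(X_{2}-\mu_{X_{2}}) + \frac{1}{2}\left[f''_{X_{1}X_{1}}(\mu)(X_{1}-\mu_{X_{1}})^{2} + 2f''_{X_{1}X_{2}}(\mu)(X_{1}-\mu_{X_{1}})(X_{2}-\mu_{X_{2}}) + f''_{X_{2}X_{2}}(\mu)(X_{2}-\mu_{X_{2}})^{2}\right],\]

from which we have

\[E(Z_{0}) \approx \frac{\mu_{X_{1}}}{\mu_{X_{2}}} - \frac{\sigma_{X_{1},X_{2}}}{\mu_{X_{2}}^{2}} + \frac{\mu_{X_{1}}\sigma_{X_{2}}^{2}}{\mu_{X_{2}}^{3}}.\]

Since *X*_{1} and *X*_{2} are terms of a random vector *X*=(*X*_{1},*X*_{2},*X*_{3}) drawn from the multinomial distribution given by (*n,p*_{1},*p*_{2},*p*_{3}), we have \(\mu _{X_{i}} = np_{i}\) and \(\sigma _{X_{i}}^{2}=np_{i}(1-p_{i})\) for *i*=1,2, and \(\sigma _{X_{1},X_{2}} = -np_{1}p_{2}\). It follows easily that

\[E(Z_{0}) \approx \frac{p_{1}}{p_{2}}\left(1+\frac{1}{np_{2}}\right). \tag{2}\]

For variance, we use a simpler (first-order) approximation of *f*

\[f(X_{1},X_{2}) \approx f(\mu) + f'_{X_{1}}(\mu)(X_{1}-\mu_{X_{1}}) + f'_{X_{2}}(\mu)(X_{2}-\mu_{X_{2}}),\]

from which we have

\[var(Z_{0}) \approx \frac{\sigma_{X_{1}}^{2}}{\mu_{X_{2}}^{2}} - \frac{2\mu_{X_{1}}\sigma_{X_{1},X_{2}}}{\mu_{X_{2}}^{3}} + \frac{\mu_{X_{1}}^{2}\sigma_{X_{2}}^{2}}{\mu_{X_{2}}^{4}},\]

and finally

\[var(Z_{0}) \approx \frac{p_{1}(p_{1}+p_{2})}{np_{2}^{3}}. \tag{4}\]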

## 3 Solution by a modified ratio

### 3.1 Definition

Let the symbols *X*, *u*, and *v* have the same meaning as in Section 2. We define a new random variable *Z*_{1} as

\[Z_{1} = \frac{u\cdot X}{v\cdot X + 1}. \tag{5}\]

The +1 in the above definition serves to avoid zero in the denominator, and thus solves the problem with the singularity of *Z*_{0}. For the same reasons as in Section 2, we will restrict our explorations to *r*=3 and *Z*_{1}=*X*_{1}/(*X*_{2}+1).

### 3.2 Sample space

The sample space \(S_{Z_{1}}\subseteq \mathbb {Q}\) of the random variable *Z*_{1} is limited by the sample space *S*_{X} of the multinomially distributed random vector *X*=(*X*_{1},*X*_{2},*X*_{3}). Therefore, if *X* assumes values from the multinomial distribution given by (*n,p*_{1},*p*_{2},*p*_{3}), then *Z*_{1} cannot assume all rational values *a*/(*b*+1) for some \(a, b\in \mathbb {N}\), but only those that satisfy *a*+*b*≤*n* and *a,b*≥0. Furthermore, values 2/2 and 4/4 are considered identical; therefore, different outcomes of random vector *X* may correspond with the same outcome of *Z*_{1}. In other words, each instance (*a,b,c*) of *X* corresponds with exactly one instance *a*/(*b*+1) of *Z*_{1}, while an instance of *Z*_{1} may correspond with multiple instances of *X*.
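The many-to-one mapping from outcomes of *X* to values of *Z*_{1} can be made concrete by brute-force enumeration for a small *n*. The sketch below is illustrative only: the function names and the parameter choice (*n*=3, *p*=(1/4,1/2,1/4)) are assumptions, and exact rational arithmetic is used so that values such as 2/2 and 4/4 are identified automatically.

```python
from collections import defaultdict
from fractions import Fraction
from math import factorial


def trinomial_pmf(a, b, c, p1, p2, p3):
    """Probability of the outcome (X1, X2, X3) = (a, b, c)."""
    n = a + b + c
    coef = factorial(n) // (factorial(a) * factorial(b) * factorial(c))
    return coef * p1**a * p2**b * p3**c


def z1_distribution(n, p1, p2, p3):
    """Map each value z = a/(b+1) to its probability and to the outcomes producing it."""
    pmf = defaultdict(Fraction)      # Fraction() is 0, so sums start at zero
    preimages = defaultdict(list)
    for b in range(n + 1):
        for a in range(n - b + 1):
            z = Fraction(a, b + 1)   # reduced automatically, e.g. 2/2 == 1
            pmf[z] += trinomial_pmf(a, b, n - a - b, p1, p2, p3)
            preimages[z].append((a, b, n - a - b))
    return dict(pmf), dict(preimages)


pmf, preimages = z1_distribution(3, Fraction(1, 4), Fraction(1, 2), Fraction(1, 4))
for z in sorted(pmf):
    print(z, pmf[z], preimages[z])
```

For *n*=3 there are ten outcomes of *X* but fewer distinct values of *Z*_{1}; for instance, both (1,0,2) and (2,1,0) give *Z*_{1}=1.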

Naturally, the probability of a particular value of *Z*_{1} can be determined by summing the probabilities of all (multinomial) vectors that are associated with this value. From this, it follows that if the initial multinomial probability distribution function of random vector *X* is

then the probability distribution function of random variable *Z*_{1} is

which can be rewritten as

### 3.3 Mean and variance

Now we can state the mean and variance of *Z*_{1}. The proofs of the statements can be found in the Appendix.

**Theorem 1**

Let *X*=(*X*_{1},*X*_{2},*X*_{3}) be a random vector from the multinomial distribution given by (*n,p*_{1},*p*_{2},*p*_{3}). The expected value of the random variable *Z*_{1}, given by (5), is

\[E(Z_{1}) = \frac{p_{1}}{p_{2}}\left(1-(1-p_{2})^{n}\right).\]

**Theorem 2**

Let *X*=(*X*_{1},*X*_{2},*X*_{3}) be a random vector from the multinomial distribution given by (*n,p*_{1},*p*_{2},*p*_{3}), where

for some natural non-zero *N*. The variance of the random variable *Z*_{1}, given by (5), is

**Corollary 1**

For *N*=1 we have for the variance from Theorem 2

Observe that the formula for the variance is asymptotic in nature, and thus it may not work well for small *n* and certain configurations of *p*_{1}, *p*_{2} and *p*_{3}. See Section 5 for more details.
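For small *n*, the mean and variance of *Z*_{1}=*X*_{1}/(*X*_{2}+1) can also be computed exactly by summing over the whole sample space, which gives a reference against which any closed-form or asymptotic expression can be checked. The following Python sketch does this with rational arithmetic and compares the exact mean with the closed form (*p*_{1}/*p*_{2})(1−(1−*p*_{2})^{n}), which follows from a direct calculation conditioning on *X*_{2}. The function names and the chosen parameters are illustrative assumptions, not part of the supplementary script.

```python
from fractions import Fraction
from math import comb


def exact_moments_z1(n, p1, p2):
    """Exact mean and variance of Z1 = X1 / (X2 + 1) by summing over all (X1, X2)."""
    p3 = 1 - p1 - p2
    first = second = Fraction(0)
    for b in range(n + 1):
        for a in range(n - b + 1):
            # Trinomial probability of (X1, X2, X3) = (a, b, n - a - b).
            prob = comb(n, a) * comb(n - a, b) * p1**a * p2**b * p3**(n - a - b)
            z = Fraction(a, b + 1)
            first += prob * z
            second += prob * z * z
    return first, second - first * first


def closed_form_mean_z1(n, p1, p2):
    """Closed form (p1 / p2) * (1 - (1 - p2)^n) for E(Z1)."""
    return p1 / p2 * (1 - (1 - p2) ** n)


p1, p2 = Fraction(1, 4), Fraction(1, 2)
for n in (1, 5, 20):
    mean, variance = exact_moments_z1(n, p1, p2)
    print(n, float(mean), float(closed_form_mean_z1(n, p1, p2)), float(variance))
```

The enumeration costs O(*n*^{2}) terms, so it is practical only for moderate *n*; its purpose here is to provide exact reference values for the variance, where the formula is asymptotic.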

## 4 Approximate error of solution by a modified ratio

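Let

\[Err = Z_{0} - Z_{1} = \frac{X_{1}}{X_{2}} - \frac{X_{1}}{X_{2}+1} = \frac{X_{1}}{X_{2}(X_{2}+1)}\]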

be a function of two random variables expressing the difference between *Z*_{0} and *Z*_{1}. Analogous to the Eqs. (1)–(4) from Section 2 and with *f*(*X*_{1},*X*_{2})=*X*_{1}/[*X*_{2}(*X*_{2}+1)], we have for the mean and variance of *Err*

It follows from Eqs. (6) and (7) that *Z*_{1} is an asymptotically (*n*→*∞*) unbiased estimator of the ratio of multinomial proportions *Z*_{0}. Moreover, Eqs. (6) and (7) can be used to correct the mean and variance of the modified ratio *Z*_{1} to better reflect the mean and variance of the original ratio *Z*_{0}. Let \(Z_{1}^{cor} = Z_{1} + Err\) be a new random variable. Since the expected value is linear, we have directly

\[E\left(Z_{1}^{cor}\right) = E(Z_{1}) + E(Err).\]

For the variance, we have

\[var\left(Z_{1}^{cor}\right) = var(Z_{1}) + var(Err) + 2\,cov(Z_{1},Err),\]

where

\[cov(Z_{1},Err) = E(Z_{1}\cdot Err) - E(Z_{1})E(Err).\]

To approximate the value of *E*(*Z*_{1}·*Err*), we use the Taylor series again, particularly Eq. (1). After some rearrangement, we get

Thus, we can now easily calculate the value of \(var\left (Z_{1}^{cor}\right)\) (equation omitted due to its length). In the next section, we shall discuss numerical simulations and performance of the presented formulae.
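The algebra behind the correction (the identity *Z*_{0}=*Z*_{1}+*Err*, the additivity of the means, and the decomposition of the variance into two variances and a covariance term) can be checked exactly on a small example. The sketch below is an illustration under an explicit assumption: since *Z*_{0} is undefined for *X*_{2}=0, all moments are taken conditionally on *X*_{2}≥1, which is not the same as the Taylor-series approximations used in this section. Function names and parameters are hypothetical.

```python
from fractions import Fraction
from math import comb


def conditional_outcomes(n, p1, p2):
    """List of (probability, a, b) for outcomes with X2 = b >= 1, renormalised to sum to 1."""
    p3 = 1 - p1 - p2
    raw = [(comb(n, a) * comb(n - a, b) * p1**a * p2**b * p3**(n - a - b), a, b)
           for b in range(1, n + 1) for a in range(n - b + 1)]
    total = sum(w for w, _, _ in raw)
    return [(w / total, a, b) for w, a, b in raw]


def mean(outcomes, g):
    return sum(w * g(a, b) for w, a, b in outcomes)


def var(outcomes, g):
    m = mean(outcomes, g)
    return mean(outcomes, lambda a, b: g(a, b) ** 2) - m * m


def z0(a, b):
    return Fraction(a, b)


def z1(a, b):
    return Fraction(a, b + 1)


def err(a, b):
    return Fraction(a, b * (b + 1))


outs = conditional_outcomes(10, Fraction(1, 4), Fraction(1, 2))
cov = mean(outs, lambda a, b: z1(a, b) * err(a, b)) - mean(outs, z1) * mean(outs, err)
lhs = var(outs, z0)
rhs = var(outs, z1) + var(outs, err) + 2 * cov
print(float(mean(outs, z0)), float(mean(outs, z1) + mean(outs, err)))
print(float(lhs), float(rhs))
```

Because rational arithmetic is used, both pairs of printed numbers agree exactly, not merely up to rounding.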

## 5 Numerical simulations

Numerical simulations were performed in the following way. We selected several multinomial distributions given by (*n,p*_{1},*p*_{2},*p*_{3}) and for each such distribution, we sampled 10^{5} random vectors (*X*_{1},*X*_{2},*X*_{3}). Vectors with *X*_{2}=0 were counted (variable *zeros*) and omitted from further calculations; that is, they were not replaced by new random vectors. For the vectors with *X*_{2}≠0, we calculated the ratios *Z*_{0}=*X*_{1}/*X*_{2}, while the ratios *Z*_{1}=*X*_{1}/(*X*_{2}+1) were calculated from all 10^{5} sampled vectors. Thus, we obtained 10^{5}−*zeros* values of *Z*_{0} and 10^{5} values of *Z*_{1}. From both sets we calculated the mean and variance of the sampled data. We compared these values with the predictions as follows below.

For the mean, we compared the means of the two data sets with the Taylor-series solution given by Eq. (2), and with the modified ratio (MR) solution given by Theorem 1 with and without the correction given by the Eq. (6).

For the variance, we compared the variances of the two data sets with the Taylor-series solution given by Eq. (4), and with the modified ratio solution given by Theorem 2 with and without the correction (the final formula for corrected variance of the modified ratio was omitted due to its length, but see Section 4 for calculation details). Note that for variance given by Theorem 2, we considered the case *N*=5 so that its error *O*(1/*n*^{6}) would not interfere with the correction.
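The sampling procedure described above can be sketched in a few lines of standard-library Python. This is not the R script from the supplementary material; the function names, the seed, and the reduced number of draws (2·10^{4} instead of 10^{5}, to keep the run short) are assumptions made for illustration. The closed forms used for comparison are the delta-method expressions for the mean and variance of *X*_{1}/*X*_{2} and the closed form (*p*_{1}/*p*_{2})(1−(1−*p*_{2})^{n}) for *E*(*Z*_{1}); the corrected and higher-order variance formulae are not included.

```python
import random
from statistics import fmean, variance


def sample_multinomial(rng, n, probs):
    """Draw one multinomial count vector by classifying n independent trials."""
    counts = [0] * len(probs)
    for k in rng.choices(range(len(probs)), weights=probs, k=n):
        counts[k] += 1
    return counts


def simulate(n, probs, draws=20000, seed=1):
    """Return sampled Z0 values (X2 > 0 only), Z1 values (all draws), and the zero count."""
    rng = random.Random(seed)
    z0, z1, zeros = [], [], 0
    for _ in range(draws):
        x1, x2, _x3 = sample_multinomial(rng, n, probs)
        z1.append(x1 / (x2 + 1))
        if x2 == 0:
            zeros += 1            # counted and omitted, not replaced
        else:
            z0.append(x1 / x2)
    return z0, z1, zeros


def taylor_mean(n, p1, p2):
    return p1 / p2 * (1 + 1 / (n * p2))


def taylor_var(n, p1, p2):
    return p1 * (p1 + p2) / (n * p2**3)


def mr_mean(n, p1, p2):
    return p1 / p2 * (1 - (1 - p2) ** n)


n, probs = 30, (0.25, 0.5, 0.25)
z0, z1, zeros = simulate(n, probs)
print("zeros:", zeros)
print("mean Z0:", fmean(z0), "Taylor:", taylor_mean(n, probs[0], probs[1]))
print("var  Z0:", variance(z0), "Taylor:", taylor_var(n, probs[0], probs[1]))
print("mean Z1:", fmean(z1), "closed form:", mr_mean(n, probs[0], probs[1]))
```

Changing `n` and `probs` (for example, to a small *p*_{2}) shows how the number of discarded vectors and the gap between the sampled and predicted values grow.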

Figure 1 shows the simulation results for the multinomial distribution given by (*n*=10,…,50,*p*_{1}=0.25,*p*_{2}=0.5,*p*_{3}=0.25). The corrected modified ratio gives the best model of the mean and variance of *Z*_{0}. Observe also that the uncorrected modified ratio is a very precise model of *Z*_{1}.

In Fig. 2, when *p*_{2} and *n* are small, the discrepancy between the models and the data gets larger, although the corrected modified ratio still outperforms the Taylor-series approach. The uncorrected modified ratio is also a very good model of *Z*_{1}.

Figures 3 and 4 further explore the limits of the presented models. In Fig. 3, we compared the performance of the variance models in three multinomial distributions (with decreasing value of *p*_{2}) for various values of *N* from Theorem 2. Note that the minimal value of *n* for which Theorem 2 holds grows with *N*; therefore, the variance models start from different values of *n*. All models have difficulty describing the initial part of the variance curve of the simulated data. However, one should keep in mind that the formula in Theorem 2 is only asymptotic.

In Fig. 4, we compared the models for mean on the same data as in Fig. 3. Again, for small values of *n*, the models fail to capture the real trend of the data. On a side note, the data for *Z*_{1} are very well described by the uncorrected modified ratio model from Theorem 1.

The supplemental material contains a script (Additional file 1) to generate similar plots for a user-specified multinomial distribution (*n,p*_{1},*p*_{2},*p*_{3}) and a range of *n*. Given the results from the simulation data, we encourage the reader to use this script and check whether the formulae presented in the paper provide a good approximation of *Z*_{0} for their particular multinomial distribution.

## Appendix

### Proof of Theorem 1

**Lemma 1**

Let \(n\in \mathbb {N}\) and \(R\in \mathbb {R}\). Then it holds

*Proof*

From \(\left ({n \atop k}\right)=\frac {n}{k}\left ({n-1 \atop k-1}\right)\) it directly follows that

□
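The binomial identity used in this proof is easy to verify numerically. The sketch below also checks one weighted sum that the identity yields together with the binomial theorem, namely \(\sum_{k=0}^{n} k\binom{n}{k}R^{k} = nR(1+R)^{n-1}\); taking this as the kind of statement Lemma 1 makes is an assumption of the example, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def absorption_holds(n, k):
    """Check k * C(n, k) == n * C(n - 1, k - 1), i.e. C(n, k) = (n / k) * C(n - 1, k - 1)."""
    return k * comb(n, k) == n * comb(n - 1, k - 1)


def weighted_sum(n, R):
    """Left-hand side: sum over k of k * C(n, k) * R^k."""
    return sum(k * comb(n, k) * R**k for k in range(n + 1))


def closed_form(n, R):
    """Right-hand side: n * R * (1 + R)^(n - 1)."""
    return n * R * (1 + R) ** (n - 1)


R = Fraction(1, 3)
for n in range(1, 11):
    assert all(absorption_holds(n, k) for k in range(1, n + 1))
    print(n, weighted_sum(n, R), closed_form(n, R))
```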

### Proof of Theorem 1

From the definition of the expected value we have

\[E(Z_{1}) = \sum_{z\in S_{Z_{1}}} z\,P(Z_{1}=z),\]

where \(S_{Z_{1}}\) is a sample space of *Z*_{1}. By using

from Section 3.2, we can write

Furthermore, because \(\sum _{b=0}^{n}\sum _{a=0}^{n-b}\) enumerates all possible values of a random vector (*X*_{1},*X*_{2},*X*_{3})=(*a,b,n*−*a*−*b*) for the given *n*, it also enumerates all values of *Z*_{1} including their multiplicities (see Section 3.2). Thus, we can simplify the expression of *E*(*Z*_{1}) into

We rewrite this expression to separate the sums, thus obtaining

Using Lemma 1, we have for (8)

By putting this back to *E*(*Z*_{1}) and after some rearrangement of the terms, we get

We continue by splitting the following fraction into two terms

By this, the sum in (9) splits into two parts

where

With \(\left ({n \atop b}\right)\frac {n+1}{b+1}=\left ({n+1 \atop b+1}\right)\) and some rearrangement of the terms, we obtain

and a straightforward calculation of *B* yields

Finally, after putting *A* and *B* together, we get

□
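As a cross-check on the result, a shorter route conditions on *X*_{2}. This sketch is not the argument used above; it relies on two standard facts, taken here as assumptions: given *X*_{2}=*b*, the variable *X*_{1} is binomial with parameters *n*−*b* and *p*_{1}/(1−*p*_{2}), and for a binomial *X*_{2} with parameters (*n*,*p*_{2}) one has \(E\left[1/(X_{2}+1)\right] = \left(1-(1-p_{2})^{n+1}\right)/\left((n+1)p_{2}\right)\).

\[
\begin{aligned}
E(Z_{1}) &= E\left[\frac{E(X_{1}\mid X_{2})}{X_{2}+1}\right]
 = \frac{p_{1}}{1-p_{2}}\,E\left[\frac{n-X_{2}}{X_{2}+1}\right]
 = \frac{p_{1}}{1-p_{2}}\left[(n+1)\,E\left(\frac{1}{X_{2}+1}\right)-1\right] \\
&= \frac{p_{1}}{1-p_{2}}\left[\frac{1-(1-p_{2})^{n+1}}{p_{2}}-1\right]
 = \frac{p_{1}}{1-p_{2}}\cdot\frac{(1-p_{2})\left(1-(1-p_{2})^{n}\right)}{p_{2}}
 = \frac{p_{1}}{p_{2}}\left(1-(1-p_{2})^{n}\right).
\end{aligned}
\]

The factor (1−(1−*p*_{2})^{n}) is the probability that *X*_{2}≥1, which shows how quickly *E*(*Z*_{1}) approaches *p*_{1}/*p*_{2} as *n* grows.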

### Proof of Theorem 2

The proof of Theorem 2 relies on a series of lemmas and corollaries. For easier navigation through the proof, see Fig. 5 for the proof scheme.

**Lemma 2**

Let \(n\in \mathbb {N}\) and \(R\in \mathbb {R}\). Then it holds

*Proof*

From \(\left ({n \atop k}\right)=\frac {n}{k}\left ({n-1 \atop k-1}\right)\) and Lemma 1 it follows that

□

**Lemma 3**

Let \(n\in \mathbb {N}\) and \(R\in \mathbb {R}\backslash \{0\}\). Then, for any \(N\in \mathbb {N}\) it holds

where

*Proof*

By induction on *N*. Let *N*=0. Then, it follows

By using \(\frac {n+1}{k+1}\left ({n \atop k}\right)=\left ({n+1 \atop k+1}\right)\) and the binomial theorem, we can write

The base of the induction holds. Assume that the lemma holds up to some natural *N*. We prove that it holds for *N*+1 as well. Consider the term *A*_{2N+1}. We have

where

Furthermore, by the same trick with the binomial coefficient as above, we rewrite the terms *X*_{1} and *X*_{2} as

After some rearrangement, we finally get (again using the binomial theorem)

□

**Remark 1**

We will often use Lemma 3 with *n*+1 instead of *n*. Therefore, we restate Lemma 3 with this change. Let \(n\in \mathbb {N}\) and \(R\in \mathbb {R}\backslash \{0\}\). Then, for any \(N\in \mathbb {N}\) it holds

where

**Lemma 4**

Let *p*_{1},*p*_{2}∈(0,1) be some real constants. Let *k,n* be some non-zero natural numbers. Let *A*_{2k+1} be the term from Remark 1. Furthermore, let *R*=*p*_{2}/(1−*p*_{2}), and let

Then, for *α*∈[1,*k*+2], it holds

*Proof*

First of all, for *α*∈[1,*k*+2] we have

This follows easily by applying the inequality

to the term *A*_{2k+1} from Remark 1, which holds for any natural *b,k* except for pairs *b*=*k*+1 (in our case *b*>*k*+1). We can see this by solving the inequality

for *x*. By this, we get an upper and lower bound on the term *A*_{2k+1}, which differ by a multiplicative constant *k*+2. Finally, the lemma follows by extending the summation through index *b* in the term *A*_{2k+1} to a full range from 0 to *n*+*k*+3, by applying the binomial theorem and some simple rearrangement of the terms. The *O* bound follows from the fact that \(\left ({n \atop k}\right)\geq \left (\frac {n}{k}\right)^{k}\). □

**Lemma 5**

Let *p*_{1},*p*_{2}∈(0,1) be some real constants. Let *k,n* be some non-zero natural numbers. Let *A*_{2k} be the term from Remark 1. Furthermore, let *R*=*p*_{2}/(1−*p*_{2}), and let

Then, it holds

*Proof*

The lemma follows easily by a straightforward multiplication of the terms *A*, *D* and *A*_{2k}, and some rearrangement of the terms. □

The following lemma is an extension of one borrowed from Graham et al. (1994).

**Lemma 6**

Let 0<*α*<*R*/(1+*R*) for some real *R*>0. Then, it holds

where *m*=⌊*α**n*⌋ and

*Proof*

First of all, we have

Let *m*=⌊*α**n*⌋=*α**n*−*ε*. It holds

because

which follows from *α*<*R*/(1+*R*). Thus,

By Stirling’s approximation, we have

and the lemma follows. □

**Lemma 7**

Let *p*_{1},*p*_{2}∈(0,1) be some real constants. Let *k,n* be some non-zero natural numbers such that

Let *B*_{2k} be the term from Remark 1. Furthermore, let *R*=*p*_{2}/(1−*p*_{2}), and let

Then, it holds

*Proof*

Let *α*=(*k*+1)/(*n*+*k*+2). One can easily verify that *α*<*R*/(1+*R*)=*p*_{2} because of the choice of *n*. Thus, we can apply Lemma 6 to the sum from the term *B*_{2k}. From this, it follows that

where

Moreover, for *H*(*α*) we have

which follows from

Plugging this into (10), we get

With this, we can write for the whole *B*_{2k} term from Remark 1

because \(\left (\frac {n}{k}\right)^{k}\leq \left ({n \atop k}\right)\). Similarly, with \(\left ({n \atop k}\right)<\left (\frac {ne}{k}\right)^{k}\), we have for *B*_{2k}

if we use

Thus, we have

and the lemma easily follows by multiplying *B*_{2k} with the term *AD*. □
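The two elementary bounds on the binomial coefficient used in the proofs of Lemmas 4 and 7, \(\left(\frac{n}{k}\right)^{k}\leq \binom{n}{k}<\left(\frac{ne}{k}\right)^{k}\) for 1≤*k*≤*n*, can be spot-checked numerically. The sketch below is only a sanity check over a finite range, not a proof; the function name and the small relative tolerance guarding against floating-point rounding are assumptions of the example.

```python
from math import comb, e


def bounds_hold(n, k, rel_tol=1e-12):
    """Check (n/k)^k <= C(n, k) < (n*e/k)^k for 1 <= k <= n, allowing for float rounding."""
    c = comb(n, k)
    lower = (n / k) ** k
    upper = (n * e / k) ** k
    return lower <= c * (1 + rel_tol) and c < upper


violations = [(n, k) for n in range(1, 61) for k in range(1, n + 1) if not bounds_hold(n, k)]
print("violations:", violations)
```

The lower bound is attained with equality for *k*=1 and *k*=*n*, which is why the comparison allows a tiny tolerance.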

**Corollary 2**

Let *p*_{1},*p*_{2}∈(0,1) be some real constants. Let *n,N* be some non-zero natural numbers such that

Let *A*_{2k},*B*_{2k}, *k*=0,...,*N*, and *A*_{2N+1} be terms from Remark 1. Furthermore, let *R*=*p*_{2}/(1−*p*_{2}), and let

Then, it holds

*Proof*

Follows from Lemmas 4, 5 and 7. □

**Lemma 8**

Let *p*_{1},*p*_{2}∈(0,1) be some real constants and *n* some non-zero natural number. Let

Then, it holds

*Proof*

Straightforward by binomial theorem. □

**Lemma 9**

Let *p*_{1},*p*_{2}∈(0,1) be some real constants and *n* some non-zero natural number. Let

Then, it holds

*Proof*

Straightforward by Lemma 1 and binomial theorem. □

### Proof of Theorem 2

The variance of the random variable *Z*_{1} can be calculated as

\[var(Z_{1}) = E\left(Z_{1}^{2}\right) - E^{2}(Z_{1}).\]

By Theorem 1, we have

So, we only need to determine the value of \(E\left (Z_{1}^{2}\right)\). From the definition of the expected value, we have

where

By application of Lemma 2 to *V*_{2}, we obtain

By using the equality

and adjustment of the summation borders, we get

Next, we split the term *W* according to powers of *b*, thus obtaining

where

If we set

then we can write

where

and by Corollary 2 (*S*_{1}) and Lemmas 8 (*S*_{2}) and 9 (*S*_{3}) we get

The rest of the proof follows from adding the term −*E*^{2}(*Z*_{1}) to the derived expression for \(E\left (Z_{1}^{2}\right)\), separating the term for *k*=0 from the rest of the sum, and simple rearrangement of the resulting terms. □

## References

Aho, K, Bowyer, RT: Confidence intervals for ratios of proportions: implications for selection ratios. Methods Ecol. Evol. 6(2), 121–132 (2015).

Alghamdi, N: Confidence intervals for ratios of multinomial proportions (2015). Master’s thesis, University of Nebraska at Omaha.

Basu, A, Lochner, RH: On the distribution of the ratio of two random variables having generalized life distributions. Technometrics. 13(2), 281–287 (1971).

Bonett, DG, Price, RM: Confidence intervals for a ratio of binomial proportions based on paired data. Stat. Med. 25(17), 3039–3047 (2006).

Chiu, RW, Chan, KA, Gao, Y, Lau, VY, Zheng, W, Leung, TY, Foo, CH, Xie, B, Tsui, NB, Lun, FM, et al: Noninvasive prenatal diagnosis of fetal chromosomal aneuploidy by massively parallel genomic sequencing of DNA in maternal plasma. Proc. Natl. Acad. Sci. 105(51), 20458–20463 (2008).

Culley, TM, Wallace, LE, Gengler-Nowak, KM, Crawford, DJ: A comparison of two methods of calculating GST, a genetic measure of population differentiation. Am. J. Bot. 89(3), 460–465 (2002).

Fieller, E: The distribution of the index in a normal bivariate population. Biometrika. 24, 428–440 (1932).

Frishman, F: On the arithmetic means and variances of products and ratios of random variables (1971). Technical report, DTIC Document.

Geary, R: The frequency distribution of the quotient of two normal variates. J. R. Stat. Soc. 93(3), 442–446 (1930).

Goodman, LA: On simultaneous confidence intervals for multinomial proportions. Technometrics. 7(2), 247–254 (1965).

Graham, RL, Knuth, DE, Patashnik, O: Concrete Mathematics: A Foundation for Computer Science, 2nd edn, p. 492, Exercise 42. Addison-Wesley Longman Publishing Co., Inc., Boston (1994).

Hinkley, DV: On the ratio of two correlated normal random variables. Biometrika. 56(3), 635–639 (1969).

Koopman, P: Confidence intervals for the ratio of two binomial proportions. Biometrics. 40, 513–517 (1984).

Korhonen, PJ, Narula, SC: The probability distribution of the ratio of the absolute values of two normal variables. J. Stat. Comput. Simul. 33(3), 173–182 (1989).

Lau, TK, Chen, F, Pan, X, Pooh, RK, Jiang, F, Li, Y, Jiang, H, Li, X, Chen, S, Zhang, X: Noninvasive prenatal diagnosis of common fetal chromosomal aneuploidies by maternal plasma DNA sequencing. J. Matern. Fetal Neonatal Med. 25(8), 1370–1374 (2012).

Marsaglia, G: Ratios of normal variables. J. Stat. Softw. 16(4), 1–10 (2006).

Mekic, E, Sekulovic, N, Bandjur, M, Stefanovic, M, Spalevic, P: The distribution of ratio of random variable and product of two random variables and its application in performance analysis of multi-hop relaying communications over fading channels. Przegl. Elektrotechniczny. 88(7A), 133–137 (2012).

Minarik, G, Repiska, G, Hyblova, M, Nagyova, E, Soltys, K, Budis, J, Duris, F, Sysak, R, Bujalkova, MG, Vlkova-Izrael, B, et al: Utilization of benchtop next generation sequencing platforms Ion Torrent PGM and MiSeq in noninvasive prenatal testing for chromosome 21 trisomy and testing of impact of in silico and physical size selection on its analytical performance. PLoS ONE. 10(12), 0144811 (2015).

Nadarajah, S: On the product and ratio of Laplace and Bessel random variables. J. Appl. Math. 2005(4), 393–402 (2005).

Nadarajah, S, Kotz, S: On the ratio of Pearson type VII and Bessel random variables. Adv. Decis. Sci. 2005(4), 191–199 (2005).

Nadarajah, S, Kotz, S: On the product and ratio of gamma and Weibull random variables. Econ. Theory. 22(2), 338–344 (2006).

Nelson, W: Statistical methods for the ratio of two multinomial proportions. Am. Stat. 26(3), 22–27 (1972).

Pham-Gia, T: Distributions of the ratios of independent beta variables and applications. Commun. Stat. Theory Methods. 29(12), 2693–2715 (2000).

Piegorsch, WW, Richwine, KA: Large-sample pairwise comparisons among multinomial proportions with an application to analysis of mutant spectra. J. Agric. Biol. Environ. Stat. 6(3), 305–325 (2001).

Piper, J, Rutovitz, D, Sudar, D, Kallioniemi, A, Kallioniemi, O-P, Waldman, FM, Gray, JW, Pinkel, D: Computer image analysis of comparative genomic hybridization. Cytometry. 19(1), 10–26 (1995).

Poorter, H, Garnier, E: Plant growth analysis: an evaluation of experimental design and computational methods. J. Exp. Bot. 47(9), 1343–1351 (1996).

Press, SJ: The t-ratio distribution. J. Am. Stat. Assoc. 64(325), 242–252 (1969).

Price, RM, Bonett, DG: Confidence intervals for a ratio of two independent binomial proportions. Stat. Med. 27(26), 5497–5508 (2008).

Provost, S: On the distribution of the ratio of powers of sums of gamma random variables. Pak. J. Stat. 5, 157–174 (1989).

Quesenberry, CP, Hurst, D: Large sample simultaneous confidence intervals for multinomial proportions. Technometrics. 6(2), 191–195 (1964).

Sakamoto, H: On the distributions of the product and the quotient of the independent and uniformly distributed random variables. Tohoku Math. J. First Ser. 49, 243–260 (1943).

Sehnert, AJ, Rhees, B, Comstock, D, de Feo, E, Heilek, G, Burke, J, Rava, RP: Optimal detection of fetal chromosomal abnormalities by massively parallel DNA sequencing of cell-free fetal DNA from maternal blood. Clin. Chem. 57(7), 1042–1049 (2011).

Van Kempen, G, Van Vliet, L: Mean and variance of ratio estimators used in fluorescence ratio imaging. Cytometry. 39(4), 300–305 (2000).

## Acknowledgements

This contribution is the result of implementation of the project *REVOGENE - Research centre for molecular genetics* (ITMS 26240220067) supported by the Research & Developmental Operational Programme funded by the European Regional Development Fund.

## Author information

### Contributions

All authors contributed equally to the research. FD wrote the manuscript. JG prepared the figures. All authors read and approved the final manuscript.

### Corresponding author

Correspondence to Frantisek Duris.

## Ethics declarations

### Competing interests

The authors declare that they have no competing interests.

### Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Additional file

### Additional file 1

A script written in language R to perform custom numerical simulations and produce graphical output. (R 10 kb)

## Rights and permissions

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

## About this article

### Cite this article

Duris, F., Gazdarica, J., Gazdaricova, I. *et al.* Mean and variance of ratios of proportions from categories of a multinomial distribution. *J Stat Distrib App* **5**, 2 (2018). doi:10.1186/s40488-018-0083-x

### Keywords

- Multinomial distribution
- Ratio distribution
- Mean
- Variance