Chi-p distribution: characterization of the goodness of the fitting using L p norms
© Livadiotis; licensee Springer. 2014
Received: 6 June 2013
Accepted: 24 December 2013
Published: 11 June 2014
This paper (1) derives the Chi-p distribution, i.e., the analog of the Chi-square distribution for datasets that follow the General Gaussian distribution of shape p, and (2) develops the statistical test for characterizing the goodness of the fitting with L p norms. It is shown that the statistical test has a double role when the fitting method is induced by the L p norms. For a given shape parameter p, the test is rated based on the estimated p-value, and a convenient characterization of the fitting rate is developed. For an unknown shape parameter, if the fitting is expected to be good, those L p norms that correspond to unlikely p-values are rejected, with preference given to the norms that maximize the p-value. The statistical test methodology is followed by an illuminating application.
The least squares method, based on the Euclidean norm (p = 2), and the least absolute deviations method, based on the “Taxicab” norm (p = 1), are special cases of the general fitting methods based on the L p norms (see Burden and Faires 1993; for more applications of the fitting methods based on L p norms, see Sengupta 1984; Livadiotis and Moussas 2007; Livadiotis 2008, 2012; for fitting methods based on other effect sizes, e.g., correlation, see Livadiotis and McComas 2013a).
The goodness of the least squares fitting is typically measured using the estimated Chi-square value, that is, the least squared value. This is then compared with the Chi-square distribution to examine whether such a value is frequent or not (see next sections). However, this test applies only to datasets that follow the normal distribution. There is no similar test for cases where the dataset follows the General Gaussian distribution of shape p (see Section 2 and Appendix A). Livadiotis (2012) showed the connection between the fitting with L p norms, as in Eq. (3), and datasets that follow the General Gaussian distributions.
The purpose of this paper is to (1) construct the formulation of the Chi-p distribution, the analog of Chi-square distribution but for datasets that follow the General Gaussian distribution of shape p, and (2) develop the statistical test for characterizing the goodness of the fitting with L p norms, which corresponds to datasets that follow the General Gaussian distribution of shape p. Therefore, in Section 2, we revisit the Chi-square derivation, and following similar steps, we construct the Chi-p distribution. In Section 3, we develop the statistical test for characterizing the goodness of the fitting with L p norms, using the Chi-p distribution and the p-value. In Section 4, we provide an application of the statistical test. Finally, in Section 5, we summarize the conclusions. Appendix A briefly describes the General Gaussian distribution, while Appendix B shows the mathematical derivation of the surface of the sphere of higher dimensions in L p space.
2. Chi-p distribution
The estimated value of the Chi-square for a fitting is given by the minimum, at α = α*, of the function χ²(α) = TSD(α)², as shown in Eq. (1) (least squares). Considering that the Chi-square minimum, χ²(α*), is attributed equally to all the M = N − 1 degrees of freedom (for N data points), each of them contributes to this minimum by a factor of χ²(α*)/M. This is the estimated value of the reduced Chi-square. For multi-parametrical fitting (Livadiotis 2007) with n free parameters, the degrees of freedom are M = N − n. In general, the Chi-square distribution in Eq. (5) refers to M degrees of freedom.
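The bookkeeping of degrees of freedom can be sketched in a few lines of Python. The snippet assumes the usual weighted form of the Chi-square, i.e., the sum of squared residuals divided by the squared uncertainties; Eq. (1) is not reproduced here, so this form, the function names, and the sample numbers are assumptions.

```python
def chi_square(ys, model_values, sigmas):
    """Sum of squared, uncertainty-normalized residuals (assumed form)."""
    return sum(((y - v) / s) ** 2 for y, v, s in zip(ys, model_values, sigmas))


def reduced_chi_square(ys, model_values, sigmas, n_params):
    """Chi-square divided by the degrees of freedom M = N - n."""
    dof = len(ys) - n_params
    if dof <= 0:
        raise ValueError("need more data points than free parameters")
    return chi_square(ys, model_values, sigmas) / dof


ys = [1.0, 2.0, 3.0, 4.0]      # hypothetical measurements
fitted = [2.5] * len(ys)       # constant model fitted by least squares
sigmas = [1.0] * len(ys)       # hypothetical unit uncertainties
print(reduced_chi_square(ys, fitted, sigmas, n_params=1))  # 5.0 / 3
```

Here N = 4 and n = 1, so M = 3, and the Chi-square minimum of 5.0 is shared among three degrees of freedom.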
Proposition 1: The L p -normed mean of the distribution (6) is < x > p = μ, ∀ p ≥ 1.
Proposition 2: The L p -normed variance of the distribution (6) is , ∀ p ≥ 1.
The proofs of the two Propositions are shown in Appendix A.
Lemma 1: The surface of the N-dimensional sphere of unit radius in L p space is given by Eq. (8).
Proof of Theorem 1. The distribution of Chi-p can be derived as follows. The normalization of the joint distribution function of all the data is given by Eq. (11).
In general, for M degrees of freedom, the Chi-p distribution is given by Eq. (10).
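The structure of such a distribution can be illustrated under a simplified parametrization. If each normalized deviation z has density proportional to exp(−|z|^p), then |z|^p follows a Gamma distribution of shape 1/p and unit scale, so the sum X of M independent terms follows a Gamma distribution of shape M/p. The sketch below uses this unit-scale convention, which is an assumption and may differ from Eq. (10) by a scale constant; the function names are hypothetical.

```python
import math


def chi_p_density(x, dof, p):
    """Gamma(shape = dof / p, scale = 1) density.

    Assumed convention: deviations have density proportional to
    exp(-|z|^p); the paper's scaling constant is not reproduced here.
    """
    if x <= 0.0:
        return 0.0
    k = dof / p
    return math.exp((k - 1.0) * math.log(x) - x - math.lgamma(k))


def chi_p_mode(dof, p):
    """Location of the density maximum (k - 1 for k > 1, else 0)."""
    return max(dof / p - 1.0, 0.0)


def chi_p_mean(dof, p):
    """Mean of the Gamma(dof / p, 1) distribution."""
    return dof / p


# For p = 2 this X equals half of the usual Chi-square variable.
print(chi_p_mode(10, 2), chi_p_mean(10, 2))
```

Under this convention the maximum of the density lies to the left of the mean whenever M/p > 1, which mirrors the asymmetry of the Chi-square distribution.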
3. Statistical test of a fitting
We begin with the established method of Chi-square, and then we will proceed to the generalized method of Chi-p.
(see some applications in Livadiotis and McComas 2013b; Frisch et al. 2013; Funsten et al. 2013). Note that the maximum p-value is 0.5, and this corresponds to the estimated Chi-square. This is larger than the Chi-square that maximizes the distribution. Hence, the Chi-square that corresponds to p-value = 0.5 is always located to the right of the maximum.
The statistical test of the fitting for the evaluation of its goodness comes from the null hypothesis that the given data are described by the fitted statistical model. If the derived p-value is smaller than the significance level of ~0.05, then the hypothesis is typically rejected, and the hypothesis that the data are described by the examined statistical model is characterized as unlikely.
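The test can be sketched in Python under one assumption: the p-value is taken as the smaller of the two tail probabilities of the Chi-square distribution, which is consistent with the remark that its maximum is 0.5. The cumulative distribution is evaluated with a standard series for the regularized incomplete gamma function; the function names are hypothetical.

```python
import math


def regularized_lower_gamma(a, x, max_terms=10000, eps=1e-14):
    """P(a, x) from the series
    x^a e^(-x) / Gamma(a + 1) * sum_{n>=0} x^n / ((a + 1)...(a + n))."""
    if x <= 0.0:
        return 0.0
    term = 1.0
    total = 1.0
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if term < eps * total:
            break
    log_prefactor = a * math.log(x) - x - math.lgamma(a + 1.0)
    return min(1.0, total * math.exp(log_prefactor))


def chi_square_cdf(chi2, dof):
    """Probability that a Chi-square variable with dof degrees is <= chi2."""
    return regularized_lower_gamma(0.5 * dof, 0.5 * chi2)


def p_value(chi2, dof):
    """Smaller of the two tail probabilities (assumed definition)."""
    lower = chi_square_cdf(chi2, dof)
    return min(lower, 1.0 - lower)


dof = 10
mode = dof - 2  # the Chi-square value that maximizes the distribution
print(chi_square_cdf(mode, dof))  # below 0.5: the median lies right of the mode
print(p_value(25.0, dof) < 0.05)  # a large Chi-square falls below the 0.05 level
```

The series converges for any positive argument but needs more terms as the Chi-square value grows, so a continued-fraction form would be preferable for very large values.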
Table 1 Testing rates and characterizations
p-value                T
p ~ 0                  T ~ -1
0 < p < 0.005          -1 < T < -0.5
0.005 ≤ p < 0.05       -0.5 ≤ T < 0
0.05 ≤ p < 0.19        0 ≤ T < 0.5
0.19 ≤ p < 0.5         0.5 ≤ T < 1
p ~ 0.5                T ~ 1
Note that the maximum p-value = 0.5 corresponds to the estimated Chi-square. This is larger than the Chi-square that maximizes the distribution. Hence, we again find that the value corresponding to p-value = 0.5 is located to the right of the maximum.
The statistical test has a double role in the case of L p norms. If the shape parameter p is known, then the test can be rated by deriving the p-value and consulting Table 1. If the shape parameter is unknown and the fitting is expected to be good, then all the shape values p that correspond to unlikely p-values can be rejected. In fact, the largest p-value corresponds to the most likely shape parameter p of the examined data. These are shown in the following applications.
[Table: ratio of umbral area to whole sunspot area, f i (%); the tabulated values are not available here]
The p-value has a minimum at p ~ 2.08 and increases for larger shape values p until p ~ 5.77, where it reaches p-value ~ 0.5 (not shown in the figure). If the shape p of the dataset is known, e.g., p = 2, then the null hypothesis is rejected, i.e., the sunspot area ratio data depend on the heliolatitude. On the other hand, if the data are expected to be invariant with the heliolatitude, and thus the null hypothesis is expected to be accepted, then all the norms between p1 ~ 1.7 and p2 ~ 2.5 are rejected, and the norm L p with p ~ 5.77 better characterizes these data points; the respective mean value is α p (5.77) ~ 0.164. Therefore, if we know the shape/norm p that characterizes the data, we can proceed to rate the goodness of the fitting. If p is unknown, we can at least detect those values of p for which the null hypothesis is accepted or rejected.
This is shown in Figure 5(b), where the peak is at p ≅ 2.95 ± 0.08. Therefore, the p-value is maximized at the same value of p-norm as the shape of the General Gaussian distribution.
This paper (1) presented the derivation of the Chi-p distribution, the analog of Chi-square distribution but for datasets that follow the General Gaussian distribution of shape p, and (2) developed the statistical test for characterizing the goodness of the fitting with L p norms, which corresponds to datasets that follow the General Gaussian distribution of shape p.
It was shown that the statistical test has a double role in the case of L p norms: (1) If the shape parameter p is fixed and known, then the test can be rated by deriving the p-value. A convenient characterization of the fitting rate was developed. (2) If the shape parameter is unknown and the fitting is expected to be good for some shape parameter value p, a method for estimating p was given: fit a General Gaussian distribution of shape p to the data, and then use the estimated shape parameter p in the Chi-p distribution to characterize the goodness of the fitting. In particular, all the shape values p that correspond to unlikely p-values can be rejected, while the largest p-value corresponds to the most likely shape parameter p of the examined data. This was verified by an illuminating example where the method of the fitting based on L p norms was applied.
Appendix A: General Gaussian distribution
Proposition 1: Given the distribution (6), we have that the L p -normed mean is < x > p = μ, ∀ p ≥ 1.
Proof. We have(A3)
Proposition 2: Given the distribution (6), we have that the L p -normed variance is , ∀ p ≥ 1.
Proof. We have , i.e.,(A5a)
Appendix B: Surface of the N-dimensional sphere in L p space, Βp,N
Lemma 1: The surface of the N-dimensional sphere of unit radius in L p space, Β p,N, is given by Eq. (8). It is used in the proof of the Chi-p distribution (10), as shown below.
Proof of Lemma 1.
Finally, we end up with Eq. (B4).
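As a numerical cross-check of this kind of result, the sketch below evaluates the surface of the unit sphere in L p space from the known volume of the unit L p ball, V = (2Γ(1 + 1/p))^N / Γ(1 + N/p). It assumes that "surface" denotes the coefficient B in dV = B r^(N−1) dr, so that B = N·V; whether this matches the exact form of Eq. (8) is an assumption, and the function names are hypothetical.

```python
import math


def lp_ball_volume(n, p):
    """Volume of the unit ball {sum |x_i|^p <= 1} in n dimensions."""
    return (2.0 * math.gamma(1.0 + 1.0 / p)) ** n / math.gamma(1.0 + n / p)


def lp_sphere_surface(n, p):
    """Coefficient B in dV = B r^(n-1) dr, i.e., n times the unit volume."""
    return n * lp_ball_volume(n, p)


print(lp_sphere_surface(2, 2))  # Euclidean circle: 2*pi
print(lp_sphere_surface(3, 2))  # Euclidean sphere: 4*pi
print(lp_sphere_surface(2, 1))  # "Taxicab" case: 4
```

For p = 2 the familiar Euclidean values are recovered. For p = 1 and N = 2 the result is 4, which differs from the Euclidean perimeter of the same diamond, because the quantity is defined through the radial derivative of the volume.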
- Adèr HJ: Modelling (Chapter 12). In Advising on Research Methods: A consultant’s companion. Edited by: Adèr HJ, Mellenbergh GJ, with contributions by Hand DJ. Huizen, The Netherlands: Johannes van Kessel Publishing; 2008:271–304.
- Burden RL, Faires JD: Numerical Analysis. Boston, MA: PWS Publishing Company; 1993:437–438.
- Edwards AWF: The proportion of umbra in large sunspots, 1878–1954. The Observatory 1957, 77: 69–70.
- Frisch PC, Bzowski M, Livadiotis G, McComas DJ, Möbius E, Mueller HR, Pryor WR, Schwadron NA, Sokól JM, Vallerga JV, Ajello JM: Decades-long changes of the interstellar wind through our solar system. Science 2013, 341: 1080. doi:10.1126/science.1239925
- Funsten HO, Frisch PC, Heerikhuisen J, Higdon DM, Janzen P, Larsen BA, Livadiotis G, McComas DJ, Möbius E, Reese CS, Reisenfeld DB, Schwadron NA, Zirnstein E: The circularity of the IBEX Ribbon of enhanced energetic neutral atom flux. Astrophys. J. 2013, 776: 30. doi:10.1088/0004-637X/776/1/30
- Livadiotis G: Approach to general methods for fitting and their sensitivity. Physica A 2007, 375: 518–536. doi:10.1016/j.physa.2006.09.027
- Livadiotis G: Approach to the block entropy modeling and optimization. Physica A 2008, 387: 2471–2494. doi:10.1016/j.physa.2008.01.002
- Livadiotis G: Expectation values and Variance based on L p norms. Entropy 2012, 14: 2375–2396. doi:10.3390/e14122375
- Livadiotis G, McComas DJ: Fitting method based on correlation maximization: Applications in Astrophysics. J. Geophys. Res. 2013a, 118: 2863–2875.
- Livadiotis G, McComas DJ: Evidence of large scale phase space quantization in plasmas. Entropy 2013b, 15: 1116–1132.
- Livadiotis G, Moussas X: The sunspot as an autonomous dynamical system: A model for the growth and decay phases of sunspots. Physica A 2007, 379: 436–458. doi:10.1016/j.physa.2007.02.003
- McCullagh P: What is statistical model? Ann. Stat. 2002, 30: 1225–1310.
- Melissinos AC: Experiments in Modern Physics. London, UK: Academic Press Inc; 1966:464–467.
- Sengupta A: A rational function approximation of the singular eigenfunction of the monoenergetic neutron transport equation. J. Phys. A 1984, 17: 2743–2758. doi:10.1088/0305-4470/17/14/018
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.