Copula (statistics)


In statistics, a copula is used as a general way of formulating a multivariate distribution in such a way that various general types of dependence can be represented [1]. Other ways of formulating multivariate distributions include conceptually-based approaches in which the real-world meaning of the variables is used to imply what types of relationships might occur. In contrast, the approach via copulas might be considered as being more raw, but it does allow much more general types of dependencies to be included than would usually be invoked by a conceptual approach.

The approach to formulating a multivariate distribution using a copula is based on the idea that a simple transformation can be made of each marginal variable in such a way that each transformed marginal variable has a uniform distribution. When applied in a practical context, such transformations might be fitted as an initial step for each margin, or the parameters of the transformations might be fitted jointly with those of the copula.

There are many families of copulas which differ in the detail of the dependence they represent. A family will typically have several parameters which relate to the strength and form of the dependence. However, it is possible to specify a dependency structure and for a copula to emerge using a conditioning technique such as the D distribution.


The basic idea

Consider two random variables X and Y and, for simplicity, assume that their cumulative distribution functions FX and FY are continuous and strictly increasing. The random variables can be transformed without losing information, provided the transformation is invertible. For example, one can define X' = 2X, Y' = Y/4, or X' = X + 3, Y' = exp(Y), and so on. The pair (X',Y') then carries the same information as (X,Y), since one can go from one to the other by the transformation and its inverse.

Among all possible transformations, consider the functions FX and FY in particular, and define X' = FX(X) and Y' = FY(Y). It is a basic result of probability that transforming a random variable through its own continuous cumulative distribution function yields a uniform random variable on [0, 1]. It follows that X' and Y' are uniform. Since the transforms are invertible, X' has the same information as X, and Y' the same as Y. Hence specifying the dependence between X and Y is, in a way, the same as specifying the dependence between X' and Y'.

Because X' and Y' are uniform random variables, the problem reduces to specifying a bivariate distribution with uniform margins, i.e. a copula. So the idea is to remove the specific marginal distributions by transforming the margins to uniforms, in a way that preserves information, and then to specify the dependence as a multivariate distribution on the uniforms.
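The transformation through the cumulative distribution function can be checked numerically. The following is a minimal sketch, assuming an exponential margin with rate 1 (an illustrative choice, not taken from the text); the function names are hypothetical.

```python
import math
import random

def exponential_cdf(x, rate=1.0):
    """CDF of an exponential distribution: F_X(x) = 1 - exp(-rate * x)."""
    return 1.0 - math.exp(-rate * x)

def probability_integral_transform(samples, cdf):
    """Map each sample through its own CDF, giving values in [0, 1]."""
    return [cdf(x) for x in samples]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
xs = [rng.expovariate(1.0) for _ in range(20000)]
us = probability_integral_transform(xs, exponential_cdf)

# A uniform variable on [0, 1] has mean 1/2 and variance 1/12,
# so the sample moments of the transformed values should be close to those.
mean_u = sum(us) / len(us)
var_u = sum((u - mean_u) ** 2 for u in us) / len(us)
```

The same check should work for any margin with a continuous, strictly increasing distribution function; only `exponential_cdf` and the sampler would change.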

Definition

A copula is a multivariate joint distribution defined on the n-dimensional unit cube [0, 1]^n such that every marginal distribution is uniform on the interval [0, 1].

Specifically, C:[0,1]^n\to [0,1] is an n-dimensional copula (briefly, n-copula) if:

 C\left(\mathbf u\right)=0 whenever \mathbf u\in [0,1]^n has at least one component equal to 0;
 C\left(\mathbf u\right)=u_i whenever \mathbf u\in [0,1]^n has all the components equal to 1 except the ith one, which is equal to u_i;
 C\left(\mathbf u\right) is n-increasing, i.e., for each B=\times_{i=1}^{n}[x_i,y_i]\subseteq [0,1]^n,
 V_{C}\left( B\right):=\sum_{\mathbf z\in \times_{i=1}^{n}\{x_i,y_i\}} (-1)^{N(\mathbf z)} C(\mathbf z)\ge 0,

where N(\mathbf z)=\operatorname{card}\{k\mid z_k=x_k\}. V_{C}\left( B\right) is the so-called C-volume of B.
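The C-volume can be computed directly from its definition by summing over the 2^n vertices of the box. The following is a minimal sketch; the names c_volume and independence_copula are hypothetical, and the product copula is used only as a convenient test case.

```python
from itertools import product

def c_volume(copula, lower, upper):
    """C-volume of the box [lower_1, upper_1] x ... x [lower_n, upper_n].

    Sums C over the 2^n vertices of the box with sign (-1)^N(z), where
    N(z) counts the coordinates of z that sit at the lower endpoint.
    """
    total = 0.0
    for picks in product((0, 1), repeat=len(lower)):
        # picks[k] == 0 selects the lower endpoint x_k, 1 selects y_k
        vertex = [upper[k] if p else lower[k] for k, p in enumerate(picks)]
        n_lower = picks.count(0)
        total += (-1) ** n_lower * copula(vertex)
    return total

def independence_copula(u):
    """Product copula C(u) = u_1 * ... * u_n (used here as a test case)."""
    result = 1.0
    for component in u:
        result *= component
    return result

# For the product copula the C-volume of a box is its ordinary volume:
# here (0.5 - 0.2) * (0.4 - 0.1).
volume = c_volume(independence_copula, [0.2, 0.1], [0.5, 0.4])
```

A function C that returned a negative value from c_volume for some box would fail the n-increasing condition and so would not be a copula.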

Sklar's theorem

The theorem proposed by Sklar [2] underlies most applications of the copula. Sklar's theorem states that given a joint distribution function H for p variables, and respective marginal distribution functions, there exists a copula C such that the copula binds the margins to give the joint distribution.

For the bivariate case, Sklar's theorem can be stated as follows. For any bivariate distribution function H(x, y), let F(x) = H(x, ∞) and G(y) = H(∞, y) be the univariate marginal distribution functions, where H(x, ∞) denotes the limit of H(x, y) as y → ∞. Then there exists a copula C such that

H(x,y)=C(F(x),G(y))\,

(where we have identified the distribution C with its cumulative distribution function). Moreover, if the marginal distribution functions F and G are continuous, the copula C is unique. Otherwise, C is uniquely determined only on the product of the ranges of F and G.
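Read in the other direction, the formula H(x, y) = C(F(x), G(y)) gives a recipe for building a joint distribution from a copula and two margins. The following is a minimal sketch, assuming a Clayton copula with a negative parameter, a standard normal margin for X and an exponential margin for Y; all three choices are illustrative and not prescribed by the text.

```python
import math
from statistics import NormalDist

def clayton_copula(u, v, theta=-2.0):
    """Clayton copula (u^theta + v^theta - 1)^(1/theta) with theta < 0."""
    if u == 0.0 or v == 0.0:
        return 0.0  # a copula vanishes when any argument is 0
    return (u ** theta + v ** theta - 1.0) ** (1.0 / theta)

def F(x):
    """Marginal CDF of X: standard normal (assumed for illustration)."""
    return NormalDist().cdf(x)

def G(y):
    """Marginal CDF of Y: exponential with rate 1 (assumed for illustration)."""
    return 1.0 - math.exp(-y) if y > 0 else 0.0

def H(x, y, copula=clayton_copula):
    """Joint CDF assembled as H(x, y) = C(F(x), G(y))."""
    return copula(F(x), G(y))

# Letting one argument grow large recovers the other margin.
margin_x = H(0.3, 50.0)   # close to F(0.3)
margin_y = H(40.0, 0.7)   # close to G(0.7)
```

Swapping in a different copula changes the dependence between X and Y while leaving both margins untouched, which is the practical content of the theorem.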

Fréchet–Hoeffding copula boundaries

Graphs of the Fréchet–Hoeffding copula limits and of the independence copula (in the middle).

Minimum copula: This is the lower bound for all copulas. In the bivariate case only, it represents perfect negative dependence between variates.

 W(u,v) = \max(0,u+v-1).\,

For n-variate copulas, the lower bound is given by

 W(u_1,\ldots,u_n) := \max\left\{1-n+\sum\limits_{i=1}^n {u_i} , 0 \right\} \leq C(u_1,\ldots,u_n).

Maximum copula: This is the upper bound for all copulas. It represents perfect positive dependence between variates:

 M(u,v) = \min(u,v).\,

For n-variate copulas, the upper bound is given by

C(u_1,\ldots,u_n)\le \min_{j \in \{1,\ldots,n\}} u_j =: M(u_1,\ldots,u_n).

Conclusion: For all copulas C(u, v),

 W(u,v) \le C(u,v) \le M(u,v).

In the multivariate case, the corresponding inequality is

 W(u_1,\ldots,u_n) \le C(u_1,\ldots,u_n) \le M(u_1,\ldots,u_n).
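The bounds are easy to check numerically for a given copula. The following is a minimal sketch that evaluates W and M from the formulas above and confirms, on a coarse grid in three dimensions, that the product copula lies between them; the function names and the grid are illustrative assumptions.

```python
def lower_bound(u):
    """W(u_1, ..., u_n) = max(1 - n + sum(u_i), 0)."""
    return max(1 - len(u) + sum(u), 0.0)

def upper_bound(u):
    """M(u_1, ..., u_n) = min(u_1, ..., u_n)."""
    return min(u)

def independence_copula(u):
    """Product copula, used here as the copula being checked."""
    result = 1.0
    for component in u:
        result *= component
    return result

grid = [i / 10 for i in range(11)]
tolerance = 1e-12  # allowance for floating-point rounding

# Collect every grid point where W <= C <= M fails; the list should be empty.
violations = [
    (u, v, w)
    for u in grid for v in grid for w in grid
    if not (lower_bound((u, v, w)) - tolerance
            <= independence_copula((u, v, w))
            <= upper_bound((u, v, w)) + tolerance)
]
```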

Families of copulas

Gaussian copula

Cumulative distribution and probability density functions of Gaussian copula with ρ = 0.4

One example of a copula often used for modelling in finance is the Gaussian copula, which is constructed from the bivariate normal distribution via Sklar's theorem. For X and Y distributed as standard bivariate normal with correlation ρ the Gaussian copula function is

 C_\rho(U,V) = \Phi_{\rho} \left(\Phi^{-1}(U), \Phi^{-1}(V) \right)

where U = U(X) and V = V(Y) are the values of the marginal cumulative distribution functions, which lie in (0,1), Φ denotes the standard normal cumulative distribution function, and Φ_ρ denotes the joint cumulative distribution function of the standard bivariate normal with correlation ρ. These are "percentile to percentile transformations", where X and Y are transformed into

A = \Phi^{-1}(U) and B = \Phi^{-1}(V)

Differentiating C with respect to both arguments yields the copula density:

 c_\rho(U,V) = \frac{\varphi_{\rho} (\Phi^{-1}(U), \Phi^{-1}(V) )}
{\varphi(\Phi^{-1}(U)) \varphi(\Phi^{-1}(V))}

where φ is the standard normal density function and

 \varphi_{\rho}(x,y) = \frac{1}{2 \pi\sqrt{1-\rho^2}} \exp \left (-
\frac{1}{2(1-\rho^2)}  \left [{x^2+y^2} -2\rho xy  \right ] \right )

is the density function for the standard bivariate Gaussian with Pearson's product moment correlation coefficient ρ.
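The Gaussian copula density is the ratio of the bivariate normal density to the product of the two standard normal densities, both evaluated at the normal quantiles of U and V. The following is a minimal sketch of that ratio, together with one common way of sampling from the copula (correlate two standard normals, then apply the normal distribution function to each); the function names are hypothetical and the sampling recipe is an assumption not described in the text.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1

def gaussian_copula_density(u, v, rho):
    """Density c_rho(u, v) for 0 < u, v < 1 and -1 < rho < 1."""
    a = STD_NORMAL.inv_cdf(u)  # percentile-to-percentile transform of u
    b = STD_NORMAL.inv_cdf(v)
    # Bivariate standard normal density with correlation rho at (a, b).
    joint = math.exp(-(a * a + b * b - 2 * rho * a * b) / (2 * (1 - rho ** 2)))
    joint /= 2 * math.pi * math.sqrt(1 - rho ** 2)
    return joint / (STD_NORMAL.pdf(a) * STD_NORMAL.pdf(b))

def sample_gaussian_copula(n, rho, seed=0):
    """Draw n pairs (U, V) distributed according to the Gaussian copula."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = z1
        y = rho * z1 + math.sqrt(1 - rho ** 2) * z2  # corr(x, y) = rho
        pairs.append((STD_NORMAL.cdf(x), STD_NORMAL.cdf(y)))
    return pairs

pairs = sample_gaussian_copula(5000, rho=0.4)
```

With ρ = 0 the density is identically 1, which is the density of the independence copula.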

Archimedean copulas

Archimedean copulas are an important family of copulas, which have a simple form with properties such as associativity and admit a variety of dependence structures. Unlike elliptical copulas (e.g. Gaussian), most Archimedean copulas have closed-form expressions and are not derived from multivariate distribution functions using Sklar's theorem.

One particularly simple form of an n-dimensional copula is

 H(x_1,x_2,\dots,x_n) = \Psi^{-1}\left(\sum_{i=1}^n\Psi(F_i(x_i)) \right)\,

where Ψ is known as a generator function. Such copulas are known as Archimedean. Any generator function which satisfies the properties below is the basis for a valid copula:

\Psi(1) = 0;\qquad \lim_{x \to 0}\Psi(x) = \infty;\qquad \Psi'(x) < 0;\qquad \Psi''(x) > 0.

Product copula: Also called the independent copula, this copula has no dependence between variates. Its density function is unity everywhere.

\Psi(x) = -\ln(x); \qquad  H(x,y) = F(x)G(y).

Where the generator function is indexed by a parameter, a whole family of copulas may be Archimedean. For example:

Clayton copula:

\Psi(x) = x^{\theta} -1;\qquad \theta < 0; \qquad  H(x,y) = (F(x)^\theta+G(y)^\theta-1)^{1/\theta}.

In the limit θ → 0 of the Clayton copula, the random variables are statistically independent (at θ = 0 itself the generator vanishes identically, so independence arises only as a limiting case). The generator function approach can be extended to create multivariate copulas, by simply including more additive terms.

Gumbel copula:

\Psi(x)= (-\ln(x))^\alpha \,.

Frank copula:

\Psi(x)= -\ln\left( \frac{e^{\alpha x} -1}{e^{\alpha} -1}\right) .
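The generator construction translates almost literally into code: apply Ψ to each argument, add the results, and map the sum back through Ψ^{-1}. The following is a minimal sketch for the product, Clayton and Gumbel generators listed above; the helper archimedean and the parameter values (a negative θ for Clayton, α = 1.5 for Gumbel) are illustrative assumptions, and the inverse generators are written out by hand.

```python
import math

def archimedean(generator, generator_inverse):
    """Return C(u_1, ..., u_n) = psi^{-1}(psi(u_1) + ... + psi(u_n))."""
    def copula(*u):
        if min(u) == 0.0:
            return 0.0  # psi(0) is infinite, so handle the boundary directly
        return generator_inverse(sum(generator(x) for x in u))
    return copula

# Product (independence) copula: psi(x) = -ln(x), psi^{-1}(t) = exp(-t).
product_copula = archimedean(lambda x: -math.log(x), lambda t: math.exp(-t))

# Clayton copula with a negative theta:
# psi(x) = x^theta - 1, psi^{-1}(t) = (1 + t)^(1/theta).
theta = -2.0
clayton = archimedean(lambda x: x ** theta - 1.0,
                      lambda t: (1.0 + t) ** (1.0 / theta))

# Gumbel copula: psi(x) = (-ln x)^alpha, psi^{-1}(t) = exp(-t^(1/alpha)).
alpha = 1.5
gumbel = archimedean(lambda x: (-math.log(x)) ** alpha,
                     lambda t: math.exp(-(t ** (1.0 / alpha))))
```

For example, clayton(0.3, 0.6) should agree with the closed form (0.3^θ + 0.6^θ − 1)^{1/θ}, and passing three arguments instead of two gives a trivariate version by adding one more term to the sum.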

Periodic copula

Aurélien Alfonsi and Damiano Brigo (2005)[3] introduced new families of copulas based on periodic functions. They noticed that if ƒ is a 1-periodic non-negative function that integrates to 1 over [0, 1] and F is a double primitive of ƒ, then both

F(u+v)-F(u)-F(v)\text{ and }-F(u-v)+F(u)+F(-v)\,

are copula functions, the second one not necessarily exchangeable. This may be a tool to introduce asymmetric dependence, which is absent in most known copula functions.
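As a concrete instance of the first construction, the following minimal sketch takes ƒ(x) = 1 + cos(2πx), which is 1-periodic, non-negative and integrates to 1 over [0, 1]. This choice of ƒ is an illustrative assumption and is not taken from the cited paper. The integration constants of the double primitive are chosen so that F(0) = 0, which is what makes C(u, 0) = 0 hold here.

```python
import math

def f(x):
    """A 1-periodic, non-negative function that integrates to 1 over [0, 1]."""
    return 1.0 + math.cos(2.0 * math.pi * x)

def F(x):
    """A double primitive of f (F'' = f), with constants chosen so F(0) = 0."""
    return 0.5 * x * x + (1.0 - math.cos(2.0 * math.pi * x)) / (4.0 * math.pi ** 2)

def periodic_copula(u, v):
    """First construction from the text: C(u, v) = F(u + v) - F(u) - F(v)."""
    return F(u + v) - F(u) - F(v)

# Boundary behaviour expected of a copula: C(u, 0) = 0 and C(u, 1) = u.
grid = [k / 20 for k in range(21)]
boundary_errors = [
    max(abs(periodic_copula(u, 0.0)), abs(periodic_copula(u, 1.0) - u))
    for u in grid
]
```

The mixed second derivative of this C is ƒ(u + v), so the non-negativity of ƒ is what guarantees that every rectangle has non-negative C-volume.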

Empirical copulas

When analysing data with an unknown underlying distribution, one can transform the empirical data distribution into an "empirical copula" by warping the data so that the marginal distributions become uniform[1]. Mathematically, the empirical copula is calculated by


C_n\left(\frac{i}{n}, \frac{j}{n}\right) = \frac{\text{Number of pairs } (x,y) \text{ such that } x \leq x_{(i)} \text{ and } y \leq y_{(j)} }{n} \, , 1 \leq i \leq n , 1 \leq j \leq n

where x(i) represents the ith order statistic of x.

Less formally, simply replace the data along each dimension with the data ranks divided by n.
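Both descriptions can be sketched in a few lines. The following is a minimal example, assuming a small made-up data set with no ties; the function names are hypothetical.

```python
def pseudo_observations(values):
    """Replace each value by its rank divided by n (ties are not handled)."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return [r / len(values) for r in ranks]

def empirical_copula(xs, ys, i, j):
    """C_n(i/n, j/n): fraction of pairs with x <= x_(i) and y <= y_(j)."""
    n = len(xs)
    x_i = sorted(xs)[i - 1]  # i-th order statistic of x
    y_j = sorted(ys)[j - 1]  # j-th order statistic of y
    return sum(1 for x, y in zip(xs, ys) if x <= x_i and y <= y_j) / n

# Made-up sample of n = 4 pairs, for illustration only.
xs = [2.1, 0.4, 3.3, 1.7]
ys = [10.0, 30.0, 20.0, 40.0]

u_ranks = pseudo_observations(xs)
v_ranks = pseudo_observations(ys)
```

Setting i = n recovers the uniform margin: empirical_copula(xs, ys, n, j) equals j/n whenever the y values are distinct.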

Applications

Copulas are used in the pricing of collateralized debt obligations (CDOs) [4]. Dependence modelling with copula functions is widely used in financial risk assessment and actuarial analysis. They have also been applied to the database formulation for the reliability analysis of highway bridges and to various multivariate simulation studies in civil, mechanical and offshore engineering.[citation needed]

In terms of their applications to CDO pricing and the role of CDOs in the credit crisis of the late 2000s, copulas have been considered to be insufficient and simplistic tools for capturing the risks in CDO portfolios, and much research has been devoted to going beyond (inherently static) copula functions and towards dynamic loss models; see for example the book Credit Correlation: Life after Copulas[5]. The Gaussian copula model has been used in the market as a tool to obtain the correlation among the names of a portfolio as implied by synthetic single-tranche CDO prices. The way this correlation has been used assumes that a CDO tranche on a portfolio with 125 names is characterized by a single correlation parameter rather than the 125*124/2=7750 parameters one would normally expect for all possible dependences across pairs. Also, two different parts of a single CDO tranche (base tranches) would be inconsistently valued with two different Gaussian copulas. This can sometimes lead to absurd results[6]. These features, and the lack of tail dependence in the Gaussian copula model, may have contributed to the losses involved in the credit crunch, although the mixing of methodological inadequacy and human behaviour makes a precise statement difficult.

See also

  • D distribution

References

Notes

  1. ^ a b Roger B. Nelsen (1999), An Introduction to Copulas. ISBN 0-387-98623-5.
  2. ^ Sklar (1959)
  3. ^ Alfonsi, A., and D. Brigo (2005). Comm. Statist. Theory Methods 34 (2005) 1437–1447
  4. ^ Meneguzzo, David; Walter Vecchiato (Nov 2003). "Copula sensitivity in collateralized debt obligations and basket default swaps". Journal of Futures Markets 24 (1): 37–70. doi:10.1002/fut.10110. 
  5. ^ Lipton, A., and Rennie, A., Editors: Credit Correlation — Life After Copulas, World Scientific, 2007
  6. ^ Implied Correlation in CDO Tranches: A Paradigm to be Handled with Care, SSRN research paper

General

  • David G. Clayton (1978), "A model for association in bivariate life tables and its application in epidemiological studies of familial tendency in chronic disease incidence", Biometrika 65, 141–151.
  • Frees, E.W., Valdez, E.A. (1998), "Understanding Relationships Using Copulas", North American Actuarial Journal 2, 1–25.
  • Roger B. Nelsen (1999), An Introduction to Copulas. ISBN 0-387-98623-5.
  • S. Rachev, C. Menn, F. Fabozzi (2005), Fat-Tailed and Skewed Asset Return Distributions. ISBN 0-471-71886-6.
  • A. Sklar (1959), "Fonctions de répartition à n dimensions et leurs marges", Publications de l'Institut de Statistique de L'Université de Paris 8, 229-231.
  • C. Schölzel, P. Friederichs (2008), "Multivariate non-normally distributed random variables in climate research – introduction to the copula approach".
  • W.T. Shaw, K.T.A. Lee (2006), "Copula Methods vs Canonical Multivariate Distributions: The Multivariate Student T Distribution with General Degrees of Freedom".
  • Srinivas Sriramula, Devdas Menon and A. Meher Prasad (2006), "Multivariate Simulation and Multimodal Dependence Modeling of Vehicle Axle Weights with Copulas", ASCE Journal of Transportation Engineering 132 (12), 945–955. doi:10.1061/(ASCE)0733-947X(2006)132:12(945)
  • Genest, C.; MacKay, R.J. (1986), "The Joy of Copulas: Bivariate Distributions with Uniform Marginals", The American Statistician 40: 280–283, doi:10.2307/2684602 
