CORRELATION COEFFICIENT

Critical values for Testing Significance

Suppose someone gives you two short time series, each containing 20 independent values, and you find that they have a correlation of 0.71. What is the probability that this correlation could have happened by pure chance?

A simple test of the null hypothesis that the product-moment correlation coefficient is zero uses Student's t-test on the statistic t = r sqrt(N-2)/sqrt(1-r^2), where N is the number of samples (Statistics, M. Spiegel, Schaum series). The example above gives t = 4.28, and standard t-tables tell us that only 1% of cases with N-2 = 18 degrees of freedom have |t| values exceeding 2.88 (two-tailed test). Hence, at 99% confidence, we can reject the null hypothesis and deduce that the two time series are "correlated" and that the non-zero correlation is unlikely to have happened by chance.
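
The arithmetic of this example can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the critical value 2.88 is simply the tabulated figure quoted in the text rather than something computed here. The second function inverts the t formula to express a critical t as a critical correlation.

```python
import math

def t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2) for a correlation r from n samples."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

def r_from_t(t, dof):
    """Invert the formula: the correlation that gives this t with dof = n - 2."""
    return t / math.sqrt(t * t + dof)

r, n = 0.71, 20
t = t_statistic(r, n)

# Two-tailed 1% critical value of t for 18 degrees of freedom,
# taken from the text (not computed here).
t_crit_1pct = 2.88
r_crit_1pct = r_from_t(t_crit_1pct, n - 2)

print(f"t = {t:.2f}")
print(f"critical |r| at the 1% level = {r_crit_1pct:.3f}")
print("significant at the 1% level:", abs(t) > t_crit_1pct)
```

Expressing the threshold as a critical correlation is convenient because it can be compared directly with the sample value of r.
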
In these modern times, there is no need to go and look up figures in statistical tables. The original page provided an interactive calculator at this point: given the degrees of freedom (N-2), it returned the critical values of the correlation coefficient at the 5% and 1% significance levels (JavaScript taken from http://www.ilir.uiuc.edu/courses/lir493/rdist.htm).
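
A rough stand-in for such a calculator can be written with the Python standard library alone. The sketch below is not the original JavaScript and makes no claim to reproduce its method: it integrates the Student's t density with Simpson's rule, finds the two-tailed critical t by bisection, and converts it to a critical correlation using r = t/sqrt(t^2 + N-2). The function names and the step-size choice are assumptions made for illustration.

```python
import math

def t_pdf(x, dof):
    """Density of Student's t distribution with dof degrees of freedom."""
    log_norm = (math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)
                - 0.5 * math.log(dof * math.pi))
    return math.exp(log_norm - (dof + 1) / 2 * math.log1p(x * x / dof))

def two_tailed_p(t, dof):
    """P(|T| > |t|), using Simpson's rule on the interval [0, |t|]."""
    t = abs(t)
    if t == 0.0:
        return 1.0
    steps = 2 * max(100, math.ceil(50 * t))  # even count, step size <= 0.01
    h = t / steps
    total = t_pdf(0.0, dof) + t_pdf(t, dof)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, dof)
    central = total * h / 3  # P(0 < T < t)
    return max(0.0, 1.0 - 2.0 * central)

def critical_r(dof, alpha):
    """Smallest |r| that is significant at level alpha (two-tailed), dof = N - 2."""
    lo, hi = 0.0, 1.0
    while two_tailed_p(hi, dof) > alpha:  # grow the bracket until it contains t_crit
        hi *= 2.0
    for _ in range(60):  # bisection on t
        mid = (lo + hi) / 2
        if two_tailed_p(mid, dof) > alpha:
            lo = mid
        else:
            hi = mid
    t_crit = (lo + hi) / 2
    return t_crit / math.sqrt(t_crit * t_crit + dof)

dof = 18
print(f"degrees of freedom: {dof}")
print(f"5% significance level: |r| > {critical_r(dof, 0.05):.3f}")
print(f"1% significance level: |r| > {critical_r(dof, 0.01):.3f}")
```

For the 20-value example, the sample correlation of 0.71 can be compared directly with the two printed thresholds.
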

BEWARE WITH AUTOCORRELATED TIME SERIES !!!

Suppose that X and Y are independent normal random variables. Then, in the absence of temporal autocorrelation, the correlation coefficient, r, between random samples of size n from X and Y has a probability density function f(r) = (1 - r^2)^{(n-4)/2} / B(1/2, (n-2)/2). The distribution has mean zero and a variance of (n-1)^{-1}. However, the distribution is affected by the autocorrelation in X and Y, which increases the variance of the distribution and so gives rise to spurious large correlations. This problem was recognised for time series as early as 1926 by Yule in his presidential address to the Royal Statistical Society. In the discussion of the address which followed, Edgeworth asked 'What about space? Are there not nonsense correlations in space?' (Yule, 1926). These questions are still being addressed by statisticians (and climate researchers!).
[ Extract from J.M. Potts (1991) Statistical methods for the comparison of spatial patterns in meteorological variables. Unpublished PhD thesis, University of Kent at Canterbury - kindly communicated by I.T. Jolliffe ]
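
The variance inflation described in the extract can be illustrated with a small Monte Carlo experiment. The sketch below is an illustration under stated assumptions, not a method from the thesis: it uses a first-order autoregressive (AR(1)) process as the model of autocorrelation, and the lag-one coefficient of 0.7, the number of trials, and all function names are arbitrary choices. It compares the sample variance of r between independent pairs of series with the theoretical value 1/(n-1).

```python
import math
import random

def ar1_series(n, phi, rng):
    """AR(1) series x[t] = phi * x[t-1] + noise, started from its stationary state."""
    x = rng.gauss(0.0, 1.0 / math.sqrt(1.0 - phi * phi))
    out = []
    for _ in range(n):
        out.append(x)
        x = phi * x + rng.gauss(0.0, 1.0)
    return out

def correlation(x, y):
    """Product-moment correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def variance_of_r(n, phi, trials, rng):
    """Sample variance of r between pairs of independent AR(1) series."""
    rs = [correlation(ar1_series(n, phi, rng), ar1_series(n, phi, rng))
          for _ in range(trials)]
    mean = sum(rs) / trials
    return sum((r - mean) ** 2 for r in rs) / (trials - 1)

rng = random.Random(0)  # fixed seed so the experiment is repeatable
n = 20
var_white = variance_of_r(n, 0.0, 4000, rng)  # no autocorrelation
var_red = variance_of_r(n, 0.7, 4000, rng)    # assumed lag-one autocorrelation of 0.7

print(f"theory, no autocorrelation: {1 / (n - 1):.4f}")
print(f"simulated, phi = 0.0:       {var_white:.4f}")
print(f"simulated, phi = 0.7:       {var_red:.4f}")
```

Although the two series in each pair are generated independently, the autocorrelated case gives a noticeably wider spread of r, so critical values derived for independent samples would flag too many chance correlations as significant.
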

AND BEWARE WITH NON-GAUSSIAN TIME SERIES ...

The above results apply ONLY if the values are Gaussian-distributed. If the time series are very intermittent, then the distribution can become very skewed and significantly non-Gaussian. In such cases, it is wise to transform the values non-linearly towards Gaussian (sometimes referred to as NORMALISING) in order to minimise the effect of the extreme values. The square root and logarithm transformations are widely used for this purpose. A good meteorological example of a non-Gaussian quantity is provided by precipitation totals, whose wet-day values can be made more Gaussian by applying a simple square root transformation. This transformation is recommended before performing statistical analyses of covariances, correlations, power spectra, EOFs, etc. on daily (and even monthly mean) rainfall data. Extreme care should be taken when interpreting the significance of rainfall analyses!
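
The effect of the square root transformation on skewness can be seen in a short experiment. This is a sketch with synthetic data only: the wet-day totals are drawn from a gamma distribution whose shape and scale are arbitrary illustrative choices, not fitted to any real rainfall record, and the function name is hypothetical.

```python
import math
import random

def skewness(values):
    """Moment coefficient of skewness (zero for a symmetric distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

rng = random.Random(1)  # fixed seed so the experiment is repeatable

# Synthetic "wet-day totals": gamma-distributed with an assumed shape of 0.8
# and scale of 10, purely for illustration.
rain = [rng.gammavariate(0.8, 10.0) for _ in range(5000)]
root_rain = [math.sqrt(v) for v in rain]

print(f"skewness of raw values:         {skewness(rain):.2f}")
print(f"skewness of square-root values: {skewness(root_rain):.2f}")
```

The transformed values are still not exactly Gaussian, but their skewness is much smaller, so a few extreme totals have less influence on any correlation computed from them.
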

Copyright © 1997 D. B. Stephenson
Permission is granted to make copies for individual use, not for redistribution.
This page was developed by Dr. Rupa Kumar Kolli and Dr. David Stephenson. It was last updated on the 21st of October 1997.