Suppose someone gives you two short time series, each
containing 20 independent values, and you find that
they have a correlation of 0.71. What is the
probability that a correlation this large could have
happened by pure chance?
A simple way to test the null hypothesis that the product
moment correlation coefficient is zero is to apply Student's
t-test to the statistic t = r sqrt(N-2)/sqrt(1-r^2), where
N is the number of samples (Statistics, M. Spiegel, Schaum series).
The example above gives t=4.28, and standard t-tables tell us
that only 1% of cases with N-2=18 degrees of freedom have
|t| values exceeding 2.88 (two-sided test).
Hence, at 99% confidence, we can reject the null hypothesis and
deduce that the two time series are "correlated": a correlation
this large is very unlikely to have happened by chance.
In these modern times, there is no need for you to go and look
up figures in statistical tables. The tail probability can be
computed directly from t and the number of degrees of freedom.
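
The t-test described above can be sketched in a few lines of Python.
This is only an illustrative sketch using the standard library: the
function names (t_pdf, simpson, correlation_t_test) are made up for
this example, and the tail probability is obtained by simple numerical
integration of the Student's t density rather than from a statistics
package.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))

def simpson(f, a, b, steps=2000):
    """Composite Simpson's rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def correlation_t_test(r, n):
    """Return (t, two-sided p) for the null hypothesis of zero correlation.

    Assumes n independent, Gaussian-distributed pairs, as in the text.
    """
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r * r)
    # Probability mass between 0 and |t|; the two tails hold the rest.
    central = simpson(lambda x: t_pdf(x, df), 0.0, abs(t))
    return t, max(0.0, 1 - 2 * central)

# The example from the text: r = 0.71 from N = 20 independent values.
t_stat, p_value = correlation_t_test(0.71, 20)
print(f"t = {t_stat:.2f}, two-sided p = {p_value:.5f}")
```

The p value printed is the two-sided probability of obtaining a
correlation at least this large in magnitude when the true correlation
is zero, so it can be compared directly with a chosen significance
level such as 0.01.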
Suppose that X and Y are independent normal random variables.
Then, in the absence of temporal autocorrelation, the correlation
coefficient, r, between random samples of size n from X and Y has
a probability density function
    f(r) = (1 - r^2)^((n-4)/2) / B(1/2, (n-2)/2),   -1 <= r <= 1,
where B is the beta function.
The distribution has mean zero and a variance of 1/(n-1).
However, the distribution is affected by the autocorrelation in X and Y,
which increases the variance of the distribution and so gives rise
to spurious large correlations.
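
The inflation of the variance by autocorrelation can be illustrated
with a small Monte Carlo experiment. The sketch below is not part of
the extract: it assumes, purely for illustration, that each series
follows a first-order autoregressive (AR(1)) process with lag-one
coefficient phi, and the values phi = 0.8, 2000 trials and the random
seed are arbitrary choices. All function names are hypothetical.

```python
import math
import random
import statistics

def ar1_series(n, phi, rng):
    """AR(1) series x[t] = phi*x[t-1] + noise, started from its stationary law."""
    x = rng.gauss(0.0, 1.0 / math.sqrt(1.0 - phi * phi))
    series = []
    for _ in range(n):
        series.append(x)
        x = phi * x + rng.gauss(0.0, 1.0)
    return series

def pearson_r(x, y):
    """Product moment correlation coefficient of two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def variance_of_r(n, phi, trials, seed):
    """Variance of r between pairs of INDEPENDENT AR(1) series."""
    rng = random.Random(seed)
    rs = [pearson_r(ar1_series(n, phi, rng), ar1_series(n, phi, rng))
          for _ in range(trials)]
    return statistics.pvariance(rs)

n = 20
var_white = variance_of_r(n, phi=0.0, trials=2000, seed=1)  # no autocorrelation
var_red = variance_of_r(n, phi=0.8, trials=2000, seed=1)    # strong autocorrelation

print(f"theory, no autocorrelation: {1 / (n - 1):.4f}")
print(f"simulated, phi = 0.0:       {var_white:.4f}")
print(f"simulated, phi = 0.8:       {var_red:.4f}")
```

With phi = 0 the simulated variance should lie close to 1/(n-1), while
with phi = 0.8 it should be noticeably larger, even though the two
series in each pair are generated independently.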
This problem was recognised for time series as early as 1926 by
Yule in his presidential address to the Royal Statistical Society.
In the discussion of the address which followed, Edgeworth asked
'What about space? Are there not nonsense correlations in space?'
(Yule, 1926). These questions are still being addressed by statisticians
(and climate researchers!).
[ Extract from J.M. Potts (1991)
Statistical methods for the comparison of spatial patterns in
meteorological variables. Unpublished PhD thesis, University of Kent at
Canterbury - kindly communicated by
I.T. Jolliffe ]
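
The null density quoted in the extract can be checked numerically. The
following sketch (standard library only, with hypothetical function
names) integrates f(r) with Simpson's rule to confirm that it
integrates to one and has variance 1/(n-1), and then uses it to obtain
the two-sided tail probability for the example of r = 0.71 with n = 20.
It assumes n >= 4 so that the density stays finite at r = -1 and r = 1.

```python
import math

def r_pdf(r, n):
    """Null density of the sample correlation for sample size n (n >= 4)."""
    # log B(1/2, (n-2)/2) via log-gamma functions
    log_beta = (math.lgamma(0.5) + math.lgamma((n - 2) / 2)
                - math.lgamma((n - 1) / 2))
    return (1 - r * r) ** ((n - 4) / 2) / math.exp(log_beta)

def simpson(f, a, b, steps=2000):
    """Composite Simpson's rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

n = 20
total_prob = simpson(lambda r: r_pdf(r, n), -1.0, 1.0)
variance = simpson(lambda r: r * r * r_pdf(r, n), -1.0, 1.0)
# Probability of |r| >= 0.71 under the null hypothesis (density is symmetric).
p_two_sided = 2 * simpson(lambda r: r_pdf(r, n), 0.71, 1.0)

print(f"integral of f(r): {total_prob:.6f}")
print(f"variance of r:    {variance:.6f}   (1/(n-1) = {1 / (n - 1):.6f})")
print(f"P(|r| >= 0.71):   {p_two_sided:.5f}")
```

Because the t statistic is a monotonic function of r, the tail
probability obtained this way should agree with the one given by the
t-test on the same r and n.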
AND BEWARE WITH NON-GAUSSIAN TIME SERIES ...
The above results apply ONLY if the values are Gaussian distributed.
If the time series are very intermittent then the distribution can
become very skewed and significantly non-Gaussian. In such cases, it
is wise to apply a non-linear transformation that brings the values
closer to Gaussian (sometimes referred to as NORMALISING) in order to
reduce the effect of the extreme values. The square root and logarithm
transformations are widely used for this purpose. A good meteorological
example of a non-Gaussian quantity is provided by precipitation totals,
whose wet day values can be made more Gaussian by applying a simple
square root transformation. This transformation is recommended before
performing statistical analyses of covariances, correlations, power
spectra, EOFs etc. on daily (and even monthly mean) rainfall data.
Extreme care should be taken when interpreting the significance of
rainfall analyses!
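
The effect of the square root transformation can be illustrated on
synthetic data. The sketch below does not use real rainfall: it
assumes, for illustration only, that wet day totals can be mimicked by
a gamma distribution (shape 0.7, scale 8.0, both arbitrary choices),
and it compares the sample skewness before and after the transformation.
The function and variable names are hypothetical.

```python
import math
import random
import statistics

def skewness(values):
    """Sample skewness: mean of the cubed standardised values."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

# Synthetic "wet day" totals (an assumption, not observed data).
rng = random.Random(0)
wet_days = [rng.gammavariate(0.7, 8.0) for _ in range(5000)]

# Square root transformation, as recommended in the text.
transformed = [math.sqrt(v) for v in wet_days]

skew_raw = skewness(wet_days)
skew_sqrt = skewness(transformed)
print(f"skewness of raw values:         {skew_raw:.2f}")
print(f"skewness after sqrt transform:  {skew_sqrt:.2f}")
```

A Gaussian distribution has zero skewness, so a value closer to zero
after the transformation indicates that the transformed values are
less dominated by the extreme wet days.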