
STAT 3115Q sec 02
ANALYSIS OF EXPERIMENTS, Fall 2012
UConn Storrs Campus, BOUS 160
MON WED 10:00-11:30
Eric Lundquist

[Guinness Brewery]

Office: BOUS 136
Office Hours: Mon 4:00-5:00, Tue 5:00-6:00, and by appointment
Phone: (860) 486-4084
E-mail: Eric.Lundquist@uconn.edu

TEACHING ASSISTANTS:

Sarah Sanborn
Office: BOUS 192
Office Hours: Mon 1:00-2:00, and by appointment
E-mail: sarah.sanborn@uconn.edu

Login George
Office: BOUS 192
Office Hours: Tue 11:30-12:30, and by appointment
E-mail: Login.George@uconn.edu


READING:
  1. Keppel, Geoffrey & Wickens, Thomas D. (2004). Design and Analysis: A Researcher's Handbook, 4/E. Prentice Hall. ISBN-10: 0135159415 (ISBN-13: 9780135159415)
  2. On-Line Readings and Reserve Readings (see below)

GRADING:
   
  • Homework:  30%   assigned weekly
  • Midterm:   35%   WEDNESDAY OCTOBER 17
  • Final:     35%   MONDAY DECEMBER 10, 10:00 AM


    TOPICS AND READING ASSIGNMENTS: to be updated throughout the semester
    KW = Keppel and Wickens

    CLASS SYLLABUS in Microsoft Word format, should you lose your original. Note that the schedule below has been considerably modified from the version given in the syllabus.

    TOPIC / READING
    Experimental Design KW Ch. 1 [basic issues and terminology]
    PowerPoint slides on some introductory terminology and issues in experimental design.
    Summary of Techniques in the General Linear model in HTML format, Microsoft Word format, and PDF format.
    Categorical data and Chi-Square Howell Ch.6 [excellent presentation of Chi-Square and related topics]
    Excel spreadsheet to calculate a 2x2 chi square test of independence [including examples from the point of view of a dog and a medical researcher]
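    In case you'd rather check the arithmetic outside of Excel, here is a minimal Python sketch of the same 2x2 chi-square test of independence; the counts are made-up numbers for illustration, not the dog or medical-researcher examples from the spreadsheet.

        # Hypothetical 2x2 contingency table (e.g., treatment vs. control by improved vs. not).
        import numpy as np
        from scipy.stats import chi2_contingency

        table = np.array([[30, 10],
                          [20, 25]])

        chi2, p, df, expected = chi2_contingency(table, correction=False)
        print(f"chi-square = {chi2:.3f}, df = {df}, p = {p:.4f}")
        print("expected counts under independence:")
        print(expected)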
    Data Description KW Ch. 2 pp. 15-18, 24-25; Ch. 3 pp. 32-34; Ch. 7 pp. 144-145 [histogram, scatterplot; central tendency, dispersion, standardization; normality, skewness and kurtosis]
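    If you want to see those descriptive quantities computed rather than just defined, a few lines of Python will do it; the data below are made up and not from any course file.

        # Descriptive statistics for a small made-up sample.
        import numpy as np
        from scipy.stats import skew, kurtosis, zscore

        x = np.array([2.1, 3.4, 2.9, 5.6, 4.2, 3.3, 2.7, 6.1])

        print("mean:", x.mean())                      # central tendency
        print("SD (N-1 denominator):", x.std(ddof=1)) # dispersion
        print("z-scores:", zscore(x, ddof=1))         # standardization
        print("skewness:", skew(x))                   # asymmetry
        print("excess kurtosis:", kurtosis(x))        # 0 for a normal distribution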
    The t-test and confidence intervals Howell Ch.7 [excellent treatment of the logic of the t-test, applied to the cases of a single sample mean, two related sample means, and two independent sample means; relation of t to z; confidence intervals described accurately on pp. 181-183]
    KW Ch. 3 pp. 34-36, Ch. 8 pp. 159-161
    and see my Notes on Confidence Intervals [references to Keith (2006) can be ignored, and the interpretation of confidence intervals for the regression coefficient "b" is the same as for the more familiar population mean "μ"]
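    For concreteness, here is a short Python sketch (made-up scores, not course data) of an independent-samples t-test along with the matching 95% confidence interval for the difference between the means, in the spirit of Howell's treatment.

        # Independent-samples t-test and 95% CI for the mean difference (made-up data).
        import numpy as np
        from scipy import stats

        g1 = np.array([5.1, 4.8, 6.0, 5.5, 4.9, 5.7])
        g2 = np.array([4.2, 4.5, 5.0, 3.9, 4.8, 4.1])

        res = stats.ttest_ind(g1, g2)          # pooled-variance (equal variances assumed) test
        df = len(g1) + len(g2) - 2

        # Pooled variance, standard error of the difference, and CI = diff +/- t_crit * SE.
        sp2 = ((len(g1)-1)*g1.var(ddof=1) + (len(g2)-1)*g2.var(ddof=1)) / df
        se = np.sqrt(sp2 * (1/len(g1) + 1/len(g2)))
        t_crit = stats.t.ppf(0.975, df)
        diff = g1.mean() - g2.mean()

        print(f"t({df}) = {res.statistic:.3f}, p = {res.pvalue:.4f}")
        print(f"95% CI for the difference: [{diff - t_crit*se:.3f}, {diff + t_crit*se:.3f}]")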
    Null Hypothesis Significance Testing Howell Ch.4 [excellent and up-to-date treatment of the logic and controversies of hypothesis testing, possibly more accessible than Cohen's (1994) paper]
    KW Ch. 2 pp. 18-22; Ch. 3 pp. 46-48; Ch. 8 pp. 167-169
    Cohen (1994) [criticism of Null Hypothesis Significance Testing]
    Wilkinson and APA Task Force (1999) [recommendations for treatment of data in light of NHST controversy]

    For your curiosity and your future as a researcher, but not for your exam:
    Howell Ch. 5 Excerpt on Bayes's Theorem [provides a brief accurate description of Bayes's Theorem]
    Dienes (2011) [makes the case that Bayes's Theorem is what most people really believe is appropriate and want to use when analyzing data; link requires logging in with UConn NetID and password, then you should just download the pdf for convenience]
    Cohen (1990) [general advice about treatment of data]
    Cowles & Davis (1982) [historical roots of the "p<.05" significance level]
    Gigerenzer (1993) [examination of the NHST controversy by contrasting the incompatible original views of Fisher and Neyman & Pearson with the unsatisfying hybrid of their views that became the dominant method of data analysis]
    Between Subjects (Completely Randomized) Designs: One Factor KW Ch. 2 & 3, Ch. 8 pp. 161-162
    Logic Of ANOVA summary
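    To make that logic concrete, here is a small Python sketch (three made-up groups) that computes SS Between and SS Within by hand and then checks the resulting F against scipy's one-way ANOVA.

        # One-factor between-subjects ANOVA "by hand," checked against scipy (made-up data).
        import numpy as np
        from scipy import stats

        groups = [np.array([4., 5., 6., 5.]),
                  np.array([7., 8., 6., 7.]),
                  np.array([9., 8., 10., 9.])]

        scores = np.concatenate(groups)
        grand_mean = scores.mean()

        ss_between = sum(len(g) * (g.mean() - grand_mean)**2 for g in groups)
        ss_within  = sum(((g - g.mean())**2).sum() for g in groups)
        df_between = len(groups) - 1
        df_within  = len(scores) - len(groups)

        F = (ss_between / df_between) / (ss_within / df_within)
        print("F by hand:", F)
        print("F from scipy:", stats.f_oneway(*groups).statistic)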
    Effect Size and Power KW Ch. 8 pp. 163-167 (but not "Effect Sizes for Contrasts")
    Book Review of The Cult Of Statistical Significance from the journal Science from June 2008. This one-page article focuses on one consequence of the misplaced emphasis psychology places on null hypothesis significance testing, which is the neglect of effect size and of effect measurements.
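    G*Power (linked with HW6 below and under Notes and Resources) is the tool for these calculations; as an unofficial cross-check, the same a-priori power analysis can be done in Python with statsmodels. A minimal sketch, assuming a two-group t-test and a medium standardized effect size of d = 0.5:

        # A-priori power analysis for an independent-groups t-test (a rough G*Power analogue).
        from statsmodels.stats.power import TTestIndPower

        analysis = TTestIndPower()

        # Sample size per group for 80% power at alpha = .05, d = 0.5, two-tailed (about 64):
        n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80,
                                           alternative='two-sided')
        print("n per group:", n_per_group)

        # Power actually achieved with only 30 per group at the same effect size:
        print("power with n = 30:", analysis.power(effect_size=0.5, nobs1=30, alpha=0.05))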
    Assumptions of ANOVA (and t-tests): The Linear Model KW Ch. 7
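    For reference, in my notation (KW's symbols differ slightly), the linear model behind the one-factor between-subjects ANOVA can be written as

        Y_{ij} = \mu + \alpha_j + \varepsilon_{ij}, \qquad \varepsilon_{ij} \overset{\text{i.i.d.}}{\sim} N(0,\ \sigma^2_{\varepsilon}),

    where \mu is the grand mean, \alpha_j is the (fixed) treatment effect for group j with \sum_j \alpha_j = 0, and the errors are assumed independent, normally distributed, and equal in variance across groups; those are exactly the assumptions at issue here.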
    MIDTERM REVIEW interim summary
    Some sample midterms from previous years' courses
    [Note: 1) STAT 3115 was formerly called STAT 242; 2) some of these pages are out of order; 3) topics have been covered in a different order so many of these questions are not relevant to our exam, which should be apparent]
    Correlation KW Ch. 15 pp. 312-314
    r = cov_XY / (s_X * s_Y), where cov_XY = SP_XY / (N - 1), and SP_XY = Σ(X - M_X)(Y - M_Y)
    Note this point from the list of links below:
    Correlation article in Wikipedia: whether or not the math explained here is of interest (correlations as cosines, etc.), the two images depicting sets of scatterplots are very important to understand. One of the diagrams on this page shows some scatterplots and the correlation coefficients calculated from them, just to give you an idea of what typical correlations might look like, but also of how unpredictable they might be if you don't look at your data in a scatterplot. This point is made even more obvious by this diagram further down the same page, which shows some very different sets of data that all give the exact same value for the correlation coefficient r (as well as for some other descriptive statistics).
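    The formula above is easy to verify numerically; here is a quick Python check on made-up paired data.

        # Verify r = cov_XY / (s_X * s_Y) on made-up paired data.
        import numpy as np

        x = np.array([2., 4., 5., 7., 8., 10.])
        y = np.array([1., 3., 4., 4., 7.,  9.])

        sp_xy  = ((x - x.mean()) * (y - y.mean())).sum()   # sum of cross-products SP_XY
        cov_xy = sp_xy / (len(x) - 1)
        r = cov_xy / (x.std(ddof=1) * y.std(ddof=1))

        print("r from the formula:", r)
        print("r from numpy:      ", np.corrcoef(x, y)[0, 1])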
    Analytical Comparisons Among Means (Single-df Contrasts) KW Ch. 4 sec. 4.1 - 4.5
    Analytic Contrasts summary
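    As a numerical illustration (my own made-up numbers, equal n assumed), a single-df contrast boils down to ψ = Σ c_j M_j and SS_ψ = n ψ² / Σ c_j², which is then tested against the omnibus error term.

        # Single-df contrast: group 1 vs. the average of groups 2 and 3 (hypothetical values).
        import numpy as np

        means = np.array([4.0, 6.5, 7.5])     # group means M_j
        c     = np.array([2.0, -1.0, -1.0])   # contrast coefficients (they sum to zero)
        n     = 10                            # observations per group
        ms_error, df_error = 3.2, 27          # from the omnibus ANOVA (hypothetical)

        psi = (c * means).sum()
        ss_psi = n * psi**2 / (c**2).sum()    # 1 df, so MS_psi = SS_psi
        F = ss_psi / ms_error
        print(f"psi = {psi}, SS_psi = {ss_psi:.2f}, F(1, {df_error}) = {F:.2f}")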
    Controlling Type I Errors in Multiple Comparisons (Planned and Post-hoc) KW Ch. 6
    Trend Analysis KW Ch. 4 sec. 4.6 - 4.7; Ch. 5
    Between-Subjects (Completely Randomized) Designs: Two Factors KW Ch. 10 & 11
    Two Factor Design: Interactions and Main Effects: this summary describes how to recognize when main effects and interactions are present in the two-way factorial design, both in terms of plots of means and in terms of tables of means.
    Keppel's ANOVA notation system (PDF): This is a handy summary of how to compute Degrees of Freedom for any Source of Variance. Keppel and Wickens (2004) use an ANOVA notation system that provides a simple way to compute Sums Of Squares: by converting Sources of Variance into Degrees of Freedom, and then into a combination of "bracketed" quantities, where the brackets indicate some further adding and dividing. But since no one in their right mind computes Sums Of Squares by hand, the only remaining useful part of this page is step 1 describing how to get Degrees of Freedom. That is quite useful though. Here's a Microsoft Word version in case it's convenient for any reason.
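    In that spirit (step 1 only), here is a tiny Python sketch that lists the sources of variance and their degrees of freedom for an a x b between-subjects factorial with n subjects per cell; the design numbers are hypothetical and the df rules are the standard ones rather than KW's bracket notation.

        # Degrees of freedom for an a x b between-subjects factorial with n per cell.
        a, b, n = 3, 4, 5   # hypothetical design: 3 x 4 with 5 subjects per cell

        df = {
            "A":     a - 1,
            "B":     b - 1,
            "A x B": (a - 1) * (b - 1),
            "S/AB":  a * b * (n - 1),     # error: subjects within cells
            "Total": a * b * n - 1,
        }
        for source, value in df.items():
            print(f"{source:6s} df = {value}")

        # Check: the effect and error df add up to the total df.
        assert df["A"] + df["B"] + df["A x B"] + df["S/AB"] == df["Total"]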
    Analyzing Interactions KW Ch. 12 & 13
    KW Ch. 14 pp. 303-307, 309-310: Nonorthogonality of the Effects, 14.3 Averaging of Groups and Individuals, and 14.5 Sensitivity to Assumptions (14.4 "Contrasts and Other Analytical Analyses" is optional, being a little heavy on notation for things you wouldn't really do by hand).
    Analysis of Covariance (ANCOVA) KW Ch. 15 pp. 311-312 [Aside from the analogy to post-hoc blocking (see pp. 231-232), this chapter will be largely skipped in favor of a regression-based treatment of ANCOVA in the spring semester (STAT 5105).]
    Three Factors and Higher Order Factorial Designs: Between-Subjects Designs KW Ch. 21 & 22
    Recognizing Higher Order Interactions From Graphs And Means Tables
    Repeated Measures (Within-Subjects) Designs: One Factor KW Ch. 16 & 17
    REPEATED MEASURES ANOVA notes: this summary is a companion to the "Logic of ANOVA Summary" above; it outlines the logic of the Sums Of Squares calculations that we will not concern ourselves with in this class, though it may be useful to look at if you're confused about the concept behind such a calculation -- i.e., how the Within Groups Sums of Squares from the independent groups ANOVA is further partitioned into the part due to individual differences among the subjects and the part that is truly just experimental error. In Keppel and Wickens terms, that experimental error is identified as the interaction between factor A and the subject, thus the "AS" term, whereas here it's simply referred to as "error". The "S" term is referred to as the "Between Subjects" factor. The number of treatment conditions in the Between Groups factor A is called "k" here instead of our familiar "a".
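    If code is easier to follow than formulas, here is a minimal Python sketch of that partition on made-up data laid out as subjects (rows) by conditions (columns): the old within-groups variability is split into a Subjects piece and the A x S piece that serves as the error term for A.

        # One-factor repeated-measures ANOVA partition (made-up scores).
        import numpy as np

        Y = np.array([[3., 5., 6.],     # rows = subjects, columns = levels of factor A
                      [2., 4., 7.],
                      [4., 6., 8.],
                      [3., 5., 5.]])
        s, a = Y.shape
        grand = Y.mean()

        ss_A   = s * ((Y.mean(axis=0) - grand)**2).sum()   # treatment effect
        ss_S   = a * ((Y.mean(axis=1) - grand)**2).sum()   # individual differences
        ss_tot = ((Y - grand)**2).sum()
        ss_AxS = ss_tot - ss_A - ss_S                      # error term for testing A

        F = (ss_A / (a - 1)) / (ss_AxS / ((a - 1) * (s - 1)))
        print(f"F({a-1}, {(a-1)*(s-1)}) = {F:.2f}")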
    Expected Mean Squares (PDF): this topic isn't specific to any particular design, so it's being introduced at an arbitrary late point in the semester even though implicitly it was already introduced with the description of the F ratio for the single factor ANOVA; here's a Microsoft Word version in case it's convenient for any reason.
    Repeated Measures (Within-Subjects) Designs: Two Factors KW Ch. 18
    Mixed Designs: One Between, One Repeated Factor KW Ch. 19 & 20
    Finding Sources of Variance (PDF): once you're dealing with combinations of different numbers of between and within factors, it's good to have a general scheme for identifying what the sources of variance are in a given design; here's a Microsoft Word version in case it's convenient for any reason.
    Three Factors and Higher Order Factorial Designs: Repeated Measures and Mixed Designs KW Ch. 23
    Random and Nested Factors KW Ch. 24 & 25 but read mainly pp. 530-534
    Some sample final exams from previous years' courses
    [Note: 1) STAT 3115 was formerly called STAT 242; 2) some questions address topics we haven't covered, or have covered less thoroughly than these exams assume; you'll be able to determine which questions you can answer, and then use those for practice.]
    Some questions from previous years' STAT 3115 midterms are relevant to the final exam material listed above (e.g. contrasts, post-hoc testing, etc.); see in particular: 2004#3, 2003#1(a-d), 2002#3, 2001#2&3&4(b, if you consider factorial designs), 2000#2(b&c)&3

    NOTE ON TERMINOLOGY AND READING
  • For clarification, a completely between-subjects design is sometimes referred to as a "Completely Randomized" design when observations in each cell are all from different participants, randomly sampled from the population and randomly assigned to conditions. Of course, some designs are between-subjects, but do not use random assignment, e.g., in the case of quasi-experiments where gender is a factor. So "Between-Subjects" design is probably the preferable general term. At any rate, the opposite of "Between Subjects" (or of "Completely Randomized") is "Within Subjects" or "Repeated Measures" design. In BOTH Between and Within designs, we are usually dealing with FIXED effects -- not RANDOM effects. So don't misinterpret the phrase "Completely Randomized" as having any implications about whether you're using fixed or random EFFECTS.
  • Beginning around the halfway point in the text, Keppel and Wickens devote much space to detailed analyses of particular cases that are just as easily considered as parts of a general approach, and while the detail may serve you well when consulting the text as a handbook in the future, it's not that useful at the introductory level (note the last paragraph on p. 464). Case in point: there are two full chapters on three-way designs, but aside from the concept of the three-way interaction and how to read the three-way graphs, it's essentially a generalization of the two-way analyses already covered (see p. 507).
  • I recommend that in those later portions of the text you skim over the parts that describe computations: e.g., SS's using bracket terms, contrasts using Ψ's with complicated subscripts, standard errors of t's used to evaluate them. It's certainly preferable that you understand the computations and formalisms, it's just that we'll emphasize how you can combine various SPSS results to achieve the same result. But DO note the many conceptual points and useful recommendations that are offered throughout all the chapters. If you make this distinction successfully, you'll find there are many fewer pages you really need to attend to.
  • Note the error in the last full paragraph of p. 309 (on heterogeneity of variance with unequal sample sizes), where Keppel and Wickens write that "When the smaller groups are the ones with the larger variances, the tests are biased to give too many Type I errors, while when the larger groups have the smaller variances, the tests are biased to give too few Type I errors." First of all, this is a heads-I-win-tails-you-lose situation since clearly the two conditions described are the same: When the smaller groups are the ones with the larger variances, the larger groups MUST be the ones with the smaller variances. Ugh. And then you have to wonder if the silly phrase "too few errors" implies that we strive to make a certain number of errors. I'm pretty sure what they meant to say is that when larger groups have SMALLER variances, the weighted-averaged error variance MS_S/AB is biased toward being smaller than it should be, and F will be significant more often than it would be with an accurate larger error term, and thus Type I errors occur more than 5% of the time. When the larger groups have LARGER variances, the bias in computing the error term is toward a larger error MS_S/AB, which makes F less likely to be significant than it really should be -- which is not a case of making "too few Type I errors" (the rate is now less than 5% but really, the fewer the better), but of the complementary problem of making too many Type II errors (finding a non-significant F when the difference is really there). When they say "too few Type I errors" they really just mean α has effectively been lowered.
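    If you'd like to convince yourself of the direction of that bias, here is a quick Monte Carlo sketch of my own (not from the text): both populations have the same mean, the smaller group gets the larger variance, and the pooled-variance t-test rejects a true null noticeably more often than 5% of the time.

        # Null-true Type I error rate when the SMALLER group has the LARGER variance.
        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(1)
        n_small, n_large = 10, 40
        sd_small, sd_large = 4.0, 1.0        # small group paired with the big variance
        alpha, reps = 0.05, 20000

        rejections = 0
        for _ in range(reps):
            g1 = rng.normal(0, sd_small, n_small)   # both population means are 0
            g2 = rng.normal(0, sd_large, n_large)
            p = stats.ttest_ind(g1, g2, equal_var=True).pvalue   # pooled-variance test
            rejections += (p < alpha)

        print("empirical Type I error rate:", rejections / reps)   # well above .05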


    HOMEWORK ASSIGNMENTS: to be updated throughout the semester
    1. HW1 due Wednesday 9/5/12; SPSS formatted data available here
      • Comments:
    2. HW2 due Wednesday 9/19/12; SPSS formatted data available here
      • Comments: Instead of clicking on OKAY when ready to run an analysis, remember you can click on PASTE to get the commands in a syntax window and run them from there. The advantage of that is you'll be able to simply copy and repaste the commands you just compiled to run the next analysis, and simply change the occurrences of, say, "1939" to "1970" as appropriate. Saves a lot of redundant clicking around if you're inclined to do that.
    3. HW3 due Wednesday 9/26/12; SPSS formatted data available here
      • Comments:
    4. HW4 due Wednesday 10/3/12
      • Comments: There is no SPSS component to this homework. Note that after the six homework questions, I've appended some extra questions listed as "THINGS THAT ARE NOT PART OF THIS ASSIGNMENT THAT YOU SHOULD THINK ABOUT ANYWAY". These are NOT part of the homework and should NOT be turned in, but may be helpful for you to work through in fully understanding the material.
    5. HW5 due Monday 10/15/12; SPSS formatted data available here
      • Comments:
        Here is some SPSS translation that you either understand already or don't need for this homework, but which may be helpful to know about in the long run:
      • Note that SPSS gives different names to your Sources of Variance in the output: A = "group" (your independent variable name), S/A = "error". As we'll soon see, the sum of those two gives a Total for both SS and df, and the Total is listed in the output not as just plain "Total", but as "Corrected Total"!
        • The way those labels work is something like this. The "corrected model" row refers to the total of all the factors present in your experiment. For now we have only one factor (A) so that IS the whole model, thus the rows for "corrected model" and "group" have the same information. Soon enough we will also have a second factor (B) and its interaction with the first (A*B), and then the "corrected model" will refer to the three of those effects summed together, and each will be listed separately in its own row in place of the sole factor we now have called "group".
        • In SPSS, the so-called "total" SS (which is NOT the Total we're interested in!) computes the SS around an origin of zero, rather than around the grand mean of all the scores, and its degrees of freedom is the total number of observations. The "corrected total" (the one we ARE interested in!) finds the SS around the grand mean, which is after all an estimate of the population mean, and you may remember that in estimating a parameter from the data we lose a degree of freedom. And indeed, the df for the "corrected total" is the number of observations minus 1. You may think only the "corrected total" makes any sense - who bothers finding the "sum of squared deviations from zero" instead of "... from the mean"? And I agree with you completely, but read on...
        • The "intercept" represents the grand mean of all the observations, i.e., your estimate of the population grand mean, and it will almost always be highly significant, and will always have df = 1: that's the 1 df that you lost above by estimating the population grand mean from your data. What significance test are you doing on it? You're testing whether it's different from 0! Who knows why. Read p. 37 of Keppel and Wickens, you'll see that it's not especially useful or interesting, it's just there for some reason. This seems nonsensical to me, but... It has SS = 135.809 because if your grand mean were, say, 2.1277 (which it is on this homework), its squared deviation from a hypothesized mean of 0 would be 2.1277 squared or 4.5270, and if you summed that number over all 30 of your observations, well it'd be the same for each of them - there's only one grand mean so the 2.1277 and the 0 are the same for everyone. And 4.5270 x 30 = 135.81. Voilà! - and no one cares. But there it is (which is pretty much what "voilà" means in the first place; those of you who think the word is "viola" are beyond help). Notice that if you add the "intercept" SS and df to the "corrected total" SS and df, you get what SPSS labels the "total" SS and df.
        • Bottom line: it's the "corrected total" you'll care about all semester, so ignore the "intercept" and the (uncorrected) "total".
      • Why do we call the within-groups variance (S/A) effect the "error"? That's because it's the denominator of the F ratio, representing the experimental error (individual differences, measurement error, etc) that is the variability present among subjects who have all received the same treatment but still differ from each other. In more complicated designs the "error" term will not always be S/A; in fact, we will use different error terms to test different effects within the same experiment. Fun stuff.
      • The vertical axis of your means plot is labeled "Estimated Marginal Means", which you should just read as saying "the means of the groups"!
      • The output column labeled "Type III Sums Of Squares" is indeed your SS for each effect; why it's called "Type III" is best saved for the spring semester course on regression, though I'll be happy to share before then if you like. Let it be said that Type I SS may be of interest, but you will rarely if ever encounter Type II and Type IV. Don't worry, they're still calculated the same, it's just which data they're calculated from that might differ. For now, don't even think about it at all, just recognize that Type III is what we do here.
      • The R-squared value is printed underneath the "tests of between subjects effects" output box, and there's also something called "adjusted R-squared". The latter is an estimate of the population value that you may safely ignore until you look at R-squared in multiple regression next semester, at which point all will become clear.
    6. HW6 due Wednesday 10/24/12; SPSS formatted data available here AND here (you need BOTH HW6Af12.sav and HW6Bf12.sav!); the power analysis program GPower 3 is available here.
      • Comments:
    7. HW7 due Wednesday 11/7/12; SPSS formatted data available here
    8. HW8 due Wednesday 11/14/12; SPSS formatted data available here
      • Comments:
    9. HW9 due FRIDAY 11/30/12; SPSS formatted data available here
      • Comments:
    10. HW10 due FRIDAY 12/7/12; see Recognizing Higher Order Interactions From Graphs And Means Tables
      • Comments: There is no data file because for once I'm letting you enter the data yourselves, because 1) there's very little of it and 2) welcome to real life. Except, in real life, you'd get an undergrad to do it. (And of course you'd make it a worthwhile overall educational experience for them.) Note three points about repeated measures designs on this homework: the additional new assumption for repeated measures designs called "sphericity"; Mauchly's test for sphericity which according to SPSS "tests the null hypothesis that the error covariance matrix of the orthonormalized transformed dependent variables is proportional to an identity matrix" (now you know!); and what to do in case of violations of sphericity (the Greenhouse-Geisser, Huynh-Feldt, and lower-bound corrections). Furthermore there are some post hoc tests to be done after the omnibus, but they're the easiest you've seen so far (equivalent to paired samples t-tests with a Bonferroni correction for alpha). OPTIONAL EXTRA CREDIT: The handout is worth looking at but is not required, because HW10's second question is only for extra credit, as described on the assignment. Don't stress over trying to figure out the three-way graphs if it's too annoying, but honestly, if you can assign some numbers for the cell means as the handout advises, you would find it a worthwhile exercise and possibly get 3 more points giving you a possible 13 points out of 10. Lucky!
    11. OPTIONAL 5 POINT EXTRA CREDIT HW11 due TUESDAY 12/11/12; SPSS formatted data available here
      • Comments: This is worth 5 points extra credit if you do it. You won't do it because you care about the points, though -- you'll do it because it'll take you through how to analyze a mixed design (containing both between and within subjects factors), which will be helpful in your data analysis future. Because it's optional, the due date is not until the day after the final exam so concentrate on studying first. If you don't have time after the exam, no worries, but if you can fit it in it won't be incredibly hard, just useful.


    NOTES AND RESOURCES

    Plato (and Greek Philosophy from origins to Aristotle): from Thomas Leahey's textbook on the history of psychology. Note Plato's emphasis on the abstract and universal as being part of an ideal realm that can only be comprehended by the mind (soul), not the senses. Then consider the quite abstract notion of "population" in statistics. It's also interesting to ponder our characterization of all observations as deviations from an ideal (represented by the "mean") that may not even ever actually be observed -- hence the assumption that individual differences represent "error." Statistics and psychology have some pronounced Platonist strains.

    The Secretary Problem, or how to choose a spouse. In case you're interested in the underlying math or something, apart from the illustration of how mathematical assumptions determine the applicability of models.
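    If you do get curious about the math, the classic strategy (let roughly the first n/e candidates go by, then take the next one better than everything seen so far) is easy to check by simulation; a rough sketch under the standard assumptions of the problem, with made-up settings.

        # Simulate the classic secretary-problem stopping rule.
        import numpy as np

        rng = np.random.default_rng(0)
        n, reps = 100, 20000
        k = round(n / np.e)     # skip about the first 37% of candidates

        wins = 0
        for _ in range(reps):
            quality = rng.permutation(n)     # candidate ranks in random order; higher is better
            best_seen = quality[:k].max()
            # Take the first later candidate who beats everyone seen so far (else you're stuck with the last).
            chosen = next((q for q in quality[k:] if q > best_seen), quality[-1])
            wins += (chosen == n - 1)        # did we end up with the overall best?

        print("P(choosing the best):", wins / reps)   # close to 1/e, about .37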

    Odds and Probabilities: a primer on definitions, interpretations, and calculations

    Exponents and logarithms: a primer on some basic mathematics that comes up in statistical contexts such as: logarithmic data transformations; loglinear models of categorical data with multiple IV's; the log(odds) transformation in logistic regression; the log likelihood (or "deviance" or "-2LL") in model comparison analyses like Structural Equation Modeling.

    Reliability is described adequately here in Wikipedia, as are several types of validity -- among them Internal, External, Construct, and Statistical Conclusion validity. See especially the respective threats to each, for aspects of research designs to pay special attention to.

    A diagram of a "quincunx", sometimes called a "Galton Board" after its inventor Francis Galton, which models the way multiple causation results in a normal distribution. It's a wooden board with pins inserted into it, and when a ball is dropped into the top it will bounce randomly either right or left at each pin it encounters. Most of the balls will bounce about an equal number of times in both directions, canceling out the left and right directions and landing in the middle. By chance, some of them will bounce to the left or the right more times, landing further from the middle. The end result is the accumulation of balls forming a normal distribution, which shows the decreasing likelihood of extreme patterns of bouncing (or of multiple causes all pushing the outcome in the same direction). Here's a video that shows a quincunx in action, where something more sand-like than ball-like is poured through the opening.
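    A few lines of Python mimic the board (my own toy version): each ball makes a series of independent left/right bounces, and the sums pile up into an approximately normal shape, visible even in a crude text histogram.

        # Simulate a quincunx: each ball bounces left (-1) or right (+1) at each of 12 pins.
        import numpy as np

        rng = np.random.default_rng(42)
        n_balls, n_pins = 10000, 12

        # A ball's final position is the sum of its independent +/-1 bounces.
        positions = rng.choice([-1, 1], size=(n_balls, n_pins)).sum(axis=1)

        values, counts = np.unique(positions, return_counts=True)
        for v, c in zip(values, counts):
            print(f"{v:+3d} {'#' * (c // 50)}")   # roughly bell-shaped pile of balls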

    The opening scene of Rosencrantz And Guildenstern Are Dead by Tom Stoppard, in which an unlikely extended run of coin flips gives rise to some existential angst. Note that even though each coin flip is perfectly in line with the "laws" of probability, we still don't quite believe this run of events should occur. (The play is a modern comedic take on two minor characters from Shakespeare's Hamlet who are unwittingly involved in a plot to kill Hamlet; this 1966 update focuses on their misadventures before their own eventual deaths.)

    An illustration of the three types of kurtosis, which I've also incorporated into an informative web page about everyone's favorite monotreme.

    Deriving the estimate of the standard error of the mean: something you don't need to be able to do at all but may be curious about, and if you are, it's explained clearly in section 10.17 of this text by Glass and Hopkins.
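    For the curious, the key step (standard algebra, condensed into one line here) is just the variance of an average of independent observations:

        \operatorname{Var}(\bar{X})
          = \operatorname{Var}\!\Big(\tfrac{1}{n}\sum_{i=1}^{n} X_i\Big)
          = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
          = \frac{n\sigma^2}{n^2}
          = \frac{\sigma^2}{n},
        \qquad\text{so}\qquad
        \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \approx \frac{s}{\sqrt{n}},

    where replacing the unknown σ by the sample estimate s gives the estimated standard error that the t-test uses.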

    Why the sample variance has a denominator of N-1 instead of N: a proof that dividing the sample sum of squares by N-1 instead of N gives an unbiased estimate (i.e. accurate in the long-run average) of the population variance. This is purely for the mathematically inclined -- others should steer clear. (Believe it or not, I've seen other proofs that are more complicated and thus probably more thorough.) The "expectation" operator notated as E(X) means roughly the long-run average of X or the mean of all X's in the population, but note that doesn't necessarily indicate a mean of some score -- X could be a variance for instance, and then E(X) would be the population value of that variance, as it is in this proof. If that helps clear anything up.
    Here is an alternative proof from a book on mathematical statistics. Other pages from the same book follow but are unrelated to this topic.
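    For anyone who wants the gist without chasing the links, the core of the argument (in my condensed notation) uses the identity Σ(X_i - μ)² = Σ(X_i - X̄)² + N(X̄ - μ)², so that

        E\!\left[\sum_{i=1}^{N}(X_i - \bar{X})^2\right]
          = E\!\left[\sum_{i=1}^{N}(X_i - \mu)^2\right] - N\,E\big[(\bar{X} - \mu)^2\big]
          = N\sigma^2 - N\cdot\frac{\sigma^2}{N}
          = (N-1)\,\sigma^2,

    and therefore dividing the sum of squares by N-1 rather than N gives an estimator whose long-run average (expectation) is exactly σ².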

    Confidence Intervals in Howell ch. 7 pp. 181-183
    Notes on the meaning and interpretation of Confidence Intervals: Howell's discussion is very good, so the somewhat lengthy little essay that I've included here is more than I intended to write; still, it may be helpful to hear it expressed in more than one way.

    Bayes's Theorem article in Wikipedia: I'm pretty sure it's legitimate to phrase the theorem this way: The probability of A being true given that B is true is equal to the probability that B actually does occur due to A, divided by the probability that B actually does occur due to any possible reason it might occur -- that is, that B occurs at all under any circumstances. This denominator is sometimes expressed as the sum of two other probabilities: that B occurs due to A, and that B occurs due to every reason other than A, which do in fact account for all occurrences of B since "A and not-A" pretty much covers every possible reason for B. You can substitute the observations of interest into this formula: A = a hypothesis being true, and B = data bearing on that hypothesis. Examples listed on this link are pretty illuminating, if you follow them closely. The trick with Bayesian statistics is coming up with those probabilities that are the ingredients in the formula, e.g., of B occurring due to any possible reason -- it's educated guesswork at best (which can be pretty good after all).
    Bayes's Theorem excerpt from Howell ch. 5: a very good basic treatment.
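    In symbols, the phrasing above (where "B occurring due to A" is the joint probability) amounts to

        P(A \mid B)
          = \frac{P(B \mid A)\,P(A)}{P(B)}
          = \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid \neg A)\,P(\neg A)},

    with A standing for the hypothesis and B for the data, so the denominator is the total probability of the data turning up for any reason at all (A or not-A).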

    Logic Of ANOVA summary

    Understanding ANOVA Visually: a fun bit of Flash animation; related teaching tools are listed at http://www.psych.utah.edu/learn/statsampler.html

    Statistical Power Applet: a visual demonstration of the relations among the various quantities related to power.

    G*Power Home Page: free software for power calculations.

    Correlation article in Wikipedia: whether or not the math explained here is of interest (correlations as cosines, etc.), the two images depicting sets of scatterplots are very important to understand. One of the diagrams on this page shows some scatterplots and the correlation coefficients calculated from them, just to give you an idea of what typical correlations might look like, but also of how unpredictable they might be if you don't look at your data in a scatterplot. This point is made even more obvious by this diagram further down the same page, which shows some very different sets of data that all give the exact same value for the correlation coefficient r (as well as for some other descriptive statistics).

    Analytic Contrasts summary

    Keppel's ANOVA notation system (PDF)
    Keppel's ANOVA notation system (Microsoft Word)
    This is a handy summary of how to compute Degrees of Freedom for any Source of Variance. Keppel and Wickens (2004) use an ANOVA notation system that provides a simple way to compute Sums Of Squares: by converting Sources of Variance into Degrees of Freedom, and then into a combination of "bracketed" quantities, where the brackets indicate some further adding and dividing. But since no one in their right mind computes Sums Of Squares by hand, the only remaining useful part of this page is the part describing how to get Degrees of Freedom. That is quite useful though.

    Recognizing Higher Order Interactions From Graphs And Means Tables

    Finding Sources of Variance (PDF)
    Finding Sources of Variance (Microsoft Word)

    Expected Mean Squares (PDF)
    Expected Mean Squares (Microsoft Word)

    Excel spreadsheet for calculating values of the z, t, F, and chi-square distributions and their probabilities

    Table of Selected Values of the t Distribution:

  • In the absence of SPSS, Excel (TDIST and TINV functions), or other relevant software, use this table to find the value of t that cuts off a certain percentage of the area under the curve, which corresponds to the probability of obtaining a t of that size or larger. Since t is symmetric it doesn't matter whether it's positive or negative (i.e., whether it's in the upper or lower tail); all that counts is the absolute value which represents the obtained score's distance from the null hypothesis value in units of estimated standard errors -- analogous to a z-score which uses KNOWN standard errors or standard deviations as its units. The many curves representing the t distribution differ depending on the degrees of freedom or df, with few df giving a curve that is flatter with longer tails than the standard normal distribution (or z distribution); with more and more df, the t distribution looks more and more like the z distribution. (Note that with infinite df, which means an infinite sample size, the values for t are identical to those you'd find in the z distribution.)
  • Read the row corresponding to the correct df: for analyzing means the df are n-1 for a single sample, and for a 2 sample means comparison the df are the sum of each sample's df (or N-2, where N is the total number of observations from both groups). In correlation and regression the df are the number of observations minus the number of predictors, minus 1 (or N-k-1). The commonly used proportions listed in this version of the table are conveniently identified by two different column headings, based on whether you want the proportion of interest to be located entirely in one tail, or split between the upper and lower tails. See the diagram accompanying the table to clarify this. ALWAYS use the two-tailed version, and thus the headings under "proportion in two tails combined" -- so the 1 df value for p=.05 is 12.706, not 6.314. (One-tailed tests of so-called "directional hypotheses" map p-values onto smaller required values of t, making it easier to declare results significant, but this procedure has always been controversial and I rarely see a situation that legitimately calls for it. How often is it really the case that one group's mean MUST be higher than the other's, and it's inconceivable that their sizes could be reversed?) As an example, the t value for the p<.01 cutoff for the difference between the means of two samples of size n=10 would be 2.878. The df would be (10-1) + (10-1) = 18, and the appropriate column would be the one under 0.01 as you read the "proportion in two tails combined" headings. If your obtained t is larger than 2.878 then it clearly cuts off an even smaller proportion of the area than .01, and thus you can say the t you obtained has p<.01. (Any statistical software will tell you precisely what the p-value for your t actually is.)
  • Note that if the particular df you're looking for don't appear in the table, you should use the next LOWER df -- do NOT round df UP even if that higher df value is closer to yours. Another table with more values included appears here, and many more are available on the web. Many of these, for instance this one, will give the complementary proportion of the area for values SMALLER than t, and will do so only for one tail -- thus to find the example value of 2.878 you'd have to look for 18 df and then the 99.5% cutoff value, because p=.01 corresponds to a total of 1% of the area being more extreme and you have to split that 1% into 0.5% in the upper tail and 0.5% in the lower.
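    If you do have software handy, the same critical values come straight out of the t distribution's inverse CDF; here is a quick Python check of the two numbers used above.

        # Critical t values from the inverse CDF (ppf), matching the table lookups above.
        from scipy import stats

        # Two-tailed p = .05 with 1 df: put .025 in each tail.
        print(stats.t.ppf(1 - 0.05/2, df=1))    # 12.706...

        # Two-tailed p = .01 with df = (10-1) + (10-1) = 18.
        print(stats.t.ppf(1 - 0.01/2, df=18))   # 2.878...

        # Exact two-tailed p-value for an obtained t, e.g. t = 3.2 with 18 df.
        print(2 * stats.t.sf(3.2, df=18))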

    Table of Selected Values of the F Distribution:

  • In the absence of SPSS, Excel, or other relevant software, use this table to find the value of F that cuts off a certain percentage of the area under the curve, which corresponds to the probability of obtaining an F of that size or larger. The F distribution has only one tail to consider, in the sense that the extreme values of interest are UPPER values only. The distribution's shape differs according to both the number of groups (or predictors) being analyzed, and the number of observations being made, and so picking out the relevant member of the family of F distributions requires two numbers specifying its df (one for the numerator df and one for the denominator df). Reproducing all the percentage cutoff points for the area under the curve (corresponding to the probabilities) for all possible combinations of these df would be very unwieldy. Thus only the most common cutoff values -- 5%, 10%, and 1% -- are included in this version of the table. They are organized such that the columns represent different numerator df up to 20 (appropriate for 21 group means in ANOVA, or 20 predictor variables in regression, which should be plenty), and the rows represent all values of the denominator df from 1 to 100.
  • Consulting the section of the table appropriate for the p-value you wish to examine, you find the row and column corresponding to your numerator and denominator df, and the value at that entry is the upper "critical value": the value of F beyond which the given percentage of the area under the curve is cut off. For instance, the value for the p<.01 cutoff for the difference between the means of two samples of size n=10 would be 8.285. Familiarity with ANOVA df would make it apparent that the numerator df would be [number of groups] - 1 = 2-1 = 1, and the denominator df would be the sum of the df within each group, or (10-1) + (10-1) = 18. The entry in the p=.01 portion of the table under numerator df ("ν1") = 1 and denominator df ("ν2") = 18 is 8.285, meaning that for those df the area under the curve beyond the value of 8.285 on the horizontal axis is 1% of the total, and the probability of randomly sampling scores that lead to that high an F value when there is no difference between the population means is 1%. If your obtained F is larger than 8.285 then it clearly cuts off an even smaller proportion of the area than .01, and thus you can say the F you obtained has p<.01. (Any statistical software will tell you precisely what the p-value for your F actually is.)
  • For 2 groups, either F or t can be used to yield exactly the same probability; in comparing just two groups the numerator df will always be 1 and the denominator df will be the same as the df for t. F then is the square of t -- that is, within rounding error, 8.285 is the square of 2.878.
  • Note that if the particular numerator and/or denominator df you're looking for don't appear in the table, you should use the next LOWER df -- do NOT round df UP even if that higher df value is closer to yours. A printable pdf version of the F distribution table for p=.05 and p=.01 values with numerator df up to 10 and all denominator df up to 100 is here. More versions of tables for F and other distributions appear here and at various other easily located web sites. Many web pages such as this one will calculate a p-value for any given F and df, and others will calculate F given df and a p-value, etc. But if you have access to the internet, chances are you also have access to Excel which will do the same with its FDIST, FINV, TDIST, and TINV functions, etc., or SPSS which displays all p-values for its analyses automatically.
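    The same goes for F; for example, the critical value used above and its relation to the t value from the previous table can be checked in a couple of lines of Python.

        # Critical F values from the inverse CDF, matching the table lookups above.
        from scipy import stats

        F_crit = stats.f.ppf(1 - 0.01, dfn=1, dfd=18)   # p = .01, numerator df 1, denominator df 18
        t_crit = stats.t.ppf(1 - 0.01/2, df=18)         # the corresponding two-tailed t

        print(F_crit)        # 8.285...
        print(t_crit**2)     # the same value: F = t-squared when the numerator df is 1

        # Exact p-value for an obtained F, e.g. F = 5.4 with 2 and 27 df.
        print(stats.f.sf(5.4, dfn=2, dfd=27))
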
    Supplemental readings in statistics and psychology:

  • Some useful texts:
  • Gravetter, F. J., & Wallnau, L. B. (2006). Statistics for the Behavioral Sciences (7th ed.). Belmont, CA: Wadsworth/Thomson: a very clear introductory level statistics text.
  • Howell, David C. (2007). Statistical Methods for Psychology (6th Ed.). Thomson-Wadsworth. (ISBN-10: 0495012874; ISBN-13: 9780495012870): an introductory text of exceptional clarity and accuracy, for the grad or advanced undergrad level.
  • Keith, Timothy Z. (2006). Multiple Regression and Beyond. Allyn & Bacon. ISBN: 0205326447 (ISBN-13: 9780205326440): used for STAT 379 Spring 2007/2008.
  • Grimm, Lawrence G. and Yarnold, Paul R., eds. (1994). Reading and Understanding Multivariate Statistics. APA. (ISBN: 1-55798-273-2; ISBN-13: 978-1-55798-273-5): used for STAT 379 Spring 2007/2008.
  • Grimm, Lawrence G. and Yarnold, Paul R., eds. (2000). Reading and Understanding MORE Multivariate Statistics. APA. (ISBN: 1-55798-698-3; ISBN 13: 978-1-55798-698-6): companion volume to the 1994 book.
  • Pedhazur, Elazar J. (1997). Multiple Regression in Behavioral Research (3rd Ed.) Thomson-Wadsworth. (ISBN-10: 0030728312; ISBN-13: 9780030728310): an advanced text and one of the best references on multiple regression and related procedures.
  • Keppel, Geoffrey & Wickens, Thomas D. (2004). Design and Analysis: A Researcher's Handbook, 4/E. Prentice Hall. ISBN-10: 0135159415 (ISBN-13: 9780135159415): used for STAT 242 Fall 2007.
  • Maxwell, S. E., & Delaney, H. D. (2004). Designing experiments and analysing data: A model comparison perspective (2nd ed.). Mahwah, NJ: Erlbaum.(ISBN/ISSN: 0-8058-3718-3; ISBN13: 978-0-8058-3718-6): an advanced text on experimental design and ANOVA.


    Some important figures in the history of statistics:

  • Abraham De Moivre around 1730 derived the normal distribution as the limit of the binomial distribution when the number of binary decisions (e.g., coin tosses) is infinite.
  • Johann Carl Friedrich Gauss often gets credit for discovering the normal distribution because in 1809 he proved that it described errors of measurement (in astronomy, etc.), which is why the normal distribution is sometimes called the Gaussian distribution.
  • Adolphe Quetelet in 1835 first applied the normal distribution to biological and behavioral traits rather than merely to measurement error, describing the concept of "the average man"; he also invented the Quetelet Index which today we usually refer to as the Body Mass Index (BMI).
  • Francis Galton invented the concepts of correlation and regression around 1886. He also read and wrote at age 2-1/2, went ballooning and did experiments with electricity for fun, mapped previously unexplored African territories, taught soldiers camping procedures and how to deal with wild animals and "savages," tried to objectively determine which part of Britain had the most attractive women, studied the efficacy of prayer empirically, observed the amount of fidgeting at scientific lectures to measure the degree of boredom, invented fingerprinting and weather maps along with the meteorological terms "highs," "lows," and "fronts," coined the phrase "nature and nurture," and pioneered mental testing, twin studies of heritability, the composite photograph, the study of mental imagery, the free-association technique for probing unconscious thought processes, the psychological survey questionnaire, and... umm... eugenics. Oops.
  • Karl Pearson founded modern statistics beginning in the 1890's, inventing the chi-square distribution and test and coining the term "standard deviation" among others; he formalized the calculation of the correlation coefficient (where Galton had arrived at it graphically) and so that calculation bears his name today.
  • George Udny Yule worked on the concepts and mathematics of partial correlation and regression in the 1890's, making multiple regression as we know it possible.
  • William Sealy Gosset in 1908 worked out the distribution of sample means ("standard error" in modern terminology) for cases where the population standard deviation is unknown -- hence he is the inventor of the t-test.
  • Ronald Fisher was a key figure in bridging the gap between the Darwinian theory of natural selection and its underlying mechanism of Mendelian genetics; from about 1915 onwards he also invented experimental design as we know it today, and developed Analysis Of Variance (ANOVA) as a generalization of Gosset's work to more than two groups (Snedecor in his influential early textbook named the 'F' statistic for Fisher).
  • Jerzy Neyman and Egon Pearson (son of Karl) invented and refined many of the concepts of null hypothesis significance testing in the 1930's (e.g. the alternative hypothesis, power, Type II error, confidence intervals), though Fisher had a constant ongoing argument with everything they did -- mainly because it wasn't the way HE did it.


    If you're wondering about classes being canceled due to weather, see http://alert.uconn.edu/ or call 486-3768.