only the greatest christmas song in the whole history of christmas

HOMEWORK ASSIGNMENTS (#10 DUE FRI 12/7)


PSYC 5104 sec 001
Foundations of Research in the Psychological Sciences I, Fall 2018
UConn Storrs Campus, BOUS 160
MON WED 10:10-11:40
Eric Lundquist


Office: BOUS 136
Office Hours: Tue Thu 12:30-1:30 and by appointment
Phone: (860) 486-4084
E-mail: Eric.Lundquist@uconn.edu

TEACHING ASSISTANTS:

Andy Tucker
Office: BOUS 120 (or 362)
Office Hours: Friday 1:00-3:00
E-mail: Andrew.Tucker@uconn.edu

Liz Simmons
Office: BOUS 123
Office Hours: Wednesday 8:00-10:00
E-mail: Elizabeth.A.Simmons@uconn.edu


READING:
  1. Keppel, Geoffrey & Wickens, Thomas D. (2004). Design and Analysis: A Researcher's Handbook, 4/E. Prentice Hall. ISBN-10: 0135159415 (ISBN-13: 9780135159415)
  2. On-Line Readings and Reserve Readings (see below)

GRADING:
   
  • Homework: 30%, assigned weekly
  • Midterm: 35%, WEDNESDAY OCTOBER 17
  • Final: 35%, MONDAY DECEMBER 10, 10:00 AM


    TOPICS AND READING ASSIGNMENTS: to be updated throughout the semester
    KW = Keppel and Wickens

    CLASS SYLLABUS in Microsoft Word format, should you lose your original. This has been considerably modified in the schedule below.

    TOPIC READING
    Experimental Design KW Ch. 1 [basic issues and terminology]
    Explanation of R² along with Sum of Squares, Variance, and Confidence Intervals.
    Summary of Techniques in the General Linear model in HTML format, Microsoft Word format, and PDF format.
    PowerPoint slides on some introductory terminology and issues in experimental design.

    The "Replication Crisis" and related issues: A list of issues and difficulties confronting psychological researchers lately, which I've presented here informally and shrilly, consistent with how I present it in class.
    A syllabus / reading list for a seminar titled Everything Is F***ed, only the asterisks are made more rude at this link. It's a good starting point for where to read about all the headaches science in general and psychology in particular are currently up against. This was a VERY popular and frequently shared blog post from the moment it appeared in August 2016.
    What has happened down here is the winds have changed: Andrew Gelman's incisive 2016 blog post recounting his impression of the developing problems. Point of information for the curious: the headings are lines from Randy Newman's song "Louisiana 1927" about the devastating Mississippi River flood that should have been foreseen and prepared for but somehow took everyone by surprise and became a huge disaster. (Especially poignant in 2017 as Houston lies underwater.)
    When the Revolution Came for Amy Cuddy: Very good report from the New York Times on the turmoil that has developed within psychology as a result of reconsidering the field's research methods and standards.

    Ioannidis Why Most Published Research Findings Are False.PDF
    Simmons Nelson Simonsohn False Positive Psychology.pdf
    Estimating the reproducibility of psychological science Nosek et al.pdf
    A manifesto for reproducible science.pdf
    ASA statement on p values.pdf
    BASP Editorial Null Hypothesis Banning.pdf
    Categorical data and Chi-Square Howell Ch.6 [excellent presentation of Chi-Square and related topics]
    Excel spreadsheet to calculate a 2x2 chi square test of independence [including examples from the point of view of a dog and a medical researcher]
    How to talk about odds and odds ratios in English
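    As a rough companion to the spreadsheet and the odds-ratio note, here is a minimal Python sketch of a 2x2 chi-square test of independence (Pearson's formula, no continuity correction) together with the odds ratio. The cell counts are made up for illustration, and the function names are hypothetical rather than anything taken from the spreadsheet.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (df = 1, no continuity correction) for the table
         [[a, b],
          [c, d]]"""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def odds_ratio(a, b, c, d):
    """Odds of landing in column 1 for row 1, divided by the same odds for row 2."""
    return (a / b) / (c / d)


# Hypothetical counts: rows = treatment / control, columns = improved / not improved
chi2 = chi_square_2x2(10, 20, 20, 10)
oratio = odds_ratio(10, 20, 20, 10)

print(round(chi2, 3))  # compare against the tabled critical value 3.84 (df = 1, alpha = .05)
print(oratio)          # 0.25: the treatment group's odds are a quarter of the control group's
```

    In English, an odds ratio of 0.25 here reads as "the odds of improving were one quarter as large in the treatment group as in the control group", which is a statement about odds, not about probabilities.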
    Data Description KW Ch. 2 pp. 15-18, 24-25; Ch. 3 pp. 32-34; Ch. 7 pp. 144-145 [histogram, scatterplot; central tendency, dispersion, standardization; normality, skewness and kurtosis]
    The t-test and confidence intervals Howell Ch.7 [excellent treatment of the logic of the t-test, applied to the cases of a single sample mean, two related sample means, and two independent sample means; relation of t to z; confidence intervals described accurately on pp. 181-183]
    KW Ch. 3 pp. 34-36, Ch. 8 pp. 159-161
    and see my Notes on Confidence Intervals [references to Keith (2006) can be ignored, and the interpretation of confidence intervals for the regression coefficient "b" is the same as for the more familiar population mean "μ"]
    Some examples of t-tests calculated in Excel. Double-clicking on formula cells will highlight the other cells used in the calculations. From the top, there are: a single sample t-test comparing sample mean gas mileage (mpg = miles-per-gallon) to manufacturer's mileage claims; a single sample t-test using some simple numbers that can be verified by hand; a paired samples t-test using those numbers again and creating a single set of difference scores; an independent samples t-test using those numbers again as scores from two different groups -- along with calculation of r² directly from t and df, as compared to the square of the value of r given by Excel when correlating DV scores with Group membership. Each group's sd and standard error of the mean are calculated separately even though they aren't used in calculating t. For illustration purposes two different null hypothesis population means are used in each example, but note that for the paired samples and independent samples tests the only sensible value to hypothesize is 0, since we rarely have occasion to try to reject any other specific value.
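    The last comparison in the spreadsheet (r squared from t and df versus the squared correlation between DV scores and group membership) can be sketched in a few lines of Python. This is only an illustration under assumptions: the scores are invented, the groups are coded 1 and 0, and the t-test uses the usual pooled variance.

```python
import math


def independent_t(group1, group2):
    """Pooled-variance independent samples t and its df."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = sum(group1) / n1, sum(group2) / n2
    ss1 = sum((x - m1) ** 2 for x in group1)
    ss2 = sum((x - m2) ** 2 for x in group2)
    df = n1 + n2 - 2
    pooled_var = (ss1 + ss2) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se_diff, df


def pearson_r(xs, ys):
    """Pearson correlation from sums of products and sums of squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ssx * ssy)


g1 = [2, 4, 6, 8, 10]  # hypothetical scores, group 1
g2 = [1, 2, 3, 4, 5]   # hypothetical scores, group 2

t, df = independent_t(g1, g2)
r2_from_t = t ** 2 / (t ** 2 + df)

scores = g1 + g2
group = [1] * len(g1) + [0] * len(g2)  # group membership as a 1/0 code
r2_from_r = pearson_r(group, scores) ** 2

print(round(r2_from_t, 4), round(r2_from_r, 4))  # the two values agree
```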

    When identifying the value of t that cuts off the extreme 5% of scores, different conventions can be used. I typically write t.05(4) = 2.776 and it's implicit that t can be positive or negative. Howell says if I'm gonna do that, I should add the "+/-" before the "2.776", which is fair, but I usually don't and it's understood (I hope). What Howell typically does is to call that same value t.025 instead of t.05, since it cuts off the extreme 2.5% of scores on the positive side. I find that confusing so I don't do it. Here's his statement about notation from his 8th edition and here are examples of his usage from his 6th edition that I've posted from, and the 8th edition where his clarifying statement first appears.
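    To see that the two notations name the same cutoff, here is a small sketch that finds the value of t leaving 5% in the two tails combined (equivalently 2.5% in the upper tail) for 4 df. It uses only the standard library: the t density is integrated numerically with Simpson's rule and the cutoff is found by bisection. The function names are hypothetical, and because it relies on math.gamma it is only meant for small df; in practice you would read the value from a t table or statistical software.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """P(|T| > t), using Simpson's rule for the area between 0 and t."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return 1 - 2 * area_0_to_t


def critical_t(alpha, df):
    """Positive t that cuts off alpha in the two tails combined (bisection)."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if two_tailed_p(mid, df) > alpha:
            lo = mid  # still too much area in the tails: move outward
        else:
            hi = mid
    return (lo + hi) / 2


crit = critical_t(0.05, 4)
print(round(crit, 3))                         # matches the tabled 2.776
print(round(two_tailed_p(crit, 4) / 2, 4))    # 0.025 in the upper tail alone
```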
    Null Hypothesis Significance Testing Howell Ch.4 [excellent and up-to-date treatment of the logic and controversies of hypothesis testing, possibly more accessible than Cohen's (1994) paper]
    KW Ch. 2 pp. 18-22; Ch. 3 pp. 46-48; Ch. 8 pp. 167-169
    Cohen (1994) [criticism of Null Hypothesis Significance Testing]
    Wilkinson and APA Task Force (1999) [recommendations for treatment of data in light of NHST controversy]

    For your curiosity and your future as a researcher, but not for your exam:
    Howell Ch. 5 Excerpt on Bayes's Theorem [provides a brief accurate description of Bayes's Theorem]
    Dienes (2011) [makes the case that Bayes's Theorem is what most people really believe is appropriate and want to use when analyzing data; link requires logging in with UConn NetID and password, then you should just download the pdf for convenience -- or, what the heck, read it right here]
    Cohen (1990) [general advice about treatment of data]
    Cowles & Davis (1982) [historical roots of the "p<.05" significance level]
    Gigerenzer (1993) [examination of the NHST controversy by contrasting the incompatible original views of Fisher and Neyman & Pearson with the unsatisfying hybrid of their views that became the dominant method of data analysis]
    Still Not Significant [blog post listing many many delusional euphemisms for insisting that non-significant results are still sorta significant]
    Between Subjects (Completely Randomized) Designs: One Factor KW Ch. 2 & 3, Ch. 8 pp. 161-162
    Logic Of ANOVA summary
    More ANOVA information and examples
    Effect Size and Power KW Ch. 8 pp. 163-167 (but not "Effect Sizes for Contrasts")
    Book Review of The Cult Of Statistical Significance from the journal Science from June 2008. This one-page article focuses on one consequence of the misplaced emphasis psychology places on null hypothesis significance testing, which is the neglect of effect size and of effect measurements.
    Assumptions of ANOVA (and t-tests): The Linear Model KW Ch. 7
    Q-Q Plots described on pp. 75-79 of Howell 8th ed.: note that the axes are mistakenly reversed at the bottom of the first page (p.75) and I've left a note correcting that, so that it matches all the plots he shows. SPSS, however, actually does put the expected quantiles on the Y axis and the observed on the X axis. It's arbitrary but obviously your text has to match your graphs!
    MIDTERM REVIEW interim summary, updated for Fall 2016 and after

    Some sample midterms from previous years' courses
    [Note: 1) PSYC 5104 was formerly STAT 3115 which was formerly STAT 242; 2) some of these pages are out of order; 3) topics have been covered in a different order so many of these questions are not relevant to our exam, which should be apparent - this is ESPECIALLY TRUE for Fall 2016 and later where a lot more material about methodology and current issues has been included in the earlier part of the course, possibly rendering most of these exams irrelevant to the midterm... but here they are anyway, so be able to answer the parts you recognize]

    RECORDING OF MONDAY 10/15/18 REVIEW SESSION
    I did not have the time or energy to cut out the superfluous small talk at the beginning and end of the review session, so this recording hits the 2 hour mark. The actual content of it was pretty good both in terms of what people asked about and how I explained it (I think!), and some of those explanations include some points I emailed you all earlier in the day. Yet, as I've said in past semesters, listening to it is not necessarily the optimal use of your studying time given that you already have your notes and the midterm summary link. Your call, maybe you just finished a podcast and are looking for the next big thing. Special message at the end for anyone who makes it all the way through.

    RECORDING OF MONDAY 10/16/17 REVIEW SESSION
    After cutting out some superfluous small talk at the beginning of the review session, this recording ends up being 68 minutes. Listening to it is not necessarily the optimal use of your studying time given that you already have your notes and the midterm summary link, but there are some useful moments I guess. The biggest drawback is the references to what's written on the board, which might be hard to visualize without having been there.

    Here's the midterm review session from Fall 2016 in case you're desperate to hear more of my voice, but even then I didn't recommend listening to it in lieu of actual studying (it's two hours long). For the first couple minutes I did look over my previous exam just reading off topics that were covered on that very similar exam, but that's just what's on the Study Guide that you already have.

    PRINTABLE TABLES AND FORMULAS
    PSYC 5104 midterm formulas.docx
    Howell 8E z table.pdf
    Howell 8E t table.pdf
    Howell 8E F table.pdf
    Howell 8E chi square table.pdf
    Correlation KW Ch. 15 pp. 312-314
    r = cov_xy / (s_x * s_y), where cov_xy = SP_xy / (N-1), and SP_xy = Σ(X - M_x)(Y - M_y)
    Note this point from the list of links below:
    Correlation article in Wikipedia: whether or not the math explained here is of interest (correlations as cosines, etc.), the two images depicting sets of scatterplots are very important to understand. One of the diagrams on this page shows some scatterplots and the correlation coefficients calculated from them, just to give you an idea of what typical correlations might look like, but also of how unpredictable they might be if you don't look at your data in a scatterplot. This point is made even more obvious by this diagram further down the same page known as Anscombe's quartet, which shows some very different sets of data that all give the exact same value for the correlation coefficient r (as well as for some other descriptive statistics). Along those lines, this game is also quite illuminating.

    As another way of testing your intuitions about correlations, play this Guess The Correlation game.

    correlation calculation.xlsx demonstrating how the formula works
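    The correlation formula given above (sum of products, then covariance, then division by the two standard deviations) can also be followed step by step in Python. This is a sketch with invented data, not a reproduction of the spreadsheet.

```python
import math


def correlation(xs, ys):
    """r = cov_xy / (s_x * s_y), built up from the sum of products SP_xy."""
    n = len(xs)
    m_x, m_y = sum(xs) / n, sum(ys) / n
    sp_xy = sum((x - m_x) * (y - m_y) for x, y in zip(xs, ys))
    cov_xy = sp_xy / (n - 1)
    s_x = math.sqrt(sum((x - m_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - m_y) ** 2 for y in ys) / (n - 1))
    return cov_xy / (s_x * s_y)


x = [1, 2, 3, 4, 5]  # hypothetical scores
y = [2, 4, 5, 4, 5]

r = correlation(x, y)
print(round(r, 4))  # SP_xy = 6, SS_x = 10, SS_y = 6, so r = 6 / sqrt(60)
```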
    unequal n correlation.xls demonstrating that unbalanced factorial designs have correlated (i.e. non-orthogonal) independent variables, whose (Type III) SS therefore will not add up to the total SS.
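    The point about unequal n can be illustrated without any scores at all: code each subject's level of A and of B as -1 or +1 and correlate the two columns of codes. The sketch below assumes a 2x2 design with made-up cell sizes; the helper names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ssx * ssy)


def factor_codes(cell_counts):
    """Expand {(a_code, b_code): n} into one A code and one B code per subject."""
    a_codes, b_codes = [], []
    for (a, b), n in cell_counts.items():
        a_codes += [a] * n
        b_codes += [b] * n
    return a_codes, b_codes


# Hypothetical cell sizes for a 2x2 design
balanced = {(-1, -1): 5, (-1, 1): 5, (1, -1): 5, (1, 1): 5}
unbalanced = {(-1, -1): 8, (-1, 1): 2, (1, -1): 3, (1, 1): 7}

r_bal = pearson_r(*factor_codes(balanced))
r_unbal = pearson_r(*factor_codes(unbalanced))

print(r_bal)              # 0.0: the factors are orthogonal
print(round(r_unbal, 4))  # clearly nonzero: the factors overlap
```

    With equal cell sizes the two factors are uncorrelated, so each effect's SS is separate from the others. With these unequal cell sizes A and B share variance, which is why their Type III SS no longer add up to the total.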
    Analytical Comparisons Among Means (Single-df Contrasts) KW Ch. 4 sec. 4.1 - 4.5
    Analytic Contrasts summary
    Controlling Type I Errors in Multiple Comparisons (Planned and Post-hoc) KW Ch. 6
    Trend Analysis KW Ch. 4 sec. 4.6 - 4.7; Ch. 5
    Between-Subjects (Completely Randomized) Designs: Two Factors KW Ch. 10 & 11
    Two Factor Design: Interactions and Main Effects: this summary describes how to recognize when main effects and interactions are present in the two-way factorial design, both in terms of plots of means and in terms of tables of means.
    Keppel's ANOVA notation system (PDF): This is a handy summary of how to compute Degrees of Freedom for any Source of Variance. Keppel and Wickens (2004) use an ANOVA notation system that provides a simple way to compute Sums Of Squares: by converting Sources of Variance into Degrees of Freedom, and then into a combination of "bracketed" quantities, where the brackets indicate some further adding and dividing. But since no one in their right mind computes Sums Of Squares by hand, the only remaining useful part of this page is step 1 describing how to get Degrees of Freedom. That is quite useful though. Here's a Microsoft Word version in case it's convenient for any reason.
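    For the Degrees of Freedom step, here is a hedged sketch of the usual rule as I understand it (it is not transcribed from the handout): every factor named to the left of a slash contributes (levels - 1), every factor named to the right of a slash contributes its full number of levels, and the pieces are multiplied. The function name and the "AxB" / "S/AB" string format are assumptions made for the example.

```python
def degrees_of_freedom(source, levels):
    """df for a source of variance such as 'A', 'AxB' or 'S/AB'.

    levels maps each letter to its number of levels; for 'S' that is the
    number of subjects per cell. Factor letters are upper case, and a
    lower case 'x' is only a separator.
    """
    left, _, right = source.partition("/")
    df = 1
    for letter in left.replace("x", ""):
        df *= levels[letter] - 1   # crossed factors: levels minus 1
    for letter in right:
        df *= levels[letter]       # nesting factors: full number of levels
    return df


# Hypothetical 3 x 2 between-subjects design with 5 subjects per cell
levels = {"A": 3, "B": 2, "S": 5}
dfs = {src: degrees_of_freedom(src, levels) for src in ["A", "B", "AxB", "S/AB"]}

print(dfs)                # {'A': 2, 'B': 1, 'AxB': 2, 'S/AB': 24}
print(sum(dfs.values()))  # 29, i.e. total observations (30) minus 1
```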
    Analyzing Interactions KW Ch. 12 & 13
    KW Ch. 14 pp. 303-307, 309-310: Nonorthogonality of the Effects, 14.3 Averaging of Groups and Individuals, and 14.5 Sensitivity to Assumptions (14.4 "Contrasts and Other Analytical Analyses" is optional, being a little heavy on notation for things you wouldn't really do by hand).
    Excel example of interaction effects, three way interaction, and removable vs. nonremovable interactions: this includes data from Dovidio and Gaertner (2000) as well as from Keppel and Wickens 4E pp. 208-209, though you could also play around with putting different numbers into the 2x2x2 or 3x2x2 interaction graphs.
    Analysis of Covariance (ANCOVA) KW Ch. 15 pp. 311-312 [Aside from the analogy to post-hoc blocking (see pp. 231-232), this chapter will be largely skipped in favor of a regression-based treatment of ANCOVA in the spring semester (PSYC 5105).]
    Three Factors and Higher Order Factorial Designs: Between-Subjects Designs KW Ch. 21 & 22
    Recognizing Higher Order Interactions From Graphs And Means Tables
    Repeated Measures (Within-Subjects) Designs: One Factor KW Ch. 16 & 17
    REPEATED MEASURES ANOVA notes: this summary is a companion to the "Logic of ANOVA Summary" above; it outlines the logic of the Sums Of Squares calculations that we will not concern ourselves with in this class, though it may be useful to look at if you're confused about the concept behind such a calculation -- i.e., how the Within Groups Sums of Squares from the independent groups ANOVA is further partitioned into the part due to individual differences among the subjects and the part that is truly just experimental error. In Keppel and Wickens terms, that experimental error is identified as the interaction between factor A and the subject, thus the "AS" term, whereas here it's simply referred to as "error". The "S" term is referred to as the "Between Subjects" factor. The number of treatment conditions in the Between Groups factor A is called "k" here instead of our familiar "a".
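    The partition described above (Within Groups SS split into a Subjects part and an A x S "error" part) can be checked numerically. This is a minimal sketch with invented scores for 4 subjects measured in 3 conditions; the variable names are hypothetical.

```python
# Rows = subjects, columns = treatment conditions (hypothetical data)
scores = [
    [3, 5, 7],
    [2, 4, 9],
    [4, 6, 8],
    [1, 3, 8],
]
n = len(scores)      # subjects
a = len(scores[0])   # treatment conditions

grand = sum(sum(row) for row in scores) / (n * a)
cond_means = [sum(row[j] for row in scores) / n for j in range(a)]
subj_means = [sum(row) / a for row in scores]

# Between-conditions SS (factor A)
ss_a = n * sum((m - grand) ** 2 for m in cond_means)

# Within-groups SS, computed as if the columns were independent groups
ss_within = sum((row[j] - cond_means[j]) ** 2 for row in scores for j in range(a))

# The two pieces that within-groups SS is partitioned into
ss_s = a * sum((m - grand) ** 2 for m in subj_means)          # individual differences
ss_as = sum(
    (scores[i][j] - cond_means[j] - subj_means[i] + grand) ** 2
    for i in range(n) for j in range(a)
)                                                             # A x S residual ("error")

print(ss_a, ss_within, ss_s, ss_as)  # ss_within equals ss_s + ss_as
```

    The repeated measures F then uses MS for A x S, with (a - 1)(n - 1) df, as its error term, which is smaller than the independent groups error term whenever subjects differ consistently from one another.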
    Expected Mean Squares (PDF): this topic isn't specific to any particular design, so it's being introduced at an arbitrary late point in the semester even though implicitly it was already introduced with the description of the F ratio for the single factor ANOVA; here's a Microsoft Word version in case it's convenient for any reason.
    Repeated Measures (Within-Subjects) Designs: Two Factors KW Ch. 18
    Mixed Designs: One Between, One Repeated Factor KW Ch. 19 & 20
    Finding Sources of Variance (PDF): once you're dealing with combinations of different numbers of between and within factors, it's good to have a general scheme for identifying what the sources of variance are in a given design; here's a Microsoft Word version in case it's convenient for any reason.
    Three Factors and Higher Order Factorial Designs: Repeated Measures and Mixed Designs KW Ch. 23
    Random and Nested Factors KW Ch. 24 & 25 but read mainly pp. 530-534
    FINAL REVIEW summary

    Some sample final exams from previous years' courses
    [Note: 1) PSYC 5104 was formerly STAT 3115 which was formerly STAT 242; 2) some questions address topics we haven't covered, or have covered less thoroughly than these exams assume; you'll be able to determine which questions you can answer, and then use those for practice.]
    Some questions from previous years' STAT 3115 midterms are relevant to the final exam material listed above (e.g. contrasts, post-hoc testing, etc.); see in particular: 2004#3, 2003#1(a-d), 2002#3, 2001#2&3&4(b, if you consider factorial designs), 2000#2(b&c)&3

    RECORDING OF MONDAY 12/10/18 REVIEW SESSION
    Monday's review session seemed pretty useful but at two and a half hours it's probably only worth listening to once you've done the studying using the study guide / review above, and is pretty dependent on seeing what was on the board, anyway. So maybe not the best use of your studying time, but it's here if you want to fall asleep to it the night before the exam.

    NOTE ON TERMINOLOGY AND READING
  • For clarification, a completely between-subjects design is sometimes referred to as a "Completely Randomized" design when observations in each cell are all from different participants, randomly sampled from the population and randomly assigned to conditions. Of course, some designs are between-subjects, but do not use random assignment, e.g., in the case of quasi-experiments where gender is a factor. So "Between-Subjects" design is probably the preferable general term. At any rate, the opposite of "Between Subjects" (or of "Completely Randomized") is "Within Subjects" or "Repeated Measures" design. In BOTH Between and Within designs, we are usually dealing with FIXED effects -- not RANDOM effects. So don't misinterpret the phrase "Completely Randomized" as having any implications about whether you're using fixed or random EFFECTS.
  • Beginning around the halfway point in the text, Keppel and Wickens devote much space to detailed analyses of particular cases that are just as easily considered as parts of a general approach, and while the detail may serve you well when consulting the text as a handbook in the future, it's not that useful at the introductory level (note the last paragraph on p. 464). Case in point: there are two full chapters on three-way designs, but aside from the concept of the three-way interaction and how to read the three-way graphs, it's essentially a generalization of the two-way analyses already covered (see p. 507).
  • I recommend that in those later portions of the text you skim over the parts that describe computations: e.g., SS's using bracket terms, contrasts using Ψ's with complicated subscripts, standard errors of t's used to evaluate them. It's certainly preferable that you understand the computations and formalisms, it's just that we'll emphasize how you can combine various SPSS results to achieve the same result. But DO note the many conceptual points and useful recommendations that are offered throughout all the chapters. If you make this distinction successfully, you'll find there are many fewer pages you really need to attend to.
  • Note the error in the last full paragraph of p. 309 (on heterogeneity of variance with unequal sample sizes), where Keppel and Wickens write that "When the smaller groups are the ones with the larger variances, the tests are biased to give too many Type I errors, while when the larger groups have the smaller variances, the tests are biased to give too few Type I errors." First of all, this is a heads-I-win-tails-you-lose situation since clearly the two conditions described are the same: When the smaller groups are the ones with the larger variances, the larger groups MUST be the ones with the smaller variances. Ugh. And then you have to wonder if the silly phrase "too few errors" implies that we strive to make a certain number of errors. I'm pretty sure what they meant to say is that when larger groups have SMALLER variances, the weighted-averaged error variance MS_S/AB is biased toward being smaller than it should be, and F will be significant more often than it would be with an accurate larger error term, and thus Type I errors occur more than 5% of the time. When the larger groups have LARGER variances, the bias in computing the error term is toward a larger error MS_S/AB, which makes F less likely to be significant than it really should be -- which is not a case of making "too few Type I errors" (the rate is now less than 5% but really, the fewer the better), but of the complementary problem of making too many Type II errors (finding a non-significant F when the difference is really there). When they say "too few Type I errors" they really just mean α has effectively been lowered.
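    The direction of the bias described in the last note above comes from the pooled error term being an average of the group variances weighted by each group's df. A small sketch with two hypothetical groups shows the pull; the unweighted average is included only as a simple benchmark, not as the "correct" error term.

```python
def pooled_error_variance(ns, variances):
    """Within-groups mean square: group variances weighted by df = n - 1."""
    weighted_sum = sum((n - 1) * v for n, v in zip(ns, variances))
    return weighted_sum / sum(n - 1 for n in ns)


variances = [4.0, 16.0]             # hypothetical group variances
unweighted = sum(variances) / 2     # benchmark: 10.0

# Larger group (n = 21) has the SMALLER variance: error term pulled down
ms_large_small = pooled_error_variance([21, 6], variances)

# Larger group (n = 21) has the LARGER variance: error term pulled up
ms_large_large = pooled_error_variance([6, 21], variances)

print(ms_large_small, unweighted, ms_large_large)  # 6.4 10.0 13.6
```

    A smaller error term inflates F (more Type I errors than the nominal 5%), and a larger one deflates it (effectively a lower α and more Type II errors).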


    HOMEWORK ASSIGNMENTS: to be updated throughout the semester
    1. HW1 due Wednesday 9/5/18; SPSS formatted data available here
      • Comments: For those unfamiliar with some of the concepts described in this first homework, here are some explanatory comments that might help. They are completely optional; only look at them if any confusion or curiosity arises.
      • HW1 Google Form
    2. HW2 due Thursday 9/20/18; SPSS formatted data available here
      • Comments: Instead of clicking on OKAY when ready to run an analysis, remember you can click on PASTE to get the commands in a syntax window and run them from there. The advantage of that is you'll be able to simply copy and repaste the commands you just compiled to run the next analysis, and simply change the occurrences of, say, "1939" to "1970" as appropriate. Saves a lot of redundant clicking around if you're inclined to do that.
      • HW2 Google Form
    3. HW3 due Friday 9/28/18; SPSS formatted data available here
      • Comments: This homework is about t-tests -- both independent samples and paired samples -- and though we haven't covered the topic in lecture yet it's probably trivial to follow the instructions and complete it with just a little info coming this week in class. Friday 9/28 is a convenient due date since homeworks are being submitted electronically instead of in class; if that date changes due to lecture being too out-of-sync, I'll let you know.
      • HW3 Google Form
    4. HW4 due Friday 10/5/18
      • Comments: There is no SPSS component to this homework. Note that after the six homework questions, I've appended some extra questions listed as "THINGS THAT ARE NOT PART OF THIS ASSIGNMENT THAT YOU SHOULD THINK ABOUT ANYWAY". These are NOT part of the homework and should NOT be turned in, but may be helpful for you to work through in fully understanding the material.
      • HW4 Google Form
    5. HW5 due Friday 10/19/18 (after Wednesday's exam); SPSS formatted data available here
      • Comments:
        Here is some SPSS translation that you either understand already or don't need for this homework, but which may be helpful to know about in the long run:
      • Note that SPSS gives different names to your Sources of Variance in the output: A = "group" (your independent variable name), S/A = "error". As we'll soon see, the sum of those two gives a Total for both SS and df, and the Total is listed in the output not as just plain "Total", but as "Corrected Total"!
        • The way those labels work is something like this. The "corrected model" row refers to the total of all the factors present in your experiment. For now we have only one factor (A) so that IS the whole model, thus the rows for "corrected model" and "group" have the same information. Soon enough we will also have a second factor (B) and its interaction with the first (A*B), and then the "corrected total" will refer to the three of those effects summed together, and each will be listed separately in its own row in place of the sole factor we now have called "group".
        • In SPSS, the so-called "total" SS (which is NOT the Total we're interested in!) computes the SS around an origin of zero, rather than around the grand mean of all the scores, and its degrees of freedom is the total number of observations. The "corrected total" (the one we ARE interested in!) finds the SS around the grand mean, which is after all an estimate of the population mean, and you may remember that in estimating a parameter from the data we lose a degree of freedom. And indeed, the df for the "corrected total" is the number of observations minus 1. You may think only the "corrected total" makes any sense - who bothers finding the "sum of squared deviations from zero" instead of "... from the mean"? And I agree with you completely, but read on...
        • The "intercept" represents the grand mean of all the observations, i.e., your estimate of the population grand mean, and it will almost always be highly significant, and will always have df = 1: that's the 1 df that you lost above by estimating the population grand mean from your data. What significance test are you doing on it? You're testing whether it's different from 0! Who knows why. Read p. 37 of Keppel and Wickens, you'll see that it's not especially useful or interesting, it's just there for some reason. This seems nonsensical to me, but... It has SS = 135.809 because if your grand mean were, say, 2.1277 (which it is on this homework), its squared deviation from a hypothesized mean of 0 would be 2.1277 squared or 4.5270, and if you summed that number over all 30 of your observations, well it'd be the same for each of them - there's only one grand mean so the 2.1277 and the 0 are the same for everyone. And 4.5270 x 30 = 135.81. Voilà! - and no one cares. But there it is (which is pretty much what "voilà" means in the first place; those of you who think the word is "viola" are beyond help). Notice that if you add the "intercept" SS and df to the "corrected total" SS and df, you get what SPSS labels the "total" SS and df.
        • Bottom line: it's the "corrected total" you'll care about all semester, so ignore the "intercept" and the (uncorrected) "total".
      • Why do we call the within-groups variance (S/A) effect the "error"? That's because it's the denominator of the F ratio, representing the experimental error (individual differences, measurement error, etc) that is the variability present among subjects who have all received the same treatment but still differ from each other. In more complicated designs the "error" term will not always be S/A; in fact, we will use different error terms to test different effects within the same experiment. Fun stuff.
      • The vertical axis of your means plot is labeled "Estimated Marginal Means", which you should just read as saying "the means of the groups"!
      • The output column labeled "Type III Sums Of Squares" is indeed your SS for each effect; why it's called "Type III" is best saved for the spring semester course on regression, though I'll be happy to share before then if you like. Let it be said that Type I SS may be of interest, but you will rarely if ever encounter Type II and Type IV. Don't worry, they're still calculated the same, it's just which data they're calculated from that might differ. For now, don't even think about it at all, just recognize that Type III is what we do here.
      • The R-squared value is printed underneath the "tests of between subjects effects" output box, and there's also something called "adjusted R-squared". The latter is an estimate of the population value that you may safely ignore until you look at R-squared in multiple regression next semester, at which point all will become clear.
      • HW5 Google Form
    6. HW6 due Friday 10/26/18; SPSS formatted data available here AND here (you need BOTH HW6Af18.sav and HW6Bf18.sav!); the power analysis program GPower 3 is available here.
    7. HW7 due Friday 11/9/18; SPSS formatted data available here
    8. HW8 due Friday 11/16/18; SPSS formatted data available here
    9. HW9 due Friday 11/30/18; SPSS formatted data available here
    10. HW10 due Friday 12/7/18
      • Comments: see Recognizing Higher Order Interactions From Graphs And Means Tables.
        There is no data file because for once I'm letting you enter the data yourselves, because 1) there's very little of it and 2) welcome to real life. Except, in real life, you'd get an undergrad to do it. (And of course you'd make it a worthwhile overall educational experience for them.) Note three points about repeated measures designs on this homework: the additional new assumption for repeated measures designs called "sphericity"; Mauchly's test for sphericity which according to SPSS "tests the null hypothesis that the error covariance matrix of the orthonormalized transformed dependent variables is proportional to an identity matrix" (now you know!); and what to do in case of violations of sphericity (the Greenhouse-Geisser, Huynh-Feldt, and lower-bound corrections). Furthermore there are some post hoc tests to be done after the omnibus, but they're the easiest you've seen so far (equivalent to paired samples t-tests with a Bonferroni correction for alpha). OPTIONAL EXTRA CREDIT: The handout is worth looking at but is not required, because HW10's second question is only for extra credit, as described on the assignment. Don't stress over trying to figure out the three-way graphs if it's too annoying, but honestly, if you can assign some numbers for the cell means as the handout advises, you would find it a worthwhile exercise and possibly get 3 more points giving you a possible 13 points out of 10. Lucky!
      • HW10 Google Form
    11. OPTIONAL 5 POINT EXTRA CREDIT HW11 due FRIDAY 12/14/18; SPSS formatted data available here
      • Comments: see Finding Sources of Variance (PDF).
        This is worth 5 points extra credit if you do it. You won't do it because you care about the points, though -- you'll do it because it'll take you through how to analyze a mixed design (containing both between and within subjects factors), which will be helpful in your data analysis future. Because it's optional, the due date is not until the Friday after the final exam so concentrate on studying first. If you don't have time after the exam, no worries, but if you can fit it in it won't be incredibly hard, just useful.
      • Actual comment on this homework from a tenured professor of psychology (who, okay, is also a good friend since grad school), after she emailed me for advice on analyzing a mixed design and I sent her this homework link: "You perfect person, you! This is the best homework handout ever. I need to come take your stat classes for fun and refresher-ness." How do you not want to do it now?
      • HW11 Google Form
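    The HW5 comments above on how SPSS labels its sums of squares ("total", "intercept", "corrected total") can be verified with a few lines of Python. This is a sketch using six invented scores rather than the homework data; the only homework number used is the grand mean of 2.1277 with 30 observations quoted in those comments.

```python
def spss_style_sums_of_squares(scores):
    """Return (total, intercept, corrected_total) in SPSS's labeling."""
    n = len(scores)
    grand_mean = sum(scores) / n
    total = sum(x ** 2 for x in scores)                 # SS around zero, df = N
    intercept = n * grand_mean ** 2                     # grand mean vs. zero, df = 1
    corrected_total = sum((x - grand_mean) ** 2 for x in scores)  # SS around the mean, df = N - 1
    return total, intercept, corrected_total


scores = [1, 2, 3, 4, 5, 6]  # hypothetical data
total, intercept, corrected_total = spss_style_sums_of_squares(scores)
print(total, intercept, corrected_total)  # 91 73.5 17.5, and 73.5 + 17.5 = 91

# The intercept SS from the HW5 comments: N times the squared grand mean
print(round(30 * 2.1277 ** 2, 2))  # 135.81
```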


    NOTES AND RESOURCES

    Plato (and Greek Philosophy from origins to Aristotle): from Thomas Leahey's textbook on the history of psychology. Note Plato's emphasis on the abstract and universal as being part of an ideal realm that can only be comprehended by the mind (soul), not the senses. Then consider the quite abstract notion of "population" in statistics. It's also interesting to ponder our characterization of all observations as deviations from an ideal (represented by the "mean") that may not even ever actually be observed -- hence the assumption that individual differences represent "error." Statistics and psychology have some pronounced Platonist strains.

    Ten simple rules for structuring papers, which you might find to be sort of a useful guide to how to write a good research report. Not about APA style, but about what makes a good paper. Handy for your own reference and maybe for when you're trying to teach students how to do it when you're a TA, or later on, a graduate advisor.

    Some miscellaneous topics explained: I wrote this for my undergrad research methods class (so when it refers to exams, it means THEIRS, not OURS!) but it's consistent with things I say in the grad course and some of it is useful to see addressed explicitly.

    The Worst Example Of A Three Way Interaction Ever: This is a ludicrous example inspired by a ludicrous blog post and includes a ludicrous graph, and is still an accurate description of a three-way interaction in ANOVA.

    How do you pronounce "Likert scale"?: The definitive answer.

    Why the sample variance has a denominator of N-1 instead of N: a proof that dividing the sample sum of squares by N-1 instead of N gives an unbiased estimate (i.e. accurate in the long-run average) of the population variance. This is purely for the mathematically inclined -- others should steer clear. (Believe it or not, I've seen other proofs that are more complicated and thus probably more thorough.) The "expectation" operator notated as E(X) means roughly the long-run average of X or the mean of all X's in the population, but note that doesn't necessarily indicate a mean of some score -- X could be a variance for instance, and then E(X) would be the population value of that variance, as it is in this proof. If that helps clear anything up.
    Here is an alternative proof from a book on mathematical statistics. Other pages from the same book follow but are unrelated to this topic.
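
    The N-1 claim can also be checked by brute force rather than by proof. The sketch below is only an illustration, not a substitute for either proof: it assumes a tiny made-up population (the faces of one die) and sampling with replacement, enumerates every possible sample of size 3, and averages the two candidate variance estimates over all of them. All names in it are invented for this example.

```python
from fractions import Fraction
from itertools import product

def sum_of_squares(scores):
    """Sum of squared deviations of the scores from their own mean."""
    m = Fraction(sum(scores), len(scores))
    return sum((x - m) ** 2 for x in scores)

# Hypothetical population: the six faces of a die (an assumption for illustration).
population = [1, 2, 3, 4, 5, 6]
n = 3  # sample size

# Population variance: the population sum of squares divided by N.
pop_var = sum_of_squares(population) / len(population)

# Every equally likely ordered sample of size n, drawn with replacement.
samples = list(product(population, repeat=n))

# Long-run average (expectation) of each estimator over all possible samples.
mean_div_n_minus_1 = sum(sum_of_squares(s) / (n - 1) for s in samples) / len(samples)
mean_div_n = sum(sum_of_squares(s) / n for s in samples) / len(samples)

print("population variance:      ", pop_var)
print("average of SS/(n-1):      ", mean_div_n_minus_1)
print("average of SS/n (too low):", mean_div_n)
```

    Exact fractions are used so that the equality is not blurred by rounding; dividing by n instead of n-1 comes out too small by a factor of (n-1)/n.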

    Confidence Intervals in Howell ch. 7 pp. 181-183
    Notes on the meaning and interpretation of Confidence Intervals: Howell's discussion is very good, so the somewhat lengthy little essay that I've included here is more than I intended to write; still, it may be helpful to hear it expressed in more than one way.

    ANOVA vs Regression: this post describes some general situations in which you might prefer one to the other, even though they're equivalent expressions of the General Linear Model. Then there's a bit about why dichotomizing continuous variable scores (or grouping scores into more than two categories, as well) is not a great idea, and why it's better to treat continuous variables with regression-based techniques instead of trying to cram them into possibly more familiar ANOVA-type techniques.

    An illustration of the three types of kurtosis which I've also incorporated into an informative web page about everyone's favorite monotreme.
    Sadly it must be noted that the diagram is based on a traditional misunderstanding of what kurtosis looks like -- see the next link...

    What kurtosis actually means: To correct the diagram presented above, this paper gets the message across in its first lines: "The incorrect notion that kurtosis somehow measures 'peakedness' (flatness, pointiness or modality) of a distribution is remarkably persistent, despite attempts by statisticians to set the record straight. This article puts the notion to rest once and for all. Kurtosis tells you virtually nothing about the shape of the peak - its only unambiguous interpretation is in terms of tail extremity; i.e., either existing outliers (for the sample kurtosis) or propensity to produce outliers (for the kurtosis of a probability distribution)." That is, leptokurtic really only means more observations in the tails relative to the normal distribution, and platykurtic means fewer in the tails. It concludes decisively, "[K]urtosis should never be defined in terms of peakedness. To do so is counterproductive to the aim of fostering statistical literacy. The relationship of peakedness with kurtosis is now officially over." It's a short article but between those sentences there's probably way more than you want to know: Westfall, P. H. (2014). Kurtosis as Peakedness, 1905-2014. R.I.P. The American Statistician, 68(3), 191-195.

    Mean, Median, and Skew: Correcting a Textbook Rule: interesting to note that the cliché picture of how skewness draws the mean in the direction of the skew, and the median less so, is not always true, especially for discrete distributions. The point is probably made adequately if you just read the brief "Abstract" and "What To Teach" sections.

    Reliability is described adequately here in Wikipedia, as are several types of validity -- among them Internal, External, Construct, and Statistical Conclusion validity. See especially the respective threats to each, for aspects of research designs to pay special attention to.

    Distributions and distribution free tests: a brief listing of the most common sampling distributions, used to evaluate the probabilities of observing various values of a statistic calculated from a sample. When the assumptions of a statistical test aren't met, those distributions don't describe the probabilities of getting those values. Instead you should use a test that is free of any distribution assumptions. Hence the term "distribution-free" tests, or "non-parametric" in that they don't specify the parameters of the distribution ahead of time -- though those terms are not strictly synonymous. Some of the more popular distribution free tests are listed at the end as suggestions to try out, without further explanation.

    Odds and Probabilities: a primer on definitions, interpretations, and calculations

    How to talk about odds and odds ratios in English, duplicating the post from the chi-square section in the syllabus above
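
    For reference, the basic conversions can be sketched in a few lines. This is a minimal illustration using the standard definitions (odds = p / (1 - p)), not a reproduction of either linked post; the function names and example probabilities are made up.

```python
def odds_from_probability(p):
    """Odds in favor of an event whose probability is p (requires 0 <= p < 1)."""
    return p / (1 - p)

def probability_from_odds(odds):
    """Convert odds in favor back into a probability."""
    return odds / (1 + odds)

def odds_ratio(p_group1, p_group2):
    """How many times larger the odds are in group 1 than in group 2."""
    return odds_from_probability(p_group1) / odds_from_probability(p_group2)

# Hypothetical example: the event has probability .75 in one group and .50 in another.
print(odds_from_probability(0.75))   # odds of 3, i.e. "3 to 1"
print(probability_from_odds(3.0))    # back to .75
print(odds_ratio(0.75, 0.50))        # the odds are 3 times as large in group 1
```

    Note that an odds ratio of 3 here does not mean the event is 3 times as probable: the probabilities are .75 and .50, a ratio of only 1.5.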

    Exponents and logarithms: a primer on some basic mathematics that comes up in statistical contexts such as: logarithmic data transformations; loglinear models of categorical data with multiple IV's; the log(odds) transformation in logistic regression; the log likelihood (or "deviance" or "-2LL") in model comparison analyses like Structural Equation Modeling.
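
    A few of those uses can be sketched with Python's math module. This is a rough illustration under simple assumptions (natural logs throughout, and a -2LL computed for made-up binary outcomes with made-up predicted probabilities); the function names are invented here and do not come from the primer.

```python
import math

def log_odds(p):
    """The log(odds) or "logit" transformation used in logistic regression."""
    return math.log(p / (1 - p))

def inverse_log_odds(x):
    """Undo the logit: turn a log(odds) value back into a probability."""
    return 1 / (1 + math.exp(-x))

def minus_two_log_likelihood(outcomes, predicted_probs):
    """-2LL for 0/1 outcomes, given a model's predicted probability of a 1."""
    log_likelihood = sum(
        math.log(p if y == 1 else 1 - p)
        for y, p in zip(outcomes, predicted_probs)
    )
    return -2 * log_likelihood

# Logs turn multiplication into addition: log(2 * 8) = log(2) + log(8).
print(math.log(2 * 8), math.log(2) + math.log(8))

# A probability of .5 is odds of 1, which is a log(odds) of 0.
print(log_odds(0.5))

# Hypothetical data: two cases, each given a predicted probability of .5.
print(minus_two_log_likelihood([1, 0], [0.5, 0.5]))
```

    Smaller -2LL means the model's predicted probabilities fit the observed outcomes better, which is why differences in -2LL are used to compare models.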

    The Secretary Problem, or how to choose a spouse. In case you're interested in the underlying math or something, apart from the illustration of how mathematical assumptions determine the applicability of models.
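
    For the mathematically curious, here is a hedged sketch of the classic version of the problem. It assumes the usual textbook setup, which may differ in details from the linked page: n candidates arrive in random order, each can be ranked only against those already seen, a rejected candidate cannot be recalled, and only picking the single best counts as success. Under those assumptions the chance of success from skipping the first r candidates and then taking the first one who beats everyone before is (r/n) times the sum of 1/k for k from r to n-1.

```python
import math

def success_probability(n, r):
    """P(picking the single best of n) when the first r are passed over and
    the next candidate better than all earlier ones is chosen."""
    if r == 0:
        return 1 / n  # no one is skipped, so the first candidate is taken
    return (r / n) * sum(1 / k for k in range(r, n))

n = 100  # hypothetical number of candidates
best_r = max(range(n), key=lambda r: success_probability(n, r))
best_p = success_probability(n, best_r)

print("skip the first", best_r, "of", n)
print("chance of ending up with the best:", round(best_p, 4))
print("1/e for comparison:", round(1 / math.e, 4))
```

    Change any one of the assumptions (say, allow going back to a rejected candidate, or count second-best as good enough) and the 1/e rule no longer applies, which is the point about assumptions determining a model's applicability.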

    A diagram of a "quincunx", sometimes called a "Galton Board" after its inventor Francis Galton, which models the way multiple causation results in a normal distribution. It's a wooden board with pins inserted into it, and when a ball is dropped into the top it will bounce randomly either right or left at each pin it encounters. Most of the balls will bounce about an equal number of times in both directions, canceling out the left and right directions and landing in the middle. By chance, some of them will bounce to the left or the right more times, landing further from the middle. The end result is the accumulation of balls forming a normal distribution, which shows the decreasing likelihood of extreme patterns of bouncing (or of multiple causes all pushing the outcome in the same direction). Here's a video that shows a quincunx in action, where something more sand-like than ball-like is poured through the opening.
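
    A quincunx is easy to imitate in code. The following is a minimal simulation sketch, assuming each bounce is an independent 50/50 left-or-right event; the number of rows, number of balls, and random seed are arbitrary choices for illustration.

```python
import random
from math import comb

def drop_balls(n_balls, n_rows, seed=0):
    """Drop balls through n_rows of pins; return how many land in each bin.
    A ball's bin is simply the number of times it bounced right."""
    rng = random.Random(seed)
    bins = [0] * (n_rows + 1)
    for _ in range(n_balls):
        rights = sum(rng.random() < 0.5 for _ in range(n_rows))
        bins[rights] += 1
    return bins

n_rows = 12
n_balls = 5000
bins = drop_balls(n_balls, n_rows)

# Exact binomial probability of landing in each bin, for comparison.
expected = [comb(n_rows, k) / 2 ** n_rows for k in range(n_rows + 1)]

# Crude text histogram: one '#' per 25 balls, with the expected count alongside.
for k, count in enumerate(bins):
    print(f"{k:2d} {'#' * (count // 25):<50} expected about {expected[k] * n_balls:.0f}")
```

    The bin counts follow a binomial distribution, which approaches the normal curve as the number of rows grows.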

    The opening scene of Rosencrantz And Guildenstern Are Dead by Tom Stoppard, in which an unlikely extended run of coin flips gives rise to some existential angst. Note that even though each coin flip is perfectly in line with the "laws" of probability, we still don't quite believe this run of events should occur. (The play is a modern comedic take on two minor characters from Shakespeare's Hamlet who are unwittingly involved in a plot to kill Hamlet; this 1966 update focuses on their misadventures before their own eventual deaths.)

    Deriving the estimate of the standard error of the mean: something you don't need to be able to do at all but may be curious about, and if you are, it's explained clearly in section 10.17 of this text by Glass and Hopkins.

    Bayes's Theorem article in Wikipedia: I'm pretty sure it's legitimate to phrase the theorem this way: The probability of A being true given that B is true is equal to the probability that B actually does occur due to A, divided by the probability that B actually does occur due to any possible reason it might occur -- that is, that B occurs at all under any circumstances. This denominator is sometimes expressed as the sum of two other probabilities: that B occurs due to A, and that B occurs due to every reason other than A, which do in fact account for all occurrences of B since "A and not-A" pretty much covers every possible reason for B. You can substitute the observations of interest into this formula: A = a hypothesis being true, and B = data bearing on that hypothesis. Examples listed on this link are pretty illuminating, if you follow them closely. The trick with Bayesian statistics is coming up with those probabilities that are the ingredients in the formula, e.g., of B occurring due to any possible reason -- it's educated guesswork at best (which can be pretty good after all).
    Bayes's Theorem excerpt from Howell ch. 5: a very good basic treatment.
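
    The verbal version of the theorem above translates directly into arithmetic. This is a small sketch with made-up numbers: A is a hypothesis assumed to have a prior probability of .01, and B is data that turn up 90% of the time when A is true and 5% of the time when it is not. The function and argument names are invented for illustration.

```python
def posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """P(A | B) by Bayes's theorem.

    Numerator: the probability that B occurs due to A.
    Denominator: the probability that B occurs at all, which is
    B due to A plus B due to everything other than A."""
    b_due_to_a = p_b_given_a * prior_a
    b_due_to_not_a = p_b_given_not_a * (1 - prior_a)
    return b_due_to_a / (b_due_to_a + b_due_to_not_a)

# Hypothetical inputs (assumptions, not values from any of the linked examples).
p = posterior(prior_a=0.01, p_b_given_a=0.90, p_b_given_not_a=0.05)
print(round(p, 3))
```

    Even though the data are 18 times more likely under A than under not-A, the posterior stays modest (about .15) because A was so improbable to begin with.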

    Always Use Welch's t-test instead of the traditional Student's t-test: the rationale for this recommendation is explained in this blog post, and a published reference is cited too. "The idea that a two-step procedure (first performing Levene's test, then deciding which test statistic to report) should be replaced by unconditionally reporting Welch's t-test is generally accepted by statisticians, but the fool-hardy stick-to-what-you-know 'I'm not going to change if others are not changing' researcher community seems a little slow to catch on. But not you." Also included: the formula for the modified df in the Welch test, since you're interested.
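
    For the curious, Welch's t and its modified df can be computed by hand in a few lines. This sketch uses the standard Welch-Satterthwaite formula for the df, which I am assuming is the one the blog post presents; the data are invented, and in practice any statistical package will do this for you.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t and its adjusted df for two independent samples.
    Each group's variance is used separately, with no pooling."""
    va = variance(a) / len(a)  # squared standard error of group a's mean
    vb = variance(b) / len(b)  # squared standard error of group b's mean
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical data: equal n and equal variances, so df equals the usual n1 + n2 - 2.
print(welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]))

# Hypothetical data: very unequal variances, so df drops below n1 + n2 - 2 = 9.
print(welch_t([1, 2, 3, 4, 5], [0, 5, 10, 15, 20, 25]))
```

    When the groups have equal sizes and equal variances the result matches Student's t exactly, so nothing is lost by reporting Welch's version unconditionally.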

    Logic Of ANOVA summary

    Proof that the expected value of F isn't actually 1 but rather is df_error/(df_error - 2), with bonus proof that t² = F. This is good for a laugh, or else for some extended highly motivated effort.
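
    If the proof is more than you want, a numerical check may be enough. The sketch below computes a pooled-variance (Student's) t and a one-way ANOVA F by hand for two small made-up groups and confirms that squaring t gives F; it also evaluates the expected value of F under the null for a given error df, using the formula quoted above. Function names and data are invented for this illustration.

```python
from statistics import mean

def pooled_t(a, b):
    """Student's t for two independent groups, using the pooled error variance."""
    ss_a = sum((x - mean(a)) ** 2 for x in a)
    ss_b = sum((x - mean(b)) ** 2 for x in b)
    df_error = len(a) + len(b) - 2
    ms_error = (ss_a + ss_b) / df_error
    standard_error = (ms_error * (1 / len(a) + 1 / len(b))) ** 0.5
    return (mean(a) - mean(b)) / standard_error

def oneway_f(groups):
    """F ratio from a one-way between-subjects ANOVA."""
    scores = [x for g in groups for x in g]
    grand_mean = mean(scores)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

def expected_f_under_null(df_error):
    """Expected value of F when the null is true (defined only for df_error > 2)."""
    return df_error / (df_error - 2)

# Hypothetical scores for two groups of five.
group_a = [4, 6, 5, 7, 8]
group_b = [1, 3, 2, 4, 5]

t = pooled_t(group_a, group_b)
f = oneway_f([group_a, group_b])
print(t, t ** 2, f)
print(expected_f_under_null(18))  # a bit above 1, approaching 1 as df_error grows
```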

    Understanding ANOVA Visually: a fun bit of Flash animation; related teaching tools are listed at http://old.psych.utah.edu/learn/statsampler.html

    G*Power Home Page: free software for power calculations.

    Some interesting demonstration apps can be found at this site, but many of them may no longer work due to Java being (appropriately) shunned by browsers these days. Among them is this Statistical Power Applet, a visual demonstration of the relations among the various quantities related to power (α, β, N, and effect size), which does require Java to be enabled in the browser.

    Correlation article in Wikipedia: whether or not the math explained here is of interest (correlations as cosines, etc.), the two images depicting sets of scatterplots are very important to understand. One of the diagrams on this page shows some scatterplots and the correlation coefficients calculated from them, just to give you an idea of what typical correlations might look like, but also of how unpredictable they might be if you don't look at your data in a scatterplot. This point is made even more obvious by this diagram further down the same page known as Anscombe's quartet, which shows some very different sets of data that all give the exact same value for the correlation coefficient r (as well as for some other descriptive statistics). Along those lines, this game is also quite illuminating.
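
    The point about Anscombe's quartet can be verified directly. The sketch below uses the four data sets as published in Anscombe (1973) and computes Pearson's r from its definition; the helper function name is made up. It is only a numerical check and no substitute for looking at the scatterplots.

```python
def pearson_r(xs, ys):
    """Pearson correlation: the sum of cross-products of deviations divided by
    the square root of the product of the two sums of squares."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    return sp / (ssx * ssy) ** 0.5

# Anscombe's quartet: sets 1-3 share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

quartet = [(x123, y1), (x123, y2), (x123, y3), (x4, y4)]
rs = [pearson_r(x, y) for x, y in quartet]
for i, r in enumerate(rs, start=1):
    print(f"set {i}: r = {r:.3f}")
```

    All four values come out at about .816, even though one set is roughly linear, one is a smooth curve, and two are driven largely by a single point.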

    All the XKCD comics I always try to look up for a stats class (and then some)

    Analytic Contrasts summary

    Glossary of some statistical and experimental design terms: Many terms and concepts encountered in the latter part of the course are easily confusable, so I've tried to lay out their definitions and relationships here.

    Keppel's ANOVA notation system (PDF)
    Keppel's ANOVA notation system (Microsoft Word)
    This is a handy summary of how to compute Degrees of Freedom for any Source of Variance. Keppel and Wickens (2004) use an ANOVA notation system that provides a simple way to compute Sums Of Squares: by converting Sources of Variance into Degrees of Freedom, and then into a combination of "bracketed" quantities, where the brackets indicate some further adding and dividing. But since no one in their right mind computes Sums Of Squares by hand, the only remaining useful part of this page is the part describing how to get Degrees of Freedom. That is quite useful though.

    Recognizing Higher Order Interactions From Graphs And Means Tables

    Finding Sources of Variance (PDF)
    Finding Sources of Variance (Microsoft Word)

    Expected Mean Squares (PDF)
    Expected Mean Squares (Microsoft Word)

    Excel spreadsheet for calculating values of the z, t, F, and chi-square distributions and their probabilities

    Table of Selected Values of the t Distribution:

  • In the absence of SPSS, Excel (TDIST and TINV functions), or other relevant software, use this table to find the value of t that cuts off a certain percentage of the area under the curve, which corresponds to the probability of obtaining a t of that size or larger. Since t is symmetric it doesn't matter whether it's positive or negative (i.e., whether it's in the upper or lower tail); all that counts is the absolute value which represents the obtained score's distance from the null hypothesis value in units of estimated standard errors -- analogous to a z-score which uses KNOWN standard errors or standard deviations as its units. The many curves representing the t distribution differ depending on the degrees of freedom or df, with few df giving a curve that is flatter with longer tails than the standard normal distribution (or z distribution); with more and more df, the t distribution looks more and more like the z distribution. (Note that with infinite df, which means an infinite sample size, the values for t are identical to those you'd find in the z distribution.)
  • Read the row corresponding to the correct df: for analyzing means the df are n-1 for a single sample, and for a 2 sample means comparison the df are the sum of each sample's df (or N-2, where N is the total number of observations from both groups). In correlation and regression the df are the number of observations minus the number of predictors, minus 1 (or N-k-1). The commonly used proportions listed in this version of the table are conveniently identified by two different column headings, based on whether you want the proportion of interest to be located entirely in one tail, or split between the upper and lower tails. See the diagram accompanying the table to clarify this. ALWAYS use the two-tailed version, and thus the headings under "proportion in two tails combined" -- so the 1 df value for p=.05 is 12.706, not 6.314. (One-tailed tests of so-called "directional hypotheses" map p-values onto smaller required values of t, making it easier to declare results significant, but this procedure has always been controversial and I rarely see a situation that legitimately calls for it. How often is it really the case that one group's mean MUST be higher than the other's, and it's inconceivable that their sizes could be reversed?) As an example, the t value for the p<.01 cutoff for the difference between the means of two samples of size n=10 would be 2.878. The df would be (10-1) + (10-1) = 18, and the appropriate column would be the one under 0.01 as you read the "proportion in two tails combined" headings. If your obtained t is larger than 2.878 then it clearly cuts off an even smaller proportion of the area than .01, and thus you can say the t you obtained has p<.01. (Any statistical software will tell you precisely what the p-value for your t actually is.)
  • Note that if the particular df you're looking for don't appear in the table, you should use the next LOWER df -- do NOT round df UP even if that higher df value is closer to yours. Another table with more values included appears here, and many more are available on the web. Many of these, for instance this one, will give the complementary proportion of the area for values SMALLER than t, and will do so only for one tail -- thus to find the example value of 2.878 you'd have to look for 18 df and then the 99.5% cutoff value, because p=.01 corresponds to a total of 1% of the area being more extreme and you have to split that 1% into 0.5% in the upper tail and 0.5% in the lower.
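
    The bookkeeping described in the three points above (which df, which row, which column) can be sketched as follows. This is only an illustration of the rules: the list of df values printed in the table is a hypothetical stand-in, not the contents of any particular table, and the function names are invented.

```python
from bisect import bisect_right

def df_one_sample(n):
    return n - 1

def df_two_samples(n1, n2):
    return (n1 - 1) + (n2 - 1)

def df_regression(n_observations, n_predictors):
    return n_observations - n_predictors - 1

def table_row(df, table_dfs):
    """The row to read: the largest tabled df that does not exceed yours.
    Never round up, even if the higher tabled df is closer."""
    i = bisect_right(table_dfs, df)
    if i == 0:
        raise ValueError("df is smaller than every row in the table")
    return table_dfs[i - 1]

def one_tail_cumulative(p_two_tailed):
    """For tables giving the area BELOW t in one tail: the cutoff to look up."""
    return 1 - p_two_tailed / 2

# Hypothetical table layout: every df from 1 to 30, then 40, 60, and 120.
table_dfs = list(range(1, 31)) + [40, 60, 120]

print(df_two_samples(10, 10))        # 18, as in the worked example
print(table_row(18, table_dfs))      # 18 is tabled, so use it as is
print(table_row(45, table_dfs))      # 45 is not tabled, so drop to 40
print(one_tail_cumulative(0.01))     # look under the 99.5% column
```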

    Table of Selected Values of the F Distribution:

  • In the absence of SPSS, Excel, or other relevant software, use this table to find the value of F that cuts off a certain percentage of the area under the curve, which corresponds to the probability of obtaining an F of that size or larger. The F distribution has only one tail to consider, in the sense that the extreme values of interest are UPPER values only. The distribution's shape differs according to both the number of groups (or predictors) being analyzed, and the number of observations being made, and so picking out the relevant member of the family of F distributions requires two numbers specifying its df (one for the numerator df and one for the denominator df). Reproducing all the percentage cutoff points for the area under the curve (corresponding to the probabilities) for all possible combinations of these df would be very unwieldy. Thus only the most common cutoff values -- 5%, 10%, and 1% -- are included in this version of the table. They are organized such that the columns represent different numerator df up to 20 (appropriate for 21 group means in ANOVA, or 20 predictor variables in regression, which should be plenty), and the rows represent all values of the denominator df from 1 to 100.
  • Consulting the section of the table appropriate for the p-value you wish to examine, you find the row and column corresponding to your numerator and denominator df and the value at that entry is the upper "critical value": the value of F beyond which the given percentage of the area under the curve is cut off. For instance, the value for the p<.01 cutoff for the difference between the means of two samples of size n=10 would be 8.285. Familiarity with ANOVA df would make it apparent that the numerator df would be [number of groups] - 1 = 2-1 = 1, and the denominator df would be the sum of the df within each group, or (10-1) + (10-1) = 18. The entry in the p=.01 portion of the table under numerator df (called "ν1") and denominator df (called "ν2") is 8.285, meaning that for those df the area under the curve beyond the value of 8.285 on the horizontal axis is 1% of the total, and the probability of randomly sampling scores that lead to that high an F value when there is no difference between the population means is 1%. If your obtained F is larger than 8.285 then it clearly cuts off an even smaller proportion of the area than .01, and thus you can say the F you obtained has p<.01. (Any statistical software will tell you precisely what the p-value for your F actually is.)
  • For 2 groups, either F or t can be used to yield exactly the same probability; in comparing just two groups the numerator df will always be 1 and the denominator df will be the same as the df for t. F then is the square of t -- that is, within rounding error, 8.285 is the square of 2.878.
  • Note that if the particular numerator and/or denominator df you're looking for don't appear in the table, you should use the next LOWER df -- do NOT round df UP even if that higher df value is closer to yours. A printable pdf version of the F distribution table for p=.05 and p=.01 values with numerator df up to 10 and all denominator df up to 100 is here. More versions of tables for F and other distributions appear here and at various other easily located web sites. Many web pages such as this one will calculate a p-value for any given F and df, and others will calculate F given df and a p-value, etc. But if you have access to the internet, chances are you also have access to Excel which will do the same with its FDIST, FINV, TDIST, and TINV functions, etc., or SPSS which displays all p-values for its analyses automatically.
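
    As a quick check on the df arithmetic and on the F = t squared relationship described above, here is a minimal sketch. The critical values 2.878 and 8.285 are the ones quoted in the text for 18 df at p=.01; the function name is invented for illustration.

```python
def anova_df(group_sizes):
    """Numerator and denominator df for a one-way between-subjects ANOVA."""
    numerator = len(group_sizes) - 1                 # number of groups minus 1
    denominator = sum(n - 1 for n in group_sizes)    # sum of each group's df
    return numerator, denominator

# Two samples of size n = 10, as in the worked example.
print(anova_df([10, 10]))

# Critical values quoted in the text for p = .01 with 18 df.
t_critical = 2.878
f_critical = 8.285
print(t_critical ** 2, f_critical)  # equal within rounding error
```
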
    Supplemental readings in statistics and psychology:

  • Some useful papers:
  • Gravetter, F. J., & Wallnau, L. B. (2006). Statistics for the Behavioral Sciences (7th ed.). Belmont, CA: Wadsworth/Thomson: a very clear introductory level statistics text.
  • Howell, David C. (2007). Statistical Methods for Psychology (6th Ed.). Thomson-Wadsworth. (ISBN-10: 0495012874; ISBN-13: 9780495012870): an introductory text of exceptional clarity and accuracy, for the grad or advanced undergrad level.
  • Keith, Timothy Z. (2006). Multiple Regression and Beyond. Allyn & Bacon. ISBN: 0205326447 (ISBN-13: 9780205326440): used for STAT 379 Spring 2007/2008.
  • Grimm, Lawrence G. and Yarnold, Paul R., eds. (1994). Reading and Understanding Multivariate Statistics. APA. (ISBN: 1-55798-273-2; ISBN-13: 978-1-55798-273-5): used for STAT 379 Spring 2007/2008.
  • Grimm, Lawrence G. and Yarnold, Paul R., eds. (2000). Reading and Understanding MORE Multivariate Statistics. APA. (ISBN: 1-55798-698-3; ISBN 13: 978-1-55798-698-6): companion volume to the 1994 book.
  • Pedhazur, Elazar J. (1997). Multiple Regression in Behavioral Research (3rd Ed.) Thomson-Wadsworth. (ISBN-10: 0030728312; ISBN-13: 9780030728310): an advanced text and one of the best references on multiple regression and related procedures.
  • Keppel, Geoffrey & Wickens, Thomas D. (2004). Design and Analysis: A Researcher's Handbook, 4/E. Prentice Hall. ISBN-10: 0135159415 (ISBN-13: 9780135159415): used for STAT 242 Fall 2007.
  • Maxwell, S. E., & Delaney, H. D. (2004). Designing Experiments and Analyzing Data: A Model Comparison Perspective (2nd ed.). Mahwah, NJ: Erlbaum. (ISBN/ISSN: 0-8058-3718-3; ISBN13: 978-0-8058-3718-6): an advanced text on experimental design and ANOVA.


    Some important figures in the history of statistics:

  • Abraham De Moivre around 1730 derived the normal distribution as the limit of the binomial distribution when the number of binary decisions (e.g., coin tosses) is infinite.
  • Johann Carl Friedrich Gauss often gets credit for discovering the normal distribution because in 1809 he proved that it described errors of measurement (in astronomy, etc.), which is why the normal distribution is sometimes called the Gaussian distribution.
  • Adolphe Quetelet in 1835 first applied the normal distribution to biological and behavioral traits rather than merely to measurement error, describing the concept of "the average man"; he also invented the Quetelet Index which today we usually refer to as the Body Mass Index (BMI).
  • Francis Galton invented the concepts of correlation and regression around 1886. He also read and wrote at age 2-1/2, went ballooning and did experiments with electricity for fun, mapped previously unexplored African territories, taught soldiers camping procedures and how to deal with wild animals and "savages," tried to objectively determine which part of Britain had the most attractive women, studied the efficacy of prayer empirically, observed the amount of fidgeting at scientific lectures to measure the degree of boredom, invented fingerprinting and weather maps along with the meteorological terms "highs," "lows," and "fronts," coined the phrase "nature and nurture," and pioneered mental testing, twin studies of heritability, the composite photograph, the study of mental imagery, the free-association technique for probing unconscious thought processes, the psychological survey questionnaire, and... umm... eugenics. Oops.
  • Karl Pearson founded modern statistics beginning in the 1890's, inventing the chi-square distribution and test and coining the term "standard deviation" among others; he formalized the calculation of the correlation coefficient (where Galton had arrived at it graphically) and so that calculation bears his name today.
  • George Udny Yule worked on the concepts and mathematics of partial correlation and regression in the 1890's, making multiple regression as we know it possible.
  • William Sealy Gosset in 1908 worked out the sampling distribution of the mean (whose standard deviation is the "standard error" in modern terminology) for cases where the population standard deviation is unknown and has to be estimated from the sample -- hence he is the inventor of the t-test.
  • Ronald Fisher was a key figure in bridging the gap between the Darwinian theory of natural selection and its underlying mechanism of Mendelian genetics; from about 1915 onwards he also invented experimental design as we know it today, and developed Analysis Of Variance (ANOVA) as a generalization of Gosset's work to more than two groups (Snedecor in his influential early textbook named the 'F' statistic for Fisher).
  • Jerzy Neyman and Egon Pearson (son of Karl) invented and refined many of the concepts of null hypothesis significance testing in the 1930's (e.g. the alternative hypothesis, power, Type II error, confidence intervals), though Fisher had a constant ongoing argument with everything they did -- mainly because it wasn't the way HE did it.


    Demos from my two bands - kind of amateur-ish, but enthusiastic. In both bands I'm on piano, keyboards, and organ, and my brother is the drummer.


    If you're wondering about classes being canceled due to weather, see http://alert.uconn.edu/ or call 486-3768.