PSYC 5104 sec 001
Foundations of Research in the Psychological Sciences I, Fall 2018
UConn Storrs Campus, BOUS 160
MON WED 10:10-11:40
Eric Lundquist
Office: BOUS 136
Office Hours: Tue Thu 12:30-1:30 and by appointment
Phone: (860) 486-4084
E-mail: Eric.Lundquist@uconn.edu
TEACHING ASSISTANTS:
Andy Tucker
Office: BOUS 120 (or 362)
Office Hours: Friday 1:00-3:00
E-mail: Andrew.Tucker@uconn.edu
Liz Simmons
Office: BOUS 123
Office Hours: Wednesday 8:00-10:00
E-mail: Elizabeth.A.Simmons@uconn.edu
GRADING:
Weekly assignments: 30%
Midterm exam: 35% (WEDNESDAY OCTOBER 17)
Final exam: 35% (MONDAY DECEMBER 10, 10:00 AM)
TOPICS AND READINGS:
Experimental Design
KW Ch. 1 [basic issues and terminology]
Explanation of R² along with Sum of Squares, Variance, and Confidence Intervals
Summary of Techniques in the General Linear Model in HTML format, Microsoft Word format, and PDF format
PowerPoint slides on some introductory terminology and issues in experimental design
The "Replication Crisis" and related issues |
A list of issues and difficulties confronting psychological researchers lately, presented here informally and shrilly, which is consistent with how I present it in class.
A syllabus / reading list for a seminar titled Everything Is F***ed, only the asterisks are made more rude at this link. It's a good starting point for reading about all the headaches science in general and psychology in particular are currently up against. This was a VERY popular and frequently shared blog post from the moment it appeared in August 2016.
What has happened down here is the winds have changed: Andrew Gelman's incisive 2016 blog post recounting his impression of the developing problems. Point of information for the curious: the headings are lines from Randy Newman's song "Louisiana 1927" about the devastating Mississippi River flood that should have been foreseen and prepared for but somehow took everyone by surprise and became a huge disaster. (Especially poignant in 2017 as Houston lies underwater.)
When the Revolution Came for Amy Cuddy: a very good report from the New York Times on the turmoil that has developed within psychology as a result of reconsidering the field's research methods and standards.
Ioannidis Why Most Published Research Findings Are False.PDF
Simmons Nelson Simonsohn False Positive Psychology.pdf
Estimating the reproducibility of psychological science Nosek et al.pdf
A manifesto for reproducible science.pdf
ASA statement on p values.pdf
BASP Editorial Null Hypothesis Banning.pdf
Categorical data and Chi-Square
Howell Ch.6 [excellent presentation of Chi-Square and related topics]
Excel spreadsheet to calculate a 2x2 chi-square test of independence [including examples from the point of view of a dog and a medical researcher]
How to talk about odds and odds ratios in English
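If you'd rather check the spreadsheet's arithmetic in code, here's a minimal Python sketch of the same kind of 2x2 chi-square test of independence, on made-up counts (not the dog's or the medical researcher's data); scipy does the expected-frequency bookkeeping:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x2 contingency table: rows = treatment/control,
# columns = improved/not improved.
observed = np.array([[30, 10],
                     [20, 20]])

chi2, p, df, expected = chi2_contingency(observed, correction=False)
print(f"chi-square({df}) = {chi2:.3f}, p = {p:.4f}")
print("expected frequencies:\n", expected)

# The same statistic by the definitional formula: sum of (O-E)^2 / E.
chi2_by_hand = ((observed - expected) ** 2 / expected).sum()
print(f"by hand: {chi2_by_hand:.3f}")
```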
Data Description
KW Ch. 2 pp. 15-18, 24-25; Ch. 3 pp. 32-34; Ch. 7 pp. 144-145
[histogram, scatterplot; central tendency, dispersion, standardization; normality, skewness and kurtosis]
The t-test and confidence intervals
Howell Ch.7
[excellent treatment of the logic of the t-test, applied to the cases of a single sample mean, two related sample means, and two independent sample means; relation of t to z; confidence intervals described accurately on pp. 181-183]
KW Ch. 3 pp. 34-36, Ch. 8 pp. 159-161 and see my Notes on Confidence Intervals [references to Keith (2006) can be ignored, and the interpretation of confidence intervals for the regression coefficient "b" is the same as for the more familiar population mean "μ"]
Some examples of t-tests calculated in Excel. Double-clicking on formula cells will highlight the other cells used in the calculations. From the top, there are: a single sample t-test comparing sample mean gas mileage (mpg = miles-per-gallon) to the manufacturer's mileage claims; a single sample t-test using some simple numbers that can be verified by hand; a paired samples t-test using those numbers again and creating a single set of difference scores; and an independent samples t-test using those numbers again as scores from two different groups -- along with a calculation of r² directly from t and df, compared to the square of the value of r given by Excel when correlating DV scores with group membership. Each group's sd and standard error of the mean are calculated separately even though they aren't used in calculating t. For illustration purposes two different null hypothesis population means are used in each example, but note that for the paired samples and independent samples tests the only sensible value to hypothesize is 0, since we rarely have occasion to try to reject any other specific value.
When identifying the value of t that cuts off the extreme 5% of scores, different conventions can be used. I typically write t.05(4) = 2.776 and it's implicit that t can be positive or negative. Howell says if I'm gonna do that, I should add the "+/-" before the "2.776", which is fair, but I usually don't and it's understood (I hope). What Howell typically does is to call that same value t.025 instead of t.05, since it cuts off the extreme 2.5% of scores on the positive side. I find that confusing so I don't do it. Here's his statement about notation from his 8th edition, and here are examples of his usage from the 6th edition that I've posted from, and the 8th edition where his clarifying statement first appears.
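For anyone who prefers code to spreadsheets, here's a rough Python analogue of those Excel examples (not the spreadsheet itself; these numbers are made up), showing all three kinds of t-test plus r² computed directly from t and df:

```python
import numpy as np
from scipy import stats

g1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical scores, group/condition 1
g2 = np.array([2.0, 4.0, 4.0, 6.0, 9.0])   # hypothetical scores, group/condition 2

# Single-sample t-test against a hypothesized population mean of 0.
t1, p1 = stats.ttest_1samp(g1, popmean=0)

# Paired-samples t-test: equivalent to a one-sample test on difference scores.
t2, p2 = stats.ttest_rel(g1, g2)

# Independent-samples t-test, treating g1 and g2 as two separate groups.
t3, p3 = stats.ttest_ind(g1, g2)

# r^2 from t and df, as in the spreadsheet: r^2 = t^2 / (t^2 + df).
df = len(g1) + len(g2) - 2
r_squared = t3**2 / (t3**2 + df)
print(t1, t2, t3, r_squared)
```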
Null Hypothesis Significance Testing
Howell Ch.4
[excellent and up-to-date treatment of the logic and controversies of hypothesis testing, possibly more accessible than Cohen's (1994) paper]
KW Ch. 2 pp. 18-22; Ch. 3 pp. 46-48; Ch. 8 pp. 167-169
Cohen (1994) [criticism of Null Hypothesis Significance Testing]
Wilkinson and APA Task Force (1999) [recommendations for treatment of data in light of the NHST controversy]
For your curiosity and your future as a researcher, but not for your exam:
Howell Ch. 5 Excerpt on Bayes's Theorem [provides a brief accurate description of Bayes's Theorem]
Dienes (2011) [makes the case that Bayes's Theorem is what most people really believe is appropriate and want to use when analyzing data; link requires logging in with UConn NetID and password, then you should just download the pdf for convenience -- or, what the heck, read it right here]
Cohen (1990) [general advice about treatment of data]
Cowles & Davis (1982) [historical roots of the "p<.05" significance level]
Gigerenzer (1993) [examination of the NHST controversy by contrasting the incompatible original views of Fisher and Neyman & Pearson with the unsatisfying hybrid of their views that became the dominant method of data analysis]
Still Not Significant [blog post listing many, many delusional euphemisms for insisting that non-significant results are still sorta significant]
Between Subjects (Completely Randomized) Designs: One Factor
KW Ch. 2 & 3, Ch. 8 pp. 161-162
Logic Of ANOVA summary
More ANOVA information and examples
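As a supplement to the summary above, here's a minimal sketch of the ANOVA partitioning in Python, with made-up scores; the hand-computed F should match scipy's:

```python
import numpy as np
from scipy import stats

groups = [np.array([3., 5., 4.]),
          np.array([6., 7., 8.]),
          np.array([9., 11., 10.])]

grand_mean = np.concatenate(groups).mean()
ss_between = sum(len(g) * (g.mean() - grand_mean)**2 for g in groups)
ss_within = sum(((g - g.mean())**2).sum() for g in groups)

df_between = len(groups) - 1                  # a - 1
df_within = sum(len(g) - 1 for g in groups)   # a(n - 1)

F = (ss_between / df_between) / (ss_within / df_within)
print(F, stats.f_oneway(*groups))             # the two F's should agree
```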
Effect Size and Power
KW Ch. 8 pp. 163-167 (but not "Effect Sizes for Contrasts")
Book Review of The Cult of Statistical Significance from the journal Science, June 2008. This one-page article focuses on one consequence of the misplaced emphasis psychology places on null hypothesis significance testing: the neglect of effect sizes and their measurement.
Assumptions of ANOVA (and t-tests): The Linear Model
KW Ch. 7
Q-Q Plots described on pp. 75-79 of Howell 8th ed.: note that the axes are mistakenly reversed at the bottom of the first page (p. 75), and I've left a note correcting that so that it matches all the plots he shows. SPSS, however, actually does put the expected quantiles on the Y axis and the observed on the X axis. It's arbitrary, but obviously your text has to match your graphs!
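If you want to draw a Q-Q plot yourself, here's a short sketch using scipy and matplotlib on deliberately skewed fake data. Note that probplot happens to put the expected quantiles on the X axis and the observed scores on the Y axis -- the opposite of the SPSS convention just mentioned, which is fine as long as the description matches the graph:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(1)
x = rng.exponential(size=100)             # deliberately non-normal (skewed) data

stats.probplot(x, dist="norm", plot=plt)  # points bend away from the line
plt.show()
```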
MIDTERM REVIEW interim summary, updated for Fall 2016 and after
Some sample midterms from previous years' courses [Note: 1) PSYC 5104 was formerly STAT 3115, which was formerly STAT 242; 2) some of these pages are out of order; 3) topics have been covered in a different order, so many of these questions are not relevant to our exam, which should be apparent -- this is ESPECIALLY TRUE for Fall 2016 and later, where a lot more material about methodology and current issues has been included in the earlier part of the course, possibly rendering most of these exams irrelevant to the midterm... but here they are anyway, so be able to answer the parts you recognize]
RECORDING OF MONDAY 10/15/18 REVIEW SESSION: I did not have the time or energy to cut out the superfluous small talk at the beginning and end of the review session, so this recording hits the 2 hour mark. The actual content was pretty good, both in terms of what people asked about and how I explained it (I think!), and some of those explanations include points I emailed you all earlier in the day. Yet, as I've said in past semesters, listening to it is not necessarily the optimal use of your studying time given that you already have your notes and the midterm summary link. Your call; maybe you just finished a podcast and are looking for the next big thing. Special message at the end for anyone who makes it all the way through.
RECORDING OF MONDAY 10/16/17 REVIEW SESSION: After cutting out some superfluous small talk at the beginning of the review session, this recording ends up being 68 minutes. Listening to it is not necessarily the optimal use of your studying time given that you already have your notes and the midterm summary link, but there are some useful moments, I guess. The biggest drawback is the references to what's written on the board, which might be hard to visualize without having been there. Here's the midterm review session from Fall 2016 in case you're desperate to hear more of my voice, but even then I didn't recommend listening to it in lieu of actual studying (it's two hours long). For the first couple minutes I did look over my previous exam, just reading off topics that were covered on that very similar exam, but that's just what's on the Study Guide that you already have.
PRINTABLE TABLES AND FORMULAS:
PSYC 5104 midterm formulas.docx
Howell 8E z table.pdf
Howell 8E t table.pdf
Howell 8E F table.pdf
Howell 8E chi square table.pdf
Correlation
KW Ch. 15 pp. 312-314
r = covxy / (sx*sy), where covxy = SPxy / (N-1), and SPxy = Σ(X-Mx)(Y-My)
Note this point from the list of links below -- Correlation article in Wikipedia: whether or not the math explained here is of interest (correlations as cosines, etc.), the two images depicting sets of scatterplots are very important to understand. One of the diagrams on this page shows some scatterplots and the correlation coefficients calculated from them, just to give you an idea of what typical correlations might look like, but also of how unpredictable they might be if you don't look at your data in a scatterplot. This point is made even more obvious by this diagram further down the same page, known as Anscombe's quartet, which shows some very different sets of data that all give the exact same value for the correlation coefficient r (as well as for some other descriptive statistics). Along those lines, this game is also quite illuminating. As another way of testing your intuitions about correlations, play this Guess The Correlation game.
correlation calculation.xlsx demonstrating how the formula works
unequal n correlation.xls demonstrating that unbalanced factorial designs have correlated (i.e. non-orthogonal) independent variables, whose (Type III) SS therefore will not add up to the total SS.
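Here's that formula traced through step by step in Python on made-up scores, checked against numpy's built-in r:

```python
import numpy as np

x = np.array([1., 2., 3., 4., 5.])
y = np.array([2., 1., 4., 3., 5.])
n = len(x)

sp_xy = ((x - x.mean()) * (y - y.mean())).sum()   # SPxy = sum of cross-products
cov_xy = sp_xy / (n - 1)                          # covxy
r = cov_xy / (x.std(ddof=1) * y.std(ddof=1))      # r = covxy / (sx*sy)
print(r, np.corrcoef(x, y)[0, 1])                 # should match
```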
Analytical Comparisons Among Means (Single-df Contrasts)
KW Ch. 4 sec. 4.1 - 4.5
Analytic Contrasts summary
Controlling Type I Errors in Multiple Comparisons (Planned and Post-hoc)
KW Ch. 6
Trend Analysis
KW Ch. 4 sec. 4.6 - 4.7; Ch. 5
Between-Subjects (Completely Randomized) Designs: Two Factors
KW Ch. 10 & 11
Two Factor Design: Interactions and Main Effects: this summary describes how to recognize when main effects and interactions are present in the two-way factorial design, both in terms of plots of means and in terms of tables of means.
Keppel's ANOVA notation system (PDF): this is a handy summary of how to compute Degrees of Freedom for any Source of Variance. Keppel and Wickens (2004) use an ANOVA notation system that provides a simple way to compute Sums of Squares: by converting Sources of Variance into Degrees of Freedom, and then into a combination of "bracketed" quantities, where the brackets indicate some further adding and dividing. But since no one in their right mind computes Sums of Squares by hand, the only remaining useful part of this page is step 1, describing how to get Degrees of Freedom. That is quite useful, though. Here's a Microsoft Word version in case it's convenient for any reason.
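As a small illustration of that step 1, here are the degrees of freedom for a hypothetical two-factor between-subjects design, computed in Python (a levels of A, b levels of B, n subjects per cell):

```python
a, b, n = 3, 2, 10   # hypothetical design

df_A = a - 1
df_B = b - 1
df_AxB = (a - 1) * (b - 1)
df_SAB = a * b * (n - 1)        # S/AB: subjects within cells (error)
df_total = a * b * n - 1

assert df_A + df_B + df_AxB + df_SAB == df_total
print(df_A, df_B, df_AxB, df_SAB, df_total)
```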
Analyzing Interactions
KW Ch. 12 & 13
KW Ch. 14 pp. 303-307, 309-310: Nonorthogonality of the Effects, 14.3 Averaging of Groups and Individuals, and 14.5 Sensitivity to Assumptions (14.4 "Contrasts and Other Analytical Analyses" is optional, being a little heavy on notation for things you wouldn't really do by hand).
Excel example of interaction effects, three-way interaction, and removable vs. nonremovable interactions: this includes data from Dovidio and Gaertner (2000) as well as from Keppel and Wickens 4E pp. 208-209, though you could also play around with putting different numbers into the 2x2x2 or 3x2x2 interaction graphs.
Analysis of Covariance (ANCOVA)
KW Ch. 15 pp. 311-312 [Aside from the analogy to post-hoc blocking (see pp. 231-232), this chapter will be largely skipped in favor of a regression-based treatment of ANCOVA in the spring semester (PSYC 5105).]
Three Factors and Higher Order Factorial Designs: Between-Subjects Designs
KW Ch. 21 & 22
Recognizing Higher Order Interactions From Graphs And Means Tables
Repeated Measures (Within-Subjects) Designs: One Factor
KW Ch. 16 & 17
REPEATED MEASURES ANOVA notes: this summary is a companion to the "Logic of ANOVA Summary" above; it outlines the logic of the Sums of Squares calculations that we will not concern ourselves with in this class, though it may be useful to look at if you're confused about the concept behind such a calculation -- i.e., how the Within Groups Sum of Squares from the independent groups ANOVA is further partitioned into the part due to individual differences among the subjects and the part that is truly just experimental error. In Keppel and Wickens' terms, that experimental error is identified as the interaction between factor A and the subject, thus the "AS" term, whereas here it's simply referred to as "error". The "S" term is referred to as the "Between Subjects" factor. The number of treatment conditions in the Between Groups factor A is called "k" here instead of our familiar "a".
Expected Mean Squares (PDF): this topic isn't specific to any particular design, so it's being introduced at an arbitrary late point in the semester, even though implicitly it was already introduced with the description of the F ratio for the single factor ANOVA; here's a Microsoft Word version in case it's convenient for any reason.
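For the curious, here's a minimal sketch of a one-factor repeated measures ANOVA in Python using statsmodels' AnovaRM, on made-up long-format data (one row per subject-by-condition observation); the error term it uses is the A-by-subject interaction, KW's "AS":

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical scores: 4 subjects, each measured in 3 conditions.
data = pd.DataFrame({
    "subject":   [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "condition": ["a", "b", "c"] * 4,
    "score":     [3, 5, 7, 4, 6, 9, 2, 5, 6, 5, 7, 10],
})

res = AnovaRM(data, depvar="score", subject="subject", within=["condition"]).fit()
print(res)   # F for condition, tested against the condition-by-subject error term
```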
Repeated Measures (Within-Subjects) Designs: Two Factors
KW Ch. 18
Mixed Designs: One Between, One Repeated Factor
KW Ch. 19 & 20
Finding Sources of Variance (PDF): once you're dealing with combinations of different numbers of between and within factors, it's good to have a general scheme for identifying what the sources of variance are in a given design; here's a Microsoft Word version in case it's convenient for any reason.
Three Factors and Higher Order Factorial Designs: Repeated Measures and Mixed Designs
KW Ch. 23
Random and Nested Factors
KW Ch. 24 & 25 but read mainly pp. 530-534
FINAL REVIEW summary
Some sample final exams from previous years' courses [Note: 1) PSYC 5104 was formerly STAT 3115, which was formerly STAT 242; 2) some questions address topics we haven't covered, or have covered less thoroughly than these exams assume; you'll be able to determine which questions you can answer, and then use those for practice.]
Some questions from previous years' STAT 3115 midterms are relevant to the final exam material listed above (e.g. contrasts, post-hoc testing, etc.); see in particular: 2004#3, 2003#1(a-d), 2002#3, 2001#2&3&4(b, if you consider factorial designs), 2000#2(b&c)&3
RECORDING OF MONDAY 12/10/18 REVIEW SESSION: Monday's review session seemed pretty useful, but at two and a half hours it's probably only worth listening to once you've done the studying using the study guide / review above, and it's pretty dependent on seeing what was on the board anyway. So maybe not the best use of your studying time, but it's here if you want to fall asleep to it the night before the exam.
Ten simple rules for structuring papers, which you might find to be sort of a useful guide to how to write a good research report. Not about APA style, but about what makes a good paper. Handy for your own reference and maybe for when you're trying to teach students how to do it when you're a TA, or later on, a graduate advisor.
Some miscellaneous topics explained: I wrote this for my undergrad research methods class (so when it refers to exams, it means THEIRS, not OURS!) but it's consistent with things I say in the grad course and some of it is useful to see addressed explicitly.
The Worst Example Of A Three Way Interaction Ever: This is a ludicrous example inspired by a ludicrous blog post and includes a ludicrous graph, and is still an accurate description of a three-way interaction in ANOVA.
How do you pronounce "Likert scale"?: The definitive answer.
Why the sample variance has a denominator of N-1 instead of N: a proof that dividing the sample sum of squares by N-1 instead of N gives an unbiased estimate (i.e. accurate in the long-run average) of the population variance. This is purely for the mathematically inclined -- others should steer clear. (Believe it or not, I've seen other proofs that are more complicated and thus probably more thorough.)
The "expectation" operator notated as E(X) means roughly the long-run average of X or the mean of all X's in the population, but note that doesn't necessarily indicate a mean of some score -- X could be a variance for instance, and then E(X) would be the population value of that variance, as it is in this proof. If that helps clear anything up.
Here is an alternative proof from a book on mathematical statistics. Other pages from the same book follow but are unrelated to this topic.
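If proofs aren't your thing, a quick simulation makes the same point. This sketch (made-up normal population, just for convenience) shows SS/(N-1) landing on the true variance on average, while SS/N falls short by the factor (N-1)/N:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 4.0                     # true population variance
N, reps = 5, 200_000

samples = rng.normal(0, np.sqrt(sigma2), size=(reps, N))
ss = ((samples - samples.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)

print("SS/(N-1) averages to:", (ss / (N - 1)).mean())   # ~4.0 (unbiased)
print("SS/N averages to:    ", (ss / N).mean())         # ~3.2 = (N-1)/N * 4.0
```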
Confidence Intervals in Howell ch. 7 pp. 181-183
Notes on the meaning and interpretation of Confidence Intervals: Howell's discussion is very good, so the somewhat lengthy little essay that I've included here is more than I intended to write; still, it may be helpful to hear it expressed in more than one way.
ANOVA vs Regression: this post describes some general situations in which you might prefer one to the other, even though they're equivalent expressions of the General Linear Model. Then there's a bit about why dichotomizing continuous variable scores (or grouping scores into more than two categories, as well) is not a great idea, and why it's better to treat continuous variables with regression-based techniques instead of trying to cram them into possibly more familiar ANOVA-type techniques.
An illustration of the three types of kurtosis, which I've also incorporated into an informative web page about everyone's favorite monotreme. Sadly, it must be noted that the diagram is based on a traditional misunderstanding of what kurtosis looks like -- see the next link...
What kurtosis actually means: To correct the diagram presented above, this paper gets the message across in its first lines: "The incorrect notion that kurtosis somehow measures "peakedness" (flatness, pointiness or modality) of a distribution is remarkably persistent, despite attempts by statisticians to set the record straight. This article puts the notion to rest once and for all. Kurtosis tells you virtually nothing about the shape of the peak - its only unambiguous interpretation is in terms of tail extremity; i.e., either existing outliers (for the sample kurtosis) or propensity to produce outliers (for the kurtosis of a probability distribution)." That is, leptokurtic really only means more observations in the tails relative to the normal distribution, and platykurtic means less in the tails. It concludes decisively, "[K]urtosis should never be defined in terms of peakedness. To do so is counterproductive to the aim of fostering statistical literacy. The relationship of peakedness with kurtosis is now officially over." It's a short article but between those sentences there's probably way more than you want to know: Westfall, P. H. (2014). Kurtosis as Peakedness, 1905-2014. R.I.P. The American Statistician, 68(3), 191-195.
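A quick simulation bears this out: the heavy-tailed t distribution shows large positive excess kurtosis, and the tailless uniform shows negative, regardless of any "peakedness":

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 1_000_000

for name, x in [("normal",  rng.normal(size=n)),
                ("t(5)",    rng.standard_t(5, size=n)),    # heavy tails: leptokurtic
                ("uniform", rng.uniform(size=n))]:         # no tails: platykurtic
    # scipy's default (fisher=True) gives *excess* kurtosis: 0 for the normal.
    print(f"{name:8s} excess kurtosis = {stats.kurtosis(x):6.2f}")
```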
Mean, Median, and Skew: Correcting a Textbook Rule: interesting to note that the cliché picture of how skewness draws the mean in the direction of the skew, and the median less so, is not always true, especially for discrete distributions. The point is probably made adequately if you just read the brief "Abstract" and "What To Teach" sections.
Reliability is described adequately here in Wikipedia, as are several types of validity -- among them Internal, External, Construct, and Statistical Conclusion validity. See especially the respective threats to each, for aspects of research designs to pay special attention to.
Distributions and distribution free tests: a brief listing of the most common sampling distributions, used to evaluate the probabilities of observing various values of a statistic calculated from a sample. When the assumptions of a statistical test aren't met, those distributions don't describe the probabilities of getting those values. Instead you should use a test that is free of any distribution assumptions. Hence the term "distribution-free" tests, or "non-parametric" in that they don't specify the parameters of the distribution ahead of time -- though those terms are not strictly synonymous. Some of the more popular distribution free tests are listed at the end as suggestions to try out, without further explanation.
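As a small illustration, here's what two of the most common distribution-free tests look like in scipy, run on made-up scores:

```python
from scipy import stats

g1 = [12, 15, 9, 20, 14, 11]
g2 = [18, 22, 17, 25, 19, 16]

# Mann-Whitney U: rank-based analogue of the independent-samples t-test.
print(stats.mannwhitneyu(g1, g2))

# Wilcoxon signed-rank: analogue of the paired-samples t-test
# (here pretending the two lists are paired scores from the same people).
print(stats.wilcoxon(g1, g2))
```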
Odds and Probabilities: a primer on definitions, interpretations, and calculations
How to talk about odds and odds ratios in English, duplicating the post from the chi-square section in the syllabus above
Exponents and logarithms: a primer on some basic mathematics that comes up in statistical contexts such as: logarithmic data transformations; loglinear models of categorical data with multiple IV's; the log(odds) transformation in logistic regression; the log likelihood (or "deviance" or "-2LL") in model comparison analyses like Structural Equation Modeling.
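The arithmetic tying these last three links together is short enough to show directly; a sketch with made-up probabilities, converting probability to odds, comparing two groups with an odds ratio, and taking the log(odds):

```python
import math

p1, p2 = 0.75, 0.50                 # hypothetical probabilities of an outcome

odds1 = p1 / (1 - p1)               # 3.0: "three to one"
odds2 = p2 / (1 - p2)               # 1.0: "even odds"
odds_ratio = odds1 / odds2          # 3.0: odds are 3 times greater in group 1

logit1 = math.log(odds1)            # log(odds), the DV scale of logistic regression
back_to_p = odds1 / (1 + odds1)     # and back again: p = odds / (1 + odds)
print(odds1, odds2, odds_ratio, logit1, back_to_p)
```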
The Secretary Problem, or how to choose a spouse. In case you're interested in the underlying math or something, apart from the illustration of how mathematical assumptions determine the applicability of models.
A diagram of a "quincunx", sometimes called a "Galton Board" after its inventor Francis Galton, which models the way multiple causation results in a normal distribution. It's a wooden board with pins inserted into it, and when a ball is dropped into the top it will bounce randomly either right or left at each pin it encounters. Most of the balls will bounce about an equal number of times in both directions, canceling out the left and right directions and landing in the middle. By chance, some of them will bounce to the left or the right more times, landing further from the middle. The end result is the accumulation of balls forming a normal distribution, which shows the decreasing likelihood of extreme patterns of bouncing (or of multiple causes all pushing the outcome in the same direction). Here's a video that shows a quincunx in action, where something more sand-like than ball-like is poured through the opening.
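The quincunx is also only a few lines of simulation: each ball's final bin is just its number of rightward bounces, a binomial outcome that piles up into the familiar bell shape.

```python
import numpy as np

rng = np.random.default_rng(0)
pins, balls = 12, 100_000

# Each pin is a 50/50 left/right bounce; sum of rightward bounces = final bin.
bins = rng.integers(0, 2, size=(balls, pins)).sum(axis=1)   # 0..12
counts = np.bincount(bins, minlength=pins + 1)

for k, c in enumerate(counts):      # crude text histogram of the pile-up
    print(f"{k:2d} {'#' * (c // 500)}")
```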
The opening scene of Rosencrantz And Guildenstern Are Dead by Tom Stoppard, in which an unlikely extended run of coin flips gives rise to some existential angst. Note that even though each coin flip is perfectly in line with the "laws" of probability, we still don't quite believe this run of events should occur. (The play is a modern comedic take on two minor characters from Shakespeare's Hamlet who are unwittingly involved in a plot to kill Hamlet; this 1966 update focuses on their misadventures before their own eventual deaths.)
Deriving the estimate of the standard error of the mean: something you don't need to be able to do at all but may be curious about, and if you are, it's explained clearly in section 10.17 of this text by Glass and Hopkins.
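Short of the derivation, a simulation shows the result it arrives at: the standard deviation of many sample means comes out close to σ/√n (made-up population values):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, n, reps = 10.0, 25, 100_000

means = rng.normal(50, sigma, size=(reps, n)).mean(axis=1)
print(means.std(), sigma / np.sqrt(n))   # both ~2.0
```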
Bayes's Theorem article in Wikipedia: I'm pretty sure it's legitimate to phrase the theorem this way: The probability of A being true given that B is true is equal to the probability that B actually does occur due to A, divided by the probability that B actually does occur due to any possible reason it might occur -- that is, that B occurs at all under any circumstances. This denominator is sometimes expressed as the sum of two other probabilities: that B occurs due to A, and that B occurs due to every reason other than A, which do in fact account for all occurrences of B since "A and not-A" pretty much covers every possible reason for B. You can substitute the observations of interest into this formula: A = a hypothesis being true, and B = data bearing on that hypothesis. Examples listed on this link are pretty illuminating, if you follow them closely. The trick with Bayesian statistics is coming up with those probabilities that are the ingredients in the formula, e.g., of B occurring due to any possible reason -- it's educated guesswork at best (which can be pretty good after all).
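That verbal statement translates directly into arithmetic. A sketch with made-up input probabilities, where A = "the hypothesis is true" and B = "these data are observed":

```python
p_A = 0.10                # prior probability the hypothesis is true
p_B_given_A = 0.80        # probability of the data if it is true
p_B_given_notA = 0.20     # probability of the data if it is not

# Denominator: all the ways B can occur -- via A, and via everything other than A.
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)

p_A_given_B = p_B_given_A * p_A / p_B
print(p_A_given_B)        # 0.08 / 0.26, about 0.31
```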
Bayes's Theorem excerpt from Howell ch. 5: a very good basic treatment.
Always Use Welch's t-test instead of the traditional Student's t-test: the rationale for this recommendation is explained in this blog post, and a published reference is cited too. "The idea that a two-step procedure (first performing Levene's test, then deciding which test statistic to report) should be replaced by unconditionally reporting Welch's t-test is generally accepted by statisticians, but the fool-hardy stick-to-what-you-know 'I'm not going to change if others are not changing' researcher community seems a little slow to catch on. But not you." Also included: the formula for the modified df in the Welch test, since you're interested.
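In scipy (as in most packages) Welch's test is one flag away, and the modified df formula from that post is easy to verify; a sketch with made-up scores:

```python
import numpy as np
from scipy import stats

g1 = np.array([4., 5., 6., 5., 4., 6.])
g2 = np.array([1., 9., 2., 10., 3., 11., 2.])    # much more variable group

print(stats.ttest_ind(g1, g2, equal_var=False))  # Welch's t-test

# Welch-Satterthwaite df, computed from the formula.
v1, v2 = g1.var(ddof=1) / len(g1), g2.var(ddof=1) / len(g2)
df_welch = (v1 + v2) ** 2 / (v1**2 / (len(g1) - 1) + v2**2 / (len(g2) - 1))
print(df_welch)   # non-integer, smaller than n1 + n2 - 2
```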
Proof that the expected value of F isn't actually 1 but rather is dferror/(dferror-2), with bonus proof that t2=F. This is good for a laugh, or else for some extended highly motivated effort.
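If you'd rather simulate than derive, this sketch checks both claims: E(F) comes out near df_error/(df_error - 2) rather than 1, and a squared t behaves like an F with 1 numerator df:

```python
import numpy as np

rng = np.random.default_rng(0)
df1, df2 = 2, 10

f = rng.f(df1, df2, size=1_000_000)
print(f.mean(), df2 / (df2 - 2))   # both ~1.25, not 1

t = rng.standard_t(df2, size=1_000_000)
print((t**2).mean(), rng.f(1, df2, size=1_000_000).mean())   # t^2 ~ F(1, df2)
```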
Understanding ANOVA Visually: a fun bit of Flash animation; related teaching tools are listed at http://old.psych.utah.edu/learn/statsampler.html
G*Power Home Page: free software for power calculations.
Some interesting demonstration apps can be found at this site, but many of them may no longer work due to Java being (appropriately) shunned by browsers these days. Among them is this Statistical Power Applet, a visual demonstration of the relations among the various quantities related to power (α, β, N, and effect size), which does require Java to be enabled in the browser.
Correlation article in Wikipedia: whether or not the math explained here is of interest (correlations as cosines, etc.), the two images depicting sets of scatterplots are very important to understand. One of the diagrams on this page shows some scatterplots and the correlation coefficients calculated from them, just to give you an idea of what typical correlations might look like, but also of how unpredictable they might be if you don't look at your data in a scatterplot. This point is made even more obvious by this diagram further down the same page known as Anscombe's quartet, which shows some very different sets of data that all give the exact same value for the correlation coefficient r (as well as for some other descriptive statistics). Along those lines, this game is also quite illuminating.
All the XKCD comics I always try to look up for a stats class (and then some)
Glossary of some statistical and experimental design terms: Many terms and concepts encountered in the latter part of the course are easily confusable, so I've tried to lay out their definitions and relationships here.
Keppel's ANOVA notation system (PDF)
Keppel's ANOVA notation system (Microsoft Word)
This is a handy summary of how to compute Degrees of Freedom for any Source of Variance. Keppel and Wickens (2004) use an ANOVA notation system that provides a simple way to compute Sums Of Squares: by converting Sources of Variance into Degrees of Freedom, and then into a combination of "bracketed" quantities, where the brackets indicate some further adding and dividing. But since no one in their right mind computes Sums Of Squares by hand, the only remaining useful part of this page is the part describing how to get Degrees of Freedom. That is quite useful though.
Recognizing Higher Order Interactions From Graphs And Means Tables
Finding Sources of Variance (PDF)
Finding Sources of Variance (Microsoft Word)
Expected Mean Squares (PDF)
Expected Mean Squares (Microsoft Word)