Overview of Techniques in the General Linear Model

 

Typically independent variables (IVs) are referred to as X and dependent variables (DVs) as Y, but the GLM doesn't require labeling variables as IVs and DVs, so Y might simply mean "on the left" and X "on the right" of the equation. A "variate" is a composite variable: a linear combination of the observed variables on one side of the equation (Xs or Ys); composites are formed separately for each side. The number of composite variables an analysis allows on each side equals the number of variables on whichever side has fewer, whether Xs or Ys. When either side represents groups, count its variables as the number of dummy variables needed to represent the groups (i.e., g - 1, where g is the number of groups). The term "multivariate" should then be reserved for analyses in which both sides of the equation have more than one variable; otherwise only a single composite is constructed and the analysis is "univariate". Practically speaking, this means the term "multivariate" usually refers to analyses involving multiple dependent (Y) variables. Sometimes "multivariate" is used casually to describe analyses that have only one Y and multiple Xs, as if it meant simply "multivariable"; the intended sense is understood from context. (Along these lines, the term "bivariate" refers to a relationship between just two variables. The term "covariate" does not refer to "variates" at all, but to "covariation.")
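
The counting rule above can be illustrated with a minimal Python sketch. The function names dummy_code and n_variates are hypothetical, chosen only for this illustration, and the choice of the first sorted level as the reference group is an assumption rather than something the rule requires.

```python
def dummy_code(labels):
    """Code g groups as g - 1 dummy (0/1) variables.

    Assumption for this sketch: the first level in sorted order is the
    reference group and gets all zeros.
    """
    levels = sorted(set(labels))
    return [[1 if lab == lev else 0 for lev in levels[1:]] for lab in labels]


def n_variates(n_x, n_y):
    """Number of composites per side: the smaller of the two variable counts."""
    return min(n_x, n_y)


groups = ["a", "b", "c", "a", "c"]   # g = 3 groups
coded = dummy_code(groups)           # each case gets g - 1 = 2 dummy variables
n_group_vars = len(coded[0])

# Three groups (2 dummies) with one Y: a single composite, hence "univariate".
print(n_variates(n_group_vars, 1))
# Three groups (2 dummies) with four Ys: two composites, hence "multivariate".
print(n_variates(n_group_vars, 4))
```

Under this rule a two-group design always yields a single composite, however many variables sit on the other side of the equation.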

 

Every technique in the General Linear Model is fundamentally about examining correlations between linear combinations of observed variables ("variates"). Techniques differ in the number of Xs and Ys allowed, and in whether they're continuous or discrete; note that a discrete variable with just two levels is "dichotomous" and can be represented by a single dummy variable. Each technique makes assumptions, some general to all techniques, some specific to its family of techniques, and some specific to the particular technique; such assumptions may vary in their restrictiveness and in the consequences of their violations.
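
As a small illustration of the "correlation between variates" idea, the sketch below fits a two-predictor regression by least squares and checks that the multiple R is simply the Pearson correlation between Y and the weighted composite of the Xs. The data values and helper names are invented for the example, and the 2 x 2 normal equations are solved by hand so that only the Python standard library is needed.

```python
import math


def pearson_r(a, b):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


# Made-up illustrative data: two Xs and one Y.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [1.5, 2.0, 4.5, 4.0, 7.5, 6.0]

n = len(y)
m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n

# Centered sums of squares and cross-products.
s11 = sum((a - m1) ** 2 for a in x1)
s22 = sum((b - m2) ** 2 for b in x2)
s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))

# Least-squares slopes from the 2 x 2 normal equations.
det = s11 * s22 - s12 ** 2
b1 = (s22 * s1y - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det
intercept = my - b1 * m1 - b2 * m2

# The X-side variate: a linear combination of the observed Xs.
variate = [b1 * a + b2 * b for a, b in zip(x1, x2)]

multiple_r = pearson_r(y, variate)

# R squared computed the usual way, from residual and total sums of squares.
sse = sum((c - intercept - v) ** 2 for c, v in zip(y, variate))
sst = sum((c - my) ** 2 for c in y)
r_squared = 1 - sse / sst

print(multiple_r ** 2, r_squared)  # the two values agree up to rounding error
```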

 

"[T]he GLM view forces researchers to understand that all analyses are correlational. Some designs are experimental, but all analyses are correlational..."

Thompson, B. (2000). Canonical Correlation [p. 298]. In L. G. Grimm and P. R. Yarnold (Eds.), Reading and Understanding MORE Multivariate Statistics. APA.

 

A.    Bivariate form: one X and one Y (where "bivariate" means two variables, not two variates)

         1.     Pearson product-moment correlation: X continuous, Y continuous

         2.     Point biserial correlation: X dichotomous, Y continuous (computed exactly as a Pearson correlation with X taking on only two values; "biserial" correlation assumes X is a continuous variable that has been dichotomized)

         3.     Independent samples t-test: X dichotomous, Y continuous (the significance test is identical to that of the point-biserial correlation)

         4.     Phi coefficient: X dichotomous, Y dichotomous (computed exactly as a Pearson correlation with X and Y each taking on only two values; "tetrachoric correlation" assumes X and Y are each continuous variables that have been dichotomized)

         5.     Simple regression: X continuous or categorical, Y continuous
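
The equivalences noted in items 2 through 4 can be checked numerically. The sketch below uses invented data and only the Python standard library; it assumes the pooled-variance (equal variances) form of the independent samples t-test, which is the form that matches the point-biserial correlation exactly.

```python
import math


def pearson_r(a, b):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


# --- Point-biserial correlation and the independent samples t-test ---
group = [0, 0, 0, 0, 1, 1, 1, 1, 1]                    # dichotomous X
score = [3.0, 5.0, 4.0, 6.0, 7.0, 9.0, 6.0, 8.0, 10.0]  # continuous Y
n = len(group)

# Point-biserial r is just Pearson r with X coded 0/1.
r_pb = pearson_r(group, score)
t_from_r = r_pb * math.sqrt((n - 2) / (1 - r_pb ** 2))

# Pooled-variance t statistic computed from the two group means.
y0 = [s for g, s in zip(group, score) if g == 0]
y1 = [s for g, s in zip(group, score) if g == 1]
m0, m1 = sum(y0) / len(y0), sum(y1) / len(y1)
ss0 = sum((v - m0) ** 2 for v in y0)
ss1 = sum((v - m1) ** 2 for v in y1)
pooled_var = (ss0 + ss1) / (n - 2)
t_test = (m1 - m0) / math.sqrt(pooled_var * (1 / len(y0) + 1 / len(y1)))

print(t_from_r, t_test)  # same t statistic on n - 2 degrees of freedom

# --- Phi coefficient ---
xb = [1, 1, 1, 0, 0, 0, 1, 0]
yb = [1, 1, 0, 0, 0, 1, 1, 0]
pairs = list(zip(xb, yb))
a = pairs.count((1, 1))
b = pairs.count((1, 0))
c = pairs.count((0, 1))
d = pairs.count((0, 0))
phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

print(phi, pearson_r(xb, yb))  # phi equals Pearson r on the 0/1 codes
```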

 

B.    Univariate form: more than one X (typically), one Y

         1.     Multiple regression: Xs continuous or categorical, Y continuous

         2.     ANOVA: all Xs discrete, Y continuous

         3.     ANCOVA: covariate controls are continuous Xs, groups are discrete Xs (dummy coded), no interaction between covariate and group, Y continuous

         4.     Two-group discriminant analysis: all Xs continuous, Y dichotomous (1 dummy variable); this is equivalent to a multiple regression analysis of Y on the Xs with Y taking on only two values

         5.     Multiway frequency analysis (or Log-linear modeling): all Xs discrete, Y is category frequency (2 levels, 1 dummy variable)

         6.     Two-group logistic regression analysis: Xs continuous and/or discrete, Y dichotomous

         7.     Multilevel modeling (or Hierarchical Linear Modeling): Xs at each level may be continuous or discrete, Ys at each level are continuous

         8.     Survival analysis: Xs continuous and/or dichotomous, Y continuous (time)

         9.     Time series analysis: Xs continuous (time) and dichotomous, Y continuous

         10.  A "multivariate analysis" such as MANOVA with only one X (a dummy variable representing 2 groups) and more than one Y is technically univariate, since only one composite can be constructed. It is equivalent to a multiple regression analysis of X on the Ys, with X taking on only two values; this is the reverse of the two-group discriminant analysis it implies, which is also included under univariate techniques (item 4 above)
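
The equivalence between two-group discriminant analysis and multiple regression on a dummy variable (items 4 and 10) can be sketched as follows. This is an illustration under stated assumptions, not a full discriminant analysis: it uses invented data with two continuous variables, takes Fisher's linear discriminant weights as the discriminant function, and solves the 2 x 2 systems by hand with hypothetical helper names.

```python
def sscp(rows, center):
    """Sums of squares and cross-products of 2-variable rows around a center."""
    s11 = sum((r[0] - center[0]) ** 2 for r in rows)
    s22 = sum((r[1] - center[1]) ** 2 for r in rows)
    s12 = sum((r[0] - center[0]) * (r[1] - center[1]) for r in rows)
    return s11, s12, s22


def solve2(s11, s12, s22, c1, c2):
    """Solve the symmetric 2 x 2 system [[s11, s12], [s12, s22]] w = [c1, c2]."""
    det = s11 * s22 - s12 ** 2
    return ((s22 * c1 - s12 * c2) / det, (s11 * c2 - s12 * c1) / det)


def centroid(rows):
    """Mean of each of the two variables."""
    n = len(rows)
    return (sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n)


# Made-up data: two continuous variables measured in two groups.
g0 = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (2.0, 3.0)]
g1 = [(4.0, 3.0), (5.0, 6.0), (6.0, 4.0), (5.0, 5.0)]
rows = g0 + g1
dummy = [0] * len(g0) + [1] * len(g1)   # group membership as a dummy variable

# Fisher discriminant weights: pooled within-group SSCP inverse times the
# difference between the group centroids.
c0, c1 = centroid(g0), centroid(g1)
w0 = sscp(g0, c0)
w1 = sscp(g1, c1)
within = tuple(p + q for p, q in zip(w0, w1))
fisher = solve2(*within, c1[0] - c0[0], c1[1] - c0[1])

# Regression of the dummy variable on the two continuous variables.
grand = centroid(rows)
dbar = sum(dummy) / len(dummy)
total = sscp(rows, grand)
cross1 = sum((r[0] - grand[0]) * (v - dbar) for r, v in zip(rows, dummy))
cross2 = sum((r[1] - grand[1]) * (v - dbar) for r, v in zip(rows, dummy))
slopes = solve2(*total, cross1, cross2)

# The two weight vectors are proportional: they define the same composite.
print(slopes[0] / fisher[0], slopes[1] / fisher[1])
```

The two printed ratios match, which shows that the regression slopes are a rescaling of the discriminant weights, so both analyses build the same single variate.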

 

C.    Multivariate form: more than one X, more than one Y

         1.     Canonical correlation: all Xs continuous or categorical, all Ys continuous or categorical

         2.     MANOVA (Multivariate ANOVA): all Xs discrete, all Ys continuous (like ANOVA but with multiple DVs)

         3.     MANCOVA: covariate controls are continuous Xs, groups are discrete Xs (dummy coded), no interaction between covariate and group, all Ys continuous (like ANCOVA but with multiple DVs)

         4.     Profile analysis version of MANOVA: all Xs discrete, all Ys continuous and commensurate (i.e., measured on the same scale)

         5.     Discriminant analysis: all Xs continuous, all Ys discrete (the reverse of MANOVA with the labels X and Y implicitly switched, but mathematically identical)

         6.     Factor analysis (FA)/principal components analysis (PCA): all Ys continuous, all Xs continuous and latent (i.e., unobserved but estimated from the observed Y variables)

         7.     Structural Equation Modeling: Xs continuous and/or latent, Ys continuous and/or latent (called "path analysis" when all Xs and Ys are observed rather than latent)

         8.     Multiway frequency analysis (or Log-linear modeling): all Xs discrete, Y is category frequency (3 or more levels, 2 or more dummy variables)

         9.     Polychotomous (or polytomous) logistic regression analysis: Xs continuous and/or discrete, Y discrete (Ordinal logistic regression when Y is ordinal)

 

Table modified from Table 17.1, p. 915 of Tabachnick, B. G., and Fidell, L. S. (2007). Using Multivariate Statistics (5th ed.). Boston: Pearson.