Chapter 8

Correlation:
Understanding Bivariate Relationships Between Continuous Variables

8.1 Introduction to the Pearson Correlation Coefficient: r

In Chapter 7 we demonstrated how to use the Crosstabs procedure to examine the relationship between pairs of categorical variables. As part of this procedure, we also discussed how we could use the statistical measure of association, Chi square, to determine whether or not a relationship between two categorical variables is statistically significant.

Recall from Chapter 5 that we created these categorical variables by transforming variables that were originally continuous. For example, scores on the continuous variable, masctot, range from 5 to 35. These scores were transformed into a categorical variable, masc, with only two levels (1 = low masculinity; 2 = high masculinity). While such transformations permitted the construction of contingency tables for crosstabulation, there are some disadvantages to converting continuous data to categorical data.

Such transformations result in a loss of detail and precision that might affect measures of association between variables. Statistically, the reduction in variability in the transformed categorical variables can decrease our ability to assess relationships. There are other ways to evaluate the relationship between continuous variables, and one such procedure involves the calculation of the Pearson correlation coefficient, known as r.
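To make the cost of such a transformation concrete, the following is a minimal Python sketch, not part of the SPSS procedures used in this book. It assumes simulated (not real) data: the name masctot is borrowed from the example above, while outcome is a hypothetical second variable invented for illustration, and the cutoff at the sample median is also an assumption. The sketch computes r once with the full 5 to 35 scores and once with the two-level version.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation computed from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


random.seed(8)  # fixed seed so the simulation is repeatable

# Simulated scores on a 5-35 scale, plus a hypothetical related variable.
masctot = [random.randint(5, 35) for _ in range(1000)]
outcome = [score + random.gauss(0, 8) for score in masctot]

# Dichotomize at the sample median (assumed cutoff): 1 = low, 2 = high.
median = sorted(masctot)[len(masctot) // 2]
masc = [1 if score < median else 2 for score in masctot]

r_continuous = pearson_r(masctot, outcome)
r_dichotomized = pearson_r(masc, outcome)
print(f"continuous scores:   r = {r_continuous:.2f}")
print(f"two-level variable:  r = {r_dichotomized:.2f}")
```

In this simulation the two-level variable yields the smaller coefficient, which illustrates how collapsing a continuous variable can weaken an observed association.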

In general, the Pearson correlation coefficient is a statistic used to determine the degree and direction of relatedness between two continuous variables. The possible values of the correlation coefficient range from -1.00 to +1.00, and the closer the number is to an absolute value of 1.00, the greater the degree of relatedness. As with Chi square, the Pearson correlation coefficient can be tested for statistical significance (using the conventional probability criterion of .05).
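As a rough illustration of what a statistics package computes, here is a minimal Python sketch (an assumption on our part; it is not the SPSS procedure itself). It calculates r from deviations about the means and then converts r to a t value with n - 2 degrees of freedom, which is one standard way of testing r against zero. The ten pairs of scores are made up, and the critical value of 2.306 is the two-tailed .05 entry for 8 degrees of freedom in a standard t table.

```python
import math


def pearson_r(x, y):
    """Pearson correlation computed from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t value for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Made-up scores for ten cases (illustration only).
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]

r = pearson_r(x, y)
t = t_statistic(r, len(x))

CRITICAL_T_DF8 = 2.306  # two-tailed .05 critical value for df = 8
significant = abs(t) > CRITICAL_T_DF8

print(f"r = {r:.2f}, t({len(x) - 2}) = {t:.2f}, significant at .05: {significant}")
```

A statistics package reports an exact probability rather than a comparison with a table value, but the underlying logic is the same.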

Two variables may be strongly related, as is the case for age and body weight before puberty (10-year-olds almost invariably weigh more than 5-year-olds). Or the two variables may be weakly related, as is the case for these same two variables, age and body weight, between ages 30 and 50 (45-year-olds are not reliably heavier than 35-year-olds).

Sometimes changes in one variable are associated with no predictable change in the second variable. For example, in adults height and intelligence are unrelated: tall people can be either smart or dull, and the same holds for short people. In such cases the correlation between the variables is at or near zero; that is, there is essentially no correlation.

The direction of the relationship, either positive or negative, is indicated by the sign (+ or -) of the correlation value (i.e., whether the coefficient is a positive or negative number). High school SAT scores and college GPAs are positively correlated: high values on one variable (SAT score) are associated with high values on the second (college GPA). When two variables are negatively correlated, an increase in the first is associated with a decrease in the second. For example, we typically find that the higher one's alcohol intake, the lower that person's motor coordination becomes.
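The sign of the coefficient can be seen in a small Python sketch. All values below are invented for illustration and are not real SAT, GPA, or alcohol data; the variable names are likewise hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation computed from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Invented values: higher SAT scores paired with higher GPAs.
sat = [900, 1000, 1100, 1200, 1300, 1400]
gpa = [2.4, 2.9, 2.8, 3.2, 3.5, 3.7]

# Invented values: more drinks paired with lower coordination scores.
drinks = [0, 1, 2, 3, 4, 5]
coordination = [9, 8, 8, 6, 5, 3]

r_positive = pearson_r(sat, gpa)
r_negative = pearson_r(drinks, coordination)

print(f"SAT and GPA:             r = {r_positive:+.2f}")
print(f"drinks and coordination: r = {r_negative:+.2f}")
```

The first coefficient comes out positive and the second negative, because the sign of r follows the sign of the summed cross-products of the deviation scores.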

The Pearson correlation coefficient (r) is used specifically to describe relationships when the variables to be correlated are continuous (measured on at least an interval scale). This procedure also assumes that the correlated variables are normally distributed, and that the relationship between the two variables approximates a linear one. Curvilinear relationships are best described with other correlational procedures, most of which are beyond the scope of this book.
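The importance of the linearity assumption can be shown with a short Python sketch using constructed values (an illustration under our own assumptions, not data from this book). One set of scores follows a straight line exactly; the other follows a symmetric U-shaped curve, so y is completely determined by x even though the relationship is not linear.

```python
import math


def pearson_r(x, y):
    """Pearson correlation computed from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = list(range(-5, 6))               # -5, -4, ..., 5

y_linear = [2 * v + 1 for v in x]    # exact straight-line relationship
y_curved = [v ** 2 for v in x]       # exact U-shaped relationship

r_linear = pearson_r(x, y_linear)
r_curved = pearson_r(x, y_curved)

print(f"linear relationship:      r = {r_linear:.2f}")
print(f"curvilinear relationship: r = {r_curved:.2f}")
```

For the straight line r equals 1, while for the U-shaped curve r equals 0 even though y can be predicted perfectly from x. A value of r near zero therefore rules out a linear relationship, not every kind of relationship.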