Scatterplot Matrix of Family Food Expenditure, Income and Size
Motivation: Often it is not realistic to conclude that only one factor or independent variable (IV) influences the behavior of the dependent variable (DV). In such situations, a researcher needs to carefully identify the possible factors and explicitly include them in the Linear Regression Model (LRM). Both existing theory and common sense should form the basis for selecting the IVs, and where data on a theoretical variable are not readily available, a proxy should be chosen carefully. A graphical assessment of both the type and the structure of correlation among the variables can be made with the scatterplot matrix, a graphical device that consists of a scatterplot for each pair of variables in the model.
Problem Description and Data
The maintained hypothesis is essentially the same as in the Simple Regression & Correlation case, where annual family Income was considered the only determinant of annual family Food Expenditure. The influence of other factors, such as family Size, was assumed away as one of the ceteris paribus factors. That assumption is relaxed here by hypothesizing that family Size (X2) also has a positive influence on annual family Food Expenditure (Y), in addition to annual family Income (X1).
The Multiple Regression & Correlation Analysis attempts to measure and isolate the separate effects of X1 and X2 on Y, and to determine whether any relationship exists between X1 and X2 that might blur their separate effects on Y.
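Under the usual linear specification of the LRM (an assumption here; the coefficient symbols β0, β1, β2 and the error term ε are introduced only for illustration), the maintained hypothesis can be written as follows, with β1 and β2 being the separate effects of X1 and X2 on Y, each holding the other IV constant:

```latex
Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i, \qquad i = 1, \dots, 20

\beta_1 = \frac{\partial E(Y \mid X_1, X_2)}{\partial X_1}, \qquad
\beta_2 = \frac{\partial E(Y \mid X_1, X_2)}{\partial X_2}, \qquad
H_1:\ \beta_2 > 0
```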
The additional data on family size is as follows (Source: Hamburg et al., 1994, p. 507):
X2 (number in family): 3, 3, 2, 1, 4, 2, 3, 2, 1, 6, 3, 4, 1, 5, 3, 1, 6, 5, 2, 2
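As a minimal sketch, the family-size data listed above can be entered and summarized with Python's standard `statistics` module. The variable names are hypothetical; only the 20 values of X2 come from the source.

```python
# Family size X2 (number in family) for the 20 families, in the order listed
# (Hamburg et al., 1994, p. 507).
from statistics import mean, stdev

family_size = [3, 3, 2, 1, 4, 2, 3, 2, 1, 6, 3, 4, 1, 5, 3, 1, 6, 5, 2, 2]

n = len(family_size)
x2_mean = mean(family_size)
x2_sd = stdev(family_size)  # sample standard deviation (n - 1 divisor)

print(f"n = {n}, mean = {x2_mean:.2f}, s = {x2_sd:.3f}")
```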

The scatterplot matrix is an important graphical tool for screening the data to visually identify the following possibilities:
1. Type of relationship between the variables (a pair at a time) - Direct or Indirect
2. Form of relationship between the DV and the IVs - Linear or Nonlinear
3. Degree of relationship between any two variables - from Perfectly Strong and Direct (r = +1) to Perfectly Strong and Indirect (r = -1), with no linear relationship at all if r = 0
4. Presence of Outliers in the data set.
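Item 3 above describes the range of the correlation coefficient r for a pair of variables. A minimal sketch of the sample Pearson r follows; the function name `pearson_r` and the sample values are hypothetical and chosen only to show the two extremes, not taken from the food-expenditure data.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient r for two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    # Sum of cross-products and sums of squares about the means
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has zero variance")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical illustrative values, not the food-expenditure data
x = [1, 2, 3, 4]
print(pearson_r(x, [2, 4, 6, 8]))  # perfectly strong and direct
print(pearson_r(x, [8, 6, 4, 2]))  # perfectly strong and indirect
```

Each panel of a scatterplot matrix shows one such pair, so computing r for every pair puts a number on what the panels show visually.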
The above matrix suggests the following conclusions:
1. The relationship between annual family Food Expenditure and Size is Direct, Linear, and relatively Strong with possibly one OUTLIER.
2. The relationship between annual family Food Expenditure and Income is Direct, Linear, and relatively Strong with no apparent OUTLIER.
3. The relationship between family Size and annual Income is Direct, Linear, and Weak with one visible OUTLIER.
Thus some collinearity between the two IVs may be present in the regression, although a weak correlation between them suggests that it should not be severe.
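One standard way to gauge how much a correlation between X1 and X2 matters is the variance inflation factor (VIF), which is not part of the original discussion. With exactly two IVs, the VIF of each is 1 / (1 - r12^2), where r12 is their correlation. The sketch below uses hypothetical r12 values, not the correlation in this data set; the function name is also hypothetical.

```python
def vif_two_regressors(r12):
    """Variance inflation factor for each of two IVs, given their correlation r12.

    With exactly two IVs, VIF_1 = VIF_2 = 1 / (1 - r12**2).
    """
    if not -1.0 < r12 < 1.0:
        raise ValueError("VIF is undefined for perfect collinearity (|r12| = 1)")
    return 1.0 / (1.0 - r12 ** 2)


# Hypothetical correlations between X1 and X2, from none to near-perfect
for r in (0.0, 0.2, 0.5, 0.9, 0.99):
    print(f"r12 = {r:4.2f} -> VIF = {vif_two_regressors(r):7.2f}")
```

A weak r12 keeps the VIF close to 1, meaning little inflation of the coefficient variances. A commonly cited rule of thumb treats VIF values above about 10 as a sign of serious collinearity.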
Quantitative assessment of both the type and the structure of correlation among the
variables is the subject matter discussed under the multiple regression and correlation analysis.
Copyright© 1996, Ebenge Usip, all rights reserved.
Last revised: Sunday, November 01, 1998.