DEVELOPING OPTIMAL PREDICTION EQUATIONS IN MULTIVARIATE REGRESSION ANALYSIS
For Ŷ = f(X1, X2, X3)
where (1) the number of terms = p
(2) each term can incorporate one
or more transformed or untransformed
variables
$$\hat{Y} = a + b_1 X_1 + b_2 X_2 + b_3 X_3$$
or
$$\hat{Y} = a + b_1 X_1 + b_2 X_2 + b_3 X_3 + b_4 X_1^2 + b_5 X_2^2 + b_6 X_3^2 + b_7 X_1 X_2 + b_8 X_1 X_3 + b_9 X_2 X_3$$
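As an illustration of how the nine candidate terms of the second model can be formed from three untransformed variables, here is a minimal sketch. The function name `quadratic_terms` and the term labels are hypothetical, chosen only for this example.

```python
from itertools import combinations

def quadratic_terms(x1, x2, x3):
    """Return the nine candidate terms for one observation as a name -> value dict."""
    x = {"X1": x1, "X2": x2, "X3": x3}
    terms = dict(x)                          # linear terms (b1..b3)
    for name, value in x.items():            # squared terms (b4..b6)
        terms[f"{name}^2"] = value ** 2
    for (na, va), (nb, vb) in combinations(x.items(), 2):
        terms[f"{na}*{nb}"] = va * vb        # cross-product terms (b7..b9)
    return terms

terms = quadratic_terms(2.0, 3.0, 5.0)
for name, value in terms.items():
    print(name, value)
```

Each of the nine entries is then treated as one candidate term (p = 9 if all are kept) by the selection procedures that follow.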
[Figure: residual plot for the ordinary regression of Y vs X1, X2, …, Xp, with the residuals (Y-Ŷ) on the vertical axis and Ŷ on the horizontal axis]
1. Forward Selection where the model initially contains no terms
Step 1: find the first term to add to the model using ‘total’ correlations
i. calculate RL for all possible one-term models
ii. add the term with the highest RL to the model
iii. do an F-test (Type III) on the added term – if significant at, say, 95% it stays
in the model; otherwise the regression analysis is finished
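Step 1 can be sketched as follows, under the assumption that the "total" correlation is the ordinary (Pearson) correlation between each candidate term and Y. The data are made up for illustration, and the F statistic for a one-term model uses the standard identity F = r²(n-2)/(1-r²).

```python
from math import sqrt

def pearson_r(x, y):
    """Ordinary (total) correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Made-up data: three candidate terms and a response.
y = [1.0, 2.1, 2.9, 4.2, 5.0]
candidates = {
    "X1": [1, 2, 3, 4, 5],
    "X2": [2, 1, 4, 3, 5],
    "X3": [5, 3, 4, 1, 2],
}

r = {name: pearson_r(col, y) for name, col in candidates.items()}
best = max(r, key=lambda name: abs(r[name]))    # term with the highest |r|

# F-test for the single added term (1 and n-2 degrees of freedom).
n = len(y)
f_first = r[best] ** 2 * (n - 2) / (1 - r[best] ** 2)
print(best, round(r[best], 3), round(f_first, 1))
```

If `f_first` exceeds the chosen critical value, the term stays and the procedure moves on to Step 2.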
Step 2: find the second term to add to the model using ‘partial’ correlations and/or F tests
Approach One
i. calculate linear partial correlations between all remaining (outside) terms and
Y (i.e., with the effects of the one term already in the model removed)
ii. add the term with the highest partial correlation to the model
iii. do an F-test (Type III) on the added term – if significant it stays in the model;
otherwise the regression analysis is finished
$$F_{\text{last}} = \frac{SSR_{\text{diff}} / 1}{\sum (Y - \hat{Y})^2 / (n - p - 1)}$$ where p = no. of terms currently in model
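The Flast statistic can be computed by fitting the model with and without the term in question. The sketch below assumes NumPy is available and uses synthetic data; the helper names `sse` and `f_last` are hypothetical. SSRdiff is obtained as the drop in the residual sum of squares when the term is added, which equals the gain in the regression sum of squares.

```python
import numpy as np

def sse(X, y, cols):
    """Residual sum of squares for Y regressed on the given columns plus an intercept."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def f_last(X, y, inside, j):
    """F for term j, treated as the last term added to the other terms in `inside`."""
    others = [c for c in inside if c != j]
    full = others + [j]
    n, p = len(y), len(full)                   # p = terms in the model, including j
    sse_full = sse(X, y, full)
    ssr_diff = sse(X, y, others) - sse_full    # extra regression SS due to term j
    return (ssr_diff / 1) / (sse_full / (n - p - 1))

# Synthetic data (assumption): Y depends strongly on column 0, weakly on 1, not on 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = 1.0 + 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=40)

f_values = [f_last(X, y, [0, 1, 2], j) for j in range(3)]
print([round(f, 2) for f in f_values])
```

Each value is compared with the F-to-enter (or F-to-remove) threshold to decide whether the term is significant.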
Approach Two
i. do an F-test (Type III) on all outside terms, treating each as if it were the last
term entered
ii. add the term with the largest significant Flast
Note 1: default value for “F-to-enter” (i.e., Fcrit) is usually set at 4.0
(= approx. value of F for n-p-1 > 30 at 95% significance level)
Note 2: SAS reports a p-value (not to be confused with p, the number of terms) rather than ‘Fcrit’, where (1 - p-value)•100 is the percent significance level
Step 3 and beyond: Step 2 is repeated for the terms still outside the model until no more terms can be added to the model
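The whole Forward Selection loop, using Approach Two at every step, might look like the sketch below. It assumes NumPy, synthetic data, and an F-to-enter of 4.0; `sse` and `forward_select` are hypothetical names, not part of any package.

```python
import numpy as np

def sse(X, y, cols):
    """Residual sum of squares for Y regressed on the given columns plus an intercept."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def forward_select(X, y, f_enter=4.0):
    """Add terms one at a time; stop when no outside term has Flast above f_enter."""
    n, k = X.shape
    inside = []
    while len(inside) < k:
        sse_in = sse(X, y, inside)
        best_j, best_f = None, f_enter
        for j in (c for c in range(k) if c not in inside):
            cols = inside + [j]
            sse_new = sse(X, y, cols)
            f = (sse_in - sse_new) / (sse_new / (n - len(cols) - 1))
            if f > best_f:                      # largest significant Flast so far
                best_j, best_f = j, f
        if best_j is None:                      # nothing significant: finished
            break
        inside.append(best_j)
    return inside

# Synthetic data (assumption): only columns 0 and 3 actually drive Y.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 5))
y = 2.0 + 3.0 * X[:, 0] + 1.5 * X[:, 3] + rng.normal(scale=0.5, size=60)

selected = forward_select(X, y)
print(selected)
```

The returned list gives the column indices in the order they entered the model.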
Note: some programs also report an “adjusted R²” (R²adj), which can be used instead of F tests
$$R^2_{\text{adj}} = R^2 - \frac{p\,(1 - R^2)}{n - p - 1}$$
Whereas the ordinary R² continues to increase with each added term, R²adj starts to decrease when non-contributing terms are added.
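A short numerical check of the adjusted R² formula, with made-up values: adding a fourth term that raises R² only from 0.800 to 0.802 (n = 30) lowers the adjusted value. The function name `r2_adjusted` is hypothetical.

```python
def r2_adjusted(r2, n, p):
    """Adjusted R-squared for a model with p terms fitted to n observations."""
    return r2 - (p * (1.0 - r2)) / (n - p - 1)

three_terms = r2_adjusted(0.800, n=30, p=3)
four_terms = r2_adjusted(0.802, n=30, p=4)
print(round(three_terms, 4), round(four_terms, 4))

# The same quantity in its other common form, 1 - (1 - R2)(n - 1)/(n - p - 1).
alternative = 1.0 - (1.0 - 0.800) * (30 - 1) / (30 - 3 - 1)
print(round(alternative, 4))
```

The two forms are algebraically identical; the second makes it easier to see that the penalty grows as p approaches n.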
2. Backward Selection or Elimination where the model initially contains all the terms
Step 1: find the first term to remove from the model
i. calculate Flast statistics for all terms, treating each as if it were the last one
added to the model (Type III)
ii. remove the term with the smallest non-significant Flast
Step 2 and beyond: Step 1 is repeated for the terms remaining inside the model until no more terms can be removed from the model
Note 1: default value for “F-to-remove” (i.e., Fcrit) is usually set at 4.0
Note 2: SAS reports a p-value (not to be confused with p, the number of terms) rather than ‘Fcrit’, where (1 - p-value)•100 is the percent significance level
Note 3: Backward Selection is usually preferred over Forward Selection because it recognizes that two or more terms already in the model can together make a greater contribution than the sum of the individual contributions of the same terms taken one at a time as in Forward Selection
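Backward Elimination can be sketched in the same way, starting from the full model and dropping the weakest non-significant term each round. NumPy, the synthetic data, and the names `sse` and `backward_eliminate` are assumptions made for this example.

```python
import numpy as np

def sse(X, y, cols):
    """Residual sum of squares for Y regressed on the given columns plus an intercept."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def backward_eliminate(X, y, f_remove=4.0):
    """Start with all terms; repeatedly drop the term with the smallest Flast below f_remove."""
    n, k = X.shape
    inside = list(range(k))
    while inside:
        sse_full = sse(X, y, inside)
        mse = sse_full / (n - len(inside) - 1)
        # Flast for each term, treating it as the last one added to the others.
        f = {j: (sse(X, y, [c for c in inside if c != j]) - sse_full) / mse
             for j in inside}
        worst = min(f, key=f.get)
        if f[worst] >= f_remove:                # every remaining term is significant
            break
        inside.remove(worst)
    return inside

# Synthetic data (assumption): only columns 0 and 3 actually drive Y.
rng = np.random.default_rng(2)
X = rng.normal(size=(60, 5))
y = 2.0 + 3.0 * X[:, 0] + 1.5 * X[:, 3] + rng.normal(scale=0.5, size=60)

kept = backward_eliminate(X, y)
print(kept)
```

Because every Flast is computed with all other current terms present, terms that only contribute jointly are judged in each other's company, which is the advantage described in Note 3.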
3. ‘Stepwise’ Procedure which is a combination of Forward and Backward Selections
The model initially contains no terms.
Step 1: the term with the largest significant Flast (Type III) is entered into the model
Step 2: the term with the next largest significant Flast is entered
Step 3: for the terms already in the model, Flast is calculated for each one and the one with the smallest non-significant Flast is removed (i.e., returned to the pool of other terms outside the model)
Step 4 and beyond: Steps 2 and 3 are repeated until no more terms can be added or removed
Note 1: as the F-to-remove approaches zero, the procedure approaches pure Forward Selection
Note 2: the F-to-remove must be smaller than the F-to-enter, otherwise the same term will be continuously added and removed
Note 3: there is no clear basis for choosing optimal values of F-to-remove (~3) and F-to-enter (~4), and for this reason the Stepwise Procedure is unpopular with some data analysts
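A minimal sketch of the Stepwise Procedure follows, assuming NumPy, synthetic data, and the thresholds mentioned in the notes (F-to-enter = 4, F-to-remove = 3). The names `sse`, `f_last` and `stepwise` are hypothetical. For simplicity the removal check runs after every entry, including the first, and a `max_steps` guard is added as a safety limit.

```python
import numpy as np

def sse(X, y, cols):
    """Residual sum of squares for Y regressed on the given columns plus an intercept."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def f_last(X, y, inside, j):
    """F for term j, treated as the last term added to the other terms in `inside`."""
    others = [c for c in inside if c != j]
    sse_full = sse(X, y, others + [j])
    return (sse(X, y, others) - sse_full) / (sse_full / (len(y) - len(others) - 2))

def stepwise(X, y, f_enter=4.0, f_remove=3.0, max_steps=100):
    if f_remove >= f_enter:                     # see Note 2: avoids endless add/remove
        raise ValueError("F-to-remove must be smaller than F-to-enter")
    inside = []
    for _ in range(max_steps):
        changed = False
        outside = [j for j in range(X.shape[1]) if j not in inside]
        f_in = {j: f_last(X, y, inside + [j], j) for j in outside}
        if f_in:
            best = max(f_in, key=f_in.get)
            if f_in[best] > f_enter:            # enter the strongest significant term
                inside.append(best)
                changed = True
        f_out = {j: f_last(X, y, inside, j) for j in inside}
        if f_out:
            worst = min(f_out, key=f_out.get)
            if f_out[worst] < f_remove:         # return the weakest term to the pool
                inside.remove(worst)
                changed = True
        if not changed:
            break
    return inside

# Synthetic data (assumption): only columns 0 and 3 actually drive Y.
rng = np.random.default_rng(3)
X = rng.normal(size=(60, 5))
y = 2.0 + 3.0 * X[:, 0] + 1.5 * X[:, 3] + rng.normal(scale=0.5, size=60)

chosen = stepwise(X, y)
print(chosen)
```

Setting `f_remove` close to zero makes the removal branch inactive, so the function then behaves like pure Forward Selection, as Note 1 describes.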
4. All Possible Subsets or R² Procedure (graphical without significance testing)
Objective: identify the model with the fewest terms but highest ‘meaningful’ R²
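The enumeration behind this procedure can be sketched as below: every subset of terms is fitted and the best R² for each model size is kept, ready to be plotted against the number of terms. NumPy, the synthetic data, and the names `r_squared` and `best_subsets` are assumptions for illustration. The number of subsets grows as 2^k, so this is only practical for a modest number of candidate terms.

```python
from itertools import combinations
import numpy as np

def r_squared(X, y, cols):
    """Ordinary R-squared for Y regressed on the given columns plus an intercept."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    total = y - y.mean()
    return 1.0 - float(resid @ resid) / float(total @ total)

def best_subsets(X, y):
    """For each model size, return (best R-squared, tuple of column indices)."""
    k = X.shape[1]
    best = {}
    for size in range(1, k + 1):
        scored = [(r_squared(X, y, cols), cols)
                  for cols in combinations(range(k), size)]
        best[size] = max(scored)
    return best

# Synthetic data (assumption): only columns 0 and 3 actually drive Y.
rng = np.random.default_rng(4)
X = rng.normal(size=(60, 5))
y = 2.0 + 3.0 * X[:, 0] + 1.5 * X[:, 3] + rng.normal(scale=0.5, size=60)

best = best_subsets(X, y)
for size, (r2, cols) in best.items():
    print(size, cols, round(r2, 4))
```

In a plot of the best R² against model size, the point where the curve flattens suggests the model with the fewest terms but highest meaningful R².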