DEVELOPING OPTIMAL PREDICTION EQUATIONS IN MULTIVARIATE REGRESSION ANALYSIS

 

Defining the ‘Full’ Model

 

 

For Ŷ = f(X1, X2, X3)

 

where (1) the number of terms = p

 

(2) each term can incorporate one or more transformed or untransformed
      variables

 

$\hat{Y} = a + b_1X_1 + b_2X_2 + b_3X_3$

            or

$\hat{Y} = a + b_1X_1 + b_2X_2 + b_3X_3 + b_4X_1^2 + b_5X_2^2 + b_6X_3^2 + b_7X_1X_2 + b_8X_1X_3 + b_9X_2X_3$
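As a rough illustration of how the nine terms of the second (quadratic) model could be assembled from three variables, here is a minimal sketch. The function name `full_quadratic_terms`, the term labels, and the data values are all made up for this example.

```python
from itertools import combinations

def full_quadratic_terms(variables):
    """Return {term label: column} with linear, squared and cross-product terms."""
    terms = {}
    for name, col in variables.items():          # b1..b3: untransformed variables
        terms[name] = list(col)
    for name, col in variables.items():          # b4..b6: squared variables
        terms[f"{name}^2"] = [v * v for v in col]
    for (na, ca), (nb, cb) in combinations(variables.items(), 2):
        terms[f"{na}*{nb}"] = [u * v for u, v in zip(ca, cb)]   # b7..b9
    return terms

# Made-up data, three observations only, for illustration.
data = {"X1": [1.0, 2.0, 3.0], "X2": [0.5, 1.5, 2.5], "X3": [4.0, 3.0, 1.0]}
terms = full_quadratic_terms(data)
print(len(terms), list(terms))   # p and the term labels (the intercept a is not counted)
```

Each entry of `terms` is one candidate term for the selection procedures; the intercept is handled separately by whatever fitting routine is used.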

 

 

 

 

Ordinary Regression for Y vs X1, X2, …, Xp

 

 

 

Residual plot: (Y-Ŷ) on the vertical axis vs Ŷ on the horizontal axis
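The points for such a residual plot can be computed as in the sketch below. It assumes a small hand-written least-squares routine (`ols_coefficients`, a hypothetical helper that solves the normal equations) and made-up data; in practice a statistics package would do the fitting, and the plotting itself is left out.

```python
def ols_coefficients(columns, y):
    """Least-squares coefficients [a, b1, ..., bp] via the normal equations."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting (assumes no collinear terms).
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [u - f * v for u, v in zip(a[r], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]

# Made-up data for illustration only.
x1 = [float(i) for i in range(30)]
x2 = [float((7 * i) % 11) for i in range(30)]
y = [2 + 3 * u - 2 * v + ((37 * i) % 17 - 8) / 10
     for i, (u, v) in enumerate(zip(x1, x2))]

coef = ols_coefficients([x1, x2], y)
fitted = [coef[0] + coef[1] * u + coef[2] * v for u, v in zip(x1, x2)]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Points for the plot: fitted value (horizontal axis), residual (vertical axis).
points = list(zip(fitted, residuals))
```

For a model with an intercept the residuals sum to zero and are uncorrelated with the fitted values, so any visible trend or funnel shape in the plot points to a problem with the model rather than with the arithmetic.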

 

 

Stepwise Regression for Y vs X1, X2, …, Xp

 

1. Forward Selection where the model initially contains no terms

 

      Step 1: find the first term to add to the model using ‘total’ correlations

 

i.   calculate RL for all possible one-term models

                  ii.  add the term with the highest RL to the model

            iii. do an F-test (Type III) on the added term: if significant at, say, 95%, it stays in the model; otherwise the regression analysis is finished
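Step 1 can be sketched as follows. The sketch assumes that RL for a one-term model is the absolute value of the ordinary (total) correlation between the term and Y, and it uses the identity F = R²(n-2)/(1-R²) for a one-term model. The names `pearson_r` and `first_term` and the data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Ordinary ('total') linear correlation between two columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def first_term(terms, y, f_to_enter=4.0):
    """Pick the term with the highest |r|; keep it only if its F passes f_to_enter."""
    n = len(y)
    r = {name: pearson_r(col, y) for name, col in terms.items()}
    best = max(r, key=lambda name: abs(r[name]))
    f = r[best] ** 2 * (n - 2) / (1 - r[best] ** 2)   # 1 and n-2 degrees of freedom
    return (best, f) if f >= f_to_enter else (None, f)

# Made-up data for illustration only.
x1 = [float(i) for i in range(30)]
x2 = [float((7 * i) % 11) for i in range(30)]
y = [2 + 3 * u - 2 * v + ((37 * i) % 17 - 8) / 10
     for i, (u, v) in enumerate(zip(x1, x2))]

best, f = first_term({"X1": x1, "X2": x2}, y)
print(best, round(f, 1))
```

A return value of `None` for the term name corresponds to the case where the analysis is finished before any term enters.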

 

            Step 2: find the second term to add to the model using ‘partial’ correlations and/or F tests

 

                        Approach One

 

                        i.    calculate linear partial correlations between all remaining (outside) terms and Y (i.e., with the effects of the one term already in the model removed)

                        ii.   add the term with the highest partial correlation to the model

                        iii.  do an F-test (Type III) on the added term: if significant, it stays in the model; otherwise the regression analysis is finished

 

$F_{last} = \dfrac{SSR_{diff} / 1}{\sum (Y - \hat{Y})^2 / (n - p - 1)}$ where p = no. of terms currently in model
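Approach One can be illustrated with the standard first-order partial correlation formula and its link to Flast. The sketch below assumes that p counts the candidate term as already entered (so p = 2 at Step 2); the function names and the three correlations are invented for the example.

```python
import math

def partial_r(r_yk, r_yj, r_kj):
    """Partial correlation of Y and outside term k, with inside term j held fixed."""
    return (r_yk - r_yj * r_kj) / math.sqrt((1 - r_yj ** 2) * (1 - r_kj ** 2))

def f_last_from_partial(r_partial, n, p):
    """F_last (1 and n-p-1 df) for a term whose partial correlation with Y is r_partial."""
    return r_partial ** 2 * (n - p - 1) / (1 - r_partial ** 2)

# Hypothetical correlations: term 1 is in the model, term 2 is the candidate.
r_y1, r_y2, r_12 = 0.8, 0.6, 0.5
r_y2_1 = partial_r(r_y2, r_y1, r_12)
f_last = f_last_from_partial(r_y2_1, n=30, p=2)
print(round(r_y2_1, 4), round(f_last, 4))
```

With these made-up numbers the partial correlation is about 0.38 and Flast is about 4.7, so the candidate would just clear an F-to-enter of 4.0. Because Flast increases with the squared partial correlation, ranking outside terms by partial correlation or by Flast gives the same order.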

 

                  Approach Two

 

                  i.   do an F-test (Type III) on all outside terms, treating each as if it were the last term entered

                  ii.  add the term with the largest significant Flast

 

                  Note 1: default value for “F-to-enter” (i.e., Fcrit) is usually set at 4.0 (= approx. value of F for n-p-1 > 30 at 95% significance level)

 

Note 2: SAS reports ‘p’ rather than ‘Fcrit’ where (1-p)•100 is the percent significance level

 

Step 3 and beyond: Step 2 is repeated recursively for all terms outside the model until no more terms can be added to the model
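The whole Forward Selection loop (using Approach Two, i.e., Flast for every outside term) might look like the sketch below. `ols_sse` is a hypothetical helper that returns the residual sum of squares Σ(Y-Ŷ)² from a hand-written normal-equations solver; `forward_selection` and the data are likewise made up, and the fixed F-to-enter of 4.0 stands in for a proper table lookup.

```python
def ols_sse(columns, y):
    """Residual sum of squares of y regressed on an intercept plus `columns`."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan with partial pivoting (no collinear terms assumed)
        piv = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [u - f * v for u, v in zip(a[r], a[c])]
    coef = [a[i][k] / a[i][i] for i in range(k)]
    return sum((yi - sum(b * x for b, x in zip(coef, r))) ** 2
               for r, yi in zip(rows, y))

def forward_selection(terms, y, f_to_enter=4.0):
    """Add the outside term with the largest F_last until none reaches f_to_enter."""
    n = len(y)
    inside, outside = [], list(terms)
    while outside:
        sse_in = ols_sse([terms[t] for t in inside], y)
        p = len(inside) + 1          # terms in the model once a candidate is added
        f_last = {}
        for t in outside:
            sse_new = ols_sse([terms[u] for u in inside + [t]], y)
            f_last[t] = (sse_in - sse_new) / (sse_new / (n - p - 1))
        best = max(f_last, key=f_last.get)
        if f_last[best] < f_to_enter:
            break                    # no remaining term is significant
        inside.append(best)
        outside.remove(best)
    return inside

# Made-up data: Y depends on X1 and X2; X3 is unrelated to Y by construction.
x1 = [float(i) for i in range(30)]
x2 = [float((7 * i) % 11) for i in range(30)]
x3 = [float((i * i) % 13) for i in range(30)]
y = [2 + 3 * u - 2 * v + ((37 * i) % 17 - 8) / 10
     for i, (u, v) in enumerate(zip(x1, x2))]

selected = forward_selection({"X1": x1, "X2": x2, "X3": x3}, y)
print(selected)
```

Once a term has entered, this procedure never reconsiders it.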

 

Note: some programs also report an “adjusted R²” (R²adj), which can be used instead of F tests

            $R^2_{adj} = R^2 - \dfrac{p\,(1 - R^2)}{n - p - 1}$

Whereas the ordinary R² can only increase (or stay the same) with each added term, R²adj will start to decrease when non-contributing terms are added
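A small numerical check of the adjusted R², written as R² minus the penalty p(1-R²)/(n-p-1), which is algebraically the same as the common form 1-(1-R²)(n-1)/(n-p-1). The function name and the R² values are invented for illustration.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for a model with p terms (plus intercept) and n observations."""
    return r2 - p * (1 - r2) / (n - p - 1)

# Hypothetical case: a fourth term raises R-squared only from 0.800 to 0.801.
before = adjusted_r2(0.800, n=20, p=3)
after = adjusted_r2(0.801, n=20, p=4)
print(round(before, 4), round(after, 4))
```

Here the ordinary R² goes up slightly, but the adjusted value falls from about 0.7625 to about 0.7479, which is the signal that the fourth term is not contributing.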

 

2. Backward Selection or Elimination where the model initially contains all the terms

 

Step 1: find the first term to remove from the model

 

                        i.    calculate Flast statistics for all terms, treating each as if it were the last one added to the model (Type III)

                        ii.   remove the term with the smallest non-significant Flast

 

Step 2 and beyond: Step 1 is repeated recursively for all the remaining terms inside the model until no more terms can be removed from the model

 

                        Note 1: default value for “F-to-remove” (i.e., Fcrit) is usually set at 4.0

 

Note 2: SAS reports ‘p’ rather than ‘Fcrit’ where (1-p)•100 is the percent significance level

 

Note 3: Backward Selection is usually preferred over Forward Selection because it recognizes that two or more terms already in the model can together make a greater contribution than the sum of the individual contributions of the same terms taken one at a time as in Forward Selection
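Backward Selection can be sketched in the same style. As before, `ols_sse` is a hypothetical hand-written helper returning Σ(Y-Ŷ)², `backward_elimination` and the data are made up, and a fixed F-to-remove of 4.0 is assumed.

```python
def ols_sse(columns, y):
    """Residual sum of squares of y regressed on an intercept plus `columns`."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan with partial pivoting (no collinear terms assumed)
        piv = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [u - f * v for u, v in zip(a[r], a[c])]
    coef = [a[i][k] / a[i][i] for i in range(k)]
    return sum((yi - sum(b * x for b, x in zip(coef, r))) ** 2
               for r, yi in zip(rows, y))

def backward_elimination(terms, y, f_to_remove=4.0):
    """Start with all terms; drop the smallest non-significant F_last until none is left."""
    n = len(y)
    inside = list(terms)
    while inside:
        p = len(inside)
        sse_full = ols_sse([terms[t] for t in inside], y)
        mse = sse_full / (n - p - 1)
        f_last = {}
        for t in inside:             # treat each term as if it were entered last
            rest = [terms[u] for u in inside if u != t]
            f_last[t] = (ols_sse(rest, y) - sse_full) / mse
        worst = min(f_last, key=f_last.get)
        if f_last[worst] >= f_to_remove:
            break                    # every remaining term is significant
        inside.remove(worst)
    return inside

# Made-up data: Y depends on X1 and X2; X3 is unrelated to Y by construction.
x1 = [float(i) for i in range(30)]
x2 = [float((7 * i) % 11) for i in range(30)]
x3 = [float((i * i) % 13) for i in range(30)]
y = [2 + 3 * u - 2 * v + ((37 * i) % 17 - 8) / 10
     for i, (u, v) in enumerate(zip(x1, x2))]

kept = backward_elimination({"X1": x1, "X2": x2, "X3": x3}, y)
print(kept)
```

Because every Flast is computed with all the other terms still in the model, terms that only matter jointly are judged in each other's presence, which is the advantage described in Note 3.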

 

 3. ‘Stepwise’ Procedure which is a combination of Forward and Backward Selections

 

            The model initially contains no terms.

 

            Step 1: the term with the largest significant Flast (Type III) is entered into the model

 

            Step 2: the term with the next largest significant Flast is entered

 

Step 3: for the terms already in the model, Flast is calculated for each one and the one with the smallest non-significant Flast is removed (i.e., returned to the pool of other terms outside the model)

 

Step 4 and beyond: Steps 2 and 3 are repeated recursively until no more terms can be added or removed

 

Note 1: as the F-to-remove approaches zero, the procedure approaches pure Forward Selection

 

Note 2: the F-to-remove must be smaller than the F-to-enter, otherwise the same term will be continuously added and removed

 

Note 3: there is no agreed way to choose the best values for F-to-remove (~3) and F-to-enter (~4), and for this reason the Stepwise Procedure is unpopular with some data analysts
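A minimal sketch of the combined procedure follows, assuming F-to-enter = 4.0 and F-to-remove = 3.0 as in Note 3. `ols_sse`, `stepwise`, the `max_steps` safety limit, and the data are all hypothetical choices for this example.

```python
def ols_sse(columns, y):
    """Residual sum of squares of y regressed on an intercept plus `columns`."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan with partial pivoting (no collinear terms assumed)
        piv = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [u - f * v for u, v in zip(a[r], a[c])]
    coef = [a[i][k] / a[i][i] for i in range(k)]
    return sum((yi - sum(b * x for b, x in zip(coef, r))) ** 2
               for r, yi in zip(rows, y))

def stepwise(terms, y, f_to_enter=4.0, f_to_remove=3.0, max_steps=100):
    """Alternate one entry attempt and one removal attempt until nothing changes."""
    if f_to_remove >= f_to_enter:
        raise ValueError("F-to-remove must be smaller than F-to-enter")
    n = len(y)

    def f_last(model, t):
        """F for term t, treated as the last one entered into `model` (t is in model)."""
        sse_full = ols_sse([terms[u] for u in model], y)
        sse_reduced = ols_sse([terms[u] for u in model if u != t], y)
        return (sse_reduced - sse_full) / (sse_full / (n - len(model) - 1))

    inside = []
    for _ in range(max_steps):
        changed = False
        outside = [t for t in terms if t not in inside]
        if outside:                                   # entry attempt
            f_in = {t: f_last(inside + [t], t) for t in outside}
            best = max(f_in, key=f_in.get)
            if f_in[best] >= f_to_enter:
                inside.append(best)
                changed = True
        if len(inside) > 1:                           # removal attempt
            f_out = {t: f_last(inside, t) for t in inside}
            worst = min(f_out, key=f_out.get)
            if f_out[worst] < f_to_remove:
                inside.remove(worst)
                changed = True
        if not changed:
            break
    return inside

# Made-up data: Y depends on X1 and X2; X3 is unrelated to Y by construction.
x1 = [float(i) for i in range(30)]
x2 = [float((7 * i) % 11) for i in range(30)]
x3 = [float((i * i) % 13) for i in range(30)]
y = [2 + 3 * u - 2 * v + ((37 * i) % 17 - 8) / 10
     for i, (u, v) in enumerate(zip(x1, x2))]

model = stepwise({"X1": x1, "X2": x2, "X3": x3}, y)
print(model)
```

The `ValueError` check mirrors Note 2: with F-to-remove at or above F-to-enter, a term could be added and removed indefinitely.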

 

4. All Possible Subsets or R² Procedure (graphical without significance testing)

            Objective: identify the model with the fewest terms but highest ‘meaningful’ R²

            Step 1: calculate R² for all possible subsets of the ‘p’ terms; there will be a total of $2^p - 1$ subsets

Step 2: plot a scree (i.e., slope) diagram of R² for the best model at each number of terms vs the number of terms; the first model where the curve begins to flatten out (i.e., at the slope break) is the optimal model

 

Option A: the terms in the best earlier models are always incorporated into all subsequent models

 

Option B: the model with the highest R² is always selected regardless of which terms are included

Note: R²adj is usually preferred to R² because the curve will stop rising and start to drop as non-contributing terms are added in the larger models
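The procedure can be sketched as below under Option B (best model of each size chosen freely). The sketch fits every non-empty subset, so it also confirms the 2^p - 1 count; `ols_sse`, `best_subsets`, and the data are hypothetical, and the scree diagram itself is left to a plotting tool.

```python
from itertools import combinations

def ols_sse(columns, y):
    """Residual sum of squares of y regressed on an intercept plus `columns`."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan with partial pivoting (no collinear terms assumed)
        piv = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [u - f * v for u, v in zip(a[r], a[c])]
    coef = [a[i][k] / a[i][i] for i in range(k)]
    return sum((yi - sum(b * x for b, x in zip(coef, r))) ** 2
               for r, yi in zip(rows, y))

def best_subsets(terms, y):
    """Return ({size: (subset, R2, adjusted R2)} for the best model per size, models fitted)."""
    n = len(y)
    sst = ols_sse([], y)             # total sum of squares about the mean
    best, n_fitted = {}, 0
    for k in range(1, len(terms) + 1):
        for subset in combinations(list(terms), k):
            n_fitted += 1
            r2 = 1 - ols_sse([terms[t] for t in subset], y) / sst
            if k not in best or r2 > best[k][1]:
                r2_adj = r2 - k * (1 - r2) / (n - k - 1)
                best[k] = (subset, r2, r2_adj)
    return best, n_fitted

# Made-up data: Y depends on X1 and X2; X3 is unrelated to Y by construction.
x1 = [float(i) for i in range(30)]
x2 = [float((7 * i) % 11) for i in range(30)]
x3 = [float((i * i) % 13) for i in range(30)]
y = [2 + 3 * u - 2 * v + ((37 * i) % 17 - 8) / 10
     for i, (u, v) in enumerate(zip(x1, x2))]

best, n_fitted = best_subsets({"X1": x1, "X2": x2, "X3": x3}, y)
for k, (subset, r2, r2_adj) in best.items():
    print(k, subset, round(r2, 4), round(r2_adj, 4))   # one scree-plot point per size
```

The number of fits doubles with every extra term (511 for the nine-term quadratic model), so exhaustive search of this kind is only practical for a modest number of terms.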