In statistics (or econometrics), the variance inflation factor (VIF) measures the incidence and severity of multicollinearity among the independent variables in an ordinary least squares (OLS) regression analysis. For a given predictor, the VIF equals 1 / (1 - R^2), where R^2 comes from regressing that predictor on all the other predictors.

In a linear regression analysis, it is important to run the VIF test to detect and remove multicollinearity among the independent variables. As a rule of thumb, the VIF value should not be more than 2 for better modelling. However, the final decision depends on the analyst's discretion (the example below assumes a threshold of 5, and the macro calls further down use 10).

While running the linear regression analysis, one should not remove at once all the variables whose VIF exceeds the pre-decided threshold (say, 5). Instead, the analyst should remove only the variable with the highest VIF value and then recalculate the VIF values. This calculate-and-remove cycle should continue until the highest VIF falls below the threshold, with only one variable removed at a time. The process is not difficult, but it can become cumbersome when the number of independent variables is very high.

The following SAS code automates the process over multiple iterations, and the final data sets give the list of retained variables as well as removed variables. The code uses PROC REG as the only statistical procedure to calculate the VIF. Each iteration removes one variable.

    libname dataloc "/Desktop/Model";     /* MODEL DATASET LOCATION */
    libname outlib "/Desktop/Output";     /* OUTPUT LOCATION */
    %let inset = MODEL_DATA;              /* MODEL DATASET NAME */
    %let target = Y_VAR;                  /* DEPENDENT VARIABLE */

    /* TO CREATE A BLANK TABLE FOR REMOVED VARIABLE */
    ...
    if _n_ = 0 then output outlib.removed_variable_list;
    ...
    call symput(compress("vif_val"), compress(VarianceInflation));
    ...
    ... > &VIF_limit.))
    ...
    set vif_top outlib.removed_variable_list;
    ...
    select distinct variable into :varlist separated by " "
    ...

The same logic written as a macro:

    /* CREATING THE DATA FOR MULTICOLLINEARITY */
    %MACRO MULTICOLLINEARITY(DATASET=, YVAR=, VIF_CUTOFF=);
    ...
    WHERE TYPE=1 AND NAME NOT IN('flag_t_credit_card_cross_sale_') AND FORMAT NOT IN('DATE')   /* Remove target and date variables */
    ...
    SELECT NAME INTO :XVARS SEPARATED BY ' '
    ...
    %DO %UNTIL (%SYSEVALF(&PAREST2. <= &VIF_CUTOFF.));
    ...
    SELECT Variable INTO :XVARS SEPARATED BY ' '
    ...
    SELECT MAX(VarianceInflation) INTO :PAREST2
    ...

    %MULTICOLLINEARITY(DATASET=IBMB, YVAR=flag_t_credit_card_cross_sale_, VIF_CUTOFF=10);

    /* after running the above SAS macro code for multicollinearity, run the below code */
    /* final data set with no multicollinearity variables */
    ...
    SELECT Variable INTO :XVARS_final SEPARATED BY ' '
    ...
    Model flag_t_credit_card_cross_sale_ = &XVARS_final.
    ...

A more general version takes the dependent variable from the YVAR= parameter instead of hard-coding it:

    %MACRO REMOVE_MULTICOLLINEARITY(DATASET=, YVAR=, VIF_CUTOFF=);
    /* filter only numeric variables excluding dependent and date variables */
    ...
    WHERE TYPE=1 AND NAME NOT IN("&YVAR.") AND FORMAT NOT IN('DATE')   /* Remove target and date variables */
    /* creating macro for independent variables */
    ...
    /* running the regression model till independent variables vif < specified vif_cutoff */
    %DO %UNTIL (%SYSEVALF(&MAX_VIF. <= &VIF_CUTOFF.));
    /* taking the output of independent variables vif by removing the intercept */
    ...
    /* dropping the independent variables with missing vif value */
    ...
    /* sorting the vif value by descending order */
    ...
    /* creating the macro for highest vif value */
    ...
    /* dropping the highest vif value variable */
    ...
    SELECT MAX(VarianceInflation) INTO :MAX_VIF
    ...

    %REMOVE_MULTICOLLINEARITY(DATASET=sashelp.cars, YVAR=MPG_City, VIF_CUTOFF=10);
    ...
    WHERE VARIABLE NOT IN("&REMOVE_VAR.", "Intercept")
    ...

A SAS user asked an interesting question on the SAS/GRAPH and ODS Graphics Support Forum. Recall that the REG statement in PROC SGPLOT fits and displays a line through points in a scatter plot. The question is: does PROC SGPLOT support a way to display the slope of the regression line that is computed by the REG statement?

In SAS 9.3, you cannot obtain this information directly from PROC SGPLOT. Instead, you need to use PROC REG to compute it. You can use the following steps to create a plot that displays the parameter estimates:

1. Use PROC REG to compute the parameter estimates (slope and intercept).
2. Use a DATA step to create macro variables that contain the parameter estimates.
3. Use the INSET statement in PROC SGPLOT to add this information to the fitted scatter plot.

You can use the OUTEST= option or the ODS OUTPUT statement to save the parameter estimates to a SAS data set. For example, the ODS OUTPUT statement can save the ParameterEstimates table to a data set named PE.
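The three steps this post lists for showing regression estimates on an SGPLOT graph (PROC REG, then macro variables, then INSET) can be sketched roughly as follows. The sashelp.class data, the Weight and Height variables, and the macro variable names Int and Slope are assumptions chosen for illustration; only the PE data set name comes from the text.

```sas
/* Step 1: fit the line with PROC REG and save the ParameterEstimates table */
ods output ParameterEstimates=PE;
proc reg data=sashelp.class plots=none;   /* sashelp.class is an assumed example data set */
   model Weight = Height;
run;
quit;

/* Step 2: copy the estimates into macro variables.
   For a one-predictor model, row 1 of PE is the intercept and row 2 is the slope. */
data _null_;
   set PE;
   if _n_ = 1 then call symputx("Int",   put(Estimate, best6.));
   else            call symputx("Slope", put(Estimate, best6.));
run;

/* Step 3: draw the fitted scatter plot and print the estimates in an inset */
proc sgplot data=sashelp.class noautolegend;
   reg y=Weight x=Height;
   inset "Intercept = &Int" "Slope = &Slope" / border position=topleft;
run;
```

The OUTEST= option on the PROC REG statement would serve equally well for step 1; ODS OUTPUT is used here only because the text names the PE data set.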
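For the VIF procedure described in this post (refit, drop the single predictor with the highest VIF, repeat until the largest VIF is at or below the cutoff), a minimal sketch might look like the one below. It is not the author's macro: the macro name vif_prune and the work data sets _cols, _pe and _vif are hypothetical, and the guard that skips the drop once the cutoff is met is a design choice made here. The sashelp.cars data, the MPG_City response and the cutoff of 10 are taken from the macro call shown in the post.

```sas
%macro vif_prune(dataset=, yvar=, vif_cutoff=10);   /* hypothetical macro name */
   %local xvars max_vif drop_var;

   /* Candidate predictors: numeric columns (TYPE=1 in PROC CONTENTS output),
      excluding the response and DATE-formatted columns. */
   proc contents data=&dataset out=_cols(keep=name type format) noprint;
   run;

   proc sql noprint;
      select name into :xvars separated by ' '
         from _cols
         where type = 1
           and upcase(name) ne upcase("&yvar")
           and format not in ('DATE');
   quit;

   %do %until (%sysevalf(&max_vif <= &vif_cutoff));
      /* Refit with the current predictors and capture the VIF column. */
      ods output ParameterEstimates=_pe;
      proc reg data=&dataset plots=none;
         model &yvar = &xvars / vif;
      run;
      quit;

      /* Rank predictors by VIF, ignoring the intercept and missing VIFs. */
      proc sort data=_pe(where=(Variable ne 'Intercept' and VarianceInflation ne .))
                out=_vif;
         by descending VarianceInflation;
      run;

      /* The first row now holds the worst predictor and its VIF. */
      data _null_;
         set _vif(obs=1);
         call symputx('max_vif',  VarianceInflation);
         call symputx('drop_var', Variable);
      run;

      /* Once the cutoff is met, nothing more is dropped. */
      %if %sysevalf(&max_vif <= &vif_cutoff) %then %let drop_var = ;

      /* Rebuild the predictor list without the single worst predictor. */
      proc sql noprint;
         select Variable into :xvars separated by ' '
            from _vif
            where Variable ne "&drop_var";
      quit;
   %end;

   %put NOTE: Retained predictors: &xvars;
%mend vif_prune;

%vif_prune(dataset=sashelp.cars, yvar=MPG_City, vif_cutoff=10);
```

The sketch assumes at least one numeric predictor survives each pass. A production version would also want to record each dropped variable, as the removed_variable_list table in the post does.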