SAS Code Note

SHAN LIN
3 min readJan 8, 2023

--

#credit_risk #modeling #SASprogram #coding

Sharing some SAS code for data processing

Data Input

Input data type: Excel. In this example, data starts from row 10.

proc import out = tb0_&file_date.
datafile = "/folder/input/report_&file_date..xlsx"
dbms = xlsx replace;
sheet = 'sheet1';
datarow=10; /*first data row*/
getnames=NO;
run;

/*Reference: find column names */
proc import out = col_header
datafile = "/folder/input/report_&file_date..xlsx"
dbms = xlsx replace;
range="'sheet1'$A6:BC6"; /*column row*/
getnames=NO;
run;
proc print data=col_header;run;

Convert data type

Character to numeric: INPUT(var, BEST.)

Numeric to character: PUT(var,$11.)

Combine or compile files

Use : for referencing file name.

/*Combine month1, month2, month3*/
Data month_1_to_3;
set month:;
run;

Check stats and distribution for numeric and categorical data

%let file_date = 201709;

/*numeric*/
proc sql noprint;
select name into : vars separated by " "
from dictionary.columns
where LIBNAME = upcase("work")
and MEMNAME = upcase("tb3_&file_date.")
and type='num'
;
quit;
/*get numeric variable names*/
%put variables = &vars.;

/*get min, max, mean*/
proc means data = tb3_&file_date. n nmiss mean median mode min max P1 P99;
var &&vars.;
run;

*********************************************************;
*********************************************************;

/*categorical*/
proc sql noprint;
select name into : vars_char separated by " "
from dictionary.columns
where LIBNAME = upcase("work")
and MEMNAME = upcase("tb3_&file_date.")
and type='char'
;
quit;
/*get categorical variable names*/
%put variables = &vars_char.;

/*get frequency*/
proc freq data=tb3_&file_date.;
table &&vars_char.;
run;

Remove format of a column

Sometimes there is unknown format masking a column. To retreive the original data, copy the column in a new column and it will automatically remove the format.

SAS table example

Strategy to combine data

  1. Clean up the mistake within one month file
  2. Month by month consistent
  3. Whole-year set
  4. Convert the whole-year set into desired data type
  5. Combines multiple years of data
Illustrated by Shan

Clean up manual mistakes and special character in numbers

Same columns came with different data type. Column with all missing value becomes character.

Some have special characters from manual mistake, such as “123..4%”

data tb3_&file_date.;
set tb2_&file_date.;
/*all missing so char --> numeric*/
Company_ID_NB2 = input(Company_ID_NB, BEST.);
/*Replace incorrect format such as 1.24.1 --> 1.24:1 */
If Debt_TBC = "1.24.1" then Debt_TBC2 = "1.24";
else if Debt_TBC = "1.43.1" then Debt_TBC2 = "1.43";
else Debt_TBC2 = Debt_TBC;

Rename and drop columns

data input.RAT_&file_date.;
set tb4_&file_date. ;
rename
Debt_TBC3 = Debt_TBC
Current_Ratio3 = Current_Ratio;
drop
Debt_TBC
Debt_TBC2
Current_Ratio
Current_Ratio2
;
run;

Missing value for numeric and character column

data test;
set tmp.code_change;
ID_Approved_by=''; /*character*/
Date_Approved=.; /*numeric*/
run;

Column Format: DATE

Define a data “201709”

Create a column called “scoringMDY” that captures this date (01SEP2017)

%let file_date = 201709;

data tb1_&file_date.;
set tb0_&file_date.;
format scoringMDY date9.; /*ex.01FEB2017*/
scoringMDY = mdy(%substr(&file_date.,5,2),1,%substr(&file_date.,1,4));

Group Name using RETAIN

Data Input example (tb1_&file_date)
data tb2_&file_date.;
set tb1_&file_date.;

/*TO-DO: group name*/
if index(compress(upcase(company_name)),"GROUP") > 0 then group_name = company_name;
retain group_name;
run;

--

--

SHAN LIN
SHAN LIN

Written by SHAN LIN

Data Analytics * Curiosity -I Write What You Don’t Find on Google || Article List: https://ppt.cc/fhjetx || LinkedIn: http://linkedin.com/in/shan-lin-0723

No responses yet