mobileNavlogo
headerlogo

SQL Aggregations & DataSet(Row & colummn operations) in SAS

21 September, 2018

Ratings

Related Blogs

 Aggregation functions:

  Using aggregation functions, we can do  arithmetic manipulation in SQL. Using aggregate function, we can do column and row wise analysis.

  • SQL produces each and every result in column wise.
  • Aggregate functions are called summarize functions.

Ex: Data medi; Input Gid $ week 1 drug1 $ sub1 Week2 drug2  $ sub2; Cards; G100  3  col5mg  56   6   col10mg  40 G200  3  col5mg  50   6   col10mg  45 G300  3  col5mg  60   6   col10mg  49; Proc sql; Select Gid, week1,drug1,sub1, Sum?(sub1) as total, Mean(sub1) as avgsub, Max(sub1)as maxsub from Medi; Quit; /* row wise analysis*/ Ex: Proc sql; Select * sum(sub1,sub2)as total From medi; Quit;  

Group by clause:

It can be used to do grouping analysis. Ex: Data clinical; Input center $ trail $ Sub adsub; Cards; Appolo   phase1  67  12 Nims      phase1    78  14 Care        phase1  34  10 Appolo   phase2  267  22 Nims      phase2    178  14 Care        phase2  2 34  40 ; Proc sql; Select  center, trail,sub,sum(sub) As total From clinical Group by center; Quit;  

Having clause:

Work like a where clause if your want to do grouping analysis based on condition we use having clause. Ex: Proc sql; Select  center, trail,sub,sum(sub) As total From clinical Group by center  having center; Not in (‘care’); Quit;

Interested in mastering SAS Developer? Learn more about SAS Training and Placement in this blog post.

Count functions:

It can be used to report frequency analysis. If we use * as the argument in count function , ots report number of observation generated by the current query statement. Proc sql; Select   count(*) as obs From clinical Quit;

  • If we use variable as argument, it reports number of non missing values generated by the required query statement.
/* no. of group participated in each and every trail*/

Ex: Proc sql; Select   trail,count(trail) as obs From clinical  group by trail; Quit;

/* each center conducted each trail no. of times*/

Proc sql; Select   center, trail, count(center) as obs From clinical group by Center, trail; Quit;

/* Total no. of patients participated in each trail and each center */

Ex: Proc sql; Select   center, trail, count(center), sum (sub )as obs From clinical group by Center, trail; Quit; Ex: Proc sql; Select   center, trail, count(center) as obs, sum(sub) as totsub From clinical group by Center, trail; Quit;  

Distinct function:

It can be used to report unique values from the required variables Syntax:   Distinct (argument) Ex: Proc sql; Select distinct (center) as Cenlist from clinical; Quit; Ex: Proc sql; Select distinct (trail) as traillist from clinical; Quit;  

Coalesce function:

It can be used to replace the missing values for reporting.

  • Numeric missing values we can replace using with numeric.
  • Character missing values we can replace using with Character

Syntax: Coalesce (variable name, replace char or  replace number) Ex: Proc sql; Select Coalesce (center,’Miss’) as center,trail, Coalesce (sub) as sub Adsub from clinical; Quit;  

Views:

Its work like sas data sets. We can create a views from the existed SAS data sets, views are working based on SAS data set.

  • Using query statement we can create views

Ex: Proc sql; Create view appolo as Select * from clinical Where center = ‘appolo’; Quit; Views are 3 types:

  1. Data set views
  2. Sql views
  3. Access views

 

Data set views:

It is created by the data set block.

View option:

       Using view option, we can create data set views. Ex: Data one/ view =one; Set sashelp=class; Where age <=14; Run;  

SQl views:

It  is created by the SQL statements. Ex: Proc sql; Create to view two as Select * from Sashelp.class; Quit;  

Access point:

it is created by the access procedure.  

Difference between data set and views:

   

Data set

Views

  1. It is a physical storage area. To occupy some place for storage.
  2. Data set can be created using required statements import ,insert ,create table etc.
  3. It takes less processing time.
  4. If we do any changes in data set, the changes are occurred in current data set or same data set.
 
  1. Data set store the data in physical format(variables and observations)
  1. It is a logical area, it does not occupy any dise place for storage.
  2. Views are created by the existed data set.
 
  1. It takes more processing time.
  2. If we do any changes in views, these changes are occurred in existed data sets. If we do any changes in existed data set, these changes are occurred in views.
  3. Views are stored in query statement instead of data.
     

 

Describe statement:

   It can be used to report structure of the table or view. Syntax:  Describe table /view <table name>/<view name>; Ex: Proc sql; Describe table Sashelp .class; Quit; Proc sql; Describe view two; Quit;

  • Describe statement can be used only for sql statement.

 

Creation process:

   Data set create from data set. View create from data set View create from view Data set create from view.   Capture  

/* data set to data set*/

Ex: Proc sql; Create table emp1 As select * from emp Where sale>=500; Quit;

/* data set to view */

Ex: Proc sql; Create view emp2 As select * from emp Quit;

/* view to view*/

Ex: Proc sql; Create view emp3 As select * from emp Where sale>=500; Quit;

/* view to data set */

Ex: Proc sql; Create table emp4 As select * from emp2 Where sale< 50; Quit;    

DROP THE TABLE OR VIEWS FROM THE SAS ENVIRONMENT:

DROP STATEMENT: Can be used to drop the table or views from the SAS environment Syntax: Drop table / view <table name>/<view name> Eg: proc sql; Drop table emp 1; Drop table emp 3; Quit;  

CONSTRAINTS:

Can be used to load the necessary data in tables we can assign constrains in two ways

  1. Column constrains
  2. Table constrains

 

Integrity constrains type:

  • UNIQUE: can be used to load the data without duplicate data values
  • Not null : can be used to load the data without missing values (numeric type and character type )
  • Check: can be used to load the data based on condition
  • Primary key: can be used to load the data without duplicate data values and duplicate observations and without missing values
  • Primary key, foreign key, reference key: can be used to link the tables

Unique: Using unique constrains we can load the data without duplicate observations

Condata paid Age Gender Race
Sheet 1 100 56 Female Asian
  101 45 Female African
  101 46 Female African
  102 34 Male Asian

/* Extraction*/ Eg:   Proc import Data file =‘d: \condata.xls’ Out =one dbms= XL replace; Sheet =’sheet 1 $’, Run; Proc sql; Describe table one; Quit; /*transformation*/ Proc sql; Create table Demo (pid num unique, age num, gender char, race char); Quit; Proc sql; Describe table demo; Quit; /*loading*/ Proc append base = demo Data = one force  

  • Only one constraint for one variable in table
  • Multiple constraint in multiple variables in table
  • Multiple constraint for only one variable in table

Not null:   Sheet 2 Eg:

Paid Age Gender race
100 56 Female Asian
101 45 Male African
  34 Female African
103 34 Male Asian
104 56 Female Asian

  Step 3                                 /*loading*/                                                 Proc append base = demo 4 Data = four force; Run;  

Multiple constraints for one variable in table:

Raw data:           

Paid Age Gender Race
100 56 Female Asian
101 48 Male African
  34 Female African
103 34 Male Asian
104 56 Female Asian
104 48 Female Asian

Step 1: Importing the data in to SAS dataset name 5 Step 2: To create a table with multiple constraints for one variable /*transformation*/                                                 Pro sql; Create table Demo (page num unique is not null, age num, gender char, race char); Quit; Proc sql; Describe table demo, Run; Step 3:                                                                                 /* loading*/ Note: using primary key as a column constraint we can load the data without duplicate data values and without missing values  

Multiple constraints for multiple variables in table:

Raw data:          

Paid Age Gender Race
100 56 Female Asian
101 48 Male African
102 34 Female African
103   Male Asian
104 56 Female Asian
104 45 Female Asian

  Step1:  /* extraction*/   Step 2: /* null data set creation with constraints */ transformation Proc sql; Create table  Demo 2(paid num primary key, age num not null, Check (age<=50), gender char, race char); Quit; Proc sql; Describe table = demo; Run;   Step3:  /*loading*/ Table constraint: Using table constraint, we can avoid duplicate observations  

Primary key:

                         Using primary key as a table constraint, we can avoid duplicate observations in loading time Constraint statement: Can be used to assign required constraint for required variable Syntax: constraint <integrity name>     <type>     (variable); -UN0001-          Unique      Pid num Eg: Proc sql; Create table demo (pis num unique, age num not null ); Quit; Proc sql; Describe table demo; Quit; Or  Proc sql;  Create table demo 1(pid num, age num, constraint UK unique (pid), constraint nt not null (age)); Quit; Proc sql; Describe table demo 1; Quit;   Eg: Raw data:

Paid

Test

Units

/*extraction*/

200

200

200

201

201

201

205

205

205

Hr

Sbp

Dbp

Hr

Hr

Sbp

Sbp

Dbp

dbp

76

156

90

89

89

145

156

89

89

 

Proc importData file=’d: \conast.xls’Out=one dbms=excel replace;Sheet = ‘sheet 4$’;Quit;Proc sql;Describe table one;Quit;

/*transformation*/

Proc sql;Create table lab( pid.num,test char, Units num,  constraint pk,Primary key(pid, test, units));Quit;

/*loading*/

Proc append base = labData= one force;Run;Proc print data= lab;Run;

 

Link the tables for data loading:

                         We can link the tables using three keys

  1. Primary key
  2. Foreign key
  3. Reference key

These three constraints can be used as column constraints to link up the tables In linking concept, we can link up dimension table (master table) with fact table (transformation table) for data loading we must use primary key in dimension table, foreign key and reference key in fact table Note: We should use only one primary key in dimension table but in fact table we can use multiple foreign keys Capture                               /*EXTRACTION*/  Proc import Data=’d:\conast-xls’ Out= d dbms = excel replace; Sheet = ‘sheet 2 $’ Quit; Proc import Out =l dbms = excel replace ; Sheet = ‘sheet 3$’ Quit; Proc sql; Describe table d; Describe table l; Quit; Proc sql; Create table demo(pid num name char, age num, race char, color char, constraint pk primary kry(pid)); Quit;   Capture                         Capture                                   Proc sql; Create table lab( pid num, test char, units num, constraint pk primary key (pid) reference demo); Quit; Proc sql; Create table lab 1(num, test char, units num, constraint fk foreign key (pid) reference demo); Quit; Proc sql; Describe table demo; Describe table lab; Describe table lab 1; Quit; Proc append base = demo; Data = d force; Run; Proc append base = lab Data = l force; Run; Proc append base = lab 1; Data = lab force; Run;   Eg:  Capture Capture                             /*extraction*/ Proc import Data file=‘d: \conast.xls’ Out = d dbms = =excel replace; Sheet = ‘sheet 1 $’; Quit; Proc import Data file = ‘d:\lata.xls ‘ Out = L dbms = =excel replace; Sheet = ‘sheet 2 $’; Quit; Proc sql; Describe table d; Describe table l; Quit;  /* Transformation*/ Proc sql; Create table demo ( pid num, age num, center char, constraint pk primary key (pid)); Quit; Proc sql; Create table lab (labno num, center char, constraint pk primary key (labno)); Quit; Proc sql; Create table lab ( pid num, test char, unit num, constraint PK primary key(pid)); Quit; Proc sql; Create table lab1 ( pid num, test char, unit num, constraint FK foreign key(pid) references demo; Quit; Proc sql; Describe table demo; Describe table lab; Describe table lab1; Quit; /* Loading */ Proc append base=demo Data = d force; Run; Proc append base=lab Data = l force; Run; Proc append base=lab1 Data = s force; Run; EX: Capture                               Capture     /*extraction*/ Proc import Data file=‘d: \data.xls’ Out = d dbms = =excel replace; Sheet = ‘sheet 1 $’; Quit; Proc import Data file = ‘d:\data.xls ‘ Out = L dbms = =excel replace; Sheet = ‘sheet 2 $’; Quit; Proc import Data file = ‘d:\data.xls ‘ Out = L dbms = =excel replace; Sheet = ‘sheet 3 $’; Quit; Proc sql; Create table demo ( pid num, age num, center char, constraint pk primary key (pid)); Quit; Proc sql; Create table lab (labno num, center char, constraint pk primary key (labno)); Quit; Proc sql; Create table sam( pid num, labno num, test char, units num, constraint FK foreign key (pid) references demo, constraint FK, foreign key(lab no) reference lab); Quit; Proc sql; Create table lab1 ( pid num, test char, unit num, constraint FK foreign key(pid) references demo; Quit; Proc sql; Describe table demo; Describe table lab; Describe table sam; Quit; Proc append base=demo Data = d force; Run; Proc append base=lab Data = l force; Run; Proc append base=sam Data = s force; Run;    

                TO DELETE THE DATA SETS:

  If we want to delete table or data set from the SAS environment, the process is to

  1. Delete constraints
  2. Delete observations
  3. Drop the table from SAS environment

à We can delete the constraints using after statement Delete constraints: Syntax: alter table<table name>drop<integrity constraint name> Eg: Proc import Data file = ‘d:\ condata.xls’ Out = one dbms = excel replace; Sheet = ‘sheet 2$; Quit; Proc sql; Create table demo ( pid num primary key, age num, gender char, race char constraint nt notnull(gender)); Quit; Proc sql; Describe table demo; Quit; /* Delete constraints */ Proc sql; After table demo Drop constraint-PK 0001-,nt; Quit;                 /* delete obs*/ Proc sql; Delete from demo; Quit; /* delete SAS data set*/ Proc sql; Drop table demo; Quit;  

Delete observations:

Syntax: proc sql; Delete from <table name>; Quit;  

Delete table from SAS environment:

Proc sql: Drop table<table name>; Quit;  

To assign the constraint for existed data set:

If we want to assign constraint for existed data set, then it existed data set all observations must be satisfied in required constraint in this case, SAS will be accepted required constraint, otherwise it is rejected. Eg: Data demo 1; Input pid age; Cards; 100         56 101         89 100         67 ; Data demo 2 ; Input pid age; Cards; 100         67 101         56 102         45 ; Proc sql; After table demo 1 add Constraint UK wnique(pid); Quit; Proc sql; Describe table demo 1; Quit; Proc sql; Describe table demo 2; Quit; For indepth knowledge on SAS, click on below

About Author
Authorlogo
Name

TekSlate

Author Bio

TekSlate is the best online training provider in delivering world-class IT skills to individuals and corporates from all parts of the globe. We are proven experts in accumulating every need of an IT skills upgrade aspirant and have delivered excellent services. We aim to bring you all the essentials to learn and master new technologies in the market with our articles, blogs, and videos. Build your career success with us, enhancing most in-demand skills .