Discovering All Available Data in a SAS Viya CASLIB
What is SAS Viya and What is a Caslib?
SAS Viya is a cloud-native platform which is able to host many of SAS’s other services such as SAS Visual Analytics or SAS Studio. One of the strengths of SAS Viya is the Cloud Analytics Services (CAS) in-memory engine. This is a way of storing and processing data in-memory across multiple nodes, making it run much faster than if it was on a non-distributed set-up.
A caslib is a connection of a database or folder path to CAS. It is effectively the equivalent of what a libref was in SAS 9, but for SAS Viya. It allows users to work with data in an external database, within SAS Viya.
Getting Started Exploring the Available Caslibs
Sometimes you will have a connection to a database through a caslib, but you may not know all of the tables available in the database, or need to know what columns each of those tables have. In a SAS studio session, if you follow the steps described in this blog, then you can produce a list of all the available tables and their columns from a set of caslibs.
The method for setting up caslibs is detailed elsewhere, this blog assumes you already have some caslib connections set up by your SAS adminstrator.
It is possible to run multiple programming languages in SAS Viya (including Python and R). In this example I have used a mix of SAS Macro and SAS Cloud Analytic Services language (CASL). Regardless of the language, the cas actions produced are able to do the same things in CAS.
I have not used CASL much before, and am aware that there are probably more efficient ways to do what is described in this blog. So please do get in touch if you have any feedback about how this could be improved.
In the first code block below, we need to initiate the CAS session and set some options to make the code work as desired. If you do not know what caslibs are available to you, then they can be printed to the log. Then create a comma separated list of all of the caslibs that you would like to interrogate and assign it to a macro variable.
/* Start cas session and assign some relevant options*/ options minoperator mindelimiter=','; cas; /* If necessary, print to log list of all available caslibs*/ %put caslib _all_ list; /* Create list of caslibs, could done automatically from another source, if there is a large number*/ %let caslib_list = database_a, database_b, database_c;
Check Access to Each Caslib
This step initiates the loop through each caslib, and then checks if your session has the correct access to each caslib. The number of caslibs is calculated by using a “countw” function on the caslib list assigned earlier.
There is a cas action which queries the caslib, and returns “True” if you have access. If it does not return “True”, then a warning will be printed to the log and the loop will stop processing for that caslib.
Note, this block initiates the SAS Macro, and the final code block in this post finishes the macro, and it is then executed in that code block.
%macro generate_table_and_column_list; /* Set macro variable scopes and calculate number of caslibs to loop through*/ %local n n_caslibs caslib_name n_tables table_name i; %let n_caslibs = %sysfunc(countw(%quote(&caslib_list.))); /* Loop through each caslib */ %do n = 1 %to &n_caslibs.; %let caslib_name = %sysfunc(scan(%quote(&caslib_list.),&n.,",")); /* check if caslib connection exists*/ %let caslib_check = 0; proc cas; table.queryCaslib result=t caslib="&caslib_name."; if t=True then do; call symputx("caslib_check",1,"L"); end; quit; /* If connection doesn't exist then skip loop with warning*/ %if &caslib_check. = 0 %then %do; %put WARNING: Caslib connection for &caslib_name. is not working.; %goto CASLIB_LOOP_EXIT; %end;
Create a List of all Available Tables for each Caslib
The first step in the codeblock below will produce a SAS dataset, with a list of all tables that the caslib contains.
The next step then puts the number of tables and each table name into a macro variable to allow for the next step which produces a list of all columns that each table contains.
/* Create list of all available tables for this caslib*/ proc cas; table.fileinfo result=t caslib="&caslib_name." ; saveresult t['fileinfo'] dataout=work.tables_&caslib_name.; quit; /* Put number of tables and each table name from the caslib into macro variables */ proc sql noprint; select count(name), name into :n_tables, :table_name1- from work.tables_&caslib_name.; quit;
Create a List of all Columns each Table Contains
This step starts a loop through each table in the caslib. It then produces a SAS dataset which contains all the columns in each table.
It is important to have the step which adds a field to the SAS dataset with the information about which caslib and which table the column comes from, or it will be impossible later once all the metadata is aggregated to know which columns come from which table!
/* Loop through all tables in the caslib, getting column info */ %do i = 1 %to &n_tables.; %let table_name = &&table_name&i.; %let out_dataset = columns_&caslib_name._&i.; /* Get column info for this table */ proc cas; table.columnInfo result=c table={caslib="&caslib_name.", name="&table_name."}; saveresult c['columnInfo'] dataout=work.&out_dataset.; quit; /* Add the caslib name and the table name, to the dataset with the column info */ data work.&out_dataset.; retain caslib_name table_name; set work.&out_dataset.; caslib_name = "&caslib_name."; table_name = "&table_name."; run; %end; /* End of table loop*/ %CASLIB_LOOP_EXIT: %end; /* End of caslib loop*/
Aggregate Everything Together
Once the caslib loop is complete, all the available tables, and all the columns for each table can be aggregated into two single datasets. This will then give 2 SAS datasets which hopefully contains all the information you need about the data available in your caslibs.
You can export the SAS datasets to a csv, if that is your desired way to explore the information about the caslibs.
/* Aggregate all available tables and all available columns into 1 dataset each*/ data all_tables_available; format schema $8. catalog $8. name $32. type $32. description $100.; set work.tables_:; run; data all_columns_available; format caslib_name $8. table_name $32. column $32. label $100. ID 8. type $32. rawlength 8. Format $32. nfl 8. nfd 8. set work.columns_:; run; /* Clean up the work library*/ proc datasets noprint; delete tables_: columns_: ; quit; %mend generate_table_and_column_list; %generate_table_and_column_list;
The full code is attached here. Please get in touch with me at martin@butterflydata.co.uk if you have any questions or feedback.