SAS Viya Tips: Finding and Utilising the Number of Rows in a CAS Table Through Macros
I was working on a project using SAS Viya 3.5. I found that many of the options normally used in SAS to count rows may not actually be CAS-enabled. For example I had a piece of code that utilised the automatic data step variable “_N_” to check if there were any rows in a dataset. This variable wasn’t CAS enabled, and it hence forced CAS to not utilise multiple workers, and caused our code to fail on very large datasets!
So we managed to combine ideas from a few different places, to figure out how to efficiently count the number of rows in a CAS table, and put this value into a macro variable to be able to utilise the value elsewhere in the code.
Example Code
proc cas; /* 1 */
table.recordCount result=obs / /* 2 */
table={caslib="your_CASLIB" name="your_table"}; /* 3 */
call symputx('nobs',obs.recordCount[1,1],"L"); /* 4 */
quit;
To run Proc Cas you will need to have an active CAS session.
This step tells Proc Cas to take the record count of the table, and place it into a variable called “obs”.
This step selects the table from the CASLIB and the table name. Note that the table needs to be loaded into memory for this to work.1
This step extracts the actual record count from the variable obs (“obs” is a dictionary - see here for more info), and then places it into the macro variable “nobs”.2
Following this code, the macro variable “nobs” will now have the number of rows in the dataset specified. Another benefit to this approach, as opposed to more traditional SAS methods, is that using CAS tends to be much more efficient than SAS, as demonstrated in this great post.
I hope this helped and if you found this interesting, then why not check out some of our other articles on CAS in the links above. If you have got any further questions, then feel free to get in touch with me at martin@butterflydata.co.uk.
Finally, just two things worth noting about this walkthrough:
I believe there’s no way to assess the number of rows in a dataset that is not loaded, as SAS has no way of viewing the dataset, however, it could be possible to interact with the the metadata of a database, depending on your CASLIB connection.
Note that when using CAS datasets, you have to specify the macro variable scope during call symputx, or it will return a null value, I believe.