Welcome to a three-part blog series on bootstrap resampling. This series focuses on efficiency in both code and computation speed. By leveraging the new action-level programming in SAS Viya, we can distribute resampling methods across multiple computer processors, even on multiple machines. Action-level programming also keeps the code that triggers these computations simple. Don’t worry about the details yet; we cover them across the three posts in this series!
Using CAS In SAS Viya for Bootstrap Resampling And Faster Inference For Clinical Trial Results
Initially, this was going to be a demonstration presented at SAS Global Forum 2020. I was looking forward to sharing this resampling action set project with this example demonstration. The next best thing is learning a new way to present slides and annotate them in blog form. Thank you for visiting and reviewing the following. I welcome your feedback on the content as well as the presentation format!
Outline For Blog Series
|Blog Series: Bootstrap Resampling At Scale|
|Post 1 (This One)|
Introduction: Slides 1 - 2
The motivation for this demonstration.
This demonstration exists because of the feedback and encouragement of my copresenter, Jesse Behrens. My first inclination was to wait a year because a shoulder surgery prevented me from traveling and participating. Rather than let me delay, Jesse helped me out and prepared to present in my place. Thank you!
The bootstrap is a technique that allows us to learn from limited data and assess the uncertainty of our findings. The technique is attributed to Bradley Efron, and its importance was recently acknowledged when he was awarded the International Prize in Statistics in 2019.
Background & Setup: Slides 3 - 5
Understanding the details and how to set up SAS Viya for bootstrap resampling.
|Slide 3 (Click-to-Play)|
The process of bootstrap resampling is simple to describe. A resample is constructed by randomly drawing a case from the original sample, replacing it, and repeating these draws until the resample reaches the desired size. This whole process then repeats to construct the desired number of bootstrap resamples.
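To make the draw-and-replace description concrete, here is a minimal sketch in plain Python. It is not the SAS action discussed in this series; the function names and the toy sample are made up for illustration only.

```python
import random

def bootstrap_resample(sample, size=None, rng=None):
    """Build one bootstrap resample by drawing cases with replacement."""
    rng = rng or random.Random()
    size = len(sample) if size is None else size
    # Each draw picks one case at random. The case stays in the pool,
    # so the same case can be drawn again on a later draw.
    return [rng.choice(sample) for _ in range(size)]

def bootstrap(sample, n_resamples, size=None, seed=None):
    """Repeat the resampling step to build several bootstrap resamples."""
    rng = random.Random(seed)
    return [bootstrap_resample(sample, size, rng) for _ in range(n_resamples)]

# Hypothetical toy sample, used only to exercise the functions.
toy_sample = [2.1, 3.4, 1.9, 5.0, 4.2]
toy_resamples = bootstrap(toy_sample, n_resamples=3, seed=42)
for resample in toy_resamples:
    print(resample)
```

Each resample has the same number of cases as the toy sample, and because cases are replaced after every draw, some cases typically appear more than once while others do not appear at all.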
This process is easy to parallelize: because each bootstrap resample is drawn independently of the others, the resamples can be constructed at the same time on different computer processors.
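As a rough illustration of that idea, the sketch below hands each resample to a pool of worker threads using Python’s standard library. It only shows the structure of the work: it is not how CAS schedules threads, the names are hypothetical, and in CPython a thread pool does not give true CPU parallelism for this kind of work (a process pool would be needed for that).

```python
import random
from concurrent.futures import ThreadPoolExecutor

def draw_one_resample(task):
    """Draw a single resample; each task carries its own seed."""
    cases, seed = task
    rng = random.Random(seed)  # independent generator per resample
    return [rng.choice(cases) for _ in cases]

def parallel_bootstrap(cases, n_resamples, n_workers=4, base_seed=0):
    """Build resamples as independent tasks spread over worker threads."""
    # Simple seeding scheme (base_seed + index), chosen for illustration only.
    tasks = [(cases, base_seed + b) for b in range(n_resamples)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # No task depends on another, so they can run in any order;
        # pool.map still returns results in task order.
        return list(pool.map(draw_one_resample, tasks))

cases = list(range(10))
pooled_resamples = parallel_bootstrap(cases, n_resamples=8)
print(len(pooled_resamples), "resamples of size", len(pooled_resamples[0]))
```

Because every task carries its own seed, the result does not depend on which worker picks up which resample.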
SAS Viya’s CAS engine is a perfect environment to achieve this type of parallelization. Instructions can be written once and distributed to all available compute processors via threads. Let’s take a closer look at SAS Viya’s architecture to understand how all of this works.
In the animation above (click-to-play), we see the key components of SAS Viya displayed.
- The first part is a SAS 9.4 workspace. This component gives users a working SAS 9 session when they log into a SAS interface, such as SAS Studio.
- The second part is called CAS, which is a distributed computing environment made up of multiple servers working together.
- The CAS controller orchestrates the work within CAS. Instructions are received and distributed to threads assigned to each processor in the CAS environment.
- The actual computation happens on CAS workers, where threads are assigned to each machine’s processors. An environment can contain any number of CAS workers.
For bootstrap resampling, we want to request B resamples and have them distributed evenly across T threads. For example, if we request 1000 total bootstrap resamples in an environment with 100 threads, each thread goes to work processing 10 bootstrap resamples.
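The arithmetic behind that example can be sketched in a few lines of Python. The helper below is hypothetical, and its handling of the case where B is not an exact multiple of T (giving the leftover resamples to the first few threads) is an assumption made for illustration, not a description of what CAS or the action does.

```python
def split_resamples(total_resamples, n_threads):
    """Return how many resamples each thread would process."""
    base, extra = divmod(total_resamples, n_threads)
    # Assumption: when the split is uneven, the first `extra` threads
    # each take one additional resample.
    return [base + 1 if t < extra else base for t in range(n_threads)]

even_counts = split_resamples(1000, 100)
print(even_counts[0], len(even_counts), sum(even_counts))  # 10 100 1000

uneven_counts = split_resamples(1000, 96)
print(max(uneven_counts), min(uneven_counts), sum(uneven_counts))  # 11 10 1000
```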
Let’s look at how to send instructions to CAS on the next slide.
To trigger execution in a CAS environment, we have several options. The most common way for SAS programmers to request computation is with SAS code composed of procs (procedures). There are many new, CAS-enabled procs that direct their computations to a CAS environment.
Underneath procs, there is a new layer of programming available to SAS programmers with Viya. These building blocks are called CAS actions. There are many groups of actions that come with SAS Viya products, and these are called action sets.
For users that are comfortable with other languages, like R and Python, these same actions get triggered via an API. This functionality bypasses the need for a SAS interface, which has a SAS 9.4 workspace, by directly sending instructions to the CAS controller.
In our examples, we use CAS actions for bootstrap resampling from a SAS interface (SAS Studio) and use PROC CAS.
|Slide 5 (Click-to-Play)|
The bootstrap resampling action is part of a user-defined action set called resample. This action set does not come with SAS Viya but is easily added to an environment from its public GitHub repository.
The animation on this slide shows the whole process: download the single file from GitHub and run it from a SAS interface in the SAS Viya environment.
Bootstrap in One Line: Slides 6 - 7
Everything that is needed to run bootstrap resampling in SAS Viya’s CAS engine.
The process of requesting bootstrap sampling gets fully specified in a single line of code, as highlighted in this slide’s animation.
|Slide 7 (Click-to-Play)|
Using CAS from a SAS interface, like SAS Studio, has three components. The animated slide above (click-to-play) illustrates this process. Here is what is going on:
- Create a CAS session and make a libname to exchange data with it
- Load the sample data into CAS
  - Here, PROC CASUTIL is used to load the sashelp.heart data (described in detail on the next slide)
- Run the bootstrap action against the sample data to create the resample dataset
  - First, the resample action set is loaded for the session
  - Then, resample.bootstrap is used to request 1000 bootstrap resamples of the same size as the sample data
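To picture what a resample dataset could look like, here is a small Python sketch that stacks several resamples into one long table, with one row per drawn case. This is purely illustrative: the layout and the column names resample_id and case_id are hypothetical, and the actual output of the action may be organized differently.

```python
import random

def stacked_resamples(n_cases, n_resamples, seed=None):
    """Stack resamples into one long list of rows, one row per drawn case."""
    rng = random.Random(seed)
    rows = []
    for b in range(1, n_resamples + 1):
        # Each resample has the same size as the sample (n_cases draws),
        # and every draw is made with replacement.
        for _ in range(n_cases):
            rows.append({"resample_id": b, "case_id": rng.randrange(n_cases)})
    return rows

# Tiny stand-in for the real data: 5 cases and 3 resamples.
stacked_rows = stacked_resamples(n_cases=5, n_resamples=3, seed=1)
print(len(stacked_rows))  # 15 rows: 3 resamples x 5 cases
```

With this layout the table grows as the number of cases times the number of resamples, which is one reason spreading the work and the data across many threads is attractive.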
Post 1 Wrap-up
That’s it for post 1! Hopefully, this is a clear explanation of the essential elements of the functionality.
In the next post, we look at using this bootstrap resampling functionality in a typical workflow: a complete example of using the bootstrap resamples for inference on model parameters and residuals.