Project Story : Loving SQL Pass-through!
As functional consultant via OCS, I am currently working for a large financial institution. There are many challenges. Not least being the millions of records per day that need to be processed. Luckily SAS does not need to worry about the constantly growing data storage requirements. We do however need to process this data, stored in a Teradata database, as efficiently as possible. One of the things that I love about SAS is how easy it is to connect to different data sources, I can literally define a library pointing to a Teradata instance and I have access to the data. However this is perhaps not always the most efficient method.
SAS developed a SAS/ACCESS Interface to Teradata software, so SAS users can have transparent access to the data via SAS Software.
We can differentiate three methods for accessing data:
- SAS LIBNAME engine: Users can reach the Teradata tables using SAS code which is ‘translated’ to the specific target DBMS.
- Explicit Pass-Through method:
- The user specifies Teradata specific SQL code with PROC SQL,
- The PROC SQL sends the code as it is to the DBMS,
- If everything is fine, the DBMS sends back the output, else the request fails with an incorrect syntax.
- Implicit Pass-Through method:
- The user specifies SAS SQL code with PROC SQL,
- Then the PROC SQL translates the code to a target specific SQL
- Finally, the DBMS runs the code if the functionality is supported, else it generates an error and passes back all the information to SAS.
The main difference of using these methods comes down to two questions: Who is querying the data and where do you want the data to be processed? SAS users will be comfortable using a libname in combination with PROC SQL or a datastep. However, if we are talking about fanatic Teradata users, they may already have tested queries which they want to execute with SAS. In this case, we can pass the query directly to the database to be executed. The second question has to do with the volume of data. When we execute a query, basically we want to transfer as little data as possible over the network, by executing the query on the database side, we restrict the number of records being returned to SAS. This is where SQL Pass-Through comes in. Without this, all records would be returned to SAS and the querying is done on the SAS side, not handy when we are talking about millions of records. The database is optimized for this data, let it do the hard work!
SAS is communicating with the target DBMS via the SAS/ACCESS software.
Really the most important difference is where the code is executed and where the compute resources are located.
For more information on the SQL Pass-Through feature I can recommend the following paper.