Achieve massive boosts in query performance with SARGable indexes

By Wouter Burggraaf - Senior SAS Consultant

At a client we experienced a performance problem when running queries joining tables containing millions of rows - the queries ran slow!. We drastically improved the process making the queries run five times faster. This article explains one of the techniques used.  

Power of indexes

You may already know that indexes can be essential to speeding up queries. If your tables have an index on customer_id then you’ll want to join them on this column instead of a similar column without an index like customer_name. Similarly, if you want to filter the dataset then the where filter should use customer_id, not customer_name. If you don't use the index then SAS (or if you use a pass-through: the target DBMS) will have to search through the entire table which will take much longer than querying the index: imagine flipping through the entire SAS manual to find the chapter on Indexes instead of consulting the book's index and going straight to the right page.

Why were our queries slow?

The SQL queries we wrote were passed through to a DBMS. We did use indexes however the index columns had been transformed and as a result the DBMS engine was unable to use the index. Instead, the DBMS would ignore the index and perform a full table scan resulting in terrible performance.

Example:

The query contained where year(START_DATE)=2019. The DBMS engine was unable to use the index for the where statement because the index column had been transformed.

Simply changing this to where START_DATE BETWEEN '20190101' AND '20191231' ensured that the index could be used and the query became 5 times faster.

A query written in such a way that the DBMS can take advantage of an index is called a SARGable (Search, ARGument, able) query.

What about SAS specifically?

The same applies to SAS: applying transformations like PUT, INPUT, YEAR, etc. to the index column will not allow SAS to use the index and a much longer full table scan will be done. I added an exhaustive list of the where expressions that are compatible with an index in SAS below.

What can you do after reading this post?

Have another look at a query you ran that takes particularly long. Check to see on which columns the indexes are in the source tables. Next, have a look at your joins and where statements and check if you are transforming any index columns. If you are, rewrite those statements. It can require a bit of creativity to make them SARGable. Run the query and you might see dramatic performance gains with little effort.

SAS compatible where statements:

Creating and Exploiting SAS Indexes

“Only these eleven forms of the WHERE expression will allow SAS to use an existing Simple index:

1. Normal comparison operators such as: where x > 500;

2. Normal comparison operators with NOT: where x ^> 500;

3. The CONTAINS operator: where state contains “East”;

4. A comparison operator with the colon modifier: where state =: “East”;

5. The TRIM function: where trim(state) = “North Dakota”;

6. Range conditions with upper and lower bounds or range conditions that use the BETWEEN-AND

operator:

where 42 < X < 112;

where X between 42 and 112;

7. The pattern matching operators LIKE and NOT LIKE: where state like “East %”;

8. The IS MISSING and IS NULL operators:

where name is missing;

where iq is null;

9. The SUBSTR function when it is of the form:

where substr(argument,position,<n>) = “value”’;

In this substring form, the following must be true for an index to be used:

a. The value of position must be 1.

b. The value of <n> must be less than or equal to the length of

argument.

c. The value of <n> must be equal to the length of value

Here is an example: where substr(state,1,4) = “East”;

10. The IN operator

where region in(“East”, “West”);

11. Any WHERE clause composed of two or more of the above ten forms connected via AND:

where (region in(“East”, “West”)) and (name is missing);”