Every once in a while I stumble across some SQL keyword that I didn’t really know about, but is extremely useful.
The other day I came across APPLY, or rather CROSS APPLY.
After reading through documentation on how it works and articles about it, I had a bit of trouble understanding it because I couldn’t really find a simple explanation.
I am going to try to explain it as simply as possible so you can start using it right away.
How CROSS APPLY works
The basic idea behind CROSS APPLY is to allow you to join two sets of data together.
If you understand how INNER JOIN works, you already understand CROSS APPLY.
The only difference is CROSS APPLY also allows you to join in a set of data in which that set of data is created or dependent on each row in the first set.
So basically what that means is that for a normal join you would, for example, join two tables that shared a common key.
I could join my customer table to my orders table to see all the orders for a particular customer like so:
SELECT * FROM orders o JOIN customers c ON o.customerid = c.customerid WHERE c.companyname = 'Around the Horn'
Notice how the join is operating on a key that exists independently in each table.
We could rewrite this to be exactly the same using the CROSS APPLY syntax instead like so:
SELECT * FROM orders o CROSS APPLY ( SELECT * FROM customers c WHERE o.customerid = c.customerid) AS c WHERE c.companyname = 'Around the Horn'
We can prove these results sets are exactly the same by using EXCEPT to make sure there are no rows in one set that aren’t in the other and then flipping it, like so:
SELECT * FROM orders o CROSS APPLY (SELECT * FROM customers c WHERE o.customerid = c.customerid) AS c WHERE c.companyname = 'Around the Horn' EXCEPT SELECT * FROM orders o JOIN customers c ON o.customerid = c.customerid WHERE c.companyname = 'Around the Horn'
Just run this exact query again swapping the SQL above the EXCEPT with the SQL below and make sure it has no results as well. If both of those queries have no results, then you know the results from each query are the same since EXCEPT will show any results that are in the top query but not in the bottom one.
So when is CROSS APPLY useful?
Remember how I said it can do more than a simple join? Joins are restricted to only joining two sets of data that could be queries independently of each other.
What if I said give me the three most recent orders for each customer?
Take a minute and think about how you would write that query. Go ahead and try to do it. I’ll wait.
There are a few ways to do it without using CROSS APPLY, but none of them are really very easy or perform very well.
Using CROSS APPLY it is simple though:
SELECT * FROM customers c CROSS APPLY ( SELECT TOP 3 o.orderdate , o.shipcity FROM orders o WHERE o.customerid = c.customerId order by o.orderdate desc ) as top3;
So CROSS APPLY is useful whenever you have some data that you would want to be able to join against, but are forced to do some kind of sub-query instead because the data you are trying to join is not going to map well against a single key.
The other instance in which CROSS APPLY will be useful is when you are doing a sub-select that has more than one value you would like to use in your final query.
For example if you were sub-selecting from an Order Details table to match up order ids that had a Quantity greater than 5, that sub-select would need to return exactly one column in order for you to use it in your where clause. If you wanted to use other columns from the sub-select, you would have to do another sub-select for each of these columns.
If you first try to rewrite the sub-select as a JOIN and find that you can’t, you may be able to write it as a CROSS APPLY.
How to know when to use CROSS APPLY
There isn’t a good solid rule you can use to identify when you should use a CROSS APPLY but having the knowledge of CROSS APPLY and how it works can help you when you are trying to tune queries and you are having a difficult time constructing one. It is another option you can try.
Here are some general guidelines of times when you might want to use CROSS APPLY:
- A query where the result set you want to JOIN against is in some way related to the data in the first set. (Example: one column in the first table tells you how many rows in the 2nd table to get)
- A query where you are doing a sub-query, but need more than one value from the sub-query
- Anywhere you are using a Common Table Expression (CTE) could possibly be rewritten as a CROSS APPLY
- A query that has a large set of data it is joining against and then filtering out. You can change it to a CROSS APPLY that does the filter in the CROSS APPLY statement.
- Any time you are trying to join against a table function (this is actually what CROSS APPLY was created for.)
As always, you can subscribe to this RSS feed to follow my posts on Making the Complex Simple. Feel free to check out ElegantCode.com where I post about the topic of writing elegant code about once a week. Also, you can follow me on twitter here.
Last time I wrote some code to allow paging of stored procedure results in SQL Server it sucked.
That was about 3-4 years ago.
I just wrote some code to do it again, and well… It still sucks.
Perhaps I’m doing it wrong, but if you are looking for, “how to implement paging in sql server”, or “how to page sql server stored procedure results”, or “sql server paging”, I’ll give you my best answer for it at this point.
How I roll with CTEs
CTEs or Common Table Expressions, are a pretty nice feature of SQL Server. They are a bit complex, but basically they let you create a query and treat it as a table to select from. They also allow recursive calls to themselves, which let you do cool things like query hierarchies of data.
This link offers a pretty good explanation of CTEs.
Anyway, CTEs come in handy combined with the ROW_NUMBER() function to add the ability to page data from an existing stored procedure.
The basic idea here is to modify the original query by wrapping it inside of a CTE that adds a RowNumber column.
Then we can select from that query the rows that are in the range we want.
Vegetables are good for you
Let’s look at an example:
SELECT name, goodnessfactor, color FROM vegetables WHERE color = 'purple' OR color = 'green'
Now let’s modify this query to be pageable given we have the first and last record we want. (first record = page number * rows per page, last record = page number + 1 * rows per page)
WITH pagedVegetables AS ( SELECT name, goodnessfactor, color, ROW_NUMBER() OVER (ORDER BY name) AS rownumber FROM vegetables WHERE color = 'purple' OR color='green' ) SELECT * FROM pagedVegetables WHERE rownumber > @firstRecord AND rownumber <= @lastRecord
Okay, so it is actually not that bad.
A couple of notes to help understand what happened here.
- ROW_NUMBER() requires an OVER clause which basically will determine how you calculate what the row number is. In this case, we are going to sort by name. So, the first alphabetical name will be row number 1, and so on.
- We’ve wrapped the query in a CTE and added an additional column so that we have a row number in our new virtual table. (Think of the CTE as creating a temporary view for us.)
- We are selecting everything out of the original query that is in the range of rows we want. (Depending on how you define your range you may use different equality operators. If you range is inclusive, you would do <= and >= , and if it is exclusive it would be < and >. In this case we are both.)
A word on total row count
If you need the total number of rows so that you can pass that back to the pager in your code, then you will probably have to select the results from the CTE into a temporary table and return two result sets from it.
I couldn’t really find a way to select the total rows and the data from a single CTE expression. If you know a way, please let me know.