SQL Server Indexing: Part 2 – Reading Query Plans

Welcome to part 2 of SQL Server Indexing, in part 1, we learned the basics of indexes and the difference between clustered and non-clustered indexes. In this part, we’re going to learn how to read and interpret query plans, which will help us decide where to put indexes (or how to restructure our queries).

Seeing Your Query Plans

There are 2 types of query plans, estimated and actual, and a few ways to access them

Estimated Execution Plan

Estimated Execution Plan will show you, without running the query, what SQL Server thinks it will do to retrieve the data your query is asking for. The ways to see it are:

  • In SSMS, highlight the query and press Ctrl+L
  • In SSMS, highlight the query and click the “Display Estimated Execution Plan” in the menu (it looks like 3 connected boxes and a triangle)
  • Run SET SHOWPLAN_TEXT ON or SET SHOWPLAN_XML ON and then run your query. It will show the plan and not run the query. Remember to set it back to OFF when you want to see actual query results
Display estimated execution plan button in SSMS

Actual Execution Plan

The actual execution plan will show you the plan that was used after the query has been run. The difference here is that it will be the real plan used by SQL Server as opposed to the one SQL Server thought it was going to use. Most of the time, however, the Estimated and Actual query plans will be the same.

To see the plan you can:

  • Click “Include Actual Execution Plan” in SSMS
  • Run SET STATISTICS PROFILE ON or SET STATISTICS XML ON and then run the query
Show actual execution plan in SSMS

Set Up FlowerShop

To demonstrate, I’m going to use the same FlowerShop database from the Memory Optimization tutorial a few articles back. Lets use the following query:

USE FlowerStore

SELECT
	s.Id AS SaleId,
	li.Quantity AS Quantity,
	f.FlowerName AS Flowers,
	li.Quantity * f.UnitPrice AS LineItemPrice,
	s.SaleDate AS SaleDate
FROM Sale s
JOIN LineItem li
ON li.SaleId = s.Id
JOIN Flower f
ON f.Id = li.FlowerId
WHERE s.SaleDate BETWEEN '2019-03-01' AND '2019-04-01'
ORDER BY s.SaleDate, s.Id;

Highlight the query and use one of the techniques above to look at the query plan. This is what mine looks like:

These are some impressive hieroglyphics, no? Believe it or not, they actually do mean something, and this is usually where you will start when doing performance tuning of often used queries. Let’s talk about what this plan is telling us and how to read it.

Side Note: If your estimated and actual execution plans are very different, you may need to update the statistics. You can run EXEC sp_updatestats

Read Right to Left for Data

Most people will tell you to read a query plan from right to left. This is true if you want a “data oriented” idea of the query plan. SQL Server prepares the data the way it is displayed from right to left, ie, the operations are happening in this sequence.

Read Left to Right for Logic

What you will often not hear about query plans, is that reading them from left to right is how you understand the logic of a query.

The Arrows

First, look at the arrows flowing from one icon to the next. This is representing the flow of data as SQL Server grabs it. Notice that the arrows vary in thickness. The thickness of these arrows represents the relative size of the data being passed from one operation to the other. If you hover your mouse over the the arrows, you can see information like Estimated Number of Rows, Estimated Row Size, and Estimated Data Size passed between operations:

Generally speaking, the further to the right you can push the thicker arrows, the faster your query will run. Logically, this means that the sooner we filter the data operated on, the better. When operations have fewer records to run against, the query will run faster. This is the most basic and generic concept of query tuning. Push data to the right as best you can.

The Icons

So, what do these icons mean? They represent the different kinds of operations taken against the data by SQL Server.

Many are self explanatory in what they “do” but the challenge is often figuring out why some operations are being used over others. When attempting to tune your queries or add indexes, it’s important to try to figure out which operations need to be tweaked and how you can do so. To do that, we first have to understand what these operations are.

Before we talk about the specific icons, let’s talk about the data you see when you hold your mouse over them. Mouseover the right-most icon (“Clustered Index Scan (Clustered)”) and take a look at the tool-tip. What do these things mean?

Most of this data is self explanatory or irrelevant. We’re going to talk about a few key metrics.

Anything With “Estimated Cost”: Ready to hear something that will sound like bad advice? Good: Ignore the estimated cost metrics

Why? Because first of all, when you run the query and look at the actual plan, you won’t see any “actual costs.” I know it seems counter intuitive, but try not to focus on these metrics, we’re going to focus on more relevant things when it comes to query optimization (and for now, just indexes). Everything else will work itself out.

Number of Rows Read: This one is obvious but it’s definitely something we’ll want to take note of. Sometimes we can shift the larger number over to more operations on the right side, which will make our queries faster because they will be filtered sooner.

Number of Executions: Again, this is kind of a gimme but it’s how many times the operation executes. Obviously, the lower the better.

Rebinds and Rewinds: These only apply to loops in the query plan resulting from joins, otherwise you will see that they are set to 0. You will also notice that rewinds + rebinds = number of rows.

The difference refers to whether the data retrieved in an iteration was collected via the same data as the previous iteration (a rewind) or new data (rebind).

Think of it as if each outer iteration of the loop is its own query. If I want all line items from order with Id of 1 and 2 then I am conceptually SELECTing all line items WHERE OrderId = 1. Say I got three rows back. The first row I grab from this query would represent a “rebind” because it is the first time I’m grabbing the data. But when I grab the second and third row, it’s a “rewind” because I’m still grabbing the line items with an order Id of 1.

Once I’ve gotten all that data (“I” meaning SQL Server trying to retrieve the data you asked for), I will now SELECT the line items with order Id of 2. In this case, the first record I get back will be via a rebind because even though it’s still ultimately part of the same result set, it’s a new result from the “outer” part of the loop. If this doesn’t make sense, please comment and I’ll try to explain better.

Now that covers the important metrics, lets talk about the different icons. From right to left in our query plan:

Clustered Index Scan: In short, this means that all rows of an index are being read. Remember the clustered index is usually the table’s primary key, so SQL Server is running through the whole Id column.

Compute Scalar: This icon represents the operation of making one value out of another. In the case of this query plan, this is the part responsible for the li.Quantity * f.UnitPrice part.

Hash Match: This operation is mostly used to join tables together. This operator is usually implemented when SQL Server must join two large, un-ordered data sets, whereas the nested loop will be used when small data sets are slung around.

Sort: Sorts are exactly what they say, the operation putting the data in the order you asked for. This is an expensive operation, and often may be the most expensive. In a high-use, performance-critical query, make sure you need the sorts you have.

Other Icons

Spools: These are common operators when JOINs are part of your query. Spools can get tricky because they are essentially temp tables that store data for later use in the query. Usually, SQL Server will implement spools if your indexes are lacking.

Lookups: Key lookups happen when your query has to get additional data from data it pulled from a non-clustered index. Remember last time when we talked about non-clustered indexes how you could INCLUDE certain columns? Key lookups happen when the query is asking for columns that could be found in that non-clustered index, but which weren’t added. We can make them disappear by adding them to the non clustered-index.

Table Scan: This is what it’s called when the query reads all the data pages of a table to find the rows belonging to the result set. Scans get a bad rap, and they can be a sign of poor design, but they are often not the boogeymen people make them out to be. Sometimes a table scan is necessary.

Too much scanning can squeeze an application’s ability to scale, however, and the query strategy needs to be revisited. If a database has grown exponentially since its original inception, but these scans are still necessary, you may want to think about partitioning your data, which I will discuss in another article.

Coming Up

This article is a bit longer than I wanted it to be, and we’ve certainly not covered everything there is to know about the query plan. This article series, however, is aiming to explain how to best employ indexes, and for this goal, I believe we’ve covered enough about execution plans to move on.

In the next article, we’ll finish our series with implementation and testing strategies. After that, you will (hopefully) be able to know whether or not an index is the answer to your performance problems. After that, we will either take a quick break from databases, or we’ll double down and talk about more optimization strategies. I haven’t decided yet hmmmmm.


Posted

in

by