Performance Tuning Series – Writing Efficient SQL

Welcome back to our Performance Tuning Series. In the previous post, we discussed indexing strategies, a key component of optimal query performance. In this post we'll continue the series by looking at how to write efficient SQL and squeeze even more performance out of your SQL Server queries.

Optimizing SQL queries is at the heart of database performance. Even with the best hardware and properly indexed tables, poorly written queries can slow down your SQL Server. Efficient SQL writing isn’t just about getting the right results; it’s about getting them quickly and with minimal resource consumption. Here’s how you can write SQL queries that are both efficient and effective.

Key Principles of Query Optimization

  1. Understand SQL Server Execution Plans:
    • An execution plan is a roadmap of how SQL Server will execute a query. It shows the steps taken to retrieve the data, including the order of operations, the use of indexes, and the estimated cost of each operation.
    • Use the “Include Actual Execution Plan” feature in SQL Server Management Studio (SSMS) to analyze how SQL Server processes your queries. Look for signs of inefficiency such as table scans (where the server reads every row in a table) and key lookups (which occur when a non-clustered index doesn’t cover all the columns needed by the query).
    • Pay attention to the “cost” associated with each step in the execution plan. High-cost operations are potential areas for optimization, such as converting table scans into index seeks. A short sketch of capturing the actual plan in T-SQL appears after this list.
  2. Try to avoid SELECT *:
    • While it’s tempting to use SELECT * to fetch all columns from a table, it can lead to unnecessary data retrieval, increased IO, and slower performance, especially in tables with many columns or large data types.
    • Instead, specify only the columns you need in the SELECT statement. This reduces the amount of data transferred and processed, leading to faster query execution and reduced network traffic.
    • Specifying columns also makes your queries easier to maintain. If the table schema changes (e.g., a column is added or removed), your query is less likely to break or produce unexpected results.
  3. Use Proper JOINs:
    • Understanding and using the correct type of JOINs in your queries is crucial for performance. The most commonly used JOINs are INNER JOIN, LEFT JOIN, and RIGHT JOIN, each serving different purposes.
    • INNER JOIN: Returns only the rows where there is a match in both joined tables. This is the most efficient JOIN type in most cases.
    • LEFT JOIN (or LEFT OUTER JOIN): Returns all rows from the left table and matched rows from the right table, with NULLs for non-matching rows. While useful, LEFT JOINs can be more resource-intensive, especially if the right table is large and has many unmatched rows.
    • RIGHT JOIN: Similar to LEFT JOIN, but returns all rows from the right table and matched rows from the left. It is less commonly used and can often be rewritten as a LEFT JOIN by reversing the order of the tables.
    • CROSS JOIN: Returns the Cartesian product of the two tables, which means every row in the first table is combined with every row in the second table. This JOIN type is usually not recommended unless you explicitly need all possible combinations, as it can result in an extremely large dataset and high resource usage.
    • Ensure that you have appropriate indexes on the columns used in the JOIN conditions to prevent full table scans; a brief example appears after this list.
  4. Leverage Query Hints Carefully:
    • SQL Server provides query hints that can override the default behavior of the query optimizer. While these can be powerful tools in specific scenarios, they should be used cautiously, as they can sometimes force suboptimal execution plans.
    • FORCESEEK: Forces SQL Server to use an index seek instead of a scan. This hint can be helpful when you know an index exists and should be used, but SQL Server is choosing a scan instead.
    • NOLOCK: Allows reading uncommitted data (dirty reads). This hint can improve performance in scenarios where slight inaccuracies are acceptable, but it should be used with caution as it can lead to inconsistent results.
    • MAXDOP: Controls the maximum degree of parallelism for a query, limiting how many CPU cores can be used. This can be useful for balancing query performance with overall server workload.
    • Use query hints as a last resort, after ensuring that the query structure and indexes are already optimized; the hint syntax is sketched in an example after this list.
  5. Optimize Subqueries and CTEs:
    • Subqueries and Common Table Expressions (CTEs) are powerful tools in SQL but can be performance traps if not used wisely.
    • Subqueries: These can be used in SELECT, WHERE, or FROM clauses. While they are convenient, subqueries that reference columns from the outer query (known as correlated subqueries) can significantly slow down performance, because they are typically executed once for every row in the outer query. Whenever possible, refactor correlated subqueries into JOINs, or use EXISTS instead of IN to check for the existence of rows.
    • CTEs: CTEs can make complex queries easier to read and maintain, but they are not inherently optimized. Treat CTEs as you would any other subquery: be mindful of their performance impact, especially in recursive CTEs, which can be resource-intensive.
    • For both subqueries and CTEs, consider whether the same logic could be implemented using derived tables, JOINs, or even temporary tables, which can sometimes offer better performance. A sketch of the EXISTS refactor appears after this list.
  6. Parameterize Queries:
    • Parameterized queries allow you to reuse execution plans, improving performance and reducing the risk of SQL injection attacks.
    • Avoid dynamic SQL that concatenates hard-coded values into the query string. It makes queries more prone to SQL injection, and because every distinct string compiles its own execution plan, plans are rarely reused and the plan cache becomes bloated, leading to slower performance.
    • Use parameters to pass values into your queries. SQL Server can cache the execution plan for parameterized queries and reuse it on subsequent executions; an sp_executesql sketch appears after this list.
  7. Manage Temporary Objects Wisely:
    • Temporary tables, table variables, and Common Table Expressions (CTEs) are often used to store intermediate results within a query. Each has its performance implications, and choosing the right one is critical.
    • Temporary Tables: These are physically created in the TempDB database and can be indexed, making them suitable for storing large result sets or performing complex operations. However, they can be slower to create and access due to disk IO.
    • Table Variables: Despite the common belief that they live purely in memory, table variables are also backed by TempDB. They have lower overhead and are a good choice for small, temporary datasets, but they lack statistics, which can lead to suboptimal execution plans when row counts grow.
    • CTEs: As mentioned earlier, CTEs are great for improving query readability but should be used carefully. They do not persist results, meaning the underlying query is re-executed every time the CTE is referenced, potentially leading to performance issues with complex CTEs.
    • Choose the appropriate temporary object based on the size of your dataset and the complexity of your operations, and monitor the performance impact of these objects using tools like SQL Profiler or execution plans. A short comparison sketch appears after this list.
  8. Optimize Sorting and Filtering:
    • Sorting and filtering data can be resource-intensive, especially on large datasets. To optimize these operations:
    • Indexes: Ensure that your indexes support the sort and filter operations. For example, if you frequently order by a particular column, consider creating an index on that column.
    • WHERE Clauses: Be mindful of the conditions in your WHERE clauses. Use indexes to avoid full table scans, and consider using EXISTS instead of IN when checking for the existence of rows, as it often provides better performance.
    • GROUP BY and HAVING: GROUP BY can be expensive, especially on large datasets. Ensure that the columns used in GROUP BY are indexed, and filter rows with WHERE before the aggregation whenever possible, reserving HAVING for conditions on aggregated values, since HAVING is applied only after the aggregation and adds extra processing.
  9. Limit the Use of Cursors:
    • Cursors allow you to iterate through rows one at a time, but they can be slow and resource-intensive, as they process each row individually rather than in sets.
    • Whenever possible, replace cursors with set-based operations, which are far more efficient in SQL Server. For example, use a single UPDATE statement to modify multiple rows instead of iterating through rows with a cursor; see the sketch after this list.
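
To make a few of these principles concrete, the short sketches below use hypothetical tables such as dbo.Orders, dbo.Customers, and dbo.Products; treat them as illustrations to adapt, not drop-in scripts. First, for point 1, you can capture the actual execution plan from T-SQL rather than the SSMS toolbar button:

SET STATISTICS XML ON;   -- returns the actual plan as XML alongside the results

SELECT OrderID, OrderDate, TotalDue
FROM dbo.Orders          -- hypothetical table
WHERE CustomerID = 42;

SET STATISTICS XML OFF;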
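
For point 3, an INNER JOIN performs best when the join column is indexed, so SQL Server can seek rather than scan the child table (the index and table names below are hypothetical):

-- Index the column used in the JOIN condition so the join can seek instead of scan
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
ON dbo.Orders (CustomerID)
INCLUDE (OrderDate, TotalDue);

SELECT c.CustomerID, c.CustomerName, o.OrderDate, o.TotalDue
FROM dbo.Customers AS c
INNER JOIN dbo.Orders AS o
    ON o.CustomerID = c.CustomerID
WHERE c.Region = N'West';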
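
For point 4, table hints go in a WITH (...) clause and query hints in an OPTION (...) clause. The statements below only illustrate the syntax; whether these hints actually help depends entirely on your workload:

-- Force an index seek on the table and cap the query at one CPU core
SELECT o.OrderID, o.OrderDate
FROM dbo.Orders AS o WITH (FORCESEEK)
WHERE o.CustomerID = 42
OPTION (MAXDOP 1);

-- NOLOCK allows dirty reads; acceptable only when approximate results are fine
SELECT COUNT(*)
FROM dbo.Orders WITH (NOLOCK);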
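
For point 5, here is a sketch of replacing an IN subquery with an EXISTS check against the same hypothetical tables:

-- Before: IN with a subquery over a potentially large table
SELECT c.CustomerID, c.CustomerName
FROM dbo.Customers AS c
WHERE c.CustomerID IN (SELECT o.CustomerID FROM dbo.Orders AS o WHERE o.TotalDue > 1000);

-- After: EXISTS can stop probing as soon as one matching row is found
SELECT c.CustomerID, c.CustomerName
FROM dbo.Customers AS c
WHERE EXISTS (SELECT 1
              FROM dbo.Orders AS o
              WHERE o.CustomerID = c.CustomerID
                AND o.TotalDue > 1000);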
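
For point 6, when dynamic SQL is unavoidable, sp_executesql keeps it parameterized so the plan can be cached and reused and values are never concatenated into the string:

-- One cached plan is reused for any @MinTotal value passed in
DECLARE @sql nvarchar(max) = N'
    SELECT OrderID, OrderDate, TotalDue
    FROM dbo.Orders
    WHERE TotalDue >= @MinTotal;';

EXEC sys.sp_executesql
    @sql,
    N'@MinTotal money',
    @MinTotal = 500;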
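
For point 7, a rough comparison of the two temporary objects, again with hypothetical data: a temporary table for a larger intermediate result set, and a table variable for a handful of rows:

-- Temporary table: created in TempDB, supports indexes and statistics; better for larger sets
CREATE TABLE #RecentOrders
(
    OrderID   int PRIMARY KEY,
    OrderDate datetime2 NOT NULL
);

INSERT INTO #RecentOrders (OrderID, OrderDate)
SELECT OrderID, OrderDate
FROM dbo.Orders
WHERE OrderDate >= DATEADD(DAY, -30, SYSDATETIME());

-- Table variable: fine for a small, fixed list, but it has no statistics
DECLARE @StatusCodes TABLE (StatusCode tinyint PRIMARY KEY);
INSERT INTO @StatusCodes (StatusCode) VALUES (1), (2), (5);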
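
Finally, for point 9, here is a cursor-based update next to its set-based equivalent, using a hypothetical pricing table; the single UPDATE does the same work with far less overhead:

-- Cursor version: touches one row at a time
DECLARE @ProductID int;
DECLARE price_cursor CURSOR FOR
    SELECT ProductID FROM dbo.Products WHERE Discontinued = 0;
OPEN price_cursor;
FETCH NEXT FROM price_cursor INTO @ProductID;
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE dbo.Products SET ListPrice = ListPrice * 1.05 WHERE ProductID = @ProductID;
    FETCH NEXT FROM price_cursor INTO @ProductID;
END;
CLOSE price_cursor;
DEALLOCATE price_cursor;

-- Set-based version: one statement, processed as a set
UPDATE dbo.Products
SET ListPrice = ListPrice * 1.05
WHERE Discontinued = 0;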

Conclusion

Optimizing SQL queries is an essential skill for any SQL Server professional. By writing efficient queries, you can significantly reduce the load on your database, speed up data retrieval, and improve the overall performance of your applications. Following these best practices for query optimization will help ensure that your SQL Server databases operate at peak efficiency, providing fast and reliable access to the data your business depends on.

The Misconception of Slow Queries in SQL Server

As a SQL Server DBA, one of my regular tasks is to monitor and optimize the performance of SQL Server databases. A common tool in my arsenal is a script that retrieves the top 10 slowest queries. However, there’s a crucial misconception that needs to be addressed: just because a query appears in this list doesn’t necessarily mean it’s slow or inefficient.

Understanding the Top 10 Slowest Queries

When we ask SQL Server to provide the top 10 slowest queries, it’s essential to understand what we’re actually requesting. The server is simply returning the queries that have the longest duration or highest resource usage among those that were executed during the monitoring period. This does not automatically imply that these queries are poorly written or unoptimized.
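
The exact script varies from DBA to DBA, but a typical version of it (a sketch, not necessarily the script I run verbatim) pulls from the sys.dm_exec_query_stats DMV and orders by total elapsed time:

-- Top 10 statements by cumulative elapsed time since the plan was cached
SELECT TOP (10)
    qs.execution_count,
    qs.total_elapsed_time,
    qs.total_elapsed_time / qs.execution_count AS avg_elapsed_time,
    SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
        ((CASE qs.statement_end_offset
              WHEN -1 THEN DATALENGTH(st.text)
              ELSE qs.statement_end_offset
          END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_elapsed_time DESC;

Note that this ranks by cumulative time, which is exactly why a fast query executed millions of times can outrank a genuinely slow one.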

Why Optimal Queries Can Appear in the Top 10

There are several reasons why perfectly optimized queries might show up in your top 10 slowest queries list:

  1. High Frequency of Execution: A query that is executed very frequently may have a cumulative duration that places it in the top 10, even if each individual execution is fast.
  2. Data Volume: Queries that operate on large datasets may naturally take longer to execute, even if they are optimized for efficiency.
  3. System Load: External factors such as system load, network latency, or resource contention can temporarily increase the execution time of queries.
  4. Nature of the Task: Some queries are inherently time-consuming due to the complexity of the task they perform, such as complex joins, aggregations, or calculations.

Interpreting the Results

When you identify a query in the top 10 slowest list, it’s important to analyze it in context. Consider the following:

  • Execution Plan: Review the execution plan to check for any inefficiencies or areas for improvement.
  • Frequency of Execution: Determine how often the query is executed and whether its cumulative impact is significant.
  • Data Volume: Assess whether the query is handling large volumes of data and if there are ways to reduce the dataset size.
  • Resource Usage: Look at the resources the query is consuming and explore ways to optimize resource utilization.
  • Comparison with Baselines: Compare the query’s performance with historical baselines to identify any anomalies or trends.

In summary, appearing in the top 10 slowest queries list doesn’t automatically condemn a query as slow or inefficient. It’s a starting point for further investigation and analysis. As a SQL Server DBA, my role is to dig deeper, understand the context, and make informed decisions about optimization. By doing so, we can ensure that our databases run smoothly and efficiently, supporting the needs of our organizations.

Correct SQL Server TempDB Spills in Query Plans Caused by Outdated Statistics

Statistics are an integral part of SQL Server and query performance. In short, the query optimizer uses statistics to create query plans that improve the overall performance of the queries being run. Each statistics object is created on a list of one or more table columns and includes a histogram displaying the distribution of values in the first column. The histogram can have at most 200 steps, regardless of the number of rows in the column or index.

In this post we’ll take a look at one specific performance issue that you might find in an execution plan of a query. If you’ve ever noticed the following warning, then this post is for you:

[Screenshot: the TempDB spill warning on a query plan operator]

Within the AdventureWorks2014 DB, I’ll use the following query for my example:

SELECT BusinessEntityID, FirstName, LastName, EmailPromotion
FROM [AdventureWorks2014].[Person].[Person]
WHERE EmailPromotion > 0
ORDER BY LastName

Looking at this query I can already tell that, without a supporting index, SQL Server will have to scan the table, so I’ll go ahead and add a covering index:

CREATE NONCLUSTERED INDEX [IX_Person_EmailPromotion_INCLUDES]
ON [Person].[Person] ([EmailPromotion])
INCLUDE ([BusinessEntityID],[FirstName],[LastName])

When the index above was created, statistics for it were automatically created as well. Since then, I’ve added a few thousand rows to the Person table, so those statistics no longer reflect the current data distribution.
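
If you want to see how stale those statistics have become, one way (not shown in the original post) is to look at the statistics object directly; the row counts and the last-updated date will still reflect the table as it looked when the index was created:

-- Histogram and header for the statistics auto-created with the covering index
DBCC SHOW_STATISTICS ('Person.Person', IX_Person_EmailPromotion_INCLUDES);

-- When each statistics object on the table was last updated
SELECT name, STATS_DATE(object_id, stats_id) AS last_updated
FROM sys.stats
WHERE object_id = OBJECT_ID('Person.Person');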

Let’s run the query and make sure the “Include Actual Execution Plan” button is selected.

[Screenshot: enabling “Include Actual Execution Plan” in SSMS before running the query]

After the query executes let’s take a look at the execution plan by clicking on the tab in the Results pane:

[Screenshot: the execution plan’s Sort operator showing a “spilled data to TempDB” warning]

These warnings were added in SQL Server Management Studio 2012, so if you’re using an older version you may not see them. The “spilled data to TempDB” warning means the query was not granted enough memory to finish the operation, so it spilled over into TempDB to complete. Reading from memory is much faster than reading from disk, and that is exactly the problem here: the sort used all of the memory it was granted and then had to push the rest of the work out to TempDB on disk.
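
The remainder of the post isn’t included in this excerpt, but given the title, the remedy is to bring the statistics back up to date so the optimizer estimates the row count correctly and requests a large enough memory grant. A minimal sketch of that fix, assuming the index from earlier, would be:

-- Refresh the statistics on the covering index, then re-run the query
-- and check the sort operator for the spill warning again
UPDATE STATISTICS [Person].[Person] [IX_Person_EmailPromotion_INCLUDES] WITH FULLSCAN;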
