Performance Tuning Series – Writing Efficient SQL

Welcome back to our Performance Tuning Series. In the previous post, we discussed indexing strategies, a key component of optimal query performance. We'll continue the series with a post on writing efficient SQL to squeeze even better performance out of your SQL Server queries.

Optimizing SQL queries is at the heart of database performance. Even with the best hardware and properly indexed tables, poorly written queries can slow down your SQL Server. Efficient SQL writing isn't just about getting the right results; it's about getting them quickly and with minimal resource consumption. Here's how you can write SQL queries that are both efficient and effective, with short example sketches for each principle following the list below.

Key Principles of Query Optimization

  1. Understand SQL Server Execution Plans:
    • An execution plan is a roadmap of how SQL Server will execute a query. It shows the steps taken to retrieve the data, including the order of operations, the use of indexes, and the estimated cost of each operation.
    • Use the “Include Actual Execution Plan” feature in SQL Server Management Studio (SSMS) to analyze how SQL Server processes your queries. Look for signs of inefficiency such as table scans (where the server reads every row in a table) and key lookups (which occur when a non-clustered index doesn’t cover all the columns needed by the query).
    • Pay attention to the “cost” associated with each step in the execution plan. High-cost operations are potential areas for optimization, such as converting table scans into index seeks.
  2. Try to avoid SELECT *:
    • While it’s tempting to use SELECT * to fetch all columns from a table, it can lead to unnecessary data retrieval, increased IO, and slower performance, especially in tables with many columns or large data types.
    • Instead, specify only the columns you need in the SELECT statement. This reduces the amount of data transferred and processed, leading to faster query execution and reduced network traffic.
    • Specifying columns also makes your queries easier to maintain. If the table schema changes (e.g., a column is added or removed), your query is less likely to break or produce unexpected results.
  3. Use Proper JOINs:
    • Understanding and using the correct type of JOINs in your queries is crucial for performance. The most commonly used JOINs are INNER JOIN, LEFT JOIN, and RIGHT JOIN, each serving different purposes.
    • INNER JOIN: Returns only the rows where there is a match in both joined tables. This is the most efficient JOIN type in most cases.
    • LEFT JOIN (or LEFT OUTER JOIN): Returns all rows from the left table and matched rows from the right table, with NULLs for non-matching rows. While useful, LEFT JOINs can be more resource-intensive, especially if the right table is large and has many unmatched rows.
    • RIGHT JOIN: Similar to LEFT JOIN, but returns all rows from the right table and matched rows from the left. It is less commonly used and can often be rewritten as a LEFT JOIN by reversing the order of the tables.
    • CROSS JOIN: Returns the Cartesian product of the two tables, which means every row in the first table is combined with every row in the second table. This JOIN type is usually not recommended unless you explicitly need all possible combinations, as it can result in an extremely large dataset and high resource usage.
    • Ensure that you have appropriate indexes on the columns used in the JOIN conditions to prevent full table scans.
  4. Leverage Query Hints Carefully:
    • SQL Server provides query hints that can override the default behavior of the query optimizer. While these can be powerful tools in specific scenarios, they should be used cautiously, as they can sometimes force suboptimal execution plans.
    • FORCESEEK: Forces SQL Server to use an index seek instead of a scan. This hint can be helpful when you know an index exists and should be used, but SQL Server is choosing a scan instead.
    • NOLOCK: Allows reading uncommitted data (dirty reads). This hint can improve performance in scenarios where slight inaccuracies are acceptable, but it should be used with caution as it can lead to inconsistent results.
    • MAXDOP: Controls the maximum degree of parallelism for a query, limiting how many CPU cores can be used. This can be useful for balancing query performance with overall server workload.
    • Use query hints as a last resort after ensuring that the query structure and indexes are optimized.
  5. Optimize Subqueries and CTEs:
    • Subqueries and Common Table Expressions (CTEs) are powerful tools in SQL but can be performance traps if not used wisely.
    • Subqueries: These can be used in SELECT, WHERE, or FROM clauses. While convenient, correlated subqueries (subqueries that reference columns from the outer query) can significantly slow down performance, as they are often executed once for every row in the outer query. Whenever possible, refactor correlated subqueries into JOINs, or use EXISTS instead of IN to check for the existence of rows.
    • CTEs: CTEs can make complex queries easier to read and maintain, but they are not inherently optimized. Treat CTEs as you would any other subquery: be mindful of their performance impact, especially in recursive CTEs, which can be resource-intensive.
    • For both subqueries and CTEs, consider whether the same logic could be implemented using derived tables, JOINs, or even temporary tables, which can sometimes offer better performance.
  6. Parameterize Queries:
    • Parameterized queries allow you to reuse execution plans, improving performance and reducing the risk of SQL injection attacks.
    • Avoid building queries as strings with hard-coded values (dynamic SQL). Dynamic SQL not only makes queries more prone to SQL injection but also hinders execution plan reuse, leading to slower performance.
    • Use parameters to pass values into your queries. SQL Server can cache the execution plan for parameterized queries, leading to faster subsequent executions.
  7. Manage Temporary Objects Wisely:
    • Temporary tables, table variables, and Common Table Expressions (CTEs) are often used to store intermediate results within a query. Each has its performance implications, and choosing the right one is critical.
    • Temporary Tables: These are physically created in the TempDB database and can be indexed, making them suitable for storing large result sets or performing complex operations. However, they can be slower to create and access due to disk IO.
    • Table Variables: Often assumed to live only in memory, table variables are actually created in TempDB as well. They have lower overhead for small datasets but lack statistics, which can lead to suboptimal execution plans. They are a good choice when dealing with small, temporary datasets.
    • CTEs: As mentioned earlier, CTEs are great for improving query readability but should be used carefully. They do not persist results, meaning the underlying query is re-executed every time the CTE is referenced, potentially leading to performance issues with complex CTEs.
    • Choose the appropriate temporary object based on the size of your dataset and the complexity of your operations. Monitor the performance impact of these objects using tools like SQL Profiler or execution plans.
  8. Optimize Sorting and Filtering:
    • Sorting and filtering data can be resource-intensive, especially on large datasets. To optimize these operations:
    • Indexes: Ensure that your indexes support the sort and filter operations. For example, if you frequently order by a particular column, consider creating an index on that column.
    • WHERE Clauses: Be mindful of the conditions in your WHERE clauses. Use indexes to avoid full table scans, and consider using EXISTS instead of IN when checking for the existence of rows, as it often provides better performance.
    • GROUP BY and HAVING: GROUP BY can be expensive, especially when used with large datasets. Ensure that the columns used in GROUP BY are indexed, and avoid using HAVING unless absolutely necessary, as it filters data after the aggregation, leading to additional processing.
  9. Limit the Use of Cursors:
    • Cursors allow you to iterate through rows one at a time, but they can be slow and resource-intensive, as they process each row individually rather than in sets.
    • Whenever possible, replace cursors with set-based operations, which are far more efficient in SQL Server. For example, use a single UPDATE statement to modify multiple rows instead of iterating through rows with a cursor.
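
To make these principles concrete, here are a few short sketches. They run against a hypothetical dbo.Orders table (OrderID, CustomerID, OrderDate, TotalAmount) and a dbo.Customers table (CustomerID, Name); substitute your own schema. For principle 1, alongside the graphical plan, you can ask SQL Server for IO and timing statistics while you tune:

-- Report IO and elapsed/CPU time for each statement in the session;
-- a useful companion to the graphical execution plan
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT CustomerID, TotalAmount
FROM dbo.Orders
WHERE OrderDate >= '2023-01-01';

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;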
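
For principle 2, the fix is as simple as naming only the columns you need:

-- Avoid: pulls every column, including ones the caller never uses
SELECT * FROM dbo.Orders;

-- Prefer: only the columns the query actually needs
SELECT OrderID, OrderDate, TotalAmount
FROM dbo.Orders;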
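
For principle 3, a sketch of an INNER JOIN with an index supporting the join column (the index name is illustrative):

-- Index the join predicate so SQL Server can seek instead of scan
CREATE INDEX IX_Orders_CustomerID ON dbo.Orders (CustomerID);

-- INNER JOIN returns only customers that actually have orders
SELECT c.Name, o.OrderID, o.TotalAmount
FROM dbo.Customers AS c
INNER JOIN dbo.Orders AS o
    ON o.CustomerID = c.CustomerID;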
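
For principle 4, hints attach either as table hints or through the OPTION clause; again, reach for these only after ruling out structural fixes:

-- FORCESEEK as a table hint; MAXDOP via the OPTION clause
SELECT o.OrderID, o.TotalAmount
FROM dbo.Orders AS o WITH (FORCESEEK)
WHERE o.CustomerID = 42
OPTION (MAXDOP 2);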
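
For principle 5, here is the classic refactor from IN with a subquery to EXISTS:

-- IN with a subquery asks "is my value in this list?"
SELECT c.CustomerID, c.Name
FROM dbo.Customers AS c
WHERE c.CustomerID IN (SELECT o.CustomerID FROM dbo.Orders AS o);

-- EXISTS states the intent (an existence check) directly and can stop
-- probing as soon as one matching row is found
SELECT c.CustomerID, c.Name
FROM dbo.Customers AS c
WHERE EXISTS (SELECT 1 FROM dbo.Orders AS o
              WHERE o.CustomerID = c.CustomerID);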
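
For principle 6, the parameterized, plan-reuse-friendly form of dynamic SQL in T-SQL goes through sp_executesql rather than concatenated strings:

-- Parameterized: the plan can be cached and the value is strongly typed
DECLARE @CustomerID INT = 42;

EXEC sp_executesql
    N'SELECT OrderID, OrderDate, TotalAmount
      FROM dbo.Orders
      WHERE CustomerID = @CustomerID',
    N'@CustomerID INT',
    @CustomerID = @CustomerID;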
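
For principle 7, the syntax difference between the two main temporary objects:

-- Temporary table: lives in TempDB, supports indexes and statistics;
-- better suited to larger intermediate result sets
CREATE TABLE #RecentOrders (
    OrderID INT PRIMARY KEY,
    TotalAmount DECIMAL(10, 2)
);

INSERT INTO #RecentOrders (OrderID, TotalAmount)
SELECT OrderID, TotalAmount
FROM dbo.Orders
WHERE OrderDate >= '2023-01-01';

-- Table variable: lighter weight, no statistics; fine for a handful of rows
DECLARE @TopCustomers TABLE (CustomerID INT PRIMARY KEY);

INSERT INTO @TopCustomers (CustomerID)
SELECT TOP (10) CustomerID
FROM dbo.Orders
GROUP BY CustomerID
ORDER BY SUM(TotalAmount) DESC;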
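
For principle 8, an index keyed on the filter column and sorted the way the query sorts (with included columns to cover the select list) lets SQL Server skip both the scan and the sort:

-- Hypothetical index supporting: WHERE CustomerID = ? ORDER BY OrderDate DESC
CREATE INDEX IX_Orders_CustomerID_OrderDate
ON dbo.Orders (CustomerID, OrderDate DESC)
INCLUDE (TotalAmount);

SELECT OrderDate, TotalAmount
FROM dbo.Orders
WHERE CustomerID = 42
ORDER BY OrderDate DESC;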
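
And for principle 9, the cursor-to-set-based rewrite usually collapses a whole loop into a single statement:

-- Instead of opening a cursor over dbo.Orders and running one UPDATE
-- per fetched row, update every qualifying row in one set-based statement
UPDATE dbo.Orders
SET TotalAmount = TotalAmount * 1.10   -- hypothetical 10% adjustment
WHERE OrderDate < '2023-01-01';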

Conclusion

Optimizing SQL queries is an essential skill for any SQL Server professional. By writing efficient queries, you can significantly reduce the load on your database, speed up data retrieval, and improve the overall performance of your applications. Following these best practices for query optimization will help ensure that your SQL Server databases operate at peak efficiency, providing fast and reliable access to the data your business depends on.

Using ChatGPT with SQL Server

If you're in the tech field, you've probably heard the buzz around ChatGPT and Artificial Intelligence.

ChatGPT is a pre-trained language model developed by OpenAI. It uses deep learning techniques, specifically a transformer neural network architecture, to generate human-like text. It can be fine-tuned on specific tasks such as language translation, question answering, and text summarization. ChatGPT can be used to generate a wide range of text, including conversation, news articles, stories, and more. Because it is pre-trained on a large corpus of text, it can generate text that is often indistinguishable from text written by a human.

To preview ChatGPT, shoot over to https://openai.com, create a login and you can try it for free (at the time this post was written).

For this post, since I’m a DBA, I want to test how it performs when I give it questions about SQL statements.

So let’s start! I asked the following question:

“How can ChatGPT be used to learn TSQL?”

And here’s a great response:

ChatGPT can be used to learn TSQL in several ways:

  1. You can use ChatGPT to generate TSQL code snippets and queries based on natural language prompts. For example, you can ask ChatGPT to "generate a TSQL query to select all columns from a table named 'customers'" and it will respond with the appropriate TSQL code.
  2. You can also use ChatGPT to explain TSQL concepts and provide examples. For example, you can ask ChatGPT to "explain how to use the JOIN clause in TSQL" and it will provide a detailed explanation and examples of how to use the JOIN clause in TSQL.
  3. You can also train ChatGPT model on TSQL text data and fine-tune the model for generating TSQL queries, this way you can generate more accurate and complex TSQL queries.
  4. You can use ChatGPT as a code completion tool to assist you in writing TSQL code.


Not a bad start. Let’s see if it can create a database.

“Can you write SQL code to create a SQL database named AI?”

Looks correct. Let's test it in SQL Server Management Studio.

OK, OK, that was too easy, ChatGPT. Let's see if it can create a table in our new database and insert some dummy data.

“Can you write SQL code to create a table called Orders in my AI database and populate with some test data?”
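
The generated code isn't reproduced here (the original post showed it as a screenshot), but judging from the columns referenced later in this post, it was along these lines. This is a reconstruction, not ChatGPT's verbatim output:

-- Reconstructed sketch; column names inferred from the queries later in
-- this post (CustomerID, OrderDate, TotalAmount)
CREATE TABLE dbo.Orders (
    OrderID     INT IDENTITY(1,1) PRIMARY KEY,
    CustomerID  INT,
    OrderDate   DATETIME,
    TotalAmount DECIMAL(10, 2)
);

INSERT INTO dbo.Orders (CustomerID, OrderDate, TotalAmount)
VALUES (1, '2023-01-15', 100.00),
       (2, '2023-02-20', 250.50),
       (1, '2023-03-05', 75.25);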

Let’s test the code.

Looks good. How about a basic SELECT statement?

“Can you write SQL code to give me all data from my Orders table?”

Again, looks good. It even goes as far as explaining the statement and giving an example that selects only specific rows. This is great for learning.

Let’s see if it can create an aggregate query.

“Can you write SQL code to give me the Customer that has the highest TotalAmount from the Orders table?”

From my question, I was hoping for something a little different, but I can't get mad at the results. The query uses LIMIT 1, which works in MySQL but doesn't exist in SQL Server, instead of SELECT TOP 1. It was really close, though, and it even gives me an example and explanation of how this would be used in a JOIN.

Let’s be more specific. If I change the question to specify SQL Server:

“Can you write SQL code to give me the Customer that has the highest TotalAmount from the Orders table in SQL Server?”

Again, not really what I was looking for, but it works. ChatGPT ended up writing more code than it needed and even wrapped the statement in a common table expression (CTE). I was looking for something more along the lines of this:

SELECT TOP 1 CustomerID, MAX(TotalAmount) as MaxAmount
FROM Orders
GROUP BY CustomerID
ORDER BY MaxAmount DESC

Either way, both statements work and produce the same results.

How about something a little more difficult, such as creating a partition? Partitions are heavily used in other database platforms, so I'm going to specify SQL Server again in this question.

“Can you write T-SQL code to partition my Orders table, OrderDate column by year on SQL Server?”

This was a little more challenging. It wrote the Partition Function and Partition Scheme statements correctly, but it added an OrderDateRange column as an integer and then tried to create a clustered index mixing the datetime OrderDate column with the int OrderDateRange column, so the end result is a failure.
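
For reference, a working version doesn't need the extra integer column at all. Here's a sketch using assumed boundary years and the PRIMARY filegroup (object names and dates are illustrative, not from ChatGPT's output):

-- Partition function and scheme on the datetime column itself
CREATE PARTITION FUNCTION pfOrderDateYearly (DATETIME)
AS RANGE RIGHT FOR VALUES ('2021-01-01', '2022-01-01', '2023-01-01');
GO

CREATE PARTITION SCHEME psOrderDateYearly
AS PARTITION pfOrderDateYearly ALL TO ([PRIMARY]);
GO

-- Rebuild the clustered index on the partition scheme, keyed on OrderDate;
-- no additional integer column is required
CREATE CLUSTERED INDEX CIX_Orders_OrderDate
ON dbo.Orders (OrderDate)
ON psOrderDateYearly (OrderDate);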

All in all, I think this is a great tool for learning basic (and even some advanced) SQL, but it still has some bugs to work out before it tries to replace me. 😉


Using Bookmarks in Power BI

Bookmarks in Power BI let you capture the currently configured view of a report page, including filters and the state of visuals, and later return to that state by simply selecting the saved bookmark.

You can also create a collection of bookmarks, arrange them in the order you want, and subsequently step through each bookmark in a presentation to highlight a series of insights, or the story you want to tell with your visuals and reports.

In this post we'll quickly go over how to create a few bookmarks and view them as a slideshow, if you will.

I’m going to use my March Madness Report I created in an earlier post. Once my report is opened in Power BI Desktop, I’m going to click on the View tab in the ribbon and select “Bookmarks Pane”

Bookmarks Pane

This should bring up a new Bookmarks pane inside PBI Desktop:

Bookmarks

Remember, bookmarks capture the current view of the report, so I'm going to start from the default view showing all data and name the bookmark "Home". Make sure all filters are set to show all data, then click Add in the Bookmarks pane. This creates a new bookmark named Bookmark 1. Click the ellipsis and select Rename to rename the bookmark appropriately.

Next, I like North Carolina, so I'm going to go to my Team Filter and choose North Carolina, which will show me data for only this team.

Team Filtered Power BI Report

In my bookmark pane, I’m going to click Add again and rename to North Carolina.

Next, I want to view data on North Carolina from 2000 to present so I’ll change the Year Filter.

Team and Year Filtered Power BI Report

In my bookmark pane, I’m going to click Add again and rename to North Carolina 2000-present.

Now, if I click on any of the bookmarks, it takes me to the view that was saved for each. This is a great way to present data in a meeting or conference so you don't have to change filters manually during the engagement.

We can also click the View button in the Bookmark pane to view a slideshow using the arrows at the bottom to navigate:

Creating a Dynamic Date Range Title in Power BI

Creating a dynamic title in Power BI helps present the data and lets viewers know what the data is filtered on. In this post I'll go over how to do this.

I have a sales report to which I'd like to add a title based on the Order Date Slicer. Currently, the title is the static text "Sales Report".

Sales Report

To create my dynamic title, I'll first need to create a measure from the table that holds my order date data. In this case, that table is FactInternetSales and the column is OrderDate.

To create a measure, click New Measure in the Power BI Desktop ribbon

Power BI Desktop Ribbon

Next, you’ll see a window where you can type code. In this example, I’ll use the following DAX


Order Date Title = "Sales For " &
MIN ( FactInternetSales[OrderDate] ) & " to "
& MAX ( FactInternetSales[OrderDate] )

Let’s walk through this real quick.

The first line (Order Date Title = "Sales For " &) names the measure and adds the beginning text for the title.
The second line (MIN ( FactInternetSales[OrderDate] ) & " to ") finds the minimum order date from FactInternetSales.OrderDate and then adds the "to" text.
The last line (MAX ( FactInternetSales[OrderDate] )) finds the maximum order date from FactInternetSales.OrderDate.

This one was pretty easy. Once I've typed my DAX, I hit the checkmark to make sure there are no errors and then click off the formula bar.

DAX

Our measure has been created! Let's go back and find it under FactInternetSales in the Fields pane.

Power BI Fields

Next, let’s click on the Card Visualization and move and size it appropriately to fit in our title space.

Card Visualization

While the card is highlighted, click on the new measure from the Fields pane and it will populate the card with the measure we created.

The only thing left to do is format the title and we’re all set! If we change the Order Date Slicer, you’ll notice the title changes with the date. See live example at the beginning of this post.

March Madness Power BI Dashboard

At the beginning of the year I set a goal to learn something new. I’ve always loved business intelligence and bringing data to life in the form of dashboards and charts so for the 1st half of the year I wanted to focus on Microsoft’s Power BI. I’m not going to explain what Power BI is, but if you want to read up on it go here: https://powerbi.microsoft.com/en-us/

This post is just going to show off my dashboard. See live example above.

I'm a huge sports fan and the best time of the year happens to fall in March. Besides my birthday being in March, it's also March Madness. Hours and hours of basketball. I could have used AdventureWorks for my dataset, but I wanted to use something I'm interested in. I found some data containing every NCAA tournament game result since 1985 (when the tournament was expanded to the 64-team bracket). The dataset contains the year, round (1-6), seed of the teams (1-16), region (1-4) and the scores. Perfect. Let's use this to create a dashboard.

There’s not a ton of data, but I used what I could and tried to answer some questions around wins and upsets. Here’s a screenshot of the final product:

March Madness Power BI Dashboard

You can see Wins By Team (Duke with 93, North Carolina with 78, etc.), Wins by Seed, National Championships, and Upsets vs Wins by Year. You can also see that a total of 2142 games have been played with 199 different teams in the tournament.

This was really fun and answers a lot of the questions I had in my head while designing. The top left corner also has slicers to help filter the data. For example, if I wanted to see only the data for 2015, I could change the Year slicer to 2015 and it would update all my visualizations:

March Madness Power BI Dashboard filtered by year 2015

You can see that Duke won the National Championship from the National Championships visualization. If you hover over the Wins and Upsets visualization, you’ll see there were 30 upsets out of 63 games.

Let’s say I want to view data for a certain Team. Let’s choose Alabama Crimson Tide. If I change the Team slicer to Alabama I can see some data based around this team.

March Madness Power BI Dashboard filtered by team Alabama Crimson Tide

Alabama has won 19 NCAA tournament games, 0 national championships, has been a 5 or 7 seed 21% of the time and they’ve had a few upsets along the way. Not bad for a football school.

What about data for the National Championship game? I can change the Round slicer to 6, which is the National Championship round and view the data this way.

March Madness Power BI Dashboard filtered by Championship game

I can see that out of 34 games, only 16 different teams have made the National Championship. Duke leads the way with 6, followed by North Carolina and Connecticut with 4. The 1 seed has played in this game 59% of the time, and there were upsets in 1988, 1990, 1997, 2003, 2006, and 2016.

We can also click on the visualizations themselves to view data. For example, if we reset our slicers to show all data and click on the #1 seed in the Wins By Seed Donut Chart we see the following:

March Madness Power BI Dashboard filtered by #1 seed

We can see that the #1 seed has played in 419 games with a total of 41 different teams. Duke has won 51 games as the #1 seed while North Carolina has won 46. Duke has also won the National Championship 4 times as the #1 seed and in 1999 the #1 seed won 17 games which is the highest.

Really cool stuff. I loved working on this project and working with this data.

SQL Server Management Studio 2015

Microsoft SQL Server Management Studio is a product I use throughout the day, every day. I've tried using other environments to access and manage SQL Server, but SSMS is what I learned on and what I always go back to. The problem with SSMS was that you couldn't download it individually. You always had to have a licensed copy of SQL Server or install SQL Server Express with Tools to get it... until now!

Microsoft has finally released a standalone download of SSMS. This release supports SQL Server 2005 through SQL Server 2016. It also provides the greatest level of support when working with Azure.

Some enhancements include:

  • New SSMS Installer – SSMS can now be installed with a lightweight stand-alone web installer.
  • SSMS Updates – Receive notification within SSMS when new updates are available, and choose to install them at your convenience.
  • Enhanced SSMS support for Azure SQL Database – Several fixes and enhancements, including expanded SQL Server Management Objects (SMO) coverage, and an updated Import/Export wizard.

Let’s step through the install:

Download the SSMS-Web-Setup.exe from here.

Start the install, agree to the license terms, and click Install.

SSMS 2015 SQL Freelancer


It should take a few minutes, but once this completes restart your computer and that’s it! Easy install.

SSMS 2015 SQL Freelancer 2


Passing Multiple Values into a Variable

Passing multiple values into a variable is a little more difficult than it should be. In other languages you can use structures such as lists or arrays, but SQL makes it a bit more complicated. To show you what I mean, let's look at an example.

First, let’s create a Demo table:

CREATE TABLE [dbo].[Demo](
    [ID] [int] NULL,
    [Name] [varchar](50) NULL,
    [Address] [varchar](50) NULL,
    [State] [varchar](50) NULL,
    [Zip] [int] NULL
)

Next, populate it with some data:

INSERT INTO [dbo].[Demo]
VALUES (1, 'Brady', '123 Main Street', 'TN', 12345)

INSERT INTO [dbo].[Demo]
VALUES (2, 'Tommy', '124 Main Street', 'TN', 12345)

INSERT INTO [dbo].[Demo]
VALUES (3, 'Jonny', '125 Main Street', 'TN', 12345)

Now that we have some data, let’s try a query using variables. I want to define a variable on the column ID.

DECLARE @MultipleValue varchar(200)
SET @MultipleValue = '1,2'

SELECT * FROM Demo WHERE ID IN (@MultipleValue)

After running this query, I get 0 results and an error:

Msg 245, Level 16, State 1, Line 24
Conversion failed when converting the varchar value '1,2' to data type int.

Why? I know IDs 1 and 2 are in the table, but SQL treats this variable as one single string. So unless the value 1,2 appears in a single ID column entry, it will show 0 results.

One way to get around this is to use a UDF (user-defined function). In this function, we're going to convert the comma-separated values (1,2) into a table, then query from that.

CREATE FUNCTION [dbo].[MultipleValues] (@InStr VARCHAR(MAX))
RETURNS @TempTable TABLE
(id INT NOT NULL)
AS
BEGIN

-- Normalize the input so it ends with a single trailing comma,
-- letting the loop below pick up the final value
SET @InStr = REPLACE(@InStr + ',', ',,', ',')
DECLARE @SP INT
DECLARE @Value VARCHAR(1000)

-- Slice off one comma-delimited value per pass
WHILE PATINDEX('%,%', @InStr) <> 0
BEGIN
SELECT @SP = PATINDEX('%,%', @InStr) -- position of the next comma
SELECT @Value = LEFT(@InStr, @SP - 1) -- value before the comma
SELECT @InStr = STUFF(@InStr, 1, @SP, '') -- drop it from the string
INSERT INTO @TempTable(id) VALUES (@Value)
END
RETURN
END
GO

Now that we have a UDF, let’s use this in the query:

DECLARE @MultipleValue varchar(200)
SET @MultipleValue = '1,2'

SELECT * FROM Demo WHERE ID IN (SELECT * FROM dbo.MultipleValues(@MultipleValue))

Ta da!

We now have two results. ID 1 and 2:

Passing multiple values into a variable
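
One last note: on SQL Server 2016 and later, the built-in STRING_SPLIT function does the same job without a hand-rolled UDF:

-- STRING_SPLIT (SQL Server 2016+) returns one row per delimited value,
-- in a column named value
DECLARE @MultipleValue VARCHAR(200) = '1,2';

SELECT *
FROM Demo
WHERE ID IN (SELECT value FROM STRING_SPLIT(@MultipleValue, ','));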