Performance Tuning Series – Monitoring and Alerts: Staying Ahead of Issues

A proactive approach to monitoring and alerting is key to maintaining SQL Server performance. Monitoring provides real-time insight into the database’s health, resource consumption, and potential bottlenecks, while alerts enable prompt responses to issues before they impact users. SQL Server’s built-in tools, along with third-party solutions, offer effective ways to track key performance metrics, identify anomalies, and set up automated alerts to help stay ahead of performance problems.

Why Monitoring and Alerts Matter for Performance Optimization

Constantly changing workloads, hardware constraints, and shifting system settings can make SQL Server performance unpredictable. Monitoring helps identify trends and sudden deviations in performance, allowing you to take preventive measures or apply optimizations before problems become critical. Alerts provide early warnings for issues like high CPU usage, long-running queries, or approaching storage limits, enabling quick intervention.

Key Monitoring and Alerting Best Practices for SQL Server

1. Define Key Performance Indicators (KPIs)

Before setting up a monitoring system, it’s essential to establish which metrics or KPIs best represent your SQL Server’s health and performance. These indicators should cover areas such as CPU, memory, disk usage, and specific SQL Server metrics.

  • Best Practice:
    • Track essential KPIs, including CPU utilization, memory usage, disk I/O, query wait times, and buffer cache hit ratio.
    • For databases with heavy I/O demands, monitor page life expectancy (PLE), which shows how long a data page stays in the buffer cache (see the sketch after this list).
    • Keep an eye on log growth and transaction log space usage for signs of excessive logging, which may indicate poorly optimized queries or large transactions.
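
As a minimal sketch of the PLE check mentioned above, the following query reads the counter from sys.dm_os_performance_counters. The object name assumes a default instance; named instances report MSSQL$<InstanceName>:Buffer Manager instead.

-- Read page life expectancy (seconds) from the performance counters DMV.
SELECT [object_name], cntr_value AS page_life_expectancy_seconds
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Page life expectancy'
  AND [object_name] LIKE '%Buffer Manager%';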

2. Use Dynamic Management Views (DMVs) for Real-Time Monitoring

SQL Server’s Dynamic Management Views (DMVs) provide valuable insights into system performance by capturing real-time data on query execution, memory usage, and index efficiency. DMVs are essential for identifying specific issues that require immediate attention.

  • Best Practice:
    • Use DMVs like sys.dm_exec_query_stats to identify long-running or resource-intensive queries (see the sketch after this list) and sys.dm_os_wait_stats to understand wait types and locate bottlenecks.
    • Regularly analyze index usage using sys.dm_db_index_usage_stats to determine which indexes are frequently used and which are candidates for removal.
    • Automate DMV queries to gather data at regular intervals and retain historical performance data for trend analysis.
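
A minimal sketch of the sys.dm_exec_query_stats approach, ranking cached statements by total CPU. Keep in mind this DMV only covers plans still in cache, so the numbers reset when plans are evicted or the instance restarts.

-- Top 10 statements by total CPU time, with the statement text extracted
-- from the batch via the start/end offsets.
SELECT TOP (10)
    qs.total_worker_time / 1000 AS total_cpu_ms,
    qs.execution_count,
    SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
        ((CASE qs.statement_end_offset
              WHEN -1 THEN DATALENGTH(st.text)
              ELSE qs.statement_end_offset
          END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;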

3. Set Up SQL Server Performance Alerts

SQL Server Agent allows you to configure alerts for specific events, such as high CPU or memory usage, job failures, or database connection issues. Alerts can be sent via email or logged to a table for review.

  • Best Practice:
    • Configure alerts for CPU usage exceeding 80%, memory pressure warnings, blocked processes, disk space thresholds, and failed SQL Agent jobs (a sample alert definition follows this list).
    • Set up alerts for critical wait types, like PAGEIOLATCH_* for disk-related bottlenecks or LCK_M_* for blocking and locking issues.
    • Ensure that alerts provide actionable information and avoid alert fatigue by fine-tuning alert thresholds and focusing on the most critical metrics.
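
As a hedged sketch, here is a SQL Server Agent performance-condition alert that fires when PLE drops below 300 seconds. Note that Agent performance-condition alerts can only reference SQL Server counters, so OS-level CPU alerts typically come from WMI or an external monitor instead. The 'DBA Team' operator is a placeholder that must already exist in msdb.

-- Fire when page life expectancy falls below 300 seconds
-- (object name assumes a default instance).
EXEC msdb.dbo.sp_add_alert
    @name = N'Low Page Life Expectancy',
    @performance_condition = N'SQLServer:Buffer Manager|Page life expectancy||<|300',
    @enabled = 1;

-- Notify a placeholder operator by email (method 1 = email).
EXEC msdb.dbo.sp_add_notification
    @alert_name = N'Low Page Life Expectancy',
    @operator_name = N'DBA Team',
    @notification_method = 1;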

4. Utilize SQL Server Extended Events for Detailed Diagnostics

SQL Server Extended Events offers a lightweight, customizable framework for tracking detailed performance data. Sessions can capture query execution, deadlocks, wait times, and system-level events without a significant performance impact.

  • Best Practice:
    • Create custom Extended Events sessions to capture specific performance issues, like long-running queries or high wait times (see the sketch after this list).
    • Use Extended Events to monitor deadlocks and capture the SQL text, session ID, and resources involved for troubleshooting.
    • Archive Extended Events data for historical analysis, as this information is valuable for identifying recurring issues or performance trends.
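
A minimal sketch of a session that captures statements running longer than five seconds. The session name, file target path, and threshold are assumptions to adapt; the duration field for this event is measured in microseconds.

CREATE EVENT SESSION [LongRunningQueries] ON SERVER
ADD EVENT sqlserver.sql_statement_completed
    (ACTION (sqlserver.sql_text, sqlserver.session_id)
     WHERE (duration > 5000000))  -- 5 seconds, in microseconds
ADD TARGET package0.event_file (SET filename = N'LongRunningQueries.xel')
WITH (STARTUP_STATE = OFF);

ALTER EVENT SESSION [LongRunningQueries] ON SERVER STATE = START;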

5. Implement Database Monitoring Tools

In addition to built-in tools, various third-party solutions (e.g., SolarWinds, Redgate SQL Monitor, and Idera SQL Diagnostic Manager) provide advanced monitoring and alerting capabilities for SQL Server. These tools often offer features like dashboards, reporting, automated analysis, and recommendations.

  • Best Practice:
    • Choose a tool that provides real-time monitoring, historical trend analysis, and robust alerting based on your organization’s needs and budget.
    • Look for monitoring solutions that include visualizations for easy diagnosis of issues like query bottlenecks or resource contention.
    • Use third-party tools that support customizable alerts and integrate with incident management systems to streamline response workflows.

6. Monitor Query Performance

Query performance is one of the most significant factors affecting SQL Server’s efficiency. Monitoring for slow or inefficient queries enables timely tuning and optimization.

  • Best Practice:
    • Regularly monitor execution times and the CPU and I/O usage of your most expensive queries, and identify queries that are consistently resource-intensive.
    • Use Query Store in SQL Server to track query plans and execution statistics, making it easier to identify performance regressions after schema changes or updates (see the sketch after this list).
    • Set alerts for queries running above a specific threshold or those with high execution counts that may indicate inefficiencies.
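
As an illustrative sketch, you might enable Query Store and pull the slowest queries like this. Query Store requires SQL Server 2016 or later; the database name and settings are placeholders, and avg_duration is stored in microseconds.

ALTER DATABASE [YourDatabase]
SET QUERY_STORE = ON (OPERATION_MODE = READ_WRITE, QUERY_CAPTURE_MODE = AUTO);

-- Top queries by average duration across captured runtime intervals.
SELECT TOP (10)
    q.query_id,
    qt.query_sql_text,
    AVG(rs.avg_duration) / 1000.0 AS avg_duration_ms
FROM sys.query_store_query AS q
JOIN sys.query_store_query_text AS qt ON q.query_text_id = qt.query_text_id
JOIN sys.query_store_plan AS p ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats AS rs ON rs.plan_id = p.plan_id
GROUP BY q.query_id, qt.query_sql_text
ORDER BY AVG(rs.avg_duration) DESC;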

7. Monitor Tempdb Usage

Since tempdb is frequently used for temporary storage, sorting, and intermediate query results, high tempdb usage or contention can degrade performance. Monitoring tempdb ensures that it has adequate space and can handle temporary object creation demands.

  • Best Practice:
    • Monitor tempdb space usage using DMVs like sys.dm_db_task_space_usage and sys.dm_db_session_space_usage (see the sketch after this list).
    • Configure alerts for tempdb growth or high utilization, which could indicate inefficient query processing or excessive use of temporary tables.
    • Use monitoring to detect contention on tempdb, especially on system pages like PFS, GAM, and SGAM, and address it by adding more data files.
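
A minimal sketch ranking sessions by their tempdb allocations using sys.dm_db_session_space_usage, which applies only to tempdb. Each page counted is 8 KB.

SELECT
    session_id,
    user_objects_alloc_page_count * 8 AS user_objects_kb,
    internal_objects_alloc_page_count * 8 AS internal_objects_kb
FROM sys.dm_db_session_space_usage
ORDER BY internal_objects_alloc_page_count DESC;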

8. Create Custom Alerts for Resource-Specific Metrics

Different applications and workloads have unique performance demands, and SQL Server lets you define custom alerts tailored to your specific environment. Custom alerts allow more granular monitoring of resource-specific metrics, such as long lock times or unusually high I/O operations.

  • Best Practice:
    • Set custom alerts for lock escalation, deadlocks, and blocking sessions that exceed specified thresholds (a deadlock-counter example follows this list).
    • Create alerts for excessive logins, failed login attempts, or unusual access patterns for security monitoring.
    • Use custom alerts for sudden changes in query execution plans, which may indicate suboptimal plan choices or regressions due to outdated statistics.
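
As one hedged example, a performance-condition alert on the deadlocks-per-second counter fires whenever any deadlock is recorded. Keep in mind that Agent samples performance counters periodically, so very short spikes can be missed.

EXEC msdb.dbo.sp_add_alert
    @name = N'Deadlock Detected',
    @performance_condition = N'SQLServer:Locks|Number of Deadlocks/sec|_Total|>|0',
    @enabled = 1;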

9. Enable Automated Responses for Critical Alerts

Responding to alerts manually can be time-consuming, so SQL Server Agent allows you to define responses to specific alerts. Automated responses can include restarting services, running scripts, or adjusting resources temporarily to prevent further degradation.

  • Best Practice:
    • For critical alerts, configure automated responses such as restarting SQL Server services, clearing cache, or scaling up cloud resources if available.
    • Set automated scripts to collect additional diagnostics when an alert is triggered, helping capture valuable data for post-incident analysis (see the sketch after this list).
    • Use escalation protocols to ensure that critical alerts that require manual intervention are immediately directed to the right team members.
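
A minimal sketch of wiring an automated response: attach an existing Agent job to an existing alert so the job runs whenever the alert fires. Both names below are placeholders for objects you have already created in msdb.

EXEC msdb.dbo.sp_update_alert
    @name = N'Deadlock Detected',          -- existing alert
    @job_name = N'Collect Diagnostics';    -- existing diagnostics job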

10. Log Monitoring for Long-Term Insights

Long-term log monitoring is useful for understanding performance trends, identifying recurring issues, and tracking the impact of changes over time. SQL Server’s Error Log, Windows Event Log, and system_health Extended Events session are valuable sources of diagnostic information.

  • Best Practice:
    • Regularly review SQL Server Error Logs for messages related to I/O warnings, login failures, and deadlocks, and establish alerts for critical log entries (see the sketch after this list).
    • Monitor Windows Event Logs for system-level alerts related to hardware failures, memory issues, or networking problems that could affect SQL Server performance.
    • Use log aggregation tools like Elasticsearch, Splunk, or Azure Monitor to centralize and analyze logs, allowing for faster identification of trends and potential issues.
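
As a quick sketch, xp_readerrorlog can search the current error log for the long-I/O warning. The first two parameters select the log file (0 = current) and log type (1 = SQL Server).

EXEC master.dbo.xp_readerrorlog 0, 1, N'I/O requests taking longer', NULL;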

Conclusion

Proactive monitoring and alerting ensure that SQL Server remains resilient, responsive, and optimized for performance over time. By defining KPIs, setting up automated alerts, monitoring query and resource usage, and utilizing both built-in and third-party tools, you can identify performance issues early and take corrective actions before they impact end-users. Implementing a structured monitoring and alerting strategy is essential for long-term SQL Server performance optimization and helps your team stay one step ahead of potential bottlenecks.

Performance Tuning Series – Regular Maintenance

SQL Server performance isn’t just about the initial setup or database design—it requires continuous maintenance to ensure it runs smoothly over time. Neglecting regular maintenance can lead to fragmentation, slow queries, data integrity issues, and ultimately, downtime. By implementing a comprehensive maintenance strategy, you can ensure that your SQL Server databases remain optimized and healthy, allowing them to perform efficiently even as workloads and data volumes grow.

Why Regular Maintenance is Critical

Over time, SQL Server databases accumulate various inefficiencies that can degrade performance. These include fragmented indexes, outdated statistics, growing transaction logs, and unused or bloated data. Regular maintenance tasks help to mitigate these issues, ensuring that SQL Server can continue to execute queries quickly, handle transactions efficiently, and maintain data integrity.

Key Regular Maintenance Tasks for SQL Server

1. Index Rebuilding and Reorganization

Indexes can become fragmented over time as data is inserted, updated, or deleted. Fragmentation occurs when the logical order of pages in an index no longer matches the physical order on disk. This leads to slower reads, as SQL Server must perform additional I/O to retrieve scattered data.

  • Rebuild Indexes: Index rebuilds recreate the index from scratch, removing fragmentation and improving query performance. An offline rebuild locks the table, so schedule it during periods of low activity (or use ONLINE = ON where your edition supports it).
  • Reorganize Indexes: Reorganizing indexes is a less intrusive process that defragments them without locking the table. This can be done during regular operations but is less effective than a full rebuild.

Best Practice:

  • Schedule regular index maintenance based on the level of fragmentation. Use SQL Server’s sys.dm_db_index_physical_stats DMV to check fragmentation levels (see the sketch after this list):
    • 0-10% fragmentation: No action needed.
    • 10-30% fragmentation: Use index reorganization.
    • Above 30% fragmentation: Perform a full index rebuild.
  • Automate index maintenance using SQL Server Agent jobs or a dedicated maintenance tool to ensure this task is performed regularly without manual intervention.
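
A minimal sketch applying those thresholds to the current database; the ALTER INDEX statements at the end use placeholder table and index names, and ONLINE = ON assumes an edition that supports online rebuilds.

SELECT
    OBJECT_NAME(ips.object_id) AS table_name,
    i.name AS index_name,
    ips.avg_fragmentation_in_percent,
    CASE
        WHEN ips.avg_fragmentation_in_percent > 30 THEN 'REBUILD'
        WHEN ips.avg_fragmentation_in_percent > 10 THEN 'REORGANIZE'
        ELSE 'NONE'
    END AS suggested_action
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
    ON i.object_id = ips.object_id AND i.index_id = ips.index_id
WHERE ips.index_id > 0   -- skip heaps
ORDER BY ips.avg_fragmentation_in_percent DESC;

-- Then act on a specific index (placeholder names):
ALTER INDEX IX_Orders_CustomerID ON dbo.Orders REORGANIZE;
ALTER INDEX IX_Orders_CustomerID ON dbo.Orders REBUILD WITH (ONLINE = ON);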

2. Update Statistics

SQL Server uses statistics to estimate the distribution of data values in a table, which helps the query optimizer choose the most efficient execution plan. As data is modified, these statistics can become outdated, leading to suboptimal query plans and slower performance.

  • Best Practice:
    • Regularly update statistics on your tables and indexes to ensure that the query optimizer has the most accurate information. Use the UPDATE STATISTICS command (see the sketch after this list) or enable SQL Server’s auto-update statistics feature.
    • For large tables, use sampled statistics to balance performance with accuracy. Full scans of very large tables can be resource-intensive.
    • If queries slow down unexpectedly, manually update statistics to resolve potential performance issues caused by outdated statistics.
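
A few hedged examples, with dbo.Orders as a placeholder table:

-- Sampled update: cheaper on large tables, slightly less accurate.
UPDATE STATISTICS dbo.Orders WITH SAMPLE 25 PERCENT;

-- Full scan: most accurate, but more I/O.
UPDATE STATISTICS dbo.Orders WITH FULLSCAN;

-- Or refresh everything that has changed since the last update.
EXEC sp_updatestats;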

3. Backup and Recovery Management

Regular backups are essential for data protection and business continuity. However, improper backup strategies can lead to bloated transaction logs, excessive disk usage, and even performance degradation during peak times.

  • Best Practice:
    • Implement a backup strategy based on your business’s recovery point objective (RPO) and recovery time objective (RTO). Schedule full backups regularly (e.g., daily) and transaction log backups more frequently (e.g., every 15-30 minutes) for critical databases (see the sketch after this list).
    • Use differential backups between full backups to reduce the load on storage and improve recovery times.
    • Regularly test your backups by restoring them to a separate environment to ensure that they can be successfully recovered when needed.
    • Ensure that backup schedules avoid peak activity times to prevent any impact on performance.
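
A minimal sketch of the three backup types; the database name and paths are placeholders, and COMPRESSION assumes an edition that supports it.

BACKUP DATABASE [SalesDB] TO DISK = N'D:\Backups\SalesDB_full.bak'
    WITH CHECKSUM, COMPRESSION;

BACKUP DATABASE [SalesDB] TO DISK = N'D:\Backups\SalesDB_diff.bak'
    WITH DIFFERENTIAL, CHECKSUM, COMPRESSION;

BACKUP LOG [SalesDB] TO DISK = N'D:\Backups\SalesDB_log.trn'
    WITH CHECKSUM, COMPRESSION;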

4. Transaction Log Management

SQL Server’s transaction log records every modification made to the database. If not properly managed, the transaction log can grow excessively large, consuming valuable disk space and degrading performance.

  • Best Practice:
    • Use the Full Recovery Model for critical databases to ensure point-in-time recovery, but regularly back up the transaction logs to prevent them from growing too large.
    • For less critical databases or databases that don’t require point-in-time recovery, consider using the Simple Recovery Model, which automatically truncates the transaction log after each checkpoint.
    • Monitor transaction log size and schedule log backups frequently to avoid excessive growth. Use the sys.dm_db_log_space_usage DMV to monitor log space consumption (see the sketch after this list).
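
A quick sketch for the current database (this DMV is available from SQL Server 2012 SP1 onward):

SELECT
    total_log_size_in_bytes / 1048576.0 AS total_log_mb,
    used_log_space_in_percent
FROM sys.dm_db_log_space_usage;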

5. Integrity Checks (DBCC CHECKDB)

Database corruption can occur for various reasons, such as hardware failures or improper shutdowns. SQL Server provides the DBCC CHECKDB command to detect and repair corruption in your databases.

  • Best Practice:
    • Run DBCC CHECKDB regularly to ensure data integrity. This process checks for physical and logical corruption in database files (see the sketch after this list).
    • Schedule DBCC CHECKDB during off-peak hours to avoid performance impacts, as this operation can be resource-intensive.
    • If DBCC CHECKDB identifies corruption, address the issue immediately. Use repair options like REPAIR_ALLOW_DATA_LOSS as a last resort, and restore from backups when possible.
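
Two hedged variants, with the database name as a placeholder; PHYSICAL_ONLY trades logical-check coverage for a much lighter run on very large databases.

-- Full check, suppressing informational messages but keeping all errors.
DBCC CHECKDB (N'SalesDB') WITH NO_INFOMSGS, ALL_ERRORMSGS;

-- Lighter-weight physical-structure check for very large databases.
DBCC CHECKDB (N'SalesDB') WITH PHYSICAL_ONLY;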

6. Tempdb Maintenance

Tempdb is a shared system database that is heavily used by SQL Server for temporary objects, intermediate query results, and sorting. Over time, tempdb can become a performance bottleneck if it is not properly managed.

  • Best Practice:
    • Ensure that tempdb has multiple data files, especially in high-concurrency environments. Best practice is to configure one data file per logical CPU core (up to 8 cores), which helps reduce contention on system pages (like PFS, GAM, and SGAM); see the sketch after this list.
    • Place tempdb on fast storage (preferably SSD or NVMe) to handle its high I/O workload.
    • Regularly monitor tempdb space usage to avoid running out of space, which can lead to system crashes. Use sys.dm_db_task_space_usage and sys.dm_db_session_space_usage to track space consumption.
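
A minimal sketch of adding a tempdb data file; the logical name, path, and sizes are placeholders. Keep all tempdb data files equally sized so SQL Server’s proportional-fill allocation stays balanced.

ALTER DATABASE tempdb
ADD FILE (NAME = N'tempdev2',
          FILENAME = N'T:\TempDB\tempdev2.ndf',
          SIZE = 8GB, FILEGROWTH = 1GB);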

7. Cleanup of Unused or Outdated Data

Over time, databases may accumulate unused data, which can increase table sizes and slow down queries. Regularly cleaning up obsolete data ensures your database remains efficient.

  • Best Practice:
    • Implement a data retention policy that defines how long data should be kept before being archived or deleted. This policy should reflect business requirements while keeping database sizes manageable.
    • Periodically archive old data that is not frequently accessed into separate databases or storage systems.
    • Use automated scripts to clean up old or unused records, freeing up space and reducing index bloat (see the sketch after this list).
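
A hedged sketch of a batched purge, with dbo.AuditLog, the logged_at column, and the two-year cutoff as placeholders. Batching keeps each transaction small so the log and lock footprint stay manageable.

DECLARE @rows INT = 1;
WHILE @rows > 0
BEGIN
    DELETE TOP (5000) FROM dbo.AuditLog
    WHERE logged_at < DATEADD(YEAR, -2, SYSDATETIME());
    SET @rows = @@ROWCOUNT;   -- loop ends when a batch deletes nothing
END;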

8. Monitor and Tune Performance

SQL Server provides several tools for monitoring performance, including Dynamic Management Views (DMVs) and Extended Events. Regular monitoring can help identify performance bottlenecks before they affect end-users.

  • Best Practice:
    • Regularly monitor key performance metrics like CPU usage, memory usage, disk I/O, and query execution times to ensure the system is operating within optimal thresholds.
    • Use SQL Server Profiler or Extended Events to capture detailed information about query performance and diagnose slow-running queries.
    • Leverage DMVs to analyze query patterns and suggest optimizations like new indexes (see the missing-index sketch after this list).
    • Continuously review and tune your queries, indexes, and database schema based on real-time performance data.
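
As an illustration, the missing-index DMVs surface candidates the optimizer wished it had. Treat the output as suggestions to evaluate against your workload, not statements to run blindly.

SELECT TOP (10)
    migs.avg_user_impact,
    migs.user_seeks,
    mid.statement AS table_name,
    mid.equality_columns,
    mid.inequality_columns,
    mid.included_columns
FROM sys.dm_db_missing_index_group_stats AS migs
JOIN sys.dm_db_missing_index_groups AS mig
    ON migs.group_handle = mig.index_group_handle
JOIN sys.dm_db_missing_index_details AS mid
    ON mig.index_handle = mid.index_handle
ORDER BY migs.avg_user_impact * migs.user_seeks DESC;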

9. Automating Maintenance Tasks

Manually managing routine maintenance tasks can be time-consuming and prone to error. SQL Server provides built-in automation tools, such as SQL Server Agent, to schedule and manage maintenance operations.

  • Best Practice:
    • Set up automated maintenance jobs for tasks like index rebuilding, statistics updates, transaction log backups, and integrity checks (see the sketch after this list). This ensures that these critical operations are performed consistently and without manual intervention.
    • Regularly review and adjust job schedules to avoid conflicts during peak business hours.
    • Use maintenance plans in SQL Server Management Studio (SSMS) or third-party tools for more advanced scheduling and management of maintenance tasks.
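
A minimal sketch of an Agent job created in T-SQL; the job name, schedule, database, and the usp_IndexMaintenance procedure are hypothetical placeholders.

EXEC msdb.dbo.sp_add_job @job_name = N'Nightly Index Maintenance';

EXEC msdb.dbo.sp_add_jobstep
    @job_name = N'Nightly Index Maintenance',
    @step_name = N'Reorganize indexes',
    @subsystem = N'TSQL',
    @command = N'EXEC dbo.usp_IndexMaintenance;',  -- hypothetical proc
    @database_name = N'SalesDB';

EXEC msdb.dbo.sp_add_jobschedule
    @job_name = N'Nightly Index Maintenance',
    @name = N'Nightly at 2 AM',
    @freq_type = 4,            -- daily
    @freq_interval = 1,        -- every 1 day
    @active_start_time = 020000;

EXEC msdb.dbo.sp_add_jobserver @job_name = N'Nightly Index Maintenance';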

Conclusion

Regular maintenance is essential for keeping your SQL Server database healthy and optimized for performance. Tasks like index rebuilding, updating statistics, managing transaction logs, performing integrity checks, and cleaning up obsolete data all contribute to the overall efficiency and reliability of your SQL Server environment. By automating maintenance operations, monitoring key performance metrics, and regularly tuning the system, you can ensure that your SQL Server databases continue to deliver optimal performance as your workload and data volumes grow.

Capturing Deadlocks with Extended Events

I’ve been noticing a lot of deadlocks on my server. What’s the best way to track down the queries so I can fix the problem?

There are a few different ways to capture deadlock information. You can enable trace flag 1222 to write deadlock details to the error log, set up a Profiler trace to capture the deadlock graph, or set up an Extended Events session to capture all sorts of information.
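
For reference, enabling the trace flag globally is a one-liner; it persists only until the next restart unless you also add -T1222 as a startup parameter.

DBCC TRACEON (1222, -1);  -- -1 applies the flag server-wide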

I’m going to focus on setting up an Extended Event in this post, since Microsoft has deprecated Profiler and says it will not ship in future versions. Extended Events are the future, so why not start using them now?

In SSMS, drill down to Management, Extended Events. Right-click Sessions and click New Session Wizard:

Deadlocks with Extended Events 1

Click next on the Introduction screen and give the Session a name. I’m going to name this session Deadlocks:

Deadlocks with Extended Events 2

Click next. On the Choose Template screen you can choose a predefined template (like Profiler) or you can create your own events by choosing “Do not use a template”. For this post, let’s create our own:

Deadlocks with Extended Events 3

Click next and you’ll see hundreds of events (like Profiler). We only want to capture deadlock data so let’s scroll down to the very bottom and choose xml_deadlock_report. Click on the event and click the right arrow to move it into the Selected Events box:

Deadlocks with Extended Events 4

You can choose other events if needed, but for the simplicity of this post I’m just going to use this one. Click next. The Capture Global Fields page allows us to select what fields we want to capture. These are unique to each event selected. For this example, I’ll choose the following fields:

  • callstack
  • client_app_name
  • client_hostname
  • database_id
  • database_name
  • plan_handle
  • process_id
  • sql_text
  • transaction_id
  • transaction_sequence

Deadlocks with Extended Events 5

Click next. On this page you can apply filters if needed. I’ll set up a filter so that I only capture data from the RollTide database. There are hundreds of different filters that can be configured so that you don’t pull back data that isn’t needed:

Deadlocks with Extended Events 6

Click next to reach the Session Data Storage page. This page allows you to save data to a file or work with only the most recent data. I don’t want to keep thousands upon thousands of events, so I’ll choose “Work with only the most recent data”.

Deadlocks with Extended Events 7

The next page summarizes all the options we have selected. You can also script this session if you need to create it on other servers or save it for later. Click Finish to create the new session.

The last page allows you to start the session immediately and watch live data. For this post, I’ll choose both:

Deadlocks with Extended Events 8

You should see the new session under Extended Events and the Live Data tab should appear:

Deadlocks with Extended Events 9

Deadlocks with Extended Events 10

Once a deadlock occurs it should show the deadlock in the Live Data window:

Deadlocks with Extended Events 11

This view shows all of the fields we selected including the XML report. If you click on the Deadlock tab, you’ll see the graph:

Deadlocks with Extended Events 12

You can also use this query to see detailed information, including the deadlock graph and event XML:

SELECT
    DATEADD(mi, DATEDIFF(mi, GETUTCDATE(), CURRENT_TIMESTAMP),
        DeadlockEventXML.value('(event/@timestamp)[1]', 'datetime2')) AS [EventTime],
    DeadlockEventXML.value('(//process[@id[//victim-list/victimProcess[1]/@id]]/@hostname)[1]', 'nvarchar(max)') AS HostName,
    DeadlockEventXML.value('(//process[@id[//victim-list/victimProcess[1]/@id]]/@clientapp)[1]', 'nvarchar(max)') AS ClientApp,
    DB_NAME(DeadlockEventXML.value('(//process[@id[//victim-list/victimProcess[1]/@id]]/@currentdb)[1]', 'nvarchar(max)')) AS [DatabaseName],
    DeadlockEventXML.value('(//process[@id[//victim-list/victimProcess[1]/@id]]/@transactionname)[1]', 'nvarchar(max)') AS VictimTransactionName,
    DeadlockEventXML.value('(//process[@id[//victim-list/victimProcess[1]/@id]]/@isolationlevel)[1]', 'nvarchar(max)') AS IsolationLevel,
    DeadlockEventXML.query('(event/data[@name="xml_report"]/value/deadlock)[1]') AS DeadLockGraph,
    DeadlockEventXML
FROM
(
    SELECT
        XEvent.query('.') AS DeadlockEventXML,
        Data.TargetData
    FROM
    (
        SELECT
            CAST(target_data AS XML) AS TargetData
        FROM sys.dm_xe_session_targets st
        JOIN sys.dm_xe_sessions s ON s.address = st.event_session_address
        WHERE s.name = 'Deadlocks'
          AND st.target_name = 'ring_buffer'
    ) AS Data
    CROSS APPLY TargetData.nodes('RingBufferTarget/event[@name="xml_deadlock_report"]') AS XEventData(XEvent)
) AS DeadlockInfo

Managing SQL Server Extended Events in Management Studio

SQL Server 2012 introduces a GUI in SQL Server Management Studio to create and manage Extended Events. Prior to SQL Server 2012, Extended Events could only be created using T-SQL. In this tip, I’ll show you, step by step, how to create a simple Extended Events session in SQL Server 2012 using the new GUI in SQL Server Management Studio.

Creating an Extended Event session has never been easier than in SQL Server 2012. Open SSMS and drill down to Management, Extended Events, Sessions as shown in the image below. By default, you should see an AlwaysOn_health and a system_health session already created. You will notice the AlwaysOn_health session is disabled and the system_health session is running. The system_health session collects system data that you can use to help troubleshoot performance issues. For the most part, SQL Server Extended Events use very little resources.

[Screenshot: Extended Events sessions in Object Explorer]

There are two ways to create a session. Right click on the Sessions folder and you can choose New Session or New Session Wizard. In this tip, we’ll step through using the wizard.

[Screenshot: New Session Wizard menu option]

After clicking New Session Wizard, an Introduction window will appear that will give you a brief introduction. Click the “Next” button to continue.

The next window, Set Session Properties, is where you can specify the session name and whether or not you want the session to start on server start-up. In this tip, I’ll name the session DB Monitor and choose to start the event session at server start-up. Click the “Next” button to continue.

[Screenshot: Set Session Properties window]
The next screen allows us to choose a preconfigured template or create our own. If you’ve ever used SQL Server Profiler’s built-in templates, these function the same way. In this tip, we’ll create our own. Choose the “Do not use a template” option and click the “Next” button to continue.

[Screenshot: Choose Template window]
The “Select Events To Capture” window is an important one. This is where we select the events we want to capture. For this example, I want to monitor when my DB goes offline and when it becomes available, so I’ll choose the events that relate to this: database_attached, database_created, database_detached, database_started, and database_stopped. Once you select the events from the “Event library” (on the left), click the right arrow to move them to the “Selected events” (on the right). Click the “Next” button to continue.

[Screenshot: Select Events To Capture window]
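
For comparison, here is a hedged T-SQL sketch of the same DB Monitor session the wizard builds up to this point. The ring_buffer target is an assumption, since the storage choice isn’t shown in this excerpt; STARTUP_STATE = ON mirrors the start-at-server-start-up option chosen earlier.

CREATE EVENT SESSION [DB Monitor] ON SERVER
ADD EVENT sqlserver.database_attached,
ADD EVENT sqlserver.database_created,
ADD EVENT sqlserver.database_detached,
ADD EVENT sqlserver.database_started,
ADD EVENT sqlserver.database_stopped
ADD TARGET package0.ring_buffer
WITH (STARTUP_STATE = ON);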

Click here to view the rest of this post.