It’s the perennial question: Why is my SQL server so slow? What was once a finely tuned system has become sluggish and unresponsive. Something has changed and now the server is struggling. But what is the cause? How can we find out where the hold-up is to even start working on a solution? Thankfully, SQL servers come with some pretty neat stats to help us out in the form of WAITS and QUEUES.
What are WAITS?
As your server runs processes, the operating system schedules these processes through an execution queue. Those that are running get a status of RUNNING. However, if the process lacks a resource then it becomes SUSPENDED until the resource is available, at which point it becomes RUNNABLE and then just waits for CPU time.
It is the SUSPENDED state that is most interesting. These resources are the waiting on a specific resource such as IO, memory, network or CPU. Importantly, SQL servers record this information in WAITS. These can then be accessed from the SQL Dynamic Management Views (DMVs) to give us information on what’s happening with the server.
Finding the WAITS
There are three DMVs that, when queried, will provide the necessary information. The primary aggregator for WAIT stats is Sys.dm_os_wait_stats which stores instance level wait information for all session since the last restart or since the wait stats were last cleared. By querying both Sys.dm_os_waiting_tasks and sys.dm_exec_requests together, you can also obtain detailed information at session id level of the wait types, wait times, queries that are being executed and objects that are participating.
Interpreting the WAITS
Armed with this new found information, how do we extract meaning? First off, remember that most waits will occur naturally due to things like background task. Seeing a wait is not a sign of a problem. Instead, it’s volume of waits that show us the way to the most likely blockages. Let’s quickly run through the most common WAIT types.
This WAIT occurs when a process voluntarily yields for another. More of these indicate high CPU usage that can be caused by CPU intensive or long running queries.
Latches are temporary links to in-memory pages. TempDB and Indexes that are in high demand are common causes of this WAIT.
Caused by queries being run over multiple CPUs. For a properly optimised OLTP application, the only reason for this to occur would be a missing index or WHERE clause.
If a query requests a page or data that isn’t buffered, then it must wait for it to be transferred from disk to buffer. If this keeps cropping up, then it might be due to insufficient memory or a failing disk.
ASYNC_IO_COMPLETION & IO_COMPLETION
Similar to the above, this wait indicates IO or disk subsystem problems, with a process waiting on read/write.
Seen mostly on heavy transactional databases, this WAIT comes from data in the log cache being written to disk. Another IO related WAIT.
This WAIT corresponds to a task trying to lock a resource, commonly caused by other tasks already having the resource locked due to large transactions and improper isolation/partitioning.
Armed with this sample of WAITS, you can begin to get a picture of system bottlenecks. Of course, there are many more that we haven’t covered, but these are some of the most commonly used to identify problems. If you want to know more or having a server that’s started to slow down, then you can get in touch with us at DBA Services.