Is Your SQL Server Actually Healthy?

Most SQL Server environments have at least one serious problem, and the people responsible for them don't know it yet. That's not a criticism. It's a pattern we see consistently across Australian businesses, from mid-sized companies running a single SQL instance to enterprises managing dozens. In our experience reviewing hundreds of SQL Server environments, roughly 80% have configuration issues significant enough to affect performance, reliability, or recoverability.

The good news is that most of these problems are fixable. The first step is knowing what to look for.

Why SQL Server Health Gets Neglected

SQL Server is one of Microsoft's most complex software products. Even Microsoft acknowledges this. And yet the majority of SQL Server instances in production today are managed by generalist system administrators who are also juggling Active Directory, virtualisation, networking, security patching, and a dozen other responsibilities.

That's not a knock on sys admins. They're capable people dealing with an enormous scope of work. But SQL Server rewards specialisation. The difference between a well-tuned instance and a poorly configured one isn't always visible until something breaks, and by then the cost, in downtime, data loss, or both, can be significant.

For businesses running high-transaction databases that support critical operations, a poorly performing SQL Server isn't just an IT inconvenience. It's a direct drag on productivity and revenue.

What Are the Warning Signs of an Unhealthy SQL Server?

Some symptoms are obvious. Users complaining about slow application load times, reports that take minutes instead of seconds, timeouts during peak periods. These are the visible signs that something is wrong underneath.

Other warning signs are less obvious but equally serious:

  • No defined Recovery Time Objective (RTO) or Recovery Point Objective (RPO). If your organisation has never formally answered "how long can we operate without this database?" and "how much data can we afford to lose?", your disaster recovery posture is incomplete. Microsoft recommends documenting these objectives as a baseline for any production SQL environment.
  • Backups that have never been tested. A backup that hasn't been restored is a backup you can't trust. We regularly encounter environments where backups have been running for months or years but the restore process has never been validated.
  • No SQL Server Agent alerts configured. When critical events occur inside SQL Server, such as corruption, failed jobs, or severe errors, the instance will log them internally. Without alerts configured to notify an operator, those events sit silently in the error log until the problem escalates into an outage.
  • Autogrowth events happening regularly. If your data or log files are growing in small increments on demand, you're generating unnecessary I/O overhead and fragmentation. This is a common sign that database file sizing and growth settings haven't been reviewed.
  • High wait statistics with no investigation. SQL Server tracks exactly what it's waiting on at any given moment. Consistently elevated waits on PAGEIOLATCH, CXPACKET, or LCK resources point to specific, diagnosable problems. If no one's looking at wait stats, no one's diagnosing the real bottlenecks.

Are Your Maintenance Plans Actually Doing the Job?

Maintenance plans are one of the most misunderstood areas of SQL Server management. Most environments have something in place, but "something" isn't the same as "correct".

A well-configured maintenance strategy covers several distinct areas. Regular full and transaction log backups with tested restore procedures. Index maintenance, meaning rebuilds or reorganisations based on actual fragmentation levels rather than a fixed schedule. Statistics updates to ensure the query optimiser has accurate data to work with. Consistency checks using DBCC CHECKDB to catch corruption before it becomes catastrophic.

The default maintenance plan wizard in SQL Server Management Studio will get you started, but it has real limitations. It doesn't differentiate between index rebuild and reorganise thresholds. It doesn't handle Ola Hallengren-style adaptive maintenance, which is the industry standard for production environments. And it won't alert you intelligently when something goes wrong.

A poorly designed maintenance plan can actually make things worse. Running full index rebuilds on every index regardless of fragmentation level, for example, generates unnecessary transaction log growth and locks resources during business hours.

What Hardware Bottlenecks Should You Be Watching?

Hardware bottlenecks in SQL Server environments are real and measurable. The most common ones we diagnose fall into three categories.

Storage I/O is the most frequent culprit. SQL Server is extremely sensitive to disk latency. Read latency above 20ms on data files or write latency above 2ms on log files are generally considered problematic thresholds. If your SQL Server is sitting on a shared SAN or a slow storage tier, those numbers will tell the story.

Memory pressure is the second common issue. SQL Server will use as much memory as you allow it to, and by default the maximum server memory setting is uncapped, which means SQL Server can consume all available RAM on the host and starve the operating system. Conversely, if max server memory is set too low, you'll see excessive disk reads as SQL Server constantly pulls data from storage instead of serving it from the buffer pool.

CPU contention is less common but worth monitoring, particularly in virtualised environments where vCPUs may be overcommitted at the hypervisor level. Parallelism misconfiguration, specifically the Max Degree of Parallelism (MAXDOP) and Cost Threshold for Parallelism settings, can also cause unnecessary CPU pressure on high-concurrency systems.

Are Your SQL Server Alerts Configured?

This one still surprises us. A significant number of production SQL Server instances we review have no SQL Agent alerts configured at all.

SQL Server can generate alerts for a wide range of conditions: severity 19 through 25 errors (which represent serious system-level problems), specific error numbers like 823, 824, and 825 (which indicate I/O errors and potential corruption), and custom performance condition alerts for things like buffer cache hit ratio dropping below acceptable thresholds.

Without these alerts, your first notification of a serious problem might be a phone call from a user telling you the application is down. With a properly configured alert system, you get notified the moment SQL Server detects something worth your attention, often hours or days before it becomes an outage.

Setting up a basic but effective alert structure takes a few hours of work. It's one of the highest-value, lowest-cost improvements you can make to any SQL Server environment.

Key Takeaways

  • Around 80% of SQL Server environments we review have configuration issues affecting performance, reliability, or recoverability. Most are caused by gaps in specialised knowledge, not negligence.
  • The absence of defined RTO and RPO objectives, untested backups, and missing SQL Agent alerts are among the most common and most dangerous gaps in SQL Server health.
  • Maintenance plans need to cover backups, index maintenance, statistics updates, and consistency checks. Default wizard-generated plans often fall short of what production environments require.
  • Hardware bottlenecks in storage I/O, memory configuration, and CPU parallelism settings are measurable and fixable once you know where to look.
  • A proactive SQL Server health check will surface these issues before they cause downtime, not after.

Getting a Clear Picture of Your SQL Server Health

If you're not certain whether your SQL Server environment is properly configured, that uncertainty is itself a risk worth addressing. A structured SQL Server health check will give you a clear, documented view of where your configuration stands, what's working, what needs attention, and what's putting your business at risk.

DBA Services provides SQL Server health checks and ongoing managed database administration for Australian businesses. Our assessments cover configuration, maintenance, backup and recovery, security, and performance, and they're delivered by DBAs who work with SQL Server environments every day. If your SQL Server health is overdue for a proper review, get in touch with our team.