Skip to content

Log stats details for long running SQL queries#5376

Open
apurvabhaleMS wants to merge 17 commits intomainfrom
personal/abhale/log-info-long-running-queries
Open

Log stats details for long running SQL queries#5376
apurvabhaleMS wants to merge 17 commits intomainfrom
personal/abhale/log-info-long-running-queries

Conversation

@apurvabhaleMS
Copy link
Contributor

@apurvabhaleMS apurvabhaleMS commented Feb 6, 2026

Description

  1. Feature Flag for Long-Running Query Details
    • Added _longRunningQueryDetails (ProcessingFlag) to control whether Query Store logging is enabled at runtime via the Search.LongRunningQueryDetails.IsEnabled parameter in dbo.Parameters.
    • The flag defaults to enabled (true), so logging is active unless explicitly disabled.

  2. Configurable Threshold
    • Added LongRunningQueryDetailsThresholdId ("Search.LongRunningQueryDetails.Threshold") to allow the threshold to be configured via dbo.Parameters at runtime.
    • Added LongRunningThresholdMillisecondsDefault = 60000ms (1 minute) as the fallback when no database parameter exists.

  3. Execution Time Measurement
    • Added Stopwatch-based timing (executionStopwatch) around ExecuteReaderAsync in both SearchImpl and SearchIncludeImpl.
    • The elapsed time is compared against the threshold; if exceeded and the feature flag is enabled, LogQueryStoreByTextAsync is called.

  4. Query Store Stats Logging
    • StripQueryPreambleLines —Strips diagnostic and parameter-declaration preamble lines from a SQL command's text so the remaining query body can be used as a search key for Query Store lookups via LIKE.
    • LogQueryStoreByTextAsync — Queries sys.query_store_query_text joined to runtime stats DMVs.

  5. Error Handling
    • If the Query Store lookup itself fails with a SqlException, it's caught and logged as a warning (not rethrown), so it never impacts the original search operation.

Related issues

Addresses AB180196

Testing

Describe how this change was tested.

FHIR Team Checklist

  • Update the title of the PR to be succinct and less than 65 characters
  • Add a milestone to the PR for the sprint that it is merged (i.e. add S47)
  • Tag the PR with the type of update: Bug, Build, Dependencies, Enhancement, New-Feature or Documentation
  • Tag the PR with Open source, Azure API for FHIR (CosmosDB or common code) or Azure Healthcare APIs (SQL or common code) to specify where this change is intended to be released.
  • Tag the PR with Schema Version backward compatible or Schema Version backward incompatible or Schema Version unchanged if this adds or updates Sql script which is/is not backward compatible with the code.
  • When changing or adding behavior, if your code modifies the system design or changes design assumptions, please create and include an ADR.
  • CI is green before merge Build Status
  • Review squash-merge requirements

Semver Change (docs)

Patch|Skip|Feature|Breaking (reason)

@apurvabhaleMS apurvabhaleMS added the Azure Healthcare APIs Label denotes that the issue or PR is relevant to the FHIR service in the Azure Healthcare APIs label Feb 6, 2026
@apurvabhaleMS apurvabhaleMS added this to the FY26\Q3\2Wk\2Wk16 milestone Feb 6, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds best-effort logging of Query Store runtime stats for long-running SQL search queries in the SQL Server search layer, controlled via dbo.Parameters feature flag + threshold.

Changes:

  • Adds a runtime feature flag (Search.LongRunningQueryDetails.IsEnabled) and numeric threshold (Search.LongRunningQueryDetails.Threshold, default 60s).
  • Measures query execution time around ExecuteReaderAsync in both search and include-search paths.
  • Introduces query-text normalization and a Query Store lookup query to log recent runtime stats for matching statements.

apurvabhaleMS and others added 3 commits February 6, 2026 10:40
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
@apurvabhaleMS apurvabhaleMS added Enhancement Enhancement on existing functionality. No-PaaS-breaking-change labels Feb 9, 2026
@apurvabhaleMS apurvabhaleMS requested a review from Copilot February 9, 2026 18:44
@apurvabhaleMS apurvabhaleMS added the No-ADR ADR not needed label Feb 9, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 9 comments.

Comments suppressed due to low confidence (1)

src/Microsoft.Health.Fhir.SqlServer/Features/Search/SqlServerSearchService.cs:570

  • The stopwatch starts before ExecuteReaderAsync but stops after the entire reader processing completes, so the measured "execution" time includes result materialization/decompression in addition to SQL execution. If the intent is to detect long-running SQL, consider measuring only the DB call (e.g., around ExecuteReaderAsync / DB read loop) or clearly renaming this to reflect end-to-end processing time.
                        var st = DateTime.UtcNow;
                        var executionStopwatch = Stopwatch.StartNew();

                        try
                        {
                            using (var reader = await sqlCommand.ExecuteReaderAsync(CommandBehavior.SequentialAccess, cancellationToken))
                            {

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 10 comments.

@apurvabhaleMS apurvabhaleMS marked this pull request as ready for review February 11, 2026 19:14
@apurvabhaleMS apurvabhaleMS requested a review from a team as a code owner February 11, 2026 19:14
Comment on lines +849 to +856
catch (Exception ex)
{
_logger.LogWarning(
"Long-running SQL ({ElapsedMilliseconds}ms). Query: {QueryText}. Query Store lookup failed for long-running query.",
executionTimeSnapshot,
queryTextSnapshot);
_logger.LogDebug(ex, "Query Store lookup failed for long-running query.");
}

Check notice

Code scanning / CodeQL

Generic catch clause Note

Generic catch clause.

Copilot Autofix

AI 2 days ago

In general, fix this by replacing catch (Exception) with one or more specific exception types that represent expected/benign failures in this context, and by optionally handling OperationCanceledException separately when using cancellation tokens. Leave genuinely fatal exceptions (e.g., OutOfMemoryException, ThreadAbortException) unhandled, so they are not silently swallowed.

For this code block, the exception can only be thrown from the Task.Run delegate, specifically from LogQueryStoreByTextAsync or from the CancellationTokenSource. Since this is an auxiliary SQL logging path, the most appropriate exceptions to handle are SqlException, IOException, and OperationCanceledException:

  • OperationCanceledException: expected if the 2-second logging timeout is hit; we should treat this as benign and either ignore or log at very low severity.
  • SqlException: expected failures while querying Query Store.
  • IOException: possible failures when writing logs, or if LogQueryStoreByTextAsync interacts with I/O.

We then remove the outer catch (Exception ex) around the whole block, or at least narrow it similarly, because the only code inside is the conditional and the Task.Run scheduling, both of which should not throw in normal operation; any error there should be visible rather than silently debug-logged.

Concretely:

  • Inside the Task.Run delegate:
    • Replace catch (Exception ex) with:
      • catch (OperationCanceledException) that does nothing (or logs at Debug).
      • catch (SqlException ex) and catch (IOException ex) that log at the same Warning/Debug levels as before.
  • Outside (below line 859), either remove the outer try/catch or narrow it similarly. Since all heavy work is inside the delegate and already protected, and scheduling Task.Run should not fail in typical ways we want to hide, we can safely remove the outer try/catch. If you prefer to keep a safety net, narrow it to catch (OperationCanceledException) to cover any unexpected cancellation source usage.

No new methods are required. We may need using System.IO; and using Microsoft.Data.SqlClient;, but both are already present at the top of the file, so no import changes are necessary.

Suggested changeset 1
src/Microsoft.Health.Fhir.SqlServer/Features/Search/SqlServerSearchService.cs

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/src/Microsoft.Health.Fhir.SqlServer/Features/Search/SqlServerSearchService.cs b/src/Microsoft.Health.Fhir.SqlServer/Features/Search/SqlServerSearchService.cs
--- a/src/Microsoft.Health.Fhir.SqlServer/Features/Search/SqlServerSearchService.cs
+++ b/src/Microsoft.Health.Fhir.SqlServer/Features/Search/SqlServerSearchService.cs
@@ -824,43 +824,48 @@
                         {
                             executionStopwatch.Stop();
 
-                            try
+                            if (executionStopwatch.ElapsedMilliseconds > _longRunningThreshold.GetValue(_sqlRetryService) && _longRunningQueryDetails.IsEnabled(_sqlRetryService))
                             {
-                                if (executionStopwatch.ElapsedMilliseconds > _longRunningThreshold.GetValue(_sqlRetryService) && _longRunningQueryDetails.IsEnabled(_sqlRetryService))
-                                {
-                                    // Capture the query text BEFORE the connection closes
-                                    string queryTextSnapshot = sqlCommand.CommandText;
-                                    long executionTimeSnapshot = executionStopwatch.ElapsedMilliseconds;
-                                    int timeoutSnapshot = (int)_sqlServerDataStoreConfiguration.CommandTimeout.TotalSeconds;
+                                // Capture the query text BEFORE the connection closes
+                                string queryTextSnapshot = sqlCommand.CommandText;
+                                long executionTimeSnapshot = executionStopwatch.ElapsedMilliseconds;
+                                int timeoutSnapshot = (int)_sqlServerDataStoreConfiguration.CommandTimeout.TotalSeconds;
 
-                                    // Fire-and-forget: Log query details without blocking the response
-                                    _ = Task.Run(async () =>
+                                // Fire-and-forget: Log query details without blocking the response
+                                _ = Task.Run(async () =>
+                                {
+                                    using var loggingCts = new CancellationTokenSource(TimeSpan.FromSeconds(2));
+                                    try
                                     {
-                                        using var loggingCts = new CancellationTokenSource(TimeSpan.FromSeconds(2));
-                                        try
-                                        {
-                                            await LogQueryStoreByTextAsync(
-                                                queryTextSnapshot,
-                                                _logger,
-                                                timeoutSnapshot,
-                                                executionTimeSnapshot,
-                                                loggingCts.Token);
-                                        }
-                                        catch (Exception ex)
-                                        {
-                                            _logger.LogWarning(
-                                                "Long-running SQL ({ElapsedMilliseconds}ms). Query: {QueryText}. Query Store lookup failed for long-running query.",
-                                                executionTimeSnapshot,
-                                                queryTextSnapshot);
-                                            _logger.LogDebug(ex, "Query Store lookup failed for long-running query.");
-                                        }
-                                    });
-                                }
+                                        await LogQueryStoreByTextAsync(
+                                            queryTextSnapshot,
+                                            _logger,
+                                            timeoutSnapshot,
+                                            executionTimeSnapshot,
+                                            loggingCts.Token);
+                                    }
+                                    catch (OperationCanceledException)
+                                    {
+                                        // Logging timed out; ignore as this is best-effort.
+                                    }
+                                    catch (SqlException ex)
+                                    {
+                                        _logger.LogWarning(
+                                            "Long-running SQL ({ElapsedMilliseconds}ms). Query: {QueryText}. Query Store lookup failed for long-running query.",
+                                            executionTimeSnapshot,
+                                            queryTextSnapshot);
+                                        _logger.LogDebug(ex, "Query Store lookup failed for long-running query.");
+                                    }
+                                    catch (IOException ex)
+                                    {
+                                        _logger.LogWarning(
+                                            "Long-running SQL ({ElapsedMilliseconds}ms). Query: {QueryText}. Query Store lookup failed for long-running query due to I/O error.",
+                                            executionTimeSnapshot,
+                                            queryTextSnapshot);
+                                        _logger.LogDebug(ex, "Query Store lookup failed for long-running query due to I/O error.");
+                                    }
+                                });
                             }
-                            catch (Exception ex)
-                            {
-                                _logger.LogDebug(ex, "Failed to initiate long-running SQL query logging.");
-                            }
                         }
                     }
                 },
EOF
Copilot is powered by AI and may make mistakes. Always verify output.
Comment on lines +2009 to +2016
catch (Exception ex)
{
_logger.LogWarning(
"Long-running SQL ({ElapsedMilliseconds}ms). Query: {QueryText}. Query Store lookup failed for long-running query.",
executionTimeSnapshot,
queryTextSnapshot);
_logger.LogDebug(ex, "Query Store lookup failed for long-running query.");
}

Check notice

Code scanning / CodeQL

Generic catch clause Note

Generic catch clause.

Copilot Autofix

AI 2 days ago

In general, the fix is to replace broad catch (Exception) clauses with more specific exception types that represent expected, recoverable failures. Only those exceptions should be handled and suppressed; others should be allowed to propagate so they can be diagnosed and fixed.

Here, the outer try covers logic that checks elapsed time and, if necessary, schedules a background task to log query details. The main foreseeable exceptions are:

  • SqlException from interacting with SQL-related objects.
  • OperationCanceledException in case of cancellation (though the cancellation token here is internal, this is still benign to ignore in this logging context).

We should change the outer catch (Exception ex) at lines 2020–2022 to two specific catch blocks: one for SqlException and one for OperationCanceledException. Both should keep the existing debug logging behavior (same message and exception variable), so functionality is preserved: non-critical failures in logging still don't affect the main operation, but unexpected exceptions (like NullReferenceException) will now propagate instead of being silently swallowed.

Concretely, within src/Microsoft.Health.Fhir.SqlServer/Features/Search/SqlServerSearchService.cs, in the area around lines 1987–2023, we will:

  • Replace the single catch (Exception ex) with:
    • catch (SqlException ex)
    • catch (OperationCanceledException ex)
  • Keep the logging calls the same inside each catch.

No new imports are required; SqlException is already imported from Microsoft.Data.SqlClient, and OperationCanceledException is available via System.

Suggested changeset 1
src/Microsoft.Health.Fhir.SqlServer/Features/Search/SqlServerSearchService.cs

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/src/Microsoft.Health.Fhir.SqlServer/Features/Search/SqlServerSearchService.cs b/src/Microsoft.Health.Fhir.SqlServer/Features/Search/SqlServerSearchService.cs
--- a/src/Microsoft.Health.Fhir.SqlServer/Features/Search/SqlServerSearchService.cs
+++ b/src/Microsoft.Health.Fhir.SqlServer/Features/Search/SqlServerSearchService.cs
@@ -2017,10 +2017,14 @@
                                     });
                                 }
                             }
-                            catch (Exception ex)
+                            catch (SqlException ex)
                             {
                                 _logger.LogDebug(ex, "Failed to initiate long-running SQL query logging.");
                             }
+                            catch (OperationCanceledException ex)
+                            {
+                                _logger.LogDebug(ex, "Failed to initiate long-running SQL query logging.");
+                            }
                         }
                     }
                 },
EOF
@@ -2017,10 +2017,14 @@
});
}
}
catch (Exception ex)
catch (SqlException ex)
{
_logger.LogDebug(ex, "Failed to initiate long-running SQL query logging.");
}
catch (OperationCanceledException ex)
{
_logger.LogDebug(ex, "Failed to initiate long-running SQL query logging.");
}
}
}
},
Copilot is powered by AI and may make mistakes. Always verify output.
@apurvabhaleMS
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Azure Healthcare APIs Label denotes that the issue or PR is relevant to the FHIR service in the Azure Healthcare APIs Enhancement Enhancement on existing functionality. No-ADR ADR not needed No-PaaS-breaking-change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants