Search Param UrlLookup and TypeLookup mismatch fix #5386

Open
jestradaMS wants to merge 10 commits into main from users/jestrada/searchparamlookupobjectfix

jestradaMS commented Feb 12, 2026

Description

This pull request fixes search parameter consistency bugs and updates the database throughput and autoscaling configuration. The most significant change ensures that the same SearchParameterInfo instance is used throughout the codebase, preventing subtle bugs caused by race conditions. The pull request also adds Cosmos DB autoscale settings, increases the throughput configuration for Cosmos DB and SQL, and adjusts test settings for better reliability.

Search parameter consistency and bug fixes:

  • Ensured atomic creation and retrieval of SearchParameterInfo objects by switching to ConcurrentDictionary and using GetOrAdd, preventing race conditions and guaranteeing that UrlLookup and TypeLookup always reference the same instance.
  • Updated the BuildSearchParameterDefinition method and its recursive calls to use the shared uriDictionary, ensuring object instance consistency across search parameter lookups.
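
The GetOrAdd approach described in the bullets above can be illustrated with a minimal sketch. This is not the code in this PR: `SearchParameterInfo` below is a simplified stand-in for the real class, and `CanonicalLookup`, `Register`, `GetByUrl` and `GetByType` are hypothetical names chosen for illustration. The sketch assumes that one instance per URL should win and that every other lookup should store that same reference.

```csharp
using System;
using System.Collections.Concurrent;

// Simplified stand-in for the real SearchParameterInfo (assumption).
public sealed class SearchParameterInfo
{
    public SearchParameterInfo(string code, string url)
    {
        Code = code;
        Url = url;
    }

    public string Code { get; }

    public string Url { get; }
}

// Hypothetical helper; the class and method names are illustrative only.
public sealed class CanonicalLookup
{
    // URL -> the single canonical instance for that URL.
    private readonly ConcurrentDictionary<string, SearchParameterInfo> _urlLookup =
        new ConcurrentDictionary<string, SearchParameterInfo>(StringComparer.Ordinal);

    // (resource type, code) -> the same canonical instance.
    private readonly ConcurrentDictionary<(string ResourceType, string Code), SearchParameterInfo> _typeLookup =
        new ConcurrentDictionary<(string ResourceType, string Code), SearchParameterInfo>();

    public SearchParameterInfo Register(string resourceType, SearchParameterInfo candidate)
    {
        // GetOrAdd stores the candidate only if the URL is not present yet;
        // otherwise it returns the instance that is already stored.
        SearchParameterInfo canonical = _urlLookup.GetOrAdd(candidate.Url, candidate);

        // The type lookup always receives the canonical instance,
        // never the candidate that lost the race.
        _typeLookup[(resourceType, canonical.Code)] = canonical;
        return canonical;
    }

    public SearchParameterInfo GetByUrl(string url) => _urlLookup[url];

    public SearchParameterInfo GetByType(string resourceType, string code) =>
        _typeLookup[(resourceType, code)];
}
```

In this sketch, registering two distinct objects with the same URL returns the first one both times, so a caller that reads through either lookup sees the same reference. The two dictionaries are still updated in separate steps, so the pair is not updated atomically as a unit.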

Database configuration and autoscaling improvements:

  • Increased the initial Cosmos DB collection throughput from 1500 to 5000 in integration tests, and clarified that autoscale is now set in the ARM template (max 10,000 RU).
  • Added explicit autoscale settings for Cosmos DB in the ARM deployment template, setting maxThroughput to 10,000.
  • Increased the default SQL SKU capacity for non-Hyperscale tiers from 200 to 800 in the ARM deployment template.

Test reliability improvements:

  • Re-enabled a previously skipped bulk delete test that was disabled due to search parameter cache synchronization issues, reflecting improved cache consistency.
  • Increased polling delay in reindex job completion tests from 1 second to 5 seconds to reduce flakiness and improve reliability.

Related issues

Addresses AB#183574

Testing

Describe how this change was tested.

FHIR Team Checklist

  • Update the title of the PR to be succinct and less than 65 characters
  • Add a milestone to the PR for the sprint that it is merged (i.e. add S47)
  • Tag the PR with the type of update: Bug, Build, Dependencies, Enhancement, New-Feature or Documentation
  • Tag the PR with Open source, Azure API for FHIR (CosmosDB or common code) or Azure Healthcare APIs (SQL or common code) to specify where this change is intended to be released.
  • Tag the PR with Schema Version backward compatible or Schema Version backward incompatible or Schema Version unchanged if this adds or updates Sql script which is/is not backward compatible with the code.
  • When changing or adding behavior, if your code modifies the system design or changes design assumptions, please create and include an ADR.
  • CI is green before merge
  • Review squash-merge requirements

Semver Change (docs)

Patch|Skip|Feature|Breaking (reason)

jestradaMS requested a review from a team as a code owner February 12, 2026 23:20
jestradaMS changed the title from "[DO NOT REVIEW] UrlLookup and TypeLookup mismatch fix" to "[DO NOT REVIEW] Search Param UrlLookup and TypeLookup mismatch fix" Feb 12, 2026
jestradaMS changed the title from "[DO NOT REVIEW] Search Param UrlLookup and TypeLookup mismatch fix" to "Search Param UrlLookup and TypeLookup mismatch fix" Feb 12, 2026
jestradaMS added this to the FY26\Q3\2Wk\2Wk17 milestone Feb 13, 2026
jestradaMS added the Bug-Reliability, Azure API for FHIR, Azure Healthcare APIs, No-PaaS-breaking-change, and No-ADR labels Feb 13, 2026
jestradaMS commented

/azp run

azure-pipelines commented

Azure Pipelines successfully started running 1 pipeline(s).

@@ -357,30 +353,45 @@ private static HashSet<SearchParameterInfo> BuildSearchParameterDefinition(
var searchParameterDictionary = new ConcurrentDictionary<string, ConcurrentQueue<SearchParameterInfo>>();
foreach (SearchParameterInfo searchParam in results)

Consider renaming searchParam here instead of changing it in many places below.

// URL http://hl7.org/fhir/SearchParameter/Resource-type with type Special,
// while ResourceTypeSearchParameter uses the same URL with type Token.
// Choosing the wrong type causes parser failures for _type queries.
SearchParameterInfo canonicalParam = searchParam;

Consider using a name that more closely identifies where the search param comes from, such as searchParamFromUriLookup.

// while ResourceTypeSearchParameter uses the same URL with type Token.
// Choosing the wrong type causes parser failures for _type queries.
SearchParameterInfo canonicalParam = searchParam;
if (searchParam.Url != null &&

SergeyGaluzo commented Feb 13, 2026

Is searchParam valid at this point? If so, why do we check that Url is not null?

SergeyGaluzo commented Feb 14, 2026

How about changing our lookup data structures so that, by design, it is impossible to get different instances of SearchParameterInfo (SPI)?
Imagine that TypeLookup stores only the Uri string at the bottom level instead of an SPI. Whenever we need an SPI from TypeLookup, we would just do an extra lookup in UriLookup. TypeLookup and UriLookup could then never return inconsistent SPIs, because each SPI is "stored" in a single place by definition.
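
One way to picture this suggestion is the sketch below. It is only an illustration under stated assumptions: `SearchParameterInfo` is a simplified stand-in for the real class, `UrlKeyedSearchParameterCache` and its members are hypothetical names, and the real TypeLookup may be shaped differently. The sketch is not thread-safe; it only shows the data layout in which the type lookup holds URLs and the URL lookup is the single owner of instances.

```csharp
using System;
using System.Collections.Generic;

// Simplified stand-in for the real SearchParameterInfo (assumption).
public sealed class SearchParameterInfo
{
    public SearchParameterInfo(string code, string url)
    {
        Code = code;
        Url = url;
    }

    public string Code { get; }

    public string Url { get; }
}

// Hypothetical cache layout; not thread-safe, for illustration only.
public sealed class UrlKeyedSearchParameterCache
{
    // The only place where SearchParameterInfo instances are stored.
    private readonly Dictionary<string, SearchParameterInfo> _uriLookup =
        new Dictionary<string, SearchParameterInfo>(StringComparer.Ordinal);

    // resource type -> code -> URL (no SearchParameterInfo stored here).
    private readonly Dictionary<string, Dictionary<string, string>> _typeLookup =
        new Dictionary<string, Dictionary<string, string>>(StringComparer.Ordinal);

    public void Add(string resourceType, SearchParameterInfo info)
    {
        // Replacing the instance here is immediately visible through every
        // type-based lookup, because those only hold the URL.
        _uriLookup[info.Url] = info;

        if (!_typeLookup.TryGetValue(resourceType, out Dictionary<string, string> byCode))
        {
            byCode = new Dictionary<string, string>(StringComparer.Ordinal);
            _typeLookup[resourceType] = byCode;
        }

        byCode[info.Code] = info.Url;
    }

    public bool TryGetByUrl(string url, out SearchParameterInfo info) =>
        _uriLookup.TryGetValue(url, out info);

    public bool TryGetByType(string resourceType, string code, out SearchParameterInfo info)
    {
        info = null;

        // Two hops: type and code resolve to a URL, then the URL resolves to the instance.
        return _typeLookup.TryGetValue(resourceType, out Dictionary<string, string> byCode)
            && byCode.TryGetValue(code, out string url)
            && _uriLookup.TryGetValue(url, out info);
    }
}
```

The trade-off in this layout is one extra dictionary lookup per type-based read in exchange for having no second copy of the reference that could drift.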

SergeyGaluzo commented

This PR solves atomicity of cache writes for UriLookup. We still don't maintain the other two data structures (TypeLookup and the hash lookup) atomically with UriLookup. It would be interesting to know why the cache components get out of sync: is it racing writes or an incorrect write workflow? Please keep in mind that we most likely should remove racing writes and keep all updates single-directional and single-threaded (from the database to the cache), so that the logic is identical across all VMs.
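
A common pattern for the single-directional, single-writer update described above is to build a fresh immutable snapshot of all lookups and publish it with one reference swap. The sketch below is an assumption about how that could look, not a description of the current or planned implementation: `CacheSnapshot`, `SnapshotCache`, `Publish` and the row shape are hypothetical, and the lookups are reduced to string maps to keep the example short.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical immutable view of all lookups at one point in time.
public sealed class CacheSnapshot
{
    public CacheSnapshot(
        IReadOnlyDictionary<string, string> codeByUrl,
        IReadOnlyDictionary<string, string> urlByTypeAndCode)
    {
        CodeByUrl = codeByUrl;
        UrlByTypeAndCode = urlByTypeAndCode;
    }

    public IReadOnlyDictionary<string, string> CodeByUrl { get; }

    public IReadOnlyDictionary<string, string> UrlByTypeAndCode { get; }
}

public sealed class SnapshotCache
{
    private CacheSnapshot _current = new CacheSnapshot(
        new Dictionary<string, string>(),
        new Dictionary<string, string>());

    // Readers take one reference and then see a consistent set of lookups.
    public CacheSnapshot Current => Volatile.Read(ref _current);

    // Assumed to be called by a single refresh loop (database -> cache).
    public void Publish(IEnumerable<(string ResourceType, string Code, string Url)> rows)
    {
        var codeByUrl = new Dictionary<string, string>(StringComparer.Ordinal);
        var urlByTypeAndCode = new Dictionary<string, string>(StringComparer.Ordinal);

        foreach ((string ResourceType, string Code, string Url) row in rows)
        {
            codeByUrl[row.Url] = row.Code;
            urlByTypeAndCode[row.ResourceType + "/" + row.Code] = row.Url;
        }

        // All lookups become visible together through one reference write.
        Volatile.Write(ref _current, new CacheSnapshot(codeByUrl, urlByTypeAndCode));
    }
}
```

With this shape, a reader that captured `Current` before a refresh keeps a consistent older view, and no reader can observe one lookup updated while another is stale.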

}

- await Task.Delay(1000);
+ await Task.Delay(5000);

SergeyGaluzo commented Feb 14, 2026

Why is it bad to check every second?

$additionalProperties["SqlServer__DeleteAllDataOnStartup"] = "false"
$additionalProperties["SqlServer__AllowDatabaseCreation"] = "true"
$additionalProperties["CosmosDb__InitialDatabaseThroughput"] = 1500
# Cosmos DB autoscale is configured in the ARM template (10,000 RU max)

Does this mean we effectively increased power 6x, or did the 1500 setting not take effect previously?

"azureContainerRegistryName": "[concat(substring(replace(variables('serviceName'), '-', ''), 0, min(11, length(replace(variables('serviceName'), '-', '')))), uniquestring(resourceGroup().id, variables('serviceName')))]",
"isSqlHyperscaleTier": "[equals(parameters('sqlDatabaseComputeTier'),'Hyperscale')]",
- "sqlSkuCapacity": "[if(variables('isSqlHyperscaleTier'), 2, 200)]",
+ "sqlSkuCapacity": "[if(variables('isSqlHyperscaleTier'), 2, 800)]",

This looks like a 4x increase in compute power. How did the lack of power manifest?
