feat(anr): Profile main thread when ANR and report ANR profiles to Sentry#4899

Open
markushi wants to merge 35 commits into main from markushi/feat/anr-profiling

Conversation

Member

@markushi markushi commented Nov 12, 2025

📜 Description

Adds ANR (Application Not Responding) profiling integration that profiles the main thread when an ANR is detected and reports the captured profiles to Sentry.

Key Changes:

  • New AnrProfilingIntegration to capture profiles during ANR events
  • AnrV2Integration now takes care of matching and capturing the profile on the next start
  • If the captured ANR event only contains system frames, a static fingerprint will get set, effectively changing the grouping behavior to group all noisy ANRs into a single issue
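The grouping change in the last bullet can be illustrated with a minimal sketch. Everything here is hypothetical and stands in for SDK types: `AnrEventSketch`, `SystemFrameGrouping`, the prefix list, and the fingerprint value `system-frames-only-anr` are assumptions for illustration, not the names or values used in the PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for an ANR event; not the SDK's event class. */
final class AnrEventSketch {
  final List<String> frameModules; // one module (class) name per stack frame
  List<String> fingerprint = new ArrayList<>();

  AnrEventSketch(final List<String> frameModules) {
    this.frameModules = frameModules;
  }
}

final class SystemFrameGrouping {
  // Assumed prefixes, for illustration only.
  private static final List<String> SYSTEM_PREFIXES =
      Arrays.asList("android.", "androidx.", "java.", "kotlin.", "com.android.");

  static boolean isSystemFrame(final String module) {
    for (final String prefix : SYSTEM_PREFIXES) {
      if (module.startsWith(prefix)) {
        return true;
      }
    }
    return false;
  }

  /** True when no frame belongs to application code (also true for an empty stack). */
  static boolean hasOnlySystemFrames(final AnrEventSketch event) {
    for (final String module : event.frameModules) {
      if (!isSystemFrame(module)) {
        return false;
      }
    }
    return true;
  }

  /** Assigns one static fingerprint so all system-only ANRs group into a single issue. */
  static void applyGrouping(final AnrEventSketch event, final boolean anrProfilingEnabled) {
    if (anrProfilingEnabled && hasOnlySystemFrames(event)) {
      event.fingerprint = Arrays.asList("system-frames-only-anr"); // hypothetical value
    }
  }
}
```

Events that contain at least one application frame keep an empty fingerprint in this sketch, so default grouping would still apply to them.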

💡 Motivation and Context

This feature enables better ANR diagnostics by capturing profiling data at the time of ANR detection, allowing developers to identify performance bottlenecks and problematic code paths causing application hangs.

Example event: https://sentry-sdks.sentry.io/issues/7229210096/events/4598ff6fcc0f402d8ecca615005e7f64/

💚 How did you test it?

  • Added tests

📝 Checklist

  • I added GH Issue ID & Linear ID
  • I added tests to verify the changes.
  • No new PII added or SDK only sends newly added PII if sendDefaultPII is enabled.
  • I updated the docs if needed.
  • I updated the wizard if needed.
  • Review from the native team if needed.
  • No breaking change or entry added to the changelog.
  • No breaking change for hybrid SDKs or communicated to hybrid SDKs.

🔮 Next steps

Contributor

github-actions bot commented Nov 12, 2025

Messages
📖 Do not forget to update Sentry-docs with your feature once the pull request gets approved.

Generated by 🚫 dangerJS against 2ff62db

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Contributor

github-actions bot commented Nov 12, 2025

Performance metrics 🚀

| | Plain | With Sentry | Diff |
|---|---|---|---|
| Startup time | 362.67 ms | 431.71 ms | 69.04 ms |
| Size | 1.58 MiB | 2.29 MiB | 725.26 KiB |

Baseline results on branch: main

Startup times

| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| 9ea89e8 | 308.06 ms | 358.16 ms | 50.10 ms |
| d15471f | 310.26 ms | 377.04 ms | 66.78 ms |
| d364ace | 382.77 ms | 443.21 ms | 60.44 ms |
| 806307f | 357.85 ms | 424.64 ms | 66.79 ms |
| 17a0955 | 372.53 ms | 446.70 ms | 74.17 ms |
| 6edfca2 | 316.43 ms | 398.90 ms | 82.46 ms |
| d15471f | 343.13 ms | 361.47 ms | 18.34 ms |
| 319f256 | 317.53 ms | 370.83 ms | 53.29 ms |
| 9fbb112 | 359.71 ms | 421.85 ms | 62.14 ms |
| 539ca63 | 313.51 ms | 355.43 ms | 41.92 ms |

App size

| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| 9ea89e8 | 1.58 MiB | 2.28 MiB | 716.23 KiB |
| d15471f | 1.58 MiB | 2.13 MiB | 559.54 KiB |
| d364ace | 1.58 MiB | 2.11 MiB | 539.75 KiB |
| 806307f | 1.58 MiB | 2.10 MiB | 533.42 KiB |
| 17a0955 | 1.58 MiB | 2.10 MiB | 533.20 KiB |
| 6edfca2 | 1.58 MiB | 2.13 MiB | 559.07 KiB |
| d15471f | 1.58 MiB | 2.13 MiB | 559.54 KiB |
| 319f256 | 1.58 MiB | 2.19 MiB | 619.79 KiB |
| 9fbb112 | 1.58 MiB | 2.11 MiB | 539.18 KiB |
| 539ca63 | 1.58 MiB | 2.12 MiB | 551.41 KiB |

Previous results on branch: markushi/feat/anr-profiling

Startup times

| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| 00299fd | 359.87 ms | 424.85 ms | 64.98 ms |
| fa76e86 | 274.32 ms | 349.63 ms | 75.31 ms |
| eb7143a | 347.66 ms | 408.54 ms | 60.88 ms |
| c10e603 | 367.92 ms | 393.50 ms | 25.58 ms |
| 4c0ffee | 314.94 ms | 377.79 ms | 62.86 ms |
| 83a9ec4 | 333.84 ms | 390.30 ms | 56.47 ms |
| 31581b9 | 350.00 ms | 420.63 ms | 70.63 ms |
| ddbbe91 | 289.51 ms | 359.74 ms | 70.23 ms |
| fca8df8 | 326.79 ms | 379.69 ms | 52.90 ms |
| 2cee1ab | 318.29 ms | 361.00 ms | 42.71 ms |

App size

| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| 00299fd | 1.58 MiB | 2.29 MiB | 723.50 KiB |
| fa76e86 | 1.58 MiB | 2.29 MiB | 724.06 KiB |
| eb7143a | 1.58 MiB | 2.29 MiB | 724.12 KiB |
| c10e603 | 1.58 MiB | 2.29 MiB | 723.72 KiB |
| 4c0ffee | 1.58 MiB | 2.29 MiB | 723.67 KiB |
| 83a9ec4 | 1.58 MiB | 2.29 MiB | 723.99 KiB |
| 31581b9 | 1.58 MiB | 2.19 MiB | 624.94 KiB |
| ddbbe91 | 1.58 MiB | 2.29 MiB | 724.15 KiB |
| fca8df8 | 1.58 MiB | 2.29 MiB | 723.68 KiB |
| 2cee1ab | 1.58 MiB | 2.29 MiB | 723.68 KiB |

@markushi markushi marked this pull request as draft December 3, 2025 07:19
Member Author

@sentry review

@markushi markushi marked this pull request as ready for review December 17, 2025 09:51
return;
}

if (options.isEnableAnrProfiling() && hasOnlySystemFrames(event)) {
Member

wondering whether we should guard this behind isEnableAnrProfiling or actually just do it for all ANRs going forward? I think AEI also gives quite a lot of noise with just system frames, right?

Member Author

@markushi markushi Feb 12, 2026

Yes, exactly - we should do this everywhere in the long run. I still would guard this right now - otherwise we'll have a breaking change in default behavior. I can create a follow up ticket for the next major.

Member

I think we've done this quite a lot in the past (breaking behaviour change that affects grouping) so I'm not opposed to doing this now and not wait for the next major 😅 If it improves things I think I'd rather do it sooner. But your call here (we could also wait to get some adoption and see how it performs before doing this for everyone)

options);
chunk.setSentryProfile(profile);

final SentryId profilerId = Sentry.getCurrentScopes().captureProfileChunk(chunk);
Member

just curious, but is there a way to send both the ANR event and the profile chunk in the same envelope?

Member Author

claude says yes, this should be valid for the backend😅

Yes, this works. The client can send both in the same envelope.
During split_envelope() at relay-server/src/services/processor.rs:357-365, Relay splits them apart before any event-type routing:

  1. ProfileChunk items are extracted into their own ProcessingGroup::ProfileChunk envelope
  2. The remaining error event goes into ProcessingGroup::Error
  3. Both are processed independently through their respective pipelines

We'd need to attach the profile directly to the event (or via a hint) and then do something similar to what we do for, e.g., attachments:

    if (attachments != null) {
      for (final Attachment attachment : attachments) {
        final SentryEnvelopeItem attachmentItem =
            SentryEnvelopeItem.fromAttachment(
                options.getSerializer(),
                options.getLogger(),
                attachment,
                options.getMaxAttachmentSize());
        envelopeItems.add(attachmentItem);
      }
    }

Member

yeah, or replay_recording, too.

I guess this will be better in terms of not sending one without the other, but I don't have a strong preference right now, we can probably create a followup issue if you don't feel like doing it now.
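As a rough sketch of the single-envelope idea discussed in this thread: the types below (`EnvelopeItemSketch`, `EnvelopeBuilderSketch`) are hypothetical stand-ins, not SDK classes, and the item type strings `event` and `profile_chunk` are assumed here for illustration. The point is only that the profile chunk is appended next to the event in the same item list, the way attachments are.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical envelope item: a type tag plus a serialized payload. */
final class EnvelopeItemSketch {
  final String type;
  final byte[] payload;

  EnvelopeItemSketch(final String type, final byte[] payload) {
    this.type = type;
    this.payload = payload;
  }
}

final class EnvelopeBuilderSketch {
  /**
   * Builds the item list for one envelope. The profile chunk is optional, so an
   * ANR without a matching profile still produces a valid single-item envelope.
   */
  static List<EnvelopeItemSketch> buildItems(
      final byte[] eventPayload, final byte[] profileChunkPayload) {
    final List<EnvelopeItemSketch> items = new ArrayList<>();
    items.add(new EnvelopeItemSketch("event", eventPayload));
    if (profileChunkPayload != null) {
      // Same pattern as attachments: one extra item in the same envelope.
      items.add(new EnvelopeItemSketch("profile_chunk", profileChunkPayload));
    }
    return items;
  }
}
```

Sending both items together means neither can be delivered without the other, which is the benefit raised above.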

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

try (final @NotNull ISentryLifecycleToken ignored = profileManagerLock.acquire()) {
final @Nullable AnrProfileManager p = profileManager;
if (p != null) {
p.close();
Member

hm, I think this one does I/O under the hood right (raf.close())? We should probably move it out of the lock here (since we capture it in the p variable) and also offload to a bg thread perhaps?
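One way the suggestion could look, as a minimal sketch: the reference is swapped out while holding the lock, and the potentially slow `close()` runs afterwards on a background executor. `DeferredCloseSketch` and its fields are hypothetical names, and a plain `ReentrantLock` stands in for the SDK's own lock type.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.Executor;
import java.util.concurrent.locks.ReentrantLock;

final class DeferredCloseSketch {
  private final ReentrantLock lock = new ReentrantLock();
  private final Executor background;
  private Closeable resource; // guarded by lock

  DeferredCloseSketch(final Closeable resource, final Executor background) {
    this.resource = resource;
    this.background = background;
  }

  void close() {
    final Closeable toClose;
    lock.lock();
    try {
      // Only the cheap reference swap happens while the lock is held.
      toClose = resource;
      resource = null;
    } finally {
      lock.unlock();
    }
    if (toClose != null) {
      // The I/O-bound close runs outside the lock and off the calling thread.
      background.execute(
          () -> {
            try {
              toClose.close();
            } catch (IOException e) {
              // A real implementation would log this; the sketch ignores it.
            }
          });
    }
  }
}
```

Because the field is cleared under the lock, a second `close()` call sees `null` and does nothing, so the resource is closed at most once.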

oldThread.interrupt();
}

final @NotNull Thread profilingThread = new Thread(this, "AnrProfilingIntegration");
Member

@romtsn romtsn Feb 13, 2026

can we actually reuse the thread as opposed to creating a new one on every foreground? like checking the inForeground flag inside the run method in addition to isInterrupted?

also not sure if it makes any difference on android as to whether to make it isDaemon or not

// get main thread Handler so we can post messages
final Looper mainLooper = Looper.getMainLooper();
final Thread mainThread = mainLooper.getThread();
final Handler mainHandler = new Handler(mainLooper);
Member

I guess we could memoize this and capture it in register if anr profiling is enabled? Would avoid allocating a new handler on every run

static final int MAX_NUM_STACKS = (int) (10_000 / POLLING_INTERVAL_MS);

private final AtomicBoolean enabled = new AtomicBoolean(true);
private final Runnable updater = () -> lastMainThreadExecutionTime = SystemClock.uptimeMillis();
Member

I realized we use AndroidCurrentDateProvider instead of SystemClock.uptimeMillis directly, but I'm fine with using SystemClock directly since you can override it in tests easily
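For comparison, the provider-style indirection mentioned here could look roughly like the sketch below, where the time source is injected instead of called statically. `MainThreadWatchdogSketch`, `heartbeat()`, and the threshold parameter are hypothetical names for illustration; on Android the supplier could simply be `SystemClock::uptimeMillis`.

```java
import java.util.function.LongSupplier;

final class MainThreadWatchdogSketch {
  private final LongSupplier uptimeMillis; // injected clock, replaceable in tests
  private final long thresholdMs;
  private volatile long lastMainThreadExecutionTime;

  MainThreadWatchdogSketch(final LongSupplier uptimeMillis, final long thresholdMs) {
    this.uptimeMillis = uptimeMillis;
    this.thresholdMs = thresholdMs;
    this.lastMainThreadExecutionTime = uptimeMillis.getAsLong();
  }

  /** Would be posted to the main thread; records that it is still processing messages. */
  void heartbeat() {
    lastMainThreadExecutionTime = uptimeMillis.getAsLong();
  }

  /** Called from the polling thread: has the main thread been silent for too long? */
  boolean isMainThreadStalled() {
    return uptimeMillis.getAsLong() - lastMainThreadExecutionTime > thresholdMs;
  }
}
```

A test can then advance a fake clock deterministically instead of sleeping, which is the same benefit the comment attributes to overriding `SystemClock` in tests.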

Comment on lines +179 to +180
final long duration = SystemClock.uptimeMillis() - start;
if (logger.isEnabled(SentryLevel.DEBUG)) {
Member

Suggested change
- final long duration = SystemClock.uptimeMillis() - start;
- if (logger.isEnabled(SentryLevel.DEBUG)) {
+ if (logger.isEnabled(SentryLevel.DEBUG)) {
+ final long duration = SystemClock.uptimeMillis() - start;

Member

btw, does mainThread.getStackTrace() affect the main thread? or is this log just for debugging purposes?

if (cacheDirPath == null) {
throw new IllegalStateException("cacheDirPath is required for ANR profiling");
}
final @NotNull File currentFile =
Member

alright, I think I found a way this will cause ANRs (similar to replay) - if close() is called from the main thread, we'll be holding the lock, awaiting on the operations here to finish, which can take a while due to I/O in getFileForRecording and AnrProfileManager.<init>.

I guess to make it non-blocking we'd still have to make QueueFile instantiation lazy and move getFileForRecording inside the lazy block, wdyt?
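A minimal sketch of the lazy approach, assuming a generic holder rather than the PR's actual classes: `LazySketch` is a hypothetical name, and the supplier stands in for the expensive work (`getFileForRecording` plus the `QueueFile` construction). The factory only runs on the first `get()`, which would happen on the profiling thread, while `peek()` lets a caller such as `close()` check for an existing instance without triggering or waiting for I/O.

```java
import java.util.function.Supplier;

final class LazySketch<T> {
  private final Supplier<T> factory;
  private volatile T value;

  LazySketch(final Supplier<T> factory) {
    this.factory = factory;
  }

  /** Creates the value on first use; later calls return the cached instance. */
  T get() {
    T local = value;
    if (local == null) {
      synchronized (this) {
        local = value;
        if (local == null) {
          local = factory.get(); // expensive I/O would happen here, once
          value = local;
        }
      }
    }
    return local;
  }

  /** Never blocks and never triggers creation; returns null if nothing was created yet. */
  T peek() {
    return value;
  }
}
```

With this shape, a `close()` on the main thread could call `peek()` and return immediately when no file was ever opened.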

Member

@romtsn romtsn left a comment

Great work already! I believe there are some things to address but I can check once more after that


final Mechanism mechanism = new Mechanism();
mechanism.setType("ANR");
final ExceptionMechanismException error =
Member

I think this one turns out to be unused, shall we pass it to getSentryExceptions?

Comment on lines +942 to 952
if (module != null && !AnrCulpritIdentifier.isSystemFrame(module)) {
return false;
}
}
}
}
}
return true;
}
}
}

Bug: ANR profile frames use a hardcoded list to detect system frames, ignoring user-configured inAppIncludes. This can lead to incorrect ANR grouping.
Severity: MEDIUM

Suggested Fix

Ensure profile-derived stack frames have the inApp flag set correctly by using the same logic as normal stack frames, which respects the user's inAppIncludes and inAppExcludes options. Alternatively, update hasOnlySystemFrames() to explicitly check profile-derived frames against the user's configuration instead of relying on the hardcoded list in AnrCulpritIdentifier.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location:
sentry-android-core/src/main/java/io/sentry/android/core/ApplicationExitInfoEventProcessor.java#L926-L952

Potential issue: The method `hasOnlySystemFrames()` uses inconsistent logic to identify
system frames for ANR profiles. For normal stack traces, it respects user-configured
`options.getInAppIncludes()` and `options.getInAppExcludes()`. However, for
profile-derived frames, the `inApp` flag is not set, causing the logic to fall back to
`AnrCulpritIdentifier.isSystemFrame()`, which uses a hardcoded list of system packages.
This discrepancy means if an application uses custom packages defined in
`inAppIncludes`, ANRs originating from that code will be incorrectly classified as
system-only ANRs, leading to improper event grouping in Sentry and potentially obscuring
important application-specific ANRs.
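The second suggested fix could be sketched as follows, under stated assumptions: `InAppAwareSystemFrameCheck` is a hypothetical class, the fallback prefix list is illustrative rather than the one in `AnrCulpritIdentifier`, and the precedence (includes first, then excludes, then the hardcoded fallback) is an assumption about how the user configuration would be consulted.

```java
import java.util.Arrays;
import java.util.List;

final class InAppAwareSystemFrameCheck {
  // Illustrative fallback list, used only when the user configuration is silent.
  private static final List<String> FALLBACK_SYSTEM_PREFIXES =
      Arrays.asList("android.", "androidx.", "java.", "kotlin.");

  private final List<String> inAppIncludes;
  private final List<String> inAppExcludes;

  InAppAwareSystemFrameCheck(
      final List<String> inAppIncludes, final List<String> inAppExcludes) {
    this.inAppIncludes = inAppIncludes;
    this.inAppExcludes = inAppExcludes;
  }

  /** Returns true when the module should be treated as a system frame. */
  boolean isSystemFrame(final String module) {
    if (startsWithAny(module, inAppIncludes)) {
      return false; // the user explicitly marked this package as in-app
    }
    if (startsWithAny(module, inAppExcludes)) {
      return true; // the user explicitly marked this package as not in-app
    }
    return startsWithAny(module, FALLBACK_SYSTEM_PREFIXES);
  }

  private static boolean startsWithAny(final String module, final List<String> prefixes) {
    for (final String prefix : prefixes) {
      if (module.startsWith(prefix)) {
        return true;
      }
    }
    return false;
  }
}
```

In this sketch a package listed in `inAppIncludes` is never counted as a system frame, even if it also matches a hardcoded prefix, so an ANR originating there would not be folded into the system-only group.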

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

value.serialize(dos);
dos.flush();
}
sink.flush();

toStream closes caller-owned stream then flushes it

Low Severity

The toStream converter wraps sink in a DataOutputStream try-with-resources, which closes both dos and the underlying sink when the block exits. The sink.flush() call afterward operates on an already-closed stream. This works only because the actual sink from FileObjectQueue is a ByteArrayOutputStream where close() is a no-op, but the pattern violates stream ownership conventions and the post-close flush is dead code.
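A minimal sketch of the ownership-respecting pattern, assuming a hypothetical `StreamOwnershipSketch.writeTo` helper that writes a single int instead of the PR's `value.serialize(dos)`: the `DataOutputStream` wrapper is deliberately not placed in try-with-resources, so the caller's sink stays open, and one flush on the wrapper is enough because it propagates to the underlying stream.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

final class StreamOwnershipSketch {
  /** Writes the value to the sink without closing it; the caller keeps ownership. */
  static void writeTo(final int value, final OutputStream sink) {
    // No try-with-resources here: closing the wrapper would also close the sink.
    final DataOutputStream dos = new DataOutputStream(sink);
    try {
      dos.writeInt(value);
      dos.flush(); // flushing the wrapper also flushes the underlying sink
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

The wrapper holds no resources of its own beyond the sink, so leaving it unclosed does not leak anything; the caller closes the sink when it is done with it.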
