Optimize multicast stubs by MichalPetryka · Pull Request #130207 · dotnet/runtime

MichalPetryka · 2026-07-04T22:36:04Z

Rewrites multicast stubs as byref loops to remove bounds checks and covariance helpers from every iteration.

This assumes the wrapper struct is the same size as a ref; is that assumption fine for the VM? @jkotas @MichalStrehovsky
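
For readers unfamiliar with the pattern, here is a minimal C# sketch of an indexed loop versus a byref loop. It is only an illustration, not the stub's IL: `ByrefLoopSketch`, `InvokeIndexed`, and `InvokeByref` are hypothetical names, and a plain `Action[]` stands in for the real invocation list.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical illustration only; the real stub is generated IL, not C#.
public static class ByrefLoopSketch
{
    // Indexed form: count is a separate value, so the JIT generally cannot
    // prove i < targets.Length and keeps a range check on each targets[i].
    public static void InvokeIndexed(Action[] targets, int count)
    {
        for (int i = 0; i < count; i++)
            targets[i]();
    }

    // Byref form: validate the range once, then walk a managed reference
    // from the first element up to (but not including) element `count`.
    public static void InvokeByref(Action[] targets, int count)
    {
        if ((uint)count > (uint)targets.Length)
            throw new ArgumentOutOfRangeException(nameof(count));

        ref Action current = ref MemoryMarshal.GetArrayDataReference(targets);
        ref Action end = ref Unsafe.Add(ref current, count);

        while (Unsafe.IsAddressLessThan(ref current, ref end))
        {
            current();                                 // invoke the current target
            current = ref Unsafe.Add(ref current, 1);  // advance by one element
        }
    }
}
```

Whether the second form is actually faster depends on the code layout the JIT produces, so it needs measuring on each target architecture.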

MichalPetryka · 2026-07-04T22:40:34Z

@EgorBot -arm -amd --envvars DOTNET_JitDisasm:IL_STUB_MulticastDelegate_Invoke

using System;
using BenchmarkDotNet.Attributes;

public class MyBenchmarks
{
    public Action a;

    [GlobalSetup]
    public void Setup()
    {
        for (int i = 0; i < 10000; i++)
            a += new Action(() => {});
    }

    [Benchmark]
    public void Bench() => a();
}

MichalPetryka · 2026-07-04T23:33:33Z

Hmm, the asm seems better but the arm perf is much worse. I assume it's because the JIT is ordering blocks wrong, which causes branch mispredictions.

EDIT: the branch order is also different from what I get locally. Is the bot using any weird settings? @EgorBo

jkotas · 2026-07-05T01:20:51Z

> This assumes the wrapper struct is the same size as a ref; is that assumption fine for the VM

These sorts of Unsafe.As UB casts can confuse optimizations that depend on accurate type information. I think we should rather be fixing the few places where we still do them in CoreLib instead of introducing new ones.

> the arm perf is much worse,

We have seen a number of cases where unsafe code "optimizations" result in worse performance. It sounds like this is another one of those.

> covariance helpers from every iteration.

What are the covariant helpers that this is eliminating?

> for (int i = 0; i < 10000; i++)
>     a += new Action(() => {});

The typical multicast delegate has very few targets, and the targets are typically different. This is not a very representative microbenchmark.
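
A benchmark closer to that description might look like the sketch below. The class name, the four target methods, and the `[Params]` values are assumptions made for illustration; it relies on the BenchmarkDotNet package.

```csharp
using System;
using BenchmarkDotNet.Attributes;

// Hypothetical variant: a handful of targets, each a different method.
public class FewTargetsBenchmarks
{
    private Action a;
    public int Counter;

    // Small invocation list sizes (illustrative values).
    [Params(2, 4)]
    public int TargetCount;

    private void T0() => Counter += 1;
    private void T1() => Counter += 2;
    private void T2() => Counter += 3;
    private void T3() => Counter += 4;

    [GlobalSetup]
    public void Setup()
    {
        Action[] pool = { T0, T1, T2, T3 };
        a = null;
        for (int i = 0; i < TargetCount; i++)
            a += pool[i % pool.Length];   // distinct method per slot
    }

    [Benchmark]
    public void Bench() => a();
}
```

With so few targets per call, fixed per-invocation overhead dominates, so differences in the loop body may be harder to see than in the 10000-target case.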

MichalPetryka · 2026-07-05T01:49:40Z

@EgorBot -arm -amd --envvars DOTNET_JitDisasm:IL_STUB_MulticastDelegate_Invoke

using System;
using BenchmarkDotNet.Attributes;

public class MyBenchmarks
{
    public Action a;

    [GlobalSetup]
    public void Setup()
    {
        for (int i = 0; i < 10000; i++)
            a += new Action(() => {});
    }

    [Benchmark]
    public void Bench() => a();
}

MichalPetryka · 2026-07-05T01:56:39Z

> These sorts of Unsafe.As UB casts can confuse optimizations that depend on accurate type information. I think we should rather be fixing the few places where we still do them in CoreLib instead of introducing new ones.

It's not UB here since the As is on the ref, not on the object. It's just layout reliant. If desired, I can switch the code to fetch the sizeof of the struct and use ldfld to avoid assuming the field is at offset 0.
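
To make the two options concrete, a rough C# sketch follows. `Wrapper` here is a hypothetical stand-in with a single reference field, not the actual CoreLib type, and the helper names are made up.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical wrapper: one reference field and nothing else.
public readonly struct Wrapper
{
    public readonly Action Value;
    public Wrapper(Action value) => Value = value;
}

public static class LayoutSketch
{
    // The layout assumption in question: the struct is exactly one ref wide.
    public static bool WrapperIsRefSized() =>
        Unsafe.SizeOf<Wrapper>() == IntPtr.Size;

    // Layout-reliant read: reinterprets the byref to the struct as a byref
    // to its field. Only valid while the field sits at offset 0.
    public static Action ReadViaReinterpret(ref Wrapper w) =>
        Unsafe.As<Wrapper, Action>(ref w);

    // Layout-independent read: an ordinary field load (ldfld in IL).
    public static Action ReadViaField(ref Wrapper w) => w.Value;
}
```

The field-load version keeps accurate type information at the cost of stepping through the array by the struct's size rather than by the size of a ref.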

> We have seen a number of cases where unsafe code "optimizations" result in worse performance. It sounds like this is another one of those.

I assume the bad perf is from the horrible layout from the JIT caused by the debugging helper check which is predicted to be hot. Does the code need to recheck this on every iteration?

> What are the covariant helpers that this is eliminating?

I misread the loop as doing ldelema, it's just a bounds check, not a covariance helper.

> The typical multicast delegate has very few targets, and the targets are typically different. This is not a very representative microbenchmark.

The targets being different shouldn't matter much outside of CPU branch prediction here, and using identical targets makes results more stable. This should also not scale badly with size; I assume this'd be faster even with just 2 delegates if the JIT used a proper layout.

The actual diff of the loop, excluding the debugging call is this:

huoyaoyuan · 2026-07-05T06:40:09Z

-#endif // DEBUGGING_SUPPORTED
+
+        ILCodeLabel *realLoopStart = pCode->NewCodeLabel();
+        pCode->EmitBEQ(realLoopStart);


This doesn't look optimal for branch prediction (a forward conditional jump on the hot path). The original logic was constructed to optimize this.

Optimize multicast stubs

364e7eb

MichalPetryka requested a review from MichalStrehovsky as a code owner July 4, 2026 22:36

github-actions Bot added the area-VM-coreclr label Jul 4, 2026

EgorBot mentioned this pull request Jul 4, 2026

Benchmarks for dotnet/runtime#130207 (for @MichalPetryka) EgorBot/Benchmarks#284

Open

Avoid weird IL

9290fe1

EgorBot mentioned this pull request Jul 5, 2026

Benchmarks for dotnet/runtime#130207 (for @MichalPetryka) EgorBot/Benchmarks#285

Open

huoyaoyuan reviewed Jul 5, 2026

Optimize multicast stubs #130207
MichalPetryka wants to merge 2 commits into dotnet:main from MichalPetryka:multicast-invoke

3 participants
