Bug: run stays stuck in running when all jobs of a fail-fast: false matrix fail #38333

Description

@bircni

When every job of a fail-fast: false matrix ends in failure, the overall workflow run stays stuck in the running state indefinitely. It never transitions to failure.

Additionally:

  • Cancelling the workflow has no effect — it stays running.
  • Restarting Gitea or the runner does not clear it — the state is persisted.

Gitea Version: 1.27.0-rc0
Runner Version: 2.0

How to reproduce

Run a workflow with a fail-fast: false matrix and make every matrix leg fail (for example, with all of the checks below failing):

name: pr

on:
  pull_request:
    branches:
      - main

jobs:
  checks:
    runs-on: container
    strategy:
      fail-fast: false
      matrix:
        check:
          - eslint
          - prettier
          - tsgo
          - vitest
          - build
    steps:
      - uses: actions/checkout@v7

      - uses: actions/setup-bun@v2
        with:
          bun-version: "1.3.14"

      - uses: actions/cache@v6
        with:
          path: |
            node_modules
            ~/.bun/install/cache
          key: bun-${{ runner.os }}-${{ hashFiles('bun.lock') }}

      - run: bun install

      - name: Lint
        if: matrix.check == 'eslint'
        run: bun lint

      - name: Format check
        if: matrix.check == 'prettier'
        run: bun fmt --check

      - name: Type check
        if: matrix.check == 'tsgo'
        run: bun tsgo -b

      - name: Test
        if: matrix.check == 'vitest'
        run: bun vitest --run

      - name: Build
        id: build
        if: matrix.check == 'build'
        run: bun bundle

Expected behavior

Once all matrix jobs have reported a terminal result, the run status should aggregate to failure.
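
To make the expected rule concrete, here is a minimal sketch of the aggregation I would expect. It is an illustrative model only, not Gitea's actual AggregateJobStatus: the `Status` enum, its member names, the precedence order, and the `aggregate` function are all assumptions made for this example.

```python
from enum import Enum


class Status(Enum):
    # Hypothetical status names, chosen for illustration only.
    WAITING = "waiting"
    RUNNING = "running"
    SUCCESS = "success"
    FAILURE = "failure"
    CANCELLED = "cancelled"
    SKIPPED = "skipped"


# Statuses from which a job can no longer change.
TERMINAL = frozenset(
    {Status.SUCCESS, Status.FAILURE, Status.CANCELLED, Status.SKIPPED}
)


def aggregate(job_statuses):
    """Fold per-job statuses into one run status (assumed rule, simplified)."""
    if not job_statuses:
        raise ValueError("a run needs at least one job")
    # Any job that has not finished keeps the run in progress.
    if any(s not in TERMINAL for s in job_statuses):
        return Status.RUNNING
    # Every job is terminal from here on, so the run must be terminal too.
    if any(s is Status.FAILURE for s in job_statuses):
        return Status.FAILURE
    if any(s is Status.CANCELLED for s in job_statuses):
        return Status.CANCELLED
    return Status.SUCCESS


# Five matrix legs, all failed: the run should end up as failure.
print(aggregate([Status.FAILURE] * 5).value)
```

The point of the sketch is that the "all jobs terminal" branch must always yield a terminal run status, whether or not any job succeeded.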

Actual behavior

The run remains running forever and cannot be cancelled.

Analysis / likely cause

This appears to be a server-side status aggregation problem rather than a runner problem. act_runner only handles a single task per matrix leg and reports each leg's terminal result (failure) back via UpdateTask; aggregating the individual job results into the overall run status happens in Gitea (AggregateJobStatus in models/actions). Evidence that the runner is not at fault:

  • All legs reported failure, yet the run never leaves running.
  • Cancellation (a server-side operation) doesn't clear it.
  • Restarting the runner has no effect — the stuck state is persisted in the DB.

This is likely an edge case in the aggregation where the "all jobs finished" terminal transition is not reached when none of the jobs succeeded (all ended in failure). Originally reported at https://gitea.com/gitea/runner/issues/1072.
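
The inconsistent state described above can be expressed as a simple invariant check. This is a hedged sketch for reasoning about the bug, not code from Gitea: the status strings and the `is_stuck` helper are hypothetical, and the real status values stored in the database may differ.

```python
# Hypothetical set of terminal status names (assumption for this sketch).
TERMINAL_STATUSES = frozenset({"success", "failure", "cancelled", "skipped"})


def is_stuck(run_status, job_statuses):
    """Return True when every job is terminal but the run itself is not."""
    if not job_statuses:
        # A run without jobs cannot be judged by this check.
        return False
    all_jobs_done = all(s in TERMINAL_STATUSES for s in job_statuses)
    return all_jobs_done and run_status not in TERMINAL_STATUSES


# The state from this report: five failed legs, run still "running".
print(is_stuck("running", ["failure"] * 5))
```

A check of this shape, run against the persisted run and job rows, would flag the run from this report while leaving runs with genuinely unfinished jobs alone.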
