RFC(ckb): Improve CI Workflows


Motivation

  • Avoid using two sets of CI, one for nervosnetwork/ckb and another for forks.
  • Allow rerunning only the failed jobs.
  • Use Linux self-hosted runners in the critical path. Bors can merge the PR once all jobs run by the Linux self-hosted runners have succeeded.

Conventions

Conventions help to make the workflow files consistent and clear.

  • Use snake_case to format workflow name, job ID, and variable name.
  • Name the workflow file as {workflow_name}.yaml. The extension name must be yaml. Make name: in the workflow file consistent with the file name.
  • Prefix the ID of the child workflow with its parent workflow ID.
  • Specify the OS version explicitly, e.g., ubuntu-18.04. Do not use *-latest to match GitHub runners.
  • Use self-hosted-{root_workflow_id}-{os_name}-{os_version} to match the self-hosted runners, such as self-hosted-ci-ubuntu-18.04.

Specification

Workflow ci.yaml

This workflow is a placeholder triggered by workflow_dispatch. Bors must wait for this workflow to succeed before merging PRs.

Once triggered, this workflow prints github.event.inputs.reason and github.event.sender.login, then it exits according to the value of the input variable conclusion.

The input reason lets repository owners give a reason when they bypass the CI manually. The input conclusion should be success; any other value is treated as a failure.

on:
  workflow_dispatch:
    inputs:
      reason:
        description: 'Reason to pass CI'
        required: true
      conclusion:
        description: 'CI conclusion, allowed values: success or failure'
        required: true

Workflow ci_dispatch.yaml

This workflow is triggered by pushes and pull requests. It triggers all child workflows for CI using its own github.ref and the input runs_on.

The workflow should check the API check-suites to skip the child workflows which are running or have succeeded on the ref.

The API check-suites lists all check suites. Find the GitHub Actions check suite in the response and list all the checks in that suite.

Note that the check name includes matrix flags, for example Linters (macOS).

If any check has failed, on any platform, trigger ci.yaml with the conclusion “failure” and list the failed checks as the reason.

If all the required child workflows have succeeded on at least one platform, trigger ci.yaml with the reason “via ci_dispatch.yaml” and the conclusion “success”.
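The decision described above can be sketched in Python. This is a minimal illustration, not the actual workflow code: the function names, the required_jobs parameter, and the exact reason format are hypothetical, and the input is assumed to be a mapping from check name to its conclusion, with None for checks that are still running.

```python
from typing import Dict, List, Optional, Tuple


def split_check_name(name: str) -> Tuple[str, Optional[str]]:
    """Split 'Linters (macOS)' into ('Linters', 'macOS').

    Names without matrix flags are returned with None as the flags.
    """
    if name.endswith(")") and " (" in name:
        job, _, flags = name[:-1].rpartition(" (")
        return job, flags
    return name, None


def decide_conclusion(
    checks: Dict[str, Optional[str]],
    required_jobs: List[str],  # hypothetical: names of the required child jobs
) -> Optional[Tuple[str, str]]:
    """Return (conclusion, reason) for ci.yaml, or None if CI is undecided."""
    # Any failed check, on any platform, fails the whole CI.
    failed = sorted(name for name, c in checks.items() if c == "failure")
    if failed:
        return "failure", "failed checks: " + ", ".join(failed)

    # A job counts as passed once it has succeeded on at least one platform.
    succeeded_jobs = {
        split_check_name(name)[0] for name, c in checks.items() if c == "success"
    }
    if all(job in succeeded_jobs for job in required_jobs):
        return "success", "via ci_dispatch.yaml"

    # Some required jobs are still running or have not started yet.
    return None
```

Returning None models the case where ci_dispatch.yaml should not trigger ci.yaml yet because required checks are still pending.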

The runs_on value is selected according to the following rule:

  • If the ref commit message has a line starting with runs-on: , use the remaining line content as runs_on.
  • Otherwise if there’s a secret CI_RUNS_ON, use its value as runs_on.
  • Otherwise if the workflow runs on a fork repository, or for a pull request to nervosnetwork/ckb but the author has no push permission, use ubuntu-18.04,macos-10.15,windows-2019.
  • Otherwise use self-hosted-ci-ubuntu-18.04,macos-10.15,windows-2019.
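The selection rule above can be sketched as a small Python function. The function and parameter names are hypothetical, and how the secret, fork status, and push permission are obtained is left out; the sketch only shows the order in which the rules apply. It assumes the commit message marker is the literal prefix "runs-on: ".

```python
from typing import Optional

# Runner lists taken from the rules above.
GITHUB_RUNNERS = "ubuntu-18.04,macos-10.15,windows-2019"
SELF_HOSTED_RUNNERS = "self-hosted-ci-ubuntu-18.04,macos-10.15,windows-2019"
RUNS_ON_PREFIX = "runs-on: "


def select_runs_on(
    commit_message: str,
    ci_runs_on_secret: Optional[str],
    is_fork: bool,
    is_pull_request: bool,
    author_can_push: bool,
) -> str:
    """Pick the runs_on value, checking the rules in priority order."""
    # 1. An explicit "runs-on: " line in the commit message wins.
    for line in commit_message.splitlines():
        if line.startswith(RUNS_ON_PREFIX):
            return line[len(RUNS_ON_PREFIX):].strip()

    # 2. Otherwise use the CI_RUNS_ON secret when it is set.
    if ci_runs_on_secret:
        return ci_runs_on_secret

    # 3. Forks and pull requests from authors without push permission
    #    only get GitHub-hosted runners.
    if is_fork or (is_pull_request and not author_can_push):
        return GITHUB_RUNNERS

    # 4. Everything else may use the self-hosted Ubuntu runner.
    return SELF_HOSTED_RUNNERS
```

Keeping untrusted pull requests off the self-hosted runners is the reason rule 3 is checked before the default in rule 4.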

Skip CI

If the commit message contains “skip ci” and the author has push permission in the repository, the workflow ci_dispatch.yaml should trigger ci.yaml with the conclusion “success” and the reason “skip ci by @author”.

CI Child Workflows

All the child workflows are triggered via workflow_dispatch. They must accept the input variable runs_on:

on:
  workflow_dispatch:
    inputs:
      runs_on:
        description: 'OS Matrix to Run the Workflow, separated by comma.'
        required: true
        default: ubuntu-18.04

If the workflow only needs to be run in one OS, use the first one.
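As an illustration of how a child workflow might interpret the runs_on input, here is a minimal Python sketch. The helper names are hypothetical; a real workflow would more likely do this in a workflow expression or a shell step.

```python
from typing import List


def parse_runs_on(runs_on: str) -> List[str]:
    """Split the comma-separated runs_on input into a list of runner labels."""
    return [name.strip() for name in runs_on.split(",") if name.strip()]


def single_os(runs_on: str) -> str:
    """Pick the runner for a workflow that only needs one OS: the first entry."""
    # Raises IndexError if runs_on is empty; the input is required, so an
    # empty value is treated as a caller error in this sketch.
    return parse_runs_on(runs_on)[0]
```

For example, single_os("self-hosted-ci-ubuntu-18.04,macos-10.15,windows-2019") selects the self-hosted Ubuntu runner.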

Add a job notify to trigger ci_dispatch.yaml using if: always() and input variable check_only: true.

Workflow ci_dispatch.yaml triggered by workflow_dispatch

If ci_dispatch.yaml is triggered by workflow_dispatch and the input variable check_only is true, it does not trigger any child workflows. It only checks the status of the CI child workflows via the check-suites API.

If all child workflows on Ubuntu passed, trigger ci.yaml with the reason “via ci_dispatch.yaml” and the conclusion “success”. If any child workflow failed, trigger ci.yaml with the conclusion “failure” and list the names of the failed checks in the reason.

If check_only is not true, it behaves the same as when it is triggered by a push or a pull request.

on:
  workflow_dispatch:
    inputs:
      check_only:
        description: 'Set to true to just check CI status, false to also trigger child workflows'
        required: true
        default: true

Workflow ci_alert.yaml

This workflow is triggered by the workflow_run event with the type completed.

If the source workflow failed and was triggered by a push to one of the branches listed below, the workflow sends an alert to our notification system, OpsGenie.

  • master
  • develop
  • staging
  • rc/*
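The alert condition can be sketched as follows. This is an assumption-laden illustration: should_alert is a hypothetical helper, and its three arguments are assumed to come from the conclusion, event, and head_branch fields of the workflow_run payload. Python's fnmatch is used for the branch patterns; unlike GitHub's branch filters, its * also matches a slash.

```python
from fnmatch import fnmatchcase

# Branches from the list above; rc/* is treated as a glob pattern.
ALERT_BRANCH_PATTERNS = ("master", "develop", "staging", "rc/*")


def should_alert(conclusion: str, event: str, branch: str) -> bool:
    """Return True when a completed source workflow warrants an alert."""
    if conclusion != "failure" or event != "push":
        return False
    return any(fnmatchcase(branch, pattern) for pattern in ALERT_BRANCH_PATTERNS)
```

With these patterns, a failed push to rc/v0.40 alerts, while a failed pull request run or a failed push to a feature branch does not.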

Self-Hosted Runners

We’ll run one runner per host. Find a configuration that can build ckb without cache in 30 minutes.

We’ll start with 3 Ubuntu runners.

Self-hosted runners must use a local directory, say /home/runner/ckb-target, as the target cache. The workflow must use /home/runner/ckb-target/${{ github.ref }}-${{ YYYY-mm-dd of this Monday }} as the cargo target directory. Self-hosted runners must allocate enough disk space to store the files generated in a week, and a cron job must automatically clean up the target directories from the previous week and earlier.
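The weekly target directory name can be computed as in the following Python sketch. The helper names are hypothetical, and the cache root is the example path from the text above.

```python
from datetime import date, timedelta

# Example cache root from the text; the real location may differ.
TARGET_CACHE_ROOT = "/home/runner/ckb-target"


def monday_of_week(day: date) -> date:
    """Return the Monday of the week containing `day` (Monday is weekday 0)."""
    return day - timedelta(days=day.weekday())


def cargo_target_dir(github_ref: str, today: date) -> str:
    """Build the per-ref, per-week cargo target directory."""
    monday = monday_of_week(today).isoformat()  # formatted as YYYY-mm-dd
    return f"{TARGET_CACHE_ROOT}/{github_ref}-{monday}"
```

Because github.ref contains slashes (for example refs/heads/develop), the result is a nested path. All builds of one ref within the same week share a directory, so the cleanup job can remove any directory whose date suffix is older than the current Monday.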

This part has been changed to: All the child workflows are triggered via push/pull_request.

We ran into many issues because of GitHub Actions limitations. We’ll post an update on how we improved CI after we have successfully deployed the new workflows.