Empowering CI/CD with Agentic AI: Practical Examples Using Buildkite


Explore how Buildkite's new agentic workflow components, powered by AI, enable smarter CI/CD. Discover practical examples for GitHub code reviews, PR build fixes, and Linear issue handling.

If you read our previous post on how AI is reshaping CI, you're already familiar with the new foundational elements we've added to the Buildkite platform: our agentic workflow components. These composable primitives are designed to give platform teams the tools they need to embed AI-assisted processes directly in their CI/CD workflows.

This post shows these components in action through three straightforward examples in which AI agents tackle common real-world tasks:

  • Reviewing GitHub pull requests
  • Automatically fixing broken PR builds
  • Generating first-draft pull requests based on Linear issues

Each example comes with a complete GitHub repository, including a Buildkite pipeline template you can fork, configure, run, and customize for your team's specific requirements.

Recapping the Components

Before delving into the examples, let's briefly review the workflow components introduced previously, as they form the basis of the following demonstrations:

  • The Buildkite MCP server: Provides agents with granular access to the Buildkite REST API via specialized server tools.
  • Buildkite model providers: Provide direct connections to leading frontier model providers such as Anthropic, using either your own API credentials or a Buildkite-managed key.
  • Buildkite pipeline triggers: Inbound webhooks that allow pipelines to be invoked by any external HTTP event, with robust support for services like GitHub and Linear.
  • The Buildkite SDK: Enables dynamic composition and generation of pipeline definitions at runtime using popular programming languages like JavaScript, TypeScript, Python, Go, and Ruby.
  • A collection of plugins: Powered by LLMs such as Claude and Codex, or by models on Amazon Bedrock, these plugins annotate CI jobs with comprehensive build summaries.

Combined with the rest of the Buildkite platform, these building blocks let you create flexible, adaptive workflows that integrate AI agents into your CI/CD process. Let's explore how.

Example 1: GitHub Code-Review Bot

Code review is a critical part of the delivery process, but it's often time-consuming. With AI agents now generating more code, the review workload on humans is growing significantly. A capable AI agent can be a valuable assistant for first-pass code reviews, saving substantial time and even catching subtle bugs that human reviewers might miss.

Get the Code

The code for this example is available on GitHub: buildkite-agentic-examples/github-code-review-bot

How it Works

This example configures a Buildkite pipeline to listen for GitHub PR events (specifically, labeled events) using a Buildkite pipeline trigger. A trigger is essentially a unique webhook URL linked to a specific pipeline; you can add it to any GitHub repository as a webhook to run the pipeline in response to chosen events. This particular pipeline runs an AI agent (Claude Code by default) that evaluates a pull request and submits a code review as a GitHub comment.

All three examples in this post follow a consistent pattern:

  1. A label is applied, triggering a Buildkite pipeline.
  2. The pipeline executes a script that parses and validates the webhook payload.
  3. The script appends a step to the running Buildkite pipeline to spawn an AI agent.
  4. The agent completes its assigned task.
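Step 2 of this pattern (parsing and validating the payload) can be sketched as a small TypeScript function. This is a minimal illustration, not code from the example repositories: the `PullRequestLabeledEvent` interface and `shouldHandle` function are hypothetical, and they read only three fields of GitHub's `pull_request` webhook payload (`action`, `label.name`, and `pull_request.html_url`).

```typescript
// Hypothetical view of the few fields this sketch reads from a GitHub
// `pull_request` webhook payload (GitHub sends many more).
interface PullRequestLabeledEvent {
  action: string;
  label?: { name: string };
  pull_request?: { html_url: string; number: number };
}

// Returns the PR URL if the event is a "labeled" action for the label we
// listen for, or null if the pipeline should exit without doing anything.
function shouldHandle(rawPayload: string, triggerLabel: string): string | null {
  let payload: PullRequestLabeledEvent;
  try {
    payload = JSON.parse(rawPayload) as PullRequestLabeledEvent;
  } catch {
    return null; // Not valid JSON: nothing to do.
  }
  if (payload.action !== "labeled") return null;
  if (payload.label?.name !== triggerLabel) return null;
  return payload.pull_request?.html_url ?? null;
}

// Usage with a trimmed-down sample payload.
const sample = JSON.stringify({
  action: "labeled",
  label: { name: "buildkite-review" },
  pull_request: { html_url: "https://github.com/acme/app/pull/42", number: 42 },
});
console.log(shouldHandle(sample, "buildkite-review"));
```

Returning `null` instead of throwing lets a handler exit cleanly on events it doesn't care about, in the same spirit as the early `process.exit(0)` in the handler excerpt that follows.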

The core logic resides in the handler script, scripts/handler.ts, a Node.js program written in TypeScript that leverages the Buildkite and GitHub SDKs to run Claude Code with a task-specific prompt.

import { execFileSync, execSync } from "child_process";
import { Pipeline } from "@buildkite/buildkite-sdk";
import { Octokit } from "octokit"; // Used in parts of the handler elided here.

// Minimal helper that runs the buildkite-agent CLI and returns its stdout
// (a sketch; the full handler.ts defines its own version).
function buildkiteAgent(...args: string[]): string {
    return execFileSync("buildkite-agent", args, { encoding: "utf-8" });
}

// ...
// Generate the pipeline with the Buildkite SDK.
function generateCodeReviewPipeline(webhookPullRequestUrl: string, agentBuildUrl: string): string {
    const pipeline = new Pipeline();
    const tokenArgs = [`PullRequestURL=${webhookPullRequestUrl}`, `AgentBuildURL=${agentBuildUrl}`];

    pipeline.addStep({
        label: "🏗️ Reviewing the code",
        // runAgent() is defined in the elided code; it returns the shell
        // commands that launch the agent.
        commands: [...runAgent(tokenArgs)],
        plugins: {
            docker: {
                "image": "buildkite-agentic-example-tools:latest",
                "mount-checkout": false,
                "mount-buildkite-agent": true,
                "environment": [
                    //...
                    "TRIGGER_ON_LABEL",
                    "MODEL_PROVIDER",
                ],
            },
        },
        //...
    });

    return pipeline.toYAML();
}

async function main() {
    // Fetch the incoming webhook payload, which Buildkite stores as build meta-data.
    const event = buildkiteAgent("meta-data", "get", "buildkite:webhook").trim();
    const payload = JSON.parse(event);

    // ...
    // Exit unless the payload has a label matching the one we're listening for.
    const labelName = payload.label?.name;
    if (labelName !== process.env.TRIGGER_ON_LABEL) {
        console.log(`Label is not '${process.env.TRIGGER_ON_LABEL}', exiting`);
        process.exit(0);
    }

    // ...
    // Inputs for the agent step. Assumed here: the PR's URL from the GitHub
    // payload and this build's URL from the Buildkite job environment.
    const pullRequestUrl: string = payload.pull_request.html_url;
    const agentBuildUrl = process.env.BUILDKITE_BUILD_URL ?? "";

    // Generate and upload a new pipeline step to run the AI agent.
    const pipelineYaml = generateCodeReviewPipeline(pullRequestUrl, agentBuildUrl);
    execSync("buildkite-agent pipeline upload", {
        input: pipelineYaml,
        encoding: "utf-8",
    });
}

main().catch(error => {
    console.error("Error:", error.message);
    process.exit(1);
});

This step runs Claude Code in a Docker container (using the Buildkite Docker plugin) equipped with only the tools the task needs: Node.js, the GitHub CLI, the local version of the Buildkite MCP server, the Claude Code CLI, and the required scripts, prompts, and environment variables. Running the agent in a container isn't strictly necessary, but it provides better isolation and safety than running it directly on the agent's filesystem. (The container itself is built and tagged locally at runtime using an agent lifecycle hook. See .buildkite/hooks/post-checkout for details.)

Inside the container, Claude clones the PR's GitHub repository, checks out the PR branch, analyzes the changes, and posts a review comment back to the PR. As it works, it annotates the Buildkite build using the MCP server's annotation tools.

All settings, including the model provider and label, are configurable in .buildkite/pipeline.yml:

secrets:
  GITHUB_TOKEN: GITHUB_TOKEN
  BUILDKITE_API_TOKEN: API_TOKEN_BUILDKITE

env:
  GITHUB_CLI_VERSION: "2.83.0"
  BUILDKITE_MCP_SERVER_VERSION: "0.7.3"
  TRIGGER_ON_LABEL: "buildkite-review"
  MODEL_PROVIDER: "anthropic"

steps:
  - label: "⚙️ Generate the pipeline"
    command: |

      # Generate and upload the pipeline to handle the webhook.
      echo "--- 🎣 Run the webhook handler"
      npm install && npm run build
      node dist/handler

An important, often overlooked detail is that the claude CLI doesn't talk to the Anthropic API directly in the usual way. Instead, it goes through a Buildkite-managed model-provider endpoint that proxies the Anthropic API. In scripts/claude.sh, you can see that the two environment variables Claude Code uses to configure its backend are set from Buildkite environment variables:

#!/bin/bash

# ...
# Set up Buildkite Hosted Models
export ANTHROPIC_BASE_URL="$BUILDKITE_AGENT_ENDPOINT/ai/anthropic"
export ANTHROPIC_API_KEY="$BUILDKITE_AGENT_ACCESS_TOKEN"

# ...
echo "--- 🤖 Starting Claude Code"
echo "$prompt" | claude -p --mcp-config mcp.json

Buildkite populates these values automatically at runtime as job environment variables. This lets you use Claude, or any other supported model provider, in your pipelines without having to provide, expose, and manage your own Anthropic credentials. Refer to the model providers documentation for more details.
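If you'd rather launch the agent from the TypeScript handler than from a shell script, the same backend switch can be expressed in a few lines. This sketch simply mirrors the two assignments in scripts/claude.sh; `modelProviderEnv` and `runClaude` are hypothetical names, and it assumes the `claude` CLI and an `mcp.json` file are available in the working directory, as they are in the container described above.

```typescript
import { spawnSync } from "node:child_process";

// Builds the two variables Claude Code reads to locate its backend, mirroring
// scripts/claude.sh. BUILDKITE_AGENT_ENDPOINT and BUILDKITE_AGENT_ACCESS_TOKEN
// are set by the Buildkite agent for each job.
function modelProviderEnv(env: Record<string, string | undefined>): Record<string, string> {
  const endpoint = env.BUILDKITE_AGENT_ENDPOINT;
  const token = env.BUILDKITE_AGENT_ACCESS_TOKEN;
  if (!endpoint || !token) {
    throw new Error("BUILDKITE_AGENT_ENDPOINT or BUILDKITE_AGENT_ACCESS_TOKEN is not set");
  }
  return {
    ANTHROPIC_BASE_URL: `${endpoint}/ai/anthropic`,
    ANTHROPIC_API_KEY: token,
  };
}

// Hypothetical helper: pipes the prompt to `claude -p` on stdin, as claude.sh
// does, and returns the CLI's exit status.
function runClaude(prompt: string): number {
  const result = spawnSync("claude", ["-p", "--mcp-config", "mcp.json"], {
    input: prompt,
    stdio: ["pipe", "inherit", "inherit"],
    env: { ...process.env, ...modelProviderEnv(process.env) },
  });
  return result.status ?? 1;
}
```

Failing fast when either variable is missing makes it obvious when the script is run outside a Buildkite job, rather than letting Claude Code fall back to some other configuration.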

The following diagram shows the components involved and how they interact:

For complete setup instructions and configuration details, refer to the example's README.

Setup typically takes only a few minutes with the README as a guide. Once it's running, you'll have a fully functional code-review bot that you can invoke by adding the buildkite-review label to any pull request, along with a reusable Buildkite pipeline and workflow you can adapt to your team's needs.

Now, let's explore the next example.

Example 2: GitHub PR Build Fixer

It's a common frustration: a PR build fails, and the reason is elusive. Linters are often the culprits, and their logs can be notoriously cryptic. In baffling cases like these (and in more complex ones), an AI agent can be invaluable for diagnosing and fixing the issue.


Get the Code

The code for this example is available on GitHub: buildkite-agentic-examples/github-pr-build-fixer

How it Works

This example mirrors the high-level pattern of the previous one:

  1. You create a new Buildkite pipeline and trigger, then add the trigger as a GitHub webhook to the repositories you want Claude to fix.
  2. When a PR build fails, you apply the buildkite-fix label to the PR (this label is configurable in .buildkite/pipeline.yml).
  3. GitHub invokes the webhook, triggering the pipeline, which evaluates the payload and adds a step to run Claude in a Docker container.
  4. Claude uses the Buildkite MCP server to query logs, identify the root cause, clone the repository, implement a fix, and push a new branch with the fix to GitHub.
  5. Claude creates a new PR targeting the branch of the original, still-broken PR, waits for the Buildkite build to pass (iterating if necessary), and posts a summary comment on the original PR explaining the fix.
  6. If the PR looks good, you merge the fix and continue your work.
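Targeting the original PR in step 5 comes down to which branch the new PR is based on. In the example, Claude opens the PR itself using the tools in its container; the hypothetical sketch below only shows the relevant parameters of GitHub's create-pull-request REST endpoint, assuming the broken PR's branch lives in the same repository rather than a fork. The interfaces, the function name, and the branch names are illustrative.

```typescript
// Hypothetical minimal view of the broken PR.
interface BrokenPullRequest {
  owner: string;
  repo: string;
  number: number;
  headRef: string; // The branch the broken PR is built from.
}

// Fields accepted by GitHub's "create a pull request" endpoint
// (POST /repos/{owner}/{repo}/pulls).
interface CreatePullRequestParams {
  owner: string;
  repo: string;
  title: string;
  head: string; // Branch containing the fix.
  base: string; // Branch the fix will be merged into.
  body: string;
}

function fixPullRequestParams(pr: BrokenPullRequest, fixBranch: string): CreatePullRequestParams {
  return {
    owner: pr.owner,
    repo: pr.repo,
    title: `Fix build for #${pr.number}`,
    head: fixBranch,
    // Basing the fix on the broken PR's branch (not the default branch) means
    // merging it updates the original PR, which then rebuilds.
    base: pr.headRef,
    body: `Proposed fix for the failing build on #${pr.number}.`,
  };
}
```

With Octokit, an object like this can be passed directly to `octokit.rest.pulls.create(params)`.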

Here's an architectural overview:

Full setup and configuration details are provided in the README.

We also encourage you to review the prompt for this example, located at prompts/user.md. You may wish to adjust it to align with your team's specific standards or guidelines.

Now, let's move on to our final example.

Example 3: Linear Issue Handler

Linear has become a popular tool for software project management, and we use it at Buildkite, too. For those unfamiliar with it, Linear is an issue tracker, similar to Jira, Trello, or GitHub Issues. Issue trackers are central to how many teams organize their work, and developers often have many issues assigned to them. Some are straightforward, like dependency updates; others are complex and need thorough analysis before they can be estimated accurately. Fortunately, a capable LLM can often help with both kinds, given a clear issue description and a suitable prompt as a starting point.

Get the Code

The code for this example is available on GitHub: buildkite-agentic-examples/linear-issue-handler

How it Works

By now, the pattern should be clear: you set up a pipeline, assign it a trigger, configure the trigger to respond to an external event, and your chosen agent handles the rest. In this example, the objective is for Linear to initiate a Buildkite pipeline in response to an issue-label event.

Once the trigger is configured in your Linear project (refer to the README for instructions), applying the buildkite-analyze label to an issue will run scripts/handler.ts. This script extracts issue details, identifies the relevant GitHub organization and repository from the issue description, and generates a pipeline step to run Claude, similar to the other examples.
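This post doesn't say exactly how the handler finds the repository in the issue description. One plausible approach, sketched here under the assumption that the description contains a github.com URL, is a regular-expression match. `findGitHubRepo` is a hypothetical helper, not part of the example repository.

```typescript
// Returns the first GitHub owner/repo pair linked in an issue description, or
// null if there is none. Assumes a URL such as https://github.com/acme/app,
// https://github.com/acme/app/pull/42, or https://github.com/acme/app.git.
function findGitHubRepo(description: string): { owner: string; repo: string } | null {
  const match = description.match(/https?:\/\/github\.com\/([\w.-]+)\/([\w.-]+)/);
  if (!match) return null;
  const repo = match[2]
    .replace(/\.+$/, "")    // Drop sentence punctuation captured at the end.
    .replace(/\.git$/, ""); // Drop a clone-URL suffix.
  return { owner: match[1], repo };
}

// Usage
console.log(findGitHubRepo("See https://github.com/acme/widgets/issues/7 for context."));
```

Taking the first match keeps the rule predictable; if issues in your tracker routinely mention several repositories, a dedicated field or a label is a more reliable signal than free text.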

In this case, Claude also analyzes the codebase and, much as a human would, makes a judgment call about the issue's complexity. For simpler issues, Claude implements a fix, opens a PR, and comments back on the Linear issue with a summary and a link. For more complex issues, it presents its findings, suggests possible approaches, and posts that analysis back to the Linear issue instead.
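In the example, Claude makes this decision itself, guided by its prompt, and posts the result using the Linear CLI. If you wanted to model the two outcomes explicitly, for instance in a wrapper that post-processes the agent's output, a discriminated union is a natural fit. The `IssueOutcome` type and `linearComment` function below are hypothetical.

```typescript
// Hypothetical result types for the two paths described above.
type IssueOutcome =
  | { kind: "fixed"; summary: string; pullRequestUrl: string }
  | { kind: "analyzed"; findings: string; approaches: string[] };

// Renders the Markdown comment to post back to the Linear issue.
function linearComment(outcome: IssueOutcome): string {
  switch (outcome.kind) {
    case "fixed":
      return `${outcome.summary}\n\nPull request: ${outcome.pullRequestUrl}`;
    case "analyzed":
      return [
        outcome.findings,
        "",
        "Possible approaches:",
        ...outcome.approaches.map((approach, i) => `${i + 1}. ${approach}`),
      ].join("\n");
  }
}
```

Because the switch covers every `kind`, the TypeScript compiler will flag any new outcome you add to the union until it is handled here as well.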

As with the other examples, all LLM-performed work occurs within a Docker container, which here includes the Linear CLI to provide Claude with controlled access to Linear's functionalities.

What Makes These Work

These practical examples underscore the core principles and foundational tools essential for teams integrating agentic processes into their CI workflows:

  • Composable primitives: Enabling flexible, adaptive CI workflows.
  • Built-in access to frontier models: With minimal configuration.
  • Pipeline triggers: Extending Buildkite pipelines to external services with ease.
  • Highly configurable, performance-focused MCP server: Granting agents fine-grained access to Buildkite resources.
  • Multi-language SDK: For intelligent and dynamic pipeline creation at runtime.

Individually, each component offers significant utility. Collectively, however, they unlock agentic workflows that can branch, adapt, and scale dynamically based on what the agent discovers as it runs – capabilities that are largely out of reach for static YAML-based pipelines.
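To make this concrete, here is a dependency-free sketch of a step that generates further steps from something discovered at runtime. It builds plain objects and serializes them to JSON (which `buildkite-agent pipeline upload` accepts as well as YAML) instead of using the Buildkite SDK. `stepsForFindings`, `uploadSteps`, and the two scripts they reference are hypothetical.

```typescript
import { execFileSync } from "node:child_process";

// Minimal subset of a Buildkite command step.
interface CommandStep {
  label: string;
  command: string;
  key?: string;
  depends_on?: string[];
  env?: Record<string, string>;
}

// Turns a list of runtime findings (for example, packages an agent flagged)
// into one step per finding, plus a summary step that waits for all of them.
function stepsForFindings(findings: string[]): CommandStep[] {
  const fixSteps: CommandStep[] = findings.map((finding, i) => ({
    key: `fix-${i}`,
    label: `🔧 Fix: ${finding}`,
    command: "./scripts/fix.sh", // Hypothetical script.
    // Pass the finding through the environment rather than splicing it
    // into the shell command line.
    env: { FINDING: finding },
  }));
  if (fixSteps.length === 0) return [];
  const summary: CommandStep = {
    label: "📝 Summarize",
    command: "./scripts/summarize.sh", // Hypothetical script.
    depends_on: fixSteps.map((_, i) => `fix-${i}`),
  };
  return [...fixSteps, summary];
}

// Appends the generated steps to the running build via stdin.
function uploadSteps(steps: CommandStep[]): void {
  execFileSync("buildkite-agent", ["pipeline", "upload"], {
    input: JSON.stringify({ steps }),
    encoding: "utf-8",
  });
}
```

Because the summary step lists every generated key in `depends_on`, it runs only once all of the generated fix steps have completed successfully, however many of them there turn out to be.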

Next Steps

The most effective way to understand these components is to experience them firsthand. You'll need a Buildkite account, a GitHub personal access token, a Buildkite API access token, and for the Linear example, a Linear API key. Each example can be set up in just a few minutes.

Follow the instructions in each example's README to get started.

For more in-depth information on each component, consult its documentation.

Enjoy building! We're excited to see your creations. 👋