

GitHub Agentic Workflows replace YAML with Markdown and hand execution to an AI agent. Here's the security model, tradeoffs, and whether to adopt now.
On February 13, 2026, GitHub shipped something that doesn't fit neatly into any existing box. GitHub Agentic Workflows, entering technical preview that day, let you automate repository tasks by describing what you want in plain Markdown, and then handing execution to an AI agent that figures out how to do it. No YAML pipeline steps. No deterministic shell scripts. Just intent, and an agent.
If you've followed GitHub's AI trajectory, the concept isn't a surprise. What's notable is that this is GitHub's first production feature that puts a coding agent directly inside the Actions runtime, with an explicit security model designed from day one for continuous, unattended execution. It's a collaboration between GitHub Next, Microsoft Research, and Azure Core Upstream, and the implementation is open source on GitHub under the MIT license.
This article walks through what GitHub Agentic Workflows actually are, how the security model works, how to get started, what you can build with them, and where the current preview is still rough. If you're on a platform team evaluating whether this is ready for production, this is the honest walkthrough.
A standard GitHub Actions workflow is a YAML file. Every step is explicit: install dependencies, run tests, deploy. The workflow can branch on conditions, but each branch still consists of predetermined steps. The outcome is deterministic: given the same inputs, you get the same execution.
An agentic workflow is different. You place a .md file in .github/workflows/ alongside your normal YAML workflows. It has two parts: a YAML frontmatter block that declares the trigger, permissions, tools, and allowed write operations; and a Markdown body that describes the task in natural language. A coding agent reads those instructions at runtime and decides how to fulfill them.
Here's what a daily status report workflow looks like:
```markdown
---
on:
  schedule: daily
permissions:
  contents: read
  issues: read
  pull-requests: read
safe-outputs:
  create-issue:
    title-prefix: "[repo status] "
    labels: [report]
tools:
  github:
---

# Daily Repo Status Report

Create a daily status report for maintainers. Include:

- Recent repository activity (issues, PRs, discussions, releases, code changes)
- Progress tracking, goal reminders and highlights
- Project status and recommendations
- Actionable next steps for maintainers

Keep it concise and link to the relevant issues/PRs.
```

Notice the structure: the frontmatter is precise and declarative. The trigger fires daily, the agent can read issues and PRs, it can only create issues (not push code, not open PRs, not comment on existing issues), and it uses the GitHub MCP server as its tool. The Markdown body is intentionally vague. You're describing a goal, not a procedure.
That's the fundamental shift. In a standard Actions workflow, you're programming. In an agentic workflow, you're delegating. The tradeoff is obvious: you get flexibility and expressiveness you can't easily encode in YAML, but you lose predictability. What the agent produces on Tuesday may differ from Monday's run even with the same repo state.
GitHub calls the use cases this enables Continuous AI: ongoing AI-driven repository work that runs alongside your CI/CD. Issue triage, documentation updates, CI failure investigation, test coverage, code simplification, compliance monitoring. These tasks share a common trait — they require reading and understanding context, not just running commands. Standard YAML workflows can't express them well. Agents can.
Putting an AI agent in your CI pipeline with access to your repository is a real security concern. The GitHub team didn't treat this lightly. The architecture has three explicit layers, each designed to contain failures from layers above it.
The agent runs in a Docker container with restricted egress. An Agent Workflow Firewall (AWF) uses iptables to route all HTTP/HTTPS traffic through a Squid proxy container, enforcing a configurable domain allowlist. The agent doesn't have direct internet access — only to domains you've explicitly permitted. By default, GitHub domains and standard package ecosystems (npm, PyPI) are allowed.
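For illustration, widening that allowlist would be a frontmatter change. The field names below are assumptions for the sketch, not confirmed schema; check the gh-aw network documentation for the real syntax:

```yaml
# Hypothetical sketch: extending the egress allowlist in the frontmatter.
# Field names and the 'defaults' shorthand are assumptions, not confirmed schema.
network:
  allowed:
    - defaults                    # assumed shorthand for the built-in allowlist
    - "artifacts.example.com"     # a hypothetical internal host you trust
```

Anything the agent tries to reach outside the list is blocked at the proxy, regardless of what its instructions say.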
The agent's LLM API key never lives in the agent container. An API proxy holds it, and the agent's LLM traffic is routed through that proxy. So a prompt-injected agent that tries to extract credentials by inspecting environment variables or reading config files finds nothing useful. MCP servers run in separate containers too, accessible only through an MCP gateway.
Before an agentic workflow runs, it must be compiled. The gh aw compile command produces a .lock.yml file alongside your Markdown file. This lock file is what GitHub Actions actually executes. During compilation, the compiler pins all action dependencies to specific SHAs (not tags) and validates the frontmatter schema; it can also run three security scanners: actionlint for workflow linting, zizmor for privilege escalation vulnerabilities, and poutine for supply chain risks. The lock file you commit is what gets reviewed in PRs, not just the readable Markdown.
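To make the SHA pinning concrete, here is the shape of the rewrite. The SHA below is an all-zeros placeholder, not a real actions/checkout commit:

```yaml
# In a hand-written workflow, an action is often referenced by tag,
# which is mutable -- the tag can be moved to point at different code:
- uses: actions/checkout@v4

# In the compiled .lock.yml, the reference is pinned to a full commit SHA
# (placeholder shown), so a moved or hijacked tag can't change what runs:
- uses: actions/checkout@0000000000000000000000000000000000000000 # v4
```

The pinned form trades readability for immutability, which is why the compiler keeps the original tag as a trailing comment.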
This is the most interesting part of the architecture. The agent has read-only access to GitHub state via the GitHub MCP server. It cannot write anything directly — not even add a comment. Instead, when the agent wants to write something, it buffers the action through the Safe Outputs MCP server. The agent job exits. Then a separate set of jobs processes those buffered actions.
Between the agent job and the write jobs sits a threat detection job. A second AI agent — running with a security-focused system prompt, no write permissions, and no access to the original agent's runtime state — analyzes the buffered outputs. It's looking for secret leaks, malicious code patterns, and policy violations. If it flags anything, the entire workflow stops. Nothing gets written.
Safe outputs also enforce structural constraints you define in the frontmatter. You can restrict the agent to creating at most three pull requests per run, or require that issue titles start with a specific prefix, or block URLs from untrusted domains in comments. The agent's good intentions don't matter — these constraints are deterministic and cannot be overridden by the agent itself.
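A hedged sketch of what those constraints could look like in the frontmatter, extrapolated from the create-issue example earlier. The field names here are assumptions; the gh-aw safe-outputs reference documents the exact schema:

```yaml
# Illustrative safe-outputs constraints (field names approximate):
safe-outputs:
  create-pull-request:
    max: 3                        # assumed: cap PRs created per run
    title-prefix: "[automated] "  # enforced on every PR title
  add-comment:
    max: 1                        # assumed: at most one comment per run
```

Because these limits live in the compiled lock file rather than in the agent's prompt, a prompt-injected agent cannot talk its way past them.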
One more layer worth knowing: content sanitization. Before any issue or PR content reaches the agent, it goes through a pipeline that neutralizes @mentions, converts XML/HTML tags (removing script injection vectors), filters non-HTTPS URLs, normalizes Unicode homoglyphs, and enforces a 0.5MB / 65k-line content limit. This reduces the surface area for prompt injection attacks from adversarial issue content.
The entry point is the gh-aw CLI extension. Install it with:
```shell
gh extension install github/gh-aw
```

You can write your first workflow manually, or use a coding agent to generate it. The GitHub team provides a prompt template you can paste into GitHub Copilot, Claude, or any other coding agent to scaffold a workflow from a description. Once you have a .md file in .github/workflows/, compile it:

```shell
# Compile the Markdown workflow to a lock file
gh aw compile

# Or with added security scanners
gh aw compile --actionlint --zizmor --poutine
```

Compilation produces a workflow-name.lock.yml alongside your .md. Both files go into your repository. The lock file is what GitHub Actions executes. You commit the Markdown for readability and the lock file for execution — treat the pair like a package manifest and its lockfile.
For the default engine (GitHub Copilot CLI), your Copilot subscription handles authentication — no extra secrets needed. For Claude Code or OpenAI Codex, you'll add an API key to your repository secrets and reference it in the frontmatter. The AI Engines reference covers the specifics for each.
Once committed, the workflow runs on its declared trigger — a schedule, an issue event, a pull request event, a manual dispatch, or a /command typed into an issue or PR comment. You can monitor runs with the standard GitHub Actions UI, and use the CLI for deeper inspection:
```shell
gh aw logs             # Download and analyze recent run logs
gh aw audit <run-id>   # Investigate a specific run's prompts and outputs
gh aw status           # Check workflow health
```

The default engine is GitHub Copilot CLI, and for most people evaluating this today it's the path of least resistance. If you have a Copilot subscription, it authenticates automatically — no new secrets, no API keys. The tradeoff is cost: each agentic workflow run consumes approximately two Copilot premium requests (one for the agent's work, one for the safe outputs threat detection check). Those requests draw from your existing Copilot allocation.
For teams already using Claude Code or OpenAI Codex, switching is a single frontmatter line. The workflow format is identical across engines — the Markdown and frontmatter structure doesn't change. You can also mix engines across different workflows in the same repository, which is useful if you want a lighter model for simple triage tasks and a more capable one for code changes.
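As a sketch of what that one-line switch looks like (the engine identifiers below are assumptions; confirm the exact values and required secrets in the AI Engines reference):

```yaml
# Illustrative: select the engine in the workflow frontmatter.
# Identifiers ('copilot', 'claude', 'codex') are assumptions here.
engine: claude
```

Everything else in the file — trigger, permissions, safe outputs, the Markdown body — stays the same across engines.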
There are also edge cases in agent behavior across engines. Some coding agents are more aggressive about trying to run shell commands; others stay closer to the declared tools. The security architecture contains all of them, but you may get different outputs from the same workflow instructions depending on which engine you use. Worth testing before committing to one for production workflows.
The GitHub team has published over 50 example workflows across six categories that give a realistic picture of what agentic workflows do well.
A useful filter: if you'd describe a task in words, an agent can probably do it. If you'd describe it in steps, write a YAML workflow. The two approaches aren't competing — they cover different problem spaces.
Teams sometimes ask whether agentic workflows replace reusable workflows, composite actions, or GitHub Copilot for CI. They don't — the problem spaces barely overlap.
Reusable workflows and composite actions are abstractions over deterministic CI steps. They're still YAML, still predictable, still the right choice for build, test, and deploy pipelines. If your logic can be expressed as "do X then Y then Z," a reusable workflow is simpler, faster, and cheaper.
Copilot for CI (the existing feature where Copilot suggests fixes for failing CI runs) is reactive and UI-driven. You see a failure, you click "let Copilot fix this." Agentic workflows are proactive and automated — the agent runs on a schedule or trigger, without anyone clicking anything.
Running a coding agent CLI directly in a YAML workflow is the obvious DIY alternative. You can just call claude --prompt "..." in a run step. But this approach typically runs the agent in the same trust domain as the rest of the workflow — it can access secrets, write anywhere, and make arbitrary network calls. Agentic workflows' security model is specifically designed to avoid this. The compilation, isolation, safe outputs pipeline, and threat detection add real overhead in setup and cost, but they're what makes continuous unattended execution reasonable.
Technical preview means real limitations, and a few are worth weighing before you decide to adopt.
For individual developers or small teams: yes, try it now. Pick one low-stakes workflow — a daily status report, or a triage bot for a side project — and run it for a week. The setup takes under an hour, the cost is minimal, and the experience will tell you far more than any article.
For platform teams in larger organizations: proceed deliberately. The security model is solid for a technical preview — the architecture has real depth, and the threat model documentation is unusually transparent. But the per-user billing attribution, the cost at scale, and the lock file maintenance workflow all deserve evaluation before you roll this out to 50 repositories. Start with internal tools and low-traffic repos.
For anyone evaluating the direction: this is the first time GitHub has shipped a feature that makes AI agency a first-class primitive in the developer workflow. Not an AI suggestion inside an editor, not a UI button that triggers Copilot — a scheduled, automated agent that takes actions in your repository based on goals you specify. The rough edges are real, but the architecture is well-considered. Watch the community feedback thread — it's a good signal for where the GA version will land.
The documentation and quick start guide are good starting points. For examples, the Agent Factory covers over 50 workflows across issue triage, documentation, code quality, reporting, and compliance that you can adapt directly.
