The (Mostly) Agentic SDLC

Monday, 12:00. Grace, the CEO of ACME Corp, just finished her Q2 leadership meeting. The team decided it is time to build an integration with a major platform. An AI agent picks up the meeting summary and creates the first tickets. Grace's VP of R&D approves them over Slack and kicks off development.

Monday, 14:00

Alan, the PM, gets notified about the ticket he has been assigned. The planning agent has already picked up the details and is offering a few courses of action. Alan picks one, nudges the PRD agent in the right direction, and an hour later a PRD is ready and new tickets are created for the team.

Monday, 16:00

Ada, the team's architect, starts working through the right system design. It is going to require extended thinking and several sub-agents, so it helps that those agents can work overnight.

Tuesday, 10:00

Ada has had her coffee and it is decision time. Designs are ready, and it is time for the engineers to pick them up.

Tuesday, 11:00

John is a backend engineer and he is picking up the new integration tickets. The coding agent codes, the testing agent writes the tests, and John points out a few issues. Around 14:00 the pull request is ready to be reviewed. A review agent prepares the review and John's team points out some missed edge cases. By the end of the day the PR is ready to be merged.

Wednesday, 10:00

The new integration is ready to be tested in the staging environment. At 14:00, the testing agents find some issues and the team fixes those. By Thursday morning it is time to release.

Thursday, 10:00

The team works with canary releases, so the release agent verifies that key KPIs are solid before a full production release. Other tickets will go through the experimentation agents, but not this one.

Thursday, 17:00

It has been a full day, and the team gets notified the change is stable. Time for a full production release.

Friday morning

The team fully releases the feature. For the next couple of weeks the monitoring agents will pay attention to business and technical metrics. If something comes up, new tickets will be opened, and the full cycle will repeat.

There is nothing in this scenario that cannot be done today. This is the new baseline for a healthy SDLC. So how do you get there?

∗ ∗ ∗

01 / The New Age of Software

From human-as-worker to human-as-decider.

By now it is clear that the nature of software engineering is undergoing a fundamental change. For decades, the industry focused on making humans more efficient at doing their jobs. We are now entering an era where the primary "workers" will be autonomous agents, and the primary role of the human is to make decisions.

This is not speculative. Top-tier engineering organizations are already retooling for this. Stripe is deploying Minions for one-shot end-to-end coding tasks. Ramp built background agents to handle the repetitive toil that usually slows teams down. Companies like Cursor are moving toward cloud agents, part of a broader industry push toward background agents. In that model the agents sit upstream, constantly proposing changes, and our pipelines sit downstream, acting as high-fidelity filters for quality.

The data supports this. A METR study from early 2025 shows AI agents already performing at the level of experienced open-source developers on complex tasks. Industry leaders like Andrej Karpathy and Michael Truell point to a future where programming is less about syntax and more about managing these agents. As Boris Cherny put it, the shift is already here.

The software development lifecycle has always been a series of handoffs. A ticket becomes a plan, a plan becomes a design, a design becomes code, code becomes a release. At each step, context is transferred from one person or team to the next, and some of it is inevitably lost. AI agents are about to change the mechanics of these handoffs. Not by replacing the people involved, but by carrying context forward more reliably, doing the repetitive parts faster, and freeing humans to focus on the decisions that actually require judgment.

We are still in the very early days and engineers are ahead of the rest of the business, but this is about to change. This article proposes a framework for thinking about where agents fit, what triggers them, what context they need, and critically, where humans stay in the loop. It is opinionated where it should be (humans in the loop, handoffs, feedback loops) and flexible where your organization needs it to be (choose your own tools).

The Six Phases

One pipeline. Six phases. Human gates where judgment lives.

The phases run in forward order. Each phase also lists the feedback loops through which work can return upstream.

01 Plan (ticket created)

Planning starts when a ticket is created. The agent turns a sparse ticket into a structured plan or PRD. For bugs: a root cause hypothesis and fix approach. For features: a full requirements document. The agent can ask clarifying questions back to the ticket creator before finalizing.

Trigger Event

Ticket created in tracking system (Jira, Linear, GitHub Issues)

Context Inputs

ticket, codebase docs, past incidents, team context

Tools

ticket system API, doc store, codebase search

MCPs

Jira, Linear, Confluence, Slack, Google Docs, Notion

Output

PRD, plan of action, updated ticket

Owner

Product manager or tech lead

Sub-agents

Clarification agent (asks questions), context gatherer (pulls related docs and tickets)

Human gate. The plan or PRD must be reviewed and approved by the product owner before the ticket transitions to Design. The agent drafts; the human decides.

Feedback. Receives from: Design (plan needs revision), Monitor (new ticket from anomaly or business change). Plan is the entry point for all long feedback loops.

— human approval required —

02 Design (plan approved)

Design begins once the plan is approved. What "design" means depends on the work: UI/UX sketches for a frontend feature, system architecture for a backend change, database schema proposals for a data model change. The agent proposes design artifacts appropriate to the task type.

Trigger Event

Ticket moves to "Ready for Design" (plan approved)

Context Inputs

ticket, PRD / plan, existing architecture, design system

Tools

codebase search, diagram tools, design system refs

MCPs

Figma, Confluence, Google Docs, Notion, GitHub, Jira

Output

system design, DB schema, wireframes, API contracts

Owner

Designer, architect, or senior engineer (depends on task type)

Sub-agents

Architecture analyzer, compatibility checker, schema validator

Human involvement (collaboration). Design review is collaborative. The agent proposes, the designer or architect iterates. The human owns the final design decision.

Feedback. Sends to: Plan (if requirements are incomplete or contradictory). Receives from: Build (if implementation reveals the design is infeasible).

03 Build (design ready)

Build is where most organizations start with agents, and it is the most mature phase. The agent receives all prior context and produces code changes. For multi-service changes, sub-agents can work in parallel. The output is one or more pull requests, not merged code.

Trigger Event

Ticket moves to "Ready for Implementation" (design approved)

Context Inputs

ticket, PRD, design artifacts, codebase, test suite

Tools

SCM (Git), IDE / code agent, CI pipeline, package registry

MCPs

GitHub, GitLab, Jira, Slack, Sentry, PostgreSQL

Output

pull request(s), code changes, unit tests, changelog

Owner

Engineer, EM, or SRE depending on scope

Sub-agents

Code writer (per service), test generator, PR description writer, lint and format agent

Human involvement (review). Code review remains human-owned. AI can assist with review, but a human must approve the pull request. Agents produce PRs; humans merge them.

Feedback. Sends to: Design (if the design is infeasible). Receives from: Validate (test failures, security issues return here with failure context).

04 Validate (PR ready)

Validation ensures the change is safe for release. This is where agents shine in parallel: unit tests, integration tests, security scans, compliance checks, and change management validation all run concurrently. If validation fails, the feedback loop sends work back to Build with specific failure context.

Trigger Event

Pull request ready to be merged

Context Inputs

ticket, PRD, design, code diff, PR metadata, test history

Tools

CI/CD, test runners, SAST / DAST, staging env

MCPs

GitHub, GitLab, Snyk, SonarQube, Jira, Slack

Output

test results, security report, compliance sign-off, release candidate

Owner

Engineer, QA, or change management (depends on org)

Sub-agents

Test agent, security scanner, compliance validator, performance profiler (parallel)

Human gate. Change management and compliance approvals typically require human sign-off (SOC 2, SOX, HIPAA). The agent prepares the evidence package; a human approves the release candidate.

Feedback. Sends to: Build (on any test, security, or compliance failure). This is the most frequent feedback loop and can often be automated for simple failures like lint or unit test regressions.
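The concurrent checks described for Validate can be sketched with a thread pool. This is a minimal illustration, not a prescribed harness: `run_checks` and the three stand-in checks are hypothetical names, and real checks would call your test runner and scanners instead of inspecting a string. Each check returns `None` on success or a short failure message, and the failures dictionary is the "failure context" that would travel back to Build.

```python
from concurrent.futures import ThreadPoolExecutor


def unit_tests(diff):
    return None  # stand-in: pretend the test suite passed


def security_scan(diff):
    # Stand-in rule; a real scanner would be far more thorough.
    return "possible hardcoded secret" if "password=" in diff else None


def lint(diff):
    bad = any(line.endswith(" ") for line in diff.splitlines())
    return "trailing whitespace" if bad else None


CHECKS = {"unit_tests": unit_tests, "security_scan": security_scan, "lint": lint}


def run_checks(checks, diff):
    """Run all checks concurrently and return {check name: failure message}."""
    with ThreadPoolExecutor(max_workers=len(checks) or 1) as pool:
        futures = {name: pool.submit(fn, diff) for name, fn in checks.items()}
        results = {name: future.result() for name, future in futures.items()}
    return {name: msg for name, msg in results.items() if msg is not None}


failures = run_checks(CHECKS, "password='x' ")
clean = run_checks(CHECKS, "x = 1")
```

An empty result means every check passed; a non-empty one names exactly which checks failed and why, which is what a Build agent needs to attempt a fix.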

— human approval required —

05 Release (validation passed)

Release gets the change to production. The approach varies by organization: canary deployments, blue-green, feature flags, app store submissions, experiment rollouts. This phase has the highest blast radius and the least agent maturity. Agents can orchestrate the mechanics, but the go / no-go decision stays with humans.

Trigger Event

Validation passed + human approval (release candidate approved)

Context Inputs

all prior artifacts, release candidate, rollback plan, deploy config

Tools

deploy pipeline, feature flags, canary system

MCPs

GitHub, GitLab, Cloudflare, AWS, Slack, PagerDuty

Output

deploy receipt, canary metrics, release notes, rollback artifact

Owner

Release engineer, SRE, or on-call engineer

Sub-agents

Deployment executor, canary monitor, rollback agent (triggered on failure)

Human gate. Production deployment authority stays with humans. The agent prepares everything and monitors the canary, but the go / no-go and rollback decisions involve a human.

Feedback. Sends to: Validate (canary failure triggers rollback and re-validation). In severe cases, may loop all the way back to Build or even Plan.
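One way to picture the split between agent and human in Release: the agent compares canary metrics against the baseline and produces a recommendation, and the decision itself stays with a person. The sketch below assumes two metrics and two thresholds (`max_error_increase`, `max_latency_ratio`) that are purely illustrative; the names and numbers are not from any particular canary system.

```python
from dataclasses import dataclass


@dataclass
class CanaryMetrics:
    error_rate: float       # fraction of failed requests, e.g. 0.01 = 1%
    p95_latency_ms: float


def canary_recommendation(baseline, canary,
                          max_error_increase=0.005,   # assumed threshold
                          max_latency_ratio=1.2):     # assumed threshold
    """Summarise canary health; the go / no-go call is left to a human."""
    reasons = []
    if canary.error_rate - baseline.error_rate > max_error_increase:
        reasons.append("error rate above baseline tolerance")
    if canary.p95_latency_ms > baseline.p95_latency_ms * max_latency_ratio:
        reasons.append("p95 latency above baseline tolerance")
    return {
        "recommendation": "no-go" if reasons else "go",
        "reasons": reasons,
        "requires_human_approval": True,  # the agent never decides alone
    }


baseline = CanaryMetrics(error_rate=0.010, p95_latency_ms=200)
healthy = canary_recommendation(baseline, CanaryMetrics(0.012, 210))
degraded = canary_recommendation(baseline, CanaryMetrics(0.030, 300))
```

The output is an evidence summary rather than an action, which keeps the agent on the "prepares everything" side of the gate.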

06 Monitor (deployed)

Monitoring is the healthy idle state. The system watches for technical regressions (errors, latency, resource usage) and business impact (conversion changes, revenue shifts). In an ideal world, this extends to market dynamics and competitive changes. The most important output is sometimes nothing at all, and sometimes a new ticket, closing the loop.

Trigger Event

Deployment complete (continuous, event-driven on anomalies)

Context Inputs

full product state, metrics / logs, business KPIs, recent changes

Tools

observability platform, alerting, analytics

MCPs

Datadog, PagerDuty, Sentry, Jira, Slack, Grafana

Output

new ticket, incident report, health dashboard, or nothing (healthy)

Owner

Product or business owner, SRE for technical monitoring

Sub-agents

Anomaly detector, root cause analyzer, ticket creator, business impact assessor

Human involvement (closing the loop). The monitoring agent surfaces issues; humans decide priority and response. When the agent creates a new ticket, it closes the SDLC loop and the cycle starts again at Plan.

Feedback. Sends to: Plan (new ticket from anomaly, business shift, or competitive change). This is the long loop that makes the SDLC a cycle, not a line.
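The six phases can also be written down as plain data, which is often enough to drive simple tooling. The sketch below is one possible encoding: the `Phase` class and helper functions are hypothetical, while the phase names, triggers, human roles, and feedback targets are taken from the descriptions above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    name: str
    trigger: str
    human_role: str                      # "gate", "collab", "review", or "loop"
    feedback_to: tuple[str, ...] = ()    # phases this one can send work back to


PHASES = (
    Phase("Plan", "ticket created", "gate"),
    Phase("Design", "plan approved", "collab", ("Plan",)),
    Phase("Build", "design ready", "review", ("Design",)),
    Phase("Validate", "PR ready", "gate", ("Build",)),
    # Release normally returns to Validate; severe cases may go back further.
    Phase("Release", "validation passed", "gate", ("Validate",)),
    Phase("Monitor", "deployed", "loop", ("Plan",)),
)


def gated_phases() -> list[str]:
    """Phases whose output needs explicit human approval before moving on."""
    return [p.name for p in PHASES if p.human_role == "gate"]


def feedback_edges() -> list[tuple[str, str]]:
    """Every (source, target) pair along which work can return to an earlier phase."""
    return [(p.name, target) for p in PHASES for target in p.feedback_to]
```

Keeping this as data rather than logic makes it easy to review with the people who own each phase, and to change when your organization's gates differ.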

02 / Feedback Loops

Work does not always move forward.

The pipeline above reads top to bottom, but real work does not always move forward. Feedback loops are how the system self-corrects. Some loops are short and frequent, happening many times within a single ticket. Others are long, spanning the entire lifecycle when monitoring discovers something that needs new work.

Short loops, within a ticket

These happen constantly and are often fully automated. They are the inner engine of quality.

Validate → Build (high frequency, often automated)

Validate: Tests fail, security scan finds issues, or lint errors detected

Build: Agent receives failure context and attempts a fix, or human is notified

Design → Plan (moderate frequency, human-driven)

Design: Architecture review reveals the plan is incomplete or contradictory

Plan: Plan is revised with new constraints discovered during design

Build → Design (moderate frequency, human-driven)

Build: Implementation reveals the design is infeasible or under-specified

Design: Design is updated to accommodate implementation reality

Release → Validate (low frequency, high urgency)

Release: Canary deployment shows degraded metrics or errors

Validate: Rollback triggered, additional validation needed before next attempt

Long loops, across the lifecycle

These close the full cycle. They are the reason the SDLC is a loop and not a line.

Monitor → Plan (the defining loop of the SDLC)

Monitor: Anomaly detected: error rate spike, conversion drop, or performance regression after deploy

Plan: New ticket created automatically with incident context, starting the cycle again

Monitor → Plan (low frequency, high impact)

Monitor: Business KPIs shift: competitor launches feature, market conditions change, user behavior evolves

Plan: Strategic ticket created by product, informed by monitoring data

The key design decision is which feedback loops agents should handle autonomously and which require human judgment. A good default: short loops within Build–Validate (test failures, lint fixes) can be automated. Everything else should notify a human and wait for a decision.
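That default can be expressed as a small routing table. The sketch below is illustrative only: `route_feedback`, the failure-kind labels, and the retry cap `MAX_AUTO_ATTEMPTS` are assumptions added for the example, not something the framework prescribes.

```python
MAX_AUTO_ATTEMPTS = 3  # assumption: escalate once automated fixes keep failing

# Only the short Validate -> Build loop is automated, and only for simple failures.
AUTOMATED_LOOPS = {
    ("Validate", "Build"): {"lint", "unit_test"},
}


def route_feedback(source, target, failure_kind, attempt=1):
    """Return "auto_fix" for loops an agent may handle alone, else "notify_human"."""
    auto_kinds = AUTOMATED_LOOPS.get((source, target), set())
    if failure_kind in auto_kinds and attempt <= MAX_AUTO_ATTEMPTS:
        return "auto_fix"
    return "notify_human"
```

For example, `route_feedback("Validate", "Build", "lint")` stays automated, while a security finding on the same loop, or any canary failure from Release, goes to a person.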

∗ ∗ ∗

[Interactive figure: context accumulation across phases. Each phase inherits everything the last one made; by the time you reach Release, the agent needs access to everything upstream.]

03 / Context Accumulates

The hardest unglamorous problem.

One of the hardest problems in the agentic SDLC is context management. Each phase produces artifacts that downstream phases need. By the time you reach Release, an agent needs access to the ticket, the plan, the design artifacts, the code changes, the test results, and the compliance evidence.

The practical approaches to managing this context range from simple to complex. Most teams should start at the simple end.

Context management spectrum

Simple. Agents access tools directly (Jira API, Google Docs, Git) and pull context on demand. A summary of prior phases is passed as part of the agent's initial prompt. This works for most teams and keeps infrastructure minimal.

Advanced. A shared state store (vector DB, knowledge graph) where each phase deposits structured artifacts. Agents query semantically for relevant context rather than receiving everything. This becomes necessary when the volume of artifacts exceeds what fits in an agent's context window.

Start with tool access. Let agents pull what they need from the systems that already hold the information. You can always add a shared state layer later when context volume forces the issue. You almost never need to add it upfront.
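The "simple" end of the spectrum needs very little code. A minimal sketch, assuming each completed phase leaves behind a short text summary: `prior_context` is a hypothetical helper that assembles those summaries into the block of text handed to the next agent, with a per-phase character cap (an assumed value) to keep the prompt bounded.

```python
PHASE_ORDER = ["Plan", "Design", "Build", "Validate", "Release", "Monitor"]


def prior_context(summaries, current_phase, max_chars_per_phase=500):
    """Concatenate summaries of every phase that precedes current_phase."""
    upstream = PHASE_ORDER[:PHASE_ORDER.index(current_phase)]
    sections = []
    for phase in upstream:
        text = summaries.get(phase)
        if text:  # skip phases that produced no summary
            sections.append(f"## {phase}\n{text[:max_chars_per_phase]}")
    return "\n\n".join(sections)


summaries = {
    "Plan": "PRD approved: build the platform integration.",
    "Design": "API contract and schema proposal accepted.",
    "Build": "Pull request opened with code and unit tests.",
}
build_prompt_context = prior_context(summaries, "Build")
```

Anything the agent needs beyond these summaries it pulls on demand through its tools, which is why this approach stays cheap to run and to maintain.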

04 / Orchestration

Keep it boring.

There is a strong temptation to over-engineer orchestration: multi-agent frameworks, complex message buses, dynamic agent routing. As with most things in software development, the teams that succeed start simple.

The simplest orchestration pattern that works for the SDLC is a sequential pipeline with human checkpoints, backed by your existing ticket system as the state machine. The ticket's status field is the orchestration. When a ticket moves to "Ready for Design," the design agent activates. When a PR is opened, the validation agents run. When validation passes and a human approves, the release pipeline fires.

The opinionated default

Use your ticket system as the orchestrator. Ticket status transitions are your events. CI/CD pipelines are your agent runners. Git branches provide isolation for parallel agent work. Message queues (SQS, Kafka, and so on) are for when you outgrow this, not for when you start.
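In code, "the ticket's status field is the orchestration" amounts to a lookup table plus a webhook handler. The sketch below is an assumption-heavy illustration: the event shape, the agent names, and every status other than "Ready for Design" and "Ready for Implementation" are hypothetical, and `dispatch` stands in for whatever starts a CI job in your setup.

```python
# Map ticket statuses to the agent that should start when a ticket enters them.
STATUS_TO_AGENT = {
    "Created": "planning_agent",
    "Ready for Design": "design_agent",
    "Ready for Implementation": "build_agent",
    "PR Opened": "validation_agents",
    "Release Approved": "release_pipeline",
}

# Transitions that only count once a named human has approved them.
HUMAN_GATED = {"Ready for Design", "Release Approved"}


def on_status_change(event, dispatch):
    """Start the agent mapped to the ticket's new status, if any."""
    status = event["new_status"]
    agent = STATUS_TO_AGENT.get(status)
    if agent is None:
        return None                      # no agent listens to this status
    if status in HUMAN_GATED and not event.get("approved_by"):
        return None                      # wait for the human decision
    dispatch(agent, event["ticket_id"])
    return agent


started = []
on_status_change(
    {"ticket_id": "T-1", "new_status": "Ready for Design", "approved_by": "alan"},
    lambda agent, ticket: started.append((agent, ticket)),
)
```

All state lives in the ticket system, so there is nothing new to operate: a stuck workflow is a ticket in the wrong status, which people already know how to find and fix.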

When you do need more sophisticated orchestration, a simple message queue gives you most of what you need: decoupled phases, retry logic, dead letter queues for failed agent runs, and the ability to scale agents independently. But most organizations are not there yet, and pretending you are will cost you more in complexity than it saves in efficiency.
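To make the retry and dead-letter ideas concrete, here is an in-memory stand-in built on `collections.deque`. It is not how SQS or Kafka work internally, and the job shape, `drain`, and `max_attempts` are all hypothetical; a real broker would provide these behaviours through its own configuration.

```python
from collections import deque


def drain(queue, handler, max_attempts=3):
    """Process jobs until the queue is empty; return jobs that kept failing."""
    dead_letters = []
    while queue:
        job = queue.popleft()
        try:
            handler(job["payload"])
        except Exception as exc:
            job["attempts"] += 1
            job["last_error"] = str(exc)
            if job["attempts"] >= max_attempts:
                dead_letters.append(job)   # give up and park it for a human
            else:
                queue.append(job)          # retry later, behind other work
    return dead_letters


calls = []


def flaky_handler(payload):
    calls.append(payload)
    # "bad" always fails; "flaky" fails only on its first attempt.
    if payload == "bad" or (payload == "flaky" and calls.count("flaky") == 1):
        raise RuntimeError(f"agent run failed for {payload}")


jobs = deque({"payload": p, "attempts": 0} for p in ["ok", "flaky", "bad"])
dead = drain(jobs, flaky_handler)
```

The dead-letter list is the part worth keeping even in a simple setup: failed agent runs should end up somewhere a person will look, not disappear into a log.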

05 / Org Design

Your structure already works. Don't break it.

There is a recurring pattern in how companies approach AI adoption. A leadership directive arrives: we need to be AI-first. An AI enablement team is formed, or an existing platform team is given the mandate. Within weeks, that team is drowning. Every department wants something. Product wants AI-generated specs. Engineering wants coding agents. QA wants automated test generation. Ops wants intelligent alerting. The enablement team becomes a bottleneck, trying to build bespoke AI solutions for every team in the company while everyone waits.

This is the wrong model. It fails for the same reason it fails when a platform team tries to build every internal tool themselves instead of providing the platform that lets other teams build what they need.

The agentic SDLC framework does not require a new organizational structure. It requires the existing one to work the way it already should. Product people still own planning. Designers and architects still own design. Engineers still own building and validating. SREs still own releases and monitoring. What changes is not who does the work, but what tools they have.

The AI enablement team's real job

The AI enablement team (or platform team, or developer experience team) should not be building agents for every phase of the SDLC. Their job is to provide the infrastructure: the LLM access layer, the cost controls, the observability tooling, the secure execution environments, and the guardrails. Then each domain team configures and steers the agents for their own phase, because they are the ones with the domain knowledge to do it well.

A product manager knows what makes a good PRD for their domain. A senior engineer knows what patterns belong in their codebase and what does not. An SRE knows what healthy looks like for their services. No central AI team can replicate that knowledge. What the central team can do is make it easy for those experts to plug their knowledge into agent workflows: prompt templates, system instructions, tool access, and evaluation criteria.

This is the same model that made platform engineering successful. The platform team does not write your CI pipelines. They give you the CI system, the runners, the security scanning integration, and the deployment targets. Your team writes the pipeline that fits your service. The agentic SDLC works the same way. The enablement team provides the agent infrastructure. Your team provides the expertise that makes the agents useful.

The organizational benefit is that domain knowledge stays where it belongs: distributed across the people who actually do the work. The community of practice between product managers, the design review culture, the engineering guild that maintains coding standards, the on-call rotation that knows production inside out. None of that needs to be replaced. All of it becomes the guidance layer that makes agents effective rather than dangerous.

06 / The Manual Parts

The humans are the guidance system, not the bottleneck.

The most counterintuitive insight about the agentic SDLC is this: the manual parts are not the weakness. They are the strength. Human approval gates exist because judgment, accountability, and compliance cannot be automated away.

Agents work well with good guidance. They work poorly without it. The humans in the loop are not bottlenecks to be optimized away. They are the guidance system. A plan that no product person has read will produce the wrong feature. Code that no engineer has reviewed will accumulate debt. A deployment that no SRE has approved will eventually take down production.

The approval-fatigue trap

As agents produce more output faster, the risk shifts from "not enough automation" to "too many approvals rubber-stamped." If your humans are approving 50 agent-generated PRs a day without reading them, you do not have an agentic SDLC. You have an automated one with a decorative human in the loop. Design your approval workflows so that the number of decisions stays manageable and each one gets real attention.

07 / Observability

You cannot manage what you cannot see.

When agents do the work, humans need to see what happened. This is not optional. Observability in the agentic SDLC covers three layers.

Three layers of agent observability

Orchestration. Which agents ran, in what order, with what inputs, and what they produced. This is your audit trail.

Operation. How each agent behaved: token usage, latency, tool calls, error rates, retries. This is your cost and reliability signal.

Output. The quality of what agents produced: were the tests meaningful, was the code reviewable, did the PRD capture the intent? This requires human evaluation, at least for now.

The operational layer is especially important for cost control. Multi-agent setups can consume 4 to 15 times more compute than single-turn workflows. Without per-agent cost tracking, teams discover they have a problem only when the invoice arrives.
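Per-agent cost tracking can start as one record per agent run and a small aggregation. In the sketch below the `AgentRun` fields mirror the operational signals listed above, and the token prices passed to `cost_by_agent` are placeholders, not real rates for any provider.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AgentRun:
    ticket_id: str
    agent: str
    input_tokens: int
    output_tokens: int
    latency_s: float
    tool_calls: int
    ok: bool


def cost_by_agent(runs, usd_per_1k_input, usd_per_1k_output):
    """Sum estimated spend per agent from recorded token usage."""
    totals = defaultdict(float)
    for run in runs:
        totals[run.agent] += (
            run.input_tokens / 1000 * usd_per_1k_input
            + run.output_tokens / 1000 * usd_per_1k_output
        )
    return dict(totals)


runs = [
    AgentRun("T-1", "build_agent", 2000, 1000, 12.5, 4, True),
    AgentRun("T-1", "test_agent", 4000, 0, 3.0, 1, True),
    AgentRun("T-2", "build_agent", 2000, 1000, 9.0, 6, False),
]
# Placeholder prices: 0.50 USD per 1k input tokens, 1.00 USD per 1k output tokens.
costs = cost_by_agent(runs, usd_per_1k_input=0.5, usd_per_1k_output=1.0)
```

The same records double as the orchestration audit trail if you also store inputs and outputs, so one log can serve both layers at first.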

08 / Security

Agents need sandboxes too.

Every agent that writes code, runs tests, or deploys changes needs a secure execution environment. The security model should follow the principle of least privilege: agents start with read-only access and earn write permissions through demonstrated reliability and appropriate human oversight.

A planning agent that reads tickets and writes documents is low risk. A build agent that pushes to Git is medium risk. A release agent that deploys to production is high risk. Each level demands different isolation, different approval flows, and different monitoring. This is a deep topic that deserves its own treatment (see my colleague Emir's writeup on the topic), but the key principle is simple: treat agent permissions like you treat employee permissions. Scope them tightly and audit them continuously.
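A deny-by-default permission check along these lines might look like the sketch below. The three risk tiers come from the paragraph above; the agent names, scope strings, and the rule that high-risk actions need a named approver are assumptions made for the example.

```python
from enum import IntEnum


class Risk(IntEnum):
    LOW = 1      # reads tickets, writes documents
    MEDIUM = 2   # pushes to Git
    HIGH = 3     # deploys to production


# Hypothetical agents and scopes; anything not listed is denied.
AGENTS = {
    "planning_agent": (Risk.LOW, {"tickets:read", "docs:write"}),
    "build_agent": (Risk.MEDIUM, {"repo:read", "repo:push_branch"}),
    "release_agent": (Risk.HIGH, {"deploy:canary", "deploy:production"}),
}


def authorize(agent, scope, approved_by=None, audit_log=None):
    """Allow an action only if it is in scope and, for high risk, human-approved."""
    risk, scopes = AGENTS.get(agent, (Risk.HIGH, set()))  # unknown agents get nothing
    allowed = scope in scopes and (risk < Risk.HIGH or approved_by is not None)
    if audit_log is not None:
        audit_log.append(
            {"agent": agent, "scope": scope, "approved_by": approved_by, "allowed": allowed}
        )
    return allowed


log = []
authorize("release_agent", "deploy:production", audit_log=log)                      # denied
authorize("release_agent", "deploy:production", approved_by="sre-oncall", audit_log=log)
```

Recording denied attempts as well as allowed ones matters: a pattern of refused requests is often the first sign that an agent's scope or its instructions need attention.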

09 / Where To Start

One phase, measured, then the next.

You do not need to build all six phases at once. Start where the technology is most mature and the organizational risk is lowest. Most teams that succeed follow a sequence like this.

Build and Validate first

Code generation and automated testing are the most mature agent capabilities. Start here. Measure cycle time and defect rates. This is also where the validation harness lives, and where you will get the largest single quality jump.

Then Plan

Structured spec generation from tickets. This improves the quality of everything downstream. A good PRD agent quietly raises the floor on every later phase.

Then Monitor

Anomaly detection and incident triage. These agents work best with historical data, so give them time to learn your system before relying on them.

Then Design and Release

These require the most organizational trust and the deepest system context. Earn that trust with the earlier phases first.
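Measuring each step of that sequence does not require special tooling. As a rough sketch, cycle time and defect rate can be computed from ticket timestamps; the field names (`created_at`, `released_at`, `caused_defect`) are hypothetical and would map onto whatever your ticket system exports.

```python
from datetime import datetime
from statistics import median


def median_cycle_time_hours(tickets):
    """Median hours from ticket creation to release, over released tickets."""
    durations = [
        (t["released_at"] - t["created_at"]).total_seconds() / 3600
        for t in tickets
        if t.get("released_at")
    ]
    return median(durations) if durations else None


def defect_rate(tickets):
    """Share of released tickets later linked to a defect."""
    released = [t for t in tickets if t.get("released_at")]
    if not released:
        return None
    return sum(1 for t in released if t.get("caused_defect")) / len(released)


tickets = [
    {"created_at": datetime(2025, 1, 6, 12), "released_at": datetime(2025, 1, 7, 12)},
    {"created_at": datetime(2025, 1, 6, 12), "released_at": datetime(2025, 1, 8, 12),
     "caused_defect": True},
    {"created_at": datetime(2025, 1, 9, 9), "released_at": None},  # still in flight
]
```

Capture these numbers before introducing agents to a phase, so that the comparison afterwards is against your own baseline rather than an impression.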

10 / The Work Ahead

Faster stages. Tighter feedback. Same people.

The agentic SDLC is not a product you buy or a switch you flip. It is a way of thinking about where AI fits into the work your organization already does. The phases are the same ones you have always had. The people are the same ones who have always done the work. What changes is the tooling between them: agents that carry context forward, automate the repetitive, and surface what needs human attention.

The organizations that will get this right are the ones that resist two temptations. The first is to automate everything and remove humans from the loop. The second is to centralize everything and build a single AI team that becomes a bottleneck for the entire company. The better path is to distribute the capability, keep the expertise where it lives, and invest in the infrastructure that makes agents safe, observable, and accountable.

Start with one phase. Measure what changes. Expand when you have evidence, not enthusiasm. The SDLC has survived every technology shift for decades because its structure reflects how software actually gets built: in stages, with feedback, by people who care about the outcome. Agents do not change that. They just make the stages faster and the feedback tighter.

The people still matter most. Build accordingly.