I Built an Outsourcing Company with 7 AI Agents

A dev shop with no developers. AI plans, codes, and tests.


The Problem

The structural problems of outsourced development have been around forever.

A client says “build me this app,” and the dev shop interprets and builds it. Communication costs eat up 30-40% of the entire project. Requirements get buried in meeting notes, mid-point reviews are perfunctory, and the deliverable looks nothing like what was originally discussed.

I wondered: what if AI agents handled the outsourcing instead of people?

AI agents in a meeting


Why Existing Approaches Fall Short

There’s no shortage of AI coding tools. GitHub Copilot, Cursor, Claude Code — all excellent. But these are developer tools. You need a developer to use them.

I wanted something different.

  • Client says “build me a todo app”
  • AI analyzes the requirements
  • Writes the spec
  • Codes it
  • Tests it
  • Deploys it

Without a developer. Or with minimal oversight at most.

The problem is that a single AI agent can’t do all this. Cram planning, development, and testing into one prompt and the context explodes. It looks plausible at first, but collapses as complexity grows.


My Decision: Split the Roles

Just like human organizations, AI needs role separation.

I designed an AI outsourcing team of 7 specialized agents:

Agent     | Role                                                | Analogy
----------|-----------------------------------------------------|-------------------
PM        | Requirements analysis, PRD writing, issue breakdown | Project Manager
Architect | Tech stack selection, architecture design           | Tech Lead
Frontend  | UI/UX implementation                                | Frontend Developer
Backend   | API, DB, business logic                             | Backend Developer
QA        | Testing, quality verification                       | QA Engineer
Debug     | Build error analysis, auto-recovery                 | Senior Debugger
DevOps    | Build, deploy, preview environments                 | Infra Engineer

The key: each agent only does its job. PM doesn’t know code, Frontend doesn’t know the DB. Instead, they reference each other’s outputs.
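As a sketch, this role split can be expressed as a small team config. The field names (`duties`, `reads`) and the `can_do` check are my assumptions for illustration, not the actual system:

```python
from dataclasses import dataclass

# Hypothetical encoding of the 7-role split from the table above.
# Roles mirror the article; the structure itself is an assumption.
@dataclass(frozen=True)
class Agent:
    name: str
    duties: tuple[str, ...]  # the only tasks this agent may perform
    reads: tuple[str, ...]   # wiki sections it may reference

TEAM = (
    Agent("PM", ("requirements", "prd", "issues"), ()),
    Agent("Architect", ("tech_stack", "architecture"), ("prd",)),
    Agent("Frontend", ("ui",), ("prd", "architecture", "api_spec")),
    Agent("Backend", ("api", "db", "business_logic"), ("prd", "architecture")),
    Agent("QA", ("tests", "verification"), ("prd", "issues")),
    Agent("Debug", ("build_error_analysis", "auto_recovery"), ("architecture",)),
    Agent("DevOps", ("build", "deploy", "preview"), ("architecture",)),
)

def can_do(agent: Agent, task: str) -> bool:
    """Each agent only does its job: reject anything outside its duties."""
    return task in agent.duties
```

The point of making the boundaries explicit is that a Frontend agent asked to touch the DB gets rejected by construction, not by prompt discipline.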

7-agent architecture


The Pipeline: Approval Gates Are Everything

You can’t just let agents loose. AI is confidently wrong.

So I built 3-stage approval gates:

Client Request
  ↓
[PM Agent] Write PRD
  ↓
🚪 Gate 1: PRD Approval ← Human reviews
  ↓
[PM Agent] Break into Phase-based Issues
  ↓
🚪 Gate 2: Issue List Approval
  ↓
[Dev Agents] Frontend + Backend in Parallel
  ↓
[QA Agent] Testing
  ↓
🚪 Gate 3: QA Plan Approval
  ↓
Deploy!

A human reviews and approves at each gate before proceeding. This catches the AI if it’s heading in the wrong direction.

This mirrors real outsourcing contracts: spec review → mid-check → final acceptance. The difference is it all happens with a single click on a web dashboard.
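The gate flow can be sketched as one short pipeline function. The stage artifacts and the `approve` callback are placeholders I invented for illustration, not the real implementation:

```python
# Hypothetical sketch of the 3-gate pipeline described above.
# `approve` stands in for the human reviewer on the web dashboard.
def run_pipeline(request, approve):
    prd = f"PRD for: {request}"                       # [PM Agent] write PRD
    if not approve("Gate 1: PRD", prd):
        return None                                   # human stops it here
    issues = [f"Issue {i}" for i in range(1, 9)]      # [PM] phase-based issues
    if not approve("Gate 2: Issues", issues):
        return None
    build = {"frontend": "done", "backend": "done"}   # dev agents in parallel
    qa_plan = "smoke + end-to-end tests"              # [QA Agent] testing
    if not approve("Gate 3: QA plan", qa_plan):
        return None
    return {"prd": prd, "issues": issues, "build": build}  # deploy!

# An approver that accepts everything lets the pipeline run to the end.
result = run_pipeline("todo app", lambda gate, artifact: True)
```

The useful property: a rejection at any gate halts everything downstream, which is exactly how a spec review or mid-check works in a human contract.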

Approval gate pipeline


Auto Mode vs Manual Mode

Here’s where it gets interesting — two modes:

Manual Mode: Approval required at every gate

  • Client reviews each stage, like a traditional contract
  • For projects where trust hasn’t been established yet

Auto Mode: Gates auto-approve

  • Internal prototypes, rapid MVP validation
  • “Just build it and we’ll see”
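The difference between the two modes reduces to a single branch at each gate. The function shape below is my assumption, not the project's actual API:

```python
# Hypothetical gate helper: auto mode approves immediately,
# manual mode defers to a human reviewer callback.
def gate(name: str, artifact: str, mode: str, human_review=None) -> bool:
    if mode == "auto":
        return True  # auto-approve: internal prototypes, rapid MVP validation
    return human_review(name, artifact)  # manual: client reviews each stage

approved_auto = gate("Gate 1: PRD", "draft PRD", mode="auto")
approved_manual = gate("Gate 1: PRD", "draft PRD", mode="manual",
                       human_review=lambda name, artifact: False)
```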

I tested it. In auto mode, I told it to build a “weather dashboard” app. The PM wrote the PRD → auto-approved → 8 issues generated → auto-approved → the dev agents implemented Phases 1 through 7 sequentially → done. Zero human intervention.

The result wasn’t perfect. But a working app came out.


Cost: A Dev Team for $13/month

The real disruption is cost.

Role        | Model           | Cost
------------|-----------------|--------
PM          | Claude Opus 4.6 | ~$10/mo
Dev (FE+BE) | GLM-5           | ~$2/mo
QA + Debug  | GLM-5           | ~$1/mo

Only the PM uses an expensive model, because PRD quality determines the entire project: bad planning makes good coding worthless. But with clear instructions, even cheap models code well enough.

One good PRD means the dev agents waste less time, and total costs drop. The same principle holds in human organizations.
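The routing behind this table can be sketched as a lookup. The model identifiers are shorthand, and the per-agent price split is my rough reading of the table's combined figures, not exact billing:

```python
# Hypothetical cost-tiered model routing: only the judgment-heavy PM role
# gets the expensive model. Prices are the article's rough monthly figures,
# split per agent by assumption ("Dev ~$2/mo" halved, "QA + Debug ~$1/mo" halved).
MODEL_FOR_ROLE = {
    "PM": ("claude-opus", 10.0),  # PRD quality gates the whole project
    "Frontend": ("glm", 1.0),
    "Backend": ("glm", 1.0),
    "QA": ("glm", 0.5),
    "Debug": ("glm", 0.5),
}

def pick_model(role: str) -> str:
    """Route each agent role to its assigned model tier."""
    return MODEL_FOR_ROLE[role][0]

monthly_cost = sum(cost for _, cost in MODEL_FOR_ROLE.values())  # ~$13/mo
```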

Cost comparison


Shared Memory: The Agents’ Wiki

Agents need to know about each other’s work. Frontend can’t ignore Backend’s API spec.

So I built a per-project wiki system:

  • PM writes PRD → saved to wiki
  • Architect picks tech stack → saved to wiki
  • Backend builds APIs → saved to wiki
  • Frontend reads wiki → implements accordingly

Every decision is recorded, every agent references it. Like Notion or Confluence for human teams.
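A minimal sketch of such a per-project wiki, assuming a simple write/read API (the class and method names are invented for illustration):

```python
# Hypothetical per-project shared memory: every agent writes its decisions
# here, and downstream agents read instead of guessing.
class ProjectWiki:
    def __init__(self):
        self.pages = {}

    def write(self, section: str, author: str, content: str) -> None:
        """Record a decision along with which agent made it."""
        self.pages[section] = {"author": author, "content": content}

    def read(self, section: str) -> str:
        return self.pages[section]["content"]

wiki = ProjectWiki()
wiki.write("prd", "PM", "Todo app: add/list/complete tasks")
wiki.write("api_spec", "Backend", "POST /tasks, GET /tasks")

# Frontend reads the recorded API spec rather than inventing its own.
api = wiki.read("api_spec")
```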


Lessons Learned

1. AI Needs to Be Managed

“Just hand it to AI and it’ll figure it out” is a fantasy. AI is a great executor but a poor judge. Setting direction is still a human’s job. That’s why approval gates matter.

2. Role Separation Creates Quality

Ask one AI to do everything and you get “not bad” results. Split the roles and you get “good in each area” results. Same principle as human organizations.

3. Use Expensive Models Only Where It Matters

Not every agent needs the best model. Expensive models where judgment is needed (PM), cost-effective models where execution is needed (Dev).

4. Process Beats Prompts

No matter how good your prompt is, without structure it falls apart. Conversely, with solid structure, even average prompts produce results.


What’s Next

Right now, only the super admin (me) can use it. Soon I’ll open a client portal to take real outsourcing projects.

“An app built by AI” might not inspire confidence yet. But when the deliverable works, costs 1/10th as much, and ships 10x faster — the market will respond.

An AI agent outsourcing company. 7 employees, $13/month payroll.

Future AI outsourcing office


A Question to Consider

In your organization, where is the line between “judgments only humans should make” and “execution AI can handle”?


The system described in this post is being developed as Codemon Make.