The Real Problem Is Not the Technology
A friend dropped this question in our WhatsApp group last week:
“Are there any startups building agents that can learn business processes by observing humans doing them? We have 2,000 to 3,000 people handling back-office processes in Finance and HR. We’ve had good success with about 10 products on the Microsoft stack, but the processes are non-standard. It’s tough to scale without going through the massive change management effort of standardisation.”
I started typing a reply. Then I realised the answer was too long for WhatsApp. So here it is as a proper blog. Because this question is not really about AI agents. It’s about the oldest problem in operations: the gap between how people say they do the work and how they actually do the work.
Let’s be honest about what’s actually going on in most back-office teams. There is one standard process. The official one. The one in the training manual. The one that got signed off by compliance three years ago. Then there are 10 to 15 ways the work actually gets done.
Some are shortcuts. Some are workarounds. Some are jugaad fixes that someone figured out on a Tuesday afternoon when the system was down and a deadline was approaching. Some are genuinely better than the standard process but nobody ever wrote them down.
If you ask 30 people to list the 10 to 15 variations, they can’t. Not because they’re hiding anything. Because they don’t even know they’re doing it. It’s muscle memory.
So the question becomes: how do you capture what people actually do, including the stuff they can’t articulate?
Six Loops. Start Simple. Go Deeper Only When You Need To.
I think about this as six loops. Each one captures more detail than the last. You start at Loop 1. You only move to the next loop when the previous one isn’t giving you enough.
Most problems get solved by Loop 2 or 3. You rarely need Loop 6. But knowing the full stack means you always know where to go next.
1. Just Ask Them (Capture — start here): Conversations with the people who do the work, transcribed and pattern-mined.
2. AI Interviews Them (Capture): A conversational agent probes for variations the human interviewer might miss.
3. Screen + Voice Recording (Capture): See what they do AND hear why. The frame and the narration, synchronised.
4. Pattern Extraction & Weighting (Build): A probability map of every variation. Power-law distribution.
5. The Agent That Knows Its Limits (Build): Handles known paths. Stops and flags the unknown ones. No silent failure.
6. Continuous Learning (Build): Every human resolution becomes training data. The 26th variation gets added.
Loop 1 — Just Ask Them
The simplest version. You sit down with 20 to 30 people who do the work. You record the conversations. Not a formal interview with a clipboard. A conversation. “Walk me through what you did this morning. What happened when the system didn’t have the field you needed? What do you do when the approval takes too long?”
Then you take those transcripts and feed them to an AI. Not to summarise. To extract patterns: “Here are the 14 different ways your team processes expense reimbursements. Seven of them are variations of the standard. Four are workarounds for system limitations. Two are completely unofficial but faster. One is technically non-compliant.”
That’s it. Loop 1. Voice in. Patterns out.
You’d be amazed how much you can capture just by listening. Finance processes, HR onboarding, procurement approvals. These are not quantum mechanics. The complexity is not in the logic. It’s in the variations. And people will tell you the variations if you ask the right way.
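A minimal sketch of that extraction step, assuming the transcripts already exist as text. The model call itself is left out; any Claude or GPT client would slot in after the prompt is built, and the prompt wording is just one plausible version:

```python
# Loop 1 sketch: turn raw interview transcripts into one pattern-mining
# prompt. Transcription (e.g. Whisper) is assumed to have happened already;
# the LLM call is deliberately omitted -- swap in your own client.

def build_extraction_prompt(transcripts: list[str]) -> str:
    """Combine interview transcripts into a single extraction prompt."""
    joined = "\n\n---\n\n".join(
        f"Interview {i + 1}:\n{t.strip()}" for i, t in enumerate(transcripts)
    )
    return (
        "Below are interview transcripts from people who do the same "
        "back-office process. Do not summarise. Extract every distinct "
        "way the work actually gets done. For each variation, state: "
        "(1) the trigger, (2) the steps, (3) whether it is standard, "
        "a workaround, or unofficial.\n\n" + joined
    )

transcripts = [
    "When the PO field is missing I check the supplier email instead...",
    "Normally I key it straight into the portal, unless approvals lag...",
]
prompt = build_extraction_prompt(transcripts)
```

The point is that the "extraction" is not a summary request. The prompt explicitly asks for variations, triggers, and compliance status, which is what turns 30 rambling conversations into a usable map.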
Loop 2 — The AI Interviews Them
Loop 1 has a limitation. You’re doing the interviewing. You might miss follow-up questions. You might not know enough about the process to probe the edge cases.
So in Loop 2, the AI does the interviewing.
You build a simple conversational agent that asks: “Walk me through your last expense claim. What happened next? Was that the normal way or was there something different this time? What do you do when that field is missing?” The AI keeps asking until it has mapped the full path. Then it compares that path against every other path it has collected. If it finds a new variation, it flags it. If it matches an existing one, it adds weight to that pattern.
Over 30 conversations, the system builds a probability-weighted map of every way the work gets done. Not the manual. The reality.
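The bookkeeping behind this — compare each captured path against every path seen so far, flag the new ones, add weight to the known ones — can be sketched separately from the interviewing model. Here a "path" is just the ordered list of steps a conversation surfaced:

```python
# Loop 2 sketch: registry of every path collected so far. A new path gets
# flagged; a repeat path gets weight. The conversational agent that
# elicits the steps is out of scope here.
from collections import Counter

class PathRegistry:
    def __init__(self):
        self.weights: Counter = Counter()  # path tuple -> observation count

    def record(self, steps: list[str]) -> str:
        path = tuple(s.lower().strip() for s in steps)
        is_new = path not in self.weights
        self.weights[path] += 1
        return "new variation flagged" if is_new else "known path, weight added"

registry = PathRegistry()
registry.record(["open portal", "enter PO", "submit"])       # new
registry.record(["open portal", "enter PO", "submit"])       # known, weight 2
registry.record(["open portal", "check email", "paste PO"])  # new
```

In practice you would normalise steps more aggressively than `lower().strip()` (near-duplicate wording should collapse into one path), but the flag-or-weight decision is the whole mechanism.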
Loop 3 — Screen Recording While Talking
Now we’re going deeper. Some processes are hard to explain verbally. “I click on that thing, then I go to the other tab, then I copy the number from the email.” That’s not helpful in a transcript.
So in Loop 3, you ask people to do the work on a screen recording while narrating what they’re doing. Voice and screen together. “OK so I’m opening the invoice portal now, I’m looking for the PO number, it’s not in the standard field so I have to check the email from the supplier, here it is, now I’m pasting it into the notes section because the PO field doesn’t accept this format…”
Now you have two streams: what they’re doing (screen) and why they’re doing it (voice). When you combine these, you capture the nuances that pure voice misses. And the technology to do this already exists.
claude-video, by Brad Flaugher, takes any video, extracts frames at timed intervals, pulls a timestamped transcript, and hands both to Claude. The AI sees every screen and hears every word, synchronised to the second. Originally built for watching YouTube. The architecture is identical to what you need for process capture.
That’s the difference between “I copied the number” (useless) and “at 2:47, the user opened tab 3 of the invoice portal, scrolled to the notes field, and pasted value PO-2847 from the supplier email visible in the background” (actionable).
Screen capture while talking. That’s the gold standard for process capture. Everything below this loop is just getting there faster.
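The synchronisation at the heart of this — pairing each frame with whatever was being said at that moment — is simple to sketch. This is not claude-video's actual code; it assumes frame timestamps and transcript segments have already been produced (by ffmpeg and Whisper, say):

```python
# Loop 3 sketch: align screen frames with the narration. Each transcript
# segment is (start_seconds, end_seconds, text); each frame is a timestamp.

def align(frames: list[float], segments: list[tuple[float, float, str]]):
    """For each frame timestamp, find the transcript segment covering it."""
    pairs = []
    for t in frames:
        spoken = next(
            (text for start, end, text in segments if start <= t < end),
            "",  # narrator was silent at this frame
        )
        pairs.append((t, spoken))
    return pairs

segments = [
    (0.0, 4.0, "opening the invoice portal"),
    (4.0, 9.0, "PO number is not in the standard field"),
    (9.0, 14.0, "pasting it into the notes section"),
]
pairs = align([2.0, 7.0, 12.0], segments)
```

The aligned pairs are what you hand to the model: frame plus narration, so "I copied the number" becomes attached to the exact screen where it happened.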
Loop 4 — Pattern Extraction and Weighting
By Loop 4, you have enough data to build the actual map. And the map looks like this:
[Chart: how often each path occurs. The standard process accounts for most cases; Variants 2 through 6 trail off in a long tail.]
This is a power law distribution. And here’s what matters: as far as the AI agent is concerned, each variation is just another branch. Variant 6 is not harder than Variant 1. It’s just rarer. The agent doesn’t care about frequency. It cares about completeness. Can it handle this path? Yes or no.
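Turning raw observation counts into that probability-weighted map is a few lines. The counts below are invented for illustration; in practice they come out of Loops 1 to 3:

```python
# Loop 4 sketch: observation counts -> probability-weighted map.
# The counts are illustrative, but the shape is typical: a power law.
from collections import Counter

observations = Counter({
    "standard": 180, "variant 2": 45, "variant 3": 20,
    "variant 4": 8, "variant 5": 4, "variant 6": 1,
})

total = sum(observations.values())
# most_common() orders by frequency, descending
weight_map = {path: count / total for path, count in observations.most_common()}

for path, p in weight_map.items():
    print(f"{path:10s} {p:.1%}")
```

Note what the weights are for: prioritising which paths to capture and test first. The agent itself, as the text says, only cares whether a path is in the map at all.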
Loop 5 — The Agent That Knows Its Limits
This is where most automation projects go wrong. They build for the 90% and pray the other 10% doesn’t show up. When it does, the system breaks silently. Or worse, it processes it wrong and nobody notices for three months.
A properly built agent does something different. It tries all 25 known variations. If the case fits one of them, it processes it. If it doesn’t fit any of them, it stops and says:
“I’ve checked all 25 variations. This case doesn’t match any of them. You need a 26th path. Go figure it out with your team and come back and program me.”
That’s not a failure. That’s the agent doing its job. It knows what it knows. It knows what it doesn’t know. And it tells you.
The output is processing the invoice. The outcome is knowing that 99.5% of invoices get processed automatically and the remaining 0.5% get flagged for human review with a specific reason why. No silent failures. No prayers.
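A sketch of that behaviour, with hypothetical matchers and handlers standing in for real process logic. The only design decision that matters is the fallback branch: a specific, honest refusal instead of a guess:

```python
# Loop 5 sketch: try every known variation in turn; if none matches,
# stop and say so with a reason. Matchers/handlers are toy stand-ins.

def run_agent(case: dict, variations: list) -> dict:
    for name, matches, handle in variations:
        if matches(case):
            return {"status": "processed", "variation": name,
                    "result": handle(case)}
    return {
        "status": "needs_human",
        "reason": (f"Checked all {len(variations)} known variations; "
                   "none matched. A new path must be mapped before this "
                   "case can be automated."),
    }

variations = [
    ("standard",    lambda c: "po" in c,       lambda c: f"booked {c['po']}"),
    ("po-in-email", lambda c: "email_po" in c, lambda c: f"booked {c['email_po']}"),
]

run_agent({"po": "PO-2847"}, variations)   # matches the standard path
run_agent({"fax_ref": "F-11"}, variations) # unknown path -> flagged, with reason
```

The `needs_human` payload is the whole point: it carries a specific reason, which is what makes "0.5% flagged for human review" an outcome rather than a silent failure.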
Loop 6 — Continuous Learning
Loop 6 is where the system starts improving itself. Every time a human resolves a case that the agent couldn’t handle, that resolution becomes training data. The 26th variation gets added. The weights get updated. The next time this edge case shows up, the agent handles it.
Over six months, the system goes from handling 90% of cases to 95% to 98% to 99.2%. Not because someone redesigned the process. Because the system learned from what actually happened.
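A sketch of the learning step, with an in-memory dict standing in for real persistent storage and a plain string standing in for a real case signature:

```python
# Loop 6 sketch: every human resolution of a flagged case becomes a new
# known variation, so the same edge case is handled automatically next time.

class LearningAgent:
    def __init__(self, playbook: dict):
        self.playbook = dict(playbook)  # case signature -> known resolution

    def handle(self, signature: str) -> str:
        if signature in self.playbook:
            return f"auto: {self.playbook[signature]}"
        return "flagged for human review"

    def learn(self, signature: str, resolution: str) -> None:
        self.playbook[signature] = resolution  # the 26th variation, added

agent = LearningAgent({"po-missing": "pull PO from supplier email"})
agent.handle("duplicate-invoice")  # first time: flagged for human review
agent.learn("duplicate-invoice", "match against ledger, reject the copy")
agent.handle("duplicate-invoice") # next time: handled automatically
```

Real case signatures would be structured (field values, document type, source system) rather than a label, but the flag → resolve → learn → auto-handle cycle is exactly this loop.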
The Flywheel
That’s the whole thing. Six loops. Each one building on the last. Start with voice. Graduate to screen capture. Build the probability map. Let the agent run. Let it tell you when it’s stuck. Let it learn from the fix.
Why Standardisation Is the Wrong Starting Point
Now let me come back to the original question. “It’s tough to scale without going through the massive change management effort of standardisation.”
Here’s the thing. Standardisation is the most expensive, slowest, most politically painful way to solve this problem. You’re asking 2,000 people to stop doing what works and start doing what the manual says. They’ll resist. Not because they’re difficult. Because their “non-standard” process is often better for their specific situation.
The six-loop approach flips this entirely. Instead of forcing everyone into one process, you map every process that exists. You build an agent that handles all of them. Then, over time, the data tells you which variations are genuinely better and which are just habits. The standardisation happens bottom-up, driven by evidence, not top-down, driven by a consulting firm’s PowerPoint.
The outcome is not “we standardised the process.” It’s “we captured every way the work gets done, automated 98% of it, and the remaining 2% gets smarter every month.”
The Technology Is Already Here
Let me be specific about what you need to build this. Because this is not theoretical.
Any transcription service. Whisper (open source, free). Groq’s hosted Whisper (basically free). Record a conversation on your phone. Transcribe it. Feed it to Claude or GPT. Ask it to extract the process variations. This works today. Right now. On your laptop.
Any screen recording tool. Loom. OBS (free). QuickTime (Mac, free). Then use claude-video to feed the recording to an AI that can see the frames and read the transcript together. A 45-minute process walkthrough costs roughly a dollar to analyse. That’s nothing.
Loops 4 to 6 are where your engineering team comes in. But the inputs are already there from Loops 1 to 3. You have the process map. You have the variations. You have the weights. Building the agent is the straightforward part. Capturing the reality is the hard part. The six loops solve the hard part.
The Bottom Line
Your friend’s question was: “Are there agents that can learn business processes by observing humans?”
The answer is: the agent is the last step, not the first.
The first step is capturing how the work actually gets done. Not the manual version. The real version. All 25 variations of it. Including the ones nobody can articulate until they’re doing it and talking through it at the same time.
Start with voice. Graduate to screen capture. Build the map. Weight the variations. Build the agent. Let it flag what it can’t handle. Let it learn from the fix.
Six loops. Start at one. Go deeper only when you need to. The only thing standing between your 2,000-person back office and a system that handles 98% of it automatically is someone willing to start recording conversations.
Tools referenced: claude-video (Brad Flaugher) for synchronised frame and transcript analysis; Whisper and Groq’s hosted Whisper for transcription; Loom, OBS, and QuickTime for screen recording; Claude and GPT for pattern extraction.
Part of the From Outputs to Outcomes series. The equation: Domain Engineering + Context Engineering + Prompt Engineering + Human Feedback = Outcomes. This post is about the Human Feedback loop. The part where the system learns from what humans actually do, not what the manual says they should do.