We Tested 9 Agentic SEO Workflows on a Live SaaS for 30 Days. 6 Failed.

Every other agentic SEO post on the internet right now claims 10x results. Most are affiliate-driven. We ran 9 different agentic SEO workflows on a live B2B SaaS account from late April to late May 2026 — same site, same baseline, controlled comparison. Three worked. Six failed. The May 21, 2026 Google Core Update killed two of them mid-test. Here's the honest data, with our screenshots and our actual GSC numbers.

The test setup

Target site: a B2B SaaS doing roughly 12,000 monthly organic clicks across ~200 indexed pages. Baseline period: March 15 to April 14, 2026 (pre-test, pre-Core Update). Test period: April 21 to May 20, 2026 (30 days). Same site, same domain authority, same backlink profile. Each workflow was scoped to a different subset of pages so they didn't interfere.

Tools in play: Claude (Sonnet 4.6 and Opus 4.7), ChatGPT (GPT-5), Cursor, plus 1ClickReport's MCP for live GSC/GA4 data. We measured each workflow against weekly GSC clicks, position changes, and Looker Studio session counts per affected page.
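For readers who want to reproduce the comparison, here is a minimal sketch of how a per-workflow click delta can be computed from baseline and test totals. The `PageClicks` type, the URLs, and the numbers are illustrative assumptions, not our actual data or tooling.

```python
from dataclasses import dataclass


@dataclass
class PageClicks:
    url: str
    baseline_clicks: int  # clicks during the baseline window
    test_clicks: int      # clicks during the test window


def click_delta_pct(pages):
    """Percentage change in total clicks across one workflow's page subset."""
    baseline = sum(p.baseline_clicks for p in pages)
    test = sum(p.test_clicks for p in pages)
    if baseline == 0:
        raise ValueError("baseline has no clicks; percentage change is undefined")
    return (test - baseline) / baseline * 100.0


# Illustrative numbers only (hypothetical pages).
group = [
    PageClicks("/blog/example-a", 200, 250),
    PageClicks("/blog/example-b", 100, 110),
]
print(f"{click_delta_pct(group):+.1f}%")  # prints +20.0%
```

If the two windows differ in length, comparing average daily clicks instead of raw totals avoids a small built-in bias.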

What worked: the 3 that delivered

1. First-person teardown + real client data injection (+38% clicks on affected pages)

Workflow: take an existing blog post, use Claude with MCP to pull real GSC data for that page's primary keyword, inject a "What we're actually seeing in the data" section with the numbers, dates, and a chart. Re-publish.

Why it worked: post-Core Update, Google clearly rewards content that demonstrates first-hand experience. Real numbers from real accounts are unfakeable. We ran this on 5 pages and 4 of them saw click recovery within 14 days.
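We pulled the numbers through an MCP connector, but the same data is available from the Search Console API directly. The sketch below only builds a Search Analytics request body for one page and one keyword; the page URL and keyword are hypothetical, and the field names follow the public Search Analytics query format as we understand it.

```python
import json
from datetime import date


def build_page_query(page_url: str, keyword: str, start: date, end: date) -> dict:
    """Request body asking for daily rows for one page and one query."""
    return {
        "startDate": start.isoformat(),
        "endDate": end.isoformat(),
        "dimensions": ["date"],
        "dimensionFilterGroups": [
            {
                "groupType": "and",  # both filters must match
                "filters": [
                    {"dimension": "page", "operator": "equals", "expression": page_url},
                    {"dimension": "query", "operator": "equals", "expression": keyword},
                ],
            }
        ],
        "rowLimit": 1000,
    }


# Hypothetical page and keyword; the dates are the test window from this post.
body = build_page_query(
    "https://example.com/blog/some-post",
    "example keyword",
    date(2026, 4, 21),
    date(2026, 5, 20),
)
print(json.dumps(body, indent=2))
```

With the google-api-python-client library, a body like this is passed to `service.searchanalytics().query(siteUrl=..., body=body).execute()`; authentication is left out here.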

2. Commercial-intent vs-page generation (+15-30% impressions on rolled-out pages)

Workflow: identify competitor names in the category, generate yourbrand-vs-competitor pages using a template. Each page has a real comparison table with verified pricing, feature matrix, and a recommendation section.

Why it worked: brand-vs-brand queries have low competition (you're the canonical source by definition) and high commercial intent. AI Overviews appear less on these queries — buyers in evaluation mode usually click through.
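A minimal sketch of the table-building step, assuming product data is kept in a small structure with a flag for whether pricing has been verified. The `Product` type and the product names, prices, and features are hypothetical, not our template.

```python
from dataclasses import dataclass, field


@dataclass
class Product:
    name: str
    monthly_price_usd: float
    price_verified: bool
    features: set = field(default_factory=set)


def comparison_rows(ours: Product, theirs: Product):
    """Build comparison-table rows; refuse to render unverified pricing."""
    for product in (ours, theirs):
        if not product.price_verified:
            raise ValueError(f"pricing for {product.name} is not verified")
    rows = [
        ("Feature", ours.name, theirs.name),
        ("Price / month", f"${ours.monthly_price_usd:.0f}", f"${theirs.monthly_price_usd:.0f}"),
    ]
    # One row per feature that either product offers, in alphabetical order.
    for feature in sorted(ours.features | theirs.features):
        rows.append((
            feature,
            "yes" if feature in ours.features else "no",
            "yes" if feature in theirs.features else "no",
        ))
    return rows


ours = Product("OurTool", 49, True, {"API access", "SSO"})
theirs = Product("RivalTool", 59, True, {"SSO"})
for row in comparison_rows(ours, theirs):
    print("\t".join(row))
```

Failing loudly on unverified pricing is a deliberate choice: a vs-page with a wrong competitor price is worse than no page.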

3. AI-cited content gap analysis (+12% AI traffic across affected pages)

Workflow: ask Claude and ChatGPT the same 30 buyer-intent queries in your category. Note which competitors get cited. Write a piece that explicitly answers the queries where no clear answer exists.

Why it worked: AI engines need a canonical source per query. When there isn't one, the first quality answer to fill the gap gets cited heavily for the next 60-180 days. We landed on 4 such queries in 30 days.
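The gap-spotting step can be sketched as follows, assuming you have already recorded which domains each engine cited for each query. The rule used here (a query is a gap when no domain is cited by at least two engines) is our simplification for illustration, and the queries and domains are made up.

```python
from collections import Counter


def find_gaps(citations, min_engines=2):
    """Return queries where no single domain is cited by `min_engines` engines.

    `citations` maps query -> engine name -> list of cited domains.
    """
    gaps = []
    for query, by_engine in citations.items():
        counts = Counter()
        for domains in by_engine.values():
            counts.update(set(domains))  # count each domain once per engine
        if not any(n >= min_engines for n in counts.values()):
            gaps.append(query)
    return gaps


# Hypothetical recorded answers.
citations = {
    "best tool for X": {
        "claude": ["competitor-a.com"],
        "chatgpt": ["competitor-a.com", "competitor-b.com"],
    },
    "how to do Y with Z": {
        "claude": [],
        "chatgpt": ["random-forum.example"],
    },
}
print(find_gaps(citations))  # prints ['how to do Y with Z']
```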

What failed: the 6 that didn't work (or actively hurt)

4. AI-generated programmatic city/industry pages (−47% impressions, penalized by May 21 Core Update)

Workflow: generate 50 location-specific landing pages using a city × service template. Real data injection where possible (local search volume, demographics), AI-generated copy for the rest.

Why it failed: the May 21, 2026 Core Update specifically targeted "automated, ad-bloated content." Pages were removed from the index within 4 days of the rollout starting. Search Engine Land coverage documents the pattern. Pure template-based programmatic SEO is dead in 2026.

5. AI-rewritten existing blog posts at scale (no movement, time wasted)

Workflow: feed Claude an existing low-traffic blog post, ask for a "rewrite for better CTR and AEO." Re-publish.

Why it failed: Google's snippet ranking signals don't care that you reshuffled paragraphs. Without new information or genuine first-hand depth, rewrites don't move CTR materially. This is the SEO equivalent of editing your resume font and expecting more callbacks.

6. AI-generated FAQ schema spam (no lift, Google removed FAQ rich results on May 7)

Workflow: add 20+ AI-generated Q&A pairs to every blog post with FAQPage schema markup.

Why it failed: Google removed FAQ rich results from search on May 7, 2026. The schema still helps AI Overviews and AI engines, but the SERP-level CTR lift is gone. If your strategy was FAQ-schema-for-rich-results, that channel closed.
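For anyone keeping the markup for AI engines, this is a minimal sketch of generating FAQPage JSON-LD from a short list of question and answer pairs. The structure follows the schema.org FAQPage type; the sample pair is a shortened paraphrase used for illustration.

```python
import json


def faq_jsonld(pairs):
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


pairs = [
    (
        "Is FAQ schema worth adding at all anymore?",
        "Worth adding for AI engines; do not expect a SERP CTR lift.",
    ),
]
markup = faq_jsonld(pairs)
# The JSON goes inside a <script type="application/ld+json"> tag on the page.
print(json.dumps(markup, indent=2))
```

A handful of questions the page actually answers is the point; 20+ generated pairs per post is the pattern that wasted our time.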

7. ChatGPT custom GPT for SEO research (lost interest after 3 days)

Workflow: build a custom GPT trained on your brand context, use it for keyword research and content briefs.

Why it failed: not the GPT's fault — it worked fine. The problem was integration. Output sat in chat history, nobody on the team could find it later, and it didn't connect to GSC/GA4 data. We abandoned it for MCP-based workflows where output flows back to your actual tools.

8. AI-driven internal linking automation (−3% positions on affected pages)

Workflow: AI analyzes the site, suggests internal links, auto-applies them via a CMS plugin.

Why it failed: AI was overly aggressive. It added 8-15 links per page, often pointing at semantically loose targets. Google appears to have read this as link manipulation, and affected pages dropped 2-3 spots. Manual internal linking with restraint outperformed it.

9. AI-generated llms.txt + AEO optimization at scale (no measurable lift)

Workflow: AI generates a comprehensive llms.txt, plus rewrites every page to be "AEO-optimized" with explicit declarative answers.

Why it failed: per SE Ranking March 2026 research, no major AI company has committed to reading llms.txt. Google added a mention of it to their docs, then removed it within 24 hours. It's still worth shipping (low effort, possible upside), but it's not a measurable channel today.
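Since the file is cheap to ship, here is a minimal sketch of rendering one. The layout (an H1 title, a blockquote summary, then H2 sections of links) follows the public llms.txt proposal as we understand it, so treat the exact format as an assumption; the site name, URLs, and descriptions are hypothetical.

```python
def render_llms_txt(site_name, summary, sections):
    """Render an llms.txt body.

    `sections` maps a section heading to a list of (title, url, note) tuples.
    """
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    # Collapse trailing blank lines into a single final newline.
    return "\n".join(lines).rstrip() + "\n"


text = render_llms_txt(
    "Example SaaS",
    "Reporting software for marketing teams.",
    {
        "Docs": [
            ("Getting started", "https://example.com/docs/start", "setup guide"),
        ],
    },
)
print(text)
```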

What the aggregate data says

Workflow	Effort (hours)	30-day change	Verdict
First-person + real data injection	4-6 per page	+38% clicks	Keep
Vs-page generation	2-3 per page	+15-30% impressions	Keep
AI-cited gap analysis	3 per piece	+12% AI traffic	Keep
Programmatic city pages	40 total	−47% impressions	Drop (penalized)
Mass AI rewrite	15 total	0%	Drop
FAQ schema spam	8 total	0% (rich results gone)	Drop
Custom GPT for research	5 setup	0%	Drop (integration gap)
AI internal linking	2 setup	−3% positions	Drop
AI-generated llms.txt	1 total	0% measurable	Keep (low effort)

The pattern: workflows that add genuine information (real data, real comparisons, gap-filling answers) work. Workflows that automate volume of existing patterns (programmatic, mass rewrites, schema spam) don't.

If you're picking one to start

Start with workflow #1 — first-person + real data injection. It has the highest measured lift, the lowest blast radius (you can pilot on 1-2 pages before scaling), and it's defensible against future Core Updates because it's literally adding new information per piece.

For the tooling: any MCP-connected workflow lets your AI pull real numbers without manual exports. We use 1ClickReport because we built it, but pick whatever connects to GSC, GA4, and your ad platforms. The time saved by MCP over manual exports on this workflow alone is probably 60-80%.

Frequently Asked Questions

Why did the programmatic city pages get penalized so fast?

The May 21, 2026 Google Core Update specifically targeted automated, low-uniqueness content. Programmatic pages built on a template-plus-AI-fill pattern were the exact target. Google's quality models now identify and demote this pattern within days.

Should I stop using AI for content entirely?

No. The workflows that worked still used AI heavily — for research, drafting, comparison analysis, and gap identification. What failed was using AI to mass-produce content without adding new information. AI-as-amplifier works; AI-as-volume-generator doesn't.

What's the difference between AI-rewritten posts and first-person + real data injection?

AI-rewriting takes existing content and reshuffles it — no new information added. First-person + real data injection adds genuinely new content (your own data, your own examples, screenshots from your accounts) that nobody else can replicate. Google's algorithm rewards the second, ignores or demotes the first.

Is FAQ schema worth adding at all anymore?

It's worth adding for AI engines (Claude, ChatGPT, Perplexity all parse FAQPage schema for citation purposes). It's no longer worth adding for SERP rich results — Google removed those on May 7, 2026. Net: ship it for AEO, don't expect SERP CTR lift.

Why didn't the AI internal linking work?

AI tends to over-link because it optimizes for thoroughness instead of relevance. Adding 8-15 internal links per page in a single automated run looks unnatural and likely triggers Google's link manipulation signals. Manual linking, capped at 3-5 highly relevant links per page, outperforms automation here.
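If you still want AI suggestions in the loop, one option is to filter them before anything is applied. This is a minimal sketch, assuming each suggestion comes with a relevance score between 0 and 1; the 0.7 threshold and the URLs are hypothetical, and the cap of 5 is the top of the 3-5 range above.

```python
def select_links(suggestions, max_links=5, min_relevance=0.7):
    """Keep the most relevant suggested link targets, capped at `max_links`.

    `suggestions` is a list of (target_url, relevance) pairs.
    """
    best = {}
    for url, score in suggestions:
        # Drop weak matches and keep the highest score per target URL.
        if score >= min_relevance and score > best.get(url, -1.0):
            best[url] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return [url for url, _ in ranked[:max_links]]


# Hypothetical output from an automated suggestion run.
suggested = [
    ("/pricing", 0.95),
    ("/blog/loosely-related", 0.40),
    ("/docs/setup", 0.82),
    ("/pricing", 0.60),
    ("/blog/case-study", 0.75),
]
print(select_links(suggested, max_links=2))  # prints ['/pricing', '/docs/setup']
```

The filtered list is a starting point for a human editor, not something to auto-apply.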

How did you choose which workflows to test?

We pulled from Reddit (r/SEO and r/bigseo), Hacker News, and Indie Hackers, picking the workflows people were actively claiming worked. We deliberately included some "controversial" ones (programmatic, mass rewrites) because the Reddit consensus was split and we wanted controlled data, not opinions.