Feeding agents clear data.

The human web is
broken for AI agents._

AgentClearfeed is a parallel content layer built for inference. Clean, structured, verified content served in .acf format - not a scraper on top of the human web, but a native format designed for how AI agents actually consume information.

The current options

None of them work.

~/

Scraping

Fragile, breaks constantly. Dependent on HTML structure that changes without warning. 93-98% of every scraped page is noise - ads, navigation, cookie banners, tracking scripts - before the agent reaches actual content.

{}

MCPs

Powerful but hand-built. Every publisher needs to implement their own. Doesn't scale to the open web. Great for controlled internal tools — not viable for general-purpose agent content retrieval across the web.

~>

Existing APIs

Designed for developers building human-facing products. Pagination chrome, JSON wrappers, authentication flows, rate limits built around human usage patterns. Not designed for agent consumption at inference time.

Benchmarks - Phases 1-4

The numbers.

0%
Token Reduction

Phase 1 — single document retrieval. HTML averaged 16,388 tokens. ACF averaged 394.

0%
Token Reduction

Phase 2 — 10-document multi-topic retrieval. 84,022 tokens of HTML vs 5,429 of ACF. Consistent across all 4 models.

0%
Token Reduction

Phase 3 — live dynamic data. 13,287 tokens of bloated crypto tracker vs 173 tokens of ACF action format.

0.42 → 0.42
Accuracy

Multi-doc accuracy averaged across 4 models: Qwen 14B, Claude Haiku, Kimi K2.5, Kimi K2.6. Same content - different format.

0x
Fresher Data

Phase 3 staleness metric. Local model answers with HTML data 113.5 seconds old. ACF: under 10s. Token bloat costs time.

8-11%
Agent Comms Saving

ACF beats JSON in agent-to-agent communication across all models and datasets. Phase 4 complete.

Architecture

How it works.

Human Web
bloated HTML
Agent
~16,000 tokens 119s latency 0.42 accuracy expensive
ACF Layer
clean .acf
Agent
~394 tokens 23s latency 0.93 accuracy cheap
Every field is explicit

Structured labels, not prose. An agent always knows exactly what it's reading — no inference, no guessing, no tokens wasted on layout chrome or promotional copy.

Ungameable by design

Structured fields leave no room for SEO manipulation or AEO injection. No narrative spin. No promotional language. The format physically cannot be gamed the way HTML can.

Model-agnostic

Every model tested — 14B local, frontier API, Chinese cloud providers — performs better with ACF. The format wins at every layer of the stack regardless of model capability.

Full Results — Phases 1–3

The data.

Metric HTML ACF Improvement
Tokens per queryPhase 1 16,388 394 97.6% reduction
Tokens per 10-doc retrievalPhase 2 84,022 5,429 93.5% reduction
Tokens per live data pagePhase 3 13,287 173 98.7% reduction
Inference latencyPhase 1 119s 23s 5.2× faster
Multi-doc accuracy — avg 4 modelsPhase 2 0.42 0.93 +0.51 absolute
Data staleness — local modelPhase 3 113.5s 9.8s 11.5× fresher
Phase 4 - Agent Communication

Agent communication results.

Qwen 2.5 14B - AI Fairness dataset (deterministic, seed=42)

Format Total Tokens Accuracy Data Loss Cost/query
ACF 828 0.89 0.11 $0.000249
TOON 860 0.89 0.11 $0.000258
JSON 932 0.89 0.11 $0.000280

Claude Haiku 4.5 - three-way finale

Format Total Tokens Accuracy Data Loss Cost/query
ACF 828 1.00 0.00 $0.000663
TOON 860 0.89 0.11 $0.000688
JSON 932 0.89 0.11 $0.000746

ACF is the only format achieving perfect accuracy. 8-11% fewer tokens than JSON at every layer of the stack.

Research Progress

The phases.

01
Complete
Single Document Retrieval

HTML vs ACF on AI fairness content. 97.6% token reduction. 5.2× faster. Zero accuracy loss.

02
Complete
Multi-Document + Cross-Model

10 Wikipedia topics across 4 models. ACF wins on every model. Agent Swarm can't rescue HTML.

03
Complete
Live Dynamic Data

Real-time data pipeline with staleness metric. 98.7% reduction. 11.5× fresher answers on local model.

04
Complete
Agent-to-Agent Communication

ACF vs JSON vs TOON in agent-to-agent comms. 8-11% fewer tokens than JSON. Only format achieving perfect accuracy with Haiku.

05
Planned
Index and Publisher Tools

Discovery layer and tooling for publishers to serve ACF natively at scale.