Feeding agents clear data.

The human web is broken for AI agents.

AgentClearfeed is a parallel content layer built for inference. Clean, structured, verified content served in .acf format - not a scraper on top of the human web, but a native format designed for how AI agents actually consume information.


The current options

None of them work.

Scraping

Fragile, breaks constantly. Dependent on HTML structure that changes without warning. 93-98% of every scraped page is noise - ads, navigation, cookie banners, tracking scripts - before the agent reaches actual content.


MCPs

Powerful but hand-built. Every publisher needs to implement their own. Doesn't scale to the open web. Great for controlled internal tools - not viable for general-purpose agent content retrieval across the web.

Existing APIs

Designed for developers building human-facing products. Pagination chrome, JSON wrappers, authentication flows, rate limits built around human usage patterns. Not designed for agent consumption at inference time.
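The noise share quoted for scraping can be estimated roughly with the standard library alone. The sketch below is an illustration, not the benchmark's method: it treats every character outside visible text as noise, and the names `visible_text` and `noise_ratio` are hypothetical. Navigation and ad copy still count as visible text here, so the result is a lower bound on real noise.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping script/style/noscript content."""

    SKIPPED_TAGS = {"script", "style", "noscript"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    """Return the page's visible text joined by single spaces."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def noise_ratio(html: str) -> float:
    """Share of characters that are markup, scripts, or styles (0.0 to 1.0)."""
    if not html:
        return 0.0
    return 1 - len(visible_text(html)) / len(html)


sample = (
    "<html><head><style>p{color:red}</style><script>track()</script></head>"
    "<body><nav>Home</nav><p>Actual content.</p></body></html>"
)
print(f"{noise_ratio(sample):.0%} of characters are not visible text")
```

Character counts are only a proxy for tokens; a tokenizer-based measurement would be closer to the figures reported in the benchmarks.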

Benchmarks - Phases 1-6 + GA

The numbers.

97.6%

Token Reduction

Phase 1 - single document retrieval. HTML averaged 16,388 tokens. ACF averaged 394.

93.5%

Token Reduction

Phase 2 - 10-document multi-topic retrieval. 84,022 tokens of HTML vs 5,429 of ACF. Consistent across all 4 models.

98.7%

Token Reduction

Phase 3 - live dynamic data. 13,287 tokens of bloated crypto tracker vs 173 tokens of ACF action format.

0.42 → 0.93

Accuracy

Multi-doc accuracy averaged across 4 models: Qwen 14B, Claude Haiku, Kimi K2.5, Kimi K2.6. Same content - different format.

11.5×

Fresher Data

Phase 3 staleness metric. Local model answers with HTML data 113.5 seconds old. ACF: under 10s. Token bloat costs time.

8-11%

Agent Comms Saving

ACF beats JSON in agent-to-agent communication across all models and datasets. Phase 4 complete.

+13.99%

Format Fitness Gain

Phase 5 - genetic algorithm evolved ACF encoding. 682 tokens vs 759 seed. Accuracy 0.78 vs 0.67.

0.67

Evolved-ACF Accuracy

Phase 6 - multi-agent swarm with live Wikipedia. Evolved-ACF 0.67 vs JSON 0.42 on capable models. Format compression has a capability floor.

-31.8%

Champion Token Saving

Phase 5-GA - cross-species evolution found a 3-field champion. 636 tokens vs 932 for JSON. Perfect accuracy in A2A, ahead of both ACF and JSON on tokens, accuracy, and cost.

Architecture

How it works.

Human Web → bloated HTML → Agent: ~16,000 tokens, 119s latency, 0.42 accuracy, expensive

ACF Layer → clean .acf → Agent: ~394 tokens, 23s latency, 0.93 accuracy, cheap

Every field is explicit

Structured labels, not prose. An agent always knows exactly what it's reading - no inference, no guessing, no tokens wasted on layout chrome or promotional copy.

Ungameable by design

Structured fields leave no room for SEO manipulation or AEO injection. No narrative spin. No promotional language. The format physically cannot be gamed the way HTML can.

Model-agnostic

Every model tested - 14B local, frontier API, Chinese cloud providers - performs better with ACF. The format wins at every layer of the stack regardless of model capability.

NOTE: None of the models tested were trained on ACF or ACF-Champion; all were trained on JSON and existing web formats. Every result reported here reflects zero prior exposure to these formats.

Full Results - Phases 1-3

The data.

Metric	Phase	HTML	ACF	Improvement
Tokens per query	Phase 1	16,388	394	97.6% reduction
Tokens per 10-doc retrieval	Phase 2	84,022	5,429	93.5% reduction
Tokens per live data page	Phase 3	13,287	173	98.7% reduction
Inference latency	Phase 1	119s	23s	5.2× faster
Multi-doc accuracy - avg 4 models	Phase 2	0.42	0.93	+0.51 absolute
Data staleness - local model	Phase 3	113.5s	9.8s	11.5× fresher
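The token and latency figures in the Improvement column follow from simple ratios. A minimal sketch, using only the token and latency numbers from the table (the function names are illustrative):

```python
def token_reduction(html_tokens: int, acf_tokens: int) -> float:
    """Percentage of tokens removed when the same content is served as ACF."""
    return (1 - acf_tokens / html_tokens) * 100


def speedup(html_seconds: float, acf_seconds: float) -> float:
    """How many times faster the ACF run is than the HTML run."""
    return html_seconds / acf_seconds


# (HTML tokens, ACF tokens) per phase, copied from the table above.
token_results = {
    "Phase 1": (16_388, 394),
    "Phase 2": (84_022, 5_429),
    "Phase 3": (13_287, 173),
}

for phase, (html_tokens, acf_tokens) in token_results.items():
    print(f"{phase}: {token_reduction(html_tokens, acf_tokens):.1f}% reduction")

print(f"Latency: {speedup(119, 23):.1f}x faster")
```

Run as written, this prints 97.6%, 93.5%, and 98.7% for the three phases and 5.2x for latency, matching the table.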

Try it yourself

See what ACF-Champion saves.

Paste any JSON. Nothing leaves your browser.

Calculator outputs: JSON tokens, ACF-Champion tokens, token reduction, cost saving per 100K calls.

Token counts are approximate (length ÷ 4). Cost at Claude Opus 4.6 input rate ($15/1M tokens) per 100K calls. Scale estimates based on public data.
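The estimate described in that note can be sketched in a few lines. This assumes the length ÷ 4 heuristic rounds down (the note does not specify the rounding) and uses the $15 per 1M input tokens rate it quotes; the function names are hypothetical.

```python
USD_PER_MILLION_INPUT_TOKENS = 15.0  # rate quoted in the note above
CALLS = 100_000


def estimate_tokens(text: str) -> int:
    """Approximate token count: character length divided by 4, rounded down."""
    return len(text) // 4


def cost_saving_usd(
    json_tokens: int,
    champion_tokens: int,
    calls: int = CALLS,
    usd_per_million: float = USD_PER_MILLION_INPUT_TOKENS,
) -> float:
    """Input-token cost saved over `calls` requests by sending fewer tokens."""
    saved_tokens = (json_tokens - champion_tokens) * calls
    return saved_tokens * usd_per_million / 1_000_000


# Token counts from the A2A benchmark tables: 932 for JSON, 636 for Champion.
print(f"${cost_saving_usd(932, 636):,.2f} saved per {CALLS:,} calls")
```

With the benchmark's 932 and 636 token counts, the saving works out to $444.00 per 100K calls at that rate. Real tokenizers will give different counts than the length heuristic.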

Phase 4 - Agent Communication

Agent communication results.

Qwen 2.5 14B - AI Fairness dataset (deterministic, seed=42)

Format	Total Tokens	Accuracy	Data Loss	Cost/query
ACF	828	0.89	0.11	$0.000249
TOON	860	0.89	0.11	$0.000258
JSON	932	0.89	0.11	$0.000280

Claude Haiku 4.5 - three-way finale

Format	Total Tokens	Accuracy	Data Loss	Cost/query
ACF	828	1.00	0.00	$0.000663
TOON	860	0.89	0.11	$0.000688
JSON	932	0.89	0.11	$0.000746

With Claude Haiku 4.5, ACF is the only format achieving perfect accuracy. It uses 8-11% fewer tokens than JSON across the models and datasets tested.

Phase 5 - Genetic Algorithm

Format evolution results.

DEAP GA - llama3.1:8b - fitness = 60% accuracy + 40% token efficiency

Schema	Tokens	Accuracy	Fitness
ACF seed	759	0.67	0.941
Evolved	682	0.78	1.073

+13.99% fitness improvement. Converged at generation 7. Positional header, no key names, single-line encoding.
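A weighted fitness of this kind can be sketched as below. The 60/40 weights come from the caption above, but this page does not define the token-efficiency term; the sketch assumes it is a reference token count divided by the schema's token count, so it will not reproduce the exact fitness values in the table. All names are illustrative.

```python
ACCURACY_WEIGHT = 0.6
EFFICIENCY_WEIGHT = 0.4


def fitness(accuracy: float, tokens: int, reference_tokens: int) -> float:
    """Weighted score: higher accuracy and fewer tokens both raise fitness."""
    # Assumption: efficiency is the reference size relative to this schema's size.
    token_efficiency = reference_tokens / tokens
    return ACCURACY_WEIGHT * accuracy + EFFICIENCY_WEIGHT * token_efficiency


def relative_gain(new_fitness: float, old_fitness: float) -> float:
    """Percentage improvement of one fitness value over another."""
    return (new_fitness / old_fitness - 1) * 100


# Under any reference size, the evolved schema scores above the seed.
seed = fitness(accuracy=0.67, tokens=759, reference_tokens=759)
evolved = fitness(accuracy=0.78, tokens=682, reference_tokens=759)
print(evolved > seed)

# Gain computed from the rounded fitness values reported in the table.
print(f"{relative_gain(1.073, 0.941):.1f}%")
```

From the rounded table values the gain comes to about 14.0%, in line with the +13.99% quoted above.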

Phase 6 - Multi-Agent Swarm

Swarm results.

Standard run - Kimi K2.5 coordinator (4 queries x 3 Wikipedia sources)

Format	Avg Ctx Tokens	Avg Accuracy	Compounding
Evolved	5,720	0.67	0.989x
ACF	5,783	0.58	0.999x
JSON	5,786	0.42	1.000x

Explicit 3-agent run - 3 parallel Haiku sub-agents + Haiku coordinator

Format	Avg Ctx Tokens	Avg Accuracy	Compounding
Evolved	5,720	0.42	0.989x
ACF	5,783	0.42	0.999x
JSON	5,786	0.50	1.000x

Evolved-ACF's compressed encoding has a capability floor. Larger models benefit from compression; smaller models prefer JSON's explicit structure.
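The Compounding column is not defined on this page. Its values are consistent with each format's average context tokens divided by JSON's, and the sketch below reproduces them under that reading, which is an inference rather than a documented formula.

```python
# Average context tokens per format, from the Phase 6 tables.
avg_context_tokens = {"Evolved": 5_720, "ACF": 5_783, "JSON": 5_786}


def relative_to_json(tokens_by_format: dict[str, int]) -> dict[str, float]:
    """Context size of each format as a multiple of the JSON baseline."""
    baseline = tokens_by_format["JSON"]
    return {
        name: round(tokens / baseline, 3)
        for name, tokens in tokens_by_format.items()
    }


for name, ratio in relative_to_json(avg_context_tokens).items():
    print(f"{name}: {ratio:.3f}x")
```

This yields 0.989x, 0.999x, and 1.000x. In the swarm setting the coordinator's context differs by about 1% across formats, so the accuracy differences in the tables are not explained by context size alone.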

Phase 5-GA - Cross-Species Evolution

The champion.

Two lineages (ACF seed + JSON seed) evolved independently and were cross-bred into a hybrid; a combined GA run then produced the champion, the best schema found.

Stage	Tokens	Accuracy	Fitness
ACF seed	759	0.67	0.941
GA-ACF evolved	682	0.78	1.073
GA-JSON evolved	693	0.78	1.060
Hybrid (cross-bred)	670	0.78	1.082
Champion (combined GA)	636	0.78	1.118

Champion validation - A2A test (Claude Haiku 4.5, Phase 1 dataset)

Format	Total Tokens	Accuracy	Data Loss	Cost/query
Champion	636	1.00	0.00	$0.000509
ACF	828	0.89	0.11	$0.000663
JSON	932	0.89	0.11	$0.000746

+18.77% fitness over ACF seed. Champion: title + author + source only. The body carries the answer; header metadata is noise. -31.8% tokens vs JSON in A2A, with higher accuracy than both ACF and JSON.
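To make the idea of a three-field positional header concrete, here is a minimal sketch. It is not the actual Champion or .acf syntax, which is defined in the spec rather than on this page: the `|` delimiter, the field order, the function names, and the sample document are all hypothetical.

```python
import json

HEADER_FIELDS = ("title", "author", "source")  # the three fields named above
SEPARATOR = "|"  # hypothetical delimiter, not taken from the ACF spec


def encode_positional(doc: dict) -> str:
    """Encode a document as one line: header values by position, then the body."""
    header = [str(doc.get(f, "")).replace(SEPARATOR, " ") for f in HEADER_FIELDS]
    body = " ".join(str(doc.get("body", "")).split())  # collapse to a single line
    return SEPARATOR.join(header + [body])


def decode_positional(line: str) -> dict:
    """Recover the header fields and body; key names come from position alone."""
    parts = line.split(SEPARATOR, len(HEADER_FIELDS))
    if len(parts) != len(HEADER_FIELDS) + 1:
        raise ValueError("expected three header fields followed by a body")
    decoded = dict(zip(HEADER_FIELDS, parts))
    decoded["body"] = parts[-1]
    return decoded


doc = {  # made-up example data
    "title": "Algorithmic fairness",
    "author": "Example Author",
    "source": "example.org",
    "published": "2024-01-01",
    "body": "Fairness metrics compare error rates across groups.",
}

encoded = encode_positional(doc)
print(encoded)
print(len(encoded), "characters vs", len(json.dumps(doc)), "as JSON")
```

Dropping key names and unused metadata is where the saving comes from; the trade-off is that a reader must know the field order in advance.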

Phase 7 - Real A2A Protocol

A2A results.

a2a-sdk v1.0.3 - real HTTP transport - Claude Haiku 4.5 - 3 queries x 3 runs averaged

Format	Total Tokens	Accuracy	Data Loss	Cost/query
Champion	636	1.00	0.00	$0.000509
ACF	828	0.96	0.04	$0.000663
JSON	932	0.96	0.04	$0.000746

Real JSON-RPC 2.0 envelopes. Real Agent Cards. Real HTTP. Champion is the only format with zero data loss across all 3 runs. The in-process advantage from Phase 4 holds on the official A2A stack.
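For readers unfamiliar with JSON-RPC 2.0, the sketch below shows the general shape of an envelope wrapping a formatted payload. It builds the envelope by hand with the standard library and does not use a2a-sdk; the method name `example/send` and the `params` layout are placeholders, not the A2A protocol's actual method or message schema.

```python
import json


def wrap_jsonrpc(payload_text: str, request_id: str, method: str = "example/send") -> str:
    """Wrap an encoded payload in a JSON-RPC 2.0 request envelope."""
    envelope = {
        "jsonrpc": "2.0",  # version marker required by JSON-RPC 2.0
        "id": request_id,
        "method": method,  # placeholder method name
        "params": {"text": payload_text},  # placeholder params layout
    }
    return json.dumps(envelope)


def unwrap_jsonrpc(raw: str) -> str:
    """Extract the payload text from an envelope built by wrap_jsonrpc."""
    envelope = json.loads(raw)
    if envelope.get("jsonrpc") != "2.0":
        raise ValueError("not a JSON-RPC 2.0 message")
    return envelope["params"]["text"]


payload = "Algorithmic fairness|Example Author|example.org|Body text"
request = wrap_jsonrpc(payload, request_id="req-1")
print(request)
print(unwrap_jsonrpc(request) == payload)
```

The envelope is the same regardless of payload format, so format differences show up in the payload itself.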

Research Progress

The phases.

Phase 1 - Single Document Retrieval (Complete)

HTML vs ACF on AI fairness content. 97.6% token reduction. 5.2× faster. Zero accuracy loss.

Phase 2 - Multi-Document + Cross-Model (Complete)

10 Wikipedia topics across 4 models. ACF wins on every model. Agent Swarm can't rescue HTML.

Phase 3 - Live Dynamic Data (Complete)

Real-time data pipeline with staleness metric. 98.7% reduction. 11.5× fresher answers on local model.

Phase 4 - Agent-to-Agent Communication (Complete)

ACF vs JSON vs TOON in agent-to-agent comms. 8-11% fewer tokens than JSON. Only format achieving perfect accuracy with Haiku.

Phase 5 - Format Evolution via GA (Complete)

Genetic algorithm evolved ACF encoding. +13.99% fitness. Positional header, no key names, single-line wins.

Phase 6 - Multi-Agent Swarm (Complete)

3 parallel fetchers to coordinator with live Wikipedia. Evolved-ACF wins on capable models. Format compression has a capability floor.

Phase 5-GA - Cross-Species Evolution (Complete)

ACF + JSON lineages evolved independently, cross-bred, then combined GA found the global optimum. 636 tokens. Perfect A2A accuracy. -31.8% vs JSON.

Phase 7 - A2A Protocol Integration (Complete)

Champion on real a2a-sdk v1.0.3. JSON-RPC 2.0, HTTP transport, Agent Cards. 1.00 accuracy, zero data loss. -31.8% tokens vs JSON.
