AgentClearfeed is a parallel content layer built for inference.
It serves clean, structured, verified content in .acf format:
not a scraper on top of the human web, but a native format designed for
how AI agents actually consume information.
Scraping is fragile and breaks constantly. It depends on HTML structure that changes without warning. 93-98% of every scraped page is noise (ads, navigation, cookie banners, tracking scripts) that the agent must process before it reaches actual content.
Powerful but hand-built. Every publisher needs to implement their own. Doesn't scale to the open web. Great for controlled internal tools — not viable for general-purpose agent content retrieval across the web.
Designed for developers building human-facing products. Pagination chrome, JSON wrappers, authentication flows, rate limits built around human usage patterns. Not designed for agent consumption at inference time.
Phase 1 — single document retrieval. HTML averaged 16,388 tokens. ACF averaged 394.
Phase 2 — 10-document multi-topic retrieval. 84,022 tokens of HTML vs 5,429 of ACF. Consistent across all 4 models.
Phase 3 — live dynamic data. 13,287 tokens of bloated crypto tracker vs 173 tokens of ACF action format.
Multi-doc accuracy averaged across 4 models: Qwen 14B, Claude Haiku, Kimi K2.5, Kimi K2.6. Same content, different format.
Phase 3 staleness metric. With HTML, the local model answers from data that is 113.5 seconds old; with ACF, the data is under 10 seconds old. Token bloat costs time.
ACF beats JSON in agent-to-agent communication across all models and datasets. Phase 4 complete.
Structured labels, not prose. An agent always knows exactly what it's reading — no inference, no guessing, no tokens wasted on layout chrome or promotional copy.
Structured fields leave no room for SEO manipulation or AEO injection. No narrative spin. No promotional language. The format physically cannot be gamed the way HTML can.
Every model tested — 14B local, frontier API, Chinese cloud providers — performs better with ACF. The format wins at every layer of the stack regardless of model capability.
| Metric | HTML | ACF | Improvement |
|---|---|---|---|
| Tokens per query (Phase 1) | 16,388 | 394 | 97.6% reduction |
| Tokens per 10-doc retrieval (Phase 2) | 84,022 | 5,429 | 93.5% reduction |
| Tokens per live data page (Phase 3) | 13,287 | 173 | 98.7% reduction |
| Inference latency (Phase 1) | 119s | 23s | 5.2× faster |
| Multi-doc accuracy, avg of 4 models (Phase 2) | 0.42 | 0.93 | +0.51 absolute |
| Data staleness, local model (Phase 3) | 113.5s | 9.8s | 11.5× fresher |
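
The Improvement column follows from the HTML and ACF figures by simple arithmetic. As a minimal sketch, the token and latency rows can be re-derived as below; the helper names `reduction_pct` and `speedup` are hypothetical and not part of any ACF tooling.

```python
def reduction_pct(baseline: float, candidate: float) -> float:
    """Percentage by which `candidate` is smaller than `baseline`."""
    return (1 - candidate / baseline) * 100


def speedup(baseline: float, candidate: float) -> float:
    """How many times smaller `candidate` is than `baseline`."""
    return baseline / candidate


# (label, HTML tokens, ACF tokens), copied from the table above
TOKEN_ROWS = [
    ("Phase 1", 16_388, 394),
    ("Phase 2", 84_022, 5_429),
    ("Phase 3", 13_287, 173),
]

for label, html_tokens, acf_tokens in TOKEN_ROWS:
    print(f"{label}: {reduction_pct(html_tokens, acf_tokens):.1f}% fewer tokens")

# Phase 1 inference latency in seconds: HTML 119, ACF 23
print(f"Latency: {speedup(119, 23):.1f}x faster")
```

Run as-is, this reproduces the 97.6%, 93.5% and 98.7% token reductions and the 5.2× latency figure.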
Qwen 2.5 14B - AI Fairness dataset (deterministic, seed=42)
| Format | Total Tokens | Accuracy | Data Loss | Cost/query |
|---|---|---|---|---|
| ACF | 828 | 0.89 | 0.11 | $0.000249 |
| TOON | 860 | 0.89 | 0.11 | $0.000258 |
| JSON | 932 | 0.89 | 0.11 | $0.000280 |
Claude Haiku 4.5 - three-way finale
| Format | Total Tokens | Accuracy | Data Loss | Cost/query |
|---|---|---|---|---|
| ACF | 828 | 1.00 | 0.00 | $0.000663 |
| TOON | 860 | 0.89 | 0.11 | $0.000688 |
| JSON | 932 | 0.89 | 0.11 | $0.000746 |
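
The two tables can be cross-checked with a little arithmetic. The sketch below derives each format's token saving relative to JSON, plus the per-token rate implied by the Cost/query column. The function names are hypothetical, and the implied rate is inferred from the published figures rather than stated in the benchmark, so treat it as an approximation.

```python
# Total tokens per format (identical in both tables above)
TOKENS = {"ACF": 828, "TOON": 860, "JSON": 932}

# Cost/query in USD, copied from the two tables above
COSTS = {
    "Qwen 2.5 14B": {"ACF": 0.000249, "TOON": 0.000258, "JSON": 0.000280},
    "Claude Haiku 4.5": {"ACF": 0.000663, "TOON": 0.000688, "JSON": 0.000746},
}


def saving_vs_json(fmt: str) -> float:
    """Percentage of tokens saved by `fmt` relative to JSON."""
    return (1 - TOKENS[fmt] / TOKENS["JSON"]) * 100


def implied_usd_per_million(model: str, fmt: str) -> float:
    """Blended USD per million tokens implied by cost and token count."""
    return COSTS[model][fmt] / TOKENS[fmt] * 1_000_000


for fmt in ("ACF", "TOON"):
    print(f"{fmt}: {saving_vs_json(fmt):.1f}% fewer tokens than JSON")

for model in COSTS:
    rates = [implied_usd_per_million(model, fmt) for fmt in TOKENS]
    print(f"{model}: about ${min(rates):.2f} to ${max(rates):.2f} per million tokens")
```

On this dataset ACF comes out about 11.2% smaller than JSON and TOON about 7.7% smaller. The implied rate is roughly $0.30 per million tokens for the Qwen run and $0.80 for the Haiku run, so within each table the cost differences track the token counts almost exactly.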
ACF is the only format achieving perfect accuracy with Claude Haiku 4.5. 8-11% fewer tokens than JSON at every layer of the stack.
HTML vs ACF on AI fairness content. 97.6% token reduction. 5.2× faster. Zero accuracy loss.
10 Wikipedia topics across 4 models. ACF wins on every model. Agent Swarm can't rescue HTML.
Real-time data pipeline with staleness metric. 98.7% reduction. 11.5× fresher answers on local model.
ACF vs JSON vs TOON in agent-to-agent comms. 8-11% fewer tokens than JSON. Only format achieving perfect accuracy with Haiku.
Discovery layer and tooling for publishers to serve ACF natively at scale.