Benchmark Deep Dive: Telegraph English vs LLMLingua-2
How our symbolic dialect keeps the facts and your budget intact.
TL;DR: Telegraph English outperforms LLMLingua-2 across all models and datasets, giving up only 1-3 pp on headline facts and 3-11 pp on fine details versus uncompressed text while staying ahead of LLML2 at every compression level. TE's symbolic structure (→, caps, atomic facts) keeps logical chains and whole facts intact, helping weaker models avoid hallucinations; LLML2's token deletion can break those chains, forcing smaller models to "fill in the blanks." The gap widens to as much as 11 pp on smaller models and detail-heavy tasks.


1 Why benchmarks matter

Every token that survives compression still has to be processed, stored, and billed. If a compressor can’t protect the information that drives answers, the “savings” get repaid later as hallucinations, retries, or human review. So we stress-test Telegraph English (TE) against Microsoft’s LLMLingua-2 (LLML2) on two LongBench splits:

| Split      | # QA pairs | What it tests                                         |
|------------|-----------:|-------------------------------------------------------|
| key_facts  |      4 081 | key concepts & facts                                  |
| fine_facts |        801 | buried qualifiers, numbers, edge cases, relationships |


2 How the two systems compress

| Aspect             | Telegraph English                                                                          | LLMLingua-2                                                                                |
|--------------------|--------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------|
| Compression logic  | Rewrites text into atomic fact lines; ratio adapts to density (≈ 50 % on benchmark texts)   | Classifier deletes low-entropy tokens; fixed ratio per doc (matching 50 % or default 33 %)    |
| Semantic integrity | Keeps whole meaning units → no dangling refs                                                | Can sever logical links or chop granular details                                              |
| Human readability  | Learn 40 symbols → fully auditable                                                          | Microsoft: “may be difficult for humans to understand”                                        |
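
Neither compressor is reproduced here, but the contrast is easy to sketch. The toy Python below is illustrative only: the TE lines are hand-written in the style the compiler emits, and the length-based scorer merely stands in for LLML2's trained token classifier.

```python
# Illustrative toy only: neither TE's compiler nor LLMLingua-2's classifier is
# reproduced here. The two functions mimic the *shape* of each approach so the
# structural difference is easy to see.

SOURCE = ("Because the coolant valve failed at 14:02, the reactor was "
          "shut down and output dropped by 40 percent.")

def telegraph_english(text: str) -> str:
    # TE-style: rewrite into atomic fact lines with explicit causal arrows.
    # (Hand-written here; the real compiler derives these lines automatically.)
    return "\n".join([
        "COOLANT VALVE FAIL 14:02",
        "→ REACTOR SHUTDOWN",
        "→ OUTPUT -40%",
    ])

def token_deletion(text: str, keep_ratio: float = 0.5) -> str:
    # LLML2-style: score every token, drop the lowest-scoring ones until the
    # fixed keep-ratio is met. A trained classifier replaces this toy scorer.
    def toy_score(tok: str) -> int:
        return len(tok) + (5 if any(c.isdigit() for c in tok) else 0)

    tokens = text.split()
    keep_n = max(1, int(len(tokens) * keep_ratio))
    ranked = sorted(range(len(tokens)), key=lambda i: toy_score(tokens[i]), reverse=True)
    kept = sorted(ranked[:keep_n])          # restore original word order
    return " ".join(tokens[i] for i in kept)

print(telegraph_english(SOURCE))   # whole facts survive, links stay explicit
print(token_deletion(SOURCE))      # short glue words vanish, links can break
```

Even in this toy run the deletion path keeps the long content words and the numbers, but the "was shut down" step in the middle of the causal chain disappears, which is exactly the kind of gap a downstream model is then tempted to fill in.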


3 Results at a glance

Methodology note: To keep the comparison clean, we fed each model the compressed text as-is. The GPT-4 family has never seen TE’s arrow, cap-tag, or line-atom conventions and received no system prompt explaining them. Despite that, TE retained up to 99 % of key facts, proof that TE’s semantic packing is legible even to an “uninitiated” LLM. A TE-aware model is pure upside.
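
For readers who want to replicate the setup, here is a minimal sketch of the scoring loop, assuming LongBench-style records with context, question, and answer fields and the OpenAI Python client. The field names, the crude string-match judge, and the `compress` hook are simplifications, not the exact harness behind the numbers below.

```python
# Minimal sketch of the evaluation loop, assuming LongBench-style records with
# "context", "question", and "answer" fields and the OpenAI Python client.
# The substring judge and the `compress` hook are simplifications, not the
# exact harness behind the numbers below.
from openai import OpenAI

client = OpenAI()

def fact_retention(records, compress, model="gpt-4.1"):
    """Fraction of questions answered correctly from compressed context."""
    correct = 0
    for rec in records:
        context = compress(rec["context"])   # TE, LLML2-50, LLML2-33, or identity
        reply = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": "Answer strictly from the given context."},
                {"role": "user", "content": f"{context}\n\nQ: {rec['question']}"},
            ],
        ).choices[0].message.content
        correct += rec["answer"].lower() in reply.lower()   # crude string-match judge
    return correct / len(records)
```

Running it with `compress=lambda t: t` reproduces the "Orig." column; swapping in each compressor gives the other three.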

| Model / Split  | Orig. | TE              | LLML2-50 | LLML2-33 |
|----------------|-------|-----------------|----------|----------|
| GPT-4.1 · key  | 1.000 | 0.991 (-0.9 pp) | 0.990    | 0.977    |
| 4o-mini · key  | 0.990 | 0.957 (-3.4 pp) | 0.946    | 0.906    |
| 4.1-nano · key | 0.980 | 0.950 (-3.0 pp) | 0.949    | 0.922    |
| GPT-4o · fine  | 0.996 | 0.965 (-3.1 pp) | 0.933    | 0.890    |
| 4o-mini · fine | 0.938 | 0.843 (-9.5 pp) | 0.820    | 0.728    |


TL;DR
  • TE loses 1-3 pp on headline facts, 3-11 pp on fine details—still ahead of LLML2 at every compression level.
  • The gap widens as models shrink or questions demand nuance; up to 11 pp in favor of TE on fine_facts with edge-class decoders.

4 Why symbolic beats statistical

  1. Density-adaptive cuts – TE automatically eases off when a paragraph is packed with numerics or clauses; LLML2 still chops 50 % regardless (a toy sketch of the idea follows this list).
  2. Explicit structure – arrows (→), caps, and line boundaries keep cause → effect chains intact, giving smaller models the breadcrumbs they need.
  3. No forced hallucination – because full facts stay whole, the downstream LLM isn’t asked to “fill in” missing prepositions or dates.
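
To make point 1 concrete, here is a minimal sketch of a density-adaptive keep-ratio. The clause-marker list and the thresholds are invented for illustration; TE's real heuristic is richer, but the principle is the same: cut plain prose hard, cut dense passages gently.

```python
# Minimal sketch of a density-adaptive keep-ratio, not TE's actual heuristic.
# The keep-ratio rises for paragraphs dense in numbers and clause markers,
# so fact-heavy text is cut less aggressively.
import re

def adaptive_keep_ratio(paragraph: str, base: float = 0.50, ceiling: float = 0.85) -> float:
    tokens = paragraph.split()
    numerics = sum(any(c.isdigit() for c in t) for t in tokens)
    clauses = len(re.findall(r"\b(?:because|if|unless|however|although)\b|[;:]",
                             paragraph.lower()))
    density = (numerics + clauses) / max(len(tokens), 1)
    # Linear ramp: plain prose stays near `base`, dense passages approach `ceiling`.
    return min(ceiling, base + density * (ceiling - base) * 4)

print(adaptive_keep_ratio("The cat sat on the mat and looked around."))
print(adaptive_keep_ratio("Rate rises 25 bps on 2024-06-12 unless CPI < 3.1%; see §4.2."))
```

The plain sentence stays at the ~50 % base, while the rate-decision example, dense with dates and qualifiers, climbs toward the 85 % ceiling.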

5 Cost impact in a multi-agent pipeline

| Scenario                                        | Uncompressed | LLML2-only | Full-pipeline TE |
|-------------------------------------------------|--------------|------------|------------------|
| Prompt + 5 agent hops (2 000 + 5 × 400 tokens)  | 4 000 t      | ~3 300 t   | ~1 600 t         |
| Cost @ $10 / M tokens (per 1 000 runs)          | $40          | $33        | $16              |


The harder your workflow leans on multi-step reasoning and long memories, the faster TE’s savings compound.
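
The dollar row works out when the figures are read per 1 000 runs (our reading, noted in the table); a quick back-of-envelope check:

```python
# Back-of-envelope check of the table above. The $10/M price is from the table;
# the 1 000-run multiplier is our assumption, since a single 4 000-token run at
# $10/M costs only $0.04.
PRICE_PER_TOKEN = 10 / 1_000_000      # $10 per million tokens
RUNS = 1_000                          # assumed batch behind the $40 / $33 / $16 figures

scenarios = {
    "uncompressed":     2_000 + 5 * 400,   # prompt + 5 agent hops = 4 000 t
    "LLML2-only":       3_300,             # prompt compressed once, hops untouched
    "full-pipeline TE": 1_600,             # prompt and every hop carried in TE
}

for name, tokens in scenarios.items():
    cost = tokens * PRICE_PER_TOKEN * RUNS
    print(f"{name:17s} {tokens:5d} t/run   ${cost:,.0f} per {RUNS:,} runs")
```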

6 When to reach for each tool

Choose LLML2 if:
  • You need a quick, single-prompt patch.
  • Fixed compression ratio matters more than perfect recall.
Choose Telegraph English if:
  • Your pipeline spans multiple agents or iterations.
  • Fine-grained facts (legal, biomedical, finance) can’t be sacrificed.
  • Humans must be able to audit what the model saw.
  • You want a future-proof format that models will soon speak natively.

7 Roadmap sneak peek

  • Detail-aware TE – dynamic knob that loosens compression when confidence dips.
  • Bilingual TE-native models – slash overhead and let agents think in pure TE.
  • Fact-level pruning API – keep or drop whole assertions with surgical precision.

Ready to see TE slice your token bill without slicing your facts? [Ping us for a POC] and we’ll run your own docs through both compressors and let the numbers speak.
Compression is one-time—every retrieval and generation that follows is permanently cheaper.
Ready to Compress Your Token Costs?
Join forward‑looking AI teams already boosting their token efficiency.

Early‑access perks:
  • Priority onboarding support
  • Influence on feature roadmap
  • Preferred pricing at launch.