Benchmark Deep Dive: Telegraph English vs LLMLingua-2
How our symbolic dialect keeps the facts and your budget intact.
TL;DR: Telegraph English outperforms LLMLingua-2 across all models and datasets, giving up only 1-3 pp on headline facts and 3-11 pp on fine details versus uncompressed text while staying ahead of LLML2 at every compression level. TE's symbolic structure (→, caps, atomic facts) keeps logical chains and whole facts intact, helping weaker models avoid hallucinations; LLML2's token deletion can break those chains, forcing smaller models to "fill in the blanks." The gap widens to as much as 11 pp on smaller models and detail-heavy tasks.


1 Why benchmarks matter

Every token that survives compression still has to be processed, stored, and billed. If a compressor can’t protect the information that drives answers, the “savings” get repaid later as hallucinations, retries, or human review. So we stress-test Telegraph English (TE) against Microsoft’s LLMLingua-2 (LLML2) on two LongBench splits:

| Split      | # QA pairs | What it tests                                         |
|------------|-----------:|-------------------------------------------------------|
| key_facts  |      4 081 | key concepts & facts                                  |
| fine_facts |        801 | buried qualifiers, numbers, edge cases, relationships |


2 How the two systems compress

| Aspect             | Telegraph English                                                                          | LLMLingua-2                                                                                |
|--------------------|--------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------|
| Compression logic  | Rewrites text into atomic fact lines; ratio adapts to density (≈ 50 % on benchmark texts)   | Classifier deletes low-entropy tokens; fixed ratio per doc (matching 50 % or default 33 %)    |
| Semantic integrity | Keeps whole meaning units → no dangling refs                                                | Can sever logical links or chop granular details                                              |
| Human readability  | Learn 40 symbols → fully auditable                                                          | Microsoft: “may be difficult for humans to understand”                                        |
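
Neither compressor is reproduced here, but the contrast is easy to sketch. The toy Python below is illustrative only: the TE lines are hand-written in the style the compiler emits, and the length-based scorer merely stands in for LLML2's trained token classifier.

```python
# Illustrative toy only: neither TE's compiler nor LLMLingua-2's classifier is
# reproduced here. The two functions mimic the *shape* of each approach so the
# structural difference is easy to see.

SOURCE = ("Because the coolant valve failed at 14:02, the reactor was "
          "shut down and output dropped by 40 percent.")

def telegraph_english(text: str) -> str:
    # TE-style: rewrite into atomic fact lines with explicit causal arrows.
    # (Hand-written here; the real compiler derives these lines automatically.)
    return "\n".join([
        "COOLANT VALVE FAIL 14:02",
        "→ REACTOR SHUTDOWN",
        "→ OUTPUT -40%",
    ])

def token_deletion(text: str, keep_ratio: float = 0.5) -> str:
    # LLML2-style: score every token, drop the lowest-scoring ones until the
    # fixed keep-ratio is met. A trained classifier replaces this toy scorer.
    def toy_score(tok: str) -> int:
        return len(tok) + (5 if any(c.isdigit() for c in tok) else 0)

    tokens = text.split()
    keep_n = max(1, int(len(tokens) * keep_ratio))
    ranked = sorted(range(len(tokens)), key=lambda i: toy_score(tokens[i]), reverse=True)
    kept = sorted(ranked[:keep_n])          # restore original word order
    return " ".join(tokens[i] for i in kept)

print(telegraph_english(SOURCE))   # whole facts survive, links stay explicit
print(token_deletion(SOURCE))      # short glue words vanish, links can break
```

Even in this toy run the deletion path keeps the long content words and the numbers, but the "was shut down" step in the middle of the causal chain disappears, which is exactly the kind of gap a downstream model is then tempted to fill in.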


3 Results at a glance

Methodology note: To keep the comparison clean, we fed each model the compressed text as-is. The GPT-4 family has never seen TE’s arrow, cap-tag, or line-atom conventions and received no system prompt explaining them. Despite that, TE retained up to 99 % of key facts, proof that TE’s semantic packing is legible even to an “uninitiated” LLM. A TE-aware model is pure upside.
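
For readers who want to replicate the setup, here is a minimal sketch of the scoring loop, assuming LongBench-style records with context, question, and answer fields and the OpenAI Python client. The field names, the crude string-match judge, and the `compress` hook are simplifications, not the exact harness behind the numbers below.

```python
# Minimal sketch of the evaluation loop, assuming LongBench-style records with
# "context", "question", and "answer" fields and the OpenAI Python client.
# The substring judge and the `compress` hook are simplifications, not the
# exact harness behind the numbers below.
from openai import OpenAI

client = OpenAI()

def fact_retention(records, compress, model="gpt-4.1"):
    """Fraction of questions answered correctly from compressed context."""
    correct = 0
    for rec in records:
        context = compress(rec["context"])   # TE, LLML2-50, LLML2-33, or identity
        reply = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": "Answer strictly from the given context."},
                {"role": "user", "content": f"{context}\n\nQ: {rec['question']}"},
            ],
        ).choices[0].message.content
        correct += rec["answer"].lower() in reply.lower()   # crude string-match judge
    return correct / len(records)
```

Running it with `compress=lambda t: t` reproduces the "Orig." column; swapping in each compressor gives the other three.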

| Model / Split  | Orig. | TE              | LLML2-50 | LLML2-33 |
|----------------|-------|-----------------|----------|----------|
| GPT-4.1 · key  | 1.000 | 0.991 (-0.9 pp) | 0.990    | 0.977    |
| 4o-mini · key  | 0.990 | 0.957 (-3.4 pp) | 0.946    | 0.906    |
| 4.1-nano · key | 0.980 | 0.950 (-3.0 pp) | 0.949    | 0.922    |
| GPT-4o · fine  | 0.996 | 0.965 (-3.1 pp) | 0.933    | 0.890    |
| 4o-mini · fine | 0.938 | 0.843 (-9.5 pp) | 0.820    | 0.728    |


TL;DR
  • TE loses 1-3 pp on headline facts, 3-11 pp on fine details—still ahead of LLML2 at every compression level.
  • The gap widens as models shrink or questions demand nuance; up to 11 pp in favor of TE on fine_facts with edge-class decoders.

4 Why symbolic beats statistical

  1. Density-adaptive cuts – TE automatically eases off when a paragraph is packed with numerics or clauses; LLML2 still chops 50 % regardless (a toy sketch of the idea follows this list).
  2. Explicit structure – arrows (→), caps, and line boundaries keep cause → effect chains intact, giving smaller models the breadcrumbs they need.
  3. No forced hallucination – because full facts stay whole, the downstream LLM isn’t asked to “fill in” missing prepositions or dates.
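
To make point 1 concrete, here is a minimal sketch of a density-adaptive keep-ratio. The clause-marker list and the thresholds are invented for illustration; TE's real heuristic is richer, but the principle is the same: cut plain prose hard, cut dense passages gently.

```python
# Minimal sketch of a density-adaptive keep-ratio, not TE's actual heuristic.
# The keep-ratio rises for paragraphs dense in numbers and clause markers,
# so fact-heavy text is cut less aggressively.
import re

def adaptive_keep_ratio(paragraph: str, base: float = 0.50, ceiling: float = 0.85) -> float:
    tokens = paragraph.split()
    numerics = sum(any(c.isdigit() for c in t) for t in tokens)
    clauses = len(re.findall(r"\b(?:because|if|unless|however|although)\b|[;:]",
                             paragraph.lower()))
    density = (numerics + clauses) / max(len(tokens), 1)
    # Linear ramp: plain prose stays near `base`, dense passages approach `ceiling`.
    return min(ceiling, base + density * (ceiling - base) * 4)

print(adaptive_keep_ratio("The cat sat on the mat and looked around."))
print(adaptive_keep_ratio("Rate rises 25 bps on 2024-06-12 unless CPI < 3.1%; see §4.2."))
```

The plain sentence stays at the ~50 % base, while the rate-decision example, dense with dates and qualifiers, climbs toward the 85 % ceiling.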

5 Cost impact in a multi-agent pipeline

| Scenario                                        | Uncompressed | LLML2-only | Full-pipeline TE |
|-------------------------------------------------|--------------|------------|------------------|
| Prompt + 5 agent hops (2 000 + 5 × 400 tokens)  | 4 000 t      | ~3 300 t   | ~1 600 t         |
| Cost @ $10 / M tokens (per 1 000 runs)          | $40          | $33        | $16              |


The harder your workflow leans on multi-step reasoning and long memories, the faster TE’s savings compound.
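
The dollar row works out when the figures are read per 1 000 runs (our reading, noted in the table); a quick back-of-envelope check:

```python
# Back-of-envelope check of the table above. The $10/M price is from the table;
# the 1 000-run multiplier is our assumption, since a single 4 000-token run at
# $10/M costs only $0.04.
PRICE_PER_TOKEN = 10 / 1_000_000      # $10 per million tokens
RUNS = 1_000                          # assumed batch behind the $40 / $33 / $16 figures

scenarios = {
    "uncompressed":     2_000 + 5 * 400,   # prompt + 5 agent hops = 4 000 t
    "LLML2-only":       3_300,             # prompt compressed once, hops untouched
    "full-pipeline TE": 1_600,             # prompt and every hop carried in TE
}

for name, tokens in scenarios.items():
    cost = tokens * PRICE_PER_TOKEN * RUNS
    print(f"{name:17s} {tokens:5d} t/run   ${cost:,.0f} per {RUNS:,} runs")
```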

6 When to reach for each tool

Choose LLML2 if:
  • You need a quick, single-prompt patch.
  • Fixed compression ratio matters more than perfect recall.
Choose Telegraph English if:
  • Your pipeline spans multiple agents or iterations.
  • Fine-grained facts (legal, biomedical, finance) can’t be sacrificed.
  • Humans must be able to audit what the model saw.
  • You want a future-proof format that models will soon speak natively.

7 Roadmap sneak peek

  • Detail-aware TE – dynamic knob that loosens compression when confidence dips.
  • Bilingual TE-native models – slash overhead and let agents think in pure TE.
  • Fact-level pruning API – keep or drop whole assertions with surgical precision.

Ready to see TE slice your token bill without slicing your facts? [Ping us for a POC] and we’ll run your own docs through both compressors and let the numbers speak.
Compression is one-time—every retrieval and generation that follows is permanently cheaper.
Ready to Compress Your Token Costs?
Join forward‑looking AI teams already boosting their token efficiency.

Early‑access perks:
  • Priority onboarding support
  • Influence on feature roadmap
  • Preferred pricing at launch.