# Benchmark Deep Dive: Telegraph English vs LLMLingua-2

*How our symbolic dialect keeps the facts and your budget intact.*

**TL;DR:** Telegraph English outperforms LLMLingua-2 across all models and datasets. It loses only 1-3 pp on headline facts and 3-11 pp on fine details, and stays ahead of LLML2 at every compression level. TE's symbolic structure (→, caps, atomic facts) keeps logical chains and whole facts intact, which helps weaker models avoid hallucinations; LLML2's token deletion can break those chains, forcing smaller models to "fill in the blanks." The gap widens to as much as 11 pp on smaller models and detail-heavy tasks.
The benchmark evaluates two QA splits:

| Split | # QA pairs | What it tests |
|---|---|---|
| key_facts | 4,081 | Key concepts and facts |
| fine_facts | 801 | Buried qualifiers, numbers, edge cases, relationships |
How the two approaches differ:

| Aspect | Telegraph English | LLMLingua-2 |
|---|---|---|
| Compression logic | Rewrites text into atomic fact lines; ratio adapts to density (≈50% on the benchmark texts) | Classifier deletes low-entropy tokens; fixed ratio per document (ratio-matched 50% or the default 33%) |
| Semantic integrity | Keeps whole meaning units → no dangling references | Can sever logical links or chop granular details |
| Human readability | Learn 40 symbols → fully auditable | Microsoft: "may be difficult for humans to understand" |
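To make the table concrete, here is a minimal sketch of both approaches on one sentence. The LLMLingua-2 half uses Microsoft's open-source `llmlingua` package; the class and argument names (`PromptCompressor`, `use_llmlingua2`, `rate`) follow its public README, so treat them as assumptions if your version differs. The Telegraph English lines are a hand-written illustration of the style described above (atomic facts, caps, arrows), not output from the actual TE compiler.

```python
# Sketch: fixed-rate token deletion (LLMLingua-2) vs. an atomic-fact rewrite
# in the spirit of Telegraph English. Assumes the `llmlingua` package is
# installed; model and argument names may differ across versions.
from llmlingua import PromptCompressor

source = (
    "The Q3 report shows revenue rose 12% to $4.2M, although the EU segment, "
    "which had been flagged as at-risk in Q2, declined by 3%."
)

# LLMLingua-2: a token classifier keeps the highest-information tokens at a
# fixed rate (here ~50%), deleting the rest in place.
compressor = PromptCompressor(
    model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",
    use_llmlingua2=True,
)
result = compressor.compress_prompt(source, rate=0.5)
print(result["compressed_prompt"])

# Telegraph English (hypothetical illustration, not real TE compiler output):
# one atomic fact per line, caps for entities, arrows for links.
te_version = "\n".join([
    "Q3 REVENUE +12% → $4.2M",
    "EU SEGMENT -3%",
    "EU SEGMENT FLAGGED AT-RISK (Q2)",
])
print(te_version)
```

The contrast mirrors the "semantic integrity" row: the TE lines can only gain or lose whole facts, whereas token deletion can remove the connective tissue that ties a qualifier to its clause.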
Scores by model and split; the value in parentheses is TE's change versus the uncompressed original, in percentage points (pp):

| Model · Split | Original | TE | LLML2-50 | LLML2-33 |
|---|---|---|---|---|
| GPT-4.1 · key | 1.000 | 0.991 (-0.9 pp) | 0.990 | 0.977 |
| 4o-mini · key | 0.990 | 0.957 (-3.4 pp) | 0.946 | 0.906 |
| 4.1-nano · key | 0.980 | 0.950 (-3.0 pp) | 0.949 | 0.922 |
| GPT-4o · fine | 0.996 | 0.965 (-3.1 pp) | 0.933 | 0.890 |
| 4o-mini · fine | 0.938 | 0.843 (-9.5 pp) | 0.820 | 0.728 |
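The deltas are straightforward percentage-point differences. The sketch below recomputes them from the rounded scores in the table (the published deltas were presumably computed on unrounded scores, so they can differ by about 0.1 pp) and also prints TE's lead over the ratio-matched LLML2-50 column:

```python
# Recompute per-row deltas from the (rounded) scores in the table above.
rows = {
    # (model, split): (original, TE, LLML2-50)
    ("GPT-4.1",  "key"):  (1.000, 0.991, 0.990),
    ("4o-mini",  "key"):  (0.990, 0.957, 0.946),
    ("4.1-nano", "key"):  (0.980, 0.950, 0.949),
    ("GPT-4o",   "fine"): (0.996, 0.965, 0.933),
    ("4o-mini",  "fine"): (0.938, 0.843, 0.820),
}

for (model, split), (orig, te, llml2_50) in rows.items():
    drop_vs_orig = (te - orig) * 100       # what TE gives up vs. uncompressed
    lead_vs_llml2 = (te - llml2_50) * 100  # TE's edge over ratio-matched LLML2
    print(f"{model:9s} {split:4s}  TE vs orig: {drop_vs_orig:+.1f} pp   "
          f"TE vs LLML2-50: {lead_vs_llml2:+.1f} pp")
```

Even on the hardest row (4o-mini · fine), TE keeps a roughly 2 pp lead over LLML2 at the same compression ratio.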
Token usage and cost for a five-hop agent scenario:

| Scenario | Uncompressed | LLML2-only | Full-pipeline TE |
|---|---|---|---|
| Prompt + 5 agent hops (2,000 + 5 × 400 tokens) | 4,000 t | ~3,300 t | ~1,600 t |
| Cost @ $10 / M tokens | $40 | $33 | $16 |
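A quick sanity check on the arithmetic: a single 4,000-token run at $10 per million tokens costs $0.04, so the dollar figures in the table presumably aggregate many runs of the scenario (about 1,000 of them); that scaling factor is our reading, not something stated explicitly. A minimal sketch, assuming that interpretation:

```python
# Cost arithmetic for the agent scenario: a 2,000-token prompt plus five
# ~400-token agent hops, priced at $10 per million tokens. Token counts for
# the compressed variants are taken from the table above.
PRICE_PER_TOKEN = 10 / 1_000_000  # $10 / M tokens

tokens = {
    "uncompressed":     2_000 + 5 * 400,  # 4,000 t
    "LLML2-only":       3_300,            # ~3,300 t kept
    "full-pipeline TE": 1_600,            # ~1,600 t kept
}

RUNS = 1_000  # assumption: the table's $ figures correspond to ~1,000 runs

for name, t in tokens.items():
    per_run = t * PRICE_PER_TOKEN
    print(f"{name}: {t:,} t -> ${per_run:.3f}/run, ${per_run * RUNS:.0f} per {RUNS:,} runs")
```

Either way, the relative saving is what matters: the full TE pipeline sends roughly 40% of the uncompressed tokens, so the bill scales down by the same factor.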