# Compress prompts by 53%, cut latency by 62%

Dual-layer adaptive compression that optimizes both input and output tokens: dictionary aliases, semantic pruning, and an output guard.
## 5-Stage Compression Pipeline

Each stage targets a different source of redundancy.
### 1. Content Classifier

Auto-detects the content type and routes it to a specialized strategy: `json_api`, `code`, `prose`, `chat`, or `mixed`.
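The routing stage might look like the following heuristic sketch. The function name `classify`, the regexes, and the thresholds are illustrative assumptions, not the actual implementation; the `mixed` label is omitted for brevity.

```python
import json
import re

def classify(text: str) -> str:
    """Toy content-type router (hypothetical heuristics, not the real model)."""
    stripped = text.strip()
    # JSON API payloads: anything that parses cleanly as JSON.
    try:
        json.loads(stripped)
        return "json_api"
    except (ValueError, TypeError):
        pass
    lines = stripped.splitlines()
    # Code: many lines start with a keyword or end in ; { }.
    code_hits = sum(
        bool(re.match(r"\s*(def |class |import |#include|function )", line))
        or line.rstrip().endswith((";", "{", "}"))
        for line in lines
    )
    if lines and code_hits / len(lines) > 0.3:
        return "code"
    # Chat: role-prefixed turns like "user:" / "assistant:".
    chat_hits = sum(
        bool(re.match(r"(user|assistant|system)\s*:", line, re.I)) for line in lines
    )
    if lines and chat_hits / len(lines) > 0.5:
        return "chat"
    return "prose"
```

Each content type can then be dispatched to its own compression strategy (e.g., structural dictionary compression for `json_api`, token pruning for `prose`).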
### 2. Dictionary Compression

Extracts repeating substrings and assigns them short `§XX` / `@XX` aliases. Bidirectional: applied to both input and output tokens.
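A minimal sketch of the alias scheme, under toy assumptions: only word tokens of six or more characters are mined (real substring mining would score arbitrary spans by frequency × length savings), and the function names `compress` / `decompress` are hypothetical.

```python
import re
from collections import Counter

def compress(text: str, max_entries: int = 10):
    """Replace frequently repeated long words with §XX aliases (toy version)."""
    words = re.findall(r"\w{6,}", text)
    counts = Counter(words)
    # Only alias substrings that actually repeat, so replacement saves characters.
    candidates = [w for w, c in counts.most_common(max_entries) if c >= 2]
    table = {w: f"§{i:02d}" for i, w in enumerate(candidates)}
    out = text
    for word, alias in table.items():
        # NOTE: a real implementation must ensure aliases cannot collide
        # with text already present in the source.
        out = out.replace(word, alias)
    return out, table

def decompress(text: str, table: dict) -> str:
    """Restore aliases back to their original substrings."""
    for word, alias in table.items():
        text = text.replace(alias, word)
    return text
```

The same table can be shipped to the model in the system prompt so that the model emits aliases in its output, which are then restored on the way back.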
### 3. Agent-Aware Distillation

A token-level keep/drop classifier trained on 105K agent samples, running 4–12× faster than LLMLingua-2.
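The keep/drop mechanism can be sketched as follows. The real system scores tokens with a classifier distilled from agent data; here a hypothetical hand-written scorer stands in (stopwords score low and get dropped first), purely to show the pruning interface.

```python
# Stand-in importance scorer; the real system would use model logits.
STOPWORDS = {"the", "a", "an", "of", "to", "is", "are", "that", "and", "in"}

def score(token: str) -> float:
    """Higher score = more worth keeping (toy heuristic)."""
    if token.lower() in STOPWORDS:
        return 0.1
    return min(1.0, 0.3 + 0.1 * len(token))

def prune(text: str, keep_ratio: float = 0.6) -> str:
    """Keep the top keep_ratio fraction of tokens, preserving original order."""
    tokens = text.split()
    ranked = sorted(range(len(tokens)), key=lambda i: score(tokens[i]), reverse=True)
    keep = set(ranked[: max(1, int(len(tokens) * keep_ratio))])
    return " ".join(t for i, t in enumerate(tokens) if i in keep)
```

Because kept tokens are re-emitted in source order, the pruned prompt stays readable to the downstream model even at aggressive ratios.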
### 4. Output Shaping

Concise-style injection, dynamic `max_tokens`, and alias restoration with real-time streaming decompression.
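Streaming restoration has one subtlety: an alias can be split across chunk boundaries. A sketch that buffers a possibly incomplete alias at the end of each chunk (the `§` + two digits alias shape is an assumption carried over from the dictionary stage):

```python
import re

ALIAS = re.compile(r"§\d{2}")

def stream_restore(chunks, table):
    """Expand §XX aliases on the fly as output chunks arrive."""
    buf = ""
    for chunk in chunks:
        buf += chunk
        # Hold back a trailing fragment ("§" or "§1") that could still
        # grow into a full alias with the next chunk.
        hold = 0
        frag = re.search(r"§\d?$", buf)
        if frag:
            hold = len(buf) - frag.start()
        emit, buf = buf[: len(buf) - hold], buf[len(buf) - hold:]
        yield ALIAS.sub(lambda m: table.get(m.group(), m.group()), emit)
    if buf:
        # Flush any leftover fragment unexpanded.
        yield buf
```

Since a complete alias is fixed-width, anything other than a one- or two-character tail can be expanded and emitted immediately, keeping latency overhead per chunk near zero.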
### 5. Adaptive Control

Closed-loop rate adjustment based on content density, with a gentler rate for dense content.
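The rate-control idea can be sketched with a simple density proxy. Here "density" is the share of unique tokens, and the thresholds and step sizes are illustrative assumptions; `adjust_rate` returns the fraction of tokens to keep.

```python
def adjust_rate(text: str, base_rate: float = 0.5) -> float:
    """Density-aware rate adjustment (toy version).

    Returns the fraction of tokens to keep: dense content (little
    repetition to exploit) gets a gentler rate, repetitive content
    gets compressed harder.
    """
    tokens = text.split()
    if not tokens:
        return base_rate
    density = len(set(tokens)) / len(tokens)
    if density > 0.8:   # dense: almost no redundancy
        return min(1.0, base_rate + 0.3)
    if density < 0.4:   # highly repetitive: safe to compress harder
        return max(0.1, base_rate - 0.2)
    return base_rate
```

A closed-loop version would additionally compare the achieved compression against the target after each request and nudge `base_rate` accordingly.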
## What Makes It Different

Novel contributions beyond existing compression methods.
### Dual-Layer Compression

Combines structural dictionary compression with semantic token pruning for deeper reduction than either method achieves alone.
### Output Token Optimization

The first system to compress output tokens via dictionary aliases. An output guard ensures that alias expansions never exceed the direct output length.
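The guard can be sketched as a simple length comparison: the alias-compressed form only wins if it is shorter than the direct output even after accounting for the dictionary that must travel with it. The function name `guarded_output` and the per-entry overhead estimate are illustrative assumptions.

```python
def guarded_output(direct: str, compressed: str, table: dict) -> str:
    """Fall back to the direct output whenever aliasing would not pay off."""
    # Rough cost of shipping the dictionary: alias + word + separators per entry.
    overhead = sum(len(alias) + len(word) + 2 for word, alias in table.items())
    if len(compressed) + overhead < len(direct):
        return compressed
    return direct
```

For short responses the dictionary overhead dominates and the guard falls back to the direct form; only for long, repetitive outputs does aliasing win.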
### Multilingual Support

Native handling of Chinese, English, and mixed-language text, with CJK-aware tokenization and language-specific rate adjustment.
### Adaptive Rate Control

Content-density detection automatically adjusts the compression rate: dense, structured content gets gentler compression to preserve information.
## Compared to Existing Methods
| Feature | OpenCompact | LLMLingua-2 | Selective Ctx |
|---|---|---|---|
| Input token compression | ✓ | ✓ | ✓ |
| Output token optimization | ✓ | ✗ | ✗ |
| Dictionary alias compression | ✓ | ✗ | ✗ |
| Agent-aware distilled pruning | ✓ | ✗ | ✗ |
| Content-type routing | ✓ | ✗ | ✗ |
| Adaptive rate control | ✓ | ✗ | ✗ |
| Multilingual (CJK + Latin) | ✓ | ✓ | ✗ |
| Streaming decompression | ✓ | ✗ | ✗ |
| Quality evaluation | ✓ | ✗ | ✗ |
## Try it yourself

Paste your prompt, pick a compression rate, and see the A/B comparison in real time.