Introduction
DriftOS Embed
A lightweight semantic conversation-routing engine: embedding-based drift detection with sub-200ms latency and zero LLM API costs for routing decisions.
Overview
DriftOS Embed uses local embeddings to detect topic shifts and route messages:
- STAY - Same topic, continue in current branch
- BRANCH - Topic drift detected, create new branch
- ROUTE - Return to a previous topic
Result: Focused context windows with zero LLM costs for routing.
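The three outcomes can be modeled as a simple enum. This is an illustrative sketch, not the actual DriftOS Embed API; the names mirror the list above:

```python
from enum import Enum

class RoutingDecision(Enum):
    """Possible routing outcomes (names assumed for illustration)."""
    STAY = "stay"      # same topic: continue in the current branch
    BRANCH = "branch"  # topic drift detected: create a new branch
    ROUTE = "route"    # return to a previous topic's branch
```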
Why Embeddings?
| Approach | Latency | Cost | Accuracy |
|---|---|---|---|
| LLM-based routing | 500-2000ms | $0.001-0.01/call | High |
| Embedding-based | <200ms | $0 | Good |
DriftOS Embed uses paraphrase-MiniLM-L6-v2 for semantic similarity. Fast enough for real-time use, accurate enough for production.
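The similarity computation itself is cheap. A minimal pure-Python cosine similarity looks like this; the real engine would run it over the 384-dimensional vectors produced by paraphrase-MiniLM-L6-v2:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors: 1.0 = same
    direction, 0.0 = orthogonal (semantically unrelated)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # → 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))  # → 0.0
```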
How It Works
1. Embed - Message is embedded using paraphrase-MiniLM-L6-v2
2. Compare - Cosine similarity against current branch centroid
3. Decide - Based on thresholds: STAY, BRANCH, or ROUTE
4. Update - Branch centroid updated with running average
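Steps 2-4 can be sketched as below, using the thresholds from the Threshold Logic section. Function names, signatures, and the list-based vectors are illustrative assumptions, not the actual DriftOS Embed implementation:

```python
import math

def _cos(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def route(msg_vec, current_centroid, other_centroids):
    """Return a routing decision for one embedded message (illustrative)."""
    sim = _cos(msg_vec, current_centroid)
    if sim > 0.38:                   # close to the current branch
        return "STAY"
    best = max((_cos(msg_vec, c) for c in other_centroids), default=0.0)
    if best > 0.42:                  # strong match to another branch
        return "ROUTE"
    if sim < 0.15:                   # different domain entirely
        return "BRANCH_NEW_CLUSTER"
    return "BRANCH_SAME_CLUSTER"     # related subtopic

def update_centroid(centroid, msg_vec, n):
    """Step 4: running average after the branch's n-th message."""
    return [(c * n + m) / (n + 1) for c, m in zip(centroid, msg_vec)]
```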
Threshold Logic
The checks run in order, comparing first against the current branch's centroid:

similarity > 0.38 → STAY (same topic)
similarity > 0.42 → ROUTE (if the message better matches another branch's centroid)
similarity < 0.15 → BRANCH_NEW_CLUSTER (different domain)
else → BRANCH_SAME_CLUSTER (related subtopic)

Performance
- Routing latency: <200ms
- Embedding generation: ~30ms
- Zero LLM costs for routing decisions
- LLM used only for fact extraction (optional)
When to Use
Use DriftOS Embed when:
- Latency is critical (<200ms)
- You want zero LLM API costs for routing
- Topic shifts in your conversations are clear and distinct
For higher accuracy with LLM reasoning, see DriftOS Core.
