Introduction
DriftOS Embed
A lightweight semantic conversation-routing engine: embedding-based drift detection with sub-200ms latency and zero LLM API costs for routing decisions.
Overview
DriftOS Embed uses local embeddings to detect topic shifts and route messages:
- STAY - Same topic, continue in current branch
- BRANCH - Topic drift detected, create new branch
- ROUTE - Return to a previous topic
Result: Focused context windows with zero LLM costs for routing.
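The three outcomes can be modeled as a simple enum. This is an illustrative sketch, not the actual DriftOS Embed API; the names mirror the list above:

```python
from enum import Enum

class RoutingDecision(Enum):
    """Possible routing outcomes (names assumed for illustration)."""
    STAY = "stay"      # same topic: continue in the current branch
    BRANCH = "branch"  # topic drift detected: create a new branch
    ROUTE = "route"    # return to a previous topic's branch
```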
Why Embeddings?
| Approach | Latency | Cost | Accuracy |
|---|---|---|---|
| LLM-based routing | 500-2000ms | $0.001-0.01/call | High |
| Embedding-based | <200ms | $0 | Good |
DriftOS Embed uses paraphrase-MiniLM-L6-v2 for semantic similarity. Fast enough for real-time use, accurate enough for production.
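The similarity computation itself is cheap. A minimal pure-Python cosine similarity looks like this; the real engine would run it over the 384-dimensional vectors produced by paraphrase-MiniLM-L6-v2:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors: 1.0 = same
    direction, 0.0 = orthogonal (semantically unrelated)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # → 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))  # → 0.0
```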
How It Works
1. Embed - Message is embedded using paraphrase-MiniLM-L6-v2
2. Compare - Cosine similarity against current branch centroid
3. Decide - Based on thresholds: STAY, BRANCH, or ROUTE
4. Update - Branch centroid updated with running average
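Steps 2-4 can be sketched as below, using the thresholds from the Threshold Logic section. Function names, signatures, and the list-based vectors are illustrative assumptions, not the actual DriftOS Embed implementation:

```python
import math

def _cos(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def route(msg_vec, current_centroid, other_centroids):
    """Return a routing decision for one embedded message (illustrative)."""
    sim = _cos(msg_vec, current_centroid)
    if sim > 0.38:                   # close to the current branch
        return "STAY"
    best = max((_cos(msg_vec, c) for c in other_centroids), default=0.0)
    if best > 0.42:                  # strong match to another branch
        return "ROUTE"
    if sim < 0.15:                   # different domain entirely
        return "BRANCH_NEW_CLUSTER"
    return "BRANCH_SAME_CLUSTER"     # related subtopic

def update_centroid(centroid, msg_vec, n):
    """Step 4: running average after the branch's n-th message."""
    return [(c * n + m) / (n + 1) for c, m in zip(centroid, msg_vec)]
```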
Threshold Logic
The checks run in order, comparing first against the current branch's centroid:

similarity > 0.38 → STAY (same topic)
similarity > 0.42 → ROUTE (if the message better matches another branch's centroid)
similarity < 0.15 → BRANCH_NEW_CLUSTER (different domain)
else → BRANCH_SAME_CLUSTER (related subtopic)

Performance
- Routing latency: <200ms
- Embedding generation: ~30ms
- Zero LLM costs for routing decisions
- LLM used only for fact extraction (optional)
When to Use
Use DriftOS Embed when:
- Latency is critical (<200ms)
- You want zero LLM API costs for routing
- Topic shifts in your conversations are clear and distinct
For higher accuracy with LLM reasoning, see DriftOS Core.
