DriftOS

Introduction

Lightweight embedding-based conversation routing with sub-200ms latency

DriftOS Embed

Lightweight semantic conversation routing engine. Embedding-based drift detection with sub-200ms latency and zero LLM API costs for routing decisions.

Overview

DriftOS Embed uses local embeddings to detect topic shifts and route messages:

  • STAY - Same topic, continue in current branch
  • BRANCH - Topic drift detected, create new branch
  • ROUTE - Return to a previous topic

Result: Focused context windows with zero LLM costs for routing.
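The three routing decisions above can be sketched as a small enum. This is an illustrative shape, not DriftOS Embed's actual API; the names mirror the list above.

```python
from enum import Enum

class RoutingDecision(Enum):
    """Routing outcome produced for each incoming message."""
    STAY = "stay"      # same topic: continue in current branch
    BRANCH = "branch"  # topic drift: create a new branch
    ROUTE = "route"    # return to a previous topic's branch

# A router would return one of these per message.
decision = RoutingDecision.BRANCH
print(decision.name)  # BRANCH
```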

Why Embeddings?

Approach             Latency       Cost               Accuracy
LLM-based routing    500-2000ms    $0.001-0.01/call   High
Embedding-based      <200ms        $0                 Good

DriftOS Embed uses paraphrase-MiniLM-L6-v2 for semantic similarity. Fast enough for real-time, accurate enough for production.
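A minimal sketch of the similarity comparison. In practice the vectors would be the 384-dimensional outputs of paraphrase-MiniLM-L6-v2 (e.g. via `SentenceTransformer("paraphrase-MiniLM-L6-v2").encode(text)` from the sentence-transformers library); tiny toy vectors are used here so the snippet is self-contained.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-ins for message embeddings.
a = np.array([1.0, 0.0, 1.0])
b = np.array([1.0, 0.0, 0.0])
print(round(cosine_similarity(a, b), 3))  # 0.707
```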

How It Works

  1. Embed - Message is embedded using paraphrase-MiniLM-L6-v2
  2. Compare - Cosine similarity against current branch centroid
  3. Decide - Based on thresholds: STAY, BRANCH, or ROUTE
  4. Update - Branch centroid updated with running average
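Step 4 (the running-average centroid update) can be sketched as follows. The function name is illustrative, not part of DriftOS Embed's API; the arithmetic folds each new embedding into the mean of the `count` embeddings seen so far.

```python
import numpy as np

def update_centroid(centroid: np.ndarray, count: int, vec: np.ndarray) -> np.ndarray:
    """Fold a new message embedding into the branch centroid as a
    running average over the `count` embeddings already absorbed."""
    return (centroid * count + vec) / (count + 1)

centroid = np.array([1.0, 0.0])  # centroid of a one-message branch
centroid = update_centroid(centroid, 1, np.array([0.0, 1.0]))
print(centroid)  # [0.5 0.5]
```

This keeps the update O(1) per message: no need to store or re-average every past embedding in the branch.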

Threshold Logic

similarity to current branch > 0.38  → STAY (same topic)
similarity to another branch > 0.42  → ROUTE (return to that branch)
similarity < 0.15                    → BRANCH_NEW_CLUSTER (different domain)
else                                 → BRANCH_SAME_CLUSTER (related subtopic)
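The thresholds above can be read as an ordered decision function. A minimal sketch, assuming the checks are evaluated top to bottom and that `sim_current` / `sim_best_other` are cosine similarities against the current branch centroid and the best-matching other branch centroid (the evaluation order and argument names are interpretations, not confirmed internals):

```python
def decide(sim_current: float, sim_best_other: float) -> str:
    """Map similarity scores to a routing decision using the
    documented thresholds, checked in order."""
    if sim_current > 0.38:
        return "STAY"                 # same topic
    if sim_best_other > 0.42:
        return "ROUTE"                # matches a previous branch
    if sim_current < 0.15:
        return "BRANCH_NEW_CLUSTER"   # different domain
    return "BRANCH_SAME_CLUSTER"      # related subtopic

print(decide(0.50, 0.10))  # STAY
print(decide(0.20, 0.60))  # ROUTE
print(decide(0.05, 0.10))  # BRANCH_NEW_CLUSTER
print(decide(0.25, 0.30))  # BRANCH_SAME_CLUSTER
```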

Performance

  • Routing latency: <200ms
  • Embedding generation: ~30ms
  • Zero LLM costs for routing decisions
  • LLM used only for fact extraction (optional)

When to Use

Use DriftOS Embed when:

  • Latency is critical (<200ms)
  • You want zero LLM API costs for routing
  • Your topic shifts are clear/obvious

For higher accuracy with LLM reasoning, see DriftOS Core.
