Torqon | Persistent Memory for AI Assistants

Long conversations break language models.

As conversations grow, critical context is lost, token budgets fill with noise, and the model forgets what matters.

Without Torqon

Every session starts with zero context

Repeated prompts inflate token usage

No fact persistence across conversations

Context window fills with redundant history

With Torqon

Atomic facts persisted across all sessions

80% fewer context tokens (tested, 40 prompts)

Semantic retrieval: only relevant facts injected

Stale facts automatically superseded

Three things happen before every response.

Every message triggers a precise pipeline: extract, embed, inject. Zero config. Runs before your model sees a single token.

01: EXTRACT

Intent classification & fact isolation

Torqon parses each message to classify intent (store / retrieve / both) and extracts atomic facts: the smallest independently meaningful units of knowledge. "I'm building a Next.js app with pgvector on Railway" yields three distinct facts, not one blob.

async, adds <2ms to your pipeline
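As a rough illustration of the extract step, here is a minimal Python sketch. It is not Torqon's extractor: the `Intent` and `AtomicFact` types, the keyword heuristics, and the tiny `KNOWN_TECH` vocabulary are all hypothetical stand-ins for whatever parsing the service actually performs. It only shows the shape of the output: one intent label plus a list of independently meaningful facts.

```python
from dataclasses import dataclass
from enum import Enum


class Intent(Enum):
    STORE = "store"
    RETRIEVE = "retrieve"
    BOTH = "both"


@dataclass(frozen=True)
class AtomicFact:
    fact_type: str
    content: str


# Toy vocabulary, purely illustrative; a real extractor would not rely on a fixed list.
KNOWN_TECH = ("Next.js", "pgvector", "Railway")


def classify_intent(message: str) -> Intent:
    """Crude heuristic: questions retrieve, first-person statements store."""
    text = message.lower()
    asks = message.rstrip().endswith("?")
    states = any(marker in text for marker in ("i'm ", "i am ", "we use ", "we're "))
    if asks and states:
        return Intent.BOTH
    if asks:
        return Intent.RETRIEVE
    return Intent.STORE


def extract_facts(message: str) -> list[AtomicFact]:
    """Emit one fact per recognised technology instead of one blob per message."""
    text = message.lower()
    return [AtomicFact("tech_stack", tech) for tech in KNOWN_TECH if tech.lower() in text]


message = "I'm building a Next.js app with pgvector on Railway"
facts = extract_facts(message)
intent = classify_intent(message)
```

With the example message from above, `facts` holds three separate `tech_stack` entries and `intent` is `Intent.STORE`.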

02: EMBED

Vector encoding with supersession

Each fact is independently embedded into a 1536-dimensional vector. When a new fact contradicts an old one (e.g. stack change), the previous embedding is marked is_current_fact = false. Stale data never resurfaces in retrieval.

model: text-embedding-3-small
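Supersession can be pictured with a small in-memory sketch. It assumes contradictions are detected by a shared `key` (for example "hosting"), which is a simplification introduced here; how Torqon actually decides that two facts conflict is not described on this page. The `StoredFact` and `FactStore` names are hypothetical, and the embedding itself is left out to keep the example self-contained.

```python
from dataclasses import dataclass


@dataclass
class StoredFact:
    key: str                      # hypothetical conflict key, e.g. "hosting"
    content: str
    is_current_fact: bool = True  # mirrors the flag named in the text


class FactStore:
    def __init__(self) -> None:
        self.rows: list[StoredFact] = []

    def store(self, key: str, content: str) -> StoredFact:
        """Insert a fact; flag any current fact with the same key as superseded."""
        for row in self.rows:
            if row.key == key and row.is_current_fact:
                if row.content == content:
                    return row               # unchanged fact, nothing to supersede
                row.is_current_fact = False  # soft-delete, the row is kept
        new_row = StoredFact(key, content)
        self.rows.append(new_row)
        return new_row

    def current(self) -> list[StoredFact]:
        return [row for row in self.rows if row.is_current_fact]


store = FactStore()
store.store("hosting", "Railway")
store.store("database", "pgvector")
store.store("hosting", "Fly.io")  # stack change: supersedes "Railway"
```

After the third call the store still holds three rows, but only "pgvector" and "Fly.io" are current.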

03: INJECT

Cosine retrieval & context assembly

Before your prompt reaches the model, Torqon runs a cosine similarity query against your fact store, deduplicates results, scores by recency and confidence, and assembles the top-k into a structured block injected into your system prompt.

p95 retrieval: <8ms end-to-end
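The dedupe, score, and assemble sequence might look roughly like the sketch below. The weights (0.6 / 0.2 / 0.2), the 30-day recency decay, the `Candidate` fields, and the `<memory>` wrapper are assumptions made for illustration; the page does not say how Torqon combines similarity, recency, and confidence or what the injected block looks like.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    content: str
    fact_type: str
    similarity: float  # cosine similarity from retrieval, 0..1
    confidence: float  # 0..1
    age_days: float


def score(c: Candidate) -> float:
    """Blend similarity with recency and confidence (weights are illustrative)."""
    recency = 1.0 / (1.0 + c.age_days / 30.0)
    return 0.6 * c.similarity + 0.2 * recency + 0.2 * c.confidence


def assemble(candidates: list[Candidate], k: int = 3) -> str:
    """Deduplicate by normalised content, keep the top-k, render a text block."""
    best: dict[str, Candidate] = {}
    for c in candidates:
        key = c.content.strip().lower()
        if key not in best or score(c) > score(best[key]):
            best[key] = c
    ranked = sorted(best.values(), key=score, reverse=True)[:k]
    lines = [f"- [{c.fact_type}] {c.content}" for c in ranked]
    return "\n".join(["<memory>", *lines, "</memory>"])


block = assemble(
    [
        Candidate("uses pgvector ", "tech_stack", 0.80, 0.5, 60),
        Candidate("Uses pgvector", "tech_stack", 0.90, 0.9, 0),
        Candidate("Always real DB in tests", "preference", 0.75, 1.0, 10),
        Candidate("Deploys on Railway", "tech_stack", 0.74, 0.6, 300),
    ],
    k=2,
)
```

Here the two pgvector entries collapse into one, and the old Railway fact is ranked out of the top two.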

claude_desktop_config.json (3 lines to connect)

Add Torqon to your Claude Desktop MCP server list:
{
  "mcpServers": {
    "torqon": {
      "command": "npx",
      "args": ["-y", "@torqon/mcp@latest"],
      "env": { "TORQON_API_KEY": "tq_live_••••••••••••••••" }
    }
  }
}

From that point, Claude stores and retrieves facts automatically. No prompting, no wrappers, no SDK to install.
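If you already have other MCP servers configured, the Torqon entry needs to be merged into the existing `mcpServers` object rather than pasted over it. The helper below is a minimal sketch of that merge using only the standard library; `add_torqon_server` is a hypothetical name, not part of any Torqon tooling, and the key shown is a placeholder.

```python
import json


def add_torqon_server(config_text: str, api_key: str) -> str:
    """Return config JSON with a "torqon" entry added, keeping other servers intact."""
    config = json.loads(config_text) if config_text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers["torqon"] = {
        "command": "npx",
        "args": ["-y", "@torqon/mcp@latest"],
        "env": {"TORQON_API_KEY": api_key},
    }
    return json.dumps(config, indent=2)


existing = '{"mcpServers": {"other": {"command": "node"}}}'
updated = add_torqon_server(existing, "tq_live_example")  # placeholder key
```

The result is strict JSON, which matters because JSON files cannot contain comments.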

Atomic facts, not conversation blobs.

Torqon doesn't store chat history. It extracts the minimal unit of knowledge that is independently reusable, and indexes each one separately so retrieval stays precise even across thousands of facts.

project_name

"Torqon memory layer"

The canonical name of the active project. Used to namespace all other facts and scope retrieval so facts from different projects never bleed into each other.

tech_stack

["Next.js 14", "pgvector", "Railway"]

Technologies, frameworks, databases in use. Stored as individual embeddings per technology so changing one item supersedes only that item, not the whole array.

goal

"Ship v1 persistent memory API by Q1 2026"

Active objectives and milestones. When a goal is completed or changed, Torqon marks the old fact superseded and stores the new state with a higher confidence score.

preference

"No mock databases in tests. Always real DB."

Working constraints and style rules. Preferences have elevated retrieval priority because violating them wastes entire sessions and erodes trust in the AI assistant.

decision

"Atomic extraction over conversation summaries"

Architectural decisions with their rationale. Prevents relitigating solved problems. Stored with a finality score that influences how aggressively Torqon will counter contradicting instructions.

context

"User is senior backend engineer, new to React"

Background knowledge that shapes how responses should be framed. Influences assumed expertise level, explanation depth, and code verbosity in every reply, across all sessions.
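The six fact types above can be modelled as a small typed record. This is a sketch under assumptions: the `Fact` fields and the `to_atomic_facts` helper are hypothetical, and the real schema may carry more (confidence, finality, timestamps). It shows two ideas from the descriptions: facts are namespaced by project, and an array value such as `tech_stack` is split into one fact per item.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union


class FactType(str, Enum):
    PROJECT_NAME = "project_name"
    TECH_STACK = "tech_stack"
    GOAL = "goal"
    PREFERENCE = "preference"
    DECISION = "decision"
    CONTEXT = "context"


@dataclass(frozen=True)
class Fact:
    project: str         # namespace, so projects never bleed into each other
    fact_type: FactType
    content: str


def to_atomic_facts(project: str, fact_type: FactType, value: Union[str, list[str]]) -> list[Fact]:
    """Split list values into one fact per item so each can be superseded alone."""
    items = value if isinstance(value, list) else [value]
    return [Fact(project, fact_type, item) for item in items]


stack = to_atomic_facts(
    "Torqon memory layer", FactType.TECH_STACK, ["Next.js 14", "pgvector", "Railway"]
)
goal = to_atomic_facts(
    "Torqon memory layer", FactType.GOAL, "Ship v1 persistent memory API by Q1 2026"
)
```

Because each technology is its own record, replacing "Railway" later touches one fact and leaves the other two alone.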

Only what's relevant gets injected.

A single pgvector cosine query fetches semantically similar facts in under 8ms. The result is scored, ranked, and trimmed to your token budget before it touches the system prompt.

retrieval.sql (the actual query)

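-- $1 = query embedding, $2 = user id
-- <=> is pgvector's cosine distance, so 1 - distance = cosine similarity
SELECT id, content, fact_type,
       1 - (embedding <=> $1) AS similarity
FROM facts
WHERE user_id = $2
  AND is_current_fact = true          -- skip superseded facts
  AND 1 - (embedding <=> $1) > 0.72   -- similarity floor
ORDER BY similarity DESC
LIMIT 12;                             -- top-k cap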
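For readers less familiar with pgvector, the query's logic can be mirrored in plain Python. This sketch is only an in-memory equivalent for illustration, not how the service runs retrieval: rows are dictionaries with the same column names as the query, and treating a zero-length vector as similarity 0.0 is a choice made here.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # zero vector: treat as no match (assumption)


def retrieve(rows, query, user_id, threshold=0.72, limit=12):
    """Same filters as the SQL: user scope, current-only, similarity floor, top-k."""
    hits = []
    for row in rows:
        if row["user_id"] != user_id or not row["is_current_fact"]:
            continue
        similarity = cosine_similarity(row["embedding"], query)
        if similarity > threshold:
            hits.append({"id": row["id"], "content": row["content"],
                         "fact_type": row["fact_type"], "similarity": similarity})
    hits.sort(key=lambda h: h["similarity"], reverse=True)
    return hits[:limit]


def _row(id_, emb, user="u1", current=True):
    return {"id": id_, "content": f"fact {id_}", "fact_type": "context",
            "embedding": emb, "user_id": user, "is_current_fact": current}


rows = [
    _row(1, [1.0, 0.0]),                 # identical direction, similarity 1.0
    _row(2, [1.0, 1.0]),                 # about 0.707, below the 0.72 floor
    _row(3, [2.0, 1.0]),                 # about 0.894
    _row(4, [1.0, 0.0], current=False),  # superseded, filtered out
    _row(5, [1.0, 0.0], user="u2"),      # another user's fact
]
hits = retrieve(rows, [1.0, 0.0], "u1")
```

Only rows 1 and 3 come back, in that order; the stale row and the other user's row never reach the similarity check.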

Threshold

0.72

Cosine similarity floor. Facts below this threshold are never surfaced, even when fewer than 12 facts clear it.

is_current_fact

Boolean guard

Superseded facts are soft-deleted via this flag, not physically removed, so you get a full audit trail and can replay any historical state.

Token budget

Auto-trim

Before assembly, the lowest-ranked results are dropped first until the set fits within your configured token budget. Higher-similarity facts are the last to be cut.
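Budget trimming can be sketched in a few lines. The token estimate below (roughly four characters per token) is a crude stand-in chosen for illustration; a real implementation would count tokens with the target model's tokenizer, and the function names here are hypothetical.

```python
def estimate_tokens(text: str) -> int:
    """Very rough approximation: about four characters per token (assumption)."""
    return max(1, len(text) // 4)


def trim_to_budget(ranked_facts: list[str], budget: int) -> list[str]:
    """Drop facts from the end of a best-first list until the total fits the budget."""
    kept = list(ranked_facts)
    while kept and sum(estimate_tokens(f) for f in kept) > budget:
        kept.pop()  # the lowest-ranked fact goes first
    return kept


ranked = ["a" * 40, "b" * 40, "c" * 40]  # best first, about 10 tokens each
kept = trim_to_budget(ranked, budget=25)
```

With a budget of 25 estimated tokens, the first two facts survive and the third is dropped.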

Built for developers who take context seriously

Torqon is a memory infrastructure layer for LLM applications. We give AI assistants persistent, retrievable memory so they can hold context across long conversations without burning your token budget.

We're a small, focused team. We're building the infrastructure layer that makes AI assistants actually useful for day-to-day development work.

Get in touch

<8ms

p95 retrieval latency

80%

token reduction (tested)

99.9%

uptime SLA on paid plans

MCP

native protocol, 3 lines of config

Build with Torqon

Give your AI assistant persistent memory in minutes. Free to start. No credit card required.

Start Free, No Card Needed

Read the Docs

Connect via MCP and start remembering.