OpenAI Assistants migration
Move off Assistants before 26 Aug 2026.
The OpenAI Assistants API sunsets on 26 August 2026. Threads, files, and run state stop persisting. Engramia replaces thread persistence with eval-weighted memory and plugs straight into the OpenAI Agents SDK.
The Assistants API is not a small change of behaviour
From 26 Aug 2026, the entire /v1/assistants surface returns 410 Gone. Anything you have wired into Threads, Messages, Runs, or assistant-scoped Files needs a replacement.
Thread persistence stops
Per-user message history is the most common Assistants integration. Engramia replaces it with scope-aware patterns that survive across runs and scale beyond one user.
Files / vector store rebind
Files attached to assistants need re-uploading. Engramia uses pgvector with HNSW index and scope-aware search — re-import once, never rebind.
Vendor lock-in remains
The Assistants successor (Responses API) is still OpenAI-only. Engramia is provider-agnostic — OpenAI, Anthropic, and local embeddings work side-by-side today.
Threads to scopes, messages to patterns
Most Assistants concepts map cleanly onto Engramia primitives. Where they don't, you get something stronger — eval-weighted ranking, RBAC, audit log.
| OpenAI Assistants | Engramia |
|---|---|
| Thread (per-user persistence) | Tenant + project scope (multi-user, RBAC-enforced) |
| Message history (linear, append-only) | Patterns (eval-weighted, decayable, scoped) |
| Files (vector store per assistant) | Embeddings (pgvector, HNSW index, scope-aware search) |
| Run state (transient per request) | Async job queue (Prefer: respond-async, durable status) |
| Tools (assistant-bound) | Skill registry (cross-tenant searchable, evaluated) |
| Single-vendor (OpenAI-only) | Multi-LLM (OpenAI + Anthropic providers, swappable) |
Three lines of glue, not a rewrite
The OpenAI Agents SDK adapter (engramia.sdk.openai_agents) is the recommended path — same gpt-4.1 model, same agent definition, recall and learn happen automatically.
# Before — OpenAI Assistants API (sunset 26 Aug 2026)
from openai import OpenAI
client = OpenAI()
assistant = client.beta.assistants.create(
name="coder",
model="gpt-4.1",
instructions="You are a senior developer.",
)
thread = client.beta.threads.create()
client.beta.threads.messages.create(
thread_id=thread.id,
role="user",
content="Build a CSV parser.",
)
run = client.beta.threads.runs.create_and_poll(
thread_id=thread.id,
assistant_id=assistant.id,
)# After — Engramia + OpenAI Agents SDK (production-ready)
from agents import Agent, Runner
from engramia import Memory
from engramia.sdk.openai_agents import EngramiaRunHooks, engramia_instructions
memory = Memory() # picks up ENGRAMIA_DATABASE_URL + provider config
agent = Agent(
name="coder",
instructions=engramia_instructions(
memory,
base="You are a senior developer.",
),
model="gpt-4.1",
)
hooks = EngramiaRunHooks(memory)
result = await Runner.run(
agent,
"Build a CSV parser.",
hooks=hooks,
)
# Memory captures the run, evaluates the output, and improves recall
# for the next "Build a CSV parser" task automatically.Four steps, dual-write friendly
The path below assumes a production deployment with active Threads. Cut-over is reversible up to step 4 — run both in parallel until you have confidence.
Export Assistants threads
Use the OpenAI API to enumerate threads + messages for the assistant_ids you control. Each Thread becomes a (tenant_id, project_id) scope; each Message becomes a candidate Engramia pattern with the original timestamp.
Bulk-import to Engramia
Group consecutive (user, assistant) messages into pattern pairs and call POST /v1/learn (one per pair) under the right scope. Carry original timestamps via run_id for replay determinism. The backfill script in the example repo does this end to end.
Swap thread.create() for memory.recall()
Replace the Threads-bound run loop with an Engramia recall before each LLM call. The OpenAI Agents SDK adapter (engramia.sdk.openai_agents.EngramiaRunHooks) wires this in three lines and works with the same gpt-4.1 model.
Cut over
Run dual-write for one release cycle (OpenAI Threads + Engramia in parallel) to compare behaviour, then flip the read path to Engramia and stop creating new Threads. Audit log records the cut-over for compliance.