Real-Time Data Sync: CRDTs, OT, and What Actually Works
Google Docs uses Operational Transform. Figma and Notion use CRDTs. The difference is not academic — it determines what is possible when multiple users edit simultaneously, including when those users are AI agents.

Real-time collaborative editing is one of those problems that looks straightforward until you start implementing it. Two users editing the same document simultaneously. One inserts text at position 10. The other deletes characters at position 8. By the time the first user's operation reaches the second user's client, the position numbers are wrong. Welcome to the concurrent editing problem.
Two families of algorithms solve this: Operational Transform (OT) and Conflict-free Replicated Data Types (CRDTs). They make different tradeoffs. Understanding those tradeoffs is the difference between choosing the right architecture and spending six months debugging edge cases that only appear in production.
Operational Transform: The Original Solution
Google Docs uses OT. The algorithm works by transforming operations against concurrent operations before applying them — if you insert at position 10 and I delete at position 8 first, your insertion needs to be transformed to account for the fact that positions have shifted. The server acts as the arbiter of operation order.
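The core transform step can be sketched in a few lines. This is a hedged illustration of the idea, not Google Docs' implementation: real OT systems define transforms for every pair of operation types and must prove convergence for each pair.

```typescript
// A minimal sketch of OT position transformation for two primitive
// operations, insert and delete. Real OT systems handle many more
// operation types (formatting, embedded objects) and edge cases.
type Op =
  | { kind: "insert"; pos: number; text: string }
  | { kind: "delete"; pos: number; len: number };

// Transform `op` so it applies correctly after `against` has been applied.
function transform(op: Op, against: Op): Op {
  if (against.kind === "delete" && op.pos > against.pos) {
    // Characters before us were removed: shift left, but not past the
    // deletion point itself.
    const shift = Math.min(against.len, op.pos - against.pos);
    return { ...op, pos: op.pos - shift };
  }
  if (against.kind === "insert" && op.pos >= against.pos) {
    // Text was inserted before us: shift right by its length.
    return { ...op, pos: op.pos + against.text.length };
  }
  return op; // Concurrent op was entirely after us: no adjustment needed.
}

// The example from above: you insert at 10, I delete 2 chars at 8 first.
const yourInsert: Op = { kind: "insert", pos: 10, text: "hi" };
const myDelete: Op = { kind: "delete", pos: 8, len: 2 };
const adjusted = transform(yourInsert, myDelete); // pos becomes 8
```

Even this toy version shows where the complexity comes from: every pair of operation kinds needs its own branch, and each branch must be correct regardless of which client's operation arrives first.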
OT works well in the scenario it was designed for: a central server coordinating a small number of clients editing a linear document. The challenges emerge at the edges. The transformation functions multiply quadratically: every pair of operation types needs its own transform function, and each must preserve convergence in both application orders. Multi-server deployments are hard because OT requires a total ordering of operations, and achieving that ordering across servers requires coordination that undermines scalability.
The original Google Wave project (the technical precursor to Google Docs collaboration) had 100,000 lines of code dedicated to OT. That is the maintenance surface you are signing up for if you implement OT from scratch. The pragmatic option is to use an existing OT library and accept its limitations.
CRDTs: The Distributed-First Alternative
A CRDT (Conflict-free Replicated Data Type) is a data structure with merge semantics baked in. Two replicas of a CRDT can be modified independently and then merged automatically, with the merge guaranteed to produce a consistent result regardless of the order operations are applied. No server coordination required. Offline-first as a consequence of the architecture, not as a feature to retrofit.
Figma uses CRDTs for its multiplayer layer. Notion uses CRDTs. The trade-off versus OT: CRDTs are heavier in memory (they need to track operation history or unique identifiers per character), and some CRDT merge semantics produce results that are technically consistent but semantically surprising to users.
“CRDTs are not "better" than OT — they are distributed-first. If your architecture is centralized, OT may be simpler. If you need offline-first or peer-to-peer, CRDTs are the only viable path.”
The canonical CRDT paper by Shapiro et al. describes two main families: state-based CRDTs (CvRDTs) that merge entire states, and operation-based CRDTs (CmRDTs) that broadcast operations. Modern collaborative editing libraries use sequence CRDTs (RGA, LSEQ, YATA) that assign a unique, immutable identifier to each character rather than relying on mutable position indices.
Yjs and Automerge: The Production Libraries
Yjs is the dominant CRDT library for the web. It implements a novel algorithm called YATA (Yet Another Transformation Approach) that is significantly more memory-efficient than earlier CRDT designs. Yjs can handle documents with hundreds of thousands of operations in a few megabytes of memory. The ecosystem is extensive: y-websocket for the network layer, y-indexeddb for persistence, bindings for ProseMirror, Quill, CodeMirror, and Monaco.
Automerge (from Ink & Switch) takes a different approach: it is designed to be a general-purpose collaborative data structure, not just a text editor primitive. Any JSON-like document can be an Automerge document. The trade-off is that Automerge documents are heavier than Yjs documents for pure text use cases, but they are more flexible for applications like shared spreadsheets or structured data editors.
| Library | Algorithm | Memory efficiency | Best for | Ecosystem |
|---|---|---|---|---|
| Yjs | YATA | Excellent | Text editors, code editors | Large (ProseMirror, CM6, Quill) |
| Automerge | RGA + JSON | Good | Structured data, JSON documents | Growing (Automerge-repo) |
| ShareDB | OT (json0) | Moderate | JSON documents, centralized | Mature, smaller |
Managed Real-Time: Liveblocks and PartyKit
Building the real-time infrastructure yourself — WebSocket servers, presence tracking, conflict resolution, persistence, scaling — is weeks of work that is tangential to your actual product. Liveblocks and PartyKit are the two managed options worth considering if you want to skip that investment.
Liveblocks is opinionated: it provides a complete presence, storage, and conflict-resolution layer built on CRDTs. It has first-class bindings for Yjs, so you get the Yjs ecosystem and the managed infrastructure together. Pricing is per monthly active user, which makes it economical at small scale and expensive at large scale.
PartyKit is more general-purpose — it is effectively managed Cloudflare Durable Objects with a developer-friendly API. You bring your own conflict resolution (typically Yjs). The advantage is that PartyKit runs on Cloudflare's edge, so WebSocket connections can be initiated from the nearest edge node, reducing latency for globally distributed users.
The Offline-First Renaissance
Offline-first is not a new concept — it has been discussed since the early 2010s. What is new is the infrastructure maturity to actually build it without heroic engineering effort. Yjs with y-indexeddb, ElectricSQL, PowerSync, and Replicache have made offline-first a realistic architectural choice for teams without dedicated infrastructure engineers.
The appeal goes beyond disconnected usage. Offline-first architectures mean every write is instantaneous from the user's perspective — the local replica updates immediately, and sync happens in the background. This eliminates the round-trip latency that makes traditional request-response apps feel sluggish. For latency-sensitive interactions (drawing, coding, writing), this matters enormously. Offline-first is the strongest fit for:
- Users in unstable network environments (field workers, mobile-heavy audiences)
- Latency-sensitive interactions where round-trip to server is noticeable
- Products where surviving disconnects without data loss is a competitive differentiator
- Multiplayer features that require peer-to-peer operation (no central server)
AI Agents as Collaborative Editors
The newest challenge for real-time sync systems: multiple AI agents editing the same document simultaneously. This is not a hypothetical. Teams are already building coding assistants where one agent writes tests, another writes implementation, a third reviews — all in the same codebase, at the same time.
CRDTs handle this elegantly by design. An AI agent is just another client in the CRDT model. It reads the current document state, computes edits, applies them to its local replica, and syncs. The merge semantics handle conflicts the same way they handle conflicts between human editors.
OT-based systems struggle here because they depend on a server to impose ordering on concurrent operations. With AI agents that generate edits at near-token-generation speed, the ordering bottleneck becomes significant. At high agent concurrency, the server becomes a throughput ceiling.
CRDT Types: When to Use Each
CRDTs (Conflict-free Replicated Data Types) are a family of data structures designed to allow concurrent modifications across distributed nodes without coordination. They do not eliminate conflicts — they define merge functions that make conflicts deterministic and commutative. Understanding which CRDT to use requires understanding the invariants you need to preserve.
G-Counter (Grow-only Counter): each node maintains its own counter shard. The merged value is the sum of all shards. Supports only increment, never decrement. Use case: view counts, like counts, analytics events — anything where you only ever add. The limitation is in the name: if you need decrement, use a PN-Counter (two G-Counters, one for increments and one for decrements; value = positive_sum - negative_sum).
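A concrete sketch of both counters, with illustrative node ids. Merging takes the per-node maximum of shards (so merge is commutative, associative, and idempotent), while the counter's value is the sum of shards:

```typescript
// A minimal G-Counter: nodeId -> that node's shard.
type GCounter = Map<string, number>;

function increment(c: GCounter, nodeId: string, by = 1): void {
  // A node only ever increments its own shard.
  c.set(nodeId, (c.get(nodeId) ?? 0) + by);
}

function value(c: GCounter): number {
  let sum = 0;
  for (const v of c.values()) sum += v;
  return sum;
}

// Merge takes the per-node max, so applying it in any order or any
// number of times yields the same result.
function merge(a: GCounter, b: GCounter): GCounter {
  const out = new Map(a);
  for (const [node, v] of b) out.set(node, Math.max(out.get(node) ?? 0, v));
  return out;
}

// PN-Counter: two G-Counters; value = increments - decrements.
interface PNCounter { pos: GCounter; neg: GCounter }
const pnValue = (c: PNCounter) => value(c.pos) - value(c.neg);

// Two replicas diverge, then converge to the same total either way.
const a: GCounter = new Map(); increment(a, "nodeA"); increment(a, "nodeA");
const b: GCounter = new Map(); increment(b, "nodeB");
value(merge(a, b)); // 3, same as value(merge(b, a))
```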
LWW-Register (Last-Write-Wins Register): the value with the highest timestamp wins on merge. Simple and widely implemented, but it requires a trustworthy timestamp, and physical clocks are not trustworthy in distributed systems (clock skew, leap seconds, VM migrations). Use hybrid logical clocks (HLC) — a combination of physical time and a logical counter — to get timestamps that stay close to wall-clock time while remaining totally ordered and causally consistent. LWW-Register is correct for use cases where the last write semantically wins (user profile updates, settings changes).
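A minimal LWW-Register sketch with a simplified HLC-style timestamp. The struct and tie-break rules here are illustrative, not a full HLC implementation — the point is that the node-id tie-break makes merges deterministic even when two clocks collide exactly:

```typescript
// Simplified hybrid-logical-clock timestamp: (physicalMs, logicalCounter, nodeId).
type Hlc = { physical: number; logical: number; node: string };

function hlcCompare(a: Hlc, b: Hlc): number {
  if (a.physical !== b.physical) return a.physical - b.physical;
  if (a.logical !== b.logical) return a.logical - b.logical;
  // Deterministic tie-break: no two writes ever compare equal.
  return a.node < b.node ? -1 : a.node > b.node ? 1 : 0;
}

interface LwwRegister<T> { value: T; ts: Hlc }

// Merge keeps whichever write carries the greater timestamp.
function mergeLww<T>(a: LwwRegister<T>, b: LwwRegister<T>): LwwRegister<T> {
  return hlcCompare(a.ts, b.ts) >= 0 ? a : b;
}

// Same physical millisecond; the higher logical counter wins.
const x: LwwRegister<string> = { value: "dark", ts: { physical: 100, logical: 0, node: "a" } };
const y: LwwRegister<string> = { value: "light", ts: { physical: 100, logical: 1, node: "b" } };
mergeLww(x, y).value; // "light", regardless of merge order
```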
OR-Set (Observed-Remove Set): allows add and remove operations on a set. Each element is tagged with a unique ID when added. A remove deletes only the tagged instances it has observed at remove time — never tags it has not seen. This resolves the "add wins vs remove wins" ambiguity: if two users concurrently add and remove the same element, the remove cannot have observed the concurrent add's tag, so that addition survives (add wins). Use case: collaborative document tags, shared shopping carts, todo lists.
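A single-replica sketch of the tagging discipline, in TypeScript. Replica-to-replica merge and tombstone propagation are omitted for brevity — a full OR-Set also exchanges removed tags so that deletes propagate between replicas:

```typescript
// Minimal OR-Set: each add attaches a fresh unique tag to the element;
// remove deletes only the tags observed at that moment.
class OrSet<T> {
  private tags = new Map<T, Set<string>>();
  private nextId = 0;
  constructor(private node: string) {}

  add(elem: T): void {
    // Fresh tag per add: "<nodeId>:<counter>" is globally unique.
    const tag = `${this.node}:${this.nextId++}`;
    if (!this.tags.has(elem)) this.tags.set(elem, new Set());
    this.tags.get(elem)!.add(tag);
  }

  // Removes every currently-observed tag for the element. A tag created
  // by a concurrent add elsewhere would not be affected.
  remove(elem: T): void {
    this.tags.delete(elem);
  }

  has(elem: T): boolean {
    return (this.tags.get(elem)?.size ?? 0) > 0;
  }
}

const s = new OrSet<string>("a");
s.add("urgent");
s.remove("urgent");
s.has("urgent"); // false: the observed tag was removed
s.add("urgent"); // a later add gets a fresh tag, unaffected by the earlier remove
s.has("urgent"); // true
```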
RGA (Replicated Growable Array): the structure behind collaborative text editing. Each character is assigned a unique position identifier that sorts relative to its neighbours, so concurrent inserts at the same location are ordered deterministically on every replica. This is the CRDT family at the core of both Yjs (via Y.Text) and Automerge (via its internal sequence type).
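The immutable-identifier idea can be sketched with fractional positions. This is only an illustration: real sequence CRDTs like RGA and YATA use structured identifiers rather than floating-point midpoints, which would exhaust precision under repeated insertion:

```typescript
// Each character carries an immutable fractional position plus a node id
// for deterministic tie-breaking. Integer indices never appear.
interface Char { pos: number; node: string; ch: string }

// Inserting between two neighbours means picking a position between theirs.
function insertBetween(left: number, right: number, node: string, ch: string): Char {
  return { pos: (left + right) / 2, node, ch };
}

// Every replica sorts by (pos, node) and therefore renders the same string.
function render(chars: Char[]): string {
  return [...chars]
    .sort((a, b) => a.pos - b.pos || (a.node < b.node ? -1 : 1))
    .map(c => c.ch)
    .join("");
}

// "AC" exists; two replicas concurrently insert between A and C.
const doc: Char[] = [
  { pos: 1, node: "x", ch: "A" },
  { pos: 2, node: "x", ch: "C" },
  insertBetween(1, 2, "a", "B"), // replica a inserts "B"
  insertBetween(1, 2, "b", "b"), // replica b inserts "b" concurrently
];
render(doc); // "ABbC" on every replica, whatever order the inserts arrived in
```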
| CRDT type | Supports | Merge semantics | Use case |
|---|---|---|---|
| G-Counter | Increment only | Sum of all node counters | View counts, analytics, likes |
| PN-Counter | Increment + decrement | Sum(increments) - Sum(decrements) | Shopping cart quantities, inventory |
| LWW-Register | Set value | Highest timestamp wins | User profiles, settings, last-modified fields |
| OR-Set | Add + remove | Add wins over concurrent remove | Tags, to-do lists, shared collections |
| RGA / Y.Array / Y.Text | Insert + delete at positions | Causal ordering of operations | Collaborative text editing, ordered lists |
| LWW-Map | Set/remove key-value pairs | LWW per key | Collaborative property editors, shared state maps |
Yjs vs Automerge: A Practical Comparison
Both Yjs and Automerge implement CRDTs in JavaScript, but they make different architectural choices that lead to different performance profiles and ecosystem integrations. For teams building collaborative features, the choice matters — switching later is a data migration problem. Both integrate well with event-driven architectures for distributing updates.
| Dimension | Yjs | Automerge 2.x |
|---|---|---|
| Encoding | Custom binary (highly optimised) | Binary (automerge-repo format) |
| Document size at scale | Compresses well; handles 100K+ ops efficiently | Larger documents at scale; 2.x improved significantly |
| Awareness/presence | Built-in (Y.Awareness protocol) | Requires separate layer |
| Rich text support | Excellent (ProseMirror, TipTap, CodeMirror bindings) | Good (prosemirror-automerge available) |
| History/undo | Manual via UndoManager | Built-in (change history preserved) |
| Ecosystem | Liveblocks, PartyKit, Hocuspocus, Tiptap | automerge-repo, sync server |
| Persistence | Requires external (IndexedDB via y-indexeddb) | automerge-repo handles persistence |
| Best for | Editor-heavy applications, large document scale | General-purpose state sync, branching workflows |
Yjs has the stronger editor ecosystem: TipTap, ProseMirror, CodeMirror, Monaco, Quill, Slate — all have official Yjs bindings. If your use case involves rich text editing or code editing, Yjs is the clear choice. Automerge 2.x (released in early 2023) dramatically improved performance and introduced the automerge-repo abstraction for storage and networking. Its preserved change history enables Git-like branching workflows — useful for design tools or document approval workflows where you need to propose changes without immediately applying them.
WebSocket vs SSE vs WebTransport
| Protocol | Direction | Multiplexing | Infrastructure support | Best for |
|---|---|---|---|---|
| WebSocket | Full duplex | Single stream per connection | Wide — most proxies support it | Chat, collaborative editing, live cursors |
| Server-Sent Events (SSE) | Server → client only | Single stream | Works through HTTP/1.1 proxies | Live feeds, notifications, streaming AI responses |
| WebTransport | Full duplex | Multiple independent streams/datagrams | Limited — requires HTTP/3 (QUIC) | Gaming, video, low-latency streaming |
| Long polling | Request/response loop | One response per request | Universal | Fallback for restricted environments |
WebSocket is the pragmatic default for two-way real-time communication. The protocol is mature, well-supported across browsers, load balancers, and CDNs, and has robust client libraries. The limitation: after the HTTP upgrade, a WebSocket connection is an opaque TCP stream, and some enterprise proxies and firewalls drop connections they cannot inspect, especially unencrypted ones on port 80. Always serve WebSockets over TLS (wss://) on port 443; that configuration eliminates most proxy problems.
SSE is underrated for server-to-client use cases. It is HTTP, it survives proxy inspection, it reconnects automatically, and it is simpler to implement than WebSocket when you do not need client-to-server streaming. Modern AI streaming responses (OpenAI, Anthropic) use SSE because it works through all standard HTTP infrastructure without special proxy configuration.
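Part of SSE's robustness is how simple the wire format is. A sketch of the framing (the `token` event name and id are illustrative, not any particular API's):

```typescript
// The SSE wire format: optional `id:` and `event:` lines, one or more
// `data:` lines, terminated by a blank line. The whole stream is plain
// text over an ordinary HTTP response with Content-Type: text/event-stream,
// which is why it passes through standard proxies untouched.
function sseEvent(data: string, opts: { event?: string; id?: string } = {}): string {
  const lines: string[] = [];
  if (opts.id) lines.push(`id: ${opts.id}`);
  if (opts.event) lines.push(`event: ${opts.event}`);
  // Multi-line payloads become multiple data: lines, per the spec.
  for (const line of data.split("\n")) lines.push(`data: ${line}`);
  return lines.join("\n") + "\n\n";
}

sseEvent("hello", { event: "token", id: "42" });
// "id: 42\nevent: token\ndata: hello\n\n"
```

On the client, the built-in EventSource API parses these frames and handles automatic reconnection, resuming from the last received `id`.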
WebTransport is the emerging standard built on HTTP/3 (QUIC). It provides both reliable ordered streams (like WebSocket) and unreliable datagrams (like UDP) over the same connection. The use case is latency-critical applications where dropped packets are preferable to delayed packets — gaming, live video, real-time sensor data. Browser support is good as of Chrome 97+ and Firefox 114+, but server-side library maturity lags WebSocket significantly.
When OT is Better Than CRDTs
Operational Transformation (OT) preceded CRDTs and remains the implementation choice in some major collaborative editing systems (Google Docs uses an OT-based approach). OT transforms operations against concurrent operations to maintain document consistency. It requires a central server to act as the arbiter of operation ordering, a limitation that CRDTs eliminate by design. For fully peer-to-peer or offline-first collaboration, CRDTs are strictly better.
But for systems where a central server is acceptable and always available, OT has one advantage: the transformation functions are easier to implement correctly for complex rich text semantics than equivalent CRDT merge functions. The reason major collaborative editors built on OT before CRDTs matured and have not migrated is that migration means a full data format change.
If you are building new, use CRDTs. If you are maintaining an OT-based system that works, evaluate migration cost carefully against concrete benefits. For architectural guidance on when to introduce real-time sync into an existing system, see our analysis of event-driven patterns at scale.
“CRDTs do not eliminate the need for conflict resolution — they make conflict resolution deterministic. That is a significant engineering win, but it requires choosing the right CRDT type for your data model.”
Infrastructure Costs and Scaling
Real-time sync infrastructure costs scale with concurrent connections, message frequency, and state size. A collaborative document editor with 10 concurrent users generates modest traffic. A multiplayer game with 1,000 concurrent users generates substantial traffic. The infrastructure choice should match: Liveblocks and Ably handle the connection management and scaling for you (at $0.01-0.05 per monthly active user), while self-hosted Yjs on a WebSocket server gives you full control but requires you to manage horizontal scaling, connection routing, and state persistence.
For self-hosted deployments, the scaling bottleneck is usually the WebSocket server, not the CRDT logic. A single Node.js process can handle approximately 10,000-50,000 concurrent WebSocket connections depending on message frequency. Beyond that, you need horizontal scaling with sticky sessions (so all users editing the same document connect to the same server) or a pub/sub layer (Redis, NATS) to broadcast updates across server instances. The Hocuspocus server (built for Yjs) handles this architecture out of the box with Redis-based multi-server support.
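Document-sticky routing is simple to sketch. The server names are illustrative, and production setups typically use consistent hashing so adding a server relocates only a fraction of documents:

```typescript
// Route every client editing the same document to the same WebSocket
// server by hashing the document id to a server index.
function routeDocument(docId: string, servers: string[]): string {
  let hash = 0;
  for (const ch of docId) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0; // simple 32-bit rolling hash
  }
  return servers[hash % servers.length];
}

const servers = ["ws-1", "ws-2", "ws-3"];
// Deterministic: every client opening the same doc lands on the same server.
routeDocument("doc-abc", servers) === routeDocument("doc-abc", servers); // true
```

The alternative, a pub/sub layer, removes the stickiness requirement: any server accepts any connection and broadcasts updates to its peers through Redis or NATS, at the cost of an extra hop per message.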
Security Considerations for Real-Time Sync
Real-time sync introduces security challenges that do not exist in traditional request-response architectures. When users have persistent WebSocket connections, you need to handle: authentication token expiry during long-lived connections (a WebSocket can stay open for hours while the auth token expires after 15 minutes), authorisation changes during a session (a user's permissions are revoked while they are still connected and receiving updates), and data isolation between concurrent editing sessions (ensuring that users can only see and modify documents they have access to).
The standard pattern: authenticate during the WebSocket handshake using a short-lived token, then periodically verify the token during the connection lifetime (every 5-10 minutes via a heartbeat check). When the token fails verification, close the WebSocket with a 4001 custom close code that the client interprets as "re-authenticate." For authorisation, each broadcast message is filtered through the user's current permissions before being sent — this is the server-side equivalent of row-level security in a database.
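The broadcast-time filter can be sketched as a pure function. The types and the permission-check callback are illustrative, not a specific library's API; the point is that permissions are consulted per message, not once at connect time:

```typescript
// Every outgoing update is checked against the user's *current*
// permissions, so a mid-session revocation stops further updates even
// while the WebSocket stays open.
interface Update { docId: string; payload: Uint8Array }
type PermissionCheck = (userId: string, docId: string) => boolean;

function filterBroadcast(
  update: Update,
  subscribers: string[],
  canRead: PermissionCheck,
): string[] {
  // Only users who can read the document right now receive the update.
  return subscribers.filter(userId => canRead(userId, update.docId));
}

// A revoked user is silently dropped from the recipient list.
const canRead: PermissionCheck = (user) => user !== "revoked-user";
filterBroadcast(
  { docId: "doc-1", payload: new Uint8Array() },
  ["alice", "revoked-user"],
  canRead,
); // ["alice"]
```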
For applications handling sensitive data, end-to-end encryption of CRDT operations adds a layer of protection. The Yjs ecosystem supports this through custom encoding providers that encrypt operations before they leave the client and decrypt them on arrival. The server sees only encrypted payloads and cannot read the document content — it functions as a relay, not a processor. This architecture satisfies data residency and privacy requirements while maintaining real-time collaboration capability.