Real-Time Data Sync: CRDTs, OT, and What Actually Works
Google Docs uses Operational Transform. Figma and Notion use CRDTs. The difference is not academic — it determines what is possible when multiple users edit simultaneously, including when those users are AI agents.

Real-time collaborative editing is one of those problems that looks straightforward until you start implementing it. Two users editing the same document simultaneously. One inserts text at position 10. The other deletes characters at position 8. By the time the first user's operation reaches the second user's client, the position numbers are wrong. Welcome to the concurrent editing problem.
Two families of algorithms solve this: Operational Transform (OT) and Conflict-free Replicated Data Types (CRDTs). They make different tradeoffs. Understanding those tradeoffs is the difference between choosing the right architecture and spending six months debugging edge cases that only appear in production.
Operational Transform: The Original Solution
Google Docs uses OT. The algorithm works by transforming operations against concurrent operations before applying them — if you insert at position 10 and I delete at position 8 first, your insertion needs to be transformed to account for the fact that positions have shifted. The server acts as the arbiter of operation order.
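The core transform step can be sketched in a few lines. This is a hedged illustration of the idea, not Google Docs' implementation: real OT systems define transforms for every pair of operation types and must prove convergence for each pair.

```typescript
// A minimal sketch of OT position transformation for two primitive
// operations, insert and delete. Real OT systems handle many more
// operation types (formatting, embedded objects) and edge cases.
type Op =
  | { kind: "insert"; pos: number; text: string }
  | { kind: "delete"; pos: number; len: number };

// Transform `op` so it applies correctly after `against` has been applied.
function transform(op: Op, against: Op): Op {
  if (against.kind === "delete" && op.pos > against.pos) {
    // Characters before us were removed: shift left, but not past the
    // deletion point itself.
    const shift = Math.min(against.len, op.pos - against.pos);
    return { ...op, pos: op.pos - shift };
  }
  if (against.kind === "insert" && op.pos >= against.pos) {
    // Text was inserted before us: shift right by its length.
    return { ...op, pos: op.pos + against.text.length };
  }
  return op; // Concurrent op was entirely after us: no adjustment needed.
}

// The example from above: you insert at 10, I delete 2 chars at 8 first.
const yourInsert: Op = { kind: "insert", pos: 10, text: "hi" };
const myDelete: Op = { kind: "delete", pos: 8, len: 2 };
const adjusted = transform(yourInsert, myDelete); // pos becomes 8
```

Even this toy version shows where the complexity comes from: every pair of operation kinds needs its own branch, and each branch must be correct regardless of which client's operation arrives first.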
OT works well in the scenario it was designed for: a central server coordinating a small number of clients editing a linear document. The challenges emerge at the edges. The transformation functions multiply quadratically: every pair of operation types needs its own transform function, and each must preserve convergence in both application orders. Multi-server deployments are hard because OT requires a total ordering of operations, and achieving that ordering across servers requires coordination that undermines scalability.
The original Google Wave project (the technical precursor to Google Docs collaboration) had 100,000 lines of code dedicated to OT. That is the maintenance surface you are signing up for if you implement OT from scratch. The pragmatic option is to use an existing OT library and accept its limitations.
CRDTs: The Distributed-First Alternative
A CRDT (Conflict-free Replicated Data Type) is a data structure with merge semantics baked in. Two replicas of a CRDT can be modified independently and then merged automatically, with the merge guaranteed to produce a consistent result regardless of the order operations are applied. No server coordination required. Offline-first as a consequence of the architecture, not as a feature to retrofit.
Figma uses CRDTs for its multiplayer layer. Notion uses CRDTs. The trade-off versus OT: CRDTs are heavier in memory (they need to track operation history or unique identifiers per character), and some CRDT merge semantics produce results that are technically consistent but semantically surprising to users.
“CRDTs are not "better" than OT — they are distributed-first. If your architecture is centralized, OT may be simpler. If you need offline-first or peer-to-peer, CRDTs are the only viable path.”
The canonical CRDT paper by Shapiro et al. describes two main families: state-based CRDTs (CvRDTs) that merge entire states, and operation-based CRDTs (CmRDTs) that broadcast operations. Modern collaborative editing libraries use sequence CRDTs (RGA, LSEQ, YATA) that assign a unique, immutable identifier to each character rather than relying on mutable position indices.
Yjs and Automerge: The Production Libraries
Yjs is the dominant CRDT library for the web. It implements a novel algorithm called YATA (Yet Another Transformation Approach) that is significantly more memory-efficient than earlier CRDT designs. Yjs can handle documents with hundreds of thousands of operations in a few megabytes of memory. The ecosystem is extensive: y-websocket for the network layer, y-indexeddb for persistence, bindings for ProseMirror, Quill, CodeMirror, and Monaco.
Automerge (from Ink & Switch) takes a different approach: it is designed to be a general-purpose collaborative data structure, not just a text editor primitive. Any JSON-like document can be an Automerge document. The trade-off is that Automerge documents are heavier than Yjs documents for pure text use cases, but they are more flexible for applications like shared spreadsheets or structured data editors.
| Library | Algorithm | Memory efficiency | Best for | Ecosystem |
|---|---|---|---|---|
| Yjs | YATA | Excellent | Text editors, code editors | Large (ProseMirror, CM6, Quill) |
| Automerge | RGA + JSON | Good | Structured data, JSON documents | Growing (Automerge-repo) |
| ShareDB | OT (json0) | Moderate | JSON documents, centralized | Mature, smaller |
Managed Real-Time: Liveblocks and PartyKit
Building the real-time infrastructure yourself — WebSocket servers, presence tracking, conflict resolution, persistence, scaling — is weeks of work that is tangential to your actual product. Liveblocks and PartyKit are the two managed options worth considering if you want to skip that investment.
Liveblocks is opinionated: it provides a complete presence, storage, and conflict-resolution layer built on CRDTs. It has first-class bindings for Yjs, so you get the Yjs ecosystem and the managed infrastructure together. Pricing is per monthly active user, which makes it economical at small scale and expensive at large scale.
PartyKit is more general-purpose — it is effectively managed Cloudflare Durable Objects with a developer-friendly API. You bring your own conflict resolution (typically Yjs). The advantage is that PartyKit runs on Cloudflare's edge, so WebSocket connections can be initiated from the nearest edge node, reducing latency for globally distributed users.
The Offline-First Renaissance
Offline-first is not a new concept — it has been discussed since the early 2010s. What is new is the infrastructure maturity to actually build it without heroic engineering effort. Yjs with y-indexeddb, ElectricSQL, PowerSync, and Replicache have made offline-first a realistic architectural choice for teams without dedicated infrastructure engineers.
The appeal goes beyond disconnected usage. Offline-first architectures mean every write is instantaneous from the user's perspective — the local replica updates immediately, and sync happens in the background. This eliminates the round-trip latency that makes traditional request-response apps feel sluggish. For latency-sensitive interactions (drawing, coding, writing), this matters enormously. Offline-first is the strongest fit for:
- Users in unstable network environments (field workers, mobile-heavy audiences)
- Latency-sensitive interactions where round-trip to server is noticeable
- Products where surviving disconnects without data loss is a competitive differentiator
- Multiplayer features that require peer-to-peer operation (no central server)
AI Agents as Collaborative Editors
The newest challenge for real-time sync systems: multiple AI agents editing the same document simultaneously. This is not a hypothetical. Teams are already building coding assistants where one agent writes tests, another writes implementation, a third reviews — all in the same codebase, at the same time.
CRDTs handle this elegantly by design. An AI agent is just another client in the CRDT model. It reads the current document state, computes edits, applies them to its local replica, and syncs. The merge semantics handle conflicts the same way they handle conflicts between human editors.
OT-based systems struggle here because they depend on a server to impose ordering on concurrent operations. With AI agents that generate edits at near-token-generation speed, the ordering bottleneck becomes significant. At high agent concurrency, the server becomes a throughput ceiling.
CRDT Types: When to Use Each
CRDTs (Conflict-free Replicated Data Types) are a family of data structures designed to allow concurrent modifications across distributed nodes without coordination. They do not eliminate conflicts — they define merge functions that make conflicts deterministic and commutative. Understanding which CRDT to use requires understanding the invariants you need to preserve.
G-Counter (Grow-only Counter): each node maintains its own counter shard. The merged value is the sum of all shards. Supports only increment, never decrement. Use case: view counts, like counts, analytics events — anything where you only ever add. The limitation is in the name: if you need decrement, use a PN-Counter (two G-Counters, one for increments and one for decrements; value = positive_sum - negative_sum).
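A concrete sketch of both counters, with illustrative node ids. Merging takes the per-node maximum of shards (so merge is commutative, associative, and idempotent), while the counter's value is the sum of shards:

```typescript
// A minimal G-Counter: nodeId -> that node's shard.
type GCounter = Map<string, number>;

function increment(c: GCounter, nodeId: string, by = 1): void {
  // A node only ever increments its own shard.
  c.set(nodeId, (c.get(nodeId) ?? 0) + by);
}

function value(c: GCounter): number {
  let sum = 0;
  for (const v of c.values()) sum += v;
  return sum;
}

// Merge takes the per-node max, so applying it in any order or any
// number of times yields the same result.
function merge(a: GCounter, b: GCounter): GCounter {
  const out = new Map(a);
  for (const [node, v] of b) out.set(node, Math.max(out.get(node) ?? 0, v));
  return out;
}

// PN-Counter: two G-Counters; value = increments - decrements.
interface PNCounter { pos: GCounter; neg: GCounter }
const pnValue = (c: PNCounter) => value(c.pos) - value(c.neg);

// Two replicas diverge, then converge to the same total either way.
const a: GCounter = new Map(); increment(a, "nodeA"); increment(a, "nodeA");
const b: GCounter = new Map(); increment(b, "nodeB");
value(merge(a, b)); // 3, same as value(merge(b, a))
```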
LWW-Register (Last-Write-Wins Register): the value with the highest timestamp wins on merge. Simple and widely implemented, but it requires a trustworthy timestamp, and physical clocks are not trustworthy in distributed systems (clock skew, leap seconds, VM migrations). Use hybrid logical clocks (HLC) — a combination of physical time and a logical counter — to get timestamps that stay close to wall-clock time while remaining totally ordered and causally consistent. LWW-Register is correct for use cases where the last write semantically wins (user profile updates, settings changes).
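A minimal LWW-Register sketch with a simplified HLC-style timestamp. The struct and tie-break rules here are illustrative, not a full HLC implementation — the point is that the node-id tie-break makes merges deterministic even when two clocks collide exactly:

```typescript
// Simplified hybrid-logical-clock timestamp: (physicalMs, logicalCounter, nodeId).
type Hlc = { physical: number; logical: number; node: string };

function hlcCompare(a: Hlc, b: Hlc): number {
  if (a.physical !== b.physical) return a.physical - b.physical;
  if (a.logical !== b.logical) return a.logical - b.logical;
  // Deterministic tie-break: no two writes ever compare equal.
  return a.node < b.node ? -1 : a.node > b.node ? 1 : 0;
}

interface LwwRegister<T> { value: T; ts: Hlc }

// Merge keeps whichever write carries the greater timestamp.
function mergeLww<T>(a: LwwRegister<T>, b: LwwRegister<T>): LwwRegister<T> {
  return hlcCompare(a.ts, b.ts) >= 0 ? a : b;
}

// Same physical millisecond; the higher logical counter wins.
const x: LwwRegister<string> = { value: "dark", ts: { physical: 100, logical: 0, node: "a" } };
const y: LwwRegister<string> = { value: "light", ts: { physical: 100, logical: 1, node: "b" } };
mergeLww(x, y).value; // "light", regardless of merge order
```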
OR-Set (Observed-Remove Set): allows add and remove operations on a set. Each element is tagged with a unique ID when added. A remove deletes only the tagged instances it has observed at remove time — never tags it has not seen. This resolves the "add wins vs remove wins" ambiguity: if two users concurrently add and remove the same element, the remove cannot have observed the concurrent add's tag, so that addition survives (add wins). Use case: collaborative document tags, shared shopping carts, todo lists.
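A single-replica sketch of the tagging discipline, in TypeScript. Replica-to-replica merge and tombstone propagation are omitted for brevity — a full OR-Set also exchanges removed tags so that deletes propagate between replicas:

```typescript
// Minimal OR-Set: each add attaches a fresh unique tag to the element;
// remove deletes only the tags observed at that moment.
class OrSet<T> {
  private tags = new Map<T, Set<string>>();
  private nextId = 0;
  constructor(private node: string) {}

  add(elem: T): void {
    // Fresh tag per add: "<nodeId>:<counter>" is globally unique.
    const tag = `${this.node}:${this.nextId++}`;
    if (!this.tags.has(elem)) this.tags.set(elem, new Set());
    this.tags.get(elem)!.add(tag);
  }

  // Removes every currently-observed tag for the element. A tag created
  // by a concurrent add elsewhere would not be affected.
  remove(elem: T): void {
    this.tags.delete(elem);
  }

  has(elem: T): boolean {
    return (this.tags.get(elem)?.size ?? 0) > 0;
  }
}

const s = new OrSet<string>("a");
s.add("urgent");
s.remove("urgent");
s.has("urgent"); // false: the observed tag was removed
s.add("urgent"); // a later add gets a fresh tag, unaffected by the earlier remove
s.has("urgent"); // true
```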
RGA (Replicated Growable Array): the structure behind collaborative text editing. Each character is assigned a unique position identifier that sorts relative to its neighbours, so concurrent inserts at the same location are ordered deterministically on every replica. This is the CRDT family at the core of both Yjs (via Y.Text) and Automerge (via its internal sequence type).
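The immutable-identifier idea can be sketched with fractional positions. This is only an illustration: real sequence CRDTs like RGA and YATA use structured identifiers rather than floating-point midpoints, which would exhaust precision under repeated insertion:

```typescript
// Each character carries an immutable fractional position plus a node id
// for deterministic tie-breaking. Integer indices never appear.
interface Char { pos: number; node: string; ch: string }

// Inserting between two neighbours means picking a position between theirs.
function insertBetween(left: number, right: number, node: string, ch: string): Char {
  return { pos: (left + right) / 2, node, ch };
}

// Every replica sorts by (pos, node) and therefore renders the same string.
function render(chars: Char[]): string {
  return [...chars]
    .sort((a, b) => a.pos - b.pos || (a.node < b.node ? -1 : 1))
    .map(c => c.ch)
    .join("");
}

// "AC" exists; two replicas concurrently insert between A and C.
const doc: Char[] = [
  { pos: 1, node: "x", ch: "A" },
  { pos: 2, node: "x", ch: "C" },
  insertBetween(1, 2, "a", "B"), // replica a inserts "B"
  insertBetween(1, 2, "b", "b"), // replica b inserts "b" concurrently
];
render(doc); // "ABbC" on every replica, whatever order the inserts arrived in
```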
| CRDT type | Supports | Merge semantics | Use case |
|---|---|---|---|
| G-Counter | Increment only | Sum of all node counters | View counts, analytics, likes |
| PN-Counter | Increment + decrement | Sum(increments) - Sum(decrements) | Shopping cart quantities, inventory |
| LWW-Register | Set value | Highest timestamp wins | User profiles, settings, last-modified fields |
| OR-Set | Add + remove | Add wins over concurrent remove | Tags, to-do lists, shared collections |
| RGA / Y.Array / Y.Text | Insert + delete at positions | Causal ordering of operations | Collaborative text editing, ordered lists |
| LWW-Map | Set/remove key-value pairs | LWW per key | Collaborative property editors, shared state maps |
Yjs vs Automerge: A Practical Comparison
Both Yjs and Automerge implement CRDTs in JavaScript, but they make different architectural choices that lead to different performance profiles and ecosystem integrations. For teams building collaborative features, the choice matters — switching later is a data migration problem. Both integrate well with event-driven architectures for distributing updates.
| Dimension | Yjs | Automerge 2.x |
|---|---|---|
| Encoding | Custom binary (highly optimised) | Binary (automerge-repo format) |
| Document size at scale | Compresses well; handles 100K+ ops efficiently | Larger documents at scale; 2.x improved significantly |
| Awareness/presence | Built-in (Y.Awareness protocol) | Requires separate layer |
| Rich text support | Excellent (ProseMirror, TipTap, CodeMirror bindings) | Good (prosemirror-automerge available) |
| History/undo | Manual via UndoManager | Built-in (change history preserved) |
| Ecosystem | Liveblocks, PartyKit, Hocuspocus, Tiptap | automerge-repo, sync server |
| Persistence | Requires external (IndexedDB via y-indexeddb) | automerge-repo handles persistence |
| Best for | Editor-heavy applications, large document scale | General-purpose state sync, branching workflows |
Yjs has the stronger editor ecosystem: TipTap, ProseMirror, CodeMirror, Monaco, Quill, Slate — all have official Yjs bindings. If your use case involves rich text editing or code editing, Yjs is the clear choice. Automerge 2.x (released in early 2023) dramatically improved performance and introduced the automerge-repo abstraction for storage and networking. Its preserved change history enables Git-like branching workflows — useful for design tools or document approval workflows where you need to propose changes without immediately applying them.
WebSocket vs SSE vs WebTransport
| Protocol | Direction | Multiplexing | Infrastructure support | Best for |
|---|---|---|---|---|
| WebSocket | Full duplex | Single stream per connection | Wide — most proxies support it | Chat, collaborative editing, live cursors |
| Server-Sent Events (SSE) | Server → client only | Single stream | Works through HTTP/1.1 proxies | Live feeds, notifications, streaming AI responses |
| WebTransport | Full duplex | Multiple independent streams/datagrams | Limited — requires HTTP/3 (QUIC) | Gaming, video, low-latency streaming |
| Long polling | Request/response loop | One response per request | Universal | Fallback for restricted environments |
WebSocket is the pragmatic default for two-way real-time communication. The protocol is mature, well-supported across browsers, load balancers, and CDNs, and has robust client libraries. The limitation: after the HTTP upgrade, a WebSocket connection is an opaque TCP stream, and some enterprise proxies and firewalls drop connections they cannot inspect, especially unencrypted ones on port 80. Always serve WebSockets over TLS (wss://) on port 443; that configuration eliminates most proxy problems.
SSE is underrated for server-to-client use cases. It is HTTP, it survives proxy inspection, it reconnects automatically, and it is simpler to implement than WebSocket when you do not need client-to-server streaming. Modern AI streaming responses (OpenAI, Anthropic) use SSE because it works through all standard HTTP infrastructure without special proxy configuration.
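Part of SSE's robustness is how simple the wire format is. A sketch of the framing (the `token` event name and id are illustrative, not any particular API's):

```typescript
// The SSE wire format: optional `id:` and `event:` lines, one or more
// `data:` lines, terminated by a blank line. The whole stream is plain
// text over an ordinary HTTP response with Content-Type: text/event-stream,
// which is why it passes through standard proxies untouched.
function sseEvent(data: string, opts: { event?: string; id?: string } = {}): string {
  const lines: string[] = [];
  if (opts.id) lines.push(`id: ${opts.id}`);
  if (opts.event) lines.push(`event: ${opts.event}`);
  // Multi-line payloads become multiple data: lines, per the spec.
  for (const line of data.split("\n")) lines.push(`data: ${line}`);
  return lines.join("\n") + "\n\n";
}

sseEvent("hello", { event: "token", id: "42" });
// "id: 42\nevent: token\ndata: hello\n\n"
```

On the client, the built-in EventSource API parses these frames and handles automatic reconnection, resuming from the last received `id`.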
WebTransport is the emerging standard built on HTTP/3 (QUIC). It provides both reliable ordered streams (like WebSocket) and unreliable datagrams (like UDP) over the same connection. The use case is latency-critical applications where dropped packets are preferable to delayed packets — gaming, live video, real-time sensor data. Browser support is good as of Chrome 97+ and Firefox 114+, but server-side library maturity lags WebSocket significantly.
When OT is Better Than CRDTs
Operational Transformation (OT) preceded CRDTs and remains the implementation choice in some major collaborative editing systems (Google Docs uses an OT-based approach). OT transforms operations against concurrent operations to maintain document consistency. It requires a central server to act as the arbiter of operation ordering, a limitation that CRDTs eliminate by design. For fully peer-to-peer or offline-first collaboration, CRDTs are strictly better.
But for systems where a central server is acceptable and always available, OT has one advantage: the transformation functions are easier to implement correctly for complex rich text semantics than equivalent CRDT merge functions. The reason major collaborative editors built on OT before CRDTs matured and have not migrated is that migration means a full data format change.
If you are building new, use CRDTs. If you are maintaining an OT-based system that works, evaluate migration cost carefully against concrete benefits. For architectural guidance on when to introduce real-time sync into an existing system, see our analysis of event-driven patterns at scale.
“CRDTs do not eliminate the need for conflict resolution — they make conflict resolution deterministic. That is a significant engineering win, but it requires choosing the right CRDT type for your data model.”
Infrastructure Costs and Scaling
Real-time sync infrastructure costs scale with concurrent connections, message frequency, and state size. A collaborative document editor with 10 concurrent users generates modest traffic. A multiplayer game with 1,000 concurrent users generates substantial traffic. The infrastructure choice should match: Liveblocks and Ably handle the connection management and scaling for you (at $0.01-0.05 per monthly active user), while self-hosted Yjs on a WebSocket server gives you full control but requires you to manage horizontal scaling, connection routing, and state persistence.
For self-hosted deployments, the scaling bottleneck is usually the WebSocket server, not the CRDT logic. A single Node.js process can handle approximately 10,000-50,000 concurrent WebSocket connections depending on message frequency. Beyond that, you need horizontal scaling with sticky sessions (so all users editing the same document connect to the same server) or a pub/sub layer (Redis, NATS) to broadcast updates across server instances. The Hocuspocus server (built for Yjs) handles this architecture out of the box with Redis-based multi-server support.
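Document-sticky routing is simple to sketch. The server names are illustrative, and production setups typically use consistent hashing so adding a server relocates only a fraction of documents:

```typescript
// Route every client editing the same document to the same WebSocket
// server by hashing the document id to a server index.
function routeDocument(docId: string, servers: string[]): string {
  let hash = 0;
  for (const ch of docId) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0; // simple 32-bit rolling hash
  }
  return servers[hash % servers.length];
}

const servers = ["ws-1", "ws-2", "ws-3"];
// Deterministic: every client opening the same doc lands on the same server.
routeDocument("doc-abc", servers) === routeDocument("doc-abc", servers); // true
```

The alternative, a pub/sub layer, removes the stickiness requirement: any server accepts any connection and broadcasts updates to its peers through Redis or NATS, at the cost of an extra hop per message.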
Security Considerations for Real-Time Sync
Real-time sync introduces security challenges that do not exist in traditional request-response architectures. When users have persistent WebSocket connections, you need to handle: authentication token expiry during long-lived connections (a WebSocket can stay open for hours while the auth token expires after 15 minutes), authorisation changes during a session (a user's permissions are revoked while they are still connected and receiving updates), and data isolation between concurrent editing sessions (ensuring that users can only see and modify documents they have access to).
The standard pattern: authenticate during the WebSocket handshake using a short-lived token, then periodically verify the token during the connection lifetime (every 5-10 minutes via a heartbeat check). When the token fails verification, close the WebSocket with a 4001 custom close code that the client interprets as "re-authenticate." For authorisation, each broadcast message is filtered through the user's current permissions before being sent — this is the server-side equivalent of row-level security in a database.
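The broadcast-time filter can be sketched as a pure function. The types and the permission-check callback are illustrative, not a specific library's API; the point is that permissions are consulted per message, not once at connect time:

```typescript
// Every outgoing update is checked against the user's *current*
// permissions, so a mid-session revocation stops further updates even
// while the WebSocket stays open.
interface Update { docId: string; payload: Uint8Array }
type PermissionCheck = (userId: string, docId: string) => boolean;

function filterBroadcast(
  update: Update,
  subscribers: string[],
  canRead: PermissionCheck,
): string[] {
  // Only users who can read the document right now receive the update.
  return subscribers.filter(userId => canRead(userId, update.docId));
}

// A revoked user is silently dropped from the recipient list.
const canRead: PermissionCheck = (user) => user !== "revoked-user";
filterBroadcast(
  { docId: "doc-1", payload: new Uint8Array() },
  ["alice", "revoked-user"],
  canRead,
); // ["alice"]
```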
For applications handling sensitive data, end-to-end encryption of CRDT operations adds a layer of protection. The Yjs ecosystem supports this through custom encoding providers that encrypt operations before they leave the client and decrypt them on arrival. The server sees only encrypted payloads and cannot read the document content — it functions as a relay, not a processor. This architecture satisfies data residency and privacy requirements while maintaining real-time collaboration capability.