[HN Gopher] You might not need a CRDT
___________________________________________________________________
You might not need a CRDT
Author : paulgb
Score : 264 points
Date : 2022-12-05 13:57 UTC (9 hours ago)
| dustingetz wrote:
| Remember, "conflict free" is the opposite of "transactional"
| (ACID). When you choose conflict-free, you're choosing to give up
| the ability to reject a change due to conflict and send it back
| to the user for resolution. This is the crux of the issue in my
| opinion.
| JayStavis wrote:
| Agreed - conflict resolution and authority handling are key to
| advanced game networking techniques and underpin the
| architecture of many big multiplayer games.
| aboodman wrote:
| I don't think this is true on both accounts:
|
| * There is no such thing as "conflict free". When users edit
| data disconnected from each other, they can conflict, because
| the user intent of the edits can be different and not possible
| to reconcile. Nothing can fix that, not even crdts (As a
| thought experiment - two users are working on a report about
| ferrets. While offline, one decides the report should be about
| lemurs, while the other decides it should be about otters. The
| resulting set of changes will not be mergable without conflicts
| in any reasonable sense.)
|
| * What CRDTs give you is actually _convergence_ which is very
| important. But there are other ways to get convergence, and it
| can be done transactionallym as long as your are willing to
| weaken the D (durability). Replicache (my thing -
| replicache.dev) is one way, the way described here is another.
|
| In general game networking is also transactional in this same
| sense. Operations are transactions, that can affect multiple
| data items and enforce arbitrary invariants. But the _order_
| they apply in drifts, and so durability is sacrificed.
|
| Stepping back even further, CRDTs are also often transactional
| in the same way. Each operation in an op-based crdt can be
| thought of a transaction that applies completely or not at all.
| It's just that the set of available operations in a CRDT is
| typically fixed.
| dustingetz wrote:
| Replicache is cool
|
| 1. So by "weaken durability" you mean basically allow
| committed changes to be "lost" in the case of a common
| conflict, aka give it up entirely?
|
| 2. CRDTs are meant to converge even in the presence of
| network partitions (offline first), so you're saying your
| committed transactions can be uncommitted? Or you're saying
| writers have no idea if their transaction is committed
| because it might converge the other way after arbitrary
| delay?
|
| "Durability" means "transactions that have committed will
| survive permanently" - wikipedia
| aboodman wrote:
| Thanks :)
|
| > So by "weaken durability" you mean basically allow
| committed changes to be "lost" in the case of a common
| conflict, aka give it up entirely?
|
| I should have said "weaken ordering".
|
| I mean that the transaction will run, but its order with
| respect to other concurrently running transactions may be
| different in the end than the submitting client initially
| thought. As an extreme case, you submitted the transaction
| and your local client thought you got the concert tickets,
| but by the time it hits the server, the last ticket was
| already sold. Transaction still runs: it just runs first
| optimistically on the client, and then later
| authoritatively on the server with a potentially different
| result. Clients converge to serve view of the world and
| application invariants are maintained regardless (number of
| sold tickets doesn't go negative).
|
| Or just read Jepsen's take:
| https://doc.replicache.dev/concepts/consistency
|
| In the case of concert tickets, this design doesn't make a
| lot of sense, but in lots of other cases -- in particular
| document editing applications -- it makes tons of sense.
| The chance of concurrent edits is low, but it is critical
| to maintain application-defined invariants when conflicts
| do occur.
|
| > CRDTs are meant to converge even in the presence of
| network partitions (offline first)
|
| Sure. Replicache converges in the presence of network
| partitions, as does OP's design, as do most games,
| distributed databases, etc. What CRDTs uniquely provide is
| the ability to converge *without a leader* -- in a p2p
| environment. As OP observes, this is not a property that
| most applications today need. Because they have an obvious
| leader: the server.
|
| > so you're saying your committed transactions can be
| uncommitted? Or you're saying writers have no idea if their
| transaction is committed because it might converge the
| other way after arbitrary delay?
|
| They run optimistically (speculatively) on the client, and
| then later authoritatively on the server. Application can
| render the optimistic result, and then later the
| authoritative result when it is known.
|
| This is exactly what CRDTs do. The only difference is that
| CRDTs can do it without a leader, but the Replicache (and
| OP's) design offers additional flexibility -- ability to
| define your own operation/transactions.
|
| ===
|
| We need new terminology for distributed systems. Terms
| invented in the 70s aren't going to make sense. I believe
| that "transactions" in the common parlance mean "a set of
| code that runs together or not at all, isolated from other
| concurrently running transactions". This is entirely
| possible to provide with convergence if we are willing to
| weaken ordering.
| dustingetz wrote:
| i accept we need new terminology 100%
|
| i accept we can weaken ordering, 100%
|
| under the model you propose, how does a writer know if
| their transaction has been committed (by the authority)
| and what is the upper bound on how long they will need to
| wait to know the _final_ result? and is there an
| operation for the writer to wait for it?
| aboodman wrote:
| Each client sends a sequence of mutations to the server,
| each identified by a mutationID. During downstream sync,
| server says up to which mutation it has processed.
|
| In Replicache, you can known when a mutation is finalized
| with the `onSync` API, or more ergonomically, with the
| upcoming `pendingMutation` API.
|
| https://doc.replicache.dev/api/classes/Replicache#onsync
| https://doc.replicache.dev/api/classes/Replicache#experim
| ent...
|
| PS- there actually is terminology for this already.
| Replicache is technically a Causal+ Consistent
| transactional system:
| https://jepsen.io/consistency/models/causal
| Sytten wrote:
| Unless I am wrong, if your server uses end-to-end encryption you
| are still better off with CRDT since the server cannot merge the
| state. So there is a caveat to be said about this claim that you
| only need CRDT for peer-to-peer scenarios.
| paulgb wrote:
| That's a good point, but the server can still enforce an
| ordering of encrypted messages it can't read. You just don't
| get the benefit of the server being able to detect invariant
| violations in this case; all change operations would be sent to
| all clients and the client state machine would throw away the
| ones that created conflicts.
| dgb23 wrote:
| Quite a big takeaway here:
|
| > A general theme of successful multiplayer approaches we've seen
| is not overcomplicating things. We've heard a number of companies
| confess that their multiplayer approach feels naive -- especially
| compared to the academic literature on the topic -- and yet it
| works just fine in practice.
|
| The KISS and YAGNI principles apply.
|
| And on another note, it seems like one big industry is often
| overlooked in these types of discussions. I wonder what web
| development could learn from online game development when it
| comes to "multiplayer" apps - I mean it's right there in the
| name.
|
| One very pragmatic approach that strikes me as reasonable is
| having fixed delay changes. The game Warcraft 3 (a real time
| strategy game) did something like this. What it accomplishes is
| that changes are basically forced to be in sync by delaying them.
|
| Note that you will get used to the delay very quickly, you won't
| notice it too much after a short while. It is _much_ more
| noticeable if the delays are variable, but a consistent delay
| gets basically filtered out after a short while.
| paulgb wrote:
| > I wonder what web development could learn from online game
| development when it comes to "multiplayer" apps - I mean it's
| right there in the name.
|
| 100%. The architecture we used for https://plane.dev takes a
| lot of inspiration from how game servers work, but using
| HTTP/WebSocket instead of UDP.
|
| A lot of state sharing techniques that people are now using on
| the web (like cursor position smoothing and optimistic updates)
| have decades of precedent in multiplayer games.
|
| We've also seen more apps start to use an approach to pixel
| streaming which more closely resembles stadia than traditional
| VNC. https://digest.browsertech.com/archive/browsertech-digest-
| wo...
| JayStavis wrote:
| The intersection of state replication and pixel streaming is
| indeed interesting. Cloud games for example are usually using
| game state sync techniques, and then add in the client<>gpu
| latency on top of that.
|
| A true cloud native game (sadly part of Stadia's original
| ambitions) would hopefully have a more e2e state management
| approach inclusive of the pixel latency.
|
| This is very different from "local multiplayer" pixel
| streaming, which may be what companies like Switchboard[1]
| are doing: multiple clients control the same I/O channel
| (e.g. you and I share a mouse on a single remote desktop)
|
| [1] https://www.switchboard.app/
| Joeri wrote:
| _but using HTTP /WebSocket instead of UDP_
|
| I think websockets run over UDP when hosted on http/3.
| paulgb wrote:
| They do, but with HTTP/3 there's also WebTransport, which
| is even better because it means we will be able to avoid
| head-of-line blocking.
| munificent wrote:
| There is definitely a lot we can learn from multiplayer games,
| but the non-game use cases often have different constraints
| that lead to different solutions.
|
| With a game, it's a given that everyone is playing in real-
| time, right now, with as little lag as they can get away with.
| The synchronization happens constantly and ideally the game's
| internal state is completely in sync across all players.
| Players _want_ the game to be kept in sync as quickly as
| possible so that they can react to what other players are
| doing. That means the amount of synchronization to perform at
| any one point in time is generally fairly small.
|
| Also, the structure of the game itself often implicitly
| partitions the game's mutable state among the players. Players
| can obviously do things that affect each other and there are
| mutable parts of the world that can be affected by multiple
| players. But each player often controls their own avatar in the
| game and has an implicit lock on that data.
|
| This means that there is rarely any "merging" happening. It's
| usually just look at the events, determine an order to apply
| them, and do them one at a time. If the result is a little
| weird in terms of state (maybe a wall is built right when
| another player is crossing it and the game lets the teleport
| through), well, it's a game. The programmers can just declare
| by fiat that it's part of the game mechanics.
|
| Outside of games, a common use case is "let me work on this
| document offline for the duration of a flight and then resync
| everything when I get to the airport". There may be hundreds of
| changes and other users may also have huge batches of changes.
| There can be a very complex reconciliation and megre process
| because the changes are many and the data is precious. You
| can't just say it's "part of the game" if your multi-user
| spreadsheet goofs someone's tax statement.
| smolder wrote:
| I think the general term should be multi-actor. I understand
| why people use the term multiplayer, I think: familiarity. But
| players imply play which implies games, so it's not a great fit
| outside of them.
|
| The type of thing you're talking about with Warcraft is a
| deterministic simulation. As long as you have deterministic
| simulation, and you synchronize your inputs by buffering
| sufficiently for the input latency of all the remote
| participants, that ensures everything stays 1:1 between each
| local simulation.
| yazaddaruvala wrote:
| Honestly, I just restarting playing World of Warcraft, and even
| tho there have been amazing tech improvements to Blizzard's
| engine and servers, there is so much simple stuff lacking.
|
| I would honestly say it's likely better for game server
| developers to be taking ideas from web development.
|
| When it comes to CRDT's everyone keeps talking about text-
| editing, honestly one of the hardest problems for a CRDT to
| solve. Text is entirely unstructured, requires extremely high
| precision on the merge of any conflict.
|
| With a video game instead, players already used to "rewind and
| replay" merges, they are already used to having a "miss" when
| it should have been a "hit", and they roll with it because
| while that is friction it's a game and precision isn't that
| important. I'll say differently, precision is already not what
| game servers provide.
|
| At the end of the day, game servers are "just" low-latency chat
| applications with "bots" that process the messages. I'd love to
| see the game server written by the maintainers of WhatsApp /
| Messenger / Slack / etc. That would be a truly scalable game!
|
| P.S. If you think "chat is so simple, can't compare it to a
| game server". Please just think through the design for FB's /
| Discord's display widget that shows "currently active friends".
| wtatum wrote:
| This is an interesting idea but I do wonder if it can be
| applied to generic collaborative editing "as is". The approach
| for synchronized ticks (I think most RTS games do something
| similar to the Blizzard approach) does require each participant
| to "ack" the tick. You can see this when one participant is
| lagging in that the simulation freezes for all players until
| they acknowledge. The online gaming approach is to let other
| players vote to kick someone who is lagging the game but that
| probably wouldn't be considered suitable in something like
| enterprise content management.
|
| Can some variation of the approach be taken without the strict
| requirement that: - There is a known set of participants at any
| time - The list of participants has a reasonable well-bounded
| size - Peers can communicate with each other directly (WebRTC
| is available but there are sharp corners) - It's acceptable to
| freeze all players based on one lagging player - Need an
| unambiguous way to handle simultaneous player actions against
| shared state
|
| Most likely the "fix" for the friction is that you just slow
| the ticks down _a lot_ compared to what you would do in a
| multi-player game.
| wongarsu wrote:
| There are probably good techniques worth stealing from FPS or
| MMORPGs. Those also usually run on a fixed tick rate of about
| 50ms, but can't afford to have clients vote on it (FPS
| because lag is unacceptable, MMORPGs because of user count).
| There are various approaches in use, depending on whether you
| want to just degrade the experience of the lagging
| participant, or give preference based on action (e.g. FPS
| resolve conflicts in favor of the person who fired).
| matlin wrote:
| There's actually an approach that sits in between this and formal
| CRDTs.
|
| I had the same insight that having a total ordered log of changes
| solves a lot of the issues with concurrent updates. We solved
| this by creating one sequence CRDT that ensures all events from
| clients eventually end up in the same order and then our data
| structures are just reductions over those events. It's a little
| bit like a distributed, synchronized Redux store. This let us
| avoid having to think in terms of CRDT types (LWW registers,
| Grow-only sets, sequences, etc) and just think in terms of the
| domain-specific business logic (e.g. "what should happen when
| someone marks a todo as done, after it's deleted?").
|
| There's a little but more to so if this is interesting to anyone,
| I can write a more detailed explanation.
| lifeisstillgood wrote:
| How do you solve the ordering? Timestamps can be wildly out of
| whack? Be interested to hear more
| matlin wrote:
| We use logical timestamps instead of datetime.
|
| So each "event" has three properties that allow us to resolve
| to a sensible order across clients: session (integer),
| device_id (uuid), order (integer). The `session` number is
| set by the highest `order` that your device has seen from the
| server and the `order` gets incremented on each new event.
|
| So an example you might have a sequence of events like this.
| [session] [order] 0 1 0 2 0 3 ---sync --- 3 4 3 5 3 6
|
| We can then sort all events by (session, device_id, order)
| and we'll get any events that happen on a device to be sorted
| in a row even if some other device created a bunch of
| concurrent events at the same time.
| Karrot_Kream wrote:
| What about situations where two different clients both
| begin to publish new events from the same starting point?
| Those events can't be absolutely ordered right? If you're
| thinking of Lamport-style happens-before relations, then
| you can't enforce total ordering of those events. Do you
| just arbitrarily mark one of those client event streams as
| failed and force the client to absorb new state and try
| again?
| LAC-Tech wrote:
| Isn't that essentially doing a CRDT, but the replica is the
| whole datastore?
|
| Ties in nicely with event sourcing - the ordered sequence of
| events is the CRDT, and sending events between nodes are the
| delta states.
| derefr wrote:
| Have you considered backing your application's state-
| synchronization with a centralized CQRS/ES event store (e.g.
| https://www.eventstore.com/)? You get the same semantics,
| without paying the overhead of building them on top of a CRDT
| abstraction; and the event store itself also "knows" (in the
| sense that it supplies the framework and you supply the logic)
| how to reduce over states in order to build snapshot states
| ("aggregates") for clients to download for fast initial
| synchronization.
| matlin wrote:
| We basically have that as well! We use CRDTs on the client to
| allow for optimistic updates and fast replication of events
| directly to other clients but we also do exactly what you
| describe on the server so that a user loading the data for
| the first time just gets the "snapshot" and then plays events
| on top of it.
| maclockard wrote:
| I actually talked to the drifting in space team about our
| experience implementing multiplayer. Really cool to see their
| findings!
|
| Our team ended up having a similar take away, we wrote a blog
| post detailing our approach and experience here:
| https://hex.tech/blog/a-pragmatic-approach-to-live-collabora...
|
| For the most part is been surprisingly solid! I was very much
| expecting to need to completely rewrite it by now, but its met
| most of our present needs.
|
| The only real shortcoming is not being able to support line
| specific commenting. While we _could_ support it, any minor
| update to the text would result in the comment being marked
| 'stale' with a high false positive rate. I've considered adopting
| CRDT/OT just for the text editing portion to solve this.
| jawadch93 wrote:
| toxicFork wrote:
| If I want to have an app that needs to eventually persist user
| state to a server but still work well when the user is offline,
| e.g. a personal vocabulary app that can work with multiple
| devices but also can work when the user is offline, so the user
| can still define new words manually on both devices, and then
| when the phone is back online, the state still makes sense in
| both, should I use CRDT?
| eternalban wrote:
| A definitive no is the answer. This complexity is necessary
| only if others care about your shared state aka vocabulary,
| and, multiple write ops are being done on same records at the
| same relative time on different devices. For example, a
| document editor _could_ use CRDTs without looking silly. A CRDT
| is an _eventually consistent_ distributed object but all its
| complexity is there to handle concurrent modifications.
|
| In your application, the consistency requirements are pretty
| weak and easy to meet with a basic client/server architecture
| with a well defined sync point set on its life-cycle. For
| example, sync on boot, sync on net-reconnect, etc. Or, you
| could use a simple P2P protocol so on boot you discover your
| other connected devices and your apps can shake hand and
| exchange notes to sync up as they modify their local state:
| "Hey gang, + "KISS". "Keep it simple ..". Use json if you must.
|
| Further simple characteristics of this app is that the state
| ops on your dictionary are commutative (like a CRDT!). On one
| device you add word "Foo" and on the other you add "Bar". You
| do not care (do you?) that you added them in a certain order,
| so you don't even have to bother with what is rather difficult
| to do in a performant manner: maintain a total ordering over
| the state of the system. (Think sharded ordered-sets..)
| fwip wrote:
| There's a few edge cases that you'd probably need to handle
| (like adding the same word with two different definitions),
| but probably few enough that you could do them ad-hoc and
| punt to the user. ("Hey, merging these two sets failed,
| here's the conflict, which do you want to keep?")
|
| Something fancy like Operational Transforms or CRDT might
| make your life easier if you find a library that's ergonomic
| for your use case, but it's definitely not necessary.
| jamil7 wrote:
| The article talks about this, in the case where your clients
| are syncing with a centralised server you don't necessarily
| need to use CRDTs, and it might be simpler not to.
| layer8 wrote:
| This doesn't address how changes that were committed offline
| are merged on the central server once the clients are online
| again.
| dinosaurdynasty wrote:
| Basic CRDTs are still useful (idempotency, associativity, and
| commutativity are powerful mental tools for all distributed
| systems, even OT) but you can _massively_ simplify the CRDTs
| if you assume a super-peer (aka centralized server all
| updates go through).
|
| Example: last-write-wins register, which is a (v, t) pair,
| normally has a merge function of "return the pair with the
| greatest t (with deterministic tie breaking for same t)", and
| for a super peer just have v's and "the latest value to hit
| the server wins".
| robertlagrant wrote:
| Good shoutout to Automerge in this.
| syrusakbary wrote:
| > CRDTs are designed for decentralized systems where there is no
| single central authority to decide what the final state should
| be. There is some unavoidable performance and memory overhead
| with doing this. Since Figma is centralized (our server is the
| central authority), we can simplify our system by removing this
| extra overhead and benefit from a faster and leaner
| implementation
|
| I think this is on point. It's also super refreshing to see the
| work on Aper [1] [2] (a Rust library implementing state machine
| synchronization across a trusted network). Looking forward next
| series of articles here!
|
| [1]: https://aper.dev/
|
| [2]: https://github.com/drifting-in-space/aper
| fwip wrote:
| Just to clarify, Aper also uses a client/server model, correct?
| It appears as though it uses a central server to totally order
| incoming change requests.
| paulgb wrote:
| Yes, it does.
| paulgb wrote:
| Thanks, Syrus!
| eatonphil wrote:
| Great breakdown of CRDTs vs replicated state machines (like Raft
| or VSR) and the difference between eventually convergent and
| globally orderable workloads, thanks for this Paul!
| satvikpendem wrote:
| Does anyone have any resources on making an offline-first app
| from scratch (I don't want to use stuff like Firebase and more
| importantly want to learn how it's done)? Like a task manager for
| example that can sync with an online account but still works
| completely offline.
| LAC-Tech wrote:
| I swear I've looked into firebase multiple times and I have no
| idea how they actually detect and handle conflicts. They just
| hand-wave that it works offline... not that comforting.
|
| Regardless, there's no one size fits all for offline.
| Approaches that work for append only data don't work for data
| where multiple nodes are doing destructive updates. And even if
| you're doing that, some data is amenable to deterministic
| merging (ie, CRDTs) and some you need a user to step in to
| decide (revision based multi-master systems, prior art being
| Pouch/Couch or indeed git).
|
| Basically if you can tell me what exactly a taskmanager is and
| how it might be updated I might be able to say something more
| helfpul!
| lewisjoe wrote:
| This is the most underrated problem with CRDTs:
|
| > Both paths will allow us to ensure that each replica converges
| on the same state, but that alone does not guarantee that this
| state is the "correct" state from the point of view of user
| intent.
|
| In context of richtext editors, it's easy to converge a rich-text
| JSON into a single state for all collaborators. What's difficult
| is to ensure that the converged state is renderable as richtext.
| For example, is there a table cell that was inserted where a
| column was deleted?
|
| I wrote a summary of this issue with CRDTs here -
| https://writer.zohopublic.com/writer/published/grcwy5c699d67...
| for those interested.
|
| While I'm optimistic it's not impossible to solve, it's
| surprising why most CRDT literatures don't go into these details
| on correctness. Appreciate the author for putting this out.
|
| PS: We currently use OT for collaboration in Zoho Writer
| (https://writer.zoho.com/). I'm looking out for practical CRDT
| ideas that works well with richtext.
| truculent wrote:
| To what extent is this an issue with CRDTs and to what extent
| is it an issue with the data structure(s) used to represent the
| state?
| dkjaudyeqooe wrote:
| I think the short answer is: if you want your CRDTs to
| enforce application constraints you need to bake them into
| the data structure.
| truculent wrote:
| That makes sense. Thank you.
| [deleted]
| jkaptur wrote:
| The intuitively obvious approach here is to have some kind of
| schema-guided or constraint-guided CRDT. That is, you get the
| guarantee that not only do you end up with _valid JSON_ , you
| get the guarantee that f(that_json) == true. I imagine this is
| considerably easier said than done, but I'm curious if there's
| research (or results!) in the area.
| quickthrower2 wrote:
| Parts of the tree with "don't merge children" might be ok,
| then you show the user a conflict dialog when this happens.
|
| Or, inspired by your comment: run f for the tree, if false,
| search for the smallest subset where f is false and show the
| conflict dialog for that. The user chooses "mine" or
| "theirs".
| kevsim wrote:
| We also had the OT vs. CRDT discussion with regards to our
| editor at Kitemaker (kitemaker.co), and also ended up with OT.
| For us there were a number of factors, but (in addition to the
| case you mentioned) it came down to the fact that (at that
| time) most of the CRDT libs like automerge generate huge diffs
| that we'd need to send over the wire and also that the open
| source editor we use (SlateJS) had operations that just slotted
| very nicely into OT.
|
| Your editor feels really nice BTW. Well done!
| expede wrote:
| > I'm looking out for practical CRDT ideas that works well with
| richtext.
|
| Have you seen Peritext from Ink & Switch?
| https://www.inkandswitch.com/peritext/ It's relatively new, but
| is a CRDT aimed at rich text!
| jamil7 wrote:
| Their linked summary mentions Peritext and the current
| limitations.
|
| Agree it's exciting though and you can implement it fairly
| easily yourself by following the blog post as it's so well
| written and explained.
| samwillis wrote:
| Exactly, the "C" in CRDT is a little misleading. They are
| "conflict free" in as much as they are able to merge and
| resolve in the basic sense. That does not mean that the
| resulting state is valid or meets your internal schema.
|
| A good example of this is using Yjs with ProseMirror, it's
| possible for two concurrent edits to resolve to a structure
| that doesn't meet your ProseMirror schema. An example I have
| used before is a