[HN Gopher] You might not need a CRDT
___________________________________________________________________
 
You might not need a CRDT
 
Author : paulgb
Score  : 264 points
Date   : 2022-12-05 13:57 UTC (9 hours ago)
 
web link (driftingin.space)
w3m dump (driftingin.space)
 
| dustingetz wrote:
| Remember, "conflict free" is the opposite of "transactional"
| (ACID). When you choose conflict-free, you're choosing to give up
| the ability to reject a change due to conflict and send it back
| to the user for resolution. This is the crux of the issue in my
| opinion.
 
  | JayStavis wrote:
  | Agreed - conflict resolution and authority handling are key to
  | advanced game networking techniques and underpin the
  | architecture of many big multiplayer games.
 
  | aboodman wrote:
  | I don't think this is true on both accounts:
  | 
  | * There is no such thing as "conflict free". When users edit
  | data disconnected from each other, they can conflict, because
  | the user intent of the edits can be different and not possible
  | to reconcile. Nothing can fix that, not even crdts (As a
  | thought experiment - two users are working on a report about
  | ferrets. While offline, one decides the report should be about
  | lemurs, while the other decides it should be about otters. The
  | resulting set of changes will not be mergable without conflicts
  | in any reasonable sense.)
  | 
  | * What CRDTs give you is actually _convergence_ which is very
  | important. But there are other ways to get convergence, and it
  | can be done transactionallym as long as your are willing to
  | weaken the D (durability). Replicache (my thing -
  | replicache.dev) is one way, the way described here is another.
  | 
  | In general game networking is also transactional in this same
  | sense. Operations are transactions, that can affect multiple
  | data items and enforce arbitrary invariants. But the _order_
  | they apply in drifts, and so durability is sacrificed.
  | 
  | Stepping back even further, CRDTs are also often transactional
  | in the same way. Each operation in an op-based crdt can be
  | thought of a transaction that applies completely or not at all.
  | It's just that the set of available operations in a CRDT is
  | typically fixed.
 
    | dustingetz wrote:
    | Replicache is cool
    | 
    | 1. So by "weaken durability" you mean basically allow
    | committed changes to be "lost" in the case of a common
    | conflict, aka give it up entirely?
    | 
    | 2. CRDTs are meant to converge even in the presence of
    | network partitions (offline first), so you're saying your
    | committed transactions can be uncommitted? Or you're saying
    | writers have no idea if their transaction is committed
    | because it might converge the other way after arbitrary
    | delay?
    | 
    | "Durability" means "transactions that have committed will
    | survive permanently" - wikipedia
 
      | aboodman wrote:
      | Thanks :)
      | 
      | > So by "weaken durability" you mean basically allow
      | committed changes to be "lost" in the case of a common
      | conflict, aka give it up entirely?
      | 
      | I should have said "weaken ordering".
      | 
      | I mean that the transaction will run, but its order with
      | respect to other concurrently running transactions may be
      | different in the end than the submitting client initially
      | thought. As an extreme case, you submitted the transaction
      | and your local client thought you got the concert tickets,
      | but by the time it hits the server, the last ticket was
      | already sold. Transaction still runs: it just runs first
      | optimistically on the client, and then later
      | authoritatively on the server with a potentially different
      | result. Clients converge to serve view of the world and
      | application invariants are maintained regardless (number of
      | sold tickets doesn't go negative).
      | 
      | Or just read Jepsen's take:
      | https://doc.replicache.dev/concepts/consistency
      | 
      | In the case of concert tickets, this design doesn't make a
      | lot of sense, but in lots of other cases -- in particular
      | document editing applications -- it makes tons of sense.
      | The chance of concurrent edits is low, but it is critical
      | to maintain application-defined invariants when conflicts
      | do occur.
      | 
      | > CRDTs are meant to converge even in the presence of
      | network partitions (offline first)
      | 
      | Sure. Replicache converges in the presence of network
      | partitions, as does OP's design, as do most games,
      | distributed databases, etc. What CRDTs uniquely provide is
      | the ability to converge *without a leader* -- in a p2p
      | environment. As OP observes, this is not a property that
      | most applications today need. Because they have an obvious
      | leader: the server.
      | 
      | > so you're saying your committed transactions can be
      | uncommitted? Or you're saying writers have no idea if their
      | transaction is committed because it might converge the
      | other way after arbitrary delay?
      | 
      | They run optimistically (speculatively) on the client, and
      | then later authoritatively on the server. Application can
      | render the optimistic result, and then later the
      | authoritative result when it is known.
      | 
      | This is exactly what CRDTs do. The only difference is that
      | CRDTs can do it without a leader, but the Replicache (and
      | OP's) design offers additional flexibility -- ability to
      | define your own operation/transactions.
      | 
      | ===
      | 
      | We need new terminology for distributed systems. Terms
      | invented in the 70s aren't going to make sense. I believe
      | that "transactions" in the common parlance mean "a set of
      | code that runs together or not at all, isolated from other
      | concurrently running transactions". This is entirely
      | possible to provide with convergence if we are willing to
      | weaken ordering.
 
        | dustingetz wrote:
        | i accept we need new terminology 100%
        | 
        | i accept we can weaken ordering, 100%
        | 
        | under the model you propose, how does a writer know if
        | their transaction has been committed (by the authority)
        | and what is the upper bound on how long they will need to
        | wait to know the _final_ result? and is there an
        | operation for the writer to wait for it?
 
        | aboodman wrote:
        | Each client sends a sequence of mutations to the server,
        | each identified by a mutationID. During downstream sync,
        | server says up to which mutation it has processed.
        | 
        | In Replicache, you can known when a mutation is finalized
        | with the `onSync` API, or more ergonomically, with the
        | upcoming `pendingMutation` API.
        | 
        | https://doc.replicache.dev/api/classes/Replicache#onsync 
        | https://doc.replicache.dev/api/classes/Replicache#experim
        | ent...
        | 
        | PS- there actually is terminology for this already.
        | Replicache is technically a Causal+ Consistent
        | transactional system:
        | https://jepsen.io/consistency/models/causal
 
| Sytten wrote:
| Unless I am wrong, if your server uses end-to-end encryption you
| are still better off with CRDT since the server cannot merge the
| state. So there is a caveat to be said about this claim that you
| only need CRDT for peer-to-peer scenarios.
 
  | paulgb wrote:
  | That's a good point, but the server can still enforce an
  | ordering of encrypted messages it can't read. You just don't
  | get the benefit of the server being able to detect invariant
  | violations in this case; all change operations would be sent to
  | all clients and the client state machine would throw away the
  | ones that created conflicts.
 
| dgb23 wrote:
| Quite a big takeaway here:
| 
| > A general theme of successful multiplayer approaches we've seen
| is not overcomplicating things. We've heard a number of companies
| confess that their multiplayer approach feels naive -- especially
| compared to the academic literature on the topic -- and yet it
| works just fine in practice.
| 
| The KISS and YAGNI principles apply.
| 
| And on another note, it seems like one big industry is often
| overlooked in these types of discussions. I wonder what web
| development could learn from online game development when it
| comes to "multiplayer" apps - I mean it's right there in the
| name.
| 
| One very pragmatic approach that strikes me as reasonable is
| having fixed delay changes. The game Warcraft 3 (a real time
| strategy game) did something like this. What it accomplishes is
| that changes are basically forced to be in sync by delaying them.
| 
| Note that you will get used to the delay very quickly, you won't
| notice it too much after a short while. It is _much_ more
| noticeable if the delays are variable, but a consistent delay
| gets basically filtered out after a short while.
 
  | paulgb wrote:
  | > I wonder what web development could learn from online game
  | development when it comes to "multiplayer" apps - I mean it's
  | right there in the name.
  | 
  | 100%. The architecture we used for https://plane.dev takes a
  | lot of inspiration from how game servers work, but using
  | HTTP/WebSocket instead of UDP.
  | 
  | A lot of state sharing techniques that people are now using on
  | the web (like cursor position smoothing and optimistic updates)
  | have decades of precedent in multiplayer games.
  | 
  | We've also seen more apps start to use an approach to pixel
  | streaming which more closely resembles stadia than traditional
  | VNC. https://digest.browsertech.com/archive/browsertech-digest-
  | wo...
 
    | JayStavis wrote:
    | The intersection of state replication and pixel streaming is
    | indeed interesting. Cloud games for example are usually using
    | game state sync techniques, and then add in the client<>gpu
    | latency on top of that.
    | 
    | A true cloud native game (sadly part of Stadia's original
    | ambitions) would hopefully have a more e2e state management
    | approach inclusive of the pixel latency.
    | 
    | This is very different from "local multiplayer" pixel
    | streaming, which may be what companies like Switchboard[1]
    | are doing: multiple clients control the same I/O channel
    | (e.g. you and I share a mouse on a single remote desktop)
    | 
    | [1] https://www.switchboard.app/
 
    | Joeri wrote:
    | _but using HTTP /WebSocket instead of UDP_
    | 
    | I think websockets run over UDP when hosted on http/3.
 
      | paulgb wrote:
      | They do, but with HTTP/3 there's also WebTransport, which
      | is even better because it means we will be able to avoid
      | head-of-line blocking.
 
  | munificent wrote:
  | There is definitely a lot we can learn from multiplayer games,
  | but the non-game use cases often have different constraints
  | that lead to different solutions.
  | 
  | With a game, it's a given that everyone is playing in real-
  | time, right now, with as little lag as they can get away with.
  | The synchronization happens constantly and ideally the game's
  | internal state is completely in sync across all players.
  | Players _want_ the game to be kept in sync as quickly as
  | possible so that they can react to what other players are
  | doing. That means the amount of synchronization to perform at
  | any one point in time is generally fairly small.
  | 
  | Also, the structure of the game itself often implicitly
  | partitions the game's mutable state among the players. Players
  | can obviously do things that affect each other and there are
  | mutable parts of the world that can be affected by multiple
  | players. But each player often controls their own avatar in the
  | game and has an implicit lock on that data.
  | 
  | This means that there is rarely any "merging" happening. It's
  | usually just look at the events, determine an order to apply
  | them, and do them one at a time. If the result is a little
  | weird in terms of state (maybe a wall is built right when
  | another player is crossing it and the game lets the teleport
  | through), well, it's a game. The programmers can just declare
  | by fiat that it's part of the game mechanics.
  | 
  | Outside of games, a common use case is "let me work on this
  | document offline for the duration of a flight and then resync
  | everything when I get to the airport". There may be hundreds of
  | changes and other users may also have huge batches of changes.
  | There can be a very complex reconciliation and megre process
  | because the changes are many and the data is precious. You
  | can't just say it's "part of the game" if your multi-user
  | spreadsheet goofs someone's tax statement.
 
  | smolder wrote:
  | I think the general term should be multi-actor. I understand
  | why people use the term multiplayer, I think: familiarity. But
  | players imply play which implies games, so it's not a great fit
  | outside of them.
  | 
  | The type of thing you're talking about with Warcraft is a
  | deterministic simulation. As long as you have deterministic
  | simulation, and you synchronize your inputs by buffering
  | sufficiently for the input latency of all the remote
  | participants, that ensures everything stays 1:1 between each
  | local simulation.
 
  | yazaddaruvala wrote:
  | Honestly, I just restarting playing World of Warcraft, and even
  | tho there have been amazing tech improvements to Blizzard's
  | engine and servers, there is so much simple stuff lacking.
  | 
  | I would honestly say it's likely better for game server
  | developers to be taking ideas from web development.
  | 
  | When it comes to CRDT's everyone keeps talking about text-
  | editing, honestly one of the hardest problems for a CRDT to
  | solve. Text is entirely unstructured, requires extremely high
  | precision on the merge of any conflict.
  | 
  | With a video game instead, players already used to "rewind and
  | replay" merges, they are already used to having a "miss" when
  | it should have been a "hit", and they roll with it because
  | while that is friction it's a game and precision isn't that
  | important. I'll say differently, precision is already not what
  | game servers provide.
  | 
  | At the end of the day, game servers are "just" low-latency chat
  | applications with "bots" that process the messages. I'd love to
  | see the game server written by the maintainers of WhatsApp /
  | Messenger / Slack / etc. That would be a truly scalable game!
  | 
  | P.S. If you think "chat is so simple, can't compare it to a
  | game server". Please just think through the design for FB's /
  | Discord's display widget that shows "currently active friends".
 
  | wtatum wrote:
  | This is an interesting idea but I do wonder if it can be
  | applied to generic collaborative editing "as is". The approach
  | for synchronized ticks (I think most RTS games do something
  | similar to the Blizzard approach) does require each participant
  | to "ack" the tick. You can see this when one participant is
  | lagging in that the simulation freezes for all players until
  | they acknowledge. The online gaming approach is to let other
  | players vote to kick someone who is lagging the game but that
  | probably wouldn't be considered suitable in something like
  | enterprise content management.
  | 
  | Can some variation of the approach be taken without the strict
  | requirement that: - There is a known set of participants at any
  | time - The list of participants has a reasonable well-bounded
  | size - Peers can communicate with each other directly (WebRTC
  | is available but there are sharp corners) - It's acceptable to
  | freeze all players based on one lagging player - Need an
  | unambiguous way to handle simultaneous player actions against
  | shared state
  | 
  | Most likely the "fix" for the friction is that you just slow
  | the ticks down _a lot_ compared to what you would do in a
  | multi-player game.
 
    | wongarsu wrote:
    | There are probably good techniques worth stealing from FPS or
    | MMORPGs. Those also usually run on a fixed tick rate of about
    | 50ms, but can't afford to have clients vote on it (FPS
    | because lag is unacceptable, MMORPGs because of user count).
    | There are various approaches in use, depending on whether you
    | want to just degrade the experience of the lagging
    | participant, or give preference based on action (e.g. FPS
    | resolve conflicts in favor of the person who fired).
 
| matlin wrote:
| There's actually an approach that sits in between this and formal
| CRDTs.
| 
| I had the same insight that having a total ordered log of changes
| solves a lot of the issues with concurrent updates. We solved
| this by creating one sequence CRDT that ensures all events from
| clients eventually end up in the same order and then our data
| structures are just reductions over those events. It's a little
| bit like a distributed, synchronized Redux store. This let us
| avoid having to think in terms of CRDT types (LWW registers,
| Grow-only sets, sequences, etc) and just think in terms of the
| domain-specific business logic (e.g. "what should happen when
| someone marks a todo as done, after it's deleted?").
| 
| There's a little but more to so if this is interesting to anyone,
| I can write a more detailed explanation.
 
  | lifeisstillgood wrote:
  | How do you solve the ordering? Timestamps can be wildly out of
  | whack? Be interested to hear more
 
    | matlin wrote:
    | We use logical timestamps instead of datetime.
    | 
    | So each "event" has three properties that allow us to resolve
    | to a sensible order across clients: session (integer),
    | device_id (uuid), order (integer). The `session` number is
    | set by the highest `order` that your device has seen from the
    | server and the `order` gets incremented on each new event.
    | 
    | So an example you might have a sequence of events like this.
    | [session] [order] 0 1 0 2 0 3 ---sync --- 3 4 3 5 3 6
    | 
    | We can then sort all events by (session, device_id, order)
    | and we'll get any events that happen on a device to be sorted
    | in a row even if some other device created a bunch of
    | concurrent events at the same time.
 
      | Karrot_Kream wrote:
      | What about situations where two different clients both
      | begin to publish new events from the same starting point?
      | Those events can't be absolutely ordered right? If you're
      | thinking of Lamport-style happens-before relations, then
      | you can't enforce total ordering of those events. Do you
      | just arbitrarily mark one of those client event streams as
      | failed and force the client to absorb new state and try
      | again?
 
  | LAC-Tech wrote:
  | Isn't that essentially doing a CRDT, but the replica is the
  | whole datastore?
  | 
  | Ties in nicely with event sourcing - the ordered sequence of
  | events is the CRDT, and sending events between nodes are the
  | delta states.
 
  | derefr wrote:
  | Have you considered backing your application's state-
  | synchronization with a centralized CQRS/ES event store (e.g.
  | https://www.eventstore.com/)? You get the same semantics,
  | without paying the overhead of building them on top of a CRDT
  | abstraction; and the event store itself also "knows" (in the
  | sense that it supplies the framework and you supply the logic)
  | how to reduce over states in order to build snapshot states
  | ("aggregates") for clients to download for fast initial
  | synchronization.
 
    | matlin wrote:
    | We basically have that as well! We use CRDTs on the client to
    | allow for optimistic updates and fast replication of events
    | directly to other clients but we also do exactly what you
    | describe on the server so that a user loading the data for
    | the first time just gets the "snapshot" and then plays events
    | on top of it.
 
| maclockard wrote:
| I actually talked to the drifting in space team about our
| experience implementing multiplayer. Really cool to see their
| findings!
| 
| Our team ended up having a similar take away, we wrote a blog
| post detailing our approach and experience here:
| https://hex.tech/blog/a-pragmatic-approach-to-live-collabora...
| 
| For the most part is been surprisingly solid! I was very much
| expecting to need to completely rewrite it by now, but its met
| most of our present needs.
| 
| The only real shortcoming is not being able to support line
| specific commenting. While we _could_ support it, any minor
| update to the text would result in the comment being marked
| 'stale' with a high false positive rate. I've considered adopting
| CRDT/OT just for the text editing portion to solve this.
 
| jawadch93 wrote:
 
| toxicFork wrote:
| If I want to have an app that needs to eventually persist user
| state to a server but still work well when the user is offline,
| e.g. a personal vocabulary app that can work with multiple
| devices but also can work when the user is offline, so the user
| can still define new words manually on both devices, and then
| when the phone is back online, the state still makes sense in
| both, should I use CRDT?
 
  | eternalban wrote:
  | A definitive no is the answer. This complexity is necessary
  | only if others care about your shared state aka vocabulary,
  | and, multiple write ops are being done on same records at the
  | same relative time on different devices. For example, a
  | document editor _could_ use CRDTs without looking silly. A CRDT
  | is an _eventually consistent_ distributed object but all its
  | complexity is there to handle concurrent modifications.
  | 
  | In your application, the consistency requirements are pretty
  | weak and easy to meet with a basic client/server architecture
  | with a well defined sync point set on its life-cycle. For
  | example, sync on boot, sync on net-reconnect, etc. Or, you
  | could use a simple P2P protocol so on boot you discover your
  | other connected devices and your apps can shake hand and
  | exchange notes to sync up as they modify their local state:
  | "Hey gang, + "KISS". "Keep it simple ..". Use json if you must.
  | 
  | Further simple characteristics of this app is that the state
  | ops on your dictionary are commutative (like a CRDT!). On one
  | device you add word "Foo" and on the other you add "Bar". You
  | do not care (do you?) that you added them in a certain order,
  | so you don't even have to bother with what is rather difficult
  | to do in a performant manner: maintain a total ordering over
  | the state of the system. (Think sharded ordered-sets..)
 
    | fwip wrote:
    | There's a few edge cases that you'd probably need to handle
    | (like adding the same word with two different definitions),
    | but probably few enough that you could do them ad-hoc and
    | punt to the user. ("Hey, merging these two sets failed,
    | here's the conflict, which do you want to keep?")
    | 
    | Something fancy like Operational Transforms or CRDT might
    | make your life easier if you find a library that's ergonomic
    | for your use case, but it's definitely not necessary.
 
  | jamil7 wrote:
  | The article talks about this, in the case where your clients
  | are syncing with a centralised server you don't necessarily
  | need to use CRDTs, and it might be simpler not to.
 
    | layer8 wrote:
    | This doesn't address how changes that were committed offline
    | are merged on the central server once the clients are online
    | again.
 
    | dinosaurdynasty wrote:
    | Basic CRDTs are still useful (idempotency, associativity, and
    | commutativity are powerful mental tools for all distributed
    | systems, even OT) but you can _massively_ simplify the CRDTs
    | if you assume a super-peer (aka centralized server all
    | updates go through).
    | 
    | Example: last-write-wins register, which is a (v, t) pair,
    | normally has a merge function of "return the pair with the
    | greatest t (with deterministic tie breaking for same t)", and
    | for a super peer just have v's and "the latest value to hit
    | the server wins".
 
| robertlagrant wrote:
| Good shoutout to Automerge in this.
 
| syrusakbary wrote:
| > CRDTs are designed for decentralized systems where there is no
| single central authority to decide what the final state should
| be. There is some unavoidable performance and memory overhead
| with doing this. Since Figma is centralized (our server is the
| central authority), we can simplify our system by removing this
| extra overhead and benefit from a faster and leaner
| implementation
| 
| I think this is on point. It's also super refreshing to see the
| work on Aper [1] [2] (a Rust library implementing state machine
| synchronization across a trusted network). Looking forward next
| series of articles here!
| 
| [1]: https://aper.dev/
| 
| [2]: https://github.com/drifting-in-space/aper
 
  | fwip wrote:
  | Just to clarify, Aper also uses a client/server model, correct?
  | It appears as though it uses a central server to totally order
  | incoming change requests.
 
    | paulgb wrote:
    | Yes, it does.
 
  | paulgb wrote:
  | Thanks, Syrus!
 
| eatonphil wrote:
| Great breakdown of CRDTs vs replicated state machines (like Raft
| or VSR) and the difference between eventually convergent and
| globally orderable workloads, thanks for this Paul!
 
| satvikpendem wrote:
| Does anyone have any resources on making an offline-first app
| from scratch (I don't want to use stuff like Firebase and more
| importantly want to learn how it's done)? Like a task manager for
| example that can sync with an online account but still works
| completely offline.
 
  | LAC-Tech wrote:
  | I swear I've looked into firebase multiple times and I have no
  | idea how they actually detect and handle conflicts. They just
  | hand-wave that it works offline... not that comforting.
  | 
  | Regardless, there's no one size fits all for offline.
  | Approaches that work for append only data don't work for data
  | where multiple nodes are doing destructive updates. And even if
  | you're doing that, some data is amenable to deterministic
  | merging (ie, CRDTs) and some you need a user to step in to
  | decide (revision based multi-master systems, prior art being
  | Pouch/Couch or indeed git).
  | 
  | Basically if you can tell me what exactly a taskmanager is and
  | how it might be updated I might be able to say something more
  | helfpul!
 
| lewisjoe wrote:
| This is the most underrated problem with CRDTs:
| 
| > Both paths will allow us to ensure that each replica converges
| on the same state, but that alone does not guarantee that this
| state is the "correct" state from the point of view of user
| intent.
| 
| In context of richtext editors, it's easy to converge a rich-text
| JSON into a single state for all collaborators. What's difficult
| is to ensure that the converged state is renderable as richtext.
| For example, is there a table cell that was inserted where a
| column was deleted?
| 
| I wrote a summary of this issue with CRDTs here -
| https://writer.zohopublic.com/writer/published/grcwy5c699d67...
| for those interested.
| 
| While I'm optimistic it's not impossible to solve, it's
| surprising why most CRDT literatures don't go into these details
| on correctness. Appreciate the author for putting this out.
| 
| PS: We currently use OT for collaboration in Zoho Writer
| (https://writer.zoho.com/). I'm looking out for practical CRDT
| ideas that works well with richtext.
 
  | truculent wrote:
  | To what extent is this an issue with CRDTs and to what extent
  | is it an issue with the data structure(s) used to represent the
  | state?
 
    | dkjaudyeqooe wrote:
    | I think the short answer is: if you want your CRDTs to
    | enforce application constraints you need to bake them into
    | the data structure.
 
      | truculent wrote:
      | That makes sense. Thank you.
 
  | [deleted]
 
  | jkaptur wrote:
  | The intuitively obvious approach here is to have some kind of
  | schema-guided or constraint-guided CRDT. That is, you get the
  | guarantee that not only do you end up with _valid JSON_ , you
  | get the guarantee that f(that_json) == true. I imagine this is
  | considerably easier said than done, but I'm curious if there's
  | research (or results!) in the area.
 
    | quickthrower2 wrote:
    | Parts of the tree with "don't merge children" might be ok,
    | then you show the user a conflict dialog when this happens.
    | 
    | Or, inspired by your comment: run f for the tree, if false,
    | search for the smallest subset where f is false and show the
    | conflict dialog for that. The user chooses "mine" or
    | "theirs".
 
  | kevsim wrote:
  | We also had the OT vs. CRDT discussion with regards to our
  | editor at Kitemaker (kitemaker.co), and also ended up with OT.
  | For us there were a number of factors, but (in addition to the
  | case you mentioned) it came down to the fact that (at that
  | time) most of the CRDT libs like automerge generate huge diffs
  | that we'd need to send over the wire and also that the open
  | source editor we use (SlateJS) had operations that just slotted
  | very nicely into OT.
  | 
  | Your editor feels really nice BTW. Well done!
 
  | expede wrote:
  | > I'm looking out for practical CRDT ideas that works well with
  | richtext.
  | 
  | Have you seen Peritext from Ink & Switch?
  | https://www.inkandswitch.com/peritext/ It's relatively new, but
  | is a CRDT aimed at rich text!
 
    | jamil7 wrote:
    | Their linked summary mentions Peritext and the current
    | limitations.
    | 
    | Agree it's exciting though and you can implement it fairly
    | easily yourself by following the blog post as it's so well
    | written and explained.
 
  | samwillis wrote:
  | Exactly, the "C" in CRDT is a little misleading. They are
  | "conflict free" in as much as they are able to merge and
  | resolve in the basic sense. That does not mean that the
  | resulting state is valid or meets your internal schema.
  | 
  | A good example of this is using Yjs with ProseMirror, it's
  | possible for two concurrent edits to resolve to a structure
  | that doesn't meet your ProseMirror schema. An example I have
  | used before is a 
element that can contain a single | optional
element. It's possible for two users to add | that caption to the figure concurrently, Yjs will happily merge | its XMLFragment structures and place two captions in the | figure. However when this is loaded into ProseMirror it will | see that the document does not match its schema and dump the | second caption. Fortunately the captions are ordered and so the | same state will re resolved by all parties. | | This is all ok if you are doing the merging on the front end, | but if you try to merge the documents on the server, not | involving ProseMirror, the structure you save, and potentially | index or save as html, will be incorrect. | | Point is, it's still essential with CRDTs to have a schema and | a validation/resolution process. That or you use completely | custom CRDTs that encodes into their resolution process the | validation of the schema. | lewisjoe wrote: | Your example is spot on :) | | I believe we shouldn't lock our richtext state to | Prosemirror's specs. The state should better be modelled as a | CRDT and ensuring correctness should be part of CRDT | operations. This way we unlock other possibilities like | server-side merging and the editing library can be swapped | (eg. lexical) | | All this sounds right but it's lot of work. | lijogdfljk wrote: | Yea, this is kinda why i don't understand a lot of these off | the shelf CRDT solutions. Like CRDTs for JSON. | | I'm still _trying_ to learn CRDTs; but early on in my | learning process i had the thought that CRDTs don 't have | mere data types - as you say, they have Data Type + Conflict | Resolution bundled into one. It's not enough to make a String | type, because different types need to be resolved | differently. Plus some types of resolution have a lot more | metadata, so you want to choose the best fitting one. | | I found that my goal to make SQL + CRDT meant i had to expose | this combo of Type + Resolution to the end user. It seems | essential to how the app works. | | But maybe i don't know anything, /shrug. I'm still learning | them. | toomim wrote: | Yes! We call this the "Merge Type" of data in the braid.org | group. | | Each datum as both a _data type_ and a _merge type_. The | programmer just needs to specify these two types, and then | the programming runtime can choose which synchronization | algorithm to use. | naasking wrote: | > Point is, it's still essential with CRDTs to have a schema | and a validation/resolution process. That or you use | completely custom CRDTs that encodes into their resolution | process the validation of the schema. | | I'm surprised that MS's concurrent revisions haven't taken | off because this is what they do by default: you specify a | custom merge function for any versioned data type so you can | resolve conflicts using any computable strategy. | CGamesPlay wrote: | I don't think I've seen any implementations of this, but the | table example you listed is solvable. The insight is to stop | thinking of the table as a bundle of markup and think about it | as itself a primitive CRDT type that contains text in the table | cells. | | The columns are a CRDT set of column IDs. The rows are a CRDT | map of row IDs to CRDT maps of column IDs to cell values, where | the cell values are again the same type as your document text. | Column headings are a specially formatted row. Rows are ordered | using a sequence type. | | Removing a row is a normal CRDT map deletion. Removing a column | is a normal CRDT map deletion from the columns map. Rendering | the table will ignore map entries that don't correspond to real | columns. | georgelyon wrote: | This comment really needs a "yo dawg" meme. | CGamesPlay wrote: | I mean, JSON is objects all the way down, CRDTs are | conflict-free maps all the way down. | inglor wrote: | So with your solution, the inserted table cell would | "disappear" from what is rendered? | | That's not necessarily (depends on the product) great or even | acceptable user experience. As a user I would certainly want | a way to know there is an "orphaned" "edited after delete" | cell to salvage the data inputted or understand the confusion | between editors. | jameshart wrote: | What you would 'certainly want' depends very much on the | application this is implemented in. The data isn't gone | from the CRDT table unless the program decides it is, so | many ui choices are possible. | georgelyon wrote: | The best UX resolution to this kind of issue I have seen is | to display "This table cell was added by Bob concurrently | to Alice removing its row. How would you like to proceed: | recover the row or ignore the change?" | drfuchs wrote: | Who gets the pop-up? If both Alice and Bob, and they | simultaneously give conflicting replies, you're right | back to where you started. | georgelyon wrote: | Yes, this is the correct behavior. If Alice and Bob | disagree about whether or not that row should be in the | table you aren't going to fix it with math. More often | than not, the expected outcome will be obvious (i.e. | spelling correction in a deleted cell can be ignored) and | if it isn't obvious, there will need to be human | interaction to determine the best path forward. | layer8 wrote: | This isn't really a problem. If Alice and Bob keep | disagreeing, then it is appropriate that a conflict | remains unresolved. | Dylan16807 wrote: | You're not back to where you started. You've gone from a | technical conflict to a direct disagreement about what is | supposed to happen. There's no technical solution to an | edit war. So maybe ask anew about the conflicting | answers, or pick one arbitrarily at that point. Or make | them play rock paper scissors. But it still helps to ask | _before_ it becomes a direct disagreement. | jlokier wrote: | Usually that doesn't happen because their replies don't | conflict within the time interval of a round trip between | the two of them, so one can see the other's action first, | and decide to agree. | | But if it does happen, either they get a new popup, | eventually, saying that their conflict-resolution | decisions also conflicted and what would they like to do, | or the system decays to a CRDT-style pre-determined | resolution rule for the second-order conflict -- with | similar problems to the original CRDT resolution rule we | tried to avoid. Things like "give priority to the person | who chose to keep information instead of deleting", "give | priority to the person with the biggest random number", | "merge them by keeping information along with a note in | the document about the attempt to delete the column", or | "delete the column and put the edited cell in a note". | But this time, only when the users conflict after the | first popup, so it doesn't occur as often. | | I think you're describing a metastability effect which | occurs in all distributed systems that don't use pre- | determined resolution rules like pure CRDTs. It happens | in network protocols, distributed transactions, and | consensus protocols in general. If the process you use to | resolve depends on input from multiple parties who don't | have a way to communicate and could choose conflicting | values, there's a non-zero probability of needing another | round or to escalate upwards to a higher-level system to | resolve. | | In many technical systems, _provided there 's a way to | communicate_ it's not difficult to make the probability | of repeated rounds or escalations tend arbitrarily close | to zero (but not actually zero). If there's a possibility | of network partition, though, the conflict can come back | later when communication returns; there's no way to | completely make it go away. | tomhallett wrote: | Stupid question - are there any good resources on User | Experience patterns/recipes for implementing OT/CRDTs? | | Context - a lot of the literature is about the backend | algorithm. But that still leaves individual teams | tripping over the frontend interaction design and | usability. | | Example of what I'm looking for - hex.tech has a video | tutorial of their "Cell locking" feature, where the cell | gets disabled and a "Mac Lockard has taken control of a | cell you were editing" toast notification appears. [1]. | This is fully baked and a recipe which if I put in a jira | ticket could actually get implemented. :) | | I would love to read a book which is a "Don't make me | think" and has an example with "This table cell was added | by Bob concurrently to Alice removing its row. How would | you like to proceed: recover the row or ignore the | change?". All the UI's in my head would look like | "Clippy" from MS Word - or "Pipey" from Silicon Valley | [2], haha. | | 1: https://hex.tech/blog/a-pragmatic-approach-to-live- | collabora... | | 2: https://www.youtube.com/watch?v=E2YcOV5C2x4 | toomim wrote: | This is close: https://decentpatterns.com/library/ | | ...but my general belief is that UI patterns need to | evolve as (1) specific UIs before they become (2) general | patterns, and we simply don't have enough good solutions | yet to make a pattern library out of them. | jdvh wrote: | For the table situation we solve this by have a generic concept | of a "card" that is rendered as a table cell when it's the | grandchild of a table (table > tr > td) but otherwise it's just | a standalone item, much like a markdown callout. | aboodman wrote: | This is a cool approach. It reminds me of statebox by mochimedia: | https://github.com/mochi/statebox. | | If I'm understanding correctly, it requires the mutations to be | deterministic in order for the nodes to converge. | | Replicache (replicache.dev - my thing) takes a similar approach | except it does _not_ requires the mutations to be deterministic, | which is very useful because it enables the mutations to reach | out to other services, be authenticated, etc. | | Both the idea here and Replicache's approach are closely related | to game networking. If you are interested in these ideas, a | really excellent set of content is: | https://www.gabrielgambetta.com/client-server-game-architect.... | | If you are interested in how Replicache's sync protocol works and | achieves transactional changes with server authority, it is | described here: https://doc.replicache.dev/concepts/how-it-works. | anonymousDan wrote: | Am I missing something? I'm all for SMR when possible but the | whole point of moving to something with weaker consistency is | that you don't want to rely on availability of a central server | (or quorum of servers in the case of SMR)? | LAC-Tech wrote: | First off, well done on examining your problem domain, realising | that your operations are not in-fact commutative, and realising | you need to discard your model. A lot of people would have just | decided it was 'agile' to 'iterate' over there fundamentally | broken solution until they had coded themselves in a corner, but | I digress. | | I would note that your path 2 is essentially another kind of CRDT | though. Instead of your replica being in individual document, | your replica is now the entire DB, and your changes are a grow | only sequence. (Wish I could remember how pointed this out to me | on HN, my mind was blown). | wim wrote: | For certain data structures the "pure" CRDT approach can get | tricky. In our case, we're building a collaborative notes/task | IDE [1] which is a hybrid of a text editor and an outliner at the | same time. So you can perform all the usual text editor | operations on a document as if it was text, but the underlying | data structure is a tree (or graph really). | | So just like the example in article we use a hybrid approach and | global order for certain (tree) operations to enforce invariants | on the data structure when syncing. That still works with | offline-mode and end-to-end encryption, but as we don't have the | requirement to get rid of a central server we can get away with | this model. | | [1] https://thymer.com | rubyist5eva wrote: | "You might not (probably need) need $FANCY_TRENDY_THING" | acrefoot wrote: | Whatever Clickup uses for their Clickup docs, it reliably gets | into a confused state and data is lost, during which not all | editors show the same state locally. | hardwaregeek wrote: | I really loved the Signals and Thread episode[0] on state machine | replication. It basically said "you could do all of these fancy | distributed systems techniques that assume a non total ordering | of events, ooooorr you could have a main server that determines a | total ordering for events and make your life infinitely easier" | | [0]: https://signalsandthreads.com/state-machine-replication- | and-... | Joeri wrote: | It breaks down as soon as you have an offline scenario. You can | only assume the server is authorative if there will not be | network partitions or if writes are not accepted on the other | side of the network partition (meaning on the client). As soon | as the client accepts writes while offline those writes become | authorative and will need to be merged with whatever changes | the server saw. | layer8 wrote: | It also breaks down when you have a passive/non-transactional | server, like an app syncing its state via files on Dropbox. | derefr wrote: | Or even just have a simple central presence server hand out | randomized-per-epoch node-IDs; have all nodes in the system use | node-IDs as the prefix of the event temporal ordering key per | "tick"; and trust your nodes to self-report their node-ID | correctly in all events they send. Then your data protocol can | still be p2p rather than all events having to flow through the | server (if that benefits you in your problem domain) while | getting all the same advantages of total ordering. | | And this works great for anything enterprise-y, where all the | nodes are operated by one party or a trusted consortium of | parties, and no nodes are are intentionally trying to cheat by | misreporting their assigned node-IDs. You only _need_ central | servers (or virtual central servers, a.k.a. blockchains) when | you get to things like MMO games, or HFT trading bots talking | to an exchange, where every client has a reason to want to | pretend it 's the one that "should" go earliest in the per-tick | transaction ordering. | lifeisstillgood wrote: | But if I have a conflict (ie one node changes a column | datatype, another adds data to column) how do i know which | one went first _inside the same tick?_ | | If I understand you, there will be a tick between 12.01 and | 12.02, and node A gets token 1234, and node B gets 5678. | Events are published p2p, and the ordering is 5678_+9seconds | and 1234_+12seconds - node A (1234) admits it has done the | action 3 seconds after B. But this still depends on | synchronised clocks I think? | | Also you still need to resolve each type of conflict (hence | having CRDTs really helps) - we know there is a conflict | between A and B changes, but how do we _resolve_ them? | | Maybe I dont get it :-) | meheleventyone wrote: | Yeah using an event stream like this requires synchronising | the clock of participants. It's similar to rollback | networking in games. Peers can recover from receiving late | updates but if all the updates are late from one peer the | situation still degrades. To avoid that participants run | time per tick slightly shorter or longer to keep within a | desired window. | pookha wrote: | I thought James Long's 200 or three hundred lines of code to | implement a real-world CRDT solution (Actual) was pretty cool. | Seemed to solve a lot of issues revolving around heavy duty | servers. | [deleted] | mathgladiator wrote: | Another reason to not need a CRDT is privacy. A CRDT assumes that | every participate has equal rights to all data. | | I'm biased to simple WebSockets with OT using JSON deltas, and | I'm building a platform that greatly simplifies how to | efficiently build collaborative apps: https://www.adama- | platform.com/ | CGamesPlay wrote: | These are orthogonal to one another. For read-protection, | encryption is a viable solution: specific paths into your data | structure (e.g. certain keys in a map) can be encrypted with a | key that isn't provided to all participants. For write- | protection, the solution is the same as with any syncing | service: when receiving an update, if you don't authorize the | sender of the update, you don't accept it. | samwillis wrote: | > A CRDT assumes that every participate has equal rights to all | data. | | Within a single data structure yes, so within a "room" or | "document". If you need to partition right to the data | (read/update) then you should use separate structures for each | area. | | Yjs implements this as as "sub documents". | charles_f wrote: | That sounds like a weak counter argument to me, 1) crdt is | focusing on the conflict resolution part, access control is not | in scope so you need to implement on top. 2) if you are already | implementing access control, then filtering out what people | don't have access to doesn't seem much more complexity. If you | go the full client-side way, you can also add a cryptographic | layer to whatever needs hiding 3) one thing I don't see in your | solution is the offline aspect to it. With a central authority | in the middle and online connectivity, conflicts become | unlikely and much smaller, but with offline support, documents | et can evolve in drastic ways where having a formalized | strategy like what crdt offers makes sense | pookha wrote: | CRDT's can be encrypted end-to-end. Privacy seems to be one of | the selling points. | pattrn wrote: | Is this really the case? While CRDT's are designed to work | peer-to-peer, they don't need to be fully connected to all | clients. Forcing the synchronization through controlled nodes | (a server or cluster) allows adding read/write permissions. | Depending on the use case, it may require additional logic for | reversing operations before propagating to other clients, or in | some cases forcing a client to revert to a snapshot (this can | be a bit complex). That's an approach I've used in the past. | | Have I overlooked something (highly likely)? | dxchester wrote: | We have used Automerge a bunch, but found that there is a | threshold where beyond a given document size, performance gets | exponentially worse, until even trivial updates take many | seconds' worth of CPU. That is often how it works when the | document end state is exclusively the sum of all the edits that | have ever happened. | | Our answer was to reimplement the Automerge API with different | mechanics underneath that allows for a "snapshots + recent | changes" paradigm, instead of "the doc is the sum of all | changes". That way performance doesn't have to degrade over time | as changes accumulate. | | Project is here: https://github.com/frameable/pigeon, with some | benchmarks: https://github.com/frameable/pigeon/wiki/Benchmarks | in the wiki... | supermatt wrote: | > Conflict-free replicated datasets (CRDTs) | | Conflict-free Replicated Data Types, surely? | paulgb wrote: | Oof, in the first sentence, too, that's embarrassing. Fixed, | thanks! | supermatt wrote: | No problem :) | nobu-mori wrote: | CRDTs have been on HN a lot recently. I'm working on a database | that deals in events rather than raw data. Application developers | specify event handlers in JavaScript. The database layer then | takes the event handlers and distributes them to the clients. | Clients receive the raw stream of events and reconcile the final | data state independently. The key aspect is all the event | handlers are reversible. This allows clients to insert locally | generated events immediately. If any remote events are received | out-of-order, the client can undo events, insert the new events, | and reapply everything on top of the new state. | | I'm curious how many people have a need for solving the real-time | multi-player use case. The database I'm working on was inspired | by the networking code used in fighting games (rollback netcode), | but I'm always curious to learn about additional use cases. | no_wizard wrote: | I think the most common outside of gaming is collaboration | software. Think whiteboards, docs (e.g. Google Docs) and other | collaborative tools. ___________________________________________________________________ (page generated 2022-12-05 23:00 UTC)