[HN Gopher] FoundationDB: A Distributed, Unbundled, Transactiona...
___________________________________________________________________
 
FoundationDB: A Distributed, Unbundled, Transactional Key Value
Store [pdf]
 
Author : wwilson
Score  : 130 points
Date   : 2021-06-07 16:37 UTC (6 hours ago)
 
web link (www.foundationdb.org)
w3m dump (www.foundationdb.org)
 
| jwr wrote:
| I just implemented a database with changefeeds using FoundationDB
| (in Clojure), to eventually replace RethinkDB in my system. Very
| impressed so far.
 
| jbverschoor wrote:
| It's unfortunate that they went silent for years after the Apple
| acquisition. That period was key for database adoption. I have
| the feeling everybody kind of settled for pgsql.
 
  | threeseed wrote:
  | > I have the feeling everybody kind of settled for pgsql.
  | 
  | That's probably because of spending time in this echo chamber.
  | 
  | In reality everyone has likely been staying with the same
  | databases they know and love but just moved to the cloud. It's
  | why now AWS for example offers such a wide variety of databases
  | e.g. MySQL, PostgreSQL, SQL Server, Oracle, MongoDB, Cassandra,
  | Redis.
 
  | eloff wrote:
  | Those are two completely non overlapping use cases. If you can
  | use pgsql for your problem, you have no business trying to use
  | a distributed key value store instead. That would be at least
  | as dumb as driving screws with a hammer.
 
    | cwp wrote:
    | Yeah, but there are quite a few efforts out there to extend
    | PG into a distributed DB of one flavor or another. Some
    | examples are YugabyteDB, CockroachDB, Aurora and Citus. It's
    | a reasonable approach, but it's also reasonable to come at it
    | from the other direction - build a SQL engine on top of a
    | solid distributed key-value store. Counterfactuals are always
    | dicey, but FDB vanishing behind the Apple wall of silence
    | sure didn't help.
 
| eyelovewe wrote:
| CouchDB 4 is built upon FoundationDB, FWIW
 
  | jbverschoor wrote:
  | Didn't know, very happy to hear
 
| rubyn00bie wrote:
| Here's one of my favorite articles on FoundationDB, where it
| (FDB) passes Jepsen first try:
| https://web.archive.org/web/20150312112556/http://blog.found...
| 
| > I ran FoundationDB Key-Value Store through every nemesis in
| Jepsen - including those that found failures in other databases -
| and FoundationDB passed all of them with flying colors.
| 
| FoundationDB is one of the coolest pieces of technology I've used
| in the past decade. The tuple keyspace is incredibly useful, so
| are the multi-key transactions. I've physically killed the power
| on an FDB node and FDB cluster; multiple times (heh, home
| servers)... and _every_ time the cluster or node just comes back.
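
To make the "tuple keyspace" point concrete, here is a minimal sketch of why ordered tuple keys are handy: everything under a prefix sits next to each other, so one range read fetches a whole "record". This is a toy in-memory store, not the FDB API; `OrderedKV` and `get_range` are hypothetical names, and it leans on Python's built-in tuple ordering rather than FDB's actual tuple encoding.

```python
import bisect


class OrderedKV:
    """Toy in-memory ordered key-value store (hypothetical, not the FDB API)."""

    def __init__(self):
        self._keys = []   # tuple keys, kept sorted
        self._vals = {}   # key -> value

    def set(self, key, value):
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def get_range(self, prefix):
        """Return all (key, value) pairs whose key starts with `prefix`."""
        # A prefix sorts just before every longer tuple that extends it.
        start = bisect.bisect_left(self._keys, prefix)
        out = []
        for key in self._keys[start:]:
            if key[:len(prefix)] != prefix:
                break
            out.append((key, self._vals[key]))
        return out


db = OrderedKV()
db.set(("users", "alice", "email"), "alice@example.com")
db.set(("users", "alice", "name"), "Alice")
db.set(("users", "bob", "name"), "Bob")
db.set(("orders", 17), "pending")

# One range read returns every field stored under ("users", "alice").
alice = db.get_range(("users", "alice"))
```

Note that plain Python tuples raise `TypeError` when two keys differ in type at the same position; a real tuple layer defines an ordering across types, which this sketch does not attempt.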
 
  | gregwebs wrote:
  | That's great that you are doing your own resiliency testing.
  | 
  | Having someone other than those officially on the Jepsen
  | project run the Jepsen test is a good start. However, many
  | databases have claimed to run the Jepsen tests themselves and
  | pass, but when there is an actual paid engagement for a
  | distributed database there are always issues that are found.
  | That's generally true even for unpaid official runs as well
  | although Zookeeper did pass existing tests. Every database is
  | different, and a paid engagement will write specific tests
  | designed to break the database in question.
 
  | kendallgclark wrote:
  | This was the FDB team's stock demo in the early days. It's a
  | killer move.
 
| jFriedensreich wrote:
| I am pretty sure that the new cloudant transaction/storage engine
| is also based on foundationDB, which powers a lot of things
| behind the scenes at ibm. And couchdb 4 with foundationDB storage
| engine is hopefully not too far out either. Let's see how long
| this whole transition takes, but i am still hopeful that the
| mindshare and motivation of apple, snowflake, ibm and apache
| community will lead to something great.
 
| jorangreef wrote:
| Markus Pilman from Snowflake did an awesome talk on
| FoundationDB's testing at CMU's Quarantine Tech Talks (2020), How
| I Learned to Stop Worrying and Trust the Database:
| 
| https://www.youtube.com/watch?v=OJb8A6h9jQQ
 
  | sgk284 wrote:
  | Here's another excellent talk at Strangeloop on FoundationDB's
  | simulation testing by Will Wilson in 2014:
  | https://www.youtube.com/watch?v=4fFDFbi3toc
 
| jtdev wrote:
| I'd love to see a good primer on data models and scenarios that
| are well suited to FDB.
 
  | selljamhere wrote:
  | Their docs might be a good place to start.
  | https://apple.github.io/foundationdb/developer-guide.html#da...
 
  | sigstoat wrote:
  | this is limited by your creativity and willingness to make
  | tradeoffs.
  | 
  | the only really general statement i can think of is that the
  | "larger"/"longer" your transactions are, the harder a time
  | you'll have getting it to cooperate with FDB. "small"/"fast"
  | transactions will be easier to fit into its model.
  | 
  | (to likely replies: this isn't an absolute, see all the quotes.
  | yes things like redwood will alleviate some of this, but not
  | all.)
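
One common way to live with the "small/fast transactions" constraint is to split a big job into many small commits. A minimal sketch, assuming a caller-supplied `commit_batch` function that writes one batch in one transaction; `batched`, `bulk_load`, and the batch size of 500 are hypothetical choices, not anything from FDB itself.

```python
from itertools import islice


def batched(items, size):
    """Yield lists of at most `size` items from any iterable."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def bulk_load(commit_batch, items, batch_size=500):
    """Write `items` as many small transactions; return how many were used."""
    count = 0
    for batch in batched(items, batch_size):
        commit_batch(batch)   # stand-in for "one small transaction"
        count += 1
    return count


# Demo against a plain dict standing in for the database.
store = {}
txns = bulk_load(store.update, ((i, i * i) for i in range(1200)), batch_size=500)
```

The trade-off is that the load as a whole is no longer atomic: a reader can see some batches and not others, so the layer above has to tolerate or hide partial progress.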
 
    | vvern wrote:
    | IIRC fdb is fully optimistic concurrency control. It doesn't
    | do any locking. If you have workloads which are highly
    | contended, you'll need to do something in the layer above to
    | coordinate. Otherwise, performance will be unbearable.
    | 
    | This may be out-dated, please let me know if the story has
    | evolved here.
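
For anyone unfamiliar with optimistic concurrency control, here is a rough sketch of the idea: no locks are taken, each transaction remembers what it read, and commit fails if any of those keys changed in the meantime. `ToyStore`, `ToyTxn`, `Conflict`, and `run` are hypothetical names; this is a single-process toy and not how FDB implements conflict detection.

```python
class Conflict(Exception):
    """Raised when a key this transaction read was changed before commit."""


class ToyStore:
    def __init__(self):
        self.data = {}         # key -> value
        self.last_write = {}   # key -> version of the last committed write
        self.version = 0

    def begin(self):
        return ToyTxn(self)


class ToyTxn:
    def __init__(self, store):
        self.store = store
        self.read_version = store.version
        self.reads = set()
        self.writes = {}       # buffered until commit

    def get(self, key):
        if key in self.writes:
            return self.writes[key]
        self.reads.add(key)
        return self.store.data.get(key)

    def set(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Conflict if any key we read was committed after we started.
        for key in self.reads:
            if self.store.last_write.get(key, 0) > self.read_version:
                raise Conflict(key)
        self.store.version += 1
        for key, value in self.writes.items():
            self.store.data[key] = value
            self.store.last_write[key] = self.store.version


def run(store, fn, max_retries=10):
    """Retry loop: rerun `fn` in a fresh transaction until it commits."""
    for _ in range(max_retries):
        txn = store.begin()
        try:
            result = fn(txn)
            txn.commit()
            return result
        except Conflict:
            continue
    raise RuntimeError("too much contention")


def increment(txn):
    txn.set("counter", (txn.get("counter") or 0) + 1)


store = ToyStore()
t1, t2 = store.begin(), store.begin()
increment(t1)
increment(t2)
t1.commit()                 # first committer wins
try:
    t2.commit()
    t2_conflicted = False
except Conflict:
    t2_conflicted = True    # t2 read "counter" before t1's write landed

run(store, increment)       # the retry succeeds on fresh data
```

Under heavy contention on one key, most attempts end in the retry branch, which is the performance cliff described above.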
 
| georgelyon wrote:
| FDB is an awesome and unique piece of software (I attribute quite
| a bit of Snowflake's success to FDB). I've also had the pleasure
| of meeting some folks from the original team and they are true
| engineers. Does anyone know if/when Redwood (the new storage
| engine) has landed / will land?
 
  | victor106 wrote:
  | > I attribute quite a bit of Snowflake's success to FDB
  | 
  | How so?
 
    | foobiekr wrote:
    | Snowflake is the biggest deployment of fdb in the world after
    | iCloud.
 
  | kendallgclark wrote:
  | Founders are building a distributed systems simulation product
  | now called Antithesis. My data fabric startup, Stardog, is a
  | happy Antithesis early adopter customer. It's helping us
  | reproduce and fix non-deterministic bugs deterministically.
  | Good stuff.
 
| twoodfin wrote:
| Did they ever implement a SQL layer? They seemed like one of the
| only NoSQL products with the architecture to make it plausible to
| do so.
 
| polskibus wrote:
| What is the backup / restore story in FoundationDB? How does it
| compare to postgresql?
 
  | ex3ndr wrote:
  | Much, much better. Single-line backup/restore, plus a Disaster
  | Recovery mode that syncs a second DC and is able to switch on
  | the fly with barely any configs (except one file).
 
| e12e wrote:
| This seems like a good place to ask - are there any new and
| exciting FOSS "applications" worth checking out? I recall from
| the initial publication of the source - there were references
| to a great sql layer? I don't know if a FOSS work-a-like ever
| materialized? Another thing I'd hoped for was a network
| filesystem/blob layer, like maybe s3/nfs/webdavfs compatible?
| What are people building on top of foundationdb today?
| 
| Ed: i suppose various document/db applications - like IMAP might
| be a good fit too?
 
  | jFriedensreich wrote:
  | large unstructured blobs and large files are among the things
  | not well suited to foundationdb and couchdb 4 actually reduced
  | supported blob size in the transition to foundationdb. it looks
  | like object/blob storage systems are, at the moment, separating
  | further from key/value and document storage rather than
  | growing together. but this is a good thing because the
  | tradeoffs are very different and it allows each system to focus
  | on what it does best. blob stores will hopefully move even more
  | to content addressing and merkle dag similar to git and ipfs.
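
For a sense of what storing blobs on a key-value store involves, here is a minimal sketch that splits a blob into fixed-size chunks under a content-addressed id (a SHA-256 of the bytes). The dict `kv`, the key layout `("blob", id, offset)`, and `CHUNK_SIZE` are all assumptions for illustration; the chunk size is just picked to sit well under FDB's documented 100,000-byte value limit.

```python
import hashlib

CHUNK_SIZE = 10_000  # assumed; chosen to stay well below the value size limit


def put_blob(kv, data):
    """Store `data` as chunks keyed by (content hash, offset); return the hash."""
    blob_id = hashlib.sha256(data).hexdigest()
    for offset in range(0, len(data), CHUNK_SIZE):
        kv[("blob", blob_id, offset)] = data[offset:offset + CHUNK_SIZE]
    return blob_id


def get_blob(kv, blob_id):
    """Reassemble a blob by reading its chunks in offset order."""
    chunks = sorted(
        (key, value) for key, value in kv.items()
        if key[:2] == ("blob", blob_id)
    )
    return b"".join(value for _, value in chunks)


kv = {}
data = bytes(range(256)) * 100   # 25,600 bytes
blob_id = put_blob(kv, data)
restored = get_blob(kv, blob_id)
```

Identical content always maps to the same id, so storing the same blob twice costs nothing extra. A large enough blob would also have to be written across several transactions, which is part of why dedicated blob stores make different tradeoffs.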
 
  | agency wrote:
  | I'm curious about this as well. Is anyone working on building
  | text search on top of FDB? It's kind of astounding to me that
  | last time I checked Elasticsearch was still essentially the
  | only game in town.
 
    | jFriedensreich wrote:
    | it's pretty hard to catch up with lucene, there is just so
    | much work, features and brainpower in there at this point. as
    | many features of foundationdb such as the transaction
    | guarantees and reliability are not super important for
    | fulltext search i cannot imagine any company, even apple or
    | ibm, being able to justify that gigantic investment. instead
    | i'm sure nearly any solution will continue to use lucene
    | under the hood for the foreseeable future.
 
  | sigstoat wrote:
  | peruse the fdb forum. they produce document and record layers
  | now. there are community layers of varying quality for a
  | network block device, a filesystem, and a few other things.
 
| AtlasBarfed wrote:
| They got acquihired by apple, didn't they? Was FDB ever OSS'd?
| 
| Is it CP or AP? Comments seem to imply AP
 
  | ssgao wrote:
  | FoundationDB is Apache 2.0
  | https://github.com/apple/foundationdb/blob/master/LICENSE
  | 
  | It is CP per
  | https://apple.github.io/foundationdb/cap-theorem.html
 
  | kendallgclark wrote:
  | It wasn't an acquihire. Apple paid a lot of $$ for FDB.
 
| ryanworl wrote:
| Two quotes from the paper that I think will motivate people to
| read it:
| 
| "Rigorous correctness testing via simulation makes FDB extremely
| reliable. In the past several years, CloudKit [59] has deployed
| FDB for more than 0.5M disk years without a single data
| corruption event. Additionally, we constantly perform data
| consistency checks by comparing replicas of data records and
| making sure they are the same. To this date, no inconsistent data
| replicas have ever been found in our production clusters."
| 
| "For example, early versions of FDB depended on Apache Zookeeper
| for coordination, which was deleted after real-world fault
| injection found two independent bugs in Zookeeper (circa 2010)
| and was replaced by a de novo Paxos implementation written in
| Flow. No production bugs have ever been reported since."
 
  | jeffbee wrote:
  | Ehhhh, doesn't align with my experience. I think FDB is
  | actually really poorly tested. When I was evaluating it for
  | replacement of the metadata key-value store at a major, public
  | web services company we found that injecting faults into
  | virtual NVMe devices on individual replicas would cause corrupt
  | results returned to clients. We also found that it would just
  | crash-loop on Linux systems with huge pages, because although
  | someone from the project had written a huge-page-aware C++
  | allocator "for performance", evidently nobody had ever actually
  | tried to use it, including the author.
  | 
  | It's also really, really weird that their non-scalable
  | architecture hits a brick wall at 25 machines. Ignoring the
  | correctness flaws, it only works if you can either design
  | around that limit by sharding, and never offer cross-shard
  | transactions, or if you can assure yourself that your use case
  | will never outgrow half a rack of equipment.
 
    | fnordpiglet wrote:
    | Can you pin that to a point in time? Software evolves, and I
    | think one point I saw is that it wasn't well tested at first,
    | and they changed once production workloads told them to.
 
    | bpicolo wrote:
    | What were the strong contenders?
 
    | rbranson wrote:
    | Were there other distributed databases that did pass the
    | fault injection testing?
 
      | jeffbee wrote:
      | There weren't any, which is why that particular shop
      | elected to roll their own distributed system on top of
      | rocks.
      | 
      | In general I think people who think they want to do
      | FoundationDB owe themselves a serious contemplation of the
      | cost/benefit of using Cloud Spanner instead. Obviously you
      | cannot do your own fault injection testing of Spanner, but
      | it does have end-to-end checksums.
 
        | sigstoat wrote:
        | > There weren't any, which is why that particular shop
        | elected to roll their own distributed system on top of
        | rocks.
        | 
        | that's nuts. rocks could've been added as a storage
        | engine to fdb far more easily.
 
        | ryanworl wrote:
        | This is in progress right now.
        | 
        | https://github.com/apple/foundationdb/blob/e7d7b39f12afa8ea2...
 
        | jeffbee wrote:
        | For the record, I said the same thing. But it's a
        | management problem because on the one hand you have a
        | known open project with demonstrable flaws, and on the
        | other you have your own in-house developers and you will
        | tend to discount the bugs they haven't written yet.
        | 
        | But, also for the same record, thinking you can implement
        | a reliable, globally-replicated key-value store on top of
        | FoundationDB that is cheaper and better than Cloud
        | Spanner may be evidence of the same cognitive bias.
 
        | sigstoat wrote:
        | > But, also for the same record, thinking you can
        | implement a reliable, globally-replicated key-value store
        | on top of FoundationDB that is cheaper and better than
        | Cloud Spanner may be evidence of the same cognitive bias.
        | 
        | man, good thing nobody made any claim like that.
 
  | sandinmyjoints wrote:
  | What is the Flow referred to here?
 
    | oconnor663 wrote:
    | It's an async/await framework for C++. I'm not sure what the
    | best source on this is, but here's a discussion:
    | https://forums.foundationdb.org/t/why-was-flow-developed/171...
    | 
    | My understanding is that FDB relies heavily on deterministic
    | simulations for testing, and that their async/await model is
    | a big part of how they make sure they cover different
    | possible interleavings in a deterministic way.
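
A rough sketch of the deterministic-simulation idea, assuming cooperative tasks and a scheduler whose only source of randomness is a seeded RNG: the same seed replays exactly the same interleaving, so a failing run can be reproduced on demand. `worker` and `simulate` are hypothetical names, and Python generators stand in for Flow's actors here; this is not Flow itself.

```python
import random


def worker(name, steps, log):
    """A cooperative task: do one step, then yield back to the scheduler."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield


def simulate(seed, n_workers=3, steps=3):
    """Run all workers to completion, picking the next one with a seeded RNG."""
    rng = random.Random(seed)   # the only source of nondeterminism
    log = []
    tasks = [worker(f"w{i}", steps, log) for i in range(n_workers)]
    while tasks:
        task = rng.choice(tasks)
        try:
            next(task)
        except StopIteration:
            tasks.remove(task)
    return log


first = simulate(seed=42)
replay = simulate(seed=42)   # identical interleaving, step for step
```

Sweeping over many seeds explores many different interleavings, and any seed that trips an assertion becomes a repeatable test case.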
 
  | jorangreef wrote:
  | Thanks for the quotes, I've been wanting to read this paper for
  | some time. Great to see they went through the consensus
  | literature and made a decision to go with Active Disk Paxos,
  | instead of stopping short and not fully understanding the
  | consensus they're building on. The consensus and replication
  | protocol is such a huge part of building a distributed
  | database.
 
  | fizwhiz wrote:
  | > de novo Paxos implementation written in Flow
  | 
  | That's... brave. Flow is a DSL built on top of C++?
 
    | alistairw wrote:
    | Yeah it's their own language on top of c++ to help them with
    | testing distributed systems with deterministic simulation.
    | 
    | Their talk from a while ago about it was something that
    | really blew me away at the time [0]
    | 
    | [0] https://www.youtube.com/watch?v=4fFDFbi3toc
 
| monstrado wrote:
| Have nothing but praise for FoundationDB. It has been by far the
| most rock solid distributed database I have ever had the pleasure
| of using. I used to manage HBase clusters, and the fact that I
| have never once had to worry about manually splitting "regions"
| is such a boon for administration...let alone JVM GC tuning.
| 
| We run several FDB clusters using 3-DC replication and have never
| once lost data. I remember when we wanted to replace all of the
| FDB hardware (one cluster) in AWS, and so we just doubled the
| cluster size, waited for data shuffling to calm down, and just
| started axing the original hardware. We did this all while
| performing over 100K production TPS.
| 
| One thing that makes the above seamless for all existing
| connections is that clients automatically update their "cluster
| file" in the event that new coordinators join or are reassigned.
| That alone is amazing...as you don't have to track down every
| single client and change / re-roll with new connection
| parameters.
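
For context, the cluster file is a one-line connection string of the form `description:ID@IP:PORT,IP:PORT,...`. Here is a small sketch of reading one, assuming plain (non-TLS) addresses; `parse_cluster_file` is a hypothetical helper, not part of any FDB client library, and the sample values are made up.

```python
def parse_cluster_file(text):
    """Parse 'description:ID@IP:PORT,IP:PORT,...' into its parts.

    Assumes plain addresses; a ':tls' suffix is not handled by this sketch.
    """
    # Take the first line that is neither blank nor a '#' comment.
    line = next(
        raw.strip() for raw in text.splitlines()
        if raw.strip() and not raw.strip().startswith("#")
    )
    ident, _, coords = line.partition("@")
    description, _, cluster_id = ident.partition(":")
    coordinators = []
    for addr in coords.split(","):
        host, _, port = addr.rpartition(":")
        coordinators.append((host, int(port)))
    return description, cluster_id, coordinators


sample = "# example only\nmydb:abc123@10.0.0.1:4500,10.0.0.2:4500\n"
description, cluster_id, coordinators = parse_cluster_file(sample)
```

Since the coordinator list lives in this one line, rewriting it on the client side is all it takes for existing clients to follow a coordinator change.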
| 
| Anyway, I talk this database up every chance I get. Keep up the
| awesome work.
| 
| - A very happy user.
 
___________________________________________________________________
(page generated 2021-06-07 23:00 UTC)