
[HN Gopher] FoundationDB: A Distributed, Unbundled, Transactional Key Value Store [pdf]
___________________________________________________________________
Author: wwilson
Score: 130 points
Date: 2021-06-07 16:37 UTC (6 hours ago)
	web link (www.foundationdb.org)
| jwr wrote:
| I just implemented a database with changefeeds using FoundationDB (in Clojure), to eventually replace RethinkDB in my system. Very impressed so far.
|
| jbverschoor wrote:
| It's unfortunate that they went silent for years after the Apple acquisition. That period was key for database adoption. I have the feeling everybody kind of settled for pgsql.
|
| threeseed wrote:
| > I have the feeling everybody kind of settled for pgsql.
|
| That's probably because of spending time on this echo chamber.
|
| In reality everyone has likely been staying with the same databases they know and love but just moved to the cloud. It's why now AWS for example offers such a wide variety of databases e.g. MySQL, PostgreSQL, SQL Server, Oracle, MongoDB, Cassandra, Redis.
|
| eloff wrote:
| Those are two completely non overlapping use cases. If you can use pgsql for your problem, you have no business trying to use a distributed key value store instead. That would be at least as dumb as driving screws with a hammer.
|
| cwp wrote:
| Yeah, but there are quite a few efforts out there to extend PG into a distributed DB of one flavor or another. Some examples are YugabyteDB, CockroachDB, Aurora and Citus. It's a reasonable approach, but it's also reasonable to come at it from the other direction - build a SQL engine on top of a solid distributed key-value store. Contrafactuals are always dicey, but FDB vanishing behind the Apple wall of silence sure didn't help.
|
| eyelovewe wrote:
| CouchDB 4 is built upon Foundation FWIW
|
| jbverschoor wrote:
| Didn't know, very happy to hear
|
| rubyn00bie wrote:
| Here's one of my favorite articles on FoundationDB, where it (FDB) passes Jepsen first try: https://web.archive.org/web/20150312112556/http://blog.found...
|
| > I ran FoundationDB Key-Value Store through every nemesis in Jepsen - including those that found failures in other databases - and FoundationDB passed all of them with flying colors.
|
| FoundationDB is one of the coolest pieces of technology I've used in the past decade. The tuple keyspace is incredibly useful, so are the multi-key transactions. I've physically killed the power on an FDB node and FDB cluster; multiple times (heh, home servers)... and _every_ time the cluster or node just comes back.
|
| gregwebs wrote:
| That's great that you are doing your own resiliency testing.
|
| Having someone other than those officially on the Jepsen project run the Jepsen test is a good start. However, many databases have claimed to run the Jepsen tests themselves and pass, but when there is an actual paid engagement for a distributed database there are always issues that are found. That's generally true even for unpaid official runs as well although Zookeeper did pass existing tests. Every database is different and the paid engagement will design specific tests designed to break the database in question.
|
| kendallgclark wrote:
| This was the FDB team's stock demo in the early days. It's a killer move.
|
| jFriedensreich wrote:
| I am pretty sure that the new cloudant transaction/storage engine is also based on foundationDB, which powers a lot of things behind the scenes at ibm. And couchdb 4 with foundationDB storage engine is hopefully not too far out either. Let's see how long this whole transition takes, but i am still hopeful that the mindshare and motivation of apple, snowflake, ibm and apache community will lead to something great.
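A note on the "tuple keyspace" rubyn00bie mentions above: the toy model below is only a sketch of the idea, not FoundationDB's tuple layer or its client API. It assumes nothing more than that keys are tuples kept in sorted order, so all keys sharing a prefix sit next to each other and can be read with one range scan. `TupleStore`, `set`, and `range_prefix` are hypothetical names made up for this example.

```python
import bisect


class TupleStore:
    """Toy in-memory ordered store keyed by tuples (hypothetical, not FDB)."""

    def __init__(self):
        self._keys = []    # tuple keys, always kept in sorted order
        self._values = {}  # key -> value

    def set(self, key, value):
        # Insert new keys at their sorted position; overwrite existing ones.
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def range_prefix(self, prefix):
        """Return (key, value) pairs whose key starts with `prefix`, in order."""
        # A prefix sorts before every longer tuple that extends it,
        # so bisect_left finds the first candidate key.
        start = bisect.bisect_left(self._keys, prefix)
        result = []
        for key in self._keys[start:]:
            if key[:len(prefix)] != prefix:
                break  # keys are sorted, so the matching run has ended
            result.append((key, self._values[key]))
        return result


store = TupleStore()
store.set(("user", 1, "name"), "Ada")
store.set(("user", 1, "email"), "ada@example.com")
store.set(("user", 2, "name"), "Grace")
store.set(("order", 7, "total"), 42)

# Everything stored under user 1, fetched with a single prefix scan.
print(store.range_prefix(("user", 1)))
```

The design point is that the data model lives in the key layout: "all fields of user 1" is a contiguous key range rather than a separate index. This toy requires elements at the same tuple position to be mutually comparable (all strings or all integers).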
|
| jorangreef wrote:
| Markus Pilman from Snowflake did an awesome talk on FoundationDB's testing at CMU's Quarantine Tech Talks (2020), How I Learned to Stop Worrying and Trust the Database:
|
| https://www.youtube.com/watch?v=OJb8A6h9jQQ
|
| sgk284 wrote:
| Here's another excellent talk at Strange Loop on FoundationDB's simulation testing by Will Wilson in 2014: https://www.youtube.com/watch?v=4fFDFbi3toc
|
| jtdev wrote:
| I'd love to see a good primer on data models and scenarios that are well suited to FDB.
|
| selljamhere wrote:
| Their docs might be a good place to start. https://apple.github.io/foundationdb/developer-guide.html#da...
|
| sigstoat wrote:
| this is limited by your creativity and willingness to make tradeoffs.
|
| the only really general statement i can think of is that the "larger"/"longer" your transactions are, the harder a time you'll have getting it to cooperate with FDB. "small"/"fast" transactions will be easier to fit into its model.
|
| (to likely replies: this isn't an absolute, see all the quotes. yes things like redwood will alleviate some of this, but not all.)
|
| vvern wrote:
| IIRC fdb is fully optimistic concurrency control. It doesn't do any locking. If you have workloads which are highly contended, you'll need to do something in the layer above to coordinate. Otherwise, performance will be unbearable.
|
| This may be outdated, please let me know if the story has evolved here.
|
| georgelyon wrote:
| FDB is an awesome and unique piece of software (I attribute quite a bit of Snowflake's success to FDB). I've also had the pleasure of meeting some folks from the original team and they are true engineers. Does anyone know if/when Redwood (the new storage engine) has landed / will land?
|
| victor106 wrote:
| > I attribute quite a bit of Snowflake's success to FDB
|
| How so?
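To make vvern's point about optimistic concurrency control (earlier in this thread) concrete, here is a minimal single-process sketch. It is not FDB's implementation or API, only an illustration of the general technique under simple assumptions: a transaction takes no locks, remembers which keys it read and the store version it started at, and is rejected at commit if any of those keys was written by a later commit. `ToyStore`, `Transaction`, `ConflictError`, and `run_transaction` are hypothetical names.

```python
class ConflictError(Exception):
    """Raised at commit when a key this transaction read has since changed."""


class ToyStore:
    def __init__(self):
        self.data = {}        # key -> committed value
        self.last_write = {}  # key -> version of the commit that last wrote it
        self.version = 0      # increases by one on every successful commit

    def begin(self):
        return Transaction(self)


class Transaction:
    def __init__(self, store):
        self.store = store
        self.read_version = store.version  # store version when we started
        self.reads = set()                 # keys read from the store
        self.writes = {}                   # buffered writes, applied at commit

    def get(self, key, default=None):
        if key in self.writes:             # read your own buffered write
            return self.writes[key]
        self.reads.add(key)
        # Simplification: reads see current data, not a true snapshot.
        return self.store.data.get(key, default)

    def set(self, key, value):
        self.writes[key] = value           # nothing is locked or visible yet

    def commit(self):
        # Validate: fail if any key we read was written after we started.
        for key in self.reads:
            if self.store.last_write.get(key, 0) > self.read_version:
                raise ConflictError(key)
        self.store.version += 1
        for key, value in self.writes.items():
            self.store.data[key] = value
            self.store.last_write[key] = self.store.version


def run_transaction(store, body, max_retries=10):
    """Run body(txn) and commit, retrying from scratch on conflict."""
    for _ in range(max_retries):
        txn = store.begin()
        result = body(txn)
        try:
            txn.commit()
            return result
        except ConflictError:
            continue
    raise RuntimeError("gave up after repeated conflicts")
```

In this model, two transactions that both read and then write the same key cannot both commit: the second one fails validation and has to rerun. When many clients hammer one key, most attempts are wasted on retries, which is the contention problem vvern suggests handling in a layer above.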
|
| foobiekr wrote:
| Snowflake is the biggest deployment of fdb in the world after iCloud.
|
| kendallgclark wrote:
| Founders are building a distributed systems simulation product now called Antithesis. My data fabric startup, Stardog, is a happy Antithesis early adopter customer. It's helping us reproduce and fix non-deterministic bugs deterministically. Good stuff.
|
| twoodfin wrote:
| Did they ever implement a SQL layer? They seemed like one of the only NoSQL products with the architecture to make it plausible to do so.
|
| polskibus wrote:
| What is the backup / restore story in FoundationDB? How does it compare to postgresql?
|
| ex3ndr wrote:
| Much much better. Single line backup/restore and Disaster Recovery mode that syncs second DC and able to switch on the fly with barely any configs (except one file).
|
| e12e wrote:
| This seems like a good place to ask - are there any new and exciting FOSS "application" worth checking out? I recall from the initial publication of the source - there were references to a great sql layer? I don't know if a FOSS work-a-like ever materialized? Other things I'd hoped for was a network filesystem/blob layer, like maybe s3/nfs/webdavfs compatible? What are people building on top of foundationdb today?
|
| Ed: i suppose various document/db applications - like IMAP might be a good fit too?
|
| jFriedensreich wrote:
| large unstructured blobs and large files are among the things not well suited to foundationdb and couchdb 4 actually reduced supported blob size in the transition to foundationdb. it looks like object/blob storage systems are at the moment rather separating more from key/value and document storage than growing together. but this is a good thing because the tradeoffs are very different and it allows each system to focus on what it does best.
| blob stores will hopefully move even more to content addressing and merkle dag similar to git and ipfs.
|
| agency wrote:
| I'm curious about this as well. Is anyone working on building text search on top of FDB? It's kind of astounding to me that last time I checked Elasticsearch was still essentially the only game in town.
|
| jFriedensreich wrote:
| it's pretty hard to catch up with lucene, there is just so much work, features and brainpower in there at this point. as many features of foundationdb such as the transaction guarantees and reliability are not super important for fulltext search i cannot imagine any company even apple or ibm being able to justify that gigantic investment, instead im sure nearly any solution will continue to use lucene under the hood for the foreseeable future.
|
| sigstoat wrote:
| peruse the fdb forum. they produce document and record layers now. there are community layers of varying quality for a network block device, a filesystem, and a few other things.
|
| AtlasBarfed wrote:
| They got acquihired by apple, didn't they? Was Fdb ever oss'd?
|
| Is it CP or AP? Comments seem to imply AP
|
| ssgao wrote:
| FoundationDB is Apache 2.0 https://github.com/apple/foundationdb/blob/master/LICENSE
|
| It is CP per https://apple.github.io/foundationdb/cap-theorem.html
|
| kendallgclark wrote:
| It wasn't an acquihire. Apple paid a lot of $$ for FDB.
|
| [deleted]
|
| ryanworl wrote:
| Two quotes from the paper that I think will motivate people to read it:
|
| "Rigorous correctness testing via simulation makes FDB extremely reliable. In the past several years, CloudKit [59] has deployed FDB for more than 0.5M disk years without a single data corruption event. Additionally, we constantly perform data consistency checks by comparing replicas of data records and making sure they are the same.
| To this date, no inconsistent data replicas have ever been found in our production clusters."
|
| "For example, early versions of FDB depended on Apache Zookeeper for coordination, which was deleted after real-world fault injection found two independent bugs in Zookeeper (circa 2010) and was replaced by a de novo Paxos implementation written in Flow. No production bugs have ever been reported since."
|
| jeffbee wrote:
| Ehhhh, doesn't align with my experience. I think FDB is actually really poorly tested. When I was evaluating it for replacement of the metadata key-value store at a major, public web services company we found that injecting faults into virtual NVMe devices on individual replicas would cause corrupt results returned to clients. We also found that it would just crash-loop on Linux systems with huge pages, because although someone from the project had written a huge-page-aware C++ allocator "for performance", evidently nobody had ever actually tried to use it, including the author.
|
| It's also really, really weird that their non-scalable architecture hits a brick wall at 25 machines. Ignoring the correctness flaws, it only works if you can either design around that limit by sharding, and never offer cross-shard transactions, or if you can assure yourself that your use case will never outgrow half a rack of equipment.
|
| fnordpiglet wrote:
| Can you fix a point in time? Software evolves and I think a point I saw is that it wasn't well tested then they changed once production workloads told them it needs to change.
|
| bpicolo wrote:
| What were the strong contenders?
|
| rbranson wrote:
| Were there other distributed databases that did pass the fault injection testing?
|
| jeffbee wrote:
| There weren't any, which is why that particular shop elected to roll their own distributed system on top of rocks.
|
| In general I think people who think they want to do FoundationDB owe themselves a serious contemplation of the cost/benefit of using Cloud Spanner instead. Obviously you cannot do your own fault injection testing of Spanner, but it does have end-to-end checksums.
|
| sigstoat wrote:
| > There weren't any, which is why that particular shop elected to roll their own distributed system on top of rocks.
|
| that's nuts. rocks could've been added as a storage engine to fdb far more easily.
|
| ryanworl wrote:
| This is currently in progress right now.
|
| https://github.com/apple/foundationdb/blob/e7d7b39f12afa8ea2...
|
| jeffbee wrote:
| For the record, I said the same thing. But it's a management problem because on the one hand you have a known open project with demonstrable flaws, and on the other you have your own in-house developers and you will tend to discount the bugs they haven't written yet.
|
| But, also for the same record, thinking you can implement a reliable, globally-replicated key-value store on top of FoundationDB that is cheaper and better than Cloud Spanner may be evidence of the same cognitive bias.
|
| sigstoat wrote:
| > But, also for the same record, thinking you can implement a reliable, globally-replicated key-value store on top of FoundationDB that is cheaper and better than Cloud Spanner may be evidence of the same cognitive bias.
|
| man, good thing nobody made any claim like that.
|
| sandinmyjoints wrote:
| What is the Flow referred to here?
|
| oconnor663 wrote:
| It's an async/await framework for C++. I'm not sure what the best source on this is, but here's a discussion: https://forums.foundationdb.org/t/why-was-flow-developed/171...
|
| My understanding is that FDB relies heavily on deterministic simulations for testing, and that their async/await model is a big part of how they make sure they cover different possible interleavings in a deterministic way.
|
| jorangreef wrote:
| Thanks for the quotes, I've been wanting to read this paper for some time. Great to see they went through the consensus literature and made a decision to go with Active Disk Paxos, instead of stopping short and not fully understanding the consensus they're building on. The consensus and replication protocol is such a huge part of building a distributed database.
|
| fizwhiz wrote:
| > de novo Paxos implementation written in Flow
|
| That's... brave. Flow is a DSL built on top of C++?
|
| alistairw wrote:
| Yeah it's their own language on top of c++ to help them with testing distributed systems with deterministic simulation.
|
| Their talk from a while ago about it was something that really blew me away at the time [0]
|
| [0] https://www.youtube.com/watch?v=4fFDFbi3toc
|
| monstrado wrote:
| Have nothing but praise for FoundationDB. It has been by far the most rock solid distributed database I have ever had the pleasure of using. I used to manage HBase clusters, and the fact that I have never once had to worry about manually splitting "regions" is such a boon for administration...let alone JVM GC tuning.
|
| We run several FDB clusters using 3-DC replication and have never once lost data. I remember when we wanted to replace all of the FDB hardware (one cluster) in AWS, and so we just doubled the cluster size, waited for data shuffling to calm down, and just started axing the original hardware. We did this all while performing over 100K production TPS.
|
| One thing that makes the above seamless for all existing connections is that clients automatically update their "cluster file" in the event that new coordinators join or are reassigned.
| That alone is amazing...as you don't have to track down every single client and change / re-roll with new connection parameters.
|
| Anyway, I talk this database up every chance I get. Keep up the awesome work.
|
| - A very happy user.
___________________________________________________________________
(page generated 2021-06-07 23:00 UTC)