[HN Gopher] The simple joys of scaling up
___________________________________________________________________
 
The simple joys of scaling up
 
Author : eatonphil
Score  : 78 points
Date   : 2023-05-18 15:04 UTC (7 hours ago)
 
web link (motherduck.com)
w3m dump (motherduck.com)
 
| ThePhysicist wrote:
| FWIW I really like this new "neo-brutalist" website style, with
| its hard shadows, clean solid lines, and simple typography and
| layout.
 
| andrewstuart wrote:
| Moore's law freezes in the cloud.
 
| winrid wrote:
| The i4i instances also have crazy fast disks to go along with
| that 1 TB of RAM. I hope to move all our stuff off i3 instances
| and onto i4i this year.
 
| waynesonfire wrote:
| It's the cloud companies raking in the profits from these
| hardware improvements. "Widely-available machines now have 128
| cores and a terabyte of RAM." And yet I'm still paying $5 for a
| couple of cores.
 
| dangoodmanUT wrote:
| I've been following DuckDB for a while now, and even tinkered
| with a layer on top called "IceDB" (totally needs a rewrite:
| https://blog.danthegoodman.com/introducing-icedb--a-serverle...)
| 
| The issue I see now is that there is no good way to know which
| files will match a given query when reading from remote
| (decoupled) storage.
| 
| While it does support hive partitioning (thank god) and S3 list
| calls, if you are doing frequent inserts you need some way to
| merge the resulting parquet files.
| 
| The MergeTree engine is my favorite thing about ClickHouse, and
| why it's still my go-to. I think if there were a serverless way
| to merge parquet (which was the aim of IceDB), it would make
| DuckDB massively more powerful as a primary OLAP db.
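| 
| A minimal sketch of that kind of compaction with plain DuckDB
| (the bucket and prefixes here are hypothetical, and it assumes
| S3 credentials are already configured for the httpfs extension;
| deleting the old small files afterwards is the bookkeeping a
| MergeTree-style engine would handle for you):
| 
|     import duckdb
| 
|     con = duckdb.connect()  # throwaway in-memory database
|     con.execute("INSTALL httpfs;")
|     con.execute("LOAD httpfs;")
| 
|     # Hypothetical prefixes: many small files written by
|     # frequent inserts get compacted into one bigger file.
|     src = "s3://bucket/events/date=2023-05-18"
|     dst = "s3://bucket/merged/date=2023-05-18/part-0.parquet"
| 
|     con.execute(f"""
|         COPY (SELECT * FROM read_parquet('{src}/*.parquet'))
|         TO '{dst}' (FORMAT PARQUET);
|     """)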
 
  | LewisJEllis wrote:
  | Yeah, DuckDB is a slam dunk when you have a relatively static
  | dataset - object storage is your durable primary SSOT, and
  | ephemeral VMs running DuckDB pointed at the parquet files in
  | object storage are your scalable, stateless replicas - but the
  | story gets trickier in the face of frequent ongoing writes and
  | inserts. ClickHouse handles that scenario well, but I suspect
  | the MotherDuck folks have answers for that in mind :)
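  | 
  | The read side of that pattern is pleasantly small. A sketch,
  | with a hypothetical bucket and schema (it assumes S3
  | credentials are already configured for DuckDB's httpfs
  | extension):
  | 
  |     import duckdb
  | 
  |     # Each stateless "replica" is just a fresh in-memory DuckDB
  |     # process pointed at the shared parquet files in S3.
  |     con = duckdb.connect()
  |     con.execute("INSTALL httpfs;")
  |     con.execute("LOAD httpfs;")
  | 
  |     top_users = con.execute("""
  |         SELECT user_id, count(*) AS events
  |         FROM read_parquet('s3://bucket/events/*/*.parquet')
  |         GROUP BY user_id
  |         ORDER BY events DESC
  |         LIMIT 10
  |     """).fetchall()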
 
| marsupialtail_2 wrote:
| You will always be limited by network throughput. Sure, that
| wire is getting bigger, but so is your data.
 
| brundolf wrote:
| Probably biased given that it's on the MotherDuck site, but it's
| well-reasoned and referenced, and my gut agrees with the overall
| philosophy.
| 
| This feels like the kicker:
| 
| > In the cloud, you don't need to pay extra for a "big iron"
| machine because you're already running on one. You just need a
| bigger slice. Cloud vendors don't charge proportionally more for
| a larger slice, so your cost per unit of compute doesn't change
| if you're working on a tiny instance or a giant one.
| 
| It's obvious once you think about it: you aren't choosing
| between a bunch of small machines and one big machine; you may
| very well be choosing between a bunch of small slices of a big
| machine and one big slice of the same big machine. The only
| difference is in how your software sees it: as a complex
| distributed system, or as a single system (one that can, e.g.,
| share memory with itself instead of serializing and
| deserializing data over network sockets).
 
  | LeifCarrotson wrote:
  | The reason this feels non-obvious is that people like to think
  | that they're choosing a variable number of small slices of a
  | big _datacenter_, scaling up and down hour-by-hour or
  | minute-by-minute to get maximum efficiency.
  | 
  | Really, though, you're generating enormous overhead while
  | turning on and off small slices of a 128-core monster with a
  | terabyte of RAM.
 
  | JohnMakin wrote:
  | That's not the only difference: there are many more facets to
  | the reliability guarantees than the brief hand-waving the
  | author does about them in the article.
 
| paulddraper wrote:
| This is absolutely correct (and gets more correct every year).
| 
| An m5.large (2 vCPU, 8 GB RAM) is $0.096/hr. An m5.24xlarge (96
| vCPU, 384 GB RAM) is $4.608/hr.
| 
| That's exactly a 48x scale-up, in both capacity and cost.
| 
| The largest AWS instance is the x2iedn.32xlarge (128 vCPU, 4096
| GB RAM) at $26.676/hr. Compared to an m5.large, that's a 64x
| increase in compute and a 512x increase in memory for roughly
| 278x the cost.
| 
| Long story short: you can scale up linearly for a long time in
| the cloud.
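| 
| A quick sanity check of those ratios, using the on-demand
| prices quoted above:
| 
|     # (vCPU, GB of RAM, $/hr) as quoted above
|     small = (2, 8, 0.096)        # m5.large
|     big = (96, 384, 4.608)       # m5.24xlarge
|     huge = (128, 4096, 26.676)   # x2iedn.32xlarge
| 
|     for name, (cpu, ram, usd) in (("m5.24xlarge", big),
|                                   ("x2iedn.32xlarge", huge)):
|         print(f"{name}: {cpu / small[0]:.0f}x vCPU, "
|               f"{ram / small[1]:.0f}x RAM, "
|               f"{usd / small[2]:.1f}x cost")
| 
|     # m5.24xlarge: 48x vCPU, 48x RAM, 48.0x cost
|     # x2iedn.32xlarge: 64x vCPU, 512x RAM, 277.9x cost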
 
| samsquire wrote:
| This is an interesting post, thank you.
| 
| In my toy barebones SQL database, I spread rows across different
| replicas based on a consistent hash. I also have a "create join"
| statement that keeps join keys colocated.
| 
| Then when a join query is issued, the join keys are always
| locally available, so the query can be executed on each replica
| and the results returned to the client to be aggregated.
| 
| I want building distributed, high-throughput systems to be
| easier and less error prone. I wonder if a mixture of scale-up
| and scale-out could be a useful architecture.
| 
| You want as few network round trips and crossovers between
| threads (synchronization cost) as you can get.
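| 
| A minimal sketch of that placement idea (the node names and key
| format are hypothetical, this is not samsquire's actual
| implementation, and a production ring would use more virtual
| points per node):
| 
|     import bisect, hashlib
| 
|     def _h(key: str) -> int:
|         return int(hashlib.sha256(key.encode()).hexdigest(), 16)
| 
|     class HashRing:
|         """Tiny consistent-hash ring: each replica owns several
|         points; a key maps to the first point clockwise from its
|         own hash."""
|         def __init__(self, replicas, points=8):
|             self.ring = sorted(
|                 (_h(f"{r}#{i}"), r)
|                 for r in replicas for i in range(points))
|             self.keys = [h for h, _ in self.ring]
| 
|         def node_for(self, key: str) -> str:
|             i = bisect.bisect(self.keys, _h(key)) % len(self.ring)
|             return self.ring[i][1]
| 
|     ring = HashRing(["replica-0", "replica-1", "replica-2"])
| 
|     # Rows from any table that shares the join key user_id=42 are
|     # placed by that key, so they land on the same replica and
|     # the join can run locally; the client only merges
|     # per-replica results.
|     print(ring.node_for("user_id=42"))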
 
___________________________________________________________________
(page generated 2023-05-18 23:01 UTC)