[HN Gopher] The simple joys of scaling up
___________________________________________________________________
 
The simple joys of scaling up
 
Author : eatonphil
Score  : 78 points
Date   : 2023-05-18 15:04 UTC (7 hours ago)
 
web link (motherduck.com)
w3m dump (motherduck.com)
 
| ThePhysicist wrote:
| FWIW I really like this new "neo-brutalist" website style, with
| its hard shadows, clean solid lines, and simple typography and
| layout.
 
| andrewstuart wrote:
| Moore's law freezes in the cloud.
 
| winrid wrote:
| The i4i instances also have crazy fast disks to go along with
| that 1 TB of RAM. I hope to move all our stuff off i3 instances
| and onto i4i this year.
 
| waynesonfire wrote:
| It's the cloud companies raking in the profits from these
| hardware improvements. "Widely-available machines now have 128
| cores and a terabyte of RAM." And yet I'm still paying $5 for a
| couple of cores.
 
| dangoodmanUT wrote:
| I've been following DuckDB for a while now, and even tinkered
| with a layer on top called "IceDB" (totally needs a rewrite:
| https://blog.danthegoodman.com/introducing-icedb--a-serverle...)
| 
| The issue I see now is that there is no good way to know which
| files will match a given query when reading from remote
| (decoupled) storage.
| 
| While it does support hive partitioning (thank god) and S3 list
| calls, if you are doing frequent inserts you need some way to
| merge the resulting parquet files.
| 
| The MergeTree engine is my favorite thing about ClickHouse, and
| why it's still my go-to. I think if there were a serverless way
| to merge parquet (which was the aim of IceDB), it would make
| DuckDB massively more powerful as a primary OLAP db.
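| 
| A minimal sketch of that kind of compaction with plain DuckDB
| (the bucket and prefixes here are hypothetical, and it assumes
| S3 credentials are already configured for the httpfs extension;
| deleting the old small files afterwards is the bookkeeping a
| MergeTree-style engine would handle for you):
| 
|     import duckdb
| 
|     con = duckdb.connect()  # throwaway in-memory database
|     con.execute("INSTALL httpfs;")
|     con.execute("LOAD httpfs;")
| 
|     # Hypothetical prefixes: many small files written by
|     # frequent inserts get compacted into one bigger file.
|     src = "s3://bucket/events/date=2023-05-18"
|     dst = "s3://bucket/merged/date=2023-05-18/part-0.parquet"
| 
|     con.execute(f"""
|         COPY (SELECT * FROM read_parquet('{src}/*.parquet'))
|         TO '{dst}' (FORMAT PARQUET);
|     """)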
 
  | LewisJEllis wrote:
  | Yeah, DuckDB is a slam dunk when you have a relatively static
  | dataset - object storage is your durable primary SSOT, and
  | ephemeral VMs running DuckDB pointed at the parquet files in
  | object storage are your scalable, stateless replicas - but the
  | story gets trickier in the face of frequent ongoing writes and
  | inserts. ClickHouse handles that scenario well, but I suspect
  | the MotherDuck folks have answers for that in mind :)
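  | 
  | The read side of that pattern is pleasantly small. A sketch,
  | with a hypothetical bucket and schema (it assumes S3
  | credentials are already configured for DuckDB's httpfs
  | extension):
  | 
  |     import duckdb
  | 
  |     # Each stateless "replica" is just a fresh in-memory DuckDB
  |     # process pointed at the shared parquet files in S3.
  |     con = duckdb.connect()
  |     con.execute("INSTALL httpfs;")
  |     con.execute("LOAD httpfs;")
  | 
  |     top_users = con.execute("""
  |         SELECT user_id, count(*) AS events
  |         FROM read_parquet('s3://bucket/events/*/*.parquet')
  |         GROUP BY user_id
  |         ORDER BY events DESC
  |         LIMIT 10
  |     """).fetchall()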
 
| marsupialtail_2 wrote:
| You will always be limited by network throughput. Sure, that
| wire is getting bigger, but so is your data.
 
| brundolf wrote:
| Probably biased given that it's on the MotherDuck site, but it's
| well-reasoned and referenced, and my gut agrees with the overall
| philosophy.
| 
| This feels like the kicker:
| 
| > In the cloud, you don't need to pay extra for a "big iron"
| machine because you're already running on one. You just need a
| bigger slice. Cloud vendors don't charge proportionally more for
| a larger slice, so your cost per unit of compute doesn't change
| if you're working on a tiny instance or a giant one.
| 
| It's obvious once you think about it: you aren't choosing
| between a bunch of small machines and one big machine; you may
| very well be choosing between a bunch of small slices of a big
| machine and one big slice of the same big machine. The only
| difference is in how your software sees it: as a complex
| distributed system, or as a single system (one that can, e.g.,
| share memory with itself instead of serializing and
| deserializing data over network sockets).
 
  | LeifCarrotson wrote:
  | The reason this feels non-obvious is that people like to think
  | that they're choosing a variable number of small slices of a
  | big _datacenter_, scaling up and down hour-by-hour or
  | minute-by-minute to get maximum efficiency.
  | 
  | Really, though, you're generating enormous overhead while
  | turning on and off small slices of a 128-core monster with a
  | terabyte of RAM.
 
  | JohnMakin wrote:
  | That's not the only difference: there are many more facets to
  | the reliability guarantees than the brief hand-waving the
  | author does about them in the article.
 
| paulddraper wrote:
| This is absolutely correct (and gets more correct every year).
| 
| An m5.large (2 vCPU, 8 GB RAM) is $0.096/hr. An m5.24xlarge (96
| vCPU, 384 GB RAM) is $4.608/hr.
| 
| That's exactly a 48x scale-up, in both capacity and cost.
| 
| The largest AWS instance is the x2iedn.32xlarge (128 vCPU, 4096
| GB RAM) at $26.676/hr. Compared to an m5.large, that's a 64x
| increase in compute and a 512x increase in memory for roughly
| 278x the cost.
| 
| Long story short: you can scale up linearly for a long time in
| the cloud.
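| 
| A quick sanity check of those ratios, using the on-demand
| prices quoted above:
| 
|     # (vCPU, GB of RAM, $/hr) as quoted above
|     small = (2, 8, 0.096)        # m5.large
|     big = (96, 384, 4.608)       # m5.24xlarge
|     huge = (128, 4096, 26.676)   # x2iedn.32xlarge
| 
|     for name, (cpu, ram, usd) in (("m5.24xlarge", big),
|                                   ("x2iedn.32xlarge", huge)):
|         print(f"{name}: {cpu / small[0]:.0f}x vCPU, "
|               f"{ram / small[1]:.0f}x RAM, "
|               f"{usd / small[2]:.1f}x cost")
| 
|     # m5.24xlarge: 48x vCPU, 48x RAM, 48.0x cost
|     # x2iedn.32xlarge: 64x vCPU, 512x RAM, 277.9x cost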
 
| samsquire wrote:
| This is an interesting post, thank you.
| 
| In my toy barebones SQL database, I spread rows across different
| replicas based on a consistent hash. I also have a "create join"
| statement that keeps join keys colocated.
| 
| Then when a join query is issued, the join keys are always
| locally available, so the query can be executed on each replica
| and the results returned to the client to be aggregated.
| 
| I want building distributed, high-throughput systems to be
| easier and less error prone. I wonder if a mixture of scale-up
| and scale-out could be a useful architecture.
| 
| You want as few network round trips and crossovers between
| threads (synchronization cost) as you can get.
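| 
| A minimal sketch of that placement idea (the node names and key
| format are hypothetical, this is not samsquire's actual
| implementation, and a production ring would use more virtual
| points per node):
| 
|     import bisect, hashlib
| 
|     def _h(key: str) -> int:
|         return int(hashlib.sha256(key.encode()).hexdigest(), 16)
| 
|     class HashRing:
|         """Tiny consistent-hash ring: each replica owns several
|         points; a key maps to the first point clockwise from its
|         own hash."""
|         def __init__(self, replicas, points=8):
|             self.ring = sorted(
|                 (_h(f"{r}#{i}"), r)
|                 for r in replicas for i in range(points))
|             self.keys = [h for h, _ in self.ring]
| 
|         def node_for(self, key: str) -> str:
|             i = bisect.bisect(self.keys, _h(key)) % len(self.ring)
|             return self.ring[i][1]
| 
|     ring = HashRing(["replica-0", "replica-1", "replica-2"])
| 
|     # Rows from any table that shares the join key user_id=42 are
|     # placed by that key, so they land on the same replica and
|     # the join can run locally; the client only merges
|     # per-replica results.
|     print(ring.node_for("user_id=42"))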
 
___________________________________________________________________
(page generated 2023-05-18 23:01 UTC)