|
| ThePhysicist wrote:
| FWIW I really like this new "neo-brutalist" website style with
| hard shadows, clean solid lines, and simple typography and
| layout.
| andrewstuart wrote:
| Moore's law freezes in the cloud.
| winrid wrote:
| The i4i instances also have crazy fast disks to go along with
| that 1 TB of RAM. I hope to move all our stuff off i3 instances
| and onto i4i this year.
| waynesonfire wrote:
| It's the cloud companies raking in the profits of these hardware
| improvements. "Widely-available machines now have 128 cores and a
| terabyte of RAM." - and I'm still paying five bucks for a couple
| of cores.
| dangoodmanUT wrote:
| I've been following DuckDB for a while now, and even tinkered
| with a layer on top called "IceDB" (totally needs a rewrite:
| https://blog.danthegoodman.com/introducing-icedb--a-serverle...)
|
| The issue I see now is that there is no good way to know which
| files will match a query when reading from remote (decoupled)
| storage.
|
| While it does support Hive partitioning (thank god) and S3 list
| calls, if you are doing inserts frequently you need some way to
| merge the resulting Parquet files.
|
| The MergeTree engine is my favorite thing about ClickHouse, and
| why it's still my go-to. I think if there were a serverless way to
| merge Parquet (which was the aim of IceDB), that would make DuckDB
| massively more powerful as a primary OLAP DB.
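|
| Not the IceDB approach itself, just a minimal sketch of the kind
| of merge step being described, using DuckDB's own COPY to compact
| the small files in one Hive partition into a single larger file.
| The bucket, prefix, and credential setup are made up, and it
| assumes a DuckDB build where the httpfs extension can write back
| to S3; the hard part - atomically swapping in the merged file and
| deleting the small ones - is left out.
|
|   import duckdb
|
|   con = duckdb.connect()  # in-memory connection
|   # httpfs gives DuckDB direct S3 reads and writes
|   con.execute("INSTALL httpfs; LOAD httpfs;")
|   con.execute("SET s3_region='us-east-1';")  # plus keys or an IAM role
|
|   part = "dt=2023-05-18"  # hypothetical Hive partition
|   src = f"s3://my-bucket/events/{part}/*.parquet"
|   dst = f"s3://my-bucket/events-compacted/{part}/merged.parquet"
|
|   # Read every small file in the partition, rewrite as one file.
|   con.execute(f"""
|       COPY (SELECT * FROM read_parquet('{src}'))
|       TO '{dst}' (FORMAT PARQUET);
|   """)
|   # Left out: listing and deleting the old small files so that
|   # concurrent readers never see the same rows twice.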
| LewisJEllis wrote:
| Yea, DuckDB is a slam dunk when you have a relatively static
| dataset - object storage is your durable primary SSOT, and
| ephemeral VMs running duckdb pointed at the object storage
| parquet files are your scalable stateless replicas - but the
| story gets trickier in the face of frequent ongoing writes /
| inserts. ClickHouse handles that scenario well, but I suspect
| the MotherDuck folks have answers for that in mind :)
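|
| For the static-dataset case the replica really is that small - a
| rough sketch, with a made-up bucket and layout, of an ephemeral
| node pointing DuckDB at the Parquet files and nothing else:
|
|   import duckdb
|
|   con = duckdb.connect()  # in-memory; nothing durable on this VM
|   con.execute("INSTALL httpfs; LOAD httpfs;")
|   con.execute("SET s3_region='us-east-1';")
|
|   # hive_partitioning lets DuckDB prune dt=... directories before
|   # it fetches any data.
|   rows = con.execute("""
|       SELECT dt, count(*) AS events
|       FROM read_parquet('s3://my-bucket/events/*/*.parquet',
|                         hive_partitioning=1)
|       WHERE dt >= '2023-05-01'
|       GROUP BY dt ORDER BY dt
|   """).fetchall()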
| marsupialtail_2 wrote:
| You will always be limited by network throughput. Sure, the wire
| is getting bigger, but so is your data.
| brundolf wrote:
| Probably biased given that it's on the DuckDB site, but well-
| reasoned and referenced, and my gut agrees with the overall
| philosophy
|
| This feels like the kicker:
|
| > In the cloud, you don't need to pay extra for a "big iron"
| machine because you're already running on one. You just need a
| bigger slice. Cloud vendors don't charge proportionally more for
| a larger slice, so your cost per unit of compute doesn't change
| if you're working on a tiny instance or a giant one.
|
| It's obvious once you think about it: you aren't choosing between
| a bunch of small machines and one big machine; you may very well
| be choosing between a bunch of small slices of a big machine and
| one big slice of a big machine. The only difference is in how your
| software sees it: as a complex distributed system, or as a single
| system (that can, e.g., share memory with itself instead of
| serializing and deserializing data over network sockets).
| LeifCarrotson wrote:
| The reason this feels non-obvious is that people like to think
| that they're choosing a variable number of small slices of a
| big _datacenter_, scaling up and down hour-by-hour or minute-
| by-minute to get maximum efficiency.
|
| Really, though, you're generating enormous overhead while
| turning on and off small slices of a 128-core monster with a
| terabyte of RAM.
| JohnMakin wrote:
| That's not the only difference - there are many more facets to
| reliability guarantees than the brief hand-waving the author
| gives them in the article.
| paulddraper wrote:
| This is absolutely correct (and gets more correct every year).
|
| An m5.large (2 vCPU, 8 GB RAM) is $0.096/hr. An m5.24xlarge
| (96 vCPU, 384 GB RAM) is $4.608/hr.
|
| Exactly a 48x scale-up, in both capacity and cost.
|
| The largest AWS instance is the x2iedn.32xlarge (128 vCPU,
| 4,096 GB RAM) at $26.676/hr. Compared to the m5.large, that's a
| 64x increase in compute and a 512x increase in memory for 277x
| the cost.
|
| Long story short: you can scale up linearly for a long time in
| the cloud.
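|
| Spelling those ratios out (a quick check against the on-demand
| prices quoted above):
|
|   # (vCPU, GB RAM, $/hr) from the comment above
|   m5_large = (2, 8, 0.096)
|   instances = {"m5.24xlarge": (96, 384, 4.608),
|                "x2iedn.32xlarge": (128, 4096, 26.676)}
|
|   for name, (cpu, ram, price) in instances.items():
|       print(name,
|             f"{cpu // m5_large[0]}x vCPU,",
|             f"{ram // m5_large[1]}x RAM,",
|             f"{price / m5_large[2]:.1f}x the cost of an m5.large")
|   # -> 48x vCPU, 48x RAM, 48.0x cost
|   # -> 64x vCPU, 512x RAM, 277.9x cost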
| samsquire wrote:
| This is an interesting post, thank you.
|
| In my toy barebones SQL database, I distribute rows across
| different replicas based on a consistent hash. I also have a
| "create join" statement, which keeps join keys colocated.
|
| Then when a join query is issued, I can always join because the
| join keys are available locally; the query can be executed on
| each replica and the results returned to the client to be
| aggregated.
|
| I want building distributed high-throughput systems to be easier
| and less error-prone. I wonder if a mixture of scale-up and
| scale-out could be a useful architecture.
|
| You want as few network round trips and crossovers between
| threads (synchronization costs) as you can get.
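|
| Not this implementation, just a toy illustration of the routing
| idea: hash the join key onto a ring of replicas (the names below
| are made up) so rows that share a key - and therefore any join on
| that key - land on the same node.
|
|   import bisect
|   import hashlib
|
|   REPLICAS = ["replica-0", "replica-1", "replica-2"]  # hypothetical
|   VNODES = 64  # virtual nodes per replica smooth out the load
|
|   def _h(value: str) -> int:
|       return int(hashlib.md5(value.encode()).hexdigest(), 16)
|
|   # Build the hash ring once.
|   ring = sorted((_h(f"{r}#{i}"), r)
|                 for r in REPLICAS for i in range(VNODES))
|   points = [p for p, _ in ring]
|
|   def replica_for(join_key: str) -> str:
|       """Route a row (or a join probe) by its join key."""
|       idx = bisect.bisect(points, _h(join_key)) % len(ring)
|       return ring[idx][1]
|
|   # Both sides of a join on user_id=42 end up on the same replica,
|   # so that key's join never crosses the network.
|   print(replica_for("user_id=42"))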