[HN Gopher] Are chiplets enough to save Moore's Law?
___________________________________________________________________
 
Are chiplets enough to save Moore's Law?
 
Author : transpute
Score  : 57 points
Date   : 2023-06-17 16:25 UTC (6 hours ago)
 
web link (www.eetimes.com)
w3m dump (www.eetimes.com)
 
| Animats wrote:
| In the transistors per cubic meter dimension, maybe. In the cost
| and heat dimensions, no.
 
  | bee_rider wrote:
  | Chiplets should help keep yields up (I assume, at least, that
  | they test the chiplets before packing them together into the
  | package, although that's a guess). Improving yields should help
  | with prices; throwing products away or binning them down isn't
  | very economically favorable.
 
    | hristov wrote:
    | Yes, they do. In fact, chiplets are bringing a lot of
    | excitement to the semiconductor test industry. Chiplets have
    | popularized the concept of "known good die", i.e. a chiplet
    | that has been tested so thoroughly that it is practically
    | proven to be defect free.
    | 
    | This is necessary because packaging chiplets is usually
    | irreversible. So if you put ten chiplets in a package and one
    | of them is bad, you have to toss the entire package, along
    | with all ten chiplets and all your packaging costs.
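    | 
    | A back-of-the-envelope sketch of why that matters, with
    | made-up numbers (say 95% per-chiplet yield and ten chiplets
    | per package, assuming independent defects):
    | 
    |     #include <cmath>
    |     #include <cstdio>
    | 
    |     int main() {
    |         const double die_yield = 0.95; // assumed per-die yield
    |         const int chiplets = 10;       // chiplets per package
    |         // Without testing, one bad die scraps the package:
    |         double blind = std::pow(die_yield, chiplets);
    |         std::printf("untested package yield: %.1f%%\n",
    |                     blind * 100.0);    // ~59.9%
    |         // With known good die, only tested-good chiplets get
    |         // packaged, so die defects no longer multiply.
    |     }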
 
| bee_rider wrote:
| Where are R&D nodes at nowadays, anyway?
| 
| I guess chiplets will help with yield issues, which could make
| some nodes economical (which is, of course, what Moore's law is
| all about). But they won't allow fabs to get down to transistor
| sizes they just couldn't hit at all.
 
  | tux1968 wrote:
  | Jim Keller is bullish on chiplets, and it's not just about
  | yield: it's also about power efficiency and the economics of
  | using the most expensive nodes (e.g. 3nm) only for the critical
  | parts, combining them lego-style with IP fabricated on cheaper
  | nodes.
  | 
  | https://youtu.be/c_Wj_rNln-s?t=127
 
    | BeefWellington wrote:
    | In a way, these points are roughly how modern computers work.
    | 
    | For instance, let's say you bought a new PC in 2017 with a
    | high-end graphics card. Your CPU was probably manufactured on
    | a 14nm process, your GPU on 16nm, your RAM on 16 to 20nm, and
    | your motherboard chipset on 22nm. It's been a similar story
    | throughout computing.
    | 
    | Taking that idea and scaling it down to processor designs
    | themselves is cool and no doubt faces hard technical
    | challenges.
 
    | bee_rider wrote:
    | That's interesting, I'm curious about the argument for power
    | efficiency (it seems like breaking a design up into chiplets
    | would only hurt there, but I'm no Jim Keller!). I'll
    | definitely check the video out when I get a chance.
    | 
    | Lego-ing out IP blocks could be really cool. It would be neat
    | if this gave OEMs room to compete in a more technical field
    | (selling Xeons in Dell cases vs Xeons in Lenovo cases is not
    | very cool; if Dell and Lenovo could pick the parts in their
    | packages, that could be interesting).
 
      | pclmulqdq wrote:
      | I think this may be where some manufacturers are heading,
      | but not necessarily the big "consumer" names like
      | Intel/AMD. It's a way to significantly lower the risk of
      | building an SoC, and I think it's more plausible that this
      | sort of third-party chiplet market will be driven by the
      | Marvell/Broadcom types rather than Intel/AMD.
 
        | touisteur wrote:
        | Something impressive about Tenstorrent's original
        | architecture (I haven't been following the talk about
        | pivoting to RISC-V) was the low cost of validation. It
        | seemed like the idea was 'cheapest to tape out with max
        | FLOPS', and it somehow worked out well. Can't say more, I
        | never managed to get my hands on one of those Grayskull
        | cards so...
 
        | pclmulqdq wrote:
        | Quietly, digital ASIC validation has gotten very cheap
        | and quick over the last 20 years thanks to the
        | proliferation of software models and compilers that
        | translate Verilog to C++ (which are literally orders of
        | magnitude faster than previous simulators). I'm not
        | surprised that Tenstorrent has been doing well with that,
        | given the expertise they have.
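        | 
        | A toy illustration of the idea (hand-written, not any
        | real tool's output): the RTL gets compiled into a plain,
        | cycle-level C++ model that runs at native speed instead
        | of being interpreted event by event.
        | 
        |     #include <cstdint>
        |     #include <cstdio>
        | 
        |     // Stand-in for what a Verilog-to-C++ compiler might
        |     // emit: a cycle-accurate 8-bit counter with enable.
        |     struct Counter {
        |         uint8_t count = 0;
        |         bool    en    = false;
        |         void posedge() {     // one rising clock edge
        |             if (en) ++count;
        |         }
        |     };
        | 
        |     int main() {
        |         Counter dut;
        |         dut.en = true;
        |         // Plain compiled C++, no event queue to walk:
        |         for (long c = 0; c < 1000000; ++c)
        |             dut.posedge();
        |         // 1e6 mod 256 = 64
        |         std::printf("count: %d\n", (int)dut.count);
        |     }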
 
    | uluyol wrote:
    | It's also about reuse and fungibility.
    | 
    | AMD has used the same IO die across multiple generations of
    | Zen hardware. This cuts the dev and validation costs for
    | using new processes.
    | 
    | Fungibility helps reduce the cost of making SKUs. Want more
    | cores? Add more compute dies. Your customers want more big
    | chips than expected? It's easier/faster to adapt production
    | since only the final steps differ. The component pieces are
    | the same.
 
| anyoneamous wrote:
| Betteridge alert!
 
| AnimalMuppet wrote:
| Where do chiplets fit in Moore's Law? Moore said that the number
| of transistors _in a single package_ would increase. Well, are
| chiplets their own package? If so, then Moore's Law is clearly
| broken, because they are a _decrease_ in the number of
| transistors in a package (compared to conventional approaches).
| 
| But if you don't consider a chiplet to be an independent package,
| then yes, maybe they can save Moore's Law.
| 
| Personally, once chiplets enter the picture, I think the question
| is becoming one of semantics, and therefore less interesting.
 
  | ilyt wrote:
  | >Where do chiplets fit in Moore's Law? Moore said that the
  | number of transistors in a single package would increase.
  | 
  | Well, there were no multi-chiplet ICs back in his day.
 
| sylware wrote:
| Once we have a GPU filling the whole silicon real estate of an M2
| chip, and that on a 2nm process...
| 
| I guess the next PlayStation? Even though I quite dislike the
| digital jail that video game consoles are, it's going to be hard
| to resist FFXX.
 
| bena wrote:
| Who cares? Moore's Law is not something that needs "saving". It
| was an observation of the time. It was bound to run up against
| physical reality.
| 
| In 1965, he predicted a doubling of components every year for the
| next ten years. In 1975, he changed it to every 2 years. And he
| was probably only looking out to the next 10 to 20 years.
| 
| Because, you know, that's reasonable. Even Moore himself knew
| that it was impossible to continue indefinitely.
| 
| Yes, let's get better. Yes, let's try to find more ways to
| efficiently compute. But let's not be slaves to words spoken
| before we were born because they were easy to achieve back then.
 
  | stefncb wrote:
  | Moore even later admitted that he made an insane guess for
  | marketing.
 
  | hindsightbias wrote:
  | People will care when HW stops being able to keep up with SW
  | bloat.
 
  | NovaDudely wrote:
  | For his prediction to last as long as it has is an achievement
  | on its own. Anyone who can predict 20 years out and land within
  | 50% of the target, in any field, is a winner in my book. Moore
  | should be chuffed it stood so long.
 
  | wmf wrote:
  | Nvidia is using "the end of Moore's Law" as an excuse to
  | provide zero price/performance improvement. It would be pretty
  | anti-consumer if that became the industry standard.
 
    | dolni wrote:
    | I'm sure AMD would LOVE to eat Nvidia's lunch.
 
      | wmf wrote:
      | Yet they're not. They're selling very few GPUs at high
      | prices.
 
        | Buttons840 wrote:
        | Is that because they have bad hardware or because of the
        | CUDA monopoly?
 
        | wmf wrote:
        | Neither?
 
        | gymbeaux wrote:
        | More the latter than the former these days, but at a
        | macro level, say over the last 20 years, it's more the
        | former than the latter.
        | 
        | Basically AMD has caught up (arguable, but let's say they
        | have) to Nvidia with respect to driver quality and
        | hardware performance/power consumption, just in time for
        | the world to become more concerned with how a GPU handles
        | AI stuff than how it handles video games.
 
        | causality0 wrote:
        | I think the claim that AMD has caught up with Nvidia on
        | drivers is untrue. Sure, they don't cause constant
        | crashing in mainstream games anymore like they used to,
        | but that's where the improvement ends. AMD drivers are
        | still broken enough that you run into issues whenever you
        | step outside the box. Emulation is universally worse on
        | AMD, and that's when it's not completely broken. VR is
        | the same way, and AMD is now six months late on their "we
        | promise AI won't be busted on RDNA3" promise.
 
        | gymbeaux wrote:
        | I've read about/had enough issues with Nvidia drivers in
        | the last couple of years that I'm willing to say AMD has
        | effectively caught up in that regard, especially when you
        | consider "FineWine" (the common occurrence of AMD GPUs
        | gaining performance over time through driver updates,
        | while that seldom happens on the Nvidia side).
 
    | dotnet00 wrote:
    | Wasn't the 4090 roughly twice the performance of a 3090 at a
    | similar cost?
 
      | BeefWellington wrote:
      | The 3090 and the 4090 are on different process nodes
      | (8/10nm vs 4nm), so you'd expect the 4090 to be more
      | efficient. Pumping the same (or in this case more) power
      | through a more efficient node _should_ result in
      | improvements in performance.
      | 
      | Hardware Unboxed covers this pretty well here:
      | https://www.youtube.com/watch?v=aQklDR8nv8U&t=1365s
      | 
      | You can see that compared to the 3090Ti or the 6950XT* it's
      | sipping power when it's performance limited. I think what
      | many gamers see though, is that the performance difference
      | is about all you'd expect going to 4nm from 8nm, based on
      | performance gains we've seen in the past from other chips.
      | 
      | Anyway, the rough idea is: if you built a 3090 on a 4nm
      | node, it would perform about the same as a 4090. Plus,
      | Nvidia has a track record of how they behave when the
      | competition isn't up to snuff.
      | 
      | Now, this is silly because using a more leading edge
      | manufacturing process takes a ton of work and _is_ an
      | improvement overall.
      | 
      | [*]: The 6950XT is manufactured on a 7nm node from TSMC.
      | Note that there is weirdness in how each company measures
      | its process node, which I won't go into here; just be aware
      | that a TSMC N7 process might not mean 7nm-sized features
      | for all parts, and a lot of it has to do with how well they
      | can layer things.
 
        | dotnet00 wrote:
        | The person I replied to said price/performance, not
        | efficiency.
        | 
        | With the high end chips I think they're simply of the
        | opinion that performance is more important than
        | efficiency (and I don't really pay attention to the lower
        | range too much these days), and thus like most other high
        | end desktop parts, they're all clocked as high as the
        | chip and cooling can handle, even if dialing it back a
        | little would drastically improve efficiency.
 
  | dghughes wrote:
  | The ancient Greeks believed life involves, and needs, two types
  | of suffering: one is war, which is destructive; the other is
  | competition. Competition benefits society; it enlightens it and
  | moves it forward. Competition isn't meant to destroy your
  | competitor; that would be war. Moore's Law may be a type of
  | good suffering.
 
| ilaksh wrote:
| What about radical new compute-in-memory paradigms based on
| memristors or something? We really need more performance for
| running transformers. Not for general purpose CPUs.
 
| _hypx wrote:
| The problem with chiplets is that you can only fit so many of
| them in a realistic package. Once you hit that limit, you will
| need a way to pack more transistors onto a single die. As a
| result, chiplets can only partially solve some of the issues with
| Moore's Law. They can't be a long-term extension of it.
 
  | OliverGuy wrote:
  | I suppose the packaging problem can be solved - up to a point -
  | by stacking chiplets vertically as well as placing them next to
  | the IO die. I wonder how well that will scale; could we stack
  | multiple chiplets high? What effect would that have on
  | thermals?
 
    | jackmott wrote:
    | [dead]
 
    | _hypx wrote:
    | Vertical chip stacking is another idea being proposed. But as
    | you said, that will have thermal problems.
    | 
    | It could be possible to stack less power-hungry chips like
    | cache chips, but probably not the main CPU/GPU core. It will
    | just be another way of extending Moore's Law.
 
      | bravetraveler wrote:
      | Would the X3D cache setup for the newer AMD CPUs count as
      | vertical chips?
      | 
      | I'm a bit naive about the actual... physical bits. I'm not
      | clear on what makes a chip, and whether the cache would be
      | that - or something a bit simpler.
 
        | jagger27 wrote:
        | Yes, it totally counts and is indeed being vertically
        | stacked on Ryzen products today. Cache takes up a huge
        | portion of total die area and doesn't hurt temperatures
        | too much.
        | 
        | For now, AMD is using the technique to add more cache to
        | their CPUs rather than moving all of it to another layer.
 
        | adgjlsfhk1 wrote:
        | As I understand it, this is because the 3d stacking is
        | fairly expensive (it adds a lot of silicon and extra
        | manufacturing cost), so 3d cache only makes sense once
        | your chip is too big to put all the cache in 2D.
 
        | convolvatron wrote:
        | I suspect there is a yield issue also. That is, one large
        | chip with one defect in the wrong place is just glass.
        | With chiplets you would assemble the module after chiplet
        | unit test.
        | 
        | But maybe redundancy and binning have made this moot.
 
| polishdude20 wrote:
| I wonder: if/when hardware can't continue to get better, will
| software then become the differentiating factor in performance?
 
  | BenoitP wrote:
  | It sure can. Large parts of the whole stack have been developed
  | by trading off developer velocity against performance.
  | 
  | But hardware still has plenty of runway. The third dimension is
  | barely beginning. Also, customizing the hardware to the
  | workload has a lot of untapped potential. RISC-V has a
  | committee discussing accelerating dynamic languages, for
  | example: memory tagging for automatic array bounds checking,
  | pointer masking for helping GCs, custom instruction caches,
  | etc. And you can go deeper; why not have a Node webserver chip?
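  | 
  | A rough software-only illustration of the pointer-masking idea
  | (toy code, not any particular proposed ISA extension): a GC
  | stashes metadata in the low bits of aligned pointers, and today
  | has to mask them off before every dereference. Hardware pointer
  | masking would let loads and stores ignore those bits, so the
  | masking cost disappears.
  | 
  |     #include <cstdint>
  |     #include <cstdio>
  | 
  |     constexpr uintptr_t TAG_MASK = 0x7;  // 8-byte alignment
  | 
  |     inline void* tag(void* p, unsigned t) {
  |         return (void*)((uintptr_t)p | (t & TAG_MASK));
  |     }
  |     inline void* untag(void* p) {    // masked on every access
  |         return (void*)((uintptr_t)p & ~TAG_MASK);
  |     }
  |     inline unsigned tag_of(void* p) {
  |         return (unsigned)((uintptr_t)p & TAG_MASK);
  |     }
  | 
  |     int main() {
  |         alignas(8) static int obj = 42;
  |         void* marked = tag(&obj, 1);  // e.g. a GC "mark" bit
  |         std::printf("tag=%u value=%d\n",
  |                     tag_of(marked), *(int*)untag(marked));
  |     }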
 
    | bayindirh wrote:
    | > Large parts of the whole stack have been developed by
    | trading off developer velocity against performance.
    | 
    | The time is always paid by someone. If developers decide to
    | write the best code, they pay the price once, in development
    | time. If the code is written with a developer-velocity-first
    | perspective, every user pays the price every time they use
    | the code.
    | 
    | Also, developer velocity vs. program performance is not the
    | right way to look at it. I have used at least half a dozen
    | libraries that have great developer ergonomics and
    | world-leading performance (e.g. Eigen, Catch2, easylogging++,
    | off the top of my head).
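    | 
    | A minimal Eigen sketch of what I mean by ergonomic and fast
    | at the same time (illustrative sizes, nothing tuned):
    | 
    |     #include <Eigen/Dense>
    |     #include <iostream>
    | 
    |     int main() {
    |         // Reads almost like the math, yet compiles down to
    |         // vectorized kernels via expression templates.
    |         Eigen::MatrixXd A = Eigen::MatrixXd::Random(512, 512);
    |         Eigen::VectorXd b = Eigen::VectorXd::Random(512);
    |         Eigen::VectorXd x = A.fullPivLu().solve(b);  // A x = b
    |         std::cout << "residual: "
    |                   << (A * x - b).norm() << "\n";
    |     }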
 
  | transcriptase wrote:
  | As far as I can tell, hardware gains are enabling an entire
  | generation of programmers to ignore optimization and
  | performance entirely, outside of niche areas.
  | 
  | The sluggishness, absurd loading times, and RAM usage of even
  | the simplest applications on blisteringly fast hardware are
  | astonishing.
 
    | bee_rider wrote:
    | This has been going on for two or three generations at least.
 
    | Rapzid wrote:
    | Seems fine to me.
 
      | bayindirh wrote:
      | When you start to pay attention to the code you write, it
      | speeds up tremendously, even without fancy optimizations.
      | Also, clean and sensible code already gets optimized a
      | great deal by compilers.
      | 
      | A numerical Gaussian integration code I wrote in 2017 or so
      | runs at 1.7 _million_ iterations _per second, per core_, on
      | a stock, 3rd-generation i7-3770K.
      | 
      | It's implemented in C++ and simply written carefully. No
      | fancy optimizations are done.
      | 
      | If programmers paid half the attention their code deserves,
      | I bet most of the software we use would be 2x faster, on
      | average.
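      | 
      | For a flavor of what "written carefully" means, here is a
      | minimal Gauss-Legendre quadrature sketch (not my actual
      | code, cut down to 5 fixed points): just a tight loop over
      | precomputed nodes and weights that a compiler can inline
      | and vectorize without help.
      | 
      |     #include <cmath>
      |     #include <cstdio>
      | 
      |     // 5-point Gauss-Legendre quadrature on [a, b]; exact
      |     // for polynomials up to degree 9.
      |     template <class F>
      |     double gauss5(F f, double a, double b) {
      |         static const double x[5] = {
      |             0.0,
      |             -0.5384693101056831, 0.5384693101056831,
      |             -0.9061798459386640, 0.9061798459386640};
      |         static const double w[5] = {
      |             0.5688888888888889,
      |             0.4786286704993665, 0.4786286704993665,
      |             0.2369268850561891, 0.2369268850561891};
      |         const double c = 0.5 * (b - a);
      |         const double m = 0.5 * (a + b);
      |         double s = 0.0;
      |         for (int i = 0; i < 5; ++i)
      |             s += w[i] * f(m + c * x[i]);
      |         return c * s;
      |     }
      | 
      |     int main() {
      |         const double pi = 3.141592653589793;
      |         double v = gauss5(
      |             [](double t) { return std::sin(t); }, 0.0, pi);
      |         std::printf("sin on [0,pi]: %.12f\n", v);  // ~2.0
      |     }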
 
        | bee_rider wrote:
        | In this case, it would probably be best to use a tuned
        | library, right?
        | 
        | The tricky bit is finding the spot between "find a
        | library" and "be lazy about it," if you want to find
        | tasks worth doing well.
 
        | bayindirh wrote:
        | Yes, the integration was not handled by a library though.
        | I coded it myself (because it was a novel method). Used
        | Eigen for matrix related stuff because it was fast and
        | _fun_ to use. Not to mention how feature packed it is.
 
        | NovaDudely wrote:
        | I came across this recently; it's a fun way of
        | demonstrating how much performance can change by using
        | optimised code.
        | 
        | It's about running a rebuild of Minecraft on late-90s and
        | early-2000s Macs and showing the huge performance
        | differences.
        | 
        | Yes, Minecraft is not a great example, but it is visually
        | striking.
        | 
        | https://www.youtube.com/watch?v=awGVUxl0T0Y
 
        | Rapzid wrote:
        | Great, wow, that's interesting.
        | 
        | I mean as a user everything seems fine to me. Things are
        | getting better, not worse.
 
        | bayindirh wrote:
        | Because the OS can hide the resource waste pretty
        | cleverly. macOS compresses RAM on the fly; Linux moves
        | very stale pages to swap early to keep memory utilization
        | in check.
        | 
        | In that department, I was able to store two 3000x3000
        | double precision dense matrices and some big double
        | precision vectors, _plus the 3D model representing the
        | object I'm working on_, in ~250MB IIRC. The code is a
        | scientific materials simulator, BTW.
        | 
        | Eclipse (the IDE) uses ~1GB (after warming up) with my
        | fat C++ config, and that thing is written in Java and is
        | a full-blown IDE with an always-on indexer + static
        | analyzer.
        | 
        | VSCode uses 1GB for nothing. Atom used 1GB for nothing.
        | Evernote uses ~900MB on start on macOS. That thing takes
        | notes!
        | 
        | We have the resources and wizardry to keep bad
        | programming in check, but we could do much better if we
        | wanted to.
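        | 
        | Quick sanity check on that ~250MB figure (just the dense
        | matrix storage, assuming plain 8-byte doubles):
        | 
        |     #include <cstdio>
        | 
        |     int main() {
        |         // Two 3000x3000 dense matrices of 8-byte doubles.
        |         const double mib = 1024.0 * 1024.0;
        |         double mats = 2.0 * 3000 * 3000 * 8 / mib;
        |         std::printf("matrices: %.0f MiB\n", mats);
        |         // ~137 MiB, leaving roughly 100MB of the ~250MB
        |         // for the vectors, the 3D model, and overhead.
        |     }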
 
  | NovaDudely wrote:
  | This is why I say that if folks want the "year of the Linux
  | desktop", they should focus on optimising performance now,
  | before it is forced. Make older machines perform better and
  | avoid a lot of the hardware lockdown that's coming.
  | 
  | Note: I don't believe there ever will be a year of the desktop.
  | But for Libre/Open source to thrive, it cannot just be
  | free/open. It has to be good.
 
  | ilaksh wrote:
  | Then they invent a new hardware paradigm.
 
  | thechao wrote:
  | There's an enormous stack of SW in the HW that's enabling
  | "cheap" ASIC development at the cost of chip performance.
  | There's probably as much room to optimize on any given node as
  | there are optimization opportunities for the SW on that part.
  | 
  | Both SW _and_ HW have been leaning on Moore's law a long time.
 
| anticensor wrote:
| Wirth's Law will eventually win.
 
___________________________________________________________________
(page generated 2023-06-17 23:01 UTC)