[HN Gopher] SiFive Tapes Out First 5nm TSMC 32-bit RISC-V Chip w...
___________________________________________________________________
 
SiFive Tapes Out First 5nm TSMC 32-bit RISC-V Chip with 7.2 Gbps
HBM3
 
Author : pabs3
Score  : 144 points
Date   : 2021-04-15 06:29 UTC (1 days ago)
 
web link (www.tomshardware.com)
w3m dump (www.tomshardware.com)
 
| zozbot234 wrote:
| IIRC, 32-bit RISC-V is only intended for deep embedded workloads,
| with 64-bit for general purpose compute. So a SoC w/ a single
| 32-bit core would seem to be a less-than-ideal fit for the
| cutting-edge 5nm process.
 
  | tyingq wrote:
  | The core is supposed to compete with the Cortex M7. The
  | smallest process M7 I can find is the STM32H7, which is 40nm.
 
    | makapuf wrote:
    | I rave for stm32 with high end processes (10nm or less),
    | whether that makes sense or not. I just love stm32..
 
  | dragontamer wrote:
  | Routers / Switches have extremely weird performance
  | characteristics, and I think that's what SiFive is targeting
  | with this chip.
  | 
  | * HBM3 for the highest memory bandwidth (10Gbps switches need
  | tons and tons of bandwidth. That's 10Gbps per direction per
  | connection, 8x ports is 160Gbps, and then that's multiplied
  | multiple times over by every memcpy / operation your chip
  | actually does. You need to DELIVER 160Gbps, which means your
  | physical RAM-bandwidth needs to be an order of magnitude
  | greater than that)
  | 
  | * Embedded 32-bit design for low-power usage.
  | 
  | * All switches have small, fixed size buffers. Memory capacity
  | is not a problem, it's feasible to imagine useful switches and
  | routers (even 10Gbps, 40Gbps, or 100Gbps) that only have
  | hundreds-of-MBs of RAM. As such, 32-bit is sufficient and
  | 64-bit is a waste (You'd rather halve your pointer memory
  | requirements with 32-bit pointers rather than go beyond 4GB
  | capacity)
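
A rough Python sketch of the arithmetic in the comment above. The port count, the 10Gbps line rate, and the 160Gbps total come from the comment; the `memory_touches` multiplier (how many times each byte crosses the RAM bus) and the pointer-table size are illustrative assumptions, not figures from the article.

```python
def switch_throughput_gbps(ports: int, line_rate_gbps: float) -> float:
    """Aggregate traffic for full-duplex ports: line rate in each direction."""
    return ports * line_rate_gbps * 2


def required_ram_bandwidth_gbps(throughput_gbps: float, memory_touches: int) -> float:
    """RAM bandwidth if every byte is read or written `memory_touches` times.

    `memory_touches` is an assumed workload parameter, not a measured value.
    """
    return throughput_gbps * memory_touches


def pointer_table_bytes(entries: int, pointer_bits: int) -> int:
    """Memory consumed by a table holding `entries` pointers of the given width."""
    return entries * pointer_bits // 8


if __name__ == "__main__":
    throughput = switch_throughput_gbps(ports=8, line_rate_gbps=10)
    print("delivered traffic (Gbps):", throughput)
    # Assumption: each byte is touched about 10 times (the "order of magnitude").
    print("RAM bandwidth needed (Gbps):", required_ram_bandwidth_gbps(throughput, 10))
    # Assumption: a hypothetical table of one million pointers.
    print("32-bit pointers (bytes):", pointer_table_bytes(1_000_000, 32))
    print("64-bit pointers (bytes):", pointer_table_bytes(1_000_000, 64))
```

Under these assumptions the pointer table is exactly half the size with 32-bit pointers, which is the trade-off the comment describes.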
 
    | GoblinSlayer wrote:
    | It's E76 with F set, and F set is huge compared to RV64I. And
    | the article proposes HPC as possible application.
 
    | rjsw wrote:
    | Routers need quite a bit of memory to handle IPv6.
    | 
    | Switches as an application of this makes sense.
 
    | jandrese wrote:
    | IPv6 address comparison on a 32 bit design is fairly awkward.
    | Switches won't care, but routers need to make routing
    | decisions.
 
    | foobiekr wrote:
    | While these are all good points, this really does not appear
    | to be a competitive NPU design on any axis that matters. I
    | don't know what this chip is for, but a router NPU it is not,
    | nor a switch. Maybe some soho switch or smart NIC, but those
    | have moved on far along the performance spectrum away from
    | the place where this would fit.
 
  | zibzab wrote:
  | Yeah, this seems like an odd move to me.
  | 
  | For this kind of application the static leakage of the newer
  | & smaller node will probably hurt rather than help.
 
  | justincormack wrote:
  | I think it means 32 bit floating point, not 32 bit CPU, as it
  | mentions "other relatively simplistic applications that do not
  | require full precision" but it's a bit unclear.
 
    | phendrenad2 wrote:
    | The quote that stands out to me is that the core is "ideal
    | for applications which require high performance -- but have
    | power constraints (e.g., Augmented Reality and Virtual
    | Reality , IoT Edge Compute, Biometric Signal Processing, and
    | Industrial Automation)."
 
  | Fordec wrote:
  | With my industry / product management / business strategy hat
  | on, totally agree from SiFive's perspective.
  | 
  | With my early days electronics hat on, the 5nm process adds
  | additional energy performance gains that in conjunction with
  | RISCV in an embedded environment, especially in a battery
  | powered remote operation use case, has me salivating at what
  | could be achieved from a would-be customer perspective.
 
| volta83 wrote:
| HBM2 is like 2Tb/s, how is HBM3 7GB/s ?
 
  | hajile wrote:
  | HBM3 wasn't just supposed to be about speed. It also offers a
  | 512-bit option that doesn't require a silicon interposer. I'd
  | guess this was added to make cheaper consumer GPU designs
  | possible.
  | 
  | I suspect they're using the HBM2 spec for the narrow bus and
  | cheaper interposer while keeping speeds lower and only using a
  | couple stacks instead of the 16 or so HBM2 stacks required for
  | those 2Tb/s speeds you mention. It makes sense given that their
  | chip likely couldn't use a huge amount of bandwidth anyway.
 
  | virtuallynathan wrote:
  | I think that's per-Pin bandwidth?
 
  | vmception wrote:
  | HBM3 was expected to be like 4Gb/s per pin which was seen as
  | double HBM2 per pin, so this is almost double that again,
  | which is good news
  | 
  | The HBM2 total memory bandwidth is like 2TB/s, just different
  | scale
  | 
  | Anyway I could totally be using wrong nomenclature and
  | terminology, feel free to discuss, these aren't assertions or
  | aren't strongly held assertions
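
The per-pin versus total distinction in this subthread can be worked through with a short Python sketch. The 7.2 Gbps per-pin rate is from the headline and the 512-bit option is from hajile's comment; the 1024-bit width is the usual HBM stack interface width and is only an assumption about this particular chip.

```python
def stack_bandwidth_gbytes_per_s(pin_rate_gbps: float, bus_width_bits: int) -> float:
    """Peak bandwidth of one memory stack in GB/s.

    Per-pin rate (gigabits per second) times the number of data pins,
    divided by 8 to convert bits to bytes.
    """
    return pin_rate_gbps * bus_width_bits / 8


if __name__ == "__main__":
    # Assumption: a full-width 1024-bit stack interface.
    full = stack_bandwidth_gbytes_per_s(7.2, 1024)
    # The narrower 512-bit option mentioned in the thread.
    narrow = stack_bandwidth_gbytes_per_s(7.2, 512)
    print(f"1024-bit stack: {full:.1f} GB/s")
    print(f" 512-bit stack: {narrow:.1f} GB/s")
```

So a 7.2 Gbps figure is consistent with a per-pin rate: multiplied across a wide bus it lands in the hundreds of GB/s per stack, and multi-stack designs reach the TB/s range.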
 
| throwaway4good wrote:
| What is the use case of this chip? I have the feeling it is some
| way away from a general purpose CPU / SOC like the Apple M1?
 
  | 01100011 wrote:
  | RTFA?
  | 
  | > The SoC can be used for AI and HPC applications and can be
  | further customized by SiFive customers to meet their needs.
  | Meanwhile, elements from this SoC can be licensed and used for
  | other N5 designs without any significant effort.
  | 
  | > The SoC contains the SiFive E76 32-bit CPU core(s) for AI,
  | microcontrollers, edge-computing, and other relatively
  | simplistic applications that do not require full precision.
 
    | throwaway4good wrote:
    | So it is a proof of concept / demo of subcomponents someone
    | else may license? Is that a correct interpretation?
 
      | sanxiyn wrote:
      | Yes.
 
| klelatti wrote:
| How did SiFive get anywhere near 5nm TSMC?
 
  | baq wrote:
  | perhaps paid some money when the process wasn't booked till the
  | end of time
 
  | lizknope wrote:
  | They pay money just like any other customer of TSMC. SiFive has
  | a lot of buzz in the industry. I wouldn't be surprised that
  | TSMC wanted to work with them.
  | 
  | But there are other intermediary companies that help startups
  | group multiple chips from multiple companies together into a
  | single mask. This is called a "shuttle" and allows the
  | companies to split the costs of the masks (I've heard up to $30
  | million for 5nm)
  | 
  | SiFive is probably building about 2,000 of these chips for
  | development boards. They aren't trying to order a hundred
  | million like Nvidia.
 
    | klelatti wrote:
    | Thanks that's very interesting. No intention in any way to
    | belittle SiFive - just puzzled as to how they managed to get
    | onto this process when it's obviously so much in demand. Good
    | for them!
 
  | RicoElectrico wrote:
  | For test chips there is something called shuttle.
  | 
  | Other than that, foundries are known to sponsor IP development
  | on their processes.
 
  | snypher wrote:
  | "The tape out means that the documentation for the chip has
  | been submitted for manufacturing to TSMC, which essentially
  | means that the SoC has been successfully simulated. The silicon
  | is expected to be obtained in Q2 2021."
  | 
  | Would this mean the actual chip delivery may still be delayed?
 
    | StringyBob wrote:
    | Chip manufacturing has many steps. For a new leading edge
    | process it may take 3-6 months to get silicon back after
    | submitting the design to a silicon foundry for manufacturing.
    | 
    | For a small volume 'shuttle' run hopefully there won't be
    | delays, but this is not the same as having working chips!
    | 
    | The foundry will do initial checks it is manufacturable at
    | 'tapeout' when you submit your design, but you don't know for
    | sure if your chip works with intended functionality until you
    | get it back! You are relying on lots and lots of simulations
    | up front before your 'tape-out'.
    | 
    | Sometimes issues are found and a chip requires a re-spin -
    | basically another go with the bugs fixed. You want to do this
    | as few times as possible (ideally right first time) due to
    | cost and time of these iterations.
 
  | gumby wrote:
  | It's also in TSMC's marketing interest to produce a small
  | number of RISC V parts with their latest process.
  | 
  | Plus it's probably fun for some of the people there.
 
| ohazi wrote:
| I know they're separate lines and capacity is sold well in
| advance and all that, but this chip shortage still baffles me.
| 
| A startup can tape out a 5 nm chip, but STMicroelectronics can't
| make any of their 40-130 nm microcontrollers for the next year?
| 
| Also car companies are supposedly the culprit, even though their
| volume is only in the low tens of millions per year, and the
| dustup is apparently over only six months of capacity? What? I
| get that the auto industry is a nice reliable long-term source of
| revenue for chip companies, but fabs should barely be sneezing at
| that sort of volume.
 
  | lizknope wrote:
  | I'm in the semiconductor industry.
  | 
  | I don't really understand your question.
  | 
  | Anyone can start a company and tape out a chip even in 5nm. My
  | previous startup did something similar. We used an intermediate
  | company between us and TSMC that specifically works with
  | smaller companies. They (or TSMC) will bundle together 4 to 20
  | chips into a common mask as a "shuttle" run. Shuttle runs are
  | really only used to get samples for the first version of your
  | chip. You can't really go to production with them because the
  | mask has chips from multiple different companies but this
  | allows all of the companies to share the mask costs (I've heard
  | up to $30 million for 5nm)
  | 
  | What is ST Micro talking about? I assume they can produce chips
  | but can't get the volume that they want. SiFive are probably
  | producing about 2,000 of these chips for development and test
  | boards. ST Micro would be buying in the hundreds of millions or
  | tens of billions range.
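
To put the shuttle economics above into numbers, here is a minimal Python sketch. The $30 million mask cost and the 4 to 20 designs per shuttle are the figures quoted in the comment; the even split is an assumption made for illustration, since real shuttle pricing may depend on die area or other terms.

```python
def mask_cost_share_usd(mask_set_cost_usd: float, designs_on_shuttle: int) -> float:
    """Each participant's share of the mask cost, assuming an even split."""
    if designs_on_shuttle < 1:
        raise ValueError("a shuttle needs at least one design")
    return mask_set_cost_usd / designs_on_shuttle


if __name__ == "__main__":
    mask_cost = 30_000_000  # "up to $30 million for 5nm", as heard by the commenter
    for designs in (1, 4, 20):
        share = mask_cost_share_usd(mask_cost, designs)
        print(f"{designs:>2} design(s) on the mask: ${share:,.0f} each")
```

Even at 20 designs per mask the share is far from cheap, but it is a small fraction of what a dedicated mask set would cost.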
 
    | bogomipz wrote:
    | >" Shuttle runs are really only used to get samples for the
    | first version of your chip."
    | 
    | Is a "tape out" the same thing as a shuttle run/sample chip
    | run?
 
      | Kliment wrote:
      | a "tape out" is the process of transforming a design into a
      | physical die - i.e. a manufacturing run. It's when you hand
      | over a design to a foundry to do their thing with it.
 
    | zibzab wrote:
    | Sounds like OSH Park for silicon...
    | 
    | Anyway, I'm still not sure why SiFive is doing this. Seems
    | like a waste of money even as a prototype
 
      | lizknope wrote:
      | The article mentions that it is from the OpenFive division
      | of SiFive. OpenFive used to be Open Silicon and their
      | business model was working with other companies to take
      | their Verilog RTL and do all of the physical design
      | (synthesis to logic gates, place and route of the standard
      | cells, timing analysis, test vector generation) and then
      | work with the foundries to deliver all of the data for
      | manufacturing.
      | 
      | Since Open Silicon is now OpenFive and part of SiFive they
      | literally have all this experience in house and don't need
      | to depend on another company between them and TSMC.
      | 
      | https://en.wikipedia.org/wiki/Open-Silicon
 
      | variaga wrote:
      | SiFive is in the business of selling IP cores and back-end
      | implementation services. The gold standard for IP core
      | validation is "silicon proven" i.e. that it's not just a
      | nice theoretical design on paper, but someone has actually
      | turned it into a physical chip and tested the real life
      | performance.
      | 
      | _Lots_ of people will try to sell you their designs and
      | services. Picking the wrong ones can waste millions of
      | dollars and months/years of time.
      | 
      | The money spent on this prototype buys SiFive credibility
      | for both aspects of their business (assuming the chip
      | works) - "we were able to do this for ourselves, so you
      | know we'll be able to do it for you".
      | 
      | So it's not a waste, it's a marketing expense, and a
      | necessary one.
 
    | varispeed wrote:
    | Out of curiosity - what software is being used to design
    | chips? Is there anything within reach of a small company, or
    | something open source?
 
      | thechao wrote:
      | Front-end is HDLs -- (System)Verilog, VHDL, etc.
      | Implementation and formal will be Jasper & its ilk. Backend
      | (physical, etc.) use fab-specific bespoke software from the
      | majors (Cadence, NXP, MG, Synopsys, ...).
      | 
      | The front-end stuff could be done by _one person_;
      | Verilator is a great example (although it's now "in house"
      | to NXP). Implementation, LEC, etc. are mathematically
      | intimidating -- they're proof engines -- but doable by a
      | small team.
      | 
      | Physical _requires_ inside knowledge of the fabs. The fabs
      | aren't going to let you participate unless you're a major,
      | because it costs them a lot of money, and each additional
      | participant is another potential leak of their critical IP.
      | 
      | The tooling is all "vertical" and starts on the backend. If
      | you can't do backend, you're not a player.
 
      | jecel wrote:
      | The commercial tools are indeed very expensive but the
      | required data files can be as much of a problem. Normally
      | you have to sign a bunch of NDAs (non disclosure
      | agreements) to get your hands on the design rules and
      | standard cell libraries supplied by the foundries and
      | required to make the tools work.
      | 
      | One effort to organize several previously available open
      | source tools into a practical system is OpenLane, which is
      | based on the DARPA OpenRoad project:
      | 
      | https://woset-workshop.github.io/PDFs/2020/a21.pdf
      | 
      | Recently, Google has financed a project where a foundry has
      | made its data files available without any NDAs:
      | 
      | https://github.com/google/skywater-pdk
      | 
      | The combination has made it possible to have completely
      | open source chip designs.
 
  | PragmaticPulp wrote:
  | > Also car companies are supposedly the culprit, even though
  | their volume is only in the low tens of millions per year, and
  | the dustup is apparently over only six months of capacity?
  | What? I get that the auto industry is a nice reliable long-term
  | source of revenue for chip companies, but fabs should barely be
  | sneezing at that sort of volume.
  | 
  | I agree. I think the blame on automakers has been blown out of
  | proportion. It doesn't make any sense that automakers cancelled
  | orders, then reinstated those orders again with some extra
  | demand, and now the entire chip market is stalled.
  | 
  | It's most likely due to the fact that consumer demand is up
  | everywhere. The pandemic didn't hit the economy nearly as hard
  | as expected, and we piled a lot of stimulus on top of that.
  | Savings rate went up a bit, but much discretionary spending was
  | diverted away from things like dining out and toward buying
  | consumer goods.
  | 
  | > STMicroelectronics can't make any of their 40-130 nm
  | microcontrollers for the next year
  | 
  | They're almost certainly making huge volumes of
  | microcontrollers, but they're all spoken for with orders from
  | the highest bidders.
  | 
  | We won't have inventory sitting on shelves again until fab
  | capacity isn't being 100% occupied by existing orders. Need
  | some surplus before we can get parts at DigiKey.
 
  | bravo22 wrote:
  | A lot of chips are made on mature fab lines because they don't
  | need the performance of 5nm lines or can't justify the mask
  | costs.
  | 
  | No one is investing in mature fab lines because they're not
  | leading edge and they're being run to amortize the initial
  | investment made into them years ago. Therefore not much
  | additional capacity for mature lines.
  | 
  | So yes you can see 5nm chips being taped out but the 40-130nm
  | chips are squeezed for capacity. Also this chip is likely not
  | running in the same crazy volumes as ST microcontrollers. It
  | is easier for TSMC to squeeze in a few dozen to a hundred
  | wafers for SiFive on their line.
 
    | dragontamer wrote:
    | > A lot of chips are made on mature fab lines because they
    | don't need the performance of 5nm lines or can't justify the
    | mask costs.
    | 
    | Alternatively: they're car-scale products dealing primarily
    | with high electric currents (10s or 100s of milliamps) and/or
    | higher voltages (5V instead of 1.3V).
    | 
    | Smaller chips use (and therefore output) less current than
    | larger scale chips. But if your goal is to output 10mA to
    | better drive an IGBT or other transistor anyway, then you
    | really prefer 40nm to 130nm ANYWAY, because those larger
    | sizes are just a lot better at moving those large currents
    | around.
    | 
    | Bigger wires mean bigger currents.
 
      | bravo22 wrote:
      | High voltage MOSFETs and IGBTs are built on a completely
      | different process. Size is definitely not an issue with
      | them. It is about exotic doping to create the desired
      | characteristics.
      | 
      | They're built using much larger feature sizes but on
      | completely separate lines.
 
        | dragontamer wrote:
        | I'm not really in the industry. But I know that high-
        | voltage MOSFETs / IGBTs need substantial amounts of
        | current to turn on / off adequately. Under typical use,
        | there's a dedicated chip called a "Gate Driver" that
        | provides that current, between a microcontroller and the
        | IGBT.
        | 
        | It's not that the IGBT / MOSFETs are built on these
        | microcontrollers. It's that the Gate-Driver can be
        | integrated into a microcontroller (simplifying the
        | circuit design and reducing the number of parts you need
        | to buy).
        | 
        | Under normal circumstances, a microcontroller can
        | probably source/sink 1mA (too little to adequately turn
        | on an IGBT). You amplify the 1mA with a gate-driver chip
        | into 100mA, and then the amplified 100mA is used to turn
        | on/off the IGBT.
        | 
        | By integrating a gate-driver into the microcontroller,
        | you save a part.
 
    | variaga wrote:
    | Your point is valid, but this is almost certainly a shuttle
    | run, so it won't be even one full wafer.
 
      | bravo22 wrote:
      | You're right. Definitely a "hot" wafer for the engineering
      | samples.
 
  | monocasa wrote:
  | ST fabs their own chips. If their fabs don't have the capacity,
  | it's a huge slog to tape them out to a radically different
  | process at another company.
 
  | Kliment wrote:
  | This is an extremely low volume prototype run. You can get
  | those scheduled on short notice. Fabs love them because they
  | can do process optimization using them, without impacting
  | production customers. They're ridiculously expensive per-die
  | and you commit to accept a much higher failure rate than
  | normal.
  | 
  | ST can and is making microcontrollers. It's just that they've
  | sold their production for a year ahead, before it's even been
  | manufactured. Car companies fucked everyone over by flipping a
  | large volume of orders back and forth causing bullwhip effect
  | on the whole industry, and lots of knock-on effects in other
  | industries who suddenly got told (occasionally too late) that
  | they need to plan their inventory a year ahead because they
  | can't get anything at short notice anymore. Car companies'
  | vehicle production volume is tens of millions, but each vehicle
  | has thousands to tens of thousands of ICs. The six months you
  | are mentioning are not the capacity period, they are the _lead
  | times_ involved.
  | 
  | I don't want to repeat the whole story but I wrote a comment
  | about this on another thread. See
  | https://news.ycombinator.com/item?id=26659709
 
    | jankeymeulen wrote:
    | Thousands to tens of thousands per car? I think you're off by
    | an order of magnitude.
 
      | rowanG077 wrote:
      | What? You think it's tens of thousands to a hundred thousand?
      | A hundred thousand seems excessive to me.
 
      | buildbot wrote:
      | I know a typical Mercedes has roughly a hundred individual
      | computers, not too far-fetched to think the average chip
      | count could be 10 or higher per device on the CAN bus.
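
The estimate above is easy to sanity-check with a few lines of Python. The roughly 100 computers per car and 10 chips per device are the commenter's numbers; the 10 million vehicles per year is an illustrative lower bound for the "tens of millions" mentioned elsewhere in the thread, not a sourced production figure.

```python
def chips_per_car(computers: int, chips_per_computer: int) -> int:
    """Total ICs in one vehicle, given a count of computers and chips in each."""
    return computers * chips_per_computer


def yearly_chip_demand(vehicles_per_year: int, chips_each: int) -> int:
    """Total ICs needed per year for a given production volume."""
    return vehicles_per_year * chips_each


if __name__ == "__main__":
    per_car = chips_per_car(computers=100, chips_per_computer=10)
    print("chips per car:", per_car)
    # Assumption: 10 million vehicles per year as a lower bound.
    print("chips per year:", yearly_chip_demand(10_000_000, per_car))
```

With these inputs the per-car count lands at the low end of the "thousands to tens of thousands" range, and the yearly total is in the billions of parts.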
 
      | mschuster91 wrote:
      | Almost everything in a car has a _number_ of chips. Power
      | regulations, communication buses... and in electric cars
      | with thousands of batteries, _at least_ one chip per
      | battery for protection.
 
        | osamagirl69 wrote:
        | This is blatantly false, unless you are confusing battery
        | for an assembled battery pack. In EVs each battery
        | management IC can run somewhere in the range of 4-14
        | cells in series per chip, and they almost universally run
        | banks of up to 100 cells in parallel. For example, in the
        | Tesla Model S the pack is divided into submodules of 76
        | cells in parallel and 6 of those groups in series per
        | management chip--so only one management chip per 456
        | cells.
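
The cell-to-chip ratio described above can be sketched in Python. The 76-parallel by 6-series grouping is the commenter's figure; the 7,000-cell pack size is a hypothetical round number chosen only to show the scale, not a specification for any real vehicle.

```python
import math


def cells_per_management_chip(parallel: int, series: int) -> int:
    """Cells covered by one management IC: parallel bank size times series groups."""
    return parallel * series


def management_chips_needed(total_cells: int, parallel: int, series: int) -> int:
    """Number of management ICs for a pack, rounding up to cover every cell."""
    return math.ceil(total_cells / cells_per_management_chip(parallel, series))


if __name__ == "__main__":
    print("cells per chip:", cells_per_management_chip(76, 6))
    # Assumption: a hypothetical pack of 7,000 cells.
    print("chips for 7000 cells:", management_chips_needed(7000, 76, 6))
    # For comparison, one chip per cell would need 7,000 chips.
```

Under these assumptions the pack needs on the order of tens of management chips rather than thousands.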
 
        | dragontamer wrote:
        | Electric cars have ONE battery with thousands of *cells*.
        | I do realize that the colloquial term for "cell" is
        | "battery" (ex: an AA cell is called a battery), but it
        | becomes important to be precise with our words when
        | talking about manufacturing.
        | 
        | Small scale Li-ion does a protection-IC per cell (ex:
        | cell phones), mostly because cell phones are so small
        | they only use one cell.
        | 
        | Larger scale Li-ion, such as Laptop batteries, may use
        | one-IC per cell, OR one-protection IC for all 3x or 4x
        | cells combined. As long as all the cells are soldered
        | together, one protection IC is cheaper and still usable.
        | 
        | At electric-car scales, you have thousands-and-thousands
        | of cells. You can't just manage all of them with one IC,
        | so you build an IC per bundle. Maybe 48 cells or
        | 100-cells per IC or so.
 
        | mschuster91 wrote:
        | Indeed yes I meant cells, I'm not a native English
        | speaker.
        | 
        | > At electric-car scales, you have thousands-and-
        | thousands of cells. You can't just manage all of them
        | with one IC, so you build an IC per bundle. Maybe 48
        | cells or 100-cells per IC or so.
        | 
        | Ah okay, I had more expected something on the order of 1
        | IC per 4 cells to allow individual cell health
        | monitoring.
 
        | dragontamer wrote:
        | > Indeed yes I meant cells, I'm not a native English
        | speaker.
        | 
        | You're doing fine. Native English speakers don't know the
        | difference between cell or battery either. This is more
        | of a precise / technical engineering distinction.
        | 
        | * 9V Battery (https://imgur.com/FHJdhIK), a collection of
        | 6x cells.
        | 
        | * AAAA Cell (one singular chemical reaction of 1.5V)
        | 
        | Notice that the imgur is wrong: they call it a AAAA
        | battery (when the proper term is a AAAA cell).
        | 
        | --------
        | 
        | "Battery" is a bunch of objects doing one task.
        | Originally, a "battery" described cannons. Or two rooks
        | (in chess) that work together. Or... 6x 1.5V cells
        | working together to produce a 9V battery.
 
    | ohazi wrote:
    | > Fabs love them because they can do process optimization
    | using them, without impacting production customers.
    | 
    | I didn't realize that, but it makes a lot of sense. I assumed
    | that they acted more like the downstream manufacturers that
    | I'm used to dealing with, that don't even want to talk to you
    | unless they think you're going to place a huge order.
 
| winter_blue wrote:
| HBM might be an interesting idea. I would love to see multiple
| bandwidth levels of memory becoming a norm, with computers having
| a small amount of very fast memory and a larger pool of DDR4 or
| DDR5. We already have multiple levels of cache, why not have
| multiple levels of RAM? Operating systems and software would need
| to accommodate a new reality where NUMA is the norm though. But
| it's good that we even have the concept of NUMA, so this is not
| entirely uncharted/unfamiliar territory.
 
  | wmf wrote:
  | You would love to see computers become harder to program?
 
    | makapuf wrote:
    | It can be nice to have the opportunity to program something
    | harder but faster. Counter example: Itanium, which was too
    | hard to program (compilers) for.
 
      | sanxiyn wrote:
      | It is kind of ironic that compiler theory has advanced and
      | now we can target Itanium no problem. It was a bit (well, a
      | lot) ahead of its time.
 
    | winter_blue wrote:
    | I would try to build a new compiler (or an LLVM intermediary
    | processing layer) that does NUMA optimizations.
 
___________________________________________________________________
(page generated 2021-04-16 22:01 UTC)