[HN Gopher] New SiFive RISC-V core P650 with 40% IPC increase
___________________________________________________________________
 
New SiFive RISC-V core P650 with 40% IPC increase
 
Author : FullyFunctional
Score  : 131 points
Date   : 2021-12-02 16:21 UTC (6 hours ago)
 
web link (www.sifive.com)
w3m dump (www.sifive.com)
 
| snvzz wrote:
| Some context: RISC-V Summit is next week, and RISC-V
| International has just approved a batch of important
| extensions[0]. With these extensions, RISC-V is not missing
| anything relative to ARM and x86 ISAs in terms of functionality.
| 
| I expect a lot of tape-outs to happen this month, as core vendors
| were probably holding for the announced ratifications, in fear of
| last minute changes. Next year is going to be exciting.
| 
| [0]: https://riscv.org/announcements/2021/12/riscv-
| ratifies-15-ne...
 
  | [deleted]
 
  | socialdemocrat wrote:
  | That is great news! Is there any friendly intro/coverage
  | anywhere of the new vector extension?
  | 
  | I am curious about the final design. It would be interesting to
  | hear how people think it compares with ARM's Scalable Vector
  | Extension (SVE).
 
    | snvzz wrote:
    | There have been a few talks on the topic. They're archived on,
    | e.g., YouTube.
    | 
    | I like it. It's fairly simple and clean, yet powerful.
    | 
    | There was also some discussion here on HN months ago about
    | an article comparing the RISC-V V extension and ARM SVE.
    | 
    | The article itself got several things wrong about V, but the
    | discussion[0] was interesting.
    | 
    | [0] https://news.ycombinator.com/item?id=27063748
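For readers wondering what makes the V extension "simple and clean", its central idea is vector-length-agnostic strip-mining. Each loop iteration requests a number of elements (the application vector length, AVL), and `vsetvli` grants `vl` elements, up to VLMAX = VLEN * LMUL / SEW. The Python sketch below only emulates that control flow. It assumes a simple `vl = min(AVL, VLMAX)` policy (real implementations have some latitude when AVL is below twice VLMAX). The helper names `vlmax`, `vsetvl` and `vector_add` are hypothetical.

```python
def vlmax(vlen_bits: int, sew_bits: int, lmul: int = 1) -> int:
    """Elements per vector register group: VLEN * LMUL / SEW."""
    return vlen_bits * lmul // sew_bits


def vsetvl(avl: int, vmax: int) -> int:
    """Assumed grant policy: min(AVL, VLMAX) elements this iteration."""
    return min(avl, vmax)


def vector_add(a, b, vlen_bits=256, sew_bits=32):
    """Strip-mined element-wise add; the same loop works for any VLEN."""
    n = len(a)
    out = [0] * n
    vmax = vlmax(vlen_bits, sew_bits)
    i = 0
    strips = 0
    while i < n:
        vl = vsetvl(n - i, vmax)
        # One "vector instruction" processes vl elements at once.
        out[i:i + vl] = [x + y for x, y in zip(a[i:i + vl], b[i:i + vl])]
        i += vl
        strips += 1
    return out, strips
```

The same binary loop runs unchanged on a 128-bit or 256-bit implementation. Only the number of strips changes, which is the main contrast usually drawn with fixed-width SIMD.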
 
      | [deleted]
 
  | monocasa wrote:
  | I wouldn't say RISC-V isn't missing anything. The lack of
  | add/subtract with carry is an issue for efficient runtime of
  | many JITed languages like JavaScript.
  | 
  | That being said, I don't think it's the worst thing in the
  | world, as some do. The focus now should be on compiled code,
  | since JITs by definition can decide at runtime whether some
  | future extension that fixes this deficiency exists or not. The
  | J extension has stalled for the moment, but with these other
  | extensions ratified, there should hopefully be more bandwidth
  | available.
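To make the add-with-carry complaint concrete: x86 (`add`/`adc`) and ARM (`adds`/`adcs`) chain a carry flag through multi-word arithmetic. RISC-V has no flags register, so each limb's carry has to be recomputed with an unsigned compare, the way an `sltu` instruction would. The Python sketch below emulates that on 64-bit limbs. The function names are hypothetical and the code is not taken from any JIT.

```python
MASK64 = (1 << 64) - 1


def add_with_carry_flagless(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """Add two 64-bit limbs plus carry_in, returning (sum, carry_out).

    With no carry flag, wrap-around is detected by unsigned comparison,
    mirroring what `sltu` does on RISC-V.
    """
    s = (a + b) & MASK64
    c1 = 1 if s < a else 0          # a + b wrapped        (sltu c1, s, a)
    s2 = (s + carry_in) & MASK64
    c2 = 1 if s2 < s else 0         # + carry_in wrapped   (sltu c2, s2, s)
    # At most one of c1/c2 can be set, so OR-ing them gives the carry out.
    return s2, c1 | c2


def add_multiword(x: list[int], y: list[int]) -> tuple[list[int], int]:
    """Little-endian multi-limb addition built from the flagless step."""
    out, carry = [], 0
    for a, b in zip(x, y):
        s, carry = add_with_carry_flagless(a, b, carry)
        out.append(s)
    return out, carry
```

Each limb costs two adds and two compares (plus an OR) instead of a single `adc`. That is roughly why the equivalent RISC-V sequences are longer.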
 
    | teruakohatu wrote:
    | Can't vendors making desktop/mobile-class CPUs detect the
    | equivalent pattern and optimize it in microcode or silicon?
    | 
    | Or is that what we are trying to get away from?
 
      | monocasa wrote:
      | Maybe, but it's a leap, IMO. The equivalent patterns are 3x
      | as long, and modify tons of arch visible state for their
      | intermediate results which leaves more work for those
      | combined instructions to do.
      | 
      | The complaint is valid, IMO, and would show up on the
      | filtration test they used to come up with ops if they were
      | working with JITs too rather than just what's in AOT code.
 
| socialdemocrat wrote:
| Anyone able to put this in context? How fast are these cores
| compared to various ARM, Intel and AMD cores? At what level can
| they compete?
 
  | sanxiyn wrote:
  | > With a projected score of 11+ SPECInt2006/GHz, the SiFive
  | Performance P650 brings RISC-V into a new category of high-end
  | computing applications.
  | 
  | 11+ SPECInt2006/GHz is comparable to Apple Icestorm
  | microarchitecture. Apple Firestorm microarchitecture is roughly
  | 2x better at 22 SPECInt2006/GHz.
 
    | Symmetry wrote:
    | How impressive that number is rather depends on how many GHz
    | they're managing. In general, the slower you design your core
    | to clock, the faster you can make all your caches. Plus, the
    | slower you clock your core, designed in or not, the fewer
    | clock cycles it takes to talk to main memory.
 
    | pantalaimon wrote:
    | Mind you that raw core performance is not everything, memory
    | bandwidth and caches are crucial to make sure the CPU isn't
    | waiting for data all the time.
 
      | sanxiyn wrote:
      | Yes, but SPECint includes all such effects. As long as
      | SPECint benchmarks (such as GCC) are representative of your
      | workload, it works fine.
 
        | tlb wrote:
        | I trust that the Apple benchmarks include all such
        | effects. I'm less convinced that the RISC-V "projections"
        | include them. SPECint2006 is supposed to be measured with
        | real memory and an OS. Per-GHz numbers can't accurately
        | reflect main memory latency, since its speed doesn't
        | scale with the CPU clock.
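The point that per-GHz figures hide memory latency can be shown with a toy model. Compute time shrinks as the clock rises, but a fixed DRAM latency in nanoseconds does not, so it costs more cycles at higher clocks. In the sketch below, the 80 ns latency and the workload numbers are illustrative assumptions, not measurements of any real core. The function names are hypothetical.

```python
def mem_latency_cycles(mem_latency_ns: float, ghz: float) -> float:
    """A fixed DRAM latency costs more core cycles at a higher clock."""
    return mem_latency_ns * ghz


def runtime_ns(core_cycles: float, misses: float, ghz: float,
               mem_latency_ns: float) -> float:
    """Toy model: compute time scales with the clock, DRAM stall time does not."""
    return core_cycles / ghz + misses * mem_latency_ns


def score_per_ghz(core_cycles: float, misses: float, ghz: float,
                  mem_latency_ns: float) -> float:
    """Score taken as 1/runtime (arbitrary units), then divided by the clock."""
    return 1.0 / runtime_ns(core_cycles, misses, ghz, mem_latency_ns) / ghz


if __name__ == "__main__":
    # Illustrative workload only: 1e9 core cycles and 1e6 DRAM misses.
    for ghz in (1.0, 2.0, 3.0):
        print(ghz, mem_latency_cycles(80.0, ghz),
              score_per_ghz(1e9, 1e6, ghz, 80.0))
```

With zero misses, the per-GHz score is flat across clocks. With any misses it declines as the clock rises, so a per-GHz projection tends to flatter a design that ends up running at a higher frequency.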
 
        | spear wrote:
        | Right, and "per GHz" numbers are also not very useful
        | because you can't just crank up the GHz when you need
        | performance. Even with the same process technology, you
        | can't assume different microarchitectures will max out at
        | the same frequency.
 
  | sebow wrote:
  | If I recall correctly, the SiFive Unmatched is still pretty slow
  | compared to ARM (
  | https://www.phoronix.com/scan.php?page=article&item=hifive-u...
  | ). Now, this board is not the one in question (P650), but we'll
  | have to observe upcoming benchmarks [for which I recommend
  | Phoronix].
  | 
  | Obviously you can't even think about comparing it further with
  | Intel & AMD, but when you look at the history of something like
  | ARM (which I believe is 30-40 years old), RISC-V has come a long
  | way pretty fast, and the good thing is that it's a solid choice
  | for the future due to being open.
 
| sebow wrote:
| Sweet, are there any resources on transitioning/migrating, or on
| the differences between x86_64 and RISC-V? Or are the ISAs so
| drastically different that it's just better to dive in head-first?
 
| bruce343434 wrote:
| > With a projected score of 11+ SPECInt2006/GHz
| 
| That seems to imply a certain integer arithmetic performance, but
| I wonder what the floating point performance is. They could have
| just said "X flops".
| 
| Comparing to other benchmarks at [1], I have no idea, because
| they all report non-normalized results, i.e. totals, rather than
| per GHz per core. Nice reporting.
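Putting the P650's per-GHz claim next to published SPEC totals takes a rough normalization. Divide a single-copy (speed) SPECint2006 result by the clock. For a SPECint_rate2006 result, also divide by the number of copies, which only approximates per-core work. Which clock you use (base or boost) matters, and the P650's clock is not given in this thread, so any value plugged in is an assumption. The helper names are hypothetical.

```python
def to_per_ghz(score: float, ghz: float, copies: int = 1) -> float:
    """Rough normalisation of a published SPEC total to 'per GHz per core'.

    copies=1 for a speed result; for a rate result, dividing by the number
    of copies is only an approximation of per-core throughput.
    """
    return score / copies / ghz


def projected_total(score_per_ghz: float, ghz: float) -> float:
    """Turn a per-GHz figure back into a total at an assumed clock."""
    return score_per_ghz * ghz
```

For example, `projected_total(11.0, ghz)` gives the total that the "11+ SPECInt2006/GHz" figure implies at whatever clock you assume.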
| 
| How fast is this thing? Pentium? First-gen i3? Current-gen Ryzen
| 5? The fact that they are being so obtuse about it leads me to
| believe performance isn't great.
| 
| [1] https://www.spec.org/cgi-
| bin/osgresults?conf=cint2006;op=dum...
 
  | wmf wrote:
  | I'd compare it to an Atom "efficiency" core.
 
| marcodiego wrote:
| Faster than ARM A-77:
| https://www.phoronix.net/image.php?id=2021&image=sifive_p650... .
| Performance comparable to Apple Icestorm architecture, the
| 'efficiency' cores in M1. Considering A-710 is the fastest ARM
| core currently available and its successor will only be available
| next year, SiFive is just a few years away from real competition
| in an arena currently dominated by ARM.
| 
| This will be beautiful to watch.
 
  | [deleted]
 
  | zozbot234 wrote:
  | It will be interesting to see a comparison on power-efficiency
  | as well as performance. RISC-V implementations have shown a
  | pretty sizeable advantage wrt. power use in the past, and we
  | don't quite know how this advantage compares in these larger,
  | performance-focused designs.
 
  | dmitrygr wrote:
  | > just a few years before real competition starts
  | 
  | Are you assuming the competition will just sit and do nothing?
 
    | GhettoComputers wrote:
    | "Good enough" matters more than benchmarks. They can make
    | supercomputers but it doesn't matter to someone who wants a
    | $100 computer.
 
      | dmitrygr wrote:
      | All RISC-V thingies I see today are decidedly not $100. I do
      | see plenty of ARM designs running Linux under $10, though.
 
| baybal2 wrote:
| This is something genuinely interesting from the RISC-V crowd for
| the first time.
 
| danielEM wrote:
| Once it gets to the shelves at a reasonable price, I will be happy
| to work with/on it.
| 
| Curious how IP pricing compares to ARM in this case, and how much
| I would need to put on top of it to tape out my own batch of
| processors.
 
  | snvzz wrote:
  | The license to the ISA itself is free.
  | 
  | There are several vendors besides SiFive offering cores for
  | licensing. There's even some OSHW cores that can be freely
  | used.
  | 
  | Even if we choose to ignore the technical prowess of being a
  | true 5th generation RISC ISA built with hindsight no other ISA
  | has, what's IMHO a big deal in RISC-V is the mere availability
  | of this market of cores.
  | 
  | It poses a threat to ARM's business model, where ARM licenses
  | cores and the ISA, but nobody other than ARM can license cores
  | to others.
 
    | Teknoman117 wrote:
    | As far as OSHW cores go, it's so very nice to be able to
    | throw something together in verilog and be able to inherit a
    | compiler and not be trampling on someone else's copyright...
 
    | dmitrygr wrote:
    | > built with hindsight no other ISA has
    | 
    | Why do all the RISC-V fans conveniently ignore aarch64 when
    | they make statements like this? It was in fact a completely
    | clean new design, based on hindsight, by people who know what
    | they are doing, and with no legacy Cruft.
 
      | FullyFunctional wrote:
      | I'm a fan of RISC-V but the freedom is a large part of it.
      | Aarch64 _is_ a very well designed ISA and _clearly_ has a
      | lot of benefit of hindsight. The load pair/store pair
      | instructions, the addressing modes, fixed 32-bit
      | instruction size, etc. It all really helps. I suspect that
      | Apple was actively part of designing it.
      | 
      | I think however that RISC-V isn't that much worse and
      | because of the freedom we will almost certainly see more
      | implementation of RISC-V. I'd be watching Tenstorrent,
      | SiFive, Rivos, Esperanto, and maybe Alibaba/T-Head.
 
      | brucehoult wrote:
      | Aarch64 obviously _isn't_ a completely clean-sheet design.
      | It was constrained by having to execute on the same CPU
      | pipelines as 32 bit code, at least for the first decade or
      | so. And the 32 bit mode has to perform well. There are tens
      | of millions of Raspberry Pi 3s and 4s (and later model Pi
      | 2s) which have 64 bit CPUs but have never seen a 64 bit
      | instruction in their lives. Android phones have been
      | supporting both 32 and 64 bit apps for a long time.
      | 
      | The "by people who know what they are doing" thing is just
      | pure FUD. Sure, ARM employs some competent people, but no
      | more so than IBM, Intel, AMD or the various members of
      | RISC-V International.
 
      | snvzz wrote:
      | >Why do all the riscv fans Conveniently ignore aarch64 when
      | they make statements like this? It was in fact a completely
      | clean new design, based on hindsight, by people who know
      | what they are doing, and with no legacy Cruft.
      | 
      | aarch64 seems poorly designed to me.
      | 
      | ARMv7 had Thumb, but for some reason ARMv8 did not
      | incorporate any lessons from that. As a result, code
      | density is bad; ARMv8 binaries are huge.
      | 
      | ARMv9, to be available in chips next year, is just a higher
      | profile of required extensions, and does nothing to fix
      | that.
      | 
      | Ever wonder why M1 needs such huge L1 cache? Well, now you
      | know.
      | 
      | Considering ARMv9 will be competing against RVA22, I don't
      | have much hope for ARM.
 
        | dmitrygr wrote:
        | > for some reason ARMv8 did not incorporate any lessons
        | from that.
        | 
        | I used to think so too, until I asked some more
        | knowledgeable people about it. Turns out the lesson _IS_
        | that not having it is better. Fixed-size instructions
        | make decoding significantly simpler, making it much
        | easier to build very wide front ends.
 
        | brucehoult wrote:
        | A little easier, not much easier. A number of
        | organisations are making very wide RISC-V
        | implementations, and one has already published how their
        | decoder works. It's modular, with each block looking at
        | 48 bits of code (the first 16 overlapping with the
        | previous block) and decoding either two 16 bit
        | instructions, or one aligned 32 bit instruction, or one
        | misaligned 32 bit instruction with a following 16 bit
        | instruction, or one misaligned 32 bit instruction
        | followed by an ignored start of another misaligned 32 bit
        | instruction.
        | 
        | You can put as many of these modules side by side as you
        | want. There is a serial dependency between them in that
        | each block has to tell the next block whether its last 16
        | bits are the start of a misaligned 32 bit instruction or
        | not. That could become an issue with really, really wide
        | decoders, but for something decoding e.g. 16 bytes at a time (4 to
        | 8 instructions) it's not an issue.
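The modular decoder described above can be modelled in a few lines. Code is treated as 16-bit parcels, and a parcel whose low two bits are `0b11` starts a 32-bit instruction. Longer reserved encodings are ignored, as in the description. Block k owns parcels 2k and 2k+1 and receives one bit from the previous block: did its last parcel start a 32-bit instruction? The sketch also handles "16-bit instruction followed by the start of a 32-bit one", which the summary above doesn't spell out. It is sequential Python, whereas hardware would evaluate blocks side by side and resolve the one-bit chain. The function names are hypothetical.

```python
def is_32bit(parcel: int) -> bool:
    """RISC-V length rule, 16/32-bit only: low two bits 0b11 means 32-bit."""
    return parcel & 0b11 == 0b11


def decode_block(parcels, k, carry_in):
    """Decode block k, which owns parcels 2k and 2k+1.

    carry_in is True when parcel 2k-1 (the previous block's last parcel)
    started a 32-bit instruction that spills into this block.
    Returns (instructions, carry_out); instructions are (start, n_parcels).
    """
    lo, hi = 2 * k, 2 * k + 1
    insns = []
    if carry_in:
        insns.append((lo - 1, 2))      # misaligned 32-bit straddling blocks
    elif is_32bit(parcels[lo]):
        insns.append((lo, 2))          # aligned 32-bit fills the block
        return insns, False
    else:
        insns.append((lo, 1))          # 16-bit instruction in the first slot
    if is_32bit(parcels[hi]):
        return insns, True             # start of a 32-bit insn: pass it on
    insns.append((hi, 1))              # 16-bit instruction in the second slot
    return insns, False


def decode(parcels):
    """Chain the blocks; the only cross-block signal is one carry bit."""
    assert len(parcels) % 2 == 0
    carry = False
    out = []
    for k in range(len(parcels) // 2):
        insns, carry = decode_block(parcels, k, carry)
        out.extend(insns)
    return out, carry
```

Each block's logic is a small fixed function of three parcels plus one input bit. That is the sense in which the extra complexity over a fixed-length decoder stays modest.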
        | 
        | There is a trade-off between a little bit of decoder
        | complexity and a lot of improved code density -- but
        | nowhere near to the same extent as say x86.
 
        | adrian_b wrote:
        | ARMv8 code density is quite good for a fixed-length ISA
        | and is of course much better than that of RISC-V.
        | 
        | RISC-V has only one good feature for code density, the
        | combined compare-and-branch instructions, but even this
        | feature was designed poorly, because it does not have all
        | the kinds of compare-and-branch that are needed, e.g. if
        | you want safe code that checks for overflows, the number
        | of required instructions and the code size explode. Only
        | unsafe code, without run-time checks, can have an
        | acceptable size in RISC-V.
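The overflow-check cost mentioned above is easiest to see side by side. On AArch64, a checked signed add is `adds` followed by `b.vs`. RISC-V has no overflow flag, and the commonly cited idiom for a general signed add (from the unprivileged spec's discussion of integer instructions) is `add t0, t1, t2; slti t3, t2, 0; slt t4, t0, t1; bne t3, t4, overflow`. The Python sketch below emulates that four-instruction check on 64-bit values. The function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def to_signed64(x: int) -> int:
    """Interpret the low 64 bits of x as a two's-complement integer."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x


def add_overflows(t1: int, t2: int) -> bool:
    """Signed 64-bit add overflow check in the add/slti/slt/bne style.

    add  t0, t1, t2        # wrapping add
    slti t3, t2, 0         # t3 = (t2 < 0)
    slt  t4, t0, t1        # t4 = (sum < t1)
    bne  t3, t4, overflow  # without overflow, the two always agree
    """
    t0 = to_signed64(t1 + t2)
    t3 = t2 < 0
    t4 = t0 < t1
    return t3 != t4
```

Without overflow, the sum is less than `t1` exactly when `t2` is negative, so any disagreement between `t3` and `t4` signals a wrap.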
        | 
        | ARMv8 has an adequate unused space in the branch opcode
        | map, where combined compare-and-branch instructions could
        | be added, and with a larger branch offset range than in
        | RISC-V, in which case the code size advantage of ARMv8
        | vs. RISC-V would increase significantly.
        | 
        | While the combined compare-and-branch of RISC-V are good
        | for code density, because branches are very frequent, the
        | rest of the ISA is bad and the worst is the lack of
        | indexed addressing, which frequently requires 2 RISC-V
        | instructions instead of 1 ARM instruction.
 
        | brucehoult wrote:
        | I'm not sure how you missed RISC-V's big feature for code
        | density -- the "C" extension, giving it arbitrarily mixed
        | 16 and 32 bit opcodes.
        | 
        | I've heard of that feature before somewhere else. It gave
        | the company that invented it unparalleled code density in
        | their 32 bit systems and propelled them to the heights of
        | success in mobile devices. What was their name? Wait ..
        | oh, yes ... ARM.
        | 
        | Why they forgot this in their 64 bit ISA is a mystery.
        | The best theory I can come up with is that they thought
        | the industry had shaken out and amd64 was the only
        | competition they were going to have, ever. Aarch64 does
        | indeed have very good code density for a fixed-length 32
        | bit opcode ISA, and comes very close to matching amd64.
        | They may have thought that was going to be good enough.
        | 
        | Note: the RISC-V "C" extension is technically optional,
        | but the only CPU cores I know of that don't implement it
        | are academic toys, student projects, and tiny cores for
        | use in FPGAs where they are running programs with only a
        | few hundred instructions in them. Once you get over even
        | maybe 1 KB of code it's cheaper in resources to implement
        | "C" than to provide more program storage.
 
        | zozbot234 wrote:
        | The thing with lack of shifted indexed addressing is that
        | it just might not matter all that much beyond toy
        | examples. Address calculations can generally be folded in
        | with other code, particularly in loops which are a common
        | case. So it's only rarely that you actually need those
        | extra instructions.
 
        | adrian_b wrote:
        | Shifted indexed addressing is needed more seldom, but
        | indexed addressing, i.e. register + register, is needed
        | in every loop that accesses memory.
        | 
        | There are 2 ways of programming a loop that addresses
        | memory with a minimum of instructions.
        | 
        | One way, which is preferable e.g. on Intel/AMD, is to
        | reuse the loop counter as the index into the data
        | structure that is accessed, so each load/store needs a
        | base register + index register addressing, which is
        | missing in RISC-V.
        | 
        | The second way, which is preferable e.g. on POWER and
        | which is also available on ARM, is to use an addressing
        | mode with auto-update, where the offset used in loads or
        | stores is added into the base register. This is also
        | missing in RISC-V.
        | 
        | Because neither method works in RISC-V with the minimum
        | number of instructions, as it does in all the other CPUs,
        | all such loops, which are very frequent, need pairs of
        | instructions in RISC-V where the other CPUs need single
        | instructions.
 
        | brucehoult wrote:
        | A big difference here is that the RISC-V instructions are
        | usually all 16 bits in size while the Aarch64 and POWER
        | instructions are all 32 bits in size. So the code size is
        | the same.
        | 
        | Also, high performance Aarch64 and POWER implementations
        | are likely to be splitting those instructions into two
        | decoupled uops in the back end.
        | 
        | Performance-critical loops are unrolled on all ISAs to
        | minimise loop control overhead and also to allow
        | scheduling instructions to allow for the several cycle
        | latency of loads from even L1 cache. When you do that,
        | indexed addressing and auto-update addressing are still
        | doing both operations for every load or store which, as
        | well as being a lot of operations, introduces sequential
        | dependency between the instructions. The RISC-V way
        | allows the use of simple load/store with offset -- all of
        | which are independent of each other -- with one merged
        | update of each pointer at the end of the loop. POWER and
        | Aarch64 compilers for high performance microarchitectures
        | use the RISC-V structure for unrolled loops anyway.
        | 
        | So indexed addressing and auto-update addressing give no
        | advantage for code size, and don't help performance at
        | the high end.
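The three loop-addressing styles discussed in this exchange all generate the same address stream. They differ in how the addresses depend on each other. The Python sketch below models only the address arithmetic, not instruction counts or timing:

- Indexed addressing computes base + i * stride per access.
- Post-increment bumps the pointer after every access, which makes a serial chain.
- The unrolled, RISC-V-style loop uses fixed immediate offsets from one pointer and updates that pointer once per iteration.

The stride of 8 and the unroll factor of 4 are assumptions.

```python
def indexed(base, n, stride=8):
    """Loop counter reused as an index: base + i*stride for each access."""
    return [base + i * stride for i in range(n)]


def post_increment(base, n, stride=8):
    """Auto-update addressing: every access bumps the pointer (a serial chain)."""
    addrs, p = [], base
    for _ in range(n):
        addrs.append(p)
        p += stride
    return addrs


def unrolled_offsets(base, n, stride=8, unroll=4):
    """Fixed offsets from one pointer, with one pointer update per iteration."""
    assert n % unroll == 0
    addrs, p, updates = [], base, 0
    for _ in range(n // unroll):
        # These addresses depend only on p, so the loads are independent.
        addrs.extend(p + j * stride for j in range(unroll))
        p += unroll * stride
        updates += 1
    return addrs, updates
```

For 16 accesses, the unrolled form performs 4 pointer updates, whereas post-increment performs 16 dependent ones.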
 
        | snvzz wrote:
        | >in which case the code size advantage of ARMv8 vs.
        | RISC-V would increase significantly.
        | 
        | Many things could be said about ARMv8, but that it has
        | good code size is not one of them. It does, in fact, have
        | abysmal code density. Both RISC-V and x86-64 produce
        | significantly smaller binaries. For RISC-V, we're talking
        | about a 20% reduction of size.
        | 
        | There's a wealth of papers on this, but you can verify
        | this trivially yourself, by either compiling binaries for
        | different architectures from the same sources, or
        | comparing binaries in Linux distributions that support
        | RISC-V and ARM.
        | 
        | >where combined compare-and-branch instructions could be
        | added, and with a larger branch offset range than in
        | RISC-V
        | 
        | If your argument is that ARMv8 could get better over
        | time, I hate to be the bearer of bad news. ARMv9 code
        | density isn't any better.
        | 
        | >and the worst is the lack of indexed addressing, which
        | frequently requires 2 RISC-V instructions instead of 1
        | ARM instruction.
        | 
        | These patterns are standardized, and they become one
        | instruction after fusion.
        | 
        | RISC-V, unlike the previous generation of ISAs, was
        | thoroughly designed with hindsight on fusion. The
        | simplest microarchitectures can of course omit it
        | altogether, but the cost of fusion in RISC-V is low; I
        | have seen it quoted at 400 gates.
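Here is one illustration of what fusing a "standardized pattern" could mean. An `add` that forms an address, followed by a zero-offset `ld` into the same destination register, leaves no extra architectural state behind, so a front end could treat the pair as a single indexed-load macro-op. The Python peephole below recognizes only that pair. The instruction tuple format and the `ld.idx` macro-op name are hypothetical, and this is not how any particular core is known to work.

```python
def fuse_indexed_loads(insns):
    """Fuse `add rd, rs1, rs2` + `ld rd, 0(rd)` into one indexed-load macro-op.

    Instruction tuples (hypothetical format):
      ("add", rd, rs1, rs2)
      ("ld",  rd, offset, base)
    The pair is fused only when the load overwrites the add's result,
    so the intermediate sum is never architecturally visible.
    """
    out, i = [], 0
    while i < len(insns):
        cur = insns[i]
        nxt = insns[i + 1] if i + 1 < len(insns) else None
        if (nxt is not None and cur[0] == "add" and nxt[0] == "ld"
                and nxt[2] == 0 and nxt[3] == cur[1] and nxt[1] == cur[1]):
            _, rd, rs1, rs2 = cur
            out.append(("ld.idx", rd, rs1, rs2))   # hypothetical macro-op
            i += 2
        else:
            out.append(cur)
            i += 1
    return out
```

If the load's destination differs from the add's, the address sum stays live in a register, and the pair is left alone.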
 
        | brucehoult wrote:
        | Instruction fusion is a possibility for the future, which
        | has been discussed academically, but no one implements it
        | at present. I'm not sure anyone will -- it's too much
        | complexity for simple cores, and not needed for big OoO
        | cores.
        | 
        | The one fusion implementation I'm aware of is the SiFive
        | 7-series combining a conditional branch that jumps
        | forward over exactly one instruction. It turns the
        | instruction pair into predicated execution.
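The short-forward-branch case can be illustrated behaviorally. Take `bne rs1, rs2, skip; addi rd, rd, imm; skip:`. It gives the same register result as always computing `rd + imm` and committing it only when the branch would not have been taken, which needs no control-flow redirect. The Python below shows only that equivalence. It says nothing about how the 7-series implements it internally, and the function names and register dictionary are hypothetical.

```python
def branchy(regs, rs1, rs2, rd, imm):
    """bne rs1, rs2, skip ; addi rd, rd, imm ; skip:"""
    r = dict(regs)
    if r[rs1] != r[rs2]:
        return r                 # branch taken: the addi is jumped over
    r[rd] = r[rd] + imm
    return r


def predicated(regs, rs1, rs2, rd, imm):
    """Fused form: always compute, then conditionally commit the result."""
    r = dict(regs)
    candidate = r[rd] + imm
    r[rd] = candidate if r[rs1] == r[rs2] else r[rd]
    return r
```

The predicated form has no branch to mispredict. That is the usual motivation for turning a hard-to-predict short forward branch into a conditional operation.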
        | 
        | I agree with everything else. In particular the code
        | density. Anyone can download Ubuntu or Fedora images for
        | the same release for amd64, arm64, and riscv64. Mount
        | them and run "size" on any selection of binaries you
        | want. The RISC-V ones are consistently and significantly
        | smaller than the other two, with arm64 the biggest.
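If you try the `size` comparison described above, a small helper keeps it honest. It matches binaries by file name across the mounted images and compares only the text (code) segment. The sketch below parses GNU `size` output in its default Berkeley format (columns `text data bss dec hex filename`). Run `size` over the same set of binaries from each mounted image and pass the outputs in. The helper names and any mount paths are hypothetical, and no results are implied here.

```python
import os


def text_sizes(size_output: str) -> dict[str, int]:
    """Map basename -> text-segment bytes from GNU `size` (Berkeley format)."""
    sizes = {}
    for line in size_output.strip().splitlines():
        fields = line.split(None, 5)
        # Skip header rows and anything that isn't a numeric result line.
        if len(fields) == 6 and fields[0].isdigit():
            sizes[os.path.basename(fields[5])] = int(fields[0])
    return sizes


def text_ratio(numerator: dict[str, int], denominator: dict[str, int]) -> float:
    """Total text size of the binaries common to both images, as a ratio."""
    common = numerator.keys() & denominator.keys()
    if not common:
        raise ValueError("no binaries in common")
    return sum(numerator[f] for f in common) / sum(denominator[f] for f in common)
```

Restricting the comparison to binaries present in both images avoids skew from packages that exist on only one architecture.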
 
        | pohl wrote:
        | _Ever wonder why M1 needs such huge L1 cache? Well, now
        | you know._
        | 
        | I'm not sure I follow this, but it reminds me to ask:
        | does RISC-V allow for designs to have both efficiency &
        | performance cores like the ARM big.LITTLE concept? Has
        | anyone made one yet?
 
        | brucehoult wrote:
        | Of course you can do it. SiFive has been allowing
        | customers to configure core complexes with a mixture of
        | different core types for years -- for example mixing U84
        | cores with U74 or U54. If you want to do a big.LITTLE
        | thing with transferring a running program from one core
        | type to another that's just a software thing -- and using
        | cores with the same ISA but different microarchitecture.
        | 
        | To date the examples of this that have been shipped to
        | the public have used cores with similar
        | microarchitecture, but a different set of extensions.
        | 
        | For example the U54-MC in the HiFive Unleashed and in the
        | Microsemi Polarfire SoC FPGAs use four U54 cores plus one
        | E51 core for "real time" tasks. The E51 doesn't have an
        | FPU or MMU or Supervisor mode. The U74-MC in the HiFive
        | Unmatched is similar.
        | 
        | Alibaba's ICE SoC, which you may have seen videos of
        | running Android, has two C910 Out-of-Order cores (similar
        | to ARM A72/A73) implementing RV64GC, and a third C910
        | core that also has a vector processing unit with two
        | pipes with 256 bit vector ALU each, plus 128 bit vector
        | load and store pipes.
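When the cores in a complex implement different extensions, software has to place work only on harts that support what the code needs. An example is an E51-style hart without an FPU next to U54-style harts. The Python sketch below parses RISC-V ISA strings and filters harts by required single-letter extensions. The hart table is hypothetical: it assumes RV64IMAC for the E51-class hart and RV64GC for the others, and it assumes software can see a per-hart ISA string (a devicetree `riscv,isa` property is one such source). The parser stops at the first multi-letter extension.

```python
import re


def parse_isa(isa: str) -> tuple[int, frozenset[str]]:
    """Parse e.g. 'rv64gc' or 'rv64imac' into (XLEN, single-letter extensions)."""
    m = re.match(r"rv(32|64|128)([a-z]*)", isa.lower())
    if not m:
        raise ValueError(f"not a RISC-V ISA string: {isa!r}")
    xlen, letters = int(m.group(1)), m.group(2)
    exts = set()
    for ch in letters:
        if ch in "zsx":        # start of a multi-letter extension: stop here
            break
        if ch == "g":          # G is shorthand for IMAFD (plus Zicsr/Zifencei)
            exts.update("imafd")
        else:
            exts.add(ch)
    return xlen, frozenset(exts)


def eligible_harts(harts: dict[str, str], required: set[str]) -> list[str]:
    """Harts whose ISA string covers every required extension letter."""
    return sorted(h for h, isa in harts.items() if required <= parse_isa(isa)[1])
```

With `{"hart0": "rv64imac", "hart1": "rv64gc", "hart2": "rv64gc"}`, floating-point work (`{"f", "d"}`) lands only on `hart1` and `hart2`, while integer-only code can run anywhere.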
 
        | [deleted]
 
    | fartcannon wrote:
    | So I guess we should expect to hear a lot of FUD about RISC-V
    | over the coming years.
 
      | marcodiego wrote:
      | No need to wait. Already happened in 2018:
      | https://www.theregister.com/2018/07/10/arm_riscv_website/
      | 
      | https://www.extremetech.com/wp-
      | content/uploads/2018/07/arm-r...
 
        | snvzz wrote:
        | And it is how many learned about RISC-V's existence.
        | 
        | It will be a PR disaster long remembered. One for the
        | textbooks.
 
      | snvzz wrote:
      | This is a real possibility, albeit a sad one.
      | 
      | No amount of FUD will save ARM. Only pivoting into a
      | different business model could.
 
        | duskwuff wrote:
        | Honestly, ARM is fine. They're no longer the only game in
        | town, but they've still got a huge head start.
 
        | snvzz wrote:
        | They'll be fine if they focus on their microarchitectures
        | rather than the ISA (where IMHO they've already lost),
        | and make the process for obtaining a license much more
        | streamlined; I've heard it takes no less than 18 months
        | of long negotiations to license anything from ARM. That's
        | not sustainable now that there's competition.
 
        | duskwuff wrote:
        | That's already where their focus is. Most of ARM's
        | customers are licensing specific cores from ARM, not the
        | ISA as a whole.
 
| jaas wrote:
| Who exactly are the customers for this chip?
 
___________________________________________________________________
(page generated 2021-12-02 23:01 UTC)