|
| snvzz wrote:
| Some context: RISC-V Summit is next week, and RISC-V
| International has just approved a batch of important
| extensions[0]. With these extensions, RISC-V is not missing
| anything relative to ARM and x86 ISAs in terms of functionality.
|
| I expect a lot of tape-outs to happen this month, as core vendors
| were probably holding for the announced ratifications, in fear of
| last minute changes. Next year is going to be exciting.
|
| [0]: https://riscv.org/announcements/2021/12/riscv-ratifies-15-ne...
| [deleted]
| socialdemocrat wrote:
| That is great news! Is there any friendly intro/coverage
| anywhere of the new vector extension?
|
| I am curious about the final design. Would be interesting to
| hear how people think it compares with ARM's Scalable Vector
| Extension.
| snvzz wrote:
| There have been a few talks on the topic. They're archived on
| e.g. YouTube.
|
| I like it. It's fairly simple and clean, yet powerful.
|
| There was also some discussion here in HN months ago, about
| an article comparing RISC-V V extension and ARM SVE.
|
| The article itself got several things wrong about V, but the
| discussion[0] was interesting.
|
| [0] https://news.ycombinator.com/item?id=27063748
| [deleted]
| monocasa wrote:
| I wouldn't say RISC-V isn't missing anything. The lack of
| add/subtract with carry is an issue for efficient runtime of
| many JITed languages like JavaScript.
|
| That being said, I don't think it's the worst thing in the
| world like some do. The focus now should be on compiled code
| since JITs by definition can make runtime decisions on whether some
| future extension that fixes this deficiency exists or not. The
| J extension has stalled for the moment, but with these other
| extensions ratified there should be more bandwidth available
| hopefully.
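|
| A minimal sketch of the pattern in question, assuming 64-bit
| registers modeled with Python integers (the function name is
| illustrative, not taken from any JIT). An ISA with a carry flag can
| do a 128-bit add in two instructions (add, then add-with-carry);
| without one, the carry has to be recovered with an unsigned compare:

```python
MASK64 = (1 << 64) - 1  # model a 64-bit register


def add128_no_carry_flag(a_lo, a_hi, b_lo, b_hi):
    """Add two 128-bit values held as (lo, hi) 64-bit limbs, no carry flag."""
    lo = (a_lo + b_lo) & MASK64      # add
    carry = 1 if lo < a_lo else 0    # sltu: the sum wrapped, so a carry came out
    hi = (a_hi + b_hi) & MASK64      # add
    hi = (hi + carry) & MASK64       # add
    return lo, hi


if __name__ == "__main__":
    print(add128_no_carry_flag(MASK64, 0, 1, 0))
```

| That is four operations for two limbs. Longer chains also have to
| compute the carry out of every middle limb, which costs more again.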
| teruakohatu wrote:
| Can't vendors making desktop/mobile class CPUs detect the
| equivalent pattern and optimize it in microcode or silicon?
|
| Or is that what we are trying to get away from?
| monocasa wrote:
| Maybe, but it's a leap, IMO. The equivalent patterns are 3x
| as long, and modify tons of arch visible state for their
| intermediate results which leaves more work for those
| combined instructions to do.
|
| The complaint is valid, IMO, and would show up on the
| filtration test they used to come up with ops if they were
| working with JITs too rather than just what's in AOT code.
| socialdemocrat wrote:
| Anyone able to put this in context? How fast are these cores
| compared to various ARM, Intel and AMD cores? At what level can
| they compete?
| sanxiyn wrote:
| > With a projected score of 11+ SPECInt2006/GHz, the SiFive
| Performance P650 brings RISC-V into a new category of high-end
| computing applications.
|
| 11+ SPECInt2006/GHz is comparable to Apple Icestorm
| microarchitecture. Apple Firestorm microarchitecture is roughly
| 2x better at 22 SPECInt2006/GHz.
| Symmetry wrote:
| How impressive that number is rather depends on how many GHz
| they're managing. In general, the lower the clock speed you
| design for, the faster you can make all your caches. Plus the
| slower you clock your core, designed in or not, the lower the
| number of clock cycles it takes to talk to main memory.
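|
| To put rough numbers on the last point, a small conversion, assuming
| a fixed DRAM latency in nanoseconds (the 80 ns used here is an
| illustrative figure, not a measurement of any chip):

```python
def memory_latency_cycles(latency_ns, clock_ghz):
    """Core clock cycles spent waiting on one main-memory access.

    A clock of f GHz ticks f times per nanosecond, so a fixed latency
    in nanoseconds costs latency_ns * f cycles.
    """
    return latency_ns * clock_ghz


if __name__ == "__main__":
    for ghz in (1.0, 2.0, 4.0):
        print(ghz, "GHz ->", memory_latency_cycles(80.0, ghz), "cycles")
```

| Under that assumption, doubling the clock doubles the cycle cost of
| every trip to main memory.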
| pantalaimon wrote:
| Mind you that raw core performance is not everything, memory
| bandwidth and caches are crucial to make sure the CPU isn't
| waiting for data all the time.
| sanxiyn wrote:
| Yes, but SPECint includes all such effects. As long as
| SPECint benchmarks (such as GCC) are representative of your
| workload, it works fine.
| tlb wrote:
| I trust that the Apple benchmarks include all such
| effects. I'm less convinced that the RISC-V "projections"
| include them. SPECint2006 is supposed to be measured with
| real memory and an OS. Per-GHz numbers can't accurately
| reflect main memory latency, since its speed doesn't
| scale with the CPU clock.
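|
| A toy model of this objection, assuming run time splits into core
| cycles that scale with the clock and DRAM misses that cost a fixed
| number of nanoseconds. Every parameter here is hypothetical; none of
| it is SPEC data:

```python
def runtime_ns(core_cycles, dram_misses, clock_ghz, dram_latency_ns=80.0):
    """Run time: core work shrinks with the clock, DRAM stalls do not."""
    return core_cycles / clock_ghz + dram_misses * dram_latency_ns


def perf_per_ghz(core_cycles, dram_misses, clock_ghz, dram_latency_ns=80.0):
    """Performance (1 / run time) divided by clock frequency."""
    perf = 1.0 / runtime_ns(core_cycles, dram_misses, clock_ghz, dram_latency_ns)
    return perf / clock_ghz


if __name__ == "__main__":
    for ghz in (1.0, 2.0, 3.0):
        print(ghz, "GHz:", perf_per_ghz(1e9, 1e6, ghz))
```

| In this model the per-GHz figure falls as the clock rises whenever
| there are any DRAM misses, so a per-GHz score only means something
| alongside the frequency it was obtained at.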
| spear wrote:
| Right, and "per GHz" numbers are also not very useful
| because you can't just crank up the GHz when you need
| performance. Even with the same process technology, you
| can't assume different microarchitectures will max out at
| the same frequency.
| sebow wrote:
| If I recall correctly, the SiFive Unmatched is still pretty slow
| compared to ARM (
| https://www.phoronix.com/scan.php?page=article&item=hifive-u...
| ). Now, this board is not the one in question (P650), but we'll
| have to watch for upcoming benchmarks (for which I recommend
| Phoronix).
|
| Obviously you can't even think about comparing it further with
| Intel & AMD, but when you look at the history of something like
| ARM (which I believe is 30-40 years old), RISC-V has come a long way
| pretty fast, and the good thing is that it's a solid choice for the
| future due to being open.
| sebow wrote:
| Sweet, are there any resources on transitioning/migrating, or on the
| differences between x86_64 and RISC-V? Or are the ISAs so drastically
| different that it's just better to dive in head-first?
| bruce343434 wrote:
| > With a projected score of 11+ SPECInt2006/GHz
|
| That seems to imply a certain integer arithmetic performance, but
| I wonder what the floating point performance is. They could have
| just said "X flops".
|
| Comparing to other benchmarks at [1], I have no idea, because
| they all have denormalized results, so totals, rather than per
| GHz per core. Nice reporting.
|
| How fast is this thing? Pentium? First gen i3? Current gen Ryzen
| 5? The fact that they are being so obtuse about it leads me to
| believe performance isn't great.
|
| [1] https://www.spec.org/cgi-bin/osgresults?conf=cint2006;op=dum...
| wmf wrote:
| I'd compare it to an Atom "efficiency" core.
| marcodiego wrote:
| Faster than ARM A-77:
| https://www.phoronix.net/image.php?id=2021&image=sifive_p650... .
| Performance comparable to Apple Icestorm architecture, the
| 'efficiency' cores in M1. Considering A-710 is the fastest ARM
| core currently available and its successor will only be available
| next year, SiFive is just a few years before real competition
| starts in an arena currently dominated by ARM.
|
| This will be beautiful to watch.
| [deleted]
| zozbot234 wrote:
| It will be interesting to see a comparison on power-efficiency
| as well as performance. RISC-V implementations have shown a
| pretty sizeable advantage wrt. power use in the past, and we
| don't quite know how this advantage compares in these larger,
| performance-focused designs.
| dmitrygr wrote:
| > just a few years before real competition starts
|
| Are you assuming the competition will just sit and do nothing?
| GhettoComputers wrote:
| "Good enough" matters more than benchmarks. They can make
| supercomputers but it doesn't matter to someone who wants a
| $100 computer.
| dmitrygr wrote:
| All RISC-V thingies I see today are decidedly not $100. I do
| see plenty of ARM designs running Linux under $10, though.
| baybal2 wrote:
| This is the first time something genuinely interesting has come
| from the RISC-V crowd.
| danielEM wrote:
| Once it gets to the shelves at a reasonable price, I will be happy to
| work with/on it.
|
| Curious how IP pricing compares to ARM in this case, and how much
| I would need to put on top of it to tape out my own batch of
| processors.
| snvzz wrote:
| The license to the ISA itself is free.
|
| There are several vendors besides SiFive offering cores for
| licensing. There are even some OSHW cores that can be freely
| used.
|
| Even if we choose to ignore the technical prowess of being a
| true 5th generation RISC ISA built with hindsight no other ISA
| has, what's IMHO a big deal in RISC-V is the mere availability
| of this market of cores.
|
| It poses a threat to ARM's business model, where ARM licenses
| cores and ISA, but nobody other than ARM can license cores to
| others.
| Teknoman117 wrote:
| As far as OSHW cores go, it's so very nice to be able to
| throw something together in verilog and be able to inherit a
| compiler and not be trampling on someone else's copyright...
| dmitrygr wrote:
| > built with hindsight no other ISA has
|
| Why do all the riscv fans conveniently ignore aarch64 when
| they make statements like this? It was in fact a completely
| clean new design, based on hindsight, by people who know what
| they are doing, and with no legacy cruft.
| FullyFunctional wrote:
| I'm a fan of RISC-V but the freedom is a large part of it.
| Aarch64 _is_ a very well designed ISA and _clearly_ has a
| lot of benefit of hindsight. The load pair/store pair
| instructions, the addressing modes, fixed 32-bit
| instruction size, etc. It all really helps. I suspect that
| Apple was actively part of designing it.
|
| I think however that RISC-V isn't that much worse and
| because of the freedom we will almost certainly see more
| implementation of RISC-V. I'd be watching Tenstorrent,
| SiFive, Rivos, Esperanto, and maybe Alibaba/T-Head.
| brucehoult wrote:
| Aarch64 obviously _isn't_ a completely clean sheet design.
| It was constrained by having to execute on the same CPU
| pipelines as 32 bit code, at least for the first decade or
| so. And the 32 bit mode has to perform well. There are tens
| of millions of Raspberry Pi 3s and 4s (and later model Pi
| 2s) which have 64 bit CPUs but have never seen a 64 bit
| instruction in their lives. Android phones have been
| supporting both 32 and 64 bit apps for a long time.
|
| The "by people who know what they are doing" thing is just
| pure FUD. Sure, ARM employs some competent people, but no
| more so than IBM, Intel, AMD or the various members of
| RISC-V International.
| snvzz wrote:
| >Why do all the riscv fans conveniently ignore aarch64 when
| they make statements like this? It was in fact a completely
| clean new design, based on hindsight, by people who know
| what they are doing, and with no legacy cruft.
|
| aarch64 seems poorly designed to me.
|
| ARMv7 had thumb, but for some reason ARMv8 did not
| incorporate any lessons from that. As a result, code
| density is bad; ARMv8 binaries are huge.
|
| ARMv9, to be available in chips next year, is just a higher
| profile of required extensions, and does nothing to fix
| that.
|
| Ever wonder why M1 needs such huge L1 cache? Well, now you
| know.
|
| Considering ARMv9 will be competing against RVA22, I don't
| have much hope for ARM.
| dmitrygr wrote:
| > for some reason ARMv8 did not incorporate any lessons
| from that.
|
| I used to think so too, until I asked some more
| knowledgeable people about it. Turns out the lesson _IS_
| that not having it is better. Fixed-size instructions
| make decoding significantly simpler, making it much
| easier to build very wide front ends.
| brucehoult wrote:
| A little easier, not much easier. A number of
| organisations are making very wide RISC-V
| implementations, and one has already published how their
| decoder works. It's modular, with each block looking at
| 48 bits of code (the first 16 overlapping with the
| previous block) and decoding either two 16 bit
| instructions, or one aligned 32 bit instruction, or one
| misaligned 32 bit instruction with a following 16 bit
| instruction, or one misaligned 32 bit instruction
| followed by an ignored start of another misaligned 32 bit
| instruction.
|
| You can put as many of these modules side by side as you
| want. There is a serial dependency between them in that
| each block has to tell the next block whether its last 16
| bits are the start of a misaligned 32 bit instruction or
| not. That could become an issue with really really wide
| but for something decoding e.g. 16 bytes at a time (4 to
| 8 instructions) it's not an issue.
|
| There is a trade-off between a little bit of decoder
| complexity and a lot of improved code density -- but
| nowhere near to the same extent as say x86.
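|
| The block structure described above can be sketched in software. This
| is an illustrative model, not the published decoder: it assumes only
| 16 and 32 bit encodings exist (a 16 bit parcel starts a 32 bit
| instruction when its two low bits are 11), an even number of parcels,
| and a stream that begins on an instruction boundary. The function
| names are made up for the example.

```python
def is_32bit(parcel):
    """True if this 16-bit parcel is the start of a 32-bit instruction."""
    return (parcel & 0b11) == 0b11


def decode_block(prev, hw0, hw1, carry_in):
    """Decode one block: 32 new bits (hw0, hw1) plus the previous 16 (prev).

    carry_in says whether prev started a misaligned 32-bit instruction.
    Returns (instructions, carry_out) for the next block.
    """
    out = []
    if carry_in:
        # Misaligned 32-bit instruction: low half in prev, high half in hw0.
        out.append(("32", prev | (hw0 << 16)))
    elif is_32bit(hw0):
        # Aligned 32-bit instruction uses the whole block.
        out.append(("32", hw0 | (hw1 << 16)))
        return out, False
    else:
        out.append(("16", hw0))
    if is_32bit(hw1):
        # Start of a misaligned instruction: ignored here, next block decodes it.
        return out, True
    out.append(("16", hw1))
    return out, False


def decode_stream(parcels):
    """Run the blocks side by side, passing only the one-bit carry between them."""
    out, carry, prev = [], False, 0
    for i in range(0, len(parcels) - 1, 2):
        insns, carry = decode_block(prev, parcels[i], parcels[i + 1], carry)
        out.extend(insns)
        prev = parcels[i + 1]
    return out


if __name__ == "__main__":
    stream = [0x4501, 0x0513, 0x0010, 0x8082, 0x00B3, 0x00B5, 0x0001, 0x0001]
    for kind, bits in decode_stream(stream):
        print(kind, hex(bits))
```

| The only state crossing a block boundary is the single carry bit,
| which is the serial dependency mentioned above.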
| adrian_b wrote:
| ARMv8 code density is quite good for a fixed-length ISA
| and is of course much better than that of RISC-V.
|
| RISC-V has only one good feature for code density, the
| combined compare-and-branch instructions, but even this
| feature was designed poorly, because it does not have all
| the kinds of compare-and-branch that are needed, e.g. if
| you want safe code that checks for overflows, the number
| of required instructions and the code size explode. Only
| unsafe code, without run-time checks, can have an
| acceptable size in RISC-V.
|
| ARMv8 has an adequate unused space in the branch opcode
| map, where combined compare-and-branch instructions could
| be added, and with a larger branch offset range than in
| RISC-V, in which case the code size advantage of ARMv8
| vs. RISC-V would increase significantly.
|
| While the combined compare-and-branch of RISC-V are good
| for code density, because branches are very frequent, the
| rest of the ISA is bad and the worst is the lack of
| indexed addressing, which frequently requires 2 RISC-V
| instructions instead of 1 ARM instruction.
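|
| For concreteness, a sketch of a checked signed add without a flags
| register, assuming 64-bit registers modeled with Python integers
| (helper names are illustrative). The comments show the RISC-V
| instruction each step would roughly map to; an ISA with an overflow
| flag would typically use an add plus one branch on that flag.

```python
MASK64 = (1 << 64) - 1  # model a 64-bit register


def to_signed(x):
    """Interpret the low 64 bits of x as a two's complement value."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x


def checked_add(a, b):
    """Add signed 64-bit a and b; return (wrapped_sum, overflowed)."""
    s = to_signed(a + b)        # add  t0, a, b
    b_negative = b < 0          # slti t1, b, 0
    sum_less = s < a            # slt  t2, t0, a
    # Adding a negative b must lower the result and adding a non-negative b
    # must not; any disagreement means the sum wrapped.
    overflowed = b_negative != sum_less   # bne t1, t2, overflow
    return s, overflowed


if __name__ == "__main__":
    print(checked_add(2**63 - 1, 1))
    print(checked_add(5, -7))
```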
| brucehoult wrote:
| I'm not sure how you missed RISC-V's big feature for code
| density -- the "C" extension, giving it arbitrarily mixed
| 16 and 32 bit opcodes.
|
| I've heard of that feature before somewhere else. It gave
| the company that invented it unparalleled code density in
| their 32 bit systems and propelled them to the heights of
| success in mobile devices. What was their name? Wait ..
| oh, yes ... ARM.
|
| Why they forgot this in their 64 bit ISA is a mystery.
| The best theory I can come up with is that they thought
| the industry had shaken out and amd64 was the only
| competition they were going to have, ever. Aarch64 does
| indeed have very good code density for a fixed-length 32
| bit opcode ISA, and comes very close to matching amd64.
| They may have thought that was going to be good enough.
|
| Note: the RISC-V "C" extension is technically optional,
| but the only CPU cores I know of that don't implement it
| are academic toys, student projects, and tiny cores for
| use in FPGAs where they are running programs with only a
| few hundred instructions in them. Once you get over even
| maybe 1 KB of code it's cheaper in resources to implement
| "C" than to provide more program storage.
| zozbot234 wrote:
| The thing with lack of shifted indexed addressing is that
| it just might not matter all that much beyond toy
| examples. Address calculations can generally be folded in
| with other code, particularly in loops which are a common
| case. So it's only rarely that you actually need those
| extra instructions.
| adrian_b wrote:
| Shifted indexed addressing is needed more seldom, but
| indexed addressing, i.e. register + register, is needed
| in every loop that accesses memory.
|
| There are 2 ways of programming a loop that addresses
| memory with a minimum of instructions.
|
| One way, which is preferable e.g. on Intel/AMD, is to
| reuse the loop counter as the index into the data
| structure that is accessed, so each load/store needs a
| base register + index register addressing, which is
| missing in RISC-V.
|
| The second way, which is preferable e.g. on POWER and
| which is also available on ARM, is to use an addressing
| mode with auto-update, where the offset used in loads or
| stores is added into the base register. This is also
| missing in RISC-V.
|
| Because neither of the 2 methods works in RISC-V with a
| minimum number of instructions, like in all other CPUs,
| all such loops, which are very frequent, need pairs of
| instructions in RISC-V, corresponding to single
| instructions in the other CPUs.
| brucehoult wrote:
| A big difference here is that the RISC-V instructions are
| usually all 16 bits in size while the Aarch64 and POWER
| instructions are all 32 bits in size. So the code size is
| the same.
|
| Also, high performance Aarch64 and POWER implementations
| are likely to be splitting those instructions into two
| decoupled uops in the back end.
|
| Performance-critical loops are unrolled on all ISAs to
| minimise loop control overhead and also to allow
| scheduling instructions to allow for the several cycle
| latency of loads from even L1 cache. When you do that,
| indexed addressing and auto-update addressing are still
| doing both operations for every load or store which, as
| well as being a lot of operations, introduces sequential
| dependency between the instructions. The RISC-V way
| allows the use of simple load/store with offset -- all of
| which are independent of each other -- with one merged
| update of each pointer at the end of the loop. POWER and
| Aarch64 compilers for high performance microarchitectures
| use the RISC-V structure for unrolled loops anyway.
|
| So indexed addressing and auto-update addressing give no
| advantage for code size, and don't help performance at
| the high end.
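|
| A toy illustration of the two loop shapes, assuming memory is a
| Python list and counting pointer updates by hand. It models only the
| bookkeeping, not real instruction scheduling, and the function names
| are made up for the example.

```python
def sum_auto_update(mem, base, n):
    """One pointer update per load; each load waits on the previous update."""
    ptr, total, updates = base, 0, 0
    for _ in range(n):
        total += mem[ptr]
        ptr += 1            # the update an auto-update load would fold in
        updates += 1
    return total, updates


def sum_unrolled_offsets(mem, base, n, unroll=4):
    """Loads at ptr+0 .. ptr+unroll-1 are independent; one merged update."""
    assert n % unroll == 0, "sketch assumes n is a multiple of the unroll factor"
    ptr, total, updates = base, 0, 0
    end = base + n
    while ptr != end:
        for offset in range(unroll):
            total += mem[ptr + offset]   # load with constant offset
        ptr += unroll                    # single pointer update per iteration
        updates += 1
    return total, updates


if __name__ == "__main__":
    mem = list(range(100))
    print(sum_auto_update(mem, 10, 16))
    print(sum_unrolled_offsets(mem, 10, 16))
```

| Both return the same sum; the unrolled form does a quarter of the
| pointer updates at unroll=4, and its loads do not depend on one
| another.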
| snvzz wrote:
| >in which case the code size advantage of ARMv8 vs.
| RISC-V would increase significantly.
|
| Many things could be said about ARMv8, but that it has
| good code size is not one of them. It does, in fact, have
| abysmal code density. Both RISC-V and x86-64 produce
| significantly smaller binaries. For RISC-V, we're talking
| about a 20% reduction of size.
|
| There's a wealth of papers on this, but you can verify
| this trivially yourself, by either compiling binaries for
| different architectures from the same sources, or
| comparing binaries in Linux distributions that support
| RISC-V and ARM.
|
| >where combined compare-and-branch instructions could be
| added, and with a larger branch offset range than in
| RISC-V
|
| If your argument is that ARMv8 could get better over
| time, I hate to be the bearer of bad news. ARMv9 code
| density isn't any better.
|
| >and the worst is the lack of indexed addressing, which
| frequently requires 2 RISC-V instructions instead of 1
| ARM instruction.
|
| These patterns are standardized, and they become one
| instruction after fusion.
|
| RISC-V, unlike the previous generation of ISAs, was
| thoroughly designed with hindsight on fusion. The
| simplest microarchitectures can of course omit it
| altogether, but the cost of fusion in RISC-V is low; I
| have seen it quoted at 400 gates.
| brucehoult wrote:
| Instruction fusion is a possibility for the future, which
| has been discussed academically, but no one implements it
| at present. I'm not sure anyone will -- it's too much
| complexity for simple cores, and not needed for big OoO
| cores.
|
| The one fusion implementation I'm aware of is the SiFive
| 7-series combining a conditional branch that jumps
| forward over exactly one instruction. It turns the
| instruction pair into predicated execution.
|
| I agree with everything else. In particular the code
| density. Anyone can download Ubuntu or Fedora images for
| the same release for amd64, arm64, and riscv64. Mount
| them and run "size" on any selection of binaries you
| want. The RISC-V ones are consistently and significantly
| smaller than the other two, with arm64 the biggest.
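|
| A small helper for that comparison, as a sketch. It assumes GNU
| binutils "size" is on the PATH and that the images are already
| mounted; the mount points in the usage comment are hypothetical.

```python
import subprocess


def parse_size_output(output):
    """Parse Berkeley-format `size` output into {filename: text_bytes}."""
    sizes = {}
    for line in output.splitlines()[1:]:      # first line is the column header
        fields = line.split(None, 5)          # text data bss dec hex filename
        if len(fields) == 6:
            sizes[fields[5].strip()] = int(fields[0])
    return sizes


def text_sizes(paths):
    """Run `size` on the given binaries and return their text segment sizes."""
    result = subprocess.run(
        ["size", "--format=berkeley", *paths],
        capture_output=True, text=True, check=True,
    )
    return parse_size_output(result.stdout)


# Usage (hypothetical mount points, one per architecture image):
# text_sizes(["/mnt/amd64/usr/bin/ls",
#             "/mnt/arm64/usr/bin/ls",
#             "/mnt/riscv64/usr/bin/ls"])
```

| Comparing only the text column keeps data and bss, which vary less
| with the ISA, out of the picture.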
| pohl wrote:
| _Ever wonder why M1 needs such huge L1 cache? Well, now
| you know._
|
| I'm not sure I follow this, but it reminds me to ask:
| does RISC-V allow for designs to have both efficiency &
| performance cores like the ARM big.LITTLE concept? Has
| anyone made one yet?
| brucehoult wrote:
| Of course you can do it. SiFive has been allowing
| customers to configure core complexes with a mixture of
| different core types for years -- for example mixing U84
| cores with U74 or U54. If you want to do a BIG.little
| thing with transferring a running program from one core
| type to another that's just a software thing -- and using
| cores with the same ISA but different microarchitecture.
|
| To date the examples of this that have been shipped to
| the public have used cores with similar
| microarchitecture, but a different set of extensions.
|
| For example the U54-MC in the HiFive Unleashed and in the
| Microsemi Polarfire SoC FPGAs use four U54 cores plus one
| E51 core for "real time" tasks. The E51 doesn't have an
| FPU or MMU or Supervisor mode. The U74-MC in the HiFive
| Unmatched is similar.
|
| Alibaba's ICE SoC, which you may have seen videos of
| running Android, has two C910 Out-of-Order cores (similar
| to ARM A72/A73) implementing RV64GC, and a third C910
| core that also has a vector processing unit with two
| pipes with 256 bit vector ALU each, plus 128 bit vector
| load and store pipes.
| [deleted]
| fartcannon wrote:
| So I guess we should expect to hear a lot of FUD about RISC-V
| over the coming years.
| marcodiego wrote:
| No need to wait. Already happened in 2018:
| https://www.theregister.com/2018/07/10/arm_riscv_website/
|
| https://www.extremetech.com/wp-content/uploads/2018/07/arm-r...
| snvzz wrote:
| And it is how many learned about RISC-V's existence.
|
| It will be a PR disaster long remembered. One for the
| textbooks.
| snvzz wrote:
| This is a real possibility, albeit a sad one.
|
| No amount of FUD will save ARM. Only pivoting into a
| different business model could.
| duskwuff wrote:
| Honestly, ARM is fine. They're no longer the only game in
| town, but they've still got a huge head start.
| snvzz wrote:
| They'll be fine if they focus on their microarchitectures
| rather than the ISA (where IMHO they've already lost),
| and make the process for obtaining a license much more
| streamlined; I've heard it takes no less than 18 months
| of long negotiations to license anything from ARM. That's
| not sustainable now that there's competition.
| duskwuff wrote:
| That's already where their focus is. Most of ARM's
| customers are licensing specific cores from ARM, not the
| ISA as a whole.
| jaas wrote:
| Who exactly are the customers for this chip?
___________________________________________________________________
(page generated 2021-12-02 23:01 UTC) |