|
| nuancebydefault wrote:
| It seems that operations on FPGAs can run much more efficiently
| than their cpu equivalent. For an 'AND' operation, a cpu needs to
| load code and data from a memory into registers, run the logic
| and write the result register back to some memory. This while
| filling up the pipeline for subsequent operations.
|
| The FPGA on the other hand has the output ready one clock cycle
| after the inputs stream in, and can have many such operations in
| parallel. One might ask, why are cpus not being replaced by
| FPGAs?
|
| Another interesting question, can software (recipes for cpus) be
| transpiled to be efficiently run on FPGAs?
|
| I could ask GPT those questions, but the HN community will
| provide more insight I guess.
| pfyra wrote:
| > Another interesting question, can software (recipes for cpus)
| be transpiled to be efficiently run on FPGAs?
|
| Yes. At least for C and C++. It is called High Level Synthesis.
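| As a rough illustration (not from any particular vendor's
| toolchain), HLS tools take ordinary C like the function below and
| turn the loop into parallel hardware. The pragma follows Xilinx
| Vitis HLS style; a normal C compiler just ignores it.

```c
#include <stdint.h>

#define N 8

/* Element-wise AND over two fixed-size buffers. An HLS tool can
 * unroll this loop into N parallel AND gates and pipeline the
 * function so it accepts new inputs every clock cycle. */
void and_array(const uint8_t a[N], const uint8_t b[N], uint8_t out[N]) {
    for (int i = 0; i < N; i++) {
#pragma HLS UNROLL /* vendor-specific hint; plain compilers ignore it */
        out[i] = a[i] & b[i];
    }
}
```

| In software this is just a loop; after synthesis it becomes eight
| independent AND gates computing in the same clock cycle.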
| Lramseyer wrote:
| These are really good questions to be asking, and to help with
| that let's consider 3 attributes of compute complexity: time,
| space, and memory
|
| The traditional way of computing on a CPU is in essence a list
| of instructions to be computed. These instructions all go to
| the same place (the CPU core) to be computed. Since the space
| is constant, the instructions are computed sequentially in
| time. Most programmers aren't concerned with redesigning a CPU,
| so we typically only think about computing in time (and memory
| of course)
|
| On an FPGA (and custom silicon) the speedup comes from being
| able to compute in both time and space. Instead of your
| instructions existing in memory, and computed in time, they can
| be represented in separate logic elements (in space) and they
| can each do separate things in time. So in a way, you're
| trading space for time. This is how the speed gains are
| achieved.
|
| Where this all breaks down is the optimization and scheduling.
| A sequential task is relatively easy to optimize since you're
| optimizing in time (and memory to an extent). Scheduling is
| easy too, since tasks can be prioritized and queued up.
| However, when you're computing in space, you have to optimize
| in 2 spatial dimensions and in time. When you have multiple
| tasks that need to be completed, you then need to place
| them together and not have them overlap.
|
| Think of trying to fit a ton of different-shaped tiles on a table,
| where you need to be constantly adding and removing tiles in a
| way that doesn't disrupt the placement of other tiles (at least
| not too often.) It's kind of a pain, but for some more
| constrained problem sets, it might make sense.
|
| These aren't impossible problems, and for some tasks, the time
| or power usage savings is worth the additional complexity. But
| sequential optimization is way easier, and good enough for most
| tasks. However, if our desire for faster computing outpaces our
| ability to make faster CPUs, you may see more FPGAs doing this
| sort of thing. We already have FPGAs that are capable of
| partial reconfiguration, and some pretty good software tools to
| go along with it.
|
| TL;DR: Geometry is hard.
| toast0 wrote:
| > The FPGA on the other hand has the output ready one clock
| cycle after the inputs stream in, and can have many such
| operations in parallel. One might ask, why are cpus not being
| replaced by FPGAs?
|
| FPGAs are more or less a flexible replacement for an
| application specific (logic level) integrated circuit. A CPU
| can do a wide variety of tasks, with a small penalty for
| switching tasks. An ASIC can do one thing and that's it, a FPGA
| can do many things, but with a large penalty for task
| switching. (you can have a CPU as an ASIC or an FPGA, but...).
| ASICs require a lot of upfront design work and costs, so you
| can't use them for everything. ASICs and especially CPUs tend
| to be able to achieve a higher clock speed than FPGAs, but it
| kind of depends.
|
| > Another interesting question, can software (recipes for cpus)
| be transpiled to be efficiently run on FPGAs?
|
| Not really; the way problems are solved is drastically
| different, and I'd expect most things would need to be
| reconceptualized to fit. And a lot of software isn't really
| suited to living as a logic circuit. Exceptions would be
| encoding, compression, encryption, the inverses of all of
| those, signal processing, etc. Things where you have a data
| pipeline and 'the same thing' happens to all the data.
| jcranmer wrote:
| FPGAs are the next big frontier for software development, and
| have been since the '90s; they just need the programming model
| worked out. This is the traditional story told about FPGAs, but
| GPGPU programming suddenly overtaking FPGA development around
| 2010, despite its awkward programming model, makes that story
| rather suspect. The thing is, a lot of the benefits of FPGAs
| are really best-case scenarios, and when you move to more
| typical scenarios, their competitiveness as an architecture
| dwindles dramatically.
|
| Pipelining on an FPGA requires being able to find, and fill,
| spatial duplication of the operations being done. If you've got
| conditional operations in a pipeline, now your pipeline isn't
| so full anymore, and this hurts performance on an FPGA far more
| than on a CPU (which spends a lot of power trying to keep its
| pipelines full). But needing to keep the pipelines spatially
| connected also means you have to be able to find a physical
| connection between the two stages of a pipeline, and the
| physical length of that connection also imposes limitations on
| the frequency you can run the FPGA at.
|
| If you care about FLOPS (or throughput in general), the problem
| with FPGAs is that they are running at a clock speed about a
| tenth of a CPU. This requires a 10x improvement in performance
| just to stand still; given that software development for FPGAs
| requires essentially a completely different mindset than for
| CPUs or even GPUs, it's not common to have use cases that work
| well on FPGAs.
|
| (I should say that a lot of my information about programming
| FPGAs comes from ex-FPGA developers, and the "ex-" part will
| certainly have its own form of bias in these opinions).
| davemp wrote:
| Yeah I don't really see FPGAs ever making their way down to
| consumers the way GPUs and CPUs have (end users actually
| programming them).
|
| For (semi) fixed pipeline operations FPGAs will basically
| always be worse than some slightly more specialized ASIC like
| a GPU/AI engine.
|
| One area FPGAs can be exceptionally good at is real-time
| operations. You have much better control over timing in
| general on FPGAs vs MCU/CPUs, but I don't think that's
| inherent (you could probably alter the mcu architecture a bit
| and close the gap).
|
| I could be wrong but I also think you get better power draw
| for things like mid to low volume glue chips in embedded
| systems because you're not powering big SRAM banks and DMAs
| just to pipe data between a couple hardware interfaces. This
| is only because of market forces though obviously, because if
| mid to low volume ASICs become viable in terms of dev time
| they'll be much better.
| pjc50 wrote:
| > One might ask, why are cpus not being replaced by FPGAs?
|
| Most of the time you want data-dependent execution. FPGA
| systems excel at "fixed pipeline" systems, where you have e.g.
| an audio filter chain .. but even that is usually done in
| efficient DSP CPUs.
|
| > Another interesting question, can software (recipes for cpus)
| be transpiled to be efficiently run on FPGAs?
|
| A _subset_ can. Things like recursion are right out. Various
| companies have tools to do this, but you usually end up having
| to rework either the source you're feeding them, or the HDL
| output.
| burnished wrote:
| They both use the same kind of components; the FPGA does not
| have a speed advantage, you are simply comparing the speed of a
| very simple circuit element to the speed of a very complicated
| pipeline.
|
| You would use an FPGA to simulate a special purpose circuit,
| which would be faster than a CPU for its specific purpose. We
| have CPUs because having a general purpose processing chip is
| incredibly handy when you want to be able to do more than one
| thing.
|
| EDIT: I forgot to mention that the device outputs in one clock
| cycle by definition: if your clock is too fast then your
| components' output signals don't have time to stabilize and you
| will get read errors, so you ensure your clock is slow enough
| for everything to stabilize.
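| To make that concrete, here's a back-of-the-envelope sketch of how
| the maximum clock frequency falls out of the slowest
| register-to-register path (the delay numbers in the usage below are
| made up for illustration):

```c
/* The clock period must cover the clock-to-Q delay of the source
 * register, plus the combinational logic delay between registers,
 * plus the setup time of the destination register. The slowest
 * such path sets the maximum frequency the design can run at. */
double max_clock_hz(double clk_to_q_ns, double logic_ns, double setup_ns) {
    double period_ns = clk_to_q_ns + logic_ns + setup_ns;
    return 1e9 / period_ns;
}
```

| With, say, 0.5 ns clock-to-Q, 3 ns of logic, and 0.5 ns setup, the
| design tops out around 250 MHz; add more logic between registers
| and the ceiling drops.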
| JackSlateur wrote:
| For the same reasons we do not replace CPUs with GPUs: not the
| right tool
|
| Check out the instruction set of modern CPUs
| convolvatron wrote:
| one big problem is memory. basic cpus have a lot of facilities
| for high-speed synchronous interface with DRAM, and truly vast
| amount of resource for cache.
|
| partially as a result, a good model for compiling code to fpgas
| uses a dataflow paradigm, since we don't need to serialize all
| operations through a memory fetch, cache, or even register
| file.
|
| if we hadn't decided to move all our computing to the cloud, I
| suspect fpga accelerator boards for applications which map well
| to that model would have some traction in specialized areas.
| signal processing is definitely one such.
| quadrature wrote:
| >One might ask, why are cpus not being replaced by FPGAs?
|
| They do sometimes, for very specific applications! The
| problem is that an FPGA is programmed for one specific task and
| would have to be taken offline and reprogrammed if you wanted
| to do something else with it. It's not general purpose like a
| CPU where you can load up any program and have it run.
|
| Programming an FPGA is also comparatively much harder to reason
| about than a CPU because of the parallelism and timing you
| described.
| MSFT_Edging wrote:
| Some of the more modern Xilinx stuff has features where you
| don't need to take down the whole FPGA to reload a bitstream
| onto part of the chip. It's really neat, you can do live
| reprogramming of one component and leave the others alone or
| have an A/B setup where one updates while the other is
| unchanged.
| JohnFen wrote:
| Yes, I'm working on a Xilinx ARM processor with an FPGA.
| The FPGA and the CPU are independent units in the chip that
| can each operate with or without the other. We can indeed
| reprogram the FPGA without taking the system down.
| davemp wrote:
| It goes even further. You can partially reconfigure the
| FPGA fabric itself:
| https://support.xilinx.com/s/article/34924?language=en_US
| quadrature wrote:
| That is really cool, hadn't heard of that before.
| barelyauser wrote:
| What is simpler: making logical circuit "A" or making a circuit
| that emulates logical circuit "A" and its relatives?
| markx2 wrote:
| If anyone is unaware, you can buy the very impressive Pocket.
| https://www.analogue.co/pocket
|
| The current list of what it can do with FPGA is listed here -
| https://openfpga-cores-inventory.github.io/analogue-pocket/ and
| the inevitable sub-reddit is a good resource.
| https://old.reddit.com/r/AnaloguePocket/
| gchadwick wrote:
| There's also the MiSTer project: https://github.com/MiSTer-
| devel/Wiki_MiSTer/wiki. Not hand-held (yet...) and hardware is
| less slick but a bunch more systems and also fully open source.
| phendrenad2 wrote:
| MiSTer makes me kind of sad, the DE10-nano board it's based
| on is 7 years old at this point, and the actual FPGA chip on
| the board is probably over twice as old as that. And this is
| still the peak of hobby FPGA chips. I wonder why Moore's Law
| is hitting the FPGA industry particularly hard all of a
| sudden.
| willis936 wrote:
| There are better FPGA options, they're just more expensive.
| The DE-10 Nano was strategically chosen as "powerful enough
| to meet most wants while still being within a reasonable
| budget".
|
| No one's going to plunk down $10k for a 19 EV Zynq
| UltraScale+ with 1.1M LEs, but they will spend $200 on a
| Cyclone V with 210k LEs.
| MrHeather wrote:
| The article says FPGAs are too power hungry for handheld
| devices. Did Analogue do anything special to solve this problem
| on the Pocket?
| agg23 wrote:
| That's honestly not true at all; it all just depends on your
| platform. On the Pocket, the FPGA _is_ the processor (there
| are actually two FPGAs, one for the actual emulation core,
| and one for scaling video, and there's technically a PIC
| microcontroller for uploading bitstreams and managing UI).
| The FPGAs are still not much power compared to the display
| itself. With the in-built current sensor on the dev kits, the
| highest we've measured drawn by the main FPGA is ~300mAh. Now
| this sensor isn't going to be the best measurement, but it's
| something to go off of.
| eulgro wrote:
| > ~300 mAh
|
| mA? You're not very convincing here.
| WhiteDawn wrote:
| Personally I think this is the biggest selling feature of
| FPGA based emulation.
|
| The reality is both Software and FPGA emulation can be done
| very well and with very low latency, however to achieve
| this in software you generally require high end power
| hungry hardware.
|
| A steam deck can run a highly accurate sega genesis
| emulator with read-ahead rollback, screen scaling, shaders
| and all the fixings no problem, but in theory the pocket
| can provide the exact same experience with an order of
| magnitude less power.
|
| It's not quite apples to oranges of course, but the
| comfortable battery life does make the pocket much more
| practical.
| agg23 wrote:
| Being nitpicky about latency is where FPGAs truly
| shine. You lose a good bit of it by connecting to HDMI (I
| think the Pocket docked is 1/4 a frame, and MiSTer has a
| similar mode) (EDIT: MiSTer can do 4 scanlines, but it's
| not compatible with some displays), but when we're
| talking about analog display methods or inputs, you can
| achieve accurate timings with much less effort than on a
| modern day computer.
|
| For a full computer like the Steam Deck, you have to deal
| with preemption, display buffers, and more, which _will_
| add latency. Now if you went bare metal, you could
| definitely drive a display with super low latency,
| hardware accurate emulation, but obviously that's not
| what most people are doing.
| agg23 wrote:
| Not to draw attention to myself or anything, but if you're
| interested in learning to make cores for the Analogue Pocket or
| MiSTer (or similar) platforms, I highly recommend taking a look
| at the resources and wiki I'm slowly building -
| https://github.com/agg23/analogue-pocket-utils/
|
| I started ~7 months ago with approximately no FPGA or hardware
| experience, have now ported ~6 cores from MiSTer to Pocket, and
| just released my first core of my own, the original Tamagotchi
| - https://github.com/agg23/fpga-tamagotchi/
|
| If you want to join in, I and several other devs are very
| willing to help talk you through it. We primarily are on the
| FPGAming Discord server - https://discord.gg/Gmcmdhzs - which
| is probably the best place to get a hold of me as well.
| jonny_eh wrote:
| I also recommend the official dock. It basically turns it into
| an easy to use Mister.
| sph wrote:
| My mind is blown but I'm also wondering if this isn't some kind
| of incredible over-engineering? Surely CPUs are fast enough to
| emulate these kind of devices in software. If they aren't, they
| must be an order of magnitude simpler in complexity.
|
| I wouldn't ordinarily care about emulators, but actual hardware
| emulators is the craziest thing I've heard in a while. All that
| for a small handheld console?
|
| If only I was not so broke...
| lprib wrote:
| Sure it would probably be cheaper to chuck a cortex-A* or
| similar mid-range MCU in there. One advantage of FPGAs is that
| they can achieve "perfect" emulation of a Z80 (or other) since
| it's running on the logic gate level. No software task
| latency, no extra sound buffering, etc. It can re-create the
| original clock-per-clock.
| arein3 wrote:
| It's impressive as well
| agg23 wrote:
| Software is orders of magnitude simpler in complexity, yes.
| The difference between a software emulator and a logic level
| emulator are immense.
|
| But take the example of the difficulties with a software NES
| emulator:
|
| In hardware, there is one clock that is fed into the 3 main
| disparate systems: the CPU, APU (audio), and PPU (picture).
| They all use different clock dividers, but they're still fed
| off of the same source clock. Each of these chips operate in
| parallel to produce the output expected, and there's some
| bidirectional communication going on there as well.
|
| In a software emulator, the only parallelism you get is on
| multiple cores, but you can approximate it with threading
| (i.e. preemption). For simplicity, you stick with a single
| thread. You run 3 steps of the PPU at once, then one step of
| the CPU and APU. You've basically just sped through the first
| two steps, because who will notice those two cycles? They
| took no "real" time, they were performed as fast as the
| software could perform them. Probably doesn't matter, as no
| one could tell that for 10ns, this happened.
|
| You need to add input. You use USB. That polls at most at
| 1000Hz (1ms between polls), plus your emulator processing
| time (is it going to have to go in the "next frame" packet?),
| but controls on systems like the NES were practically
| instantly available the moment the CPU read them.
|
| Now you need to produce output. You want to hook up your
| video, but wait, you need to feed it into a framebuffer.
| That's at least one frame of latency unless you're able to
| precompute everything for the next frame. Your input is
| delayed a frame, because it has to be fed into the next
| batch, the previous batch (for this frame) is already done.
| You use the basis of 60fps (which is actually slightly wrong)
| to time your ticking of your emulator.
|
| Now you need to hook up audio. Audio must go into a buffer or
| it will under/overflow. This adds latency, and you need to
| stay on top of how close you are to falling outside of your
| bounds. But you were using FPS for pacing, so now how do you
| reconcile that?
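| A toy sketch of that single-threaded catch-up loop (nothing from
| a real emulator; the 3:1 PPU-to-CPU ratio is the NES's, everything
| else is illustrative):

```c
/* Counters standing in for the three clocked subsystems. */
typedef struct { long cpu_cycles, ppu_dots, apu_cycles; } Console;

/* Advance the whole console by one CPU cycle's worth of time.
 * The three PPU dots are "sped through" back to back, even though
 * real hardware spread them out evenly in time. */
void console_step(Console *c) {
    for (int i = 0; i < 3; i++)
        c->ppu_dots++;   /* 3 PPU dots per CPU cycle */
    c->cpu_cycles++;     /* then one CPU cycle */
    c->apu_cycles++;     /* and one APU cycle */
}
```

| On real hardware all three counters would tick simultaneously off
| one source clock; here they take turns, which is exactly the timing
| skew the comment describes.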
|
| ----
|
| Cycle accurate and low latency software solutions are
| certainly not easy, and true low latency is impossible on
| CPUs running a full OS. Embedded-style systems with RTOSes
| might be able to get pretty close, but it's still not going
| to be the same as being able to guarantee the exact same (or
| as near as we can tell) timing for every cycle.
|
| I want to be clear that none of these hardware
| implementations are actually that accurate, but they could
| be, and people are working hard to improve them constantly
| rtkwe wrote:
| The benefit of FPGAs is you can get nearly gate perfect
| emulation of an old games system. We've had emulators for
| years that get most things right but some games and minor
| things in old games require specific software patches to
| ensure the odd way they used the chips available produces the
| same output. There's a great old article from 2011 about the
| power required at the time to get a nearly perfect emulation
| of a NES. [0] The goal with the Pocket and all of Analogue's
| consoles isn't to be just another emulation machine but to
| run as close as possible to the original at a hardware level.
| That's their whole niche, hardware level 'emulation' of old
| consoles.
|
| [0] https://arstechnica.com/gaming/2011/08/accuracy-takes-
| power-...
| Waterluvian wrote:
| Emulating "accurately" is so difficult that not even
| Nintendo's Game Boy emulator on the Switch does it properly.
| I've been replaying old games and comparing some questionable
| moments with my original Game Boy, and the timings are not
| quite right in some cases.
|
| For example in Link's Awakening, there's a wiggle screen
| effect done by writing to OAM during HBlank. On the Switch it
| lags very differently than my GB (try it by getting into the
| bed where you find the ocarina). Or with Metroid 2, the sound
| when you kill an Omega Metroid is different too. It pitch
| shifts along with the "win" jingle.
|
| These have almost zero impact on playability. But for purists
| and emudevs it's a popular pursuit.
| photochemsyn wrote:
| Here's a nice series that picks up where this one leaves off
| (shows how flip-flop/LUT units are organized into cells inside a
| PLB, programmable logic block). It also is the first step in a
| tutorial on using Verilog, building a hardware finite state
| machine, and eventually a RISC-V processor on a FPGA:
|
| https://www.digikey.com/en/maker/projects/introduction-to-fp...
| user070223 wrote:
| From my understanding
|
| An FPGA doesn't have an instruction pipeline, as the program is
| encoded in the gates themselves. It means that at runtime the
| FPGA is not Turing complete[0], as opposed to the CPU[1].
|
| There is a phrase in security contexts: "data is code and code
| is data". If FPGAs ever replaced CPUs as the main computation
| hardware (you don't need Turing completeness when you keep
| running the same apps [microservices]), the new saying would be
| something like "code is execution and execution is code", as you
| imprint the code in the gates. It would get rid of a whole
| class/subclass of memory safety vulnerabilities.
|
| This paradigm change is like what WebAssembly did to the web. The
| slogan should be "make the bitstream go mainstream". Someone made
| a demo running wasm on an FPGA[2], though I'm not sure if it uses
| a cpu or runs directly.
|
| Of course you move complexity to compile time and increase loading
| time, all for an order of magnitude faster execution.
|
| Companies developed high-level synthesis compilers, but it's
| difficult and challenging, as you need to synchronize parallel
| execution pipelines, which you don't have to do in a CPU since it
| has a steady clock rate for each step in the pipeline.
|
| A company named LegUp Computing (acquired by Microchip) compiled
| memcached/redis applications to FPGA and improved performance &
| power efficiency by an order of magnitude (10x).
|
| There is a lot of proprietary IP in hardware design, as opposed
| to software, so tools and knowledge are scarce.
|
| If anyone works / want to work on this problem hit me up in the
| comments
|
| [0] Unless you implement a cpu on top of the fpga :)
|
| [1] Assuming infinite memory, which is false, but good enough
|
| [2] https://github.com/denisvasilik/wasm-fpga
| proto_lambda wrote:
| > An FPGA doesn't have an instruction pipeline, as the program is
| encoded in the gates themselves. It means that at runtime the
| FPGA is not Turing complete[0], as opposed to the CPU[1].
|
| That obviously depends entirely on the circuit, many
| sufficiently advanced circuits probably end up being
| accidentally Turing complete.
| JohnFen wrote:
| You can implement Turing-complete CPUs in FPGA fabric.
| proto_lambda wrote:
| That's exactly what OP's footnotes say, yes.
| jschveibinz wrote:
| We used them for real time array signal processing and beam-
| forming. They worked great.
| y0ungarmanii wrote:
| I saw various comments about how FPGAs are not ready for consumer
| hardware, but Apple is already using them in the AirPods Max
| (probably for filtering audio).
|
| Check the link below
| https://www.ifixit.com/Teardown/AirPods+Max+Teardown/139369
|
| They really excel for high throughput & low latency - which noise
| canceling sounds like a good example of! In addition to this,
| they are already being used in communication systems & data
| centers to speed up latency-sensitive computations. Edge AI seems
| like a big market they will be used for soon, especially because
| they can be reflashed (unlike ASICs) and new NN architectures
| drop every couple of years.
| burnished wrote:
| Neat. If the author is around, might I suggest pushing some of
| the 'why use an FPGA' to the front? I think it would benefit from
| a more concrete example motivating the use of an FPGA - like a
| picture of some simple circuit using a seven segment display on a
| breadboard next to a picture of an FPGA implementing the same
| circuit in order to make it more clear that it is a substitute
| for putting experiments together by hand. I think it will help
| newcomers better contextualize what is happening and why.
|
| I think in the same vein your wrap up of why you might want to do
| something in hardware vs software is great and well placed.
|
| Hmmm, I guess now is as good a time as any to bumblefuck around
| with small electronics projects for fun. Thanks for the reminder!
| beardyw wrote:
| > Neat. If the author is around, might I suggest pushing some
| of the 'why use an FPGA' to the front?
|
| I think the problem is identifying cases where you really need
| an FPGA. Most of the time you don't.
| burnished wrote:
| I suggest it purely for educational purposes. The first
| struggle isn't identifying the best use case - its
| understanding wtf is going on. Putting it in terms of
| something more familiar is helpful for that.
|
| Your thing would make for a wonderful followup topic though.
| cycomanic wrote:
| What do you mean by "you". Maybe "you" as in a general
| consumer don't need an FPGA, but I guess one could argue a
| general consumer doesn't need a general purpose computer
| either.
|
| There are certainly many use cases where you absolutely do
| need an FPGA, i.e. anything where you need to process large
| amounts of IO in realtime. For example, the guys from SimulaVR
| talk about how they use an FPGA for display correction here:
| https://simulavr.com/blog/testing-ar-mode-image-processing/
|
| Many modern devices would not function without FPGAs
| JohnFen wrote:
| > anything where you need to process large amounts of IO in
| realtime.
|
| I'm working on a FPGA-based system right now. We're using
| an FPGA precisely because this is what we're doing -- about
| a hundred I/O ports that have to be processed with as
| little latency as possible.
| beardyw wrote:
| I think we can agree that this discussion does not involve
| general consumers!
|
| "Many cases" is not the opposite of most cases.
| kanetw wrote:
| (SimulaVR dev) It's not wrong to say that in most cases,
| tasks are better solved without an FPGA. But when you need
| one you need one (or an ASIC if you have the volume and
| don't need reconfigurability)
| asdfman123 wrote:
| This is meant to be an introduction though, right? You can
| simply write "some people do X, and others claim Y is better"
| then move on.
|
| I read several paragraphs of the article and I still don't
| know why you'd use one, despite taking computer architecture
| and analog electronics courses in undergrad.
|
| I don't want to read about logic gates again and I don't want
| to read about the nuances before I broadly understand what
| the point is.
|
| For anyone else still wondering, here's Wikipedia:
|
| > FPGAs have a remarkable role in embedded system development
| due to their capability to start system software development
| simultaneously with hardware, enable system performance
| simulations at a very early phase of the development, and
| allow various system trials and design iterations before
| finalizing the system architecture.
|
| Basically, rapid prototyping I guess. That makes sense.
| awjlogan wrote:
| If that was an ask for a specific example, one of the most
| common uses for FPGAs is DSP. Say you have a simple FIR
| filter of 63 taps. To do this in a CPU requires you
| to load two values and do a multiply/accumulate for each
| tap in sequence. Very (!!) optimistically, that's about 192
| instructions. With an FPGA, you can do all the
| multiplications in parallel and then just sum the outputs -
| probably done in 2 cycles and with pipelining your
| throughput could be a sample every clock.
|
| If the FPGA is too slow, too power inefficient etc you can
| (if you have the money!) take the same core design and put
| it in an ASIC. The FPGA provides an excellent prototyping
| environment; in this example you can tune the filter
| parameters before committing to a full ASIC.
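| The sequential version looks like this in C (a generic textbook
| FIR, not any particular implementation); each loop iteration is
| the two loads plus one multiply-accumulate counted above, run one
| after another on a CPU but all at once in FPGA fabric:

```c
#define TAPS 63

/* Direct-form FIR: output = sum of coeff[i] * history[i].
 * On a CPU the 63 multiply-accumulates run sequentially; in an
 * FPGA each multiply gets its own hardware multiplier and an
 * adder tree sums the products, so with pipelining the filter
 * can produce a new sample every clock cycle. */
double fir(const double coeff[TAPS], const double history[TAPS]) {
    double acc = 0.0;
    for (int i = 0; i < TAPS; i++)
        acc += coeff[i] * history[i];
    return acc;
}
```

| A 63-tap moving-average filter, for instance, is just this with
| every coefficient set to 1/63.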
| pjc50 wrote:
| > multiply/accumulate for each tap in sequence. Very (!!)
| optimistically, that's about 192 instructions
|
| This is what all those vector instructions are for.
|
| FPGA is kind of invaluable if you have lots of streams
| coming in at high megabit rates, though, and need to
| preprocess down to a rate the CPU and memory bus can
| handle.
| awjlogan wrote:
| Yes, indeed :) Didn't want to muddy the waters with
| vector instructions, and it's fair to say that the
| dedicated DSP chip market has been squeezed by FPGAs on
| one side and vectorised (even lightly, like the
| Cortex-M4/M7 DSP extension) CPUs on the other.
| asdfman123 wrote:
| Explain it to me like I'm your mom.
| nfriedly wrote:
| I've read that AMD's 7040-series mobile CPUs will have an "FPGA-
| based AI engine developed by Xilinx" [1] - I'm wondering how
| _programmable_ that will be.
|
| I know there's been some performance difficulties emulating the
| PlayStation 3's various floating point modes. It's the kind of
| thing that I think an on-chip FPGA could theoretically help with,
| although I don't know if it'd be worth the trouble in this
| specific case. (Or if AMD's implementation will be flexible
| enough to help.)
|
| [1]: https://www.anandtech.com/show/18844/amd-unveils-ryzen-
| mobil...
| sph wrote:
| Sadly the article doesn't go into details about how the
| programmable RAM is wired to the actual logic gates, which seems
| to me the most interesting and challenging part of designing an
| FPGA.
|
| In my mediocre understanding of digital circuits, RAM is usually
| addressable, so it has to be wired in a more direct manner to
| enable such a design.
|
| I posted this article because someone mentioned some Ryzen chip
| having an FPGA in another post, and I am now left wondering:
|
| 1. why don't we have more user-programmable FPGAs in our fancy
| desktop mainboards
|
| 2. is there a SoC board, ARM or RISC-V based, with an FPGA on
| board? The slower the CPU, the more useful an FPGA would be to
| accelerate compute tasks
| duskwuff wrote:
| > Sadly the article doesn't go into details about how the
| programmable RAM is wired to the actual logic gates
|
| Not sure what you mean by that. Do you mean how a RAM is used
| as a lookup table to implement logic gates, how routing works,
| or how block RAM is integrated into the FPGA fabric?
|
| > is there a SoC board, ARM or RISC-V based, with an FPGA on
| board?
|
| Better yet, there are a number of FPGAs available with an ARM
| SoC on board. Xilinx Zynq, Intel Cyclone V SoC, various others.
| pjc50 wrote:
| > RAM is usually addressable, so it has to be wired in a more
| direct manner to enable such a design
|
| DRAM is necessarily a grid.
|
| SRAM, in e.g. the standard 6-transistor cell form, you can kind
| of dump individual bits anywhere you need one.
|
| > why don't we have more user-programmable FPGAs in our fancy
| desktop mainboards
|
| They tend to be horrifyingly expensive and there are few use
| cases you can't outperform with a GPU or even just vector
| instructions. Most of the interesting use cases for FPGAs are
| when you have direct access to the pins and can wire them up to
| high-speed signalling, which really isn't home user friendly.
|
| Also all the tooling is proprietary.
|
| > is there a SoC board, ARM or RISC-V based, with an FPGA on
| board
|
| Buy a medium sized FPGA and download a CPU of your choice.
|
| (I have a downloadable-CPU-sized FPGA board on my desk for
| testing not yet shipped ASIC designs. It costs about six
| thousand dollars and has a 48-week lead time on Farnell)
| sph wrote:
| > Buy a medium sized FPGA and download a CPU of your choice.
|
| Damn, _of course_ one would be able to download a CPU and
| "emulate it" in hardware.
|
| I never imagined that would be possible. Now I'm thinking
| that if I had infinite free time, I would buy an FPGA and
| design a modern Lisp CPU. A RISC-V based design with native
| Lisp support. Who needs hardware when you can just emulate it
| in an FPGA.
|
| That's seriously cool technology.
| MSFT_Edging wrote:
| As for question 1, they're far more common in server grade
| stuff where they are typically baked in. Consumer stuff just
| doesn't need as much IO throughput and muxing as an FPGA
| provides on, say, a large networking switch.
|
| There are PCIe compatible FPGAs that you can plug into your
| desktop like a graphics card to accelerate certain tasks. In
| general though, our workstation hardware just isn't specialized
| enough to require them, but can be extended to do so. If
| something has a large enough market, they'll just make
| an ASIC.
| aphedox wrote:
| After Intel acquired Altera they released a series of x86 Xeon
| chips with integrated FPGAs. Look up the Xeon 6138P.
| wildzzz wrote:
| Both Intel and Xilinx sell FPGAs with hard ARM cores inside so
| you can run real Linux while being able to interface with
| custom logic. Additionally, it's pretty common to create ARM,
| RISC-V, or PowerPC soft cores in the FPGA when no hard cores
| are available. These mimic the real cores and will run
| software while allowing for things like custom instructions
| that can take advantage of the flexibility of FPGA fabric. The
| Xilinx Zynq and Intel Cyclone V have options for hard ARM
| cores. There are various designs of boards out there you can
| buy that implement Arduino or Raspberry Pi shield
| compatibility. The XUP PYNQ-Z2 supports both interfaces and
| runs a Zynq-7000 with a real ARM core.
|
| You can do other things with soft cores that are not possible
| with an off the shelf CPU, like triple modular redundancy.
| This is
| when you run a lot of the logic in triplicate and vote on the
| results to prevent a bit flip from messing up the software.
| This is common for space-based CPUs that are running on FPGAs.
| It's expensive to design a new chip in a very small run so it's
| much cheaper to just put the core on an off the shelf FPGA and
| use the rest of the FPGA fabric for custom logic functions.
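|
| The voting step in that redundancy scheme is simple to model.
| Here is a hypothetical Python sketch (a software model of the
| idea, not FPGA code): three copies of the logic produce the
| same value, and a bitwise 2-of-3 majority vote masks a single
| upset in any one copy.

```python
# Hypothetical model of the TMR voting step: a bit flip in any
# one of the three copies is outvoted by the other two.

def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority of three redundant results."""
    return (a & b) | (a & c) | (b & c)

good = 0b1011_0010
upset = good ^ 0b0000_1000        # one copy suffers a single bit flip
print(bin(tmr_vote(good, good, upset)))  # 0b10110010 -- flip masked
```

| In an FPGA this voter is just a few LUTs per bit, which is why
| triplicating a soft core is cheap relative to designing a
| radiation-hardened chip.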
| gchadwick wrote:
| > Sadly the article doesn't go into details about how the
| programmable RAM is wired to the actual logic gates, which
| seems to me the most interesting and challenging part of
| designing an FPGA.
|
| It does, that's the part under the 'Look-Up Tables' section.
| The key is that there aren't any actual logic gates, just lots
| of
| little RAMs. You implement an arbitrary blob of logic by having
| the inputs form the address then the RAM gives the result of
| the logical function.
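|
| That lookup-table idea is easy to model in software. A
| hypothetical Python sketch (an illustration of the concept,
| not how any vendor tool works): a 4-input LUT is a 16-entry
| bit memory; configuring it means filling in the truth table,
| and evaluating the "gate" is an array lookup with the inputs
| as the address.

```python
# Model of a 4-input LUT: any boolean function of 4 inputs fits
# in the same 16-bit memory; only the stored bits differ.

def make_lut(func, n_inputs=4):
    """Precompute the truth table of an arbitrary boolean function."""
    table = []
    for addr in range(2 ** n_inputs):
        bits = [(addr >> i) & 1 for i in range(n_inputs)]
        table.append(int(func(*bits)) & 1)
    return table

def eval_lut(table, *bits):
    """The inputs form the address; the stored bit is the result."""
    addr = sum(b << i for i, b in enumerate(bits))
    return table[addr]

# Any 4-input "blob of logic" becomes the same kind of lookup:
and_or = make_lut(lambda a, b, c, d: (a and b) or (c and d))
print(eval_lut(and_or, 1, 1, 0, 0))  # 1
print(eval_lut(and_or, 1, 0, 1, 0))  # 0
```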
| stephen_g wrote:
| Well, they do have some logic gates - usually the cells have
| at least one flip flop, as well as the LUT.
| roadbuster wrote:
| > You implement an arbitrary blob of logic by having the
| > inputs form the address then the RAM gives the result of
| > the logical function.
|
| This is incorrect. Modern FPGAs are composed of small,
| configurable blocks which contain all sorts of logic. The
| idea is that the configurable blocks can be (internally)
| wired-up to implement your logic of choice. The wiring
| configuration is "loaded" at power-on and retained in
| memories within each configurable block.
| gchadwick wrote:
| Well indeed modern FPGA fabric along with the various fixed
| function blocks can be very complex, but this is a
| beginners 'How Does an FPGA Work?' for which a bunch of
| LUTs connected by programmable interconnect is a useful
| approximation.
| PragmaticPulp wrote:
| > 1. why don't we have more user-programmable FPGAs in our
| fancy desktop mainboards
|
| It has been tried, but GPUs are fast and efficient enough
| that it's rarely worth it.
|
| It's very easy to attach an FPGA to the PCIe bus as an add-in
| card exactly like your GPU. In fact, many FPGA dev boards come
| in exactly this format. They're available, they're just not in
| demand.
|
| > 2. is there a SoC board, ARM or RISC-V based, with an FPGA on
| board? The slower the CPU, the more useful an FPGA would be to
| accelerate compute tasks
|
| Plenty of FPGA parts include ARM cores. It's a fairly standard
| chip configuration.
|
| You can also connect an FPGA and an SoC with PCIe or other
| interconnects. It's really not an obstacle.
|
| FPGAs just aren't very efficient from a cost or dev time
| perspective for most applications. They're indispensable when
| you need them, though.
| rjsw wrote:
| There are plenty of boards that have one of the combined ARM &
| FPGA chips, Zynq (Xylinx/AMD) or Cyclone (Altera/Intel).
| dddiaz1 wrote:
| Another really cool use case for FPGAs is for ultra fast analysis
| of genomic data. This guide walks you through setting up an F1
| instance (AWS FPGA) to do that:
| https://aws-quickstart.github.io/quickstart-illumina-dragen/
| mpd wrote:
| I really enjoyed the recent Hackerbox[0] featuring an FPGA. I'd
| never worked with one prior to that.
|
| https://hackerboxes.com/collections/past-hackerboxes/product...
| jokoon wrote:
| So can a large FPGA be somehow used to brute force encryption?
|
| I don't really understand electronics well enough to see if a
| GPU could be faster than an FPGA, but my guess is yes?
|
| It seems that anything that can be programmed is inherently
| slower than an FPGA equivalent doing the same task.
|
| Does a large enough key size always defeat an FPGA?
|
| I would guess that it becomes power and cost prohibitive for a
| private company to offer such a capability, but of course, a
| large government entity like the NSA might have enough
| resources to pay for enough FPGAs to decrypt most things.
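|
| On the key-size question, a back-of-the-envelope model shows
| why exhaustive search stops scaling. All the throughput
| numbers below are assumptions for illustration, not real
| hardware figures:

```python
# Expected time to brute-force a key, assuming (optimistically)
# 1e12 key trials per second per device across a million devices.
# Each extra key bit doubles the cost.

SECONDS_PER_YEAR = 365 * 24 * 3600

def years_to_search(key_bits, trials_per_sec=1e12, devices=1e6):
    """Expected years to cover half the keyspace at the given rate."""
    keyspace = 2 ** key_bits
    return (keyspace / 2) / (trials_per_sec * devices) / SECONDS_PER_YEAR

print(years_to_search(56))    # DES-sized key: ~1e-9 years (well under a second)
print(years_to_search(128))   # 128-bit key: ~5e12 years
```

| So even wildly generous hardware assumptions fall apart
| somewhere around 80-100 bits; a 128-bit key defeats any
| realistic farm of FPGAs or GPUs.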
| braho wrote:
| Even though the FPGA fabric might encode the solution more
| effectively, there are other important differentiators: clock
| speed and memory bandwidth. GPUs have higher clock speeds and
| typically better memory bandwidth (related of course).
|
| With their higher clock speeds, GPUs can outperform FPGAs on
| many such problems.
___________________________________________________________________
(page generated 2023-05-03 23:00 UTC) |