[HN Gopher] How Does an FPGA Work?
___________________________________________________________________
 
How Does an FPGA Work?
 
Author : sph
Score  : 146 points
Date   : 2023-05-03 17:11 UTC (5 hours ago)
 
web link (learn.sparkfun.com)
w3m dump (learn.sparkfun.com)
 
| nuancebydefault wrote:
| It seems that operations on FPGAs can run much more efficiently
| than their cpu equivalent. For an 'AND' operation, a cpu needs to
| load code and data from a memory into registers, run the logic
| and write the result register back to some memory, all while
| filling up the pipeline for subsequent operations.
| 
| The FPGA on the other hand has the output ready one clock cycle
| after the inputs stream in, and can have many such operations in
| parallel. One might ask, why are cpus not being replaced by
| FPGAs?
| 
| Another interesting question, can software (recipes for cpus) be
| transpiled to be efficiently run on FPGAs?
| 
| I could ask GPT those questions, but the HN community will
| provide more insight I guess.
 
  | pfyra wrote:
  | > Another interesting question, can software (recipes for cpus)
  | be transpiled to be efficiently run on FPGAs?
  | 
  | Yes, at least for C and C++. It is called High-Level Synthesis.
 
  | Lramseyer wrote:
  | These are really good questions to be asking, and to help with
  | that let's consider 3 attributes of compute complexity: time,
  | space, and memory.
  | 
  | The traditional way of computing on a CPU is in essence a list
  | of instructions to be computed. These instructions all go to
  | the same place (the CPU core) to be computed. Since the space
  | is constant, the instructions are computed sequentially in
  | time. Most programmers aren't concerned with redesigning a CPU,
  | so we typically only think about computing in time (and memory
  | of course).
  | 
  | On an FPGA (and custom silicon) the speedup comes from being
  | able to compute in both time and space. Instead of your
  | instructions existing in memory and being computed in time, they can
  | be represented in separate logic elements (in space) and they
  | can each do separate things in time. So in a way, you're
  | trading space for time. This is how the speed gains are
  | achieved.
  | 
  | Where this all breaks down is the optimization and scheduling.
  | A sequential task is relatively easy to optimize since you're
  | optimizing in time (and memory to an extent.) Scheduling is
  | easy too, since tasks can be prioritized and queued up.
  | However, when you're computing in space, you have to optimize
  | in 2 spatial dimensions and in time. When you have multiple
  | tasks that need to be completed, you then need to place
  | them together and not have them overlap.
  | 
  | Think of trying to fit a ton of different-shaped tiles on a table,
  | where you need to be constantly adding and removing tiles in a
  | way that doesn't disrupt the placement of other tiles (at least
  | not too often.) It's kind of a pain, but for some more
  | constrained problem sets, it might make sense.
  | 
  | These aren't impossible problems, and for some tasks, the time
  | or power usage savings is worth the additional complexity. But
  | sequential optimization is way easier, and good enough for most
  | tasks. However, if our desire for faster computing outpaces our
  | ability to make faster CPUs, you may see more FPGAs doing this
  | sort of thing. We already have FPGAs that are capable of
  | partial reconfiguration, and some pretty good software tools to
  | go along with it.
  | 
  | TL;DR: Geometry is hard.
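The time-vs-space tradeoff described above can be sketched in plain Python (a toy model with made-up stages, not real hardware): the same four-operation computation run as a sequential instruction stream on one "core", versus as a simulated four-element pipeline where every element does work on every clock tick.

```python
# Toy contrast between computing "in time" (one ALU, sequential
# instructions) and "in space" (one logic element per operation, all
# active every tick). Stages are arbitrary placeholders, not real HDL.

STAGES = [lambda x: x + 1, lambda x: x * 2, lambda x: x ^ 3, lambda x: x - 4]

def cpu_style(samples):
    """One 'core': each sample passes through all stages before the next starts."""
    out, cycles = [], 0
    for s in samples:
        for stage in STAGES:
            s = stage(s)
            cycles += 1            # one instruction per cycle
        out.append(s)
    return out, cycles

def fpga_style(samples):
    """Four 'logic elements' in a pipeline: once full, one result per tick."""
    regs = [None] * len(STAGES)    # pipeline registers between stages
    out, ticks = [], 0
    pending = len(samples)
    feed = iter(samples)
    while len(out) < pending:
        if regs[-1] is not None:   # capture the last stage's output
            out.append(regs[-1])
        # every stage computes simultaneously each tick (evaluated
        # right-to-left so a value moves exactly one stage per tick)
        for i in range(len(STAGES) - 1, 0, -1):
            regs[i] = STAGES[i](regs[i - 1]) if regs[i - 1] is not None else None
        nxt = next(feed, None)
        regs[0] = STAGES[0](nxt) if nxt is not None else None
        ticks += 1
    return out, ticks

seq_out, seq_cycles = cpu_style(range(8))
par_out, par_ticks = fpga_style(range(8))
assert seq_out == par_out
print(seq_cycles, par_ticks)   # 32 sequential cycles vs 12 pipelined ticks
```

Same answers, but the pipelined version pays the four-stage fill latency once and then produces a result every tick; that is the space-for-time trade.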
 
  | toast0 wrote:
  | > The FPGA on the other hand has the output ready one clock
  | cycle after the inputs stream in, and can have many such
  | operations in parallel. One might ask, why are cpus not being
  | replaced by FPGAs?
  | 
  | FPGAs are more or less a flexible replacement for an
  | application specific (logic level) integrated circuit. A CPU
  | can do a wide variety of tasks, with a small penalty for
  | switching tasks. An ASIC can do one thing and that's it; an FPGA
  | can do many things, but with a large penalty for task switching
  | (you can have a CPU as an ASIC or an FPGA, but...).
  | ASICs require a lot of upfront design work and costs, so you
  | can't use them for everything. ASICs and especially CPUs tend
  | to be able to achieve a higher clock speed than FPGAs, but it
  | kind of depends.
  | 
  | > Another interesting question, can software (recipes for cpus)
  | be transpiled to be efficiently run on FPGAs?
  | 
  | Not really; the way problems are solved is drastically
  | different, and I'd expect most things would need to be
  | reconceptualized to fit. And a lot of software isn't really
  | suited to living as a logic circuit. Exceptions would be
  | encoding, compression, encryption, the inverses of all of
  | those, signal processing, etc. Things where you have a data
  | pipeline and 'the same thing' happens to all the data.
 
  | jcranmer wrote:
  | "FPGAs are the next big frontier for software development, and
  | have been since the '90s; they just need the programming model
  | worked out." This is the traditional story told about FPGAs, but
  | GPGPU programming suddenly overtaking FPGA development about
  | 2010 despite their awkward programming models makes that story
  | rather suspect. The thing is, a lot of the benefits of FPGAs
  | are really best-case scenarios, and when you move to more
  | typical scenarios, their competitiveness as an architecture
  | dwindles dramatically.
  | 
  | Pipelining on an FPGA requires being able to find, and fill,
  | spatial duplication of the operations being done. If you've got
  | conditional operations in a pipeline, now your pipeline isn't
  | so full anymore, and this hurts performance on an FPGA far more
  | than on a CPU (which spends a lot of power trying to keep its
  | pipelines full). But needing to keep the pipelines spatially
  | connected also means you have to be able to find a physical
  | connection between the two stages of a pipeline, and the
  | physical length of that connection also imposes limitations on
  | the frequency you can run the FPGA at.
  | 
  | If you care about FLOPS (or throughput in general), the problem
  | with FPGAs is that they are running at a clock speed about a
  | tenth of a CPU. This requires a 10x improvement in performance
  | just to stand still; given that software development for FPGAs
  | requires essentially a completely different mindset than for
  | CPUs or even GPUs, it's not common to have use cases that work
  | well on FPGAs.
  | 
  | (I should say that a lot of my information about programming
  | FPGAs comes from ex-FPGA developers, and the "ex-" part will
  | certainly have its own form of bias in these opinions).
 
    | davemp wrote:
    | Yeah I don't really see FPGAs ever making their way down to
    | consumers the way GPUs and CPUs have (end users actually
    | programming them).
    | 
    | For (semi) fixed pipeline operations FPGAs will basically
    | always be worse than some slightly more specialized ASIC like
    | a GPU/AI engine.
    | 
    | One area FPGAs can be exceptionally good at is real-time
    | operations. You have much better control over timing in
    | general on FPGAs vs MCU/CPUs, but I don't think that's
    | inherent (you could probably alter the mcu architecture a bit
    | and close the gap).
    | 
    | I could be wrong but I also think you get better power draw
    | for things like mid to low volume glue chips in embedded
    | systems because you're not powering big SRAM banks and DMAs
    | just to pipe data between a couple hardware interfaces. This
    | is only because of market forces though obviously, because if
    | mid to low volume ASICs become viable in terms of dev time
    | they'll be much better.
 
  | pjc50 wrote:
  | > One might ask, why are cpus not being replaced by FPGAs?
  | 
  | Most of the time you want data-dependent execution. FPGA
  | systems excel at "fixed pipeline" systems, where you have e.g.
  | an audio filter chain... but even that is usually done in
  | efficient DSP CPUs.
  | 
  | > Another interesting question, can software (recipes for cpus)
  | be transpiled to be efficiently run on FPGAs?
  | 
  | A _subset_ can. Things like recursion are right out. Various
  | companies have tools to do this, but you usually end up having
  | to rework either the source you're feeding them, or the HDL
  | output.
 
  | burnished wrote:
  | They both use the same kind of components; the FPGA does not
  | have a speed advantage, you are simply comparing the speed of a
  | very simple circuit element to the speed of a very complicated
  | pipeline.
  | 
  | You would use an FPGA to simulate a special purpose circuit,
  | which would be faster than a CPU for its specific purpose. We
  | have CPUs because having a general purpose processing chip is
  | incredibly handy when you want to be able to do more than one
  | thing.
  | 
  | EDIT: I forgot to mention that the device outputs in one clock
  | cycle by definition: if your clock is too fast then your
  | components' output signals don't have time to stabilize and you
  | will get read errors, so you ensure your clock is slow enough
  | for everything to stabilize.
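The stabilization point above is the standard static-timing budget: the clock period must cover the register clock-to-Q delay, the worst-case combinational path, and the setup time of the capturing register. A back-of-the-envelope sketch with made-up delay numbers:

```python
# Back-of-the-envelope static timing: the clock period must be at least
# t_clk_to_q + (worst combinational path) + t_setup, or registers will
# capture unstable values. All delay numbers are illustrative.

t_clk_to_q = 0.5e-9                          # register clock-to-output delay (s)
t_setup = 0.4e-9                             # setup time of the capturing register (s)
critical_path = [0.9e-9, 1.2e-9, 0.7e-9]     # gate + routing delays on the worst path

t_min_period = t_clk_to_q + sum(critical_path) + t_setup
f_max = 1.0 / t_min_period
print(f"min period: {t_min_period * 1e9:.1f} ns, f_max: {f_max / 1e6:.0f} MHz")
```

This is also why FPGA routing hurts: the programmable interconnect adds delay to `critical_path`, which directly lowers the maximum clock the design can close timing at.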
 
  | JackSlateur wrote:
  | For the same reasons we do not replace CPUs with GPUs: not the
  | right tool
  | 
  | Check out the instruction set of modern CPUs
 
  | convolvatron wrote:
  | one big problem is memory. basic cpus have a lot of facilities
  | for high-speed synchronous interface with DRAM, and truly vast
  | amount of resource for cache.
  | 
  | partially as a result, a good model for compiling code to fpgas
  | uses a dataflow paradigm, since we don't need to serialize all
  | operations through a memory fetch, cache, or even register
  | file.
  | 
  | if we hadn't decided to move all our computing to the cloud, I
  | suspect fpga accelerator boards for applications which map well
  | to that model would have some traction in specialized areas.
  | signal processing is definitely one such.
 
  | quadrature wrote:
  | > One might ask, why are cpus not being replaced by FPGAs?
  | 
  | They do sometimes, for very specific applications! The
  | problem is that an FPGA is programmed for one specific task and
  | would have to be taken offline and reprogrammed if you wanted
  | to do something else with it. It's not general purpose like a
  | CPU where you can load up any program and have it run.
  | 
  | Programming an FPGA is also comparatively much harder to reason
  | about than a CPU because of the parallelism and timing you
  | described.
 
    | MSFT_Edging wrote:
    | Some of the more modern Xilinx stuff has features where you
    | don't need to take down the whole FPGA to reload a bitstream
    | onto part of the chip. It's really neat; you can do live
    | reprogramming of one component and leave the others alone or
    | have an A/B setup where one updates while the other is
    | unchanged.
 
      | JohnFen wrote:
      | Yes, I'm working on a Xilinx ARM processor with an FPGA.
      | The FPGA and the CPU are independent units in the chip that
      | can each operate with or without the other. We can indeed
      | reprogram the FPGA without taking the system down.
 
        | davemp wrote:
        | It goes even further. You can partially reconfigure the
        | FPGA fabric itself:
        | https://support.xilinx.com/s/article/34924?language=en_US
 
      | quadrature wrote:
      | That is really cool, hadn't heard of that before.
 
  | barelyauser wrote:
  | What is simpler: making logical circuit "A" or making a circuit
  | that emulates logical circuit "A" and its relatives?
 
| markx2 wrote:
| If anyone is unaware, you can buy the very impressive Pocket.
| https://www.analogue.co/pocket
| 
| The current list of what it can do with FPGA is listed here -
| https://openfpga-cores-inventory.github.io/analogue-pocket/ and
| the inevitable sub-reddit is a good resource.
| https://old.reddit.com/r/AnaloguePocket/
 
  | gchadwick wrote:
  | There's also the MiSTer project:
  | https://github.com/MiSTer-devel/Wiki_MiSTer/wiki. Not hand-held
  | (yet...) and hardware is less slick, but a bunch more systems
  | and also fully open source.
 
    | phendrenad2 wrote:
    | MiSTer makes me kind of sad, the DE10-nano board it's based
    | on is 7 years old at this point, and the actual FPGA chip on
    | the board is probably over twice as old as that. And this is
    | still the peak of hobby FPGA chips. I wonder why Moore's Law
    | is hitting the FPGA industry particularly hard all of a
    | sudden.
 
      | willis936 wrote:
      | There are better FPGA options, they're just more expensive.
      | The DE-10 Nano was strategically chosen as "powerful enough
      | to meet most wants while still being within a reasonable
      | budget".
      | 
      | No one's going to plunk down $10k for a 19 EV Zynq
      | UltraScale+ with 1.1M LEs, but they will spend $200 on a
      | Cyclone V with 210k LEs.
 
  | MrHeather wrote:
  | The article says FPGAs are too power hungry for handheld
  | devices. Did Analogue do anything special to solve this problem
  | on the Pocket?
 
    | agg23 wrote:
    | That's honestly not true at all; it all just depends on your
    | platform. On the Pocket, the FPGA _is_ the processor (there
    | are actually two FPGAs, one for the actual emulation core,
    | and one for scaling video, and there's technically a PIC
    | microcontroller for uploading bitstreams and managing UI).
    | The FPGAs still don't draw much power compared to the display
    | itself. With the in-built current sensor on the dev kits, the
    | highest we've measured drawn by the main FPGA is ~300mAh. Now
    | this sensor isn't going to be the best measurement, but it's
    | something to go off of.
 
      | eulgro wrote:
      | > ~300 mAh
      | 
      | mA? You're not very convincing here.
 
      | WhiteDawn wrote:
      | Personally I think this is the biggest selling feature of
      | FPGA based emulation.
      | 
      | The reality is both Software and FPGA emulation can be done
      | very well and with very low latency, however to achieve
      | this in software you generally require high end power
      | hungry hardware.
      | 
      | A Steam Deck can run a highly accurate Sega Genesis
      | emulator with read-ahead rollback, screen scaling, shaders
      | and all the fixings no problem, but in theory the pocket
      | can provide the exact same experience with an order of
      | magnitude less power.
      | 
      | It's not quite apples to oranges of course, but the
      | comfortable battery life does make the pocket much more
      | practical.
 
        | agg23 wrote:
        | Being nitpicky about latency is where FPGAs truly
        | shine. You lose a good bit of it by connecting to HDMI (I
        | think the Pocket docked is 1/4 a frame, and MiSTer has a
        | similar mode) (EDIT: MiSTer can do 4 scanlines, but it's
        | not compatible with some displays), but when we're
        | talking about analog display methods or inputs, you can
        | achieve accurate timings with much less effort than on a
        | modern day computer.
        | 
        | For a full computer like the Steam Deck, you have to deal
        | with preemption, display buffers, and more, which _will_
        | add latency. Now if you went bare metal, you could
        | definitely drive a display with super low latency,
        | hardware accurate emulation, but obviously that's not
        | what most people are doing.
 
  | agg23 wrote:
  | Not to draw attention to myself or anything, but if you're
  | interested in learning to make cores for the Analogue Pocket or
  | MiSTer (or similar) platforms, I highly recommend taking a look
  | at the resources and wiki I'm slowly building -
  | https://github.com/agg23/analogue-pocket-utils/
  | 
  | I started ~7 months ago with approximately no FPGA or hardware
  | experience, have now ported ~6 cores from MiSTer to Pocket, and
  | just released my first core of my own, the original Tamagotchi
  | - https://github.com/agg23/fpga-tamagotchi/
  | 
  | If you want to join in, I and several other devs are very
  | willing to help talk you through it. We primarily are on the
  | FPGAming Discord server - https://discord.gg/Gmcmdhzs - which
  | is probably the best place to get a hold of me as well.
 
  | jonny_eh wrote:
  | I also recommend the official dock. It basically turns it into
  | an easy to use Mister.
 
  | sph wrote:
  | My mind is blown but I'm also wondering if this isn't some kind
  | of incredible over-engineering? Surely CPUs are fast enough to
  | emulate these kinds of devices in software. If they aren't, they
  | must be an order of magnitude simpler in complexity.
  | 
  | I wouldn't ordinarily care about emulators, but actual hardware
  | emulators is the craziest thing I've heard in a while. All that
  | for a small handheld console?
  | 
  | If only I was not so broke...
 
    | lprib wrote:
    | Sure it would probably be cheaper to chuck a cortex-A* or
    | similar mid-range MCU in there. One advantage of FPGAs is
    | that they can achieve "perfect" emulation of a Z80 (or other)
    | since they run at the logic gate level. No software task
    | latency, no extra sound buffering, etc. It can re-create the
    | original clock-per-clock.
 
      | arein3 wrote:
      | It's impressive as well
 
    | agg23 wrote:
    | Software is orders of magnitude simpler in complexity, yes.
    | The difference between a software emulator and a logic-level
    | emulator is immense.
    | 
    | But take the example of the difficulties with a software NES
    | emulator:
    | 
    | In hardware, there is one clock that is fed into the 3 main
    | disparate systems: the CPU, APU (audio), and PPU (picture).
    | They all use different clock dividers, but they're still fed
    | off of the same source clock. Each of these chips operate in
    | parallel to produce the output expected, and there's some
    | bidirectional communication going on there as well.
    | 
    | In a software emulator, the only parallel you get is on
    | multiple cores, but you can approximate it with threading
    | (i.e. preemption). For simplicity, you stick with a single
    | thread. You run 3 steps of the PPU at once, then one step of
    | the CPU and APU. You've basically just sped through the first
    | two steps, because who will notice those two cycles? They
    | took no "real" time, they were performed as fast as the
    | software could perform them. Probably doesn't matter, as no
    | one could tell that for 10ns, this happened.
    | 
    | You need to add input. You use USB. That has a maximum polling
    | rate of 1000 Hz (a 1 ms interval), plus your emulator processing
    | time (is it going to have to go in the "next frame" packet?),
    | but controls on systems like the NES were practically
    | instantly available the moment the CPU read.
    | 
    | Now you need to produce output. You want to hook up your
    | video, but wait, you need to feed it into a framebuffer.
    | That's at least one frame of latency unless you're able to
    | precompute everything for the next frame. Your input is
    | delayed a frame, because it has to be fed into the next
    | batch, the previous batch (for this frame) is already done.
    | You use the basis of 60fps (which is actually slightly wrong)
    | to time your ticking of your emulator.
    | 
    | Now you need to hook up audio. Audio must go into a buffer or
    | it will under/overflow. This adds latency, and you need to
    | stay on top of how close you are to falling outside of your
    | bounds. But you were using FPS for pacing, so now how do you
    | reconcile that?
    | 
    | ----
    | 
    | Cycle accurate and low latency software solutions are
    | certainly not easy, and true low latency is impossible on
    | CPUs running an actual OS. Embedded-style systems with RTOSes
    | might be able to get pretty close, but it's still not going
    | to be the same as being able to guarantee the exact same (or
    | as near as we can tell) timing for every cycle.
    | 
    | I want to be clear that none of these hardware
    | implementations are actually that accurate, but they could
    | be, and people are working hard to improve them constantly.
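The shared-master-clock scheme described above can be sketched like this. The divider values are the NTSC NES ratios (one master clock, CPU at master/12, PPU at master/4, so three PPU ticks per CPU tick); the step bodies are placeholders, not an emulator:

```python
# Sketch of lockstep emulation driven by one master clock, as in the
# NES description above: every chip is a divider of the same source, so
# their relative timing can never drift the way free-running threads can.
# The step_* bodies are placeholders for real per-cycle work.

CPU_DIVIDER, PPU_DIVIDER = 12, 4   # NTSC NES ratios: 3 PPU ticks per CPU tick

class Console:
    def __init__(self):
        self.cpu_steps = 0
        self.ppu_steps = 0

    def step_cpu(self):
        self.cpu_steps += 1        # placeholder for one CPU cycle of work

    def step_ppu(self):
        self.ppu_steps += 1        # placeholder for one PPU dot of work

    def run(self, master_cycles):
        # Both chips step off the same master counter, in order, every tick.
        for cycle in range(master_cycles):
            if cycle % PPU_DIVIDER == 0:
                self.step_ppu()
            if cycle % CPU_DIVIDER == 0:
                self.step_cpu()

c = Console()
c.run(21_477_272)                  # roughly one second of NTSC master clock
print(c.cpu_steps, c.ppu_steps)
```

On an FPGA this relationship is free (it is literally the same clock net); in software you have to rebuild it with a scheme like this, and every shortcut ("run the PPU three steps at once") is a small timing lie that some game eventually notices.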
 
    | rtkwe wrote:
    | The benefit of FPGAs is you can get nearly gate perfect
    | emulation of an old games system. We've had emulators for
    | years that get most things right but some games and minor
    | things in old games require specific software patches to
    | ensure the odd way they used the chips available produces the
    | same output. There's a great old article from 2011 about the
    | power required at the time to get a nearly perfect emulation
    | of a NES. [0] The goal with the Pocket and all of Analogue's
    | consoles isn't to be just another emulation machine but to
    | run as close as possible to the original at a hardware level.
    | That's their whole niche, hardware level 'emulation' of old
    | consoles.
    | 
    | [0] https://arstechnica.com/gaming/2011/08/accuracy-takes-
    | power-...
 
    | Waterluvian wrote:
    | Emulating "accurately" is so difficult that not even
    | Nintendo's Game Boy emulator on the Switch does it properly.
    | I've been replaying old games and comparing some questionable
    | moments with my original Game Boy, and the timings are not
    | quite right in some cases.
    | 
    | For example in Link's Awakening, there's a wiggle screen
    | effect done by writing to OAM during HBlank. On the Switch it
    | lags very differently than my GB (try it by getting into the
    | bed where you find the ocarina). Or with Metroid 2, the sound
    | when you kill an Omega Metroid is different too. It pitch
    | shifts along with the "win" jingle.
    | 
    | These have almost zero impact on playability. But for purists
    | and emudevs it's a popular pursuit.
 
| photochemsyn wrote:
| Here's a nice series that picks up where this one leaves off
| (shows how flip-flop/LUT units are organized into cells inside a
| PLB, programmable logic block). It also is the first step in a
| tutorial on using Verilog, building a hardware finite state
| machine, and eventually a RISC-V processor on a FPGA:
| 
| https://www.digikey.com/en/maker/projects/introduction-to-fp...
 
| user070223 wrote:
| From my understanding
| 
| An FPGA doesn't have an instruction pipeline, as the command is
| encoded in the gates themselves. It means that at runtime the
| FPGA is not Turing complete[0], as opposed to the CPU[1].
| 
| There is a phrase "data is code and code is data" in security
| contexts. If FPGAs ever replace CPUs as the main computation
| hardware (you don't need Turing completeness when you keep
| running the same apps/microservices), the new saying would be
| something like "code is execution and execution is code", as you
| imprint the code in the gates. It would get rid of a whole
| class/subclass of memory safety vulnerabilities.
| 
| This paradigm change is like what WebAssembly did to the web. The
| slogan should be "make the bitstream go mainstream". Someone made
| a demo running wasm on an FPGA[2], not sure if using a CPU or
| directly.
| 
| Of course you move complexity into compilation and increase
| loading time, all for an order of magnitude faster execution.
| 
| Companies have developed high-level synthesis compilers, but it's
| difficult and challenging, as you need to synchronize parallel
| execution pipelines, which you don't have to do on a CPU since it
| has a steady clock rate for each step in the pipeline.
| 
| A company named LegUp Computing (acquired by Microchip) compiled
| memcached/redis applications to FPGA and improved performance &
| power efficiency by an order of magnitude (10x).
| 
| There is a lot of proprietary intellectual property in hardware
| design, as opposed to software, so tools and knowledge are scarce.
| 
| If anyone works / want to work on this problem hit me up in the
| comments
| 
| [0] Unless you implement a cpu on top of the fpga :)
| 
| [1] Assuming infinite memory, which is false, but good enough
| 
| [2] https://github.com/denisvasilik/wasm-fpga
 
  | proto_lambda wrote:
  | > An FPGA doesn't have an instruction pipeline, as the command
  | is encoded in the gates themselves. It means that at runtime
  | the FPGA is not Turing complete[0], as opposed to the CPU[1].
  | 
  | That obviously depends entirely on the circuit, many
  | sufficiently advanced circuits probably end up being
  | accidentally Turing complete.
 
    | JohnFen wrote:
    | You can implement Turing-complete CPUs in FPGA fabric.
 
      | proto_lambda wrote:
      | That's exactly what OP's footnotes say, yes.
 
| jschveibinz wrote:
| We used them for real time array signal processing and beam-
| forming. They worked great.
 
| y0ungarmanii wrote:
| I saw various comments about how FPGAs are not ready for consumer
| hardware; Apple is using them in the AirPods Max already (probably
| for filtering audio).
| 
| Check the link below
| https://www.ifixit.com/Teardown/AirPods+Max+Teardown/139369
| 
| They really excel for high throughput & low latency - which noise
| canceling sounds like a good example of! In addition to this,
| they are already being used in communication systems & data
| centers to speed up latency sensitive computations. Edge AI seems
| like a big market that they will be used for soon, probably more
| likely b/c they can be flashed unlike ASICs and new NN
| architectures drop every couple of years.
 
| burnished wrote:
| Neat. If the author is around, might I suggest pushing some of
| the 'why use an FPGA' to the front? I think it would benefit from
| a more concrete example motivating the use of an FPGA - like a
| picture of some simple circuit using a seven segment display on a
| breadboard next to a picture of an FPGA implementing the same
| circuit in order to make it more clear that it is a substitute
| for putting experiments together by hand. I think it will help
| newcomers better contextualize what is happening and why.
| 
| I think in the same vein your wrap up of why you might want to do
| something in hardware vs software is great and well placed.
| 
| Hmmm, I guess now is as good a time as any to bumblefuck around
| with small electronics projects for fun. Thanks for the reminder!
 
  | beardyw wrote:
  | > Neat. If the author is around, might I suggest pushing some
  | of the 'why use an FPGA' to the front?
  | 
  | I think the problem is identifying cases where you really need
  | an FPGA. Most of the time you don't.
 
    | burnished wrote:
    | I suggest it purely for educational purposes. The first
    | struggle isn't identifying the best use case - it's
    | understanding wtf is going on. Putting it in terms of
    | something more familiar is helpful for that.
    | 
    | Your thing would make for a wonderful followup topic though.
 
    | cycomanic wrote:
    | What do you mean by "you". Maybe "you" as in a general
    | consumer don't need an FPGA, but I guess one could argue a
    | general consumer doesn't need a general purpose computer
    | either.
    | 
    | There are certainly many use cases where you absolutely do
    | need an FPGA, i.e. anything where you need to process large
    | amounts of IO in realtime. For example, the guys from SimulaVR
    | talk about how they use an FPGA for display correction here:
    | https://simulavr.com/blog/testing-ar-mode-image-processing/
    | 
    | Many modern devices would not function without FPGAs
 
      | JohnFen wrote:
      | > anything where you need to process large amounts of IO in
      | realtime.
      | 
      | I'm working on a FPGA-based system right now. We're using
      | an FPGA precisely because this is what we're doing -- about
      | a hundred I/O ports that have to be processed with as
      | little latency as possible.
 
      | beardyw wrote:
      | I think we can agree that this discussion does not involve
      | general consumers!
      | 
      | "Many cases" is not the opposite of most cases.
 
      | kanetw wrote:
      | (SimulaVR dev) It's not wrong to say that in most cases,
      | tasks are better solved without an FPGA. But when you need
      | one you need one (or an ASIC if you have the volume and
      | don't need reconfigurability)
 
    | asdfman123 wrote:
    | This is meant to be an introduction though, right? You can
    | simply write "some people do X, and others claim Y is better"
    | then move on.
    | 
    | I read several paragraphs of the article and I still don't
    | know why you'd use one, despite taking computer architecture
    | and analog electronics courses in undergrad.
    | 
    | I don't want to read about logic gates again and I don't want
    | to read about the nuances before I broadly understand what
    | the point is.
    | 
    | For anyone else still wondering, here's Wikipedia:
    | 
    | > FPGAs have a remarkable role in embedded system development
    | due to their capability to start system software development
    | simultaneously with hardware, enable system performance
    | simulations at a very early phase of the development, and
    | allow various system trials and design iterations before
    | finalizing the system architecture.
    | 
    | Basically, rapid prototyping I guess. That makes sense.
 
      | awjlogan wrote:
      | If that was an ask for a specific example, one of the most
      | common uses for FPGAs is DSPs. Say you have a simple FIR
      | filter of, say, 63 taps. To do this in a CPU requires you
      | to load two values and do a multiply/accumulate for each
      | tap in sequence. Very (!!) optimistically, that's about 192
      | instructions. With an FPGA, you can do all the
      | multiplications in parallel and then just sum the outputs -
      | probably done in 2 cycles and with pipelining your
      | throughput could be a sample every clock.
      | 
      | If the FPGA is too slow, too power inefficient etc you can
      | (if you have the money!) take the same core design and put
      | it in an ASIC. The FPGA provides an excellent prototyping
      | environment; in this example you can tune the filter
      | parameters before committing to a full ASIC.
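The instruction-count arithmetic above can be written out as a rough cost model (the per-tap instruction mix and the adder-tree depth are illustrative assumptions, not measurements of any real core):

```python
# Rough cost model for the 63-tap FIR example above. A CPU does the
# multiply-accumulates one after another; an FPGA can instantiate 63
# multipliers in parallel and reduce their products with an adder tree.
import math

TAPS = 63

# CPU: per output sample, assume each tap costs roughly a coefficient
# load, a delayed-sample load, and a multiply-accumulate.
cpu_instructions = TAPS * 3                      # ~189, near the "about 192" above

# FPGA: all 63 multiplies happen in the same cycle; the products are
# summed by a binary adder tree about log2(63) levels deep. With
# pipelining, throughput is still one sample per clock.
multiply_cycles = 1
adder_tree_levels = math.ceil(math.log2(TAPS))   # depth of the reduction tree

print(cpu_instructions, multiply_cycles + adder_tree_levels)
```

Even granting the FPGA a 10x slower clock, ~7 cycles of latency and one-sample-per-clock throughput versus ~190 sequential instructions is why DSP-style kernels are the canonical FPGA win.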
 
        | pjc50 wrote:
        | > multiply/accumulate for each tap in sequence. Very (!!)
        | optimistically, that's about 192 instructions
        | 
        | This is what all those vector instructions are for.
        | 
        | FPGA is kind of invaluable if you have lots of streams
        | coming in at high megabit rates, though, and need to
        | preprocess down to a rate the CPU and memory bus can
        | handle.
 
        | awjlogan wrote:
        | Yes, indeed :) Didn't want to muddy the waters with
        | vector instructions, and it's fair to say that the
        | dedicated DSP chip market has been squeezed by FPGAs on
        | one side and vectorised (even lightly, like the
        | Cortex-M4/M7 DSP extension) CPUs on the other.
 
        | asdfman123 wrote:
        | Explain it to me like I'm your mom.
 
| nfriedly wrote:
| I've read that AMD's 7040-series mobile CPUs will have an "FPGA-
| based AI engine developed by Xilinx" [1] - I'm wondering how
| _programmable_ that will be.
| 
| I know there's been some performance difficulties emulating the
| PlayStation 3's various floating point modes. It's the kind of
| thing that I think an on-chip FPGA could theoretically help with,
| although I don't know if it'd be worth the trouble in this
| specific case. (Or if AMD's implementation will be flexible
| enough to help.)
| 
| [1]: https://www.anandtech.com/show/18844/amd-unveils-ryzen-
| mobil...
 
| sph wrote:
| Sadly the article doesn't go into details about how the
| programmable RAM is wired to the actual logic gates, which seems
| to me the most interesting and challenging part of designing an
| FPGA.
| 
| In my mediocre understanding of digital circuits, RAM is usually
| addressable, so it has to be wired in a more direct manner to
| enable such a design.
| 
| I posted this article because someone mentioned some Ryzen chip
| having an FPGA in another post, and I am now left wondering:
| 
| 1. why don't we have more user-programmable FPGAs in our fancy
| desktop mainboards
| 
| 2. is there a SoC board, ARM or RISC-V based, with an FPGA on
| board? The slower the CPU, the more useful an FPGA would be to
| accelerate compute tasks
 
  | duskwuff wrote:
  | > Sadly the article doesn't go into details about how the
  | programmable RAM is wired to the actual logic gates
  | 
  | Not sure what you mean by that. Do you mean how a RAM is used
  | as a lookup table to implement logic gates, how routing works,
  | or how block RAM is integrated into the FPGA fabric?
  | 
  | > is there a SoC board, ARM or RISC-V based, with an FPGA on
  | board?
  | 
  | Better yet, there are a number of FPGAs available with an ARM
  | SoC on board. Xilinx Zynq, Intel Cyclone V SoC, various others.
 
  | pjc50 wrote:
  | > RAM is usually addressable, so it has to be wired in a more
  | direct manner to enable such a design
  | 
  | DRAM is necessarily a grid.
  | 
  | SRAM, in e.g. the standard 6-transistor cell form, you can kind
  | of dump individual bits anywhere you need one.
  | 
  | > why don't we have more user-programmable FPGAs in our fancy
  | desktop mainboards
  | 
  | They tend to be horrifyingly expensive and there are few use
  | cases you can't outperform with a GPU or even just vector
  | instructions. Most of the interesting use cases for FPGAs are
  | when you have direct access to the pins and can wire them up to
  | high-speed signalling, which really isn't home user friendly.
  | 
  | Also all the tooling is proprietary.
  | 
  | > is there a SoC board, ARM or RISC-V based, with an FPGA on
  | board
  | 
  | Buy a medium sized FPGA and download a CPU of your choice.
  | 
  | (I have a downloadable-CPU-sized FPGA board on my desk for
  | testing not yet shipped ASIC designs. It costs about six
  | thousand dollars and has a 48-week lead time on Farnell)
 
    | sph wrote:
    | > Buy a medium sized FPGA and download a CPU of your choice.
    | 
    | Damn, _of course_ one would be able to download a CPU and
    | "emulate it" in hardware.
    | 
    | I never imagined that would be possible. Now I'm thinking
    | that if I had infinite free time, I would buy an FPGA and design
    | a modern Lisp CPU. A RISC-V based design with native Lisp
    | support. Who needs hardware when you can just emulate it in
    | an FPGA.
    | 
    | That's seriously cool technology.
 
  | MSFT_Edging wrote:
  | As for question 1, they're far more common in server grade
  | stuff where typically they are baked in. Consumer stuff just
  | doesn't need/use as much IO throughput and muxing as an FPGA
  | provides on, say, a large networking switch.
  | 
  | There are PCIe compatible FPGAs that you can plug into your
  | desktop like a graphics card to accelerate certain tasks. In
  | general though, our workstation hardware just isn't specialized
  | enough to require them, but can be extended to do so. If
  | something is a large enough business model, they'll just make
  | an ASIC.
 
  | aphedox wrote:
  | After Intel acquired Altera they released a series of x86 Xeon
  | chips with integrated FPGAs. Look up the Xeon 6138P.
 
  | wildzzz wrote:
  | Both Intel and Xilinx sell FPGAs with hard ARM cores inside so
  | you can run real Linux while being able to interface with
  | custom logic. Additionally, it's pretty common to create ARM,
  | RISC-V, or PowerPC soft cores in the FPGA when there are no hard
  | cores available. These mimic the real cores and will run
  | software while allowing for things like custom instructions
  | that can take advantage of the flexibility of FPGA fabric. The
  | Xilinx Zynq and Intel Cyclone V have options for hard ARM
  | cores. There are various designs of boards out there you can
  | buy that implement Arduino or Raspberry Pi shield
  | compatibility. The XUP PYNQ-Z2 supports both interfaces and
  | runs a Zynq-7000 with a real ARM core.
  | 
  | You can do other things with soft cores that are not possible
  | with an off the shelf CPU like triple modular redundancy. This is
  | when you run a lot of the logic in triplicate and vote on the
  | results to prevent a bit flip from messing up the software.
  | This is common for space-based CPUs that are running on FPGAs.
  | It's expensive to design a new chip in a very small run so it's
  | much cheaper to just put the core on an off the shelf FPGA and
  | use the rest of the FPGA fabric for custom logic functions.
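The triple modular redundancy scheme described above can be sketched in software. In an FPGA the three copies run as parallel logic and a voter circuit picks the majority; this Python model just shows the voting idea:

```python
# Sketch of triple modular redundancy (TMR): compute the same result three
# times and take the bitwise majority, so a single bit flip in one copy
# cannot corrupt the voted output.

def majority(a: int, b: int, c: int) -> int:
    """Bitwise majority vote of three redundant results."""
    return (a & b) | (b & c) | (a & c)

# Example: one copy suffers a single-event upset (bit 2 flipped),
# but the vote still recovers the correct value.
good = 0b1010
flipped = good ^ 0b0100
assert majority(good, flipped, good) == good
```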
 
  | gchadwick wrote:
  | > Sadly the article doesn't go into details about how the
  | programmable RAM is wired to the actual logic gates, which
  | seems to me the most interesting and challenging part of
  | designing an FPGA.
  | 
  | It does, that's the part under the 'Look-Up Tables' section.
  | The key is there aren't any actual logic gates, just lots of
  | little RAMs. You implement an arbitrary blob of logic by having
  | the inputs form the address then the RAM gives the result of
  | the logical function.
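As a rough illustration of the LUT idea described above, here is a tiny Python model of a 2-input lookup table (real FPGA LUTs typically have 4-6 inputs, but the mechanism is the same): the inputs are concatenated into an address, and the bits loaded into the table determine which logic function the same hardware implements.

```python
# A 2-input LUT modeled in Python: the truth table is a small "RAM" and
# the inputs form the address. Programming the FPGA amounts to filling
# in these tables.

def make_lut(truth_table):
    """Return a 2-input logic function backed by a 4-entry table."""
    def lut(a: int, b: int) -> int:
        address = (a << 1) | b   # concatenate the inputs into an address
        return truth_table[address]
    return lut

# The same structure implements AND or XOR depending only on its contents:
AND = make_lut([0, 0, 0, 1])
XOR = make_lut([0, 1, 1, 0])
assert AND(1, 1) == 1 and AND(1, 0) == 0
assert XOR(1, 0) == 1 and XOR(1, 1) == 0
```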
 
    | stephen_g wrote:
    | Well, they do have some logic gates - usually the cells have
    | at least one flip flop, as well as the LUT.
 
    | roadbuster wrote:
    | > You implement an arbitrary blob of logic by having the
    | > inputs form the address then the RAM gives the result of
    | > the logical function.
    | 
    | This is incorrect. Modern FPGAs are composed of small,
    | configurable blocks which contain all sorts of logic. The
    | idea is that the configurable blocks can be (internally)
    | wired-up to implement your logic of choice. The wiring
    | configuration is "loaded" at power-on and retained in
    | memories within each, configurable block.
 
      | gchadwick wrote:
      | Well indeed modern FPGA fabric along with the various fixed
      | function blocks can be very complex, but this is a
      | beginner's 'How Does an FPGA Work?' for which a bunch of
      | LUTs connected by programmable interconnect is a useful
      | approximation.
 
  | PragmaticPulp wrote:
  | > 1. why don't we have more user-programmable FPGAs in our
  | fancy desktop mainboards
  | 
  | It has been tried, but GPUs are fast and efficient enough
  | that it's rarely worth it.
  | 
  | It's very easy to attach an FPGA to the PCIe bus as an add-in
  | card exactly like your GPU. In fact, many FPGA dev boards come
  | in exactly this format. They're available, they're just not in
  | demand.
  | 
  | > 2. is there a SoC board, ARM or RISC-V based, with an FPGA on
  | board? The slower the CPU, the more useful an FPGA would be to
  | accelerate compute tasks
  | 
  | Plenty of FPGA parts include ARM cores. It's a fairly standard
  | chip configuration.
  | 
  | You can also connect an FPGA and an SoC with PCIe or other
  | interconnects. It's really not an obstacle.
  | 
  | FPGAs just aren't very efficient from a cost or dev time
  | perspective for most applications. They're indispensable when
  | you need them, though.
 
  | rjsw wrote:
  | There are plenty of boards that have one of the combined ARM &
  | FPGA chips, Zynq (Xilinx/AMD) or Cyclone (Altera/Intel).
 
| dddiaz1 wrote:
| Another really cool use case for FPGAs is for ultra fast analysis
| of genomic data. This guide walks you through setting up an F1
| instance (AWS FPGA) to do that: https://aws-
| quickstart.github.io/quickstart-illumina-dragen/
 
| mpd wrote:
| I really enjoyed the recent Hackerbox[0] featuring an FPGA. I'd
| never worked with one prior to that.
| 
| https://hackerboxes.com/collections/past-hackerboxes/product...
 
| jokoon wrote:
| So can a large FPGA be somehow used to brute force encryption?
| 
| I don't really understand electronics to see if a GPU could be
| faster than an FPGA, but my guess is yes?
| 
| It seems that anything that can be programmed is inherently
| slower than an FPGA equivalent doing the same task.
| 
| Does a large enough key size always defeat an FPGA?
| 
| I would guess that it becomes power and cost prohibitive for a
| private company to deliver such possibility, but of course, a
| large government entity like the NSA might have enough resources
| to pay for enough FPGAs to decrypt most things.
 
  | braho wrote:
  | Even though the FPGA fabric might encode the solution more
  | effectively, there are other important differentiators: clock
  | speed and memory bandwidth. GPUs have higher clock speeds and
  | typically better memory bandwidth (related of course).
  | 
  | With the higher clock speed, GPUs can well outperform FPGAs for
  | many problems.
 
___________________________________________________________________
(page generated 2023-05-03 23:00 UTC)