[HN Gopher] How Does an FPGA Work?
___________________________________________________________________
 
How Does an FPGA Work?
 
Author : sph
Score  : 146 points
Date   : 2023-05-03 17:11 UTC (5 hours ago)
 
web link (learn.sparkfun.com)
w3m dump (learn.sparkfun.com)
 
| nuancebydefault wrote:
| It seems that operations on FPGAs can run much more efficiently
| than their cpu equivalent. For an 'AND' operation, a cpu needs to
| load code and data from a memory into registers, run the logic
| and write the result register back to some memory, all while
| filling up the pipeline for subsequent operations.
| 
| The FPGA on the other hand has the output ready one clock cycle
| after the inputs stream in, and can have many such operations in
| parallel. One might ask, why are cpus not being replaced by
| FPGAs?
| 
| Another interesting question, can software (recipes for cpus) be
| transpiled to be efficiently run on FPGAs?
| 
| I could ask GPT those questions, but the HN community will
| provide more insight I guess.
 
  | pfyra wrote:
  | > Another interesting question, can software (recipes for cpus)
  | be transpiled to be efficiently run on FPGAs?
  | 
  | Yes, at least for C and C++. It is called High-Level Synthesis.
 
  | Lramseyer wrote:
  | These are really good questions to be asking, and to help with
  | that let's consider 3 attributes of compute complexity: time,
  | space, and memory.
  | 
  | The traditional way of computing on a CPU is in essence a list
  | of instructions to be computed. These instructions all go to
  | the same place (the CPU core) to be computed. Since the space
  | is constant, the instructions are computed sequentially in
  | time. Most programmers aren't concerned with redesigning a CPU,
  | so we typically only think about computing in time (and memory
  | of course).
  | 
  | On an FPGA (and custom silicon) the speedup comes from being
  | able to compute in both time and space. Instead of your
  | instructions existing in memory and being computed in time, they can
  | be represented in separate logic elements (in space) and they
  | can each do separate things in time. So in a way, you're
  | trading space for time. This is how the speed gains are
  | achieved.
  | 
  | Where this all breaks down is the optimization and scheduling.
  | A sequential task is relatively easy to optimize since you're
  | optimizing in time (and memory to an extent.) Scheduling is
  | easy too, since tasks can be prioritized and queued up.
  | However, when you're computing in space, you have to optimize
  | in 2 spatial dimensions and in time. When you have multiple
  | tasks that need to be completed, you then need to place
  | them together and not have them overlap.
  | 
  | Think of trying to fit a ton of different-shaped tiles on a table,
  | where you need to be constantly adding and removing tiles in a
  | way that doesn't disrupt the placement of other tiles (at least
  | not too often.) It's kind of a pain, but for some more
  | constrained problem sets, it might make sense.
  | 
  | These aren't impossible problems, and for some tasks, the time
  | or power usage savings is worth the additional complexity. But
  | sequential optimization is way easier, and good enough for most
  | tasks. However, if our desire for faster computing outpaces our
  | ability to make faster CPUs, you may see more FPGAs doing this
  | sort of thing. We already have FPGAs that are capable of
  | partial reconfiguration, and some pretty good software tools to
  | go along with it.
  | 
  | TL;DR: Geometry is hard.
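The time-vs-space tradeoff described above can be sketched in plain Python (a toy model with made-up stages, not real hardware): the same four-operation computation run as a sequential instruction stream on one "core", versus as a simulated four-element pipeline where every element does work on every clock tick.

```python
# Toy contrast between computing "in time" (one ALU, sequential
# instructions) and "in space" (one logic element per operation, all
# active every tick). Stages are arbitrary placeholders, not real HDL.

STAGES = [lambda x: x + 1, lambda x: x * 2, lambda x: x ^ 3, lambda x: x - 4]

def cpu_style(samples):
    """One 'core': each sample passes through all stages before the next starts."""
    out, cycles = [], 0
    for s in samples:
        for stage in STAGES:
            s = stage(s)
            cycles += 1            # one instruction per cycle
        out.append(s)
    return out, cycles

def fpga_style(samples):
    """Four 'logic elements' in a pipeline: once full, one result per tick."""
    regs = [None] * len(STAGES)    # pipeline registers between stages
    out, ticks = [], 0
    pending = len(samples)
    feed = iter(samples)
    while len(out) < pending:
        if regs[-1] is not None:   # capture the last stage's output
            out.append(regs[-1])
        # every stage computes simultaneously each tick (evaluated
        # right-to-left so a value moves exactly one stage per tick)
        for i in range(len(STAGES) - 1, 0, -1):
            regs[i] = STAGES[i](regs[i - 1]) if regs[i - 1] is not None else None
        nxt = next(feed, None)
        regs[0] = STAGES[0](nxt) if nxt is not None else None
        ticks += 1
    return out, ticks

seq_out, seq_cycles = cpu_style(range(8))
par_out, par_ticks = fpga_style(range(8))
assert seq_out == par_out
print(seq_cycles, par_ticks)   # 32 sequential cycles vs 12 pipelined ticks
```

Same answers, but the pipelined version pays the four-stage fill latency once and then produces a result every tick; that is the space-for-time trade.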
 
  | toast0 wrote:
  | > The FPGA on the other hand has the output ready one clock
  | cycle after the inputs stream in, and can have many such
  | operations in parallel. One might ask, why are cpus not being
  | replaced by FPGAs?
  | 
  | FPGAs are more or less a flexible replacement for an
  | application specific (logic level) integrated circuit. A CPU
  | can do a wide variety of tasks, with a small penalty for
  | switching tasks. An ASIC can do one thing and that's it; an FPGA
  | can do many things, but with a large penalty for task switching
  | (you can have a CPU as an ASIC or an FPGA, but...).
  | ASICs require a lot of upfront design work and costs, so you
  | can't use them for everything. ASICs and especially CPUs tend
  | to be able to achieve a higher clock speed than FPGAs, but it
  | kind of depends.
  | 
  | > Another interesting question, can software (recipes for cpus)
  | be transpiled to be efficiently run on FPGAs?
  | 
  | Not really; the way problems are solved is drastically
  | different, and I'd expect most things would need to be
  | reconceptualized to fit. And a lot of software isn't really
  | suited to living as a logic circuit. Exceptions would be
  | encoding, compression, encryption, the inverses of all of
  | those, signal processing, etc. Things where you have a data
  | pipeline and 'the same thing' happens to all the data.
 
  | jcranmer wrote:
  | "FPGAs are the next big frontier for software development, and
  | have been since the '90s; they just need the programming model
  | worked out." This is the traditional story told about FPGAs, but
  | GPGPU programming suddenly overtaking FPGA development about
  | 2010 despite their awkward programming models makes that story
  | rather suspect. The thing is, a lot of the benefits of FPGAs
  | are really best-case scenarios, and when you move to more
  | typical scenarios, their competitiveness as an architecture
  | dwindles dramatically.
  | 
  | Pipelining on an FPGA requires being able to find, and fill,
  | spatial duplication of the operations being done. If you've got
  | conditional operations in a pipeline, now your pipeline isn't
  | so full anymore, and this hurts performance on an FPGA far more
  | than on a CPU (which spends a lot of power trying to keep its
  | pipelines full). But needing to keep the pipelines spatially
  | connected also means you have to be able to find a physical
  | connection between the two stages of a pipeline, and the
  | physical length of that connection also imposes limitations on
  | the frequency you can run the FPGA at.
  | 
  | If you care about FLOPS (or throughput in general), the problem
  | with FPGAs is that they are running at a clock speed about a
  | tenth of a CPU. This requires a 10x improvement in performance
  | just to stand still; given that software development for FPGAs
  | requires essentially a completely different mindset than for
  | CPUs or even GPUs, it's not common to have use cases that work
  | well on FPGAs.
  | 
  | (I should say that a lot of my information about programming
  | FPGAs comes from ex-FPGA developers, and the "ex-" part will
  | certainly have its own form of bias in these opinions).
 
    | davemp wrote:
    | Yeah I don't really see FPGAs ever making their way down to
    | consumers the way GPUs and CPUs have (end users actually
    | programming them).
    | 
    | For (semi) fixed pipeline operations FPGAs will basically
    | always be worse than some slightly more specialized ASIC like
    | a GPU/AI engine.
    | 
    | One area FPGAs can be exceptionally good at is real-time
    | operations. You have much better control over timing in
    | general on FPGAs vs MCU/CPUs, but I don't think that's
    | inherent (you could probably alter the mcu architecture a bit
    | and close the gap).
    | 
    | I could be wrong but I also think you get better power draw
    | for things like mid to low volume glue chips in embedded
    | systems because you're not powering big SRAM banks and DMAs
    | just to pipe data between a couple hardware interfaces. This
    | is only because of market forces though obviously, because if
    | mid to low volume ASICs become viable in terms of dev time
    | they'll be much better.
 
  | pjc50 wrote:
  | > One might ask, why are cpus not being replaced by FPGAs?
  | 
  | Most of the time you want data-dependent execution. FPGA
  | systems excel at "fixed pipeline" systems, where you have e.g.
  | an audio filter chain... but even that is usually done in
  | efficient DSP CPUs.
  | 
  | > Another interesting question, can software (recipes for cpus)
  | be transpiled to be efficiently run on FPGAs?
  | 
  | A _subset_ can. Things like recursion are right out. Various
  | companies have tools to do this, but you usually end up having
  | to rework either the source you're feeding them, or the HDL
  | output.
 
  | burnished wrote:
  | They both use the same kind of components; the FPGA does not
  | have a speed advantage, you are simply comparing the speed of a
  | very simple circuit element to the speed of a very complicated
  | pipeline.
  | 
  | You would use an FPGA to simulate a special purpose circuit,
  | which would be faster than a CPU for its specific purpose. We
  | have CPUs because having a general purpose processing chip is
  | incredibly handy when you want to be able to do more than one
  | thing.
  | 
  | EDIT: I forgot to mention that the device outputs in one clock
  | cycle by definition: if your clock is too fast then your
  | components' output signals don't have time to stabilize and you
  | will get read errors, so you ensure your clock is slow enough
  | for everything to stabilize.
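The stabilization point above is the standard static-timing budget: the clock period must cover the register clock-to-Q delay, the worst-case combinational path, and the setup time of the capturing register. A back-of-the-envelope sketch with made-up delay numbers:

```python
# Back-of-the-envelope static timing: the clock period must be at least
# t_clk_to_q + (worst combinational path) + t_setup, or registers will
# capture unstable values. All delay numbers are illustrative.

t_clk_to_q = 0.5e-9                          # register clock-to-output delay (s)
t_setup = 0.4e-9                             # setup time of the capturing register (s)
critical_path = [0.9e-9, 1.2e-9, 0.7e-9]     # gate + routing delays on the worst path

t_min_period = t_clk_to_q + sum(critical_path) + t_setup
f_max = 1.0 / t_min_period
print(f"min period: {t_min_period * 1e9:.1f} ns, f_max: {f_max / 1e6:.0f} MHz")
```

This is also why FPGA routing hurts: the programmable interconnect adds delay to `critical_path`, which directly lowers the maximum clock the design can close timing at.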
 
  | JackSlateur wrote:
  | For the same reasons we do not replace CPUs with GPUs: not the
  | right tool
  | 
  | Check out the instruction set of modern CPUs
 
  | convolvatron wrote:
  | one big problem is memory. basic cpus have a lot of facilities
  | for high-speed synchronous interface with DRAM, and truly vast
  | amount of resource for cache.
  | 
  | partially as a result, a good model for compiling code to fpgas
  | uses a dataflow paradigm, since we don't need to serialize all
  | operations through a memory fetch, cache, or even register
  | file.
  | 
  | if we hadn't decided to move all our computing to the cloud, I
  | suspect fpga accelerator boards for applications which map well
  | to that model would have some traction in specialized areas.
  | signal processing is definitely one such.
 
  | quadrature wrote:
  | > One might ask, why are cpus not being replaced by FPGAs?
  | 
  | They do sometimes, for very specific applications! The
  | problem is that an FPGA is programmed for one specific task and
  | would have to be taken offline and reprogrammed if you wanted
  | to do something else with it. It's not general purpose like a
  | CPU where you can load up any program and have it run.
  | 
  | Programming an FPGA is also comparatively much harder to reason
  | about than a CPU because of the parallelism and timing you
  | described.
 
    | MSFT_Edging wrote:
    | Some of the more modern Xilinx stuff has features where you
    | don't need to take down the whole FPGA to reload a bitstream
    | onto part of the chip. It's really neat; you can do live
    | reprogramming of one component and leave the others alone or
    | have an A/B setup where one updates while the other is
    | unchanged.
 
      | JohnFen wrote:
      | Yes, I'm working on a Xilinx ARM processor with an FPGA.
      | The FPGA and the CPU are independent units in the chip that
      | can each operate with or without the other. We can indeed
      | reprogram the FPGA without taking the system down.
 
        | davemp wrote:
        | It goes even further. You can partially reconfigure the
        | FPGA fabric itself:
        | https://support.xilinx.com/s/article/34924?language=en_US
 
      | quadrature wrote:
      | That is really cool, hadn't heard of that before.
 
  | barelyauser wrote:
  | What is simpler: making logical circuit "A" or making a circuit
  | that emulates logical circuit "A" and its relatives?
 
| markx2 wrote:
| If anyone is unaware, you can buy the very impressive Pocket.
| https://www.analogue.co/pocket
| 
| The current list of what it can do with FPGA is listed here -
| https://openfpga-cores-inventory.github.io/analogue-pocket/ and
| the inevitable sub-reddit is a good resource.
| https://old.reddit.com/r/AnaloguePocket/
 
  | gchadwick wrote:
  | There's also the MiSTer project:
  | https://github.com/MiSTer-devel/Wiki_MiSTer/wiki. Not hand-held
  | (yet...) and hardware is less slick, but a bunch more systems
  | and also fully open source.
 
    | phendrenad2 wrote:
    | MiSTer makes me kind of sad, the DE10-nano board it's based
    | on is 7 years old at this point, and the actual FPGA chip on
    | the board is probably over twice as old as that. And this is
    | still the peak of hobby FPGA chips. I wonder why Moore's Law
    | is hitting the FPGA industry particularly hard all of a
    | sudden.
 
      | willis936 wrote:
      | There are better FPGA options, they're just more expensive.
      | The DE-10 Nano was strategically chosen as "powerful enough
      | to meet most wants while still being within a reasonable
      | budget".
      | 
      | No one's going to plunk down $10k for a 19 EV Zynq
      | UltraScale+ with 1.1M LEs, but they will spend $200 on a
      | Cyclone V with 210k LEs.
 
  | MrHeather wrote:
  | The article says FPGAs are too power hungry for handheld
  | devices. Did Analogue do anything special to solve this problem
  | on the Pocket?
 
    | agg23 wrote:
    | That's honestly not true at all; it all just depends on your
    | platform. On the Pocket, the FPGA _is_ the processor (there
    | are actually two FPGAs, one for the actual emulation core,
    | and one for scaling video, and there's technically a PIC
    | microcontroller for uploading bitstreams and managing UI).
    | The FPGAs still don't draw much power compared to the display
    | itself. With the in-built current sensor on the dev kits, the
    | highest we've measured drawn by the main FPGA is ~300mAh. Now
    | this sensor isn't going to be the best measurement, but it's
    | something to go off of.
 
      | eulgro wrote:
      | > ~300 mAh
      | 
      | mA? You're not very convincing here.
 
      | WhiteDawn wrote:
      | Personally I think this is the biggest selling feature of
      | FPGA based emulation.
      | 
      | The reality is both Software and FPGA emulation can be done
      | very well and with very low latency, however to achieve
      | this in software you generally require high end power
      | hungry hardware.
      | 
      | A Steam Deck can run a highly accurate Sega Genesis
      | emulator with read-ahead rollback, screen scaling, shaders
      | and all the fixings no problem, but in theory the pocket
      | can provide the exact same experience with an order of
      | magnitude less power.
      | 
      | It's not quite apples to oranges of course, but the
      | comfortable battery life does make the pocket much more
      | practical.
 
        | agg23 wrote:
        | Being nitpicky about latency is where FPGAs truly
        | shine. You lose a good bit of it by connecting to HDMI (I
        | think the Pocket docked is 1/4 a frame, and MiSTer has a
        | similar mode) (EDIT: MiSTer can do 4 scanlines, but it's
        | not compatible with some displays), but when we're
        | talking about analog display methods or inputs, you can
        | achieve accurate timings with much less effort than on a
        | modern day computer.
        | 
        | For a full computer like the Steam Deck, you have to deal
        | with preemption, display buffers, and more, which _will_
        | add latency. Now if you went bare metal, you could
        | definitely drive a display with super low latency,
        | hardware accurate emulation, but obviously that's not
        | what most people are doing.
 
  | agg23 wrote:
  | Not to draw attention to myself or anything, but if you're
  | interested in learning to make cores for the Analogue Pocket or
  | MiSTer (or similar) platforms, I highly recommend taking a look
  | at the resources and wiki I'm slowly building -
  | https://github.com/agg23/analogue-pocket-utils/
  | 
  | I started ~7 months ago with approximately no FPGA or hardware
  | experience, have now ported ~6 cores from MiSTer to Pocket, and
  | just released my first core of my own, the original Tamagotchi
  | - https://github.com/agg23/fpga-tamagotchi/
  | 
  | If you want to join in, I and several other devs are very
  | willing to help talk you through it. We primarily are on the
  | FPGAming Discord server - https://discord.gg/Gmcmdhzs - which
  | is probably the best place to get a hold of me as well.
 
  | jonny_eh wrote:
  | I also recommend the official dock. It basically turns it into
  | an easy to use Mister.
 
  | sph wrote:
  | My mind is blown but I'm also wondering if this isn't some kind
  | of incredible over-engineering? Surely CPUs are fast enough to
  | emulate these kinds of devices in software. If they aren't, they
  | must be an order of magnitude simpler in complexity.
  | 
  | I wouldn't ordinarily care about emulators, but actual hardware
  | emulators is the craziest thing I've heard in a while. All that
  | for a small handheld console?
  | 
  | If only I was not so broke...
 
    | lprib wrote:
    | Sure it would probably be cheaper to chuck a cortex-A* or
    | similar mid-range MCU in there. One advantage of FPGAs is
    | that they can achieve "perfect" emulation of a Z80 (or other)
    | since they run at the logic gate level. No software task
    | latency, no extra sound buffering, etc. It can re-create the
    | original clock-per-clock.
 
      | arein3 wrote:
      | It's impressive as well
 
    | agg23 wrote:
    | Software is orders of magnitude simpler in complexity, yes.
    | The difference between a software emulator and a logic-level
    | emulator is immense.
    | 
    | But take the example of the difficulties with a software NES
    | emulator:
    | 
    | In hardware, there is one clock that is fed into the 3 main
    | disparate systems: the CPU, APU (audio), and PPU (picture).
    | They all use different clock dividers, but they're still fed
    | off of the same source clock. Each of these chips operate in
    | parallel to produce the output expected, and there's some
    | bidirectional communication going on there as well.
    | 
    | In a software emulator, the only parallel you get is on
    | multiple cores, but you can approximate it with threading
    | (i.e. preemption). For simplicity, you stick with a single
    | thread. You run 3 steps of the PPU at once, then one step of
    | the CPU and APU. You've basically just sped through the first
    | two steps, because who will notice those two cycles? They
    | took no "real" time, they were performed as fast as the
    | software could perform them. Probably doesn't matter, as no
    | one could tell that for 10ns, this happened.
    | 
    | You need to add input. You use USB. That has a maximum polling
    | rate of 1000 Hz (a 1 ms interval), plus your emulator processing
    | time (is it going to have to go in the "next frame" packet?),
    | but controls on systems like the NES were practically
    | instantly available the moment the CPU read.
    | 
    | Now you need to produce output. You want to hook up your
    | video, but wait, you need to feed it into a framebuffer.
    | That's at least one frame of latency unless you're able to
    | precompute everything for the next frame. Your input is
    | delayed a frame, because it has to be fed into the next
    | batch, the previous batch (for this frame) is already done.
    | You use the basis of 60fps (which is actually slightly wrong)
    | to time your ticking of your emulator.
    | 
    | Now you need to hook up audio. Audio must go into a buffer or
    | it will under/overflow. This adds latency, and you need to
    | stay on top of how close you are to falling outside of your
    | bounds. But you were using FPS for pacing, so now how do you
    | reconcile that?
    | 
    | ----
    | 
    | Cycle accurate and low latency software solutions are
    | certainly not easy, and true low latency is impossible on
    | CPUs running an actual OS. Embedded-style systems with RTOSes
    | might be able to get pretty close, but it's still not going
    | to be the same as being able to guarantee the exact same (or
    | as near as we can tell) timing for every cycle.
    | 
    | I want to be clear that none of these hardware
    | implementations are actually that accurate, but they could
    | be, and people are working hard to improve them constantly.
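The shared-master-clock scheme described above can be sketched like this. The divider values are the NTSC NES ratios (one master clock, CPU at master/12, PPU at master/4, so three PPU ticks per CPU tick); the step bodies are placeholders, not an emulator:

```python
# Sketch of lockstep emulation driven by one master clock, as in the
# NES description above: every chip is a divider of the same source, so
# their relative timing can never drift the way free-running threads can.
# The step_* bodies are placeholders for real per-cycle work.

CPU_DIVIDER, PPU_DIVIDER = 12, 4   # NTSC NES ratios: 3 PPU ticks per CPU tick

class Console:
    def __init__(self):
        self.cpu_steps = 0
        self.ppu_steps = 0

    def step_cpu(self):
        self.cpu_steps += 1        # placeholder for one CPU cycle of work

    def step_ppu(self):
        self.ppu_steps += 1        # placeholder for one PPU dot of work

    def run(self, master_cycles):
        # Both chips step off the same master counter, in order, every tick.
        for cycle in range(master_cycles):
            if cycle % PPU_DIVIDER == 0:
                self.step_ppu()
            if cycle % CPU_DIVIDER == 0:
                self.step_cpu()

c = Console()
c.run(21_477_272)                  # roughly one second of NTSC master clock
print(c.cpu_steps, c.ppu_steps)
```

On an FPGA this relationship is free (it is literally the same clock net); in software you have to rebuild it with a scheme like this, and every shortcut ("run the PPU three steps at once") is a small timing lie that some game eventually notices.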
 
    | rtkwe wrote:
    | The benefit of FPGAs is you can get nearly gate perfect
    | emulation of an old games system. We've had emulators for
    | years that get most things right but some games and minor
    | things in old games require specific software patches to
    | ensure the odd way they used the chips available produces the
    | same output. There's a great old article from 2011 about the
    | power required at the time to get a nearly perfect emulation
    | of a NES. [0] The goal with the Pocket and all of Analogue's
    | consoles isn't to be just another emulation machine but to
    | run as close as possible to the original at a hardware level.
    | That's their whole niche, hardware level 'emulation' of old
    | consoles.
    | 
    | [0] https://arstechnica.com/gaming/2011/08/accuracy-takes-
    | power-...
 
    | Waterluvian wrote:
    | Emulating "accurately" is so difficult that not even
    | Nintendo's Game Boy emulator on the Switch does it properly.
    | I've been replaying old games and comparing some questionable
    | moments with my original Game Boy, and the timings are not
    | quite right in some cases.
    | 
    | For example in Link's Awakening, there's a wiggle screen
    | effect done by writing to OAM during HBlank. On the Switch it
    | lags very differently than my GB (try it by getting into the
    | bed where you find the ocarina). Or with Metroid 2, the sound
    | when you kill an Omega Metroid is different too. It pitch
    | shifts along with the "win" jingle.
    | 
    | These have almost zero impact on playability. But for purists
    | and emudevs it's a popular pursuit.
 
| photochemsyn wrote:
| Here's a nice series that picks up where this one leaves off
| (shows how flip-flop/LUT units are organized into cells inside a
| PLB, programmable logic block). It also is the first step in a
| tutorial on using Verilog, building a hardware finite state
| machine, and eventually a RISC-V processor on a FPGA:
| 
| https://www.digikey.com/en/maker/projects/introduction-to-fp...
 
| user070223 wrote:
| From my understanding
| 
| An FPGA doesn't have an instruction pipeline, as the command is
| encoded in the gates themselves. It means that at runtime the
| FPGA is not Turing complete[0], as opposed to the CPU[1].
| 
| There is a phrase "data is code and code is data" in security
| contexts. If FPGAs ever replace CPUs as the main computation
| hardware (you don't need Turing completeness when you keep
| running the same apps/microservices), the new saying would be
| something like "code is execution and execution is code", as you
| imprint the code in the gates. It would get rid of a whole
| class/subclass of memory safety vulnerabilities.
| 
| This paradigm change is like what WebAssembly did to the web. The
| slogan should be "make the bitstream go mainstream". Someone made
| a demo running wasm on an FPGA[2], not sure if using a CPU or
| directly.
| 
| Of course you move complexity into compilation and increase
| loading time, all for an order of magnitude faster execution.
| 
| Companies have developed high-level synthesis compilers, but it's
| difficult and challenging, as you need to synchronize parallel
| execution pipelines, which you don't have to do on a CPU since it
| has a steady clock rate for each step in the pipeline.
| 
| A company named LegUp Computing (acquired by Microchip) compiled
| memcached/redis applications to FPGA and improved performance &
| power efficiency by an order of magnitude (10x).
| 
| There is a lot of proprietary intellectual property in hardware
| design, as opposed to software, so tools and knowledge are scarce.
| 
| If anyone works / want to work on this problem hit me up in the
| comments
| 
| [0] Unless you implement a cpu on top of the fpga :)
| 
| [1] Assuming infinite memory, which is false, but good enough
| 
| [2] https://github.com/denisvasilik/wasm-fpga
 
  | proto_lambda wrote:
  | > An FPGA doesn't have an instruction pipeline, as the command
  | is encoded in the gates themselves. It means that at runtime
  | the FPGA is not Turing complete[0], as opposed to the CPU[1].
  | 
  | That obviously depends entirely on the circuit, many
  | sufficiently advanced circuits probably end up being
  | accidentally Turing complete.
 
    | JohnFen wrote:
    | You can implement Turing-complete CPUs in FPGA fabric.
 
      | proto_lambda wrote:
      | That's exactly what OP's footnotes say, yes.
 
| jschveibinz wrote:
| We used them for real time array signal processing and beam-
| forming. They worked great.
 
| y0ungarmanii wrote:
| I saw various comments about how FPGAs are not ready for consumer
| hardware; Apple is using them in the AirPods Max already (probably
| for filtering audio).
| 
| Check the link below
| https://www.ifixit.com/Teardown/AirPods+Max+Teardown/139369
| 
| They really excel for high throughput & low latency - which noise
| canceling sounds like a good example of! In addition to this,
| they are already being used in communication systems & data
| centers to speed up latency sensitive computations. Edge AI seems
| like a big market that they will be used for soon, probably more
| likely b/c they can be flashed unlike ASICs and new NN
| architectures drop every couple of years.
 
| burnished wrote:
| Neat. If the author is around, might I suggest pushing some of
| the 'why use an FPGA' to the front? I think it would benefit from
| a more concrete example motivating the use of an FPGA - like a
| picture of some simple circuit using a seven segment display on a
| breadboard next to a picture of an FPGA implementing the same
| circuit in order to make it more clear that it is a substitute
| for putting experiments together by hand. I think it will help
| newcomers better contextualize what is happening and why.
| 
| I think in the same vein your wrap up of why you might want to do
| something in hardware vs software is great and well placed.
| 
| Hmmm, I guess now is as good a time as any to bumblefuck around
| with small electronics projects for fun. Thanks for the reminder!
 
  | beardyw wrote:
  | > Neat. If the author is around, might I suggest pushing some
  | of the 'why use an FPGA' to the front?
  | 
  | I think the problem is identifying cases where you really need
  | an FPGA. Most of the time you don't.
 
    | burnished wrote:
    | I suggest it purely for educational purposes. The first
    | struggle isn't identifying the best use case - it's
    | understanding wtf is going on. Putting it in terms of
    | something more familiar is helpful for that.
    | 
    | Your thing would make for a wonderful followup topic though.
 
    | cycomanic wrote:
    | What do you mean by "you". Maybe "you" as in a general
    | consumer don't need an FPGA, but I guess one could argue a
    | general consumer doesn't need a general purpose computer
    | either.
    | 
    | There are certainly many use cases where you absolutely do
    | need an FPGA, i.e. anything where you need to process large
    | amounts of IO in realtime. For example, the guys from SimulaVR
    | talk about how they use an FPGA for display correction here:
    | https://simulavr.com/blog/testing-ar-mode-image-processing/
    | 
    | Many modern devices would not function without FPGAs
 
      | JohnFen wrote:
      | > anything where you need to process large amounts of IO in
      | realtime.
      | 
      | I'm working on a FPGA-based system right now. We're using
      | an FPGA precisely because this is what we're doing -- about
      | a hundred I/O ports that have to be processed with as
      | little latency as possible.
 
      | beardyw wrote:
      | I think we can agree that this discussion does not involve
      | general consumers!
      | 
      | "Many cases" is not the opposite of most cases.
 
      | kanetw wrote:
      | (SimulaVR dev) It's not wrong to say that in most cases,
      | tasks are better solved without an FPGA. But when you need
      | one you need one (or an ASIC if you have the volume and
      | don't need reconfigurability)
 
    | asdfman123 wrote:
    | This is meant to be an introduction though, right? You can
    | simply write "some people do X, and others claim Y is better"
    | then move on.
    | 
    | I read several paragraphs of the article and I still don't
    | know why you'd use one, despite taking computer architecture
    | and analog electronics courses in undergrad.
    | 
    | I don't want to read about logic gates again and I don't want
    | to read about the nuances before I broadly understand what
    | the point is.
    | 
    | For anyone else still wondering, here's Wikipedia:
    | 
    | > FPGAs have a remarkable role in embedded system development
    | due to their capability to start system software development
    | simultaneously with hardware, enable system performance
    | simulations at a very early phase of the development, and
    | allow various system trials and design iterations before
    | finalizing the system architecture.
    | 
    | Basically, rapid prototyping I guess. That makes sense.
 
      | awjlogan wrote:
      | If that was an ask for a specific example, one of the most
      | common uses for FPGAs is DSPs. Say you have a simple FIR
      | filter of, say, 63 taps. To do this in a CPU requires you
      | to load two values and do a multiply/accumulate for each
      | tap in sequence. Very (!!) optimistically, that's about 192
      | instructions. With an FPGA, you can do all the
      | multiplications in parallel and then just sum the outputs -
      | probably done in 2 cycles and with pipelining your
      | throughput could be a sample every clock.
      | 
      | If the FPGA is too slow, too power inefficient etc you can
      | (if you have the money!) take the same core design and put
      | it in an ASIC. The FPGA provides an excellent prototyping
      | environment; in this example you can tune the filter
      | parameters before committing to a full ASIC.
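The instruction-count arithmetic above can be written out as a rough cost model (the per-tap instruction mix and the adder-tree depth are illustrative assumptions, not measurements of any real core):

```python
# Rough cost model for the 63-tap FIR example above. A CPU does the
# multiply-accumulates one after another; an FPGA can instantiate 63
# multipliers in parallel and reduce their products with an adder tree.
import math

TAPS = 63

# CPU: per output sample, assume each tap costs roughly a coefficient
# load, a delayed-sample load, and a multiply-accumulate.
cpu_instructions = TAPS * 3                      # ~189, near the "about 192" above

# FPGA: all 63 multiplies happen in the same cycle; the products are
# summed by a binary adder tree about log2(63) levels deep. With
# pipelining, throughput is still one sample per clock.
multiply_cycles = 1
adder_tree_levels = math.ceil(math.log2(TAPS))   # depth of the reduction tree

print(cpu_instructions, multiply_cycles + adder_tree_levels)
```

Even granting the FPGA a 10x slower clock, ~7 cycles of latency and one-sample-per-clock throughput versus ~190 sequential instructions is why DSP-style kernels are the canonical FPGA win.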
 
        | pjc50 wrote:
        | > multiply/accumulate for each tap in sequence. Very (!!)
        | optimistically, that's about 192 instructions
        | 
        | This is what all those vector instructions are for.
        | 
        | FPGA is kind of invaluable if you have lots of streams
        | coming in at high megabit rates, though, and need to
        | preprocess down to a rate the CPU and memory bus can
        | handle.
 
        | awjlogan wrote:
        | Yes, indeed :) Didn't want to muddy the waters with
        | vector instructions, and it's fair to say that the
        | dedicated DSP chip market has been squeezed by FPGAs on
        | one side and vectorised (even lightly, like the
        | Cortex-M4/M7 DSP extension) CPUs on the other.
 
        | asdfman123 wrote:
        | Explain it to me like I'm your mom.
 
| nfriedly wrote:
| I've read that AMD's 7040-series mobile CPUs will have an "FPGA-
| based AI engine developed by Xilinx" [1] - I'm wondering how
| _programmable_ that will be.
| 
| I know there's been some performance difficulties emulating the
| PlayStation 3's various floating point modes. It's the kind of
| thing that I think an on-chip FPGA could theoretically help with,
| although I don't know if it'd be worth the trouble in this
| specific case. (Or if AMD's implementation will be flexible
| enough to help.)
| 
| [1]: https://www.anandtech.com/show/18844/amd-unveils-ryzen-
| mobil...
 
| sph wrote:
| Sadly the article doesn't go into details about how the
| programmable RAM is wired to the actual logic gates, which seems
| to me the most interesting and challenging part of designing an
| FPGA.
| 
| In my mediocre understanding of digital circuits, RAM is usually
| addressable, so it has to be wired in a more direct manner to
| enable such a design.
| 
| I posted this article because someone mentioned some Ryzen chip
| having an FPGA in another post, and I am now left wondering:
| 
| 1. why don't we have more user-programmable FPGAs in our fancy
| desktop mainboards
| 
| 2. is there a SoC board, ARM or RISC-V based, with an FPGA on
| board? The slower the CPU, the more useful an FPGA would be to
| accelerate compute tasks
 
  | duskwuff wrote:
  | > Sadly the article doesn't go into details about how the
  | programmable RAM is wired to the actual logic gates
  | 
  | Not sure what you mean by that. Do you mean how a RAM is used
  | as a lookup table to implement logic gates, how routing works,
  | or how block RAM is integrated into the FPGA fabric?
  | 
  | > is there a SoC board, ARM or RISC-V based, with an FPGA on
  | board?
  | 
  | Better yet, there are a number of FPGAs available with an ARM
  | SoC on board. Xilinx Zynq, Intel Cyclone V SoC, various others.
 
  | pjc50 wrote:
  | > RAM is usually addressable, so it has to be wired in a more
  | direct manner to enable such a design
  | 
  | DRAM is necessarily a grid.
  | 
  | SRAM, in e.g. the standard 6-transistor cell form, you can kind
  | of dump individual bits anywhere you need one.
  | 
  | > why don't we have more user-programmable FPGAs in our fancy
  | desktop mainboards
  | 
  | They tend to be horrifyingly expensive and there are few use
  | cases you can't outperform with a GPU or even just vector
  | instructions. Most of the interesting use cases for FPGAs are
  | when you have direct access to the pins and can wire them up to
  | high-speed signalling, which really isn't home user friendly.
  | 
  | Also all the tooling is proprietary.
  | 
  | > is there a SoC board, ARM or RISC-V based, with an FPGA on
  | board
  | 
  | Buy a medium sized FPGA and download a CPU of your choice.
  | 
  | (I have a downloadable-CPU-sized FPGA board on my desk for
  | testing not yet shipped ASIC designs. It costs about six
  | thousand dollars and has a 48-week lead time on Farnell)
 
    | sph wrote:
    | > Buy a medium sized FPGA and download a CPU of your choice.
    | 
    | Damn, _of course_ one would be able to download a CPU and
    | "emulate it" in hardware.
    | 
    | I never imagined that would be possible. Now I'm thinking
    | that if I had infinite free time, I would buy an FPGA and design
    | a modern Lisp CPU. A RISC-V based design with native Lisp
    | support. Who needs hardware when you can just emulate it in
    | an FPGA.
    | 
    | That's seriously cool technology.
 
  | MSFT_Edging wrote:
  | As for question 1, they're far more common in server grade
  | stuff where typically they are baked in. Consumer stuff just
  | doesn't need/use as much IO throughput and muxing as an FPGA
  | provides on, say, a large networking switch.
  | 
  | There are PCIe compatible FPGAs that you can plug into your
  | desktop like a graphics card to accelerate certain tasks. In
  | general though, our workstation hardware just isn't specialized
  | enough to require them, but can be extended to do so. If
  | something is a large enough business model, they'll just make
  | an ASIC.
 
  | aphedox wrote:
  | After Intel acquired Altera they released a series of x86 Xeon
  | chips with integrated FPGAs. Look up the Xeon 6138P.
 
  | wildzzz wrote:
  | Both Intel and Xilinx sell FPGAs with hard ARM cores inside so
  | you can run real Linux while being able to interface with
  | custom logic. Additionally, it's pretty common to create ARM,
  | RISC-V, or PowerPC soft cores in the FPGA when there are no hard
  | cores available. These mimic the real cores and will run
  | software while allowing for things like custom instructions
  | that can take advantage of the flexibility of FPGA fabric. The
  | Xilinx Zynq and Intel Cyclone V have options for hard ARM
  | cores. There are various designs of boards out there you can
  | buy that implement Arduino or Raspberry Pi shield
  | compatibility. The XUP PYNQ-Z2 supports both interfaces and
  | runs a Zynq-7000 with a real ARM core.
  | 
  | You can do other things with soft cores that are not possible
  | with an off the shelf CPU like triple modular redundancy. This is
  | when you run a lot of the logic in triplicate and vote on the
  | results to prevent a bit flip from messing up the software.
  | This is common for space-based CPUs that are running on FPGAs.
  | It's expensive to design a new chip in a very small run so it's
  | much cheaper to just put the core on an off the shelf FPGA and
  | use the rest of the FPGA fabric for custom logic functions.
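The triple modular redundancy scheme described above can be sketched in software. In an FPGA the three copies run as parallel logic and a voter circuit picks the majority; this Python model just shows the voting idea:

```python
# Sketch of triple modular redundancy (TMR): compute the same result three
# times and take the bitwise majority, so a single bit flip in one copy
# cannot corrupt the voted output.

def majority(a: int, b: int, c: int) -> int:
    """Bitwise majority vote of three redundant results."""
    return (a & b) | (b & c) | (a & c)

# Example: one copy suffers a single-event upset (bit 2 flipped),
# but the vote still recovers the correct value.
good = 0b1010
flipped = good ^ 0b0100
assert majority(good, flipped, good) == good
```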
 
  | gchadwick wrote:
  | > Sadly the article doesn't go into details about how the
  | programmable RAM is wired to the actual logic gates, which
  | seems to me the most interesting and challenging part of
  | designing an FPGA.
  | 
  | It does, that's the part under the 'Look-Up Tables' section.
  | The key is there aren't any actual logic gates, just lots of
  | little RAMs. You implement an arbitrary blob of logic by having
  | the inputs form the address then the RAM gives the result of
  | the logical function.
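As a rough illustration of the LUT idea described above, here is a tiny Python model of a 2-input lookup table (real FPGA LUTs typically have 4-6 inputs, but the mechanism is the same): the inputs are concatenated into an address, and the bits loaded into the table determine which logic function the same hardware implements.

```python
# A 2-input LUT modeled in Python: the truth table is a small "RAM" and
# the inputs form the address. Programming the FPGA amounts to filling
# in these tables.

def make_lut(truth_table):
    """Return a 2-input logic function backed by a 4-entry table."""
    def lut(a: int, b: int) -> int:
        address = (a << 1) | b   # concatenate the inputs into an address
        return truth_table[address]
    return lut

# The same structure implements AND or XOR depending only on its contents:
AND = make_lut([0, 0, 0, 1])
XOR = make_lut([0, 1, 1, 0])
assert AND(1, 1) == 1 and AND(1, 0) == 0
assert XOR(1, 0) == 1 and XOR(1, 1) == 0
```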
 
    | stephen_g wrote:
    | Well, they do have some logic gates - usually the cells have
    | at least one flip flop, as well as the LUT.
 
    | roadbuster wrote:
    | > You implement an arbitrary blob of logic by having the
    | > inputs form the address then the RAM gives the result of
    | > the logical function.
    | 
    | This is incorrect. Modern FPGAs are composed of small,
    | configurable blocks which contain all sorts of logic. The
    | idea is that the configurable blocks can be (internally)
    | wired-up to implement your logic of choice. The wiring
    | configuration is "loaded" at power-on and retained in
    | memories within each, configurable block.
 
      | gchadwick wrote:
      | Well indeed modern FPGA fabric along with the various fixed
      | function blocks can be very complex, but this is a
      | beginner's 'How Does an FPGA Work?' for which a bunch of
      | LUTs connected by programmable interconnect is a useful
      | approximation.
 
  | PragmaticPulp wrote:
  | > 1. why don't we have more user-programmable FPGAs in our
  | fancy desktop mainboards
  | 
  | It has been tried, but GPUs are fast and efficient enough
  | that it's rarely worth it.
  | 
  | It's very easy to attach an FPGA to the PCIe bus as an add-in
  | card exactly like your GPU. In fact, many FPGA dev boards come
  | in exactly this format. They're available, they're just not in
  | demand.
  | 
  | > 2. is there a SoC board, ARM or RISC-V based, with an FPGA on
  | board? The slower the CPU, the more useful an FPGA would be to
  | accelerate compute tasks
  | 
  | Plenty of FPGA parts include ARM cores. It's a fairly standard
  | chip configuration.
  | 
  | You can also connect an FPGA and an SoC with PCIe or other
  | interconnects. It's really not an obstacle.
  | 
  | FPGAs just aren't very efficient from a cost or dev time
  | perspective for most applications. They're indispensable when
  | you need them, though.
 
  | rjsw wrote:
  | There are plenty of boards that have one of the combined ARM &
  | FPGA chips, Zynq (Xilinx/AMD) or Cyclone (Altera/Intel).
 
| dddiaz1 wrote:
| Another really cool use case for FPGAs is for ultra fast analysis
| of genomic data. This guide walks you through setting up an F1
| instance (AWS FPGA) to do that: https://aws-
| quickstart.github.io/quickstart-illumina-dragen/
 
| mpd wrote:
| I really enjoyed the recent Hackerbox[0] featuring an FPGA. I'd
| never worked with one prior to that.
| 
| https://hackerboxes.com/collections/past-hackerboxes/product...
 
| jokoon wrote:
| So can a large FPGA be somehow used to brute force encryption?
| 
| I don't really understand electronics to see if a GPU could be
| faster than an FPGA, but my guess is yes?
| 
| It seems that anything that can be programmed is inherently
| slower than an FPGA equivalent doing the same task.
| 
| Does a large enough key size always defeat an FPGA?
| 
| I would guess that it becomes power and cost prohibitive for a
| private company to deliver such possibility, but of course, a
| large government entity like the NSA might have enough resources
| to pay for enough FPGAs to decrypt most things.
 
  | braho wrote:
  | Even though the FPGA fabric might encode the solution more
  | effectively, there are other important differentiators: clock
  | speed and memory bandwidth. GPUs have higher clock speeds and
  | typically better memory bandwidth (related of course).
  | 
  | With the higher clock speed, GPUs can well outperform FPGAs for
  | many problems.
 
___________________________________________________________________
(page generated 2023-05-03 23:00 UTC)