[HN Gopher] Ask HN: How does a CPU communicate with a GPU?
___________________________________________________________________

Ask HN: How does a CPU communicate with a GPU?

I've been learning about computer architecture [1] and I've become comfortable with my understanding of how a processor communicates with main memory - be it directly, with the presence of caches, or even with virtual memory - and with I/O peripherals. But something that seems weirdly absent from the courses I took and from what I have found online is how the CPU communicates with other processing units, such as GPUs - and not only that, but also an in-depth description of how different systems are interconnected with buses (by in-depth I mean an RTL example/description).

I understand that as you add more hardware to a machine, complexity increases and software must intervene - so a general answer won't exist and the answer will depend on the implementation being discussed. That's fine by me. What I'm looking for is a description of how a CPU tells a GPU to start executing a program. Through what means do they communicate - a bus? What does such a communication instance look like? I'd love to get pointers to resources such as books and lectures that are more hands-on/implementation-aware.

[1] Just so that my background knowledge is clear: I've concluded NAND2TETRIS, watched and concluded Berkeley's 2020 CS61C, and have read a good chunk of H&P (both Computer Architecture: A Quantitative Approach and Computer Organization and Design: RISC-V edition), and now I am moving on to Onur Mutlu's lectures on advanced computer architecture.

Author : pedrolins
Score  : 58 points
Date   : 2022-03-30 20:17 UTC (2 hours ago)

| simne wrote:
| A lot of things happen there.
|
| But most importantly, PCIe is a serial bus with a virtualized interface, so there is no single shared physical wire being driven; what happens is more similar to an Ethernet network. On each device there exist a few endpoints, each with its own controller, its own address, a few registers to store state and transitions, and one or more memory buffers.
|
| Video cards usually support several behaviors. In the simplest modes, they behave just like RAM mapped into a large chunk of the system RAM space, plus video registers to control video output, to control the address mapping of video RAM, and to switch modes.
|
| In more complex modes, video cards generate interrupts (just a special type of message on PCIe).
|
| In 3D modes, which are the most complex, the video controller takes data from its own memory (which is mapped into the system address space), where a tree of graphics primitives is stored. Some of it is drawn directly from video RAM, but for the rest the bus-master option of PCIe is used, in which the video controller reads additional data (textures) from predefined chunks of system RAM.
|
| As for GPU operation: usually the CPU copies data to video RAM directly, then asks the video controller to run a program in video RAM, and when it completes, the GPU issues an interrupt and the CPU copies the result back from video RAM.
|
| Recent additions give the GPU the ability to read data from system disks, using the bus mastering mentioned before, but those additions are not yet widely implemented.
| simne wrote:
| For a beginner, I think it's best to start by reading about the Atari consoles, the Atari 65/130, and the NES, as their ideas were later implemented in all commodity video cards, just slightly extended.
|
| BTW, all modern video cards use bank switching.
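| The copy-data / ring-doorbell / wait flow described above, sketched as driver-level C. Everything here is made up for illustration - the register offsets, the doorbell protocol, and the status bit are not from any real GPU - and polling stands in for the completion interrupt. `regs` and `vram` are assumed to already be CPU mappings of the GPU's register BAR and VRAM aperture:
|
|   #include <stddef.h>
|   #include <stdint.h>
|   #include <string.h>
|
|   #define REG_CMD_ADDR  0x100   /* hypothetical: where the program starts in VRAM */
|   #define REG_DOORBELL  0x104   /* hypothetical: write 1 means "start executing"  */
|   #define REG_STATUS    0x108   /* hypothetical: bit 0 set by the GPU when done   */
|
|   void run_gpu_program(volatile uint32_t *regs, uint8_t *vram,
|                        const uint8_t *program, size_t len)
|   {
|       memcpy(vram, program, len);          /* 1. copy code/data into VRAM        */
|       regs[REG_CMD_ADDR / 4] = 0;          /* 2. tell the GPU where it starts    */
|       regs[REG_DOORBELL / 4] = 1;          /* 3. ring the doorbell               */
|       while (!(regs[REG_STATUS / 4] & 1))  /* 4. poll; a real driver would sleep */
|           ;                                /*    and be woken by the interrupt   */
|       /* 5. the results can now be copied back out of vram */
|   }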
| melenaboija wrote:
| It is old and I am not sure everything still applies, but I found this course useful for understanding how GPUs work:
|
| Intro to Parallel Programming:
|
| https://classroom.udacity.com/courses/cs344
|
| https://developer.nvidia.com/udacity-cs344-intro-parallel-pr...
| aliasaria wrote:
| There is some good information on how PCI-Express works here: https://blog.ovhcloud.com/how-pci-express-works-and-why-you-...
| dragontamer wrote:
| I'm no expert on PCIe, but it's been described to me as a network.
|
| PCIe has switches, addresses, and so forth. Very much like IP addresses, except PCIe operates on a significantly faster level.
|
| At its lowest level, PCIe x1 is a single "lane", a singular stream of zeros-and-ones (with various framing / error correction on top). PCIe x2, x4, x8, and x16 are simply 2x, 4x, 8x, or 16x lanes running in parallel and independently.
|
| -------
|
| PCIe is a very large and complex protocol, however. This "serial" communication gets abstracted into memory-mapped I/O. Instead of programming at the "packet" level, most PCIe operations are seen as just RAM.
|
| > even virtual memory
|
| So you understand virtual memory? PCIe abstractions go up to and include the virtual memory system. When your OS sets aside some virtual memory for PCIe devices and programs read/write to those memory addresses, the OS (and the PCIe bridge) translates those RAM reads/writes into PCIe messages.
|
| --------
|
| I'll now handwave a few details and note: GPUs do the same thing on their end. GPUs can also have a "virtual memory" that they read/write to and that translates into PCIe messages.
|
| This leads to a system called "Shared Virtual Memory", which has become very popular in a lot of GPGPU programming circles. When the CPU (or GPU) reads/writes to a memory address, the data is then automatically copied over to the other device as needed. Caching layers are added on top to improve efficiency (some SVM may live on the CPU side, so the GPU will fetch the data and store it in its own local memory / caches, but always rely upon the CPU as the "main owner" of the data; the reverse, GPU-side shared memory, also exists, where the CPU communicates with the GPU).
|
| To coordinate access to RAM properly, the entire set of atomic operations + memory barriers has been added to PCIe 3.0+. So you can perform a "compare-and-swap" on shared virtual memory, and read/write these virtual memory locations in a standardized way across all PCIe devices.
|
| PCIe 4.0 and PCIe 5.0 are adding more and more features, making PCIe feel more and more like a "shared memory system", akin to the cache-coherence strategies that multi-CPU / multi-socket systems use to share RAM with each other. In the long term, I expect future PCIe standards to push the interface even further into this "like a dual-CPU-socket" memory-sharing paradigm.
|
| This is great because you can have 2 CPUs + 4 GPUs in one system, and when GPU#2 writes to address 0xF1235122, the shared-virtual-memory system automatically translates that to its "physical" location (wherever it is), and the lower-level protocols pass the data to the correct location without any assistance from the programmer.
|
| This means that a GPU can do things like perform a linked-list traversal (or tree traversal), even if all of the nodes of the tree/list are spread across CPU#1, CPU#2, GPU#4, and GPU#1. The shared-virtual-memory paradigm handwaves all of this and lets the PCIe 3.0 / 4.0 / 5.0 protocols handle the details automatically.
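| A sketch of the flag/compare-and-swap handshake those PCIe atomics enable. Here two CPU threads and plain C11 atomics stand in for the CPU and GPU sides; with real SVM the consumer would be a GPU kernel touching the same virtual address, and the driver/hardware would keep the two views coherent:
|
|   #include <pthread.h>
|   #include <stdatomic.h>
|   #include <stdio.h>
|
|   /* One word both sides can see. With SVM this would sit at a virtual
|    * address that is valid on the CPU and on the GPU at the same time. */
|   static _Atomic int flag = 0;   /* 0 = empty, 1 = ready, 2 = claimed */
|   static int payload;
|
|   static void *consumer(void *arg)        /* stand-in for a GPU kernel */
|   {
|       int expected = 1;
|       while (!atomic_compare_exchange_weak(&flag, &expected, 2))
|           expected = 1;                   /* spin until we win the CAS */
|       printf("consumer saw %d\n", payload);
|       return NULL;
|   }
|
|   int main(void)
|   {
|       pthread_t t;
|       pthread_create(&t, NULL, consumer, NULL);
|       payload = 42;                       /* write the data...         */
|       atomic_store(&flag, 1);             /* ...then publish the flag  */
|       pthread_join(t, NULL);
|   }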
| simne wrote:
| I agree that PCIe is mostly a shared-memory system.
|
| But for video cards this sharing is unequal, because their RAM sizes exceed the 32-bit address space, and a lot of still-used mainboards have a 32-bit PCIe controller, so all PCIe addresses have to fit inside a 4GB address space. You can see this on Windows machines where the total memory shown is not everything that is installed, but minus approximately 0.5GB, of which 256MB is the video RAM access window.
|
| So in most cases the old rule remains in force: the video card shares all of its memory through a 256MB window using bank switching.
|
| As for the GPU reading main system memory: usually this is useless, because VRAM is orders of magnitude faster, even before you consider that other devices, like HDDs/SSDs, are also using the bus bandwidth.
|
| In most cases the only use of GPU access to main system memory is the traditional reading of textures (for the 3D accelerator) from system RAM - and for example, ALL 3D software that uses GPU rendering can only render out of video RAM; none of it uses system RAM for that.
| roschdal wrote:
| Through the electrical wires in the PCI express port.
| danielmarkbruce wrote:
| I could be misunderstanding the context of the question, but I think OP is imagining some sophisticated communication logic involved at the chip level. The CPU doesn't know much about the GPU other than that it's there and that data can be sent back and forth to it. It doesn't know what any of the data means.
|
| I think the logic OP imagines does exist, but it's actually in the compiler (e.g. the CUDA compiler), figuring out exactly what bytes to send that will start a program, etc.
| coolspot wrote:
| Not in the compiler but in the GPU driver. A graphics (or compute) program just calls the APIs (DirectX/Vulkan/CUDA) of a driver, which then knows how to do that at a low level by writing to particular regions of RAM mapped to GPU registers.
| danielmarkbruce wrote:
| Yes! This is correct. My bad, it's been too long. I guess either way the point is that it's done in software, not hardware.
| lxgr wrote:
| There are also odd/interesting architectures like one of the earlier Raspberry Pis, where the GPU was actually running its own operating system that would take care of things like shader compilation.
|
| In that case, what's actually being written to shared/mapped memory are very high-level instructions that are then compiled or interpreted on the GPU (which is really an entire computer, CPU and all) itself.
| alberth wrote:
| Nit pick...
|
| Technically it's not "through" the electrical wires, it's actually through the electric field created _around_ the electrical wires.
|
| Veritasium explains: https://youtu.be/bHIhgxav9LY
| tux3 wrote:
| Nitpicking the nitpick: the energy is what's in the fields, but the electrical wires aren't just for show; the electrons do need to be able to move in the wire for there to be a current, and the physical properties of the wire have a big impact on the signal.
|
| So things get very complicated and unintuitive, especially at high frequencies, but it's okay to say "through the wire"!
| a9h74j wrote:
| And, as you might be alluding to, at particularly high frequencies: in the skin (via the skin effect) of the wire!
|
| I'll confess I have never seen a plot of actual RMS current density vs. radius related to the skin effect.
| rayiner wrote:
| Typically the CPU and GPU communicate over the PCI Express bus. (It's not technically a bus but a point-to-point connection.) From the perspective of software running on the CPU, these days, that communication is typically in the form of memory-mapped IO. The GPU has registers and memory mapped into the CPU address space using PCIE. A write to a particular address generates a message on the PCIE bus that's received by the GPU and produces a write to a GPU register or GPU memory.
|
| The GPU also has access to system memory through the PCIE bus. Typically, the CPU will construct buffers in memory with data (textures, vertices), commands, and GPU code. It will then store the buffer address in a GPU register and ring some sort of "doorbell" by writing to another GPU register. The GPU (specifically, the GPU command processor) will then read the buffers from system memory and start executing the commands. Those commands can include, for example, loading GPU shader programs into shader memory and triggering the shader units to execute those programs.
| Keyframe wrote:
| If OP or anyone else wants to see this firsthand.. well shit, I feel old now, but.. try an exercise in assembly programming on the Commodore 64. Get the VICE emulator and dig into it for a few weeks. It's real easy to get into: the CPU (6502-based), video chip (VIC-II), sound chip (the famous SID), ROM chips.. they all live in the same address space (yeah, not mentioning pages), the CPU has three registers.. it's also real fun to get into, even to this day.
| vletal wrote:
| Nice exercise. Similarly, I learned the most about basic computer architecture by programming the 8050 in ASM as well as C.
|
| And I'm 32. Am I old yet? I'm not, right? Right?
| silisili wrote:
| Sorry pal!
|
| I remember playing Halo in my early 20's, and chatting with a guy from LA who was 34. Wow, he's so old, why was he still playing video games?
|
| Here I sit in my late 30's... still playing games when I have time, denying that I'm old, despite the noises I make getting up and random aches and pains.
| Keyframe wrote:
| 40s are the new thirties, my friend. Also, painkillers help.
| jeroenhd wrote:
| There's a nice guide by Ben Eater on Youtube about breadboard computers: https://www.youtube.com/playlist?list=PLowKtXNTBypFbtuVMUVXN...
|
| It doesn't sport any modern features like DMA, but it builds up from the core basics: a 6502 chip, a clock, and a blinking LED, all hooked up on a breadboard. He also built a basic VGA card and explains protocols like PS/2, USB, and SPI. It's a great introduction or refresher on the low-level hardware concepts behind computers. You can even buy kits to play along at home!
| zokier wrote:
| Is my understanding correct that, compared to those historical architectures, modern GPUs are a lot more asynchronous?
|
| What I mean is that these days you'd issue a data transfer or program execution on the GPU, it would complete at its own pace, and the CPU would meanwhile continue executing other code; in contrast, on those 8-bitters you'd poke a video register or whatev and expect that to have a more immediate effect, allowing those famous race-the-beam effects etc.?
| Keyframe wrote:
| There were interrupts telling you when certain things happened. If anything, it was asynchronous. The big thing is also that you had to tally the cost of what you were doing. There was a budget of how many cycles you got per line and per screen, and then you fit whatever you had to into that. When playing sound it was common to change the border colour while you fed the music into the SID, so you could tell, like a crude debug/ad hoc printf, how many cycles your music routine ate.
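| That trick is just a pair of memory-mapped writes. A sketch in C, compilable with a 6502 C compiler such as cc65; $D020 is the VIC-II border colour register, and play_music() is a placeholder for whatever per-frame routine you are timing:
|
|   #include <stdint.h>
|
|   #define BORDER (*(volatile uint8_t *)0xD020U)  /* VIC-II border colour */
|
|   extern void play_music(void);   /* hypothetical per-frame SID routine */
|
|   void play_and_show_cost(void)
|   {
|       BORDER = 2;      /* red while the music routine runs             */
|       play_music();
|       BORDER = 0;      /* back to black - the height of the red band
|                           on screen shows how much raster time it ate  */
|   }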
| divbzero wrote:
| Going one level deeper, how does the communication work on a physical level? I'm guessing the wires of the PCI Express bus passively propagate the voltage and the CPU and GPU do "something" with that voltage?
| throw82473751 wrote:
| Voltages, yes.. usually it's all binary digital signals, running serially/in parallel and following some communication protocol. Maybe you should have a look at something really simple/old like UART communication to get some idea of how this works, and then study how this is scaled up over PCIE to understand the chat between CPU/GPU?
|
| Or maybe not; one does not need all the details, often just the scaled-up concepts :)
|
| https://en.m.wikipedia.org/wiki/Universal_asynchronous_recei...
|
| Edit: Wait, is it really already QAM over PCIE? Yeah, then UART is a gross simplification, but maybe still a good one to start with, depending on knowledge level?
| _3u10 wrote:
| https://pcisig.com/sites/default/files/files/PCI_Express_Ele... It doesn't say QAM explicitly but it has all the QAM terminology, like 128 codes, inter-symbol interference, etc. I'm not an RF guy by any stretch but it sounds like QAM to me.
|
| This is an old spec. I think it's like the equivalent of QAM-512 for PCIe 6.
| rayiner wrote:
| PCI-E isn't QAM. It's NRZ over a differential link, with 64/66b encoding, and then scrambled to reduce long runs of 0s or 1s.
| wyldfire wrote:
| It might be easier to start with older or simpler/slower buses: ISA, SPI, I2C. In some ways ISA is very different - latching multiple parallel channels together instead of ganging independent serial lanes. But it makes sense to start off simple and consider the evolution. Modern PCIe layers several awesome technologies together, especially FEC. Originally they used 8b10b, but I see now they're using 242b256b.
| rayiner wrote:
| Before you get that deep, you need to step back for a bit. The CPU is itself several different processors and controllers. Look at a modern Intel CPU: https://www.anandtech.com/show/3922/intels-sandy-bridge-arch... The individual x86 cores are connected via a ring bus to a system agent. The ring bus is a kind of parallel bus. In general, a parallel bus works by having every device on the bus operate on a clock. At each clock tick (or after some number of clock ticks), data can be transferred by pulling address lines high or low to signify an address, and pulling data lines high or low to signify the data value to be written to that address.
|
| The system agent then receives the memory operation and looks at the system address map. If the target address is PCI-E memory, it generates a PCI-E transaction using its built-in PCI-E controller. The PCI-E bus is actually a multi-lane serial bus. Each lane is a pair of wires using differential signaling (https://en.wikipedia.org/wiki/Differential_signalling). Bits are sent on each lane according to a clock by manipulating the voltages on the differential pairs. The voltage swings don't correspond directly to 0s and 1s. Because of the data rates involved and the potential for interference, cross-talk, etc., an extremely complex mechanism is used to turn bits into voltage swings on the differential pairs: https://pcisig.com/sites/default/files/files/PCI_Express_Ele...
|
| From the perspective of software, however, it's just bits sent over a wire. The bits encode a PCI-E message packet: https://www.semisaga.com/2019/07/pcie-tlp-header-packet-form... The packet has headers, address information, and data information. But basically the packet can encode transactions such as a memory write or read, or a register write or read.
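| What travels in one of those packets, sketched as a C struct. This is schematic only - the real header packs these fields into 3 or 4 32-bit words at bit positions fixed by the PCIe spec - but it shows what information a memory-write request carries:
|
|   #include <stdint.h>
|
|   /* Schematic (not bit-exact) view of a PCIe Memory Write request TLP. */
|   struct pcie_mem_write_tlp {
|       uint8_t  fmt_type;       /* format/type: "memory write, 64-bit address" */
|       uint8_t  traffic_class;  /* QoS class                                   */
|       uint16_t length_dw;      /* payload length in 32-bit dwords             */
|       uint16_t requester_id;   /* bus/device/function of the sender           */
|       uint8_t  tag;            /* matches completions to requests (for reads) */
|       uint8_t  byte_enables;   /* which bytes of the first/last dword count   */
|       uint64_t address;        /* where to write                              */
|       uint32_t payload[];      /* the data itself                             */
|   };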
| tenebrisalietum wrote:
| Older CPUs - the CPU had a bunch of A pins (address) and a bunch of D pins (data).
|
| The A pins would be a binary representation of an address, and the D pins would be the binary representation of data.
|
| A couple of other pins would select the behavior (read or write) and allow handshaking.
|
| Those pins were connected to everything else that needed to talk with the CPU on a physical level, such as RAM, I/O devices, and connectors for expansion. Think 10BASE-T networking, where multiple nodes are physically modulating one common wire on an electrical level. Same concept, but you have many more wires (and they're way shorter).
|
| Arbitration logic was needed so things didn't step on each other. Sometimes things did anyway, and you couldn't talk to certain devices in certain ways or your system would lock up or misbehave.
|
| Were there "switches" to isolate and select among various banks of components? Sure, they are known as "gate arrays" - those could be ASICs or implemented with simple 74xxx ICs.
|
| Things like NuBus and PCI came about - the bus controller is directly connected and addressable to the CPU as a device, but everything else is connected to the bus controller, so the new-style bus isn't tied to the CPU and can operate at a different speed; CPU and bus speeds are now decoupled. (This was done with video controllers in the old 8-bit days as well - to get to video RAM you had to talk to the video chip, and couldn't talk to video RAM directly on some 8-bit systems.)
|
| PCIE is no longer a bus, it's more like switched Ethernet - there are packets and switching, and data goes over what's basically one wire - this ends up being faster and more reliable, using advanced modulation schemes, than keeping multiple wires in sync at high speeds. The controllers facing the CPU still implement the same interface, though.
| _3u10 wrote:
| It's signaled similarly to QAM. Far more complicated than GPIO-type stuff. Think FM radio / spread spectrum rather than bit-banging / old-school serial / parallel ports.
|
| Similar to old-school modems: if the line is noisy it can drop to lower "baud" rates. You can manually try to recover the higher rates once the noise is gone, but it's simpler to just reboot.
| tux3 wrote:
| Oh, that is _several_ levels deeper! PCIe is a big standard with several layers of abstraction, and it's far from passive.
|
| The different versions of PCIe use different encodings, so it's hard to sum it all up in a couple of sentences in terms of what the voltage does.
| monkeybutton wrote:
| IMO memory-mapped IO is the coolest thing since sliced bread. It's a great example in computing where many different kinds of hardware can all be brought together under a relatively simple abstraction.
| the__alchemist wrote:
| It was a glorious "click" when learning embedded programming. Even when writing Rust for typical desktop uses, it all feels... abstract. Computer program logic. Where does the magic happen? Where do you go from abstract logic to making things happen? The answer is in volatile memory reads and writes to memory-mapped IO. You write a word to a memory address, and a voltage changes. Etc.
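| A minimal embedded-style sketch of that idea. The base address and register offset here are made up; on a real microcontroller you would take them from the vendor's reference manual:
|
|   #include <stdint.h>
|
|   /* Hypothetical memory-mapped GPIO block at a made-up address. */
|   #define GPIO_BASE  0x40020000u
|   #define GPIO_ODR   (*(volatile uint32_t *)(GPIO_BASE + 0x14))
|
|   void led_on(void)
|   {
|       GPIO_ODR |= (1u << 5);   /* this store becomes a bus write; a pin
|                                   voltage changes and the LED lights up */
|   }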
| justsomehnguy wrote:
| TL;DR: bi-directional memory access, with some means to notify the other party that "something has changed".
|
| It's not that different for any other PCI/E device, be it a network card or a disk/HBA/RAID controller.
|
| If you want to understand how it came to this, look at the history of ISA, PCI/PCI-X, a short stint for AGP, and finally PCI-E.
|
| Other comments provide a good ELI15 for the topic.
|
| A minor note about "bus" - for PCIe it is mostly a historic term, because it's a serial, P2P connection, though the process of enumerating and querying the devices is still very akin to what you would do on some bus-based system. E.g. SAS is a serial "bus" compared to SCSI, but you still operate with it as some "logical" bus, because it is easier for humans to grok it this way.
| dyingkneepad wrote:
| On my system, the CPU sees the GPU as a PCI device. The "PCI config space" [0] is a standard thing, so the CPU can read it and figure out the device ID, vendor ID, revision, class, etc. From that, the OS looks at its PCI drivers and tries to find which one claims to drive that specific PCI device_id/vendor_id combination (or class, in case there's some kind of generic universal driver for a certain class).
|
| From there, the driver pretty much knows what to do. But primarily the driver will map the registers to memory addresses, so accessing offset 0xF0 from that map is equivalent to accessing register 0xF0. The definition of what each register does is something that the HW developers provide to the SW developers [1].
|
| Setting modes (screen resolution) and a lot of other stuff is done directly by reading and writing these registers. At some point they also have to talk about memory (and virtual addresses), and there's quite a complicated dance to map GPU virtual memory to CPU virtual memory. On discrete GPUs the data is actually "sent" to the memory somehow through the PCI bus (I suppose the GPU can read directly from the memory without going through the CPU?), but in the driver this is usually abstracted as "this is another memory map". On integrated systems both the CPU and GPU read directly from system memory, but they may not share all caches, so extra care is required here. In fact, caches may also mess up the communication on discrete graphics, so extra care is always required. What this paragraph describes is mostly done by the kernel driver in Linux.
|
| At some point the CPU will tell the GPU that a certain region of memory is the framebuffer to be displayed. And then the CPU will formulate binary programs written in the GPU's machine code, the CPU will submit those programs (batches), and the GPU will execute them. These programs are generally in the form of "I'm using textures from these addresses, this memory holds the fragment shader, this other one holds the geometry shader, the configuration of threading and execution units is described in this structure as you specified, SSBO index 0 is at this address, now go and run everything". After everything is done the CPU may even get an interrupt from the GPU saying things are done, so it can notify user space. This paragraph describes mostly the work done by the user-space driver (in Linux, this is Mesa), which implements the OpenGL/Vulkan/etc. abstractions.
|
| [0]: https://en.wikipedia.org/wiki/PCI_configuration_space
| [1]: https://01.org/linuxgraphics/documentation/hardware-specific...
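| On Linux you can poke at this yourself (as root) without writing a kernel driver: each PCI device's BARs show up as resourceN files in sysfs, and mmap()ing one gives you the same register window the driver uses. A rough sketch - the 0000:01:00.0 address and the 0xF0 offset are just examples, it assumes BAR0 is a memory BAR at least one page long, and touching random registers on real hardware can of course hang it:
|
|   #include <fcntl.h>
|   #include <stdint.h>
|   #include <stdio.h>
|   #include <sys/mman.h>
|   #include <unistd.h>
|
|   int main(void)
|   {
|       /* BAR0 of the device at PCI address 0000:01:00.0 (example) */
|       int fd = open("/sys/bus/pci/devices/0000:01:00.0/resource0", O_RDWR);
|       if (fd < 0) { perror("open"); return 1; }
|
|       volatile uint32_t *regs = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
|                                      MAP_SHARED, fd, 0);
|       if (regs == MAP_FAILED) { perror("mmap"); return 1; }
|
|       /* reading offset 0xF0 of the mapping is reading register 0xF0 */
|       printf("reg 0xF0 = 0x%08x\n", regs[0xF0 / 4]);
|
|       munmap((void *)regs, 4096);
|       close(fd);
|       return 0;
|   }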
| derekzhouzhen wrote:
| Others have mentioned MMIO. MMIO comes in several kinds:
|
| 1. The CPU accessing GPU hardware with uncacheable MMIO, such as low-level register access
|
| 2. The GPU accessing CPU memory with cacheable MMIO, or DMA, such as command and data streams
|
| 3. The CPU accessing GPU memory with cacheable MMIO, such as textures
|
| They all happen on the bus, with different latency and bandwidth.
| ar_te wrote:
| And if you are looking for some strange architecture forgotten by time :) https://www.copetti.org/writings/consoles/sega-saturn/
| brooksbp wrote:
| Woah there, my dude. Let's try to understand a simple model first.
|
| A CPU can access memory. When a CPU performs loads & stores it initiates transactions containing the address of the memory. Therefore, it is a bus master - it initiates transactions. A slave accepts transactions and services them. The interconnect routes those transactions to the appropriate hardware, e.g. the DDR controller, based on the system address map.
|
| Let's add a CPU, interconnect, and 2GB of DRAM memory:
|
|   +-------+
|   |  CPU  |
|   +---m---+
|       |
|   +---s--------------------+
|   |       Interconnect     |
|   +-------m----------------+
|           |
|      +----s-----------+
|      | DDR controller |
|      +----------------+
|
|   System Address Map:
|     0x8000_0000 - 0x0000_0000   DDR controller
|
| So, a memory access to 0x0004_0000 is going to DRAM memory storage.
|
| Let's add a GPU.
|
|   +-------+    +-------+
|   |  CPU  |    |  GPU  |
|   +---m---+    +---s---+
|       |            |
|   +---s------------m-------+
|   |       Interconnect     |
|   +-------m----------------+
|           |
|      +----s-----------+
|      | DDR controller |
|      +----------------+
|
|   System Address Map:
|     0x9000_0000 - 0x8000_0000   GPU
|     0x8000_0000 - 0x0000_0000   DDR controller
|
| Now the CPU can perform loads & stores from/to the GPU. The CPU can read/write registers in the GPU. But that's only one-way communication. Let's make the GPU a bus master as well:
|
|   +-------+    +-------+
|   |  CPU  |    |  GPU  |
|   +---m---+    +--s-m--+
|       |           | |
|   +---s-----------m-s------+
|   |       Interconnect     |
|   +-------m----------------+
|           |
|      +----s-----------+
|      | DDR controller |
|      +----------------+
|
|   System Address Map:
|     0x9000_0000 - 0x8000_0000   GPU
|     0x8000_0000 - 0x0000_0000   DDR controller
|
| Now the GPU can not only receive transactions, but it can also initiate transactions. Which also means it has access to DRAM memory too.
|
| But this is still only one-way communication (CPU->GPU). How can the GPU communicate to the CPU? Well, both have access to DRAM memory. The CPU can store information in DRAM memory (0x8000_0000 - 0x0000_0000) and then write to a register in the GPU (0x9000_0000 - 0x8000_0000) to inform the GPU that the information is ready. The GPU then reads that information from DRAM memory. In the other direction, the GPU can store information in DRAM memory and then send an interrupt to the CPU to inform the CPU that the information is ready. The CPU then reads that information from DRAM memory. An alternative to using interrupts is to have the CPU poll. The GPU stores information in DRAM memory and then sets some bit in DRAM memory. The CPU polls on this bit in DRAM memory, and when it changes, the CPU knows that it can read the information in DRAM memory that was previously written by the GPU.
|
| Hope this helps. It's very fun stuff!
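| The CPU side of that last exchange, written out as bare-metal-style C. The addresses mirror the made-up map above; under an OS you would instead go through mappings set up by the driver, and a real driver would sleep on an interrupt rather than spin:
|
|   #include <stdint.h>
|
|   /* Made-up layout mirroring the system address map above. */
|   #define GPU_BASE          0x80000000u            /* GPU registers (slave port) */
|   #define GPU_REG_DOORBELL  (GPU_BASE + 0x00u)     /* hypothetical doorbell      */
|   #define MSG_ADDR          0x00040000u            /* a buffer in DRAM           */
|   #define DONE_FLAG_ADDR    0x00040100u            /* a flag word in DRAM        */
|
|   #define REG32(a) (*(volatile uint32_t *)(uintptr_t)(a))
|
|   uint32_t cpu_to_gpu_and_back(uint32_t request)
|   {
|       REG32(MSG_ADDR) = request;       /* 1. put the message in DRAM            */
|       REG32(DONE_FLAG_ADDR) = 0;
|       REG32(GPU_REG_DOORBELL) = 1;     /* 2. poke a GPU register: "go look"     */
|
|       while (REG32(DONE_FLAG_ADDR) == 0)
|           ;                            /* 3. the GPU writes its reply to DRAM
|                                              and sets the flag; the CPU polls   */
|
|       return REG32(MSG_ADDR);          /* 4. read the reply back out of DRAM    */
|   }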
| pizza234 wrote:
| You'll find a very good introduction in the comparch book "Write Great Code, Volume 1", chapter 12 ("Input and Output"), which also explains the history of system buses (therefore, you'll find an explanation of how ISA works).
|
| Interestingly, there is a footnote explaining that "Computer Architecture: A Quantitative Approach provided a good chapter on I/O devices and buses; sadly, as it covered very old peripheral devices, the authors dropped the chapter rather than updating it in subsequent revisions."
| throwmeariver1 wrote:
| Everyone in tech should read the book "Understanding the Digital World" by Brian W. Kernighan.
| arduinomancer wrote:
| Is it very in-depth or more for layman readers?
| throwmeariver1 wrote:
| Most normal people would come away red-faced from reading it, and techies would nod along and sometimes say "uh... so that's how it really works". It's in between, but a good primer on the essentials.
| dyingkneepad wrote:
| Is this before or after they read Knuth?
| zoenolan wrote:
| Others are not wrong in saying memory-mapped IO. Taking a look at the Amiga Hardware Reference Manual [1] and a simple example [2], or a NES programming guide [3], would be a good way to see this in operation.
|
| A more modern CPU/GPU setup is likely to use a ring buffer. The buffer will be in CPU memory. That memory is also mapped into the GPU address space. The driver on the CPU will write commands into the buffer, which the GPU will execute. These will be different from the shader units' instruction set.
|
| Commands would be things like setting some internal GPU register to a value: setting the output resolution, the framebuffer base pointer, or the mouse pointer position; referencing a texture from system memory; loading a shader; executing a shader; setting a fence value (useful for seeing when a resource - texture, shader - is no longer in use).
|
| Hierarchical DMA buffers are a useful feature of some DMA engines. You can think of them as similar to subroutines. The command buffer can contain an instruction to switch execution to another chunk of memory. This allows the driver to reuse operations or expensive-to-generate sequences. OpenGL display lists were commonly compiled down to a separate buffer.
|
| [1] https://archive.org/details/amiga-hardware-reference-manual-...
|
| [2] https://www.reaktor.com/blog/crash-course-to-amiga-assembly-...
|
| [3] https://www.nesdev.org/wiki/Programming_guide
| chubot wrote:
| BTW I believe memory maps are set up by the ioctl() system call on Unix (including OS X), which is kind of a "catch-all" hole poked through the kernel. Not sure about Windows.
|
| I didn't understand that for a long time ...
|
| I would like to see a "hello world GPU" example. I think you open() the device and then ioctl() it ... But what happens when things go wrong?
|
| Similar to this "Hello JIT", which shows you have to call mmap() to change permissions on the memory in order to execute dynamically generated code.
|
| https://blog.reverberate.org/2012/12/hello-jit-world-joy-of-...
|
| I guess one problem is that this may typically be done in vendor code and they don't necessarily commit to an interface? They make you link their huge SDK.
___________________________________________________________________
(page generated 2022-03-30 23:01 UTC)