[HN Gopher] Automated CPU Design with AI
___________________________________________________________________
 
Automated CPU Design with AI
 
Author : skilled
Score  : 61 points
Date   : 2023-07-02 20:59 UTC (2 hours ago)
 
web link (arxiv.org)
w3m dump (arxiv.org)
 
| dooglius wrote:
| There doesn't seem to be any discussion of what the inputs and
| outputs actually are here, at least for the "coarse-grained"
| approach. I suspect there is some "scaffolding" around e.g. the
| register map and memory access, and the rest is essentially
| learning a map from (instruction, register input vals) ->
| (register output vals, control registers for memory access).
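| 
| If that's right, the black box being learned would be shaped
| roughly like this (a sketch; every name below is my conjecture,
| not anything from the paper):
| 
|   from typing import NamedTuple, Optional
| 
|   class StepIn(NamedTuple):
|       instruction: int         # 32-bit RISC-V instruction word
|       regs: tuple              # register file values going in
| 
|   class StepOut(NamedTuple):
|       regs: tuple              # register file values coming out
|       mem_addr: Optional[int]  # load/store address, if any
|       mem_wdata: Optional[int] # write data, None for a load
| 
|   def learned_step(s: StepIn) -> StepOut:
|       # The claim, as I read it: learn this mapping purely from
|       # observed (input, output) pairs, with no HDL in the loop.
|       raise NotImplementedError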
 
| westurner wrote:
| From the abstract of "Pushing the Limits of Machine Design:
| Automated CPU Design with AI" (2023),
| https://arxiv.org/abs/2306.12456 :
| 
| > _[...] This approach generates the circuit logic, which is
| represented by a graph structure called Binary Speculation
| Diagram (BSD), of the CPU design from only external input-output
| observations instead of formal program code. During the
| generation of BSD, Monte Carlo-based expansion and the distance
| of Boolean functions are used to guarantee accuracy and
| efficiency, respectively. By efficiently exploring a search space
| of unprecedented size 10^{10^{540}}, which is the largest one of
| all machine-designed objects to our best knowledge, and thus
| pushing the limits of machine design, our approach generates an
| industrial-scale RISC-V CPU within only 5 hours. The taped-out
| CPU successfully runs the Linux operating system and performs
| comparably against the human-designed Intel 80486SX CPU. In
| addition to learning the world's first CPU only from input-
| output observations, which may reform the semiconductor industry
| by significantly reducing the design cycle, our approach even
| autonomously discovers human knowledge of the von Neumann
| architecture._
| 
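| For what it's worth, my loose mental model of that Monte Carlo
| expansion, for a single output bit: speculate a constant at each
| leaf, sample observed I/O to estimate how wrong the speculation
| is, and only expand on another input bit where the error is too
| high. (Pure conjecture from the abstract; the paper publishes no
| code, and I've left out its Boolean-function distance.)
| 
|   import random
| 
|   def sample_inputs(n_bits, fixed):
|       # Random input vector respecting bits fixed on this branch.
|       x = [random.randint(0, 1) for _ in range(n_bits)]
|       for bit, val in fixed.items():
|           x[bit] = val
|       return x
| 
|   def speculate(oracle, n_bits, fixed, samples=500):
|       # Majority vote over sampled observations = leaf's guess.
|       ones = sum(oracle(sample_inputs(n_bits, fixed))
|                  for _ in range(samples))
|       guess = int(2 * ones >= samples)
|       error = min(ones, samples - ones) / samples
|       return guess, error
| 
|   def build_bsd(oracle, n_bits, fixed=None, depth=0, tol=0.01):
|       fixed = fixed or {}
|       guess, error = speculate(oracle, n_bits, fixed)
|       if error <= tol or depth == n_bits:
|           return ("leaf", guess)   # speculation is good enough
|       bit = depth                  # expand on the next input bit
|       lo = build_bsd(oracle, n_bits, {**fixed, bit: 0},
|                      depth + 1, tol)
|       hi = build_bsd(oracle, n_bits, {**fixed, bit: 1},
|                      depth + 1, tol)
|       return ("node", bit, lo, hi)
| 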
| The von Neumann (and Harvard Mark) architectures have an
| instruction pipeline bottleneck, maybe by design for serial
| debuggability; as compared with, I don't know, in-RAM computing
| on existing RAM geometries? (See also: "Rowhammer for qubits")
| 
| (Edit: High Bandwidth Memory; HBM2E vs GDDR6X (2023)
| https://en.wikipedia.org/wiki/High_Bandwidth_Memory )
| 
| Hopefully part of the fitness function is determined by the
| presence and severity of hardware side channels and electron
| tunneling; does it filter out candidate designs with side-channel
| vulnerabilities (that are presumed undetectable with TLA+)?
 
  | westurner wrote:
  | And then maybe someday design a reconfigurable - probably
  | modular - semiconductor fabrication facility to produce the
  | design(s)?
 
| xeonmc wrote:
| Pentium FDIV bug, round two incoming.
 
| brucethemoose2 wrote:
| > The implemented program is executed on a Linux cluster
| including 68 servers, each of which is equipped with 2 Intel Xeon
| Gold 6230 CPUs.
| 
| > We verify our output netlist on the FPGAs and tape out the chip
| with 65nm technology. The automatically designed CPU was sent to
| the manufacturer in December 2021.
 
| granthamb wrote:
| It wasn't clear to me that they had implemented a page table (I
| think that's the S extension?), which I would think would make
| the I/O space much more complex and difficult to represent. Lack
| of VA translation would make this CPU much less comparable to a
| 486SX.
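| 
| For a sense of scale, an Sv32 walk (what S-mode paging implies)
| is roughly the following extra state machine; my sketch, with
| permission checks, A/D bits, and fault details omitted:
| 
|   def sv32_translate(va, satp_ppn, mem_read):
|       # mem_read stands in for a 32-bit physical memory read.
|       vpn = [(va >> 12) & 0x3FF, (va >> 22) & 0x3FF]
|       pte_addr = satp_ppn * 4096 + vpn[1] * 4  # root entry
|       for level in (1, 0):
|           pte = mem_read(pte_addr)
|           if not (pte & 1):             # V=0: page fault
|               raise MemoryError("page fault")
|           if pte & 0b1110:              # R/W/X set: leaf PTE
|               # level-1 leaves map 4 MiB superpages
|               mask = (1 << (12 + 10 * level)) - 1
|               return (((pte >> 10) << 12) & ~mask) | (va & mask)
|           pte_addr = ((pte >> 10) << 12) + vpn[0] * 4
|       raise MemoryError("page fault")   # no leaf found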
 
| behnamoh wrote:
| Wasn't Google Tensor already designed by one of Google's AIs? I
| remember it being a big deal because people thought Google could
| improve their chips much faster than the competition.
 
  | rowanG077 wrote:
  | That was just placement, not abstract circuit design.
 
| optimalsolver wrote:
| Could some really alien CPU architectures be discovered with this
| method?
| 
| Just wondering how far from human design-space you could end up
| with this.
 
  | ninkendo wrote:
  | Silicon validation is a huge part of the overall cost of
  | bringing up a chip, because it's so important that the physical
  | hardware do what it's supposed to do. So it's gonna be limited
  | to behaving exactly as the validation specifies, which likely
  | will limit how "alien" it will actually be.
 
| amelius wrote:
| I didn't read the paper, but judging from the abstract it's
| probably a technique for design space exploration.
| 
| I.e., they manually designed the CPU but left a (large) number of
| parameters open, then used AI to find an optimum for those
| parameters.
| 
| So anything the AI did was completely correctness-preserving.
| 
| This may sound like a small achievement, but keep in mind that
| for modern CPUs the search of the design space is hugely
| important, and probably part of the reason for the success of
| e.g. Apple's M1.
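| 
| Concretely, the kind of loop I mean, where every candidate is
| correct by construction and only the knobs move (all parameters
| and the cost function here are invented for illustration):
| 
|   import random
| 
|   SPACE = {
|       "l1_kib":      [8, 16, 32, 64],
|       "issue_width": [1, 2, 4],
|       "rob_entries": [32, 64, 128],
|   }
| 
|   def cost(cfg):
|       # Stand-in for a slow simulation/synthesis run scoring a
|       # candidate on performance, area, and power.
|       return random.random()
| 
|   def random_search(iters=100):
|       best_cfg, best_cost = None, float("inf")
|       for _ in range(iters):
|           cfg = {k: random.choice(v) for k, v in SPACE.items()}
|           c = cost(cfg)
|           if c < best_cost:
|               best_cfg, best_cost = cfg, c
|       return best_cfg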
 
| bsder wrote:
| I find this paper _extremely_ suspicious.
| 
| If this _actually_ worked, it should be able to cough up a 6502,
| 6809, 8051, etc. as well, since they are so much simpler--
| especially since they even mention a Commodore 64.
| 
| The fact that they don't do this stinks very strongly. There are
| other concerning signs in the paper as well.
 
  | staunton wrote:
  | Why should it produce those designs? Are they in any known
  | sense optimal?
 
    | bsder wrote:
    | > Why should it produce those designs? Are they in any known
    | sense optimal?
    | 
    | Yes. The 6502 was quite cheap for its day, so it is much more
    | cost-optimal than most designs. The 6809 was designed to fix
    | the mistakes of the 6800, and its implementation is much more
    | orthogonal. The 6800 and 8051 are probably the best
    | documented. All of them have extremely long-lived toolchains
    | and support. Pick your optimality.
    | 
    | In addition, ask the converse: "Why should it produce a
    | RISC-V design?" RISC-V is definitely sub-optimal on quite a
    | few fronts.
    | 
    | If a system is doing actual _CPU design_, as claimed by the
    | paper, those designs (6502, 6809, 8051) are a simple sanity
    | check. The designs are extensively documented, to the point
    | that we have web pages that simulate them down to the
    | transistor. You should be able to provide a "relatively"
    | small input and get back a compatible design as an output. A
    | 6502 has only 3500 or so transistors. That's on the order of
    | the complexity they claim in the paper.
    | 
    | This would prevent someone like me from saying: "You
    | basically stuffed a RISC-V design into the training set,
    | managed to launder it through ML/AI to get the computer to
    | cough it back up, then deployed a legion of humans to patch
    | the result sufficiently that it could be called 'Linux
    | compatible', and finally barfed out a publication with 6
    | pages of link references in a 12-page paper."
    | 
    | Here's the touchstone for whether AI is doing chip design:
    | "When AI can distinguish between control plane and datapath
    | and _synthesize and place them differently_ , AI is doing
    | actual design."
 
    | sitkack wrote:
    | I don't think any reviewer of the paper would ask why they
    | didn't use one of the processors mentioned.
    | 
    | I can think of lots of reasons to do it with RISC-V:
    | 
    | * lots of excellent simulators and emulators
    | * great tool chains
    | * both software (Verilog, VHDL) and hardware implementations
    | * a regular, compact instruction set (no condition codes)
    | 
    | Using anything _besides_ RISC-V would have been an order of
    | magnitude harder.
 
___________________________________________________________________
(page generated 2023-07-02 23:00 UTC)