
[HN Gopher] What is a flop?
___________________________________________________________________
What is a flop?

Author : RafelMri
Score  : 43 points
Date   : 2023-09-05 08:50 UTC (14 hours ago)
	web link (nhigham.com)
	w3m dump (nhigham.com)
| paulddraper wrote:
| Floating Point Operations Per

| nightmonkey wrote:
| My last two startups.

| HPsquared wrote:
| You could use it as a unit of accounting for investing losses.
| A gigaflop would be a loss of 1 billion, etc.

| [deleted]

| yujian wrote:
| this

| jjgreen wrote:
| I'm usually a fan of Higham, but the last few posts have been
| weak.

| liotier wrote:
| Flops ?

| jjgreen wrote:
| Very witty Wilde ...

| MattyDub wrote:
| Can somebody explain why a square root is also considered a flop?
| Surely that involves more work than the other four operations the
| article listed. Is there some hardware algorithm for the square
| root that is as fast as (e.g.) division?

| AlbertCory wrote:
| Disappointed this is not about basketball.

| dragontamer wrote:
| My personal notes on this subject:
|
| * MIPS was perhaps the integer equivalent to FLOP, still used in
| modern microcontrollers because the 8051 at 12 MHz would only
| execute 1 MIPS (12 clocks per instruction). Modern 8051 chips
| obviously have sped up to 1 clock per instruction, but MIPS (and
| Dhrystone MIPS in particular) are still a common benchmark today.
|
| * FLOPs are very difficult to calculate in theory because modern
| CPUs have vector units, and multiple pipelines per core. You
| could have 3x AVX512 instructions in parallel on today's CPUs on
| a single core.
|
| * FLOPs were traditionally a 64-bit operation for the
| supercomputer community. Today, most FLOPs are 32-bit for video
| games. Finally, the deep learning / neural net guys have
| popularized 16-bit flops, and even 8-bit iops.
|
| * 'The' flop is a misnomer because it's almost always the
| multiply-and-accumulate instruction: X = A + B * C. Which... Is
| two operations per instruction (per shader/SIMD lane). Eeehhh
| whatever. Who cares about these details?
|
| * As 'Dhrystone' is the benchmark for MIPS, the benchmark for
| 64-bit flops is Linpack.
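The arithmetic behind those notes can be sketched in a few lines of Python. This is only a rough illustration, not how any vendor or benchmark actually reports figures: the function names, the 8-core 3 GHz CPU, and the two FMA units per core are all made-up assumptions. The only numbers taken from the thread are the 8051's 12 MHz clock and 12 clocks per instruction.

```python
def peak_flops(cores, clock_hz, simd_lanes, fma_units_per_core, flops_per_fma=2):
    """Theoretical peak floating-point operations per second.

    flops_per_fma=2 counts a fused multiply-add (A + B * C) as two
    operations, the way marketing peak figures often do; pass 1 to
    count each FMA instruction lane as a single operation instead.
    """
    return cores * clock_hz * simd_lanes * fma_units_per_core * flops_per_fma


def mips(clock_hz, clocks_per_instruction):
    """Millions of instructions per second for a simple in-order core."""
    return clock_hz / clocks_per_instruction / 1e6


if __name__ == "__main__":
    # Hypothetical CPU (assumption): 8 cores, 3 GHz, 512-bit vectors
    # holding 8 x 64-bit lanes, 2 FMA units per core.
    as_two = peak_flops(8, 3_000_000_000, 8, 2)
    as_one = peak_flops(8, 3_000_000_000, 8, 2, flops_per_fma=1)
    print(f"FMA counted as two ops: {as_two / 1e9:.0f} GFLOPS")
    print(f"FMA counted as one op:  {as_one / 1e9:.0f} GFLOPS")

    # Classic 8051 from the thread: 12 MHz clock, 12 clocks per instruction.
    print(f"8051 at 12 MHz: {mips(12_000_000, 12):.0f} MIPS")
```

Switching the same hypothetical chip to 32-bit lanes (16 per 512-bit vector) doubles the figure again, which is one reason a bare "FLOPS" number means little without the precision and the FMA counting rule attached.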
| gumby wrote:
| > MIPS was perhaps the integer equivalent to FLOP, still used
| in modern microcontrollers because the 8051 at 12 MHz would only
| execute 1 MIPS (12 clocks per instruction)
|
| Actually the origin of this term was VAX MIPS (VAX 780
| specifically) because that was a ubiquitous, pretty fast for
| its time minicomputer. There were faster machines, and slower
| mainframes still being built, but that was what the late 70s
| were like.
|
| When the 8051 was released in 1980 it surely didn't run at 12
| MHz! Back then the Z80 sold because it could run 8080 code at a
| blistering 2 MHz.
|
| BTW the benchmark for FLOPS in those days was Whetstone, hence
| the otherwise weird name "Dhrystone"

| dragontamer wrote:
| The 1981 manual for the 8051 contains numerous references to
| 12 MHz.
|
| http://bitsavers.informatik.uni-stuttgart.de/components/inte...
|
| It was 12T clocked: even though the clock was 12 MHz, it would
| only operate at 1 MHz / 1 MIPS, because it took 12 clock ticks
| to even perform one addition.
|
| IIRC, there was a standard crystal (11.0592 MHz crystal?? I
| forget exactly) for the communications at the time. So going
| just above 11 MHz (or really, just above 11.0592 MHz) was
| needed for reliable serial comms.

| gumby wrote:
| Wow, right on page 1-2! I'm surprised -- I don't remember
| anything running that fast back then. Thanks.
|
| (Love those old Intel books too)
|
| Nevertheless, FWIW, MIPS started out as VAX MIPS, and at
| first people often used to write "VAX MIPS".

| segfaultbuserr wrote:
| > 'The' flop is a misnomer because it's almost always the
| multiply-and-accumulate instruction: X = A + B * C. Which... Is
| two operations per instruction (per shader/SIMD lane). Eeehhh
| whatever. Who cares about these details?
|
| If FMA is supported, it can either be counted as one or two
| operations, depending on the rule of the benchmark involved or
| the marketing of the processor.
| The marketing specification of a processor's theoretical peak
| performance sometimes counts a single-instruction FMA as two
| operations. On the other hand, for the purpose of code
| profiling, counting FMA as one operation is more realistic...
| As you said, who cares about these details?

| fluoridation wrote:
| Given that the point of the FLOPS unit is to compare
| processors, it does make more sense to count complex
| instructions as more than a single floating-point operation.
| If one CPU could multiply a 4x4 matrix by a vector in a
| single instruction that can run a million times per second,
| and another CPU needed ~32 instructions and so can only
| multiply 500k matrices per second but retires 16 million
| instructions in that same second, it would be silly to
| compare instructions instead of multiplications and
| additions.

| dragontamer wrote:
| As a computer engineer, I'd note that the circuit design
| needed to make a fast multiplication operation (i.e. Wallace
| Tree, and similar) is an order of magnitude larger than the
| circuit design needed for fast addition (i.e. a Kogge-Stone
| carry-lookahead adder).
|
| This idea that additions and multiplications can be
| combined like this as "equivalent operations" is kinda
| bullshit. But hey, if it's "how it's done" (and it's done this
| way because multiply-then-add is how you do
| matrix-multiplications...) then so be it.
|
| Just remember that this is an arbitrary subdivision of a
| matrix multiplication operation, that may not have much
| relevance as a benchmark outside of matrix multiplications.

| fluoridation wrote:
| It was just an example, not necessarily a realistic one.
| The point is that we want to compare how quickly a
| processor will compute our problem, not how many
| instructions it's going to execute. If it was a car you'd
| want to compare things like its top speed and
| acceleration, not something inane like engine revolutions
| per kilometer.
| You measure and compare things that are relevant to the
| user, not implementation details.

| namirez wrote:
| Floating point operations per ...?

| GenericDev wrote:
| Floating Point Operations Per Second [1]
|
| [1] https://academickids.com/encyclopedia/index.php/FLOPS

| Zambyte wrote:
| > One should speak in the singular of a FLOPS and not of a
| FLOP, although the latter is frequently encountered. The
| final S stands for second and does not indicate a plural.
|
| The author of this post seems to have fallen for this error.

| dataflow wrote:
| I think their point was that the p stood for "per", not for
| the second letter of "operation".
___________________________________________________________________
(page generated 2023-09-05 23:00 UTC)