[HN Gopher] What is a flop?
___________________________________________________________________
 
What is a flop?
 
Author : RafelMri
Score  : 43 points
Date   : 2023-09-05 08:50 UTC (14 hours ago)
 
web link (nhigham.com)
w3m dump (nhigham.com)
 
| paulddraper wrote:
| Floating Point Operations Per
 
| nightmonkey wrote:
| My last two startups.
 
  | HPsquared wrote:
  | You could use it as a unit of accounting for investing losses.
  | A gigaflop would be a loss of 1 billion, etc.
 
| [deleted]
 
| yujian wrote:
| this
 
| jjgreen wrote:
| I'm usually a fan of Higham, but the last few posts have been
| weak.
 
  | liotier wrote:
  | Flops ?
 
    | jjgreen wrote:
    | Very witty Wilde ...
 
| MattyDub wrote:
| Can somebody explain why a square root is also considered a flop?
| Surely that involves more work than the other four operations the
| article listed. Is there some hardware algorithm for the square
| root that is as fast as (e.g.) division?
 
| AlbertCory wrote:
| Disappointed this is not about basketball.
 
| dragontamer wrote:
| My personal notes on this subject:
| 
| * MIPS was perhaps the integer equivalent to FLOP, still used in
| modern microcontrollers because the 8051 at 12 MHz would only
| execute 1 MIPS (12 clocks per instruction). Modern 8051 chips
| obviously have sped up to 1 clock per instruction, but MIPS (and
| Dhrystone MIPS in particular) are still a common benchmark today.
| 
| * FLOPs is very difficult to calculate in theory because modern
| CPUs have vector units and multiple pipelines per core. You
| could have 3x AVX512 instructions in parallel on today's CPUs on
| a single core (see the rough sketch at the end of these notes).
| 
| * FLOPs were traditionally 64-bit operations for the
| supercomputer community. Today, most FLOPs are 32-bit for video
| games. Finally, the deep learning / neural net guys have
| popularized 16-bit flops, and even 8-bit iops.
| 
| * 'The' flop is a misnomer because it's almost always the
| multiply-and-accumulate instruction: X = A + B * C. Which... Is
| two operations per instruction (per shader/SIMD lane). Eeehhh
| whatever. Who cares about these details?
| 
| * As 'Dhrystone' is the benchmark for MIPS, the benchmark for
| 64-bit flops is Linpack.
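| 
| A rough back-of-the-envelope sketch of how a theoretical peak FLOPS
| number gets assembled, just to show why vector width, pipelines and
| FMA counting all matter. Every number below is a made-up assumption
| for illustration, not a spec for any particular chip:
| 
|   # peak_flops.py -- illustrative peak-FLOPS estimate (assumed numbers)
|   cores = 8          # physical cores (assumed)
|   clock_ghz = 3.0    # sustained clock in GHz (assumed)
|   fp64_lanes = 8     # FP64 lanes in one AVX-512 register (512 / 64)
|   fma_pipes = 2      # FMA pipelines per core (assumed)
|   ops_per_fma = 2    # the usual marketing count: multiply + add
| 
|   peak_gflops = cores * clock_ghz * fp64_lanes * fma_pipes * ops_per_fma
|   print(f"theoretical peak: {peak_gflops:.0f} GFLOPS (FP64)")
|   # 8 * 3.0 * 8 * 2 * 2 = 768 GFLOPS; halve it if you count an FMA as one op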
 
  | gumby wrote:
  | > MIPS was perhaps the integer equivalent to FLOP, still used
  | in modern microcontrollers because the 8051 at 12 MHz would only
  | execute 1 MIPS (12 clocks per instruction)
  | 
  | Actually the origin of this term was VAX MIPS (the VAX 780
  | specifically), because that was a ubiquitous minicomputer that
  | was pretty fast for its time. There were faster machines, and
  | slower mainframes still being built, but that was what the late
  | 70s were like.
  | 
  | When the 8051 was released in 1980 it surely didn't run at 12
  | MHz! Back then the Z80 sold because it could run 8080 code at a
  | blistering 2 MHz.
  | 
  | BTW the benchmark for FLOPS in those days was Whetstone, hence
  | the otherwise weird name "Dhrystone"
 
    | dragontamer wrote:
    | The 1981 manual for the 8051 contains numerous references to
    | 12 MHz.
    | 
    | http://bitsavers.informatik.uni-
    | stuttgart.de/components/inte...
    | 
    | It was 12T clocked: even though the clock was 12 MHz, it would
    | only operate at 1 MHz / 1 MIPS, because it took 12 clock ticks
    | to even perform one addition.
    | 
    | IIRC, there was a standard crystal (11.0592 MHz crystal?? I
    | forget exactly) for the communications at the time. So going
    | just above 11 MHz (or really, just above 11.0592 MHz) was
    | needed for reliable serial comms.
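    | 
    | For what it's worth, that oddball frequency divides down exactly
    | to the standard baud rates. A minimal sketch of the arithmetic,
    | assuming the usual 8051 UART setup (Timer 1 in auto-reload mode,
    | SMOD = 0; my assumptions, not taken from the manual above):
    | 
    |   # why 11.0592 MHz: it divides exactly into 9600 baud on an 8051
    |   fosc = 11_059_200
    |   th1_reload = 0xFD                        # 256 - 3
    |   baud = fosc / (12 * 32 * (256 - th1_reload))
    |   print(baud)                              # 9600.0 exactly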
 
      | gumby wrote:
      | Wow, right on page 1-2! I'm surprised -- I don't remember
      | anything running that fast back then. Thanks.
      | 
      | (Love those old Intel books too)
      | 
      | Nevertheless, FWIW, MIPS started out as VAX MIPS, and at
      | first people often wrote "VAX MIPS".
 
  | segfaultbuserr wrote:
  | > 'The' flop is a misnomer because it's almost always the
  | multiply-and-accumulate instruction: X = A + B * C. Which... Is
  | two operations per instruction (per shader/SIMD lane). Eeehhh
  | whatever. Who cares about these details?
  | 
  | If FMA is supported, it can either be counted as one or two
  | operations, depending on the rules of the benchmark involved or
  | the marketing of the processor. The marketing specification of
  | a processor's theoretical peak performance sometimes counts a
  | single-instruction FMA as two operations. On the other hand,
  | for the purpose of code profiling, counting FMA as one
  | operation is more realistic... As you said, who cares about
  | these details?
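  | 
  | As a tiny illustration of how far apart the two conventions land
  | (a hypothetical loop, not taken from any benchmark), a length-n
  | dot product compiles to roughly n FMA instructions:
  | 
  |   # counting a length-n dot product (illustrative)
  |   n = 1_000_000
  |   fma_instructions = n                        # roughly one FMA per element
  |   flops_if_fma_is_one = fma_instructions      # profiling-style count
  |   flops_if_fma_is_two = 2 * fma_instructions  # peak/marketing-style count
  |   print(flops_if_fma_is_one, flops_if_fma_is_two)  # 1000000 2000000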
 
    | fluoridation wrote:
    | Given that the point of the FLOPS unit is to compare
    | processors, it does make more sense to count complex
    | instructions as more than a single floating-point operation.
    | If one CPU can multiply a 4x4 matrix by a vector in a single
    | instruction that runs a million times per second, while
    | another CPU needs ~32 instructions and so multiplies only
    | 500k matrices per second even though it retires 16 million
    | instructions in that same second, it would be silly to
    | compare instructions instead of multiplications and
    | additions.
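    | 
    | Putting rough numbers on that example (illustrative only): a 4x4
    | matrix times a 4-vector costs 16 multiplies + 12 additions = 28
    | flops, however many instructions it takes.
    | 
    |   # flop counts are the same regardless of instruction count (illustrative)
    |   flops_per_matvec = 4*4 + 4*3          # 16 multiplies + 12 adds = 28
    |   cpu_a = 1_000_000 * flops_per_matvec  # 1M single-instruction matvecs/s -> 28 Mflops
    |   cpu_b =   500_000 * flops_per_matvec  # 500k matvecs/s over ~32 instr each -> 14 Mflops
    |   # cpu_b retires 16M instructions/s, yet does half the useful arithmetic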
 
      | dragontamer wrote:
      | Speaking as a computer engineer: the circuit design needed to
      | make a fast multiplication operation (ie: a Wallace tree and
      | similar) is an order of magnitude larger than the circuit
      | design needed for fast addition (ie: a Kogge-Stone
      | carry-lookahead adder).
      | 
      | This idea that additions and multiplications can be
      | combined like this as "equivalent operations" is kinda
      | bullshit. But hey, if it's "how it's done" (and it's done this
      | way because multiply-then-add is how you do matrix
      | multiplications...) then so be it.
      | 
      | Just remember that this is an arbitrary subdivision of a
      | matrix multiplication operation, that may not have much
      | relevance as a benchmark outside of matrix multiplications.
 
        | fluoridation wrote:
        | It was just an example, not necessarily a realistic one.
        | The point is that we want to compare how quickly a
        | processor will compute our problem, not how many
        | instructions it's going to execute. If it were a car, you
        | would want to compare things like its top speed and
        | acceleration, not something inane like engine revolutions
        | per kilometer.
        | relevant to the user, not implementation details.
 
| namirez wrote:
| Floating point operations per ...?
 
  | GenericDev wrote:
  | Floating Point Operations Per Second [1]
  | 
  | [1] https://academickids.com/encyclopedia/index.php/FLOPS
 
    | Zambyte wrote:
    | > One should speak in the singular of a FLOPS and not of a
    | FLOP, although the latter is frequently encountered. The
    | final S stands for second and does not indicate a plural.
    | 
    | The author of this post seems to have fallen for this error.
 
    | dataflow wrote:
    | I think their point was that the p stood for "per", not for
    | the second letter of "operation".
 
___________________________________________________________________
(page generated 2023-09-05 23:00 UTC)