[HN Gopher] There is no 'printf'
___________________________________________________________________
 
There is no 'printf'
 
Author : pr0zac
Score  : 112 points
Date   : 2021-10-20 15:01 UTC (1 days ago)
 
web link (www.netmeister.org)
 
| monocasa wrote:
| There is 'printf'. It's just that printf (and the rest of the
| standard library) is technically as much a part of the C language
| as the language grammar itself, and C compilers are welcome to
| use innate knowledge of those functions for optimizations. The
| other place you typically see this is calls to functions like
| memcpy/memset being elided to inline vector ops or CISC copies,
| or on simpler systems, large manual zeroing and copying being
| elided the other way to a memset or memcpy call.
| 
| C compilers typically have an escape hatch for environments
| like deeply embedded systems and kernels, such as gcc's
| -ffreestanding and -fno-builtin, that says "but for real though,
| don't assume std lib functions exist or that you know what they
| are based on the function's name".
| 
| One of my favorite parts of rust, as someone who uses it for
| deeply embedded systems, is the separation of core and std
| (where core is the subset of std that only requires memcpy,
| memset, and one other I'm forgetting). The rest of the standard
| library is ultimately an optional part of the language, with
| compiler optimizations focused on general benefits rather than
| on knowing in the compiler how something like printf works.
| no_std is such a nicer env than the half-done ports of newlib or
| pdclib that everyone uses in C embedded land.
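
A minimal sketch of the idiom recognition monocasa describes, going "the other way": a hand-written zeroing loop that never names memset. GCC and Clang at -O2 may replace a loop like this with a single memset call (GCC's -fno-tree-loop-distribute-patterns turns that particular rewrite off), which is one reason GCC's documentation says even a freestanding environment must provide memcpy, memmove, memset and memcmp. The function name is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: zero len bytes one at a time.
   At -O2, GCC and Clang may recognize this loop and emit a call
   to memset instead, even though memset is never named here. */
void zero_bytes(unsigned char *buf, size_t len) {
    for (size_t i = 0; i < len; i++) {
        buf[i] = 0;
    }
}
```

Comparing the output of `gcc -O2 -S` with and without -fno-tree-loop-distribute-patterns is one way to check whether a given compiler version performed the rewrite.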
 
| tptacek wrote:
| Huh, this is pretty great; I've always fussily used fputs() when
| I'm just printing static strings, and apparently I don't need to
| bother, since the compiler will just do it for me.
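
The rewrite tptacek is relying on can be sketched as a hypothetical helper that mirrors, in simplified form, the rule GCC and Clang apply: a printf whose format has no conversion specifications and ends in '\n' can become a puts of the same text with the newline stripped, because puts appends one itself. The real compilers handle more cases (printf("%s\n", s) becoming puts(s), for example) and, as far as I know, only rewrite when the return value is ignored.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of the printf -> puts rewrite.
   If printf(fmt) could become puts(arg), copies arg into out and
   returns true; otherwise returns false and leaves out untouched. */
bool printf_to_puts_arg(const char *fmt, char *out, size_t out_size) {
    size_t len = strlen(fmt);
    if (len == 0 || fmt[len - 1] != '\n') {
        return false;                 /* puts always appends '\n' */
    }
    if (strchr(fmt, '%') != NULL) {
        return false;                 /* must be a plain literal */
    }
    if (len > out_size) {
        return false;                 /* need len - 1 chars plus '\0' */
    }
    memcpy(out, fmt, len - 1);        /* copy everything but the '\n' */
    out[len - 1] = '\0';
    return true;
}
```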
 
| guerrilla wrote:
| Moar please. I'm loving these counterintuitive C optimization
| gotchas lately[1]. They are like little brain teasers.
| 
| 1. https://news.ycombinator.com/item?id=28930271
 
  | 0xcde4c3db wrote:
  | About a year ago there was something of a "joke isEven()
  | implementation discourse" on Twitter, which eventually evolved
  | into a sort of informal optimizer abuse contest. For example:
  | 
  | https://twitter.com/zeuxcg/status/1291872698453258241
  | 
  | https://twitter.com/jckarter/status/1428071485827022849
 
    | aw1621107 wrote:
    | OK, those are horrifying and fascinating, and they basically
    | break my brain.
    | 
    | Is there an explanation somewhere of why the first one
    | "works"? The second one I think is the compiler assuming the
    | default case will never be hit since it'll result in infinite
    | recursion, which is UB under C++, so it's basically assuming
    | 0<=x<=3 and optimizing from there. Is that correct?
    | 
    | The first one I'm less certain about. The only thing I can
    | think of is that the compiler deduces an upper limit of
    | INT_MAX - 1 to avoid signed overflow, and then somehow
    | figures out the true/false pattern from there? There's still
    | a bit of a gap in my understanding there.
 
      | barsonme wrote:
      | My guess: since overflowing int is UB, and the only value
      | of n that stops the recursion is zero, the compiler assumes
      | that n must be zero and checks accordingly.
      | 
      | That doesn't explain why it uses test dil, 1 instead of
      | test dil, dil or cmp 0 or whatever.
 
      | davemp wrote:
      | Optimizers have to keep the same input/output pairs unless
      | there is undefined behavior. In the second function the
      | truth table looks like:
      | 
      |       in    | out
      |       ----------
      |       0b000 | 1
      |       0b001 | 0
      |       0b010 | 1
      |       0b011 | 0
      |       0b100 | don't care
      |       ...   | ...
      |       MAX   | don't care
      | 
      | The compiler just chooses the most efficient way it knows
      | to get the filled-out entries correct, which happens to be:
      | 
      |       in    | ~in[0]
      |       ----------
      |       0b000 | 1
      |       0b001 | 0
      |       0b010 | 1
      |       0b011 | 0
      |       0b100 | 1
      |       ...   | ...
      |       MAX   | 0
      | 
      | It would have been just as valid to do:
      | 
      |       in    | in[2] or ~in[0]
      |       ----------
      |       0b000 | 1
      |       0b001 | 0
      |       0b010 | 1
      |       0b011 | 0
      |       0b100 | 1
      |       0b101 | 1
      |       ...   | ...
      |       MAX   | 1
      | 
      | The first function's table looks like:
      | 
      |       in    | out
      |       ----------
      |       0b000 | 1
      |       0b001 | don't care
      |       0b010 | don't care
      |       ...   | ...
      |       MAX   | don't care
      | 
      | And the compiler still likes the even check in this case,
      | which makes sense.
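
davemp's argument about the second function can be checked mechanically. This is a sketch that assumes, as davemp's table does, that only inputs 0 through 3 have defined results. It encodes those rows and confirms that both candidate implementations, ~in[0] and in[2] or ~in[0], agree with every defined row while disagreeing elsewhere. All names are hypothetical.

```c
#include <stdbool.h>

/* The rows of the truth table that are actually defined: inputs 0..3.
   Every other input is "don't care" (reaching it would be UB). */
static const bool defined_rows[4] = { true, false, true, false };

/* What the compiler emitted: the inverted low bit, i.e. an even check. */
bool impl_low_bit(unsigned x) {
    return (x & 1u) == 0;
}

/* The alternative davemp mentions: bit 2 OR the inverted low bit. */
bool impl_bit2_or_low_bit(unsigned x) {
    return ((x >> 2) & 1u) || (x & 1u) == 0;
}

/* Returns true if impl matches the table on every defined row. */
bool matches_defined_rows(bool (*impl)(unsigned)) {
    for (unsigned x = 0; x < 4; x++) {
        if (impl(x) != defined_rows[x]) {
            return false;
        }
    }
    return true;
}
```

Both candidates pass, and they only start to differ at input 5, which is one of the "don't care" rows.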
 
        | notriddle wrote:
        | The first function (the `n == 0 || !isEven(n+1)`
        | recursive function) has defined behavior for negative
        | numbers. That's probably why it compiled to an even
        | number check.
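
For reference, here is a reconstruction of the first function based on notriddle's description. The original tweet isn't reproduced in this thread, so this is an approximation, and the name is hypothetical. For n <= 0 the recursion counts up to zero and flips the answer at each step, which is the defined behavior notriddle mentions. For n > 0 it could only reach zero by overflowing int, which is undefined, so an optimizer is free to treat the whole function as a plain even check.

```c
#include <stdbool.h>

/* Joke implementation, reconstructed from the description above; do not use.
   Defined only for n <= 0: it recurses up toward zero, negating the answer
   at each step. For n > 0 it would have to overflow int (undefined behavior)
   before it could ever reach zero. */
bool joke_is_even(int n) {
    return n == 0 || !joke_is_even(n + 1);
}
```

Only call it with non-positive arguments. Unoptimized builds will typically just recurse on a positive argument until the stack is exhausted.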
 
  | archi42 wrote:
  | It's all fun and games until you write (or review) C/C++ test
  | cases for a compiler or disassembler ;-) It never stopped
  | amazing me how good the compiler was at figuring out that I had
  | actually written a very complicated "return 0".
 
| eikenberry wrote:
| https://web.archive.org/web/20211019052752/https://www.netme...
 
| GoblinSlayer wrote:
| Imagine somebody thought omitting the return statement and doing
| whatever the compiler likes is a good feature to have.
 
  | dboreham wrote:
  | Like Scala?
 
    | dnautics wrote:
    | pretty sure scala (and most FP) has a well-defined "what to
    | do when you leave off the return statement", not one that "is
    | up to the compiler"
 
 
| qwerty456127 wrote:
| > puts(3) only returns "a nonnegative integer on success and EOF
| on error"
| 
| How does it decide which nonnegative integer to return?
 
  | robotresearcher wrote:
  | It's arbitrary. The article shows an implementation that
  | returns 10 (ASCII '\n'). But the spec says it doesn't matter,
  | so you should only be using it to test >= 0 (i.e. not EOF) for
  | success.
 
    | Bayart wrote:
    | The correct implementation is _obviously_ to return 1 on
    | success!
 
  | woodruffw wrote:
  | That's answered below:
  | 
  | > On success, puts(3) appears to return '\n', the newline or
  | line feed (LF) character, which has ASCII value... 10.
  | 
  | But note that that isn't standard behavior. The language in
  | POSIX[1] is identical to that in the blog post. `puts` is free
  | to return whatever non-negative number it wants on success.
  | 
  | [1]:
  | https://pubs.opengroup.org/onlinepubs/9699919799/functions/p...
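
Because the only guarantee is "non-negative on success, EOF on error", a portable check compares the result against EOF rather than testing for a particular success value. The following is a minimal sketch with a hypothetical wrapper name:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical wrapper: returns true if puts succeeded.
   Zero is a valid success value, so compare against EOF rather than
   testing for > 0 or for any specific value such as '\n'. */
bool put_line(const char *s) {
    return puts(s) != EOF;
}
```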
 
| cyberge99 wrote:
| Apparently there is no available capacity for that site either.
 
  | Bang2Bay wrote:
  | https://search.yahoo.com/ for
  | 
  | There is no 'printf'
  | 
  | and look through the cache
 
| ltr_ wrote:
| [off topic] I've always wondered how '%n' is used in production
| code.
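
One common, if niche, use of %n is recording how many characters have been written so far, for example to find the column where a value starts so that later output can be aligned or underlined. The sketch below uses a hypothetical helper. Some toolchains restrict %n for security reasons: glibc's _FORTIFY_SOURCE rejects it in writable format strings, and Microsoft's CRT disables it by default. Treat this as an illustration rather than a recommendation.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats "label: value" into buf and uses %n to
   store in *value_col how many characters were written before the value.
   Returns snprintf's result (the length of the full formatted string). */
int format_field(char *buf, size_t size, const char *label, int value,
                 int *value_col) {
    return snprintf(buf, size, "%s: %n%d", label, value_col, value);
}
```

For example, after format_field(buf, sizeof buf, "count", 42, &col), col is 7, so printf("%*s^^\n", col, "") would place a marker under the value.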
 
| mormegil wrote:
| So, why does puts do "return r ? EOF : '\n';"? Some backwards
| compatibility? Or is there a logical reason for that?
 
  | _kst_ wrote:
  | That particular implementation probably returns the result of
  | the last fputc() or equivalent that it called.
  | 
  | puts() returns EOF (typically -1) on error, or some unspecified
  | non-negative value on success.
  | 
  | fputc() returns EOF on error or the written character, treated
  | as an unsigned char and converted to int, on success.
  | 
  | Don't expect all puts() implementations to do the same thing.
  | For example, the glibc implementation appears to return the
  | number of characters written on success. Implementations are
  | free to rely on implementation-defined behavior. User code
  | that's intended to be portable cannot.
 
    | LukeShu wrote:
    | That particular implementation (NetBSD's, which is
    | transcribed into the article) does something more optimized
    | than making repeated calls to `putchar()`.
    | 
    | But as pdw's link shows, what you suggest is exactly what the
    | historical implementation was. So NetBSD is simply matching
    | historical Unix.
 
  | masklinn wrote:
  | Per the man page:
  | 
  | > puts() and fputs() return a nonnegative number on success, or
  | EOF on error.
  | 
  | r is the result of the write; if it's nonzero, the write failed,
  | and thus so did puts.
 
    | m45t3r wrote:
    | Yeah, but I think the question was why EOF and "\n". It could
    | as easily just return 1 or -1 for example, and it would make
    | more sense I think.
 
      | kevin_thibedeau wrote:
      | puts() always adds a line termination so success means that
      | '\n' is the last char for that implementation.
 
  | pdw wrote:
  | It's what historic Unix did:
  | https://github.com/v7unix/v7unix/blob/master/v7/usr/src/libc...
  | 
  | Why it did that? I'm not sure, but at the time C did not have
  | 'void' functions: every function returned a value. They
  | probably wanted to make the behavior of the stdlib functions
  | deterministic, even if the return value was useless and
  | undocumented.
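
The historical behavior _kst_ and pdw describe can be reproduced with a hypothetical reimplementation: write the string and then the newline with fputc, and return whatever the last fputc returned. On success that is the newline character itself, which is 10 on ASCII systems. This is a sketch of the idea, not a copy of the v7 or NetBSD source.

```c
#include <stdio.h>

/* Hypothetical v7-style puts: writes s and then '\n' to stdout one
   character at a time. Returns EOF on error; on success it returns the
   result of the final fputc, which is the newline character itself. */
int v7_style_puts(const char *s) {
    for (; *s != '\0'; s++) {
        if (fputc((unsigned char)*s, stdout) == EOF) {
            return EOF;
        }
    }
    return fputc('\n', stdout);
}
```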
 
| anonymousiam wrote:
| Compiler optimization can sometimes cause unpredictable or even
| incorrect behavior. Below is a blob of C code for the TI MSP430
| compiler that exemplifies at least one of TI's optimization bugs:
| 
|     // msgDataSize, FRAME_SYNC_R and IpcBlankMessage are defined elsewhere
| 
|     // Define Common Communications Frame
|     typedef volatile union commFrameType
|     {
|         struct
|         {
|             unsigned SyncHeader:16;
|             unsigned MessageID:8;
|             unsigned short MessageData[msgDataSize];  // ID-unique data
|             unsigned CRC:8;           // LSB of CCITT-16 for above data
|         } __attribute__ ((packed)) Frame;
|         unsigned char  b[16];         // Accessible as raw bytes as well
|         unsigned short w[8];          // Accessible as raw words as well
|         unsigned long  l[4];          // Accessible as raw long words as well
|     } __attribute__ ((packed)) CommFrame;
| 
|     static CommFrame IpcMessage = { FRAME_SYNC_R, IpcBlankMessage };
| 
|     // If frame was accepted into TX queue, prepare next frame for transmission
|     // IpcMessage.Frame.MessageID++;     // Bump up to next message type
|     // IpcMessage.Frame.MessageID += 1;
|     // The two commented-out lines above cause a bizarre linker error
|     // if either is used instead of the line below.
|     IpcMessage.Frame.MessageID = IpcMessage.Frame.MessageID + 1;  // Bump up to next message type
| 
| The MSP430 is a 16-bit microcontroller, and the packed CommFrame
| structure has Frame.MessageID on an odd-byte boundary. Some
| processors might raise a SIGBUS, but TI says that it's okay to
| access a byte on an odd address boundary.
| 
| It's pretty silly that i++; and i+=1; don't work, but i=i+1; is
| just fine.
 
  | secondcoming wrote:
  | 'unsigned MessageID:8;' isn't the same as 'unsigned char
  | MessageId'
 
| RcouF1uZ4gsC wrote:
| This is a bit like saying there is no '+'.
| 
| Because if you put in
| 
|     return 1+2+3;
| 
| and look at the assembly code, you will see that the compiler
| generated something like
| 
|     return 6;
| 
| The compiler is allowed to take advantage of the standard to
| substitute in more efficient code that does the same thing.
| 
| IIRC, for C++, it would actually be OK if std::vector were
| implemented completely as a compiler intrinsic with no actual
| header file. (No compiler I am aware of actually does it that
| way.)
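
In C, the same point can be made at compile time. 1+2+3 is an integer constant expression, so the compiler has to be able to evaluate it during translation, and a C11 _Static_assert can demonstrate this. The following is a minimal sketch with a hypothetical function name:

```c
/* 1 + 2 + 3 is an integer constant expression, so it can appear in a
   C11 _Static_assert, which is checked entirely at compile time. */
_Static_assert(1 + 2 + 3 == 6, "constant folding");

int six(void) {
    return 1 + 2 + 3;  /* typically compiled as a plain "return 6" */
}
```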
 
  | dnautics wrote:
  | Yeah, but everyone knows that "there is no +"; it's an operator,
  | and in C, anyway, operators are special and aren't expected to
  | behave like C functions, e.g. they can "take arguments of
  | different types and add them successfully". Not everyone is
  | aware that C has "anointed functions" (including, I believe,
  | malloc) that the compiler is allowed to fiddle with.
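
The following illustrates the kind of freedom dnautics means. Because malloc and free are standard functions with known semantics, an optimizer may remove a malloc/free pair whose memory never escapes. Clang, for instance, can compile something like the hypothetical function below down to simply returning its argument, with no call to malloc at all. Whether that happens depends on the compiler and the optimization level.

```c
#include <stdlib.h>

/* Hypothetical example: the allocation never escapes this function, so an
   optimizer that knows malloc/free semantics may delete both calls and
   keep only the value. Returns -1 if malloc fails. */
int heap_roundtrip(int value) {
    int *p = malloc(sizeof *p);
    if (p == NULL) {
        return -1;
    }
    *p = value;
    int result = *p;
    free(p);
    return result;
}
```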
 
  | malkia wrote:
  | Is there more info on this? I remember something like it from
  | Common Lisp (but the details evade me): the compiler can take
  | advantage of certain specific functions and rely on them being
  | "open coded", i.e. it can produce more efficient code by
  | replacing them with something more suitable...
  | http://www.sbcl.org/manual/#Open-Coding-and-Inline-Expansion
  | 
  | https://www.thecodingforums.com/threads/what-is-the-meaning-...
 
  | talaketu wrote:
  | > more efficient code that does the same thing
  | 
  | In this case, it produces a different result.
 
    | masklinn wrote:
    | It produces a different UB, which is UB.
    | 
    | Furthermore, observability would be defined in terms of the C
    | abstract machine; "observing" by decompiling the program is
    | out of scope.
 
      | talaketu wrote:
      | oh right
      | 
      | > But what if you're not using C99 or newer?
      | 
      | UB - that takes all the fun out of it.
 
  | Someone wrote:
  | Code that does
  | 
  |     #include <vector>
  | 
  | must compile, so that _header_ must exist (whether it is stored
  | in a _file_ is the implementer's choice; AFAIK, the standard
  | carefully avoids the use of the term 'header file').
  | 
  | Also, I think code that doesn't do that include must fail to
  | compile when it tries to use _std::vector_. So, logically, that
  | header must exist.
 
    | gpderetta wrote:
    | Well, not really. The preprocessor is part of the compiler, so
    | it only needs to set a flag to tell the compiler proper to
    | enable std::vector.
 
| rrauenza wrote:
| Quick Summary:
| 
| The C compiler's optimizer replaces printf("Hello World!\n") with
| puts("Hello World!") (puts appends the newline itself), and the
| exit status of main(), which has no return statement, changes
| from 13 (the return value of printf) to 10 (the return value of
| NetBSD's puts).
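
The two numbers come from different guarantees. printf's return value is specified as the number of characters written, which is 13 here. puts only promises a non-negative value on success, so 10 is NetBSD's choice rather than a guarantee. The sketch below uses a hypothetical function name. Because the return values are used, GCC and Clang should leave both calls as written. As far as I know, they only apply the printf-to-puts rewrite when the result is ignored.

```c
#include <stdio.h>

/* Prints the article's message both ways and records the return values.
   The results are used, so the printf -> puts rewrite should not apply. */
void compare_returns(int *printf_ret, int *puts_ret) {
    *printf_ret = printf("Hello World!\n");  /* 13: 12 visible chars plus '\n' */
    *puts_ret = puts("Hello World!");        /* only guaranteed non-negative */
}
```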
 
  | moffkalast wrote:
  | Calls on puts you say?
 
    | helmholtz wrote:
    | Brilliant.
 
    | enlyth wrote:
    | In other words long volatility
 
___________________________________________________________________
(page generated 2021-10-21 23:00 UTC)