[HN Gopher] There is no 'printf'
___________________________________________________________________

There is no 'printf'

Author : pr0zac
Score  : 112 points
Date   : 2021-10-20 15:01 UTC (1 days ago)
	web link (www.netmeister.org)
	w3m dump (www.netmeister.org)
\| monocasa wrote:
\| There is 'printf'. It's just that printf (and the rest of the standard library) is technically as much a part of the C language as the language grammar itself, and C compilers are welcome to use innate knowledge of those functions for optimizations. The other place you typically see this is calls to functions like memcpy/memset being elided to inline vector ops or CISC copies, or on simpler systems, large manual zeroing and copying being elided the other way to a memset or memcpy call.
\|
\| C compilers will typically have an escape hatch for envs like deeply embedded systems and kernels, like gcc's -ffreestanding and -fno-builtin, that says "but for real though, don't assume std lib functions exist or you know what they are based on the function's name".
\|
\| One of my favorite parts of rust as someone who uses it for deeply embedded systems is the separation of core and std (where core is the subset of std that only requires memcpy, memset, and one other I'm forgetting). The rest of the standard library is ultimately an optional part of the language with compiler optimizations focused on general benefits rather than knowing at the compiler how something like printf works. no_std is such a nicer env than the half done ports of newlib or pdclib that everyone uses in C embedded land.
\| tptacek wrote:
\| Huh, this is pretty great; I've always fussily used fputs() when I'm just printing static strings, and apparently I don't need to bother, since the compiler will just do it for me.
\| guerrilla wrote:
\| Moar please. I'm loving these counterintuitive C optimization gotchas lately[1]. They are like little brain teasers.
\|
\| 1. https://news.ycombinator.com/item?id=28930271
\| 0xcde4c3db wrote:
\| About a year ago there was something of a "joke isEven() implementation discourse" on Twitter, which eventually evolved into a sort of informal optimizer abuse contest.
\| For example:
\|
\| https://twitter.com/zeuxcg/status/1291872698453258241
\|
\| https://twitter.com/jckarter/status/1428071485827022849
\| aw1621107 wrote:
\| OK, those are horrifying and fascinating, and they basically break my brain.
\|
\| Is there an explanation somewhere of why the first one "works"? The second one I think is the compiler assuming the default case will never be hit since it'll result in infinite recursion, which is UB under C++, so it's basically assuming 0<=x<=3 and optimizing from there. Is that correct?
\|
\| The first one I'm less certain about. The only thing I can think of is that the compiler deduces an upper limit of INT_MAX - 1 to avoid signed overflow, and then somehow figures out the true/false pattern from there? Still a bit of a gap in my understanding there.
\| barsonme wrote:
\| My guess: since overflowing int is UB, and the only value of n that stops the recursion is zero, the compiler assumes that n must be zero and checks accordingly.
\|
\| That doesn't explain why it uses test dil, 1 instead of test dil, dil or cmp 0 or whatever.
\| davemp wrote:
\| Optimizers have to keep the same input/output pairs unless there is undefined behavior. In the second function the truth table looks like:
\|
\|   in    \| out
\|   ----------
\|   0b000 \| 1
\|   0b001 \| 0
\|   0b010 \| 1
\|   0b011 \| 0
\|   0b100 \| don't care
\|   . . .
\|   MAX   \| don't care
\|
\| The compiler just chooses the most efficient way it knows to get the filled out entries correct, which happens to be:
\|
\|   in    \| ~in[0]
\|   ----------
\|   0b000 \| 1
\|   0b001 \| 0
\|   0b010 \| 1
\|   0b011 \| 0
\|   0b100 \| 1
\|   . . .
\|   MAX   \| 1
\|
\| It would have been just as valid to do:
\|
\|   in    \| in[2] or ~in[0]
\|   ----------
\|   0b000 \| 1
\|   0b001 \| 0
\|   0b010 \| 1
\|   0b011 \| 0
\|   0b100 \| 1
\|   0b101 \| 1
\|   . . .
\|   MAX   \| 1
\|
\| The first function's table looks like:
\|
\|   in    \| out
\|   ----------
\|   0b000 \| 1
\|   0b001 \| don't care
\|   0b010 \| don't care
\|   . . .
\|   MAX   \| don't care
\|
\| And the compiler still likes the even check in this case, which makes sense.
\| notriddle wrote:
\| The first function (the `n == 0 \|\| !isEven(n+1)` recursive function) has defined behavior for negative numbers. That's probably why it compiled to an even number check.
\| archi42 wrote:
\| It's all fun and games until you write (or review) C/C++ test cases for a compiler or disassembler ;-) It never ceased to amaze me how good the compiler was at figuring out that I had actually written a very complicated "return 0".
\| eikenberry wrote:
\| https://web.archive.org/web/20211019052752/https://www.netme...
\| GoblinSlayer wrote:
\| Imagine somebody thought omitting the return statement and doing whatever the compiler likes is a good feature to have.
\| dboreham wrote:
\| Like Scala?
\| dnautics wrote:
\| pretty sure scala (and most FP) has a well-defined "what to do when you leave off the return statement", not one that "is up to the compiler"
\| [deleted]
\| qwerty456127 wrote:
\| > puts(3) only returns "a nonnegative integer on success and EOF on error"
\|
\| How does it decide which nonnegative integer to return?
\| robotresearcher wrote:
\| It's arbitrary. The article shows an implementation that returns 10 (ASCII '\n'). But the spec says it doesn't matter, so you should only be using it to test >= 0 for success.
\| Bayart wrote:
\| The correct implementation is _obviously_ to return 1 on success !
\| woodruffw wrote:
\| That's answered below:
\|
\| > On success, puts(3) appears to return '\n', the newline or line feed (LF) character, which has ASCII value... 10.
\|
\| But note that that isn't standard behavior. The language in POSIX[1] is identical to that in the blog post. `puts` is free to return whatever nonnegative number it wants on success.
\|
\| [1]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/p...
\| cyberge99 wrote:
\| Apparently there is no available capacity for that site either.
\| Bang2Bay wrote:
\| https://search.yahoo.com/ for
\|
\| There is no 'printf'
\|
\| and look through the cache
\| ltr_ wrote:
\| [off topic] I always wondered how '%n' is used in production code.
\| mormegil wrote:
\| So, why does puts do "return r ? EOF : '\n';"? Some backwards compatibility? Or is there a logical reason for that?
\| _kst_ wrote:
\| That particular implementation probably returns the result of the last fputc() or equivalent that it called.
\|
\| puts() returns EOF (typically -1) on error, or some unspecified non-negative value on success.
\|
\| fputc() returns EOF on error or the written character, treated as an unsigned char and converted to int, on success.
\|
\| Don't expect all puts() implementations to do the same thing. For example, the glibc implementation appears to return the number of characters written on success. Implementations are free to rely on implementation-defined behavior. User code that's intended to be portable cannot.
\| LukeShu wrote:
\| That particular implementation (NetBSD's) (which is transcribed into the article) does something more optimized than making repeated calls to `putchar()`.
\|
\| But as pdw's link shows, what you suggest is exactly what the historical implementation was. So NetBSD is simply matching historical Unix.
\| masklinn wrote:
\| Per the man page:
\|
\| > puts() and fputs() return a nonnegative number on success, or EOF on error.
\|
\| r is the result of the write; if it's nonzero the write failed and thus so did puts.
\| m45t3r wrote:
\| Yeah, but I think the question was why EOF and "\n". It could as easily just return 1 or -1 for example, and it would make more sense I think.
\| kevin_thibedeau wrote:
\| puts() always adds a line termination so success means that '\n' is the last char for that implementation.
\| pdw wrote:
\| It's what historic Unix did:
\| https://github.com/v7unix/v7unix/blob/master/v7/usr/src/libc...
\|
\| Why did it do that? I'm not sure, but at the time C did not have 'void' functions: every function returned a value. They probably wanted to make the behavior of the stdlib functions deterministic, even if the return value was useless and undocumented.
\| anonymousiam wrote:
\| Compiler optimization can sometimes cause unpredictable or even incorrect behavior. Below is a blob of C code for the TI MSP430 compiler that exemplifies at least one of TI's optimization bugs:
\|
\|   // Define Common Communications Frame
\|
\|   typedef volatile union commFrameType
\|   {
\|       struct {
\|           unsigned SyncHeader:16;
\|           unsigned MessageID:8;
\|           unsigned short MessageData[msgDataSize]; // ID-unique data
\|           unsigned CRC:8; // LSB of CCITT-16 for above data
\|       } __attribute__ ((packed)) Frame;
\|       unsigned char b[16];  // Accessible as raw bytes as well
\|       unsigned short w[8];  // Accessible as raw words as well
\|       unsigned long l[4];   // Accessible as raw long words as well
\|   } __attribute__ ((packed)) CommFrame;
\|
\|   static CommFrame IpcMessage = { FRAME_SYNC_R, IpcBlankMessage };
\|
\|   // If frame was accepted into TX queue, prepare next frame for transmission
\|
\|   // IpcMessage.Frame.MessageID++; // Bump up to next message type
\|
\|   // IpcMessage.Frame.MessageID += 1;
\|
\|   // The above two lines that are commented out cause a bizarre linker error if either is used instead of the line below.
\|   IpcMessage.Frame.MessageID = IpcMessage.Frame.MessageID + 1; // Bump up to next message type
\|
\| The MSP-430 is a 16-bit microcontroller and the packed CommFrame structure has Frame.MessageID on an odd-byte boundary. Some processors might raise a SIGBUS, but TI says that it's okay to access a byte on an odd address boundary.
\|
\| It's pretty silly that i++; and i+=1; don't work, but i=i+1; is just fine.
\| secondcoming wrote:
\| 'unsigned MessageID:8;' isn't the same as 'unsigned char MessageId'
\| RcouF1uZ4gsC wrote:
\| This is a bit like saying there is no '+';
\|
\| Because if you put in return 1+2+3;
\|
\| And look at the assembly code, you will see that the compiler generated something like return 6;
\|
\| The compiler is allowed to take advantage of the standard to substitute in more efficient code that does the same thing.
\|
\| IIRC, for C++, it would actually be ok if std::vector was implemented completely as a compiler intrinsic with no actual header file. (No compiler I am aware of actually does it that way).
\| dnautics wrote:
\| yeah but everyone knows that "there is no +"; It's an operator, and in C, anyways, operators are special and expected to not necessarily do C-function-ey things, e.g., "take arguments of different types and add them successfully". Not everyone is aware that C has "anointed functions" (including, I believe, malloc) that the compiler is allowed to fiddle with.
\| malkia wrote:
\| Is there more info on this? I remember this from Common Lisp (but details evade me) that the compiler can take benefit of certain specific functions and rely on them being... "open coded" - e.g. it can produce more efficient code by replacing these with something more suitable...
\| http://www.sbcl.org/manual/#Open-Coding-and-Inline-Expansion
\|
\| https://www.thecodingforums.com/threads/what-is-the-meaning-...
\| talaketu wrote:
\| > more efficient code that does the same thing
\|
\| In this case, it produces a different result.
\| masklinn wrote:
\| It produces a different ub, which is ub.
\|
\| Furthermore observability would be defined in terms of the C abstract machine; "observing" by decompiling the program is out of scope.
\| talaketu wrote:
\| oh right
\|
\| > But what if you're not using C99 or newer?
\|
\| UB - that takes all the fun out of it.
\| Someone wrote:
\| Code that does #include <vector>
\|
\| must compile, so that _header_ must exist (whether it is stored in a _file_ is the implementer's choice. AFAIK, the standard carefully avoids the use of the term 'header file')
\|
\| Also, I think code that doesn't do that include must fail to compile when it tries to use _std::vector_. So, logically, that header must exist.
\| gpderetta wrote:
\| Well not really. The preprocessor is part of the compiler, so it only needs to set a flag to tell the compiler proper to enable std::vector.
\| rrauenza wrote:
\| Quick Summary:
\|
\| The C compiler optimizer replaces printf("Hello World!\n") with puts("Hello World!") and the implicit return from main() changes from 13 (the return value of printf) to 10 (the return value of puts)
\| moffkalast wrote:
\| Calls on puts you say?
\| helmholtz wrote:
\| Brilliant.
\| enlyth wrote:
\| In other words long volatility
___________________________________________________________________
(page generated 2021-10-21 23:00 UTC)