[HN Gopher] Python is 1.3x faster by just adjusting some compili...
___________________________________________________________________
 
Python is 1.3x faster by just adjusting some compiling options for
libpython
 
Author : chx
Score  : 72 points
Date   : 2021-06-12 19:52 UTC (3 hours ago)
 
web link (www.facebook.com)
w3m dump (www.facebook.com)
 
| jchw wrote:
| There's nothing wrong with this post factually, but the tone
| sucks. It has an immensely combative energy for what is not
| really a charged subject matter.
| 
| Like sure. Today, a lot of the historical reasons for things seem
| silly and irrelevant. At one point, they did not seem silly and
| irrelevant. For compatibility with stuff sticking around from
| those days, we get some performance penalties that are not
| strictly necessary. I don't think anyone is doing that to be an
| asshole, so the oddly antagonistic tone seems unjustified.
| 
| And yes, Windows with a module-level namespace is cleaner in this
| regard, but Windows design is entirely different and has plenty
| of its own skeletons. ELF does not, to me, feel significantly
| more horrible than PE. And I'm not speaking from inexperience; I
| did at least write a couple of ELF and PE parsing softwares over
| time, most recently go-winloader[1].
| 
| Do we need to override symbols in the same library? Probably
| not... _kind of_. Your modules may in fact not need this.
| However, libc probably does. Take a look at what symbols
| libpthread exports on your system some time.
| 
| I hate to be the person to point this out, but please consider
| not approaching subjects from this position. It feels alienating,
| and I have no idea why it's necessary to have such a tone.
| 
| [1]: https://github.com/jchv/go-winloader
 
  | ineedasername wrote:
  | Agreed. I was trying to find the words for what I was so put
  | off by the article, but you nailed it. The tone made me want to
  | disagree with it just by default. Luckily I 1) recognize that I
  | am not qualified to have an opinion on the technical details
  | and 2) Ruthlessly crush instinctual responses until I've
  | thought them through with less emotion. (most of the time...
  | I'm not a robot, or perfect)
  | 
  | Someone in a sibling thread said that writing like that is for
  | catharsis... I guess to blow off steam or
  | something. But if the method of blowing off steam is belittling
  | other smart people that don't always make perfect decisions
  | then it's probably not a great way to go. If you need to write
  | it for catharsis, go for it, but there's no need to publish it.
  | 
  | Otherwise, my questions on the technical side: Would this
  | performance hit and the alternative option have been obvious at
  | the time? If so, was there a reasonable trade off for why this
  | approach was taken? Or was this choice only wrong in
  | retrospect?
 
  | fpgaminer wrote:
  | Not OP, but I read the "antagonistic" style of the post as just
  | the usual catharsis humor. All in-jest. I've used that style of
  | writing plenty before. It's a good way to blow off the steam of
  | working with these rather absurd, archaic systems that we have
  | to tackle on a daily basis. Programming can feel a bit
  | kafkaesque at times, so a bit of aggressive/dark humor goes a
  | long way.
  | 
  | But I do agree, it felt too thick. Still a very interesting
  | topic regardless.
 
  | zitterbewegung wrote:
  | It actually seems to miss a few points. (I also agree that the
  | post has not enough levity to balance out the negative tone).
  | 
  | 1. PEP 445 makes the use case of LD_PRELOAD irrelevant.
  | 
  | 2. A change like this would obviously go through code review
  | and testing to make it into a released version.
  | 
  | 3. The risk of a regression would still exist but that can
  | either be caught by #2 or the existing unit testing already in
  | Python.
  | 
  | (Disclaimer: I have contributed to the Python codebase)
 
  | Lammy wrote:
  | > It has an immensely combative energy for what is not really a
  | charged subject matter.
  | 
  | It becomes a charged subject matter when one works at companies
  | like Google and Facebook and gets used to navigating
  | performance reviews.
 
    | ineedasername wrote:
    | Would this tone of expression be appropriate in navigating
    | performance reviews? I mean the question honestly: My own
    | answer is "no", but I don't know the culture of performance
    | reviews at companies like that.
 
    | jchw wrote:
    | This is interesting. Not saying you are incorrect, but, I
    | have worked at Google for a few years and didn't pick up on
    | this, most people seem abundantly polite. But, I can just as
    | easily chalk that up to limited experience, since there is
    | clearly quite a lot of different things going on in any large
    | company.
 
  | CalChris wrote:
  | The OP was writing about a 29 year old design decision, and he
  | wasn't writing about a person. Design decisions don't have
  | feelings. I found his no holds barred clarity about something
  | as obscure as dynamic linking namespaces made for an easier if
  | still not easy read.
  | 
  | But that said, I don't think dynamic linking is in the ELF
  | spec. I believe that's a _de facto_ OS + dev tools thing rather
  | than an ELF spec _de jure_ thing. His points are still valid.
 
    | jordigh wrote:
    | > I found his no holds barred clarity
    | 
    | Being right is no excuse for being an asshole.
    | 
    | The attitude will appeal to some. It will strike many others
    | in the wrong way and put them on the defensive.
    | 
    | There's no reason to write this way. A concise, well-
    | articulated, non-combative post will appeal to everyone and
    | still convey the same information.
 
| derefr wrote:
| Doesn't gVisor require symbol interposition to do its sandboxing
| thing? (At least, for binaries with static-linked runtimes, like
| the type Golang produces by default.)
 
  | dathinab wrote:
  | If you have a fully static-linked library you already don't
  | have symbol interposition.
  | 
  | Furthermore, these options still allow the things you need
  | interposition for, i.e. calls from/to external dynamically
  | linked libraries like libc.
  | 
  | But most importantly, gVisor is based around intercepting system
  | calls (oversimplified), for which you don't need symbol
  | interposition.
 
  | falldmg wrote:
  | Symbol interposition? I don't know for sure, but I would guess
  | gVisor is using ptrace or another mechanism, to interpose on
  | syscalls, not library calls. But these flags, I believe, only
  | impact interposition of symbols in the same library, so even if
  | gVisor did use interposition for something, it may not matter.
 
| codelord wrote:
| I would just read the linked post:
| 
| https://bugs.python.org/issue38980?fbclid=IwAR0cyfahpBywNzbq...
| 
| As it contains almost the same info without the rant and with
| better explanation.
 
| kevingadd wrote:
| Anyone got an archive link? I can't read this without making a
| Facebook account and signing in
 
  | eptcyka wrote:
  | I read it without signing in. The "Not now" link is greyed out
  | and 4 points smaller and not a button. But it's there.
 
    | ineedasername wrote:
    | I did the first time around, but then I closed the tab, and
    | when I wanted to go back to look at something in more detail
    | I was blocked unless I signed in. Luckily someone posted the
    | full text in another comment.
 
    | mct wrote:
    | I'm seeing "You must log in to continue," with no "not now"
    | option.
 
      | qwertox wrote:
      | Probably expects JavaScript enabled or something. I also
      | don't have a "not now" option. No JavaScript, no CSS.
      | 
      | Honestly, how can someone into tech post something like
      | this on FB?
 
        | OJFord wrote:
        | Leaving aside whether or not they should want to post it
        | there, I'm surprised it has an audience.
        | 
        | Someone saw it and shared it to HN; enough read it to
        | upvote it this much.. maybe Facebook's more popular than
        | I thought! (That sounds silly or sarcastic, but 'among HN
        | users and similar' I'm serious.)
 
| cerved wrote:
| Right post, wrong platform
 
| stereo wrote:
| Text if you don't want to visit Facebook:
| 
| Summary: Python is 1.3x faster when compiled in a way that re-
| examines shitty technical decisions from the 1990s. ELF is the
| executable and shared library format on Linux and other Unixy
| systems. It comes to us from 1992's Solaris 2.0, from back before
| even the first season of the X-Files aired. ELF files (like
| X-Files) are full of barely-understood horrors described only in
| dusty old documents that nobody reads. If you don't know anything
| about symbol visibility, semantic interposition, relocations, the
| PLT, and the GOT, ELF will eat your program's performance.
| (Granted, that's better than being eaten by some monster from a
| secret underground government base.)
| 
| ELF kills performance because it tries too hard to make the new-
| in-1992 world of dynamic linking look and act like the old world
| of static linking. ELF goes to tremendous lengths to make sure
| that every reference to a function or a variable throughout a
| process refers to the same function or variable no matter what
| shared library contains each reference. Everything is consistent.
| 
| This approach is clean, elegant, and wrong: the cost of
| maintaining this ridiculous bijection between symbol name and
| symbol address is that each reference to a function or variable
| needs to go through a table of pointers that the dynamic linker
| maintains --- even when the reference is one function in a shared
| library calling another function in the same shared library. Yes,
| `mylibrary_foo()` in `libmylibrary.so` has to pay for the
| equivalent of a virtual function call every time it calls
| `mylibrary_bar()` just in case some other shared library loaded
| earlier happened to provide a different `mylibrary_bar()`. That
| basically never happens. (Weak symbols are an exception, but
| that's a subject for a different rant.)
| 
| (Windows took a different approach and got it right. In Windows,
| it's okay for multiple DLLs to provide the same symbol, and
| there's no sad and desperate effort to pretend that a single
| namespace is still cool.)
| 
| There's basically one case where anyone actually relies on this
| ELF table lookup stuff (called "interposition"): `LD_PRELOAD`.
| `LD_PRELOAD` lets you provide your own implementation of any
| function in a program by pre-loading a shared library containing
| that function before a program starts. If your `LD_PRELOAD`ed
| library provides a `mylibrary_bar()`, the ELF table lookup goo
| will make sure that `mylibrary_foo()` calls your `LD_PRELOAD`ed
| `mylibrary_bar()` instead of the one in your program. It's nice
| and dynamic, right? In exchange for every program on earth being
| massively slower than it has to be all the time, you, programmer,
| can replace `mylibrary_bar()` with `printf("XXX calling bar!!!")`
| by setting an environment variable. Good trade-off, right?
| 
| LOL. There is no trade-off. You don't get to choose between
| performance and flexibility. You don't get to choose one. You get
| to choose zero things. Interposition has been broken for years: a
| certain non-GNU upstart compiler starting with "c" has been
| committing the unforgivable sin of optimizing calls between
| functions in the same shared library. Clang will inline that call
| from `mylibrary_foo()` to `mylibrary_bar()`, ELF be damned, and
| it's right to do so, because interposition is ridiculous and
| stupid and optimizes for c00l l1inker tr1ckz over the things
| people buy computers to actually do --- like render 314341 layers
| of nested iframe.
| 
| Still, this Clang thing does mean that `LD_PRELOAD` interposition
| no longer affects _all_ calls, because Clang, contra the
| specification, will inline some calls to functions not marked
| inline --- which breaks some people's c00l l1inker tr1ckz. But
| we're all still paying the cost of PLT calls and GOT lookups
| anyway, all to support a feature (`LD_PRELOAD`) that doesn't even
| work reliably anymore, because, well, why change the defaults?
| 
| Eventually, someone working on Python (ironically, of all things)
| noticed this waste of good performance. "Let's tell the compiler
| to do what Clang does accidentally, but all the time, and on
| purpose". Python got 30% faster without having to touch a single
| line of code in the Python interpreter.
| 
| (This state of affairs is clearly evidence in favor of the
| software industry's assessment of its own intellectual prowess
| and justifies software people randomly commenting on things
| outside their alleged expertise.)
| 
| All programs should be built with `-Bsymbolic` and
| `-fno-semantic-interposition`. All symbols should be hidden by
| default.
| `LD_PRELOAD` still works in this mode, but only for calls
| _between_ shared libraries, not calls _inside_ shared libraries.
| One day, I hope as a profession we learn to change the default
| settings on our tools.
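The lookup rule described in the post above (the first definition in load order wins, so an `LD_PRELOAD`ed library can replace `mylibrary_bar()`) can be imitated with a toy model. This is only a sketch: a real GOT/PLT is native machinery inside the dynamic linker, and `build_got`, the two dictionaries, and both `foo_*` functions are hypothetical names invented for illustration.

```python
# Toy model only. Each "library" is a dict mapping symbol name -> function.

def build_got(load_order):
    """Fill a table with the first definition of each symbol in load order."""
    got = {}
    for library in load_order:
        for name, func in library.items():
            # setdefault keeps the earliest definition, so a library
            # loaded first shadows later ones with the same symbol name.
            got.setdefault(name, func)
    return got


def original_bar():
    return "bar from libmylibrary"


def preloaded_bar():
    return "bar from preload"


libmylibrary = {"mylibrary_bar": original_bar}
preload = {"mylibrary_bar": preloaded_bar}


def foo_interposable(got):
    # Models the default: every call goes indirectly through the table.
    return got["mylibrary_bar"]()


def foo_bound_locally():
    # Models the effect of -Bsymbolic / -fno-semantic-interposition:
    # the call inside the library is bound to the library's own function.
    return original_bar()


got = build_got([preload, libmylibrary])
print(foo_interposable(got))
print(foo_bound_locally())
```

With `preload` first in the load order, the table-based call reaches the preloaded function while the locally bound call does not. That mirrors the post's point that `LD_PRELOAD` keeps working for calls between libraries but not for calls inside one.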
 
  | qwertox wrote:
  | Thank you. This link asked me to sign in in a very broken page
  | (I block most of Facebook's domains), and am wondering if this
  | is just someone who posted it on FB or if it is a post from the
  | engineering team at FB.
 
    | aioprisan wrote:
    | Just someone posting on FB
 
  | rocqua wrote:
  | Not sure this is ok copyright wise.
 
    | ineedasername wrote:
    | I'm not sure Facebook's privacy intrusions are ok ethically
    | wise. So there's competing value systems at work.
 
  | dathinab wrote:
  | This has interesting parallels with how some languages include
  | the library version in the "symbolic name" (mangled name, fully
  | qualified name etc).
  | 
  | This often allows loading of multiple versions of the same
  | dependency in the same program without ugly hacks. Which is
  | great if you have multiple dependencies which both have the
  | same sub-dependency (each internal only to their dependent) but
  | need different versions.
  | 
  | It's kinda a nightmare if you run into this problem in
  | languages which don't support it.
 
| Scaevolus wrote:
| This is true for _libpython_ (the shared library version), which
| is the default on some distros (RedHat, Fedora, Arch), but many
| others (Debian, Ubuntu) use statically linked Python and never
| paid this performance tax.
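One way to see which kind of build a given interpreter is, sketched with the standard `sysconfig` module. The assumption here is that the build records `Py_ENABLE_SHARED`, which POSIX builds normally do; the helper name `uses_shared_libpython` is made up for this example.

```python
import sysconfig


def uses_shared_libpython():
    """Return True if this interpreter reports a shared-libpython build."""
    # Py_ENABLE_SHARED is normally 1 for --enable-shared builds and 0
    # otherwise; it can be None on platforms that do not record it,
    # in which case this returns False.
    return bool(sysconfig.get_config_var("Py_ENABLE_SHARED"))


print("shared libpython:", uses_shared_libpython())
```

A `False` result suggests the interpreter is statically linked against its core and would not be affected by the libpython-specific cost discussed here.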
 
  | dheera wrote:
    | I think using Pypy instead of CPython will give you several
    | times the performance boost of any of this.
 
    | dharmab wrote:
    | Pypy is not a drop-in replacement for CPython. It does not
    | support many libraries that rely on C extensions, and targets
    | a slightly older version of the language.
 
    | dathinab wrote:
    | Likely, but it doesn't work with all applications.
 
| th0ma5 wrote:
| Not entirely related but I recently started playing with the
| built in "dis" library and it is fun to see the compiled
| representation of functions that the runtime executes. Just an
| FYI if you're ever bored and are looking to get more familiar
| with assembly, it is a very approachable thing to play with.
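A minimal sketch of what playing with `dis` looks like. Note that it shows CPython bytecode rather than machine assembly, and the exact instructions differ between Python versions; `add_one` is just a placeholder function.

```python
import dis


def add_one(x):
    return x + 1


# Print the bytecode the CPython interpreter runs for add_one.
dis.dis(add_one)

# The same instructions are also available as objects for inspection.
opnames = [ins.opname for ins in dis.get_instructions(add_one)]
print(opnames)
```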
 
| BurningFrog wrote:
| I'm confused.
| 
| Is this about something I can do to speed up our 3.8 Python code,
| or about why Python 3.8 is faster than 3.7?
 
  | ineedasername wrote:
  | I think it was addressed by python already:
  | 
  |  _Eventually, someone working on Python (ironically, of all
  | things) noticed this waste of good performance_
  | 
  | But it would be good to know when & what versions.
  | 
  | I'm also not sure why this is "ironic". Who else but the
  | experts on python would be more likely to discover this &
  | resolve the issue? Which basically makes the whole thing a non-
  | issue:
  | 
  | Python creators made a choice when creating python. A while
  | later they realized they could improve performance by
  | revisiting that choice.
  | 
  | The tone of the article makes it sound like this was an
  | embarrassing mistake of massive proportions.
 
| kzrdude wrote:
| Completely beside the article - the first specializing, adaptive
| interpreter (PEP 659) improvements have been merged to CPython
| these last weeks, and hopefully we can see updates about
| benchmarks and performance sooner or later.
 
| geofft wrote:
| Related: https://developers.redhat.com/blog/2020/06/25/red-hat-
| enterp...
| 
| > _This article focuses on one specific performance improvement
| in the python38 package. As we'll explain, Python 3.8 is built
| with the GNU Compiler Collection (GCC)'s
| -fno-semantic-interposition flag. Enabling this flag disables
| semantic interposition, which can increase run speed by as much
| as 30%._
| 
| (not logged in to FB, so maybe TFA is a reference to this one?)
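A rough way to check whether a particular interpreter's recorded build settings mention the flag, sketched with `sysconfig`. The helper `build_mentions_flag` is hypothetical. The check also assumes the distro left the flag visible in the recorded configuration variables, so a `False` result does not prove the flag was not used.

```python
import sysconfig

FLAG = "-fno-semantic-interposition"


def build_mentions_flag(flag=FLAG):
    """Return True if any recorded build variable contains `flag`."""
    # get_config_vars() returns every variable recorded at build time;
    # scanning all string values avoids guessing which one holds the flag.
    recorded = sysconfig.get_config_vars()
    return any(isinstance(v, str) and flag in v for v in recorded.values())


print(FLAG, "recorded in build config:", build_mentions_flag())
```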
 
___________________________________________________________________
(page generated 2021-06-12 23:00 UTC)