[HN Gopher] A Heisenbug lurking in async Python
___________________________________________________________________
 
A Heisenbug lurking in async Python
 
Author : willm
Score  : 336 points
Date   : 2023-02-11 17:25 UTC (5 hours ago)
 
web link (textual.textualize.io)
w3m dump (textual.textualize.io)
 
| dataflow wrote:
| The notion of fire-and-forget is itself the problem. Even with
| threads, you should have them join the main thread before the
| program exits. Which implies you should hold strong references to
| them until then. Most people don't go out of their way to do this
| even when they're able to, but that's what you're supposed to do.
 
| bornfreddy wrote:
| Wow. What a strange design decision, as evidenced by the sheer
| number of developers who don't / didn't know about this (myself
| included). I hope this gets _fixed_ instead of just documented.
 
  | jcheng wrote:
  | Agreed, I'm really surprised at all the comments defending this
  | behavior. I suspect there is a non-obvious reason why it's this
  | way, but "you should've read the docs" and "but why _wouldn't_
  | you hold your own strong reference" are weird takes IMHO.
 
| boomskats wrote:
| As someone who happens to be eternally grateful to the author for
| his contribution to the Python ecosystem [0], I kinda feel like
| this comment thread is overreacting to his overreaction. When I
| look at this post all I see is a useful, well explained, byte-
| size writeup that a search engine might recommend to someone
| looking for help in writing async Python.
| 
| Maybe it's because a bunch of my friends are Scottish and I get
| their sense of humour.
| 
| [0]: https://rich.readthedocs.io/ (yes I'm talking about the
| fancy new progress bar that pip got recently)
 
| rlpb wrote:
| This issue doesn't exist with Trio's structured concurrency
| model. In other words, the problem is already solved.
 
  | nbadg wrote:
  | I'll +1 the Trio shoutout [1], but it's worth emphasizing that
  | the core concept of Trio (nurseries) now exists in the stdlib
  | in the form of task groups [2]. The article mentions this very
  | briefly, but it's easy to miss, and I wouldn't describe it as a
  | solution to this bug, anyways. Rather, it's more of a different
  | way of writing multitasking code, which happens to make this
  | class of bug impossible.
  | 
  | [1] https://github.com/python-trio/trio
  | 
  | [2] https://docs.python.org/3/library/asyncio-task.html#task-
  | gro...
 
    | Tanjreeve wrote:
    | Oh good so now we can all move to this year's Async flavour in
    | Python.
 
| edfletcher_t137 wrote:
| This is a great blog post. Concise, lacking fluff or extraneous
| prose, it gets right to the point, presents the primary-source
| reference and then gets right to the solution. A bit of
| editorializing in the middle but that's completely allowed when
| writing this tightly. Well damn done, OP.
| 
| And also it's _great_ information that I - like I'm sure many of
| you - also never noticed. THANK YOU!
 
  | [deleted]
 
  | mgsk wrote:
  | What does this add that isn't already right there in the
  | documentation?
 
    | nkrisc wrote:
    | If there was nothing to add then there wouldn't be loads of
    | projects on GitHub making exactly this mistake.
 
    | Jtsummers wrote:
    | It draws attention to a problem that a lot of people have
    | created for themselves by not reading the documentation (or
    | not recalling it if they read it). I guess the author could
    | have just linked the documentation but then they couldn't
    | have added the additional context of the github search
    | demonstrating how common it is.
 
      | newaccount74 wrote:
      | I must have looked through the docs for create_task a dozen
      | times while trying to figure out how async/await works in
      | Python but still managed to overlook this part.
 
        | edflsafoiewq wrote:
        | That is unsurprising. It was first added as a brief note
        | only in 3.9, and expanded to its present length only in
        | 3.10.
 
    | klyrs wrote:
    | The author doesn't go into much detail on that point: this
    | warning should be present in documentation of many Python
    | libraries that use create_task and return the result to the
    | user unless that library stores those tasks in a collection
    | as is recommended -- at which point the library author had
    | better roll their own garbage collection!
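
For reference, the "store those tasks in a collection" pattern can
be sketched roughly as below. The helper name spawn and the
coroutine work are hypothetical; the set-plus-done-callback shape
follows what the asyncio docs recommend for create_task.

```python
import asyncio

# Strong references to in-flight tasks. The event loop itself only
# keeps weak references to them.
background_tasks = set()


async def work(n, results):
    await asyncio.sleep(0.01)
    results.append(n)


def spawn(coro):
    """Schedule coro and keep the task alive until it finishes."""
    task = asyncio.create_task(coro)
    background_tasks.add(task)
    # Drop the reference when the task is done so the set does not
    # grow without bound.
    task.add_done_callback(background_tasks.discard)
    return task


async def main():
    results = []
    for n in range(3):
        spawn(work(n, results))
    # Wait for everything still in flight before returning.
    await asyncio.gather(*background_tasks)
    return sorted(results)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The done callback is the "garbage collection" part: without it the
set would hold every finished task forever.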
 
  | isoprophlex wrote:
  | Well, I don't know, I kinda miss the human angle. I'd have
  | loved to first read six paragraphs about how the author's
  | grandmother raised them on home grown threads and greenlets :^)
 
    | nickjj wrote:
    | > I'd have loved to first read six paragraphs about how the
    | author's grandmother raised them on home grown threads and
    | greenlets.
    | 
    | With recipes, often times your problem is you want to learn
    | how to make something where having the steps listed out is
    | the most important thing. The story behind the recipe isn't
    | important to solve your problem but for tech the story around
    | the choice is important. Often times the "why" is really
    | important and I really like hearing about what led someone to
    | use something first. Often times that's more important or
    | equally as important as the implementation details.
    | 
    | It wouldn't make sense for this post given its title but if
    | someone were making a post about why they chose to use async
    | in Python I'd expect and hope that half of the post goes into
    | the gory details of how they tried alternatives and what
    | their shortcomings were for their specific use cases. That
    | would help me as the reader generalize their post to my
    | specific use cases and see if it applies.
 
      | bialpio wrote:
      | Off-topic but the life story is there to make them eligible
      | to be protected by copyright. IANAL.
      | 
      | Source: https://copyrightalliance.org/are-recipes-
      | cookbooks-protecte...
 
        | flandish wrote:
        | Interesting. I always thought it was search engine
        | optimization.
 
        | aidenn0 wrote:
        | SEO is definitely a big part of it; Google penalized
        | pages where people closed or navigated away quickly.
 
        | fbdab103 wrote:
        | I immediately bounce from those Stackoverflow clones that
        | keep appearing up at the top of searches. So, I am
        | wondering how much this is still weighted in the scores.
 
        | gdprrrr wrote:
        | https://github.com/quenhus/uBlock-Origin-dev-filter
 
        | jonas21 wrote:
        | You might. But many people don't. They just want an
        | answer and don't care if it's a clone or not.
 
        | chucksmash wrote:
        | Had this driven home recently, watching a younger dev
        | happily clicking links I've long ago blocked via browser
        | extension (w3schools AND geeksforgeeks _in one session_).
 
        | rmbyrro wrote:
        | SEO makes total sense. I always add grandma keywords when
        | I'm searching for Python stuff on Google.
        | 
        | Like: "grandma, how the hell have I still not memorized
        | the API and keep needing to resort to the same doc pages
        | again and again?"
        | 
        | Now I trained ChatGPT with grandma letters from when I
        | was young, so it will answer just like if it was my
        | grandma.
 
        | water-your-self wrote:
        | It's engagement optimization. AdSense pays more if you
        | spend more time on the page.
 
        | yunohn wrote:
        | When is the last time you heard of online recipe blogs
        | enforcing copyright claims on other blogspam? Ridiculous.
        | 
        | The real reason is simple, people who write recipes
        | aren't robots - they're expressing their stories and
        | emotions, while explaining how to make food that's dear
        | to them..
 
| throwaway81523 wrote:
| There's a similar thing in tkinter but I guess users discover it
| faster, since the failure if you don't save the reference shows
| up fairly quickly.
 
| Lammy wrote:
| I experienced a heisenbug exactly like this in Ruby when trying
| to `while case Ractor::receive`:
| https://github.com/okeeblow/DistorteD/blob/dd2a99285072982d3...
 
| zzzeek wrote:
| I think asyncio is kind of neat for what it's good at, but
| beginner programmers who have never written code before are going
| directly to using Python asyncio (I know this because they are
| telling me so when they post sqlalchemy discussions). This is
| just wrong.
 
| samwillis wrote:
| This is one of many reasons I'm sceptical of the current trend in
| Python to "async all the things". The nuance to how it operates
| is often opaque to the developer, particularly those less
| experienced.
| 
| GUI toolkits (like Textual) however are a really good use case
| for Asyncio. Human interaction with a program is inherently
| asynchronous, using async/await so that you can more cleanly
| specify your control flow is so much better than complicated
| callbacks. Using async/await in front end JS code for example is
| a delight.
| 
| Where I'm particularly unconvinced of their use is in server side
| view and api end point processing. The majority of the time you
| have maybe a couple of IO ops that depend on each other. There
| is often little that can be parallelised (within a request) and
| so there are few performance gains to be made. Traditional
| synchronous imperative code run with a multithreaded server is
| proven, scalable and much easier to debug.
| 
| There are always places where it's useful though, things such as
| long running requests (websockets, long polling), or those very
| rare occurrences where you do have many easily parallelizable IO
| ops within one short request.
 
  | heavyset_go wrote:
  | > _Where I'm particularly unconvinced of their use is in server
  | side view and api end point processing. The majority of the
  | time you have maybe a couple of IO ops that depend on each
  | other. There is often little that can be parallelised (within a
  | request) and so there are few performance gains to be made.
  | Traditional synchronous imperative code run with a
  | multithreaded server is proven, scalable and much easier to
  | debug._
  | 
  | Python doesn't have multithreading that scales or supports real
  | parallelism. asyncio has very measurable performance benefits
  | for exactly that use case you've mentioned versus threaded
  | servers.
 
    | zzzeek wrote:
    | Sorry that's not accurate. Asyncio and threading offer the
    | same variety of "parallelism" , which is that both can wait
    | on multiple io streams at once (the gil is released waiting
    | on io). Neither offer CPU parallelism, unless lots of your
    | CPU work is in native extensions that release the gil. In
    | that unusual case, threading would offer parallelism where
    | asyncio wouldn't.
    | 
    | Asyncio's single advantage is you can wait on _lots_ of io
    | streams, like many thousands, very cheaply without having to
    | roll non blocking IO queueing code directly.
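
A rough illustration of "waiting on lots of io streams cheaply",
using asyncio.sleep as a stand-in for a network wait. The name
fake_io and the count of 5000 are arbitrary assumptions; no timing
is promised here beyond "far less than 5000 x 0.1 seconds".

```python
import asyncio
import time


async def fake_io(i):
    # Stand-in for a socket read that takes about 0.1 s.
    await asyncio.sleep(0.1)
    return i


async def main(n=5000):
    start = time.perf_counter()
    # All n waits are pending at once on a single thread.
    results = await asyncio.gather(*(fake_io(i) for i in range(n)))
    elapsed = time.perf_counter() - start
    return len(results), elapsed


if __name__ == "__main__":
    count, elapsed = asyncio.run(main())
    print(f"{count} waits finished in {elapsed:.2f}s")
```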
 
      | heavyset_go wrote:
      | I didn't say that asyncio offered parallelism, I'm pointing
      | out that normal assumptions about multithreading you'd make
      | with other languages don't always apply to Python. You'd
      | typically assume that threads offer parallelism, a property
      | you might choose to use them for over something like
      | single-threaded asyncio.
      | 
      | I've found that for even IO bound workloads, the amount of
      | throughput plateaus when using a relatively small amount of
      | threads despite the GIL being released on IO.
 
  | Topgamer7 wrote:
  | These days with graphql, or complex microservices
  | architectures, you could have multiple hops to fulfill the
  | original request.
  | 
  | Flask sync will hold that thread hostage until the request is
  | done. Where async with properly used async libs will allow
  | other requests to process.
  | 
  | We often have medium sized reports take seconds. That is a lot
  | of time to wait. And would just end up bloating your service
  | scaling to handle more connections.
  | 
  | Any service with decently long lived network requests will
  | benefit from event loop handled scheduling.
 
  | traverseda wrote:
  | >Where I'm particularly unconvinced of their use is in server
  | side view and api end point processing.
  | 
  | Sure, performance isn't going to get better, but for websockets
  | and server sent events the occasional long-lived async task can
  | be great. Especially when you need to poll something, or check
  | in on a subprocess.
 
  | nbadg wrote:
  | The thing is, there's a lot more nuance to it than this.
  | Async/await is part of the language syntax in python, but
  | asyncio is only one particular implementation of an event loop
  | framework to power it. But really what async/await provides is
  | a general-purpose cooperative multitasking syntax. This allows
  | other libraries to implement their own event loop frameworks,
  | each with their own different semantics and considerations (the
  | two best-known alternatives being Curio and Trio). At a
  | language level, there's nothing even forcing you to use
  | async/await for async IO -- you could, if you really wanted,
  | probably write a library that used it to start threads and
  | await their completion.
  | 
  | So you have, from highest-level to lowest-level: application
  | code, async/await language syntax, the event loop framework,
  | and then the implementation of the event loop itself. The OP
  | article concerns a peculiar implementation detail in the lowest
  | level that makes it very easy to write bugs at the highest
  | level.
  | 
  | But that means that even if you do "async all the things",
  | you'll only encounter this situation if you write your
  | application code in a particular way. It just so happens that
  | "in a particular way" is, in this case, the overwhelming
  | majority of how people write it, which is, of course, why the
  | OP article is relevant.
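
To make the "async/await is just syntax" point concrete, here is a
toy sketch that drives a coroutine with no asyncio at all. The
names request, conversation and run are hypothetical, and run is a
hand-written driver, not an event loop in any real sense.

```python
import types


@types.coroutine
def request(what):
    # A bare generator-based awaitable: hands `what` to whoever is
    # driving the coroutine and resumes with the value sent back.
    answer = yield what
    return answer


async def conversation():
    a = await request("first")
    b = await request("second")
    return [a, b]


def run(coro):
    """Toy driver: answers every request with its upper-cased text."""
    reply = None
    try:
        while True:
            asked = coro.send(reply)
            reply = asked.upper()
    except StopIteration as stop:
        # The coroutine's return value travels on StopIteration.
        return stop.value


if __name__ == "__main__":
    print(run(conversation()))
```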
 
    | heavyset_go wrote:
    | > _The OP article concerns a peculiar implementation detail
    | in the lowest level that makes it very easy to write bugs at
    | the highest level._
    | 
    | Are other async implementations using the asyncio.Task
    | abstraction? I haven't looked into it, but I assumed that
    | asyncio.Task was tied to the asyncio implementation and event
    | loop.
 
  | pdonis wrote:
  | _> GUI toolkits (like Textual) however are a really good use
  | case for Asyncio._
  | 
  | Only if the GUI toolkit is explicitly written to be asyncio-
  | aware and use asyncio's event loop. Textual appears to be
  | written specifically to do that.
  | 
  | However, other GUI toolkits that I'm aware of that have Python
  | bindings aren't written that way. Qt, for example, uses its own
  | event loop, and if you want anything other than a GUI event to
  | be fed into Qt's event loop so your event-driven code can
  | process it, you have to do that by hand and make sure it works.
  | There is no point in even trying to use another event loop,
  | such as Python's asyncio event loop, since that loop will never
  | run while Qt's event loop is running.
 
  | samsquire wrote:
  | I am a huge fan of parallel and async code. I spend a lot of
  | time researching it and trying to design software that is
  | easily parallelisable.
  | 
  | Many GUIs use the event/message pump pattern, such as Windows
  | 32 API. Qt does something with its event loop (QEventLoop)
  | 
  | Threads are a rather low level instrument to get background
  | tasks going because the interface between the main thread and
  | the threads is rather omitted.
  | 
  | In Java you could use a ConcurrentLinkedQueue. And in Python
  | you can use JoinableQueue.
  | 
  | I am heavily interested in this space because I want to write
  | understandable software that anybody can pick up and work with.
  | I worked on a JMS log viewer that used threads but would crash
  | with ConcurrentModificationException due to not being thread
  | safe. I changed it to be thread safe but its performance
  | dropped through the floor. In my learnings since then I should
  | have sharded each JMS connection topic to its own thread or
  | multiplexed multiple JMS topics per thread and loop over them.
  | The main thread can interrogate the thread with a lock, that
  | should be faster than every thread trying to acquire the lock.
  | It would be driven by the main thread but the work is done in
  | the background. The threads can keep the fetched messages in
  | memory until the main thread is ready for them.
  | 
  | I think with the right abstraction, thread safety can be
  | achieved and concurrency shouldn't be something to be afraid
  | of. It is very difficult and challenging working at the low
  | levels of concurrency such as a concurrent browser engine.
  | (I've not done that though.)
  | 
  | This is why languages such as Pony lang, Inko, Cyber and
  | Erlang, Elixir are so promising. We can build high performance
  | systems that parallelise.
  | 
  | Writing an async/await pipeline that looks synchronous is far
  | easier to understand and maintain than nested callbacks. So I
  | can see where async is useful. I just hope we can design async
  | software to be simpler to maintain and extend.
 
  | whoopdeepoo wrote:
  | I don't write any colored function code in python, I'd much
  | rather work with process/thread pools
 
    | Animats wrote:
    | Me too, but threading is botched in Python. Not just the
    | Global Interpreter Lock. Some Python packages are not thread-
    | safe, and it's not documented which ones are not. Years ago I
    | discovered that CPickle was not thread safe, and that wasn't
    | considered a problem.
 
  | michael_j_x wrote:
  | I am not sure I agree that the GUI is a good use case for
  | async. A human interaction with the program must almost always
  | pre-empt whatever the program was running, so I can not see how
  | a cooperative multi-threading runtime like async Python can
  | work in such a scenario.
 
| kodablah wrote:
| It is for this reason in Temporal Python[0], where we wrote a
| custom durable asyncio event loop, that we maintain strong
| references to tasks that are created in workflows. This wouldn't
| be hard for other event loop implementations to do too.
| 
| 0 - https://github.com/temporalio/sdk-python
 
  | make3 wrote:
  | he never said it was hard, his point is that it's unintuitive &
  | a lot of people don't know or don't remember
 
    | kodablah wrote:
    | I mean the default asyncio event loop can be
    | replaced/extended where you won't have to know/remember on
    | each create_task. But yes, it is an unintuitive default.
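
One way the "extend the loop so you don't have to remember" idea
could look with the stock event loop is a task factory that retains
every task. This is only a sketch under that assumption, not how
Temporal's SDK does it; make_retaining_factory and work are
hypothetical names.

```python
import asyncio


def make_retaining_factory(registry):
    """Return a task factory that keeps a strong reference per task."""
    def factory(loop, coro, **kwargs):
        # kwargs carries whatever the loop passes (e.g. context on
        # newer Pythons); forward it untouched.
        task = asyncio.Task(coro, loop=loop, **kwargs)
        registry.add(task)
        task.add_done_callback(registry.discard)
        return task
    return factory


async def work(results):
    await asyncio.sleep(0.01)
    results.append("done")


async def main():
    registry = set()
    results = []
    loop = asyncio.get_running_loop()
    loop.set_task_factory(make_retaining_factory(registry))
    asyncio.create_task(work(results))  # return value deliberately dropped
    held_while_running = len(registry)
    await asyncio.sleep(0.05)
    return results, held_while_running, len(registry)


if __name__ == "__main__":
    print(asyncio.run(main()))
```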
 
| NelsonMinar wrote:
| Does anyone understand why the event loop only keeps weak
| references to tasks? It'd seem wise to do something to stop it
| from being garbage collected while running, maybe also while
| waiting to run.
 
  | coopsmoss wrote:
  | I agree, I think this is very unpythonic behavior
 
  | masklinn wrote:
  | Only guess I'd have is to protect the system against infinite-
  | loop tasks, but I don't remember any other runtime caring and
  | a task which never terminates seems easier to diagnose than
  | one which disappears on you.
 
  | kortex wrote:
  | Because it's almost always the case that the consumer is going
  | to keep a reference to the task in some way, so that is the
  | logical choice for the "primary owner" of the task. Python
  | doesn't have ownership per se like rust, but if you keep more
  | than one hard reference to an object around, it'll prevent
  | collection, so in cases such as this it makes sense to
  | designate one primary owner and have all other references be
  | weakref.
 
    | skitter wrote:
    | > if you keep more than one hard reference to an object
    | around, it'll prevent collection
    | 
    | Which is the behavior the parent comment asks for.
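
The weak-reference behaviour being debated can be seen in isolation
with weakref.WeakSet. A small sketch, assuming CPython; Job is a
hypothetical stand-in for a task object.

```python
import gc
import weakref


class Job:
    """Stand-in for a task object (hypothetical, for illustration)."""


def demo():
    tracked = weakref.WeakSet()  # registry that holds weak refs only
    owner = Job()                # the single strong reference
    tracked.add(owner)
    tracked.add(Job())           # nobody keeps a strong reference
    gc.collect()
    alive_with_owner = len(tracked)
    del owner                    # drop the last strong reference
    gc.collect()
    return alive_with_owner, len(tracked)


if __name__ == "__main__":
    print(demo())
```

An entry stays in the set only while something else holds the
object strongly.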
 
| anthomtb wrote:
| Well, looks like I know what I am doing first thing on Monday. I
| converted a bunch of code to asyncio a while back. I have yet to
| run into any heisenbug in that code and want to keep it that way.
 
| cpburns2009 wrote:
| I've been working on a PySide6 application recently using
| asyncio. I read the docs but totally overlooked the requirement
| to hold references to tasks created with `create_task()`.
 
| dehrmann wrote:
| Eww. What's especially nasty is this is the opposite behavior of
| threads.
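
For contrast, a sketch of the thread behaviour: a started thread
keeps running even if the caller drops the Thread object, because
the threading module tracks live threads itself (they show up in
threading.enumerate()). The function name demo is hypothetical.

```python
import threading


def demo():
    finished = threading.Event()
    # No reference to the Thread object is kept by this function.
    threading.Thread(target=finished.set).start()
    # Returns True once the thread has run, False on timeout.
    return finished.wait(timeout=5)


if __name__ == "__main__":
    print(demo())
```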
 
| aeturnum wrote:
| I really think this writer doth protest too much.
| 
| Yes, the base async interface is confusing and overly complex.
| It's a downside! As they note lots of people have stepped in to
| provide better helpers (like TaskGroups) - but these are the docs
| for the base library!
| 
| > _But who reads all the docs? And who has perfect recall if they
| do?_
| 
| Everyone reads the docs? That is why you don't need perfect
| recall because you can read them whenever you want.
| 
| Python has lots of confusing corner cases ("" is truthy, you need
| to remember to call copy [or maybe deepcopy!] sometimes, all the
| other situations where you confuse weak vs. strong references).
| They cause really common bugs. It's just a hazard of the language
| in general and the choices it makes (much like tasks being
| objects is a hazard). I do understand why people think they can
| throw away task references (based on other languages) - but this
| is Python! The garbage collector exists and you gotta check if
| you own the object or something else does.
| 
| Edit: this feels like an experienced Python developer, who has
| already internalized all the older, non-async Python weirdness,
| being taken aback by weirdness they didn't expect. Like, I feel
| you, it does suck - but it's not a bug that values you don't
| retain may get garbage collected.
 
  | No1 wrote:
  | He didn't even have to read "all the docs" - just the ones that
  | pertain to the function that he is using. And then not ignore
  | the section marked "Important" _and_ the highlighted "Note".
 
    | richbell wrote:
    | What if he read the docs for that function prior to the
    | "important" note being added?
 
  | Karunamon wrote:
  | > _Everyone reads the docs?_
  | 
  | The author goes on to say they found this pattern lurking in
  | various projects on github. So, no. The problem is that this
  | behavior is subtle, not intuitive, and unless you are reading
  | the actual documentation top to bottom (and not just the
  | function signature and first paragraph from the pop up in your
  | IDE) you will likely get bitten by this.
  | 
  | What is the point of your comment? The author _shouldn't_ have
  | called out the upturned rake in the darkened shed?
 
    | rollcat wrote:
    | > The author goes on to say they found this pattern lurking
    | in various projects on github.
    | 
    | I'd call it an anti-pattern. If you spawn a process/thread,
    | and never wait/join it, it means you don't actually care what
    | it does, if it crashes, etc. I don't see a problem with
    | Python's behavior here.
 
    | aeturnum wrote:
    | I wouldn't say _shouldn't_ - they are free to do what they
    | want. But this is a blog post about something that can trip
    | you up that the docs highlight - which the author calls a
    | "heisenbug". The author doesn't even have a suggestion for
    | the docs, which already calls out the problem they
    | encountered, they just note that there are helpers for this
    | problem (which is true).
    | 
    | The point of my comment is that subtle, non intuitive things
    | like this are all over Python and, while this one is
    | _particularly bad_, this blog post makes it seem like more
    | of an aberration than it is.
 
  | IshKebab wrote:
  | > Everyone reads the docs?
  | 
  | Wow I've heard people say that everyone _should_ read all of
  | the docs (which isn't really true) but I've never heard anyone
  | claim that everyone _does_ read all of the docs! Wild.
 
  | raverbashing wrote:
  | > "" is truthy
  | 
  | Humm, no? Unless you mean ("",)
  | 
  |     >>> not ""
  |     True
 
    | aeturnum wrote:
    | Oh, sorry, you are right - "" is false-y, even though it's a
    | valid empty value. So it's hard to tell the difference
    | between a value not being filled and a value being filled
    | with an empty value.
    | 
    | ex:
    | 
    |     answers = {}
    |     answers["I exist"] = ""
    |     if answers["I exist"]:
    |         print("a")
    | 
    | does not print.
 
      | fbdab103 wrote:
      | I guess I am too deeply in the Python ecosystem to see a
      | problem here. Unless you want to check for the existence of
      | "I exist"? In which case, the Python Way would be
      | 
      |     answers = {}
      |     answers["I exist"] = ""
      |     if "I exist" in answers:
      |         print("a")
 
        | pacaro wrote:
        | Maybe
        | 
        |     ...
        |     if answers.get('I exist'):
        |         print('a')
        | 
        | Which is why you should always explicitly check for
        | _None_ if that is your intent.
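
The three cases being mixed up in this subthread (key missing,
value None, value present but falsy) can be told apart explicitly.
A small sketch; the dictionary contents and the helper name
describe are made up for illustration.

```python
answers = {"empty": "", "zero": 0, "unset": None, "name": "x"}


def describe(mapping, key):
    """Classify a lookup instead of relying on a bare truth test."""
    if key not in mapping:
        return "missing"
    value = mapping[key]
    if value is None:
        return "present but None"
    if not value:
        # Empty string, 0, empty container and so on.
        return "present but falsy"
    return "present"


if __name__ == "__main__":
    for key in ("empty", "zero", "unset", "name", "nope"):
        print(key, "->", describe(answers, key))
```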
 
        | aeturnum wrote:
        | It's not a problem? The async interface isn't a problem
        | either. It's just a thing you have to remember about
        | python: "most input is truthy except for the input that
        | isn't"
        | 
        | "Most of the time you don't disrupt your program by not
        | keeping the returned reference in scope except for when
        | you do"
        | 
        | It's just a thing that trips people up.
 
        | dwattttt wrote:
        | > It's just a thing you have to remember ...
        | 
        | The more of these things there are, the more brainpower
        | you devote to remembering the right way to do things; if
        | you don't you introduce bugs, a subtle, painful one here.
 
        | heavyset_go wrote:
        | "Empty containers are falsy" is a Python fundamental,
        | this isn't a subtle bug, but an obvious one.
 
        | fbdab103 wrote:
        | Truthy is a Pythonic core principle of the language. It
        | is not an edge case phenomenon in the language which I
        | would expect a regular practitioner to confuse.
        | 
        | https://docs.python.org/3/library/stdtypes.html#truth-
        | value-...
 
        | aeturnum wrote:
        | I mean, I've seen bugs around that in code I've worked on
        | and I've created bugs where it's a factor.
        | 
        | Weakrefs are also a core part of the language:
        | https://docs.python.org/3/library/weakref.html . You
        | can't use python without using them.
 
        | fiddlerwoaroof wrote:
        | What I learned when I wrote Python professionally was
        | "never rely on truthiness" explicitly writing out a
        | boolean expression that does what you want is more
        | explicit ("explicit is better than implicit", PEP 8) and
        | prevents a whole class of bugs down the line.
 
        | nemetroid wrote:
        | PEP 8, which you mention, explicitly recommends relying
        | on truthiness:
        | 
        | > For sequences, (strings, lists, tuples), use the fact
        | that empty sequences are false:
        | 
        |     # Correct:
        |     if not seq:
        |     if seq:
        | 
        |     # Wrong:
        |     if len(seq):
        |     if not len(seq):
 
        | AeroNotix wrote:
        | PEP8 is touted a lot as if it is a perfectly correct tome
        | of ... correctness. I've worked in Python long enough to
        | know that it both doesn't cover everything and the advice
        | is sometimes actively bad.
 
      | heavyset_go wrote:
      | > _if answers["I exist"]:_
      | 
      |     if "I exist" in answers:
      |         ...
 
      | wizzwizz4 wrote:
      | > _So it's hard to tell the difference between a value not
      | being filled and a value being filled with an empty value._
      | 
      |     >>> answers = {}
      |     >>> if answers["I don't exist"]:
      |     ...     print("a")
      | 
      |     Traceback (most recent call last):
      |       File "", line 1, in
      |         if answers["I don't exist"]:
      |     KeyError: "I don't exist"
      | 
      | The method you're trying to use doesn't work _anyway_: it
      | doesn't matter that it's confusing. You'd have the same
      | problem with the value False.
 
  | Etheryte wrote:
  | I think you may be too bold with the assumption here,
  | personally I would wager that the majority of people who write
  | Python don't even know Python has official docs outside of a
  | site called Stack Overflow.
 
  | leni536 wrote:
  | Considering how many times I need to add site:python.org to my
  | python search queries to actually get to the docs, I assume
  | that a surprisingly low number of python developers actually
  | read the docs.
 
    | 0x008 wrote:
    | If you use DuckDuckGo you can prefix your search with "!py3".
 
  | iforgotpassword wrote:
  | > Everyone reads the docs?
  | 
  | For Python? The language where everyone just cobbles together
  | random code from the internet and other repos? I can totally
  | see how this mistake happens left and right. The bar of entry
  | for this language is way too low to assume only rigorous senior
  | devs use it.
 
| bandyaboot wrote:
| He doesn't really get into what makes this a Heisenbug, only that
| it's indeterminate in nature. Would attaching a debugger/stepping
| through the code make it less likely that your task would get
| garbage collected out from under you?
 
  | Izkata wrote:
  | You're probably going to need a reference to the task in order
  | to inspect it in the debugger. Creating that reference prevents
  | the bug.
 
  | foobarbecue wrote:
  | Yeah, he seems to be re-defining the term to mean "a bug that
  | occurs occasionally depending on system state" as opposed to "a
  | bug that changes behavior when you observe it closely e.g. in a
  | debugger."
 
    | macintux wrote:
    | The first is a common way of using the term Heisenbug. I
    | first heard it used that way 10 years ago when discussing
    | Erlang's error handling model.
 
  | throwaway81523 wrote:
  | CPython does most of its memory management by reference
  | counting, which fails to reclaim circular structure. So to make
  | sure it gets everything, it occasionally runs a conventional
  | tracing GC. If the GC happens to run just after you create that
  | async task, the task itself can get collected, it sounds like.
  | It's good to know about this and is (my own editorializing) yet
  | another reason Python3 should have used Erlang-style
  | concurrency instead of this async stuff.
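
The refcounting-plus-tracing split described above can be observed
directly. A sketch that assumes CPython (other implementations
manage memory differently); Node is a hypothetical class, and the
cyclic collector is switched off briefly so it cannot run on its
own mid-demo.

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.other = None


def demo():
    gc.disable()  # keep the cyclic collector out of the way
    try:
        lone = Node()
        lone_ref = weakref.ref(lone)
        del lone  # refcount hits zero: freed immediately
        freed_by_refcount = lone_ref() is None

        a, b = Node(), Node()
        a.other, b.other = b, a  # reference cycle
        cycle_ref = weakref.ref(a)
        del a, b  # refcounts never reach zero
        survives_refcount = cycle_ref() is not None

        gc.collect()  # the tracing pass finds the cycle
        freed_by_gc = cycle_ref() is None
    finally:
        gc.enable()
    return freed_by_refcount, survives_refcount, freed_by_gc


if __name__ == "__main__":
    print(demo())
```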
 
| No1 wrote:
| His argument hinges on "I can't be bothered to read the docs on
| the stuff I'm using." So instead of reading the docs on
| coroutines and tasks before using them, he writes a rant about
| how it's all wrong because he didn't understand how it works.
| 
| On a more fundamental level, why would anyone assume that a
| coroutine is guaranteed to complete if it is never awaited? There
| is no reason a scheduler could not be totally lazy and only
| execute the coroutine once awaited.
| 
| At least he bothered to make note of TaskGroups, also clearly
| shown in his documentation screenshot, immediately above the
| section marked _Important_ that went ignored, and finishes with
| "As long as all the tasks you spin up are in TaskGroups, you
| should be fine." Yep, that's all there was to it.
 
  | ptx wrote:
  | > _There is no reason a scheduler could not be totally lazy and
  | only execute the coroutine once awaited._
  | 
  | Isn't the point of create_task (which is what the article is
  | about) to launch concurrent tasks without immediately awaiting
  | them? The example in the docs [1] wouldn't work (in the stated
  | manner) if the task didn't start until it was awaited.
  | 
  | > _At least he bothered to make note of TaskGroups [...] Yep,
  | that's all there was to it._
  | 
  | That only works on Python 3.11, which was released just a few
  | months ago. Debian still uses 3.9, for example, so the
  | TaskGroups solution can't be used everywhere yet.
  | 
  | [1] https://docs.python.org/3/library/asyncio-task.html#coroutin...
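
One way to reconcile the two points above, sketched under the assumption that the same code has to run on both older and newer interpreters: use `asyncio.TaskGroup` where it exists (Python 3.11+), and otherwise hold the tasks in a list and gather them. `work` is a hypothetical placeholder for real work.

```python
import asyncio
import sys


async def work(n):
    await asyncio.sleep(0)
    return n * 2


async def main():
    if sys.version_info >= (3, 11):
        # The group holds strong references and waits for every task on exit.
        async with asyncio.TaskGroup() as tg:
            tasks = [tg.create_task(work(i)) for i in range(3)]
        return [t.result() for t in tasks]

    # Older interpreters: the list is the strong reference, gather awaits them.
    tasks = [asyncio.create_task(work(i)) for i in range(3)]
    return list(await asyncio.gather(*tasks))


results = asyncio.run(main())
```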
 
  | zackees wrote:
  | [dead]
 
| [deleted]
 
| [deleted]
 
| m3047 wrote:
| Hrmmmm.
| 
| > But who reads all the docs?
| 
| asyncio.create_task() doesn't exist in 3.6, and I can't find the
| string "to avoid a task disappearing" in the doc, so I'll go out
| on a limb: there is no such doc. However I see the reference to
| weakref.WeakSet.
 
  | Jtsummers wrote:
  | The world didn't end in 2016. Welcome to seven years in the
  | future where this documentation does, in fact, exist:
  | 
  | https://docs.python.org/3/library/asyncio-task.html#asyncio....
 
| cutler wrote:
| Maybe grafting async onto a single threaded dynamic language just
| isn't such a good idea in the first place.
 
  | murphy214 wrote:
  | bingo
 
| qxmat wrote:
| Python has a few weird issues like this. The last one I
| encountered was with a class inheriting Thread, join and the SQL
| Server ODBC driver on Linux. Fairly sure I hit page faults thanks
| to a shallow copy on driver allocated string data but didn't have
| the time to investigate like the hero of this blog post.
 
| whoopdeepoo wrote:
| > But who reads all the docs
| 
| Why is this so common? Do people seriously not read a
| language/library documentation? That's the absolute first thing I
| do when evaluating a technology.
 
  | adamckay wrote:
  | Because people have deadlines and need to get things working.
  | You read enough to figure out how to do what you need to do and
  | then mostly move on.
  | 
  | This function was added in 3.7 with no note on the importance
  | of saving a reference. In 3.9 a note was added "Save a
  | reference to the result of this function, to avoid a task
  | disappearing mid execution." which was then expanded with the
  | explanation of a weak reference in 3.10.
 
  | skitter wrote:
  | It absolutely is common. People see there is a len function
  | that takes one argument, they call len(some_collection), see
  | that it indeed returns the number of items in the collection
  | like they expect and move on. They don't expect len to return a
  | negative number instead on Thursdays, and of course it doesn't
  | because that would be a pretty big footgun. People also see
  | that there is a create_task function that takes a coroutine,
  | they call create_task(some_coroutine), see that the coroutine
  | indeed runs like they expect, and move on. Sure, you're
  | _supposed_ to await the result, but maybe they don't need the
  | awaited value anymore, only the side effects, and see that it
  | still works.
 
  | throwaway81523 wrote:
  | I had a manager who actually told me not to read docs. I was a
  | bad report and read them anyway.
 
| winter_blue wrote:
| This article just makes me feel like Python, while a language
| with nice-ish syntax, is a language that was poorly hacked and
| put together with little concern/thought about the real-world
| implications of poor design decisions like this async design
| decision (and also dynamic typing - _a terrible thing in any_
| language).
 
  | crdrost wrote:
  | Most languages have something like this, usually around async.
  | 
  | For instance NodeJS has had a bit of this around promises, and
  | eventually needed to institute the rule "if a promise rejects
  | with an error, and nobody is around to hear it, we will crash
  | your program on the assumption that you probably needed to
  | clean up some resources but didn't and now they're going to
  | leak. Listen to the error with a handler that does nothing, if
  | we are wrong about that."
 
    | macintux wrote:
    | One of many reasons I like Erlang: _everything_ is async, so
    | you have plenty of tooling/libraries/core language features
    | to support you.
 
  | photochemsyn wrote:
  | 'async footguns' returns 20,000+ hits on Google. Top one
  | happens to be:
  | 
  | https://news.ycombinator.com/item?id=32086973
  | 
  | > "Async seems to be the first big "footgun" of Rust. It's
  | widespread enough that you can't really avoid interacting with
  | it, yet it's bad enough that it makes..."
 
| deschutes wrote:
| Fun stuff. Why aren't unfinished tasks gc roots?
 
| [deleted]
 
| [deleted]
 
| dehrmann wrote:
| Another common async footgun I see is unthrottled gathering, and
| no throttling mechanism in the standard library. Once you gather
| an unspecified number of awaitables, bad things start to happen,
| either with CPU starvation, local IO starvation, or hammering an
| external service.
| 
| What I like about threads is they make dangerous things like this
| harder, and you have to put more thought into how much concurrent
| work you want outstanding. They also handle CPU starvation better
| for things that are latency-sensitive. I've seen degenerate
| requests tie up the event loop with 500 ms of processing time.
 
  | rednafi wrote:
  | Huh! Unless you're using semaphores, you can also recreate
  | similar situation with threads. Spin up a whole bunch of
  | threads and send all of them towards some shared object or make
  | 100s of requests with them.
  | 
  | There's not much difference between spinning up threads
  | explicitly and creating async task with asyncio.create_task. In
  | either case, you can throttle them with semaphores.
 
    | dehrmann wrote:
    | I don't have a source or affected versions, but semaphores
    | can scale poorly. I vaguely remember each blocked acquire
    | getting checked on every event loop iteration, or something
    | silly like that.
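
The semaphore throttling mentioned in this subthread could look roughly like the sketch below. `bounded_gather` is a hypothetical helper, not something the standard library provides, and the `running`/`peak` counters exist only to show the limit being respected.

```python
import asyncio


async def bounded_gather(coros, limit):
    """Run the coroutines concurrently, with at most `limit` in flight at once."""
    sem = asyncio.Semaphore(limit)

    async def run(coro):
        async with sem:          # wait for a free slot before starting the work
            return await coro

    return await asyncio.gather(*(run(c) for c in coros))


async def main():
    running = 0
    peak = 0

    async def job(i):
        nonlocal running, peak
        running += 1
        peak = max(peak, running)    # record the highest concurrency observed
        await asyncio.sleep(0.01)
        running -= 1
        return i

    results = await bounded_gather([job(i) for i in range(10)], limit=3)
    return results, peak


results, peak = asyncio.run(main())
```

All ten wrapper tasks are still created up front; the semaphore only limits how many of them are doing work at any one time.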
 
| acjohnson55 wrote:
| Something linters can help with, I would think?
 
  | ryanianian wrote:
  | C++ has nodiscard which is super useful for scenarios like this
  | where ownership can be tricky.
 
| smetj wrote:
| Start a thread/greenthread/fiber/process/task without holding a
| reference to at least tie all loose ends at exit? Hmm dunno.
 
  | tgv wrote:
  | You can do that in go. You don't even get a reference to the
  | thread/goroutine.
 
  | nixpulvis wrote:
  | Fire and forget.
 
| crabbone wrote:
| In the many years since asyncio was added, I have never used it
| willingly, outside of the cases where a third-party library
| required it. There has never been a practical benefit for any of
| that stuff when compared to select. It always worked poorly and
| never justified the effort one has to put into writing code that
| uses the library. The behavior OP describes is just one of the
| many bad design decisions that are so characteristic of this
| library.
 
| pyuser583 wrote:
| I don't find this behavior odd at all. Dereferencing unassigned
| values is normal Python garbage collector behavior. Threads are
| an exception (no pun intended), but they're an exception in lots
| of ways - just try pickling them.
 
| samsquire wrote:
| Thank you for this. This is really useful information.
| 
| I recently adapted some garbage collection code to add register
| scanning.
| 
| I can imagine all sorts of subtle bugs where things go away
| randomly. One problem I have with my multithreaded code is that
| sometimes a thread crashes and the logs are so long I don't
| notice. From my perspective the thread is just not doing
| anything.
| 
| Sometimes the absence of behaviour can be really tricky to debug!
 
| sgt wrote:
| Is this something go developers also have to be careful with when
| using goroutines?
 
  | gerad wrote:
  | No. But sometimes goroutines have the opposite problem, where
  | they don't terminate and get cleaned up.
  | 
  | https://betterprogramming.pub/common-goroutine-leaks-that-yo...
 
    | [deleted]
 
    | candiddevmike wrote:
    | Is there an (easy?) test for checking goroutine leaks?
 
      | Snawoot wrote:
      | Yes, it's visible in the goroutine profile provided by the
      | built-in profiler, pprof. E.g.:
      | https://github.com/mysteriumnetwork/node/issues/5311#issueco...
 
  | Jtsummers wrote:
  | No. Goroutines don't generate a reference to hold onto, either.
  | They just run until they or the program terminate.
 
  | [deleted]
 
| makomk wrote:
| Well, this explains that one really annoying intermittent bug
| that I was having in some asyncio-based code.
 
| aardvark179 wrote:
| The same problem or something similar exists in many languages.
| Threads are GC roots because the OS knows about them, but this
| may not be true for lightweight threads or async callbacks.
| 
| It is hard to fix because you don't want to introduce references
| from an old object (such as a list of callbacks) to many new
| objects as that will introduce GC issues, and many other
| potential leaks.
 
| jarboot wrote:
| If I want to create a task that runs even after the function
| returns, ie "async def f():
| asyncio.create_task(coro=10_second_coro.run()); return;" is there
| any way to mitigate this? Function-scoped set of tasks?
 
  | nhumrich wrote:
  | Yes, read the last part of the included documentation and hold
  | onto background tasks.
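
A sketch of the pattern the asyncio documentation recommends for this case: keep each background task in a module-level set and let a done callback remove it once it finishes. `spawn`, `slow_job`, and `log` are illustrative names, and the final `gather` is there only so the example can report its results before the loop shuts down.

```python
import asyncio

background_tasks = set()     # strong references to tasks that are still running


def spawn(coro):
    """Start a fire-and-forget task without letting it be garbage collected."""
    task = asyncio.create_task(coro)
    background_tasks.add(task)
    # Drop the reference once the task is done so the set does not grow forever.
    task.add_done_callback(background_tasks.discard)
    return task


async def slow_job(log):
    await asyncio.sleep(0.01)
    log.append("done")


async def f(log):
    spawn(slow_job(log))     # returns immediately; the task outlives f()


async def main():
    log = []
    await f(log)
    pending_after_f = len(background_tasks)
    await asyncio.gather(*background_tasks)    # only so the demo can inspect the outcome
    return log, pending_after_f, len(background_tasks)


log, pending_after_f, pending_at_end = asyncio.run(main())
```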
 
  | jmholla wrote:
  | Your task is implicitly not function-scoped as you want it to
  | survive exiting the function. What you're doing here would be
  | better architecturally done with threads. async is not a direct
  | replacement for threading.
  | 
  | But, you could also return the task object to the caller and
  | have them manage it. There's also nothing async about your
  | function, so you don't need the async or to await it.
 
| cmstodd wrote:
| Thanks for posting.
 
| nixpulvis wrote:
| Hey, at least it's documented... good developers actually RTFM.
| 
| I can't comment on the design of this API, because I don't feel
| like learning the library, but in some performance critical
| applications these sorts of contracts aren't all that uncommon.
| Granted, this is python, I guess it's a bit more suspicious, IDK.
 
  | vbernat wrote:
  | The documentation update is quite recent (Python 3.11). It was
  | added after this ticket: https://bugs.python.org/issue44665
  | (not the first ticket around this problem).
 
| [deleted]
 
| osigurdson wrote:
| A little pedantic but HUP concerns the fundamental limits of
| simultaneously knowing a particle's position and momentum, not
| about observation impacting outcomes.
 
| notatoad wrote:
| wow. yeah, this absolutely explains a heisenbug that i've been
| chasing for a while. and i can't count the number of times i've
| had that exact doc page open on my screen in the last few months,
| and never bothered to read that block of text that starts with
| "important"...
| 
| thanks
 
| aldenpage wrote:
| That's extremely insidious. I suppose I never encountered this
| issue because I almost always call asyncio.gather(*), which makes
| having a collection of tasks natural.
 
  | kortex wrote:
  | This is good form. It makes top-level control flow easier to
  | follow, and keeps the concurrency scoped.
 
| BiteCode_dev wrote:
| And this is why trio got it right, and why I think the task
| groups (nurseries from trio) can't arrive soon enough in the
| stdlib.
| 
| Because not only must you maintain a reference to any task, but
| you should also explicitly await it somewhere, using something
| like asyncio.wait() or asyncio.gather().
| 
| Most people don't know this, and it makes asyncio very difficult
| to use for them.
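
A minimal sketch of the advice above, assuming plain asyncio without task groups: keep the tasks in a list, then explicitly wait on them with `asyncio.wait()` and read each result, which also surfaces any exception a task raised. `job` is a hypothetical placeholder.

```python
import asyncio


async def job(i):
    await asyncio.sleep(0)
    return i * i


async def main():
    # The list is the strong reference that keeps every task alive.
    tasks = [asyncio.create_task(job(i)) for i in range(4)]
    # Wait for all of them; wait() returns sets of finished and unfinished tasks.
    done, pending = await asyncio.wait(tasks)
    # result() re-raises the task's exception, so failures do not pass silently.
    return sorted(t.result() for t in done), len(pending)


squares, still_pending = asyncio.run(main())
```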
 
___________________________________________________________________
(page generated 2023-02-11 23:00 UTC)