
	[HN Gopher] A Heisenbug lurking in async Python
	Author: willm
	Score: 336 points
	Date: 2023-02-11 17:25 UTC (5 hours ago)
	web link (textual.textualize.io)
	w3m dump (textual.textualize.io)
	\| dataflow wrote: \| The notion of fire-and-forget is itself the problem. Even with \| threads, you should have them join the main thread before the \| program exits. Which implies you should hold strong references to \| them until then. Most people don't go out of their way to do this \| even when they're able to, but that's what you're supposed to do. \| bornfreddy wrote: \| Wow. What a strange design decision, as evidenced by sheer number \| of developers who don't / didn't know about this (myself \| included). I hope this gets _fixed_ instead of just documented. \| jcheng wrote: \| Agreed, I'm really surprised at all the comments defending this \| behavior. I suspect there is a non-obvious reason why it's this \| way, but "you should've read the docs" and "but why _wouldn't_ \| you hold your own strong reference" are weird takes IMHO. \| boomskats wrote: \| As someone who happens to be eternally grateful to the author for \| his contribution to the Python ecosystem [0], I kinda feel like \| this comment thread is overreacting to his overreaction. When I \| look at this post all I see is a useful, well explained, byte- \| size writeup that a search engine might recommend to someone \| looking for help in writing async Python. \| \| Maybe it's because a bunch of my friends are Scottish and I get \| their sense of humour. \| \| [0]: https://rich.readthedocs.io/ (yes I'm talking about the \| fancy new progress bar that pip got recently) \| rlpb wrote: \| This issue doesn't exist with Trio's structured concurrency \| model. In other words, the problem is already solved. \| nbadg wrote: \| I'll +1 the Trio shoutout [1], but it's worth emphasizing that \| the core concept of Trio (nurseries) now exists in the stdlib \| in the form of task groups [2]. The article mentions this very \| briefly, but it's easy to miss, and I wouldn't describe it as a \| solution to this bug, anyways. 
Rather, it's more of a different \| way of writing multitasking code, which happens to make this \| class of bug impossible. \| \| [1] https://github.com/python-trio/trio \| \| [2] https://docs.python.org/3/library/asyncio-task.html#task- \| gro... \| Tanjreeve wrote: \| Oh good so now we can all move to this years Async flavour in \| Python. \| edfletcher_t137 wrote: \| This is a great blog post. Concise, lacking fluff or extraneous \| prose, it gets right to the point, presents the primary-source \| reference and then gets right to the solution. A bit of \| editorializing in the middle but that's completely allowed when \| writing this tightly. Well damn done, OP. \| \| And also it's _great_ information that I - like I 'm sure many of \| you - also never noticed. THANK YOU! \| [deleted] \| mgsk wrote: \| What does this add this isn't already right there in the \| documentation? \| nkrisc wrote: \| If there was nothing to add then there wouldn't be loads of \| projects on GitHub making exactly this mistake. \| Jtsummers wrote: \| It draws attention to a problem that a lot of people have \| created for themselves by not reading the documentation (or \| not recalling it if they read it). I guess the author could \| have just linked the documentation but then they couldn't \| have added the additional context of the github search \| demonstrating how common it is. \| newaccount74 wrote: \| I must have looked through the docs for create_task a dozen \| times while trying to figure out how async/await works in \| Python but still managed to overlook this part. \| edflsafoiewq wrote: \| That is unsurprising. It was first added as a brief note \| only in 3.9, and expanded to its present length only in \| 3.10. 
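For readers skimming this subthread, the pattern that the create_task documentation note describes can be sketched roughly as follows. This is a minimal sketch, not the article's code: the helper name `spawn`, the `worker` coroutine, and the module-level `background_tasks` set are illustrative assumptions.

```python
import asyncio

# Strong references to in-flight tasks. The name is illustrative.
background_tasks = set()


def spawn(coro):
    """Schedule coro as a task and keep a strong reference until it finishes."""
    task = asyncio.create_task(coro)
    background_tasks.add(task)
    # Drop the reference once the task is done so the set does not grow forever.
    task.add_done_callback(background_tasks.discard)
    return task


async def worker(i, results):
    # Hypothetical unit of work standing in for real I/O.
    await asyncio.sleep(0.01)
    results.append(i)


async def main():
    results = []
    for i in range(3):
        spawn(worker(i, results))  # return value deliberately ignored
    # Wait for whatever is still registered, then give callbacks a turn.
    await asyncio.gather(*background_tasks)
    await asyncio.sleep(0)
    return results, len(background_tasks)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The set is what keeps each task alive; the done callback only exists so finished tasks do not accumulate.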
\| klyrs wrote: \| The author doesn't go into much detail on that point: this \| warning should be present in documentation of many Python \| libraries that use create_task and return the result to the \| user unless that library stores those tasks in a collection \| as is recommended -- at which point the library author had \| better roll their own garbage collection! \| isoprophlex wrote: \| Well, I don't know, I kinda miss the human angle. I'd have \| loved to first read six paragraphs about how the author's \| grandmother raised them on home grown threads and greenlets :^) \| nickjj wrote: \| > I'd have loved to first read six paragraphs about how the \| author's grandmother raised them on home grown threads and \| greenlets. \| \| With recipes, often times your problem is you want to learn \| how to make something where having the steps listed out is \| the most important thing. The story behind the recipe isn't \| important to solve your problem but for tech the story around \| the choice is important. Often times the "why" is really \| important and I really like hearing about what led someone to \| use something first. Often times that's more important or \| equally as important as the implementation details. \| \| It wouldn't make sense for this post given its title but if \| someone were making a post about why they chose to use async \| in Python I'd expect and hope that half of the post goes into \| the gory details of how they tried alternatives and what \| their shortcomings were for their specific use cases. That \| would help me as the reader generalize their post to my \| specific use cases and see if it applies. \| bialpio wrote: \| Off-topic but the life story is there to make them eligible \| to be protected by copyright. IANAL. \| \| Source: https://copyrightalliance.org/are-recipes- \| cookbooks-protecte... \| flandish wrote: \| Interesting. I always thought it was search engine \| optimization. 
\| aidenn0 wrote: \| SEO is definitely a big part of it; Google penalized \| pages where people closed or navigated away quickly. \| fbdab103 wrote: \| I immediately bounce from those Stackoverflow clones that \| keep appearing up at the top of searches. So, I am \| wondering how much this is still weighted in the scores. \| gdprrrr wrote: \| https://github.com/quenhus/uBlock-Origin-dev-filter \| jonas21 wrote: \| You might. But many people don't. They just want an \| answer and don't care if it's a clone or not. \| chucksmash wrote: \| Had this driven home recently, watching a younger dev \| happily clicking links I've long ago blocked via browser \| extension (w3schools AND geeksforgeeks _in one session_ ) \| rmbyrro wrote: \| SEO makes total sense. I always add grandma keywords when \| I'm searching for Python stuff on Google. \| \| Like: "grandma, how the hell have I still not memorized \| the API and keep needing to resort to the same doc pages \| again and again?" \| \| Now I trained ChatGPT with grandma letters from when I \| was young, so it will answer just like if it was my \| grandma. \| water-your-self wrote: \| Its engagement optimization. Adsense pays more if you \| spend more time on the page \| yunohn wrote: \| When is the last time you heard of online recipe blogs \| enforcing copyright claims on other blogspam? Ridiculous. \| \| The real reason is simple, people who write recipes \| aren't robots - they're expressing their stories and \| emotions, while explaining how to make food that's dear \| to them.. \| throwaway81523 wrote: \| There's a similar thing in tkinter but I guess users discover it \| faster, since the failure if you don't save the reference shows \| up fairly quickly. \| Lammy wrote: \| I experienced a heisenbug exactly like this in Ruby when trying \| to `while case Ractor::receive`: \| https://github.com/okeeblow/DistorteD/blob/dd2a99285072982d3... 
\| zzzeek wrote: \| I think asyncio is kind of neat for what it's good at, but \| beginner programmers who have never written code before are going \| directly to using Python asyncio (I know this because they are \| telling me so when they post sqlalchemy discussions). This is \| just wrong. \| samwillis wrote: \| This is one of many reasons I'm sceptical of the current trend in \| Python to "async all the things". The nuance to how it operates \| is often opaque to the developer, particularly those less \| experienced. \| \| GUI toolkits (like Textual) however are a really good use case \| for Asyncio. Human interaction with a program is inherently \| asynchronous; using async/await so that you can more cleanly \| specify your control flow is so much better than complicated \| callbacks. Using async/await in front end JS code for example is \| a delight. \| \| Where I'm particularly unconvinced of their use is in server side \| view and api end point processing. The majority of the time you \| have maybe a couple of IO ops that depend on each other. There \| is often little that can be parallelised (within a request) and \| so there are few performance gains to be made. Traditional \| synchronous imperative code run with a multithreaded server is \| proven, scalable and much easier to debug. \| \| There are always places where it's useful though, things such as \| long running requests (websockets, long polling), or those very \| rare occurrences where you do have many easily parallelizable IO \| ops within one short request. \| heavyset_go wrote: \| > _Where I'm particularly unconvinced of their use is in \| server side view and api end point processing. The majority of \| the time you have maybe a couple of IO ops that depend on each \| other. There is often little that can be parallelised (within a \| request) and so there are few performance gains to be made. 
\| Traditional synchronous imperative code run with a \| multithreaded server is proven, scalable and much easier to \| debug._ \| \| Python doesn't have multithreading that scales or supports real \| parallelism. asyncio has very measurable performance benefits \| for exactly that use case you've mentioned versus threaded \| servers. \| zzzeek wrote: \| Sorry that's not accurate. Asyncio and threading offer the \| same variety of "parallelism", which is that both can wait \| on multiple io streams at once (the GIL is released waiting \| on io). Neither offers CPU parallelism, unless lots of your \| CPU work is in native extensions that release the GIL. In \| that unusual case, threading would offer parallelism where \| asyncio wouldn't. \| \| Asyncio's single advantage is you can wait on _lots_ of io \| streams, like many thousands, very cheaply without having to \| roll non blocking IO queueing code directly. \| heavyset_go wrote: \| I didn't say that asyncio offered parallelism, I'm pointing \| out that normal assumptions about multithreading you'd make \| with other languages don't always apply to Python. You'd \| typically assume that threads offer parallelism, a property \| you might choose to use them for over something like \| single-threaded asyncio. \| \| I've found that for even IO bound workloads, the amount of \| throughput plateaus when using a relatively small number of \| threads despite the GIL being released on IO. \| Topgamer7 wrote: \| These days with graphql, or complex microservices \| architectures, you could have multiple hops to fulfil the \| original request. \| \| Flask sync will hold that thread hostage until the request is \| done. Where async with properly used async libs will allow \| other requests to process. \| \| We often have medium sized reports take seconds. That is a lot \| of time to wait. 
And it would just end up bloating your service \| scaling to handle more connections. \| \| Any service with decently long lived network requests will \| benefit from event loop handled scheduling. \| traverseda wrote: \| >Where I'm particularly unconvinced of their use is in server \| side view and api end point processing. \| \| Sure, performance isn't going to get better, but for websockets \| and server sent events the occasional long-lived async task can \| be great. Especially when you need to poll something, or check \| in on a subprocess. \| nbadg wrote: \| The thing is, there's a lot more nuance to it than this. \| Async/await is part of the language syntax in Python, but \| asyncio is only one particular implementation of an event loop \| framework to power it. But really what async/await provides is \| a general-purpose cooperative multitasking syntax. This allows \| other libraries to implement their own event loop frameworks, \| each with their own different semantics and considerations (the \| two best-known alternatives being Curio and Trio). At a \| language level, there's nothing even forcing you to use \| async/await for async IO -- you could, if you really wanted, \| probably write a library that used it to start threads and \| await their completion. \| \| So you have, from highest-level to lowest-level: application \| code, async/await language syntax, the event loop framework, \| and then the implementation of the event loop itself. The OP \| article concerns a peculiar implementation detail in the lowest \| level that makes it very easy to write bugs at the highest \| level. \| \| But that means that even if you do "async all the things", \| you'll only encounter this situation if you write your \| application code in a particular way. It just so happens that \| "in a particular way" is, in this case, the overwhelming \| majority of how people write it, which is, of course, why the \| OP article is relevant. 
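nbadg's point that async/await is just cooperative-multitasking syntax, independent of asyncio, can be illustrated with a toy scheduler that never imports asyncio. This is only a sketch under stated assumptions: `Pause`, `counter`, and `run_round_robin` are made-up names, and a real event loop framework (asyncio, Trio, Curio) does far more than this.

```python
from collections import deque


class Pause:
    """Awaitable that suspends the current coroutine exactly once."""

    def __await__(self):
        yield  # control returns to whoever called send() on the coroutine


async def counter(name, n, log):
    for i in range(n):
        log.append((name, i))
        await Pause()  # cooperative yield point


def run_round_robin(coros):
    """Toy scheduler: resume each coroutine in turn until all have finished."""
    ready = deque(coros)
    while ready:
        coro = ready.popleft()
        try:
            coro.send(None)  # run until the next Pause (or until it returns)
        except StopIteration:
            continue  # finished; do not reschedule
        ready.append(coro)


if __name__ == "__main__":
    log = []
    run_round_robin([counter("a", 2, log), counter("b", 2, log)])
    print(log)
```

Note that the scheduler's `ready` queue holds strong references to every unfinished coroutine, which is exactly the property the article says asyncio's loop does not give you for tasks.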
\| heavyset_go wrote: \| > _The OP article concerns a peculiar implementation detail \| in the lowest level that makes it very easy to write bugs at \| the highest level._ \| \| Are other async implementations using the asyncio.Task \| abstraction? I haven't looked into it, but I assumed that \| asyncio.Task was tied to the asyncio implementation and event \| loop. \| pdonis wrote: \| _> GUI toolkits (like Textual) however are a really good use \| case for Asyncio._ \| \| Only if the GUI toolkit is explicitly written to be asyncio- \| aware and use asyncio's event loop. Textual appears to be \| written specifically to do that. \| \| However, other GUI toolkits that I'm aware of that have Python \| bindings aren't written that way. Qt, for example, uses its own \| event loop, and if you want anything other than a GUI event to \| be fed into Qt's event loop so your event-driven code can \| process it, you have to do that by hand and make sure it works. \| There is no point in even trying to use another event loop, \| such as Python's asyncio event loop, since that loop will never \| run while Qt's event loop is running. \| samsquire wrote: \| I am a huge fan of parallel and async code. I spend a lot of \| time researching it and trying to design software that is \| easily parallelisable. \| \| Many GUIs use the event/message pump pattern, such as Windows \| 32 API. Qt does something with its event loop (QEventLoop) \| \| Threads are a rather low level instrument to get background \| tasks going because the interface between the main thread and \| the threads is rather omitted. \| \| In Java you could use a ConcurrentLinkedQueue. And in Python \| you can use JoinableQueue. \| \| I am heavily interested in this space because I want to write \| understandable software that anybody can pick up and work with. \| I worked on a JMS log viewer that used threads but would crash \| with ConcurrentModificationException due to not being thread \| safe. 
I changed it to be thread safe but its performance \| dropped through the floor. In my learnings since then I should \| have sharded each JMS connection topic to its own thread or \| multiplexed multiple JMS topics per thread and looped over them. \| The main thread can interrogate the thread with a lock, that \| should be faster than every thread trying to acquire the lock. \| It would be driven by the main thread but the work is done in \| the background. The threads can keep the fetched messages in \| memory until the main thread is ready for them. \| \| I think with the right abstraction, thread safety can be \| achieved and concurrency shouldn't be something to be afraid \| of. It is very difficult and challenging working at the low \| levels of concurrency such as a concurrent browser engine. \| (I've not done that though.) \| \| This is why languages such as Pony, Inko, Cyber, \| Erlang and Elixir are so promising. We can build high performance \| systems that parallelise. \| \| Writing an async/await pipeline that looks synchronous is far \| easier to understand and maintain than nested callbacks. So I \| can see where async is useful. I just hope we can design async \| software to be simpler to maintain and extend. \| whoopdeepoo wrote: \| I don't write any colored function code in Python, I'd much \| rather work with process/thread pools \| Animats wrote: \| Me too, but threading is botched in Python. Not just the \| Global Interpreter Lock. Some Python packages are not \| thread-safe, and it's not documented which ones are not. Years ago I \| discovered that cPickle was not thread safe, and that wasn't \| considered a problem. \| michael_j_x wrote: \| I am not sure I agree that the GUI is a good use case for \| async. A human interaction with the program must almost always \| pre-empt whatever the program was running, so I can not see how \| a cooperative multi-threading runtime like async Python can \| work in such a scenario. 
\| kodablah wrote: \| It is for this reason in Temporal Python[0], where we wrote a \| custom durable asyncio event loop, that we maintain strong \| references to tasks that are created in workflows. This wouldn't \| be hard for other event loop implementations to do too. \| \| 0 - https://github.com/temporalio/sdk-python \| make3 wrote: \| he never said it was hard, his point is that it's unintuitive & \| a lot of people don't know or don't remember \| kodablah wrote: \| I mean the default asyncio event loop can be \| replaced/extended where you won't have to know/remember on \| each create_task. But yes, it is an unintuitive default. \| NelsonMinar wrote: \| Does anyone understand why the event loop only keeps weak \| references to tasks? It'd seem wise to do something to stop it \| from being garbage collected while running, maybe also while \| waiting to run. \| coopsmoss wrote: \| I agree, I think this is very unpythonic behavior \| masklinn wrote: \| Only guess I'd have is to protect the system against infinite- \| loop tasks, but I don't remember any other runtime caring and \| an a task which never terminates seems easier to diagnose than \| one which disappears on you. \| kortex wrote: \| Because it's almost always the case that the consumer is going \| to keep a reference to the task in some way, so that is the \| logical choice for the "primary owner" of the task. Python \| doesn't have ownership per se like rust, but if you keep more \| than one hard reference to an object around, it'll prevent \| collection, so in cases such as this it makes sense to \| designate one primary owner and have all other references be \| weakref. \| skitter wrote: \| > if you keep more than one hard reference to an object \| around, it'll prevent collection \| \| Which is the behavior the parent comment asks for. \| anthomtb wrote: \| Well, looks like I know what I am doing first thing on Monday. I \| converted a bunch of code to asyncio a while back. 
I have yet to \| run into any heisenbug in that code and want to keep it that way. \| cpburns2009 wrote: \| I've been working on a PySide6 application recently using \| asyncio. I read the docs but totally overlooked the requirement \| to hold references to tasks created with `create_task()`. \| dehrmann wrote: \| Eww. What's especially nasty is this is the opposite behavior of \| threads. \| aeturnum wrote: \| I really think this writer doth protest too much. \| \| Yes, the base async interface is confusing and overly complex. \| It's a downside! As they note, lots of people have stepped in to \| provide better helpers (like TaskGroups) - but these are the docs \| for the base library! \| \| > _But who reads all the docs? And who has perfect recall if they \| do?_ \| \| Everyone reads the docs? That is why you don't need perfect \| recall because you can read them whenever you want. \| \| Python has lots of confusing corner cases ("" is truthy, you need \| to remember to call copy [or maybe deepcopy!] sometimes, all the \| other situations where you confuse weak vs. strong references). \| They cause really common bugs. It's just a hazard of the language \| in general and the choices it makes (much like tasks being \| objects is a hazard). I do understand why people think they can \| throw away task references (based on other languages) - but this \| is Python! The garbage collector exists and you gotta check if \| you own the object or something else does. \| \| Edit: this feels like an experienced Python developer, who has \| already internalized all the older, non-async Python weirdness, \| being taken aback by weirdness they didn't expect. Like, I feel \| you, it does suck - but it's not a bug that values you don't \| retain may get garbage collected. \| No1 wrote: \| He didn't even have to read "all the docs" - just the ones that \| pertain to the function that he is using. And then not ignore \| the section marked "Important" _and_ the highlighted "Note". 
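The weak-versus-strong reference distinction argued over above can be seen in isolation with a plain-Python analogy. This is a sketch, not asyncio's actual bookkeeping: the `Job` class and `registry` name are hypothetical, and the only claim is about how `weakref.WeakSet` behaves once the last strong reference to a member is dropped.

```python
import gc
import weakref


class Job:
    """Hypothetical stand-in for a task-like object."""


def demo():
    registry = weakref.WeakSet()  # holds weak references only
    job = Job()                   # the caller's strong reference
    registry.add(job)
    before = len(registry)

    del job        # drop the only strong reference
    gc.collect()   # make collection explicit rather than relying on refcounting
    after = len(registry)
    return before, after


if __name__ == "__main__":
    print(demo())
```

A registry like this never keeps its members alive on its own, which is why the caller has to hold the strong reference.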
\| richbell wrote: \| What if he read the docs for that function prior to the \| "important" note being added? \| Karunamon wrote: \| > _Everyone reads the docs?_ \| \| The author goes on to say they found this pattern lurking in \| various projects on github. So, no. The problem is that this \| behavior is subtle, not intuitive, and unless you are reading \| the actual documentation top to bottom (and not just the \| function signature and first paragraph from the pop up in your \| IDE) you will likely get bitten by this. \| \| What is the point of your comment? The author _shouldn 't_ have \| called out the upturned rake in the darkened shed? \| rollcat wrote: \| > The author goes on to say they found this pattern lurking \| in various projects on github. \| \| I'd call it an anti-pattern. If you spawn a process/thread, \| and never wait/join it, it means you don't actually care what \| it does, if it crashes, etc. I don't see a problem with \| Python's behavior here. \| aeturnum wrote: \| I wouldn't say _shouldn 't_ - they are free to do what they \| want. But this is a blog post about something that can trip \| you up that the docs highlight - which the author calls a \| "heisenbug". The author doesn't even have a suggestion for \| the docs, which already calls out the problem they \| encountered, they just note that there are helpers for this \| problem (which is true). \| \| The point of my comment is that subtle, non intuitive things \| like this are all over Python and, while this one is \| _particularly bad_ , this blog post makes it seem like more \| of an aberration than it is. \| IshKebab wrote: \| > Everyone reads the docs? \| \| Wow I've heard people say that everyone _should_ read all of \| the docs (which isn 't really true) but I've never heard anyone \| claim that everyone _does_ read all of the docs! Wild. \| raverbashing wrote: \| > "" is truthy \| \| Humm, no? 
Unless you mean ("",)
\|     >>> not ""
\|     True
\| aeturnum wrote: \| Oh, sorry, you are right - "" is false-y, even though it's a \| valid empty value. So it's hard to tell the difference \| between a value not being filled and a value being filled \| with an empty value. \| \| ex:
\|     answers = {}
\|     answers["I exist"] = ""
\|     if answers["I exist"]:
\|         print("a")
\| does not print. \| fbdab103 wrote: \| I guess I am too deeply in the Python ecosystem to see a \| problem here. Unless you want to check for the existence of \| "I exist"? In which case, the Python Way would be
\|     answers = {}
\|     answers["I exist"] = ""
\|     if "I exist" in answers:
\|         print("a")
\| pacaro wrote: \| Maybe ...
\|     if answers.get('I exist'):
\|         print('a')
\| \| Which is why you should always explicitly check for \| _None_ if that is your intent. \| aeturnum wrote: \| It's not a problem? The async interface isn't a problem \| either. It's just a thing you have to remember about \| python: "most input is truthy except for the input that \| isn't" \| \| "Most of the time you don't disrupt your program by not \| keeping the returned reference in scope except for when \| you do" \| \| It's just a thing that trips people up. \| dwattttt wrote: \| > It's just a thing you have to remember ... \| \| The more of these things there are, the more brainpower \| you devote to remembering the right way to do things; if \| you don't you introduce bugs, a subtle, painful one here. \| heavyset_go wrote: \| "Empty containers are falsy" is a Python fundamental, \| this isn't a subtle bug, but an obvious one. \| fbdab103 wrote: \| Truthy is a Pythonic core principle of the language. It \| is not an edge case phenomenon in the language which I \| would expect a regular practitioner to confuse. \| \| https://docs.python.org/3/library/stdtypes.html#truth- \| value-... \| aeturnum wrote: \| I mean, I've seen bugs around that in code I've worked on \| and I've created bugs where it's a factor. 
\| \| Weakrefs are also a core part of the language: \| https://docs.python.org/3/library/weakref.html . You \| can't use python without using them. \| fiddlerwoaroof wrote: \| What I learned when I wrote Python professionally was \| "never rely on truthiness" explicitly writing out a \| boolean expression that does what you want is more \| explicit ("explicit is better than implicit", PEP 8) and \| prevents a whole class of bugs down the line. \| nemetroid wrote: \| PEP 8, which you mention, explicitly recommends relying \| on truthiness: \| \| > For sequences, (strings, lists, tuples), use the fact \| that empty sequences are false:
\|     # Correct:
\|     if not seq:
\|     if seq:
\|
\|     # Wrong:
\|     if len(seq):
\|     if not len(seq):
\| AeroNotix wrote: \| PEP8 is touted a lot as if it is a perfectly correct tome \| of ... correctness. I've worked in Python long enough to \| know that it both doesn't cover everything and the advice \| is sometimes actively bad. \| heavyset_go wrote: \| > _if answers["I exist"]:_
\|     if "I exist" in answers:
\|         ...
\| wizzwizz4 wrote: \| > _So it's hard to tell the difference between a value not \| being filled and a value being filled with an empty value._
\|     >>> answers = {}
\|     >>> if answers["I don't exist"]:
\|     ...     print("a")
\|     Traceback (most recent call last):
\|       File "", line 1, in
\|         if answers["I don't exist"]:
\|     KeyError: "I don't exist"
\| \| The method you're trying to use doesn't work _anyway_: it \| doesn't matter that it's confusing. You'd have the same \| problem with the value False. \| Etheryte wrote: \| I think you may be too bold with the assumption here, \| personally I would wager that the majority of people who write \| Python don't even know Python has official docs outside of a \| site called Stack Overflow. \| leni536 wrote: \| Considering how many times I need to add site:python.org to my \| python search queries to actually get to the docs, I assume \| that a surprisingly low number of python developers actually \| read the docs. 
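The truthiness subthread above mixes three different questions: is the key present, is the value None, and is the value falsy. A small sketch that keeps them apart may help; the function name `classify` and its return labels are invented for illustration and are not from PEP 8 or the thread.

```python
def classify(answers, key):
    """Separate 'missing', 'None', 'falsy' and 'filled' for a dict lookup."""
    if key not in answers:      # membership test: no KeyError, no truthiness
        return "missing"
    value = answers[key]
    if value is None:           # explicit identity check, as pacaro suggests
        return "none"
    if not value:               # "", 0, [], {} and False all land here
        return "falsy"
    return "filled"


if __name__ == "__main__":
    answers = {"I exist": "", "unset": None, "name": "willm"}
    for key in ("I exist", "unset", "name", "I don't exist"):
        print(key, "->", classify(answers, key))
```

Which of the four cases matters depends on intent; the point is only that `if answers[key]:` collapses the last three into one branch and raises on the first.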
\| 0x008 wrote: \| If you use DuckDuckGo you can prefix your search with "!py3" \| iforgotpassword wrote: \| > Everyone reads the docs? \| \| For Python? The language where everyone just cobbles together \| random code from the internet and other repos? I can totally \| see how this mistake happens left and right. The bar to entry \| for this language is way too low to assume only rigorous senior \| devs use it. \| bandyaboot wrote: \| He doesn't really get into what makes this a Heisenbug, only that \| it's indeterminate in nature. Would attaching a debugger/stepping \| through the code make it less likely that your task would get \| garbage collected out from under you? \| Izkata wrote: \| You're probably going to need a reference to the task in order \| to inspect it in the debugger. Creating that reference prevents \| the bug. \| foobarbecue wrote: \| Yeah, he seems to be re-defining the term to mean "a bug that \| occurs occasionally depending on system state" as opposed to "a \| bug that changes behavior when you observe it closely e.g. in a \| debugger." \| macintux wrote: \| The first is a common way of using the term Heisenbug. I \| first heard it used that way 10 years ago when discussing \| Erlang's error handling model. \| throwaway81523 wrote: \| CPython does most of its memory management by reference \| counting, which fails to reclaim circular structures. So to make \| sure it gets everything, it occasionally runs a conventional \| tracing GC. If the GC happens to run just after you create that \| async task, the task itself can get collected, it sounds like. \| It's good to know about this and is (my own editorializing) yet \| another reason Python3 should have used Erlang-style \| concurrency instead of this async stuff. \| No1 wrote: \| His argument hinges on "I can't be bothered to read the docs on \| the stuff I'm using." 
So instead of reading the docs on \| coroutines and tasks before using them, writes a rant about how \| it's all wrong because he didn't understand how it works. \| \| On a more fundamental level, why would anyone assume that a \| coroutine is guaranteed to complete if it is never awaited? There \| is no reason a scheduler could not be totally lazy and only \| execute the coroutine once awaited. \| \| At least he bothered to make note of TaskGroups, also clearly \| shown in his documentation screenshot, immediately above the \| section marked _Important_ that went ignored, and finishes with \| "As long as all the tasks you spin up are in TaskGroups, you \| should be fine." Yep, that's all there was to it. \| ptx wrote: \| > _There is no reason a scheduler could not be totally lazy and \| only execute the coroutine once awaited._ \| \| Isn't the point of create_task (which is what the article is \| about) to launch concurrent tasks without immediately awaiting \| them? The example in the docs [1] wouldn't work (in the stated \| manner) if the task didn't start until it was awaited. \| \| > _At least he bothered to make note of TaskGroups [...] Yep, \| that 's all there was to it._ \| \| That only works on Python 3.11, which was released just a few \| months ago. Debian still uses 3.9, for example, so the \| TaskGroups solution can't be used everywhere yet. \| \| [1] https://docs.python.org/3/library/asyncio- \| task.html#coroutin... \| zackees wrote: \| [dead] \| [deleted] \| [deleted] \| m3047 wrote: \| Hrmmmm. \| \| > But who reads all the docs? \| \| asyncio.create_task() doesn't exist in 3.6, and I can't find the \| string "to avoid a task disappearing" in the doc, so I'll go out \| on a limb: there is no such doc. However I see the reference to \| weakref.WeakSet. \| Jtsummers wrote: \| The world didn't end in 2016. 
Welcome to seven years in the \| future where this documentation does, in fact, exist: \| \| https://docs.python.org/3/library/asyncio-task.html#asyncio.... \| cutler wrote: \| Maybe grafting async onto a single threaded dynamic language just \| isn't such a good idea in the first place. \| murphy214 wrote: \| bingo \| qxmat wrote: \| Python has a few weird issues like this. The last one I \| encountered was with a class inheriting Thread, join and the SQL \| Server ODBC driver on Linux. Fairly sure I hit page faults thanks \| to a shallow copy on driver allocated string data but didn't have \| the time to investigate like the hero of this blog post. \| whoopdeepoo wrote: \| > But who reads all the docs \| \| Why is this so common? Do people seriously not read a \| language/library documentation? That's the absolute first thing I \| do when evaluating a technology. \| adamckay wrote: \| Because people have deadlines and need to get things working. \| You read enough to figure out how to do what you need to do and \| then mostly move on. \| \| This function was added in 3.7 with no note on the importance \| of saving a reference. In 3.9 a note was added "Save a \| reference to the result of this function, to avoid a task \| disappearing mid execution." which was then expanded with the \| explanation of a weak reference in 3.10. \| skitter wrote: \| It absolutely is common. People see there is a len function \| that takes one argument, they call len(some_collection), see \| that it indeed returns the number of items in the collection \| like they expect and move on. They don't expect len to return a \| negative number instead on Thursdays, and of course it doesn't \| because that would be a pretty big footgun. People also see \| that there is a create_task function that takes a coroutine, \| they call create_task(some_coroutine), see that the coroutine \| indeed runs like they expect, and move on. 
Sure, you're \| _supposed_ to await the result, but maybe they don't need the \| awaited value anymore, only the side effects, and see that it \| still works. \| throwaway81523 wrote: \| I had a manager who actually told me not to read docs. I was a \| bad report and read them anyway. \| winter_blue wrote: \| This article just makes me feel like Python, while a language \| with nice-ish syntax, is a language that was poorly hacked and \| put together with little concern/thought about the real-world \| implications of poor design decisions like this async design \| decision (and also dynamic typing - _a terrible thing in any_ \| language). \| crdrost wrote: \| Most languages have something like this, usually around async. \| \| For instance NodeJS has had a bit of this around promises, and \| eventually needed to institute the rule "if a promise rejects \| with an error, and nobody is around to hear it, we will crash \| your program on the assumption that you probably needed to \| clean up some resources but didn't and now they're going to \| leak. Listen to the error with a handler that does nothing, if \| we are wrong about that." \| macintux wrote: \| One of many reasons I like Erlang: _everything_ is async, so \| you have plenty of tooling/libraries/core language features \| to support you. \| photochemsyn wrote: \| 'async footguns' returns 20,000+ hits on Google. Top one \| happens to be: \| \| https://news.ycombinator.com/item?id=32086973 \| \| > "Async seems to be the first big "footgun" of Rust. It's \| widespread enough that you can't really avoid interacting with \| it, yet it's bad enough that it makes..." \| deschutes wrote: \| Fun stuff. Why aren't unfinished tasks gc roots? \| [deleted] \| [deleted] \| dehrmann wrote: \| Another common async footgun I see is unthrottled gathering, and \| no throttling mechanism in the standard library. 
Once you gather \| an unspecified number of awaitables, bad things start to happen, \| either with CPU starvation, local IO starvation, or hammering an \| external service. \| \| What I like about threads is they make dangerous things like this \| harder, and you have to put more thought into how much concurrent \| work you want outstanding. They also handle CPU starvation better \| for things that are latency-sensitive. I've seen degenerate \| requests tie up the event loop with 500 ms of processing time. \| rednafi wrote: \| Huh! Unless you're using semaphores, you can also recreate \| a similar situation with threads. Spin up a whole bunch of \| threads and send all of them towards some shared object or make \| 100s of requests with them. \| \| There's not much difference between spinning up threads \| explicitly and creating an async task with asyncio.create_task. In \| either case, you can throttle them with semaphores. \| dehrmann wrote: \| I don't have a source or affected versions, but semaphores \| can scale poorly. I vaguely remember each blocked acquire \| getting checked on every event loop iteration, or something \| silly like that. \| acjohnson55 wrote: \| Something linters can help with, I would think? \| ryanianian wrote: \| C++ has nodiscard which is super useful for scenarios like this \| where ownership can be tricky. \| smetj wrote: \| Start a thread/greenthread/fiber/process/task without holding a \| reference to at least tie all loose ends at exit? Hmm dunno. \| tgv wrote: \| You can do that in go. You don't even get a reference to the \| thread/goroutine. \| nixpulvis wrote: \| Fire and forget. \| crabbone wrote: \| In many years since asyncio has been added, I have never used it \| willingly, outside of the cases where a third-party library \| required it. There has never been a practical benefit for any of \| that stuff when compared to select. 
It always worked poorly and \| never justified the effort one has to put into writing code that \| uses the library. The behavior OP describes is just one of the \| many bad design decisions that are so characteristic of this \| library. \| pyuser583 wrote: \| I don't find this behavior odd at all. Dereferencing unassigned \| values is normal Python garbage collector behavior. Threads are \| an exception (no pun intended), but they're an exception in lots \| of ways - just try pickling them. \| samsquire wrote: \| Thank you for this. This is really useful information. \| \| I recently adapted some garbage collection code to add register \| scanning. \| \| I can imagine all sorts of subtle bugs where things go away \| randomly. One problem I have with my multithreaded code is that \| sometimes a thread crashes and the logs are so long I don't \| notice. From my perspective the thread is just not doing \| anything. \| \| Sometimes the absence of behaviour can be really tricky to debug! \| sgt wrote: \| Is this something go developers also have to be careful with when \| using goroutines? \| gerad wrote: \| No. But sometimes goroutines have the opposite problem, where \| they don't terminate and get cleaned up. \| \| https://betterprogramming.pub/common-goroutine-leaks-that-yo... \| [deleted] \| candiddevmike wrote: \| Is there an (easy?) test for checking goroutine leaks? \| Snawoot wrote: \| Yes, it's visible on goroutine profile, provided by built-in \| profiler pprof. E.g.: \| https://github.com/mysteriumnetwork/node/issues/5311#issueco... \| Jtsummers wrote: \| No. Goroutines don't generate a reference to hold onto, either. \| They just run until they or the program terminate. \| [deleted] \| makomk wrote: \| Well, this explains that one really annoying intermittent bug \| that I was having in some asyncio-based code. \| aardvark179 wrote: \| The same problem or something similar exists in many languages. 
\| Threads are GC roots because the OS knows about them, but this \| may not be true for lightweight threads or async callbacks. \| \| It is hard to fix because you don't want to introduce references \| from an old object (such as a list of callbacks) to many new \| objects as that will introduce GC issues, and many other \| potential leaks. \| jarboot wrote: \| If I want to create a task that runs even after the function \| returns, ie "async def f(): \| asyncio.create_task(coro=10_second_coro.run()); return;" is there \| any way to mitigate this? Function-scoped set of tasks? \| nhumrich wrote: \| Yes, read the last part of the included documentation and hold \| onto background tasks. \| jmholla wrote: \| Your task is implicitly not function-scoped as you want it to \| survive exiting the function. What you're doing here would be \| better architecturally done with threads. async is not a direct \| replacement for threading. \| \| But, you could also return the task object to the caller and \| have them manage it. There's also nothing async about your \| function, so you don't need the async or to await it. \| cmstodd wrote: \| Thanks for posting. \| nixpulvis wrote: \| Hey, at least it's documented... good developers actually RTFM. \| \| I can't comment on the design of this API, because I don't feel \| like learning the library, but in some performance critical \| applications these sorts of contracts aren't all that uncommon. \| Granted, this is python, I guess it's a bit more suspicious, IDK. \| vbernat wrote: \| The documentation update is quite recent (Python 3.11). It was \| added after this ticket: https://bugs.python.org/issue44665 \| (not the first ticket around this problem). \| [deleted] \| osigurdson wrote: \| A little pedantic but HUP concerns the fundamental limits of \| simultaneously knowing a particle's position and momentum, not \| about observation impacting outcomes. \| notatoad wrote: \| wow. 
yeah, this absolutely explains a heisenbug that i've been \| chasing for a while. and i can't count the number of times i've \| had that exact doc page open on my screen in the last few months, \| and never bothered to read that block of text that starts with \| "important"... \| \| thanks \| aldenpage wrote: \| That's extremely insidious. I suppose I never encountered this \| issue because I almost always call asyncio.gather(*), which makes \| having a collection of tasks natural. \| kortex wrote: \| This is good form. It makes top-level control flow easier to \| follow, and keeps the concurrency scoped. \| BiteCode_dev wrote: \| And this is why trio got it right, and why I think the task \| groups (nurseries from trio) can't arrive soon enough in the \| stdlib. \| \| Because not only you must maintain a reference to any task, but \| you should also explicitly await it somewhere, using something \| like asyncio.wait() or asyncio.gather(). \| \| Most people don't know this, and it makes asyncio very difficult \| to use for them. ___________________________________________________________________ (page generated 2023-02-11 23:00 UTC)
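For readers skimming the thread, the advice repeated above (hold a strong reference to every task, and await the tasks somewhere before exiting) can be sketched roughly as follows. This is a minimal sketch rather than anyone's actual code: it follows the pattern shown in the asyncio.create_task documentation, and the names `spawn`, `work`, and `background_tasks` are hypothetical and chosen only for illustration. It avoids TaskGroup so that it also runs on Python versions before 3.11.

```python
import asyncio

# Strong references to in-flight tasks. The event loop only keeps weak
# references, so a task that nothing else points to may be collected.
background_tasks = set()


def spawn(coro):
    """Hypothetical helper: schedule coro and keep it referenced until done."""
    task = asyncio.create_task(coro)
    background_tasks.add(task)
    # Drop the reference once the task finishes so the set does not grow.
    task.add_done_callback(background_tasks.discard)
    return task


async def work(results, value):
    """Stand-in for a real background job (an assumption for this sketch)."""
    await asyncio.sleep(0.01)
    results.append(value)


async def main():
    results = []
    for i in range(3):
        spawn(work(results, i))
    # Explicitly wait for whatever is still pending before returning.
    await asyncio.gather(*background_tasks)
    return results


results = asyncio.run(main())
```

Keeping the set at module level is just one option; a set owned by a longer-lived object works equally well, as long as something holds each task until it completes.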