[HN Gopher] Zoom: Remote Code Execution with XMPP Stanza Smuggling
___________________________________________________________________
 
Zoom: Remote Code Execution with XMPP Stanza Smuggling
 
Author : Flowdalic
Score  : 173 points
Date   : 2022-05-24 15:00 UTC (8 hours ago)
 
web link (bugs.chromium.org)
w3m dump (bugs.chromium.org)
 
| thinkmassive wrote:
| Heh, it's like an AIM punter, but better!
 
| jeffbee wrote:
| At some point we are going to need enforceable professional
| standards that effectively deal with commercial software
| publishers who choose to parse untrusted inputs in non-
| performance-sensitive contexts with C libraries.
 
  | TedDoesntTalk wrote:
  | We are? Why?
 
    | defen wrote:
    | Since most software users are not tech-savvy and care about
    | convenience and price significantly more than they care about
    | security (revealed preference), the "worse is better"
    | phenomenon incentivizes commercial developers to implement
    | the minimum security practices that their customers will
    | bear. This is individually rational for the developers and
    | the users, but the result is untold billions of dollars of
    | costs. Regulation would be one way to change the
    | incentives.
 
  | turminal wrote:
  | This bug has nothing to do with language choice.
  | 
  | I agree that better professional standards and accountability
  | should be introduced for software like zoom though.
 
  | userbinator wrote:
  | No. We don't need more authoritarian dystopia.
 
| bobbylarrybobby wrote:
| Having multiple, potentially different parsers is incredibly
| dangerous. One person used the fact that different plist parsers
| in the macOS kernel choked in different ways when interpreting
| malformed xml, leading some to believe the plist was "safe"
| because it did not grant certain permissions, while others
| trusted this "safe" plist but believed it did grant these
| permissions.
| 
| https://blog.siguza.net/psychicpaper/
 
| dqv wrote:
| I didn't even consider the existence of XMPP vulns until I
| listened to the Darknet Diaries episode about Kik[0]. It's a
| really interesting class of vulnerabilities.
| 
| [0]: https://darknetdiaries.com/episode/93/
 
| dgellow wrote:
| Some relevant info in case you don't want to read the whole
| description but wonder if you're concerned by the issue:
| 
| > Zoom fixed the server-side issues in February and client-side
| issues on April 24 in version 5.10.4.
| 
| > Zoom published a security bulletin about client-side fixes at
| https://explore.zoom.us/en/trust/security/security-bulletin
| 
| CVE-2022-25235 CVE-2022-25236 Fixed-2022-Apr-24 CVE-2022-22784
| CVE-2022-22785 CVE-2022-22786 CVE-2022-22787
 
| kevincox wrote:
| This is another lesson that you should always parse+serialize
| rather than just validate. It is much harder to smuggle data this
| way to exploit different parsers.
| 
| Basically the set of all messages that will satisfy your
| validator is far larger than the set of all messages that will be
| produced by your serializer.
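| 
| A minimal sketch of the idea, using Python's standard
| xml.etree.ElementTree as a stand-in parser (the canonicalize
| helper is hypothetical, not anything Zoom or an XMPP server
| ships): forward only what your own serializer emits, never the
| sender's original bytes.

```python
import xml.etree.ElementTree as ET


def canonicalize(raw: bytes) -> str:
    """Parse untrusted XML, then re-serialize it with our own serializer.

    Downstream consumers only ever see the serializer's output, never the
    sender's original bytes. Raises ET.ParseError on malformed input.
    """
    root = ET.fromstring(raw)
    return ET.tostring(root, encoding="unicode")


# Two different spellings of the same stanza collapse to one output form.
a = canonicalize(b"<message  to = 'a'><body>hi</body></message >")
b = canonicalize(b'<message to="a"><body><![CDATA[hi]]></body></message>')
print(a == b)
```

| This narrows what reaches downstream code only to the extent
| that the parser itself rejects invalid input.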
 
  | lovasoa wrote:
  | I am not sure this applies in this case. I don't know how
  | Zoom's XMPP backend works, but it could very well parse and
  | serialize and still be vulnerable. If the xml library accepts
  | invalid 3-byte utf8 characters on parse, then its internal
  | representation supports these characters, and I don't see why
  | they would not be serialized just as well.
 
  | fsflover wrote:
  | Or, it's another lesson that you should not completely trust
  | any code but compartmentalize instead. Thanks to Qubes OS, I am
  | still safe, since Zoom is running in a hardware-virtualized VM.
 
    | JoshTriplett wrote:
    | I'm safe as well, because I only use the web version of Zoom.
    | Code you don't trust should always run in a sandbox, if it
    | runs at all.
 
      | fsflover wrote:
      | This is however a very different level of sandboxing.
 
        | JoshTriplett wrote:
        | Sure, but it's much easier for most people to run things
        | in a browser sandbox.
 
    | jeffbee wrote:
    | How is that helpful? This exploit completely replaces the
    | Zoom software with arbitrary attacker software and it
    | executes in your VM that has access to camera, microphone,
    | network, and presumably screen recording. It sounds to me
    | like the highest possible level of access and your VM is just
    | performative.
 
      | fsflover wrote:
      | 1. It will not have access to anything else than Zoom.
      | 
      | 2. It will not have access to the camera or network, when
      | I'm not using Zoom.
      | 
      | 3. If I'm using a disposable VM, it's cleaned every reboot.
      | 
      | > and presumably screen recording
      | 
      | Screen recording of this VM.
 
        | jeffbee wrote:
        | How is screen recording only of Zoom itself of any use to
        | you?
 
        | fsflover wrote:
        | If needed, I can move a presentation to that VM, or open
        | a browser in it.
        | 
        | It gets a bit complicated if you want to share a screen
        | from another VM, see https://forum.qubes-os.org/t/share-
        | screen-of-qube-with-anoth...
 
  | ifratric1 wrote:
  | XMPP servers (including Zoom's) already parse + serialize ;)
 
| robertlagrant wrote:
| This vuln writeup is extremely well written. Actually quite
| interesting to read!
 
| twoodfin wrote:
| The XML parsing/validation bugs are, I suppose, not shocking, but
| deeply disappointing.
| 
| The _one thing_ XML & its tooling were supposed to get right was
| document well-formed-ness. Sure, it might be a mess of a standard
| in other ways, but at least we could agree what a parser should
| and shouldn't accept! (Not the case for the HTML tag soup of then
| or now.)
| 
| That, 25 years on, a popular XML processor can't even meet that
| low bar for _tag names_ is maddening.
 
  | jerf wrote:
  | Unfortunately, the problem here is programmers moreso than
  | formats. It literally doesn't matter what you specify,
  | programmers will not implement it to a T. Most programmers
  | simply don't know that every single detail matters. Many of
  | those who may have some idea don't really care, since they
  | can't imagine how something like this could happen.
  | 
  | It's not just XML. It's every ecosystem I've ever used. Push it
  | around the edges and you _will_ find things.
  | 
  | This is neat, not because it is special to JSON in particular
  | but because it's an example of examining a good chunk of a
  | large ecosystem: https://seriot.ch/projects/parsing_json.html
  | Consider this is likely to be true in any ecosystem that
  | doesn't make it a top priority to avoid.
 
    | mwcampbell wrote:
    | I suppose it's safest to use a binary format where variable-
    | length fields are prefixed with their length.
 
      | amluto wrote:
      | More generally, if you want to include a block of
      | untrustworthy structured data in a protocol, it's very much
      | preferable to do so in a way that does not require
      | inspecting the data in question to figure out where it ends
      | and thus where the outer protocol resumes.
      | 
      | English is not immune. Think about "who's on first" --
      | there is no way to distinguish the untrustworthy name "who"
      | from a grammatical part of the conversation.
 
      | jandrese wrote:
      | Sure if you like ingesting 4GB records. There is nothing
      | inherently safer in binary formats. It's easy to write
      | parsers that can handle properly formatted files, it is
      | when you're dealing with corrupt or malformed files that
      | everything gets complicated.
 
        | teakettle42 wrote:
        | > There is nothing inherently safer in binary formats.
        | 
        | Sure there is. Barring a pathologically bad wire format
        | design, they're easier to parse than an equivalent human
        | editable encoding.
        | 
        | Eliminating the human-editing ability requirement also
        | enables us to:
        | 
        | - Avoid introducing character encoding -- a huge problem
        | space just on its own -- into the list of things that all
        | parsers must get right.
        | 
        | - Define non-malleable encodings; in other words, ensure
        | that there exists only one valid encoding for any valid
        | message, eliminating parser bugs that emerge around
        | handling (or not) multiple different ways to encode the
        | same thing.
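        | 
        | One concrete instance of a non-malleable encoding,
        | sketched under the assumption of a LEB128-style varint
        | (the format and the decode_varint helper are
        | illustrative, not taken from any protocol discussed
        | here): the decoder refuses padded spellings, so each
        | value has exactly one byte form.

```python
from typing import Tuple


def decode_varint(data: bytes) -> Tuple[int, int]:
    """Decode one unsigned LEB128 varint; return (value, bytes_consumed).

    Rejects non-minimal encodings so each value has exactly one byte form.
    """
    value = 0
    for i, byte in enumerate(data[:10]):  # look at no more than 10 bytes
        value |= (byte & 0x7F) << (7 * i)
        if byte & 0x80 == 0:  # high bit clear: this is the last byte
            if i > 0 and byte == 0:
                # A trailing zero group adds nothing: padded encoding.
                raise ValueError("non-minimal varint encoding")
            return value, i + 1
    raise ValueError("truncated or overlong varint")


print(decode_varint(b"\x01"))
try:
    decode_varint(b"\x81\x00")  # also means 1, but padded with a zero group
except ValueError as exc:
    print("rejected:", exc)
```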
 
      | jerf wrote:
      | Assuming properly-created data, yes. You aren't immune to
      | problems but you will reduce them, especially in a memory-
      | safe language.
      | 
      | Unfortunately, in a security context, that is not only not
      | guaranteed, but will be actively attacked, so in practice
      | I'm not sure it buys you _that_ much from a security
      | perspective. A net positive, I think, but certainly not
      | enough that you can metaphorically kick back and enjoy your
      | lemonade.
      | 
      | The binary format is one of the oldest of security
      | vulnerabilities, by simply claiming a length of larger than
      | the buffer allocated in the C program, though I'm inclined
      | to credit that particular joy to C and not the data itself.
      | Nowadays there aren't many languages where simply claiming
      | to be really long will get you anywhere like that.
 
      | ajsnigrutin wrote:
      | Sure, until someone sets the prefix to 100MB large, and
      | sends zero bytes of data :)
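      | 
      | A hedged sketch of how a reader might guard against that,
      | assuming a 4-byte big-endian length prefix and a made-up
      | cap (read_frame and MAX_FRAME are hypothetical names): the
      | declared length is checked before anything is allocated.

```python
import struct

MAX_FRAME = 64 * 1024  # assumed cap; pick whatever the protocol allows


def read_frame(buf: bytes, max_len: int = MAX_FRAME):
    """Return (payload, rest) for one length-prefixed frame, or None.

    None means "need more bytes". The declared length is checked against a
    cap before the reader commits to buffering that much data.
    """
    if len(buf) < 4:
        return None
    (length,) = struct.unpack(">I", buf[:4])  # 4-byte big-endian length
    if length > max_len:
        raise ValueError(f"declared length {length} exceeds cap {max_len}")
    if len(buf) < 4 + length:
        return None  # incomplete frame; caller buffers at most 4 + max_len
    return buf[4 : 4 + length], buf[4 + length :]
```

      | A sender that claims 100MB is rejected on the header
      | alone; one that claims a legal size and then stalls still
      | needs a read timeout, which this sketch does not cover.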
 
    | IshKebab wrote:
    | I disagree. The way the format is designed has a direct
    | effect on how likely implementors are to implement it
    | correctly. So the format designers bear some responsibility.
    | 
    | For example how many Protobuf parser libraries have security
    | bugs? I'm guessing very few because the standard is nice and
    | simple, and it's very clearly defined without much "it's
    | probably like this" wiggle room (much easier for binary
    | formats!).
    | 
    | XML had a ton of unnecessary complexity that could have been
    | avoided to make implementations simpler. I haven't actually
    | read this bug so let's see if it was one of:
    | 
    | * Closing tags having to repeat the name / two different ways
    | of closing tags.
    | 
    | * CDATA
    | 
    | * Namespaces (especially how they are defined)
    | 
    | * &entities;
    | 
    | Edit: Ha it wasn't any of those - but it was still an issue
    | with text based formats. Seems like Expat assumes the content
    | is _valid_ UTF-8 (and doesn't validate it), while Gloox
    | assumes it is ASCII. Obviously this couldn't have happened
    | with binary formats.
    | 
    | If you care about security DON'T USE TEXT FORMATS!
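    | 
    | A toy illustration of that kind of disagreement. Both
    | scanners are hypothetical and only loosely modeled on the
    | behavior described above, not on either library's real
    | code: one looks at every byte on its own, the other trusts
    | the UTF-8 lead byte and skips the rest of the sequence
    | unchecked.

```python
def tag_opens_bytewise(data: bytes) -> int:
    """Scanner A: treats every byte independently and counts '<' bytes."""
    return data.count(b"<")


def tag_opens_trusting_lead_byte(data: bytes) -> int:
    """Scanner B: trusts the UTF-8 lead byte for the sequence length and
    skips the following bytes without checking they are continuations."""
    count, i = 0, 0
    while i < len(data):
        byte = data[i]
        if byte == 0x3C:  # '<'
            count += 1
        if byte >= 0xF0:
            i += 4
        elif byte >= 0xE0:
            i += 3
        elif byte >= 0xC0:
            i += 2
        else:
            i += 1
    return count


# 0xEB announces a 3-byte character, but the next two bytes are not
# continuation bytes, so this is not valid UTF-8.
payload = b"<a>\xeb<b></a>"
print(tag_opens_bytewise(payload), tag_opens_trusting_lead_byte(payload))
```

    | The two counts differ, so the two scanners disagree about
    | how many tags the same bytes contain.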
 
      | salawat wrote:
      | Wrong.
      | 
      | If you care about security, _verify your goddamn
      | invariants_.
      | 
      | This is not a software problem. This is a lazy
      | programmer/software engineer problem. Electrical
      | Engineering, or hell, any mature engineering field
      | understands this concept.
      | 
      | If you have not read your entire codepath, _you have no
      | idea what it is you are doing_.
      | 
      | Welcome to why my life as a QA is effing miserable. Every
      | bit of ignorance by devs following the philosophy of
      | "abstraction is good" is dealt with at the level of
      | Software BoM audit.
      | 
      | All hail Time to Market!
 
        | KronisLV wrote:
        | > If you care about security, verify your goddamn
        | invariants.
        | 
        | While it would be nice to be able to do this, sadly we
        | don't have infinite resources, lest we be okay with
        | actually shipping software in 5-10 years instead of 1-2.
        | I know that I would be okay with such a world, but people
        | who pay my salary might not share that point of view. Nor
        | do the people who would have to choose an app to use in
        | the near future, instead of waiting for a decade to do
        | so.
        | 
        | > This is not a software problem. This is a lazy
        | programmer/software engineer problem. Electrical
        | Engineering, or hell, any mature engineering field
        | understands this concept.
        | 
        | The thing is, that the majority of the development out
        | there is like the Wild West. If my code throws a
        | NullPointerException or a NullReferenceException, then
        | someone is going to be mildly annoyed and it might result
        | in a Jira issue to fix. Code failing in a variety of ways
        | is almost considered normal in some respects, outside of
        | specific (expensive) contexts, where correctness matters
        | a lot.
        | 
        | Admittedly, even in programming there are fields where
        | the stakes are higher, though writing code for planes (as
        | an example) is wildly different than what 90% of people
        | out there would call "programming". Personally, I'd like
        | 100% test coverage (lines, code branches, everything),
        | but outside of these high stakes environments it would be
        | wasteful to do so.
        | 
        | > If you have not read your entire codepath, you have no
        | idea what it is you are doing.
        | 
        | For many out there, this is pretty much impossible to do
        | in a meaningful way. Let's use something like the Spring
        | framework, a popular option in Java for web dev, a stack
        | that has a rather high level of abstraction. In it, the
        | actual code path that you're dealing with would involve
        | your application code, the framework code (which is
        | likely many times longer than your actual application,
        | uses reflection and other complex mechanisms, overall
        | being truly Eldritch at times), any integrated libraries,
        | as well as the JVM and some other code on your actual
        | system, that interfaces with the JVM.
        | 
        | Even if you toss out Java from the stack, the actual hot
        | code path in any non-trivial piece of software will be
        | pretty difficult to reason about, due to different types
        | of linking, different external package versions etc.
        | Unless you feel okay with very, very slowly stepping
        | through everything with a debugger, which probably still
        | won't give you too good of an idea of what's actually
        | happening and what should have happened.
        | 
        | Though maybe traversing 20 layers of abstraction in
        | Spring and coming out of that debugging session more
        | confused than you were when you entered it is just a
        | Java/Spring thing, who knows.
        | 
        | > Welcome to why my life as a QA is effing miserable.
        | Every bit of ignorance by devs following the philosophy
        | of "abstraction is good" is dealt with at the level of
        | Software BoM audit.
        | 
        | I think there's plenty of misery to be had all around.
        | For a humorous take at the state of things, have a look
        | at this article:
        | https://www.stilldrinking.org/programming-sucks
        | 
        | > All hail Time to Market!
        | 
        | All hail being able to pay rent by delivering sub-optimal
        | software to meet ever changing business demands in an
        | environment where nobody wants to pay for perfect
        | software. That's simply the world we live in, take it or
        | leave it (e.g. pursue whichever environment feels better
        | to you, within the bounds of your opportunities in life).
 
    | twoodfin wrote:
    | This is just so basic a screwup though. The W3C spec for XML
    | has had a formal syntactic description of valid tag names for
    | decades:
    | 
    | https://www.w3.org/TR/2006/REC-xml11-20060816/#sec-common-
    | sy...
    | 
    | Plenty of libraries get this right because it's so easy.
    | You'd almost have to try--probably by being "clever"--to get
    | it wrong.
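    | 
    | For reference, the Name production from that section
    | translates almost directly into a regular expression. A
    | sketch in Python (is_xml_name is a hypothetical helper; the
    | ranges are transcribed from the spec's NameStartChar and
    | NameChar productions):

```python
import re

# NameStartChar ranges from the XML Name production.
_START = (
    ":A-Z_a-z"
    "\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    "\u0370-\u037D\u037F-\u1FFF\u200C-\u200D"
    "\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    "\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
# NameChar adds '-', '.', digits, U+00B7 and two further ranges.
_REST = _START + r"\-.0-9" + "\u00B7\u0300-\u036F\u203F-\u2040"

_NAME = re.compile(f"[{_START}][{_REST}]*")


def is_xml_name(text: str) -> bool:
    """True if the whole string matches the XML Name production."""
    return _NAME.fullmatch(text) is not None


print(is_xml_name("xml:lang"), is_xml_name("a<b"))
```

    | This only works on already-decoded characters; it says
    | nothing about how the bytes were turned into text.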
 
  | Diggsey wrote:
  | There are just so many issues here.
  | 
  | 1) Don't rely on two parsers having identical behaviour for
  | security. Yes parsers for the same format _should_ behave the
  | same, but bugs happen, so don't design a system where small
  | differences result in such a catastrophic bug. If you
  | absolutely _have_ to do this, at least use the same parser on
  | both ends.
  | 
  | 2) Don't allow layering violations. All content of XML
  | documents is required to be valid in the configured character
  | encoding. That means layer 1 of your decoder should be
  | converting a byte stream into a character stream, and layers 2+
  | should not even have the opportunity to mess up decoding a
  | character. Efficiency is not a justification, because you can
  | use compile-time techniques to generate the exact same code as
  | if you combined all layers into one. This has the added benefit
  | that it removes edge-cases (if there is one place where bytes
  | are decoded into characters, then you _can't_ get a bug where
  | that decoding is only broken in tag names, and so your test
  | coverage is automatically better).
  | 
  | 3) Don't transparently download and install stuff without user
  | interaction, regardless of where it comes from!
  | 
  | 4) Revoke certificates for old compromised versions of an
  | installer so that downgrade attacks are not possible.
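  | 
  | Point 2 can be sketched roughly as follows, with Python's
  | built-in UTF-8 decoder and xml.etree.ElementTree standing in
  | for the two layers (decode_layer, parse_layer and parse_stanza
  | are hypothetical names): the parser only ever receives
  | characters, so it has no chance to decode one incorrectly.

```python
import xml.etree.ElementTree as ET


def decode_layer(raw: bytes) -> str:
    """Layer 1: bytes -> characters. Strict, so bad UTF-8 fails here."""
    return raw.decode("utf-8", errors="strict")


def parse_layer(text: str) -> ET.Element:
    """Layer 2: characters -> tree. Never sees raw bytes."""
    return ET.fromstring(text)


def parse_stanza(raw: bytes) -> ET.Element:
    """Run both layers in order; either one may raise on bad input."""
    return parse_layer(decode_layer(raw))


try:
    # 0xEB announces a 3-byte character but no continuation bytes follow.
    parse_stanza(b"<a>\xeb<b></a>")
except UnicodeDecodeError as exc:
    print("rejected in layer 1:", exc.reason)
```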
 
    | iancarroll wrote:
    | > Revoke certificates for old compromised versions of an
    | installer so that downgrade attacks are not possible.
    | 
    | Worth noting that Windows accepts signatures from revoked
    | code signing certificates so long as it has a signed
    | timestamp from before the revocation.
 
      | hamandcheese wrote:
      | ....and I assume the revocation can't be back-dated?
 
        | ComputerGuru wrote:
        | timestamps must come from a globally recognized signed
        | source, like digicert or verisign.
 
        | iancarroll wrote:
        | The CA could backdate the CRL's revocation timestamp if
        | they wanted, but it seems unlikely and presumably it's
        | not allowed.
 
    | bombcar wrote:
    | I doubt anyone actively revokes certificates ever - perhaps
    | maybe the game console makers.
 
      | crismigo wrote:
      | dsdas
 
| henearkr wrote:
| Good thing that I never used the standalone client and always the
| in-browser webapp instead.
 
  | user23894295637 wrote:
  | How do you do that? On any OS I tried (Debian, Windows) it
  | always *forces* me to download the standalone client, otherwise
  | I can't join. There's no alternative link ("Join via web") like
  | MS Teams has for example.
  | 
  | I really feel uncomfortable each time I have to install the
  | client on a machine for my relatives :/
 
    | ydant wrote:
    | I've always been able to use the in-browser client, but you
    | have to download the client once or twice before the page
    | will update to show the alternative "use browser". It's
    | definitely an intentional dark pattern.
 
    | mehagar wrote:
    | Check out https://github.com/arkadiyt/zoom-redirector. You
    | can also join meetings from https://pwa.zoom.us/wc/.
 
      | user23894295637 wrote:
      | OMG, thank you so much! That's a huge relief.
      | 
      | I actually started boycotting Zoom meetings where I can. If
      | anyone sends me a zoom invitation and I know that they are
      | not forced by having to be available for larger audiences I
      | suggest they use basically anything else.
      | 
      | I don't know why, but from the first time I visited their
      | website until today, I have the feeling I can't trust the
      | company.
 
| Flowdalic wrote:
| It appears that Gloox, a relatively low-level XMPP-client C
| library, rolled much of its Unicode and XML parsing itself, which
| made such vulnerabilities more likely. There may be good reasons
| to not re-use existing modules and rely on external libraries,
| especially if you target constrained low-end embedded devices, but
| you should always be aware of the drawbacks. And the Zoom client
| typically does not run on those.
 
  | Aeolun wrote:
  | I find that response a bit strange, since the whole reason the
  | Zoom client has these particular vulnerabilities is because
  | they didn't roll their own, and instead rely on layers of
  | broken libraries.
  | 
  | It's quite possible they'd have more bugs without doing that,
  | but re-using existing modules could just as easily have been an
  | even worse idea.
 
    | eli wrote:
    | I think the point is that Unicode and XML parsing are known
    | to be security critical components and you should take care
    | that they are handled only by well tested code designed
    | specifically for the purpose. You need to not roll your own
    | and also ensure that any third party components didn't roll
    | their own.
 
      | remus wrote:
      | > You need to not roll your own and also ensure that any
      | third party components didn't roll their own.
      | 
      | If you're not writing the code and somebody else isn't
      | writing the code then who is writing the code?!
 
        | eli wrote:
        | A well-tested Unicode library built for security should
        | be doing your Unicode parsing in security critical
        | components.
        | 
        | It's just another way of saying you should be doing a
        | security audit as part of selecting a library and
        | integrating it into your product.
 
    | WesolyKubeczek wrote:
    | Using what everyone and their dog is using is prone to bugs
    | just as much because software without bugs doesn't exist or
    | is not very useful, but it also has the benefit of many
    | versatile eyeballs looking at it in many different contexts.
    | 
    | So if there's a bug found and fixed in libxml2 which is used
    | by almost everything else, everyone else instantly benefits.
    | Same with libicu which is being used, for example, by NodeJS
    | with its huge deployments footprint. Oh, and every freakin'
    | Webkit-based browser out there.
    | 
    | OTOH, they rolled their own, so all bugs they hit are
    | confined only to zoom, and are only guaranteed to get Zoom
    | all the bad press.
    | 
    | Choose your poison carefully.
 
      | Aeolun wrote:
      | If they roll their own it also becomes less interesting to
      | actively exploit.
      | 
      | Obviously this doesn't really work for Zoom any more, since
      | their footprint is too large, but it can stop driveby
      | attackers in other situations. Nobody is going to expend
      | too much effort figuring out joe schmuck's homegrown
      | solution, where they'd happily run a known exploit against
      | the unpatched wordpress server.
 
        | pixl97 wrote:
        | Security by obscurity has been debated to hell and back.
        | It only works if you stay obscure... and don't leak your
        | code.
 
    | Flowdalic wrote:
    | I get your confusion. But keep in mind that it is not only
    | about just picking the library that shows as first result of
    | your Google search. My naive self thinks that a million
    | dollar company should do some research and evaluate different
    | options when choosing an external codebase to build their
    | flagship product on. There are dozens of XMPP libraries, and
    | they picked the one that does not seem to delegate XML and
    | Unicode handling to other libraries, which should raise a
    | flag.
 
    | mwcampbell wrote:
    | I think that's a false dichotomy; IMO the best default choice
    | is to rely on the most well-tested library in any given
    | category. That suggests to me that they should have used
    | expat on the client side.
 
  | zamalek wrote:
  | One of the harder things with XMPP is that it is a badly-formed
  | document up until the connection is closed. You need a SAX-
  | style/event-based parser to handle it. That makes rolling your
  | own understandable in _some_ cases (e.g. dotnet's System.Xml
  | couldn't do this prior to XLinq).
  | 
  | That being said, as you indicated Gloox is C-based, and the
  | reference implementation of SAX is in C. There is no excuse.
 
    | Flowdalic wrote:
    | > One of the harder things with XMPP is that it is a badly-
    | formed document up until the connection is closed. You need a
    | SAX-style/event-based parser to handle it.
    | 
    | That is a common misconception, although I am not sure of its
    | origin. I know plenty of XMPP implementations that use an XML
    | pull parser.
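    | 
    | As one illustration, Python's standard library ships an
    | incremental pull parser. A minimal sketch, assuming a
    | simplified stream with no namespaces (real XMPP uses
    | <stream:stream>) and a hypothetical iter_stanzas helper:
    | stanzas are handed out as they complete, while the root
    | element stays open.

```python
import xml.etree.ElementTree as ET


def iter_stanzas(chunks):
    """Yield each top-level stanza as soon as its end tag arrives.

    chunks is any iterable of str pieces of the stream; the enclosing
    <stream> element never has to be closed.
    """
    parser = ET.XMLPullParser(events=("start", "end"))
    depth = 0
    for chunk in chunks:
        parser.feed(chunk)
        for event, elem in parser.read_events():
            if event == "start":
                depth += 1
            else:
                depth -= 1
                if depth == 1:  # a direct child of the stream root closed
                    yield elem


# The stream root is still open and the chunk boundary falls inside a tag.
chunks = ["<stream><message to='a'><bo", "dy>hi</body></message><presence/>"]
tags = [stanza.tag for stanza in iter_stanzas(chunks)]
print(tags)
```

    | Here feed() is given whatever data has arrived so far, so
    | the parser itself never waits on the connection.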
 
      | zamalek wrote:
      | It's possible by blocking the thread that's reading the
      | XML, but now you're in thread-per-client territory, and
      | that doesn't scale.
 
    | TedDoesntTalk wrote:
    | DOM-based XML parsers use SAX parsing under the hood.
 
      | zamalek wrote:
      | Right, but if they don't give you access to the SAX parser
      | then you are SOL.
 
  | xxpor wrote:
  | This is a very common issue across all of software engineering
  | I've found. But I really don't get why. If I was given the task
  | of parsing Unicode or XML, I'd run and find a library as fast
  | as possible, because that sounds terrible and tedious, and I'd
  | rather do literally anything else!
  | 
  | Why aren't people more lazy, in other words?
 
| rektide wrote:
| How much of Zoom is powered by XMPP? Do we know much about these
| internals? This would be super cool to learn about.
 
___________________________________________________________________
(page generated 2022-05-24 23:00 UTC)