[HN Gopher] BasicBlocker: ISA Redesign to Make Spectre-Immune CP...
___________________________________________________________________
 
BasicBlocker: ISA Redesign to Make Spectre-Immune CPUs Faster
(2021)
 
Author : PaulHoule
Score  : 33 points
Date   : 2023-07-26 17:55 UTC (5 hours ago)
 
web link (arxiv.org)
w3m dump (arxiv.org)
 
| bob1029 wrote:
| Speculative execution, despite whatever flaws, brings a style of
| optimization that you simply cannot substitute with any other.
| Conceptually, the ability to _continuously time travel into the
| future and bring information back_ is a pretty insane form of
| optimization. The fact that this also prefetches memory for us is
| amazing, except in some unhappy adverse contexts. Perhaps we
| should just pause there for a moment and reflect...
| 
| Imagine being able to simultaneously visit 4 retail stores and
| dynamically select items depending on availability and pricing,
| arriving back home having spent the amount of time it takes to
| shop at 1.25 stores while burning 1.5x the fuel of a one-store
| trip.
| 
| There is no amount of ISA redesign or recompilation that can
| accommodate the dynamics of real-world trends in the same ways
| that speculative execution can. Instead of trying to replace
| speculative execution, I think we should try to put it into a
| more secure domain where it can run free and be "dangerous"
| without actually being allowed to cause trouble outside the
| intended scope. Perhaps I am asking for superpositioned cake
| here. Is there a fundamental reason we cannot make speculative
| execution secure?
 
  | insanitybit wrote:
  | > Is there a fundamental reason we cannot make speculative
  | execution secure?
  | 
  | It is secure in many, many contexts. For example, I have no
  | concerns about speculative execution if I'm running a database
  | or service, which is great since those are the areas where
  | performance matters most.
  | 
  | Where it's troublesome is when you need isolation in the
  | presence of arbitrary code execution. My suggestion is that if
  | you ever find yourself in that scenario that you manage your
  | cores manually - ensure that "attacker" code never crosses with
  | anything sensitive on the same core. If you need that next
  | level of security, enable the mitigations.
  | 
  | Pinning your cores is going to help a lot with the mitigations
  | anyways - the TLB doesn't have to be fully flushed when the
  | same process is switched back in on the same core, because TLB
  | entries are tagged with a process-context identifier (PCID)
  | that lets entries from other address spaces stay resident
  | without being usable. The point is that you can improve things
  | if you pin to a core.
  | 
  | I think basically you're right. Instead of removing an amazing
  | optimization let's find the areas where we can enable it,
  | understand the threat model where it's relevant, and find ways
  | to either reduce the cost of mitigations or otherwise mitigate
  | in a way that's free.
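
The manual core management suggested above can be done from userspace on Linux; here is a minimal sketch using Python's stdlib wrapper around sched_setaffinity(2) (Linux-only; the choice of core is illustrative):

```python
import os

def pin_to_cpus(cpus):
    """Restrict the calling process to the given CPU cores (Linux),
    so untrusted and sensitive workloads never share a core."""
    os.sched_setaffinity(0, set(cpus))  # pid 0 = the calling process
    return os.sched_getaffinity(0)

# Example: confine this process to one core chosen from its current
# allowed set (the specific core number doesn't matter here).
some_cpu = min(os.sched_getaffinity(0))
print(pin_to_cpus([some_cpu]))
```

A real deployment would pin the attacker-facing workers and the sensitive workers to disjoint core sets, typically via cgroups/cpusets rather than per-process calls.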
 
  | matu3ba wrote:
  | > Is there a fundamental reason we cannot make speculative
  | execution secure?
  | 
  | Any memory access leads to a time channel, which might or
  | might not be observable. For example, hyperthreading is known
  | to create observable side channels even in the L1 and L2
  | caches, since those are shared between sibling threads.
  | 
  | L3 cache is also shared between CPU cores on the same socket,
  | so unless you can ensure the L3 cache data can never be shared
  | you can not entirely eliminate this time channel.
  | 
  | Now getting back to speculative execution: every possible
  | execution sequence must satisfy all possible cross-interaction
  | rules so that no time channel becomes visible, which means
  | 1. fully restoring the previous state and 2. not leaking any
  | timing behavior. Consider how many cases would need to be
  | verified (the complete instruction set), and whether any sane
  | cache-aware separation logic could cover speculative, timing-
  | leaking execution across all of them.
  | 
  | On top of that, it has already been shown that the hardware
  | guarantees on cache behavior are fundamentally broken (they
  | are merely hints).
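
The cache timing channel described above is the classic flush+reload pattern. The following is a toy simulation of it, not a real exploit - the cache model and cycle counts are invented stand-ins for hardware state:

```python
# Toy model of a flush+reload timing channel: a victim's data-dependent
# access leaves a line in the shared cache, and the attacker infers
# which line it was by timing reloads.
CACHE_HIT_CYCLES, CACHE_MISS_CYCLES = 40, 300  # illustrative latencies

class SharedCache:
    def __init__(self):
        self.lines = set()

    def flush(self):
        self.lines.clear()

    def access(self, line):
        # A hit is fast, a miss is slow; either way the line is cached.
        cost = CACHE_HIT_CYCLES if line in self.lines else CACHE_MISS_CYCLES
        self.lines.add(line)
        return cost

def victim(cache, secret):
    cache.access(secret)  # data-dependent access, e.g. table[secret]

def attacker_probe(cache, n_lines):
    # Reload every candidate line; the fast one reveals the secret.
    times = {i: cache.access(i) for i in range(n_lines)}
    return min(times, key=times.get)

cache = SharedCache()
cache.flush()
victim(cache, secret=7)
print(attacker_probe(cache, 16))  # recovers 7 in this model
```

The point of the model is that the leak needs no architectural data flow at all - the secret crosses the boundary purely through which line is hot.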
 
  | mike_hock wrote:
  | > Is there a fundamental reason we cannot make speculative
  | execution secure
  | 
  | You've said it yourself
  | 
  | > The fact that this also prefetches memory for us is amazing
  | 
  | To be secure, speculatively executed instructions that don't
  | retire have to have _no_ observable effects, including effects
  | observable through timing. They cannot be allowed to modify
  | the cache hierarchy in any way.
 
    | bob1029 wrote:
    | Does speculative execution on my CPU affect your computing
    | environment?
 
  | PaulHoule wrote:
  | Look at the failure of VLIW: a compiler can't know what is
  | going on with the memory system at runtime, but the hardware
  | can make a very good guess.
 
  | causality0 wrote:
  | I'm still of the opinion that the industry response to Spectre
  | was wildly overblown. It's been five years and not one single
  | damn person has been a confirmed victim of it, yet we all tied
  | an albatross around our CPUs' necks the moment the news broke.
 
    | insanitybit wrote:
    | > It's been five years and not one single damn person has
    | been a confirmed victim of it yet
    | 
    | Well, everyone patched things pretty aggressively, so
    | exploitation isn't really practical. At minimum, that's why.
    | Also, vuln research can take a long time to turn into
    | exploits - there are so many existing primitives that people
    | are exploring right now (ebpf, io_uring) that have
    | _practical_ exploitation primitives already designed, so
    | there isn't much pressure to go after something that's
    | already patched and would require a lot of novel research to
    | find reliable primitives for.
    | 
    | As for the mitigations, just disable them? They're on by
    | default because Linux doesn't know what your use case is, but
    | if your use case isn't relevant to the mitigation's threat
    | model please feel free to disable them. It is very simple to
    | do so.
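
Before disabling mitigations (e.g. mitigations=off on the kernel command line), the kernel's own report of what is enabled can be inspected. A small sketch, assuming the usual Linux sysfs layout (the path is standard on recent kernels but absent elsewhere):

```python
import os

VULN_DIR = "/sys/devices/system/cpu/vulnerabilities"

def mitigation_status():
    """Return {vulnerability: kernel-reported status}, e.g.
    {'spectre_v2': 'Mitigation: Retpolines, ...'}. Empty dict if
    the sysfs directory doesn't exist (non-Linux, old kernel)."""
    status = {}
    if os.path.isdir(VULN_DIR):
        for name in sorted(os.listdir(VULN_DIR)):
            with open(os.path.join(VULN_DIR, name)) as f:
                status[name] = f.read().strip()
    return status

for vuln, state in mitigation_status().items():
    print(f"{vuln}: {state}")
```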
 
      | causality0 wrote:
      | Oh I did. I never installed the patches in the first place
      | and I've been a regular user of InSpectre.
 
  | JoshTriplett wrote:
  | I do wonder to what degree we could hard-partition caches, such
  | that speculative prefetches go straight into CPU-specific
  | caches and don't get into shared caches (e.g. L3) until they
  | stop being speculative.
  | 
  | I also wonder to what degree we could partition between user
  | mode and supervisor mode (and provide similar facilities to
  | partition between user-mode and user-mode-sandbox, such as
  | WebAssembly or other JITs), with the same premise. Let the
  | kernel prefetch things but don't let userspace notice the
  | speculated entries.
 
    | insanitybit wrote:
    | > I do wonder to what degree we could hard-partition caches,
    | such that speculative prefetches go straight into CPU-
    | specific caches and don't get into shared caches (e.g. L3)
    | until they stop being speculative.
    | 
    | This is already sort of possible. The TLB flushing can take
    | advantage of the PCID to determine, based on the process,
    | whether the cache must be flushed - this provides _process
    | level_ isolation of the TLB.
    | 
    | I believe recent CPUs are increasing the size of some PCID
    | related components since it's becoming increasingly important
    | post-kPTI.
 
    | matu3ba wrote:
    | This sounds like a huge slowdown for linear memory access
    | patterns (max throughput) that do not fit into the L1+L2
    | caches. I don't see a way to prevent L3 cache timing behavior
    | from leaking unless one accepts performance cuts for those
    | memory access patterns.
    | 
    | The only option I do see is to mark specific, limited regions
    | of memory as never allowed into the L3 cache at all.
    | 
    | Since you're active and interested in this stuff: What is the
    | state of art on cpusets for flexible task pinning on cores?
    | "Note. There is a minor chance that a task forks during move
    | and its child remains in the root cpuset." is mentioned in
    | the suse docs https://documentation.suse.com/sle-
    | rt/12-SP5/single-html/SLE..., but without any background
    | explanation, and I do not understand what breaks when moving
    | pinned kernel tasks.
 
  | c-linkage wrote:
  | I am in no way an expert on CPU design, but one way this might
  | be possible is to use "memory tagging" in the sense that CPU
  | execution pipelines are extended through the CPU cache (and
  | possibly into RAM itself) by "tags" that link the state of a
  | cache cell to a branch of speculative execution.
  | 
  | For example, a pre-fetch linked to a speculative execution
  | would be tagged with a CPU-specific speculative execution
  | identifier such that the pre-fetched data would only be
  | accessible in that pipeline. If that speculative execution
  | becomes realized then the tag would be updated (perhaps to
  | zero?) to show it was _actually_ executed and visible to all
  | other CPUs and caches. In all other cases, the speculative
  | execution is abandoned and the tagged cache cells become marked
  | as available and undefined. Circuitry similar to register
  | renaming could be used to handle tagging in the caches at the
  | cost of effectively _halving_ cache sizes.
  | 
  | In a more macro sense, imagine git branches that get merged
  | back into main. The speculative execution only occurs on the
    | branch. When the CPU realizes the prediction was good and
    | doesn't need to be rolled back, the branch is merged into the
  | trunk and becomes visible to all other systems having access to
  | the trunk.
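
The tagging scheme described above can be sketched as a toy model. The names and structure here are invented for illustration and don't come from any real ISA or microarchitecture:

```python
# Toy model of speculation-tagged cache fills: lines filled under a
# speculation tag stay private to that speculative context until the
# branch retires, and are discarded outright if it is squashed.
COMMITTED = 0  # tag value meaning "architecturally visible"

class TaggedCache:
    def __init__(self):
        self.lines = {}  # address -> tag

    def speculative_fill(self, addr, tag):
        self.lines[addr] = tag

    def visible_to(self, addr, tag=COMMITTED):
        # A line is visible if it is committed, or if the observer is
        # inside the same speculative context that filled it.
        t = self.lines.get(addr)
        return t is not None and t in (COMMITTED, tag)

    def retire(self, tag):
        # Prediction was right: "merge the branch into the trunk".
        for addr, t in self.lines.items():
            if t == tag:
                self.lines[addr] = COMMITTED

    def squash(self, tag):
        # Misprediction: throw away everything filled under this tag,
        # leaving no timing residue for other observers.
        self.lines = {a: t for a, t in self.lines.items() if t != tag}

cache = TaggedCache()
cache.speculative_fill(0x1000, tag=1)
assert cache.visible_to(0x1000, tag=1)      # visible inside the speculation
assert not cache.visible_to(0x1000)         # invisible architecturally
cache.squash(tag=1)
assert not cache.visible_to(0x1000, tag=1)  # gone after the squash
```

The "halving the cache" cost mentioned above shows up here as the extra tag storage per line plus the fills that get thrown away on a squash.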
 
    | greatfilter251 wrote:
    | [dead]
 
  | kirse wrote:
  | _Imagine being able to simultaneously visit 4 retail stores and
  | dynamically select items depending on availability and pricing,
  | arriving back home having spent the amount of time it takes to
  | shop at 1.25 stores while burning 1.5x the fuel of a one-store
  | trip._
  | 
  | What a fantastic analogy.
 
| kmeisthax wrote:
| This idea seems so simple that I'm pretty sure at least three
| people have independently thought of the same idea when reading
| about branch delay slots. I suspect some of the more strict
| aspects of basic block enforcement would also at least frustrate
| some ROP attacks.
| 
| All in all, seems like a good idea, when can we staple this onto
| existing processors?
 
| [deleted]
 
| shiftingleft wrote:
| Previous discussions:
| 
| https://news.ycombinator.com/item?id=24090632
| 
| https://news.ycombinator.com/item?id=34202099
 
  | CalChris wrote:
  | This is a new version (v2) of their paper.
 
| Joel_Mckay wrote:
| Clock domain crossing is a named problem with a finite set of
| solutions.
| 
| When you get out of abstract logical domains into real-world
| physics, then the notion of machine state becomes fuzzier at
| higher clock speeds.
| 
| Good luck. =)
 
| phoe-krk wrote:
| (2021)
 
  | dang wrote:
  | Added. Thanks!
 
  | ChrisArchitect wrote:
  | (2020) even
 
___________________________________________________________________
(page generated 2023-07-26 23:00 UTC)