|
| bob1029 wrote:
| Speculative execution, despite whatever flaws, brings a style of
| optimization that you simply cannot substitute with any other.
| Conceptually, the ability to _continuously time travel into the
| future and bring information back_ is a pretty insane form of
| optimization. The fact that this also prefetches memory for us is
| amazing, except in some unhappy adverse contexts. Perhaps we
| should just pause there for a moment and reflect...
|
| Imagine being able to simultaneously visit 4 retail stores and
| dynamically select items depending on availability and pricing,
| arriving back home having spent the amount of time it takes to
| shop at 1.25 stores while burning 1.5x the fuel of a one-store
| trip.
|
| There is no amount of ISA redesign or recompilation that can
| accommodate the dynamics of real-world trends in the same ways
| that speculative execution can. Instead of trying to replace
| speculative execution, I think we should try to put it into a
| more secure domain where it can run free and be "dangerous"
| without actually being allowed to cause trouble outside the
| intended scope. Perhaps I am asking for superpositioned cake
| here. Is there a fundamental reason we cannot make speculative
| execution secure?
| insanitybit wrote:
| > Is there a fundamental reason we cannot make speculative
| execution secure?
|
| It is secure in many, many contexts. For example, I have no
| concerns about speculative execution if I'm running a database
| or service, which is great since those are the areas where
| performance matters most.
|
| Where it's troublesome is when you need isolation in the
| presence of arbitrary code execution. My suggestion is that if
| you ever find yourself in that scenario, you manage your cores
| manually - ensure that "attacker" code never shares a core with
| anything sensitive. If you need that next level of security,
| enable the mitigations.
|
| Pinning your cores is going to help a lot with the mitigations
| anyway - with process-context identifiers (PCIDs) tagging TLB
| entries, the TLB doesn't have to be fully flushed when the same
| process is switched back in on the same core. The point is that
| you can improve things if you pin to a core.
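|
| As a concrete illustration, here is a minimal sketch of pinning
| a process to one core on Linux with sched_setaffinity(); the
| choice of core 2 and the lack of real error handling are just
| for the example:
|
|     /* Pin the calling process to CPU core 2 so sensitive work
|      * never shares a core with untrusted code. */
|     #define _GNU_SOURCE
|     #include <sched.h>
|     #include <stdio.h>
|     #include <stdlib.h>
|
|     int main(void) {
|         cpu_set_t set;
|         CPU_ZERO(&set);
|         CPU_SET(2, &set);   /* core picked arbitrarily here */
|
|         /* pid 0 means "the calling process" */
|         if (sched_setaffinity(0, sizeof(set), &set) != 0) {
|             perror("sched_setaffinity");
|             return EXIT_FAILURE;
|         }
|
|         /* From here on the scheduler keeps this process on
|          * core 2, so its cache/TLB state is not interleaved
|          * with whatever runs on the other cores. */
|         printf("pinned to core 2\n");
|         return EXIT_SUCCESS;
|     }
|
| The same effect is available from the shell with
| taskset -c 2 ./your-service, without touching the code.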
|
| I think basically you're right. Instead of removing an amazing
| optimization, let's find the areas where we can enable it,
| understand the threat model where it's relevant, and find ways
| to either reduce the cost of mitigations or otherwise mitigate
| in a way that's free.
| matu3ba wrote:
| > Is there a fundamental reason we cannot make speculative
| execution secure?
|
| Any memory access leads to a timing channel, which might be
| observable or not. As an example, hyperthreading is known to
| create observable side channels even through the L1 and L2
| caches, since those are shared between sibling threads.
|
| The L3 cache is also shared between CPU cores on the same
| socket, so unless you can ensure that L3 state is never shared,
| you cannot entirely eliminate this timing channel.
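|
| To make "observable" concrete, the flush+reload measurement
| pattern shows how little is needed to read such a channel. This
| is only a rough sketch assuming x86-64 with GCC/Clang
| intrinsics; the threshold value is machine-dependent and made
| up here:
|
|     /* Time one access to a shared cache line and decide from
|      * the latency whether someone else touched it recently. */
|     #include <stdint.h>
|     #include <x86intrin.h>  /* _mm_clflush, __rdtscp, _mm_mfence */
|
|     #define THRESHOLD_CYCLES 120  /* needs per-machine calibration */
|
|     static inline uint64_t time_access(volatile uint8_t *addr) {
|         unsigned aux;
|         _mm_mfence();
|         uint64_t start = __rdtscp(&aux);
|         (void)*addr;                  /* the probed load */
|         uint64_t end = __rdtscp(&aux);
|         _mm_mfence();
|         return end - start;
|     }
|
|     /* Returns 1 if the line looks cached (a fast hit), i.e. it
|      * was pulled in by whoever shares the cache with us. */
|     int probe(volatile uint8_t *shared_line) {
|         int hit = time_access(shared_line) < THRESHOLD_CYCLES;
|         _mm_clflush((const void *)shared_line);  /* reset for next round */
|         return hit;
|     }
|
| Anything - a sibling hyperthread, or another core via L3 - that
| touches that line shifts the measured latency, which is the
| whole channel.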
|
| Now getting back to speculative execution: every possible
| execution sequence must satisfy all possible cross-interaction
| rules without making any timing channel visible, which means
| 1. fully restoring the previous state and 2. not leaking any
| timing behavior along the way. Just think of all the cases that
| would need to be verified (the complete instruction set), and
| ask whether you can imagine any sane cache-aware separation
| logic that reasons about this kind of time travel and its
| timing leaks.
|
| On top of that, it has already been shown that the hardware
| guarantees on cache behavior are fundamentally broken (they are
| merely hints).
| mike_hock wrote:
| > Is there a fundamental reason we cannot make speculative
| execution secure
|
| You've said it yourself:
|
| > The fact that this also prefetches memory for us is amazing
|
| To be secure, speculatively executed instructions that don't
| retire have to have _no_ observable effects, including those
| observable through timing. They cannot be allowed to modify the
| cache hierarchy in any way.
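|
| The textbook Spectre-v1 shape makes this concrete: under a
| mispredicted bounds check, both loads below can execute
| speculatively, and the second one fills a cache line whose
| index depends on a secret byte. The names and sizes follow the
| original paper's convention and are illustrative only:
|
|     #include <stddef.h>
|     #include <stdint.h>
|
|     extern uint8_t array1[16];              /* attacker-reachable */
|     extern size_t  array1_size;
|     extern uint8_t probe_array[256 * 4096]; /* one page per byte value */
|
|     void victim(size_t x) {
|         /* The check may be bypassed speculatively when x is
|          * out of bounds but the predictor guesses "taken". */
|         if (x < array1_size) {
|             /* Secret-dependent cache fill: even though the
|              * result is squashed, the touched line of
|              * probe_array stays warm and is observable by
|              * timing it later. */
|             volatile uint8_t t = probe_array[array1[x] * 4096];
|             (void)t;
|         }
|     }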
| bob1029 wrote:
| Does speculative execution on my CPU affect your computing
| environment?
| PaulHoule wrote:
| Look at the failure of VLIW: a compiler can't know what is
| going on with the memory system at runtime, but the hardware
| can make a very good guess.
| causality0 wrote:
| I'm still of the opinion that the industry response to Spectre
| was wildly overblown. It's been five years and not one single
| damn person has been a confirmed victim of it, yet we all tied
| an albatross around our CPUs' necks the moment the news broke.
| insanitybit wrote:
| > It's been five years and not one single damn person has
| been a confirmed victim of it yet
|
| Well, everyone patched things pretty aggressively, so
| exploitation isn't really practical. At minimum, that's why.
| Also, vuln research can take a long time to turn into
| exploits - there are so many existing attack surfaces that
| people are exploring right now (eBPF, io_uring), with
| _practical_ exploitation primitives already designed, that
| there isn't much pressure to go for something that's already
| patched and that would require a lot of novel research to
| find reliable primitives for.
|
| As for the mitigations, just disable them? They're on by
| default because Linux doesn't know what your use case is, but
| if your use case isn't relevant to the mitigations' threat
| model, please feel free to disable them. It is very simple to
| do so - booting with mitigations=off turns them all off.
| causality0 wrote:
| Oh I did. I never installed the patches in the first place
| and I've been a regular user of InSpectre.
| JoshTriplett wrote:
| I do wonder to what degree we could hard-partition caches, such
| that speculative prefetches go straight into CPU-specific
| caches, and don't get to go into shared caches (e.g. L3)
| until they stop being speculative.
|
| I also wonder to what degree we could partition between user
| mode and supervisor mode (and provide similar facilities to
| partition between user-mode and user-mode-sandbox, such as
| WebAssembly or other JITs), with the same premise. Let the
| kernel prefetch things but don't let userspace notice the
| speculated entries.
| insanitybit wrote:
| > I do wonder to what degree we could hard-partition caches,
| such that speculative prefetches go straight into CPU-
| specific caches, and don't get to go into shared caches
| (e.g. L3) until they stop being speculative.
|
| This is already sort of possible. TLB flushing can take
| advantage of the PCID to determine, per process, whether
| entries must be flushed - this provides _process level_
| isolation of the TLB.
|
| I believe recent CPUs are increasing the size of some
| PCID-related structures since they're becoming increasingly
| important post-KPTI.
| matu3ba wrote:
| This sounds like a huge slowdown for linear memory access
| patterns (max. throughput), which do not fit into the L1+L2
| caches. I don't see a way to prevent L3 cache timing behavior
| from being leaked unless one accepts performance cuts for those
| memory access patterns.
|
| The only option I do see is marking specific, limited regions
| of memory as not allowed into the L3 cache at all.
|
| Since you're active and interested in this stuff: what is the
| state of the art on cpusets for flexible task pinning on cores?
| "Note. There is a minor chance that a task forks during move
| and its child remains in the root cpuset." is mentioned in the
| SUSE docs https://documentation.suse.com/sle-rt/12-SP5/single-html/SLE...,
| but without any background explanation, and I do not understand
| what breaks when moving pinned kernel tasks.
| c-linkage wrote:
| I am in no way an expert on CPU design, but one way this might
| be possible is to use "memory tagging" in the sense that CPU
| execution pipelines are extended through the CPU cache (and
| possibly into RAM itself) by "tags" that link the state of a
| cache cell to a branch of speculative execution.
|
| For example, a pre-fetch linked to a speculative execution
| would be tagged with a CPU-specific speculative execution
| identifier such that the pre-fetched data would only be
| accessible in that pipeline. If that speculative execution
| becomes realized, then the tag would be updated (perhaps to
| zero?) to show it was _actually_ executed and is visible to all
| other CPUs and caches. In all other cases, the speculative
| execution is abandoned and the tagged cache cells become marked
| as available and undefined. Circuitry similar to register
| renaming could be used to handle tagging in the caches at the
| cost of effectively _halving_ cache sizes.
|
| In a more macro sense, imagine git branches that get merged
| back into main. The speculative execution only occurs on the
| branch. When the CPU realizes the prediction was good and
| doesn't need to be rolled back, the branch is merged into the
| trunk and becomes visible to all other systems having access to
| the trunk.
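|
| A toy software model of that tag lifecycle might look like the
| following; every name and field here is invented purely to
| illustrate the idea, not to describe any real cache:
|
|     #include <stdbool.h>
|     #include <stddef.h>
|     #include <stdint.h>
|
|     #define COMMITTED 0  /* tag meaning "architecturally visible" */
|
|     typedef struct {
|         bool     valid;
|         uint32_t spec_tag;  /* COMMITTED, or id of the in-flight speculation */
|         uint64_t addr;
|         uint8_t  data[64];
|     } cache_line_t;
|
|     /* A lookup only hits if the line is committed or belongs
|      * to the same speculation that fetched it. */
|     bool visible_to(const cache_line_t *l, uint32_t spec_id) {
|         return l->valid &&
|                (l->spec_tag == COMMITTED || l->spec_tag == spec_id);
|     }
|
|     /* Prediction was right: the branch "merges into trunk" and
|      * its lines become visible to everyone. */
|     void commit(cache_line_t *lines, size_t n, uint32_t spec_id) {
|         for (size_t i = 0; i < n; i++)
|             if (lines[i].valid && lines[i].spec_tag == spec_id)
|                 lines[i].spec_tag = COMMITTED;
|     }
|
|     /* Prediction was wrong: the lines are dropped as if the
|      * speculative fetch had never happened. */
|     void squash(cache_line_t *lines, size_t n, uint32_t spec_id) {
|         for (size_t i = 0; i < n; i++)
|             if (lines[i].valid && lines[i].spec_tag == spec_id)
|                 lines[i].valid = false;
|     }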
| kirse wrote:
| _Imagine being able to simultaneously visit 4 retail stores and
| dynamically select items depending on availability and pricing,
| arriving back home having spent the amount of time it takes to
| shop at 1.25 stores while burning 1.5x the fuel of a one-store
| trip._
|
| What a fantastic analogy.
| kmeisthax wrote:
| This idea seems so simple that I'm pretty sure at least three
| people have independently thought of the same idea when reading
| about branch delay slots. I suspect some of the more strict
| aspects of basic block enforcement would also at least frustrate
| some ROP attacks.
|
| All in all, it seems like a good idea - when can we staple this
| onto existing processors?
| shiftingleft wrote:
| Previous discussions:
|
| https://news.ycombinator.com/item?id=24090632
|
| https://news.ycombinator.com/item?id=34202099
| CalChris wrote:
| This is a new version (v2) of their paper.
| Joel_Mckay wrote:
| Clock domain crossing is a named problem with a finite set of
| solutions.
|
| When you get out of abstract logical domains into real-world
| physics, the notion of machine state becomes fuzzier at higher
| clock speeds.
|
| Good luck. =)
| phoe-krk wrote:
| (2021)
| dang wrote:
| Added. Thanks!
| ChrisArchitect wrote:
| (2020) even