[HN Gopher] BasicBlocker: ISA Redesign to Make Spectre-Immune CP...
___________________________________________________________________
 
BasicBlocker: ISA Redesign to Make Spectre-Immune CPUs Faster
(2021)
 
Author : PaulHoule
Score  : 33 points
Date   : 2023-07-26 17:55 UTC (5 hours ago)
 
web link (arxiv.org)
w3m dump (arxiv.org)
 
| bob1029 wrote:
| Speculative execution, despite whatever flaws, brings a style of
| optimization that you simply cannot substitute with any other.
| Conceptually, the ability to _continuously time travel into the
| future and bring information back_ is a pretty insane form of
| optimization. The fact that this also prefetches memory for us is
| amazing, except in some unhappy adverse contexts. Perhaps we
| should just pause there for a moment and reflect...
| 
| Imagine being able to simultaneously visit 4 retail stores and
| dynamically select items depending on availability and pricing,
| arriving back home having spent the amount of time it takes to
| shop at 1.25 stores while burning 1.5x the fuel of a one-store
| trip.
| 
| There is no amount of ISA redesign or recompilation that can
| accommodate the dynamics of real-world trends in the same ways
| that speculative execution can. Instead of trying to replace
| speculative execution, I think we should try to put it into a
| more secure domain where it can run free and be "dangerous"
| without actually being allowed to cause trouble outside the
| intended scope. Perhaps I am asking for superpositioned cake
| here. Is there a fundamental reason we cannot make speculative
| execution secure?
 
  | insanitybit wrote:
  | > Is there a fundamental reason we cannot make speculative
  | execution secure?
  | 
  | It is secure in many, many contexts. For example, I have no
  | concerns about speculative execution if I'm running a database
  | or service, which is great since those are the areas where
  | performance matters most.
  | 
  | Where it's troublesome is when you need isolation in the
  | presence of arbitrary code execution. My suggestion is that if
  | you ever find yourself in that scenario that you manage your
  | cores manually - ensure that "attacker" code never crosses with
  | anything sensitive on the same core. If you need that next
  | level of security, enable the mitigations.
  | 
  | Pinning your cores is going to help a lot with the mitigations
  | anyways - the TLB doesn't have to be fully flushed when the
  | same process is switched back in on the same core, because TLB
  | entries are tagged with a process-context identifier (PCID)
  | that lets entries from other address spaces stay resident
  | without being usable. The point is that you can improve things
  | if you pin to a core.
  | 
  | I think basically you're right. Instead of removing an amazing
  | optimization let's find the areas where we can enable it,
  | understand the threat model where it's relevant, and find ways
  | to either reduce the cost of mitigations or otherwise mitigate
  | in a way that's free.
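
The manual core management suggested above can be done from userspace on Linux; here is a minimal sketch using Python's stdlib wrapper around sched_setaffinity(2) (Linux-only; the choice of core is illustrative):

```python
import os

def pin_to_cpus(cpus):
    """Restrict the calling process to the given CPU cores (Linux),
    so untrusted and sensitive workloads never share a core."""
    os.sched_setaffinity(0, set(cpus))  # pid 0 = the calling process
    return os.sched_getaffinity(0)

# Example: confine this process to one core chosen from its current
# allowed set (the specific core number doesn't matter here).
some_cpu = min(os.sched_getaffinity(0))
print(pin_to_cpus([some_cpu]))
```

A real deployment would pin the attacker-facing workers and the sensitive workers to disjoint core sets, typically via cgroups/cpusets rather than per-process calls.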
 
  | matu3ba wrote:
  | > Is there a fundamental reason we cannot make speculative
  | execution secure?
  | 
  | Any memory access leads to a time channel, which might or
  | might not be observable. For example, hyperthreading is known
  | to create observable side channels even in the L1 and L2
  | caches, since those are shared between sibling threads.
  | 
  | L3 cache is also shared between CPU cores on the same socket,
  | so unless you can ensure the L3 cache data can never be shared
  | you can not entirely eliminate this time channel.
  | 
  | Now getting back to speculative execution: every possible
  | execution sequence must satisfy all possible cross-interaction
  | rules so that no time channel becomes visible, which means
  | 1. fully restoring the previous state and 2. not leaking any
  | timing behavior. Consider how many cases would need to be
  | verified (the complete instruction set), and whether any sane
  | cache-aware separation logic could cover speculative, timing-
  | leaking execution across all of them.
  | 
  | On top of that, it has already been shown that the hardware
  | guarantees on cache behavior are fundamentally broken (they
  | are merely hints).
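
The cache timing channel described above is the classic flush+reload pattern. The following is a toy simulation of it, not a real exploit - the cache model and cycle counts are invented stand-ins for hardware state:

```python
# Toy model of a flush+reload timing channel: a victim's data-dependent
# access leaves a line in the shared cache, and the attacker infers
# which line it was by timing reloads.
CACHE_HIT_CYCLES, CACHE_MISS_CYCLES = 40, 300  # illustrative latencies

class SharedCache:
    def __init__(self):
        self.lines = set()

    def flush(self):
        self.lines.clear()

    def access(self, line):
        # A hit is fast, a miss is slow; either way the line is cached.
        cost = CACHE_HIT_CYCLES if line in self.lines else CACHE_MISS_CYCLES
        self.lines.add(line)
        return cost

def victim(cache, secret):
    cache.access(secret)  # data-dependent access, e.g. table[secret]

def attacker_probe(cache, n_lines):
    # Reload every candidate line; the fast one reveals the secret.
    times = {i: cache.access(i) for i in range(n_lines)}
    return min(times, key=times.get)

cache = SharedCache()
cache.flush()
victim(cache, secret=7)
print(attacker_probe(cache, 16))  # recovers 7 in this model
```

The point of the model is that the leak needs no architectural data flow at all - the secret crosses the boundary purely through which line is hot.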
 
  | mike_hock wrote:
  | > Is there a fundamental reason we cannot make speculative
  | execution secure
  | 
  | You've said it yourself
  | 
  | > The fact that this also prefetches memory for us is amazing
  | 
  | To be secure, speculatively executed instructions that don't
  | retire have to have _no_ observable effects, including effects
  | observable through timing. They cannot be allowed to modify
  | the cache hierarchy in any way.
 
    | bob1029 wrote:
    | Does speculative execution on my CPU affect your computing
    | environment?
 
  | PaulHoule wrote:
  | Look at the failure of VLIW: a compiler can't know what is
  | going on with the memory system at runtime, but the hardware
  | can make a very good guess.
 
  | causality0 wrote:
  | I'm still of the opinion that the industry response to Spectre
  | was wildly overblown. It's been five years and not one single
  | damn person has been a confirmed victim of it, yet we all tied
  | an albatross around our CPUs' necks the moment the news broke.
 
    | insanitybit wrote:
    | > It's been five years and not one single damn person has
    | been a confirmed victim of it yet
    | 
    | Well, everyone patched things pretty aggressively, so
    | exploitation isn't really practical. At minimum, that's why.
    | Also, vuln research can take a long time to turn into
    | exploits - there are so many existing primitives that people
    | are exploring right now (ebpf, io_uring) that have
    | _practical_ exploitation primitives already designed, so
    | there isn't much pressure to go after something that's
    | already patched and would require a lot of novel research to
    | find reliable primitives for.
    | 
    | As for the mitigations, just disable them? They're on by
    | default because Linux doesn't know what your use case is, but
    | if your use case isn't relevant to the mitigation's threat
    | model please feel free to disable them. It is very simple to
    | do so.
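
Before disabling mitigations (e.g. mitigations=off on the kernel command line), the kernel's own report of what is enabled can be inspected. A small sketch, assuming the usual Linux sysfs layout (the path is standard on recent kernels but absent elsewhere):

```python
import os

VULN_DIR = "/sys/devices/system/cpu/vulnerabilities"

def mitigation_status():
    """Return {vulnerability: kernel-reported status}, e.g.
    {'spectre_v2': 'Mitigation: Retpolines, ...'}. Empty dict if
    the sysfs directory doesn't exist (non-Linux, old kernel)."""
    status = {}
    if os.path.isdir(VULN_DIR):
        for name in sorted(os.listdir(VULN_DIR)):
            with open(os.path.join(VULN_DIR, name)) as f:
                status[name] = f.read().strip()
    return status

for vuln, state in mitigation_status().items():
    print(f"{vuln}: {state}")
```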
 
      | causality0 wrote:
      | Oh I did. I never installed the patches in the first place
      | and I've been a regular user of InSpectre.
 
  | JoshTriplett wrote:
  | I do wonder to what degree we could hard-partition caches, such
  | that speculative prefetches go straight into CPU-specific
  | caches and don't get into shared caches (e.g. L3) until they
  | stop being speculative.
  | 
  | I also wonder to what degree we could partition between user
  | mode and supervisor mode (and provide similar facilities to
  | partition between user-mode and user-mode-sandbox, such as
  | WebAssembly or other JITs), with the same premise. Let the
  | kernel prefetch things but don't let userspace notice the
  | speculated entries.
 
    | insanitybit wrote:
    | > I do wonder to what degree we could hard-partition caches,
    | such that speculative prefetches go straight into CPU-
    | specific caches and don't get into shared caches (e.g. L3)
    | until they stop being speculative.
    | 
    | This is already sort of possible. The TLB flushing can take
    | advantage of the PCID to determine, based on the process,
    | whether the cache must be flushed - this provides _process
    | level_ isolation of the TLB.
    | 
    | I believe recent CPUs are increasing the size of some PCID
    | related components since it's becoming increasingly important
    | post-kPTI.
 
    | matu3ba wrote:
    | This sounds like a huge slowdown for linear memory access
    | patterns (max throughput) that do not fit into the L1+L2
    | caches. I don't see a way to prevent L3 cache timing behavior
    | from leaking unless one accepts performance cuts for those
    | memory access patterns.
    | 
    | The only option I do see is to mark specific, limited regions
    | of memory as never allowed into the L3 cache at all.
    | 
    | Since you're active and interested in this stuff: What is the
    | state of art on cpusets for flexible task pinning on cores?
    | "Note. There is a minor chance that a task forks during move
    | and its child remains in the root cpuset." is mentioned in
    | the suse docs https://documentation.suse.com/sle-
    | rt/12-SP5/single-html/SLE..., but without any background
    | explanation, and I do not understand what breaks when moving
    | pinned kernel tasks.
 
  | c-linkage wrote:
  | I am in no way an expert on CPU design, but one way this might
  | be possible is to use "memory tagging" in the sense that CPU
  | execution pipelines are extended through the CPU cache (and
  | possibly into RAM itself) by "tags" that link the state of a
  | cache cell to a branch of speculative execution.
  | 
  | For example, a pre-fetch linked to a speculative execution
  | would be tagged with a CPU-specific speculative execution
  | identifier such that the pre-fetched data would only be
  | accessible in that pipeline. If that speculative execution
  | becomes realized then the tag would be updated (perhaps to
  | zero?) to show it was _actually_ executed and visible to all
  | other CPUs and caches. In all other cases, the speculative
  | execution is abandoned and the tagged cache cells become marked
  | as available and undefined. Circuitry similar to register
  | renaming could be used to handle tagging in the caches at the
  | cost of effectively _halving_ cache sizes.
  | 
  | In a more macro sense, imagine git branches that get merged
  | back into main. The speculative execution only occurs on the
    | branch. When the CPU realizes the prediction was good and
    | doesn't need to be rolled back, the branch is merged into the
  | trunk and becomes visible to all other systems having access to
  | the trunk.
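
The tagging scheme described above can be sketched as a toy model. The names and structure here are invented for illustration and don't come from any real ISA or microarchitecture:

```python
# Toy model of speculation-tagged cache fills: lines filled under a
# speculation tag stay private to that speculative context until the
# branch retires, and are discarded outright if it is squashed.
COMMITTED = 0  # tag value meaning "architecturally visible"

class TaggedCache:
    def __init__(self):
        self.lines = {}  # address -> tag

    def speculative_fill(self, addr, tag):
        self.lines[addr] = tag

    def visible_to(self, addr, tag=COMMITTED):
        # A line is visible if it is committed, or if the observer is
        # inside the same speculative context that filled it.
        t = self.lines.get(addr)
        return t is not None and t in (COMMITTED, tag)

    def retire(self, tag):
        # Prediction was right: "merge the branch into the trunk".
        for addr, t in self.lines.items():
            if t == tag:
                self.lines[addr] = COMMITTED

    def squash(self, tag):
        # Misprediction: throw away everything filled under this tag,
        # leaving no timing residue for other observers.
        self.lines = {a: t for a, t in self.lines.items() if t != tag}

cache = TaggedCache()
cache.speculative_fill(0x1000, tag=1)
assert cache.visible_to(0x1000, tag=1)      # visible inside the speculation
assert not cache.visible_to(0x1000)         # invisible architecturally
cache.squash(tag=1)
assert not cache.visible_to(0x1000, tag=1)  # gone after the squash
```

The "halving the cache" cost mentioned above shows up here as the extra tag storage per line plus the fills that get thrown away on a squash.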
 
    | greatfilter251 wrote:
    | [dead]
 
  | kirse wrote:
  | _Imagine being able to simultaneously visit 4 retail stores and
  | dynamically select items depending on availability and pricing,
  | arriving back home having spent the amount of time it takes to
  | shop at 1.25 stores while burning 1.5x the fuel of a one-store
  | trip._
  | 
  | What a fantastic analogy.
 
| kmeisthax wrote:
| This idea seems so simple that I'm pretty sure at least three
| people have independently thought of the same idea when reading
| about branch delay slots. I suspect some of the more strict
| aspects of basic block enforcement would also at least frustrate
| some ROP attacks.
| 
| All in all, seems like a good idea, when can we staple this onto
| existing processors?
 
| [deleted]
 
| shiftingleft wrote:
| Previous discussions:
| 
| https://news.ycombinator.com/item?id=24090632
| 
| https://news.ycombinator.com/item?id=34202099
 
  | CalChris wrote:
  | This is a new version (v2) of their paper.
 
| Joel_Mckay wrote:
| Clock domain crossing is a named problem with a finite set of
| solutions.
| 
| When you get out of abstract logical domains into real-world
| physics, then the notion of machine state becomes fuzzier at
| higher clock speeds.
| 
| Good luck. =)
 
| phoe-krk wrote:
| (2021)
 
  | dang wrote:
  | Added. Thanks!
 
  | ChrisArchitect wrote:
  | (2020) even
 
___________________________________________________________________
(page generated 2023-07-26 23:00 UTC)