
[HN Gopher] How I found a bug in Intel Skylake processors (2017)
___________________________________________________________________
How I found a bug in Intel Skylake processors (2017)
Author : vinnyglennon
Score  : 228 points
Date   : 2021-11-08 16:12 UTC (6 hours ago)
web link (gallium.inria.fr)
w3m dump (gallium.inria.fr)
| facorreia wrote:
| 2017.
| [deleted]
| lordnacho wrote:
| The problem with bugs deep in the stack is that it is really time consuming to establish that they are in fact as deep as they are.
|
| I wrote a Swift iOS app once, and came across an issue with one of the collection classes.
|
| Of course, nobody thinks that the Swift libs will be wrong as a first guess. So I worked through a number of hypotheses about my own code, slowly stripping out pieces that I thought might contain an error. And then combinations. I also tried reducing the number of entries just to simplify the logs. This worked, but of course you are not going to think that there's a library bug affecting collections with size > 16, and it wasn't actually a theory until I randomly decided to reduce the n. I also discovered that it worked just fine on release but not debug, so I thought maybe I have some race condition.
|
| More and more stripping down occurred, until I eventually gave up using my own project and just started a new one just to see about the collection class. I did it for the sake of being thorough, rather than actually thinking the lib had a bug in its debug implementation. But lo and behold, when I managed to make it reproducible and put it on SO, someone from Apple acknowledged that they could also see it, and they fixed it.
|
| Naturally if I'd gone direct to testing the lib I'd have saved a huge amount of time, but I guess that's the tradeoff from the most sensible heuristic: test your own code first, the bug is there.
| gh123man wrote:
| > nobody thinks that the Swift libs will be wrong as a first guess
|
| This is highly dependent on which version of Swift you started with! When Swift introduced the new substring API I hit a bug where certain UTF-8 character sequences caused an index out of bounds error internally.
| Unfortunately we learned this in production when an entire organization couldn't launch our app due to a string they were feeding through it.
|
| That is how your trust in the standard libs is forever broken. Library and compiler bugs were quite common in the Swift 1-3 days.
| jcelerier wrote:
| Yeah, over the course of my allegedly short career (I'm 29) I've reported dozens of bugs against GCC, Clang, MSVC, binutils, Qt, SDL, glibc, PortAudio, macOS and other foundational stuff... I'm not saying I automatically assume "toolchain bug", but my cutoff for seriously pondering "is it a bug in $underlying_stuff" is around 30 minutes of "I really can't see where in my code things were done wrong" and so far this heuristic has consistently held...
| cesarb wrote:
| > but I guess that's the tradeoff from the most sensible heuristic: test your own code first, the bug is there.
|
| Also known as "select is not broken" (see for instance https://blog.codinghorror.com/the-first-rule-of-programming-...).
| yjftsjthsd-h wrote:
| Reminds me of: "It Is Never a Compiler Bug Until It Is" (https://r6.ca/blog/20200929T023701Z.html , https://news.ycombinator.com/item?id=24636326). The bottom of the modern stack is _really_ reliable, until it isn't ;)
| Smoosh wrote:
| Not just "the modern stack". I work mainframes and always felt the IBM-supplied environment (compilers, transaction processing systems, databases) was rock solid.
|
| Then one day I discovered APARs were a thing.
|
| https://www.ibm.com/support/pages/open-apars-ibm-products-av...
| twic wrote:
| Similar story with a bug in the IBM JDK's implementation of BigDecimal. Surely if anyone is going to get decimals right it's IBM! Took us a long time to stop looking at our code.
|
| (turns out that IBM do get decimals right if you're running on z/Architecture, where the code diverts to some hardware-accelerated fast path; just not on x86-64 machines used by paupers like my project)
| CalChris wrote:
| Debian announcement
|
| https://lists.debian.org/debian-devel/2017/06/msg00308.html
|
| Ahrefs writeup
|
| https://tech.ahrefs.com/skylake-bug-a-detective-story-ab1ad2...
|
| The Intel spec update still labels SKL150 as _No Fix_ but there is a microcode update available. Dunno exactly what to make of that distinction.
|
| https://www.intel.com/content/www/us/en/processors/core/desk...
|
| Can an x86 program detect whether this update has been applied? Can a Linux process set a DONT_HYPERTHREAD_ME_BRO bit?
| BeeOnRope wrote:
| It was "fixed" in a microcode update by disabling the _loop stream buffer_ (LSD) which is a special mode of operation for very small loops where the instruction decoders and uop cache in the CPU are shut down and the loop runs directly out of a small cache. Since the problem arose only when the LSD was being used, in combination with hyperthreading and high byte register use, this effectively avoids the problem.
|
| Of course, disabling the LSD has some costs: CPUs use more power and some loops are slower (though some are faster). These updates are usually applied silently without user consent, so you might be quite surprised to find out that after a reboot your computation kernel suddenly draws more power or has slowed down or sped up.
|
| > Can an x86 program detect whether this update has been applied? Can a Linux process set a DONT_HYPERTHREAD_ME_BRO bit?
|
| Yes. One way would be to check the microcode version (available in /proc/cpuinfo on Linux, among other places), since the version that introduced this fix is known.
|
| Another way would be to run a small loop known to fit in the LSD and then check a performance counter event which counts uops delivered from the LSD, like lsd.uops. This counter is always zero when the LSD is disabled (or realistically you could just run _any_ substantial code and check the counter since you always have some non-negligible portion of the uops coming from the LSD). This is how I check it from the command line in practice.
|
| Finally, if you don't have easy access to the counters, you could create a loop that has a significant performance difference depending on whether it is coming from the LSD or not. For example, a loop that crosses a 32-byte boundary will run in 2 or more cycles when using the decoder or uop cache, but could run in 1 cycle in the LSD. Timing such a loop would give you a strong indication about whether the LSD is enabled.
|
| ---
|
| Specifically, the cache used is not a dedicated one, but rather the IDQ (decoded instruction queue) is reused. This queue holds uops and is normally fed by the decoders or the uop cache on one end, and feeds the allocation/rename engine on the other. In LSD mode, this queue stops being a queue and is instead used as a kind of cache with the loop operations "locked down" in the queue and just repeatedly replayed.
| kaladin-jasnah wrote:
| Dumb question, but why is it abbreviated as LS_D_ when it's spelled loop stream _b_uffer?
| CalChris wrote:
| It's actually spelled _Loop Stream Detector_ and it dates to the _Core 2_ processor family which is circa 2006. The LSD is described in section 3.4.2.4 of the Intel Optimization Manual, _Optimizing the Loop Stream Detector (LSD)._ AnandTech describes how it works.
|
| https://www.anandtech.com/show/2594/4
| BeeOnRope wrote:
| Yeah that's right. Not sure where I picked up the term "... buffer" but a search shows I've been using it for a while.
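A minimal sketch of the first check BeeOnRope describes in the subthread above: reading the microcode revision that Linux reports in /proc/cpuinfo. The function names and the `min_revision` argument are illustrative assumptions, not anything from the thread. The revision that carries the SKL150 fix is not stated here, and revisions are only comparable between CPUs of the same model and stepping, so the caller has to supply the right value.

```python
def microcode_revisions(cpuinfo_text):
    """Collect the distinct 'microcode' values from /proc/cpuinfo-style text."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            # Linux prints the revision in hex, e.g. "microcode : 0xc6".
            revisions.add(int(value.strip(), 16))
    return revisions


def microcode_at_least(cpuinfo_text, min_revision):
    """True if every logical CPU reports a revision >= min_revision.

    min_revision is a caller-supplied placeholder (assumption): look it up
    for your exact CPU model and stepping before relying on the result.
    """
    revisions = microcode_revisions(cpuinfo_text)
    return bool(revisions) and min(revisions) >= min_revision


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the cpuinfo text; the default path only exists on Linux."""
    with open(path) as f:
        return f.read()
```

On a Linux machine this would be called as `microcode_at_least(read_cpuinfo(), rev)`, with `rev` taken from the vendor's release notes for that CPU. Keeping the parser separate from the file read makes it easy to try on pasted cpuinfo text from another machine.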
| 13of40 wrote:
| > More experienced programmers know very well that the bug is generally in their code: occasionally in third-party libraries; very rarely in system libraries
|
| This was the bane of my existence when I worked on testing Windows years ago. New SDETs almost invariably fell into the trap of assuming any automation error was a "test bug" instead of a bug in OS code, even if the OS code in question was written last week.
| 1432132143 wrote:
| really guys GFY you know what OEMs do, they disable many features every time they get some new bug, e.g. undervolting. now the fan on my thinkbook is always on: 30* fan is on, 29* fan is on, can't even undervolt my cpu now. Really thx
| wging wrote:
| Previous submission: https://news.ycombinator.com/item?id=14686277
|
| (This is not a complaint; I found the post interesting.)
| dang wrote:
| Thanks! Macroexpanded:
|
| _I found a bug in Intel Skylake processors_ - https://news.ycombinator.com/item?id=14686277 - July 2017 (99 comments)
| [deleted]
| bjarneh wrote:
| > Binary search always fails? "The Java compiler is acting funny today!"
|
| :-)
| Decabytes wrote:
| I'm glad I'm just a pleb programmer, who never has done anything so complicated that it would expose processor errata.
|
| And even if I did, I wouldn't have the expertise to even figure it out.
| brokenmachine wrote:
| Welcome to the 99.999999999%.
| dfox wrote:
| The issue there is that the hardware is full of totally absurd bugs. If you target PC-like userspace or one of the two major mobile platforms it is somebody else's job to shield you from that. In general CPU level bugs are somewhat rare, but every single platform vendor has shipped some kind of silicon that contains peripherals that do not work as documented and only by chance work with the reference driver implementation.
| SavantIdiot wrote:
| This is a scary place to be: the top-level debug resource for a major project. It took almost two years to resolve, but was already known as SKL150. Looking at the clang vs. gcc assembly without knowledge of SKL150 would be literally impossible to debug. GCC -O1 vs -O2 is a clue, but even with the asm diffs, wth? Again, scary.
| tinus_hn wrote:
| The world is a scary place; this is basically the same as rowhammer which is an issue in computers shipped today.
| woodruffw wrote:
| Unless I'm misunderstanding what you mean, this isn't really like rowhammer at all -- it's a uarch/ucode bug, which is effectively a programming error within the CPU. Rowhammer is a physical flaw in how memory cells in DRAM are laid out, one that can be triggered by memory access patterns independent of CPU architecture and microarchitecture.
|
| (There are also hundreds of errata like this one in every CPU generation. They're _usually_ not easy to exploit, since they cause system instability rather than disclosing secret material or allowing unintended code execution.)
| zsmi wrote:
| > Rowhammer is a physical flaw in how memory cells in DRAM are laid out
|
| It's not really a flaw, more like a consequence of how memory cells are laid out. I mean most people want lots of bits in their DRAM. Maximizing this parameter necessitates that some will be in close proximity.
| woodruffw wrote:
| To my (non-EE) mind, the flaw is the electrical leakage between the cells. Tight packing is a consequence of economic forces, but I assume there are also technical solutions that allow for tight packing (but either offset the performance or cost gains). Is that assumption wrong? (Genuinely asking!)
| tlb wrote:
| DRAM cells also decay over time (~ 60 milliseconds), but memory controllers have some logic to refresh every row on a regular schedule so it's not an issue.
|
| They should also have logic to refresh adjacent rows if some number of consecutive accesses to a small group of rows is detected. This is rare in normal workloads, because those accesses normally come from cache. It's lame of chipmakers to not fix this. The fix would require the DRAM controller (integrated into modern CPUs) to know more about the internals of DRAMs than they currently do.
| zsmi wrote:
| In theory DDR5/LPDDR5 added a controller command for RowHammer mitigation but I haven't had time to research it yet.
|
| See: https://arxiv.org/pdf/2108.06703.pdf
| zsmi wrote:
| There was a good paper on it in 2014. [1] They describe the RowHammer attack as: opening and closing (activation and precharge) a DRAM row (aggressor row) at a high enough rate (hammering) such that it can cause bit-flips in physically nearby rows (victim row).
|
| Colloquially, it's basically a change in voltage in one place can indirectly cause a change in voltage in another place via capacitive coupling. Capacitance increases in proportion to the inverse of the separating distance so only in recent years have things shrunk to the size that makes it an issue.
|
| Since having fewer bits in DRAM is basically not an option, most mitigation techniques that I know of remove the possibility of hammering: possibilities include the OS, memory system controller, or DRAM controller changes.
|
| [1] https://users.ece.cmu.edu/~yoonguk/papers/kim-isca14.pdf
| woodruffw wrote:
| Much appreciated, thank you.
| [deleted]
| dimitrios1 wrote:
| Apologies if this is off topic -- but I am constantly impressed at some of the things I find that come from inria.fr. I first came across them when learning OCaml. Seems to be a top notch university.
| woodruffw wrote:
| Inria is a research institute, not a university. But they do indeed do excellent work!
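Returning to the rowhammer subthread above: tlb's suggestion (refresh adjacent rows once a row has been activated too often) can be modelled with a toy counter. This is only a sketch under stated assumptions. The class name, the threshold, one counter per row, and the idea that row numbers match physical adjacency are all illustrative; it counts activations per refresh window rather than strictly consecutive accesses, and real DRAM mitigations are more involved and vendor-specific.

```python
from collections import Counter


class ToyRowGuard:
    """Toy model: count activations per row, refresh neighbours when hot."""

    def __init__(self, num_rows, threshold):
        self.num_rows = num_rows
        self.threshold = threshold   # hypothetical activation limit
        self.activations = Counter()
        self.refreshed = []          # log of (aggressor, victim) refreshes

    def activate(self, row):
        """Record one activation of `row`; return the neighbour rows refreshed."""
        self.activations[row] += 1
        if self.activations[row] < self.threshold:
            return []
        # The row looks like an aggressor: refresh its neighbours
        # (assumed physically adjacent) and start counting again.
        self.activations[row] = 0
        victims = [r for r in (row - 1, row + 1) if 0 <= r < self.num_rows]
        self.refreshed.extend((row, v) for v in victims)
        return victims

    def periodic_refresh(self):
        """The regular refresh restores every row, so all counters reset."""
        self.activations.clear()
```

With `ToyRowGuard(num_rows=8, threshold=3)`, the third activation of row 4 within one refresh window triggers a refresh of rows 3 and 5. The cost tlb points out shows up here as the assumption that the controller knows which rows are neighbours.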
| bruce343434 wrote:
| The link called "6th Generation Intel(r) Processor Family - Specification Update." 404's
| userbinator wrote:
| "gcc/clang/icc/msvc won't usually issue the affected opcode pattern and it ends up being rare. SKL150 - Short loops using both the AH/BH/CH/DH registers and the corresponding wide register _may_ result in unpredictable system behavior."
|
| I think Intel should regression-test its CPUs using the decades of demoscene productions out there, especially those in the extreme-size-optimisation categories; testing with almost exclusively "mainstream" compiler output is IMHO a bad idea and a step down the path to "warranty void if VLC is used" (https://news.ycombinator.com/item?id=7205759 )
___________________________________________________________________
(page generated 2021-11-08 23:00 UTC)