[HN Gopher] Optimizations can have unexpectedly large effects wh...
___________________________________________________________________
 
Optimizations can have unexpectedly large effects when combined
with caches
 
Author : zdw
Score  : 168 points
Date   : 2022-11-14 16:30 UTC (6 hours ago)
 
web link (justinblank.com)
w3m dump (justinblank.com)
 
| nerpderp82 wrote:
| We really don't know what causes perf differences, layout has
| more of an effect than going from -O1 to -O3.
| 
| "Performance Matters" by Emery Berger
| https://www.youtube.com/watch?v=r-TLSBdHe1A
 
  | tmtvl wrote:
  | I wonder if he mentions running from a different directory and
  | running as a different user to highlight how annoying the
  | effect from the environment is, as USER and PWD are part of the
  | environment.
 
| lob_it wrote:
| Caching db queries and media assets already got a big boost a
| decade ago going from spinning rust to ssd to even faster NVME.
| 
| Optimizing DB queries on spinning rust was an exciting experience
| though. Chasing milliseconds with caching and warming caches up
| still brings back good memories :)
| 
| Scale and architecture make new designs much easier. 1 MB of
| code... A 100 MB database and 1 GB of media assets are becoming
| relics, like "big iron". Blank-slate canvases should skip a lot of
| it.
| 
| Technology debt only gets harder to manage over time.
 
| ISL wrote:
| This happens in life, too. When your counterparts lose continuity
| with your work, they turn to other work and the context-switch to
| return to working with you becomes increasingly hard.
| 
| Helps to be quick.
 
| joosters wrote:
| "Almost all programming can be viewed as an exercise in caching."
 
| hinkley wrote:
| About 15 years ago I was on a team that was having so many
| problems with caching that I began to see it for the liability
| that it is. It started with a 2-4 week delay each release cycle
| while they fixed caching bugs the testers had found, but once I
| joined the project I saw how deep the problem went.
| 
| A number of their features invited a full table scan, which is
| fundamentally incompatible with an LRU cache, and in particular
| has disastrous stair step functions as the working set grows.
| Note that this is also a couple of years before anyone had heard
| of memcached so these were DIY implementations with multiple
| potential sources of bugs.
| 
| About the time I found my feet on the project someone wanted to
| add caching to yet another workflow that was taking 30 seconds to
| run. It was an admin workflow so there was a little tolerance of
| slow response times, but not 30 seconds. I volunteered for the
| story because I thought we could do better than more caches.
| 
| Through conversation, step debugging, and profiling I found the
| spot where they had wanted the caching. What the feature was
| doing was two relatively expensive lookups of the same data,
| filtering it on two very different dimensions, and returning the
| intersection of the two. In typical ORM fashion, both of the
| concerns were taking it upon themselves to fetch their own data
| instead of having it handed to them. Since they were both called
| from the same function, I hoisted the lookup out of the callee
| into the caller, sending the same list to both. Then I changed
| the code again to feed the result of call 1 to call 2, to make it
| work more like consecutive filters.
| 
| The end result was that the page now loaded in 3 seconds, and I
| was able to write better unit tests for this code. One might
| expect that most of that improvement came from running the second
| filter on a much smaller list, but that's where you'd be wrong.
| The majority of that improvement came from the hoisting, not the
| chaining. Due to the amount of work in the filters, the initial
| lookup shouldn't have accounted for more than a third of the page
| load time, but just sharing the same data dropped the response
| time to around 6 seconds. Chaining the filters ended up only
| saving 3 seconds.
| 
| The problem with large data sets is that CPUs have caches too.
| And in a garbage collected programming language, there are
| additional costs to carrying around duplicated data, particularly
| if it is in tree form like this was. Most likely we were smashing
| the hell out of CPU caches and introducing extra GC pressure to
| boot. By passing in the same object tree the second call followed
| warm and/or hot paths through heap memory.
| 
| I'd had a similar but less dramatic experience a few years
| before that, where I identified that a function, which accounted
| for 10% of the runtime, was being invoked roughly twice as often
| as the problem required. By that point 5% looked like low hanging
| fruit, so I eliminated the duplicate calls and reran the
| benchmarks. But the results didn't make sense so I tweaked and
| prodded and fiddled. The new time was 20% lower than the old
| time, for what should have been 5%. This is already way too long
| so I won't go into the reasons here, but in addition to the
| scenarios I mentioned for the previous issue, there's also
| problems of timing accuracy in profilers, and cyclic use patterns
| that can misattribute the expense of a call to siblings or the
| parent. A function that creates many objects may never reach the
| GC threshold, but a large allocation later in the sequence
| diagram may. Or collide with cache lines that are needed again
| later in the call graph.
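| 
| A minimal sketch of that refactoring in Python (the names are made
| up, not from the actual system):
| 
|     # Before: each filter fetches its own copy, and the caller
|     # intersects the two results.
|     def report_before(load_entries, account_id):
|         by_region = [e for e in load_entries(account_id)
|                      if e["region"] == "EU"]
|         by_status = [e for e in load_entries(account_id)
|                      if e["status"] == "open"]
|         wanted = {e["id"] for e in by_status}
|         return [e for e in by_region if e["id"] in wanted]
| 
|     # After: hoist the lookup into the caller, then chain the
|     # filters so the second runs on the already-reduced list.
|     def report_after(load_entries, account_id):
|         entries = load_entries(account_id)       # fetched once
|         by_region = [e for e in entries if e["region"] == "EU"]
|         return [e for e in by_region if e["status"] == "open"]
| 
| Even before chaining, passing one list to both filters avoids the
| duplicate fetch and the duplicated object graph on the heap.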
 
  | bawolff wrote:
  | I feel like the moral of this story is less "caching is a
  | liability" and more garbage in garbage out. Caching can help
  | things when used appropriately, but its not magic.
 
    | nerpderp82 wrote:
    | Your performance topology is dictated by multiple hierarchies
    | of caches, many of which you might not be aware of. If they
    | disappear, your application may no longer function. The cache
    | is no longer an optimization but a critical component.
    | Figuring out how to _fill_ it might be harder than how to
    | invalidate it.
 
      | hinkley wrote:
      | > Figuring out how to fill it might be harder than how to
      | invalidate it.
      | 
      | This always seems to be a problem for someone else or for
      | another day, which is part of the frustration.
      | 
      | Cold caches killing your app are a considerable problem,
      | doubly so if you deploy in the Cloud because now you aren't
      | necessarily guaranteed to be deploying into a hot data
      | center. I think there's an unspoken assumption we haven't
      | really shaken: if you came to work and the power was out, you
      | not only expected none of the software to work, you peer-
      | pressured other people into not complaining too loudly about
      | it when the power came back on. Today is a loss. Everyone can
      | see it.
      | 
      | But if, for instance, I work on a SaaS project in the Cloud,
      | not only can I just lose a region, but for many problem
      | domains my customers mostly have local customers of their own,
      | so I have diurnal cycles of traffic that are customer-specific.
      | All the non-bot traffic is in New York until 6am GMT, say. I
      | might want to spin regions down pretty aggressively during
      | parts of the day, but if spinning back up results in a
      | thundering herd problem because 'the cache makes us fast'?
      | Well then the value of my circuit breaker alerts drops to
      | less than zero.
      | 
      | That's just the first time things break. Caches and
      | capacity planning are also at odds, because capacity
      | planning is about planning your worst case and caches are
      | almost always about the happy path.
 
    | hinkley wrote:
    | Silver bullets kill everyone, not just werewolves.
    | 
    | Garbage In, Garbage Out is not wrong per se, but we have in
    | the real world the concept of an attractive nuisance, where
    | the victim is not entirely at fault for their injuries. The
    | thing, being prone to invite misuse and misadventure, bears
    | some responsibility for protecting people from themselves.
    | 
    | Caching is an attractive nuisance. These days it's simply
    | replaced Global Shared State which used to be the boogeyman,
    | but sadly we haven't collectively clued in on them being
    | equivalent.
 
      | bawolff wrote:
      | In this situation it was stuff that would be terrible both
      | with and without caching, that didn't have much to do with
      | caching itself; the caching just provided a bit of a
      | bandaid. I think the more apt metaphor might be "seatbelts
      | make people drive recklessly".
      | 
      | Would fast computers also be an attractive nuisance
      | because you don't have to think about performance as much?
 
        | hinkley wrote:
        | > Would fast computers also be an attractive nuisance
        | because you don't have to think about performance as
        | much?
        | 
        | Sometime in the last week or two someone asserted just
        | that. I didn't dig into that discussion very far but on
        | the face of it? Sure. Certainly grabbing the fastest
        | hardware available to Man is an expensive proposition and
        | you should hold that in reserve. You don't get to play
        | that card very often, and much less so today than during
        | the Golden Era. Post Moore's Law for sure, but to an
        | extent post-Dennard as well, since power draw of a
        | facility is a significant cost center. And without
        | Dennard you can make them more powerful but keeping them
        | efficient gets expensive.
 
| dathinab wrote:
| Article: a speed-up led to avoiding cache misses, which led to a
| further speed-up, which led to even fewer cache misses.
| 
| This kind of interaction is important to keep in mind.
| 
| It applies to all kinds of caches in all kinds of situations. To
| some degree even the instruction and data caches of modern cpus.
| 
| But most interesting is that in complex systems in (very very)
| rare cases the "whole system effect" can go the other way around
| too (due to now-faster components thrashing the cache for other
| components in a way which makes the system as a whole slower).
 
| RicoElectrico wrote:
| What kind of batch process could that be?
 
  | mananaysiempre wrote:
  | Judging by the off-hand mentions in the article, something
  | along the lines of ingesting the cargo manifest of a large
  | ship?
 
| mjb wrote:
| I really love this point.
| 
| For most applications the optimal size of a (time-based) cache is
| around the point that keeping an item around in the cache is
| equal to the cost of accessing it again. This is a point that Jim
| Gray made in the classic "The 5 Minute Rule for Trading Memory
| for Disc Accesses"
| (http://www.hpl.hp.com/techreports/tandem/TR-86.1.pdf) back in
| 1986.
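| 
| A back-of-the-envelope version of that break-even point, with
| made-up present-day prices rather than the 1986 figures:
| 
|     disk_price = 200.0       # dollars per drive (assumed)
|     disk_iops = 200.0        # random reads/second per drive
|     ram_per_gib = 4.0        # dollars per GiB of RAM (assumed)
|     item_bytes = 8 * 1024    # size of one cached item
| 
|     # Price of sustaining one random read per second from disk,
|     # versus the price of the RAM needed to hold the item.
|     dollars_per_read_rate = disk_price / disk_iops
|     dollars_to_hold_item = ram_per_gib * item_bytes / 2**30
| 
|     # Worth keeping the item cached if it is re-read at least
|     # this often; otherwise re-fetching it is cheaper.
|     break_even_s = dollars_per_read_rate / dollars_to_hold_item
|     print(f"break-even interval: {break_even_s:,.0f} s")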
| 
| A side effect of that is that systems that run faster, and
| therefore access each cached item more often, get more benefit out
| of a cache by pushing that set point out further. Whether that
| happens in any given system depends a lot on the access patterns,
| and especially temporal locality within those access patterns,
| but the effect is real in a lot of different kinds of systems.
| 
| > It's common wisdom that systems with caches can fail badly when
| the caches are cold. When every request misses the cache, the
| system is overloaded, and never recovers. What I observed was the
| mirror image.
| 
| The bottom line is that both locality and memoization are huge
| contributors to the performance of systems, both positive and
| negative. When folks (including me) point out the risks of
| caches, we're not implying that caches are always bad. Instead,
| it's pointing out a risk that sometimes they are too _good_ ,
| leading to a system that can't function when the normal level of
| locality and memoization are not available.
 
  | bombcar wrote:
  | Reminds me of: There are only two hard things in Computer
  | Science: cache invalidation, and naming things, and off-by-one
  | errors.
  | 
  | What people don't realize is caches are much harder than just
  | invalidation.
  | 
  | I'm a fan of continually testing (and running) things without
  | cache at all just to verify what they do, caching can often
  | cover a multitude of sins that you really don't need to be
  | doing.
 
    | hinkley wrote:
    | Caches are global shared state. Correction: _lossy_ global
    | shared state.
    | 
    | 'Naming things' when they are in a global namespace is really
    | hard, to the point where I'm starting to think that cache
    | invalidation and naming things are not two problems but the
    | same problem in two manifestations.
 
      | kwhitefoot wrote:
      | > Caches are global shared state
      | 
      | Not necessarily. I designed an object oriented simulation
      | system which had a lot of calculated properties where the
      | calculation was a substantial amount of work but the result
      | would be queried many times before the input to the function
      | changed. Caching the results inside the object provided a
      | substantial speed up (orders of magnitude) while keeping
      | the cache completely local and hidden from the client code
      | and keeping the program very modular.
      | 
      | Keeping the cache inside the objects allowed us to keep the
      | parts of the program neatly separated and minimized the
      | amount of knowledge that client code, and more importantly
      | client coders, needed to have about the objects they were
      | using.
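      | 
      | A minimal sketch of that pattern in Python (the names are
      | invented, not from the simulator described):
      | 
      |     class Body:
      |         def __init__(self, mass, velocity):
      |             self.mass = mass
      |             self._v = velocity
      |             self._ke = None       # local, hidden cache
      | 
      |         @property
      |         def kinetic_energy(self):
      |             # stands in for a genuinely expensive calculation
      |             if self._ke is None:
      |                 self._ke = 0.5 * self.mass * self._v ** 2
      |             return self._ke
      | 
      |         def set_velocity(self, v):
      |             self._v = v
      |             self._ke = None       # invalidate on input change
      | 
      | Client code only ever sees kinetic_energy; the cache never
      | leaks outside the object.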
 
        | bmm6o wrote:
        | If we're splitting hairs, your situation sounds more like
        | memoization, where you have saved the output of an
        | expensive but deterministic calculation. With the correct
        | assumptions, this can never result in incorrect behavior.
        | Worst case is excessive memory consumption. Whereas a
        | stale cache is something that needs to be taken into
        | account.
 
        | hinkley wrote:
        | What happens when the system of record changes the answer
        | halfway through the transaction?
        | 
        | By caching answers to calculations you've moved the
        | source of truth from the function to the cache. But the
        | cache is also acting as a proxy for the system of record,
        | which is a conflict of interests. That's substantially
        | part of the problem with cache invalidation.
        | 
        | ETA: There's also a common category error here that I see
        | with people who grossly misunderstand DP as 'caching'.
        | First that they think that caching and memoization are
        | the same thing, which they emphatically are not, and
        | second that DP is about memoization, which means they
        | listened for the first ten minutes and then fell asleep.
        | 
        | Tabulation is much more powerful than memoization, so
        | equating DP with memoization is impoverishing it and
        | yourself.
        | 
        | But importantly here, the distinction between memoization
        | and caching is that memoization is scoped while caching
        | is global. With memoization a single call tree gets the
        | same answers _every time_ , which means the mental
        | process of Local Reasoning is maintained. If you know the
        | call graph for this operation you know everything that
        | interacts with it. You know if 'eviction' is even a
        | thing. You also know that your unit tests are testing the
        | code as it will behave under load, not some fictional
        | happy path.
        | 
        | tl;dr: If your calculation is important it should show up
        | in your data flow, not arrive out of band from a magic
        | oracle.
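        | 
        | One way to see that distinction in code (Python; score()
        | stands in for some expensive, pure calculation):
        | 
        |     # Memoized: the table is scoped to one call tree, so
        |     # everything inside plan() sees the same answers, and
        |     # the table vanishes when the call returns.
        |     def plan(items, score):
        |         memo = {}
        |         def cost(x):
        |             if x not in memo:
        |                 memo[x] = score(x)
        |             return memo[x]
        |         return sorted(items, key=cost)
        | 
        |     # Cached: a long-lived shared table, so eviction,
        |     # staleness and invalidation are now part of the
        |     # behaviour of every caller.
        |     SHARED = {}
        |     def plan_cached(items, score):
        |         def cost(x):
        |             if x not in SHARED:
        |                 SHARED[x] = score(x)
        |             return SHARED[x]
        |         return sorted(items, key=cost)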
 
  | hinkley wrote:
  | Caching is a weapon of last resort. In most cases it is the
  | last major performance improvement you will see from your code,
  | even if other options were available, and any information
  | architecture problems you have are set in stone or will get
  | worse.
  | 
  | Because of the locality aspect, people will assume that it's
  | 'free' to look up a value multiple times instead of passing
  | these data dependencies between the relevant methods. This
  | turns your cache into global shared state, and once that
  | happens you can't turn the cache off again. There are other
  | problems with this arrangement as well.
  | 
  | The easiest to understand (and yet I've failed on at least one
  | project) is that if the working set for a problem ever becomes
  | proportional to the size of the cache (>1 for single task
  | systems, > 1/n for systems with n concurrent tasks), then you
  | can end up evicting data in the middle of a transaction. The
  | obvious consequence is that it causes performance to jump off a
  | cliff. Particularly bad behavior with LRUs but switching to a
  | different statistical model simply reduces the height of the
  | cliff. There is still a cliff. The less obvious consequence is
  | now you have concurrency bugs, because the code assumes that
  | decisions made about the data will go the same way on each
  | call, but due to eviction, two halves of the transaction may
  | make conflicting decisions with undefined behavior.
  | 
  | But speaking as the guy who fixes the perf issues other people
  | gave up on: Once you introduce caching you poison the
  | performance well. Flame charts lie when cache is used, and
  | after people rely on the cache it is lying in the other
  | direction when the cache is off, because you're calling that
  | path 5 times now and it's swamping other bottlenecks. This is
  | what I really mean when I say it's a weapon of last resort.
  | Because once you use it, it attempts to take any other options
  | off the table. Caching will be most of the last order of
  | magnitude in scalability that your system ever sees. Every
  | 'win' from there on will take more time and energy than most
  | people are willing to invest.
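  | 
  | A toy version of the eviction-mid-transaction problem (Python; a
  | two-entry LRU in front of a store that changes between the two
  | reads):
  | 
  |     from collections import OrderedDict
  | 
  |     class LRU:
  |         def __init__(self, cap, load):
  |             self.cap, self.load = cap, load
  |             self.d = OrderedDict()
  |         def get(self, k):
  |             if k in self.d:
  |                 self.d.move_to_end(k)
  |             else:
  |                 self.d[k] = self.load(k)
  |                 if len(self.d) > self.cap:
  |                     self.d.popitem(last=False)   # evict
  |             return self.d[k]
  | 
  |     store = {"a": 1, "b": 2, "c": 3}
  |     cache = LRU(2, store.get)
  | 
  |     first = cache.get("a")           # half of the decision
  |     cache.get("b"); cache.get("c")   # working set > cache
  |     store["a"] = 99                  # concurrent writer
  |     second = cache.get("a")          # other half sees 99
  |     assert first != second           # one transaction, two truths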
 
    | brightball wrote:
    | I like applying the Circuit Breaker approach to caching.
    | Rather than applying a cache everywhere, setup a cache
    | wrapper that only triggers if the accessed resource is slower
    | than a certain threshold.
    | 
    | The goal is for the cache to go unused most of the time, but
    | during times of unexpected system stress the caches will kick
    | in, provide relief to the system and reduce impact to users.
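    | 
    | A sketch of that wrapper (Python; the threshold and names are
    | illustrative, not from any particular library):
    | 
    |     import time
    | 
    |     def breaker_cache(fetch, slow_ms=250):
    |         cache = {}
    |         def wrapped(key):
    |             if key in cache:
    |                 return cache[key]
    |             start = time.monotonic()
    |             value = fetch(key)
    |             took_ms = (time.monotonic() - start) * 1000
    |             if took_ms > slow_ms:   # only cache under stress
    |                 cache[key] = value
    |             return value
    |         return wrapped
    | 
    | A production version would also expire entries once the
    | underlying resource speeds back up again.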
 
      | zasdffaa wrote:
      | > The goal is for the cache to go unused most of the time
      | 
        | Then you have a useful cache sitting there unused except at
      | peak time - why? Just enable it always. This makes no
      | sense.
 
        | brightball wrote:
        | Depends on how large the cache is and how much users
        | expect instant updates to their data. If real-time or
        | near real-time is the expectation, then avoiding the
        | cache usage unless necessary preserves their experience
        | without incurring the hurdles of cache invalidation that
        | grow more complex the more you cache.
        | 
        | EDIT:
        | 
        | Another area where this approach is beneficial is with a
        | multi-tenant system where larger tenants querying more
        | data could have a much more significant performance
        | impact than other tenants on the system.
        | 
        | Time of day/week/month usage spikes can also cause a
        | performance degradation for everyone even if it's not
        | typically stressing the system.
        | 
        | This approach is just a mitigation to prevent the system
        | from getting overloaded during those times. It's not the
        | right approach for every situation. Just a spin on the
        | Circuit Breaker Pattern.
        | 
        | https://en.wikipedia.org/wiki/Circuit_breaker_design_pattern
 
        | zasdffaa wrote:
        | WTF?
        | 
        | > If real-time or near real-time is the expectation, then
        | avoiding the cache usage
        | 
        | Because looking up a precalculated result in a hash table
        | is so much slower than not looking it up? And what,
        | recalculating it instead is faster than an O(1) hash
        | table lookup? If so, why are you caching?
        | 
        | I am just getting more confused by what people are saying
        | here.
 
        | hinkley wrote:
        | > I am just getting more confused by what people are
        | saying here.
        | 
        | Maybe the takeaway is that this shit is hard and everyone
        | is a little bit confused in their own way.
        | 
        | I'm not saying the only way to win is not to play, but I
        | can neither confirm nor deny rumors that I may have at
        | some point muttered this under my breath.
 
        | addaon wrote:
        | I do think you've hit on a point of confusion in the
        | parent, but...
        | 
        | In a (hard) real-time system the only thing that matters
        | is the worst-case execution time -- if your worst-case
        | time fits in the allowance you win, if it doesn't you
        | lose.
        | 
        | Adding a cache improves the average case (typically), but
        | hurts the worst case (always) -- your worst case goes
        | from "do expensive operation" to "do lookup in cache,
        | fail, do expensive operation." Even if you parallelize
        | misses ("do lookup in cache | do expensive operation, on
        | cache success cancel expensive operation, on cache miss
        | wait for expensive operation") you're generally adding at
        | least a few operations in the critical path.
 
        | salawat wrote:
        | I'm beginning to wonder whether my personal moniker of
        | "signal propagation" is nominatively equivalent to what
        | others call "cache invalidation".
        | 
        | I.e. you cannot by definition have a state change until
        | the signal thereof has traversed a link from A to B. No
        | signal, no change. If you have a cache for one system
        | that delays writes to a backing data store referenced by
        | other systems that are unaware of that cache, then you
        | have necessarily a partition problem.
        | 
        | Every problem in a distributed computing context can
        | generally be reduced to a signal propagation problem.
 
        | ilyt wrote:
        | It's only a problem if you're breaking deadlines. Like
        | having someone's status in a communication app update with a
        | 10s delay is probably not a problem, so as long as you
        | don't ever cache it for longer than that, it is entirely
        | fine.
        | 
        | > If you have a cache for one system that delays writes
        | to a backing data store referenced by other systems that
        | are unaware of that cache, then you have necessarily a
        | partition problem.
        | 
        | In most cases people mean caching reads so the problem
        | here is stale reads, not delayed writes. It's not
        | necessarily the same: a component caching (or essentially,
        | queuing) a write means that once the write is done, all
        | readers (assuming they don't have any extra cache) see it at
        | once, while with cached reads (and no invalidation), multiple
        | readers can have different views.
 
        | hinkley wrote:
        | The one that really jams me (and others) up is that
        | signal propagation and transaction lifetimes line up
        | badly. Caches, being cheap imitations of ACID 'compliant'
        | databases, which themselves cheat all day every day to
        | make pretty numbers, almost never have read repeatable
        | semantics, so if your code splits responsibility for
        | fetching a piece of data you have one of the problems I
        | complained about at/near the top level: Half of a
        | decision can be made with one version of a piece of data
        | and the other half with another, resulting in undefined
        | behavior. In a system that asserts that A = !B you can
        | have an interaction that did some of the activities for A
        | and some of the ones for B (because we read A and A' and
        | assumed they were identical).
 
        | zasdffaa wrote:
        | You're making a lot of assertions without any backup, and
        | they don't make much sense.
        | 
        | > Caches, being cheap imitations of ACID 'compliant'
        | databases
        | 
        | No they're not, not remotely.
        | 
        | > which themselves cheat all day every day
        | 
        | I don't believe a word of this, have you any actual
        | evidence (other than I believe Oracle's serialisable is
        | actually snapshot isolation IIRC)
        | 
        | > Half of a decision can be made with one version of a
        | piece of data and the other half with another, resulting
        | in undefined behavior. In a system that asserts that A =
        | !B you can have an interaction that did some of the
        | activities for A and some of the ones for B
        | 
        | And this just makes no sense, you're claiming that
        | databases are not producing the results they should, if
        | that's the case then I don't believe you understand
        | transaction isolation levels or indeed how database
        | queries work, in which case you have problems other than
        | and larger than caching.
        | 
        | This whole thread s just getting weirder.
 
        | hinkley wrote:
        | I've always been a bit of a fan of debounce for this but
        | many of the obvious implementations are cache based which
        | is essentially inviting people to do the thing I just
        | said I hate.
        | 
        | The cost of counting upvotes or logged in users is not
        | worth the incremental value it affords the user. These
        | are examples of situations where 'eventual consistency'
        | can be embarrassingly easy. If you split the source of
        | truth (SoT) from the system of record (SoR), then the
        | cost of updating the 'truth' can be amortized across the
        | writes to the SoR. The reads do a straight lookup from
        | the SoT, which was the truth at some point but is now the
        | fiction everyone is agreeing to. The distinction is that
        | everyone agrees to the same source of truth. Nobody ever
        | goes asking the SoR for a second opinion, except the SoR,
        | which updates the SoT when appropriate. The SoT is dumb,
        | the clients are dumber, only the code that manages the
        | SoR has any smarts to it.
        | 
        | Read traffic can escalate pretty high in such a system
        | without much stress.
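        | 
        | A bare-bones example of that split (Python; an upvote count
        | where the SoR holds the events and the SoT holds the one
        | number every reader agrees on):
        | 
        |     class Votes:
        |         def __init__(self):
        |             self.events = []     # system of record (SoR)
        |             self.count = 0       # source of truth (SoT)
        | 
        |         def upvote(self, user):
        |             self.events.append(user)
        |             if len(self.events) % 10 == 0:
        |                 # refresh the SoT, amortized over writes
        |                 self.count = len(self.events)
        | 
        |         def read(self):
        |             return self.count    # nobody re-asks the SoR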
 
        | brightball wrote:
        | Exactly. Another situation is a multitenant system where
        | some tenants are much larger than others, triggering
        | larger queries which potentially have a larger
        | performance impact.
 
        | ilyt wrote:
        | You can queue cache refresh in the background if the
        | object in cache is close to the "limit" of how fresh you
        | want it to be. Need a bit of cleverness to avoid
        | thundering herd problem but otherwise works pretty well.
        | 
        | Also if you really want near-real time you might want to
        | actually push the updates directly to cache instead of
        | relying on the client to trigger cache filling. Cases where
        | you need a ton of data _and_ it served in real time _and_
        | it's the same object that constantly changes (and not, say, a
        | pointer to a position in a video stream) are pretty rare.
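        | 
        | Roughly (Python; queue_refresh is a stand-in for whatever
        | background worker you have, and it should dedupe keys to
        | avoid a thundering herd of refreshes):
        | 
        |     import time
        | 
        |     SOFT_TTL, HARD_TTL = 8.0, 10.0   # seconds
        | 
        |     def get(cache, key, load, queue_refresh):
        |         entry = cache.get(key)     # (value, stored_at)
        |         now = time.monotonic()
        |         if entry and now - entry[1] < HARD_TTL:
        |             if now - entry[1] > SOFT_TTL:
        |                 queue_refresh(key)   # refresh early
        |             return entry[0]          # serve cached value
        |         value = load(key)            # miss, or too stale
        |         cache[key] = (value, now)
        |         return value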
 
        | hinkley wrote:
        | My problem with the term 'cache' here is that people have
        | expectations about what cache is and when you violate
        | them you tend to discover that they've returned the
        | favor.
        | 
        | > push the updates directly to cache instead of relying
        | client to trigger cache filling
        | 
        | You can use cache software to fulfill such a role but
        | it's not exactly a cache anymore, and keeping people from
        | treating it like one is some kind of cat herding.
        | 
        | You're using it as a data store at this point, which for
        | one has very different eviction semantics - exactly 1
        | copy of each concept, no more, no less. You can store
        | such things in Consul, for instance, and get very low
        | stale read latency. The cache implementation scales
        | higher but breaks sooner. Pick your poison.
 
        | ilyt wrote:
        | Well, it depends on your data model. Like, if you write
        | only a little (compared to reads), write-thru model of
        | cache (write to cache as well as to backing store) will
        | lose you tiny % of hit ratio at profit of never having to
        | worry about stale data.
        | 
        | > My problem with the term 'cache' here is that people
        | have expectations about what cache is and when you
        | violate them you tend to discover that they've returned
        | the favor.
        | 
        | That's true of everything that has any level of complexity;
        | just see how many people got transaction isolation wrong
        | because they had a simplistic view of how it works in the
        | database they use.
 
      | pyrolistical wrote:
      | Sounds more like a dark buffer anti-pattern
 
      | ilyt wrote:
      | ...instead of having better user experience all the time ?
      | And have more code to manage on top of that ?
      | 
      | Sorry but that sounds like terrible idea.
      | 
      | And when the system starts being slow _the cache is empty_ ,
      | because you waited for the system to slow down to start
      | filling it.
 
      | bawolff wrote:
      | Having a different code path that only triggers during high
      | load feels pretty risky to me.
 
        | bell-cot wrote:
        | _In theory_ , the best way to manage this situation may
        | be with a grayzone - if all is well, then there's only a
        | (say) 5% chance of going to cache. (To at least catch
        | "the cache is broken" errors sooner.) As things get
        | worse, that chance goes up as you cross-fade to mostly
        | using the cache.
        | 
        | Important Warning: If you do not have a very good
        | understanding of your overall system, or neglect to
        | carefully dampen the cross-fade functions - then the
        | whole thing can demonstrate fast and unpredictable swings
        | in behavior. Note that being subpoenaed to testify in a
        | "Why did the Crater City chemical plant blow up?"
        | investigation is usually bad for one's career.
 
        | hinkley wrote:
        | That's better but composition is always the devil in the
        | details. Which might be what you mean when you say:
        | 
        | > or neglect to carefully dampen the cross-fade functions
        | 
        | People with a 'caching problem' don't have one cache,
        | they have a bunch of them. So that 1 in 20 chance of
        | getting grey zoned on one cache becomes a 1 in 400 chance of
        | getting grey zoned on both of two, and a 1 in 8000 chance on
        | all three. That is a very small fraction but
        | when you're dealing with the Law of Large Numbers that's
        | something that's happening at least once a second for
        | many services.
        | 
        | But if you're using this you probably have a lot more
        | than 3 caches, and then we have to talk about the odds
        | that any request gets grey zoned at least once, which is
        | going to be 1 - (19/20)^n. At six caches you're already
        | over 1 in 4, which is going to potentially change your
        | mean response time quite a bit.
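        | 
        | The arithmetic, spelled out (Python):
        | 
        |     p = 1 / 20                       # 5% per cache
        |     for n in (1, 2, 3, 6):
        |         all_n = p ** n               # grey zoned on all n
        |         any_n = 1 - (1 - p) ** n     # on at least one
        |         print(n, all_n, round(any_n, 3))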
 
        | bell-cot wrote:
        | > "...which is going to potentially change your mean
        | response time quite a bit."
        | 
        | My experience is that the changing mean response time
        | isn't nearly as bad as the changing standard deviation.
        | 
        | The really big problems tend to arise when you have
        | unsuspected or under-appreciated positive feedback loops
        | - whether in the software, or in associated physical
        | systems and human behaviors. Or resonance effects. Those
        | situations are where good dampening gets really
        | important.
 
    | dist1ll wrote:
    | If I had to pick the two worst sources of bugs I can imagine,
    | it would be (1) concurrency and (2) caches. Those functions
    | often require extensive performance and load testing and have
    | tons of non-deterministic failure modes.
 
    | zasdffaa wrote:
    | In my _extensive_ experience of optimising stuff...
    | 
    | > Caching is a weapon of last resort. In most cases it is the
    | last major performance improvement
    | 
    | ...is wrong. It is one of the very first things I reach for
    | because it is the most effective and simplest, once you've
    | eliminated O(n^2) behaviour etc. It's not a magic wand but
    | nothing is.
 
      | notinfuriated wrote:
      | In my _extensive_ experience of optimizing stuff it is
      | right.
      | 
      | I cannot even count how many issues were created downstream
      | of the decision to start caching X, Y, or Z. At a point,
      | every other software engineer gets the bright idea to cache
      | their special little function or request or database call.
      | Soon enough, there are N layers of cache responsible for
      | delivering some special piece of data. The engineers build
      | tooling to clear those caches on updates, either manually
      | or automatically. Both suffer from issues, the latter
      | occasionally failing to perform and just causing general
      | confusion about why stale data is returning, and the former
      | being far worse, recruiting dozens of business stakeholders
      | into the problem of knowing-how-to-expire-cache. They start
      | making their own internal wiki pages with steps or various
      | cache tools to clear, often incorrect steps because they
      | got lucky one day by clearing cache X and not cache Y that
      | they did not know about.
      | 
      | Overall, in my experience, allowing software engineers to
      | do caching as they wish without any real guardrails has
      | caused massive problems most likely on the order of N!. I
      | cannot roll my eyes far enough into the back of my head
      | when I hear someone suggest caching as a solution any
      | longer, at least not on an application that has more than a
      | dozen business stakeholders managing data and expecting it
      | to be updateable in any reasonable way so that customers
      | see the most recent data when the business wants it.
 
      | PaulHoule wrote:
      | Caching can turn O(n^2) algorithms into much faster
      | algorithms. I would point to how transformative tabling is
      | in Prolog, not to mention memoization rescuing the
      | calculation of fibonacci numbers by recursion, which
      | otherwise would be gross malpractice.
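      | 
      | The canonical example (Python):
      | 
      |     from functools import lru_cache
      | 
      |     @lru_cache(maxsize=None)
      |     def fib(n):
      |         return n if n < 2 else fib(n - 1) + fib(n - 2)
      | 
      |     # Exponential time without the decorator; linear with it.
      |     print(fib(90))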
 
        | fwip wrote:
        | Caching is fine if there's no state involved, e.g. pure
        | functions like the fibonacci sequence.
        | 
        | As soon as your cache depends on a mutable state, things
        | get really hairy. Cache invalidation is one of the 2 hard
        | problems in computer science, after all.
 
        | samus wrote:
        | These are very well-behaved instances of caching. They
        | are so well-behaved that they can be fully described with
        | a little bit of analysis. Once I/O is involved or the
        | load isn't well-known beforehand, it becomes much more
        | difficult. Lazy I/O in Haskell is a very divisive topic
        | for a reason.
 
      | hinkley wrote:
      | That's not a very convincing argument. "I use it because it
      | works" without refuting any reasons I presented as to why
      | it "works" but breaks your other options.
      | 
      | How do you teach people to effectively use a profiler on
      | code with pernicious caching?
      | 
      | Part of why I'm the "Cleaner" is because I've worked with
      | any number of people who will present "evidence" that
      | everything that can be done has been done. Here are the
      | metrics from a year ago when we turned on the cache. Here
      | are the flame charts. See? Nothing to see here. They get
      | really uncomfortable six months later when I'm reporting
      | 2-3x in code that can't be optimized further. And in each
      | case everything was made more difficult by the people ahead
      | of me chumming the waters.
      | 
      | They were partially correct. There was in fact nothing left
      | for them to see, so nothing that they knew to do. But
      | "there's nothing to see here" and "there's nothing here"
      | are two very, very different statements.
 
        | trgn wrote:
        | There's a somewhat tangentially related heuristic to your
        | points imho. When writing code, when in doubt, task CPU
        | over memory, it will generally yield leaner, better
        | software. Easier to profile, debug, and optimize. Much
        | easier to scale in production environments too.
        | 
        | Code which optimizes for fast running time, tends to lean
        | heavily into using memory-bloating data structures and
        | caches. These are often poison for aggregate performance
        | in real world environments, and like you said among other
        | things, reduce visibility in the workings of system.
        | 
        | There's some implications of this stance; e.g.
        | optimizations to reduce memory footprint are generally
        | never wasted, while optimizations to squeeze out a few ns
        | of runtime may have many more unexpected consequences.
 
        | ilyt wrote:
        | > How do you teach people to effectively use a profiler
        | on code with pernicious caching?
        | 
        | Profile the function that is called if the object is not
        | found in the cache? Have a pluggable cache with a plugin that
        | just doesn't cache? That is not a hard problem to solve.
        | 
        | > They get really uncomfortable six months later when I'm
        | reporting 2-3x in code that can't be optimized further.
        | And in each case everything was made more difficult by
        | the people ahead of me chumming the waters
        | 
        | Well, they delivered (I'm guessing) 10x improvement in a
        | month by adding a cache. Caching is an entirely reasonable
        | first step and it can be good enough to be the last one.
        | 
        | And frankly the ability to use a profiler and knowledge of
        | some architectural optimizations seems to be a pretty rare
        | skill.
 
        | zasdffaa wrote:
        | Exactly! People here seem to be making a complex thing
        | out of a simple thing.
 
        | Aeolun wrote:
        | I kinda feel that your convincing counterargument is just
        | 'I have a different experience'.
        | 
        | I've never seen people reach for caches first, and then
        | discard all further optimizations.
        | 
        | Or rather, I've seen them do that, but not out of some
        | misplaced feeling that it was the only solution, just the
        | one that was easiest to accomplish in very little time.
 
        | cogman10 wrote:
        | The issue with caches is they are easy to apply. You can
        | easily drop a cache in any place you load a resource
        | without significant code changes.
        | 
        | I tend to think they are ok weapons of sometimes first
        | resort (rarely changing resource with heavy system
        | impact). But you'll definitely get better results simply
        | by rethinking and rearchitecting stuff. (Did you really
        | need to load the database just to look up a single value?)
 
        | hinkley wrote:
        | > You can easily drop a cache in any place you load a
        | resource without significant code changes.
        | 
        | Only when the cache is brand new. The problem with people
        | with cache addiction is that they think they're free. So
        | once they know a cache is there, they begin to lean on
        | it, and lean on it. I've seen it with many different
        | teams in a number of verticals. Sometimes while I'm
        | there, sometimes before I arrived. It's the people not
        | the company.
        | 
        | This illusion that you can always turn it off later is
        | dangerous. That's only true if you have an impeccable
        | information architecture when you start. And if you have
        | that, you've already done a lot of heavy lifting before
        | caching becomes your only option.
 
        | zasdffaa wrote:
        | It's difficult to refute your argument because I don't
        | understand your points:
        | 
        | > Because of the locality aspect, people will assume that
        | it's 'free' to look up a value multiple times instead of
        | passing these data dependencies between the relevant
        | methods
        | 
        | what does that even mean? What does 'data dependencies'
        | mean specifically?
        | 
        | > then you can end up evicting data in the middle of a
        | transaction
        | 
        | So in the middle of a transaction, instead of looking up
        | data from a key - very cheap - you should do ... what?
        | Recalculate it instead?
        | 
        | > The less obvious consequence is now you have
        | concurrency bugs, because the code assumes that decisions
        | made about the data will go the same way on each call,
        | but due to eviction, two halves of the transaction may
        | make conflicting decisions with undefined behavior
        | 
        | Assuming I even understand this, you are saying where
        | there's concurrency there's a risk of pulling stale data
        | out of a cache. Well, true. That's just something you
        | have to be careful of. Add a Tx reference to the cache
        | key.
        | 
        | > Once you introduce caching you poison the performance
        | well.
        | 
        | Not my experience at all. I can't understand why you're
        | saying this.
        | 
        | > Flame charts lie when cache is used
        | 
        | How so?
        | 
        | > and after people rely on the cache it is lying in the
        | other direction when the cache is off, because you're
        | calling that path 5 times now and it's swamping other
        | bottlenecks
        | 
        | I've no idea what this means
        | 
        | > This is what I really mean when I say it's a weapon of
        | last resort. Because once you use it, it attempts to take
        | any other options off the table
        | 
        | Heh! Absolutely not! Other optimisations are orthogonal
        | to caching IME:
        | 
        | Better algo - orthogonal
        | 
        | Go lower level - orthogonal
        | 
        | Go multicore - orthogonal
        | 
        | Noobish SQL query is begging for a rewrite - orthogonal
        | 
        | Use L1/2/3 chip cache better, or RAM, by careful data
        | layout - orthogonal
        | 
        | Use a better library - orthogonal
        | 
        | Use better data structure eg. than linked list -
        | orthogonal
 
        | svieira wrote:
        | >> Because of the locality aspect, people will assume
        | that it's 'free' to look up a value multiple times
        | instead of passing these data dependencies between the
        | relevant methods
        | 
        | > what does that even mean? What does 'data dependencies'
        | mean specifically?
        | 
        | It means that instead of having a method like
        | `entryPoint(int)` that looks up a `Foo` by id and then
        | passes the `Foo` around (`verify(Foo)`, `Foo
        | update(Foo)`) you instead wind up with a system that
        | passes around an `int` (`verify(int)`, `/* Unused */ Foo
        | update(int)`) and each layer then looks up `Foo` in the
        | `FooService` (because `FooService` has a cache and
        | subsequent lookups should be essentially-free).
        | 
        | Even worse when `FooService` is abstracted behind a
        | thread-local like `ThreadLocal currentFoo`, the
        | implementation of which is
        | `fooService.findById(currentFooId.get()) /* cheap,
        | because fooService has a cache */`.
        | 
        | My own take on this is that caches are global-by-default
        | when you really want local-by-default (e.g. dynamic
        | programming instead of a `static final Cache
        | theCacheForTheEntireProgram`). Local-by-default caches
        | are easy to make transactional (when the call stack
        | returns the cache is gone) and don't grow in an unbounded
        | manner in the general case.
 
        | hinkley wrote:
        | Yes!
        | 
        | Local by default also has the feature/problem of making
        | you look at the scale of the data involved in your query
        | up front, before it has 100-1000 req/s hitting it
        | in production. It's harder to pretend this large
        | dependency graph will just magically take care of itself
        | (or worse, be someone else's problem after you've
        | declared victory) when it's staring you in the face.
 
  | gav wrote:
  | > Instead, it's pointing out a risk that sometimes they are too
  | good, leading to a system that can't function when the normal
  | level of locality and memoization are not available.
  | 
  | And at that point cache management starts to be a huge problem.
  | Clearing caches becomes a death spiral because your system
  | can't actually support your normal load.
  | 
  | For example I worked with a large retailer, who had a giant
  | monolithic enterprise Java ecommerce system. It was so slow
  | that product detail pages took upwards of 30 seconds to render.
  | There was a cache in front of that system (Ehcache), then a
  | page-level cache (Varnish) in front of that, then finally
  | Akamai took 98% of the load.
  | 
  | Rollouts were terrifying, you cleared the cache and the site
  | might come back up. Changing Akamai configuration was also
  | similarly risky.
  | 
  | It's critical that systems don't become cache-dependent for
  | load; they should be a performance boost, not a required
  | component.
 
| nly wrote:
| Caching is easy, cache invalidation is hard.
 
  | latchkey wrote:
  | "There are two hard things in computer science: cache
  | invalidation, naming things, and off-by-one errors."
  | 
  | https://twitter.com/codinghorror/status/506010907021828096
 
    | withinboredom wrote:
    | I have no idea why you're quoting that tweet. This saying is
    | much older than 2014.
 
| PaulKeeble wrote:
| Multiple times I have found the impact of cache performance to be
| dramatic. I find that optimisations in three areas are usually
| straightforward to achieve and can bring surprisingly big
| improvements.
| 
| The first is that RAM accessed randomly or in the wrong direction
| performs very poorly compared to sequential access in order. The
| same is true of cache, because it's much easier to predict and
| prefetch data when it's accessed serially. The effect is so
| dramatic that it's sometimes worth "accessing" data you don't need
| just to keep access serial and benefit from this hardware
| optimisation. At times, for example, I have found treemaps and
| listmaps perform better than hashmaps, as their cheap serial, in-
| memory-order search, despite being O(log(n)) or O(n), surprisingly
| beats the hash lookup plus the cost of computing the hash.
| 
| If you have a data structure that is larger than optimal and
| overflows cache lines it can have big impacts on cache
| utilisation and performance if the algorithm doesn't need all the
| fields. Object orientation impacts quite heavily on this and
| certain designs can make very large objects and these often get
| poor cache usage.
| 
| Unnecessary allocation and deallocation of memory in algorithms
| often results in the cache getting thrashed and avoiding this and
| using the same memory can result in better hot cache usage and
| reducing garbage collection usage.
| 
| Most profilers can give you a good idea of the allocations but
| the other two are often harder to find and mostly just show up in
| cache misses.
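| 
| The first effect is easy to glimpse even from Python, though the
| gap is far larger in a compiled language where interpreter
| overhead doesn't dominate:
| 
|     import random, time
|     from array import array
| 
|     N = 5_000_000
|     data = array("d", range(N))
|     seq = list(range(N))
|     rnd = seq[:]
|     random.shuffle(rnd)
| 
|     def total_time(indices):
|         t = time.perf_counter()
|         s = 0.0
|         for i in indices:
|             s += data[i]
|         return time.perf_counter() - t
| 
|     print("sequential:", round(total_time(seq), 2), "s")
|     print("shuffled:  ", round(total_time(rnd), 2), "s")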
 
___________________________________________________________________
(page generated 2022-11-14 23:00 UTC)