|
| nerpderp82 wrote:
| We really don't know what causes perf differences; layout has
| more of an effect than going from -O1 to -O3.
|
| "Performance Matters" by Emery Berger
| https://www.youtube.com/watch?v=r-TLSBdHe1A
| tmtvl wrote:
| I wonder if he mentions running from a different directory and
| running as a different user to highlight how annoying the
| effect from the environment is, as USER and PWD are part of the
| environment.
| lob_it wrote:
| Caching DB queries and media assets already got a big boost a
| decade ago going from spinning rust to SSD to even faster NVMe.
|
| Optimizing DB queries on spinning rust was an exciting experience
| though. Chasing milliseconds with caching and warming caches up
| still brings back good memories :)
|
| Scale and architecture make new designs much easier. 1 MB of
| code... A 100 MB database and 1 GB of media assets are becoming
| relics like "big iron". Blank-slate canvases should skip a lot of
| it.
|
| Technology debt only gets harder to manage over time.
| ISL wrote:
| This happens in life, too. When your counterparts lose continuity
| with your work, they turn to other work and the context-switch to
| return to working with you becomes increasingly hard.
|
| Helps to be quick.
| joosters wrote:
| "Almost all programming can be viewed as an exercise in caching."
| hinkley wrote:
| About 15 years ago I was on a team that was having so many
| problems with caching that I began to see it for the liability
| that it is. It started with a 2-4 week delay each release cycle
| while they fixed caching bugs the testers had found, but once I
| joined the project I saw how deep the problem went.
|
| A number of their features invited a full table scan, which is
| fundamentally incompatible with an LRU cache and in particular
| produces disastrous stair-step behavior as the working set grows.
| Note that this was also a couple of years before anyone had heard
| of memcached, so these were DIY implementations with multiple
| potential sources of bugs.
|
| About the time I found my feet on the project someone wanted to
| add caching to yet another workflow that was taking 30 seconds to
| run. It was an admin workflow so there was a little tolerance of
| slow response times, but not 30 seconds. I volunteered for the
| story because I thought we could do better than more caches.
|
| Through conversation, step debugging, and profiling I found the
| spot where they had wanted the caching. What the feature was
| doing was two relatively expensive lookups of the same data,
| filtering it on two very different dimensions, and returning the
| intersection of the two. In typical ORM fashion, both of the
| concerns were taking it upon themselves to fetch their own data
| instead of having it handed to them. Since they were both called
| from the same function, I hoisted the lookup out of the callee
| into the caller, sending the same list to both. Then I changed
| the code again to feed the result of call 1 to call 2, to make it
| work more like consecutive filters.
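|
| (For the code-minded, the shape of the change was roughly the
| sketch below. The names - Foo, loadFoos, filterByA, filterByB -
| are invented for illustration; the real thing was ORM calls
| buried a few layers down.)
|
|   import java.util.ArrayList;
|   import java.util.List;
|
|   class ReportExample {
|       record Foo(int id, boolean a, boolean b) {}
|
|       // Stand-in for the expensive lookup both concerns were doing.
|       static List<Foo> loadFoos() {
|           List<Foo> out = new ArrayList<>();
|           for (int i = 0; i < 1_000; i++) out.add(new Foo(i, i % 2 == 0, i % 3 == 0));
|           return out;
|       }
|
|       static List<Foo> filterByA(List<Foo> in) {
|           List<Foo> out = new ArrayList<>();
|           for (Foo f : in) if (f.a()) out.add(f);
|           return out;
|       }
|
|       static List<Foo> filterByB(List<Foo> in) {
|           List<Foo> out = new ArrayList<>();
|           for (Foo f : in) if (f.b()) out.add(f);
|           return out;
|       }
|
|       // Before: each concern fetched its own copy and the caller
|       // intersected the results.
|       static List<Foo> before() {
|           List<Foo> byA = filterByA(loadFoos());   // expensive lookup #1
|           List<Foo> byB = filterByB(loadFoos());   // expensive lookup #2
|           byA.retainAll(byB);                      // intersection
|           return byA;
|       }
|
|       // After: hoist the lookup into the caller (one fetch, shared),
|       // then chain filter A into filter B instead of intersecting.
|       static List<Foo> after() {
|           List<Foo> foos = loadFoos();             // single shared lookup
|           return filterByB(filterByA(foos));
|       }
|
|       public static void main(String[] args) {
|           System.out.println(before().size() + " == " + after().size());
|       }
|   }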
|
| The end result was that the page now loaded in 3 seconds, and I
| was able to write better unit tests for this code. One might
| expect that most of that improvement came from running the second
| filter on a much smaller list, but that's where you'd be wrong.
| The majority of that improvement came from the hoisting, not the
| chaining. Due to the amount of work in the filters, the initial
| lookup shouldn't have accounted for more than a third of the page
| load time, but just sharing the same data dropped the response
| time to around 6 seconds. Chaining the filters ended up only
| saving 3 seconds.
|
| The problem with large data sets is that CPUs have caches too.
| And in a garbage collected programming language, there are
| additional costs to carrying around duplicated data, particularly
| if it is in tree form like this was. Most likely we were smashing
| the hell out of CPU caches and introducing extra GC pressure to
| boot. By passing in the same object tree the second call followed
| warm and/or hot paths through heap memory.
|
| I'd had a similar but less dramatic experience a few years
| before that, where I identified that a function, which accounted
| for 10% of the runtime, was being invoked roughly twice as often
| as the problem required. By that point 5% looked like low hanging
| fruit, so I eliminated the duplicate calls and reran the
| benchmarks. But the results didn't make sense so I tweaked and
| prodded and fiddled. The new time was 20% lower than the old
| time, for what should have been 5%. This is already way too long
| so I won't go into the reasons here, but in addition to the
| scenarios I mentioned for the previous issue, there are also
| problems of timing accuracy in profilers, and cyclic use patterns
| that can misattribute the expense of a call to siblings or the
| parent. A function that creates many objects may never reach the
| GC threshold, but a large allocation later in the sequence
| diagram may. Or collide with cache lines that are needed again
| later in the call graph.
| bawolff wrote:
| I feel like the moral of this story is less "caching is a
| liability" and more "garbage in, garbage out". Caching can help
| things when used appropriately, but it's not magic.
| nerpderp82 wrote:
| Your performance topology is dictated by multiple hierarchies
| of caches, many of which you might not be aware of. If they
| disappear your application may no longer function. The cache
| is no longer an optimization but a critical component.
| Figuring out how to _fill_ it might be harder than how to
| invalidate it.
| hinkley wrote:
| > Figuring out how to fill it might be harder than how to
| invalidate it.
|
| This always seems to be a problem for someone else or for
| another day, which is part of the frustration.
|
| Cold caches killing your app are a considerable problem,
| doubly so if you deploy in the Cloud because now you aren't
| necessarily guaranteed to be deploying into a hot data
| center. I think there's an unspoken assumption we haven't
| really shaken: if you came to work and the power was out,
| you not only expected none of the software to work, but you
| peer-pressured other people into not complaining too loudly
| about it when the power came back on. Today is a loss.
| Everyone can see it.
|
| But for instance if I work on an SaaS project in the Cloud,
| not only can I just lose a region, but for many problem
| domains my customers have mostly local customers, then I
| have diurnal cycles of traffic that are customer specific.
| All the non-bot traffic is in New York until 6am GMT, say. I
| might want to spin regions down pretty aggressively during
| parts of the day, but if spinning back up results in a
| thundering herd problem because 'the cache makes us fast'?
| Well then the value of my circuit breaker alerts drops to
| less than zero.
|
| That's just the first time things break. Caches and
| capacity planning are also at odds, because capacity
| planning is about planning your worst case and caches are
| almost always about the happy path.
| hinkley wrote:
| Silver bullets kill everyone, not just werewolves.
|
| Garbage In, Garbage Out is not wrong per se, but we have in
| the real world the concept of an attractive nuisance, where
| the victim is not entirely at fault for their injuries. The
| thing, being prone to invite misuse and misadventure, bears
| some responsibility for protecting people from themselves.
|
| Caching is an attractive nuisance. These days it's simply
| replaced Global Shared State which used to be the boogeyman,
| but sadly we haven't collectively clued in on them being
| equivalent.
| bawolff wrote:
| In this situation it was stuff that would be terrible both
| with and without caching; it didn't have much to do with
| caching itself, the caching just provided a bit of a
| band-aid. I think the more apt metaphor might be "seatbelts
| make people drive recklessly".
|
| Would fast computers also be an attractive nuisance
| because you don't have to think about performance as much?
| hinkley wrote:
| > Would fast computers also be an attractive nuisance
| because you don't have to think about performance as
| much?
|
| Sometime in the last week or two someone asserted just
| that. I didn't dig into that discussion very far but on
| the face of it? Sure. Certainly grabbing the fastest
| hardware available to Man is an expensive proposition and
| you should hold that in reserve. You don't get to play
| that card very often, and much less so today than during
| the Golden Era. Post Moore's Law for sure, but to an
| extent post-Dennard as well, since power draw of a
| facility is a significant cost center. And without
| Dennard you can make them more powerful but keeping them
| efficient gets expensive.
| dathinab wrote:
| Article: a speed-up led to fewer cache misses, which led to a
| further speed-up, which led to even fewer cache misses.
|
| This kind of interaction is important to keep in mind.
|
| It applies to all kinds of caches in all kinds of situations. To
| some degree even the instruction and data caches of modern cpus.
|
| But most interesting is that in complex systems, in (very very)
| rare cases, the "whole system effect" can go the other way around
| too (due to a now-faster component thrashing the cache for other
| components in a way that makes the system as a whole slower).
| RicoElectrico wrote:
| What kind of batch process could that be?
| mananaysiempre wrote:
| Judging by the off-hand mentions in the article, something
| along the lines of ingesting the cargo manifest of a large
| ship?
| mjb wrote:
| I really love this point.
|
| For most applications the optimal size of a (time-based) cache is
| around the point where the cost of keeping an item in the cache
| equals the cost of fetching it again. This is a point that Jim
| Gray made in the classic "The 5 Minute Rule for Trading Memory
| for Disc Accesses"
| (http://www.hpl.hp.com/techreports/tandem/TR-86.1.pdf) back in
| 1986.
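|
| (If I'm remembering the Gray/Putzolu arithmetic right, the
| break-even interval is roughly
|
|     BreakEvenSeconds ≈ (PagesPerMBofRAM / AccessesPerSecondPerDisk)
|                        x (PricePerDiskDrive / PricePerMBofRAM)
|
| and with mid-80s disk and DRAM prices that came out around five
| minutes for 1KB records, hence the name. The constants have moved
| a lot since, so the "five minutes" is really a method rather than
| a number.)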
|
| A side effect of that is that systems that run faster, and
| therefore access each cached item more often, get more benefit out
| of a cache by pushing that set point out further. Whether that
| happens in any given system depends a lot on the access patterns,
| and especially temporal locality within those access patterns,
| but the effect is real in a lot of different kinds of systems.
|
| > It's common wisdom that systems with caches can fail badly when
| the caches are cold. When every request misses the cache, the
| system is overloaded, and never recovers. What I observed was the
| mirror image.
|
| The bottom line is that both locality and memoization are huge
| contributors to the performance of systems, both positive and
| negative. When folks (including me) point out the risks of
| caches, we're not implying that caches are always bad. Instead,
| it's pointing out a risk that sometimes they are too _good_,
| leading to a system that can't function when the normal levels of
| locality and memoization are not available.
| bombcar wrote:
| Reminds me of: There are only two hard things
| in Computer Science: cache invalidation, and naming things, and
| off-by-one errors.
|
| What people don't realize is caches are much harder than just
| invalidation.
|
| I'm a fan of continually testing (and running) things without
| cache at all just to verify what they do; caching can often
| cover a multitude of sins that you really don't need to be
| doing.
| hinkley wrote:
| Caches are global shared state. Correction: _lossy_ global
| shared state.
|
| 'Naming things' when they are in a global namespace is really
| hard, to the point where I'm starting to think that cache
| invalidation and naming things are not two problems but the
| same problem in two manifestations.
| kwhitefoot wrote:
| > Caches are global shared state
|
| Not necessarily. I designed an object oriented simulation
| system which had a lot of calculated properties where the
| calculation was a substantial amount of work but the result
| would be queried many times before the inputs to the function
| changed. Caching the results inside the object provided a
| substantial speed up (orders of magnitude) while keeping
| the cache completely local and hidden from the client code
| and keeping the program very modular.
|
| Keeping the cache inside the objects allowed us to keep the
| parts of the program neatly separated and minimized the
| amount of knowledge that client code, and more importantly
| client coders, needed to have about the objects they were
| using.
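|
| A stripped-down sketch of the pattern (invented names, not the
| original simulation code): the cached value lives inside the
| object and is cleared whenever one of its inputs changes, so
| clients never know it exists.
|
|   class SimulatedPipe {
|       private double diameter;
|       private double flowRate;
|       private Double cachedPressureDrop;   // null = not yet computed
|
|       void setDiameter(double d) { diameter = d; cachedPressureDrop = null; }
|       void setFlowRate(double f) { flowRate = f; cachedPressureDrop = null; }
|
|       double pressureDrop() {
|           if (cachedPressureDrop == null) {
|               cachedPressureDrop = expensiveCalculation();
|           }
|           return cachedPressureDrop;
|       }
|
|       // Stand-in for the substantial amount of work being cached.
|       private double expensiveCalculation() {
|           return flowRate * flowRate / Math.pow(diameter, 5);
|       }
|   }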
| bmm6o wrote:
| If we're splitting hairs, your situation sounds more like
| memoization, where you have saved the output of an
| expensive but deterministic calculation. With the correct
| assumptions, this can never result in incorrect behavior.
| Worst case is excessive memory consumption. Whereas a
| stale cache is something that needs to be taken into
| account.
| hinkley wrote:
| What happens when the system of record changes the answer
| halfway through the transaction?
|
| By caching answers to calculations you've moved the
| source of truth from the function to the cache. But the
| cache is also acting as a proxy for the system of record,
| which is a conflict of interests. That's substantially
| part of the problem with cache invalidation.
|
| ETA: There's also a common category error here that I see
| with people who grossly misunderstand DP (dynamic
| programming) as 'caching'. First, they think that caching
| and memoization are the same thing, which they emphatically
| are not; and second, that DP is about memoization, which
| means they listened for the first ten minutes and then fell
| asleep.
|
| Tabulation is much more powerful than memoization, so
| equating DP with memoization is impoverishing it and
| yourself.
|
| But importantly here, the distinction between memoization
| and caching is that memoization is scoped while caching
| is global. With memoization a single call tree gets the
| same answers _every time_ , which means the mental
| process of Local Reasoning is maintained. If you know the
| call graph for this operation you know everything that
| interacts with it. You know if 'eviction' is even a
| thing. You also know that your unit tests are testing the
| code as it will behave under load, not some fictional
| happy path.
|
| tl;dr: If your calculation is important it should show up
| in your data flow, not arrive out of band from a magic
| oracle.
| hinkley wrote:
| Caching is a weapon of last resort. In most cases it is the
| last major performance improvement you will see from your code,
| even if other options were available, and any information
| architecture problems you have are set in stone or will get
| worse.
|
| Because of the locality aspect, people will assume that it's
| 'free' to look up a value multiple times instead of passing
| these data dependencies between the relevant methods. This
| turns your cache into global shared state, and once that
| happens you can't turn the cache off again. There are other
| problems with this arrangement as well.
|
| The easiest to understand (and yet I've failed on at least one
| project) is that if the working set for a problem ever grows
| comparable to the size of the cache (a ratio > 1 for single-task
| systems, > 1/n for systems with n concurrent tasks), then you
| can end up evicting data in the middle of a transaction. The
| obvious consequence is that performance jumps off a cliff. The
| behavior is particularly bad with LRUs, but switching to a
| different eviction policy simply reduces the height of the
| cliff. There is still a cliff. The less obvious consequence is
| now you have concurrency bugs, because the code assumes that
| decisions made about the data will go the same way on each
| call, but due to eviction, two halves of the transaction may
| make conflicting decisions with undefined behavior.
|
| But speaking as the guy who fixes the perf issues other people
| gave up on: Once you introduce caching you poison the
| performance well. Flame charts lie when the cache is on, and
| after people come to rely on the cache they lie in the other
| direction when the cache is off, because you're calling that
| path 5 times now and it's swamping other bottlenecks. This is
| what I really mean when I say it's a weapon of last resort.
| Because once you use it, it attempts to take any other options
| off the table. Caching will be most of the last order of
| magnitude in scalability that your system ever sees. Every
| 'win' from there on will take more time and energy than most
| people are willing to invest.
| brightball wrote:
| I like applying the Circuit Breaker approach to caching.
| Rather than applying a cache everywhere, set up a cache
| wrapper that only triggers if the accessed resource is slower
| than a certain threshold.
|
| The goal is for the cache to go unused most of the time, but
| during times of unexpected system stress the caches will kick
| in, provide relief to the system and reduce impact to users.
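|
| A rough sketch of what I mean (hypothetical names; a real circuit
| breaker would track a window of samples and a half-open state
| rather than just the last observation):
|
|   import java.util.Map;
|   import java.util.concurrent.ConcurrentHashMap;
|   import java.util.function.Supplier;
|
|   class ThresholdGatedCache<K, V> {
|       private final Map<K, V> copies = new ConcurrentHashMap<>();
|       private final long slowThresholdMillis;
|       private volatile long lastLoadMillis = 0;
|
|       ThresholdGatedCache(long slowThresholdMillis) {
|           this.slowThresholdMillis = slowThresholdMillis;
|       }
|
|       V get(K key, Supplier<V> loader) {
|           // Under stress (last load was slow), serve the stored copy.
|           if (lastLoadMillis > slowThresholdMillis && copies.containsKey(key)) {
|               return copies.get(key);
|           }
|           // Otherwise always hit the source, so data stays fresh.
|           long start = System.nanoTime();
|           V value = loader.get();
|           lastLoadMillis = (System.nanoTime() - start) / 1_000_000;
|           copies.put(key, value);   // keep a copy for the next slow spell
|           return value;
|       }
|   }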
| zasdffaa wrote:
| > The goal is for the cache to go unused most of the time
|
| Then you have a useful cache sitting there unused except at
| peak time - why? Just enable it always. This makes no
| sense.
| brightball wrote:
| Depends on how large the cache is and how much users
| expect instant updates to their data. If real-time or
| near real-time is the expectation, then avoiding the
| cache usage unless necessary preserves their experience
| without incurring the hurdles of cache invalidation that
| grow more complex the more you cache.
|
| EDIT:
|
| Another area where this approach is beneficial is with a
| multi-tenant system where larger tenants querying more
| data could have a much more significant performance
| impact than other tenants on the system.
|
| Time of day/week/month usage spikes can also cause a
| performance degradation for everyone even if it's not
| typically stressing the system.
|
| This approach is just a mitigation to prevent the system
| from getting overloaded during those times. It's not the
| right approach for every situation. Just a spin on the
| Circuit Breaker Pattern.
|
| https://en.wikipedia.org/wiki/Circuit_breaker_design_pattern
| zasdffaa wrote:
| WTF?
|
| > If real-time or near real-time is the expectation, then
| avoiding the cache usage
|
| Because looking up a precalculated result in a hash table
| is so much slower than not looking it up? And what,
| recalculating it instead is faster than an O(1) hash
| table lookup? If so, why are you caching?
|
| I am just getting more confused by what people are saying
| here.
| hinkley wrote:
| > I am just getting more confused by what people are
| saying here.
|
| Maybe the takeaway is that this shit is hard and everyone
| is a little bit confused in their own way.
|
| I'm not saying the only way to win is not to play, but I
| can neither confirm nor deny rumors that I may have at
| some point muttered this under my breath.
| addaon wrote:
| I do think you've hit on a point of confusion in the
| parent, but...
|
| In a (hard) real-time system the only thing that matters
| is the worst-case execution time -- if your worst-case
| time fits in the allowance you win, if it doesn't you
| lose.
|
| Adding a cache improves the average case (typically), but
| hurts the worst case (always) -- your worst case goes
| from "do expensive operation" to "do lookup in cache,
| fail, do expensive operation." Even if you parallelize
| misses ("do lookup in cache | do expensive operation, on
| cache success cancel expensive operation, on cache miss
| wait for expensive operation" you're generally adding at
| least a few operations in the critical path.
| salawat wrote:
| I'm beginning to wonder whether my personal moniker of
| "signal propagation" is nominatively equivalent to what
| others call "cache invalidation".
|
| I.e. you cannot by definition have a state change until
| the signal thereof has traversed a link from A to B. No
| signal, no change. If you have a cache for one system
| that delays writes to a backing data store referenced by
| other systems that are unaware of that cache, then you
| have necessarily a partition problem.
|
| Every problem in a distributed computing context can
| generally be reduced to a signal propagation problem.
| ilyt wrote:
| It's only a problem if you're breaking deadlines. Like
| having someone's status in communication app update with
| 10s delay is probably not a a problem, so as long as you
| don't ever cache it for longer than that, it is entirely
| fine.
|
| > If you have a cache for one system that delays writes
| to a backing data store referenced by other systems that
| are unaware of that cache, then you have necessarily a
| partition problem.
|
| In most cases people mean caching reads, so the problem
| here is stale reads, not delayed writes. It's not
| necessarily the same: some component caching (or
| essentially, queuing) writes means that once the write is
| done, all readers (assuming they don't have any extra
| cache) see it at once, while with caching reads (and no
| invalidation), multiple readers can have different views.
| hinkley wrote:
| The one that really jams me (and others) up is that
| signal propagation and transaction lifetimes line up
| badly. Caches, being cheap imitations of ACID 'compliant'
| databases, which themselves cheat all day every day to
| make pretty numbers, almost never have repeatable-read
| semantics, so if your code splits responsibility for
| fetching a piece of data you have one of the problems I
| complained about at/near the top level: Half of a
| decision can be made with one version of a piece of data
| and the other half with another, resulting in undefined
| behavior. In a system that asserts that A = !B you can
| have an interaction that did some of the activities for A
| and some of the ones for B (because we read A and A' and
| assumed they were identical).
| zasdffaa wrote:
| You're making a lot of assertions without any backup, and
| they don't make much sense.
|
| > Caches, being cheap imitations of ACID 'compliant'
| databases
|
| No they're not, not remotely.
|
| > which themselves cheat all day every day
|
| I don't believe a word of this. Have you any actual
| evidence (other than I believe Oracle's serialisable is
| actually snapshot isolation, IIRC)?
|
| > Half of a decision can be made with one version of a
| piece of data and the other half with another, resulting
| in undefined behavior. In a system that asserts that A =
| !B you can have an interaction that did some of the
| activities for A and some of the ones for B
|
| And this just makes no sense. You're claiming that
| databases are not producing the results they should; if
| that's the case then I don't believe you understand
| transaction isolation levels or indeed how database
| queries work, in which case you have problems other than
| and larger than caching.
|
| This whole thread is just getting weirder.
| hinkley wrote:
| I've always been a bit of a fan of debounce for this but
| many of the obvious implementations are cache based which
| is essentially inviting people to do the thing I just
| said I hate.
|
| The cost of counting upvotes or logged in users is not
| worth the incremental value it affords the user. These
| are examples of situations where 'eventual consistency'
| can be embarrassingly easy. If you split the source of
| truth (SoT) from the system of record (SoR), then the
| cost of updating the 'truth' can be amortized across the
| writes to the SoR. The reads do a straight lookup from
| the SoT, which was the truth at some point but is now the
| fiction everyone is agreeing to. The distinction is that
| everyone agrees to the same source of truth. Nobody ever
| goes asking the SoR for a second opinion, except the SoR,
| which updates the SoT when appropriate. The SoT is dumb,
| the clients are dumber, only the code that manages the
| SoR has any smarts to it.
|
| Read traffic can escalate pretty high in such a system
| without much stress.
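|
| In sketch form (invented names), for something like an upvote
| count it can be as dumb as this:
|
|   import java.util.concurrent.atomic.AtomicLong;
|
|   class UpvoteCount {
|       private final AtomicLong systemOfRecord = new AtomicLong(); // the real tally
|       private volatile long sourceOfTruth = 0;                    // what readers see
|       private final int publishEvery;
|
|       UpvoteCount(int publishEvery) { this.publishEvery = publishEvery; }
|
|       void upvote() {
|           long real = systemOfRecord.incrementAndGet();
|           // Amortize the cost of updating the 'truth' across writes.
|           if (real % publishEvery == 0) sourceOfTruth = real;
|       }
|
|       long displayed() {
|           return sourceOfTruth;  // nobody asks the SoR for a second opinion
|       }
|   }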
| brightball wrote:
| Exactly. Another situation is a multitenant system where
| some tenants are much larger than others, triggering
| larger queries which potentially have a larger
| performance impact.
| ilyt wrote:
| You can queue cache refresh in the background if the
| object in cache is close to the "limit" of how fresh you
| want it to be. Need a bit of cleverness to avoid
| thundering herd problem but otherwise works pretty well.
|
| Also if you really want near-real-time you might want to
| actually push the updates directly to the cache instead of
| relying on the client to trigger cache filling. Cases where
| you need a ton of data _and_ it served in real time _and_
| it's the same object that constantly changes (and not, say,
| a pointer to a position in a video stream) are pretty rare.
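|
| Rough sketch of the refresh-ahead idea (made-up names; the
| dedup/thundering-herd handling is left out):
|
|   import java.util.Map;
|   import java.util.concurrent.ConcurrentHashMap;
|   import java.util.concurrent.ExecutorService;
|   import java.util.concurrent.Executors;
|   import java.util.function.Function;
|
|   class RefreshAheadCache<K, V> {
|       private record Entry<T>(T value, long loadedAtMillis) {}
|
|       private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
|       private final ExecutorService refresher = Executors.newSingleThreadExecutor();
|       private final Function<K, V> loader;
|       private final long maxAgeMillis;
|
|       RefreshAheadCache(Function<K, V> loader, long maxAgeMillis) {
|           this.loader = loader;
|           this.maxAgeMillis = maxAgeMillis;
|       }
|
|       V get(K key) {
|           Entry<V> e = entries.get(key);
|           if (e == null) return load(key);               // cold: load inline
|           long age = System.currentTimeMillis() - e.loadedAtMillis();
|           if (age > maxAgeMillis) return load(key);      // too stale: reload inline
|           if (age > maxAgeMillis * 8 / 10) {
|               refresher.submit(() -> load(key));         // near the limit: refresh in background
|           }
|           return e.value();
|       }
|
|       private V load(K key) {
|           V v = loader.apply(key);
|           entries.put(key, new Entry<>(v, System.currentTimeMillis()));
|           return v;
|       }
|   }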
| hinkley wrote:
| My problem with the term 'cache' here is that people have
| expectations about what cache is and when you violate
| them you tend to discover that they've returned the
| favor.
|
| > push the updates directly to cache instead of relying
| client to trigger cache filling
|
| You can use cache software to fulfill such a role but
| it's not exactly a cache anymore, and keeping people from
| treating it like one is some kind of cat herding.
|
| You're using it as a data store at this point, which for
| one has very different eviction semantics - exactly 1
| copy of each concept, no more, no less. You can store
| such things in Consul, for instance, and get very low
| stale read latency. The cache implementation scales
| higher but breaks sooner. Pick your poison.
| ilyt wrote:
| Well, it depends on your data model. Like, if you write
| only a little (compared to reads), a write-through cache
| (write to the cache as well as to the backing store) will
| lose you a tiny % of hit ratio in exchange for never having
| to worry about stale data.
|
| > My problem with the term 'cache' here is that people
| have expectations about what cache is and when you
| violate them you tend to discover that they've returned
| the favor.
|
| That's true of anything with any level of complexity;
| just see how many people get transaction isolation wrong
| because they have a simplistic view of how it works in
| the database they use.
| pyrolistical wrote:
| Sounds more like a dark buffer anti-pattern.
| ilyt wrote:
| ...instead of having a better user experience all the time?
| And having more code to manage on top of that?
|
| Sorry, but that sounds like a terrible idea.
|
| And when the system starts being slow _the cache is empty_,
| because you waited for the system to slow down to start
| filling it.
| bawolff wrote:
| Having a different code path that only triggers during high
| load feels pretty risky to me.
| bell-cot wrote:
| _In theory_, the best way to manage this situation may
| be with a gray zone - if all is well, then there's only a
| (say) 5% chance of going to cache. (To at least catch
| "the cache is broken" errors sooner.) As things get
| worse, that chance goes up as you cross-fade to mostly
| using the cache.
|
| Important Warning: If you do not have a very good
| understanding of your overall system, or neglect to
| carefully dampen the cross-fade functions - then the
| whole thing can demonstrate fast and unpredictable swings
| in behavior. Note that being subpoenaed to testify in a
| "Why did the Crater City chemical plant blow up?"
| investigation is usually bad for one's career.
| hinkley wrote:
| That's better but composition is always the devil in the
| details. Which might be what you mean when you say:
|
| > or neglect to carefully dampen the cross-fade functions
|
| People with a 'caching problem' don't have one cache,
| they have a bunch of them. So that 1 in 20 chance of
| getting grey-zoned on one cache becomes a 1 in 400 chance
| of getting grey-zoned on both of two caches, and a 1 in
| 8000 chance on 3. That is a very small fraction but
| when you're dealing with the Law of Large Numbers that's
| something that's happening at least once a second for
| many services.
|
| But if you're using this you probably have a lot more
| than 3 caches, and then we have to talk about the odds
| that any request gets grey-zoned at least once, which is
| going to be 1 - (19/20)^n. At six caches you're already
| over 1 in 4, which is going to potentially change your
| mean response time quite a bit.
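|
| (Concretely: (19/20)^6 ≈ 0.735, so 1 - 0.735 ≈ 0.27 - a bit more
| than 1 in 4 requests touching a grey zone somewhere.)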
| bell-cot wrote:
| > "...which is going to potentially change your mean
| response time quite a bit."
|
| My experience is that the changing mean response time
| isn't nearly as bad as the changing standard deviation.
|
| The really big problems tend to arise when you have
| unsuspected or under-appreciated positive feedback loops
| - whether in the software, or in associated physical
| systems and human behaviors. Or resonance effects. Those
| situations are where good dampening gets really
| important.
| dist1ll wrote:
| If I had to pick the two worst sources of bugs I can imagine,
| it would be (1) concurrency and (2) caches. Both
| often require extensive performance and load testing and have
| tons of non-deterministic failure modes.
| zasdffaa wrote:
| In my _extensive_ experience of optimising stuff...
|
| > Caching is a weapon of last resort. In most cases it is the
| last major performance improvement
|
| ...is wrong. It is one of the very first things I reach for
| because it is the most effective and simplest, once you've
| eliminated O(n^2) behaviour etc. It's not a magic wand but
| nothing is.
| notinfuriated wrote:
| In my _extensive_ experience of optimizing stuff it is
| right.
|
| I cannot even count how many issues were created downstream
| of the decision to start caching X, Y, or Z. At a point,
| every other software engineer gets the bright idea to cache
| their special little function or request or database call.
| Soon enough, there are N layers of cache responsible for
| delivering some special piece of data. The engineers build
| tooling to clear those caches on updates, either manually
| or automatically. Both suffer from issues, the latter
| occasionally failing to perform and just causing general
| confusion about why stale data is returning, and the former
| being far worse, recruiting dozens of business stakeholders
| into the problem of knowing-how-to-expire-cache. They start
| making their own internal wiki pages with steps or various
| cache tools to clear, often incorrect steps because they
| got lucky one day by clearing cache X and not cache Y that
| they did not know about.
|
| Overall, in my experience, allowing software engineers to
| do caching as they wish without any real guardrails has
| caused massive problems most likely on the order of N!. I
| cannot roll my eyes far enough into the back of my head
| when I hear someone suggest caching as a solution any
| longer, at least not on an application that has more than a
| dozen business stakeholders managing data and expecting it
| to be updateable in any reasonable way so that customers
| see the most recent data when the business wants it.
| PaulHoule wrote:
| Caching can turn O(n^2) algorithms into much faster
| algorithms. I would point to how transformative tabling is
| in Prolog, not to mention memoization rescuing the
| calculation of Fibonacci numbers by recursion, which
| otherwise would be gross malpractice.
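|
| The Fibonacci case in sketch form - the inputs fully determine
| the outputs, so memoizing is safe and turns an exponential
| recursion into a linear one:
|
|   import java.util.HashMap;
|   import java.util.Map;
|
|   class Fib {
|       private static final Map<Integer, Long> memo = new HashMap<>();
|
|       static long fib(int n) {
|           if (n < 2) return n;
|           Long cached = memo.get(n);
|           if (cached != null) return cached;
|           long result = fib(n - 1) + fib(n - 2);  // computed at most once per n
|           memo.put(n, result);
|           return result;
|       }
|
|       public static void main(String[] args) {
|           System.out.println(fib(90));  // instant; the naive version would take ages
|       }
|   }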
| fwip wrote:
| Caching is fine if there's no state involved, e.g. pure
| functions like the Fibonacci sequence.
|
| As soon as your cache depends on a mutable state, things
| get really hairy. Cache invalidation is one of the 2 hard
| problems in computer science, after all.
| samus wrote:
| These are very well-behaved instances of caching. They
| are so well-behaved that they can be fully described with
| a little bit of analysis. Once I/O is involved or the
| load isn't well-known beforehand, it becomes much more
| difficult. Lazy I/O in Haskell is a very divisive topic
| for a reason.
| hinkley wrote:
| That's not a very convincing argument. "I use it because it
| works" without refuting any reasons I presented as to why
| it "works" but breaks your other options.
|
| How do you teach people to effectively use a profiler on
| code with pernicious caching?
|
| Part of why I'm the "Cleaner" is because I've worked with
| any number of people who will present "evidence" that
| everything that can be done has been done. Here are the
| metrics from a year ago when we turned on the cache. Here
| are the flame charts. See? Nothing to see here. They get
| really uncomfortable six months later when I'm reporting
| 2-3x improvements in code that can't be optimized further.
| And in each
| case everything was made more difficult by the people ahead
| of me chumming the waters.
|
| They were partially correct. There was in fact nothing left
| for them to see, so nothing that they knew to do. But
| "there's nothing to see here" and "there's nothing here"
| are two very, very different statements.
| trgn wrote:
| There's a somewhat tangentially related heuristic to your
| points imho. When writing code, when in doubt, tax CPU
| over memory; it will generally yield leaner, better
| software. Easier to profile, debug, and optimize. Much
| easier to scale in production environments too.
|
| Code which optimizes for fast running time tends to lean
| heavily into memory-bloating data structures and
| caches. These are often poison for aggregate performance
| in real-world environments and, like you said, among other
| things reduce visibility into the workings of the system.
|
| There are some implications of this stance; e.g.
| optimizations to reduce memory footprint are generally
| never wasted, while optimizations to squeeze out a few ns
| of runtime may have many more unexpected consequences.
| ilyt wrote:
| > How do you teach people to effectively use a profiler
| on code with pernicious caching?
|
| Profile the function that is called if the object is not
| found in the cache? Have a pluggable cache with a plugin
| that just doesn't cache? That is not a hard problem to solve.
|
| > They get really uncomfortable six months later when I'm
| reporting 2-3x in code that can't be optimized further.
| And in each case everything was made more difficult by
| the people ahead of me chumming the waters
|
| Well, they delivered (I'm guessing) a 10x improvement in a
| month by adding a cache. Caching is an entirely reasonable
| first step and it can be good enough to be the last one.
|
| And frankly the ability to use a profiler and knowledge of
| some architectural optimizations seem to be pretty rare
| skills.
| zasdffaa wrote:
| Exactly! People here seem to be making a complex thing
| out of a simple thing.
| Aeolun wrote:
| I kinda feel that your convincing counterargument is just
| 'I have a different experience'.
|
| I've never seen people reach for caches first, and then
| discard all further optimizations.
|
| Or rather, I've seen them do that, but not out of some
| misplaced feeling that it was the only solution, just the
| one that was easiest to accomplish in very little time.
| cogman10 wrote:
| The issue with caches is they are easy to apply. You can
| easily drop a cache in any place you load a resource
| without significant code changes.
|
| I tend to think they are ok weapons of sometimes first
| resort (rarely changing resource with heavy system
| impact). But you'll definitely get better results simply
| by rethinking and rearchitecting stuff. (Did you really
| need to load the database just to look up a single value?)
| hinkley wrote:
| > You can easily drop a cache in any place you load a
| resource without significant code changes.
|
| Only when the cache is brand new. The problem with people
| with cache addiction is that they think caches are free. So
| once they know a cache is there, they begin to lean on
| it, and lean on it. I've seen it with many different
| teams in a number of verticals. Sometimes while I'm
| there, sometimes before I arrived. It's the people not
| the company.
|
| This illusion that you can always turn it off later is
| dangerous. That's only true if you have an impeccable
| information architecture when you start. And if you have
| that, you've already done a lot of heavy lifting before
| caching becomes your only option.
| zasdffaa wrote:
| It's difficult to refute your argument because I don't
| understand your points:
|
| > Because of the locality aspect, people will assume that
| it's 'free' to look up a value multiple times instead of
| passing these data dependencies between the relevant
| methods
|
| what does that even mean? What does 'data dependencies'
| mean specifically?
|
| > then you can end up evicting data in the middle of a
| transaction
|
| So in the middle of a transaction, instead of looking up
| data from a key - very cheap - you should do ... what?
| Recalculate it instead?
|
| > The less obvious consequence is now you have
| concurrency bugs, because the code assumes that decisions
| made about the data will go the same way on each call,
| but due to eviction, two halves of the transaction may
| make conflicting decisions with undefined behavior
|
| Assuming I even understand this, you are saying where
| there's concurrency there's a risk of pulling stale data
| out of a cache. Well, true. That's just something you
| have to be careful of. Add a Tx reference to the cache
| key.
|
| > Once you introduce caching you poison the performance
| well.
|
| Not my experience at all. I can't understand why you're
| saying this.
|
| > Flame charts lie when cache is used
|
| How so?
|
| > and after people rely on the cache it is lying in the
| other direction when the cache is off, because you're
| calling that path 5 times now and it's swamping other
| bottlenecks
|
| I've no idea what this means
|
| > This is what I really mean when I say it's a weapon of
| last resort. Because once you use it, it attempts to take
| any other options off the table
|
| Heh! Absolutely not! Other optimisations are orthogonal
| to caching IME:
|
| Better algo - orthogonal
|
| Go lower level - orthogonal
|
| Go multicore - orthogonal
|
| Noobish SQL query is begging for a rewrite - orthogonal
|
| Use L1/2/3/ chip cache better, or RAM, by careful data
| layout - orthogonal
|
| Use a better library - orthogonal
|
| Use better data structure eg. than linked list -
| orthogonal
| svieira wrote:
| >> Because of the locality aspect, people will assume
| that it's 'free' to look up a value multiple times
| instead of passing these data dependencies between the
| relevant methods
|
| > what does that even mean? What does 'data dependencies'
| mean specicially?
|
| It means that instead of having a method like
| `entryPoint(int)` that looks up a `Foo` by id and then
| passes the `Foo` around (`verify(Foo)`, `Foo
| update(Foo)`) you instead wind up with a system that
| passes around an `int` (`verify(int)`, `/* Unused */ Foo
| update(int)`) and each layer then looks up `Foo` in the
| `FooService` (because `FooService` has a cache and
| subsequent lookups should be essentially-free).
|
| Even worse when `FooService` is abstracted behind a
| thread-local like `ThreadLocal currentFoo`, the
| implementation of which is
| `fooService.findById(currentFooId.get()) /* cheap,
| because fooService has a cache */`.
|
| My own take on this is that caches are global-by-default
| when you really want local-by-default (e.g. dynamic
| programming instead of a `static final Cache
| theCacheForTheEntireProgram`). Local-by-default caches
| are easy to make transactional (when the call stack
| returns the cache is gone) and don't grow in an unbounded
| manner in the general case.
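|
| Spelled out with the same invented names, the two shapes look
| something like this (the second is where the hidden data
| dependencies creep in):
|
|   class Foo { int id; }
|
|   interface FooService { Foo findById(int id); }
|
|   class ExplicitDependencies {
|       void entryPoint(int id, FooService fooService) {
|           Foo foo = fooService.findById(id);  // looked up exactly once
|           verify(foo);                        // data handed to the callees
|           update(foo);
|       }
|       void verify(Foo foo) { /* ... */ }
|       Foo update(Foo foo) { /* ... */ return foo; }
|   }
|
|   class HiddenDependencies {
|       private FooService fooService;
|       void entryPoint(int id) {
|           verify(id);                         // each layer re-fetches Foo itself,
|           update(id);                         // "because the lookup is cached anyway"
|       }
|       void verify(int id) { Foo foo = fooService.findById(id); /* ... */ }
|       Foo update(int id) { Foo foo = fooService.findById(id); /* ... */ return foo; }
|   }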
| hinkley wrote:
| Yes!
|
| Local by default also has the feature/problem of making
| you look at the scale of the data involved in your query
| up front, before it's having a 100-1000 req/s hitting it
| in production. It's harder to pretend this large
| dependency graph will just magically take care of itself
| (or worse, be someone else's problem after you've
| declared victory) when it's staring you in the face.
| gav wrote:
| > Instead, it's pointing out a risk that sometimes they are too
| good, leading to a system that can't function when the normal
| level of locality and memoization are not available.
|
| And at that point cache management starts to be a huge problem.
| Clearing caches becomes a death spiral because your system
| can't actually support your normal load.
|
| For example I worked with a large retailer, who had a giant
| monolithic enterprise Java ecommerce system. It was so slow
| that product detail pages took upwards of 30 seconds to render.
| There was a cache in front of that system (Ehcache), then a
| page-level cache (Varnish) in front of that, then finally
| Akamai took 98% of the load.
|
| Rollouts were terrifying, you cleared the cache and the site
| might come back up. Changing Akamai configuration was also
| similarly risky.
|
| It's critical that systems don't become cache-dependent for
| load; they should be a performance boost, not a required
| component.
| nly wrote:
| Caching is easy, cache invalidation is hard.
| latchkey wrote:
| "There are two hard things in computer science: cache
| invalidation, naming things, and off-by-one errors."
|
| https://twitter.com/codinghorror/status/506010907021828096
| withinboredom wrote:
| I have no idea why you're quoting that tweet. This saying is
| much older than 2014.
| PaulKeeble wrote:
| Multiple times I have found the impact of cache performance to be
| dramatic. I find that optimisations in three areas are usually
| straightforward to achieve and can bring surprisingly big
| improvements.
|
| The first is that RAM accessed randomly or in the wrong direction
| performs very poorly compared to sequential access in order. The
| same is true of cache, because it's much easier to predict and
| prefetch data when it's accessed serially. The effect is so
| dramatic that it's sometimes worth "accessing" data you don't
| need, just to keep the access serial and benefit from this
| hardware optimisation. At times, for example, I have found tree
| maps and list maps perform better than hash maps, as the cheap
| serial search in memory order surprisingly outweighs the
| O(log n) or O(n) algorithmic cost and avoids the cost of the
| hash.
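|
| A crude way to see the first effect for yourself (a toy
| micro-benchmark; no JIT warm-up or other rigor, but the gap is
| usually too big to miss):
|
|   import java.util.Random;
|
|   class AccessOrder {
|       public static void main(String[] args) {
|           int n = 1 << 25;                     // ~33M ints, far bigger than any cache
|           int[] data = new int[n];
|           int[] order = new int[n];
|           for (int i = 0; i < n; i++) { data[i] = i; order[i] = i; }
|
|           // Shuffle the index array to force a random walk through memory.
|           Random rnd = new Random(42);
|           for (int i = n - 1; i > 0; i--) {
|               int j = rnd.nextInt(i + 1);
|               int tmp = order[i]; order[i] = order[j]; order[j] = tmp;
|           }
|
|           long t0 = System.nanoTime();
|           long seqSum = 0;
|           for (int i = 0; i < n; i++) seqSum += data[i];         // prefetcher-friendly
|           long t1 = System.nanoTime();
|           long rndSum = 0;
|           for (int i = 0; i < n; i++) rndSum += data[order[i]];  // cache-hostile
|           long t2 = System.nanoTime();
|
|           System.out.printf("sequential: %d ms, random: %d ms (sums %d / %d)%n",
|                   (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, seqSum, rndSum);
|       }
|   }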
|
| If you have a data structure that is larger than optimal and
| overflows cache lines, it can have a big impact on cache
| utilisation and performance if the algorithm doesn't need all the
| fields. Object orientation weighs quite heavily on this: certain
| designs can produce very large objects, and these often get
| poor cache usage.
|
| Unnecessary allocation and deallocation of memory in algorithms
| often results in the cache getting thrashed; avoiding this and
| reusing the same memory can result in better hot-cache usage and
| reduced garbage collection pressure.
|
| Most profilers can give you a good idea of the allocations but
| the other two are often harder to find and mostly just show up in
| cache misses.