
[HN Gopher] SRE Doesn't Scale
___________________________________________________________________
Author : kiyanwang
Score  : 46 points
Date   : 2021-10-11 06:22 UTC (1 day ago)
	web link (bravenewgeek.com)
| pram wrote:
| "Hiring experienced, qualified SREs is difficult and costly.
| Despite enormous effort from the recruiting organization, there
| are never enough SREs to support all the services that need their
| expertise."
|
| Uh huh. Maybe what isn't scaling is their onerous recruiting
| filters.
| anonporridge wrote:
| A different element of this is that software professionals are
| often paid so well that a lot of people can take early
| retirement relatively easily if they're good about saving and
| investing. And the FIRE movement is only growing.
|
| I sometimes wonder if the software world has relatively greater
| 'leakage' via early retirement than other fields, creating a
| constant problem of not enough highly experienced people who
| remain stuck as wage slaves throughout a 40-year career.
| bostonsre wrote:
| It's good to be an SRE. From my experience, there is high
| demand, low supply and schools don't really teach for SRE roles
| so the pipeline stays small.
| outworlder wrote:
| Yes.
|
| And then there are the unrealistic demands from companies that
| didn't even understand what DevOps was supposed to be, and
| are misunderstanding SRE even more. Many companies still
| treat this the same way they did "ops".
| zaptheimpaler wrote:
| Listen, if you can't write a working text justification
| algorithm in 45 minutes a decade after college, you have no
| business doing completely unrelated SRE stuff like monitoring
| services or scaling databases. Everybody knows leetcode is the
| key to software. If you can't do that, maybe you just don't
| have the right IQ to build the next chat app @ Google.
| GauntletWizard wrote:
| This but unironically.
| If you can't do a simple programming
| task (we can argue over what text justification is, because
| "left-pad" could qualify, as could a full TrueType renderer),
| you don't belong in SRE. Much of the point is to have
| cross-functional people, who can interface with developers on their
| level, _and_ do the math and grunt work around operating a
| database cluster.
| zaptheimpaler wrote:
| I once met a man who was intimately familiar with the
| details of the Linux kernel and how the new chiplet
| architecture in AMD processors resembles a NUMA
| architecture and thus impacts VM performance. He was well
| versed in shell scripting, k8s, docker, the principles of
| observability, and infrastructure as code. He could explain
| the difference between READ COMMITTED and REPEATABLE READ or
| LSTMs or distributed consistency models off the top of his
| head. He didn't have a CS degree so obviously he wasn't as
| intelligent as me, but yet I found him a little
| intimidating for some reason.
|
| But then I asked him -
|
| "Given an array of positive integers target and an array
| initial of same size with all zeros.
|
| Return the minimum number of operations to form a target
| array from initial if you are allowed to do the following
| operation: Choose any subarray from
| initial and increment each value by one.
|
| "
|
| He was stumped. As I had suspected, he wasn't quite up to
| the job of an SRE. I immediately failed him and went about
| editing my networking.yaml file. Someone has to maintain
| the bar around here..
| [deleted]
| karmakaze wrote:
| To expand on the brief title
|
| > Google [...] says the SRE model [...] does not scale with
| microservices.
| Instead, they go on to describe a more tractable,
| framework-oriented model to address this through things like
| codified best practices, reusable solutions, standardization of
| tools and patterns
|
| Basically anyone planning on microservices should define and
| monitor bounds on the frameworks, tools and diversity of design
| patterns in use. Good advice at any scale.
| igetspam wrote:
| > Google enforces standards and opinions around things like
| programming languages, instrumentation and metrics, logging, and
| control systems surrounding traffic and load management.
|
| I think the author read this as more of a problem than a
| solution. This concept is supported by the DevOps model too. Your
| infrastructure is just as much a part of your product and the
| teams providing the infrastructure just as responsible for the
| service levels and API contracts as any customer-facing product
| team.
| klodolph wrote:
| It sounds like the lesson here is that tacking on SRE to an
| out-of-control development process, churning out new services by the
| boatload, doesn't scale.
|
| This is caused by the typical attitude software companies have
| towards development. The most common model for software
| development is simple... rush to push out new features to the
| market, and pay the costs later--and then everyone is balking at
| the costs.
|
| The solution is kind of brilliant, IMO. You don't have to pay
| technical debt on projects if you shut them down and delete the
| code from your repository. Migrate to industry-standard
| solutions. Use off-the-shelf programs & libraries. Delete all
| your custom stuff. Replace good solutions with "good enough"
| solutions.
|
| The SREs can help you with that, but they can't help with
| out-of-control development. As your code base gets larger, the cost of
| supporting that code base gets larger too.
| The difficulty of
| scaling your SRE to match development reflects your
| out-of-control development process, not a problem with SRE. Keep the
| costs under control by keeping your code base under control.
| bcrosby95 wrote:
| Funny, I read it differently. They talk about frameworks,
| libraries, and best practices.
|
| Effectively, they're talking about standardization across your
| teams/services so they don't fuck things up. Essentially,
| you're taking away some of the purported freedoms of
| microservices (complete independence - e.g. I can write this
| service in brainfuck if I want!) and reining it in a bit so
| you don't build a pile of trash.
| klodolph wrote:
| I think of that kind of standardization kind of like deleting
| code. Stuff like, "We are deprecating support for Python in
| SRE, no new projects may be shipped in Python."
| lykr0n wrote:
| Yep. SRE is not a substitute for high level, overarching
| architects and designers.
|
| One pattern I see is that, as the company grows, the development
| gets split into different product groups which will organically
| diverge unless there is rigid enforcement of design patterns.
| In some places, SRE does this implicitly because they will only
| support X, Y, or Z but in others each product group will have
| their own group of SREs.
|
| There comes a point when you need one or a small group of
| people who are the opinionated developers who can make design
| decisions and who have the authority to cause everyone else to
| course correct. If you don't have this, you'll wind up with
| long migrations and legacy stuff that never seems to go away.
| rektide wrote:
| My read on the article was that much more was related to each
| team being on their own to set up & drive their pipelines,
| operate their own services, and there being a lack of
| commonality/shared experience.
|
| A vast number of the software engineers don't get the ops
| (running software) stuff hardly at all & half of them can sort
| of play along, hack stuff into place. The engineers on product
| teams who do know how to do things meanwhile don't get all the
| constraints, best practices, ideas that other various DevOps
| folk have done & have their own wants/desires/expected ways of
| doing things, so they end up creating their own very unique
| sub-ways of doing things within the org. None of these
| practices converge on regularity or consistency with what
| DevOps machinery ends up being built.
|
| What we do have often is just a random pile of containers and
| scripts that a couple people sort of know decently & everyone
| else suffers through & survives within. Almost never does it
| look like any other company's devops kitchen.
|
| SRE doesn't scale because it's an every now and then thing, and
| few people notice or care about the difference between a
| well-built corporate citizen that runs well & is monitored &
| operated according to whatever the in-power SRE cabal wants.
| People start to care only if things are going bad, either via
| services not building/integrating/deploying/running as well as
| they should, or from too much confoundedness/general head
| scratching by either the SRE or regular engineers. SRE is not a
| priority, it's not practiced regularly, it's only an
| every-now-and-then thing, so we don't have the chance to get good, to
| institutionalize the right ways of doing things. That's what
| the article is discussing. Not the rest of the everyday normal
| software development rushing-bedlam you describe.
| klodolph wrote:
| > SRE doesn't scale because it's an every now and then thing,
| ...
|
| That's the part that doesn't scale... tacking on SRE at the
| end, or doing it every now and then.
| The reason people don't
| care about the software being a "well-built corporate
| citizen" is because they care more about shipping features.
| If you have an SRE team that will say "no" to you when you
| try to ship new stuff, you'll eventually figure out a way to
| build new things in a way that the SRE team will say "yes".
| When I say "no", that could be a hard pushback like "no,
| that's not getting shipped" or it could be an answer like,
| "no, the SRE team will not support that, yet."
|
| These kinds of decisions need to be made at a high level,
| because everyone in the institution is typically operating
| with the wrong incentives. That's why you end up with a
| random pile of containers and scripts. It doesn't have to end
| up that way, even when you have microservices.
|
| > That's what the article is discussing. Not the rest of the
| everyday normal software development rushing-bedlam you
| describe.
|
| I disagree with the article, so necessarily there are going
| to be differences between what I'm saying and what the
| article is saying.
| [deleted]
| gautamdivgi wrote:
| > And that move to microservices--in combination with
| cloud--unleashes a whole new level of autonomy and empowerment for
| developers who, often coming from a more restrictive
| ops-controlled environment on prem, introduce all sorts of new
| programming languages, compute platforms, databases, and other
| technologies.
|
| You need standards; without them SRE is pointless. Everything
| needs a standard method of monitoring. For example, stick to
| Java/Spring Boot, MariaDB and K8S. That will generally cover 85%
| of your use cases.
|
| The automation and advantage of SRE is derived through standards
| and familiarity with the tool chain.
| mbesto wrote:
| Isn't this more of a comment about microservices than it is about
| SRE?
| It reads to me like "once you hit a number of microservices
| it ends up looking like a monolith":
|
| http://highscalability.com/blog/2020/4/8/one-team-at-uber-is...
| iamstupidsimple wrote:
| Forgive me but aren't 'macroservices' just... services? I don't
| see the difference.
| wara23arish wrote:
| dumb question time but what exactly makes something a micro
| service.
|
| Does the separation of a specific functionality from a wider
| array of functions into its own VM make it a microservice?
|
| When does something stop being a microservice I guess?
| thecleaner wrote:
| My definition is separation of infrastructure and deployment
| cycles. Everything that is always in one deployment is one
| service, and stuff that's part of your code-base is definitely
| not a different service.
| igetspam wrote:
| It stops being a microservice when a developer starts saying,
| "oh! We can do X in service Y too! It already does
| ${similar work} and reads/writes from/to ${data source}, so why not?"
|
| The intended model is to do one thing, thus enabling surgical
| changes to functionality without having to rebuild
| everything. As long as you stick to your API contracts, you
| can muck around with the internals without affecting anything
| else.
| forty wrote:
| I remember asking a candidate whether they were doing
| microservices at her current job.
|
| She answered "I don't know if we have microservices, but we
| do have services that don't do much"
|
| Since then, that's been my definition of a microservice :)
| notyourday wrote:
| > dumb question time but what exactly makes something a micro
| service.
|
| This leftpad as a service, over HTTPS
___________________________________________________________________
(page generated 2021-10-12 23:00 UTC)