[HN Gopher] SRE Doesn't Scale
___________________________________________________________________
 
SRE Doesn't Scale
 
Author : kiyanwang
Score  : 46 points
Date   : 2021-10-11 06:22 UTC (1 days ago)
 
web link (bravenewgeek.com)
w3m dump (bravenewgeek.com)
 
| pram wrote:
| "Hiring experienced, qualified SREs is difficult and costly.
| Despite enormous effort from the recruiting organization, there
| are never enough SREs to support all the services that need their
| expertise."
| 
| Uh huh. Maybe what isn't scaling is their onerous recruiting
| filters.
 
  | anonporridge wrote:
  | A different element of this is that software professionals are
  | often paid so well that a lot of people can take early
  | retirement relatively easily if they're good about saving and
  | investing. And the FIRE movement is only growing.
  | 
  | I sometimes wonder if the software world has relatively greater
  | 'leakage' via early retirement than other fields, creating a
  | constant problem of not enough highly experienced people who
  | remain stuck as wage slaves throughout a 40 year career.
 
  | bostonsre wrote:
  | It's good to be an sre. From my experience, there is high
  | demand, low supply and schools don't really teach for sre roles
  | so the pipeline stays small.
 
    | outworlder wrote:
    | Yes.
    | 
    | And then there's the unrealistic demands from companies that
    | didn't even understand what DevOps was supposed to be, and
    | are misunderstanding SRE even more. Many companies still
    | treat this the same way they did "ops".
 
  | zaptheimpaler wrote:
  | Listen, if you can't write a working text justification
  | algorithm in 45 minutes a decade after college, you have no
  | business doing completely unrelated SRE stuff like monitoring
  | services or scaling databases. Everybody knows leetcode is the
  | key to software. If you can't do that, maybe you just don't
  | have the right IQ to build the next chat app @ Google.
 
    | GauntletWizard wrote:
    | This but unironically. If you can't do a simple programming
    | task, (we can argue over what text justification is, because
    | "left-pad" could qualify, as could a full TrueType renderer)
    | you don't belong in SRE - Much of the point is to have cross-
    | functional people, who can interface with developers on their
    | level, _and_ do the math and grunt work around operating a
    | database cluster.
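
To make the range mentioned above concrete, here is a minimal Python sketch of the two simple ends of "text justification": a left-pad and a single-line full justification that spreads spaces across the gaps. The function names and the rule of giving extra spaces to the leftmost gaps are assumptions for illustration, not anything specified in the thread.

```python
from typing import List


def left_pad(text: str, width: int, fill: str = " ") -> str:
    """Pad `text` on the left with `fill` until it is `width` characters long."""
    if len(fill) != 1:
        raise ValueError("fill must be a single character")
    # Text that is already wide enough is returned unchanged.
    return fill * max(0, width - len(text)) + text


def justify_line(words: List[str], width: int) -> str:
    """Spread `words` across exactly `width` columns.

    Extra spaces that do not divide evenly go to the leftmost gaps.
    """
    if not words:
        return " " * width
    if len(words) == 1:
        # A single word has no gaps to stretch, so pad on the right.
        return words[0].ljust(width)
    gaps = len(words) - 1
    spaces = width - sum(len(word) for word in words)
    if spaces < gaps:
        raise ValueError("words do not fit in the requested width")
    base, extra = divmod(spaces, gaps)
    parts = []
    for index, word in enumerate(words[:-1]):
        parts.append(word)
        parts.append(" " * (base + (1 if index < extra else 0)))
    parts.append(words[-1])
    return "".join(parts)
```

A full paragraph justifier would add greedy line packing on top of `justify_line`; that part is left out here.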
 
      | zaptheimpaler wrote:
      | I once met a man who was intimately familiar with the
      | details of the linux kernel and how the new chiplet
      | architecture in AMD processors resembles a NUMA
      | architecture and thus impacts VM performance. He was well
      | versed in shell scripting, k8s, docker, the principles of
      | observability, and infrastructure as code. He could explain
      | the difference between READ COMMITTED and REPEATABLE READ or
      | LSTMs or distributed consistency models off the top of his
      | head. He didn't have a CS degree so obviously he wasn't as
      | intelligent as me, but yet I found him a little
      | intimidating for some reason.
      | 
      | But then I asked him -
      | 
      | "Given an array of positive integers target and an array
      | initial of same size with all zeros.
      | 
      | Return the minimum number of operations to form a target
      | array from initial if you are allowed to do the following
      | operation: choose any subarray from initial and increment
      | each value by one."
      | 
      | He was stumped. As I had suspected, he wasn't quite up to
      | the job of an SRE. I immediately failed him and went about
      | editing my networking.yaml file. Someone has to maintain
      | the bar around here..
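
For readers curious about the interview question quoted above, one well-known approach is a single left-to-right pass: every rise over the previous element needs that many new operations, while drops and plateaus can reuse operations that already cover the neighbor. The sketch below assumes Python; the function name `min_operations` is made up for illustration.

```python
from typing import List


def min_operations(target: List[int]) -> int:
    """Minimum number of 'increment a contiguous subarray by one' operations
    needed to turn an all-zero array into `target`."""
    operations = 0
    previous = 0  # value to the left of the current element (0 before the array)
    for height in target:
        # Only the part that rises above the left neighbor needs new operations.
        if height > previous:
            operations += height - previous
        previous = height
    return operations


print(min_operations([1, 2, 3, 2, 1]))  # 3
print(min_operations([3, 1, 1, 2]))     # 4
```

The pass is O(n) in time and O(1) in extra space.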
 
| karmakaze wrote:
| To expand on the brief title
| 
| > Google [...] says the SRE model [...] does not scale with
| microservices. Instead, they go on to describe a more tractable,
| framework-oriented model to address this through things like
| codified best practices, reusable solutions, standardization of
| tools and patterns
| 
| Basically anyone planning on microservices should define and
| monitor bounds on the frameworks, tools, and diversity of design
| patterns in use. Good advice at any scale.
 
| igetspam wrote:
| > Google enforces standards and opinions around things like
| programming languages, instrumentation and metrics, logging, and
| control systems surrounding traffic and load management.
| 
| I think the author read this as more of a problem than a
| solution. This concept is supported by the DevOps model too. Your
| infrastructure is just as much a part of your product and the
| teams providing the infrastructure just as responsible for the
| service levels and API contracts as any customer facing product
| team.
 
| klodolph wrote:
| It sounds like the lesson here is that tacking on SRE to an out-
| of-control development process, churning out new services by the
| boatload, doesn't scale.
| 
| This is caused by the typical attitude software companies have
| towards development. The most common model for software
| development is simple... rush to push out new features to the
| market, and pay the costs later--and then everyone is balking at
| the costs.
| 
| The solution is kind of brilliant, IMO. You don't have to pay
| technical debt on projects if you shut them down and delete the
| code from your repository. Migrate to industry-standard
| solutions. Use off-the-shelf programs & libraries. Delete all
| your custom stuff. Replace good solutions with "good enough"
| solutions.
| 
| The SREs can help you with that, but they can't help with out-of-
| control development. As your code base gets larger, the cost of
| supporting that code base gets larger too. The difficulty of
| scaling your SRE to match development reflects your out-of-
| control development process, not a problem with SRE. Keep the
| costs under control by keeping your code base under control.
 
  | bcrosby95 wrote:
  | Funny, I read it differently. They talk about frameworks,
  | libraries, and best practices.
  | 
  | Effectively, they're talking about standardization across your
  | teams/services so they don't fuck things up. Essentially,
  | you're taking away some of the purported freedoms of
  | microservices (complete independence - eg I can write this
  | service in brainfuck if I want!) and reining it in a bit so
  | you don't build a pile of trash.
 
    | klodolph wrote:
    | I think of that kind of standardization kind of like deleting
    | code. Stuff like, "We are deprecating support for Python in
    | SRE, no new projects may be shipped in Python."
 
  | lykr0n wrote:
  | Yep. SRE is not a substitute for high level, overarching
  | architects and designers.
  | 
  | One pattern I see is that, as the company grows the development
  | gets split into different product groups which will organically
  | diverge unless there is rigid enforcement of design patterns.
  | In some places, SRE does this implicitly because they will only
  | support X, Y, or Z but in others each product group will have
  | their own group of SREs.
  | 
  | There comes a point when you need one or a small group of
  | people who are the opinionated developers who can make design
  | decisions and who have the authority to cause everyone else to
  | course correct. If you don't have this, you'll wind up with
  | long migrations and legacy stuff that never seems to go away.
 
  | rektide wrote:
  | My read on the article was that much more was related to each
  | team being on their own to set up & drive their pipelines,
  | operate their own services, and there being a lack of
  | commonality/shared experience.
  | 
  | A vast number of the software engineers hardly get the ops
  | (running software) stuff at all & half of them can sort of
  | play along, hack stuff into place. The engineers on product
  | teams who do know how to do things meanwhile don't get all the
  | constraints, best practices, ideas that other various DevOps
  | folk have done & have their own wants/desires/expected ways of
  | doing things, so they end up creating their own very unique
  | sub-ways of doing things within the org. None of these
  | practices converge on regularity or consistency with what
  | DevOps machinery ends up being built.
  | 
  | What we do have often is just a random pile of containers and
  | scripts that a couple people sort of know decently & everyone
  | else suffers through & survives within. Almost never does it
  | look like any other company's devops kitchen.
  | 
  | SRE doesn't scale because it's an every now and then thing, and
  | few people notice or care about the difference between a well-
  | built corporate citizen that runs well & is monitored &
  | operated according to whatever the in-power SRE cabal wants
  | and one that isn't.
  | People start to care only if things are going bad, either via
  | services not building/integrating/deploying/running as well as
  | they should, or from too much confoundedness/general head
  | scratching by either the SRE or regular engineers. SRE is not a
  | priority, it's not practiced regularly, it's only an every-now-
  | and-then thing, so we don't have the chance to get good, to
  | institutionalize the right ways of doing things. That's what
  | the article is discussing. Not the rest of the everyday normal
  | software development rushing-bedlam you describe.
 
    | klodolph wrote:
    | > SRE doesn't scale because it's an every now and then thing,
    | ...
    | 
    | That's the part that doesn't scale... tacking on SRE at the
    | end, or doing it every now and then. The reason people don't
    | care about the software being a "well-built corporate
    | citizen" is because they care more about shipping features.
    | If you have an SRE team that will say "no" to you when you
    | try to ship new stuff, you'll eventually figure out a way to
    | build new things in a way that the SRE team will say "yes".
    | When I say "no", that could be a hard pushback like "no,
    | that's not getting shipped" or it could be an answer like,
    | "no, the SRE team will not support that, yet."
    | 
    | These kinds of decisions need to be made at a high level,
    | because everyone in the institution is typically operating
    | with the wrong incentives. That's why you end up with a
    | random pile of containers and scripts. It doesn't have to end
    | up that way, even when you have microservices.
    | 
    | > That's what the article is discussing. Not the rest of the
    | everyday normal software development rushing-bedlam you
    | describe.
    | 
    | I disagree with the article, so necessarily there are going
    | to be differences between what I'm saying and what the
    | article is saying.
 
| gautamdivgi wrote:
| > And that move to microservices--in combination with cloud--
| unleashes a whole new level of autonomy and empowerment for
| developers who, often coming from a more restrictive ops-
| controlled environment on prem, introduce all sorts of new
| programming languages, compute platforms, databases, and other
| technologies.
| 
| You need standards; without them SRE is pointless. Everything
| needs a standard method of monitoring. For example, stick to
| Java/Spring Boot, MariaDB and K8S. That will generally cover 85%
| of your use cases.
| 
| The automation and advantage of SRE is derived through standards
| and familiarity with the tool chain.
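
One way to picture "a standard method of monitoring" is a single shared wrapper that every service uses, so call counts, errors, and latencies always land under the same naming scheme. The Python sketch below is an illustration only: the in-memory registries, the `monitored` decorator, and the `service.operation` naming are all hypothetical, standing in for whatever metrics library a team actually standardizes on.

```python
import functools
import time
from collections import defaultdict
from typing import Any, Callable, Dict, List

# Hypothetical in-memory registries; a real setup would export to a metrics backend.
CALLS: Dict[str, int] = defaultdict(int)
ERRORS: Dict[str, int] = defaultdict(int)
LATENCIES: Dict[str, List[float]] = defaultdict(list)


def monitored(service: str, operation: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Record calls, errors, and latency under one uniform metric name."""
    name = f"{service}.{operation}"

    def decorate(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            CALLS[name] += 1
            try:
                return func(*args, **kwargs)
            except Exception:
                ERRORS[name] += 1
                raise  # monitoring must not swallow the failure
            finally:
                LATENCIES[name].append(time.perf_counter() - start)

        return wrapper

    return decorate


@monitored("orders", "create")
def create_order(quantity: int) -> int:
    """Toy handler used only to show the decorator in place."""
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    return quantity * 2
```

Because every handler goes through the same wrapper, a dashboard or alert written for one service's `CALLS`, `ERRORS`, and `LATENCIES` keys applies unchanged to the next one.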
 
| mbesto wrote:
| Isn't this more of a comment about microservices than it is about
| SRE? It reads to me like "once you hit a number of microservices
| it ends up looking like a monolith":
| 
| http://highscalability.com/blog/2020/4/8/one-team-at-uber-is...
 
  | iamstupidsimple wrote:
  | Forgive me but aren't 'macroservices' just... services? I don't
  | see the difference.
 
  | wara23arish wrote:
  | dumb question time but what exactly makes something a micro
  | service.
  | 
  | Does separating a specific functionality from a wider array
  | of functions into its own VM make it a microservice?
  | 
  | When does something stop being a microservice i guess?
 
    | thecleaner wrote:
    | My definition is separation of infrastructure and deployment
    | cycles. Everything that is always in one deployment is one
    | service, and stuff that's part of your code-base is definitely
    | not a different service.
 
    | igetspam wrote:
    | It stops being a microservice when a developer starts saying,
    | "oh! We can do X in service Y too! It already does ${similar
    | work} and reads/writes from/to ${data source}, so why not?"
    | 
    | The intended model is to do one thing, thus enabling surgical
    | changes to functionality without having to rebuild
    | everything. As long as you stick to your API contracts, you
    | can muck around with the internals without affecting anything
    | else.
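
The point about API contracts can be sketched in a few lines of Python: callers depend on a small interface, and the implementation behind it can be swapped without touching them. Everything named here (`UserLookup`, `DictBackedLookup`, `CachingLookup`, `greeting`) is hypothetical and in-process; a real microservice would put the same contract behind a network API.

```python
from typing import Dict, Protocol


class UserLookup(Protocol):
    """Hypothetical contract: callers rely only on this method signature."""

    def display_name(self, user_id: int) -> str: ...


class DictBackedLookup:
    """First implementation: a plain in-memory mapping."""

    def __init__(self, names: Dict[int, str]) -> None:
        self._names = names

    def display_name(self, user_id: int) -> str:
        return self._names.get(user_id, "unknown")


class CachingLookup:
    """Different internals (a cache in front of another lookup), same contract."""

    def __init__(self, inner: UserLookup) -> None:
        self._inner = inner
        self._cache: Dict[int, str] = {}
        self.misses = 0

    def display_name(self, user_id: int) -> str:
        if user_id not in self._cache:
            self.misses += 1
            self._cache[user_id] = self._inner.display_name(user_id)
        return self._cache[user_id]


def greeting(lookup: UserLookup, user_id: int) -> str:
    """A caller that knows nothing about how the lookup works internally."""
    return f"Hello, {lookup.display_name(user_id)}!"
```

Swapping `DictBackedLookup` for `CachingLookup` changes how the answer is produced but not what `greeting` returns, which is the "muck around with the internals" freedom described above.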
 
    | forty wrote:
    | I remember asking a candidate whether they were doing
    | microservices at her current job.
    | 
    | She answered "I don't know if we have microservices, but we
    | do have services that don't do much"
    | 
    | Since then, that's been my definition of a microservice :)
 
    | notyourday wrote:
    | > dumb question time but what exactly makes something a micro
    | service.
    | 
    | This: leftpad as a service, over HTTPS
 
___________________________________________________________________
(page generated 2021-10-12 23:00 UTC)