[HN Gopher] Sensenmann: Code Deletion at Scale
___________________________________________________________________

Author : gslin
Score  : 127 points
Date   : 2023-04-29 18:47 UTC (4 hours ago)

web link (testing.googleblog.com)
w3m dump (testing.googleblog.com)

| jawns wrote:
| The most difficult part about code deletion is practicing the
| Chesterton's Fence principle:
|
| > In the matter of reforming things, as distinct from deforming
| them, there is one plain and simple principle; a principle which
| will probably be called a paradox. There exists in such a case a
| certain institution or law; let us say, for the sake of
| simplicity, a fence or gate erected across a road. The more
| modern type of reformer goes gaily up to it and says, "I don't
| see the use of this; let us clear it away." To which the more
| intelligent type of reformer will do well to answer: "If you
| don't see the use of it, I certainly won't let you clear it away.
| Go away and think. Then, when you can come back and tell me that
| you do see the use of it, I may allow you to destroy it."
|
| https://wiki.lesswrong.com/wiki/Chesterton%27s_Fence
|
| While this tool certainly does the job of _proposing_ code
| deletions, that's the easier part. The harder part is knowing
| why the code exists in the first place, which is necessary to
| know whether it's truly a good idea to remove it. Google,
| smartly, is leaving that part up to a human (for now).
| proper_elb wrote:
| You raise a good point, and I would answer it with agree and
| disagree:
|
| Agree: Yes, you are correct, merely observing that a code path
| was never executed in the last 6 months is not the same as
| understanding why the code path was created in the first place.
| There might be the quite real possibility of an infrequent
| event that appears just once in every two years or so (of
| course, this should also be documented somewhere!).
|
| Disagree: Pragmatically, we _have_ an answer if the code path
| was not executed after 6 months' use in production and test: We
| know that, with a very high probability, the code path was
| created either by mistake (human factor) or intentionally for
| some behavior that is no longer expected from our software. To
| continue the Fence metaphor, regarding Sensenmann: After 6
| months, we know about the Fence that 1) it has no role to play
| in keeping the stuff out that we want out (that was all done by
| other fences that had contact with an animal at least
| once) and 2) that it _might_ have been used to keep out flying
| elephants or whatever, but no such being was observed in the
| last 6 months (at least the fence made no contact with it,
| which it then should have!) and probably went away.
|
| That said, having a human in the loop is probably a good idea.
| anonymousiam wrote:
| It should also be clear that this article is about "deleting"
| code from an active project, not about "deleting" it entirely
| from the version control system. Thus, any code "deleted"
| through the described process could still easily be restored if
| necessary.
| breck wrote:
| As a counter to Chesterton's Fence: sometimes the fastest way
| to understand what something does is to remove it and see what
| complains. You might get only 1 complainer for every 10 fences
| you take down. Putting that one fence back up takes much longer
| than taking it down, but the time saved from removing the other
| 9 unnecessary ones makes it a net win. And this time you can
| add documentation to the rebuilt fence.
| macNchz wrote:
| Also known as the scream test in IT: unplug that old server
| and see who screams!
| shagie wrote:
| Microsoft uses a scream test to silence its unused servers
| - https://www.microsoft.com/insidetrack/blog/microsoft-uses-a-...
| IshKebab wrote:
| A further counterpoint: if you follow the Fence proponents'
| logic to its conclusion you can _never remove any code_ which
| is clearly an absurd situation.
|
| I think the real logical flaw is that Fencers (as I will now
| call them) put the blame on the person who removes an
| apparently useless fence. But they're wrong. The real blame
| lies with the person who built the apparently useless fence
| and didn't put a sign on it explaining why it shouldn't be
| removed.
| proper_elb wrote:
| > A further counterpoint: if you follow the Fence
| proponents' logic to its conclusion you can never remove
| any code which is clearly an absurd situation.
|
| No, that would only be the case if one would never
| understand any code. Chesterton's Fence consists of two
| parts ("understanding some code" as a precondition to
| "removing some code"), and leaving one or the other part
| out makes it some other thing than what Chesterton's Fence
| means.
|
| > The real blame lies with the person who built the
| apparently useless fence and didn't put a sign on it
| explaining why it shouldn't be removed.
|
| Chesterton's Fence is not about blame, or the past in
| general - it is about how to deal with things that are in
| the present. (Although I agree that the original
| fence-builder should have left a note or two!)
| kube-system wrote:
| The principle says you can't remove it _until you
| understand why it was there_. It's more about doing due
| diligence.
|
| I follow the principle when I remove code, and it's a
| reason why good code comments are important. "Oh yeah, this
| was written for [x] which is no longer a thing, we can
| remove it now"
| hgsgm wrote:
| It's not about blame, it's about making good decisions.
| xboxnolifes wrote:
| And not putting up a sign explaining why it's a necessary
| fence is a bad decision.
| Avoiding removing all unlabeled
| fences because they _might, maybe, potentially_ be
| useful, is also likely a bad decision if taken to its
| conclusion.
| TeMPOraL wrote:
| Chesterton's Fence is a reminder to actually get up, walk
| over to the fence and skim the multiple labels and notes
| the builders left there - because the big problem is
| usually someone looking at a thing that came before them,
| and _assuming_ they understand what it was for, without
| bothering to actually check it.
| einpoklum wrote:
| The article is about the removal of _dead_ code. So, not a
| "fence across the road" - it's a fence that was moved to the
| side of the road, already cleared. The question is just whether
| to dismantle the fence or keep it there just in case.
| Xorlev wrote:
| +1. And, it's in version control forever. It's not as if it
| entirely disappears. Like one of the sibling comments
| mentioned, I only rarely reject Sensenmann CLs.
|
| That's worth explaining: it's automated code deletion, but
| the owner of the code (a committer to that directory
| hierarchy) must approve it, so it's rare there's ever a false
| deletion.
| opportune wrote:
| I don't think you understand Sensenmann fully based on this
| post. At Google basically everything in use has a Bazel-like
| build target. This means the code base is effectively a
| directed "forest"/tree-like data structure with recognizable
| sources and sinks. If you can trace through the tree and find
| included-but-not-used code by analyzing build targets, you can
| safely delete it. There are even systems (though not covering
| everything) that sample binaries' function usage you could
| double check against.
|
| > why the code exists in the first place
|
| If the code is unreachable it's at best a "possibly will be
| used in the future" and most likely simply something that was
| used but not deleted when its last use was removed (or a YAGNI
| liability).
|
| If you can find a piece of code included in build targets but
| unreachable in all of them, it's typically safe to delete. And
| it's not done without permission generally, automation will
| send the change to a team member to double check it's ok to
| delete/nobody is going to start using it soon.
| UncleMeat wrote:
| "This code has been dead for six months" is a _very_ good
| heuristic that the code is not relevant. I do occasionally
| reject the sensenmann CLs, but only very very rarely. This
| isn't weird code that nobody knows why it exists but it is
| currently doing something. This is code that cannot execute.
| ninjanomnom wrote:
| Code that only triggers from a yearly holiday, disaster
| alerts, leap years, or the like, would have longer periods of
| going unused and likely be very problematic if removed.
| Unless by dead code you mean unreachable code in which case
| it shouldn't exist in the first place and I agree should be
| removed.
| joshuamorton wrote:
| Yes, the nice thing about blaze/bazel + sensenmann is that
| you can very accurately say "this code was not built into a
| binary that has run in the past 6 months".
|
| Sometimes you still want it (e.g. python scripts that are
| used every once in a while for ad-hoc things and might go
| months between uses), but _usually_ the right thing to do
| is productionize stuff like that slightly more (and also
| test it semi-regularly to make sure it hasn't broken).
| CyberDildonics wrote:
| You can probably get most of that by just looking at the
| atime attribute on the file system.
| joshuamorton wrote:
| Nah, there's stuff that scans the entire repo regularly
| for all kinds of interesting purposes, and that's
| ignoring the fact that `atime` isn't available or a
| source of truth in piper.
|
| Like conceptually I believe this could be wrong in both
| directions, since there's heavy caching of build
| artifacts, you can totally build a transitive dependency
| of some file without actually reading the file (and
| potentially do this for a relatively long period of time,
| though I don't think that will happen in practice), and
| stuff will regularly look through large swaths of files
| that aren't necessarily run.
| taspeotis wrote:
| Different industries I guess. The new financial year comes
| around every 12mo. Good luck explaining to the accountants
| that you deleted their end-of-year reconciliation reports
| because they didn't run them every 6mo.
| jeffbee wrote:
| Wouldn't it be mostly their fault for approving the
| removal?
| dekhn wrote:
| Google's response to Chesterton's Fence is: "if you liked it,
| then put a test on it".
|
| I used to update the internal version of numpy for Google and
| if people asked me to rollback after I made my update (having
| fixed all the test failures I could detect), and they didn't
| have a test, well, that's their problem. The one situation
| where that rule wouldn't apply is if I somehow managed to break
| production and we needed to do an emergency rollback.
|
| I shed a tear when some of my old, unused code was autodeleted
| at Google, but nowadays my attitude is: HEAD of your version
| control should only contain things which are absolutely
| necessary from a functional selection perspective.
| Forge36 wrote:
| I like that philosophy. In a similar vein: if it's important,
| why aren't we testing it?
|
| How do you encourage testing?
| sitkack wrote:
| I hope everyone involved gets L+1!
| [deleted]
| joebiden2 wrote:
| Sincere question: what is interesting or novel about this? Is it
| just the scale or did I miss some subtle aspect?
|
| This is more (or less?) the same as industry best practices, just
| scaled up.
| There is a challenge in scaling up, as there is more
| potential for someone to mess it up. But it's the same technique.
|
| So what am I missing?
| summerlight wrote:
| In the perspective of software engineering economics, the scale
| is important. Everyone knows it's good to clean up unused code
| but they just don't care because they think it doesn't yield a
| short term ROI for themselves. Then why don't we bring the cost
| down and see what happens? Automation changes this equation.
| codemac wrote:
| > the same as industry best practices, just scaled up.
|
| That's like saying S3 is the same as ext4, they're the same, just
| scaled up! This is a poor argument, you'll note that S3 and
| ext4 are entirely different things, not "challenges",
| fundamentally different implementations.
|
| Google is the only company I've ever worked for that
| automatically deleted dead code, let alone across a company of
| 100k+ SWE.
| joebiden2 wrote:
| Fair enough and thanks for the reply. Still, for anything
| bothering engineers more than a bit repeatedly, anyone will
| write tools to remove the manual burden.
|
| Our internal practice is to delete code if you suspect it's
| unused, run tests, and if it doesn't affect any tests, go for
| it. This could be automated, but it is not pressing enough,
| so we didn't automate it yet.
|
| We could though, and it may even be a good idea, but I still
| don't get the novelty. But I appreciate your point of view.
| kccqzy wrote:
| I think the takeaway is that at Google's scale, even if you
| think some minor problem is not pressing enough to be
| automated, it will become pressing soon enough.
| jsnell wrote:
| Your proposed process is exactly the wrong way around.
| You'll end up keeping dead code just because it has tests,
| and delete code that's still used in prod just because it
| happened to be untested.
|
| This is one of the details that the blog post goes into.
| Sounds like it's not as trivial and obvious a problem as
| you think it is, and you would have benefited from just not
| dismissing the post because of that.
| brunooliv wrote:
| Not to criticize your POV and argument directly, but, in
| the end, a lot of things, especially like these, are always
| easily subjected to the "we could do it, but just didn't
| bother to yet" kind of argument, and, when it comes down to
| the real work, things are much harder than they
| superficially appear to be. So yeah this isn't new...
| But... You know eheh
| joebiden2 wrote:
| Well, I'd politely agree to disagree. Google scale is
| defined by novel, radical approaches, like for example
| inventing map/reduce, writing papers on LLMs others then
| implement successfully, or creating something like
| Kubernetes.
|
| The specific topic here is not one of those Google
| problems to me, as I can compare it to other problems we
| already solved. But yes, we could miss that critical
| point where a totally different problem domain emerges
| just from one order of magnitude more, so fair game.
| btilly wrote:
| I liked most of the blog, but it bothers me to see stuff like,
| "... just as with the introduction of unit testing 20 years
| ago...".
|
| No, unit testing was NOT introduced 20 years ago. As an example,
| Perl 1 was released about 35 years ago with a unit test suite
| that got run on every install. Every version of Perl has done so
| since, and since CPAN came along, most Perl modules have followed
| suit. This was the secret sauce behind Perl's reputation for
| being so portable.
|
| Nor was Perl a pioneer. In fact unit testing was used in the
| 1960s on the Apollo program, and was even called unit testing. I
| believe that the concept can be dated back to a 1950s textbook
| but I can't find the reference.
|
| So unit testing is over 60 years old.
| allanrbo wrote:
| Maybe they just mean that Google started investing seriously in
| unit testing 20 years ago?
| UncleMeat wrote:
| This refers to Google, I'm pretty sure. Early Google didn't
| believe in unit testing and it took a few particularly stubborn
| engineers to demonstrate the value of the practice and convert
| the culture to promoting unit testing.
| charcircuit wrote:
| Does this only get rid of unused binaries? Or is the system smart
| enough to use the profiling infrastructure to identify dead code
| in general?
| kccqzy wrote:
| Profiling is inherently probabilistic and I don't think it
| should be used. Anything that inspects the runtime (dynamic)
| behavior of code isn't good enough for a code deletion tool.
| Only static analysis will do.
| Jolter wrote:
| Just read the article. It's not that long.
| charcircuit wrote:
| I was hoping to spark a conversation about this approach as
| no such thing was mentioned in the article even though it
| should be possible to do.
|
| The whole point of these comment sections is to have a
| discussion. If the point of this site was just to read
| articles there wouldn't be a comment section.
| croes wrote:
| FYI: Sensenmann is the German word for Grim Reaper
| Zetobal wrote:
| I still don't get the tech industry's fascination with random
| German words; at least here it's sort of fitting.
| aardvarkr wrote:
| Citation needed ^ Maybe there are just some great German devs
| and you're using a lot of their software?
| mflendrich wrote:
| Google's Zurich office has had a tradition of creating
| codenames in German (regardless of the backgrounds of any
| engineers involved).
|
| Source: I worked on Sensenmann.
| meibo wrote:
| Looks like this was made by a team in Zurich, which is mostly
| German, so I imagine it came to them fairly naturally, and
| who doesn't want to pick cool names for hackathon projects.
| evmar wrote:
| It was kind of an internal joke at Google for
| German-speaking teams to make German-named projects. (It's maybe
| only a joke that makes sense to the infamous German sense
| of humor.)
| mkoubaa wrote:
| Sounds like a useful system from which almost nothing is usable
| outside of Google.
| vamega wrote:
| Sure; but working at another very large company, I can say I
| wish we had this.
|
| Old unused code is a huge problem for us. The coordination
| costs of trying to update company wide problems are made much
| more severe by old code.
|
| I wish we had something like this. We're large enough we'd need
| our own system anyway. We don't have a monorepo, and we don't
| use tools so many others do.
| DamonHD wrote:
| I worry about archival and enough history for diagnosing
| long-standing subtle issues that take a long time to surface as bugs.
| This is not theoretical: apparently a TeX bug picked up after
| many years had been there from the start.
| kragen wrote:
| i think piper saves the full history of the whole monorepo; if
| that's correct it's not 'deletion' in that sense
| gravypod wrote:
| (opinions are my own)
|
| > Its goal is simple (at least, in principle): automatically
| identify dead code, and send code review requests
| ('changelists') to delete it.
|
| It sends CLs (pull requests) and shows up as a commit. You
| get a chance to approve or deny the deletion
| kragen wrote:
| yeah but even if you approve it the code is still there in
| the code history, right?
| bradfitz wrote:
| Yes.
| DamonHD wrote:
| Good.
|
| But relatively hard to find and work with.
| dekhn wrote:
| I used to do archaeology on Google's monorepo (IE,
| looking far into the past of mapreduce, search engine,
| ads, and other products) and it wasn't really that hard.
| Heck, there was even a synthetic filesystem where you
| could just cd to a historical commit # and see a view of
| the repo at that timepoint (Google's version control is
| based on always-increasing, globally shared commit
| numbers).
| kevinoconnor7 wrote:
| Not really. If you're in the very rare situation where
| you need to diagnose a bug in long-since dead code, you
| can just view the repository synced to where the version was
| cut.
| pradn wrote:
| No you just go to the folder and select "show deleted".
| dmoy wrote:
| Don't even need to do that. Codesearch with a `from:0`
| qualifier does full regex search on the history iirc
| tonfa wrote:
| Still fairly easy to find IMO. You can search deleted
| code easily, and blame layers allows finding how code
| evolved fairly quickly.
| kragen wrote:
| thank you
| jmyeet wrote:
| The most important part of this is that the build units are
| hermetic and all dependencies are explicit. This is why you need
| to use something like Bazel/Blaze vs older build systems like
| make where identifying what's used, particularly when you get
| into meta-rules, becomes all but impossible.
|
| As the article points out, you also have to look at what's
| actually run. This is the real advantage of Google
| infrastructure: the vertical integration so if a binary is run on
| Borg, or even on the command line, that can be tracked.
| lifeisstillgood wrote:
| There is a meta-meta situation surrounding really good software
| management.
|
| You can knock up some code that say solves a specific business
| problem right now.
| (meta:0)
|
| But you need an environment that can take a new piece of code and
| deploy it and test it (meta:1)
|
| how is that code running - this is shading from production
| monitoring into QA and performance (meta:2)
|
| Compare all the running code and its performance against the
| benefits of replacing code or going back to level 0 and just
| fixing a business problem (meta:3)
|
| Then this death eater - meta 4 I think.
|
| And to me this is why comments like "software needs to solve
| business problems" are naive - once you start using software you
| need more software to manage the software - it's going to grow
| till it consumes the business.
| falcor84 wrote:
| >For example, if an engineer is unsure how to use a library, they
| can find examples just by searching
|
| Isn't that the case with all libraries? How does the monorepo
| help here?
| er4hn wrote:
| Discoverability. It's a lot easier to search one repo than it
| is to search a set of repos. For the latter you need to have
| all the repos listed somewhere, and have them be accessible.
| speedgoose wrote:
| SourceGraph is great to search in many repos.
| knutzui wrote:
| It's just as easy to index multiple repos as it is to index
| one, which means that the same goes for searching. Why would
| it be any different for one as opposed to many?
| codetrotter wrote:
| We use GitLab in the company I work for. There may be
| repositories created by others in the company, that depend
| on repos I work on, where I don't have access to said other
| repos. So to me these are invisible. If everything was in
| one monorepo, it'd all be visible to me easily.
| Kwpolska wrote:
| That's more of a culture thing. Your company chose to
| enforce more granular permissions. Perhaps there are good
| reasons behind it, e.g. code for different clients being
| under different contracts and NDAs.
| speedgoose wrote:
| I'm not sure.
| I also thought that these big repos have to use
| sparse checkouts to not use too much space on the developers'
| machines. So you would have to use an external code search
| index anyway.
| kpw94 wrote:
| "searching" how a method is used is as simple as clicking on
| the symbol.
|
| Think Visual Studio "find all references", but working around
| the entire company's codebase, not just your current project.
| oneplane wrote:
| On a non-Google level just being aware of code sitting around
| costing resources is pretty important. Often, tests and
| maintenance are just ignored or not calculated in as cost (be it
| time, money, effort or otherwise). It is almost in the same realm
| as "I don't know why it works" which is as dangerous as "I don't
| know why it doesn't work".
___________________________________________________________________
(page generated 2023-04-29 23:00 UTC)