[HN Gopher] A terrible schema from a clueless programmer
___________________________________________________________________
 
A terrible schema from a clueless programmer
 
Author : zdw
Score  : 475 points
Date   : 2021-11-07 14:55 UTC (8 hours ago)
 
web link (rachelbythebay.com)
w3m dump (rachelbythebay.com)
 
| arein3 wrote:
| To me the first implementation seems the best; it just needed a
| multi-column index.
| 
| The 3NF "optimized" form seems worse and unreadable. I would
| advise normalizing only when it's needed (complex/evolving
| model), and in this case it was not needed.
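| 
| A minimal sketch of what I mean (assuming MySQL, and guessing the
| table/column names since the post doesn't give them):
| 
|     -- one composite index covering the whole lookup tuple
|     CREATE INDEX idx_tuple
|         ON mail_log (ip, helo, mail_from, rcpt_to);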
 
| biglost wrote:
| Or store the IP address as an int64 with an index, or better,
| both solutions together.
 
| stickfigure wrote:
| Assuming this needs to be optimized for massive scale, just hash
| the values to a single indexed field.
| 
| And use something other than an RDBMS. Put the hash in Redis and
| expire the key; your code simply does an existence check for the
| hash. You could probably handle gmail with a big enough cluster.
| 
| That super-normalized schema looks terrible.
 
  | kodt wrote:
  | I too recommend using Redis in 2002.
 
    | stickfigure wrote:
    | Fair enough, but I think the issue here is recognizing that
    | the underlying business operation is a hit test on an
    | expiring hash key. You could have used MySQL as a key/value
    | store or looked for something more specialized.
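    | 
    | A sketch of that key/value shape in plain MySQL (hypothetical
    | names; the hash of ip+helo+from+to is computed in the app):
    | 
    |     CREATE TABLE seen (
    |         tuple_hash BINARY(20) NOT NULL PRIMARY KEY,
    |         expires_at DATETIME NOT NULL
    |     );
    | 
    |     -- hit test: the row exists and hasn't expired yet
    |     SELECT 1 FROM seen
    |      WHERE tuple_hash = ? AND expires_at > NOW();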
 
| kazinator wrote:
| > _The whole system was reworked to have a table for each set of
| values. There was now a table of IP addresses, a table of HELO
| strings, a table of FROM addresses, and a table of TO addresses._
| 
| Lisp interning, in database tables. A.k.a. the Flyweight pattern.
 
| ccleve wrote:
| Sorry, no. The original schema was correct, and the new one is a
| mistake.
| 
| The reason is that the new schema adds a great deal of needless
| complexity, requires the overhead of foreign keys, and makes it a
| hassle to change things later.
| 
| It's better to stick to the original design and add a unique
| index _with key prefix compression_, which all major databases
| do these days. This means that the leading values get compressed
| out and the resulting index will be no larger and no slower than
| the one with foreign keys.
| 
| If you include all of the keys in the index, then it will be a
| covering index and all queries will hit the index only, and not
| the heap table.
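| 
| Roughly like this (a sketch; the table and column names are
| invented, since the post doesn't show the DDL):
| 
|     -- one unique index across all four columns; prefix compression
|     -- squeezes out the repeated leading values, and the equality
|     -- lookup is satisfied from the index alone
|     CREATE UNIQUE INDEX idx_tuple
|         ON mail_log (ip, helo, mail_from, rcpt_to);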
 
  | silisili wrote:
  | I think it depends a lot on the data. If those values like IP,
  | From, To, etc keep repeating, you save a lot of space by
  | normalizing it as she did.
  | 
  | But strictly from a performance aspect, I agree it's a wash if
  | both were done correctly.
 
    | philliphaydon wrote:
    | Space is cheap now though. Better to duplicate some data and
    | avoid a bunch of joins than to worry about saving a few GB of
    | space.
 
      | wvenable wrote:
      | Using space to avoid joins will not necessarily improve
      | performance in an RDBMS -- it might even make it worse.
 
      | kasey_junk wrote:
      | This system was written in 2002.
 
      | acdha wrote:
      | It's not that easy: you need to consider the total size and
      | cardinality of the fields potentially being denormalized,
      | too. If, say, the JOINed values fit in memory and,
      | especially, if the raw value is much larger than the key it
      | might be the case that you're incurring a table scan to
      | avoid something which stays in memory or allows the query
      | to be satisfied from a modest sized index. I/O isn't as
      | cheap if you're using a SAN or if you have many concurrent
      | queries.
 
  | nerdponx wrote:
  | And even if this is somehow not an ideal use of the database,
  | it's certainly far from "terrible", and to conclude that the
  | programmer who designed it was "clueless" is insulting to that
  | programmer. Even if that programmer was you, years ago (as in
  | the blog post).
  | 
  | Moreover, unless you can prove with experimental data that the
  | 3rd-normal-form version of the database performs significantly
  | better or solves some other business problem, then I would
  | argue that refactoring it is strictly worse.
  | 
  | There are good reasons not to use email addresses as primary or
  | foreign keys, but those reasons are conceptual ("business
  | logic") and not technical.
 
  | late2part wrote:
  | Deleted
 
    | ccleve wrote:
    | If you're suggesting that we should double the size of the
    | hardware rather than add a single index, then I respectfully
    | disagree.
    | 
    | But your point is well-taken. Hardware is cheap.
 
    | emerongi wrote:
    | To a degree, this is true though. An engineer's salary is a
    | huge expense for a startup, it's straight up cheaper to spend
    | more on cloud and have the engineer work on the product
    | itself. Once you're bigger, you can optimize, of course.
 
      | late2part wrote:
      | Deleted
 
        | emerongi wrote:
        | If this is a case from real life that you have seen, you
        | should definitely elaborate more on it. Otherwise, you
        | are creating an extreme example that has no significance
        | in the discussion, as I could create similarly ridiculous
        | examples for the other side.
        | 
        | There are always costs and benefits to decisions. It
        | seems that you are only looking at the costs and none of
        | the benefits?
 
        | late2part wrote:
        | Deleted
 
    | bigbillheck wrote:
    | > compute is cheap
    | 
    | There was a whole thread yesterday about how a dude found out
    | that it isn't: https://briananglin.me/posts/spending-5k-to-
    | learn-how-databa... (also mentioned in the RbtB post)
 
      | rustc wrote:
      | Depends on the provider. I've checked the pricing of 4 top
      | relational database cloud vendors (Google, Amazon, Azure,
      | IBM) and they all charge for the number of CPU cores and
      | RAM you buy, not by the number of rows scanned like
      | PlanetScale did in the post you refer to. They'll be more
      | expensive than buying your own server but not nearly what
      | PlanetScale charges for full table scan queries.
 
      | late2part wrote:
      | When do the rest of the folks figure it out?
 
  | kasey_junk wrote:
  | She doesn't mention the write characteristics of the system but
  | she implies that it was pretty write heavy.
  | 
  | In that case it's not obvious to me that putting a key prefix
  | index on every column is the correct thing to do, because that
  | will get toilsome very quickly under high write loads.
  | 
  | Given that she herself wrote the before and after systems 20
  | years ago, and that the story was more about everyone making
  | dumb mistakes when they are inexperienced, perhaps we should
  | assume the best about her second design?
 
    | ccleve wrote:
    | It's not an index on every column. It's a single index across
    | all columns.
 
      | noisy_boy wrote:
      | Which will be updated every time there is a new
      | combination.
 
        | Animats wrote:
        | Which is OK. It's not like the whole index has to be
        | rebuilt. It's cheaper than O(log N).
 
  | [deleted]
 
  | maddynator wrote:
  | One thing worth taking into consideration is that this happened
  | in 2002, when databases were not in the cloud, ops was done by
  | DBAs, and the key prefix compression that's omnipresent today
  | was likely not that common, or potentially not even
  | implemented/available.
  | 
  | But I don't think the point of the post is the right/wrong way
  | of doing it. The point, as mentioned by a few here, is that
  | programmers make mistakes. Those mistakes are costly and will
  | remain costly if, as a tech industry, we continue to boot
  | experienced engineers... the tacit knowledge those engineers
  | have gained will not be passed on, and this means more people
  | have to figure things out by themselves.
 
    | Philip-J-Fry wrote:
    | >One thing that worth taking into consideration is that this
    | happened in 2002. When the databases were not in cloud, the
    | ops was done by dba's
    | 
    | Are you implying that isn't the case today? Thousands of
    | (big) companies are still like that and will continue to be
    | like that.
    | 
    | I write my own SQL, design tables and stuff, submit it for a
    | review by someone 10x more qualified than myself, and at the
    | end of the day I'll get a message back from a DBA saying "do
    | this, it's better".
 
      | mixmastamyk wrote:
      | The question of it being in the cloud or not is highly
      | orthogonal to schema design.
 
      | WJW wrote:
      | If you have one, more power to you. I'm currently making a
      | good living as a freelance DBA/SRE for startups. With a
      | team of 3-5 devs, some of whom are frontend and/or mobile
      | devs, proper database knowledge is definitely thin on the
      | ground in some places.
 
    | hinkley wrote:
    | In 2002, you started seeing major increases in query run time
    | with as few as 5 joins. Once you hit six, you had to start
    | thinking about rearchitecting your data or living with slow
    | response times.
    | 
    | There was a lot of pressure to relax 3NF as being too
    | academic and not practical.
    | 
    | Around then, I had a friend who was using a pattern of
    | varchar primary keys so that queries that just needed the
    | (unique) name and not the metadata could skip the join. We
    | all acted like he was engaging in the Dark Arts.
 
    | adfgaertyqer wrote:
    | You don't really need compression. Rows only need to persist
    | for about an hour. The table can't be more than a few MiB.
    | 
    | We can debate the Correct Implementation all day long. The
    | fact of the matter is that adding _any_ index to the original
    | table, even the wrong index, would lead to a massive speedup.
    | We can debate 2x or 5x speedups from compression or from
    | choosing a different schema or a different index, but we get
    | 10,000x from adding any index at all.
 
      | ipaddr wrote:
      | Just to make this fun.
      | 
      | Adding an index now increases the insert operation
      | cost/time and adds additional storage.
      | 
      | If insert speed/volume is more important than reads keep
      | the indexes away. Replicate and create an index on that
      | copy.
 
  | dudeinjapan wrote:
  | I agree. In the "new" FK-based approach, you'll still need to
  | scan indexes to match the IP and email addresses (now in their
  | own tables) to find the FKs, then do one more scan to match the
  | FK values. I would think this would be significantly slower
  | than a single compound index scan, assuming index scans are
  | O(log(n))
  | 
  | Key thing is to use EXPLAIN and benchmark whatever you do. Then
  | the right path will reveal itself...
 
  | msqlqthrowaway wrote:
  | > all major databases do these days
  | 
  | Did everyone on HN miss that the database in question was
  | whichever version of MySQL existed in 2002?
 
    | mst wrote:
    | Which I was using in production at the time, and, yeah, what
    | she ended up doing was a much better plan for that thing.
 
    | jeremyjh wrote:
    | It actually doesn't matter. Even simple indexes would have
    | made that schema work just fine, and MySQL could certainly
    | index strings in 2002.
 
  | omegalulw wrote:
  | > Sorry, no. The original schema was correct, and the new one
  | is a mistake.
  | 
  | Well, you also save space by doing this (though presumably
  | you only need to index the emails, as IPs are already 128
  | bits).
  | 
  | But other than that, I'm also not sure why the original schema
  | was bad.
  | 
  | If you were to build individual indexes on all four columns,
  | you would essentially build the four id tables implicitly.
  | 
  | You can calculate the intersection of hits on all four indexes
  | that match your query to get your result. This is linear in the
  | number of hits across all four indexes in the worst case, but
  | if you are careful about which index you look at first, you
  | will probably be a lot more efficient, e.g. (from, ip, to,
  | helo).
  | 
  | Even with a multi index on the new schema, how do you search
  | faster than this?
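  | 
  | In other words, something like this (sketch, invented names);
  | the planner can then intersect the hit lists or just start
  | from the most selective index:
  | 
  |     CREATE INDEX idx_from ON mail_log (mail_from);
  |     CREATE INDEX idx_ip   ON mail_log (ip);
  |     CREATE INDEX idx_to   ON mail_log (rcpt_to);
  |     CREATE INDEX idx_helo ON mail_log (helo);
  | 
  |     SELECT 1 FROM mail_log
  |      WHERE ip = ? AND helo = ? AND mail_from = ? AND rcpt_to = ?;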
 
  | [deleted]
 
| an9n wrote:
| Wow what a twist. Somebody call M Night Shyamalan!
 
| welder wrote:
| The ending is the most important part.
| 
| > Now, what do you suppose happened to that clueless programmer
| who didn't know anything about foreign key relationships?
| 
| > Well, that's easy. She just wrote this post for you. That's
| right, I was that clueless newbie who came up with a completely
| ridiculous abuse of a SQL database that was slow, bloated, and
| obviously wrong at a glance to anyone who had a clue.
| 
| > My point is: EVERYONE goes through this, particularly if
| operating in a vacuum with no mentorship, guidance, or reference
| points. Considering that we as an industry tend to chase off
| anyone who makes it to the age of 35, is it any surprise that we
| have a giant flock of people roaming around trying anything
| that'll work?
 
  | deathanatos wrote:
  | ... I appreciate her sentiment at the end there that we should
  | try to change the status quo (which I think does suck), but I'm
  | not sure how much power the average employee has over it. Most
  | employers I've worked at seem loath to retain anyone past
  | about 2 years. (E.g., currently I'm in the 95th percentile,
  | after just over 2 years.) IME it takes about 6 months to really
  | learn how a company's systems work to the point of proficiency
  | (yes, _six months_), which means we're spending ~25% of the
  | time "training", where "training" is usually someone trying
  | something crazy and failing, and getting corrected as opposed
  | to some structured form of learning. Oftentimes the systems
  | that these engineers are stumbling over are, I think, the internal
  | ones; they need more features, or refactoring to account for
  | organic growth, but since they're not shiny new customer facing
  | features they'll get exactly 0 priority from PMs. The engineers
  | who build experience working with these systems and actually
  | know what needs to change and how without breaking the existing
  | use cases ... are about to leave the company, since nobody is
  | retained beyond ~2 years.
  | 
  | I also find my own team is usually strapped for resources,
  | normally people. (Usually politely phrased as "time".) Yes,
  | one has to be wary of mythical man-month'ing it, but at my
  | last employer we had essentially 2 of us on a project that
  | could easily have used at least one, if not two, more people.
  | Repeat across every project and that employer was understaffed
  | by 50-100%, essentially.
  | 
  | Some company just went for S-1, and they were bleeding cash.
  | But they _weren't_ bleeding it into new ventures: they were
  | bleeding it into marketing. Sure, that might win you one or two
  | customers, but I think you'd make much stronger gains with new
  | products, or less buggy products that don't drive the existing
  | customers away.
  | 
  | Also there's an obsession with "NIT" -- not invented there --
  | that really ties my hands as an engineer. Like, everything has
  | to be outsourced to some cloud vendor whose product only
  | barely fits some of the needs, and whose "support"
  | department appears to be unaware of what a computer is. I'm a
  | SWE, let me do my thing once in a while? (Where there's a
  | good fit for an external product, then yes, by all means. But these
  | days my job is 100% support tickets, and like 3% actual
  | engineering.)
 
  | 5faulker wrote:
  | Seems like the definition of hacker to me.
 
  | shrimpx wrote:
  | > tend to chase off anyone who makes it to the age of 35
  | 
  | That's definitely not true anymore, if it ever was.
  | Senior/staff level engs with 20+ years of experience are sought
  | after and paid a ton of money.
 
  | jjice wrote:
  | The way she kept referring to the programmer made me suspicious
  | that that would be the ending. I personally really like that as
  | a way of conveying the message of us all having growing pains
  | and learning the hard way from some inefficient code.
  | 
  | I know I've also had these encounters (specifically in DBs),
  | and I'm sure there are plenty more to come.
 
  | carapace wrote:
  | (It seems a lot of folks are getting nerd-sniped by the set-up
  | and missing the moral of the story, eh?)
  | 
  | I don't know the actual numbers, but it's been pointed out that
  | at any given time something like half of all programmers have
  | been doing it less than five years, for decades now.
  | 
  | That, plus the strident ignorance of past art and practice,
  | seem to me to bring on a lot of issues.
 
    | lamontcg wrote:
    | > It seems a lot of folks are getting nerd-sniped
    | 
    | that is an excellent term.
 
      | jacobolus wrote:
      | Origin: https://xkcd.com/356/
 
        | wwweston wrote:
        | Old enough some people under 35 may not have encountered
        | it... ;)
 
        | sseagull wrote:
        | Well, then, they are part of the 10,000
        | 
        | https://xkcd.com/1053/
 
        | misnome wrote:
        | Only if they were born in the US
 
        | lamontcg wrote:
        | I'm 49, I don't know how I never saw that xkcd.
 
        | egeozcan wrote:
        | I feel like that phrase has been there forever, and I'm
        | exactly 35. Will this happen more often as I grow older?
        | Ugh. Feels weird. Maybe also a bit depressing.
 
        | lamontcg wrote:
        | It was weird seeing all the spongebob squarepants memes
        | take over the internet when I was too old to ever grow up
        | with that. I turned 28 in 1999 when that first aired.
        | That was my "holy shit I'm so old" moment when that
        | finally trickled up into my awareness as someone nearly
        | turning 40 or so.
 
        | yesenadam wrote:
        | Same age as you. I have a lot of friends in the
        | Philippines. I swear the Facebook employment info of half
        | the people in the Philippines says they work at the
        | Krusty Krab. (And for most filipinos, the internet =
        | facebook.) For some reason I never asked what that meant,
        | and for many years I thought that was just some odd joke,
        | and was vaguely puzzled about how widespread it is.
        | Eventually I happened on the Spongebob connection!
 
    | beering wrote:
    | > (It seems a lot of folks are getting nerd-sniped by the
    | set-up and missing the moral of the story, eh?)
    | 
    | The narrative should lead to the conclusion. If I told you
    | the story of the tortoise and the hare in which the hare gets
    | shot by a hunter, then said the moral is "perseverance wins",
    | you'd be rightfully confused.
 
    | [deleted]
 
    | twic wrote:
    | I don't think it's purely nerd-sniping. If your story is "at
    | first you are bad, but then you get good", but your example
    | is of a case where you did something fine but then replaced
    | it with something worse, that rather undermines the story.
 
      | mwcampbell wrote:
      | If even someone with Rachel's level of experience still
      | doesn't know all the minutiae of database optimization, I
      | think that just amplifies her point about the importance of
      | mentoring novices.
 
        | duxup wrote:
        | I recall reading a rant on another site about someone so
        | upset they had to deal with clueless noobs at work.
        | 
        | They then went on to list the errors that this brand new
        | entry level employee had made when writing ... an
        | authentication system ...
        | 
        | I was more than a little shocked when I realized they
        | were serious and hadn't realized the issue was sending
        | the new entry level guy to do that job alone.
 
        | HahaReally wrote:
        | OR we could demand they get expensive irrelevant degrees
        | and be geniuses at coding by the time they graduate with
        | no relevant skills! Hey, in fact, let's just do that
        | instead. :)
 
      | adolph wrote:
      | Chronology alone only progresses; progress sometimes
      | regresses.
 
  | kofejnik wrote:
  | IMO she's still pretty clueless (sorry!), but over the years
  | got better at self-promotion
  | 
  | So the real lesson is to be very careful as to who you listen
  | to
 
    | crispyambulance wrote:
    | She has never been "clueless" and has written insightfully
    | for years.
 
      | globular-toast wrote:
      | Well she's clueless about what actually sped up this
      | database query.
 
        | Lammy wrote:
        | imo "clueful"/clueless is more about being able to sense
        | situations where there is something to be known rather
        | than knowing everything upfront all the time (which is
        | obviously impossible)
 
        | yuliyp wrote:
        | Without knowing which database engine (MySQL comes with
        | more than one) she was using, nor the testing that she
        | actually performed, what makes you say that?
 
        | mst wrote:
        | I remember versions of mysql whose string indexing was
        | sufficiently limited on every available backend that this
        | would've been the correct change.
        | 
        | Hell, back in ... 2001? I think? I ended up making
        | exactly the same set of changes to an early database
        | design of mine, for pretty much the same stated reasons
        | and with very similarly pleasant effects on its
        | performance.
 
    | mwcampbell wrote:
    | The point isn't to determine who is or isn't clueless. The
    | point is how we deal with each other's cluelessness.
 
  | crispyambulance wrote:
  | Yep. Folks are getting lost in the weeds discussing indexing of
  | database tables. That's _totally_ beside the point here.
  | 
  | The thing is, the first implementation was a perfectly fine
  | "straight line" approach to solve the problem at hand. One
  | table, a few columns, computers are pretty fast at searching
  | for stuff... why not? In many scenarios, one would never see a
  | problem with that schema.
  | 
  | Unfortunately, "operating in a vacuum with no mentorship,
  | guidance, or reference points", is normal for many folks. She's
  | talking about the trenches of SV, it's even worse outside of
  | that where the only help you might get is smug smackdowns on
  | stackoverflow (or worse, the DBA stackexchange) for daring to
  | ask about such a "basic problem".
 
    | otterley wrote:
    | I'd be a lot more sympathetic if the major RDBMSes didn't have
    | outstanding and thorough reference manuals, or if there
    | weren't a mountain of books on the subject that cover, among
    | other things, the topic of indexing and its importance.
    | MySQL's manual, for example, has covered this subject from
    | the very beginning:
    | https://dev.mysql.com/doc/refman/5.6/en/mysql-indexes.html (I
    | don't have the 3.x manuals handy but it was there back then
    | too).
    | 
    | It's not clear from the article whether the author spent any
    | time studying the problem before implementing it. If she
    | failed to do so, it is both problematic and way more common
    | than it ought to be.
    | 
    | "Ready, fire, aim"
    | 
    | But you know what they say: good judgment comes from
    | experiences and experience comes from bad judgment.
    | 
    | Compounding the problem here is that the author now has much
    | more experience and in a reflective blog post, _still_ got
    | the wrong answer.
    | 
    | IMO the better lesson to take away here would have been to
    | take the time getting to know the technology before putting
    | it into production instead of jumping head first into it.
    | That would be bar raising advice. The current advice doesn't
    | advise caution; instead it perpetuates the status quo and
    | gives "feel good" advice.
 
      | jraph wrote:
      | You can't really research a problem if you don't suspect its
      | existence. At that point you might feel you have everything
      | you need to start the implementation, so how do you guess?
 
      | dasil003 wrote:
      | I couldn't disagree more with this comment. No amount of
      | reading documentation teaches you how to build production
      | systems. You progress much faster by getting your hands
      | dirty, making mistakes, and learning from them.
      | 
      | The challenge of writing good software is not about knowing
      | and perfecting every gory detail; it's about developing the
      | judgement to focus on the details that
      | actually matter. Junior engineers who follow your advice
      | will be scared to make mistakes and their development will
      | languish compared to those who dive in and learn from their
      | mistakes. As one gains experience it's easy to become
      | arrogant and dismissive of mistakes that junior engineers
      | make, but this can be extremely poisonous to their
      | development.
 
        | isoskeles wrote:
        | I couldn't disagree more with this comment. No amount of
        | getting your hands dirty and making mistakes teaches you
        | how to learn from your mistakes, nor changes the impact
        | of the mistakes. You also progress much faster by
        | learning from other peoples' mistakes, this is why
        | written language is so powerful.
        | 
        | Honestly, I think your comment is okay minus the fact
        | that you're trying to highlight such hard disagreement
        | with a sentiment that people should read the fucking
        | manual. Really, you couldn't disagree MORE?
        | 
        | There is definitely value in RTFM. There are cases where
        | making mistakes in production is not acceptable, and
        | progressing by "making mistakes" is a mistake on its own.
        | I don't think the case in this article sounds like one of
        | those, but they do exist (e.g. financial systems (think
        | about people losing money on crypto exchanges),
        | healthcare, or, say, sending rockets into space). In many
        | cases, making mistakes is fine in test systems, but
        | completely, absolutely, catastrophically unacceptable in
        | production. Although, I refuse to blame the junior
        | engineer for such mistakes, I blame management that "sent
        | a boy to do a man's job" (apologies for a dated idiom
        | here).
        | 
        | (As an aside, overall, I disagree with many of the
        | comments here nitpicking the way the author solved the
        | problem, calling her "clueless", etc. I really don't care
        | about that level of detail, and while I agree the
        | solution does not seem ideal, it worked for them better
        | than the previous solution.)
 
        | crispyambulance wrote:
        | > No amount of getting your hands dirty and making
        | mistakes teaches you how to learn from your mistakes...
        | 
        | Categorically wrong.
        | 
        | Mistakes, failure, and trial and error are very much a
        | part of developing skills. If you're not making mistakes,
        | you're also not taking enough risks and thus missing out
        | on opportunities for growth.
 
        | isoskeles wrote:
        | Yes, mistakes are required to learn from them, but making
        | mistakes does not imply that a person will learn from
        | their mistakes.
 
        | otterley wrote:
        | Respectfully, I believe you are disagreeing with an
        | argument that I am not making. It's not an either-or
        | scenario. Both access to and reliance on reference
        | materials _and_ the ability to safely experiment are part
        | of the professional engineer's toolbox. I can say this
        | with confidence because I was a junior engineer once -
        | it's not like I'm speaking without experience.
        | 
        |  _Everyone_ - new and experienced developers alike -
        | should have a healthy fear of causing pain to customers.
        | Serving customers - indirectly or otherwise - is what
        | makes businesses run. As a principal engineer today my
        | number one focus is on the customer experience, and I try
        | to instill this same focus in every mentee I have.
        | 
        | Does that mean that junior developers should live in
        | constant fear and decision paralysis? Of course not.
        | 
        | That's where the mentor - the more experienced and
        | seasoned engineer - comes in. The mentor is roughly
        | analogous to the teacher. And the teaching process uses a
        | _combination_ of a mentor, reference materials, and the
        | lab process. The reference materials provide the details;
        | the lab process provides a safe environment in which to
        | learn; and the mentor provides the boundaries to prevent
        | the experiments from causing damage, feedback to the
        | mentee, and wisdom to fill in the gaps between them all.
        | If any of these are absent, the new learner's
        | development will suffer.
        | 
        | Outstanding educational systems were built over the last
        | 3 centuries in the Western tradition using this
        | technique, and other societies not steeped in this
        | tradition have sent their children to us (especially in
        | the last half-century) to do better than they ever could
        | at home. It is a testament to the quality of the
        | approach.
        | 
        | So no, I'm not advocating that junior developers should
        | do nothing until they have read all the books front to
        | back. They'd never gain any experience at all or make
        | essential mistakes they could learn from if they did
        | that. But junior developers should not work in a vacuum -
        | more senior developers who both provide guidance _and_
        | refer them to reference materials to study and try again,
        | in my experience, lead to stronger, more capable
        | engineers in the long term.  "Learning to learn" and
        | "respect what came before" are skills as important as the
        | engineering process itself.
 
        | crispyambulance wrote:
        | > That's where the mentor [...] comes in.
        | 
        | The thing is, specifically, that the OP did NOT have a
        | mentor. They had a serious problem to solve, pronto, and
        | knew enough to take a crack at it. OK, the initial
        | implementation was suboptimal. So what? It's totally
        | normal, learn and try again. Repeat.
        | 
        | It would be nice if every workplace had an orderly
        | hierarchy of talent where everyone takes care to nurture
        | and support everyone else (especially new folks). And
        | where it's OK to ask questions and receive guidance even
        | across organizational silos, where there are guardrails
        | to mitigate accidents and terrible mistakes. If you work
        | in such an environment, you are lucky.
        | 
        | It is far more common to have sharp-elbow/blamestorm
        | workplaces which pretend to value "accountability" but
        | then don't lift a finger to support anyone who takes
        | initiative. I suspect that at the time Rachelbythebay
        | worked in exactly this kind of environment, where it
        | simply wasn't possible for an OPS person to go see the
        | resident database expert and ask for advice.
 
        | otterley wrote:
        | Right. Hence my concern about the lack of helpful advice
        | other than "accept yourself"[1]: Neophyte engineers
        | lacking guardrails and local mentors should fall back to
        | what they learned in school: experiment, use your books,
        | and ask questions from the community. Mentors can be
        | found outside the immediate workplace, after all.
        | 
        | And when it comes to taking production risks, measure
        | twice and cut once, just like a good carpenter.
        | 
        | [1] Of course you should accept yourself. But that alone
        | won't advance the profession or one's career.
 
        | noduerme wrote:
        | But it's also about constantly asking yourself "how can
        | this be done better? What do I not know that could
        | improve it?"
        | 
        | I learned probably 80% of what I know about DB
        | optimization from reading the Percona blog and S.O. The
        | other 20% came from splashing around and making mistakes,
        | and testing tons of different things with EXPLAIN. That's
        | basically learning in a vacuum.
 
        | [deleted]
 
    | damagednoob wrote:
    | Yes and no, and this post highlights a subtle issue with
    | mentorship (which I think is important): technology does not
    | stand still. What was true in 2002 might not be true today.
    | While adopting the naive approach was detrimental back then,
    | today's databases recognise that this happens and provide
    | easy workarounds that didn't exist at the time.
    | 
    | I've experienced this just by switching languages. C# had
    | many articles dedicated to how much better StringBuilder was
    | compared to String.Concat and yet, other languages would do
    | the right thing by default. I would give advice in a totally
    | different language about a problem that the target language
    | did not have.
    | 
    | As the song goes:
    | 
    | "Be careful whose advice you buy but be patient with those
    | who supply it
    | 
    | Advice is a form of nostalgia, dispensing it is a way of
    | fishing the past
    | 
    | From the disposal, wiping it off, painting over the ugly
    | parts
    | 
    | And recycling it for more than it's worth"
 
      | nightski wrote:
      | That might be true if you just take blanket advice. The key
      | is to find out _why_ say StringBuilder is better than
      | String.Concat. If you understand the implementation details
      | and tradeoffs involved, this makes the knowledge much more
      | applicable in the future. The core concepts in technology
      | do not move nearly as fast as individual projects,
      | libraries, frameworks, languages, etc...
 
  | unclebucknasty wrote:
  | What's funny is there seems to be a lot of debate among
  | grizzled veterans here about whether her old solution is
  | actually better than the new. We're getting into index types,
  | engine implementations, normal forms, etc. And really, this is
  | a relatively simple development problem that I can easily
  | imagine on an interview test.
  | 
  | Now, imagine a new engineer, just getting started trying to
  | make sense of it all, yet being met with harsh criticism and
  | impatience.
  | 
  | Maybe that was exactly her point--to get everyone debating over
  | such a relatively trivial problem to communicate the messages:
  | Good engineering is hard. Everyone is learning. Opinions can
  | vary. Show some humility and grace.
 
  | jraph wrote:
  | I was reading her article like "yeah, well... wow, she's tough
  | / hard with this programmer, it's not that unreasonable
  | anyway!", until this passage.
  | 
  | I would not expect her to be like this from what I've read on
  | her blog so I was surprised. Well written!
 
| IceDane wrote:
| The moral of this post really falls flat coming from this author,
| most of whose posts are snarky, passive-aggressive rants about
| someone's alleged incompetence or something similar.
 
  | enraged_camel wrote:
  | This is a really weird ad hominem that is unrelated to the post
  | itself.
 
    | IceDane wrote:
    | Ad hominem?
    | 
    | It seems pretty relevant to me that the author mostly writes
    | snarky articles which directly contradict the moral of this
    | story.
    | 
    | For the record, I totally agree with the moral of the story.
    | But we can't just blindly ignore context, and in this case
    | the context is that the author regularly writes articles
    | where someone else is ridiculed because the author has deemed
    | them incompetent.
    | 
    | In a similar vein, no one would just ignore it and praise
    | Zuckerberg if he started discussing the importance of
    | personal privacy on his personal blog.
 
| avl999 wrote:
| The responses in this thread are classic pedantic HN with the
| actual point whoooshing over so many people's head.
 
  | jeremyjh wrote:
  | I think it actually makes quite a bit of a difference if 20
  | years later the author still does not understand the actual
  | problem and solution she is using as an example. It does
  | undermine her argument that people make mistakes, learn
  | from them, and grow, if the lack of growth is still in
  | evidence 20 years later in the singular example under
  | consideration.
 
  | bizzarojerry wrote:
  | No I'd say the responses in this thread are classic pedantic HN
  | with users like you taking the high moral ground and looking
  | down on people who actually are trying to address the technical
  | issues with a post in a technical forum.
 
    | dang wrote:
    | We've banned this account for breaking the site guidelines.
    | Creating accounts to do that with will eventually get your
    | main account banned as well, so please don't!
    | 
    | https://news.ycombinator.com/newsguidelines.html
 
  | [deleted]
 
| [deleted]
 
| darkwater wrote:
| I'd really love to be snarky here but I'll try to be polite: all
| those comments about the example situation are missing the whole
| point of the post. And it really worries me that there is a good
| chunk of tech workers who just ignore the real meaning of
| something and nitpick stupid implementation details instead.
| The post is about managing rookie errors, being empathetic, and
| also warning about the ageism that pervades the sector. TBH
| about this last point IDK the situation nowadays in Silicon
| Valley, but in Europe my limited experience is that ageism is
| not that bad; it's actually difficult to find seasoned
| developers.
| 
| Edit: typos
 
  | Lewton wrote:
  | I'm happy the HN comments are nitpicking. I'm not particularly
  | well versed in database schemas, but while reading it I was
  | going "???????", and it's good to know I'm not going crazy.
 
  | 08-15 wrote:
  | In this blog post, senior engineer Rachel talks nonsense about
  | normalization and promotes a bafflingly complicated solution as
  | superior to a simple one, without identifying the actual fix,
  | which appears to have happened accidentally.
  | 
  | In other words, with enough empathy and patience, a clueless
  | rookie can grow into a clueless senior engineer!
  | 
  | Rachel usually makes more sense than that. That's why people
  | are nitpicking implementation details.
 
    | mixmastamyk wrote:
    | Hah, pretty accurate and kinda funny, but could be nicer. I
    | still make mistakes.
 
      | 08-15 wrote:
      | Meanwhile, Rachel posted an update. Apparently, the whole
      | section about how "the system got reworked" doesn't
      | describe the proper solution, but her own (Rookie Rachel's)
      | attempt to fix it.
      | 
      | And in _that_ context, everything makes sense.
 
  | hn_throwaway_99 wrote:
  | The reason "the details are important" here are _not_ because
  | of the nitty gritty around what mistakes a  "novice" programmer
  | made.
  | 
  | They are important because the _present_ incarnation of the
  | author is making _all the wrong_ diagnoses about the problems
  | with the original implementation, _despite_ doing it with an
  | air of  "Yes, younger me was so naive and inexperienced, and
  | present me is savvy and wise".
 
    | newaccount74 wrote:
    | I think she inadvertently made a different point, which is
    | that even experienced developers sometimes misunderstand the
    | problem and make mistakes.
    | 
    | Or an even better argument: you don't need to actually
    | understand the problem to fix it, often you accidentally fix
    | the problem just by using a different approach.
 
      | mst wrote:
      | I would argue instead that this comment thread is making
      | the point that people forget that things that work now
      | wouldn't've worked then and design decisions have to be
      | made based on the database engine you're running in
      | production.
 
        | jeremyjh wrote:
        | MySQL could index strings in 2002. It would have worked
        | fine.
 
        | mst wrote:
        | "mysql could index strings" and "using a compound index
        | over four varchar columns would've worked out well" are
        | significantly different propositions.
 
        | jeremyjh wrote:
        | Yes, we won't know because she didn't try it. And we know
        | she didn't try it because the problem she described is
        | table scans, not "indexed strings are somehow slow in
        | MySQL circa 2002".
 
        | mst wrote:
        | Or she did try it and it didn't work - or researched the
        | question and figured, quite reasonably based on my
        | experience of mysql in that era, it probably wouldn't
        | work - and kept the post to only the original problem and
        | the final approach to avoid having even _more_ technical
        | details in a post that wasn 't really about those.
        | 
        | I agree that we don't know, but it seems a little unfair
        | to her to treat every unknown as definitely being the
        | most negative of the possibilities.
 
        | hn_throwaway_99 wrote:
        | I really don't understand your general thrust here. MySql
        | certainly had lots of issues in 2003, but being able to
        | support multi-column indexes was not one of them. Her
        | analysis is simply wrong - it is wrong now and was wrong
        | then. Here is the doc from MySql v3.23:
        | 
        | 7.4.2 Multiple-Column Indexes MySQL can create composite
        | indexes (that is, indexes on multiple columns). An index
        | may consist of up to 16 columns. For certain data types,
        | you can index a prefix of the column (see Section 7.4.1,
        | "Column Indexes"). A multiple-column index can be
        | considered a sorted array containing values that are
        | created by concatenating the values of the indexed
        | columns. MySQL uses multiple-column indexes in such a way
        | that queries are fast when you specify a known quantity
        | for the first column of the index in a WHERE clause, even
        | if you do not specify values for the other columns
 
        | mst wrote:
        | As I said already: "mysql could index strings" and "using
        | a compound index over four varchar columns would've
        | worked out well" are significantly different
        | propositions.
        | 
        | To be more verbose about it - there is an important
        | difference between "can be created" and "will perform
        | sufficiently well on whatever (likely scavenged) hardware
        | was assigned to the internal IT system in question."
        | 
        | I wouldn't be surprised if the "server" for this system
        | was something like a repurposed Pentium 233 desktop with
        | a cheap spinning rust IDE drive in it, and depending on
        | just how badly the spammers were kicking the shit out of
        | the mail system in question that's going to be a fun
        | time.
 
        | lern_too_spel wrote:
        | They are not significantly different. It just means
        | sorting using more than one column.
 
    | darkwater wrote:
    | She explained the problem, the first not working solution and
    | the second working solution as they really happened in 2002
    | _as an example_. The real point is the last part of the post.
    | And it was _not_ "how to properly implement an sql based
    | filter for open proxies in your MTA".
 
      | hn_throwaway_99 wrote:
      | I get all that. But 2020 version of this person _still does
      | not understand the problem_ , and she is arguing that she
      | does while oddly self-deprecating the inexperienced version
      | of herself, who arguably had a better solution to begin
      | with.
 
        | mixmastamyk wrote:
        | Yes, I think an alternate design was found that didn't
        | hit as many MySQL limitations as the previous one. This
        | improved performance and was a win. But the post-mortem
        | diagnosis was lacking.
 
  | shawnz wrote:
  | Just because the author was intending to get across a certain
  | point, doesn't mean the implementation details aren't worth
  | discussing too. I don't think many people here would disagree
  | with the central point, so what's there to discuss about it? I
  | think it's uncharitable to assume that everyone talking about
  | the implementation details is "missing the point".
 
  | rmk wrote:
  | I think you are leaping to conclusions here. It is possible
  | that most people want to keep HN focused on technical
  | conversations and use this as an opportunity to learn something
  | (I certainly do), which is why you are seeing many more
  | comments on the technical aspects.
 
  | [deleted]
 
  | Cpoll wrote:
  | I don't buy it. If the only point was managing rookie errors,
  | etc., the blog post shouldn't have been 15~ paragraphs of
  | technical discussion and 2 paragraphs at the bottom of "my
  | point is ..." You can't advocate that people just ignore 80% of
  | the article because they're "not the point of the post."
  | 
  | I'm sure a good chunk of tech workers are worried that when an
  | influential blogger writes something like this a couple hundred
  | developers will Google 3NF and start opening PRs.
 
    | [deleted]
 
    | unethical_ban wrote:
    | It's a narrative. A story. She told it in that way to lure
    | people who get hooked on technical discussion, and then make
    | the point that they probably would have missed or ignored.
    | Without a narrative example, she could have just said "I made
    | mistakes too, live and learn" but she chose to provide an
    | example before presenting the thesis.
    | 
    | And hey, what's so bad about people learning about 3NF? Are
    | you not supposed to know what that is until you're some
    | mythical ninth level DBA?
 
  | globular-toast wrote:
  | The real meaning being that people learn and gain experience
  | over time? Is this really something we need to read two pages
  | of text to find out? I think people are justifiably miffed at
  | having read two pages for such a trivial message. Not only
  | that, but these "stupid implementation details" could seriously
  | mislead someone who really is a clueless newbie.
 
  | [deleted]
 
| hn_throwaway_99 wrote:
| This post is bizarre, precisely because there _is nothing
| particularly wrong_ with the original schema, and the author
| seems to believe that the problem is that the column values were
| stored as strings, or that the schema wasn't in "third normal
| form".
| 
| Which is nonsense. The problem with the original DB design is
| that the appropriate columns weren't indexed. I don't know enough
| about the problem space to really know if a more normalized DB
| structure would be warranted, but from her description I would
| say it wasn't.
 
  | mewpmewp2 wrote:
  | I think if anything, all of it could've been put into a single
  | indexed column since the query was AND ... AND ... not OR.
  | 
  | So you could've had an indexed column of "fingerprint", like
  | 
  |     ip1_blahblah_evil@spammer.somewhere_victim1@our.domain
  | 
  | and indexed this, with only a single WHERE clause in the query.
  | 
  | I don't understand at all how the multiple-tables thing would
  | help compared to indexes, and the whole post seemed kind of
  | crazy to me for that reason. In fact, if I had to guess,
  | multiple tables would've performed worse.
  | 
  | That is if I'm understanding the problem correctly at all.
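  | 
  | Something like this is what I have in mind (sketch; names are
  | invented, and the separator should be a character that can't
  | appear in the values):
  | 
  |     ALTER TABLE mail_log ADD COLUMN fingerprint VARCHAR(255) NOT NULL;
  |     CREATE INDEX idx_fingerprint ON mail_log (fingerprint);
  | 
  |     -- fingerprint = CONCAT_WS('|', ip, helo, mail_from, rcpt_to),
  |     -- set on insert
  |     SELECT 1 FROM mail_log WHERE fingerprint = ?;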
 
    | gavinray wrote:
    | This has been the advice given to me by Postgres experts in a
    | similar scenario:
    | 
    |     "If you want to efficiently fuzzy-search through a
    |     combination of firstname + lastname + (etc), it's faster
    |     to make a generated column which concatenates them and
    |     index the generated column and do a text search on that."
    | 
    | (Doesn't have to be fuzzy-searching, but just a search in
    | general, as there's a single column to scan per row rather
    | than multiple)
    | 
    | But also yeah I think just a compound UNIQUE constraint on
    | the original columns would have worked
    | 
    | I'm pretty sure that the degree of normalization given in the
    | end also goes well beyond 3rd Normal Form.
    | 
    | I think that is 5th Normal Form/6th Normal Form or so, almost
    | as extreme as you can get:
    | 
    | https://en.wikipedia.org/wiki/Database_normalization#Satisfy.
    | ..
    | 
    | The way I remember 3rd Normal Form is _"Every table can
    | stand alone as its own coherent entity/has no cross-cutting
    | concerns"_.
    | 
    | So if you have a "product" record, then your table might have
    | "product.name", "product.price", "product.description", etc.
    | 
    | The end schema shown could be described as a "Star Schema"
    | too, I believe (see image on right):
    | 
    | https://en.wikipedia.org/wiki/Star_schema#Example
    | 
    | This Rachel person is also much smarter than I am, and you
    | can make just about anything work, so we're all bikeshedding
    | anyways!
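    | 
    | For what it's worth, the generated-column version of that
    | advice looks roughly like this in modern Postgres (12+);
    | table and column names are made up:
    | 
    |     ALTER TABLE people
    |         ADD COLUMN full_name text
    |         GENERATED ALWAYS AS (firstname || ' ' || lastname) STORED;
    | 
    |     CREATE INDEX idx_people_full_name ON people (full_name);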
 
      | tkot wrote:
      | Would it not prevent some optimizations based on statistics
      | regarding the data distribution, like using the most
      | selective attribute to narrow down the rows that need to be
      | scanned? I'm assuming there are 2 indexes, 1 for each
      | column that gets combined.
      | 
      | Let's say you know the lastname (Smith) but only the first
      | letter of the firstname (A) - in the proposed scenario only
      | the first letter of the firstname helps you narrow down the
      | search to records starting with the letter (WHERE combined
      | LIKE "A%SMITH"), you will have to check all the rows where
      | firstname starts with "A" even if their lastname is not
      | Smith. If there are two separate indexed columns the WHERE
      | clause will look like this:
      | 
      | WHERE firstname like "A%" AND lastname = "Smith"
      | 
      | so the search can be restricted to a smaller number of rows.
      | 
      | Of course having 2 indexes will have its costs like
      | increased storage and slower writes.
      | 
      | Overall the blog post conveys a useful message but the
      | table example is a bit confusing, it doesn't look like
      | there are any functional dependencies in the original table
      | so pointing to normalization as the solution sounds a bit
      | odd.
      | 
      | Given that the query from the post is always about concrete
      | values (as opposed to ranges), it sounds like the right way
      | to index the table is to use a hash index (which might not
      | have been available back in 2002 in MySQL).
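      | 
      | In Postgres that's just (sketch, hypothetical names):
      | 
      |     CREATE INDEX idx_combined_hash
      |         ON people USING hash (combined);
      | 
      | In MySQL, hash indexes are really only available on MEMORY
      | tables (InnoDB's adaptive hash index is automatic), so in
      | practice you'd end up with a B-tree anyway.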
 
        | gavinray wrote:
        | I won't pretend to be an expert in this realm, but see
        | here:
        | 
        | https://www.postgresql.org/docs/current/textsearch-
        | tables.ht...
        | 
        | Specifically, the part starting at the below paragraph,
        | the explanation for which continues to the bottom of the
        | page:
        | 
        |     "Another approach is to create a separate tsvector
        |     column to hold the output of to_tsvector. To keep this
        |     column automatically up to date with its source data,
        |     use a stored generated column. This example is a
        |     concatenation of title and body, using coalesce to
        |     ensure that one field will still be indexed when the
        |     other is NULL:"
        | 
        | I have been told by PG wizards this same generated,
        | concatenated single-column approach with an index on each
        | individual column, plus the concatenated column, is the
        | most effective way for things like ILIKE search as well.
        | 
        | But I couldn't explain to you why
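        | 
        | If I remember the example on that page correctly, it goes
        | roughly like this:
        | 
        |     ALTER TABLE pgweb
        |         ADD COLUMN textsearchable_index_col tsvector
        |         GENERATED ALWAYS AS (to_tsvector('english',
        |             coalesce(title, '') || ' ' || coalesce(body, ''))) STORED;
        | 
        |     CREATE INDEX textsearch_idx
        |         ON pgweb USING GIN (textsearchable_index_col);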
 
    | kelnos wrote:
    | Right, and when you do that, you don't even need an RDBMS. A
    | key-value store would suffice. This essentially just becomes
    | a set! Redis or memcached are battle-tested workhorses that
    | would work even better than a relational DB here.
    | 
    | But this was also back in the early '00s, when "data store"
    | meant "relational DB", and anything that wasn't a RDBMS was
    | probably either a research project or a toy that most people
    | wouldn't be comfortable using in production.
 
      | dehrmann wrote:
      | > this was also back in the early '00s, when "data store"
      | meant "relational DB", and anything that wasn't a RDBMS was
      | probably either a research project or a toy that most
      | people wouldn't be comfortable using in production.
      | 
      | Indeed; her problem looks to have been pre-memcached.
 
    | ryandrake wrote:
    | Also, the "better" solution assumes IPv4 addresses and is not
    | robust to a sudden requirements change to support IPv6. Best
    | to keep IP address as a string unless you really, REALLY need
    | to do something numerical with one or more of the octets.
 
      | johannes1234321 wrote:
      | Keeping IP addresses as string isn't trivial. You can write
      | those in many ways. Even with IPv4. IPv6 just brings new
      | variations. And when talking about migration to IPv6 you
      | have to decide if you keep IPv4 Addresses as such or prefix
      | with ::ffff:.
      | 
      | In the end you have to normalize anyways. Some databases
      | have specific types. If not you have to pick a scheme and
      | then ideally verify.
 
      | pilif wrote:
      | Based on my experience I disagree. If there is a native
      | data type that represents your data, you should really use
      | it.
      | 
      | It will make sure that incorrect data is detected at the
      | time of storage rather than at some indeterminate time in
      | the future and it will likely be more efficient than
      | arbitrary strings.
      | 
      | And in case of IP addresses, when you are using Postgres,
      | it comes with an inet type that covers both IPv4 and IPv6,
      | so you will be safe from requirement changes.
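      | 
      | For example (a sketch):
      | 
      |     CREATE TABLE connections (src inet NOT NULL);
      | 
      |     -- rejects malformed addresses at insert time,
      |     -- and understands prefixes/CIDR:
      |     SELECT * FROM connections WHERE src << inet '10.0.0.0/8';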
 
    | dehrmann wrote:
    | Yeah, I'm not clear on how multiple tables fixed it other
    | than allowing you to scan through the main table faster.
    | 
    | Multiple tables could be a big win if long email addresses
    | are causing you a data size problem, but for this use case, I
    | think a hash of the email would suffice.
 
      | jeremyjh wrote:
      | Well she did say she had indexes on the new table. It could
      | have been fixed with an index on the previous table, but a
      | new table with indexes also fixed it.
 
    | riquito wrote:
    | That way you can get false positives unless you concatenate
    | using a non-valid character in the protocol
    | 
    |     "foo@example.com" + "other@example.net"
    |     "foo@example.co"  + "mother@example.net"
    | 
    | Btw, storing just the domains, or inverting the email
    | strings, would have sped up the comparison.
 
  | erikerikson wrote:
  | Clearly it worked.
  | 
  | However...
  | 
  | Consider the rows `a b c d e` and `f b h i j`. How many times
  | will `b` be stored in the two formulations? 2 and 1, right? The
  | data volume has a cost.
  | 
  | Consider the number of cycles to compare a string versus an
  | integer. A string is, of course, a sequence of numbers. Given
  | the alphabet has a small set of symbols you will necessarily
  | have a character repeated as soon as you store more values than
  | there are characters. Therefore the database would have to
  | perform at least a second comparison. I imagine you're
  | correctly objecting that in this case the string comparison
  | happens either way but consider how this combines with the
  | first point. Is the unique table on which the string
  | comparisons are made smaller? Do you expect that the table with
  | combinations of values will be larger than the table storing
  | comparable values? Through this, does the combinations table
  | using integer comparisons represent a larger proportion of the
  | comparison workload? Clearly she found that to be the case. I
  | would expect it to be the case as well.
 
    | globular-toast wrote:
    | > Clearly it worked.
    | 
    | No it didn't. Someone else put an index on the main table and
    | _that_ worked.
 
      | erikerikson wrote:
      | The article declared that it was running in production and
      | satisfying the requirement. Do you have a different
      | definition of working? I say that to clarify what I was
      | attempting to communicate by "it worked".
      | 
      | I think you're remarking about the relationship between
      | normalized tables and indexes. That relationship does exist
      | in some databases.
      | 
      | Please say more if I'm missing your point.
 
        | lern_too_spel wrote:
        | GP's point is that the final schema has a primary key
        | index on the main table that solved the problem, and then
        | it has a bunch of useless additional tables. The solution
        | was just to add a primary key index on the main table.
        | Adding the additional tables just slows down the query
        | because it now has to do five index lookups instead of
        | one.
 
  | noctune wrote:
  | I especially feel the proposed solution could be problematic
  | given that the data is supposed to be transient. How are keys
  | deleted from the "id | value" tables when rows are deleted from
  | the main table? Does it just keep accumulating rows or is there
  | something like a GC query running occasionally that deletes all
  | unreferenced values?
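  | 
  | If there is a GC, I'd imagine it's something roughly like this
  | (made-up table and column names), run periodically for each of
  | the value tables:
  | 
  |     DELETE FROM ips
  |      WHERE id NOT IN (SELECT ip_id FROM quads);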
 
  | bitexploder wrote:
  | Yeah... technically that's (the solution) normalization, but
  | really not a bad design originally. If you never want a
  | particular atom of data repeated, yes you have to normalize
  | each column, and that is efficient in the long run for storage
  | size, but in more normal data storage that still means a lot of
  | joins and you still want indices. On a modern SSD you could use
  | that one table, throw on some indices as you suggest and not
  | notice it again until you hit millions of rows.
  | 
  | Anything you are doing conditional logic or joins on in SQL
  | usually needs some sort of index.
 
  | themgt wrote:
  | Also you could just store IPv4 as an unsigned INT directly,
  | rather than strings in a different table and an unsigned INT as
  | a foreign key.
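  | 
  | Rough sketch (MySQL has INET_ATON/INET_NTOA for this; table and
  | column names invented):
  | 
  |     CREATE TABLE greylist (
  |       ip   INT UNSIGNED NOT NULL,
  |       time DATETIME     NOT NULL
  |     );
  |     INSERT INTO greylist VALUES (INET_ATON('192.0.2.1'), NOW());
  |     SELECT INET_NTOA(ip) FROM greylist
  |      WHERE ip = INET_ATON('192.0.2.1');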
 
    | tptacek wrote:
    | I have regretted every single time I've gotten cute about
    | storing IP addresses as scalars in SQL.
 
      | ryukafalz wrote:
      | What issues did you run into? Having admittedly never done
      | this before, it feels like whenever you need the dotted-
      | quad IP you could just query:
      | 
      |     SELECT INET_NTOA(ip) FROM ips WHERE...
      | 
      | ...in MySQL at least. (Which might be why MySQL has this
      | function built in?)
      | 
      | I guess if you ever need to match on a specific prefix
      | storing the numeric IP might be an issue?
 
        | soheil wrote:
        | Because, as shocking as this might sound, unlike the
        | characters in the movie The Matrix I don't have a built-in
        | INET_NTOA function in my retina to see the string form of
        | the IP addresses when glancing at the rows of a large
        | table looking for patterns or a certain IP.
 
        | ryukafalz wrote:
        | How often are you looking at the full contents of a table
        | vs. looking at the results of a query that you could
        | throw an INET_NTOA onto?
 
        | soheil wrote:
        | Many. You also expect me to remember how to spell
        | INET_NTOA and which fields to use it on. What if I
        | wanted to do a quick "SELECT * FROM"? I barely know how to
        | spell English words; what makes you think I'm going to
        | remember how to spell INET_NTOA?
 
        | mixmastamyk wrote:
        | dbeaver CE is free.
 
        | Lammy wrote:
        | I realize you're describing a general problem and not
        | just this particular example, but fwiw NTOA and friends
        | got a lot easier for me once I learned enough that I
        | stopped trying to "remember how to spell" them and
        | started thinking about them in terms of the underlying
        | concepts, like "Network (as in big-endianness) to ASCII".
        | Learning the significance of the term "address family"
        | was a big clue for me for self-directed learning in this
        | space :)
 
        | Ginden wrote:
        | > I guess if you ever need to match on a specific prefix
        | storing the numeric IP might be an issue?
        | 
        | If you want to search for a range, you can do:
        | 
        |     SELECT INET_NTOA(ip) FROM ips
        |     WHERE ip > INET_ATON(:startOfRange)
        |       AND ip < INET_ATON(:endOfRange)
 
  | havkd wrote:
  | Is this correct? Would indexing the columns instead of moving
  | the values to another table lead to the same increase in
  | performance?
 
    | mst wrote:
    | Given this is a "from the start of her career" story, I'm
    | guessing she was running similar versions of mysql to the
    | versions I started with, and if my guess is correct then
    | probably not.
    | 
    | On anything you're likely to be deploying today, just
    | throwing a compound index at it is likely quite sufficient
    | though.
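    | 
    | i.e. something like this (sketch; column names from the
    | article, table name made up):
    | 
    |     CREATE INDEX idx_quad ON greylist (ip, helo, m_from, m_to);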
 
    | kofejnik wrote:
    | It would be faster, I believe. As we're only looking for
    | existence, a single index traversal is all the I/O we would
    | need to do.
 
    | taeric wrote:
    | An index is often easily viewed as another table. So, it should.
    | 
    | (Some subtleties on projected values and such, but the point
    | stands.)
 
      | havkd wrote:
      | Then this post is very misleading.
 
        | taeric wrote:
        | I wouldn't get too vehement on it. Knowing how to
        | normalize isn't a bad thing. And, if you intricately know
        | your write and read needs, you may be able to assist the
        | work more appropriately.
 
      | jameshart wrote:
      | Yes, the proposed 'better' structure basically amounts to
      | building your own indexes. Normally, I'd assume it is
      | better to use the RDBMS's own engine to do that, don't roll
      | your own.
      | 
      | There may well be some subtlety to the indexing
      | capabilities of MySQL I'm unaware of though - I could easily
      | imagine myself making rookie mistakes like assuming that it
      | has the same indexing capabilities. So, to the post's point -
      | if I were working on a MySQL db I would probably benefit
      | from an old hand's advice to warn me away from dangerous
      | assumptions.
      | 
      | On the other hand I also remember an extremely experienced
      | MS SQL Server DBA giving me some terrible advice because
      | what he had learned as a best practice on SQL Server 7
      | turned out to be a great way to not get any of the benefits
      | of a new feature in SQL Server 2005.
      | 
      | Basically, we're all beginners half the time.
 
        | rrrrrrrrrrrryan wrote:
        | As a former SQL Server DBA, most SQL Server DBAs are
        | absolute gurus on 1 or 2 specific versions of SQL server,
        | and massively ignorant about anything newer.
        | 
        | It's a strange role where, in order to do your job well,
        | you kind of have to hyper-specialize in the specific
        | version of the tech that your MegaCorp employer uses,
        | which is usually a bit older, because upgrading databases
        | can be extremely costly and difficult.
 
    | loeg wrote:
    | The same O(N) -> O(log N) improvement for queries, yes. The
    | constant factor on the separate tables _might_ be better.
    | It's also a more complicated design.
 
    | adfgaertyqer wrote:
    | Sure would! I think it would be marginally better, in fact,
    | because you would just need to look at the index rather than
    | five tables. Access would be more local.
 
    | rustc wrote:
    | Moving the data into another table would still require
    | indexes: one on the original table's column (which now stores
    | the new id) and one on the new table's primary key.
    | 
    | In most cases I'd expect just adding an index to the original
    | table to be more efficient, but it depends on the type of the
    | original column and if some data could be de-duplicated by
    | the normalization.
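    | 
    | Concretely, something like this per normalized column (sketch,
    | names invented):
    | 
    |     CREATE TABLE ips (
    |       id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,  -- new PK
    |       ip VARCHAR(45) NOT NULL
    |     );
    |     -- plus an index on the id column the original table holds:
    |     CREATE INDEX idx_quads_ip_id ON quads (ip_id);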
 
  | ignoramous wrote:
  | > _This post is bizarre_ ... _Which is nonsense_ _...but from
  | her description I would say it wasn't._
  | 
  | This _is_ what the post is about.
  | 
  | And btw, if they hadn't normalized those tables, it'd instead
  | have been pointed out to them that their "post is bizarre..."
 
  | someonewhocar3s wrote:
  | I was kinda surprised by the last chapter: suddenly this is
  | about 'advancing in programming as a woman', and it's over!
  | The last few chapters do make sense in that view, though.
  | 
  | What I want to know is how she went from a strong database
  | format (ID + data, keyed) to a weak database model (not really
  | a database... it's a key-value store) which is likely to be
  | much slower. But also, this is what happens to varchars under
  | the hood already (the rows only hold pointers to where the
  | strings/objects are placed, rather than inlining the varchars),
  | at least in litesql, so optimally it's only two dereferences
  | instead of one...
  | 
  | Like, eh, this programmer went through an anti-learning
  | process somehow, and came up with a senseless optimisation.
  | The post ends midway through the learning!
 
  | [deleted]
 
  | jeroenhd wrote:
  | Based on this piece of old documentation I found [0] for MySQL
  | 3.23 (the most recent version in 2002 as far as I can tell),
  | certain types of indices were only available on certain types
  | of engines. Furthermore, columns were restricted to 255
  | characters, which may be too short for some of the fields saved
  | in the database.
  | 
  | Modern databases abstract away a lot of database complexity for
  | things like indices. It's true that these days you'd just add
  | an index on the text column and go with it. Depending on your
  | index type and data, the end result might be that the database
  | turns the table into third normal form by creating separate
  | lookup tables for strings, but hides it from the user. It could
  | also create a smarter index that's less wasteful, but the end
  | result is not so dissimilar. Manually doing these kinds of
  | optimisations these days is usually a waste of effort or can
  | even cause performance issues (e.g. that post on the front page
  | yesterday about someone forgetting to add an index because
  | mysql added them automatically).
  | 
  | All that doesn't mean it was probably a terrible design back
  | when it was written. We're talking database tech of two decades
  | ago, when XP had just come out and was considered a memory hog
  | because it required 128MB of RAM to work well.
  | 
  | [0]:
  | http://download.nust.na/pub6/mysql/doc/refman/4.1/en/create-...
 
    | mixmastamyk wrote:
    | > All that doesn't mean it was probably a terrible design
    | back when it was written. We're talking database tech of two
    | decades ago, when XP had just come out and was considered a
    | memory hog because it required 128MB of RAM to work well.
    | 
    | A lot of this stuff was invented in the 70s, and was quite
    | sophisticated by 2000. It just wasn't _free_ , rather quite
    | expensive. MySQL was pretty braindead at the time, and my
    | recollection is that even postgres was not that hot either.
    | We're very lucky they've come so far.
 
    | hn_throwaway_99 wrote:
    | The fact remains that whether the text column existed on her
    | original table, or whether it was pulled out to a normalized
    | table, literally all of the same constraints would apply
    | (e.g. max char length, any other underlying limitations of
    | indexing).
    | 
    | The issue is that her analysis of what the issue was with her
    | original table is _completely wrong_ , and it's very weird
    | given that the tone of her "present" self is that it's so much
    | more experienced and wise than her "inexperienced, naive"
    | self.
    | 
    | My point is that she should give her inexperienced self a
    | break; all that was missing from her original implementation
    | was some indexes.
 
      | mst wrote:
      | Multiple-long-column compound indices sucked in old mysqls
      | if you could even convince it to use them in the first
      | place.
      | 
      | Being able to look up each id via a single-string unique
      | index would've almost certainly worked much better in those
      | days.
 
        | hn_throwaway_99 wrote:
        | I used MySql a lot back then, had many multi-column
        | indexes, and never had an issue.
        | 
        | More importantly, given the degree of uniqueness likely
        | to be present in many of those columns (like the email
        | addresses), she could have gotten away with not indexing
        | on every column.
 
        | mst wrote:
        | Your first sentence was, I'm afraid, very much not a
        | universal experience.
        | 
        | Your second is quite possibly true, but would have
        | required experimentation to be sure of, and at some point
        | "doing the thing that might be overkill but will
        | definitely work" becomes a more effective use of
        | developer time, especially when you have a groaning
        | production system with live users involved.
 
    | e12e wrote:
    | Based on: https://web.mit.edu/databases/mysql/mysql-3.23.6-al
    | pha/Docs/...
    | 
    | I'd say there's very little performance gain in normalizing
    | (it usually goes the other way anyway: normalize for good
    | design, avoiding storing multiple copies of individual
    | columns; _de_ -normalize for performance).
    | 
    | I'm a little surprised by the tone of the article - sure,
    | there were universities that taught computer science without
    | a database course - but it's not like there weren't practical
    | books on DB design in the 90s and onward?
    | 
    | I guess it's meant as a critique of the mentioned, but not
    | linked, other article "being discussed in the usual places".
 
    | willvarfar wrote:
    | Knowing only what we can guess about the actual problem, I'd
    | say indexing the IP ought to be enough to make everything
    | super fast.
 
  | gfodor wrote:
  | The point is she was being hyper critical of her past schema to
  | make a larger point.
 
    | dehrmann wrote:
    | I have mixed feelings about this. On one hand, I appreciate
    | the larger point, and pointing out mistakes _again_ goes
    | against that point. On the other, the post had so many
    | technical details it overshadowed the larger point,
    | especially for something that wasn 't obviously broken.
 
    | mwcampbell wrote:
    | And IIUC, the larger point, made by the twist ending, is that
    | we shouldn't be so critical, but try to actually help newbies
    | learn what we insist that they should know.
 
      | erikerikson wrote:
      | It seemed to me the point was that if the industry remains
      | designed to push out experienced engineers, there won't
      | be anyone to mentor the engineers who are just beginning
      | their careers; further, that we will bend the productivity
      | of the field downward and have less effective systems.
 
        | sokoloff wrote:
        | In what way is the industry designed that way now? Maybe
        | some get sucked into management roles, but I don't see
        | any evidence of a broad design to push them out.
 
        | erikerikson wrote:
        | Role count pyramids, volume and inflexibility of working
        | hours, abusive management techniques, a constant
        | treadmill of the new hotness, more...
        | 
        | Some offices avoid many of the challenging patterns but
        | not the vast majority. A lot of it is a natural result of
        | the newness of the industry and the high stakes of
        | investment. Many of the work and organizational model
        | assumptions mean that offering flexibility to employees
        | complicates the already challenging management role.
        | Given the power of those roles, this means workers
        | largely must fit into the mold rather than designing a
        | working relationship that meets all needs. As the needs
        | change predictably over time a mounting pressure
        | develops.
        | 
        | I'm only 17 years in but it has gotten harder and harder
        | to find someone willing to leave me to simply code in
        | peace and quiet and dig my brain into deep and
        | interesting problems.
 
        | wbl wrote:
        | Look at the way we screen resumes, or valorize work
        | conditions incompatible with having family obligations.
        | Ageism in the tech industry is rife.
 
        | bialpio wrote:
        | Or how leetcode heavy hiring favors folks that tend to be
        | more fresh out of college & folks who have time to
        | prepare for them.
 
  | tomc1985 wrote:
  | Idunno the real villain is that bullshit, costly "per-row"
  | pricing
 
  | codazoda wrote:
  | As a 25 year "veteran" programmer, I'm glad I'm not the only
  | one that likes the original schema (with indexes added). A
  | fully normalized table is a lot more difficult to read,
  | troubleshoot, and reason about. You would need a pretty good
  | query to poke around the data.
 
    | mst wrote:
    | When I first started using postgres I was amazed at how much
    | of what I thought was "database knowledge" was actually
    | "working around limitations of the versions of mysql I'd been
    | using" knowledge.
    | 
    | These days, sure, the original schema plus indices would work
    | fine on pretty much anything I can think of. Olde mysqls were
    | a bit special though.
 
  | [deleted]
 
  | zhoujianfu wrote:
  | Phew.. I was reading this post thinking "is basically every
  | table I've ever made wrong?!"
 
    | mst wrote:
    | For modern databases, the extra normalisation is an
    | incremental optimisation that you usually won't need.
    | 
    | But I think she's if anything been in the industry longer
    | than I have and the versions of mysql I first ran in
    | production were a different story.
 
    | topher_t wrote:
    | I definitely got a spike of imposter syndrome thinking, oh
    | fuck, I've been in software this long and haven't learned this
    | yet?!
 
  | hoppla wrote:
  | I usually store IPv4 as unsigned int, and IPv6 as two 64 bit
  | unsigned ints. But supporting both variants requires that I have
  | a table for each type and a glue table to reference the
  | record in the correct table. Storing simple stuff quickly turns
  | complicated...
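  | 
  | Roughly this shape, in case it's not obvious (all names
  | invented):
  | 
  |     CREATE TABLE ipv4_addr (
  |       id   INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  |       addr INT UNSIGNED NOT NULL UNIQUE
  |     );
  |     CREATE TABLE ipv6_addr (
  |       id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  |       hi BIGINT UNSIGNED NOT NULL,
  |       lo BIGINT UNSIGNED NOT NULL,
  |       UNIQUE (hi, lo)
  |     );
  |     -- glue table: exactly one of the two references is non-NULL
  |     CREATE TABLE ip_ref (
  |       id    INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  |       v4_id INT UNSIGNED NULL,
  |       v6_id INT UNSIGNED NULL
  |     );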
 
  | da39a3ee wrote:
  | OK so change the article's technical solution from "normalize"
  | to "add index". That doesn't change the point of the article.
 
| rmk wrote:
| Why not store the IP Address as a 32-bit number (IPv4 addresses)?
| Why store it as a string in the first place? This is something I
| did not quite get. Also, wouldn't it be better to split out the
| domain from the email address for ease of filtering?
| 
| Also, how does performing joins give a performance advantage
| here? I'm assuming there would be queries (at least one, but up to
| 4) to get at the IDs of the items in the quad. Then there would be
| a lookup in the mapping table.
| 
| I have worked for some time in this industry, but I have never
| had to deal with relational databases (directly; query tuning and
| such were typically the domain of expert db people). It would be
| interesting to see an explanation of this aspect.
| 
| EDIT: To people who may want to comment, "You missed the point of
| the article!": no, I did not, but I want to focus on the
| technical things I can learn from this. I agree that ageism is a
| serious problem in this industry.
 
  | pg_bot wrote:
  | You could store the IP as a 32 bit unsigned int. There's no
  | issue with that, but I would probably recommend nowadays to use
  | an internal cidr or inet type if your database supports that.
  | It probably wouldn't be better to split the email address from
  | the domain for filtering since email address formats tend to be
  | a bit tricky.
  | 
  | Joins give a performance advantage due to the fact that you
  | aren't duplicating data unnecessarily. The slow query in
  | question becomes five queries (4 for the data, one for the
  | final lookup), each of which can be done quickly, and if any one
  | of them returns nil, you can return a timeout.
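  | 
  | Roughly this shape (sketch; table and column names invented):
  | 
  |     SET @ip_id   = (SELECT id FROM ips   WHERE ip = '192.0.2.1');
  |     SET @helo_id = (SELECT id FROM helos WHERE helo = 'foobar');
  |     SET @from_id = (SELECT id FROM froms WHERE addr = 'a@x.com');
  |     SET @to_id   = (SELECT id FROM tos   WHERE addr = 'b@y.net');
  |     -- a NULL above means that value has never been seen, so the
  |     -- quad is new; otherwise check whether the quad is known:
  |     SELECT time FROM quads
  |      WHERE ip_id = @ip_id AND helo_id = @helo_id
  |        AND from_id = @from_id AND to_id = @to_id;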
 
    | rmk wrote:
    | Yes, some of the databases support native ip address types
    | that can be manipulated efficiently. Better to use that (like
    | inbuilt json querying capabilities in postgres) than come up
    | with your own.
    | 
    | It is still not clear how it is better to do up to 5 separate
    | queries, or maybe a few joins, than to store the strings and
    | construct indices and such on them. Is the idea that the cost
    | of possibly establishing a new socket connection for some
    | concurrent query execution or sequentially executing 5
    | queries is still < possible scans of the columns (even with
    | indexing in effect)? Also, even if you had integers, don't
    | you need some sort of an index on integer fields to avoid
    | full table scans anyway?
 
| Aqueous wrote:
| There is a time and a place for a denormalized schema. If you are
| trying to search a large data-set a denormalized schema in a
| single table with some good indexing is much preferable to
| joining across 30 tables.
| 
| Whether you choose normalized or denormalized depends very much
| on a) what you need to do now b) what you may need to do in the
| future. Both are considerations.
 
| bizzarojerry wrote:
| What a bizarre post from an influential blogger so confidently
| making assertions that are wildly contested even among HN users.
 
| chrischapman wrote:
| I think the 'terrible schema' thing is a secondary issue. The
| important take away for me was this:
| 
| > Considering that we as an industry tend to chase off anyone who
| makes it to the age of 35, is it any surprise that we have a
| giant flock of people roaming around trying anything that'll
| work?
 
  | alisonkisk wrote:
  | That canard isn't true.
 
  | mixmastamyk wrote:
  | Unfortunately it is the penultimate sentence in a sizable post.
 
| temporallobe wrote:
| This was great. Clever twist at the end.
| 
| I cringed a little because some of those mistakes look like
| stuff I would do even now. I have a ton of front/back end
| experience in a huge variety of languages and platforms, but I am
| NOT a database engineer. That's a specific skillset I never
| picked up nor, admittedly, am I passionate about.
| 
| Sorry to go off on a tangent, but it also brings to mind that
| even so-called experts make mistakes. I watch a lot of live
| concert footage from famous bands from the 60s to the 90s, and as
| a musician myself, I spot a lot of mistakes even among the most
| renowned artists...with the sole exception of Rush. As far as I
| can tell, that band was flawless and miraculously sounded better
| live than on record!
 
| mceachen wrote:
|  Actually, the Correct Answer is a bloom
| filter.
| 
| (And, yes, we had math in the early 2000s.)
| 
| Snark aside, I'm frustrated for the author. Her completely-
| reasonable schema wasn't "terrible" (even in archaic MySQL)--it
| just needed an index.
| 
| There's always more than one way to do something. It's a folly of
| the less experienced to think that there's only One Correct Way,
| and it discourages teammates when there's a threat of labeling a
| solution as "terrible."
| 
| Not to say there aren't infinite terrible approaches to any given
| problem: but the way you guide someone to detect why a given
| solution may not be optimal, and how you iterate to something
| better, is how you grow your team.
 
  | Jolter wrote:
  | To be fair, the "correct way" to do databases at that time was
  | to use third normal form. Putting indices on string columns
  | would have been considered a hack, much like using MySQL was.
 
    | mst wrote:
    | Plenty of databases at the time had pretty lame string
    | indexing because it simply wasn't something people relied on
    | if they wanted performance, and memory sayeth mysql's were
    | both not particularly efficient and had some nasty technical
    | limitations.
    | 
    | On the mysql she was using, breaking things out so it only
    | needed to index ints was almost certainly a much better idea.
    | On anything I'd be deploying to today, I'd start by throwing
    | a compound index at the thing with the expectation that'd
    | probably get me to Good Enough.
 
| ed25519FUUU wrote:
| > id | ip | helo | m_from | m_to | time | ...
| 
| I'm not a database person, but this doesn't seem _too bad_ if
| there was at least an index on IP. The performance there would
| probably be good enough for a _long time_ depending on the usage.
| 
| But is 6 tables (indexed on the value) really better than 1 table
| with 6 indexes?
 
| rektide wrote:
| The popular advice in these comments to use indexes sounds really
| good. I haven't thought it out fully, but my first thought was
| "this sounds like a DIY columnar store". A columnar database does
| a lot of this work for you. I'm not 100% sure it's a good option in
| this case, because things like email subjects are not good
| columnar data. I would like to know more about how problematic
| that is.
 
| ummonk wrote:
| I was preparing to rant about the tone that the post was taking
| towards the "clueless programmer" until I got to the end. Well
| done...
| 
| Still not sure why adding an index didn't work instead of
| breaking out into multiple tables indexed by IDs.
| 
| Besides, this use case screams for "just store the hashes and
| call it a day".
 
| kpommerenke wrote:
| While the article is not really about the database schema, could
| you do a table with just two fields?
| 
| 1) hash of concat(ip & helo & m_from & m_to)
| 2) time
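| 
| Something like this could work (sketch, names invented; the
| delimiter in CONCAT_WS guards against concatenation collisions):
| 
|     CREATE TABLE seen (
|       quad_hash BINARY(16) PRIMARY KEY,
|       time      DATETIME   NOT NULL
|     );
|     INSERT INTO seen VALUES
|       (UNHEX(MD5(CONCAT_WS(';', @ip, @helo, @m_from, @m_to))),
|        NOW());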
 
| daly wrote:
| We had a test database that contained 1 record. Nobody paid much
| attention since the focus was on the problem, not the database.
| 
| The database included several newly developed "stored
| procedures".
| 
| Time elapsed... and it was nearing the time to ship the code. So
| we tried to populate the database. But we could not. It turned
| out that the stored procedures would only allow a single record
| in the database.
| 
| Since a portion of the "business logic" depended on the stored
| procedures... well, things got "delayed" for quite a while... and
| we ended up having a major re-design of the back end.
| 
| Fun times.
 
  | hinkley wrote:
  | Why do so many of us "forget" to populate the database with
  | representative quantities of data until after we've baked in
  | all of our poor design decisions?
 
| globular-toast wrote:
| > That's right, I was that clueless newbie
| 
| Questionable use of past tense here.
 
  | mwcampbell wrote:
  | Why? She's clearly not a clueless newbie now.
 
    | framecowbird wrote:
    | I think the OP was sarcastically implying that she still is.
 
      | inopinatus wrote:
      | Such remarks generally backfire onto the author, as indeed
      | in this case.
 
    | kofejnik wrote:
    | Right, not a newbie anymore
 
| [deleted]
 
| zephrx1111 wrote:
| I think this post showed us what an engineer on Mt. Stupid looks
| like. Apparently this design is wrong. We should rely on the
| indexing ability of the database itself, instead of inventing our
| own indexing system.
 
| tyingq wrote:
| Feels like you could just concatenate and hash the 4 values with
| MD5 and store the hash and time.
| 
| Edit: I guess concatenate with a delimiter if you're worried
| about false positives with the concat. But it does read like a
| cache of _" I've seen this before"_. Doing it this way would be
| compact and indexed well. MD5 was fast in 2002, and you could
| just use a CRC instead if it weren't. I suppose you lose some
| operational visibility to what's going on.
 
  | ok123456 wrote:
  | For the 2021 version, you'd just generate a bloom filter/cuckoo
  | hash from all the data gathered by your spamhaus database
  | periodically. Make a separate one for each value in your tuple
  | and your score would be the number of the sub-hashes that
  | matched.
 
  | kingcharles wrote:
  | Isn't that exactly what adding an index would do internally?
 
    | tyingq wrote:
    | 2002 MySQL had some limitations on indexing variable length
    | string (varchar) columns. That's the gist of the linked
    | story.
 
  | Moto7451 wrote:
  | Yup, we do this at work for similar purposes and it works a-ok.
  | We also have some use cases that follow the original "problem"
  | schema and they work fine with the correct indices involved.
  | 
  | My guess is that in 2002 there were some issues making those
  | options unappealing to the Engineering team.
  | 
  | When we do this in the realm of huge traffic, we run the
  | data through a log stream and it ends up in either a KV store,
  | Parquet with a SQL/query layer on top of it, or hashed and rolled
  | into a database (and all of the above if there are a lot of
  | disparate consumers. Weee, Data Lakes).
  | 
  | This is also the sort of thing I'd imagine Elastic would love
  | you to use their search engine for.
 
| shusson wrote:
| I would add a counterpoint to foreign key constraints: they make
| deletions and updates a lot more costly [1].
| 
| [1] https://shusson.info/post/postgres-experiments-deleting-
| tabl...
 
| wdr1 wrote:
| Related to database indexes, but not the post: a busted database
| index brought down ticket sales for the 2008 Olympic Games.
| 
| This was the first time regular people could go buy tickets for
| events & they had been lining up overnight at Bank of China
| locations through the country. We were down for over a day before
| we called it off. Apparently this led to minor upheaval at
| several locations in Beijing & riot police were called in.
| 
| We were pretty puzzled as we _had_ an index and had load tested
| extensively. We had Oracle support working directly with us  &
| couldn't figure out why queries had started to become table
| scans.
| 
| The culprit? A point upgrade to DBD::Oracle (something like X.Y.3
| to X.Y.4) introduced a subtle bug in character sets. So the index
| required using a particular Unicode character set, and we were
| specifying it, but when it was translated into the actual
| query, it wasn't exactly the right one, so the DB assumed it
| couldn't use the index. Then, when all the banks opened & a large
| portion of a very populous country tried to buy tickets at the same
| time, things just melted.
| 
| Not a fun day.
 
  | hinkley wrote:
  | The one that always got us is much more mundane: deleting a row
  | requires an index in every table with a FK to it, to avoid table
  | scans.
  | 
  | Near as I can tell, we assume there is some bit of magic built
  | into the foreign key concept that handles this for us, but that
  | is not the case.
 
    | necovek wrote:
    | Foreign keys are naturally referencing primary keys, which
    | have their own (unique) indexes by default. And there's some
    | extra magic with CASCADE.
    | 
    | But it seems like I am missing your point: care to expand on
    | it so I can learn about the gotcha as well? What are you
    | attempting to do, and what's making the performance slow?
 
      | nickkell wrote:
      | If I've understood correctly it's the foreign key column
      | itself that isn't indexed by default. Deleting a record
      | then requires a table scan to ensure that the constraint
      | holds
 
        | dan-robertson wrote:
        | Example: if you delete a record from the customers table,
        | you want an index on the foreign key in the orders table
        | to delete the corresponding entries. This also means the
        | performance of the delete can have the unfortunate
        | property where the small work of deleting a single row
        | cascades into the work of deleting many rows.
 
    | zeroimpl wrote:
    | Yeah I've run into this a few times. In Postgres, you can't
    | add a foreign key referencing columns that aren't indexed. I
    | assume this is for performance reasons so inserts/updates are
    | efficient, but there is no such check that the source columns
    | are also indexed, so updates/deletes on the referenced table
    | can be terribly slow.
 
  | superjan wrote:
  | At my previous job, a particular database was designed using an
  | ASCII string type for a particular field rather than
  | Unicode. If you then queried with a string in Unicode format, the
  | database decided that the comparison should be done in Unicode.
  | The only way to do that was to table scan and convert all the
  | ASCII fields to Unicode on the fly. It was found only in
  | production.
  | 
  | Given that you mention China in your story, did GB 18030 have
  | anything to do with your problems?
  | 
  | https://en.wikipedia.org/wiki/GB_18030
 
  | path411 wrote:
  | Can't imagine the pain to even discover this problem. It
  | normally takes me a long time to be like, "maybe I didn't make
  | a mistake, and I have a third party bug?". Third party bugs are
  | always the most annoying to me, usually the longest to diagnose
  | and then ultimately I have to just tell my boss I can't do
  | anything to fix it, but hopefully I can find a way to avoid it.
 
  | mst wrote:
  | I love DBD::Oracle so much I've always arranged to have other
  | people test oracle related DBIx::Class features for me.
  | 
  | Writing the pre-ANSI-JOIN-support SQL dialect code was pretty
  | good fun though.
 
  | taberiand wrote:
  | The moral of this story, to me, is always lock in all
  | dependencies to the exact specific versions tested for
  | production release.
 
| [deleted]
 
| throwawayFanta wrote:
| I think this blog fails to take into account that 2021 is not
| 2002. Computer Science is a much more formal/mainstream field of
| study now, and people don't operate "in a vacuum with no
| mentorship, guidance, or reference points."
| 
| Some comments on the previous blog post raised important
| questions regarding minimum understanding/knowledge of technology
| one utilizes as part of their day job. And I would agree that
| indexing is a fundamental aspect while using relational
| databases.
| 
| But unfortunately this isn't very uncommon these days; way too
| often I have heard people in $bigco say "let's use X, everyone
| uses X" without completely understanding the
| implications/drawbacks/benefits of that choice.
 
  | [deleted]
 
| savant_penguin wrote:
| I've more or less thrown in the towel on that "not being clueless
| anymore"
| 
| Sure, I'll try and study to keep myself on my feet, but there is
| soo much to learn that I'd rather focus on the technologies
| needed right _now_ to get things done
| 
| Way more often than not, "thing A has to be done with tech B" and
| the expectation of getting it done without multiple foot guns is
| gone
 
| havocsupreme wrote:
| I'm pretty sure all the people who are criticizing the database
| design haven't read the ending. The article isn't about the
| schema, it's about helping those entry-level programmers.
 
  | loeg wrote:
  | It would be more effective at making that point if it didn't
  | have the confused and arguably incorrect section in the middle.
  | If as a writer, the audience misses your point, maybe it's a
  | problem with your writing rather than the audience.
 
  | empthought wrote:
  | You don't think it's helpful for people to steer entry level
  | programmers away from bad ideas like this one?
 
| lisper wrote:
| > You know what it should have been? It should have been
| normalized.
| 
| No, it should have had a unique index on the first five columns.
| 
| (Oooh, I just hate it so much when someone gets up on their high
| horse and opens a blog post with a long condescending preamble
| about how some blogger got it wrong, and then they get it even
| wronger.)
 
| nobodyandproud wrote:
| If there are millions of entries coming in per hour, I can
| see the value.
| 
| Otherwise, how long is their data retention? Wouldn't there be
| more value in cleaning out entries older than, say, two hours?
 
| lifeisstillgood wrote:
| The one thing I would add here is that the devlead (or equivalent
| - there always is someone) should have added at least one test.
| Not a unit test or a 99% coverage test but a "does it do what we
| want" test - a golden path test.
| 
| I came across something like this with an apprentice recently. I
| let the lack of a repeatable test pass because speed / time /
| crunch etc., and the obvious "how can that possibly fail" code of
| course failed. The most basic test would have caught it, in the
| same way some basic performance testing would have helped Rachel
| however many years back.
| 
| It's hard this stuff :-)
 
| dilyevsky wrote:
| The real takeaway here is that DB performance optimization (imo
| performance in general) is freaking hard, and even people who'd
| successfully optimized something like OP often totally
| misunderstand why it actually worked. Pay for expert advice.
 
| rvr_ wrote:
| short answer: composite indexes
 
| jeffrallen wrote:
| I did this (bad schema, too much scanning) too.
| 
| I also ran newfs on a running Oracle database. (That did vastly
| less damage than you would have thought, actually.)
 
| ChrisMarshallNY wrote:
| I'd say that pretty much _everyone_ is a better DB programmer
| than I am, so I don't really go out of my way to design anything
| especially robust. I try to design stuff I'm not good at, in a
| way that makes it easy for someone that knows more than I do, to
| come in and toss out the work I did, to be replaced with good
| work.
| 
| There's _lots_ of stuff that I'm bad at. I'm also good at a fair
| bit of stuff. I got that way, by being bad at it, making
| mistakes, asking "dumb" questions, and seeing how others did it.
 
| Grimm1 wrote:
| Heh, at the end I think I may have been called out for my harsh
| response here on HN to the post being referenced at the start,
| since it seemed to raise some hackles. Borrowing from other
| social media terminology, I've been "subtweeted", but by a blog
| post; not sure how I feel about that.
| 
| Honestly it almost makes me feel compelled to start writing more
| for the industry but to be honest, my opinion is most ideas are
| not great, or not novel, up to and including my own. So writing
| about them seems likely not to be beneficial. I guess you could
| argue I could let the industry be the judge of that though.
 
| fortran77 wrote:
| The "NoSQL" movement -- which I wisely rejected -- is also
| responsible for this.
 
  | jugg1es wrote:
  | There are problems that nosql databases solve easily that
  | require a lot of fancy tricks in a RDBMS.
 
  | harel wrote:
  | All "movements" have some merit, or they wouldn't exist in the
  | first place. Nobody starts a "movement" just because. It exists
  | because it solves something. "NoSQL" was a solution to
  | something. Today you can benefit from best of all worlds, and
  | still NoSQL has a place and a use case, just like anything else
  | that exists.
 
    | tester756 wrote:
    | I will play devil's advocate
    | 
    | >and still NoSQL has a place and a use case, just like
    | anything else that exists.
    | 
    | What actually makes you believe that it's NoSQL that has
    | "some use cases" and relational databases are the "default
    | ones", instead of NoSQL/non-relational by default?
 
| junon wrote:
| Err. Wat. The original schema was (mostly) fine. It had no
| indexes. The second schema looks like it tried to work around
| lack of database optimization features. It's in no way "better"
| from a data design standpoint.
| 
| A database with good string index support isn't doing string
| comparisons to find selection candidates - at least, not
| initially.
| 
| What a bizarrely confident article.
 
  | pgeorgi wrote:
  | > lack of database optimization features
  | 
  | That's a perfect description of MySQL as of 2002.
 
    | junon wrote:
    | MySQL, at least as far back as 2000, had indexes.
    | 
    | http://web.archive.org/web/20020610031610/http://www.mysql.c.
    | ..
 
      | pgeorgi wrote:
      | I wouldn't bet on these indexes optimizing anything back
      | then. MySQL was legendary in that, while implementing the
      | necessary standards (to some degree), it was neither
      | reliable nor efficient.
 
        | eklitzke wrote:
        | Indexes in MySQL worked fine in 2002, at least for the
        | use case described in this article.
 
        | junon wrote:
        | That's pretty beside the point though, the concept of
        | indexing had existed for decades even prior to that.
 
        | pgeorgi wrote:
        | The quote was "database optimization features" and the
        | scope was "MySQL as of 2002".
        | 
        | Of course, even my old DBase II handbook talks about
        | indexes - all that's old hat. MySQL had them, too.
        | 
        | MySQL also used to have a long-earned reputation as a toy
        | database though, and MySQL in 2002 was right within the
        | timeframe where it established that reputation. So yeah,
        | you could add indexes, but did they speed things up (as
        | in them being an "optimization feature")? Public opinion
        | was rather torn on that.
 
| oconnor663 wrote:
| I think there's an easy "best of both worlds" take here:
| 
| 1. The majority of the performance problem could've and probably
| should've been summarized as "you need to use an index". (Maybe
| there were MySQL limitations that got in the way of indexing back
| then? But these days there aren't.)
| 
| 2. Everyone makes mistakes! New programmers make mistakes like
| not knowing about indexes. Experienced programmers make mistakes
| like knowing a ton about everything and then teaching things in a
| weird order. All of these things are ok and normal and part of
| growing, and it's important that we treat each other kindly in
| the meantime.
 
  | mst wrote:
  | Based on my own war stories, if it was an early '00s mysql then
  | I expect "just throw an index at the varchar columns" would've
  | helped but not nearly as much as what she actually did.
 
| gadrev wrote:
| What I got from this post is a headache after reviewing the
| wikipedia entries on 3NF (and all of its dependencies: 1NF, 2NF,
| also 3.5NF). Man the language is dense.
| 
| j/k, always nice to review some of this "basic" stuff (that you
| barely use in everyday work but it's good to have it in the back
| of your head when changing something in the db schema)
 
| bizzarojerry wrote:
| What a bizarre post from an influential blogger so confidently
| making assertions that are wildly contested even among HN users.
| 
| If anything this is the problem with the industry: articles like
| this that make vast generalizations from a specific instance, make
| unsubstantiated finger-pointing gestures, and passive-aggressively
| blame the industry as a whole.
 
| mmaunder wrote:
| Oh well played!! I was setting up to give the author a hard time
| about being judgemental, particularly because storing IPs or
| UUIDs as strings is a mistake I've seen some pretty darn good
| devs make. Some folks just aren't super strong on schema design
| and performance but are great at other things.
| 
| Plus MySQL kind of rocks. Fight me. There are some interesting
| optimizations that Percona has written about that may improve the
| performance of OP's schema. The learning never ends.
| 
| TBH I haven't seen a shortage of mentors. In fact most of our dev
| team is 35+ and are very approachable and collegial and our
| internal mentoring programs have been extremely successful -
| literally creating world experts in their specific field. I don't
| think we're unique in that respect.
| 
| I think it's rather unfortunate that the top comment here is a
| spoiler. Actually reading the post and going on the emotional
| rollercoaster that the author intended to take you on is the
| point. Not the punch line.
 
  | jjice wrote:
  | > Plus MySQL kind of rocks.
  | 
  | Agreed. I prefer Postgres for personal projects, but MySQL is a
  | fine database. Honestly, even a relational DB I wouldn't want
  | to use again (DB2...) is still pretty solid to me. The
  | relational model is pretty damn neat and SQL is a pretty solid
  | query language.
  | 
  | I wonder how many people disagree with that last part in
  | particular...
 
    | mmaunder wrote:
    | It's actually the core of all dev that involves any data
    | storage, today. HN is both aspirational in tech and has a
    | younger demographic so you won't get any points here for
    | mentioning anything SQL. Hence the downvotes I'm getting. But
    | hey, it's all good!!
 
| tomsmeding wrote:
| Why exactly would it be so bad to just put a suitable index on
| the table containing strings? The time complexity of the
| resulting search would be the same, so I assume there will be
| some constant factor slowdowns. Is it that indices over string
| fields are stored inefficiently on disk? (If so, can that not be
| fixed in the db engine directly?) Or is this fine today but
| wasn't fine 15 years ago?
 
  | dan-robertson wrote:
  | I think the article describes the solution that was used and
  | not necessarily the best solution. If they made the "obvious"
  | mistake in the first case they mightn't arrive at the best
  | solution in the second. Adding the single index and using the
  | four tables have the same worst case complexity for reads,
  | O(log n), with different constants and massively beat scanning
  | at O(n).
  | 
  | It may be that the second solution wasn't the best, or that it
  | was better for write performance or memory usage--this was 2002
  | after all. It may also be the case that the cardinality of
  | some columns was low in such a way that the four table solution
  | was better, but maybe getting the columns in the right order
  | for a multi column index could do that too.
  | 
  | Either way, I don't think it really matters that much and
  | doesn't affect the point of the article.
 
  | HelloNurse wrote:
  | Normalization is important for deduplication, not only to index
  | and compare a few short numbers instead of a few long strings:
  | those host names and email addresses are long and often
  | repeated.
 
    | remram wrote:
    | That's not what normalization is. You're thinking of
    | deduplication or compression. This table is normalized as per
    | usual database terminology.
 
      | ghayes wrote:
      | But if it's guaranteed to be 1:1, why? The two
      | implementations (normalized and denormalized) in that case
      | should be completely isomorphic, quirks of the DB engine
      | aside.
 
  | mst wrote:
  | It was fine on -some- databases 20 years ago. mysql of that
  | era, less often.
 
  | thehappypm wrote:
  | At the worst you could concatenate the strings and have a
  | single primary key.. or hash the string concatenation.
 
  | jerjerjer wrote:
  | Yeah, while this is not optimal (ip should be converted into
  | integer, time should be ts), the table would be small (as old
  | entries could be safely deleted). The only real issue is the
  | lack of indices.
  | 
  | Also is helo field even needed?
 
    | mjevans wrote:
    | Yes. A proper mail implementation won't randomize it, so it
    | should be consistent. Depending on the design of the
    | validation stages it might only need a case-normalized lookup
    | to the string index table and that id as part of the final
    | multi-column unique index, and in that last case only to
    | convey to the database the expectations to optimize around
    | (and require the data to adhere to).
 
  | junon wrote:
  | Indexing existed 15 years ago. The article never mentions why
  | indexing didn't solve this problem. Super weird take on the
  | author's part...
 
    | quickthrower2 wrote:
    | Surely more than 15 years! 50 maybe?
 
    | ericbarrett wrote:
    | Early InnoDB* had pretty strict limits on varchar indexes and
    | was not the most efficient. I don't remember the details but
    | it's entirely possible the single-table format Rachel
    | described ran head on into those limitations.
    | 
    | Also remember indexes take space, and if you index all your
    | text columns you'll balloon your DB size; and this was 2002,
    | when that mattered a lot more even for text. Indexes also add
    | write and compute burden for inserts/updates as now the DB
    | engine has to compute and insert new index entries in
    | addition to the row itself.
    | 
    | Finally, normalizing your data while using the file-per-table
    | setting (also not the default back then) can additionally
    | provide better write & mixed throughput due to the writes
    | being spread across multiple file descriptors and not
    | fighting over as many shared locks. (The locking semantics
    | for InnoDB have also improved massively in the last 19
    | years.)
    | 
    | * Assuming the author used InnoDB and not MyISAM. The latter
    | is a garbage engine that doesn't even provide basic
    | crash/reboot durability, and was the default for years; using
    | MyISAM was considered the #1 newbie MySQL mistake back then,
    | and it happened all the time.
 
      | noduerme wrote:
      | To me it seems that spreading it over four tables would
      | lead to a lot more potential read locks while the big
      | combined table is waiting for a join on each of the others,
      | and some process is trying to insert on the others and link
      | them to the main. This is assuming they were using foreign
      | keys and the main table 4-column index was unique.
 
      | cogman10 wrote:
      | > Also remember indexes take space
      | 
      | Indexes take space, except for the clustered index.
      | 
      | An important distinction.
      | 
      | If the point of this table is always selecting based on IP,
      | m_from, m_to, then clustering on those columns in that
      | order would make sense.
      | 
      | Of course, if it's expected that a lot of other query
      | patterns will exist then it might make sense to only
      | cluster on one of those columns and build indexes on the
      | others.
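      | 
      | e.g. for InnoDB, where the table is clustered on the primary
      | key (a sketch; column names from the article, table name and
      | sizes invented):
      | 
      |     CREATE TABLE quads (
      |       ip     INT UNSIGNED NOT NULL,
      |       m_from VARCHAR(190) NOT NULL,
      |       m_to   VARCHAR(190) NOT NULL,
      |       helo   VARCHAR(190) NOT NULL,
      |       time   DATETIME     NOT NULL,
      |       -- rows physically ordered by the common lookup columns
      |       PRIMARY KEY (ip, m_from, m_to, helo)
      |     ) ENGINE=InnoDB;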
 
| kofejnik wrote:
| The initial design was fine and btw it did not violate 3NF,
| actually (if you add auto increment pk). A single composite index
| would've most likely solved all problems
 
| krisgenre wrote:
| My piano teacher once told me that it's common to make a mistake,
| but it's disastrous if you keep practicing without realizing that
| you have made a mistake. What I look for in a developer is their
| ability to realize a mistake and find ways to fix it.
 
| marginalia_nu wrote:
| https://news.ycombinator.com/item?id=29137141
 
| cuu508 wrote:
| > The observation was that we could probably store the IP
| address, HELO string, FROM address and TO address in a table, and
| send back a 4xx "temporary failure" error the first time we saw
| that particular tuple (or "quad"). A real mail server which did
| SMTP properly would retry at some point, typically 15 minutes to
| an hour later. If it did retry and enough time had elapsed, we
| would allow it through.
| 
| I've run into this form of greylisting, it's quite annoying. My
| service sends one-time login links and authorization codes that
| expire in 15 minutes. If the email gets delayed, the user can
| just try again, right? Except I'm using AWS SES, so the next
| email may very well come from a different address and will get
| delayed again.
 
  | loeg wrote:
  | You could have your codes expire after an hour or day instead?
 
    | cuu508 wrote:
    | That's _a_ solution, but with tradeoffs:
    | 
    | * security, an attacker has more time to intercept and use
    | the links and codes.
    | 
    | * UX, making the user wait 15+ minutes to do certain actions
    | is quite terrible.
    | 
    | I've had a couple support requests about this. The common
    | theme seems to be the customer is using Mimecast, and the fix
    | is to add my sender address in a whitelist somewhere in their
    | Mimecast configuration.
 
| alksjoflkajsom wrote:
| People in the comments are getting (rightfully) outraged about
| the poor understanding of indexing, but I'm a little surprised
| that everyone here doesn't seem to understand normalization
| either. The original schema is perfectly normalized and is
| already in 3NF: none of the columns shown has a dependence on any
| of the other columns outside of the primary key (in other words,
| if you knew eg the values of the ip, helo, and from columns,
| you'd still have no information about the "to" column).
| 
| Normalization does not mean "the same string can only appear
| once". Mapping the string "address1@foo.bar" to a new value "id1"
| has no effect on the relations in your table. Now instead of
| "address1@foo.bar" in 20 different locations, you have "id1" in
| 20 different locations. There's been no actual "deduplication",
| but that's again not the point. Creating the extra tables has no
| impact on the normalization of the data.
 
  | [deleted]
 
  | slim wrote:
  | Plus depending on the amount of emails they get, that
  | optimisation could be unnecessary. That database schema was
  | perfectly fine in some cases.
 
  | colanderman wrote:
  | Came here to say this. Whatever may have been true of InnoDB 20
  | years ago, don't follow this article's advice for any modern
  | relational database.
  | 
  | To add: Not sure about MySQL, but `varchar`/`text` in
  | PostgreSQL for short strings like those in the article is very
  | efficient. It basically just takes up space equaling the length
  | of the string on disk, plus one byte [1].
  | 
  | [1] https://www.postgresql.org/docs/current/datatype-
  | character.h...
 
  | mixmastamyk wrote:
  | Thank you. Was thinking that I was the clueless one...
  | (probably a little) because that is not my understanding of
  | what normalization is.
  | 
  | I thought perhaps swapping the strings to integers might make
  | it easier to index, or perhaps it did indeed help with
  | deduplication in that the implementation didn't "compress"
  | identical strings in a column--saving space and perhaps helping
  | performance. But both issues appeared to be implementation
  | issues with an unsophisticated Mysql circa 2000, rather than a
  | fundamentally wrong schema.
  | 
  | I agreed with her comment at the end about not valuing
  | experience, but proper databasing should be taught to every
  | developer at the undergraduate level, and somehow to the self-
  | taught. Looking at comptia... they don't seem to have a db/sql
  | test, only an "IT-Fundamentals" which touches on it.
 
  | nzealand wrote:
  | > Normalization does not mean "the same string can only appear
  | once".
  | 
  | It can, if you want to easily update an email address, or
  | easily remove all references to an email address because it is
  | PII.
 
    | alksjoflkajsom wrote:
    | That's not what normalization normally means, no. See eg
    | wikipedia
    | https://en.wikipedia.org/wiki/Database_normalization
 
  | noduerme wrote:
  | >>> Mapping the string "address1@foo.bar" to a new value "id1"
  | has no effect on the relations in your table.
  | 
  | How do you mean? If id1 is unique on table A, and table B has a
  | foreign key dependency on A.id, then yeah you still have id1 in
  | twenty locations but it's normalized in that altering the
  | referenced table once will alter the joined value in all twenty
  | cases.
  | 
  | This might not be important in the spam-graylisting use case,
  | and very narrowly it might be 3NF as originally written, but it
  | certainly wouldn't be if there were any other data attached to
  | each email address, such as a weighting value.
 
  | twic wrote:
  | This is very slightly not quite true, because the HELO string
  | and the remote address should go hand-in-hand.
 
    | alksjoflkajsom wrote:
    | You might be right (I don't actually know what a HELO string
    | is, I don't know anything about SMTP :). I was just going off
    | how the author presented the data, as a tuple of four
    | completely independent things.
    | 
    | Of course the main point still stands, that the two schemas
    | are exactly as normalized as each other.
    | 
    | Edit: rereading the original post, the author mentions that
    | "they forged...the HELO"--so perhaps there was indeed no
    | relationship between HELO and IP here. But again, I don't
    | know anything about SMTP, so this could be wrong.
 
      | Brian_K_White wrote:
      | I do know about SMTP, and you were right regardless, because
      | the author was talking about 4 database fields, not SMTP.
      | The details of SMTP are irrelevant.
 
      | GauntletWizard wrote:
      | It's (somewhat) because the HELO is forged that there's no
      | relationship between HELO and IP. The very first thing the
      | client sends in SMTP is "HELO <hostname>", where the
      | hostname can either be a unique identifier
      | (server1.company.com, etc.) or a system-level identifier
      | (mta.company.com for all of your company's outbound mail
      | agents, or in the case of bulk mailers they might use
      | bulk.client1.com when sending as client1, bulk.client2.com,
      | etc.). But there is/was no authentication on what you send
      | as HELO (now you can verify it against the TLS certificate,
      | though many implementations don't do that at all, or don't
      | do it well), so correlating based on the hostname in HELO
      | was questionable at best. Thus, the combination of
      | (HELO, IP) was effectively a single value, taken as a tuple.
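      | 
      | A rough illustration of how little the server can check
      | (hostnames made up):
      | 
      |   $ telnet mx.example.net 25
      |   220 mx.example.net ESMTP
      |   HELO bulk.client1.com
      |   250 mx.example.net
      | 
      | The server doesn't verify that the connecting host really is
      | bulk.client1.com; it just records whatever it was told.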
 
        | Izkata wrote:
        | > But there is/was no authentication on what you send as
        | HELO
        | 
        | Yep, and that explains the "foobar" rows - those should
        | have resolved to the same IP, except that, because
        | there's no authentication blocking it, you could put
        | gibberish there and the SMTP server would accept it.
        | 
        | > so correlating based on the hostname in HELO was
        | questionable at best
        | 
        | Eh, spambots from two different IPs could have both
        | hardcoded "foobar" because of the lack of authentication,
        | so I could see this working to filter
        | legitimate/illegitimate emails from a compromised IP.
 
      | [deleted]
 
| [deleted]
 
| dorianmariefr wrote:
| id | ip_address | from | to | content | updated_at | created_at
| 
| And indexes where you need them. Normalize data like IP
| addresses, from and to. And you are good to go.
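| 
| Roughly like this (a sketch; "from"/"to" renamed since they're
| reserved words, and the index is a guess at the lookup pattern):
| 
|   CREATE TABLE mail_log (
|     id         BIGINT AUTO_INCREMENT PRIMARY KEY,
|     ip_address VARCHAR(45)  NOT NULL,
|     mail_from  VARCHAR(255) NOT NULL,
|     mail_to    VARCHAR(255) NOT NULL,
|     content    TEXT,
|     updated_at DATETIME NOT NULL,
|     created_at DATETIME NOT NULL,
|     KEY idx_lookup (ip_address, mail_from, mail_to)
|   );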
 
| fullstackchris wrote:
| I guess I've come away with a totally different takeaway than
| most. This post is rather strong on the blame game, which could
| be fixed by one thing...
| 
| RTFD! (Read The F**in Docs!) - I only skimmed the post, but it's
| definitely something that would have been avoided had some SQL
| documentation or introduction been read.
| 
| I'd argue one of the huge things that differentiates "senior"
| developers from all the "other" levels - we're not smarter or
| more clever than anyone else - is that we read up on the tools we
| use, see how they work, and read how others have used them
| before. I understand this was from 2002, but MySQL came out in
| 1995 - there was certainly at least a handful of books on the
| topic.
| 
| Perhaps when just starting off as an intern, you may be able to
| argue that you are 'operating in a vacuum', but any number of
| introductory SQL books or documentation could quickly reveal
| solutions to the problem encountered in the post. (Some of which
| are suggested in these comments).
| 
| Of course we all make mistakes in software - I definitely could
| see myself creating such a schema and forgetting to add any sort
| of indexing - but when running into performance issues later, the
| only way you'll be able to know what to do next to fix it is by
| having read literature about details of the tools you are using.
| 
| Operating in a vacuum? Then break out of it and inform yourself.
| Many people have spent many hours creating good documentation and
| tutorials on many, many software tools - use them.
 
  | fwip wrote:
  | The problem here is "unknown unknowns," coupled with the real
  | business pressure to get solutions out the door. The MySQL
  | manual is (currently) over 6,000 pages. A developer new to SQL
  | or relational DBs doesn't know how many of those are necessary
  | to read and comprehend to deploy a working solution.
  | 
  | And in this case, she designed and deployed the system, and it
  | worked and met the business needs for several months. When
  | performance became an issue, she optimized. I'm sure she had
  | plenty of other unrelated work to do in the meantime,
  | especially as a lead/solo dev in the early 2000's.
  | 
  | Sounds like a productive developer to me.
 
| GuB-42 wrote:
| The problem, I think, is that most people, me included, don't
| really know what databases actually do. There is a whole lot
| written about optimizing procedural code: a variety of tools, the
| dangers of premature optimization, the tradeoff with readability.
| Anyone with a computer science degree has heard about algorithmic
| complexity, caches, etc...
| 
| But databases are just magic. You try things out, usually
| involving a CREATE INDEX at some point, and sometimes it gets
| faster, so you keep it.
| 
| Rachel, in her blog post, is a good example of that thought
| process. She used a "best practice", added an index (there is
| always an index) and it made her queries faster, cool. I don't
| blame her, it works, and it is a good reminder of the 3NF
| principle. But work like that on procedural code and I'm sure we
| would get plenty of reactions like "why no profiler?".
| 
| Many, many programmers write SQL, but very few seem to know about
| query plans and the way the underlying data structures work. It
| almost looks like secret knowledge of the DBA caste or something.
| I know it is all public knowledge of course, but it is rarely
| taught, and the little I know about it is all from personal
| curiosity.
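| 
| Even just putting EXPLAIN in front of a slow query goes a long
| way (MySQL syntax here, and the table name is invented):
| 
|   EXPLAIN SELECT * FROM greylist
|    WHERE ip = '10.1.2.3' AND helo = 'foobar';
| 
| If the "type" column says ALL and "rows" is huge, you're doing a
| full table scan; ref/range with a small row estimate means an
| index is actually being used.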
 
  | musingsole wrote:
  | > But databases are just magic.
  | 
  | 100%!!!
  | 
  | I hate how developers talk about a "database" as a monolithic
  | concept. It's an abstract concept with countless
  | implementations built on competing philosophies of that
  | abstraction. SQL is only slightly more concrete, but there are
  | as many variants and special quirks of SQL dialects out there
  | as there are databases.
 
| matsemann wrote:
| > _The first time you encounter something, you 're probably going
| to make some mistakes. There's a post going around tonight about
| how someone forgot to put an index on some database thing and
| wound up doing full table scans (or something like that)._
| 
| Which post is referenced here?
| 
| I also believe this to be a tooling issue. It's often opaque what
| ends up being run after I've done something in some framework
| (Java JPA, Django queries, whatever): how many queries run (are
| n+1 issues kept at bay?), how the queries will behave, etc.
| Locally with little data everything is fine, until it blows up in
| production. But you may not even notice it blowing up in
| production, because that relies on someone having instrumented
| the db and pushed logs+alarms somewhere. So it's easy to remain
| clueless.
 
  | claytonjy wrote:
  | Pretty sure it's this one
  | https://news.ycombinator.com/item?id=29132572
 
| webstrand wrote:
| While the normalized version is more compact and doesn't store
| redundant data, it _also_ needs an index on the four columns or
| it'll have to check every row in the table. A similar index added
| to the original denormalized table would have given comparable
| query performance.
| 
| The table schema isn't terrible, it's just not great. It's a fine
| first pass, to be optimized once it's discovered to be overly
| large.
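| 
| Something like this on the original wide table (column names
| approximated, since the post doesn't show the exact schema)
| would have served the same lookup:
| 
|   CREATE UNIQUE INDEX idx_tuple
|       ON greylist (ip, helo, mail_from, rcpt_to);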
 
  | [deleted]
 
  | Closi wrote:
  | > The table schema isn't terrible, it's just not great.
  | 
  | It depends - if this was the full use-case then maybe a single
  | table is actually a pretty good solution.
 
  | [deleted]
 
| ehershey wrote:
| Proud of myself for recognizing the conceit (pun intended) here
| pretty quickly. Good point in the end too!
 
| Donckele wrote:
| Sorry Rachel, but the best solution was a simple fix, as
| mentioned by most people here - columns with the right types and
| indexes.
| 
| When I was still wet behind the ears I was doing what you are now
| proposing - but it's overkill.
 
| jugg1es wrote:
| One of the most useful lessons I ever learned about designing
| database schemas was the utter uselessness of indexing and
| searching on datetime fields. Since virtually every value is
| going to be different, indexing and searching on that field is
| (almost) no better than having no index on the datetime field at
| all.
| 
| It was a revelation to me when I decided to experiment with
| having an indexed date-ONLY field and using that to search
| instead. It improved performance by almost two orders of
| magnitude. In hindsight, this should have been totally obvious if
| I had stopped to think about how indexing works. But, as is
| stated in this article, we have to keep relearning the same
| lessons. Maybe I should write an article about how useless it is
| to index a datetime field...
| 
| EDIT: This was something I ran into about 10 years ago. It is
| possible there was something else going on at the time that I
| didn't know about that caused the issue I was seeing. This is an
| anecdote from my past self. I have not had to use this technique
| since then, and we were using an on-prem server. It's possible
| that the rest of the table was not designed well and the index I
| was trying to use was already inefficient for the resources that
| we had at the time.
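| 
| The workaround, roughly (reconstructed from memory, so the names
| are approximate):
| 
|   ALTER TABLE events ADD COLUMN event_date DATE;
|   UPDATE events SET event_date = DATE(event_time);
|   CREATE INDEX idx_event_date ON events (event_date);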
 
  | dilyevsky wrote:
  | You're maybe thinking of compression? The primary id is unique
  | too, but an indexed search is still O(log n), compared to a
  | full table scan's O(n).
 
  | blibble wrote:
  | there are more index types than just hashes (which do indeed
  | only support equality)
  | 
  | e.g. btrees allow greater than/less than comparisons
  | 
  | postgres supports these and many more:
  | https://www.postgresql.org/docs/9.5/indexes-types.html
 
    | jugg1es wrote:
    | I was using a BTREE index and we were doing greater/less than
    | queries. It was not a hash index.
 
  | ComodoHacker wrote:
  | Did you also learn that certain indexes allow for range
  | queries, and modern databases are quite efficient at those?
  | 
  | And yes, please write an article, it would be quite
  | interesting. With data and scripts to reproduce, of course.
 
  | pushrax wrote:
  | This entirely depends on what kind of index is used. A sorted
  | index, such as a B or B+ tree (used in many SQL databases),
  | will allow for fast point/range lookups in a continuous value
  | space. A typical inverted index or hash based index only allows
  | point lookups of specific values in a discrete value space.
 
    | jugg1es wrote:
    | It was a sorted BTREE index in MySQL 5.x. I agree that it's
    | supposed to be fast, but it just wasn't for some reason.
 
      | pushrax wrote:
      | Are you sure it was actually using the index you expected?
      | There are subtleties in index field order that can prevent
      | the query optimizer from using an index that you might
      | think it should be using.
      | 
      | One common misstep is having a table with columns like (id,
      | user, date), with an index on (user, id) and on (date, id),
      | then issuing a query like "... WHERE user = 1 and date >
      | ...". There is no optimal index for that query, so the
      | optimizer will have to guess which one is better, or try to
      | intersect them. In this example, it might use only the
      | (user, id) index, and scan the date for all records with
      | that user. A better index for this query would be (user,
      | date, id).
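      | 
      | To make that concrete (table name invented, columns as
      | above):
      | 
      |   -- either index alone leaves part of the predicate to scan
      |   CREATE INDEX idx_user_id ON events (`user`, id);
      |   CREATE INDEX idx_date_id ON events (`date`, id);
      | 
      |   -- covers WHERE `user` = ? AND `date` > ? directly
      |   CREATE INDEX idx_user_date_id ON events (`user`, `date`, id);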
 
      | Izkata wrote:
      | One of the more bizarre things we'd found in MySQL
      | 5.something was that accidentally creating two identical
      | indexes significantly slowed down queries that used them.
      | 
      | I wouldn't be surprised if you hit some sort of similar
      | strange bug.
 
    | whoknowswhat11 wrote:
    | I was just going to make this comment.
    | 
    | Postgres as far as I know uses B-tree by default.
    | 
    | You can switch sort order I think for this as well, so "most
    | recent" becomes more efficient.
    | 
    | Multi-column indexes also work; if you are just searching on
    | the first column, Postgres can still use a multi-column index.
 
  | dreyfan wrote:
  | B-Tree indexes perform just fine on datetime fields. Were you
  | using hash or inverted indexes maybe?
 
  | Diggsey wrote:
  | > Since virtually every value is going to be different,
  | indexing and searching on that field is (almost) no better than
  | having no index on the datetime field at all.
  | 
  | Do you mean exact equality is rarely what you want because most
  | of the values are different?
  | 
  | Or are you talking about the negative effect on index size of
  | having so many distinct values?
  | 
  | I think the latter point could be quite database-dependent, eg.
  | BTree de-duplication support was only added in Postgres 13.
  | However, you could shave off quite a bit just from the fact
  | that storing a date requires less space in the index than a
  | datetime.
 
  | drchickensalad wrote:
  | I have no idea how you came to this conclusion, but indices on
  | datetime fields are completely required to bring down the seek
  | time from O(n) to O(log n), plus the length of the range.
  | Massive parts of the businesses I've worked at would be simply
  | impossible without this.
  | 
  | The cardinality of the index has other ramifications that
  | aren't generally important here.
 
  | hn_throwaway_99 wrote:
  | This is a very strange, and incorrect, conclusion to have come
  | to. It doesn't matter that datetime values are unique, as most
  | of the time you are not searching for a particular datetime but
  | a range, i.e. "show me all of the rows created between date X
  | and Y". In that case, the ordering of the index makes it
  | efficient to do that query.
  | 
  | Furthermore, oftentimes your query will have an "ORDER BY
  | dateTimeCol" clause on it (or, more commonly, ORDER BY
  | dateTimeCol DESC). If your index is created correctly, rows can
  | be returned with a quick index scan instead of needing to
  | implement a sort as part of the execution plan.
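  | 
  | For example (names invented), a query like
  | 
  |   SELECT * FROM events
  |    WHERE created_at >= '2021-11-01' AND created_at < '2021-11-08'
  |    ORDER BY created_at DESC
  |    LIMIT 100;
  | 
  | can be answered with a single backwards range scan of a B-tree
  | index on created_at, with no separate sort step.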
 
  | mceachen wrote:
  | Your result is surprising: I suspect your wide table with an
  | index wasn't in cache, and your timestamp-only table was.
  | 
  | The whole point is that indexes (can) prevent full table scans,
  | which are expensive. This is true even if the column(s) you're
  | indexing have high cardinality: but it relies on performant
  | table or index joins (which, if your rdbms is configured with
  | sufficient memory, should be the case).
 
| nlitened wrote:
| > The rub is that instead of just being slow, it also cost a fair
| amount of money because this crazy vendor system charged by the
| row or somesuch. So, by scanning the whole table, they touched
| all of those rows, and oh hey, massive amounts of money just set
| ablaze!
| 
| Why _the hell_ is nobody mentioning that using a database that
| charges per row touched is absolute insanity? When did it become
| so normal that nobody even mentions it?
 
  | lern_too_spel wrote:
  | If the amount charged weren't proportional to the work done,
  | the cloud provider would quickly go out of business.
 
    | nlitened wrote:
    | My VPS provider gives me unlimited traffic, unlimited disk
    | reads, and unlimited DB rows touched for a fixed 30 dollars a
    | month, regardless of how much CPU and RAM I keep loaded.
    | 
    | This makes me think that these things are at least 100x
    | cheaper than AWS might want to make me believe.
 
  | ricardobeat wrote:
  | You'll probably enjoy reading this article that was on the
  | front page yesterday:
  | https://briananglin.me/posts/spending-5k-to-learn-how-databa...
 
| [deleted]
 
| inopinatus wrote:
| The stench of *splaining in these remarks is nigh overwhelming,
| and mostly missing the point to boot.
 
___________________________________________________________________
(page generated 2021-11-07 23:00 UTC)