|
| arein3 wrote:
| to me the first implementation seems the best, it just needed a
| multi column index
|
| the 3NF "optimized" form seems worse and unreadable. I would
| advise to normalize only when it's needed(complex/evolving
| model), in this case it was not needed
| biglost wrote:
| Or store the ip address as int64 with index, or better both
| solutions together
| stickfigure wrote:
| Assuming this needs to be optimized for massive scale, just hash
| the values to a single indexed field.
|
| And use something other than an RDBMS. Put the hash in Redis and
| expire the key; your code simply does an existence check for the
| hash. You could probably handle gmail with a big enough cluster.
|
| That super-normalized schema looks terrible.
| kodt wrote:
| I too recommend using Redis in 2002.
| stickfigure wrote:
| Fair enough, but I think the issue here is recognizing that
| the underlying business operation is a hit test on an
| expiring hash key. You could have used MySQL as a key/value
| store or looked for something more specialized.
| kazinator wrote:
| > _The whole system was reworked to have a table for each set of
| values. There was now a table of IP addresses, a table of HELO
| strings, a table of FROM addresses, and a table of TO addresses._
|
| Lisp interning, in database tables. A.k.a. Flygweight Pattern.
| ccleve wrote:
| Sorry, no. The original schema was correct, and the new one is a
| mistake.
|
| The reason is that the new schema adds a great deal of needless
| complexity, requires the overhead of foreign keys, and makes it a
| hassle to change things later.
|
| It's better to stick the the original design and add a unique
| index _with key prefix compression_ , which all major databases
| do these days. This means that the leading values gets compressed
| out and the resulting index will be no larger and no slower than
| the one with foreign keys.
|
| If you include all of the keys in the index, then it will be a
| covering index and all queries will hit the index only, and not
| the heap table.
| silisili wrote:
| I think it depends a lot on the data. If those values like IP,
| From, To, etc keep repeating, you save a lot of space by
| normalizing it as she did.
|
| But strictly from a performance aspect, I agree it's a wash if
| both were done correctly.
| philliphaydon wrote:
| Space is cheap now tho. Better to duplicate some data and
| avoid a bunch of joins than to worry about saving a few gb of
| space.
| wvenable wrote:
| Using space to avoid joins will not necessarily improve
| performance in an RDBMS -- it might even make it worse.
| kasey_junk wrote:
| This system was written in 2002.
| acdha wrote:
| It's not that easy: you need to consider the total size and
| cardinality of the fields potentially being denormalized,
| too. If, say, the JOINed values fit in memory and,
| especially, if the raw value is much larger than the key it
| might be the case that you're incurring a table scan to
| avoid something which stays in memory or allows the query
| to be satisfied from a modest sized index. I/O isn't as
| cheap if you're using a SAN or if you have many concurrent
| queries.
| nerdponx wrote:
| And even if this is somehow not an ideal use of the database,
| it's certainly far from "terrible", and to conclude that the
| programmer who designed it was "clueless" is insulting to that
| programmer. Even if that programmer was you, years ago (as in
| the blog post).
|
| Moreover, unless you can prove with experimental data that the
| 3rd-normal-form version of the database performs significantly
| better or solves some other business problem, then I would
| argue that refactoring it is strictly worse.
|
| There are good reasons not to use email addresses as primary or
| foreign keys, but those reasons are conceptual ("business
| logic") and not technical.
| late2part wrote:
| Deleted
| ccleve wrote:
| If you're suggesting that we should double the size of the
| hardware rather than add a single index, then I respectfully
| disagree.
|
| But your point is well-taken. Hardware is cheap.
| emerongi wrote:
| To a degree, this is true though. An engineer's salary is a
| huge expense for a startup, it's straight up cheaper to spend
| more on cloud and have the engineer work on the product
| itself. Once you're bigger, you can optimize, of course.
| late2part wrote:
| Deleted
| emerongi wrote:
| If this is a case from real life that you have seen, you
| should definitely elaborate more on it. Otherwise, you
| are creating an extreme example that has no significance
| in the discussion, as I could create similarly ridiculous
| examples for the other side.
|
| There are always costs and benefits to decisions. It
| seems that are you only looking at the costs and none of
| the benefits?
| late2part wrote:
| Deleted
| bigbillheck wrote:
| > compute is cheap
|
| There was a whole thread yesterday about how a dude found out
| that it isn't: https://briananglin.me/posts/spending-5k-to-
| learn-how-databa... (also mentioned in the RbtB post)
| rustc wrote:
| Depends on the provider. I've checked the pricing of 4 top
| relational database cloud vendors (Google, Amazon, Azure,
| IBM) and they all charge for the number of CPU cores and
| RAM you buy, not by the number of rows scanned like
| PlanetScale did in the post you refer to. They'll be more
| expensive than buying your own server but not nearly what
| PlanetScale charges for full table scan queries.
| late2part wrote:
| When do the rest of the folks figure it out?
| kasey_junk wrote:
| She doesn't mention the write characteristics of the system but
| she implies that it was pretty write heavy.
|
| In that case it's not obvious to me that putting a key prefix
| index on every column is the correct thing to do, because that
| will get toilsome very quick in high write loads.
|
| Given that she herself wrote the before and after systems 20
| years ago and that the story was more about everyone having
| dumb mistakes when they are inexperienced perhaps we should
| assume the best about her second design?
| ccleve wrote:
| It's not an index on every column. It's a single index across
| all columns.
| noisy_boy wrote:
| Which will be updated every time there is a new
| combination.
| Animats wrote:
| Which is OK. It's not like the whole index has to be
| rebuilt. It's cheaper than O(log N).
| [deleted]
| maddynator wrote:
| One thing that worth taking into consideration is that this
| happened in 2002. When the databases were not in cloud, the ops
| was done by dba's and key prefix compression thats omnipresent
| today was likely not that common or potentially not even
| implemented/available.
|
| But i don't think the point of the post is whats right/wrong
| way of doing it. The point as mentioned by few here is that
| programmers makes mistakes. They are costly and will be costly
| if in tech industry, we continue to boot experienced
| engineers... the tacit knowledge those engineers have gained wi
| ll not be passed on and this means more people have to figure
| things out by themselves
| Philip-J-Fry wrote:
| >One thing that worth taking into consideration is that this
| happened in 2002. When the databases were not in cloud, the
| ops was done by dba's
|
| Are you implying that isn't the case today? Thousands of
| (big) companies are still like that and will continue to be
| like that.
|
| I write my own SQL, design tables and stuff, submit it for a
| review by someone 10x more qualified than myself, and at the
| end of the day I'll get a message back from a DBA saying "do
| this, it's better".
| mixmastamyk wrote:
| The question of it being in the cloud or not is highly
| orthogonal to schema design.
| WJW wrote:
| If you have one, more power to you. I'm currently making a
| good living as a freelance DBA/SRE for startups. With a
| team of 3-5 devs, some of which frontend and/or mobile
| devs, proper database knowledge is definitely thin on the
| ground in some places.
| hinkley wrote:
| In 2002, you started seeing major increases in query run time
| with as little as 5 joins. Once you hit six you had to start
| thinking about rearchitecting your data or living with slow
| response times.
|
| There was a lot of pressure to relax 3NF as being too
| academic and not practical.
|
| Around then, I had a friend who was using a pattern of
| varchar primary keys so that queries that just needed the
| (unique) name and not the metadata could skip the join. We
| all acted like he was engaging in the Dark Arts.
| adfgaertyqer wrote:
| You don't really need compression. Rows only need to persist
| for about an hour. The table can't be more than a few MiB.
|
| We can debate the Correct Implementation all day long. The
| fact of the matter is that adding _any_ index to the original
| table, even the wrong index, would lead to a massive speedup.
| We can debate 2x or 5x speedups from compression or from
| choosing a different schema or a different index, but we get
| 10,000x from adding any index at all.
| ipaddr wrote:
| Just to make this fun.
|
| Adding an index now increases the insert operation
| cost/time and adds additional storage.
|
| If insert speed/volume is more important than reads keep
| the indexes away. Replicate and create an index on that
| copy.
| dudeinjapan wrote:
| I agree. In the "new" FK-based approach, you'll still need to
| scan indexes to match the IP and email addresses (now in their
| own tables) to find the FKs, then do one more scan to match the
| FK values. I would think this would be significantly slower
| than a single compound index scan, assuming index scans are
| O(log(n))
|
| Key thing is to use EXPLAIN and benchmark whatever you do. Then
| the right path will reveal itself...
| msqlqthrowaway wrote:
| > all major databases do these days
|
| Did everyone on HN miss that the database in question was
| whichever version of MySQL existed in 2002?
| mst wrote:
| Which I was using in production at the time, and, yeah, what
| she ended up doing was a much better plan for that thing.
| jeremyjh wrote:
| It actually doesn't matter. Even simple indexes would have
| made that schema work just fine, and MySQL could certainly
| index strings in 2002.
| omegalulw wrote:
| > Sorry, no. The original schema was correct, and the new one
| is a mistake.
|
| Well, you also save a space by doing this (though presumably
| you only need to index the emails as IPs are already 128 bits).
|
| But other than that, I'm also not sure why the original schema
| was bad.
|
| If you were to build individual indexes on all four rows, you
| would essentially build the four id tables implicitly.
|
| You can calculate the intersection of hits on all four indexs
| that match your query to get your result. This is linear in the
| number of hits across all four indexes in the worst case but if
| you are careful about which index you look at first, you will
| probably be a lot more efficient, e.g. (from, ip, to, helo).
|
| Even with a multi index on the new schema, how do you search
| faster than this?
| [deleted]
| an9n wrote:
| Wow what a twist. Somebody call M Night Shyamalan!
| welder wrote:
| The ending is the most important part.
|
| > Now, what do you suppose happened to that clueless programmer
| who didn't know anything about foreign key relationships?
|
| > Well, that's easy. She just wrote this post for you. That's
| right, I was that clueless newbie who came up with a completely
| ridiculous abuse of a SQL database that was slow, bloated, and
| obviously wrong at a glance to anyone who had a clue.
|
| > My point is: EVERYONE goes through this, particularly if
| operating in a vacuum with no mentorship, guidance, or reference
| points. Considering that we as an industry tend to chase off
| anyone who makes it to the age of 35, is it any surprise that we
| have a giant flock of people roaming around trying anything
| that'll work?
| deathanatos wrote:
| ... I appreciate her sentiment at the end there that we should
| try to change the status quo (which I think does suck), but I'm
| not sure how much power the average employee has over it. Most
| employers I've worked at seem loathe to retain anyone past
| about 2 years. (E.g., currently I'm in the 95'th percentile,
| after just over 2 years.) IME it takes about 6 months to really
| learn how a company's systems work to where to proficiency
| (yes, _six months_.) which means we 're spending ~25% of the
| time "training", where "training" is usually someone trying
| something crazy and failing, and getting corrected as opposed
| to some structured form of learning. Oftentimes the systems
| that these engineers are stumbling I think are the internal
| ones; they need more features, or refactoring to account for
| organic growth, but since they're not shiny new customer facing
| features they'll get exactly 0 priority from PMs. The engineers
| who build experience working with these systems and actually
| know what needs to change and how without breaking the existing
| use cases ... are about to leave the company, since nobody is
| retained beyond ~2 years.
|
| I also find my own team is usually strapped for resources,
| normally, people. (Usually politely phrased as "time".) Yes,
| one has to be wary of mythical man-month'ing it, but like my at
| my last employ we had essentially 2 of us on a project that
| could have easily used at least one, if not two more people.
| Repeat across every project and that employer was understaffed
| by 50-100%, essentially.
|
| Some company just went for S-1, and they were bleeding cash.
| But they _weren 't_ bleeding it into new ventures: they were
| bleeding it into marketing. Sure, that might win you one or two
| customers, but I think you'd make much stronger gains with new
| products, or less buggy products that don't drive the existing
| customers away.
|
| Also there's an obsession with "NIT" -- not invented there --
| that really ties my hands as an engineer. Like, everything has
| to be out sourced to some cloud vendor provider whose product
| only fits some of the needs and barely, and whose "support"
| department appears to be unaware of what a computer is. I'm a
| SWE, let me do my thing, once in a while? (Yes, where there's a
| good fit for an external product, yes, by all means. But these
| days my job is 100% support tickets, and like 3% actual
| engineering.)
| 5faulker wrote:
| Seems like the definition of hacker to me.
| shrimpx wrote:
| > tend to chase off anyone who makes it to the age of 35
|
| That's definitely not true anymore, if it ever was.
| Senior/staff level engs with 20+ years of experience are sought
| after and paid a ton of money.
| jjice wrote:
| The way she kept referring to the programmer made me suspicious
| that that would be the ending. I personally really like that as
| a way of conveying the message of us all having growing pains
| and learning the hard way from some inefficient code.
|
| I know I've also had these encounters (specifically in DBs),
| and I'm sure there are plenty more to come.
| carapace wrote:
| (It seems a lot of folks are getting nerd-sniped by the set-up
| and missing the moral of the story, eh?)
|
| I don't know the actual numbers, but it's been pointed out that
| at any given time something like half of all programmers have
| been doing it less than five years, for decades now.
|
| That, plus the strident ignorance of past art and practice,
| seem to me to bring on a lot of issues.
| lamontcg wrote:
| > It seems a lot of folks are getting nerd-sniped
|
| that is an excellent term.
| jacobolus wrote:
| Origin: https://xkcd.com/356/
| wwweston wrote:
| Old enough some people under 35 may not have encountered
| it... ;)
| sseagull wrote:
| Well, then, they are part of the 10,000
|
| https://xkcd.com/1053/
| misnome wrote:
| Only if they were born in the US
| lamontcg wrote:
| I'm 49, I don't know how I never saw that xkcd.
| egeozcan wrote:
| I feel like that phrase has been there forever, and I'm
| exactly 35. Will this happen more often as I grow older?
| Ugh. Feels weird. Maybe also a bit depressing.
| lamontcg wrote:
| It was weird seeing all the spongebob squarepants memes
| take over the internet when I was too old to ever grow up
| with that. I turned 28 in 1999 when that first aired.
| That was my "holy shit I'm so old" moment when that
| finally trickled up into my awareness as someone nearly
| turning 40 or so.
| yesenadam wrote:
| Same age as you. I have a lot of friends in the
| Philippines. I swear the Facebook employment info of half
| the people in the Philippines says they work at the
| Krusty Krab. (And for most filipinos, the internet =
| facebook.) For some reason I never asked what that meant,
| and for many years I thought that was just some odd joke,
| and was vaguely puzzled about how widespread it is.
| Eventually I happened on the Spongebob connection!
| beering wrote:
| > (It seems a lot of folks are getting nerd-sniped by the
| set-up and missing the moral of the story, eh?)
|
| The narrative should lead to the conclusion. If I told you
| the story of the tortoise and the hare in which the hare gets
| shot by a hunter, then said the moral is "perseverance wins",
| you'd be rightfully confused.
| [deleted]
| twic wrote:
| I don't think it's purely nerd-sniping. If your story is "at
| first you are bad, but then you get good", but your example
| is of a case where you did something fine but then replaced
| it with something worse, that rather undermines the story.
| mwcampbell wrote:
| If even someone with Rachel's level of experience still
| doesn't know all the minutiae of database optimization, I
| think that just amplifies her point about the importance of
| mentoring novices.
| duxup wrote:
| I recall reading a rant on another site about someone so
| upset they had to deal with clueless noobs at work.
|
| They then went on to list the errors that this brand new
| entry level employee had made when writing ... an
| authentication system ...
|
| I was more than a little shocked when I realized they
| were serious and hadn't realized the issue was sending
| the new entry level guy to do that job alone.
| HahaReally wrote:
| OR we could demand they get expensive irrelevant degrees
| and be geniuses at coding by the time they graduate with
| no relevant skills! Hey, in fact, let's just do that
| instead. :)
| adolph wrote:
| Chronology alone only progresses; progress sometimes
| regresses.
| kofejnik wrote:
| IMO she's still pretty clueless (sorry!), but over the years
| got better at self-promotion
|
| So the real lesson is to be very careful as to who you listen
| to
| crispyambulance wrote:
| She has never been "clueless" and has written insightfully
| for years.
| globular-toast wrote:
| Well she's clueless about what actually sped up this
| database query.
| Lammy wrote:
| imo "clueful"/clueless is more about being able to sense
| situations where there is something to be known rather
| than knowing everything upfront all the time (which is
| obviously impossible)
| yuliyp wrote:
| Without knowing which database engine (MySQL comes with
| more than one) she was using, nor the testing that she
| actually performed, what makes you say that?
| mst wrote:
| I remember versions of mysql whose string indexing was
| sufficiently limited on every available backend that this
| would've been the correct change.
|
| Hell, back in ... 2001? I think? I ended up making
| exactly the same set of changes to an early database
| design of mine, for pretty much the same stated reasons
| and with very similarly pleasant effects on its
| performance.
| mwcampbell wrote:
| The point isn't to determine who is or isn't clueless. The
| point is how we deal with each other's cluelessness.
| crispyambulance wrote:
| Yep. Folks are getting lost in the weeds discussing indexing of
| database tables. That's _totally_ beside the point here.
|
| The thing is, the first implementation was a perfectly fine
| "straight line" approach to solve the problem at hand. One
| table, a few columns, computers are pretty fast at searching
| for stuff... why not? In many scenarios, one would never see a
| problem with that schema.
|
| Unfortunately, "operating in a vacuum with no mentorship,
| guidance, or reference points", is normal for many folks. She's
| talking about the trenches of SV, it's even worse outside of
| that where the only help you might get is smug smackdowns on
| stackoverflow (or worse, the DBA stackexchange) for daring to
| ask about such a "basic problem".
| otterley wrote:
| I'd be a lot more sympathetic if the major RDMSes didn't have
| outstanding and thorough reference manuals or that there
| weren't a mountain of books on the subject that cover, among
| other things, the topic of indexing and its importance.
| MySQL's manual, for example, has covered this subject from
| the very beginning:
| https://dev.mysql.com/doc/refman/5.6/en/mysql-indexes.html (I
| don't have the 3.x manuals handy but it was there back then
| too).
|
| It's not clear from the article whether the author spent any
| time studying the problem before implementing it. If she
| failed to do so, it is both problematic and way more common
| than it ought to be.
|
| "Ready, fire, aim"
|
| But you know what they say: good judgment comes from
| experiences and experience comes from bad judgment.
|
| Compounding the problem here is that the author now has much
| more experience and in a reflective blog post, _still_ got
| the wrong answer.
|
| IMO the better lesson to take away here would have been to
| take the time getting to know the technology before putting
| it into production instead of jumping head first into it.
| That would be bar raising advice. The current advice doesn't
| advise caution; instead it perpetuates the status quo and
| gives "feel good" advice.
| jraph wrote:
| You can't really search a problem if you don't suspect its
| existence. At this point you might feel you have everything
| you need to start the implementation, how do you guess?
| dasil003 wrote:
| I couldn't disagree more with this comment. No amount of
| reading documentation teaches you how to build production
| systems. You progress much faster by getting your hands
| dirty, making mistakes, and learning from them.
|
| The challenge of writing good software is not about knowing
| and focusing on the perfection every gory detail, it's
| about developing the judgement to focus on the details that
| actually matter. Junior engineers who follow your advice
| will be scared to make mistakes and their development will
| languish compared to those who dive in and learn from their
| mistakes. As one gains experience it's easy to become
| arrogant and dismissive of mistakes that junior engineers
| make, but this can be extremely poisonous to their
| development.
| isoskeles wrote:
| I couldn't disagree more with this comment. No amount of
| getting your hands dirty and making mistakes teaches you
| how to learn from your mistakes, nor changes the impact
| of the mistakes. You also progress much faster by
| learning from other peoples' mistakes, this is why
| written language is so powerful.
|
| Honestly, I think your comment is okay minus the fact
| that you're trying to highlight such hard disagreement
| with a sentiment that people should read the fucking
| manual. Really, you couldn't disagree MORE?
|
| There is definitely value in RTFM. There are cases where
| making mistakes in production is not acceptable, and
| progressing by "making mistakes" is a mistake on its own.
| I don't think the case in this article sounds like one of
| those, but they do exist (e.g. financial systems (think
| about people losing money on crypto exchanges),
| healthcare, or, say, sending rockets into space). In many
| cases, making mistakes is fine in test systems, but
| completely, absolutely, catastrophically unacceptable in
| production. Although, I refuse to blame the junior
| engineer for such mistakes, I blame management that "sent
| a boy to do a man's job" (apologies for a dated idiom
| here).
|
| (As an aside, overall, I disagree with many of the
| comments here nitpicking the way the author solved the
| problem, calling her "clueless", etc. I really don't care
| about that level of detail, and while I agree the
| solution does not seem ideal, it worked for them better
| than the previous solution.)
| crispyambulance wrote:
| > No amount of getting your hands dirty and making
| mistakes teaches you how to learn from your mistakes...
|
| Categorically wrong.
|
| Mistakes, failure, and trial and error are very much a
| part of developing skills. If you're not making mistakes,
| you're also not taking enough risks and thus missing out
| on opportunities for growth.
| isoskeles wrote:
| Yes, mistakes are required to learn from them, but making
| mistakes does not imply that a person will learn from
| their mistakes.
| otterley wrote:
| Respectfully, I believe you are disagreeing with an
| argument that I am not making. It's not an either-or
| scenario. Both access to and reliance on reference
| materials _and_ the ability to safely experiment are part
| of the professional engineer 's toolbox. I can say this
| with confidence because I was a junior engineer once -
| it's not like I'm speaking without experience.
|
| _Everyone_ - new and experienced developers alike -
| should have a healthy fear of causing pain to customers.
| Serving customers - indirectly or otherwise - is what
| makes businesses run. As a principal engineer today my
| number one focus is on the customer experience, and I try
| to instill this same focus in every mentee I have.
|
| Does that mean that junior developers should live in
| constant fear and decision paralysis? Of course not.
|
| That's where the mentor - the more experienced and
| seasoned engineer - comes in. The mentor is roughly
| analogous to the teacher. And the teaching process uses a
| _combination_ of a mentor, reference materials, and the
| lab process. The reference materials provide the details;
| the lab process provides a safe environment in which to
| learn; and the mentor provides the boundaries to prevent
| the experiments from causing damage, feedback to the
| mentee, and wisdom to fill in the gaps between them all.
| If any of these are absent, the new learner 's
| development will suffer.
|
| Outstanding educational systems were built over the last
| 3 centuries in the Western tradition using this
| technique, and other societies not steeped in this
| tradition have sent their children to us (especially in
| the last half-century) to do better than they ever could
| at home. It is a testament to the quality of the
| approach.
|
| So no, I'm not advocating that junior developers should
| do nothing until they have read all the books front to
| back. They'd never gain any experience at all or make
| essential mistakes they could learn from if they did
| that. But junior developers should not work in a vacuum -
| more senior developers who both provide guidance _and_
| refer them to reference materials to study and try again,
| in my experience, lead to stronger, more capable
| engineers in the long term. "Learning to learn" and
| "respect what came before" are skills as important as the
| engineering process itself.
| crispyambulance wrote:
| > That's where the mentor [...] comes in.
|
| The thing is, specifically, that the OP did NOT have a
| mentor. They had a serious problem to solve, pronto, and
| knew enough to take a crack at it. OK, the initial
| implementation was suboptimal. So what? It's totally
| normal, learn and try again. Repeat.
|
| It would be nice if every workplace had a orderly
| hierarchy of talent where everyone takes care to nurture
| and support everyone else (especially new folks). And
| where it's OK to ask questions and receive guidance even
| across organizational silos, where there are guardrails
| to mitigate accidents and terrible mistakes. If you work
| in such an environment, you are lucky.
|
| It is far more common to have sharp-elbow/blamestorm
| workplaces which pretend to value "accountability" but
| then don't lift a finger to support anyone who takes
| initiative. I suspect that at the time Rachelbythebay
| worked in exactly this kind of environment, where it
| simply wasn't possible for an OPS person to go see the
| resident database expert and ask for advice.
| otterley wrote:
| Right. Hence my concern about the lack of helpful advice
| other than "accept yourself"[1]: Neophyte engineers
| lacking guardrails and local mentors should fall back to
| what they learned in school: experiment, use your books,
| and ask questions from the community. Mentors can be
| found outside the immediate workplace, after all.
|
| And when it comes to taking production risks, measure
| twice and cut once, just like a good carpenter.
|
| [1] Of course you should accept yourself. But that alone
| won't advance the profession or one's career.
| noduerme wrote:
| But it's also about constantly asking yourself "how can
| this be done better? What do I not know that could
| improve it?"
|
| I learned probably 80% of what I know about DB
| optimization from reading the Percona blog and S.O. The
| other 20% came from splashing around and making mistakes,
| and testing tons of different things with EXPLAIN. That's
| basically learning in a vacuum.
| [deleted]
| damagednoob wrote:
| Yes and no and this post highlights a subtle issue with
| mentorship (which I think is important): Technology does not
| stand still. What was true in 2002, might not be true today.
| While adopting the naive approach was detrimental back then,
| today databases recognise that this happens and provide easy
| workarounds to get you out of trouble that didn't exist back
| then.
|
| I've experienced this just by switching languages. C# had
| many articles dedicated to how much better StringBuilder was
| compared to String.Concat and yet, other languages would do
| the right thing by default. I would give advice in a totally
| different language about a problem that the target language
| did not have.
|
| As the song goes:
|
| "Be careful whose advice you buy but be patient with those
| who supply it
|
| Advice is a form of nostalgia, dispensing it is a way of
| fishing the past
|
| From the disposal, wiping it off, painting over the ugly
| parts
|
| And recycling it for more than it's worth"
| nightski wrote:
| That might be true if you just take blanket advice. The key
| is to find out _why_ say StringBuilder is better than
| String.Concat. If you understand the implementation details
| and tradeoffs involved, this makes the knowledge much more
| applicable in the future. The core concepts in technology
| do not move nearly as fast as individual projects,
| libraries, frameworks, languages, etc...
| unclebucknasty wrote:
| What's funny is there seems to be a lot of debate among
| grizzled veterans here about whether her old solution is
| actually better than the new. We're getting into index types,
| engine implementations, normal forms, etc. And really, this is
| a relatively simple development problem that I can easily
| imagine on an interview test.
|
| Now, imagine a new engineer, just getting started trying to
| make sense of it all, yet being met with harsh criticism and
| impatience.
|
| Maybe that was exactly her point--to get everyone debating over
| such a relatively trivial problem to communicate the messages:
| Good engineering is hard. Everyone is learning. Opinions can
| vary. Show some humility and grace.
| jraph wrote:
| I was reading her article like "yeah, well... wow, she's tough
| / hard with this programmer, it's not that unreasonable
| anyway!", until this passage.
|
| I would not expect her to be like this from what I've read on
| her blog so I was surprised. Well written!
| IceDane wrote:
| The moral of this post really falls flat coming from this author,
| most of whose posts are snarky, passive-aggressive rants where
| she rants about someone's alleged incompetence or something
| similar.
| enraged_camel wrote:
| This is a really weird ad hominem that is unrelated to the post
| itself.
| IceDane wrote:
| Ad hominem?
|
| It seems pretty relevant to me that the author mostly writes
| snarky articles which directly contradict the moral of this
| story.
|
| For the record, I totally agree with the moral of the story.
| But we can't just blindly ignore context, and in this case
| the context is that the author regularly writes articles
| where someone else is ridiculed because the author has deemed
| them incompetent.
|
| In a similar vein, no one would just ignore it and praise
| Zuckerberg if he started discussing the importance of
| personal privacy on his personal blog.
| avl999 wrote:
| The responses in this thread are classic pedantic HN with the
| actual point whoooshing over so many people's head.
| jeremyjh wrote:
| I think it actually makes quite a bit of a difference if 20
| years later the author still does not understand the actual
| problem and solution she is using as an example. It actually
| does undermine her argument that people make mistakes, learn
| from them and grow etc if the lack of growth is still in
| evidence 20 years later in the singular example under
| consideration.
| bizzarojerry wrote:
| No I'd say the responses in this thread are classic pedantic HN
| with users like you taking the high moral ground and looking
| down on people who actually are trying to address the technical
| issues with a post in a technical forum.
| dang wrote:
| We've banned this account for breaking the site guidelines.
| Creating accounts to do that with will eventually get your
| main account banned as well, so please don't!
|
| https://news.ycombinator.com/newsguidelines.html
| [deleted]
| [deleted]
| darkwater wrote:
| I'd really love to be snarky here but I'll try to be polite: all
| those comments about the example situation are missing the whole
| point of the post. And it really worries me that there is a good
| chunk of the tech workers that just ignores the real meaning of
| something and just nitpick about stupid implementation details.
| The post is about managing rookie errors, being empathetic and
| also warn the ageism that pervades the sector. TBH about this
| last point IDK the situation nowadays in Silicon Valley, but in
| Europe my limited experience is that ageism is not that bad; kt's
| actually difficult to find seasoned developers.
|
| Edit: typos
| Lewton wrote:
| I'm happy the hn comments are nitpicking, I'm not particularly
| well versed in database schemas but while reading it I was
| going "???????" and it's good to know I'm not going crazy
| 08-15 wrote:
| In this blog post, senior engineer Rachel talks nonsense about
| normalization and promotes a bafflingly complicated solution as
| superior to a simple one, without identifying the actual fix,
| which appears to have happened accidentally.
|
| In other words, with enough empathy and patience, a clueless
| rookie can grow into a clueless senior engineer!
|
| Rachel usually makes more sense than that. That's why people
| are nitpicking implementation details.
| mixmastamyk wrote:
| Hah, pretty accurate and kinda funny, but could be nicer. I
| still make mistakes.
| 08-15 wrote:
| Meanwhile, Rachel posted an update. Apparently, the whole
| section about how "the system got reworked" doesn't
| describe the proper solution, but her own (Rookie Rachel's)
| attempt to fix it.
|
| And in _that_ context, everything makes sense.
| hn_throwaway_99 wrote:
| The reason "the details are important" here are _not_ because
| of the nitty gritty around what mistakes a "novice" programmer
| made.
|
| They are important because the _present_ incarnation of the
| author is making _all the wrong_ diagnoses about the problems
| with the original implementation, _despite_ doing it with an
| air of "Yes, younger me was so naive and inexperienced, and
| present me is savvy and wise".
| newaccount74 wrote:
| I think she inadvertently made a different point, which is
| that even experienced developers sometimes misunderstand the
| problem and make mistakes.
|
| Or an even better argument: you don't need to actually
| understand the problem to fix it, often you accidentally fix
| the problem just by using a different approach.
| mst wrote:
| I would argue instead that this comment thread is making
| the point that people forget that things that work now
| wouldn't've worked then and design decisions have to be
| made based on the database engine you're running in
| production.
| jeremyjh wrote:
| MySQL could index strings in 2002. It would have worked
| fine.
| mst wrote:
| "mysql could index strings" and "using a compound index
| over four varchar columns would've worked out well" are
| significantly different propositions.
| jeremyjh wrote:
| Yes, we won't know because she didn't try it. And we know
| she didn't try it because the problem she described is
| table scans, not "indexed strings are somehow slow in
| MySQL circa 2002".
| mst wrote:
| Or she did try it and it didn't work - or researched the
| question and figured, quite reasonably based on my
| experience of mysql in that era, it probably wouldn't
| work - and kept the post to only the original problem and
| the final approach to avoid having even _more_ technical
| details in a post that wasn 't really about those.
|
| I agree that we don't know, but it seems a little unfair
| to her to treat every unknown as definitely being the
| most negative of the possibilities.
| hn_throwaway_99 wrote:
| I really don't understand your general thrust here. MySql
| certainly had lots of issues in 2003, but being able to
| support multi-column indexes was not one of them. Her
| analysis is simply wrong - it is wrong now and was wrong
| then. Here is the doc from MySql v3.23:
|
| 7.4.2 Multiple-Column Indexes MySQL can create composite
| indexes (that is, indexes on multiple columns). An index
| may consist of up to 16 columns. For certain data types,
| you can index a prefix of the column (see Section 7.4.1,
| "Column Indexes"). A multiple-column index can be
| considered a sorted array containing values that are
| created by concatenating the values of the indexed
| columns. MySQL uses multiple-column indexes in such a way
| that queries are fast when you specify a known quantity
| for the first column of the index in a WHERE clause, even
| if you do not specify values for the other columns
| mst wrote:
| As I said already: "mysql could index strings" and "using
| a compound index over four varchar columns would've
| worked out well" are significantly different
| propositions.
|
| To be more verbose about it - there is an important
| difference between "can be created" and "will perform
| sufficiently well on whatever (likely scavenged) hardware
| was assigned to the internal IT system in question."
|
| I wouldn't be surprised if the "server" for this system
| was something like a repurposed Pentium 233 desktop with
| a cheap spinning rust IDE drive in it, and depending on
| just how badly the spammers were kicking the shit out of
| the mail system in question that's going to be a fun
| time.
| lern_too_spel wrote:
| They are not significantly different. It just means
| sorting using more than one column.
| darkwater wrote:
| She explained the problem, the first not working solution and
| the second working solution as they really happened in 2002
| _as an example_. The real point is the last part of the post.
| And it was _not_ "how to properly implement an sql based
| filter for open proxies in your MTA".
| hn_throwaway_99 wrote:
| I get all that. But 2020 version of this person _still does
| not understand the problem_ , and she is arguing that she
| does while oddly self-deprecating the inexperienced version
| of herself, who arguably had a better solution to begin
| with.
| mixmastamyk wrote:
| Yes, I think an alternate design was found that didn't
| hit as many MySQL limitations as the previous one. This
| improved performance and was a win. But the post-mortem
| diagnosis was lacking.
| shawnz wrote:
| Just because the author was intending to get across a certain
| point, doesn't mean the implementation details aren't worth
| discussing too. I don't think many people here would disagree
| with the central point, so what's there to discuss about it? I
| think it's uncharitable to assume that everyone talking about
| the implementation details is "missing the point".
| rmk wrote:
| I think you are leaping to conclusions here. It is possible
| that most people want to keep HN focused on technical
| conversations and use this as an opportunity to learn something
| (I certainly do), which is why you are seeing many more
| comments on the technical aspects.
| [deleted]
| Cpoll wrote:
| I don't buy it. If the only point was managing rookie errors,
| etc., the blog post shouldn't have been 15~ paragraphs of
| technical discussion and 2 paragraphs at the bottom of "my
| point is ..." You can't advocate that people just ignore 80% of
| the article because they're "not the point of the post."
|
| I'm sure a good chunk of tech workers are worried that when an
| influential blogger writes something like this a couple hundred
| developers will Google 3NF and start opening PRs.
| [deleted]
| unethical_ban wrote:
| It's a narrative. A story. She told it in that way to lure
| people who get hooked on technical discussion, and then make
| the point that they probably would have missed or ignored.
| Without a narrative example, she could have just said "I made
| mistakes too, live and learn" but she chose to provide an
| example before presenting the thesis.
|
| And hey, what's so bad about people learning about 3NF? Are
| you not supposed to know what that is until you're some
| mythical ninth level DBA?
| globular-toast wrote:
| The real meaning being that people learn and gain experience
| over time? Is this really something we need to read two pages
| of text to find out? I think people are justifiably miffed at
| having read two pages for such a trivial message. Not only
| that, but these "stupid implementation details" could seriously
| mislead someone who really is a clueless newbie.
| [deleted]
| hn_throwaway_99 wrote:
| This post is bizarre, precisely because there _is nothing
| particularly wrong_ about the original schema, and the author
| seems to believe that the problem is that the column values were
| stored as strings, or that the schema wasn 't in "third normal
| form".
|
| Which is nonsense. The problem with the original DB design is
| that the appropriate columns weren't indexed. I don't know enough
| about the problem space to really know if a more normalized DB
| structure would be warranted, but from her description I would
| say it wasn't.
| mewpmewp2 wrote:
| I think if anything, all of it could've been put into a single
| indexed column since the query was AND ... AND ... not OR.
|
| So you could've had a indexed column of "fingerprint", like
|
| ip1_blahblah_evil@spammer.somewhere_victim1@our.domain
|
| And indexed this, with only single WHERE in the query.
|
| I don't understand at all how multiple tables thing would help
| compared to indices, and the whole post seemed kind of crazy to
| me for that reason. In fact if I had to guess multiple tables
| would've performed worse.
|
| That is if I'm understanding the problem correctly at all.
| gavinray wrote:
| This has been the advice given to me by Postgres experts in a
| similar scenario: "If you want to efficiently
| fuzzy-search through a combination of firstname + lastname +
| (etc), it's faster to make a generated column which
| concatenates them and index the generated column and do a
| text search on that."
|
| (Doesn't have to be fuzzy-searching, but just a search in
| general, as there's a single column to scan per row rather
| than multiple)
|
| But also yeah I think just a compound UNIQUE constraint on
| the original columns would have worked
|
| I'm pretty sure that the degree of normalization given in the
| end goes also well beyond 3rd-Normal-Form.
|
| I think that is 5th Normal Form/6th Normal Form or so, almost
| as extreme as you can get:
|
| https://en.wikipedia.org/wiki/Database_normalization#Satisfy.
| ..
|
| The way I remember 3rd Normal Form is _" Every table can
| stand alone as it's own coherent entity/has no cross-cutting
| concerns"_.
|
| So if you have a "product" record, then your table might have
| "product.name", "product.price", "product.description", etc.
|
| The end schema shown could be described as a "Star Schema"
| too, I believe (see image on right):
|
| https://en.wikipedia.org/wiki/Star_schema#Example
|
| This Rachel person is also much smarter than I am, and you
| can make just about anything work, so we're all bikeshedding
| anyways!
| tkot wrote:
| Would it not prevent some optimizations based on statistics
| regarding the data distribution, like using the most
| selective attribute to narrow down the rows that need to be
| scanned? I'm assuming there are 2 indexes, 1 for each
| column that gets combined.
|
| Let's say you know the lastname (Smith) but only the first
| letter of the firstname (A) - in the proposed scenario only
| the first letter of the firstname helps you narrow down the
| search to records starting with the letter (WHERE combined
| LIKE "A%SMITH"), you will have to check all the rows where
| firstname starts with "A" even if their lastname is not
| Smith. If there are two separate indexed columns the WHERE
| clause will look like this:
|
| WHERE firstname like "A%" AND lastname = "Smith"
|
| so the search can be restricted to smaller number of rows.
|
| Of course having 2 indexes will have its costs like
| increased storage and slower writes.
|
| Overall the blog post conveys a useful message but the
| table example is a bit confusing, it doesn't look like
| there are any functional dependencies in the original table
| so pointing to normalization as the solution sounds a bit
| odd.
|
| Given that the query from the post is always about concrete
| values (as opposed to ranges) it sounds like the right way
| to index the table is to use hash index (which might not
| have been available back in 2002 in MySql).
| gavinray wrote:
| I won't pretend to be an expert in this realm, but see
| here:
|
| https://www.postgresql.org/docs/current/textsearch-
| tables.ht...
|
| Specifically, the part starting at the below paragraph,
| the explanation for which continues to the bottom of the
| page: "Another approach is to create a
| separate tsvector column to hold the output of
| to_tsvector. To keep this column automatically up to date
| with its source data, use a stored generated column. This
| example is a concatenation of title and body, using
| coalesce to ensure that one field will still be indexed
| when the other is NULL:"
|
| I have been told by PG wizards this same generated,
| concatenated single-column approach with an index on each
| individual column, plus the concatenated column, is the
| most effective way for things like ILIKE search as well.
|
| But I couldn't explain to you why
| kelnos wrote:
| Right, and when you do that, you don't even need a RDBMS. An
| key-value store would suffice. This essentially just becomes
| a set! Redis or memcached are battle-tested workhorses that
| would work even better than a relational DB here.
|
| But this was also back in the early '00s, when "data store"
| meant "relational DB", and anything that wasn't a RDBMS was
| probably either a research project or a toy that most people
| wouldn't be comfortable using in production.
| dehrmann wrote:
| > this was also back in the early '00s, when "data store"
| meant "relational DB", and anything that wasn't a RDBMS was
| probably either a research project or a toy that most
| people wouldn't be comfortable using in production.
|
| Indeed; her problem looks to have been pre-memcached.
| ryandrake wrote:
| Also, the "better" solution assumes IPv4 addresses and is not
| robust to a sudden requirements change to support IPv6. Best
| to keep IP address as a string unless you really, REALLY need
| to do something numerical with one or more of the octets.
| johannes1234321 wrote:
| Keeping IP addresses as string isn't trivial. You can write
| those in many ways. Even with IPv4. IPv6 just brings new
| variations. And when talking about migration to IPv6 you
| have to decide if you keep IPv4 Addresses as such or prefix
| with ::ffff:.
|
| In the end you have to normalize anyways. Some databases
| have specific types. If not you have to pick a scheme and
| then ideally verify.
| pilif wrote:
| Based on my experience I disagree. If there is a native
| data type that represents your data, you should really use
| it.
|
| It will make sure that incorrect data is detected at the
| time of storage rather than at some indeterminate time in
| the future and it will likely be more efficient than
| arbitrary strings.
|
| And in case of IP addresses, when you are using Postgres,
| it comes with an inet type that covers both ipv4 and ipv6,
| so you will be save from requirement changes
| dehrmann wrote:
| Yeah, I'm not clear on how multiple tables fixed it other
| than allowing you to scan through the main table faster.
|
| Multiple tables could be a big win if long email addresses
| are causing you a data size problem, but for this use case, I
| think a hash of the email would suffice.
| jeremyjh wrote:
| Well she did say she had indexes on the new table. It could
| have been fixed with an index on the previous table, but a
| new table with indexes also fixed it.
| riquito wrote:
| That way you can get false positives unless you concatenate
| using a non-valid character in the protocol
|
| foo@example.com other@example.net foo@example.co
| mother@example.net
|
| Btw, storing just the domains, or inverting the email
| strings, would have speed up the comparison
| erikerikson wrote:
| Clearly it worked.
|
| However...
|
| Consider the rows `a b c d e` and `f b h i j`. How many times
| will `b` be stored in the two formulations? 2 and 1, right? The
| data volume has a cost.
|
| Consider the number of cycles to compare a string versus an
| integer. A string is, of course, a sequence of numbers. Given
| the alphabet has a small set of symbols you will necessarily
| have a character repeated as soon as you store more values than
| there are characters. Therefore the database would have to
| perform at least a second comparison. I imagine you're
| correctly objecting that in this case the string comparison
| happens either way but consider how this combines with the
| first point. Is the unique table on which the string
| comparisons are made smaller? Do you expect that the table with
| combinations of values will be larger than the table storing
| comparable values? Through this, does the combinations table
| using integer comparisons represent a larger proportion of the
| comparison workload? Clearly she found that to be the case. I
| would expect it to be the case as well.
| globular-toast wrote:
| > Clearly it worked.
|
| No it didn't. Someone else put an index on the main table and
| _that_ worked.
| erikerikson wrote:
| The article declared that it was running in production and
| satisfying the requirement. Do you have a different
| definition of working? I say that to clarify what I was
| attempting to communicate by "it worked".
|
| I think you're remarking about the relationship between
| normalized tables and indexes. That relationship does exist
| in some databases.
|
| Please say more if I'm missing your point.
| lern_too_spel wrote:
| GP's point is that the final schema has a primary key
| index on the main table that solved the problem, and then
| it has a bunch of useless additional tables. The solution
| was just to add a primary key index on the main table.
| Adding the additional tables just slows down the query
| because it now has to do five index lookups instead of
| one.
| noctune wrote:
| I especially feel the proposed solution could be problematic
| given that the data is supposed to be transient. How are keys
| deleted from the "id | value" tables when rows are deleted from
| the main table? Does it just keep accumulating rows or is there
| something like a GC query running occasionally that deletes all
| unreferenced values?
| bitexploder wrote:
| Yeah... technically that's (the solution) normalization, but
| really not a bad design originally. If you never want a
| particular atom of data repeated, yes you have to normalize
| each column, and that is efficient in the long run for storage
| size, but in more normal data storage that still means a lot of
| joins and you still want indices. On a modern SSD you could use
| that one table, throw on some indices as you suggest and not
| notice it again until you hit millions of rows.
|
| Anything you are doing conditional logic or joins on in SQL
| usually needs some sort of index.
| themgt wrote:
| Also you could just store IPv4 as an unsigned INT directly,
| rather than strings in a different table and an unsigned INT as
| a foreign key.
| tptacek wrote:
| I have regretted every single time I've gotten cute about
| storing IP addresses as scalars in SQL.
| ryukafalz wrote:
| What issues did you run into? Having admittedly never done
| this before, it feels like whenever you need the dotted-
| quad IP you could just query: SELECT
| INET_NTOA(ip) FROM ips WHERE...
|
| ...in MySQL at least. (Which might be why MySQL has this
| function built in?)
|
| I guess if you ever need to match on a specific prefix
| storing the numeric IP might be an issue?
| soheil wrote:
| Because as shocking as this might sound like unlike the
| characters in the movie Matrix I don't have a built-in
| INET_NTOA function in my retina to see the string form of
| the IP addresses when glancing at table rows of a large
| table looking for patterns or a certain IP.
| ryukafalz wrote:
| How often are you looking at the full contents of a table
| vs. looking at the results of a query that you could
| throw an INET_NTOA onto?
| soheil wrote:
| Many. You also expect me to remember how to spell
| INET_NTOA and on what fields to use it on. What if I
| wanted do a quick "SELECT * FROM"? I barely know how to
| spell English words what makes you think I'm going to
| remember how to spell INET_NTOA.
| mixmastamyk wrote:
| dbeaver CE is free.
| Lammy wrote:
| I realize you're describing a general problem and not
| just this particular example, but fwiw NTOA and friends
| got a lot easier for me once I learned enough that I
| stopped trying to "remember how to spell" them and
| started thinking about them in terms of the underlying
| concepts, like "Network (as in big-endianness) to ASCII".
| Learning the significance of the term "address family"
| was a big clue for me for self-directed learning in this
| space :)
| Ginden wrote:
| > I guess if you ever need to match on a specific prefix
| storing the numeric IP might be an issue?
|
| If you want to search for range, you can do:
| SELECT INET_NTOA(ip) FROM ips WHERE ip >
| INET_NTOA(:startOfRange) AND ip < INET_NTOA(:endOfRange)
| havkd wrote:
| Is this correct? Would indexing the columns instead of moving
| the values to another table lead to the same increase in
| performance?
| mst wrote:
| Given this is a "from the start of her career" story, I'm
| guessing she was running similar versions of mysql to the
| versions I started with, and if my guess is correct then
| probably not.
|
| On anything you're likely to be deploying today, just
| throwing a compound index at it is likely quite sufficient
| though.
| kofejnik wrote:
| It would be faster, I believe. As we're only looking for
| existence, a single index traversal is all io we would need
| to do
| taeric wrote:
| An index is often easily viewed as another table. So, should.
|
| (Some subtleties on projected values and such, but the point
| stands.)
| havkd wrote:
| Then this post is very misleading.
| taeric wrote:
| I wouldn't get too vehement on it. Knowing how to
| normalize isn't a bad thing. And, if you intricately know
| your write and read needs, you may be able to assist the
| work more appropriately.
| jameshart wrote:
| Yes, the proposed 'better' structure basically amounts to
| building your own indexes. Normally, I'd assume it is
| better to use the RDBMS's own engine to do that, don't roll
| your own.
|
| There may well be some subtlety to the indexing
| capabilities of MySQL I'm unaware of though - could easily
| imagine myself making rookie mistakes like assuming that it
| has same indexing capabilities. So, to the post's point -
| if I were working on a MySQL db I would probably benefit
| from an old hand's advice to warn me away from dangerous
| assumptions.
|
| On the other hand I also remember an extremely experienced
| MS SQL Server DBA giving me some terrible advice because
| what he had learned as a best practice on SQL Server 7
| turned out to be a great way to not get any of the benefits
| of a new feature in SQL Server 2005.
|
| Basically, we're all beginners half the time.
| rrrrrrrrrrrryan wrote:
| As a former SQL Server DBA, most SQL Server DBAs are
| absolute gurus on 1 or 2 specific versions of SQL server,
| and massively ignorant about anything newer.
|
| It's a strange role where, in order to do your job well,
| you kind of have to hyper-specialize in the specific
| version of the tech that your MegaCorp employer uses,
| which is usually a bit older, because upgrading databases
| can be extremely costly and difficult.
| loeg wrote:
| The same O(N) -> O(log N) improvement for queries, yes. The
| constant factor on the separate tables _might_ be better.
| It's also a more complicated design.
| adfgaertyqer wrote:
| Sure would! I think it would be marginally better, in fact,
| because you would just need to look at the index rather than
| five tables. Access would be more local.
| rustc wrote:
| Moving the data into another table would still require
| indexes: one on the original table's column (which now stores
| the new id) and one on the new table's primary key.
|
| In most cases I'd expect just adding an index to the original
| table to be more efficient, but it depends on the type of the
| original column and if some data could be de-duplicated by
| the normalization.
| ignoramous wrote:
| > _This post is bizarre_ ... _Which is nonsense_ _...but from
| her description I would say it wasn 't. _
|
| This _is_ what the post is about.
|
| And btw, if they hadn't normalized those tables, it'd instead
| have been pointed out to them that their "post is bizzare..."
| someonewhocar3s wrote:
| I was kinda surprised by the last chapter, suddenly this is
| about 'advancing in programming as a woman', and it's over!
| Like the last few chapters make sense in that view.
|
| What I want to know is how she went from a strong database
| format (ID + data, keyed) into a weak database model (not a
| database... is a key-value store) which is likely to be much
| slower, but also, this is what happens to varchars under the
| hood already (the rows only hold pointers, to where the
| strings/objects are placed, rather than inlining varchars) (at
| least in litesql!), so optimally it's only two dereferences
| instead of one..
|
| Like, eh, this programmer went through an anti-learning process
| somehow, and came up with a senseless optimisation. The post
| ends midway through the learning!
| [deleted]
| jeroenhd wrote:
| Based on this piece of old documentation I found [0] for MySQL
| 3.23 (the most recent version in 2002 as far as I can tell),
| certain types of indices were only available on certain types
| of engines. Furthermore, columns were restricted to 255
| characters, which may be too short for some of the fields saved
| in the database.
|
| Modern databases abstract away a lot of database complexity for
| things like indices. It's true that these days you'd just add
| an index on the text column and go with it. Depending on your
| index type and data, the end result might be that the database
| turns the table into third normal form by creating separate
| lookup tables for strings, but hides it from the user. It could
| also create a smarter index that's less wasteful, but the end
| result is not so dissimilar. Manually doing these kinds of
| optimisations these days is usually a waste of effort or can
| even cause performance issues (e.g. that post on the front page
| yesterday about someone forgetting to add an index because
| mysql added them automatically).
|
| All that doesn't mean it was probably a terrible design back
| when it was written. We're talking database tech of two decades
| ago, when XP had just come out and was considered a memory hog
| because it required 128MB of RAM to work well.
|
| [0]:
| http://download.nust.na/pub6/mysql/doc/refman/4.1/en/create-...
| mixmastamyk wrote:
| > All that doesn't mean it was probably a terrible design
| back when it was written. We're talking database tech of two
| decades ago, when XP had just come out and was considered a
| memory hog because it required 128MB of RAM to work well.
|
| A lot of this stuff was invented in the 70s, and was quite
| sophisticated by 2000. It just wasn't _free_ , rather quite
| expensive. MySQL was pretty braindead at the time, and my
| recollection is that even postgres was not that hot either.
| We've very lucky they've come so far.
| hn_throwaway_99 wrote:
| The fact remains that whether the text column existed on her
| original table, or whether it was pulled out to a normalized
| table, literally all of the same constraints would apply
| (e.g. max char length, any other underlying limitations of
| indexing).
|
| The issue is that her analysis of what the issue was with her
| original table is _completely wrong_ , and it's very weird
| given that the tone her "present" self is that it's so much
| more experienced and wise than her "inexperienced, naive"
| self.
|
| My point is that she should give her inexperience self a
| break, all that was missing from her original implementation
| were some indexes.
| mst wrote:
| Multiple-long-column compound indices sucked in old mysqls
| if you could even convince it to use them in the first
| place.
|
| Being able to look up each id via a single-string unique
| index would've almost certainly worked much better in those
| days.
| hn_throwaway_99 wrote:
| I used MySql a lot back then, had many multi-column
| indexes, and never had an issue.
|
| More importantly, given the degree of uniqueness likely
| to be present in many of those columns (like the email
| addresses), she could have gotten away with not indexing
| on every column.
| mst wrote:
| Your first sentence was, I'm afraid, very much not a
| universal experience.
|
| Your second is quite possibly true, but would have
| required experimentation to be sure of, and at some point
| "doing the thing that might be overkill but will
| definitely work" becomes a more effective use of
| developer time, especially when you have a groaning
| production system with live users involved.
| e12e wrote:
| Based on: https://web.mit.edu/databases/mysql/mysql-3.23.6-al
| pha/Docs/...
|
| I'd say there's very little performance gain in normalizing
| (it usually goes the other way anyway: normalize for good
| design, avoiding storing multiple copies of individual
| columns; _de_ -normalize for performance).
|
| I'm a little surprised by the tone of the article - sure,
| there were universities that taught computer science without
| a database course - but it's not like there weren't practical
| books on dB design in the 90s and onward?
|
| I guess it's meant as a critique of the mentioned, but not
| linked other article "being discussed in the usual places".
| willvarfar wrote:
| Knowing only what we can guess about the actual problem, i'd
| say indexing ip ought be enough to make everything super
| fast.
| gfodor wrote:
| The point is she was being hyper critical of her past schema to
| make a larger point.
| dehrmann wrote:
| I have mixed feelings about this. On one hand, I appreciate
| the larger point, and pointing out mistakes _again_ goes
| against that point. On the other, the post had so many
| technical details it overshadowed the larger point,
| especially for something that wasn 't obviously broken.
| mwcampbell wrote:
| And IIUC, the larger point, made by the twist ending, is that
| we shouldn't be so critical, but try to actually help newbies
| learn what we insist that they should know.
| erikerikson wrote:
| It seemed to me the point was that if the industry remains
| designed to push out experienced engineers that there won't
| be anyone to mentor the engineers that are just beginning
| their careers. Further that we will bend the productivity
| of the field downward and have less effective systems.
| sokoloff wrote:
| In what way is the industry designed that way now? Maybe
| some get sucked into management roles, but I don't see
| any evidence of a broad design to push them out.
| erikerikson wrote:
| Role count pyramids, volume and inflexibility of working
| hours, abusive management techniques, a constant
| treadmill of the new hotness, more...
|
| Some offices avoid many of the challenging patterns but
| not the vast majority. A lot of it is natural result of
| the newness of the industry and the high stakes of
| investment. Many of the work and organizational model
| assumptions mean that offering flexibility to employees
| complicates the already challenging management role.
| Given the power of those roles, this means workers
| largely must fit into the mold rather than designing a
| working relationship that meets all needs. As the needs
| change predictably over time a mounting pressure
| develops.
|
| I'm only 17 years in but it has gotten harder and harder
| to find someone willing to leave me to simply code in
| peace and quiet and dig my brain into deep and
| interesting problems.
| wbl wrote:
| Look at the way we screen resumes, or valorize work
| conditions incompatible with having family obligations.
| Ageism in the tech industry is rife.
| bialpio wrote:
| Or how leetcode heavy hiring favors folks that tend to be
| more fresh out of college & folks who have time to
| prepare for them.
| tomc1985 wrote:
| Idunno the real villain is that bullshit, costly "per-row"
| pricing
| codazoda wrote:
| As a 25 year "veteran" programmer, I'm glad I'm not the only
| one that likes the original schema (with indexes added). A
| fully normalized table is a lot more difficult to read,
| troubleshoot, and reason about. You would need a pretty good
| query to poke around the data.
| mst wrote:
| When I first started using postgres I was amazed at how much
| of what I thought was "database knowledge" was actually
| "working around limitations of the versions of mysql I'd been
| using" knowledge.
|
| These days, sure, the original schema plus indices would work
| fine on pretty much anything I can think of. Olde mysqls were
| a bit special though.
| [deleted]
| zhoujianfu wrote:
| Phew.. I was reading this post thinking "is basically every
| table I've ever made wrong?!"
| mst wrote:
| For modern databases, the extra normalisation is an
| incremental optimisation that you usually won't need.
|
| But I think she's if anything been in the industry longer
| than I have and the versions of mysql I first ran in
| production were a different story.
| topher_t wrote:
| I definitely got a spike of imposter syndrome thinking, of
| fuck I've been in software this long and haven't learned this
| yet?!
| hoppla wrote:
| I usually store IPv4 as unsigned int, and IPv6 as two 64 bit
| unsigned ints. But supporting both variants require that I have
| an a table for each type and a glue table to reference the
| record in the correct table. Storing simple stuff quickly turns
| complicated...
| da39a3ee wrote:
| OK so change the article's technical solution from "normalize"
| to "add index". That doesn't change the point of the article.
| rmk wrote:
| Why not store the IP Address as a 32-bit number (IPv4 addresses)?
| Why store it as a string in the first place? This is something I
| did not quite get. Also, wouldn't it be better to split out the
| domain from the email address for ease of filtering?
|
| Also, how does performing joins give a performance advantage
| here. I'm assuming there would be queries to get at the IDs of at
| least one, but going up to 4, to get at the IDs of the items in
| the quad. Then there would be a lookup in the mapping table.
|
| I have worked for some time in this industry, but I have never
| had to deal with relational databases (directly; query tuning and
| such were typically the domain of expert db people). It would be
| interesting to see an explanation of this aspect.
|
| EDIT: To people who may want to comment, "You missed the point of
| the article!": no, I did not, but I want to focus on the
| technical things I can learn from this. I agree that ageism is a
| serious problem in this industry.
| pg_bot wrote:
| You could store the IP as a 32 bit unsigned int. There's no
| issue with that, but I would probably recommend nowadays to use
| an internal cidr or inet type if your database supports that.
| It probably wouldn't be better to split the email address from
| the domain for filtering since email address formats tend to be
| a bit tricky.
|
| Joins give a performance advantage due to the fact that you
| aren't duplicating data unnecessarily. The slow query in
| question becomes five queries (4 for the data once for the
| final lookup) which can each be done quickly and if any one of
| them return nil, you can return a timeout.
| rmk wrote:
| Yes, some of the databases support native ip address types
| that can be manipulated efficiently. Better to use that (like
| inbuilt json querying capabilities in postgres) than come up
| with your own.
|
| It is still not clear how is it better to do up to 5 separate
| queries or maybe a few joins, than to store the strings and
| construct indices and such on them? Is the idea that the cost
| of possibly establishing a new socket connection for some
| concurrent query execution or sequentially executing 5
| queries is still < possible scans of the columns (even with
| indexing in effect)? Also, even if you had integers, don't
| you need some sort of an index on integer fields to avoid
| full table scans anyway?
| Aqueous wrote:
| There is a time and a place for a denormalized schema. If you are
| trying to search a large data-set a denormalized schema in a
| single table with some good indexing is much preferable to
| joining across 30 tables.
|
| Whether you choose normalized or denormalized depends very much
| on a) what you need to do now b) what you may need to do in the
| future. Both are considerations.
| bizzarojerry wrote:
| What a bizarre post from an influential blogger so confidently
| making assertions that are wildly contested even among HN users.
| chrischapman wrote:
| I think the 'terrible schema' thing is a secondary issue. The
| important take away for me was this:
|
| > Considering that we as an industry tend to chase off anyone who
| makes it to the age of 35, is it any surprise that we have a
| giant flock of people roaming around trying anything that'll
| work?
| alisonkisk wrote:
| That canard isn't true.
| mixmastamyk wrote:
| Unfortunately it is the penultimate sentence in a sizable post.
| temporallobe wrote:
| This was great. Clever twist at the end.
|
| I cringed a little because some of those mistakes looks like
| stuff I would do even now. I have a ton on of front/back end
| experience in a huge variety of languages and platforms, but I am
| NOT a database engineer. That's a specific skillset I never
| picked up nor, admittedly, am I passionate about.
|
| Sorry to go off on a tangent, but it also brings to mind that
| even so-called experts make mistakes. I watch a lot of live
| concert footage from famous bands from the 60s to the 90s, and as
| a musician myself, I spot a lot of mistakes even among the most
| renowned artist...with the sole exception of Rush. As far as I
| can tell, that band was flawless and miraculously sounded better
| live than on record!
| mceachen wrote:
| Actually, the Correct Answer is a bloom
| filter.
|
| (And, yes, we had math in the early 2000s.)
|
| Snark aside, I'm frustrated for the author. Her completely-
| reasonable schema wasn't "terrible" (even in archaic MySQL)--it
| just needed an index.
|
| There's always more than one way to do something. It's a folly of
| the less experienced to think that there's only One Correct Way,
| and it discourages teammates when there's a threat of labeling a
| solution as "terrible."
|
| Not to say there aren't infinite terrible approaches to any give
| problem: but the way you guide someone to detect why a give
| solution may not be optimal, and how you iterate to something
| better, is how you grow your team.
| Jolter wrote:
| To be fair, the "correct way" to do databases at that time was
| to use third normal form. Putting indices on string columns
| would have been considered a hack, much like using MySQL was.
| mst wrote:
| Plenty of databases at the time had pretty lame string
| indexing because it simply wasn't something people relied on
| if they wanted performance, and memory sayeth mysql's were
| both not particularly efficient and had some nasty technical
| limitations.
|
| On the mysql she was using, breaking things out so it only
| needed to index ints was almost certainly a much better idea.
| On anything I'd be deploying to today, I'd start by throwing
| a compound index at the thing with the expectation that'd
| probably get me to Good Enough.
| ed25519FUUU wrote:
| > id | ip | helo | m_from | m_to | time | ...
|
| I'm not a database person, but this doesn't seem _too bad_ if
| there was at least an index on IP. The performance there would
| probably be good enough for a _long time_ depending on the usage.
|
| But is 6 tables (indexed on the value) really better than 1 table
| with 6 indexes?
| rektide wrote:
| The popular advice in these comments to use indexes sounds really
| good. I haven't thought it out fully, but my first thought was
| "this sounds like a DIY columnar store". A columnar database does
| a lot of this work for you. I'm not 100% it's a good option in
| this case, because things like email subjects are not good
| columnar data. I would like to know more about how problematic
| that is.
| ummonk wrote:
| I was preparing to rant about the tone that the post was taking
| towards the "clueless programmer" until I got to the end. Well
| done...
|
| Still not sure why adding an index didn't work instead of
| breaking out into multiple tables indexed by IDs.
|
| Besides, this use case screams for "just store the hashes and
| call it a day".
| kpommerenke wrote:
| While the article is not really about the database schema, could
| you do a table with just two fields?
|
| 1) hash of concat(ip & helo & m_from & m_to) 2) time
| daly wrote:
| We had a test database that contained 1 record. Nobody paid much
| attention since the focus was on the problem, not the database.
|
| The database included several newly developed "stored
| procedures".
|
| Time elapsed... and it was nearing the time to ship the code. So
| we tried to populate the database. But we could not. It turned
| out that the stored procedures would only allow a single record
| in the database.
|
| Since a portion of the "business logic" depended on the stored
| procedures... well, things got "delayed" for quite a while... and
| we ended up having a major re-design of the back end.
|
| Fun times.
| hinkley wrote:
| Why do so many of us "forget" to populate the database with
| representative quantities of data until after we've baked in
| all of our poor design decisions?
| globular-toast wrote:
| > That's right, I was that clueless newbie
|
| Questionable use of past tense here.
| mwcampbell wrote:
| Why? She's clearly not a clueless newbie now.
| framecowbird wrote:
| I think the OP was sarcastically implying that she still is.
| inopinatus wrote:
| Such remarks generally backfire onto the author, as indeed
| in this case.
| kofejnik wrote:
| Right, not a newbie anymore
| [deleted]
| zephrx1111 wrote:
| I think this post showed us how an engineer on the Mt. stupid
| looks like. Apparently this design is wrong. We should rely on
| the indexing ability of database itself, instead of inventing our
| own indexing system.
| tyingq wrote:
| Feels like you could just concatenate and hash the 4 values with
| MD5 and store the hash and time.
|
| Edit: I guess concatenate with a delimiter if you're worried
| about false positives with the concat. But it does read like a
| cache of _" I've seen this before"_. Doing it this way would be
| compact and indexed well. MD5 was fast in 2002, and you could
| just use a CRC instead if it weren't. I suppose you lose some
| operational visibility to what's going on.
| ok123456 wrote:
| For the 2021 version, you'd just generate a bloom filter/cuckoo
| hash from all the data gathered by your spamhaus database
| periodically. Make a separate one for each value in your tuple
| and your score would be the number of the sub-hashes that
| matched.
| kingcharles wrote:
| Isn't that exactly what adding an index would do internally?
| tyingq wrote:
| 2002 MySQL had some limitations on indexing variable length
| string (varchar) columns. That's the gist of the linked
| story.
| Moto7451 wrote:
| Yup, we do this at work for similar purposes and it works a-ok.
| We also have some use cases that follow the original "problem"
| schema and they work fine with the correct indices involved.
|
| My guess is that in 2002 there were some issues making those
| options unappealing to the Engineering team.
|
| When we do this in the realm of huge traffic then we run the
| data through a log stream and it ends up in either a KV store,
| Parquet with SQL/Query layer on top of it, or hashed and rolled
| into a database (and all of the above of there are a lot of
| disparate consumers. Weee Data Lakes).
|
| This is also the sort of thing I'd imagine Elastic would love
| you to use their search engine for.
| shusson wrote:
| I would add a counter point to foreign key constraints, they make
| deletions and updates a lot more costly [1].
|
| [1] https://shusson.info/post/postgres-experiments-deleting-
| tabl...
| wdr1 wrote:
| Related to database indexes, but not the post: a busted database
| index brought down ticket sales of the 2008 Olympics Games.
|
| This was the first time regular people could go buy tickets for
| events & they had been lining up overnight at Bank of China
| locations through the country. We were down for over a day before
| we called it off. Apparently this led to minor upheaval at
| several locations in Beijing & riot police were called in.
|
| We were pretty puzzled as we _had_ an index and had load tested
| extensively. We had Oracle support working directly with us &
| couldn't figure out why queries had started to become table
| scans.
|
| The culprit? A point upgrade to DBD::Oracle (something like X.Y.3
| to X.Y.4) introduced subtle but in character sets. So the index
| required using a particular Unicode character set, and we were
| specifying it, but when it was translated into the actually
| query, it wasn't exactly the right one, so the DB assumed it
| couldn't use the index. Then, when all the banks opened & a large
| portion of very populous country tried to buy tickets at the same
| time, things just melted.
|
| Not a fun day.
| hinkley wrote:
| The one that always got us is much more mundane. Deleting a row
| requiring an index in every table with a FK to avoid tables
| scans.
|
| Near as I can tell, we assume there is some bit of magic built
| into the foreign key concept that handles this for us, but that
| is not the case.
| necovek wrote:
| Foreign keys are naturally referencing primary keys, which
| have their own (unique) indexes by default. And there's some
| extra magic with CASCADE.
|
| But it seems like I am missing your point: care to expand on
| it so I can learn about the gotcha as well? What are you
| attempting to do, and what's making the performance slow?
| nickkell wrote:
| If I've understood correctly it's the foreign key column
| itself that isn't indexed by default. Deleting a record
| then requires a table scan to ensure that the constraint
| holds
| dan-robertson wrote:
| Example: if you delete a record from the customers table,
| you want an index on the foreign key in the orders table
| to delete the corresponding entries. This also means the
| performance of the delete can have the unfortunate
| property where the small work of deleting a single row
| cascades into the work of deleting many rows.
| zeroimpl wrote:
| Yeah I've run into this a few times. In Postgres, you can't
| add a foreign key referencing columns that aren't indexed. I
| assume this is for performance reasons so inserts/updates are
| efficient, but there is no such check that the source columns
| are also indexed, so updates/deletes on the referenced table
| can be terribly slow.
| superjan wrote:
| At my previous job, a particular database was designed using
| ascii strings fields for a particular field rather than
| unicode. If you then query with a string in unicode format, the
| database decided that the comparison should be done in unicode.
| The only way was to table scan and convert all ascii fields to
| unicode on the fly. It was found only in production.
|
| Given that you mention China in your story, did GB 18030 have
| anything to do with your problems?
|
| https://en.wikipedia.org/wiki/GB_18030
| path411 wrote:
| Can't imagine the pain to even discover this problem. It
| normally takes me a long time to be like, "maybe I didn't make
| a mistake, and I have a third party bug?". Third party bugs are
| always the most annoying to me, usually the longest to diagnose
| and then ultimately I have to just tell my boss I can't do
| anything to fix it, but hopefully I can find a way to avoid it.
| mst wrote:
| I love DBD::Oracle so much I've always arranged to have other
| people test oracle related DBIx::Class features for me.
|
| Writing the pre-ANSI-JOIN-support SQL dialect code was pretty
| good fun though.
| taberiand wrote:
| The moral of this story, to me, is always lock in all
| dependencies to the exact specific versions tested for
| production release.
| [deleted]
| throwawayFanta wrote:
| I think this blog fails to take into account that 2021 is not
| 2002. Computer Science is a much more formal/mainstream field of
| study now, and people don't operate "in a vacuum with no
| mentorship, guidance, or reference points."
|
| Some comments on the previous blog post raised important
| questions regarding minimum understanding/knowledge of technology
| one utilizes as part of their day job. And I would agree that
| indexing is a fundamental aspect while using relational
| databases.
|
| But unfortunately this isn't very uncommon these days, way too
| often have I heard people in $bigco say, let's use X, everyone
| uses X without completely understanding the
| implications/drawbacks/benefits of that choice.
| [deleted]
| savant_penguin wrote:
| I more or less thrown the towel on that "not being clueless
| anymore"
|
| Sure, I'll try and study to keep myself on my feet, but there is
| soo much to learn that I'd rather focus on the technologies
| needed right _now_ to get things done
|
| Way more often than not "thing A has to be done with tech B" and
| the expectation of getting it done without multiple foot guns in
| gone
| havocsupreme wrote:
| I'm pretty sure all the people who are criticizing the database
| design haven't read the ending. The article isn't about the
| schema, it's about helping those entry level programmers
| loeg wrote:
| It would be more effective at making that point if it didn't
| have the confused and arguably incorrect section in the middle.
| If as a writer, the audience misses your point, maybe it's a
| problem with your writing rather than the audience.
| empthought wrote:
| You don't think it's helpful for people to steer entry level
| programmers away from bad ideas like this one?
| lisper wrote:
| > You know what it should have been? It should have been
| normalized.
|
| No, it should have had a unique index on the first five columns.
|
| (Oooh, I just hate it so much when someone gets up on their high
| horse and opens a blog post with a long condescending preamble
| about how some blogger got it wrong, and then they get it even
| wronger.)
| nobodyandproud wrote:
| If there are millions of people entries coming in per hour, I can
| see the value.
|
| Otherwise, how long is their data retention? Wouldn't there be
| more value in cleaning out entries older than, say, two hours?
| lifeisstillgood wrote:
| The one thing I would add here is that the devlead (or equivalent
| - there always is someone) should have added at least one test.
| Not a unit test or 99% co drags test but a "does it do what we
| want" test - a golden path test.
|
| I came across something like this with an apprentice recently. I
| let the lack of a repeatable test pass because speed / time /
| crunch etc and the obvious how can that possibly fail code of
| course failed. The most basic test would have caught it in the
| same way some basic performance testing would have helped rachel
| however many years back.
|
| It's hard this stuff :-)
| dilyevsky wrote:
| The real take away here is DB performance optimization (imo
| performance in general) is freaking hard and even people who'd
| successfully optimized something like OP often totally
| misunderstand why it actually worked. Pay for an expert advice
| rvr_ wrote:
| short answer: composite indexes
| jeffrallen wrote:
| I did this (bad schema, too much scanning) too.
|
| I also ran newfs on a running Oracle database. (That did vastly
| less damage than you would have thought, actually.)
| ChrisMarshallNY wrote:
| I'd say that pretty much _everyone_ is a better DB programmer
| than I am, so I don 't really go out of my way to design anything
| especially robust. I try to design stuff I'm not good at, in a
| way that makes it easy for someone that knows more than I do, to
| come in and toss out the work I did, to be replaced with good
| work.
|
| There's _lots_ of stuff that I 'm bad at. I'm also good at a fair
| bit of stuff. I got that way, by being bad at it, making
| mistakes, asking "dumb" questions, and seeing how others did it.
| Grimm1 wrote:
| Heh at the end I think I may have been called out for my harsh
| response to the post being referenced at the start here on HN
| since it seemed to raise some hackles. Borrowing from other
| social media terminology here, "subtweeted" but by a blog post,
| not sure how I feel about that.
|
| Honestly it almost makes me feel compelled to start writing more
| for the industry but to be honest, my opinion is most ideas are
| not great, or not novel, up to and including my own. So writing
| about them seems likely not to be beneficial. I guess you could
| argue I could let the industry be the judge of that though.
| fortran77 wrote:
| The "NoSQL" movement -- which I wisely rejected -- is also
| responsible for this.
| jugg1es wrote:
| There are problems that nosql databases solve easily that
| require a lot of fancy tricks in a RDBMS.
| harel wrote:
| All "movements" have some merit, or they wouldn't exist in the
| first place. Nobody starts a "movement" just because. It exists
| because it solves something. "NoSQL" was a solution to
| something. Today you can benefit from best of all worlds, and
| still NoSQL has a place and a use case, just like anything else
| that exists.
| tester756 wrote:
| I will play devil advocate
|
| >and still NoSQL has a place and a use case, just like
| anything else that exists.
|
| What actually makes you believe that it's the NoSQL that has
| "some use cases" and relational databases are "default ones"
| instead of NoSQL/no-relational by default?
| junon wrote:
| Err. Wat. The original schema was (mostly) fine. It had no
| indexes. The second schema looks like it tried to work around
| lack of database optimization features. It's in no way "better"
| from a data design standpoint.
|
| A database with good string index support isn't doing string
| comparisons to find selection candidates - at least, not
| initially.
|
| What a bizarrely confident article.
| pgeorgi wrote:
| > lack of database optimization features
|
| That's a perfection description of mysql as of 2002.
| junon wrote:
| MySQL, at least as far back as 2000, had indexes.
|
| http://web.archive.org/web/20020610031610/http://www.mysql.c.
| ..
| pgeorgi wrote:
| I wouldn't bet on these indexes optimizing anything back
| then. MySQL was legendary that while implementing the
| necessary standards (to some degree) it was neither
| reliable nor efficient.
| eklitzke wrote:
| Indexes in MySQL worked fine in 2002, at least for the
| use case described in this article.
| junon wrote:
| That's pretty beside the point though, the concept of
| indexing had existed for decades even prior to that.
| pgeorgi wrote:
| The quote was "database optimization features" and the
| scope was "MySQL as of 2002".
|
| Of course, even my old DBase II handbook talks about
| indexes - all that's old hat. MySQL had them, too.
|
| MySQL also used to have a long-earned reputation as a toy
| database though, and MySQL in 2002 was right within the
| timeframe where it established that reputation. So yeah,
| you could add indexes, but did they speed things up (as
| in them being an "optimization feature")? Public opinion
| was rather torn on that.
| oconnor663 wrote:
| I think there's an easy "best of both worlds" take here:
|
| 1. The majority of the performance problem could've and probably
| should've been summarized as "you need to use an index". (Maybe
| there were MySQL limitations that got in the way of indexing back
| then? But these days there aren't.)
|
| 2. Everyone makes mistakes! New programmers make mistakes like
| not knowing about indexes. Experienced programmers make mistakes
| like knowing a ton about everything and then teaching things in a
| weird order. All of these things are ok and normal and part of
| growing, and it's important that we treat each other kindly in
| the meantime.
| mst wrote:
| Based on my own war stories, if it was an early '00s mysql then
| I expect "just throw an index at the varchar columns" would've
| helped but not nearly as much as what she actually did.
| gadrev wrote:
| What I got from this post is a headache after reviewing the
| wikipedia entries on 3NF (all all of it's dependencies: 1NF, 2NF,
| also 3.5NF). Man the language is dense.
|
| j/k, always nice to review some of this "basic" stuff (that you
| barely use in everyday work but it's good to have it in the back
| of your head when changing something in the db schema)
| bizzarojerry wrote:
| What a bizarre post from an influential blogger so confidently
| making assertions that are wildly contested even among HN users.
|
| If anything this is the problem with industry, articles like this
| that make vast generalizations from a specific instance,
| unsubstantiated finger-pointing gestures, and passive
| aggressively blame the industry as a whole.
| mmaunder wrote:
| Oh well played!! I was setting up to give the author a hard time
| about being judgemental, particularly because storing IPs or
| UUIDs as strings is a mistake I've seen some pretty darn good
| devs make. Some folks just aren't super strong on schema design
| and performance but are great at other things.
|
| Plus MySQL kind of rocks. Fight me. There are some interesting
| optimizations that Percona has written about that may improve the
| performance of OPs schema. The learning never ends.
|
| TBH I haven't seen a shortage of mentors. In fact most of our dev
| team is 35+ and are very approachable and collegial and our
| internal mentoring programs have been extremely successful -
| literally creating world experts in their specific field. I don't
| think we're unique in that respect.
|
| I think it's rather unfortunate that the top comment here is a
| spoiler. Actually reading the post and going on the emotional
| rollercoaster that the author intended to take you on is the
| point. Not the punch line.
| jjice wrote:
| > Plus MySQL kind of rocks.
|
| Agreed. I prefer Postgres for personal projects, but MySQL is a
| fine database. Honestly, even a relational DB I wouldn't want
| to use again (DB2...) is still pretty solid to me. The
| relational model is pretty damn neat and SQL is a pretty solid
| query language.
|
| I wonder how many people disagree with that last part in
| particular...
| mmaunder wrote:
| It's actually the core of all dev that involves any data
| storage, today. HN is both aspirational in tech and has a
| younger demographic so you won't get any points here for
| mentioning anything SQL. Hence the downvotes I'm getting. But
| hey, it's all good!!
| tomsmeding wrote:
| Why exactly would it be so bad to just put a suitable index on
| the table containing strings? The time complexity of the
| resulting search would be the same, so I assume there will be
| some constant factor slowdowns. Is it that indices over string
| fields are stored inefficiently on disk? (If so, can that not be
| fixed in the db engine directly?) Or is this fine today but
| wasn't fine 15 years ago?
| dan-robertson wrote:
| I think the article describes the solution that was used and
| not necessarily the best solution. If they made the "obvious"
| mistake in the first case they mightn't arrive at the best
| solution in the second. Adding the single index and using the
| four tables have the same worst case complexity for reads,
| O(log n), with different constants and massively beat scanning
| at O(n).
|
| It may be that the second solution wasn't the best, or that it
| was better for write performance or memory usage--this was 2002
| after all. It may also be the case that the cardinal it's of
| some columns was low in such a way that the four table solution
| was better, but maybe getting the columns in the right order
| for a multi column index could do that too.
|
| Either way, I don't think it really matters that much and
| doesn't affect the point of the article.
| HelloNurse wrote:
| Normalization is important for deduplication, not only to index
| and compare a few short numbers instead of a few long string:
| those host names and email addresses are long and often
| repeated.
| remram wrote:
| That's not what normalization is. You're thinking of
| deduplication or compression. This table is normalized as per
| usual database terminology.
| ghayes wrote:
| But if it's guaranteed to be 1:1, why? The two
| implementations (normalized and denormalized) in that case
| should be completely isomorphic, quirks of the DB engine
| aside.
| mst wrote:
| It was fine on -some- databases 20 years ago. mysql of that
| era, less often.
| thehappypm wrote:
| At the worst you could concatenate the strings and have a
| single primary key.. or hash the string concatenation.
| jerjerjer wrote:
| Yeah, while this is not optimal (ip should be converted into
| integer, time should be ts), the table would be small (as old
| entries could be safely deleted). The only real issue is the
| lack of indices.
|
| Also is helo field even needed?
| mjevans wrote:
| Yes. A proper mail implementation won't randomize it, so it
| should be consistent. Depending on the design of the
| validation stages it might only need a case-normalized lookup
| to the string index table and that id as part of the final
| multi-column unique index, and in that last case only to
| convey to the database the expectations to optimize around
| (and require the data adhere to).
| junon wrote:
| Indexing existed 15 years ago. The article never mentions why
| indexing didn't solve this problem. Super weird take on the
| author's part...
| quickthrower2 wrote:
| Surely more than 15 years! 50 maybe?
| ericbarrett wrote:
| Early InnoDB* had pretty strict limits on varchar indexes and
| was not the most efficient. I don't remember the details but
| it's entirely possible the single-table format Rachel
| described ran head on into those limitations.
|
| Also remember indexes take space, and if you index all your
| text columns you'll balloon your DB size; and this was 2002,
| when that mattered a lot more even for text. Indexes also add
| write and compute burden for inserts/updates as now the DB
| engine has to compute and insert new index entries in
| addition to the row itself.
|
| Finally, normalizing your data while using the file-per-table
| setting (also not the default back then) can additionally
| provide better write & mixed throughout due to the writes
| being spread across multiple file descriptors and not
| fighting over as many shared locks. (The locking semantics
| for InnoDB have also improved massively in the last 19
| years.)
|
| * Assuming the author used InnoDB and not MyISAM. The latter
| is a garbage engine that doesn't even provide basic
| crash/reboot durability, and was the default for years; using
| MyISAM was considered the #1 newbie MySQL mistake back then,
| and it happened all the time.
| noduerme wrote:
| To me it seems that spreading it over four tables would
| lead to a lot more potential read locks while the big
| combined table is waiting for a join on each of the others,
| and some process is trying to insert on the others and link
| them to the main. This is assuming they were using foreign
| keys and the main table 4-column index was unique.
| cogman10 wrote:
| > Also remember indexes take space
|
| Indexes take space, except for the clustered index.
|
| An important distinction.
|
| If the point of this table is always selecting based on IP,
| m_from, m_to, then clustering on those columns in that
| order would make sense.
|
| Of course, if it's expected that a lot of other query
| patterns will exist then it might make sense to only
| cluster on one of those columns and build indexes on the
| others.
| kofejnik wrote:
| The initial design was fine and btw it did not violate 3NF,
| actually (if you add auto increment pk). A single composite index
| would've most likely solved all problems
| krisgenre wrote:
| My piano teacher once told me that its common to commit a mistake
| but its disastrous if you keep practicing without realizing that
| you have made a mistake. What I look for in a developer is their
| ability to realize a mistake and find ways to fix it.
| marginalia_nu wrote:
| https://news.ycombinator.com/item?id=29137141
| cuu508 wrote:
| > The observation was that we could probably store the IP
| address, HELO string, FROM address and TO address in a table, and
| send back a 4xx "temporary failure" error the first time we saw
| that particular tuple (or "quad"). A real mail server which did
| SMTP properly would retry at some point, typically 15 minutes to
| an hour later. If it did retry and enough time had elapsed, we
| would allow it through.
|
| I've run into this form of greylisting, it's quite annoying. My
| service sends one-time login links and authorization codes that
| expire in 15 minutes. If the email gets delayed, the user can
| just try again, right? Except I'm using AWS SES, so the next
| email may very well come from a different address and will get
| delayed again.
| loeg wrote:
| You could have your codes expire after an hour or day instead?
| cuu508 wrote:
| That's _a_ solution, but with tradeoffs:
|
| * security, an attacker has more time to intercept and use
| the links and codes.
|
| * UX, making the user wait 15+ minutes to do certain actions
| is quite terrible.
|
| I've had a couple support requests about this. The common
| theme seems to be the customer is using Mimecast, and the fix
| is to add my sender address in a whitelist somewhere in their
| Mimecast configuration.
| alksjoflkajsom wrote:
| People in the comments are getting (rightfully) outraged about
| the poor understanding of indexing, but I'm a little surprised
| that everyone here doesn't seem to understand normalization
| either. The original schema is perfectly normalized and is
| already in 3NF: none of the columns shown has a dependence on any
| of the other columns outside of the primary key (in other words,
| if you knew eg the values of the ip, helo, and from columns,
| you'd still have no information about the "to" column).
|
| Normalization does not mean "the same string can only appear
| once". Mapping the string "address1@foo.bar" to a new value "id1"
| has no effect on the relations in your table. Now instead of
| "address1@foo.bar" in 20 different locations, you have "id1" in
| 20 different locations. There's been no actual "deduplication",
| but that's again not the point. Creating the extra tables has no
| impact on the normalization of the data.
| [deleted]
| slim wrote:
| Plus depending on the amount of emails they get, that
| optimisation could be unnecessary. That database schema was
| perfectly fine in some cases.
| colanderman wrote:
| Came here to say this. Whatever may have been true of InnoDB 20
| years ago, don't follow this article's advice for any modern
| relational database.
|
| To add: Not sure about MySQL, but `varchar`/`text` in
| PostgreSQL for short strings like those in the article is very
| efficient. It basically just takes up space equaling the length
| of the string on disk, plus one byte [1].
|
| [1] https://www.postgresql.org/docs/current/datatype-
| character.h...
| mixmastamyk wrote:
| Thank you. Was thinking that I was the clueless one...
| (probably a little) because that is not my understanding of
| what normalization is.
|
| I thought perhaps swapping the strings to integers might make
| it easier to index, or perhaps it did indeed help with
| dedupilcation in that the implementation didn't "compress"
| identical strings in a column--saving space and perhaps help
| performance. But both issues appeared to be implementation
| issues with an unsophisticated Mysql circa 2000, rather than a
| fundamentally wrong schema.
|
| I agreed with her comment at the end about not valuing
| experience, but proper databasing should be taught to every
| developer at the undergraduate level, and somehow to the self-
| taught. Looking at comptia... they don't seem to have a db/sql
| test, only an "IT-Fundamentals" which touches on it.
| nzealand wrote:
| > Normalization does not mean "the same string can only appear
| once".
|
| It can, if you want to easily update an email address, or
| easily remove all references to an email address because it is
| PII.
| alksjoflkajsom wrote:
| That's not what normalization normally means, no. See eg
| wikipedia
| https://en.wikipedia.org/wiki/Database_normalization
| noduerme wrote:
| >>> Mapping the string "address1@foo.bar" to a new value "id1"
| has no effect on the relations in your table.
|
| How do you mean? If id1 is unique on table A, and table B has a
| foreign key dependency on A.id, then yeah you still have id1 in
| twenty locations but it's normalized in that altering the
| referenced table once will alter the joined value in all twenty
| cases.
|
| This might not be important in the spam-graylisting use case,
| and very narrowly it might be 3NF as originally written, but it
| certainly wouldn't be if there were any other data attached to
| each email address, such as a weighting value.
| twic wrote:
| This is very slightly not quite true, because the HELO string
| and the remote address should go hand-in-hand.
| alksjoflkajsom wrote:
| You might be right (I don't actually know what a HELO string
| is, I don't know anything about SMTP :). I was just going off
| how the author presented the data, as a tuple of four
| completely independent things.
|
| Of course the main point still stands, that the two schemas
| are exactly as normalized as each other.
|
| Edit: rereading the original post, the author mentions that
| "they forged...the HELO"--so perhaps there was indeed no
| relationship between HELO and IP here. But again, I don't
| know anything about SMTP, so this could be wrong.
| Brian_K_White wrote:
| I do know about smtp and you were right regardless, because
| the author was talking about 4 database fields, not smtp.
| The details of smtp are irrelevant.
| GauntletWizard wrote:
| It's (somewhat) because the HELO is forged that there's no
| relationship between HELO and IP. The very first message of
| SMTP is "HELO ", hostname can either be a unique
| identifier (server1.company.com, etc.) or a system-level
| identifier (mta.company.com for all of your company's
| outbound mail agents, or in the case of bulk mailers they
| might use bulk.client1.com when sending as client1,
| bulk.client2.com, etc). But there is/was no authentication
| on what you send as HELO (Now you can verify it against the
| TLS certificate, though many implementations don't do that
| at all or well), so correlating based on the hostname in
| HELO was questionable at best. Thus, the combination of
| (HELO, IP) was the a single value as a tuple.
| Izkata wrote:
| > But there is/was no authentication on what you send as
| HELO
|
| Yep, and that explains the "foobar" rows - those should
| have resolved to the same IP, except because there's no
| authentication that blocks it you could put gibberish
| here and the SMTP server would accept it.
|
| > so correlating based on the hostname in HELO was
| questionable at best
|
| Eh, spambots from two different IPs could have both
| hardcoded "foobar" because of the lack of authentication,
| so I could see this working to filter
| legitimate/illegitimate emails from a compromised IP.
| [deleted]
| [deleted]
| dorianmariefr wrote:
| id | ip_address | from | to | content | updated_at | created_at
|
| And indexes where you need them. Normalize data like IP
| addresses, from and to. And you are good to go.
| fullstackchris wrote:
| I guess I've come away with a totally different takeaway than
| most. This post is rather strong on the blame game, which could
| be fixed by one thing...
|
| RTFD! (Read The F**in Docs!) - I only skimmed the post, but its
| definitely something that would have been avoided had some SQL
| documentation or introduction been read.
|
| I'd argue one of the huge things that differentiates "senior"
| developers from all the "other" levels - we're not smarter or
| more more clever than anyone else - we read up on the tools we
| use, see how they work, read how others have used them before...
| I understand this was from 2002, but MySQL came out in 1995 -
| there was certainly at least a handful of books on the topic.
|
| Perhaps when just starting off as an intern, you may be able to
| argue that you are 'operating in a vacuum' but any number of
| introductory SQL books or documentation could quickly reveal
| solutions to the problem encountered in the post. (Some of which
| are suggested in these comments).
|
| Of course we all make mistakes in software - I definitely could
| see myself creating such a schema and forgetting to add any sort
| of indexing - but when running into performance issues later, the
| only way you'll be able to know what to do next to fix it is by
| having read literature about details of the tools you are using.
|
| Operating in a vacuum? Then break out of it and inform yourself.
| Many people have spent many hours creating good documentation and
| tutorial on many many software tools - use them.
| fwip wrote:
| The problem here is "unknown unknowns," coupled with the real
| business pressure to get solutions out the door. The MySQL
| manual is (currently) over 6,000 pages. A developer new to SQL
| or relational DBs doesn't know how many of those are necessary
| to read and comprehend to deploy a working solution.
|
| And in this case, she designed and deployed the system, and it
| worked and met the business needs for several months. When
| performance became an issue, she optimized. I'm sure she had
| plenty of other unrelated work to do in the meantime,
| especially as a lead/solo dev in the early 2000's.
|
| Sounds like a productive developer to me.
| GuB-42 wrote:
| The problem, I think is that most people, me included, don't
| really know what databases really do. There is a whole lot about
| optimizing procedural code, with a variety of tools, the dangers
| of premature optimization and the tradeoff with readability.
| Anyone with a computer science degree has heard about algorithmic
| complexity, caches, etc...
|
| But databases are just magic. You try things out, usually
| involving a CREATE INDEX at some point, and sometime it gets
| faster, so you keep it.
|
| Rachel, in he blog post is a good example of that thought
| process. She used a "best practice", added an index (there is
| always an index) and it made her queries faster, cool. I don't
| blame her, it works, and it is a good reminder of the 3NF
| principle. But work like that on procedural code and I'm sure we
| will get plenty of reactions like "why no profiler?".
|
| Many, many programmers write SQL, but very few seem to know about
| query plans and the way the underlying data structures work. It
| almost looks like secret knowledge of the DBA caste or something.
| I know it is all public knowledge of course, but it is rarely
| taught, and the little I know about is is all personal curiosity.
| musingsole wrote:
| > But databases are just magic.
|
| 100%!!!
|
| I hate how developers talk about a "database" as a monolithic
| concept. It's an abstract concept with countless
| implementations built off of competing philosophies of that
| abstract concept. SQL is only slightly more concrete, but
| there's as many variants and special quirks of SQL dialects out
| there as databases.
| matsemann wrote:
| > _The first time you encounter something, you 're probably going
| to make some mistakes. There's a post going around tonight about
| how someone forgot to put an index on some database thing and
| wound up doing full table scans (or something like that)._
|
| Which post is referenced here?
|
| I also believe this to be a tooling issue. It's often opaque what
| ends up being run after I've done something in some framework
| (java jpa, django queries whatever). How many queries (is it n+1
| issues at bay?), how the queries will behave etc. Locally with
| little data everything is fine, until it blows up in production.
| But you may not even notice it blowing up in production, because
| that relies on someone having instrumented the db and push
| logs+alarms somewhere. So it's easy to remain clueless.
| claytonjy wrote:
| Pretty sure it's this one
| https://news.ycombinator.com/item?id=29132572
| webstrand wrote:
| While the normalized version is more compact and doesn't store
| redundant data, it _also_ needs an index on the four columns or
| it'll have to check every row in the table. A similar index added
| to the original denormalized table would have given comparable
| query performance.
|
| The table schema isn't terrible, it's just not great. A good
| first-pass to be optimized when it's discovered to be overly
| large.
| [deleted]
| Closi wrote:
| > The table schema isn't terrible, it's just not great.
|
| It depends - if this was the full use-case then maybe a single
| table is actually a pretty good solution.
| [deleted]
| ehershey wrote:
| Proud of myself for recognizing the conceit (pun intended) here
| pretty quickly. Good point in the end too!
| Donckele wrote:
| Sorry Rachel but the best solution was a simple fix as mentioned
| by most people here - columns having the right types and indexes.
|
| When I was still wet behind the ears I was doing what you are now
| proposing - but its overkill.
| jugg1es wrote:
| One of the most useful lessons I ever learned about designing
| database schemas was the utter uselessness of indexing and
| searching on datetime fields. Since virtually every value is
| going to be different, indexing and searching on that field is
| (almost) no better than having no index on the datetime field at
| all.
|
| It was a revelation to me when I decided to experiment with
| having a indexed date-ONLY field and using that to search
| instead. It improved performance by almost two orders of
| magnitude. In hindsight, this should have been totally obvious if
| I had stopped to think about how indexing works. But, as is
| stated in this article, we have to keep relearning the same
| lessons. Maybe I should write an article about how useless it is
| to index a datetime field...
|
| EDIT: This was something I ran into about 10 years ago. It is
| possible there was something else going on at the time that I
| didn't know about that caused the issue I was seeing. This is an
| anecdote from my past self. I have not had to use this technique
| since then, and we were using an on-prem server. It's possible
| that the rest of the table was not designed well and index I was
| trying to use was already inefficient for the resources that we
| had at the time.
| dilyevsky wrote:
| You're maybe thinking of compression? Primary id is unique too
| but indexed search is still logn compared to full table scan
| (n)
| blibble wrote:
| there's more index types than only hashes (which do indeed only
| support equality)
|
| e.g. btrees allow greater than/less than comparisons
|
| postgres supports these and many more:
| https://www.postgresql.org/docs/9.5/indexes-types.html
| jugg1es wrote:
| I was using a BTREE index and we were doing greater/less than
| queries. It was not a hash index.
| ComodoHacker wrote:
| Did you also learn that certain indexes allow for range
| queries, and modern databases are quite efficient at those?
|
| And yes, please write an article, it would be quite
| interesting. With data and scripts to reproduce, of course.
| pushrax wrote:
| This entirely depends on what kind of index is used. A sorted
| index, such as a B or B+ tree (used in many SQL databases),
| will allow for fast point/range lookups in a continuous value
| space. A typical inverted index or hash based index only allows
| point lookups of specific values in a discrete value space.
| jugg1es wrote:
| It was a sorted BTREE index in MySQL 5.x. I agree that its
| supposed to be fast but it just wasn't for some reason.
| pushrax wrote:
| Are you sure it was actually using the index you expected?
| There are subtleties in index field order that can prevent
| the query optimizer from using an index that you might
| think it should be using.
|
| One common misstep is having a table with columns like (id,
| user, date), with an index on (user, id) and on (date, id),
| then issuing a query like "... WHERE user = 1 and date >
| ...". There is no optimal index for that query, so the
| optimizer will have to guess which one is better, or try to
| intersect them. In this example, it might use only the
| (user, id) index, and scan the date for all records with
| that user. A better index for this query would be (user,
| date, id).
| Izkata wrote:
| One of the more bizarre things we'd found in MySQL
| 5.something was that accidentally creating two identical
| indexes significantly slowed down queries that used it.
|
| I wouldn't be surprised if you hit some sort of similar
| strange bug.
| whoknowswhat11 wrote:
| I was just going to make this comment.
|
| Postgres as far as I know uses B-tree by default.
|
| You can switch sort order I think for this as well, so "most
| recent" becomes more efficient.
|
| Multi-column indexes also work, if you are just searching for
| first column postgres can still use multi-column index.
| dreyfan wrote:
| B-Tree indexes perform just fine on datetime fields. Were you
| using hash or inverted indexes maybe?
| Diggsey wrote:
| > Since virtually every value is going to be different,
| indexing and searching on that field is (almost) no better than
| having no index on the datetime field at all.
|
| Do you mean exact equality is rarely what you want because most
| of the values are different?
|
| Or are you talking about the negative effect on index size of
| having so many distinct values?
|
| I think the latter point could be quite database-dependent, eg.
| BTree de-duplication support was only added in Postgres 13.
| However, you could shave off quite a bit just from the fact
| that storing a date requires less space in the index than a
| datetime.
| drchickensalad wrote:
| I have no idea how you came to this conclusion, but indices on
| datetime fields are completely required to bring down the seek
| time from O(n) to O(log(N)), plus the length of the range.
| Massive parts of the businesses I've worked at would be simply
| impossible without this.
|
| The cardinality of the index has other ramifications that
| aren't generally important here.
| hn_throwaway_99 wrote:
| This a very strange, and incorrect, conclusion you have come
| to. It doesn't matter that datetime fields are unique, as most
| of the time you are not searching for a particular date time,
| but a range. I.e. "show me all of rows created between date X
| and Y". In that case, the ordering of the index makes it
| efficient to do that query.
|
| Furthermore, often times your query will have an "ORDER BY
| dateTimeCol" value on it (or, more commonly, ORDER BY
| dateTimeCol DESC). If your index is created correctly, it means
| it can return rows with a quick index scan instead of needing
| to implement a sort as part of the execution plan.
| mceachen wrote:
| Your result is surprising: I suspect your wide table with an
| index wasn't in cache, and your timestamp-only table was.
|
| The whole point is that indexes (can) prevent full table scans,
| which are expensive. This is true even if the column(s) you're
| indexing have high cardinality: but it relies on performant
| table or index joins (which, if your rdbms is configured with
| sufficient memory, should be the case).
| nlitened wrote:
| > The rub is that instead of just being slow, it also cost a fair
| amount of money because this crazy vendor system charged by the
| row or somesuch. So, by scanning the whole table, they touched
| all of those rows, and oh hey, massive amounts of money just set
| ablaze!
|
| Why _the hell_ is nobody mentioning that using a database that
| charges per row touched is absolute insanity? When has it become
| so normal that nobody mentions it?
| lern_too_spel wrote:
| If the amount charged isn't proportional to the work done, the
| cloud provider would quickly go out of business.
| nlitened wrote:
| My VPS provider gives me unlimited traffic, unlimited disk
| reads, and unlimited DB rows touched for fixed 30 dollars a
| month, regardless of how much CPU and RAM I keep loaded.
|
| This makes me think that these things are at least 100x
| cheaper than AWS might want to make me believe.
| ricardobeat wrote:
| You'll probably enjoy reading this article that was on the
| front page yesterday:
| https://briananglin.me/posts/spending-5k-to-learn-how-databa...
| [deleted]
| inopinatus wrote:
| The stench of *splaining in these remarks is nigh overwhelming,
| and mostly missing the point to boot.
___________________________________________________________________
(page generated 2021-11-07 23:00 UTC) |