[HN Gopher] What are the most important statistical ideas of the...
___________________________________________________________________
 
What are the most important statistical ideas of the past 50 years?
 
Author : Anon84
Score  : 215 points
Date   : 2022-02-21 16:46 UTC (6 hours ago)
 
web link (www.tandfonline.com)
w3m dump (www.tandfonline.com)
 
| ModernMech wrote:
| Kalman published his filter in 1960... a little over 50 years
| ago, but I'd say it's worth mentioning given its huge impact. The
| idea
| that we can use multiple noisy sensors to get a more accurate
| reading than any one of them could provide enables all kinds of
| autonomous systems. Basically everything that uses sensors (which
| is essentially every device these days) is better due to this
| fact.
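| 
| A minimal sketch of the core fusion step (scalar case only; a
| real Kalman filter adds a predict step and matrix forms):
| 
|     # Fuse two noisy readings of the same quantity by
|     # inverse-variance weighting -- the heart of the update step.
|     def fuse(m1, var1, m2, var2):
|         k = var1 / (var1 + var2)       # Kalman gain
|         mean = m1 + k * (m2 - m1)      # blended estimate
|         var = (1 - k) * var1           # always <= min(var1, var2)
|         return mean, var
| 
|     # e.g. sensor A reads 10.0 (variance 4.0), sensor B reads
|     # 12.0 (variance 1.0):
|     print(fuse(10.0, 4.0, 12.0, 1.0))  # -> (11.6, 0.8)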
 
  | savant_penguin wrote:
  | And it uses almost all of the tricks in the book in a single
  | analytic model
  | 
  | Statistics+optimization+dynamic systems+linear algebra
 
| oxff wrote:
| "Throw more compute at it"
 
  | Q6T46nT668w6i3m wrote:
  | The authors agree. It's mentioned a handful of times.
 
| westcort wrote:
| Not in the past 50 years, but more like the past 80 years,
| nonparametric statistics in general are pretty amazing, though
| underused. Look at the Mann-Whitney U test (equivalently, the
| Wilcoxon rank-sum test). Such tests require very few assumptions.
 
  | dannykwells wrote:
  | These are the standard tests in most of biology at this time.
  | Not underused at all. Very powerful and lovely to not have to
  | assume normality.
 
| bell-cot wrote:
| The most important statistical idea of the past 50 years is the
| same as the most important statistical idea of the 50 years
| before that:
| 
| "Due to reduced superstition, better education, and general
| awareness & progress, humans who are neither meticulous
| statistics experts, nor working in very constrained & repetitive
| circumstances, will understand and apply statistics more
| objectively and correctly than they generally have in the past."
| 
| Sadly, this idea is still wrong.
 
| deepsquirrelnet wrote:
| Good review article. I always enjoy browsing the references, and
| found "Computer Age Statistical Inference" among them. Looks like
| a good read, with a pdf available online.
 
  | mjb wrote:
  | It's a great book. Short and to-the-point. Highly recommended.
 
| aabajian wrote:
| Didn't read the document, but hopefully it mentions PageRank, the
| prime example of using probabilistic graphical models to rank
| nodes in a directed graph. More info:
| https://www.amazon.com/Probabilistic-Graphical-Models-Princi...
| 
| I've heard that Google and Baidu essentially started at the same
| time, with the same algorithm discovery (PageRank). Maybe someone
| can comment on whether there was idea sharing or if both teams
| derived it independently.
 
  | nabla9 wrote:
  | PageRank is just an application of eigenvalues to ranking.
  | 
  | The idea first came up in the 1970s
  | (https://www.sciencedirect.com/science/article/abs/pii/030645...)
  | and several times afterward before PageRank was developed.
 
  | mianos wrote:
  | The sort of methods 'PageRank' uses already existed. It reminds
  | me of Apple 'inventing' (air quotes) the MP3 player. It didn't;
  | it applied existing technology, refined it, and publicized it.
  | They did not invent it, but maybe 'inventing' something is only
  | a very small part of making something useful for many people.
 
  | bjourne wrote:
  | PageRank actually had a predecessor called HITS (according to
  | some sources HITS was developed before PageRank; according to
  | others they were contemporaries), an algorithm developed by Jon
  | Kleinberg for ranking hypertext documents.
  | https://en.wikipedia.org/wiki/HITS_algorithm However, Kleinberg
  | stayed in academia and never attempted to commercialize his
  | research the way Page and Brin did. HITS was more complex than
  | PageRank and context-sensitive, so queries required far more
  | computing resources than PageRank. PageRank is _kind of_ what
  | you get if you take HITS and remove the slow parts.
  | 
  | What I find very interesting about PageRank is how you can
  | trade accuracy for performance. The traditional way of
  | calculating PageRank, iterating the rank vector against the
  | transition matrix until it converges, gives you correct results
  | but is slow. For a modestly sized graph it could take days. But
  | if accuracy isn't that important you can use Monte Carlo
  | simulation and get most of the PageRank right in a fraction of
  | the time of the iterative method. It's also easy to
  | parallelize.
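  | 
  | A toy sketch of both approaches (damping 0.85; dangling nodes
  | and convergence checks omitted for brevity):
  | 
  |     import random
  | 
  |     def pagerank_iterative(links, d=0.85, iters=50):
  |         # links: {node: [nodes it points to]}
  |         n = len(links)
  |         pr = {u: 1.0 / n for u in links}
  |         for _ in range(iters):
  |             nxt = {u: (1 - d) / n for u in links}
  |             for u, outs in links.items():
  |                 for v in outs:
  |                     nxt[v] += d * pr[u] / len(outs)
  |             pr = nxt
  |         return pr
  | 
  |     def pagerank_monte_carlo(links, d=0.85, walks=200):
  |         # Walk from each node; continue with probability d,
  |         # otherwise stop. Endpoint frequencies estimate PageRank.
  |         ends = {u: 0 for u in links}
  |         for u in links:
  |             for _ in range(walks):
  |                 v = u
  |                 while random.random() < d:
  |                     v = random.choice(links[v])
  |                 ends[v] += 1
  |         total = sum(ends.values())
  |         return {u: c / total for u, c in ends.items()}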
 
    | jll29 wrote:
    | Page's PageRank patent references HITS:
    | 
    | Jon M. Kleinberg, "Authoritative sources in a hyperlinked
    | environment," 1998, Proc. Of the 9th Annual ACM-SIAM
    | Symposium on Discrete Algorithms, pp. 668-677.
 
  | mach1ne wrote:
  | Didn't Larry Page and Sergey Brin openly publicize the PageRank
  | algorithm? It'd seem more likely that Baidu just copypasted the
  | idea.
 
  | oneoff786 wrote:
  | The basic concept behind PageRank is pretty obvious. If you
  | stare at a graph for a while and try to imagine centrality
  | calculations, it'll probably be your big idea.
  | 
  | Implementing it and catching edge cases isn't trivial, though.
 
  | screye wrote:
  | Given that PageRank was literally invented by and named after
  | Larry Page, I would think that Google had a head start.
  | 
  | That being said, PageRank is more a stellar example of adapting
  | an academic idea into practice than a statistical idea in and
  | of itself.
  | 
  | After all, it is 'merely' the stationary distribution of a
  | random walk over a directed graph. I say 'merely' with a lot of
  | respect, because the best ideas often feel simple in hindsight.
  | But it is that simplicity that makes them even more impressive.
 
  | andi999 wrote:
  | I heard that Google's first approach used/adapted a published
  | algorithm for ranking scientific publications based on their
  | network of citations. Not sure if this is the algorithm you
  | mentioned though.
 
    | divbzero wrote:
    | The ranking of scientific publications based on citations
    | you're describing is impact factor [1]. I haven't heard that
    | as an inspiration for Larry Page's PageRank [2] but that is
    | plausible.
    | 
    | [1]: https://en.wikipedia.org/wiki/Impact_factor
    | 
    | [2]: https://en.wikipedia.org/wiki/PageRank
 
  | ppsreejith wrote:
  | From the wikipedia page of Robin Li, co-founder of Baidu:
  | https://en.wikipedia.org/wiki/Robin_Li#RankDex
  | 
  | > In 1996, while at IDD, Li created the Rankdex site-scoring
  | algorithm for search engine page ranking, which was awarded a
  | U.S. patent. It was the first search engine that used
  | hyperlinks to measure the quality of websites it was indexing,
  | predating the very similar algorithm patent filed by Google two
  | years later in 1998.
 
| oraoraoraoraora wrote:
| Statistical process control has been significant over the last
| 100 years.
 
  | avs733 wrote:
  | I would agree with you but they are speaking to a different
  | audience. This is in a journal for statistics researchers and
  | theorists. These would all be things that would inform the
  | creation of pragmatic tools like SPC.
 
| csee wrote:
| Meta-analysis techniques like funnel plots.
 
  | pacbard wrote:
  | Meta-analysis is an application of idea #4 (Bayesian Multilevel
  | Models) in the article.
  | 
  | What makes meta-analysis special within a multilevel framework
  | is that you know the level 1 variance. This creates a special
  | case of a generalized multilevel model where you leverage your
  | knowledge of L1 mean and variance (from each individual study's
  | results) to estimate the possible mean and variance of the
  | population effect.
  | 
  | The population mean and variance are usually presented in
  | funnel plots, where you can see the expected distribution of
  | effect sizes/point estimates given a sample size/standard
  | error.
  | 
  | Researchers have also started to plot actual point estimates
  | from published papers on top of the funnel, showing that the
  | published results fill it asymmetrically (small, null results
  | are largely absent), which is usually cited as evidence of
  | publication bias. In other words, the missing studies end up in
  | researchers' file drawers instead of being published somewhere.
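  | 
  | A minimal sketch of this special case, the classic
  | DerSimonian-Laird random-effects estimator (illustrative only;
  | real packages add confidence intervals and diagnostics):
  | 
  |     def dersimonian_laird(y, v):
  |         # y: per-study effect estimates; v: their known L1
  |         # variances (squared standard errors).
  |         w = [1 / vi for vi in v]
  |         mu_fe = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
  |         q = sum(wi * (yi - mu_fe) ** 2 for wi, yi in zip(w, y))
  |         c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
  |         tau2 = max(0.0, (q - (len(y) - 1)) / c)  # between-study var
  |         w_re = [1 / (vi + tau2) for vi in v]
  |         mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
  |         return mu, 1 / sum(w_re), tau2  # mean, its variance, tau^2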
 
| bobbyd2323 wrote:
| Bootstrap
 
| datastoat wrote:
| Validation on holdout sets.
| 
| When I was a student in the 1990s, I was taught about hypothesis
| testing (and all the hassle of p-fishing etc.), and about
| Bayesian inference (which is lovely, until you have to invent
| priors over the model space -- e.g. a prior over neural network
| architectures). These are both systems that tie themselves in
| epistemological knots when trying to answer the simple question
| "What model shall I use?"
| 
| Holdout set validation is such a clean simple idea, and so easy
| to use (as long as you have big data), and it does away with all
| the frequentist and Bayesian tangle, which is why it's so
| widespread in ML nowadays.
| 
| It also aligns statistical inference with Popper's idea of
| scientific falsifiability -- scientists test their models against
| new experimental data; data scientists can test their models
| against qualitatively different holdout sets. (Just make sure you
| don't get your holdout set by shuffling, since that's not what
| Popper would call a "genuine risky validation".)
| 
| The article mentions Breiman's "alternative view of the
| foundations of statistics based on prediction rather than
| modeling". That's not general enough, since it doesn't
| accommodate generative modelling (e.g. GPT, GANs). I think it's
| better to frame ML in terms of "evaluating model fit on a holdout
| set", since that accommodates both predictive and generative
| modelling.
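| 
| A minimal sketch of the idea (the shuffle split shown is the lazy
| default; per the caveat above, a temporally or qualitatively
| separate holdout is the riskier, more Popperian test):
| 
|     import random
| 
|     def holdout_split(data, frac=0.2, seed=0):
|         # Reserve a fraction of the data, never touched during
|         # training, purely for judging the fitted model.
|         rng = random.Random(seed)
|         idx = list(range(len(data)))
|         rng.shuffle(idx)
|         cut = int(len(data) * frac)
|         train = [data[i] for i in idx[cut:]]
|         holdout = [data[i] for i in idx[:cut]]
|         return train, holdout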
 
  | anxrn wrote:
  | Very much agree with the simplicity and power of separation of
  | training, validation and test sets. Is this really a 'big data'
  | era notion though? This was fairly standard in 90s era language
  | and speech work.
 
| pandoro wrote:
| Solomonoff Induction. Although proven to be uncomputable (there
| are people working on formalizing efficient approximations), it
| is such a mind-blowing idea. It brings together Occam's razor,
| Epicurus' Principle of multiple explanations, Bayes' theorem,
| Algorithmic Information Theory and Universal Turing machines in a
| theory of universal induction. The mathematical proof and details
| are way above my head but I cannot help but feel like it is very
| underrated.
 
  | spekcular wrote:
  | Statistics is an applied science, and Solomonoff induction has
  | had zero practical impact. So I feel it's not underrated at
  | all, and perhaps overrated among a certain crowd.
 
| ThouYS wrote:
| Bootstrap resampling is such a black magic thing
 
  | graycat wrote:
  | There is a nice treatment of resampling, i.e., _permutation_
  | tests, in (from my TeX format bibliography)
  | 
  | Sidney Siegel, {\it Nonparametric Statistics for the Behavioral
  | Sciences,\/} McGraw-Hill, New York, 1956.
  | 
  | Right, there was already a good book on such tests over 50
  | years ago.
  | 
  | Can also justify it with an independent, identically
  | distributed assumption. But a weaker assumption of
  | _exchangeability_ can also work -- I published a paper with
  | that.
  | 
  | The broad idea of such a statistical hypothesis test is to
  | decide on the _null_ hypothesis, _null_ as in no effect (if
  | looking for an effect, then you want to reject the null
  | hypothesis of no effect), and to make assumptions that permit
  | calculating the probability of what you observe. If that
  | probability is way too small, then reject the null hypothesis
  | and conclude that there was an effect. Right, it's fishy.
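  | 
  | A minimal sketch of such a test, assuming exchangeability under
  | the null (two-sample difference in means):
  | 
  |     import random
  | 
  |     def permutation_test(xs, ys, n_perm=10000, seed=0):
  |         # If the null of "no effect" holds, the group labels are
  |         # arbitrary, so reshuffle them and see how often the
  |         # shuffled gap beats the observed one.
  |         rng = random.Random(seed)
  |         obs = sum(xs) / len(xs) - sum(ys) / len(ys)
  |         pooled = list(xs) + list(ys)
  |         hits = 0
  |         for _ in range(n_perm):
  |             rng.shuffle(pooled)
  |             px, py = pooled[:len(xs)], pooled[len(xs):]
  |             gap = sum(px) / len(px) - sum(py) / len(py)
  |             if abs(gap) >= abs(obs):
  |                 hits += 1
  |         return hits / n_perm  # p-value: small => reject the null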
 
    | jll29 wrote:
    | The 2nd edition is never far from my desk:
    | 
    | Siegel, S., & Castellan, N. J. (1988). Nonparametric
    | statistics for the behavioral sciences (2nd ed.) New York:
    | McGraw-Hill.
 
  | CrazyStat wrote:
  | One way to approach the bootstrap is as sampling from the
  | posterior mean of a Dirichlet Process model with a
  | noninformative prior (alpha=0).
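  | 
  | A minimal sketch of the closely related Bayesian bootstrap
  | (Rubin, 1981): draw Dirichlet(1,...,1) weights over the
  | observations instead of resampling them.
  | 
  |     import random
  | 
  |     def bayesian_bootstrap_mean(xs, n_draws=1000, seed=0):
  |         rng = random.Random(seed)
  |         draws = []
  |         for _ in range(n_draws):
  |             # Normalized Exp(1) variates are Dirichlet(1,...,1).
  |             g = [rng.expovariate(1.0) for _ in xs]
  |             s = sum(g)
  |             draws.append(sum(gi / s * xi for gi, xi in zip(g, xs)))
  |         return draws  # posterior draws of the mean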
 
  | derbOac wrote:
  | It's just Monte Carlo simulation using the observed
  | distribution as the population distribution.
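  | 
  | In its simplest (percentile) form, a sketch:
  | 
  |     import random
  | 
  |     def bootstrap_ci(xs, stat, n_boot=10000, alpha=0.05, seed=0):
  |         # Resample with replacement -- i.e. Monte Carlo from the
  |         # empirical distribution -- and read off percentiles.
  |         rng = random.Random(seed)
  |         stats = sorted(
  |             stat([rng.choice(xs) for _ in xs])
  |             for _ in range(n_boot)
  |         )
  |         return (stats[int(n_boot * alpha / 2)],
  |                 stats[int(n_boot * (1 - alpha / 2))])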
 
    | grayclhn wrote:
    | 1) It's not -- there are lots of procedures called "the
    | bootstrap" that act differently.
    | 
    | 2) The fact that "substitute the data for the population
    | distribution" both works and is sometimes provably better
    | than other more sensible approaches is a little mind blowing.
    | 
    | Most things called the bootstrap feel like cheating, i.e.,
    | "this part seems hard, let's do the easiest thing possible
    | instead and hope it works."
 
    | civilized wrote:
    | It's not the mere description of the procedure that people
    | find mysterious.
 
    | btown wrote:
    | This has big "A monad is just a monoid in the category of
    | endofunctors" energy
 
| vanattab wrote:
| The most important to me is "There are three kinds of lies in
| this world. Lies, damn lies, and statistics."
| 
| Not attacking the mathematical field of statistics, just
| pointing out that lots of people abuse statistics in an attempt
| to get people to behave as they would prefer.
 
| jll29 wrote:
| Off-the-cuff, i.e. without digging deeply into a set of history
| of statistics books:
| 
| Tied 1st place:
| 
| * Markov chain Monte Carlo (MCMC) and the Metropolis-Hastings
| algorithm (a minimal sketch follows at the end of this comment)
| 
| * Hidden Markov Models and the Viterbi algorithm for most
| probable sequence in linear time
| 
| * Vapnik-Chervonenkis theory of statistical learning (Vladimir
| Naumovich Vapnik & Alexey Chervonenkis) and SVMs
| 
| 4th place:
| 
| * Edwin Jaynes: maximum entropy for constructing priors
| (borderline: 1957)
| 
| Honorable mentions:
| 
| * Breiman et al.'s CART (Classification and Regression Trees)
| algorithm (and Quinlan's C5.0 extension)
| 
| * Box-Jenkins method (autoregressive moving average (ARMA) /
| autoregressive integrated moving average (ARIMA) to find the best
| fit of a time-series model to past values of a time series)
| 
| (The beginning of the 20th century was much more fertile in
| comparison - Kolmogorov, Fisher, Gosset, Aitken, Cox, de Finetti,
| Kullback, the Pearsons, Spearman etc.)
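| 
| A minimal sketch of the first of these, random-walk Metropolis
| (symmetric proposal, so the Hastings correction cancels):
| 
|     import math, random
| 
|     def metropolis(log_p, x0, step=1.0, n=10000, seed=0):
|         # Sample a density known only up to a constant: propose
|         # x' ~ N(x, step^2), accept with prob min(1, p(x')/p(x)).
|         rng = random.Random(seed)
|         x, samples = x0, []
|         for _ in range(n):
|             x_new = x + rng.gauss(0.0, step)
|             if math.log(rng.random()) < log_p(x_new) - log_p(x):
|                 x = x_new
|             samples.append(x)
|         return samples
| 
|     # e.g. a standard normal from its unnormalized log-density:
|     # draws = metropolis(lambda x: -0.5 * x * x, x0=0.0)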
 
  | dlg wrote:
  | I generally agree with you. However, as a pedantic note,
  | Metropolis, Rosenbluth, Rosenbluth, Teller and Teller was in
  | 1953 and Hastings was 1970.
 
| uoaei wrote:
| IMO, kernel-based computational methods are by far the most
| important _overlooked_ advances in the statistical sciences.
| 
| Kernel methods are linear methods on data projected into very-
| high-dimensional spaces, and you get basically all the benefits
| of linear methods (convexity, access to analytical
| techniques/manipulations, etc.) while being much more
| computationally tractable and data-efficient than a naive
| approach. Maximum mean discrepancy (MMD) is a particularly shiny
| result from the last few years.
| 
| The tradeoff is that you must use an adequate kernel for whatever
| procedure you intend, and these can sometimes have sneaky
| pitfalls. A crass example would be the relative failure of tSNE
| and similar kernel-based visualization tools: in the case of tSNE
| the Cauchy kernel's tails are extremely fat, which ends up
| degrading the representation of intra- vs inter-cluster
| distances.
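| 
| A toy 1-D sketch of MMD with an RBF kernel (the bandwidth choice
| is exactly the kind of pitfall mentioned above):
| 
|     import math
| 
|     def mmd2(xs, ys, bw=1.0):
|         # Biased estimator of squared MMD: mean similarity within
|         # X, plus within Y, minus twice the cross-sample term.
|         k = lambda a, b: math.exp(-((a - b) ** 2) / (2 * bw ** 2))
|         avg = lambda us, vs: (sum(k(u, v) for u in us for v in vs)
|                               / (len(us) * len(vs)))
|         return avg(xs, xs) + avg(ys, ys) - 2 * avg(xs, ys)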
 
  | teshier-A wrote:
  | My experience with MMD is that unless you've been using it and
  | are familiar with other kernel methods, you probably won't know
  | what to do with it: what kernel do I use? How can I test for
  | significance (in any sense of the word)? Add the (last I
  | checked) horrendous computational complexity, and to me it
  | looks like a less usable mutual information (or KL divergence)
  | without all the nice information theory around it.
 
| enriquto wrote:
| My favourite is Mandelbrot's heuristic converse of the central
| limit theorem: the _only_ variables that are normal are those
| that are sums of many independent variables of finite variance.
 
| andi999 wrote:
| Identifying p-value hacking.
 
  | hackernewds wrote:
  | what does that mean?
 
    | pacbard wrote:
    | p-hacking is a research "dark pattern" where a researcher
    | fits several similar models and reports only the one that has
    | the significant p-value for the relationship of interest.
    | 
    | This strategy is possible because p-values are themselves
    | stochastic: at the usual 0.05 threshold, a researcher will
    | find one significant p-value for every 20 null models they
    | run, on average.
    | 
    | p-hacking can also refer to nudging a p-value past the
    | significance cut-off (usually 0.05) by modifying the
    | statistical model slightly until the desired result is
    | achieved. This process usually involves the inclusion of
    | control variables that are not really related to the outcome
    | but that will change the standard errors/p-values.
    | 
    | Another way to p-hack is to drop specific observations until
    | the desired p-value is reached. This process usually involves
    | removing participants from a sample for a seemingly
    | legitimate reason until the desired p-value is achieved.
    | Usually identifying and eliminating a few high leverage
    | observations is enough to change the significance level of a
    | point estimate.
    | 
    | Multiple strategies to address p-hacking have been proposed
    | and discussed. One of the most popular ones is pre-
    | registration of research designs and models. The idea here is
    | that a researcher would publish their research design and
    | models before conducting the experiment and they will report
    | only the results from the pre-registered models. This process
    | eliminates the "fishing expedition" nature of p-hacking.
    | 
    | Other strategies involve better research designs that are not
    | sensitive to model respecification. These are usually
    | experimental and quasi-experimental methods that leverage an
    | external source of variation (external to both the researcher
    | and the studied system, like random assignment to conditions)
    | to isolate the relationship between two variables.
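    | 
    | A toy simulation of the first strategy (every null is true,
    | yet the "best" of 20 tests usually looks significant):
    | 
    |     import math, random, statistics
    | 
    |     def p_hack_demo(n_tests=20, n=30, seed=1):
    |         rng = random.Random(seed)
    |         # Standard normal CDF, for a two-sided z-test.
    |         phi = lambda z: 0.5 * (1 + math.erf(z / math.sqrt(2)))
    |         pvals = []
    |         for _ in range(n_tests):
    |             xs = [rng.gauss(0, 1) for _ in range(n)]  # pure noise
    |             se = statistics.stdev(xs) / math.sqrt(n)
    |             z = statistics.mean(xs) / se
    |             pvals.append(2 * (1 - phi(abs(z))))
    |         return min(pvals)  # the one result that gets reported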
 
      | pthread_t wrote:
      | I saw this firsthand as an undergrad research assistant in
      | a neuroscience lab. How did it go when I brought it up?
      | Swept under the rug and published in a high-impact journal.
 
    | ldiracdelta wrote:
    | I believe it is referencing "The Replication Crisis"
    | https://en.wikipedia.org/wiki/Replication_crisis
 
| oneoff786 wrote:
| SHAP values and other methods to parse out the inner workings of
| "black box" machine learning models. They're good enough that
| I've grown fond of just throwing a LightGBM model at everything
| and calling it a day, for that sweet spot of predictive power and
| ease of implementation.
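| 
| Roughly that workflow, as a sketch (assumes the lightgbm, shap
| and scikit-learn packages; check the shap docs for current API):
| 
|     import lightgbm as lgb
|     import shap
|     from sklearn.datasets import make_regression
| 
|     # Synthetic data standing in for your tabular problem.
|     X, y = make_regression(n_samples=500, n_features=8,
|                            random_state=0)
| 
|     model = lgb.LGBMRegressor().fit(X, y)
|     explainer = shap.TreeExplainer(model)   # fast path for trees
|     shap_values = explainer.shap_values(X)  # per-feature attributions
|     shap.summary_plot(shap_values, X)       # global overview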
 
  | hackernewds wrote:
  | Would you be so kind as to share an example or resources to
  | learn about this?
 
    | screye wrote:
    | [1] Scott Lundberg is the leading authority on all things
    | SHAP (he wrote the seminal paper on it).
    | 
    | [2] Christoph Molnar's interpretable ML book has chapters on
    | Shapley values and SHAP, if you'd prefer text instead of
    | video.
    | 
    | [1] https://www.youtube.com/watch?v=B-c8tIgchu0
    | 
    | [2] https://christophm.github.io/interpretable-ml-
    | book/shapley.h...
 
    | magneticnorth wrote:
    | Seconding Christoph Molnar's excellent writeup. I also find the
    | readme & example notebooks in Scott Lundberg's github repo to
    | be a great way to get started. There are also references
    | there for the original papers, which are surprisingly
    | readable, imo. https://github.com/slundberg/shap
 
  | teruakohatu wrote:
  | > I've grown fond of just throwing a LightGBM model at
  | everything and calling it a day
  | 
  | It is not always a good idea to do that. Always try different
  | methods; there is no ultimate method. At the very least, OLS
  | should be tried, along with other fully explainable methods,
  | even a simple CART-like method.
 
    | csee wrote:
    | OLS is my default go-to. It so often outperforms a random
    | forest in small-data, real-world applications on
    | nonstationary data, and model explainability is built in. If
    | I'm working in a domain with stationary data then I'd tilt
    | more toward the forest (due to not having to engineer
    | features, and the inbuilt ability to detect non-linear
    | relationships and interactions between features).
 
  | hervature wrote:
  | Strongly disagree. Shapley values and LIME give a very crude
  | and extremely limited understanding of the model. They
  | basically amount to a random selection of local slopes. For
  | instance, if I tell you the (randomly selected) slopes of a
    | function are [0.5, 0.2, 12.4, 1.1, 2.6], whose average is
    | 3.3, can you guess anything? You might notice it is monotonic
    | (maybe), but you certainly won't guess it is e^x.
 
    | LeanderK wrote:
    | I would say that our ML models are not yet predictable enough
    | in local neighbourhoods to really trust LIME. Adversarial
    | examples prove that you just can't select a small enough
    | range, since you can always find them even at super tiny
    | distances.
 
___________________________________________________________________
(page generated 2022-02-21 23:00 UTC)