[HN Gopher] What are the most important statistical ideas of the...
___________________________________________________________________
 
What are the most important statistical ideas of the past 50 years?
 
Author : Anon84
Score  : 215 points
Date   : 2022-02-21 16:46 UTC (6 hours ago)
 
web link (www.tandfonline.com)
w3m dump (www.tandfonline.com)
 
| ModernMech wrote:
| Kalman published his filter in 1960... a little over 50 years
| ago, but I'd say it's worth mentioning given its huge impact. The
| idea
| that we can use multiple noisy sensors to get a more accurate
| reading than any one of them could provide enables all kinds of
| autonomous systems. Basically everything that uses sensors (which
| is essentially every device these days) is better due to this
| fact.
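| 
| A minimal sketch of the core fusion step (scalar case only; a
| real Kalman filter adds a predict step and matrix forms):
| 
|     # Fuse two noisy readings of the same quantity by
|     # inverse-variance weighting -- the heart of the update step.
|     def fuse(m1, var1, m2, var2):
|         k = var1 / (var1 + var2)       # Kalman gain
|         mean = m1 + k * (m2 - m1)      # blended estimate
|         var = (1 - k) * var1           # always <= min(var1, var2)
|         return mean, var
| 
|     # e.g. sensor A reads 10.0 (variance 4.0), sensor B reads
|     # 12.0 (variance 1.0):
|     print(fuse(10.0, 4.0, 12.0, 1.0))  # -> (11.6, 0.8)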
 
  | savant_penguin wrote:
  | And it uses almost all of the tricks in the book in a single
  | analytic model
  | 
  | Statistics+optimization+dynamic systems+linear algebra
 
| oxff wrote:
| "Throw more compute at it"
 
  | Q6T46nT668w6i3m wrote:
  | The authors agree. It's mentioned a handful of times.
 
| westcort wrote:
| Not in the past 50 years, but more like the past 80 years,
| nonparametric statistics in general are pretty amazing, though
| underused. Look at the Mann-Whitney U test (equivalently, the
| Wilcoxon rank-sum test). Such tests require very few assumptions.
 
  | dannykwells wrote:
  | These are the standard tests in most of biology at this time.
  | Not underused at all. Very powerful and lovely to not have to
  | assume normality.
 
| bell-cot wrote:
| The most important statistical idea of the past 50 years is the
| same as the most important statistical idea of the 50 years
| before that:
| 
| "Due to reduced superstition, better education, and general
| awareness & progress, humans who are neither meticulous
| statistics experts, nor working in very constrained & repetitive
| circumstances, will understand and apply statistics more
| objectively and correctly than they generally have in the past."
| 
| Sadly, this idea is still wrong.
 
| deepsquirrelnet wrote:
| Good review article. I always enjoy browsing the references, and
| found "Computer Age Statistical Inference" among them. Looks like
| a good read, with a pdf available online.
 
  | mjb wrote:
  | It's a great book. Short and to-the-point. Highly recommended.
 
| aabajian wrote:
| Didn't read the document, but hopefully it mentions PageRank, the
| prime example of using probabilistic graphical models to rank
| nodes in a directed graph. More info:
| https://www.amazon.com/Probabilistic-Graphical-Models-Princi...
| 
| I've heard that Google and Baidu essentially started at the same
| time, with the same algorithm discovery (PageRank). Maybe someone
| can comment on whether there was idea sharing or if both teams
| derived it independently.
 
  | nabla9 wrote:
  | PageRank is just an application of eigenvalues to ranking.
  | 
  | The idea first came up in the 1970s
  | (https://www.sciencedirect.com/science/article/abs/pii/030645...)
  | and several times afterward before PageRank was developed.
 
  | mianos wrote:
  | The sort of methods 'PageRank' uses already existed. It reminds
  | me of Apple 'inventing' (air quotes) the MP3 player. It didn't;
  | it applied existing technology, refined it, and publicized it.
  | They did not invent it, but maybe 'inventing' something is only
  | a very small part of making something useful for many people.
 
  | bjourne wrote:
  | PageRank actually had a predecessor called HITS (according to
  | some sources HITS was developed before PageRank; according to
  | others they were contemporaries), an algorithm developed by Jon
  | Kleinberg for ranking hypertext documents.
  | https://en.wikipedia.org/wiki/HITS_algorithm However, Kleinberg
  | stayed in academia and never attempted to commercialize his
  | research the way Page and Brin did. HITS was more complex than
  | PageRank and context-sensitive, so queries required far more
  | computing resources than PageRank. PageRank is _kind of_ what
  | you get if you take HITS and remove the slow parts.
  | 
  | What I find very interesting about PageRank is how you can
  | trade accuracy for performance. The traditional way of
  | calculating PageRank, iterating the rank vector against the
  | transition matrix until it converges, gives you correct results
  | but is slow. For a modestly sized graph it could take days. But
  | if accuracy isn't that important you can use Monte Carlo
  | simulation and get most of the PageRank right in a fraction of
  | the time of the iterative method. It's also easy to
  | parallelize.
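  | 
  | A toy sketch of both approaches (damping 0.85; dangling nodes
  | and convergence checks omitted for brevity):
  | 
  |     import random
  | 
  |     def pagerank_iterative(links, d=0.85, iters=50):
  |         # links: {node: [nodes it points to]}
  |         n = len(links)
  |         pr = {u: 1.0 / n for u in links}
  |         for _ in range(iters):
  |             nxt = {u: (1 - d) / n for u in links}
  |             for u, outs in links.items():
  |                 for v in outs:
  |                     nxt[v] += d * pr[u] / len(outs)
  |             pr = nxt
  |         return pr
  | 
  |     def pagerank_monte_carlo(links, d=0.85, walks=200):
  |         # Walk from each node; continue with probability d,
  |         # otherwise stop. Endpoint frequencies estimate PageRank.
  |         ends = {u: 0 for u in links}
  |         for u in links:
  |             for _ in range(walks):
  |                 v = u
  |                 while random.random() < d:
  |                     v = random.choice(links[v])
  |                 ends[v] += 1
  |         total = sum(ends.values())
  |         return {u: c / total for u, c in ends.items()}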
 
    | jll29 wrote:
    | Page's PageRank patent references HITS:
    | 
    | Jon M. Kleinberg, "Authoritative sources in a hyperlinked
    | environment," 1998, Proc. Of the 9th Annual ACM-SIAM
    | Symposium on Discrete Algorithms, pp. 668-677.
 
  | mach1ne wrote:
  | Didn't Larry Page and Sergey Brin openly publicize the PageRank
  | algorithm? It'd seem more likely that Baidu just copypasted the
  | idea.
 
  | oneoff786 wrote:
  | The basic concept behind PageRank is pretty obvious. If you
  | stare at a graph for a while and try to imagine centrality
  | calculations, it'll probably be your big idea.
  | 
  | Implementing it and catching edge cases isn't trivial, though.
 
  | screye wrote:
  | Given that PageRank was literally invented by and named after
  | Larry Page, I would think that Google had a head start.
  | 
  | That being said, PageRank is more a stellar example of adapting
  | an academic idea into practice than a statistical idea in and
  | of itself.
  | 
  | After all, it is 'merely' the stationary distribution of a
  | random walk over a directed graph. I say 'merely' with a lot of
  | respect, because the best ideas often feel simple in hindsight.
  | But it is that simplicity that makes them even more impressive.
 
  | andi999 wrote:
  | I heard that Google's first approach used/adapted a published
  | algorithm for ranking scientific publications based on their
  | network of citations. Not sure if this is the algorithm you
  | mentioned though.
 
    | divbzero wrote:
    | The ranking of scientific publications based on citations
    | you're describing is impact factor [1]. I haven't heard that
    | as an inspiration for Larry Page's PageRank [2] but that is
    | plausible.
    | 
    | [1]: https://en.wikipedia.org/wiki/Impact_factor
    | 
    | [2]: https://en.wikipedia.org/wiki/PageRank
 
  | ppsreejith wrote:
  | From the wikipedia page of Robin Li, co-founder of Baidu:
  | https://en.wikipedia.org/wiki/Robin_Li#RankDex
  | 
  | > In 1996, while at IDD, Li created the Rankdex site-scoring
  | algorithm for search engine page ranking, which was awarded a
  | U.S. patent. It was the first search engine that used
  | hyperlinks to measure the quality of websites it was indexing,
  | predating the very similar algorithm patent filed by Google two
  | years later in 1998.
 
| oraoraoraoraora wrote:
| Statistical process control has been significant over the last
| 100 years.
 
  | avs733 wrote:
  | I would agree with you but they are speaking to a different
  | audience. This is in a journal for statistics researchers and
  | theorists. These would all be things that would inform the
  | creation of pragmatic tools like SPC.
 
| csee wrote:
| Meta-analysis techniques like funnel plots.
 
  | pacbard wrote:
  | Meta-analysis is an application of idea #4 (Bayesian Multilevel
  | Models) in the article.
  | 
  | What makes meta-analysis special within a multilevel framework
  | is that you know the level 1 variance. This creates a special
  | case of a generalized multilevel model where you leverage your
  | knowledge of L1 mean and variance (from each individual study's
  | results) to estimate the possible mean and variance of the
  | population effect.
  | 
  | The population mean and variance are usually presented in
  | funnel plots, where you can see the expected distribution of
  | effect sizes/point estimates given a sample size/standard
  | error.
  | 
  | Researchers have also started to plot actual point estimates
  | from published papers on top of the funnel, showing that the
  | published results fill it asymmetrically (small, null results
  | are largely absent), which is usually cited as evidence of
  | publication bias. In other words, the missing studies end up in
  | researchers' file drawers instead of being published somewhere.
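  | 
  | A minimal sketch of this special case, the classic
  | DerSimonian-Laird random-effects estimator (illustrative only;
  | real packages add confidence intervals and diagnostics):
  | 
  |     def dersimonian_laird(y, v):
  |         # y: per-study effect estimates; v: their known L1
  |         # variances (squared standard errors).
  |         w = [1 / vi for vi in v]
  |         mu_fe = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
  |         q = sum(wi * (yi - mu_fe) ** 2 for wi, yi in zip(w, y))
  |         c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
  |         tau2 = max(0.0, (q - (len(y) - 1)) / c)  # between-study var
  |         w_re = [1 / (vi + tau2) for vi in v]
  |         mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
  |         return mu, 1 / sum(w_re), tau2  # mean, its variance, tau^2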
 
| bobbyd2323 wrote:
| Bootstrap
 
| datastoat wrote:
| Validation on holdout sets.
| 
| When I was a student in the 1990s, I was taught about hypothesis
| testing (and all the hassle of p-fishing etc.), and about
| Bayesian inference (which is lovely, until you have to invent
| priors over the model space -- e.g. a prior over neural network
| architectures). These are both systems that tie themselves in
| epistemological knots when trying to answer the simple question
| "What model shall I use?"
| 
| Holdout set validation is such a clean simple idea, and so easy
| to use (as long as you have big data), and it does away with all
| the frequentist and Bayesian tangle, which is why it's so
| widespread in ML nowadays.
| 
| It also aligns statistical inference with Popper's idea of
| scientific falsifiability -- scientists test their models against
| new experimental data; data scientists can test their models
| against qualitatively different holdout sets. (Just make sure you
| don't get your holdout set by shuffling, since that's not what
| Popper would call a "genuine risky validation".)
| 
| The article mentions Breiman's "alternative view of the
| foundations of statistics based on prediction rather than
| modeling". That's not general enough, since it doesn't
| accommodate generative modelling (e.g. GPT, GANs). I think it's
| better to frame ML in terms of "evaluating model fit on a holdout
| set", since that accommodates both predictive and generative
| modelling.
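| 
| A minimal sketch of the idea (the shuffle split shown is the lazy
| default; per the caveat above, a temporally or qualitatively
| separate holdout is the riskier, more Popperian test):
| 
|     import random
| 
|     def holdout_split(data, frac=0.2, seed=0):
|         # Reserve a fraction of the data, never touched during
|         # training, purely for judging the fitted model.
|         rng = random.Random(seed)
|         idx = list(range(len(data)))
|         rng.shuffle(idx)
|         cut = int(len(data) * frac)
|         train = [data[i] for i in idx[cut:]]
|         holdout = [data[i] for i in idx[:cut]]
|         return train, holdout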
 
  | anxrn wrote:
  | Very much agree with the simplicity and power of separation of
  | training, validation and test sets. Is this really a 'big data'
  | era notion though? This was fairly standard in 90s era language
  | and speech work.
 
| pandoro wrote:
| Solomonoff Induction. Although proven to be uncomputable (there
| are people working on formalizing efficient approximations), it
| is such a mind-blowing idea. It brings together Occam's razor,
| Epicurus' Principle of multiple explanations, Bayes' theorem,
| Algorithmic Information Theory and Universal Turing machines in a
| theory of universal induction. The mathematical proof and details
| are way above my head but I cannot help but feel like it is very
| underrated.
 
  | spekcular wrote:
  | Statistics is an applied science, and Solomonoff induction has
  | had zero practical impact. So I feel it's not underrated at
  | all, and perhaps overrated among a certain crowd.
 
| ThouYS wrote:
| Bootstrap resampling is such a black magic thing
 
  | graycat wrote:
  | There is a nice treatment of resampling, i.e., _permutation_
  | tests, in (from my TeX format bibliography)
  | 
  | Sidney Siegel, {\it Nonparametric Statistics for the Behavioral
  | Sciences,\/} McGraw-Hill, New York, 1956.
  | 
  | Right, there was already a good book on such tests over 50
  | years ago.
  | 
  | Can also justify it with an independent, identically
  | distributed assumption. But a weaker assumption of
  | _exchangeability_ can also work -- I published a paper with
  | that.
  | 
  | The broad idea of such a statistical hypothesis test is to
  | decide on the _null_ hypothesis, _null_ as in no effect (if
  | looking for an effect, then you want to reject the null
  | hypothesis of no effect), and to make assumptions that permit
  | calculating the probability of what you observe. If that
  | probability is way too small, then reject the null hypothesis
  | and conclude that there was an effect. Right, it's fishy.
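  | 
  | A minimal sketch of such a test, assuming exchangeability under
  | the null (two-sample difference in means):
  | 
  |     import random
  | 
  |     def permutation_test(xs, ys, n_perm=10000, seed=0):
  |         # If the null of "no effect" holds, the group labels are
  |         # arbitrary, so reshuffle them and see how often the
  |         # shuffled gap beats the observed one.
  |         rng = random.Random(seed)
  |         obs = sum(xs) / len(xs) - sum(ys) / len(ys)
  |         pooled = list(xs) + list(ys)
  |         hits = 0
  |         for _ in range(n_perm):
  |             rng.shuffle(pooled)
  |             px, py = pooled[:len(xs)], pooled[len(xs):]
  |             gap = sum(px) / len(px) - sum(py) / len(py)
  |             if abs(gap) >= abs(obs):
  |                 hits += 1
  |         return hits / n_perm  # p-value: small => reject the null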
 
    | jll29 wrote:
    | The 2nd edition is never far from my desk:
    | 
    | Siegel, S., & Castellan, N. J. (1988). Nonparametric
    | statistics for the behavioral sciences (2nd ed.) New York:
    | McGraw-Hill.
 
  | CrazyStat wrote:
  | One way to approach the bootstrap is as sampling from the
  | posterior mean of a Dirichlet Process model with a
  | noninformative prior (alpha=0).
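  | 
  | A minimal sketch of the closely related Bayesian bootstrap
  | (Rubin, 1981): draw Dirichlet(1,...,1) weights over the
  | observations instead of resampling them.
  | 
  |     import random
  | 
  |     def bayesian_bootstrap_mean(xs, n_draws=1000, seed=0):
  |         rng = random.Random(seed)
  |         draws = []
  |         for _ in range(n_draws):
  |             # Normalized Exp(1) variates are Dirichlet(1,...,1).
  |             g = [rng.expovariate(1.0) for _ in xs]
  |             s = sum(g)
  |             draws.append(sum(gi / s * xi for gi, xi in zip(g, xs)))
  |         return draws  # posterior draws of the mean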
 
  | derbOac wrote:
  | It's just Monte Carlo simulation using the observed
  | distribution as the population distribution.
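  | 
  | In its simplest (percentile) form, a sketch:
  | 
  |     import random
  | 
  |     def bootstrap_ci(xs, stat, n_boot=10000, alpha=0.05, seed=0):
  |         # Resample with replacement -- i.e. Monte Carlo from the
  |         # empirical distribution -- and read off percentiles.
  |         rng = random.Random(seed)
  |         stats = sorted(
  |             stat([rng.choice(xs) for _ in xs])
  |             for _ in range(n_boot)
  |         )
  |         return (stats[int(n_boot * alpha / 2)],
  |                 stats[int(n_boot * (1 - alpha / 2))])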
 
    | grayclhn wrote:
    | 1) It's not -- there are lots of procedures called "the
    | bootstrap" that act differently.
    | 
    | 2) The fact that "substitute the data for the population
    | distribution" both works and is sometimes provably better
    | than other more sensible approaches is a little mind blowing.
    | 
    | Most things called the bootstrap feel like cheating, i.e.,
    | "this part seems hard, let's do the easiest thing possible
    | instead and hope it works."
 
    | civilized wrote:
    | It's not the mere description of the procedure that people
    | find mysterious.
 
    | btown wrote:
    | This has big "A monad is just a monoid in the category of
    | endofunctors" energy
 
| vanattab wrote:
| The most important to me is "There are three kinds of lies in
| this world. Lies, damn lies, and statistics."
| 
| Not attacking the mathematical field of statistics, just
| pointing out that lots of people abuse statistics in an attempt
| to get people to behave as they would prefer.
 
| jll29 wrote:
| Off-the-cuff, i.e. without digging deeply into a set of history
| of statistics books:
| 
| Tied 1st place:
| 
| * Markov chain Monte Carlo (MCMC) and the Metropolis-Hastings
| algorithm (a minimal sketch follows at the end of this comment)
| 
| * Hidden Markov Models and the Viterbi algorithm for most
| probable sequence in linear time
| 
| * Vapnik-Chervonenkis theory of statistical learning (Vladimir
| Naumovich Vapnik & Alexey Chervonenkis) and SVMs
| 
| 4th place:
| 
| * Edwin Jaynes: maximum entropy for constructing priors
| (borderline: 1957)
| 
| Honorable mentions:
| 
| * Breiman et al.'s CART (Classification and Regression Trees)
| algorithm (and Quinlan's C5.0 extension)
| 
| * Box-Jenkins method (autoregressive moving average (ARMA) /
| autoregressive integrated moving average (ARIMA) to find the best
| fit of a time-series model to past values of a time series)
| 
| (The beginning of the 20th century was much more fertile in
| comparison - Kolmogorov, Fisher, Gosset, Aitken, Cox, de Finetti,
| Kullback, the Pearsons, Spearman etc.)
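| 
| A minimal sketch of the first of these, random-walk Metropolis
| (symmetric proposal, so the Hastings correction cancels):
| 
|     import math, random
| 
|     def metropolis(log_p, x0, step=1.0, n=10000, seed=0):
|         # Sample a density known only up to a constant: propose
|         # x' ~ N(x, step^2), accept with prob min(1, p(x')/p(x)).
|         rng = random.Random(seed)
|         x, samples = x0, []
|         for _ in range(n):
|             x_new = x + rng.gauss(0.0, step)
|             if math.log(rng.random()) < log_p(x_new) - log_p(x):
|                 x = x_new
|             samples.append(x)
|         return samples
| 
|     # e.g. a standard normal from its unnormalized log-density:
|     # draws = metropolis(lambda x: -0.5 * x * x, x0=0.0)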
 
  | dlg wrote:
  | I generally agree with you. However, as a pedantic note,
  | Metropolis, Rosenbluth, Rosenbluth, Teller and Teller was in
  | 1953 and Hastings was 1970.
 
| uoaei wrote:
| IMO, kernel-based computational methods are by far the most
| important _overlooked_ advances in the statistical sciences.
| 
| Kernel methods are linear methods on data projected into very-
| high-dimensional spaces, and you get basically all the benefits
| of linear methods (convexity, access to analytical
| techniques/manipulations, etc.) while being much more
| computationally tractable and data-efficient than a naive
| approach. Maximum mean discrepancy (MMD) is a particularly shiny
| result from the last few years.
| 
| The tradeoff is that you must use an adequate kernel for whatever
| procedure you intend, and these can sometimes have sneaky
| pitfalls. A crass example would be the relative failure of tSNE
| and similar kernel-based visualization tools: in the case of tSNE
| the Cauchy kernel's tails are extremely fat, which ends up
| degrading the representation of intra- vs inter-cluster
| distances.
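| 
| A toy 1-D sketch of MMD with an RBF kernel (the bandwidth choice
| is exactly the kind of pitfall mentioned above):
| 
|     import math
| 
|     def mmd2(xs, ys, bw=1.0):
|         # Biased estimator of squared MMD: mean similarity within
|         # X, plus within Y, minus twice the cross-sample term.
|         k = lambda a, b: math.exp(-((a - b) ** 2) / (2 * bw ** 2))
|         avg = lambda us, vs: (sum(k(u, v) for u in us for v in vs)
|                               / (len(us) * len(vs)))
|         return avg(xs, xs) + avg(ys, ys) - 2 * avg(xs, ys)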
 
  | teshier-A wrote:
  | My experience with MMD is that unless you've been using it and
  | are familiar with other kernel methods, you probably won't know
  | what to do with it: what kernel do I use? How can I test for
  | significance (in any sense of the word)? Add the (last I
  | checked) horrendous computational complexity, and to me it
  | looks like a less usable mutual information (or KL divergence)
  | without all the nice information theory around it.
 
| enriquto wrote:
| My favourite is Mandelbrot's heuristic converse of the central
| limit theorem: the _only_ variables that are normal are those
| that are sums of many independent variables of finite variance.
 
| andi999 wrote:
| Identifying p-value hacking.
 
  | hackernewds wrote:
  | what does that mean?
 
    | pacbard wrote:
    | p-hacking is a research "dark pattern" where a researcher
    | fits several similar models and reports only the one that has
    | the significant p-value for the relationship of interest.
    | 
    | This strategy is possible because p-values are themselves
    | stochastic: at the usual 0.05 threshold, a researcher will
    | find one significant p-value for every 20 null models they
    | run, on average.
    | 
    | p-hacking can also refer to nudging a p-value past the
    | significance cut-off (usually 0.05) by modifying the
    | statistical model slightly until the desired result is
    | achieved. This process usually involves the inclusion of
    | control variables that are not really related to the outcome
    | but that will change the standard errors/p-values.
    | 
    | Another way to p-hack is to drop specific observations until
    | the desired p-value is reached. This process usually involves
    | removing participants from a sample for a seemingly
    | legitimate reason until the desired p-value is achieved.
    | Usually identifying and eliminating a few high leverage
    | observations is enough to change the significance level of a
    | point estimate.
    | 
    | Multiple strategies to address p-hacking have been proposed
    | and discussed. One of the most popular ones is pre-
    | registration of research designs and models. The idea here is
    | that a researcher would publish their research design and
    | models before conducting the experiment and they will report
    | only the results from the pre-registered models. This process
    | eliminates the "fishing expedition" nature of p-hacking.
    | 
    | Other strategies involve better research designs that are not
    | sensitive to model respecification. These are usually
    | experimental and quasi-experimental methods that leverage an
    | external source of variation (external to both the researcher
    | and the studied system, like random assignment to conditions)
    | to isolate the relationship between two variables.
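    | 
    | A toy simulation of the first strategy (every null is true,
    | yet the "best" of 20 tests usually looks significant):
    | 
    |     import math, random, statistics
    | 
    |     def p_hack_demo(n_tests=20, n=30, seed=1):
    |         rng = random.Random(seed)
    |         # Standard normal CDF, for a two-sided z-test.
    |         phi = lambda z: 0.5 * (1 + math.erf(z / math.sqrt(2)))
    |         pvals = []
    |         for _ in range(n_tests):
    |             xs = [rng.gauss(0, 1) for _ in range(n)]  # pure noise
    |             se = statistics.stdev(xs) / math.sqrt(n)
    |             z = statistics.mean(xs) / se
    |             pvals.append(2 * (1 - phi(abs(z))))
    |         return min(pvals)  # the one result that gets reported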
 
      | pthread_t wrote:
      | I saw this firsthand as an undergrad research assistant in
      | a neuroscience lab. How did it go when I brought it up?
      | Swept under the rug and published in a high-impact journal.
 
    | ldiracdelta wrote:
    | I believe it is referencing "The Replication Crisis"
    | https://en.wikipedia.org/wiki/Replication_crisis
 
| oneoff786 wrote:
| SHAP values and other methods to parse out the inner workings of
| "black box" machine learning models. They're good enough that
| I've grown fond of just throwing a LightGBM model at everything
| and calling it a day, for that sweet spot of predictive power and
| ease of implementation.
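| 
| Roughly that workflow, as a sketch (assumes the lightgbm, shap
| and scikit-learn packages; check the shap docs for current API):
| 
|     import lightgbm as lgb
|     import shap
|     from sklearn.datasets import make_regression
| 
|     # Synthetic data standing in for your tabular problem.
|     X, y = make_regression(n_samples=500, n_features=8,
|                            random_state=0)
| 
|     model = lgb.LGBMRegressor().fit(X, y)
|     explainer = shap.TreeExplainer(model)   # fast path for trees
|     shap_values = explainer.shap_values(X)  # per-feature attributions
|     shap.summary_plot(shap_values, X)       # global overview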
 
  | hackernewds wrote:
  | Would you be so kind as to share an example or resources to
  | learn about this?
 
    | screye wrote:
    | [1] Scott Lundberg is the leading authority on all things
    | SHAP (he wrote the seminal paper on it).
    | 
    | [2] Christoph Molnar's interpretable ML book has chapters on
    | Shapley values and SHAP, if you'd prefer text instead of
    | video.
    | 
    | [1] https://www.youtube.com/watch?v=B-c8tIgchu0
    | 
    | [2] https://christophm.github.io/interpretable-ml-
    | book/shapley.h...
 
    | magneticnorth wrote:
    | Seconding Christoph Molnar's excellent writeup. I also find the
    | readme & example notebooks in Scott Lundberg's github repo to
    | be a great way to get started. There are also references
    | there for the original papers, which are surprisingly
    | readable, imo. https://github.com/slundberg/shap
 
  | teruakohatu wrote:
  | > I've grown fond of just throwing a LightGBM model at
  | everything and calling it a day
  | 
  | It is not always a good idea to do that. Always try different
  | methods; there is no ultimate method. At the very least, OLS
  | should be tried, along with other fully explainable methods,
  | even a simple CART-like method.
 
    | csee wrote:
    | OLS is my default go-to. It so often outperforms a random
    | forest in small-data, real-world applications on
    | nonstationary data, and model explainability is built in. If
    | I'm working in a domain with stationary data then I'd tilt
    | more toward the forest (due to not having to engineer
    | features, and the inbuilt ability to detect non-linear
    | relationships and interactions between features).
 
  | hervature wrote:
  | Strongly disagree. Shapley values and LIME give a very crude
  | and extremely limited understanding of the model. They
  | basically amount to a random selection of local slopes. For
  | instance, if I tell you the (randomly selected) slopes of a
    | function are [0.5, 0.2, 12.4, 1.1, 2.6], whose average is
    | 3.3, can you guess anything? You might notice it is monotonic
    | (maybe), but you certainly won't guess it is e^x.
 
    | LeanderK wrote:
    | I would say that our ML models are not yet predictable enough
    | in local neighbourhoods to really trust LIME. Adversarial
    | examples prove that you just can't select a small enough
    | range, since you can always find them even at super tiny
    | distances.
 
___________________________________________________________________
(page generated 2022-02-21 23:00 UTC)