|
| ModernMech wrote:
| Kalman published his filter in 1960... a little over 50 years ago,
| but I'd say it's worth mentioning given its huge impact. The idea
| that we can use multiple noisy sensors to get a more accurate
| reading than any one of them could provide enables all kinds of
| autonomous systems. Basically everything that uses sensors (which
| is essentially every device these days) is better due to this
| fact.
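The fusion idea described above can be sketched in a few lines. This is a hypothetical, scalar special case (estimating a constant from noisy readings, no dynamics), not the full predict/update filter:

```python
# Minimal 1-D Kalman update: fuse noisy measurements of a constant value.
def kalman_1d(measurements, meas_var, init_est=0.0, init_var=1e6):
    est, var = init_est, init_var
    for z in measurements:
        # Blend current estimate and new measurement by their uncertainties.
        k = var / (var + meas_var)      # Kalman gain
        est = est + k * (z - est)
        var = (1 - k) * var             # uncertainty shrinks with each reading
    return est, var

# Five noisy readings of a quantity whose true value is about 5.0.
est, var = kalman_1d([4.9, 5.2, 5.0, 4.8, 5.1], meas_var=0.04)
```

Note how the combined estimate ends up less uncertain than any single sensor reading, which is exactly the property the parent comment highlights.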
| savant_penguin wrote:
| And it uses almost all of the tricks in the book in a single
| analytic model
|
| Statistics+optimization+dynamic systems+linear algebra
| oxff wrote:
| "Throw more compute at it"
| Q6T46nT668w6i3m wrote:
| The authors agree. It's mentioned a handful of times.
| westcort wrote:
| Not in the past 50 years, but more like the past 80 years,
| nonparametric statistics in general are pretty amazing, though
| underused. Look at the Mann-Whitney U test and the Wilcoxon Rank
| Sum. Those tests require very few assumptions.
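To see how few assumptions the Mann-Whitney U test needs, here is a minimal pure-Python sketch using the pairwise-comparison definition of U and the usual large-sample normal approximation (illustrative only; no tie correction, and real use should go through a vetted library):

```python
import math

def mann_whitney_u(x, y):
    """U counts, over all (x, y) pairs, how often an x exceeds a y
    (ties count 1/2). Only ordinal comparisons are used -- there is
    no normality assumption anywhere."""
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0
            for a in x for b in y)
    n1, n2 = len(x), len(y)
    # Null distribution of U: mean n1*n2/2, with the standard
    # large-sample variance formula (ties ignored).
    mean = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return u, (u - mean) / sd

# Two made-up samples with clearly different locations.
u, z = mann_whitney_u([5.1, 6.2, 7.3, 6.8], [1.2, 2.4, 3.1, 2.8])
```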
| dannykwells wrote:
| These are the standard tests in most of biology at this time.
| Not underused at all. Very powerful and lovely to not have to
| assume normality.
| bell-cot wrote:
| The most important statistical idea of the past 50 years is the
| same as the most important statistical idea of the 50 years
| before that:
|
| "Due to reduced superstition, better education, and general
| awareness & progress, humans who are neither meticulous
| statistics experts, nor working in very constrained & repetitive
| circumstances, will understand and apply statistics more
| objectively and correctly than they generally have in the past."
|
| Sadly, this idea is still wrong.
| deepsquirrelnet wrote:
| Good review article. I always enjoy browsing the references, and
| found "Computer Age Statistical Inference" among them. Looks like
| a good read, with a pdf available online.
| mjb wrote:
| It's a great book. Short and to-the-point. Highly recommended.
| aabajian wrote:
| Didn't read the document, but hopefully it mentions PageRank, the
| prime example of using probabilistic graphical models to rank
| nodes in a directed graph. More info:
| https://www.amazon.com/Probabilistic-Graphical-Models-Princi...
|
| I've heard that Google and Baidu essentially started at the same
| time, with the same algorithm discovery (PageRank). Maybe someone
| can comment on whether there was idea sharing or whether both
| teams derived it independently.
| nabla9 wrote:
| PageRank is just an application of eigenvalues to ranking.
|
| The idea first came up in the '70s.
| https://www.sciencedirect.com/science/article/abs/pii/030645...
| and several times afterward before PageRank was developed.
| mianos wrote:
| The sort of methods 'PageRank' uses already existed. It reminds
| me of Apple 'inventing' (air quotes) the mp3 player. It didn't;
| it took existing technology, refined it, and popularized it.
| They did not invent it, but maybe 'inventing' something is only
| a very small part of making something useful for many people.
| bjourne wrote:
| PageRank actually had a predecessor called HITS (according to
| some sources HITS was developed before PageRank; according to
| others they were contemporaries), an algorithm developed by Jon
| Kleinberg for ranking hypertext documents.
| https://en.wikipedia.org/wiki/HITS_algorithm However, Kleinberg
| stayed in academia and never attempted to commercialize his
| research like Page and Brin did. HITS was more complex than
| PageRank and context-sensitive so queries required much more
| computing resources than PageRank. PageRank is _kind of_ what
| you get if you take HITS and remove the slow parts.
|
| What I find very interesting about PageRank is how you can
| trade accuracy for performance. The traditional way of
| calculating PageRank by means of squaring a matrix iteratively
| until it reaches convergence gives you correct results but is
| sloooooow. For a modestly sized graph it could take days. But
| if accuracy isn't that important you can use Monte Carlo
| simulation and get most of the PageRank correct in a fraction
| of the time of the iterative method. It's also easy to
| parallelize.
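For reference, the iterative (power-method) calculation the parent describes looks roughly like this on a toy graph. The damping factor 0.85 and the three-node link dict are illustrative:

```python
def pagerank(links, d=0.85, tol=1e-10):
    """Power-iteration PageRank on a dict mapping node -> outgoing links."""
    nodes = list(links)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    while True:
        # Teleportation term: every node gets (1-d)/n regardless of links.
        new = {u: (1 - d) / n for u in nodes}
        for u, outs in links.items():
            if outs:
                share = d * rank[u] / len(outs)
                for v in outs:
                    new[v] += share
            else:
                # Dangling node: spread its rank uniformly over all nodes.
                for v in nodes:
                    new[v] += d * rank[u] / n
        if max(abs(new[u] - rank[u]) for u in nodes) < tol:
            return new
        rank = new

pr = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
```

The Monte Carlo alternative mentioned above would instead simulate many short random walks and count visit frequencies, trading exactness for speed and easy parallelism.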
| jll29 wrote:
| Page's PageRank patent references HITS:
|
| Jon M. Kleinberg, "Authoritative sources in a hyperlinked
| environment," 1998, Proc. of the 9th Annual ACM-SIAM
| Symposium on Discrete Algorithms, pp. 668-677.
| mach1ne wrote:
| Didn't Larry Page and Sergey Brin openly publicize the PageRank
| algorithm? It'd seem more likely that Baidu just copypasted the
| idea.
| oneoff786 wrote:
| The basic concept behind PageRank is pretty obvious. If you
| stare at a graph for a while and try to imagine centrality
| calculations, it'll probably be the big idea you land on.
|
| Implementing it and catching edge cases isn't trivial
| screye wrote:
| Given that PageRank was literally invented by and named after
| Larry Page, I would think that Google had a head start.
|
| That being said, Page Rank is a more a stellar example of
| adapting an academic idea into practice, than a statistical
| idea in and of itself.
|
| After all, it is 'merely' the stationary distribution of a
| random walk over a directed graph. I say 'merely' with a lot
| of respect, because the best ideas often feel simple in
| hindsight. But, it is that simplicity that makes them even more
| impressive.
| andi999 wrote:
| I heard that Google's first approach was using/adapting a
| published algorithm which was used to rank scientific
| publications from the network of citations. Not sure if this is
| the algorithm you mentioned though.
| divbzero wrote:
| The ranking of scientific publications based on citations
| you're describing is impact factor [1]. I haven't heard that
| as an inspiration for Larry Page's PageRank [2] but that is
| plausible.
|
| [1]: https://en.wikipedia.org/wiki/Impact_factor
|
| [2]: https://en.wikipedia.org/wiki/PageRank
| ppsreejith wrote:
| From the wikipedia page of Robin Li, co-founder of Baidu:
| https://en.wikipedia.org/wiki/Robin_Li#RankDex
|
| > In 1996, while at IDD, Li created the Rankdex site-scoring
| algorithm for search engine page ranking, which was awarded a
| U.S. patent. It was the first search engine that used
| hyperlinks to measure the quality of websites it was indexing,
| predating the very similar algorithm patent filed by Google two
| years later in 1998.
| oraoraoraoraora wrote:
| Statistical process control has been significant over the last
| 100 years.
| avs733 wrote:
| I would agree with you but they are speaking to a different
| audience. This is in a journal for statistics researchers and
| theorists. These would all be things that would inform the
| creation of pragmatic tools like SPC.
| csee wrote:
| Meta-analysis techniques like funnel plots.
| pacbard wrote:
| Meta-analysis is an application of idea #4 (Bayesian Multilevel
| Models) in the article.
|
| What makes meta-analysis special within a multilevel framework
| is that you know the level 1 variance. This creates a special
| case of a generalized multilevel model where you leverage your
| knowledge of L1 mean and variance (from each individual study's
| results) to estimate the possible mean and variance of the
| population effect.
|
| The population mean and variance is usually presented in funnel
| plots where you can see the expected distribution of effect
| sizes/point estimates given a sample size/standard error.
|
| Researchers have also started to plot actual point estimates
| from published papers on such plots; when the points fall
| asymmetrically (e.g., small studies with null results missing
| from one side of the funnel), that is usually cited as evidence
| of publication bias. In other words, the missing studies end up
| in researchers' file drawers instead of being published
| somewhere.
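The simplest special case of what this comment describes, a fixed-effect meta-analysis where each study's standard error is taken as known, is just an inverse-variance weighted average (the study numbers below are made up; random-effects models add a between-study variance component on top of this):

```python
# Fixed-effect meta-analysis: pool study estimates by inverse variance.
def pooled_effect(estimates, ses):
    weights = [1.0 / se ** 2 for se in ses]              # precision weights
    est = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    se = (1.0 / sum(weights)) ** 0.5                     # pooled standard error
    return est, se

# Three hypothetical studies: (effect estimate, standard error).
est, se = pooled_effect([0.30, 0.10, 0.25], [0.05, 0.20, 0.10])
```

The pooled standard error is smaller than any single study's, which is what makes the funnel in a funnel plot narrow as sample sizes grow.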
| bobbyd2323 wrote:
| Bootstrap
| datastoat wrote:
| Validation on holdout sets.
|
| When I was a student in the 1990s, I was taught about hypothesis
| testing (and all the hassle of p-fishing etc.), and about
| Bayesian inference (which is lovely, until you have to invent
| priors over the model space -- e.g. a prior over neural network
| architectures). These are both systems that tie themselves in
| epistemological knots when trying to answer the simple question
| "What model shall I use?"
|
| Holdout set validation is such a clean simple idea, and so easy
| to use (as long as you have big data), and it does away with all
| the frequentist and Bayesian tangle, which is why it's so
| widespread in ML nowadays.
|
| It also aligns statistical inference with Popper's idea of
| scientific falsifiability -- scientists test their models against
| new experimental data, and data scientists can test their model
| against qualitatively different holdout sets. (Just make sure you
| don't get your holdout set by shuffling, since that's not what
| Popper would call a "genuine risky validation".)
|
| The article mentions Breiman's "alternative view of the
| foundations of statistics based on prediction rather than
| modeling". That's not general enough, since it doesn't
| accommodate generative modelling (e.g. GPT, GANs). I think it's
| better to frame ML in terms of "evaluating model fit on a holdout
| set", since that accommodates both predictive and generative
| modelling.
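The baseline version of holdout validation is just a disjoint split. Note the parent's caveat that a random shuffle like this is the weakest form of the idea, since a "genuine risky validation" wants a qualitatively different holdout set (function name and fractions here are illustrative):

```python
import random

def train_val_test_split(data, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle once, then carve off disjoint validation and test sets."""
    rng = random.Random(seed)
    idx = list(range(len(data)))
    rng.shuffle(idx)
    n_test = int(len(data) * test_frac)
    n_val = int(len(data) * val_frac)
    test = [data[i] for i in idx[:n_test]]
    val = [data[i] for i in idx[n_test:n_test + n_val]]
    train = [data[i] for i in idx[n_test + n_val:]]
    return train, val, test

train, val, test = train_val_test_split(list(range(100)))
```

The validation set picks the model; the test set is touched once, at the end, to estimate how the chosen model generalizes.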
| anxrn wrote:
| Very much agree with the simplicity and power of separation of
| training, validation and test sets. Is this really a 'big data'
| era notion though? This was fairly standard in 90s era language
| and speech work.
| pandoro wrote:
| Solomonoff Induction. Although proven to be uncomputable (there
| are people working on formalizing efficient approximations) it is
| such a mind-blowing idea. It brings together Occam's razor,
| Epicurus' Principle of multiple explanations, Bayes' theorem,
| Algorithmic Information Theory and Universal Turing machines in a
| theory of universal induction. The mathematical proof and details
| are way above my head but I cannot help but feel like it is very
| underrated.
| spekcular wrote:
| Statistics is an applied science, and Solomonoff induction has
| had zero practical impact. So I feel it's not underrated at
| all, and perhaps overrated among a certain crowd.
| ThouYS wrote:
| Bootstrap resampling is such a black magic thing
| graycat wrote:
| There is a nice treatment of resampling, i.e., _permutation_
| tests, in (from my TeX format bibliography)
|
| Sidney Siegel, {\it Nonparametric Statistics for the Behavioral
| Sciences,\/} McGraw-Hill, New York, 1956.\ \
|
| Right, there was already a good book on such tests over 50
| years ago.
|
| Can also justify it with an independent, identically
| distributed (i.i.d.) assumption. But a weaker assumption of
| _exchangeability_ can also work -- I published a paper using
| that.
|
| The broad idea of such a statistical hypothesis test is to
| decide on the _null_ hypothesis, _null_ as in no effect (if
| looking for an effect, then want to reject the null hypothesis
| of no effect) and to make assumptions to permit calculating the
| probability of what you observe. If that probability is way too
| small then reject the null hypothesis and conclude that there
| was an effect. Right, it's fishy.
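The test outlined above can be written down directly: under the null, the group labels are exchangeable, so we shuffle them and ask how extreme the observed mean difference is. A Monte Carlo sketch with made-up data:

```python
import random

def permutation_test(x, y, n_perm=10000, seed=0):
    """Two-sample permutation test on the difference of means."""
    rng = random.Random(seed)
    observed = sum(x) / len(x) - sum(y) / len(y)
    pooled = list(x) + list(y)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                 # relabel under the null
        a, b = pooled[:len(x)], pooled[len(x):]
        diff = sum(a) / len(a) - sum(b) / len(b)
        if abs(diff) >= abs(observed):
            count += 1
    return (count + 1) / (n_perm + 1)       # add-one keeps the p-value valid

p = permutation_test([8.1, 7.9, 8.4, 8.0, 8.2], [6.9, 7.1, 7.0, 6.8, 7.2])
```

If the observed difference is rarely matched by relabeled data, the null of "no effect" is rejected; no distributional assumption beyond exchangeability is needed.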
| jll29 wrote:
| The 2nd edition is never far from my desk:
|
| Siegel, S., & Castellan, N. J. (1988). Nonparametric
| statistics for the behavioral sciences (2nd ed.). New York:
| McGraw-Hill.
| CrazyStat wrote:
| One way to approach the bootstrap is as sampling from the
| posterior mean of a Dirichlet Process model with a
| noninformative prior (alpha=0).
| derbOac wrote:
| It's just Monte Carlo simulation using the observed
| distribution as the population distribution.
| grayclhn wrote:
| 1) It's not -- there are lots of procedures called "the
| bootstrap" that act differently.
|
| 2) The fact that "substitute the data for the population
| distribution" both works and is sometimes provably better
| than other more sensible approaches is a little mind blowing.
|
| Most things called the bootstrap feel like cheating, ie "this
| part seems hard, let's do the easiest thing possible instead
| and hope it works."
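The basic "substitute the data for the population" move is the percentile bootstrap: resample with replacement, recompute the statistic, read off quantiles. A minimal sketch (data and statistic are illustrative; fancier variants like BCa correct its known biases):

```python
import random

def bootstrap_ci(data, stat, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for any statistic."""
    rng = random.Random(seed)
    stats = sorted(
        stat([rng.choice(data) for _ in range(len(data))])  # resample
        for _ in range(n_boot)
    )
    lo = stats[int(n_boot * alpha / 2)]
    hi = stats[int(n_boot * (1 - alpha / 2))]
    return lo, hi

mean = lambda xs: sum(xs) / len(xs)
lo, hi = bootstrap_ci([2.1, 2.5, 1.9, 2.4, 2.2, 2.6, 2.0, 2.3], mean)
```

The same three lines of resampling work for medians, correlations, or any statistic whose sampling distribution would be painful to derive analytically, which is a big part of the "feels like cheating" appeal.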
| civilized wrote:
| It's not the mere description of the procedure that people
| find mysterious.
| btown wrote:
| This has big "A monad is just a monoid in the category of
| endofunctors" energy
| vanattab wrote:
| The most important to me is "There are three kinds of lies in
| this world. Lies, damn lies, and statistics."
|
| Not attacking the mathematical field of statistics, just
| pointing out that lots of people abuse statistics in an attempt
| to get people to behave as they would prefer.
| jll29 wrote:
| Off-the-cuff, i.e., without digging deeply into a set of
| history-of-statistics books:
|
| Tied 1st place:
|
| * Markov chain Monte Carlo (MCMC) and the Metropolis-Hastings
| algorithm
|
| * Hidden Markov Models and the Viterbi algorithm for most
| probable sequence in linear time
|
| * Vapnik-Chervonenkis theory of statistical learning (Vladimir
| Naumovich Vapnik & Alexey Chervonenkis) and SVMs
|
| 4th place:
|
| * Edwin Jaynes: maximum entropy for constructing priors
| (borderline: 1957)
|
| Honorable mentions:
|
| * Breiman et al.'s CART (Classification and Regression Trees)
| algorithm (and Quinlan's C5.0 extension)
|
| * Box-Jenkins method (autoregressive moving average (ARMA) /
| autoregressive integrated moving average (ARIMA) to find the best
| fit of a time-series model to past values of a time series)
|
| (The beginning of the 20th century was much more fertile in
| comparison - Kolmogorov, Fisher, Gosset, Aitken, Cox, de Finetti,
| Kullback, the Pearsons, Spearman etc.)
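Of the tied first-place ideas, Metropolis-Hastings is short enough to sketch in full. Here a random-walk sampler targets a standard normal known only up to its normalizing constant (step size, seed, and sample counts are arbitrary choices):

```python
import math
import random

def metropolis(log_density, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis sampler for an unnormalized log density."""
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_samples):
        prop = x + rng.gauss(0, step)
        # Accept with probability min(1, p(prop)/p(x)).
        if rng.random() < math.exp(min(0.0, log_density(prop) - log_density(x))):
            x = prop
        samples.append(x)
    return samples

# Target: standard normal, specified only up to a constant.
draws = metropolis(lambda t: -0.5 * t * t, x0=0.0, n_samples=20000)
```

The normalizing constant cancels in the acceptance ratio, which is the whole trick: you can sample from distributions you can only evaluate pointwise.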
| dlg wrote:
| I generally agree with you. However, as a pedantic note,
| Metropolis, Rosenbluth, Rosenbluth, Teller and Teller was in
| 1953 and Hastings was 1970.
| uoaei wrote:
| IMO, kernel-based computational methods are by far the most
| important _overlooked_ advances in the statistical sciences.
|
| Kernel methods are linear methods on data projected into very-
| high-dimensional spaces, and you get basically all the benefits
| of linear methods (convexity, access to analytical
| techniques/manipulations, etc.) while being much more
| computationally tractable and data-efficient than a naive
| approach. Maximum mean discrepancy (MMD) is a particularly shiny
| result from the last few years.
|
| The tradeoff is that you must use an adequate kernel for whatever
| procedure you intend, and these can sometimes have sneaky
| pitfalls. A crass example would be the relative failure of tSNE
| and similar kernel-based visualization tools: in the case of tSNE
| the Cauchy kernel's tails are extremely fat, which ends up
| degrading the representation of intra- vs inter-cluster
| distances.
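As a concrete (hypothetical, 1-D) illustration of the MMD result mentioned above: the biased V-statistic estimate of squared MMD with an RBF kernel is just a few sums, and it separates samples from different distributions while staying near zero for samples from the same one. The kernel bandwidth and data below are arbitrary:

```python
import math

def rbf(a, b, gamma=1.0):
    """Gaussian (RBF) kernel on scalars."""
    return math.exp(-gamma * (a - b) ** 2)

def mmd2(x, y, gamma=1.0):
    """Biased (V-statistic) estimate of squared MMD for 1-D samples."""
    kxx = sum(rbf(a, b, gamma) for a in x for b in x) / len(x) ** 2
    kyy = sum(rbf(a, b, gamma) for a in y for b in y) / len(y) ** 2
    kxy = sum(rbf(a, b, gamma) for a in x for b in y) / (len(x) * len(y))
    return kxx + kyy - 2 * kxy

same = mmd2([0.1, 0.2, 0.0, -0.1], [0.05, 0.15, -0.05, 0.1])
far  = mmd2([0.1, 0.2, 0.0, -0.1], [5.0, 5.1, 4.9, 5.2])
```

The uoaei comment's caveat applies here too: the result depends entirely on choosing a sensible kernel and bandwidth for the data.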
| teshier-A wrote:
| My experience with MMD is that unless you've been using it and
| are familiar with other kernel methods, you probably won't know
| what to do with it: what kernel do I use? How can I test for
| significance (in any sense of the word)? Add the (last I
| checked) horrendous computational complexity, and to me it looks
| like a less usable mutual information (or KL divergence), without
| all the nice information theory around it.
| enriquto wrote:
| My favourite is Mandelbrot's heuristic converse of the central
| limit theorem: the _only_ variables that are normal are those
| that are sums of many variables of finite variance.
| andi999 wrote:
| Identifying p-value hacking.
| hackernewds wrote:
| what does that mean?
| pacbard wrote:
| p-hacking is a research "dark pattern" where a researcher
| fits several similar models and reports only the one that has
| the significant p-value for the relationship of interest.
|
| This strategy is possible because p-values are themselves
| stochastic and a researcher will find one significant p-value
| for every 20 models that they run (at least on average).
|
| p-hacking could also refer to pushing a p-value close to the
| significant cut-off (usually 0.05) by modifying the
| statistical model slightly until the desired result is
| achieved. This process usually involves the inclusion of
| control variables that are not really related to the outcome
| but that will change the standard errors/p-values.
|
| Another way to p-hack is to drop specific observations until
| the desired p-value is reached. This process usually involves
| removing participants from a sample for a seemingly
| legitimate reason until the desired p-value is achieved.
| Usually identifying and eliminating a few high leverage
| observations is enough to change the significance level of a
| point estimate.
|
| Multiple strategies to address p-hacking have been proposed
| and discussed. One of the most popular ones is pre-
| registration of research designs and models. The idea here is
| that a researcher would publish their research design and
| models before conducting the experiment and they will report
| only the results from the pre-registered models. This process
| eliminates the "fishing expedition" nature of p-hacking.
|
| Other strategies involve better research designs that are not
| sensitive to model respecification. These are usually
| experimental and quasi-experimental methods that leverage an
| external source of variation (external to both the researcher
| and the studied system, like random assignment to conditions)
| to isolate the relationship between two variables.
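The "one in twenty" arithmetic above is easy to simulate: if a researcher runs 20 independent tests under a true null, the chance that at least one comes out "significant" is 1 - 0.95^20, about 64%, not 5%. This is an idealized sketch; in practice models fit to the same data are correlated, which changes the numbers but not the moral:

```python
import random

def any_significant_rate(n_models=20, alpha=0.05, n_reps=2000, seed=0):
    """Monte Carlo estimate of P(at least one 'significant' result)
    when n_models independent tests are run under a true null,
    where each test has false-positive rate alpha."""
    rng = random.Random(seed)
    hits = sum(
        any(rng.random() < alpha for _ in range(n_models))
        for _ in range(n_reps)
    )
    return hits / n_reps

rate = any_significant_rate()
```

Pre-registration works precisely because it pins the researcher to one test before this multiplicity can be exploited.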
| pthread_t wrote:
| I saw this firsthand as an undergrad research assistant in
| a neuroscience lab. How did it go when I brought it up?
| Swept under the rug and published in a high-impact journal.
| ldiracdelta wrote:
| I believe it is referencing "The Replication Crisis"
| https://en.wikipedia.org/wiki/Replication_crisis
| [deleted]
| oneoff786 wrote:
| SHAP values and other methods to parse out the inner workings of
| "black box" machine learning models. They're good enough that
| I've grown fond of just throwing a light gbm model at everything
| and calling it a day for that sweet spot of predictive power and
| ease of implementation.
| hackernewds wrote:
| Would you be so kind as to share an example or resources to
| learn about this?
| screye wrote:
| [1] Scott Lundberg is the leading authority on all things SHAP
| (he wrote the seminal paper on it).
|
| [2] Chris Molnar's interpretable learning book has chapters
| on Shapley values and SHAP. If you'd prefer text instead of
| video.
|
| [1] https://www.youtube.com/watch?v=B-c8tIgchu0
|
| [2] https://christophm.github.io/interpretable-ml-
| book/shapley.h...
| magneticnorth wrote:
| Seconding Chris Molnar's excellent writeup. I also find the
| readme & example notebooks in Scott Lundberg's github repo to
| be a great way to get started. There are also references
| there for the original papers, which are surprisingly
| readable, imo. https://github.com/slundberg/shap
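For intuition about what SHAP approximates: the exact Shapley value of a feature is its marginal contribution averaged over all orderings in which features can be added. That is only feasible for tiny feature sets, but it shows the mechanics. The toy value function below is made up; note how the interaction term gets split evenly between the two features involved:

```python
from itertools import permutations

def shapley_values(features, value):
    """Exact Shapley values: average each feature's marginal
    contribution over every ordering of coalition building."""
    phi = {f: 0.0 for f in features}
    perms = list(permutations(features))
    for order in perms:
        coalition = []
        for f in order:
            before = value(frozenset(coalition))
            coalition.append(f)
            phi[f] += value(frozenset(coalition)) - before
    return {f: v / len(perms) for f, v in phi.items()}

# Toy "model": additive in x1 and x2, plus an x1*x2 interaction; x3 is inert.
def value(coalition):
    v = 0.0
    if "x1" in coalition:
        v += 1.0
    if "x2" in coalition:
        v += 2.0
    if {"x1", "x2"} <= coalition:
        v += 4.0
    return v

phi = shapley_values(["x1", "x2", "x3"], value)
```

The attributions sum to the full model's output (7.0 here), which is the "efficiency" property that makes Shapley values attractive for explaining individual predictions.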
| teruakohatu wrote:
| > I've grown fond of just throwing a light gbm model at
| everything and calling it a day
|
| It is not always a good idea to do that. Always try different
| methods; there is no ultimate method. At the very least, OLS
| should be tried, along with some other fully explainable
| methods, even a simple CART-like method.
| [deleted]
| csee wrote:
| OLS is my default go-to. It outperforms a random forest so
| often in small-data, real-world applications with
| nonstationary data, and model explainability is built into
| it. If I'm working in a domain with stationary data then I'd
| tilt more to the forest (due to not having to engineer
| features, and the inbuilt ability to detect non-linear
| relationships and interactions between features).
| hervature wrote:
| Strongly disagree. Shapley values and LIME give a very crude
| and extremely limited understanding of the model. They
| basically amount to a random selection of local slopes. For
| instance, if I tell you the (randomly selected) slopes of a
| function are [0.5, 0.2, 12.4, 1.1, 2.6], whose average is
| 3.3, can you guess anything? You might notice it is monotonic
| (maybe), but you certainly wouldn't guess it is e^x.
| LeanderK wrote:
| I would say that our ML models are not yet predictable enough
| in the local neighbourhood to really trust LIME. Adversarial
| examples prove that you just can't select a small enough
| range, since you can always find them even at super tiny
| distances.
___________________________________________________________________
(page generated 2022-02-21 23:00 UTC) |