
Kolmogorov Paranoia: Extraordinary Evidence Probably Isn't.

anon_toli said in #3282 3w ago:

I enjoyed this takedown of Scott Alexander's support for the COVID natural-origins theory. Basically, Scott did a big "Bayesian" analysis of the evidence for and against the idea that COVID originated in a lab vs. naturally. As per his usual pre-written conclusion in support of liberal hegemony, he concluded that it probably wasn't a lab leak. The problem is that his argument hinged on one extraordinarily large (in a technical sense) piece of evidence: a huge "Bayes factor" (likelihood ratio) built by treating a lot of early case-location data points as independent evidence. Along comes this guy Michael Weissman to point out that every other piece of evidence went the other way, and to explain why you can't just assume extraordinary evidence is what you think it is:

https://michaelweissman.substack.com/p/open-letter-to-scott-alexander

The basic concept is that when you run into apparently strong evidence of extraordinarily high power, especially evidence built by adding up many supposedly independent facts of the same type from the same source, you can't just take the framing at face value anymore. That much evidential power demands that you dredge up every possible alternative explanation from an increasingly large universe of possibilities, and as the apparent power of the evidence grows, any particular explanation for it becomes increasingly questionable and vanishingly unlikely a priori. As such, you actually have to strongly discount apparently large sample sizes as almost certainly non-independent.

A concrete example: someone comes to you and says that 99/100 experts surveyed agree on some fact (global warming, covid origins, etc). On the surface this is presented as a 10x larger sample size, and thus extraordinarily stronger evidence, than 9/10 experts agreeing on the same thing. If you naively accept the sample size, you are compelled to let that immensely strong evidence overwhelm all other common sense and convince you of the fact. After all, how probable is it that they are all in on the same conspiracy?

Fairly high, actually. Common sense will tell you that they probably all read the same papers, exist in the same social milieu, have similar cultural biases, got their opinions from their friends at the lab, etc. These are *not* independent samples. Intuitively you should count them as something more like 3-5 independent samples.

I was doing a statistics project the other day and wanted a technical operationalization of this kind of skepticism, and a somewhat more rigorous foundation for it, to make the aggregation of many different forms of evidence robust to this kind of hidden non-independence. The heuristic I came up with is that the effective sample size is N' = G*log(1+N/G), where G is the "gullibility factor" representing how large a sample size you will take at face value before starting to strongly question independence. For the experts example above, my gullibility factor is about 3 experts. Point being that you need to explicitly reason about and justify any nontrivially large G-factor with extraordinary arguments, and use N' instead of N for any sample size calculations (significance tests hardest hit, etc).
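A minimal sketch of the heuristic in Python (the log base isn't pinned down above, so natural log is an assumption here, and G=3 is just the gullibility factor from the experts example):

```
import math

def effective_sample_size(n, g):
    # Discounted sample size N' = G*log(1 + N/G).
    # n: apparent (claimed) number of independent samples
    # g: gullibility factor -- how many samples you take at face value
    #    before suspecting hidden non-independence
    return g * math.log(1 + n / g)

# The experts example: 99/100 experts agree, gullibility factor ~3.
print(effective_sample_size(100, 3))  # ~10.6 effective samples, not 100
```

Plug N' rather than N into whatever significance test or standard-error formula you were going to use; the data values themselves aren't touched, only the claimed sample size is discounted.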

The reason for the logarithmic functional form is that it tracks the Kolmogorov complexity (description length, and thus prior improbability via Occam's razor) of hypotheses that produce that amount of evidence non-independently. "This much data with this bias" should in the limit be weighed in proportion to *that* description length rather than to the raw apparent entropy of the generated data. This "Kolmogorov paranoia" is a way to explicitly allow for that growing space of possible non-independent explanations of the data without having to explicitly argue for any particular likely alternative hypothesis (given the size of that space, a priori there aren't any!) or explicitly model the non-independence.
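Numerically the saturation is easy to see (same assumptions as the sketch above: G=3, natural log):

```
import math

for n in (10, 100, 1_000, 10_000, 1_000_000):
    n_eff = 3 * math.log(1 + n / 3)
    print(n, round(n_eff, 1))
# Each 10x jump in apparent sample size adds only a roughly constant
# amount of effective evidence, i.e. N' grows like log(N).
```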

Kolmogorov paranoia rigorously nukes the various bad arguments that lean on particular pieces of high-powered evidence while going against common sense.


anon_regu said in #3330 2w ago:

Can you just do this with standard Bayes? For example, I expect a random coin I pick up to be fair with 99.999% probability, but if I flip twenty heads in a row I'm going to be pretty sure something's wrong with my estimates of the coin's parameters or with the independence assumption.
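That intuition checks out in a two-hypothesis toy model (the numbers are from the post; the assumption that a "broken" coin or setup produces heads almost surely is mine):

```
# Fair coin vs. "something's wrong" (double-headed coin, rigged flips,
# flips that aren't really independent, etc).
p_wrong_prior = 1e-5           # 99.999% sure the coin/setup is fine
p_data_if_fair = 0.5 ** 20     # twenty heads under the fair model
p_data_if_wrong = 1.0          # assumption: a broken setup gives heads almost surely

posterior_wrong = (p_wrong_prior * p_data_if_wrong) / (
    p_wrong_prior * p_data_if_wrong + (1 - p_wrong_prior) * p_data_if_fair
)
print(posterior_wrong)  # ~0.91: twenty heads overwhelm the tiny prior
```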

Similarly, with 99/100 experts agreeing, my probability that they are wrong is dominated by the possibility that they are systematically biased or correlated. The reason you virtually never get Bayes factors of size 10^10 for a nontrivial problem is that the probability your model is wrong is much greater. And Bayesian probability gives us a much richer language for talking about ways to integrate and update on correlated evidence.
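A toy mixture makes the cap concrete (the 1% model-is-wrong figure and the simplifying assumption that a broken model makes the data about equally likely under either hypothesis are mine):

```
eps = 0.01            # probability the likelihood model itself is broken
apparent_bf = 1e10    # Bayes factor you'd report if you trusted the model

# Likelihoods normalized so the favoured hypothesis under the good model = 1,
# and a broken model makes the data equally likely either way.
p_data_given_h = (1 - eps) * 1.0 + eps * 1.0
p_data_given_not_h = (1 - eps) * (1.0 / apparent_bf) + eps * 1.0

print(p_data_given_h / p_data_given_not_h)  # ~100, i.e. roughly 1/eps, not 1e10
```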

I like 'Kolmogorov paranoia' as a heuristic, though. I wonder if you can derive a more rigorous version of it by thinking about the adversarial case. Suppose there is an adversary actively trying to persuade you (as there obviously is in the COVID case), and they have some finite amount M of ability to manufacture evidence. What statistical decision procedure is maximally truth-tracking while being robust to an M-adversary? We might end up with a formula like yours with M = 1/G, i.e. claiming a large G-factor is equivalent to claiming that there are no powerful adversaries trying to manipulate the evidence.

referenced by: >>3331


anon_toli said in #3331 2w ago:

>>3330
To be clear, this scheme is actually Bayesian. Yes, you can do this with standard Bayes, and the Bayesian update from the experts or whatever is dominated by the non-independence hypotheses. The point is to actually do it, and that you can heuristically estimate and bound the space of non-independence hypotheses without having to actually model the details.

In theory you can just Bayes everything. In practice you can't, because Bayesian calculations are explosively intractable in general, so you fall back on heuristics and social epistemology. Bayes becomes a formula for legitimate argumentation, and then clever manipulators say stuff like “well you have to update on <my bullshit> unless you can come up with a specific alternative that’s a more likely explanation.” The point of OP is: no you don't. There is a general (Bayesian) argument against singular sets of supposedly strong evidence, from the fact that the space of alternative explanations explodes exponentially in size with the supposed strength of the evidence.

I'd love to find a rigorous foundation for the heuristic, or figure out how real evidence power grows with apparent evidence complexity. I'm not even sure whether this is an adversarial phenomenon, though that would be one case of it for sure.

