I have a strange relationship with Wired magazine. The design and layout give me a headache, the relentless consumerism that underlies almost everything in the magazine really gets on my nerves and, well, it just irritates me. Having said that, I always pick up the office copy and I love some of their long, story-like articles: La Vida Robot, for example, is pretty much my favourite thing I’ve ever read in a magazine (go and read it now, it’s bloody great). Anyway, after last month’s tabloid-style “exposé” of climate change stuff (the most garish cover in recent memory, headlines screaming FLY MORE TO SAVE THE WORLD! etc., but the copy having very little new to say beyond a few counter-intuitive factoids), it looks like the new issue is going to be even more annoying, if this Chris Anderson piece is anything to go by. Basically, theory is dead, apparently:
We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.
So the idea is that we don’t need models because statistical tools will tell us er… what exactly?
The first example Anderson gives is the mass gene sequencing of ecosystems. But…
Venter can tell you almost nothing about the species he found. He doesn’t know what they look like, how they live, or much of anything else about their morphology. He doesn’t even have their entire genome. All he has is a statistical blip — a unique sequence that, being unlike any other sequence in the database, must represent a new species.
Are we really no longer interested in how these new species live, how they play their role in the ecosystem, what they look like? People who do science are interested in theory; that’s why they do it: to improve our understanding of how the world works, not simply to describe it working. It seems to me that what’s being advocated amounts to a kind of grand shrug of the shoulders. What Anderson doesn’t mention is that Venter is collecting all this data in the hope that one day it will have useful explanatory or predictive power when parsed through a model, not because filling up hard drives is just cool fun.
In fact we’re only able to draw conclusions about this mass gene sequencing because a theory explaining relationships between genomes already exists. If Darwin or someone else hadn’t yet come along to explain the hierarchy of species, someone would probably use Venter’s DNA sequences to develop just such a theory.
How will we know where to look without models and hypotheses to guide us? I mean we can gather a lot of data but in terms of describing the state of even a single square meter of reality, we’re not even close to being able to store that kind of information. We need a model to know how to choose our samples.
My main issue with the article is, as I’ve alluded, this: what exactly does Anderson think these statistical tools and number-crunching computer programs are, if they’re not models? Google’s algorithm is a model of the relevance of documents to a string of keywords. It may seem abstract, but Google’s algorithm embodies an imperfect theory of how people determine whether a page is useful to them. Every time someone fails to find what they’re looking for in Google, the current theory is falsified. But just as Newton’s theory is falsified by what we know about relativity and we continue to use it anyway, we still use Google’s theory as a good-enough model for most situations.
Am I missing something?
And then Charlie said:
I’ve read the article now. I was told about it by someone at work, who had a fairly low opinion of it.
Basically, I can’t really understand the point it’s making. This may be my fault, but I think the writer could have been clearer. Obviously it’s misleading to pick comments at random, but the “correlation supersedes causation” point seems to be the nub. Because the amount of data is so large, the risks in data-mining vanish to nothing. Is that what he’s saying? Maybe. I don’t know. Data-mining (looking for correlations blindly in a lot of data) is extremely dangerous. For instance, with a large enough sample of nurses, one of those nurses is bound to have a terrible record of deaths under their care. I remember Ben Goldacre writing about a Dutch nurse who had apparently been convicted of murder just on the basis of a statistical fluke, with no other evidence.
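The nurse example can be made concrete with a small simulation. This is only a rough sketch with invented numbers (1,000 nurses, 200 patients each, a 5% death rate are all assumptions, not real figures): every nurse is given exactly the same underlying risk, so any difference between them is pure chance, and yet the unluckiest nurse still ends up looking far worse than the average.

```python
import random

def simulate_wards(n_nurses=1000, patients_per_nurse=200, death_rate=0.05, seed=0):
    """Return deaths per nurse when EVERY nurse has the same underlying death rate.

    All parameters are made-up illustration values, not real hospital data.
    """
    rng = random.Random(seed)
    deaths = []
    for _ in range(n_nurses):
        # Each patient dies independently with the same probability.
        count = sum(1 for _ in range(patients_per_nurse) if rng.random() < death_rate)
        deaths.append(count)
    return deaths

deaths = simulate_wards()
average = sum(deaths) / len(deaths)
worst = max(deaths)

print(f"average deaths per nurse: {average:.1f}")
print(f"worst nurse: {worst} deaths")
```

Go looking for the worst record in a big enough sample and you will always find one, which is exactly why a correlation dug out blindly needs a hypothesis and independent evidence behind it.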
It is true that models have become very abstracted. Google doesn’t have a model that defines pages by their “essential” features; instead it uses something else (links) that is supposed to estimate the quality of those essential features, so it’s one abstraction away. But, as you say, it’s still a model. In the same way, standard regression models postulate a relationship between variables (and please forgive me if I’m teaching granny to suck eggs here):
something like:
UK Stock market level = (a)*GDP + (b)*RPI + (c)*(dividend index level) + (some error term)
Or something. You can take the data and regress to solve for a, b, and c, and the model is broadly OK if the sum of the squared errors isn’t too big.
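For anyone who wants to see what “regress to solve for a, b, and c” looks like in practice, here is a minimal sketch in plain Python. The data is entirely synthetic: the three input columns are random stand-ins for GDP, RPI and the dividend index, and the “true” coefficients are invented so we can check that the fit recovers them. The helper names are hypothetical, and it solves the least-squares problem through the normal equations, which is one common way to do it rather than the only one.

```python
import random

def solve_linear_system(matrix, vector):
    """Solve matrix * x = vector by Gaussian elimination with partial pivoting."""
    n = len(vector)
    # Work on an augmented copy so the inputs are left untouched.
    aug = [row[:] + [vector[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda idx: abs(aug[idx][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back-substitution on the upper-triangular system.
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - known) / aug[row][row]
    return solution

def least_squares(rows, targets):
    """Minimise the sum of squared errors via the normal equations (X'X) beta = X'y."""
    n_vars = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(n_vars)] for i in range(n_vars)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(n_vars)]
    return solve_linear_system(xtx, xty)

rng = random.Random(1)
true_a, true_b, true_c = 2.0, -1.0, 0.5  # invented coefficients, for illustration only
# Each row is (gdp, rpi, dividend_index): random stand-ins, not real economic data.
rows = [[rng.uniform(0, 10) for _ in range(3)] for _ in range(200)]
targets = [true_a * g + true_b * r + true_c * d + rng.gauss(0, 0.1) for g, r, d in rows]

a, b, c = least_squares(rows, targets)
sse = sum((y - (a * g + b * r + c * d)) ** 2 for (g, r, d), y in zip(rows, targets))
print(f"a={a:.3f} b={b:.3f} c={c:.3f} sum of squared errors={sse:.3f}")
```

Note that the model structure (which variables go in, and that they combine by weighted addition) is chosen up front; the data only fills in the weights.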
However, you can take this to another level of abstraction:
UK stock market level = SUM over j of (K(j)*INDEX(j))
Where you don’t predefine either the constant terms (K(j)) OR the actual Indexes you are regressing against. You can get the indexes themselves directly from the data themselves using a technique known as principal components analysis.
Using PCA, you don’t really need a first-level model to come up with a structure for the way variables move together. It’s a really interesting statistical technique, and there are probably further abstractions that I don’t know about that could be agnostic about whether the terms combine arithmetically etc. I don’t know. The point is that I think this sort of abstraction is what Chris Anderson is driving at, but I’m not sure. However, it’s not as revolutionary an alternative to “the scientific method” as he makes out.
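Here is a rough sketch of the PCA idea in plain Python, under some loudly stated assumptions: the three series are synthetic, all driven by one hidden factor with invented sensitivities, and only the first principal component is extracted, using power iteration on the covariance matrix (a simple stand-in for the full eigendecomposition a real PCA routine would do). The function names are hypothetical. The hidden factor is never handed to the algorithm; the combined “index” falls out of the way the series move together.

```python
import random

def covariance_matrix(columns):
    """Sample covariance matrix of a list of equally long data columns."""
    n = len(columns[0])
    means = [sum(col) / n for col in columns]
    centred = [[x - m for x in col] for col, m in zip(columns, means)]
    return [[sum(p * q for p, q in zip(ci, cj)) / (n - 1) for cj in centred]
            for ci in centred]

def first_principal_component(cov, iterations=500):
    """Power iteration: repeatedly multiply by cov and renormalise to unit length."""
    size = len(cov)
    vec = [1.0] * size
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(size)) for i in range(size)]
        norm = sum(x * x for x in nxt) ** 0.5
        vec = [x / norm for x in nxt]
    # Rayleigh quotient gives the variance captured along this direction.
    eigenvalue = sum(vec[i] * sum(cov[i][j] * vec[j] for j in range(size))
                     for i in range(size))
    return vec, eigenvalue

rng = random.Random(2)
loadings = [1.0, 0.8, 0.6]  # invented sensitivities of three series to one hidden driver
hidden = [rng.gauss(0, 1) for _ in range(500)]
columns = [[load * f + rng.gauss(0, 0.1) for f in hidden] for load in loadings]

cov = covariance_matrix(columns)
weights, eigenvalue = first_principal_component(cov)
share = eigenvalue / sum(cov[i][i] for i in range(len(cov)))

print("weights of the first component:", [round(w, 3) for w in weights])
print(f"share of total variance explained: {share:.3f}")
```

Even here there is a modelling choice baked in: PCA assumes the interesting structure is a linear combination of the variables, so it is still a model, just a more agnostic one.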
Also, I’m really surprised that he didn’t cite quantum mechanics as a supporting example. Instead, he says, “quantum mechanics is yet another model, and as such it, too, is flawed, no doubt a caricature of a more complex underlying reality”. Well, possibly, but the interesting thing about quantum mechanics is that it doesn’t specify an underlying mechanic to the way things move. It just gives you statistical predictions which turn out to be jaw-droppingly accurate. The scientist is agnostic about whether the particle is here or there, but the model tells him where to look for it. Now, I freely admit, I don’t understand the Wired article, but I would have thought this was a great example of what he’s driving at.
Apologies for the long post. It’s my last day in the office, and I can’t wait to be out of here, man. If you’re going to Tom’s Dolce Vita party, I’ll see you there.
And then Dan said:
Ok, this is all way over my head. But I have disliked Wired ever since I was first given a copy by a friend, who told me I “could be more avant garde”. I actually saved that copy until Christmas so I could burn it in my parents’ fireplace. Take that, environmentalists!
And then tom said:
Long posts are heartily encouraged!
I was thinking more about this last night…
I think what I’d give to Anderson is that there is a trade-off between modelling and data: the more data you have, the less modelling you need. The modelling kind of fills the gaps. This doesn’t seem like a particularly ground-breaking thing to say, though, and it’s only true for stuff in the past where data is available; there is also a limit to the amount of data we can collect, so there will always be gaps for modelling to fill.
I think at the moment there’s just too much noise in the article (attention-grabbing talk of the death of theory) to be able to pick out the useful points. I have a feeling that what Anderson is driving at is close to what a recent New Scientist article referred to as the Bayesian Approach (sparking an interesting letters page debate). Key idea: “if we have a hypothesis - like the belief that a coin is dodgy - probability theory allows us to assess that hypothesis in the light of our observations.” Note that you still need a hypothesis, though.
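The dodgy coin makes a nice worked example. This is a minimal sketch, assuming just two rival hypotheses: the coin is fair, or it is “dodgy” and lands heads 80% of the time. The 80% figure and the 50/50 prior are invented for illustration, and the function name is hypothetical. Bayes’ rule then turns the observed flips into an updated belief.

```python
from math import comb

def posterior_dodgy(heads, flips, prior_dodgy=0.5, dodgy_p_heads=0.8):
    """Probability the coin is dodgy after seeing `heads` heads in `flips` flips.

    Compares two hypotheses only: fair (p=0.5) versus dodgy (p=dodgy_p_heads).
    Both the prior and the dodgy bias are assumptions made for this sketch.
    """
    def likelihood(p_heads):
        # Binomial probability of the observed result under a given bias.
        return comb(flips, heads) * p_heads ** heads * (1 - p_heads) ** (flips - heads)

    dodgy = prior_dodgy * likelihood(dodgy_p_heads)
    fair = (1 - prior_dodgy) * likelihood(0.5)
    return dodgy / (dodgy + fair)

print(f"8 heads in 10: P(dodgy) = {posterior_dodgy(8, 10):.3f}")
print(f"5 heads in 10: P(dodgy) = {posterior_dodgy(5, 10):.3f}")
```

The observations shift the belief one way or the other, but nothing happens until you have written down the hypotheses being compared.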
As a pretty staunch Popperian, though, I have a feeling that all the Bayesian approach does is move the need for falsifiability back one step: it puts it off until some future point of reckoning, or muddies the criteria for falsifiability slightly, so that it’s probabilistic rather than absolute. I don’t think things have to be falsifiable right now, but I still think you have to be able to say that they could conceivably be with some future technology or whatever… I don’t know, this is taking up too much of my time… really must get back to work.
Unfortunately we’re not going to be down in Devon this year; new flat, too much work to do. But hopefully we’ll bump into each other some time soon.
And then Neil said:
Dan: you may be confusing Wired and Wire.
Anderson sure enjoys having his cake and eating it: what else is the “long tail” other than a theory about the way in which consumer behaviour and markets change in the internet age?
And then Charlie said:
Popper’s later writing went into some stuff on statistical theories, and he put forward an idea of hypothetico-deductive chains stretching between theory and experimental results. But it’s all a bit contorted.
As for me, I love Bayesian approaches to problems. There are a lot of situations where misunderstandings in life could be cleared up if people just knew a bit more probability theory.
Mind you, you could say that about physics, biology, etc. Could you say it about philosophy, though? Probably not.
And then Dan said:
Good point, Neil. Consider me out of this conversation for good.
And then Neil said:
Also not to be confused with “The Wire”!
And then Dan said:
True dat, dawg.
And then Neil said:
This pretty much demolishes Anderson’s argument IMO.
And then tom said:
I think it’s been pretty well established over the last couple of weeks that Anderson is an idiot, if only by the fact that famous loon Kevin Kelly seems to agree with him.
And then tom said:
My fave quote from the article Neil links to
And then Neil said:
I love that Potlatch blog! He manages to be simultaneously erudite and funny, something that very few academic writers can pull off, and he tends to nail whatever subject is in hand as well.