Ontology, Science, and Statistics

There is a common aphorism in statistics: “all models are wrong.” It is often expanded into something along the lines of “all models are wrong, but some are useful.” The phrase was coined by the British statistician George Box in his 1976 article Science and Statistics, in which he reflects on the legacy and impact of the renowned scientist and statistician R.A. Fisher.

While I have long been aware of this aphorism, I had no idea where it came from until recently. What prompted me to look for its origins was this really silly essay about ontology and Silicon Valley. While I don’t think the essay has anything important to say and I wouldn’t recommend reading it (it’s pretty boring, honestly), it did bring the term ontology, as it is defined in information science, to my attention. Maybe I’m late to the party, but I find the relationship between the metaphysical and scientific conceptions of ontology interesting. For the sake of this post, I am referring to a descriptive rather than prescriptive conception of ontology (which seems to be how it is commonly understood in data science).

In my view, to create or define the ontology of a system, that is, to exhaustively categorize and define the relations of everything in it, is simply to model that system. When I build models in my work, I am attempting (and often failing) to wrangle something chaotic, biological, and physical into something neat and orderly, albeit conceptual. I am trying to create a neat and self-contained narrative while remaining as true to my source material as possible. This brings me back to Box’s 1976 article, where he outlines three guiding principles for building good models:

Flexibility allows the scientist to adapt to new information and critically learn from the gully between what is assumed and what is observed.

… the discrepancy between what tentative theory suggests should be so and what practice says is so – that can produce learning.

Parsimony encourages the scientist to favor simple models and be economical in her descriptions of the world.

Just as the ability to devise simple but evocative models is the signature of the great scientist so overelaboration and overparameterization is often the mark of mediocrity.

Worrying Selectively allows the scientist to focus her attention on what is “importantly wrong.”

It is inappropriate to be concerned about mice when there are tigers abroad.

Now, usually when I get curious and go digging for these old papers to find the documented origin of an idea, I find something long and frankly hard to read. In the case of Science and Statistics, instead of reading the abstract and then skipping around a bit to get the gist, I started reading and could not stop until I got to the end. It is such a fun glimpse into the history of science. The curiosity and enthusiasm for science in both the writer, Box, and his subject, Fisher, are baked into every sentence and absolutely infectious. He also gives one of the best definitions of science I’ve seen put into writing:

… science is a means whereby learning is achieved, not by mere theoretical speculation on the one hand, nor by the undirected accumulation of practical facts on the other, but rather by a motivated iteration between theory and practice …

But I digress. While Box’s advice is valuable in the scope in which it was intended (building statistical models for science), I think it applies well beyond it. Our models of the world, our abilities to understand, contextualize, and categorize everything in it are imperfect. Allowing our minds to bend and reshape, letting complexity into our lives only when necessary, and prioritizing the things that matter the most will serve us well.



