Week One - The Golem of Prague / The Garden of Forking Data / Sampling the Imaginary

Small world: The world of the golem’s assumptions. Bayesian golems are optimal in the small world.

The small world is the self-contained logical world of the model. Within the small world, all possibilities are nominated (i.e. they form the sample space).

Within the small world of the model, it is important to be able to verify the model’s logic, making sure that it performs as expected under favorable assumptions. Bayesian models have some advantages in this regard, as they have reasonable claims to optimality: No alternative model could make better use of the information in the data and support better decisions, assuming the small world is an accurate description of the real world.

Large world: The real world. No guarantee of optimal procedures.

The large world is the broader context in which one deploys a model. In the large world, there may be events that were not imagined in the small world. Moreover, the model is always an incomplete representation of the large world, and so will make mistakes, even if all kinds of events have been properly nominated.

We move between the small world and large world when modelling.

The way that Bayesian models learn from evidence is arguably optimal in the small world. When their assumptions approximate reality, they also perform well in the large world. But large world performance has to be demonstrated rather than logically deduced. Passing back and forth between these two worlds allows both formal methods, like Bayesian inference, and informal methods, like peer review, to play an indispensable role.

Bayesian Data Analysis

Count all the ways data can happen according to assumptions. Assumptions with more ways that are consistent with data are more plausible.

  • The future:
    • Full of branching paths
    • Each choice closes some
  • The data:
    • Many possible events
    • Each observation eliminates some

There is a bag of four marbles that come in only two colours: blue and white. From multiple draws, what are the contents of the bag? How many blue marbles and how many white marbles? There are only 5 possibilities given what we know: the bag holds 0, 1, 2, 3, or 4 blue marbles, with the rest white.

We draw with replacement from the bag to get observations. For each possible bag, we count up all the ways it could produce the observed data (i.e. a blue, then a white, then a blue marble). These are the ways that are consistent with the data.
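The counting described above can be sketched in Python. This is only an illustrative sketch, not code from the course: the function name count_ways and the variable names are made up for this example. It walks every path through the garden of forking data for a given bag and keeps the paths that match the observed draws.

```python
from itertools import product

def count_ways(n_blue, observed, bag_size=4):
    """Count the paths through the bag that reproduce the observed draws.

    n_blue is the conjectured number of blue marbles in the bag.
    Draws are made with replacement, so every draw can pick any marble.
    """
    bag = ["blue"] * n_blue + ["white"] * (bag_size - n_blue)
    # Enumerate every possible sequence of individual marbles drawn.
    paths = product(bag, repeat=len(observed))
    # Keep only the sequences whose colours match the data.
    return sum(1 for path in paths if tuple(path) == tuple(observed))

observed = ["blue", "white", "blue"]

# One entry per conjecture: 0, 1, 2, 3 or 4 blue marbles in the bag.
ways = {n_blue: count_ways(n_blue, observed) for n_blue in range(5)}

for n_blue, n_ways in ways.items():
    print(f"{n_blue} blue, {4 - n_blue} white: {n_ways} ways")
```

Bags with no blue marbles or no white marbles cannot produce the data at all, so they get zero ways. Among the rest, the bag with the larger count is the more plausible conjecture.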

We compare the results with other possible outcomes: