Two recent posts focused on independent work regarding Tipping Points and Robustness. After a key realization today, these two topics turn out to be very closely tied together. Scott Page and I have been working on a formal definition of tipping points based on a weighted distribution of the probabilities of outcomes of a system modeled as a Markov process (what a horrible sentence). It's rather complicated, but a fast-and-loose description of a tipping point (by our definition) is a low-probability state change that has a large effect on the probability distribution of outcomes. That sounds a lot like the sort of fragility that ruins otherwise robust systems, and so our formal measure of tipping points might double as a measure of robustness.
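To make the "low-probability state change, large effect on outcomes" idea concrete, here is a toy sketch (all states and transition probabilities are invented for illustration, not our actual measure): a four-state Markov chain in which a rare 1%-per-step perturbation nevertheless ends up claiming a sizeable share of the outcome distribution.

```python
import numpy as np

# Toy 4-state Markov chain (numbers invented for illustration):
# 0 = normal operation, 1 = rare perturbed state,
# 2 = "good" absorbing outcome, 3 = "failed" absorbing outcome.
P = np.array([
    [0.89, 0.01, 0.10, 0.00],  # normal: 1% chance per step of the rare perturbation
    [0.00, 0.00, 0.00, 1.00],  # the perturbed state leads straight to failure
    [0.00, 0.00, 1.00, 0.00],  # good outcome (absorbing)
    [0.00, 0.00, 0.00, 1.00],  # failure (absorbing)
])

def outcome_distribution(P, start=0, steps=10_000):
    """Distribution over states after many steps, starting from `start`."""
    dist = np.zeros(len(P))
    dist[start] = 1.0
    return dist @ np.linalg.matrix_power(P, steps)

dist = outcome_distribution(P)
print(dist.round(3))  # ~ [0., 0., 0.909, 0.091]
```

Even though the perturbation is a 1% event at any given step, roughly 9% of all trajectories (0.01 / 0.11 of the probability mass leaving normal operation) end in failure: a small transition probability with an outsized effect on where the system ultimately ends up.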

Now I'll go through that a bit more slowly. Though 'robustness' is the buzzword of the moment, what I really want to describe is the more general concept of resilience: a system's ability to persist and to maintain or regain functionality despite significant insults. Resilience comes in degrees, and hence ought to have a measure that takes on a range of values (which may or may not be normalizable to lie between, say, zero and one). For simplicity I'll identify three classes of metastates that a system can occupy: A) normal operation, B) manageable perturbations and recovery, and C) debilitating perturbations and failure (ignoring D, evolutionary systems where perturbations result in a different mode of operation that is not failure). Manageable perturbations are low-probability events that eventually loop back to normal-operation states, while debilitating perturbations are low-probability events that lead to outcomes outside of normal operation.
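The B/C distinction boils down to a reachability question: can the system get back to normal operation from the perturbed state? Here is a minimal sketch under that assumption (the state names and transitions are made up for the example):

```python
# Sketch: classify perturbation states as manageable (B) or debilitating (C)
# by whether normal operation is reachable again from them.
# State names and transitions are invented for illustration.
transitions = {
    "normal":     ["normal", "overload", "power_loss"],
    "overload":   ["cooldown"],   # perturbation...
    "cooldown":   ["normal"],     # ...that loops back to normal: class B
    "power_loss": ["dead"],       # perturbation...
    "dead":       ["dead"],       # ...that never recovers: class C
}

def reaches_normal(state, transitions, seen=None):
    """Depth-first search: is 'normal' reachable from `state`?"""
    seen = set() if seen is None else seen
    if state == "normal":
        return True
    if state in seen:
        return False
    seen.add(state)
    return any(reaches_normal(s, transitions, seen) for s in transitions[state])

perturbations = ["overload", "power_loss"]
classes = {p: ("B" if reaches_normal(p, transitions) else "C")
           for p in perturbations}
print(classes)  # {'overload': 'B', 'power_loss': 'C'}
```

The same check works on any finite state graph, which is all the classification scheme above requires: B-states sit on loops back into normal operation, C-states do not.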

With this breakdown we can rephrase our terminology in the following ways. A given system is resilient against Bs and fragile against Cs. A nice (normalized) measure of resilience (and hence of robustness and stability) is then the number of Bs divided by the sum of Bs and Cs: #B / (#B + #C). We can measure fragility analogously as #C / (#B + #C). Now note that the Cs are the tipping points of the system, so under these measures the resilience of a system is one minus the proportion of perturbations that are tipping points. Crude and basic as these measures are, they uncover exactly the sort of relationship between resilience and tipping points that I now find intuitive. And since Scott and I have developed sophisticated measures for tipping points and related concepts, we can retool those measures as measures of robustness and stability.
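The two ratios are simple enough to write down directly (the counts below are invented for the example):

```python
def resilience(num_b, num_c):
    """Fraction of perturbations that are manageable: #B / (#B + #C)."""
    return num_b / (num_b + num_c)

def fragility(num_b, num_c):
    """Fraction of perturbations that are tipping points: #C / (#B + #C)."""
    return num_c / (num_b + num_c)

# With, say, 8 manageable and 2 debilitating perturbations:
print(resilience(8, 2))  # 0.8
print(fragility(8, 2))   # 0.2
# The two always sum to one, so resilience = 1 - (proportion of
# perturbations that are tipping points), as claimed above.
```

Normalization comes for free here: both measures lie in [0, 1] whenever at least one perturbation of either class exists.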

Thinking of robustness in terms of tipping points also opens up (for me at least) a new way to think about problems such as fault tolerance, sustainability, adaptability, and various other topics related to robustness. Under this terminology a system's "normal operation" is described by highly clustered states in regions where the basins of attraction show considerable overlap. Manageable perturbations, i.e. shocks that a system can recover from, look like loops in the state space that may take the system arbitrarily far from "normal operation" but eventually lead back into one of those states (which means the loop, too, lies entirely within states of the same level of basin overlap). A system's behavior changes only when possible outcomes are excluded, because excluding an outcome excludes every state in that outcome's basin of attraction (including some among the "normal operation" states). That is what we call a "tip away" tipping point, and those are precisely the events that degrade a system's resilience (whether robustness or stability).