What Can AI Safety Teach Us about Strategic Planning? Part 2: Coherent Extrapolated Volition

In Part 1 we reviewed two key foundational theories of AI safety and how they relate to strategic planning in business. Instrumental convergence tells us that there will be potentially unforeseen and unspecified “emergent” goals that will manifest themselves as the result of the stated objectives that a company has. Perverse instantiation warns us of the potential for the creation of targets that, from the isolated point of view of the people who set them, make a great deal of sense. When viewed from a top level organisational perspective, and in concert with goals set in isolation elsewhere, they can create conflict and cause the organisation to fail in the delivery of its overall goals.

How can we overcome these problems? A solution for underpinning the thinking behind future planning and strategy in business exists in a theory for AI friendliness/value alignment outlined by Eliezer Yudkowsky in his 2004 essay: Coherent Extrapolated Volition (CEV).

It's a fair assumption that long term strategic planning is, with any meaningful level of accuracy, impossible. Therefore, any attempt at it must instead focus on establishing a series of flexible end goals, preceded by an initial organisational and cultural state that will enable the business to move towards those goals (or different ones) in a way that considers the potential for them to be interfered with by instrumental convergence and perverse instantiation.

It must assume that between “now” and “then” the needs of customers, users and employees will change, staff will change, and societal paradigms will shift, all in completely unpredictable ways. It must be done under the assumption that we’re idiots or barbarians and those who come after us (including our future selves) will be smarter, better people with greater experience, more empathy and a far more distant boundary to their rationality. And those who come after us must make the same assumptions about themselves. We must therefore focus on creating the optimal conditions for adapting to unknown future states, instead of trying to predict the future, based on the following reasoning:

  1. You can’t reliably, accurately predict the future.
  2. If you can’t reliably, accurately predict the future, you can’t know what the future holds for your business.
  3. Therefore it is impossible to prepare for precisely what the future holds for your business.

The obvious difficulty is that the same invisibility applies to creating optimal conditions for adapting to unknown future states as applies to predicting the states you hope to adapt to. An optimal starting point for now depends, in some way at least, on the direction the future will move in.

This is where Coherent Extrapolated Volition can help us. But what does it even mean?

CEV is a theory that outlines a possible method of achieving “friendly AI”. More thought experiment than actual solution, I understand that it probably wouldn’t work as a way of ensuring AI safety due to the need to transpose the human-centric language that makes up its description into something that can be coded into a machine language. It’s more of a guideline for how we might minimise the risk of the way we (or, worse, one person might) think now being erroneously embedded as “the way things are” in perpetuity.

That is what makes this a powerful framework for strategic planning in business. We are, after all, humans not machines, and are notoriously bad at predicting the way we’ll feel in the future and acting in the best interests of our future selves.

Take for example the 2015 Pew Research findings relating to the automation of jobs, which found that 77% of Americans thought that it is: “realistic that robots and computers might one day be able to do many of the jobs currently done by humans”, whilst 70% of the same group believed that their own jobs or professions would be safe.

Work that one out: high level of understanding of the risk, near zero desire to prepare for it.

CEV concerns itself not with predicting future events but with trying to predict, in a highly optimistic way, how we would wish we had responded to those events, by imagining how we might think in the future under the conditions Yudkowsky outlines here:

“In poetic terms, our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.”

You could almost call it future idealistic hindsight.

Yudkowsky goes into a great deal of detail in breaking down this passage, and I would recommend reading the paper in its entirety, but here’s a quick precis of how he explains each element.

Knew more is relatively self-explanatory, and essentially refers to having a more distant boundary to our rationality.

Thought faster relates to an ability to arrive at the right solution with a great deal less thinking time and iteration.

Were more the people we wish we were—we’re all better after the fact than we are in the moment. Wouldn’t it be great if we were able to act now as we wish we had acted in retrospect?

Had grown up farther together: any model of a future of any distance must take into account the interactions between people that influence behaviour. Think back to the first project you undertook with a colleague and then the tenth (or even the second!) and how much better you were able to leverage each other’s skills and perspectives the more time you spent together. The farther you grew up together. How can we find that sweet spot earlier?

Where the extrapolation converges rather than diverges: assuming we actually will grow farther together, think faster, know more and become more the people we wish we were, predictions made from our current viewpoint (i.e. now, this moment) become increasingly unreliable. Options, therefore, must be left open; we must allow ourselves to make certain decisions about the direction we need to take as we arrive at those forks in the road.

Where our wishes cohere rather than interfere: coherence isn’t about a majority vote—a small number of people with a highly focused, well-thought-out viewpoint should outweigh a larger number of people who are less sure. Frédéric Laloux refers to something analogous to this in decision making in business in his idea of “the advice process” in Teal organisations (which we’ll explore later).

Extrapolated as we wish that extrapolated: “This is a lesser special case of the rule that the Friendly AI should be consistent under reflection (which might involve the Friendly AI replacing itself with something else entirely).”

Interpreted as we wish that interpreted: When we ask for paperclip maximisation, we don’t mean turn the entire universe into paperclips.

A question that needs to be answered (in any context of CEV) is “whose volition should we extrapolate?” In the case of AI friendliness this comes down to a question of which entity’s values should be instantiated and, therefore, imposed upon humankind for the rest of time. I’m not getting into that here.

In fact, that question goes away entirely in the context of this article, since there’s no question over whose volition we should be focusing on: it is that of the business.

Next time: Viewing an organisation as an entity and initial dynamic over strategy

In Part 3 of this series we’ll look at how we can view an organisation as an entity (as Frédéric Laloux puts it, “[as] living organisms that have their own sense of direction”), using Nike as an example. We’ll also review some guiding principles adapted from CEV that outline considerations for setting an “initial dynamic” that is most conducive to helping those who work for an organisation realise its evolutionary purpose.
