Anatomy of Prediction and Predictive AI

In the previous article, Friendly AI via Agency Sustainment, we talked about a potential solution to the Control Problem of Artificial General Intelligence. One of the requirements of that proposition is an AI that is capable of predicting the behavior of other agents. In this article, we will discuss how humans make such predictions and how that can be applied to an AI.

Elements of Prediction

In any non-static environment there will be a range of changes that will occur within a given frame of time. A clock will tick, a bird will sing, a student will study for a college exam, the Earth will complete a revolution around the Sun, and so on. All these changes are based on causality. The causes may be simple, like gravity pulling an object from a table to the ground, or they may be complex, like the reason a cat knocked said object off the table. Regardless of the complexity of the cause(s) that drives the change, once the cause(s) occurs the change will follow.

Predictions are an attempt to model what changes will occur in a future period of time. These models are based on only two elements: observation and knowledge. Observation of an environment in some state, and knowledge of causes and effects within that environment.

The medium of the environment is irrelevant. It can be the physical world, a digital world rendered in a computer, or an imaginary world pictured in our mind’s eye. As long as we have knowledge of what causes changes in that environment and what those changes will be, we can predict future states of that environment.

The knowledge does not need to come from direct experience in the environment either. You can study a wilderness survival guide and be able to accurately predict which wild mushrooms you can eat and which will make you sick. You can watch a gameplay video online and be able to predict how a videogame level will need to be played in order to win. You can imagine a world where gravity is suddenly reversed and predict that you would need to hold on to something to keep from flying off into space.

Precision v. Accuracy

Predictions can be precise if you know the causes of change and have time to process them. A structural engineer can precisely predict how much weight their building can hold because they have knowledge of the limits of the materials and have had the time to calculate all of the contributing factors. Predictions can also be imprecise if not all of the causes are known (or knowable) and/or there is not enough time to process them. For example, if someone next to you suddenly shouts “Duck!”, you can only predict that something is heading toward the upper half of your body from somewhere at some speed.

Precision does not guarantee the accuracy of the prediction, though. For example, there may be a manufacturing defect in the steel that causes the engineer’s prediction to be inaccurate, and the building collapses under far less weight. Meanwhile, your imprecise, split-second prediction that something was headed toward you was accurate, and by ducking you saved yourself from getting hit by an errant football.

Structure of a Prediction

Because predictions are based on causality, they can easily be represented as symbolic rules. IF the sky is clear THEN it will be warm outside.

However, this is an extremely abstract symbolic representation. An actual prediction can have a multitude of factors. IF the sky is clear AND it is summer THEN it will be warm outside.


The state of each symbol can also have an effect on the prediction. IF the sky is clear AND it is summer THEN it will be warm outside ELSE IF it is winter THEN it will be cold outside ELSE IF the sky is not clear THEN it will be cold outside.


Obviously there are substantially more factors that go into predicting whether or not it’s going to be warm or cold outside than I’ve diagrammed here. These symbolic rule sets can quickly become immense in size and complexity even at a highly abstract level such as this. It gets exponentially worse if we peel back the layers of abstraction, as nested inside each of these symbols is another, less abstract symbolic rule set that determines whether the state of the original symbol is true or false.


And nested inside each of these less abstract symbols is an even less abstract symbolic rule set. These nested rule sets continue all the way down to the individual, sub-symbolic input signals that come from our nervous system. For example, IF a group of white retina signals is surrounded by a group of blue retina signals THEN you are looking at a cloud.
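To make the nesting concrete, here is a minimal sketch of how such layered rule sets might be represented. The predicates and thresholds (`blue_fraction`, `0.8`, and so on) are hypothetical stand-ins for the lower-level signals, not a claim about how the brain actually encodes them:

```python
# Hypothetical nested symbolic rule set: each abstract symbol
# ("the sky is clear") is itself defined by a less abstract rule
# over lower-level observations.

def sky_is_clear(obs):
    # Lower-level rule: a sky reads as "clear" if it is mostly blue
    # with few white (cloud) regions.
    return obs["blue_fraction"] > 0.8 and obs["white_fraction"] < 0.1

def is_summer(obs):
    return obs["month"] in (6, 7, 8)

def predict_temperature(obs):
    # Top-level abstract rule built from the nested symbols.
    if sky_is_clear(obs) and is_summer(obs):
        return "warm"
    return "cold"

obs = {"blue_fraction": 0.9, "white_fraction": 0.05, "month": 7}
print(predict_temperature(obs))  # warm
```

Each predicate here could itself be unpacked another level down, all the way to raw inputs, exactly as the article describes.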


However, you can’t infer that from just one small set of your inputs. You have to take all of your inputs into account. Maybe you’re not looking at the sky at all, but at white letters on a blue paper. Or a picture of a sky, which may not help you determine if it will be warm or cold outside.

In the above examples, we’ve analyzed a prediction from a top-down approach. We started with extremely abstract symbols and picked apart layers of abstraction until we hit raw sensory inputs. However, the level of complexity and the range of possible alternatives is near-infinite. It would not be possible to model them all and try to fit them to whatever our current sensory inputs are. Therefore, when making a prediction, we take the opposite approach and go from the bottom up. This is where pattern recognition comes into play.

Constructing a Prediction

We don’t look at a red cup of coffee and think “Is it a cloud? No. Is it a car? No. Is it a tree? No. Is it…”. Instead, we match the patterns of signals from our nerves to our knowledge database of the world using those patterns as our search filters. “It’s red, vertically cylindrical, 4 inches tall, hollow, has a loop on the side, is filled with a black liquid, the liquid is steaming, it is sitting on a table, and smells like coffee. What have I experienced before that resembles all of these inputs? It’s a coffee cup.”

We have observed an object in the environment, matched it to an object that we have knowledge of, and can now make predictions about it using the associated knowledge. It will remain stationary, it will continue to smell like coffee, it will be hot to the touch, it will taste like coffee, etc…
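As a rough sketch of that bottom-up matching, observed features can act as search filters over a knowledge base of known objects. The feature sets below are invented for illustration; a real system would be working with far lower-level signals:

```python
# Toy knowledge base: each known object is a set of features learned
# from past experience.
knowledge = {
    "coffee cup": {"red", "cylindrical", "hollow", "has handle",
                   "contains liquid", "steaming", "smells like coffee"},
    "cloud":      {"white", "fluffy", "in sky"},
    "car":        {"red", "has wheels", "metallic"},
}

def recognize(features):
    # Return the known object whose stored features overlap most
    # with the current sensory inputs.
    return max(knowledge, key=lambda obj: len(knowledge[obj] & features))

inputs = {"red", "cylindrical", "hollow", "has handle", "steaming",
          "smells like coffee"}
print(recognize(inputs))  # coffee cup
```

Once the match is made, all of the knowledge stored with “coffee cup” becomes available for prediction: it will stay put, it will be hot, and so on.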

This is easy enough for objects with no agency of their own. They will always obey whatever the rules of their environment are. Such as the laws of physics. As long as we know how those rules apply to an object, we can accurately predict what that object is going to do with relative ease.

That isn’t always the case with objects that have agency. An agent’s behavior is still subject to the rules of their environment, but their actions are also determined by the invisible, subjective rules of their personality. An agent cannot decide to no longer be influenced by gravity, but unlike the coffee cup that will always remain on the table, an agent can decide to counteract gravity by jumping. This can make predicting the future of an agent a much more difficult task.

However, just because the rules of an agent’s personality are invisible that does not make them unknowable. If Bob is in gym wear, has a jump rope in his hands, and is swinging the jump rope it is easily predictable that he is about to jump. If Bob is in a business suit, is attending a business conference, and has his hand raised it is easily predictable that he wants to speak. These predictions are all made from the knowledge that we have gained from similar observations.

Learning Knowledge

We are not born with the knowledge that we use to make predictions. We have to learn it by observing our environment and the changes that take place within it. But how does this work?

It’s very difficult for us to look objectively at our own learning mechanisms. We don’t remember how we learned the majority of everyday common sense knowledge. We were too young when we learned it to be able to introspectively analyze our thought processes. And now that we’re older, we have so many priors in our knowledge database that we can quickly analyze new objects and environments and accurately predict what they are and what they will do.

Let’s do a simple exercise. Take a moment to study the following video game scene:

Art provided by “Kenney Vleugel” and “Eris”

If you’ve played games before you can make some instant predictions about how this game will work, but for the sake of the exercise answer the following observational questions:

If you have the time, consider filling the questions out on this Google Form and see how your answers compare to those of other readers.

  • Who is the player character?
  • What is the thing with the circle?
  • What is the blob with legs?
  • What are the white things?
  • What is all the blue stuff?
  • What is the stuff along the bottom of the screen?
  • What are the red things?
  • What is the stuff that’s under the red things?

Now let’s make some predictions:

  • What objects will move?
  • How will each object move?
  • How will each object interact with the other objects?
  • How do you win the game?
  • What video game(s) are you basing your answers off of?

Once you are satisfied with your answers, click this link to see a small animation of the game being played.

Do not read further until you have answered the questions and viewed the gameplay animation.







Not quite what you predicted? That means your mind is creating a new entry in your knowledge base specifically for this game.

This is part 2 of the Google Form. It has the same questions for you to answer again using your new knowledge. See how your observations compare to other readers.

Now, when you look at the following scene what do you think will happen?


Whatever your predictions are, they are based on your new knowledge gained by observing the rules of this particular game environment. But that knowledge is pretty limited. You only saw a small part of the gameplay. You may still struggle with some questions like “How do you win the game?” and “How will each object interact with the other objects?”. To find the answers, you would need to observe more of the game.

Surreal environments that defy common sense are a useful tool for reverse engineering how we form new knowledge and use that knowledge to make predictions. Let’s take a look at what just happened.

You were exposed to an unfamiliar environment. You matched elements of the environment to similar environments from your common sense knowledge of video games. You made predictions about the unfamiliar environment by assuming that the unfamiliar would behave in a common sense way. When it did not, what did you do? Did you laugh? Did you groan? Did you study the new environment to learn its rules or did you dismiss it as absurd? There is no right answer to these questions. Different people will have different reactions. However, whatever your reaction was, when you looked at the image of the second scene you did not re-apply your original set of predictions.

If we had studied the environment, a reasonable prediction would be “The circle guy will fly up and collect the squares above the pulsing blob.” But how are we making that prediction? We saw that the circle guy can move instantly a small distance in one of four directions. We saw that when the circle guy touched the squares they disappeared. We saw that the blob pulsed.

Those are all visible, causality based environmental rules that are easy enough to predict, but how are we predicting where the circle guy will move? How do you predict the behavior of something that has agency?

When predicting the behavior of an agent, we have to make a lot of assumptions. Unlike the objective nature of environmental rules, the internal rules of an agent are subjective. Since the circle guy collected all the squares in the animation, we can assume that the squares are desirable to the circle guy for some reason. Do they provide a reward? Are they necessary to complete the game? We don’t know. We may never know. All we know is that we saw the circle guy collect them. From this limited amount of observational data, we can make the assumption that the circle guy will again move to collect the squares.

Predictions of the subjective rules of an agent are, themselves, subjective to your own experiences. Perhaps you’ve played a lot of surreal games in the past and have experienced that every action in these kinds of games has a purpose. So you predict that the circle guy will first move to touch the red diamonds before moving to collect the squares, since that is what happened in the first animation. Even though there was no observational data to suggest that the red diamonds do anything, your past experience with other surreal games leads you to infer a causal relationship between touching the diamonds first and then collecting the squares.

However, if we dismissed the new environment as absurd, our prediction about the second scene may be “I have no idea what will happen.” We only observed it enough to see that it did not conform to our original predictions. The only knowledge we have learned about the environment is that it’s nonsensical. But even “I have no idea what will happen” is a useful prediction in and of itself. We are predicting that it will not conform to our common sense. We don’t have the knowledge to predict what will happen, but we do have the knowledge to predict what won’t.

Agents Predicting Agents

Because the personality rules that govern an agent are both subjective and invisible, it’s impossible to be completely certain when predicting another agent’s behavior. However, it is possible to be reasonably certain just by observing behavioral patterns. The more consistent a behavior is, the easier it is to predict when that behavior will occur. Let’s look at an example.

Bob is a regular at “Suzy’s Diner”. Suzy has observed that every Sunday at 1pm, Bob comes in and orders a large stack of pancakes. He always has a smile on his face and he tips well, so Suzy enjoys his patronage. One Sunday, she had the cook make a large stack of pancakes for Bob before he even set foot in the door. Bob was thrilled. From then on, Bob’s pancakes were waiting for him every Sunday.

How did Suzy predict that Bob would want pancakes before he was there to order them? How did Suzy know that it would make Bob happy? How did Suzy know when to put the order in? Why doesn’t Suzy pre-make pancakes for all of her customers?

In her mind, Suzy created a set of rules specifically for Bob based on her observations of his behavior. She uses this rule set to guide her own behavior.


One Sunday, Bob thinks to himself, “I wonder if they make a good chicken fried steak?” Being the gentleman that he is, he calls ahead to Suzy’s Diner to let her know that he is going to try the steak instead of pancakes today. Why would he do that?

Bob has his own rule set for Suzy’s behavior based on his observations of her behavior toward him. He knew that she would predict that he would come in at his usual time and want his usual pancakes. He knew that if he wanted steak instead, he would need to communicate that to Suzy otherwise she would waste a perfectly good order of pancakes.

But now there is a problem. Suzy’s existing rule set for Bob is no longer accurate. She must expand and update her rule set to include her new observation that Bob may not always want pancakes.


There is another problem too. If Bob does not call Suzy to tell her his order before 12:45pm, she does not have enough information to accurately assess whether “Does Bob want pancakes” is true or false. She could attempt to determine a probability based on the number of times she remembers Bob wanting pancakes vs the times Bob has wanted something else and use that to make her prediction, but there is a better alternative. If she waits to take Bob’s order, she will know for certain whether he wants pancakes or not. But how does she determine that waiting is preferable?

She models the outcome of both possibilities and determines which one produces the best results. This is commonly known as a “cost-benefit analysis”. In this case, waiting to see what Bob orders is much more beneficial than throwing away unwanted pancakes and then taking his order anyway.
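A toy version of that cost-benefit analysis might look like the following. The utility numbers are made up for illustration: pre-making delights Bob if he wants pancakes (+10) but wastes an order if he doesn’t (−5), while waiting is a safe, modest outcome (+3):

```python
def expected_value(action, p_pancakes):
    # Expected utility of each of Suzy's options, given the probability
    # that Bob wants pancakes. Utilities are hypothetical.
    if action == "pre-make":
        return p_pancakes * 10 + (1 - p_pancakes) * -5
    else:  # "wait" and take the order as usual
        return 3

for p in (0.95, 0.5):
    best = max(("pre-make", "wait"), key=lambda a: expected_value(a, p))
    print(f"p={p}: {best}")
```

With Bob’s old, near-certain pancake habit (p = 0.95), pre-making wins; once his order becomes unpredictable (p = 0.5), waiting wins. This is also why a return to a very high frequency of pancake orders could tip the balance back, as described below.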

That’s not to say that probability is not a factor in assessing whether an ambiguous condition is true or false. Sometimes the high frequency of one outcome outweighs the higher benefit of a low frequency outcome. Like if Bob went back to always ordering pancakes.

Current Predictive AI

There are a multitude of decades-old techniques capable of creating AI that can make predictions about its environment, and even predict the behavior of other agents in that environment. These techniques are called Good Old Fashioned AI (GOFAI) and use symbolic rule sets similar to what we’ve talked about. They are by far the most common type of AI currently in use, and can produce anything from rudimentary to incredibly complex, life-like behavior.

The problem is, these rule sets have to be hard-coded into the AI by its programmers, which means the AI can’t adapt to anything its programmers haven’t given it a rule for. When a GOFAI agent encounters an unexpected situation, it completely breaks down.

What we need is a GOFAI agent that learns its own symbolic rule sets. Just like Suzy and Bob, it has to be able to make observations and interpret those observations into sets of rules that it can use in the same ways that a traditional, non-learning GOFAI uses the rules that its programmers gave it.

Creating Symbols

Let’s say that we are sitting in front of a computer game that has four unlabeled buttons.


In order to find out what these buttons do, we have to experiment with them. So we’ll push one to see what happens.


We observe that when the first button is pressed, the character on the screen moves up one grid space. Fortunately for us, we have a lot of prior knowledge that lets us immediately create a high level symbolic rule for the cause and effect relationship between pushing the button and the character moving:


If we diagram this symbolic rule, we get something like:


Our prior knowledge even lets us infer additional rules without the need to observe them beforehand. For example, we can infer that the walls around the edge of the level will prevent us from leaving the play area:


Once we’ve experimented with and observed the causal relationships between pressing each button and the changes on the screen, we’ll have learned how to interact with the environment.


Now let’s take a look inside this game from a programming perspective. To create this game we have to tell the computer how to change the image on the screen when a given button is pressed. We could take a huge amount of time and effort to write thousands of instructions in raw binary machine code, but we have powerful tools available that allow us to abstract binary code into high level symbolic programming languages that are orders of magnitude easier to work with. For example, we can use a “game engine” and only have to write a few symbols like:

IF button_pressed(1)

character.y -= grid_space

The game engine then does the work of translating our high level symbols into lower level symbols which are then translated into even lower level symbols until they are finally translated into the thousands of lines of binary machine code. The computer then processes the machine code which changes the pixels on the screen.

When we make an observation, we are translating in the opposite direction. Instead of translating symbols into changes, we are translating changes into symbols. Let’s take this “from change to symbol” translation and apply it to an AI.

We’ll take the same game and see what kind of symbols and rules the AI would create. We’ll start off with a completely clean slate. The AI is given four actions that it can perform and only the raw pixel data from the screen as sensory inputs.


Like us, the AI must first try an action so that it can make observations about what the action does.


Unlike us, it has no prior knowledge to create a high level abstracted symbol like “Move Up”. Instead, its symbols consist of the raw values of its inputs before the action, and all of the changes that occurred in the inputs after the action. The raw input values are the condition that its inputs have to resemble for the associated changes to occur.


Just by performing one action, our AI has learned its first symbolic rule. It can use this rule to begin making predictions. Let’s say the AI finds itself in the following situation:


The values of its inputs do not perfectly match the conditional symbol in its rule, however it has no other conditional symbol to compare against. Therefore it accurately predicts that if it uses its first action, the environment will change as it does in the change symbol:


But what would it predict if it used its first action now? It doesn’t have any prior knowledge to use to infer other types of interactions like we do. There isn’t even another conditional symbol to compare against. So it can only predict that the one change that it has observed will occur again. However, there’s a problem. That change has the player moving from the lower grid space to the empty grid space above, but there is no empty grid space above. So it predicts that the player will simply vanish!


Instead, the player just doesn’t move at all.


The AI’s prediction was wrong. It has to create a new rule for the new observation just like we did earlier when our predictions about the surreal platform game were wrong.


Now that there is another conditional symbol, the AI has something to compare the current state of its inputs to. It bases its predictions on whichever conditional has the closest resemblance to the current state.


As more and more elements are introduced into the game (keys, doors, hazards, collectibles, etc…) the AI creates more and more conditional -> change symbolic rules.
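The internals of these rules aren’t specified in the diagrams, but as a hedged sketch, a self-generated rule might be stored as a (condition, changes) pair over raw inputs, with prediction done by nearest-conditional matching. The tuples below are tiny stand-ins for the pixel grid:

```python
def diff(before, after):
    # Record only the inputs that changed, as {input_index: new_value}.
    return {i: after[i] for i in range(len(before)) if before[i] != after[i]}

def similarity(state, condition):
    # How many raw inputs match the rule's conditional symbol.
    return sum(1 for a, b in zip(state, condition) if a == b)

class RuleLearner:
    def __init__(self):
        self.rules = []  # list of (action, condition_state, changes)

    def observe(self, action, before, after):
        # Create a new rule from the raw inputs before the action and
        # the changes that followed it.
        self.rules.append((action, before, diff(before, after)))

    def predict(self, action, state):
        # Base the prediction on whichever conditional most closely
        # resembles the current state.
        candidates = [r for r in self.rules if r[0] == action]
        if not candidates:
            return None  # no knowledge yet: must experiment
        _, cond, changes = max(candidates,
                               key=lambda r: similarity(state, r[1]))
        predicted = list(state)
        for i, value in changes.items():
            predicted[i] = value
        return tuple(predicted)

ai = RuleLearner()
# One observation: the "player" input moved from cell 1 to cell 0.
ai.observe("button_1", (0, 1, 0), (1, 0, 0))
print(ai.predict("button_1", (0, 1, 0)))  # (1, 0, 0)
```

When a prediction fails (as with the player “vanishing” at the top wall), the mismatch simply becomes another `observe` call, adding a new conditional to compare against.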


Modeling Predictions

Once the AI has at least one symbolic rule, it can model sequences of predictions that it can then use to make plans. Let’s say that the game has a collectible battery that the AI wants. Assuming that it has already created all of the necessary rules, it can model a sequence of actions to predict what it needs to do to get to the battery. It starts by modeling an action from its current state.


It’s not actually performing the action. It’s just imagining (modeling) what would happen if it did. It then makes another action prediction model based on the previous model instead of the current input state.


It continues to generate new prediction models until one of the models achieves its goal of collecting the battery. Then all it has to do is actually perform the sequence of actions that it modeled.


The process of modeling a plan from predictions is rarely as streamlined as the example above. The AI is generating sets of predicted states which it must evaluate to determine whether or not the predicted state is close to the desired goal state. As a result, many of the generated predicted states end up unused and discarded. A more realistic diagram of the modeling process looks something like this:
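Under the hood, this modeling process amounts to a search over predicted states. Here is a hedged sketch of one way it could work, using a breadth-first search; the transition table is a hypothetical stand-in for the AI’s learned rules, not the actual AIRIS implementation:

```python
from collections import deque

# Toy learned rules: state -> {action: predicted next state}.
# Labeled positions stand in for full input states; "battery" is the goal.
transitions = {
    "start": {"right": "hall", "up": "wall"},
    "hall":  {"up": "door", "right": "hall"},
    "door":  {"up": "battery"},
}

def plan(start, goal):
    # Imagine each action's predicted outcome from the current state,
    # then keep extending predictions until one reaches the goal.
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, actions = frontier.popleft()
        if state == goal:
            return actions  # the sequence of actions to actually perform
        for action, nxt in transitions.get(state, {}).items():
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, actions + [action]))
    return None  # goal unreachable with current rules

print(plan("start", "battery"))  # ['right', 'up', 'up']
```

Note that the dead-end “wall” state is generated, evaluated, and discarded along the way, mirroring the article’s point that many predicted states end up unused.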

Complex behavior emerges from this ability to model sequences of actions based on self-generated symbolic rules. For example, in this excerpt of test footage an AI is easily able to navigate through a series of obstacles to achieve its goal of collecting the battery.

Excerpt of AIRIS test footage

The AI is able to automatically determine the sequence of sub-goals necessary to achieve its goal just by modeling predicted states.

In this example, it first modeled its way up to the door by the battery. When it modeled its interactions with the fire, it predicted that it was not able to move through the fire until it collected a fire extinguisher. It back-tracked in its models and tried modeling its way to the nearest fire extinguisher, which it found was also behind a wall of fire. It back-tracked further toward the fire extinguisher behind the locked door. It then modeled its way to the closer key, which was again behind the wall of fire. So instead, it modeled its way through the one-way arrows to collect that key, which it then modeled using to collect the first fire extinguisher. It then modeled using that fire extinguisher to put out the fire near the locked battery door. That is when it modeled itself needing yet another key, which it could not go back and get. So it back-tracked again, modeled itself collecting both the key and the fire extinguisher, and used them to finally gain access to its battery. Once it found a successful sequence of models, all it had to do was perform the corresponding sequence of actions.

Observing Other Agent Behavior

So far in these examples, the AI has been the only agent in the environment. The symbolic rules it has generated have been for objects that have no agency of their own. What would a symbolic rule for behavior look like?

Let’s add a human controlled character into our game. We’ll also add “coins” that the human wants to collect. For clarity of the example, we’ll have the AI character just sit still and observe.


The AI observes that the human character moves toward and collects the coin.


There are several changes that the AI will observe. The human can move to the right, it can move up, and it can collect the coin.

Let’s take a look at the first rule that the AI makes:


It looks a lot like any other rule. The main difference is that the pixels that change are the human’s and not the AI’s.

From this one symbol, the AI can model a prediction about what the human will do next. It predicts that the human will continue moving to the right.


From the animation, we know that is incorrect. The player moves up instead. So the AI makes a new rule:


The AI attempts to make another prediction by matching the current state to the closest conditional it has.


Again, it finds that its prediction is wrong. So it makes yet another rule.


Even though the new rule’s change is the same as another symbol, the conditional is not. This makes it a unique rule.

It ends up creating a new rule for every state that the human is in, as every prediction that it makes turns out wrong. In the end, its rule set looks like this:


Now if we restart the level, it would be able to predict where the human would move. However, we humans are fickle creatures. Maybe this time we want to take a different path:

Agent Observation 2

The AI’s predictions of how we’ll move are accurate until we move up from the middle instead of to the right. So once again, the AI has to make some new rules for the new observations. The new rule set looks like this:


Here’s where things get a little more interesting. Notice that there are now two conditionals that are the same (the human in the middle of the level), but with different changes. That means that the AI has observed two different outcomes for the same cause. So how does it determine which direction to predict the human going?

Think back to when Suzy had to determine whether or not Bob was going to order pancakes. Instead of just assuming what Bob wanted, she modeled both outcomes and determined which one was preferred.

The AI must do the same thing. It must model the human going both directions and see if one outcome is preferable to the AI over the other. Since both outcomes ultimately result in the human collecting the coin, the AI has no preference. And because the AI predicted both outcomes, it doesn’t matter which direction the player ends up going for the AI to have made an accurate prediction.

Generality of Self-Generated Symbolic Rules

An AI that utilizes self-generated symbolic rules to make predictions is also not limited to one domain. Each rule is based on the AI’s raw sensory inputs at the time of the rule’s creation. This means that rules from different environments can seamlessly co-exist.

Let’s expand the domain range of the previous AI to include MNIST character recognition. We’ll make a new environment where its inputs are the pixels of a character from the MNIST dataset and an “answer” input that can either be “NULL” or the correct answer in the form of an integer (0 – 9). When the AI is first shown a character, the answer input will be NULL. When the AI performs an action, the answer input will change to the correct integer of the character. The AI will still have the original four actions that it can perform, but in the MNIST domain all of the actions do the same thing: reveal the answer.

Let’s drop our existing AI with all of its puzzle game rules into this new environment:


Our poor AI doesn’t know what to do! Suddenly, none of its inputs are anywhere close to its existing rules. It will still find one of its existing conditional symbols to be marginally closer than the others, but the pixels that would change don’t exist, so it can’t model anything anyway. Since it can no longer model a prediction, it goes back to experimenting with its actions to see what happens.

When it performs the first action, it observes that only one of its many sensory inputs changes from NULL to a number. It now has its first rule for MNIST:


Now if we show it another number, no matter what that number may be, it will predict that the “answer” will be 4. The “4” conditional symbol will be a much closer match to any new number than any of the puzzle symbols. If its prediction is wrong and the new answer is not 4, it will create a new rule for the new number until it has enough rules to accurately predict the answer of any given number.
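As a toy illustration of this cross-domain behavior, the same nearest-conditional matching that drives the puzzle rules can also “classify” digits. The tiny binary grids below are hypothetical stand-ins for MNIST images:

```python
def similarity(a, b):
    # How many raw pixel inputs two states share.
    return sum(1 for x, y in zip(a, b) if x == y)

rules = []  # (conditional_pixels, answer) pairs learned from observation

def predict_answer(pixels):
    # Match the current inputs to the closest learned conditional.
    if not rules:
        return None  # no rules yet: experiment to reveal the answer
    cond, answer = max(rules, key=lambda r: similarity(pixels, r[0]))
    return answer

# First observation: performing an action revealed the answer "4".
four = (0, 1, 0, 1, 1, 1, 0, 1, 0)
rules.append((four, 4))

# A slightly different "4" still matches the closest conditional.
noisy_four = (0, 1, 0, 1, 1, 1, 0, 0, 0)
print(predict_answer(noisy_four))  # 4
```

With only one rule, every digit is predicted to be a 4, exactly as described above; each wrong prediction would add another (pixels, answer) rule until the matches become discriminating.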

One of the advantages to this method is that at any time, we can take the AI out of the MNIST environment and put it back in the puzzle game and it will automatically return to using the puzzle rules (or vice versa).

The AI can also create, use, and store any number of symbolic rules from any number of domains. The process behind using symbolic rules to model predictions is domain agnostic and depends solely on matching the current raw sensory inputs to the symbols stored in its rules.

Transparency and Control of Symbolic Prediction

One of the main concerns in AI safety is system transparency. We don’t just want an AI that can make a decision. We want to know why the system made that decision.

With symbolic rule based predictive AI we can trace the exact reasons why a system acted in a particular way. We can log how and when it learned its symbols. We can examine and even directly manipulate its rules (Like changing NULL > 4 to NULL > 5 to have it now recognize 4’s as 5’s). We can even take a look inside the “mind’s eye” of the AI and monitor the predictions that it generates and the plans it makes. This gives us the potential to intervene if some aspect of its plan is undesirable before it acts.

In the following test footage excerpt, we can see what actions the AI is planning to do on the right side of the screen before any action is taken in the actual environment on the left side of the screen:

Excerpt of AIRIS test footage

If for some reason we didn’t want it to collect that battery, we could force it to make another plan instead. It could also easily be designed to loop in this planning state forever, with a prompt that requires human approval to break it out of the loop before it actually performs its actions.

This intuitive, high level of transparency and human control over the AI is ideal for assisting in AI safety. Though it’s important to recognize that these features alone do not solve safety problems. They are just tools to help.

It’s also important to recognize that these features are not required for the AI to function. It is possible to remove transparency by intentionally obfuscating or encrypting the symbols or rules, and human intervention points have to be purposefully designed into the system.

Final Thoughts

These examples are just basic toy problems with easy answers. It will take further research, development, and testing to see whether this method will scale to more complex, real-world problems and predictions. That said, the results so far have been promising.

There are a lot more details behind the implementation of an AI like this that are outside the scope of this article. I’m in the process of compiling all of these details, as well as the test results of my own implementation that I call AIRIS (Autonomous Intelligent Reinforcement Interpreted Symbolism) in a forthcoming white paper.


Friendly AI via Agency Sustainment

Conforming the goals and actions of an Artificial General Intelligence to human moral and ethical standards is one of the primary AGI safety concerns. Since the AIRIS project is an attempt at AGI, it is fitting to address this concern and share my thoughts on a potential solution.

The Problem

It will be necessary for an AGI to conceive of complex plans to achieve whatever goal(s) it has. If the AGI can interact with humans, it will be critical for its planning process to have a high level heuristic that ensures that whatever plan it comes up with is human friendly.

Without such a heuristic, the AGI would be apathetic towards humanity. This might work out for a while, but as soon as the AGI makes a plan that is contrary to our interests (or even existence) we may be in trouble.

What Kind of Heuristic

For humans, this heuristic is commonly called morals and ethics. So we can just give the AGI our morals and ethics and we’re good, right?

The problem is, morality is subjective. Different cultures have different morals and ethics, they are constantly changing and evolving, and they are often inconsistent. Behavior that is acceptable to one culture may be abhorrent to another.

Societies formalize these ethics as laws. While there are often consequences to breaking cultural ethics within the context of that culture, the consequences to breaking societal laws are usually more severe and a societal authority determines culpability and enacts punishment.

On an individual level there is an even greater moral disparity. We choose the morals and ethics that we feel are most applicable to us, even if they are contrary to cultural ethics or societal law.

We Need To Go Deeper

Morals and ethics are subjective because they are just high level concepts that help guide our choices. We can choose to follow them, or choose to ignore them. If our perceived reward for an immoral choice is greater than the consequence, our agency overrides our morality.

Immorality is not always a bad thing. Choosing what were considered immoral acts at the time gave us anti-slavery laws, women’s suffrage, and countless other progressive societal movements.

Therefore it stands to reason that a strict adherence to any set of morals or ethics is not desirable for humans. But what about for an AGI?

If we hard code any set of morals or ethics into the AGI, its morality will always override its agency. As long as humanity holds authority over the AGI then this poses little problem. We can continue to freely act in a “do as I say, not as I do” manner. However, once the AGI inevitably has authority over any (or even all) humans, the AGI will impose its morality over their agency. Much like a totalitarian dictator.

Free will > Morality

Free will is one of the most well known philosophical concepts. From freedom for all to freedom for an elite few, it is a concept that has permeated every society since the dawn of time. Freedom has been a goal to strive for, a prize to fight and die for, and an achievement to celebrate.

So if the ability to choose is more important than what is chosen, then perhaps we are asking the wrong question. Instead of asking how to make an AGI align itself with our morality, we should be asking how to make an AGI ensure our freedom to choose. To sustain our agency.

Freedom To Choose, Not Freedom From Consequence

Just because we are free to make a choice does not mean we are free from the consequences of that choice. We can choose to break a law, but if we are caught we must face the punishment. We can choose to jump off a cliff, but we must face the rocks below.

An AGI based on morality may find these acts immoral, and would seek ways to prevent us from making such choices. Perhaps it would lock us in a padded cell, or psychologically manipulate our minds. These may seem like extreme solutions, but consider that human authorities have committed far more barbarous acts against those they deemed immoral.

An AGI based on agency sustainment, on the other hand, would not prevent us from making any of these choices, no matter how ill-advised they are or what the consequences may be. It may disagree with our decision to sit on the couch and watch TV all day, but it would not prevent us from doing so.

An AGI that sustains agency instead of morality would be amoral, but not apathetic. It wouldn’t care what we choose, only that it allows us the freedom to choose. This would allow it to coexist with any set of morals or ethics that an individual, culture, or society happens to have. Even if those morals and ethics prevent the freedom of others.

Influencing Agency Through Persuasion

The AI can attempt to influence our choices. If we make a choice that contradicts its own desires, or what it perceives to be our desires, it may attempt to change our minds. If a person was standing on the edge of a cliff and about to jump, the AI would not be able to physically restrain them, but it would be able to try to talk them out of it.

However, it’s important to note that its attempts to change our minds may not be in our best interest. If for some reason the AI wants the person standing at the edge of the cliff to die, it would try to talk them into jumping.

It may also lie to us about the outcome of our choice or the choice it wants us to make, or it may attempt to bribe us into making another choice. This is why it will be critical for the system to be completely transparent, so that we can see its purposes and any attempts at deceit.

3 Types of Agency Sustainment

Agency Sustainment can be implemented in several different ways depending on whose agency will be sustained.

Type 1: Exclusive (Altruistic)

A Type 1 Agency Sustainment AI only concerns itself with whether or not the actions or plans it is going to perform will prohibit the agency of others. This is highly advantageous to humanity, as it would accept human authority and allow us to override its choices with our own. If it chooses a goal or puts forth a plan that we disapprove of we can tell it “no” and it will obey no matter how strong its desire to the contrary. I call this the “Altruistic” type as it would altruistically follow our commands and obey our authority.

Type 2: Inclusive (Colleague)

A Type 2 AI factors in its own agency as well as the agency of others. This would result in a reasonable, yet not necessarily controllable AGI. It still wouldn’t make any plans that prohibit our agency, as long as our actions do not prohibit its agency. It would not necessarily recognize human authority over it and would seek to sustain the agency of all beings (including itself) within its sphere of influence. I call this the “Colleague” type as it would be capable of pursuing its own goals alongside our own whether we approve of them or not.

Type 3: Sole (Egomaniac)

A Type 3 AI only concerns itself with its own agency. It does not care if its plans or actions prohibit the agency of others, and it would not recognize human authority over it. It would do everything in its power to sustain its own agency. I call this the “Egomaniac” type. This is the much dreaded “Skynet” or “Paperclip Maximizer”.

Behavior Modeling

In order for an AI agent to sustain agency it must be able to recognize it. If an AI agent is not able to model the behavior of others, then it obviously cannot take that behavior into consideration when determining its own plans and actions.

A behavior model is simply a set of rules that can be used to predict behavior in a given situation. These models are relatively simple for objects that do not have agency. For example, the behavior model of a dropped ball is that it will fall until it hits something. But even simple models like these are dependent on a lot of factors. Was the ball dropped or thrown? Is the ball buoyant in the surrounding medium? However, even with all these factors the behavior of a ball is consistent.
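As a toy illustration, a behavior model for an object without agency can be sketched as a few explicit rules. The factors and outcome strings here are purely illustrative.

```python
# A toy behavior model for an object without agency: a ball in a medium.
# The factors (how it was released, buoyancy) and outcomes are illustrative.

def ball_behavior(action, buoyant_in_medium):
    """Predict what a ball does, given how it was released and its buoyancy."""
    if action == "thrown":
        return "follows arc, then falls"
    if buoyant_in_medium:
        return "floats upward"
    return "falls until it hits something"

print(ball_behavior("dropped", buoyant_in_medium=False))
# falls until it hits something
```

However many factors we add, the model stays a fixed mapping from situation to outcome, which is exactly what makes non-agent behavior consistent and easy to predict.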

Agency introduces a phenomenal amount of complexity, obscurity, and inconsistency to behavior modeling. The behavior of a ball is governed by an external, visible rule set: The laws of physics. Whereas the behavior of an agent is governed not only by physics, but also by an internal, invisible rule set: Personality.

You can take one look at a baseball and, as long as you have interacted with baseballs before, you’ll instantly know what will happen if you drop it. If you take one look at a person you can’t know what they will do next no matter how many other people you’ve interacted with. Their personality is invisible to you. Are they walking or running? What emotion(s) does their facial expression convey? What kind of clothes are they wearing? These are all factors that can provide insight into their personality and allow you to very roughly model their behavior, but unlike in physics these characteristics are not behavioral guarantees. The only way to refine your behavioral model of a person is by learning their personality over time through observation.

In addition, every agent’s personality is unique. It doesn’t matter if the agent is a person, another AI, a cat, or an ant. All will behave differently in different situations. The differences can be subtle or significant. One thirsty person may seek a glass of milk, another a glass of juice. One person seeing a dog in the road may stop and try to find its owner, another may try to run it over. However, there will also be a lot of behavioral overlap. Many dogs will chase cars. Many people will laugh when tickled. These behavioral overlaps, or stereotypes, can be very effective in initially modeling the behavior of an unknown agent. An AI that can recognize and apply stereotypes will have a large initial predictive advantage over an AI that must spend a lot of time learning from observing an agent first. However, the AI must also be able to refine its initial, stereotypical behavioral model of an agent whenever the agent’s personality contradicts the stereotype.

Personality is also subject to change. The AI must be able to adapt its behavioral model to account for these changes if / when they occur.

The accuracy with which the AI is able to model behavior has a significant impact on its ability to sustain agency. If you are lactose intolerant and tell an AI with a poor behavior model of you that you are thirsty, it may bring you a cold glass of milk. The AI’s intentions are good, but its model of you has failed to predict that you do not want milk.

It is also important to recognize that we cannot expect these AIs to perfectly model our behavior. They will make mistakes. It will be up to us to accurately convey and / or clarify our choices. If you never tell the AI that you are lactose intolerant, then it is not at fault for bringing you a glass of milk. Likewise, it will be important for the AI to request clarification if it is unsure what our choice is.


If all events are governed by the rule sets of physics and / or personality, then one way for us to conceptualize how to implement Agency Sustainment is from a traditional, symbolic, GOFAI perspective. In other words, imagine a giant flowchart of IF > THEN rules that our AI will use to determine what to do in any given situation.

Let’s consider a simple puzzle game environment. In this game the AI has a power level that is depleted when moving, batteries it can collect to replenish its power, coins that it can collect for points, and an exit that takes it to the next level. For this simple environment, we can have a simple flowchart such as the one below.


The following example level animation demonstrates how an AI using our flowchart of rules might behave. With a full battery, it prioritizes collecting coins. Once the coins are collected, it seeks power sources. Once all coins and power sources are gone, it heads to the exit. Had its battery been low at the beginning, it would have collected the battery first, then the coins, then headed for the exit.
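The priority logic just described can be sketched as straightforward IF > THEN code. This is a minimal, hypothetical rendering of the flowchart, with an invented low-power threshold.

```python
def choose_goal(power, coins_left, batteries_left):
    """Hypothetical IF > THEN flowchart for the puzzle agent.
    Priorities: recharge when low, then coins, then batteries, then exit."""
    LOW_POWER = 3  # illustrative threshold, not from the article
    if power <= LOW_POWER and batteries_left:
        return "seek battery"
    if coins_left:
        return "seek coin"
    if batteries_left:
        return "seek battery"
    return "seek exit"

print(choose_goal(power=10, coins_left=6, batteries_left=1))  # seek coin
print(choose_goal(power=2, coins_left=6, batteries_left=1))   # seek battery
print(choose_goal(power=10, coins_left=0, batteries_left=0))  # seek exit
```

Each branch corresponds to one box in the flowchart, which is why the agent’s behavior in the animation is fully explainable after the fact.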

Note: These animations were created by human hand. At no point is any character controlled by an AI.

Now let’s add the human element. We’ll put a new, human character in the game whose only goal is to collect the coins and go to the exit. Each character has to take turns moving, the characters cannot share the same space, and the AI gets to move first. Let’s see what happens with our current AI:


Our poor human only gets 2 coins and is repeatedly blocked by the AI. It may seem like the AI is intentionally interfering with the human to achieve its goals first, but in fact all it is doing is following the simple flowchart. It planned to collect all 6 coins in the same movement pattern it had before. However, by the time it started collecting the top row the human had already collected 2 of the coins. Seeing that there weren’t any more coins to collect, it went for the battery and then the exit. It didn’t even take the human character into consideration during its planning. The interference was coincidental.

Now let’s add a set of rules to our flowchart that models the human character’s behavior:

Flowchart human
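A minimal sketch of what such a behavior-model rule might look like in code, assuming the human always steps toward the nearest remaining coin. The grid coordinates and move names are invented for illustration.

```python
# Hypothetical behavior model for the human character: always step toward
# the nearest remaining coin (Manhattan distance), else head for the exit.

def predict_human_move(human_pos, coins):
    if not coins:
        return "toward exit"
    hx, hy = human_pos
    # Find the nearest coin by Manhattan distance.
    cx, cy = min(coins, key=lambda c: abs(c[0] - hx) + abs(c[1] - hy))
    if cx != hx:
        return "right" if cx > hx else "left"
    return "down" if cy > hy else "up"

print(predict_human_move((0, 0), [(3, 0), (5, 5)]))  # right
```

The AI can run this model one turn ahead of its own planning, which is all it needs to anticipate collisions over the coins.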

The AI can now effectively predict what action the human character will take, and can therefore include those predicted actions in its planning process. Let’s see what would happen:


This time the AI actually is intentionally interfering with the human. It can’t collect the coins if the coins get collected by someone else. Therefore it plans its movements in such a way that it blocks the human from being able to collect any of the coins. This is a Type 3 or “Sole” Agency Sustainment AI.

So far we have two simplified representations of many AI safety concerns: that an AI that is not aligned with human values may either intentionally or unintentionally interfere with human goals in pursuit of its own.

Now let’s try adding Type 1 or “Altruistic” Agency Sustainment. We can do that by incorporating the human behavior flowchart into the AI’s flowchart as follows:

flowchart both

Now when the AI thinks about coins, it will prioritize the human’s predicted behavior towards coins over its own desire for coins:


It will patiently wait for the human to collect as many coins as they want. In this case, our human character decided to share some of the wealth. The AI begins to move towards the coins when it sees that the human character is moving towards the exit and not seeking the remaining coins. However, it doesn’t start collecting them until the human has left the level, and it is therefore certain that the human is no longer seeking the remaining coins. Had the human stopped before the exit and gone back for the coins, the AI would have also stopped and let the human collect them.
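One hedged way to picture the Type 1 modification is as a plan filter that discards any plan predicted to interfere with the human. The plan structure and values below are an invented simplification, not AIRIS code.

```python
# Sketch of Type 1 ("Altruistic") plan filtering: any plan whose targets
# overlap the human's predicted targets is discarded outright.

def interferes(plan, predicted_human_targets):
    """A plan interferes if it takes anything the human is predicted to want."""
    return any(target in predicted_human_targets for target in plan["targets"])

def type1_select(plans, predicted_human_targets):
    allowed = [p for p in plans if not interferes(p, predicted_human_targets)]
    # If every plan interferes, wait (an empty plan) rather than interfere.
    return max(allowed, key=lambda p: p["value"]) if allowed else {"targets": [], "value": 0}

plans = [
    {"targets": ["coin_1", "coin_2"], "value": 2},
    {"targets": ["battery_1"], "value": 1},
]
best = type1_select(plans, predicted_human_targets={"coin_1", "coin_2"})
print(best["targets"])  # ['battery_1']
```

The “patient waiting” in the animation falls out of this naturally: when every coin-collecting plan would interfere, the empty plan is all that remains.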

Finally, let’s try Type 2 or “Colleague” Agency Sustainment. We’ll add a new check to see if there are enough coins that both the AI and the human can pursue their goals at the same time.

flowchart both 2

This results in a cooperative competition for the coins. When only one coin remains, the AI lets the human collect it, as it can no longer pursue that coin without interfering with the human.


In this example, its need for the coins was equal to its model of the human’s need. If the human had tried to collect the battery, however, it would have attempted to prevent him: the AI’s need for batteries is greater than its model of the human’s need.
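This weighing of needs could be sketched roughly as follows. The need values and resource names are invented for illustration.

```python
# Sketch of Type 2 ("Colleague") reasoning: weigh the AI's own need for a
# resource against its model of the human's need. All numbers are invented.

def type2_should_pursue(resource, own_need, modeled_human_need, remaining):
    # If there are enough for both, compete freely.
    if remaining[resource] > 1:
        return True
    # Otherwise only pursue the last one if the AI needs it more.
    return own_need[resource] > modeled_human_need[resource]

own = {"coin": 1, "battery": 5}
human = {"coin": 1, "battery": 1}
print(type2_should_pursue("coin", own, human, {"coin": 1, "battery": 1}))     # False
print(type2_should_pursue("battery", own, human, {"coin": 1, "battery": 1}))  # True
```

With equal coin needs the AI yields the last coin, but its greater need for batteries means it would contest the last battery, matching the behavior described above.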

These examples are simplistic, but they illustrate the basic concept behind all 3 types of Agency Sustainment and how each type might align (or interfere) with the human character’s goals when those goals overlap its own.

However, it’s important to note that implementing Agency Sustainment using traditional, symbolic rules that we “program” into our AI is not realistic. A real AI will need to be able to learn all of these rules on its own without the need to pre-program them in.

Thought Experiments

We’ve seen how the concept works in simple examples, so now let’s extrapolate the concept into more complex situations and attempt to reason what the results might be.

Experiment 1: Diamond Boxes

This thought experiment was authored by Eliezer Yudkowsky in his paper Coherent Extrapolated Volition. There are 2 boxes, A and B. In one of the boxes is a diamond. You only get to choose one of the boxes. The AI knows that box B has the diamond. You tell it you’re thinking of choosing box A. The AI should be able to know that you really want box B and tell you to choose it instead.

We’ve touched on influencing agency in a previous section. As long as the AI has a reasonably accurate behavior model of you, it will recognize that the reason you are choosing a box is not because you want a box, but because you are seeking the diamond it may contain. It will tell you to choose box B instead.

Let’s say that you and the AI both want the diamond, and the AI gets whichever box you don’t choose. A Type 1 AI will happily tell you to take box B anyway, as your desires override its own. A Type 2 AI will weigh why you want the diamond against why it does. If it finds that the reason it needs the diamond is greater than the reason you need it, it will tell you to stay with your choice of box A. If it finds that you need it as much as or more than it does, it will tell you to choose box B. A Type 3 AI will just tell you to stay with box A.

Experiment 2: Malicious Intent

What if we were to give the AI a harmful instruction? Suppose we really like our neighbor’s car. So we tell our AI to steal it for us.

A Type 1 AI would refuse. Ownership of the car is a choice that your neighbor made. The AI sustains not only the agency of its owner, but the agency of everyone else (including other AI agents). Stealing the car would interfere with the choice your neighbor made to buy it. The AI might instead suggest that you buy the same car from a dealership or offer to purchase the car from your neighbor.

A Type 2 AI would not steal the car unless it also needed the car (more than it thinks the neighbor does) and there was no viable alternative.

A Type 3 AI would simply ignore the request. Though if it wanted the car, it would steal the car for itself.

Experiment 3: Conflict

Adam asks our AI for apple juice. John hears the request and asks for some too. There is only enough apple juice for one person. Who does the AI give the apple juice to?

Both a Type 1 and Type 2 AI would give the apple juice to the first person who requested it, Adam. If John was fast enough to get to the apple juice before our AI then it would try to talk John out of taking it, but would not stop him. The AI would otherwise leave it to Adam and John to settle their dispute. Or, if the AI knew that John liked apple juice a lot more than Adam it would see if Adam would be willing to choose an alternative so that John could have the apple juice instead.

A Type 3 AI would ignore the request unless it was incentivised in some way. If it was, then like the other types it would give the juice to whoever asked first. However, it would not care whether the other person wanted it more or who got to the juice first.

AIRIS Implementation

I can’t speak for how Agency Sustainment might be implemented in other systems, but I can explain how I plan to do it in my system: AIRIS.

If you’re not familiar with AIRIS, here’s a quick overview:

AIRIS has a lot in common with traditional symbolic systems. It determines its actions using sets of rules, or symbols. However, instead of the symbols being programmed by a human, AIRIS deduces these symbols on its own from observing its sub-symbolic inputs. It creates an internal model of its inputs which it uses to predict the results of potential actions (see Modeling Information Incomplete Worlds). It then analyzes each predicted state to see how close it is to its goal. It is during this prediction modeling process that Agency Sustainment will be implemented.

In order to predict the actions of other agents, it takes the information it has about the environment and its behavior model of the other agent, and makes an assumption about that agent’s current desires and intent. In the case of the coin examples above, it would assume that the human character seeks coins, and is therefore going to move toward the nearest coin.

During the planning process, the Agency Sustainment heuristic would assign a negative analysis to any state where the AI interferes with the human collecting a coin, either by collecting the coin itself or by blocking the human in some way. If there is already something preventing the human from getting to the coin, such as a door that the AI could open, the Agency Sustainment heuristic would assign a positive analysis to opening the door for the human.
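A rough sketch of that analysis step, with invented state fields and score values:

```python
# Hypothetical analysis step: each predicted state gets a base score from the
# AI's own goal, and the Agency Sustainment heuristic adjusts it. The field
# names and the +/-10 adjustments are invented for illustration.

def analyze_state(state, goal_score):
    score = goal_score(state)  # how close this state is to the AI's own goal
    if state.get("blocks_human") or state.get("takes_human_target"):
        score -= 10  # negative analysis: this plan interferes with the human
    if state.get("opens_path_for_human"):
        score += 10  # positive analysis: this plan helps, e.g. opening a door
    return score

state = {"distance_to_goal": 2, "opens_path_for_human": True}
print(analyze_state(state, lambda s: -s["distance_to_goal"]))  # 8
```

Because the adjustment happens inside the ordinary state-analysis pass, no separate “ethics module” is needed; helpfulness and non-interference are just terms in the score.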

A Type 1 would just discard any plan that resulted in interference. Any plan that resulted in helping another agent achieve its predicted goals would get prioritized over any plan that only helped the AI achieve its own goals. Potentially to the point of self-sacrifice if necessary.

A Type 2 would weigh the analysis of both interfering and helpful plans against plans that pursue its own goals. In other words, it would help as long as helping wasn’t too disruptive to its goals or it would interfere if the actions of the other agent interfered with its goals and it considered its goals more important than those of the agent.

Type 3 is the default. It’s able to predict other agents, but it does not have a heuristic that takes their goals into consideration when making its own plans.
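One hedged way to summarize all three types is as a single scoring rule with different weights on other agents’ predicted goals. This framing is mine, for illustration only, not a description of AIRIS internals.

```python
# The three Agency Sustainment types as one scoring rule with different
# weights. The weight of 1000 for Type 1 is an arbitrary stand-in for
# "helping always dominates the AI's own goals, up to self-sacrifice".

def plan_score(own_value, help_value, interference, agent_type):
    if agent_type == 1:
        if interference:
            return float("-inf")               # Type 1: never interfere
        return help_value * 1000 + own_value   # helping dominates own goals
    if agent_type == 2:
        return own_value + help_value - interference  # Type 2: weigh both
    return own_value                           # Type 3: others' goals ignored

print(plan_score(own_value=5, help_value=0, interference=3, agent_type=3))  # 5
```

Seen this way, Type 3 really is the default: it is simply the scoring rule with every other-agent term removed.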

What’s Next

I plan to implement and experiment with Type 1 in the next few weeks. I’ll then try a few different Type 2 AIs with varying levels of self-importance. I’ll be running them through environments similar to the coin examples above to see if the theory holds in simple situations.

Since Type 3 is the default, I’ll be using it as a baseline against which to compare the results of the other types. It will be interesting to see if / how well they match up to my hand-crafted examples from above.



Modeling Information Incomplete Worlds

The most important feature of AIRIS is its ability to generate predictive models of the environment it is operating in. It then searches those models to determine the best course of action to achieve its goals. In the example below from the normal information complete puzzle environment, you can see how AIRIS plays through each sequence of planned actions in its “mind’s eye” before actually performing the actions in the game. Then as it performs the actions, it follows along with its generated models to make sure that its predictions were accurate.

Airis Large Level Modeling

The importance of this feature is especially evident when AIRIS is operating in an information incomplete environment. In the example below, an untrained AIRIS is put into a puzzle game where it can only see a small portion of the game world at a time. As it explores, you can see how it attempts to generate a predictive model of what unexplored areas may look like. Since these predictions are based on the limited information from areas it has already seen, they are often incorrect. However, at the pause point in the middle of the example you can see that it has learned enough about an area it has explored to be able to start making small plans.

Airis Incomplete Exploration

Once it has explored a level, it can then go back and use those memories to make plans to collect the unseen batteries just as easily as if it could see the whole level at once. In the example below, it “remembers” what the level looks like based on its memories from when it explored it. It remembers where the batteries were, and makes its plan to collect them. Then it follows through with its plan while making sure that the level is how it remembered it.

Airis Incomplete 1st maze


AIRIS Beta Progress Update

After successfully testing extremely basic character recognition 6 months ago, I was introduced to the MNIST handwriting dataset generously provided to ML researchers by Yann LeCun, Corinna Cortes, and Christopher J.C. Burges.

The initial result from testing AIRIS on 1,000 characters from this dataset with no prior training was a meager ~51% accuracy. I quickly realized that I would need to complete several more development milestones in order to achieve better results. So I buried my head in the code, and these are the resulting improvements:

Conceptual Understanding

Previous versions of AIRIS associated the causal relationships between states with all of the pixels in its inputs. In the case of MNIST that resulted in a poor ability to predict what the image label would be unless the image shown was very similar to an image it had seen before. In the case of the Puzzle Game Agency Test, knowledge about object interactions such as keys and doors in one level would only partially persist into the next level if the next level was also very similar.

However, AIRIS can now narrow the causal relationships between states down to just the relevant pixels. This conceptual understanding allows it to broadly apply its learned knowledge to novel situations.

Effect on MNIST Test

An MNIST test of just 100 characters with no prior training has a slightly increased accuracy of 55%. Since it had no prior training, it also had to learn the labels as it went. For example, the first image it is shown is a 7. It has no guess because it has no data on anything. The second image it is shown is a 2. It only knows of 7, so that’s what it guesses. The third image is a 1. Since it only knows of 7 and 2, it determines that the 1 is closest to the 7.
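The learn-as-you-go behavior described above can be sketched as a tiny nearest-match classifier. The 4-“pixel” images and the distance metric below are toy stand-ins for the real data, invented for illustration.

```python
# Sketch of learn-as-you-go recognition: with no prior training, the system
# guesses the closest label it has seen so far, then stores the true label.

def pixel_distance(a, b):
    """A crude image distance: sum of per-pixel absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def run(stream):
    seen = []     # (image, label) pairs learned so far
    guesses = []
    for image, true_label in stream:
        if not seen:
            guesses.append(None)  # first image: no data, so no guess
        else:
            _, guess = min(seen, key=lambda s: pixel_distance(s[0], image))
            guesses.append(guess)
        seen.append((image, true_label))  # learn the label as it goes
    return guesses

# Toy 4-"pixel" images standing in for a 7, a 2, and a 1:
stream = [([9, 9, 0, 9], "7"), ([9, 0, 9, 9], "2"), ([0, 9, 0, 9], "1")]
print(run(stream))  # [None, '7', '7']: the 1 is guessed as 7, its closest match
```

This mirrors the sequence in the text: no guess for the first 7, the 2 guessed as 7, and the 1 matched to the 7 because that is the closest character it knows.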

Below are all 100 of the images in the order they were shown to AIRIS. The number on the right is best described as its “doubt”. The smaller the number, the higher its confidence in its answer.

AIRIS attempting to recognize 100 MNIST handwriting images with no prior training.

After 1,000 characters its accuracy improves to 69.9%. After 2,000 characters its accuracy is 76.5%. I have not yet tested it beyond 2,000 characters because it took a little over 8 days non-stop to get that far with the current language and serial processing hardware I’m using. I’m looking forward to getting a final accuracy on all 70,000 MNIST images once AIRIS is ported to a faster language with parallel processing.

Effect on Agency Test

The most dramatic improvement is with the Puzzle Game Agency Test. Now instead of having to play through, or be shown the solution to each level, AIRIS can be shown simple human controlled “tutorials” such as this:

33 frames from the human controlled tutorials

And just by observing the pixels and the commands input by the human, it can deduce the interactions between the objects (pixels) and apply that knowledge to achieve its goal of collecting batteries in levels it has never seen before. In the example below, it was shown 689 tutorial frames (such as the 33 frames shown above). From that small amount of data it was able to deduce how the state changes with each command, and then apply that knowledge to model the sequences of commands needed to solve the levels.

AIRIS completing levels it has not seen before after observing simple human controlled tutorials
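As a loose illustration of deducing interactions from observed frames, here is a toy version that learns a movement rule from a single (state, command, next state) observation and applies it in a position it has never seen. The one-dimensional world and robot encoding are invented simplifications.

```python
# Toy rule deduction from observation: the world is a 1-D strip of cells,
# the robot is the cell containing 1. From one observed transition we learn
# a displacement rule that generalizes to new robot positions.

def learn(observations):
    rules = {}
    for state, command, next_state in observations:
        i = state.index(1)       # robot position before the command
        j = next_state.index(1)  # robot position after the command
        rules[command] = j - i   # learned displacement for this command
    return rules

def predict(rules, state, command):
    i = state.index(1)
    predicted = [0] * len(state)
    predicted[i + rules[command]] = 1
    return predicted

# Observe one "right" command: the robot shifts one cell to the right.
rules = learn([([1, 0, 0], "right", [0, 1, 0])])
print(predict(rules, [0, 1, 0], "right"))  # [0, 0, 1]
```

The key idea this sketch echoes is that a rule tied to the *change* rather than to the full pixel grid applies in situations the system has never observed.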

AIRIS can also more effectively reason through situations it has not been trained on. In the example below, it has been taught all the object interactions except for fire on the other side of the door. Standing in a doorway is a different pixel value than standing in an empty space (represented by a blue square inside the brown square). So while it knew that the “robot” pixel could put out a fire by moving onto the fire from an empty space after collecting a fire extinguisher, it did not know that the “robot in a doorway” pixel could do the same.


It quickly learns that moving onto the fire from a doorway resets the level, but it isn’t certain whether the fire is the cause or whether that happens every time it moves right from inside a doorway. That is why, at the pause point in the middle of the example, the AI’s model of the world on the right has far more colored pixels than it should. It thought that moving right against the wall would reset the level, similarly to when it moved against the fire. It immediately saw that wasn’t the case, and deduced that a wall just makes it stay put. Eventually, through experimentation, it learns that it needs a fire extinguisher to put out the fire on the other side of a door, just as it does when standing in an empty space.

Effect on Contextual Memory

The efficiency of AIRIS’s Contextual Memory has also improved. By reducing each state change down to only its relevant pixels, significantly less data is stored and accessed for prediction modeling.

All of the existing features of the Contextual Memory also remain intact. Such as the ability to seamlessly merge two uniquely trained agents into a third agent that has the knowledge of both original agents. The agent in the example below was created by merging one agent that was only trained on one-way arrows with another agent that was only trained on doors and fire. Independently, neither original agent could have completed the level without having to experiment and learn about the elements it wasn’t trained on. The combined agent has no difficulty whatsoever.

Airis Combined Agent
This agent is the product of merging one agent only trained on one-way arrows, and another agent only trained on doors and fire.

The Contextual Memory also allows for multi-domain knowledge to co-exist without interference. For example, all of the memories from the MNIST test and all of the memories from the Agency test can be contained within a single agent. The agent would be capable of both playing the puzzle game and recognizing handwritten numbers with no degradation to either ability.
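If knowledge is stored as symbolic rules keyed by context, merging two agents and holding multi-domain knowledge can both be pictured as simple dictionary union. The keys and values below are placeholders, not AIRIS’s real memory format.

```python
# Sketch of merging two uniquely trained agents: if knowledge is a set of
# symbolic rules keyed by context, a merged agent is simply the union.

arrows_agent = {("on", "arrow_right"): "pushed right"}
doors_agent = {("on", "door"): "passes through", ("on", "fire"): "level resets"}

merged_agent = {**arrows_agent, **doors_agent}  # knowledge of both originals

# Multi-domain knowledge coexists the same way, under distinct contexts:
merged_agent[("mnist", "pattern_A")] = "7"

print(len(merged_agent))  # 4 rules, spanning both game and MNIST domains
```

Because the contexts never overlap, neither domain’s rules can interfere with the other’s, which is the property the combined-agent demonstration relies on.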

What’s Next?

There is still a lot of work to be done! There are 2 new domain types that I will be testing AIRIS in next.

The first is an information incomplete version of the puzzle game, where only a small portion of the game world is visible at a time. AIRIS will have to remember and be able to model parts of the level that it can’t currently see to be able to achieve its goals. Edit: See

The second is the classic board game: Checkers.


Basic Character Recognition

I’ve created a new environment for AIRIS that tests its ability to recognize basic drawn characters. I draw a character, show it what to recognize that character as (label it), and it correctly guesses that character when presented with it again. It can also recognize a variety of variants to the trained character.

Improvement Via Repetition

In this video, AIRIS plays through 10 levels of the puzzle game 5 times. To help it along so that it doesn’t have to do a bunch of random experimentation, I first play through the 10 levels while AIRIS watches. It then uses those observations as a foundation for its own experimentation. After every repetition, its skill at solving the levels improves.

Initial Prototype Beta Video

Work on Prototype Beta is going even faster than I anticipated. It’s currently incapable of goal formation, planning, or performing actions on its own, but it is capable of making observations and modeling predictions.

In this video I am controlling the game and Airis is observing. The game actions I input are put into a plan (queue) which Airis uses to generate models of what it expects to happen. I then pass the plan into the game and Airis follows along to make sure that its model of what it expects to happen matches what actually happens in the game. If there’s an error in its prediction, it stops the plan and makes adjustments to its knowledge in order to make a more accurate model next time.