AIRIS is a powerful new causality-based approach to reinforcement learning.

Through self-driven observation and experimentation, AIRIS is able to generate a dynamic internal world model that it uses to formulate plans to achieve goals.

AIRIS has demonstrated the ability to quickly, effectively, and completely autonomously learn about an environment it is operating in.
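The loop described above can be illustrated with a minimal sketch: the agent records cause-and-effect observations in a world model, then searches that model for a plan toward a goal. Everything below (class names, the rule table, the toy 1-D world) is a hypothetical illustration of the general idea, not the actual AIRIS implementation.

```python
from collections import deque

class CausalWorldModel:
    """Maps (state, action) -> predicted next state, learned from experience."""
    def __init__(self):
        self.rules = {}  # (state, action) -> observed next state

    def predict(self, state, action):
        return self.rules.get((state, action))  # None means "outcome unknown"

    def update(self, state, action, next_state):
        self.rules[(state, action)] = next_state

def plan(model, state, goal, actions, max_depth=10):
    """Breadth-first search over the learned model for a path to the goal."""
    frontier = deque([(state, [])])
    seen = {state}
    while frontier:
        s, path = frontier.popleft()
        if s == goal:
            return path
        if len(path) >= max_depth:
            continue
        for a in actions:
            nxt = model.predict(s, a)
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [a]))
    return None  # no known path; a real agent would explore to learn more

# Toy 1-D world: states 0..4, actions move left/right.
# The model is built from "observed" transitions rather than given rules.
model = CausalWorldModel()
for s in range(5):
    if s < 4:
        model.update(s, "right", s + 1)
    if s > 0:
        model.update(s, "left", s - 1)

print(plan(model, 0, 3, ["left", "right"]))  # ['right', 'right', 'right']
```

When the planner returns `None`, an agent in this style would fall back to experimentation, trying actions with unknown outcomes and feeding the results back into the model.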

History of AIRIS:

Current Prototype

The Current Prototype is written in Python. It has all the features of Prototype Beta, but is far more efficient and effective. It is currently being tested using environments from Gymnasium and OpenAI’s ProcGen Benchmark.

Prototype Beta (Complete)

Similar to the Alpha prototype, this version of AIRIS was developed in GameMaker: Studio. However, this version used unsupervised raw pixel data for its inputs. This allowed for a much broader variety of testing environments, such as more complex puzzles, varying world mechanics, and image recognition.

Its final features included:

  • Unsupervised, Continuous Learning – It was placed into an environment and learned how to solve tasks with no prior training.
  • Indirect Learning – It could be trained by observing a human performing the desired task(s).
  • Small Data Learning – It required very little training data to become proficient at tasks.
  • Information Incomplete Modeling – It could operate in environments where only some information about the environment is available at any one time.
  • Contextual Memory – Its knowledge was domain-agnostic, allowing knowledge of multiple domains to seamlessly coexist in the same agent.
  • Knowledge Merging – Both intra-domain and multi-domain knowledge could be shared between agents.

Prototype Alpha (Complete)

The original Prototype Alpha of AIRIS was created using a video game development tool called GameMaker: Studio. This allowed for the rapid development of test environments and ensured that the concept was general enough not to require specialized software or hardware.

This prototype was fed semi-supervised information (sprite types and locations relative to it) from the game engine for its inputs. This allowed for rapid debugging and concept testing, but had the side effect of limiting its overall capabilities. At its peak performance, it was able to complete a 5-step puzzle (collect extinguisher, put out fire, collect key, unlock door, and collect battery) with no initial information about the game, purely by autonomously exploring, observing, and experimenting.