Monday, 27 June 2016

Lifelike Artificial Intelligence in Virtual Reality

Last weekend I started a new Virtual Reality and Artificial Intelligence project. This time I am using cats to model lifelike behavior in a virtual environment: a typical home with typical furniture and other objects. Behavior and AI-controlled animations are still very limited, and there is a lot of work to be done in those areas.

My goal is the same as in my previous projects - to make a lifelike artificial intelligence module with personalized and unpredictable, yet logical, behavior which evolves as the subject learns new things and experiences both pleasant and unpleasant events. This is not based on state machines or simple behavior trees. Instead, it is based on my own research in the field of artificial intelligence, and it resembles real cognitive information processing. It has both low-level reactive and higher-level analytical modules. Thus, it can react quickly when necessary, but otherwise it may analyze the situation and formulate complex plans.
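To give a rough idea of the reactive/analytical split, here is a purely illustrative Python sketch - not the actual module, and all names and thresholds are hypothetical - of how a fast reflex layer can pre-empt slower deliberation:

# Minimal sketch of a two-layer decision loop: a fast reactive layer can
# pre-empt a slower analytical planner. All names here are hypothetical.
from dataclasses import dataclass

@dataclass
class Percept:
    threat_level: float   # 0.0 (safe) .. 1.0 (immediate danger)
    description: str

def reactive_layer(percept: Percept):
    """Cheap, rule-like reflexes that must answer immediately."""
    if percept.threat_level > 0.8:
        return "flee"
    if percept.threat_level > 0.5:
        return "freeze_and_observe"
    return None  # nothing urgent, defer to the analytical layer

def analytical_layer(percept: Percept, goals):
    """Slower deliberation: weigh current goals against the situation."""
    # In a full system this would search memory and build a multi-step plan;
    # here we just pick the top-priority goal.
    return max(goals, key=lambda g: g["priority"])["action"]

def decide(percept: Percept, goals):
    return reactive_layer(percept) or analytical_layer(percept, goals)

goals = [{"action": "nap_on_sofa", "priority": 0.3},
         {"action": "find_food", "priority": 0.7}]
print(decide(Percept(0.9, "vacuum cleaner"), goals))  # -> flee
print(decide(Percept(0.1, "quiet room"), goals))      # -> find_food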

While the AI algorithms resolve the actions and emotions of the cat, these resolutions are expressed to the viewer by animating the movement, posture and expressions of the cat. Changes in emotional state are relayed to the animation subsystem, which then adjusts the head, tail, fur, legs, eyes and mouth in a realistic manner.
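As a purely hypothetical illustration of the idea (the real animation subsystem is not described in this post, so the parameter names and numbers below are invented), the mapping from emotion to animation targets could look something like this:

# Hypothetical sketch: mapping emotion intensities to animation parameters.
def animation_targets(fear: float, curiosity: float) -> dict:
    """Translate emotion intensities (0..1) into pose/expression targets."""
    return {
        "ear_angle":      -40 * fear + 15 * curiosity,   # ears back when afraid
        "tail_height":     0.2 + 0.6 * curiosity - 0.5 * fear,
        "fur_raised":      fear > 0.7,                    # piloerection when scared
        "pupil_dilation":  min(1.0, 0.3 + 0.7 * max(fear, curiosity)),
    }

print(animation_targets(fear=0.9, curiosity=0.1))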

This is just the beginning, and many behavior patterns are still missing, as are many animation patterns. But my plan is to have a believable virtual cat in a virtual reality environment quite soon.

Friday, 27 May 2016

Old Garden

The old garden was still and quiet, except for the autumn breeze rustling the leaves. Only the owls watched over me. Owls are not what they seem.

Concept art from an unpublished game.

Thursday, 26 May 2016

Train to Nowhere

Train to Prostokvashino



Prostokvashino station




Concept art for an unpublished game

Saturday, 9 April 2016

AI Design for a Game and Simulation Environment


I have a long history in AI development, but my goal has always been to design and develop life-like AI. I see classic state machines as semi-AI - not really AI, but a kind of control logic. As the processing power of computers and mobile devices has increased, it is now possible to program an AI which gives an impression of natural behaviour, even on mobile platforms.




This AI is designed to mimic the real, unpredictable behaviour of intelligent, semi-intelligent and non-intelligent beings. It is based on ideas from fuzzy and non-linear systems, which have certain rules and causalities but also some degree of randomness. It is not totally deterministic, but not totally random, either. It should be emphasised that this is not a state machine.



As designed, this AI develops independent behaviour patterns in a node-based associative and constructive memory. Nodes represent concepts, locations, actions and objects using a simple semantic language. Based on experience and training, an AI entity will generate these nodes and link them to other nodes. During this process it will evaluate the nodes according to the outcomes of real events. Thus, it can identify, for example, actions or objects that are harmful and avoid them in the future, or it can assess them as beneficial.
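A toy sketch of what such an associative, constructive memory could look like in Python - hypothetical names only, not the actual data structures:

# Toy sketch of a node-based associative memory. Nodes stand for concepts,
# locations, actions and objects; links associate them, and each node carries
# a valence learned from the outcomes of real events.
class Node:
    def __init__(self, label, kind):
        self.label = label      # e.g. "vacuum_cleaner", "kitchen", "eat"
        self.kind = kind        # "concept" | "location" | "action" | "object"
        self.valence = 0.0      # <0 harmful, >0 beneficial, learned over time
        self.links = {}         # associated Node -> link strength

    def associate(self, other, strength=1.0):
        self.links[other] = self.links.get(other, 0.0) + strength

    def reinforce(self, outcome, rate=0.2):
        """Shift the node's valence toward the outcome of a real event."""
        self.valence += rate * (outcome - self.valence)

vacuum = Node("vacuum_cleaner", "object")
kitchen = Node("kitchen", "location")
vacuum.associate(kitchen)
vacuum.reinforce(-1.0)   # loud and scary -> assessed as harmful
print(vacuum.valence)    # negative: avoid in the future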



It is possible to evolve very complex behaviour patterns with deeply linked memory nodes, where low-level concepts are combined and constructed into larger concepts. Some of these nodes represent basic instinctive behaviour, but the entity will gain more experience and training later on. In any situation, an AI entity evaluates possible actions according to which objects, entities and so on are present, what kind of environment it is in, and what its present goals, mood, physical and mental condition and physical needs are, using the concepts stored in its memory and whether they lead to a beneficial or harmful outcome.
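To make the evaluation step concrete, here is a small illustrative sketch of scoring candidate actions against the current context and then choosing one with a dash of randomness, in line with the fuzzy, not-fully-deterministic idea above. All action names and weights are invented:

import random

# Hypothetical sketch: score candidate actions against the current context
# (present objects, needs, fatigue), then choose with a little randomness so
# behaviour stays plausible but not fully predictable.
def score(action, context):
    s = action["base_value"]
    s += context["hunger"] * action.get("satisfies_hunger", 0.0)
    s += context["fatigue"] * action.get("satisfies_rest", 0.0)
    if action.get("requires") and action["requires"] not in context["objects"]:
        s -= 10.0   # impossible or pointless right now
    return s

def choose(actions, context, temperature=0.5):
    weights = [max(0.01, score(a, context) + random.uniform(0, temperature))
               for a in actions]
    return random.choices(actions, weights=weights, k=1)[0]

actions = [
    {"name": "eat",  "base_value": 0.2, "satisfies_hunger": 1.0, "requires": "food_bowl"},
    {"name": "nap",  "base_value": 0.3, "satisfies_rest": 1.0},
    {"name": "play", "base_value": 0.4, "requires": "toy"},
]
context = {"hunger": 0.8, "fatigue": 0.2, "objects": {"food_bowl"}}
print(choose(actions, context)["name"])   # usually "eat", but not always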



An AI entity can learn and change its behaviour by changing previously evaluated and assessed things and their parameter values. This makes it possible to adapt to a changing environment and situation, or to learn by training, where another entity gives positive and negative feedback. Basically, an AI entity can learn by trial and error.
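A minimal sketch of that trial-and-error update, assuming a simple learning-rate rule (the real system's parameters and update rules are not described here):

# Toy sketch of learning by trial and error: a trainer gives positive or
# negative feedback, and the stored evaluation of each action is nudged
# toward that feedback. Names and numbers are illustrative.
def train(values, episodes, rate=0.3):
    for action, feedback in episodes:          # feedback in [-1, +1]
        old = values.get(action, 0.0)
        values[action] = old + rate * (feedback - old)
    return values

episodes = [("scratch_sofa", -1.0), ("use_scratch_post", +1.0),
            ("scratch_sofa", -1.0), ("use_scratch_post", +1.0)]
print(train({}, episodes))
# scratch_sofa drifts negative, use_scratch_post positive:
# the entity starts preferring the rewarded behaviour.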



Part of this AI is a communication system. An entity can express itself using a rudimentary language. Depending on the entity, this communication may use vocal expressions, visual signals, smell or other senses. Any AI entity is capable of understanding its environment - not only objects, but also these signals - and responding accordingly.
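Purely as an illustration, such a signal vocabulary could be modelled as a simple lookup from channel and expression to a concept the receiver already knows; all signal names below are invented:

# Hypothetical sketch of the rudimentary signalling idea: an entity emits a
# typed signal, and any receiver maps it to a concept it understands.
SIGNALS = {
    ("vocal", "hiss"):        "warning",
    ("vocal", "purr"):        "contentment",
    ("visual", "tail_flick"): "irritation",
    ("scent", "marking"):     "territory_claim",
}

def interpret(channel, expression):
    return SIGNALS.get((channel, expression), "unknown_signal")

print(interpret("vocal", "hiss"))   # -> warning, receiver can respond accordingly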


This AI framework has a modular design, and it is possible to customize it for various scenarios where natural, unpredictable behaviour is needed to simulate lifelike entities. In this case it has been adapted to represent dog behaviour, which is interesting since it has to be balanced between strong instinctive behaviour and conditioned training.



Wednesday, 30 December 2015

There are some very interesting "revelations" about the nature of RPGs in the Official Book of Ultima. This one is really important:
"With Ultima IV, Garriott resolved to forge a world in which he could demonstrate the logical nature of ethical values... and why it makes sense for people to live by such values as natural habits in the game world, not just because the quest says so, but because the world will react more favourably to you. The antithesis of that as a habit is that the world will react negatively against you.
...
That is the whole story of Ultima IV. The quest you have to fulfill is largely irrelevant. The fact that you have to take item A to location B to solve the game is not what it's about. The game is about the social issue of being a good person in the world, as opposed to an evil person."

This is exactly the principle that I have personally kept as a guideline while designing RPG worlds. The world should be like a real world that reacts to anything you do in a "logical" manner, meaning there should always be a clear reason for everything that happens. However, these reasons do not have to be obvious to the player; part of the puzzle is finding out why things happen the way they do. IMO, Ultima parts IV-VI succeeded in this.

Tuesday, 24 November 2015

Importance of Computational Complexity

At frequent intervals I organize virtual workshops for selected individuals who have some background in databases and are interested in learning SQL and NoSQL. These are purely learning experiences for every participant, where you can get some exposure to elementary procedures in NoSQL. But I also have one hidden agenda: I try to teach them some basic ideas in computational complexity, which are quite often omitted from many computing courses.

Algorithms and computational complexity are the key to understanding why your program or DB query performs so poorly - meaning, why it is so slow. Unfortunately, many computer science students hate the algorithm analysis courses where the principles of computational complexity are taught. They seem to miss the point that algorithm analysis is a stairway to better programming and a necessary skill for DB query optimization, Data Science and Big Data analysis.
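The top-10 queries discussed below are a good example of this. As a quick preview in Python, with made-up data: picking the 10 largest values with a bounded heap does far less work than sorting the whole data set first, even though both give the same answer.

import heapq, random, timeit

scores = [random.randint(0, 10_000) for _ in range(1_000_000)]

full_sort = lambda: sorted(scores, reverse=True)[:10]   # O(N log N)
top_k     = lambda: heapq.nlargest(10, scores)          # roughly O(N log k)

print(timeit.timeit(full_sort, number=5))
print(timeit.timeit(top_k, number=5))   # typically several times faster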

The latest workshop used MySQL and ArangoDB as example engines. Participants had to write some queries on a premade SQL database, then export all the data to ArangoDB and build similar queries there. This gave everybody an opportunity to compare SQL and NoSQL query styles, their performance, and the principles of computational complexity in query optimization.

During the workshop there was some really interesting progress on the queries, which shows nicely what computational complexity actually means in practice.


Familiar country - SQL

After familiarizing themselves with the database, the participants had several tasks where they had to write queries to collect certain types of data sets. One simple task was to find the top-scoring players, with the player name included. Player data was in its own table, and scores were recorded in a separate table together with the game type and other game information.

One early version of the necessary query, written by a participant, looked like this:


Query A.

SELECT u.playername, h.gamescore
FROM (SELECT playerid, MAX(gamescore) AS highscore
      FROM scoretable
      WHERE gametype = 'CTF'
      GROUP BY playerid) x
JOIN scoretable h ON x.playerid = h.playerid AND x.highscore = h.gamescore
JOIN player u ON h.playerid = u.playerid
ORDER BY h.gamescore DESC LIMIT 10


Participants were quite satisfied with it, since its running time seemed quite fast - 76 ms, to be exact. But I had to point out that it is possible to make it faster. You can easily see that Query A is doing unnecessary work: it joins the maximum score of each and every player with the player data, and only after that sorts the results in descending order, picking out the 10 best.

This means that the query first groups game scores and picks the maximum value for each scoring player (N), then joins this with the player data (M) and sorts the joined table. But we are only interested in the 10 best-scoring players, so it is better to sort those scores immediately, join only those 10 best players with the player data table, and get rid of the unnecessary join in the original query, like this:


Query B.

SELECT u.playername, x.highscore
FROM (SELECT playerid, MAX(gamescore) AS highscore
      FROM scoretable
      WHERE gametype = 'CTF'
      GROUP BY playerid
      ORDER BY highscore DESC
      LIMIT 10) x
JOIN player u ON x.playerid = u.playerid

The running time is actually halved, to 35 ms. Joins cost performance, and now we are joining only the 10 best players, not all scoring players.


Hostile territory - NoSQL


When the SQL tasks were completed, the whole database was exported into ArangoDB. This was a very simple task, since you can export a MySQL database in JSON format, which is really easy to import into ArangoDB.
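As an example of one possible export path (not necessarily the exact one used in the workshop), a short script with the mysql-connector-python package can dump a table as JSON lines; the table name and credentials below are placeholders. The resulting file can then be loaded with ArangoDB's bulk import tool.

# Sketch: dump a MySQL table as JSON lines for import into ArangoDB.
# Assumes the mysql-connector-python package; names and credentials are
# placeholders only.
import json
import mysql.connector

conn = mysql.connector.connect(host="localhost", user="workshop",
                               password="secret", database="gamedb")
cur = conn.cursor(dictionary=True)        # rows come back as dicts
cur.execute("SELECT * FROM scoretable")

with open("scoretable.jsonl", "w") as out:
    for row in cur:
        out.write(json.dumps(row, default=str) + "\n")   # default=str for dates

conn.close()
# The resulting file can be loaded into a 'scoretable' collection with
# ArangoDB's bulk import tool (arangoimp in the 2.x era).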

After the import, the participants had to recreate the same queries as before, but now with ArangoDB's query language, AQL. Participants had only a few problems with AQL. It has some similarity to SQL, which makes it easier to "jump into" for anyone familiar with SQL - compared to JavaScript-style queries. But there were also enough differences to lead to many mistakes.

The biggest difference seemed to be the processing order, but once the participants understood that AQL is like a mix of procedural SQL and basic SQL, they started to grasp the skill of building AQL queries. Their first try at the '10 best players' query was far from efficient, however.


Query C.

FOR d IN players
LET hscores = (
FOR s IN scoretable
FILTER s.playerid == d.idplayer
FILTER s.gametype == 'CTF'
LET numscore = TO_NUMBER(s.score)
COLLECT pid = s.playerid INTO scores
RETURN {'hscore': MAX(scores[*].numscore)}
)
SORT hscores[0].hscore DESC
LIMIT 0,10
RETURN {'player': d.playername, 'score': hscores[0].hscore}

This query works and gives the correct result, but it has a disastrous running time of 26 seconds (26,000 ms), which is unacceptable. After I guided the participants to analyse its computational complexity, they soon understood why: the query goes through every player in the database, then each time goes through the score data to calculate that player's maximum score (even if the player has no scores at all), and only then sorts the results and picks the 10 highest-scoring players.

The real problem is the sort operation, however. The sort is done at such a phase that the query optimizer cannot optimize it. Actually, SORT is a bit of a bottleneck in ArangoDB. But in this case there is also a lot of unnecessary calculation. The problem was that the participants tried to join the player data and player scores and collect the result simultaneously, like they did in SQL.

In AQL you can operate procedurally, step by step. I guided the participants to first collect just the ids of the 10 best-scoring players, and then join those with the player names, leading to this query:


Query D.

LET leader = (FOR s IN scoretable
FILTER s.gametype == 'CTF'
LET numscore = TO_NUMBER(s.score)
COLLECT pid = TO_NUMBER(s.playerid) INTO scoresbypid
LET hscore = MAX(scoresbypid[*].numscore)
SORT hscore DESC
LIMIT 0,10
RETURN {'pid': pid, 'score':hscore}
)
FOR l IN leader
FOR d IN players
FILTER d.idplayer == l.pid
RETURN {'player': d.playername, 'score': l.score}

The participants were astonished when the running time dropped to 132 ms. This performance is good, but not the best. The only difference was that we first operated only on the score data, grouped it by player id, calculated the maximum and then sorted it, picking the 10 best. After that, this data with only 10 objects was joined with the player names.

Even in this case the sort operation is a bottleneck, but less so than in the previous case. To conclude the session, I presented a final modification to Query D in order to optimize it even more.


Query E.

LET leader = (FOR s IN scoretable
FILTER s.gametype == 'CTF'
SORT s.score DESC
LIMIT 400
LET numscore = TO_NUMBER(s.score)
COLLECT pid = TO_NUMBER(s.playerid) INTO scoresbypid
LET hscore = MAX(scoresbypid[*].numscore)
SORT hscore DESC
LIMIT 0,10
RETURN {'pid': pid, 'score':hscore}
)
FOR l IN leader
FOR d IN players
FILTER d.idplayer == l.pid
RETURN {'player': d.playername, 'score': l.score}

The only difference is the sort operation before the grouping. This may sound like a stupid idea, because we already know that sorting is a bottleneck. However, it performs quite fast on the original data. Since we are looking for the 10 best-scoring players, we are not interested in low scores at all. In this case, no player had more than 200 scores recorded, so I simply sorted all scores quickly and passed only the 400 best scores to the subsequent calculations - a wide enough margin for this data set. The running time of this query dropped to 33 ms, which was even better than the best MySQL query on a fully optimized database.


Aftermath


The participants found it interesting how fast an SQL database is when it is properly indexed and uses foreign keys, so that the engine can optimize queries. So you should not think that RDBMSs and SQL are obsolete, old technology. They are still very competent at what they do, if you know how to tune them.

It was also interesting how easy it is to write sloppy queries in NoSQL, and how you actually have to plan your queries. On the other hand, in this case ArangoDB's AQL gives you lots of options for filtering out exactly the data you need.

Docker, future of virtualization?

Calling Docker a virtualization platform does not do it justice. In a sense it is more than that: although it is a kind of virtualization, it covers a wide spectrum from microservices to software distribution. Docker units are not virtual machines (VMs) in the same sense as VirtualBox VMs; they are called containers. Unlike a VM, which has a guest OS, a container is more like a package of software with a filesystem. The guest OS is missing completely, and the Docker engine runs the software in the container.

Docker is an open source platform to develop, deploy and run distributed applications. It is extremely useful for developers, not just administrators. I have used virtualization platforms like VirtualBox and VMWare for more than a decade, and quickly saw the possibilities in Docker.

Since I quite often have to develop various software solutions and analysis pathways, I need a quick and reliable way to run software on various platforms in such a way that components are isolated and contained. Sometimes I need an SQL or NoSQL database on an ad hoc basis for quick data import and analysis. Docker is an invaluable tool that covers scenarios where you have to develop and test something without the risk of messing up other active processes.
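As a sketch of that workflow - the image, port and credentials are just placeholders - the Docker SDK for Python can spin up a disposable database container and throw it away afterwards:

# Minimal sketch of starting an ad hoc database container and tearing it
# down, using the Docker SDK for Python (the 'docker' package).
import docker

client = docker.from_env()
db = client.containers.run(
    "mysql:5.7",
    environment={"MYSQL_ROOT_PASSWORD": "workshop"},
    ports={"3306/tcp": 3306},
    detach=True,
    name="adhoc-mysql",
)

# ... import data, run the analysis, then throw the container away:
db.stop()
db.remove()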

Docker has several advantages. It is lightweight. It is isolated and secure, yet uses host memory and processing resources more efficiently than a VM. Perhaps the best part is that Docker is based on open standards, and the software itself is open source and free.

What makes it even better is portability and flexibility in matters of infrastructure. When properly packaged, a Docker container should run in any Docker-supported environment. Docker itself takes care of all the dependencies of the software.

In my opinion, Docker should be part of any Data Scientist's toolbox.