Description and modeling of FlapMMO score data
In this post, I explore a dataset of FlapMMO scores, using descriptive statistics and testing a few probabilistic models.
Retrieving data. Thanks to the work of Connor Sauve, it is possible to retrieve the data stream, which contains useful information for each attempt, such as:
- an id field, which uniquely identifies the player,
- a nickname,
- a date,
- a list of jump timestamps (from the beginning to the end of the attempt).
In the end, I have two datasets:
- data20140213, collected by Connor Sauve on 13 Feb 2014, which contains about 400,000 attempts from more than 18,000 different players,
- data20140302, collected by myself on 2 Mar 2014, which contains about 100,000 attempts from more than 5,000 players.
The plots below use data20140302. The last section briefly compares the two datasets.
Variable of interest. For each id, I focus only on the pipe that the bird hits in each successive attempt. I then transform the datasets to obtain something like this:
| id | attempt 1 | attempt 2 | attempt 3 | attempt 4 |
| 3266 | 2 | 1 | | |
For example, we can see that the player with id 3266 played 2 times. In his first try, his bird hit the second pipe. In his next try, his bird hit the first pipe. Then he stopped playing.
Note that I removed the attempts that do not reach the first pipe.
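The transformation described above can be sketched as follows. This is only a minimal illustration, not the code used for the post: the record layout (keys `id`, `date`, `score`) and the function name `build_score_sequences` are assumptions, and `score` is assumed to be already computed as the index of the pipe hit (0 when the bird dies before the first pipe).

```python
from collections import defaultdict


def build_score_sequences(attempts):
    """Group attempts by player id and return the ordered scores per player.

    `attempts` is an iterable of dicts with hypothetical keys:
      'id'    -- player identifier
      'date'  -- any sortable timestamp of the attempt
      'score' -- index of the pipe the bird hit (0 = died before pipe 1)
    """
    by_player = defaultdict(list)
    for attempt in attempts:
        by_player[attempt["id"]].append((attempt["date"], attempt["score"]))

    sequences = {}
    for player_id, rows in by_player.items():
        rows.sort(key=lambda row: row[0])  # chronological order
        # Drop attempts that do not reach the first pipe.
        scores = [score for _, score in rows if score >= 1]
        if scores:
            sequences[player_id] = scores
    return sequences
```

With the two attempts of player 3266 from the example, this would return `{3266: [2, 1]}`.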
How long does each player keep playing? The following graph shows the percentage of players as a function of the number of attempts.
We observe that most people play only a few times: 50% of the players play 10 times or fewer, and 75% of them make fewer than 25 attempts.
From this plot, we deduce the probability that a player plays again as a function of the number of attempts already made.
This plot suggests that the more attempts a player has made, the more likely he is to keep playing. This might indicate that the game is addictive.
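One way to compute the probability of playing again is to divide the number of players with at least n+1 attempts by the number of players with at least n attempts. The sketch below assumes a hypothetical dict `sequences` mapping each player id to his list of scores; the function name is also an assumption.

```python
from collections import Counter


def continuation_probability(sequences):
    """Return {n: P(player makes attempt n+1 | player made at least n attempts)}."""
    counts = Counter(len(scores) for scores in sequences.values())
    max_n = max(counts)

    # at_least[n] = number of players who made at least n attempts
    at_least = {}
    running = 0
    for n in range(max_n, 0, -1):
        running += counts.get(n, 0)
        at_least[n] = running

    return {
        n: at_least.get(n + 1, 0) / at_least[n]
        for n in range(1, max_n + 1)
    }
```

Note that the estimate for large n relies on very few players, so the right-hand side of such a curve is noisy.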
How far does each player go? The next graph shows the frequency of players as a function of the pipe reached in their best attempt.
Here, we observe that most players are able to pass a pipe, but 50% of them don’t reach the fifth pipe and 75% lose before the eighth pipe. It is noteworthy that someone reached the 140th pipe (outside of the graph).
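Statements such as "50% of them don't reach the fifth pipe" are values of the empirical distribution of the best score. A minimal sketch, assuming the same hypothetical `sequences` dict (player id to list of scores) and reading "does not reach pipe k" as "best score below k":

```python
def fraction_best_below(sequences, k):
    """Fraction of players whose best score is strictly below k."""
    best_scores = [max(scores) for scores in sequences.values()]
    return sum(best < k for best in best_scores) / len(best_scores)
```

For instance, `fraction_best_below(sequences, 5)` gives the share of players who never score 5 or more.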
Evolution of the score between two consecutive tries. Knowing the score (the pipe that the bird hits) of a player for an attempt, we want to infer the score of the next try. For this purpose, we use a homogeneous Markov model. This is a simplistic model, because the next score may depend on the whole history of scores (so it is not a one-step Markov model) and on the number of attempts the player has already made (so it is not a homogeneous model).
An empirical transition matrix is obtained, where cell $(i, j)$ represents the probability that a player who scores $i$ in a try will score $j$ in the next try. Only states are kept:
Here ‘’ represents a score greater than and ‘’ means that the player stopped playing. The matrix is given by:
For example, the probability that a player who scores in a try will score in the next try is . The probability that a player who scores greater than will leave the game is .
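An empirical transition matrix of this kind can be built by counting consecutive pairs of scores. The sketch below is an illustration under assumptions: `sequences` is a hypothetical dict from player id to list of scores, `cap` is a hypothetical threshold above which scores are merged into one state, and the last observed attempt of each player is counted as a transition to a "leave" state (a player who came back after the data was collected would be miscounted).

```python
from collections import Counter


def transition_matrix(sequences, cap):
    """Empirical transition probabilities between score states.

    States: 1..cap (exact score), cap + 1 (any score above cap).
    Columns additionally include 'leave' (player stopped playing).
    """
    def state(score):
        return score if score <= cap else cap + 1

    states = list(range(1, cap + 2))
    columns = states + ["leave"]
    counts = {s: Counter() for s in states}

    for scores in sequences.values():
        for current, following in zip(scores, scores[1:]):
            counts[state(current)][state(following)] += 1
        # Assumption: the last observed attempt is followed by leaving.
        counts[state(scores[-1])]["leave"] += 1

    matrix = {}
    for s in states:
        total = sum(counts[s].values())
        matrix[s] = {
            c: (counts[s][c] / total if total else 0.0) for c in columns
        }
    return matrix
```

Each non-empty row sums to 1, and the "leave" column gives the probability of stopping after a given score.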
From this matrix, we represent the probability to score (respectively to score greater than ) in the next try as a function of the current try.
We deduce that players who score high in one try tend to score high in the next, and vice versa. Hence, the outcome of this game is not purely random.
Also from the matrix, we plot the probability of leaving the game as a function of the score in the current try.
Thus, people who reach high scores are more likely to leave the game.
Skill of players. We now consider players individually and want to measure the skill of each one. We make only the following assumption:
“When the bird of player $k$ is in front of a pipe, it dies with probability $p_k$”.
Then, for player $k$, each score follows a geometric distribution (on $\{1, 2, \dots\}$) with rate $p_k$. After estimating the rates for all players, we plot the histogram of rates for players who make more than attempts.
The distribution of rates is not uniform, and most players (with more than attempts) have a rate around .
Note that I tested the geometric hypothesis with the Cramér-von Mises test; for almost all players, the hypothesis of a geometric distribution cannot be rejected with this test (even for players who play many times).
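Under the geometric assumption, and taking the score to be the index of the pipe where the bird dies (so scores start at 1), the maximum-likelihood estimate of a player's rate is the number of attempts divided by the sum of his scores. The sketch below illustrates this; the dict `sequences`, the function name, and the `min_attempts` parameter are assumptions, and the estimator used in the post may differ.

```python
def estimate_rates(sequences, min_attempts):
    """Maximum-likelihood geometric rate per player.

    For scores x_1..x_n on {1, 2, ...} with P(X = x) = (1 - p)^(x - 1) * p,
    the likelihood is maximized at p = n / (x_1 + ... + x_n).
    Only players with more than `min_attempts` attempts are kept.
    """
    return {
        player_id: len(scores) / sum(scores)
        for player_id, scores in sequences.items()
        if len(scores) > min_attempts
    }
```

A rate close to 1 means the bird almost always dies at the first pipe, while a small rate indicates a skilled player.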
Evolution of the skill. The previous model cannot capture how skill evolves as a player makes more attempts. To fix that, we replace the previous assumption with:
“When the bird of player $k$ is in front of a pipe, letting $n$ be the number of attempts already made by the player, it dies with probability $p_k(n)$”.
We use a uniform convolution to estimate each rate $p_k(n)$. Then, for players who have made many attempts, we can observe whether their skill improves or not. Here are the plots for 3 players.
Player begins without knowing how to play and then constantly improves his performance.
Player knows how to play, and his skill improves slowly.
Player played many times but his skill remains more or less constant.
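The post does not detail the uniform convolution, so the following is only one possible reading: slide a flat window over a player's successive scores and apply the geometric estimate (number of attempts divided by the sum of scores) inside each window. The function name and the `window` parameter are assumptions.

```python
def rolling_rate(scores, window):
    """Rate estimates over a sliding flat window of `window` attempts.

    Returns one estimate per full window, in chronological order, so the
    result has len(scores) - window + 1 entries.
    """
    if window < 1 or window > len(scores):
        raise ValueError("window must be between 1 and len(scores)")

    window_sum = sum(scores[:window])
    rates = [window / window_sum]
    for i in range(window, len(scores)):
        # Slide the window by one attempt: add the new score, drop the oldest.
        window_sum += scores[i] - scores[i - window]
        rates.append(window / window_sum)
    return rates
```

A decreasing sequence of rates corresponds to an improving player, since the bird dies less often at each pipe. A wider window gives a smoother curve at the cost of reacting more slowly to changes in skill.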
Comparison between data20140213 and data20140302. The shapes of all the plots look similar, except that in data20140302 the skill of players is higher.
- Connor Sauve's post about how to retrieve the data.
- How to install node.js on Debian.
- How to use node.js. (in French)
- How to install and use MongoDB. (in French)
- How to use MongoDB for node.js.
- An article performing a comparative study of goodness-of-fit tests for the geometric distribution.
- How to obtain confidence intervals for the rate of a geometric sample.
- Be cautious with log/log graphs and fitting...
Code and data:
- Check out the data and code on GitHub: FlapMMO code and data.
- How can I find reasonable ways to fit the plots in the sections on how long players keep playing, how far they go, and their skill?