On sanma stable rank estimation

January 27, 2025, 09:05

Motivated by 一方通行's work (hereafter Ippou's work) on yonma stable rank estimation, I explore here the estimation of sanma stable rank in Tenhou. I first want to point out that this work is of little value for estimating one's own stable rank: for a single player the statistical error scales as $${1/\sqrt{N_\mathrm{games}}}$$ and is often much larger than the bias of the naive estimator. The bias becomes important only when one studies many players at the same time, for example all players who have played 100-200 games. In my limited testing, I seem to have found an estimator that performs better than a direct generalization of Ippou's work. Of course, both of these are much better than the naive estimator.

The simulation result in Ippou's work looks very good. However, when the number of games is small, there is some chance of having no last place at all, in which case the estimator is ill-defined. This might appear irrelevant, but handling it might also help us get better results for small numbers of games.

Let us first define our notation. In Tenhou sanma there is no point change for second place, so we ignore those games. Assume that $${N}$$ games remain, of which $${n_1}$$ are first places and $${n_3}$$ are last places, so $${N=n_1+n_3}$$. We assume these result from $${N}$$ independent draws, each with probability $${p}$$ of a last place. $${\hat{p}=n_3/N}$$ is the observed last-place frequency. The true value of the stable rank in houou is $${9(1-p)/p-2=9/p-11}$$. This is linear in $${1/p}$$, so our task is the same as studying estimators for $${1/p}$$. The naive estimator for $${1/p}$$ is $${1/\hat{p}}$$.

To motivate our estimator, consider the same problem in a Bayesian framework with a flat prior for $${p}$$. The posterior mean of $${1/p}$$ is then $${\frac{n_1+n_3+1}{n_3}}$$. If the prior is instead a beta distribution, its parameters simply enter as additive shifts to $${n_1}$$ and $${n_3}$$. This motivates us to consider estimators of the form $${\frac{n_1+n_3+\alpha}{n_3+\beta}}$$. In particular, if $${\beta>0}$$ we get rid of the singularity at $${n_3=0}$$.
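For completeness, here is a sketch of where the flat-prior expression comes from, assuming the quantity meant is the posterior mean of $${1/p}$$. With a flat prior the posterior density is proportional to $${p^{n_3}(1-p)^{n_1}}$$, so

$${E[1/p]=\frac{\int_0^1 p^{n_3-1}(1-p)^{n_1}\,dp}{\int_0^1 p^{n_3}(1-p)^{n_1}\,dp}=\frac{B(n_3,n_1+1)}{B(n_3+1,n_1+1)}=\frac{n_1+n_3+1}{n_3}}$$

where $${B}$$ is the beta function. The integral in the numerator diverges when $${n_3=0}$$, which is the same singularity that affects the naive estimator.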

A direct generalization of Ippou's work gives the following estimator for $${1/p}$$:
$${\frac{1}{\hat{p}} \left(1-\frac{1-\hat{p}}{\hat{p} (N-1)}\right)}$$. In evaluating the correction term I liberally replaced $${p}$$ with $${\hat{p}}$$, except in the variance. This might lead to a bias of higher order.

Now we can simply match this estimator with our template up to order $${1/N}$$ to find $${\alpha}$$ and $${\beta}$$. The result is $${\alpha=0}$$ and $${\beta=1-\hat{p}}$$. Since $${1-\hat{p}=n_1/N}$$, the resulting estimator for $${1/p}$$ is $${\frac{N}{n_3+n_1/N}}$$.
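The matching step can be sketched as follows, using $${n_1+n_3=N}$$ and $${n_3=N\hat{p}}$$. Expanding the template in powers of $${1/N}$$ gives

$${\frac{N+\alpha}{N\hat{p}+\beta}=\frac{1}{\hat{p}}\left(1+\frac{\alpha}{N}-\frac{\beta}{N\hat{p}}+O(1/N^2)\right)}$$

At this order the bias-corrected estimator has the relative correction $${-\frac{1-\hat{p}}{\hat{p}N}}$$, since the difference between $${N-1}$$ and $${N}$$ only matters at order $${1/N^2}$$. Equating the order $${1/N}$$ terms requires $${\alpha\hat{p}-\beta=-(1-\hat{p})}$$, which the choice $${\alpha=0}$$, $${\beta=1-\hat{p}}$$ satisfies.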

Lastly, we show the results of a numerical test. We concentrate on $${p=0.4}$$. This corresponds to a stable rank of 11.5 dan and should provide an upper bound for the bias in real-life applications. I tested numbers of games from 5 to 839 with four estimators: naive, Ippou, Ippou2 (where I replaced $${N-1}$$ by $${N}$$), and my estimator ("pokeii"). When $${n_3=0}$$ I throw away the trial for every estimator except mine, as all the others are undefined. For each number of games I simulated a million trials. For small numbers of games, my estimator appears to perform better than Ippou and Ippou2. Incidentally, Ippou2 performed better than Ippou, indicating that we happened to remove some of the order $${1/N^2}$$ bias. This stays true for $${p=0.6}$$ (stable rank of 4 dan). For large numbers of games, the differences among these three are dominated by statistical noise, but it is clear that the naive estimator should not be used, as it still has a bias of ~0.1 dan at 300 games.
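The comparison can also be checked without random sampling. The sketch below is not the simulation used for the results above. Instead it sums each estimator over the exact binomial distribution of $${n_3}$$, which gives the bias with no statistical noise. It assumes $${N=n_1+n_3}$$, a stable rank of $${9(1-p)/p-2}$$ (which gives 11.5 dan at $${p=0.4}$$), and the Ippou estimator written so that its order $${1/N}$$ term matches $${\beta=1-\hat{p}}$$. All function names are made up for illustration. For the naive and Ippou estimators the $${n_3=0}$$ outcome is dropped and the remaining probabilities are renormalized, mirroring the treatment described above.

```python
import math


def binom_pmf(k, n, p):
    """Probability of k last places in n games, computed in log space."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)


def naive(n3, n):
    """Naive estimator of 1/p; undefined for n3 == 0."""
    return n / n3


def ippou(n3, n):
    """Bias-corrected estimator of 1/p (assumed form); undefined for n3 == 0."""
    p_hat = n3 / n
    return (1 / p_hat) * (1 - (1 - p_hat) / (p_hat * (n - 1)))


def pokeii(n3, n):
    """Estimator N / (n3 + n1/N); defined even when n3 == 0."""
    n1 = n - n3
    return n / (n3 + n1 / n)


def stable_rank(inv_p):
    """Stable rank from 1/p, assuming rank = 9 (1 - p) / p - 2."""
    return 9 * (inv_p - 1) - 2


def expected_rank(estimator, n, p, allow_zero):
    """Exact mean of the estimated stable rank over the binomial law of n3."""
    total, weight = 0.0, 0.0
    start = 0 if allow_zero else 1  # skip n3 == 0 where the estimator is undefined
    for n3 in range(start, n + 1):
        w = binom_pmf(n3, n, p)
        total += w * stable_rank(estimator(n3, n))
        weight += w
    return total / weight  # renormalize over the outcomes that were kept


def bias_table(n, p):
    """Bias in dan of each estimator for n games and last-place probability p."""
    true_rank = stable_rank(1 / p)
    return {
        "naive": expected_rank(naive, n, p, False) - true_rank,
        "ippou": expected_rank(ippou, n, p, False) - true_rank,
        "pokeii": expected_rank(pokeii, n, p, True) - true_rank,
    }


if __name__ == "__main__":
    for n in (20, 50, 100, 300):
        print(n, {name: round(b, 4) for name, b in bias_table(n, 0.4).items()})
```

Because the sum is exact, this also separates the residual bias of the corrected estimators from the sampling noise that dominates a Monte Carlo run at large numbers of games.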
