
what Overkill learned

I wrote a little Perl script to read Overkill’s learning files from AIIDE 2015 and add up the numbers. The three strategy names are as Overkill spells them. The opponents are listed in tournament order, so the strongest are at the top.

               | NinePoolling    | TenHatchMuta    | TwelveHatchMuta | total
opponent       |     n    win    |     n    win    |     n    win    |     n    win
tscmoo         |    57    26%    |    19    11%    |    18    11%    |    94    20%
ZZZKBot        |    80    46%    |     8     0%    |     8     0%    |    96    39%
UAlbertaBot    |    61    30%    |    20    15%    |    10     0%    |    91    23%
Aiur           |    13    54%    |    66    80%    |     3     0%    |    82    73%
Ximp           |     2     0%    |    30    83%    |    57    93%    |    89    88%
IceBot         |     4    25%    |    72    83%    |    14    57%    |    90    77%
Skynet         |    13    62%    |    19    68%    |    58    84%    |    90    78%
Xelnaga        |    75    81%    |    12    50%    |     3     0%    |    90    74%
LetaBot        |    78   100%    |    10    70%    |     2     0%    |    90    94%
Tyr            |     6    33%    |    25    64%    |    53    77%    |    84    70%
GarmBot        |    27    96%    |    27    96%    |    36   100%    |    90    98%
NUSBot         |    66   100%    |    13    77%    |    11    73%    |    90    93%
TerranUAB      |    30   100%    |    30   100%    |    30   100%    |    90   100%
Cimex          |    56   100%    |    33    94%    |     2     0%    |    91    96%
CruzBot        |    30   100%    |    30   100%    |    29   100%    |    89   100%
OpprimoBot     |    24    96%    |    33   100%    |    33   100%    |    90    99%
Oritaka        |    56    98%    |    10    70%    |    24    88%    |    90    92%
Stone          |    56    93%    |    12    67%    |    21    81%    |    89    87%
Bonjwa         |    30   100%    |    30   100%    |    30   100%    |    90   100%
Yarmouk        |    30   100%    |    30   100%    |    30   100%    |    90   100%
SusanooTricks  |    32   100%    |    23    96%    |    32   100%    |    87    99%
total          |   826    80%    |   552    80%    |   504    83%    |  1882    81%

The number n here is not the number of games played. There were 90 rounds. Some games were perhaps not recorded due to crashes or other errors, which could explain why some opponents have n < 90. Also, when the 10-hatch mutalisk strategy failed, Overkill assumed it must have lost to a rush that would also have killed the 12-hatch muta strategy. In that case Overkill recorded two game records, a 10-hatch muta loss and a 12-hatch muta loss, which explains why some opponents have n > 90. At least that’s what the code says; some of the data in the table doesn’t seem to match up (see the Xelnaga row). What did I miss?

Some of the strategy choices make sense intuitively. Overkill learned to get early zerglings against ZZZKBot and UAlbertaBot which play rushes, and learned that a more economy-oriented strategy worked against XIMP with its later carriers. These are examples of learning as a substitute for scouting and adapting.

Look at the bottom row. Each strategy ended up with virtually the same winning rate; the UCB algorithm evened them out accurately. But it didn’t use the strategies equally often; the 9-pool was more successful on average against this set of opponents. The early zerglings are important against many opponents, for whatever reason.
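
Overkill’s learning code itself is C++ and I’m not reproducing its exact formula or constants here, but for readers who haven’t seen it, a generic UCB1 chooser looks something like this Perl sketch. The per-strategy counts are made up.

  use strict;
  use warnings;
  use List::Util qw(sum);

  # Made-up per-strategy records: [games played, games won].
  my %record = (
      NinePoolling    => [10, 6],
      TenHatchMuta    => [10, 4],
      TwelveHatchMuta => [10, 5],
  );

  # UCB1: play the strategy with the highest observed win rate plus an
  # exploration bonus that shrinks as the strategy gets more games.
  sub ucb1_choice {
      my $total_games = sum(map { $_->[0] } values %record);
      my ($best, $best_score);
      for my $strategy (sort keys %record) {
          my ($n, $wins) = @{ $record{$strategy} };
          return $strategy if $n == 0;    # always try an unplayed strategy first
          my $score = $wins / $n + sqrt(2 * log($total_games) / $n);
          if (!defined $best_score || $score > $best_score) {
              ($best, $best_score) = ($strategy, $score);
          }
      }
      return $best;
  }

  print ucb1_choice(), "\n";

The exploration bonus is why the weaker strategies still get occasional games even after the winner is clear.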

Look at the individual lines. Except for weak opponents that Overkill defeated no matter what, for most opponents one or two strategies were clearly better and were played more often. How much did Overkill learn? If it had played strategies at random, the winning rate would have been the average of the strategy winning rates. So the gain can be estimated as the total winning rate minus the mean of the strategy winning rates—how far did you rise above ignorance? The number varies from zero to huge for different opponents. Because of sampling effects, the estimate will statistically tend to be higher than the truth.
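
To make it concrete, here’s the arithmetic for the Skynet row as a tiny Perl snippet. The strategy winning rates were 62%, 68%, and 84%, so random choice would have averaged about 71%, while the actual total was 78%.

  use strict;
  use warnings;
  use List::Util qw(sum);

  # Skynet row from the table above.
  my @strategy_rates = (0.62, 0.68, 0.84);
  my $total_rate     = 0.78;

  # Gain estimate: total winning rate minus the mean strategy winning rate.
  my $mean = sum(@strategy_rates) / @strategy_rates;
  printf "gain estimate: %.1f%%\n", 100 * ($total_rate - $mean);   # about 6.7%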

This learning method has to play weak strategies to find out that they’re weak, so it can’t be perfect. The regret for each opponent can be estimated as the difference between the total winning rate and the winning rate of the best strategy if you’d known to play it from the start—how far did you fall short of omniscience? For many of the opponents, the regret estimated that way is 6% to 7%. If the learning algorithm converges to an exact solution, then in an infinitely long tournament the regret will fall to 0. Thinking about numbers like this can give you an idea of when learning makes sense.
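
Continuing the Skynet example, the best single strategy won 84% and the total was 78%, so the estimated regret is about 6%:

  use strict;
  use warnings;
  use List::Util qw(max);

  # Same Skynet row: best single strategy versus the overall result.
  my @strategy_rates = (0.62, 0.68, 0.84);
  my $total_rate     = 0.78;
  printf "regret estimate: %.1f%%\n", 100 * (max(@strategy_rates) - $total_rate);   # 6.0%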

The Perl script and related files are available as a zip archive.
