learning signals
For strategy learning, every current bot I’ve seen learns from the game result and nothing else. It’s also the only kind of learning I’ve written up so far.
I’ll tell you one of the deep secrets of the dark magic of machine learning: If you want to learn better, don’t grub around for a better algorithm like I did yesterday. I mean, you can and it will probably help. But first, dig for better information to learn from. The big gains most often come from finding better learning signals. Yesterday’s suggestion about generalizing across opponents and across maps was an example.
When Bisu loses, does he adjust downward his probability of playing the opening he used in that game? Not like that, no. He thinks through the game events and finds the cause of the loss. If Bisu came out of the opening in a sound position, then it would be silly to blame the opening for the loss, no matter what happened later in the game. (By the way, this is an example of the classic credit assignment problem, one of the oldest named problems in AI: I got this result. What features of the situation deserve credit or blame for the result?)
I expect that it will be a long time before bots can reason about cause and effect. But they should be able to figure out “am I more likely ahead or behind?” In fact, that can be a learning target itself. The input data is the scouting info seen up to some point in the game, and the goal might be (for example) to estimate the actual supply difference as seen in a replay (if you do learning from replays) or to estimate the probability of winning the game (which works for learning during games by temporal difference learning, worth reading up on if you don’t know it). Once you can estimate whether you’re winning, you can learn to choose the opening that leaves you in the strongest position, not the opening that is seen to win most often. If your estimate is good then it provides more and better information (a score at the correct point in the game) than whether you won or lost (1 bit, available only after the game), so you’ll learn faster and better.
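To make the temporal difference idea concrete, here is a minimal sketch, not a recipe from any existing bot. It assumes each scouting snapshot has already been turned into a short list of numbers (the features are up to you; the class name WinEstimator and its parameters are made up for illustration). Each snapshot's estimate is nudged toward the estimate at the next snapshot, and the last snapshot is nudged toward the actual result.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class WinEstimator:
    """Logistic estimate of P(win) from a feature vector, trained by TD(0).

    Hypothetical illustration: the features and learning rate are assumptions.
    """

    def __init__(self, n_features, alpha=0.1):
        self.w = [0.0] * n_features
        self.alpha = alpha

    def predict(self, features):
        return sigmoid(sum(w * f for w, f in zip(self.w, features)))

    def _nudge(self, features, target):
        # Semi-gradient step: move the prediction for `features` toward `target`.
        p = self.predict(features)
        step = self.alpha * (target - p) * p * (1.0 - p)
        for i, f in enumerate(features):
            self.w[i] += step * f

    def learn_game(self, snapshots, won):
        # snapshots: feature vectors in the order they were seen during the game.
        for cur, nxt in zip(snapshots, snapshots[1:]):
            self._nudge(cur, self.predict(nxt))  # bootstrap from the next estimate
        if snapshots:
            self._nudge(snapshots[-1], 1.0 if won else 0.0)  # ground truth at the end
```

With an estimator like this, an opening can be scored by predict() on the snapshot taken when the opening ends, rather than by the final result alone.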
As a rough cut you could say: A quick win or loss is definitely related to the opening. If the bot adapts during the game, then the longer the game, or the more adaptation done after the opening, the less credit or blame the opening is likely to deserve. In fact, if you lost a long game then the opening might deserve credit for putting you in a position to survive that long!
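One possible way to write down that rough cut, as a sketch only: discount the opening's share of the result by game length and by the number of times the bot adapted afterward. The function name, the half-life, and the per-adaptation discount are all assumptions I picked for illustration, not tuned values. It also does not try to capture the last point, that surviving a long time in a lost game might count in the opening's favor.

```python
def opening_credit(won, game_minutes, adaptation_events=0,
                   half_life_minutes=10.0, adaptation_discount=0.8):
    """Signed credit in [-1, 1] to assign to the opening for one game.

    Hypothetical heuristic: the credit halves every `half_life_minutes` and
    shrinks further for each strategy change made after the opening.
    """
    weight = 0.5 ** (game_minutes / half_life_minutes)
    weight *= adaptation_discount ** adaptation_events
    return weight if won else -weight


def update_opening_score(score, credit, rate=0.1):
    # Running average of credits; a near-zero credit barely moves the score.
    return score + rate * (credit - score)
```

A 5-minute loss then counts heavily against the opening, while a 40-minute loss with several adaptations leaves its score nearly untouched.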
The same general idea, look for good learning signals, goes for all kinds of learning. You already knew that if you want your bot to learn micro, don’t count won or lost battles, count units lost and damage done. It’s obvious, right? And so on.
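For the micro case, a denser signal might look like the sketch below. The record fields and weights are assumptions for illustration; how you measure unit value (minerals plus gas, supply, something else) is your call.

```python
from dataclasses import dataclass


@dataclass
class BattleResult:
    # All fields are hypothetical bookkeeping a bot would have to collect itself.
    damage_dealt: float
    damage_taken: float
    enemy_value_killed: float  # e.g. resource cost of enemy units destroyed
    own_value_lost: float      # e.g. resource cost of own units destroyed


def micro_reward(r, damage_weight=1.0, kill_weight=1.0):
    """Graded reward for one battle instead of a single won/lost bit."""
    return (damage_weight * (r.damage_dealt - r.damage_taken)
            + kill_weight * (r.enemy_value_killed - r.own_value_lost))
```

Two battles that both count as "lost" can now get very different rewards, which is the extra information the learner needs.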
Comments
krasi0 on :
Sometimes, I won't have a better comment as a reply than the above. In those cases, I'd rather not post anything, but the lack of replies shouldn't dishearten you. It doesn't equal a lack of interest.
So please keep the analyses coming! :)
IMP on :
Here we are, soaking it up but without having anything of value to add just yet ;) Thank you for all your effort.