
offline and online learning

Offline learning means learning at home; you do it before the tournament. Online learning means learning from games as you play them; you do it during the tournament. For good play, you want to learn as much as you can at home; humans are the same. But some things come up only during live play and can’t be prepared ahead of time. If you face an unfamiliar opponent or an unfamiliar map, you have to adapt on the spot.

Reinforcement learning algorithms are online algorithms: they update incrementally from each new experience, which means that you can run them either online or offline. Overkill tried to do both, and it looks as though the offline learning was successful but the online learning could not get enough data to bite. I love this part: Batch learning algorithms, which are considered offline algorithms, are more appropriate for the small amounts of data we can get in online learning. The names turned out to be reversed!
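To make that concrete, here is a rough sketch, not anybody’s actual code. The strategy names and the UCB1 choice rule are only illustrations of how a batch-style method can be refit from scratch on the handful of results you collect against one opponent during a tournament.

import math
from collections import defaultdict

class StrategyPicker:
    def __init__(self, strategies):
        self.strategies = strategies
        self.history = defaultdict(list)    # strategy -> list of 0/1 results

    def record(self, strategy, won):
        self.history[strategy].append(1 if won else 0)

    def pick(self):
        # Refit over the whole batch of past games each time (batch learning),
        # then choose by an optimistic upper confidence bound (UCB1).
        total = sum(len(r) for r in self.history.values())
        best, best_score = None, -1.0
        for s in self.strategies:
            results = self.history[s]
            if not results:
                return s                    # try every strategy at least once
            mean = sum(results) / len(results)
            bonus = math.sqrt(2 * math.log(max(total, 1)) / len(results))
            if mean + bonus > best_score:
                best, best_score = s, mean + bonus
        return best

picker = StrategyPicker(["4-pool", "9-pool speed", "2-hatch muta"])
picker.record("4-pool", won=False)
print(picker.pick())

The point is that nothing stops you from rerunning the whole “offline” fit after every game; with only a few data points per opponent, that is cheap.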

deep learning and memory

Here’s a thought experiment. Suppose you are DeepMind, and you know from your name that you’re going to do deep learning and no other kind. You decide that you want to learn opponent models, which has to be done online in real time. Deep learning is powerful, but it is also data-hungry to the max. It can’t work online. What do you do?

There are ways.

The simplest way is to make the past part of the input. Your deep learning network can’t remember things from game to game, but you can remember for it and let it know. For opponent modeling, you remember details about the opponent’s past behavior and present them as input to the network. You can collect a lot of details; deep learning is able to cope with large amounts of undigested input data. If you train your network thoroughly at home, it will learn how to adapt to different kinds of opponents.
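As a toy illustration of “remember for it,” with made-up feature names and numpy standing in for a real network framework:

import numpy as np

def opponent_features(past_games):
    # Digest a list of past game records into a small fixed-size feature vector.
    if not past_games:
        return np.zeros(4)
    rushes = np.mean([g["rushed"] for g in past_games])
    air    = np.mean([g["went_air"] for g in past_games])
    expand = np.mean([g["fast_expand"] for g in past_games])
    wins   = np.mean([g["we_won"] for g in past_games])
    return np.array([rushes, air, expand, wins])

def network_input(game_state_vector, past_games):
    # The network itself has no memory between games; we remember for it.
    return np.concatenate([game_state_vector, opponent_features(past_games)])

state = np.random.rand(32)                  # stand-in for the current game state
history = [{"rushed": 1, "went_air": 0, "fast_expand": 0, "we_won": 0}]
print(network_input(state, history).shape)  # (36,)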

Another way is to give your deep learning network a memory under its own control—let it decide what information to store for next time. “What should I remember?” becomes part of the network output alongside “what should I do?” In the next game against the same opponent, you feed in “what should I remember?” from the last output as part of the current network input. In training, the deep learning model learns what it should best remember. Similar experiments have been done (not in Starcraft), so the idea is known to work. Figuring out what to remember is trivial next to playing go!
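A sketch of that feedback loop, with a tiny random linear map standing in for a trained network; the sizes are arbitrary:

import numpy as np

MEMORY_SIZE = 8

def forward(weights, game_input, memory):
    x = np.concatenate([game_input, memory])
    out = np.tanh(weights @ x)
    action_logits = out[:-MEMORY_SIZE]      # "what should I do?"
    new_memory    = out[-MEMORY_SIZE:]      # "what should I remember?"
    return action_logits, new_memory

rng = np.random.default_rng(0)
input_size, action_size = 32, 10
weights = rng.normal(size=(action_size + MEMORY_SIZE, input_size + MEMORY_SIZE))

memory = np.zeros(MEMORY_SIZE)              # blank before the first game
for game in range(3):                       # successive games vs. the same opponent
    game_input = rng.random(input_size)
    action, memory = forward(weights, game_input, memory)
    # ... play the game using `action`, then persist `memory` to disk ...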

More generally, if you have a powerful learning method, then it can learn more than how to play on its own. It can learn how to play plus how to operate whatever kind of analysis machine will help it play better. A memory is only one example of an analysis machine that deep learning could learn to operate. It could also operate a build order planner or a tactics planner or whatever, request exactly the plans best for the situation as a whole, and take those plans as part of its input.
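A hypothetical sketch of the loop, with an invented BuildOrderPlanner interface, just to show “ask the analysis machine, feed its answer back in”:

import numpy as np

class BuildOrderPlanner:
    def plan(self, query):
        # Stand-in for a real search-based planner: pretend the query vector
        # parameterizes the goal and return an encoded plan.
        return np.tanh(query * 2.0)

def policy(weights, state, plan_encoding):
    x = np.concatenate([state, plan_encoding])
    out = np.tanh(weights @ x)
    action = out[:10]           # what to do now
    planner_query = out[10:]    # what kind of plan to ask for next
    return action, planner_query

rng = np.random.default_rng(1)
state_size, plan_size = 32, 6
weights = rng.normal(size=(10 + plan_size, state_size + plan_size))
planner = BuildOrderPlanner()

plan_encoding = np.zeros(plan_size)
for step in range(5):
    state = rng.random(state_size)
    action, query = policy(weights, state, plan_encoding)
    plan_encoding = planner.plan(query)     # request exactly the plan we asked for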

Summary: 1. Offline learning is usually preferable. 2. Online learning can be done offline. It’s at least good to know you have a choice!


Comments

sijia xu on :

Yes, reinforcement learning is really an attractive model. It mimics the low-level learning method of humans and animals: through the tradeoff between exploration and exploitation, an agent can gradually build up its own action model, and the whole process is fully automatic.
So it may be more suitable than supervised learning (offline learning) for developing self-controlled agents/bots.
As for deep learning, I think it is much like the human eye, while reinforcement learning is more like the brain; combining them is really amazing!

krasi0 on :

Jay, how does that relate to LSTM networks?

Jay Scott on :

Good question! The “long short-term memory” is a neural network design and can be used with deep learning. It is theoretically a way for deep learning to learn as it goes—train it after each game and it learns something of the events of the game and how they relate to the desired output (like winning). Tscmoo can tell us how well it worked for him. I’ve never used it and can’t say, but my first guess is that it could be highly flexible in its responses but slow to pick up on what it needed to do.
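For readers who haven’t met it, a single LSTM step looks roughly like this in plain numpy. The sizes are arbitrary and the weights random; it is only meant to show the gating that decides what the cell keeps from step to step.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(params, x, h, c):
    W, U, b = params                        # input weights, recurrent weights, bias
    z = W @ x + U @ h + b
    i, f, o, g = np.split(z, 4)
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c_new = f * c + i * g                   # forget some old memory, write some new
    h_new = o * np.tanh(c_new)              # expose part of the memory as output
    return h_new, c_new

rng = np.random.default_rng(2)
input_size, hidden_size = 16, 8
params = (rng.normal(size=(4 * hidden_size, input_size)),
          rng.normal(size=(4 * hidden_size, hidden_size)),
          np.zeros(4 * hidden_size))
h = np.zeros(hidden_size)
c = np.zeros(hidden_size)
h, c = lstm_step(params, rng.random(input_size), h, c)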
