
Rich Sutton and The Bitter Lesson

One of the Undermind podcasts brought up Rich Sutton’s recent essay The Bitter Lesson. The basic point of the essay is true: “general methods that leverage computation are ultimately the most effective.” (That is why, though I don’t post about it, behind the scenes I am implementing a machine learning method for Steamhammer.) Nevertheless, I have a complaint about what he says.

Rich Sutton (home page) is a big name in machine learning, and his opinions are worth attention. He is co-author of the major textbook Reinforcement Learning: An Introduction, which I recommend. On his home page you can find more short essays on AI topics; I particularly like Verification, The Key to AI. He’s a smart guy who thinks things through carefully.

The first paragraph of The Bitter Lesson makes perfect sense to me. I might quibble with minor details, but I have no substantial objection; I agree with it, it’s correct or nearly so. Then he goes into examples, and there I feel that he misrepresents history in a cartoonish way. That’s not how the bitter lesson was or is learned.

In the computer chess example, “this was looked upon with dismay by the majority of computer-chess researchers” is false. The Chess 4.x series of programs by Slate and Atkin showed in the 1970s that massive search was more successful than any other method tried. Long before Deep Blue defeated Kasparov, work on knowledge-based approaches was a minor activity; I have a pile of International Computer Chess Association journals to prove it.

The example of computer go is also presented misleadingly. I subscribe to the computer go mailing list and have listened in on the conversations since before the deep learning revolution in computer go. In the old days, people were writing complex programs with too many varieties of hand-coded knowledge, not because they were sure it was the right way—I think everyone agreed that it was not very successful—but because nobody had found a better one. There were experiments with neural networks that intimated that a breakthrough might be possible, but until AlphaGo, the breakthrough was not achieved. The big lesson was not “use a general method of massive learning,” it was “here is the recipe for massive learning.” Finding the recipe was extremely difficult; people had been working on it for many years of slow progress, and victory did not arrive until there were technical breakthroughs in deep learning and DeepMind brought its great resources to bear. Knowing that you should use massive search and/or massive learning is only a tiny part of the battle; figuring out how to use them is the real fight.

That is exactly where we stand in Brood War. We know, and most agree, that search methods (as used in combat simulation) and learning methods (like deep learning) have the potential to be successful. And we have intimations of a breakthrough, bots that have achieved some of that success, like LastOrder with its macro model and CherryPi in TorchCraft with its build order switcher. But we don’t know the full recipe yet, and finding it is difficult work.


Comments

Marian on :

Honestly I don't think that macro is a particularly difficult problem. Sure, it's nice if a bot can learn it by itself, but deterministic algorithms with the support of an opening book can do the job really well.
The interesting problems in my opinion are:
- estimating army strength quickly and decisively, taking all the unit types, special abilities, and terrain into account
- estimating what the opponent is doing and how many units/buildings it has with limited scouting information (opponent modeling)
- estimating where enemy units might be with limited scouting information
- army group movement and positioning
- multiple group synchronization
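On the army-strength point: one common fast heuristic, short of running a full combat simulation, is a Lanchester-style score that weights each unit by hit points times damage output and lets unit count enter superlinearly to model concentrated fire. A minimal sketch in Python, with purely illustrative unit stats (the names and numbers below are made up for the example, not real game data):

```python
from math import sqrt

# Hypothetical per-unit stats: (hit points, damage per second).
# Illustrative values only, not taken from the game.
UNIT_STATS = {
    "marine":  (40, 10),
    "zealot":  (160, 16),
    "dragoon": (180, 13),
}

def army_strength(units):
    """Naive army score: sum of hp * dps over all units."""
    return sum(hp * dps for hp, dps in (UNIT_STATS[u] for u in units))

def lanchester_score(units):
    """Lanchester-square-style score: unit count times the square root
    of mean per-unit strength, so numbers matter superlinearly."""
    n = len(units)
    if n == 0:
        return 0.0
    per_unit = army_strength(units) / n
    return n * sqrt(per_unit)
```

This ignores special abilities and terrain entirely, which is exactly why the problem as stated above is hard: the cheap formula is easy, and every refinement beyond it costs either accuracy or speed.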

Jay Scott on :

LastOrder’s macro model tries to take into account both the strategic and tactical situation in deciding what to build or research next. It attempts to do some of what you’re talking about.

Jay Scott on :

And CherryPi’s build order switcher is really a strategy switcher that takes the whole situation into account. The strategies are a fixed, hand-written set, but choosing one potentially can include most of what you mention.

Joseph Huang on :

Macro is definitely hard: how much should you follow a fixed build versus reacting to what the opponent is doing?

Tully Elliston on :

Thanks for the Undermind podcast! Any other recommendations?

jtolmar on :

The modding community just interviewed the programmer who wrote the game's built-in AI. Might be something you're interested in.

http://www.staredit.net/topic/17809/0/

Jay Scott on :

Sorry about the delay approving your comment. My internet was out. :-(

Johannes Holzfuß on :

Hi. Can you take a look at RedRum? I think it is cheating somehow.

In a game on stream against the new bot Dragon it drew its "I can't see this unit, but I know it used to be there" rectangles for invisible units and then *kept them up to date* while under the fog of war.

The game starts at https://www.twitch.tv/videos/420630546 around the 5 hour mark.

Jay Scott on :

Yes, RedRum appears to be cheating. I will check it out—after I examine Dragon, which is more interesting!

MarcoDBAA on :

Watched some Dragon games.

Dragon scored an impressive win vs. SAIDA, where it exposed SAIDA's goliaths by luring them repeatedly into tank fire with its wraiths. Its M&Ms did well vs. PW's carriers and storm too (expected for a CPi descendant).

However, it showed a clear weakness dealing with harassment: overreacting to just a few units, getting stuck trying to chase tscmoo's mutas, and seeming to lose any sort of real initiative (game vs. Proxy).

Bytekeeper on :

The author left in a config setting he used for local testing. This should have been prevented by the server (the TM), but the API version he used has some bugs in that regard.

The bot was disabled immediately.
Mistakes happen.

MicroDK on :

RedRum is a fork of SH and has "show all information" set to true in its config file. Unfortunately, the tournament manager on SSCAIT has a bug that allows bots on BWAPI 4.1.2 to set that flag to true.
On Discord, the author has been asked to change the config file and upload again. He was testing something locally and forgot to set the flag back to false.

Johannes Holzfuß on :

Thanks for clarifying that.

I was unaware of that bug and thought that the TM would prevent an accidentally enabled debug flag ("Tournament mode engaged"), so I assumed foul play was at work. I guess I shouldn't have.

Sorry. :/

Joseph Huang on :

Will SSCAIT fix this? Seems like a pretty big deal.
