performance differences on BASIL ladder

In its early days on the BASIL ladder, Steamhammer performed poorly. But today Steamhammer ranks #8, ahead of Krasi0P. If we skip the BASIL participants that are not in the SSCAIT tournament (Krasi0 terran and ChimeraBot), that corresponds to place #6 on SSCAIT, just behind BananaBrain, as compared to Steamhammer’s actual tournament finish at #11. The rise corresponds in general to Steamhammer’s performance curve in AIIDE 2018, starting low and rising strongly, but seems even more dramatic.

Many bots have different rankings on BASIL compared to SSCAIT. Random bots are handicapped on BASIL by comparison, since the opponent knows the race ahead of time. There are other differences in rules, plus the environment can cause different reliability and possibly different behavior. For most bots, I think these differences should not matter much—though anything could happen for a bot with reliability problems. Am I wrong? Am I missing something that can make a big difference?

If I’m right, then the important difference is that BASIL plays more games, so learning bots learn more. Other than environment-specific bugs, I don’t know another way to explain big differences in rank, such as Killerbot by Marian Devecka being #19 on BASIL while it came in #7 in SSCAIT 2018: Killerbot is not a learning bot. Another difference (just to give a second example) is that BASIL ranks Ecgberht one step below Arrakhammer, rather than far below (SSCAIT #16 versus #33): Ecgberht is a learning bot.

Steamhammer has a surprisingly high crash rate on BASIL, over 6%. It doesn’t crash remotely that often on SSCAIT. I’ll have to look into that.

Comments

Joseph Huang on Friday, January 18. 2019:

PW also reports higher crashes on BASIL.

Dan on Saturday, January 19. 2019:

I haven't observed any crashes (it's generally very hard to crash as a Java client), but there's a common bug I can't reproduce where static mineral/gas detection fails and PW fails to find base locations which aren't starting positions -- this accounts for a good percentage of PW's losses on BASIL.

Ecgberht on Saturday, January 19. 2019:

The current BASIL version of Ecgberht is an updated sscait version with a lot of bugfixes and improvements, that's why it's doing so well lately on BASIL.

I only really worked on the sscait update the last 2 days before the SSCAIT deadline so the amount of testing I did was not enough and I introduced a few undetected, critical bugs (crashes against the zealot rush bots for example, academy first with the 2 fac build, etc.).

I guess that without those few bugs I introduced Ecg would have been around 58~60% winrate instead of the 55-56% it got.

Even with the problems I'm happy with Ecg performance this year :D

Jay Scott on Saturday, January 19. 2019:

Oh, some bots also have updates only on BASIL. That is another reason. Thanks.

MicroDK on Saturday, January 19. 2019:

McRave had a bug that only manifested itself on the BASIL ladder, not on SSCAIT, not locally and not locally using sc-docker. He found out he had pointers to objects that were deleted. So memory handling of the local OS / computer can affect how bots run.

Bruce on Sunday, January 20. 2019:

The McRave bug could be reproduced locally on sc-docker. I think the only known Basil-only thing right now is PW’s static mineral thing mentioned above.

Marian on Saturday, January 19. 2019:

I would guess my low ranking is because of this bug:
http://www.openbw.com/replay-viewer/?rep=http://basilicum.bytekeeper.org/bots/JumpyDoggoBot/JumpyDoggoBot%20vs%20Marian%20Devecka%20Fighting%20Spirit%20CTR_E5F9535.rep
I have only seen it in ZvZ and it might affect basil more than sscait - I have to investigate more...

Bytekeeper on Saturday, January 19. 2019:

I did a quick analysis:
113 of the 116 marked crashes are caused by either of the 2 bot docker containers failing to start (or crashing so horribly that no result remains).
SC_DOCKER (even my fork) doesn't announce which bot failed, if one of them crashes immediately (or manages to crash its docker container). BASIL cannot determine the cause and will add a "crash" for both bots.
This might help bot authors to find real crashes. It also reminds me to fix this problem at some point.
Every time a bot is marked as crashed, its log files will be saved - yours are here: http://basilicum.bytekeeper.org/bots/Steamhammer/logs/

I checked a few logs and couldn't find a problem.
But I found "more work": The log files show a game was played but still one container crashed. This shouldn't happen, I'll have to investigate it further.
I'm glad it doesn't happen too often.

PS: Those kinds of crashes are not counted as losses, and the ELO rating will not be updated in that case.

Jay Scott on Saturday, January 19. 2019:

Maybe you can mark those games differently, not “crash” but “incomplete” or something.

Bytekeeper on Sunday, January 20. 2019:

I guess I could, but I'd rather "fix" it, ie. assign the crash correctly or fix a potential bug in sc-docker.

A small update on my analysis: I could reproduce the problem playing Steamhammer vs Flash a few times. It only seems to happen if Steamhammer wins against Flash.
"Sometimes" it seems to hang in a busy loop after it won. After 70 seconds sc-docker "detects" this as crashed game.

Jay Scott on Sunday, January 20. 2019:

Yes, ideally fix everything. I expect it’s hard to fix all infrastructure bugs, though.

Bytekeeper on Thursday, February 14. 2019:

Some Steamhammer based bots seem to run into a busy loop after they win. I observed that the game is actually over, the bot just hangs with 100% CPU.

I can't always reproduce it. But I also can't reproduce it with other bots at all.

Is there a way it could run into an endless loop? Maybe due to some file permission restriction within sc-docker/linux?

Jay Scott on Thursday, February 14. 2019:

Steamhammer’s behavior when it surrenders is: It prints “gg”, then waits 35 frames for the message to be read before ending the game with leaveGame(). It issues no actions during those frames and does no computation beyond recognizing that it has surrendered; the frames are completely empty. After leaving, onEnd() writes the updated opponent model and it’s done.

So any problems have to be due to either the wait period or to writing the opponent model. To me they seem equally likely suspects.
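As a rough illustration of the surrender sequence described above, here is a minimal sketch. It is not Steamhammer's actual code: FakeGame is a hypothetical stand-in for the game interface (its sendText() and leaveGame() mirror the BWAPI calls of the same names), the Surrender class and the exact frame on which the bot leaves are assumptions, and only the "gg" message and the 35-frame wait come from the description.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the game interface, so the sketch runs without BWAPI.
struct FakeGame {
    std::vector<std::string> chat;
    bool left = false;
    void sendText(const std::string & msg) { chat.push_back(msg); }
    void leaveGame() { left = true; }
};

constexpr int kSurrenderWaitFrames = 35;   // wait length taken from the description

// Hypothetical helper class; the real bot's structure may differ.
class Surrender {
public:
    // Call once, on the frame when the bot decides to give up.
    void begin(FakeGame & game, int frame) {
        assert(!surrendered_);
        surrendered_ = true;
        surrenderFrame_ = frame;
        game.sendText("gg");
    }

    // Call at the top of every frame. Returns true if the rest of the frame
    // should be skipped, which keeps the waiting frames empty.
    bool onFrame(FakeGame & game, int frame) {
        if (!surrendered_) {
            return false;
        }
        if (frame - surrenderFrame_ >= kSurrenderWaitFrames) {
            game.leaveGame();   // assumption: leave once the wait has elapsed
        }
        return true;
    }

private:
    bool surrendered_ = false;
    int surrenderFrame_ = 0;
};
```

In this sketch nothing runs between the "gg" and leaveGame() except the frame comparison, which matches the claim that the waiting frames are empty.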

Jay Scott on Thursday, February 14. 2019:

Also, for Steamhammer-based bots which retain the separate configuration file and haven’t messed with it too much, the opponent model can be turned off in configuration. Set WriteOpponentModel to false in the JSON. If you can still reproduce the problem after that, then the wait period is somehow causing a problem.

Bytekeeper on Thursday, February 14. 2019:

Ok. But the problem occurs only if Steamhammer wins. I saw that in onEnd the opponent file is written. But I doubt there's a "busy loop" hidden somewhere.
I believe onFrame might be called after onEnd, maybe something is hidden there.
This game was lost by Assberht, which also acknowledged it:
http://basilicum.bytekeeper.org/bots/Steamhammer

SC_Docker says "One lingering container has been found after single container timeout (70 sec), the game probably crashed"

I also got similar ones for Locutus and Randomhammer. My log config is a bit "shallow" - so this is in the span of 1-2 days.

Nevertheless, every time a lingering bot was found, it was a Steamhammer based bot. That doesn't mean there's a problem in there per se, but it makes it very hard for me to analyze further.

Bytekeeper on Thursday, February 14. 2019:

The author of Locutus confirms it for his local tests (docker). He tried to debug it but couldn't find the reason.
He thinks the VMs other ladders/tourneys use ignore the hanging and just shut down the VM. Maybe that's why it doesn't affect them.

Jay Scott on Thursday, February 14. 2019:

Hmm, I will look into it as I have time and see if I turn anything up. Winning is simpler than losing (strange as it is!) and I don’t know what problem it could cause.

Jay Scott on Thursday, February 14. 2019:

There is one other check you could make: Does it happen for UAlbertaBot = Dave Churchill, Steamhammer’s parent? There are UAlbertaBot forks which are outside the Steamhammer family, like Wuli and Flash, so I’m guessing you would have seen if they had the same problem, but I would like to be sure. That should narrow it down.

Bytekeeper on Thursday, February 14. 2019:

I've seen Microwave as well. I will increase the logs being kept and check for the bots you mentioned in a few days. That should be enough time that at least one of them might fail.

Bruce Nielsen is also trying to debug again, thanks for that!

Jay Scott on Thursday, February 14. 2019:

Hmm, Microwave has independently written I/O. It is from long before Steamhammer’s opponent model. Maybe it’s something totally unexpected....

Bytekeeper on Thursday, February 14. 2019:

I will observe as promised. But I can already tell it's highly unlikely David Churchill is affected:
http://basilicum.bytekeeper.org/bots/Dave%20Churchill/logs/?C=M;O=D

The first 2 are too large yes, but the other crashes are pretty "old" and it seems mostly game timeouts. But since some logs are not uploaded because they had too much output there is a chance for them to have the issue.

Jay Scott on Thursday, February 14. 2019:

I found a couple trivial issues with surrendering that could conceivably cause trouble, but so far nothing with winning. I’ll look more.

Bruce on Friday, February 15. 2019:

It is a known issue (MicroDK mentioned it on here sometime last year), but as it only seems to matter on sc-docker it hasn’t really mattered. In my local testing setup I added logic to detect the hanging container and kill it.

For debugging it I added logging to all the handlers. The bot correctly processed the onEnd handler and onFrame is not called again afterwards. I’ve also tested with all file i/o disabled except loading the configuration file and it still happens, so I don’t think it is related to the actual end-of-game logic. It’s more likely to be some hanging resource preventing clean shutdown, though with file i/o disabled I’m not sure what that would be.

I’m going to try to see if I can get a container to stay hanging for long enough to shell into it and see what I can find there.

Jay Scott on Friday, February 15. 2019:

A hanging resource... that makes me think of the timer manager.

Jay Scott on Friday, February 15. 2019:

No, after looking into how the timers work, I don’t think that can be it.

Bruce on Saturday, February 16. 2019:

I think I’ve tracked it down: the destructor of Squad calls clear(), which will call into WorkerManager if there is a worker in the squad. This is potentially problematic, as WorkerManager itself might already be destructed at that point.

I made a quick hack that exits the destructor immediately if the game is over (just checking a global bool that is set in onEnd), and it seems to be working: no issues after running about 40 test games locally. I just uploaded it to sscait, so we’ll see how it goes when basil is updated.
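A minimal sketch of the kind of guard described here, under stated assumptions: the class and member names below are simplified, hypothetical stand-ins rather than the real Locutus or Steamhammer code, and the global flag name is invented. The point is only that the destructor skips its clean-up once the game is over, so it never calls into a manager that may already be gone.

```cpp
#include <cassert>
#include <set>

// Hypothetical stand-in for the worker manager.
struct WorkerManager {
    int releasedWorkers = 0;
    void finishedWithWorker(int unitId) {
        assert(unitId >= 0);
        ++releasedWorkers;
    }
};

// Assumed name for the global flag; the bot would set it in onEnd().
bool gameOver = false;

class Squad {
public:
    explicit Squad(WorkerManager & wm) : workers_(wm) {}

    void addWorker(int unitId) { workerIds_.insert(unitId); }

    // Hand every worker in the squad back to the worker manager.
    void clear() {
        for (int id : workerIds_) {
            workers_.finishedWithWorker(id);
        }
        workerIds_.clear();
    }

    ~Squad() {
        if (gameOver) {
            return;   // shutdown: the manager may already be destructed
        }
        clear();      // normal mid-game clean-up
    }

private:
    WorkerManager & workers_;
    std::set<int> workerIds_;
};
```

During the game a destroyed squad still releases its workers; after onEnd() sets the flag, the destructor does nothing.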

Jay Scott on Saturday, February 16. 2019:

Great work! In principle the error could affect other modules. One fix would be to not write destructors for data structures whose scope is the whole game, which involves making sure all uses really are permanent. That’s not far from how things stand already. Another would be to unmake some data structures earlier, maybe in onEnd().

Jay Scott on Saturday, February 16. 2019:

I implemented the fix of clearing all squad data in onEnd(). It’s a little more involved but cleaner.
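A minimal sketch of this second approach, again with hypothetical, simplified names (Bot, SquadData, and clearSquadData() are assumptions for illustration, not the real code). The squads are emptied in onEnd(), while the worker manager is certainly still alive, so any destructor that runs later finds nothing left to release.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <set>

// Hypothetical stand-in for the worker manager.
struct WorkerManager {
    int released = 0;
    void finishedWithWorker(int unitId) {
        assert(unitId >= 0);
        ++released;
    }
};

class Squad {
public:
    explicit Squad(WorkerManager & wm) : workers_(wm) {}
    void addWorker(int unitId) { ids_.insert(unitId); }
    void clear() {
        for (int id : ids_) {
            workers_.finishedWithWorker(id);
        }
        ids_.clear();
    }
    ~Squad() { clear(); }   // has nothing to do if clear() already ran
private:
    WorkerManager & workers_;
    std::set<int> ids_;
};

class SquadData {
public:
    explicit SquadData(WorkerManager & wm) : workers_(wm) {}
    // std::list never relocates its elements, so no squad is destructed early.
    Squad & addSquad() {
        squads_.emplace_back(workers_);
        return squads_.back();
    }
    void clearSquadData() { squads_.clear(); }   // destructs every squad now
    std::size_t size() const { return squads_.size(); }
private:
    WorkerManager & workers_;
    std::list<Squad> squads_;
};

struct Bot {
    WorkerManager workers;
    SquadData squads{workers};
    // Tear down squad state while every manager is still valid.
    void onEnd() { squads.clearSquadData(); }
};
```

The destructor is unchanged in this version; the safety comes from controlling when it runs rather than from a flag.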

MicroDK on Saturday, February 16. 2019:

UAlbertaBot does the same thing so it should also be affected: https://github.com/davechurchill/ualbertabot/blob/master/UAlbertaBot/Source/Squad.cpp#L305

Jay Scott on Saturday, February 16. 2019:

Yes, it is the original behavior. Whether it causes a problem in practice depends on fine details of the unwinding process. :-/

MicroDK on Sunday, January 20. 2019:

Microwave does the same thing. I can only see 3 real crashes in its logs.

krasi0 on Wednesday, January 23. 2019:

Another difference in the ranking (at least during the SSCAIT competition) could be explained by SSCAIT using win % to rank players and BASIL using ELO.
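To make the difference concrete, here is a minimal sketch of the standard Elo formulas. The K-factor of 32 and the example ratings are assumptions for illustration; BASIL's actual parameters are not stated here. Under Elo, a win over a stronger opponent moves the rating more than a win over a weaker one, while a plain win percentage counts both the same.

```cpp
#include <cassert>
#include <cmath>

// Expected score of a player rated `ra` against an opponent rated `rb`.
double expectedScore(double ra, double rb) {
    return 1.0 / (1.0 + std::pow(10.0, (rb - ra) / 400.0));
}

// Rating after one game. `score` is 1 for a win, 0.5 for a draw, 0 for a loss.
// The default K-factor is an assumed value, not BASIL's.
double updatedRating(double rating, double opponent, double score, double k = 32.0) {
    assert(score >= 0.0 && score <= 1.0);
    return rating + k * (score - expectedScore(rating, opponent));
}
```

For example, with these assumed numbers a 1500-rated bot gains 16 points for beating another 1500-rated bot, more than that for beating a 1900-rated bot, and less for beating a 1100-rated one, even though each win adds the same amount to its win percentage.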
