Overkill’s new learning 4 - the model and its features
What does it mean to have a linear model with binary features? “Linear” means that each feature comes with a number, its weight, so that with binary features you find Q(s,a) by adding up the weights for each feature that is present. Usually only a small proportion of all the features are present, so it’s not as crazy as it may sound.
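To make the "adding up the weights" idea concrete, here is a minimal sketch. It is not Overkill's code: the names Weights, ActiveFeatures and linearQ are made up for illustration, and it assumes a feature is either present or absent.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical types for illustration only (not from Overkill).
using Weights = std::map<std::string, double>;   // feature name -> learned weight
using ActiveFeatures = std::set<std::string>;    // names of the features that are present

// With binary features, Q is the sum of the weights of the present features.
// A feature that has no learned weight yet contributes nothing.
double linearQ(const Weights& weights, const ActiveFeatures& present)
{
    double q = 0.0;
    for (const std::string& name : present)
    {
        auto it = weights.find(name);
        if (it != weights.end())
        {
            q += it->second;
        }
    }
    return q;
}
```

The loop runs over the features that are present, not over the whole weight table, which is why a model with thousands of features stays cheap to evaluate.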
Overkill gives its features long multi-part names, which it implements throughout as strings accessed via maps. (I was surprised to see that in a real-time program, but it’s probably easier.) The feature names are written out plainly in the I/O files. Here are a few scattered samples from the file feature_valueAiur, which lists 9638 features altogether:
action_battle_combine_state_battle_feature:enemyKeyBuilding_hasP_robotics_facility*hydraBuild:0.13396
action_battle_combine_state_battle_feature:enemyKeyBuilding_hasP_robotics_facility*mutaBuild:0.07588
action_battle_combine_state_battle_feature:enemyKeyBuilding_hasP_robotics_facility*zerglingBuild:0.06963
action_battle_combine_state_battle_feature:enemyKeyBuilding_hasP_stargate*hydraBuild:0.05439
action_battle_combine_state_battle_feature:enemyKeyBuilding_hasP_stargate*mutaBuild:0.10049
action_battle_combine_state_battle_feature:enemyKeyBuilding_hasP_stargate*zerglingBuild:0.26210
state_raw_combine_feature:enemyP_cannon_1*ourHydra_6:-0.21410
state_raw_combine_feature:enemyP_cannon_1*ourHydra_12:-0.43786
state_raw_combine_feature:enemyP_cannon_1*ourHydra_18:-0.08806
state_raw_combine_feature:enemyP_cannon_1*ourHydra_24:0.24174
state_raw_combine_feature:enemyP_cannon_1*ourHydra_36:0.42465
state_raw_combine_feature:enemyP_cannon_1*ourHydra_48:0.39939
state_raw_combine_feature:enemyP_cannon_1*ourHydra_60:0.52629
state_raw_combine_feature:enemyP_cannon_1*ourHydra_max:0.59403
state_tech_feature:ourKeyUpgrade_zerglingsAttackSpeed:2.33542
state_tech_feature:ourTechLevel_hatchery:2.28803
state_tech_feature:ourTechLevel_lair:0.25170
state_tech_feature:ourTechLevel_hive:1.48611
You can guess what the feature names mean: Enemy has 1 cannon and we have up to 6 hydralisks, for example. That’s how it got so many features!
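Judging only from the samples above, each entry looks like category:name:weight. Here is a hedged sketch of splitting one entry into those three parts. FeatureEntry and parseFeatureEntry are hypothetical names, and the format is my assumption from the samples, not something taken from Overkill's own reader.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical record for one entry of the feature file.
struct FeatureEntry
{
    std::string category;  // e.g. "state_tech_feature"
    std::string name;      // e.g. "ourTechLevel_lair"
    double weight = 0.0;   // e.g. 0.25170
};

// Splits "category:name:weight" at the first and last colon.
// Returns false if the entry has fewer than two colons or the weight is not a number.
bool parseFeatureEntry(const std::string& entry, FeatureEntry& out)
{
    const std::size_t first = entry.find(':');
    const std::size_t last = entry.rfind(':');
    if (first == std::string::npos || first == last)
    {
        return false;
    }
    try
    {
        out.weight = std::stod(entry.substr(last + 1));
    }
    catch (const std::logic_error&)
    {
        // std::stod throws invalid_argument or out_of_range, both logic_errors.
        return false;
    }
    out.category = entry.substr(0, first);
    out.name = entry.substr(first + 1, last - first - 1);
    return true;
}
```

Splitting at the first and last colon, rather than at every colon, keeps the parse working even if a feature name were ever to contain a colon of its own.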
Each opponent’s file seems to list a different number of features, probably leaving out features that never came up, so 9638 is not the total number of features. But there’s something here I don’t understand. 9638 is not divisible by 3. Each line gives one weight—shouldn’t there be 3 weights for each state, so that the 3 actions can all be evaluated?
Here’s the routine that calculates Q(s,a). Its arguments are reversed—it puts the action before the state.
double StrategyManager::calActionFeature(std::string curAction, std::map<std::string, std::map<std::string, int>>& features)
{
    for (auto categoryStateFeature : features)
    {
        if (categoryStateFeature.first == "state_raw_combine_feature" || categoryStateFeature.first == "state_building_feature")
        {
            for (auto stateFeature : categoryStateFeature.second)
            {
                std::string combineFeatureName = stateFeature.first + "*" + curAction;
                features["action_battle_combine_state_battle_feature"][combineFeatureName] = 1;
            }
        }
    }

    if (features["state_tech_feature"].find("ourKeyUpgrade_zerglingsAttackSpeed") != features["state_tech_feature"].end())
    {
        std::string combineFeatureName = std::string("ourKeyUpgrade_zerglingsAttackSpeed") + "*" + curAction;
        features["action_battle_combine_state_battle_feature"][combineFeatureName] = 1;
    }

    double curQValue = 0;
    for (auto categoryFeature : features)
    {
        for (auto curfeature : categoryFeature.second)
        {
            int curfeatureValue = curfeature.second;
            if (parameterValue.find(categoryFeature.first) != parameterValue.end() && parameterValue[categoryFeature.first].find(curfeature.first) != parameterValue[categoryFeature.first].end())
            {
                double curParameterValue = parameterValue[categoryFeature.first][curfeature.first];
                curQValue += curParameterValue * curfeatureValue;
            }
        }
    }
    return curQValue;
}
parameterValue holds the model: it maps a feature category to a feature name to a weight. curAction is the action, and the features map with its nested type is the state. Having read this, I still don’t understand. The action name is coded into some feature names and not others; the code above does it by appending "*" + curAction to the feature name. The list of actions:
stateActions = {"zerglingBuild", "hydraBuild", "mutaBuild"};
Here’s the call, the bit of code which chooses the action with the highest Q value. (Below this is another bit where it changes the action if it feels like exploring.)
for (auto action : stateActions)
{
    std::map<std::string, std::map<std::string, int>> actionFeatureValue = featureValue;
    double curQValue = calActionFeature(action, actionFeatureValue);

    if (curQValue > maxQValue)
    {
        maxQValue = curQValue;
        maxAction = action;
        maxFeatureValue = actionFeatureValue;
    }
}
The call does nothing to differentiate actions. As far as I can tell, only the features which include the action in their names can be used to tell actions apart, and the other features are irrelevant constants that happen to be added in.
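Here is a small sketch of why the action-independent features can't change the choice. It is an illustration, not Overkill's code: bestAction is a made-up helper that splits Q into a shared part (the same for every action) and an action-specific part.

```cpp
#include <map>
#include <string>

// Hypothetical helper, not Overkill code: returns the action whose total
// (shared part + action-specific part) is highest, or "" if there are no actions.
std::string bestAction(const std::map<std::string, double>& actionSpecificQ,
                       double sharedQ)
{
    std::string best;
    double bestQ = 0.0;
    bool haveBest = false;
    for (const auto& entry : actionSpecificQ)
    {
        // sharedQ is identical for every action, so it shifts all totals equally.
        const double q = sharedQ + entry.second;
        if (!haveBest || q > bestQ)
        {
            bestQ = q;
            best = entry.first;
            haveBest = true;
        }
    }
    return best;
}
```

Feed it the three stargate weights from the sample above (0.05439 for hydraBuild, 0.10049 for mutaBuild, 0.26210 for zerglingBuild) and it picks zerglingBuild whether sharedQ is 0 or 2.33542. The shared part raises or lowers all three totals together, so apart from floating-point rounding on near ties, it never changes which one is largest.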
$ grep hydraBuild feature_valueAiur | wc -l
2176
$ grep mutaBuild feature_valueAiur | wc -l
2267
$ grep zerglingBuild feature_valueAiur | wc -l
2403
So 2176+2267+2403 = 6846 features out of 9638 encode the build name in the I/O file for AIUR. As far as I can tell, the other 2792 features are irrelevant. And those 2792 features include some that look important. Surely you want to pay attention to what upgrades you have when you choose which units to make!
The number of features is different for each action. That means two things. 1. The fact that the total number of features is not divisible by 3 is meaningless. 2. Not all actions have been explored in the different states. As expected, the games played against AIUR were not enough to fill in the model.
Either I’ve misunderstood something, or Overkill’s learning has flaws. (I wouldn’t go so far as to say bugs; it is only a loss of effectiveness, not an error.) Can anybody correct me? I’ll contact Sijia Xu.
Next: How it fits into the rest of the program.