AI cheats! (with real data)


Originally posted by Maastrictian:

That gives a P value of .0365, different from the number Warren Peace got, but still significant. Scientists generally use .05 as a cutoff point between meaningful and non-meaningful data.

Great analysis! I believe you're confusing level of significance (commonly accepted as 0.05) with the probability of a Type I error (p-value). The commonly accepted value for significance in the p-value is 0.001 IIRC. It's easy to confuse the two, even if you deal with statistics on a somewhat regular basis.

Hehe...I'm sure someone will correct me if I'm wrong. redface.gif

Link to comment
Share on other sites

This must be the BEST all around, ALL PRO, big time THREAD for the Statistics Grogs amongst us. :D

I know NOTHING about stats but it is fun to read the comments, just spare us the math equations please ;) .

Maybe some more testing needs to be done and some more results collected for further Statistical analysis :confused:

-tom w

[ October 30, 2002, 12:59 PM: Message edited by: aka_tom_w ]

Link to comment
Share on other sites

Originally posted by Warren Peace:

Test is a six-lane gunnery range (tall pines between lanes). I have six StugIIIG's vs. Six T34/85. All with regular crews. Range is about 730 meters.

I'm assuming by lanes Warren means that a T-34 would only be able to shoot at one StuG III. That would eliminate turret rotation as a variable, assuming that the tanks started facing each other.

Were the tanks immobilized? That would keep them from moving and changing the hit %.

Although I did not follow this rigorously, it appeared that the AI's first shot had a higher hit % than the human first shot, and that this may be the cause of the difference.

Let the discussion begin!

The thing to do would be to set the test up so that one side had one shot of ammo. Then measure the number of 1st shot hits. The other side would just be targets with no ammo.
Link to comment
Share on other sites

No statistician here so I'll let the numbers grogs fight that battle, however I do believe the culprit here is the manual targeting bit. If Warren, or someone else (OK, I'll admit I'm too lazy) could run this same test sans input on the targeting, I'd like to see the results.

Link to comment
Share on other sites

Guest Sgt. Emren

Mannheim,

Warren mentions lanes of pines. So I just assumed that each pair of tanks were cut off from the next pair. If that is wrong, then I agree that the sample is too small. Otherwise I see it as six independent duels each time, played out twelve times. 36 samples from each side.

[ October 30, 2002, 01:02 PM: Message edited by: Sgt. Emren ]

Link to comment
Share on other sites

Originally posted by jgdpzr:

No statistician here so I'll let the numbers grogs fight that battle, however I do believe the culprit here is the manual targeting bit. If Warren, or someone else (OK, I'll admit I'm too lazy) could run this same test sans input on the targeting, I'd like to see the results.

I agree. This gets down to the problem that several others and I alluded to. There are too many variables being tweaked at once. If you keep the TacAI in charge of "calling the shots", you'll at least rule out that possibility.
Link to comment
Share on other sites

Originally posted by Sgt. Emren:

Mannheim,

Warren mentions lanes of pines. So I just assumed that each pair of tanks were cut off from the next pair. If that is wrong, then I agree that the sample is too small. Otherwise I see it as six independent duels each time, played out twelve times. 36 samples from each side.

That could be, now that I reread his setup. This still doesn't answer the TacAI issue, which I now feel to be the culprit even more than before.
Link to comment
Share on other sites

Is it possible that Global Morale is affecting the outcome? Is it more accurate to use 72 single 1-on-1 tank duels, where Global Morale is not an issue, or should you do 6 12-on-12 battles, where the losing side's performance should decrease (snowball, really) as they start to fall behind in the duels? Does Global Morale affect Tank AT performance?

Link to comment
Share on other sites

Originally posted by Maastrictian:

Each duel is separate because he says: "Test is a six-lane gunnery range (tall pines between lanes)"

So each lane can't see the enemy tanks in the other lanes.

--Chris

But he also says:

Played until one side or another had all vehicles destroyed.
This can only be accomplished if all vehicles have LOS to all other vehicles. A little clarification would help.
Link to comment
Share on other sites

Originally posted by Mannheim Tanker:

Originally posted by Maastrictian:

That gives a P value of .0365, different from the number Warren Peace got, but still significant. Scientists generally use .05 as a cutoff point between meaningful and non-meaningful data.

Great analysis! I believe you're confusing level of significance (commonly accepted as 0.05) with the probability of a Type I error (p-value). The commonly accepted value for significance in the p-value is 0.001 IIRC. It's easy to confuse the two, even if you deal with statistics on a somewhat regular basis.

Link to comment
Share on other sites

AcePilot -- it's possible, if he allowed vehicles to move from lane to lane after destroying their opponents in their own lanes. Of course, this completely annihilates the "independent trial" assumption and drops his number of tests WAY below the recommended 30. It would also mean that he'd be estimating the hypothesis (the ratio of kills seen by, say, playing one side) on just six observations...

Link to comment
Share on other sites

Damn you all. Caught me peeking during my lunch time...

Create a map that uses cliffside elevations to construct as many distinct firing "lanes" as you like. No visibility outside the firing lane possible. Scratch one confounding variable.

Place the "test subject" armor, immobilized at a certain location in each of the lanes. Obviously, a firing line setup would be the best. Outfit each test subject with the same vehicle, same crew experience, and a single shot of the same AP.

Line up your target vehicles, immobilized, at a given distance from the firing line down each of the firing lanes.

Each lane becomes a distinct trial given the same vehicles, same terrain (none is the ideal), same ammo.

Shoot. Tally. Rinse. Repeat.

One set of trials (maybe 5 runs of 6 shooters at a time) letting the AI target the shooters. Second set of trials exactly the same except that you manually target the shooter.

Then compare your results. IIRC, it's a Student's T or sumfink like that to test the means of two test groups against each other.

If you want to, reverse the roles of the target and shooter armor units and repeat the whole process again.
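
For anyone who wants to crunch the tallies from a one-shot setup like this, here is a minimal sketch. Since each shot is just a hit or a miss, it swaps the Student's t mentioned above for a two-proportion z-test, which is a common choice for comparing two hit rates. The counts below are hypothetical placeholders (5 runs of 6 shooters per group), not results from any actual test, and the function name is made up for illustration.

```python
import math


def two_proportion_z_test(hits_a, shots_a, hits_b, shots_b):
    """Two-sided z-test for a difference between two hit rates.

    Returns (z, p_value). Assumes every shot is an independent trial,
    which is what the separate firing lanes are meant to guarantee.
    """
    rate_a = hits_a / shots_a
    rate_b = hits_b / shots_b
    # Pooled hit rate under the null hypothesis "both groups hit equally often".
    pooled = (hits_a + hits_b) / (shots_a + shots_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / shots_a + 1 / shots_b))
    if std_err == 0:
        # All hits or all misses in both groups: no evidence of a difference.
        return 0.0, 1.0
    z = (rate_a - rate_b) / std_err
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# HYPOTHETICAL tallies: 5 runs x 6 shooters = 30 first shots per group.
ai_hits, ai_shots = 15, 30          # AI picks the targets
human_hits, human_shots = 11, 30    # human picks the targets

z, p = two_proportion_z_test(ai_hits, ai_shots, human_hits, human_shots)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

With only 30 shots per group the normal approximation is rough, so treat the p-value as a guide rather than a verdict; more runs would tighten it up.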

Link to comment
Share on other sites

Originally posted by Mud:

AcePilot -- it's possible, if he allowed vehicles to move from lane to lane after destroying their opponents in their own lanes. Of course, this completely annihilates the "independent trial" assumption and drops his number of tests WAY below the recommended 30. It would also mean that he'd be estimating the hypothesis (the ratio of kills seen by, say, playing one side) on just six observations...

Good point - yet another possible explanation. Warren - can you clear up this point for us? Thanks.
Link to comment
Share on other sites

Guest Sgt. Emren

To limit the effect of global morale, place an appropriate number of units far away from the shooting lanes, for both sides. Morale seems to be cumulative, so with more units in the setup, losing a bunch of tanks should hardly affect the morale of the remaining firing tanks.

Link to comment
Share on other sites

Not to be a wet blanket, but I don't see how much we are going to get out of this without detailed knowledge of how the CM shooting algorithm works.

That said, Warren's test appears to be an interesting preliminary result that would suggest some further investigation. We should not, however, begin to make any sort of "The AI Cheats!!!" conclusion from his statistics. First off, since this was not a perfectly controlled experiment, I would look at the results in an econometric light.

From that angle, it seems pretty likely that we have a lot of unobserved forces at work here besides our observed explanatory variable. Luckily, since this is a computer simulation, we can do our own.

So, I would suggest that Warren do a Monte Carlo study, varying all of the possible variables that could affect the outcome. Which brings me back to my original point: how could we possibly know all of the variables that contribute to the outcome of the experiment if we don't have the program's code sitting there? Maybe if we sat there with Steve and Charles all day running experiments we could figure this out.

Until then, a strong correlation between these variables should be seen as just that and nothing more!

Edited a second time cause I am retarded:

Edited cause I saw Herr Oberst's post: There you go, he has set up the simulation conditions for independent testing. I am interested in the results!

[ October 30, 2002, 01:47 PM: Message edited by: Lumbergh ]

Link to comment
Share on other sites

Herr Oberst has the right idea on how to set up the map. It should eliminate most of the variables (targeting, global morale, etc) that we are complaining about.

Seeing as we are only talking about 1st shot hits, does it matter what the target vehicle is?

I can't watch TV tonight so I'll try to set it up and see how it goes.

Link to comment
Share on other sites

Ok, I had a few minutes to kill so I did my own test. I used StuG III G(late) vs T-34/85 1944 model.

The map was essentially all tall pines at elevation 12, with lanes one tile wide running east-west. My test had 10 lanes. The lanes were separated by 100m of tall pines. I also put a tile of tall pines at the edge of each lane to keep vehicles from escaping off the map, and put a single elevation 0 tile one tile into each lane from each end to keep the vehicles from advancing. This limited each vehicle to a single 20m x 20m tile. No vehicle could see any other vehicle not in its lane.

For the curious, range between each pair was 757 meters; the stated hit chance for the StuG (to hit the T-34) was 43%, and the hit chance for the T-34 was 38%.

I ran the test scenario 10 times, 5 "playing" as Axis, and 5 as Allied. I did _not_ issue any targeting orders; I let the AI do that. Each test consisted of a single CM turn. After the test, I counted "kills" as a tank that was either knocked out/abandoned, in the process of bailing out, or otherwise soon to be knocked out (e.g., vehicle morale broken or gun damaged, with the opposing vehicle still manned and able to fire).

Results:

(Human as Axis)

Trial   StuGs killed   T-34s killed
1       5              5
2       6              4
3       6              5
4       2              9
5       4              8
Total   23             31

(Human as Allied)

Trial   StuGs killed   T-34s killed
6       3              7
7       3              7
8       6              6
9       4              6
10      3              8
Total   19             34

My hasty calculations give a probability of 0.389. That is, there is not a significant deviation from chance (of the side the human plays affecting the outcome).
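
The method behind that figure isn't stated, so the following is only one plausible way to check the totals: a plain chi-square test of independence on the 2x2 table of kill totals (which side the human played vs. which vehicle type was killed). It is a sketch using only the Python standard library, and it is not guaranteed to reproduce the 0.389 above. It also treats every kill as an independent observation, which is an assumption.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table.

    Returns (chi2, p_value). No continuity correction is applied.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom the upper tail is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Kill totals from the two result tables:
#                 StuGs killed, T-34s killed
kills = [
    [23, 31],  # human as Axis
    [19, 34],  # human as Allied
]

chi2, p = chi_square_2x2(kills)
print(f"chi-square = {chi2:.3f}, p = {p:.3f}")
```

Whatever the exact number, a p-value well above 0.05 here would point the same way as the conclusion above: no significant effect from which side the human plays.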

One very interesting observation: I did not gather statistics on this (yet), but my strong impression is that the AI side fired first in virtually all engagements. That is, in the "human as allied" trials, the StuGs got the first shot, and in the "human as axis" trials, the T-34s got the first shot. Given the relatively high hit probability and lethality at the test ranges, a discrepancy here could very well have a significant effect. I suggest that to gather data on this, you view the firing range from above using view 9 and watch the smoke plumes. Again, my impression is that most or all of the AI tanks fired before any "human" tanks fired, regardless of nationality.
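
To get a feel for how much shooting first could matter, here is a rough sketch of a deliberately simplified duel. It assumes the two tanks strictly alternate shots, the hit chance stays fixed at the stated 43% (StuG) and 38% (T-34), and the first hit ends the duel. None of that is known to match how CM actually resolves fire, so read the output as an illustration only. Under those assumptions the first shooter wins with probability p1 / (1 - (1 - p1)(1 - p2)), and the simulation is just a sanity check on that formula.

```python
import random


def first_shooter_win_prob(p_first, p_second):
    """Closed-form chance that the tank firing first wins an alternating duel."""
    return p_first / (1 - (1 - p_first) * (1 - p_second))


def simulate_duel(p_first, p_second, rng):
    """Play one duel; return True if the first shooter gets the first hit."""
    while True:
        if rng.random() < p_first:
            return True
        if rng.random() < p_second:
            return False


def estimate_first_shooter_wins(p_first, p_second, trials=100_000, seed=1):
    """Monte Carlo estimate of the first shooter's win rate."""
    rng = random.Random(seed)
    wins = sum(simulate_duel(p_first, p_second, rng) for _ in range(trials))
    return wins / trials


P_STUG = 0.43  # stated hit chance, StuG vs T-34
P_T34 = 0.38   # stated hit chance, T-34 vs StuG

stug_first = first_shooter_win_prob(P_STUG, P_T34)
t34_first = first_shooter_win_prob(P_T34, P_STUG)
print(f"StuG fires first: StuG wins {stug_first:.3f} (exact), "
      f"{estimate_first_shooter_wins(P_STUG, P_T34):.3f} (simulated)")
print(f"T-34 fires first: StuG wins {1 - t34_first:.3f} (exact), "
      f"{1 - estimate_first_shooter_wins(P_T34, P_STUG):.3f} (simulated)")
```

In this toy model the StuG's win chance moves by roughly 25 percentage points depending on who shoots first, so a systematic first-shot edge for the AI side would be well worth measuring directly.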

Link to comment
Share on other sites
