AI cheats! (with real data)



The test is a six-lane gunnery range (tall pines between lanes). I have six StuG IIIGs vs. six T-34/85s, all with regular crews. Range is about 730 meters.

I played six times as Axis and six times as Allies, each game until one side or the other had all its vehicles destroyed. Whichever side I was playing, I would manually target the enemy vehicle on the first turn.

I found the following aggregate data.

As the Allies I lost 30 T-34/85s and the Axis lost 8 StuG IIIs. (The numbers don't add up to 36 because of occasional double kills.)

As the Axis I killed 17 T-34/85s and lost 20 StuG IIIs.

The P value using a chi square test is 0.003.

Although I did not track this rigorously, it appeared that the AI's first shot had a higher hit percentage than the human's first shot, and that this may be the cause of the difference.

Let the discussion begin!
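[Editor's note: Warren doesn't spell out his arithmetic, but his 0.003 is consistent with a plain 2x2 contingency-table chi-square (no continuity correction) on the aggregate kill counts. A standard-library sketch of that calculation follows; it is a reconstruction, not necessarily the exact method he used.]

```python
# Pearson chi-square on Warren's aggregate kill table, using only the
# standard library. Rows: human plays Allies, human plays Axis.
# Columns: T-34s lost, StuGs lost.
import math

observed = [[30, 8], [17, 20]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (obs - expected) ** 2 / expected

# Upper tail of the chi-square distribution with 1 degree of freedom:
# P(X > x) = erfc(sqrt(x / 2)).
p = math.erfc(math.sqrt(chi2 / 2))

print(f"chi2 = {chi2:.3f}, p = {p:.4f}")  # chi2 ≈ 8.727, p ≈ 0.0031
```

Run without any correction for continuity, this lands almost exactly on the 0.003 Warren reports.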



Guest Sgt. Emren

Warren,

As you yourself point out, a higher hit percentage from cheating may be the cause of your result. But there are a lot of variables in each shot computation: no shot fired is exactly identical to the previous one, and no two hits are identical.

Thus, you cannot attribute your result to a single factor - "the AI cheats" - using this particular method.


Hmmmm

I just tossed a coin 6 times, and it came up heads 4 times.

I then tossed it 6 times again, and heads came up 5 times.

Is the head side cheating? ;)

I agree with CMPlayer...your sample is too small. 50 times from each side would be a better test IMO.

Harv
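[Editor's note: Harv's coin-toss intuition is easy to put numbers on. Even 5 heads in 6 fair tosses carries roughly an 11% tail probability, nowhere near significant, which is exactly why tiny samples can't show "cheating". A quick sketch:]

```python
# Probability of getting at least 5 heads in 6 fair coin tosses.
from math import comb

n = 6
p_tail = sum(comb(n, k) for k in range(5, n + 1)) / 2 ** n  # 7 of 64 outcomes

print(f"P(>=5 heads in 6 tosses) = {p_tail:.4f}")  # 0.1094
```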


hummm, this one bothers me.

OK on the P value - I have some stats background, no need to have 100 tanks firing at each other 100 times :eek:!

But I don't think it's due to AI cheating or an inflated hit percentage... The idea I have is that you tried to optimize hit/kill chances when playing, maybe forgetting that the MOST important factor in this case is shooting FIRST - I think the hit percentages are pretty high even on the first shot at this distance.

The AI does that perfectly, but us poor humans are given no indication of swivel/aim time...

What do you think of this Warren ?


Guest Sgt. Emren

The sample size is fine. 6 lanes played out 12 times equals 72 duels.

The problem is that there are a number of variables involved in every shot. The first shot will be different every time.


This pseudo-science gibberish has more holes in it than a Tetley tea bag. First of all his study doesn't have any blinds, he hasn't identified any independent variables, and it doesn't replicate easily.

Anybody can do stats, but only the pros can come up with bullet-proof research models.

This research has as many holes in it as a T-37.


Warren Peace wrote:

The P value using a chi square is 0.003
P VALUE! Good lard Son, most folk here barely understand how to spell "lose". Or is it loose? And a chi square of 0.003?!? What the devil is that? Look, get them tanks in a hull down position and stop fretting over what shell bounces off of which turrent. Else, your gonna loose to a feller who knows how to hunt hull down instinctively without using no T-square.

[ October 30, 2002, 11:54 AM: Message edited by: Bruno Weiss ]


Originally posted by Sunflower Farm Boy:

This pseudo-science gibberish has more holes in it than a Tetley tea bag. First of all his study doesn't have any blinds, he hasn't identified any independent variables, and it doesn't replicate easily.

Anybody can do stats, but only the pros can come up with bullet-proof research models.

This research has as many holes in it as a T-37.

Yeah, typical troll answer: you have no explanation for why the tests gave these results, so you bash the tester for "methodological" reasons...

Tell me, how would you test game fairness better than playing the same situation from both sides, Mr. Professional?

I don't think Warren's conclusion is right either, but now we have to find out WHAT happened, not bash him with the "pseudo-science gibberish" you seem to master so well :mad:


Giving this a little more (serious) thought...

As it is possible to use the same equipment on both sides now, a matchup between like units would remove another variable from the tests.

I'm no statistician, but variables are bad...right?

Has anyone pitted T-34s against captured T-34s? Panther vs Panther? Mk IV vs Mk IV?


Harv & CM Player, you guys really don't understand statistics. Warren Peace may have failed somehow to set up the experiment in exactly the right way, but the results he obtained would only be obtained by sheer chance with two equal foes three times in 1000. The only reasonable conclusion is that under his experimental conditions these were not two equal foes.

I'm befuddled by a lot of things, I'll admit. But I did, at one point, teach statistics to graduate students at UC Berkeley and spent many years in various universities doing experimental research. Warren Peace is correct about his interpretation of the p value.

As for Sunflower Farm Boy's comments... "no blind"? "no independent variables"? I have to hope you are engaging in levity. I'll assume you are until further evidence. Warren Peace's basic experimental design was fine.

The whole process of Normal Science can now go forward. Warren Peace's informed detractors will first attempt to find flaws in the experimental design. Second, they will argue for alternative explanations of his data. Third, they will attempt to modify the theory his data threatens so that it can accommodate the new findings. Finally, when all of this fails, they will adopt the stance of, "Yes. Of course Warren Peace's experimental results are correct. Everyone knows that. Only a fool such as yourself would even raise this as an issue. Don't waste my time." His uninformed detractors will engage in ad hominem attacks on Warren Peace and anyone who seems to support him.

Ahhh... Normal Science. What a life!

-- Lt. Kije

"Finally! Something I know something about!"


Guest Sgt. Emren

Look, to do this right, you'd need to define more clearly what is being examined. I don't believe it's very sensible to measure the number of kills. It makes more sense to measure chance-to-hit-%. Anything beyond that, there are WAAY too many variables to say anything meaningful in a statistical sense.

And, yeah, just start looking for hull-down positions instead!! ;)


The AI does that perfectly, but us poor humans are given no indication of swivel/aim time...
Perhaps Pasco has given us the clue. By manually targeting, perhaps you are not always targeting the opponent who requires the smallest turret-rotation time, or perhaps you are retargeting to a different tank than the one the TacAI has already chosen, causing an additional crucial delay.

I am not discounting your research, but BFC have repeatedly said the AI does not cheat, so until proven otherwise it must be assumed the answer lies somewhere else.


Lt. Kije, you are correct in that I know very little about stats. My first post was mostly in jest, but isn't a larger test more significant all the same?

This does intrigue me though, and I am always willing to learn more about these types of things without going back to school. If the discussion/testing continues, can some of this be explained in layman's terms as we go?

Oh, and btw, aren't there 3 kinds of lies... lies, damn lies, and statistics? ;)


People who don't know what a P value is should read this page, specifically chapter 8. The P value derived from a chi square test specifically refutes arguments that this could have occurred by chance. The idea behind chi square is to figure out how unlikely a certain event is. People who are arguing with this methodology should also suggest their own methodology to test the same thing. It's easy to say you don't like a study, much harder to come up with a concrete idea of how to improve it.

Warren Peace --

As has been suggested, can you do a similar analysis for a human vs. human hotseat game? I'd be very interested to see it, as that would provide a control.

Can you also do a similar analysis where both sides are left completely alone, no targeting orders given? This would be just straight tac AI vs. tac AI and would also prove interesting.

Can you provide a more detailed description of how you got your Chi Square value? I get a different one, still significant, but less so.

Here is how I'm doing my calculations, based on the data you've given us.

Chi Square = Sum of all ((Observed - Expected)^2 / Expected)

Russian Expected = (30 + 17) / 2 = 23.5

German Expected = (8 + 20) / 2 = 14

So we get the Chi Square for the AI's losses by:

Chi Square = ((8 - 14)^2 / 14) + ((17 - 23.5)^2 / 23.5) = 4.369

Now to get the Chi Square for the Human's losses:

Chi Square = ((20 - 14)^2 / 14) + ((30 - 23.5)^2 / 23.5) = 4.369

Now we can look up those Chi Square values to get the P. You can use this calculator, but I just used the ChiDist function in Excel. The degrees of freedom are the number of terms in the sum you used to get the chi square, minus one. So there is only one degree of freedom.

That gives a P value of .0365, different from the number Warren Peace got, but still significant. Scientists generally use .05 as the cutoff point between meaningful and non-meaningful data.
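[Editor's note: Chris's goodness-of-fit arithmetic can be checked without Excel. The standard-library sketch below runs the same calculation and reproduces his 4.369 and, up to rounding, his .0365.]

```python
# Reproduce Chris's goodness-of-fit chi-square: observed counts from the
# "human as Axis" runs vs. an expected value of half the combined total
# for each vehicle type.
import math

observed = [8, 17]                          # StuGs lost, T-34s killed (human as Axis)
expected = [(8 + 20) / 2, (17 + 30) / 2]    # 14 and 23.5

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Upper tail with one degree of freedom.
p = math.erfc(math.sqrt(chi2 / 2))

print(f"chi2 = {chi2:.3f}, p = {p:.4f}")  # chi2 ≈ 4.369, p ≈ 0.0366
```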

If we want to get the P value down more, or up more for that matter if this really is an aberration, then we can run more tests in the same style. It's easy to do these tests, and the math is pretty easy as well, so why don't the doubters run their own experiments? When I get a chance, I know I will.

--Chris



Originally posted by Warren Peace:

To CM player:

The P value says that these results would be observed 3/1000 times by chance. For most lines of research this is considered a quite significant value. What number do you think is significant? Do you have any background in statistics?

Warren

I do (have a background in statistics). Your sample size is way too small. Rule of thumb is at least 30 repetitions for something like this.

In any case, as others have pointed out, the real sources of error creeping into your experiment are due to the other variables not being set as constants (the only variable changing should ideally be the one you're testing). In short, your assumptions are flawed. For example, were the starting positions of the AFVs identical in each run?

[ October 30, 2002, 12:47 PM: Message edited by: Mannheim Tanker ]


Guest Sgt. Emren

Harv,

If my memory serves me: for sample size, assuming a Normal distribution, you generally need a sample above 30. It depends a bit on what book you read, but usually 30-50 is sufficient. A Chi-Square test follows the same set of assumptions as the Normal distribution.

Warren's sample is 72 duels, so it's quite big enough to base conclusions on. The p-value signifies the level at which you must reject the null hypothesis (that the samples are identical). In other words, a p-value of 0.003 indicates that there's a 99.7% chance that the two sets of tests are really different. Why they are different is another matter.
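[Editor's note: the "3-in-1000 by chance" reading of the p-value can also be demonstrated by brute force: simulate many 2x2 kill tables under a "fair" null in which the tank that dies is independent of which side the human played, and count how often a table as lopsided as Warren's turns up. The simulation design below is an editorial assumption, not anything from the thread.]

```python
# Monte Carlo check of the ~3-in-1000 figure for Warren's kill table.
import math
import random

random.seed(42)

observed = [[30, 8], [17, 20]]  # rows: human as Allies / as Axis; cols: T-34 / StuG dies

def chi2_stat(table):
    """Pearson chi-square against expectations from the table's own margins."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return sum((table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
               for i in range(2) for j in range(2))

obs_chi2 = chi2_stat(observed)

n_kills = sum(map(sum, observed))               # 75 recorded kills
p_row = sum(observed[0]) / n_kills              # share of kills from "human as Allies" games
p_col = sum(t[0] for t in observed) / n_kills   # share of kills that were T-34s

trials, hits = 20000, 0
for _ in range(trials):
    table = [[0, 0], [0, 0]]
    for _ in range(n_kills):
        i = 0 if random.random() < p_row else 1
        j = 0 if random.random() < p_col else 1
        table[i][j] += 1
    # A degenerate table (empty row or column) has no defined statistic; skip it.
    if 0 in [sum(r) for r in table] or 0 in [sum(c) for c in zip(*table)]:
        continue
    if chi2_stat(table) >= obs_chi2:
        hits += 1

print(f"simulated p ≈ {hits / trials:.4f}")  # on the order of 0.003
```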


Originally posted by Sgt. Emren:

Warren's sample is 72 duels, so it's quite big enough to base conclusions on.

Is his sample unit each duel, or each set of duels? The way I'm reading it, his n=6, since there are so many dependencies (covariance) built into the experimental design. For example, the results of each duel are dependent on the results of the duel preceding it. If by the end of a replication you have 20 T-34s vs. 1 StuG, then you will have a greater chance of some T-34 plugging the StuG while it's reloading.

Therefore, I believe you need to look at each battle as an independent sample.


This topic is now closed to further replies.
