AI cheats! (with real data)


1. We should be careful to clearly state just what it is we're measuring. Some people are measuring hit percentage, others kills, and still others who fires first. Personally, I don't believe that there is any difference between AI and human performance in the hit percentage and kill percentage areas. The area I think might (might) be skewed is in how long it takes to get a shot off. However, initial impressions are that a human using manual targeting gets similar performance to the OpAI (which presumably performs manual targeting as part of its orders generation). More data is needed to really do a valid test of this.

2. Random (or more properly "pseudo-random") numbers. The built-in random number generation functions that are provided in the standard C library (and presumably others) are (or at least were) notoriously non-random. It is also the case that many if not most pseudorandom number generation algorithms given in textbooks suffer from similar problems. (A good way to test a random number generator is to generate lots (and I mean lots, like a million or two) of random numbers and plot the results in a histogram. You should see a very even distribution. If there are any spikes in the histogram, you have a problem.) There was an article in the Communications of the ACM several years back that discussed common problems and presented a nice, easily implementable algorithm to generate "good" random numbers using any seed. (One feature of problematic generators is that similar seeds generate similar strings of "random" numbers.) I used to have a copy of the article but don't think I do anymore; however, I do have an implementation of the algorithm that passed the tests nicely.
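
The histogram check described above can be sketched in a few lines of Python. This is only an illustration, not the algorithm from the ACM article: the helper names (`bucket_counts`, `chi_square_uniform`, `good_draws`, `coarse_draws`) are made up for this example, the "good" generator is simply Python's built-in one, and the "coarse" generator is a deliberately bad stand-in that can only produce 64 distinct values.

```python
import random

def bucket_counts(draws, buckets=100):
    """Count how many values in [0, 1) land in each equal-width bucket."""
    counts = [0] * buckets
    for x in draws:
        counts[min(int(x * buckets), buckets - 1)] += 1
    return counts

def chi_square_uniform(counts):
    """Chi-square statistic of the bucket counts against a perfectly even spread."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

def good_draws(n, seed=12345):
    """Draws from Python's built-in generator (Mersenne Twister)."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

def coarse_draws(n):
    """Deliberately poor generator: it only ever produces 64 distinct values."""
    return [((i * 37) % 64) / 64 for i in range(n)]

if __name__ == "__main__":
    n = 1_000_000
    for name, draws in (("built-in", good_draws(n)), ("coarse", coarse_draws(n))):
        counts = bucket_counts(draws)
        print(name, "min bucket:", min(counts), "max bucket:", max(counts),
              "chi-square:", round(chi_square_uniform(counts), 1))
```

With 100 buckets, a healthy generator should give a chi-square value somewhere near 100, while the coarse one leaves whole buckets empty and the statistic explodes. Keep in mind that a flat histogram is only a first check: it says nothing about correlations between successive numbers or about similar seeds producing similar sequences.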

Originally posted by L.Tankersley:

1. We should be careful to clearly state just what it is we're measuring. Some people are measuring hit percentage, others kills, and still others who fires first. Personally, I don't believe that there is any difference between AI and human performance in the hit percentage and kill percentage areas. The area I think might (might) be skewed is in how long it takes to get a shot off. However, initial impressions are that a human using manual targeting gets similar performance to the OpAI (which presumably performs manual targeting as part of its orders generation). More data is needed to really do a valid test of this.

2. Random (or more properly "pseudo-random") numbers. The built-in random number generation functions that are provided in the standard C library (and presumably others) are (or at least were) notoriously non-random. It is also the case that many if not most pseudorandom number generation algorithms given in textbooks suffer from similar problems. (A good way to test a random number generator is to generate lots (and I mean lots, like a million or two) of random numbers and plot the results in a histogram. You should see a very even distribution. If there are any spikes in the histogram, you have a problem.) There was an article in the Communications of the ACM several years back that discussed common problems and presented a nice, easily implementable algorithm to generate "good" random numbers using any seed. (One feature of problematic generators is that similar seeds generate similar strings of "random" numbers.) I used to have a copy of the article but don't think I do anymore; however, I do have an implementation of the algorithm that passed the tests nicely.

I agree we should know what we're testing :) The first post was about 1st shot "enhancements" so I was testing to disqualify that.

Right now I wish I had the "shot-time" results for the 100 shots. I can promise you, however, that by T=5 ALL the vehicles had fired whether human or computer controlled. If I get ambitious, I'll redo those tests with the time taken into account.

Maybe that's what I'll do tomorrow since I'd like to lay this all to rest :)

I have read the entire thread. I will test this 1,000 times. Empirical data is what we need. No fancy stuff. Let this blue collar geek handle this once and for all with brute force. :)

I will devise my own test because that is the only fun part of this thing. Since I've read the whole thread, I'm aware of the pitfalls in setting up the test.

Before I run the actual time consuming test I will describe it in detail right here so people can point out any flaws in it that I may not have thought of.

Great thread, eggheads!! We shall get to the bottom of this once and for all.

Treeburst155 out.

It seems that no one is taking up my suggestion of everyone testing on the same scenario. That's fine, I forgive you all ;)

But, I will request the following. When you post data please post it in as detailed a fashion as possible and give us the following:

1) Results of each trial

2) FOW setting used

3) Targeting orders including any cover arcs

4) Details of how you came to any P values or Chi Square values, if you provide them.

5) If the tests are independent (separate firing lanes, no movement)

6) If you eliminated special rounds (non HE / AP)

7) What you think you are testing :)

8) What vehicles are being tested.

9) A link to the scenario.

Wow, I'm sure I've forgotten something.

I'm happy to provide hosting space for scenarios if people don't have them.

[Edit: Cameroon provided a link and sent me his scenario. I just completely missed it till now, sorry! Here is another link to his scenario, and another link to mine for anyone who wishes to replicate either test.

Maastrictian

Cameroon

]

I never mentioned which FOW setting I used in my test above; it was Extreme.

I strongly agree that there could be a random number problem. Any such problem could crop up in any number of situations, so testing first shot percentages, time to fire percentages, and kill percentages is all valuable. Just keep in mind that ruling out one of those (and I don't think we have enough data to rule out any of them yet) does not mean the others aren't broken.

Ok, I'm going to bed. I will get back to this when I can, hopefully tomorrow night.

Thanks to all who are doing tests.

--Chris

[ October 31, 2002, 12:06 AM: Message edited by: Maastrictian ]

Interesting....a Soviet T-34 (1941 model) does not penetrate armor as well as a CAPTURED Soviet T-34 (1941 model) according to the charts. Everything else in the info screen is exactly the same. Just the penetration charts are different. Oh well, not to get sidetracked. BTW, I've very carefully avoided the cast turret T-34 since there are no captured units of this type available to the Germans. Continuing with the test setup.

Treeburst155 out.

Originally posted by Treeburst155:

Interesting....a Soviet T-34 (1941 model) does not penetrate armor as well as a CAPTURED Soviet T-34 (1941 model) according to the charts. Everything else in the info screen is exactly the same. Just the penetration charts are different. Oh well, not to get sidetracked. BTW, I've very carefully avoided the cast turret T-34 since there are no captured units of this type available to the Germans. Continuing with the test setup.

Treeburst155 out.

Better German ammo? I imagine that's the reason right there. :)

Chris,

Here is my data using your range. Extreme FOW. Used the N key to target each lane.

Human as Russians

26 T34 vs. 21 Stug

34 T34 vs. 14 Stug

29 T34 vs. 14 Stug

Human as Jerry

34 T34 vs. 15 Stug

36 T34 vs. 11 Stug

34 T34 vs. 14 Stug

Also did test without Targeting.

Human as Russians

31 T34 vs. 14 Stug

32 T34 vs. 12 Stug

27 T34 vs. 18 Stug

Human as Jerry

31 T34 vs. 14 Stug

26 T34 vs. 19 Stug

26 T34 vs. 20 Stug

Based on these numbers I do not see an AI advantage like I did in my scenario. I did play my scenario three times from each side and got the following.

As Russians

6 T34 vs. 0

5 T34 vs 2

4 T34 vs 2

As Germans

2 T34 vs 4

4 T34 vs 2

5 T34 vs 1

Clearly these results are not statistically significant, but again I see a trend that the AI does better than the human.

Why might this be true in my scenario and not yours? I see two differences. First, I did not control for global morale. Second, my tanks are free to move around.

Unfortunately I too must go to bed. I will try to get more data from my scenario tomorrow.

The question I will attempt to answer:

Will the AI achieve a first round HIT more often than I do when we both play the same side in the exact same situation, AND I manually target the enemy?

The Test:

20 isolated firing lanes

two more isolated areas filled with 2,500 pts. of troops for each side to cut down on possible morale issues.

Firing vehicle is captured T-34 (1941 model) loaded with one AP round only and regular German crews.

Target vehicle is Russian T-34 (1941 model) with ZERO ammo and regular crews, facing 180 degrees away from the Germans (showing their rear) and buttoned up.

Target and firing vehicle are limited to 20 meter square by water to front and rear, and woods to the sides.

Range is 742 meters.

Fog of War = NONE

I will play the firing Germans 50 times, giving 1000 isolated tests. I will record the HITS achieved.

I will then play the exact same turn from the Russian "target" side 50 times, giving no orders, and record the AI's HITS against the targets. The Tac AI will have full control of my Russian units.

Note: By facing the Russian target vehicles away from the firing units, and buttoning them up, I'm hoping the Russian strat AI does not "see" the German vehicles and therefore won't give any orders while it is "thinking". IOW, I want the strat AI to play the Russian side exactly as I do.

EDIT: With 20 firing lanes it will be difficult to count the hits (detailed armor hits). I must either count them, or look for evidence of a hit after the mad minute. Does a vehicle have to be hit to cause the crew to bail, or become broken or routed? Perhaps simply being without ammo, and without an escape is enough to cause these things.

I need to run the movie enough times to actually count all the hits. BTW, all hits are first round hits since there is only one round in the firing vehicle.

EDIT 2: OK, I figured out how to count the hits easily. Running tests now. Finally. BTW, it does appear a vehicle has to be hit to cause the crew to break, rout, or bail. At least in my test.

Treeburst155 out.

[ October 31, 2002, 03:43 AM: Message edited by: Treeburst155 ]

Originally posted by Seanachai

Good God.

I've finally encountered a group of people with so much time on their hands that even touching themselves has worn thin...

At least YOU still have that to fall back on. :D

Hey, while you're at it, guys...I have lost money in Lake Tahoe all 4 times I went. The one time I went to Las Vegas I won $120. Does this mean Lake Tahoe cheats and Las Vegas is the place to go? :confused:

Please advise as travel plans are being discussed.

WARREN PEACE'S "AI CHEATS" THREAD IS LIKE BRUSSELS SPROUTS!!!

You hate the damn stuff but you still eat it (read this thread) 'cause it's good for you to try and understand these bright people and their statistical tests! :D

It's early morning here in SA and I sure don't want to start working - hence reading this.

Shame on you gents for letting me read every post in this thread. Fascinating. Continue gentlemen...

Charl Theron

[ October 31, 2002, 03:00 AM: Message edited by: WineCape ]

Never in the field of simulated conflict has so much been read by so many, and understood by so few (inc Me).

I do hope Charles does not get too upset that his word is being doubted.

Keep up the good work. I look forward to the awarding of the first Combat Mission-inspired Nobel Prize.

Fascinating stuff folks....

Just a quick question.. what year are the tests being run? (I know, I know 2002.. ho, ho...)

I might be wrong, if I missed it already I'm sorry..

As you are pitting regular Allied crews versus regular Axis crews... don't the russkies get an experience handicap in the earlier years of the war (can't remember if it's up to '43 or '44)?

That might be another one of those nasty variable things...

Playing the firing Germans in my test above 200 times, I scored 73 first round hits.

Letting the AI play the firing Germans there were 82 first round hits out of 200 tests.

With 20% of the tests completed, the AI is leading in first round HIT percentage 41% to 36.5%.

Inconclusive at this point IMO. If the gap doesn't close by the time I've tested 1,000 times, it might be worth it to do it an additional 1,000 times. At that point maybe the statisticians can say if we have something here. Isn't there some "90% confidence in your figures" point you eventually reach? I mean, a point where your margin of error is very small due to the high number of trials? If there is a 70% chance of an event occurring and you test it a couple thousand times, it's going to happen very close to 70% of the time, right?

Treeburst155 out.

[ October 31, 2002, 05:13 AM: Message edited by: Treeburst155 ]
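
On the "90% confidence" question, the usual back-of-the-envelope answer is the margin of error for a proportion. The sketch below uses the normal approximation, which is reasonable at these sample sizes; the function name `margin_of_error` is made up for this example, and the 70% rate is just the figure from the question.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(p, n, confidence=0.90):
    """Half-width of the normal-approximation confidence interval for a proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sqrt(p * (1 - p) / n)

if __name__ == "__main__":
    for n in (200, 1000, 2000):
        print(n, "trials at a 70% true rate: +/-",
              round(100 * margin_of_error(0.70, n), 1), "points at 90% confidence")
```

So yes: with a 70% event and 2,000 trials, about 90% of runs should land within roughly 1.7 percentage points of 70%. By the same formula, a 41% hit rate measured over only 200 shots carries a margin of about 5.7 points, which is wider than the 4.5 point gap reported so far.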

Guest Sgt. Emren

Treeburst 155:

You can stop testing. Your results already prove that the AI does not have an advantage (with your particular setup). You don't need to try it out more than 30-50 times! :)

To carry out a statistical test really requires that you understand advanced statistics - and from the way that you ask, I don't think you do? No offence. I cannot, right off the top of my head, write down the statistic you need to prove your point (it's been a while), but a difference of 9 hits in 200 shots is definitely not large enough to conclude there is a difference in any reasonable statistical sense.
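
For anyone who does not want to dig out the formula, the same question can be approached by brute force: pretend the human and the AI have exactly the same hit chance and see how often luck alone produces a gap as large as the one reported (73 vs. 82 hits out of 200 shots each). This is only a sketch under stated assumptions: every shot is independent, both sides share the pooled hit rate of 155/400, and `chance_of_gap` is a name invented for this example.

```python
import random

def chance_of_gap(hit_rate, shots, gap, trials=5000, seed=1):
    """Fraction of simulated paired runs whose hit counts differ by at least `gap`."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(trials):
        # Both sides shoot with the same underlying hit chance.
        human = sum(rng.random() < hit_rate for _ in range(shots))
        ai = sum(rng.random() < hit_rate for _ in range(shots))
        if abs(human - ai) >= gap:
            extreme += 1
    return extreme / trials

if __name__ == "__main__":
    pooled = (73 + 82) / 400  # combined hit rate of both runs
    print(chance_of_gap(pooled, shots=200, gap=9))
```

If a large share of the simulated pairs show a gap of 9 or more, then a gap of 9 in the real data is not evidence of anything. It does not prove the AI has no advantage either; it only says 200 shots per side cannot tell a small advantage from none.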

Originally posted by Treeburst155:

Fog of War = NONE

I will then play the exact same turn from the Russian "target" side 50 times, giving no orders, and record the AIs HITS against the targets. The Tac AI will have full control of my Russian units.

Note: By facing the Russian target vehicles away from the firing units, and buttoning them up, I'm hoping the Russian strat AI does not "see" the German vehicles and therefore won't give any orders while it is "thinking". IOW, I want the strat AI to play the Russian side exactly as I do.

By setting the FoW to none, you guarantee that the Russian OpAI will be aware of the German vehicles and hence react to them. I don't know if this makes the Russian TacAI also aware of them. I don't think the OpAI's awareness is a problem, because with the command delay all vehicles will probably only move after all shots have been fired. But the TacAI may react sooner.

Just a thought.

[ October 31, 2002, 08:32 AM: Message edited by: gnuif ]

Originally posted by Treeburst155:

Playing the firing Germans in my test above 200 times, I scored 73 first round hits.

Letting the AI play the firing Germans there were 82 first round hits out of 200 tests.

Very interesting.

Rather than simply declare that this is or is not significant (sorry Sgt. Emren, but rather than simply making a categorical statement it's best to use statistical analysis), I will take a look with chi square.

Expected = 77.5

Observed 1 = 82

Observed 2 = 73

Chi Square = ((82 - 77.5)^2/77.5) + ((73 - 77.5)^2/77.5) = .523

P (one degree of freedom) = .470

So this is not significant.

Incidentally, if you do about 1000 of these tests and see the same ratio then P will be just under .05, and therefore significant.

The weird thing about all these tests is that we always see some small advantage to the AI. I have no good explanation for this, nor have we definitively seen this to a significant P value, but that is why I'm still conducting tests and am interested in seeing others' tests. It's just so odd.

--Chris
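
The chi square arithmetic above can be reproduced with nothing but the standard library, since for one degree of freedom the P value is `erfc(sqrt(chi_square / 2))`. The sketch below computes it two ways: the calculation as posted, which compares only the two hit counts, and the more usual 2x2 form that also counts the misses. The function names are invented for this example, and the 2x2 version assumes both sides fired the same number of shots.

```python
from math import erfc, sqrt

def p_value_df1(chi_square):
    """Upper-tail probability of a chi-square statistic with one degree of freedom."""
    return erfc(sqrt(chi_square / 2))

def chi_square_hits_only(hits_a, hits_b):
    """The calculation as posted: compares the two hit counts with their mean."""
    expected = (hits_a + hits_b) / 2
    return sum((h - expected) ** 2 / expected for h in (hits_a, hits_b))

def chi_square_2x2(hits_a, hits_b, shots):
    """Pearson chi-square on the full hit/miss table, with `shots` fired per side."""
    total = 0.0
    for outcome_a, outcome_b in ((hits_a, hits_b), (shots - hits_a, shots - hits_b)):
        # Equal shots per side, so each cell's expected count is the row mean.
        expected = (outcome_a + outcome_b) / 2
        total += (outcome_a - expected) ** 2 / expected
        total += (outcome_b - expected) ** 2 / expected
    return total

if __name__ == "__main__":
    for scale in (1, 5):
        a, b, n = 82 * scale, 73 * scale, 200 * scale
        print(n, "shots per side:",
              "hits only p =", round(p_value_df1(chi_square_hits_only(a, b)), 3),
              "| 2x2 p =", round(p_value_df1(chi_square_2x2(a, b, n)), 3))
```

For 82 vs. 73 hits the hits-only form gives the P of about .47 quoted above; counting misses as well raises the statistic to about 0.85 and lowers P to about .36, which is still nowhere near significant. Scaling the same hit rates to 1,000 shots per side, the 2x2 form should come out just under .05, while the hits-only form stays around .11, so which calculation is used matters near the threshold.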

Originally posted by yapma:

sgt emren:

no offense, but i think you dont know what you are talking about.

Why do you say that? I believe that sgt emren is correct. The variance described by treeburst seems entirely in line with no statistical difference.

Maastrictian's test bears this out. (Nice work BTW, Chris).

[ October 31, 2002, 10:19 AM: Message edited by: Mannheim Tanker ]
