Wild Bill's Rumblings of War [Part III]

Treeburst155 · January 31, 2002

Peter,

Thanks for volunteering on the AARs. I'll make sure you have all of them when this is over. They're still coming in.

Treeburst155 out.

wadepm · February 2, 2002

Page 3! That won't do. I am finally done. Completed "We Can't Wait" last night. Now to do some AARs and wait to see if I finished dead last.

[ February 02, 2002, 09:54 AM: Message edited by: wadepm ]

gredeker · February 2, 2002

Wade, thanks for the update and for saving the thread.

Bertram and Warren - how far along are you two in this scenario? Maybe we'll have another scenario to talk about shortly.

Treeburst155 · February 2, 2002

"We Can't Wait" has now been completed by all participants!! Here's how it shaped up:

We Can't Wait (A)

Allied player     Score   Axis player     Score

Bertram              19   Warren Miron       75
Scot Johnson         45   Jukka-Pekka        55
Pixelmaster          37   Wade Moore         63
Redeker              67   Jon Sowden         32
Holien               54   Kingfish           41
Georges (Mick)       30   Tom                65
Svensson             37   CapitalistDog      63
von Lucke            37   Kettler            54
Davidson             35   Zalewski           60
Gaspari              56   Travisano          44
Juha Ahoniemi        40   Rohde              51
Enoch                26   Dickens            74

Treeburst155 out.

Treeburst155 · February 2, 2002

A note on the scoring system:

The scoring system has been revised significantly based partly on results in this tournament. This tournament however will be scored using the formula described at the very early stages (probably the first thread) of the tourney. This is necessary to allay any possible suspicions that the formula may have been manipulated to produce a certain result. The one fairly recent change to the scoring that will be implemented is the splitting of contested VL points so that point totals always equal 100.

The scoring formula in place for this tournament is very good IMO, but we are always improving it since Nabla is a genius and keeps coming up with improvements.

Treeburst155 out.

von Lucke · February 3, 2002

Wow --- I thought it was only my ill thought out battle plan that cost me from pushing through to Bastogne. Didn't the Americans historically win in "We Can't Wait"? Wouldn't know it by those results...

Holien · February 3, 2002

This is like waiting for a Christmas when you were a kid.

So we can chat about "We Can't Wait"... Over to the other forum....

H

John Kettler · February 3, 2002

Following a long hiatus, finally owning something like a working brain and having shipped certain domestic COMJAM units to Florida, I was finally able to renew work on my remaining AAR. Am now just short of halfway done with what will be a long, detailed AAR covering an absolute nailbiter of a battle. If this happy trend continues, I should be ready to send it to Treeburst155 and Tom in a day or so.

Regards,

John Kettler

John Kettler · February 3, 2002

The latest scenario numbers continue the bleak statistical trend I noted previously in my initial analysis of the first three completed scenarios.

I placed third in Section II as the Germans in "We Can't Wait," and in assessing overall position, again emerged over 20 points, 21 to be exact, behind the best performer. The only good news is that I was 22 points better than the worst overall performance playing my slot. Of the four completed scenarios, I did best in this one, and was still mediocre.

PROPOSAL

For the serious stat hounds, I suggest that you compute and present the mean and the standard deviation for each side in a given scenario, as well as the mean and standard deviation of each player's overall performance in Rumblings, separately for attacking and defending, once all the results are in, of course. This will give both a better idea of the true capabilities of the players and will give Wild Bill and company a far better handle on the swing factor as applied to scenario outcome. If the average score is one value but the standard deviation turns out to be large, this is important for the scenario designer to know.
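As a rough illustration of the per-side figures proposed here, the following Python sketch computes the mean and sample standard deviation for each side of "We Can't Wait", using the scores posted earlier in this thread. The names `WCW` and `side_summary` are made up for this example, and the choice of the sample (n - 1) standard deviation is an assumption.

```python
from statistics import mean, stdev

# "We Can't Wait" results as posted in this thread:
# one (Allied score, Axis score) pair per game.
WCW = [(19, 75), (45, 55), (37, 63), (67, 32), (54, 41), (30, 65),
       (37, 63), (37, 54), (35, 60), (56, 44), (40, 51), (26, 74)]

def side_summary(scores):
    """Return (mean, sample standard deviation) for one side's scores."""
    return mean(scores), stdev(scores)

allied_mean, allied_sd = side_summary([allied for allied, _ in WCW])
axis_mean, axis_sd = side_summary([axis for _, axis in WCW])

print(f"Allied: mean {allied_mean:.2f}, SD {allied_sd:.2f}")
print(f"Axis:   mean {axis_mean:.2f}, SD {axis_sd:.2f}")
```

The same function could be fed one player's scores as attacker and as defender once all seven games are in.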

Regards,

John Kettler

[ February 04, 2002, 04:38 AM: Message edited by: John Kettler ]

WineCape · February 4, 2002

Gentlemen,

For those players that have finished all their matches, I have a prepared MSExcel97 spreadsheet with all the tourney tables neatly organized with match totals and section medians, barring a couple of match results still awaiting.

Anyone interested, email me (WineCape@global.co.za).

Best wishes,

Charl Theron

----------------------------

”Wars, conflict, it's all business. One murder makes a villain. Millions a hero. Numbers sanctify.”

Charlie Chaplin (Sir Charles Spencer C.) British film actor, 1889 - 1977

WineCape · February 5, 2002

Treeburst,

Gregory Redeker reports an inaccuracy in the tourney match results.

The score in his game against Warren should be 83-17, not 83-19. [Scenario C - "Sounds in the Night", Section I]

Please confirm. I'll update the spreadsheet to the new score as reported in the meantime.

Sincerely,

WineCape

Treeburst155 · February 5, 2002

WineCape,

83-17 is correct for the Greg Redeker/Warren Miron game.

John Kettler,

Interesting proposal, thoughts and comments coming as soon as I take care of other tourney related things.

To All,

"The High Cost Of Real Estate" has now been finished by all participants. Here are the results:

High Cost (F)

Jon Sowden           33   Warren Miron       61
Wade Moore           34   Jukka-Pekka        51
Redeker              51   Scot Johnson       49
Pixelmaster          32   Bertram            68
John Kettler         37   Kingfish           56
CapitalistDog        46   Tom                54
von Lucke            59   Georges (Mick)     41
Svensson             12   Holien             88
Zalewski             24   Dickens            76
Rohde                36   Travisano          50
Enoch                31   Gaspari            63
Juha Ahoniemi        40   Davidson           60

Feel free to discuss this scenario. We have only two games to go in "Crisis At Kommerscheidt" and two games to go in "Sounds In The Night". After that it will be time to crunch the numbers into official tourney scores.

Treeburst155 out.

Treeburst155 · February 5, 2002

Calculating the statistics John suggests would be very interesting IMO. I, however, do not have the skills to do it. Since the results will be available for all to see, maybe someone will volunteer to take on this task. I know we have a few statisticians and mathematicians who frequent this forum.

If the results from one side of a scenario swing wildly from say 20 to 80 points what does that tell us? The scenario could be highly dependent on luck, or the player with the least skill just happened to play a scenario from the same side as the very best player. The opponents of these two players would also be a factor. If our grade A player plays the Allied side against a grade D player, and another grade D player plays the same side of that scenario against a grade A player the two final scores could very well be opposite. It is interesting.

Treeburst155 out.

[ February 05, 2002, 03:15 PM: Message edited by: Treeburst155 ]

John Kettler · February 6, 2002

Treeburst155,

This is precisely why I asked that someone gin up the statistics for a given player's overall performance on attack and defense (X has a mean score of Y with a standard deviation of Z as attacker, but...), as well as simply computing scores within a scenario relative to the other players. This way, you'll be able to identify the strong players, not to mention finding out who is balanced in performance vs. who is markedly better at attack or defense.

I'd be willing to believe that I'm better on defense than attack, and my scores in the first four scenarios fully support this, with my best, a 54, being the defender in "We Can't Wait" and my worst, a 26, as attacker in "Real Guts."

I believe that anyone with decent spreadsheet competence can easily run the stats we need, but I'm not that person and am also on deadline for an article, so I need to get my research rolling.

The information I'm seeking would give Wild Bill and company a much better sense of how balanced the scenarios are and how big a part the players played in determining the outcome, as opposed to luck.

Regards,

John Kettler

Wild Bill Wilder · February 6, 2002

Though not following perhaps as detailed a workout of the numbers as you suggested, John, I have been watching them closely. Even a quick glance down the list gives me an indication of just what is happening and how the balance is in each battle.

In the first three completed scenarios, I've tweaked a bit to try to swing them even closer into a tight fight for either side.

Of course, one has to figure in the skills of the players and that is hard to do. Still, here again, I can see a trend in an individual's scoring that tells me something.

I don't know if perfect balance could ever be achieved in a scenario. There are just too many variables to guarantee such a thing. Some changes can and will be made to try to make them more evenly matched, based on the results being fed to me by Winecape and Treeburst...plus those fine AARs you guys have been writing.

Balance is not something that can be achieved by numbers only but it is a strong contributing factor.

So be aware that I am aware and watching all of this very closely.

Thanks, buddy. It is a good idea.

Wild Bill

Nabla · February 6, 2002

Originally posted by Wild Bill Wilder:
I don't know if perfect balance could ever be achieved in a scenario. There are just too many variables to guarantee such a thing.

Luckily this is not necessary any longer.

Here are some thoughts about these issues that came to my mind.

As John Kettler noticed above, standard deviation of the results is one possible additional measuring stick for scenarios. What would it measure? If the standard deviation of the results of a scenario is zero, the scenario does not differentiate the players in any way. At the other end, consider a scenario with maximal standard deviation: half of the Allied players score 0 and the other half +100 (this is still balanced). Does this scenario measure the goodness of the players in any way?

The answer depends completely on the correlation between player skills and the end result. If the two are completely uncorrelated then the battle measures pure luck. Put two tanks head to head in an open field, facing one another. There's not much you can do to improve your chances. But if there is a positive correlation then the scenario does measure the goodness of players. If the correlation is negative then there is probably something wrong with the way CM calculates points or models battles. Of course, the real problem here is the determination of "player skills", since it is also done with the scenarios.

If in the previous example we assume that the skills of the players follow a normal distribution, a maximal correlation is achieved if players below mean score 0 and players above mean score 100. Would such separation be preferable? Probably not, unless you want to have a cruel playoff system.

What then would the optimal scenario be like with respect to the distribution of results and their correlation with player skills?

First the correlation with player skills. This should probably be maximal, given the other attributes of the distribution. Minimize the **** happens -factor.

Then the distribution. Keeping the same assumption that the skills of the players follow a normal distribution, the results should also probably follow a normal distribution. Why? I'm not sure... If we have a scoring system like the one we're using in this tournament, then the scoring curve maps normally distributed scores into uniformly distributed ones (at least approximately). It spreads results near the median and clumps them together further away. Such a mapping differentiates the scores maximally, that is, the standard deviation of the final scores will be maximal. This is reasonable as long as the standard deviation is the same for all scenarios. If it is not, which is of course the reality in all tournaments, then the scoring system should neutralize this difference.
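The idea of a curve that turns normally distributed results into roughly uniform ones can be sketched with the normal cumulative distribution function. This is only an illustration of the principle, not the tournament's actual formula; `curve_score`, `MU` and `SIGMA` are hypothetical names and values chosen for the example.

```python
import math

def curve_score(raw, mu, sigma):
    """Map a raw score onto 0..100 with the normal CDF (an assumed curve)."""
    z = (raw - mu) / sigma
    return 50.0 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical scenario: raw results centred on 50 with a spread of 15.
MU, SIGMA = 50.0, 15.0

# The same 5-point raw gap, once near the median and once far out in the tail.
near_median_gap = curve_score(55, MU, SIGMA) - curve_score(50, MU, SIGMA)
tail_gap = curve_score(90, MU, SIGMA) - curve_score(85, MU, SIGMA)

print(f"5 raw points near the median -> {near_median_gap:.1f} curved points")
print(f"5 raw points in the tail     -> {tail_gap:.1f} curved points")
```

Under these assumptions the gap near the median comes out much larger than the gap in the tail, which is the spreading and clumping described above.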

As someone may have noticed, I may have shot into my own leg with the previous argumentation. The scoring system should maybe normalize the results so that all scenarios have the same standard deviation. But I am a bit confused, and probably also confusing you with this post. I'm not sure why a uniform distribution of final scores would be optimal. I will think about this.

[ February 07, 2002, 02:58 AM: Message edited by: Nabla ]

Treeburst155 · February 7, 2002

Oh Nooooo!! Nabla's brain is working in overdrive again! I can see the smoke curling out of his ears.

You can only learn so much about a scenario's balance when you use players whose skill level is unknown. We are trying to determine an unknown (scenario balance) by using another unknown (player skill) and vice versa. There is only so much we can learn about either over the course of a tournament.

This is a much more difficult thing than in chess. In chess, you know you have a balanced "scenario" with a straight win/lose/draw outcome. Even so, you need to play twenty games (the same balanced scenario) before your rating is taken off "provisional" status. At that point it is felt that the player's rating is a fairly accurate reflection of his skill.

What we would need is an official community wide CM ratings scenario. All players desiring an official rating would have to play the scenario 20 times from each side. They would then receive two ratings, one for each side. Somehow, the already complex chess formula would have to take into consideration degrees of victory. An 85-15 outcome is far superior to a 60-40 outcome, but both are wins.
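One way to picture "degrees of victory" in a chess-style rating is to feed the CM score, divided by 100, into the standard Elo update in place of the usual 1 / 0.5 / 0 result. This is a sketch under that assumption only; the function names, the starting rating of 1500 and the K-factor of 32 are illustrative choices, not anything used in this tournament.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expectation for player A against player B (0..1)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

def updated_rating(rating_a, rating_b, cm_score_a, k=32.0):
    """Update A's rating, treating the CM score (0..100) as a fractional result.

    Using cm_score_a / 100 as the 'actual' result is an assumption made
    for this example.
    """
    actual = cm_score_a / 100.0
    return rating_a + k * (actual - expected_score(rating_a, rating_b))

# Two evenly rated players: an 85-15 win moves the rating more than a 60-40 win.
big_win = updated_rating(1500, 1500, 85)
narrow_win = updated_rating(1500, 1500, 60)
print(f"85-15 win: {big_win:.1f}   60-40 win: {narrow_win:.1f}")
```

A rating built this way would still only be meaningful for the one scenario and side it was earned on.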

Also, the rating would only be valid for that one scenario. Unlike chess, every CM scenario is an entirely different game with respect to setup. So the interesting problem of determining player skill and scenario balance is a huge one. Besides, do you really want to play a scenario twenty times from each side just to get a rating that's only valid for one scenario?

The best way to determine skill would be to have a never ending tournament using Nabla's scoring system where the same twenty four people fight it out over and over again in round robin fashion (no Sections) with new scenarios for each round.

Treeburst155 out.

[ February 06, 2002, 06:12 PM: Message edited by: Treeburst155 ]

JonS · February 7, 2002

The two more recently finished scens (WCW and HCoT) don't seem to have been as balanced as the first three.

WCW

Allied Avg: 40.25 (StdDev 13.46)

Axis Avg: 56.42 (StdDev 12.98)

Allied Range: 19 - 67

Axis Range: 32 - 75

{I got by far the lowest as Germans in this one }

HCoT

Allied Avg: 36.25 (StdDev 12.23)

Axis Avg: 59.75 (StdDev 12.89)

Allied Range: 12 - 59

Axis Range: 41 - 88

Averages for the first three (DaD, RaR, & RG) are towards the bottom of pg.6 of this thread.

Regards

JonS

Edit:

I was in the middle of writing this when the previous series of posts went up. So, 'balance' in the sense that I've used here may not necessarily be correct. The averages are simply the average score achieved by each side in each scenario.

Granted this is not the be all and end all, but if we look at the case of HCoT, taking the average for each side and the SD, and assuming that the results are normally distributed (BIG assumption) and the contestants are of equal skill (even BIGGER assumption), then this scen can be said to be balanced to 1SD (95% confidence).
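One possible reading of "balanced to 1SD" is to ask how many standard deviations each side's average sits from an even 50-point result. The sketch below does that for the HCoT figures quoted above; the function name `sds_from_even` is made up, and treating 50 as the "balanced" score is an assumption.

```python
def sds_from_even(side_mean, side_sd, even=50.0):
    """Distance of a side's average from an even result, in standard deviations."""
    return (side_mean - even) / side_sd

# HCoT averages and standard deviations as posted above.
allied_z = sds_from_even(36.25, 12.23)
axis_z = sds_from_even(59.75, 12.89)

print(f"Allied average is {allied_z:+.2f} SD from 50")
print(f"Axis average is   {axis_z:+.2f} SD from 50")
```

With these numbers the Allied average lies a little more than one standard deviation below 50 and the Axis average a little less than one above it.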

But, given the assumptions, I would feel more confident describing it as strongly favouring the German side, in blind play. The Allied player having fore-knowledge that the PBs etc are locked, and about the reinforcement locations, would strongly change the results IMHO.

[ February 06, 2002, 06:23 PM: Message edited by: JonS ]

Nabla · February 7, 2002

Originally posted by JonS:
The two more recently finished scens (WCW and HCoT) don't seem to have been as balanced as the first three.

HCoT
Allied Avg: 36.25 (StdDev 12.23)
Axis Avg: 59.75 (StdDev 12.89)
Granted this is not the be all and end all, but if we look at the case of HCoT, taking the average for each side and the SD, and assuming that the results are normally distributed (BIG assumption) and the contestants are of equal skill (even BIGGER assumption), then this scen can be said to be balanced to 1SD (95% confidence).

I have a question for you. Are you doing this analysis to

1. determine if the scenarios are balanced

or

2. assess whether the scenarios have been fair from the point of view of this tournament?

These are two different things. The scoring system is immune to unbalance (median different from 50), so from the point of view of this tournament a median (or mean) which differs from 50 is not a problem. However, if you want to use the scenarios in ordinary (non-tournament) play, then the situation is different.
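The point about unbalance can be illustrated by measuring each score against the median for the same side of the same scenario, rather than against 50. This is a simplified sketch of that idea, not the tournament's scoring formula; the names are hypothetical, and the scores are the posted "High Cost" results.

```python
from statistics import median

# "The High Cost Of Real Estate" scores as posted in this thread.
ALLIED = [33, 34, 51, 32, 37, 46, 59, 12, 24, 36, 31, 40]
AXIS = [61, 51, 49, 68, 56, 54, 41, 88, 76, 50, 63, 60]

def relative_to_median(score, side_scores):
    """Score expressed as points above or below the side's median."""
    return score - median(side_scores)

allied_median = median(ALLIED)
axis_median = median(AXIS)
print(f"Allied median {allied_median}, Axis median {axis_median}")

# A 40 as the Allies and a 63 as the Axis are both 5 points above their median.
print(relative_to_median(40, ALLIED), relative_to_median(63, AXIS))
```

Measured this way, a lopsided scenario shifts both medians but does not by itself favour the players who happened to draw the stronger side.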

[ February 07, 2002, 02:56 AM: Message edited by: Nabla ]

JonS · February 7, 2002

Originally posted by Nabla:
I have a question for you. Are you doing this analysis to
1. determine if the scenarios are balanced
or
2. assess whether the scenarios have been fair from the point of view of this tournament?

I didn't follow the full discussion w.r.t. point 2, however I'm confident that the result was unbiased. So it's more to do with point 1. And to make me feel better in the scenarios where I got spanked - now I can point to the numbers and say "See, it was the scenario! It's not because I suck and have all the tactical finesse of a toothless piranha!"

:eek:

Unfortunately that doesn't work for WCW. In that case I simply suck.

JPS · February 9, 2002

Page three - back to top!

How are the remaining battles going on, schedulewise?

Spanish Bombs · February 11, 2002

BUMP

Tabpub and I wrapped up Sounds in the Night, at last report there was one other "Sounds" scenario going, methinks, so perhaps we are on the cusp of that result as well.

My AARs are done, no more tournament turns to plot or movies to watch. What am I to do with this void in my life? I suppose I could get some exercise..... nahhhh. When's the CMBB tourney start?

mPisi · February 11, 2002

Pix and I are on turn 21 or so of Sounds. I am already feeling the pangs of not playing so many fine games...

Treeburst155 · February 11, 2002

I ran into a small glitch while getting the input file ready for the scoring program (a dry run since all input is case sensitive, spelling, spaces, etc.). It seems the program only accepts whole numbers for scores. This means when I split the points for contested VLs I have to drop the .5 fractions so that some scores will only total to 99 points.

The impact of this should be next to nothing as near as I can figure. Even if a player had a half dozen of these fractions dropped (highly unlikely) it would only amount to 3 CM points spread over the seven games. Also, his opponents in those games will have also lost the .5 fraction. Conversion to the Nabla score reduces even further any impact these lost fractions might have. Are there any objections to dropping these .5 fractions? There's no good way around it unless I use the latest Nabla formula (developed a short time ago) or forgo splitting the contested VL points.
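A small sketch of the arithmetic, assuming the contested VL points are simply whatever is missing from 100 and are split evenly between the two players (that reading, and the name `split_contested`, are assumptions made for this example):

```python
def split_contested(score_a, score_b, total=100):
    """Split the points missing from `total` evenly, then drop any .5 fraction.

    Returns the exact split and the whole-number version the scoring
    program would accept.
    """
    half = (total - score_a - score_b) / 2
    exact = (score_a + half, score_b + half)
    truncated = (int(exact[0]), int(exact[1]))
    return exact, truncated

# 37-54 leaves 9 points to split: 4.5 each, so the whole numbers total 99.
exact_odd, whole_odd = split_contested(37, 54)
# 36-50 leaves 14 points to split: 7 each, so nothing is lost.
exact_even, whole_even = split_contested(36, 50)

print(exact_odd, whole_odd, sum(whole_odd))
print(exact_even, whole_even, sum(whole_even))
```

Only games with an odd number of contested points lose anything, and then each player loses the same half point.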

____________________________

We have only three games left to complete, two Kommerscheidt and one Sounds In The Night. When they are done I will post the input file for the program here. Although I've checked and rechecked the scores it is still possible I made some mistake. It would be wise to look your games over in the file just to make sure I've entered the correct scores. These scores will reflect the splitting of VL points (if there's no objection to the above) and should always total 99 or 100 points.

Treeburst155 out.

JPS · February 12, 2002

Rescue from page three.

Ignoring half-points seems quite a reasonable thing to do if necessary.
