The (Possibly) Unbalanced Tournament Scoring System

Nabla · September 26, 2001

I decided to start a new thread for this subject since it concerns all CM tournaments. I have given up my efforts (see below) to come up with a good acronym so from now on the system will be called just the "Nabla system" as suggested by Treeburst155. Here are some of the most important messages from the "Wild Bill's Rumblings Of War"- A Tournament thread in which the discussion started.

First the message that introduced the idea.

<blockquote>quote:<hr>Originally posted by Treeburst155:

Listen up!! This is important!

I received an email from "Nabla" (Jarmo Hurri) of Finland. He described to me a scoring system that I think is excellent. It's a Duplicate Bridge scoring system. It's a good way to score CM tournaments because scenarios are very difficult to truly balance. Here's how it works. I'm quoting Nabla's email to me:

"A particular problem I've been interested in is scoring in the case of uneven battles, that is, in battles where one side has a much better chance of winning to begin with. This is important for two reasons.

1) Uneven battles are what a commander generally faces in the real world.

2) A truly even battle is very difficult to generate (and some losers will disagree anyhow).

While many games with scoring systems are inherently even (such as chess), fortunately there is an uneven game from which we can borrow the scoring system: bridge. In bridge you have to play with what you've been given, and you can have very uneven hands.

What I've got in mind is an n-player tournament where you have n-1 battles. Each player plays each battle once, and also plays once against each opponent. The results of ONE BATTLE are scored as follows. The player who gets the best ALLIED CM POINTS in a battle scores n-1 GAME POINTS for that battle and the player who gets the worst ALLIED CM POINTS gets 0 GAME POINTS for that battle. Points are computed similarly for the Axis side. Sum up over battles, see who has the most points, and you've got a winner. This is the scoring system used in duplicate bridge. Note that in this system the performance of each player is compared to the performance of players playing on the same side (for example, as Allied). This enables the use of uneven battles.

As mentioned above, the real beauty of such a game system would be the possibility of having players play uneven battles. I mean, tournament administrators would design the battles, and they could be as uneven as one wishes (actually they could be partially randomly generated), you could have intelligence info which might be inaccurate, etc. And it would not take such a long time to design the battles since they would not have to be balanced."

This I think is a brilliant scoring idea! I would like to convert this tournament to this scoring system.

Say you are playing the Allies in Scenario A and you get whipped 80-20. Your 20 points for that scenario will be compared to all the other Allied scores from that scenario. If the best any Allied player did on that scenario is 35 points then it is fairly obvious the scenario is not balanced well (VERY difficult to do without extensive, repeated testing). Your 20 points will be ranked (compared) with all other Allied scores for that tournament. It's a perfect way to handle scenario imbalance!!

What do you say, guys? Should we convert to the duplicate bridge scoring system? I think we should. Let's call it the "Nabla" system. Since the highest score you can get for a game is 7 points we would make AARs worth .7 additional points. This makes an AAR worth 1/10th of a perfect game, just like it is now. How about it? Let me have some feedback. The only drawback I can think of is that you really won't know how you stand until ALL the scenarios are finished. Until then you would only have an approximation from the game scores. I would redo the standings page by listing results for each scenario. I'll have to think about how to best do it.

Treeburst155 out.<hr></blockquote>

So this is the basic idea, except for the one mistake I made there: if you have n players, each battle is played n/2 times, so on each side the best player gets n/2 - 1 game points, not n-1.

<blockquote>quote:<hr>Originally posted by Peter Svensson:

I'm not sure how NABLA handles this, but I suggest that if two people are tied for, for example, second place, they split the points for second and third place. If second place is worth 10 points and third 9, then each of the players would receive 9.5 points. The player with the next lower score would receive the points for fourth place, i.e. 8 points. If three people were tied for second, they could split 10+9+8 points three ways, and so on.<hr></blockquote>

Yes, this is a correct and sensible way of handling ties.
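To make the ranking rule concrete, here is a minimal Python sketch (not code from the thread) that scores one side of one battle. With k players on that side (k = n/2 in an n-player tournament, per the correction above), the best CM score earns k - 1 game points and the worst earns 0, and tied players split the points for the places they share, as Peter Svensson suggests. The function name `matchpoints` is a hypothetical label borrowed from bridge terminology; summing over all battles is left out.

```python
def matchpoints(scores):
    """Game points for one side of one battle, duplicate-bridge style.

    scores: CM scores of all players who played this side of this battle.
    The best score gets len(scores) - 1 points, the worst gets 0, and
    tied players share the average of the places they occupy.
    """
    k = len(scores)
    # Player indices ordered from the lowest CM score to the highest.
    order = sorted(range(k), key=lambda i: scores[i])
    points = [0.0] * k
    i = 0
    while i < k:
        # Extend j over the run of players tied with the one at position i.
        j = i
        while j + 1 < k and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        # Position p (0 = worst) is worth p game points; average over the tie.
        share = (i + j) / 2
        for pos in range(i, j + 1):
            points[order[pos]] = share
        i = j + 1
    return points


# Four Allied players in one battle of an 8-player tournament.
print(matchpoints([60, 50, 50, 40]))  # [3.0, 1.5, 1.5, 0.0]
```

In an 8-player tournament each side of a battle has four players, so the best score earns 3 game points, and the two players tied at 50 split the points for second and third place.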

<blockquote>quote:<hr>Originally posted by John Kettler:

The proposed scoring system (or variant thereof) seems fine to me, enables frighteningly realistic scenario development, and addresses some knotty statistical issues as well arising from the relatively small number of trials used to establish play balance.<hr></blockquote>

This is exactly to the point. The main motivation of the scoring system is to enable the use of unbalanced scenarios, which for example frees tournaments from the notorious 2:1 attack ratio and makes it possible to devise really interesting scenarios with completely unbalanced forces. A second motivation was the fact that it is almost impossible to create a balanced scenario.

<blockquote>quote:<hr>Originally posted by tabpub:

But then look at a less extreme situation:

Say that in a certain scenario the average score for the 28 games for one side is, um, 56, say. Without being precise, let us say that 27 of the scores fall in the range 50-60 and 1 score was 85. In the above-mentioned system of scoring, the player with an 85 would score 11 pts and the next highest (say it was 60) would score 10. Is that necessarily equitable? The one player that was able to "shine" above the rest would not reap the full benefits of this accomplishment. And the last-place person would be shut out because he was a 50.<hr></blockquote>

This is true. There are other systems in which the score depends on the size of the difference, not just on the ranking. Tadpub and Treeburst155 suggested two possible variants for CM.

<blockquote>quote:<hr>Originally posted by tabpub:

Think this over. Take that average score for a side in a specific scenario. Compare each player's score to that average. If you are above, you get that many positive points; below, and you get that many negative points. At the end, add each player's "comparative" scores together to get a grand total.<hr></blockquote>
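Tabpub's comparative scoring amounts to subtracting a side's average from each score. A minimal sketch, assuming the list passed in holds all CM scores for one side of one scenario; the function name is hypothetical:

```python
def mean_difference_scores(scores):
    """Tabpub's variant: each CM score minus the average CM score of the
    same side in the same scenario (positive means above average)."""
    average = sum(scores) / len(scores)
    return [s - average for s in scores]


print(mean_difference_scores([50, 56, 62]))  # [-6.0, 0.0, 6.0]
```

Because each side is centred on its own mean, the comparative scores for one side of one scenario always sum to zero (up to floating-point rounding); a player's grand total is then the sum of his comparative scores over all his games.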

<blockquote>quote:<hr>Originally posted by Treeburst155:

OK, how about this scoring system. First the average score is determined for each side/scenario. Points would be assigned to players based on this average in the following manner:

+/- 4 of average = 0 points
+5 to 9 of average = 1 point
+10 to 14 = 2 points
+15 to 19 = 3 points
etc....

If you are more than 4 points below the average for that side/scenario your points go into negative numbers in the same fashion. This system rewards overwhelming victories and punishes crushing defeats to a certain extent while avoiding punishing the lower scores in a situation where the scores are all very near the average. What think ye, gentlemen?<hr></blockquote>
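Treeburst155's banded variant can be sketched the same way. The bands in the post are given for whole-number differences (4 and 5 sit next to each other), so this sketch assumes a fractional difference such as 4.5 stays in the 0-point band, i.e. the band number is floor(|difference| / 5). The function name is hypothetical:

```python
import math


def banded_points(score, average):
    """Treeburst155's banded variant: within 4 points of the side/scenario
    average scores 0; each further 5-point band adds (or subtracts) 1."""
    diff = score - average
    band = math.floor(abs(diff) / 5)  # assumption: fractional gaps round down
    return band if diff >= 0 else -band


print([banded_points(s, 50) for s in (54, 55, 64, 41, 35)])  # [0, 1, 2, -1, -3]
```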

Both of these scoring systems are possible. There are deep issues involved here concerning the type of gameplay rewarded. Basically both of the schemes suggested by Tadpub and Treeburst155 are pretty risk-neutral (that is, they reward neither risk-taking nor risk-averse gameplay), while the original suggestion is highly competitive and may encourage people to take risks to win (although not big risks to achieve overwhelming victories, as Tadpub correctly noted). Right now I have no opinion about which of these would be best. (For example, in bridge there is a very popular corresponding scoring system which actually discourages taking big risks: the additional final score you gain from an additional game point is smaller if you already have a lot of points than it would be if you had fewer points.)

Anyhow, the response seems very positive and we should continue the discussion. I am currently arranging the first tournament together with Treeburst155, which will take full advantage of the scoring system's ability to handle unbalanced battles.

[ 09-27-2001: Message edited by: Nabla ]

[ 12-11-2001: Message edited by: Nabla ]

WWB · September 26, 2001

I had a balance problem in one tournament I ran, albeit unexpected. I just let the top 4 players advance who had played each side, comparing them only against their peers, not everyone else.

WWB

The_Capt · September 26, 2001

I like it.

As long as we don't know the "running average" of a scenario before we play it (which might influence play style), it is a very good way to judge who played well and who didn't beyond the simple point score.

Under the current system, since sides (Axis and Allied) are assigned randomly, how are we ensuring that everybody gets a balanced chance at playing the easy scenarios and racking up a high score?

If you average a scenario across all three groups and give points scaled to that score this would eliminate the "easy high score" advantage of an unbalanced scenario.

For example, if you draw the short straw and get 5 "bad draws" out of seven, your score would be much lower than if you got 5 "good draws", even though you may play your games very well. The system you propose takes care of that.

I vote yes.

Wreck · September 26, 2001

I like the system.

I note that it is similar to the system I proposed to do a tournament based on an operational level campaign; the difference is that mine just used total score achieved to rate players, but kept the players in two separate groups (German and American). This allows comparability, but only within the two groups. You end up with two "champions".

The BUBTS system (ugh -- awful acronym -- let's think of something nicer please) has the nice property of allowing all players to be one group. This allows a single champion to be named, and also allows players to play any side for any given battle.

As for the point awards, I rather agree with tacpub. What we really want to do is seek out players (player pairs, really) that are not average, and reward/penalize them for it. Most CM attack scenarios, I would guess, will be single-modal, though a well-designed scenario should be bimodal, which is a bit of a problem for any scheme assuming a single mode, Treeburst's for instance. I am not sure what to do with this, but let me give an example so you see what I am talking about.

Consider a scenario where it so happens that an early tank duel is going to snowball into a clean win or loss for whoever wins the duel. 24 people play, and the scores come out with 6 defense wins, say 75-25 with small variation around that, and 6 offense wins, say, 25-75 again with small variance around that. This is a bimodal distribution. In Treeburst's system, the average score being 50, all players would be assigned either -4 or +4 points. This is not wrong per se, but it does have a strong effect on player rankings based on what, at bottom, may have been a roll of the virtual dice in that initial tank/tank confrontation.

Anyway, we might just assume we will get single moded results (i.e., bell curves), and fit a point system to that. In which case Treeburst's is fine.

Nabla · September 26, 2001

<BLOCKQUOTE>quote:<HR>Originally posted by Wreck:

I like the system.

I note that it is similar to the system I proposed to do a tournament based on an operational level campaign; the difference is that mine just used total score achieved to rate players, but kept the players in two separate groups (German and American). This allows comparability, but only within the two groups. You end up with two "champions".

The BUBTS system (ugh -- awful acronym -- let's think of something nicer please) has the nice property of allowing all players to be one group. This allows a single champion to be named, and also allows players to play any side for any given battle.<HR></BLOCKQUOTE>

Acronyms - not my stronghold, so perhaps we should just make it BUB? (I'll edit it into the opening message)

Anyhow, concerning the real thing, you've got an important point here that must be kept in mind when a tournament is arranged. I mean, it is possible to screw up things so that you are always compared against the same guys in different games. I'll think of some easy algorithm to take care of this (on the average a random assignment should do well).

<BLOCKQUOTE>quote:<HR>Originally posted by Wreck:

Consider a scenario where it so happens that an early tank duel is going to snowball into a clean win or loss for whoever wins the duel. 24 people play, and the scores come out with 6 defense wins, say 75-25 with small variation around that, and 6 offense wins, say, 25-75 again with small variance around that. This is a bimodal distribution. In Treeburst's system, the average score being 50, all players would be assigned either -4 or +4 points. This is not wrong per se, but it does have a strong effect on player rankings based on what, at bottom, may have been a roll of the virtual dice in that initial tank/tank confrontation.<HR></BLOCKQUOTE>

Yes, this is true since CM is to some extent a game of luck. However, I would like to point out that in the original suggestion based purely on ordering you would at least have quite a fight for qualifying at a high position in the subgroup of losers or winners. Of course luck can have an impact here as well...

[ 09-26-2001: Message edited by: Nabla ]

Treeburst155 · September 26, 2001

I still think we should call it the "Nabla" system.

Any scenario that can be decided one way or another by an early tank duel is not a good scenario IMO. Players might as well just flip a coin to determine the winner.

Nabla's point about players' scores ending up being compared with the same player many times is something to keep in mind. In small tournaments this could definitely be an issue, and scheduling of matches should probably give this priority over even distribution of attack/defend duties and/or sides.

Someone commented about using tactics based on the known average for that side/scenario. I don't think this is a big enough problem to worry about. If a player is one of the last to complete a scenario he will indeed have a good idea of what he needs to score. I would argue that his game would be well underway by the time all the other games are completed, so his knowledge of what score he needs at that point won't be very helpful. Also, all players near the end will probably have a fairly good idea of the score they need for their last few games. If players are concerned about this, it is easily remedied by not informing them of game results. They could be kept completely or partially in the dark for the duration of the tourney. Reporting only names and scores would be one way to severely limit players' knowledge of what is going on with others.

I still think my compromise between the original "Nabla" system and Tabpub's revision is the best. It would be a shame to have an average score of 55 with a low score of 50 getting zero points and a high score of 60 getting the maximum points. Any extremes in results will mess with the system, whether the scores are tightly packed around the average or spread wildly from top to bottom. My answer to this is best IMO.

EDIT: Let's look at the pros and cons of the three variations.

Original Nabla: Here there is no reward for an overwhelming victory, and no punishment for a crushing defeat. This would tend to create a tight race. This could be a good thing. The drawback is when the low score is very near the average and gets zero points due to all the scores being near the average. This is bad, IMO.

Tabpub's version: Here we have rewards for overwhelming victories and punishment for crushing defeats. This will tend to spread people out to a certain extent. This may actually be more fair, but perhaps less exciting as far as the race for the crown is concerned. There is also the possibility that a high score is due more to a poor opponent than superior abilities of the winner. Of course, all players would get to play this poor player, giving them the opportunity to score well above average in one of their games. Hmmm....this realization has me leaning toward the Tabpub version.

Treeburst variation: This is simply a compromise between the two systems above. Like the Tabpub system, extremes are rewarded/punished and tight groupings around the average are handled well. In addition, extremes will be somewhat tempered, thereby tending to keep the race tighter than the Tabpub variation.

Wreck's concept of bimodal grouping of the scores, with significant distance between the groups, is interesting and would tend to mess with the system a bit, just like any other extreme variation from a nice bell curve.

All scenarios, if played enough times by enough different people will tend to fall into a bell curve peaking at some point on the line between 0 and 100 points. The peak's proximity to a score of 50 would indicate balance. The exception to this would be a scenario that is highly dependent on luck. The fingerprint of such a scenario would be Wreck's bi-modal grouping. I can see no other way a scenario could display these bi-modal characteristics if played enough times.

A scenario will always display a consistent, singular relationship to a balanced state unless it is highly dependent on luck. Scenarios are either in balance or they are out of balance in one direction or the other. They can't be out of balance both ways unless there is a high degree of luck involved. Any scenario that swings wildly from Tactical Victory to Tactical Defeat for a given side, with few results in between, is highly dependent on luck and should be avoided in competitions.

Treeburst155 out.

[ 09-27-2001: Message edited by: Treeburst155 ]

Nabla · September 27, 2001

<BLOCKQUOTE>quote:<HR>Originally posted by Treeburst155:

I still think we should call it the "Nabla" system.<HR></BLOCKQUOTE>

I see that for the first time my failure to create a successful acronym has not turned against me. OK, since we also have Fionn's rules, let's call this "the Nabla system".

<BLOCKQUOTE>quote:<HR>Originally posted by Treeburst155:

EDIT: Let's look at the pros and cons of the three variations.

Original Nabla: Here there is no reward for an overwhelming victory, and no punishment for a crushing defeat. This would tend to create a tight race. This could be a good thing. The drawback is when the low score is very near the average and gets zero points due to all the scores being near the average. This is bad, IMO.<HR></BLOCKQUOTE>

This is a fair statement, and as was noted above, is related to the type of gameplay rewarded.

<BLOCKQUOTE>quote:<HR>Originally posted by Treeburst155:

Tabpub's version: Here we have rewards for overwhelming victories and punishment for crushing defeats. This will tend to spread people out to a certain extent. This may actually be more fair, but perhaps less exciting as far as the race for the crown is concerned. There is also the possibility that a high score is due more to a poor opponent than superior abilities of the winner. Of course, all players would get to play this poor player, giving them the opportunity to score well above average in one of their games. Hmm....this realization has me leaning toward the Tabpub version.<HR></BLOCKQUOTE>

I think that Tadpub's version is a good system if we are looking for a neutral scoring system. As you noticed, poor players do not really affect the situation as long as they play poorly consistently. However, the situation is different if someone's playing style is inconsistent - for example if they play poorly at the end of the tournament because they have not been as successful as they wanted.

Let me illustrate this with an example. Let's say that we have a six player tournament with players from A to F.

Now assume in the first four of his five games F does not fail completely but does not really do as well as he wanted. So F decides to call it a day, and plays the last game only for fun, taking unnecessary risks etc., and loses royally. Let's say the results are as follows (Allied side first):

A-B 20-80
C-D 10-90
E-F 90-10

In Tadpub's version the mean of each side (Allied mean 40, Axis 60) is subtracted to give final results which are: A -20, B +20, C -30, D +30, E +50 and F -50. So B did a lot better than A, D did a lot better than C, and E got the best bonus. Notice that if F played badly all the time, these bonuses would roughly cancel each other out.

If F had played consistently, we can for example assume that the E-F game would have ended 20-80. Then the scores would have been approximately A +3, B -3, C -7, D +7, E +3, F -3. Note that A and B are now approximately on the same level.
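The mean-based numbers in this six-player example can be checked with a short sketch. The helper name is hypothetical; each game is an (Allied, Axis) pair of CM scores, and every player is centred on his own side's mean:

```python
def centred_on_mean(games):
    """games: (allied_score, axis_score) pairs for one scenario.
    Returns each side's scores minus that side's mean, in game order."""
    allied = [a for a, _ in games]
    axis = [x for _, x in games]
    mean_allied = sum(allied) / len(allied)
    mean_axis = sum(axis) / len(axis)
    return [a - mean_allied for a in allied], [x - mean_axis for x in axis]


# Games A-B, C-D, E-F with the Allied score first.
inconsistent = [(20, 80), (10, 90), (90, 10)]  # F loses royally
consistent = [(20, 80), (10, 90), (20, 80)]    # F plays as usual

print(centred_on_mean(inconsistent))
# ([-20.0, -30.0, 50.0], [20.0, 30.0, -50.0])
allied, axis = centred_on_mean(consistent)
print([round(v) for v in allied], [round(v) for v in axis])
# [3, -7, 3] [-3, 7, -3]
```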

There is something we can do about this situation. Consider using another statistic, the median, instead of the mean. The median of a set of numbers X is its middle value: half of the numbers in X are smaller than it and half are larger. For example, the mean of the numbers 0, 1, 2, 3 and 94 is 20, while the median is 2. You can see here how the median protects against outliers.

Using the median instead of mean in the example gives the following scores (Allied median 20, Axis 80): A 0, B 0, C -10, D +10, E +70, F -70. Now A and B did equally well (as they would have if F had played consistently), and the difference between C and D is smaller.
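The same hypothetical helper with the median swapped in for the mean, using Python's standard `statistics.median`:

```python
from statistics import median


def centred_on_median(games):
    """games: (allied_score, axis_score) pairs for one scenario.
    Returns each side's scores minus that side's median, in game order."""
    allied = [a for a, _ in games]
    axis = [x for _, x in games]
    med_allied, med_axis = median(allied), median(axis)
    return [a - med_allied for a in allied], [x - med_axis for x in axis]


inconsistent = [(20, 80), (10, 90), (90, 10)]  # A-B, C-D, E-F, Allied first
print(centred_on_median(inconsistent))
# ([0, -10, 70], [0, 10, -70])
```

F's royal loss no longer drags the Axis reference point down, so B is not rewarded at A's expense.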

This solves the problem partly by not punishing A and C for the benefit of B and D. However, as a result E now gets an overwhelming victory. This is one of the reasons why corresponding bridge systems have another step which does not reward overwhelming victories to the fullest. Let us denote by d the difference between the points given by CM and the median (so for player E this is 70). As an example, consider a system in which the final score s is computed as s = sgn(d)100*(1 - exp(-|d|/100)), where sgn(d) denotes the sign of d, exp() denotes e to the power of the argument, and || denotes taking the absolute value. The difference between this scoring and the original Tadpub scoring is illustrated in the following picture (Tadpub: green line, robust scoring: blue line).

Using this scoring the final scores of the players would be A 0, B 0, C -9.5, D +9.5, E +50, F -50. Note that this is just an example which illustrates the principle. We could easily make the reward curve flatter or steeper, thereby decreasing or increasing the reward for large victories. The point is that it can be adjusted, and that there is a reason behind the adjustment.
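A minimal Python sketch of this robust scoring applied to the six-player example. The parameter `scale` stands for the constant 100 in the formula; a smaller value flattens the curve and a larger one makes it steeper. Exposing it as a parameter is an assumption of the sketch, not something settled in the thread, and the function names are hypothetical:

```python
import math
from statistics import median


def robust_score(d, scale=100.0):
    """s = sgn(d) * scale * (1 - exp(-|d| / scale)).
    Close to d when |d| is small; its size never reaches scale."""
    return math.copysign(scale * (1.0 - math.exp(-abs(d) / scale)), d)


def robust_tournament_scores(games, scale=100.0):
    """games: (allied_score, axis_score) pairs for one scenario.
    Returns final scores in order Allied, Axis, Allied, Axis, ..."""
    med_allied = median([a for a, _ in games])
    med_axis = median([x for _, x in games])
    scores = []
    for a, x in games:
        scores.append(robust_score(a - med_allied, scale))  # Allied player
        scores.append(robust_score(x - med_axis, scale))    # Axis player
    return scores


games = [(20, 80), (10, 90), (90, 10)]  # A-B, C-D, E-F
print([round(s, 1) for s in robust_tournament_scores(games)])
# [0.0, 0.0, -9.5, 9.5, 50.3, -50.3]
```

The +50.3 for E is the +50 quoted above before rounding.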

Now I'm not saying that such a (complicated) method should be used, but it could be used and would give some protection against inconsistent playing. I do know that similar methods are used in bridge. You know far better than me if inconsistent playing is a problem in CM tournaments (it's just my hunch that it is). As a downside the method does give less value to overwhelming victories, as Tadpub noted. But as noted, this can be adjusted.

Let the discussion continue.

[ 09-27-2001: Message edited by: Nabla ]

Treeburst155 · September 28, 2001

Wow!! This truly is developing into the "Nabla CM Scoring System". I don't fully understand why the math works, but I understand what you are trying to accomplish with it. The effects of inconsistent play should be neutralized if at all possible. I hadn't even considered this aspect.

If I understand correctly, this is a refinement of the Tabpub system which means the "tight grouping around the median" situation will result in no undue punishment of the low score. This is good.

Does the steepness of the reward curve affect the "tight grouping" situation? I'm just curious. As long as the system allows for negative points I would think the low score in a tight grouping around the median would not be punished. Correct?

What is the relationship between the steepness of the reward curve and the neutralizing of the effects of inconsistent play? I would assume the flatter the curve the more effectively we have dealt with inconsistent play, but I'm not sure. Is inconsistent play dealt with solely by using the median instead of the mean?

The only problem I have is that I would not be able to do the calculations at the end of the tourney. :eek: You would probably have to figure the final results of any tourney that used this system. :eek:

Your very clear presentation of what the new equation will accomplish is hard to argue with. It's just better than what we've been considering so far. All the issues we had before have been solved, along with the "inconsistent play" issue I hadn't even thought of. I'm all for using this equation.

Now, how would you work AAR's into this equation? I like to give points for AARs, but I don't want them to interfere with the raw CM scores used for the calculation since AAR points have nothing to do with CM skill. I'll just tack AAR points onto the final calculated tourney score for a side/scenario as some percentage of the high score for that scenario/side.

All that remains is deciding how flat/steep to make the reward curve. I would say take some middle ground between the Tabpub variation and the original Nabla system. Perhaps even a little flatter than the middle.

Good Work, Nabla!

Treeburst155 out.

[ 09-27-2001: Message edited by: Treeburst155 ]

Treeburst155 · September 28, 2001

I've had a chance to digest this a little further now. First of all, what's the median of a set of numbers that contains an even number of elements? For example: {2,3,4,5,6,7} Is 4 the median, or would 5 be the median? I'm just curious as to how that is handled.

It appears from your explanation that there is a balancing act that must take place between rewarding overwhelming victories and protecting against inconsistent play. We can't do both. I would lean in favor of protecting against inconsistent play, which I think could be quite prevalent in cases where a player feels his performance has been poor enough that he can't possibly win the tournament. An all-out effort to win would often be replaced with an attitude tending toward experimentation and big gambles just for the fun of it.

Also, if the reward curve is too steep a person could develop quite a lead over the rest just by one overwhelming victory. This detracts from the fun for the others IMO. A tight race is much more interesting. Having said that, I do think there should be at least a little reward for a complete rout of the enemy. I vote for a curve that is 1/3 of the way up from flat to the Tabpub variation. Does that make any sense?

Below are two situations, Allies on the left, that I would like to know the scores for if you feel so inclined. Fool with the equation so that the curve is on the flat side. I'd like to see the final score difference between H and B in the first situation come out to 1.5 times the final score difference between D and B. This gives a small but noticeable reward for the high score.

The second situation is just a tight grouping around the median. I'd love to see what the scores there turn out to be once you've tweaked the curve to achieve my desired results in the first situation. I would do all this myself, but I don't understand the equation. I think part of the problem is the way you have to write it on the forum. What's the asterisk?

A:40 B:60
C:50 D:50
E:60 F:40
G:5 H:95

And:

A:45 B:55
C:47 D:53
E:49 F:51
G:51 H:49
I:53 J:47

I:53 J:47

Thanks, Nabla!!

Treeburst155 out.
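For readers who want to experiment with Treeburst155's two situations, here is a minimal sketch (not Nabla's answer, which is not part of this excerpt). It applies Nabla's formula with the constant 100 replaced by a flatness parameter; `SCALE = 20` is an arbitrary illustrative value, and all names are hypothetical. The side medians come out as 45 (Allied) and 55 (Axis) in the first situation, and 49 and 51 in the second.

```python
import math
from statistics import median


def robust_score(d, scale):
    """sgn(d) * scale * (1 - exp(-|d| / scale)), Nabla's formula with 100 -> scale."""
    return math.copysign(scale * (1.0 - math.exp(-abs(d) / scale)), d)


def score_situation(games, scale):
    """games: ((allied_name, score), (axis_name, score)) pairs.
    Returns {player: final score}, each side centred on its own median."""
    med_allied = median([a for (_, a), _ in games])
    med_axis = median([x for _, (_, x) in games])
    result = {}
    for (allied_name, a), (axis_name, x) in games:
        result[allied_name] = robust_score(a - med_allied, scale)
        result[axis_name] = robust_score(x - med_axis, scale)
    return result


first = [(("A", 40), ("B", 60)), (("C", 50), ("D", 50)),
         (("E", 60), ("F", 40)), (("G", 5), ("H", 95))]
second = [(("A", 45), ("B", 55)), (("C", 47), ("D", 53)),
          (("E", 49), ("F", 51)), (("G", 51), ("H", 49)),
          (("I", 53), ("J", 47))]

SCALE = 20  # hypothetical flatness; the thread has not settled on a value
for label, games in (("first", first), ("second", second)):
    scores = score_situation(games, SCALE)
    print(label, {p: round(s, 1) for p, s in sorted(scores.items())})
```

Changing `SCALE` and rerunning is one way to search for the curve Treeburst155 describes.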

Nabla · September 28, 2001

<BLOCKQUOTE>quote:<HR>Originally posted by Treeburst155:

If I understand correctly, this is a refinement of the Tabpub system which means the "tight grouping around the median" situation will result in no undue punishment of the low score. This is good.<HR></BLOCKQUOTE>

The reward function has been designed so that it follows closely the neutral (Tadpub) function near zero. Therefore near the median the reward for each extra CM point (and the punishment for each lost point) is roughly one extra score point, and the reward (and punishment) decrease further away from the median. So the punishment is by no means undue, but still encourages "average" players to strive for those extra points.

<BLOCKQUOTE>quote:<HR>Originally posted by Treeburst155:

Does the steepness of the reward curve affect the "tight grouping" situation? I'm just curious. As long as the system allows for negative points I would think the low score in a tight grouping around the median would not be punished. Correct?<HR></BLOCKQUOTE>

Hmm... I think it does affect the situation. Let us talk about penalties for being below the median for a while. As was noted above, the penalty for each lost CM point is largest near the median. The steepness of the curve affects the relationship between this largest penalty and penalties further away from the median. Therefore, if the curve is very flat then the penalty for being 10 points below median may be almost as large as the penalty for being 50 points below. This is illustrated in the figure below which shows the negative side of two reward curves with different flatnesses.

The red, very flat curve has been computed for a maximum penalty or reward of 10. For a CM point score 10 points below the median the final score is -6.3, while if you are 50 points below the median the final score is -9.9. So being slightly below is punished quite hard: being 10 points below the median twice in a tournament is worse than being 50 points below once.

The blue line has been computed for a maximum penalty or reward of 50. For a CM point score 10 points below the median the final score is now -9.1, while if you are 50 points below the median the final score is now -31.6. That's a big difference.
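The post does not spell out the formula behind the two curves. Assuming it is the earlier formula with 100 replaced by the maximum reward or penalty (an assumption, though it reproduces the red curve's -6.3 and -9.9 exactly), the penalties can be checked with a few lines of Python. Under the same assumption, a maximum of 50 gives -9.1 and -31.6 at 10 and 50 points below the median.

```python
import math


def robust_score(d, scale):
    """sgn(d) * scale * (1 - exp(-|d| / scale)); scale is the maximum reward/penalty."""
    return math.copysign(scale * (1.0 - math.exp(-abs(d) / scale)), d)


for scale in (10, 50):
    print(scale, [round(robust_score(d, scale), 1) for d in (-10, -50)])
# 10 [-6.3, -9.9]
# 50 [-9.1, -31.6]
```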

<BLOCKQUOTE>quote:<HR>Originally posted by Treeburst155:

What is the relationship between the steepness of the reward curve and the neutralizing of the effects of inconsistent play? I would assume the flatter the curve the more effectively we have dealt with inconsistent play, but I'm not sure. Is inconsistent play dealt with solely by using the median instead of the mean?<HR></BLOCKQUOTE>

Not entirely. Let us consider again the example with six players I used in my previous message (I'll leave F out of the discussion for now). By using just the median you reduce some effects of inconsistent playing: now A, B, C and D are in more equal positions. However, the victory of E is still overwhelming, and E will very likely win the tournament. But note that now E's victory is just as overwhelming for all A, B, C and D, whereas earlier A and C got the worst beating.

A moderately flat reward curve improves this situation by reducing the effect of E's overwhelming victory. However, it must be emphasized here again that it also reduces the effect of all overwhelming victories. In my opinion this is not a bad thing since it should keep the tournaments more interesting and should encourage uniformly strong gameplay among those who want to do well in the whole tournament. Would you agree?

<BLOCKQUOTE>quote:<HR>Originally posted by Treeburst155:

The only problem I have is that I would not be able to do the calculations at the end of the tourney. :eek: You would probably have to figure the final results of any tourney that used this system. :eek:<HR></BLOCKQUOTE>

Heck no. As soon as we can agree on the system I will write a computer program for you which does the whole thing.

<BLOCKQUOTE>quote:<HR>Originally posted by Treeburst155:

Your very clear presentation of what the new equation will accomplish is hard to argue with. It's just better than what we've been considering so far. All the issues we had before have been solved, along with the "inconsistent play" issue I hadn't even thought of. I'm all for using this equation.<HR></BLOCKQUOTE>

I'm very happy that people find this system a strong candidate despite its complexity. I myself think that it has some very good properties.

<BLOCKQUOTE>quote:<HR>Originally posted by Treeburst155:

Now, how would you work AAR's into this equation? I like to give points for AARs, but I don't want them to interfere with the raw CM scores used for the calculation since AAR points have nothing to do with CM skill. I'll just tack AAR points onto the final calculated tourney score for a side/scenario as some percentage of the high score for that scenario/side.

That sounds reasonable, but I'll have to think about it for a while to make sure.

[ 09-28-2001: Message edited by: Nabla ]

Nabla · September 28, 2001

<BLOCKQUOTE>quote:<HR>Originally posted by Treeburst155:

I've had a chance to digest this a little further now. First of all, what's the median of a set that contains an even number of values? For example: {2,3,4,5,6,7}. Is 4 the median, or would 5 be the median? I'm just curious as to how that is handled.

The median is computed as follows. Sort the numbers. If you have an odd count of numbers, take the middle one. If you have an even count, take the two numbers a and b in the middle and compute (a + b)/2. So in your example the median would be (4 + 5)/2 = 4.5.

<BLOCKQUOTE>quote:<HR>Originally posted by Treeburst155:

It appears from your explanation that there is a balancing act that must take place between rewarding overwhelming victories and protecting against inconsistent play. We can't do both. I would lean in favor of protecting against inconsistent play, which I think could be quite prevalent in cases where a player feels his performance has been poor enough that he can't possibly win the tournament. An all-out effort to win would often be replaced with an attitude tending toward experimentation and big gambles just for the fun of it.

Also, if the reward curve is too steep a person could develop quite a lead over the rest just by one overwhelming victory. This detracts from the fun for the others IMO. A tight race is much more interesting.

I agree with you completely. I don't think it is a question of balance between rewarding overwhelming victories and protecting against inconsistent play, because I think that we should encourage uniformly strong gameplay. I think the balance is between uniform gameplay and strong gameplay: we still want to encourage people to play much better than others in one battle if they can. In setting the correct "flatness" we must also take care that we do not punish players just below the mean too hard, as the example in my previous message demonstrates. This, however, is probably not a balancing act, since punishing players too hard probably also implies too low rewards for large victories.

<BLOCKQUOTE>quote:<HR>Originally posted by Treeburst155:

Having said that, I do think there should be at least a little reward for a complete rout of the enemy. I vote for a curve that is 1/3 of the way up from flat to the Tabpub variation. Does that make any sense?

I'm not sure what you mean by this. If you mean a curve in which the maximum reward is something like 100/3=33 it looks like this (score function blue line, neutral score function plotted in green for comparison).

But let us look at your example since it is a very concrete way of looking at what is going on here.

<BLOCKQUOTE>quote:<HR>Originally posted by Treeburst155:

Below are two situations, Allies on the left, that I would like to know the scores for if you feel so inclined. Fool with the equation so that the curve is on the flat side. I'd like to see the final score difference between H and B in the first situation come out to 1.5 times the final score difference between D and B. This gives a small but noticeable reward for the high score.

A:40 B:60

C:50 D:50

E:60 F:40

G:5 H:95

Allied median 45, Axis median 55. Differences dA=-5, dB=5, dC=5, dD=-5, dE=15, dF=-15, dG=-40, dH=40. So you want s(dH)-s(dB) = 1.5(s(dB)-s(dD)), or s(40)-s(5) = 1.5(s(5)-s(-5)). Below is a curve in which the ratio is 1.4991 (only the positive side shown here again; score function blue line, neutral score function plotted in green for comparison).

The maximum reward in this case for a 100 point difference from median is 20.6. This is a fairly flat reward curve. The general form of these curves is s(d)=sgn(d)*(1/a)*(1-exp(-a*|d|)). Here a is a parameter which controls the steepness. For the curve above a=0.048.

<BLOCKQUOTE>quote:<HR>Originally posted by Treeburst155:

The second situation is just a tight grouping around the median. I'd love to see what the scores there turn out to be once you've tweaked the curve to achieve my desired results in the first situation. I would do all this myself, but I don't understand the equation. I think part of the problem is the way you have to write it on the forum. What's the asterisk?

A:45 B:55

C:47 D:53

E:49 F:51

G:51 H:49

I:53 J:47

Sorry, I didn't explain that. The asterisk is just used to denote ordinary multiplication in many of those mathematical programs I've used.

Anyway, on to your example. Allied median 49, Axis median 51. Differences dA=-4, dB=4, dC=-2, dD=2, dE=0, dF=0, dG=2, dH=-2, dI=4, dJ=-4. Scores s(dA)=-3.6, s(dB)=3.6, s(dC)=-1.9, s(dD)=1.9, s(dE)=0, s(dF)=0, s(dG)=1.9, s(dH)=-1.9, s(dI)=3.6, s(dJ)=-3.6. So the scores are very close to the raw differences from the median, since the score function is nearly 1-1 near the median. However, if we add two more games to this (two, just to keep things simple, so that the medians do not change)

K:70 L:30

M:20 N:80

The differences are dK=21, dL=-21, dM=-29, dN=29. The scores are s(dK)=13.2, s(dL)=-13.2, s(dM)=-15.6, s(dN)=15.6. Whether these are reasonable punishments / rewards when compared with the near-median scores of A to J is up to you to decide.

This is really interesting, and it's very nice to see that we are getting closer and closer to a good solution.

[ 09-28-2001: Message edited by: Nabla ]

Peter Svensson · September 28, 2001

I can't follow the details of Nabla's latest proposals, but what I like about them, compared to the "golf-style" system proposed by Treeburst earlier, is that the scoring is continuous, i.e. you increase your final score for even a small increase in the percentage score. Say I have 60 points on the next-to-last turn of the game. I have a chance to hunt down some hidden enemies, which could push the score up to 62 or so. If the final score is determined in bands, i.e. if I get the same score whether I have 60 or 62 points in the AAR, then I have no incentive to do my best.

Intuitively, it makes sense to use the median instead of the average to determine the "normal" score, since it reduces the effect of extreme results. If the Allied players in five games of the same scenario score 25, 30, 35, 50, and 85, the average is 45, while the median is 35, which better reflects the "normal" score.

The drawback of any system using exponents is, of course, transparency. It would be best if everyone understood how their score is determined. Any chance of simplifying the formula to avoid using e?

Nabla · September 28, 2001

<BLOCKQUOTE>quote:<HR>Originally posted by Peter Svensson:

The drawback of any system using exponents is, of course, transparency. It would be best if everyone understood how their score is determined. Any chance of simplifying the formula to avoid using e?<HR></BLOCKQUOTE>

Hello Peter!

I'm not sure what you mean by transparency. I could devise another function for the purpose (if not else then by approximating the one we've made), a polynomial or something else more familiar. However, I don't think that would make much of a difference in providing intuition about the function itself since it would probably not be any clearer. But is this what you were talking about?

I think the best way to provide an intuitive feeling about the score functions is to plot them as we've done above. Peter, what else would you like to know (in addition to the plots)? Another question is the fact that everyone should be able to check the scoring. For this we can make Excel formulas or something like that.

Please help me understand what you mean so we can try to improve.

[ 09-28-2001: Message edited by: Nabla ]

Treeburst155 · September 28, 2001

Peter,

The best way to make the formula understandable for all is to simply distribute the scoring program Nabla is going to write to all the players. This way the players can set up hypothetical situations as we have been doing above to gain an understanding of the relationship between raw CM scores and the computed tourney scores for a given side/scenario given different outcomes of the games. The formula could be understood by repeated trial of hypothetical situations.

Nabla,

I need to study your latest post a while longer. You are pushing me to the limits of my education and intellect. I like that! I agree the final curve should encourage consistent, strong play. I'm absorbing your latest now.

Treeburst155 out.

Treeburst155 · September 28, 2001

Consider the last graph which depicts a maximum score of 20.6 for being 100 pts. above the median. Only the left hand portion of the graph is really of concern because chances are the highest/lowest outliers will rarely be more than 50 points above the median. This is because it is impossible to score more than 100 CM points in a game. Only if a scenario is extremely out of balance and played very poorly by the person using the strong side would the resulting score be more than 50 off the median. Say the median for the strong side in an unbalanced scenario is 75. Now assume our poor player only manages 10 points from the strong side. This is only -65 from the median even though this is an extreme situation. So, it would seem to me the only relevant portion of the curve is about +/- 40 of the median, maybe even only +/- 30.

In that last graph (20.6 max tourney points) the curve works well out to +/- 40 of median.

The outlier victories of 80/20 and 70/30 that you compared to the tightly packed group around the median seem to score nicely compared to the median IMO. 15.6 points for those nice victories doesn't seem out of line to me. Perhaps we can flatten it just a bit more. What if the max reward was dropped to 18, as opposed to the 20.6 depicted in the last graph?

My main point here is that the entire curve is not relevant, only +/- 30 or 40 of the median. Do you agree?

Treeburst155 out.

Treeburst155 · September 28, 2001

Using a relatively flat curve punishes the median only as the median relates to extreme outliers. I think we should concentrate on the portion of the curve from -40 to +40 of the median, ignoring the effects of outliers further from the median. IOW, outliers more than 40 from the median will be rare enough that we can disregard those tourney scores' relationship to scores packed around the median. A flat curve will appear to punish the median, but only when compared with scores achieved by the extreme outliers beyond +/- 40 of the median. We're in the ballpark with 20.6 as max reward IMO. Like I said above, 18 might be even better.

Treeburst155 out.

[ 09-28-2001: Message edited by: Treeburst155 ]

Nabla · September 28, 2001

<BLOCKQUOTE>quote:<HR>Originally posted by Treeburst155:

Consider the last graph which depicts a maximum score of 20.6 for being 100 pts. above the median. Only the left hand portion of the graph is really of concern because chances are the highest/lowest outliers will rarely be more than 50 points above the median. This is because it is impossible to score more than 100 CM points in a game. Only if a scenario is extremely out of balance and played very poorly by the person using the strong side would the resulting score be more than 50 off the median. Say the median for the strong side in an unbalanced scenario is 75. Now assume our poor player only manages 10 points from the strong side. This is only -65 from the median even though this is an extreme situation. So, it would seem to me the only relevant portion of the curve is about +/- 40 of the median, maybe even only +/- 30.

In that last graph (20.6 max tourney points) the curve works well out to +/- 40 of median.

The outlier victories of 80/20 and 70/30 that you compared to the tightly packed group around the median seem to score nicely compared to the median IMO. 15.6 points for those nice victories doesn't seem out of line to me. Perhaps we can flatten it just a bit more. What if the max reward was dropped to 18, as opposed to the 20.6 depicted in the last graph?

My main point here is that the entire curve is not relevant, only +/- 30 or 40 of the median. Do you agree? <HR></BLOCKQUOTE>

Yes, your argumentation about this is convincing. As you also noted the resulting scoring seems to achieve the robustness criterion well since it "flattens" the very improbable extreme results. Looking good!

Nabla · September 28, 2001

<BLOCKQUOTE>quote:<HR>Originally posted by Treeburst155:

Using a relatively flat curve punishes the median only as the median relates to extreme outliers. I think we should concentrate on the portion of the curve from -40 to +40 of the median, ignoring the effects of outliers further from the median. IOW, outliers more than 40 from the median will be rare enough that we can disregard those tourney scores' relationship to scores packed around the median. A flat curve will appear to punish the median, but only when compared with scores achieved by the extreme outliers beyond +/- 40 of the median. We're in the ballpark with 20.6 as max reward IMO. Like I said above, 18 might be even better.

Treeburst155 out.

[ 09-28-2001: Message edited by: Treeburst155 ]<HR></BLOCKQUOTE>

Where did the suggestion about me writing the scoring program suddenly go? Am I dreaming, or have you been editing?

Anyway, I thought I'd write the program ASAP so that you can start experimenting. Perhaps I'll start today (it's pretty late here in the polar bear land), but definitely I'll work on it tomorrow. I'll get back to you on this.

BTW, do you have a C/C++ compiler on your computer? Are you using a MAC or a PC?

Treeburst155 · September 28, 2001

Yes Nabla,

I've been editing feverishly for the last couple of hours trying to make myself clear. As a matter of fact, I do have a C++ compiler, Microsoft's Visual C++. I intended to teach myself C++ at one point but didn't get too far with it. I've probably forgotten how to use the compiler. I'm on a PC, BTW. Why do you ask? What good does a compiler do if I can't write programs?

Treeburst155 out.

Nabla · September 28, 2001

<BLOCKQUOTE>quote:<HR>Originally posted by Treeburst155:

As a matter of fact I do have a C++ compiler, Microsoft's Visual c++.<HR></BLOCKQUOTE>

That is excellent! You see, I can use tens of compilers on various Unix platforms but I have none in my Windows. Now I can write and test the program and then send you the C++ code which you can then compile to get the executable.

Unfortunately I have some "social responsibilities" now. Peace at home, you know. Until tomorrow!

Peter Svensson · September 29, 2001

<BLOCKQUOTE>quote:<HR>Originally posted by Nabla:

Hello Peter!

I'm not sure what you mean by transparency. I could devise another function for the purpose (if not else then by approximating the one we've made), a polynomial or something else more familiar. However, I don't think that would make much of a difference in providing intuition about the function itself since it would probably not be any clearer. But is this what you were talking about?

'Tis.

<BLOCKQUOTE>quote:<HR>Originally posted by Nabla:

I think the best way to provide an intuitive feeling about the score functions is to plot them as we've done above.

You're right - the graph is probably sufficient to make the formula graspable. Even if we can't use the formula program, we can approximate our scores using the graph, and that should be enough to make people comfortable with the system.

[ 09-28-2001: Message edited by: Peter Svensson ]

Treeburst155 · September 29, 2001

Peter,

I'm working with the formula right now with a scientific calculator I dug out of the closet. Unfortunately, my curve does not look like Nabla's. I'll get it worked out however as soon as Nabla gets online. When I do I will be able to make available lots of charts showing what is happening with the scoring. It will be clear.

Nabla,

Using a value of .054 for "a" in the formula I calculate a high reward of 18.15 points. Trying various values for (d), the CM score minus the median, my curve looks quite different than yours. As I approach the median it flattens. +/-10 of the median only gives 1.34 points. This means there are only 2.68 points separating -10 from +10.

I'm not getting the one-for-one correspondence you were getting very near the median. Here's the formula I'm using:

sine(d)*(1/a)*(1-"e" to the power of (-a*|d|))

What's wrong with my formula? :confused:

Treeburst155 out.

[ 09-29-2001: Message edited by: Treeburst155 ]

Treeburst155 · September 29, 2001

EDIT: I just carried my formula out to 100. It appears to be almost linear between 15 and 70, then flattening to 90 and actually begins a slight turn down from 90 to 100. I'm obviously not doing something right. Ah well, it's been fun.

Treeburst155 out.

[ 09-29-2001: Message edited by: Treeburst155 ]

Nabla · September 29, 2001

<BLOCKQUOTE>quote:<HR>Originally posted by Treeburst155:

Here's the formula I'm using:

sine(d)*(1/a)*(1-"e" to the power of (-a*|d|))

What's wrong with my formula? :confused:

You should replace the sine(d) with sgn(d), that is, just the sign of d. This is -1 for d < 0 and +1 for d > 0. The sgn(d) is used in the graph just to give the negative scores the correct sign. If you are studying positive values of d you can forget it.

I'm starting to write the computer program now.

Nabla · September 29, 2001

Hello again!

The first program is ready, but don't get too excited since it doesn't do much. :rolleyes:

I decided to write a few small programs instead of a large one. This first one computes the curve parameter for a given maximum reward value. Running (at DOS prompt)

nabla-curve-parameter -d 50

gives you the curve parameter which corresponds to a maximum reward of 50, and prints some debugging information (as a result of the optional -d debug parameter).

But the best news is that I now have a working compiler for the Windows environment (thanks to all GNU people once again) in addition to my Linux environment. Now I will start to write the actual scoring program.

For some reason the parameter program is fairly large (477k). I will try to strip it as soon as I can find a suitable Win program that does it.

The parameter program and its source code can be found at

http://www.cis.hut.fi/jarmo/nabla-system/
