
The (Possibly) Unbalanced Tournament Scoring System


Nabla


I also received an email from Fionn Kelly which I think we should all think about. Here's the email.

---------------------------------------------------------------------

Date: Fri, 28 Sep 2001 13:41:31 +0100

From: Fionn Kelly <fionnkelly@ireland.com>

To: jarmo.hurri@hut.fi

Subject: re: NABLA scoring system

I like your idea with the scoring system.

My only caveat would be that I think the scoring system should give players "excessive" rewards for crushing victories in excess of what the points would give.

e.g. It is FAR more difficult to get an 80 to 20 win than a 70 to 30 win. The difference between points awarded should reflect that.

E.g. if the average win score was 65 then someone who got 70 might get a 10% bonus while those who got an 80 score might get a 50% bonus.

This could be easily tied into Standard Deviations and the bonus points determined according to the number of standard deviations away from the mean score. Something like that is quite statistically defensible.

---------------------------------------------------------------------
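To make the idea in the email concrete, here is a minimal sketch of a bonus tied to standard deviations. It is only one possible reading: the function name `sd_bonus_points` and the `bonus_per_sd` rate are hypothetical, and nothing in the email fixes how large the bonus per standard deviation should be.

```python
import statistics

def sd_bonus_points(scores, bonus_per_sd=0.25):
    """Scale each above-average score by a bonus that grows with its z-score.

    bonus_per_sd is a hypothetical tuning knob, not a value from the email.
    """
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)  # population standard deviation
    adjusted = []
    for s in scores:
        # z-score: how many standard deviations the score lies from the mean
        z = (s - mean) / sd if sd > 0 else 0.0
        bonus = bonus_per_sd * max(0.0, z)  # no bonus at or below the mean
        adjusted.append(s * (1.0 + bonus))
    return adjusted

print(sd_bonus_points([50, 60, 70]))
```

With a rule like this the bonus depends on how spread out the results of a scenario are, so the same raw score can earn different bonuses in different scenarios.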

I'm not going to comment on this right now since this requires some thinking (but I will return to this later on). Right now I'm focusing on writing the program(s) so that we can all test different scenarios. That should provide a better basis for discussions. But while I'm coding you can think about what Fionn said. :D This should especially concern Treeburst155, who's ultimately responsible for deciding the curve.

(I'm naturally making the system flexible so that you can easily implement scoring rules with different flatnesses. However, the std thing mentioned by Fionn is something I'll have to consider in detail.)


OK, I'm awake here in Zulu -6. I'll download the program now. BTW, I was wondering why you used "sgn" instead of "sine". I'm sure it will work for me now.

I agree with Fionn that it's much more difficult to score 80 points than it is to score 65 points in general, but we're counting on consistent play for that to be true. If player A goes into "experimentation and fun" mode due to perceived poor performance in previous games these overwhelming victories would not be as difficult to achieve against him. We're back to the uniform and consistent play concept.

Fionn's proposed use of some sort of calculation involving standard deviation may be the answer, but I certainly wouldn't know considering my high school math skills. I'll follow where Nabla leads as far as the scoring formula is concerned. It would seem to me that consistent excellent scores would result in a player pulling well ahead of the pack using the current Nabla system. It's all a matter of adjusting that value for "a" in the Nabla formula.

Treeburst155 out.


The little program works great! In a matter of a few minutes I've calculated the values for "a" yielding max rewards from 10-30 by twos. Thanks, Nabla!

EDIT: Here are some thoughts on the curve we get with a=0.055. The maximum reward for this curve is 18.11. The curve provides a nice reward for scoring 15-25 over the median when compared with scores within 10 of the median. I think it will be somewhat uncommon for scores to go more than 25 over the median. Suppose the median for a scenario is 60. If a player scores 85 he will get 13.58 points while the player on the median will get zero. This is ample reward for outstanding play IMO. Even if a player only manages 15 above the median he gains 10.21 points on anyone at the median. A score 40 above the median yields 16.17 points.

Considering that the max (+100 of the median) is 18.11 points, the curve is nearly flat by +40. This is good since scoring that high would be very rare IMO, and would likely come about by an opponent slipping into "fun and experimentation" mode. If anything, I'm leaning toward an even flatter curve. Here are the numbers for max reward=18.11, achieved with a=0.055.

(d)=40, score=16.17

(d)=35, score=15.53

(d)=30, score=14.69

(d)=25, score=13.58

(d)=20, score=12.13

(d)=15, score=10.21

(d)=10, score=07.69

(d)=05, score=04.37
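The formula itself is not quoted in this part of the thread, only the parameter "a" and a "sgn" term. As a sketch under that caveat, the curve sgn(d) * (1 - exp(-a*|d|)) / a reproduces every value in the list above to two decimals, so it is assumed here; the function name `nabla_score` is made up for illustration.

```python
import math

def nabla_score(d, a):
    """Tournament points for a CM score d points away from the median.

    Assumed form (inferred from the posted numbers, not quoted from the thread):
    sign(d) * (1 - exp(-a * |d|)) / a
    """
    # -expm1(-x) equals 1 - exp(-x), computed accurately for small x
    magnitude = -math.expm1(-a * abs(d)) / a
    return math.copysign(magnitude, d)

a = 0.055
for d in (40, 35, 30, 25, 20, 15, 10, 5):
    print(f"(d)={d:02d}, score={nabla_score(d, a):05.2f}")
```

Under this assumed form the curve is symmetric, so a score d points below the median loses exactly what a score d points above it gains.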

Another EDIT: Punishment for dropping below the median will be felt but staying just a few points above will be rewarded. Players will want to strive to beat the median rather than take risks to gain huge victories. Consistent strong play is rewarded. I will be studying the effects on the near median scores of different curves now.

Treeburst155 out.

[ 09-29-2001: Message edited by: Treeburst155 ]


Get some sleep, Nabla. You have two hours. :D

Let's test out the curve above. Consider a scenario balanced in favor of the Germans such that the Allied median CM score is 30.

Below is a typical set of Allied player results with one extreme outlier who we will call Fionn. Fionn represents the guy who demonstrates consistent excellent play. The CM score is on the left and the converted tourney score is on the right.

Fionn.....70...16.17

Player B..50...12.13

Player C..33...02.77

Player D..30...00.00

Player E..27...-2.77

Player F..24...-5.11

Player G..11...-11.79

Keep in mind these are all Allied scores achieved on the same scenario that is out of balance in favor of the Germans.

Player B did well, having managed a dead even draw in spite of the German advantage. For this he scores almost 10 full points more than Player C, who just beat the median. Fionn actually managed a high tactical victory from the disadvantaged side. This is quite an accomplishment. His reward is 4 more points than Player B, even though Player B himself did substantially better than the median.

Hardest hit by this scenario is player F, who only fell 6 points short of the median but lost 5.11 points. Since he is in the steep part of the curve every point from the median costs him close to one tourney point. Of course, the same is true on the plus side of the median, which makes up for this. Player G took quite a beating compared to the others who played his side and suffers accordingly. Interesting to note is that even if player G had scored 0 points he would only lose approximately 14.69 points (the (d)=30 value of the curve). This is because he can do no worse than -30 of the median with this side of the scenario.

Regardless of the median score, players should keep in mind that the most points you can lose or gain using this curve is 18.11. A bad game can only do so much damage to your situation. A couple of moderately good games will make up for it. By the same token, a lucky overwhelming victory because your opponent had other things on his mind will not allow you to run away with the tournament title. Consistent play at 10-15 above the median is what players need. Major victories will add noticeable icing to the cake but you can't rest on your laurels. I like it, but am still looking at other curves.
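The bound described here can be checked with a short sketch. It assumes the curve sgn(d) * (1 - exp(-a*|d|)) / a, which is inferred from the numbers posted in this thread rather than quoted from it; `nabla_score` and `score_bounds` are illustrative names.

```python
import math

def nabla_score(d, a):
    """Assumed curve: sign(d) * (1 - exp(-a * |d|)) / a."""
    return math.copysign(-math.expm1(-a * abs(d)) / a, d)

def score_bounds(median, a, max_cm=100):
    """Worst and best tournament points for a side whose median CM score is `median`.

    The worst case is a CM score of 0, the best case a CM score of max_cm.
    """
    return nabla_score(0 - median, a), nabla_score(max_cm - median, a)

low, high = score_bounds(30, 0.055)
print(f"worst {low:.2f}, best {high:.2f}")
```

For a side with a median of 30 the worst case is the curve value at 30 below the median, while the best case stays a little under the overall maximum because the player can be at most 70 above the median.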

I have now mastered the formula on my fancy calculator so I can explore the curves. Nabla is working on a program for everyone that will allow them to explore the scoring system themselves without the need for a $100 calculator. Unless Nabla comes up with something he thinks is better, I think we've perfected "The Nabla CM Scoring System".

Treeburst155 out.


Personally, I hadn't reckoned on this degree of responsiveness, but it is appreciated. Originally, I was just concerned about the "linear" nature of the original proposal. This has been addressed quite well in my opinion. Thanks to Nabla and Treeburst for all the time and effort that they have put into this system. I think that we will all be happy in the end. God knows that I may need this in one scenario, cuz I may be a "bit" under the median in that one it appears. :mad:


An interesting study Treeburst and Nabla!

Rewarding consistent good play (even if scoring is only a few points above the median compared to the scores from combatants on the same scenario) is the essence here and is well rewarded by the "Nabla system."

Outlier results as postulated by Fionn, especially when there is a strong imbalance in the scenario and a player has succeeded against the odds of scenario imbalance should also not be completely negated IMHO.

Program/test away!

Regards,

Charl Theron


-----------------

"Sparkling Muscatel. One of the finest wines of Idaho!"

-- waiter in The Muppet Movie (1979)


Outlier results such as Fionn's in the above example are really not adequately compensated using this curve if one assumes the victory was achieved against an opponent who was trying his best to score well, as opposed to playing for "fun" because he can no longer win the tournament. However, if a player shows consistent excellent play his large victories are rewarded amply enough that he will still pull away from the pack and win the tournament. The difference between Fionn's final tourney total and the next in line may not be proportional to the actual CM point difference between the two, but first place is still first place.

Treeburst155 out.

[ 09-30-2001: Message edited by: Treeburst155 ]


Ok, the first version of the program is ready. NOTE: if there is more than one scenario in a tournament (as there obviously usually should be), the final score of a player is computed as an average over all Nabla system scores from single scenarios. Is this ok? (Note that when ordering players the result would be the same if we used the sum of all single scenario Nabla system scores since the average is just the sum divided by the number of scenarios, which is of course constant for all players).

Here's how the program works. The input to the program is a file with the following format (this is file battle-results.txt, the lines ----- are not part of the file).

-------------------------------------------------------------------------------

# The_Aftermath

A 100 B 0

C 100 D 0

E 0 F 100

# Humiliator

C 100 D 0

A 100 B 0

E 0 F 100

-------------------------------------------------------------------------------

Here we have two scenarios, The_Aftermath and Humiliator. Each scenario's results start with #, and the scenario name must follow the # character. Scenario names must consist of a single word, and they must be unique.

The lines following the start of a scenario contain battle results in the form player1 p1points player2 p2points. The order of the players is significant since the median is computed separately for first and second players, so always put, for example, the points of the Allied player first. Player names must be single words and they must be unique. The battles can be in any order.

You can add whitespace (spaces, tabs, newlines) anywhere you want in the input file. Actually, input does not have to be divided into lines as long as input words are separated by whitespace. For example the following file is legitimate input.

-------------------------------------------------------------------------------

# The_Aftermath A 100 B 0 C 100 D 0 E 0 F 100 # Humiliator A 100 B 0 C

100 D 0 E 0 F 100

-------------------------------------------------------------------------------

However, lines obviously help human interpretation.
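The input rules above can be sketched as a small token-based reader. This is not the distributed program, only an illustration of the described format; `parse_results` is a hypothetical name, and it assumes the # is separated from the scenario name by whitespace, as in both examples.

```python
def parse_results(text):
    """Parse whitespace-separated tokens into {scenario: [(p1, pts1, p2, pts2), ...]}."""
    tokens = text.split()  # line breaks carry no meaning, only whitespace does
    scenarios = {}
    current = None
    i = 0
    while i < len(tokens):
        if tokens[i] == "#":
            # '#' starts a new scenario; the next token is its one-word name
            if i + 1 >= len(tokens):
                raise ValueError("scenario name missing after '#'")
            name = tokens[i + 1]
            if name in scenarios:
                raise ValueError(f"duplicate scenario name: {name}")
            scenarios[name] = current = []
            i += 2
        else:
            if current is None:
                raise ValueError("battle result before first scenario")
            if i + 4 > len(tokens):
                raise ValueError("incomplete battle result")
            p1, s1, p2, s2 = tokens[i:i + 4]
            current.append((p1, float(s1), p2, float(s2)))
            i += 4
    return scenarios

print(parse_results("# The_Aftermath A 100 B 0 C 100 D 0 E 0 F 100"))
```

Because the reader works on tokens only, the line-per-battle layout and the everything-on-one-line layout give the same result.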

The program does all kinds of sanity checks to ensure that the input is legitimate. I challenge you to try to get it to crash or produce "insane" results. One thing the program does not check is a sensible ordering of opponents in the games. I'll return to this issue below.

So let's test the program. First compute the flatness parameter for maximum reward 50 (these are from my Linux terminal but the programs which I've distributed work in DOS)

-------------------------------------------------------------------------------

[jarmo@itl-pc59 nabla-system]$ ./nabla-curve-parameter 50

0.0159362

-------------------------------------------------------------------------------
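How the flatness parameter might be found can be sketched with a bisection search. The sketch assumes the curve (1 - exp(-a*d)) / a, inferred from the numbers posted in this thread, and assumes the maximum reward is the curve value at a difference of 100; `max_reward` and `curve_parameter` are illustrative names, not the internals of nabla-curve-parameter.

```python
import math

def max_reward(a, max_diff=100):
    """Curve value at the largest possible difference from the median (assumed form)."""
    return -math.expm1(-a * max_diff) / a

def curve_parameter(reward, max_diff=100, iterations=200):
    """Find a such that max_reward(a) == reward, by bisection.

    max_reward decreases as a grows, so the root can be bracketed and halved.
    """
    if not 0 < reward < max_diff:
        raise ValueError("reward must lie strictly between 0 and max_diff")
    lo, hi = 1e-12, 1.0
    while max_reward(hi, max_diff) > reward:  # widen the bracket if needed
        hi *= 2.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if max_reward(mid, max_diff) > reward:
            lo = mid  # reward still too large, so a must grow
        else:
            hi = mid
    return (lo + hi) / 2.0

print(f"{curve_parameter(50):.7f}")
```

A bisection is used here only because it is simple and the function is monotonic; any one-dimensional root finder would do.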

Next we run the scoring program. The -d option gives (currently fairly little) diagnostic information about what the program is doing.

-------------------------------------------------------------------------------

[jarmo@itl-pc59 nabla-system]$ ./nabla-score-tournament -d 0.0159362 battle-results.txt final-scores.txt

Read scenario name [ The_Aftermath ].

Read battle result [ A 100 B 0 ].

Read battle result [ C 100 D 0 ].

Read battle result [ E 0 F 100 ].

Read scenario name [ Humiliator ].

Read battle result [ C 100 D 0 ].

Read battle result [ A 100 B 0 ].

Read battle result [ E 0 F 100 ].

[ The_Aftermath ] medians [ 100 0 ].

[ Humiliator ] medians [ 100 0 ].

-------------------------------------------------------------------------------

The results can be found in file final-scores.txt

-------------------------------------------------------------------------------

A 0

B 0

C 0

D 0

E -50

F 50

-------------------------------------------------------------------------------

So E and F got minimum and maximum possible results, which is what they should have received. (The results are currently printed out in alphabetical order. I'll order them by the final score in the next version.)

Let's look at another example so that the results agree with previously computed ones - look at Treeburst155's post above (this is file battle-results-2.txt)

-------------------------------------------------------------------------------

# The_Nice_Scenario

Fionn 70 X1 30

B 50 X2 50

C 33 X3 67

D 30 X4 70

E 27 X5 73

F 24 X6 76

G 11 X7 89

-------------------------------------------------------------------------------

Use Treeburst155's value a=0.055

-------------------------------------------------------------------------------

[jarmo@itl-pc59 nabla-system]$ ./nabla-score-tournament -d 0.055 battle-results-2.txt final-scores-2.txt

Read scenario name [ The_Nice_Scenario ].

Read battle result [ Fionn 70 X1 30 ].

Read battle result [ B 50 X2 50 ].

Read battle result [ C 33 X3 67 ].

Read battle result [ D 30 X4 70 ].

Read battle result [ E 27 X5 73 ].

Read battle result [ F 24 X6 76 ].

Read battle result [ G 11 X7 89 ].

[ The_Nice_Scenario ] medians [ 30 70 ].

-------------------------------------------------------------------------------

Scores from final-scores-2.txt (the program rounds the results to three significant digits).

-------------------------------------------------------------------------------

B 12.1

C 2.77

D 0

E -2.77

F -5.11

Fionn 16.2

G -11.8

X1 -16.2

X2 -12.1

X3 -2.77

X4 0

X5 2.77

X6 5.11

X7 11.8

-------------------------------------------------------------------------------

Treeburst155's values were

Originally posted by Treeburst155:

Fionn.....70...16.17

Player B..50...12.13

Player C..33...02.77

Player D..30...00.00

Player E..27...-2.77

Player F..24...-5.11

Player G..11...-11.79

So it seems ok.
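The whole scoring step can be sketched in a few lines. As before, the curve sgn(d) * (1 - exp(-a*|d|)) / a is an assumption inferred from the posted numbers, the function names are made up, and the handling of an even number of games (averaging the two middle scores, as Python's `statistics.median` does) is a guess, since the thread does not say what the real program does in that case.

```python
import math
from statistics import median

def nabla_score(d, a):
    """Assumed curve: sign(d) * (1 - exp(-a * |d|)) / a."""
    return math.copysign(-math.expm1(-a * abs(d)) / a, d)

def score_tournament(scenarios, a):
    """scenarios: {name: [(player1, pts1, player2, pts2), ...]} -> {player: average score}."""
    per_player = {}
    for games in scenarios.values():
        # The median is computed separately for first and second players
        med1 = median(g[1] for g in games)
        med2 = median(g[3] for g in games)
        for p1, s1, p2, s2 in games:
            per_player.setdefault(p1, []).append(nabla_score(s1 - med1, a))
            per_player.setdefault(p2, []).append(nabla_score(s2 - med2, a))
    # Final score: average over the scenarios each player took part in
    return {p: sum(v) / len(v) for p, v in per_player.items()}

nice = {"The_Nice_Scenario": [
    ("Fionn", 70, "X1", 30), ("B", 50, "X2", 50), ("C", 33, "X3", 67),
    ("D", 30, "X4", 70), ("E", 27, "X5", 73), ("F", 24, "X6", 76),
    ("G", 11, "X7", 89),
]}
for player, score in sorted(score_tournament(nice, 0.055).items()):
    print(player, f"{score:.3g}")
```

The `.3g` format mirrors the three-significant-digit rounding mentioned above, so the printed list can be compared directly with final-scores-2.txt.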

But I urge you to test and retest the program. I will also do it once I have the time but during the weeks I don't have that much free time. It would be a small miracle if the program had absolutely no bugs. Also, tell me if you'd like to have some other features in the program.

I will write two more programs once I have the time.

1) A program that will print out a "sensible opponent plan" for a given set of players and scenarios. By a "sensible opponent plan" I mean a pairing of opponents in each scenario so that when computing the final scores everyone gets compared against everyone (see posts above).

2) A very small program which prints out different 'difference from median' & 'final score' values and an Excel macro which can plot these values. Using the program and the macro tournament players can plot scoring curves and see for themselves how they will be scored.

Now go play with this thing. :D

http://www.cis.hut.fi/jarmo/nabla-system/

(The source code is not there yet, but it will appear in the directory once I've tidied it up a little bit.)

[ 10-01-2001: Message edited by: Nabla ]


I see no problem with taking the average of the final scenario scores rather than just summing them up. Like you say, the final ordering of players is the same. I will be playing with the program over the next few days.

Proposed matchup selection program: This would be a good thing to ensure players aren't repeatedly compared with the same players in their scenarios. Just out of curiosity, I'm going to look at my Wild Bill schedule, which was set up to split attack/defend duties without regard to who gets compared to whom. I suspect this may only be a significant issue with smaller tournaments.

Proposed program #2: This is the one most useful to the players I think. If they can get the tourney score for different distances from the median they will gain a good understanding of the curve. However, once the curve has been decided on there would just be one chart/graph necessary. I assume this program would allow players to change the curve and look at results. This would be nice. There could be tourneys with steep curves and shallow curves based on players' wishes. With this program they would be able to determine the curve they like, just as I have done with my calculator.

I'm downloading your latest now. Thanks, Nabla!!

Treeburst155 out.


It appears the program does not like CM scores that do not add up to 100, a common outcome in CM. Once that rather large bug is fixed this will be a great tool! Especially for tourney managers.

Let me attempt to clarify for interested players how to use this program.

1) Create a .txt file called "battle-results" using the format Nabla described. Keep all Allied scores on the same side. Place this file into the folder that contains the program.

2) From a DOS prompt type "nabla-score-tournament .055 battle-results.txt final-scores.txt" without the quotation marks. Be sure you are in the folder first.

The program will generate a file called "final-scores.txt" in the same folder as the program, which you open to see the results. You can score a whole tournament all at once, not just one scenario at a time.

Treeburst155 out.


Originally posted by Treeburst155:

"It appears the program does not like CM scores that do not add up to 100, a common outcome in CM."

Oops, that was implemented intentionally. I will fix it tomorrow.


Originally posted by Treeburst155:

"It appears the program does not like CM scores that do not add up to 100, a common outcome in CM. Once that rather large bug is fixed this will be a great tool!"

I decided to fix it immediately. Now the sum of the CM points can be anything up to 100 (but not over). Correct?

So the program should be a bit "greater" now. :)
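The relaxed rule (the two CM scores of a battle may sum to anything up to 100, but not more) could be checked along these lines. `check_battle_points` is a hypothetical helper, and the rejection of negative points is an extra assumption that the thread does not mention.

```python
def check_battle_points(points1, points2, max_total=100):
    """Raise ValueError if a battle result breaks the relaxed sum rule."""
    if points1 < 0 or points2 < 0:
        # Assumption: negative CM points are treated as invalid input
        raise ValueError("CM points cannot be negative")
    if points1 + points2 > max_total:
        raise ValueError(
            f"CM points sum to {points1 + points2}, which is over {max_total}"
        )

check_battle_points(60, 30)   # sums to 90: accepted
check_battle_points(70, 30)   # sums to exactly 100: accepted
```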


  • 2 weeks later...

Ok, I made the third needed program. This creates tournament schedules so that players are compared against other players as evenly as possible (see above). The program takes two input files. Here are the contents of file scenarios.txt (the === is not part of the file)

==========================================

The_Aftermath

Humiliator

Bumblebee

Rabbit

GoGo

==========================================

and here are the contents of players.txt

==========================================

A

B

C

D

E

F

==========================================

Now run the program so that results are saved into file schedule.txt (this is run from a Unix prompt but the distributed program works under DOS)

==========================================

./nabla-tournament-schedule scenarios.txt players.txt schedule.txt

==========================================

Now the schedule has been saved in schedule.txt (this can be filled in with results and given directly as input to the final scoring program).

==========================================

# The_Aftermath

A B

C F

D E

# Humiliator

C A

B D

F E

# Bumblebee

A D

E C

F B

# Rabbit

E A

D F

C B

# GoGo

A F

B E

C D

==========================================

As you can see, A is compared twice against all players B-F. This is the optimal solution. Also, A plays both sides as evenly as possible.

The program can be downloaded here.

[ 10-12-2001: Message edited by: Nabla ]


Your mongo-EXE file size is probably related to using a Windows capable compiler to build your application.

Adding in a command line interface does not exclude all the "accessory" code included by default.

Just last week, I wrote a 93k command-line application in Visual C++, using just stdio and math.

You know, a GUI might be nice to have.

If you'll ship the code to HerrOberst@cox.rr.com, I can take a look at making a Win-Friendly version.

But, off camping this weekend, so no rush.

[edited cause I wasn't an English major...]

[ 10-12-2001: Message edited by: Herr Oberst ]


Very nice, Nabla!

I especially like how the scores can just be put into the schedule.txt file and then run through the scoring program. Very convenient.

I just realized that using your programs means I really don't even need a spreadsheet anymore. All I have to do is update the schedule.txt file with game results, run the file through the scoring program at the end and add whatever AAR points the player earned. No more spreadsheets!! Thanks!

Treeburst155 out.


Originally posted by Herr Oberst:

"Your mongo-EXE file size is probably related to using a Windows capable compiler to build your application.

Adding in a command line interface does not exclude all the "accessory" code included by default."

I'm using the GNU C++ compiler so I don't think that's the case. I think the reason is the fact that I'm using many classes from the standard template library. But I did strip the binaries so that now they are a bit less than half the size they used to be. You can find the new binaries here.

Originally posted by Herr Oberst:

"You know, a GUI might be nice to have.

If you'll ship the code to HerrOberst@cox.rr.com, I can take a look at making a Win-Friendly version."

I agree with that idea about a GUI version, but having the source code in several places gives me the creeps. So the first solution I would suggest is that you write a graphical wraparound program for the DOS-binary. Shouldn't be too hard. How about it?

[ 10-13-2001: Message edited by: Nabla ]


Hello everyone!

I've made the changes requested by Treeburst155 to the scoring program nabla-score-tournament.

1. The program now prints differences from medians and nabla scores for individual scenarios to standard error if the -d debug option has been given.

2. The final scores are sorted and printed with two decimals.

The new version can be downloaded from the usual place.


Thanks, Nabla! What's "standard error"?

EDIT:

OK, I get it now. The only problem I have is when I use "-d" with more than 22 players for a scenario (24 is common): the results scroll by and I can't read the top ones. I've tried inserting "/p" and "/w" in various places in the command line to no avail. Is there any way I can stop the scrolling until I'm ready to move down the list?

Treeburst155 out.

[ 10-20-2001: Message edited by: Treeburst155 ]


It seems that there is no way to redirect standard error under DOS. I changed the program so that debug information is printed to standard output instead. So now you can use either

command-string |more

or

command-string > debug-file.txt

In the first case the output is displayed in a pager screen by screen, in the second case output can be found in file debug-file.txt after running the program.

Get the new version here.


  • 3 weeks later...

Hello everyone!

People in the Nordic CM tournament noticed that the scheduling program does not produce completely balanced schedules. In particular, it was noted that for six players the schedule that is created by the program makes one of the players play the allied side four times. Also the comparisons to other players are not equal. Here's an example.

==========================================

# Scenario_1

PlayerA PlayerB

PlayerC PlayerF

PlayerD PlayerE

# Scenario_2

PlayerC PlayerA

PlayerB PlayerD

PlayerF PlayerE

# Scenario_3

PlayerA PlayerD

PlayerE PlayerC

PlayerF PlayerB

# Scenario_4

PlayerE PlayerA

PlayerD PlayerF

PlayerC PlayerB

# Scenario_5

PlayerA PlayerF

PlayerB PlayerE

PlayerC PlayerD

==========================================

Note that C plays as allied four times out of five. Also note that C is compared against E and F only once. So there is some unevenness in the results. The question that naturally comes up is whether the result is optimal in the sense we would like it to be.

First of all, let us define what we mean by optimal. We would like the schedule to fulfill the following criteria exactly.

#1. every player plays each scenario exactly once

#2. every player plays against every other player exactly once

In addition, we would like the schedule to minimize the following criteria (definition first, translation afterwards :) ). The criteria are optimized in this lexicographic order (a schedule which has a smaller #3 value is better even if its #4 value is larger).

#3. The absolute difference between maximum and minimum number of comparisons between different player pairs. The objective of this criterion is to have the players compared evenly against other players. (Using just the minimum number of comparisons might do here as well, but this is better.)

#4. The absolute difference between maximum and minimum number of games played on the allied side for different players. The objective of this criterion is to have the players play both sides evenly.

The example above fulfills #1 and #2. For #3 it scores a 2 (C is compared against B three times and against E just once), and for #4 the score is 2 (C plays allied four times, while D plays allied twice).
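The four criteria can be checked mechanically. The sketch below is an illustration, not the scheduling program: it assumes that two players are "compared" in a scenario when they play it on the same side, and that the first-listed player of each game is the Allied one. The function names are made up.

```python
from collections import Counter
from itertools import combinations

def meets_hard_criteria(schedule):
    """#1: everyone plays each scenario once. #2: every pair meets exactly once."""
    players = sorted({p for games in schedule for game in games for p in game})
    for games in schedule:
        if sorted(p for game in games for p in game) != players:
            return False
    opponents = Counter(tuple(sorted(game)) for games in schedule for game in games)
    return all(opponents[pair] == 1 for pair in combinations(players, 2))

def schedule_metrics(schedule):
    """Return (#3, #4) for a list of scenarios, each a list of (first, second) games."""
    players = sorted({p for games in schedule for game in games for p in game})
    comparisons = Counter({pair: 0 for pair in combinations(players, 2)})
    allied = Counter({p: 0 for p in players})
    for games in schedule:
        for side in (0, 1):
            # Players on the same side of a scenario are compared with each other
            same_side = sorted(game[side] for game in games)
            comparisons.update(combinations(same_side, 2))
        allied.update(game[0] for game in games)
    c3 = max(comparisons.values()) - min(comparisons.values())
    c4 = max(allied.values()) - min(allied.values())
    return c3, c4

six = [
    [("A", "B"), ("C", "F"), ("D", "E")],
    [("C", "A"), ("B", "D"), ("F", "E")],
    [("A", "D"), ("E", "C"), ("F", "B")],
    [("E", "A"), ("D", "F"), ("C", "B")],
    [("A", "F"), ("B", "E"), ("C", "D")],
]
print(meets_hard_criteria(six), schedule_metrics(six))
```

For the six-player example this gives a #3 of 2 and a #4 of 2, the same scores as quoted above.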

I got the algorithm for the original program from a mathematician who's also a bridge player. The algorithm is used in bridge, but he did not have a formal proof of its optimality w.r.t. #3 and #4. In fact, after thinking about the problem for a while we suspected that the algorithm is in fact not optimal. (In particular, we suspected that the algorithm is further from optimal if you have a lot of players.)

So I went on to write a program which tries to perform better. The improved portion of the program starts from the schedule created by the original program, and then performs a brute force optimization over all possible side changes to see how well it can do. For six players the result can be suboptimal, but for four players the result is guaranteed to be optimal. This is because for four players the pairings for each scenario are unique. If A plays B, then C has to play D, and you can examine all possible combinations just by changing sides. For six players the situation is more complicated. If A plays B, you still have to select whom C plays.

So let's start with the four player example for which the solution is optimal for sure. For four players the original program gives the following results.

==========================================

# Scenario_1

PlayerA PlayerB

PlayerC PlayerD

# Scenario_2

PlayerC PlayerA

PlayerB PlayerD

# Scenario_3

PlayerA PlayerD

PlayerB PlayerC

==========================================

The score for this schedule is #3=0, #4=2. The optimal solution is the same. Note that this means that all the criteria can not be achieved to the fullest when the number of players is four. One player - in this case D - plays all the time for the same side.

Now let's move on to the case when we have six players. The original solution had #3=2, #4=2. In this case the (possibly sub)optimal solution obtained by all possible side changes is again the same.
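The search over side changes can be sketched as a plain enumeration: keep the pairings fixed, try every way of swapping sides in each game, and keep the schedule with the lexicographically smallest (#3, #4). This is an illustrative reconstruction, not the actual program; it assumes players on the same side of a scenario are the ones compared and that the first-listed player is Allied. With g games there are 2^g combinations, which is why the running time explodes as players are added.

```python
from collections import Counter
from itertools import combinations, product

def schedule_metrics(schedule):
    """Return (#3, #4): spread of comparison counts, spread of Allied-side counts."""
    players = sorted({p for games in schedule for game in games for p in game})
    comparisons = Counter({pair: 0 for pair in combinations(players, 2)})
    allied = Counter({p: 0 for p in players})
    for games in schedule:
        for side in (0, 1):
            same_side = sorted(game[side] for game in games)
            comparisons.update(combinations(same_side, 2))
        allied.update(game[0] for game in games)
    return (max(comparisons.values()) - min(comparisons.values()),
            max(allied.values()) - min(allied.values()))

def best_side_assignment(schedule):
    """Try every combination of side swaps; pairings themselves are never changed."""
    sizes = [len(games) for games in schedule]
    flat = [game for games in schedule for game in games]
    best_score, best_schedule = None, None
    for flips in product((False, True), repeat=len(flat)):
        flipped = [(b, a) if flip else (a, b) for (a, b), flip in zip(flat, flips)]
        candidate, start = [], 0
        for n in sizes:  # regroup the flat game list into scenarios
            candidate.append(flipped[start:start + n])
            start += n
        score = schedule_metrics(candidate)
        # Tuples compare lexicographically: #3 first, then #4
        if best_score is None or score < best_score:
            best_score, best_schedule = score, candidate
    return best_score, best_schedule

four = [
    [("A", "B"), ("C", "D")],
    [("C", "A"), ("B", "D")],
    [("A", "D"), ("B", "C")],
]
score, schedule = best_side_assignment(four)
print(score)
```

For the four-player schedule the search covers only 64 combinations and ends with #3=0, #4=2, in line with the result stated above. The six-player schedule has 15 games, so the same function would have to examine 32768 combinations.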

So for now, at least when the number of players is smaller than 8, we'll stick with the original results. As noted above, the case of six players has not been proven in a bulletproof way. I'll try to work on it, perhaps together with tss who works at the same university, but in a lab which has all the tools - and education :) - to play around with logical and combinatorial problems.

I'll post the new program (which has the optimality switch and computes #3 and #4 for the schedules) once I compile it for DOS (probably tomorrow). I'll post a message here when you can download the program.

BTW, the running time of the brute force optimal version increases exponentially with the number of players. I'm running it now for eight players and it takes at least hours (the case of six players is computed in ten seconds). So it is infeasible to run it for a large number of players.

[ 11-06-2001: Message edited by: Nabla ]


I'm amazed at the effort you are putting into this! Thank you, and thank your mathematician friend too. I will go buy a 2.0 GHz CPU right now so I can use the program. LOL!!

Seriously, all tournaments I run from now on will use 6 players per section. Considering the speed at which players complete their games this is the optimal size IMO. I will also run a minimum of 4 sections to minimize the issue of multiple comparisons between the same players in a section.

This is a minor issue with 4 sections of 6 IMO.

Thanks for your hard work!

Treeburst155 out


We have proved on the blackboard that the six player schedule we have is indeed optimal. So the search for the better schedule is over (assuming that we made no mistakes :) ).

I'm also examining other possibilities for the scoring function. The flatness of the current function at large values when R is around 20 is somewhat disturbing. We have a good candidate. I'll get back to you on this once I have some pictures. ("You" probably being Treeburst155 since we seem to be the only ones reading this now. :D )
