The Cheating Commish

March 21, 2013

The time for NCAA bracket submission is upon us. As promised last year, we have one-upped ourselves and present to you another Datascope creation: a data-driven app that chooses the best possible bracket for your pool. We call it The Cheating Commish.

The Cheating Commish does exactly what it advertises: if you are the commissioner of your NCAA tournament pool, and you have few to no scruples, this tool allows you to enter your opponents’ brackets and scoring rules of your pool and will calculate the bracket that will maximize your likelihood of taking away your friends’ hard-earned money.

If you’re not the commissioner of your league, or if you’re just not really in a kill-or-be-killed, Rodney Ruxin kind of mood, we have a gentler option, too, which calculates the bracket with the greatest payoff likelihood for your pool. We model the competition based on the “market” of actual bracket picks from espn.com.

The whole process takes a little bit of time (up to ten minutes if you have a lot of brackets to enter), so if you are in a hurry, click now and come back to read about the method behind the madness.


How to use The Cheating Commish

Add a new bracket by clicking on the “+” at the top. Enter your opponents’ picks by clicking through the team names. Modify the points system and payout structure of your pool in the settings. If you don’t know anyone else’s brackets, you can still enter details about your pool (number of opponents; winnings for top finisher, top three, etc), and we’ll generate optimized picks for your pool structure based on mass-market picks.

When you’re done, just click “Submit all brackets.” We’ll tweet or email you when your best bracket is ready to view.


Work smart, not hard

Last year, we tried to create the ultimate predictive algorithm to foretell the winner of the NCAA tournament. Offensive ranking, defensive ranking, strength of schedule, home court advantage, boosts for win streaks -- like alchemists slaving away, ever tinkering, we produced a bracket-picker which was a burnished work of art... and which came up with pretty much the same picks as everyone else out there.

This year, in our more seasoned maturity, we realized that there is greater power not in joining the pack, but in observing it. Hundreds of dataheads just like us were using the same available statistics to make the same bracket picks, arguing over the eleventh decimal place like hyenas gnawing the last bits of meat from a carcass. If you want the sleekest, most sophisticated bracket based on statistical analysis of performance, we are sure you will be able to find it. The problem is, everyone else can find it, too. Eventually entire pools will be winning and losing the same points on the same games because their brackets were all based on essentially the same model, and everything will end in a boring, 12-way tie. You’ll all go home with the same $5 you put in the pot, and worst of all, no one will get to taunt anyone else.

The smarter approach comes from looking at the actual people you will be competing against and making your picks based on what they have picked. Maybe your pool includes three or four stats junkies, some superfans who can’t bring themselves to pick against their alma mater, and a handful of fair-weather followers who make their picks based on the cutest mascots and jersey colors. For any given pool, there exists some bracket which, in light of the other brackets in the pool and the underlying performance-based probabilities of tournament outcome, will maximize the chance of walking away with a piece of the money. The Cheating Commish is the app that finds that “golden” bracket for you.


How it works

The Cheating Commish works by exploiting the differences between picks that are the most popular and picks that are most likely to advance in the tournament. These gaps are defined within the context of a “market” -- your competition. Since winning your bracket pool depends on your performance vs. the peers in your market, victory depends not only on choosing the winners, but on picking right when your competition picks wrong.

So what are people picking? We can get an idea of what most brackets look like by using espn.com’s Who Picked Whom. And to determine which picks are undervalued, we need a trusted source of win probabilities, for which we turn to the demonstrated statistical prowess of Nate Silver. To illustrate how the Cheating Commish exploits differences between popular opinion and statistical likelihood of winning, take a look at Louisville and Florida. Louisville is a properly-valued pick. Nate Silver has them at a 23.8% chance of taking the tournament, and sure enough, 21.7% of espn.com brackets have Louisville in the top position. Now look at Florida. As of writing, only 2.9% of users have chosen Florida to win the tournament, but their odds of winning are much higher than that -- 13.8%. This discrepancy means that making a bet on Florida could help you to pull away from the pack.
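One way to put a number on that gap is to divide a team's win probability by its share of picks. The ratio below is our own illustrative measure, not necessarily what the app computes; the percentages are the ones quoted above.

```python
# Championship figures quoted in the text, in percent.
win_prob = {"Louisville": 23.8, "Florida": 13.8}    # Nate Silver's estimates
pick_share = {"Louisville": 21.7, "Florida": 2.9}   # share of espn.com brackets

def value_ratio(team):
    """Win probability divided by pick popularity (an illustrative measure).

    Close to 1 means the pick is fairly valued; well above 1 means the
    crowd is undervaluing the team.
    """
    return win_prob[team] / pick_share[team]

for team in win_prob:
    print(f"{team}: {value_ratio(team):.2f}")
# Louisville: 1.10
# Florida: 4.76
```

By this rough measure, Florida is worth more than four times what the crowd is paying for it.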

This, of course, is only valid when “the pack” is everyone making picks on espn.com. But we can do a better job by tailoring our picks to a more specific “market” -- to the brackets in your particular pool, the ones that your bracket will actually be competing against. Maybe your bracket pool isn’t undervaluing Florida, because they read Nate Silver just like we do. But due to certain sentimentalities (have a bunch of Michigan alums in your pool?), shared readership of the same sports blogs, or simple fluctuations of chaos, there are other gaps that can be exploited.

This is when the Cheating Commish really shines: anyone with access to the brackets of everyone else in his or her pool can enter those brackets, and a genetic algorithm will find the bracket that is the most likely to result in cash winnings for that specific pool. In addition, we take into consideration other factors like the total number of people in the pool and the specifics of who walks away with money. The optimal bracket is different if you need to take bigger risks and shoot for first place or nothing, or if you can be a little more conservative because the top three places all take money. (Side note: if you are in a pool where multiple finishers win money, but you would rather follow a first-place-or-nothing strategy, feel free to lie about the pool structure when you enter your brackets. The Cheating Commish will not know the difference, and it certainly has no room to judge you on moral grounds.)
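To see why the payout structure matters, here is a small sketch of how one set of final scores turns into very different winnings under two pool structures. The names, scores, and prize amounts are made up, and splitting prizes evenly among tied entrants is our assumption about how a pool might handle ties.

```python
def payouts(scores, prizes):
    """Hand out prizes by rank; tied entrants split the prizes they span.

    scores: dict of entrant name -> final points
    prizes: list of prize amounts for 1st place, 2nd place, and so on
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    result = {name: 0.0 for name in scores}
    i = 0
    while i < len(ranked):
        # Find the group of entrants tied with the one at rank i.
        j = i
        while j < len(ranked) and scores[ranked[j]] == scores[ranked[i]]:
            j += 1
        pot = sum(prizes[i:j])  # slicing past the end of the list gives 0
        for name in ranked[i:j]:
            result[name] = pot / (j - i)
        i = j
    return result

# Hypothetical four-person pool with a $40 pot.
scores = {"you": 98, "stats_junkie": 98, "superfan": 91, "mascot_picker": 77}
winner_take_all = payouts(scores, [40])
top_three = payouts(scores, [25, 10, 5])
print(winner_take_all)
print(top_three)
```

Under winner-take-all, third place is worth nothing, so a safe bracket that reliably finishes third is a losing strategy; with three paid places it earns something every time.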


The Genetic Algorithm

Since there are 63 games played in the NCAA tournament's 64-team bracket, there are a total of 2^63 (about 9.2 quintillion) different possible brackets. If simulating one bracket takes a tenth of a second, it would take roughly 29 billion years to run all possible brackets and find the one overall winner. In order to get a bracket selected by Thursday at noon, we have to take a more sophisticated approach.
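The size of the search space is easy to check for yourself. This quick calculation assumes two possible outcomes per game and the tenth of a second per bracket mentioned above.

```python
GAMES = 63
SECONDS_PER_BRACKET = 0.1
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

total_brackets = 2 ** GAMES  # two possible winners for each of 63 games
years = total_brackets * SECONDS_PER_BRACKET / SECONDS_PER_YEAR

print(f"{total_brackets:,} brackets")
print(f"{years:.2e} years")
# 9,223,372,036,854,775,808 brackets
# 2.92e+10 years
```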

We simulate the tournament 100 times, where each team’s chances of winning are based on Nate Silver’s predictions. For each set of tournament simulations, we calculate how each of your opponents fared based on their bracket picks and on the points system and payout structure of your pool. We also randomly generate 50 possible brackets for you, and calculate how they would fare.
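A tournament simulation along these lines might look like the sketch below. It is not the app's code: the team names and ratings are placeholders, and the `win_prob` formula is a stand-in for head-to-head probabilities derived from Nate Silver's predictions.

```python
import random

def simulate_tournament(teams, win_prob, rng):
    """Play one single-elimination tournament.

    teams: list of 64 teams in bracket order (neighbors play each other)
    win_prob(a, b): probability that team a beats team b
    Returns the winners of all 63 games, round by round.
    """
    winners = []
    field = list(teams)
    while len(field) > 1:
        next_round = []
        for a, b in zip(field[0::2], field[1::2]):
            winner = a if rng.random() < win_prob(a, b) else b
            next_round.append(winner)
            winners.append(winner)
        field = next_round
    return winners

# Placeholder ratings: Team00 is the strongest, Team63 the weakest.
ratings = {f"Team{i:02d}": 64 - i for i in range(64)}

def win_prob(a, b):
    # Hypothetical head-to-head model, used here only for illustration.
    return ratings[a] / (ratings[a] + ratings[b])

rng = random.Random(2013)
simulations = [simulate_tournament(list(ratings), win_prob, rng) for _ in range(100)]
print(len(simulations), "simulated tournaments of", len(simulations[0]), "games each")
```

Each simulated tournament can then be scored against every bracket in the pool to see who would have taken the money.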

This is where genetics comes in. Each of these 50 possible brackets is a sequence of 63 games, and each game has either “Team A” or “Team B” winning, so a bracket can be written as a string of A’s and B’s 63 letters long, sort of like a strand of DNA!
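Here is a sketch of how such a string could be turned back into actual picks. We assume "A" means the upper team in a matchup and "B" the lower one, with games listed round by round; the app's real encoding may differ, and the team names are placeholders.

```python
def decode(bracket, teams):
    """Turn a 63-letter string of A's and B's into a list of picked winners.

    bracket: string where letter k decides game k ("A" = upper team wins)
    teams:   list of 64 teams in bracket order
    """
    picks = []
    field = list(teams)
    pos = 0
    while len(field) > 1:
        next_round = []
        for upper, lower in zip(field[0::2], field[1::2]):
            next_round.append(upper if bracket[pos] == "A" else lower)
            pos += 1
        picks.extend(next_round)
        field = next_round
    return picks

teams = [f"Team{i:02d}" for i in range(64)]  # placeholder names
all_upper = decode("A" * 63, teams)
all_lower = decode("B" * 63, teams)
print(all_upper[-1], all_lower[-1])  # the two picked champions
# Team00 Team63
```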

We then calculate the “fitness” of your 50 brackets against your opponents for 100 tournament simulations, and the top 10% (i.e. the 5 brackets that made the most money) “survive” to the next round of the algorithm; the bottom 45 brackets become dinosaur food.
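The survival step can be sketched in a few lines. The fitness function here is a toy placeholder (it just counts "A" picks); in the real thing it would be the bracket's winnings across the 100 simulated tournaments.

```python
import random

def select_survivors(population, fitness, survival_rate=0.10):
    """Keep the top fraction of brackets, ranked by fitness (highest first)."""
    ranked = sorted(population, key=fitness, reverse=True)
    keep = max(1, round(len(ranked) * survival_rate))
    return ranked[:keep]

rng = random.Random(0)
population = ["".join(rng.choice("AB") for _ in range(63)) for _ in range(50)]

def toy_fitness(bracket):
    # Placeholder only: stands in for money won over the simulations.
    return bracket.count("A")

survivors = select_survivors(population, toy_fitness)
print(len(survivors), "of", len(population), "brackets survive")
# 5 of 50 brackets survive
```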

Next, we “mutate” most of the remaining brackets by randomly changing the outcome of a few games, and make some of them “have children.” Having children means first undergoing genetic recombination: taking two different brackets, switching their strings at a random point, and outputting a new string that has parts from each of its “parents.” We also introduce new brackets, again by randomly generating strings of length 63, as “immigration” to this population of brackets.
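On strings of A's and B's, these three operations are short to write down. The sketch below assumes three flipped games per mutation and a single crossover point; those numbers are our choices for illustration, not settings taken from the app.

```python
import random

GAMES = 63

def mutate(bracket, rng, flips=3):
    """Flip the outcome of a few randomly chosen games."""
    letters = list(bracket)
    for pos in rng.sample(range(GAMES), flips):
        letters[pos] = "B" if letters[pos] == "A" else "A"
    return "".join(letters)

def crossover(mom, dad, rng):
    """Cut both parents at one random point and splice the halves together."""
    cut = rng.randrange(1, GAMES)  # 1..62, so the child gets part of each parent
    return mom[:cut] + dad[cut:]

def immigrant(rng):
    """A brand-new random bracket."""
    return "".join(rng.choice("AB") for _ in range(GAMES))

rng = random.Random(42)
mom, dad = "A" * GAMES, "B" * GAMES
print(mutate(mom, rng))
print(crossover(mom, dad, rng))
print(immigrant(rng))
```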

The outcome of all this genetic stuff is that we have 50 brackets to try in the next set of tournament simulations: some of these brackets are the top performers from last time, some are “related” to those top performers, and some are totally new.

After 200 generations, the bracket you take to your pool has already emerged a victor several thousand times over. We can’t guarantee that the result will be the same in your pool, but you can submit your picks with confidence that they have been polished by the finest algorithms science has to offer.
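Putting the pieces together, the whole loop might look roughly like this. It is a sketch under loud assumptions: the split of each new generation into 5 survivors, 20 mutants, 20 children, and 5 immigrants is our guess, and the fitness function is a toy (agreement with a hidden target string) standing in for winnings over simulated tournaments.

```python
import random

GAMES, POP_SIZE, SURVIVORS, GENERATIONS = 63, 50, 5, 200
N_MUTANTS, N_CHILDREN = 20, 20  # assumed split; the other 5 slots are immigrants

def random_bracket(rng):
    return "".join(rng.choice("AB") for _ in range(GAMES))

def mutate(bracket, rng, flips=3):
    letters = list(bracket)
    for pos in rng.sample(range(GAMES), flips):
        letters[pos] = "B" if letters[pos] == "A" else "A"
    return "".join(letters)

def crossover(mom, dad, rng):
    cut = rng.randrange(1, GAMES)
    return mom[:cut] + dad[cut:]

def evolve(fitness, rng):
    """Run the genetic algorithm and return the best bracket found."""
    population = [random_bracket(rng) for _ in range(POP_SIZE)]
    for _generation in range(GENERATIONS):
        # Survival: keep the top brackets unchanged.
        elite = sorted(population, key=fitness, reverse=True)[:SURVIVORS]
        population = list(elite)
        # Mutation: tweak copies of survivors.
        for _m in range(N_MUTANTS):
            population.append(mutate(rng.choice(elite), rng))
        # Recombination: children of two different survivors.
        for _c in range(N_CHILDREN):
            mom, dad = rng.sample(elite, 2)
            population.append(crossover(mom, dad, rng))
        # Immigration: fill the rest with fresh random brackets.
        while len(population) < POP_SIZE:
            population.append(random_bracket(rng))
    return max(population, key=fitness)

rng = random.Random(2013)
target = random_bracket(rng)  # hidden "ideal" bracket for the toy fitness

def toy_fitness(bracket):
    # Placeholder only: counts games that agree with the hidden target.
    return sum(a == b for a, b in zip(bracket, target))

best = evolve(toy_fitness, rng)
print(toy_fitness(best), "of", GAMES, "games match the target")
```

With a real fitness function, each generation would be scored against a fresh set of simulated tournaments, so a bracket has to keep winning to keep surviving.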

As always, we welcome your feedback, and we’re always looking to improve. Please tell us what you think!

contributors to this post

Bo Peng
Laurie Skelly
Aaron Wolf
Mike Stringer
Dean Malmgren