And it's that time of year again...

March 15, 2012

Welcome back to March Madness, a time filled with trash-talking emails, unfathomable upsets, and way, way, way, way too much Jay Bilas.

Just like everyone else in this country, I’ve been picking the Final Four since puberty. Over the past 20+ years I’ve tried and tested many different methods for picking the best bracket possible (gut picks, Vegas odds, analyst predictions, etc.), but despite my best efforts, I’ve picked all four Final Four teams only once. I was 14. I won’t say I got lucky, but I’ll explain my in-depth analysis:

  1. Oklahoma State: Simple, I had a man crush on Big Country and I’ve modeled my dismal hoop game after him ever since
  2. North Carolina: Sheed and Stackhouse on the same squad, simply unstoppable
  3. UCLA: Tyus Edney (sorry Missouri fans) and the O’Bannon brothers were heavy favorites going into the tournament
  4. Arkansas: You can call me a softie but my girlfriend at the time was from Arkansas

A 5% hit rate in picking the Final Four is a little disheartening. I knew there had to be a better way, and considering my new employment with Datascope, this was the year to completely revamp my bracket selection process and implement a more robust, data-driven method. Enough of the intro. Let’s get down to the nuts and bolts as I present you with... DataBall v1.0

SRS method

The first version of DataBall is an extension of the Simple Rating System (SRS) described by basketball-reference.com. The basic premise is that each team is rated on its average margin of victory, adjusted for its strength of schedule. The nice thing about SRS is that it is pretty damn simple to implement (aptly named, to be sure), which is important when you always wait until the week of the tournament to throw something together. We here at Datascope took that analysis a step further and separately calculated each team’s offensive rating and defensive rating by focusing on average points scored and average points allowed, adjusting each for the strength of its opponents.

For example, the Michigan Wolverines’ average points scored over the season was 0.9 points below the national average. Now we need to take into consideration that Michigan plays in the Big Ten, whose teams play excellent defense. After adjusting for the average defensive rating of Michigan’s opponents, Michigan’s offensive rating jumps to 4.8 points. In other words, Michigan is expected to score 4.8 more points than the national average for a given team, because it scored its points against opponents, most of whom play excellent defense.

Off rating = avg points scored - national avg points scored + avg opp def rating

Similarly, for the defensive rating, Michigan allowed 5.4 fewer points than the national average this season. After adjusting for the strength of Michigan’s opponents, we find that its adjusted defensive rating is 10.4 points better than the national average. In other words, Michigan is expected to allow 10.4 fewer points than the national average for a given team, because Michigan’s typical opponent played excellent offense.

Def rating = national avg points allowed - avg points allowed + avg opp off rating
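Since each team’s rating depends on its opponents’ ratings (and vice versa), the two formulas have to be solved together. Here is a minimal Python sketch of one way to do that, by starting every rating at zero and re-applying both formulas until the numbers settle. This is an illustration, not the actual DataBall code: the function name `databall_ratings`, the `games` tuple layout, the fixed iteration count, and the three-team schedule are all assumptions made up for the example, and it leaves out the home court, victory bonus, and margin cap adjustments.

```python
from collections import defaultdict


def databall_ratings(games, n_iter=100):
    """Iteratively estimate offensive and defensive ratings.

    games: list of (team_a, points_a, team_b, points_b) tuples.
    Returns (off, dfn): two dicts mapping team -> rating, in points.
    """
    # For each team, record (opponent, points scored, points allowed) per game.
    log = defaultdict(list)
    for team_a, pts_a, team_b, pts_b in games:
        log[team_a].append((team_b, pts_a, pts_b))
        log[team_b].append((team_a, pts_b, pts_a))

    # National average points per team per game. Every point scored is also
    # a point allowed, so the same number serves both formulas.
    national_avg = sum(pa + pb for _, pa, _, pb in games) / (2 * len(games))

    # Start every rating at zero, then re-apply both formulas repeatedly.
    off = {team: 0.0 for team in log}
    dfn = {team: 0.0 for team in log}
    for _ in range(n_iter):
        new_off, new_dfn = {}, {}
        for team, rows in log.items():
            n = len(rows)
            avg_scored = sum(scored for _, scored, _ in rows) / n
            avg_allowed = sum(allowed for _, _, allowed in rows) / n
            avg_opp_def = sum(dfn[opp] for opp, _, _ in rows) / n
            avg_opp_off = sum(off[opp] for opp, _, _ in rows) / n
            new_off[team] = avg_scored - national_avg + avg_opp_def
            new_dfn[team] = national_avg - avg_allowed + avg_opp_off
        off, dfn = new_off, new_dfn
    return off, dfn


# Made-up scores for three hypothetical teams, not real results.
games = [("A", 70, "B", 60), ("B", 65, "C", 55), ("C", 60, "A", 80)]
off, dfn = databall_ratings(games)
for team in sorted(off):
    print(team, round(off[team], 2), round(dfn[team], 2))
```

In this toy schedule, team A ends up with an offensive rating of about +10 and a defensive rating of about 0. A fixed number of passes is the lazy option here; on a real schedule you would probably want to check that the ratings have actually stopped changing.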

We also took into account some other factors for DataBall v1.0, such as adjustments for home court advantage, bonus points for a victory, and a limit on margin of victory. I won’t bore you to tears with the details, but all three factors played a role in determining our final results.

Results

When I started this project, I expected to revolutionize the NCAA tournament seeding process for all mankind. Ironically, my results hardly differ from those of other notable NCAA models, like the ones from Jeff Sagarin, KenPom, and Nate Silver, and the Markov chain method from the guys at Georgia Tech. All of these different methods lead to roughly the same answer in terms of who to pick for the NCAA tournament. This is a little eye-opening and raises the real question: how risky should my picks be to win my NCAA bracket? (Mike, I owe you a beer. You were right.)

In the spirit of full disclosure, here are the official DataBall v1.0 picks based on the DataBall rankings: [image: DataBall v1.0 bracket picks]

No obvious Cinderellas

Everyone wants to pick the upset, and this year it looks like South Florida is in the best position to make a big move. DataBall considers them underrated by two seeds, and they are playing the most overrated team in the bracket in Temple. Since you are obligated to pick at least one 12 seed over a 5 seed in your bracket (historically, 12 seeds beat 5 seeds in about 1 of every 3 games), go ahead and pencil this one in.
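For a rough sense of why at least one 12-over-5 pick is nearly mandatory, here is a back-of-the-envelope Python sketch. It assumes the four 5 vs. 12 games in the bracket are independent and that each one has the same historical 1-in-3 upset chance, which is a simplification (the function name is made up for the example).

```python
def prob_at_least_one_upset(p_upset=1 / 3, n_games=4):
    """Chance that at least one of n_games independent games is an upset."""
    return 1 - (1 - p_upset) ** n_games


print(round(prob_at_least_one_upset(), 3))  # 0.802
```

Under those assumptions, roughly four out of five tournaments would see at least one 12 seed knock off a 5 seed.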

In the next round, South Florida will face off against a much tougher Michigan team. Expect the Wolverines to fast-forward the clock to midnight and smash South Florida’s Cinderella slippers.

Some additional interesting match-ups that could go either way are the following:

  • 11 NC State over 6 San Diego State, 0.2 difference in rating
  • 10 Purdue over 7 St. Mary’s, 0.2 difference in rating
  • 11 Texas over 6 Cincinnati, 0.6 difference in rating
  • 11 Colorado State over 6 Murray State, 1.1 difference in rating, ensuring Colorado State’s status as the best basketball team in the state

Next steps

As mentioned before, a better question is how to win your bracket given the scoring system, number of players, and payout. Devising a strategy to maximize the chances of winning will be the focus of our next version. We might also dabble a little with Markov chains to see if the results are significantly different from version one.

In the words of Bas Rutten, "God speed and party on!"

contributors to this post

Aaron Wolf, Brian Lange, Dean Malmgren, Mike Stringer, Suet Yi Lee