Thursday, 20 March 2014

CricAnalytics: Analytics in Cricket



I don't usually watch cricket, but lately I got a chance to watch a match after a long time. It was surprising to see how much analytics now features in the coverage. With low-cost big data and analytics technologies now easy to access, this kind of analysis is finding its way into many real-time applications such as live cricket broadcasts.

While watching the match I noted down some of the insights that were flashed on the screen. Some of them are listed below.
  1. Whenever Hafeez scores 30+ and takes 2 wickets Pakistan wins
  2. Chances of winning and WASP
  3. Other reporting like Balls per dismissal vs off spin and India average
It is really great to see the effect analytics is having on cricket. Let us try to understand the above use cases.

1) Whenever Hafeez scores 30+ and takes 2 wickets Pakistan wins

Let us try to understand the above statement by dividing it into parts: “If (Hafeez scores 30+) and (Hafeez takes 2 wickets) then (Pakistan wins)”. Insights of this kind can be found by using Bayesian learning to search for the conditions under which (Pakistan wins) holds.

An algorithm to find insights of this type would start with all the matches that Pakistan won, which makes the latter part of the statement true. For those matches it would take the scores of all players along with one other parameter, such as the number of wickets taken. Once this list is generated, we can get each player's minimum score and minimum number of wickets taken. We would then filter out the players whose minimum score or minimum number of wickets is not significant, which would leave only a few players. The remaining thresholds can then be rounded to whole numbers (for example, 30 runs and 2 wickets) to increase the level of confidence. A rule found this way still has to be checked against the matches Pakistan lost, because "whenever" only holds if the condition never occurs in a defeat.
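As a rough illustration of the search described above, here is a minimal Python sketch. It does not use Bayesian learning; it is a plain exhaustive check of candidate thresholds, and it is not the method the broadcasters actually use. The `Performance` record, the function names and every number in `SAMPLE` are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Performance:
    """One player's figures in one match (hypothetical record layout)."""
    match_id: int
    player: str
    runs: int
    wickets: int
    team_won: bool


def rule_stats(performances, player, min_runs, min_wickets):
    """Return (support, wins): matches where the condition held, and wins among them."""
    hits = [
        p for p in performances
        if p.player == player and p.runs >= min_runs and p.wickets >= min_wickets
    ]
    wins = sum(1 for p in hits if p.team_won)
    return len(hits), wins


def find_always_win_rules(performances, run_thresholds, wicket_thresholds, min_support=2):
    """List (player, min_runs, min_wickets, support) rules with no losing match."""
    rules = []
    for player in sorted({p.player for p in performances}):
        for min_runs in run_thresholds:
            for min_wickets in wicket_thresholds:
                support, wins = rule_stats(performances, player, min_runs, min_wickets)
                # Keep the rule only if it fired often enough and never in a defeat.
                if support >= min_support and wins == support:
                    rules.append((player, min_runs, min_wickets, support))
    return rules


# Invented figures, for illustration only.
SAMPLE = [
    Performance(1, "Hafeez", 45, 2, True),
    Performance(2, "Hafeez", 31, 3, True),
    Performance(3, "Hafeez", 50, 0, False),
    Performance(4, "Hafeez", 10, 2, False),
    Performance(1, "Afridi", 35, 2, True),
    Performance(3, "Afridi", 40, 2, False),
]

rules = find_always_win_rules(SAMPLE, run_thresholds=[30], wicket_thresholds=[2])
print(rules)
```

The `min_support` parameter is there because a rule that fired in only one match proves very little; with real data it would need to be much higher than 2.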

2) WASP: Winning and score predictor and Chances of winning

Wikipedia defines WASP (Winning and Score Predictor) as “a calculation tool to predict scores and possible results of a cricket match.”

It gives the batting team's probability of winning. It is a first-of-its-kind tool that provides this probability in real time as the match progresses.

Dr Seamus Hogan (Co-creator, WASP) describes WASP as follows:

“Let V(b,w) be the expected additional runs for the rest of the innings when b (legitimate) balls have been bowled and w wickets have been lost, and let r(b,w) and p(b,w) be, respectively, the estimated expected runs and the probability of a wicket on the next ball in that situation, we can then write,

V(b,w) = r(b,w) + p(b,w) V(b+1,w+1) + (1-p(b,w)) V(b+1,w)”

The model is solved by dynamic programming, so we need not calculate V separately for every possibility. V(b+1,w) and V(b+1,w+1) are calculated by using linear regression on data points. The second-innings model is a bit more complicated but uses essentially the same logic. One drawback is that the model cannot handle scenarios such as a retired batsman returning to the crease.
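To make the recursion concrete, here is a minimal Python sketch that fills in V(b,w) backwards from the end of the innings. The functions `expected_runs` and `wicket_probability` are made-up stand-ins for r(b,w) and p(b,w), not WASP's fitted estimates, and the boundary conditions (V is zero once all 300 balls are bowled or all 10 wickets have fallen) are assumptions of this sketch.

```python
TOTAL_BALLS = 300   # a 50-over innings
MAX_WICKETS = 10


def expected_runs(b, w):
    """Hypothetical r(b, w): scoring speeds up late and slows as wickets fall."""
    return 0.7 + 0.6 * b / TOTAL_BALLS - 0.03 * w


def wicket_probability(b, w):
    """Hypothetical p(b, w): the risk of a wicket rises as the innings goes on."""
    return 0.02 + 0.03 * b / TOTAL_BALLS


def solve_v(total_balls=TOTAL_BALLS, max_wickets=MAX_WICKETS,
            r=expected_runs, p=wicket_probability):
    """Return a table V[b][w] of expected additional runs, built backwards."""
    # Assumed boundary: no further runs once the balls or wickets run out.
    V = [[0.0] * (max_wickets + 1) for _ in range(total_balls + 1)]
    for b in range(total_balls - 1, -1, -1):
        for w in range(max_wickets - 1, -1, -1):
            pw = p(b, w)
            # V(b,w) = r(b,w) + p(b,w) V(b+1,w+1) + (1-p(b,w)) V(b+1,w)
            V[b][w] = r(b, w) + pw * V[b + 1][w + 1] + (1 - pw) * V[b + 1][w]
    return V


V = solve_v()
print(round(V[0][0], 1))    # expected first-innings total under the toy r and p
print(round(V[240][7], 1))  # expected runs still to come after 40 overs, 7 down
```

Each entry is computed once from the two entries one ball later, so the whole table takes about 300 x 10 steps rather than enumerating every possible way the innings could unfold.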

3) Other reporting

Analytical reporting is the difference between what commentators of the past could get during a match and what modern commentators get. Earlier, commentators could ask the back-office team to show a snippet from a previous match, such as “Show the ball on which Yuvraj got out in the match last week”. On such a request, a back-office member would manually search the archive for the tape, find the exact spot, connect it to the tape of the current snippet on the roll, and play them together.

Now, with big data technologies in place, they can ask for insights like balls per dismissal vs off spin, average, India average (average against India), etc. The data professional in the back office can then immediately run the query and put the insight on screen.
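The queries behind such figures are simple aggregations. Here is a minimal Python sketch over a hypothetical ball-by-ball record; the `Ball` layout, the function names and the sample deliveries are all invented, and extras such as wides and no-balls are ignored for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ball:
    """One delivery faced by a batsman (hypothetical record layout)."""
    batsman: str
    opposition: str
    bowling_style: str
    runs: int
    dismissed: bool


def balls_per_dismissal(balls, batsman, bowling_style):
    """Balls faced against a bowling style divided by dismissals to it."""
    faced = [x for x in balls
             if x.batsman == batsman and x.bowling_style == bowling_style]
    dismissals = sum(1 for x in faced if x.dismissed)
    return len(faced) / dismissals if dismissals else None


def batting_average(balls, batsman, opposition=None):
    """Runs scored divided by dismissals, optionally against one opposition."""
    faced = [x for x in balls
             if x.batsman == batsman
             and (opposition is None or x.opposition == opposition)]
    dismissals = sum(1 for x in faced if x.dismissed)
    runs = sum(x.runs for x in faced)
    return runs / dismissals if dismissals else None


# Invented deliveries, for illustration only.
SAMPLE_BALLS = [
    Ball("Hafeez", "India", "off spin", 1, False),
    Ball("Hafeez", "India", "off spin", 4, False),
    Ball("Hafeez", "India", "off spin", 0, True),
    Ball("Hafeez", "India", "pace", 6, False),
    Ball("Hafeez", "Sri Lanka", "off spin", 2, False),
    Ball("Hafeez", "Sri Lanka", "off spin", 0, True),
    Ball("Hafeez", "Sri Lanka", "pace", 3, False),
]

print(balls_per_dismissal(SAMPLE_BALLS, "Hafeez", "off spin"))  # 2.5
print(batting_average(SAMPLE_BALLS, "Hafeez", "India"))         # 11.0
print(batting_average(SAMPLE_BALLS, "Hafeez"))                  # 8.0
```

Both functions return `None` when the batsman has never been dismissed in the selected balls, since the ratio is undefined in that case.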

Concluding remarks

It is sad to see qualified data scientists providing analytical insights for cricket instead of working on topics of social interest, such as improving the lives of the poor. Still, it is good that some good work is happening in the field of data science. We can hope that in the near future data science will evolve as a discipline and bring more data scientists to work on matters of social interest as well.

3 comments:

  1. This is a fun game of numbers, nothing else, Sneh. I have been hearing a lot about these numbers nowadays, but they don't decide fate. Effort and hard work decide the winner.

    For example, PAK never lose matches on Friday, and Ind never lost to PAK in the WC.
    Now both these conditions applied at the same time last Friday, Ind vs PAK,
    so we can't go this way. The best team wins.

    1. Thanks, Miraj, for your insights, but currently I am just addressing HOW this is done rather than WHY and WHY NOT, as that would lead to a more philosophical debate than a technical one.

  2. Testimonial: Indians understand cricket more than anything else. So to explain the data analytics tools and power one can take cricket as an example. -Prof V Umamaheshwar (Prof at IUCCA, Pune and Ex VP at Patni Computer Systems)
