I don't usually watch cricket, but lately I got a chance to watch a match after a long time. It
was surprising to see the effect of analytics on cricket. With the advent of low-cost
big data and analytics technologies and easy access to them, such analytics are now making their way into many real-time applications, such as live cricket coverage.
While watching the match I noted down some of the insights that were
being flashed on the screen. Some of them are as below.
- Whenever Hafeez scores 30+ and takes 2 wickets, Pakistan wins
- Chances of winning and WASP
- Other reporting like Balls per dismissal vs off spin and India average
It is really great to see analytics applied to cricket. Let us
try to understand the use cases above.
1) Whenever Hafeez scores 30+ and takes 2 wickets, Pakistan wins
Let us try to understand the above statement by dividing it into parts: “If (Hafeez scores 30+)
and (Hafeez takes 2 wickets) then (Pakistan wins)”. Insights like this can be
found by using Bayesian learning to search for the conditions under which (Pakistan
wins) holds.
An algorithm to find insights of this type would take all the matches that Pakistan won, which makes the latter part of the statement true, and then take the scores of all players along with one other parameter, such as the number of wickets taken. Once this list is generated, we can get the minimum score and the minimum number of wickets taken. Then we would filter out the players whose scores and wicket counts are not significant. This would leave only a few players. The resulting figures can then be normalized into round whole numbers (such as 30 runs and 2 wickets) to increase the level of confidence.
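One way to picture such a search is the sketch below. It is not Bayesian learning and not necessarily what the broadcaster's analysts run: it is a plain exhaustive search over thresholds for a single player, written under the assumption that each match is reduced to a (runs, wickets, team won) record. The `Performance` class, the function names, the `min_support` parameter, and the sample figures are all made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Performance:
    """One player's figures in one match (hypothetical record layout)."""
    runs: int
    wickets: int
    team_won: bool


def rule_stats(matches: List[Performance], min_runs: int, min_wickets: int) -> Tuple[int, int]:
    """Return (matches covered by the rule, wins among those matches)."""
    covered = [m for m in matches if m.runs >= min_runs and m.wickets >= min_wickets]
    return len(covered), sum(1 for m in covered if m.team_won)


def find_rule(matches: List[Performance], min_support: int = 3) -> Optional[Tuple[int, int, int]]:
    """Search for 'runs >= R and wickets >= W implies win' with no counterexample.

    Returns (R, W, support) for the rule covering the most matches, or None
    if no rule is backed by at least min_support matches.
    """
    best = None
    for min_runs in sorted({m.runs for m in matches}):
        for min_wickets in sorted({m.wickets for m in matches}):
            support, wins = rule_stats(matches, min_runs, min_wickets)
            if support >= min_support and wins == support:
                if best is None or support > best[2]:
                    best = (min_runs, min_wickets, support)
    return best


# Invented sample data, not real match figures.
matches = [
    Performance(34, 2, True),
    Performance(51, 3, True),
    Performance(42, 2, True),
    Performance(12, 2, False),
    Performance(60, 0, False),
    Performance(45, 1, False),
    Performance(8, 0, True),
]

rule = find_rule(matches)
print("raw rule (runs, wickets, support):", rule)

# A rounded threshold such as "30+" has to be re-checked against the data,
# because lowering a threshold can let counterexamples in.
print("rounded rule 30+/2 (support, wins):", rule_stats(matches, 30, 2))
```

With so few matches, a rule like this can easily be a coincidence, which is why the sketch insists on a minimum number of supporting matches before reporting anything.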
2) WASP (Winning and Score Predictor) and chances of winning
Wikipedia defines WASP (Winning and Score Predictor) as “a calculation tool to predict scores and possible results of a cricket
match.”
It gives the probability of the batting team winning. It is a
first-of-its-kind tool that provides real-time insight into the
probability of winning.
Dr Seamus Hogan (Co-creator, WASP) describes WASP as follows:
“Let V(b,w) be the expected additional runs for the rest of the innings
when b (legitimate) balls have been bowled and w wickets have been lost, and
let r(b,w) and p(b,w) be, respectively, the estimated expected runs and the
probability of a wicket on the next ball in that situation, we can then write,
V(b,w) = r(b,w) + p(b,w) V(b+1,w+1) + (1-p(b,w)) V(b+1,w)”
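This model is solved using dynamic programming: V is zero once the innings is over,
so the values can be filled in backwards from the end of the innings, and each V(b,w)
is calculated only once instead of being recomputed for every possible sequence of balls.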
V(b+1,w) and V(b+1,w+1) are calculated by using linear regression on data
points. The second innings model is a bit more complicated, but uses
essentially the same logic. One drawback is that this model cannot handle scenarios
such as a retired batsman returning to bat.
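The recursion quoted above can be sketched in a few lines of Python. This is a minimal sketch and not the actual WASP implementation: the function name is invented, and the `r` and `p` callables stand in for estimates that would really come from historical match data. It assumes V is zero once all balls have been bowled or all wickets have fallen, and fills the table backwards from there.

```python
from typing import Callable, List


def expected_additional_runs(
    max_balls: int,
    max_wickets: int,
    r: Callable[[int, int], float],
    p: Callable[[int, int], float],
) -> List[List[float]]:
    """Fill V[b][w] by backward induction.

    r(b, w): expected runs on the next ball (placeholder for a fitted estimate).
    p(b, w): probability of a wicket on the next ball (placeholder as well).
    """
    # Boundary assumption: no more runs once the innings is over, so every
    # entry with b == max_balls or w == max_wickets stays at 0.0.
    V = [[0.0] * (max_wickets + 1) for _ in range(max_balls + 1)]
    for b in range(max_balls - 1, -1, -1):
        for w in range(max_wickets - 1, -1, -1):
            pw = p(b, w)
            V[b][w] = r(b, w) + pw * V[b + 1][w + 1] + (1 - pw) * V[b + 1][w]
    return V


# Toy inputs, invented for illustration: 0.8 runs per ball and a 3% chance
# of a wicket on every ball, regardless of the match situation.
V = expected_additional_runs(
    max_balls=300,
    max_wickets=10,
    r=lambda b, w: 0.8,
    p=lambda b, w: 0.03,
)
print("projected total at the start of the innings:", round(V[0][0], 1))
print("projected additional runs after 240 balls, 6 down:", round(V[240][6], 1))
```

Each of the 300 x 10 states is visited exactly once, which is what makes the backward pass cheap compared with enumerating every possible sequence of balls.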
3) Other reporting
Analytical reporting marks the difference between what commentators of the past could get during a match and what modern commentators get. Earlier, commentators could ask the back office team to show a snippet from a previous match, like “Show the ball on which Yuvraj got out in the match last week”. On such a request, a back office member would manually search the records for the exact tape, connect it to the tape of the current snippet on the roll, and play them together.
But now, with big data technologies in place, they can ask for insights
like balls per dismissal vs off spin, average, India average (average
against India), etc. The data professional in the back office can then immediately run
the query and put the insight on screen.
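As a rough idea of what such a query computes, here is a small sketch over ball-by-ball records. The `Ball` record layout, the function names, and the sample rows are all hypothetical, and details such as wides and no-balls are ignored. In a real back office the same aggregation would more likely be a query against a database than a loop in Python.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Ball:
    """One delivery faced by a batsman (hypothetical record layout)."""
    batsman: str
    bowling_style: str  # e.g. "off spin" or "pace"
    opposition: str
    runs: int
    dismissed: bool


def balls_per_dismissal(balls: Sequence[Ball], batsman: str, bowling_style: str) -> Optional[float]:
    """Balls faced against one bowling style divided by dismissals to that style."""
    faced = [x for x in balls if x.batsman == batsman and x.bowling_style == bowling_style]
    dismissals = sum(1 for x in faced if x.dismissed)
    return len(faced) / dismissals if dismissals else None


def average_against(balls: Sequence[Ball], batsman: str, opposition: str) -> Optional[float]:
    """Runs scored against one opposition divided by dismissals against it."""
    faced = [x for x in balls if x.batsman == batsman and x.opposition == opposition]
    dismissals = sum(1 for x in faced if x.dismissed)
    return sum(x.runs for x in faced) / dismissals if dismissals else None


# Invented sample rows, not real match data.
sample = [
    Ball("Batsman A", "off spin", "India", 1, False),
    Ball("Batsman A", "off spin", "India", 4, False),
    Ball("Batsman A", "off spin", "India", 0, True),
    Ball("Batsman A", "pace", "India", 6, False),
    Ball("Batsman A", "off spin", "Australia", 2, False),
    Ball("Batsman A", "off spin", "Australia", 0, True),
]

print("balls per dismissal vs off spin:", balls_per_dismissal(sample, "Batsman A", "off spin"))
print("average against India:", average_against(sample, "Batsman A", "India"))
```

Both functions return `None` when the batsman has never been dismissed in the selected deliveries, since the ratio is undefined in that case.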
Concluding remarks
It is sad to see qualified data scientists providing analytical insights for cricket instead of working on topics of social interest, like improving the lives of the poor. Still, it is good that some solid work is happening in the field of data science. We can hope that in the near future data science will evolve as a discipline and bring more data scientists to work on matters of social interest as well.
