“Who's winning?” at Cricket: Communicating Machine Learning Algorithms

4 Sept

The communication of statistics to the general public is a well-covered topic. Statistics are used in all parts of our lives, from news reports and politicians' speeches to many people’s day-to-day work. This has been particularly prominent in recent years, with case rates for COVID and economic measures during the recent cost of living crisis.

Generally, statisticians and journalists have learned how to communicate these things effectively. One example is a BBC News article about why a fall in inflation does not mean a fall in prices. Inflation is a complex statistical measure, and it is carefully explained both in that article and throughout the BBC’s coverage. These days, however, simpler forms of statistical analysis are sometimes being replaced by machine learning.

Over the last decade, the use of machine learning algorithms has exploded. I would loosely define them as allowing computers to “develop” their own rules from data, rather than following rules written out explicitly by a human.

These algorithms are trained on historical data, and can then produce a “decision” based on an input. Examples include face detection in photos, auto-completion of your emails, or your maps app working out the best route given your position and prevailing traffic conditions. Some of these algorithms are predictive: given the current situation, they estimate the likelihood of a future event. An obvious example of this is the weather forecast. Meteorologists use forecasting models, increasingly including machine learning, which take all available inputs and then predict that there is a 60% chance of it raining during the cricket game you are going to watch tonight.

In cricket, a recent innovation in the fan experience is WinViz, a win predictor algorithm developed by a company called CricViz. At any time during a game, it can tell you the likelihood of either team winning, in percentage terms. At the start of the game this will be close to 50:50, and if a team is particularly on top, they will have a higher percentage. Basically, it allows fans to “objectively” answer the question of “who is winning?”, which is less clear in cricket than in other sports. However, WinViz can be misunderstood and disliked. Part of the problem is that people think WinViz was “wrong” if it predicts that a team is going to win and they then lose. This results in people not trusting it in future. Correctly understood, a team winning when they only had a 20% chance of winning the match does not mean that the algorithm was wrong, just that you witnessed a remarkable and unlikely victory!
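One way to make this concrete is a calibration check: rather than judging a predictor on a single match, group many predictions by the percentage given and see how often those teams actually won. The sketch below is purely illustrative. It uses simulated predictions and results, not real WinViz data, and `calibration_table` is a hypothetical helper written for this example, not part of any CricViz tool.

```python
import random


def calibration_table(predictions, outcomes, n_buckets=5):
    """Group predictions into equal-width buckets and compare the average
    predicted win probability with the observed win rate in each bucket."""
    buckets = [[] for _ in range(n_buckets)]
    for p, won in zip(predictions, outcomes):
        # A prediction of exactly 1.0 is placed in the top bucket.
        index = min(int(p * n_buckets), n_buckets - 1)
        buckets[index].append((p, won))

    table = []
    for rows in buckets:
        if not rows:
            continue  # skip buckets with no predictions
        mean_predicted = sum(p for p, _ in rows) / len(rows)
        observed = sum(won for _, won in rows) / len(rows)
        table.append((mean_predicted, observed, len(rows)))
    return table


# Simulated data (an assumption for illustration): each "match" gets a random
# predicted win probability, and the team then wins with exactly that
# probability, i.e. the imaginary predictor is perfectly calibrated.
rng = random.Random(42)
predictions = [rng.random() for _ in range(20000)]
outcomes = [rng.random() < p for p in predictions]

table = calibration_table(predictions, outcomes)
for mean_predicted, observed, count in table:
    print(f"predicted {mean_predicted:.0%}  observed {observed:.0%}  ({count} matches)")
```

In this simulation, teams in the lowest bucket still win a noticeable share of their matches, roughly in line with the percentages they were given. Those upsets are what a well-calibrated predictor should produce, not evidence that it got things wrong.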

Is it a fan’s fault for misunderstanding what the predictive algorithm is saying? Or should this be something that is communicated more clearly? Is a percentage the best way to communicate likelihood to people who may not have a background in statistics? Perhaps, at the end of a game, WinViz should emphasize and celebrate when it predicted “wrongly”, because that means you have seen a more remarkable cricket match. There are few things more important than cricket [editor’s note: this view is held by Martin Shine and is not representative of Butterfly Data’s perspective of global importance], but this idea of communicating machine learning outputs clearly may be important elsewhere. These algorithms are already used extensively in many sectors, and I suspect we will see the outputs of these algorithms communicated to the public more often as they are used more in sectors such as healthcare and policing.

As this happens, great care will need to be taken to ensure not just that the outputs of these algorithms are correct, but also that they are understood by the people they affect.

Data Analysis

Martin Shine
