Data Science in Sport

Moneyball and Data Science in Sport

In 2011, the movie Moneyball came out. It was the first time the term data science entered the public’s consciousness. At the time, Billy Beane was the general manager of the Oakland Athletics. Through the use of sabermetrics, the empirical analysis of baseball statistics, he was able to build a winning team despite a limited budget. Sabermetrics uses raw stats, in this example stolen bases or batting averages, to construct more detailed models for evaluating player performance.

Two lessons follow from this story. First, data science reached the public back in 2011, and more than 10 years later big data is starting to change the game and how we understand it. Second, sport coaches and managers like Billy Beane understood that simple math lets them estimate the profiles and success rates of their offensive players. By using statistics, they built a winning team on a limited budget. That is encouraging, and it serves as an example for other sports professionals of how science can help you achieve your goals and improve your decision-making.

Data Science simplified

Thankfully, this data-driven approach doesn’t always entail complex formulas. Former Houston Rockets general manager Daryl Morey and former coach Mike D’Antoni popularized a style of play that emphasized the three most efficient ways to score in basketball: free throws, layups, and three-point shots. They got there using simple math.

Picture 1: Data Science in baseball

When you hear the term data science, the first things that usually come to mind are complicated models or mathematical terms. But the example above shows that using simple math increases your chance of success. If you don’t use it at all, the numbers work against you.
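The simple math behind that shot selection can be sketched as expected points per possession: the value of a shot multiplied by the chance of making it. The snippet below is only an illustration. The function name `expected_points`, the shooting percentages, and the long two-point shot used for comparison are all assumptions made up for the example, not figures from the Rockets or from any study.

```python
def expected_points(points_per_make: float, make_probability: float, attempts: int = 1) -> float:
    """Expected points: attempts x value of a made shot x chance of making it."""
    if not 0.0 <= make_probability <= 1.0:
        raise ValueError("make_probability must be between 0 and 1")
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    return attempts * points_per_make * make_probability


# Hypothetical percentages, chosen only to illustrate the arithmetic.
# Each entry is (points per make, make probability, attempts).
shots = {
    "two_free_throws": (1, 0.75, 2),
    "layup": (2, 0.60, 1),
    "three_pointer": (3, 0.36, 1),
    "long_two_pointer": (2, 0.40, 1),  # comparison shot, not one of the three efficient options
}

values = {name: expected_points(*args) for name, args in shots.items()}

# Print the options from most to least valuable.
for name, value in sorted(values.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {value:.2f} expected points")
```

With these made-up numbers, the long two-point shot is worth the least per attempt, which is the kind of conclusion that simple math can support without any complex model.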

Dr. Tim Gabbett has published several papers with guidelines on how to train hard while still avoiding injury (1). Simple heuristics, like the 10% rule that limits the week-to-week increase in training load, can decrease the likelihood of injuries.
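As a minimal sketch, the 10% rule can be checked with a few lines of code. The function names, the load values, and the unit (arbitrary load units per week) are assumptions for illustration only. The 10% limit is the heuristic mentioned above.

```python
def max_next_week_load(current_week_load: float, max_increase: float = 0.10) -> float:
    """Highest load for next week that stays within the allowed increase."""
    return current_week_load * (1.0 + max_increase)


def flag_load_spikes(weekly_loads, max_increase: float = 0.10):
    """Return (week_index, relative_change) for every week that rose by more than the limit."""
    spikes = []
    for week in range(1, len(weekly_loads)):
        previous, current = weekly_loads[week - 1], weekly_loads[week]
        if previous <= 0:
            continue  # no meaningful percentage change from a zero-load week
        change = (current - previous) / previous
        if change > max_increase:
            spikes.append((week, change))
    return spikes


# Hypothetical weekly loads in arbitrary units.
weekly_loads = [2000, 2150, 2300, 2900, 3000]
spikes = flag_load_spikes(weekly_loads)

for week, change in spikes:
    print(f"Week {week}: load rose by {change:.0%}, above the 10% guideline")
print(f"Upper limit after a 3000-unit week: {max_next_week_load(3000):.0f}")
```

In this made-up series, only the jump from 2300 to 2900 units is flagged, because it is the only week-to-week increase above 10%.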

The 10% rule is not complicated, and yet it is a data-driven approach to planning training sessions. The more complex model that he proposes is the acute:chronic workload ratio. Here you need to track your players’ acute and chronic loads and make sense of them. Basically, the model compares the current acute load with the previous chronic load to show how much you have increased or decreased the training load that your players are adapted to.
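The comparison described above can be sketched as a single ratio. This is a simplified illustration under stated assumptions, not the exact method from the cited paper: the acute load is taken as the most recent week, the chronic load as the average of the four weeks before it, and the function name, window length, and load values are all hypothetical.

```python
from statistics import mean


def acute_chronic_ratio(weekly_loads, chronic_weeks: int = 4) -> float:
    """Latest week's load divided by the average load of the preceding weeks.

    Assumption: acute = most recent week, chronic = mean of the
    `chronic_weeks` weeks before it (the current week is excluded).
    """
    if chronic_weeks < 1:
        raise ValueError("chronic_weeks must be at least 1")
    if len(weekly_loads) < chronic_weeks + 1:
        raise ValueError("not enough weeks of data for the chosen chronic window")
    acute = weekly_loads[-1]
    chronic = mean(weekly_loads[-(chronic_weeks + 1):-1])
    if chronic <= 0:
        raise ValueError("chronic load must be positive")
    return acute / chronic


# Hypothetical weekly loads in arbitrary units; the last value is the current week.
weekly_loads = [2000, 2100, 2200, 2100, 2700]
ratio = acute_chronic_ratio(weekly_loads)
print(f"Acute:chronic workload ratio: {ratio:.2f}")
```

A ratio above 1 means the current week is heavier than what the players have recently been adapted to, and a ratio below 1 means it is lighter. How to interpret specific values is left to the cited literature.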

The acute-chronic workload ratio may be the best solution to the training-injury prevention paradox.


The aim of this article is not to go into more detail about this model (perhaps a future one will), but to show how simple data can help you create your own luck.

Injury prevention and Data Science

We live in an era where sport and technology are constantly evolving and pushing new limits. With that, the intensity of the game is increasing, and so is the number of sports-related injuries. A study of MLS players covering 2014 to 2019 concluded that the most commonly reported injuries were hamstring strains, ankle sprains, and adductor strains. Injury incidence during matches was 4.1 times greater than during training (2).

This is not the only study that confirms this finding. Injuries are one of the biggest problems in the sports industry. By getting injured, players miss training sessions and matches. Depending on the type and severity of the injury, clubs spend a lot of money on rehabilitation. That is why it is in the interest of the main stakeholders to have injury-free athletes and to reduce the severity and frequency of injuries. Data scientists and other professionals working in sports environments use technologies such as global positioning system (GPS) devices to track the external load of players, and they also monitor internal responses to understand how much players are loaded, all for the purpose of reducing injuries. Sport is moving in that direction, and we need to keep up with the progress.
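A comparison like "matches versus training" is usually expressed as injuries per 1000 hours of exposure, and the two rates are then divided. The sketch below shows that arithmetic. The function names and all counts are hypothetical and chosen for round numbers. They are not the data from the MLS study.

```python
def incidence_per_1000_hours(injuries: int, exposure_hours: float) -> float:
    """Number of injuries per 1000 hours of exposure."""
    if exposure_hours <= 0:
        raise ValueError("exposure_hours must be positive")
    return injuries * 1000 / exposure_hours


def rate_ratio(rate_a: float, rate_b: float) -> float:
    """How many times greater rate_a is than rate_b."""
    if rate_b <= 0:
        raise ValueError("rate_b must be positive")
    return rate_a / rate_b


# Hypothetical season totals for one squad (not from the cited study).
match_rate = incidence_per_1000_hours(injuries=30, exposure_hours=1500)
training_rate = incidence_per_1000_hours(injuries=40, exposure_hours=10000)
ratio = rate_ratio(match_rate, training_rate)

print(f"Match incidence:    {match_rate:.1f} injuries per 1000 h")
print(f"Training incidence: {training_rate:.1f} injuries per 1000 h")
print(f"Match incidence is {ratio:.1f} times the training incidence")
```

Normalizing by exposure matters because players spend far more hours training than playing, so raw injury counts alone would hide how much riskier match play is per hour.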

When Data Science backs up practice

Best practice is “a procedure that has been shown by research and experience to produce optimal results and that is established or proposed as a standard suitable for widespread adoption” (3).

In their research, Buchheit and colleagues (3) examined for the first time the association between the programming of the days after a match and injury rates, using retrospective data from 18 elite teams playing in top leagues, including the EPL, the Italian Serie A, the Bundesliga, and 3 more leagues, from January 2018 to December 2021.

This paper is a great example of how science supports best practices and how data can help your decision-making, as stated earlier in the text. In his book, R. Verheijen (4) provides a few principles for programming the week, and one of them is to start the training week with a recovery session followed by an off day (MD+1 recovery, MD+2 off day).

Picture 3: Weekly structure with MD+1 recovery and MD+2 off

Training at MD+1 and having a day off at MD+2 may offer several advantages on both the performance and the injury side of things. The research (3) also reports that this loading pattern, training on MD+1 and a day off on MD+2, was associated with a 2 to 3 times lower frequency of injuries in football. This is how science and data can support best practices.

Take home message

Sport is progressing, as are the technologies in it. In football, we can see the intensity of the game increasing while the time available for recovery decreases. What is worrying is that the number of injuries is also increasing. Injuries are not following the trend of technology development: logically, they should decrease as knowledge and technology grow, but that is not happening. Maybe we don’t have answers to all the questions yet. But we have to follow the progress and adapt to the game. In this text we have seen several simple examples of how sports science or simple mathematics can be used to your advantage. We should not be afraid of progress. We should embrace it and use it to our advantage.

References:

  1. Gabbett TJ. The training-injury prevention paradox: should athletes be training smarter and harder? Br J Sports Med. 2016 Mar.
  2. Forsythe B, Knapik DM, Crawford MD, Diaz CC, Hardin D, Gallucci J, Silvers-Granelli HJ, Mandelbaum BR, Lemak L, Putukian M, Giza E. Incidence of injury for professional soccer players in the United States: A 6-year prospective study of Major League Soccer. Orthopaedic Journal of Sports Medicine. n.d.
  3. Buchheit M. Planning the microcycle in elite football: to rest or not to rest? Martin Buchheit. 2022 Dec 5.
  4. Verheijen R. The Original Guide to Football Periodisation Part 1. World Football Academy. 2014.
  5. Let’s talk about weekly plans in soccer. 04.01.2023.
