Overview
Optimizing training routines is crucial for improving player performance while preventing injuries. While many factors in sports are uncontrollable – such as the strength of opponents or unpredictable game conditions – training time is one area where coaches have significant influence. By leveraging GPS data, teams can gain detailed insights into player performance, helping coaches better balance intensity and recovery during training.
This analysis explores GPS data from Croatian youth national soccer teams to examine how different training activities affect player performance and health. By developing new metrics and applying clustering techniques, the study uncovers patterns that could help improve training efficiency and player safety.
GPS Data Overview
The data set used in this study consists of GPS data collected from wearable sensors worn by players during training sessions. It includes speed, heart rate, odometer (cumulative step count), acceleration, and precise GPS coordinates, with most variables sampled 10 times per second.
This initial analysis of the GPS data reviewed just six training sessions for an individual athlete, so there is potential for more insights to be uncovered in future work. The aim of this work was not to solve a specific problem (such as injury prediction) but to explore the data for general patterns that could inform future training optimization.
Methods
The project began with basic exploratory data analysis (EDA) to understand the structure and distribution of the data. The distributions differ markedly from one variable to the next.
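As a rough illustration of this first pass, the sketch below summarizes each variable's distribution using only the Python standard library. The variable names (`speed_m_s`, `heart_rate_bpm`) and the readings are made up for the example; they are not values from the study, and the original analysis may have used different tooling.

```python
import statistics


def summarize(samples):
    """Return basic distribution statistics for one variable."""
    ordered = sorted(samples)
    return {
        "min": ordered[0],
        "median": statistics.median(ordered),
        "mean": statistics.fmean(ordered),
        "max": ordered[-1],
        "stdev": statistics.pstdev(ordered),
    }


# Hypothetical readings; a real session holds thousands of rows per variable.
session = {
    "speed_m_s": [0.0, 0.4, 1.2, 3.5, 6.1, 2.0, 0.3, 0.0],
    "heart_rate_bpm": [92, 101, 118, 150, 171, 160, 133, 110],
}

summary = {name: summarize(values) for name, values in session.items()}
for name, stats in summary.items():
    print(name, stats)
```

Comparing the median with the mean for each variable is a quick way to spot skew, such as a speed signal that sits near zero most of the time with occasional sprints.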

The analysis then involved two key processes:
- Creating Supplemental Metrics: For example, by tracking the number of steps over different time intervals (1, 2, 3, 4, and 5 minutes), a smoother representation of player activity was achieved. This made it easier to identify periods of higher or lower exertion during training.
- Clustering: A k-means clustering model was applied to group training data into different clusters based on original metrics along with ones created in the first exercise. This clustering helped identify distinct phases within each training session, such as warm-up, high-intensity exertion, and recovery.
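The supplemental step metrics described above can be sketched as a trailing-window difference on the cumulative odometer. This is a minimal illustration, assuming a 10-samples-per-second odometer series; the function name `rolling_steps` and the synthetic session are hypothetical, and how the original analysis handled the start of a session is not stated.

```python
SAMPLE_RATE_HZ = 10  # from the text: most variables are sampled 10 times per second


def rolling_steps(odometer, window_minutes, sample_rate_hz=SAMPLE_RATE_HZ):
    """Steps taken in the trailing window that ends at each sample.

    `odometer` is a cumulative step count with one reading per sample.
    Before a full window is available, the count covers the steps taken
    since the start of the session (an assumption of this sketch).
    """
    window = int(window_minutes * 60 * sample_rate_hz)
    result = []
    for i, current in enumerate(odometer):
        start = odometer[i - window] if i >= window else odometer[0]
        result.append(current - start)
    return result


# Synthetic session: 10 minutes at a steady 2 steps per second,
# i.e. one new step every 5 samples at 10 Hz.
odometer = [i // 5 for i in range(10 * 60 * SAMPLE_RATE_HZ)]

# One smoothed series per window length (1 to 5 minutes), as in the text.
features = {minutes: rolling_steps(odometer, minutes) for minutes in (1, 2, 3, 4, 5)}
print({minutes: series[-1] for minutes, series in features.items()})
```

Longer windows react more slowly to changes in activity, which is what makes them smoother; the shorter windows keep more of the moment-to-moment variation.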
Key Insights


- Training Segmentation: By analyzing step data and visualizing players’ movements, two primary types of activities were identified: scrimmages and drills. Scrimmages showed steady movement, with a clear break midway through, while drills exhibited more variable step patterns. This distinction helped label different periods within training, allowing for deeper analysis.
- Heart Rate and Activity: By plotting heart rate data against player movement, the impact of different training activities on cardiovascular exertion was visualized. During drills, players exhibited periods of high intensity followed by brief recovery phases, reflected in fluctuating heart rates. In contrast, scrimmages showed more sustained activity with less pronounced fluctuations in heart rate, indicating more consistent, moderate exertion throughout these segments.
- Step Grouping: A new metric based on the number of steps taken in the last five minutes was created to identify periods of higher physical demand. For example, a “step group” classification revealed that players spent the most time in moderate activity (Group 3), with occasional spikes into higher activity (Group 6). This metric could be used to fine-tune the intensity of drills and identify when players are overexerting themselves.
- Clustering Results: The k-means clustering model produced five distinct clusters:
  - End of Training: High step count and moderate intensity.
  - Warmup: Low intensity, with limited movement.
  - High Intensity: High exertion, especially acceleration, which is important for performance optimization.
  - Halftime: Low exertion, likely a recovery phase.
  - Moderate Intensity: Steady movement without significant acceleration.
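To show how a clustering step can yield groups like these, here is a small dependency-free sketch of k-means (Lloyd's algorithm). Everything in it is illustrative: the feature pairs are invented, the features are z-scored first because k-means is sensitive to scale, and the deterministic farthest-first seeding is a simplification chosen for reproducibility, not something the source describes. The example uses three clusters rather than five to stay short.

```python
import statistics


def standardize(rows):
    """Z-score each column so no single feature dominates the distance."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stdevs = [statistics.pstdev(col) or 1.0 for col in columns]
    return [
        tuple((value - m) / s for value, m, s in zip(row, means, stdevs))
        for row in rows
    ]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k):
    """Seed with the first point, then repeatedly add the point that is
    farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iterations=100):
    centroids = farthest_first(points, k)
    labels = []
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                updated.append(tuple(statistics.fmean(col) for col in zip(*members)))
            else:
                updated.append(centroids[c])  # leave an empty cluster where it is
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


# Hypothetical features per time slice:
# (steps in the last 5 minutes, mean absolute acceleration in m/s^2)
features = [
    (50, 0.20), (60, 0.30), (55, 0.25),      # little movement
    (300, 0.80), (310, 0.90), (290, 0.85),   # steady movement
    (500, 2.50), (520, 2.70), (510, 2.60),   # hard efforts
]

labels, centroids = kmeans(standardize(features), k=3)

# Name a cluster by inspecting its centroid: highest acceleration here.
high_intensity = max(range(3), key=lambda c: centroids[c][1])
print(labels, "high-intensity cluster:", high_intensity)
```

Cluster numbers from k-means are arbitrary, so descriptive names such as "Warmup" or "High Intensity" have to come from inspecting the centroids, as the last lines do for acceleration.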

Cluster 3, the High Intensity cluster, is particularly notable. It represents periods of maximal effort, and studying these segments more closely could show how to optimize player conditioning.
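The step-group metric from the Step Grouping item can be sketched as a simple binning of the five-minute step count. The bin width of 100 steps and the cap at group 6 are hypothetical choices made for this example, since the source does not state its bin edges, and the step counts are invented.

```python
from collections import Counter

BIN_WIDTH = 100  # hypothetical: steps per group; the source gives no bin edges
MAX_GROUP = 6    # hypothetical cap; the text mentions groups up to at least 6


def step_group(steps_last_5min, bin_width=BIN_WIDTH, max_group=MAX_GROUP):
    """Map a five-minute step count to a group number from 1 to max_group."""
    return min(steps_last_5min // bin_width + 1, max_group)


def time_share(step_counts):
    """Fraction of samples spent in each step group."""
    groups = Counter(step_group(steps) for steps in step_counts)
    total = len(step_counts)
    return {group: count / total for group, count in sorted(groups.items())}


# Hypothetical five-minute step counts, one per sampled moment.
counts = [40, 120, 230, 250, 260, 270, 280, 310, 560, 240]

shares = time_share(counts)
most_common_group = max(shares, key=shares.get)
print(shares, "most time spent in group", most_common_group)
```

A table of time shares like this makes it easy to see whether a session stayed mostly in the moderate groups or spent a meaningful fraction of time in the highest ones.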

Actions and Next Steps
This initial analysis serves as a starting point for more targeted research. The following next steps are now possible:
- Injury Prediction: Integrating injury data could help correlate physical exertion patterns with injury risk. Understanding when players are most at risk for injury based on their activity levels would be invaluable for injury prevention strategies.
- Expanded Data: More data from a wider range of players and training sessions could help validate these findings and reveal further patterns across different types of players (e.g., by position or experience level).
- Drill Optimization: Analyzing specific drills could show which ones promote the most beneficial movement patterns and conditioning, and which might need to be adjusted to avoid overtraining or undertraining.
- Enhancing the Clustering Model: With a larger sample of data, the clustering model can be refined to better distinguish between high-performance and high-risk periods, allowing for more precise recommendations on training intensity and recovery.
Conclusion
This analysis demonstrates the potential of GPS data to provide valuable insights into player performance and health during training. While the study focused on exploratory analysis, it opens the door to further research aimed at optimizing training routines, improving performance, and reducing injury risk. With more comprehensive GPS data and refined models, coaches can gain a deeper understanding of how to structure training for maximum benefit.