Published July 10, 2020 | Version v1
Presentation Restricted

Tennis Analytics

Creators

Description

DineshKumar Padmanabhan, DATS 6401, Summer 2020

Final Technical Report

Stats on the Association of Tennis Professionals (ATP) Tour from 2009 to 2019

 

Abstract

Visualization has been an important means of match analysis in a variety of sports. Like many other popular sports, tennis is increasingly defined by data. This project aims to show visually how players compare and perform in various tournaments. Specifically, the aim is to analyze and present how ATP tennis players have performed over the years in the Grand Slam tournaments (Australian Open, French Open, Wimbledon and the US Open), to evaluate the game on different surfaces (hard, grass and clay), and to examine head-to-head records against other players in various Masters series and international tournaments. The historical Grand Slam results for this project come from a tennis data portal, while the statistics on aces and serve points come from GitHub.

Objectives

The primary objective of this project is to visualize the history of performance in men's tennis, so that users can explore interesting trends and the stories behind the numbers and understand game patterns and serve-point strategies in rallies. This project aims to show visually how players compare and perform in various tournaments. Key questions that we aimed to answer through visualizations are: Who has the best overall record in terms of wins and losses in Grand Slam performances since 2009? How have player rankings varied since 2009? Who were the most dominant players on the various playing surfaces (clay, hard and grass courts)? Who were the most dominant players in the various Grand Slams; does the Australian Open, for example, have an undisputed favorite? Which players were part of some of the most unexpected results (upsets) in the last 8-9 years? And finally, how can data improve the game?
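
One way to make the "upsets" question concrete is sketched below. It is only an illustration under stated assumptions: the report does not define an upset, so the rule used here (the winner is ranked at least `min_rank_gap` places below the loser) is hypothetical, as are the field names and the sample records.

```python
from typing import Dict, List

# Hypothetical match records; players, ranks and field names are illustrative only.
matches: List[Dict] = [
    {"tournament": "Example Open", "winner": "Player A", "winner_rank": 85,
     "loser": "Player B", "loser_rank": 3},
    {"tournament": "Example Open", "winner": "Player C", "winner_rank": 2,
     "loser": "Player D", "loser_rank": 40},
    {"tournament": "Sample Masters", "winner": "Player E", "winner_rank": 120,
     "loser": "Player F", "loser_rank": 10},
    {"tournament": "Sample Masters", "winner": "Player G", "winner_rank": 30,
     "loser": "Player H", "loser_rank": 12},
]


def rank_gap(match: Dict) -> int:
    """Places by which the winner was ranked below the loser (positive = surprise)."""
    return match["winner_rank"] - match["loser_rank"]


def find_upsets(matches: List[Dict], min_rank_gap: int = 50) -> List[Dict]:
    """Return matches that meet the assumed upset rule, biggest gap first."""
    upsets = [m for m in matches if rank_gap(m) >= min_rank_gap]
    return sorted(upsets, key=rank_gap, reverse=True)


for upset in find_upsets(matches):
    print(upset["tournament"], upset["winner"], "beat", upset["loser"], rank_gap(upset))
```

The threshold is a tuning choice: a smaller `min_rank_gap` surfaces more matches, a larger one keeps only the most surprising results.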

 

Functional Requirements

The key functional blocks in scope are the following, written in user story format:

AS A Tennis Player, I WANT TO visualize and explore the historical data results (in terms of wins and losses in various tournaments) in tennis, So That I Can understand my strengths and weaknesses.

AS A Tennis Player, I WANT TO visualize and explore the serve points in tennis (what percent of points you win on your first serve and your second), So That I Can optimize my serves.

AS A Tennis Player, I WANT TO visualize and explore the statistics of aces in tennis, So That I Can understand how my shot selection affects the game's outcome.

AS A Tennis Player, I WANT TO visualize and explore certain points (game points, set points, match points), So That I Can understand how well I perform under pressure.
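
The serve-point story above rests on two percentages, which could be computed roughly as follows. This is a minimal sketch, not the project's code: the function and parameter names are hypothetical, and it assumes that every service point on which the first serve missed counts as a second-serve point (so double faults count as second-serve points lost).

```python
from typing import Tuple


def serve_win_percentages(serve_points: int, first_in: int,
                          first_won: int, second_won: int) -> Tuple[float, float]:
    """Return (first-serve win %, second-serve win %) for one player.

    serve_points: total points played on the player's serve
    first_in:     points on which the first serve landed in
    first_won:    points won after a first serve in
    second_won:   points won after a second serve
    """
    # Assumption: all remaining service points were played on a second serve.
    second_points = serve_points - first_in
    first_pct = 100.0 * first_won / first_in if first_in else 0.0
    second_pct = 100.0 * second_won / second_points if second_points else 0.0
    return first_pct, second_pct


# Illustrative numbers only: 80 service points, 50 first serves in.
first_pct, second_pct = serve_win_percentages(80, 50, 40, 15)
print(f"first serve: {first_pct:.1f}%, second serve: {second_pct:.1f}%")
```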

 

System Architecture and Description

Data preprocessing, consolidation, and summarization were done exclusively in CSV files and Microsoft Excel. The historical Grand Slam results for this project come from a tennis data portal, while the statistics on aces and serve points come from GitHub. The data was then combined, using the tournament name as the primary key, into a consolidated data set that could be analyzed. For this analysis, a two-dimensional data table was created for each visualization, restricted to the data that pertained to the question being answered. Each of these data tables was stored in a separate CSV file that could be read in using d3. All data, including the raw files, the consolidated structured data, and the summary tables, was stored locally.
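
The consolidation step was done in Excel, but the same idea can be sketched in Python: join two tables on the tournament name, then reduce the result to one small table per visualization. Everything specific here is an assumption for illustration: the column names, the sample rows, and the choice of "wins by surface" as the summary.

```python
import csv
import io
from collections import Counter
from typing import Dict, List

# Hypothetical inputs; in the project these would be the raw CSV files.
results = [
    {"tournament": "Wimbledon", "winner": "Player A", "loser": "Player B"},
    {"tournament": "Wimbledon", "winner": "Player A", "loser": "Player C"},
    {"tournament": "French Open", "winner": "Player B", "loser": "Player A"},
]
tournaments = [
    {"tournament": "Wimbledon", "surface": "grass"},
    {"tournament": "French Open", "surface": "clay"},
]


def join_on_tournament(results: List[Dict], tournaments: List[Dict]) -> List[Dict]:
    """Attach tournament attributes to each result, keyed on the tournament name."""
    lookup = {t["tournament"]: t for t in tournaments}
    joined = []
    for row in results:
        extra = lookup.get(row["tournament"])
        if extra is None:
            continue  # skip results with no matching tournament row
        joined.append({**row, **extra})
    return joined


def wins_by_surface(joined: List[Dict]) -> List[Dict]:
    """Summary table for one visualization: wins per player and surface."""
    counts = Counter((row["winner"], row["surface"]) for row in joined)
    return [{"player": p, "surface": s, "wins": n}
            for (p, s), n in sorted(counts.items())]


def to_csv_text(rows: List[Dict], fieldnames: List[str]) -> str:
    """Serialize a summary table as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


joined = join_on_tournament(results, tournaments)
summary = wins_by_surface(joined)
csv_text = to_csv_text(summary, ["player", "surface", "wins"])
print(csv_text)
```

Writing `csv_text` to its own file would give a table with a header row, which is the shape d3's CSV loading expects.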

Development Platforms

The project will be presented as an HTML webpage with CSS and JS, hosted from a GitHub repository. The basic layout of the website is built with the Bootstrap front-end framework. Visual Studio will be used for coding, and the PyCharm IDE with the Python programming language will be used to clean the data and reshape it into a format that fits the purpose. Furthermore, the visualizations for this project will be created with the D3.js visualization library, Tableau, Gephi and Plotly. I plan to report my data on a series of tabs on the performance page, followed by the analysis by theme, to keep the site organized. In addition, I have used Font Awesome's icon fonts for the icons on the tabs and for the checkbox that filters data by surface on the performance page.

Proposed Visualizations

To decide on the type of visualization, I chose an exploratory analysis approach, using tools such as Tableau and simple d3 bar charts to mine insights. Tableau was extremely helpful for quickly producing many types of visualization that communicate the same performance data. To begin with, the Game Fish visualization provides a single-glance overview of one game from a tennis match. A Game Tree chart was used for the point progressions of all games in a match. The Rally Tree chart depicts the point distributions and the percentage chance of winning by rally length, while a Sankey diagram shows shots as parallel sets and a sunburst gives a compact match overview. A pleasant surprise was that tree maps, which are normally used to represent hierarchical data, worked well for showing a linear prioritized list: with the right color, the tree map resembles a tennis court, which fits the context perfectly. Using a line chart to show rankings and a two-sided bar chart to show the number of wins and losses during the time period offered both the simplicity and the clarity that the performance page needed.
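
The numbers behind a rally-length view (how points are distributed across rally lengths, and how often the server wins in each group) could be derived along these lines. This is a sketch under assumptions: the point records, the field names, and the three rally-length buckets are hypothetical and are not taken from the project's data.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical point-by-point records: shots in the rally and who won the point.
points = [
    {"rally_length": 1, "server_won": True},
    {"rally_length": 2, "server_won": True},
    {"rally_length": 3, "server_won": False},
    {"rally_length": 4, "server_won": True},
    {"rally_length": 5, "server_won": False},
    {"rally_length": 7, "server_won": True},
    {"rally_length": 9, "server_won": False},
    {"rally_length": 12, "server_won": True},
]


def bucket(rally_length: int) -> str:
    """Group rally lengths into coarse buckets (the cut-offs are an assumption)."""
    if rally_length <= 4:
        return "1-4"
    if rally_length <= 8:
        return "5-8"
    return "9+"


def rally_summary(points: List[Dict]) -> Dict[str, Dict[str, float]]:
    """Per bucket: share of all points played and percentage won by the server."""
    totals = defaultdict(int)
    won = defaultdict(int)
    for point in points:
        key = bucket(point["rally_length"])
        totals[key] += 1
        if point["server_won"]:
            won[key] += 1
    n = len(points)
    return {
        key: {
            "share_of_points": 100.0 * totals[key] / n,
            "server_win_pct": 100.0 * won[key] / totals[key],
        }
        for key in totals
    }


summary = rally_summary(points)
for key in ("1-4", "5-8", "9+"):
    print(key, summary[key])
```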

Experimental Analyses and Conclusions

As the visualization started to take shape, I realized that the data in its new form called for additional UI modifications. I now needed a row of tournament rounds fixed at the top of the screen while scrolling, because the number of charts for a selection could be enormous. Factors such as surface, head-to-head records, and even discomfort playing against left-handers start to influence the overall outcome in small increments. While I acknowledge the weaknesses and bugs in the current implementation, I am confident that this is a good start toward plotting the predicted chance of events against the actual results. Data will not replace hard work, coaches, or great tennis wisdom, but it will serve as another tool to take your tennis game to the next level. Overall, I found that whatever the player does on the court is the only thing that matters.

Files

Restricted

The record is publicly accessible, but files are restricted to users with access.