Ball trajectory data are among the most fundamental and useful information for evaluating players' performance and analyzing game strategies. Although vision-based object tracking techniques have been developed to analyze sports videos, it remains challenging to recognize and locate a small, high-speed ball in broadcast sports videos. In this work, we develop a deep learning network, called TrackNet, to track the tennis ball in broadcast videos in which the ball images are small, blurry, and sometimes show afterimage trails or are even occluded. The proposed heatmap-based deep learning network is trained not only to recognize the ball image in a single frame but also to learn flight patterns from consecutive frames. TrackNet takes images of size 640×360 and generates a detection heatmap from either a single frame or several consecutive frames to locate the ball, and it achieves high precision even on public domain videos. The network is evaluated on the video of the men's singles final at the 2017 Summer Universiade, which is available on YouTube. The precision, recall, and F1-measure of TrackNet reach 99.7%, 97.3%, and 98.5%, respectively. To guard against overfitting, 9 more videos are partially labeled and combined with a subset of the previous dataset for 10-fold cross validation, yielding a precision, recall, and F1-measure of 95.3%, 75.7%, and 84.3%, respectively. A conventional image processing algorithm is also implemented for comparison with TrackNet. Our experiments indicate that TrackNet outperforms the conventional method by a large margin and achieves exceptional ball tracking performance.
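
The F1-measure is the harmonic mean of precision and recall, so the figure reported for the Universiade video can be sanity-checked from the other two. The snippet below is a minimal sketch; the function name `f1_measure` is ours and is not part of the TrackNet code.

```python
def f1_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported for the 2017 Summer Universiade video.
print(f"{f1_measure(0.997, 0.973):.1%}")  # 98.5%
```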

Demo Video

based on
Y.-C. Huang, "TrackNet: Tennis Ball Tracking from Broadcast Video by Deep Learning Networks," Master's Thesis, advised by C.-W. Yi and G.-H. Huang, National Chiao Tung University, Taiwan, 2018.

Source Code

Click the link https://nol.cs.nctu.edu.tw:234/iSport/TrackNet/ to access the GitLab project.

The source code and trained models of TrackNet, as well as the labeling tool, are available in the GitLab project.
The weights for TrackNet model I can be found in "Code/TrackNet_One_Frame_Input/weights", and the weights for TrackNet model II and II' are in "Code/TrackNet_Three_Frames_Input/weights".
**You may need to change the file paths in the code according to your system environment.**


The first dataset is based on the broadcast video of the tennis men's singles final at the 2017 Summer Universiade. The resolution, frame rate, and video length are 1280×720, 30 fps, and 75 minutes, respectively. After removing footage unrelated to rallies, the video is segmented into 81 game-related clips, each of which records a complete rally from the serve to the scoring of the point. There are 20,844 labeled frames in the first dataset.

The second dataset consists of video clips from 9 broadcast videos. Each video contributes around 2,000 frames. The second dataset, combined with a subset of the first one, is used for 10-fold cross validation and for training a general model. In total, there are 23,903 labeled frames in the second dataset.

In the label file, each frame may have the following attributes: "Frame Name", "Visibility Class", "X", "Y", and "Trajectory Pattern."

Download Dataset

Click the link https://drive.google.com/open?id=1GzJZeEPEi8lJjEAAtVnHhRdX8TVR14yK to download the dataset.

Format of label files

In the dataset, the data for each video clip are stored in a separate directory. Each clip directory contains the original video clip, the frames of the clip, a label file, and an annotated video. The label file is named "Label.csv". Each line of the label file may consist of file name, visibility, x, y, and trajectory pattern.

Attributes in the label file

  • file name: The name of the frame file.
  • visibility: VC for short, indicates the visibility of the ball in each frame. The possible values are 0, 1, 2, and 3.
    • 0: the ball is not within the frame.
    • 1: the ball can be easily identified.
    • 2: the ball is in the frame but cannot be easily identified.
    • 3: the ball is occluded by other objects.
  • x-coordinate: The x coordinate of the tennis ball in pixel coordinates.
  • y-coordinate: The y coordinate of the tennis ball in pixel coordinates.
  • status: The trajectory pattern. Ball movement is classified into three categories, labeled 0, 1, and 2.
    • 0: flying
    • 1: hit
    • 2: bouncing
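
The attributes above map naturally onto a small record type. The following is a minimal sketch of a reader for "Label.csv", under several assumptions that the source does not confirm: that the file is comma-separated, that its first row is a header, that the five columns appear in the order listed above, and that the coordinate and status fields may be empty when the ball is not in the frame. The names `BallLabel` and `read_labels`, and the sample rows, are hypothetical.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Meanings of the integer codes, as listed in the attribute description.
VISIBILITY = {0: "not in frame", 1: "easily identified",
              2: "not easily identified", 3: "occluded"}
STATUS = {0: "flying", 1: "hit", 2: "bouncing"}


@dataclass
class BallLabel:
    file_name: str
    visibility: int
    x: Optional[int]
    y: Optional[int]
    status: Optional[int]


def _to_int(text: str) -> Optional[int]:
    """Convert a field to int, treating an empty field as missing."""
    text = text.strip()
    return int(float(text)) if text else None


def read_labels(lines: Iterable[str], has_header: bool = True) -> List[BallLabel]:
    """Parse label rows by column position (assumed order, see lead-in)."""
    reader = csv.reader(lines)
    if has_header:
        next(reader, None)  # skip the header row without relying on its names
    labels = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        name, vis, x, y, status = (row + [""] * 5)[:5]  # pad short rows
        labels.append(BallLabel(name.strip(), int(vis),
                                _to_int(x), _to_int(y), _to_int(status)))
    return labels


# Hypothetical sample rows, for illustration only (not taken from the dataset).
sample = """file name,visibility,x-coordinate,y-coordinate,status
0000.jpg,1,599,423,0
0001.jpg,0,,,
0002.jpg,2,601,406,2
"""
labels = read_labels(io.StringIO(sample))
for label in labels:
    print(label.file_name, VISIBILITY[label.visibility], label.x, label.y)
```

To read a real clip, pass an open file instead of the in-memory sample, for example `read_labels(open(path, newline=""))`, where `path` points to that clip's "Label.csv".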

Example: An excerpt from a label file

Example: Visibility class 2

The ball image cannot be easily recognized. (a)(b)(c) are the original images, and the red dots in (d)(e)(f) are the labeled ball positions.

Example: Visibility class 3

The ball image is occluded. (a)(b)(c) are the original images, and the red dot in (e) is the occluded ball interpolated from the red dots in (d) and (f).
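
The source does not say which interpolation is used for occluded frames. As one plausible reading, the sketch below assumes linear interpolation between the nearest labeled positions before and after the occlusion. `interpolate_occluded` is a hypothetical helper, and the pixel positions are made up for illustration.

```python
from typing import Tuple

Point = Tuple[float, float]


def interpolate_occluded(before: Point, after: Point,
                         step: int = 1, gap: int = 2) -> Point:
    """Linearly interpolate the ball position `step` frames after `before`,
    where `after` was labeled `gap` frames after `before`."""
    if not 0 < step < gap:
        raise ValueError("step must lie strictly between 0 and gap")
    t = step / gap
    return (before[0] + t * (after[0] - before[0]),
            before[1] + t * (after[1] - before[1]))


# Hypothetical positions for frames (d) and (f); frame (e) lies midway.
print(interpolate_occluded((600.0, 400.0), (620.0, 380.0)))  # (610.0, 390.0)
```

With the default `step=1` and `gap=2`, this reduces to the midpoint of the two neighboring labels; a longer occlusion can be filled by calling the helper once per missing frame with a larger `gap`.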

Example: Trajectory pattern 2

(a) and (b) are labeled as flying, and (c) is labeled as bouncing.

Example: Trajectory pattern 1

(a) and (b) are labeled as flying, and (c) is labeled as hit.
