Professional tennis is a fast-paced sport, with serves and hits that can exceed 100 mph and matches that can run for hours. For example, in 13 years of Grand Slam data, there were 454 matches averaging 3 sets of about 40 minutes each. The fast pace and long duration of tennis matches make it challenging to track the time boundaries of each point in a match. The visual aspect of a tennis match is highly diverse, varying in camera angles, occlusions, resolution, contrast, and color, whereas the sound component is relatively stable and consistent. In this paper, we present a system that detects events such as ball hits and point boundaries in a tennis match from the sound recorded during the match. We first describe the sound processing pipeline, which includes preprocessing, feature extraction, basic (atomic) event detection, and point boundary detection. Then, we describe the overall cloud-based system architecture. Afterwards, we describe the user interface, which includes a data labeling tool for efficiently generating the training dataset and a workbench for sound and model management. We evaluate the performance of our system in experiments with real-world tennis sound data. Our proposed pipeline detects atomic tennis events with an F1-score of 92.39% and point boundaries with average precision and recall of around 80%. The system can help tennis coaches and players find and extract game highlights with specific characteristics, so that they can analyze these highlights and refine their play strategy.
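The pipeline stages named above (feature extraction, atomic event detection, point boundary detection) can be illustrated with a minimal sketch. This is not the authors' implementation; the function names, the short-time-energy feature, the fixed threshold, and the gap-based grouping rule are all illustrative assumptions.

```python
# Hypothetical sketch of a sound-event pipeline: short-time energy as the
# feature, thresholding as atomic "hit" detection, and gap-based grouping
# as point-boundary detection. Names and parameters are illustrative.
import math


def frame_energy(samples, frame_size=256):
    """Feature extraction: mean energy of each non-overlapping frame."""
    return [sum(s * s for s in samples[i:i + frame_size]) / frame_size
            for i in range(0, len(samples) - frame_size + 1, frame_size)]


def detect_hits(energies, threshold):
    """Atomic event detection: frames whose energy exceeds a threshold."""
    return [i for i, e in enumerate(energies) if e > threshold]


def group_points(hit_frames, max_gap=2):
    """Point-boundary detection: hits separated by at most max_gap frames
    are merged into one point, returned as (start_frame, end_frame)."""
    points = []
    for f in hit_frames:
        if points and f - points[-1][1] <= max_gap:
            points[-1][1] = f          # extend the current point
        else:
            points.append([f, f])      # start a new point
    return [tuple(p) for p in points]


# Synthetic waveform: silence with three sinusoidal bursts standing in
# for ball hits; the first two are close enough to form one point.
wave = [0.0] * 2048
for start in (256, 768, 1536):
    for i in range(start, start + 256):
        wave[i] = math.sin(i)

energies = frame_energy(wave)
hits = detect_hits(energies, threshold=0.1)   # -> frames [1, 3, 6]
points = group_points(hits, max_gap=2)        # -> [(1, 3), (6, 6)]
```

In a real system, the feature extraction step would typically use spectral features (e.g. mel-frequency coefficients) and the event detector would be a trained classifier rather than a fixed threshold; the gap-based grouping stands in for the paper's point-boundary logic.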