In 2017, smart speakers (such as Amazon Echo and Google Home) became a commercial success. Most smart speakers have a circular microphone array to provide hands-free, voice-only interaction from a distance. In this work, we exploit this mic array to opportunistically sense gestures and track exercises. To this end, we measure the Doppler shift that a gesturing human body induces on a pilot tone, and use beamforming across the mic array to extend the detection range. Data from 12 participants show that gestures can be detected with an accuracy of 96.8% at distances of up to 2.5 meters using an inaudible 20 kHz pilot tone. For exercise tracking, we train a deep neural network to recognize 10 different exercises, and count repetitions using peak-finding heuristics. Data from 17 participants show an exercise classification accuracy of 96% and a count accuracy of 91.8%. To conclude, we discuss hardware enhancements to smart speakers that would further increase their gesture sensing capabilities.
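To give a sense of the scale of the Doppler shift on a 20 kHz pilot tone, the following minimal sketch computes the approximate two-way shift for a moving reflector and recovers it from a synthetic recording with an FFT. It is an illustration under stated assumptions, not the pipeline used in this work: the speaker and microphone are assumed to be co-located, the speed of sound is taken as 343 m/s, and the 0.5 m/s hand speed, the 48 kHz sample rate, the echo amplitude, and the helper names `doppler_shift` and `estimate_shift` are all hypothetical. Under these assumptions the shift is roughly 58 Hz.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed (air at room temperature)
PILOT_HZ = 20_000.0     # pilot tone frequency stated in the abstract


def doppler_shift(velocity_mps, pilot_hz=PILOT_HZ, c=SPEED_OF_SOUND):
    """Approximate two-way Doppler shift (Hz) for a reflector moving at
    velocity_mps toward (+) or away from (-) a co-located speaker and mic.
    Valid when the reflector is much slower than sound."""
    return 2.0 * velocity_mps * pilot_hz / c


def estimate_shift(signal, sample_rate, pilot_hz=PILOT_HZ,
                   guard_hz=10.0, span_hz=200.0):
    """Return the frequency offset (Hz) of the strongest spectral component
    near the pilot, ignoring a guard band around the pilot itself."""
    window = np.hanning(len(signal))            # reduce spectral leakage
    spectrum = np.abs(np.fft.rfft(signal * window))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    offset = freqs - pilot_hz
    # Search only the sidebands: outside the guard band, inside the span.
    mask = (np.abs(offset) > guard_hz) & (np.abs(offset) < span_hz)
    peak = np.argmax(np.where(mask, spectrum, 0.0))
    return offset[peak]


# Synthetic one-second recording: strong direct pilot plus a weak echo
# from a hand moving toward the device at an illustrative 0.5 m/s.
sample_rate = 48_000                            # Hz, assumed
t = np.arange(sample_rate) / sample_rate
expected = doppler_shift(0.5)
recording = (np.sin(2 * np.pi * PILOT_HZ * t)
             + 0.05 * np.sin(2 * np.pi * (PILOT_HZ + expected) * t))
estimated = estimate_shift(recording, sample_rate)

print(f"expected shift:  {expected:.1f} Hz")
print(f"estimated shift: {estimated:.1f} Hz")
```

With a one-second analysis window the FFT bin spacing is 1 Hz, so the estimate is only accurate to about that resolution; shorter windows trade frequency resolution for responsiveness.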