We model and analyze a User-Equipment (UE) based wireless network selection method in which individual users act on their stochastic knowledge of the expected behavior of their available networks. In particular, we focus on networks with millimeter-wave (mmWave) radio. Modeling mmWave radio access technologies (RATs) as a stochastic 3-state process based on their physical layer characteristics in the Line-of-Sight (LOS), Non-Line-of-Sight (NLOS), and Outage states, we make the realistic assumption that users have no knowledge of the statistics of the RATs and must learn them while maximizing the throughput obtained. We develop an online learning-based approach to access network selection: a user-centric Multi-Armed Bandit Problem that incorporates the cost of switching access networks. We develop an online learning policy that groups network access to minimize costs for RAT selection, and we analyze the regret (loss due to uncertainty) of our algorithm. We also show that our algorithm obtains optimal regret and, in numerical examples, achieves a 24% increase in total throughput compared to existing techniques for high-throughput mmWave RATs that vary over a fast timescale.