Web services, large and small, use in-memory caches such as memcached to lower database load and respond quickly to user requests. These cache clusters are typically provisioned to support peak load, in terms of both request processing capacity and cache storage size. This kind of worst-case provisioning can be very expensive (e.g., Facebook reportedly uses more than 10,000 servers for its cache cluster) and does not take advantage of the dynamic resource allocation and virtual machine provisioning capabilities found in modern public and private clouds. Further, there can be great diversity in both the workloads running on a cache cluster and the types of nodes that compose the cluster, making manual management difficult. This paper identifies the challenges in designing large-scale self-managing caches. Rather than requiring all cache clients to know the key-to-server mapping, we propose an automated load balancer that can perform line-rate request redirection in a far more dynamic manner. We describe how stream analytics techniques can be used to efficiently detect key hotspots. A controller then guides the load balancer's key mapping and replication level to prevent overload, and automatically starts additional servers when needed.
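
To make the idea of detecting key hotspots with stream analytics concrete, the following is a minimal sketch of one well-known heavy-hitter technique, the Space-Saving algorithm. The text above does not say which streaming algorithm is used, so this choice is an assumption made for illustration. The class name `SpaceSaving`, its `capacity` parameter, and the `hot_keys` threshold are likewise hypothetical. The sketch tracks a bounded number of keys in constant memory and may overestimate counts, but it never underestimates the count of a key it currently tracks.

```python
from typing import Dict, Hashable, List, Tuple


class SpaceSaving:
    """Approximate heavy-hitter counter that tracks at most `capacity` keys.

    Illustrative only: an assumed stand-in for the stream analytics
    technique, not the method described in the text.
    """

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.counts: Dict[Hashable, int] = {}
        self.total = 0  # number of requests observed so far

    def observe(self, key: Hashable) -> None:
        """Record one request for `key`."""
        self.total += 1
        if key in self.counts:
            self.counts[key] += 1
        elif len(self.counts) < self.capacity:
            self.counts[key] = 1
        else:
            # Table is full: evict the key with the smallest count and let
            # the newcomer inherit that count plus one (an overestimate).
            # A linear scan keeps the sketch short; a real implementation
            # would use a structure with O(1) minimum lookup.
            victim = min(self.counts, key=self.counts.get)
            floor = self.counts.pop(victim)
            self.counts[key] = floor + 1

    def hot_keys(self, fraction: float) -> List[Tuple[Hashable, int]]:
        """Return tracked keys whose estimated share of traffic exceeds `fraction`."""
        threshold = fraction * self.total
        hot = [(k, c) for k, c in self.counts.items() if c > threshold]
        return sorted(hot, key=lambda kc: kc[1], reverse=True)


if __name__ == "__main__":
    detector = SpaceSaving(capacity=4)
    # Hypothetical request stream: one popular key plus many one-off keys.
    requests = ["user:1"] * 50 + [f"item:{i}" for i in range(50)]
    for key in requests:
        detector.observe(key)
    print(detector.hot_keys(fraction=0.25))
```

In a design like the one outlined above, a controller could plausibly poll `hot_keys` periodically and raise the replication level of any key it returns. That wiring is also an assumption and is not shown here.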