While significant progress has been made in image understanding and search, satisfying both exploration and exploitation within a single system remains an open challenge. For landmark images, this means a system should help users quickly locate the photo they are interested in (exploitation), and also help them discover parts of the landmark they have never seen before (exploration). The latter is a common request, as evidenced by many recent multimedia studies. To the best of our knowledge, existing systems focus mainly on either exploration (e.g., photo browsing) or exploitation (e.g., representative photo identification), whereas users' needs for exploration and exploitation are dynamically mixed. In this paper, we tackle this challenge by organizing landmark images into a hierarchical summary that gives users the flexibility to conduct both exploration and exploitation. For constructing the hierarchical summary, we introduce two principles: the coherence principle and the diversity principle. Underlying both principles is the concept of 'detail-level,' which measures how much detail an image reveals about a given landmark. We derive a new objective function from definitions of the exploration and exploitation experiences in terms of detail-level. We then formulate the problem of finding an optimal hierarchical summary as a search over a space of trees for the one that achieves the best objective score. Extensive quantitative experiments and comprehensive user studies show that the optimized hierarchical summary satisfies both experiences simultaneously.