The need for algorithms that optimize building energy consumption is usually motivated by the high energy consumption of buildings on a global scale. However, the current practice for evaluating such algorithms does not reflect this goal: in most cases performance is reported for one specific simulated building only, which gives no indication of how well the score generalizes to other buildings. One approach to overcoming this severe issue is to establish a shared collection of environments, each representing one simulated building setup, that would enable researchers to systematically compare and contrast the efficacy of their building optimization algorithms at scale. This, however, requires that the individual environments be well designed for that purpose. This paper therefore targets the design of suitable environments for such a collection, based on a detailed analysis of related publications that identifies the relevant characteristics of suitable environments. Based on this analysis, a guide is developed that distills these characteristics into questions intended to support a discussion of relevant topics during the design of such environments. Additional explanations and examples are provided for each question to make the guide more comprehensible. Finally, we demonstrate how the guide can be applied by using it to design a novel environment that represents an office building in a tropical climate. This environment is released as open source alongside this publication. We also indicate how test scenarios from existing publications could be enhanced to comply with the characteristics required by our guide, underlining its importance for the future development and evaluation of building energy optimization algorithms, and thus for the sustainability of buildings in general.