The ability to generate candidate molecules with desired chemical properties is key to novel material discovery. Generation capability has improved through larger training datasets, more sophisticated generative models, and better sampling techniques. However, multi-level evaluation of generative models for material discovery, and characterization of the candidate molecules they generate, have not been extensively studied. Such evaluations help improve understanding of the generative process, differentiate between models, and facilitate interaction between machine learning researchers and materials scientists. To this end, we propose MPEGO, a toolkit for Multi-level Performance Evaluation of Generative mOdels for material discovery applications. MPEGO aims to hierarchically characterize and quantify the capability of generative models across the chemical and biological properties of molecules. We validate the toolkit with two generative models, the Graph Convolutional Policy Network (GCPN) and a flow-based autoregressive model (GraphAF), both trained on ZINC-250K molecules. Preliminary results show that molecules generated by GCPN achieve higher independence from the training molecules than those generated by GraphAF across the multi-level evaluation metrics, whereas GraphAF's molecules achieve higher independence in scaffold and molecular weight features. Finally, because MPEGO is model-agnostic, it can be integrated with any generative model for material discovery and beyond.