Discovery and development of polymer materials is driven by experimental data acquisition. Experiments unfold under conditions of delayed rewards on rich landscapes shaped by multiple experimental degrees of freedom, both continuous (concentration, temperature, radiation, time) and categorical (monomers, catalysts, initiators, solvents) [1,2]. Deep reinforcement learning (RL) emerges as an appealing approach, capable of interacting with lab equipment, handling delayed rewards, and finding non-trivial research strategies under the realistic constraints of discovery and development projects. We report the development of an end-to-end RL approach applied to the preparation of spin-on-glasses (SOGs). The primary focus of the talk is meta-learning strategies that ensure generalizability of the RL agent's performance [3], and the associated task of data augmentation at the training stage.

1. Li, H. et al. "Tuning the Molecular Weight Distribution from Atom Transfer Radical Polymerization Using Deep Reinforcement Learning." Mol. Syst. Des. Eng., 2018.
2. Zhou, Z. et al. "Optimizing Chemical Reactions with Deep Reinforcement Learning." ACS Cent. Sci., 2017.
3. Cobbe, K. et al. "Quantifying Generalization in Reinforcement Learning." PMLR 97, 2019.