We present the first application of reinforcement learning (RL) in the materials discovery domain that explicitly considers the logical structure of the interactions between the RL agent and the environment. Here, the environment is defined as the space of experiments accessible via a realistic experimental platform. We explicitly pursue the training of generalizable agents that learn to navigate an abstract space of experiments relevant to materials preparation. The training is facilitated by a data-augmentation strategy that recycles a moderate volume of real experimental data. Experiments show that the agent can successfully search for experiments that produce materials with the desired properties and characteristics. Furthermore, the agent learns to avoid proposing experiments that would result in undesired materials; for example, it avoids experiments that yield a cross-linked form of a polymer when cross-linking should be avoided.