Deep neural network models have been very successfully applied to Natural Language Processing (NLP) and Image based tasks. Their application to network analysis and management tasks is just recently being pursued. Our interest is in producing deep models that can be effectively generalized to perform well on multiple network tasks in different environments. A major challenge is that traditional deep models often rely on categorical features, but cannot handle unseen categorical values. One method for dealing with such problems is to learn contextual embeddings for categorical variables used by deep networks to improve their performance. In this paper, we adapt the NLP pre-training technique and associated deep model BERT to learn semantically meaningful numerical representations (embeddings) for Fully Qualified Domain Names (FQDNs) used in communication networks. We show through a series of experiments that such an approach can be used to generate models that maintain their effectiveness when applied to environments other than the one in which they were trained.