We seek to determine which manufacturing processes, and which sequences of processes, optimize semiconductor product performance when individual processes are represented only as categorical variables. Using Natural Language Processing techniques, we model each wafer as a vector of weighted terms drawn from a vocabulary of the processing space, in which each term is composed of a processing step and a recipe. The weights reflect how general a term is: rare terms, which have more discriminative power, receive higher weights. This vector representation of processing history enables additional analyses such as wafer classification, clustering, and identification of similar processing histories.
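
As an illustration of the weighting idea, the following is a minimal sketch that assumes a TF-IDF-style scheme with smoothed inverse document frequency; the text above does not name the exact weighting, so this choice is an assumption. The wafer IDs, step names, and recipe labels are hypothetical, as are the helper names `build_weighted_vectors` and `cosine_similarity`. The sketch treats each history as an unordered bag of (step, recipe) terms and therefore ignores processing order.

```python
import math
from collections import Counter


def build_weighted_vectors(histories):
    """Map each wafer to a sparse vector {term: weight}.

    histories: dict of wafer_id -> list of (step, recipe) terms.
    Weighting is TF-IDF-style (an assumption, not a confirmed method).
    """
    n_wafers = len(histories)

    # Document frequency: number of wafers whose history contains each term.
    doc_freq = Counter()
    for terms in histories.values():
        doc_freq.update(set(terms))

    # Smoothed inverse document frequency: rarer terms get larger weights.
    idf = {
        term: math.log((1 + n_wafers) / (1 + df)) + 1.0
        for term, df in doc_freq.items()
    }

    # Term count within a wafer's history, scaled by the term's rarity.
    vectors = {}
    for wafer_id, terms in histories.items():
        counts = Counter(terms)
        vectors[wafer_id] = {t: c * idf[t] for t, c in counts.items()}
    return vectors, idf


def cosine_similarity(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical processing histories for three wafers.
histories = {
    "W1": [("etch", "R1"), ("deposit", "R7"), ("anneal", "R3")],
    "W2": [("etch", "R1"), ("deposit", "R7"), ("anneal", "R4")],
    "W3": [("etch", "R1"), ("deposit", "R8"), ("anneal", "R3")],
}

vectors, idf = build_weighted_vectors(histories)

# Compare processing histories pairwise.
sim_12 = cosine_similarity(vectors["W1"], vectors["W2"])
sim_13 = cosine_similarity(vectors["W1"], vectors["W3"])
```

In this toy data the term `("etch", "R1")` appears in every wafer and so receives the lowest weight, while `("deposit", "R8")` appears in only one wafer and receives the highest. The resulting sparse vectors could then be passed to a classifier or clustering routine, or compared directly as done here with cosine similarity.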