Human judgments of word similarity have been a popular method of evaluating the quality of word embeddings. However, this method fails to measure geometric properties such as asymmetry. For example, it is more natural to say that “Ellipses are like Circles” than “Circles are like Ellipses”. Such asymmetry has been observed in a psychoanalysis test called the word evocation experiment, where one word is used to recall another. Although useful, such experimental data have been significantly understudied for measuring embedding quality. In this paper, we use three well-known evocation datasets to gain insights into how embeddings encode asymmetry. We study both static embeddings and contextual embeddings such as BERT. Evaluating asymmetry for BERT is generally hard because its embeddings are dynamic, changing with context. We therefore probe the conditional probabilities of BERT (as a language model) using a large number of Wikipedia contexts to derive a theoretically justifiable Bayesian asymmetry score. The results show that contextual embeddings exhibit more randomness than static embeddings on similarity judgments while performing well on asymmetry judgments, which aligns with their strong performance on “extrinsic evaluations” such as text classification. The asymmetry judgment and the Bayesian approach provide a new perspective for evaluating contextual embeddings on intrinsic evaluation, and their comparison to similarity evaluation concludes our work with a discussion of the current state and the future of representation learning.