Background: Virtual patient simulators (VPSs) log all users' actions, thereby enabling the creation of a multidimensional representation of students' medical knowledge. This representation can be used to create metrics providing teachers with valuable learning information. Objective: The aim of this study is to describe the metrics we developed to analyze the clinical diagnostic reasoning of medical students, provide examples of their application, and preliminarily validate these metrics on a class of undergraduate medical students. The metrics are computed from the data obtained through a novel VPS embedding natural language processing techniques. Methods: A total of 2 clinical case simulations (tests) were created to test our metrics. During each simulation, the students' step-by-step actions were logged into the program database for offline analysis. The students' performance was divided into 7 dimensions: the identification of relevant information in the given clinical scenario, history taking, physical examination, medical test ordering, diagnostic hypothesis setting, binary analysis fulfillment, and final diagnosis setting. Sensitivity (percentage of relevant information found) and precision (percentage of correct actions performed) metrics were computed for each dimension and combined into their harmonic mean (F1), thereby obtaining a single score for the students' performance in that dimension. The 7 metrics were further grouped to reflect the students' ability to collect and to analyze information, and combined into an overall performance score. A methodological score was computed based on the discordance between the diagnostic pathway followed by students and the reference one previously defined by the teacher. In total, 25 students attending the fifth year of the School of Medicine at Humanitas University took test 1, which simulated a patient with dyspnea. Test 2 dealt with abdominal pain and was taken by 36 students on a different day.
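As an illustration of the per-dimension scoring described above, the following minimal sketch computes sensitivity, precision, and F1 for one dimension (here, history taking). It assumes that each dimension can be represented as a set of reference items defined by the teacher and a set of items selected by the student. The function names and the example items are hypothetical and are not taken from the VPS itself.

```python
# Minimal sketch, assuming each dimension is a pair of item sets:
# "relevant" (teacher-defined reference) and "performed" (student actions).
# All names and example items are hypothetical.

def sensitivity(relevant, performed):
    """Fraction of the relevant items that the student found."""
    if not relevant:
        return 0.0
    return len(relevant & performed) / len(relevant)

def precision(relevant, performed):
    """Fraction of the student's actions that were relevant."""
    if not performed:
        return 0.0
    return len(relevant & performed) / len(performed)

def f1(relevant, performed):
    """Harmonic mean of sensitivity and precision (0 when both are 0)."""
    s = sensitivity(relevant, performed)
    p = precision(relevant, performed)
    if s + p == 0:
        return 0.0
    return 2 * s * p / (s + p)

# Hypothetical history-taking items for a dyspnea case.
relevant = {"smoking history", "chest pain", "orthopnea", "leg swelling"}
performed = {"chest pain", "orthopnea", "diet"}

history_sensitivity = sensitivity(relevant, performed)  # 2 of 4 found
history_precision = precision(relevant, performed)      # 2 of 3 correct
history_f1 = f1(relevant, performed)                    # 4/7
```

In this made-up example the student finds 2 of the 4 relevant items (sensitivity 0.50) with 2 of 3 actions being relevant (precision 0.67), which gives F1 = 4/7, or about 0.57. The harmonic mean penalizes a student who scores well on only one of the two components.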
For validation, we assessed the Spearman rank correlation between each of these scores and the score obtained by each student in the hematology curricular examination. Results: The mean overall scores were consistent between test 1 (mean 0.59, SD 0.05) and test 2 (mean 0.54, SD 0.12). For each student, the overall performance reflected a different balance between collecting and analyzing information. Methodological scores highlighted discordances between the reference diagnostic pattern previously set by the teacher and the one pursued by the student. No significant correlation was found between the VPS scores and hematology examination scores. Conclusions: Different components of the students' diagnostic process may be disentangled and quantified by appropriate metrics applied to students' actions recorded while addressing a virtual case. Such an approach may help teachers provide students with individualized feedback aimed at addressing competence gaps and methodological inconsistencies. There was no correlation between the hematology curricular examination score and any of the proposed scores, as these scores address different aspects of students' medical knowledge.
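The validation step relies on the Spearman rank correlation between a VPS score and the examination score. The sketch below shows one way to compute it in plain Python: rank both variables (averaging ranks for ties), then take the Pearson correlation of the ranks. The helper names and the five data points are made up for illustration and are not the study data. In practice a statistics package would also report a P value.

```python
# Minimal sketch of a Spearman rank correlation, assuming paired scores
# per student. Helper names and data are hypothetical.

def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / (vx * vy) ** 0.5

def spearman_rho(x, y):
    """Spearman rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Made-up example: overall VPS scores and examination marks for 5 students.
vps_scores = [0.52, 0.61, 0.58, 0.47, 0.66]
exam_scores = [27, 24, 30, 26, 25]
rho = spearman_rho(vps_scores, exam_scores)  # -0.5 for this made-up data
```

A rank-based coefficient is a natural choice here because it only assumes a monotonic relationship between the two scores, which are measured on different scales.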