Frontiers in Immunology

Exploring the impact of clonal definition on B-cell diversity: implications for the analysis of immune repertoires

View publication


The adaptive immune system has the extraordinary ability to produce a broad range of immunoglobulins that can bind a wide variety of antigens. During adaptive immune responses, activated B cells duplicate and undergo somatic hypermutation in their B-cell receptor (BCR) genes, resulting in clonal families of diversified B cells that can be related back to a common ancestor. Advances in high-throughput sequencing technologies have enabled the high-throughput characterization of B-cell repertoires, however, the accurate identification of clonally related BCR sequences remains a major challenge. In this study, we compare three different clone identification methods on both simulated and experimental data, and investigate their impact on the characterization of B-cell diversity. We observe that different methods lead to different clonal definitions, which affects the quantification of clonal diversity in repertoire data. Our analyses show that direct comparisons between clonal clusterings and clonal diversity of different repertoires should be avoided if different clone identification methods were used to define the clones. Despite this variability, the diversity indices inferred from the repertoires’ clonal characterization across samples show similar patterns of variation regardless of the clonal identification method used. We find the Shannon entropy to be the most robust in terms of the variability of diversity rank across samples. Our analysis also suggests that the traditional germline gene alignment-based method for clonal identification remains the most accurate when the complete information about the sequence is known, but that alignment-free methods may be preferred for shorter sequencing read lengths. We make our implementation freely available as a Python library cdiversity.