With 10 teams contributing a total of 22 submissions across the three challenges, we found some clear winners. Two original methods, DCLEAR (distance-based cell lineage reconstruction) and AMberLand, excelled. The former estimates the distance between cells based on different character lengths; the latter applies, in a novel way, gradient boosting, a classic ML technique that builds a prediction model by aggregating many small decision trees.
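To make the distance-based idea concrete, here is a minimal sketch in Python. It is not DCLEAR itself: a plain Hamming distance stands in for DCLEAR's distance estimate, a simple average-linkage merge stands in for its tree-building step, and the cell names and barcodes are made up for illustration.

```python
from itertools import combinations


def hamming(a, b):
    """Count the positions where two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def average_linkage_tree(barcodes):
    """Greedily merge the two closest clusters until one tree remains.

    barcodes: dict mapping cell name -> barcode string.
    Returns a nested tuple such as (("a", "b"), "c").
    """
    # Each cluster is a pair: (subtree, list of member cell names).
    clusters = [(name, [name]) for name in sorted(barcodes)]

    def dist(c1, c2):
        # Average pairwise Hamming distance between the two clusters' members.
        pairs = [(m, n) for m in c1[1] for n in c2[1]]
        total = sum(hamming(barcodes[m], barcodes[n]) for m, n in pairs)
        return total / len(pairs)

    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest average distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


# Hypothetical barcodes: each character is the recorded state of one site.
toy = {"cell_a": "0120", "cell_b": "0121", "cell_c": "2001", "cell_d": "2011"}
tree = average_linkage_tree(toy)
print(tree)
```

In this toy example, cells whose barcodes differ at a single site are grouped as sisters before the two pairs are joined at the root. Real recordings are noisier, which is why the choice of distance matters so much.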
We also learned that the choice of mutation rate and the diversity of mutations in the simulations have a strong effect on the accuracy of cell lineage reconstruction. There’s a sweet spot between “too low” and “too high” mutation rates.
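One way to see why a sweet spot might exist is a back-of-the-envelope model, sketched below in Python. It rests on assumptions that are ours, not the challenge's: every site mutates independently with the same probability at each cell division, and a mutated site never changes again. The recorder size and tree depth are hypothetical numbers chosen for illustration.

```python
def expected_new_marks(rate, n_sites, division):
    """Expected number of sites first mutated exactly at the given division.

    A site counts only if it stayed unmutated through all earlier divisions
    (probability (1 - rate) ** (division - 1)) and mutates at this one.
    """
    return n_sites * (1 - rate) ** (division - 1) * rate


n_sites, depth = 100, 10  # hypothetical recorder size and tree depth
rates = [i / 100 for i in range(1, 100)]

# Marks available to tell apart the sister cells born at the last division.
late_marks = {r: expected_new_marks(r, n_sites, depth) for r in rates}
best_rate = max(late_marks, key=late_marks.get)

for r in (0.01, best_rate, 0.5):
    print(f"rate={r:.2f}  expected new marks at division {depth}: {late_marks[r]:.2f}")
```

Under these assumptions, a very low rate leaves few marks anywhere, while a very high rate uses up the sites in the first few divisions and leaves almost nothing to distinguish the deepest branches. The expected number of late marks peaks at a rate of about one over the tree depth.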
Our results suggest that it may be possible to train algorithms on smaller trees and then use them to build algorithms for reconstructing much larger trees, even the human one. Having a training set of trees with the actual solution was essential both for developing new approaches and for reaching this conclusion.
If we can unravel the mystery of how the human body originates from a single cell, that knowledge could put us on the path to new treatments and even cures for congenital diseases and developmental problems, from Down syndrome to cancer. The complexity of this problem and its potentially huge payoff make it a perfect challenge for AI. It also shows that there are still many fields where AI can be applied and have a real impact by improving predictions. All that’s needed are new ideas and new datasets.
Gong, W., Granados, A., Hu, J., et al. Benchmarked approaches for reconstruction of in vitro cell lineages and in silico models of C. elegans and M. musculus developmental trees. Cell Systems, Volume 12, Issue 8, P810-826.e4, August 18, 2021.