
Genome sequencing meets chocolate

How analyzing cacao plant genes could save chocolate


Plants have DNA, too. Just as genomes have been sequenced for fruit flies and humans, scientists have also sequenced the genomes of rice, mustard, and a few trees, including, as of 2010, the cacao tree.

IBM, the United States Department of Agriculture’s Agricultural Research Service (USDA-ARS), and candy-maker Mars Inc. teamed up in 2008 to sequence the cacao genome in an effort to help farmers grow tastier, more disease-resistant and more productive cacao trees. The initial phase of work yielded a surprising result: the genes that dictate the color of the plant’s pods may be the best indicator of better-tasting, healthier plants.


Why sustain cocoa?

  • 70 percent of today’s global cocoa is produced in equatorial Africa
  • 2,000,000 small-scale cocoa farms in West Africa depend on this crop
  • 1/3 of all cocoa produced in Africa is lost to drought, pests and fungal disease
  • $800 million (U.S.) lost due to failed cocoa crops

 

 


 

Genome sequencing: how computers decipher genetic code

To sequence the cacao’s genetic material, scientists first had to crush the leaves, pods and other parts of the plant at the USDA’s Agricultural Research Service lab. A DNA sequencer then read the nucleotides that make up the plant’s unique set of genes. And while the team at IBM, Mars and the USDA finished the sequence three years ahead of schedule, the real work lies in deciphering the more than 30,000 genes the cacao is estimated to have.

The figure of 30,000 comes from algorithms that identify patterns of genes embedded in the genome. The algorithms “spot” genes by comparing stretches of the cacao genome with known genes from other species and looking for similarities.
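
As a rough illustration of similarity-based gene “spotting”, the Python sketch below slides a window along a sequence and flags regions that share many short subsequences (k-mers) with a known gene. It is a toy under stated assumptions, not IBM's method: the function names, the k-mer Jaccard score, the window size and the 0.5 threshold are all hypothetical choices made for this example, and real gene finders rely on far more sophisticated alignment and statistical models.

```python
def kmers(seq, k=4):
    """Return the set of all length-k substrings of a DNA sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=4):
    """Jaccard similarity of the k-mer sets of two sequences (0.0 to 1.0)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def spot_candidates(genome, known_genes, window=20, step=5, threshold=0.5, k=4):
    """Slide a window along the genome and report windows that resemble a known gene.

    All parameters are illustrative assumptions, not values from the cacao project.
    Returns a list of (start_position, known_gene_name, score) tuples.
    """
    hits = []
    for start in range(0, len(genome) - window + 1, step):
        region = genome[start:start + window]
        for name, gene in known_genes.items():
            score = similarity(region, gene, k)
            if score >= threshold:
                hits.append((start, name, round(score, 2)))
    return hits


if __name__ == "__main__":
    # Made-up sequences: a "known gene" from another species, embedded in filler.
    known = {"toy_gene": "ATGGCCATTGTAATGGGCCG"}
    genome = "T" * 10 + known["toy_gene"] + "C" * 10
    for start, name, score in spot_candidates(genome, known):
        print(start, name, score)
```

Counting hits like these across a whole genome is, in spirit, how an automated pass can arrive at an estimate of the total number of genes without confirming any single one of them in the lab.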

While this fully automated process can estimate the total number of genes, pinning down specific genes takes in silico-in vitro screening, a combination of computer analysis and experimental biology. In IBM's effort to identify the genes behind cacao pod color, this delicate analysis demanded algorithmic precision at a different scale (a sort of “genome whisperer”), applied to hundreds of pods collected from different geographies.

Typically, the hardest task is in finding gene(s) responsible for a phenotype in the "mass" of genes in the genomes. The breakthrough here is being able to pinpoint the genes responsible for pod color.

Dr. Laxmi Parida, computational genomics manager at IBM Research

 

Traits of flavor and sustainability through pod color

Classification of cacao cultivars using the IRiS algorithm, developed by the IBM scientists

The candidate pod-color genes suggested by these specialized algorithms were put through an additional vetting process of targeted sequencing, as well as RNA analysis of the relevant tissues in the target plants. Together, these processes identified which genes determine pod color and where they reside within the genome.

The red pod color trait is positively correlated with an undesirable flavor characteristic in the cacao. Green pods, on the other hand, taste better but have a lower yield. The ability to screen young cacao seedlings with molecular markers for red or green pods, and then select only those carrying the alleles that result in green pods, would greatly reduce the population sizes required for the laborious and expensive evaluations of unlinked flavor and yield traits.

In other words, using marker-assisted selection, rather than breeding the plants naturally (which takes years), will greatly speed up the effort to identify and select the most flavorful and sustainable cacao plants.
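
The screening step can be pictured with a short Python sketch. Everything in it is an assumption made for illustration: the `Seedling` record, the single two-allele marker, the allele labels "G" (green) and "R" (red), and the rule of keeping only seedlings that carry two green alleles. The article does not say how the real markers are scored or whether either allele is dominant.

```python
from dataclasses import dataclass


@dataclass
class Seedling:
    seedling_id: str
    # Two alleles at a hypothetical pod-color marker, e.g. ("G", "R").
    marker_alleles: tuple


def select_green(seedlings, green_allele="G"):
    """Keep seedlings whose marker alleles are all the green-pod allele.

    Assumes (hypothetically) that only such seedlings reliably grow green pods.
    """
    return [
        s for s in seedlings
        if all(allele == green_allele for allele in s.marker_alleles)
    ]


if __name__ == "__main__":
    # Made-up genotypes for four seedlings.
    batch = [
        Seedling("s1", ("G", "G")),
        Seedling("s2", ("G", "R")),
        Seedling("s3", ("R", "R")),
        Seedling("s4", ("G", "G")),
    ]
    for seedling in select_green(batch):
        print(seedling.seedling_id)
```

Only the seedlings that pass this filter would go on to the slower flavor and yield evaluations, which is where the reduction in population size comes from.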

 

Analyzing any genome

IBM's work to identify the cacao’s pod color is genetic selection, not modification.

IBM conceived, designed and developed these specialized algorithms to identify candidate genes and to guide specific experiments that support Mars’ work to improve cocoa taste and sustainability. Mars hopes to use these results to rapidly accelerate breeding programs to improve the quality of chocolate produced from cacao beans. All of IBM’s algorithms are available for any research community to apply to their own plant or animal studies.

Cacao vs. Cocoa

“Cacao” refers to the tree Theobroma cacao and its seeds (the beans found inside the pods). The word “cocoa” refers to the product made by processing those beans.

This article refers to the gene sequencing of the “cacao” plant.

Meet the researcher

  • Laxmi Parida

    Manager, Computational Genomics Group,
    Thomas J. Watson Research Center

