The Cannabis Pan-Genome Project: Advancing Cannabis Breeding

Building upon the reference genome of the Jamaican Lion cultivar, we have identified the genetic variations that determine whether a plant produces the important cannabinoids THC, CBD, or a mixture of the two; these profiles are referred to as chemotypes (I-IV). This information provides a “cookbook” of genetic recipes for different types and cultivars, and it is key to breeding for cannabis yield, potency, and a host of other traits.

“The pan-genome focuses on whole genome sequencing and de novo assembly using the Sequel II System by Pacific Biosciences (PacBio) to catalog structural variation inheritance patterns in closely and distantly related cultivars. This family of genomes will form the foundation for cannabis breeding programs across the industry,” said Medicinal Genomics Chief Scientific Officer, Kevin McKernan. “The economic value of the genetics that govern cannabinoid expression, seed development, and fiber yield is one of the most promising opportunities of the decade. And now, every breeder and grower can use this tool, retaining complete ownership of the results for their own cultivars.”

The project was seeded with a cryptocurrency grant from the Dash DAO in June of 2018. Since then, over 200 Gb of PacBio sequencing has been performed on Jamaican Lion to assemble the genome and build transcriptome and methylation maps across multiple tissue types.

A new standard for cannabis genetics

The contiguity of a genome assembly is commonly summarized by a number known as “N50”: the contig length at which contigs of that length or longer contain at least half of the assembled bases. Assemblies with low N50 values are like photographs missing thousands of pixels. Leave out enough pixels and the image is incomprehensible. The higher the N50, the fewer and longer the pieces from which the genome is assembled. The latest MGC cannabis assembly has an N50 of 7.6 megabases with a completeness of approximately 97%, establishing a new industry record. This updated reference is the foundational piece of the Pan Cannabis Genome and is accessible to all MGC partners.
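For readers who want to see how an N50 value is derived, the following is a minimal sketch and not MGC's pipeline. It assumes the assembly is available as a plain list of contig lengths; the function name `n50` and the toy lengths are hypothetical and chosen only for illustration.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of all assembled bases.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")

    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest contig down, accumulating bases until
    # half of the assembly is covered.
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Toy example (hypothetical lengths, arbitrary units):
# total = 54, half = 27; 10 + 9 + 8 = 27, so the N50 is 8.
toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_contigs))
```

Because N50 depends only on how the assembled bases are distributed across contigs, it describes how fragmented an assembly is. How much of the genome was captured is a separate measurement, which is why the assembly above is reported with both an N50 and a completeness figure.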

Using MGC’s assembly of the female Jamaican Lion cultivar as a baseline, we isolated genomic DNA from a sibling male plant and multiple offspring and are sequencing it on the Sequel II System long-read platform to identify structural variations and other important types of genetic variation. This “family” sequencing strategy yields a recombination map and is the basis for creating a pan-genome of cannabis.

“The annotation of structural variations in cannabis will be critical to understanding the genetics of yield, seed production, and other desirable traits,” said Timothy Harkins, MGC Senior Advisor. “We continue to build new genetic tools for the entire community by using the Jamaican Lion assembly combined with the methylation and transcriptome maps and now the recombination hotspots. This combination of data and technologies is accelerating our insights, which breeders and growers can now take advantage of. None of it would be possible, however, without the extraordinarily long sequencing read lengths generated by the PacBio platform. This genetic tool wasn’t available until they came along, and that has made all the difference.”

The program’s progress will be presented at the SMRT Leiden meeting in Leiden, Netherlands, this May and at a webinar hosted by PacBio in June.
