On June 3rd, 2018, the DASH DAO funded Medicinal Genomics to finally crack the cannabis genome. Medicinal Genomics was the first company to sequence the cannabis genome, in 2011, but the sequencing technology of that era was not capable of getting the genome to the standard set by the Human Genome Project in 2001. Fast forward to 2018, and Pacific Biosciences now has tools that can sequence 20,000-100,000 base pair reads with single-molecule sequencing.

Human Genome Project Celebration at the White House. N50 of 0.5Mb or 500Kb.

This major breakthrough in sequencing technology has enabled the cannabis genome to finally meet the 500kb N50 standard set by the Human Genome Project back in 2001.
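For readers unfamiliar with the metric, N50 is the contig length at which contigs of that length or longer hold at least half of the assembled bases. The sketch below shows one common way to compute it. The function name and the toy contig lengths are illustrative assumptions, not data from this assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted from longest to shortest and summed until the
    running total reaches half of the assembly size; the length of the
    contig that crosses that mark is the N50.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy contig lengths in kilobases (made up for illustration only).
toy_contigs_kb = [900, 700, 500, 300, 200, 100]
print(n50(toy_contigs_kb))  # prints 700
```

In this toy example the two longest contigs (900 + 700 = 1,600) already cover more than half of the 2,700 total, so the N50 is 700.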

In addition to the Chemdawg and LA Confidential assemblies performed in 2011 on the Illumina platform, many other parties have been contributing to public cannabis genome resources (Page, Lynch, Sawler, Phylos, Steep Hill, Sunrise Genomics), but until now no one has been able to crack the 500Kb N50 barrier.

Thanks to DASH and the technology at Pacific Biosciences, the cannabis genome is on track to have N50s of several megabases by year's end. Below is a table reflecting the various genomes sequenced to date.

Several unique findings are already emerging. The cannabis genome is over 1 billion bases in size. This is 30% larger than most previous sequence-based estimates. Additionally, we now know that the inactive form of THCA Synthase (Q33DQ2.1) and the active form of THCA Synthase (Q8GTB6.1) are about 311kb apart and sit on the same contig, immersed in segmental duplications of other synthase genes of questionable function. This cluster of THCA Synthase-like genes is likely playing a role in homologous recombination between the synthase genes. Weiblen et al. imply that CBDAS and THCAS are in linkage.

There is also evidence of an interesting CBCA synthase cluster on contig 000410. This has 3 tandem copies of CBCA synthase with one additionally mutated copy each about 20-40kb apart. There are subtle silent and non-silent SNPs between the copies.

The most stunning aspect of this work is the speed at which it occurred. The grant application was submitted in mid-May. The DASH master node vote came down to the last hour, when (at the stroke of midnight) the proposal passed, and 30 DASH were distributed to Medicinal Genomics on June 3rd. 60 days and 60 DASH later, we have a genome ready for public consumption. This genome was notarized on the DASH blockchain, proving its timestamp on a global, public, immutable ledger. A DASH a day might keep the doctor away.

The DNA assembly has completed the polishing steps, and a Higher Molecular Weight DNA library is being constructed in an attempt to close the remaining 2,603 gaps. The genome is 1.03Gb in size. The N50 of this genome is 665Kb, with a 92.9% completeness score from BUSCO. Currently, this is 4.12X more contiguous than any public cannabis assembly and 303X more contiguous than the first cannabis assemblies published in 2011. If you would like to download the latest assembly, please use the form at the bottom of this page.
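The fold-improvement figures above can be turned back into approximate N50 values for the comparison assemblies. This is a rough sketch that assumes "more contiguous" means a ratio of N50s, which the text does not state explicitly. The function name is hypothetical, and the outputs are back-of-the-envelope values rather than reported measurements.

```python
# Reported N50 of this assembly, in kilobases (from the text above).
CURRENT_N50_KB = 665.0


def implied_n50_kb(fold_improvement, current_n50_kb=CURRENT_N50_KB):
    """N50 implied for a comparison assembly.

    Assumes the fold improvement is a simple ratio of N50 values.
    """
    return current_n50_kb / fold_improvement


# 4.12X: best prior public assembly; 303X: the first 2011 assemblies.
print(f"{implied_n50_kb(4.12):.0f} kb")  # prints 161 kb
print(f"{implied_n50_kb(303):.1f} kb")   # prints 2.2 kb
```

Under that assumption, the best prior public assembly would have had an N50 of roughly 161kb, and the 2011 assemblies roughly 2.2kb.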

To learn more about this project and its future sequencing efforts please visit our Crypto-Funded Public Genomics page.

To learn more about DNA assembly quality control, Adam Phillippy delivers a great presentation about the impact of next-generation sequencing read length on the quality of genomic assemblies. He also underscores the dizzying rate of improvement in sequencing technology compared to the Human Genome Project celebration in 2001. This justified White House celebration displayed an assembly quality of 500Kb N50s. We only hope the White House recognizes the need for a similar celebration, given the medicinal value of cannabis expressed in various Department of Health and Human Services patents.

Download the Jamaican Lion DNA Assembly File

The email address provided will be used to update you with new assemblies as they become public.