Welcome to Kannapedia: The Distributed Consensus on Cannabis Genetics

The Cannabis phylogenetic tree has been studied for years. Only recently, with the advent of Next Generation Sequencing, have we had the tools to expand the resolution of these maps 10,000-fold. Kannapedia is designed to be an open source registry of cannabis genetics, registered with the Bitcoin Blockchain to drive authenticity and trusted consensus on the phylogenetic registration of Cannabis strains.

For a shortcut to the Blockchain-etched data, visit Kannapedia.net.


Between 100,000 and 200,000 SNPs per strain were sequenced using a Reduced Representation Shotgun approach adapted to the Illumina HiSeq/MiSeq platforms. 30ng (billionths of a gram) of DNA from a SenSATIVAx DNA prep is used to sequence millions of reads per strain. These data are then uploaded to AWS, and phylogenetic trees are generated on the fly. VCF files are SHA-256 hash digested and etched into the Bitcoin Blockchain to provide cryptographic proof of existence at a particular timestamp.

100,000-200,000 SNPs is more than is needed simply to phylotype a cannabis strain. High marker density can be helpful for Marker Assisted Selection, but when fast turnaround time matters, 10,000-20,000 SNPs suffice to provide high-information-content fingerprints. For this reason we have developed a nested set of 10,000-20,000 SNPs for StrainSEEK® that delivers phylotrees like the one below.
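The hashing step behind the proof of existence can be illustrated with a minimal Python sketch. It only computes the SHA-256 digest of a VCF file; how that digest is written into a Bitcoin transaction is not shown here. The function name sha256_of_file and the file name in the usage note are hypothetical.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks.

    Hypothetical helper for illustration: any change to the file,
    even a single byte, produces a different digest.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large VCF files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling sha256_of_file("sample.vcf") on a hypothetical file returns a 64-character hex string. Anyone holding the same VCF file can recompute the digest and compare it with the value recorded on the Blockchain.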

2016 4-20 Sequencing

Phylotree 4.15.16

This tree is preliminary and will shift as more data emerges. A distributed consensus will emerge as the community continues to submit DNA to organize the historical tree of this important plant. Extreme phenotypes like Australian Bastard are being used to inform the ancestry as well as possible. Australian Bastard is a cross with a sativa, so it shares sativa genetics along with those of what folklore holds to be a fourth species of cannabis from Australia. Cannabis speciation remains a matter of healthy debate. It is believed that Cannabis first arrived in Australia with English contact.

There are several reassuring aspects of these early phylogenetic trees.

  • USO-31 and Finola are hemp strains. Many of the CBD lines like Otto and CBD Mango Haze are clustering near the hemp lines. CBD has been associated with wild hemp in publications as early as 1940.
  • The Hazes are clustering near each other as are the Kush lines.
  • Kandy Kush from DNA Genetics is known to be of OG x Trainwreck origin and places close to these expected genetics.
  • Many replicates are intentionally sequenced to confirm they line up side by side.
  • A few blinded samples like Holland1R have been submitted and confirmed to be LA Confidential.
  • LA Confidential has an OG heritage and this shows itself in the tree.
  • The Hybrid 50:50 THCA/CBDA lines Wild Zombie (WZ) and Ringo’s Gift are clustering together.
  • Chocolope is an OG Chocolate Thai x Cannalope Haze and falls right between the Haze lines and the OG lines.
  • Clear outliers are obvious. Pineberry, a high THCA strain, is likely a mislabelled Sour Tsunami prepped on the same day.
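As a rough illustration of why replicates should line up side by side, here is a minimal Python sketch of a pairwise distance between two SNP fingerprints. It is not the method used to build the tree above. It assumes each sample is encoded as a list of alternate-allele counts (0, 1 or 2) at the same ordered sites, with None for a missing call, and the function name is hypothetical.

```python
def genotype_distance(a, b):
    """Mean allele difference between two samples, scaled to 0..1.

    a and b are equal-length lists of alternate-allele counts
    (0, 1, 2) or None where a genotype is missing (assumed encoding).
    Sites missing in either sample are skipped.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        raise ValueError("no shared genotyped sites")
    # Each site can differ by at most 2 alleles, hence the factor of 2.
    return sum(abs(x - y) for x, y in shared) / (2 * len(shared))
```

Two replicates of the same plant should give a distance at or near 0, while unrelated lines give larger values. A matrix of such distances is one common input to tree-building methods.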

We believe these data can be a powerful complement to chemotype information derived from these strains.

FastQ Files – Warning: These files were not all generated under the same conditions but can still be helpful for those exploring the data. Samples with similar timestamps and RunIDs were generated under similar conditions.

Whole Genome Shotgun Data – email Kevin.McKernan at Medicinalgenomics.com for access (large files)

  • OGKush
  • Chocolope
  • LA Confidential
  • Kandy Kush
  • Recon
  • Green Crack
  • Dr. Grinspoon
  • ChemDog
  • Australian Bastard
  • Grape Stomper
  • LemonSkunk
  • SensiStar
  • White Fire (WIFI)
  • OG Kush Salmon Creek California
  • Purple Kush (van Bakel)
  • USO31 (van Bakel)
  • Finola (van Bakel)


2016 Emerald Cup Sequencing

2016 4-20 Sequencing


Chemotype overlays with phylogenetic data. Recently issued cannabis patents speak to 3% CBD/3% THC and low myrcene. https://www.lens.org/lens/patent/US_9370164_B2

Chemotype overlays on phylogenetic trees. The blue/green pie charts show CBD (blue) to THC (green) ratios. The diameter of each pie chart represents the magnitude of expression. The smaller pie charts closer to the center are terpenes. There are a few type II plants where myrcene is not the dominant terpene. This is relevant to this Cannabis Patent Issuance.

Australian Bastard

One of thousands of Illumina MiSeq images. Each spot is a DNA cluster amplified into 1000 copies from a single seed DNA molecule. These clusters are sequenced using reversible-terminator sequencing by synthesis.



30ng of SenSATIVAx DNA is digested with NlaIII. Samples are SenSATIVAx prepped post-digestion to eliminate NlaIII and remove small fragments. Barcoded overhang adaptors are then ligated to the size-selected libraries, which are SenSATIVAx purified post-ligation. Barcoded libraries are pooled and size selection is performed with a SAGE system* to eliminate large fragments. 12 cycles of PCR are performed and the libraries are size selected again. Samples are quantitated with a Qubit or qPCR system and loaded on MiSeq or HiSeq sequencers for 2 x 250bp sequencing.
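The way digestion plus size selection yields a reduced representation of the genome can be sketched in silico. The Python below is a simplified illustration, not the lab protocol: it cuts a sequence after every CATG (the NlaIII recognition site, which is palindromic, so scanning one strand is enough), ignores the 4-base overhang, and keeps only fragments within a length window. The function names and the window values are assumptions for illustration.

```python
def nlaiii_fragments(sequence, site="CATG"):
    """Split a sequence after each occurrence of the NlaIII site.

    Simplified model: the cut is placed immediately after CATG and
    the overhang left by the enzyme is ignored.
    """
    sequence = sequence.upper()
    fragments, start = [], 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + len(site)
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])
    # Drop the empty tail left when the sequence ends on a site.
    return [f for f in fragments if f]


def size_select(fragments, min_len, max_len):
    """Keep fragments whose length is inside the window (inclusive)."""
    return [f for f in fragments if min_len <= len(f) <= max_len]
```

For example, size_select(nlaiii_fragments(genome), 150, 200) on a hypothetical genome string would mimic a 150-200bp insert window. Because the same sites are cut in every sample, the same subset of the genome tends to be sequenced each time.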

2 x 250bp reads are aligned to multiple references using BWA-MEM, and VCF files are generated from the alignments to the CanSat3 reference. Only uniquely mapped reads are considered for variant calling. To generate the tree, we use only biallelic SNPs at positions with at least 5x coverage across all samples.
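The site filter described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the pipeline's actual code: it assumes a multi-sample VCF whose FORMAT column carries a per-sample DP (depth) field, and it reads "5x across all samples" as at least 5x in every sample. Filtering to uniquely mapped reads happens upstream and is not shown. All function names are hypothetical.

```python
def depth_of(sample_field, format_keys):
    """Return the DP value of one sample column, or 0 if it is missing."""
    if "DP" not in format_keys:
        return 0
    values = sample_field.split(":")
    idx = format_keys.index("DP")
    if idx >= len(values) or values[idx] in (".", ""):
        return 0
    return int(values[idx])


def keep_site(vcf_line, min_depth=5):
    """True for a biallelic SNP covered at >= min_depth in every sample."""
    cols = vcf_line.rstrip("\n").split("\t")
    ref, alt = cols[3].upper(), cols[4].upper()
    # Biallelic SNP: a single-base REF and exactly one single-base ALT.
    if len(ref) != 1 or len(alt) != 1:
        return False
    if ref not in "ACGT" or alt not in "ACGT":
        return False
    format_keys = cols[8].split(":")
    return all(depth_of(s, format_keys) >= min_depth for s in cols[9:])


def filter_vcf_lines(lines, min_depth=5):
    """Yield header lines unchanged and only the records that pass."""
    for line in lines:
        if line.startswith("#") or keep_site(line, min_depth):
            yield line
```

Requiring coverage in every sample means the retained set of sites shrinks as samples are added, so sites that are poorly covered in even one sample drop out of the tree.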

*StrainSEEK® V1.0 (most samples on this page) had 150-200bp inserts. StrainSEEK® V2.0 has 300bp inserts (these data will be posted shortly). Examples of samples with larger inserts, soon to be posted, are below.

Screen Shot 2015-09-04 at 8.37.58 AM
Screen Shot 2015-09-11 at 4.46.20 PM