100-300 mg of plant material was obtained and purified in Holland to respect US state and federal cannabis prohibition. Stem and root are the preferred tissues. DNA purified with a Qiagen MaxiPrep column was predominantly longer than 10,000 bases for 10 different cultivars. Yields varied by cultivar, and this variation is more likely due to purification variance than to the genome size differences reported by Lee below.
Some cultivars (Grand daddy Purple and Purple Kush) extracted with visible pigment in the aqueous phase and failed to freeze at -20 °C, suggesting residual salts or glycols in the preps. A 1:1 SPRI cleanup was used to further purify these samples so that they would freeze. Samples with residual pigment were deprioritized for sequencing.
DNA from Sativa and Indica strains was selected for sequencing. Triple-backcrossed L.A. Confidential was prioritized for the Indica line to address the polymorphism potentially present in the outbred lines.
Purified DNA is then nebulized to break it down into more manageable pieces, as large DNA acts like a viscous polymer that is difficult to handle. Once the DNA is broken into smaller pieces, known sequences or “Primers” (also known as “Adaptors”) are added to both ends of each DNA fragment. These known sequence sites can be any sequence a person desires, but are preferably the sequences that the popular DNA sequencing platforms utilize for sequencing. Once the fragments are “Adapted”, one can select the size range of DNA one wants for sequencing. It is preferable to have a very tight size distribution, much tighter than is shown above, where fragments range from 50bp to 1500bp. A fraction of this material in the 300-400bp range is collected and a Polymerase Chain Reaction is performed to make many copies of the molecules in this size range. Once many copies are made, they can be put on a Next Generation Sequencer for Massively Parallel Sequencing.
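The size selection step can be pictured as a simple filter over fragment lengths. The sketch below is only an illustration: the function name `size_select` and the list of fragment lengths are hypothetical, and only the 300-400bp window and the 50-1500bp spread come from the text.

```python
def size_select(fragment_lengths, low_bp=300, high_bp=400):
    """Keep only fragments whose length (in bp) falls inside the window."""
    return [n for n in fragment_lengths if low_bp <= n <= high_bp]


# Hypothetical fragment lengths (bp) spread across the 50-1500bp range
fragments = [50, 120, 240, 310, 350, 395, 400, 520, 800, 1500]

selected = size_select(fragments)
fraction_kept = len(selected) / len(fragments)

print("selected fragments:", selected)
print("fraction kept:", fraction_kept)
```

In the lab the window is set by the gel slice or bead ratio rather than by a hard cutoff, so a real size distribution has softer edges than this filter suggests.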
After 15 cycles of PCR, a much tighter size distribution is available. Although recommended by many sequencing companies, 15 cycles is the upper limit on the number of cycles one should use. Ideally 5-10 cycles are preferred, as library PCR is a competitive process, and small amplicons and GC rich amplicons can amplify at faster rates than longer or AT rich regions. This creates a Darwinian bias in the amplification which can challenge DNA assembly algorithms. One can see that a slight shoulder emerges at 240bp due to using so many PCR cycles.
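Why fewer cycles are preferred can be seen with a little arithmetic. The sketch below assumes a simplified model in which each cycle multiplies copy number by (1 + efficiency) and ignores the plateau and competition for reagents. The function name and the two efficiency values are illustrative assumptions, not measurements.

```python
def relative_enrichment(eff_fast, eff_slow, cycles):
    """Ratio of a fast-amplifying to a slow-amplifying fragment after
    `cycles` rounds of PCR, assuming both start at equal abundance and
    each cycle multiplies copy number by (1 + efficiency)."""
    return ((1 + eff_fast) / (1 + eff_slow)) ** cycles


# Assumed per-cycle efficiencies (illustrative only)
EFF_FAST = 0.95  # e.g. a short amplicon
EFF_SLOW = 0.85  # e.g. a longer amplicon

for cycles in (5, 10, 15):
    ratio = relative_enrichment(EFF_FAST, EFF_SLOW, cycles)
    print(f"{cycles} cycles: fast/slow ratio = {ratio:.2f}")
```

Because the ratio compounds every cycle, even a small difference in efficiency skews the library more at 15 cycles than at 5.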
This slight shoulder is then exacerbated in cluster PCR, which preferentially amplifies AT rich clusters into bigger clusters than GC rich clusters.
The initial methods used to sequence the cannabis genome in 2011 have been replaced with more recent methods. In 2011, 3-5 µg (micrograms, or millionths of a gram) of DNA were required to sequence a genome. In 2012, methods have been optimized to enable genome sequencing with <10 ng (nanograms, or billionths of a gram) of DNA. This represents a reduction of at least 300 to 500 fold in the amount of DNA required to sequence a cannabis genome, and about 1000 fold at inputs of 3-5 ng. As a result it's likely
The 2011 methods cost $200 to make a library and utilized acoustic bombardment to fragment the chromosomal DNA into 500bp fragments ready for sequencing. The DNA is fragmented, then end repaired with exonucleases and polymerases. Finally, Primers are ligated to the DNA fragments so they can be PCR amplified and sequenced.
The 2012 methods now cost $50 to make a library. DNA shearing, end repair and ligation have all been replaced with a Nextera transposon method from Epicentre. This 10-minute protocol is picogram sensitive and can be performed for less than $10 with optimization. DNA purification from cannabis seeds has been described as providing 150 ng to 500 ng per seed.
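The figures quoted for the 2011 and 2012 methods can be compared directly. The sketch below uses only numbers from the text, with two assumptions: 10 ng is taken as the upper bound of the "<10ng" input, and $10 as the upper bound of the optimized cost. All variable and function names are illustrative.

```python
NG_PER_UG = 1000  # nanograms per microgram


def fold_change(old, new):
    """How many times larger `old` is than `new`."""
    return old / new


# DNA input: 3-5 ug in 2011 versus <10 ng in 2012 (10 ng assumed as upper bound)
input_2011_ng = (3 * NG_PER_UG, 5 * NG_PER_UG)
input_2012_ng = 10
input_fold = [fold_change(old, input_2012_ng) for old in input_2011_ng]

# Library cost: $200 in 2011, $50 in 2012, under $10 with optimization
cost_fold_2012 = fold_change(200, 50)
cost_fold_optimized = fold_change(200, 10)

# Seed yield of 150-500 ng, expressed as multiples of a 10 ng input
inputs_per_seed = [fold_change(seed_ng, input_2012_ng) for seed_ng in (150, 500)]

print("input reduction (fold):", input_fold)
print("cost reduction (fold):", cost_fold_2012, "to", cost_fold_optimized)
print("10 ng inputs per seed:", inputs_per_seed)
```

At a 10 ng input the reduction is 300 to 500 fold, and it grows as the input drops further below 10 ng. By the same arithmetic, a single seed yields enough DNA for many 10 ng inputs.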