Complexity of the Cannabis Genome:
- Estimated at 400Mb by next-generation sequencing; Kew Gardens estimates 700-800Mb
- 67% AT
- Highly Polymorphic (SNP every 10-20bp in Synthase genes)
- Highly repetitive
- Selective breeding for 30 years for chemotypic QTLs
- 5-10 week breeding cycle
- Sativa, Indica, Ruderalis likely interbred
Due to this complexity, we have incorporated a diverse set of tools to decode the Cannabis Genome.
- Breeding collaborations with DNA Genetics in Amsterdam to obtain triple backcrossed Pure Indica DNA from Cannabis Cup winner LA Confidential
- Roche has kindly provided early access to their 750bp run module on the GS-FLX+ platform. Over 15M reads of roughly 700bp (630bp average, 750bp mode) were obtained through their service center. Preliminary assemblies are being run with CLC bio.
- 131 Gb of Illumina HiSeq 2 x 100 reads from 230bp inserts, generated from ChemDawg, a Sativa/Hybrid cultivar.
- Breeding collaboration with Greenhouse Seeds to advise on high CBD inbred landraces and Ruderalis DNA.
- We are investigating Long Mate Pair SOLiD sequencing for super contig generation and Ion Torrent for validation.
- 92Mb of RNA-Seq data became partially available this summer through a web portal BLAST server at the Medicinal Plant Genomics Resource (an unrelated project working with different cultivars): http://medicinalplantgenomics.msu.edu/
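The data volumes listed above can be turned into a rough fold-coverage figure by dividing sequenced bases by genome size. The sketch below is a back-of-the-envelope calculation only: it uses the 400Mb and 700-800Mb genome size estimates and the read counts quoted in this section, treats "over 15M" 454 reads as exactly 15M (so the 454 figure is a lower bound), and ignores trimming, duplicates, and contamination. The function and variable names are illustrative.

```python
# Back-of-the-envelope sequencing depth from the figures quoted in this section.
GENOME_SIZE_ESTIMATES_BP = {
    "400 Mb (NGS estimate)": 400e6,
    "700 Mb (Kew Gardens, low)": 700e6,
    "800 Mb (Kew Gardens, high)": 800e6,
}

ILLUMINA_BASES = 131e9      # 131 Gb of HiSeq 2 x 100 reads
ROCHE_READS = 15e6          # "over 15M" reads, so this is a lower bound
ROCHE_MEAN_READ_BP = 630    # average 454 read length


def fold_coverage(total_bases, genome_size_bp):
    """Average depth: total sequenced bases divided by genome size."""
    return total_bases / genome_size_bp


def coverage_table():
    """Return raw fold coverage per platform for each genome size estimate."""
    roche_bases = ROCHE_READS * ROCHE_MEAN_READ_BP
    table = {}
    for label, size in GENOME_SIZE_ESTIMATES_BP.items():
        table[label] = {
            "illumina": fold_coverage(ILLUMINA_BASES, size),
            "roche_454": fold_coverage(roche_bases, size),
        }
    return table


if __name__ == "__main__":
    for label, depth in coverage_table().items():
        print(f"{label}: Illumina {depth['illumina']:.1f}x, "
              f"454 {depth['roche_454']:.1f}x")
```

Under these assumptions the Illumina data corresponds to roughly 327x raw depth for a 400Mb genome and roughly 164x for an 800Mb genome, while the 454 reads contribute about 24x and 12x respectively.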
Read Length from 454 750bp Run Module
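A read length plot of this kind can be summarised with a mean and a modal bin, which is how figures such as "630 average, 750 mode" are usually derived. The following is a minimal sketch, not the tool used to produce the plot: `read_length_summary` and its `bin_width` parameter are hypothetical names, and the input is assumed to be an iterable of read sequences already loaded as strings.

```python
from collections import Counter
from statistics import mean


def read_length_summary(reads, bin_width=50):
    """Summarise read lengths: count, mean, modal bin, and binned histogram.

    Lengths are grouped into bins of `bin_width` bases, labelled by the
    lower edge of the bin (a 730bp read falls in the 700 bin when
    bin_width is 50).
    """
    lengths = [len(read) for read in reads]
    if not lengths:
        raise ValueError("no reads supplied")
    histogram = Counter((n // bin_width) * bin_width for n in lengths)
    # Ties between equally populated bins go to the longer bin.
    modal_bin = max(histogram, key=lambda b: (histogram[b], b))
    return {
        "count": len(lengths),
        "mean": mean(lengths),
        "modal_bin": modal_bin,
        "histogram": dict(sorted(histogram.items())),
    }


if __name__ == "__main__":
    # Toy input: most reads reach full length, a few terminate early.
    toy_reads = ["A" * 750] * 3 + ["A" * 700, "A" * 100]
    print(read_length_summary(toy_reads))
```

A mean below the mode, as reported for this run, indicates a tail of shorter reads pulling the average down while most reads reach the full module length.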
FastQC Report on ILMN HiSeq Data
Fri 1 Jul 2011, lane2_49.5m.2.sequence.txt
File type: Conventional base calls
Report modules: Per base sequence quality, Per sequence quality scores, Per base sequence content, Per base GC content, Per sequence GC content, Per base N content, Sequence Length Distribution, Sequence Duplication Levels
No overrepresented sequences
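Two of the FastQC modules named in this report, per base sequence quality and per sequence GC content, can be approximated in a few lines. This is a simplified sketch rather than FastQC's own implementation. It assumes four-line FASTQ records and a Phred+33 quality offset by default; Illumina pipelines from this period often wrote Phred+64, so the `offset` parameter may need to be 64 for a file like the one above. All function names are illustrative.

```python
import io


def parse_fastq(handle):
    """Yield (sequence, quality) pairs from four-line FASTQ records."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records
        sequence = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().strip()
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError("malformed FASTQ record")
        yield sequence, quality


def per_base_mean_quality(records, offset=33):
    """Mean Phred score at each read position (offset is an assumption)."""
    totals, counts = [], []
    for _, quality in records:
        for i, char in enumerate(quality):
            if i == len(totals):
                totals.append(0)
                counts.append(0)
            totals[i] += ord(char) - offset
            counts[i] += 1
    return [total / count for total, count in zip(totals, counts)]


def gc_percent(sequence):
    """GC percentage of one read, ignoring N and other ambiguity codes."""
    called = [base for base in sequence.upper() if base in "ACGT"]
    if not called:
        return 0.0
    return 100.0 * sum(base in "GC" for base in called) / len(called)


if __name__ == "__main__":
    toy = "@r1\nGGCC\n+\nIIII\n@r2\nAATT\n+\n5555\n"
    records = list(parse_fastq(io.StringIO(toy)))
    print(per_base_mean_quality(records))
    print([gc_percent(seq) for seq, _ in records])
```

For a genome that is 67% AT, the per sequence GC distribution of genomic reads would be expected to centre near 33%, which gives a quick sanity check on a run.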