Complexity of the Cannabis Genome:
- Estimated at 400Mb via Next Gen Sequencing; Kew Gardens estimates 700-800Mb
- Diploid
- 67% AT
- Highly Polymorphic (SNP every 10-20bp in Synthase genes)
- Highly repetitive
- Selective breeding for 30 years for chemotypic QTLs
- 5-10 week breeding cycle
- Sativa, Indica, Ruderalis likely interbred
Due to this complexity we have incorporated a diverse set of tools to decode the Cannabis Genome.
- Breeding collaboration with DNA Genetics in Amsterdam to obtain triple-backcrossed Pure Indica DNA from the Cannabis Cup winner LA Confidential
- Roche has kindly provided early access to their 750bp run module on the GS-FLX+ platform. Over 15M reads (630bp average, 750bp mode) were obtained through their service center. Preliminary assemblies were performed with CLC bio.
- 131 Gb of Illumina HiSeq 2 x 100 reads from 230bp inserts, generated from the Sativa/Hybrid cultivar ChemDawg.
- Breeding collaboration with Greenhouse Seeds to advise on high CBD inbred landraces and Ruderalis DNA.
- We are investigating Long Mate Pair SOLiD sequencing for super contig generation and Ion Torrent for validation.
- 92Mb of RNA-Seq data became partially available this summer via a web portal BLAST server at the Medicinal Plant Genomics Resource (an unrelated project using different cultivars). http://medicinalplantgenomics.msu.edu/
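To put the sequencing yields above in context, the following is a minimal sketch of the fold-coverage arithmetic implied by the quoted figures. It assumes the 454 yield is exactly 15M reads at the 630bp average (the text says "over 15M", so this is a lower bound) and evaluates both the 400Mb and the 700-800Mb genome size estimates. The variable and function names are illustrative only.

```python
# Rough fold-coverage estimates from the figures quoted in the lists above.
# Genome sizes are the two competing estimates; neither is confirmed here.
GENOME_SIZE_ESTIMATES_BP = {
    "NGS estimate (400 Mb)": 400e6,
    "Kew Gardens low (700 Mb)": 700e6,
    "Kew Gardens high (800 Mb)": 800e6,
}

ILLUMINA_BASES = 131e9      # 131 Gb of HiSeq 2 x 100 reads
READS_454 = 15e6            # "over 15M" reads, treated as a lower bound
MEAN_READ_LEN_454 = 630     # reported average 454 read length in bp


def fold_coverage(total_bases: float, genome_size_bp: float) -> float:
    """Return sequenced bases divided by genome size (average depth)."""
    return total_bases / genome_size_bp


bases_454 = READS_454 * MEAN_READ_LEN_454  # about 9.45 Gb

for label, size in GENOME_SIZE_ESTIMATES_BP.items():
    illumina_x = fold_coverage(ILLUMINA_BASES, size)
    roche_x = fold_coverage(bases_454, size)
    print(f"{label}: Illumina ~{illumina_x:.0f}x, 454 ~{roche_x:.0f}x")
```

Under these assumptions the Illumina data corresponds to roughly 160-330x average depth and the 454 data to roughly 12-24x, depending on which genome size estimate is used.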
Read Length from 454 750bp Run Module
FastQC Report on ILMN HiSeq Data
Fri 1 Jul 2011
lane2_49.5m.2.sequence.txt
Summary
Basic Statistics
| Measure | Value |
|---|---|
| Filename | lane2_49.5m.2.sequence.txt |
| File type | Conventional base calls |
| Encoding | Illumina 1.5 |
| Total Sequences | 49500000 |
| Sequence length | 101 |
| %GC | 33 |
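The statistics above can be sanity-checked with a few lines of arithmetic, and the "Illumina 1.5" encoding determines how quality strings in this file should be read. The sketch below assumes the standard Phred+64 ASCII offset used by Illumina 1.3 to 1.7 pipelines. The helper names `decode_quality` and `to_sanger` are hypothetical and are not part of FastQC.

```python
from typing import List

# Values taken from the Basic Statistics table.
TOTAL_SEQUENCES = 49_500_000
SEQUENCE_LENGTH = 101
PERCENT_GC = 33

total_bases = TOTAL_SEQUENCES * SEQUENCE_LENGTH  # bases in this one file
percent_at = 100 - PERCENT_GC                    # matches the 67% AT genome figure

ILLUMINA_15_OFFSET = 64  # assumption: Phred+64 encoding for Illumina 1.5
SANGER_OFFSET = 33       # Phred+33, expected by most current tools


def decode_quality(qual: str, offset: int = ILLUMINA_15_OFFSET) -> List[int]:
    """Convert an ASCII quality string into a list of Phred scores."""
    return [ord(ch) - offset for ch in qual]


def to_sanger(qual: str) -> str:
    """Re-encode a Phred+64 quality string as Phred+33."""
    return "".join(chr(ord(ch) - ILLUMINA_15_OFFSET + SANGER_OFFSET) for ch in qual)


print(total_bases)             # 4999500000
print(percent_at)              # 67
print(decode_quality("hhB"))   # [40, 40, 2]
print(to_sanger("hhB"))        # II#
```

In Illumina 1.5 output a score of 2 (the character `B`) is used as a marker for a low-quality read segment rather than as a literal error probability, so runs of `B` toward read ends should be interpreted with that in mind.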
Per base sequence quality

Per sequence quality scores

Per base sequence content

Per base GC content

Per sequence GC content

Per base N content

Sequence Length Distribution

Sequence Duplication Levels

Overrepresented sequences
No overrepresented sequences
Kmer Content

| Sequence | Count | Obs/Exp Overall | Obs/Exp Max | Max Obs/Exp Position |
|---|---|---|---|---|
| GGGGG | 3417040 | 5.193743 | 9.2707405 | 95-96 |
| CCCCC | 2201195 | 3.5637 | 4.859973 | 95-96 |
| GGGCC | 2064415 | 3.2180467 | 3.7392786 | 3 |
| GGCCC | 1983870 | 3.131778 | 3.508401 | 95-96 |
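To illustrate what an observed/expected ratio of this kind measures, here is a minimal sketch that compares each k-mer's observed count with the count expected from the overall base composition of the reads. It assumes independent bases and a single library-wide composition. FastQC's exact calculation may differ in detail, and the function name `kmer_obs_exp` is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def kmer_obs_exp(reads: Iterable[str], k: int = 5) -> Dict[str, float]:
    """Return observed/expected ratios for every k-mer seen in the reads.

    Expected counts assume each base is drawn independently with the
    frequency observed across all reads (a simplifying assumption).
    """
    base_counts: Counter = Counter()
    kmer_counts: Counter = Counter()
    for read in reads:
        base_counts.update(read)
        for i in range(len(read) - k + 1):
            kmer_counts[read[i:i + k]] += 1

    total_bases = sum(base_counts.values())
    total_kmers = sum(kmer_counts.values())
    base_freq = {base: n / total_bases for base, n in base_counts.items()}

    ratios: Dict[str, float] = {}
    for kmer, observed in kmer_counts.items():
        probability = 1.0
        for base in kmer:
            probability *= base_freq[base]
        expected = probability * total_kmers
        ratios[kmer] = observed / expected
    return ratios


# Toy example: a G-rich read in an otherwise AT-rich set inflates GGGGG.
toy_reads = ["GGGGGAT", "ATATATA"]
ratios = kmer_obs_exp(toy_reads, k=5)
print(round(ratios["GGGGG"], 2))
```

In a library with 33% GC, the expected frequency of a pure G or C 5-mer is low, so even a modest number of such k-mers produces ratios well above 1, as in the table above.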