The C. Sativa Genome

http://csativa.elasticbeanstalk.com/

  • We have recently released a genome assembly produced with CLCbio’s workbench from one lane of the Illumina HiSeq data.
  • This reduces the complexity of the data from 650M 100-base-pair reads to 174K contigs (contiguous consensus sequences) with an average size of 2250 bases: a more than 3000-fold reduction in the number of sequences to work with (174K sequence strings instead of 650M).
  • This is 1/7th of the total ChemDawg dataset, so the assembly adds up to only 286,273,807 bases.
  • We have a two-lane assembly currently under review, which will be released concurrently with the Indica “LA confidential”. This assembly adds up to 378,692,809 bases and was provided by CLCbio. It was assembled with their Command Line Interface on a machine with 48 GB of RAM.
  • Estimates of the size of a highly polymorphic diploid genome will be affected by split alleles. Most diploid genome sizes are reported as n=1, like the human genome (n=1 of 2.9Gb). Most assemblers will split maternal and paternal alleles into separate contigs if they are highly variable, as they are in Cannabis, so the assembly sizes are likely inflated by this. We encourage testing with other assemblers.
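
The arithmetic behind the figures above can be checked with a short Python sketch. It uses only the numbers quoted in the list; the read and contig counts are the rounded values from the text (650M and 174K), and the function and constant names are illustrative, not part of any released tool.

```python
# Figures quoted in the list above. READS and CONTIGS are the rounded
# values from the text, so the results are approximate.
READS = 650_000_000            # ~650M 100 bp reads
CONTIGS = 174_000              # ~174K contigs in the one-lane assembly
ONE_LANE_BASES = 286_273_807   # total bases, one-lane assembly
TWO_LANE_BASES = 378_692_809   # total bases, two-lane assembly


def fold_reduction(n_before: int, n_after: int) -> float:
    """How many times fewer sequences there are after assembly."""
    return n_before / n_after


def size_ratio(larger: int, smaller: int) -> float:
    """Ratio of two assembly sizes measured in bases."""
    return larger / smaller


if __name__ == "__main__":
    print(f"reads per contig: {fold_reduction(READS, CONTIGS):,.0f}")
    print(f"two-lane / one-lane size: {size_ratio(TWO_LANE_BASES, ONE_LANE_BASES):.2f}")
```

With the rounded counts, the reduction comes out between 3000 and 4000 fold, and the two-lane assembly is about 1.3 times the size of the one-lane assembly rather than twice as large.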

Our Partners in the Cloud
