SeqMan Genome Assembler - Early Release

SeqMan Genome Assembler (SMGA) is a command-line application that uses a unique algorithm to assemble fragment data from Next Generation sequencing platforms (Illumina® and 454®) and from Sanger sequencing on a desktop computer. Its integration with Lasergene for analysis and visualization gives users a simple, powerful tool for assembling and analyzing Next Generation sequencing data.

Key Features of SMGA include:

» Desktop computer (Win or Mac) assembly for Illumina, 454 and Sanger platforms
» Integrated with popular Lasergene sequence analysis software to permit a wide range of sequence analyses and visualizations
» Easy to set up and use
» Full technical support by DNASTAR

Below is a general workflow for SeqMan Genome Assembler that shows how it integrates with the SeqBuilder and SeqMan Pro modules of Lasergene.

 

» Annotate template sequence(s) in SeqBuilder (optional)

» Write Script for Assembly

» Run Script in SMGA to Assemble Sequence Data

» View Assembly Statistics and/or Analyze in SeqMan Pro

 

Annotating Template Sequence Prior to Assembly

Before assembling sequences in SMGA, users may annotate their template sequence in SeqBuilder with known SNPs/variations and other features. SeqBuilder is the sequence editing and visualization application in the Lasergene suite. Assembling against an annotated template sequence in SMGA makes it easier to analyze the putative SNPs identified when the assembled project is viewed in SeqMan Pro.

SeqBuilder offers many feature types, a subset of which can be viewed within SeqMan Pro:

» Variation (SNP)

» CDS

» Miscellaneous RNA

» Rep origin

» rRNA

» tRNA

» Exon

 

User Defined Flexibility

SMGA offers complete flexibility in adjusting assembly parameters to meet the needs of your specific data set. Because data sets and data types vary greatly, each assembly parameter can be adjusted individually. Default parameter settings are also included to simplify analysis.

An example of the readout with representative parameter settings is shown below:

Example
setParam useRepeatHandling:true
setParam coverageType:fixed
setParam fixedCoverage:6
setParam matchSize:15
setParam minMatchPercent:90
setParam matchSpacing:10
setParam maxRepeatPercent:150
setParam maxUsableCount:25
setParam maxGap:15
setParam matchWindowLength:50
setParam matchScore:10
setParam maxAssemblyCoverage:0
setParam gapPenalty:30
setParam mismatchPenalty:20
setParam min454SeqLen:50
setParam max454SeqLen:350
setParam defaultQuality:15
setParam templateDefaultQuality:50
setParam splitFalseJoins:True
setParam falseJoinMinColDepth:4
setParam falseJoinMinInconsistent:4
setParam falseJoinMinFraction:25
setParam falsJoinMinMatches:2
setParam falsJoinUniformQual:true
setParam falseJoinQualThresh:15
setParam allowConstraintBased:true
setParam skipRealign:false
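
For readers who want to inspect or compare parameter sets programmatically, a listing like the one above can be read into a dictionary. The following is a minimal Python sketch and not part of SMGA. It assumes only the `setParam name:value` layout visible in the example, and the helper name `parse_set_params` is hypothetical.

```python
def parse_set_params(script_text):
    """Collect `setParam name:value` lines into a dict.

    Assumption: one parameter per line, laid out as in the example listing.
    """
    params = {}
    for line in script_text.splitlines():
        line = line.strip()
        if not line.startswith("setParam "):
            continue  # skip blank lines and anything that is not a setParam line
        name, sep, raw = line[len("setParam "):].partition(":")
        if not sep:
            continue  # no colon, so not a name:value pair
        raw = raw.strip()
        if raw.lower() in ("true", "false"):
            value = raw.lower() == "true"  # the example mixes "true" and "True"
        elif raw.isdigit():
            value = int(raw)
        else:
            value = raw  # keep anything else as text, e.g. "fixed"
        params[name.strip()] = value
    return params


example = """setParam coverageType:fixed
setParam fixedCoverage:6
setParam splitFalseJoins:True
setParam skipRealign:false"""

settings = parse_set_params(example)
print(settings)
```

Keeping the parsed values typed (booleans, integers, text) makes it straightforward to compare two parameter sets and see which settings differ between runs.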

In addition, users may select a number of preprocessing options, including vector and end trimming, along with the ability to exclude known repeats and contaminant sequences, such as primer reads, from the assembly. An example of the readout with representative settings is shown below:

Example

assemble
trimEnds:false
vectScan:false
repeatScan:false
contamScan:false
doAssemble:true

 

Viewing Assembly Results in SeqMan Pro

Following assembly in SMGA, the saved project can be viewed in Lasergene's SeqMan Pro to analyze coverage and to identify SNPs. Your assembly is displayed in the Project window. If you chose to save the unassembled sequences in your assembly, they are displayed in the Unassembled Sequences window.

 

 

Viewing Areas Exceeding the Maximum Depth of Coverage

SeqMan provides users with the option of entering a value for the Maximum Assembly Coverage. Once this is done, it is easy to visualize areas that likely exceed this coverage parameter.


Strategy View

In the Strategy View in SeqMan Pro, areas exceeding the desired Maximum Assembly Coverage are shown in red.
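
To make the idea of "areas exceeding the maximum coverage" concrete, the Python sketch below computes read depth along a template and reports the intervals where depth is above a chosen limit. It is a generic sweep over read start and end positions, not SMGA's or SeqMan Pro's own method. The function name `regions_over_coverage` and the half-open `(start, end)` read spans are assumptions made for illustration.

```python
from collections import defaultdict


def regions_over_coverage(read_spans, max_coverage):
    """Return (start, end) intervals where read depth exceeds max_coverage.

    read_spans: iterable of (start, end) template positions, end exclusive.
    """
    # Net change in depth at each position: +1 where a read starts, -1 where it ends.
    changes = defaultdict(int)
    for start, end in read_spans:
        changes[start] += 1
        changes[end] -= 1

    regions = []
    depth = 0
    region_start = None
    for position in sorted(changes):
        depth += changes[position]
        if depth > max_coverage and region_start is None:
            region_start = position  # depth just rose above the limit
        elif depth <= max_coverage and region_start is not None:
            regions.append((region_start, position))  # depth fell back to the limit
            region_start = None
    return regions


# Four hypothetical reads placed on a template, with a limit of 2.
reads = [(0, 50), (10, 60), (20, 70), (40, 90)]
print(regions_over_coverage(reads, 2))
```

Summing the changes per position before testing the depth avoids splitting one over-covered region in two when a read ends exactly where another begins.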


 

Dual End Pair Characteristics

SMGA also permits users to set dual end pair specifier characteristics for the paired Sanger sequences in their assembly. Pair specifiers define the naming convention for sequence pairs, as well as the required minimum and maximum distance between the opposite ends of the inserts. Forward and Reverse naming patterns for clones can also be included.
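
The Python sketch below illustrates what a pair specifier expresses: a naming pattern for forward and reverse reads plus an allowed distance range. It is not SMGA syntax. The `PairSpecifier` class, the `.f`/`.r` name suffixes and the distance values are all hypothetical and chosen only for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class PairSpecifier:
    """Hypothetical container for the settings described above."""
    forward_pattern: str  # regex; group 1 captures the clone name
    reverse_pattern: str  # regex; group 1 captures the clone name
    min_distance: int     # smallest accepted distance between the pair ends
    max_distance: int     # largest accepted distance between the pair ends


def pair_reads(read_names, spec):
    """Match forward and reverse reads that share a clone name."""
    forward, reverse = {}, {}
    for name in read_names:
        fwd = re.fullmatch(spec.forward_pattern, name)
        rev = re.fullmatch(spec.reverse_pattern, name)
        if fwd:
            forward[fwd.group(1)] = name
        elif rev:
            reverse[rev.group(1)] = name
    # Keep only clones for which both ends were found.
    return {clone: (forward[clone], reverse[clone])
            for clone in forward if clone in reverse}


def distance_ok(distance, spec):
    """Check an observed end-to-end distance against the specifier's range."""
    return spec.min_distance <= distance <= spec.max_distance


# Assumed naming convention: "<clone>.f" for forward, "<clone>.r" for reverse.
spec = PairSpecifier(r"(.+)\.f", r"(.+)\.r", 1500, 4000)
pairs = pair_reads(["cloneA.f", "cloneA.r", "cloneB.f"], spec)
print(pairs)
print(distance_ok(2500, spec), distance_ok(5000, spec))
```

Here `cloneB.f` has no reverse partner, so it is left unpaired, and a pair whose ends fall outside the distance range would be flagged as inconsistent with the expected insert size.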

 

Exporting of Reports

SMGA can also export a report summarizing your assembly statistics, including the number of assembled/unassembled (matched/unmatched) sequences and contigs in your project, the parameters used, the average quality scores, and the number of sequences excluded from the assembly because they exceeded the Maximum Coverage parameter.

 

Next Generation Sequencing Assembly

Illumina data requires a template sequence (also called a reference sequence) for assembly in SMGA. Multiple templates can be used if desired. Where the genomic template exceeds 5 Mb, adjusting parameters such as the match size can alter the quality of the assembly.

454® and Sanger data sets can be assembled either de novo or with a template sequence. Deep 454 assemblies can be performed. Users can adjust the gap parameter value to allow more sequences to be aligned into each contig and so optimize the quality of the analysis.

Lasergene v7.2 or higher is required for use with SeqMan Genome Assembler.