Choose Assembly Type

The Choose Assembly Type dialog allows you to choose between several types of templated and/or de novo assemblies, along with an accuracy test for variant calls.

 

The image below shows only one version of this screen. Depending on your selection in the Choose Assembly Workflow screen, you may see a different subset of choices.

 

 

Choosing the assembly type (upper half of screen):

 

A subset of the following options is available, depending on the workflow selected in the Choose Assembly Workflow screen. Note that options are described in alphabetical order and not in the order in which they appear in the wizard screen:

 

Assembly Type

Description

Reference based - normal

To assemble/align reads onto one or more reference sequences/templates. This type of assembly can include billions of reads and large eukaryotic genomes. The BAM-formatted assembly cannot be edited, but can be viewed and analyzed using a utility such as DNASTAR’s SeqMan Pro or GenVision Pro.

Reference based with host removal

To remove the DNA sequence of a specified host before assembling/aligning the remaining reads onto one or more reference sequences/templates.

 

Note: This option is only available if you chose Metagenomics/population assembly in the Choose Assembly Workflow wizard dialog.

De novo

To run a de novo (untemplated) assembly of up to 30 million sequence reads and up to a 50 Mbase total length for all contigs combined. The capacity is determined by the amount of available RAM.

 

When assembling a data set de novo, we recommend using paired-end data if available. An exception is the de novo transcriptome assembly, which does not consider pairs.

 

Note: This option does not appear if you selected Exome and Gene Panel in the Choose Assembly Workflow wizard dialog.

De novo with host removal

To remove the DNA of a specified host before running a de novo (untemplated) assembly.

 

Note: This option is only available if you chose Metagenomics/population assembly in the Choose Assembly Workflow wizard dialog.

Reference guided with gap closure

To assemble/align reads onto one or more reference sequences/templates to identify structural variants. This option automates the assembly of indels using mate-pair data, and can include up to 10 million reads and up to a 100 Mbase genome. The SQD-formatted assembly can be edited at a later time using SeqMan Pro. For more information about this assembly type, see Reference-Guided Assembly with Gap Closure.

Reference guided - special

To assemble/align reads onto one or more reference sequences/templates. This workflow is most frequently used for extending off the ends of saved contig consensus sequences. This type of assembly can include up to 10 million reads and up to a 100 Mbase genome. It can be edited at a later time using a utility like SeqMan Pro.

 

Note: This selection was eliminated in SeqMan NGen 4.1, but was reintroduced as a heritage/legacy workflow for use only with the Whole Genome project type. We encourage you to use the normal templated or the reference-guided workflows whenever possible.

Variant calling accuracy test

To perform a reference SNP accuracy test. This option is only available if you selected Exome and Gene Panel in the Choose Assembly Workflow screen.

Sanger validation

To use Sanger reads to validate SNPs in Illumina assemblies, or to close gaps in genome workflows using both Sanger and Illumina reads. For more information about this assembly type, see Sanger Validation Workflow.
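The size limits quoted in the table can be collected into a small lookup for a quick pre-check before starting the wizard. The following is only a minimal sketch: the names `Limits`, `STATED_LIMITS` and `within_stated_limits` are hypothetical and are not part of SeqMan NGen. It encodes only the figures stated above, and the de novo figure refers to the combined length of all contigs rather than genome size. Actual capacity also depends on available RAM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    """Upper bounds quoted in the table above (hypothetical helper type)."""
    max_reads: int
    max_mbases: int  # genome size, or total contig length for de novo


# Only assembly types with an explicit limit in the table are listed.
STATED_LIMITS = {
    "De novo": Limits(max_reads=30_000_000, max_mbases=50),
    "Reference guided with gap closure": Limits(max_reads=10_000_000, max_mbases=100),
    "Reference guided - special": Limits(max_reads=10_000_000, max_mbases=100),
}


def within_stated_limits(assembly_type: str, reads: int, mbases: float) -> bool:
    """Return True if the data set does not exceed the limits quoted for the type.

    Types without a quoted limit (for example "Reference based - normal")
    always return True, because the table gives no figure to check against.
    """
    limits = STATED_LIMITS.get(assembly_type)
    if limits is None:
        return True
    return reads <= limits.max_reads and mbases <= limits.max_mbases
```

For example, `within_stated_limits("De novo", 40_000_000, 20)` returns `False` because the read count exceeds the 30 million reads quoted for de novo assemblies.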

 

 

Specifying system settings (lower half of screen):

 

A “System information” section is provided in the lower half of the wizard screen. This section varies according to the selection above, but may display the amount of System memory available on your computer, as well as the current Temporary file location and the amount of Free space in that directory.

 

If you selected a templated or reference-guided assembly type in the upper part of the screen, you must designate a Temporary file location for the intermediate files produced during assembly. To do so, click the Browse button or drag and drop the directory onto the currently listed temporary file location. We recommend using an external hard drive as the temporary file location. SeqMan NGen will remember and use the temporary file location for future assemblies.
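Before choosing a directory in the wizard, you may want to confirm independently that it exists and has enough free space. The sketch below uses only the Python standard library. The function names and the `required_gb` threshold are assumptions for illustration. SeqMan NGen performs its own check and displays the result as Free space in the wizard.

```python
import shutil
from pathlib import Path


def free_space_gb(directory) -> float:
    """Return the free space of the volume holding `directory`, in GiB."""
    usage = shutil.disk_usage(directory)
    return usage.free / 1024**3


def check_temp_location(directory, required_gb: float) -> bool:
    """Return True if `directory` exists and has at least `required_gb` GiB free.

    `required_gb` is a value you choose yourself (an assumption here); consult
    the DNASTAR technical requirements page for guidance on actual needs.
    """
    path = Path(directory)
    if not path.is_dir():
        return False
    return free_space_gb(path) >= required_gb
```

A call such as `check_temp_location("/path/to/external/drive/ngen_tmp", 500)` (a hypothetical path and threshold) returns `False` if the directory is missing or the volume has less than the requested free space.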

 

Click here for technical requirements – (local assemblies only) Click the link to open a DNASTAR web page describing technical requirements for different types of assemblies.

 

 

Tips regarding temporary files:

 

      Never save the assembly output files or temporary files directly to the desktop, as the many intermediate files and folders created during assembly may hamper or prevent further computer operations. However, files may be saved to a folder on the desktop.

 

      By default, most temporary files are deleted when the assembly is complete. Other files (e.g., [template_name].FasInfo.sqlite and [template_name].mer) may remain in the temporary file location in order to facilitate efficient reassembly of data in the future.

 

      You do not need to specify a temporary file location when following a de novo or special reference-guided workflow.
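To see which files remain in the temporary file location after an assembly, a short script can list them. This is a minimal sketch that matches only the two example suffixes named in the tips above. Other files may also remain, and the function name `find_leftover_files` is hypothetical.

```python
from pathlib import Path

# Suffixes taken from the examples in the tips above; the list is not exhaustive.
LEFTOVER_SUFFIXES = (".FasInfo.sqlite", ".mer")


def find_leftover_files(temp_dir):
    """Return a sorted list of files in `temp_dir` ending in a known leftover suffix."""
    return sorted(
        p for p in Path(temp_dir).iterdir()
        if p.is_file() and p.name.endswith(LEFTOVER_SUFFIXES)
    )
```

The function only reports files and does not delete anything, since these files are kept deliberately to speed up future reassembly.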

 

 

Once you are finished, click Next > to continue to the next wizard screen. SeqMan NGen will populate the rest of the wizard with appropriate default parameters for your assembly.