SNG Commands

Note: To see how SNG commands and parameters map to equivalent SeqMan NGen wizard settings, see Equivalence Between Wizard Settings and SNG Scripting Commands.

 

Command

Parameter

Description

Allowed values (defaults in bold)

Wizard equivalent

Project Management Commands

closeProject

 

Closes the current project and frees the memory in use so that the system is ready for additional assemblies. This can be useful if you want to run multiple assemblies in one script.

runScript

 

Allows you to run a table script within the current script. A table script references variable values for specified parameters and other elements in a script. This enables you to run multiple projects from the same script, substituting new parameter values and other variables each time. SeqMan NGen will run the table script repeatedly, using the variable values from one row of the table for each iteration of the script until all of the rows have been used.

 

Example:

 

runScript

 script: "/Library/abc_Project/abc_script.script"

 table: "/Library/abc_Project/table.txt"

 

file

Specifies the directory and file/folder.

[directory/filename enclosed in quotes]

 

 

script

(required) Specifies the directory and file name of the table script you wish to run.

[directory/filename enclosed in quotes]

 

 

table

(required) Specifies the delimited text file containing the variable values.

[directory/filename enclosed in quotes]
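The row-by-row substitution described for runScript can be illustrated with a minimal Python sketch. It is not SeqMan NGen code: the `$name` placeholders come from Python's `string.Template` and are not the SNG variable syntax, and the header row, the tab delimiter, and the file names in the table are assumptions made for illustration only.

```python
import csv
import io
from string import Template

# Hypothetical script template. The $name placeholders are Python's
# string.Template syntax, NOT the SNG table-script variable syntax.
SCRIPT_TEMPLATE = Template(
    'loadSeq file: "$reads"\n'
    'saveProject file: "$project"\n'
)

# Hypothetical tab-delimited table: a header row, then one row per run.
TABLE_TEXT = (
    "reads\tproject\n"
    "sampleA.fas\tsampleA.sqd\n"
    "sampleB.fas\tsampleB.sqd\n"
)

def expand_rows(template, table_text, delimiter="\t"):
    """Return one filled-in script per table row, in row order."""
    rows = csv.DictReader(io.StringIO(table_text), delimiter=delimiter)
    return [template.substitute(row) for row in rows]

scripts = expand_rows(SCRIPT_TEMPLATE, TABLE_TEXT)
for number, text in enumerate(scripts, start=1):
    print(f"--- iteration {number} ---")
    print(text, end="")
```

Each table row yields one complete copy of the script, which mirrors the one-iteration-per-row behavior described above.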

 

saveProject

 

This command saves the assembly to a project file. By default, the SeqMan Pro project file format (.sqd) is used. Phrap (.ace) and FASTA (.fas) formats may also be specified by using the format parameter, and specifying the desired file extension using the file parameter.

 

Note: As a command-line tool, SeqMan NGen will not prompt you if you try to save a new project file with the same name as an existing file in the same location. When you run a script multiple times, be sure to change the file name of the project to be saved each time to prevent existing project files from being overwritten.

 

Example:

 

saveProject

 file: "/Library/My projects/ABC_project.sqd"

 format:seqman

 openInSeqMan:true
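Because saveProject overwrites an existing file without prompting, one way to follow the note above is to generate a distinct file name for each run before writing it into the script. The Python sketch below shows one possible naming scheme; the helper name and the zero-padded run number are illustrative choices, not part of SeqMan NGen.

```python
from pathlib import PurePosixPath

def numbered_project_path(base_path, run_number):
    """Insert a zero-padded run number before the file extension."""
    path = PurePosixPath(base_path)
    return str(path.with_name(f"{path.stem}_{run_number:03d}{path.suffix}"))

for run in (1, 2, 3):
    print(numbered_project_path("/Library/My projects/ABC_project.sqd", run))
# /Library/My projects/ABC_project_001.sqd
# /Library/My projects/ABC_project_002.sqd
# /Library/My projects/ABC_project_003.sqd
```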

 

file

(required) Specifies the directory and file name of the project file to be saved.

[directory/filename enclosed in quotes]

 

 

format

Specifies the output file format.

 

    SeqMan - Saves a 64-bit SeqMan Pro project file (.sqd) that is compatible with SeqMan Pro version 8.1 and higher (default).

 

    SeqMan8 - Saves a 32-bit SeqMan Pro project file (.sqd) that is compatible with SeqMan Pro version 8.0 and higher.

 

    SeqMan7 - Saves a 32-bit SeqMan Pro project file (.sqd) that is compatible with SeqMan Pro version 7.2 and higher. Note that this project file will be much bigger than the same project created in either of the SeqMan formats listed above.

 

    Phrap - Saves an .ace file.

 

    Fasta - Saves .fas and .qual files of the consensus sequence for each contig.

 

    BAM - Saves a BAM file (SNG/SMNG templated assemblies only).

 

    SAM - Saves a SAM file (SNG/SMNG templated assemblies only).

[SeqMan|SeqMan8|SeqMan7|Phrap|Fasta|BAM|SAM]

 

 

onePackage

Specifies whether an assembly containing multiple reference sequences should be bundled into a single .assembly package. If ‘false’ is entered, one .assembly package is created per contig.

[true|false]

 

 

openInSeqMan

Specifies whether to automatically launch SeqMan Pro and open the completed assembly once the script has completed.

[true|false]

 

saveReport

 

Exports a text-file report that summarizes assembly statistics, including the parameters used, the number of assembled and unassembled sequences and contigs, average quality scores, and the number of sequences excluded from the assembly for exceeding the maxAssemblyCoverage parameter. The same information is also saved within the SeqMan Pro project file (.sqd), whether or not you export the report with this command. The report can be viewed in SeqMan Pro using the Project>Report command.

 

Example:

 

saveReport

 file: "/Library/abc_Project/abc_report.txt"

 

file

(required) Specifies the directory and file name of the report to be saved.

[directory/filename enclosed in quotes]

 

WriteUnassembledSeqs

 

Saves all sequences that were not assembled in the project as .fas and .qual files.

 

file

(required) Specifies the directory and file name of the unassembled sequences to be saved.

[directory/filename enclosed in quotes]

 

 

saveTrimmed

Specifies whether to save only the trimmed portion of the unassembled sequences.

[true|false]

 

File Loading Commands and Parameters

load454PairedEnd

 

Loads a file of Roche 454 sequences and checks for the presence of a linker defining the paired end sequences. If the linker is found, the linker is removed and the remaining portion is split into two sequences linked with a paired end constraint.

 

Example:

 

load454PairedEnd

file: "/Library/454 data/123_Pairedend.fas"

linker: "/Library/454 data/123_linkerseqs.fas"

min: 0

max: 10000

DiscardLinkerless: false

 

DiscardLinkerless

Specifies whether to discard any read in which no portion of the mate pair linker was found. If ‘true’ is entered, reads without a linker sequence are excluded from the assembly.

[true|false]

 

 

file

The directory and file name of the .fas, .fna, or .sff file containing the 454 sequences.

[directory/filename enclosed in quotes]

 

 

linker

The directory and file name of the .fas, .fna, or .sff file containing the 454 linker sequences. If not specified, SeqMan NGen will use its default 454 linker sequence: GTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAAC.

[directory/filename enclosed in quotes]

 

 

max, maxDistance

The maximum distance for the paired end constraint.

[number]

 

Default = 10000

 

 

min, minDistance

The minimum distance for the paired end constraint.

[number]

 

Default = 0
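The linker handling described for load454PairedEnd can be pictured with a small Python sketch. It searches for an exact copy of the default linker and splits the read around it. This is a simplification made for illustration: how SeqMan NGen tolerates partial or mismatched linkers, and how it orients the two mates, is not described here and is not modeled.

```python
# Default 454 linker sequence as quoted in the linker parameter description.
DEFAULT_LINKER = "GTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAAC"

def split_on_linker(read, linker=DEFAULT_LINKER):
    """Return (left, right) around the first exact linker match, or None."""
    start = read.find(linker)
    if start == -1:
        return None  # linkerless read
    return read[:start], read[start + len(linker):]

read = "ACGTACGTAC" + DEFAULT_LINKER + "TTGACCATGG"
print(split_on_linker(read))        # ('ACGTACGTAC', 'TTGACCATGG')
print(split_on_linker("ACGTACGT"))  # None
```

A read for which this function returns None corresponds to the "linkerless" case governed by the DiscardLinkerless parameter.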

 

loadConstraint

 

Loads a constraint file. The file can be in the NCBI ancillary file format, or in the CAP3 constraint file format. SeqMan NGen uses constraint files to identify paired end reads, similar to using the setPairSpecifier command. Constraint files in the NCBI ancillary file format also contain trimming information, which SeqMan NGen will load and use. SeqMan NGen will create a CAP3 file when saving a Phrap project (.ace) that used paired end constraints.

 

Example:

 

loadConstraint

file: "/Library/constraints/123_xyz.con"

 

file

The directory and file name of the constraint sequence file.

[directory/filename enclosed in quotes]

 

loadContaminant

 

Loads a contaminant sequence file to be used to identify known contaminants, such as primers, in the assembly. Sequences that contain at least 12 matching 17-mers are flagged as contaminant sequences and will be removed from the assembly. See our website for a list of supported file types.

 

Example:

 

loadContaminant

 file: "/Library/contaminants/123_abc.seq"

 

file

The directory and file name of the contaminant sequence file. A folder may also be specified, in which case all of the sequence files within that folder will be loaded and used for contaminant screening.

[directory/filename enclosed in quotes]
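The rule stated for loadContaminant (at least 12 matching 17-mers) can be sketched in Python as follows. This is only an illustration of exact k-mer counting under that rule. The function names and the test sequences are made up, and the actual scanner may count or align matches differently.

```python
def kmers(sequence, k):
    """Yield every overlapping k-mer in the sequence."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k]

def is_contaminant(read, contaminant, mer_length=17, min_mer_match=12):
    """True if the read shares at least min_mer_match k-mers with the contaminant."""
    known = set(kmers(contaminant, mer_length))
    matches = sum(1 for mer in kmers(read, mer_length) if mer in known)
    return matches >= min_mer_match

# Made-up 40-base contaminant sequence for illustration.
contaminant = "ACGTTGCAAGCTTGCATGCCTGCAGGTCGACTCTAGAGGA"

# A 28-base shared stretch contains 28 - 17 + 1 = 12 matching 17-mers.
read_28 = "TTTTTTTTTT" + contaminant[:28] + "AAAAAAAAAA"
# A 27-base shared stretch contains only 11.
read_27 = "TTTTTTTTTT" + contaminant[:27] + "AAAAAAAAAA"

print(is_contaminant(read_28, contaminant))  # True
print(is_contaminant(read_27, contaminant))  # False
```

With the default values, a shared stretch must therefore be at least 28 bases long before a read is flagged.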

 

loadLayout

 

Loads a layout file to be used for an assembly. The format may be either a SOLiD General Feature Format file (.gff) or a File of Filenames file (.fof). When this command is used, SeqMan NGen still aligns each read from the file to the template, but uses the information contained within the specified file to determine the overall layout of reads.

 

Example:

 

loadLayout

 templateFile: "/Library/123_project/template.seq"

 layoutFile: "/Library/123_project/layoutfile.gff"

 

layoutFile

(required) Specifies the directory and file name of the layout file. Both .gff and .fof formats are accepted.

[directory/filename enclosed in quotes]

 

 

templateFile

(required) Specifies the directory and file name of the reference sequence file.

[directory/filename enclosed in quotes]

 

loadRepeat

 

Loads a sequence file to be used to identify repeat sequences in the assembly. All sequences identified as repeats will be added to the assembly last, after all non-repeats have been assembled. See our website for a list of supported file types.

 

Example:

 

loadRepeat

file: "/Library/repetitive_seqs/123_repeat.seq"

 

file

(required) Specifies the directory and file name of the repeat sequence file. A folder may also be specified, in which case all of the sequence files within that folder will be loaded and used as repetitive sequences.

[directory/filename enclosed in quotes]

 

loadSeq

 

Loads a sequence file or files for assembly. See our website for a list of supported file types.

 

Example:

 

loadSeq

file: "/Library/ABC_project/ABC_sequences.fas"

 

blockContig

Used in the reference-guided workflow.

[text string]

 

 

blockContigID

Used in the reference-guided workflow.

[number]

 

 

blockName

Used in the reference-guided workflow.

[text string]

 

 

blockPos

Used in the reference-guided workflow.

[number]

 

 

DiscardLinkerless

Specifies whether reads that do not have a linker sequence should be discarded from the assembly.

[true|false]

 

 

file

(required) Specifies the directory and file name of the sequence file(s) to be loaded. A folder may also be specified, in which case all of the sequence files within that folder will be loaded.

[directory/filename enclosed in quotes]

 

 

groupName

Used to identify the multi-sample group name for a read file.

[text string]

 

 

isPair

Specifies whether the query files contain paired end data.

[true|false]

 

 

linker

The directory and file name of the .fas, .fna, or .sff file containing the 454 linker sequences. If not specified, SeqMan NGen will use its default 454 linker sequence: GTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAAC.

[directory/filename enclosed in quotes]

 

 

max

The maximum distance for the paired end constraint.

[number]

 

Default = 10000

 

 

maxSeqs

Specifies the maximum number of reads to load from a file.

[number]

 

 

mergePairs

Specifies whether the reads are paired end data that overlap and should therefore be merged.

[true|false]

 

 

min

The minimum distance for the paired end constraint.

[number]

 

Default = 0

 

 

minSeqLen

Minimum length of a sequence required to include it in the assembly.

[number]

 

 

multiplex

Specifies whether reads are from a multi-sample run.

[true|false]

 

 

seqTech

Specifies the offset to be used when converting compressed quality scores into numerical values. These are the offsets used for the technology specified:

 

Data Type                   Value         Offset
IonTorrent                  IonTorrent    33
Applied Biosystems SOLiD    SOLiD         33
Illumina                    Illumina      64
Roche 454                   454           33
Other types                 normalScore   33

 

Note 1: For 454, quality scores for homopolymeric runs of ≥ 2 are oriented from 5' to 3' on the top strand.

 

Note 2: If possible, the data type of unknown data is determined automatically based on the first data file.

 

[IonTorrent|SOLiD|Illumina|454|normalScore|Other]
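To make the offsets concrete, the Python sketch below decodes an ASCII-encoded quality string by subtracting the offset listed for each seqTech value. The offsets are taken from the table in this entry. The function name and the sample quality strings are illustrative only.

```python
# Offsets as listed for each seqTech value in this entry.
OFFSETS = {
    "IonTorrent": 33,
    "SOLiD": 33,
    "Illumina": 64,
    "454": 33,
    "normalScore": 33,
}

def decode_qualities(quality_string, seq_tech):
    """Convert each quality character to its numerical score."""
    offset = OFFSETS[seq_tech]
    return [ord(ch) - offset for ch in quality_string]

print(decode_qualities("II5+", "454"))       # [40, 40, 20, 10]
print(decode_qualities("hhTJ", "Illumina"))  # [40, 40, 20, 10]
```

The two sample strings decode to the same scores, which shows why choosing the wrong seqTech value shifts every quality score by the difference between the offsets.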

 

 

templateFragment

Used in reference-guided assemblies with gap closure.

[number]

 

LoadTemplate

 

Loads a sequence file to be used as the template against which all other sequences are assembled. The template sequence will be displayed as a “reference” sequence in SeqMan Pro for SNP analysis. See our website for a list of supported file types.

 

Example:

 

loadTemplate

file: "/Library/abc_Project/abc_template.seq"

 

file

(required) Specifies the directory and file name of the template sequence file to be loaded. A folder may also be specified, in which case all of the sequence files within that folder will be loaded and treated as template sequences.

[directory/filename enclosed in quotes]

 

LoadVector

 

Loads a vector sequence file to be used for vector trimming. See our website for a list of supported file types.

 

Example:

 

loadVector

file: "/Library/vectors/123_vector.seq"

cloneSite:826

 

cloneSite

This parameter specifies the position of the cloning site on the vector where insertion occurs.

[number]

 

 

file

(required) Specifies the directory and file name of the vector sequence file to be used for vector trimming.

[directory/filename enclosed in quotes]

 

openProject

 

Loads an existing assembly project into memory.

 

file

(required) Specifies the directory and file name of the project file to be loaded.

[directory/filename enclosed in quotes]

 

setDefaultDirectory

 

(required) Defines the default directory for the project. When a default directory is specified, files located in that directory only need to be identified by their subfolder and/or file name in subsequent commands.

 

Example:

 

setDefaultDirectory: "/Library/ABC_proj/"

 

Once you have set a default directory, you may use two periods before a file name to specify that the file you wish to use is located in the parent folder of the default directory you specified.

 

Example:

 

loadVector file: "../123Vector.fas"

 

This specifies that the vector file, 123Vector.fas, is located in the parent folder of the default directory (/Library, given the default directory set in the example above).
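The way a relative file argument combines with the default directory can be checked with ordinary path arithmetic. The Python sketch below uses POSIX path rules from the standard library. It illustrates the ".." convention only, not SeqMan NGen's own resolution code, and the `reads/run1.fas` subfolder path is a made-up example.

```python
import posixpath

def resolve(default_directory, file_argument):
    """Join a file argument onto the default directory and collapse '..'."""
    return posixpath.normpath(posixpath.join(default_directory, file_argument))

print(resolve("/Library/ABC_proj/", "../123Vector.fas"))  # /Library/123Vector.fas
print(resolve("/Library/ABC_proj/", "reads/run1.fas"))    # /Library/ABC_proj/reads/run1.fas
```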

 

directory

(required) Specifies the default directory.

[directory/filename enclosed in quotes]

 

 

defaultMacDirectory

Specifies the default directory for Macintosh.

[directory/filename enclosed in quotes]

 

 

defaultWinDirectory

Specifies the default directory for Windows.

[directory/filename enclosed in quotes]

 

Parameter Settings Commands

setContaminantParam

 

Allows you to adjust the parameters used for scanning for contaminant sequences. In order to be applied, this command must appear in the script before the loadContaminant command, and the contamScan parameter for the assemble command must be set to ‘true.’

 

Example:

 

setContaminantParam MerLength:17

setContaminantParam MinMerMatch:12

 

MerLength

The minimum length of a mer required to be considered an exact match when scanning for contaminants.

[number from 5-50]

 

Default = 17

Advanced Trim/Scan Options: Mer length

 

MinMerMatch

The minimum number of matching mers required to mark the sequence as a contaminant.

[number from 1-50]

 

Default = 12

Advanced Trim/Scan Options: Minimum matches

setParam

 

Allows you to adjust the stringency of one or more of the assembling parameters for the project. SeqMan NGen will use the default values for any parameter that is not specified within the script.

 

Example:

 

setParam SNP: true

setParam snp_minVariantDepthToScore: 2

setParam snp_minWeight: 5

setParam snp_combineSubs: true

setParam snp_excludeBasesEdge: 0

setParam snp_maxRun: -1

setParam snp_maxStrandBias: -1

setParam snp_minHomopolDelDepth: 0

setParam snp_minHomopolDelFrac: 0

setParam snp_minHomopolInsDepth: 0

setParam snp_minHomopolInsFrac: 0

setParam snp_minSoftDepth: -1

setParam snp_minSoftPnotRefPct: -1

setParam snp_minSoftSnpPct: -1

setParam snp_minStrandCov: 0

setParam snp_runVar: false

setParam snp_checkStrandedness: false

setParam snp_minProbNonrefToCall: 0.1

setParam SNPmethod: diploid

setParam snp_minPctToScore: 0.05

 

Example:

 

In the transcript annotation workflow, reads clustered with XNG are reassembled using SNG. In order to minimize mis-joins, the initial phase of the assembly is done at high stringency using the following parameters:

 

setParam    

  merLength: 21

  minMatchPercent: 97

  useRepeatHandling: false

  minContigSeqs: 101

 

Two assembly passes are performed for each read cluster. During the first pass, contigs are assembled from the reads, after which those with fewer than 101 reads are disassembled and added to the unassembled sequences pool for that cluster. During the second pass, SNG attempts to merge the assembled contigs and add any of the unassembled sequence reads from the first pass. To facilitate merging, minMatchPercent is lowered to 85 for this pass.

 

setParam

  minMatchPercent: 85

 

 

AllowConstraintBased

Specifies whether the assembler should use constraints during assembly.

[true|false]

 

 

AssembleBoneyard

Specifies whether, after a templated assembly has been completed, the unassembled sequences remaining should be assembled into contigs. If the template has been split, SeqMan NGen will attempt to join the split contigs together in new arrangements. (Note: “Boneyard” is a term for sequences that were not assigned to any contig).

[true|false]

Assembly Options (De Novo, Special Reference-Guided): De novo assemble unassembled reads

 

CoverageType

Specifies the type of coverage to be used for repeat handling. ‘Genome’ uses the length of the genome being assembled to calculate the expected coverage. ‘Fixed’ uses a fixed value as the expected coverage. If you know the length of the genome/fragment being assembled, we recommend using ‘genome’ for this parameter and then specifying the length using the genomeLength parameter. If you do not know the genome/fragment length, use ‘fixed’ and provide the most accurate estimate of expected coverage for the FixedCoverage value.

[genome|fixed]

 

 

DefaultQuality

The value used for the base quality of sequences without quality scores.

[number from 5-100]

 

Default = 15

Advanced Assembly Options (De Novo): Default quality

 

FixedCoverage

The estimated depth of the sequencing, which can be used instead of the genome length for repeat handling. Use caution when estimating the value for fixedCoverage. If the value you use is significantly lower than the actual depth, the assembly may take a much longer time to complete and may have too many mers flagged as repeats.

[number from 1-65535]

 

Default = 20

 

 

GapPenalty

The penalty for opening or extending a gap during an alignment. This penalty is deducted from the pairwise score used to calculate match percentage. A high gap penalty suppresses gapping, while a low value promotes gapping.

[number from 0-1000]

 

Default = 30 for most workflows, 50 for the transcript annotation workflow

Advanced Assembly Options (De Novo): Gap penalty

 

GenomeLength

Specifies the length of the genome or fragment being assembled. This is used to calculate expected coverage in determining repeat handling. (Note: this parameter was called “setGenomeParam” prior to SeqMan NGen 2.0.)

[number from 0-10^15 ULL]

 

Default = 0

 

 

HaploidSNP

Specifies whether to use the second most common base at a position when performing SNP passes. (See the snpPasses parameter). Using this parameter will increase the SNP percentage for SNPs occurring on one allele of a diploid genome in a templated assembly. When haploidSNP is set to ‘true,’ the lowCoverageThreshold parameter value should be greater than zero.

[true|false]

 

 

HaploidThreshold

The minimum number of times that the second most common base must occur at a position in order for it to be used to find SNPs during haploid SNP passes. (See the haploidSNP parameter above).

[number from 0-100]

 

Default = 0

 

 

LowCoverageThreshold

The minimum coverage required in an assembly to be excluded from SNP passes. SeqMan NGen will include regions in an assembly that have coverage less than the value specified as well as regions with zero coverage when it performs SNP passes. (See the snpPasses parameter).

[number from 0-10000]

 

Default = 0

Advanced Assembly Options (De Novo): SNP low cover cutoff

 

MatchRepeatPercent

The maximum frequency at which a mer may occur, expressed as a percentage of its expected frequency. Mers exceeding this value are flagged as repeated and not used as mer tags in determining overlaps. (Note: this parameter was called “maxCoverageRatio” prior to SeqMan NGen 2.0.)

[number from 100-1000]

 

Default = 150

Advanced Assembly Options (De Novo): Match repeat percent
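The interaction between CoverageType, GenomeLength, FixedCoverage and MatchRepeatPercent can be sketched in Python. Two assumptions are made here that this reference does not confirm: that expected coverage under ‘genome’ is total read bases divided by genome length, and that a mer's expected frequency equals the expected coverage. The function names are illustrative.

```python
def expected_coverage(coverage_type, total_read_bases=0, genome_length=0,
                      fixed_coverage=20):
    """Expected depth. ASSUMPTION: 'genome' means read bases / genome length."""
    if coverage_type == "genome":
        if genome_length <= 0:
            raise ValueError("genomeLength must be greater than zero")
        return total_read_bases / genome_length
    if coverage_type == "fixed":
        return float(fixed_coverage)
    raise ValueError(f"unknown coverageType: {coverage_type!r}")

def is_repeat_mer(observed_count, expected_count, match_repeat_percent=150):
    """True if the mer occurs more often than the allowed percent of expectation."""
    return observed_count * 100 > expected_count * match_repeat_percent

depth = expected_coverage("genome", total_read_bases=100_000_000,
                          genome_length=5_000_000)
print(depth)                     # 20.0
print(is_repeat_mer(31, depth))  # True  (155% of expected)
print(is_repeat_mer(30, depth))  # False (exactly 150%)
```

Under these assumptions and the default of 150, a mer expected 20 times would be flagged as repeated once it is seen 31 or more times.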

 

MatchScore

The score for a base match during an alignment. This score contributes to the pairwise score used to calculate match percentage. Increasing the matchScore value will allow for longer or more frequent gaps, thus forcing bases that match to be assembled together.

[number from 1-1000]

 

Default = 10

Advanced Assembly Options (De Novo): Match score

 

MatchSize

The minimum number of matching consecutive bases required to determine the overlap of sequence reads. If an even number is entered, SeqMan NGen will automatically increase the value to the next odd number. (Note: this parameter was called setParamMerLength prior to SeqMan NGen 2.0.)

[odd whole number]

 

Default = 21

Assembly Options (De Novo, Special Reference-Guided): Mer size
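The even-to-odd adjustment described for MatchSize amounts to the following Python sketch (the function name is illustrative):

```python
def effective_match_size(requested):
    """Return the requested size, raised to the next odd number if even."""
    return requested if requested % 2 == 1 else requested + 1

print(effective_match_size(20))  # 21
print(effective_match_size(21))  # 21
```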

 

MatchSpacing

The length of the window of a sequence read where at least one mer tag will be chosen. (Note: this parameter was called “merTagWindow” prior to SeqMan NGen 2.0.)

[number from 1-1000000]

 

Default = 50

Advanced Assembly Options (De Novo): Match spacing

 

MatchWindowLength

The size of the window used to calculate the match percentage.

[number from 10-1000]

 

Default = 50

Advanced Assembly Options (De Novo): Match window

 

MaxAssemblyCoverage

The maximum depth of coverage allowed in the templated assembly. SeqMan NGen will not exceed the coverage specified by this threshold. This parameter is only available for templated assemblies, and should be used with caution as it will limit the number of sequences included in the assembly. A value of 0 indicates unlimited coverage.

[number from 0-65535]

 

Default = 0

Advanced Assembly Options (De Novo): Maximum coverage

 

MaxContigs

The maximum number of contigs to write to an .assembly project. This command is not generally needed due to SeqMan's capacity to handle a very large number of contigs.

[number]

 

 

MaxGap

The maximum number of gaps allowed per 1000 bases in the alignment.

[number from 0-1000]

 

Default = 6

Advanced Assembly Options (De Novo): Max gap
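Since MaxGap is expressed per 1000 bases, shorter alignments need the gap count scaled before comparison. The Python sketch below shows that arithmetic. The function names are illustrative, and whether SeqMan NGen treats the limit as inclusive is an assumption.

```python
def gaps_per_thousand(gap_count, alignment_length):
    """Scale a gap count to gaps per 1000 aligned bases."""
    return gap_count * 1000 / alignment_length

def within_max_gap(gap_count, alignment_length, max_gap=6):
    """ASSUMPTION: an alignment exactly at the limit is still accepted."""
    return gaps_per_thousand(gap_count, alignment_length) <= max_gap

print(gaps_per_thousand(3, 400))  # 7.5
print(within_max_gap(3, 400))     # False
print(within_max_gap(2, 400))     # True (5.0 per 1000)
```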

 

MaxUsableCount

Any mers occurring more frequently than FixedCoverage multiplied by MaxUsableCount are disregarded as mer tags from the assembly.

[number from 1-65535]

 

Default = 25

Advanced Assembly Options (De Novo): Max usable
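The cutoff described for MaxUsableCount is a simple product, shown in the Python sketch below with the default values of FixedCoverage (20) and MaxUsableCount (25). The function names are illustrative.

```python
def max_usable_occurrences(fixed_coverage=20, max_usable_count=25):
    """Occurrence count above which a mer is disregarded as a mer tag."""
    return fixed_coverage * max_usable_count

def is_usable_mer(occurrences, fixed_coverage=20, max_usable_count=25):
    """A mer is disregarded only when it occurs MORE often than the cutoff."""
    return occurrences <= max_usable_occurrences(fixed_coverage, max_usable_count)

print(max_usable_occurrences())  # 500
print(is_usable_mer(500))        # True
print(is_usable_mer(501))        # False
```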

 

MinContigSeqs

The minimum number of sequences in a contig. After an assembly has been completed, any contigs without a template sequence will be disassembled if they contain fewer sequences than the number specified. The use of this parameter is recommended when performing de novo assemblies using data from Next Generation sequencing technologies, such as Illumina, as these types of assemblies can produce tens of thousands of very small contigs.

[number from 0-10000]

 

Default = 0

 

 

Minimizer

(Intended for internal use only). An experimental way of choosing mer tags that may save time and memory. The accuracy of this parameter has not been verified by DNASTAR.

[number]

 

 

MinMatchPercent

The minimum percentage of matches in an overlap required to join two sequences in the same contig. (Note: this parameter was called “minMatchPercentage” prior to SeqMan NGen 2.0.)

[number from 0-100]

 

Default = 93

Assembly Options (De Novo, Special Reference-Guided): Minimum match percentage

 

MismatchPenalty

The penalty for a base mismatch during an alignment. This penalty is deducted from the pairwise score used to calculate match percentage.

[number from 0-1000]

 

Default = 20

Advanced Assembly Options (De Novo): Mismatch penalty
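As a rough illustration of how MatchScore, MismatchPenalty and GapPenalty combine, the Python sketch below computes a pairwise score with the general-workflow defaults (10, 20 and 30). It assumes every gap position costs one GapPenalty, and it does not attempt the conversion from score to match percentage, which is not specified here.

```python
def pairwise_score(matches, mismatches, gaps,
                   match_score=10, mismatch_penalty=20, gap_penalty=30):
    """ASSUMPTION: each gap position (opened or extended) costs one gap_penalty."""
    return (matches * match_score
            - mismatches * mismatch_penalty
            - gaps * gap_penalty)

# A 50-column alignment window: 48 matches, 1 mismatch, 1 gap.
print(pairwise_score(48, 1, 1))                  # 430
# The same window scored with a gap penalty of 50.
print(pairwise_score(48, 1, 1, gap_penalty=50))  # 410
```

Raising GapPenalty lowers the score of any alignment containing gaps, which is the sense in which a high value suppresses gapping.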

 

SkipRealign

This parameter only affects de novo assemblies, and specifies whether to skip the realignment step of the assembly. When it is not skipped, the realignment step analyzes each sequence at the nucleotide level to determine the exact position of each sequence in the alignment.

[true|false]

Assembly Options (De Novo, Special Reference-Guided): Realign reads after assembly

 

SNP

Specifies whether a SNP detection pass of the gapped alignment is made during the assembly.

[true|false]

 

 

snp_checkStrandedness

Specifies whether the strand that each read comes from is considered in the SNP calculation. This is ignored by the simple SNP calling method (used when genome ploidy is “Heterogeneous”).

[true|false]

 

 

snp_minPctToScore

Specifies the minimum percentage of reads in a column that must differ from the reference in order to score the column, entered as a fraction from 0 to 1. For the simple SNP calling method (used when genome ploidy is “Heterogeneous”), this is the only criterion used to call a SNP. For the Bayesian SNP calling methods (used when genome ploidy is “Diploid” or “Haploid”), this is a filter applied before the other parameters.

[number from 0-1]

 

Default = 0.05

 

 

snp_minProbNonrefToCall

Specifies the minimum probability of a SNP column which is required to call a SNP, expressed as a number between 0 and 1. The probabilities of all genotypes other than Homozygous Reference are totaled and checked against this number. This is the final filter applied during the Bayesian SNP calling methods (used when genome ploidy is “Diploid” or “Haploid”). This is ignored by the simple SNP calling method (used when genome ploidy is “Heterogeneous”).

[number from 0-1]

 

Default = 0.1, requiring a minimum 10% change

 

 

snp_minVariantDepthToScore

(required if “snp” is true) Specifies the minimum depth required for a specific base (or deletion) in a column before it is considered usable for SNP calling. This is the second filter applied during the Bayesian SNP calling methods (used when genome ploidy is “Diploid” or “Haploid”). This is ignored by the simple SNP calling method (used when genome ploidy is “Heterogeneous”).

[number from 0-100]

 

Default = 2
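The order of the three Bayesian-method filters (snp_minPctToScore first, snp_minVariantDepthToScore second, snp_minProbNonrefToCall last) can be sketched in Python. The Bayesian probability itself is taken as an input because its calculation is not described here. The function and argument names are illustrative, and the exact comparisons used by SeqMan NGen are assumptions.

```python
def passes_snp_filters(depth, nonref_depth, variant_depth, prob_nonref,
                       min_pct_to_score=0.05,
                       min_variant_depth_to_score=2,
                       min_prob_nonref_to_call=0.1):
    """Apply the three filters in the documented order; True means 'call a SNP'."""
    if depth == 0:
        return False
    # Filter 1: fraction of reads in the column that differ from the reference.
    if nonref_depth / depth < min_pct_to_score:
        return False
    # Filter 2: depth of the specific variant base (or deletion).
    if variant_depth < min_variant_depth_to_score:
        return False
    # Final filter: total probability of all non-reference genotypes.
    return prob_nonref >= min_prob_nonref_to_call

print(passes_snp_filters(100, 10, 8, 0.90))  # True
print(passes_snp_filters(100, 4, 4, 0.90))   # False (only 4% non-reference)
print(passes_snp_filters(30, 2, 1, 0.90))    # False (variant depth below 2)
print(passes_snp_filters(100, 10, 8, 0.05))  # False (probability below 0.1)
```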

 

 

snp_minWeight

Called “Minimum base quality score” in the SeqMan NGen wizard, this parameter specifies the minimum quality score for a base to be considered in the SNP calculation.

[number]

 

 

SNPMatchPercentage

The minimum match percentage required during passes to fill in SNP regions. See the snpPasses parameter.

[number from 0-100]

 

Default = 90

Advanced Assembly Options (De Novo): SNP match percent

 

snpMethod

Specifies the SNP detection method to use.

    Simple - Produces a count of each type of base in the column and calculates the percent of non-reference bases.

    Haploid - Uses a Bayesian statistical model to calculate a probability score that the position contains a polymorphism and to give a quality score for the base called at that position.

    Diploid - Uses a Bayesian statistical model to calculate a probability score that the position contains a polymorphism and to give a quality score for the base(s) called at that position. Based on the scores, it also calls the genotype at each position.

[simple|haploid|diploid|population]

 

 

SNPPasses

The number of times SeqMan NGen will cycle through a templated assembly, attempting to fill in regions with low coverage or no coverage due to SNPs.

[number from 0-10]

 

Default = 2

Advanced Assembly Options (De Novo): SNP passes

 

SplitFalseJoins

Specifies whether the assembler should identify and split false joins based on the set of false join parameters indicated.

[true|false]

 

 

SplitTemplateContigs

Specifies whether, after a templated assembly has been completed, the template should be split into contigs at areas where there is zero coverage. Split contigs will be grouped into scaffolds with a defined position to allow for easy sorting when the project is viewed in SeqMan Pro. Annotations on the template sequence will also be split, and any /codon_start qualifiers will be adjusted to stay in frame.

[true|false]

 

 

TemplateDefaultQuality

The value used for the base quality of template sequences without quality scores.

[number from 5-50000]

 

Default = 500

Advanced Assembly Options (De Novo): Default template quality

 

TrimToMer

Specifies whether to trim the reads to the matching mer tags within the read. For each read, SeqMan NGen looks for mers that exist in the template (for templated assemblies) or in any other read in the assembly (for de novo assemblies). It then sets the trimming for the read to the start of the first mer found and the end of the last mer found. Trimming to mer may be useful when assembling data without accurate quality scores, data with very short linkers, or when assembling SOLiD data.

[true|false]
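The trimming rule described for TrimToMer (start of the first mer found, end of the last mer found) can be sketched in Python. The set of known mers stands in for the mers of the template or of the other reads. The mer length of 5 and the sequences are chosen only to keep the example short.

```python
def trim_to_mer(read, known_mers, mer_length):
    """Return (start, end) spanning the first through last known mer, or None."""
    positions = [
        i for i in range(len(read) - mer_length + 1)
        if read[i:i + mer_length] in known_mers
    ]
    if not positions:
        return None
    return positions[0], positions[-1] + mer_length

template = "GATTACAGGCTTACGATCCA"
k = 5
known = {template[i:i + k] for i in range(len(template) - k + 1)}

read = "NN" + template[3:13] + "NNN"
start, end = trim_to_mer(read, known, k)
print(start, end)        # 2 12
print(read[start:end])   # TACAGGCTTA
```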

 

 

UseRepeatHandling

Specifies whether to use the repeat probabilities to determine if a mer occurs too frequently to use. This parameter should only be used for de novo assemblies, unless the assembleBoneyard parameter is set to ‘true’ for the templated assembly.

[true|false]

Assembly Options (De Novo, Special Reference-Guided): Repeat handling

setQualityParam

 

Allows you to adjust the parameters used for quality trimming. In order to be applied, the trimEnds parameter for the assemble command must be set to ‘true.’

 

Example:

 

setQualityParam winLength:30

setQualityParam minAveLowQual:14

setQualityParam minAveHiQual:18

setQualityParam minEndBaseQual:15

setQualityParam endRegion:15

setQualityParam nTrimWinLength:50

setQualityParam maxN:2

setQualityParam maxNHiQual:1

 

EndRegion

The number of bases at the end of a sequence considered to be the “end region” which is used by other quality parameters.

[number from 1-100]

 

Default = 5

 

 

MaxN

The maximum number of “N” bases permitted in the window used for N-based quality trimming.

[number from 1-100]

 

Default = 2

 

 

MaxNHiQual

The maximum number of “N” bases permitted in the window used for N-based quality trimming to meet the high-quality threshold.

[number from 0-100]

 

Default = 1

 

 

MinAveHiQual

The minimum averaged quality score of the evaluated window required to be considered high-quality.

[number from 10-40]

 

Default = 22

 

 

MinAveLowQual

The minimum averaged quality score of the evaluated window required to be considered low-quality.

[number from 5-40]

 

Default = 20

Advanced Trim/Scan Options: Minimum quality

 

MinEndBaseQual

The minimum quality base score required in the specified end region.

[number from 5-40]

 

Default = 15

 

 

NTrimWinLength

The length of the window used for “N-based” quality trimming. N-based quality trimming trims bases that are called “N” and is used only when quality scores are not available.

[number from 5-100]

 

Default = 7

 

 

WinLength

The length of the window used for averaging quality scores.

[number from 2-100]

 

Default = 5

Advanced Trim/Scan Options: Window
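To illustrate how WinLength and MinAveHiQual relate, the Python sketch below averages quality scores over a sliding window and reports the first window whose mean reaches the high-quality threshold. How SeqMan NGen turns these window averages into actual trim positions is not described here, so this shows the averaging step only. The function names and the sample scores are illustrative.

```python
def window_means(qualities, win_length=5):
    """Mean quality of every full window of win_length consecutive bases."""
    return [
        sum(qualities[i:i + win_length]) / win_length
        for i in range(len(qualities) - win_length + 1)
    ]

def first_high_quality_window(qualities, win_length=5, min_ave_hi_qual=22):
    """Start index of the first window whose mean meets the threshold, or None."""
    for start, mean in enumerate(window_means(qualities, win_length)):
        if mean >= min_ave_hi_qual:
            return start
    return None

quals = [5, 8, 10, 20, 25, 30, 30, 30, 30, 30]
print(window_means(quals)[:3])           # [13.6, 18.6, 23.0]
print(first_high_quality_window(quals))  # 2
```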

setRepeatParam

 

Allows you to adjust the parameters used for scanning for repetitive sequences. In order to be applied, this command must appear in the script before the loadRepeat command, and the repeatScan parameter for the assemble command must be set to ‘true.’

 

Example:

 

setRepeatParam merLength:17

setRepeatParam minMerMatch:2

setRepeatParam maxMerGap:10

setRepeatParam minFlagLength:50

setRepeatParam alignCutoff:100

setRepeatParam minEndFlagLength:25

 

AlignCutoff

The minimum acceptable alignment score. When the alignment score drops below the specified value, this indicates that the end of the alignment between the read and the repeat has been reached, and the alignment will stop.

[number from 10-1000000]

 

Default = 100

 

 

MaxMerGap

The maximum distance between two mers required to be considered a matching pair.

[number from 0-50]

 

Default = 10

 

 

MerLength

The minimum length of a mer required to be considered an exact match when scanning for repeats.

[number from 5-50]

 

Default = 17

Advanced Trim/Scan Options: Mer length

 

MinEndFlagLen

The minimum length required for a mer to be flagged as a repeat if the segment is bound by the end of the read.

[number from 5-1000000]

 

Default = 25

 

 

MinFlagLength

The minimum length required for a mer to be flagged as a repeat.

[number from 5-1000000]

 

Default = 50

Advanced Trim/Scan Options: Flag length

 

MinMerMatch

The minimum number of matching mers required to start an alignment.

[number from 2-25]

 

Default = 2

Advanced Trim/Scan Options: Minimum matches
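
As a rough sketch of how a mer length and a minimum number of mer matches can interact, the Python below finds exact mers shared between a read and a known repeat. It is an assumption-laden illustration: the function names are hypothetical, overlapping mers are each counted as a match, and MaxMerGap is ignored. SeqMan NGen's own counting rules are not described here.

```python
def shared_mers(read, repeat, mer_length=17):
    """Start positions in read of mers that also occur exactly in repeat."""
    repeat_mers = {
        repeat[i:i + mer_length]
        for i in range(len(repeat) - mer_length + 1)
    }
    return [
        i
        for i in range(len(read) - mer_length + 1)
        if read[i:i + mer_length] in repeat_mers
    ]


def has_seed(read, repeat, mer_length=17, min_mer_match=2):
    """True if enough exact mer matches exist to justify starting an alignment."""
    return len(shared_mers(read, repeat, mer_length)) >= min_mer_match


# A short mer length is used here only to keep the example readable.
repeat = "ACGTACGGTT"
read = "TTTACGTACGGAAA"
print(shared_mers(read, repeat, mer_length=5))
print(has_seed(read, repeat, mer_length=5, min_mer_match=2))
```

Raising the mer length or the minimum match count makes the scan more stringent, at the cost of missing shorter or more divergent repeats.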

setVectorParam

 

Allows you to adjust the parameters used for vector trimming. In order to be applied, this command must appear in the script before the loadVector or TrimVector command, and the vectScan parameter for the assemble command must be set to ‘true.’

 

Example:

 

setVectorParam merLength:9

setVectorParam minMerMatch:3

setVectorParam maxMerGap:5

setVectorParam minTrimLength:30

setVectorParam minEndTrimLength:5

setVectorParam alignCutoff:100

setVectorParam endRegion:15

setVectorParam endCutoff:25

setVectorParam endMerMatch:1

 

AlignCutoff

The minimum acceptable alignment score. When the alignment score drops below the specified value, this indicates that the end of the alignment between the read and the vector has been reached, and the alignment will stop.

[number from 10-1000000]

 

Default = 100

 

 

EndCutOff

The distance from the end of the sequence within which trimming is extended all the way to the end of the sequence.

[number from 0-1000000]

 

Default = 25

Advanced Trim/Scan Options: Trim to end

 

EndMerMatch

The minimum number of mer matches required to start an alignment in the specified end region.

[number from 1-25]

 

Default = 1

 

 

EndRegion

The number of bases at the end of a sequence where a lower stringency for matching and trimming is used.

[number from 0-1000000]

 

Default = 15

 

 

MaxMerGap

The maximum distance allowed between two mers for them to be considered a matching pair.

[number from 0-50]

 

Default = 5

 

 

MergeTrimGap

The maximum distance between two trim segments at which the segments will be merged. MergeTrimGap limits trimming to the ends of sequence reads, while EndCutOff does not. This parameter controls how sensitive trimming is in areas where some portions of the sequence match a vector and other portions do not. The higher the number, the more likely the vector trimmer is to find all of the vector sequence in a region of poor quality. The smaller the number, the more confidence there is that the trimmed bases are actually vector and not a spurious match.

 

[number from 0-1000000]

 

Default = 7, which is suitable for trimming linkers from the ends of sequences.
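
The merging behavior described for MergeTrimGap resembles ordinary interval merging, which can be sketched in Python as follows. The function name and the 0-based, end-exclusive (start, end) representation are assumptions made for illustration. This is not SeqMan NGen's implementation.

```python
def merge_trim_segments(segments, merge_trim_gap=7):
    """Merge (start, end) segments separated by at most merge_trim_gap bases.

    Hypothetical helper: segments are 0-based, end-exclusive intervals.
    """
    merged = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] <= merge_trim_gap:
            # Close enough to the previous segment: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


segments = [(0, 12), (17, 30), (60, 75)]
print(merge_trim_segments(segments, merge_trim_gap=7))  # first two segments merge
print(merge_trim_segments(segments, merge_trim_gap=3))  # nothing merges
```

With the larger gap, the 5-base stretch between the first two segments is absorbed into a single trimmed region. With the smaller gap, it is left untrimmed.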

 

 

MerLength

The minimum length of a mer required to be considered an exact match when searching for vector.

[number from 5-25]

 

Default = 9

Advanced Trim/Scan Options: Mer length

 

MinEndTrimLength

The minimum length to be trimmed when a vector matches the end of a read. This parameter can be useful in preventing small spurious matches from being trimmed, which may be significant with short read technologies.

[number from 5-1000000]

 

Default = 5

 

 

MinMerMatch

The minimum number of matching mers required to start an alignment.

[number from 1-25]

 

Default = 3

Advanced Trim/Scan Options: Minimum matches

 

MinTrimLength

The minimum length required for a mer to be considered as a match for vector trimming.

[number from 5-1000000]

 

Default = 30

Advanced Trim/Scan Options: Trim length

Preprocessing and Assembling Commands and Parameters

assemble

 

(required) Preprocesses and assembles the sequences that have been loaded. Preprocessing may include quality trimming, and scanning for vector, repetitive, and contaminant sequences.

 

Example:

 

assemble

   trimEnds:false

   vectScan:false

   repeatScan:false

   contamScan:false

   doAssemble:true

 

assembleBlocks

Specifies whether the assembly is a reference guided assembly.

[true|false]

 

 

contamScan

If true, sequences will be scanned for the specified contaminant sequences before assembling. Also see loadContaminant.

[true|false]

 

 

doAssemble

If false, only the preprocessing will be done, and the sequences will not be assembled.

[true|false]

 

 

repeatScan

If true, sequences will be scanned for the specified known repetitive sequences before assembling. Also see loadRepeat.

[true|false]

 

 

trimEnds

If true, the sequences will be trimmed based on quality scores before assembling.

[true|false]

Read options: Quality end trim

 

vectScan

If true, the sequences will be scanned and trimmed for vector before assembling. Also see loadVector.

[true|false]

 

fixedTrim

 

Trims reads prior to assembly using fixed values. Based on the parameter settings for this command, SeqMan NGen will trim reads either by a specified number of bases from each end, or to a specified range.

 

Example:

 

fixedTrim

 end5:10

 end3:20

 trimRelative:true

 

end3

If trimRelative (see below) is set to ‘true,’ then this value indicates the number of bases for SeqMan NGen to trim from the 3' end of each read. If trimRelative is set to ‘false,’ then this value indicates the specific 3' coordinate to which reads should be trimmed.

[number from 0-1000000]

 

Default = 0

Advanced Trim/Scan Options: 3’ trim

 

end5

If trimRelative (see below) is set to ‘true,’ then this value indicates the number of bases for SeqMan NGen to trim from the 5' end of each read. If trimRelative is set to ‘false,’ then this value indicates the specific 5' coordinate to which reads should be trimmed.

[number from 0-1000000]

 

Default = 0

Advanced Trim/Scan Options: 5’ trim

 

trimRelative

Specifies how the values of the end3 and end5 parameters are interpreted. When ‘true,’ each value indicates the number of bases for SeqMan NGen to trim from the 3' or 5' end of each read. When ‘false,’ each value indicates the specific coordinate to which reads should be trimmed.

[true|false]
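
The difference between relative and coordinate-based trimming can be illustrated with a short Python sketch. The function name is hypothetical, and the coordinate convention (1-based, inclusive, with end5 of at least 1 and end3 no smaller than end5) is an assumption, since the convention is not stated here.

```python
def fixed_trim(read, end5=0, end3=0, trim_relative=True):
    """Illustrative fixed trimming of a read given as a string.

    Assumption: when trim_relative is False, end5 and end3 are 1-based,
    inclusive coordinates of the first and last base to keep.
    """
    if trim_relative:
        # Remove end5 bases from the 5' end and end3 bases from the 3' end.
        return read[end5:len(read) - end3]
    # Keep only the bases between the two coordinates.
    return read[max(end5 - 1, 0):end3]


read = "AAACCCGGGTTT"
print(fixed_trim(read, end5=3, end3=3, trim_relative=True))   # CCCGGG
print(fixed_trim(read, end5=4, end3=9, trim_relative=False))  # CCCGGG
```

Both calls keep the same six bases. The first expresses the trim as counts removed from each end, and the second as the coordinates of the region to keep.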

 

RealignContigs

 

Makes another pass through a templated assembly once the initial assembly is complete, and realigns contigs as needed. (This step occurs automatically for de novo assemblies.) Using this command may improve the accuracy of the final assembly by correcting occasional misalignments that can occur in gapped regions. Note, however, that this step may significantly increase assembly time. This command must appear in the script after the assemble command.

 

RemoveSmallContigs

 

This command disassembles any contigs without template sequences that have fewer than the specified number of sequences.

 

minLength

Specifies the minimum length of a contig to prevent it from being disassembled.

[number]

 

Default = 0

Assembly Options (De Novo, Special Reference-Guided): Minimum length

 

minSeqs

(required) Specifies the minimum number of sequences necessary in a contig to prevent it from being disassembled.

[number]

 

Default = 100

Assembly Options (De Novo, Special Reference-Guided): Minimum sequences

setPairSpecifier

 

Defines the paired end pair specifier for the paired Sanger and Illumina sequences in the assembly. This command must appear in the script before the assemble command, but after sequences have been loaded (loadSeq). For more information on assembling 454 paired end data, see the load454PairedEnd command. Pair specifiers define the naming convention for sequence pairs, as well as requirements for a minimum and maximum distance between the opposite ends of the inserts. Expressions for forward and reverse naming conventions should be created using the paired end specification language. Forward and reverse sequences must have identical names except for the unique portion that determines the direction of the clone.

 

Example:

 

(defines 2 pair specifiers each with different size ranges)

 

setPairSpecifier

   pairs:{{forward:"(.*)(2kb)(.*)-FP.*$" reverse:"(.*)(2kb)(.*)-RP.*$" min: 1500 max: 2500}

       {forward:"(.*)(8kb)(.*)-FP.*$" reverse:"(.*)(8kb)(.*)-RP.*$" min: 7000 max: 9000}}

 

pairs

This parameter lists the paired end constraints, each specified by the following four values. The values should be separated by spaces and each set of values enclosed in curly brackets {}. An additional set of curly brackets is required around all of the paired end constraints, regardless of whether one or multiple pair constraints are specified.

[forward|reverse|min|max]

 

 

forward

A naming pattern to match forward clones.

[text string enclosed in quotes]

 

 

max

The maximum distance for the paired end sequences to be separated.

[number]

 

 

min

The minimum distance for the paired end sequences to be separated.

[number]

 

 

reverse

A naming pattern to match reverse clones.

[text string enclosed in quotes]
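
To see how forward and reverse naming patterns can identify a pair, the Python sketch below applies the 2kb patterns from the example above using Python's re module. Treating the paired end specification language as equivalent to Python regular expressions is an assumption. The read names and helper functions are hypothetical.

```python
import re

# Patterns copied from the setPairSpecifier example (2kb library).
FORWARD = re.compile(r"(.*)(2kb)(.*)-FP.*$")
REVERSE = re.compile(r"(.*)(2kb)(.*)-RP.*$")


def pair_key(name):
    """Return (direction, captured groups), or None if no pattern matches."""
    for direction, pattern in (("forward", FORWARD), ("reverse", REVERSE)):
        match = pattern.match(name)
        if match:
            return direction, match.groups()
    return None


def is_valid_pair(fwd_name, rev_name, distance, min_dist=1500, max_dist=2500):
    """True if the names differ only in direction and the distance is in range."""
    fwd, rev = pair_key(fwd_name), pair_key(rev_name)
    if fwd is None or rev is None:
        return False
    same_clone = fwd[0] == "forward" and rev[0] == "reverse" and fwd[1] == rev[1]
    return same_clone and min_dist <= distance <= max_dist


print(is_valid_pair("lib1_2kb_cloneA-FP.ab1", "lib1_2kb_cloneA-RP.ab1", 2000))
print(is_valid_pair("lib1_2kb_cloneA-FP.ab1", "lib1_2kb_cloneB-RP.ab1", 2000))
```

The captured groups act as the shared part of the name, so two reads pair only when everything except the direction marker is identical.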

 

SplitLinkerReads

 

Splits specified reads based on their match to given linker sequences. Reads that align to the linker and include the linker site (as specified by the linkerSite parameter or by the cloneSite option in an .fof file) will be split into two reads. The two newly split reads will be designated by _A and _B appended to the name.

 

Example:

 

splitLinkerReads

 seqFile: "/Library/123_project/reads.fas"

 linkerFile: "/Library/123_project/linker.fas"

 linkerSite:30

 

linkerFile

The directory and file name of the linker file.

[directory/filename enclosed in quotes]

 

 

linkerSite

The position indicating where reads should be split.

[number]

 

 

seqFile

The directory and file name of the sequence reads.

[directory/filename enclosed in quotes]

 

SplitTemplates

 

Splits template contigs into multiple contigs in areas where there is zero coverage. Split contigs will be grouped into scaffolds with a defined position to allow for easy sorting when the project is viewed in SeqMan Pro. Annotations on the template sequence will also be split, and any /codon_start qualifiers will be adjusted to stay in frame.

 

appendToAssembly

 

(This command is for the reference-guided workflow and is intended for internal use only).

convertReads

 

Converts a sequence from one file format to another. This command is particularly useful for converting SOLiD .csfasta files into .fastq files that can be used by the XNG assembler.

 

destination

The location and filename for the output.

[directory/filename enclosed in quotes]

 

 

file, reads

The input file containing the reads.

[directory/filename enclosed in quotes]

 

 

format

Specifies the format of the output file. If ‘genbank’ is entered, the output will be in .gbk format. If ‘fastq’ is entered, the output will be in .fastq format.

[genbank|fastq]

 

extendContigs

 

(Intended for internal use only).

 

extendPasses

 

[number]

 

 

mergeContigsInScaffold

 

[true|false]

 

include

 

When building a script, this command can be used to call up additional lines of script previously stored in a text file. In this way, a group of commands can be shared between two or more scripts.

 

file

Specifies a directory and name for the file.

[directory/filename enclosed in quotes]

 

MakeSeqNamesUnique

 

(Intended for internal use only).

set

 

Used to set variables. See the example below and those under the runScript command.

 

Example:

 

set $snp:true

set $snpMethod:"Diploid"

setAssemblyReport

 

(Intended for internal use only.) Used to designate a file for a tab-delimited report, similar to a report that XNG generates. This is useful during development to test how code changes impact results.

 

file, name

Specifies the folder and file name.

[directory/filename enclosed in quotes]

 

SplitMIDSeqs

 

Used to split 454 MID reads into individual files with one file per MID tag.

 

destination

The location and filename for the output.

[directory/filename enclosed in quotes]

 

 

file, reads

The input file containing the reads.

[directory/filename enclosed in quotes]

 

SplitPairs

 

Used to split 454 or Ion Torrent mate pair files into forward and reverse (and singleton) files.

 

Example:

 

SplitPairs

 destination:"c:data\splitReads\"
 reads: {

    { file:"C:data\reads\file1.fas" format: IonTorrent }
    { file:"C:data\reads\file2.fas" format:454 discardLinkerless: true }
    }

 

destination

The location and filename for the output.

[directory/filename enclosed in quotes]

 

 

DiscardLinkerless

Specifies that reads without a linker sequence should be discarded from the assembly.

[true|false]

 

 

file, reads

The location and filename for the input.

[directory/filename enclosed in quotes]

 

 

seqTech

Specifies the offset to be used when converting compressed quality scores into numerical values. These are the offsets used for the technology specified:

 

Data Type                   Value          Offset
IonTorrent                  IonTorrent     33
Applied Biosystems SOLiD    SOLiD          33
Illumina                    Illumina       64
Roche 454                   454            33
Other types                 normalScore    33

 

Note 1: For 454, quality scores for homopolymeric runs of ≥ 2 are oriented from 5' to 3' on the top strand.

 

Note 2: If possible, the data type of unknown data is determined automatically based on the first data file.

 

[IonTorrent|SOLiD|Illumina|454|normalScore|Other]

Input Sequence Files: Read technology
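
The effect of the offset can be shown with a short Python sketch that converts the characters of a compressed quality string into numerical scores by subtracting the offset from each character code. The offsets are taken from the seqTech table above. The function name and the sample quality strings are illustrative assumptions.

```python
# Offsets as listed for each seqTech value.
OFFSETS = {
    "IonTorrent": 33,
    "SOLiD": 33,
    "Illumina": 64,
    "454": 33,
    "normalScore": 33,
}


def decode_quality(quality_string, seq_tech="normalScore"):
    """Convert quality characters to numeric scores using the seqTech offset."""
    offset = OFFSETS[seq_tech]
    return [ord(ch) - offset for ch in quality_string]


print(decode_quality("II5+", "normalScore"))  # [40, 40, 20, 10]
print(decode_quality("hhTJ", "Illumina"))     # [40, 40, 20, 10]
```

The two strings encode the same scores under different offsets, which is why choosing the wrong technology shifts every quality value by 31.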

TrimVector

 

Used for fast trimming of vector sequence. Each read file is processed and the trimmed file is saved to the destination folder. If a file with the same name already exists, a number will be appended to the file name. The file is saved in .fastq format, including trimming statistics.

 

Example:

 

setVectorParam

    EndCutOff: 130

    MatchSize: 11

    MinTrimLength: 15

 

TrimVector

    reads: {

        file:  "C:\data\input.fastq"

    }

    LinkerFile: "c:\data\adapter.fas"

    destination: "c:\data\Out\"

 

file, reads

The location and filename for the input.

[directory/filename]

 

 

LinkerFile

The location of the file or folder containing the vector sequence.

[directory/filename]

 

 

destination

The location of the output folder.

[directory]