Note: To see how SNG commands and parameters map to equivalent SeqMan NGen wizard settings, see Equivalence Between Wizard Settings and SNG Scripting Commands.
Command |
Parameter |
Description |
Allowed values (defaults in bold) |
Wizard equivalent | ||||||||||||||||||
Project Management Commands | ||||||||||||||||||||||
closeProject
Closes the current project and frees the memory in use so that the system is ready for additional assemblies. This can be useful if you want to run multiple assemblies in one script. | ||||||||||||||||||||||
runScript
Allows you to run a table script within the current script. A table script references variable values for specified parameters and other elements in a script. This enables you to run multiple projects from the same script, substituting new parameter values and other variables each time. SeqMan NGen will run the table script repeatedly, using the variable values from one row of the table for each iteration of the script until all of the rows have been used.
Example:
runScript script: "/Library/abc_Project/abc_script.script" table: "/Library/abc_Project/table.txt" | ||||||||||||||||||||||
|
file |
Specifies the directory and file/folder. |
[directory/filename enclosed in quotes] |
| ||||||||||||||||||
|
script |
(required) Specifies the directory and file name of the table script you wish to run. |
[directory/filename enclosed in quotes] |
| ||||||||||||||||||
|
table |
(required) Specifies the delimited text file containing the variable values. |
[directory/filename enclosed in quotes] |
| ||||||||||||||||||
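The row-by-row substitution described for runScript can be sketched in Python. This is only an illustration of the idea, not SeqMan NGen's implementation: the `$name` placeholder syntax, the header row in the table, and the tab delimiter are all assumptions made for the example, as is the helper name `expand_table_script`.

```python
import csv
import io
from string import Template


def expand_table_script(script_text, table_text, delimiter="\t"):
    """Return one expanded script per table row.

    Assumptions (not taken from the SNG documentation): the first table row
    names the variables, and the script refers to them as $name.
    """
    rows = csv.DictReader(io.StringIO(table_text), delimiter=delimiter)
    # One iteration of the script per row, with that row's values filled in.
    return [Template(script_text).substitute(row) for row in rows]


# Hypothetical script and table contents, for illustration only.
script = 'loadSeq file: "$reads"\nsaveProject file: "$project"\n'
table = "reads\tproject\nA.fas\tA.sqd\nB.fas\tB.sqd\n"

for expanded in expand_table_script(script, table):
    print(expanded)
```

Each row yields an independent copy of the script, which mirrors the statement that the table script runs once per row until all rows have been used.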
saveProject
This command saves the assembly to a project file. By default, the SeqMan Pro project file format (.sqd) is used. Phrap (.ace) and FASTA (.fas) formats may also be specified by using the format parameter, and specifying the desired file extension using the file parameter.
Note: As a command-line tool, SeqMan NGen will not prompt you if you try to save a new project file with the same name as an existing file in the same location. When you run a script multiple times, be sure to change the file name of the project to be saved each time to prevent existing project files from being overwritten.
Example:
saveProject file: "/Library/My projects/ABC_project.sqd" format:seqman openInSeqMan:true | ||||||||||||||||||||||
|
file |
(required) Specifies the directory and file name of the project file to be saved. |
[directory/filename enclosed in quotes] |
| ||||||||||||||||||
|
format |
Specifies the output file format.
•SeqMan - Saves a 64-bit SeqMan Pro project file (.sqd) that is compatible with SeqMan Pro version 8.1 and higher (default).
•SeqMan8 - Saves a 32-bit SeqMan Pro project file (.sqd) that is compatible with SeqMan Pro version 8.0 and higher.
•SeqMan7 - Saves a 32-bit SeqMan Pro project file (.sqd) that is compatible with SeqMan Pro version 7.2 and higher. Note that this project file will be much bigger than the same project created in either of the SeqMan formats listed above.
•Phrap - Saves an .ace file.
•Fasta - Saves .fas and .qual files of the consensus sequence for each contig.
•BAM - Saves a BAM file (SNG/SMNG templated assemblies only).
•SAM - Saves a SAM file (SNG/SMNG templated assemblies only). |
[SeqMan|SeqMan8|SeqMan7|Phrap|Fasta|BAM|SAM] |
| ||||||||||||||||||
|
onePackage |
Specifies whether an assembly containing multiple reference sequences should be bundled into a single .assembly package. If ‘false’ is entered, one .assembly package is created per contig. |
[true|false] |
| ||||||||||||||||||
|
openInSeqMan |
Specifies whether to automatically launch SeqMan Pro and open the completed assembly once the script has completed. |
[true|false] |
| ||||||||||||||||||
saveReport
Exports a text-file report that summarizes assembly statistics, including the parameters used, the number of assembled/unassembled sequences and contigs, average quality scores, and the number of sequences excluded from the assembly because they exceeded the maxAssemblyCoverage parameter. The same information is also saved within the SeqMan Pro project file (.sqd), regardless of whether you export the report with this command. The report can be viewed in SeqMan Pro using the Project>Report command.
Example:
saveReport file: "/Library/abc_Project/abc_report.txt" | ||||||||||||||||||||||
|
file |
(required) Specifies the directory and file name of the report to be saved. |
[directory/filename enclosed in quotes] |
| ||||||||||||||||||
writeUnassembledSeqs
Saves all sequences that were not assembled in the project as .fas and .qual files. | ||||||||||||||||||||||
|
file |
(required) Specifies the directory and file name of the unassembled sequences to be saved. |
[directory/filename enclosed in quotes] |
| ||||||||||||||||||
|
saveTrimmed |
Specifies whether to save only the trimmed portion of the unassembled sequences. |
[true|false] |
| ||||||||||||||||||
File Loading Commands and Parameters | ||||||||||||||||||||||
load454PairedEnd
Loads a file of Roche 454 sequences and checks for the presence of a linker defining the paired end sequences. If the linker is found, the linker is removed and the remaining portion is split into two sequences linked with a paired end constraint.
Example:
load454PairedEnd file: "/Library/454 data/123_Pairedend.fas" linker: "/Library/454 data/123_linkerseqs.fas" min: 0 max: 10000 DiscardLinkerless: false | ||||||||||||||||||||||
|
DiscardLinkerless |
Specifies whether to discard any read where no portion of the mate pair linker was found. In this way, reads that do not have a linker sequence will be discarded from the assembly. |
[true|false] |
| ||||||||||||||||||
|
file |
The directory and file name of the .fas, .fna, or .sff file containing the 454 sequences. |
[directory/filename enclosed in quotes] |
| ||||||||||||||||||
|
linker |
The directory and file name of the .fas, .fna, or .sff file containing the 454 linker sequences. If not specified, SeqMan NGen will use its default 454 linker sequence: GTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAAC. |
[directory/filename enclosed in quotes] |
| ||||||||||||||||||
|
max, maxDistance |
The maximum distance for the paired end constraint. |
[number]
Default = 10000 |
| ||||||||||||||||||
|
min, minDistance |
The minimum distance for the paired end constraint. |
[number]
Default = 0 |
| ||||||||||||||||||
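The linker-splitting step described for load454PairedEnd can be illustrated with a short Python sketch. It is a simplification under stated assumptions: it looks only for an exact, full-length linker match and does not handle partial linkers, sequencing errors, or mate orientation, all of which a real implementation would need to consider. The function name `split_at_linker` is hypothetical.

```python
# Default 454 linker sequence quoted in the linker parameter description.
DEFAULT_LINKER = "GTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAAC"


def split_at_linker(read, linker=DEFAULT_LINKER, discard_linkerless=False):
    """Split a read into two mates around an exact linker match.

    Returns (left, right) when the linker is found, (read,) when it is not
    found and linkerless reads are kept, or None when the read is discarded.
    """
    pos = read.find(linker)
    if pos == -1:
        return None if discard_linkerless else (read,)
    # The linker itself is removed; the flanking portions become the pair.
    return read[:pos], read[pos + len(linker):]


print(split_at_linker("ACGT" + DEFAULT_LINKER + "TTAA"))
```

The two returned sequences would then be linked by a paired end constraint using the min and max distances given in the command.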
loadConstraint
Loads a constraint file. The file can be in the NCBI ancillary file format, or in the CAP3 constraint file format. SeqMan NGen uses constraint files to identify paired end reads, similar to using the setPairSpecifier command. Constraint files in the NCBI ancillary file format also contain trimming information, which SeqMan NGen will load and use. SeqMan NGen will create a CAP3 file when saving a Phrap project (.ace) that used paired end constraints.
Example:
loadConstraint file: "/Library/constraints/123_xyz.con" | ||||||||||||||||||||||
|
file |
The directory and file name of the constraint sequence file. |
[directory/filename enclosed in quotes] |
| ||||||||||||||||||
loadContaminant
Loads a contaminant sequence file to be used to identify known contaminants, such as primers, in the assembly. By default, sequences that contain at least 12 matching 17-mers are flagged as contaminant sequences and will be removed from the assembly; these thresholds can be changed with the setContaminantParam command. See our website for a list of supported file types.
Example:
loadContaminant file: "/Library/contaminants/123_abc.seq" | ||||||||||||||||||||||
|
file |
The directory and file name of the contaminant sequence file. A folder may also be specified, in which case all of the sequence files within that folder will be loaded and used for contaminant screening. |
[directory/filename enclosed in quotes] |
| ||||||||||||||||||
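The "at least 12 matching 17-mers" rule can be pictured with the following Python sketch. It is a rough model only: it counts read positions whose mer occurs anywhere in the contaminant, considers a single strand, and ignores whatever additional logic SeqMan NGen applies. The function names are hypothetical; the default values mirror MerLength and MinMerMatch from setContaminantParam.

```python
def kmers(seq, k):
    """Return the set of all k-length substrings (mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def is_contaminant(read, contaminant, mer_length=17, min_mer_match=12):
    """Flag a read that shares at least min_mer_match mers with a contaminant."""
    contaminant_mers = kmers(contaminant, mer_length)
    # Count every position in the read whose mer also occurs in the contaminant.
    matches = sum(
        1
        for i in range(len(read) - mer_length + 1)
        if read[i:i + mer_length] in contaminant_mers
    )
    return matches >= min_mer_match


# Hypothetical 40-base contaminant sequence, for illustration only.
primer = "ACGTTGCAAGCTTGGATCCGAATTCTAGAGTCGACCTGCA"
print(is_contaminant(primer[:28], primer))
```

With the default settings, a read must share a stretch of at least 28 consecutive bases with the contaminant (28 - 17 + 1 = 12 mers), or several shorter stretches that together supply 12 matching mers.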
loadLayout
Loads a layout file to be used for an assembly. The format may be either a SOLiD General Feature Format file (.gff) or a File of Filenames file (.fof). When this command is used, SeqMan NGen still aligns each read from the file to the template, but uses the information contained within the specified file to determine the overall layout of reads.
Example:
loadLayout templateFile: "/Library/123_project/template.seq" layoutFile: "/Library/123_project/layoutfile.gff" | ||||||||||||||||||||||
|
layoutFile |
(required) Specifies the directory and file name of the layout file. Both .gff and .fof formats are accepted. |
[directory/filename enclosed in quotes] |
| ||||||||||||||||||
|
templateFile |
(required) Specifies the directory and file name of the reference sequence file. |
[directory/filename enclosed in quotes] |
| ||||||||||||||||||
loadRepeat
Loads a sequence file to be used to identify repeat sequences in the assembly. All sequences identified as repeats will be added to the assembly last, after all non-repeats have been assembled. See our website for a list of supported file types.
Example:
loadRepeat file: "/Library/repetitive_seqs/123_repeat.seq" | ||||||||||||||||||||||
|
file |
(required) Specifies the directory and file name of the repeat sequence file. A folder may also be specified, in which case all of the sequence files within that folder will be loaded and used as repetitive sequences. |
[directory/filename enclosed in quotes] |
| ||||||||||||||||||
loadSeq
Loads a sequence file or files for assembly. See our website for a list of supported file types.
Example:
loadSeq file: "/Library/ABC_project/ABC_sequences.fas" | ||||||||||||||||||||||
|
blockContig |
Used in the reference-guided workflow. |
[text string] |
| ||||||||||||||||||
|
blockContigID |
Used in the reference-guided workflow. |
[number] |
| ||||||||||||||||||
|
blockName |
Used in the reference-guided workflow. |
[text string] |
| ||||||||||||||||||
|
blockPos |
Used in the reference-guided workflow. |
[number] |
| ||||||||||||||||||
|
DiscardLinkerless |
Specifies whether reads that do not have a linker sequence should be discarded from the assembly. |
[true|false] |
| ||||||||||||||||||
|
file |
(required) Specifies the directory and file name of the sequence file(s) to be loaded. A folder may also be specified, in which case all of the sequence files within that folder will be loaded. |
[directory/filename enclosed in quotes] |
| ||||||||||||||||||
|
groupName |
Used to identify the multi-sample group name for a read file. |
[text string] |
| ||||||||||||||||||
|
isPair |
Specifies whether the query files contain paired end data. |
[true|false] |
| ||||||||||||||||||
|
linker |
The directory and file name of the .fas, .fna, or .sff file containing the 454 linker sequences. If not specified, SeqMan NGen will use its default 454 linker sequence: GTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAAC. |
[directory/filename enclosed in quotes] |
| ||||||||||||||||||
|
max |
The maximum distance for the paired end constraint. |
[number]
Default = 10000 |
| ||||||||||||||||||
|
maxSeqs |
Specifies the maximum number of reads to load from a file. |
[number] |
| ||||||||||||||||||
|
mergePairs |
Specifies whether the reads are paired end data that overlap and should therefore be merged. |
[true|false] |
| ||||||||||||||||||
|
min |
The minimum distance for the paired end constraint. |
[number]
Default = 0 |
| ||||||||||||||||||
|
minSeqLen |
Minimum length of a sequence required to include it in the assembly. |
[number] |
| ||||||||||||||||||
|
multiplex |
Specifies whether reads are from a multi-sample run. |
[true|false] |
| ||||||||||||||||||
|
seqTech |
Specifies the offset to be used when converting compressed quality scores into numerical values. These are the offsets used for the technology specified:
Note 1: For 454, quality scores for homopolymeric runs of ≥ 2 are oriented from 5' to 3' on the top strand.
Note 2: If possible, the data type of unknown data is determined automatically based on the first data file.
|
[IonTorrent|SOLiD|Illumina|454|normalScore|Other] |
| ||||||||||||||||||
|
templateFragment |
Used in reference-guided assemblies with gap closure. |
[number] |
| ||||||||||||||||||
loadTemplate
Loads a sequence file to be used as the template against which all other sequences are assembled. The template sequence will be displayed as a "reference" sequence in SeqMan Pro for SNP analysis. See our website for a list of supported file types.
Example:
loadTemplate file: "/Library/abc_Project/abc_template.seq" | ||||||||||||||||||||||
|
file |
(required) Specifies the directory and file name of the template sequence file to be loaded. A folder may also be specified, in which case all of the sequence files within that folder will be loaded and treated as template sequences. |
[directory/filename enclosed in quotes] |
| ||||||||||||||||||
loadVector
Loads a vector sequence file to be used for vector trimming. See our website for a list of supported file types.
Example:
loadVector file: "/Library/vectors/123_vector.seq" cloneSite:826 | ||||||||||||||||||||||
|
cloneSite |
This parameter specifies the position of the cloning site on the vector where insertion occurs. |
[number] |
| ||||||||||||||||||
|
file |
(required) Specifies the directory and file name of the vector sequence file to be used for vector trimming. |
[directory/filename enclosed in quotes] |
| ||||||||||||||||||
openProject
Loads an existing assembly project into memory. | ||||||||||||||||||||||
|
file |
(required) Specifies the directory and file name of the project file to be loaded. |
[directory/filename enclosed in quotes] |
| ||||||||||||||||||
setDefaultDirectory
(required) Defines the default directory for the project. When a default directory is specified, files located in that directory only need to be identified by their subfolder and/or file name in subsequent commands.
Examples for setDefaultDirectory:
setDefaultDirectory: "/Library/ABC_proj/"
Once you have set a default directory, you may use two periods before a file name to specify that the file you wish to use is located in the parent folder of the default directory you specified.
Example:
loadVector file: "../123Vector.fas"
This specifies that the vector file, 123Vector.fas, is located in the ABC Data folder, the parent folder of the default directory. | ||||||||||||||||||||||
|
directory |
(required) Specifies the default directory. |
[directory/filename enclosed in quotes] |
| ||||||||||||||||||
|
defaultMacDirectory |
Specifies the default directory for Macintosh. |
[directory/filename enclosed in quotes] |
| ||||||||||||||||||
|
defaultWinDirectory |
Specifies the default directory for Windows. |
[directory/filename enclosed in quotes] |
| ||||||||||||||||||
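The way relative file names are interpreted against a default directory can be sketched in Python with the standard `posixpath` module. This only illustrates ordinary relative-path resolution, which is assumed (not confirmed) to match what SeqMan NGen does; the directory and file names below are hypothetical.

```python
import posixpath


def resolve(default_directory, path):
    """Resolve a script path against the default directory.

    Absolute paths are returned unchanged; relative paths, including those
    that start with "..", are interpreted relative to default_directory.
    """
    if posixpath.isabs(path):
        return path
    return posixpath.normpath(posixpath.join(default_directory, path))


# Hypothetical directory layout, for illustration only.
print(resolve("/data/projects/run1/", "reads/sample.fas"))
print(resolve("/data/projects/run1/", "../vector.fas"))
```

The first call stays inside the default directory, while the second steps up to its parent folder, as described for the two-period notation.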
Parameter Settings Commands | ||||||||||||||||||||||
setContaminantParam
Allows you to adjust the parameters used for scanning for contaminant sequences. In order to be applied, this command must appear in the script before the loadContaminant command, and the contamScan parameter for the assemble command must be set to ‘true.’
Example:
setContaminantParam MerLength:17
setContaminantParam MinMerMatch:12 | ||||||||||||||||||||||
|
MerLength |
The minimum length of a mer required to be considered an exact match when scanning for contaminants. |
[number from 5-50]
Default = 17 |
Advanced Trim/Scan Options: Mer length | ||||||||||||||||||
|
MinMerMatch |
The minimum number of matching mers required to mark the sequence as a contaminant. |
[number from 1-50]
Default = 12 |
Advanced Trim/Scan Options: Minimum matches | ||||||||||||||||||
setParam
Allows you to adjust the stringency of one or more of the assembling parameters for the project. SeqMan NGen will use the default values for any parameter that is not specified within the script.
Example:
setParam SNP: true
setParam snp_minVariantDepthToScore: 2
setParam snp_minWeight: 5
setParam snp_combineSubs: true
setParam snp_excludeBasesEdge: 0
setParam snp_maxRun: -1
setParam snp_maxStrandBias: -1
setParam snp_minHomopolDelDepth: 0
setParam snp_minHomopolDelFrac: 0
setParam snp_minHomopolInsDepth: 0
setParam snp_minHomopolInsFrac: 0
setParam snp_minSoftDepth: -1
setParam snp_minSoftPnotRefPct: -1
setParam snp_minSoftSnpPct: -1
setParam snp_minStrandCov: 0
setParam snp_runVar: false
setParam snp_checkStrandedness: false
setParam snp_minProbNonrefToCall: 0.1
setParam SNPmethod: diploid
setParam snp_minPctToScore: 0.05
Example:
In the transcript annotation workflow, reads clustered with XNG are reassembled using SNG. In order to minimize mis-joins, the initial phase of the assembly is done at high stringency using the following parameters:
setParam merLength: 21 minMatchPercent: 97 useRepeatHandling: false minContigSeqs: 101
Two assembly passes are performed for each read cluster. During the first pass, contigs are assembled from the reads, after which those with fewer than 101 reads are disassembled and added to the unassembled sequences pool for that cluster. During the second pass, SNG attempts to merge the assembled contigs and add any of the unassembled sequence reads from the first pass. To facilitate merging, minMatchPercent is lowered to 85 for this pass.
setParam minMatchPercent: 85
| ||||||||||||||||||||||
|
AllowConstraintBased |
Specifies whether the assembler should use constraints during assembly. |
[true|false] |
| ||||||||||||||||||
|
AssembleBoneyard |
Specifies whether, after a templated assembly has been completed, the unassembled sequences remaining should be assembled into contigs. If the template has been split, SeqMan NGen will attempt to join the split contigs together in new arrangements. (Note: “Boneyard” is a term for sequences that were not assigned to any contig). |
[true|false] |
Assembly Options (De Novo, Special Reference-Guided): De novo assemble unassembled reads | ||||||||||||||||||
|
CoverageType |
Specifies the type of coverage to be used for repeat handling. ‘Genome’ uses the length of the genome being assembled to calculate the expected coverage. ‘Fixed’ uses a fixed value as the expected coverage. If you know the length of the genome/fragment being assembled, we recommend using ‘genome’ for this parameter and then specifying the length using the genomeLength parameter. If you do not know the genome/fragment length, use ‘fixed’ and provide the most accurate estimate of expected coverage for the FixedCoverage value. |
[genome|fixed] |
| ||||||||||||||||||
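The difference between the two CoverageType settings can be sketched in Python. The calculation for 'genome' below uses the conventional depth estimate (total read bases divided by genome length); this is an assumption, since the exact formula SeqMan NGen uses is not stated here. The function name and arguments are hypothetical.

```python
def expected_coverage(coverage_type, total_read_bases=0, genome_length=0,
                      fixed_coverage=20):
    """Return the expected coverage used for repeat handling.

    'fixed'  -> the caller-supplied estimate (compare FixedCoverage).
    'genome' -> total read bases / genome length (assumed formula).
    """
    if coverage_type == "fixed":
        return float(fixed_coverage)
    if coverage_type == "genome":
        if genome_length <= 0:
            raise ValueError("genome_length must be positive for 'genome'")
        return total_read_bases / genome_length
    raise ValueError("coverage_type must be 'genome' or 'fixed'")


# 100 Mb of reads over a 5 Mb genome gives an expected depth of 20.
print(expected_coverage("genome", 100_000_000, 5_000_000))
```

This also shows why a poor FixedCoverage estimate matters: every repeat decision that depends on expected coverage shifts with it.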
|
DefaultQuality |
The value used for the base quality of sequences without quality scores. |
[number from 5-100]
Default = 15 |
Advanced Assembly Options (De Novo): Default quality | ||||||||||||||||||
|
FixedCoverage |
The estimated depth of the sequencing, which can be used instead of the genome length for repeat handling. Use caution when estimating the value for fixedCoverage. If the value you use is significantly lower than the actual depth, the assembly may take a much longer time to complete and may have too many mers flagged as repeats. |
[number from 1-65535]
Default = 20 |
| ||||||||||||||||||
|
GapPenalty |
The penalty for opening or extending a gap during an alignment. This penalty is deducted from the pairwise score used to calculate match percentage. A high gap penalty suppresses gapping, while a low value promotes gapping. |
[number from 0-1000]
Default = 30 for most workflows, 50 for the transcript annotation workflow |
Advanced Assembly Options (De Novo): Gap penalty | ||||||||||||||||||
|
GenomeLength |
Specifies the length of the genome or fragment being assembled. This is used to calculate expected coverage in determining repeat handling. (Note: this parameter was called “setGenomeParam” prior to SeqMan NGen 2.0.) |
[number from 0-10^15 ULL]
Default = 0 |
| ||||||||||||||||||
|
HaploidSNP |
Specifies whether to use the second most common base at a position when performing SNP passes. (See the snpPasses parameter). Using this parameter will increase the SNP percentage for SNPs occurring on one allele of a diploid genome in a templated assembly. When haploidSNP is set to ‘true,’ the lowCoverageThreshold parameter value should be greater than zero. |
[true|false] |
| ||||||||||||||||||
|
HaploidThreshold |
The minimum number of times that the second most common base must occur at a position in order for it to be used to find SNPs during haploid SNP passes. (See the haploidSNP parameter above). |
[number from 0-100]
Default = 0 |
| ||||||||||||||||||
|
LowCoverageThreshold |
The minimum coverage at which a region of the assembly is excluded from SNP passes. When it performs SNP passes, SeqMan NGen includes regions whose coverage is less than the specified value, as well as regions with zero coverage. (See the snpPasses parameter). |
[number from 0-10000]
Default = 0 |
Advanced Assembly Options (De Novo): SNP low cover cutoff | ||||||||||||||||||
|
MatchRepeatPercent |
The frequency at which a mer occurs, expressed as a percentage of its expected frequency. Mers exceeding this value are flagged as repeated and not used as mer tags in determining overlaps. (Note: this parameter was called “maxCoverageRatio” prior to SeqMan NGen 2.0.) |
[number from 100-1000]
Default = 150 |
Advanced Assembly Options (De Novo): Match repeat percent | ||||||||||||||||||
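As a rough illustration of the MatchRepeatPercent test, the Python sketch below compares an observed mer count with its expected count. It assumes that "exceeding" means strictly greater than the threshold and that the expected count is already known; the function name is hypothetical.

```python
def is_repeat_mer(observed_count, expected_count, match_repeat_percent=150):
    """Return True when a mer occurs more often than the allowed percentage
    of its expected frequency (integer arithmetic avoids rounding issues)."""
    return observed_count * 100 > expected_count * match_repeat_percent


# With an expected count of 20 and the default of 150%, the cutoff is 30.
print(is_repeat_mer(31, 20))
print(is_repeat_mer(30, 20))
```

Raising the percentage makes the assembler more tolerant of frequently occurring mers, while lowering it toward 100 flags more mers as repeats.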
|
MatchScore |
The score for a base match during an alignment. This score contributes to the pairwise score used to calculate match percentage. Increasing the matchScore value will allow for longer or more frequent gaps, thus forcing bases that match to be assembled together. |
[number from 1-1000]
Default = 10 |
Advanced Assembly Options (De Novo): Match score | ||||||||||||||||||
|
MatchSize |
The minimum number of matching consecutive bases required to determine the overlap of sequence reads. If an even number is entered, SeqMan NGen will automatically increase the value to the next odd number. (Note: this parameter was called setParamMerLength prior to SeqMan NGen 2.0.) |
[odd whole number]
Default = 21 |
Assembly Options (De Novo, Special Reference-Guided): Mer size | ||||||||||||||||||
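The even-to-odd adjustment described for MatchSize amounts to the following small Python function. The function name is hypothetical; the behavior follows the description above.

```python
def effective_match_size(value):
    """Return the mer size actually used: odd values are kept as entered,
    even values are increased to the next odd number."""
    return value if value % 2 == 1 else value + 1


print(effective_match_size(20))  # an even entry is bumped to 21
print(effective_match_size(21))  # an odd entry is unchanged
```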
|
MatchSpacing |
The length of the window of a sequence read where at least one mer tag will be chosen. (Note: this parameter was called “merTagWindow” prior to SeqMan NGen 2.0.) |
[number from 1-1000000]
Default = 50 |
Advanced Assembly Options (De Novo): Match spacing | ||||||||||||||||||
|
MatchWindowLength |
The size of the window used to calculate the match percentage. |
[number from 10-1000]
Default = 50 |
Advanced Assembly Options (De Novo): Match window | ||||||||||||||||||
|
MaxAssemblyCoverage |
The maximum depth of coverage allowed in the templated assembly. SeqMan NGen will not exceed the coverage specified by this threshold. This parameter is only available for templated assemblies, and should be used with caution as it will limit the number of sequences included in the assembly. A value of 0 indicates unlimited coverage. |
[number from 0-65535]
Default = 0 |
Advanced Assembly Options (De Novo): Maximum coverage | ||||||||||||||||||
|
MaxContigs |
The maximum number of contigs to write to an .assembly project. This parameter is not generally needed due to SeqMan's capacity to handle a very large number of contigs. |
[number] |
| ||||||||||||||||||
|
MaxGap |
The maximum number of gaps allowed per 1000 bases in the alignment. |
[number from 0-1000]
Default = 6 |
Advanced Assembly Options (De Novo): Max gap | ||||||||||||||||||
|
MaxUsableCount |
Any mer that occurs more frequently than FixedCoverage multiplied by MaxUsableCount is not used as a mer tag in the assembly. |
[number from 1-65535]
Default = 25 |
Advanced Assembly Options (De Novo): Max usable | ||||||||||||||||||
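The MaxUsableCount cutoff is a simple product, sketched below in Python with the documented defaults for FixedCoverage (20) and MaxUsableCount (25). The function names are hypothetical, and "more frequently than" is assumed to be a strict comparison.

```python
def max_usable_occurrences(fixed_coverage=20, max_usable_count=25):
    """Largest occurrence count at which a mer can still serve as a mer tag."""
    return fixed_coverage * max_usable_count


def is_usable_mer_tag(occurrences, fixed_coverage=20, max_usable_count=25):
    """Return False for mers that occur more often than the cutoff."""
    return occurrences <= max_usable_occurrences(fixed_coverage, max_usable_count)


# With the defaults, mers seen more than 20 * 25 = 500 times are disregarded.
print(max_usable_occurrences())
print(is_usable_mer_tag(501))
```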
|
MinContigSeqs |
The minimum number of sequences in a contig. After an assembly has been completed, any contigs without a template sequence will be disassembled if they contain fewer sequences than the number specified. The use of this parameter is recommended when performing de novo assemblies using data from Next Generation sequencing technologies, such as Illumina, as these types of assemblies can produce tens of thousands of very small contigs. |
[number from 0-10000]
Default = 0 |
| ||||||||||||||||||
|
Minimizer |
(Intended for internal use only). An experimental way of choosing mer tags that may save time and memory. The accuracy of this parameter has not been verified by DNASTAR. |
[number] |
| ||||||||||||||||||
|
MinMatchPercent |
The minimum percentage of matches in an overlap required to join two sequences in the same contig. (Note: this parameter was called “minMatchPercentage” prior to SeqMan NGen 2.0.) |
[number from 0-100]
Default = 93 |
Assembly Options (De Novo, Special Reference-Guided): Minimum match percentage | ||||||||||||||||||
|
MismatchPenalty |
The penalty for a base mismatch during an alignment. This penalty is deducted from the pairwise score used to calculate match percentage. |
[number from 0-1000]
Default = 20 |
Advanced Assembly Options (De Novo): Mismatch penalty | ||||||||||||||||||
|
SkipRealign |
This parameter affects only de novo assemblies and specifies whether to skip the realignment step of the assembly. When it is not skipped, the realignment step analyzes each sequence at the nucleotide level to determine its exact position in the alignment. |
[true|false] |
Assembly Options (De Novo, Special Reference-Guided): Realign reads after assembly | ||||||||||||||||||
|
SNP |
Specifies whether a SNP detection pass of the gapped alignment is made during the assembly. |
[true|false] |
| ||||||||||||||||||
|
snp_checkStrandedness |
Specifies whether the strand that each read comes from is considered in the SNP calculation. This is ignored by the simple SNP calling method (used when genome ploidy is “Heterogeneous”). |
[true|false] |
| ||||||||||||||||||
|
snp_minPctToScore |
Specifies the minimum percentage of reads in a column that must differ from the reference in order to score the column. For the simple SNP calling method (used when genome ploidy is “Heterogeneous”), this is the only criterion used to call a SNP. For the Bayesian SNP calling methods (used when genome ploidy is “Diploid” or “Haploid”), this is a filter applied before the other parameters. |
[number from 0-1]
Default = 0.05 |
| ||||||||||||||||||
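For the simple SNP calling method, the column test described above can be sketched in Python as follows. The sketch assumes the threshold is inclusive and is expressed as a fraction from 0 to 1, as in the allowed values; it omits base-quality filtering (snp_minWeight) and every other setting. The function name is hypothetical.

```python
from collections import Counter


def score_column(bases, reference_base, min_pct_to_score=0.05):
    """Count the bases in one alignment column and decide whether to score it.

    Returns (counts, non_reference_fraction, is_scored).
    """
    counts = Counter(bases)
    non_ref = len(bases) - counts[reference_base]
    fraction = non_ref / len(bases) if bases else 0.0
    return counts, fraction, fraction >= min_pct_to_score


# 2 of 20 reads differ from the reference base "A": fraction 0.1, scored.
counts, fraction, scored = score_column("A" * 18 + "GG", "A")
print(dict(counts), fraction, scored)
```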
|
snp_minProbNonrefToCall |
Specifies the minimum probability of a SNP column that is required to call a SNP, expressed as a number from 0 to 1. The probabilities of all genotypes other than Homozygous Reference are totaled and checked against this number. This is the final filter applied during the Bayesian SNP calling methods (used when genome ploidy is “Diploid” or “Haploid”). This is ignored by the simple SNP calling method (used when genome ploidy is “Heterogeneous”). |
[number from 0-1]
Default = 0.1, requiring a minimum probability of 10% |
| ||||||||||||||||||
|
snp_minVariantDepthToScore |
(required if “snp” is true) Specifies the minimum depth required for a specific base (or deletion) in a column before it is considered usable for SNP calling. This is the second filter applied during the Bayesian SNP calling methods (used when genome ploidy is “Diploid” or “Haploid”). This is ignored by the simple SNP calling method (used when genome ploidy is “Heterogeneous”). |
[number from 0-100]
Default = 2 |
| ||||||||||||||||||
|
snp_minWeight |
Called “Minimum base quality score” in the SeqMan NGen wizard, this parameter specifies the minimum quality score for a base to be considered in the SNP calculation. |
[number] |
| ||||||||||||||||||
|
SNPMatchPercentage |
The minimum match percentage required during passes to fill in SNP regions. See the snpPasses parameter. |
[number from 0-100]
Default = 90 |
Advanced Assembly Options (De Novo): SNP match percent | ||||||||||||||||||
|
snpMethod |
Specifies the SNP detection method to use. Simple produces a count of each type of base in the column and calculates the percent of non-reference bases. Haploid uses a Bayesian statistical model to calculate a probability score that the position contains a polymorphism and give a quality score for the base called at that position. Diploid uses a Bayesian statistical model to calculate a probability score that the position contains a polymorphism and give a quality score for the base(s) called at that position. Based on the scores, it also calls the genotype at each position. |
[simple|haploid|diploid|population] |
| ||||||||||||||||||
|
SNPPasses |
The number of times SeqMan NGen will cycle through a templated assembly, attempting to fill in regions with low coverage or no coverage due to SNPs. |
[number from 0-10]
Default = 2 |
Advanced Assembly Options (De Novo): SNP passes | ||||||||||||||||||
|
SplitFalseJoins |
Specifies whether the assembler should identify and split false joins based on the set of false join parameters indicated. |
[true|false] |
| ||||||||||||||||||
|
SplitTemplateContigs |
Specifies whether, after a templated assembly has been completed, the template should be split into contigs at areas where there is zero coverage. Split contigs will be grouped into scaffolds with a defined position to allow for easy sorting when the project is viewed in SeqMan Pro. Annotations on the template sequence will also be split, and any /codon_start qualifiers will be adjusted to stay in frame. |
[true|false] |
| ||||||||||||||||||
|
TemplateDefaultQuality |
The value used for the base quality of template sequences without quality scores. |
[number from 5-50000]
Default = 500 |
Advanced Assembly Options (De Novo): Default template quality | ||||||||||||||||||
|
TrimToMer |
Specifies whether to trim the reads to the matching mer tags within the read. For each read, SeqMan NGen looks for mers that exist in the template (for templated assemblies) or in any other read in the assembly (for de novo assemblies). It then sets the trimming for the read to the start of the first mer found and the end of the last mer found. Trimming to mer may be useful when assembling data without accurate quality scores, data with very short linkers, or when assembling SOLiD data. |
[true|false] |
| ||||||||||||||||||
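The trimming rule described for TrimToMer (start of the first mer found, end of the last mer found) can be sketched in Python. The set of known mers stands in for the mers of the template or of the other reads; how SeqMan NGen builds and searches that set is not specified here, so the function and its arguments are hypothetical.

```python
def trim_to_mer(read, known_mers, mer_length):
    """Return (start, end) trim coordinates for a read, or None if no mer of
    the read occurs in known_mers. end is exclusive."""
    hits = [
        i
        for i in range(len(read) - mer_length + 1)
        if read[i:i + mer_length] in known_mers
    ]
    if not hits:
        return None
    # Keep everything from the first matching mer through the last one.
    return hits[0], hits[-1] + mer_length


# Toy example with 4-mers; real mer sizes are much larger.
read = "TTACGTAAGGCCTT"
start, end = trim_to_mer(read, {"ACGT", "GGCC"}, 4)
print(read[start:end])
```

Bases outside the first and last matching mers are trimmed even if their quality scores are high, which is why the option is suggested mainly for data without accurate quality scores.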
|
UseRepeatHandling |
Specifies whether to use the repeat probabilities to determine if a mer occurs too frequently to use. This parameter should only be used for de novo assemblies, unless the assembleBoneyard parameter is set to ‘true’ for the templated assembly. |
[true|false] |
Assembly Options (De Novo, Special Reference-Guided): Repeat handling | ||||||||||||||||||
setQualityParam
Allows you to adjust the parameters used for quality trimming. In order to be applied, the trimEnds parameter for the assemble command must be set to ‘true.’
Example:
setQualityParam winLength:30
setQualityParam minAveLowQual:14
setQualityParam minAveHiQual:18
setQualityParam minEndBaseQual:15
setQualityParam endRegion:15
setQualityParam nTrimWinLength:50
setQualityParam maxN:2
setQualityParam maxNHiQual:1 | ||||||||||||||||||||||
|
EndRegion |
The number of bases at the end of a sequence considered to be the “end region” which is used by other quality parameters. |
[number from 1-100]
Default = 5 |
| ||||||||||||||||||
|
MaxN |
The maximum number of “N” bases permitted in the window used for N-based quality trimming. |
[number from 1-100]
Default = 2 |
| ||||||||||||||||||
|
MaxNHiQual |
The maximum number of “N” bases permitted in the window used for N-based quality trimming to meet the high-quality threshold. |
[number from 0-100]
Default = 1 |
| ||||||||||||||||||
|
MinAveHiQual |
The minimum averaged quality score of the evaluated window required to be considered high-quality. |
[number from 10-40]
Default = 22 |
| ||||||||||||||||||
|
MinAveLowQual |
The minimum averaged quality score of the evaluated window required to be considered low-quality. |
[number from 5-40]
Default = 20 |
Advanced Trim/Scan Options: Minimum quality | ||||||||||||||||||
|
MinEndBaseQual |
The minimum quality base score required in the specified end region. |
[number from 5-40]
Default = 15 |
| ||||||||||||||||||
|
NTrimWinLength |
The length of the window used for “N-based” quality trimming. N-based quality trimming trims bases that are called “N” and is used only when quality scores are not available. |
[number from 5-100]
Default = 7 |
| ||||||||||||||||||
|
WinLength |
The length of the window used for averaging quality scores. |
[number from 2-100]
Default = 5 |
Advanced Trim/Scan Options: Window | ||||||||||||||||||
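As a rough illustration of how a window-averaged quality threshold can drive end trimming, the Python sketch below slides a window of `win_length` bases in from the 5' end and keeps everything from the first window whose mean quality reaches `min_ave`. It is not SeqMan NGen's actual algorithm, which this table does not describe; the function name, the stop rule, and the sample scores are all assumptions made for illustration.

```python
def trim_start_by_window(quals, win_length=5, min_ave=20):
    """Return the index of the first base kept after 5' quality trimming.

    Hypothetical sketch: slide a window in from the 5' end and keep
    everything from the first window whose mean quality is >= min_ave.
    """
    if win_length < 1:
        raise ValueError("win_length must be positive")
    for start in range(0, len(quals) - win_length + 1):
        window = quals[start:start + win_length]
        if sum(window) / win_length >= min_ave:
            return start
    # No window passed the threshold: the whole read would be trimmed.
    return len(quals)


# Made-up quality scores with a poor 5' end.
quals = [4, 6, 8, 10, 25, 30, 32, 31, 30, 29]
print(trim_start_by_window(quals, win_length=5, min_ave=20))
```

A wider window smooths out isolated low-quality bases, while a higher threshold trims more aggressively.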
setRepeatParam
Allows you to adjust the parameters used for scanning for repetitive sequences. In order to be applied, this command must appear in the script before the loadRepeat command, and the repeatScan parameter for the assemble command must be set to ‘true.’
Example:
setRepeatParam merLength:17 setRepeatParam minMerMatch:2 setRepeatParam maxMerGap:10 setRepeatParam minFlagLength:50 setRepeatParam alignCutoff:100 setRepeatParam minEndFlagLength:25 | ||||||||||||||||||||||
|
AlignCutoff |
The minimum acceptable alignment score. When the alignment score drops below the specified value, this indicates that the end of the alignment between the read and the repeat has been reached, and the alignment will stop. |
[number from 10-1000000]
Default = 100 |
| ||||||||||||||||||
|
MaxMerGap |
The maximum distance allowed between two mers for them to be considered a matching pair. |
[number from 0-50]
Default = 10 |
| ||||||||||||||||||
|
MerLength |
The minimum length of a mer required to be considered an exact match when scanning for repeats. |
[number from 5-50]
Default = 17 |
Advanced Trim/Scan Options: Mer length | ||||||||||||||||||
|
MinEndFlagLen |
The minimum length required for a mer to be flagged as a repeat if the segment is bound by the end of the read. |
[number from 5-1000000]
Default = 25 |
| ||||||||||||||||||
|
MinFlagLength |
The minimum length required for a mer to be flagged as a repeat. |
[number from 5-1000000]
Default = 50 |
Advanced Trim/Scan Options: Flag length | ||||||||||||||||||
|
MinMerMatch |
The minimum number of matching mers required to start an alignment. |
[number from 2-25]
Default = 2 |
Advanced Trim/Scan Options: Minimum matches | ||||||||||||||||||
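To make the roles of MerLength and MinMerMatch more concrete, the following Python sketch finds exact mers shared between a read and a repeat sequence and checks whether there are enough of them to seed an alignment. It is a simplified assumption about how mer-based seeding works in general, not SeqMan NGen's implementation; the function names are hypothetical, and the sketch ignores MaxMerGap and the subsequent alignment step.

```python
def shared_mers(read, repeat, mer_length=17):
    """Return (read_pos, repeat_pos) pairs where an exact mer is shared."""
    # Index every mer of the repeat sequence by its start positions.
    index = {}
    for j in range(len(repeat) - mer_length + 1):
        index.setdefault(repeat[j:j + mer_length], []).append(j)
    hits = []
    for i in range(len(read) - mer_length + 1):
        for j in index.get(read[i:i + mer_length], []):
            hits.append((i, j))
    return hits


def has_seed(read, repeat, mer_length=17, min_mer_match=2):
    """True if enough exact mer matches exist to start an alignment."""
    return len(shared_mers(read, repeat, mer_length)) >= min_mer_match


# Toy sequences with a short mer length so the example stays readable.
print(shared_mers("ACGTACGT", "TTACGTAA", mer_length=5))
print(has_seed("ACGTACGT", "TTACGTAA", mer_length=5, min_mer_match=2))
```

Longer mers reduce chance matches at the cost of sensitivity, which is one reason the two parameters are usually tuned together.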
setVectorParam
Allows you to adjust the parameters used for vector trimming. In order to be applied, this command must appear in the script before the loadVector or TrimVector command, and the vectScan parameter for the assemble command must be set to ‘true.’
Example:
setVectorParam merLength:9 setVectorParam minMerMatch:3 setVectorParam maxMerGap:5 setVectorParam minTrimLength:30 setVectorParam minEndTrimLength:5 setVectorParam alignCutoff:100 setVectorParam endRegion:15 setVectorParam endCutoff:25 setVectorParam endMerMatch:1 | ||||||||||||||||||||||
|
AlignCutoff |
The minimum acceptable alignment score. When the alignment score drops below the specified value, this indicates that the end of the alignment between the read and the vector has been reached, and the alignment will stop. |
[number from 10-1000000]
Default = 100 |
| ||||||||||||||||||
|
EndCutOff |
The distance from the end of the sequence within which trimming will extend all the way to the end of the sequence. |
[number from 0-1000000]
Default = 25 |
Advanced Trim/Scan Options: Trim to end | ||||||||||||||||||
|
EndMerMatch |
The minimum number of mer matches required to start an alignment in the specified end region. |
[number from 1-25]
Default = 1 |
| ||||||||||||||||||
|
EndRegion |
The number of bases at the end of a sequence where a lower stringency for matching and trimming is used. |
[number from 0-1000000]
Default = 15 |
| ||||||||||||||||||
|
MaxMerGap |
The maximum distance allowed between two mers for them to be considered a matching pair. |
[number from 0-50]
Default = 5 |
| ||||||||||||||||||
|
MergeTrimGap |
The maximum distance between two trim segments that will cause the segments to be merged. MergeTrimGap limits trimming to the ends of sequence reads, while EndCutOff doesn't. This parameter controls how sensitive trimming should be in areas where some portions of the sequence match a vector and other portions don't. The higher the number, the more likely the vector trimmer is to find all of the vector sequence in a region of poor quality. The smaller the number, the more confidence there is that the trimmed bases are actually vector and not a spurious match.
|
[number from 0-1000000]
Default = 7, which is suitable for trimming linkers from the ends of sequences. |
| ||||||||||||||||||
|
MerLength |
The minimum length of a mer required to be considered an exact match when searching for vector. |
[number from 5-25]
Default = 9 |
Advanced Trim/Scan Options: Mer length | ||||||||||||||||||
|
MinEndTrimLength |
The minimum length to be trimmed when a vector matches the end of a read. This parameter can be useful in preventing small spurious matches from being trimmed, which may be significant with short read technologies. |
[number from 5-1000000]
Default = 5 |
| ||||||||||||||||||
|
MinMerMatch |
The minimum number of matching mers required to start an alignment. |
[number from 1-25]
Default = 3 |
Advanced Trim/Scan Options: Minimum matches | ||||||||||||||||||
|
MinTrimLength |
The minimum length required for a mer to be considered as a match for vector trimming. |
[number from 5-1000000]
Default = 30 |
Advanced Trim/Scan Options: Trim length | ||||||||||||||||||
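The MergeTrimGap behavior described above amounts to merging nearby intervals. The Python sketch below shows one way to do that under stated assumptions: segments are 0-based, end-exclusive `(start, end)` tuples, and two segments merge when the gap between them is at most `merge_trim_gap`. The function name and coordinate convention are hypothetical and are not taken from SeqMan NGen.

```python
def merge_trim_segments(segments, merge_trim_gap=7):
    """Merge (start, end) segments separated by at most merge_trim_gap bases.

    Coordinates are assumed to be 0-based and end-exclusive.
    """
    merged = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] <= merge_trim_gap:
            # Close enough to the previous segment: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Two nearby vector matches (gap of 5) and one distant match (gap of 20).
print(merge_trim_segments([(0, 10), (15, 30), (50, 60)], merge_trim_gap=7))
```

With a smaller gap value the same three segments would stay separate, which corresponds to the more conservative trimming described for low MergeTrimGap values.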
Preprocessing and Assembling Commands and Parameters | ||||||||||||||||||||||
assemble
(required) Preprocesses and assembles the sequences that have been loaded. Preprocessing may include quality trimming, and scanning for vector, repetitive, and contaminant sequences.
Example:
assemble trimEnds:false vectScan:false repeatScan:false contamScan:false doAssemble:true | ||||||||||||||||||||||
|
assembleBlocks |
Specifies whether the assembly is a reference-guided assembly. |
[true|false] |
| ||||||||||||||||||
|
contamScan |
If true, sequences will be scanned for the specified contaminant sequences before assembling. Also see loadContaminant. |
[true|false] |
| ||||||||||||||||||
|
doAssemble |
If false, only the preprocessing will be done, and the sequences will not be assembled. |
[true|false] |
| ||||||||||||||||||
|
repeatScan |
If true, sequences will be scanned for the specified known repetitive sequences before assembling. Also see loadRepeat. |
[true|false] |
| ||||||||||||||||||
|
trimEnds |
If true, the sequences will be trimmed based on quality scores before assembling. |
[true|false] |
Read options: Quality end trim | ||||||||||||||||||
|
vectScan |
If true, the sequences will be scanned and trimmed for vector before assembling. Also see loadVector. |
[true|false] |
| ||||||||||||||||||
fixedTrim
Trims reads prior to assembly using fixed values. Based on the parameter settings for this command, SeqMan NGen will trim reads either by a specified number of bases from each end, or to a specified range.
Example:
fixedTrim end5:10 end3:20 trimRelative:true | ||||||||||||||||||||||
|
end3 |
If trimRelative (see below) is set to ‘true,’ then this value indicates the number of bases for SeqMan NGen to trim from the 3' end of each read. If trimRelative is set to ‘false,’ then this value indicates the specific 3' coordinate to which reads should be trimmed. |
[number from 0-1000000]
Default = 0 |
Advanced Trim/Scan Options: 3’ trim | ||||||||||||||||||
|
end5 |
If trimRelative (see below) is set to ‘true,’ then this value indicates the number of bases for SeqMan NGen to trim from the 5' end of each read. If trimRelative is set to ‘false,’ then this value indicates the specific 5' coordinate to which reads should be trimmed. |
[number from 0-1000000]
Default = 0 |
Advanced Trim/Scan Options: 5’ trim | ||||||||||||||||||
|
trimRelative |
Specifies how the end3 and end5 values are interpreted. When ‘true,’ each value indicates the number of bases for SeqMan NGen to trim from the 3' or 5' end of each read. When ‘false,’ each value indicates the specific coordinate to which reads should be trimmed. |
[true|false] |
| ||||||||||||||||||
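The two trimming modes of fixedTrim can be sketched in Python as follows. This is an illustration only: the function name is hypothetical, absolute coordinates are assumed to be 1-based and inclusive, and an end3 of 0 in absolute mode is assumed to mean "no 3' trimming". None of these conventions are confirmed by the descriptions above.

```python
def fixed_trim(seq, end5=0, end3=0, trim_relative=True):
    """Trim a read by base counts (relative) or to coordinates (absolute)."""
    if trim_relative:
        # Remove end5 bases from the 5' end and end3 bases from the 3' end.
        stop = len(seq) - end3
        return seq[end5:stop] if stop > end5 else ""
    # Absolute mode (assumed 1-based, inclusive): keep positions end5..end3.
    start = max(end5 - 1, 0)
    stop = end3 if end3 > 0 else len(seq)  # assumption: 0 means untrimmed
    return seq[start:stop]


read = "ACGTACGTAC"
print(fixed_trim(read, end5=2, end3=3, trim_relative=True))
print(fixed_trim(read, end5=3, end3=8, trim_relative=False))
```

In relative mode the result depends on read length, whereas in absolute mode every read is cut to the same coordinate range.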
RealignContigs
Makes another pass through a templated assembly once the initial assembly is complete, and realigns contigs as needed. (This step occurs automatically for de novo assemblies.) Using this command may improve the accuracy of the final assembly by correcting occasional misalignments that can occur in gapped regions. Note, however, that this step may significantly increase assembly time. This command must appear in the script after the assemble command.
| ||||||||||||||||||||||
RemoveSmallContigs
This command disassembles any contigs without template sequences that have fewer than the specified number of sequences. | ||||||||||||||||||||||
|
minLength |
Specifies the minimum length of a contig to prevent it from being disassembled. |
[number]
Default = 0 |
Assembly Options (De Novo, Special Reference-Guided): Minimum length | ||||||||||||||||||
|
minSeqs |
(required) Specifies the minimum number of sequences necessary in a contig to prevent it from being disassembled. |
[number]
Default = 100 |
Assembly Options (De Novo, Special Reference-Guided): Minimum sequences | ||||||||||||||||||
setPairSpecifier
Defines the paired end pair specifier for the paired Sanger and Illumina sequences in the assembly. This command must appear in the script before the assemble command, but after sequences have been loaded (loadSeq). For more information on assembling 454 paired end data, see the load454PairedEnd command. Pair specifiers define the naming convention for sequence pairs, as well as requirements for a minimum and maximum distance between the opposite ends of the inserts. Expressions for forward and reverse naming conventions should be created using the paired end specification language. Forward and reverse sequences must have identical names except for the unique portion that determines the direction of the clone.
Example:
(defines 2 pair specifiers each with different size ranges)
setPairSpecifier pairs:{{forward:”(.*)(2kb)(.*)-FP.*$” reverse:”(.*)(2kb)(.*)-RP.*$” min: 1500 max: 2500} {forward:”(.*)(8kb)(.*)-FP.*$” reverse:”(.*)(8kb)(.*)-RP.*$” min: 7000 max: 9000}} | ||||||||||||||||||||||
|
pairs |
This parameter lists the paired end constraints, each specified by the following four values. The values should be separated by spaces, and each list of values enclosed in braces {}. An additional set of braces is required around all of the paired end constraints, regardless of whether one or multiple pair constraints are specified. |
[forward|reverse|min|max] |
| ||||||||||||||||||
|
forward |
A naming pattern to match forward clones. |
[text string enclosed in quotes] |
| ||||||||||||||||||
|
max |
The maximum distance for the paired end sequences to be separated. |
[number] |
| ||||||||||||||||||
|
min |
The minimum distance for the paired end sequences to be separated. |
[number] |
| ||||||||||||||||||
|
reverse |
A naming pattern to match reverse clones. |
[text string enclosed in quotes] |
| ||||||||||||||||||
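The naming patterns in the example above can be tried out with ordinary regular expressions. The Python sketch below assumes the paired end specification language behaves like standard regex for these particular patterns, which is an assumption rather than something this table confirms. The read names and helper functions are hypothetical. Two reads are treated as a pair when their captured groups are identical, matching the rule that forward and reverse names differ only in the direction portion.

```python
import re

# Patterns copied from the 2kb pair specifier in the example.
FORWARD = re.compile(r"(.*)(2kb)(.*)-FP.*$")
REVERSE = re.compile(r"(.*)(2kb)(.*)-RP.*$")


def find_pairs(names):
    """Return (forward_name, reverse_name) tuples that share captured groups."""
    forward, reverse = {}, {}
    for name in names:
        fwd, rev = FORWARD.match(name), REVERSE.match(name)
        if fwd:
            forward[fwd.groups()] = name
        elif rev:
            reverse[rev.groups()] = name
    return [(forward[k], reverse[k]) for k in sorted(forward) if k in reverse]


def distance_ok(distance, min_dist=1500, max_dist=2500):
    """Check an observed insert distance against the min/max constraint."""
    return min_dist <= distance <= max_dist


# Made-up read names for illustration.
names = ["lib_2kb_001-FP.ab1", "lib_2kb_001-RP.ab1",
         "lib_2kb_002-FP.ab1", "lib_8kb_001-FP.ab1"]
print(find_pairs(names))
print(distance_ok(2100))
```

Testing patterns this way before running an assembly can catch naming mismatches that would otherwise leave reads unpaired.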
SplitLinkerReads
Splits specified reads based on their match to given linker sequences. Reads that align to the linker and include the linker site (as specified by the linkerSite parameter or by the cloneSite option in an .fof file) will be split into two reads. The two newly split reads will be designated by _A and _B appended to the name.
Example:
splitLinkerReads seqFile: “/Library/123_project/reads.fas” linkerFile: “/Library/123_project/linker.fas” linkerSite:30 | ||||||||||||||||||||||
|
linkerFile |
The directory and file name of the linker file. |
[directory/filename enclosed in quotes] |
| ||||||||||||||||||
|
linkerSite |
The position indicating where reads should be split. |
[number] |
| ||||||||||||||||||
|
seqFile |
The directory and file name of the sequence reads. |
[directory/filename enclosed in quotes] |
| ||||||||||||||||||
SplitTemplates
Splits template contigs into multiple contigs in areas where there is zero coverage. Split contigs will be grouped into scaffolds with a defined position to allow for easy sorting when the project is viewed in SeqMan Pro. Annotations on the template sequence will also be split, and any /codon_start qualifiers will be adjusted to stay in frame.
| ||||||||||||||||||||||
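The idea of splitting a template wherever coverage drops to zero can be sketched as follows. This Python example assumes a per-base coverage list is available and uses 0-based, end-exclusive ranges; both are assumptions for illustration, and the sketch does not attempt to model scaffolds or annotation handling.

```python
def covered_segments(coverage):
    """Return (start, end) half-open ranges where coverage is above zero."""
    segments = []
    start = None
    for pos, depth in enumerate(coverage):
        if depth > 0 and start is None:
            start = pos  # a covered stretch begins here
        elif depth == 0 and start is not None:
            segments.append((start, pos))  # zero coverage ends the stretch
            start = None
    if start is not None:
        segments.append((start, len(coverage)))
    return segments


# Hypothetical per-base coverage along a short template.
print(covered_segments([0, 3, 5, 0, 0, 2, 2, 1]))
```

Each returned range would correspond to one split contig, ordered by its position on the original template.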
appendToAssembly
(This command is for the reference-guided workflow and is intended for internal use only). | ||||||||||||||||||||||
convertReads
Converts a sequence from one file format to another. This command is particularly useful for converting SOLiD .csfasta files into .fastq files that can be used by the XNG assembler. | ||||||||||||||||||||||
|
destination |
The location and filename for the output. |
[directory/filename enclosed in quotes] |
| ||||||||||||||||||
|
file, reads |
The input file containing the reads. |
[directory/filename enclosed in quotes] |
| ||||||||||||||||||
|
format |
Specifies the format of the output file. If ‘genbank’ is entered, the output will be in .gbk format. If ‘fastq’ is entered, the output will be in .fastq format. |
[genbank|fastq] |
| ||||||||||||||||||
extendContigs
(Intended for internal use only). | ||||||||||||||||||||||
|
extendPasses |
|
[number] |
| ||||||||||||||||||
|
mergeContigsInScaffold |
|
[true|false] |
| ||||||||||||||||||
include
When building a script, this command can be used to call up additional lines of script previously stored in a text file. In this way, a group of commands can be shared between two or more scripts. | ||||||||||||||||||||||
|
file |
Specifies a directory and name for the file. |
[directory/filename enclosed in quotes] |
| ||||||||||||||||||
MakeSeqNamesUnique
(Intended for internal use only). | ||||||||||||||||||||||
set
Used to set variables. See the example below and those under the runScript command.
Example:
set $snp:true set $snpMethod:”Diploid” | ||||||||||||||||||||||
setAssemblyReport
(Intended for internal use only). Used to designate a file for a tab-delimited report, similar to a report that XNG generates. This is useful during development to test how code changes impact results. | ||||||||||||||||||||||
|
file name |
Specifies the folder and file name. |
[directory/filename enclosed in quotes] |
| ||||||||||||||||||
SplitMIDSeqs
Used to split 454 MID reads into individual files with one file per MID tag. | ||||||||||||||||||||||
|
destination |
The location and filename for the output. |
[directory/filename enclosed in quotes] |
| ||||||||||||||||||
|
file, reads |
The input file containing the reads. |
[directory/filename enclosed in quotes] |
| ||||||||||||||||||
SplitPairs
Used to split 454 or Ion Torrent mate pair files into forward and reverse (and singleton) files.
Example:
SplitPairs destination:”c:\data\splitReads\” {
file:”C:\data\reads\file1.fas” format: IonTorrent } | ||||||||||||||||||||||
|
destination |
The location and filename for the output. |
[directory/filename enclosed in quotes] |
| ||||||||||||||||||
|
DiscardLinkerless |
Specifies that reads without a linker sequence should be discarded from the assembly. |
[true|false] |
| ||||||||||||||||||
|
file, reads |
The location and filename for the input. |
[directory/filename enclosed in quotes] |
| ||||||||||||||||||
|
seqTech |
Specifies the offset to be used when converting compressed quality scores into numerical values. These are the offsets used for the technology specified:
Note 1: For 454, quality scores for homopolymeric runs of ≥ 2 are oriented from 5' to 3' on the top strand.
Note 2: If possible, the data type of unknown data is determined automatically based on the first data file.
|
[IonTorrent|SOLiD|Illumina|454|normalScore|Other] |
Input Sequence Files: Read technology | ||||||||||||||||||
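Converting compressed quality scores into numerical values with an offset generally means subtracting the offset from each character's ASCII code. The Python sketch below shows that arithmetic using 33 and 64, two offsets commonly seen in FASTQ files. Which offset SeqMan NGen applies for each seqTech value is not stated here, so the defaults and function name are assumptions.

```python
def decode_qualities(encoded, offset=33):
    """Convert an ASCII-encoded quality string to numeric scores."""
    scores = [ord(ch) - offset for ch in encoded]
    if any(score < 0 for score in scores):
        # A negative score means the chosen offset does not fit the data.
        raise ValueError("offset is too large for this quality string")
    return scores


print(decode_qualities("II5!", offset=33))
print(decode_qualities("hh", offset=64))
```

Choosing the wrong offset shifts every score by a constant, which is why the read technology needs to be specified correctly when it cannot be detected automatically.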
TrimVector
Used for fast trimming of vector sequence. Each read file is processed and the trimmed file is saved to the destination folder. If a file with the same name already exists, a number will be appended to the file name. The file is saved in .fastq format, including trimming statistics.
Example:
setVectorParam EndCutOff: 130 MatchSize: 11 MinTrimLength: 15
TrimVector reads: { file: "C:\data\input.fastq" } LinkerFile: "c:\data\adapter.fas" destination: "c:\data\Out\" | ||||||||||||||||||||||
|
file, reads |
The location and filename for the input. |
[directory/filename] |
| ||||||||||||||||||
|
LinkerFile |
The location of the file or folder containing the vector sequence. |
[directory/filename] |
| ||||||||||||||||||
|
destination |
The location of the output folder. |
[directory] |
|