setParam - User Guide to SeqMan NGen

The setParam command allows you to adjust the stringency of one or more of the assembly parameters for the project. SeqMan NGen will use the default value for any parameter that is not specified within the script.

Parameter	Description	Allowed values
AllowConstraintBased	Specifies whether the assembler should use constraints during assembly.	[ true / false ]
AssembleBoneyard	Specifies whether, after a reference-guided assembly has been completed, the unassembled sequences remaining should be assembled into contigs. If the reference has been split, SeqMan NGen will attempt to join the split contigs together in new arrangements. (Note: “Boneyard” is a term for sequences that were not assigned to any contig). Wizard equivalent (de novo or special reference-guided workflows only): Assembly Options > De novo assemble unassembled reads.	[ true / false ]
CoverageType	Specifies the type of coverage to be used for repeat handling. ‘Genome’ uses the length of the genome being assembled to calculate the expected coverage. ‘Fixed’ uses a fixed value as the expected coverage. If you know the length of the genome/fragment being assembled, we recommend using ‘genome’ for this parameter and then specifying the length using the genomeLength parameter. If you do not know the genome/fragment length, use ‘fixed’ and provide the most accurate estimate of expected coverage for the FixedCoverage value.	[ genome / fixed ]
DefaultQuality	The value used for the base quality of sequences without quality scores. Wizard equivalent (de novo or special reference-guided workflows only): Assembly Options > Default quality.	[number from 5-100] Default = 15
FixedCoverage	The estimated depth of the sequencing, which can be used instead of the genome length for repeat handling. Use caution when estimating the value for fixedCoverage. If the value you use is significantly lower than the actual depth, the assembly may take a much longer time to complete and may have too many mers flagged as repeats.	[number from 1-65535] Default = 20
GapPenalty	The penalty for opening or extending a gap during an alignment. This penalty is deducted from the pairwise score used to calculate match percentage. A high gap penalty suppresses gapping, while a low value promotes gapping. Wizard equivalent (de novo or special reference-guided workflows only): Assembly Options > Gap penalty.	[number from 0-1000] Default = 30 for most workflows; 50 for the de novo transcriptome RNA-seq workflow.
GenomeLength	Specifies the length of the genome or fragment being assembled. This is used to calculate expected coverage in determining repeat handling. (Note: this parameter was called “setGenomeParam” prior to SeqMan NGen 2.0.)	[number from 0-10^15 ULL] Default = 0
HaploidSNP	Specifies whether to use the second most common base at a position when performing SNP passes. (See the snpPasses parameter). Using this parameter will increase the SNP percentage for SNPs occurring on one allele of a diploid genome in a reference-guided assembly. When haploidSNP is set to ‘true,’ the lowCoverageThreshold parameter value should be greater than zero.	[ true / false ]
HaploidThreshold	The minimum number of times that the second most common base must occur at a position in order for it to be used to find SNPs during haploid SNP passes. (See the haploidSNP parameter above).	[number from 0-100] Default = 0
LowCoverageThreshold	The minimum coverage required in an assembly to be excluded from SNP passes. SeqMan NGen will include regions in an assembly that have coverage less than the value specified as well as regions with zero coverage when it performs SNP passes. (See the snpPasses parameter). Wizard equivalent (de novo or special reference-guided workflows only): Assembly Options > SNP low cover cutoff.	[number from 0-10000] Default = 0
MatchRepeatPercent	The frequency at which a mer occurs, expressed as a percentage of its expected frequency. Mers exceeding this value are flagged as repeated and are not used as mer tags in determining overlaps. (Note: this parameter was called “maxCoverageRatio” prior to SeqMan NGen 2.0.) Wizard equivalent (de novo or special reference-guided workflows only): Assembly Options > Match repeat percent.	[number from 100-1000] Default = 150
MatchScore	The score for a base match during an alignment. This score contributes to the pairwise score used to calculate match percentage. Increasing the matchScore value will allow for longer or more frequent gaps, thus forcing bases that match to be assembled together. Wizard equivalent (de novo or special reference-guided workflows only): Assembly Options > Match score.	[number from 1-1000] Default = 10
MatchSize	The minimum number of matching consecutive bases required to determine the overlap of sequence reads. If an even number is entered, SeqMan NGen will automatically increase the value to the next odd number. (Note: this parameter was called “setParamMerLength” prior to SeqMan NGen 2.0.) Wizard equivalent (de novo or special reference-guided workflows only): Assembly Options > Mer size.	[odd whole number] Default = 21
MatchSpacing	The length of the window of a sequence read where at least one mer tag will be chosen. (Note: this parameter was called “merTagWindow” prior to SeqMan NGen 2.0.) Wizard equivalent (de novo or special reference-guided workflows only): Assembly Options > Match spacing.	[number from 1-1000000] Default = 50
MatchWindowLength	The size of the window used to calculate the match percentage. Wizard equivalent (de novo or special reference-guided workflows only): Assembly Options > Match window.	[number from 10-1000] Default = 50
MaxAssemblyCoverage	The maximum depth of coverage allowed in the reference-guided assembly. SeqMan NGen will not exceed the coverage specified by this threshold. This parameter is only available for reference-guided assemblies, and should be used with caution as it will limit the number of sequences included in the assembly. A value of 0 indicates unlimited coverage. Wizard equivalent (de novo or special reference-guided workflows only): Assembly Options > Maximum coverage.	[number from 0-65535] Default = 0
MaxContigs	The maximum number of contigs to write to an .assembly project. This command is not generally needed due to SeqMan’s capacity to handle a very large number of contigs.	[number]
MaxGap	The theoretical maximum length of a gap that could be inserted. In practice, the maximum gap size will usually be about half of this value. Wizard equivalent (de novo or special reference-guided workflows only): Assembly Options > Max gap.	[number from 0-99] Default = 6
MaxUsableCount	Any mers occurring more frequently than FixedCoverage multiplied by MaxUsableCount are disregarded as mer tags from the assembly. Wizard equivalent (de novo or special reference-guided workflows only): Assembly Options > Max usable.	[number from 1-65535] Default = 25
MinContigSeqs	The minimum number of sequences in a contig. After an assembly has been completed, any contigs without a reference sequence will be disassembled if they contain fewer sequences than the number specified. The use of this parameter is recommended when performing de novo assemblies using data from Next Generation sequencing technologies, such as Illumina, as these types of assemblies can produce tens of thousands of very small contigs.	[number from 0-10000] Default = 0
Minimizer	(Intended for internal use only). An experimental way of choosing mer tags that may save time and memory. The accuracy of this parameter has not been verified by DNASTAR.	[number]
MinMatchPercent	The minimum percentage of matches in an overlap required to join two sequences in the same contig. (Note: this parameter was called “minMatchPercentage” prior to SeqMan NGen 2.0.) Wizard equivalent (de novo or special reference-guided workflows only): Assembly Options > Minimum match percentage.	[number from 0-100] Default = 93
MismatchPenalty	The penalty for a base mismatch during an alignment. This penalty is deducted from the pairwise score used to calculate match percentage. Wizard equivalent (de novo or special reference-guided workflows only): Assembly Options > Mismatch penalty.	[number from 0-1000] Default = 20
SkipRealign	This parameter only affects de novo assemblies, and specifies whether to skip the realignment step of the assembly. When it is not skipped, the realignment step analyzes each sequence at the nucleotide level to determine the exact position of each sequence in the alignment. Wizard equivalent (de novo or special reference-guided workflows only): Assembly Options > Realign reads after assembly.	[ true / false ]
SNP	Specifies whether a SNP detection pass of the gapped alignment is made during the assembly.	[ true / false ]
snp_checkStrandedness	Specifies whether the strand that each read comes from is considered in the SNP calculation. This is ignored by the simple SNP calling method (used when genome ploidy is “Heterogeneous”).	[ true / false ]
snp_minPctToScore	Specifies the minimum percentage of reads in a column which must differ from the reference in order to score the column. For the simple SNP calling method (used when genome ploidy is “Heterogeneous”), this is the only criterion used to call a SNP. For the Bayesian SNP calling methods (used when genome ploidy is “Diploid” or “Haploid”), this is a filter applied before the other parameters.	[number from 0-1] Default = 0.05
snp_minProbNonrefToCall	Specifies the minimum probability of a SNP column which is required to call a SNP, expressed as a number between 0 and 1. The probabilities of all genotypes other than Homozygous Reference are totaled and checked against this number. This is the final filter applied during the Bayesian SNP calling methods (used when genome ploidy is “Diploid” or “Haploid”). This is ignored by the simple SNP calling method (used when genome ploidy is “Heterogeneous”).	[number from 0-1] Default = 0.1, requiring a minimum 10% change
snp_minVariantDepthToScore	(required if “snp” is true) Specifies the minimum depth required for a specific base (or deletion) in a column before it is considered usable for SNP calling. This is the second filter applied during the Bayesian SNP calling methods (used when genome ploidy is “Diploid” or “Haploid”). This is ignored by the simple SNP calling method (used when genome ploidy is “Heterogeneous”).	[number from 0-100] Default = 2
snp_minWeight	Called “Minimum base quality score” in the SeqMan NGen wizard, this parameter specifies the minimum quality score for a base to be considered in the SNP calculation.	[number]
SNPMatchPercentage	The minimum match percentage required during passes to fill in SNP regions. See the snpPasses parameter. Wizard equivalent (de novo or special reference-guided workflows only): Assembly Options > SNP match percent.	[number from 0-100] Default = 90
snpMethod	Specifies the SNP detection method to use. Simple produces a count of each type of base in the column and calculates the percent of non-reference bases. Haploid uses a Bayesian statistical model to calculate a probability score that the position contains a polymorphism and give a quality score for the base called at that position. Diploid uses a Bayesian statistical model to calculate a probability score that the position contains a polymorphism and give a quality score for the base(s) called at that position. Based on the scores, it also calls the genotype at each position.	[ simple / haploid / diploid / population ]
SNPPasses	The number of times SeqMan NGen will cycle through a reference-guided assembly, attempting to fill in regions with low coverage or no coverage due to SNPs. Wizard equivalent (de novo or special reference-guided workflows only): Assembly Options > SNP passes.	[number from 0-10] Default = 2
SplitFalseJoins	Specifies whether the assembler should identify and split false joins based on the set of false join parameters indicated.	[ true / false ]
SplitTemplateContigs	Specifies whether, after a reference-guided assembly has been completed, the template should be split into contigs at areas where there is zero coverage. Split contigs will be grouped into scaffolds with a defined position to allow for easy sorting when the project is viewed in SeqMan Pro. Annotations on the reference sequence will also be split, and any /codon_start qualifiers will be adjusted to stay in frame.	[ true / false ]
TemplateDefaultQuality	The value used for the base quality of template sequences without quality scores. Wizard equivalent (de novo or special reference-guided workflows only): Assembly Options > Default template quality.	[number from 5-50000] Default = 500
TrimToMer	Specifies whether to trim the reads to the matching mer tags within the read. For each read, SeqMan NGen looks for mers that exist in the template (for templated assemblies) or in any other read in the assembly (for de novo assemblies). It then sets the trimming for the read to the start of the first mer found and the end of the last mer found. Trimming to mer may be useful when assembling data without accurate quality scores, data with very short linkers, or when assembling SOLiD data.	[ true / false ]
UseRepeatHandling	Specifies whether to use the repeat probabilities to determine if a mer occurs too frequently to use. This parameter should only be used for de novo assemblies, unless the assembleBoneyard parameter is set to ‘true’ for the templated assembly. Wizard equivalent (de novo or special reference-guided workflows only): Assembly Options > Repeat handling.	[ true / false ]
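
The arithmetic behind three of the mer-related parameters in the table above (MatchSize, MaxUsableCount and MatchRepeatPercent) can be illustrated with a short Python sketch. This is not SeqMan NGen code: the function names are hypothetical, and the way SeqMan NGen derives a mer's expected frequency internally is not described in this guide, so the expected value is simply passed in as an assumption.

```python
def effective_match_size(requested: int) -> int:
    """Even MatchSize values are raised to the next odd number."""
    return requested + 1 if requested % 2 == 0 else requested


def max_usable_cutoff(fixed_coverage: int = 20, max_usable_count: int = 25) -> int:
    """Mers seen more often than FixedCoverage * MaxUsableCount are not used as mer tags."""
    return fixed_coverage * max_usable_count


def is_flagged_repeat(observed: int, expected: float,
                      match_repeat_percent: float = 150) -> bool:
    """Flag a mer when observed/expected, as a percent, exceeds MatchRepeatPercent.

    'expected' is an assumed input; this guide does not say how it is computed.
    """
    return 100.0 * observed / expected > match_repeat_percent


print(effective_match_size(20))     # 21
print(max_usable_cutoff())          # 500 with the documented defaults
print(is_flagged_repeat(35, 20.0))  # True: 175% of expected
print(is_flagged_repeat(30, 20.0))  # False: 150% does not exceed the cutoff
```

With the documented defaults (FixedCoverage = 20, MaxUsableCount = 25), the sketch gives a cutoff of 500 occurrences.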

Example A:

setParam SNP: true
setParam snp_minVariantDepthToScore: 2
setParam snp_minWeight: 5
setParam snp_combineSubs: true
setParam snp_excludeBasesEdge: 0
setParam snp_maxRun: -1
setParam snp_maxStrandBias: -1
setParam snp_minHomopolDelDepth: 0
setParam snp_minHomopolDelFrac: 0
setParam snp_minHomopolInsDepth: 0
setParam snp_minHomopolInsFrac: 0
setParam snp_minSoftDepth: -1
setParam snp_minSoftPnotRefPct: -1
setParam snp_minSoftSnpPct: -1
setParam snp_minStrandCov: 0
setParam snp_runVar: false
setParam snp_checkStrandedness: false
setParam snp_minProbNonrefToCall: 0.1
setParam SNPmethod: diploid
setParam snp_minPctToScore: 0.05
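
To make the role of snp_minPctToScore more concrete, the following Python sketch mimics the check described for the simple SNP calling method: count the bases in one alignment column and compare the non-reference fraction with the threshold. It is an illustration only, with hypothetical function names and a column represented as a plain string of base calls; it does not reproduce SeqMan NGen's actual implementation or the Bayesian methods.

```python
from collections import Counter


def nonref_fraction(column: str, ref_base: str) -> float:
    """Fraction of base calls in a column that differ from the reference base."""
    counts = Counter(column.upper())
    depth = sum(counts.values())
    if depth == 0:
        return 0.0  # no coverage, nothing to score
    return (depth - counts[ref_base.upper()]) / depth


def passes_min_pct_to_score(column: str, ref_base: str,
                            min_pct_to_score: float = 0.05) -> bool:
    """True when the non-reference fraction reaches the snp_minPctToScore value."""
    return nonref_fraction(column, ref_base) >= min_pct_to_score


# 18 reference calls and 2 variant calls: 10% non-reference
print(passes_min_pct_to_score("A" * 18 + "G" * 2, "A"))  # True
# 99 reference calls and 1 variant call: 1% non-reference
print(passes_min_pct_to_score("A" * 99 + "G", "A"))      # False
```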

Example B:

In the de novo transcriptome RNA-seq workflow, reads clustered with XNG are reassembled using SNG. In order to minimize mis-joins, the initial phase of the assembly is done at high stringency using the following parameters:

setParam
merLength: 21
minMatchPercent: 97
useRepeatHandling: false
minContigSeqs: 101

Two assembly passes are performed for each read cluster. During the first pass, contigs are assembled from the reads, after which those with fewer than 101 reads are disassembled and their reads are added to the unassembled sequences pool for that cluster. During the second pass, SNG attempts to merge the assembled contigs and add any of the unassembled sequence reads from the first pass. To facilitate merging, minMatchPercent is lowered to 85 for this pass.

setParam
minMatchPercent: 85
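
The effect of minContigSeqs at the end of the first pass in Example B can be pictured with a small Python sketch. It is a toy model under stated assumptions: contigs are represented as a dictionary of read-name lists, the function name is hypothetical, and the actual assembly and merging steps performed by SNG are not modeled.

```python
def split_by_min_contig_seqs(contigs: dict, min_contig_seqs: int = 101):
    """Keep contigs with at least min_contig_seqs reads; release the rest.

    Returns (kept_contigs, released_reads). Released reads stand in for the
    unassembled sequences pool that the second pass tries to place again.
    """
    kept = {}
    released = []
    for name, reads in contigs.items():
        if len(reads) < min_contig_seqs:
            released.extend(reads)  # contig is disassembled
        else:
            kept[name] = reads
    return kept, released


first_pass = {
    "contig1": [f"r{i}" for i in range(150)],  # 150 reads, kept
    "contig2": [f"s{i}" for i in range(40)],   # 40 reads, disassembled
}
kept, released = split_by_min_contig_seqs(first_pass)
print(sorted(kept))   # ['contig1']
print(len(released))  # 40
```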
