assembleTemplate is a required command that initiates the assembly of the loaded sequences using the specified template as a reference.

Example:

XNG script used in the “clustering” step of the de novo transcriptome RNA-seq workflow:

merSize: 25
minNewClusterSize: 5
minSingleMergeClusterSize: 7
minMultiMergeClusterSize: 7
minMultiMergeIgnoreFactor: (currently not used by default)
minClusterSizeToOutput: 100

Parameter Description Allowed values (defaults underlined)
alignmentCutoff Used in the “clustering” step of the de novo transcriptome RNA-seq workflow. [number]
Default = 200
assemble Specifies whether to use the part of the query that matches the contaminant sequence(s), the part that doesn’t match, or both. [ matchContam / noMatchContam / all ]
assemblyInfo Contains information about the assembly. [text string]
assemblyInfoAlt Contains pairs of keys and values which will be written to the -0.assemblyInfo file.
autoTrim Specifies whether mismatching ends of reads should automatically be trimmed. [ true / false]
boneyardAssembly Specifies whether sequences not used in the original or incremental XNG assemblies should be added to the assembly project by the SNG assembler. This command pertains only to reference-guided assemblies with gap closure. By default, during this type of assembly, the XNG assembler first finds structural variations (SVs) then splits the contig after each SV. Elements of this process can be modified using this command. (Note: “Boneyard” is a term for sequences that were not assigned to any contig). [ true / false]
combineDuplicateSeqs Specifies whether the duplicate reads will be clustered. [ true / false]
contaminant Use of this parameter partitions the query data by running an additional mer-match (layout) against the specified contaminant sequence(s). A full assembly is then run using the part of the query that either matches or does not match the contaminant sequence(s). This parameter can be used for removing reads originating from an organism(s) that may have also been present in the query data set (e.g., reads from human DNA present in a metagenomic sample from the human gut).

file: [directory/filename enclosed in quotes] the file with contaminant sequences.
assembleContam: [matchContam/noMatchContam/all]
merLayoutMin: [number]
unassembled: [directory/filename enclosed in quotes] the file containing no contaminant reads.
[directory/filename enclosed in quotes]
dbSNPTable (Intended for internal use only). [directory/filename enclosed in quotes]
delayAlignInserts Turns the delaying of reads that cause inserts on or off. ‘True’ means that gap-causing reads will be delayed: reads causing the fewest inserts (insert length is not considered) are added before those causing more inserts. [true / false]

Defaults: true for named read technologies; false for ‘Other’ read technologies
deleteIntermediates Specifies whether intermediate files are saved or deleted. These files can be large with large-scale projects. [true / false / none / all / notTemplateMer]
directoryMer Specifies the path and directory where both the template and query data mer files will be stored. Alternatively, separate directories for the template and query mer files can be specified using the parameters below. If no directory is specified, the mer file will be created in the directory containing the sequence data. [directory/filename enclosed in quotes]
directoryQueryMer (required) Specifies the path and directory where the query mer file will be stored. [directory/filename enclosed in quotes]
directoryTemplateMer (required) Specifies the path and directory where the template mer file will be stored. [directory/filename enclosed in quotes]
filterDeepLayout (optional) Specifies that XNG remove superfluous sequences in areas of deep coverage.

Wizard equivalent: Using ‘true’ is equivalent to selecting the Limit all deep coverage regions radio button from the Alignment tab. This tab is accessed from the Assembly Options screen by pressing the Advanced Options button.
[ true / false ]

Set to ‘false’ by default, except for projects involving miRNA or microbial genomes, where it is set to ‘true.’
filterDeepLayoutOrganelle (optional) Specifies that XNG remove superfluous sequences in areas of deep coverage. Wizard equivalent: Using ‘true’ is equivalent to selecting Advanced Assembly Options > Alignment tab > Only limit deep coverage regions for Mitochondria and Chloroplasts radio button [ true / false ]

Set to ‘false’ by default, except for projects involving a mitochondrial or chloroplast template (i.e., those with a short name of ‘MT’, ‘M’, ‘CHL’, or ‘chloro’), where it is set to ‘true.’
forceFullForwardAlign Start the alignment at the 5’ end of the sequence. [ true / false ]
forceMake Specifies whether new intermediate mer files will be created. A value of ‘false’ means that existing valid intermediate files will be used. [ true / false / query / hit / layout]
format Specifies the format of the alignment output file. If ‘none’ is entered, the assembly is run to include the alignment phase, but no alignment output is generated. This parameter can be used to remove reads from a contaminant source. [ BAM / SQD / NONE / NONE_align/Aux_align]
gap5Prime Put the gap on the 5’ side of the sequence. [ true / false ]
gapPenalty The penalty for opening or extending a gap during an alignment. This penalty is deducted from the pairwise score used to calculate match percentage. A high gap penalty suppresses gapping, while a low value promotes gapping. [number]

Default = 30 for most workflows, 50 for the de novo transcriptome RNA-seq workflow.
gapExtensionPenalty Used in the “clustering” step of the de novo transcriptome RNA-seq workflow. [number]

Default = 5
geneticCode This parameter specifies the genetic code to use with a reference sequence. [filepath/standard Lasergene genetic code file name]
hits (required) Specifies the path and name of the hit file. Incomplete paths will be appended to the default directory. [directory/filename enclosed in quotes]
increaseRunGapPen This parameter is a flag to increase the gap open penalty in HP runs. [ true / false ]
layout (required) Specifies the path and name of the layout file. Incomplete paths will be appended to the default directory. [directory/filename enclosed in quotes]
layoutAlign Specifies that a pairwise alignment should be performed at the layout phase in order to pick the best position for a given read. [ true / false ]
layoutMaxTemplateGap The maximal number of gaps introduced into the alignment used during layout. [number]
layoutRSRange The maximal Register Shift difference used while building the layout. [number]
layoutType Specifies how reads are to be laid out. [ unique / once / multiple / multipleAll ]
matchScore The score for a base match during an alignment. This score contributes to the pairwise score used to calculate match percentage. Increasing the matchScore value allows for longer or more frequent gaps, thus forcing bases that match to be assembled together. [number]

Default = 10
MaxGap The theoretical maximum length of a gap that could be inserted. In practice, the maximum gap size will usually be about half of this value. [number from 0-99]

Default = 6 for most workflows, 30 for the de novo transcriptome RNA-seq workflow
maxMergeSize When linking clusters into a scaffold, only link them together if the overall number of reads in the scaffold would not exceed this threshold. Used in the “clustering” step of the de novo transcriptome RNA-seq workflow.
maxNCnt (optional) This parameter removes sequential reads of the IUPAC ambiguity code ‘N’ that are greater than or equal to the number specified. Use of this parameter may help in assemblies whose reads contain large clusters of spurious N’s. [integer]
maxSecondaryTrimLength During alignment, a read can be trimmed from both ends. This parameter defines the longest allowable length for the smaller of the two trimmed ends. [number]
maxSeqs Specifies the maximum number of query sequences to add to an assembly. Use of this command can speed up assembly. [number]
merCntThresh Minimum number of mers needed in order to be recorded in the mer file. [number]
merLayoutMin Specifies the minimum length (in bases) of at least one stretch of matching mers used to identify matches between the reference and query data. The minimum value is equal to the mer size (merSize). The maximum value is the read length, which would require the entire read to be an exact match. For example, with a merSize of 19 and a merLayoutMin of 21, at least one stretch of three consecutive mers in a read would have to match for the read to be included in the layout. [number from 11-1000]

Default = 25
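
The relationship between merSize and merLayoutMin in the example above can be sketched in Python. This is an illustration only, assuming that consecutive mers overlap and start one base apart (which is consistent with the 19/21 example); the function name is hypothetical and is not part of XNG.

```python
def consecutive_mers_required(mer_size: int, mer_layout_min: int) -> int:
    """Number of consecutive mers needed to span mer_layout_min bases.

    Assumes neighbouring mers start one base apart, so each extra mer
    extends the matching stretch by exactly one base.
    """
    if mer_layout_min < mer_size:
        raise ValueError("merLayoutMin cannot be smaller than merSize")
    return mer_layout_min - mer_size + 1


# merSize 19 with merLayoutMin 21, as in the example above
print(consecutive_mers_required(19, 21))
```

With merLayoutMin equal to merSize, a single matching mer is enough under this assumption.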
merMinimizer (Intended for internal use only) [number]
merSize, merLength or matchSize (required) Specifies the length (in bases) of mers used to identify matches between the reference and query data. [number]
merSkip (Intended for internal use only) Specifies the number of positions to ignore or “skip” when creating the template mer file. Normally, mers are only skipped in the query (see merSkipQuery, below). The first and last mer of every read are always included. Increasing the value reduces the size of the intermediate files as well as the overall assembly time. However, larger values can also reduce the number of reads included in the assembly, especially with short read data.

0 = do not skip
2 = skip every second base
3 = skip every third base
etc.
[number]

Default = 0
merSkipQuery Specifies the number of positions to ignore or “skip” when creating the query mer file. The first and last mer of every read are always included. Increasing the value reduces the size of the intermediate files as well as the overall assembly time. However, larger values can also reduce the number of reads included in the assembly, especially with short read data.

0 = do not skip
2 = skip every second base
3 = skip every third base
etc.
[number]

Default = 0
method Defines how to handle splits in the assembly:

* normal – normal assembly method
* splitOnly – only reads which have been split will be included in the assembly
* noSplit – no reads will be split
[normal/splitOnly/noSplit]
minAlignedLength Specifies the minimum number of bases that must align after trimming for a read to be included in the assembly. [number from 11-999]

Default = 25 for most workflows, 50 for the de novo transcriptome RNA-seq workflow.
minClusterSizeToOutput Threshold for the number of reads that a cluster must contain in order for the cluster to be passed along to SNG for assembly in the next step of the program. Used in the “clustering” step of the de novo transcriptome RNA-seq workflow.

Note that this command is present only for the clusterParam block of the rnaAssemble command.
[number]
minMatchPercent The minimum percentage of matches in an overlap required to join two sequences in the same contig. [number]

Default = 93 for most workflows, 60 for the de novo transcriptome RNA-seq workflow.
minMultiMergeClusterSize When two or more clusters overlap the same k-mer, the minimum number of reads (depth) required at that k-mer for a cluster to be considered significant.

If three or more clusters exceed this threshold, the k-mer is considered “noisy” and a potential false join, and will not be merged. This is reported as a “multi-cluster link that was not merged”.

If two significant clusters overlap and have similar enough depth, the clusters are considered linked and are scaffolded together. Otherwise, if only one cluster is significant, all reads at that k-mer which have no assigned cluster are merged directly into it as described for the minSingleMergeClusterSize option. This parameter is used in the “clustering” step of the de novo transcriptome RNA-seq workflow.

Note that this command is present only for the clusterParam block of the rnaAssemble command.
[number]
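
The decision rules described for minMultiMergeClusterSize can be summarized in a short Python sketch. This is not the XNG implementation. The function name and the max_depth_ratio argument are hypothetical; max_depth_ratio stands in for the "similar enough depth" test, whose exact form is not given here. The sketch also assumes that a cluster is significant when its depth is greater than or equal to the threshold, and that two significant clusters with dissimilar depths are simply left unlinked.

```python
def classify_shared_kmer(cluster_depths, min_multi_merge_cluster_size=7,
                         max_depth_ratio=2.0):
    """Classify a k-mer that is overlapped by one or more clusters.

    cluster_depths: read depth of each overlapping cluster at this k-mer.
    max_depth_ratio: hypothetical similarity limit (assumption, not an XNG value).
    """
    significant = [d for d in cluster_depths
                   if d >= min_multi_merge_cluster_size]
    if len(significant) >= 3:
        # Too many strong clusters: treated as a potential false join.
        return "noisy: multi-cluster link that was not merged"
    if len(significant) == 2:
        low, high = sorted(significant)
        if high <= low * max_depth_ratio:
            return "linked: clusters are scaffolded together"
        return "not linked: depths too dissimilar"
    if len(significant) == 1:
        return "merge: unassigned reads join the significant cluster"
    return "no significant cluster"


print(classify_shared_kmer([10, 9, 8]))
print(classify_shared_kmer([10, 8]))
print(classify_shared_kmer([10, 3]))
```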
minMultiMergeIgnoreFactor When two or more clusters overlap the same k-mer and may be linked, they must be within this ratio of one another. Used in the “clustering” step of the de novo transcriptome RNA-seq workflow.

Note that this command is present only for the clusterParam block of the rnaAssemble command.
[number]
minSeqsPerTemplate Minimum number of sequences sufficient to build the layout or alignment. [number]
minSingleMergeClusterSize The minimum number of reads (depth) matching an existing cluster at a single k-mer required to extend that cluster by immediately adding all new reads for that k-mer to the cluster. Used in the “clustering” step of the de novo transcriptome RNA-seq workflow.

Note that this command is present only for the clusterParam block of the rnaAssemble command.
[number]
minNewClusterSize Minimum number of matching reads at a single k-mer (i.e., “depth”) required to create a new cluster. Used in the “clustering” step of the de novo transcriptome RNA-seq workflow.

Note that this command is present only for the clusterParam block of the rnaAssemble command.
[number]
mismatchPenalty The penalty for a base mismatch during an alignment. This penalty is deducted from the pairwise score used to calculate match percentage. [number]

Default = 20
noSexChromosomes Disables special handling of sex chromosomes. [ true / false ]
noSVPairSort Specifies whether to turn off the calculation of pairs for structural variations. This may potentially reduce XNG assembly time. [ true / false ]
onePackage Specifies whether an assembly containing multiple reference sequences should be bundled into a single .assembly package. If ‘false’ is entered, one .assembly package is created per contig. [ true / false ]
openInSeqman (optional) Specifies whether the completed assembly should immediately be launched in SeqMan. [ true / false]
output (required) Specifies the path and directory of the output files. Incomplete paths are appended to the default directory. [directory/filename enclosed in quotes]
pairDist (Intended for internal use only) [true/ false ]
pickTemplate Defines the number of templates from which to choose, and finds the template that is the best match for the input sequence. [number]
placeHit (Intended for internal use only) [ true / false ]
probe (Intended for internal use only) [number]
query (required) Specifies the directory and file name(s) of the query data to be assembled. A folder with one or more data files can also be used in place of individual file names.

Properties for query:

file: [directory/filename enclosed in quotes]
Specifies the directory and file/folder.

isPair: [true/false]
Specifies whether the query files contain paired end data.

minDist: [number]
(required if isPair is ‘true’) Specifies the minimum expected distance in bases between paired end reads. Default is 0.

maxDist: [number]
(required if isPair is ‘true’) Specifies the maximum expected distance in bases between paired end reads. Defaults are 750 for Illumina, 4500 for 454 and Sanger, 7500 for Other, and user-defined for Ion Torrent.

seqTech: [unknown|IonTorrent|Illumina|IlluminaLongReads|454|PacBio|normalScore|Other]
Specifies the offset to be used when converting compressed quality scores into numerical values. These are the offsets used for the technology specified:



Note 1: For 454, quality scores for homopolymeric runs of ≥ 2 are oriented from 5’ to 3’ on the top strand.

Note 2: If possible, the data type of unknown data is determined automatically based on the first data file.

pairTech : [unknown|LucigenRsaI|LucigenBfaI|Rsa1|Bfa1|Custom]

pairLinker: [string]

groupName: [string] The name of a group this file belongs to. Used for running multiple samples in one file.

sex: [unknown|female|male]

trim: [ true / false ] Specifies whether vector trimming needs to be applied to the reads.

sngTrim: contains parameters for fast vector trimming (See the SNG command trimVector )

scan: [ true / false ] Specifies whether reads need to be scanned for contaminants.

contaminantScan: Contains the assembleTemplate command with the contaminant file used as a template and the parameters directoryTemplateMer, hits, layout, output, unassembled, results, format, mersize, ignorePolyMers and deleteIntermediates. The format parameter has the value none_ALIGN.

Example:

query: {{file: "/data/home/proj/Illumina_s_5_1.txt"}
  {file: "/data/home/proj/Illumina_s_5_2.txt"}
isPair: true
minDist: 400
maxDist: 700
seqTech: Illumina}
[directory/filename enclosed in quotes]
recordSplitsOnly Functional only when used in the same program as splitTemplateContigs or recordStructVariations (both described below). Specifies whether or not to turn off contig splitting while still recording SVs for later inclusion in the Structural Variation Report. [ true / false ]
recordStructVariations Specifies under which circumstances structural variations (SVs) should be calculated and recorded.

0|false = Don’t calculate SVs
1|true = Calculate SVs at zero coverage
2 = Calculate SVs at insertions and deletions
3 = Calculate SVs at zero coverage and at insertions
[ integer between 0-3 / true / false ]

Default = 2
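
As an illustration of the allowed values listed above, the following Python sketch maps each accepted form (0-3, true, false) to a single integer mode. It is not XNG code: the function and dictionary names are hypothetical, and treating ‘true’ and ‘false’ case-insensitively is an assumption.

```python
# Meanings copied from the value list above.
SV_MODES = {
    0: "Don't calculate SVs",
    1: "Calculate SVs at zero coverage",
    2: "Calculate SVs at insertions and deletions",
    3: "Calculate SVs at zero coverage and at insertions",
}


def normalize_sv_mode(value) -> int:
    """Convert 0-3, 'true' or 'false' into the equivalent integer mode."""
    text = str(value).strip().lower()
    if text == "false":
        return 0
    if text == "true":
        return 1
    mode = int(text)  # raises ValueError for anything non-numeric
    if mode not in SV_MODES:
        raise ValueError(f"expected 0-3, true or false, got {value!r}")
    return mode


for raw in ("false", "true", 2, "3"):
    mode = normalize_sv_mode(raw)
    print(raw, "->", mode, "=", SV_MODES[mode])
```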
removeDuplicateSeqs Completely removes clonal reads after the alignment phase of assembly. Clonal reads, where the endpoints of both reads in a pair match those in another pair, are usually the result of PCR artifacts. If ‘true,’ the reads will not be scored, and will not be included in SNP calculations. Setting this parameter to ‘true’ may substantially increase the time needed for assembly. [ true / false ]
removeUniqueInserts Removes reads that cause an insert which no other read would create. This parameter is only enabled when delayAlignInserts (described under the assembleTemplate command) is true. [ true / false ]

Defaults: true for Illumina and Ion Torrent read technologies; false for all other types.
repeatPenaltyScale Indicates the quality penalty (using the Phred scale) to use for a read which places in two locations identically. Higher repeat counts are further penalized relative to this on a log2 scale such that repeats placing in four locations have a double penalty, in eight locations have a triple penalty, and so on. This penalty is applied to a ceiling of Phred score 30 if the other methods are disabled or have a higher score. [number]

Default = 8
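
The log2 scaling described for repeatPenaltyScale can be sketched as follows. This is a minimal Python illustration, not the XNG implementation. The function name is hypothetical, the treatment of placement counts that are not powers of two is an assumption, and the Phred 30 ceiling mentioned above is not modeled.

```python
import math


def repeat_penalty(placements: int, repeat_penalty_scale: float = 8.0) -> float:
    """Phred-scale penalty for a read that places in several locations.

    Two placements give the base penalty, four give double, eight give
    triple, i.e. scale * log2(placements). A uniquely placed read gets 0.
    """
    if placements < 2:
        return 0.0
    return repeat_penalty_scale * math.log2(placements)


for n in (2, 4, 8):
    print(n, repeat_penalty(n))
```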
repeatThreshMax Specifies the maximum number of occurrences of a mer in the reference sequence(s) for it to be considered repeated. Mers exceeding this number will not be used for identifying matches. [number from 1-10000]

Default = 100
repeatThreshMin Specifies the minimum number of occurrences of a mer in the reference sequence(s) for it to be considered repeated. Mers less than this number will not be used for identifying matches. [number]
reportFiles Defines the kind of report file to be generated.

perProject: [ true / false ] Generate a per project report.

perTemplate: [ true / false ] Generate a per template report.

removeInteral: [ true / false ] Remove intermediate reports.
repeatmermax Threshold number of occurrences in a data set for a mer to be considered “repeated.” Used in the “clustering” step of the de novo transcriptome RNA-seq workflow.
results Specifies the path and name of the result summary file. This file contains a compilation of assembly statistics and uses the extension fileSize.txt. Incomplete paths will be appended to the default directory. [directory/filename enclosed in quotes]
saveUnSplitAssembly Specifies whether XNG should save both the normal assembly output, [filename].assembly, and the unsplit intermediate assembly, [filename]-noSplit.assembly. The latter file contains SVs but no SNPs, and can be used to validate splits in the final assembly. [true / false ]
sex Specifies the sex of the subject, used for read placement and SNP calling. See How sex chromosomes are handled for details. [ male / female / unknown ]
showCDSVariant Specifies whether or not XNG should show all variants of a CDS feature contacted by a SNP. The version number for the CDS variant will then appear in brackets when viewed in the SNP report in SeqMan Pro. [ true / false ]
sngConvertOptions (Intended for internal use only) [text string]
snp Specifies whether or not a SNP detection pass of the gapped alignment should be made during the assembly. [ true / false ]
snp_checkStrandedness Specifies whether or not the strand that each read comes from is considered in the SNP calculation. This is ignored by the simple SNP calling method (used when genome ploidy is “Heterogeneous”). [ true / false ]
snp_combineSubs This parameter is used to coalesce adjacent substitutions. [ true / false ]
snp_excludeBases3p (internal use only) This parameter causes the specified number of bases from the 3’ end of each read to not be considered during variant calling. [integer]
snp_excludeBases5p (internal use only) This parameter causes the specified number of bases from the 5’ end of each read to not be considered during variant calling. [integer]
snp_excludeBasesEdge This parameter causes the specified number of bases from both the 5’ and 3’ ends of each read to not be considered during variant calling. [integer]

For the simple SNP calling method (used when genome ploidy is “Heterogeneous”), the default is 5. For the Bayesian SNP calling methods (used when genome ploidy is Diploid or Haploid), the default is 0.
snp_limitEndPos Specifies the 3’ most coordinate of the specified template from which to stop calculating SNPs. [number between 1 and the length of the template]
snp_limitStartPos Specifies the 5’ most coordinate of the specified template from which to begin calculating SNPs. A value between 1 and the length of the template must be entered. [number]

Default = 1
snp_limitTemplateID Specifies a single template ID for which to calculate SNPs. [number]

Default = 0
snp_logEndPos Specifies the 3’ most coordinate of the specified template from which to stop storing a detailed log of SNP information. A value between 1 and the length of the template must be entered. [number]

Default = 1
snp_logLevel Specifies the level of detailed logging to store in the “shared” project directory as “SNP.log.” Level 0 specifies that no log will be stored. Level 1 stores detailed info on the SNPs which were called, level 2 also logs columns where the preliminary filter passed but the final filtering failed, and level 3 logs all columns. This is ignored by the simple SNP calling method (used when genome ploidy is “Heterogeneous”). [whole number from 0-3]

Default = 0
snp_logStartPos Specifies the 5’ most coordinate of the specified template from which to begin storing a detailed log of SNP information. A value between 1 and the length of the template must be entered. [number]

Default = 1
snp_logTemplateID Specifies a single template from which to store a detailed log of SNP information. [number]

Default = 0
snp_maxRun Specifies the maximum length of a homopolymeric run for an indel to be considered during variant calling. For example, a snp_maxRun of ‘5’ will allow a portion of sequence up to 5 bases in length to be called as a SNP. [integer]

Defaults are 3 for 454 and Ion Torrent read technologies; 5 for all others.
snp_maxStrandBias Strand Bias (SB) for a SNP is the bias for the SNP appearing on one strand versus the other. It is measured relative to the strand bias in the assembly at the location of the SNP. For example, in a column with 60 forward reads and 40 backward reads, 6 SNP bases on the forward strands, and 4 on the reverse strands would be unbiased. SB is given by the formula:

SB = |SNP% f − SNP% r| / Total SNP%

…where SNP% f and SNP% r are the percentage of reads containing the variant on the forward (top) and reverse (bottom) strands, respectively; and SNP% is the total percentage of reads containing the variant. SB is calculated based on an “absolute value,” and will therefore be a positive number.

The effect of different SB thresholds is shown below:

-1 – A negative number cannot normally be generated by the equation above. However, you may use ‘-1’ in the script to turn off the snp_maxStrandBias parameter. In the wizard, SeqMan NGen indicates the parameter is turned off by making Maximum strand bias (see Variants tab) either blank or absent.

0 – Perfectly balanced (unbiased) strands. Reads with variants are present on both strands, and variants appear equally on both strands.

Between 0-1, not inclusive – As the value approaches ‘1’, more variants are called at positions where the variant-containing reads are unevenly distributed between the two strands.

1 – All variant-containing reads are on a single strand.

Note: In cases where all the reads covering a base are on one strand only, the SNP% of the other strand cannot be calculated (due to a “division by zero” error). These positions will not be removed by the snp_maxStrandBias filter. To remove these variants, instead set snp_minStrandCov to ≥ 1.

Example:

In a homozygous case (SNP% = 100) with a depth of 100, where 75 variant-containing reads are on the top strand (75%) and 25 variant-containing reads are on the bottom strand (25%), the strand bias would equal: (75 – 25)/100 = 0.5.
[number]

Defaults for the Bayesian SNP calling methods (used when genome ploidy is “Diploid” or “Haploid”) are 0.8 for 454 and Ion Torrent read technologies; not shown (blank) for all others. Defaults for the simple SNP calling method (used when genome ploidy is “Heterogeneous”) are 0.25 for all read technologies.
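
The strand bias calculation and the worked example above can be reproduced with a short Python sketch. It is an illustration only: the function names are hypothetical, the percentages are taken exactly as given in the example, and it is an assumption that a variant passes the filter when its SB is less than or equal to snp_maxStrandBias.

```python
def strand_bias(pct_fwd: float, pct_rev: float, pct_total: float) -> float:
    """SB = |SNP% f - SNP% r| / Total SNP%."""
    if pct_total <= 0:
        raise ValueError("Total SNP% must be greater than zero")
    return abs(pct_fwd - pct_rev) / pct_total


def passes_strand_bias(sb: float, snp_max_strand_bias: float) -> bool:
    """A negative threshold (e.g. -1) turns the filter off."""
    if snp_max_strand_bias < 0:
        return True
    return sb <= snp_max_strand_bias


# Worked example above: 75% on the top strand, 25% on the bottom, SNP% = 100
sb = strand_bias(75, 25, 100)
print(sb)
print(passes_strand_bias(sb, 0.8), passes_strand_bias(sb, 0.25))
```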
snp_minHomopolDelDepth Specifies the minimum read depth required to call a deletion in a homopolymeric run. [integer]

Default = 0
snp_minHomopolDelFrac Specifies the minimum fraction of reads required to call a deletion in a homopolymeric run. [integer]

Default = 0
snp_minHomopolInsDepth Specifies the minimum read depth required to call an insertion in a homopolymeric run. [integer]

Default = 0
snp_minHomopolInsFrac Specifies the minimum fraction of reads required to call an insertion in a homopolymeric run. [integer]

Default = 0
snp_minPctToScore Specifies the minimum percentage of reads in a column which must differ from the reference in order to score the column. For the simple SNP calling method (used when genome ploidy is “Heterogeneous”), this is the only criterion used to call a SNP. For the Bayesian SNP calling methods (used when genome ploidy is “Diploid” or “Haploid”), this is a filter applied before the other parameters. [number from 0-1]

Default = 0.05
snp_minProbNonrefToCall Specifies the minimum probability of a SNP column which is required to call a SNP, expressed as a number between 0 and 1. The probabilities of all genotypes other than Homozygous Reference are totaled and checked against this number. This is the final filter applied during the Bayesian SNP calling methods (used when genome ploidy is “Diploid” or “Haploid”) and is ignored by the simple SNP calling method (used when genome ploidy is “Heterogeneous”). [number from 0-1]

Default = 0.1, requiring a minimum 10% change.
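
The check described for snp_minProbNonrefToCall can be sketched in Python as follows. This is not the XNG implementation: the function names and genotype labels are hypothetical, the genotype probabilities are assumed to be already computed by the Bayesian model, and it is an assumption that a total equal to the threshold is enough to call the SNP.

```python
def prob_nonref(genotype_probs: dict, hom_ref: str = "ref/ref") -> float:
    """Total probability of every genotype other than Homozygous Reference."""
    return sum(p for genotype, p in genotype_probs.items() if genotype != hom_ref)


def call_snp(genotype_probs: dict, snp_min_prob_nonref_to_call: float = 0.1,
             hom_ref: str = "ref/ref") -> bool:
    """True when the non-reference total reaches the threshold."""
    return prob_nonref(genotype_probs, hom_ref) >= snp_min_prob_nonref_to_call


# Hypothetical genotype probabilities for two columns
likely_ref = {"ref/ref": 0.95, "ref/alt": 0.04, "alt/alt": 0.01}
likely_het = {"ref/ref": 0.50, "ref/alt": 0.40, "alt/alt": 0.10}
print(call_snp(likely_ref), call_snp(likely_het))
```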
snp_minStrandCov Specifies the minimum number of reads from each strand required to call a variant at a given position. [integer]

In the Bayesian SNP calling methods (used when genome ploidy is “Diploid” or “Haploid”), the default is 0. In the simple SNP calling method (used when genome ploidy is “Heterogeneous”), the default is 5.
snp_minVariantDepthToScore (required if “snp” is true) Specifies the minimum depth required for a specific base (or deletion) in a column before it is considered usable for SNP calling. This is the second filter applied during the Bayesian SNP calling methods (used when genome ploidy is “Diploid” or “Haploid”) and is ignored by the simple SNP calling method (used when genome ploidy is “Heterogeneous”). [number from 0-100]

Default = 2
snp_minWeight Called “Minimum base quality score” in the SeqMan NGen wizard, this parameter specifies the minimum quality score for a base to be considered in the SNP calculation. [number]

In the simple SNP calling method (used when genome ploidy is “Heterogeneous”), the default is 20. In the Bayesian SNP calling methods (used when genome ploidy is “Diploid” or “Haploid”), the default is 5.
snp_reportUserMissing Specifies what kind of positions to put in the missingUser file, including one or more of the following:

dbSNP = dbSNP Pos
user = in user VCF SNP file
zeroCoverage = include zero coverage regions
cosmic = in COSMIC database
allcaptured = include all positions in capture regions
captured = include only positions in capture regions

Example:

snp_reportUserMissing: [user allcaptured captured]
[kParamTypeStrFixedVocab]
snp_runVar Uses a Bayesian probabilistic model to exclude heterozygous insertions and deletions in homopolymeric runs. Intended for use with Ion Torrent data. [ true / false ]

Defaults: true for 454 and Ion Torrent read technologies; false for all others.
snp_showAllFeatures Specifies whether XNG should count SNPs multiple times if the SNP contacts different versions (variants) of a CDS feature. [ true / false ]
snp_writeExtended Specifies whether the additional values produced by the Haploid or Diploid SNP calculation methods are included in the SNP table.

Wizard equivalent: Advanced Options > Alignment tab > Trim to targeted regions
[ true / false ]
snpMethod Specifies the SNP detection method to use. Simple produces a count of each type of base in the column and calculates the percent of non-reference bases. Haploid uses a Bayesian statistical model to calculate a probability score that the position contains a polymorphism and give a quality score for the base called at that position. Diploid uses a Bayesian statistical model to calculate a probability score that the position contains a polymorphism and give a quality score for the base(s) called at that position. Based on the scores, it also calls the genotype at each position. [ simple / haploid / diploid ]
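
The simple method described above (count each base in a column, then calculate the fraction of non-reference bases) can be sketched in Python. This is an illustration under stated assumptions, not the XNG implementation: the function name is hypothetical, the threshold plays the role of snp_minPctToScore (default 0.05), and base quality and strand filters are ignored.

```python
from collections import Counter


def simple_snp_column(bases: str, ref_base: str, min_pct_to_score: float = 0.05):
    """Return (base counts, non-reference fraction, scored?) for one column."""
    counts = Counter(bases)
    depth = sum(counts.values())
    if depth == 0:
        return counts, 0.0, False
    nonref_fraction = (depth - counts[ref_base]) / depth
    return counts, nonref_fraction, nonref_fraction >= min_pct_to_score


# Hypothetical column: 18 reads show the reference base A, 2 show G
counts, fraction, scored = simple_snp_column("A" * 18 + "G" * 2, "A")
print(dict(counts), fraction, scored)
```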
splitTemplateContigs Specifies under which circumstances contigs should be cut after a templated assembly. Any split contigs will be grouped into scaffolds with a defined position to allow for easy sorting when the project is viewed in SeqMan Pro. This command pertains only to reference-guided assemblies with gap closure. By default, during this type of assembly, the XNG assembler first finds structural variations (SVs) then splits the contig after each SV. Elements of this process can be modified using this command.

0|false = Don’t split
1|true = Split at locations with zero coverage
2 = Split at insertions and deletions
3 = Split at zero coverage and at insertions

[ integer between 0-3 / true / false ]

Default = 2
template (required) Specifies the directory and file name of the reference sequence file. A folder with one or more reference sequence files can also be used in place of individual file names. Each entry must also be enclosed by brackets. If more than one template entry is used, the list must also be enclosed by an additional set of brackets.

Properties for template:

file: [directory/filename enclosed in quotes]

Specifies the directory and file/folder.

feature: [directory/filename enclosed in quotes] (optional) Specifies the directory and file name for annotated features when the reference sequence and feature annotations are in separate files.

transcriptKind: [both|identified|novel] If a .Transcriptome package is used as a template, defines which of its transcripts will be used.

userSNP: [directory/filename enclosed in quotes]

exomeCapture: file: [directory/filename enclosed in quotes] The BED file name.

track: [string] the region of interest (Optional)

merMask: [ true / false ] Specifies if mers from outside of the capture region should be excluded from assembly.

Examples for template:

Sequence and annotation in one file:

AssembleTemplate

template: {{file: "/data/home/proj/MG1655.gbk"} {file: "/data/home/proj/W3110.gbk"}}

Sequence and annotation in separate files:

AssembleTemplate

template: {file: "/Library/ABC_proj/references/MG1655.fas" feature: "/Library/ABC_proj/references/MG1655.gff"}
[directory/filename enclosed in quotes]
templateHitCntThresh (Intended for internal use only) [number]
trimToTargetRegions Controls whether reads are trimmed, by default, to the boundaries of the targeted regions, as defined by the .bed or manifest file. The default of true indicates that the reads are trimmed to the stated boundaries. If conditions are not met, the SeqMan NGen wizard does not change this parameter to ‘false,’ but instead omits it from the script. The parameter status is only shown in the script for control workflows.

Wizard equivalent: Trim to targeted regions in the Alignment tab. This tab is accessed from the Assembly Options screen by pressing the Advanced Options button.
[ true / false ]
unassembled [directory/filename enclosed in quotes]
verify [ true / false ]
