Clicking the Advanced (Assembly) Options button from certain versions of the Assembly Options dialog launches a multi-tabbed Advanced Assembly Options dialog. This help topic describes options available in the Variant tab.

The Variant tab is used to view and edit options related to SNP calculation. The options chosen in this dialog affect the “hard” filtering of SNPs. Hard filtering is the automatic and permanent removal of SNPs that are not of interest to you. This is different from reversible “soft” filtering of SNPs, which is discussed under SNP filter stringency in the topic Assembly Options.

Default parameters in this tab are optimized for the sequencing technology and project type that you specified elsewhere in the wizard. Because of this, values seldom need to be changed.


Editable Variant Filters:

This section lets you specify the non-permanent “soft” filters for SNP data. SNPs that do not meet thresholds specified in this section are removed from certain displays (e.g., tables) but are still retained in the final project and may be displayed in downstream analysis, if desired.

  • Use the Filter stringency drop-down menu to specify low, medium, high or custom stringency. Choosing custom enables the next three options in the dialog. Otherwise, these options are disabled and populated with unchangeable default values based on your stringency selection.
  • Minimum variant percentage – The minimum percentage of non-reference bases required to call a SNP. When it performs SNP passes, SeqMan NGen will include regions in an assembly that have coverage less than or equal to the specified value. The default value is 5. A non-zero value is recommended when using Ion Torrent data, working with larger genomes, or doing population studies. Very low values lead to larger files but do not necessarily result in better SNP calls.
  • P not ref – The minimum SNP quality score (Qcall) required to include a position as a putative SNP. For more information on the several ways to set P not ref, see the topic Filter based on P not ref.
  • Depth – The minimum depth of coverage required to include a position as a putative SNP.
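
To illustrate what "soft" filtering means in practice, here is a minimal Python sketch. It is not SeqMan NGen code: the Snp record, its field names, the soft_filter function and all threshold values other than the default variant percentage of 5 are hypothetical. The point is only that SNPs failing a threshold are left out of the displayed list while the full list stays intact.

```python
from dataclasses import dataclass

@dataclass
class Snp:
    position: int
    variant_pct: float  # percentage of non-reference bases at this position
    p_not_ref: float    # SNP quality score (hypothetical scale)
    depth: int          # depth of coverage at this position

def soft_filter(snps, min_variant_pct, min_p_not_ref, min_depth):
    """Return the SNPs that meet all three thresholds; the input list is not modified."""
    return [
        s for s in snps
        if s.variant_pct >= min_variant_pct
        and s.p_not_ref >= min_p_not_ref
        and s.depth >= min_depth
    ]

# Made-up example data
all_snps = [
    Snp(101, 48.0, 99.0, 60),
    Snp(250, 3.0, 99.0, 80),   # below the variant-percentage threshold
    Snp(377, 25.0, 99.0, 8),   # below the depth threshold
]

# Only the SNP at position 101 passes; all_snps still holds all three records.
shown = soft_filter(all_snps, min_variant_pct=5, min_p_not_ref=90, min_depth=20)
```

Changing a threshold and calling soft_filter again brings hidden SNPs back, which is what makes this kind of filter reversible.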


Fixed Variant Filters:

This section lets you specify permanent “hard” filters for SNP data. SNPs that do not meet thresholds specified in this section are permanently deleted without saving, and will not be displayed at any point downstream. Enter values for:

  • Minimum variant percentage – The minimum percentage of non-reference bases required to call a SNP. When it performs SNP passes, SeqMan NGen will include regions in an assembly that have coverage less than or equal to the specified value. The default value is 5. A non-zero value is recommended when using Ion Torrent data, working with larger genomes, or doing population studies. Very low values lead to larger files but do not necessarily result in better SNP calls.
  • P not ref – The minimum SNP quality score (Qcall) required to include a position as a putative SNP. For more information on the several ways to set P not ref, see the topic Filter based on P not ref.
  • Minimum variant count – The minimum number of non-reference bases required to call a SNP. When it performs SNP passes, SeqMan NGen will include regions in an assembly that have coverage less than or equal to the specified value.
  • Minimum base quality score – Bases with a quality score below this value are not considered.
  • Minimum strand coverage – The minimum number of reads from each strand required to call a variant at a given position.
  • Maximum strand bias – Strand bias (SB) for a SNP is the bias toward the SNP appearing on one strand versus the other. It is measured relative to the strand bias in the assembly at the location of the SNP. For example, in a column with 60 forward reads and 40 reverse reads, a SNP with 6 variant bases on the forward strand and 4 on the reverse strand would be unbiased.

SB is given by the formula: SB = |SNP% f - SNP% r| / Total SNP%

…where SNP% f and SNP% r are the percentages of reads containing the variant on the forward (top) and reverse (bottom) strands, respectively, and Total SNP% is the total percentage of reads containing the variant. Because SB is calculated as an absolute value, it is never negative.

The following table describes different SB thresholds:

SB Threshold                   Description
0                              Perfectly balanced (unbiased) strands. Reads with variants are present on both strands, and variants appear equally on both strands.
Between 0 and 1 (exclusive)    As the threshold approaches 1, more variants are called whose variant-containing reads are unbalanced between strands at that position.
1                              All variant-containing reads are on a single strand.
  • Bases to mask at ends of reads – The specified number of bases from both the 5’ and 3’ ends of each read will be masked from the SNP caller and will not be considered during variant calling.
  • Bayesian-based removal of heterozygous indels – Check this box to turn on H-factor, a Bayesian-based model that excludes heterozygous calls. If you want to view the MID column in the ArrayStar SNP Report, you must check this box. By default, the box is unchecked.
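
The strand bias formula given under Maximum strand bias can be checked against the worked example (60 forward and 40 reverse reads, with 6 and 4 variant bases) using a short Python sketch. This is an illustration, not SeqMan NGen's implementation. The function and parameter names are hypothetical, and the sketch assumes that "Total SNP%" means variant reads divided by all reads in the column and that both strands have at least one read. Because that reading of the denominator is an assumption, values for strongly biased columns may differ from what the software reports.

```python
def strand_bias(fwd_reads, rev_reads, fwd_variant, rev_variant):
    """SB = |SNP% f - SNP% r| / Total SNP% (denominator interpretation is an assumption)."""
    pct_f = 100.0 * fwd_variant / fwd_reads    # SNP% f: variant percentage on the forward strand
    pct_r = 100.0 * rev_variant / rev_reads    # SNP% r: variant percentage on the reverse strand
    # Total SNP%: variant percentage over all reads in the column (assumed meaning)
    total_pct = 100.0 * (fwd_variant + rev_variant) / (fwd_reads + rev_reads)
    return abs(pct_f - pct_r) / total_pct

# Worked example from the text: 10% of reads carry the variant on each strand,
# so the numerator is 0 and the SNP is unbiased.
sb = strand_bias(fwd_reads=60, rev_reads=40, fwd_variant=6, rev_variant=4)
```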

****************************

When you have made the desired selections in this tab of the Assembly Options dialog, click another available tab to make changes there. If you don’t need to make further changes, click OK to close the Advanced Assembly Options dialog and return to the Assembly and Signal Processing or Assembly Options screen.
