The Variants Summary Report

To access the Variants Summary Report for a selected contig, use Variant > Variant Report, and then click on the Variants Summary tab. (In BAM-based projects, this is the only tab available.)

 

If no contigs have been selected, you must first select one (or more) from the Project Summary window before opening the report. To select all contigs, click on the “Unlocated Contigs” header in the Project Summary window. The variant report will then contain information on all variants in the project.

 

For information about items in the table header, see Working with Variant Reports.

 

 

Each row in the Variants Summary report summarizes information for all of the variant bases in an aligned column. If you are viewing a SeqMan NGen assembly, only variants meeting the SNP filter stringency (High, Medium or Low) that you specified in the SeqMan NGen wizard are displayed by default. Variants filtered out using the SNP filter stringency setting were not removed from the assembly, and can be made visible again, if desired, by changing SeqMan Pro’s Variant filtering parameters.

 

Different subsets of columns are available for SeqMan Pro (.sqd) assemblies, represented in the following table by “S,” and BAM-based assemblies, represented by “B.”

 

Note: The wording used for items in the columns changes dynamically based on the column width. Depending on the column widths you are using, you may see abbreviated versions of the wording discussed in the table below.

 

Column Name

Description

In .assembly projects (but not .sqd projects), variants in adjacent columns are coalesced into a single insertion or deletion if they are of the same type, and if at least 80% of the reads with the called variant in one column have a variant in the adjacent column. Each coalesced multiple-base insertion or deletion can be opened to reveal individual variants by clicking the corresponding triangle to the left of the SNP column. After clicking a triangle, information on each position of the insertion/deletion is displayed in a separate row.
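The coalescing rule above can be sketched in a few lines. This is only an illustration, not SeqMan Pro's implementation: the function name, the use of read-ID sets, and the choice to measure the 80% overlap from the first column are all assumptions.

```python
from typing import Set


def should_coalesce(type_a: str, reads_a: Set[str],
                    type_b: str, reads_b: Set[str],
                    min_shared: float = 0.80) -> bool:
    """Decide whether two adjacent variant columns would be merged.

    reads_a / reads_b hold the IDs of reads carrying the variant in each
    column (hypothetical inputs). The overlap is measured relative to
    column A, which is an assumption about how the 80% rule is applied.
    """
    if type_a != type_b or not reads_a:
        return False
    shared_fraction = len(reads_a & reads_b) / len(reads_a)
    return shared_fraction >= min_shared
```

For example, two adjacent deletion columns in which 4 of the 5 variant reads from the first column also carry the deletion in the second column would be merged under these assumptions.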

 

SNP

(S, B)

A legend for this column appears in the header.

 

 

Click one or more times on a symbol in the column to change the evaluation for all variants in the aligned column.

Group

Specifies whether the variant was called from the NGS or the Sanger data. The NGS group is named using the Read technology selected in the SeqMan NGen wizard.

 

If you are investigating a putative variant that was identified in the initial run of the NGS data, you can find that position here and open it in the Alignment View to check whether there is a Sanger-called variant at the same position. See View Sanger Validation Results for more information.

MID

(B, MID-data assemblies only)

This column allows variants to be displayed separately for each MID sample. If the same variant (same base change, same position) occurs in more than one sample, there is an entry for each sample. Similarly, if the same position is affected but the base change differs between samples, there is a separate entry for each sample. In both cases, the values in the other columns of an entry correspond to that sample only, not to all of the reads at that position.

Contig ID

(S, B)

The name of the contig in which the variant was found.

Contig Pos

(S, B)

The position of the variant in the contig, including gaps.

Ref Pos

(S, B)

The position of the variant in the reference sequence, excluding gaps. Coordinates matching entries in the VCF Variant table are shown. For deletions, Ref Pos is the genomic coordinate of the first deleted base. For insertions, Ref Pos is the genomic coordinate of the base preceding the insertion.

Type

(S, B)

Specifies the variation type as SNP, Del (deletion) or Ins (insertion). For .assembly files and certain SQD files (e.g., from de novo or special templated workflows), pink typeface may be used to indicate non-synonymous variants.

Ref Base

(S, B)

The reference sequence base in this position. For multi-base deletions, the reference sequence of the string is shown, beginning with the base at the Ref Pos coordinate. If there is no reference sequence present, the Ref Base column displays the most frequently occurring non-ambiguous base at this position. If no such base exists, the consensus base at this position is shown.

Called Base

(S, B)

The dominant variant in the aligned column. In the case of a heterozygote call, both bases at the position are shown, separated by a vertical bar. For multi-base insertions, the inserted string is shown. For multi-base deletions, the deleted bases are represented with dashes (-).

Genotype

(B, only if Diploid or Haploid SNP detection used)

When the “Diploid” SNP detection method is used in a SeqMan NGen assembly, there are four possibilities: 1) homozygous variant (both alleles have the same base and it is different from the reference), 2) reference (both alleles have the same base and it is the same as the reference), 3) heterozygous reference (two different alleles are called, one with the same base as the reference, the other with a variant base), and 4) heterozygous not reference (two different alleles, neither of which matches the reference base). It is quite rare for the reference case to occur in the table. This only happens in cases where there is sufficient evidence of the possibility of a variant to pass the filtering threshold, but where the evidence is still quite weak. These cases are usually eliminated by even modest filtering. When the “Haploid” SNP detection method is used, only variant and reference are possible.

 

Note: In this column, if one or more of the adjacent variants is called as a heterozygote, the coalesced variant is also called a heterozygote. Therefore, for a coalesced variant to be called homozygous, all positions must be called homozygous.
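The four diploid genotype cases, and the rule for coalesced variants in the note above, can be expressed as a short sketch. The function names and the representation of a call as two allele bases are assumptions made for illustration; the returned labels are the ones used in this topic.

```python
from typing import Iterable


def classify_genotype(ref: str, allele1: str, allele2: str) -> str:
    """Map a diploid call to one of the four cases described above."""
    if allele1 == allele2:
        return "reference" if allele1 == ref else "homozygous variant"
    if ref in (allele1, allele2):
        return "heterozygous reference"
    return "heterozygous not reference"


def coalesced_is_heterozygous(child_genotypes: Iterable[str]) -> bool:
    """A coalesced variant is heterozygous if any child position is."""
    return any(g.startswith("heterozygous") for g in child_genotypes)
```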

Splice

(B)

Splice site variations are changes to the 5' ("donor") or 3' ("acceptor") consensus splice site sequences. The DNA sequence for the donor is 5'-AGGTRAGT-3' and for the acceptor is 5'-YYYYYYYYCAGGT-3'. Note that the AG dinucleotide on the 5' end of the donor and the GT dinucleotide on the 3' end of the acceptor are within the exon. Therefore, changes at these positions can also cause changes in the amino acid sequence of the resulting proteins. Changes in the intron portion of the splice site are marked as "Splice" in the column while those in the exon portion of the site are labeled "Splice in CDS." Only the position where the change occurs is considered, not the identity of the base.
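As a rough illustration of the position-only rule described above, the following sketch labels a position relative to a single exon. It assumes a forward-strand internal coding exon with 1-based inclusive coordinates and an exon longer than four bases; the function name and inputs are hypothetical. The window sizes are derived from the two consensus strings given in the text.

```python
from typing import Optional

DONOR = "AGGTRAGT"          # consensus from the text; first 2 bases are exonic
ACCEPTOR = "YYYYYYYYCAGGT"  # consensus from the text; last 2 bases are exonic
EXON_BASES = 2
DONOR_INTRON = len(DONOR) - EXON_BASES
ACCEPTOR_INTRON = len(ACCEPTOR) - EXON_BASES


def splice_label(pos: int, exon_start: int, exon_end: int) -> Optional[str]:
    """Return the Splice column label for pos, or None if outside both sites."""
    in_donor_exon = exon_end - EXON_BASES < pos <= exon_end
    in_acceptor_exon = exon_start <= pos < exon_start + EXON_BASES
    if in_donor_exon or in_acceptor_exon:
        return "Splice in CDS"
    in_donor_intron = exon_end < pos <= exon_end + DONOR_INTRON
    in_acceptor_intron = exon_start - ACCEPTOR_INTRON <= pos < exon_start
    if in_donor_intron or in_acceptor_intron:
        return "Splice"
    return None
```

Only the coordinate is tested, never the base itself, which mirrors the statement that the identity of the base is not considered.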

Impact

(B)

The impact of the variant or indel on the genome, displayed as one of the following values:

 

Type

Description

Synonymous

No amino acid changes.

Non Synonymous

Amino acid substitution only.

Nonsense

Amino acid to translational stop.

Frameshift

An indel within a coding region whose length is not a multiple of 3, thereby changing the reading frame.

No Start

A change that disrupts the start codon.

No Stop

A change that converts a stop codon to an amino acid, and thereby extends the reading frame.

Inframe Insertion

An insertion within a coding region whose length is divisible by 3. The type is followed by the word Conservative if the insertion occurs between two codons, and Disruptive if it occurs within a codon.

Inframe Deletion

A deletion within a coding region whose length is divisible by 3. The type is followed by the word Conservative if the deletion begins and ends between codons (removing whole codons), and Disruptive if it begins and ends within codons.

 

When you sort by the Impact column, rows are ordered by severity. For example, a Frameshift is more severe than a Nonsense change.
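The distinction between Frameshift, Inframe Insertion and Inframe Deletion depends only on the indel length and where it falls relative to codon boundaries. The sketch below illustrates that arithmetic. It is a simplification: the function name and the `cds_offset` parameter are assumptions, and effects on start or stop codons are ignored.

```python
def indel_impact(cds_offset: int, length: int, is_insertion: bool) -> str:
    """Classify an indel that lies entirely within a coding region.

    cds_offset is the number of CDS bases that precede the insertion
    point or the first deleted base (hypothetical input); a value
    divisible by 3 means the indel starts on a codon boundary.
    """
    if length % 3 != 0:
        return "Frameshift"
    kind = "Inframe Insertion" if is_insertion else "Inframe Deletion"
    qualifier = "Conservative" if cds_offset % 3 == 0 else "Disruptive"
    return f"{kind} {qualifier}"
```

For instance, a 9 bp deletion that starts two bases before the end of codon 28 has 82 CDS bases in front of it, so `indel_impact(82, 9, False)` yields a disruptive in-frame deletion.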

Homopolymer

(B)

Indicates whether the variant occurs within a homopolymeric run, which is defined as two or more identical bases in a row. When using Pacific Biosciences (PacBio) or Ion Torrent data, SeqMan Pro may not list all homopolymeric indels.

 

Note: When possible, insertions or deletions are placed at the 5’ end (top strand) of the run during alignment.
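Using the definition above (two or more identical bases in a row), a homopolymer check for a single position might look like the following sketch. The function name and the 0-based index are assumptions for illustration.

```python
def in_homopolymer(seq: str, index: int, min_run: int = 2) -> bool:
    """Return True if seq[index] lies in a run of >= min_run identical bases."""
    base = seq[index]
    start = index
    while start > 0 and seq[start - 1] == base:
        start -= 1  # extend the run to the left
    end = index
    while end + 1 < len(seq) and seq[end + 1] == base:
        end += 1    # extend the run to the right
    return end - start + 1 >= min_run
```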

SNP %

(S, B)

The percentage of the single most prevalent non-reference base in the aligned column. As a very general rule, significant variants tend to occur at 25% and higher.
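A minimal sketch of this calculation follows. It assumes the percentage is taken over all bases counted in the column, which the text does not state explicitly; the function name and the dictionary of per-base counts are hypothetical.

```python
from typing import Dict


def snp_percent(base_counts: Dict[str, int], ref_base: str) -> float:
    """Percentage of the single most prevalent non-reference base."""
    total = sum(base_counts.values())
    non_ref = [n for base, n in base_counts.items() if base != ref_base]
    if total == 0 or not non_ref:
        return 0.0
    return 100.0 * max(non_ref) / total
```

With 70 reference A bases, 25 G bases and 5 T bases in a column, only the G count contributes, giving 25%.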

P not ref

(B, only if Diploid or Haploid SNP detection used)

When the “Diploid” or “Haploid” SNP detection method was used in SeqMan NGen assembly, this column shows the probability that the called base at this position is not the reference base. For coalesced variants, this value is equal to the minimum value of all “child” values. The minimum allowed value is 30%.

Q call

(B, only if Diploid or Haploid SNP detection used)

When the “Diploid” or “Haploid” SNP detection method was used in SeqMan NGen assembly, this column shows the Phred-like quality score of the called genotype. It is a measure of the probability that the called genotype is correct.
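To help interpret Q call values, the sketch below converts between a quality score and an error probability using the standard Phred definition, Q = -10 * log10(P_error). Whether SeqMan NGen's "Phred-like" score follows this definition exactly is an assumption.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Probability that the call is wrong, given a Phred-scaled score."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p_error: float) -> float:
    """Phred-scaled score for a given probability of an incorrect call."""
    return -10 * math.log10(p_error)
```

Under this definition, a score of 30 corresponds to a 1 in 1000 chance that the called genotype is incorrect.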

Region Capture

(B)

Indicates whether the variant occurs within a region specified in the .BED or manifest file used. Values are Yes and No.

dbSNP ID

(S, B)

The dbSNP rs ID, if available, for positions with known variants. Double-clicking on the entry opens the corresponding page at dbSNP.

Cosmic

(B)

The Catalogue of Somatic Mutations in Cancer (COSMIC) ID for positions with known variants. Double-clicking on the entry opens the corresponding page at COSMIC. For human assemblies only.

GERP

(B, human genome assemblies only)

The Genomic Evolutionary Rate Profiling (GERP) score representing the calculated evolutionary constraint at that position. GERP data is automatically delivered when you download our human template package prior to performing a templated assembly in SeqMan NGen. To limit the size of the data file required, only positions with scores of 1.0 or greater are displayed.

 

GERP is a tool that provides a score for each position in the human genome that estimates whether that position is under purifying selection or not (Davydov et al. 2010). GERP uses alignments between the human genome and 33 other mammalian genomes to quantitate the position-specific constraint in terms of rejected substitutions, defined as the difference between the neutral rate of substitution and the observed rate, estimated by maximum likelihood. Substitutions in sites under selection are assumed to be more deleterious than those not under selection. Scores range from negative values to ~6. Positions with scores below or near zero are not under selection. Conversely, the more positive the score, the more constrained the position. GERP information can be useful in evaluating the impact of non-synonymous variants in coding regions and the impact of changes in or near promoter elements, among others.

User ID

(B)

Positions corresponding to a custom VCF Variant Table are labeled with the ID from that set.

Codon

(S; B only if Show Codon Bases & Distance to feature is checked)

When a translated feature is present on the reference sequence at the position of a variant, a codon change is displayed. The codon and amino acid translation is shown for the reference sequence and compared to the codon and amino acid translation for the selected variant. The position number of the amino acid change is also displayed. If more than one translated feature is present at the variant position, SeqMan Pro will use the first feature based on the current sorting in the Feature Table.

Coding Feature Distance

(S; B only if Show Codon Bases & Distance to feature is checked)

Shows whether variants are within or near a named feature, and the distance from that feature. For .assembly files and certain SQD files (e.g., from de novo or special templated workflows), the following color scheme may be used:

 

Color Scheme

Meaning

Gray + feature name + <within>

Variant is within the named feature.

Pink + arrow + feature name

Distance from the variant to the closest upstream coding feature.

Orange + arrow + feature name

Distance from the variant to the closest downstream coding feature.

 

Feature Type

(B)

For variants within a gene feature, the feature type is shown in the following order of precedence: CDS, mRNA, Gene. If Show Codon Bases & Distance to feature is selected, this column also contains a feature designation if the variant is within 150 bases of the nearest exon. Therefore, it is possible for a variant that is in a gene to also be listed as a CDS, mRNA, etc.

 

When Show Codon Bases & Distance to feature is checked, the Feature Type column will also show the distance to the nearby exon and an arrow indicating the direction of the feature.

 

Variant Location

Feature Type

Within a gene feature, but not included in an mRNA or CDS feature for that gene. (Variants within the intron portion of a splice site are indicated as CDS features.)

gene

Within an exon or splice site.

CDS

In the 5' and 3' untranslated portions of an mRNA.

mRNA
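The order of precedence for the Feature Type column (CDS, then mRNA, then gene) can be sketched as a simple lookup. The function name and the input list of overlapping feature types are assumptions; the sketch does not cover the special case of intronic splice-site positions or the 150-base proximity rule.

```python
from typing import Iterable, Optional

PRECEDENCE = ("CDS", "mRNA", "gene")  # highest precedence first


def reported_feature_type(overlapping_types: Iterable[str]) -> Optional[str]:
    """Pick the feature type to report for a variant position."""
    present = set(overlapping_types)
    for feature_type in PRECEDENCE:
        if feature_type in present:
            return feature_type
    return None
```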

Feature Name

(S, B)

If a variant is located within an annotated feature in the reference sequence, the feature type and name are displayed.

 

A single nucleotide change may sometimes be reported as affecting multiple overlapping features. These can include different overlapping genes on the same or opposite strands, as well as alternatively spliced messages from the same gene. In this case, SeqMan Pro produces multiple VCF Variant table entries at the same position, one for each reported feature. In cases where a variant affects a gene with multiple alternatively spliced messages, the isoform with the longest open reading frame (ORF) is used for the DNA Change and Protein Change columns (see below). A bracketed number follows the Feature Name to indicate which isoform from the Feature Table was used (e.g., TP53 [2]). This is controlled through the [true/false] SeqMan NGen scripting commands ShowCDSVariant and SNP_showAllFeatures. For more information on editing scripts, please see the Scripting Manual topic in the SeqMan NGen online help.

Note: If a non-gene feature (“mRNA”, “CDS”, etc.) exists in the template file, but has no corresponding “gene” feature, SeqMan NGen adds the “gene” feature automatically during assembly. The locations of any automatically added “gene” annotations are indicated by asterisks (*) in this column.

DNA Change

(B, only if Show Codon Bases & Distance to feature is unchecked)

Change(s) in the DNA sequence affecting either CDS features or splice sites are indicated using the nomenclature established by the Human Genome Variation Society.

 

A “c.” prefix, followed by coordinates taken from the ORF, denotes a change in a CDS feature. For example:

 

      Substitutions. Example: c.76A>C denotes that at nucleotide 76 an A is changed to a C.

 

      Insertions within coding regions. Example: c.76_77insT denotes that a T is inserted between nucleotides 76 and 77.

 

      Deletions within coding regions. Example: c.76_78delACT denotes an ACT deletion from nucleotides 76 to 78.

 

A “g.” prefix followed by genomic coordinates denotes a change in the intronic region of a splice site.

 

Note: When a multibase variant affects both the intron and exon portions of a splice site, it is represented under two separate entries: one with g. coordinates and the other with c. coordinates.
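The three "c." patterns shown above can be generated with simple string formatting. The helper names below are hypothetical and the sketch only reproduces the formats used in the examples in this topic; the single-base deletion form (one coordinate rather than a range) is an assumption based on the general HGVS convention.

```python
def hgvs_substitution(pos: int, ref: str, alt: str) -> str:
    """Format a substitution at ORF coordinate pos."""
    return f"c.{pos}{ref}>{alt}"


def hgvs_insertion(pos_before: int, inserted: str) -> str:
    """Format an insertion between pos_before and the following base."""
    return f"c.{pos_before}_{pos_before + 1}ins{inserted}"


def hgvs_deletion(first: int, deleted: str) -> str:
    """Format a deletion starting at ORF coordinate first."""
    if len(deleted) == 1:
        return f"c.{first}del{deleted}"
    last = first + len(deleted) - 1
    return f"c.{first}_{last}del{deleted}"
```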

 

Amino Acid Change

(B, only if Show Codon Bases & Distance to feature is unchecked)

The change(s) in the amino acid sequence, using the nomenclature established by the Human Genome Variation Society and The Sequence Ontology Project. This includes:

 

      Conservative in-frame insertions. Example: p.K2_M3insQSK denotes that the sequence GlnSerLys (QSK) was inserted between amino acids Lysine-2 (K) and Methionine-3 (M).

 

      Disruptive in-frame insertions. Example: p.C28delinsWV denotes a 3 bp insertion in the codon for Cysteine-28, generating codons for Tryptophan (W) and Valine (V).

 

      Conservative in-frame deletions. Example: p.(C28_M30del) denotes a deletion of three amino acids, from Cysteine-28 to Methionine-30.

 

      Disruptive in-frame deletions. Example: p.(C28_M30delinsL) denotes a 9 bp deletion including 2 bp from the codon for Cysteine-28 and 1 bp from the codon for Methionine-30 resulting in replacement of C28 to M30 with leucine (L).

Depth

(S, B)

The number of reads overlapping the aligned column. Since this calculation disregards bases below the quality threshold, the Alignment View may show a greater number of sequences than the Depth shown in the Variants Summary Report. The default quality threshold for assembly in SeqMan NGen is 5. The threshold can be changed either pre-assembly, in SeqMan NGen, or post-assembly, in SeqMan Pro.
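The effect of the quality threshold on Depth can be illustrated as follows. The function name and the list of per-read base qualities are assumptions; the default of 5 is the SeqMan NGen default mentioned above.

```python
from typing import Iterable


def column_depth(base_qualities: Iterable[int], min_quality: int = 5) -> int:
    """Count the bases in an aligned column that meet the quality threshold."""
    return sum(1 for q in base_qualities if q >= min_quality)
```

A column with five overlapping reads whose base qualities are 40, 3, 12, 5 and 4 would show five sequences in the Alignment View but a Depth of 3 under this sketch.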

Skew

When a maximum “strand bias” (also called “skew”) is set for a templated SeqMan NGen assembly (i.e., by setting the scripting parameter snp_maxStrandBias to ‘true’), SeqMan NGen calculates the strand bias for each variant. The results can be viewed in the Skew column of this table.

 

The strand bias for a variant is a bias in the variant appearing on one strand instead of the other. It is measured relative to the strand bias in the assembly at the location of the variant. For example, in a column with 60 forward reads and 40 reverse reads, 6 variant bases on the forward strand and 4 on the reverse strand would be unbiased and have a skew of zero.

 

SeqMan NGen calculates strand bias using the following formula:

 

Strand bias = |SNP%f - SNP%r| / SNP%

 

… where:

 

      SNP%f = Strand-specific SNP percentage for the forward strand

 

      SNP%r = Strand-specific SNP percentage for the reverse strand

 

      SNP% = Overall SNP percentage

 

Below are interpretations of several Skew numbers:

 

      0 = unbiased

 

      2 = all SNPs on one strand, where strands are equally abundant

 

      > 2 = SNPs are present in large numbers on a strand that is, itself, rare.

 

The maximum theoretical strand bias is equal to the depth of coverage. In practice, however, numbers over four are seldom seen, as they require such low variant percentages that they are unlikely to be called as variants.

 

When a variant occurs only on strands in one direction, no strand bias can be calculated. In this case, the variant is not filtered. This situation can be prevented by setting a Minimum strand coverage value in SeqMan NGen prior to assembly.
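The worked cases in this section (a skew of 0 for the 60/40 example, 2 when all variant bases are on one of two equally abundant strands, and a maximum equal to the depth of coverage) are all reproduced by taking the absolute difference of the two strand-specific SNP percentages and dividing by the overall SNP percentage. The sketch below uses that expression as an assumption and is not taken from SeqMan NGen itself. The function name and its count-based inputs are hypothetical, and returning None when one strand has no coverage is one reading of the statement that no strand bias can be calculated in that case.

```python
from typing import Optional


def strand_skew(fwd_variant: int, fwd_total: int,
                rev_variant: int, rev_total: int) -> Optional[float]:
    """Strand bias as |SNP%f - SNP%r| / SNP% (assumed formula)."""
    if fwd_total == 0 or rev_total == 0:
        return None  # reads in one direction only: skew is undefined
    total_variant = fwd_variant + rev_variant
    if total_variant == 0:
        return None  # no variant bases in the column
    snp_f = fwd_variant / fwd_total
    snp_r = rev_variant / rev_total
    snp = total_variant / (fwd_total + rev_total)
    return abs(snp_f - snp_r) / snp
```

For example, 10 variant bases that all fall on the forward strand of a column with 50 forward and 50 reverse reads give a forward percentage of 20%, a reverse percentage of 0% and an overall percentage of 10%, for a skew of 2.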

A, C, G, T Cnt

(S and B if Show Counts as a percent is unchecked)

The number of bases of this type called in the aligned column. A dash (-) represents the reference base.

A, C, G, T %

(S and B if Show Counts as a percent is checked)

The percent of bases of this type called in the aligned column. A dash (-) represents the reference base.

Deletion

(S, B)

The number of deleted bases in the aligned column.

K, Y, S, B, W, R, D, M, H, V, N Cnt

(B, with Show Counts as a percent unchecked)

The number of bases of this type called in the aligned column. A dash (-) represents the reference base.

K, Y, S, B, W, R, D, M, H, V, N %

(B, with Show Counts as a percent checked)

The percent of bases of this type called in the aligned column. A dash (-) represents the reference base.

 

To display all variants and negate any previous filtering, click the Show All button.

 

To display only those variants of most interest to you, click the Filter button. See Filtering Variants in the Reports for detailed information.

 

To navigate between variants in the sequences, open the Alignment View (Contig > Alignment View) and arrange the windows so you can see the Alignment View and the Variant Report simultaneously. Select a row in the Variant Report, then press the up or down arrow key on your keyboard to move to the previous or next variant in both the Variant Report and the Alignment View.

 

To locate the variant in the Alignment View, double-click on a row in the Variant Report. The Alignment View will open at the location of the selected variant base.