In addition to variants and small insertions and deletions, genetic variation can also involve large-scale rearrangements. These rearrangements may include large insertions and deletions, inversions, and translocations, collectively known as structural variations (SVs). During a templated data assembly with DNASTAR’s SeqMan NGen, potential insertions and deletions are coded as such in the assembly output. When the assembly project is opened in SeqMan Ultra, the encoded SV information can be viewed in tabular format in the Contig Structural Variation report. For information on accessing or performing tasks in this view, see Contig Report View.
This tabular view contains the following columns:
|Contig ID||The name of the contig in which the SV was found.|
|Contig Pos||The position where the SV was detected in gapped coordinates. For insertions, a single coordinate is given. For deletions, the first and last coordinates of the deleted region are given.|
|Ref Pos||The position where the SV was detected in ungapped reference coordinates.|
|Length||Total length of a deletion in bases.|
|Type||The type of SV: Ins (insertion), Del (deletion), or Indel (substitution).|
|Split Read Count||The number of “split reads” defining the breakpoints of a deletion.|
|Library||A number corresponding to the library with information in that row. This column is present only for assemblies with multiple mate pair libraries of different insert sizes. General information on each library can be seen on a per contig basis using the Contig > Contig Info command.|
|Pair Dist||The median distance between mate pair reads spanning the SV. For deletions, this distance is approximately the average distance between all pairs in the assembly plus the size of the deleted segment. For insertions smaller than the insert size of the mate pair library, this distance is approximately the average distance between all pairs in the assembly minus the size of the inserted segment. For insertions larger than the insert size of the mate pair library, no distance is reported because no pairs will span the insertion. For assemblies with multiple mate pair libraries of different insert sizes, a separate row for each library with spanning pairs is shown for each SV.|
|Pair Count||The number of mate pairs spanning the SV. For assemblies with multiple mate pair libraries of different insert sizes, a separate row for each library with spanning pairs is shown for each SV.|
|Coverage|| The average depth of coverage across a putative deleted or indel region. The Coverage value is the mean depth of coverage over all columns, between the two edges of the deletion, in the original (unsplit) assembly. A "good" deletion will normally have much lower coverage than the flanking regions of the assembly. In cases where the deleted region is composed of non-repetitive sequence, the Coverage value will typically be zero or near zero. In cases where the deleted region is composed of repetitive sequence (e.g. an insertion sequence [IS] element), this value will typically be some fraction of the average coverage of the entire assembly. The exact value will depend on how many instances of the repeat are in the reference and the genome being sequenced. Note that clicking a row in the Contig Structural Variation report will take you to the Alignment View column just prior to the left edge of the deletion. However, Coverage does not measure the depth of coverage of that column, as the column is excluded from the range over which the mean is taken.
Q: In the SV Report for a gap-closure workflow, why does the Coverage column display low values (e.g., 0-7) for the majority of reference positions?
A: Coverage refers to the read depth in the original templated assembly and is calculated by summing the total number of aligned bases in that region and then dividing by the length of the region. Coverage is therefore expected to be low for deletions and indels. It is important to note that the Structural Variation Report is static and does not change after being created in SeqMan NGen.
Q: Why doesn’t the Alignment view for an SV with a Coverage of 4 have four sequences of coverage?
A: That’s because the deleted region is an insertion sequence (IS) repeat that is present elsewhere in the reference and in the data set. Since this is repeated sequence, SeqMan NGen’s assembler sees the match and places only a portion of the repeated reads in that spot. We can tell the region is deleted by the edges that form on both ends, the number of split reads defining the deletion endpoints, the relatively low depth in the deleted area and the lack of left and right joining pairs (pairs with one end outside the deletion and the other end in the putative deleted region). In cases where the deleted region is composed of unique sequence, the coverage will be zero or very near zero.
|Feature||The feature(s) affected by the SV. For deletions affecting multiple features, the first and last feature are displayed separated by an ellipsis (e.g. polB…ilvH). To see a complete list of affected genes for a given SV, select the SV and use the Features > Show Feature Table command. Note that the Contig Structural Variation report does not explicitly show features of type "DNA_SPLIT," as this feature type merely denotes the presence of a structural variation.|
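
The arithmetic behind the Coverage and Pair Dist columns can be illustrated with a short sketch. This is not SeqMan NGen's implementation. The function names, the per-column depth list, and the use of the average pair distance as a stand-in for the library insert size are all assumptions made for illustration. The sketch also assumes that the first and last coordinates of the deleted region are both included in the mean.

```python
def deletion_coverage(depths, first_pos, last_pos):
    """Mean depth over the columns of a putative deleted region.

    depths    -- hypothetical list of per-column read depths, where
                 depths[0] is contig position 1 (gapped coordinates)
    first_pos -- first coordinate of the deleted region (1-based)
    last_pos  -- last coordinate of the deleted region (1-based, assumed inclusive)

    The column just before the left edge (first_pos - 1) is not included.
    """
    region = depths[first_pos - 1:last_pos]
    # Sum of aligned bases in the region divided by the region length.
    return sum(region) / len(region)


def expected_pair_dist(avg_pair_dist, sv_type, sv_length):
    """Approximate Pair Dist expected for an SV, or None if none is reported.

    avg_pair_dist -- average distance between all pairs in the assembly,
                     used here as a proxy for the library insert size (assumption)
    sv_type       -- "Del" or "Ins"
    sv_length     -- size of the deleted or inserted segment in bases
    """
    if sv_type == "Del":
        return avg_pair_dist + sv_length
    if sv_type == "Ins":
        # No pairs span an insertion that exceeds the insert size; the
        # equal-size case is treated the same way here (assumption).
        if sv_length >= avg_pair_dist:
            return None
        return avg_pair_dist - sv_length
    raise ValueError(f"unsupported SV type: {sv_type}")


# Made-up depths: flanks near 30x, deleted region (positions 3-6) near zero.
depths = [30, 28, 2, 0, 1, 1, 29, 31]
coverage = deletion_coverage(depths, 3, 6)
```

In this made-up example the flanking columns have a depth of about 30 while the region itself averages 1.0, which is the pattern described above for a deletion of non-repetitive sequence.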