In addition to SNPs and small insertions and deletions, genetic variation can also involve large-scale rearrangements. These rearrangements may include large insertions and deletions, inversions, and translocations, collectively known as structural variations (SVs). To view a tabular report of the structural variations found in an assembly, open the assembly in SeqMan Pro and select Contig > Structural Variation Report.
The rest of this topic describes how these structural variations were detected during assembly with SeqMan NGen.
During a templated data assembly, SeqMan NGen automatically detects insertions or deletions (indels) greater than 10 bp based on a combination of two data types:
Coverage – read depth can be suggestive of larger indels, including duplications or the collapse of a duplication.
Split reads – reads spanning a deletion in the new genome relative to the reference genome can be “split” into two segments that match discontinuous regions of the reference. For example, the following split read alignment indicates a 35 bp deletion in the new genome:
Split: AGGCTGACCTC GACTAGCA
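The arithmetic behind a split read can be sketched as follows. This is only an illustration, not SeqMan NGen's code: the `Segment` class, the function name, and the reference coordinates are hypothetical, chosen so that the 11-base and 8-base segments from the example above sit 35 reference bases apart.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One aligned piece of a split read (hypothetical helper type)."""
    ref_start: int  # 1-based reference position of the segment's first base
    ref_end: int    # 1-based reference position of the segment's last base


def implied_deletion_length(left: Segment, right: Segment) -> int:
    """Count the reference bases skipped between two split-read segments."""
    if right.ref_start <= left.ref_end:
        raise ValueError("segments overlap or are out of order on the reference")
    return right.ref_start - left.ref_end - 1


# Assumed coordinates: the 11-base segment covers reference positions
# 1001-1011 and the 8-base segment resumes at positions 1047-1054.
left = Segment(ref_start=1001, ref_end=1011)
right = Segment(ref_start=1047, ref_end=1054)
print(implied_deletion_length(left, right))  # prints 35
```

The reference bases that lie between the two segments (1012 through 1046 in this made-up case) are the ones absent from the new genome.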
The SeqMan NGen algorithm requires that four criteria be met for splitting a read:
- At least 20 bases on each “half” of the split must match the reference. This means that reads must be at least 40 bp long, though in practice they should be longer than 60 bp.
- The first mer match must be within 10 bases of the start of the read, and the final mer match must be within 10 bases of the end of the read. This increases the likelihood that the entire read will align after splitting.
- The distance between the two closest mers on either side of the split must be no more than 20% of the total read length. For example, in a 100-base read where bases 5-30 make up the mer match on the 5’ “half” of the read, the first mer match on the 3’ half must start between bases 31 and 50 (30 + (100 × 0.2) = 50) of the read. This relatively simple requirement allows SNPs or sequencing errors near the actual split to be tolerated and resolved during alignment.
- The two “halves” must be aligned in the same orientation.
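The four criteria above can be expressed as a single check. The sketch below is a simplified reading of them, not the actual SeqMan NGen implementation: the `MerMatch` type, the `can_split` function, and its parameter names are hypothetical, and each “half” is assumed to be represented by one contiguous mer match given in 1-based read coordinates. “Within 10 bases” is interpreted here as at most 10 unmatched bases at each end of the read.

```python
from dataclasses import dataclass


@dataclass
class MerMatch:
    """A contiguous match between part of a read and the reference (hypothetical)."""
    read_start: int  # 1-based read position where the match begins
    read_end: int    # 1-based read position where the match ends
    forward: bool    # True if the match is in forward orientation


def can_split(read_len: int, five: MerMatch, three: MerMatch,
              min_half: int = 20, edge: int = 10, max_gap_pct: int = 20) -> bool:
    """Return True if the two matches satisfy all four splitting criteria."""
    # 1. At least min_half bases on each half must match the reference.
    if five.read_end - five.read_start + 1 < min_half:
        return False
    if three.read_end - three.read_start + 1 < min_half:
        return False

    # 2. The first match starts near the read start; the last ends near the read end.
    if five.read_start - 1 > edge or read_len - three.read_end > edge:
        return False

    # 3. The 3' match must begin after the 5' match ends, and no more than
    #    max_gap_pct percent of the read length past it (integer arithmetic
    #    avoids floating-point rounding at the boundary).
    gap = three.read_start - five.read_end
    if gap < 1 or gap * 100 > max_gap_pct * read_len:
        return False

    # 4. Both halves must align in the same orientation.
    return five.forward == three.forward


# The 100-base example from the text: the 5' match covers bases 5-30,
# so the 3' match may start anywhere from base 31 to base 50.
five = MerMatch(read_start=5, read_end=30, forward=True)
print(can_split(100, five, MerMatch(50, 98, True)))  # True
print(can_split(100, five, MerMatch(51, 98, True)))  # False
```

The defaults mirror the thresholds stated in the list; they are exposed as parameters only to make the sketch easier to experiment with.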
In practice, two copies of the read are given to the aligner: one seeded with the 5’ mer match and the other with the 3’ mer match. The aligner then extends the alignment on both sides of each copy, and then trims each copy to maximize the final alignment score. It is the final trimmed internal position for each copy that is reported in SeqMan Pro’s Structural Variation Report.