By default, SeqMan NGen uses a local match percentage, which requires that the minimum match percentage be met in every 50-base window of the overlap. You can change the size of this window by specifying a different value for the match window parameter.
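The local check can be sketched in Python as below. This is a minimal illustration, not SeqMan NGen's implementation. The function name `local_match_ok` is hypothetical, each aligned column is reduced to a match/mismatch flag (gap handling is ignored), and the window is assumed to slide one column at a time.

```python
def local_match_ok(matches, min_match_pct=80, window=50):
    """Hypothetical sketch: True if every `window`-column stretch meets min_match_pct.

    `matches` holds one bool per aligned column (True = the bases agree).
    Assumes the window slides one column at a time and that an overlap
    shorter than `window` is scored as a single window.
    """
    if not matches:
        raise ValueError("empty alignment")
    window = min(window, len(matches))
    count = sum(matches[:window])  # matching columns in the first window
    if count * 100 < min_match_pct * window:
        return False
    for i in range(window, len(matches)):
        # Slide right by one column: add the new column, drop the oldest one.
        count += matches[i] - matches[i - window]
        if count * 100 < min_match_pct * window:
            return False
    return True


# Toy overlap: 60 matches, a run of 11 mismatches, then 30 matches (about 89% overall).
overlap = [True] * 60 + [False] * 11 + [True] * 30
print(local_match_ok(overlap))              # False: one 50-column window has only 39 matches (78%)
print(local_match_ok(overlap, window=100))  # True: every 100-column window has 89 matches
```

In this toy case a larger window averages the clustered mismatches over more columns, so the same overlap passes; a smaller window makes the check stricter about local runs of mismatches.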
The following example involves a repeated region.
A genome fragment contains two repeated regions, labeled A and A’, and two unique regions, labeled B and C.
When the fragment is sequenced, one sequence contains parts of regions A and B, and another contains parts of regions A’ and C.
In this example, the minimum match percentage is 80%. When the two sequences are aligned, the 400 bases in the overlapping A and A’ regions match 100%, while the 200 bases in the overlapping B and C regions match only 42%. Over the entire alignment, 484 of 600 bases match (400 + 84), a global match percentage of 81%, which on its own would pass the 80% threshold.
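The global figure is simply a length-weighted average of the two overlaps. The short Python calculation below reproduces it from the numbers given in the example; the variable names are illustrative only.

```python
# Figures from the example: (aligned length, fraction of matching bases)
segments = {
    "A/A' overlap": (400, 1.00),
    "B/C overlap": (200, 0.42),
}

matched = sum(round(length * frac) for length, frac in segments.values())
total = sum(length for length, _ in segments.values())
global_pct = 100 * matched / total

print(matched, total, round(global_pct))  # 484 600 81
print(global_pct >= 80)                   # True: the global check alone would pass
```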
However, SeqMan NGen checks the match percentage in every 50-base window of the alignment. The alignment below shows the last 36 overlapping bases of A and A’ and the first 18 overlapping bases of B and C, with each mismatch in the overlap marked by an X beneath the alignment. The first 50 bases shown contain 41 matches, a match percentage of 82%. Because this is above the 80% threshold, the next 50-base window is checked, and its match percentage is also 82%.
Successive 50-base windows are checked along the overlap as long as each one stays at or above the threshold. In this case, the alignment fails once the window extends far enough into the overlap of the unique regions B and C that the match percentage drops to 78%. As a result, the two sequences are not assembled into the same contig, which is the correct outcome for this data set.
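The following Python sketch shows why the global and local checks disagree for this data set. The exact mismatch positions in the B/C overlap are not given, so the code shuffles 84 matches among the 200 B/C columns as a hypothetical stand-in, and `window_pcts` is likewise a hypothetical helper. The outcome does not depend on the arrangement: the four non-overlapping 50-column windows inside B/C share only 84 matches, so at least one of them holds 21 or fewer (42% or less) and the local check must fail.

```python
import random

def window_pcts(matches, window=50):
    """Hypothetical helper: match percentage of every window, sliding one column at a time."""
    return [100 * sum(matches[i:i + window]) / window
            for i in range(len(matches) - window + 1)]

# Hypothetical per-column pattern with the example's totals:
# 400 identical A/A' columns, then 200 B/C columns of which 84 match.
bc = [True] * 84 + [False] * 116
random.Random(0).shuffle(bc)  # stand-in: the real mismatch positions are not given
columns = [True] * 400 + bc

global_pct = 100 * sum(columns) / len(columns)
lowest = min(window_pcts(columns))
print(f"global match: {global_pct:.1f}%")   # global match: 80.7%
print(f"lowest 50-column window: {lowest:.0f}%")
print("passes local check:", lowest >= 80)  # passes local check: False
```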