The SeqMan NGen layout algorithm relies on unique subsequences of bases, or mers, which occur in overlapping regions of fragment reads. Mers that are common to two or more fragment reads are aligned to determine the overall layout of reads. Overlapping reads have many mers in common, but only a few mers per overlapping region are needed to identify the overlap. These mers are called mer tags. The use of mers to tag fragments and identify overlaps is illustrated in the following figure:
As shown in the above figure, a 54bp original DNA sequence is covered by five overlapping fragment reads. The 6-mer tags for each fragment read are underlined. Matching mer tags are aligned to determine the layout of the reads.
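The idea of locating overlaps through shared mers can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions (error-free reads, every mer indexed rather than a few selected tags), not SeqMan NGen's actual implementation. The function names and the toy sequence are hypothetical.

```python
from collections import defaultdict

def mers(read, k):
    """Yield (offset, mer) for every k-length subsequence of a read."""
    for i in range(len(read) - k + 1):
        yield i, read[i:i + k]

def shared_mer_offsets(reads, k=6):
    """Map each pair of reads to the relative offsets implied by shared mers."""
    index = defaultdict(list)  # mer -> [(read_id, offset), ...]
    for rid, read in enumerate(reads):
        for off, mer in mers(read, k):
            index[mer].append((rid, off))

    overlaps = defaultdict(set)
    for hits in index.values():
        for a, off_a in hits:
            for b, off_b in hits:
                if a < b:
                    # Read b starts (off_a - off_b) bases after read a starts.
                    overlaps[(a, b)].add(off_a - off_b)
    return overlaps

# Toy data: three reads cut from one made-up 32 bp sequence.
genome = "ACGTTGCAAGCTTAGGCTAACCGTATCGGATC"
reads = [genome[0:16], genome[8:24], genome[16:32]]
overlaps = shared_mer_offsets(reads, k=6)

for pair, offsets in sorted(overlaps.items()):
    print(pair, sorted(offsets))
```

When every shared mer between two reads implies the same relative offset, the reads can be laid out consistently at that offset. Conflicting offsets for one pair would suggest that at least one of the shared mers is not unique in the original sequence.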
The usefulness of mer tags depends on the ability of SeqMan NGen to choose mers that are most likely to occur only once in the original DNA sequence. Mers that occur in repeated regions should be avoided, because they can cause fragment reads to be incorrectly aligned together.
Three parameters are involved in choosing mer tags: Match Size, Repeat Handling, and Match Spacing. All of these parameters can be adjusted in the Advanced Assembly Options dialog.
The Match Size and Repeat Handling parameters help to choose tags that are most likely to be unique in the original DNA sequence. Match Size sets the length of the mers. The longer the mer, the higher the probability that it is unique. Repeat Handling parameters help to identify which mers are not likely to be unique. If a mer occurs more often than expected in the dataset, the mer may be part of a repeated region.
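The following Python sketch illustrates both ideas. The first function estimates how often a given mer would occur by chance, assuming a random sequence with equally frequent bases (a simplification that real genomes do not satisfy). The second flags mers whose count in the dataset is well above the typical count. The median-based threshold and the `repeat_factor` parameter are assumptions made for illustration; they are not the Repeat Handling settings that SeqMan NGen uses.

```python
from collections import Counter
from statistics import median

def expected_random_hits(sequence_length, k):
    """Expected occurrences of one specific k-mer in a random sequence,
    assuming the four bases are independent and equally likely."""
    return (sequence_length - k + 1) / 4 ** k

def count_mers(reads, k):
    """Count every k-mer across all reads in the dataset."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def likely_repeats(counts, repeat_factor=2.0):
    """Flag mers seen more than repeat_factor times the median count.
    The threshold rule is hypothetical, chosen only for this sketch."""
    typical = median(counts.values())
    return {mer for mer, n in counts.items() if n > repeat_factor * typical}

# Longer mers are far less likely to recur by chance in a 1 Mbp sequence.
for k in (6, 12, 18):
    print(k, expected_random_hits(1_000_000, k))

# Toy dataset in which "AAAA" occurs in every read.
reads = ["AAAACCGT", "AAAAGGTC", "AAAATTCG"]
counts = count_mers(reads, k=4)
repeats = likely_repeats(counts)
print(repeats)
```

In this toy dataset every mer except `AAAA` is seen once, so `AAAA` stands out as a candidate repeat and would be a poor choice of tag.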
Match Spacing specifies the preferred distance between mer tags. The smaller the Match Spacing value, the more tags are chosen per read, and the more memory and time the assembly requires. If a fragment read is shorter than the Match Spacing value, multiple mer tags are still chosen for the read.
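The effect of tag spacing on the number of tags can be sketched as follows. This is a minimal illustration with a hypothetical `choose_tags` function: it walks along a read and picks the next acceptable mer once the preferred distance has passed. It does not model how SeqMan NGen handles reads shorter than the Match Spacing value, and the selection rule itself is an assumption.

```python
def choose_tags(read, k, spacing, excluded=frozenset()):
    """Pick mer tags roughly `spacing` bases apart, skipping excluded mers
    (for example, mers flagged as likely repeats)."""
    tags = []
    next_allowed = 0
    for i in range(len(read) - k + 1):
        mer = read[i:i + k]
        if i >= next_allowed and mer not in excluded:
            tags.append((i, mer))
            next_allowed = i + spacing
    return tags

read = "ACGTTGCAAGCTTAGGCTAACCGTATCGGATC"  # made-up 32 bp read

sparse = choose_tags(read, k=6, spacing=12)
dense = choose_tags(read, k=6, spacing=4)

print(len(sparse), [pos for pos, _ in sparse])
print(len(dense), [pos for pos, _ in dense])
```

A smaller spacing yields more tags for the same read. Each tag has to be stored and compared against the tags of other reads, which is consistent with the higher memory and time cost described above.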