End-Trimming Based on Averaged Quality Scores

SeqMan Pro uses averaged quality scores (Q/n) to identify regions of poor-quality data at the ends of sequences. An averaged quality score is the mean of the quality scores, Q, over a window of 21 bases, and it is assigned to the base at the center of that window. Averaging smooths out base-to-base fluctuations and quantifies the general quality of the data in a region. To perform quality end-trimming, a threshold is set and the longest contiguous run of bases whose Q/n values all meet the threshold is identified. The below-threshold ends on either side of this high-quality region are trimmed off before assembly.
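
The procedure described above can be sketched in a few lines of Python. This is only an illustration of the idea, not SeqMan Pro's actual implementation: the function names `averaged_quality` and `trim_bounds` are hypothetical, "meeting the threshold" is assumed to mean greater than or equal to it, and the handling of bases near the ends of the read (where a full 21-base window does not fit) is an assumption, here a window truncated to the bases that exist.

```python
from typing import List, Tuple


def averaged_quality(q: List[int], window: int = 21) -> List[float]:
    """Centered moving average of quality scores (Q/n).

    Assumption: near the ends of the read the window is truncated to the
    bases that actually exist; the source does not specify edge handling.
    """
    half = window // 2  # 10 bases on either side for a 21-base window
    averaged = []
    for i in range(len(q)):
        lo = max(0, i - half)
        hi = min(len(q), i + half + 1)
        averaged.append(sum(q[lo:hi]) / (hi - lo))
    return averaged


def trim_bounds(q: List[int], threshold: float, window: int = 21) -> Tuple[int, int]:
    """Return (start, end), half-open, of the longest run of bases whose
    averaged score is >= threshold. Bases outside the run would be trimmed.
    Returns (0, 0) if no base meets the threshold.
    """
    averaged = averaged_quality(q, window)
    best = (0, 0)
    run_start = None
    # A sentinel below any threshold closes a run that reaches the last base.
    for i, score in enumerate(averaged + [float("-inf")]):
        if score >= threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start > best[1] - best[0]:
                best = (run_start, i)
            run_start = None
    return best


if __name__ == "__main__":
    # Made-up scores: poor ends around a good middle region.
    scores = [5] * 10 + [40] * 50 + [5] * 10
    start, end = trim_bounds(scores, threshold=20, window=5)
    print(f"keep bases {start}..{end - 1}; trim {start} from the 5' end")
```

In this sketch the first returned index is the number of bases trimmed from the 5' end, and the length of the read minus the second index is the number trimmed from the 3' end.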

 

In the example below, the quality scores, Q, and the averaged quality scores, Q/n, are graphed for the 5’ end of a 794-base-pair sequence. A dashed horizontal line marks the quality end-trimming threshold. When the averaged scores, Q/n, are compared to the threshold, the first 14 bases fall below it and are trimmed from the 5’ end of this sequence.

image8

Poor-quality data at the ends of sequences often contains miscalled bases, which produce mismatches in alignments with other sequences. If the number of mismatches is high enough that SeqMan Pro’s Minimum Match Percentage threshold is not met, the sequences will not be assembled into the same contig. Trimming the poor-quality data from the ends of sequences therefore allows better and more complete assembly.

The image below compares the raw quality scores, Q, with the averaged scores, Q/n, for the 5’ end of a sequence.

image7