Controlling Ambiguous Calls

Note: This topic is not applicable to BAM-based projects.

 

To make SeqMan Pro favor one of the four bases (A, G, C, T) over ambiguity codes (R, Y, W, S, etc.) when calling the consensus from sequence trace data, decrease the Evidence Percentage value in the Consensus Calling parameters.

 

Evidence Percentage is a variable in SeqMan Pro’s Consensus Calling parameters. This section explains how trace quality data are evaluated and how modifying this parameter affects consensus calling.

 

Evidence Percentage controls the stringency used by the Trace Quality Evaluation system to make unambiguous calls in the consensus sequence. Fluorescence trace data are imperfect in that the “primary peak” for a particular base in the trace data typically coincides with much smaller “secondary peaks” that represent one or more other bases. However, secondary peaks may occasionally reflect a substantial fraction of the fluorescence signal in a particular position due either to the presence of heterozygosity at that position or to an experimental artifact. When a sequence read originates from a heterozygote, the heterozygous position should in theory display two coincident peaks of the same amplitude—though this ideal is rarely observed in practice.

 

Adjusting the Evidence Percentage value lets you weigh the risk of miscalling a base against the risk of failing to flag a questionable consensus call or a genuinely heterozygous position.

 

Setting Evidence Percentage to a high value imposes a high stringency for unambiguously calling a base in the consensus, so ambiguous calls (represented by IUB codes) become more likely. If you expect to detect heterozygotes, you may want to increase the Evidence Percentage above the 50% default, but be aware that setting it too high might result in spurious heterozygous/ambiguous calls in the consensus.

 

Reducing the Evidence Percentage value reduces the stringency for calling bases unambiguously, so ambiguous consensus calls are less likely to be made. However, reducing the percentage too far increases the risk of making an unambiguous call in the consensus when the evidence for that call is equivocal, or when there is evidence that the position is heterozygous.

 

The Evidence Percentage is related to the trace evidence scores that SeqMan Pro computes internally for each position in the consensus. For example, suppose SeqMan Pro assigns scores for A, C, G, T and gap in one particular position of 30, 48, 200, 0 and 0 respectively. SeqMan Pro then determines how to call the consensus sequence as follows:

 

Each score is expressed as a percentage of the highest score. For the example above this yields 15% for A, 24% for C, 100% for G, 0% for T and 0% for a gap. G has the highest score (100%), and A, C, T and gap compete with G for inclusion in the call at 15%, 24%, 0% and 0% of that maximum, respectively.

 

The “competition-free percentages” are then calculated by subtracting the percentage value of each competing base from 100. In the example, this yields competition-free values of 85% for A, 76% for C, 100% for T and 100% for gap. Put another way, there is 100% confidence that the consensus should be G rather than T or gap, but only 85% confidence that it should be G rather than A, and 76% confidence that it should be G rather than C.
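The two calculations described so far can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the example, not SeqMan Pro's actual code: the function name and the dictionary layout are hypothetical, and giving the top-scoring base a competition-free value of 0% is an assumption made so that the same formula applies to every candidate.

```python
def competition_free_percentages(scores):
    """Return (relative, competition_free) percentages for one position.

    `scores` maps each candidate ("A", "C", "G", "T", "gap") to its trace
    evidence score. Hypothetical helper; assumes at least one score > 0.
    """
    top = max(scores.values())
    # Step 1: express each score as a percentage of the highest score.
    relative = {base: 100 * s / top for base, s in scores.items()}
    # Step 2: subtract each percentage from 100.
    # Assumption: the same formula is applied to the top-scoring base,
    # which therefore gets a competition-free value of 0.
    competition_free = {base: 100 - pct for base, pct in relative.items()}
    return relative, competition_free


# Scores from the example in the text.
scores = {"A": 30, "C": 48, "G": 200, "T": 0, "gap": 0}
relative, competition_free = competition_free_percentages(scores)
print(relative)          # A: 15, C: 24, G: 100, T: 0, gap: 0
print(competition_free)  # A: 85, C: 76, G: 0, T: 100, gap: 100
```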

 

Each competition-free percentage is then compared with the chosen Evidence Percentage value. Every candidate whose competition-free percentage falls below that threshold is included in the consensus call.

 

If, in the example, Evidence Percentage were set to 75%, then the competition-free percentages for A, C, T and gap would all lie above the threshold. Only G, which holds 100% of the maximum score and so has a competition-free percentage of 0%, would lie below it. In this case G would be called unambiguously in the consensus.

 

If Evidence Percentage were instead set to 80%, the competition-free percentage for C (76%) would also fall below the threshold, so both G and C would be included in the consensus, resulting in an ambiguous or heterozygous call of S.
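The whole procedure, from raw scores to the consensus character, can be sketched as follows. This is a minimal illustration under stated assumptions rather than SeqMan Pro's implementation: the function names are hypothetical, the comparison is taken to be strictly "below" the threshold as worded above, and because the text does not say how a gap combines with bases, the sketch returns None whenever a gap is among the included candidates. The ambiguity table is the standard IUB/IUPAC nucleotide code set.

```python
# Standard IUB/IUPAC nucleotide codes, keyed by the set of bases they represent.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def included_candidates(scores, evidence_percentage):
    """Candidates whose competition-free percentage is below the threshold.

    Hypothetical helper; assumes at least one score is positive.
    """
    top = max(scores.values())
    return {
        base
        for base, s in scores.items()
        if 100 - 100 * s / top < evidence_percentage
    }


def consensus_call(scores, evidence_percentage):
    """Return the consensus character, or None if the included set has no
    entry in the table (for example, when a gap is included)."""
    included = included_candidates(scores, evidence_percentage)
    return IUPAC.get(frozenset(included))


# Scores from the example in the text.
scores = {"A": 30, "C": 48, "G": 200, "T": 0, "gap": 0}
print(consensus_call(scores, 75))  # G
print(consensus_call(scores, 80))  # S
```

Raising the threshold further in this sketch, for instance to 90, would also admit A (85%), giving the three-base code V.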