Note: This topic is not applicable to BAM-based projects.
The Conflict Split parameters let you define the criteria that identify a region of sequence differences where you may want to split a contig.
Access these parameters by selecting Project > Parameters and choosing Conflict Split from the list on the left.
• Min Coverage - the minimum number of reads that must cover a column before it is considered a candidate for splitting. For example, if this parameter is set to 4 and a column contains only 3 reads, SeqMan Pro ignores the column. Only columns covered by at least 4 reads are examined.
• Min Inconsistent - the minimum number of reads that must differ in sequence among the reads covering the column with conflicts.
• Min Percent Inconsistent - the minimum percentage of reads that must differ in sequence among the reads in the column with conflicts.
Note: The Min Inconsistent and Min Percent Inconsistent parameters are used in conjunction to determine if a candidate for splitting exists. SeqMan Pro requires data to pass both thresholds before declaring a candidate. (See text below for additional information about these parameters.)
The number of inconsistent bases in a column is the number of times its second most common base appears. In the single column of bases “AAGGA,” A is the most commonly occurring base and G is the second most commonly occurring base. The number of inconsistent bases in the column is 2.
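The counting rule above can be sketched in a few lines of Python. This is only an illustration of the definition, not SeqMan Pro's own code: the function name `count_inconsistent` is hypothetical, and the handling of gaps, ambiguity codes, and ties is an assumption (every character is simply counted as a base).

```python
from collections import Counter


def count_inconsistent(column: str) -> int:
    """Return how often the second most common base appears in a column.

    Hypothetical helper: each character counts as one base, with no
    special treatment of gaps or ambiguity codes (an assumption).
    """
    top_two = Counter(column.upper()).most_common(2)
    # A column with only one distinct base has no inconsistent bases.
    return top_two[1][1] if len(top_two) > 1 else 0


column = "AAGGA"
inconsistent = count_inconsistent(column)
print(inconsistent)                # 2
print(inconsistent / len(column))  # 0.4, i.e. 40% of the column
```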
A threshold for the number of inconsistent bases in a column can be expressed either as an absolute number or as a percentage of the total number of bases in the column. In the example above, the absolute number is 2 and the percentage is 40% (2 of 5 bases). In low-coverage areas, it makes sense to use the absolute number; in higher-coverage areas, the percentage is a better guide. If the number of inconsistent bases is 3 with a coverage of 6, there may be a compelling case for splitting the contig. If, however, the number of inconsistent bases is 3 and the coverage is 30, the case is much less compelling. Given this, the threshold for identifying a candidate split is the larger of two values: Min Inconsistent, and Min Percent Inconsistent multiplied by the column coverage.
As an example, imagine you set Min Inconsistent to 2 and Min Percent Inconsistent to 25%. The following table lists the thresholds for coverages from 4 to 20. For a given coverage, a candidate split is suggested if the number of inconsistent bases is greater than or equal to the threshold.
Coverage | Threshold
4-9 | 2
10-13 | 3
14-17 | 4
18-20 | 5
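The thresholds in the table can be reproduced with the short Python sketch below. The function names `split_threshold` and `is_candidate` are hypothetical. The rounding rule (round half up to the nearest whole base) is an assumption inferred from the table values; it is not stated in the text, and SeqMan Pro may compute the value differently.

```python
import math


def split_threshold(coverage, min_inconsistent=2, min_percent_inconsistent=25.0):
    """Return the number of inconsistent bases needed at a given coverage."""
    # Assumption: the percentage-based value is rounded half up,
    # which matches the table (e.g. 2.25 -> 2, 2.5 -> 3).
    percent_based = math.floor(min_percent_inconsistent * coverage / 100 + 0.5)
    return max(min_inconsistent, percent_based)


def is_candidate(inconsistent, coverage, min_coverage=4,
                 min_inconsistent=2, min_percent_inconsistent=25.0):
    """Return True if a column passes Min Coverage and the split threshold."""
    if coverage < min_coverage:
        return False  # columns below Min Coverage are ignored
    threshold = split_threshold(coverage, min_inconsistent, min_percent_inconsistent)
    return inconsistent >= threshold


for coverage in range(4, 21):
    print(coverage, split_threshold(coverage))

print(is_candidate(3, 6))   # True: 3 of 6 bases differ
print(is_candidate(3, 30))  # False: 3 of 30 is below the 25% threshold
```

With the settings used in the example (Min Inconsistent 2, Min Percent Inconsistent 25%), the loop prints 2 for coverages 4-9, 3 for 10-13, 4 for 14-17, and 5 for 18-20.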