Conflict Split Parameters

Note: This topic is not applicable to BAM-based projects.

 

The Conflict Split parameters allow you to specify your own criteria for what may constitute a region of sequence differences at which you may want to split contigs.

 

Access these parameters by selecting Project > Parameters and choosing Conflict Split from the list on the left.

 

 

      Min Coverage - the minimum number of reads that must cover the region before it is considered a candidate for splitting. For example, if this parameter were set to 4 and a column contained only 3 sequences, SeqMan Pro would ignore that column. Only columns with at least 4 sequences would be examined.

 

      Min Inconsistent - the minimum number of reads that must differ in sequence among the reads covering the column with conflicts.

 

      Min Percent Inconsistent - the minimum percentage of reads that must differ in sequence among the reads in the column with conflicts.

 

Note: The Min Inconsistent and Min Percent Inconsistent parameters are used in conjunction to determine if a candidate for splitting exists. SeqMan Pro requires data to pass both thresholds before declaring a candidate. (See text below for additional information about these parameters.)

 

The number of inconsistent bases in a column is the number of times its second most common base appears. In the single column of bases “AAGGA,” A is the most commonly occurring base and G is the second most commonly occurring base. The number of inconsistent bases in the column is 2.
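
The counting rule can be illustrated with a minimal Python sketch. The function name `inconsistent_count` is hypothetical and is not part of SeqMan Pro. This topic does not state how the program treats gaps, ambiguity codes, or ties, so the sketch simply counts every character as given.

```python
from collections import Counter


def inconsistent_count(column: str) -> int:
    """Return how often the second most common base occurs in a column.

    Hypothetical helper for illustration only; not SeqMan Pro code.
    """
    counts = Counter(column).most_common(2)
    # With fewer than two distinct bases there is no disagreement.
    return counts[1][1] if len(counts) > 1 else 0


print(inconsistent_count("AAGGA"))  # 2 (A occurs 3 times, G occurs 2 times)
```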

 

A threshold for the number of inconsistent bases in a column can be expressed either as an absolute number or as a fraction of the total number of bases in the column. In the example above, the absolute number is 2 and the fraction is 40% (2 of 5). In low-coverage areas, the absolute number is the more useful guide; in higher-coverage areas, the percentage is. For instance, 3 inconsistent bases at a coverage of 6 (50%) may make a compelling case for splitting the contig, whereas 3 inconsistent bases at a coverage of 30 (10%) make a much weaker one. For this reason, the threshold for identifying a candidate split is the larger of two values: Min Inconsistent, or Min Percent Inconsistent multiplied by the column coverage.

 

As an example, imagine you set Min Inconsistent to 2 and Min Percent Inconsistent to 25%. The following table lists thresholds for coverages from 4 to 20. For a given coverage, a candidate split will be suggested if the number of inconsistent bases is greater than or equal to the threshold.

 

Coverage    Threshold
4-9         2
10-13       3
14-17       4
18-20       5
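
The table values can be reproduced with a short Python sketch. The function name `split_threshold` is hypothetical, and the rounding rule (nearest whole number, with halves rounded up) is an assumption inferred from the table values. This topic does not state the rounding SeqMan Pro actually uses.

```python
def split_threshold(coverage: int, min_inconsistent: int = 2,
                    min_percent: int = 25) -> int:
    """Larger of the absolute and the percentage-based threshold.

    Assumption: the percentage term is rounded to the nearest whole
    number with halves rounded up, which matches the table values.
    """
    # Integer arithmetic avoids floating-point and banker's-rounding surprises.
    percent_term = (min_percent * coverage + 50) // 100
    return max(min_inconsistent, percent_term)


for coverage in (4, 9, 10, 13, 14, 17, 18, 20):
    print(coverage, split_threshold(coverage))
```

With the default settings of 2 and 25%, the absolute threshold governs up to a coverage of 9, and the percentage term takes over from a coverage of 10 onward.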