Coding Prediction – Local Compositional Complexity - User Guide to GeneQuest

This method, located in the More Methods section, identifies regions rich in information. These regions often correlate with coding regions and regulatory elements. Results can be presented as line graphs, region plots, or both. Information content is calculated with the Shannon formula from information theory:

H = -Σ p(i) log p(i)

Where:
i ∈ {A, G, T, C}
p(i) = N_i/L
N_i = count of base i within the context length
L = context length
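
As an illustration of the formula above, the following Python sketch computes H for a single context window and then for every window along a sequence. It is not GeneQuest's implementation. The function names are hypothetical, the base-2 logarithm (giving H in bits, from 0 to 2) is an assumption because the formula does not state a base, and skipping characters other than A, G, T and C is also an assumption.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """H = -sum p(i) log2 p(i) over the bases A, G, T, C in one window.

    Assumption: log base 2 is used; the guide does not specify the base.
    Assumption: characters other than A, G, T, C are skipped, so L is the
    number of unambiguous bases (equal to the context length when the
    window contains no ambiguity codes).
    """
    counts = Counter(b for b in window.upper() if b in "AGTC")
    length = sum(counts.values())
    if length == 0:
        return 0.0
    h = 0.0
    for n in counts.values():
        p = n / length          # p(i) = N_i / L
        h -= p * math.log2(p)   # accumulate -p(i) log p(i)
    return h


def sliding_entropy(seq: str, context: int) -> list[float]:
    """Entropy of every window of `context` bases, stepping one base at a time."""
    return [
        shannon_entropy(seq[i:i + context])
        for i in range(len(seq) - context + 1)
    ]
```

A homopolymer window such as "AAAA" gives H = 0, while a window with all four bases in equal proportion gives the maximum of 2 bits.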

The method provides a quick way to evaluate the information content of a region of DNA. While it is not reliable for distinguishing coding from non-coding sequence, it can distinguish random or repetitive DNA from biologically interesting DNA.

To change method parameters:

Double-click the method name in the Method Curtain, or select the method display and then choose Analysis > Method Parameters.

Specify Context. This field determines the value of “L,” the number of bases examined at a single time.

Decide whether or not to select Smoothing Window. This feature averages the context peaks over the specified range.

Specify Threshold, the cutoff level used to decide whether a point on the line plot is called informative. In most cases, the default values are appropriate.

You may wish to raise or lower Context or Smoothing Window depending upon the length of your sequence and the preponderance of repetitive sequence within the region. Lower values may distinguish repetitive from non-repetitive DNA more accurately, but at the cost of increased noise.
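
To show how the Smoothing Window and Threshold parameters might interact, here is a minimal Python sketch that smooths a list of per-position values and then reports the stretches at or above a cutoff. Both function names are hypothetical. The centered moving average and the rule that values at or above the threshold count as informative are assumptions. The guide does not describe how GeneQuest performs either step.

```python
def smooth(values, window):
    """Centered moving average (an assumed smoothing scheme).

    Near the ends, only the part of the window that fits is averaged.
    """
    half = window // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def informative_regions(values, threshold):
    """Return (start, end) index pairs, end exclusive, where values >= threshold.

    Assumption: "informative" means at or above the threshold.
    """
    regions = []
    start = None
    for i, v in enumerate(values):
        if v >= threshold and start is None:
            start = i                      # a region opens here
        elif v < threshold and start is not None:
            regions.append((start, i))     # the region closed just before i
            start = None
    if start is not None:                  # region runs to the end
        regions.append((start, len(values)))
    return regions
```

A wider smoothing window suppresses isolated peaks, which gives fewer and longer regions. A narrower one keeps fine detail and lets more noise cross the threshold.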

To generate a summary table of codon usage for any selected segment of the sequence, which may also be useful in identifying coding regions, choose Analysis > Codon Usage.
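
For readers who want to see what a codon usage tally involves, this short Python sketch counts the codons in a selected segment. The function name and the `frame` parameter are hypothetical, and the contents of GeneQuest's own summary table are not described here, so this is only an illustration of the underlying count.

```python
from collections import Counter


def codon_usage(segment: str, frame: int = 0) -> Counter:
    """Count non-overlapping codons starting at offset `frame` (0, 1 or 2).

    Assumption: codons containing characters other than A, C, G, T are skipped.
    Trailing bases that do not fill a complete codon are ignored.
    """
    seq = segment.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return Counter(c for c in codons if set(c) <= set("ACGT"))
```

For example, `codon_usage("ATGATGTAA")` counts ATG twice and TAA once.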
