When you chain multiple calculations, take care in how each step feeds the next. If the input file for one step is the GenBank output file from an earlier step, the final output may contain additional sequences and/or features that you did not expect.
Here is an example of a workflow that can lead to such unexpected results:
- Step 1: Extract features from TEST.GBK and save them as EXTRACTED.GBK.
- Step 2: Translate EXTRACTED.GBK and save the results as TRANSLATED.GBK.
- Result: The file TRANSLATED.GBK may contain unexpected data (e.g., an extra translated feature).
This situation arises when CDS features overlap. When one CDS is extracted, the part of the second CDS that falls within the shared interval remains annotated in the extracted record. When EXTRACTED.GBK is then translated, that sequence yields two protein sequences: the desired full-length CDS and a fragment of the overlapping CDS. This is not an issue when the intermediate file is in FASTA format, because FASTA carries no annotations, so no information about the overlapping CDS is carried over.
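The overlap effect can be illustrated with a small toy model in Python. This is only a sketch of the mechanism described above, not the tool's actual implementation: the `Feature` class, the `extract_with_annotations` and `extract_as_fasta` functions, the feature names, and the coordinates are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    name: str
    kind: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def extract_with_annotations(target, features):
    """Toy model of a GenBank-style extraction (assumed behavior).

    Every feature that overlaps the target interval is clipped to that
    interval and carried over in coordinates local to the extracted record.
    """
    carried = []
    for f in features:
        lo = max(f.start, target.start)
        hi = min(f.end, target.end)
        if lo < hi:  # the feature shares at least one base with the target
            carried.append(
                Feature(f.name, f.kind, lo - target.start, hi - target.start)
            )
    return carried


def extract_as_fasta(target, sequence):
    """Toy model of a FASTA-style extraction: sequence only, no annotations."""
    return sequence[target.start:target.end]


# Two hypothetical overlapping CDS features on a 60-base sequence.
sequence = "ACGT" * 15
cds_a = Feature("cdsA", "CDS", 0, 30)
cds_b = Feature("cdsB", "CDS", 24, 60)

# Annotated intermediate: extracting cdsA also carries a clipped piece of cdsB.
extracted = extract_with_annotations(cds_a, [cds_a, cds_b])
cds_to_translate = [f for f in extracted if f.kind == "CDS"]
for f in cds_to_translate:
    print(f.name, f.start, f.end)

# FASTA intermediate: only the bases of cdsA remain, with nothing to carry over.
fasta_sequence = extract_as_fasta(cds_a, sequence)
print(len(fasta_sequence))
```

In this model the annotated intermediate holds two CDS entries (the full `cdsA` and a six-base piece of `cdsB`), so a later translation step would emit two proteins, whereas the FASTA intermediate holds a single plain sequence.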