This example can be adapted to extract annotated features (e.g., specific feature types, or all features with specified annotations) for uses such as building BLAST databases, building consensus matrices, or performing alignments. Script A also generates an overlapping CDS (specifically, the yeaC fragment). Scripts B and C avoid this problem. They produce identical results, but only Script B also outputs the nucleotide sequence file.

Goal To extract a set of annotated CDS features from a genome as protein sequences
Script A m54sCDS.gbk=extract(m54s.gbk, 'CDS')

Output A LOCUS U00096:yeaA 414 bp DNA 13-JAN-2012
FEATURES Location/Qualifiers
Source 1..414
Source complement(1..414)
    /note="***Needs review***Cut segment head by 1860039 and tail by 2778768 units."
    /organism="Escherichia coli"
CDS 411..414
    /note="***Needs review***Cut segment tail by 314 units."
CDS 1..414
1 atggctaata aaccttcggc agaagaactg aaaaaaaatt tgtccgagat gcagttttac
61 gtgacgcaga atcatgggac agaaccgcca tttacgggtc gtttactgca taacaagcgt
121 gacggcgtat atcactgttt gatctgcgat gccccgctgt ttcattccca aaccaagtat
181 gattccggct gtggctggcc cagtttctac gaaccggtaa gtgaagaatc cattcgttat
241 atcaaagact tgtcacatgg aatgcagcgc atagaaattc gttgcggtaa ctgtgatgcc…
Script B m54sCDS.fas=extract(m54s.gbk, 'CDS')
m54s_proteins2.fas=translate("m54sCDS.fas", '/transl_table=11')
Script C m54s_proteins3.fas=translate("m54s.gbk")
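
For readers who want to see what the extract-then-translate step in Script B amounts to, here is a minimal standalone sketch in Python. It is not the implementation behind extract() or translate(). The function names extract_span and translate_cds are hypothetical, and the codon table is written out from NCBI translation table 11, which assigns amino acids exactly as the standard code does. The sketch also rejects spans whose length is not a multiple of three, which would catch a leftover fragment such as the 4 bp CDS 411..414 in Output A.

```python
from itertools import product

# NCBI translation table 11 assigns amino acids exactly as the standard code;
# it differs only in which codons may act as start codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def extract_span(sequence, start, end):
    """Return the 1-based, inclusive span start..end, as in a GenBank location."""
    return sequence[start - 1:end]


def translate_cds(nucleotides):
    """Translate a CDS codon by codon, stopping at the first stop codon.

    Raises ValueError for spans whose length is not a multiple of three,
    such as a short fragment left over from a neighbouring gene.
    """
    seq = nucleotides.upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError(f"partial CDS: {len(seq)} bp is not a multiple of 3")
    protein = []
    for i in range(0, len(seq), 3):
        # X marks codons that contain ambiguous bases.
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# First 60 bases of the yeaA record shown in Output A.
record = "atggctaataaaccttcggcagaagaactgaaaaaaaatttgtccgagatgcagttttac"
print(translate_cds(extract_span(record, 1, 30)))  # MANKPSAEEL
```

Calling translate_cds on a 4 bp span raises ValueError, so a fragment like the one Script A emits is reported instead of being silently translated.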
