The following scripts can be used to convert genome coordinates between assemblies, or to migrate annotations from one version of a genome to another.

In this example, researchers identified seven mutations in E. coli strains that were absent from the reference sequence. To determine how those changes affected the features annotated in the genome, they needed to migrate the reference annotations to each mutant sequence while converting genome coordinates between the two assemblies.
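Converting coordinates amounts to tracking where each copied reference base lands in the new sequence. The Python sketch that follows assumes the mutant is described as an ordered list of 1-based, inclusive reference ranges and literal inserted bases, mirroring the structure of the scripts in this article. `lift_position`, `Segment`, and `REF_LEN` are hypothetical names, not part of any DNASTAR interface.

```python
from typing import List, Tuple, Union

# Hypothetical representation: a segment is either a 1-based, inclusive
# reference range (start, end) or a string of literal inserted bases.
Segment = Union[Tuple[int, int], str]


def lift_position(segments: List[Segment], ref_pos: int) -> List[int]:
    """Return every 1-based position in the new sequence that copies
    reference base ref_pos.

    The list is empty if the base was deleted or replaced, and has more
    than one entry if the base was duplicated.
    """
    hits = []
    offset = 0  # number of bases emitted before the current segment
    for seg in segments:
        if isinstance(seg, str):
            offset += len(seg)  # literal bases have no reference origin
            continue
        start, end = seg
        if start <= ref_pos <= end:
            hits.append(offset + (ref_pos - start) + 1)
        offset += end - start + 1
    return hits


# Ranges taken from Script 1; REF_LEN stands in for the script's "rend"
# and is a hypothetical reference length.
REF_LEN = 4_600_000
script1 = [(1, 547693), "G", (547695, 547832),
           (547832, 3957956), "T", (3957958, REF_LEN)]

print(lift_position(script1, 1_000_000))  # [1000001]: shifted by the extra G
print(lift_position(script1, 547832))     # [547832, 547833]: duplicated G
print(lift_position(script1, 547694))     # []: replaced by the SNP
```

An annotated feature can then be moved by lifting its start and end positions; a feature whose endpoint falls on a replaced or deleted base needs an explicit decision about which neighboring base to use.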

Goal 1 To convert the reference sequence into the fully annotated mutant-A sequence. The following changes are required:

  • 1 bp SNP “A” to “G”

  • insert “G” (duplicate preceding “G”)

  • 1 bp SNP “C” to “T”
Script 1 Mutant_A.gbk=Reference_seq.gbk(1,547693)+"G"+Reference_seq.gbk(547695,547832)
+Reference_seq.gbk(547832,3957956)+"T"+Reference_seq.gbk(3957958,rend)
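The splice notation can be sketched in Python, assuming that each reference-range term such as `(1,547693)` copies a 1-based, inclusive range (the 8 bp and 4 bp target duplications in Script 2 are consistent with that reading) and that `rend` means the last base of the reference. `build_sequence` and the 17 bp toy reference are hypothetical; the example reproduces Script 1's pattern of a SNP, a duplicated G, and a second SNP at small scale.

```python
from typing import List, Tuple, Union

# Hypothetical representation: (start, end) is a 1-based, inclusive
# reference range; a string is a run of literal bases such as "G".
Segment = Union[Tuple[int, int], str]


def build_sequence(reference: str, segments: List[Segment]) -> str:
    """Concatenate reference ranges and literal bases, in order."""
    parts = []
    for seg in segments:
        if isinstance(seg, str):
            parts.append(seg)
        else:
            start, end = seg
            # 1-based inclusive (start, end) -> Python slice [start - 1:end]
            parts.append(reference[start - 1:end])
    return "".join(parts)


# Toy reference: A at position 6, G at position 10, C at position 14.
reference = "TTTTTATTTGTTTCTTT"
rend = len(reference)

# Same pattern as Script 1: SNP A->G at 6, duplicate the G at 10,
# SNP C->T at 14.
mutant_a = build_sequence(
    reference,
    [(1, 5), "G", (7, 10), (10, 13), "T", (15, rend)],
)
print(mutant_a)  # TTTTTGTTTGGTTTTTTT
```

The repeated position 10 in `(7, 10)` and `(10, 13)` is what produces the extra G, which is why the result is one base longer than the reference.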
Goal 2 To convert the annotated reference sequence to the mutant-B sequence, making the following changes:

  • IS1 insertion in reverse orientation (< IS1 <) + 8 bp target duplication

  • 1 bp SNP “A” to “G”

  • insert “G” (duplicate preceding “G”)

  • IS5 insertion in forward orientation (> IS5 >) + 4 bp target duplication

  • insert “CC”

  • 1 bp deletion/frameshift (-G)

  • 1 bp SNP “C” to “T”
Script 2 Mutant_B.gbk=Reference_seq.gbk(1,257907)+complement(IS1.fas)
+Reference_seq.gbk(257900,547693)+"G"+Reference_seq.gbk(547695,547832)
+Reference_seq.gbk(547832,1298721)+IS5.fas+Reference_seq.gbk(1298718,2171386)
+"CC"+Reference_seq.gbk(2171387,3558477)+Reference_seq.gbk(3558479,3957956)
+"T"+Reference_seq.gbk(3957958,rend)
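Each junction in Script 2 encodes its mutation in the coordinates on either side of it: when the range after a junction restarts before the previous range ended, the overlapping bases are duplicated, and when it skips ahead, reference bases are dropped. The Python sketch that follows checks this arithmetic for the junctions in Script 2 and shows a reverse complement, assuming that `complement()` in the script returns the reverse complement of IS1, which inserting it in the < orientation would require. `junction_overlap` and `reverse_complement` are hypothetical helper names.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (ACGTN, either case)."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def junction_overlap(left_end: int, right_start: int) -> int:
    """Compare the end of one reference range with the start of the next
    (both 1-based, inclusive).

    > 0: that many bases are duplicated (target-site duplication)
    = 0: the ranges are contiguous, so literal bases between them are
         a pure insertion
    < 0: that many reference bases are skipped (deleted, or replaced by
         literal bases placed between the ranges, as for a SNP)
    """
    return left_end - right_start + 1


# Junctions taken from Script 2: (end of left range, start of right range).
junctions = {
    "IS1 target duplication": (257907, 257900),
    "duplicated G": (547832, 547832),
    "IS5 target duplication": (1298721, 1298718),
    "CC insertion": (2171386, 2171387),
    "-G deletion": (3558477, 3558479),
}
for name, (left_end, right_start) in junctions.items():
    print(name, junction_overlap(left_end, right_start))
# IS1 target duplication 8
# duplicated G 1
# IS5 target duplication 4
# CC insertion 0
# -G deletion -1

print(reverse_complement("ATGCC"))  # GGCAT
```

The values 8 and 4 match the target duplications listed for Goal 2, which is a quick way to catch an off-by-one in a hand-written script before running it.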
