This tutorial shows how to perform long-read assembly in SeqMan NGen and includes optional steps for validating the assembly in QUAST (Quality Assessment Tool for Genome Assemblies). The sequence data consists of an Oxford Nanopore Technologies (ONT) MAP006-1 .fastq file from E. coli strain MG1655.
This tutorial includes a step in which you add a reference sequence to help SeqMan NGen create a scaffold. Reference-guided scaffolding is optional, but is commonly called for when: 1) the assembly was broken up into multiple contigs, 2) the genome involves multiple chromosomes (e.g., yeast) and/or plasmids, or 3) both of the above. Since long reads contain no pair information, reference-guided scaffolding is a way to provisionally order and orient the contigs. For an organism like the yeast S. cerevisiae, which has sixteen chromosomes plus the mitochondrial genome and the 2-micron plasmid, the scaffolding automatically associates each contig with its canonical chromosome, so you do not have to sort through eighteen or more contigs in random order. Because structural variation is common between species and strains, the resulting order and orientations are provisional. As always, the closer the reference sequence is to the sequenced strain, the better. More distantly related reference sequences will likely yield inconsistent or inaccurate scaffolding.
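To make the idea of ordering and orienting contigs concrete, here is a minimal Python sketch. It is not SeqMan NGen's algorithm; it only illustrates the general principle, and it assumes each contig has already been aligned to the reference and reduced to one best hit. The `Hit` record, the `scaffold` function, and the sample contig and chromosome names are all hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical best alignment of one contig to the reference."""
    contig: str     # contig name
    ref_name: str   # reference chromosome or plasmid the contig aligned to
    ref_start: int  # leftmost aligned position on the reference
    strand: str     # "+" if the contig matches the reference, "-" if reversed


def scaffold(hits):
    """Group contigs by reference sequence, then order them by position.

    Returns {ref_name: [(contig, strand), ...]}. A "-" strand means the
    contig would be reverse-complemented when placed in the scaffold.
    """
    by_ref = defaultdict(list)
    for hit in hits:
        by_ref[hit.ref_name].append(hit)

    layout = {}
    for ref_name, ref_hits in by_ref.items():
        ordered = sorted(ref_hits, key=lambda h: h.ref_start)
        layout[ref_name] = [(h.contig, h.strand) for h in ordered]
    return layout


# Made-up example: three contigs on one chromosome, one on another.
hits = [
    Hit("contig_3", "chrI", 150_000, "+"),
    Hit("contig_1", "chrII", 20_000, "-"),
    Hit("contig_4", "chrI", 5_000, "-"),
    Hit("contig_2", "chrI", 90_000, "+"),
]

for ref_name, placed in sorted(scaffold(hits).items()):
    print(ref_name, placed)
```

The layout is only as reliable as the alignments behind it: a contig spanning a rearrangement relative to the reference would be placed by its best hit alone, which is one reason the ordering has to be treated as provisional.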