DNASTAR Lasergene lets you set up a de novo assembly with ease. One of the outputs is an editable .sqd file that can be opened and edited in SeqMan Ultra. In SeqMan Ultra, you can evaluate the assembled contigs, edit them, organize them into scaffolds, and close any gaps in those scaffolds.
This tutorial uses a de novo whole genome assembly built from two MiSeq 2×300 paired-end read files from E. coli K-12 MG1655. SeqMan NGen assembled 2.5M reads from this data set, producing an assembly with a large contig N50 of 203 kb in 51 contigs. Because the read files are large (28 GB combined), this tutorial begins at the downstream analysis stage in SeqMan Ultra.
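Contig N50 is the contig length L such that contigs of length L or longer together contain at least half of all assembled bases. The minimal Python sketch below computes it from a list of contig lengths. The function name `n50` and the toy lengths are illustrative assumptions. They are not values from this data set, and this is not how SeqMan Ultra computes the statistic internally.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of positive contig lengths (in bases)."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths or lengths[-1] <= 0:
        raise ValueError("expected a non-empty list of positive lengths")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half of all bases are covered.
    for length in lengths:
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy assemblies, not the tutorial data.
print(n50([100, 80, 60, 40, 20]))  # 80
print(n50([180, 60, 40, 20]))      # 180: the two longest contigs merged
```

The second call illustrates why merging contigs tends to raise N50: joining the two longest toy contigs moves the halfway point onto a single, longer contig.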
- Download T3_Whole_Genome_DeNovo.zip (332 MB) and extract it to any convenient location (e.g., your desktop). The data set consists of a single SeqMan Ultra project named E.coli K12 MG1655 MiSeq de novo.sqd.
- Launch SeqMan Ultra and use File > Open to open E.coli K12 MG1655 MiSeq de novo.sqd. Note that the Project Overview shows that the Contig N50 is 204.0 kb, which is quite large. Also observe that the assembly resulted in 51 contigs.
- The header of the Project Overview contains a drop-down menu. Use the menu to select Project Details. This section contains additional statistics followed by the complete script that SeqMan NGen used to assemble the reads. In the upper section of the report, observe that there is an average of 42,798 sequences per contig and that the average depth of coverage is 135.
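Average depth of coverage is the number of read bases covering each assembled position, averaged over the contig. The sketch below assumes reads are given as hypothetical 0-based, half-open (start, end) alignment intervals on a single contig; it builds a per-base depth profile with a difference array and averages it. The names `depth_profile` and `mean_depth` are illustrative only, and this is not SeqMan NGen's implementation.

```python
from itertools import accumulate


def depth_profile(alignments, contig_length):
    """Per-base read depth for one contig.

    alignments: hypothetical list of half-open (start, end) read intervals,
    0-based, with 0 <= start < end <= contig_length.
    """
    diff = [0] * (contig_length + 1)
    for start, end in alignments:
        diff[start] += 1  # read begins covering here
        diff[end] -= 1    # read stops covering here
    # A running sum of the +1/-1 markers gives the depth at each base.
    return list(accumulate(diff[:contig_length]))


def mean_depth(profile):
    """Average depth of coverage across all positions in the profile."""
    return sum(profile) / len(profile)


profile = depth_profile([(0, 4), (2, 6), (2, 5)], contig_length=6)
print(profile)  # [1, 1, 3, 3, 2, 1]
print(mean_depth(profile))
```

The mean equals total aligned read bases divided by contig length (here 11 bases over 6 positions). Positions where the profile dips are the kind of thin spots that show up in a coverage histogram.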
In the next few steps, you will check for potential assembly errors and then correct them. Assume that you have already performed these steps for Contigs 1-16 and have found no errors.
- In the Explorer panel on the right, select Contig 17 and click the Show strategy view tool on the right.
- To limit the display to show only arrows for inconsistent pairs, click the Style tab on the right. In the Strategy area, use the Show menu to select Inconsistent. The orange arrows represent pairs in the middle of contigs that are not matched up with their mate; the numbers associated with these pairs indicate the contig where their “mate” resides.
- Use the horizontal zoom slider and scroll until the view is centered on position 13,000. Observe the large number of orange (inconsistent) arrows in this region, as well as the orange coloration in the Pair Consistency graph in the header. This suggests that the contig may be misassembled. Note that Contigs 5, 19 and 22 are referenced by both left- and right-facing arrows, indicating that the region is likely a repeat. This makes it a good candidate for a contig split.
- Open the Alignment view for this contig: right-click any white portion of the view and choose Show in Alignment view. Use the horizontal zoom slider and the horizontal scroll bar to navigate to the same area visible in the Strategy view. Insert the cursor in the region of low pair consistency, which is the thinnest area of green in the Coverage histogram.
- Choose Contig > Split at Insertion to split the contig into two contigs.
- Return to the Explorer panel by clicking the Explorer tab on the right.
- Select the uppermost row, All Contigs, then right-click and choose Order Contigs into Scaffolds. When prompted, confirm that you wish to perform scaffolding. SeqMan Ultra uses the pair information at the ends of the contigs to order them into scaffolds.
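The inconsistent-pair check from the Strategy view steps can be approximated in code. This minimal sketch assumes a hypothetical `Placement` record for each read (its contig, its position, and the contig where its mate landed). It divides a contig into fixed windows and flags windows in which a large share of reads have mates in other contigs. Unlike the Strategy view, it ignores read orientation and insert size, so it cannot distinguish left- from right-facing arrows. The window size and threshold are arbitrary assumptions.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Placement:
    # Hypothetical per-read record, not a SeqMan Ultra data structure.
    contig: int       # contig the read was assembled into
    position: int     # read start position within that contig
    mate_contig: int  # contig where the read's mate was placed


def flag_inconsistent_windows(placements, contig, window=1000, min_fraction=0.3):
    """Return {window_start: Counter of mate contigs} for windows on `contig`
    in which at least `min_fraction` of reads have their mate in another contig."""
    total = Counter()
    foreign = defaultdict(Counter)
    for p in placements:
        if p.contig != contig:
            continue
        start = (p.position // window) * window
        total[start] += 1
        if p.mate_contig != contig:
            foreign[start][p.mate_contig] += 1

    flagged = {}
    for start in sorted(total):
        mates_elsewhere = foreign.get(start, Counter())
        if sum(mates_elsewhere.values()) / total[start] >= min_fraction:
            flagged[start] = mates_elsewhere
    return flagged


# Toy reads, not taken from the tutorial data set.
reads = [
    Placement(17, 100, 17),
    Placement(17, 200, 17),
    Placement(17, 13050, 5),
    Placement(17, 13100, 19),
    Placement(17, 13200, 17),
]
print(flag_inconsistent_windows(reads, contig=17))
```

With these toy reads only the window starting at 13,000 is flagged, and its counter lists the other contigs (5 and 19) that the mates point to.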
Fourteen scaffolds are created, numbered 100 to 230. Microbial genomes typically do not assemble into a single contig from short reads because they contain many repetitive elements, such as transposons, that occur throughout the genome.
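The idea behind ordering contigs from pair information can be illustrated with a greedy sketch. It assumes a hypothetical input of (contig_a, contig_b) tuples, one per read pair whose mates fell in different contigs. It counts support for each contig pair, accepts the best-supported links first, and allows each contig at most two neighbours without closing a cycle, so every scaffold is a linear chain. This is a simplified stand-in, not SeqMan Ultra's scaffolding algorithm: it ignores orientation and gap size, and `min_links` is an arbitrary threshold.

```python
from collections import Counter, defaultdict


def build_scaffolds(pair_links, min_links=3):
    """Greedily chain contigs into linear scaffolds using read-pair links.

    pair_links: iterable of (contig_a, contig_b) tuples (hypothetical input).
    Contigs with no accepted link are not listed in the result.
    """
    # Count supporting read pairs per unordered contig pair.
    support = Counter(tuple(sorted(link)) for link in pair_links if link[0] != link[1])

    parent = {}

    def find(x):
        # Union-find root lookup with path halving, used to detect cycles.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    degree = Counter()
    neighbours = defaultdict(list)
    for (a, b), n in support.most_common():
        if n < min_links:
            break  # remaining links are weaker still
        if degree[a] >= 2 or degree[b] >= 2:
            continue  # a contig has at most two neighbours in a linear scaffold
        ra, rb = find(a), find(b)
        if ra == rb:
            continue  # joining would close a cycle
        parent[ra] = rb
        degree[a] += 1
        degree[b] += 1
        neighbours[a].append(b)
        neighbours[b].append(a)

    # Walk each linear chain starting from one of its two ends.
    scaffolds, seen = [], set()
    for start in sorted(neighbours):
        if start in seen or degree[start] != 1:
            continue
        path, prev, cur = [], None, start
        while cur is not None:
            path.append(cur)
            seen.add(cur)
            nxt = [c for c in neighbours[cur] if c != prev]
            prev, cur = cur, (nxt[0] if nxt else None)
        scaffolds.append(path)
    return scaffolds


links = [(1, 2)] * 5 + [(2, 3)] * 4 + [(3, 1)] * 3 + [(4, 5)] * 3 + [(5, 6)]
print(build_scaffolds(links))
```

In this toy input, the weaker 3-to-1 link is rejected because it would turn the 1-2-3 chain into a loop, and the single 5-to-6 pair falls below `min_links`, so the result is two scaffolds: [1, 2, 3] and [4, 5].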
- Now that the contigs have been ordered into scaffolds, you can merge adjacent overlapping contigs. To do this, select Scaffold 100 and choose Contig > Align Contigs End to End. Keep the default settings and press OK. Repeat this procedure for all fourteen scaffolds. Occasionally, you may get a message saying that the alignment did not work; in that case, move on to the next scaffold.
- Use Ctrl/Cmd+click to select each contig that is in a scaffold. Then drag and drop the entire selection anywhere above Scaffold 100. Finally, select all of the now-empty scaffolds and press the Delete key to remove them.
- To see the effects of these edits and realignments, use Project > Project Overview.
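Merging two adjacent contigs end to end depends on finding an overlap between the end of one contig and the start of the next. The sketch below is a deliberate simplification: it looks only for an exact suffix/prefix match of at least `min_overlap` bases and does not consider mismatches, indels, or reverse-complemented contigs, all of which a real end-to-end alignment must handle. The function name and the default threshold are hypothetical.

```python
def merge_end_to_end(left, right, min_overlap=20):
    """Merge two contigs if a suffix of `left` exactly matches a prefix of `right`.

    Returns the merged sequence, or None when no overlap of at least
    `min_overlap` bases exists (analogous to an alignment that did not work).
    """
    max_len = min(len(left), len(right))
    # Try the longest possible overlap first, shrinking down to min_overlap.
    for k in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return left + right[k:]
    return None


print(merge_end_to_end("AAAACGTACGT", "ACGTACGTTTTT", min_overlap=4))  # AAAACGTACGTTTTT
print(merge_end_to_end("AAAA", "CCCC", min_overlap=2))                # None
```

Trying the longest overlap first avoids merging on a short, coincidental match when a longer, more convincing overlap is available.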
Note that the Contig N50 has increased from 204 kb to 290 kb, and that the number of contigs has decreased from 51 to 36.
In a real-world situation, you could continue merging contigs and closing gaps via one or more of the following:
- Continue to make contig edits, create scaffolds and merge contigs to close additional gaps, as above.
- Perform a BLAST search on contig ends to determine genome coordinates, then manually create new scaffolds and attempt additional end-to-end alignments.
- Resolve remaining gaps by adding new data (e.g., Sanger reads) and using the gap closure workflow.
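For the BLAST option, one approach is to collect the first and last few hundred bases of each contig into a FASTA file that can then be submitted as BLAST queries. The sketch below assumes you have already obtained the contig sequences as a Python dict mapping contig name to sequence; how you export them is outside this sketch, and the function name and 500-base default are illustrative assumptions.

```python
def contig_ends_fasta(contigs, end_length=500):
    """Build FASTA text holding the first and last `end_length` bases of each contig.

    contigs: hypothetical mapping of contig name -> sequence string.
    Contigs shorter than end_length contribute their full sequence to both records.
    """
    records = []
    for name, seq in contigs.items():
        n = min(end_length, len(seq))
        records.append(f">{name}_start\n{seq[:n]}")
        records.append(f">{name}_end\n{seq[len(seq) - n:]}")
    return "\n".join(records) + "\n"


print(contig_ends_fasta({"Contig_17": "AAAACCCCGGGG"}, end_length=4), end="")
```

Labeling each record with its contig name and end makes it easy to map BLAST hits back to the contig ends you are trying to order.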
This marks the end of this tutorial.