A “scaffold” is a collection of contigs that are grouped by their known spatial relationships rather than by their sequence similarity. After completing Part B, step 3, follow this section to see how SeqMan Pro automatically organizes likely adjacent contigs into scaffolds using their paired end sequence information.
- Choose File > Save As and save a copy of the assembly as De novo assembly-scaffolded.sqd.
- Choose Edit > Select All to select all 100+ contigs in the Project Summary window.
- To create scaffolds, use Project > Order Contigs in Scaffold. When prompted, click the Order button.
When ordering is complete, the Report opens automatically and lists any contigs that were automatically reverse complemented prior to being added to a scaffold.
- In the Project Summary window, observe that SeqMan Pro created new scaffolds (numbered in increments of 10) for each set of contigs for which paired end data implies physical linkage. Contigs for which there was no evidence of physical linkage were placed in the “Unlocated Contigs” area.
- Scroll past the unordered contigs and select Scaffold 130, which contains four contigs. (These may be listed in a different order from that shown in the image.)
- Choose Contig > Scaffold Strategy View.
- To see the location of each contig more clearly, click and hold down the mouse button on the Show All Reads tool, and then select the Customized display tool.
In the ensuing dialog, first select Custom to clear all the selections, then close the dialog. The layout of the contigs is now shown by separate black lines.
Next, we will investigate the pair support for Scaffold 130.
- Click the Customized display tool. This time, choose Consistent & Grouped in the left column. Two boxes in the right column will be checked automatically. Close the dialog.
- Under “Contig 11” on the left of the Strategy View, select the first constituent sequence (…695/2).
- Click the Zoom In tool until the arrows appear similar to those in the image below.
Scroll down and note that there are actually two highlighted arrows: one for Contig 11 (sequence …695/2) and one for the adjacent Contig 90 (sequence …695/1).
The arrows’ dark blue color signifies paired reads located in different contigs whose assembly locations are consistent. The fact that the paired reads are located in two different contigs provides evidence that these contigs are adjacent.
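The reasoning above, that mates landing in two different contigs link those contigs, can be sketched in a few lines of Python. This is only an illustration of the idea, not SeqMan Pro's actual algorithm: the function name `group_contigs`, the `min_links` parameter, and the read names in the usage note are all hypothetical, and the sketch assumes mates share a name and differ only in a `/1` or `/2` suffix. It also ignores orientation and distance, which real scaffolding must check.

```python
from collections import Counter


def group_contigs(read_to_contig, min_links=1):
    """Group contigs into scaffolds when read pairs span two contigs.

    read_to_contig maps a read name ending in '/1' or '/2' to the
    contig that read was assembled into (an assumed input format).
    """
    # Collect the contigs hit by each pair, keyed by the shared pair name.
    pairs = {}
    for read, contig in read_to_contig.items():
        name, _, _mate = read.rpartition("/")
        pairs.setdefault(name, []).append(contig)

    # Count pairs whose two mates sit in different contigs.
    links = Counter()
    for contigs in pairs.values():
        if len(contigs) == 2 and contigs[0] != contigs[1]:
            links[frozenset(contigs)] += 1

    # Union-find: merge contigs joined by enough spanning pairs.
    parent = {c: c for c in read_to_contig.values()}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path compression
            c = parent[c]
        return c

    for link, count in links.items():
        if count >= min_links:
            a, b = link
            parent[find(a)] = find(b)

    groups = {}
    for c in parent:
        groups.setdefault(find(c), set()).add(c)

    scaffolds = [g for g in groups.values() if len(g) > 1]
    unlocated = [c for g in groups.values() if len(g) == 1 for c in g]
    return scaffolds, unlocated
```

For example, passing `{"pairA/1": "Contig 90", "pairA/2": "Contig 11", "pairB/1": "Contig 5", "pairB/2": "Contig 5"}` places Contig 90 and Contig 11 in one scaffold and leaves Contig 5 unlocated, because no pair connects it to another contig.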
- At approximately position 400 on the ruler, the Coverage Threshold display shows an area of single-stranded coverage, denoted by a thin red line.
Note that the light blue arrows in this area all show the number “79.” These arrows represent paired reads in different contigs (in the same or different scaffolds) whose assembly locations could be consistent with the Pair Specifier parameters if the contigs were rescaffolded or reordered. The number next to each light blue arrow is the number of the contig that contains the other member of the pair.
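The number beside each arrow is effectively a pointer to the mate's contig, so tallying those numbers shows which contig is the strongest candidate neighbor. The following is a minimal sketch of that tally, assuming the pair locations have been exported as `(contig of mate 1, contig of mate 2)` tuples. That input format and the function name `partner_counts` are hypothetical and are not part of SeqMan Pro.

```python
from collections import Counter


def partner_counts(pairs, contig):
    """Count which other contigs hold the mates of reads in `contig`.

    pairs is an iterable of (contig_of_mate1, contig_of_mate2) tuples.
    """
    counts = Counter()
    for a, b in pairs:
        if a == contig and b != contig:
            counts[b] += 1  # mate 2 lies in another contig
        elif b == contig and a != contig:
            counts[a] += 1  # mate 1 lies in another contig
    return counts


# Illustrative data only: two pairs link contig 11 to 79, one to 90.
example_pairs = [(11, 79), (79, 11), (11, 11), (11, 90)]
print(partner_counts(example_pairs, 11).most_common())
```

A contig that dominates the tally, as contig 79 does in the view described above, is the natural first candidate when reordering or rescaffolding.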
Further exploration with these data could include:
- Using the light blue arrows as a guide to reorder or rescaffold contigs in the assembly.
- Manually verifying the relative positions of unlocated contigs and scaffolds and joining them together using Contig > Force Join Contigs. If necessary, contigs can be complemented first using Contig > Complement Contig.
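For readers who want to check an orientation by hand before a forced join, reverse complementing a sequence outside SeqMan Pro is straightforward. The sketch below is a generic Python version, not the implementation behind Contig > Complement Contig. The helper name `reverse_complement` is hypothetical, and it handles only A, C, G, T and N, leaving any other character unchanged.

```python
# Translation table for the four bases plus N, in both cases.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence string."""
    # Complement each base, then reverse the whole string.
    return seq.translate(_COMPLEMENT)[::-1]
```

Applying the function twice returns the original sequence, which is a quick sanity check on any contig that has been complemented manually.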