Specifying paired-end reads:

If you will be using paired-end reads for the assembly, check the Paired-end data box in the Input Sequence Files and Define Experiments or Individual Replicates screen. Ideally, this should be done after specifying a read technology, but before adding reads. The box is checked, by default, if you choose Illumina as the read technology; otherwise, it is unchecked. If you are doing a whole genome reference-guided workflow with gap closure, you must check this box and upload paired data.

To add files or folders of files:

  • Local assemblies – Add files using the Add button or add folders of files using the Add Folder button. In the file explorer, navigate to and select the desired file(s)/folder(s), and then click Open.
  • Cloud assemblies – Add files or folders using the Add button. This takes you to the Cloud Data Drive. Navigate to the desired file(s) or folder and then click the green check mark.

To remove files:

If you would like to remove a file from the list, select it and click Remove.

To specify the insert size and other pair information:

If you check the Paired-end data box and add sequences, the Set Pair Information dialog opens automatically.

Different options are available depending on the read technology you selected in the Input Sequence Files and Define Experiments or Individual Replicates screen. For example, the following version of the dialog appears if you are following special reference-guided or de novo (except RNA-Seq) workflows and specify paired Sanger reads. A different version would appear for other workflows or data types.

  • Insert size – Enter the anticipated distance between paired-end reads across the library. SeqMan NGen uses this value to automatically calculate the minimum and maximum insert distances. Depending on the read technology you chose, this box may initially be blank or may contain an editable default value. The default value is 3000 bp for Sanger data.
  • Discard reads without linkers – (read technology = Ion Torrent only) When you input an Insert size and leave the box checked, clicking OK will launch the Pair Technology Input pop-up dialog.

Choose between Standard Linker and Custom Linker. If you choose the latter, you must paste or type the junction linker in the box provided. Click OK to return to the Input Sequence Files and Define Experiments or Individual Replicates screen.

  • Name pattern – (read technology = Sanger only) For SeqMan NGen to identify Sanger pairs from a sequence naming convention, the convention must systematically distinguish the two reads of a pair while indicating which reads belong together. Forward and reverse sequences must have identical names except for the unique portion that determines the direction of the clone.

If applicable, select one of the following predefined file naming patterns from the Name Pattern drop-down menu:

    • sample_f.abi <-> sample_r.abi
    • sample100.f_abc.abi <-> sample100.r_abc.abi
    • sample_n100.abi <-> sample_f100.abi
    • SAMPL0D1234.abi <-> SAMPL0E1234.abi

If none of the predefined patterns matches your file naming convention, you may select Custom Pair Specifier from the dropdown list, and then manually enter the appropriate expressions for Forward and Reverse naming conventions.
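
To check a naming convention before choosing a pattern, the pairing rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the regular expressions `FORWARD` and `REVERSE` and the function `pair_reads` are hypothetical names, they cover only the first predefined pattern (sample_f.abi / sample_r.abi), and they are not the expressions SeqMan NGen uses internally.

```python
import re

# Hypothetical expressions for the "sample_f.abi / sample_r.abi" convention.
# They illustrate the rule only; they are not SeqMan NGen's own syntax.
FORWARD = re.compile(r"^(?P<stem>.+)_f\.abi$")
REVERSE = re.compile(r"^(?P<stem>.+)_r\.abi$")


def pair_reads(names):
    """Return (pairs, unpaired) for a list of read file names.

    Two names form a pair when they are identical except for the
    portion that marks the direction (here "_f" versus "_r").
    """
    forward, reverse = {}, {}
    for name in names:
        match = FORWARD.match(name)
        if match:
            forward[match.group("stem")] = name
            continue
        match = REVERSE.match(name)
        if match:
            reverse[match.group("stem")] = name

    # A stem seen in both directions yields one (forward, reverse) pair.
    pairs = [(forward[s], reverse[s]) for s in sorted(forward) if s in reverse]

    # Everything else is unpaired, including names that match neither pattern.
    paired = {name for pair in pairs for name in pair}
    unpaired = sorted(name for name in names if name not in paired)
    return pairs, unpaired
```

Running `pair_reads` on a list of file names shows which reads would be left unpaired under the chosen convention, which is a hint that a custom pair specifier may be needed.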

Once you are finished, click OK to return to the Input Sequence Files dialog. The insert size you specified now appears in the “Insert Size (bp)” column.

Considerations related to the Set Pair Information dialog:

  • During assembly, SeqMan NGen lists the insert size as a range rather than as the single value you entered. If, for example, you enter an Insert size of 300, the Assembly Log will list the value as “0 to 450.” This convention does not impact assembly results.
  • For short inserts of fewer than 1000 bases, SeqMan NGen sets the minimum size to 0 to catch smaller outliers, which tend to be common. For larger inserts, it sets the minimum to half of the Insert size, except for Illumina data, where the minimum is set to 0. Long-insert Illumina reads have a minimum of 0 because only half of the reads come from long inserts. The other half come from short inserts (~300 bp), with the short inserts pointing towards one another and the long inserts pointing away. SeqMan NGen’s small genome assembler uses the 0 value as a flag to account for the undetermined insert size.
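
The rules in the two considerations above can be summarized as a short Python sketch. The function name `insert_range` is hypothetical, and the maximum of 1.5 times the Insert size is an assumption inferred solely from the single "300 gives 0 to 450" example; the documentation does not state how the maximum is calculated.

```python
def insert_range(insert_size, technology):
    """Estimate the (minimum, maximum) insert distance, in bp.

    Minimum: 0 for inserts under 1000 bases and for all Illumina data;
    otherwise half of the insert size (as described in the text).
    Maximum: ASSUMED to be 1.5 x the insert size, based only on the
    documented example where 300 is reported as "0 to 450".
    """
    if insert_size <= 0:
        raise ValueError("insert_size must be a positive number of bases")

    if insert_size < 1000 or technology.lower() == "illumina":
        minimum = 0
    else:
        minimum = insert_size // 2

    maximum = insert_size * 3 // 2  # assumption, see docstring
    return minimum, maximum
```

Under these assumptions, `insert_range(300, "Illumina")` reproduces the documented "0 to 450" range, and the Sanger default of 3000 bp would give a minimum of 1500.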

