Adding and Removing Sequence Read Files

IMPORTANT: If you are following the Sanger Validation workflow, you will see a slightly different version of this dialog, with separate areas for adding Sanger and non-Sanger data. Refer to the help topic Sanger Validation Workflow before adding your files.

 

Specifying paired-end reads:

 

If you will be using paired-end reads for the assembly, check the Paired-end data box in the Input Sequence Files and Define Experiments or Individual Replicates dialog. Ideally, this should be done after specifying a read technology, but before adding reads. The box is checked, by default, if you choose Illumina as the read technology; otherwise, it is unchecked. If you are doing a reference-guided assembly with gap closure, you must check this box and upload paired data.

 

Note: If you check the Paired-end data box before you specify a read technology, a popup dialog will prompt you to choose the desired read technology from a list.

 

 

To add files or folders of files:

 

      Local assemblies – Add files using the Add button or add folders of files using the Add Folder button. In the file explorer, navigate to and select the desired file(s)/folder(s), and then click Open.

 

      Cloud assemblies – Add files or folders using the Add button. This takes you to the Cloud Data Drive. Navigate to the desired file(s) or folder(s) and then click the green check mark.

 

 

To remove files:

 

If you would like to remove a file from the list, select it and click Remove.

 

 

Specifying the insert size and other pair information:

 

If you check the Paired-end data box and add sequences, the Set Insert Size for Pairs dialog opens automatically.

 

Different options are available depending on the read technology you selected in the Input Sequence Files and Define Experiments or Individual Replicates screen. For example, the following version of the pop-up appears if you are following special reference-guided or de novo (except RNA-Seq) workflows and specify paired Sanger reads. A different version would appear for other workflows or data types.

 

 

      Insert size – Enter the expected distance between paired-end reads in the library. SeqMan NGen uses this value to automatically calculate the minimum and maximum insert distances. Depending on the read technology you chose, this box may initially be blank or may contain an editable default value. The default value for Sanger data is 3000 bp.

 

Note: During assembly, SeqMan NGen lists this value as a range. If, for example, you enter an Insert size of 300, the Assembly Log will list the value as “0 to 450.” This convention does not impact assembly results.

 

      Discard reads without linkers – (read technology = Ion Torrent only) When you input an Insert size and leave the box checked, clicking OK will launch the following pop-up dialog.

 

 

Choose between Standard Linker and Custom Linker. If you choose the latter, you must paste or type the junction linker in the box provided. Click OK to return to the Input Sequence Files and Define Experiments or Individual Replicates dialog.

 

      Name pattern – (read technology = Sanger only) For SeqMan NGen to identify Sanger pairs from their names, the naming convention must consistently distinguish the forward read from the reverse read while also indicating which two reads belong to the same pair. Forward and reverse sequences must have identical names except for the portion that indicates the direction of the read.

 

If applicable, select one of the following predefined file naming patterns from the Name Pattern dropdown list:

 

sample_f.abi < > sample_r.abi

 

sample100.f_abc.abi < > sample100.r_abc.abi

 

sample_n100.abi < > sample_f100.abi

 

SAMPL0D1234.abi < > SAMPL0E1234.abi

 

If none of the predefined patterns matches your file naming convention, you may select Custom Pair Specifier from the dropdown list, and then manually enter the appropriate expressions for Forward and Reverse naming conventions.

 

Note: The Forward and Reverse expressions must be written using a subset of regular expression syntax based on the Grep language. For more information, see Example Regular Expressions.
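The pairing idea can be illustrated outside SeqMan NGen with a short script. The following is only a sketch: it uses Python's re module, whose syntax may differ from the Grep subset that SeqMan NGen accepts, and the FORWARD and REVERSE expressions and the pair_reads function are hypothetical names invented for this example (they model the sample_f.abi < > sample_r.abi pattern). Two names are paired when everything outside the direction marker is identical.

```python
import re

# Hypothetical forward/reverse expressions for names such as
# sample_f.abi and sample_r.abi. These use Python's `re` syntax, which
# may differ from the Grep subset accepted by SeqMan NGen.
FORWARD = re.compile(r"^(.*)_f(\.abi)$")
REVERSE = re.compile(r"^(.*)_r(\.abi)$")


def pair_reads(names):
    """Return (pairs, unpaired) for a list of read file names."""
    forward, reverse = {}, {}
    unpaired = []
    for name in names:
        f = FORWARD.match(name)
        r = REVERSE.match(name)
        if f:
            # Key on the captured parts that both mates must share.
            forward[f.groups()] = name
        elif r:
            reverse[r.groups()] = name
        else:
            unpaired.append(name)
    # A pair exists only when the shared parts of both names are identical.
    pairs = [(forward[k], reverse[k]) for k in sorted(forward) if k in reverse]
    unpaired += [forward[k] for k in sorted(forward) if k not in reverse]
    unpaired += [reverse[k] for k in sorted(reverse) if k not in forward]
    return pairs, unpaired
```

Running a check like this on a list of file names before adding them can reveal reads that would be left without a mate under a given naming convention.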

 

Once you are finished, click OK to return to the Input Sequence Files dialog. The insert size you specified now appears in the “Insert Size (bp)” column.

 

Considerations related to the Set Pair Information dialog:

 

      For short inserts of fewer than 1000 bases, SeqMan NGen sets the minimum size to 0 to catch smaller outliers, which tend to be common. For larger inserts, it sets the minimum to half of the Insert size, except for Illumina data, where the minimum is set to 0. Long-insert Illumina libraries have a minimum of 0 because only half of the reads come from long inserts. The other half come from short inserts (~300 bp), with the short-insert reads pointing towards one another and the long-insert reads pointing away from one another. SeqMan NGen’s small genome assembler uses the 0 value as a flag to account for the undetermined insert size.

 

      If you specified Ion Torrent read technology in the Input Sequence Files and Define Experiments or Individual Replicates dialog and enter a value from 0 to 799 in the Set Pair Information dialog, SeqMan NGen assumes the library is paired-end (small insert). For values ≥ 800, the library is assumed to be mate pair (long insert).
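The rules described in this topic can be summarized in a short sketch. This is not SeqMan NGen's actual code: the function names are hypothetical, and the maximum of 1.5 times the Insert size is an assumption inferred solely from the documented example in which an Insert size of 300 is logged as “0 to 450.”

```python
def insert_range(insert_size, technology):
    """Return (minimum, maximum) insert distance in bp."""
    # Assumption: maximum = 1.5 x insert size, inferred only from the
    # documented example (300 is logged as "0 to 450").
    maximum = insert_size * 3 // 2
    if insert_size < 1000 or technology == "Illumina":
        # Short inserts and all Illumina data use a minimum of 0.
        minimum = 0
    else:
        # Larger non-Illumina inserts use half of the insert size.
        minimum = insert_size // 2
    return minimum, maximum


def ion_torrent_library_type(insert_size):
    """Classify an Ion Torrent library from the entered insert size."""
    # 0 to 799 is treated as paired-end; 800 or more as mate pair.
    return "paired-end" if insert_size < 800 else "mate pair"
```

For example, under these assumptions insert_range(3000, "Sanger") gives a minimum of 1500, while insert_range(3000, "Illumina") gives a minimum of 0.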