Paired-end reads are typically stored in two files, or in a small number of files if they come from multiple runs or lanes. Read pairs are identified by a naming convention used in the comment line of the .fasta file.

For de novo assemblies with paired-end reads, SeqMan NGen automatically adds the following information to the script:

setPairSpecifier pairs:
  { {
    forward: "(.*)/1"
    reverse: "(.*)/2"
    min: 0
    max: 750
    key: Illumina
  } }

If a read does not match any of the pair specifiers, or if the forward and reverse specifiers are empty strings (""), the assembler attempts to match reads using the whole name of the sequence. If exactly two reads have the same name, they are treated as a pair.
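
The matching rules above can be sketched in Python. This is an illustration only, not SeqMan NGen's implementation: the function name pair_reads is hypothetical, and matching each specifier against the full read name is an assumption.

```python
import re
from collections import defaultdict

def pair_reads(names, forward=r"(.*)/1", reverse=r"(.*)/2"):
    """Group read names into (forward, reverse) pairs.

    Hypothetical helper: a read that matches a specifier is keyed by the
    text captured in group 1. If the specifiers are empty or do not match,
    the whole read name is used as the key instead.
    """
    fwd, rev = {}, {}
    whole = defaultdict(list)
    for name in names:
        f = re.fullmatch(forward, name) if forward else None
        r = re.fullmatch(reverse, name) if reverse else None
        if f:
            fwd[f.group(1)] = name
        elif r:
            rev[r.group(1)] = name
        else:
            whole[name].append(name)
    # Reads whose captured keys agree form a pair.
    pairs = [(fwd[key], rev[key]) for key in fwd if key in rev]
    # Fallback: exactly two reads with the same whole name form a pair.
    pairs += [tuple(group) for group in whole.values() if len(group) == 2]
    return pairs

print(pair_reads(["read7/1", "read7/2", "solo", "dup", "dup"]))
# prints [('read7/1', 'read7/2'), ('dup', 'dup')]
```

In this sketch, the read named solo stays unpaired because no other read shares its name.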

For reference-guided assemblies, SeqMan NGen adds the following information:

  {
    isPair: true
    file: "****"
    SeqTech: "Illumina"
    minDist: 0
    maxDist: 750
  }


For reference-guided assemblies with paired-end reads, SeqMan NGen recognizes the pairs by their filenames. The following examples show some of the filename formats that SeqMan NGen supports for reference-guided pairs. Within each pair, the part of the filename that differs between the two files (for example, R1 and R2, or _1 and _2) specifies the forward and reverse reads:

"R_2011_11_21_11_06_08_user_C29-100_PE_DH10B_11_Auto_C29-100_PE_DH10B_11_4120_reverse_pe2.fastq",
"R_2011_11_21_11_06_08_user_C29-100_PE_DH10B_11_Auto_C29-100_PE_DH10B_11_4120_forward_pe1.fastq",

"Strain1234_L7_R1_ATCACG_Index1.fastq",
"Strain1234_L7_R2_ATCACG_Index1.fastq",

"K12-1-B_TGACCA_L006_R1.fastq",
"K12-1-B_TGACCA_L006_R2.fastq",

"GBBC920_GGCTAC_L008_R1.filt.50bp.fastq",
"GBBC920_GGCTAC_L008_R2.filt.50bp.fastq"

"tiny_1.txt",
"tiny_2.txt",

"tiny_1_sequence.txt",
"tiny_2_sequence.txt",

"tiny1._qseq",
"tiny2._qseq",

"s_1_1_sequence.txt"
"s_1_2_sequence.txt"

"C29-129_forward_pe1.fastq"
"C29-129_forward_pe2.fastq"


The grep patterns (regular expressions) used to match the pair filenames are shown below:

"(?'name'.*?)_R1_(?'ext'.*)\\.fastq",
"(?'name'.*?)_R2_(?'ext'.*)\\.fastq",

"(?'name'.*?)_R1\\.(?'ext'.*)\\.fastq",
"(?'name'.*?)_R2\\.(?'ext'.*)\\.fastq",

"(?'name'.*?)_forward_pe1(?'ext_p'\\.fastq)",
"(?'name'.*?)_reverse_pe2(?'ext_p'\\.fastq)",

"(?'name'.*?)_{0,1}1\\.fastq",
"(?'name'.*?)_{0,1}2\\.fastq",

"(?'name'.*?)1\\.fastq",
"(?'name'.*?)2\\.fastq",

"(?'name'.*?)1_sequence\\.txt",
"(?'name'.*?)2_sequence\\.txt",

"(?'name'.*?)1\\.txt",
"(?'name'.*?)2\\.txt",

"(?'name'.*?)1\\._qseq",
"(?'name'.*?)2\\._qseq",

"(?'name'.*?)1\\.fq",
"(?'name'.*?)2\\.fq",
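
To see how patterns like these pair up files, the following Python sketch applies a subset of them to a list of filenames. It is an illustration only, not SeqMan NGen's code. The function name pair_files is hypothetical, and three behaviors are assumptions: patterns are tried in the listed order, each pattern must match the whole filename, and two files pair only when every named group (name and, where present, ext) captures the same text. Python writes named groups as (?P<name>...), so the patterns are rewritten in that syntax.

```python
import re

# A subset of the patterns listed above, as (forward, reverse) rows.
# Raw strings are used, so a literal dot is written \. rather than \\.
PATTERNS = [
    (r"(?P<name>.*?)_R1_(?P<ext>.*)\.fastq",
     r"(?P<name>.*?)_R2_(?P<ext>.*)\.fastq"),
    (r"(?P<name>.*?)_R1\.(?P<ext>.*)\.fastq",
     r"(?P<name>.*?)_R2\.(?P<ext>.*)\.fastq"),
    (r"(?P<name>.*?)_forward_pe1(?P<ext_p>\.fastq)",
     r"(?P<name>.*?)_reverse_pe2(?P<ext_p>\.fastq)"),
    (r"(?P<name>.*?)_{0,1}1\.fastq",
     r"(?P<name>.*?)_{0,1}2\.fastq"),
    (r"(?P<name>.*?)1_sequence\.txt",
     r"(?P<name>.*?)2_sequence\.txt"),
]

def pair_files(filenames):
    """Return (forward, reverse) filename pairs.

    Hypothetical helper: the first pattern row that pairs a file wins
    (an assumption about ordering, not documented behavior).
    """
    pairs, used = [], set()
    for fwd_pat, rev_pat in PATTERNS:
        fwd, rev = {}, {}
        for filename in filenames:
            if filename in used:
                continue
            match = re.fullmatch(fwd_pat, filename)
            if match:
                fwd[match.groups()] = filename
                continue
            match = re.fullmatch(rev_pat, filename)
            if match:
                rev[match.groups()] = filename
        # Files pair when all captured groups agree.
        for key, forward_file in fwd.items():
            if key in rev:
                pairs.append((forward_file, rev[key]))
                used.update((forward_file, rev[key]))
    return pairs
```

For example, under these assumptions "K12-1-B_TGACCA_L006_R1.fastq" and "K12-1-B_TGACCA_L006_R2.fastq" are paired by the fourth row, because both capture the name "K12-1-B_TGACCA_L006_R".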

The following script command can be used to add support for a new filename format. The command must be executed before assembly. The pattern will be used for all subsequent assembleTemplate commands for that run of the reference-guided assembler.

pairFilePattern forward: "(?'name'.*?)_R1_(?'ext'.*)\.fastq" reverse: "(?'name'.*?)_R2_(?'ext'.*)\.fastq"
