Paired End Specification Language

For SeqMan to display paired end sequence relationships, you must specify a sequence naming convention that systematically distinguishes individual reads and identifies which reads form a pair.

 

If none of the predefined pair patterns in the Pair Specifier parameters window matches your sequence naming convention, you may create your own expression using a subset of regular expression syntax that draws on elements of the Grep language.

 

Examples of expressions you may find useful for paired end naming specifications follow. Using these examples, you should be able to create expressions that are valid for your own particular projects. In essence, parts of sequence names that are the same in a pair of reads are specified inside parentheses, and parts of the names that distinguish members of the same pair are placed outside the parentheses.
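The role of the parentheses can be illustrated outside SeqMan with a minimal Python sketch. It assumes one expression for forward reads and one for reverse reads, hypothetical read names, and Python's re module as a stand-in for SeqMan's own expression matching, which may behave differently in detail. Two reads are treated as a pair when the text captured inside the parentheses is identical.

```python
import re

def pair_key(pattern, name):
    """Return the text captured by the parenthesized phrases, or None if no match."""
    match = re.search(pattern, name)
    return match.groups() if match else None

def find_pairs(forward_pattern, reverse_pattern, names):
    """Pair each reverse read with the forward read that has the same captured text."""
    forward = {}
    for name in names:
        key = pair_key(forward_pattern, name)
        if key is not None:
            forward[key] = name
    pairs = []
    for name in names:
        key = pair_key(reverse_pattern, name)
        if key is not None and key in forward:
            pairs.append((forward[key], name))
    return pairs

# Hypothetical read names: "f" and "r" sit outside the parentheses,
# so they distinguish the two members of a pair.
reads = ["clone12f.abi", "clone12r.abi", "clone7f.abi", "clone9r.abi"]
pairs = find_pairs(r"(.*)f(\.abi)", r"(.*)r(\.abi)", reads)
print(pairs)
```

In this sketch only clone12 has both a forward and a reverse read, so it is the only pair reported; clone7f.abi and clone9r.abi remain unpaired.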

 

Please note this is not a complete list of regular expressions, and the definitions of the terms used are limited to their application to SeqMan paired end naming specifications.

 

Example expressions and their meanings:

 

d

Literally the letter d

\d

Any digit (0-9)

\d*

Zero or more digits

\d+

One or more digits

(\d+)

A phrase comprising one or more digits. Same as “\d+”, but the parentheses cause SeqMan to match read names on the string inside the phrase, even when other characters in the names do not match.

\.

Literally the period symbol (.)

.

Any character

.+

One or more of any character

.*

Zero or more of any character

a|b

a OR b

ab[i1]

abi or ab1

abi$

Ends with abi

[\.\d]

A period OR a digit

[abc]

a OR b OR c

[abc]+

One or more characters from the set a, b, c

.*f

Any number of any characters followed by the letter “f”

(.*)f

A phrase comprising any number of any characters, followed by the letter “f”. Same as “.*f”, but causes SeqMan to match the phrase in parentheses without matching the “f” in a read name.

(\D+)r(\d+)

One or more non-digit characters followed by “r” followed by one or more digits.

(\d{2,4})f(\.abi)

Two, three or four digits followed by “f” followed by “.abi”
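To check what a given expression captures before entering it in SeqMan, several of the expressions above can be tried against sample names. The following is a rough sketch using Python's re module; the read names are hypothetical, and SeqMan's Grep subset may not agree with Python in every detail, so treat the results as a guide only.

```python
import re

# (expression, hypothetical read name) pairs taken from the list above
examples = [
    (r"(\d+)", "plate42"),
    (r"(.*)f", "sample3f"),
    (r"(\D+)r(\d+)", "libAr007"),
    (r"(\d{2,4})f(\.abi)", "run123f.abi"),
]

def captured(expression, name):
    """Return the text captured by each parenthesized phrase, or None if no match."""
    match = re.search(expression, name)
    return match.groups() if match else None

results = {expression: captured(expression, name) for expression, name in examples}
for expression, groups in results.items():
    print(expression, "->", groups)
```

In each case the characters outside the parentheses (such as “f” or “r”) are required for the name to match but are not part of the captured text.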