Choosing a pairwise alignment method - User Guide to MegAlign Pro

MegAlign Pro has four pairwise alignment algorithms.

Align to Chromosome: DNASTAR is used to perform individual pairwise alignments of 1 to 25 short sequences (typically cDNA sequences) to a chromosome-length reference sequence. This method was released with Lasergene 17.5 in July 2023, and is adapted from a proprietary algorithm used in SeqMan NGen. The new MegAlign Pro algorithm can align sequences in which one sequence is thousands of times longer than the other. The short sequences must consist of extracted CDS features. You can create this type of sequence by using the Extract Features as Sequences template in SeqNinja, an application that is included with each Lasergene package. To perform a pairwise alignment of this type, see Pairwise alignment of a short sequence to a chromosome.

Local, Global and Semi-Global are used to align sequences that are similar in length (i.e., within one or two orders of magnitude). To learn how to perform this more general type of pairwise alignment, see Pairwise alignment for sequences with similar lengths.

These three algorithms work in a similar fashion, but they can produce very different results. All use a method called dynamic programming to find the best-scoring alignment between two sequences. An alignment score is computed by adding up the per-position match scores, then subtracting one penalty for each gap opened (regardless of its length) and another for each position that contains a gap. The match scores are based on a scoring matrix such as NUC42 or BLOSUM62. It is a good idea to explore various settings of these three parameters (scoring matrix, gap-open penalty and gap-extension penalty) to see whether a different combination gives a more desirable outcome. Because the outcome also depends on your two sequences, it is important to understand how the three methods differ.
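The scoring rule described above can be sketched in a few lines of Python. This is only an illustration: the function name and the flat match/mismatch values are assumptions standing in for a real scoring matrix such as NUC42 or BLOSUM62, and the penalty values are not MegAlign Pro defaults.

```python
def alignment_score(row_a, row_b, match=2, mismatch=-1, gap_open=4, gap_extend=1):
    """Score two gapped, equal-length alignment rows ('-' marks a gap).

    Hypothetical scoring values: a flat match/mismatch score replaces a
    real scoring matrix. Each run of gaps costs gap_open once, plus
    gap_extend for every gapped position in the run.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    score = 0
    prev_gap_row = None  # which row had a gap in the previous column, if any
    for a, b in zip(row_a, row_b):
        if a == "-" and b == "-":
            raise ValueError("a column cannot contain two gaps")
        if a == "-" or b == "-":
            gap_row = "a" if a == "-" else "b"
            if gap_row != prev_gap_row:
                score -= gap_open      # a new gap starts here
            score -= gap_extend        # every gapped position is charged
            prev_gap_row = gap_row
        else:
            score += match if a == b else mismatch
            prev_gap_row = None
    return score


# One gap of length 2: eight matches (16) minus open (4) minus 2 positions (2).
print(alignment_score("ACGT--ACGT", "ACGTTTACGT"))
```

Raising gap_open relative to gap_extend favors a few long gaps over many short ones, which is why changing these settings can change the reported alignment.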

Local Pairwise Alignment – This method, a modernized variant of the algorithm described by Smith and Waterman (1981), is designed specifically to find the highest-scoring contiguous aligned segment of two sequences, even if the full extent of one or both sequences is not included in the final alignment. Local alignments are ideal for finding a short sequence within a larger sequence. To display the unaligned parts of the sequences that flank the aligned segment, check the Show context box in the Pairwise Alignment section of the Style panel.
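As a rough illustration of local alignment, the sketch below implements the classic Smith-Waterman recurrence. It is a simplification, not MegAlign Pro's modernized variant: it uses a single linear gap penalty instead of separate gap-open and gap-extension penalties, a flat match/mismatch score instead of a scoring matrix, and a hypothetical function name.

```python
def local_align_score(a, b, match=2, mismatch=-1, gap=2):
    """Return (best_score, end_in_a, end_in_b) for the best local alignment.

    Simplified Smith-Waterman: linear gap penalty, flat match/mismatch
    scores (all values are illustrative assumptions). End positions are
    1-based and mark the last aligned residue in each sequence.
    """
    prev = [0] * (len(b) + 1)
    best = (0, 0, 0)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The floor of 0 lets an alignment start anywhere, which is
            # what makes the alignment local rather than global.
            curr[j] = max(0, diag, prev[j] - gap, curr[j - 1] - gap)
            if curr[j] > best[0]:
                best = (curr[j], i, j)
        prev = curr
    return best


# The short sequence is found inside the longer one; flanking bases are ignored.
print(local_align_score("ACGT", "TTACGTTT"))
```

Because the score can never drop below zero, the unaligned flanking regions cost nothing, so the reported segment covers only the part of the sequences that actually matches.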

Global Pairwise Alignment – The alternative to aligning locally is to align globally. To do this, MegAlign Pro uses two variants of the Needleman and Wunsch (1970) algorithm. Global aligners do not try to find the best-scoring segment; instead, they require that the full length of both sequences be included in the result, even if this means padding one or more of the sequence ends with gaps. MegAlign Pro treats the overhangs and underhangs created this way as unaligned context. There is no requirement or guarantee that the best-scoring pair of aligned segments from a local alignment will also be aligned in a global alignment. A global alignment is preferred over a local alignment when the aligned sequence falls into multiple, disjoint segments. Examples: 1) aligning a CDS or mRNA sequence to a gene that contains introns; 2) aligning two sequences that differ because of large insertions, such as might be caused by transposable elements. In both cases, a local alignment is less likely to reflect the full alignment, especially if the unalignable inclusions are long relative to the gap extension penalty.

Semi-Global Pairwise Alignment – A relatively new approach that is particularly suitable when the two sequences differ greatly in length. When that happens, the longer sequence will have overhangs on either end of the alignment. Since overhangs are represented with gaps, a global aligner will attempt to increase the match score and minimize accumulated gap penalties by aligning parts of the shorter sequence to the overhanging region(s). This effect can produce a number of unrealistic, usually small, aligned segments separated by gaps near the ends of the alignment. Semi-global alignment addresses this problem: it works like global alignment, except that gaps placed at the ends of the sequences (also known as “end gaps”) are not penalized. It is therefore likely to be more useful than global alignment whenever a global aligner would suppress long leading or trailing gaps in favor of short aligned segments punctuated by gaps.
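The effect of end-gap treatment can be seen in a minimal sketch that scores the same pair of sequences both ways. Everything here is an assumption made for illustration: the function name, the flat match/mismatch scores, the single linear gap penalty, and the choice to leave end gaps free in both sequences. It is not how MegAlign Pro implements either method.

```python
def end_to_end_score(a, b, free_end_gaps, match=2, mismatch=-1, gap=2):
    """Best alignment score with (semi-global) or without (global) free end gaps.

    Illustrative Needleman-Wunsch-style recurrence with a linear gap
    penalty; all scoring values are assumptions.
    """
    n, m = len(a), len(b)
    edge = 0 if free_end_gaps else gap      # cost of a leading end gap
    prev = [-edge * j for j in range(m + 1)]
    last_col = [prev[m]]
    for i in range(1, n + 1):
        curr = [-edge * i] + [0] * m
        for j in range(1, m + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] - gap, curr[j - 1] - gap)
        last_col.append(curr[m])
        prev = curr
    if free_end_gaps:
        # Trailing end gaps are free: take the best cell on the last row
        # or last column instead of forcing the bottom-right corner.
        return max(max(prev), max(last_col))
    return prev[m]


short, long_seq = "ACGT", "TTTTACGTTTTT"
print(end_to_end_score(short, long_seq, free_end_gaps=False))  # global
print(end_to_end_score(short, long_seq, free_end_gaps=True))   # semi-global
```

In the global case the eight overhanging bases are all charged as gaps, which drags the score down and gives the aligner an incentive to break up the true match; in the semi-global case the overhangs cost nothing and the four matching bases determine the score.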

The differences between these three pairwise approaches can have a real impact on the resulting alignment, and the best choice depends on your task. When two sequences are nearly identical (check by performing a multiple alignment and consulting the Distance view), all pairwise methods should work equally well. For basic cases, such as aligning two genes or proteins, Local alignment is a good starting point; in more complicated cases, Global or Semi-Global may be the better choice.

