When you initiate a pairwise alignment, you are prompted to select one of three alignment algorithms: Local, Global or Semi-Global. The algorithms are closely related, but they can produce very different results. All three use a method called dynamic programming to find the best-scoring alignment between two sequences. An alignment score is computed by adding up the per-position match scores, then subtracting one penalty for each gap that is opened (regardless of its length) and another for each position that a gap occupies. The match scores come from a scoring matrix such as NUC44 or BLOSUM62. It is a good idea to try different settings of these three parameters (scoring matrix, gap open penalty and gap extension penalty) to see whether they produce a more useful alignment.
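The scoring rule described above can be illustrated with a short Python sketch. This is not MegAlign Pro code: the function name `alignment_score` and its parameters are hypothetical, and the match, mismatch and gap values are placeholders standing in for a real scoring matrix and for whatever penalties you choose.

```python
def alignment_score(row_a, row_b, match=5, mismatch=-4, gap_open=10, gap_extend=1):
    """Score an existing gapped alignment given as two equal-length rows.

    '-' marks a gap. All scoring values are illustrative placeholders,
    not MegAlign Pro defaults.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")

    score = 0
    gap_in_a = gap_in_b = False  # is a gap run currently open in that row?
    for x, y in zip(row_a, row_b):
        if x == "-" and y == "-":
            raise ValueError("a column cannot be a gap in both rows")
        if x == "-":
            if not gap_in_a:
                score -= gap_open      # charged once per gap, whatever its length
            score -= gap_extend        # charged for every gapped position
            gap_in_a, gap_in_b = True, False
        elif y == "-":
            if not gap_in_b:
                score -= gap_open
            score -= gap_extend
            gap_in_a, gap_in_b = False, True
        else:
            score += match if x == y else mismatch
            gap_in_a = gap_in_b = False
    return score


print(alignment_score("ACGT", "ACGT"))      # four matches
print(alignment_score("AC--GT", "ACTTGT"))  # four matches and one gap of length two
```

Raising `gap_open` relative to `gap_extend` favors a few long gaps over many short ones, which is one reason the same pair of sequences can align differently under different settings.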
Because the three methods can yield widely different results for the same pair of sequences, it is important to understand how they differ.
- Local – This algorithm, a modernized variant of the one described by Smith and Waterman (1981), reports the highest-scoring contiguous segment of alignment between two sequences, even if the full extent of one or both sequences is not included in the final alignment. Local alignments are ideal for finding a short sequence within a larger sequence. In MegAlign Pro, flanking segments that fall outside the aligned segment can be displayed by checking the Show context box in the Pairwise Alignment section of the Style panel.
- Global – The alternative to aligning locally is to align globally. To do this, MegAlign Pro uses two variants of the Needleman and Wunsch (1970) algorithm. Global aligners do not look for the best-scoring segment; instead, they require that the full length of both sequences be included in the result, even if this means padding one or more of the sequence ends with gaps. MegAlign Pro treats overhangs and underhangs created this way as unaligned context. There is no guarantee that the best-scoring pair of segments from a local alignment will also be aligned in a global alignment. Global alignment is preferred over local alignment when the sequences share multiple, disjoint aligned segments, for example: 1) when aligning a CDS or mRNA sequence to a gene that contains introns; 2) when aligning two sequences that differ by large insertions, such as those caused by transposable elements. In both cases, a local alignment is less likely to reflect the full alignment, especially if the unalignable inclusions are long relative to the gap extension penalty.
- Semi-Global – A relatively new approach that is particularly suitable when the two sequences differ greatly in length. In that case, the longer sequence overhangs the shorter one at either end of the alignment. Because overhangs are represented with gaps, a global aligner will try to raise the match score and reduce the accumulated gap penalties by aligning parts of the shorter sequence to the overhanging regions. This can produce a number of unrealistic, usually small, aligned segments separated by gaps near the ends of the alignment. Semi-global alignment addresses the problem by behaving like global alignment, except that gaps placed at the ends of the sequences (also known as "end gaps") are not penalized. It is likely to be more useful than global alignment whenever a global aligner would suppress long leading or trailing gaps in favor of short aligned segments punctuated by gaps.
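The three strategies can be compared with a minimal dynamic programming sketch in Python. It makes several simplifying assumptions that are not taken from MegAlign Pro: it uses a single linear gap penalty rather than separate open and extension penalties, a flat match/mismatch score rather than a scoring matrix, and it returns only the best score rather than the alignment itself. The function name `pairwise_score` and its parameters are hypothetical.

```python
def pairwise_score(a, b, mode="global", match=1, mismatch=-1, gap=2):
    """Best alignment score of a and b under a simplified linear gap penalty.

    mode is "local" (Smith-Waterman style), "global" (Needleman-Wunsch style)
    or "semi-global" (global, but end gaps are free).
    """
    if mode not in ("local", "global", "semi-global"):
        raise ValueError("unknown mode: " + mode)

    n, m = len(a), len(b)
    # H[i][j] holds the best score for the prefixes a[:i] and b[:j].
    H = [[0] * (m + 1) for _ in range(n + 1)]

    if mode == "global":
        # Leading gaps are penalized only in global mode; in the other two
        # modes the first row and column stay at zero.
        for i in range(1, n + 1):
            H[i][0] = -gap * i
        for j in range(1, m + 1):
            H[0][j] = -gap * j

    best_local = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(H[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                       H[i - 1][j] - gap,     # gap in b
                       H[i][j - 1] - gap)     # gap in a
            if mode == "local":
                cell = max(cell, 0)           # a local alignment may restart anywhere
                best_local = max(best_local, cell)
            H[i][j] = cell

    if mode == "local":
        return best_local                     # best segment, wherever it ends
    if mode == "global":
        return H[n][m]                        # both sequences fully consumed
    # Semi-global: trailing end gaps are free, so take the best value in the
    # last row or the last column.
    return max(max(H[n]), max(row[m] for row in H))


short, long_ = "ACGT", "TTTTACGTTTTT"
for mode in ("local", "global", "semi-global"):
    print(mode, pairwise_score(short, long_, mode))
```

With a short sequence embedded in a much longer one, the global score is dragged down by the eight end-gap positions it must pay for, while the local and semi-global scores reflect only the four matching bases. For two identical sequences, all three modes return the same score.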
The differences between these three pairwise approaches can have a substantial effect on the resulting alignment, and the right choice depends on your task. When two sequences are nearly identical (you can check by performing a multiple alignment and consulting the Distance view), all three methods should work equally well. For basic cases, such as aligning two genes or two proteins, Local alignment is a good starting point; for more complicated cases, Global or Semi-Global may be the better choice. See this page from Trinity College Dublin's Department of Genetics for a good synopsis of the Needleman-Wunsch and Smith-Waterman algorithms.