A substitution matrix describes the rate at which one nucleotide or amino acid changes to another over time. When performing a pairwise alignment, you can specify the desired substitution matrix in the (Pairwise) Alignment Options dialog.
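In a pairwise alignment, the aligner looks up the matrix score once per aligned column and sums the column scores. Gap penalties are set separately and are ignored here. The sketch below assumes gap-free, equal-length sequences and uses the MATCH and IDENTITY rules from the table below as example matrices. The function names are illustrative only and are not part of any DNASTAR interface.

```python
from typing import Callable

# A substitution matrix viewed as a function: (residue_a, residue_b) -> score.
ScoreFn = Callable[[str, str], int]

def match_score(a: str, b: str) -> int:
    # MATCH: identical residues score 1, anything else scores -1.
    return 1 if a == b else -1

def identity_score(a: str, b: str) -> int:
    # IDENTITY: identical residues score 1, anything else scores -10,000.
    return 1 if a == b else -10_000

def score_ungapped(seq1: str, seq2: str, matrix: ScoreFn) -> int:
    """Sum the per-column scores of two equal-length, gap-free aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("this sketch only handles equal-length, gap-free alignments")
    return sum(matrix(a, b) for a, b in zip(seq1, seq2))

# One mismatched column (V vs I) out of four.
print(score_ungapped("MKVL", "MKIL", match_score))     # 2
print(score_ungapped("MKVL", "MKIL", identity_score))  # -9997
```

The single V:I substitution costs 2 points under MATCH but drives the IDENTITY score far below zero, which shows how strongly IDENTITY penalizes any substitution.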
| Matrix | Description | Available numbers |
|---|---|---|
| NUC44 | DNASTAR’s modified version of NCBI’s NUC.4.4 matrix, the modification being that U is treated as a synonym of T. In NUC44, exact matches and T:U matches score +5, and mismatches between unambiguous bases [G A T C U] score -4. Comparisons between a base and an ambiguity symbol [S W R Y K M B V H D N] have intermediate scores when the symbol includes that base. A base versus a 2-way ambiguity code [R Y W S K M] to which it belongs scores +1, and a base versus a 2-way code to which it doesn’t belong scores -4. Example: C is in [S Y M] but not in [W R K]. The 3-way codes are [B V H D], and C is in all of them except D (which means "not C"). Therefore, C vs [B V H] scores -1, while C vs D scores -4. | N/A |
| BLOSUM | (Henikoff & Henikoff, 1992). These matrices are well suited to similarity searches. | Available matrices range from 30 to 100 in increments of 5, plus BLOSUM62. Choose larger numbers for less divergent sequences. |
| GONNET | Derived from PAM matrices (Dayhoff et al., 1978) but more sensitive, and based on a much larger data set. | Fixed default of 250 (cannot be changed). |
| IDENTITY | Scores two identical amino acids as 1, and anything else as -10,000. | N/A |
| MATCH | Scores two identical amino acids as 1, and anything else as -1. | N/A |
| PAM | (Dayhoff et al., 1978). Widely used since the late 1970s. | Available matrices range from 10 to 500 in increments of 10. Choose larger numbers for more divergent sequences. |
| VTML | Derived from PAM matrices (Dayhoff et al., 1978) by Müller et al. (2002). | Available matrices range from 10 to 500 in increments of 10. |
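The NUC44 rules for comparing an unambiguous base with an IUPAC symbol reduce to set-membership tests. This is a minimal sketch assuming the standard IUPAC meanings of the ambiguity codes. It covers only the base-versus-symbol cases described in the NUC44 row of the table; scores for N, and for one ambiguity symbol against another, are not given on this page and are left out. `nuc44_base_score` is a hypothetical helper, not a DNASTAR function.

```python
# IUPAC nucleotide codes mapped to the set of bases each one stands for.
# U is treated as a synonym of T, as in DNASTAR's NUC44.
IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"}, "U": {"T"},
    # 2-way codes
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"},
    "W": {"A", "T"}, "K": {"G", "T"}, "M": {"A", "C"},
    # 3-way codes: B = not A, V = not T, H = not G, D = not C
    "B": {"C", "G", "T"}, "V": {"A", "C", "G"},
    "H": {"A", "C", "T"}, "D": {"A", "G", "T"},
}

def nuc44_base_score(base: str, symbol: str) -> int:
    """Score an unambiguous base (A, C, G, T, U) against a 1-, 2-, or 3-way symbol."""
    base, symbol = base.upper(), symbol.upper()
    base_set = IUPAC.get(base)
    if base_set is None or len(base_set) != 1:
        raise ValueError(f"{base!r} is not an unambiguous base")
    if symbol not in IUPAC:
        raise ValueError(f"{symbol!r} is not handled by this sketch")
    group = IUPAC[symbol]
    contains = base_set <= group  # does the symbol include this base?
    if len(group) == 1:   # base vs base: exact (or T:U) match +5, mismatch -4
        return 5 if contains else -4
    if len(group) == 2:   # base vs 2-way code: +1 if included, else -4
        return 1 if contains else -4
    return -1 if contains else -4  # base vs 3-way code: -1 if included, else -4

print(nuc44_base_score("T", "U"))                     # 5
print([nuc44_base_score("C", c) for c in "SYMWRK"])   # [1, 1, 1, -4, -4, -4]
print([nuc44_base_score("C", c) for c in "BVHD"])     # [-1, -1, -1, -4]
```

The last two lines reproduce the C example from the table: +1 for the 2-way codes that include C, -1 for the 3-way codes that include it, and -4 otherwise.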
- BLOSUM, PAM, GONNET, IDENTITY, and MATCH are part of NCBI’s BLAST distribution. For more information, see NCBI’s matrix page.
- The PAM, GONNET, and VTML numbers express evolutionary distance in PAM units (accepted point mutations per 100 residues), so higher numbers correspond to more divergent sequences.
- In BLOSUM, the matrix number is the percent-identity threshold used to cluster the sequence blocks from which the matrix was derived, so higher numbers correspond to more similar sequences. Therefore, BLOSUM100 would be the preferred matrix for near-identical sequences.
- VTML and GONNET are considered to be updated versions of PAM250.
- In BLOSUM, PAM, and GONNET, match and mismatch scores vary with the series number, and exact-match scores also vary with the particular amino acid. For example, BLOSUM30 scores W:W as 20 and S:S as 4, whereas BLOSUM100 scores them as 17 and 9, respectively.
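The series numbers in the table can be enumerated, for example to check that a requested matrix exists before using it. This is a minimal sketch based only on the ranges stated in the table (BLOSUM 30 to 100 in steps of 5 plus 62; PAM and VTML 10 to 500 in steps of 10; GONNET fixed at 250). The names `AVAILABLE` and `nearest_available` are hypothetical and not part of any DNASTAR interface.

```python
# Series numbers available for each matrix family, per the ranges in the table.
AVAILABLE: dict[str, list[int]] = {
    "BLOSUM": sorted(set(range(30, 101, 5)) | {62}),
    "PAM": list(range(10, 501, 10)),
    "VTML": list(range(10, 501, 10)),
    "GONNET": [250],  # fixed default; cannot be changed
}

def nearest_available(family: str, wanted: int) -> int:
    """Return the available series number closest to `wanted` (ties go to the smaller number)."""
    numbers = AVAILABLE[family.upper()]
    return min(numbers, key=lambda n: (abs(n - wanted), n))

print(AVAILABLE["BLOSUM"][:8])            # [30, 35, 40, 45, 50, 55, 60, 62]
print(nearest_available("PAM", 247))      # 250
print(nearest_available("BLOSUM", 63))    # 62
print(nearest_available("GONNET", 120))   # 250
```

Snapping to the nearest number only picks a valid matrix; it does not account for the fact that the numbers run in opposite directions for PAM/VTML and BLOSUM.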