A substitution matrix describes the rate at which a nucleotide or amino acid changes to another nucleotide or amino acid over time. When performing a pairwise alignment, you can specify the desired substitution matrix in the (Pairwise) Alignment Options dialog.

Available matrices for nucleotide sequences:

Matrix | Description
NUC44 | DNASTAR’s modified version of NCBI’s NUC.4.4 matrix; the modification is that U is treated as a synonym of T. In NUC44, exact matches and T:U matches score 5, and mismatches between unambiguous bases [G A T C U] score -4. Matches between bases and ambiguous symbols [S W R Y K M B V H D N] have intermediate scores. A base versus a 2-way ambiguity group [R Y W S K M] to which it belongs scores +1, and a base versus a 2-way group to which it does not belong scores -4.

Example: C is in [S Y M] but not in [W R K], so C vs [S Y M] scores +1 and C vs [W R K] scores -4. The 3-way groups are [B V H D]; C is in all but D (which means "not C"). Therefore, C vs [B V H] scores -1, while C vs [D] scores -4.
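The scoring rules above can be sketched in a few lines of Python. This is only an illustration of the rules as described on this page, not DNASTAR's or NCBI's implementation: the function name `nuc44_base_score` is hypothetical, the group memberships are the standard IUPAC ambiguity codes, and the sketch deliberately leaves out N and ambiguous-versus-ambiguous pairs because their scores are not given here.

```python
# Standard IUPAC ambiguity groups: each symbol maps to the bases it can stand for.
TWO_WAY = {"R": "AG", "Y": "CT", "W": "AT", "S": "CG", "K": "GT", "M": "AC"}
THREE_WAY = {"B": "CGT", "V": "ACG", "H": "ACT", "D": "AGT"}
BASES = {"A", "C", "G", "T"}


def nuc44_base_score(base, symbol):
    """Score one unambiguous base against a base or a 2-/3-way ambiguity symbol.

    Hypothetical helper that covers only the cases described in the text.
    """
    # U is treated as a synonym of T.
    base = base.upper().replace("U", "T")
    symbol = symbol.upper().replace("U", "T")
    if base not in BASES:
        raise ValueError("first argument must be an unambiguous base")
    if symbol in BASES:
        # Exact match (including T:U) versus plain mismatch.
        return 5 if base == symbol else -4
    if symbol in TWO_WAY:
        # Member of a 2-way group scores +1, non-member -4.
        return 1 if base in TWO_WAY[symbol] else -4
    if symbol in THREE_WAY:
        # Member of a 3-way group scores -1, non-member -4.
        return -1 if base in THREE_WAY[symbol] else -4
    raise ValueError("symbol not covered by this sketch: " + symbol)


if __name__ == "__main__":
    for sym in "SYMWRKBVHD":
        print("C vs", sym, "->", nuc44_base_score("C", sym))
```

Running the script prints the score of C against each 2-way and 3-way symbol, which can be compared with the example above.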


Available matrices for protein sequences:

Matrix | Description | Secondary option
BLOSUM | (Henikoff & Henikoff, 1992). These matrices are ideal for carrying out similarity searches. | Available matrices range from 30 to 100 in increments of 5, plus 62. Choose larger numbers for less divergent sequences.
GONNET | Derived from PAM matrices (Dayhoff et al., 1978), but more sensitive and based on a much larger data set. | Unchangeable default of 250.
IDENTITY | Scores two identical amino acids as 1, and anything else as -10,000. | N/A
MATCH | Scores two identical amino acids as 1, and anything else as -1. | N/A
PAM | (Dayhoff et al., 1978). Widely used since the late 1970s. | Available matrices range from 10 to 500 in increments of 10. Choose larger numbers for more divergent sequences.
VTML | Derived from PAM matrices (Dayhoff et al., 1978) by Müller et al. (2002). | Available matrices range from 10 to 500 in increments of 10.
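As a rough illustration of how differently IDENTITY and MATCH treat a mismatch, the following Python sketch scores an ungapped comparison of two equal-length sequences. The function names are hypothetical and not part of any DNASTAR or NCBI API, and gap penalties, which a real alignment would also apply, are omitted.

```python
# Mismatch scores as stated in the table; a match scores 1 in both matrices.
MISMATCH = {"IDENTITY": -10_000, "MATCH": -1}


def pair_score(a, b, matrix):
    """Score one amino-acid pair under the IDENTITY or MATCH rule."""
    return 1 if a == b else MISMATCH[matrix]


def ungapped_score(seq1, seq2, matrix):
    """Sum the pair scores of two equal-length sequences (no gaps)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    return sum(pair_score(a, b, matrix)
               for a, b in zip(seq1.upper(), seq2.upper()))


if __name__ == "__main__":
    # One mismatch (T vs S) in four positions.
    print(ungapped_score("MKTW", "MKSW", "MATCH"))     # 3 matches - 1 = 2
    print(ungapped_score("MKTW", "MKSW", "IDENTITY"))  # 3 matches - 10000 = -9997
```

A single mismatch outweighs any realistic number of matches under IDENTITY, so that matrix effectively tolerates identical residues only, whereas MATCH simply counts matches minus mismatches.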
  • BLOSUM, PAM, GONNET, IDENTITY, and MATCH are part of NCBI’s BLAST distribution. For more information, see NCBI’s matrix page.
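  • The PAM, GONNET and VTML numbers reflect the presumed evolutionary divergence, expressed in PAM units (accepted point mutations per 100 residues). Larger numbers correspond to more divergent sequences.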
  • In BLOSUM, a higher matrix number corresponds to a higher presumed degree of similarity. Therefore, BLOSUM100 is the preferred matrix for near-identical sequences.
  • VTML and GONNET are considered to be updated versions of PAM250.
  • In BLOSUM, PAM, and GONNET, match and mismatch scores vary with the series number. Exact-match scores also vary with the particular amino acid. For example, BLOSUM30 scores W:W as 20 and S:S as 4, while BLOSUM100 scores these as 17 and 9, respectively.
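Because the BLOSUM numbering runs in the opposite direction from PAM and VTML, it can help to see the available series numbers and their direction side by side. The Python sketch below is a hypothetical helper, not part of the application: it assumes "increments of 5 and 62" means every multiple of 5 from 30 to 100 plus BLOSUM62, and it only returns the ends of each range rather than recommending a specific matrix for a given data set.

```python
def available_numbers(family):
    """List the series numbers described on this page for a matrix family."""
    family = family.upper()
    if family == "BLOSUM":
        # Assumption: 30..100 in steps of 5, plus the extra BLOSUM62.
        return sorted(set(range(30, 101, 5)) | {62})
    if family in ("PAM", "VTML"):
        return list(range(10, 501, 10))
    if family == "GONNET":
        return [250]  # fixed default, not changeable
    raise ValueError("no series numbers for family: " + family)


def extreme_for(family, near_identical):
    """Return the end of the range suited to near-identical or divergent sequences."""
    numbers = available_numbers(family)
    # BLOSUM: larger number = more similar. PAM/VTML: larger = more divergent.
    higher_means_similar = family.upper() == "BLOSUM"
    return max(numbers) if near_identical == higher_means_similar else min(numbers)


if __name__ == "__main__":
    for fam in ("BLOSUM", "PAM", "VTML", "GONNET"):
        print(fam, "near-identical:", extreme_for(fam, True),
              "divergent:", extreme_for(fam, False))
```

For near-identical sequences this returns 100 for BLOSUM but 10 for PAM and VTML, which mirrors the notes above. Intermediate choices depend on the data and are not covered by the sketch.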
