BLAST Parameter


A key element in evaluating the quality of a pairwise sequence alignment is the "substitution matrix", which assigns a score for aligning any possible pair of residues. In general, different substitution matrices are tailored to detecting similarities among sequences that have diverged to different degrees. A single matrix may nevertheless be reasonably efficient over a relatively broad range of evolutionary change. Experimentation has shown that the BLOSUM-62 matrix is among the best for detecting most weak protein similarities. For particularly long and weak alignments, the BLOSUM-45 matrix may prove superior. A detailed statistical theory for gapped alignments has not been developed, and the best gap costs to use with a given substitution matrix are determined empirically.

Short alignments need to be relatively strong (i.e. have a higher percentage of matching residues) to rise above background noise. Such short but strong alignments are more easily detected using a matrix with a higher "relative entropy" than that of BLOSUM-62. In particular, short query sequences can only produce short alignments, and therefore database searches with short queries should use an appropriately tailored matrix. The BLOSUM series does not include any matrices with relative entropies suitable for the shortest queries, so the older PAM matrices may be used instead.

No alternative scoring matrices are available for BLASTN; specifying the MATRIX directive in a BLASTN request returns an error response.

For proteins, a provisional table of recommended substitution matrices and gap costs for various query lengths is:

Query length    Substitution matrix    Gap costs
< 35            PAM-30                 9, 1
35-50           PAM-70                 10, 1
50-85           BLOSUM-80              10, 1
> 85            BLOSUM-62              11, 1
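
The lookup described by the table can be sketched in a few lines of Python. This is only an illustration, not part of BLAST: the names `ScoringChoice` and `recommend_scoring` are hypothetical, the two gap-cost numbers are read as (existence, extension) penalties, and because the table lists 50 in two rows, the sketch assumes that a length of exactly 50 falls in the 35-50 row.

```python
from typing import NamedTuple


class ScoringChoice(NamedTuple):
    """Hypothetical container for one row of the recommendation table."""
    matrix: str
    gap_existence: int  # first gap-cost number (assumed: gap existence penalty)
    gap_extension: int  # second gap-cost number (assumed: gap extension penalty)


# (inclusive upper bound on query length, recommendation), in ascending order.
# Assumption: the shared boundary 50 is assigned to the 35-50 row.
_ROWS = [
    (34, ScoringChoice("PAM-30", 9, 1)),
    (50, ScoringChoice("PAM-70", 10, 1)),
    (85, ScoringChoice("BLOSUM-80", 10, 1)),
]
_LONG_QUERY = ScoringChoice("BLOSUM-62", 11, 1)


def recommend_scoring(query_length: int) -> ScoringChoice:
    """Return the table's recommended matrix and gap costs for a protein query."""
    if query_length < 1:
        raise ValueError("query length must be a positive integer")
    for upper_bound, choice in _ROWS:
        if query_length <= upper_bound:
            return choice
    # Anything longer than 85 residues uses the last row of the table.
    return _LONG_QUERY
```

For example, `recommend_scoring(42)` returns the PAM-70 row with gap costs 10, 1. The matrix names are spelled as in the table above; a particular BLAST interface may expect a different spelling.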