A collection of sequence alignments and profiles representing protein domains conserved in molecular evolution. It also includes alignments of the domains to known 3-dimensional protein structures in the MMDB database.

A gene homology tool that compares nucleotide sequences between pairs of organisms in order to identify putative orthologs. Curated orthologs are incorporated from a variety of sources via the Gene database.

A collection of related protein sequences (clusters), consisting of Reference Sequence proteins encoded by complete prokaryotic and organelle plasmids and genomes. The database provides easy access to annotation information, publications, domains, structures, external links, and analysis tools.


BLAST executables for local use are provided for Solaris, LINUX, Windows, and MacOSX systems. See the README file in the ftp directory for more information. Pre-formatted databases for BLAST nucleotide, protein, and translated searches also are available for downloading under the db subdirectory.

FTP: HomoloGene

This site contains data for each build of HomoloGene, beginning with build 35. Complete data for each build are provided in XML, and a data summary is provided in tab-delimited text format.

FTP: Protein Clusters

This site contains data from the Protein Clusters database arranged by release date. See the README files for more information.


A link option on protein records that displays the results of a pre-computed BLAST search of that protein against all other protein sequences at NCBI.

Finds regions of local similarity between biological sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as to help identify members of gene families.


COBALT is a protein multiple sequence alignment tool that finds a collection of pairwise constraints derived from conserved domain database, protein motif database, and sequence similarity, using RPS-BLAST, BLASTP, and PHI-BLAST.

Concise Microbial Protein BLAST

A specialized BLAST service in which the queried database consists of all proteins from complete microbial (prokaryotic) genomes. NCBI has precalculated clusters of similar proteins at the genus-level and one representative is chosen from each cluster in order to reduce the dataset, thereby reducing search time and providing a broader taxonomic view.

Identifies the conserved domains present in a protein sequence. CD-Search uses RPS-BLAST (Reverse Position-Specific BLAST) to compare a query sequence against position-specific score matrices that have been prepared from conserved domain alignments present in the Conserved Domain Database (CDD).

Genome BLAST

This tool compares nucleotide or protein sequences to genomic sequence databases and calculates the statistical significance of matches using the Basic Local Alignment Search Tool (BLAST) algorithm.

Genome ProtMap maps each protein from a COG, or in the case of viruses a VOG, back to its genome, and displays all the genomic segments coding for members of this particular group of related proteins. The view can be shifted to focus on an adjacent COG/VOG, and clusters can be searched by name, protein gi, or gene locus tag.


A tool for comparing genomes on the basis of the protein sequences they encode. To use TaxPlot, one selects a reference genome and two species for comparison. Pre-computed BLAST results are then used to plot a point for each predicted protein in the reference genome, based on the best alignment with proteins in each of the two genomes being compared.