What is SEQATOMs?

Sometimes, not all regions of a protein structure have known coordinates. Thus, protein structure (PDB) files do not always contain the position in space of every atom. This can occur due to technical problems, but also because a certain region does not have a fixed three-dimensional structure. Such regions are called "disordered regions".

SEQATOMs was constructed to visualize all "missing" protein regions in PDB. In order to visualize these regions in their sequence context, we constructed a BLAST interface that is able to show the lower-case masked regions in the output. Since BLAST converts all letters to upper-case and does not search case-sensitively, we parse the BLAST results and replace hits in the output with the corresponding subsequence from the original FASTA files.
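
The case-restoring step described above can be sketched as follows. This is a minimal illustration in Python, not the code used by SEQATOMs: the function name `restore_case`, its parameters and the example sequences are assumptions made for this sketch. It walks along the gapped, upper-case hit string reported by BLAST and copies each residue from the original masked sequence, starting at the reported subject start position.

```python
def restore_case(aligned_hit: str, original: str, start: int) -> str:
    """Re-apply the case of `original` to a gapped, upper-case BLAST hit.

    aligned_hit: subject string from the alignment, may contain '-' gaps
    original:    full lower-case masked sequence from the FASTA file
    start:       1-based position of the first aligned subject residue
    """
    pos = start - 1  # 0-based index into the original sequence
    out = []
    for ch in aligned_hit:
        if ch == "-":
            out.append(ch)  # a gap consumes no residue of the original
        else:
            out.append(original[pos])  # take the residue with its original case
            pos += 1
    return "".join(out)


# Hypothetical example: residues 1-4 and 12-16 are "missing" (lower-case).
masked = "mkpvTLYDVAEyagvs"
hit = "TLYD-VAEYAG"  # as BLAST would report it: upper-case, with a gap
print(restore_case(hit, masked, 5))  # TLYD-VAEyag
```

Copying residues from the original sequence, rather than changing the case of the BLAST output in place, also keeps the sketch independent of any characters BLAST may have altered in the hit.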

We provide (lower-case) masked versions of PDB, the PDB-derived CATH database and, for completeness, the DisProt database as well as the PDB SEQRES database. In the case of DisProt, we masked the regions that are annotated as disordered in the DisProt database. We would like to remind the user that regions missing in the three-dimensional structure are not necessarily disordered. Therefore, we refer to these regions as "missing".

Please cite: Brandt, B.W., Heringa, J. and Leunissen, J.A.M. (2008). SEQATOMS: a web tool for identifying missing regions in PDB in sequence context. Nucleic Acids Research 36:W255-W259.

Access to the information

SEQATOMs has been set up primarily as a case-sensitive BLAST service. The NCBI BLAST output is parsed and an NCBI-like HTML output is produced. All missing regions in the hit sequences are printed in lower-case. In addition, the missing regions are shown in an alignment graphic, so that overlap in missing regions between sequences is easily seen.

Keyword search

On the Search page, you can search on keywords and identifiers in the available databases. Please note that you can only search for keywords in the FASTA headers (descriptions). PDB SEQATOMs is the main file and is annotated most extensively. See example entries under Construction.


For users especially interested in intrinsic protein disorder, we have included DisProt, a curated database of protein disorder, which is available to BLAST and search against. For extensive access to this data, visit the DisProt home page. On the Disorder page, we provide links to several disorder predictors.


For more details on automated access to SEQATOMS, have a look at the information on Services. A Perl script and URL-API examples are available there.

Construction of SEQATOMs

SEQATOMs was constructed by masking the "missing" residues in lower-case letters. When the entries are retrieved via links on this site, the missing regions are coloured as illustrated below. When you mouse-over the regions, the coordinates appear. Missing residues in the different databases are found as described below.
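
Because the missing residues are encoded purely by letter case, the coordinates of the missing regions can be recovered from a masked sequence alone. The sketch below is one possible way to do this in Python; the function name `missing_regions` and the example sequence are assumptions for illustration and do not describe how this site computes the coordinates it displays.

```python
import re


def missing_regions(masked_seq: str) -> list[tuple[int, int]]:
    """Return 1-based, inclusive (start, end) pairs for each lower-case run."""
    return [(m.start() + 1, m.end()) for m in re.finditer(r"[a-z]+", masked_seq)]


# Hypothetical masked sequence with two missing regions.
print(missing_regions("mkpvTLYDVAEyagvs"))  # [(1, 4), (12, 16)]
```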


PDB files in mmCIF format contain an alignment of the residues in the sequence (or SEQRES) and in the coordinate section (ATOM). These alignments were extracted and the missing residues (indicated by "?" in mmCIF) were changed to lower-case letters. The sequence description contains the entry name and PDB title. This database was made non-redundant (case-sensitively).

>pdbsa|1JYF_A TRANSCRIPTION Structure Of The Dimeric Lac Repressor With An 11-Residue C-Terminal Deletion.
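
The two steps described above, masking and case-sensitive redundancy removal, can be sketched in Python as follows. This is an illustration under stated assumptions, not the construction pipeline itself: the function names, the representation of the coordinate side of the alignment as a list with "?" for missing residues, and the entry identifiers are all hypothetical.

```python
def mask_missing(seqres: str, atom_side: list[str]) -> str:
    """Lower-case every residue whose coordinate-side entry is '?'."""
    if len(seqres) != len(atom_side):
        raise ValueError("sequence and coordinate columns differ in length")
    return "".join(
        aa.lower() if atom == "?" else aa.upper()
        for aa, atom in zip(seqres, atom_side)
    )


def make_non_redundant(entries: list[tuple[str, str]]) -> list[tuple[list[str], str]]:
    """Merge identifiers of entries whose masked sequences are identical.

    The comparison is case-sensitive, so two chains with the same residues
    but different missing regions stay separate entries.
    """
    merged: dict[str, list[str]] = {}
    for ident, seq in entries:
        merged.setdefault(seq, []).append(ident)
    return [(ids, seq) for seq, ids in merged.items()]


# Hypothetical data: the first two residues have no coordinates.
print(mask_missing("MKPVT", ["?", "?", "PRO", "VAL", "THR"]))  # mkPVT

entries = [("entry1_A", "mkPVT"), ("entry1_B", "mkPVT"), ("entry2_A", "MKPVT")]
print(make_non_redundant(entries))
# [(['entry1_A', 'entry1_B'], 'mkPVT'), (['entry2_A'], 'MKPVT')]
```

Comparing the masked strings directly is what makes the redundancy removal case-sensitive: "mkPVT" and "MKPVT" are kept apart because their missing regions differ.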


The CATH sequences in CATH COMBS and ATOM were aligned. Missing residues were changed to lower-case. We added the CATH classification data (Class; Architecture; Topology; Homology) to the sequence description. This database was made non-redundant (case-sensitively).
Note: the words "class", "arch", "topol" and "homol" are not indexed for searching (these occur in all entries).

>cath|1gmeA00<soh>cath|1gmeC00 Class: Mainly Beta; Arch: Sandwich; Topol: Immunoglobulin-like; Homol: Immunoglobulin-like
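
The note on non-indexed words can be illustrated with a small keyword extractor for a FASTA description. This Python sketch is an assumption about how such indexing could work, not a description of the search implementation on this site: the function name `header_keywords`, the tokenisation rule (runs of letters and digits) and the restriction to the description part of the header are choices made for this example only.

```python
import re

# Words that occur in every CATH entry and are therefore not indexed.
NOT_INDEXED = {"class", "arch", "topol", "homol"}


def header_keywords(header: str) -> set[str]:
    """Return the lower-cased description words of a FASTA header, minus NOT_INDEXED."""
    # Everything after the first space is the description; identifiers come before it.
    description = header.split(" ", 1)[1] if " " in header else ""
    words = re.findall(r"[a-z0-9]+", description.lower())
    return {w for w in words if w not in NOT_INDEXED}


header = (">cath|1gmeA00<soh>cath|1gmeC00 Class: Mainly Beta; Arch: Sandwich; "
          "Topol: Immunoglobulin-like; Homol: Immunoglobulin-like")
print(sorted(header_keywords(header)))
# ['beta', 'immunoglobulin', 'like', 'mainly', 'sandwich']
```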


For completeness, DisProt, a curated database of protein disorder, is provided. The disordered regions were changed to lower-case letters. We added the protein name, synonyms and organism name to the FASTA headers.

>DisProt|DP00001|sp|Q9HFQ6 60S acidic ribosomal protein P1-B; Candida albicans #1-108


This is the original PDB SEQRES sequence file, which is generally used in sequence similarity searching (e.g. BLAST at NCBI). Here, it is included for completeness. This database was made non-redundant.

>pdb|102l_A mol:protein length:165  T4 LYSOZYME
