SEQATOMs
What is SEQATOMs?
Sometimes, not all regions of a protein structure have known coordinates.
Thus, the protein structure files (PDB) may not always contain information on the position of all atoms in space.
This can occur due to technical problems, but also because
a certain region does not have a fixed three-dimensional structure. These regions are called "disordered regions".
Please cite: Brandt, B.W., Heringa, J. and Leunissen, J.A.M. (2008). SEQATOMS: a web tool for identifying missing regions in PDB in sequence context. Nucleic Acids Research 36:W255-W259.
Access to the information
You have several choices to access the information in SEQATOMs:
BLAST
SEQATOMs has been primarily set up as a case-sensitive BLAST service. The NCBI BLAST output is parsed and an NCBI-like HTML output is produced. All missing regions in the hit sequences are printed in lower-case. In addition, the missing regions are shown in an alignment graphic, so an overlap in missing regions between sequences is easily seen.
Keyword search
On the Search page, you can search on keywords and identifiers in the available databases. Please note that you can only search for keywords in the FASTA headers (descriptions). PDB SEQATOMs is the main file and is annotated most extensively. See the example entries under Construction.
Disorder
For users especially interested in intrinsic protein disorder, we have included DisProt, a curated database of protein disorder, which is available to BLAST and search against. For extensive access to this data, visit the DisProt home page. On the Disorder page, we provide links to several disorder predictors.
Services
For more details on automated access to SEQATOMs, have a look at the information on Services. A Perl script and URL-API examples are available there.
Construction of SEQATOMs
SEQATOMs was constructed by masking the "missing" residues in lower-case letters. When the entries are retrieved via links on this site, the missing regions are coloured as illustrated below. When you mouse over the regions, the coordinates appear. Missing residues in the different databases are found as described below.
PDB SEQATOMs
PDB files in mmCIF format contain an alignment of residues
in the sequence (or SEQRES) and in the coordinate section (ATOM).
These alignments were extracted and the missing residues
(indicated by "?" in mmCIF) were changed to lower-case letters.
The sequence description contains the entry name and PDB title.
This database was made non-redundant (case-sensitively).
>pdbsa|1JYF_A TRANSCRIPTION Structure Of The Dimeric Lac Repressor With An 11-Residue C-Terminal Deletion.
mkpvtlydvaeyagvsyqtvsrvvnqashvsaktrekveaamaelnyipnrvaqqlagkq
sLLIGVATSSLALHAPSQIVAAIKSRADQLGASVVVSMVERSGVEACKTAVHNLLAQRVS
GLIINYPLDDQDAIAVEAACTNVPALFLDVSDQTPINSIIFSHEDGTRLGVEHLVALGHQ
QIALLAGPLSSVSARLRLAGWHKYLTRNQIQPIAEREGDWSAMSGFQQTMQMLNEGIVPT
AMLVANDQMALGAMRAITESGLRVGADISVVGYDDTEDSSCYIPPLTTIKQDFRLLGQTS
VDRLLQLSQGQAVKGNQLLPVSLVKRKTTLAPNtqtaspraladslmql
CATH
The CATH sequences in CATH COMBS and ATOM were aligned.
Missing residues were changed to lower-case.
We added the CATH classification data (Class; Architecture; Topology; Homology) to the sequence description.
This database was made non-redundant (case-sensitively).
>cath|1gmeA00<soh>cath|1gmeC00 Class: Mainly Beta; Arch: Sandwich; Topol: Immunoglobulin-like; Homol: Immunoglobulin-like
mSIVRRSNVFDPFADLWADPFDTFRSIVPAISGGGSETAAFANARMDWKETPEAHVFKAD
LPGVKKEEVKVEVEDGNVLVVSGERTKEKEDKNDKWHRVERSSGKFVRRFRLLEDAKVEE
VKAGLENGVLTVTVPKAEVKKPEVKAIQISG
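The masking step described above for PDB SEQATOMs and CATH can be sketched in Python. This is only an illustration, not the code used to build the databases: the function name `mask_missing`, the gap symbol, and the assumption that the full sequence and the ATOM sequence arrive as two equal-length aligned strings are all hypothetical.

```python
def mask_missing(full_seq: str, atom_aligned: str, gap: str = "-") -> str:
    """Lower-case every residue of full_seq that is absent from atom_aligned.

    full_seq     -- the complete sequence (e.g. SEQRES), without gaps
    atom_aligned -- the ATOM sequence aligned to full_seq, with `gap`
                    (assumed symbol) at positions lacking coordinates
    """
    if len(full_seq) != len(atom_aligned):
        raise ValueError("aligned sequences must have equal length")
    return "".join(
        res.lower() if atom == gap else res.upper()
        for res, atom in zip(full_seq, atom_aligned)
    )


if __name__ == "__main__":
    # First residue has no coordinates, so it is printed in lower-case.
    print(mask_missing("MSIVRRSNVF", "-SIVRRSNVF"))  # prints mSIVRRSNVF
```

Passing `gap="?"` would mimic the "?" marker that the text mentions for mmCIF files.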
DisProt
For completeness, DisProt,
a curated database of protein disorder, is provided.
The disordered regions were changed to lower-case letters.
We added the protein name, synonyms and organism name to the FASTA headers.
>DisProt|DP00001|sp|Q9HFQ6 60S acidic ribosomal protein P1-B; Candida albicans #1-108
msteasvsyaaliladaeqeitsekllaitkaaganvdqvwadvfakavegknlkellfs
faaaapasgaaagsasgaaaggeaaaeeaaeeeaaeesdddmgfglfd
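Because missing or disordered regions are encoded purely as lower-case letters, their coordinates can be recovered from any entry with a few lines of Python. The sketch below is a hedged illustration: the function name `missing_regions` is hypothetical, and it assumes the sequence lines of an entry have already been joined into one string.

```python
import re


def missing_regions(masked_seq: str) -> list[tuple[int, int]]:
    """Return 1-based, inclusive (start, end) pairs for each lower-case run."""
    return [
        (m.start() + 1, m.end())  # m.end() is exclusive, so it equals the 1-based end
        for m in re.finditer(r"[a-z]+", masked_seq)
    ]


if __name__ == "__main__":
    print(missing_regions("mkpVTLYDVaey"))  # prints [(1, 3), (10, 12)]
```

A sequence that is entirely lower-case, such as the DisProt entry above, yields a single region spanning its full length.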
PDB SEQRES
This is the original PDB SEQRES sequence file,
which is generally used for sequence similarity searching (e.g. BLAST at NCBI).
Here, it is included for completeness. This database was made non-redundant.
>pdb|102l_A mol:protein length:165 T4 LYSOZYME
MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAAKSELDKAIGRNTNGVIT
KDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLR
MLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYKNL
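The non-redundancy step mentioned for each database can be sketched as follows. This is an assumption-laden illustration rather than the procedure actually used: the function name `make_nonredundant` is hypothetical, and joining the headers of identical sequences with the SOH control character (`\x01`) is inferred from the `<soh>` marker in the CATH example header. Comparing sequences without changing their case keeps the step case-sensitive, so two entries with the same residues but different missing regions remain separate.

```python
def make_nonredundant(records, sep="\x01"):
    """Merge records with identical sequences (case-sensitively).

    records -- iterable of (header, sequence) pairs
    sep     -- string used to join merged headers (assumed: SOH)
    """
    merged: dict[str, list[str]] = {}
    for header, seq in records:
        # The sequence is used as-is, so "mSIV" and "MSIV" are distinct keys.
        merged.setdefault(seq, []).append(header)
    # dict preserves insertion order, so first occurrences stay in place.
    return [(sep.join(headers), seq) for seq, headers in merged.items()]


if __name__ == "__main__":
    entries = [
        ("cath|x1", "mSIV"),
        ("cath|x2", "mSIV"),
        ("cath|x3", "MSIV"),
    ]
    for header, seq in make_nonredundant(entries, sep=" | "):
        print(header, seq)
```

For a database without lower-case masking, such as PDB SEQRES, the same function reduces to an ordinary merge of identical sequences.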
Version 1.0 SEQATOMs; Last Modified 25 Oct, 2020 by BB