
Why does the amino acid sequence of a protein shown on the Catalytic Site Atlas differ from the sequence on the RCSB Protein Data Bank?




I wanted to compare the amino acid sequences of several enzymes for a project I'm working on, specifically at their catalytic sites. I went to the Catalytic Site Atlas (CSA) for the catalytic-site information, but since it doesn't offer an easy way to download the structure data programmatically, I downloaded the FASTA sequence from the RCSB PDB instead. When I checked the catalytic sites, they didn't match what the CSA was telling me, and that's when I realized the two files contain different sequences. Take 3NOS as an example; the CSA presents the following sequence:

MGNLKS…

While the PDB presents the following sequence:

PKFPRV…

Why aren't they the same sequence if it's the same protein?

Sorry if this is a noob question; I'm not a biologist, just a computer scientist who happens to like bioinformatics.

Important info:

The CSA data comes from here while the PDB data comes from here


Crystallography results (PDB files) very often cover only part of the full-length protein sequence.

Both ends of a protein are often flexible (even in a crystal) and don't produce enough data for a good fit. The corresponding residues are left out of the atomic model, and you're left with only the residues that show a defined electron density. In addition, the protein that was crystallized is often a shortened construct (for example a single domain) rather than the full-length protein.


One sequence is contained in the other: the PDB sequence is a stretch of the CSA sequence.

So the CSA sequence is (FASTA format, truncated):

>sp|P29474|NOS3_HUMAN Nitric oxide synthase, endothelial OS=Homo sapiens GN=NOS3 PE=1 SV=3
MGNLKSVAQEPGPPCGLGLGLGLGLCGKQGPATPAPEPSRAPASLLPPAPEHSPPSSPLT
QPPEGPKFPRVKNWEVGSITYDTLSAQAQQDGPCTPRRCLGSLVFPRKLQGRPSPGPPAP
EQLLSQARDFINQYYSSIKRSGSQAHEQRLQEVEAEVAATGTYQLRESELVFGAKQAWRN

taken from http://www.uniprot.org/uniprot/P29474 for convenience.

while the PDB one is:

>3NOS:A|PDBID|CHAIN|SEQUENCE
PKFPRVKNWEVGSITYDTLSAQAQQDGPCTPRRCLGSLVFPRKLQGRPSPGPPAPEQLLSQARDFINQYYSSIKRSGSQA
HEQRLQEVEAEVAATGTYQLRESELVFGAKQAWRNAPRCVGRIQWGKLQVFDARDCRSAQEMFTYICNHIKYATNRGNLR
SAITVFPQRCPGRGDFRIWNSQLVRYAGYRQQDGSVRGDPANVEITELCIQHGWTPGNGRFDVLPLLLQAPDEPPELFLL…
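
Since the PDB chain sequence appears verbatim inside the UniProt sequence, the starting offset can also be found programmatically. The following is a minimal Python sketch, not a robust tool: it assumes plain-text FASTA input and an exact, gap-free match, and the helper names `read_fasta` and `find_offset` are hypothetical. Only the truncated fragments from the posts above are used.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into {header (without '>'): sequence}."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)  # a sequence may span several lines
    return {h: "".join(parts) for h, parts in records.items()}


def find_offset(full_seq, fragment):
    """Return the 1-based start of fragment in full_seq, or None if absent.

    Assumes an exact, gap-free match, as with the fragments below.
    """
    index = full_seq.find(fragment)
    return None if index == -1 else index + 1


# Truncated fragments copied from the posts above.
UNIPROT_FASTA = """>sp|P29474|NOS3_HUMAN Nitric oxide synthase, endothelial OS=Homo sapiens GN=NOS3 PE=1 SV=3
MGNLKSVAQEPGPPCGLGLGLGLGLCGKQGPATPAPEPSRAPASLLPPAPEHSPPSSPLT
QPPEGPKFPRVKNWEVGSITYDTLSAQAQQDGPCTPRRCLGSLVFPRKLQGRPSPGPPAP
EQLLSQARDFINQYYSSIKRSGSQAHEQRLQEVEAEVAATGTYQLRESELVFGAKQAWRN
"""
PDB_FASTA = """>3NOS:A|PDBID|CHAIN|SEQUENCE
PKFPRVKNWEVGSITYDTLSAQAQQDGPCTPRRCLGSLVFPRKLQGRPSPGPPAPEQLLSQARDFINQYYSSIKRSGSQA
"""

uniprot = next(iter(read_fasta(UNIPROT_FASTA).values()))
pdb = next(iter(read_fasta(PDB_FASTA).values()))
start = find_offset(uniprot, pdb)
print(start)  # 66: residue 1 of the PDB chain is UniProt residue 66
```

An exact substring search is enough here because the two sequences are identical over the overlap; if the construct carried mutations or tags, a real alignment would be needed instead.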

The UniProt entry lists 3 isoforms produced by alternative splicing, so perhaps that is what is going on here. Here is the output of a local sequence alignment (using https://www.ebi.ac.uk/Tools/psa/emboss_matcher/):

#=======================================
#
# Aligned_sequences: 2
# 1: NOS3_HUMAN
# 2: SEQUENCE
# Matrix: EBLOSUM62
# Gap_penalty: 14
# Extend_penalty: 4
#
# Length: 240
# Identity:     240/240 (100.0%)
# Similarity:   240/240 (100.0%)
# Gaps:           0/240 ( 0.0%)
# Score: 1294
#
#
#=======================================

NOS3_HUMAN        66 PKFPRVKNWEVGSITYDTLSAQAQQDGPCTPRRCLGSLVFPRKLQGRPSP    115
                     ||||||||||||||||||||||||||||||||||||||||||||||||||
SEQUENCE           1 PKFPRVKNWEVGSITYDTLSAQAQQDGPCTPRRCLGSLVFPRKLQGRPSP     50

NOS3_HUMAN       116 GPPAPEQLLSQARDFINQYYSSIKRSGSQAHEQRLQEVEAEVAATGTYQL    165
                     ||||||||||||||||||||||||||||||||||||||||||||||||||
SEQUENCE          51 GPPAPEQLLSQARDFINQYYSSIKRSGSQAHEQRLQEVEAEVAATGTYQL    100

NOS3_HUMAN       166 RESELVFGAKQAWRNAPRCVGRIQWGKLQVFDARDCRSAQEMFTYICNHI    215
                     ||||||||||||||||||||||||||||||||||||||||||||||||||
SEQUENCE         101 RESELVFGAKQAWRNAPRCVGRIQWGKLQVFDARDCRSAQEMFTYICNHI    150

NOS3_HUMAN       216 KYATNRGNLRSAITVFPQRCPGRGDFRIWNSQLVRYAGYRQQDGSVRGDP    265
                     ||||||||||||||||||||||||||||||||||||||||||||||||||
SEQUENCE         151 KYATNRGNLRSAITVFPQRCPGRGDFRIWNSQLVRYAGYRQQDGSVRGDP    200

NOS3_HUMAN       266 ANVEITELCIQHGWTPGNGRFDVLPLLLQAPDEPPELFLL    305
                     ||||||||||||||||||||||||||||||||||||||||
SEQUENCE         201 ANVEITELCIQHGWTPGNGRFDVLPLLLQAPDEPPELFLL    240
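
Once the offset is known, the identity figure can be sanity-checked without an alignment tool. This is a minimal Python sketch under a stated simplification: it lays one sequence over the other with no gaps, unlike EMBOSS Matcher, which performs a local alignment that can open gaps. The function name `ungapped_identity` is hypothetical, and only the truncated fragments posted in this thread are used, so it reports 80/80 rather than the aligner's 240/240.

```python
def ungapped_identity(full_seq, fragment, start):
    """Count identical residues when fragment is laid over full_seq at start.

    start is 1-based. No gaps are allowed, so this only makes sense once the
    offset is already known; it is not a substitute for a real alignment.
    Returns (matches, compared_positions).
    """
    window = full_seq[start - 1:start - 1 + len(fragment)]
    matches = sum(1 for a, b in zip(window, fragment) if a == b)
    return matches, min(len(window), len(fragment))


# Truncated fragments from this thread: UniProt P29474 residues 1-180 and
# the first 80 residues of the 3NOS chain A FASTA sequence.
UNIPROT_FRAGMENT = (
    "MGNLKSVAQEPGPPCGLGLGLGLGLCGKQGPATPAPEPSRAPASLLPPAPEHSPPSSPLT"
    "QPPEGPKFPRVKNWEVGSITYDTLSAQAQQDGPCTPRRCLGSLVFPRKLQGRPSPGPPAP"
    "EQLLSQARDFINQYYSSIKRSGSQAHEQRLQEVEAEVAATGTYQLRESELVFGAKQAWRN"
)
PDB_FRAGMENT = (
    "PKFPRVKNWEVGSITYDTLSAQAQQDGPCTPRRCLGSLVFPRKLQGRPSPGPPAP"
    "EQLLSQARDFINQYYSSIKRSGSQA"
)

matches, compared = ungapped_identity(UNIPROT_FRAGMENT, PDB_FRAGMENT, 66)
print(f"Identity: {matches}/{compared}")  # Identity: 80/80
```

Shifting the start by even one residue (for example to 65) drops the count sharply, which is a quick way to confirm that 66 is the right offset.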

This answer is correct; I just wanted to add that the original (UniProt) residue numbering is preserved in the PDB file, in the DBREF record (which you can see by opening the PDB file in a text editor):

DBREF 3NOS A 66 492 UNP P29474 NOS3_HUMAN 66 492

In plain English, the sequence presented in this file (3NOS, chain A) corresponds to residues 66-492 of the associated UniProt (UNP) entry (accession: P29474).
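
For the original goal of checking catalytic residues, the DBREF numbers are enough to translate a UniProt residue number (as used by the CSA) into a position in the PDB FASTA sequence. Below is a minimal Python sketch, assuming that the record has no insertion codes (so whitespace splitting works; real PDB files use fixed columns), that the chain is a single contiguous, gap-free segment of the UniProt sequence, and that the first residue of the FASTA sequence corresponds to the DBREF start, as the alignment above shows for 3NOS. `DbRef`, `parse_dbref`, and `uniprot_to_fasta_index` are hypothetical names.

```python
from typing import NamedTuple, Optional


class DbRef(NamedTuple):
    pdb_id: str
    chain: str
    seq_begin: int     # first residue number in the PDB chain
    seq_end: int       # last residue number in the PDB chain
    database: str      # e.g. "UNP" for UniProt
    accession: str
    db_id_code: str
    db_seq_begin: int  # matching first residue in the reference sequence
    db_seq_end: int    # matching last residue in the reference sequence


def parse_dbref(line: str) -> DbRef:
    """Parse a DBREF record by whitespace (assumes no insertion codes)."""
    fields = line.split()
    if len(fields) != 10 or fields[0] != "DBREF":
        raise ValueError(f"unexpected DBREF line: {line!r}")
    _, pdb_id, chain, b, e, db, acc, code, db_b, db_e = fields
    return DbRef(pdb_id, chain, int(b), int(e), db, acc, code,
                 int(db_b), int(db_e))


def uniprot_to_fasta_index(ref: DbRef, uniprot_pos: int) -> Optional[int]:
    """Map a UniProt residue number to a 1-based index in the chain FASTA.

    Assumes FASTA residue 1 corresponds to ref.db_seq_begin with no gaps.
    Returns None if the residue lies outside the crystallized segment.
    """
    if not ref.db_seq_begin <= uniprot_pos <= ref.db_seq_end:
        return None
    return uniprot_pos - ref.db_seq_begin + 1


ref = parse_dbref("DBREF 3NOS A 66 492 UNP P29474 NOS3_HUMAN 66 492")
print(uniprot_to_fasta_index(ref, 66))   # 1
print(uniprot_to_fasta_index(ref, 305))  # 240, matching the alignment above
print(uniprot_to_fasta_index(ref, 10))   # None: not part of this structure
```

Returning `None` for out-of-range residues makes it explicit when a catalytic residue listed by the CSA simply isn't present in the crystallized construct.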



