This project provides an implementation of the Integration Score (IS), which helps users narrow down candidate functional residues. IS is based on the hypothesis that functional residues are conserved and have more co-evolutionarily coupled partners than non-functional residues; it combines sequence conservation with co-evolutionary information. The code was first applied to identify functional residues in the Major Facilitator Superfamily (MFS) transporters LacY, GlpT, and EmrD, which have known 3D structures and enough characterized functional residues to validate the performance of the method. With this method, we found that the conserved cores of evolutionarily coupled residues are responsible for specific substrate recognition and translocation in MFS transporters. Here, we provide downloadable source code so that the method can be applied to find functionally important residues in other classes of proteins.
Source code (Python) for Linux and Windows is available for download here: [IS.tar.gz]
The source code includes a program for calculating IS (IS.py) and a program for installing the other required programs (IS_install.py).
To calculate IS for each residue, follow these steps.
1. First, install the tools for calculating the co-evolution score [1, 2] (McBasc) and the sequence conservation score [3] (rate4site) by typing:
"IS_install.py" installs McBasc and rate4site for you. After installation, you should see the output file "config" and output folders such as "covariance" and "rate4site".
2. Second, calculate the IS of each residue by typing:
python IS.py [argument1] [argument2] [argument3]
Here, argument1 is the name of your MSA file, argument2 is the name of the query protein sequence in the MSA file, and argument3 is the name of the output file.
For example, if the MSA file is named Test.aln, the query sequence is named Test_1, and the desired output file is Test.out, type:
python IS.py Test.aln "Test_1" Test.out
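When IS has to be calculated for several alignments, the same call can be scripted. The following is a minimal sketch rather than part of the distributed code: the helper names build_is_command and run_is are hypothetical, and it assumes that IS.py sits in the current directory and runs under the same Python interpreter as the calling script.

```python
import subprocess
import sys


def build_is_command(msa_file, query_name, output_file, script="IS.py"):
    """Return the argument list for one IS.py call.

    The three positional arguments follow the usage line above:
    MSA file name, query sequence name, output file name.
    """
    # Assumption: IS.py works with the interpreter running this script.
    return [sys.executable, script, msa_file, query_name, output_file]


def run_is(msa_file, query_name, output_file, script="IS.py"):
    """Run IS.py once and raise CalledProcessError if it exits with an error."""
    command = build_is_command(msa_file, query_name, output_file, script)
    return subprocess.run(command, check=True)


# Same call as the example above, built as a list so no shell quoting is needed.
command = build_is_command("Test.aln", "Test_1", "Test.out")
```

Calling run_is("Test.aln", "Test_1", "Test.out") then executes the command; passing the arguments as a list avoids quoting problems with sequence names that contain special characters.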
In the MSA file, each homologous sequence should occupy one line. The first word of the line is the name of the sequence, and the second word is the amino acid sequence. '-' indicates a gap.
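Before running IS.py, it can help to check that an MSA file follows this layout. The sketch below is illustrative only: the function names parse_msa_lines and check_msa are hypothetical, the example sequences are made up, and the equal-length check is an assumption based on the rows being aligned, not a rule stated by IS.py.

```python
def parse_msa_lines(lines):
    """Parse 'name sequence' lines into a list of (name, sequence) pairs."""
    records = []
    for line_number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 2:
            raise ValueError(
                f"line {line_number}: expected 'name sequence', "
                f"found {len(fields)} fields"
            )
        records.append((fields[0], fields[1]))
    return records


def check_msa(records, query_name):
    """Validate the alignment and return the query sequence without gaps."""
    # Assumption: aligned rows all have the same length.
    lengths = {len(sequence) for _, sequence in records}
    if len(lengths) > 1:
        raise ValueError(f"sequences differ in length: {sorted(lengths)}")
    sequences = dict(records)
    if query_name not in sequences:
        raise ValueError(f"query sequence {query_name!r} not found in MSA")
    # '-' marks a gap, so removing it gives the query residues.
    return sequences[query_name].replace("-", "")


# Made-up three-sequence alignment in the one-line-per-sequence layout.
example = ["Test_1 MKT-AL", "Test_2 MRTQAL", "Test_3 M-TQAL"]
records = parse_msa_lines(example)
ungapped_query = check_msa(records, "Test_1")
```

For a real file, pass the open file object instead of the example list, for instance parse_msa_lines(open("Test.aln")).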
Each line of the output file has the following tab-separated fields:
RES (tab) POS (tab) IS (tab) CS (tab) CS_P (tab) CN (tab) CN_P
RES: the amino acid in the query sequence, in one-letter code.
POS: the residue number in the query sequence.
IS: the integration score of the given residue in the query sequence.
CS: the sequence conservation score of the given residue in the query sequence.
CS_P: the percentile rank of the sequence conservation score of the given residue in the query sequence.
CN: the co-evolutionary coupling number of the given residue in the query sequence.
CN_P: the percentile rank of the co-evolutionary coupling number of the given residue in the query sequence.
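The seven fields described above can be read into named records for further analysis. The following is a minimal sketch under stated assumptions: the names ResidueScore, parse_is_output, and top_residues are hypothetical, the sample values are invented for illustration, lines that do not parse as numbers (such as a possible header line) are skipped, and a higher IS is assumed to indicate a stronger candidate.

```python
import csv
from typing import NamedTuple


class ResidueScore(NamedTuple):
    res: str         # RES: amino acid, one-letter code
    pos: int         # POS: residue number in the query sequence
    is_score: float  # IS: integration score
    cs: float        # CS: sequence conservation score
    cs_p: float      # CS_P: percentile rank of CS
    cn: float        # CN: co-evolutionary coupling number
    cn_p: float      # CN_P: percentile rank of CN


def parse_is_output(lines):
    """Parse tab-separated output lines into ResidueScore records."""
    rows = []
    for fields in csv.reader(lines, delimiter="\t"):
        if len(fields) != 7:
            continue  # not a data line
        try:
            numbers = [float(value) for value in fields[2:]]
            rows.append(ResidueScore(fields[0], int(fields[1]), *numbers))
        except ValueError:
            continue  # non-numeric line, e.g. a header
    return rows


def top_residues(rows, n=10):
    """Return the n residues with the highest IS (assumed to rank best)."""
    return sorted(rows, key=lambda row: row.is_score, reverse=True)[:n]


# Invented values, used only to show the layout.
sample = [
    "RES\tPOS\tIS\tCS\tCS_P\tCN\tCN_P",
    "M\t1\t0.20\t0.5\t40.0\t1\t30.0",
    "K\t2\t0.90\t1.2\t95.0\t6\t90.0",
    "T\t3\t0.55\t0.8\t70.0\t3\t60.0",
]
rows = parse_is_output(sample)
best = top_residues(rows, n=2)
```

For a real result, pass the open output file, for instance parse_is_output(open("Test.out", newline="")).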