Integration of evolutionary features to identify the functionally important residues in Major Facilitator Superfamily (MFS) transporters.

Released in March 2009


This project provides an implementation of the Integration Score (IS), which helps users narrow down candidate functional residues. IS is based on the hypothesis that functional residues are conserved and have more co-evolutionarily coupled partners than non-functional residues, and it combines sequence conservation with co-evolutionary information. The code was first applied to identify functional residues in three Major Facilitator Superfamily (MFS) transporters, LacY, GlpT, and EmrD, which have known 3D structures and enough characterized functional residues to validate the performance of the method. With this method, we found that the conserved cores of evolutionarily coupled residues are responsible for specific substrate recognition and translocation in MFS transporters. The source code is provided here for download so that the method can be applied to find functionally important residues in other classes of proteins.


Source code (Python) for Linux and Windows is available for download here: [IS.tar.gz]
The source code includes programs for calculating IS ( and for installing other required programs (

Running the program

To run this program, Python (version higher than 2.4) and Java (Java Runtime Environment 1.5 or higher) must be installed.

To calculate the IS of each residue, follow these steps.
1. First, install the tools for calculating the co-evolution score [1, 2] (McBasc) and the sequence conservation score [3] (rate4site) by typing:


By using "", you can install McBasc and rate4site easily. After installation, you will see the output file "config" and output folders such as "covariance" and "rate4site".
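
As a quick sanity check after installation, the following sketch tests whether the outputs named above are present. It assumes they are created in the directory from which the installer was run, which is not stated here. `check_install` is a hypothetical helper written for Python 3 and is not part of the distributed code.

```python
import os


def check_install(base_dir="."):
    """Return the expected installation outputs that are missing from base_dir."""
    # The names come from the text above; their location (base_dir) is an assumption.
    expected_files = ["config"]
    expected_dirs = ["covariance", "rate4site"]
    missing = [name for name in expected_files
               if not os.path.isfile(os.path.join(base_dir, name))]
    missing += [name for name in expected_dirs
                if not os.path.isdir(os.path.join(base_dir, name))]
    return missing


if __name__ == "__main__":
    missing = check_install(".")
    if missing:
        print("Missing after installation: " + ", ".join(missing))
    else:
        print("Installation outputs found.")
```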

2. Second, calculate the IS of each residue by typing:

python [argument1] [argument2] [argument3]

Here, argument1 is the name of your MSA file, argument2 is the name of the query protein sequence in the MSA file, and argument3 is the name of the output file.
For example, if the MSA file is named Test.aln, the query sequence is named Test_1, and the desired output file is Test.out, type:

python Test.aln "Test_1" Test.out

Input file



Each homologous sequence should occupy one line. The first word on the line is the name of the sequence, and the second word is its aligned amino acid sequence. '-' indicates a gap.
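
The input layout described above can be read with a few lines of Python. This is a minimal sketch, not the parser used by the distributed programs. It assumes that the name and the sequence are separated by whitespace, that blank lines may be skipped, and that residue numbers count the non-gap residues of the query starting from 1. `read_msa` and `query_residues` are hypothetical helper names, and the code is written for Python 3.

```python
def read_msa(path):
    """Read an alignment file with one 'name sequence' pair per line."""
    names, seqs = [], []
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            if len(fields) != 2:
                raise ValueError("expected 'name sequence', got: %r" % line)
            names.append(fields[0])
            seqs.append(fields[1])
    # Every row of an alignment should have the same number of columns.
    if len(set(len(s) for s in seqs)) > 1:
        raise ValueError("aligned sequences differ in length")
    return names, seqs


def query_residues(names, seqs, query_name):
    """Return (position, residue) pairs for the query, ignoring gap columns."""
    seq = seqs[names.index(query_name)]
    ungapped = seq.replace("-", "")
    # Numbering from 1 over non-gap residues is an assumption of this sketch.
    return [(i + 1, aa) for i, aa in enumerate(ungapped)]
```

For instance, `names, seqs = read_msa("Test.aln")` followed by `query_residues(names, seqs, "Test_1")` lists the residues of the query named in the example above.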

Output file

The result is in the format given by:

RES (tab) POS (tab) IS (tab) CS (tab) CS_P (tab) CN (tab) CN_P

RES: the amino acid in the query sequence, in one-letter code.
POS: the residue number in the query sequence.
IS: the integration score of the given residue in the query sequence.
CS: the sequence conservation score of the given residue in the query sequence.
CS_P: the percentile rank of the sequence conservation score of the given residue in the query sequence.
CN: the co-evolutionary coupling number of the given residue in the query sequence.
CN_P: the percentile rank of the co-evolutionary coupling number of the given residue in the query sequence.
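
The tab-separated output can be loaded and ranked as sketched below. This is an illustrative Python 3 reader, not part of the distributed code. It assumes that POS is an integer, that the remaining score columns are numeric, that any line not matching this layout (for example a header) can be skipped, and that a higher IS marks a stronger candidate. `read_is_output` and `top_candidates` are hypothetical helper names.

```python
import csv

COLUMNS = ["RES", "POS", "IS", "CS", "CS_P", "CN", "CN_P"]


def read_is_output(path):
    """Parse the tab-separated IS output into a list of dictionaries."""
    rows = []
    with open(path, newline="") as handle:
        for fields in csv.reader(handle, delimiter="\t"):
            if len(fields) != len(COLUMNS):
                continue  # skip blank or malformed lines
            try:
                pos = int(fields[1])
                values = [float(x) for x in fields[2:]]
            except ValueError:
                continue  # skip non-numeric lines such as a header
            row = {"RES": fields[0], "POS": pos}
            row.update(zip(COLUMNS[2:], values))
            rows.append(row)
    return rows


def top_candidates(rows, n=10):
    """Return the n rows with the highest IS (assumed to be the best candidates)."""
    return sorted(rows, key=lambda r: r["IS"], reverse=True)[:n]
```

For the example run above, `top_candidates(read_is_output("Test.out"), 10)` would return the ten highest-scoring residues of Test_1.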


1. Fodor A. and Aldrich R. Influence of conservation on calculations of amino acid covariance in multiple sequence alignments. Proteins. 2004 Aug 1;56(2):211-21.

2. Dekker J., Fodor A., Aldrich R. and Yellen G. A perturbation-based method for calculating explicit likelihood of evolutionary covariance in multiple sequence alignments. Bioinformatics. 2004 Jul 10;20(10):1565-72.

3. Mayrose I., Graur D., Ben-Tal N. and Pupko T. Comparison of site-specific rate-inference methods: Bayesian methods are superior. Mol Biol Evol. 2004;21:1781-1791.