Edoardo Sarti (INRIA)
Title: Understanding the function of paralogous protein sequences
One of the main ways organisms evolve new functional proteins is through gene duplication events in their genome. When two copies of the same gene are present, either the organism benefits from a higher concentration of the expressed protein, or one of the two copies accumulates mutations and diverges over evolutionary time, often acquiring new functions. Annotating the function of such paralogous sequences has always been very challenging, both in small-scale, expert-guided assays and in large-scale bioinformatics studies, where paralogs are the most important source of functional annotation errors. ProfileView is a recent computational method designed to functionally classify sets of protein sequences. It builds a library of probabilistic models that accurately represent the functional variability of protein families, and it extracts biologically interpretable information from the classification process. We used it to classify the paralogs of the 11 proteins participating in the Calvin-Benson cycle (CBC), obtaining fully consistent results for 8 of them and partially consistent results for 2 others. These paralog function annotations for the CBC are now being used to match same-function paralog sequences when building joint multiple sequence alignments (MSAs) for protein-protein interaction studies.