Supervised learning-aided optimization of expert-driven functional protein sequence annotation
InterPro and SwissProt data retrieval and encoding EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
Calculations and programming EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
CC BY 4.0
Unless otherwise stated above, the content of this bibliographic record may be used under a CC BY 4.0 licence by Inist-CNRS.
Computer science; theoretical automation; systems
Generalities in biological sciences
The aim of this work is to use a supervised learning approach to identify sets of motif-based sequence characteristics whose combinations give the most accurate annotation of new proteins. We assess several of the InterPro Consortium member databases for their informativeness in annotating full-length protein sequences. Our study thus addresses the problem of integrating biological information from various resources. Decision-rule algorithms are used to cross-map different biological classification systems in order to optimise the functional annotation of protein sequences. With the developed approach, various features (e.g., keywords, GO terms, structural complex names) may be assigned to a sequence via its characteristics (e.g., motifs built by various protein sequence analysis methods). We chose SwissProt keywords as the set of features on which to perform our analysis. From the presented results, one can quickly obtain the best combinations of methods for describing a given class of proteins.
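The idea of mapping combinations of signature hits to SwissProt keywords can be illustrated with a minimal sketch. The record does not name the decision-rule algorithm used, so the exhaustive search below is an assumption made for illustration only, not the authors' method. The protein identifiers, signature names (e.g. `pfam:A`), keywords, and the helper functions `score` and `best_rule` are all hypothetical, and the training set is made-up toy data.

```python
from itertools import combinations

# Toy, made-up training set: protein -> (signature hits, curated keywords).
# All identifiers are hypothetical placeholders, not real database entries.
TRAIN = {
    "P1": ({"pfam:A", "prosite:X"}, {"Kinase"}),
    "P2": ({"pfam:A", "prosite:X"}, {"Kinase"}),
    "P3": ({"pfam:A"}, set()),
    "P4": ({"prosite:X", "prints:Q"}, {"Transport"}),
    "P5": ({"pfam:A", "prosite:X", "prints:Q"}, {"Kinase", "Transport"}),
    "P6": ({"pfam:A", "prints:Q"}, {"Transport"}),
}


def score(rule, keyword, data):
    """Precision and recall of the rule 'all signatures in rule hit => keyword'."""
    tp = fp = fn = 0
    for hits, keywords in data.values():
        fires = rule <= hits          # every signature in the rule is present
        has = keyword in keywords     # the protein carries the keyword
        tp += int(fires and has)
        fp += int(fires and not has)
        fn += int(not fires and has)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def best_rule(keyword, data, max_size=2):
    """Search all signature combinations up to max_size; return the best by F1."""
    signatures = sorted(set().union(*(hits for hits, _ in data.values())))
    candidates = [
        frozenset(combo)
        for size in range(1, max_size + 1)
        for combo in combinations(signatures, size)
    ]

    def f1(rule):
        p, r = score(rule, keyword, data)
        return 2 * p * r / (p + r) if p + r else 0.0

    # Prefer higher F1; among equal F1, prefer the shorter rule.
    return max(candidates, key=lambda rule: (f1(rule), -len(rule)))


if __name__ == "__main__":
    for kw in ("Kinase", "Transport"):
        rule = best_rule(kw, TRAIN)
        print(kw, sorted(rule), score(rule, kw, TRAIN))
```

In this toy data no single signature predicts "Kinase" without false positives, while the pair `pfam:A` plus `prosite:X` does, which mirrors the abstract's point that combinations of methods can annotate a protein class more accurately than any one method alone. Exhaustive search is only feasible for small rule sizes; a real decision-rule learner would prune the candidate space.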