Automated enzyme classification by formal concept analysis

François Coste, Gaëlle Garet, Agnès Groisillier, Jacques Nicolas, Thierry Tonon

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Enzymes are macromolecules (linear sequences of linked molecules) whose catalytic activity makes them essential for any biochemical reaction. High-throughput genomic techniques give access to the sequences of new enzymes found in living organisms. Predicting an enzyme's functional activity from its sequence is a crucial task that can be approached by comparing new sequences with those of known enzymes labeled with a family class. This task is difficult because the activity depends on a combination of small sequence patterns and because sequences have evolved greatly over time. This paper presents a classifier based on the identification of common subsequence blocks between known and new enzymes and on the search for formal concepts built on the cross product of blocks and sequences for each class. Since new enzyme families may emerge, it is important to propose a first classification of enzymes that cannot be assigned to a known family. Formal concept analysis (FCA) offers a convenient framework for casting this task as an optimization problem on the set of concepts. The classifier has been tested successfully on the haloacid dehalogenase superfamily, a set of enzymes present in a large variety of species.
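To make the notion of a formal concept on a sequences-by-blocks table concrete, here is a minimal Python sketch. It is not the authors' classifier: it only enumerates the formal concepts of a tiny binary context in which each sequence is described by the set of blocks it contains. The function name `formal_concepts`, the sequence identifiers, and the block identifiers are all hypothetical and chosen for illustration only.

```python
def formal_concepts(context):
    """Return all formal concepts (extent, intent) of a binary context.

    `context` maps each object (here: a hypothetical sequence id) to the
    set of attributes (here: hypothetical block ids) it contains.
    """
    all_attrs = frozenset().union(*context.values())
    # Every concept intent is an intersection of object intents; the
    # intersection over no objects is the full attribute set.
    intents = {all_attrs}
    for attrs in context.values():
        attrs = frozenset(attrs)
        intents |= {attrs & known for known in intents}
    concepts = []
    for intent in intents:
        # The extent is the set of objects that contain every block of the intent.
        extent = frozenset(o for o, attrs in context.items() if intent <= attrs)
        concepts.append((extent, intent))
    # Largest extents first, ties broken by the sorted intent.
    return sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[1])))


# Toy context (illustrative data, not taken from the paper).
toy_context = {
    "seqA": {"b1", "b2"},
    "seqB": {"b1", "b2", "b3"},
    "seqC": {"b1", "b4"},
}

for extent, intent in formal_concepts(toy_context):
    print(sorted(extent), sorted(intent))
```

Each printed pair is a maximal group of sequences together with the full set of blocks they all share. In this toy context, for example, `seqA` and `seqB` form a concept with the shared blocks `b1` and `b2`.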

Original language: English
Title of host publication: Formal Concept Analysis - 12th International Conference, ICFCA 2014, Proceedings
Publisher: Springer
Pages: 235-250
Number of pages: 16
ISBN (Print): 9783319072470
Publication status: Published - 1 Jan 2014
Event: 12th International Conference on Formal Concept Analysis, ICFCA 2014 - Cluj-Napoca, Romania
Duration: 10 Jun 2014 to 13 Jun 2014

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8478 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 12th International Conference on Formal Concept Analysis, ICFCA 2014
Country/Territory: Romania
City: Cluj-Napoca
Period: 10/06/14 to 13/06/14

Keywords

  • bioinformatics
  • FCA application
  • protein classification
