A Database and Evaluation for Classification of RNA Molecules Using Graph Methods

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

In this paper, we introduce a new graph dataset based on
the representation of RNA. The RNA dataset includes 3178 RNA chains
which are labelled in 8 classes according to their reported biological functions.
The goal of this database is to provide a platform for investigating
the classification of RNA using graph-based methods. The molecules are
represented as graphs that encode the sequence and base pairs of the
RNA, with a number of labelling schemes using base labels and local
shape. We report the results of a number of state-of-the-art graph-based
methods on this dataset as a baseline comparison and investigate how
these methods can be used to categorise RNA molecules by type and
function. The methods applied are the Weisfeiler-Lehman and optimal assignment
kernels, the shortest-paths kernel and the all paths and cycle methods.
We also compare to the standard Needleman-Wunsch algorithm used
in bioinformatics for DNA and RNA comparison, and demonstrate the
superiority of graph kernels even on a string representation. The highest
classification rate is obtained by the WL-OA algorithm using base labels
and base-pair connections.
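
The graph representation described in the abstract (nodes for bases, edges for sequence adjacency and base pairs) and a Weisfeiler-Lehman comparison can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the dot-bracket input format, the function names `rna_graph`, `wl_features` and `wl_kernel`, and the default of two refinement iterations are all hypothetical. It computes the plain WL subtree kernel, not the WL-OA variant that the abstract reports as the best performer.

```python
from collections import Counter


def rna_graph(sequence, dot_bracket):
    """Build a base-labelled graph from a sequence and a dot-bracket string.

    Assumption: secondary structure is given in dot-bracket notation using
    only '(' , ')' and '.', so crossing pairs (pseudoknots) are not handled.
    Returns (labels, adjacency) where adjacency maps node index -> set of
    neighbouring node indices.
    """
    if len(sequence) != len(dot_bracket):
        raise ValueError("sequence and structure must have equal length")
    n = len(sequence)
    adjacency = {i: set() for i in range(n)}
    # Backbone edges join consecutive nucleotides in the chain.
    for i in range(n - 1):
        adjacency[i].add(i + 1)
        adjacency[i + 1].add(i)
    # Base-pair edges join each ')' to its matching '('.
    stack = []
    for i, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure string")
            j = stack.pop()
            adjacency[i].add(j)
            adjacency[j].add(i)
    if stack:
        raise ValueError("unbalanced structure string")
    return list(sequence), adjacency


def wl_features(labels, adjacency, iterations=2):
    """Count node labels over successive Weisfeiler-Lehman refinements."""
    current = list(labels)
    features = Counter(current)
    for _ in range(iterations):
        # New label = own label plus the sorted multiset of neighbour labels.
        current = [
            (current[v], tuple(sorted(current[u] for u in adjacency[v])))
            for v in range(len(current))
        ]
        features.update(current)
    return features


def wl_kernel(features_a, features_b):
    """WL subtree kernel: dot product of the two label-count vectors."""
    return sum(count * features_b[label] for label, count in features_a.items())


if __name__ == "__main__":
    # Toy hairpin versus the same sequence with no base pairs.
    hairpin = wl_features(*rna_graph("GGGAAACCC", "(((...)))"))
    unpaired = wl_features(*rna_graph("GGGAAACCC", "........."))
    print(wl_kernel(hairpin, hairpin), wl_kernel(hairpin, unpaired))
```

With an all-dots structure string the graph reduces to a path over the sequence, which is one way to apply a graph kernel to a pure string representation.
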
Original language: English
Title of host publication: Graph-Based Representations in Pattern Recognition - 12th IAPR-TC-15 International Workshop, GbRPR 2019, Tours, France, June 19-21, 2019, Proceedings
Publisher: Springer
Pages: 78-87
Number of pages: 10
Publication status: Published - 13 Jun 2019
