
An Efficient Combinatorial Approach for Solving the DNA Motif Finding Problem

The detection of an over-represented subsequence in a set of (carefully chosen) DNA sequences is often the main clue that prompts investigation of a possible functional role for that subsequence. Over-represented substrings of a biological string, possibly with local mutations, are termed motifs. A typical functional unit that can be modeled by a motif is a Transcription Factor Binding Site (TFBS): a portion of the DNA sequence suited to binding a protein that participates in the complex biochemical reactions involved in transcription.
A simplified combinatorial problem that captures the essential combinatorial nature of motif finding has been proposed in the literature: the planted (l-d)-motif problem, also known as the (l-d) Challenge Problem. In this paper we propose a novel graph-based algorithm for solving a refinement of the (l-d) Challenge Problem. Experimental results show that instances of the (l-d) Challenge Problem considered difficult for competing state-of-the-art methods in the literature can be solved efficiently in our framework.
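To make the problem concrete, the sketch below assumes one common search formulation of the (l-d) problem: given t DNA sequences, report every string M of length l such that each sequence contains a length-l substring within Hamming distance at most d of M. It is an exhaustive enumeration over all 4^l candidates (roughly O(4^l · t · n · l) time for sequences of length n), not the graph-based algorithm of this paper, and it does not model the paper's refinement of the problem. The function names (`hamming`, `occurs_with_mismatches`, `brute_force_motifs`) are hypothetical and chosen for illustration.

```python
from itertools import product

ALPHABET = "ACGT"  # assumed DNA alphabet


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_with_mismatches(motif: str, seq: str, d: int) -> bool:
    """True if some length-l window of seq is within Hamming distance d of motif."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d for i in range(len(seq) - l + 1))


def brute_force_motifs(seqs: list[str], l: int, d: int) -> list[str]:
    """Hypothetical naive solver: try all 4**l candidate strings and keep those
    that occur, with at most d mismatches, in every input sequence."""
    result = []
    for letters in product(ALPHABET, repeat=l):
        candidate = "".join(letters)
        if all(occurs_with_mismatches(candidate, s, d) for s in seqs):
            result.append(candidate)
    return result
```

For example, `brute_force_motifs(["TTACGATT", "GACGTCC", "CCCAGGT"], 4, 1)` includes `"ACGT"`, which appears as `ACGA`, `ACGT` and `AGGT` in the three sequences. The exponential 4^l factor is what makes larger (l, d) instances hard for exhaustive search and motivates more efficient combinatorial approaches.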

International Conference on Intelligent Systems Design and Applications (ISDA), Pisa, Italy, 2009


Type: Article in proceedings of international peer-reviewed conference
Field of reference: Information Technology and Communication Systems

Activity: Computational biology