IIT Home Page CNR Home Page

TRStalker: an Efficient Heuristic for Finding Fuzzy Tandem Repeats

Motivation: Genomes in  higher eucaryotic organisms contain a substantial amount of (almost) repeated sequences. Tandem Repeats (TRs) constitute a large class of repetitive sequences that are originated via  phenomena such as replication slippage and  are characterized by close spatial contiguity. They play an important role in several molecular regulatory mechanisms, and also in several diseases (e.g. in the group of trinucleotide repeat disorders). While for tandem repeats with a low or medium level of divergence the current methods are rather effective, the problem of detecting TRs with higher divergence (fuzzy TRs) is still open. The detection of fuzzy TRs is propaedeutic to enriching our view of their role in regulatory mechanisms and diseases.
Fuzzy TRs are also important as tools to shed light  on the evolutionary history  of the genome, where  higher divergence correlates with more remote duplication events.

We have developed an algorithm (christened TRStalker) with the aim of detecting efficiently Tandem Repeats that are hard to detect because of their inherent fuzziness, due to high levels of base substitutions, insertions and deletions.
To attain this goal we developed heuristics to solve a Steiner version of the problem for which the fuzziness is measured with respect to a motif string not necessarily present in the input string. This problem akin to the "generalized median string" that is know to be an NP-hard problem. Experiments with both synthetic and biological sequences demonstrate that our method performs better than current state of the  art for fuzzy TRs and that the fuzzy TRs of the type we detect are indeed present in important biological sequences.

Availability: TRStalker will be integrated in the Web-based Tandem Repeats Discovery Service (TReaDS) at bioalgo.iit.cnr.it.

Bioinformatics, 2010

IIT authors:

Type: Article in ISI Journal
Field of reference: Computer Science & Engineering
Da pagina i358 a pagina i366

Activity: Biologia computazionale