Theoretical and Practical Analyses in Metagenomic Sequence Classification

Metagenomics is the study of genomic sequences in a heterogeneous microbial sample taken, e.g., from soil, water, or the human microbiome. One of the primary objectives of metagenomic studies is to assign a taxonomic identity to each read sequenced from a sample and then to estimate the abundance of the known clades. With ever-larger metagenomic datasets from high-throughput sequencing technologies now readily available, several fast and accurate methods have been developed that work with reasonable computing requirements. Here we provide an overview of the state-of-the-art methods for the classification of metagenomic sequences, highlighting in particular the theoretical factors that seem to correlate well with practical factors and could therefore be useful in the choice or development of a new method in experimental contexts. In particular, we emphasize that the information derived from the known genomes and eventually used in the learning and classification processes may create several experimental issues (mostly related to the amount of information used in the processes and its uniqueness, significance, and redundancy), and that some of these issues are intrinsic to both current alignment-based approaches and compositional ones. This entails the need to develop efficient alignment-free methods that overcome such problems by combining the learning and classification processes in a single framework.

30th International Conference on Database and Expert Systems Applications (DEXA 2019), International Workshop on Biological Knowledge Discovery from Big Data (BIOKDD), Linz, Austria, 2019

External authors: Hend Amraoui (Department of Information Engineering, University of Pisa, Pisa, Italy; University of Tunis El Manar, Tunis, Tunisia; Laboratory of Technologies of Information and Communication and Electrical Engineering (LaTICE), National Higher School of Engineers of Tunis (ENSIT), University of Tunis, Tunis, Tunisia), Mourad Elloumi (Laboratory of Technologies of Information and Communication and Electrical Engineering (LaTICE), National Higher School of Engineers of Tunis (ENSIT), University of Tunis, Tunis, Tunisia), Francesco Marcelloni (Department of Information Engineering, University of Pisa, Pisa, Italy), Faouzi Mhamdi (Laboratory of Technologies of Information and Communication and Electrical Engineering (LaTICE), National Higher School of Engineers of Tunis (ENSIT), University of Tunis, Tunis, Tunisia)
IIT authors:

Davide Verzotto

Type: Conference proceedings contribution
Subject area: Computer Science & Engineering

File: amraoui_dexa_biokdd_2019_488022_1_5.pdf

Activity: Computational biology