Marco Pellegrini applies computer science techniques to biology, to study how patients’ genes are made and how they work.
Marco Pellegrini, Research Director at the Algorithms and Computational Mathematics research unit of the Institute of Informatics and Telematics (IIT-CNR), studied electronic and computer engineering. For most of his career, he worked in computer science in the classic sense of the term.
“I was studying algorithms and complexity. In the early 2000s, I was doing information retrieval on web pages and videos,” he recalls. Put as simply as possible, people who work in information retrieval manage information (for example, text) to help users find what they are looking for.
“At a certain point, however, I realized that those same techniques could also be applied in other areas”, says Pellegrini. In particular, on the occasion of a conference in Brussels, Pellegrini realized that some problems of bioinformatics were very similar to those of the web. “In every cell of our body there are about 25,000 genes expressed, an avalanche of data. Some of these genes have characteristics in common. For example, they contribute to the same function”. Computer scientists speak of clustering to describe the grouping of data that have some similarity. And that is exactly what biologists usually do when, in order to study how genes work, they put together those that do the same thing.
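To make the idea of clustering concrete, here is a minimal sketch in Python. It is not the method used by Pellegrini's group: the gene names, the expression values and the 0.9 correlation threshold are all invented for illustration, and the greedy grouping rule is just one simple way to put together genes that behave alike across samples.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def cluster_genes(profiles, threshold=0.9):
    """Greedy grouping: a gene joins the first cluster whose founding gene
    it correlates with at or above `threshold`; otherwise it starts a new one."""
    clusters = []  # each cluster is a list of gene names
    for gene, values in profiles.items():
        for cluster in clusters:
            if pearson(values, profiles[cluster[0]]) >= threshold:
                cluster.append(gene)
                break
        else:
            clusters.append([gene])
    return clusters

# Hypothetical expression levels of four made-up genes across five samples.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 4.0, 6.2, 8.1, 9.9],
    "geneC": [5.0, 4.0, 3.0, 2.0, 1.0],
    "geneD": [9.8, 8.1, 6.0, 4.2, 2.0],
}
clusters = cluster_genes(profiles)
print(clusters)
```

In this toy data the two genes that rise together across samples end up in one group and the two that fall together in another, which is the same kind of grouping the text describes for genes that contribute to the same function.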
In the field of bioinformatics, Pellegrini collaborates with the CNR Institute of Clinical Physiology. “We start from real patient data to improve therapies to fight prostate cancer”.
The research focuses on a drug widely used against this type of cancer, which has one limitation: after some time it stops working. Doctors speak of “resistance” to the drug to describe this loss of effectiveness. “The question we are trying to answer is: can you understand in advance if you are about to develop resistance, in order to suspend taking the drug in time?”, explains Pellegrini.
We are in the field of transcriptomics, the discipline that studies the behavior of genes. The experiments involve taking biological material (vials of blood), which is then loaded into machines that perform sequencing. In practice, the machine returns a series of tables describing how the genes are working. By analyzing these data, computer scientists are able to understand which genes are involved in resistance, distinguishing them from the others. “We make a prediction, then it’s up to the biologists to validate the results through clinical studies. But very often we get it right”, he smiles.
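As a rough illustration of what analyzing such a table can mean, the Python sketch below ranks genes by how strongly their expression differs between two groups of patients. Everything in it is an assumption made for the example: the table layout, the patient and gene names, the numbers, and the choice of Welch's t-statistic as the score. The article does not say which statistics the research group actually uses.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t-statistic: difference of means scaled by its standard error."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se else 0.0

def rank_genes(table, resistant, responsive):
    """Order genes from most to least different between the two groups."""
    scores = {}
    for gene, by_patient in table.items():
        r = [by_patient[p] for p in resistant]
        s = [by_patient[p] for p in responsive]
        scores[gene] = welch_t(r, s)
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)

# Hypothetical expression table: gene -> {patient -> expression level}.
table = {
    "gene1": {"p1": 8.1, "p2": 7.9, "p3": 8.3, "p4": 2.0, "p5": 2.2, "p6": 1.9},
    "gene2": {"p1": 5.0, "p2": 5.2, "p3": 4.9, "p4": 5.1, "p5": 4.8, "p6": 5.3},
    "gene3": {"p1": 3.0, "p2": 3.4, "p3": 2.9, "p4": 4.1, "p5": 4.0, "p6": 4.4},
}
resistant = ["p1", "p2", "p3"]    # assumed to have developed resistance
responsive = ["p4", "p5", "p6"]   # assumed to still respond to the drug

ranking = rank_genes(table, resistant, responsive)
print(ranking)
```

A score like this only produces candidates. With thousands of genes and noisy measurements, a real analysis would also need to correct for multiple testing, and, as the text says, the prediction still has to be validated by biologists.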
To reach concrete results, one must arm oneself with patience, because validation is a long process, made up of continuous exchanges between researchers. All this without losing sight of the fact that, against certain diseases, every result matters, even one that applies to only a very small percentage of cases.
Then there is the theme of error. In the analysis techniques developed by the research group, statistics play a leading role. “When we limited ourselves to studying the web, the measurement errors were practically negligible. If you find the word ball in the text, it is very likely that you are talking about football there”, explains Pellegrini. “In the human body, however, things are different. There is a lot of noise that cannot be ignored”.
How is the gene made?
Pellegrini and his colleagues do not study only how genes function. “In transcriptomics, one can see the functioning of genes, like observing an engine. But the dynamics are linked to genomics, which instead describes how genes are made”.
In practice, we take a step back, trying to understand whether the gene is there or not, or whether there are variations that prevent it from functioning properly. For example, in the case of tumors, significant variations indicate that a certain genetic function has been altered. “And so that’s where the drug will have to go and act,” explains Pellegrini.
This fascinating challenge is in some ways more difficult. There are 3 billion bases in genomics, compared to 25,000 genes in transcriptomics. “There is a data management problem. In the midst of so much information, finding what I’m looking for is like looking for a needle in a haystack”, he explains.
Many previous and ongoing projects have made it possible to understand more about the mechanisms of very different diseases, such as ALS or Parkinson’s disease.