Search | BVS Regional Portal

1.

Label-free cell classification in holographic flow cytometry through an unbiased learning strategy.

Ciaparrone, Gioele; Pirone, Daniele; Fiore, Pierpaolo; Xin, Lu; Xiao, Wen; Li, Xiaoping; Bardozzo, Francesco; Bianco, Vittorio; Miccio, Lisa; Pan, Feng; Memmolo, Pasquale; Tagliaferri, Roberto; Ferraro, Pietro.

Lab Chip ; 24(4): 924-932, 2024 02 13.

Article in English | MEDLINE | ID: mdl-38264771

ABSTRACT

Nowadays, label-free imaging flow cytometry at the single-cell level is considered the next step forward in lab-on-a-chip technology for addressing challenges in clinical diagnostics, biology, life sciences and healthcare. In this framework, digital holography in microscopy promises to be a powerful imaging modality thanks to its multi-refocusing and label-free quantitative phase imaging capabilities, along with the encoding of the highest information content within the imaged samples. Moreover, recent advances in deep/machine learning tools for cell classification, combined with holographic imaging, are pushing these systems toward the effective implementation of point-of-care devices. However, the generalization capabilities of learning-based models may be limited by biases caused by data obtained from other holographic imaging settings and/or different processing approaches. In this paper, we propose a combination of three components: a Mask R-CNN to detect the cells; a convolutional auto-encoder, used for image feature extraction and operating on unlabelled data, thus overcoming the bias due to data coming from different experimental settings; and a feedforward neural network for single-cell classification that operates on the extracted features. We demonstrate the proposed approach on the challenging classification task of identifying drug-resistant endometrial cancer cells.

Subject(s)

Algorithms , Holography , Flow Cytometry , Image Processing, Computer-Assisted/methods , Microscopy , Holography/methods

2.

An automated pipeline integrating AlphaFold 2 and MODELLER for protein structure prediction.

Gil Zuluaga, Fabio Hernan; D'Arminio, Nancy; Bardozzo, Francesco; Tagliaferri, Roberto; Marabotti, Anna.

Comput Struct Biotechnol J ; 21: 5620-5629, 2023.

Article in English | MEDLINE | ID: mdl-38047234

ABSTRACT

The ability to predict a protein's three-dimensional conformation is a crucial starting point for investigating evolutionary connections with other members of the corresponding protein family, examining interactions with other proteins, and potentially using this knowledge for rational drug design. In this work, we evaluated the feasibility of improving AlphaFold2's three-dimensional protein predictions by developing a novel pipeline (AlphaMod) that combines AlphaFold2 with MODELLER, a template-based modeling program. Additionally, our tool can drive a comprehensive quality assessment of the tertiary protein structure by incorporating and comparing a set of different quality assessment tools. The outcomes of selected tools are combined into a composite score (BORDASCORE) that exhibits a meaningful correlation with GDT_TS and facilitates the selection of optimal models in the absence of a reference structure. To validate AlphaMod's results, we conducted evaluations on two distinct datasets totalling 72 targets, previously used to independently assess AlphaFold2's performance. The generated models were evaluated in two ways: (i) averaging the GDT_TS scores across all structures produced for a single target sequence, and (ii) a pairwise comparison of the best structures generated by AlphaFold2 and AlphaMod. The latter, within the unsupervised setups, shows an accuracy improvement of approximately 34% over AlphaFold2, whereas in the supervised setup AlphaMod surpasses AlphaFold2 in 18% of the instances. Finally, there is an 11% correspondence in outcomes between the different methodologies. Consequently, in several cases AlphaMod's best-predicted tertiary structures exhibited a significant improvement in prediction accuracy with respect to the best models obtained by AlphaFold2. This pipeline paves the way for the integration of additional data and AI-based algorithms to further improve the reliability of the predictions.
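The abstract says only that BORDASCORE combines the outcomes of several quality-assessment tools. It does not give the exact formula. The sketch below assumes a classic Borda count: each tool ranks the candidate models, and the model at position r out of n earns n - 1 - r points. The tool names and scores are hypothetical, and every tool is assumed to follow a higher-is-better convention.

```python
from typing import Dict


def borda_aggregate(scores_by_tool: Dict[str, Dict[str, float]]) -> Dict[str, int]:
    """Combine per-tool quality scores into Borda points (higher score = better model)."""
    totals: Dict[str, int] = {}
    for tool_scores in scores_by_tool.values():
        # Rank the candidate models for this tool from best to worst.
        ranked = sorted(tool_scores, key=tool_scores.get, reverse=True)
        n = len(ranked)
        for position, model in enumerate(ranked):
            # The best model earns n - 1 points, the worst earns 0.
            totals[model] = totals.get(model, 0) + (n - 1 - position)
    return totals


# Hypothetical scores from three quality-assessment tools for three candidate models.
scores = {
    "tool_a": {"m1": 0.71, "m2": 0.65, "m3": 0.80},
    "tool_b": {"m1": 0.90, "m2": 0.40, "m3": 0.85},
    "tool_c": {"m1": 0.55, "m2": 0.60, "m3": 0.70},
}
totals = borda_aggregate(scores)
best_model = max(totals, key=totals.get)
print(totals, best_model)
```

With these toy scores, m3 collects the most points and would be selected. A real implementation would also need to invert lower-is-better scores and decide how to break ties.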

3.

Machine Learning as a Support for the Diagnosis of Type 2 Diabetes.

Agliata, Antonio; Giordano, Deborah; Bardozzo, Francesco; Bottiglieri, Salvatore; Facchiano, Angelo; Tagliaferri, Roberto.

Int J Mol Sci ; 24(7)2023 Apr 05.

Article in English | MEDLINE | ID: mdl-37047748

ABSTRACT

Diabetes is a chronic metabolic disease characterized by high blood sugar levels. Among the main types of diabetes, type 2 is the most common. Early diagnosis and treatment can prevent or delay the onset of complications. Previous studies examined the application of machine learning techniques to predicting the pathology, and here an artificial neural network shows very promising results as a possible valuable aid in the management and prevention of diabetes. Additionally, its superior ability for long-term predictions makes it an ideal choice for this field of study. We used machine learning methods to uncover previously undiscovered associations between an individual's health status and the development of type 2 diabetes, with the goal of accurately predicting its onset or determining the individual's risk level. Our study employed a binary classifier, trained from scratch, to identify potential nonlinear relationships between the onset of type 2 diabetes and a set of parameters obtained from patient measurements. Three datasets were used: the National Center for Health Statistics' biennial survey (NHANES), MIMIC-III and MIMIC-IV. These datasets were then combined into a single dataset with the same number of individuals with and without type 2 diabetes. Since the dataset was balanced, the primary evaluation metric for the model was accuracy. The outcomes of this study were encouraging, with the model achieving an accuracy of up to 86% and a ROC AUC of 0.934. Further investigation is needed to improve the reliability of the model by considering multiple measurements from the same patient over time.

Subject(s)

Diabetes Mellitus, Type 2 , Humans , Diabetes Mellitus, Type 2/diagnosis , Nutrition Surveys , Reproducibility of Results , Machine Learning , Neural Networks, Computer
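The accuracy and ROC AUC figures reported above are standard binary-classification metrics. The sketch below uses hypothetical labels and predicted probabilities. It computes accuracy at an assumed 0.5 decision threshold. It computes ROC AUC as the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counting one half. It is an illustration of the metrics, not the authors' evaluation code.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], probs: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of correct predictions when probs >= threshold are called positive."""
    correct = sum(int((p >= threshold) == bool(y)) for y, p in zip(labels, probs))
    return correct / len(labels)


def roc_auc(labels: Sequence[int], probs: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outranks a random negative."""
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical balanced toy data: three cases with type 2 diabetes (1), three without (0).
labels = [1, 1, 1, 0, 0, 0]
probs = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]
print(accuracy(labels, probs), roc_auc(labels, probs))
```

The pairwise AUC loop is quadratic in the number of samples. For large datasets, a rank-based (Mann-Whitney) formulation is preferable.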

4.

Investigating the Effects of Amino Acid Variations in Human Menin.

Biancaniello, Carmen; D'Argenio, Antonia; Giordano, Deborah; Dotolo, Serena; Scafuri, Bernardina; Marabotti, Anna; d'Acierno, Antonio; Tagliaferri, Roberto; Facchiano, Angelo.

Molecules ; 27(5)2022 Mar 07.

Article in English | MEDLINE | ID: mdl-35268848

ABSTRACT

Human menin is a nuclear protein that participates in many cellular processes, such as transcriptional regulation, DNA damage repair, cell signaling, cell division, proliferation, and migration, by interacting with many other proteins. Mutations of the gene encoding menin cause multiple endocrine neoplasia type 1 (MEN1), a rare autosomal dominant disorder associated with tumors of the endocrine glands. To characterize the structural and functional effects, at the protein level, of the hundreds of missense variations, we used computational methods to investigate wild-type menin and more than 200 variants. We predicted which amino acid variations change secondary structure, solvent accessibility, salt-bridge and H-bond interactions, or protein thermostability, or alter the capability to bind known protein interactors. The structural analyses are freely accessible online through a web interface that also integrates a 3D visualization of the wild-type and variant protein structures. The results of the study offer insight into the effects of the amino acid variations, toward a more complete understanding of their pathological role.

Subject(s)

Amino Acids

5.

StaSiS-Net: A stacked and siamese disparity estimation network for depth reconstruction in modern 3D laparoscopy.

Bardozzo, Francesco; Collins, Toby; Forgione, Antonello; Hostettler, Alexandre; Tagliaferri, Roberto.

Med Image Anal ; 77: 102380, 2022 04.

Article in English | MEDLINE | ID: mdl-35139482

ABSTRACT

Developing accurate, real-time algorithms for non-invasive three-dimensional representation and reconstruction of internal patient structures is one of the main research fields in computer-assisted surgery and endoscopy. Mono and stereo endoscopic images of soft tissues are converted into a three-dimensional representation by estimating depth maps. However, automatic, detailed, accurate and robust depth map estimation is a challenging problem that, in the stereo setting, strictly depends on a robust estimate of the disparity map. Many traditional algorithms are inefficient or inaccurate. In this work, novel self-supervised stacked and Siamese encoder/decoder neural networks are proposed to compute accurate disparity maps for 3D laparoscopy depth estimation. These networks run in real time on standard GPU-equipped desktop computers, and their outputs can be used for depth map estimation given a known camera calibration. We compare performance on three different public datasets and on a new challenging simulated dataset, and our solutions outperform state-of-the-art mono and stereo depth estimation methods. Extensive robustness and sensitivity analyses have been performed on more than 30000 frames. This work leads to important improvements in mono and stereo real-time depth map estimation of soft tissues and organs, with a very low average mean absolute disparity reconstruction error with respect to ground truth.

Subject(s)

Laparoscopy , Surgery, Computer-Assisted , Algorithms , Humans , Imaging, Three-Dimensional/methods , Neural Networks, Computer , Surgery, Computer-Assisted/methods
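The abstract notes that predicted disparity maps can be converted into depth maps using a known camera calibration. For a rectified stereo pair under the pinhole camera model, the standard relation is Z = f * B / d. Here f is the focal length in pixels, B is the baseline, and d is the disparity in pixels. The sketch below applies that relation and computes a mean absolute disparity error. The calibration values and maps are hypothetical, and the StaSiS-Net networks themselves are not reproduced.

```python
from typing import List, Optional

Grid = List[List[float]]


def disparity_to_depth(disparity: Grid, focal_px: float, baseline_mm: float) -> List[List[Optional[float]]]:
    """Depth Z = f * B / d for a rectified stereo pair; pixels with d <= 0 get None."""
    return [
        [focal_px * baseline_mm / d if d > 0 else None for d in row]
        for row in disparity
    ]


def mean_abs_disparity_error(pred: Grid, truth: Grid) -> float:
    """Mean absolute difference between predicted and ground-truth disparities."""
    diffs = [abs(p - t) for prow, trow in zip(pred, truth) for p, t in zip(prow, trow)]
    return sum(diffs) / len(diffs)


# Hypothetical 2x2 disparity maps (pixels) and calibration values.
predicted = [[20.0, 25.0], [40.0, 0.0]]
ground_truth = [[21.0, 25.0], [38.0, 1.0]]
depth_mm = disparity_to_depth(predicted, focal_px=1000.0, baseline_mm=5.0)
error = mean_abs_disparity_error(predicted, ground_truth)
print(depth_mm, error)
```

Depth is inversely proportional to disparity, so a fixed disparity error costs more depth accuracy for distant tissue than for nearby tissue.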

6.

Blind microscopy image denoising with a deep residual and multiscale encoder/decoder network.

Gil Zuluaga, Fabio Hernan; Bardozzo, Francesco; Rios Patino, Jorge Ivan; Tagliaferri, Roberto.

Annu Int Conf IEEE Eng Med Biol Soc ; 2021: 3483-3486, 2021 11.

Article in English | MEDLINE | ID: mdl-34891990

ABSTRACT

In computer-aided diagnosis (CAD) focused on microscopy, denoising improves the quality of image analysis. In general, the accuracy of this process may depend both on the experience of the microscopist and on the sensitivity and specificity of the equipment. A medical image can be corrupted by several perturbations during image acquisition. Nowadays, CAD deep learning applications pre-process images with denoising models to reinforce learning and prediction. In this work, an innovative and lightweight deep multiscale convolutional encoder-decoder neural network is proposed. Specifically, the encoder uses a deterministic mapping to map features into a hidden representation. The latent representation is then decoded to generate the reconstructed denoised image. Residual learning strategies, in the form of skip connections bridging convolutional and deconvolutional layers, are used to improve and accelerate training. The proposed model reaches an average PSNR of 38.38 and an average SSIM of 0.98 on a test set of 57458 images, outperforming state-of-the-art models in the same application domain. Clinical relevance: an encoder-decoder based denoiser enables industry experts to provide more accurate and reliable medical interpretation and diagnosis in a variety of fields, from microscopy to surgery, with the benefit of real-time processing.

Subject(s)

Microscopy , Neural Networks, Computer , Diagnosis, Computer-Assisted , Image Processing, Computer-Assisted , Sensitivity and Specificity
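PSNR is one of the two metrics reported above. It compares a denoised image with its clean reference through the mean squared error: PSNR = 10 log10(MAX^2 / MSE). The sketch below assumes 8-bit images (MAX = 255) flattened into pixel lists. The pixel values are hypothetical, and SSIM is not shown.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], denoised: Sequence[float], max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized images given as flat pixel lists."""
    if len(reference) != len(denoised):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - d) ** 2 for r, d in zip(reference, denoised)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: no noise left
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical 2x2 reference patch and denoised output, flattened row by row.
ref = [52, 55, 61, 59]
den = [50, 55, 63, 59]
print(round(psnr(ref, den), 2))
```

Because PSNR is logarithmic, halving the MSE adds about 3 dB. This is why differences of a few tenths of a dB between denoisers can still be meaningful.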

7.

A multiple network-based bioinformatics pipeline for the study of molecular mechanisms in oncological diseases for personalized medicine.

Dotolo, Serena; Marabotti, Anna; Rachiglio, Anna Maria; Esposito Abate, Riziero; Benedetto, Marco; Ciardiello, Fortunato; De Luca, Antonella; Normanno, Nicola; Facchiano, Angelo; Tagliaferri, Roberto.

Brief Bioinform ; 22(6)2021 11 05.

Article in English | MEDLINE | ID: mdl-34050359

ABSTRACT

MOTIVATION: Assessment of genetic mutations is an essential element in the modern era of personalized cancer treatment. Our strategy focuses on 'multiple network analysis', in which we try to improve cancer diagnostics by using biological networks. Genetic alterations in some important hubs or in driver genes such as BRAF and TP53 play a critical role in regulating many important molecular processes. Most studies focus on the effects of single mutations, whereas tumors often carry mutations in multiple driver genes. The aim of this work is to define an innovative bioinformatics pipeline focused on the design and analysis of networks (such as biomedical and molecular networks), in order to: (1) improve disease diagnosis; (2) identify the patients who could respond better to a given drug treatment; and (3) predict the primary and secondary effects of gene mutations involved in human diseases. RESULTS: Using our pipeline based on a multiple network approach, it has been possible to demonstrate and validate the joint effects and molecular profile changes that occur in patients with metastatic colorectal carcinoma (mCRC) carrying mutations in multiple genes. In this way, we can identify the most suitable drugs for the individual patient's therapy. This information is useful for improving precision medicine in cancer patients. As an application of our pipeline, we considered clinically significant case studies from a cohort of mCRC patients carrying the combined BRAF V600E-TP53 I195N missense mutation. AVAILABILITY: The procedures used in this paper are part of the Cytoscape Core, available at www.cytoscape.org. Data used here on mCRC patients have been published in [55]. SUPPLEMENTARY INFORMATION: A supplementary file containing a more detailed discussion of this case study and other cases is available at the journal site as Supplementary Data.

Subject(s)

Biomarkers, Tumor , Computational Biology/methods , Disease Susceptibility , Neoplasms/etiology , Precision Medicine/methods , Gene Regulatory Networks , Humans , Metabolic Networks and Pathways , Neoplasms/metabolism , Protein Interaction Maps , Signal Transduction

8.

A review on drug repurposing applicable to COVID-19.

Dotolo, Serena; Marabotti, Anna; Facchiano, Angelo; Tagliaferri, Roberto.

Brief Bioinform ; 22(2): 726-741, 2021 03 22.

Article in English | MEDLINE | ID: mdl-33147623

ABSTRACT

Drug repurposing involves identifying new applications for existing drugs at a lower cost and in a shorter time. There are different computational drug-repurposing strategies, and some of these approaches have been applied to the coronavirus disease 2019 (COVID-19) pandemic. Computational drug-repositioning approaches applied to COVID-19 can be broadly categorized into (i) network-based models, (ii) structure-based approaches and (iii) artificial intelligence (AI) approaches. Network-based approaches are divided into two categories: network-based clustering approaches and network-based propagation approaches. Both have made it possible to annotate some important patterns, to identify proteins that are functionally associated with COVID-19, and to discover novel drug-disease or drug-target relationships useful for new therapies. Structure-based approaches have made it possible to identify small chemical compounds able to bind macromolecular targets and to evaluate how a chemical compound can interact with its biological counterpart, in an effort to find new applications for existing drugs. AI-based networks currently appear less relevant, since they need more data to be applied.

Subject(s)

Antiviral Agents/therapeutic use , COVID-19 Drug Treatment , Drug Repositioning , SARS-CoV-2/isolation & purification , COVID-19/virology , Humans , Molecular Docking Simulation
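Network-based propagation is one of the categories surveyed above. It spreads scores from seed nodes, for example human proteins targeted by the virus, across an interaction network. A common instance is random walk with restart. The sketch below assumes an unweighted, undirected toy network with hypothetical node names and a restart probability of 0.3. It is not taken from any specific study in the review.

```python
from typing import Dict, List


def random_walk_with_restart(
    adj: Dict[str, List[str]],
    seeds: List[str],
    restart: float = 0.3,
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> Dict[str, float]:
    """Iterate p <- (1 - restart) * W p + restart * p0 until the L1 change is below tol.

    W moves each node's score evenly to its neighbours; p0 is uniform over the seeds.
    """
    nodes = list(adj)
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {v: restart * p0[v] for v in nodes}
        for u in nodes:
            if adj[u]:
                share = (1.0 - restart) * p[u] / len(adj[u])
                for v in adj[u]:
                    nxt[v] += share
        delta = sum(abs(nxt[v] - p[v]) for v in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical toy interaction chain: viral_target - P1 - P2 - P3.
network = {
    "viral_target": ["P1"],
    "P1": ["viral_target", "P2"],
    "P2": ["P1", "P3"],
    "P3": ["P2"],
}
scores = random_walk_with_restart(network, seeds=["viral_target"])
for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{node}: {score:.3f}")
```

Nodes closer to the seed end up with higher scores. This is the intuition behind prioritizing proteins or drug targets by propagation. In this simple version, nodes without neighbours would leak probability mass.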

9.

Signal metrics analysis of oscillatory patterns in bacterial multi-omic networks.

Bardozzo, Francesco; Lió, Pietro; Tagliaferri, Roberto.

Bioinformatics ; 37(10): 1411-1419, 2021 06 16.

Article in English | MEDLINE | ID: mdl-33185666

ABSTRACT

MOTIVATION: One of the branches of Systems Biology focuses on a deep understanding of underlying regulatory networks through the analysis of biomolecule oscillations and their interplay. Synthetic Biology exploits gene and/or protein regulatory networks to design oscillatory networks for producing useful compounds. Therefore, at different levels of application and for different purposes, the study of biomolecular oscillations can yield different clues about the mechanisms underlying living cells. It is known that network-level interactions involve more than one type of biomolecule, as well as biological processes operating at multiple omic levels. By combining network/pathway-level information with genetic information, it is possible to describe well-understood or unknown bacterial mechanisms and organism-specific dynamics. RESULTS: Following the methodologies used in signal processing and communication engineering, a methodology is introduced to identify and quantify the extent of multi-omic oscillations. These oscillations arise from the process of multi-omic integration and depend on the gene positions on the chromosome. Ad hoc signal metrics are designed to allow further biotechnological explanations and provide important clues about the oscillatory nature of the pathways and their regulatory circuits. Our algorithms for the analysis of multi-omic signals are tested and validated on 11 different bacteria, for thousands of multi-omic signals perturbed at the network level by different experimental conditions. Information on gene order, codon usage, gene expression and protein molecular weight is integrated at three different functional levels. The oscillations provide evidence that network-level multi-omic signals present a synchronized response to perturbations and evolutionary relations along taxa. AVAILABILITY AND IMPLEMENTATION: The algorithms, the code (in R), the tool, the pipeline and the whole dataset of multi-omic signal metrics are available at: https://github.com/lodeguns/Multi-omicSignals. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Algorithms , Benchmarking , Bacteria/genetics , Gene Regulatory Networks , Systems Biology

10.

Strong-Weak Pruning for Brain Network Identification in Connectome-Wide Neuroimaging: Application to Amyotrophic Lateral Sclerosis Disease Stage Characterization.

Serra, Angela; Galdi, Paola; Pesce, Emanuele; Fratello, Michele; Trojsi, Francesca; Tedeschi, Gioacchino; Tagliaferri, Roberto; Esposito, Fabrizio.

Int J Neural Syst ; 29(7): 1950007, 2019 Sep.

Article in English | MEDLINE | ID: mdl-30929575

ABSTRACT

Magnetic resonance imaging allows the acquisition of functional and structural connectivity data from which high-density whole-brain networks can be derived, in order to carry out connectome-wide analyses in normal and clinical populations. Graph theory has been widely applied to investigate the modular structure of brain connections. Centrality measures are used to identify the "hubs" of human connectomes, and community detection methods to delineate subnetworks associated with diverse cognitive and sensorimotor functions. These analyses typically rely on a preprocessing step (pruning) to reduce computational complexity and remove the weakest edges, which are most likely affected by experimental noise. However, weak links may contain relevant information about brain connectivity; therefore, identifying the optimal trade-off between retained and discarded edges is a subject of active research. We introduce a pruning algorithm to identify the edges that carry the highest information content. The algorithm selects both strong edges (i.e. edges belonging to shortest paths) and weak edges that are topologically relevant in weakly connected subnetworks. The newly developed "strong-weak" pruning (SWP) algorithm was validated on simulated networks that mimic the structure of human brain networks. It was then applied to the analysis of a real dataset of subjects affected by amyotrophic lateral sclerosis (ALS), at both the early (ALS2) and late (ALS3) stages of the disease, and of healthy control subjects. SWP preprocessing made it possible to identify statistically significant differences in the path length of networks between patients and healthy subjects. ALS patients showed decreased connectivity from the frontal cortex to the temporal and parietal cortices, and between the temporal and occipital cortices. Moreover, degree of centrality measures revealed significantly different hub and centrality scores between patient subgroups. These findings suggest a widespread alteration of network topology in ALS associated with disease progression.

Subject(s)

Amyotrophic Lateral Sclerosis/diagnostic imaging , Brain/diagnostic imaging , Connectome/methods , Magnetic Resonance Imaging/methods , Nerve Net/diagnostic imaging , Neuronal Plasticity/physiology , Amyotrophic Lateral Sclerosis/physiopathology , Brain/physiopathology , Humans , Image Processing, Computer-Assisted/methods , Nerve Net/physiopathology , Neuroimaging/methods
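The "strong" half of SWP keeps edges that belong to shortest paths. The sketch below illustrates that criterion only. It assumes edge weights are already expressed as distances, for example inverse connection strengths. With positive weights, an edge (u, v) lies on some shortest path exactly when dist(s, u) + w(u, v) = dist(s, v) for some source s. The selection of weak but topologically relevant edges is not reproduced, and the toy graph is hypothetical.

```python
import heapq
from typing import Dict, Set, Tuple

# Undirected graph as adjacency dict; weights are path lengths (distances), assumed positive.
Graph = Dict[str, Dict[str, float]]


def dijkstra(graph: Graph, source: str) -> Dict[str, float]:
    """Shortest-path distances from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def shortest_path_edges(graph: Graph, tol: float = 1e-12) -> Set[Tuple[str, ...]]:
    """Edges lying on at least one shortest path between some pair of nodes."""
    edges = set()
    for s in graph:
        dist = dijkstra(graph, s)
        for u in dist:
            for v, w in graph[u].items():
                if v in dist and abs(dist[u] + w - dist[v]) <= tol:
                    edges.add(tuple(sorted((u, v))))
    return edges


# Hypothetical triangle: the long A-C edge is never part of a shortest path.
toy = {
    "A": {"B": 1.0, "C": 5.0},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"A": 5.0, "B": 1.0},
}
strong = shortest_path_edges(toy)
print(sorted(strong))
```

Running Dijkstra from every node costs roughly O(N (E + N) log N). This is acceptable for parcel-level connectomes but would need care for voxel-level networks.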

11.

INSIdE NANO: a systems biology framework to contextualize the mechanism-of-action of engineered nanomaterials.

Serra, Angela; Letunic, Ivica; Fortino, Vittorio; Handy, Richard D; Fadeel, Bengt; Tagliaferri, Roberto; Greco, Dario.

Sci Rep ; 9(1): 179, 2019 01 17.

Article in English | MEDLINE | ID: mdl-30655578

ABSTRACT

Engineered nanomaterials (ENMs) are widely present in our daily lives. Despite efforts to characterize their mechanism of action in multiple species, their possible implications in human pathologies are still not fully understood. Here we performed an integrated analysis of the effects of ENMs on human health by contextualizing their transcriptional mechanism-of-action with respect to drugs, chemicals and diseases. We built a network of interactions among over 3,000 biological entities and developed a novel computational tool, INSIdE NANO, to infer new knowledge about ENM behavior. We highlight a striking association between metal and metal-oxide nanoparticles and major neurodegenerative disorders. Our novel strategy opens possibilities for fast and accurate read-across evaluation of ENMs and other chemicals based on their biosignatures.

12.

Stochastic Rank Aggregation for the Identification of Functional Neuromarkers.

Galdi, Paola; Fratello, Michele; Trojsi, Francesca; Russo, Antonio; Tedeschi, Gioacchino; Tagliaferri, Roberto; Esposito, Fabrizio.

Neuroinformatics ; 17(4): 479-496, 2019 10.

Article in English | MEDLINE | ID: mdl-30604083

ABSTRACT

The main challenge in analysing functional magnetic resonance imaging (fMRI) data from extended samples of subjects (N > 100) is to extract as much relevant information as possible from large amounts of noisy data. When studying neurodegenerative diseases with resting-state fMRI, one of the objectives is to determine regions with abnormal background activity with respect to a healthy brain. This is often achieved with comparative statistical models applied to single voxels or brain parcels within one or several functional networks. In this work, we propose a novel approach based on clustering and stochastic rank aggregation to identify parcels that exhibit coherent behaviour in groups of subjects affected by the same disorder. We apply it to default-mode network independent component maps from resting-state fMRI data sets. Brain voxels are partitioned into parcels through k-means clustering, and the solutions are then enhanced by means of consensus techniques. For each subject, clusters are ranked according to their median value, and a stochastic rank aggregation method, TopKLists, is applied to combine the individual rankings within each class of subjects. For comparison, the same approach was tested on an anatomical parcellation. We found parcels whose rankings differed between control subjects and subjects affected by Parkinson's disease and amyotrophic lateral sclerosis, and found evidence in the literature for the relevance of the top-ranked regions in default-mode brain activity. The proposed framework represents a valid method for identifying functional neuromarkers from resting-state fMRI data. It might therefore constitute a step forward in the development of fully automated data-driven techniques to support early diagnosis of neurodegenerative diseases.

Subject(s)

Brain/diagnostic imaging , Magnetic Resonance Imaging/methods , Neurodegenerative Diseases/diagnostic imaging , Adult , Aged , Aged, 80 and over , Brain Mapping/methods , Cluster Analysis , Cohort Studies , Female , Humans , Magnetic Resonance Imaging/statistics & numerical data , Male , Middle Aged , Rest , Stochastic Processes
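The per-subject ranking step described in the abstract can be sketched as follows. Each parcel is ranked by the median of its voxel values, and the individual rankings are then combined. TopKLists is a stochastic aggregation method and is not reproduced here. Purely for illustration, the sketch substitutes a simple mean-rank aggregation. Parcel names and values are hypothetical, and ranking in descending order of the median is an assumption.

```python
from statistics import median
from typing import Dict, List


def rank_parcels(values_by_parcel: Dict[str, List[float]]) -> List[str]:
    """Order parcels from highest to lowest median voxel value for one subject."""
    return sorted(values_by_parcel, key=lambda k: median(values_by_parcel[k]), reverse=True)


def mean_rank_aggregate(rankings: List[List[str]]) -> List[str]:
    """Aggregate rankings by average position (a simple stand-in, not TopKLists)."""
    positions: Dict[str, List[int]] = {}
    for ranking in rankings:
        for pos, parcel in enumerate(ranking):
            positions.setdefault(parcel, []).append(pos)
    return sorted(positions, key=lambda k: sum(positions[k]) / len(positions[k]))


# Hypothetical voxel values per parcel for three subjects of the same class.
subjects = [
    {"p1": [0.2, 0.4, 0.3], "p2": [1.0, 0.9, 1.1], "p3": [0.5, 0.6, 0.7]},
    {"p1": [0.8, 0.7, 0.9], "p2": [1.2, 1.0, 1.1], "p3": [0.1, 0.2, 0.0]},
    {"p1": [0.5], "p2": [0.9], "p3": [0.4]},
]
individual = [rank_parcels(s) for s in subjects]
consensus = mean_rank_aggregate(individual)
print(individual, consensus)
```

Unlike this deterministic average, a stochastic aggregator such as TopKLists can also decide how many top-ranked parcels are reliably shared across subjects. That capability is the reason to use it on noisy fMRI data.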

13.

A study on multi-omic oscillations in Escherichia coli metabolic networks.

Bardozzo, Francesco; Lió, Pietro; Tagliaferri, Roberto.

BMC Bioinformatics ; 19(Suppl 7): 194, 2018 07 09.

Article in English | MEDLINE | ID: mdl-30066640

ABSTRACT

BACKGROUND: Two important challenges in the analysis of molecular biology information are data (multi-omic information) integration and the detection of patterns across large-scale molecular networks and sequences. They are actually coupled, because the integration of omic information may provide better means to detect multi-omic patterns that could reveal multi-scale or emerging properties at the phenotype level. RESULTS: Here we address the problem of integrating various types of molecular information (a large collection of gene expression and sequence data, codon usage and protein abundances) to analyse the E. coli metabolic response to treatments at the whole-network level. Our algorithm, MORA (Multi-omic relations adjacency), is able to detect patterns that may represent metabolic network motifs at the pathway and supra-pathway levels and could hint at some functional role. We describe the algorithm and provide insights into it by testing it on a large database of responses to antibiotics. Along with MORA, a novel model for the analysis of oscillating multi-omics is proposed. Interestingly, the resulting analysis suggests that some motifs reveal recurring oscillation or position-variation patterns in multi-omic metabolic networks. Our framework, implemented in R, provides effective and user-friendly means to design intervention scenarios on real data. By analysing how multi-omics data build up multi-scale phenotypes, the software allows users to compare and test metabolic models, design new pathways or redesign existing metabolic pathways, and validate in silico metabolic models using nearby species. CONCLUSIONS: The integration of multi-omic data reveals that E. coli multi-omic metabolic networks contain position-dependent and recurring patterns, which could provide clues about long-range correlations in the bacterial genome.

Subject(s)

Escherichia coli/metabolism , Metabolic Networks and Pathways , Metabolomics/methods , Algorithms , Escherichia coli/genetics , Genome, Bacterial , Operon/genetics , Phenotype , Software

14.

Robust clustering of noisy high-dimensional gene expression data for patients subtyping.

Coretto, Pietro; Serra, Angela; Tagliaferri, Roberto.

Bioinformatics ; 34(23): 4064-4072, 2018 12 01.

Article in English | MEDLINE | ID: mdl-29939219

ABSTRACT

Motivation: One of the most important research areas in personalized medicine is the discovery of disease subtypes with relevance in clinical applications. This is usually accomplished by exploring gene expression data with unsupervised clustering methodologies. With the advent of multiple omics technologies, data integration methodologies have been further developed to obtain better performance in patient separability. However, these methods do not guarantee the survival separability of patients in different clusters. Results: We propose a new methodology that first computes a robust and sparse correlation matrix of the genes, then decomposes it and projects the patient data onto the first m spectral components of the correlation matrix. After that, a robust, noise-adaptive clustering algorithm is applied. The clustering is set up to optimize the separation between survival curves estimated cluster-wise. The method is able to identify clusters that have different omics signatures and also statistically significant differences in survival time. The proposed methodology is tested on five cancer datasets downloaded from The Cancer Genome Atlas repository. It is compared with the Similarity Network Fusion (SNF) approach and with model-based clustering based on Student's t-distribution (TMIX). Our method obtains better performance in terms of survival separability, even though it uses a single gene expression view, compared with the multi-view approach of the SNF method. Finally, a pathway-based analysis is performed to highlight the biological processes that differentiate the obtained patient groups. Availability and implementation: Our R source code is available online at https://github.com/angy89/RobustClusteringPatientSubtyping. Supplementary information: Supplementary data are available at Bioinformatics online.

Subject(s)

Algorithms , Cluster Analysis , Gene Expression Profiling , Software , Computational Biology , Humans , Neoplasms/genetics , Precision Medicine

15.

Robust and sparse correlation matrix estimation for the analysis of high-dimensional genomics data.

Serra, Angela; Coretto, Pietro; Fratello, Michele; Tagliaferri, Roberto; Stegle, Oliver.

Bioinformatics ; 34(4): 625-634, 2018 02 15.

Article in English | MEDLINE | ID: mdl-29040390

ABSTRACT

Motivation: Microarray technology can be used to study the expression of thousands of genes across a number of different experimental conditions, usually hundreds. The underlying principle is that genes sharing similar expression patterns across different samples can be part of the same co-expression system, or may share the same biological functions. Groups of genes are usually identified by cluster analysis. Clustering methods rely on the similarity matrix between genes. A common choice to measure similarity is to compute the sample correlation matrix. Dimensionality reduction is another popular data analysis task that is also based on covariance/correlation matrix estimates. Unfortunately, covariance/correlation matrix estimation suffers from the intrinsic noise present in high-dimensional data. Sources of noise are sampling variations, the presence of outlying sample units, and the fact that in most cases the number of genes is much larger than the number of units. Results: In this paper, we propose a robust correlation matrix estimator that is regularized based on adaptive thresholding. The resulting method jointly tames the effects of high dimensionality and data contamination. Computations are easy to implement and do not require hand tuning. Both simulated and real data are analyzed. A Monte Carlo experiment shows that the proposed method achieves remarkable performance. Our correlation metric is more robust to outliers than the existing alternatives in two gene expression datasets. It is also shown how the regularization makes it possible to automatically detect and filter spurious correlations. The same regularization is also extended to other, less robust correlation measures. Finally, we apply the ARACNE algorithm to the SyNTreN gene expression data. The sensitivity and specificity of the reconstructed network are compared with the gold standard. We show that ARACNE performs better when it takes the proposed correlation matrix estimator as input. Availability and implementation: The R software is available at https://github.com/angy89/RobustSparseCorrelation. Contact: aserra@unisa.it or robtag@unisa.it. Supplementary information: Supplementary data are available at Bioinformatics online.

Subject(s)

Cluster Analysis , Gene Expression Profiling/methods , Software , Algorithms , Humans , Neoplasms/genetics , Sensitivity and Specificity , Sequence Analysis, RNA/methods
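The abstract above describes a correlation estimator that is made robust and then regularized by thresholding. The exact estimator is not given here, so the following is only a simplified sketch of the general idea: it swaps in Spearman rank correlation as a stand-in for the paper's robust estimator, and applies a single universal soft threshold lam = c * sqrt(log(p) / n) rather than the entry-wise adaptive thresholds the paper refers to. The function names and the constant `c` are hypothetical.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def pearson(x, y):
    """Pearson correlation; assumes neither vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def thresholded_spearman(data, c=1.0):
    """Soft-thresholded Spearman correlation matrix (hypothetical sketch).

    data: p variables (genes), each a list of the same n sample values.
    Off-diagonal entries with |r| <= lam become zero; larger ones are
    shrunk toward zero by lam, where lam = c * sqrt(log(p) / n).
    """
    p, n = len(data), len(data[0])
    rk = [ranks(v) for v in data]  # Spearman = Pearson on ranks
    lam = c * math.sqrt(math.log(p) / n)
    R = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    for i in range(p):
        for j in range(i + 1, p):
            r = pearson(rk[i], rk[j])
            R[i][j] = R[j][i] = math.copysign(max(abs(r) - lam, 0.0), r)
    return R
```

With an outlier in one gene (for example `[2, 4, 6, 8, 10, 12, 14, 100]`), its rank correlation with `[1, 2, ..., 8]` is still exactly 1 before shrinkage, which is the kind of robustness a plain Pearson estimate lacks; weak correlations are zeroed out by the threshold.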

16.

E2FM: an encrypted and compressed full-text index for collections of genomic sequences.

Montecuollo, Ferdinando; Schmid, Giovanni; Tagliaferri, Roberto.

Bioinformatics ; 33(18): 2808-2817, 2017 Sep 15.

Article in English | MEDLINE | ID: mdl-28498928

ABSTRACT

MOTIVATION: Next Generation Sequencing (NGS) platforms and, more generally, high-throughput technologies are giving rise to exponential growth in the size of nucleotide sequence databases. Moreover, many emerging applications of nucleotide datasets, such as those related to personalized medicine, require compliance with regulations on the storage and processing of sensitive data. RESULTS: We have designed and carefully engineered the E2FM-index, a new full-text index in minute space optimized for compressing and encrypting nucleotide sequence collections in FASTA format and for performing fast pattern-search queries. The E2FM-index builds self-indexes that occupy as little as 1/20 of the storage required by the input FASTA file, saving about 95% of storage when indexing collections of highly similar sequences; moreover, it can perform exact pattern searches on the built indexes in times ranging from a few milliseconds to a few hundred milliseconds, depending on pattern length. AVAILABILITY AND IMPLEMENTATION: Source code is available at https://github.com/montecuollo/E2FM . CONTACT: ferdinando.montecuollo@unicampania.it. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Databases, Nucleic Acid , Genome, Human , High-Throughput Nucleotide Sequencing/methods , Software , Algorithms , Chromosomes, Human, Pair 11 , Genomics/methods , Humans , Sequence Analysis, DNA/methods , Sequence Analysis, RNA/methods
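The E2FM-index described above builds on the FM-index ("full-text index in minute space"). As a rough illustration of the pattern-search side only, here is a plain FM-index with backward search that counts exact occurrences of a pattern. It has no compression and no encryption, so it does not reflect E2FM's actual scheme; the naive rotation-sorting Burrows-Wheeler transform is quadratic in memory and only suitable for toy inputs, and all function names are hypothetical.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations; '$' is the sentinel."""
    text += "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(r[-1] for r in rotations)


def build_fm(text):
    """Return (C, occ, n): the tables backward search needs."""
    L = bwt(text)
    alphabet = sorted(set(L))
    # C[c] = number of characters in L that are strictly smaller than c.
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += L.count(c)
    # occ[c][i] = number of occurrences of c in L[:i].
    occ = {c: [0] * (len(L) + 1) for c in alphabet}
    for i, ch in enumerate(L):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return C, occ, len(L)


def count(pattern, fm):
    """Count exact occurrences of pattern by scanning it right to left."""
    C, occ, n = fm
    lo, hi = 0, n  # current range of sorted rotations [lo, hi)
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo
```

For example, `count("ACG", build_fm("ACGTACGTAC"))` returns 2. The search touches only the `C` and `occ` tables, never the original text, which is what lets FM-style self-indexes replace the sequence file they index.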

17.

Multi-View Ensemble Classification of Brain Connectivity Images for Neurodegeneration Type Discrimination.

Fratello, Michele; Caiazzo, Giuseppina; Trojsi, Francesca; Russo, Antonio; Tedeschi, Gioacchino; Tagliaferri, Roberto; Esposito, Fabrizio.

Neuroinformatics ; 15(2): 199-213, 2017 Apr.

Article in English | MEDLINE | ID: mdl-28210983

ABSTRACT

Brain connectivity analyses using voxels as features are not robust enough for single-patient classification because of inter-subject anatomical and functional variability. To construct more robust features, voxels can be aggregated into clusters that are maximally coherent across subjects. Moreover, combining multi-modal neuroimaging with multi-view data integration techniques makes it possible to generate multiple independent connectivity features for the same patient. Structural and functional connectivity features were extracted from multi-modal MRI images with a clustering technique and used for the multi-view classification of different phenotypes of neurodegeneration by an ensemble learning method (random forest). Two different multi-view models (intermediate and late data integration) were trained on, and tested for the classification of, individual whole-brain default-mode network (DMN) and fractional anisotropy (FA) maps from 41 amyotrophic lateral sclerosis (ALS) patients, 37 Parkinson's disease (PD) patients and 43 healthy control (HC) subjects. Both multi-view data models exhibited ensemble classification accuracies significantly above chance. In ALS patients, multi-view models exhibited the best performance (intermediate: 82.9%, late: 80.5% correct classification) and were more discriminative than each single-view model. In PD patients and controls, the multi-view models' performance was lower (PD: 59.5%, 62.2%; HC: 56.8%, 59.1%) but higher than that of at least one single-view model. Training the models only on patients resulted in more than 85% of patients being correctly discriminated as ALS or PD type, and in maximal performance for the multi-view models. These results highlight the potential of mining complementary information from the integration of multiple data views in the classification of connectivity patterns from multi-modal brain images in the study of neurodegenerative diseases.

Subject(s)

Brain Mapping , Brain/diagnostic imaging , Magnetic Resonance Imaging , Models, Neurological , Neural Pathways/diagnostic imaging , Neurodegenerative Diseases/pathology , Adult , Aged , Aged, 80 and over , Anisotropy , Decision Trees , Female , Humans , Image Processing, Computer-Assisted , Male , Middle Aged , Neural Pathways/physiology , Neurodegenerative Diseases/classification
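The abstract above contrasts intermediate integration (features from all views combined before training) with late integration (one model per view, with the outputs combined afterwards). A minimal sketch of late integration follows. It substitutes a nearest-centroid classifier for the random forest used in the study, turns distances into class probabilities with a softmax, and averages those probabilities across views; the toy feature vectors and all function names are hypothetical.

```python
import math


def centroid_model(X, y):
    """Per-class mean feature vector for one view (stand-in for a random forest)."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        s = sums.setdefault(label, [0.0] * len(x))
        for k, v in enumerate(x):
            s[k] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


def class_probs(model, x):
    """Softmax over negative squared distances to each class centroid."""
    scores = {c: -sum((a - b) ** 2 for a, b in zip(x, m)) for c, m in model.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}


def late_fusion_predict(models, views_x):
    """Average per-view class probabilities, then return the most probable class."""
    avg = {}
    for model, x in zip(models, views_x):
        for c, p in class_probs(model, x).items():
            avg[c] = avg.get(c, 0.0) + p / len(models)
    return max(avg, key=avg.get)
```

Usage with two toy views (say, a 2-D connectivity feature and a 1-D FA feature): train `models = [centroid_model(view_a, y), centroid_model(view_b, y)]`, then call `late_fusion_predict(models, [x_a, x_b])` with one feature vector per view. Intermediate integration would instead concatenate `x_a + x_b` and train a single model.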

18.

CONDOP: an R package for CONdition-Dependent Operon Predictions.

Fortino, Vittorio; Tagliaferri, Roberto; Greco, Dario.

Bioinformatics ; 32(20): 3199-3200, 2016 10 15.

Article in English | MEDLINE | ID: mdl-27296981

ABSTRACT

The use of high-throughput RNA sequencing to predict dynamic operon structures in prokaryotic genomes has recently gained popularity in bioinformatics. We provide the R implementation of a novel method that uses transcriptomic features extracted from RNA-seq transcriptome profiles to develop ensemble classifiers for condition-dependent operon predictions. The CONDOP package provides deeper insight into RNA-seq data analysis and allows scientists to highlight the operon organization in the context of transcriptional regulation with a few lines of code. AVAILABILITY AND IMPLEMENTATION: CONDOP is implemented in R and is freely available at CRAN. CONTACT: vittorio.fortino@helsinki.fi SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Operon , Sequence Analysis, RNA , Software , High-Throughput Nucleotide Sequencing , RNA

19.

MVDA: a multi-view genomic data integration methodology.

Serra, Angela; Fratello, Michele; Fortino, Vittorio; Raiconi, Giancarlo; Tagliaferri, Roberto; Greco, Dario.

BMC Bioinformatics ; 16: 261, 2015 Aug 19.

Article in English | MEDLINE | ID: mdl-26283178

ABSTRACT

BACKGROUND: Multiple high-throughput molecular profiles can be collected from the same individuals using omics technologies. Combining these data, rather than exploiting them separately, can significantly increase the power of clinically relevant patient subclassification. RESULTS: We propose a multi-view approach in which the information from different data layers (views) is integrated at the level of the results of each single-view clustering iteration. It works by factorizing the membership matrices in a late-integration manner. We evaluated the effectiveness and performance of our method on six multi-view cancer datasets. In all cases, we found statistically significant patient sub-classes, identifying novel sub-groups not previously emphasized in the literature. Our method performed better than other multi-view clustering algorithms and, unlike other existing methods, it can quantify the contribution of single views to the final results. CONCLUSION: Our observations suggest that integrating prior information with genomic features in subtyping analysis is an effective strategy for identifying disease subgroups. The methodology is implemented in R and the source code is available online at http://neuronelab.unisa.it/a-multi-view-genomic-data-integration-methodology/ .

Subject(s)

Algorithms , Genomics/methods , Cluster Analysis , MicroRNAs/genetics , MicroRNAs/metabolism , Sequence Analysis, RNA
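The MVDA abstract above integrates views by factorizing the per-view cluster membership matrices. The sketch below does not reproduce that factorization. To make the idea of combining clustering results rather than raw data concrete, it swaps in a simpler, related late-integration scheme: a co-association (consensus) matrix built from the same one-hot membership matrices (M times its transpose, averaged over views), followed by grouping patients that are co-clustered in more than a fraction `tau` of the views. The threshold `tau` and all function names are hypothetical.

```python
def membership(labels):
    """One-hot patient-by-cluster membership matrix from a list of cluster labels."""
    ks = sorted(set(labels))
    return [[1.0 if l == k else 0.0 for k in ks] for l in labels]


def coassociation(views):
    """A[i][j] = fraction of views in which patients i and j share a cluster.

    Each view contributes M @ M.T (1 if same cluster, else 0), averaged over views.
    """
    n = len(views[0])
    A = [[0.0] * n for _ in range(n)]
    for labels in views:
        M = membership(labels)
        for i in range(n):
            for j in range(n):
                A[i][j] += sum(a * b for a, b in zip(M[i], M[j])) / len(views)
    return A


def consensus(views, tau=0.5):
    """Connected components of the graph linking pairs with A[i][j] > tau."""
    A = coassociation(views)
    n = len(A)
    comp = [-1] * n
    cid = 0
    for s in range(n):
        if comp[s] != -1:
            continue
        comp[s] = cid
        stack = [s]
        while stack:  # depth-first traversal of one component
            i = stack.pop()
            for j in range(n):
                if comp[j] == -1 and A[i][j] > tau:
                    comp[j] = cid
                    stack.append(j)
        cid += 1
    return comp
```

Each view only has to supply cluster labels for the same patients, so the views may come from entirely different omics layers and clustering algorithms. For example, `consensus([[0, 0, 1, 1, 2, 2], [0, 0, 1, 1, 1, 1], [5, 5, 7, 7, 9, 9]])` returns `[0, 0, 1, 1, 2, 2]`, because the one view that merges the last four patients is outvoted.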

20.

A multi-view genomic data simulator.

Fratello, Michele; Serra, Angela; Fortino, Vittorio; Raiconi, Giancarlo; Tagliaferri, Roberto; Greco, Dario.

BMC Bioinformatics ; 16: 151, 2015 May 12.

Article in English | MEDLINE | ID: mdl-25962835

ABSTRACT

BACKGROUND: OMICs technologies make it possible to assay the state of a large number of different features (e.g., mRNA expression, miRNA expression, copy number variation, DNA methylation, etc.) from the same samples. The objective of these experiments is usually to find a reduced set of significant features that can be used to differentiate the assayed conditions. For the development of novel computational feature selection methods, this task is challenging because of the lack of fully annotated biological datasets for benchmarking. A possible way to tackle this problem is to generate appropriate synthetic datasets whose composition and behaviour are fully controlled and known a priori. RESULTS: Here we propose a novel method centred on generating networks of interactions among different biological molecules, especially those involved in regulating gene expression. Synthetic datasets are obtained from models based on ordinary differential equations with known parameters. Our results show that the generated datasets closely mimic the behaviour of real data, as popular data analysis methods are able to selectively identify the existing interactions. CONCLUSIONS: The proposed method can be used in conjunction with real biological datasets to assess data mining techniques. The main strength of this method lies in the full control over the simulated data while retaining coherence with real biological processes. The R package MVBioDataSim is freely available to the scientific community at http://neuronelab.unisa.it/?p=1722.

Subject(s)

Algorithms , Computational Biology/methods , Computer Simulation , Gene Expression Profiling/methods , Gene Regulatory Networks , Genomics/methods , DNA Copy Number Variations , DNA Methylation , Datasets as Topic , Gene Expression Regulation , Humans , MicroRNAs/genetics
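The abstract above does not describe MVBioDataSim's actual network models. As a hypothetical, minimal illustration of generating synthetic expression data from an ODE model with known parameters, the sketch below integrates a two-gene loop (gene A activates B, and B represses A through a Hill term) with forward Euler, then adds Gaussian measurement noise to the steady state. The model structure, parameter values and noise level are all assumptions, not the package's.

```python
import random


def rhs(a, b, p):
    """Hypothetical 2-gene loop: A activates B, B represses A (Hill kinetics)."""
    da = p["va"] / (1.0 + (b / p["kb"]) ** p["n"]) - p["da"] * a
    db = p["vb"] * a / (p["ka"] + a) - p["db"] * b
    return da, db


def simulate(p, a0=0.0, b0=0.0, dt=0.01, steps=10_000):
    """Forward-Euler integration; returns the final (a, b) expression levels."""
    a, b = a0, b0
    for _ in range(steps):
        da, db = rhs(a, b, p)
        a, b = a + dt * da, b + dt * db
    return a, b


def synthetic_samples(p, n_samples, noise_sd=0.1, seed=0):
    """Steady-state expression plus assumed Gaussian measurement noise."""
    rng = random.Random(seed)
    a, b = simulate(p)
    return [(a + rng.gauss(0, noise_sd), b + rng.gauss(0, noise_sd))
            for _ in range(n_samples)]


# Hypothetical "known parameters" of the ground-truth network.
params = {"va": 2.0, "vb": 3.0, "ka": 1.0, "kb": 1.0, "n": 2, "da": 1.0, "db": 1.0}
```

Because every parameter is known, a second condition can be simulated by perturbing one of them (for example a knock-down, `dict(params, va=1.0)`), and a feature selection or network inference method can then be scored against interactions that are known to exist by construction.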
