1.
J Pers Med ; 13(1), 2023 Jan 06.
Article in English | MEDLINE | ID: mdl-36675783

ABSTRACT

A personalized approach is strongly advocated for treatment selection in Multiple Sclerosis patients due to the high number of available drugs. Machine learning methods have proved to be valuable tools in the context of precision medicine. In the present work, we applied machine learning methods to identify a combined clinical and genetic signature of response to fingolimod that could support the prediction of drug response. Two cohorts of fingolimod-treated patients from Italy and France were enrolled and divided into training, validation, and test sets. Random forest training and robust feature selection were performed in the first two sets respectively, and the independent test set was used to evaluate model performance. A genetic-only model and a combined clinical-genetic model were obtained. Overall, 381 patients were classified according to the NEDA-3 criterion at 2 years; we identified a genetic model, including 123 SNPs, that was able to predict fingolimod response with an AUROC = 0.65 in the independent test set. When combining clinical data, the model performance increased to an AUROC = 0.71. Integrating clinical and genetic data by means of machine learning methods can help in the prediction of response to fingolimod, even though further studies are required to definitively extend this approach to clinical applications.
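The AUROC values reported above can be read as the probability that a randomly chosen responder is scored higher than a randomly chosen non-responder. The sketch below computes that quantity directly from pairwise comparisons. It is a minimal illustration, not the authors' pipeline; the function name `auroc` and the toy labels and scores are assumptions made for this example.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparisons.

    Equals the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one
    (ties count as one half).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical test-set labels (1 = responder) and model scores.
y_true = [1, 0, 1, 1, 0, 0]
y_score = [0.9, 0.4, 0.6, 0.3, 0.5, 0.2]
print(auroc(y_true, y_score))
```

The quadratic loop is fine for a few hundred patients; a rank-based formulation would be preferable for much larger test sets.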

2.
Comput Struct Biotechnol J ; 20: 864-873, 2022.
Article in English | MEDLINE | ID: mdl-35222845

ABSTRACT

Prostate cancer (PC) is one of the major male cancers. Differential diagnosis of PC is indispensable for the individual therapy, i.e., Gleason score (GS) that describes the grade of cancer can be used to choose the appropriate therapy. However, the current techniques for PC diagnosis and prognosis are not always effective. To identify potential markers that could be used for differential diagnosis of PC, we analyzed miRNA-mRNA interactions and we built specific networks for PC onset and progression. Key differentially expressed miRNAs for each GS were selected by calculating three parameters of network topology measures: the number of their single regulated mRNAs (NSR), the number of target genes (NTG) and NSR/NTG. miRNAs that obtained a high statistically significant value of these three parameters were chosen as potential biomarkers for computational validation and pathway analysis. 20 miRNAs were identified as key candidates for PC. 8 out of 20 miRNAs (miR-25-3p, miR-93-3p, miR-122-5p, miR-183-5p, miR-615-3p, miR-7-5p, miR-375, and miR-92a-3p) were differentially expressed in all GS and proposed as biomarkers for PC onset. In addition, "Extracellular-receptor interaction", "Focal adhesion", and "microRNAs in cancer" were significantly enriched by the differentially expressed target genes of the identified miRNAs. miR-10a-5p was found to be differentially expressed in GS 6, 7, and 8 in PC samples. 3 miRNAs were identified as PC GS-specific differentially expressed miRNAs: miR-155-5p was identified in PC samples with GS 6, and miR-142-3p and miR-296-3p in PC samples with GS 9. The efficacy of 20 miRNAs as potential biomarkers was revealed with a Random Forest classification using an independent dataset. The results demonstrated that our 20 miRNAs achieved a better performance (AUC: 0.73) than miRNAs selected with the Boruta algorithm (AUC: 0.55), a method for automated feature extraction.
Studying miRNA-mRNA associations, key miRNAs were identified with a computational approach for PC onset and progression. Further experimental validations are needed for future translational development.
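The three topology parameters named in the abstract (NSR, NTG and NSR/NTG) can be sketched on a toy miRNA-to-target mapping. This sketch assumes that a "single regulated" mRNA is one targeted by exactly one miRNA in the network, which is an interpretation rather than a definition taken from the paper; the function name and the gene and miRNA identifiers are hypothetical.

```python
from collections import Counter


def topology_measures(targets_by_mirna):
    """Return {miRNA: (NSR, NTG, NSR/NTG)} for a miRNA -> target-set mapping."""
    # Count how many distinct miRNAs regulate each mRNA.
    regulators = Counter(
        mrna for targets in targets_by_mirna.values() for mrna in set(targets)
    )
    measures = {}
    for mirna, targets in targets_by_mirna.items():
        unique_targets = set(targets)
        ntg = len(unique_targets)  # number of target genes
        # Targets regulated by this miRNA only (assumed meaning of NSR).
        nsr = sum(1 for mrna in unique_targets if regulators[mrna] == 1)
        ratio = nsr / ntg if ntg else 0.0
        measures[mirna] = (nsr, ntg, ratio)
    return measures


# Hypothetical toy network: two miRNAs sharing one target.
network = {
    "miR-A": {"GENE1", "GENE2", "GENE3"},
    "miR-B": {"GENE3", "GENE4"},
}
for mirna, values in topology_measures(network).items():
    print(mirna, values)
```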

3.
Bioinformatics ; 37(23): 4526-4533, 2021 Dec 07.
Article in English | MEDLINE | ID: mdl-34240108

ABSTRACT

MOTIVATION: Automated protein function prediction is a complex multi-class, multi-label, structured classification problem in which protein functions are organized in a controlled vocabulary, according to the Gene Ontology (GO). 'Hierarchy-unaware' classifiers, also known as 'flat' methods, predict GO terms without exploiting the inherent structure of the ontology, potentially violating the True-Path-Rule (TPR) that governs the GO, while 'hierarchy-aware' approaches, even if they obey the TPR, do not always show clear improvements with respect to flat methods, or do not scale well when applied to the full GO. RESULTS: To overcome these limitations, we propose Hierarchical Ensemble Methods for Directed Acyclic Graphs (HEMDAG), a family of highly modular hierarchical ensembles of classifiers, able to build upon any flat method and to provide 'TPR-safe' predictions, by leveraging a combination of isotonic regression and TPR learning strategies. Extensive experiments on synthetic and real data across several organisms firstly show that HEMDAG can be used as a general tool to improve the predictions of flat classifiers, and secondly that HEMDAG is competitive versus state-of-the-art hierarchy-aware learning methods proposed in the last CAFA international challenges. AVAILABILITY AND IMPLEMENTATION: Fully tested R code freely available at https://anaconda.org/bioconda/r-hemdag. Tutorial and documentation at https://hemdag.readthedocs.io. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
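To make the True-Path-Rule constraint concrete: a prediction is "TPR-safe" when no term scores higher than any of its ancestors. The sketch below applies a simple top-down correction to flat scores on a small DAG, capping each term at the lowest corrected score among its parents. It is a simplified illustration and not the HEMDAG implementation; the function name, term names and scores are hypothetical, and every term is assumed to have a flat score.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def top_down_correction(flat_scores, parents):
    """Cap each term's score by the corrected scores of its parents.

    flat_scores: {term: score from a hierarchy-unaware classifier}
    parents:     {term: iterable of parent terms}; roots may be omitted
    """
    # TopologicalSorter expects node -> predecessors, so parents come first.
    order = TopologicalSorter(
        {term: parents.get(term, ()) for term in flat_scores}
    ).static_order()
    corrected = {}
    for term in order:
        bound = min(
            (corrected[p] for p in parents.get(term, ())),
            default=float("inf"),
        )
        corrected[term] = min(flat_scores[term], bound)
    return corrected


# Hypothetical four-term DAG: C has two parents, A and B.
flat = {"root": 0.6, "A": 0.8, "B": 0.5, "C": 0.9}
dag_parents = {"A": ["root"], "B": ["root"], "C": ["A", "B"]}
print(top_down_correction(flat, dag_parents))
```

After correction, scores never increase along any path from the root to a leaf, so thresholding at any cutoff yields an annotation set that is closed under ancestors.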


Subject(s)
Algorithms; Computational Biology; Gene Ontology; Computational Biology/methods; Proteins/metabolism
4.
Medicina (Kaunas) ; 57(3), 2021 Mar 12.
Article in English | MEDLINE | ID: mdl-33809336

ABSTRACT

Background and Objectives: Breast cancer is a heterogeneous disease categorized into four subtypes. Previous studies have shown that copy number alterations of several genes are implicated with the development and progression of many cancers. This study evaluates the effects of DNA copy number alterations on gene expression levels in different breast cancer subtypes. Materials and Methods: We performed a computational analysis integrating copy number alterations and gene expression profiles in 1024 breast cancer samples grouped into four molecular subtypes: luminal A, luminal B, HER2, and basal. Results: Our analyses identified several genes correlated in all subtypes such as KIAA1967 and MCPH1. In addition, several subtype-specific genes that showed a significant correlation between copy number and gene expression profiles were detected: SMARCB1, AZIN1, MTDH in luminal A, PPP2R5E, APEX1, GCN5 in luminal B, TNFAIP1, PCYT2, DIABLO in HER2, and FAM175B, SENP5, SCAF1 in basal subtype. Conclusions: This study showed that computational analyses integrating copy number and gene expression can contribute to unveil the molecular mechanisms of cancer and identify new subtype-specific biomarkers.
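The kind of per-gene association described above can be illustrated with a Pearson correlation between copy number and expression across samples. The abstract does not state which correlation statistic was used, so Pearson is an assumption here, and the sample values are invented for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for one gene across five tumour samples.
copy_number = [1, 2, 2, 3, 4]
expression = [2.0, 4.0, 4.0, 6.0, 8.0]
print(pearson(copy_number, expression))
```

In a genome-wide analysis this would be repeated per gene and per subtype, followed by multiple-testing correction before calling a correlation significant.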


Subject(s)
Breast Neoplasms; DNA Copy Number Variations; Biomarkers, Tumor/genetics; Breast Neoplasms/genetics; DNA Copy Number Variations/genetics; Gene Expression Regulation, Neoplastic; Humans; Membrane Proteins; Protein Phosphatase 2; RNA-Binding Proteins; Transcriptome/genetics
5.
PLoS One ; 15(12): e0244241, 2020.
Article in English | MEDLINE | ID: mdl-33351828

ABSTRACT

The visual exploration and analysis of biomolecular networks is of paramount importance for identifying hidden and complex interaction patterns among proteins. Although many tools have been proposed for this task, they are mainly focused on the query and visualization of a single protein with its neighborhood. The global exploration of the entire network and the interpretation of its underlying structure still remains difficult, mainly due to the excessively large size of the biomolecular networks. In this paper we propose a novel multi-resolution representation and exploration approach that exploits hierarchical community detection algorithms for the identification of communities occurring in biomolecular networks. The proposed graphical rendering combines two types of nodes (protein and communities) and three types of edges (protein-protein, community-community, protein-community), and displays communities at different resolutions, allowing the user to interactively zoom in and out from different levels of the hierarchy. Links among communities are shown in terms of relationships and functional correlations among the biomolecules they contain. This form of navigation can be also combined by the user with a vertex centric visualization for identifying the communities holding a target biomolecule. Since communities gather limited-size groups of correlated proteins, the visualization and exploration of complex and large networks becomes feasible on off-the-shelf computer machines. The proposed graphical exploration strategies have been implemented and integrated in UNIPred-Web, a web application that we recently introduced for combining the UNIPred algorithm, able to address both integration and protein function prediction in an imbalance-aware fashion, with an easy to use vertex-centric exploration of the integrated network. The tool has been deeply amended from different standpoints, including the prediction core algorithm. 
Several tests on networks of different size and connectivity have been conducted to show off the vast potential of our methodology; moreover, enrichment analyses have been performed to assess the biological meaningfulness of detected communities. Finally, a CoV-human network has been embedded in the system, and a corresponding case study presented, including the visualization and the prediction of human host proteins that potentially interact with SARS-CoV2 proteins.


Subject(s)
COVID-19/genetics; Internet; Metabolic Networks and Pathways/genetics; SARS-CoV-2/genetics; Algorithms; COVID-19/metabolism; COVID-19/virology; Humans; Proteins/genetics; Proteins/metabolism; SARS-CoV-2/metabolism; SARS-CoV-2/pathogenicity
6.
Gigascience ; 9(5), 2020 May 01.
Article in English | MEDLINE | ID: mdl-32444882

ABSTRACT

BACKGROUND: Several prediction problems in computational biology and genomic medicine are characterized by both big data and a high imbalance between examples to be learned, whereby positive examples can represent a tiny minority with respect to negative examples. For instance, deleterious or pathogenic variants are overwhelmed by the sea of neutral variants in the non-coding regions of the genome: thus, the prediction of deleterious variants is a challenging, highly imbalanced classification problem, and classical prediction tools fail to detect the rare pathogenic examples among the huge amount of neutral variants or undergo severe restrictions in managing big genomic data. RESULTS: To overcome these limitations we propose parSMURF, a method that adopts a hyper-ensemble approach and oversampling and undersampling techniques to deal with imbalanced data, and parallel computational techniques to both manage big genomic data and substantially speed up the computation. The synergy between Bayesian optimization techniques and the parallel nature of parSMURF enables efficient and user-friendly automatic tuning of the hyper-parameters of the algorithm, and allows specific learning problems in genomic medicine to be easily fit. Moreover, by using MPI parallel and machine learning ensemble techniques, parSMURF can manage big data by partitioning them across the nodes of a high-performance computing cluster. Results with synthetic data and with single-nucleotide variants associated with Mendelian diseases and with genome-wide association study hits in the non-coding regions of the human genome, involving millions of examples, show that parSMURF achieves state-of-the-art results and an 80-fold speed-up with respect to the sequential version.
CONCLUSIONS: parSMURF is a parallel machine learning tool that can be trained to learn different genomic problems, and its multiple levels of parallelization and high scalability allow us to efficiently fit problems characterized by big and imbalanced genomic data. The C++ OpenMP multi-core version tailored to a single workstation and the C++ MPI/OpenMP hybrid multi-core and multi-node parSMURF version tailored to a High Performance Computing cluster are both available at https://github.com/AnacletoLAB/parSMURF.
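The general idea of combining partitioning with over- and undersampling can be sketched as follows: split the majority class into disjoint parts, pair each part with a boosted copy of the minority class, and train one learner per balanced set. This is a simplified sketch and not parSMURF's actual procedure; plain duplication stands in for whatever oversampling technique the method uses, and the function name and parameters (`n_parts`, `oversample`, `ratio`) are hypothetical.

```python
import random


def balanced_partitions(positives, negatives, n_parts, oversample=2, ratio=1, seed=0):
    """Build one balanced training set per partition of the negatives.

    Each partition of the majority class is undersampled to at most
    `ratio` times the size of the oversampled minority class.
    """
    rng = random.Random(seed)
    shuffled = list(negatives)
    rng.shuffle(shuffled)
    # Disjoint partitions of the majority class (round-robin slices).
    parts = [shuffled[i::n_parts] for i in range(n_parts)]
    # Naive oversampling: duplicate every positive example.
    boosted = list(positives) * oversample
    training_sets = []
    for part in parts:
        k = min(len(part), ratio * len(boosted))
        training_sets.append((boosted, rng.sample(part, k)))
    return training_sets


# Hypothetical data: 2 rare positives against 100 negatives.
pos = ["p1", "p2"]
neg = [f"n{i}" for i in range(100)]
for pos_set, neg_set in balanced_partitions(pos, neg, n_parts=5):
    print(len(pos_set), len(neg_set))
```

Because the partitions are independent, each training set can be handed to a separate thread or cluster node, and the resulting per-partition predictions averaged.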


Subject(s)
Computational Biology/methods; Genetic Predisposition to Disease; Genetic Variation; Genome-Wide Association Study/methods; Software; Algorithms; Databases, Genetic; Genomics/methods; Humans; Machine Learning; Reproducibility of Results
7.
Sci Rep ; 10(1): 3612, 2020 Feb 27.
Article in English | MEDLINE | ID: mdl-32107391

ABSTRACT

Methods for phenotype and outcome prediction are largely based on inductive supervised models that use selected biomarkers to make predictions, without explicitly considering the functional relationships between individuals. We introduce a novel network-based approach named Patient-Net (P-Net) in which biomolecular profiles of patients are modeled in a graph-structured space that represents gene expression relationships between patients. Then a kernel-based semi-supervised transductive algorithm is applied to the graph to explore the overall topology of the graph and to predict the phenotype/clinical outcome of patients. Experimental tests involving several publicly available datasets of patients afflicted with pancreatic, breast, colon and colorectal cancer show that our proposed method is competitive with state-of-the-art supervised and semi-supervised predictive systems. Importantly, P-Net also provides interpretable models that can be easily visualized to gain clues about the relationships between patients, and to formulate hypotheses about their stratification.


Subject(s)
Breast Neoplasms/diagnosis; Colorectal Neoplasms/diagnosis; Gene Regulatory Networks; Neural Networks, Computer; Pancreatic Neoplasms/diagnosis; Algorithms; Artificial Intelligence; Breast Neoplasms/epidemiology; Colorectal Neoplasms/epidemiology; Computational Biology/methods; Datasets as Topic; Female; Humans; Individuality; Male; Pancreatic Neoplasms/epidemiology; Phenotype; Prognosis; Transcriptome; Treatment Outcome
8.
IEEE Access ; 8: 196299-196325, 2020.
Article in English | MEDLINE | ID: mdl-34812365

ABSTRACT

Between January and October of 2020, the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus infected more than 34 million persons in a worldwide pandemic leading to over one million deaths worldwide (data from the Johns Hopkins University). Since the virus began to spread, emergency departments were busy with COVID-19 patients for whom a quick decision regarding in- or outpatient care was required. The virus can cause characteristic abnormalities in chest radiographs (CXR), but, due to the low sensitivity of CXR, additional variables and criteria are needed to accurately predict risk. Here, we describe a computerized system primarily aimed at extracting the most relevant radiological, clinical, and laboratory variables for improving patient risk prediction, and secondarily at presenting an explainable machine learning system, which may provide simple decision criteria to be used by clinicians as a support for assessing patient risk. To achieve robust and reliable variable selection, Boruta and Random Forest (RF) are combined in a 10-fold cross-validation scheme to produce a variable importance estimate not biased by the presence of surrogates. The most important variables are then selected to train an RF classifier, whose rules may be extracted, simplified, and pruned to finally build an associative tree, particularly appealing for its simplicity. Results show that the radiological score automatically computed through a neural network is highly correlated with the score computed by radiologists, and that laboratory variables, together with the number of comorbidities, aid risk prediction. The prediction performance of our approach was compared to that of generalized linear models and shown to be effective and robust. The proposed machine learning-based computational system can be easily deployed and used in emergency departments for rapid and accurate risk prediction in COVID-19 patients.

9.
BMC Bioinformatics ; 20(1): 422, 2019 Aug 14.
Article in English | MEDLINE | ID: mdl-31412768

ABSTRACT

BACKGROUND: One of the main issues in the automated protein function prediction (AFP) problem is the integration of multiple networked data sources. The UNIPred algorithm was thereby proposed to efficiently integrate -in a function-specific fashion- the protein networks by taking into account the imbalance that characterizes protein annotations, and to subsequently predict novel hypotheses about unannotated proteins. UNIPred is publicly available as R code, which may limit its use by non-expert users. Moreover, its application requires efforts in the acquisition and preparation of the networks to be integrated. Finally, the UNIPred source code does not handle the visualization of the resulting consensus network, whereas suitable views of the network topology are necessary to explore and interpret existing protein relationships. RESULTS: We address the aforementioned issues by proposing UNIPred-Web, a user-friendly Web tool for the application of the UNIPred algorithm to a variety of biomolecular networks, already supplied by the system, and for the visualization and exploration of protein networks. We support different organisms and different types of networks -e.g., co-expression, shared domains and physical interaction networks. Users are supported in the different phases of the process, ranging from the selection of the networks and the protein function to be predicted, to the navigation of the integrated network. The system also supports the upload of user-defined protein networks. The vertex-centric and highly interactive approach of UNIPred-Web allows a narrow exploration of specific proteins, and an interactive analysis of large sub-networks with only a few mouse clicks. CONCLUSIONS: UNIPred-Web offers a practical and intuitive (visual) guidance to biologists interested in gaining insights into protein biomolecular functions.
UNIPred-Web provides facilities for the integration of networks, and supplies a framework for the imbalance-aware protein network integration of nine organisms, the prediction of thousands of GO protein functions, and an easy-to-use graphical interface for the visual analysis, navigation and interpretation of the integrated networks and of the functional predictions.


Subject(s)
Computational Biology/methods; Internet; Protein Interaction Maps; Proteins/metabolism; Software; Algorithms; User-Computer Interface
10.
IEEE/ACM Trans Comput Biol Bioinform ; 16(5): 1550-1560, 2019.
Article in English | MEDLINE | ID: mdl-28328509

ABSTRACT

Automated protein function prediction is a challenging problem with distinctive features, such as the hierarchical organization of protein functions and the scarcity of annotated proteins for most biological functions. We propose a multitask learning algorithm addressing both issues. Unlike standard multitask algorithms, which use task (protein functions) similarity information as a bias to speed up learning, we show that dissimilarity information enforces separation of rare class labels from frequent class labels, and for this reason is better suited for solving unbalanced protein function prediction problems. We support our claim by showing that a multitask extension of the label propagation algorithm empirically works best when the task relatedness information is represented using a dissimilarity matrix as opposed to a similarity matrix. Moreover, the experimental comparison carried out on three model organisms shows that our method has a more stable performance in both "protein-centric" and "function-centric" evaluation settings.


Subject(s)
Computational Biology/methods; Machine Learning; Models, Statistical; Proteins/classification; Proteins/physiology; Algorithms; Drosophila Proteins/classification; Drosophila Proteins/physiology; Escherichia coli Proteins/classification; Escherichia coli Proteins/physiology; Gene Ontology; Humans
11.
BMC Bioinformatics ; 19(1): 530, 2018 Dec 17.
Article in English | MEDLINE | ID: mdl-30558532

ABSTRACT

After publication of the original article [1], it was noticed that the dagger symbol indicating equal contribution wasn't added next to the names of all authors.

12.
BMC Bioinformatics ; 19(Suppl 14): 417, 2018 Nov 20.
Article in English | MEDLINE | ID: mdl-30453879

ABSTRACT

BACKGROUND: Supervised machine learning methods when applied to the problem of automated protein-function prediction (AFP) require the availability of both positive examples (i.e., proteins which are known to possess a given protein function) and negative examples (corresponding to proteins not associated with that function). Unfortunately, publicly available proteome and genome data sources such as the Gene Ontology rarely store the functions not possessed by a protein. Thus the negative selection, consisting in identifying informative negative examples, is currently a central and challenging problem in AFP. Several heuristics have been proposed through the years to solve this problem; nevertheless, despite their effectiveness, to the best of our knowledge no previous existing work studied which protein features are more relevant to this task, that is, which protein features help more in discriminating reliable and unreliable negatives. RESULTS: The present work analyses the impact of several features on the selection of negative proteins for the Gene Ontology (GO) terms. The analysis is network-based: it exploits the fact that proteins can be naturally structured in a network, considering the pairwise relationships coming from several sources of data, such as protein-protein and genetic interactions. Overall, the proposed protein features, including local and global graph centrality measures and protein multifunctionality, can be term-aware (i.e., depending on the GO term) and term-unaware (i.e., invariant across the GO terms). 
We validated the informativeness of each feature utilizing a temporal holdout in three different experiments on yeast, mouse and human proteomes: (i) feature selection to detect which protein features are more helpful for the negative selection; (ii) protein function prediction to verify whether the features considered are also useful to predict GO terms; (iii) negative selection by applying two different negative selection algorithms on proteins represented through the proposed features. CONCLUSIONS: Term-aware features (with some exceptions) proved more informative for problem (i), together with node betweenness, which is the most relevant among term-unaware features. The node positive neighborhood instead is the most predictive feature for the AFP problem, while experiment (iii) showed that the proposed features allow negative selection algorithms to effectively select negative instances in the temporal holdout setting, with better results when nonlinear combinations of features are also exploited.


Subject(s)
Proteins/chemistry; Algorithms; Animals; Gene Ontology; Gene Regulatory Networks; Humans; Mice; Proteome/metabolism; Saccharomyces cerevisiae/metabolism
13.
BMC Bioinformatics ; 19(Suppl 10): 357, 2018 Oct 15.
Article in English | MEDLINE | ID: mdl-30367588

ABSTRACT

BACKGROUND: In clinical practice, the objective quantification of histological results is essential not only to define objective and well-established protocols for diagnosis, treatment, and assessment, but also to improve disease comprehension. SOFTWARE: The software MIAQuant_Learn presented in this work segments, quantifies and analyzes markers in histochemical and immunohistochemical images obtained by different biological procedures and imaging tools. MIAQuant_Learn employs supervised learning techniques to customize the marker segmentation process with respect to any marker color appearance. Our software expresses the location of the segmented markers with respect to regions of interest by mean-distance histograms, which are numerically compared by measuring their intersection. When contiguous tissue sections stained by different markers are available, MIAQuant_Learn aligns them and overlaps the segmented markers in a unique image enabling a visual comparative analysis of the spatial distribution of each marker (markers' relative location). Additionally, it computes novel measures of markers' co-existence in tissue volumes depending on their density. CONCLUSIONS: Applications of MIAQuant_Learn in clinical research studies have proven its effectiveness as a fast and efficient tool for the automatic extraction, quantification and analysis of histological sections. It is robust with respect to several deficits caused by image acquisition systems and produces objective and reproducible results. Thanks to its flexibility, MIAQuant_Learn represents an important tool to be exploited in basic research where needs are constantly changing.
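Comparing two histograms "by measuring their intersection" can be illustrated with the standard histogram intersection: normalize both histograms to unit mass and sum the bin-wise minima. Whether MIAQuant_Learn normalizes in exactly this way is not stated in the abstract, so this is a sketch under that assumption, with invented bin counts.

```python
def histogram_intersection(h1, h2):
    """Intersection of two histograms after normalizing each to sum to 1.

    Returns a value in [0, 1]; 1 means identical distributions,
    0 means the histograms share no occupied bin.
    """
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    total1, total2 = sum(h1), sum(h2)
    if total1 <= 0 or total2 <= 0:
        raise ValueError("histograms must have positive total mass")
    return sum(min(a / total1, b / total2) for a, b in zip(h1, h2))


# Hypothetical distance histograms for two markers (counts per distance bin).
marker_a = [2, 6, 2]
marker_b = [1, 3, 6]
print(histogram_intersection(marker_a, marker_b))
```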


Subject(s)
Algorithms; Computational Biology/methods; Image Processing, Computer-Assisted/methods; Staining and Labeling; Biomarkers, Tumor/metabolism; Decision Trees; Humans; Immunohistochemistry; Software; Support Vector Machine
14.
BMC Bioinformatics ; 19(Suppl 10): 353, 2018 Oct 15.
Article in English | MEDLINE | ID: mdl-30367594

ABSTRACT

BACKGROUND: Several problems in network biology and medicine can be cast into a framework where entities are represented through partially labeled networks, and the aim is inferring the labels (usually binary) of the unlabeled part. Connections represent functional or genetic similarity between entities, while the labellings often are highly unbalanced, that is, one class is largely under-represented: for instance, in automated protein function prediction (AFP), for most Gene Ontology terms only a few proteins are annotated, or in the disease-gene prioritization problem only a few genes are actually known to be involved in the etiology of a given disease. Imbalance-aware approaches to accurately predict node labels in biological networks are thereby required. Furthermore, such methods must be scalable, since input data can be large-sized as, for instance, in the context of multi-species protein networks. RESULTS: We propose a novel semi-supervised parallel enhancement of COSNET, an imbalance-aware algorithm built on the Hopfield neural model recently suggested to solve the AFP problem. By adopting an efficient representation of the graph and assuming a sparse network topology, we empirically show that it can be efficiently applied to networks with millions of nodes. The key strategy to speed up the computations is to partition nodes into independent sets so as to process each set in parallel by exploiting the power of GPU accelerators. This parallel technique ensures the convergence to asymptotically stable attractors, while preserving the asynchronous dynamics of the original model. Detailed experiments on real data and artificial big instances of the problem highlight scalability and efficiency of the proposed method. CONCLUSIONS: By parallelizing COSNET we achieved on average a speed-up of 180x in solving the AFP problem in the S. cerevisiae, Mus musculus and Homo sapiens organisms, while lowering memory requirements.
In addition, to show the potential applicability of the method to huge biomolecular networks, we predicted node labels in artificially generated sparse networks involving hundreds of thousands to millions of nodes.
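The strategy of partitioning nodes into independent sets can be illustrated with a greedy graph coloring: nodes that receive the same color share no edge, so their states can be updated simultaneously without reading a value that is changing in the same step. This is a sequential sketch of the partitioning idea only, not the GPU algorithm from the paper; the function name and the toy graph are hypothetical, and the adjacency mapping is assumed to be symmetric.

```python
def greedy_independent_sets(adjacency):
    """Partition nodes into independent sets via greedy coloring.

    adjacency: {node: set of neighbouring nodes}, assumed symmetric.
    Returns a list of node lists; no two nodes in a list are adjacent.
    """
    color = {}
    # Visit high-degree nodes first, a common greedy heuristic.
    for node in sorted(adjacency, key=lambda v: len(adjacency[v]), reverse=True):
        used = {color[n] for n in adjacency[node] if n in color}
        c = 0
        while c in used:
            c += 1
        color[node] = c
    groups = {}
    for node, c in color.items():
        groups.setdefault(c, []).append(node)
    return [groups[c] for c in sorted(groups)]


# Hypothetical 4-cycle: a-b-c-d-a.
graph = {
    "a": {"b", "d"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"a", "c"},
}
print(greedy_independent_sets(graph))
```

On sparse networks the number of colors stays small relative to the number of nodes, which is what makes a per-color parallel update worthwhile.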


Subject(s)
Algorithms; Computer Graphics; Gene Regulatory Networks; Animals; Gene Ontology; Humans; Mice; Protein Interaction Maps/genetics; Proteins/genetics; Saccharomyces cerevisiae/genetics; Time Factors
15.
Artif Intell Med ; 82: 34-46, 2017 Oct.
Article in English | MEDLINE | ID: mdl-28882544

ABSTRACT

OBJECTIVE: Finding the human genes co-causing complex diseases, also known as "disease-genes", is one of the emerging and challenging tasks in biomedicine. This process, termed gene prioritization (GP), is characterized by a scarcity of known disease-genes for most diseases, and by a vast amount of heterogeneous data, usually encoded into networks describing different types of functional relationships between genes. In addition, different diseases may share common profiles (e.g. genetic or therapeutic profiles), and exploiting disease commonalities may significantly enhance the performance of GP methods. This work aims to provide a systematic comparison of several disease similarity measures, and to embed disease similarities and heterogeneous data into a flexible framework for gene prioritization which specifically handles the lack of known disease-genes. METHODS: We present a novel network-based method, Gene2DisCo, based on generalized linear models (GLMs) to effectively prioritize genes by exploiting data regarding disease-genes, gene interaction networks and disease similarities. The scarcity of disease-genes is addressed by applying an efficient negative selection procedure, together with imbalance-aware GLMs. Gene2DisCo is a flexible framework, in the sense it is not dependent upon specific types of data, and/or upon specific disease ontologies. RESULTS: On a benchmark dataset composed of nine human networks and 708 medical subject headings (MeSH) diseases, Gene2DisCo largely outperformed the best benchmark algorithm, kernelized score functions, in terms of both area under the ROC curve (0.94 against 0.86) and precision at given recall levels (for recall levels from 0.1 to 1 with steps 0.1). Furthermore, we enriched and extended the benchmark data to the whole human genome and provided the top-ranked unannotated candidate genes even for MeSH disease terms without known annotations.


Subject(s)
Data Mining/methods; Gene Regulatory Networks; Genetic Markers; Genetic Predisposition to Disease/classification; Machine Learning; Models, Genetic; Algorithms; Area Under Curve; Databases, Genetic; Humans; Linear Models; ROC Curve; Toxicogenetics/classification; Toxicogenetics/methods
16.
Bioinformatics ; 32(18): 2872-4, 2016 Sep 15.
Article in English | MEDLINE | ID: mdl-27256314

ABSTRACT

UNLABELLED: RANKS is a flexible software package that can be easily applied to any bioinformatics task formalizable as ranking of nodes with respect to a property given as a label, such as automated protein function prediction, gene disease prioritization and drug repositioning. To this end RANKS provides an efficient and easy-to-use implementation of kernelized score functions, a semi-supervised algorithmic scheme embedding both local and global learning strategies for the analysis of biomolecular networks. To facilitate comparative assessment, baseline network-based methods, e.g. label propagation and random walk algorithms, have also been implemented. AVAILABILITY AND IMPLEMENTATION: The package is available from CRAN: https://cran.r-project.org/. The package is written in R, except for the most computationally intensive functionalities which are implemented in C. CONTACT: valentini@di.unimi.it SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
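One simple form of a kernel-based score function ranks each node by its average kernel similarity to the nodes already labeled positive. The sketch below illustrates that idea on a precomputed kernel matrix. It does not use the RANKS package or its API; the function names and the similarity values are hypothetical, and the exact score functions implemented in RANKS may differ.

```python
def average_score(kernel, positives):
    """Score each node by its mean kernel similarity to the positive nodes.

    kernel:    {node: {node: similarity}}, a precomputed kernel matrix
    positives: nodes known to have the property of interest
    """
    if not positives:
        raise ValueError("at least one positive node is required")
    return {
        v: sum(kernel[v][x] for x in positives) / len(positives)
        for v in kernel
    }


def rank_nodes(scores):
    """Return nodes ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical kernel over four nodes; "a" is the only labeled positive.
K = {
    "a": {"a": 1.0, "b": 0.8, "c": 0.1, "d": 0.2},
    "b": {"a": 0.8, "b": 1.0, "c": 0.3, "d": 0.1},
    "c": {"a": 0.1, "b": 0.3, "c": 1.0, "d": 0.7},
    "d": {"a": 0.2, "b": 0.1, "c": 0.7, "d": 1.0},
}
print(rank_nodes(average_score(K, ["a"])))
```

In practice the kernel would be derived from the network itself, for example a random walk kernel, so that the score reflects more than direct neighbours.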


Subject(s)
Drug Repositioning; Software; Algorithms; Computational Biology/methods; Databases, Factual; Genomics; Humans; Proteins; Systems Biology
17.
J Comput Biol ; 22(12): 1057-74, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26402488

ABSTRACT

Properly integrating multiple sources of data and handling the unbalance between annotated and unannotated proteins are two of the main issues in the automated function prediction (AFP) problem. Most supervised and semisupervised learning algorithms for AFP proposed in the literature do not consider these issues jointly, which harms both sensitivity and precision: the unbalance between annotated and unannotated proteins characterizes the majority of functional classes, and each available data source carries specific and complementary information. We propose UNIPred (unbalance-aware network integration and prediction of protein functions), an algorithm that combines different biomolecular networks and predicts protein functions using parametric semisupervised neural models. The algorithm explicitly takes into account the unbalance between unannotated and annotated proteins, both when constructing the integrated network and when predicting protein annotations for each functional class. Full-genome and ontology-wide experiments with three eukaryotic model organisms show that the proposed method compares favorably with state-of-the-art learning algorithms for AFP.


Subject(s)
Proteome/metabolism , Proteomics/methods , Software , Proteome/chemistry
18.
Bioinformatics ; 30(15): 2235-6, 2014 Aug 01.
Article in English | MEDLINE | ID: mdl-24659104

ABSTRACT

SUMMARY: We present GOssTo, the Gene Ontology semantic similarity Tool, a user-friendly software system for calculating semantic similarities between gene products according to the Gene Ontology. GOssTo is bundled with six semantic similarity measures, including both term- and graph-based measures, and can be extended by the user with new similarities. Importantly, for any measure, GOssTo can also calculate the Random Walk Contribution, which has been shown to greatly improve the accuracy of similarity measures. GOssTo is fast and easy to use, and it computes similarities on a genomic scale in a few minutes on a regular desktop machine. CONTACT: alberto@cs.rhul.ac.uk AVAILABILITY: GOssTo is available both as a stand-alone application running on GNU/Linux, Windows and MacOS from www.paccanarolab.org/gossto and as a web application from www.paccanarolab.org/gosstoweb. The stand-alone application features a simple and concise command line interface for easy integration into high-throughput data processing pipelines.
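
As an illustration of what a term-based semantic similarity measure computes, the sketch below implements Resnik's measure on a toy ontology: the similarity of two terms is the information content (IC) of their most informative common ancestor, where IC(t) = -log p(t) and p(t) is the fraction of gene products annotated to t or any of its descendants. The abstract does not name the six bundled measures, so Resnik is used here only as a well-known example; the ontology, the annotations and all function names are hypothetical, and the Random Walk Contribution is not covered.

```python
import math

# Hypothetical toy ontology (child -> parents) and gene annotations.
PARENTS = {"root": [], "A": ["root"], "B": ["root"], "C": ["A"], "D": ["A"]}
ANNOTATIONS = {"g1": {"C"}, "g2": {"D"}, "g3": {"B"}, "g4": {"C"}}


def ancestors(term, parents):
    """Return the term itself together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(parents, annotations):
    """IC(t) = -log(fraction of genes annotated to t or one of its descendants)."""
    counts = {term: 0 for term in parents}
    for terms in annotations.values():
        covered = set()
        for term in terms:
            covered |= ancestors(term, parents)  # annotations propagate upward
        for term in covered:
            counts[term] += 1
    n_genes = len(annotations)
    return {t: -math.log(c / n_genes) for t, c in counts.items() if c > 0}


def resnik(t1, t2, parents, ic):
    """IC of the most informative common ancestor of t1 and t2."""
    common = ancestors(t1, parents) & ancestors(t2, parents)
    return max((ic[t] for t in common if t in ic), default=0.0)


ic = information_content(PARENTS, ANNOTATIONS)
sim_siblings = resnik("C", "D", PARENTS, ic)  # share the specific ancestor A
sim_distant = resnik("C", "B", PARENTS, ic)   # share only the root
```

In this toy example the sibling terms C and D are more similar than C and B, because their shared ancestor A is annotated to fewer genes than the root and is therefore more informative.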


Subject(s)
Data Mining/methods , Gene Ontology , Internet , Semantics , Software , Proteins/genetics , Vocabulary, Controlled
19.
Neural Netw ; 43: 84-98, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23500503

ABSTRACT

Given a weighted graph and a partial node labeling, the graph classification problem consists of predicting the labels of all the nodes. In several application domains, from gene to social network analysis, the labeling is unbalanced: for instance, positive labels may be far fewer than negative ones. In this paper we present COSNet (COst Sensitive neural Network), a neural algorithm for predicting node labels in graphs with unbalanced labels. COSNet is based on a 2-parameter family of Hopfield networks and consists of two main steps: (1) the network parameters are learned through a cost-sensitive optimization procedure; (2) a suitable Hopfield network restricted to the unlabeled nodes is considered and simulated. The equilibrium point reached induces the classification of the unlabeled nodes. Restricting the dynamics significantly reduces the time complexity and allows the algorithm to scale well to large networks. An experimental analysis on real-world unbalanced data, in the context of the genome-wide prediction of gene functions, shows the effectiveness of the proposed approach.
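
Step (2) above can be sketched as follows. This is a simplified illustration and not the COSNet implementation: it assumes the two parameters are an angle `alpha` (giving the activation values sin(alpha) for positive and -cos(alpha) for negative neurons) and a threshold `gamma`, takes both as given instead of learning them as in step (1), and starts the unlabeled neurons at 0. The function name and the toy graph are hypothetical.

```python
import math


def restricted_hopfield(W, labels, alpha, gamma, max_sweeps=100):
    """Run asynchronous Hopfield dynamics on the unlabeled nodes only.

    W      : symmetric weight matrix with zero diagonal (list of lists)
    labels : +1 / -1 for labeled nodes, None for unlabeled nodes
    alpha  : angle in (0, pi/2) defining the two activation values (assumed)
    gamma  : neuron threshold (assumed to be given, not learned here)
    """
    pos, neg = math.sin(alpha), -math.cos(alpha)
    value = {1: pos, -1: neg}
    n = len(W)
    labeled = [i for i in range(n) if labels[i] is not None]
    unlabeled = [i for i in range(n) if labels[i] is None]
    # Labeled nodes never change, so their input is folded into a
    # per-node threshold and they drop out of the simulated network.
    theta = {i: gamma - sum(W[i][k] * value[labels[k]] for k in labeled)
             for i in unlabeled}
    state = {i: 0.0 for i in unlabeled}
    for _ in range(max_sweeps):
        changed = False
        for i in unlabeled:  # asynchronous update, one neuron at a time
            net = sum(W[i][j] * state[j] for j in unlabeled if j != i) - theta[i]
            new = pos if net > 0 else neg
            if new != state[i]:
                state[i] = new
                changed = True
        if not changed:  # equilibrium point reached
            break
    return {i: (1 if state[i] == pos else -1) for i in unlabeled}


# Toy path graph 0 - 1 - 2 - 3 - 4; node 0 is positive, node 4 is negative.
W = [
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.2, 0.0],
    [0.0, 0.0, 0.2, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
]
labels = [1, None, None, None, -1]
predicted = restricted_hopfield(W, labels, alpha=math.pi / 4, gamma=0.0)
```

Only the unlabeled neurons are updated, so each sweep costs time proportional to the edges among unlabeled nodes, which is the restriction the abstract credits for the reduced time complexity.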


Subject(s)
Algorithms , Learning/physiology , Neural Networks, Computer , Artificial Intelligence , Statistics as Topic
20.
Article in English | MEDLINE | ID: mdl-21778526

ABSTRACT

Gene selection methods aim to determine biologically relevant subsets of genes in DNA microarray experiments. However, assessing and validating them is a major difficulty, since the subset of biologically relevant genes is usually unknown. To address this problem, a novel procedure for generating biologically plausible synthetic gene expression data is proposed. It is based on a mathematical model that represents gene expression signatures and expression profiles through Boolean threshold functions. The results show that the proposed procedure can be successfully adopted to analyze the quality of statistical and machine learning-based gene selection algorithms.
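
A Boolean threshold function outputs 1 when the weighted sum of its binary inputs reaches a threshold and 0 otherwise. The sketch below shows how such a function could label a synthetic sample from the on/off state of a known set of signature genes, so that a gene selection method can later be checked against that known set. The abstract does not describe the generator, so everything beyond the threshold function itself is a hypothetical illustration: the function names, the Gaussian background and the `shift` parameter are all assumptions.

```python
import random


def boolean_threshold(bits, weights, theta):
    """Return 1 if the weighted sum of the binary inputs reaches theta, else 0."""
    return int(sum(w * b for w, b in zip(weights, bits)) >= theta)


def make_sample(n_genes, signature, on_bits, theta, rng, shift=3.0):
    """Draw one synthetic expression vector and its class label (hypothetical).

    signature : indices of the genes that form the expression signature
    on_bits   : 0/1 flag per signature gene, 1 meaning the gene is switched on
    theta     : minimum number of active signature genes for a positive label
    """
    # Background expression: standard normal noise for every gene.
    expression = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
    # Active signature genes are shifted upward.
    for gene, bit in zip(signature, on_bits):
        if bit:
            expression[gene] += shift
    label = boolean_threshold(on_bits, [1] * len(signature), theta)
    return expression, label


rng = random.Random(0)  # fixed seed so the sketch is reproducible
signature = [2, 5, 7]   # the known "relevant" genes in this toy example
positive_sample, positive_label = make_sample(10, signature, [1, 1, 0], 2, rng)
negative_sample, negative_label = make_sample(10, signature, [1, 0, 0], 2, rng)
```

Because the signature is fixed by construction, the genes returned by a selection algorithm can be compared directly with `signature`, which is not possible with real microarray data.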


Subject(s)
Algorithms , Computational Biology/methods , Gene Expression Profiling/standards , Models, Genetic , Computer Simulation , Databases, Factual , Humans , Neoplasms/genetics , Neoplasms/metabolism , Oligonucleotide Array Sequence Analysis , Reproducibility of Results
...