Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 14 de 14
Filtrar
Más filtros










Base de datos
Intervalo de año de publicación
1.
PLoS One ; 19(7): e0305654, 2024.
Artículo en Inglés | MEDLINE | ID: mdl-39024199

RESUMEN

Even though deep learning shows impressive results in several applications, its use on problems with High Dimensions and Low Sample Size, such as diagnosing rare diseases, leads to overfitting. One solution often proposed is feature selection. In deep learning, along with feature selection, network sparsification is also used to improve the results when dealing with high dimensions low sample size data. However, most of the time, they are tackled as separate problems. This paper proposes a new approach that integrates feature selection, based on sparsification, into the training process of a deep neural network. This approach uses a constrained biobjective gradient descent method. It provides a set of Pareto optimal neural networks that make a trade-off between network sparsity and model accuracy. Results on both artificial and real datasets show that using a constrained biobjective gradient descent increases the network sparsity without degrading the classification performances. With the proposed approach, on an artificial dataset, the feature selection score reached 0.97 with a sparsity score of 0.92 with an accuracy of 0.9. For the same accuracy, none of the other methods reached a feature score above 0.20 and sparsity score of 0.35. Finally, statistical tests validate the results obtained on all datasets.


Asunto(s)
Aprendizaje Profundo , Redes Neurales de la Computación , Tamaño de la Muestra , Humanos , Algoritmos
2.
Bioinformatics ; 39(39 Suppl 1): i94-i102, 2023 06 30.
Artículo en Inglés | MEDLINE | ID: mdl-37387182

RESUMEN

MOTIVATION: The increasing availability of high-throughput omics data allows for considering a new medicine centered on individual patients. Precision medicine relies on exploiting these high-throughput data with machine-learning models, especially the ones based on deep-learning approaches, to improve diagnosis. Due to the high-dimensional small-sample nature of omics data, current deep-learning models end up with many parameters and have to be fitted with a limited training set. Furthermore, interactions between molecular entities inside an omics profile are not patient specific but are the same for all patients. RESULTS: In this article, we propose AttOmics, a new deep-learning architecture based on the self-attention mechanism. First, we decompose each omics profile into a set of groups, where each group contains related features. Then, by applying the self-attention mechanism to the set of groups, we can capture the different interactions specific to a patient. The results of different experiments carried out in this article show that our model can accurately predict the phenotype of a patient with fewer parameters than deep neural networks. Visualizing the attention maps can provide new insights into the essential groups for a particular phenotype. AVAILABILITY AND IMPLEMENTATION: The code and data are available at https://forge.ibisc.univ-evry.fr/abeaude/AttOmics. TCGA data can be downloaded from the Genomic Data Commons Data Portal.


Asunto(s)
Aprendizaje Automático , Redes Neurales de la Computación , Fenotipo , Medicina de Precisión
3.
PLoS One ; 18(5): e0286137, 2023.
Artículo en Inglés | MEDLINE | ID: mdl-37228138

RESUMEN

In the sea of data generated daily, unlabeled samples greatly outnumber labeled ones. This is due to the fact that, in many application areas, labels are scarce or hard to obtain. In addition, unlabeled samples might belong to new classes that are not available in the label set associated with data. In this context, we propose A3SOM, an abstained explainable semi-supervised neural network that associates a self-organizing map to dense layers in order to classify samples. Abstained classification enables the detection of new classes and class overlaps. The use of a self-organizing map in A3SOM allows integrated visualization and makes the model explainable. Along with describing our approach, this paper shows that the method is competitive with other classifiers and demonstrates the benefits of including abstention rules. A use case is presented on breast cancer subtype classification and discovery to show the relevance of our method in real-world medical problems.


Asunto(s)
Algoritmos , Redes Neurales de la Computación , Aprendizaje Automático Supervisado
4.
Nucleic Acids Res ; 51(W1): W281-W288, 2023 07 05.
Artículo en Inglés | MEDLINE | ID: mdl-37158254

RESUMEN

Recent advances have shown that some biologically active non-coding RNAs (ncRNAs) are actually translated into polypeptides that have a physiological function as well. This paradigm shift requires adapted computational methods to predict this new class of 'bifunctional RNAs'. Previously, we developed IRSOM, an open-source algorithm to classify non-coding and coding RNAs. Here, we use the binary statistical model of IRSOM as a ternary classifier, called IRSOM2, to identify bifunctional RNAs as a rejection of the two other classes. We present its easy-to-use web interface, which allows users to perform predictions on large datasets of RNA sequences in a short time, to re-train the model with their own data, and to visualize and analyze the classification results thanks to the implementation of self-organizing maps (SOM). We also propose a new benchmark of experimentally validated RNAs that play both protein-coding and non-coding roles, in different organisms. Thus, IRSOM2 showed promising performance in detecting these bifunctional transcripts among ncRNAs of different types, such as circRNAs and lncRNAs (in particular those of shorter lengths). The web server is freely available on the EvryRNA platform: https://evryrna.ibisc.univ-evry.fr.


Asunto(s)
Algoritmos , Biología Computacional , Simulación por Computador , ARN , ARN Largo no Codificante/química , Análisis de Secuencia de ARN/métodos , Biología Computacional/instrumentación , Biología Computacional/métodos , ARN/química , ARN/clasificación , Internet
5.
BMC Bioinformatics ; 23(1): 262, 2022 Jul 03.
Artículo en Inglés | MEDLINE | ID: mdl-35786378

RESUMEN

BACKGROUND: Machine learning is now a standard tool for cancer prediction based on gene expression data. However, deep learning is still new for this task, and there is no clear consensus about its performance and utility. Few experimental works have evaluated deep neural networks and compared them with state-of-the-art machine learning. Moreover, their conclusions are not consistent. RESULTS: We extensively evaluate the deep learning approach on 22 cancer prediction tasks based on gene expression data. We measure the impact of the main hyper-parameters and compare the performances of neural networks with the state-of-the-art. We also investigate the effectiveness of several transfer learning schemes in different experimental setups. CONCLUSION: Based on our experimentations, we provide several recommendations to optimize the construction and training of a neural network model. We show that neural networks outperform the state-of-the-art methods only for very large training set size. For a small training set, we show that transfer learning is possible and may strongly improve the model performance in some cases.


Asunto(s)
Aprendizaje Profundo , Neoplasias , Expresión Génica , Humanos , Aprendizaje Automático , Neoplasias/genética , Redes Neurales de la Computación
6.
Bioinformatics ; 38(9): 2504-2511, 2022 04 28.
Artículo en Inglés | MEDLINE | ID: mdl-35266505

RESUMEN

MOTIVATION: Medical care is becoming more and more specific to patients' needs due to the increased availability of omics data. The application to these data of sophisticated machine learning models, in particular deep learning (DL), can improve the field of precision medicine. However, their use in clinics is limited as their predictions are not accompanied by an explanation. The production of accurate and intelligible predictions can benefit from the inclusion of domain knowledge. Therefore, knowledge-based DL models appear to be a promising solution. RESULTS: In this article, we propose GraphGONet, where the Gene Ontology is encapsulated in the hidden layers of a new self-explaining neural network. Each neuron in the layers represents a biological concept, combining the gene expression profile of a patient and the information from its neighboring neurons. The experiments described in the article confirm that our model not only performs as accurately as the state-of-the-art (non-explainable ones) but also automatically produces stable and intelligible explanations composed of the biological concepts with the highest contribution. This feature allows experts to use our tool in a medical setting. AVAILABILITY AND IMPLEMENTATION: GraphGONet is freely available at https://forge.ibisc.univ-evry.fr/vbourgeais/GraphGONet.git. The microarray dataset is accessible from the ArrayExpress database under the identifier E-MTAB-3732. The TCGA datasets can be downloaded from the Genomic Data Commons (GDC) data portal. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Asunto(s)
Aprendizaje Automático , Redes Neurales de la Computación , Ontología de Genes , Fenotipo , Expresión Génica
7.
BMC Bioinformatics ; 22(Suppl 10): 455, 2021 Sep 22.
Artículo en Inglés | MEDLINE | ID: mdl-34551707

RESUMEN

BACKGROUND: With the rapid advancement of genomic sequencing techniques, massive production of gene expression data is becoming possible, which prompts the development of precision medicine. Deep learning is a promising approach for phenotype prediction (clinical diagnosis, prognosis, and drug response) based on gene expression profile. Existing deep learning models are usually considered as black-boxes that provide accurate predictions but are not interpretable. However, accuracy and interpretation are both essential for precision medicine. In addition, most models do not integrate the knowledge of the domain. Hence, making deep learning models interpretable for medical applications using prior biological knowledge is the main focus of this paper. RESULTS: In this paper, we propose a new self-explainable deep learning model, called Deep GONet, integrating the Gene Ontology into the hierarchical architecture of the neural network. This model is based on a fully-connected architecture constrained by the Gene Ontology annotations, such that each neuron represents a biological function. The experiments on cancer diagnosis datasets demonstrate that Deep GONet is both easily interpretable and highly performant to discriminate cancer and non-cancer samples. CONCLUSIONS: Our model provides an explanation to its predictions by identifying the most important neurons and associating them with biological functions, making the model understandable for biologists and physicians.


Asunto(s)
Neoplasias , Redes Neurales de la Computación , Expresión Génica , Ontología de Genes , Humanos , Fenotipo
8.
BMC Bioinformatics ; 21(1): 501, 2020 Nov 04.
Artículo en Inglés | MEDLINE | ID: mdl-33148191

RESUMEN

BACKGROUND: The use of predictive gene signatures to assist clinical decision is becoming more and more important. Deep learning has a huge potential in the prediction of phenotype from gene expression profiles. However, neural networks are viewed as black boxes, where accurate predictions are provided without any explanation. The requirements for these models to become interpretable are increasing, especially in the medical field. RESULTS: We focus on explaining the predictions of a deep neural network model built from gene expression data. The most important neurons and genes influencing the predictions are identified and linked to biological knowledge. Our experiments on cancer prediction show that: (1) deep learning approach outperforms classical machine learning methods on large training sets; (2) our approach produces interpretations more coherent with biology than the state-of-the-art based approaches; (3) we can provide a comprehensive explanation of the predictions for biologists and physicians. CONCLUSION: We propose an original approach for biological interpretation of deep learning models for phenotype prediction from gene expression data. Since the model can find relationships between the phenotype and gene expression, we may assume that there is a link between the identified genes and the phenotype. The interpretation can, therefore, lead to new biological hypotheses to be investigated by biologists.


Asunto(s)
Regulación Neoplásica de la Expresión Génica , Neoplasias/patología , Redes Neurales de la Computación , Bases de Datos Genéticas , Humanos , Neoplasias/metabolismo
9.
Bioinformatics ; 34(17): i620-i628, 2018 09 01.
Artículo en Inglés | MEDLINE | ID: mdl-30423081

RESUMEN

Motivation: Non-coding RNAs (ncRNAs) play important roles in many biological processes and are involved in many diseases. Their identification is an important task, and many tools exist in the literature for this purpose. However, almost all of them are focused on the discrimination of coding and ncRNAs without giving more biological insight. In this paper, we propose a new reliable method called IRSOM, based on a supervised Self-Organizing Map (SOM) with a rejection option, that overcomes these limitations. The rejection option in IRSOM improves the accuracy of the method and also allows identifing the ambiguous transcripts. Furthermore, with the visualization of the SOM, we analyze the rejected predictions and highlight the ambiguity of the transcripts. Results: IRSOM was tested on datasets of several species from different reigns, and shown better results compared to state-of-art. The accuracy of IRSOM is always greater than 0.95 for all the species with an average specificity of 0.98 and an average sensitivity of 0.99. Besides, IRSOM is fast (it takes around 254 s to analyze a dataset of 147 000 transcripts) and is able to handle very large datasets. Availability and implementation: IRSOM is implemented in Python and C++. It is available on our software platform EvryRNA (http://EvryRNA.ibisc.univ-evry.fr).


Asunto(s)
Algoritmos , ARN no Traducido/genética , Programas Informáticos
10.
PLoS One ; 12(6): e0179787, 2017.
Artículo en Inglés | MEDLINE | ID: mdl-28622364

RESUMEN

Many computational tools have been proposed during the two last decades for predicting piRNAs, which are molecules with important role in post-transcriptional gene regulation. However, these tools are mostly based on only one feature that is generally related to the sequence. Discoveries in the domain of piRNAs are still in their beginning stages, and recent publications have shown many new properties. Here, we propose an integrative approach for piRNA prediction in which several types of genomic and epigenomic properties that can be used to characterize these molecules are examined. We reviewed and extracted a large number of piRNA features from the literature that have been observed experimentally in several species. These features are represented by different kernels, in a Multiple Kernel Learning based approach, implemented within an object-oriented framework. The obtained tool, called IpiRId, shows prediction results that attain more than 90% of accuracy on different tested species (human, mouse and fly), outperforming all existing tools. Besides, our method makes it possible to study the validity of each given feature in a given species. Finally, the developed tool is modular and easily extensible, and can be adapted for predicting other types of ncRNAs. The IpiRId software and the user-friendly web-based server of our tool are now freely available to academic users at: https://evryrna.ibisc.univ-evry.fr/evryrna/.


Asunto(s)
Bases de Datos de Ácidos Nucleicos , Epigenómica , ARN Interferente Pequeño/genética , Análisis de Secuencia de ARN/métodos , Animales , Ratones
11.
RNA ; 21(5): 775-85, 2015 May.
Artículo en Inglés | MEDLINE | ID: mdl-25795417

RESUMEN

Identification of microRNAs (miRNAs) is an important step toward understanding post-transcriptional gene regulation and miRNA-related pathology. Difficulties in identifying miRNAs through experimental techniques combined with the huge amount of data from new sequencing technologies have made in silico discrimination of bona fide miRNA precursors from non-miRNA hairpin-like structures an important topic in bioinformatics. Among various techniques developed for this classification problem, machine learning approaches have proved to be the most promising. However these approaches require the use of training data, which is problematic due to an imbalance in the number of miRNAs (positive data) and non-miRNAs (negative data), which leads to a degradation of their performance. In order to address this issue, we present an ensemble method that uses a boosting technique with support vector machine components to deal with imbalanced training data. Classification is performed following a feature selection on 187 novel and existing features. The algorithm, miRBoost, performed better in comparison with state-of-the-art methods on imbalanced human and cross-species data. It also showed the highest ability among the tested methods for discovering novel miRNA precursors. In addition, miRBoost was over 1400 times faster than the second most accurate tool tested and was significantly faster than most of the other tools. miRBoost thus provides a good compromise between prediction efficiency and execution time, making it highly suitable for use in genome-wide miRNA precursor prediction. The software miRBoost is available on our web server http://EvryRNA.ibisc.univ-evry.fr.


Asunto(s)
Biología Computacional/métodos , MicroARNs/clasificación , Precursores del ARN/clasificación , Programas Informáticos , Máquina de Vectores de Soporte , Animales , Bases de Datos Genéticas , Humanos , Almacenamiento y Recuperación de la Información/métodos , MicroARNs/genética , Precursores del ARN/genética , Sensibilidad y Especificidad , Alineación de Secuencia/métodos
12.
Bioinformatics ; 31(5): 728-35, 2015 Mar 01.
Artículo en Inglés | MEDLINE | ID: mdl-25355790

RESUMEN

MOTIVATION: Identifying the set of genes differentially expressed along time is an important task in two-sample time course experiments. Furthermore, estimating at which time periods the differential expression is present can provide additional insight into temporal gene functions. The current differential detection methods are designed to detect difference along observation time intervals or on single measurement points, warranting dense measurements along time to characterize the full temporal differential expression patterns. RESULTS: We propose a novel Bayesian likelihood ratio test to estimate the differential expression time periods. Applying the ratio test to systems of genes provides the temporal response timings and durations of gene expression to a biological condition. We introduce a novel non-stationary Gaussian process as the underlying expression model, with major improvements on model fitness on perturbation and stress experiments. The method is robust to uneven or sparse measurements along time. We assess the performance of the method on realistically simulated dataset and compare against state-of-the-art methods. We additionally apply the method to the analysis of primary human endothelial cells under an ionizing radiation stress to study the transcriptional perturbations over 283 measured genes in an attempt to better understand the role of endothelium in both normal and cancer tissues during radiotherapy. As a result, using the cascade of differential expression periods, domain literature and gene enrichment analysis, we gain insights into the dynamic response of endothelial cells to irradiation. AVAILABILITY AND IMPLEMENTATION: R package 'nsgp' is available at www.ibisc.fr/en/logiciels_arobas.


Asunto(s)
Perfilación de la Expresión Génica/métodos , Regulación de la Expresión Génica , Neoplasias/genética , Análisis de Secuencia por Matrices de Oligonucleótidos/métodos , Radioterapia , Teorema de Bayes , Células Cultivadas , Relación Dosis-Respuesta en la Radiación , Células Endoteliales de la Vena Umbilical Humana/metabolismo , Células Endoteliales de la Vena Umbilical Humana/efectos de la radiación , Humanos , Neoplasias/radioterapia , Distribución Normal , Factores de Tiempo
13.
Bioinformatics ; 30(17): i364-70, 2014 Sep 01.
Artículo en Inglés | MEDLINE | ID: mdl-25161221

RESUMEN

MOTIVATION: Piwi-interacting RNA (piRNA) is the most recently discovered and the least investigated class of Argonaute/Piwi protein-interacting small non-coding RNAs. The piRNAs are mostly known to be involved in protecting the genome from invasive transposable elements. But recent discoveries suggest their involvement in the pathophysiology of diseases, such as cancer. Their identification is therefore an important task, and computational methods are needed. However, the lack of conserved piRNA sequences and structural elements makes this identification challenging and difficult. RESULTS: In the present study, we propose a new modular and extensible machine learning method based on multiple kernels and a support vector machine (SVM) classifier for piRNA identification. Very few piRNA features are known to date. The use of a multiple kernels approach allows editing, adding or removing piRNA features that can be heterogeneous in a modular manner according to their relevance in a given species. Our algorithm is based on a combination of the previously identified features [sequence features (k-mer motifs and a uridine at the first position) and piRNAs cluster feature] and a new telomere/centromere vicinity feature. These features are heterogeneous, and the kernels allow to unify their representation. The proposed algorithm, named piRPred, gives promising results on Drosophila and Human data and outscores previously published piRNA identification algorithms. AVAILABILITY AND IMPLEMENTATION: piRPred is freely available to non-commercial users on our Web server EvryRNA http://EvryRNA.ibisc.univ-evry.fr.


Asunto(s)
Algoritmos , Inteligencia Artificial , ARN Interferente Pequeño/química , Análisis de Secuencia de ARN/métodos , Máquina de Vectores de Soporte , Animales , Drosophila/genética , Humanos , Programas Informáticos
14.
Artículo en Inglés | MEDLINE | ID: mdl-18003550

RESUMEN

We present progress on a comprehensive, modular, interactive modeling environment centered on overall regulation of blood pressure and body fluid homeostasis. We call the project SAPHIR, for "a Systems Approach for PHysiological Integration of Renal, cardiac, and respiratory functions". The project uses state-of-the-art multi-scale simulation methods. The basic core model will give succinct input-output (reduced-dimension) descriptions of all relevant organ systems and regulatory processes, and it will be modular, multi-resolution, and extensible, in the sense that detailed submodules of any process(es) can be "plugged-in" to the basic model in order to explore, eg. system-level implications of local perturbations. The goal is to keep the basic core model compact enough to insure fast execution time (in view of eventual use in the clinic) and yet to allow elaborate detailed modules of target tissues or organs in order to focus on the problem area while maintaining the system-level regulatory compensations.


Asunto(s)
Presión Sanguínea/fisiología , Líquidos Corporales/fisiología , Modelos Biológicos , Animales , Fenómenos Fisiológicos Cardiovasculares , Homeostasis , Humanos , Riñón/fisiología , Fenómenos Fisiológicos Respiratorios
SELECCIÓN DE REFERENCIAS
DETALLE DE LA BÚSQUEDA