Results 1 - 14 of 14
1.
Sci Rep ; 14(1): 18105, 2024 08 05.
Article in English | MEDLINE | ID: mdl-39103384

ABSTRACT

In complex systems, it's crucial to uncover latent mechanisms and their context-dependent relationships. This is especially true in medical research, where identifying unknown cancer mechanisms and their impact on phenomena like drug resistance is vital. Directly observing these mechanisms is challenging due to measurement complexities, leading to an approach that infers latent mechanisms from observed variable distributions. Despite machine learning advancements enabling sophisticated generative models, their black-box nature complicates the interpretation of complex latent mechanisms. A promising method for understanding these mechanisms involves estimating latent factors through linear projection, though there's no assurance that inferences made under specific conditions will remain valid across contexts. We propose a novel solution, suggesting data, even from systems appearing complex, can often be explained by sparse dependencies among a few common latent factors, regardless of the situation. This simplification allows for modeling that yields significant insights across diverse fields. We demonstrate this with datasets from finance, where we capture societal trends from stock price movements, and medicine, where we uncover new insights into cancer drug resistance through gene expression analysis.


Subject(s)
Neoplasms , Humans , Neoplasms/genetics , Neoplasms/metabolism , Machine Learning , Drug Resistance, Neoplasm
2.
Sci Rep ; 12(1): 8206, 2022 05 17.
Article in English | MEDLINE | ID: mdl-35581358

ABSTRACT

Predicting the chemical properties of compounds is crucial in discovering novel materials and drugs with specific desired characteristics. Recent significant advances in machine learning technologies have enabled automatic predictive modeling from past experimental data reported in the literature. However, these datasets are often biased for various reasons, such as experimental plans and publication decisions, and prediction models trained on such biased datasets often over-fit to the biased distributions and perform poorly in subsequent use. Hence, this study focused on mitigating bias in experimental datasets. We adopted two techniques from causal inference and combined them with graph neural networks that can represent molecular structures. The experimental results in four possible bias scenarios indicated that the inverse propensity scoring-based method and the counterfactual regression-based method made solid improvements.
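
The inverse propensity scoring idea named in this abstract can be illustrated with a weighted loss, in which each training sample is weighted by the inverse of its estimated probability of appearing in the dataset. The following is a minimal sketch under stated assumptions, not the authors' formulation: the function name `ips_weighted_mse`, the clipping threshold, and the self-normalised form are all hypothetical choices made for illustration.

```python
def ips_weighted_mse(y_true, y_pred, propensity, clip=0.05):
    """Squared error averaged with inverse-propensity weights (illustrative only).

    `propensity[i]` is an assumed estimate of the probability that sample i
    was included in the (biased) dataset.
    """
    # Clip small propensities so that a few rare samples do not dominate the loss.
    weights = [1.0 / min(max(p, clip), 1.0) for p in propensity]
    squared_errors = [(t - q) ** 2 for t, q in zip(y_true, y_pred)]
    # Self-normalised estimate: divide by the sum of the weights.
    return sum(w * e for w, e in zip(weights, squared_errors)) / sum(weights)
```

With equal propensities the function reduces to the ordinary mean squared error; samples with low propensity (under-represented in the data) count more when the propensities differ.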


Subject(s)
Machine Learning , Bias , Causality
3.
Sci Rep ; 11(1): 23648, 2021 12 08.
Article in English | MEDLINE | ID: mdl-34880365

ABSTRACT

Recently, research has been conducted to automatically control anesthesia using machine learning, with the aim of alleviating the shortage of anesthesiologists. In this study, we address the problem of predicting decisions made by anesthesiologists during surgery using machine learning; specifically, we formulate the decision of whether to increase the flow rate at each time point in the continuous administration of the analgesic remifentanil as a supervised binary classification problem. Experiments were conducted to evaluate the prediction performance of six machine learning models (logistic regression, support vector machine, random forest, LightGBM, artificial neural network, and long short-term memory (LSTM)) using data from 210 cases collected during actual surgeries. The results demonstrated that, when predicting an increase in the flow rate of remifentanil 1 min into the future, the LSTM model achieved scores of 0.659 for sensitivity, 0.732 for specificity, and 0.753 for ROC-AUC; this demonstrates the potential to predict the decisions made by anesthesiologists using machine learning. Furthermore, we examined the importance and contribution of the features of each model using Shapley additive explanations, a method for interpreting predictions made by machine learning models. The trends indicated by the results were partially consistent with known clinical findings.
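
For readers unfamiliar with the reported metrics, sensitivity and specificity for a binary classifier can be computed from the confusion counts as sketched below. The function name and the toy labels are hypothetical, and the sketch assumes that both classes occur at least once in `y_true`; it is not taken from the study's code.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded as 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # increases correctly predicted
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # increases that were missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-increases correctly predicted
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    # Assumes at least one positive and one negative example are present.
    return tp / (tp + fn), tn / (tn + fp)
```

Here label 1 would stand for "the flow rate is increased" and 0 for "no increase", which is one natural coding of the binary problem described in the abstract.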


Subject(s)
Anesthetics/administration & dosage , Machine Learning , Anesthesiologists/psychology , Decision Making , Humans
4.
BMC Bioinformatics ; 21(Suppl 3): 94, 2020 Apr 23.
Article in English | MEDLINE | ID: mdl-32321421

ABSTRACT

BACKGROUND: Predicting chemical networks is one of the fundamental tasks in bioinformatics and chemoinformatics, because it contributes to various applications in metabolic engineering and drug discovery. The recent rapid growth in the amount of available data has enabled the application of computational approaches such as statistical modeling and machine learning methods. Both a set of chemical interactions and chemical compound structures are represented as graphs, and various graph-based approaches, including graph convolutional neural networks, have been successfully applied to chemical network prediction. However, there has been no efficient method that can consider the two different types of graphs in an end-to-end manner. RESULTS: We give a new formulation of the chemical network prediction problem as a link prediction problem in a graph of graphs (GoG), which can represent the hierarchical structure consisting of compound graphs and an inter-compound graph. We propose a new graph convolutional neural network architecture, called the dual graph convolutional network, that learns compound representations from both the compound graphs and the inter-compound network in an end-to-end manner. CONCLUSIONS: Experiments using four chemical networks with different sparsity levels and degree distributions show that our dual graph convolution approach achieves high prediction performance in relatively dense networks, while its performance is inferior on extremely sparse networks.


Subject(s)
Computational Biology/methods , Computer Graphics , Models, Chemical , Drug Discovery
5.
Genes Genet Syst ; 95(1): 43-50, 2020 Apr 22.
Article in English | MEDLINE | ID: mdl-32213716

ABSTRACT

Recently, the prospect of applying machine learning tools for automating the process of annotation analysis of large-scale sequences from next-generation sequencers has raised the interest of researchers. However, finding research collaborators with knowledge of machine learning techniques is difficult for many experimental life scientists. One solution to this problem is to utilise the power of crowdsourcing. In this report, we describe how we investigated the potential of crowdsourced modelling for a life science task by conducting a machine learning competition, the DNA Data Bank of Japan (DDBJ) Data Analysis Challenge. In the challenge, participants predicted chromatin feature annotations from DNA sequences with competing models. The challenge engaged 38 participants, with a cumulative total of 360 model submissions. The performance of the top model resulted in an area under the curve (AUC) score of 0.95. Over the course of the competition, the overall performance of the submitted models improved by an AUC score of 0.30 from the first submitted model. Furthermore, the 1st- and 2nd-ranking models utilised external data such as genomic location and gene annotation information with specific domain knowledge. The effect of incorporating this domain knowledge led to improvements of approximately 5%-9%, as measured by the AUC scores. This report suggests that machine learning competitions will lead to the development of highly accurate machine learning models for use by experimental scientists unfamiliar with the complexities of data science.


Subject(s)
Arabidopsis/genetics , Chromatin/genetics , Databases, Nucleic Acid , Genome, Plant/genetics , Machine Learning , Computational Biology , Crowdsourcing , Data Analysis , High-Throughput Nucleotide Sequencing , Japan , Molecular Sequence Annotation
6.
J Mol Graph Model ; 80: 217-223, 2018 03.
Article in English | MEDLINE | ID: mdl-29414041

ABSTRACT

Synthetic accessibility evaluation is the process of assessing the ease of synthesis of compounds. A rapid method for assessing the synthetic accessibility of a vast number of chemical compounds is expected to bring about a breakthrough in drug discovery. Although several computational methods have been proposed, compounds are still evaluated by medicinal chemists, and the low throughput of human evaluation caused by the shortage of chemists is a critical issue when handling a large number of compounds. We propose the use of crowdsourcing to address this problem, and we conducted experiments to investigate the feasibility of incorporating semi-experts and a statistical aggregation method into synthetic accessibility evaluation. Our experimental results show that accurate synthetic accessibility scores can be obtained through the statistical aggregation of judgments from semi-experts.


Subject(s)
Drug Design , Models, Chemical , Algorithms , Humans
7.
Sci Rep ; 5: 8953, 2015 May 20.
Article in English | MEDLINE | ID: mdl-25989741

ABSTRACT

Well-trained clinicians may be able to provide diagnosis and prognosis from very short biomarker series using information and experience gained from previous patients. Although mathematical methods can potentially help clinicians to predict the progression of diseases, there is no method so far that estimates the patient state from very short time-series of a biomarker for making diagnosis and/or prognosis by employing the information of previous patients. Here, we propose a mathematical framework for integrating other patients' datasets to infer and predict the state of the disease in the current patient based on their short history. We extend a machine-learning framework of "prediction with expert advice" to deal with unstable dynamics. We construct this mathematical framework by combining expert advice with a mathematical model of prostate cancer. Our model predicted well the individual biomarker series of patients with prostate cancer that are used as clinical samples.
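
The "prediction with expert advice" framework that this abstract extends is commonly introduced through the exponentially weighted average forecaster. The sketch below shows only that textbook baseline, not the authors' extension to unstable dynamics or their prostate cancer model; the function name, the learning rate `eta`, and the squared-error loss are assumptions made for illustration.

```python
import math

def exponential_weights_forecast(expert_predictions, outcomes, eta=0.5):
    """Combine expert predictions step by step with exponential weights.

    expert_predictions[t][i] is the prediction of expert i at step t;
    outcomes[t] is the value observed after the forecast for step t is made.
    """
    n_experts = len(expert_predictions[0])
    weights = [1.0] * n_experts
    forecasts = []
    for preds, observed in zip(expert_predictions, outcomes):
        total = sum(weights)
        # Forecast is the weighted average of the experts' current predictions.
        forecasts.append(sum(w * p for w, p in zip(weights, preds)) / total)
        # Down-weight each expert according to its squared error at this step.
        weights = [w * math.exp(-eta * (p - observed) ** 2)
                   for w, p in zip(weights, preds)]
    return forecasts, weights
```

In the setting of the abstract, each "expert" could loosely correspond to a prediction derived from a previous patient's data, so that experts whose predictions match the current patient's short biomarker history gain influence over time.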


Subject(s)
Algorithms , Biomarkers , Disease Progression , Models, Theoretical , Humans
8.
J Med Internet Res ; 17(1): e2, 2015 Jan 28.
Article in English | MEDLINE | ID: mdl-25630348

ABSTRACT

BACKGROUND: The prevalence of non-communicable diseases is increasing throughout the world, including developing countries. OBJECTIVE: The intent was to conduct a study of a preventive medical service in a developing country, combining eHealth checkups and teleconsultation, and to assess stratification rules and the short-term effects of intervention. METHODS: We developed an eHealth system that comprises a set of sensor devices in an attaché case, a data transmission system linked to a mobile network, and a data management application. We provided eHealth checkups for the populations of five villages and the employees of five factories/offices in Bangladesh. Individual health condition was automatically categorized into four grades based on international diagnostic standards: green (healthy), yellow (caution), orange (affected), and red (emergent). We provided teleconsultation for orange- and red-grade subjects, and teleprescription for these subjects as required. RESULTS: The first checkup was provided to 16,741 subjects. After one year, 2361 subjects participated in the second checkup, and the systolic blood pressure of these subjects had decreased significantly from an average of 121 mmHg to an average of 116 mmHg (P<.001). Based on these results, we propose a cost-effective method that applies a machine learning technique (the random forest method) to the medical interview, subject profiles, and checkup results as predictors, avoiding costly measurements of blood sugar and helping to ensure the sustainability of the program in developing countries. CONCLUSIONS: The results of this study demonstrate the benefits of an eHealth checkup and teleconsultation program as an effective health care system in developing countries.


Subject(s)
Chronic Disease/prevention & control , Developing Countries , Preventive Medicine/methods , Remote Consultation , Adolescent , Adult , Aged , Aged, 80 and over , Child , Delivery of Health Care , Electronic Prescribing , Female , Humans , Male , Middle Aged , Remote Consultation/instrumentation , Risk Factors , Telemedicine , Young Adult
9.
J Mol Graph Model ; 29(3): 492-7, 2010 Nov.
Article in English | MEDLINE | ID: mdl-20965757

ABSTRACT

Accurate prediction of protein-ligand binding affinities for lead optimization in drug discovery remains an important and challenging problem for the scoring functions used in docking simulation. In this paper, we propose a data-driven approach that integrates multiple scoring functions to predict protein-ligand binding affinity directly. We then propose a new method called multiple instance regression based scoring (MIRS) that incorporates unbound ligand conformations using multiple scoring functions. We evaluated the predictive performance of MIRS using 100 protein-ligand complexes and their binding affinities. The experimental results showed that MIRS outperformed 11 conventional scoring functions: LigScore, PLP, AutoDock, G-Score, D-Score, LUDI, F-Score, ChemScore, X-Score, PMF, and DrugScore. In addition, we confirmed that MIRS performed well on binding pose prediction. Our results reveal that it is indispensable to incorporate unbound ligand conformations in both binding affinity prediction and binding pose prediction. The proposed method will accelerate efficient lead optimization in structure-based drug design and provide a new direction for the design of new scoring functions.


Subject(s)
Computer Simulation , Ligands , Protein Binding , Computational Biology/methods , Drug Discovery , Models, Molecular , Molecular Conformation , Molecular Structure , Regression Analysis , Thermodynamics
10.
BMC Bioinformatics ; 11: 350, 2010 Jun 28.
Article in English | MEDLINE | ID: mdl-20584269

ABSTRACT

BACKGROUND: High-throughput methods for detecting protein-protein interactions enable us to obtain large interaction networks, and also allow us to computationally identify the associations of proteins as protein complexes. Although there are methods to extract protein complexes as sets of proteins from interaction networks, the extracted complexes may include false positives because these methods do not account for the structural limitations of the proteins and thus do not check that the proteins in an extracted complex can simultaneously bind to each other. In addition, there have been few attempts to gain deeper insights into protein complexes, such as the topology of the protein-protein interactions (PPIs) or the domain-domain interactions (DDIs) that mediate the protein interactions. RESULTS: Here, we introduce a combinatorial approach for the prediction of protein complexes that focuses not only on determining the member proteins of complexes but also on the DDI/PPI organization of the complexes. Our method analyzes complex candidates predicted by existing methods. It searches for optimal combinations of domain-domain interactions in the candidates based on the assumption that the proteins in a candidate can form a true protein complex if each of the domains is used by a single protein interaction. This optimization problem was mathematically formulated and solved using binary integer linear programming. By using publicly available sets of yeast protein-protein interactions and domain-domain interactions, we succeeded in extracting protein complex candidates with an accuracy that is twice the average accuracy of the existing methods, MCL, MCODE, or clustering coefficient. Although configuring the parameters of each algorithm resulted in slightly improved precision, our method showed better precision for most values of the parameters. CONCLUSIONS: Our combinatorial approach can provide better accuracy for the prediction of protein complexes and also enables the identification of both direct PPIs and the DDIs that mediate them in complexes.


Subject(s)
Algorithms , Multiprotein Complexes/chemistry , Protein Interaction Domains and Motifs , Protein Interaction Mapping/methods , Proteins/chemistry , Proteins/metabolism , Cluster Analysis , Programming, Linear , Two-Hybrid System Techniques
11.
BMC Bioinformatics ; 11 Suppl 1: S31, 2010 Jan 18.
Article in English | MEDLINE | ID: mdl-20122204

ABSTRACT

BACKGROUND: Understanding secondary metabolic pathways in plants is essential for finding druggable candidate enzymes. However, there are many enzymes whose functions are not yet discovered in organism-specific metabolic pathways. Towards identifying the functions of those enzymes, assignment of EC numbers to the enzymatic reactions they catalyze plays a key role, since EC numbers represent the categorization of enzymes on the one hand, and the categorization of enzymatic reactions on the other. RESULTS: We propose reaction graph kernels for automatically assigning EC numbers to unknown enzymatic reactions in a metabolic network. Reaction graph kernels compute the similarity between two chemical reactions by considering the similarity of the chemical compounds in the reactions and their relationships. In computational experiments based on the KEGG/REACTION database, our method successfully predicted the first three digits of the EC number with 83% accuracy. We also exhaustively predicted missing EC numbers in plant secondary metabolism pathways. The prediction results of reaction graph kernels on 36 unknown enzymatic reactions were compared with an expert's knowledge. Using the same data for evaluation, we compared our method with E-zyme and showed its ability to assign a larger number of accurate EC numbers. CONCLUSION: Reaction graph kernels are a new metric for comparing enzymatic reactions.


Subject(s)
Computational Biology/methods , Enzymes/metabolism , Plants/metabolism , Databases, Factual
12.
Bioinformatics ; 25(22): 2962-8, 2009 Nov 15.
Article in English | MEDLINE | ID: mdl-19689962

ABSTRACT

MOTIVATION: The existing supervised methods for biological network inference work on each of the networks individually based only on intra-species information such as gene expression data. We believe that it will be more effective to use genomic data and cross-species evolutionary information from different species simultaneously, rather than to use the genomic data alone. RESULTS: We created a new semi-supervised learning method called Link Propagation for inferring biological networks of multiple species based on genome-wide data and evolutionary information. The new method was applied to simultaneous reconstruction of three metabolic networks of Caenorhabditis elegans, Helicobacter pylori and Saccharomyces cerevisiae, based on gene expression similarities and amino acid sequence similarities. The experimental results proved that the new simultaneous network inference method consistently improves the predictive performance over the individual network inferences, and it also outperforms in accuracy and speed other established methods such as the pairwise support vector machine. AVAILABILITY: The software and data are available at http://cbio.ensmp.fr/~yyamanishi/LinkPropagation/.


Subject(s)
Biological Evolution , Computational Biology/methods , Gene Regulatory Networks/genetics , Genome , Metabolic Networks and Pathways/genetics , Animals , Caenorhabditis elegans/genetics , Helicobacter pylori/genetics , Saccharomyces cerevisiae/genetics
13.
Genome Inform ; 17(2): 25-34, 2006.
Article in English | MEDLINE | ID: mdl-17503376

ABSTRACT

We propose a novel general-purpose tree kernel and apply it to glycan structure analysis. Our kernel measures the similarity between two labeled trees by counting the number of common q-length substrings (tree q-grams) embedded in the trees for all possible lengths q. We apply our tree kernel using a support vector machine (SVM) to classification and specific feature extraction from glycan structure data. Our results show that our kernel outperforms the layered trimer kernel of Hizukuri et al., which is well tailored to glycan data, even though we do not adjust our kernel to glycan-specific properties. In addition, we extract specific features from various types of glycan data using our trained SVM. The results show that our kernel is more flexible and capable of finding a wider variety of substructures from glycan data.
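
The counting idea behind such a kernel can be sketched in a simplified form: enumerate the label sequences along every downward path of each tree, for all path lengths, and sum the products of matching counts. This is an illustrative simplification and may differ from the paper's exact definition of tree q-grams; the `(label, children)` tree encoding, the function names, and the toy labels in the usage note are all hypothetical.

```python
from collections import Counter

def downward_paths(tree):
    """Count label sequences along every downward path of a (label, children) tree."""
    counts = Counter()

    def visit(node):
        label, children = node
        # Paths starting at this node: the node alone, plus extensions through each child.
        starting_here = [(label,)]
        for child in children:
            for path in visit(child):
                starting_here.append((label,) + path)
        counts.update(starting_here)
        return starting_here

    visit(tree)
    return counts

def path_kernel(tree_a, tree_b):
    """Sum, over all shared label paths, the product of their occurrence counts."""
    paths_a = downward_paths(tree_a)
    paths_b = downward_paths(tree_b)
    return sum(count * paths_b[path] for path, count in paths_a.items())
```

For example, `path_kernel(("Man", [("GlcNAc", []), ("Man", [])]), ("Man", [("GlcNAc", [])]))` compares two small trees with sugar-like labels. Because the value is an inner product of count vectors, it is symmetric and positive semi-definite, which is the property an SVM kernel requires.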


Subject(s)
Polysaccharides/analysis , Sequence Analysis, Protein/methods , Algorithms , Amino Acid Motifs , Artificial Intelligence , Biomarkers , Carbohydrate Sequence , Databases, Protein , Monosaccharides/chemistry , Polysaccharides/chemistry , Polysaccharides/classification
14.
Bioinformatics ; 20(1): 29-39, 2004 Jan 01.
Article in English | MEDLINE | ID: mdl-14693805

ABSTRACT

MOTIVATION: Clustering sequences of a full-length cDNA library into alternative splice form candidates is a very important problem. RESULTS: We developed a new efficient algorithm to cluster sequences of a full-length cDNA library into alternative splice form candidates. Current clustering algorithms for cDNAs tend to produce too many clusters containing incorrect splice form candidates. Our algorithm is based on a spliced sequence alignment algorithm that considers splice sites. The spliced sequence alignment algorithm is a variant of an ordinary dynamic programming algorithm, which requires O(nm) time for checking a pair of sequences where n and m are the lengths of the two sequences. Since the time bound is too large to perform all-pair comparison for a large set of sequences, we developed new techniques to reduce the computation time without affecting the accuracy of the output clusters. Our algorithm was applied to 21 076 mouse cDNA sequences of the FANTOM 1.10 database to examine its performance and accuracy. In these experiments, we achieved about 2-12-fold speedup against a method using only a traditional hash-based technique. Moreover, without using any information of the mouse genome sequence data or any gene data in public databases, we succeeded in listing 87-89% of all the clusters that biologists have annotated manually. AVAILABILITY: We provide a web service for cDNA clustering located at https://access.obigrid.org/ibm/cluspa/, for which registration for the OBIGrid (http://www.obigrid.org) is required.
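
The O(nm) cost quoted in this abstract is that of ordinary dynamic-programming alignment, which fills one table cell per pair of positions in the two sequences. The sketch below shows that ordinary algorithm only (a global alignment score), not the spliced variant that accounts for splice sites; the function name and the scoring values for match, mismatch, and gap are hypothetical.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of sequences a and b in O(len(a) * len(b)) time."""
    n, m = len(a), len(b)
    # Row 0: aligning the empty prefix of a with b[:j] costs j gaps.
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, n + 1):
        curr = [i * gap] + [0] * m
        for j in range(1, m + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Best of: align a[i-1] with b[j-1], gap in b, or gap in a.
            curr[j] = max(diagonal, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[m]
```

Only two rows are kept in memory, but the running time still grows with the product of the sequence lengths, which is why comparing all pairs in a library of tens of thousands of cDNAs motivates the speed-up techniques the abstract mentions.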


Subject(s)
Algorithms , Cluster Analysis , DNA, Complementary/classification , DNA, Complementary/genetics , DNA, Recombinant/genetics , Gene Expression Profiling/methods , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Animals , Base Sequence , Databases, Nucleic Acid , Gene Library , Genome , Mice , Molecular Sequence Data , Pattern Recognition, Automated , Quality Control , Reproducibility of Results , Sensitivity and Specificity , Sequence Homology, Nucleic Acid