Results 1 - 13 of 13
1.
Sci Rep ; 12(1): 8206, 2022 May 17.
Article in English | MEDLINE | ID: mdl-35581358

ABSTRACT

Predicting the chemical properties of compounds is crucial in discovering novel materials and drugs with specific desired characteristics. Recent significant advances in machine learning technologies have enabled automatic predictive modeling from past experimental data reported in the literature. However, these datasets are often biased for various reasons, such as experimental plans and publication decisions, and prediction models trained on such biased datasets often overfit to the biased distributions and perform poorly in subsequent use. Hence, this study focused on mitigating bias in experimental datasets. We adopted two techniques from causal inference and combined them with graph neural networks that can represent molecular structures. The experimental results in four possible bias scenarios indicated that the inverse propensity scoring-based method and the counterfactual regression-based method made solid improvements.


Subjects
Machine Learning; Neural Networks, Computer; Bias; Causality
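Note on record 1: the abstract names inverse propensity scoring without giving details. The general idea can be sketched as a weighted loss in which each training sample is weighted by the inverse of its estimated probability of having been observed. The function name `ips_weighted_mse`, the `clip` parameter, and the self-normalized form are illustrative assumptions, not the authors' implementation.

```python
def ips_weighted_mse(y_true, y_pred, propensity, clip=0.05):
    """Self-normalized, inverse-propensity-weighted mean squared error.

    propensity[i] is an (assumed, externally estimated) probability that
    sample i appears in the biased dataset. Rarely observed samples get
    larger weights, which counteracts the sampling bias.
    """
    # Clip small propensities so a few rare samples do not dominate the loss.
    weights = [1.0 / max(p, clip) for p in propensity]
    squared_errors = [(t - y) ** 2 for t, y in zip(y_true, y_pred)]
    weighted_sum = sum(w * e for w, e in zip(weights, squared_errors))
    return weighted_sum / sum(weights)
```

When all propensities are equal, the weights cancel and the value reduces to the ordinary mean squared error.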
2.
Sci Rep ; 11(1): 23648, 2021 Dec 08.
Article in English | MEDLINE | ID: mdl-34880365

ABSTRACT

Recently, research has been conducted to automatically control anesthesia using machine learning, with the aim of alleviating the shortage of anesthesiologists. In this study, we address the problem of predicting decisions made by anesthesiologists during surgery using machine learning; specifically, we formulate the decision of whether to increase the flow rate at each time point in the continuous administration of the analgesic remifentanil as a supervised binary classification problem. Experiments were conducted to evaluate the prediction performance of six machine learning models (logistic regression, support vector machine, random forest, LightGBM, artificial neural network, and long short-term memory (LSTM)) using data from 210 cases collected during actual surgeries. The results demonstrated that, when predicting an increase in the flow rate of remifentanil 1 min ahead, the LSTM model achieved scores of 0.659 for sensitivity, 0.732 for specificity, and 0.753 for ROC-AUC; this demonstrates the potential to predict the decisions made by anesthesiologists using machine learning. Furthermore, we examined the importance and contribution of the features of each model using Shapley additive explanations, a method for interpreting predictions made by machine learning models. The trends indicated by the results were partially consistent with known clinical findings.


Subjects
Anesthetics/administration & dosage; Machine Learning; Anesthesiologists/psychology; Decision Making; Humans
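Note on record 2: the abstract reports sensitivity and specificity for a binary classifier. As a reminder of how these two scores are defined, here is a minimal sketch that computes them from binary labels and predictions. The function name and the 0/1 label encoding are assumptions made for illustration; the study's own evaluation code is not described in the abstract.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for labels encoded as 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    # Sensitivity: fraction of actual positives that were predicted positive.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: fraction of actual negatives that were predicted negative.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity
```

In the setting of the abstract, a positive label would correspond to a time point at which the flow rate was increased.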
3.
BMC Bioinformatics ; 21(Suppl 3): 94, 2020 Apr 23.
Article in English | MEDLINE | ID: mdl-32321421

ABSTRACT

BACKGROUND: Prediction of chemical compounds is one of the fundamental tasks in bioinformatics and chemoinformatics, because it contributes to various applications in metabolic engineering and drug discovery. The recent rapid growth in the amount of available data has enabled the application of computational approaches such as statistical modeling and machine learning methods. Both a set of chemical interactions and chemical compound structures are represented as graphs, and various graph-based approaches, including graph convolutional neural networks, have been successfully applied to chemical network prediction. However, there has been no efficient method that can consider the two different types of graphs in an end-to-end manner. RESULTS: We give a new formulation of the chemical network prediction problem as a link prediction problem in a graph of graphs (GoG), which can represent the hierarchical structure consisting of compound graphs and an inter-compound graph. We propose a new graph convolutional neural network architecture, called the dual graph convolutional network, that learns compound representations from both the compound graphs and the inter-compound network in an end-to-end manner. CONCLUSIONS: Experiments using four chemical networks with different sparsity levels and degree distributions show that our dual graph convolution approach achieves high prediction performance in relatively dense networks, while the performance becomes inferior on extremely sparse networks.


Subjects
Computational Biology/methods; Computer Graphics; Models, Chemical; Neural Networks, Computer; Drug Discovery
4.
Genes Genet Syst ; 95(1): 43-50, 2020 Apr 22.
Article in English | MEDLINE | ID: mdl-32213716

ABSTRACT

Recently, the prospect of applying machine learning tools for automating the process of annotation analysis of large-scale sequences from next-generation sequencers has raised the interest of researchers. However, finding research collaborators with knowledge of machine learning techniques is difficult for many experimental life scientists. One solution to this problem is to utilise the power of crowdsourcing. In this report, we describe how we investigated the potential of crowdsourced modelling for a life science task by conducting a machine learning competition, the DNA Data Bank of Japan (DDBJ) Data Analysis Challenge. In the challenge, participants predicted chromatin feature annotations from DNA sequences with competing models. The challenge engaged 38 participants, with a cumulative total of 360 model submissions. The performance of the top model resulted in an area under the curve (AUC) score of 0.95. Over the course of the competition, the overall performance of the submitted models improved by an AUC score of 0.30 from the first submitted model. Furthermore, the 1st- and 2nd-ranking models utilised external data such as genomic location and gene annotation information with specific domain knowledge. The effect of incorporating this domain knowledge led to improvements of approximately 5%-9%, as measured by the AUC scores. This report suggests that machine learning competitions will lead to the development of highly accurate machine learning models for use by experimental scientists unfamiliar with the complexities of data science.


Subjects
Arabidopsis/genetics; Chromatin/genetics; Databases, Nucleic Acid; Genome, Plant/genetics; Machine Learning; Computational Biology; Crowdsourcing; Data Analysis; High-Throughput Nucleotide Sequencing; Japan; Molecular Sequence Annotation
5.
J Mol Graph Model ; 80: 217-223, 2018 Mar.
Article in English | MEDLINE | ID: mdl-29414041

ABSTRACT

Synthetic accessibility evaluation is a process to assess the ease of synthesis of compounds. A rapid method for assessing the synthetic accessibility of a vast number of chemical compounds is expected to bring about a breakthrough in drug discovery. Although several computational methods have been proposed, compound evaluation is still carried out by medicinal chemists; however, the low throughput of human evaluation due to the lack of chemists is a critical issue when handling a large number of compounds. We propose the use of crowdsourcing to address this problem, and we conducted experiments to investigate the feasibility of incorporating semi-experts and a statistical aggregation method into synthetic accessibility evaluation. Our experimental results show that we can obtain accurate synthetic accessibility scores through the statistical aggregation of judgments from semi-experts.


Subjects
Drug Design; Models, Chemical; Algorithms; Humans
6.
Sci Rep ; 5: 8953, 2015 May 20.
Article in English | MEDLINE | ID: mdl-25989741

ABSTRACT

Well-trained clinicians may be able to provide diagnosis and prognosis from very short biomarker series using information and experience gained from previous patients. Although mathematical methods can potentially help clinicians to predict the progression of diseases, there is so far no method that estimates the patient state from a very short time series of a biomarker for diagnosis and/or prognosis by employing the information of previous patients. Here, we propose a mathematical framework for integrating other patients' datasets to infer and predict the state of the disease in the current patient based on their short history. We extend a machine-learning framework of "prediction with expert advice" to deal with unstable dynamics. We construct this mathematical framework by combining expert advice with a mathematical model of prostate cancer. Our model predicted the individual biomarker series of patients with prostate cancer, used as clinical samples, well.


Subjects
Algorithms; Biomarkers; Disease Progression; Models, Theoretical; Humans
7.
J Med Internet Res ; 17(1): e2, 2015 Jan 28.
Article in English | MEDLINE | ID: mdl-25630348

ABSTRACT

BACKGROUND: The prevalence of non-communicable diseases is increasing throughout the world, including developing countries. OBJECTIVE: The intent was to conduct a study of a preventive medical service in a developing country, combining eHealth checkups and teleconsultation, as well as to assess stratification rules and the short-term effects of intervention. METHODS: We developed an eHealth system that comprises a set of sensor devices in an attaché case, a data transmission system linked to a mobile network, and a data management application. We provided eHealth checkups for the populations of five villages and the employees of five factories/offices in Bangladesh. Individual health condition was automatically categorized into four grades based on international diagnostic standards: green (healthy), yellow (caution), orange (affected), and red (emergent). We provided teleconsultation for orange- and red-grade subjects, and teleprescription for these subjects as required. RESULTS: The first checkup was provided to 16,741 subjects. After one year, 2361 subjects participated in the second checkup, and the systolic blood pressure of these subjects had significantly decreased from an average of 121 mmHg to an average of 116 mmHg (P<.001). Based on these results, we propose a cost-effective method that applies a machine learning technique (the random forest method) with the medical interview, subject profiles, and checkup results as predictors, to avoid costly measurements of blood sugar and to ensure the sustainability of the program in developing countries. CONCLUSIONS: The results of this study demonstrate the benefits of an eHealth checkup and teleconsultation program as an effective health care system in developing countries.


Subjects
Chronic Disease/prevention & control; Developing Countries; Preventive Medicine/methods; Remote Consultation; Adolescent; Adult; Aged; Aged, 80 and over; Child; Delivery of Health Care; Electronic Prescribing; Female; Humans; Male; Middle Aged; Remote Consultation/instrumentation; Risk Factors; Telemedicine; Young Adult
8.
J Mol Graph Model ; 29(3): 492-7, 2010 Nov.
Article in English | MEDLINE | ID: mdl-20965757

ABSTRACT

Accurate prediction of protein-ligand binding affinities for lead optimization in drug discovery remains an important and challenging problem for scoring functions in docking simulation. In this paper, we propose a data-driven approach that integrates multiple scoring functions to predict protein-ligand binding affinity directly. We then propose a new method called multiple instance regression based scoring (MIRS) that incorporates unbound ligand conformations using multiple scoring functions. We evaluated the predictive performance of MIRS using 100 protein-ligand complexes and their binding affinities. The experimental results showed that MIRS outperformed 11 conventional scoring functions: LigScore, PLP, AutoDock, G-Score, D-Score, LUDI, F-Score, ChemScore, X-Score, PMF, and DrugScore. In addition, we confirmed that MIRS performed well on binding pose prediction. Our results reveal that it is indispensable to incorporate unbound ligand conformations in both binding affinity prediction and binding pose prediction. The proposed method will accelerate efficient lead optimization in structure-based drug design and provide a new direction for designing new scoring functions.


Subjects
Computer Simulation; Ligands; Protein Binding; Computational Biology/methods; Drug Discovery; Models, Molecular; Molecular Conformation; Molecular Structure; Regression Analysis; Thermodynamics
9.
BMC Bioinformatics ; 11: 350, 2010 Jun 28.
Article in English | MEDLINE | ID: mdl-20584269

ABSTRACT

BACKGROUND: High-throughput methods for detecting protein-protein interactions enable us to obtain large interaction networks, and also allow us to computationally identify the associations of proteins as protein complexes. Although there are methods to extract protein complexes as sets of proteins from interaction networks, the extracted complexes may include false positives because these methods do not account for the structural limitations of the proteins and thus do not check that the proteins in the extracted complex can simultaneously bind to each other. In addition, there have been few searches for deeper insights into the protein complexes, such as the topology of the protein-protein interactions or the domain-domain interactions that mediate the protein interactions. RESULTS: Here, we introduce a combinatorial approach for the prediction of protein complexes, focusing not only on determining member proteins in complexes but also on the DDI/PPI organization of the complexes. Our method analyzes complex candidates predicted by the existing methods. It searches for optimal combinations of domain-domain interactions in the candidates based on the assumption that the proteins in a candidate can form a true protein complex if each of the domains is used by a single protein interaction. This optimization problem was mathematically formulated and solved using binary integer linear programming. By using publicly available sets of yeast protein-protein interactions and domain-domain interactions, we succeeded in extracting protein complex candidates with an accuracy that is twice the average accuracy of the existing methods, MCL, MCODE, or clustering coefficient. Although configuring the parameters of each algorithm resulted in slightly improved precision, our method showed better precision for most values of the parameters. CONCLUSIONS: Our combinatorial approach can provide better accuracy for the prediction of protein complexes and also enables identification of both direct PPIs and the DDIs that mediate them in complexes.


Subjects
Algorithms; Multiprotein Complexes/chemistry; Protein Interaction Domains and Motifs; Protein Interaction Mapping/methods; Proteins/chemistry; Proteins/metabolism; Cluster Analysis; Programming, Linear; Two-Hybrid System Techniques
10.
BMC Bioinformatics ; 11 Suppl 1: S31, 2010 Jan 18.
Article in English | MEDLINE | ID: mdl-20122204

ABSTRACT

BACKGROUND: Understanding secondary metabolic pathways in plants is essential for finding druggable candidate enzymes. However, there are many enzymes whose functions are not yet discovered in organism-specific metabolic pathways. Towards identifying the functions of those enzymes, the assignment of EC numbers to the enzymatic reactions they catalyze plays a key role, since EC numbers represent the categorization of enzymes on the one hand, and the categorization of enzymatic reactions on the other. RESULTS: We propose reaction graph kernels for automatically assigning EC numbers to unknown enzymatic reactions in a metabolic network. Reaction graph kernels compute the similarity between two chemical reactions by considering the similarity of the chemical compounds in the reactions and their relationships. In computational experiments based on the KEGG/REACTION database, our method successfully predicted the first three digits of the EC number with 83% accuracy. We also exhaustively predicted missing EC numbers in the plant secondary metabolism pathway. The prediction results of reaction graph kernels on 36 unknown enzymatic reactions are compared with an expert's knowledge. Using the same data for evaluation, we compared our method with E-zyme and showed its ability to assign a larger number of accurate EC numbers. CONCLUSION: Reaction graph kernels are a new metric for comparing enzymatic reactions.


Subjects
Computational Biology/methods; Enzymes/metabolism; Plants/metabolism; Databases, Factual
11.
Bioinformatics ; 25(22): 2962-8, 2009 Nov 15.
Article in English | MEDLINE | ID: mdl-19689962

ABSTRACT

MOTIVATION: The existing supervised methods for biological network inference work on each of the networks individually based only on intra-species information such as gene expression data. We believe that it will be more effective to use genomic data and cross-species evolutionary information from different species simultaneously, rather than to use the genomic data alone. RESULTS: We created a new semi-supervised learning method called Link Propagation for inferring biological networks of multiple species based on genome-wide data and evolutionary information. The new method was applied to simultaneous reconstruction of three metabolic networks of Caenorhabditis elegans, Helicobacter pylori and Saccharomyces cerevisiae, based on gene expression similarities and amino acid sequence similarities. The experimental results proved that the new simultaneous network inference method consistently improves the predictive performance over the individual network inferences, and it also outperforms in accuracy and speed other established methods such as the pairwise support vector machine. AVAILABILITY: The software and data are available at http://cbio.ensmp.fr/~yyamanishi/LinkPropagation/.


Subjects
Biological Evolution; Computational Biology/methods; Gene Regulatory Networks/genetics; Genome; Metabolic Networks and Pathways/genetics; Animals; Caenorhabditis elegans/genetics; Helicobacter pylori/genetics; Saccharomyces cerevisiae/genetics
12.
Genome Inform ; 17(2): 25-34, 2006.
Article in English | MEDLINE | ID: mdl-17503376

ABSTRACT

We propose a novel general-purpose tree kernel and apply it to glycan structure analysis. Our kernel measures the similarity between two labeled trees by counting the number of common q-length substrings (tree q-grams) embedded in the trees for all possible lengths q. We apply our tree kernel using a support vector machine (SVM) to classification and specific feature extraction from glycan structure data. Our results show that our kernel outperforms the layered trimer kernel of Hizukuri et al., which is well tailored to glycan data, even though we do not adjust our kernel to glycan-specific properties. In addition, we extract specific features from various types of glycan data using our trained SVM. The results show that our kernel is more flexible and capable of finding a wider variety of substructures from glycan data.


Subjects
Polysaccharides/analysis; Sequence Analysis, Protein/methods; Algorithms; Amino Acid Motifs; Artificial Intelligence; Biomarkers; Carbohydrate Sequence; Databases, Protein; Monosaccharides/chemistry; Polysaccharides/chemistry; Polysaccharides/classification
13.
Bioinformatics ; 20(1): 29-39, 2004 Jan 01.
Article in English | MEDLINE | ID: mdl-14693805

ABSTRACT

MOTIVATION: Clustering sequences of a full-length cDNA library into alternative splice form candidates is a very important problem. RESULTS: We developed a new efficient algorithm to cluster sequences of a full-length cDNA library into alternative splice form candidates. Current clustering algorithms for cDNAs tend to produce too many clusters containing incorrect splice form candidates. Our algorithm is based on a spliced sequence alignment algorithm that considers splice sites. The spliced sequence alignment algorithm is a variant of an ordinary dynamic programming algorithm, which requires O(nm) time for checking a pair of sequences where n and m are the lengths of the two sequences. Since the time bound is too large to perform all-pair comparison for a large set of sequences, we developed new techniques to reduce the computation time without affecting the accuracy of the output clusters. Our algorithm was applied to 21 076 mouse cDNA sequences of the FANTOM 1.10 database to examine its performance and accuracy. In these experiments, we achieved about 2-12-fold speedup against a method using only a traditional hash-based technique. Moreover, without using any information of the mouse genome sequence data or any gene data in public databases, we succeeded in listing 87-89% of all the clusters that biologists have annotated manually. AVAILABILITY: We provide a web service for cDNA clustering located at https://access.obigrid.org/ibm/cluspa/, for which registration for the OBIGrid (http://www.obigrid.org) is required.


Subjects
Algorithms; Cluster Analysis; DNA, Complementary/classification; DNA, Complementary/genetics; DNA, Recombinant/genetics; Gene Expression Profiling/methods; Sequence Alignment/methods; Sequence Analysis, DNA/methods; Animals; Base Sequence; Databases, Nucleic Acid; Gene Library; Genome; Mice; Molecular Sequence Data; Pattern Recognition, Automated; Quality Control; Reproducibility of Results; Sensitivity and Specificity; Sequence Homology, Nucleic Acid
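Note on record 13: the abstract describes the spliced alignment as a variant of an ordinary dynamic programming algorithm requiring O(nm) time for sequences of lengths n and m. The sketch below shows only the ordinary (unspliced) global alignment recurrence, to illustrate where the O(nm) cost comes from. It does not model splice sites, and the scoring values (`match`, `mismatch`, `gap`) are arbitrary assumptions.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best global alignment score of strings a and b by dynamic programming.

    Fills an (n + 1) x (m + 1) table one row at a time, so the running
    time is O(n * m) and the memory use is O(m).
    """
    n, m = len(a), len(b)
    # Row 0: aligning the empty prefix of a against prefixes of b (all gaps).
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, n + 1):
        curr = [i * gap] + [0] * m
        for j in range(1, m + 1):
            # Align a[i-1] with b[j-1], as a match or a mismatch.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Or align one of the two characters with a gap.
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[m]
```

Running such a comparison for every pair in a set of tens of thousands of sequences is what makes the all-pair approach expensive, which is the cost the abstract says the authors' techniques reduce.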