Results 1 - 20 of 3,489
1.
Accid Anal Prev ; 134: 105251, 2020 Jan.
Article in English | MEDLINE | ID: mdl-31402051

ABSTRACT

Powered two-wheelers (PTWs) are growing in number globally each year, as they are considered an attractive alternative to cars (flexible, small, affordable, fast and easy to park), especially in congested traffic. However, PTWs represent an important challenge for road safety. In fact, in 2016, Spain ranked fifth in PTW fatalities among the EU 28. For this reason, this paper investigates the patterns among crash characteristics that contribute to PTW crashes in Spain. Data from 78,611 crashes involving PTWs that occurred in Spain in the period 2011-2013 were analyzed. The analysis was performed using classification trees and rules discovery, which are suitable for extracting knowledge and identifying valid, understandable and previously unknown patterns in large amounts of data. The response variables assessed in this study were severity and crash type. As a result, several combinations of road, environmental and drivers' characteristics associated with the severity and typology of PTW crashes in Spain were identified. Based on the analysis results, several countermeasures to solve or mitigate the safety issues identified in the study were proposed. From the methodological point of view, study results show that both the classification trees and the Apriori algorithm were effective in revealing non-trivial and unsuspected relations in the data. The structure of classification trees allowed a simpler understanding of the phenomenon under study, while association discovery provided new information that was previously hidden in the data. Given that the results of the two techniques were never contradictory, we recommend using classification trees and association discovery as complementary approaches, since their combination is effective in exploring data and provides meaningful insights into PTW crash characteristics and their interdependencies.


Subjects
Traffic Accidents/mortality , Data Mining/methods , Motorcycles/statistics & numerical data , Algorithms , Humans , ROC Curve , Spain/epidemiology
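
The association discovery mentioned in the abstract above rests on three standard rule measures: support, confidence and lift. The following is a minimal sketch of how one candidate rule could be scored, assuming invented crash records and attribute names (none of them come from the study's data). It evaluates a single rule and does not perform the full Apriori candidate search.

```python
from typing import Dict, FrozenSet, List

# Hypothetical crash records: each record is a set of attribute=value items.
# These values are invented for illustration only.
crashes: List[FrozenSet[str]] = [
    frozenset({"road=urban", "light=day", "severity=slight"}),
    frozenset({"road=rural", "light=night", "severity=severe"}),
    frozenset({"road=rural", "light=day", "severity=severe"}),
    frozenset({"road=urban", "light=night", "severity=slight"}),
    frozenset({"road=rural", "light=night", "severity=severe"}),
]


def support(itemset: FrozenSet[str], records: List[FrozenSet[str]]) -> float:
    """Fraction of records that contain every item of the itemset."""
    return sum(1 for record in records if itemset <= record) / len(records)


def rule_metrics(
    antecedent: FrozenSet[str],
    consequent: FrozenSet[str],
    records: List[FrozenSet[str]],
) -> Dict[str, float]:
    """Support, confidence and lift of the rule antecedent -> consequent."""
    s_antecedent = support(antecedent, records)
    s_consequent = support(consequent, records)
    s_both = support(antecedent | consequent, records)
    confidence = s_both / s_antecedent if s_antecedent else 0.0
    lift = confidence / s_consequent if s_consequent else 0.0
    return {"support": s_both, "confidence": confidence, "lift": lift}


metrics = rule_metrics(
    frozenset({"road=rural"}), frozenset({"severity=severe"}), crashes
)
print(metrics)
```

A lift above 1 suggests that the antecedent and consequent co-occur more often than they would if they were independent, which is the kind of pattern such an analysis would typically flag for closer inspection.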
2.
Medicine (Baltimore) ; 98(52): e18493, 2019 Dec.
Article in English | MEDLINE | ID: mdl-31876736

ABSTRACT

Bronchopulmonary dysplasia (BPD) is a common disease of premature infants with very low birth weight. The mechanism is inconclusive. The aim of this study is to systematically explore BPD-related genes and characterize their functions. Natural language processing analysis was used to identify BPD-related genes. Gene data were extracted from PubMed database. Gene ontology, pathway, and network analysis were carried out, and the result was integrated with corresponding database. In this study, 216 genes were identified as BPD-related genes with P < .05, and 30 pathways were identified as significant. A network of BPD-related genes was also constructed with 17 hub genes identified. In particular, phosphatidyl inositol-3-enzyme-serine/threonine kinase signaling pathway involved the largest number of genes. Insulin was found to be a promising candidate gene related with BPD, suggesting that it may serve as an effective therapeutic target. Our data may help to better understand the molecular mechanisms underlying BPD. However, the mechanisms of BPD are elusive, and further studies are needed.


Subjects
Bronchopulmonary Dysplasia/genetics , Data Mining , Algorithms , Bronchopulmonary Dysplasia/etiology , Bronchopulmonary Dysplasia/metabolism , Computational Biology/methods , Data Mining/methods , Gene Ontology , Genes/genetics , Genes/physiology , Genetic Predisposition to Disease/genetics , Humans , Newborn Infant , Metabolic Networks and Pathways/genetics , Natural Language Processing , Signal Transduction/genetics
3.
BMC Bioinformatics ; 20(1): 534, 2019 Oct 29.
Article in English | MEDLINE | ID: mdl-31664891

ABSTRACT

BACKGROUND: Biomedical literature concerns a wide range of concepts, requiring controlled vocabularies to maintain a consistent terminology across different research groups. However, as new concepts are introduced, biomedical literature is prone to ambiguity, specifically in fields that are advancing more rapidly, for example, drug design and development. Entity linking is a text mining task that aims at linking entities mentioned in the literature to concepts in a knowledge base. For example, entity linking can help finding all documents that mention the same concept and improve relation extraction methods. Existing approaches focus on the local similarity of each entity and the global coherence of all entities in a document, but do not take into account the semantics of the domain. RESULTS: We propose a method, PPR-SSM, to link entities found in documents to concepts from domain-specific ontologies. Our method is based on Personalized PageRank (PPR), using the relations of the ontology to generate a graph of candidate concepts for the mentioned entities. We demonstrate how the knowledge encoded in a domain-specific ontology can be used to calculate the coherence of a set of candidate concepts, improving the accuracy of entity linking. Furthermore, we explore weighting the edges between candidate concepts using semantic similarity measures (SSM). We show how PPR-SSM can be used to effectively link named entities to biomedical ontologies, namely chemical compounds, phenotypes, and gene-product localization and processes. CONCLUSIONS: We demonstrated that PPR-SSM outperforms state-of-the-art entity linking methods in four distinct gold standards, by taking advantage of the semantic information contained in ontologies. Moreover, PPR-SSM is a graph-based method that does not require training data. Our method improved the entity linking accuracy of chemical compounds by 0.1385 when compared to a method that does not use SSMs.


Subjects
Semantics , Biological Ontologies , Data Mining/methods , Factual Databases , Humans , Knowledge Bases , Controlled Vocabulary
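
The abstract above builds on Personalized PageRank (PPR) over a graph of candidate concepts. The following is a minimal sketch of plain PPR by power iteration, assuming a toy graph with placeholder node labels and a conventional damping factor of 0.85. It is not the PPR-SSM implementation: the semantic-similarity edge weighting described by the authors is left out, and all names are illustrative.

```python
from typing import Dict, List, Set


def personalized_pagerank(
    graph: Dict[str, List[str]],
    seeds: Set[str],
    damping: float = 0.85,
    iterations: int = 100,
) -> Dict[str, float]:
    """Power iteration for PageRank with restarts confined to the seed nodes."""
    nodes = list(graph)
    # Restart distribution: uniform over the seeds, zero elsewhere.
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            neighbors = graph[n]
            if neighbors:
                share = damping * rank[n] / len(neighbors)
                for m in neighbors:
                    new_rank[m] += share
            else:
                # Dangling node: hand its mass back to the restart distribution.
                for m in nodes:
                    new_rank[m] += damping * rank[n] * restart[m]
        rank = new_rank
    return rank


# Hypothetical candidate-concept graph (undirected, written as adjacency lists).
toy_graph = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}
scores = personalized_pagerank(toy_graph, seeds={"A"})
print(sorted(scores.items(), key=lambda item: -item[1]))
```

Nodes that are well connected to the seed receive higher scores, which is the intuition behind using such scores as a coherence signal among candidate concepts.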
4.
Medicine (Baltimore) ; 98(42): e17504, 2019 Oct.
Article in English | MEDLINE | ID: mdl-31626105

ABSTRACT

Mental disorders are important diseases with a high prevalence rate in the general population. Common mental disorders are complex diseases with high heritability, and their pathogenesis is the result of interactions between genetic and environmental factors. However, the relationship between mental disorders and genes is complex and difficult to evaluate. Additionally, some mental disorders involve numerous genes, and a single gene can also be associated with different types of mental disorders. This study used text mining (including word frequency analysis, cluster analysis, and association analysis) of the PubMed database to identify genes related to mental disorders. Word frequency analysis revealed 52 high-frequency genes important in studies of mental disorders. Cluster analysis showed that 5-HTT, SLC6A4, and MAOA are common genetic factors in most mental disorders; the intra-group genes in each cluster were highly correlated. Some mental disorders may have common genetic factors; for example, there may be common genetic factors between 'Affective Disorders' and 'Schizophrenia.' Association analysis revealed 35 frequent itemsets and 25 association rules, indicating close associations among genes. The results of association rules showed that CCK, MAOA, and 5-HTT are the most closely related. We used text mining technology to analyze genes related to mental disorders to further summarize and clarify the relationships between mental disorders and genes as well as identify potential relationships, providing a foundation for future experiments. The results of the associative analysis also provide a reference for multi-gene studies of mental disorders.


Subjects
Data Mining/methods , Mental Disorders/genetics , Cluster Analysis , Factual Databases , Genetic Predisposition to Disease/genetics , Humans , Monoamine Oxidase/analysis , PubMed , Serotonin Plasma Membrane Transport Proteins/analysis
5.
BMC Bioinformatics ; 20(1): 459, 2019 Sep 06.
Article in English | MEDLINE | ID: mdl-31492112

ABSTRACT

BACKGROUND: Automatic extraction of biomedical events from literature is an important task in understanding biological systems, allowing the latest discoveries to be incorporated faster and automatically. Detecting trigger words which indicate events is a critical step in the process of event extraction, because the following steps depend on the recognized triggers. The task in this study is to identify event triggers from the literature across multiple levels of biological organization. In order to achieve high performance, machine learning based approaches, such as neural networks, must be trained on a dataset with plentiful annotations. However, annotations might be difficult to obtain on the multiple levels, and annotated resources have so far mainly focused on the relations and processes at the molecular level. In this work, we aim to apply transfer learning for multiple-level trigger recognition, in which a source dataset with sufficient annotations on the molecular level is utilized to improve performance on a target domain with insufficient annotations and more trigger types. RESULTS: We propose a generalized cross-domain neural network transfer learning architecture and approach, which can share as much knowledge as possible between the source and target domains, especially when their label sets overlap. In the experiments, the MLEE corpus is used as the target dataset to train and test the proposed model for recognizing multiple-level triggers. Two different corpora from the BioNLP'09 and BioNLP'11 Shared Tasks, with varying degrees of label overlap with MLEE, are used as source datasets, respectively. Regardless of the degree of overlap, our proposed approach achieves recognition improvement. Moreover, its performance exceeds previously reported results of other leading systems on the same MLEE corpus. CONCLUSIONS: The proposed transfer learning method can further improve the performance compared with the traditional method, when the labels of the source and target datasets overlap. The most essential reason is that our approach has changed the way parameters are shared. The vertical sharing replaces the horizontal sharing, which brings more sharable parameters. Hence, these additional shared parameters between networks effectively improve the performance and generalization of the model on the target domain.


Subjects
Biomedical Research , Data Mining/methods
6.
Neural Netw ; 119: 85-92, 2019 Nov.
Article in English | MEDLINE | ID: mdl-31401529

ABSTRACT

Robust Principal Component Analysis (RPCA) is a powerful tool in machine learning and data mining problems. However, in many real-world applications, RPCA cannot adequately encode the intrinsic geometric structure of data, thereby failing to obtain the lowest rank representation from the corrupted data. To cope with this problem, most existing methods impose a smooth manifold, which is artificially constructed from the original data. This reduces the flexibility of algorithms. Moreover, the graph, which is artificially constructed from the corrupted data, is inexact and does not characterize the true intrinsic structure of real data. To tackle this problem, we propose an adaptive RPCA (ARPCA) to recover the clean data from the high-dimensional corrupted data. Our proposed model is advantageous because: (1) The graph is adaptively constructed upon the clean data such that the system is more flexible. (2) Our model simultaneously learns both the clean data and the similarity matrix that determines the construction of the graph. (3) The clean data has the lowest-rank structure, which enforces correction of the corruptions. Extensive experiments on several datasets illustrate the effectiveness of our model for clustering and low-rank recovery tasks.


Subjects
Data Mining/methods , Machine Learning , Principal Component Analysis/methods , Algorithms , Cluster Analysis
7.
PLoS Comput Biol ; 15(8): e1007239, 2019 Aug.
Article in English | MEDLINE | ID: mdl-31437145

ABSTRACT

Tailored therapy aims to cure cancer patients effectively and safely, based on the complex interactions between patients' genomic features, disease pathology and drug metabolism. Thus, the continual increase in scientific literature drives the need for efficient methods of data mining to improve the extraction of useful information from texts based on patients' genomic features. An important application of text mining to tailored therapy in cancer encompasses the use of mutations and cancer fusion genes as moieties that change patients' cellular networks to develop cancer, and also affect drug metabolism. Fusion proteins, which are derived from the slippage of two parental genes, are produced in cancer by chromosomal aberrations and trans-splicing. Given that the two parental proteins for predicted fusion proteins are known, we used our previously developed method for identifying chimeric protein-protein interactions (ChiPPIs) associated with the fusion proteins. Here, we present a validation approach that receives fusion proteins of interest, predicts their cellular network alterations by ChiPPI and validates them by our new method, ProtFus, using an online literature search. This process resulted in a set of 358 fusion proteins and their corresponding protein interactions, as a training set for a Naïve Bayes classifier, to identify predicted fusion proteins that have reliable evidence in the literature and that were confirmed experimentally. Next, for a test group of 1817 fusion proteins, we were able to identify from the literature 2908 PPIs in total, across 18 cancer types. The described method, ProtFus, can be used for screening the literature to identify unique cases of fusion proteins and their PPIs, as means of studying alterations of protein networks in cancers. Availability: http://protfus.md.biu.ac.il/.


Subjects
Data Mining/methods , Oncogene Fusion Proteins/genetics , Protein Interaction Mapping/methods , Algorithms , Bayes Theorem , Big Data , Computational Biology , Data Mining/statistics & numerical data , Genetic Databases , Humans , Mutation , Neoplasms/genetics , Neoplasms/therapy , Oncogene Fusion Proteins/chemistry , Oncogene Fusion Proteins/metabolism , Precision Medicine , Protein Interaction Mapping/statistics & numerical data , Protein Interaction Maps
8.
Nucleic Acids Res ; 47(18): e110, 2019 Oct 10.
Article in English | MEDLINE | ID: mdl-31400112

ABSTRACT

Natural products represent a rich reservoir of small molecule drug candidates utilized as antimicrobial drugs, anticancer therapies, and immunomodulatory agents. These molecules are microbial secondary metabolites synthesized by co-localized genes termed Biosynthetic Gene Clusters (BGCs). The increase in full microbial genomes and similar resources has led to development of BGC prediction algorithms, although their precision and ability to identify novel BGC classes could be improved. Here we present a deep learning strategy (DeepBGC) that offers reduced false positive rates in BGC identification and an improved ability to extrapolate and identify novel BGC classes compared to existing machine-learning tools. We supplemented this with random forest classifiers that accurately predicted BGC product classes and potential chemical activity. Application of DeepBGC to bacterial genomes uncovered previously undetectable putative BGCs that may code for natural products with novel biologic activities. The improved accuracy and classification ability of DeepBGC represents a major addition to in-silico BGC identification.


Subjects
Biosynthetic Pathways/genetics , Computational Biology/methods , Data Mining/methods , Multigene Family/genetics , Deep Learning , Genome , Bacterial Genome/genetics
9.
BMC Bioinformatics ; 20(1): 427, 2019 Aug 16.
Article in English | MEDLINE | ID: mdl-31419937

ABSTRACT

BACKGROUND: Biomedical named entity recognition (BioNER) is a fundamental and essential task for biomedical literature mining, which affects the performance of downstream tasks. Most BioNER models rely on domain-specific features or hand-crafted rules, but extracting features from massive data requires much time and human effort. To solve this, neural network models are used to automatically learn features. Recently, multi-task learning has been applied successfully to neural network models of biomedical literature mining. For BioNER models, multi-task learning makes use of features from multiple datasets and improves the performance of models. RESULTS: In experiments, we compared our proposed model with other multi-task models and found that our model outperformed the others on datasets of the gene, protein, and disease categories. We also tested the performance of different dataset pairs to find the best dataset partners. In addition, we explored and analyzed the influence of different entity types by using sub-datasets. When dataset size was reduced, our model still produced positive results. CONCLUSION: We propose a novel multi-task model for BioNER with a cross-sharing structure to improve the performance of multi-task models. The cross-sharing structure in our model makes use of features from both datasets in the training procedure. Detailed analysis of the best dataset partners and of the influence between entity categories can provide guidance for choosing proper dataset pairs for multi-task training. Our implementation is available at https://github.com/JogleLew/bioner-cross-sharing.


Subjects
Biomedical Research , Data Mining/methods , Information Dissemination , Algorithms , Databases as Topic , Humans , Machine Learning
10.
Int J Med Inform ; 130: 103947, 2019 Oct.
Article in English | MEDLINE | ID: mdl-31450080

ABSTRACT

BACKGROUND: In recent years, the development and application of emerging information technologies, such as artificial intelligence, cloud computing, Internet of Things, and wearable devices, has expanded the content of electronic health (e-health). Electronic health has become a research focus, but few studies have explored its knowledge structure from a global perspective. METHODS: To detect the evolution track, knowledge base and research hotspots of e-health, we conducted a series of bibliometric analyses on 3,085 papers retrieved from the Web of Science core database for 1992-2017. We used several bibliometric tools, such as HistCite, CiteSpace, NetDraw, and NEViewer, to describe the evolution process, time-and-space knowledge map, and hotspots in e-health. RESULTS: The research results are as follows. (a) The number of publications has increased markedly since 2005 and, according to the trend line, is expected to continue to increase exponentially. (b) Countries/regions conducting e-health research have close cooperative relationships, among which European countries have the closest cooperation. (c) Electronic health records, mobile health and health information technology are research hotspots in electronic health. Moreover, scholars also pay attention to hot issues such as privacy, security, and quality improvement. CONCLUSIONS: Electronic health is a large and growing field with quite a number of research articles in medical journals. This study provides a comprehensive knowledge structure of electronic health for scholars in the healthcare informatics field, which can help them quickly grasp research hotspots and choose future research projects.


Subjects
Bibliometrics , Biomedical Research/trends , Data Mining/methods , Factual Databases , Electronic Health Records/statistics & numerical data , Medical Informatics/trends , Telemedicine/trends , Artificial Intelligence , Cloud Computing , Europe , Humans
11.
Article in English | MEDLINE | ID: mdl-31390774

ABSTRACT

Driven by the pull of gravity, mass-wasting comprises all of the sedimentary processes related to remobilization of sediments deposited on slopes, including creep, sliding, slumping, flow, and fall. It is vital to conduct mass-wasting susceptibility mapping, with the aim of providing decision makers with management advice. The current study presents two individual data mining methods, the frequency ratio (FR) and information value model (IVM) methods, to map mass-wasting susceptibility in four catchments in Miyun County, Beijing, China. To achieve this goal, nine influence factors were used and a mass-wasting inventory map was produced. In this study, 71 mass-wasting locations were investigated in the field. Of these hazard locations, 70% were randomly selected to build the model, and the remaining 30% were used for validation. Finally, a receiver operating characteristic (ROC) curve was used to assess the mass-wasting susceptibility maps produced by the above-mentioned models. Results show that the FR had a higher concordance and spatial differentiation, with respective values of 0.902 (area under the success rate) and 0.883 (area under the prediction rate), while the IVM had lower values of 0.865 (area under the success rate) and 0.855 (area under the prediction rate). Both proposed methodologies are useful for general planning and evaluation purposes, and they are shown to be reasonable models. Slopes of 6-21° were the most common thresholds that controlled occurrence of mass-wasting. Farmland terraces were mainly composed of gravel, mud, and clay, which are more prone to mass-wasting. Mass-wasting susceptibility mapping is feasible and potentially highly valuable. It could provide useful information in support of environmental health policies.


Subjects
Data Mining/methods , Geographic Mapping , Geologic Sediments , Spatial Analysis , Beijing , China , ROC Curve
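
The frequency ratio (FR) method named in the abstract above is commonly defined, per class of an influence factor, as the share of hazard locations falling in that class divided by the share of the study area the class occupies. The following is a minimal sketch under that assumed definition. The slope classes, hazard counts and areas are invented for illustration and are not the study's data.

```python
from typing import Dict


def frequency_ratio(
    hazards_per_class: Dict[str, int],
    area_per_class: Dict[str, float],
) -> Dict[str, float]:
    """FR per class = (share of hazard locations) / (share of study area)."""
    total_hazards = sum(hazards_per_class.values())
    total_area = sum(area_per_class.values())
    return {
        cls: (hazards_per_class[cls] / total_hazards)
        / (area_per_class[cls] / total_area)
        for cls in area_per_class
    }


# Hypothetical slope classes: training hazard counts and class areas (km^2).
hazards = {"0-6 deg": 5, "6-21 deg": 40, ">21 deg": 5}
areas = {"0-6 deg": 50.0, "6-21 deg": 30.0, ">21 deg": 20.0}

fr = frequency_ratio(hazards, areas)
print(fr)
```

Under this definition, a class with FR above 1 hosts more hazard locations than its area alone would predict. A susceptibility score for a map cell is then typically obtained by summing the FR values of the classes the cell belongs to across all influence factors.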
12.
BMC Res Notes ; 12(1): 436, 2019 Jul 19.
Article in English | MEDLINE | ID: mdl-31324263

ABSTRACT

BACKGROUND: Cloud computing is a unique paradigm that aggregates resources available from cloud service providers for use by customers on an on-demand, pay-per-use basis. There is a Cloud federation that integrates the four primary Cloud models and the Cloud aggregator that integrates multiple computing services. A systematic mapping study provides an overview of work done in a particular field of interest and identifies gaps for further research. OBJECTIVES: The objective of this paper was to conduct a study of deployment and design models for Cloud using a systematic mapping process. The methodology involves examining core aspects of the field of study using the research, contribution and topic facets. RESULTS: The results obtained indicated that there were more publications on solution proposals, which constituted 41.98% of papers relating to design and deployment models on the Cloud. Out of this, 5.34% was on security, 1.5% on privacy, 6.11% on configuration, 7.63% on implementation, 11.45% on service deployment, and 9.92% of the solution proposal was on design. The results obtained will be useful for further studies by academia and industry in the broad topic that was examined.


Subjects
Cloud Computing/statistics & numerical data , Computational Biology/statistics & numerical data , Data Mining/statistics & numerical data , Information Management/statistics & numerical data , Computational Biology/methods , Computer Security , Data Mining/methods , Humans , Information Management/methods , Privacy , Research Design/standards , Research Design/statistics & numerical data
13.
Molecules ; 24(15), 2019 Jul 25.
Article in English | MEDLINE | ID: mdl-31349632

ABSTRACT

The complexity of herbal matrix necessitates the development of powerful analytical strategies to enable comprehensive multicomponent characterization. In this work, targeting the multicomponents from Panax japonicus C.A. Meyer, both data dependent acquisition (DDA) and data-independent high-definition MSE (HDMSE) in the negative electrospray ionization mode were used to extend the coverage of untargeted metabolites characterization by ultra-high-performance liquid chromatography (UHPLC) coupled to a Vion™ IM-QTOF (ion-mobility/quadrupole time-of-flight) high-resolution mass spectrometer. Efficient chromatographic separation was achieved by using a BEH Shield RP18 column. Optimized mass-dependent ramp collision energy of DDA enabled more balanced MS/MS fragmentation for mono- to penta-glycosidic ginsenosides. An in-house ginsenoside database containing 504 known ginsenosides and 60 reference compounds was established and incorporated into UNIFI™, by which efficient and automated peak annotation was accomplished. By streamlined data processing workflows, we could identify or tentatively characterize 178 saponins from P. japonicus, of which 75 may not have been isolated from the Panax genus. Amongst them, 168 ginsenosides were characterized based on the DDA data, while 10 were newly identified from the HDMSE data, which indicated their complementary role. In conclusion, the in-depth deconvolution and characterization of multicomponents from P. japonicus were achieved, and the approaches we developed can be an example for comprehensive chemical basis elucidation of traditional Chinese medicine (TCM).


Subjects
High Pressure Liquid Chromatography , Data Mining , Metabolome , Metabolomics , Panax/chemistry , Matrix-Assisted Laser Desorption-Ionization Mass Spectrometry , High Pressure Liquid Chromatography/methods , Data Mining/methods , Metabolomics/methods , Molecular Structure , Panax/metabolism , Matrix-Assisted Laser Desorption-Ionization Mass Spectrometry/methods , Workflow
15.
Value Health ; 22(7): 808-815, 2019 Jul.
Article in English | MEDLINE | ID: mdl-31277828

ABSTRACT

BACKGROUND: Machine learning is increasingly used to predict healthcare outcomes, including cost, utilization, and quality. OBJECTIVE: We provide a high-level overview of machine learning for healthcare outcomes researchers and decision makers. METHODS: We introduce key concepts for understanding the application of machine learning methods to healthcare outcomes research. We first describe current standards to rigorously learn an estimator, which is an algorithm developed through machine learning to predict a particular outcome. We include steps for data preparation, estimator family selection, parameter learning, regularization, and evaluation. We then compare 3 of the most common machine learning methods: (1) decision tree methods that can be useful for identifying how different subpopulations experience different risks for an outcome; (2) deep learning methods that can identify complex nonlinear patterns or interactions between variables predictive of an outcome; and (3) ensemble methods that can improve predictive performance by combining multiple machine learning methods. RESULTS: We demonstrate the application of common machine learning methods to a simulated insurance claims dataset. We specifically include statistical code in R and Python for the development and evaluation of estimators for predicting which patients are at heightened risk for hospitalization from ambulatory care-sensitive conditions. CONCLUSIONS: Outcomes researchers should be aware of key standards for rigorously evaluating an estimator developed through machine learning approaches. Although multiple methods use machine learning concepts, different approaches are best suited for different research problems.


Subjects
Data Mining/methods , Health Services Research/methods , Machine Learning , Healthcare Administrative Claims , Clinical Decision-Making , Cost-Benefit Analysis , Health Care Costs , Humans , Economic Models , Statistical Models , Health Care Quality Indicators
16.
Methods Mol Biol ; 1910: 605-634, 2019.
Article in English | MEDLINE | ID: mdl-31278679

ABSTRACT

Metagenomics, also known as environmental genomics, is the study of the genomic content of a sample of organisms (microbes) obtained from a common habitat. Metagenomics and other "omics" disciplines have captured the attention of researchers for several decades. The effect of microbes in our body is a relevant concern for health studies. There are plenty of studies using metagenomics which examine microorganisms that inhabit niches in the human body, sometimes causing disease, and are often correlated with multiple treatment conditions. No matter from which environment it comes, the analyses are often aimed at determining either the presence or absence of specific species of interest in a given metagenome or comparing the biological diversity and the functional activity of a wider range of microorganisms within their communities. The importance increases for comparison within different environments such as multiple patients with different conditions, multiple drugs, and multiple time points of same treatment or same patient. Thus, no matter how many hypotheses we have, we need a good understanding of genomics, bioinformatics, and statistics to work together to analyze and interpret these datasets in a meaningful way. This chapter provides an overview of different data analyses and statistical approaches (with example scenarios) to analyze metagenomics samples from different medical projects or clinical trials.


Subjects
Computational Biology , Data Mining , Metagenome , Metagenomics , Algorithms , Biodiversity , Computational Biology/methods , Statistical Data Interpretation , Data Mining/methods , Molecular Evolution , Omega-3 Fatty Acids/pharmacology , Gastrointestinal Microbiome/drug effects , Humans , Metagenomics/methods , Microbiota , Molecular Sequence Annotation , Atherosclerotic Plaque/etiology , Workflow
17.
Gene ; 712: 143961, 2019 Sep 05.
Article in English | MEDLINE | ID: mdl-31279709

ABSTRACT

Since International Federation of Gynecology and Obstetrics (FIGO) staging is mainly based on clinical assessment, this study proposed an integrated approach for mining RNA-based biomarkers to understand the molecular deregulation of signaling pathways and RNAs in cervical cancer. Publicly available data were mined for identifying significant RNAs after patient staging. Significant miRNA families were identified from mRNA-miRNA and lncRNA-miRNA interaction network analyses followed by stage-specific mRNA-miRNA-lncRNA association network generation. Integrated bioinformatic analyses of selected mRNAs and lncRNAs were performed. Results suggest that HBA1, HBA2, HBB, SLC2A1, CXCL10 (stage I), PKIA (stage III) and S100A7 (stage IV) were important. miRNA family enrichment of interacting miRNA partners of selected RNAs indicated the enrichment of the let-7 family. Assembly of collagen fibrils and other multimeric structures_Homosapiens_R-HSA-2022090 in pathway analysis and progesterone_CTD_00006624 in DSigDB analysis were the most significant, and SLC2A1, hsa-miR-188-3p, hsa-miR-378a-3p and hsa-miR-150-5p were selected as survival markers.


Subjects
Tumor Biomarkers/metabolism , Computational Biology/methods , Data Mining/methods , Neoplasm RNA/metabolism , Uterine Cervical Neoplasms/genetics , Uterine Cervical Neoplasms/metabolism , Collagen/chemistry , DNA Methylation , Disease Progression , Female , Gene Expression Profiling , Neoplastic Gene Expression Regulation , Gene Regulatory Networks , Humans , MicroRNAs/metabolism , Oligonucleotide Array Sequence Analysis , Papillomaviridae/metabolism , Papillomavirus Infections/complications , Uterine Cervical Neoplasms/virology
18.
Nature ; 571(7763): 95-98, 2019 Jul.
Article in English | MEDLINE | ID: mdl-31270483

ABSTRACT

The overwhelming majority of scientific knowledge is published as text, which is difficult to analyse by either traditional statistical analysis or modern machine learning methods. By contrast, the main source of machine-interpretable data for the materials research community has come from structured property databases [1,2], which encompass only a small fraction of the knowledge present in the research literature. Beyond property values, publications contain valuable knowledge regarding the connections and relationships between data items as interpreted by the authors. To improve the identification and use of this knowledge, several studies have focused on the retrieval of information from scientific literature using supervised natural language processing [3-10], which requires large hand-labelled datasets for training. Here we show that materials science knowledge present in the published literature can be efficiently encoded as information-dense word embeddings [11-13] (vector representations of words) without human labelling or supervision. Without any explicit insertion of chemical knowledge, these embeddings capture complex materials science concepts such as the underlying structure of the periodic table and structure-property relationships in materials. Furthermore, we demonstrate that an unsupervised method can recommend materials for functional applications several years before their discovery. This suggests that latent knowledge regarding future discoveries is to a large extent embedded in past publications. Our findings highlight the possibility of extracting knowledge and relationships from the massive body of scientific literature in a collective manner, and point towards a generalized approach to the mining of scientific literature.


Subjects
Data Mining/methods , Knowledge , Materials Science , Natural Language Processing , Research Report , Research , Terminology as Topic , Unsupervised Machine Learning , Electric Conductivity , Electrodes , Iron , Lithium , Magnetics , Reproducibility of Results , Semantics , Temperature
19.
Nucleic Acids Res ; 47(14): 7247-7261, 2019 Aug 22.
Article in English | MEDLINE | ID: mdl-31265077

ABSTRACT

Scaffold/matrix attachment regions (S/MARs) are DNA elements that compartmentalize chromatin into structural and functional domains. These elements are involved in the control of gene expression, which governs the phenotype and also plays a role in disease biology. Therefore, a genome-wide understanding of these elements holds great therapeutic promise. Several attempts have been made to identify S/MARs in the genomes of various organisms, including human. However, a comprehensive genome-wide map of human S/MARs is not yet available. Toward this objective, ChIP-Seq data of 14 S/MAR binding proteins were analyzed, and the binding site coordinates of these proteins were used to prepare a non-redundant S/MAR dataset of the human genome. Along with coordinate (location) details of S/MARs, the dataset also revealed details of S/MAR features, namely length, inter-S/MAR length (the chromatin loop size), nucleotide repeats, motif abundance, chromosomal distribution and genomic context. The S/MARs identified in the present study and their subsequent analysis also suggest that these elements act as hotspots for integration of retroviruses. Therefore, these data will help toward a better understanding of genome functioning and the design of effective anti-viral therapeutics. To facilitate user-friendly browsing and retrieval of the data obtained in the present study, a web interface, MARome (http://bioinfo.net.in/MARome), has been developed.


Subjects
Chromatin/genetics , DNA/genetics , Genome, Human/genetics , Matrix Attachment Region Binding Proteins/genetics , Matrix Attachment Regions/genetics , Binding Sites/genetics , Chromatin/metabolism , Chromosome Mapping/methods , Computational Biology/methods , DNA/metabolism , Data Mining/methods , Genomics/methods , Humans , Internet , Matrix Attachment Region Binding Proteins/metabolism , Protein Binding , Reproducibility of Results
20.
Nucleic Acids Res ; 47(13): e76, 2019 Jul 26.
Article in English | MEDLINE | ID: mdl-31329928

ABSTRACT

Existing large gene expression data repositories hold enormous potential to elucidate disease mechanisms, characterize changes in cellular pathways, and stratify patients based on molecular profiles. To achieve this goal, integrative resources and tools are needed that allow comparison of results across datasets and data types. We propose an intuitive approach for data-driven stratification of molecular profiles and benchmark our methodology using the dimensionality reduction algorithm t-distributed stochastic neighbor embedding (t-SNE) with multi-study and multi-platform data on hematological malignancies. Our approach enables assessing the contribution of biological versus technical variation to sample clustering, direct incorporation of additional datasets into the same low-dimensional representation, comparison of molecular disease subtypes identified from separate t-SNE representations, and characterization of the obtained clusters based on pathway databases and additional data. In this manner, we performed an integrative analysis across multi-omics acute myeloid leukemia studies. Our approach indicated new molecular subtypes with differential survival and drug responsiveness among samples lacking fusion genes, including a novel myelodysplastic syndrome-like cluster and a cluster characterized by CEBPA mutations and differential activity of the S-adenosylmethionine-dependent DNA methylation pathway. In summary, integration across multiple studies can help to identify novel molecular disease subtypes and generate insight into disease biology.


Subjects
Cluster Analysis , Computational Biology/methods , Data Mining/methods , Datasets as Topic , Gene Expression Profiling/methods , Gene Expression Regulation, Leukemic , Leukemia, Myeloid, Acute/genetics , Phenotype , Algorithms , Databases, Genetic , Genes, Neoplasm , Humans , Leukemia, Myeloid, Acute/classification , Mutation , Sample Size