Results 1 - 7 of 7
1.
BMC Bioinformatics ; 21(1): 217, 2020 May 27.
Article in English | MEDLINE | ID: mdl-32460703

ABSTRACT

BACKGROUND: Enzymatic and chemical reactions are key for understanding biological processes in cells. Curated databases of chemical reactions exist, but these databases struggle to keep up with the exponential growth of the biomedical literature. Conventional text mining pipelines provide tools to automatically extract entities and relationships from the scientific literature and can partially replace expert curation, but such machine learning frameworks often require a large amount of labeled training data and thus lack scalability for both larger document corpora and new relationship types. RESULTS: We developed an application of Snorkel, a weakly supervised learning framework, for extracting chemical reaction relationships from biomedical literature abstracts. For this work, we defined a chemical reaction relationship as the transformation of chemical A to chemical B. We built and evaluated our system on small annotated sets of chemical reaction relationships from two corpora: curated bacteria-related abstracts from the MetaCyc database (MetaCyc_Corpus) and a more general set of abstracts annotated with the MeSH (Medical Subject Headings) term Bacteria (Bacteria_Corpus; a superset of MetaCyc_Corpus). For the MetaCyc_Corpus, we obtained 84% precision and 41% recall (55% F1 score). Extending to the more general Bacteria_Corpus decreased precision to 62% with only a four-point drop in recall to 37% (46% F1 score). Overall, the Bacteria_Corpus contained two orders of magnitude more candidate chemical reaction relationships (nine million candidates vs 68,000 candidates) and had a larger class imbalance (2.5% positives vs 5% positives) as compared to the MetaCyc_Corpus. In total, we extracted 6871 chemical reaction relationships from nine million candidates in the Bacteria_Corpus. CONCLUSIONS: With this work, we built a database of chemical reaction relationships from almost 900,000 scientific abstracts without a large training set of labeled annotations. Further, we showed that our initial application, built on MetaCyc documents enriched with chemical reactions, generalizes to a broader set of bacteria-related articles.
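The F1 scores quoted above are the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short Python sketch below recomputes them from the reported precision and recall as a consistency check; the helper name `f1_score` is our own and is not taken from the authors' code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision/recall values as reported in the abstract.
metacyc = f1_score(0.84, 0.41)
bacteria = f1_score(0.62, 0.37)

print(f"MetaCyc_Corpus F1:  {metacyc:.0%}")   # prints "MetaCyc_Corpus F1:  55%"
print(f"Bacteria_Corpus F1: {bacteria:.0%}")  # prints "Bacteria_Corpus F1: 46%"
```

Both values round to the 55% and 46% reported for the two corpora.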


Subjects
Data Mining/methods , Bacteria/metabolism , Biochemical Phenomena , Databases, Factual , Humans , Machine Learning , Publications , Software
2.
Article in English | MEDLINE | ID: mdl-31777414

ABSTRACT

Labeling training data is one of the most costly bottlenecks in developing machine learning-based applications. We present a first-of-its-kind study showing how existing knowledge resources from across an organization can be used as weak supervision in order to bring development time and cost down by an order of magnitude, and introduce Snorkel DryBell, a new weak supervision management system for this setting. Snorkel DryBell builds on the Snorkel framework, extending it in three critical aspects: flexible, template-based ingestion of diverse organizational knowledge, cross-feature production serving, and scalable, sampling-free execution. On three classification tasks at Google, we find that Snorkel DryBell creates classifiers of comparable quality to ones trained with tens of thousands of hand-labeled examples, converts non-servable organizational resources to servable models for an average 52% performance improvement, and executes over millions of data points in tens of minutes.
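The weak-supervision idea behind Snorkel and Snorkel DryBell can be illustrated with a toy sketch: several noisy heuristics ("labeling functions") each vote on an unlabeled example or abstain, and their votes are combined into a training label. Everything here is hypothetical (the spam task, the keyword and domain lists standing in for organizational resources, and all function names), and the combiner is a plain majority vote, a simpler stand-in for the learned label model that Snorkel-style systems use. It does not use the Snorkel library API.

```python
from collections import Counter
from typing import Callable, List

ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1

# Hypothetical "organizational resources": a keyword list and a domain blocklist.
SPAM_KEYWORDS = {"free", "winner", "click here"}
KNOWN_BAD_DOMAINS = {"spam.example"}

def lf_keywords(text: str) -> int:
    """Vote POSITIVE if any spam keyword appears, otherwise abstain."""
    lowered = text.lower()
    return POSITIVE if any(k in lowered for k in SPAM_KEYWORDS) else ABSTAIN

def lf_bad_domain(text: str) -> int:
    """Vote POSITIVE if the text mentions a blocklisted domain."""
    return POSITIVE if any(d in text for d in KNOWN_BAD_DOMAINS) else ABSTAIN

def lf_short_no_link(text: str) -> int:
    """Vote NEGATIVE for short messages without links."""
    return NEGATIVE if len(text) < 40 and "http" not in text else ABSTAIN

LFS: List[Callable[[str], int]] = [lf_keywords, lf_bad_domain, lf_short_no_link]

def majority_vote(text: str) -> int:
    """Combine non-abstaining votes; ties and all-abstain yield ABSTAIN."""
    votes = [v for v in (lf(text) for lf in LFS) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN
    return counts[0][0]

docs = [
    "You are a WINNER! Click here: http://spam.example/prize",
    "See you at lunch?",
    "Quarterly report attached, see http://intranet.example/q3",
]
labels = [majority_vote(d) for d in docs]
print(labels)  # prints [1, 0, -1]
```

Examples that every labeling function abstains on (the third document) stay unlabeled; a classifier trained on the labeled ones can still generalize to them.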

3.
Nat Commun ; 9(1): 5153, 2018 12 04.
Article in English | MEDLINE | ID: mdl-30514844

ABSTRACT

Regeneration of complex multi-tissue structures, such as limbs, requires the coordinated effort of multiple cell types. In axolotl limb regeneration, the wound epidermis and blastema have been extensively studied via histology, grafting, and bulk-tissue RNA-sequencing. However, defining the contributions of these tissues is hindered by limited information about the molecular identity of the cell types in regenerating limbs. Here we report unbiased single-cell RNA-sequencing on over 25,000 cells from axolotl limbs and identify a plethora of cellular diversity within epidermal, mesenchymal, and hematopoietic lineages in homeostatic and regenerating limbs. We identify regeneration-induced genes, develop putative trajectories for blastema cell differentiation, and propose the molecular identity of fibroblast-like blastema progenitor cells. This work will enable application of molecular techniques to assess the contribution of these populations to limb regeneration. Overall, these data allow for establishment of a putative framework for adult axolotl limb regeneration.


Subjects
Extremities/physiology , Gene Expression Regulation, Developmental/physiology , Regeneration , Transcriptome , Ambystoma mexicanum/genetics , Ambystoma mexicanum/physiology , Animal Experimentation , Animals , Cell Differentiation , Cell Lineage , Epidermal Cells , Epidermis/pathology , Epidermis/physiology , Extremities/embryology , Extremities/pathology , Fibroblasts/cytology , Fibroblasts/physiology , Gene Expression Profiling , Gene Expression Regulation, Developmental/genetics , Immune System/physiology , In Situ Hybridization , Macrophages , Mesenchymal Stem Cells , Myeloid Cells/physiology , Nerve Regeneration/physiology , Neurons/physiology , Regeneration/genetics , Sequence Analysis, RNA , Stem Cells/cytology , Stem Cells/physiology
4.
Nat Neurosci ; 21(7): 1017, 2018 Jul.
Article in English | MEDLINE | ID: mdl-29752482

ABSTRACT

In the version of this article initially published, the x-axis labels in Fig. 3c read Vglut, Gad1/2, Aldh1l1 and Pecam1; they should have read Vglut+, Gad1/2+, Aldh1l1+ and Pecam1+. In Fig. 4, the range values were missing from the color scales; they are, from left to right, 4-15, 0-15, 4-15 and 0-15 in Fig. 4a and 4-15, 4-15 and 4-8 in Fig. 4h. In the third paragraph of the main text, the phrase reading "Previous approaches have analyzed a limited number of inhibitory cell types, thus masking the full diversity of excitatory populations" should have read "Previous approaches have analyzed a limited number of inhibitory cell types and masked the full diversity of excitatory populations." In the second paragraph of Results section "Diversity of experience-regulated ERGs," the phrase reading "thus suggesting considerable divergence within the gene expression program responding to early stimuli" should have read "thus suggesting considerable divergence within the early stimulus-responsive gene expression program." In the fourth paragraph of Results section "Excitatory neuronal LRGs," the sentence reading "The anatomical organization of these cell types into sublayers, coupled with divergent transcriptional responses to a sensory stimulus, suggested previously unappreciated functional subdivisions located within the laminae of the mouse visual cortex and resembling the cytoarchitecture in higher mammals" should have read "The anatomical organization of these cell types into sublayers, coupled with divergent transcriptional responses to a sensory stimulus, suggests previously unappreciated functional subdivisions located within the laminae of the mouse visual cortex, resembling the cytoarchitecture in higher mammals." In the last sentence of the Results, "sensory-responsive genes" should have read "sensory-stimulus-responsive genes." The errors have been corrected in the HTML and PDF versions of the article.

5.
Article in English | MEDLINE | ID: mdl-30931438

ABSTRACT

Many real-world machine learning problems are challenging to tackle for two reasons: (i) they involve multiple sub-tasks at different levels of granularity; and (ii) they require large volumes of labeled training data. We propose Snorkel MeTaL, an end-to-end system for multi-task learning that leverages weak supervision provided at multiple levels of granularity by domain expert users. In MeTaL, a user specifies a problem consisting of multiple, hierarchically related sub-tasks (for example, classifying a document at multiple levels of granularity) and then provides labeling functions for each sub-task as weak supervision. MeTaL learns a re-weighted model of these labeling functions and uses the combined signal to train a hierarchical multi-task network, which is automatically compiled from the structure of the sub-tasks. Using MeTaL on a radiology report triage task and a fine-grained news classification task, we achieve average gains of 11.2 accuracy points over a baseline supervised approach and 9.5 accuracy points over the predictions of the user-provided labeling functions.
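The idea of a "re-weighted model of these labeling functions" can be illustrated, in a simplified binary setting, by weighting each vote by the log-odds of its accuracy. This sketch assumes the accuracies are already known and that labeling functions err independently and symmetrically under a uniform class prior; MeTaL instead estimates such weights from unlabeled data and handles hierarchical multi-task labels, neither of which is shown here. The names `log_odds` and `weighted_vote` are hypothetical.

```python
import math
from typing import Sequence

ABSTAIN = -1

def log_odds(accuracy: float) -> float:
    """Weight for a labeling function that is correct with probability `accuracy`."""
    return math.log(accuracy / (1.0 - accuracy))

def weighted_vote(votes: Sequence[int], accuracies: Sequence[float]) -> float:
    """Return P(y = 1) for binary labels {0, 1}, assuming conditionally
    independent labeling functions with symmetric errors and a uniform prior."""
    score = 0.0
    for vote, acc in zip(votes, accuracies):
        if vote == ABSTAIN:
            continue  # abstentions carry no evidence
        # A vote for 1 adds evidence for class 1; a vote for 0 subtracts it.
        score += log_odds(acc) if vote == 1 else -log_odds(acc)
    return 1.0 / (1.0 + math.exp(-score))

# One reliable source votes 1; two weak sources vote 0.
p = weighted_vote([1, 0, 0], [0.9, 0.6, 0.6])
print(round(p, 2))  # prints 0.8
```

An unweighted majority would label this example 0, but the posterior odds are 9 × (0.4/0.6)² = 4, so the weighted combination gives P(y = 1) = 0.8: one accurate source outweighs two weak ones.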

6.
Nat Neurosci ; 21(1): 120-129, 2018 01.
Article in English | MEDLINE | ID: mdl-29230054

ABSTRACT

Activity-dependent transcriptional responses shape cortical function. However, a comprehensive understanding of the diversity of these responses across the full range of cortical cell types, and how these changes contribute to neuronal plasticity and disease, is lacking. To investigate the breadth of transcriptional changes that occur across cell types in the mouse visual cortex after exposure to light, we applied high-throughput single-cell RNA sequencing. We identified significant and divergent transcriptional responses to stimulation in each of the 30 cell types characterized, thus revealing 611 stimulus-responsive genes. Excitatory pyramidal neurons exhibited inter- and intralaminar heterogeneity in the induction of stimulus-responsive genes. Non-neuronal cells showed clear transcriptional responses that may regulate experience-dependent changes in neurovascular coupling and myelination. Together, these results reveal the dynamic landscape of the stimulus-dependent transcriptional changes occurring across cell types in the visual cortex; these changes are probably critical for cortical function and may be sites of deregulation in developmental brain disorders.


Subjects
Neuroglia/physiology , Neurons/physiology , Transcription, Genetic/physiology , Transcriptome/physiology , Visual Cortex/cytology , Animals , Basic Helix-Loop-Helix Transcription Factors/metabolism , Gene Expression Regulation/physiology , Gene Ontology , Light , Male , Mice , Mice, Inbred C57BL , Nerve Tissue Proteins/genetics , Nerve Tissue Proteins/metabolism , Neural Inhibition/physiology , Neurons/cytology , Neurovascular Coupling/physiology , Photic Stimulation , Proto-Oncogene Proteins c-fos/metabolism , Signal Transduction/physiology , Single-Cell Analysis/methods , Statistics, Nonparametric , Visual Pathways
7.
SIGMOD Rec ; 45(1): 60-67, 2016 Mar.
Article in English | MEDLINE | ID: mdl-28344371

ABSTRACT

The dark data extraction or knowledge base construction (KBC) problem is to populate a SQL database with information from unstructured data sources, including emails, webpages, and PDF reports. KBC is a long-standing problem in industry and research that encompasses problems of data extraction, cleaning, and integration. We describe DeepDive, a system that combines database and machine learning ideas to help develop KBC systems. The key idea in DeepDive is that statistical inference and machine learning are key tools to attack classical data problems in extraction, cleaning, and integration in a unified and more effective manner. DeepDive programs are declarative in that one cannot write probabilistic inference algorithms; instead, one interacts by defining features or rules about the domain. A key reason for this design choice is to enable domain experts to build their own KBC systems. We present the applications, abstractions, and techniques of DeepDive employed to accelerate construction of KBC systems.
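To make the KBC task concrete, the toy Python sketch below extracts candidate relation mentions from text, scores each with a logistic function of declared features, and writes the results into a SQL table using SQLite. It is not DeepDive's language or API: the spouse relation, the regex pattern, the feature names, and the fixed weights are all hypothetical, and DeepDive would learn such weights through statistical inference rather than take them as given.

```python
import math
import re
import sqlite3
from typing import List, Tuple

# Hypothetical feature weights, fixed by hand here; a system like DeepDive
# would learn weights for user-declared features from data.
WEIGHTS = {"verb=married": 3.0, "verb=met": -2.0}
BIAS = -0.5

# Toy candidate pattern "<Name> <verb> <Name>" (an assumption, not DeepDive syntax).
PATTERN = re.compile(r"\b([A-Z][a-z]+) (married|met) ([A-Z][a-z]+)\b")

def extract(sentence: str) -> List[Tuple[str, str, List[str]]]:
    """Return (person1, person2, features) for each candidate mention."""
    return [(m.group(1), m.group(3), [f"verb={m.group(2)}"])
            for m in PATTERN.finditer(sentence)]

def probability(features: List[str]) -> float:
    """Logistic function of the summed feature weights plus a bias."""
    score = BIAS + sum(WEIGHTS.get(f, 0.0) for f in features)
    return 1.0 / (1.0 + math.exp(-score))

corpus = ["Alice married Bob in 2010.", "Carol met Dave at a conference."]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE has_spouse (p1 TEXT, p2 TEXT, prob REAL)")
for sentence in corpus:
    for p1, p2, feats in extract(sentence):
        conn.execute("INSERT INTO has_spouse VALUES (?, ?, ?)",
                     (p1, p2, probability(feats)))

# Keep only facts the model considers more likely true than not.
rows = conn.execute(
    "SELECT p1, p2 FROM has_spouse WHERE prob > 0.5 ORDER BY p1").fetchall()
print(rows)  # prints [('Alice', 'Bob')]
```

Storing a probability with every extracted tuple, rather than a hard yes/no, lets downstream SQL queries choose their own confidence threshold.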
