Search | Virtual Health Library (Biblioteca Virtual em Saúde)

1.

Mobile circular DNAs regulating memory and communication in CNS neurons.

Smalheiser, Neil R.

Front Mol Neurosci ; 16: 1304667, 2023.

Article in English | MEDLINE | ID: mdl-38125007

ABSTRACT

Stimuli that activate neurons elicit transcription of immediate-early genes, a process which requires local sites of chromosomal DNA to form double-strand breaks (DSBs) generated by topoisomerase IIb within a few minutes, followed by repair within a few hours. Wakefulness, exploring a novel environment, and contextual fear conditioning also elicit turn-on of synaptic genes requiring DSBs and repair. It has been reported (in non-neuronal cells) that extrachromosomal circular DNA can form at DSBs as the sites are repaired. I propose that activated neurons may generate extrachromosomal circular DNAs during repair at DSB sites, thus creating long-lasting "markers" of that activity pattern which contain sequences from their sites of origin and which regulate long-term gene expression. Although the population of extrachromosomal DNAs is diverse and overall associated with pathology, a subclass of small circular DNAs ("microDNAs," ~100-400 bases long) largely derives from unique genomic sequences and has attractive features to act as stable, mobile circular DNAs to regulate gene expression in a sequence-specific manner. Circular DNAs can be templates for the transcription of RNAs, particularly small inhibitory siRNAs, circular RNAs and other non-coding RNAs that interact with microRNAs. These may regulate translation and transcription of other genes involved in synaptic plasticity, learning and memory. Another possible fate for mobile DNAs is to be inserted stably into chromosomes after new DSB sites are generated in response to subsequent activation events. Thus, the insertions of mobile DNAs into activity-induced genes may tend to inactivate them and aid in homeostatic regulation to avoid over-excitation, as well as providing a "counter" for a neuron's activation history. Moreover, activated neurons release secretory exosomes that can be transferred to recipient cells to regulate their gene expression. Mobile DNAs may be packaged into exosomes, released in an activity-dependent manner, and transferred to recipient cells, where they may be templates for regulatory RNAs and possibly incorporated into chromosomes. Finally, aging and neurodegenerative diseases (including Alzheimer's disease) are also associated with an increase in DSBs in neurons. It will become important in the future to assess how pathology-associated DSBs may relate to activity-induced mobile DNAs, and whether the latter may potentially contribute to pathogenesis.

2.

Editorial: Emerging areas in literature-based discovery.

Sebastian, Yakub; Smalheiser, Neil R.

Front Res Metr Anal ; 8: 1122547, 2023.

Article in English | MEDLINE | ID: mdl-36741345

3.

Testing a filtering strategy for systematic reviews: evaluating work savings and recall.

Proescholdt, Randi; Hsiao, Tzu-Kun; Schneider, Jodi; Cohen, Aaron M; McDonagh, Marian S; Smalheiser, Neil R.

AMIA Jt Summits Transl Sci Proc ; 2022: 406-413, 2022.

Article in English | MEDLINE | ID: mdl-35854734

ABSTRACT

Systematic reviews are extremely time-consuming. The goal of this work is to assess work savings and recall for a publication type filtering strategy that uses the output of two machine learning models, Multi-Tagger and web RCT Tagger, applied retrospectively to 10 systematic reviews on drug effectiveness. Our filtering strategy resulted in mean work savings of 33.6% and recall of 98.3%. Of 363 articles finally included in any of the systematic reviews, 7 were filtered out by our strategy, but 1 "error" was actually an article using a publication type that the SR team had not pre-specified as relevant for inclusion. Our analysis suggests that automated publication type filtering can potentially provide substantial work savings with minimal loss of included articles. Publication type filtering should be personalized for each systematic review and might be combined with other filtering or ranking methods to provide additional work savings for manual triage.
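
The two evaluation measures named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and defining work savings as the fraction of retrieved articles removed before manual screening is an assumption. Only the counts 363 and 7 come from the abstract; pooling them gives a recall of about 98.1%, which differs slightly from the reported 98.3% because the latter is a mean over the individual reviews.

```python
def recall(included_total: int, included_filtered_out: int) -> float:
    """Fraction of finally included articles that survive the filter."""
    return (included_total - included_filtered_out) / included_total


def work_savings(screened_total: int, filtered_out: int) -> float:
    """Fraction of retrieved articles removed before manual screening
    (assumed definition; the abstract does not spell it out)."""
    return filtered_out / screened_total


# Counts reported in the abstract: 363 included articles, 7 filtered out.
pooled_recall = recall(363, 7)
print(f"pooled recall: {pooled_recall:.3f}")

# Hypothetical review: 1000 retrieved articles, 336 removed by the filter.
print(f"work savings: {work_savings(1000, 336):.3f}")
```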

4.

Evaluation of publication type tagging as a strategy to screen randomized controlled trial articles in preparing systematic reviews.

Schneider, Jodi; Hoang, Linh; Kansara, Yogeshwar; Cohen, Aaron M; Smalheiser, Neil R.

JAMIA Open ; 5(1): ooac015, 2022 Apr.

Article in English | MEDLINE | ID: mdl-35571360

ABSTRACT

Objectives: To produce a systematic review (SR), reviewers typically screen thousands of titles and abstracts of articles manually to find a small number which are read in full text to find relevant articles included in the final SR. Here, we evaluate a proposed automated probabilistic publication type screening strategy applied to the randomized controlled trial (RCT) articles (i.e., those which present clinical outcome results of RCT studies) included in a corpus of previously published Cochrane reviews. Materials and Methods: We selected a random subset of 558 published Cochrane reviews that specified RCT study only inclusion criteria, containing 7113 included articles which could be matched to PubMed identifiers. These were processed by our automated RCT Tagger tool to estimate the probability that each article reports clinical outcomes of a RCT. Results: Removing articles with low predictive scores P < 0.01 eliminated 288 included articles, of which only 22 were actually typical RCT articles, and only 18 were actually typical RCT articles that MEDLINE indexed as such. Based on our sample set, this screening strategy led to fewer than 0.05 relevant RCT articles being missed on average per Cochrane SR. Discussion: This scenario, based on real SRs, demonstrates that automated tagging can identify RCT articles accurately while maintaining very high recall. However, we also found that even SRs whose inclusion criteria are restricted to RCT studies include not only clinical outcome articles per se, but a variety of ancillary article types as well. Conclusions: This encourages further studies learning how best to incorporate automated tagging of additional publication types into SR triage workflows.

5.

The Citation Cloud of a biomedical article: a free, public, web-based tool enabling citation analysis.

Smalheiser, Neil R; Schneider, Jodi; Torvik, Vetle I; Fragnito, Dean P; Tirk, Eric E.

J Med Libr Assoc ; 110(1): 103-108, 2022 Jan 01.

Article in English | MEDLINE | ID: mdl-35210969

ABSTRACT

BACKGROUND: An article's citations are useful for finding related articles that may not be readily found by keyword searches or textual similarity. Citation analysis is also important for analyzing scientific innovation and the structure of the biomedical literature. We wanted to facilitate citation analysis for the broad community by providing a user-friendly interface for accessing and analyzing citation data for biomedical articles. CASE PRESENTATION: We seeded the Citation Cloud dataset with over 465 million open access citations culled from six different sources: PubMed Central, Microsoft Academic Graph, ArnetMiner, Semantic Scholar, Open Citations, and the NIH iCite dataset. We implemented a free, public extension to PubMed that allows any user to visualize and analyze the entire citation cloud around any paper of interest A: the set of articles cited by A, those which cite A, those which are co-cited with A, and those which are bibliographically coupled to A. CONCLUSIONS: Citation Cloud greatly enables the study of citations by the scientific community, including relatively advanced analyses (co-citations and bibliographic coupling) that cannot be undertaken using other available tools. The tool can be accessed by running any PubMed query on the Anne O'Tate value-added search interface and clicking on the Citations button next to any retrieved article.

Subjects

Bibliometrics; Publications; Internet; PubMed

6.

A web-based tool for automatically linking clinical trials to their publications.

Smalheiser, Neil R; Holt, Arthur W.

J Am Med Inform Assoc ; 29(5): 822-830, 2022 04 13.

Article in English | MEDLINE | ID: mdl-35020887

ABSTRACT

OBJECTIVE: Evidence synthesis teams, physicians, policy makers, and patients and their families all have an interest in following the outcomes of clinical trials and would benefit from being able to evaluate both the results posted in trial registries and in the publications that arise from them. Manual searching for publications arising from a given trial is a laborious and uncertain process. We sought to create a statistical model to automatically identify PubMed articles likely to report clinical outcome results from each registered trial in ClinicalTrials.gov. MATERIALS AND METHODS: A machine learning-based model was trained on pairs (publications known to be linked to specific registered trials). Multiple features were constructed based on the degree of matching between the PubMed article metadata and specific fields of the trial registry, as well as matching with the set of publications already known to be linked to that trial. RESULTS: Evaluation of the model using known linked articles as gold standard showed that they tend to be top ranked (median best rank = 1.0), and 91% of them are ranked in the top 10. DISCUSSION: Based on this model, we have created a free, public web-based tool that, given any registered trial in ClinicalTrials.gov, presents a ranked list of the PubMed articles in order of estimated probability that they report clinical outcome data from that trial. The tool should greatly facilitate studies of trial outcome results and their relation to the original trial designs.

Subjects

Machine Learning; Research Report; Clinical Trials as Topic; Humans; Internet; PubMed; Registries

7.

Editorial: Coronavirus Research Landscape: Resources, Utilities, and Analytic Studies.

Chen, Chaomei; Chavalarias, David; Smalheiser, Neil R; Wolfram, Dietmar.

Front Res Metr Anal ; 6: 712672, 2021.

Article in English | MEDLINE | ID: mdl-34250436

8.

Anne O'Tate: Value-added PubMed search engine for analysis and text mining.

Smalheiser, Neil R; Fragnito, Dean P; Tirk, Eric E.

PLoS One ; 16(3): e0248335, 2021.

Article in English | MEDLINE | ID: mdl-33684153

ABSTRACT

Over a decade ago, we introduced Anne O'Tate, a free, public web-based tool http://arrowsmith.psych.uic.edu/cgi-bin/arrowsmith_uic/AnneOTate.cgi to support user-driven summarization, drill-down and mining of search results from PubMed, the leading search engine for biomedical literature. A set of hotlinked buttons allows the user to sort and rank retrieved articles according to important words in titles and abstracts; topics; author names; affiliations; journal names; publication year; and clustered by topic. Any result can be further mined by choosing any other button, and small search results can be expanded to include related articles. It has been deployed continuously, serving a wide range of biomedical users and needs, and over time has also served as a platform to support the creation of new tools that address additional needs. Here we describe the current, greatly expanded implementation of Anne O'Tate, which has added additional buttons to provide new functionalities: We now allow users to sort and rank search results by important phrases contained in titles and abstracts; the number of authors listed on the article; and pairs of topics that co-occur significantly more than chance. We also display articles according to NLM-indexed publication types, as well as according to 50 different publication types and study designs as predicted by a novel machine learning-based model. Furthermore, users can import search results into two new tools: Mine the Gap!, which identifies pairs of topics that are under-represented within the set of search results, and Citation Cloud, which, for any given article, allows users to visualize the set of articles that cite it; that are cited by it; that are co-cited with it; and that are bibliographically coupled to it. We invite the scientific community to explore how Anne O'Tate can assist in analyzing biomedical literature, in a variety of use cases.

Subjects

Abstracting and Indexing; Data Mining/trends; PubMed/trends; Search Engine; Humans; Software

9.

Effect size, sample size and power of forced swim test assays in mice: Guidelines for investigators to optimize reproducibility.

Smalheiser, Neil R; Graetz, Elena E; Yu, Zhou; Wang, Jing.

PLoS One ; 16(2): e0243668, 2021.

Article in English | MEDLINE | ID: mdl-33626103

ABSTRACT

A recent flood of publications has documented serious problems in scientific reproducibility, power, and reporting of biomedical articles, yet scientists persist in their usual practices. Why? We examined a popular and important preclinical assay, the Forced Swim Test (FST) in mice used to test putative antidepressants. Whether the mice were assayed in a naïve state vs. in a model of depression or stress, and whether the mice were given test agents vs. known antidepressants regarded as positive controls, the mean effect sizes seen in the experiments were indeed extremely large (1.5-2.5 in Cohen's d units); most of the experiments utilized 7-10 animals per group which did have adequate power to reliably detect effects of this magnitude. We propose that this may at least partially explain why investigators using the FST do not perceive intuitively that their experimental designs fall short, even though proper prospective design would require ~21-26 animals per group to detect, at a minimum, large effects (0.8 in Cohen's d units) when the true effect of a test agent is unknown. Our data provide explicit parameters and guidance for investigators seeking to carry out prospective power estimation for the FST. More generally, altering the real-life behavior of scientists in planning their experiments may require developing educational tools that allow them to actively visualize the inter-relationships among effect size, sample size, statistical power, and replicability in a direct and intuitive manner.

Subjects

Antidepressive Agents/pharmacology; Drug Evaluation, Preclinical/methods; Exercise Test/methods; Mice; Physical Conditioning, Animal/methods; Research Design; Animals; Depression/drug therapy; Female; Male; Mice/physiology; Reproducibility of Results; Sample Size; Swimming
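
The relationship between effect size and group size described in the abstract above can be illustrated with a standard textbook approximation. This is a rough sketch, not the authors' procedure: it uses the normal approximation for a two-group comparison of means, and the significance level (0.05) and target power (0.80) are assumptions not stated in the abstract. An exact t-test calculation gives somewhat larger group sizes than this approximation. The function name `n_per_group` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80,
                two_sided: bool = True) -> int:
    """Approximate animals per group needed to detect Cohen's d.

    Normal approximation: n = 2 * (z_alpha + z_beta)^2 / d^2,
    rounded up to a whole animal.
    """
    z = NormalDist()  # standard normal distribution
    tail = alpha / 2 if two_sided else alpha
    z_alpha = z.inv_cdf(1 - tail)
    z_beta = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


# Large effect (d = 0.8) versus the very large effects reported for the FST.
for d in (0.8, 1.5, 2.5):
    print(d, n_per_group(d), n_per_group(d, two_sided=False))
```

Under these assumptions, a group size in the single digits suffices only when the true effect is as large as d = 1.5 or more, whereas d = 0.8 calls for roughly three times as many animals per group.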

10.

New improved Aggregator: predicting which clinical trial articles derive from the same registered clinical trial.

Smalheiser, Neil R; Holt, Arthur W.

JAMIA Open ; 3(3): 338-341, 2020 Oct.

Article in English | MEDLINE | ID: mdl-33215068

ABSTRACT

OBJECTIVES: To identify separate publications that report outcomes from the same underlying clinical trial, in order to avoid over-counting these as independent pieces of evidence. MATERIALS AND METHODS: We updated our previous model by creating larger, more recent, and more diverse positive and negative training sets consisting of article pairs that were (or not) linked to the same ClinicalTrials.gov trial registry number. Features were extracted from PubMed metadata; pairwise similarity scores were modeled using logistic regression and used to form clusters of articles that are likely to arise from the same registered clinical trial. RESULTS: Articles from the same trial were identified with high accuracy (F1 = 0.859), nominally better than the previous model (F1 = 0.843). Predicted clusters showed a low error rate of splitting of 8-11% (ie, when 2 articles belonged to the same trial but were assigned to different clusters). Performance was similar whether only randomized controlled trial articles or a more diverse set of clinical trial articles were processed. DISCUSSION: Metadata are surprisingly accurate in predicting when 2 articles derive from the same underlying clinical trial. CONCLUSION: We have continued confidence in the Aggregator tool which can be accessed publicly at http://arrowsmith.psych.uic.edu/cgi-bin/arrowsmith_uic/RCT_Tagger.cgi.

11.

Identifying main finding sentences in clinical case reports.

Luo, Mengqi; Cohen, Aaron M; Addepalli, Sidharth; Smalheiser, Neil R.

Database (Oxford) ; 2020, 2020 01 01.

Article in English | MEDLINE | ID: mdl-32525207

ABSTRACT

Clinical case reports are the 'eyewitness reports' of medicine and provide a valuable, unique, albeit noisy and underutilized type of evidence. Generally, a case report has a single main finding that represents the reason for writing up the report in the first place. However, no one has previously created an automatic way of identifying main finding sentences in case reports. We previously created a manual corpus of main finding sentences extracted from the abstracts and full text of clinical case reports. Here, we have utilized the corpus to create a machine learning-based model that automatically predicts which sentence(s) from abstracts state the main finding. The model has been evaluated on a separate manual corpus of clinical case reports and found to have good performance. This is a step toward setting up a retrieval system in which, given one case report, one can find other case reports that report the same or very similar main findings. The code and necessary files to run the main finding model can be downloaded from https://github.com/qi29/main_finding_recognition, released under the Apache License, Version 2.0.

Subjects

Data Mining/methods; Machine Learning; Medical Records/classification; Humans; Natural Language Processing; Software

12.

A Neglected Link Between the Psychoactive Effects of Dietary Ingredients and Consciousness-Altering Drugs.

Smalheiser, Neil R.

Front Psychiatry ; 10: 591, 2019.

Article in English | MEDLINE | ID: mdl-31474892

13.

Ketamine: A Neglected Therapy for Alzheimer Disease.

Smalheiser, Neil R.

Front Aging Neurosci ; 11: 186, 2019.

Article in English | MEDLINE | ID: mdl-31396078

14.

Mining Clinical Case Reports to Identify New Lines of Investigation in Alzheimer's Disease: The Curious Case of DNase I.

Smalheiser, Neil R.

J Alzheimers Dis Rep ; 3(1): 71-76, 2019 Mar 22.

Article in English | MEDLINE | ID: mdl-31025031

ABSTRACT

Mining the case report literature identified an intriguing, yet neglected finding: Deoxyribonuclease I (DNase I) as a possible treatment for Alzheimer's disease. This finding is speculative, both because it is based on one patient, and because the underlying mechanism(s) of action remain obscure. However, further literature review revealed that there are several plausible mechanisms by which DNase I might affect the course of Alzheimer's disease. Given that DNase I is an FDA-approved drug, with extensive studies in both animals and man in the context of other diseases, I suggest that investigation of DNase I in Alzheimer's disease is worthwhile.

15.

Unsupervised low-dimensional vector representations for words, phrases and text that are transparent, scalable, and produce similarity metrics that are not redundant with neural embeddings.

Smalheiser, Neil R; Cohen, Aaron M; Bonifield, Gary.

J Biomed Inform ; 90: 103096, 2019 02.

Article in English | MEDLINE | ID: mdl-30654030

ABSTRACT

Neural embeddings are a popular set of methods for representing words, phrases or text as a low dimensional vector (typically 50-500 dimensions). However, it is difficult to interpret these dimensions in a meaningful manner, and creating neural embeddings requires extensive training and tuning of multiple parameters and hyperparameters. We present here a simple unsupervised method for representing words, phrases or text as a low dimensional vector, in which the meaning and relative importance of dimensions is transparent to inspection. We have created a near-comprehensive vector representation of words, and selected bigrams, trigrams and abbreviations, using the set of titles and abstracts in PubMed as a corpus. This vector is used to create several novel implicit word-word and text-text similarity metrics. The implicit word-word similarity metrics correlate well with human judgement of word pair similarity and relatedness, and outperform or equal all other reported methods on a variety of biomedical benchmarks, including several implementations of neural embeddings trained on PubMed corpora. Our implicit word-word metrics capture different aspects of word-word relatedness than word2vec-based metrics and are only partially correlated (rho = 0.5-0.8 depending on task and corpus). The vector representations of words, bigrams, trigrams, abbreviations, and PubMed title + abstracts are all publicly available from http://arrowsmith.psych.uic.edu/arrowsmith_uic/word_similarity_metrics.html under a CC-BY-NC license. Several public web query interfaces are also available at the same site, including one which allows the user to specify a given word and view its most closely related terms according to direct co-occurrence as well as different implicit similarity metrics.

Subjects

Data Mining; PubMed; Semantics

16.

A manual corpus of annotated main findings of clinical case reports.

Smalheiser, Neil R; Luo, Mengqi; Addepalli, Sidharth; Cui, Xiaokai.

Database (Oxford) ; 2019, 2019 01 01.

Article in English | MEDLINE | ID: mdl-30657910

ABSTRACT

Clinical case reports are the 'eyewitness reports' of medicine and provide a valuable, unique, albeit noisy and underutilized type of evidence. Generally a case report has a single main finding that represents the reason for writing up the report in the first place. In the present study, we present the results of manual annotation carried out by two individuals on 500 randomly sampled case reports. This corpus contains main finding sentences extracted from title, abstract and full-text of the same article that can be regarded as semantically related and are often paraphrases. The final reconciled corpus of 416 articles comprises an open resource for further study. This is the first step in establishing text mining models and tools that can identify main finding sentences in an automated fashion, and in measuring quantitatively how similar any two main findings are. We envision that case reports in PubMed may be automatically indexed by main finding, so that users can carry out information queries for specific main findings (rather than general topics), and given one case report, a user can retrieve those having the most similar main findings. The metric of main finding similarity may also potentially be relevant to the modeling of paraphrasing, summarization and entailment within the biomedical literature.

Subjects

Data Curation/methods; Data Mining/methods; Medical Records; PubMed; Databases, Factual; Humans; Semantics; Terminology as Topic

17.

A probabilistic automated tagger to identify human-related publications.

Cohen, Aaron M; Dunivin, Zackary O; Smalheiser, Neil R.

Database (Oxford) ; 2018: 1-8, 2018 01 01.

Article in English | MEDLINE | ID: mdl-30184195

ABSTRACT

The Medical Subject Heading 'Humans' is manually curated and indicates human-related studies within MEDLINE. However, newly published MEDLINE articles may take months to be indexed and non-MEDLINE articles lack consistent, transparent indexing of this feature. Therefore, for up-to-date and broad literature searches, there is a need for an independent automated system to identify whether a given publication is human-related, particularly when it lacks Medical Subject Headings. One million MEDLINE records published in 1987-2014 were randomly selected. Text-based features from the title, abstract, author name and journal fields were extracted. A linear support vector machine was trained to estimate the probability that a given article should be indexed as Humans and was evaluated on records from 2015 to 2016. Overall accuracy was high: area under the receiver operating characteristic curve = 0.976, F1 = 95% relative to MeSH indexing. Manual review of cases of extreme disagreement with MEDLINE showed 73.5% agreement with the automated prediction. We have tagged all articles indexed in PubMed with predictive scores and have made the information publicly available at http://arrowsmith.psych.uic.edu/evidence_based_medicine/index.html. We have also made available a web-based interface to allow users to obtain predictive scores for non-MEDLINE articles. This will assist in the triage of clinical evidence for writing systematic reviews.

Subjects

Automation; Probability; Publications; Calibration; Databases as Topic; Humans; Reproducibility of Results

18.

Design of a generic, open platform for machine learning-assisted indexing and clustering of articles in PubMed, a biomedical bibliographic database.

Smalheiser, Neil R; Cohen, Aaron M.

Data Inf Manag ; 2(1): 27-36, 2018 Jun.

Article in English | MEDLINE | ID: mdl-30766970

ABSTRACT

Many investigators have carried out text mining of the biomedical literature for a variety of purposes, ranging from the assignment of indexing terms to the disambiguation of author names. A common approach is to define positive and negative training examples, extract features from article metadata, and employ machine learning algorithms. At present, each research group tackles each problem from scratch, and in isolation of other projects, which causes redundancy and great waste of effort. Here, we propose and describe the design of a generic platform for biomedical text mining, which can serve as a shared resource for machine learning projects, and can serve as a public repository for their outputs. We will initially focus on a specific goal, namely, classifying articles according to Publication Type, and emphasize how feature sets can be made more powerful and robust through the use of multiple, heterogeneous similarity measures as input to machine learning models. We then discuss how the generic platform can be extended to include a wide variety of other machine learning based goals and projects, and can be used as a public platform for disseminating the results of NLP tools to end-users as well.

19.

Gaps within the Biomedical Literature: Initial Characterization and Assessment of Strategies for Discovery.

Peng, Yufang; Bonifield, Gary; Smalheiser, Neil R.

Front Res Metr Anal ; 2, 2017 May.

Article in English | MEDLINE | ID: mdl-29271976

ABSTRACT

Within well-established fields of biomedical science, we identify "gaps", topical areas of investigation that might be expected to occur but are missing. We define a field by carrying out a topical PubMed query, and analyze the Medical Subject Headings by which the set of retrieved articles is indexed. Medical Subject Headings (MeSH terms) which occur in >1% of the articles are examined pairwise to see how often they are predicted to co-occur within individual articles (assuming that they are independent of each other). Pairs of MeSH terms that are predicted to co-occur in at least 10 articles, yet are not observed to co-occur in any article, are "gaps"; these were studied further in a corpus of 10 disease-related article sets and 10 related to biological processes. Overall, articles that filled gaps were cited more heavily than non-gap-filling articles and were 61% more likely to be published in multidisciplinary high-impact journals. Nine different features of these "gaps" were characterized and tested to learn which, if any, correlate with the appearance of one or more articles containing both MeSH terms within the next five years. Several different types of gaps were identified, each having distinct combinations of predictive features: a) those arising as a byproduct of MeSH indexing rules; b) those having little biological meaning; c) those representing "low hanging fruit" for immediate exploitation; and d) those representing gaps across disciplines or sub-disciplines that do not talk to each other or work together. We have built a free, open tool called "Mine the Gap!" that identifies and characterizes the "gaps" for any PubMed query, which can be accessed via the Anne O'Tate value-added PubMed search interface (http://arrowsmith.psych.uic.edu/cgi-bin/arrowsmith_uic/AnneOTate.cgi).
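
The gap criterion stated in this abstract (terms present in more than 1% of articles, an expected co-occurrence of at least 10 articles under independence, and zero observed co-occurrences) can be sketched as below. This is a toy illustration under stated assumptions, not the "Mine the Gap!" implementation: the function name `find_gaps`, the input format (one set of MeSH terms per article), and the toy data are all hypothetical. Under independence, the expected number of articles carrying both terms is count(a) × count(b) / N.

```python
from collections import Counter
from itertools import combinations


def find_gaps(articles, min_freq=0.01, min_expected=10.0):
    """Return (term_a, term_b, expected) for term pairs that never co-occur.

    articles: list of sets of indexing terms, one set per article.
    """
    n = len(articles)
    term_counts = Counter(t for terms in articles for t in set(terms))
    pair_counts = Counter(
        pair
        for terms in articles
        for pair in combinations(sorted(set(terms)), 2)
    )
    # Keep only terms indexed in more than min_freq of the articles.
    frequent = sorted(t for t, c in term_counts.items() if c / n > min_freq)
    gaps = []
    for a, b in combinations(frequent, 2):
        expected = term_counts[a] * term_counts[b] / n  # independence assumption
        if expected >= min_expected and pair_counts[(a, b)] == 0:
            gaps.append((a, b, expected))
    return gaps


# Toy article set: "A" and "B" are both common but never appear together.
toy = [{"A"}] * 40 + [{"B"}] * 40 + [{"A", "C"}] * 20
print(find_gaps(toy))
```

In the toy set, "B" and "C" also never co-occur, but their expected co-occurrence (8 articles) falls below the threshold of 10, so only the pair "A" and "B" is reported.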

20.

Identifying reports of randomized controlled trials (RCTs) via a hybrid machine learning and crowdsourcing approach.

Wallace, Byron C; Noel-Storr, Anna; Marshall, Iain J; Cohen, Aaron M; Smalheiser, Neil R; Thomas, James.

J Am Med Inform Assoc ; 24(6): 1165-1168, 2017 Nov 01.

Article in English | MEDLINE | ID: mdl-28541493

ABSTRACT

OBJECTIVES: Identifying all published reports of randomized controlled trials (RCTs) is an important aim, but it requires extensive manual effort to separate RCTs from non-RCTs, even using current machine learning (ML) approaches. We aimed to make this process more efficient via a hybrid approach using both crowdsourcing and ML. METHODS: We trained a classifier to discriminate between citations that describe RCTs and those that do not. We then adopted a simple strategy of automatically excluding citations deemed very unlikely to be RCTs by the classifier and deferring to crowdworkers otherwise. RESULTS: Combining ML and crowdsourcing provides a highly sensitive RCT identification strategy (our estimates suggest 95%-99% recall) with substantially less effort (we observed a reduction of around 60%-80%) than relying on manual screening alone. CONCLUSIONS: Hybrid crowd-ML strategies warrant further exploration for biomedical curation/annotation tasks.

Subjects

Crowdsourcing; Information Storage and Retrieval/methods; Machine Learning; Randomized Controlled Trials as Topic; Biomedical Research; Databases, Bibliographic; Natural Language Processing; ROC Curve; Review Literature as Topic; Support Vector Machine
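
The hybrid strategy described in the abstract above (automatically exclude citations the classifier deems very unlikely to be RCTs, and defer the rest to crowdworkers) might look like the following. This is a minimal sketch only: the function name `triage`, the score dictionary, and the cutoff of 0.01 are hypothetical, since the abstract gives neither the classifier's output format nor the exclusion threshold used.

```python
def triage(scores, exclude_below):
    """Split citations into auto-excluded and crowd-screened lists.

    scores: mapping of citation id -> classifier probability of being an RCT.
    exclude_below: assumed probability cutoff for automatic exclusion.
    """
    auto_excluded = [cid for cid, p in scores.items() if p < exclude_below]
    to_crowd = [cid for cid, p in scores.items() if p >= exclude_below]
    return auto_excluded, to_crowd


# Hypothetical classifier scores for five citations.
scores = {"c1": 0.002, "c2": 0.40, "c3": 0.95, "c4": 0.008, "c5": 0.03}
auto_excluded, to_crowd = triage(scores, exclude_below=0.01)

# Share of citations that never reach a human screener.
reduction = len(auto_excluded) / len(scores)
print(auto_excluded, to_crowd, reduction)
```

Raising the cutoff increases the effort saved but risks excluding true RCTs, which is the recall versus workload trade-off the abstract quantifies.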
