Search | BVS Research Portal

1.

Monitoring User Opinions and Side Effects on COVID-19 Vaccines in the Twittersphere: Infodemiology Study of Tweets.

Portelli, Beatrice; Scaboro, Simone; Tonino, Roberto; Chersoni, Emmanuele; Santus, Enrico; Serra, Giuseppe.

J Med Internet Res ; 24(5): e35115, 2022 05 13.

Article in English | MEDLINE | ID: mdl-35446781

ABSTRACT

BACKGROUND: In the current phase of the COVID-19 pandemic, we are witnessing the most massive vaccine rollout in human history. Like any other drug, vaccines may cause unexpected side effects, which need to be investigated in a timely manner to minimize harm in the population. If not properly dealt with, side effects may also impact public trust in the vaccination campaigns carried out by national governments. OBJECTIVE: Monitoring social media for the early identification of side effects and understanding public opinion on the vaccines are of paramount importance to ensure a successful and harmless rollout. The objective of this study was to create a web portal to monitor the opinion of social media users on COVID-19 vaccines, which can offer a tool for journalists, scientists, and users alike to visualize how the general public is reacting to the vaccination campaign. METHODS: We developed a tool to analyze public opinion on COVID-19 vaccines from Twitter, exploiting, among other techniques, a state-of-the-art system for the identification of adverse drug events on social media; natural language processing models for sentiment analysis; statistical tools; and open-source databases to visualize the trending hashtags, news articles, and their factuality. All modules of the system are displayed through an open web portal. RESULTS: A set of 650,000 tweets was collected and analyzed in an ongoing process that was initiated in December 2020. The results of the analysis are made public on a web portal (updated daily), together with the processing tools and data. The data provide insights into public opinion about the vaccines and its change over time. For example, when discussing COVID-19 vaccines, users show a strong tendency to share news only from reliable sources (98% of the shared URLs). 
The general sentiment of Twitter users toward the vaccines is negative/neutral; however, the system is able to record fluctuations in the attitude toward specific vaccines coinciding with specific events (eg, news about new outbreaks). The data also show that news coverage had a strong impact on the set of discussed topics. To further investigate this point, we performed a more in-depth analysis of the data regarding the AstraZeneca vaccine. We observed how media coverage of blood clot-related side effects suddenly shifted the topic of public discussions regarding both the AstraZeneca and other vaccines. This became particularly evident when visualizing the most frequently discussed symptoms for the vaccines and comparing them month by month. CONCLUSIONS: We present a tool connected with a web portal to monitor and display some key aspects of the public's reaction to COVID-19 vaccines. The system also provides an overview of the opinions of the Twittersphere through graphic representations, offering a tool for the extraction of suspected adverse events from tweets with a deep learning model.

Subjects

COVID-19; Drug-Related Side Effects and Adverse Reactions; Social Media; Attitude; COVID-19/epidemiology; COVID-19/prevention & control; COVID-19 Vaccines/adverse effects; Drug-Related Side Effects and Adverse Reactions/epidemiology; Humans; Infodemiology; Pandemics; SARS-CoV-2
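The 98% figure in the abstract above is the share of shared URLs whose source is rated reliable. The sketch below shows one way such a share could be computed. It assumes domain-level reliability labels. The `SOURCE_RELIABILITY` table and the `reliable_share` function are hypothetical; the study drew on open-source factuality databases that are not reproduced here.

```python
from urllib.parse import urlparse

# Hypothetical domain-level reliability labels (True = reliable source).
SOURCE_RELIABILITY = {
    "reuters.com": True,
    "bbc.co.uk": True,
    "example-fake-news.com": False,
}


def reliable_share(urls, labels=SOURCE_RELIABILITY):
    """Return the fraction of rated URLs that point to a reliable source.

    URLs whose domain is missing from `labels` are skipped, so the share is
    computed only over URLs that could be rated. Returns NaN if none could.
    """
    rated = []
    for url in urls:
        domain = urlparse(url).netloc.lower()
        if domain.startswith("www."):
            domain = domain[len("www."):]
        if domain in labels:
            rated.append(labels[domain])
    return sum(rated) / len(rated) if rated else float("nan")
```

Consider one URL from each of the three labeled domains plus one from an unlabeled domain. The function returns 2/3 because the unlabeled URL is skipped rather than counted as unreliable. That choice matters: how unrated sources are handled directly changes the reported share.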

2.

Artificial Intelligence-Aided Precision Medicine for COVID-19: Strategic Areas of Research and Development.

Santus, Enrico; Marino, Nicola; Cirillo, Davide; Chersoni, Emmanuele; Montagud, Arnau; Santuccione Chadha, Antonella; Valencia, Alfonso; Hughes, Kevin; Lindvall, Charlotta.

J Med Internet Res ; 23(3): e22453, 2021 03 12.

Article in English | MEDLINE | ID: mdl-33560998

ABSTRACT

Artificial intelligence (AI) technologies can play a key role in preventing, detecting, and monitoring epidemics. In this paper, we provide an overview of the recently published literature on the COVID-19 pandemic in four strategic areas: (1) triage, diagnosis, and risk prediction; (2) drug repurposing and development; (3) pharmacogenomics and vaccines; and (4) mining of the medical literature. We highlight how AI-powered health care can enable public health systems to efficiently handle future outbreaks and improve patient outcomes.

Subjects

Artificial Intelligence; COVID-19/therapy; Precision Medicine/methods; Humans; Pandemics; Research; SARS-CoV-2/isolation & purification

3.

New Insights in Cheese Yield Capacity of the Milk of Italian Brown and Italian Friesian Cattle in the Production of High-Moisture Mozzarella.

Franceschi, Piero; Malacarne, Massimo; Faccia, Michele; Rossoni, Attilio; Santus, Enrico; Formaggioni, Paolo; Summer, Andrea.

Food Technol Biotechnol ; 58(1): 91-97, 2020 Mar.

Article in English | MEDLINE | ID: mdl-32684793

ABSTRACT

The aim of the present study is to investigate the effect of κ-casein B content in milk on the yield of high-moisture mozzarella cheese. The study was carried out by monitoring the production of eight mozzarella cheese batches at four cheese making factories. At each factory, two cheese making trials were performed in parallel: one using bulk milk from Italian Brown cattle and the other using bulk milk from Italian Friesian cattle. The average κ-casein B content was 0.04 g per 100 g in the Italian Friesian cows' milk, whereas it was four times higher in the Italian Brown cows' milk, reaching 0.16 g per 100 g. Both the κ-casein content and the κ-casein B to casein ratio were positively correlated with actual cheese yield. Both parameters showed correlation coefficients above 0.9, higher than those of any other protein fraction. The influence of the κ-casein level on the yield increase is probably due to smaller and more homogeneous micelles, with more efficient rennet coagulation. Consequently, milk with higher κ-casein B content produces a more elastic curd that better withstands the technological treatments and limits losses during curd mincing and stretching. In conclusion, the Italian Brown cows' milk used, characterized by a higher κ-casein content than that of the Italian Friesian cows, allowed a yield increase of about 2.65%, which is a highly relevant result for both farms and cheese making factories.

4.

Successful Development of a Natural Language Processing Algorithm for Pancreatic Neoplasms and Associated Histologic Features.

Harrison, Jon Michael; Yala, Adam; Mikhael, Peter; Roldan, Jorge; Ciprani, Debora; Michelakos, Theodoros; Bolm, Louisa; Qadan, Motaz; Ferrone, Cristina; Fernandez-Del Castillo, Carlos; Lillemoe, Keith Douglas; Santus, Enrico; Hughes, Kevin.

Pancreas ; 52(4): e219-e223, 2023 Apr 01.

Article in English | MEDLINE | ID: mdl-37716007

ABSTRACT

OBJECTIVES: Natural language processing (NLP) algorithms can interpret unstructured text for commonly used terms and phrases. Pancreatic pathologies are diverse and include benign and malignant entities with associated histologic features. Creating a pancreas NLP algorithm can aid in electronic health record coding as well as large database creation and curation. METHODS: Text-based pancreatic anatomic and cytopathologic reports for pancreatic cancer, pancreatic ductal adenocarcinoma, neuroendocrine tumor, intraductal papillary neoplasm, tumor dysplasia, and suspicious findings were collected. This dataset was split 80/20 for model training and development. A separate set was held out for testing purposes. We trained a convolutional neural network to predict each heading. RESULTS: Over 14,000 reports were obtained from the Mass General Brigham Healthcare System electronic record. Of these, 1252 reports were used for algorithm development. Final accuracy and F1 scores relative to the test set ranged from 95% to 98% for each queried pathology. To understand the dependence of our results on training set size, we also generated learning curves. Scoring metrics improved as more reports were submitted for training; however, some queries had high index performance. CONCLUSIONS: Natural language processing algorithms can be used for pancreatic pathologies. Increased training volume, nonoverlapping terminology, and conserved text structure improve NLP algorithm performance.

Subjects

Natural Language Processing; Pancreatic Neoplasms; Humans; Algorithms; Pancreatic Neoplasms/diagnosis; Pancreatic Neoplasms/therapy; Neural Networks, Computer

5.

Towards AI-driven longevity research: An overview.

Marino, Nicola; Putignano, Guido; Cappilli, Simone; Chersoni, Emmanuele; Santuccione, Antonella; Calabrese, Giuliana; Bischof, Evelyne; Vanhaelen, Quentin; Zhavoronkov, Alex; Scarano, Bryan; Mazzotta, Alessandro D; Santus, Enrico.

Front Aging ; 4: 1057204, 2023.

Article in English | MEDLINE | ID: mdl-36936271

ABSTRACT

While in the past technology was mostly used to store information about the structural configuration of proteins and molecules for research and medical purposes, artificial intelligence is nowadays able to learn from existing data how to predict and model properties and interactions, revealing important knowledge about complex biological processes, such as aging. Modern technologies, moreover, can rely on a broader set of information, including data derived from next-generation sequencing (e.g., proteomics, lipidomics, and other omics), to understand the interactions between the human body and the external environment. This is especially relevant as external factors have been shown to play a key role in aging. As the field of computational systems biology keeps improving and new biomarkers of aging are being developed, artificial intelligence promises to become a major ally of aging research.

6.

Increasing adverse drug events extraction robustness on social media: Case study on negation and speculation.

Scaboro, Simone; Portelli, Beatrice; Chersoni, Emmanuele; Santus, Enrico; Serra, Giuseppe.

Exp Biol Med (Maywood) ; 247(22): 2003-2014, 2022 11.

Article in English | MEDLINE | ID: mdl-36314865

ABSTRACT

In the last decade, an increasing number of users have started reporting adverse drug events (ADEs) on social media platforms, blogs, and health forums. Given the large volume of reports, pharmacovigilance has focused on ways to use natural language processing (NLP) techniques to rapidly examine these large collections of text, detecting mentions of drug-related adverse reactions to trigger medical investigations. However, despite the growing interest in the task and the advances in NLP, the robustness of these models in the face of linguistic phenomena such as negations and speculations is an open research question. Negations and speculations are pervasive phenomena in natural language and can severely hamper the ability of an automated system to discriminate between factual and non-factual statements in text. In this article, we consider four state-of-the-art systems for ADE detection on social media texts. We introduce SNAX, a benchmark to test their performance against samples containing negated and speculated ADEs, showing their fragility against these phenomena. We then introduce two possible strategies to increase the robustness of these models, showing that both bring significant increases in performance, lowering the number of spurious entities predicted by the models by 60% for negation and 80% for speculation.

Subjects

Drug-Related Side Effects and Adverse Reactions; Social Media; Humans; Natural Language Processing; Drug-Related Side Effects and Adverse Reactions/epidemiology; Pharmacovigilance
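The abstract above does not describe its two robustness strategies. To illustrate why negation and speculation trip up extraction, here is a minimal cue-based filter in the spirit of NegEx. It is a swapped-in technique, not the authors' method. An extracted ADE span is discarded when a negation or speculation cue appears among the few tokens before it. The cue lists, the window size, and `filter_ade_spans` are hypothetical.

```python
import re

# Hypothetical cue lists; real systems use much larger, curated lexicons.
NEGATION_CUES = {"no", "not", "never", "without", "didn't", "don't"}
SPECULATION_CUES = {"may", "might", "maybe", "could", "possibly", "if"}
CUES = NEGATION_CUES | SPECULATION_CUES


def filter_ade_spans(text, spans, window=3):
    """Keep only ADE spans that are not preceded by a negation/speculation cue.

    `spans` is a list of (start, end) character offsets into `text`.
    A span is dropped if any of the `window` tokens before it is a cue.
    """
    kept = []
    for start, end in spans:
        # Lowercased tokens (apostrophes kept, so "didn't" stays one token).
        preceding = re.findall(r"[\w']+", text[:start].lower())[-window:]
        if not CUES.intersection(preceding):
            kept.append((start, end))
    return kept
```

Take the sentence "I got a headache after the shot but no fever". The span for "fever" is dropped because "no" falls within the three preceding tokens, while "headache" is kept. Fixed-window rules like this are brittle. They miss cues outside the window and misfire on idioms such as "no doubt".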

7.

Digital biomarkers and sex impacts in Alzheimer's disease management - potential utility for innovative 3P medicine approach.

Harms, Robbert L; Ferrari, Alberto; Meier, Irene B; Martinkova, Julie; Santus, Enrico; Marino, Nicola; Cirillo, Davide; Mellino, Simona; Catuara Solarz, Silvina; Tarnanas, Ioannis; Szoeke, Cassandra; Hort, Jakub; Valencia, Alfonso; Ferretti, Maria Teresa; Seixas, Azizi; Santuccione Chadha, Antonella.

EPMA J ; 13(2): 299-313, 2022 Jun.

Article in English | MEDLINE | ID: mdl-35719134

ABSTRACT

Digital biomarkers are defined as objective, quantifiable physiological and behavioral data that are collected and measured by means of digital devices. Their use has revolutionized clinical research by enabling high-frequency, longitudinal, and sensitive measurements. In the field of neurodegenerative diseases, an example of a digital biomarker-based technology is the instrumental activities of daily living (iADL) digital medical application, a predictive biomarker of conversion from mild cognitive impairment (MCI) due to Alzheimer's disease (AD) to dementia due to AD in individuals aged 55+. Digital biomarkers show promise to transform clinical practice. Nevertheless, their use may be affected by variables such as demographics, genetics, and phenotype. Among these factors, sex is particularly important in Alzheimer's, where men and women present with different symptoms and progression patterns that impact diagnosis. In this study, we explore sex differences in Altoida's digital medical application in a sample of 568 subjects consisting of a clinical dataset (MCI and dementia due to AD) and a healthy population. We found that a biological sex classifier, built on digital biomarker features captured using Altoida's application, achieved a ROC-AUC (area under the receiver operating characteristic curve) of 75% in predicting biological sex in healthy individuals, indicating significant differences in neurocognitive performance signatures between males and females. The performance dropped when we applied this classifier to more advanced stages on the AD continuum, including MCI and dementia, suggesting that sex differences might be disease-stage dependent. Our results indicate that neurocognitive performance signatures built on data from digital biomarker features are different between men and women. 
These results stress the need to integrate traditional approaches to dementia research with digital biomarker technologies and personalized medicine perspectives to achieve more precise predictive diagnostics, targeted prevention, and customized treatment of cognitive decline. Supplementary Information: The online version contains supplementary material available at 10.1007/s13167-022-00284-3.
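The 75% ROC-AUC above can be read as the probability that the classifier scores a randomly chosen positive example above a randomly chosen negative one, with ties counting one half. Below is a minimal pure-Python sketch of that definition. The labels and scores used with it are made up and unrelated to Altoida's data.

```python
def roc_auc(labels, scores):
    """ROC-AUC via all positive/negative pairs (the Mann-Whitney U view).

    labels: 0/1 ground-truth labels.
    scores: classifier scores; higher means "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ordering
    return wins / (len(pos) * len(neg))
```

For example, with labels `[1, 1, 0, 0]` and scores `[0.9, 0.4, 0.6, 0.1]`, three of the four positive/negative pairs are ordered correctly, giving 0.75. The pairwise loop is quadratic. Library implementations such as scikit-learn's `roc_auc_score` sort the scores instead, but they compute the same quantity.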

8.

Simulating SARS-CoV-2 epidemics by region-specific variables and modeling contact tracing app containment.

Ferrari, Alberto; Santus, Enrico; Cirillo, Davide; Ponce-de-Leon, Miguel; Marino, Nicola; Ferretti, Maria Teresa; Santuccione Chadha, Antonella; Mavridis, Nikolaos; Valencia, Alfonso.

NPJ Digit Med ; 4(1): 9, 2021 Jan 14.

Article in English | MEDLINE | ID: mdl-33446891

ABSTRACT

Targeted contact-tracing through mobile phone apps has been proposed as an instrument to help contain the spread of COVID-19 and manage the lifting of the nationwide lockdowns currently in place in the USA and Europe. However, there is an ongoing debate on its potential efficacy, especially in light of region-specific demographics. We built an expanded SIR model of COVID-19 epidemics that accounts for region-specific population densities, and we used it to test the impact of a contact-tracing app in a number of scenarios. Using demographic and mobility data from Italy and Spain, we simulated scenarios that vary in baseline contact rates, population densities, and fraction of app users in the population. Our results show that, in support of efficient isolation of symptomatic cases, app-mediated contact-tracing can successfully mitigate the epidemic even with a relatively small fraction of users, and even suppress it altogether with a larger fraction of users. However, when regional differences in population density are taken into consideration, the epidemic can be significantly harder to contain in higher-density areas, highlighting potential limitations of this intervention in specific contexts. This work corroborates previous results in favor of app-mediated contact-tracing as a mitigation measure for COVID-19, and draws attention to the importance of region-specific demographic and mobility factors in achieving maximum efficacy in containment policies.
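The expanded SIR model above is not specified in the abstract. The sketch below is only a textbook SIR model integrated with forward Euler steps. It adds a hypothetical removal term in which app-traced infectious people are isolated. It assumes that tracing works only when both parties run the app, so the isolation rate scales with the square of the app-user fraction. `simulate_sir`, `app_fraction`, `trace_rate`, and every parameter value are illustrative assumptions, not the authors' calibrated values. Population density is not modeled.

```python
def simulate_sir(beta=0.3, gamma=0.1, app_fraction=0.0, trace_rate=0.2,
                 i0=1e-3, days=300, dt=0.1):
    """Forward-Euler SIR with a hypothetical app-based isolation term.

    Compartments are population fractions. Infectious people leave the I
    compartment through recovery (gamma) or app-triggered isolation.
    Returns (peak infectious fraction, fraction ever infected).
    """
    s, i = 1.0 - i0, i0
    # Assumption: tracing needs both infector and contact to run the app,
    # so its strength scales with app_fraction squared.
    isolation = trace_rate * app_fraction ** 2
    peak = i
    for _ in range(round(days / dt)):
        new_infections = beta * s * i * dt
        removals = (gamma + isolation) * i * dt
        s -= new_infections
        i += new_infections - removals
        peak = max(peak, i)
    return peak, 1.0 - s
```

In this toy setting, raising `app_fraction` lowers the effective reproduction number, roughly beta / (gamma + isolation). Once that number drops below 1, the infectious fraction never rises above its starting value. For example, `app_fraction=1.0, trace_rate=0.25` gives 0.3 / 0.35. A crude way to mimic a denser region would be to raise `beta`, which makes the same app uptake less effective.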

9.

Integration of Wet-Lab Measures, Milk Infrared Spectra, and Genomics to Improve Difficult-to-Measure Traits in Dairy Cattle Populations.

Cecchinato, Alessio; Toledo-Alvarado, Hugo; Pegolo, Sara; Rossoni, Attilio; Santus, Enrico; Maltecca, Christian; Bittante, Giovanni; Tiezzi, Francesco.

Front Genet ; 11: 563393, 2020.

Article in English | MEDLINE | ID: mdl-33133149

ABSTRACT

The objective of this study was to evaluate the contribution of Fourier-transformed infrared spectroscopy (FTIR) data for dairy cattle breeding through two different approaches: (i) estimating the genetic parameters for 30 measured milk traits and their FTIR predictions and investigating the additive genetic correlation between them, and (ii) evaluating the effectiveness of FTIR-derived phenotyping to replicate a candidate bull's progeny testing or breeding value prediction at birth. Records were available from 1,123 cows phenotyped using gold standard laboratory methodologies (LAB data). This included phenotypes related to fine milk composition and milk technological characteristics, milk acidity, and milk protein fractions. The dataset used to generate FTIR predictions comprised 729,202 test-day records from 51,059 Brown Swiss cows (FIELD data). The first approach consisted of estimating genetic parameters for phenotypes available from the LAB and FIELD datasets. To do so, a set of bivariate animal models was run, and genetic correlations between LAB and FIELD phenotypes were estimated using FIELD information obtained at the population level. Heritability estimates were generally higher for FIELD predictions than for the corresponding LAB measures. The additive genetic correlations (r_a) between LAB and FIELD phenotypes had different magnitudes across traits but were generally strong. Overall, these results demonstrated the potential of using FIELD information as indicator traits for the indirect genetic improvement of LAB measures. In the second approach, we included genotype information for 1,011 cows from the LAB dataset, 1,493 cows from the FIELD dataset, 181 sires with daughters in both LAB and FIELD datasets, and 540 sires with daughters in the FIELD dataset only. Predictions were obtained using the single-step GBLUP method. 
A fourfold cross-validation was used to assess the predictive ability of the different models, measured as the ability to predict masked LAB records from daughters of progeny testing bulls. The correlation between observed and predicted LAB measures in validation was averaged over the four training-validation sets. Different sets of phenotypic information were used sequentially in cross-validation schemes: (i) LAB cows from the training set; (ii) FIELD cows from the training set; and (iii) FIELD cows from the validation set. Models that included FIELD records showed an improvement for the majority of traits. This study suggests that breeding programs for difficult-to-measure traits could be implemented using FTIR information. While these programs should use progeny testing, acceptable accuracy can also be achieved for bulls without phenotyped progeny. Robust calibration equations are deemed essential.
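The validation metric described above computes a correlation between observed and predicted values in each of four folds, then averages it. A minimal sketch follows. The `predict` callable is a hypothetical stand-in for the single-step GBLUP model, which is not reproduced here. Records are assigned to folds round-robin by index; this simplifies the paper's scheme of masking LAB records from daughters of progeny-testing bulls.

```python
import statistics


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def kfold_mean_correlation(records, predict, k=4):
    """Average observed-vs-predicted correlation over k folds.

    records: list of (features, observed_value) pairs.
    predict: callable(train_records, features) -> predicted value
             (hypothetical stand-in for the genomic prediction model).
    """
    correlations = []
    for fold in range(k):
        # Round-robin fold assignment by position (a simplification).
        train = [rec for idx, rec in enumerate(records) if idx % k != fold]
        valid = [rec for idx, rec in enumerate(records) if idx % k == fold]
        observed = [y for _, y in valid]
        predicted = [predict(train, x) for x, _ in valid]
        correlations.append(pearson(observed, predicted))
    return statistics.fmean(correlations)
```

Averaging per-fold correlations, rather than pooling all validation predictions into one correlation, matches the description in the abstract. It also keeps each fold's result interpretable on its own.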

10.

Exploiting Rules to Enhance Machine Learning in Extracting Information From Multi-Institutional Prostate Pathology Reports.

Santus, Enrico; Schuster, Tal; Tahmasebi, Amir M; Li, Clara; Yala, Adam; Lanahan, Conor R; Prinsen, Peter; Thompson, Scott F; Coons, Samuel; Mynderse, Lance; Barzilay, Regina; Hughes, Kevin.

JCO Clin Cancer Inform ; 4: 865-874, 2020 10.

Article in English | MEDLINE | ID: mdl-33006906

ABSTRACT

PURPOSE: Literature on clinical note mining has highlighted the superiority of machine learning (ML) over hand-crafted rules. Nevertheless, most studies assume the availability of large training sets, which is rarely the case. For this reason, in the clinical setting, rules are still common. We suggest 2 methods to leverage the knowledge encoded in pre-existing rules to inform ML decisions and obtain high performance, even with scarce annotations. METHODS: We collected 501 prostate pathology reports from 6 American hospitals. Reports were split into 2,711 core segments, annotated with 20 attributes describing the histology, grade, extension, and location of tumors. The data set was split by institutions to generate a cross-institutional evaluation setting. We assessed 4 systems, namely a rule-based approach, an ML model, and 2 hybrid systems integrating the previous methods: a Rule as Feature model and a Classifier Confidence model. Several ML algorithms were tested, including logistic regression (LR), support vector machine (SVM), and eXtreme gradient boosting (XGB). RESULTS: When training on data from a single institution, LR lags behind the rules by 3.5% (F1 score: 92.2% v 95.7%). Hybrid models, instead, obtain competitive results, with Classifier Confidence outperforming the rules by +0.5% (96.2%). When a larger amount of data from multiple institutions is used, LR improves by +1.5% over the rules (97.2%), whereas hybrid systems obtain +2.2% for Rule as Feature (97.7%) and +2.6% for Classifier Confidence (98.3%). Replacing LR with SVM or XGB yielded similar performance gains. CONCLUSION: We developed methods to use pre-existing handcrafted rules to inform ML algorithms. These hybrid systems obtain better performance than either rules or ML models alone, even when training data are limited.

Subjects

Machine Learning; Prostate; Algorithms; Humans; Logistic Models; Male; Support Vector Machine; United States
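The abstract above names two hybrid designs without detailing them. One plausible reading of "Rule as Feature", assumed here rather than taken from the paper, is to append the pre-existing rule's output to the ML model's input features. A minimal pure-Python sketch follows. The regex rule, the vocabulary, and the function names are hypothetical.

```python
import re

# Hypothetical hand-crafted rule: fires when the primary Gleason pattern is 4 or 5
# (e.g. "Gleason score 4 + 3").
GLEASON_HIGH = re.compile(r"gleason\s+(?:score\s+)?[45]\s*\+", re.IGNORECASE)


def rule_high_grade(text):
    """Return 1 if the hand-crafted rule fires on the report segment, else 0."""
    return 1 if GLEASON_HIGH.search(text) else 0


def featurize(text, vocabulary):
    """Bag-of-words counts over `vocabulary`, with the rule output appended last."""
    tokens = re.findall(r"[a-z]+", text.lower())
    bow = [tokens.count(word) for word in vocabulary]
    return bow + [rule_high_grade(text)]
```

A classifier trained on these vectors, such as the logistic regression the abstract lists, can learn how much weight to give the rule's vote relative to the text features. This may explain why such hybrids can help when annotations are scarce.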

11.

Deep Natural Language Processing to Identify Symptom Documentation in Clinical Notes for Patients With Heart Failure Undergoing Cardiac Resynchronization Therapy.

Leiter, Richard E; Santus, Enrico; Jin, Zhijing; Lee, Katherine C; Yusufov, Miryam; Chien, Isabel; Ramaswamy, Ashwin; Moseley, Edward T; Qian, Yujie; Schrag, Deborah; Lindvall, Charlotta.

J Pain Symptom Manage ; 60(5): 948-958.e3, 2020 11.

Article in English | MEDLINE | ID: mdl-32585181

ABSTRACT

CONTEXT: Clinicians lack reliable methods to predict which patients with congestive heart failure (CHF) will benefit from cardiac resynchronization therapy (CRT). Symptom burden may help to predict response, but this information is buried in free-text clinical notes. Natural language processing (NLP) may identify symptoms recorded in the electronic health record and thereby enable this information to inform clinical decisions about the appropriateness of CRT. OBJECTIVES: To develop, train, and test a deep NLP model that identifies documented symptoms in patients with CHF receiving CRT. METHODS: We identified a random sample of clinical notes from a cohort of patients with CHF who later received CRT. Investigators labeled documented symptoms as present, absent, or context dependent (pathologic depending on the clinical situation). The algorithm was trained on 80% of the notes, its parameters were fine-tuned on 10%, and the model was tested on the remaining 10%. We compared the model's performance to investigators' annotations using accuracy, precision (positive predictive value), recall (sensitivity), and F1 score (the harmonic mean of precision and recall). RESULTS: Investigators annotated 154 notes (352,157 words) and identified 1340 present, 1300 absent, and 221 context-dependent symptoms. In the test set of 15 notes (35,467 words), the model's accuracy was 99.4% and recall was 66.8%. Precision was 77.6%, and the overall F1 score was 71.8. F1 scores for present (70.8) and absent (74.7) symptoms were higher than that for context-dependent symptoms (48.3). CONCLUSION: A deep NLP algorithm can be trained to capture symptoms in patients with CHF who received CRT with promising precision and recall.

Subjects

Cardiac Resynchronization Therapy; Heart Failure; Documentation; Electronic Health Records; Heart Failure/diagnosis; Heart Failure/therapy; Humans; Natural Language Processing
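The overall F1 score in the abstract above is the harmonic mean of precision and recall. It can be checked directly from the reported values (precision 77.6, recall 66.8):

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (any consistent scale, e.g. percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(f1(77.6, 66.8), 1))  # 71.8
```

Because the harmonic mean is pulled toward the smaller of its inputs, the relatively low recall keeps F1 below the midpoint of the two values. This happens even though accuracy is 99.4%, since accuracy is dominated by the many tokens that are not symptoms.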

12.

Can machine learning improve patient selection for cardiac resynchronization therapy?

Hu, Szu-Yeu; Santus, Enrico; Forsyth, Alexander W; Malhotra, Devvrat; Haimson, Josh; Chatterjee, Neal A; Kramer, Daniel B; Barzilay, Regina; Tulsky, James A; Lindvall, Charlotta.

PLoS One ; 14(10): e0222397, 2019.

Article in English | MEDLINE | ID: mdl-31581234

ABSTRACT

RATIONALE: Multiple clinical trials support the effectiveness of cardiac resynchronization therapy (CRT); however, optimal patient selection remains challenging due to substantial treatment heterogeneity among patients who meet the clinical practice guidelines. OBJECTIVE: To apply machine learning to create an algorithm that predicts CRT outcome using electronic health record (EHR) data available before the procedure. METHODS AND RESULTS: We applied machine learning and natural language processing to the EHR of 990 patients who received CRT at two academic hospitals between 2004 and 2015. The primary outcome was reduced CRT benefit, defined as <0% improvement in left ventricular ejection fraction (LVEF) 6-18 months post-procedure or death by 18 months. Data regarding demographics, laboratory values, medications, clinical characteristics, and past health services utilization were extracted from the EHR available before the CRT procedure. Bigrams (i.e., two-word sequences) were also extracted from the clinical notes using natural language processing. Patients accrued on average 75 clinical notes (SD, 29) before the procedure, including data not captured anywhere else in the EHR. A machine learning model was built using 80% of the patient sample (training and validation dataset) and tested on a held-out 20% patient sample (test dataset). Among 990 patients receiving CRT, the mean age was 71.6 (SD, 11.8), 78.1% were male, 87.2% non-Hispanic white, and the mean baseline LVEF was 24.8% (SD, 7.69). Out of 990 patients, 403 (40.7%) were identified as having a reduced benefit from the CRT device (<0% LVEF improvement in 25.2%, death by 18 months in 15.6%). The final model identified 26% of these patients at a positive predictive value of 79% (model performance: Fβ (β = 0.1): 77%; recall 0.26; precision 0.79; accuracy 0.65). 
CONCLUSIONS: A machine learning model that leveraged readily available EHR data and clinical notes identified a subset of CRT patients who may not benefit from CRT before the procedure.

Subjects

Cardiac Resynchronization Therapy; Machine Learning; Patient Selection; Aged; Female; Humans; Male; Models, Theoretical; Outcome Assessment, Health Care; ROC Curve
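The Fβ score in the abstract above generalizes F1: Fβ = (1 + β²)·P·R / (β²·P + R). With β = 0.1, precision counts far more than recall. That is why a model with recall 0.26 still scores about 0.77. A minimal sketch reproducing the reported figure from the reported precision and recall:

```python
def f_beta(precision, recall, beta):
    """F-beta score; beta < 1 weights precision more heavily than recall."""
    b2 = beta ** 2
    denom = b2 * precision + recall
    if denom == 0:
        return 0.0
    return (1 + b2) * precision * recall / denom


print(round(f_beta(0.79, 0.26, beta=0.1), 2))  # 0.77
print(round(f_beta(0.79, 0.26, beta=1.0), 2))  # 0.39 (plain F1)
```

The choice of a small β fits the clinical goal stated above. Flagging a patient as unlikely to benefit should be reliable (high precision), even if many such patients are missed.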

13.

Do Neural Information Extraction Algorithms Generalize Across Institutions?

Santus, Enrico; Li, Clara; Yala, Adam; Peck, Donald; Soomro, Rufina; Faridi, Naveen; Mamshad, Isra; Tang, Rong; Lanahan, Conor R; Barzilay, Regina; Hughes, Kevin.

JCO Clin Cancer Inform ; 3: 1-8, 2019 07.

Article in English | MEDLINE | ID: mdl-31310566

ABSTRACT

PURPOSE: Natural language processing (NLP) techniques have been adopted to reduce the curation costs of electronic health records. However, studies have questioned whether such techniques can be applied to data from previously unseen institutions. We investigated the performance of a common neural NLP algorithm on data from both known and heldout (ie, institutions whose data were withheld from the training set and only used for testing) hospitals. We also explored how diversity in the training data affects the system's generalization ability. METHODS: We collected 24,881 breast pathology reports from seven hospitals and manually annotated them with nine key attributes that describe types of atypia and cancer. We trained a convolutional neural network (CNN) on annotations from either only one (CNN1), only two (CNN2), or only four (CNN4) hospitals. The trained systems were tested on data from five organizations, including both known and heldout ones. For every setting, we provide the accuracy scores as well as the learning curves that show how much data are necessary to achieve good performance and generalizability. RESULTS: The system achieved a cross-institutional accuracy of 93.87% when trained on reports from only one hospital (CNN1). Performance improved to 95.7% and 96%, respectively, when the system was trained on reports from two (CNN2) and four (CNN4) hospitals. The introduction of diversity during training did not lead to improvements on the known institutions, but it boosted performance on the heldout institutions. When tested on reports from heldout hospitals, CNN4 outperformed CNN1 and CNN2 by 2.13% and 0.3%, respectively. CONCLUSION: Real-world scenarios require that neural NLP approaches scale to data from previously unseen institutions. We show that a common neural NLP algorithm for information extraction can achieve this goal, especially when diverse data are used during training.

Subjects

Algorithms; Information Storage and Retrieval; Natural Language Processing; Databases, Factual; Electronic Health Records/economics; Electronic Health Records/organization & administration; Electronic Health Records/standards; Humans; Medical Informatics/economics; Medical Informatics/methods; Medical Informatics/organization & administration; Medical Informatics/standards
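The distinction above between known and heldout institutions can be made concrete with a split that withholds whole hospitals from training. Within the training hospitals, a share of reports is kept aside as the known-institution test set. A minimal sketch follows. The record layout (a dict with a "site" key), the 20% known-test share, and `split_by_institution` are hypothetical.

```python
def split_by_institution(reports, train_sites, known_test_ratio=0.2):
    """Return (train, known_test, heldout_test) lists of reports.

    reports: list of dicts with at least a "site" key (hypothetical layout).
    train_sites: institutions whose reports may be used for training.
    Reports from train_sites are split so that the last `known_test_ratio`
    share of each site's reports forms the known-institution test set.
    Reports from every other site form the heldout test set.
    """
    by_site = {}
    for report in reports:
        by_site.setdefault(report["site"], []).append(report)
    train, known, heldout = [], [], []
    for site, site_reports in by_site.items():
        if site in train_sites:
            n_test = round(len(site_reports) * known_test_ratio)
            cut = len(site_reports) - n_test
            train.extend(site_reports[:cut])
            known.extend(site_reports[cut:])
        else:
            heldout.extend(site_reports)
    return train, known, heldout
```

Varying `train_sites` from one to two to four hospitals, while keeping the heldout sites fixed, mirrors the CNN1/CNN2/CNN4 comparison described above. In a real study the per-site split would normally be randomized rather than taken from the end of each list.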

14.

Fine mapping for Weaver syndrome in Brown Swiss cattle and the identification of 41 concordant mutations across NRCAM, PNPLA8 and CTTNBP2.

McClure, Matthew; Kim, Euisoo; Bickhart, Derek; Null, Daniel; Cooper, Tabatha; Cole, John; Wiggans, George; Ajmone-Marsan, Paolo; Colli, Licia; Santus, Enrico; Liu, George E; Schroeder, Steve; Matukumalli, Lakshmi; Van Tassell, Curt; Sonstegard, Tad.

PLoS One ; 8(3): e59251, 2013.

Article in English | MEDLINE | ID: mdl-23527149

ABSTRACT

Bovine Progressive Degenerative Myeloencephalopathy (Weaver Syndrome) is a recessive neurological disease that has been observed in the Brown Swiss cattle breed since the 1970s in North America and Europe. Bilateral hind leg weakness and ataxia appear in afflicted animals at 6 to 18 months of age and slowly progress to total loss of hind limb control by 3 to 4 years of age. While Weaver has previously been mapped to Bos taurus autosome (BTA) 4:46-56 Mb and a diagnostic test based on 6 microsatellite (MS) markers is commercially available, neither the causative gene nor the mutation has been identified; therefore, misdiagnosis can occur due to recombination between the diagnostic MS markers and the causative mutation. Analysis of 34,980 BTA 4 SNP genotypes derived from the Illumina BovineHD assay for 20 Brown Swiss Weaver carriers and 49 homozygous normal bulls refined the Weaver locus to 48-53 Mb. Genotyping of 153 SNPs, identified from whole genome sequencing of 10 normal and 10 carrier animals, across a validation set of 841 animals resulted in the identification of 41 diagnostic SNPs that were concordant with the disease. Except for one intergenic SNP, all are associated with genes expressed in nervous tissues: 37 distal to NRCAM, one non-synonymous (serine to asparagine) in PNPLA8, and one synonymous and one non-synonymous (lysine to glutamic acid) in CTTNBP2. Haplotype and imputation analyses of 7,458 Brown Swiss animals with Illumina BovineSNP50 data and the 41 diagnostic SNPs resulted in the identification of only one haplotype concordant with the Weaver phenotype. Use of this haplotype and the diagnostic SNPs more accurately identifies Weaver carriers in both Brown Swiss purebred and influenced herds.

Subjects

Cattle Diseases/genetics; Central Nervous System Diseases/veterinary; Myelin Sheath/pathology; Neurodegenerative Diseases/veterinary; Phenotype; Amyotrophic Lateral Sclerosis/genetics; Animals; Base Sequence; Cattle; Cell Adhesion Molecules/genetics; Central Nervous System Diseases/genetics; Chromosome Mapping/veterinary; Genes, Recessive; Genome-Wide Association Study; Genotype; Haplotypes/genetics; Humans; Lipase/genetics; Molecular Sequence Data; Nerve Tissue Proteins/genetics; Neurodegenerative Diseases/genetics; Polymorphism, Single Nucleotide/genetics; Sequence Alignment; Sequence Analysis, DNA/veterinary; Species Specificity
