Search | BVS Regional Portal

1.

High-quality gene/disease embedding in a multi-relational heterogeneous graph after a joint matrix/tensor decomposition.

Zhou, Kaiyin; Zhang, Sheng; Wang, Yuxing; Cohen, Kevin Bretonnel; Kim, Jin-Dong; Luo, Qi; Yao, Xinzhi; Zhou, Xingyu; Xia, Jingbo.

J Biomed Inform ; 126: 103973, 2022 02.

Article in English | MEDLINE | ID: mdl-34995810

ABSTRACT

MOTIVATION: Node embedding of biological entity networks has been widely investigated for downstream applications. To embed the full semantics of genes and diseases, a multi-relational heterogeneous graph is considered in a scenario where uni-relations between genes/diseases and other heterogeneous entities are abundant while multi-relations between genes and diseases are relatively sparse. After introducing this novel graph format, it is instructive to design a specific data integration algorithm that fully captures the graph information and yields high-quality embeddings. RESULTS: First, a typical multi-relational triple dataset was introduced, which carried significant associations between genes and diseases. Second, we curated all human genes and diseases in seven mainstream datasets and constructed a large-scale gene-disease network comprising 163,024 nodes and 25,265,607 edges, which relates to 27,165 genes, 2,665 diseases, 15,067 chemicals, 108,023 mutations, 2,363 pathways, and 7,732 phenotypes. Third, we proposed a Joint Decomposition of Heterogeneous Matrix and Tensor (JDHMT) model, which integrated all heterogeneous data resources and obtained an embedding for each gene or disease. Fourth, a visualized intrinsic evaluation was performed, which investigated the embeddings in terms of interpretable data clustering. Furthermore, an extrinsic evaluation was performed in the form of link prediction. Both intrinsic and extrinsic evaluation results showed that the JDHMT model outperformed eleven other state-of-the-art (SOTA) methods that fall under the relation-learning, proximity-preserving or message-passing paradigms. Finally, the constructed gene-disease network, embedding results and code were made available. DATA AND CODE AVAILABILITY: The constructed massive gene-disease network is available at: https://hzaubionlp.com/heterogeneous-biological-network/. The code is available at: https://github.com/bionlp-hzau/JDHMT.

Subject(s)

Algorithms , Semantics , Learning , Phenotype

2.

Optimizing anxiolysis and analgesia for percutaneous intervention by the abdominal radiologist.

Shah, Amar; Cohen, Kevin; Patel, Bhavik; Dahiya, Nirvikar; Fananapazir, Ghaneh.

Abdom Radiol (NY) ; 47(8): 2721-2729, 2022 08.

Article in English | MEDLINE | ID: mdl-35072783

ABSTRACT

Abdominal radiologists perform a wide variety of image-guided interventions. Procedures performed by abdominal radiologists can be broadly categorized into paracentesis, thoracentesis, superficial and deep soft tissue biopsy, drain placement, and ablation. As these procedures continue to develop as an alternative to more invasive and potentially morbid interventions, and with continued improvements in minimally invasive technologies, it becomes increasingly important for abdominal radiologists to be familiar with options for peri-procedural analgesia and anxiolysis, as well as when to consult anesthesiology. In this review, we discuss analgesic, anxiolytic, and nonpharmacologic options available to the abdominal radiologist. We focus on practical agents that are relatively safe for general use, special populations, and considerations for post-procedural monitoring.

Subject(s)

Analgesia , Radiologists , Drainage/methods , Humans , Pain Management , Paracentesis

3.

Editor's introduction to the special section on the 7th Biomedical Linked Annotation Hackathon (BLAH7).

Kim, Jin-Dong; Cohen, Kevin Bretonnel; Rinaldi, Fabio; Lu, Zhiyong; Park, Hyun-Seok.

Genomics Inform ; 19(3): e20, 2021 Sep.

Article in English | MEDLINE | ID: mdl-34638167

4.

Bridging heterogeneous mutation data to enhance disease gene discovery.

Zhou, Kaiyin; Wang, Yuxing; Bretonnel Cohen, Kevin; Kim, Jin-Dong; Ma, Xiaohang; Shen, Zhixue; Meng, Xiangyu; Xia, Jingbo.

Brief Bioinform ; 22(5), 2021 09 02.

Article in English | MEDLINE | ID: mdl-33847357

ABSTRACT

Bridging heterogeneous mutation data fills the gap between various data categories and propels the discovery of disease-related genes. Genome-wide association studies (GWAS) are known to infer significant mutation associations that link genotype and phenotype. However, because GWAS studies differ in size and quality, not all truly important variants pass multiple-testing correction. Meanwhile, mutation events widely reported in the literature reveal typical functional biological processes, including mutation types such as gain of function and loss of function. To bring together the heterogeneous mutation data, we propose a 'Gene-Disease Association prediction by Mutation Data Bridging (GDAMDB)' pipeline with a statistical generative model. The model learns the distribution parameters of mutation associations and mutation types, and recovers false-negative GWAS mutations that fail the significance test but are supported by evidence of functional biological processes in the literature. We applied GDAMDB to Alzheimer's disease (AD) and predicted 79 AD-associated genes. Of these, 12 come from the original GWAS, 60 are supported as AD-related by other GWAS or literature reports, and the rest are newly predicted genes. Our model is capable of enhancing GWAS-based gene association discovery by incorporating text mining results. The positive result indicates that bridging heterogeneous mutation data contributes to the discovery of novel disease-related genes.

Subject(s)

Alzheimer Disease/genetics , Genetic Association Studies/methods , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study/methods , Mutation , Polymorphism, Single Nucleotide , Algorithms , Computational Biology/methods , Data Mining/methods , Gene Regulatory Networks/genetics , Genotype , Humans , Phenotype , Protein Interaction Maps/genetics , Reproducibility of Results

5.

Unsupervised clinical relevancy ranking of structured medical records to retrieve condition-specific information in the emergency department.

Korach, Zfania Tom; Gradwohl, Stephen; Messinger, Amanda; Bookman, Kelly; Cohen, Kevin; Zhou, Li; Goss, Foster.

Int J Med Inform ; 149: 104410, 2021 05.

Article in English | MEDLINE | ID: mdl-33621793

ABSTRACT

BACKGROUND: Decision making in the Emergency Department (ED) requires timely identification of clinical information relevant to the complaints. Existing information retrieval solutions for the electronic health record (EHR) focus on patient cohort identification and lack clinical relevancy ranking. We aimed to compare knowledge-based (KB) and unsupervised statistical methods for ranking EHR information by relevancy to a chief complaint of chest or back pain among ED patients. METHODS: We used pointwise mutual information (PMI) with corpus-level significance adjustment (cPMId), which modifies PMI to reward co-occurrence patterns with a higher absolute count. cPMId for each pair of medication/problem and chief complaint was estimated from a corpus of 100,000 un-annotated ED encounters. Five specialist physicians ranked the relevancy of medications and problems to each chief complaint on a 0-4 Likert scale to form the KB ranking. Reverse chronological order was used as a baseline. We directly compared the three methods on 1010 medications and 2913 problems from 99 patients with chest or back pain, where each item was manually labeled as relevant or not to the chief complaint, using mean average-precision. RESULTS: cPMId out-performed KB ranking on problems (86.8% vs. 81.3%, p < 0.01) but under-performed it on medications (93.1% vs. 96.8%, p < 0.01). Both methods significantly outperformed the baseline for both medications and problems (71.8% and 72.1%, respectively, p < 0.01 for both comparisons). The two complaints represented virtually completely different information needs (average Jaccard index of 0.008). CONCLUSION: A fully unsupervised statistical method can provide a reasonably accurate, low-effort and scalable means for situation-specific ranking of clinical information within the EHR.

Subject(s)

Electronic Health Records , Emergency Service, Hospital , Humans , Information Storage and Retrieval
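
The ranking idea in the abstract above can be illustrated with a minimal sketch of plain PMI between a chief complaint and the items (medications or problems) recorded in an encounter. This is not the authors' implementation: the corpus-level significance adjustment that defines cPMId is not reproduced here, and the function name `pmi_scores`, the encounter layout and the toy data are assumptions made for illustration only.

```python
import math
from collections import Counter


def pmi_scores(encounters, complaint):
    """Rank items by plain PMI with one chief complaint.

    encounters: list of (chief_complaint, set_of_items) pairs (hypothetical layout).
    Returns (item, pmi) pairs sorted from most to least associated.
    """
    n = len(encounters)
    item_counts = Counter()   # encounters containing each item
    joint_counts = Counter()  # encounters with both the item and the complaint
    complaint_count = 0
    for chief_complaint, items in encounters:
        item_counts.update(items)
        if chief_complaint == complaint:
            complaint_count += 1
            joint_counts.update(items)

    p_complaint = complaint_count / n
    scores = {}
    for item, joint in joint_counts.items():
        p_joint = joint / n
        p_item = item_counts[item] / n
        # PMI = log2( P(item, complaint) / (P(item) * P(complaint)) )
        scores[item] = math.log2(p_joint / (p_item * p_complaint))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy corpus, invented for illustration only.
encounters = [
    ("chest pain", {"aspirin", "statin"}),
    ("chest pain", {"aspirin"}),
    ("back pain", {"ibuprofen", "statin"}),
    ("back pain", {"ibuprofen"}),
]
ranked = pmi_scores(encounters, "chest pain")
```

In this toy corpus, aspirin appears only with chest pain and therefore ranks above statin, which is spread evenly across both complaints. Plain PMI is known to over-reward rare pairs, which is the weakness the abstract says cPMId addresses by favoring higher absolute counts.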

6.

Natural Language Processing for Rapid Response to Emergent Diseases: Case Study of Calcium Channel Blockers and Hypertension in the COVID-19 Pandemic.

Neuraz, Antoine; Lerner, Ivan; Digan, William; Paris, Nicolas; Tsopra, Rosy; Rogier, Alice; Baudoin, David; Cohen, Kevin Bretonnel; Burgun, Anita; Garcelon, Nicolas; Rance, Bastien.

J Med Internet Res ; 22(8): e20773, 2020 Aug 14.

Article in English | MEDLINE | ID: mdl-32759101

ABSTRACT

BACKGROUND: A novel disease poses special challenges for informatics solutions. Biomedical informatics relies for the most part on structured data, which require a preexisting data or knowledge model; however, novel diseases do not have preexisting knowledge models. In an emergent epidemic, language processing can enable rapid conversion of unstructured text to a novel knowledge model. However, although this idea has often been suggested, no opportunity has arisen to actually test it in real time. The current coronavirus disease (COVID-19) pandemic presents such an opportunity. OBJECTIVE: The aim of this study was to evaluate the added value of information from clinical text in response to emergent diseases using natural language processing (NLP). METHODS: We explored the effects of long-term treatment by calcium channel blockers on the outcomes of COVID-19 infection in patients with high blood pressure during in-patient hospital stays using two sources of information: data available strictly from structured electronic health records (EHRs) and data available through structured EHRs and text mining. RESULTS: In this multicenter study involving 39 hospitals, text mining increased the statistical power sufficiently to change a negative result for an adjusted hazard ratio to a positive one. Compared to the baseline structured data, the number of patients available for inclusion in the study increased by 2.95 times, the amount of available information on medications increased by 7.2 times, and the amount of additional phenotypic information increased by 11.9 times. CONCLUSIONS: In our study, use of calcium channel blockers was associated with decreased in-hospital mortality in patients with COVID-19 infection. This finding was obtained by quickly adapting an NLP pipeline to the domain of the novel disease; the adapted pipeline still performed sufficiently to extract useful information. When that information was used to supplement existing structured data, the sample size could be increased sufficiently to see treatment effects that were not previously statistically detectable.

Subject(s)

Betacoronavirus , Calcium Channel Blockers/therapeutic use , Coronavirus Infections/drug therapy , Hypertension/complications , Natural Language Processing , Pneumonia, Viral/drug therapy , COVID-19 , Coronavirus Infections/complications , Data Mining , Electronic Health Records , Humans , Pandemics , Pneumonia, Viral/complications , SARS-CoV-2 , Time Factors , COVID-19 Drug Treatment

7.

Editor's introduction to the special issue of the 6th Biomedical Linked Annotation Hackathon (BLAH6).

Kim, Jin-Dong; Cohen, Kevin Bretonnel; Rinaldi, Fabio; Lu, Zhiyong; Collier, Nigel; Park, Hyun-Seok.

Genomics Inform ; 18(2): e12, 2020 Jun.

Article in English | MEDLINE | ID: mdl-32634866

8.

A Machine Learning Approach to Identifying Changes in Suicidal Language.

Pestian, John; Santel, Daniel; Sorter, Michael; Bayram, Ulya; Connolly, Brian; Glauser, Tracy; DelBello, Melissa; Tamang, Suzanne; Cohen, Kevin.

Suicide Life Threat Behav ; 50(5): 939-947, 2020 10.

Article in English | MEDLINE | ID: mdl-32484597

ABSTRACT

OBJECTIVE: With early identification and intervention, many suicidal deaths are preventable. Tools that include machine learning methods have been able to identify suicidal language. This paper examines the persistence of this suicidal language up to 30 days after discharge from care. METHOD: In a multi-center study, 253 subjects were enrolled into either suicidal or control cohorts. Their responses to standardized instruments and interviews were analyzed using machine learning algorithms. Subjects were re-interviewed approximately 30 days later, and their language was compared to the original language to determine the presence of suicidal ideation. RESULTS: The results show that language characteristics used to classify suicidality at the initial encounter are still present in the speech 30 days later (AUC = 89% (95% CI: 85-95%), p < .0001) and that algorithms trained on the second interviews could also identify the subjects that produced the first interviews (AUC = 85% (95% CI: 81-90%), p < .0001). CONCLUSIONS: This approach explores the stability of suicidal language. When using advanced computational methods, the results show that a patient's language is similar 30 days after first captured, while responses to standard measures change. This can be useful when developing methods that identify the data-based phenotype of a subject.

Subject(s)

Language , Suicidal Ideation , Algorithms , Humans , Machine Learning , Risk Assessment

9.

BioHackathon 2015: Semantics of data for life sciences and reproducible research.

Vos, Rutger A; Katayama, Toshiaki; Mishima, Hiroyuki; Kawano, Shin; Kawashima, Shuichi; Kim, Jin-Dong; Moriya, Yuki; Tokimatsu, Toshiaki; Yamaguchi, Atsuko; Yamamoto, Yasunori; Wu, Hongyan; Amstutz, Peter; Antezana, Erick; Aoki, Nobuyuki P; Arakawa, Kazuharu; Bolleman, Jerven T; Bolton, Evan; Bonnal, Raoul J P; Bono, Hidemasa; Burger, Kees; Chiba, Hirokazu; Cohen, Kevin B; Deutsch, Eric W; Fernández-Breis, Jesualdo T; Fu, Gang; Fujisawa, Takatomo; Fukushima, Atsushi; García, Alexander; Goto, Naohisa; Groza, Tudor; Hercus, Colin; Hoehndorf, Robert; Itaya, Kotone; Juty, Nick; Kawashima, Takeshi; Kim, Jee-Hyub; Kinjo, Akira R; Kotera, Masaaki; Kozaki, Kouji; Kumagai, Sadahiro; Kushida, Tatsuya; Lütteke, Thomas; Matsubara, Masaaki; Miyamoto, Joe; Mohsen, Attayeb; Mori, Hiroshi; Naito, Yuki; Nakazato, Takeru; Nguyen-Xuan, Jeremy; Nishida, Kozo.

F1000Res ; 9: 136, 2020.

Article in English | MEDLINE | ID: mdl-32308977

ABSTRACT

We report on the activities of the 2015 edition of the BioHackathon, an annual event that brings together researchers and developers from around the world to develop tools and technologies that promote the reusability of biological data. We discuss issues surrounding the representation, publication, integration, mining and reuse of biological data and metadata across a wide range of biomedical data types of relevance for the life sciences, including chemistry, genotypes and phenotypes, orthology and phylogeny, proteomics, genomics, glycomics, and metabolomics. We describe our progress to address ongoing challenges to the reusability and reproducibility of research results, and identify outstanding issues that continue to impede the progress of bioinformatics research. We share our perspective on the state of the art, continued challenges, and goals for future research and development for the life sciences Semantic Web.

Subject(s)

Biological Science Disciplines , Computational Biology , Semantic Web , Data Mining , Metadata , Reproducibility of Results

10.

Introduction to BLAH5 special issue: recent progress on interoperability of biomedical text mining.

Kim, Jin-Dong; Cohen, Kevin Bretonnel; Collier, Nigel; Lu, Zhiyong; Rinaldi, Fabio.

Genomics Inform ; 17(2): e12, 2019 Jun.

Article in English | MEDLINE | ID: mdl-31307127

11.

Interpersonal prosodic correlation in frontotemporal dementia.

Pressman, Peter S; Ross, Elliott D; Cohen, Kevin B; Chen, Kuan-Hua; Miller, Bruce L; Hunter, Lawrence E; Gorno-Tempini, Maria Luisa; Levenson, Robert W.

Ann Clin Transl Neurol ; 6(7): 1352-1357, 2019 07.

Article in English | MEDLINE | ID: mdl-31353851

ABSTRACT

Communication accommodation describes how individuals adjust their communicative style to that of their conversational partner. We predicted that interpersonal prosodic correlation related to pitch and timing would be decreased in behavioral variant frontotemporal dementia (bvFTD). We predicted that the interpersonal correlation in a timing measure and a pitch measure would be increased in right temporal FTD (rtFTD) due to sparing of the neural substrate for speech timing and pitch modulation but loss of social semantics. We found no significant effects in bvFTD, but conversations including rtFTD demonstrated higher interpersonal correlations in speech rate than healthy controls.

Subject(s)

Communication , Frontotemporal Dementia/psychology , Speech , Aged , Female , Frontotemporal Dementia/pathology , Humans , Male , Middle Aged

12.

GOF/LOF knowledge inference with tensor decomposition in support of high order link discovery for gene, mutation and disease.

Zhou, Kai Yin; Wang, Yu Xing; Zhang, Sheng; Gachloo, Mina; Kim, Jin Dong; Luo, Qi; Cohen, Kevin Bretonnel; Xia, Jing Bo.

Math Biosci Eng ; 16(3): 1376-1391, 2019 02 20.

Article in English | MEDLINE | ID: mdl-30947425

ABSTRACT

For discovering new uses of drugs, the function type of their target genes plays an important role, and the hypothesis of "Antagonist-GOF" and "Agonist-LOF" has laid a solid foundation for supporting drug repurposing. In this research, an active gene annotation corpus was used as training data to predict the gain-of-function, loss-of-function or unknown character of each human gene after variation events. Unlike the design of (entity, predicate, entity) triples in a traditional three-way tensor, a four-way and a five-way tensor, the GMFD-/GMAFD-tensor, were designed to represent higher-order links among all or some of these entities: genes (G), mutations (M), functions (F), diseases (D) and annotation labels (A). A tensor decomposition algorithm, CP decomposition, was applied to the higher-order tensor to unveil the correlation among entities. Meanwhile, a state-of-the-art baseline tensor decomposition algorithm, RESCAL, was run on the three-way tensor as a comparison method. The results showed that CP decomposition on the higher-order tensor performed better than RESCAL on the traditional three-way tensor in recovering masked data and making predictions. In addition, the four-way tensor proved to be the best format for our problem. Finally, a case study reproducing two disease-gene-drug links (Myelodysplastic Syndromes-IL2RA-Aldesleukin, Lymphoma-IL2RA-Aldesleukin) demonstrated the feasibility of our prediction model for drug repurposing.

Subject(s)

Drug Repositioning/economics , Drug Repositioning/methods , Genetic Variation , Machine Learning , Mutation , Algorithms , Cost-Benefit Analysis , Genetic Diseases, Inborn/genetics , Humans , Interleukin-2/analogs & derivatives , Interleukin-2/therapeutic use , Interleukin-2 Receptor alpha Subunit/genetics , Lymphoma/genetics , Models, Genetic , Molecular Sequence Annotation , Myelodysplastic Syndromes/genetics , Recombinant Proteins/therapeutic use , Software
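
As a rough illustration of how a CP decomposition can recover a masked cell in a higher-order tensor such as the four-way gene-mutation-function-disease tensor described above, the sketch below scores one cell from already-fitted factor matrices. It assumes the standard CP model, in which a cell is the sum over rank components of the product of one factor entry per mode. Fitting the factors is not shown, and the function name `cp_score` and the toy factor values are hypothetical, not taken from the paper.

```python
def cp_score(factors, index):
    """Reconstruct one tensor cell from CP factor matrices.

    factors: one matrix per mode; each matrix is a list of rows of length R.
    index:   one row index per mode, e.g. (gene, mutation, function, disease).
    """
    rank = len(factors[0][0])
    total = 0.0
    for r in range(rank):
        term = 1.0
        for mode, i in enumerate(index):
            term *= factors[mode][i][r]  # entry for this mode in component r
        total += term
    return total


# Toy rank-2 factors for a 2 x 1 x 2 x 1 tensor (values invented for illustration).
genes = [[1.0, 0.0], [0.0, 1.0]]
mutations = [[1.0, 0.5]]
functions = [[1.0, 0.0], [0.0, 1.0]]  # e.g. row 0 = GOF, row 1 = LOF (assumed labels)
diseases = [[2.0, 1.0]]
factors = [genes, mutations, functions, diseases]

score_gene0_gof = cp_score(factors, (0, 0, 0, 0))
score_gene1_lof = cp_score(factors, (1, 0, 1, 0))
score_gene0_lof = cp_score(factors, (0, 0, 1, 0))
```

A higher reconstructed score for a masked cell is read as stronger support for that combination of entities; the same function works unchanged for a five-way tensor because it loops over however many modes are supplied.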

13.

Plasma carotenoids and the risk of premalignant breast disease in women aged 50 and younger: a nested case-control study.

Cohen, Kevin; Liu, Ying; Luo, Jingqin; Appleton, Catherine M; Colditz, Graham A.

Breast Cancer Res Treat ; 162(3): 571-580, 2017 04.

Article in English | MEDLINE | ID: mdl-28190250

ABSTRACT

PURPOSE: To examine the association of plasma carotenoids, micronutrients in fruits and vegetables, with risk of premalignant breast disease (PBD) in younger women. METHODS: Blood samples were collected at the Siteman Cancer Center between 2008 and 2012 from 3537 women aged 50 or younger with no history of cancer or PBD. The analysis included 147 participants diagnosed with benign breast disease or breast carcinoma in situ during a 27-month follow-up and 293 controls. Cases and controls were matched on age, race/ethnicity, and date of and fasting status at blood draw. Plasma carotenoids were quantified. We used logistic regression to calculate odds ratios (ORs) and 95% confidence intervals (CIs) and linear regression to assess racial differences in plasma carotenoids. RESULTS: The risk reduction between the highest and lowest tertiles varied by carotenoid, with β-cryptoxanthin having the greatest reduction (OR 0.62; 95% CI, 0.62-1.09; P trend = 0.056) and total carotenoids the least (OR 0.83; 95% CI, 0.48-1.44; P trend = 0.12). We observed an inverse association between plasma carotenoids and risk of PBD in obese women (BMI ≥ 30 kg/m²; 61 cases and 115 controls) but not lean women (BMI < 25 kg/m²; 54 cases and 79 controls), although the interaction was not statistically significant. Compared to white women, black women had lower levels of α- and β-carotene and higher levels of β-cryptoxanthin and lutein/zeaxanthin. CONCLUSIONS: We observed suggestive inverse associations between plasma carotenoids and risk of PBD in younger women, consistent with inverse associations reported for invasive breast cancer. Carotenoids may play a role early in breast cancer development.

Subject(s)

Breast Neoplasms/blood , Breast Neoplasms/pathology , Carotenoids/blood , Precancerous Conditions/blood , Precancerous Conditions/pathology , Adult , Age Factors , Biomarkers , Biopsy , Case-Control Studies , Female , Humans , Middle Aged , Odds Ratio , Risk , Young Adult

14.

Are Cystic Fibrosis Aspergillus fumigatus Isolates Different? Intermicrobial Interactions with Pseudomonas.

Nazik, Hasan; Moss, Richard B; Karna, Vyshnavi; Clemons, Karl V; Banaei, Niaz; Cohen, Kevin; Choudhary, Varun; Stevens, David A.

Mycopathologia ; 182(3-4): 315-318, 2017 Apr.

Article in English | MEDLINE | ID: mdl-27822731

ABSTRACT

Pseudomonas aeruginosa and Aspergillus fumigatus are the leading bacterial and fungal pathogens in cystic fibrosis (CF). We have shown that Af biofilms are susceptible to Pseudomonas, particularly CF phenotypes. Those studies were performed with a reference virulent non-CF Aspergillus. Pseudomonas resident in CF airways undergo profound genetic and phenotypic adaptations to the abnormal environment. Studies have also indicated Aspergillus from CF patients have unexpected profiles of antifungal susceptibility. This would suggest that Aspergillus isolates from CF patients may be different or altered from other clinical isolates. It is important to know whether Aspergillus may also be altered, as a result of that CF environment, in susceptibility to Pseudomonas. CF Aspergillus proved not different in that susceptibility.

Subject(s)

Aspergillosis/microbiology , Aspergillus fumigatus/isolation & purification , Aspergillus fumigatus/physiology , Biofilms/growth & development , Cystic Fibrosis/complications , Microbial Interactions , Pseudomonas aeruginosa/physiology , Antifungal Agents/pharmacology , Aspergillus fumigatus/drug effects , Humans , Microbial Viability , Pseudomonas aeruginosa/isolation & purification

15.

A Machine Learning Approach to Identifying the Thought Markers of Suicidal Subjects: A Prospective Multicenter Trial.

Pestian, John P; Sorter, Michael; Connolly, Brian; Bretonnel Cohen, Kevin; McCullumsmith, Cheryl; Gee, Jeffry T; Morency, Louis-Philippe; Scherer, Stefan; Rohlfs, Lesley.

Suicide Life Threat Behav ; 47(1): 112-121, 2017 Feb.

Article in English | MEDLINE | ID: mdl-27813129

ABSTRACT

Death by suicide demonstrates profound personal suffering and societal failure. While basic sciences provide the opportunity to understand biological markers related to suicide, computer science provides opportunities to understand suicide thought markers. In this novel prospective, multimodal, multicenter, mixed demographic study, we used machine learning to measure and fuse two classes of suicidal thought markers: verbal and nonverbal. Machine learning algorithms were used with the subjects' words and vocal characteristics to classify 379 subjects recruited from two academic medical centers and a rural community hospital into one of three groups: suicidal, mentally ill but not suicidal, or controls. By combining linguistic and acoustic characteristics, subjects could be classified into one of the three groups with up to 85% accuracy. The results provide insight into how advanced technology can be used for suicide assessment and prevention.

Subject(s)

Machine Learning , Suicidal Ideation , Suicide Prevention , Suicide , Adolescent , Adult , Artificial Intelligence , Diagnosis, Computer-Assisted/methods , Female , Humans , Male , Prognosis , Prospective Studies , Suicide/psychology

16.

Inter-Annotator Agreement and the Upper Limit on Machine Performance: Evidence from Biomedical Natural Language Processing.

Boguslav, Mayla; Cohen, Kevin Bretonnel.

Stud Health Technol Inform ; 245: 298-302, 2017.

Article in English | MEDLINE | ID: mdl-29295103

ABSTRACT

Human-annotated data is a fundamental part of natural language processing system development and evaluation. The quality of that data is typically assessed by calculating the agreement between the annotators. It is widely assumed that this agreement between annotators is the upper limit on system performance in natural language processing: if humans can't agree with each other about the classification more than some percentage of the time, we don't expect a computer to do any better. We trace the logical positivist roots of the motivation for measuring inter-annotator agreement, demonstrate the prevalence of the widely-held assumption about the relationship between inter-annotator agreement and system performance, and present data that suggest that inter-annotator agreement is not, in fact, an upper bound on language processing system performance.

Subject(s)

Data Curation , Natural Language Processing , Humans , Observer Variation
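
The abstract above refers to calculating agreement between annotators without naming a specific statistic. One widely used chance-corrected measure for two annotators is Cohen's kappa; the sketch below assumes that measure purely for illustration and does not claim it is the one used in the paper. The function name `cohen_kappa` and the toy labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each annotator labeled at random with their own label rates.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Toy annotations, invented for illustration only.
annotator_1 = ["y", "y", "n", "n"]
annotator_2 = ["y", "n", "n", "n"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Here the annotators agree on three of four items (observed agreement 0.75) against a chance expectation of 0.5, which gives kappa = 0.5. The abstract's argument is that a figure of this kind should not be read as a ceiling on system performance.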

17.

Crowdsourcing and curation: perspectives from biology and natural language processing.

Hirschman, Lynette; Fort, Karën; Boué, Stéphanie; Kyrpides, Nikos; Islamaj Dogan, Rezarta; Cohen, Kevin Bretonnel.

Database (Oxford) ; 2016, 2016.

Article in English | MEDLINE | ID: mdl-27504010

ABSTRACT

Crowdsourcing is increasingly utilized for performing tasks in both natural language processing and biocuration. Although there have been many applications of crowdsourcing in these fields, there have been fewer high-level discussions of the methodology and its applicability to biocuration. This paper explores crowdsourcing for biocuration through several case studies that highlight different ways of leveraging 'the crowd'; these raise issues about the kind(s) of expertise needed, the motivations of participants, and questions related to feasibility, cost and quality. The paper is an outgrowth of a panel session held at BioCreative V (Seville, September 9-11, 2015). The session consisted of four short talks, followed by a discussion. In their talks, the panelists explored the role of expertise and the potential to improve crowd performance by training; the challenge of decomposing tasks to make them amenable to crowdsourcing; and the capture of biological data and metadata through community editing. Database URL: http://www.mitre.org/publications/technical-papers/crowdsourcing-and-curation-perspectives.

Subject(s)

Crowdsourcing , Data Curation/methods , Metadata , Natural Language Processing

18.

Methodological Issues in Predicting Pediatric Epilepsy Surgery Candidates Through Natural Language Processing and Machine Learning.

Cohen, Kevin Bretonnel; Glass, Benjamin; Greiner, Hansel M; Holland-Bouley, Katherine; Standridge, Shannon; Arya, Ravindra; Faist, Robert; Morita, Diego; Mangano, Francesco; Connolly, Brian; Glauser, Tracy; Pestian, John.

Biomed Inform Insights ; 8: 11-8, 2016.

Article in English | MEDLINE | ID: mdl-27257386

ABSTRACT

OBJECTIVE: We describe the development and evaluation of a system that uses machine learning and natural language processing techniques to identify potential candidates for surgical intervention for drug-resistant pediatric epilepsy. The data are comprised of free-text clinical notes extracted from the electronic health record (EHR). Both known clinical outcomes from the EHR and manual chart annotations provide gold standards for the patient's status. The following hypotheses are then tested: 1) machine learning methods can identify epilepsy surgery candidates as well as physicians do and 2) machine learning methods can identify candidates earlier than physicians do. These hypotheses are tested by systematically evaluating the effects of the data source, amount of training data, class balance, classification algorithm, and feature set on classifier performance. The results support both hypotheses, with F-measures ranging from 0.71 to 0.82. The feature set, classification algorithm, amount of training data, class balance, and gold standard all significantly affected classification performance. It was further observed that classification performance was better than the highest agreement between two annotators, even at one year before documented surgery referral. The results demonstrate that such machine learning methods can contribute to predicting pediatric epilepsy surgery candidates and reducing lag time to surgery referral.

19.

A Controlled Trial Using Natural Language Processing to Examine the Language of Suicidal Adolescents in the Emergency Department.

Pestian, John P; Grupp-Phelan, Jacqueline; Bretonnel Cohen, Kevin; Meyers, Gabriel; Richey, Linda A; Matykiewicz, Pawel; Sorter, Michael T.

Suicide Life Threat Behav ; 46(2): 154-9, 2016 Apr.

Article in English | MEDLINE | ID: mdl-26252868

ABSTRACT

What adolescents say when they think about or attempt suicide influences the medical care they receive. Mental health professionals use teenagers' words, actions, and gestures to gain insight into their emotional state and to prescribe what they believe to be optimal care. This prescription is often inconsistent among caregivers, however, and leads to varying outcomes. This variation could be reduced by applying machine learning as an aid in clinical decision support. We designed a prospective clinical trial to test the hypothesis that machine learning methods can discriminate between the conversation of suicidal and nonsuicidal individuals. Using semisupervised machine learning methods, the conversations of 30 suicidal adolescents and 30 matched controls were recorded and analyzed. The results show that the machines accurately distinguished between suicidal and nonsuicidal teenagers.

Subject(s)

Emergency Service, Hospital , Natural Language Processing , Risk Assessment , Suicidal Ideation , Suicide, Attempted/psychology , Verbal Behavior , Adolescent , Decision Support Techniques , Female , Humans , Machine Learning , Male , Prospective Studies , Suicide, Attempted/prevention & control

20.

Erratum for Nazik et al., effects of iron chelators on the formation and development of Aspergillus fumigatus biofilm.

Nazik, Hasan; Penner, John C; Ferreira, Jose A; Haagensen, Janus A J; Cohen, Kevin; Spormann, Alfred M; Martinez, Marife; Chen, Vicky; Hsu, Joe L; Clemons, Karl V; Stevens, David A.

Antimicrob Agents Chemother ; 59(11): 7160, 2015 Nov.

Article in English | MEDLINE | ID: mdl-26464401
