1.
BMC Med Genomics ; 15(1): 167, 2022 07 30.
Article in English | MEDLINE | ID: mdl-35907849

ABSTRACT

BACKGROUND: Next-generation sequencing provides comprehensive information about individuals' genetic makeup and is commonplace in precision oncology practice. Due to the heterogeneity of individual patients' disease conditions and treatment journeys, not all targeted therapies are initiated despite the presence of actionable mutations. To better understand and support the clinical decision-making process in precision oncology, there is a need to examine real-world associations between patients' genetic information and treatment choices. METHODS: To fill the gap of insufficient use of real-world data (RWD) in electronic health records (EHRs), we generated a single Resource Description Framework (RDF) resource, called PO2RDF (precision oncology to RDF), by integrating information regarding genes, variants, diseases, and drugs from genetic reports and EHRs. RESULTS: PO2RDF contains a total of 2,309,014 triples. Among them, 32,815 triples are related to Gene, 34,695 to Variant, 8,787 to Disease, and 26,154 to Drug. We performed two use case analyses to demonstrate the usability of PO2RDF: (1) we examined real-world associations between EGFR mutations and targeted therapies to confirm existing knowledge and detect off-label use, and (2) we examined differences in prognosis for lung cancer patients with and without TP53 mutations. CONCLUSIONS: Our work proposes using RDF to organize and distribute clinical RWD that is otherwise inaccessible externally. Our work serves as a pilot study that will lead to new clinical applications and could ultimately stimulate progress in the field of precision oncology.


Subject(s)
Neoplasms , High-Throughput Nucleotide Sequencing , Humans , Medical Oncology , Neoplasms/drug therapy , Neoplasms/genetics , Pilot Projects , Precision Medicine
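
The PO2RDF abstract above describes facts stored as RDF triples (subject, predicate, object). As a rough illustration only, the Python sketch below shows how such triples could be pattern-matched to list the drugs given to patients who carry a variant in a given gene. The identifiers, predicate names, and data are invented for this example; the abstract does not describe the actual PO2RDF schema, and a real deployment would more likely use an RDF store queried with SPARQL.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical toy triples; names and predicates are assumptions for illustration.
TRIPLES: List[Triple] = [
    ("patient:1", "hasVariant", "variant:EGFR_L858R"),
    ("variant:EGFR_L858R", "inGene", "gene:EGFR"),
    ("patient:1", "treatedWith", "drug:erlotinib"),
    ("patient:2", "hasVariant", "variant:TP53_R175H"),
    ("variant:TP53_R175H", "inGene", "gene:TP53"),
]


def match(triples: List[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def drugs_for_gene(triples: List[Triple], gene: str) -> List[str]:
    """List drugs given to patients carrying any variant in the given gene."""
    variants = {subj for subj, _, _ in match(triples, p="inGene", o=gene)}
    patients = {subj for subj, _, obj in match(triples, p="hasVariant")
                if obj in variants}
    drugs = {obj for subj, _, obj in match(triples, p="treatedWith")
             if subj in patients}
    return sorted(drugs)


print(drugs_for_gene(TRIPLES, "gene:EGFR"))
```

The three-step join (gene to variants, variants to patients, patients to drugs) mirrors what a SPARQL basic graph pattern would express declaratively.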
2.
Stud Health Technol Inform ; 290: 243-247, 2022 Jun 06.
Article in English | MEDLINE | ID: mdl-35673010

ABSTRACT

Precision oncology is expected to improve the selection of targeted therapies tailored to individual patients, and ultimately to improve cancer patients' outcomes. Several cancer genetics knowledge databases have been successfully developed for such purposes, including CIViC and OncoKB, with active community-based curation and scoring of genetic-treatment evidence. Although many studies have been conducted on each knowledge base separately, integrative analysis across both knowledge bases remains largely unexplored. Thus, there exists an urgent need for a heterogeneous precision oncology knowledge resource with computational power to support drug repurposing discovery in a timely manner, especially for life-threatening cancer. In this pilot study, we built a heterogeneous precision oncology knowledge resource (POKR) by integrating CIViC and OncoKB, in order to incorporate unique information contained in each knowledge base and make associations amongst biomedical entities (e.g., gene, drug, disease) computable and measurable via training POKR graph embeddings. All the relevant code, database dump files, and pre-trained POKR embeddings can be accessed through the following URL: https://github.com/shenfc/POKR.


Subject(s)
Neoplasms , Humans , Knowledge Bases , Medical Oncology , Neoplasms/drug therapy , Neoplasms/genetics , Pilot Projects , Precision Medicine
3.
Crit Rev Anal Chem ; : 1-46, 2022 May 16.
Article in English | MEDLINE | ID: mdl-35575782

ABSTRACT

Human progress is inseparable from the proper use of drugs, and electroanalytical research on drugs occupies an important position in the field of analytical chemistry. This review elaborates on research progress in drug electroanalysis based on direct electrochemical redox reactions at various electrodes over the decade from 2011 to 2021. First, we summarize frequently used methods for electrochemical data processing and for deriving electrochemical mechanisms in the literature. Then, research progress in the electrochemical analysis of drugs is classified and discussed according to therapeutic purpose and application, with a focus on the drugs' electrochemical reaction mechanisms. Comparisons of the electrochemical sensing performance of the drugs on various electrodes from recent studies are also listed, so that readers can more readily compare the electroanalytical sensing performance of each modified electrode for each drug. Finally, this review discusses the shortcomings and prospects of drug electroanalysis based on direct electrochemical redox research.

5.
Proc Natl Acad Sci U S A ; 118(11), 2021 03 16.
Article in English | MEDLINE | ID: mdl-33836575

ABSTRACT

Technological advances have allowed improvements in genome reference sequence assemblies. Here, we combined long- and short-read sequence resources to assemble the genome of a female Great Dane dog. This assembly has improved continuity compared to the existing Boxer-derived (CanFam3.1) reference genome. Annotation of the Great Dane assembly identified 22,182 protein-coding gene models and 7,049 long noncoding RNAs, including 49 protein-coding genes not present in the CanFam3.1 reference. The Great Dane assembly spans the majority of sequence gaps in the CanFam3.1 reference and illustrates that 2,151 gaps overlap the transcription start site of a predicted protein-coding gene. Moreover, a subset of the resolved gaps, which have an 80.95% median GC content, localize to transcription start sites and recombination hotspots more often than expected by chance, suggesting the stable canine recombinational landscape has shaped genome architecture. Alignment of the Great Dane and CanFam3.1 assemblies identified 16,834 deletions and 15,621 insertions, as well as 2,665 deletions and 3,493 insertions located on secondary contigs. These structural variants are dominated by retrotransposon insertion/deletion polymorphisms and include 16,221 dimorphic canine short interspersed elements (SINECs) and 1,121 dimorphic long interspersed element-1 sequences (LINE-1_Cfs). Analysis of sequences flanking the 3' end of LINE-1_Cfs (i.e., LINE-1_Cf 3'-transductions) suggests multiple retrotransposition-competent LINE-1_Cfs segregate among dog populations. Consistent with this conclusion, we demonstrate that a canine LINE-1_Cf element with intact open reading frames can retrotranspose its own RNA and that of a SINEC_Cf consensus sequence in cultured human cells, implicating ongoing retrotransposon activity as a driver of canine genetic variation.


Subject(s)
Dogs/genetics , GC Rich Sequence , Genome , Interspersed Repetitive Sequences , Animals , Dogs/classification , Long Interspersed Nucleotide Elements , Short Interspersed Nucleotide Elements , Species Specificity
6.
JMIR Med Inform ; 9(1): e24008, 2021 Jan 27.
Article in English | MEDLINE | ID: mdl-33502329

ABSTRACT

BACKGROUND: As a risk factor for many diseases, family history (FH) captures both shared genetic variations and living environments among family members. Though there are several systems focusing on FH extraction using natural language processing (NLP) techniques, the evaluation protocol of such systems has not been standardized. OBJECTIVE: The n2c2/OHNLP (National NLP Clinical Challenges/Open Health Natural Language Processing) 2019 FH extraction task aims to encourage the community efforts on a standard evaluation and system development on FH extraction from synthetic clinical narratives. METHODS: We organized the first BioCreative/OHNLP FH extraction shared task in 2018. We continued the shared task in 2019 in collaboration with the n2c2 and OHNLP consortium, and organized the 2019 n2c2/OHNLP FH extraction track. The shared task comprises 2 subtasks. Subtask 1 focuses on identifying family member entities and clinical observations (diseases), and subtask 2 expects the association of the living status, side of the family, and clinical observations with family members to be extracted. Subtask 2 is an end-to-end task which is based on the result of subtask 1. We manually curated the first deidentified clinical narrative from FH sections of clinical notes at Mayo Clinic Rochester, the content of which is highly relevant to patients' FH. RESULTS: A total of 17 teams from all over the world participated in the n2c2/OHNLP FH extraction shared task, where 38 runs were submitted for subtask 1 and 21 runs were submitted for subtask 2. For subtask 1, the top 3 runs were generated by Harbin Institute of Technology, ezDI, Inc., and The Medical University of South Carolina with F1 scores of 0.8745, 0.8225, and 0.8130, respectively. For subtask 2, the top 3 runs were from Harbin Institute of Technology, ezDI, Inc., and University of Florida with F1 scores of 0.681, 0.6586, and 0.6544, respectively. The workshop was held in conjunction with the AMIA 2019 Fall Symposium. 
CONCLUSIONS: A wide variety of methods were used by different teams in both tasks, such as Bidirectional Encoder Representations from Transformers, convolutional neural network, bidirectional long short-term memory, conditional random field, support vector machine, and rule-based strategies. System performances show that relation extraction from FH is a more challenging task when compared to entity identification task.
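
The F1 scores reported in the family history extraction abstract above combine precision and recall. A minimal sketch of that calculation over sets of extracted items follows, assuming exact-match scoring; the example annotations are invented, and the shared task's official scoring script may apply different matching rules.

```python
from typing import Hashable, Set, Tuple


def precision_recall_f1(gold: Set[Hashable],
                        predicted: Set[Hashable]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 between two sets of items."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gold and system annotations as (text, type) pairs.
GOLD = {
    ("father", "FamilyMember"),
    ("aunt", "FamilyMember"),
    ("diabetes", "Observation"),
    ("asthma", "Observation"),
}
PREDICTED = {
    ("father", "FamilyMember"),
    ("aunt", "FamilyMember"),
    ("diabetes", "Observation"),
    ("cousin", "FamilyMember"),  # spurious extraction
}

precision, recall, f1 = precision_recall_f1(GOLD, PREDICTED)
print(precision, recall, f1)
```

Because subtask 2 is scored on relations built from subtask 1 entities, an entity missed in the first step also costs recall in the second, which is consistent with the lower subtask 2 scores reported in the abstract.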

7.
J Biomed Inform ; 113: 103660, 2021 01.
Article in English | MEDLINE | ID: mdl-33321199

ABSTRACT

Coronavirus Disease 2019 has emerged as a significant global concern, triggering harsh public health restrictions in a successful bid to curb its exponential growth. As discussion shifts towards relaxation of these restrictions, there is significant concern about second-wave resurgence. The key to managing these outbreaks is early detection and intervention, and yet there is a significant lag time associated with usage of laboratory confirmed cases for surveillance purposes. To address this, syndromic surveillance can be considered to provide a timelier alternative for first-line screening. Existing syndromic surveillance solutions are however typically focused around a known disease and have limited capability to distinguish between outbreaks of individual diseases sharing similar syndromes. This poses a challenge for surveillance of COVID-19 as its active periods tend to overlap temporally with other influenza-like illnesses. In this study we explore performing sentinel syndromic surveillance for COVID-19 and other influenza-like illnesses using a deep learning-based approach. Our methods are based on aberration detection utilizing autoencoders that leverage symptom prevalence distributions to distinguish outbreaks of two ongoing diseases that share similar syndromes, even if they occur concurrently. We first demonstrate that this approach works for detection of outbreaks of influenza, which has known temporal boundaries. We then demonstrate that the autoencoder can be trained to not alert on known and well-managed influenza-like illnesses such as the common cold and influenza. Finally, we applied our approach to 2019-2020 data in the context of a COVID-19 syndromic surveillance task to demonstrate how implementation of such a system could have provided early warning of an outbreak of a novel influenza-like illness that did not match the symptom prevalence profile of influenza and other known influenza-like illnesses.


Subject(s)
COVID-19/epidemiology , Influenza, Human/epidemiology , Sentinel Surveillance , COVID-19/virology , Deep Learning , Disease Outbreaks , Humans , SARS-CoV-2/isolation & purification
8.
JMIR Med Inform ; 8(11): e23375, 2020 Nov 27.
Article in English | MEDLINE | ID: mdl-33245291

ABSTRACT

BACKGROUND: Semantic textual similarity is a common task in the general English domain to assess the degree to which the underlying semantics of 2 text segments are equivalent to each other. Clinical Semantic Textual Similarity (ClinicalSTS) is the semantic textual similarity task in the clinical domain that attempts to measure the degree of semantic equivalence between 2 snippets of clinical text. Due to the frequent use of templates in the Electronic Health Record system, a large amount of redundant text exists in clinical notes, making ClinicalSTS crucial for the secondary use of clinical text in downstream clinical natural language processing applications, such as clinical text summarization, clinical semantics extraction, and clinical information retrieval. OBJECTIVE: Our objective was to release ClinicalSTS data sets and to motivate natural language processing and biomedical informatics communities to tackle semantic text similarity tasks in the clinical domain. METHODS: We organized the first BioCreative/OHNLP ClinicalSTS shared task in 2018 by making available a real-world ClinicalSTS data set. We continued the shared task in 2019 in collaboration with National NLP Clinical Challenges (n2c2) and the Open Health Natural Language Processing (OHNLP) consortium and organized the 2019 n2c2/OHNLP ClinicalSTS track. We released a larger ClinicalSTS data set comprising 1642 clinical sentence pairs, including 1068 pairs from the 2018 shared task and 1006 new pairs from 2 electronic health record systems, GE and Epic. We released 80% (1642/2054) of the data to participating teams to develop and fine-tune the semantic textual similarity systems and used the remaining 20% (412/2054) as blind testing to evaluate their systems. The workshop was held in conjunction with the American Medical Informatics Association 2019 Annual Symposium. 
RESULTS: Of the 78 international teams that signed on to the n2c2/OHNLP ClinicalSTS shared task, 33 produced a total of 87 valid system submissions. The top 3 systems were generated by IBM Research, the National Center for Biotechnology Information, and the University of Florida, with Pearson correlations of r=.9010, r=.8967, and r=.8864, respectively. Most top-performing systems used state-of-the-art neural language models, such as BERT and XLNet, and state-of-the-art training schemas in deep learning, such as pretraining and fine-tuning schema, and multitask learning. Overall, the participating systems performed better on the Epic sentence pairs than on the GE sentence pairs, despite a much larger portion of the training data being GE sentence pairs. CONCLUSIONS: The 2019 n2c2/OHNLP ClinicalSTS shared task focused on computing semantic similarity for clinical text sentences generated from clinical notes in the real world. It attracted a large number of international teams. The ClinicalSTS shared task could continue to serve as a venue for researchers in natural language processing and medical informatics communities to develop and improve semantic textual similarity techniques for clinical text.
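
Systems in the ClinicalSTS abstract above are ranked by the Pearson correlation between predicted and gold similarity scores. The sketch below computes that coefficient from its textbook definition; the score lists are invented, and this is not the task's official evaluation code.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return covariance / (spread_x * spread_y)


# Hypothetical system predictions and gold annotations for four sentence pairs.
system_scores = [4.5, 1.0, 3.0, 0.5]
gold_scores = [5.0, 1.5, 2.5, 0.0]
r = pearson_r(system_scores, gold_scores)
print(round(r, 4))
```

Pearson correlation is insensitive to a constant offset or scale in the predictions, so it measures whether a system orders and spaces pairs like the annotators do rather than whether the raw scores match.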

9.
JMIR Med Inform ; 8(10): e17376, 2020 Oct 06.
Article in English | MEDLINE | ID: mdl-33021486

ABSTRACT

BACKGROUND: Widespread adoption of electronic health records has enabled the secondary use of electronic health record data for clinical research and health care delivery. Natural language processing techniques have shown promise in their capability to extract the information embedded in unstructured clinical data, and information retrieval techniques provide flexible and scalable solutions that can augment natural language processing systems for retrieving and ranking relevant records. OBJECTIVE: In this paper, we present the implementation of a cohort retrieval system that can execute textual cohort selection queries on both structured data and unstructured text: Cohort Retrieval Enhanced by Analysis of Text from Electronic Health Records (CREATE). METHODS: CREATE is a proof-of-concept system that leverages a combination of structured queries and information retrieval techniques on natural language processing results to improve cohort retrieval performance, using the Observational Medical Outcomes Partnership Common Data Model to enhance model portability. The natural language processing component was used to extract common data model concepts from textual queries. We designed a hierarchical index to support the common data model concept search utilizing information retrieval techniques and frameworks. RESULTS: Our case study on 5 cohort identification queries, evaluated using the precision at 5 information retrieval metric at both the patient level and document level, demonstrates that CREATE achieves a mean precision at 5 of 0.90, which outperforms systems using only structured data or only unstructured text, with mean precision at 5 values of 0.54 and 0.74, respectively. CONCLUSIONS: The implementation and evaluation on Mayo Clinic Biobank data demonstrated that CREATE outperforms cohort retrieval systems that use only structured data or only unstructured text in complex textual cohort queries.
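
The CREATE abstract above reports precision at 5, the fraction of the top five retrieved items that are relevant, averaged over queries. A minimal sketch of the metric follows; the ranked lists and relevance judgments are invented for illustration and do not reproduce the study's data.

```python
from typing import List, Sequence, Set, Tuple


def precision_at_k(ranked_ids: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """Fraction of the top-k ranked items that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    # Dividing by k (not len(top_k)) penalizes result lists shorter than k.
    return sum(1 for item in top_k if item in relevant) / k


def mean_precision_at_k(runs: List[Tuple[Sequence[str], Set[str]]], k: int = 5) -> float:
    """Average precision at k over (ranking, relevant set) pairs, one per query."""
    return sum(precision_at_k(ranking, relevant, k) for ranking, relevant in runs) / len(runs)


# Hypothetical rankings of patient identifiers for two cohort queries.
RUNS = [
    (["p1", "p2", "p3", "p4", "p5", "p6"], {"p1", "p3", "p5", "p6"}),
    (["p9", "p8", "p7", "p6", "p5"], {"p9", "p8", "p7", "p6", "p5"}),
]
print(mean_precision_at_k(RUNS))
```

The metric ignores everything below rank 5, so it rewards a system for putting clearly eligible patients first rather than for finding every eligible patient.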

10.
J Am Med Inform Assoc ; 27(11): 1808-1812, 2020 11 01.
Article in English | MEDLINE | ID: mdl-32885823

ABSTRACT

Defining patient-to-patient similarity is essential for the development of precision medicine in clinical care and research. Conceptually, the identification of similar patient cohorts appears straightforward; however, universally accepted definitions remain elusive. Simultaneously, an explosion of vendors and published algorithms have emerged and all provide varied levels of functionality in identifying patient similarity categories. To provide clarity and a common framework for patient similarity, a workshop at the American Medical Informatics Association 2019 Annual Meeting was convened. This workshop included invited discussants from academics, the biotechnology industry, the FDA, and private practice oncology groups. Drawing from a broad range of backgrounds, workshop participants were able to coalesce around 4 major patient similarity classes: (1) feature, (2) outcome, (3) exposure, and (4) mixed-class. This perspective expands into these 4 subtypes more critically and offers the medical informatics community a means of communicating their work on this important topic.


Subject(s)
Precision Medicine , Female , Humans , Male , Medical Informatics , Terminology as Topic
11.
J Am Med Inform Assoc ; 27(10): 1529-1537, 2020 10 01.
Article in English | MEDLINE | ID: mdl-32968800

ABSTRACT

OBJECTIVE: The 2019 National Natural Language Processing (NLP) Clinical Challenges (n2c2)/Open Health NLP (OHNLP) shared task track 3 focused on medical concept normalization (MCN) in clinical records. This track aimed to assess the state of the art in identifying and matching salient medical concepts to a controlled vocabulary. In this paper, we describe the task, describe the data set used, compare the participating systems, present results, identify the strengths and limitations of the current state of the art, and identify directions for future research. MATERIALS AND METHODS: Participating teams were provided with narrative discharge summaries in which text spans corresponding to medical concepts were identified. This paper refers to these text spans as mentions. Teams were tasked with normalizing these mentions to concepts, represented by concept unique identifiers, within the Unified Medical Language System. Submitted systems represented 4 broad categories of approaches: cascading dictionary matching, cosine distance, deep learning, and retrieve-and-rank systems. Disambiguation modules were common across all approaches. RESULTS: A total of 33 teams participated in the MCN task. The best-performing team achieved an accuracy of 0.8526. The median and mean performances among all teams were 0.7733 and 0.7426, respectively. CONCLUSIONS: Overall performance among the top 10 teams was high. However, several mention types were challenging for all teams. These included mentions requiring disambiguation of misspelled words, acronyms, abbreviations, and mentions with more than 1 possible semantic type. Also challenging were complex mentions of long, multi-word terms that may require new ways of extracting and representing mention meaning, the use of domain knowledge, parse trees, or hand-crafted rules.


Subject(s)
Information Storage and Retrieval/methods , Natural Language Processing , Patient Discharge Summaries , Unified Medical Language System , Datasets as Topic , Deep Learning , Humans
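
One of the approach families named in the concept normalization abstract above is cosine distance. As a loose illustration of the idea, and not a reconstruction of any participating system, the sketch below maps a misspelled mention to the closest concept name using cosine similarity over character trigram counts. The vocabulary and its identifiers are placeholders invented for this example, not real UMLS concept unique identifiers.

```python
import math
from collections import Counter
from typing import Dict


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count character n-grams of the lowercased text, padded with spaces."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def normalize(mention: str, vocabulary: Dict[str, str]) -> str:
    """Return the identifier whose concept name is most similar to the mention."""
    mention_vec = char_ngrams(mention)
    return max(vocabulary,
               key=lambda cid: cosine(mention_vec, char_ngrams(vocabulary[cid])))


# Placeholder vocabulary: identifiers are hypothetical, not UMLS CUIs.
VOCAB = {
    "CUI-0001": "hypertension",
    "CUI-0002": "hypotension",
    "CUI-0003": "diabetes mellitus",
}
print(normalize("hypertensoin", VOCAB))
```

Character n-grams tolerate misspellings, but they cannot separate acronyms or mentions with several possible semantic types, which is where the abstract notes that disambiguation modules were needed.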
12.
Genome Biol Evol ; 12(12): 2211-2230, 2020 12 06.
Article in English | MEDLINE | ID: mdl-32970804

ABSTRACT

Copy number variation (CNV) can promote phenotypic diversification and adaptive evolution. However, the genomic architecture of CNVs among Macaca species remains scarcely reported, and the roles of CNVs in adaptation and evolution of macaques have not been well addressed. Here, we identified and characterized 1,479 genome-wide hetero-specific CNVs across nine Macaca species with bioinformatic methods, along with 26 CNV-dense regions and dozens of lineage-specific CNVs. The genes intersecting CNVs were overrepresented in nutritional metabolism, xenobiotics/drug metabolism, and immune-related pathways. Population-level transcriptome data showed that nearly 46% of CNV genes were differentially expressed across populations and also mainly consisted of metabolic and immune-related genes, which implied the role of CNVs in environmental adaptation of Macaca. Several CNVs overlapping drug metabolism genes were verified with genomic quantitative polymerase chain reaction, suggesting that these macaques may have different drug metabolism features. The CNV-dense regions, including 15 first reported here, represent unstable genomic segments in macaques where biological innovation may evolve. Twelve gains and 40 losses specific to the Barbary macaque contain genes with essential roles in energy homeostasis and immunity defense, inferring the genetic basis of its unique distribution in North Africa. Our study not only elucidated the genetic diversity across Macaca species from the perspective of structural variation but also provided suggestive evidence for the role of CNVs in adaptation and genome evolution. Additionally, our findings provide new insights into the application of diverse macaques to drug study.


Subject(s)
Adaptation, Biological , Biological Evolution , DNA Copy Number Variations , Gene Duplication , Macaca/genetics , Animals
13.
J Biomed Inform ; 109: 103526, 2020 09.
Article in English | MEDLINE | ID: mdl-32768446

ABSTRACT

BACKGROUND: Concept extraction, a subdomain of natural language processing (NLP) with a focus on extracting concepts of interest, has been adopted to computationally extract clinical information from text for a wide range of applications ranging from clinical decision support to care quality improvement. OBJECTIVES: In this literature review, we provide a methodology review of clinical concept extraction, aiming to catalog development processes, available methods and tools, and specific considerations when developing clinical concept extraction applications. METHODS: Based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, a literature search was conducted for retrieving EHR-based information extraction articles written in English and published from January 2009 through June 2019 from Ovid MEDLINE In-Process & Other Non-Indexed Citations, Ovid MEDLINE, Ovid EMBASE, Scopus, Web of Science, and the ACM Digital Library. RESULTS: A total of 6,686 publications were retrieved. After title and abstract screening, 228 publications were selected. The methods used for developing clinical concept extraction applications were discussed in this review.


Subject(s)
Information Storage and Retrieval , Natural Language Processing , Bibliometrics , Research Design
14.
medRxiv ; 2020 Jun 09.
Article in English | MEDLINE | ID: mdl-32577704

ABSTRACT

Coronavirus Disease 2019 (COVID-19) has emerged as a significant global concern, triggering harsh public health restrictions in a successful bid to curb its exponential growth. As discussion shifts towards relaxation of these restrictions, there is significant concern about second-wave resurgence. The key to managing these outbreaks is early detection and intervention, and yet there is a significant lag time associated with usage of laboratory confirmed cases for surveillance purposes. To address this, syndromic surveillance can be considered to provide a timelier alternative for first-line screening. Existing syndromic surveillance solutions are however typically focused around a known disease and have limited capability to distinguish between outbreaks of individual diseases sharing similar syndromes. This poses a challenge for surveillance of COVID-19 as its active periods tend to overlap temporally with other influenza-like illnesses. In this study we explore performing sentinel syndromic surveillance for COVID-19 and other influenza-like illnesses using a deep learning-based approach. Our methods are based on aberration detection utilizing autoencoders that leverage symptom prevalence distributions to distinguish outbreaks of two ongoing diseases that share similar syndromes, even if they occur concurrently. We first demonstrate that this approach works for detection of outbreaks of influenza, which has known temporal boundaries. We then demonstrate that the autoencoder can be trained to not alert on known and well-managed influenza-like illnesses such as the common cold and influenza. Finally, we applied our approach to 2019-2020 data in the context of a COVID-19 syndromic surveillance task to demonstrate how implementation of such a system could have provided early warning of an outbreak of a novel influenza-like illness that did not match the symptom prevalence profile of influenza and other known influenza-like illnesses.

15.
AMIA Jt Summits Transl Sci Proc ; 2020: 720-729, 2020.
Article in English | MEDLINE | ID: mdl-32477695

ABSTRACT

Despite an abundance of information in clinical genetic testing reports, information is oftentimes not well documented/utilized for decision making. Unstructured information in genetic reports can contribute to long-term patient management and future translational research. Thus, we proposed a knowledge model that could manage unstructured information in medical genetic reports and facilitate knowledge extraction, curation and updating. For this pilot study, we used a dataset including 1,565 cancer genetics reports of Mayo Clinic patients. We used a previously developed, data-driven discovery pipeline that involves both semantic annotation and co-occurrence association analysis to establish a knowledge model. We showed that compared to genetic reports, around 56% of testing results are missing or incomplete in the clinical notes. We built a genetic report knowledge model and highlighted four key semantic groups including "Genes and Gene Products" and "Treatments". Coverage of term annotation was 99.5%. Accuracies of term annotation and relationship extraction were 98.9% and 92.9% respectively.

16.
J Am Med Inform Assoc ; 27(8): 1259-1267, 2020 08 01.
Article in English | MEDLINE | ID: mdl-32458963

ABSTRACT

OBJECTIVE: As coronavirus disease 2019 (COVID-19) started its rapid emergence and gradually transformed into an unprecedented pandemic, the need for having a knowledge repository for the disease became crucial. To address this issue, a new COVID-19 machine-readable dataset known as the COVID-19 Open Research Dataset (CORD-19) has been released. Based on this, our objective was to build computable co-occurrence network embeddings to assist association detection among COVID-19-related biomedical entities. MATERIALS AND METHODS: Leveraging a Linked Data version of CORD-19 (ie, CORD-19-on-FHIR), we first utilized SPARQL to extract co-occurrences among chemicals, diseases, genes, and mutations and build a co-occurrence network. We then trained the representation of the derived co-occurrence network using node2vec with 4 edge embedding operations (L1, L2, Average, and Hadamard). Six algorithms (decision tree, logistic regression, support vector machine, random forest, naïve Bayes, and multilayer perceptron) were applied to evaluate performance on link prediction. An unsupervised learning strategy was also developed incorporating the t-SNE (t-distributed stochastic neighbor embedding) and DBSCAN (density-based spatial clustering of applications with noise) algorithms for case studies. RESULTS: The random forest classifier showed the best performance on link prediction across different network embeddings. For edge embeddings generated using the Average operation, random forest achieved the optimal average precision of 0.97 along with an F1 score of 0.90. For unsupervised learning, 63 clusters were formed with a silhouette score of 0.128. Significant associations were detected for 5 coronavirus infectious diseases in their corresponding subgroups. CONCLUSIONS: In this study, we constructed COVID-19-centered co-occurrence network embeddings. 
Results indicated that the generated embeddings were able to extract significant associations for COVID-19 and coronavirus infectious diseases.


Subject(s)
Algorithms , Coronavirus Infections , Neural Networks, Computer , Pandemics , Pneumonia, Viral , Bayes Theorem , COVID-19 , Datasets as Topic , Decision Trees , Humans , Logistic Models , ROC Curve , Software , Support Vector Machine
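
The four edge embedding operations named in the CORD-19 abstract above (L1, L2, Average, and Hadamard) turn two node vectors into one feature vector for a candidate link. The sketch below follows the element-wise definitions commonly used with node2vec; the node vectors are invented, and the study's own code may differ in detail.

```python
from typing import Callable, Dict, List, Sequence

# Element-wise binary operators as commonly defined for node2vec link prediction.
EDGE_OPERATORS: Dict[str, Callable[[float, float], float]] = {
    "average": lambda a, b: (a + b) / 2,
    "hadamard": lambda a, b: a * b,
    "l1": lambda a, b: abs(a - b),
    "l2": lambda a, b: (a - b) ** 2,
}


def edge_embedding(u: Sequence[float], v: Sequence[float], op: str) -> List[float]:
    """Combine two node embeddings into one edge embedding, element by element."""
    if len(u) != len(v):
        raise ValueError("node embeddings must have the same dimension")
    if op not in EDGE_OPERATORS:
        raise ValueError(f"unknown operator: {op}")
    combine = EDGE_OPERATORS[op]
    return [combine(a, b) for a, b in zip(u, v)]


# Hypothetical 2-dimensional embeddings for two co-occurring entities.
node_u = [1.0, 2.0]
node_v = [3.0, -2.0]
for name in EDGE_OPERATORS:
    print(name, edge_embedding(node_u, node_v, name))
```

All four operators are symmetric in their arguments, which suits an undirected co-occurrence network: the feature vector for a pair does not depend on which entity is listed first. The resulting vectors are what a classifier such as a random forest would receive as input for link prediction.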
17.
Genes (Basel) ; 11(2), 2020 01 29.
Article in English | MEDLINE | ID: mdl-32013076

ABSTRACT

Gene duplication is a major mechanism for the evolution of gene novelty, and copy-number variation makes a major contribution to inter-individual genetic diversity. However, most approaches for studying copy-number variation rely upon uniquely mapping reads to a genome reference and are unable to distinguish among duplicated sequences. Specialized approaches to interrogate specific paralogs are comparatively slow and have a high degree of computational complexity, limiting their effective application to emerging population-scale data sets. We present QuicK-mer2, a self-contained, mapping-free approach that enables the rapid construction of paralog-specific copy-number maps from short-read sequence data. This approach is based on the tabulation of unique k-mer sequences from short-read data sets, and is able to analyze a 20X coverage human genome in approximately 20 min. We applied our approach to newly released sequence data from the 1000 Genomes Project, constructed paralog-specific copy-number maps from 2457 unrelated individuals, and uncovered copy-number variation of paralogous genes. We identify nine genes where none of the analyzed samples have a copy number of two, 92 genes where the majority of samples have a copy number other than two, and describe rare copy number variation affecting multiple genes at the APOBEC3 locus.


Subject(s)
Computational Biology/methods , DNA Copy Number Variations , Sequence Analysis, DNA/methods , Algorithms , Evolution, Molecular , Gene Duplication , Genome, Human , Humans
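
The QuicK-mer2 abstract above describes tabulating unique k-mers from short reads without mapping. The toy sketch below shows the general idea under simplifying assumptions: k-mers that occur exactly once in a reference are selected, and their occurrences are then counted directly in reads. It ignores reverse complements, sequencing errors, and depth normalization, and it is not the QuicK-mer2 implementation; the sequences are invented.

```python
from collections import Counter
from typing import Iterable, Iterator, Set


def kmers(sequence: str, k: int) -> Iterator[str]:
    """Yield every overlapping substring of length k."""
    return (sequence[i:i + k] for i in range(len(sequence) - k + 1))


def unique_kmers(reference: str, k: int) -> Set[str]:
    """Return the k-mers that occur exactly once in the reference."""
    counts = Counter(kmers(reference, k))
    return {kmer for kmer, count in counts.items() if count == 1}


def depth_at_unique_kmers(reads: Iterable[str], unique: Set[str], k: int) -> Counter:
    """Count read occurrences of each unique k-mer, with no alignment step."""
    depth: Counter = Counter()
    for read in reads:
        for kmer in kmers(read, k):
            if kmer in unique:
                depth[kmer] += 1
    return depth


# Hypothetical toy reference and reads.
REFERENCE = "ACGTACGGT"
READS = ["CGTAC", "TACGG"]
K = 3

unique = unique_kmers(REFERENCE, K)
depth = depth_at_unique_kmers(READS, unique, K)
print(sorted(depth.items()))
```

Because a unique k-mer marks exactly one position in the reference, its read depth can be attributed to that position alone, which is what lets a k-mer approach distinguish paralogs that confuse read mapping.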
18.
J Med Internet Res ; 21(12): e14204, 2019 12 10.
Article in English | MEDLINE | ID: mdl-31821152

ABSTRACT

BACKGROUND: The rise in the number of patients with chronic kidney disease (CKD) and consequent end-stage renal disease necessitating renal replacement therapy has placed a significant strain on health care. The rate of progression of CKD is influenced by both modifiable and unmodifiable risk factors. Identification of modifiable risk factors, such as lifestyle choices, is vital in informing strategies toward renoprotection. Modification of unhealthy lifestyle choices lessens the risk of CKD progression and associated comorbidities, although the lifestyle risk factors and modification strategies may vary with different comorbidities (eg, diabetes, hypertension). However, there are limited studies on suitable lifestyle interventions for CKD patients with comorbidities. OBJECTIVE: The objectives of our study are to (1) identify the lifestyle risk factors for CKD with common comorbid chronic conditions using a US nationwide survey in combination with literature mining, and (2) demonstrate the potential effectiveness of association rule mining (ARM) analysis for the aforementioned task, which can be generalized for similar tasks associated with noncommunicable diseases (NCDs). METHODS: We applied ARM to identify lifestyle risk factors for CKD progression with comorbidities (cardiovascular disease, chronic pulmonary disease, rheumatoid arthritis, diabetes, and cancer) using questionnaire data for 450,000 participants collected from the Behavioral Risk Factor Surveillance System (BRFSS) 2017. The BRFSS is a Web-based resource, which includes demographic information, chronic health conditions, fruit and vegetable consumption, and sugar- or salt-related behavior. To enrich the BRFSS questionnaire, the Semantic MEDLINE Database was also mined to identify lifestyle risk factors. RESULTS: The results suggest that lifestyle modification for CKD varies among different comorbidities. For example, lifestyle modification for CKD with cardiovascular disease needs to focus on increasing aerobic capacity by improving muscle strength or functional ability. For CKD patients with chronic pulmonary disease or rheumatoid arthritis, lifestyle modification should emphasize high dietary fiber intake and participation in moderate-intensity exercise. Meanwhile, the management of CKD patients with diabetes focuses predominantly on exercise and weight loss. CONCLUSIONS: We have demonstrated the use of ARM to identify lifestyle risk factors for CKD with common comorbid chronic conditions using data from BRFSS 2017. Our methods can be generalized to advance chronic disease management with more focused and optimized lifestyle modification of NCDs.
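The abstract above does not describe how the ARM analysis was implemented. As a rough illustration of what association rule mining computes, the following Python sketch scores rules of the form "lifestyle factors -> outcome" by support, confidence, and lift. Everything in it is an assumption made for illustration: the function names, the thresholds, and the six toy records are hypothetical and are not BRFSS data or the authors' method.

```python
from itertools import combinations


def support(records, items):
    """Fraction of records that contain every item in `items`."""
    items = frozenset(items)
    return sum(items <= r for r in records) / len(records)


def mine_rules(records, consequent, min_support=0.2, min_confidence=0.6, max_len=2):
    """Return rules (antecedent -> consequent) that pass both thresholds.

    Each rule is a tuple: (antecedent, consequent, support, confidence, lift).
    Antecedents are brute-force enumerated up to `max_len` items, which is
    only practical for small item vocabularies (hypothetical simplification).
    """
    records = [frozenset(r) for r in records]
    candidates = sorted({i for r in records for i in r} - {consequent})
    cons_support = support(records, [consequent])
    rules = []
    for k in range(1, max_len + 1):
        for antecedent in combinations(candidates, k):
            sup_a = support(records, antecedent)
            sup_both = support(records, antecedent + (consequent,))
            if sup_a == 0 or sup_both < min_support:
                continue
            confidence = sup_both / sup_a      # estimate of P(consequent | antecedent)
            if confidence >= min_confidence:
                lift = confidence / cons_support  # > 1 means positive association
                rules.append((antecedent, consequent, sup_both, confidence, lift))
    # Strongest associations first
    return sorted(rules, key=lambda r: r[4], reverse=True)


# Invented toy records, one set of questionnaire-style flags per respondent.
TOY_RECORDS = [
    {"low_exercise", "high_salt", "ckd_progression"},
    {"low_exercise", "ckd_progression"},
    {"low_exercise", "high_salt", "ckd_progression"},
    {"high_fiber"},
    {"high_fiber", "high_salt"},
    {"low_exercise", "high_fiber"},
]

if __name__ == "__main__":
    for ante, cons, sup, conf, lift in mine_rules(TOY_RECORDS, "ckd_progression"):
        print(f"{set(ante)} -> {cons}: support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

In practice, a survey of this size would normally be mined with an Apriori or FP-growth implementation that prunes infrequent itemsets instead of enumerating every combination.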


Subject(s)
Life Style , Renal Insufficiency, Chronic/epidemiology , Comorbidity , Disease Progression , Female , Humans , Male , Middle Aged , Risk Factors , Surveys and Questionnaires
19.
J Healthc Inform Res ; 3(3): 267-282, 2019 Sep.
Article in English | MEDLINE | ID: mdl-31728432

ABSTRACT

Postsurgical complications (PSCs) are deviations from the normal postsurgical course and are categorized by severity and treatment requirements. Surgical site infection (SSI) is one of the major PSCs and the most common healthcare-associated infection, resulting in increased length of hospital stay and cost. In this work, we proposed an automated way to generate keyword features, using sublanguage analysis with heuristics, to detect SSI in clinical notes from a cohort, and we evaluated these keywords with medical experts. To further validate our approach, we also applied different machine learning algorithms to the cohort using the automatically generated keywords. The results showed that our approach was able to identify SSI keywords from clinical narratives and can be used as a foundation to develop an information extraction system or to support search-based natural language processing (NLP) approaches by augmenting search queries.

20.
J Healthc Inform Res ; 3(3): 329-344, 2019 Sep.
Article in English | MEDLINE | ID: mdl-31598581

ABSTRACT

The rich semantic representation and sophisticated structure definition of the HL7 Fast Healthcare Interoperability Resources (FHIR) specification require considerable effort to understand and use. The objective of our study is to design, develop, and evaluate an open-source, user-friendly visualization interface for exploring the FHIR specification. We prototyped an interactive visualization tool for navigating and manipulating the FHIR core resources, profiles, and extensions. The utility of the tool was evaluated using metrics focused mainly on its interactive mechanism and content expressiveness. We demonstrated that the visualization techniques are helpful for navigating the HL7 FHIR specification and aiding its profiling.
