ABSTRACT
Bridging the gap between genetic variations, environmental determinants, and phenotypic outcomes is critical for supporting clinical diagnosis and understanding mechanisms of diseases. It requires integrating open data at a global scale. The Monarch Initiative advances these goals by developing open ontologies, semantic data models, and knowledge graphs for translational research. The Monarch App is an integrated platform combining data about genes, phenotypes, and diseases across species. Monarch's APIs enable access to carefully curated datasets and advanced analysis tools that support the understanding and diagnosis of disease for diverse applications such as variant prioritization, deep phenotyping, and patient profile-matching. We have migrated our system into a scalable, cloud-based infrastructure; simplified Monarch's data ingestion and knowledge graph integration systems; enhanced data mapping and integration standards; and developed a new user interface with novel search and graph navigation features. Furthermore, we advanced Monarch's analytic tools by developing a customized plugin for OpenAI's ChatGPT to increase the reliability of its responses about phenotypic data, allowing us to interrogate the knowledge in the Monarch graph using state-of-the-art Large Language Models. The resources of the Monarch Initiative can be found at monarchinitiative.org and its corresponding code repository at github.com/monarch-initiative/monarch-app.
Subjects
Databases, Factual; Disease; Genes; Phenotype; Humans; Internet; Databases, Factual/standards; Software; Genes/genetics; Disease/genetics
ABSTRACT
MOTIVATION: Creating knowledge bases and ontologies is a time-consuming task that relies on manual curation. AI/NLP approaches can assist expert curators in populating these knowledge bases, but current approaches rely on extensive training data and are not able to populate arbitrarily complex nested knowledge schemas. RESULTS: Here we present Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES), a Knowledge Extraction approach that relies on the ability of Large Language Models (LLMs) to perform zero-shot learning and general-purpose query answering from flexible prompts and return information conforming to a specified schema. Given a detailed, user-defined knowledge schema and an input text, SPIRES recursively performs prompt interrogation against an LLM to obtain a set of responses matching the provided schema. SPIRES uses existing ontologies and vocabularies to provide identifiers for matched elements. We present examples of applying SPIRES in different domains, including extraction of food recipes, multi-species cellular signaling pathways, disease treatments, multi-step drug mechanisms, and chemical-to-disease relationships. Current SPIRES accuracy is comparable to the mid-range of existing Relation Extraction methods, but greatly surpasses an LLM's native capability of grounding entities with unique identifiers. SPIRES has the advantage of easy customization, flexibility, and, crucially, the ability to perform new tasks in the absence of any new training data. This method supports a general strategy of leveraging the language-interpreting capabilities of LLMs to assemble knowledge bases, assisting manual knowledge curation and acquisition while supporting validation with publicly available databases and ontologies external to the LLM. AVAILABILITY AND IMPLEMENTATION: SPIRES is available as part of the open source OntoGPT package: https://github.com/monarch-initiative/ontogpt.
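The recursive, schema-driven loop described above can be sketched in a few lines. This is a simplified illustration, not the actual OntoGPT implementation: the LLM call is stubbed with canned answers so the control flow is runnable, and the grounding table uses toy identifiers in place of real ontologies.

```python
# Sketch of SPIRES-style recursive knowledge extraction (illustrative only).
def make_prompt(text: str, slot: str) -> str:
    return f"TEXT: {text}\nSLOT: {slot}"

def llm_complete(prompt: str) -> str:
    """Stand-in for a zero-shot LLM completion (canned answers for the demo)."""
    text = prompt.split("TEXT: ")[1].split("\n")[0]
    slot = prompt.split("SLOT: ")[1]
    if slot == "label":
        return "guacamole"
    if slot == "ingredients":            # multivalued slots come back delimited
        return "2 avocados; 1 lime"
    if slot == "amount":
        return text.split(" ", 1)[0]     # "2" from "2 avocados"
    if slot == "food_item":
        return text.split(" ", 1)[1]     # "avocados" from "2 avocados"
    return ""

# Toy grounding table; SPIRES consults real ontologies and vocabularies instead.
GROUNDING = {"avocados": "FOOD:001", "lime": "FOOD:002"}

RECIPE_SCHEMA = {
    "label": "str",
    "ingredients": {"schema": {"amount": "str", "food_item": "str"}},
}

def extract(schema: dict, text: str) -> dict:
    """Interrogate the (stubbed) LLM once per slot; recurse into nested schemas."""
    record = {}
    for slot, spec in schema.items():
        answer = llm_complete(make_prompt(text, slot))
        if isinstance(spec, dict):       # nested schema: recurse over each item
            record[slot] = [extract(spec["schema"], item)
                            for item in answer.split("; ")]
        else:
            record[slot] = answer
    for item in record.get("ingredients", []):   # ground leaves where possible
        item["id"] = GROUNDING.get(item["food_item"], "UNMATCHED")
    return record

result = extract(RECIPE_SCHEMA, "How to make guacamole with 2 avocados and 1 lime")
```

The key design point carried over from the abstract is that the schema, not the model, drives the interrogation: each slot becomes a prompt, and nested slots trigger recursion over the returned items.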
Subjects
Knowledge Bases; Semantics; Databases, Factual
ABSTRACT
MOTIVATION: Knowledge graphs (KGs) are a powerful approach for integrating heterogeneous data and making inferences in biology and many other domains, but a coherent solution for constructing, exchanging, and facilitating the downstream use of KGs is lacking. RESULTS: Here we present KG-Hub, a platform that enables standardized construction, exchange, and reuse of KGs. Features include a simple, modular extract-transform-load pattern for producing graphs compliant with Biolink Model (a high-level data model for standardizing biological data), easy integration of any OBO (Open Biological and Biomedical Ontologies) ontology, cached downloads of upstream data sources, versioned and automatically updated builds with stable URLs, web-browsable storage of KG artifacts on cloud infrastructure, and easy reuse of transformed subgraphs across projects. Current KG-Hub projects span use cases including COVID-19 research, drug repurposing, microbial-environmental interactions, and rare disease research. KG-Hub is equipped with tooling to easily analyze and manipulate KGs. KG-Hub is also tightly integrated with graph machine learning (ML) tools which allow automated graph ML, including node embeddings and training of models for link prediction and node classification. AVAILABILITY AND IMPLEMENTATION: https://kghub.org.
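The extract-transform-load pattern described above can be illustrated with a minimal transform that emits Biolink-compliant node and edge tables (KGX-like TSVs with `id`/`category` node columns and `subject`/`predicate`/`object` edge columns). The upstream rows and column details here are illustrative, not a real KG-Hub ingest.

```python
# Minimal sketch of a KG-Hub-style transform step producing Biolink-shaped TSVs.
import csv
import io

upstream = [  # pretend extract step: rows parsed from a downloaded source file
    {"gene": "HGNC:1100", "gene_name": "BRCA1", "disease": "MONDO:0007254"},
]

def transform(rows):
    """Map source rows onto Biolink categories and predicates."""
    nodes, edges = [], []
    for row in rows:
        nodes.append({"id": row["gene"], "category": "biolink:Gene",
                      "name": row["gene_name"]})
        nodes.append({"id": row["disease"], "category": "biolink:Disease",
                      "name": ""})
        edges.append({"subject": row["gene"],
                      "predicate": "biolink:gene_associated_with_condition",
                      "object": row["disease"]})
    return nodes, edges

def load(nodes, edges):
    """Serialize to TSV strings (a real pipeline writes versioned files)."""
    out = {}
    for name, rows, cols in [("nodes", nodes, ["id", "category", "name"]),
                             ("edges", edges, ["subject", "predicate", "object"])]:
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=cols, delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)
        out[name] = buf.getvalue()
    return out

tables = load(*transform(upstream))
```

Because every project emits the same node/edge table shape, transformed subgraphs can be merged across projects without per-source custom parsing, which is the reuse property the platform is built around.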
Subjects
Biological Ontologies; COVID-19; Humans; Pattern Recognition, Automated; Rare Diseases; Machine Learning
ABSTRACT
OBJECTIVES: Quality improvement strategies have been an integral part of healthcare to attain improved care delivery and effective health outcomes. The dental quality improvement initiative (DQII) presented in this manuscript represents a case study of successful implementation of a quality improvement culture within a large integrated medical-dental health system serving a largely rural population. METHODS: The key elements of DQII included establishment of a steering committee, definition of dental quality measures, and development/implementation of a dental quality analytics dashboard (DQAD) that provides relevant data on dental quality measures. Qualitative metrics were applied to assess improvement in performance for the various measures relative to quality benchmarks. RESULTS: DQII facilitated improved oversight of care continuity and provider performance surrounding quality measures at the granular and/or institutional level. Improvement in care delivery performance relative to benchmarks was observed. CONCLUSIONS: DQII further advanced the quality improvement culture prevalent in our learning healthcare environment with its focus on value-based care delivery. The DQII and the establishment of the DQAD provided the ability to track dental providers' operational care delivery performance in a clinical setting in real time.
Subjects
Delivery of Health Care; Quality Improvement; Benchmarking; Child; Female; Government Programs; Humans; Infant, Newborn; Perinatal Care; Pregnancy
ABSTRACT
PURPOSE: Non-traumatic dental condition visits (NTDCs) represent about 1.4% to 2% of all Emergency Department (ED) visits, are limited to palliative care only, and are associated with a high cost of care. A feasibility study was undertaken to explore the possibility of using remote tele-dental consults to manage NTDCs in ED and urgent care (UC) settings. METHODS: Participants with NTDCs in ED/UCs were examined extra- and intra-orally: (1) directly by the ED provider, (2) remotely by a tele-dental examiner (trained dentist) using an intra-oral camera and a high-definition pan-tilt-zoom (PTZ) camera, (3) directly by the treating dentist post ED/UC visit (if applicable), and (4) by secondary assessment by a tele-dental reviewer. Comparisons were drawn between the differential diagnoses and recommended management provided by ED/UC providers, the tele-dental examiner, the treating dentist, and the tele-dental reviewer. RESULTS: Thirteen patients participated in the study. The overall inter-rater agreement between the tele-dental examiner and tele-dental reviewer was high, while it was low between the tele-dentists and the ED providers. Preliminary testing of the tele-dental intervention in the ED/UC setting demonstrated potential feasibility in addressing NTDCs presenting in the ED/UC. Larger interventional studies in multi-site settings are needed to validate this approach and especially to evaluate impact on cost, ED/UC workflow, and patient outcomes. CLINICAL SIGNIFICANCE: Using tele-dentistry to triage non-traumatic dental visits to the emergency room may be a promising approach. Once this approach is validated through a larger study, tele-dental outreach could help direct non-traumatic dental emergency patients to the appropriate dental setting for treatment.
Subjects
Stomatognathic Diseases; Tooth Diseases; Dental Care; Emergencies; Feasibility Studies; Humans
ABSTRACT
Advanced omics technologies and facilities generate a wealth of valuable data daily; however, the data often lack the essential metadata required for researchers to find, curate, and search them effectively. The lack of metadata poses a significant challenge in the utilization of these data sets. Machine learning (ML)-based metadata extraction techniques have emerged as a potentially viable approach to automatically annotating scientific data sets with the metadata necessary for enabling effective search. Text labeling, usually performed manually, plays a crucial role in validating machine-extracted metadata. However, manual labeling is time-consuming and not always feasible; thus, there is a need to develop automated text labeling techniques in order to accelerate the process of scientific innovation. This need is particularly urgent in fields such as environmental genomics and microbiome science, which have historically received less attention in terms of metadata curation and creation of gold-standard text mining data sets. In this paper, we present two novel automated text labeling approaches for the validation of ML-generated metadata for unlabeled texts, with specific applications in environmental genomics. Our techniques show the potential of two new ways to leverage existing information that is only available for select documents within a corpus to validate ML models, which can then be used to describe the remaining documents in the corpus. The first technique exploits relationships between different types of data sources related to the same research study, such as publications and proposals. The second technique takes advantage of domain-specific controlled vocabularies or ontologies. In this paper, we detail applying these approaches in the context of environmental genomics research for ML-generated metadata validation. 
Our results show that the proposed label assignment approaches can generate both generic and highly specific text labels for the unlabeled texts, with up to 44% of the labels matching those suggested by an ML keyword extraction algorithm.
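The second technique above, using a domain controlled vocabulary to label documents and corroborate ML-extracted keywords, can be sketched simply. The vocabulary, document, and keyword list here are toy assumptions; a real application would use an ontology such as ENVO and the paper's own agreement metric.

```python
# Sketch of vocabulary-based label assignment for validating ML-extracted keywords.
VOCAB = {"soil metagenome", "nitrogen cycling", "rhizosphere"}  # toy vocabulary

def vocab_labels(text: str):
    """Assign every vocabulary term found in the document as a label."""
    lower = text.lower()
    return {term for term in VOCAB if term in lower}

def label_agreement(vocab_set, ml_keywords):
    """Fraction of ML-extracted keywords corroborated by vocabulary labels."""
    if not ml_keywords:
        return 0.0
    return len(vocab_set & set(ml_keywords)) / len(ml_keywords)

doc = "We profiled nitrogen cycling genes in the rhizosphere soil metagenome."
labels = vocab_labels(doc)
score = label_agreement(labels, ["nitrogen cycling", "drought"])  # 1 of 2 match
```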
Subjects
Data Curation; Data Mining; Machine Learning; Data Curation/methods; Data Mining/methods; Metadata
ABSTRACT
OBJECTIVE: Female reproductive disorders (FRDs) are common health conditions that may present with significant symptoms. Diet and environment are potential areas for FRD interventions. We utilized a knowledge graph (KG) method to predict factors associated with common FRDs (for example, endometriosis, ovarian cyst, and uterine fibroids). MATERIALS AND METHODS: We harmonized survey data from the Personalized Environment and Genes Study (PEGS) on internal and external environmental exposures and health conditions with biomedical ontology content. We merged the harmonized data and ontologies with supplemental nutrient and agricultural chemical data to create a KG. We analyzed the KG by embedding edges and applying a random forest for edge prediction to identify variables potentially associated with FRDs. We also conducted logistic regression analysis for comparison. RESULTS: Across 9765 PEGS respondents, the KG analysis resulted in 8535 significant or suggestive predicted links between FRDs and chemicals, phenotypes, and diseases. Amongst these links, 32 were exact matches when compared with the logistic regression results, including comorbidities, medications, foods, and occupational exposures. DISCUSSION: Mechanistic underpinnings of predicted links documented in the literature may support some of our findings. Our KG methods are useful for predicting possible associations in large, survey-based datasets with added information on directionality and magnitude of effect from logistic regression. These results should not be construed as causal but can support hypothesis generation. CONCLUSION: This investigation enabled the generation of hypotheses on a variety of potential links between FRDs and exposures. Future investigations should prospectively evaluate the variables hypothesized to impact FRDs.
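The edge-prediction step described in the methods, embedding KG edges and scoring candidate links with a random forest, can be sketched as follows. Node names, embedding values, and edge lists are toy stand-ins: a real pipeline would learn embeddings from the PEGS knowledge graph (e.g. with node2vec) rather than sampling them randomly.

```python
# Sketch of KG link prediction: Hadamard edge features + random forest scoring.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
nodes = ["FRD:endometriosis", "CHEM:a", "CHEM:b", "FOOD:x", "FOOD:y", "DRUG:z"]
emb = {n: rng.normal(size=8) for n in nodes}  # stand-in node embeddings

def edge_features(u, v):
    return emb[u] * emb[v]          # Hadamard edge operator

# positives = edges observed in the KG; negatives = sampled non-edges
positives = [("FRD:endometriosis", "CHEM:a"), ("FRD:endometriosis", "DRUG:z")]
negatives = [("CHEM:b", "FOOD:x"), ("FOOD:y", "CHEM:a")]

X = np.array([edge_features(u, v) for u, v in positives + negatives])
y = np.array([1] * len(positives) + [0] * len(negatives))

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# score an unseen candidate link for hypothesis generation
candidate = edge_features("FRD:endometriosis", "FOOD:x")
prob = clf.predict_proba(candidate.reshape(1, -1))[0, 1]
```

As the abstract stresses, a high score here is a hypothesis-generation signal, not a causal claim; the study pairs such scores with logistic regression to add directionality and effect size.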
Subjects
Environmental Exposure; Humans; Female; Environmental Exposure/adverse effects; Genital Diseases, Female; Logistic Models; Nutritional Status; Diet; Adult; Random Forest Algorithm
ABSTRACT
Introduction: Climate change is already affecting ecosystems around the world and forcing us to adapt to meet societal needs. The speed with which climate change is progressing necessitates a massive scaling up of the number of species with understood genotype-environment-phenotype (G×E×P) dynamics in order to increase ecosystem and agriculture resilience. An important part of predicting phenotype is understanding the complex gene regulatory networks present in organisms. Previous work has demonstrated that knowledge about one species can be applied to another using ontologically supported knowledge bases that exploit homologous structures and homologous genes. Through in silico experimentation, such structures have the potential to enable the massive scaling up that is needed. Methods: We developed one such structure, a knowledge graph (KG) using information from Planteome and the EMBL-EBI Expression Atlas that connects gene expression, molecular interactions, functions, and pathways to homology-based gene annotations. Our preliminary analysis uses data from gene expression studies in Arabidopsis thaliana and Populus trichocarpa plants exposed to drought conditions. Results: A graph query identified 16 pairs of homologous genes in these two taxa, some of which show opposite patterns of gene expression in response to drought. As expected, analysis of the upstream cis-regulatory region of these genes revealed that homologs with similar expression behavior had conserved cis-regulatory regions and potential interaction with similar trans-elements, unlike homologs that changed their expression in opposite ways. Discussion: This suggests that even though the homologous pairs share common ancestry and functional roles, predicting expression and phenotype through homology inference needs careful consideration of integrating cis- and trans-regulatory components in the curated and inferred knowledge graph.
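The kind of graph query described in the results, finding homologous gene pairs with opposite drought responses, reduces to a filter over homology edges joined with expression values. Gene identifiers and fold-change values below are toy data, not from Planteome or the Expression Atlas.

```python
# Sketch of a KG query for homolog pairs with discordant drought responses.
edges = [  # (subject, predicate, object) triples from the knowledge graph
    ("AT3G01", "homologous_to", "POTRI01"),
    ("AT3G02", "homologous_to", "POTRI02"),
]
# log2 fold-change under drought, keyed by gene (toy values)
expression = {"AT3G01": 2.1, "POTRI01": -1.4, "AT3G02": 1.8, "POTRI02": 2.0}

def opposite_response_homologs(edges, expression):
    """Return homolog pairs whose expression changes have opposite signs."""
    pairs = []
    for s, p, o in edges:
        if p == "homologous_to" and expression[s] * expression[o] < 0:
            pairs.append((s, o))
    return pairs

discordant = opposite_response_homologs(edges, expression)
```

In a production KG this would be a declarative query (e.g. SPARQL or Cypher), but the join-then-filter logic is the same.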
ABSTRACT
BACKGROUND: The evidence base supports the effectiveness of dental sealants for prevention of childhood caries in school-aged children. OBJECTIVE: This study describes the planning, development, usability testing, and outcomes following implementation of DentaSeal, a web-based application designed to accurately track unique student data and generate reports for all Wisconsin school-based sealant placement (SP) programs. METHODS: Application software development was informed by a steering committee of representative stakeholders who were interviewed to inform the design of DentaSeal and to provide feedback during development and evaluation. Software development proceeded from wireframes used to establish the architectural design. Usability testing followed and informed any required adjustments to the application. The DentaSeal prototype was beta tested and subsequently fully implemented in the public health sector. RESULTS: The DentaSeal application demonstrated the capacity to: 1) track unique student SP data and longitudinal encounter history, 2) generate reports, and 3) support administrative tracking. In 2019, DentaSeal captured SP data from 47 school-based programs in Wisconsin that sponsored > 7,000 program visits for 184,000 children from 62 counties. Delivery of > 548,000 SP services was catalogued. CONCLUSIONS: For public health initiatives targeting reduction in caries incidence, web-based applications such as DentaSeal represent useful longitudinal tracking tools for cataloguing SP in school-based program participants.
Subjects
Dental Caries; Pit and Fissure Sealants; Child; Humans; Dental Caries/epidemiology; Dental Caries/prevention & control; Pit and Fissure Sealants/therapeutic use; Research Report; Schools; School Health Services
ABSTRACT
The Global Alliance for Genomics and Health (GA4GH) is a standards-setting organization that is developing a suite of coordinated standards for genomics. The GA4GH Phenopacket Schema is a standard for sharing disease and phenotype information that characterizes an individual person or biosample. The Phenopacket Schema is flexible and can represent clinical data for any kind of human disease including rare disease, complex disease, and cancer. It also allows consortia or databases to apply additional constraints to ensure uniform data collection for specific goals. We present phenopacket-tools, an open-source Java library and command-line application for construction, conversion, and validation of phenopackets. Phenopacket-tools simplifies construction of phenopackets by providing concise builders, programmatic shortcuts, and predefined building blocks (ontology classes) for concepts such as anatomical organs, age of onset, biospecimen type, and clinical modifiers. Phenopacket-tools can be used to validate the syntax and semantics of phenopackets as well as to assess adherence to additional user-defined requirements. The documentation includes examples showing how to use the Java library and the command-line tool to create and validate phenopackets. We demonstrate how to create, convert, and validate phenopackets using the library or the command-line application. Source code, API documentation, comprehensive user guide and a tutorial can be found at https://github.com/phenopackets/phenopacket-tools. The library can be installed from the public Maven Central artifact repository and the application is available as a standalone archive. The phenopacket-tools library helps developers implement and standardize the collection and exchange of phenotypic and other clinical data for use in phenotype-driven genomic diagnostics, translational research, and precision medicine applications.
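Phenopacket-tools itself is a Java library, but the builder-plus-validation idea it describes can be mirrored in a short Python sketch over a stripped-down subset of the GA4GH Phenopacket Schema (field names follow the schema's JSON rendering; the builder class and validator are this sketch's own, not the library's API).

```python
# Sketch of builder-style phenopacket construction and user-defined validation.
def ontology_class(term_id: str, label: str) -> dict:
    return {"id": term_id, "label": label}

class PhenopacketBuilder:
    def __init__(self, packet_id: str, subject_id: str):
        self.packet = {"id": packet_id, "subject": {"id": subject_id},
                       "phenotypicFeatures": []}

    def add_feature(self, term: dict, excluded: bool = False):
        feature = {"type": term}
        if excluded:                  # an explicitly observed-absent phenotype
            feature["excluded"] = True
        self.packet["phenotypicFeatures"].append(feature)
        return self                   # fluent chaining, as in the Java builders

    def build(self) -> dict:
        return self.packet

def validate(packet: dict) -> list:
    """Return a list of error strings; an empty list means the packet passed."""
    errors = []
    if not packet.get("id"):
        errors.append("missing packet id")
    for f in packet.get("phenotypicFeatures", []):
        t = f.get("type", {})
        if not t.get("id", "").startswith("HP:"):
            errors.append(f"feature not grounded to HPO: {t}")
    return errors

packet = (PhenopacketBuilder("packet-1", "patient-1")
          .add_feature(ontology_class("HP:0001250", "Seizure"))
          .build())
```

The HPO-prefix check stands in for the "additional user-defined requirements" the abstract mentions: a consortium can layer arbitrary constraints on top of the base schema's syntactic validity.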
Subjects
Neoplasms; Software; Humans; Genomics; Databases, Factual; Gene Library
ABSTRACT
Background: The objective of this study was to apply supervised machine learning (ML) to medical and oral disease data to build models identifying the key risk variables contributing to pneumonia emergence, for pneumonia overall and for pneumonia subtypes. Methods: Retrospective medical and dental data were retrieved from Marshfield Clinic Health System's data warehouse and integrated electronic medical-dental health records (iEHR). Retrieved data were pre-processed prior to conducting analyses and included matching of cases to controls by (a) race/ethnicity and (b) a 1:1 case-to-control ratio. Variables with >30% missing data were excluded from analysis. Datasets were divided into four subsets: (1) All Pneumonia (all cases and controls); (2) community-acquired (CAP)/healthcare-associated (HCAP) pneumonias; (3) ventilator-associated (VAP)/hospital-acquired (HAP) pneumonias; and (4) aspiration pneumonia (AP). Performance of five algorithms was compared across the four subsets: Naïve Bayes, Logistic Regression, Support Vector Machine (SVM), Multi-Layer Perceptron (MLP), and Random Forests. Feature (input variable) selection and ten-fold cross-validation were performed on all the datasets. An evaluation set (10%) was extracted from the subsets for further validation. Model performance was evaluated in terms of total accuracy, sensitivity, specificity, F-measure, Matthews correlation coefficient, and area under the receiver operating characteristic curve (AUC). Results: In total, 6,034 records (cases and controls) met eligibility for inclusion in the main dataset. After feature selection, the variables retained in the subsets were: All Pneumonia (n = 29 variables), CAP-HCAP (n = 26 variables), VAP-HAP (n = 40 variables), and AP (n = 37 variables), respectively. Twenty-two retained variables were common across all four pneumonia subsets.
Of these, the number of missing teeth, periodontal status, periodontal pocket depths greater than 5 mm, and the number of restored teeth contributed to all the subsets and were retained in the models. MLP outperformed the other predictive models for the All Pneumonia, CAP-HCAP, and AP subsets, while SVM outperformed the other models in the VAP-HAP subset. Conclusion: This study validates previously described associations between poor oral health and pneumonia. The benefits of an integrated medical-dental record and care delivery environment for modeling pneumonia risk are highlighted. Based on these findings, risk score development could inform referrals and follow-up in an integrated healthcare delivery environment with coordinated patient management.
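The five-classifier, ten-fold cross-validation protocol above is a standard scikit-learn recipe. Synthetic data stands in for the non-public medical-dental records, and hyperparameters are defaults rather than the study's tuned settings.

```python
# Sketch of the model-comparison protocol: five classifiers, 10-fold CV.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# stand-in for a matched case-control pneumonia subset
X, y = make_classification(n_samples=300, n_features=20, n_informative=8,
                           random_state=0)

models = {
    "NaiveBayes": GaussianNB(),
    "LogReg": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "MLP": MLPClassifier(max_iter=1000, random_state=0),
    "RF": RandomForestClassifier(random_state=0),
}

# mean accuracy over 10 stratified folds, per model
scores = {name: cross_val_score(clf, X, y, cv=10).mean()
          for name, clf in models.items()}
best = max(scores, key=scores.get)
```

In the study, this comparison was repeated per pneumonia subset, with a 10% evaluation set held out entirely from the cross-validated training data.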
RESUMO
Oral cavity cancer (OCC) is associated with high morbidity and mortality rates when diagnosed at late stages. Early detection of increased risk provides an opportunity for implementing prevention strategies surrounding modifiable risk factors and screening to promote early detection and intervention. Historical evidence identified a gap in the training of primary care providers (PCPs) surrounding examination of the oral cavity. The absence of clinically applicable analytical tools to identify patients with high-risk OCC phenotypes at point-of-care (POC) causes missed opportunities for implementing patient-specific interventional strategies. This study developed an OCC risk assessment tool prototype by applying machine learning (ML) approaches to a rich retrospectively collected data set abstracted from a clinical enterprise data warehouse. We compared the performance of six ML classifiers using a 10-fold cross-validation approach. The accuracy, recall, precision, specificity, area under the receiver operating characteristic curve, and area under the precision-recall curve for the derived voting algorithm were 78%, 64%, 88%, 92%, 0.83, and 0.81, respectively. The performance of two classifiers, multilayer perceptron and AdaBoost, closely mirrored that of the voting algorithm. Integrating the OCC risk assessment tool developed through this clinical informatics application into an electronic health record as a clinical decision support tool could assist PCPs in targeting at-risk patients for personalized interventional care.
ABSTRACT
INTRODUCTION: Pneumonia is caused by microbes that establish an infectious process in the lungs. The gold standard for pneumonia diagnosis is radiologist-documented pneumonia-related features in radiology notes, which are captured in electronic health records in an unstructured format. OBJECTIVE: The study objective was to develop a methodological approach for assessing the validity of a pneumonia diagnosis by identifying the presence or absence of key radiographic features in radiology reports, with subsequent rendering of diagnostic decisions into a structured format. METHODS: A pneumonia-specific natural language processing (NLP) pipeline was developed by applying the Clinical Text Analysis and Knowledge Extraction System (cTAKES) to validate pneumonia diagnoses, following development of a pneumonia feature-specific lexicon. Radiographic reports of study-eligible subjects identified by International Classification of Diseases (ICD) codes were parsed through the NLP pipeline. Classification rules were developed to assign each pneumonia episode into one of three categories, "positive," "negative," or "not classified: requires manual review," based on tagged concepts that support or refute the diagnostic codes. RESULTS: A total of 91,998 pneumonia episodes diagnosed in 65,904 patients were retrieved retrospectively. Approximately 89% (81,707/91,998) of the episodes were documented by 225,893 chest X-ray reports. NLP classified 33% (26,800/81,707) of the pneumonia episodes as "Pneumonia-positive," 19% (15,401/81,707) as "Pneumonia-negative," and 48% (39,209/81,707) as "episode classification pending further manual review." NLP pipeline performance metrics included accuracy (76.3%), sensitivity (88%), and specificity (75%). CONCLUSION: The pneumonia-specific NLP pipeline exhibited good performance, comparable to other pneumonia-specific NLP systems developed to date.
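The three-way classification rules described above reduce to a roll-up over the NLP pipeline's tagged concepts and their negation status. The feature lexicon and rules below are illustrative, not the study's actual lexicon.

```python
# Sketch of post-NLP classification rules for pneumonia episodes.
PNEUMONIA_FEATURES = {"infiltrate", "consolidation", "opacity"}  # toy lexicon

def classify_episode(tagged_concepts):
    """tagged_concepts: list of (concept, negated) pairs from the NLP pipeline."""
    affirmed = {c for c, neg in tagged_concepts
                if c in PNEUMONIA_FEATURES and not neg}
    negated = {c for c, neg in tagged_concepts
               if c in PNEUMONIA_FEATURES and neg}
    if affirmed:                      # any affirmed radiographic feature wins
        return "positive"
    if negated:                       # only explicitly refuted features found
        return "negative"
    return "not classified: requires manual review"

label = classify_episode([("opacity", True), ("consolidation", False)])
```

The third bucket is what produced the study's 48% manual-review queue: episodes whose reports mention no lexicon feature either way cannot be resolved by rules alone.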
Subjects
Pneumonia; Radiology; Electronic Health Records; Humans; Natural Language Processing; Pneumonia/diagnostic imaging; Retrospective Studies
ABSTRACT
BACKGROUND: International Classification of Disease (ICD) coding for pneumonia classification is based on the causal organism or on general pneumonia codes, creating challenges for epidemiological evaluations, where pneumonia is standardly subtyped by setting, exposures, and time of emergence. Pneumonia subtype classification requires data available in electronic health records (EHRs), frequently in unstructured formats, including radiological interpretations or clinical notes, that complicate electronic classification. OBJECTIVE: The current study undertook development of a rule-based pneumonia subtyping algorithm for stratifying pneumonia by the setting in which it emerged, using information documented in the EHR. METHODS: Pneumonia subtype classification was developed by interrogating patient information within the EHR of a large private health system. ICD coding in the EHR was mined, applying a requirement for a "rule of two" pneumonia-related codes, or one ICD code plus radiologically confirmed pneumonia validated by natural language processing and/or documented antibiotic prescriptions. A rule-based algorithm flow chart was created to support subclassification based on features including the symptomatic patient's point of entry into the health care system, the timing of pneumonia emergence, and the clinical, laboratory, or medication orders that informed the definition of the pneumonia subclassification algorithm. RESULTS: Data from 65,904 study-eligible patients with 91,998 episodes of pneumonia diagnoses documented by 380,509 encounters were analyzed; 8,611 episodes were excluded following natural language processing classification of pneumonia status as "negative" or "unknown." Subtyping of the remaining 83,387 episodes identified community-acquired (54.5%), hospital-acquired (20%), aspiration-related (10.7%), health care-acquired (5%), and ventilator-associated (0.4%) cases; 9.4% of cases were not classifiable by the algorithm.
CONCLUSION: The study outcomes indicated the capacity to achieve electronic pneumonia subtype classification based on interrogation of big data available in the EHR. The portability of this rule-based pneumonia classification algorithm to other health systems remains to be explored.
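A setting-based rule chain of the kind the flow chart describes can be sketched as an ordered series of checks. The specific predicates, thresholds, and rule order below are simplified illustrations (loosely following common clinical definitions), not the study's actual algorithm.

```python
# Simplified sketch of setting-based pneumonia subtyping rules.
def subtype(episode: dict) -> str:
    """Assign one subtype per episode; rule order encodes precedence."""
    if episode.get("on_ventilator"):
        return "ventilator-associated (VAP)"
    if episode.get("hours_since_admission", 0) >= 48:
        return "hospital-acquired (HAP)"
    if episode.get("aspiration_documented"):
        return "aspiration-related (AP)"
    if episode.get("recent_healthcare_contact"):
        return "healthcare-acquired (HCAP)"
    if episode.get("point_of_entry") in {"clinic", "emergency department"}:
        return "community-acquired (CAP)"
    return "not classifiable"

label = subtype({"point_of_entry": "emergency department",
                 "hours_since_admission": 6})
```

The final fall-through branch mirrors the study's 9.4% of episodes that no rule could place, which is why the flow chart design matters as much as the individual rules.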
Subjects
Electronic Health Records; Pneumonia; Algorithms; Humans; International Classification of Diseases; Natural Language Processing; Pneumonia/diagnosis; Pneumonia/epidemiology
ABSTRACT
The standardized identification of biomedical entities is a cornerstone of interoperability, reuse, and data integration in the life sciences. Several registries have been developed to catalog resources maintaining identifiers for biomedical entities such as small molecules, proteins, cell lines, and clinical trials. However, existing registries have struggled to provide sufficient coverage and metadata standards that meet the evolving needs of modern life sciences researchers. Here, we introduce the Bioregistry, an integrative, open, community-driven metaregistry that synthesizes and substantially expands upon 23 existing registries. The Bioregistry addresses the need for a sustainable registry by leveraging public infrastructure and automation, and employing a progressive governance model centered around open code and open data to foster community contribution. The Bioregistry can be used to support the standardized annotation of data, models, ontologies, and scientific literature, thereby promoting their interoperability and reuse. The Bioregistry can be accessed through https://bioregistry.io and its source code and data are available under the MIT and CC0 Licenses at https://github.com/biopragmatics/bioregistry.
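What a metaregistry enables in practice is CURIE normalization: prefix records with synonyms and URI patterns let heterogeneously written identifiers resolve to one canonical form. The two-record registry below is a toy, stdlib-only illustration of that idea, not the Bioregistry's own data model or Python API.

```python
# Toy metaregistry: normalize prefixes and expand CURIEs to IRIs.
REGISTRY = {
    "chebi": {"synonyms": {"CHEBI", "ChEBI"},
              "uri_format": "http://purl.obolibrary.org/obo/CHEBI_$1"},
    "hgnc": {"synonyms": {"HGNC"},
             "uri_format": "https://bioregistry.io/hgnc:$1"},  # illustrative
}

def normalize_prefix(prefix: str):
    """Map any known spelling of a prefix to its canonical lowercase form."""
    for norm, record in REGISTRY.items():
        if prefix == norm or prefix in record["synonyms"]:
            return norm
    return None

def curie_to_iri(curie: str):
    """Expand a CURIE like 'ChEBI:15377' into a resolvable IRI, if registered."""
    prefix, identifier = curie.split(":", 1)
    norm = normalize_prefix(prefix)
    if norm is None:
        return None
    return REGISTRY[norm]["uri_format"].replace("$1", identifier)

iri = curie_to_iri("ChEBI:15377")
```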
ABSTRACT
Despite progress in the development of standards for describing and exchanging scientific information, the lack of easy-to-use standards for mapping between different representations of the same or similar objects in different databases poses a major impediment to data integration and interoperability. Mappings often lack the metadata needed to be correctly interpreted and applied. For example, are two terms equivalent or merely related? Are they narrow or broad matches? Or are they associated in some other way? Such relationships between the mapped terms are often not documented, which leads to incorrect assumptions and makes them hard to use in scenarios that require a high degree of precision (such as diagnostics or risk prediction). Furthermore, the lack of descriptions of how mappings were done makes it hard to combine and reconcile mappings, particularly curated and automated ones. We have developed the Simple Standard for Sharing Ontological Mappings (SSSOM) which addresses these problems by: (i) Introducing a machine-readable and extensible vocabulary to describe metadata that makes imprecision, inaccuracy and incompleteness in mappings explicit. (ii) Defining an easy-to-use simple table-based format that can be integrated into existing data science pipelines without the need to parse or query ontologies, and that integrates seamlessly with Linked Data principles. (iii) Implementing open and community-driven collaborative workflows that are designed to evolve the standard continuously to address changing requirements and mapping practices. (iv) Providing reference tools and software libraries for working with the standard. In this paper, we present the SSSOM standard, describe several use cases in detail and survey some of the existing work on standardizing the exchange of mappings, with the goal of making mappings Findable, Accessible, Interoperable and Reusable (FAIR). The SSSOM specification can be found at http://w3id.org/sssom/spec. 
Database URL: http://w3id.org/sssom/spec.
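The table-based format is the point: mappings become consumable with ordinary data tooling rather than ontology parsers. Below, a tiny SSSOM-style TSV is parsed and filtered to exact matches, the kind of high-precision subset the abstract flags for diagnostics. The columns shown (subject_id, predicate_id, object_id, mapping_justification) are core SSSOM slots, though a real file carries more metadata; the specific term pairs are example data.

```python
# Sketch: read an SSSOM-style TSV and keep only exact matches.
import csv
import io

SSSOM_TSV = """\
subject_id\tpredicate_id\tobject_id\tmapping_justification
HP:0001250\tskos:exactMatch\tMP:0002064\tsemapv:ManualMappingCuration
HP:0002315\tskos:broadMatch\tMP:0001863\tsemapv:LexicalMatching
"""

def read_mappings(text: str):
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))

def exact_matches(mappings):
    """Keep only mappings safe for high-precision use cases."""
    return [m for m in mappings if m["predicate_id"] == "skos:exactMatch"]

mappings = read_mappings(SSSOM_TSV)
precise = exact_matches(mappings)
```

Because the predicate and justification travel with every row, a consumer can make this precision/recall trade-off explicitly instead of assuming all mappings are equivalences.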
Subjects
Metadata; Semantic Web; Data Management; Databases, Factual; Workflow
ABSTRACT
In agricultural settings, microbes and antimicrobial resistance genes (ARGs) have the potential to be transferred across diverse environments and ecosystems. The consequences of these microbial transfers are unclear and understudied. On dairy farms, the storage of cow manure in manure pits and subsequent application to field soil as a fertilizer may facilitate the spread of the mammalian gut microbiome and its associated ARGs to the environment. To determine the extent of both taxonomic and resistance similarity during these transitions, we collected fresh manure, manure from pits, and field soil across 15 different dairy farms for three consecutive seasons. We used a combination of shotgun metagenomic sequencing and functional metagenomics to quantitatively interrogate taxonomic and ARG compositional variation on farms. We found that as the microbiome transitions from fresh dairy cow manure to manure pits, microbial taxonomic compositions and resistance profiles experience distinct restructuring, including decreases in alpha diversity and shifts in specific ARG abundances that potentially correspond to fresh manure going from a gut-structured community to an environment-structured community. Further, we did not find evidence of a shared microbial community or a transfer of ARGs between the manure and field soil microbiomes. Our results suggest that fresh manure experiences a compositional change in manure pits during storage and that the storage of manure in manure pits does not result in a depletion of ARGs. We did not find evidence of taxonomic or ARG restructuring of soil microbiota with the application of manure to field soils, as soil communities remained resilient to manure-induced perturbation.
IMPORTANCE: The addition of dairy cow manure, stored in manure pits, to field soil has the potential to introduce not only organic nutrients but also mammalian microbial communities and antimicrobial resistance genes (ARGs) to soil communities.
Using shotgun sequencing paired with functional metagenomics, we showed that microbial community composition changed between fresh manure and manure pit samples with a decrease in gut-associated pathobionts, while ARG abundance and diversity remained high. However, field soil communities were distinct from those in manure in both microbial taxonomic and ARG composition. These results broaden our understanding of the transfer of microbial communities in agricultural settings and suggest that field soil microbial communities are resilient against the deposition of ARGs or microbial communities from manure.
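The decrease in alpha diversity described above is typically quantified with an index such as Shannon's H. A minimal, illustrative sketch (not the authors' pipeline) computing it from a vector of taxon counts:

```python
import math

def shannon_diversity(counts):
    """Shannon diversity index H = -sum(p_i * ln p_i) over taxon proportions."""
    total = sum(counts)
    if total == 0:
        return 0.0
    proportions = (c / total for c in counts if c > 0)
    return -sum(p * math.log(p) for p in proportions)

# An even community is more diverse than a skewed one with the same richness,
# mirroring the gut-to-environment restructuring described in the abstract.
even = shannon_diversity([25, 25, 25, 25])   # equals ln(4), about 1.386
skewed = shannon_diversity([97, 1, 1, 1])
```

A drop in this index between fresh-manure and manure-pit samples would indicate the kind of community restructuring the study reports.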
Subjects
Anti-Bacterial Agents/pharmacology, Manure/microbiology, Metagenomics, Microbiota/drug effects, Microbiota/genetics, Soil Microbiology, Agriculture, Animals, Cattle, Dairying, Drug Resistance, Microbial/genetics, Farms, Female, Genes, Bacterial, Metagenome, Seasons
ABSTRACT
The objective was to develop a predictive model using medical-dental data from an integrated electronic health record (iEHR) to identify individuals with undiagnosed diabetes mellitus (DM) in dental settings. Retrospective data retrieved from the Marshfield Clinic Health System data warehouse were pre-processed prior to analysis. A subset was extracted from the pre-processed dataset for external evaluation (Nvalidation) of the derived predictive models. Further, subsets with 30%-70%, 40%-60%, and 50%-50% case-to-control ratios were created for training/testing. Feature selection was performed on all datasets. Four machine learning (ML) classifiers were evaluated: logistic regression (LR), multilayer perceptron (MLP), support vector machines (SVM), and random forests (RF). Model performance was evaluated on Nvalidation. We retrieved a total of 5319 cases and 36,224 controls. Of the initial 116 medical and dental features, 107 remained after feature selection. RF applied to the 50%-50% case-control ratio outperformed the other predictive models on Nvalidation, achieving an overall accuracy of 94.14%, sensitivity of 0.941, specificity of 0.943, F-measure of 0.941, Matthews correlation coefficient of 0.885, and area under the receiver operating characteristic curve of 0.972. Future directions include incorporating this predictive model into the iEHR as a clinical decision support tool to screen and detect patients at risk for DM, triggering follow-ups and referrals for integrated care delivery between dentists and physicians.
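The performance figures reported above are standard confusion-matrix metrics. A self-contained sketch (illustrative only, not the study's code) showing how each is derived from true/false positive and negative counts:

```python
import math

def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    n = tp + fp + tn + fn
    sens = tp / (tp + fn)   # sensitivity (recall)
    spec = tn / (tn + fp)   # specificity
    prec = tp / (tp + fp)   # precision
    return {
        "accuracy": (tp + tn) / n,
        "sensitivity": sens,
        "specificity": spec,
        "f_measure": 2 * prec * sens / (prec + sens),
        "mcc": (tp * tn - fp * fn)
               / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }

# Hypothetical counts for illustration; the study's own counts are not given.
m = classification_metrics(tp=45, fp=10, tn=40, fn=5)
# m["accuracy"] == 0.85, m["sensitivity"] == 0.9, m["specificity"] == 0.8
```

Note that the Matthews correlation coefficient, unlike accuracy, remains informative under class imbalance, which is why balancing the case-to-control ratio matters for the other metrics.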
ABSTRACT
BACKGROUND: This cross-sectional retrospective study used Natural Language Processing (NLP) to extract tobacco-use-associated variables from clinical notes documented in the Electronic Health Record (EHR). OBJECTIVE: To develop a rule-based algorithm for determining a patient's present tobacco-use status. METHODS: Clinical notes (n = 5,371 documents) from 363 patients were mined and classified by NLP software into four classes: "Current Smoker", "Past Smoker", "Nonsmoker", and "Unknown". Two coders manually classified these documents into the above classes (document-level gold standard classification, DLGSC). A tobacco-use status was derived per patient (patient-level gold standard classification, PLGSC) from the individual documents' statuses by the same two coders. The DLGSC and PLGSC were compared to the results derived from the NLP and rule-based algorithm, respectively. RESULTS: The initial Cohen's kappa (n = 1,000 documents) was 0.9448 (95% CI = 0.9281-0.9615), indicating strong agreement between the two raters. Subsequently, for 371 documents the Cohen's kappa was 0.9889 (95% CI = 0.979-1.000). The F-measures for document-level classification of the four classes were 0.700, 0.753, 0.839, and 0.988, while the patient-level F-measures were 0.580, 0.771, 0.730, and 0.933, respectively. CONCLUSIONS: NLP and the rule-based algorithm exhibited utility for deriving patients' present tobacco-use status. Current strategies target further improvement in precision to enhance the translational value of the tool.
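The inter-rater agreement reported above is Cohen's kappa, which corrects raw agreement for agreement expected by chance. A minimal sketch (illustrative only, with made-up labels) of its computation for two raters' document classifications:

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    assert len(rater1) == len(rater2) and rater1
    n = len(rater1)
    # Observed proportion of documents on which the raters agree.
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Agreement expected by chance, from each rater's marginal label frequencies.
    c1, c2 = Counter(rater1), Counter(rater2)
    expected = sum((c1[label] / n) * (c2[label] / n)
                   for label in c1.keys() | c2.keys())
    return (observed - expected) / (1 - expected)

labels1 = ["Current Smoker", "Current Smoker", "Nonsmoker", "Nonsmoker"]
labels2 = ["Current Smoker", "Current Smoker", "Nonsmoker", "Current Smoker"]
kappa = cohens_kappa(labels1, labels2)  # 0.75 observed, 0.5 expected -> 0.5
```

Values above roughly 0.9, as in the study, are conventionally read as near-perfect agreement.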
Subjects
Data Mining/methods, Electronic Health Records/statistics & numerical data, Natural Language Processing, Tobacco Use/epidemiology, Algorithms, Cross-Sectional Studies, Humans, Retrospective Studies
ABSTRACT
Gold nanorods are widely known for their photothermal properties in the treatment of solid tumors. Our work demonstrates the previously unrealized capacity to image these reagents in liquid at high resolution using Transmission Electron Microscopy (TEM). Here we report the first atomic-scale measurements of functionalized nanorods in solution while visualizing their dynamic behaviour with TEM.