1.
Alzheimers Dement ; 20(2): 975-985, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37830443

ABSTRACT

INTRODUCTION: Little is known about the heterogeneous treatment effects of metformin on dementia risk in people with type 2 diabetes (T2D). METHODS: Participants (≥ 50 years) with T2D and normal cognition at baseline were identified from the National Alzheimer's Coordinating Center database (2005-2021). We applied a doubly robust learning approach to estimate risk differences (RD) with a 95% confidence interval (CI) for dementia risk between metformin use and no use in the overall population and subgroups identified through a decision tree model. RESULTS: Among 1393 participants, 104 developed dementia over a 4-year median follow-up. Metformin was significantly associated with a lower risk of dementia in the overall population (RD, -3.2%; 95% CI, -6.2% to -0.2%). We identified four subgroups with varied risks for dementia, defined by neuropsychiatric disorders, non-steroidal anti-inflammatory drugs, and antidepressant use. DISCUSSION: Metformin use was significantly associated with a lower risk of dementia in individuals with T2D, with significant variability among subgroups.


Subject(s)
Dementia , Diabetes Mellitus, Type 2 , Metformin , Humans , Diabetes Mellitus, Type 2/complications , Diabetes Mellitus, Type 2/drug therapy , Diabetes Mellitus, Type 2/epidemiology , Metformin/therapeutic use , Hypoglycemic Agents/therapeutic use , Treatment Effect Heterogeneity , Dementia/drug therapy , Dementia/epidemiology , Dementia/etiology
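
The abstract above reports its effect as a risk difference (RD) with a 95% confidence interval. As a rough illustration of what that quantity is, the sketch below computes an unadjusted RD with a Wald interval from two-group counts. It is not the doubly robust learning approach the authors used, which additionally adjusts for confounders; the function name and the counts are hypothetical and are not taken from the study.

```python
import math

def risk_difference(events_treated, n_treated, events_control, n_control, z=1.96):
    """Unadjusted risk difference (treated minus control) with a Wald interval.

    Assumption: simple two-group comparison with no confounder adjustment.
    """
    p1 = events_treated / n_treated   # risk in the treated group
    p0 = events_control / n_control   # risk in the control group
    rd = p1 - p0
    # Standard error of the difference of two independent proportions.
    se = math.sqrt(p1 * (1 - p1) / n_treated + p0 * (1 - p0) / n_control)
    return rd, rd - z * se, rd + z * se

# Hypothetical counts, chosen only for illustration.
rd, lo, hi = risk_difference(20, 400, 40, 400)
print(f"RD = {rd:.1%} (95% CI, {lo:.1%} to {hi:.1%})")
```

A negative RD whose interval excludes zero corresponds to the kind of "significantly associated with a lower risk" statement made in the abstract.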
2.
NPJ Digit Med ; 6(1): 210, 2023 Nov 16.
Article in English | MEDLINE | ID: mdl-37973919

ABSTRACT

There is enormous enthusiasm for, and concern about, applying large language models (LLMs) to healthcare. Yet current assumptions are based on general-purpose LLMs such as ChatGPT, which are not developed for medical use. This study develops a generative clinical LLM, GatorTronGPT, using 277 billion words of text including (1) 82 billion words of clinical text from 126 clinical departments and approximately 2 million patients at the University of Florida Health and (2) 195 billion words of diverse general English text. We train GatorTronGPT using a GPT-3 architecture with up to 20 billion parameters and evaluate its utility for biomedical natural language processing (NLP) and healthcare text generation. GatorTronGPT improves biomedical natural language processing. We apply GatorTronGPT to generate 20 billion words of synthetic text. NLP models trained using synthetic text generated by GatorTronGPT outperform models trained using real-world clinical text. A physicians' Turing test using a scale of 1 (worst) to 9 (best) shows that there are no significant differences in linguistic readability (p = 0.22; 6.57 for GatorTronGPT compared with 6.93 for human text) and clinical relevance (p = 0.91; 7.0 for GatorTronGPT compared with 6.97 for human text) and that physicians cannot differentiate them (p < 0.001). This study provides insights into the opportunities and challenges of LLMs for medical research and healthcare.

3.
JMIR Res Protoc ; 12: e48521, 2023 Nov 09.
Article in English | MEDLINE | ID: mdl-37943599

ABSTRACT

BACKGROUND: Hospital-induced delirium is one of the most common and costly iatrogenic conditions, and its incidence is predicted to increase as the population of the United States ages. An academic and clinical interdisciplinary systems approach is needed to reduce the frequency and impact of hospital-induced delirium. OBJECTIVE: The long-term goal of our research is to enhance the safety of hospitalized older adults by reducing iatrogenic conditions through an effective learning health system. In this study, we will develop models for predicting hospital-induced delirium. In order to accomplish this objective, we will create a computable phenotype for our outcome (hospital-induced delirium), design an expert-based traditional logistic regression model, leverage machine learning techniques to generate a model using structured data, and use machine learning and natural language processing to produce an integrated model with components from both structured data and text data. METHODS: This study will explore text-based data, such as nursing notes, to improve the predictive capability of prognostic models for hospital-induced delirium. By using supervised and unsupervised text mining in addition to structured data, we will examine multiple types of information in electronic health record data to predict medical-surgical patient risk of developing delirium. Development and validation will be compliant with the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement. RESULTS: Work on this project will take place through March 2024. For this study, we will use data from approximately 332,230 encounters that occurred between January 2012 and May 2021. Findings from this project will be disseminated at scientific conferences and in peer-reviewed journals.
CONCLUSIONS: Success in this study will yield a durable, high-performing research-data infrastructure that will process, extract, and analyze clinical text data in near real time. This model has the potential to be integrated into the electronic health record and provide point-of-care decision support to prevent harm and improve quality of care. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): DERR1-10.2196/48521.

4.
JMIR Aging ; 6: e43185, 2023 Nov 30.
Article in English | MEDLINE | ID: mdl-37910448

ABSTRACT

BACKGROUND: Delirium, an acute confusional state highlighted by inattention, has been reported to occur in 10% to 50% of patients with COVID-19. People hospitalized with COVID-19 have been noted to present with or develop delirium and neurocognitive disorders. Caring for patients with delirium is associated with more burden for nurses, clinicians, and caregivers. Using information in electronic health record data to recognize delirium and possibly COVID-19 could lead to earlier treatment of the underlying viral infection and improve outcomes in clinical and health care systems cost per patient. Clinical data repositories can further support rapid discovery through cohort identification tools, such as the Informatics for Integrating Biology and the Bedside tool. OBJECTIVE: The specific aim of this research was to investigate delirium in hospitalized older adults as a possible presenting symptom in COVID-19 using a data repository to identify neurocognitive disorders with a novel group of International Classification of Diseases, Tenth Revision (ICD-10) codes. METHODS: We analyzed data from 2 catchment areas with different demographics. The first catchment area (7 counties in North-Central Florida) is predominantly rural while the second (1 county in North Florida) is predominantly urban. The Integrating Biology and the Bedside data repository was queried for patients with COVID-19 admitted to inpatient units via the emergency department (ED) within the health center between April 1, 2020, and April 1, 2022. Patients with COVID-19 were identified by having a positive COVID-19 laboratory test or a diagnosis code of U07.1. We identified neurocognitive disorders as delirium or encephalopathy, using ICD-10 codes. RESULTS: Less than one-third (1437/4828, 29.8%) of patients with COVID-19 were diagnosed with a co-occurring neurocognitive disorder. A neurocognitive disorder was present on admission for 15.8% (762/4828) of all patients with COVID-19 admitted through the ED.
Among patients with both COVID-19 and a neurocognitive disorder, 56.9% (817/1437) were aged ≥65 years, a significantly higher proportion than those with no neurocognitive disorder (P<.001). The proportion of patients aged <65 years was significantly higher among patients diagnosed with encephalopathy only than patients diagnosed with delirium only and both delirium and encephalopathy (P<.001). Most (1272/4828, 26.3%) patients with COVID-19 admitted through the ED during our study period were admitted during the Delta variant peak. CONCLUSIONS: The data collected demonstrated that an increased number of older patients with neurocognitive disorder present on admission were infected with COVID-19. Knowing that delirium increases the staffing, nursing care needs, hospital resources used, and the length of stay as previously noted, identifying delirium early may benefit hospital administration when planning for newly anticipated COVID-19 surges. A robust and accessible data repository, such as the one used in this study, can provide invaluable support to clinicians and clinical administrators in such resource reallocation and clinical decision-making.

5.
PLoS One ; 18(10): e0292888, 2023.
Article in English | MEDLINE | ID: mdl-37862334

ABSTRACT

OBJECTIVE: This study aimed to develop and validate predictive models using electronic health records (EHR) data to determine whether hospitalized COVID-19-positive patients would be admitted to alternative medical care or discharged home. METHODS: We conducted a retrospective cohort study using deidentified data from the University of Florida Health Integrated Data Repository. The study included 1,578 adult patients (≥18 years) who tested positive for COVID-19 while hospitalized, comprising 960 (60.8%) female patients with a mean (SD) age of 51.86 (18.49) years and 618 (39.2%) male patients with a mean (SD) age of 54.35 (18.48) years. Machine learning (ML) model training involved cross-validation to assess their performance in predicting patient disposition. RESULTS: We developed and validated six supervised ML-based prediction models (logistic regression, Gaussian Naïve Bayes, k-nearest neighbors, decision trees, random forest, and support vector machine classifier) to predict patient discharge status. The models were evaluated based on the area under the receiver operating characteristic curve (ROC-AUC), precision, accuracy, F1 score, and Brier score. The random forest classifier exhibited the highest performance, achieving an accuracy of 0.84 and an AUC of 0.72. Logistic regression (accuracy: 0.85, AUC: 0.71), k-nearest neighbor (accuracy: 0.84, AUC: 0.63), decision tree (accuracy: 0.84, AUC: 0.61), Gaussian Naïve Bayes (accuracy: 0.84, AUC: 0.66), and support vector machine classifier (accuracy: 0.84, AUC: 0.67) also demonstrated valuable predictive capabilities. SIGNIFICANCE: This study's findings are crucial for efficiently allocating healthcare resources during pandemics like COVID-19. By harnessing ML techniques and EHR data, we can create predictive tools to identify patients at greater risk of severe symptoms based on their medical histories. 
The models developed here serve as a foundation for expanding the toolkit available to healthcare professionals and organizations. Additionally, explainable ML methods, such as Shapley Additive Explanations, aid in uncovering underlying data features that inform healthcare decision-making processes.


Subject(s)
COVID-19 , Patient Discharge , Adult , Humans , Middle Aged , Retrospective Studies , Electronic Health Records , Bayes Theorem , COVID-19/epidemiology , Machine Learning
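
The abstract above evaluates its classifiers with accuracy, ROC-AUC, and the Brier score, among other measures. The sketch below shows how these three can be computed from predicted probabilities using only the standard library. The labels and scores are toy values, not study data, and the function names are illustrative rather than taken from the authors' code.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def roc_auc(y_true, y_prob):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def accuracy(y_true, y_prob, threshold=0.5):
    """Share of cases classified correctly after thresholding the probability."""
    preds = [1 if p >= threshold else 0 for p in y_prob]
    return sum(int(a == b) for a, b in zip(y_true, preds)) / len(y_true)

# Toy data for illustration only: 2 positives, 4 negatives.
y_true = [1, 0, 0, 0, 0, 1]
y_prob = [0.4, 0.1, 0.2, 0.3, 0.45, 0.6]

print("accuracy:", accuracy(y_true, y_prob))
print("ROC-AUC: ", roc_auc(y_true, y_prob))
print("Brier:   ", brier_score(y_true, y_prob))
```

Accuracy depends on the chosen threshold and on class balance, whereas ROC-AUC measures ranking quality across all thresholds, so the two can diverge for the same model.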
6.
BMC Med Inform Decis Mak ; 23(1): 181, 2023 Sep 13.
Article in English | MEDLINE | ID: mdl-37704994

ABSTRACT

BACKGROUND: Prognostic models of hospital-induced delirium, that include potential predisposing and precipitating factors, may be used to identify vulnerable patients and inform the implementation of tailored preventive interventions. It is recommended that, in prediction model development studies, candidate predictors are selected on the basis of existing knowledge, including knowledge from clinical practice. The purpose of this article is to describe the process of identifying and operationalizing candidate predictors of hospital-induced delirium for application in a prediction model development study using a practice-based approach. METHODS: This study is part of a larger, retrospective cohort study that is developing prognostic models of hospital-induced delirium for medical-surgical older adult patients using structured data from administrative and electronic health records. First, we conducted a review of the literature to identify clinical concepts that had been used as candidate predictors in prognostic model development-and-validation studies of hospital-induced delirium. Then, we consulted a multidisciplinary task force of nine members who independently judged whether each clinical concept was associated with hospital-induced delirium. Finally, we mapped the clinical concepts to the administrative and electronic health records and operationalized our candidate predictors. RESULTS: In the review of 34 studies, we identified 504 unique clinical concepts. Two-thirds of the clinical concepts (337/504) were used as candidate predictors only once. The most common clinical concepts included age (31/34), sex (29/34), and alcohol use (22/34). 96% of the clinical concepts (484/504) were judged to be associated with the development of hospital-induced delirium by at least two members of the task force. All of the task force members agreed that 47 or 9% of the 504 clinical concepts were associated with hospital-induced delirium. 
CONCLUSIONS: Heterogeneity among candidate predictors of hospital-induced delirium in the literature suggests a still evolving list of factors that contribute to the development of this complex phenomenon. We demonstrated a practice-based approach to variable selection for our model development study of hospital-induced delirium. Expert judgement of variables enabled us to categorize the variables based on the amount of agreement among the experts and plan for the development of different models, including an expert-model and data-driven model.


Subject(s)
Advisory Committees , Delirium , Humans , Aged , Retrospective Studies , Alcohol Drinking , Hospitals , Delirium/diagnosis
7.
J Clin Transl Sci ; 7(1): e149, 2023.
Article in English | MEDLINE | ID: mdl-37456264

ABSTRACT

Objective: This study aims to develop a generalizable architecture for enhancing an enterprise data warehouse for research (EDW4R) with results from a natural language processing (NLP) model, which allows discrete data derived from clinical notes to be made broadly available for research use without need for NLP expertise. The study also quantifies the additional value that information extracted from clinical narratives brings to EDW4R. Materials and methods: Clinical notes written during one month at an academic health center were used to evaluate the performance of an existing NLP model and to quantify its value added to the structured data. Manual review was utilized for performance analysis. The architecture for enhancing the EDW4R is described in detail to enable reproducibility. Results: Two weeks were needed to enhance EDW4R with data from 250 million clinical notes. NLP generated 16 and 39% increase in data availability for two variables. Discussion: Our architecture is highly generalizable to a new NLP model. The positive predictive value obtained by an independent team showed only slightly lower NLP performance than the values reported by the NLP developers. The NLP showed significant value added to data already available in structured format. Conclusion: Given the value added by data extracted using NLP, it is important to enhance EDW4R with these data to enable research teams without NLP expertise to benefit from value added by NLP models.

8.
Int J Med Inform ; 177: 105115, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37302362

ABSTRACT

OBJECTIVE: The objective of this study is to validate and report on portability and generalizability of a Natural Language Processing (NLP) method to extract individual social factors from clinical notes, which was originally developed at a different institution. MATERIALS AND METHODS: A rule-based deterministic state machine NLP model was developed to extract financial insecurity and housing instability using notes from one institution and was applied to all notes written during 6 months at another institution. 10% of the notes classified as positive by the NLP model and the same number of notes classified as negative were manually annotated. The NLP model was adjusted to accommodate notes at the new site. Accuracy, positive predictive value, sensitivity, and specificity were calculated. RESULTS: More than 6 million notes were processed at the receiving site by the NLP model, which resulted in about 13,000 and 19,000 classified as positive for financial insecurity and housing instability, respectively. The NLP model showed excellent performance on the validation dataset with all measures over 0.87 for both social factors. DISCUSSION: Our study illustrated the need to accommodate institution-specific note-writing templates as well as clinical terminology of emergent diseases when applying an NLP model for social factors. A state machine is relatively simple to port effectively across institutions. Our study showed superior performance to similar generalizability studies for extracting social factors. CONCLUSION: A rule-based NLP model for extracting social factors from clinical notes showed strong portability and generalizability across organizationally and geographically distinct institutions. With only relatively simple modifications, we obtained promising performance from an NLP-based model.


Subject(s)
Electronic Health Records , Natural Language Processing , Humans , Algorithms , Health Facilities
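
The abstract above reports accuracy, positive predictive value, sensitivity, and specificity for the manually annotated sample. For readers less familiar with these measures, the following minimal sketch derives all four from confusion-matrix counts. The counts are hypothetical and do not come from the study, and the function name is illustrative.

```python
def classification_metrics(tp, fp, tn, fn):
    """Four standard measures from true/false positive/negative counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "ppv": tp / (tp + fp),           # positive predictive value (precision)
        "sensitivity": tp / (tp + fn),   # recall, true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }

# Hypothetical annotation results for one social factor.
metrics = classification_metrics(tp=90, fp=10, tn=95, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because the annotated sample described in the abstract contains equal numbers of positively and negatively classified notes, measures computed on it reflect that sampling design and not the prevalence in the full set of notes.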
9.
Health Aff Sch ; 1(4): qxad047, 2023 Oct.
Article in English | MEDLINE | ID: mdl-38756741

ABSTRACT

Variation in availability, format, and standardization of patient attributes across health care organizations impacts patient-matching performance. We report on the changing nature of patient-matching features available from 2010-2020 across diverse care settings. We asked 38 health care provider organizations about their current patient attribute data-collection practices. All sites collected name, date of birth (DOB), address, and phone number. Name, DOB, current address, social security number (SSN), sex, and phone number were most commonly used for cross-provider patient matching. Electronic health record queries for a subset of 20 participating sites revealed that DOB, first name, last name, city, and postal codes were highly available (>90%) across health care organizations and time. SSN declined slightly in the last years of the study period. Birth sex, gender identity, language, country full name, country abbreviation, health insurance number, ethnicity, cell phone number, email address, and weight increased over 50% from 2010 to 2020. Understanding the wide variation in available patient attributes across care settings in the United States can guide selection and standardization efforts for improved patient matching in the United States.

10.
NPJ Digit Med ; 5(1): 194, 2022 Dec 26.
Article in English | MEDLINE | ID: mdl-36572766

ABSTRACT

There is an increasing interest in developing artificial intelligence (AI) systems to process and interpret electronic health records (EHRs). Natural language processing (NLP) powered by pretrained language models is the key technology for medical AI systems utilizing clinical narratives. However, there are few clinical language models, the largest of which trained in the clinical domain is comparatively small at 110 million parameters (compared with billions of parameters in the general domain). It is not clear how large clinical language models with billions of parameters can help medical AI systems utilize unstructured EHRs. In this study, we develop from scratch a large clinical language model, GatorTron, using >90 billion words of text (including >82 billion words of de-identified clinical text) and systematically evaluate it on five clinical NLP tasks including clinical concept extraction, medical relation extraction, semantic textual similarity, natural language inference (NLI), and medical question answering (MQA). We examine how (1) scaling up the number of parameters and (2) scaling up the size of the training data could benefit these NLP tasks. GatorTron models scale up the clinical language model from 110 million to 8.9 billion parameters and improve five clinical NLP tasks (e.g., 9.6% and 9.5% improvement in accuracy for NLI and MQA), which can be applied to medical AI systems to improve healthcare delivery. The GatorTron models are publicly available at: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/clara/models/gatortron_og .

11.
J Am Med Inform Assoc ; 30(1): 54-63, 2022 Dec 13.
Article in English | MEDLINE | ID: mdl-36214629

ABSTRACT

OBJECTIVE: Federated learning (FL) allows multiple distributed data holders to collaboratively learn a shared model without data sharing. However, individual health system data are heterogeneous. "Personalized" FL variations have been developed to counter data heterogeneity, but few have been evaluated using real-world healthcare data. The purpose of this study is to investigate the performance of a single-site versus a 3-client federated model using a previously described Coronavirus Disease 19 (COVID-19) diagnostic model. Additionally, to investigate the effect of system heterogeneity, we evaluate the performance of 4 FL variations. MATERIALS AND METHODS: We leverage a FL healthcare collaborative including data from 5 international healthcare systems (US and Europe) encompassing 42 hospitals. We implemented a COVID-19 computer vision diagnosis system using the Federated Averaging (FedAvg) algorithm implemented on Clara Train SDK 4.0. To study the effect of data heterogeneity, training data was pooled from 3 systems locally and federation was simulated. We compared a centralized/pooled model, versus FedAvg, and 3 personalized FL variations (FedProx, FedBN, and FedAMP). RESULTS: We observed comparable model performance with respect to internal validation (local model: AUROC 0.94 vs FedAvg: 0.95, P = .5) and improved model generalizability with the FedAvg model (P < .05). When investigating the effects of model heterogeneity, we observed poor performance with FedAvg on internal validation as compared to personalized FL algorithms. FedAvg did have improved generalizability compared to personalized FL algorithms. On average, FedBN had the best rank performance on internal and external validation. CONCLUSION: FedAvg can significantly improve the generalization of the model compared to other personalization FL algorithms; however, at the cost of poor internal validity. Personalized FL may offer an opportunity to develop both internal and externally validated algorithms.


Subject(s)
COVID-19 Testing , COVID-19 , Humans , Hospitals , Learning , Europe , United States
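
The abstract above uses Federated Averaging (FedAvg) as its baseline. The core server-side step of FedAvg is a weighted average of the clients' model parameters, with weights proportional to each client's local sample count. The sketch below shows only that aggregation step on plain lists of floats; it omits local training and communication, it is not the Clara Train SDK API, and the client values are made up for illustration.

```python
def fed_avg(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local sample counts.

    client_weights: list of equal-length parameter lists, one per client.
    client_sizes:   number of local training samples held by each client.
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]

# Three hypothetical clients, mirroring the 3-client setup only in number.
clients = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
sizes = [100, 100, 200]
global_weights = fed_avg(clients, sizes)
print(global_weights)
```

Personalized variants such as FedProx, FedBN, and FedAMP change what is trained locally or which parameters are shared; none of them is reproduced here.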
12.
MDM Policy Pract ; 7(1): 23814683221089844, 2022.
Article in English | MEDLINE | ID: mdl-35368410

ABSTRACT

Objective. The COVID-19 pandemic created an unprecedented strain on the health care system, and administrators had to make many critical decisions to respond appropriately. This study sought to understand how health care administrators used data and information for decision making during the first 6 mo of the COVID-19 pandemic. Materials and Methods. We conducted semistructured interviews with administrators across University of Florida (UF) Health. We performed an inductive thematic analysis of the transcripts. Results. Four themes emerged from the interviews: 1) common types of health systems or hospital operations data; 2) public health and other external data sources; 3) data interaction, integration, and exchange; and 4) novelty and evolution in data, information, or tools used over time. Participants illustrated the organizational, public health, and regional information they considered essential (e.g., hospital census, community positivity rate, etc.). Participants named specific challenges they faced due to data quality and timeliness. Participants elaborated on the necessity of data integration, validation, and coordination across different boundaries (e.g., different hospital systems in the same metro areas, public health agencies at the local, state, and federal level, etc.). Participants indicated that even within the first 6 mo of the COVID-19 pandemic, the data and tools used for making critical decisions changed. Discussion. While existing medical informatics infrastructure can facilitate decision making in pandemic response, data may not always be readily available in a usable format. Interoperable infrastructure and data standardization across multiple health systems would help provide more reliable and timely information for decision making. Conclusion. 
Our findings contribute to future discussions of improving data infrastructure and developing harmonized data standards needed to facilitate critical decisions at multiple health care system levels. Highlights: The study revealed common health systems or hospital operations data and information used in decision making during the first 6 mo of the COVID-19 pandemic. Participants described commonly used internal data sources, such as resource and financial reports and dashboards, and external data sources, such as federal, state, and local public health data. Participants described challenges including poor timeliness and limited local relevance of external data as well as poor integration of data sources within and across organizational boundaries. Results suggest the need for continued integration and standardization of health data to support health care administrative decision making during pandemics or other emergencies.

13.
J Am Med Inform Assoc ; 29(4): 686-693, 2022 Mar 15.
Article in English | MEDLINE | ID: mdl-34664656

ABSTRACT

The OneFlorida Data Trust is a centralized research patient data repository created and managed by the OneFlorida Clinical Research Consortium ("OneFlorida"). It comprises structured electronic health record (EHR), administrative claims, tumor registry, death, and other data on 17.2 million individuals who received healthcare in Florida between January 2012 and the present. Ten healthcare systems in Miami, Orlando, Tampa, Jacksonville, Tallahassee, Gainesville, and rural areas of Florida contribute EHR data, covering the major metropolitan regions in Florida. Deduplication of patients is accomplished via privacy-preserving entity resolution (precision 0.97-0.99, recall 0.75), thereby linking patients' EHR, claims, and death data. Another unique feature is the establishment of mother-baby relationships via Florida vital statistics data. Research usage has been significant, including major studies launched in the National Patient-Centered Clinical Research Network ("PCORnet"), where OneFlorida is 1 of 9 clinical research networks. The Data Trust's robust, centralized, statewide data are a valuable and relatively unique research resource.


Subject(s)
Electronic Health Records , Translational Biomedical Research , Florida , Humans , Privacy
14.
J Am Med Inform Assoc ; 29(3): 461-471, 2022 Jan 29.
Article in English | MEDLINE | ID: mdl-34897493

ABSTRACT

OBJECTIVE: This study aimed to understand the association between primary care physician (PCP) proficiency with the electronic health record (EHR) system and time spent interacting with the EHR. MATERIALS AND METHODS: We examined the use of EHR proficiency tools among PCPs at one large academic health system using EHR-derived measures of clinician EHR proficiency and efficiency. Our main predictors were the use of EHR proficiency tools and our outcomes focused on 4 measures assessing time spent in the EHR: (1) total time spent interacting with the EHR, (2) time spent outside scheduled clinical hours, (3) time spent documenting, and (4) time spent on inbox management. We conducted multivariable quantile regression models with fixed effects for physician-level factors and time in order to identify factors that were independently associated with time spent in the EHR. RESULTS: Across 441 primary care physicians, we found mixed associations between certain EHR proficiency behaviors and time spent in the EHR. Across EHR activities studied, QuickActions, SmartPhrases, and documentation length were positively associated with increased time spent in the EHR. Models also showed a greater amount of help from team members in note writing was associated with less time spent in the EHR and documenting. DISCUSSION: Examining the prevalence of EHR proficiency behaviors may suggest targeted areas for initial and ongoing EHR training. Although documentation behaviors are key areas for training, team-based models for documentation and inbox management require further study. CONCLUSIONS: A nuanced association exists between physician EHR proficiency and time spent in the EHR.


Subject(s)
Electronic Health Records , Physicians, Primary Care , Documentation , Humans , Regression Analysis
15.
Science ; 348(6239): 1139-43, 2015 Jun 05.
Article in English | MEDLINE | ID: mdl-25977371

ABSTRACT

The evolution of eusociality is one of the major transitions in evolution, but the underlying genomic changes are unknown. We compared the genomes of 10 bee species that vary in social complexity, representing multiple independent transitions in social evolution, and report three major findings. First, many important genes show evidence of neutral evolution as a consequence of relaxed selection with increasing social complexity. Second, there is no single road map to eusociality; independent evolutionary transitions in sociality have independent genetic underpinnings. Third, though clearly independent in detail, these transitions do have similar general features, including an increase in constrained protein evolution accompanied by increases in the potential for gene regulation and decreases in diversity and abundance of transposable elements. Eusociality may arise through different mechanisms each time, but would likely always involve an increase in the complexity of gene networks.


Subject(s)
Bees/genetics , Evolution, Molecular , Genetic Drift , Social Behavior , Transcriptome , Amino-Acid N-Acetyltransferase , Animals , Bees/classification , DNA Transposable Elements , Gene Expression Regulation , Gene Regulatory Networks , Genome, Insect/genetics , Phylogeny , Selection, Genetic , Transcription Factors/chemistry , Transcription Factors/genetics
16.
Appl Environ Microbiol ; 80(13): 3793-803, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24747890

ABSTRACT

Here, we report the genome of one gammaproteobacterial member of the gut microbiota, for which we propose the name "Candidatus Schmidhempelia bombi," that was inadvertently sequenced alongside the genome of its host, the bumble bee, Bombus impatiens. This symbiont is a member of the recently described bacterial order Orbales, which has been collected from the guts of diverse insect species; however, "Ca. Schmidhempelia" has been identified exclusively with bumble bees. Metabolic reconstruction reveals that "Ca. Schmidhempelia" lacks many genes for a functioning NADH dehydrogenase I, all genes for the high-oxygen cytochrome o, and most genes in the tricarboxylic acid (TCA) cycle. "Ca. Schmidhempelia" has retained NADH dehydrogenase II, the low-oxygen specific cytochrome bd, anaerobic nitrate respiration, mixed-acid fermentation pathways, and citrate fermentation, which may be important for survival in low-oxygen or anaerobic environments found in the bee hindgut. Additionally, a type 6 secretion system, a Flp pilus, and many antibiotic/multidrug transporters suggest complex interactions with its host and other gut commensals or pathogens. This genome has signatures of reduction (2.0 megabase pairs) and rearrangement, as previously observed for genomes of host-associated bacteria. A survey of wild and laboratory B. impatiens revealed that "Ca. Schmidhempelia" is present in 90% of individuals and, therefore, may provide benefits to its host.


Subject(s)
Bees/microbiology , Gammaproteobacteria/classification , Gammaproteobacteria/isolation & purification , Genome, Bacterial , Metabolic Networks and Pathways/genetics , Symbiosis , Animals , DNA, Bacterial/chemistry , DNA, Bacterial/genetics , Gammaproteobacteria/genetics , Gammaproteobacteria/physiology , Molecular Sequence Data , Sequence Analysis, DNA
17.
Bioinformatics ; 29(14): 1718-25, 2013 Jul 15.
Article in English | MEDLINE | ID: mdl-23665771

ABSTRACT

MOTIVATION: A large and rapidly growing number of bacterial organisms have been sequenced by the newest sequencing technologies. Cheaper and faster sequencing technologies make it easy to generate very high coverage of bacterial genomes, but these advances mean that DNA preparation costs can exceed the cost of sequencing for small genomes. The need to contain costs often results in the creation of only a single sequencing library, which in turn introduces new challenges for genome assembly methods. RESULTS: We evaluated the ability of multiple genome assembly programs to assemble bacterial genomes from a single, deep-coverage library. For our comparison, we chose bacterial species spanning a wide range of GC content and measured the contiguity and accuracy of the resulting assemblies. We compared the assemblies produced by this very high-coverage, one-library strategy to the best assemblies created by two-library sequencing, and we found that remarkably good bacterial assemblies are possible with just one library. We also measured the effect of read length and depth of coverage on assembly quality and determined the values that provide the best results with current algorithms. CONTACT: salzberg@jhu.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genome, Bacterial , Genomics/methods , Software , Algorithms , Gene Library , Sequence Analysis, DNA
18.
Evol Bioinform Online ; 9: 127-36, 2013.
Article in English | MEDLINE | ID: mdl-23531787

ABSTRACT

BACKGROUND: The expression levels of bacterial genes can be measured directly using next-generation sequencing (NGS) methods, offering much greater sensitivity and accuracy than earlier, microarray-based methods. Most bioinformatics software for estimating levels of gene expression from NGS data has been designed for eukaryotic genomes, with algorithms focusing particularly on detection of splicing patterns. These methods do not perform well on bacterial genomes. RESULTS: Here we describe the first software system designed explicitly for quantifying the degree of gene expression in bacteria and other prokaryotes. EDGE-pro (Estimated Degree of Gene Expression in PROkaryotes) processes the raw data from an RNA-seq experiment on a bacterial or archaeal species and produces estimates of the expression levels for each gene in these gene-dense genomes. SOFTWARE: The EDGE-pro tool is implemented as a pipeline of C++ and Perl programs and is freely available as open-source code at http://www.genomics.jhu.edu/software/EDGE/index.shtml.

19.
Genome Res ; 22(3): 557-67, 2012 Mar.
Article in English | MEDLINE | ID: mdl-22147368

ABSTRACT

New sequencing technology has dramatically altered the landscape of whole-genome sequencing, allowing scientists to initiate numerous projects to decode the genomes of previously unsequenced organisms. The lowest-cost technology can generate deep coverage of most species, including mammals, in just a few days. The sequence data generated by one of these projects consist of millions or billions of short DNA sequences (reads) that range from 50 to 150 nt in length. These sequences must then be assembled de novo before most genome analyses can begin. Unfortunately, genome assembly remains a very difficult problem, made more difficult by shorter reads and unreliable long-range linking information. In this study, we evaluated several of the leading de novo assembly algorithms on four different short-read data sets, all generated by Illumina sequencers. Our results describe the relative performance of the different assemblers as well as other significant differences in assembly difficulty that appear to be inherent in the genomes themselves. Three overarching conclusions are apparent: first, that data quality, rather than the assembler itself, has a dramatic effect on the quality of an assembled genome; second, that the degree of contiguity of an assembly varies enormously among different assemblers and different genomes; and third, that the correctness of an assembly also varies widely and is not well correlated with statistics on contiguity. To enable others to replicate our results, all of our data and methods are freely available, as are all assemblers used in this study.


Subject(s)
Algorithms , Genomics/methods , Sequence Analysis, DNA , Animals , Computational Biology/methods , Genome , Genome, Bacterial/genetics , Humans , Internet , Reproducibility of Results
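
Illustrative note: the abstract above refers to "statistics on contiguity" without naming them. One widely used contiguity statistic is N50; whether the study relied on it is an assumption here. The sketch below shows how N50 is commonly computed from a list of contig lengths; the function name `n50` is hypothetical and not taken from any of the tools evaluated.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths.

    N50 is the largest length L such that contigs of length >= L
    together cover at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Hypothetical contig lengths, for illustration only.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

A high N50 says only that the assembly is made of long pieces, not that those pieces are correct, which is consistent with the abstract's third conclusion.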
20.
Bioinformatics ; 27(21): 2957-63, 2011 Nov 01.
Article in English | MEDLINE | ID: mdl-21903629

ABSTRACT

MOTIVATION: Next-generation sequencing technologies generate very large numbers of short reads. Even with very deep genome coverage, short read lengths cause problems in de novo assemblies. The use of paired-end libraries with a fragment size shorter than twice the read length provides an opportunity to generate much longer reads by overlapping and merging read pairs before assembling a genome. RESULTS: We present FLASH, a fast computational tool to extend the length of short reads by overlapping paired-end reads from fragment libraries that are sufficiently short. We tested the correctness of the tool on one million simulated read pairs, and we then applied it as a pre-processor for genome assemblies of Illumina reads from the bacterium Staphylococcus aureus and human chromosome 14. FLASH correctly extended and merged reads >99% of the time on simulated reads with an error rate of <1%. With adequately set parameters, FLASH correctly merged reads over 90% of the time even when the reads contained up to 5% errors. When FLASH was used to extend reads prior to assembly, the resulting assemblies had substantially greater N50 lengths for both contigs and scaffolds. AVAILABILITY AND IMPLEMENTATION: The FLASH system is implemented in C and is freely available as open-source code at http://www.cbcb.umd.edu/software/flash. CONTACT: t.magoc@gmail.com.


Subject(s)
Genomics/methods , Sequence Analysis, DNA , Software , Chromosomes, Human, Pair 14 , Genome , Humans , Staphylococcus aureus/genetics
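
Illustrative note: the idea in the abstract above, merging a read pair whose fragment is shorter than twice the read length, can be sketched in a few lines. This is a simplified illustration and not the FLASH implementation: it ignores base quality scores, keeps the first read's bases inside the overlap, and the names `merge_pair`, `min_overlap`, and `max_mismatch_ratio` (with their default values) are assumptions made for the example.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(read1, read2, min_overlap=10, max_mismatch_ratio=0.25):
    """Merge a paired-end read pair by overlapping the two mates.

    read2 is assumed to come from the opposite strand, so it is
    reverse-complemented first. Every overlap length from the longest
    possible down to min_overlap is scored by its mismatch ratio; the
    overlap with the lowest ratio wins (ties go to the longer overlap).
    Returns the merged sequence, or None if no overlap is good enough.
    """
    mate = reverse_complement(read2)
    best = None  # (mismatch_ratio, overlap_length)
    for olen in range(min(len(read1), len(mate)), min_overlap - 1, -1):
        suffix = read1[-olen:]
        prefix = mate[:olen]
        mismatches = sum(a != b for a, b in zip(suffix, prefix))
        ratio = mismatches / olen
        if ratio <= max_mismatch_ratio and (best is None or ratio < best[0]):
            best = (ratio, olen)
    if best is None:
        return None
    _, olen = best
    # Keep read1 whole and append the part of the mate beyond the overlap.
    return read1 + mate[olen:]


if __name__ == "__main__":
    # Hypothetical 30-base fragment sequenced with 20-base paired reads.
    fragment = "ACGTTGCAAGGCTTACGATCGGATCCATGC"
    r1 = fragment[:20]
    r2 = reverse_complement(fragment[-20:])
    print(merge_pair(r1, r2) == fragment)
```

Scoring overlaps by mismatch ratio rather than requiring an exact match is what lets a merger of this kind tolerate sequencing errors, at the cost of occasionally accepting a spurious overlap in repetitive sequence.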