1.
Nature ; 606(7913): 382-388, 2022 06.
Article in English | MEDLINE | ID: mdl-35614220

ABSTRACT

Mitochondria are epicentres of eukaryotic metabolism and bioenergetics. Pioneering efforts in recent decades have established the core protein componentry of these organelles [1] and have linked their dysfunction to more than 150 distinct disorders [2,3]. Still, hundreds of mitochondrial proteins lack clear functions [4], and the underlying genetic basis for approximately 40% of mitochondrial disorders remains unresolved [5]. Here, to establish a more complete functional compendium of human mitochondrial proteins, we profiled more than 200 CRISPR-mediated HAP1 cell knockout lines using mass spectrometry-based multiomics analyses. This effort generated approximately 8.3 million distinct biomolecule measurements, providing a deep survey of the cellular responses to mitochondrial perturbations and laying a foundation for mechanistic investigations into protein function. Guided by these data, we discovered that PIGY upstream open reading frame (PYURF) is an S-adenosylmethionine-dependent methyltransferase chaperone that supports both complex I assembly and coenzyme Q biosynthesis and is disrupted in a previously unresolved multisystemic mitochondrial disorder. We further linked the putative zinc transporter SLC30A9 to mitochondrial ribosomes and OxPhos integrity and established RAB5IF as the second gene harbouring pathogenic variants that cause cerebrofaciothoracic dysplasia. Our data, which can be explored through the interactive online MITOMICS.app resource, suggest biological roles for many other orphan mitochondrial proteins that still lack robust functional characterization and define a rich cell signature of mitochondrial dysfunction that can support the genetic diagnosis of mitochondrial diseases.


Subject(s)
Mitochondria , Mitochondrial Proteins , Cation Transport Proteins , Cell Cycle Proteins , Energy Metabolism , Humans , Mass Spectrometry , Mitochondria/genetics , Mitochondria/metabolism , Mitochondrial Diseases/genetics , Mitochondrial Diseases/metabolism , Mitochondrial Proteins/genetics , Mitochondrial Proteins/metabolism , Transcription Factors , rab5 GTP-Binding Proteins
2.
Am J Respir Cell Mol Biol ; 67(4): 430-437, 2022 Oct.
Article in English | MEDLINE | ID: mdl-35580164

ABSTRACT

Chromosome 17q12-q21 is the most replicated genetic locus for childhood-onset asthma. Polymorphisms in this locus containing ∼10 genes interact with a variety of environmental exposures in the home and outdoors to modify asthma risk. However, the functional basis for these associations and their linkages to the environment have remained enigmatic. Within this extended region, regulation of GSDMB (gasdermin B) expression in airway epithelial cells has emerged as the primary mechanism underlying the 17q12-q21 genome-wide association study signal. Asthma-associated SNPs influence the abundance of GSDMB transcripts as well as the functional properties of GSDMB protein in airway epithelial cells. GSDMB is a member of the gasdermin family of proteins, which regulate pyroptosis and inflammatory responses to microbial infections. The aims of this review are to synthesize recent studies on the relationship of 17q12-q21 SNPs to childhood asthma and the evidence pointing to GSDMB gene expression or protein function as the underlying mechanism and to explore the potential functions of GSDMB that may influence the risk of developing asthma during childhood.


Subject(s)
Asthma , Genome-Wide Association Study , Pore Forming Cytotoxic Proteins/genetics , Asthma/genetics , Asthma/metabolism , Genetic Loci , Genetic Predisposition to Disease , Humans , Neoplasm Proteins/genetics , Neoplasm Proteins/metabolism , Polymorphism, Single Nucleotide
3.
BMC Genomics ; 21(1): 771, 2020 Nov 09.
Article in English | MEDLINE | ID: mdl-33167865

ABSTRACT

BACKGROUND: Deep neural networks (DNN) are a particular case of artificial neural networks (ANN) composed of multiple hidden layers, and have recently gained attention in genome-enabled prediction of complex traits. Yet, few studies in genome-enabled prediction have assessed the performance of DNN compared to traditional regression models. Strikingly, no clear superiority of DNN has been reported so far, and results seem highly dependent on the species and traits of application. Nevertheless, the relatively small datasets used in previous studies, most with fewer than 5000 observations, may have precluded the full potential of DNN. Therefore, the objective of this study was to investigate the impact of the dataset sample size on the performance of DNN compared to Bayesian regression models for genome-enabled prediction of body weight in broilers by sub-sampling 63,526 observations of the training set. RESULTS: Predictive performance of DNN improved as sample size increased, reaching a plateau at about 0.32 of prediction correlation when 60% of the entire training set size was used (i.e., 39,510 observations). Interestingly, DNN showed superior prediction correlation using up to 3% of training set, but poorer prediction correlation after that compared to Bayesian Ridge Regression (BRR) and Bayes Cπ. Regardless of the amount of data used to train the predictive machines, DNN displayed the lowest mean square error of prediction compared to all other approaches. The predictive bias was lower for DNN compared to Bayesian models, across all dataset sizes, with estimates close to one with larger sample sizes. CONCLUSIONS: DNN had worse prediction correlation compared to BRR and Bayes Cπ, but improved mean square error of prediction and bias relative to both Bayesian models for genome-enabled prediction of body weight in broilers. Such findings highlight advantages and disadvantages between predictive approaches depending on the criterion used for comparison. Furthermore, the inclusion of more data per se is not a guarantee for the DNN to outperform the Bayesian regression methods commonly used for genome-enabled prediction. Nonetheless, further analysis is necessary to detect scenarios where DNN can clearly outperform Bayesian benchmark models.
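
The three comparison criteria named in this abstract can pull in different directions, which is why a model can lose on prediction correlation yet win on mean square error. The following is a minimal Python sketch with made-up numbers, not the authors' code or data. It assumes that "predictive bias" means the slope of the regression of observed on predicted values (a common convention in genome-enabled prediction, where a slope of 1 indicates no inflation or deflation); the function name `prediction_metrics` is hypothetical.

```python
import math
from statistics import mean


def prediction_metrics(observed, predicted):
    """Return (correlation, mean squared error, slope of observed on predicted)."""
    mo, mp = mean(observed), mean(predicted)
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    corr = cov / math.sqrt(var_o * var_p)
    mse = mean([(o - p) ** 2 for o, p in zip(observed, predicted)])
    slope = cov / var_p  # assumed definition of "bias"; 1.0 means correct scale
    return corr, mse, slope


# Toy data (invented for illustration only).
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
model_a = [2.0, 4.0, 6.0, 8.0, 10.0]  # perfectly ranked, but inflated
model_b = [1.5, 1.5, 3.0, 4.5, 4.5]   # coarser ranking, correct scale

corr_a, mse_a, slope_a = prediction_metrics(observed, model_a)
corr_b, mse_b, slope_b = prediction_metrics(observed, model_b)
```

In this toy case `model_a` has the higher correlation, while `model_b` has the lower mean squared error and a slope of 1, so the ranking of the two models depends on the criterion chosen.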


Subject(s)
Chickens , Multifactorial Inheritance , Animals , Bayes Theorem , Body Weight , Chickens/genetics , Neural Networks, Computer , Sample Size
4.
Bioinformatics ; 35(15): 2657-2659, 2019 08 01.
Article in English | MEDLINE | ID: mdl-30534948

ABSTRACT

SUMMARY: Understanding the regulatory roles of non-coding genetic variants has become a central goal for interpreting results of genome-wide association studies. The regulatory significance of the variants may be interrogated by assessing their influence on transcription factor binding. We have developed atSNP Search, a comprehensive web database for evaluating motif matches to the human genome with both reference and variant alleles and assessing the overall significance of the variant alterations on the motif matches. Convenient search features, comprehensive search outputs and a useful help menu are key components of atSNP Search. atSNP Search enables convenient interpretation of regulatory variants by statistical significance testing and composite logo plots, which are graphical representations of motif matches with the reference and variant alleles. Existing motif-based regulatory variant discovery tools only consider a limited pool of variants due to storage or other limitations. In contrast, atSNP Search users can test more than 37 billion variant-motif pairs with marginal significance in motif matches or match alteration. Computational evidence from atSNP Search, when combined with experimental validation, may help with the discovery of underlying disease mechanisms. AVAILABILITY AND IMPLEMENTATION: atSNP Search is freely available at http://atsnp.biostat.wisc.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genome-Wide Association Study , Software , Genetic Variation , Humans , Protein Binding , Transcription Factors
5.
PLoS Comput Biol ; 15(6): e1006758, 2019 06.
Article in English | MEDLINE | ID: mdl-31246951

ABSTRACT

Many biological studies involve either (i) manipulating some aspect of a cell or its environment and then simultaneously measuring the effect on thousands of genes, or (ii) systematically manipulating each gene and then measuring the effect on some response of interest. A common challenge that arises in these studies is to explain how genes identified as relevant in the given experiment are organized into a subnetwork that accounts for the response of interest. The task of inferring a subnetwork is typically dependent on the information available in publicly available, structured databases, which suffer from incompleteness. However, a wealth of potentially relevant information resides in the scientific literature, such as information about genes associated with certain concepts of interest, as well as interactions that occur among various biological entities. We contend that by exploiting this information, we can improve the explanatory power and accuracy of subnetwork inference in multiple applications. Here we propose and investigate several ways in which information extracted from the scientific literature can be used to augment subnetwork inference. We show that we can use literature-extracted information to (i) augment the set of entities identified as being relevant in a subnetwork inference task, (ii) augment the set of interactions used in the process, and (iii) support targeted browsing of a large inferred subnetwork by identifying entities and interactions that are closely related to concepts of interest. We use this approach to uncover the pathways involved in interactions between a virus and a host cell, and the pathways that are regulated by a transcription factor associated with breast cancer. Our experimental results demonstrate that these approaches can provide more accurate and more interpretable subnetworks. 
Integer program code, background network data, and pathfinding code are available at https://github.com/Craven-Biostat-Lab/subnetwork_inference.


Subject(s)
Computational Biology/methods , Data Mining/methods , Gene Regulatory Networks/genetics , Protein Interaction Mapping/methods , Protein Interaction Maps/genetics , Databases, Genetic , HIV , HIV Infections/genetics , HIV Infections/virology , Humans
6.
J Surg Res ; 246: 160-169, 2020 02.
Article in English | MEDLINE | ID: mdl-31586890

ABSTRACT

BACKGROUND: A major roadblock to reducing the mortality of colorectal cancer (CRC) is the lack of prompt detection and treatment, and a simple blood test is likely to have higher compliance than all of the current methods. The purpose of this report is to examine the utility of a mass spectrometry-based blood serum protein biomarker test for detection of CRC. MATERIALS AND METHODS: Blood was drawn from individuals (n = 213) before colonoscopy or from patients with nonmetastatic CRC (n = 50) before surgery. Proteins were isolated from the serum of patients using targeted liquid chromatography-tandem mass spectrometry. We designed a machine-learning statistical model to assess these proteins. RESULTS: When considered individually, over 70% of the selected biomarkers showed significance by Mann-Whitney testing for distinguishing cancer-bearing cases from cancer-free cases. Using machine-learning methods, peptides derived from epidermal growth factor receptor and leucine-rich alpha-2-glycoprotein 1 were consistently identified as highly predictive for detecting CRC from cancer-free cases. A five-marker panel consisting of leucine-rich alpha-2-glycoprotein 1, epidermal growth factor receptor, inter-alpha-trypsin inhibitor heavy-chain family member 4, hemopexin, and superoxide dismutase 3 performed the best with 70% specificity at over 89% sensitivity (area under the curve = 0.86) in the validation set. For distinguishing regional from localized cancers, cross-validation within the training set showed that a panel of four proteins consisting of CD44 molecule, GC-vitamin D-binding protein, C-reactive protein, and inter-alpha-trypsin inhibitor heavy-chain family member 3 yielded the highest performance (area under the curve = 0.75). CONCLUSIONS: The minimally invasive blood biomarker panels identified here could serve as screening/detection alternatives for CRC in a human population and are potentially useful for staging of existing cancer.


Subject(s)
Biomarkers, Tumor/blood , Colorectal Neoplasms/diagnosis , Early Detection of Cancer/methods , Lymphatic Metastasis/diagnosis , Mass Screening/methods , Adult , Aged , Aged, 80 and over , Colectomy , Colonoscopy , Colorectal Neoplasms/blood , Colorectal Neoplasms/pathology , Colorectal Neoplasms/surgery , Cross-Sectional Studies , Feasibility Studies , Female , Humans , Lymphatic Metastasis/pathology , Male , Mass Spectrometry/methods , Middle Aged , Neoplasm Staging , Pilot Projects , Prospective Studies , ROC Curve
7.
Prev Med ; 136: 106061, 2020 07.
Article in English | MEDLINE | ID: mdl-32179026

ABSTRACT

Just under half of the 85.7 million US adults with hypertension have uncontrolled blood pressure using a hypertension threshold of systolic pressure ≥ 140 or diastolic pressure ≥ 90. Uncontrolled hypertension increases risks of death, stroke, heart failure, and myocardial infarction. Guidelines on hypertension management include lifestyle modification such as diet and exercise. In order to improve hypertension control, it is important to identify predictors of lifestyle modification assessment or advice to tailor future interventions using these effective, low-risk interventions. Electronic health record data from 14,360 adult hypertension patients at an academic medical center were analyzed using statistical and machine learning methods to identify predictors and timing of lifestyle modification. Multiple variables were statistically significant in analysis of lifestyle modification documentation at multiple time points. Random Forest was the best machine learning method to classify lifestyle modification documentation at any time, with an area under the receiver operating characteristic curve (AUROC) of 0.831. Logistic regression was the best machine learning method for classifying lifestyle modification documentation at ≤3 months, with an AUROC of 0.685. Analyzing narrative and coded data from electronic health records can improve understanding of timing of lifestyle modification and patient, clinic and provider characteristics that are correlated with or predictive of documentation of lifestyle modification for hypertension. This information can inform improvement efforts in hypertension care processes, treatment implementation, and ultimately hypertension control.
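
As a reading aid for the AUROC values quoted in this abstract, the sketch below computes AUROC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. It is a minimal Python illustration on invented labels and scores, not the study's pipeline; the function name `auroc` is hypothetical.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy example (invented): 1 = lifestyle modification documented, 0 = not.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
value = auroc(labels, scores)  # 8 of 9 positive/negative pairs ranked correctly
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for interpreting 0.831 versus 0.685.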


Subject(s)
Electronic Health Records , Hypertension , Adult , Documentation , Humans , Hypertension/prevention & control , Life Style , Machine Learning
8.
Respir Res ; 20(1): 115, 2019 Jun 10.
Article in English | MEDLINE | ID: mdl-31182091

ABSTRACT

BACKGROUND: Single birth cohort studies have been the basis for many discoveries about early life risk factors for childhood asthma but are limited in scope by sample size and characteristics of the local environment and population. The Children's Respiratory and Environmental Workgroup (CREW) was established to integrate multiple established asthma birth cohorts and to investigate asthma phenotypes and associated causal pathways (endotypes), focusing on how they are influenced by interactions between genetics, lifestyle, and environmental exposures during the prenatal period and early childhood. METHODS AND RESULTS: CREW is funded by the NIH Environmental influences on Child Health Outcomes (ECHO) program, and consists of 12 individual cohorts and three additional scientific centers. The CREW study population is diverse in terms of race, ethnicity, geographical distribution, and year of recruitment. We hypothesize that there are phenotypes in childhood asthma that differ based on clinical characteristics and underlying molecular mechanisms. Furthermore, we propose that asthma endotypes and their defining biomarkers can be identified based on personal and early life environmental risk factors. CREW has three phases: 1) to pool and harmonize existing data from each cohort, 2) to collect new data using standardized procedures, and 3) to enroll new families during the prenatal period to supplement and enrich extant data and enable unified systems approaches for identifying asthma phenotypes and endotypes. CONCLUSIONS: The overall goal of the CREW program is to develop a better understanding of how early life environmental exposures and host factors interact to promote the development of specific asthma endotypes.


Subject(s)
Asthma/diagnosis , Asthma/epidemiology , Environmental Exposure/analysis , Life Style , Population Surveillance/methods , Adolescent , Asthma/genetics , Child , Child, Preschool , Cohort Studies , Environmental Exposure/prevention & control , Female , Humans , Infant , Male , Young Adult
9.
PLoS Pathog ; 12(3): e1005499, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26962864

ABSTRACT

Herpes simplex virus type 1 causes mucocutaneous lesions, and is the leading cause of infectious blindness in the United States. Animal studies have shown that the severity of HSV-1 ocular disease is influenced by three main factors: innate immunity, host immune response and viral strain. We previously showed that mixed infection with two avirulent HSV-1 strains (OD4 and CJ994) resulted in recombinants that exhibit a range of disease phenotypes from severe to avirulent, suggesting epistatic interactions were involved. The goal of this study was to develop a quantitative trait locus (QTL) analysis of HSV-1 ocular virulence determinants and to identify virulence-associated SNPs. Blepharitis and stromal keratitis quantitative scores were characterized for 40 OD4:CJ994 recombinants. Viral titers in the eye were also measured. Virulence quantitative trait locus mapping (vQTLmap) was performed using the Lasso, Random Forest, and Ridge regression methods to identify significant phenotypically meaningful regions for each ocular disease parameter. The most predictive Ridge regression model identified several phenotypically meaningful SNPs for blepharitis and stromal keratitis. Notably, phenotypically meaningful nonsynonymous variations were detected in the UL24, UL29 (ICP8), UL41 (VHS), UL53 (gK), UL54 (ICP27), UL56, ICP4, US1 (ICP22), US3 and gG genes. Network analysis revealed that many of these variations were in HSV-1 regulatory networks and viral genes that affect innate immunity. Several genes previously implicated in virulence were identified, validating this approach, while other genes were novel. Several novel polymorphisms were also identified in these genes. This approach provides a framework that will be useful for identifying virulence genes in other pathogenic viruses, as well as epistatic effects that affect HSV-1 ocular virulence.


Subject(s)
Eye Infections/immunology , Herpesvirus 1, Human/genetics , Quantitative Trait Loci/genetics , Animals , Base Sequence , Chlorocebus aethiops , DNA, Viral/genetics , Eye Infections/virology , Genetic Association Studies , Herpesvirus 1, Human/immunology , Herpesvirus 1, Human/pathogenicity , Linear Models , Mice , Molecular Sequence Data , Sequence Alignment , Sequence Analysis, DNA , Vero Cells , Virulence , Virulence Factors , Virus Replication
10.
PLoS Comput Biol ; 13(6): e1005466, 2017 Jun.
Article in English | MEDLINE | ID: mdl-28570593

ABSTRACT

Various types of biological knowledge describe networks of interactions among elementary entities. For example, transcriptional regulatory networks consist of interactions among proteins and genes. Current knowledge about the exact structure of such networks is highly incomplete, and laboratory experiments that manipulate the entities involved are conducted to test hypotheses about these networks. In recent years, various automated approaches to experiment selection have been proposed. Many of these approaches can be characterized as active machine learning algorithms. Active learning is an iterative process in which a model is learned from data, hypotheses are generated from the model to propose informative experiments, and the experiments yield new data that is used to update the model. This review describes the various models, experiment selection strategies, validation techniques, and successful applications described in the literature; highlights common themes and notable distinctions among methods; and identifies likely directions of future research and open problems in the area.
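
The iterative process described in this abstract (learn a model, propose the most informative experiment, run it, update the model) can be sketched in a few lines. The Python example below is a deliberately simple stand-in, not a method from the review: the "experiment" is a hypothetical function `run_experiment` that reveals whether a dose is above a hidden threshold, the "model" is a one-parameter threshold estimate, and the selection strategy is uncertainty sampling (query the candidate closest to the current estimate). All names and values are assumptions made for illustration.

```python
def run_experiment(x, hidden_threshold=0.62):
    """Hypothetical laboratory experiment: reveals the true label of x."""
    return x >= hidden_threshold


def fit_threshold(labelled):
    """Model: midpoint between the largest negative and smallest positive."""
    negatives = [x for x, y in labelled.items() if not y]
    positives = [x for x, y in labelled.items() if y]
    lo = max(negatives) if negatives else 0.0
    hi = min(positives) if positives else 1.0
    return (lo + hi) / 2


def active_learning(pool, budget):
    labelled = {}
    for _ in range(budget):
        estimate = fit_threshold(labelled)            # learn model from data
        candidates = [x for x in pool if x not in labelled]
        if not candidates:
            break
        # Propose the experiment the model is least certain about.
        query = min(candidates, key=lambda x: abs(x - estimate))
        labelled[query] = run_experiment(query)       # experiment yields new data
    return fit_threshold(labelled), labelled


pool = [i / 100 for i in range(101)]  # 101 candidate experiments
estimate, labelled = active_learning(pool, budget=8)
```

With 8 of the 101 candidate experiments performed, the estimate lands close to the hidden threshold, which illustrates why choosing experiments adaptively can be far cheaper than testing every candidate.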


Subject(s)
Computational Biology/methods , Supervised Machine Learning , Algorithms , Gene Regulatory Networks , Metabolic Networks and Pathways , Research Design
11.
J Virol ; 90(18): 8115-31, 2016 09 15.
Article in English | MEDLINE | ID: mdl-27384650

ABSTRACT

UNLABELLED: Herpes simplex virus 1 (HSV-1) most commonly causes recrudescent labial ulcers; however, it is also the leading cause of infectious blindness in developed countries. Previous research in animal models has demonstrated that the severity of HSV-1 ocular disease is influenced by three main factors: host innate immunity, host immune response, and viral strain. We have previously shown that mixed infection with two avirulent HSV-1 strains (OD4 and CJ994) results in recombinants with a wide range of ocular disease phenotype severity. Recently, we developed a quantitative trait locus (QTL)-based computational approach (vQTLmap) to identify viral single nucleotide polymorphisms (SNPs) predicted to influence the severity of the ocular disease phenotypes. We have now applied vQTLmap to identify HSV-1 SNPs associated with corneal neovascularization and mean peak percentage weight loss (MPWL) using 65 HSV-1 OD4-CJ994 recombinants. The vQTLmap analysis using Random Forest for neovascularization identified phenotypically meaningful nonsynonymous SNPs in the ICP4, UL41 (VHS), UL42, UL46 (VP11/12), UL47 (VP13/14), UL48 (VP22), US3, US4 (gG), US6 (gD), and US7 (gI) coding regions. The ICP4 gene was previously identified as a corneal neovascularization determinant, validating the vQTLmap method. Further analysis detected an epistatic interaction for neovascularization between a segment of the unique long (UL) region and a segment of the inverted repeat short (IRS)/unique short (US) region. Ridge regression was used to identify MPWL-associated nonsynonymous SNPs in the UL1 (gL), UL2, UL4, UL49 (VP22), UL50, and ICP4 coding regions. The data provide additional insights into virulence gene and epistatic interaction discovery in HSV-1. IMPORTANCE: Herpes simplex virus 1 (HSV-1) typically causes recurrent cold sores; however, it is also the leading source of infectious blindness in developed countries. 
Corneal neovascularization is critical for the progression of blinding ocular disease, and weight loss is a measure of infection severity. Previous HSV-1 animal virulence studies have shown that the severity of ocular disease is partially due to the viral strain. In the current study, we used a recently described computational quantitative trait locus (QTL) approach in conjunction with 65 HSV-1 recombinants to identify viral single nucleotide polymorphisms (SNPs) involved in neovascularization and weight loss. Neovascularization SNPs were identified in the ICP4, VHS, UL42, VP11/12, VP13/14, VP22, gG, US3, gD, and gI genes. Further analysis revealed an epistatic interaction between the UL and US regions. MPWL-associated SNPs were detected in the UL1 (gL), UL2, UL4, VP22, UL50, and ICP4 genes. This approach will facilitate future HSV virulence studies.


Subject(s)
Corneal Neovascularization/pathology , Epistasis, Genetic , Genes, Viral , Herpes Simplex/pathology , Herpesvirus 1, Human/pathogenicity , Virulence Factors/genetics , Weight Loss , Animals , Genetic Loci , Herpes Simplex/virology , Mice , Polymorphism, Single Nucleotide
12.
Ann Surg ; 263(6): 1213-8, 2016 06.
Article in English | MEDLINE | ID: mdl-27167563

ABSTRACT

OBJECTIVES: To evaluate the association between multiple complications and postoperative outcomes and to assess which complications occur together in patients with multiple complications. BACKGROUND: Patients who suffer multiple complications have increased risk of prolonged hospital stay and mortality. However, little is known about what places patients at risk for multiple complications or which complications tend to occur in these patients. METHODS: Surgical patients were identified from the American College of Surgeons National Surgical Quality Improvement Program (ACS NSQIP) database from 2005 to 2011. The frequency of postoperative complications was assessed. Patients with fewer than two complications were compared with patients who had multiple complications using χ² and logistic regression analysis. Relationships among postoperative complications were explored by learning a Bayesian network model. RESULTS: The study population consisted of 470,108 general surgery patients. The overall complication rate was 15% with multiple complications in 27,032 (6%) patients. Patients with multiple complications had worse postoperative outcomes (P < 0.001). The strongest predictors for developing multiple complications were admission from chronic care facility or nursing home, dependent functional status, and higher American Society of Anesthesiologists Physical Status classification. In patients with multiple complications, the most common complication was sepsis (42%), followed by failure to wean ventilator (31%), and organ space surgical site infection (27%). We found that severe complications were most strongly associated with development of multiple complications. Using a Bayesian network, we were able to identify how strongly associated specific complications were in patients who developed multiple complications. CONCLUSIONS: Almost half (40%) of patients with complications suffer multiple complications. Patient factors such as frailty and comorbidity strongly predict the development of multiple complications. The results of our Bayesian analysis identify targets for interventions aimed at disrupting the cascade of multiple complications in high-risk patients.


Subject(s)
General Surgery , Postoperative Complications/epidemiology , Adolescent , Adult , Aged , Bayes Theorem , Comorbidity , Databases, Factual , Female , Hospital Mortality , Humans , Length of Stay/statistics & numerical data , Male , Middle Aged , Postoperative Complications/mortality , Prognosis , Risk Factors , United States/epidemiology
13.
J Virol ; 89(14): 7214-23, 2015 Jul.
Article in English | MEDLINE | ID: mdl-25926637

ABSTRACT

UNLABELLED: Herpes simplex virus 1 (HSV-1) causes recurrent mucocutaneous ulcers and is the leading cause of infectious blindness and sporadic encephalitis in the United States. HSV-1 has been shown to be highly recombinogenic; however, to date, there has been no genome-wide analysis of recombination. To address this, we generated 40 HSV-1 recombinants derived from two parental strains, OD4 and CJ994. The 40 OD4-CJ994 HSV-1 recombinants were sequenced using the Illumina sequencing system, and recombination breakpoints were determined for each of the recombinants using the Bootscan program. Breakpoints occurring in the terminal inverted repeats were excluded from analysis to prevent double counting, resulting in a total of 272 breakpoints in the data set. By placing windows around the 272 breakpoints followed by Monte Carlo analysis comparing actual data to simulated data, we identified a recombination bias toward both high GC content and intergenic regions. A Monte Carlo analysis also suggested that recombination did not appear to be responsible for the generation of the spontaneous nucleotide mutations detected following sequencing. Additionally, kernel density estimation analysis across the genome found that the large, inverted repeats comprise a recombination hot spot. IMPORTANCE: Herpes simplex virus 1 (HSV-1) is the leading cause of sporadic encephalitis and blinding keratitis in developed countries. HSV-1 has been shown to be highly recombinogenic, and recombination itself appears to be a significant component of genome replication. To date, there has been no genome-wide analysis of recombination. Here we present the findings of the first genome-wide study of recombination performed by generating and sequencing 40 HSV-1 recombinants derived from the OD4 and CJ994 parental strains, followed by bioinformatics analysis. Recombination breakpoints were determined, yielding 272 breakpoints in the full data set. Kernel density analysis determined that the large inverted repeats constitute a recombination hot spot. Additionally, Monte Carlo analyses found biases toward high GC content and intergenic and repetitive regions.
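
The window-plus-Monte-Carlo idea in this abstract can be illustrated with a small simulation: compute the mean GC fraction in windows around the observed breakpoints, then compare it with the same statistic for randomly placed positions. The Python sketch below uses a synthetic genome and invented breakpoint positions; the abstract does not describe the authors' simulation procedure, so the uniform placement of simulated breakpoints, the window size, and all function names are assumptions.

```python
import random


def gc_fraction(seq):
    """Fraction of G or C bases in a sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def mean_window_gc(genome, positions, half_width):
    """Mean GC fraction over windows centred on the given positions."""
    windows = [genome[p - half_width:p + half_width] for p in positions]
    return sum(gc_fraction(w) for w in windows) / len(windows)


def monte_carlo_p_value(genome, breakpoints, half_width=50, n_sim=1000, seed=0):
    """Empirical p-value for 'observed GC is at least as high as random'."""
    rng = random.Random(seed)
    observed = mean_window_gc(genome, breakpoints, half_width)
    hits = 0
    for _ in range(n_sim):
        # Assumed null model: breakpoints fall uniformly along the genome.
        simulated = [rng.randrange(half_width, len(genome) - half_width)
                     for _ in breakpoints]
        if mean_window_gc(genome, simulated, half_width) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_sim + 1)


# Synthetic 2,000-base genome: AT-rich first half, GC-rich second half.
genome = "AT" * 500 + "GC" * 500
breakpoints = list(range(1100, 2000, 90))  # ten invented breakpoints, all GC-rich
observed_gc, p_value = monte_carlo_p_value(genome, breakpoints)
```

A small empirical p-value indicates that windows around the observed breakpoints are more GC-rich than windows around randomly placed ones, which is the kind of bias the abstract reports.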


Subject(s)
DNA, Viral/genetics , Herpesvirus 1, Human/genetics , Recombination, Genetic , Animals , Base Composition , Chlorocebus aethiops , DNA, Viral/chemistry , Sequence Analysis, DNA , Vero Cells
14.
Mol Syst Biol ; 10: 759, 2014 Nov 19.
Article in English | MEDLINE | ID: mdl-25411400

ABSTRACT

Stressed cells coordinate a multi-faceted response spanning many levels of physiology. Yet knowledge of the complete stress-activated regulatory network as well as design principles for signal integration remains incomplete. We developed an experimental and computational approach to integrate available protein interaction data with gene fitness contributions, mutant transcriptome profiles, and phospho-proteome changes in cells responding to salt stress, to infer the salt-responsive signaling network in yeast. The inferred subnetwork presented many novel predictions by implicating new regulators, uncovering unrecognized crosstalk between known pathways, and pointing to previously unknown 'hubs' of signal integration. We exploited these predictions to show that Cdc14 phosphatase is a central hub in the network and that modification of RNA polymerase II coordinates induction of stress-defense genes with reduction of growth-related transcripts. We find that the orthologous human network is enriched for cancer-causing genes, underscoring the importance of the subnetwork's predictions in understanding stress biology.


Subject(s)
Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Cell Cycle Proteins/metabolism , Computational Biology/methods , Gene Expression Profiling , Gene Expression Regulation, Fungal , Genetic Fitness , Protein Tyrosine Phosphatases/metabolism , RNA Polymerase II/metabolism , Saccharomyces cerevisiae Proteins/genetics , Signal Transduction , Sodium Chloride/metabolism , Stress, Physiological
15.
J Gen Intern Med ; 30(5): 556-64, 2015 May.
Article in English | MEDLINE | ID: mdl-25373831

ABSTRACT

BACKGROUND: Only 38% of young adults with hypertension have controlled blood pressure. Lifestyle education is a critical initial step for hypertension control. Previous studies have not assessed the type and frequency of lifestyle education in young adults with incident hypertension. OBJECTIVE: The purpose of this study was to determine patient, provider, and visit predictors of documented lifestyle education among young adults with incident hypertension. DESIGN: We conducted a retrospective analysis of manually abstracted electronic health record data. PARTICIPANTS: A random selection of adults 18-39 years old (n = 500), managed by a large academic practice from 2008 to 2011 and who met JNC 7 clinical criteria for incident hypertension, participated in the study. MAIN MEASURES: The primary outcome was the presence of any documented lifestyle education during one year after meeting criteria for incident hypertension. Abstracted topics included documented patient education for exercise, tobacco cessation, alcohol use, stress management/stress reduction, Dietary Approaches to Stop Hypertension (DASH) diet, and weight loss. Clinic visits were categorized based upon a modified established taxonomy to characterize patients' patterns of outpatient service. We excluded patients with previous hypertension diagnoses, previous antihypertensive medications, or pregnancy. Logistic regression was used to identify predictors of documented education. KEY RESULTS: Overall, 55% (n = 275) of patients had documented lifestyle education within one year of incident hypertension. Exercise was the most frequent topic (64%). Young adult males had significantly decreased odds of receiving documented education. Patients with a previous diagnosis of hyperlipidemia or a family history of hypertension or coronary artery disease had increased odds of documented education. Among visit types, chronic disease visits predicted documented lifestyle education, but not acute or other/preventive visits. CONCLUSIONS: Among young adults with incident hypertension, only 55% had documented lifestyle education within one year. Knowledge of patient, provider, and visit predictors of education can help better target the development of interventions to improve young adult health education and hypertension control.


Subject(s)
Antihypertensive Agents/therapeutic use , Hypertension/diagnosis , Hypertension/therapy , Patient Compliance/statistics & numerical data , Patient Education as Topic/methods , Adolescent , Adult , Age Factors , Blood Pressure Determination/methods , Cohort Studies , Confidence Intervals , Diet , Electronic Health Records , Female , Follow-Up Studies , Humans , Life Style , Male , Odds Ratio , Retrospective Studies , Risk Assessment , Severity of Illness Index , Sex Factors , Treatment Outcome , Young Adult
16.
PLoS Comput Biol ; 10(5): e1003626, 2014 May.
Article in English | MEDLINE | ID: mdl-24874113

ABSTRACT

Systematic, genome-wide loss-of-function experiments can be used to identify host factors that directly or indirectly facilitate or inhibit the replication of a virus in a host cell. We present an approach that combines an integer linear program and a diffusion kernel method to infer the pathways through which those host factors modulate viral replication. The inputs to the method are a set of viral phenotypes observed in single-host-gene mutants and a background network consisting of a variety of host intracellular interactions. The output is an ensemble of subnetworks that provides a consistent explanation for the measured phenotypes, predicts which unassayed host factors modulate the virus, and predicts which host factors are the most direct interfaces with the virus. We infer host-virus interaction subnetworks using data from experiments screening the yeast genome for genes modulating the replication of two RNA viruses. Because a gold-standard network is unavailable, we assess the predicted subnetworks using both computational and qualitative analyses. We conduct a cross-validation experiment in which we predict whether held-aside test genes have an effect on viral replication. Our approach is able to make high-confidence predictions more accurately than several baselines, and about as well as the best baseline, which does not infer mechanistic pathways. We also examine two kinds of predictions made by our method: which host factors are nearest to a direct interaction with a viral component, and which unassayed host genes are likely to be involved in viral replication. Multiple predictions are supported by recent independent experimental data, or are components or functional partners of confirmed relevant complexes or pathways. Integer program code, background network data, and inferred host-virus subnetworks are available at http://www.biostat.wisc.edu/~craven/chasman_host_virus/.


Subject(s)
Cell Transformation, Viral/physiology , Fungal Proteins/metabolism , RNA Viruses/physiology , Signal Transduction/physiology , Virus Replication/physiology , Yeasts/metabolism , Yeasts/virology , Gene Expression Regulation, Fungal/physiology , Genes, Viral
17.
PLoS Comput Biol ; 9(9): e1003235, 2013.
Article in English | MEDLINE | ID: mdl-24068911

ABSTRACT

Systematic, genome-wide RNA interference (RNAi) analysis is a powerful approach to identify gene functions that support or modulate selected biological processes. An emerging challenge shared with some other genome-wide approaches is that independent RNAi studies often show limited agreement in their lists of implicated genes. To better understand this, we analyzed four genome-wide RNAi studies that identified host genes involved in influenza virus replication. These studies collectively identified and validated the roles of 614 cell genes, but pair-wise overlap among the four gene lists was only 3% to 15% (average 6.7%). However, a number of functional categories were overrepresented in multiple studies. The pair-wise overlap of these enriched-category lists was high, ∼19%, implying more agreement among studies than apparent at the gene level. Probing this further, we found that the gene lists implicated by independent studies were highly connected in interacting networks by independent functional measures such as protein-protein interactions, at rates significantly higher than predicted by chance. We also developed a general, model-based approach to gauge the effects of false-positive and false-negative factors and to estimate, from a limited number of studies, the total number of genes involved in a process. For influenza virus replication, this novel statistical approach estimates the total number of cell genes involved to be ∼2,800. This and multiple other aspects of our experimental and computational results imply that, when following good quality control practices, the low overlap between studies is primarily due to false negatives rather than false-positive gene identifications. These results and methods have implications for and applications to multiple forms of genome-wide analysis.
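
As a rough illustration of the pair-wise overlap comparison described in this abstract, the sketch below computes an overlap fraction for every pair of gene lists and then averages them. It assumes overlap is measured as the Jaccard index (shared genes divided by the union of both lists); the abstract does not state the exact definition used, and the function name `pairwise_overlap`, the study names, and the gene identifiers are hypothetical.

```python
from itertools import combinations


def pairwise_overlap(gene_lists):
    """Return {(name_a, name_b): Jaccard overlap} for every pair of gene lists.

    Assumption: overlap = |A & B| / |A | B|. The study may define it differently.
    """
    overlaps = {}
    # Sort by study name so the pair keys are deterministic.
    for (name_a, genes_a), (name_b, genes_b) in combinations(sorted(gene_lists.items()), 2):
        a, b = set(genes_a), set(genes_b)
        union = a | b
        overlaps[(name_a, name_b)] = len(a & b) / len(union) if union else 0.0
    return overlaps


# Hypothetical toy screens, not data from the article.
screens = {
    "study_1": ["G1", "G2", "G3", "G4"],
    "study_2": ["G3", "G4", "G5", "G6"],
    "study_3": ["G1", "G7", "G8", "G9"],
}

overlaps = pairwise_overlap(screens)
mean_overlap = sum(overlaps.values()) / len(overlaps)

for pair, value in overlaps.items():
    print(pair, f"{value:.1%}")
print("mean pair-wise overlap:", f"{mean_overlap:.1%}")
```

The same function could be applied to lists of enriched functional categories instead of genes to mirror the category-level comparison mentioned above.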


Subject(s)
Genes, Viral , Orthomyxoviridae/genetics , RNA Interference , Virus Replication/genetics , False Negative Reactions , False Positive Reactions , Gene Knockdown Techniques , Likelihood Functions , Orthomyxoviridae/physiology
18.
Cell Genom ; 4(1): 100466, 2024 Jan 10.
Article in English | MEDLINE | ID: mdl-38190108

ABSTRACT

The data-intensive fields of genomics and machine learning (ML) are in an early stage of convergence. Genomics researchers increasingly seek to harness the power of ML methods to extract knowledge from their data; conversely, ML scientists recognize that genomics offers a wealth of large, complex, and well-annotated datasets that can be used as a substrate for developing biologically relevant algorithms and applications. The National Human Genome Research Institute (NHGRI) inquired with researchers working in these two fields to identify common challenges and receive recommendations to better support genomic research efforts using ML approaches. Those included increasing the amount and variety of training datasets by integrating genomic with multiomics, context-specific (e.g., by cell type), and social determinants of health datasets; reducing the inherent biases of training datasets; prioritizing transparency and interpretability of ML methods; and developing privacy-preserving technologies for research participants' data.


Subject(s)
Bioethics , Genomics , Humans , Algorithms , Privacy , Machine Learning
19.
mSystems ; 9(6): e0141523, 2024 Jun 18.
Article in English | MEDLINE | ID: mdl-38819130

ABSTRACT

Wastewater surveillance has emerged as a crucial public health tool for population-level pathogen surveillance. Supported by funding from the American Rescue Plan Act of 2021, the FDA's genomic epidemiology program, GenomeTrakr, was leveraged to sequence SARS-CoV-2 from wastewater sites across the United States. This initiative required the evaluation, optimization, development, and publication of new methods and analytical tools spanning sample collection through variant analyses. Version-controlled protocols for each step of the process were developed and published on protocols.io. A custom data analysis tool and a publicly accessible dashboard were built to facilitate real-time visualization of the collected data, focusing on the relative abundance of SARS-CoV-2 variants and sub-lineages across different samples and sites throughout the project. From September 2021 through June 2023, a total of 3,389 wastewater samples were collected, with 2,517 undergoing sequencing and submission to NCBI under the umbrella BioProject, PRJNA757291. Sequence data were released with explicit quality control (QC) tags on all sequence records, communicating our confidence in the quality of data. Variant analysis revealed wide circulation of Delta in the fall of 2021 and captured the sweep of Omicron and subsequent diversification of this lineage through the end of the sampling period. This project successfully achieved two important goals for the FDA's GenomeTrakr program: first, contributing timely genomic data for the SARS-CoV-2 pandemic response, and second, establishing both capacity and best practices for culture-independent, population-level environmental surveillance for other pathogens of interest to the FDA. IMPORTANCE: This paper serves two primary objectives. First, it summarizes the genomic and contextual data collected during a COVID-19 pandemic response project, which utilized the FDA's laboratory network, traditionally employed for sequencing foodborne pathogens, for sequencing SARS-CoV-2 from wastewater samples. Second, it outlines best practices for gathering and organizing population-level next-generation sequencing (NGS) data collected for culture-free surveillance of pathogens sourced from environmental samples.


Subject(s)
COVID-19 , SARS-CoV-2 , United States Food and Drug Administration , Wastewater , SARS-CoV-2/genetics , United States/epidemiology , Wastewater/virology , COVID-19/epidemiology , COVID-19/transmission , COVID-19/prevention & control , COVID-19/virology , Humans , Pandemics/prevention & control , Genome, Viral/genetics , Wastewater-Based Epidemiological Monitoring
20.
BMC Bioinformatics ; 13 Suppl 11: S5, 2012 Jun 26.
Article in English | MEDLINE | ID: mdl-22759459

ABSTRACT

BACKGROUND: Biomedical event extraction has attracted substantial attention as it can assist researchers in understanding the plethora of interactions among genes that are described in publications in molecular biology. While most recent work has focused on abstracts, the BioNLP 2011 shared task evaluated the submitted systems on both abstracts and full papers. In this article, we describe our submission to the shared task which decomposes event extraction into a set of classification tasks that can be learned either independently or jointly using the search-based structured prediction framework. Our intention is to explore how these two learning paradigms compare in the context of the shared task. RESULTS: We report that models learned using search-based structured prediction exceed the accuracy of independently learned classifiers by 8.3 points in F-score, with the gains being more pronounced on the more complex Regulation events (13.23 points). Furthermore, we show how the trade-off between recall and precision can be adjusted in both learning paradigms and that search-based structured prediction achieves better recall at all precision points. Finally, we report on experiments with a simple domain-adaptation method, resulting in the second-best performance achieved by a single system. CONCLUSIONS: We demonstrate that joint inference using the search-based structured prediction framework can achieve better performance than independently learned classifiers, thus demonstrating the potential of this learning paradigm for event extraction and other similarly complex information-extraction tasks.
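
For readers less familiar with the metric reported above, the following sketch shows how precision, recall, and F-score are conventionally computed from counts of correct, spurious, and missed extractions. The function name `f_score` and the example counts are illustrative assumptions; they are not taken from the article or from the BioNLP 2011 evaluation tooling.

```python
def f_score(true_pos, false_pos, false_neg, beta=1.0):
    """Return (precision, recall, F_beta) from raw counts.

    beta = 1.0 gives the balanced F1 score, the harmonic mean of
    precision and recall.
    """
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


# Hypothetical counts: 8 correct events, 2 spurious, 8 missed.
precision, recall, f1 = f_score(true_pos=8, false_pos=2, false_neg=8)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

A gain of "8.3 points in F-score" then corresponds to a difference of 0.083 on this 0 to 1 scale when scores are reported as percentages.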


Subject(s)
Information Storage and Retrieval , Natural Language Processing , Algorithms , Information Storage and Retrieval/economics , Information Storage and Retrieval/methods , Periodicals as Topic