1.
Nature ; 606(7913): 382-388, 2022 06.
Article in English | MEDLINE | ID: mdl-35614220

ABSTRACT

Mitochondria are epicentres of eukaryotic metabolism and bioenergetics. Pioneering efforts in recent decades have established the core protein componentry of these organelles [1] and have linked their dysfunction to more than 150 distinct disorders [2,3]. Still, hundreds of mitochondrial proteins lack clear functions [4], and the underlying genetic basis for approximately 40% of mitochondrial disorders remains unresolved [5]. Here, to establish a more complete functional compendium of human mitochondrial proteins, we profiled more than 200 CRISPR-mediated HAP1 cell knockout lines using mass spectrometry-based multiomics analyses. This effort generated approximately 8.3 million distinct biomolecule measurements, providing a deep survey of the cellular responses to mitochondrial perturbations and laying a foundation for mechanistic investigations into protein function. Guided by these data, we discovered that PIGY upstream open reading frame (PYURF) is an S-adenosylmethionine-dependent methyltransferase chaperone that supports both complex I assembly and coenzyme Q biosynthesis and is disrupted in a previously unresolved multisystemic mitochondrial disorder. We further linked the putative zinc transporter SLC30A9 to mitochondrial ribosomes and OxPhos integrity and established RAB5IF as the second gene harbouring pathogenic variants that cause cerebrofaciothoracic dysplasia. Our data, which can be explored through the interactive online MITOMICS.app resource, suggest biological roles for many other orphan mitochondrial proteins that still lack robust functional characterization and define a rich cell signature of mitochondrial dysfunction that can support the genetic diagnosis of mitochondrial diseases.


Subjects
Mitochondria, Mitochondrial Proteins, Cation Transport Proteins, Cell Cycle Proteins, Energy Metabolism, Humans, Mass Spectrometry, Mitochondria/genetics, Mitochondria/metabolism, Mitochondrial Diseases/genetics, Mitochondrial Diseases/metabolism, Mitochondrial Proteins/genetics, Mitochondrial Proteins/metabolism, Transcription Factors, rab5 GTP-Binding Proteins
2.
Am J Respir Cell Mol Biol ; 67(4): 430-437, 2022 Oct.
Article in English | MEDLINE | ID: mdl-35580164

ABSTRACT

Chromosome 17q12-q21 is the most replicated genetic locus for childhood-onset asthma. Polymorphisms in this locus containing ∼10 genes interact with a variety of environmental exposures in the home and outdoors to modify asthma risk. However, the functional basis for these associations and their linkages to the environment have remained enigmatic. Within this extended region, regulation of GSDMB (gasdermin B) expression in airway epithelial cells has emerged as the primary mechanism underlying the 17q12-q21 genome-wide association study signal. Asthma-associated SNPs influence the abundance of GSDMB transcripts as well as the functional properties of GSDMB protein in airway epithelial cells. GSDMB is a member of the gasdermin family of proteins, which regulate pyroptosis and inflammatory responses to microbial infections. The aims of this review are to synthesize recent studies on the relationship of 17q12-q21 SNPs to childhood asthma and the evidence pointing to GSDMB gene expression or protein function as the underlying mechanism and to explore the potential functions of GSDMB that may influence the risk of developing asthma during childhood.


Subjects
Asthma, Genome-Wide Association Study, Pore Forming Cytotoxic Proteins/genetics, Asthma/genetics, Asthma/metabolism, Genetic Loci, Genetic Predisposition to Disease, Humans, Neoplasm Proteins/genetics, Neoplasm Proteins/metabolism, Single Nucleotide Polymorphism
3.
BMC Genomics ; 21(1): 771, 2020 Nov 09.
Article in English | MEDLINE | ID: mdl-33167865

ABSTRACT

BACKGROUND: Deep neural networks (DNN) are a particular case of artificial neural networks (ANN) composed of multiple hidden layers, and have recently gained attention in genome-enabled prediction of complex traits. Yet, few studies in genome-enabled prediction have assessed the performance of DNN compared to traditional regression models. Strikingly, no clear superiority of DNN has been reported so far, and results seem highly dependent on the species and traits of application. Nevertheless, the relatively small datasets used in previous studies, most with fewer than 5000 observations, may have precluded DNN from reaching their full potential. Therefore, the objective of this study was to investigate the impact of the dataset sample size on the performance of DNN compared to Bayesian regression models for genome-enabled prediction of body weight in broilers by sub-sampling 63,526 observations of the training set. RESULTS: Predictive performance of DNN improved as sample size increased, reaching a plateau at about 0.32 of prediction correlation when 60% of the entire training set size was used (i.e., 39,510 observations). Interestingly, DNN showed superior prediction correlation using up to 3% of training set, but poorer prediction correlation after that compared to Bayesian Ridge Regression (BRR) and Bayes Cπ. Regardless of the amount of data used to train the predictive machines, DNN displayed the lowest mean square error of prediction compared to all other approaches. The predictive bias was lower for DNN compared to Bayesian models, across all dataset sizes, with estimates close to one with larger sample sizes. CONCLUSIONS: DNN had worse prediction correlation compared to BRR and Bayes Cπ, but improved mean square error of prediction and bias relative to both Bayesian models for genome-enabled prediction of body weight in broilers. Such findings highlight advantages and disadvantages between predictive approaches depending on the criterion used for comparison. Furthermore, the inclusion of more data per se is not a guarantee for the DNN to outperform the Bayesian regression methods commonly used for genome-enabled prediction. Nonetheless, further analysis is necessary to detect scenarios where DNN can clearly outperform Bayesian benchmark models.
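
The three comparison criteria named in this abstract (prediction correlation, mean square error of prediction, and predictive bias) can rank the same models differently. The following is a minimal sketch in Python on made-up numbers, not the study's data. It assumes that predictive bias is measured as the slope of the regression of observed on predicted values, which would be consistent with "estimates close to one" but is not stated in the abstract. All function names are hypothetical.

```python
from math import sqrt

def mean(xs):
    return sum(xs) / len(xs)

def prediction_correlation(observed, predicted):
    """Pearson correlation between observed and predicted phenotypes."""
    mo, mp = mean(observed), mean(predicted)
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov / sqrt(var_o * var_p)

def mean_squared_error(observed, predicted):
    """Average squared difference between observed and predicted values."""
    return mean([(o - p) ** 2 for o, p in zip(observed, predicted)])

def predictive_bias(observed, predicted):
    """Slope of observed regressed on predicted (assumed definition).

    A slope of 1.0 means predictions are on the right scale."""
    mo, mp = mean(observed), mean(predicted)
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov / var_p

# Made-up values for illustration only.
observed = [2.0, 4.0, 6.0, 8.0]
pred_a = [1.0, 2.0, 3.0, 4.0]  # ranks animals perfectly, but on the wrong scale
pred_b = [2.5, 3.5, 6.5, 7.5]  # slightly noisier ranking, correct scale

for name, pred in (("A", pred_a), ("B", pred_b)):
    print(name,
          round(prediction_correlation(observed, pred), 3),
          round(mean_squared_error(observed, pred), 3),
          round(predictive_bias(observed, pred), 3))
```

In this toy case predictor A wins on correlation while predictor B wins on mean square error and bias, which mirrors the kind of criterion-dependent ranking the conclusions describe.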


Subjects
Chickens, Multifactorial Inheritance, Animals, Bayes Theorem, Body Weight, Chickens/genetics, Computer Neural Networks, Sample Size
4.
Bioinformatics ; 35(15): 2657-2659, 2019 08 01.
Article in English | MEDLINE | ID: mdl-30534948

ABSTRACT

SUMMARY: Understanding the regulatory roles of non-coding genetic variants has become a central goal for interpreting results of genome-wide association studies. The regulatory significance of the variants may be interrogated by assessing their influence on transcription factor binding. We have developed atSNP Search, a comprehensive web database for evaluating motif matches to the human genome with both reference and variant alleles and assessing the overall significance of the variant alterations on the motif matches. Convenient search features, comprehensive search outputs and a useful help menu are key components of atSNP Search. atSNP Search enables convenient interpretation of regulatory variants by statistical significance testing and composite logo plots, which are graphical representations of motif matches with the reference and variant alleles. Existing motif-based regulatory variant discovery tools only consider a limited pool of variants due to storage or other limitations. In contrast, atSNP Search users can test more than 37 billion variant-motif pairs with marginal significance in motif matches or match alteration. Computational evidence from atSNP Search, when combined with experimental validation, may help with the discovery of underlying disease mechanisms. AVAILABILITY AND IMPLEMENTATION: atSNP Search is freely available at http://atsnp.biostat.wisc.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
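
To make the idea of comparing motif matches for reference and variant alleles concrete, here is a minimal Python sketch. It is not atSNP's algorithm or API: the 4-position weight matrix, the uniform background, the single-strand scan, and every function name are assumptions chosen for illustration, and no significance testing is performed.

```python
from math import log2

# Hypothetical position weight matrix: per-position base probabilities.
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform background base frequency

def window_score(window):
    """Log-odds score of one motif-length window against the background."""
    return sum(log2(col[base] / BACKGROUND) for col, base in zip(PWM, window))

def best_match(seq):
    """Best log-odds score over all windows of the sequence (forward strand only)."""
    width = len(PWM)
    return max(window_score(seq[i:i + width]) for i in range(len(seq) - width + 1))

def match_change(ref_seq, snp_index, alt_base):
    """Return (reference score, variant score, change) for a single-base substitution."""
    alt_seq = ref_seq[:snp_index] + alt_base + ref_seq[snp_index + 1:]
    ref_score = best_match(ref_seq)
    alt_score = best_match(alt_seq)
    return ref_score, alt_score, alt_score - ref_score

# Toy flanking sequence whose reference allele completes the motif "AGCT".
ref_score, alt_score, delta = match_change("TTAGCTTT", snp_index=3, alt_base="T")
print(round(ref_score, 3), round(alt_score, 3), round(delta, 3))
```

A negative change indicates that the variant allele weakens the best motif match. A tool such as the one described would additionally attach statistical significance to both the matches and the change.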


Subjects
Genome-Wide Association Study, Software, Genetic Variation, Humans, Protein Binding, Transcription Factors
5.
PLoS Comput Biol ; 15(6): e1006758, 2019 06.
Article in English | MEDLINE | ID: mdl-31246951

ABSTRACT

Many biological studies involve either (i) manipulating some aspect of a cell or its environment and then simultaneously measuring the effect on thousands of genes, or (ii) systematically manipulating each gene and then measuring the effect on some response of interest. A common challenge that arises in these studies is to explain how genes identified as relevant in the given experiment are organized into a subnetwork that accounts for the response of interest. The task of inferring a subnetwork is typically dependent on the information available in publicly available, structured databases, which suffer from incompleteness. However, a wealth of potentially relevant information resides in the scientific literature, such as information about genes associated with certain concepts of interest, as well as interactions that occur among various biological entities. We contend that by exploiting this information, we can improve the explanatory power and accuracy of subnetwork inference in multiple applications. Here we propose and investigate several ways in which information extracted from the scientific literature can be used to augment subnetwork inference. We show that we can use literature-extracted information to (i) augment the set of entities identified as being relevant in a subnetwork inference task, (ii) augment the set of interactions used in the process, and (iii) support targeted browsing of a large inferred subnetwork by identifying entities and interactions that are closely related to concepts of interest. We use this approach to uncover the pathways involved in interactions between a virus and a host cell, and the pathways that are regulated by a transcription factor associated with breast cancer. Our experimental results demonstrate that these approaches can provide more accurate and more interpretable subnetworks. 
Integer program code, background network data, and pathfinding code are available at https://github.com/Craven-Biostat-Lab/subnetwork_inference.


Subjects
Computational Biology/methods, Data Mining/methods, Gene Regulatory Networks/genetics, Protein Interaction Mapping/methods, Protein Interaction Maps/genetics, Genetic Databases, HIV, HIV Infections/genetics, HIV Infections/virology, Humans
6.
J Surg Res ; 246: 160-169, 2020 02.
Article in English | MEDLINE | ID: mdl-31586890

ABSTRACT

BACKGROUND: A major roadblock to reducing the mortality of colorectal cancer (CRC) is the lack of prompt detection and treatment, and a simple blood test is likely to have higher compliance than all of the current methods. The purpose of this report is to examine the utility of a mass spectrometry-based blood serum protein biomarker test for detection of CRC. MATERIALS AND METHODS: Blood was drawn from individuals (n = 213) before colonoscopy or from patients with nonmetastatic CRC (n = 50) before surgery. Proteins were isolated from the serum of patients using targeted liquid chromatography-tandem mass spectrometry. We designed a machine-learning statistical model to assess these proteins. RESULTS: When considered individually, over 70% of the selected biomarkers showed significance by Mann-Whitney testing for distinguishing cancer-bearing cases from cancer-free cases. Using machine-learning methods, peptides derived from epidermal growth factor receptor and leucine-rich alpha-2-glycoprotein 1 were consistently identified as highly predictive for detecting CRC from cancer-free cases. A five-marker panel consisting of leucine-rich alpha-2-glycoprotein 1, epidermal growth factor receptor, inter-alpha-trypsin inhibitor heavy-chain family member 4, hemopexin, and superoxide dismutase 3 performed the best with 70% specificity at over 89% sensitivity (area under the curve = 0.86) in the validation set. For distinguishing regional from localized cancers, cross-validation within the training set showed that a panel of four proteins consisting of CD44 molecule, GC-vitamin D-binding protein, C-reactive protein, and inter-alpha-trypsin inhibitor heavy-chain family member 3 yielded the highest performance (area under the curve = 0.75). CONCLUSIONS: The minimally invasive blood biomarker panels identified here could serve as screening/detection alternatives for CRC in a human population and are potentially useful for staging of existing cancer.
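
The performance measures quoted in this abstract are closely related: the area under the ROC curve equals the Mann-Whitney U statistic divided by the number of case-control pairs, and sensitivity and specificity describe one operating point on that curve. A minimal Python sketch on made-up marker scores follows. The scores, the threshold, and the function names are assumptions for illustration and are unrelated to the study's panel.

```python
def mann_whitney_u(cases, controls):
    """Count (case, control) pairs in which the case scores higher; ties count 0.5."""
    u = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                u += 1.0
            elif c == k:
                u += 0.5
    return u

def auc(cases, controls):
    """Area under the ROC curve as the normalised Mann-Whitney U statistic."""
    return mann_whitney_u(cases, controls) / (len(cases) * len(controls))

def sensitivity_specificity(cases, controls, threshold):
    """Call a sample positive when its score is at or above the threshold."""
    true_pos = sum(score >= threshold for score in cases)
    true_neg = sum(score < threshold for score in controls)
    return true_pos / len(cases), true_neg / len(controls)

# Made-up scores for illustration only.
cases = [0.9, 0.8, 0.6, 0.4]          # cancer-bearing
controls = [0.7, 0.5, 0.3, 0.2, 0.1]  # cancer-free

print(auc(cases, controls))
print(sensitivity_specificity(cases, controls, threshold=0.45))
```

Moving the threshold trades sensitivity against specificity, while the area under the curve summarises performance across all thresholds.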


Subjects
Tumor Biomarkers/blood, Colorectal Neoplasms/diagnosis, Early Detection of Cancer/methods, Lymphatic Metastasis/diagnosis, Mass Screening/methods, Adult, Aged, Aged 80 and over, Colectomy, Colonoscopy, Colorectal Neoplasms/blood, Colorectal Neoplasms/pathology, Colorectal Neoplasms/surgery, Cross-Sectional Studies, Feasibility Studies, Female, Humans, Lymphatic Metastasis/pathology, Male, Mass Spectrometry/methods, Middle Aged, Neoplasm Staging, Pilot Projects, Prospective Studies, ROC Curve
7.
Prev Med ; 136: 106061, 2020 07.
Article in English | MEDLINE | ID: mdl-32179026

ABSTRACT

Just under half of the 85.7 million US adults with hypertension have uncontrolled blood pressure using a hypertension threshold of systolic pressure ≥ 140 or diastolic pressure ≥ 90. Uncontrolled hypertension increases risks of death, stroke, heart failure, and myocardial infarction. Guidelines on hypertension management include lifestyle modification such as diet and exercise. In order to improve hypertension control, it is important to identify predictors of lifestyle modification assessment or advice to tailor future interventions using these effective, low-risk interventions. Electronic health record data from 14,360 adult hypertension patients at an academic medical center were analyzed using statistical and machine learning methods to identify predictors and timing of lifestyle modification. Multiple variables were statistically significant in analysis of lifestyle modification documentation at multiple time points. Random Forest was the best machine learning method to classify lifestyle modification documentation at any time with Area Under the Receiver Operator Curve (AUROC) 0.831. Logistic regression was the best machine learning method for classifying lifestyle modification documentation at ≤3 months with an AUROC of 0.685. Analyzing narrative and coded data from electronic health records can improve understanding of timing of lifestyle modification and patient, clinic and provider characteristics that are correlated with or predictive of documentation of lifestyle modification for hypertension. This information can inform improvement efforts in hypertension care processes, treatment implementation, and ultimately hypertension control.


Subjects
Electronic Health Records, Hypertension, Adult, Documentation, Humans, Hypertension/prevention & control, Life Style, Machine Learning
8.
Respir Res ; 20(1): 115, 2019 Jun 10.
Article in English | MEDLINE | ID: mdl-31182091

ABSTRACT

BACKGROUND: Single birth cohort studies have been the basis for many discoveries about early life risk factors for childhood asthma but are limited in scope by sample size and characteristics of the local environment and population. The Children's Respiratory and Environmental Workgroup (CREW) was established to integrate multiple established asthma birth cohorts and to investigate asthma phenotypes and associated causal pathways (endotypes), focusing on how they are influenced by interactions between genetics, lifestyle, and environmental exposures during the prenatal period and early childhood. METHODS AND RESULTS: CREW is funded by the NIH Environmental influences on Child Health Outcomes (ECHO) program, and consists of 12 individual cohorts and three additional scientific centers. The CREW study population is diverse in terms of race, ethnicity, geographical distribution, and year of recruitment. We hypothesize that there are phenotypes in childhood asthma that differ based on clinical characteristics and underlying molecular mechanisms. Furthermore, we propose that asthma endotypes and their defining biomarkers can be identified based on personal and early life environmental risk factors. CREW has three phases: 1) to pool and harmonize existing data from each cohort, 2) to collect new data using standardized procedures, and 3) to enroll new families during the prenatal period to supplement and enrich extant data and enable unified systems approaches for identifying asthma phenotypes and endotypes. CONCLUSIONS: The overall goal of the CREW program is to develop a better understanding of how early life environmental exposures and host factors interact to promote the development of specific asthma endotypes.


Subjects
Asthma/diagnosis, Asthma/epidemiology, Environmental Exposure/analysis, Life Style, Population Surveillance/methods, Adolescent, Asthma/genetics, Child, Preschool Child, Cohort Studies, Environmental Exposure/prevention & control, Female, Humans, Infant, Male, Young Adult
9.
PLoS Pathog ; 12(3): e1005499, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26962864

ABSTRACT

Herpes simplex virus type 1 causes mucocutaneous lesions, and is the leading cause of infectious blindness in the United States. Animal studies have shown that the severity of HSV-1 ocular disease is influenced by three main factors: innate immunity, host immune response and viral strain. We previously showed that mixed infection with two avirulent HSV-1 strains (OD4 and CJ994) resulted in recombinants that exhibit a range of disease phenotypes from severe to avirulent, suggesting epistatic interactions were involved. The goal of this study was to develop a quantitative trait locus (QTL) analysis of HSV-1 ocular virulence determinants and to identify virulence-associated SNPs. Blepharitis and stromal keratitis quantitative scores were characterized for 40 OD4:CJ994 recombinants. Viral titers in the eye were also measured. Virulence quantitative trait locus mapping (vQTLmap) was performed using the Lasso, Random Forest, and Ridge regression methods to identify significant phenotypically meaningful regions for each ocular disease parameter. The most predictive Ridge regression model identified several phenotypically meaningful SNPs for blepharitis and stromal keratitis. Notably, phenotypically meaningful nonsynonymous variations were detected in the UL24, UL29 (ICP8), UL41 (VHS), UL53 (gK), UL54 (ICP27), UL56, ICP4, US1 (ICP22), US3 and gG genes. Network analysis revealed that many of these variations were in HSV-1 regulatory networks and viral genes that affect innate immunity. Several genes previously implicated in virulence were identified, validating this approach, while other genes were novel. Several novel polymorphisms were also identified in these genes. This approach provides a framework that will be useful for identifying virulence genes in other pathogenic viruses, as well as epistatic effects that affect HSV-1 ocular virulence.


Subjects
Eye Infections/immunology, Human Herpesvirus 1/genetics, Quantitative Trait Loci/genetics, Animals, Base Sequence, Chlorocebus aethiops, Viral DNA/genetics, Eye Infections/virology, Genetic Association Studies, Human Herpesvirus 1/immunology, Human Herpesvirus 1/pathogenicity, Linear Models, Mice, Molecular Sequence Data, Sequence Alignment, DNA Sequence Analysis, Vero Cells, Virulence, Virulence Factors, Virus Replication
10.
PLoS Comput Biol ; 13(6): e1005466, 2017 Jun.
Article in English | MEDLINE | ID: mdl-28570593

ABSTRACT

Various types of biological knowledge describe networks of interactions among elementary entities. For example, transcriptional regulatory networks consist of interactions among proteins and genes. Current knowledge about the exact structure of such networks is highly incomplete, and laboratory experiments that manipulate the entities involved are conducted to test hypotheses about these networks. In recent years, various automated approaches to experiment selection have been proposed. Many of these approaches can be characterized as active machine learning algorithms. Active learning is an iterative process in which a model is learned from data, hypotheses are generated from the model to propose informative experiments, and the experiments yield new data that is used to update the model. This review describes the various models, experiment selection strategies, validation techniques, and successful applications described in the literature; highlights common themes and notable distinctions among methods; and identifies likely directions of future research and open problems in the area.
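
The iterative loop described here (learn a model, propose an informative experiment, collect the result, update the model) can be sketched on a deliberately simple problem. In the Python sketch below, the "experiment" is a stand-in function with a hidden threshold, the "model" is the interval of thresholds still consistent with the data, and the selection rule queries the most uncertain candidate. All names and the hidden value 0.62 are hypothetical, and real systems in this literature use far richer models.

```python
def run_experiment(x, hidden_threshold=0.62):
    """Stand-in for a laboratory experiment: True if a response is observed at dose x."""
    return x >= hidden_threshold

def fit_model(data):
    """Model: the (low, high) interval of thresholds consistent with all results so far."""
    low = max([x for x, responded in data if not responded], default=0.0)
    high = min([x for x, responded in data if responded], default=1.0)
    return low, high

def select_experiment(model, pool):
    """Choose the untested candidate nearest the midpoint of the uncertain interval."""
    low, high = model
    midpoint = (low + high) / 2
    return min(pool, key=lambda x: abs(x - midpoint))

def active_learning(candidates, budget):
    """Alternate between model fitting and experiment selection for `budget` rounds."""
    data = []
    pool = list(candidates)
    for _ in range(budget):
        model = fit_model(data)                 # learn from the data gathered so far
        x = select_experiment(model, pool)      # propose an informative experiment
        pool.remove(x)
        data.append((x, run_experiment(x)))     # the experiment yields new data
    return fit_model(data), data

candidates = [i / 100 for i in range(101)]
model, data = active_learning(candidates, budget=7)
print(model, [x for x, _ in data])
```

With only seven of 101 possible experiments, the interval containing the hidden threshold shrinks to about one grid step, whereas testing candidates in a fixed order would typically need many more.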


Subjects
Computational Biology/methods, Supervised Machine Learning, Algorithms, Gene Regulatory Networks, Metabolic Networks and Pathways, Research Design
11.
J Virol ; 90(18): 8115-31, 2016 09 15.
Article in English | MEDLINE | ID: mdl-27384650

ABSTRACT

Herpes simplex virus 1 (HSV-1) most commonly causes recrudescent labial ulcers; however, it is also the leading cause of infectious blindness in developed countries. Previous research in animal models has demonstrated that the severity of HSV-1 ocular disease is influenced by three main factors: host innate immunity, host immune response, and viral strain. We have previously shown that mixed infection with two avirulent HSV-1 strains (OD4 and CJ994) results in recombinants with a wide range of ocular disease phenotype severity. Recently, we developed a quantitative trait locus (QTL)-based computational approach (vQTLmap) to identify viral single nucleotide polymorphisms (SNPs) predicted to influence the severity of the ocular disease phenotypes. We have now applied vQTLmap to identify HSV-1 SNPs associated with corneal neovascularization and mean peak percentage weight loss (MPWL) using 65 HSV-1 OD4-CJ994 recombinants. The vQTLmap analysis using Random Forest for neovascularization identified phenotypically meaningful nonsynonymous SNPs in the ICP4, UL41 (VHS), UL42, UL46 (VP11/12), UL47 (VP13/14), UL48 (VP22), US3, US4 (gG), US6 (gD), and US7 (gI) coding regions. The ICP4 gene was previously identified as a corneal neovascularization determinant, validating the vQTLmap method. Further analysis detected an epistatic interaction for neovascularization between a segment of the unique long (UL) region and a segment of the inverted repeat short (IRS)/unique short (US) region. Ridge regression was used to identify MPWL-associated nonsynonymous SNPs in the UL1 (gL), UL2, UL4, UL49 (VP22), UL50, and ICP4 coding regions. The data provide additional insights into virulence gene and epistatic interaction discovery in HSV-1. IMPORTANCE: Herpes simplex virus 1 (HSV-1) typically causes recurrent cold sores; however, it is also the leading source of infectious blindness in developed countries.
Corneal neovascularization is critical for the progression of blinding ocular disease, and weight loss is a measure of infection severity. Previous HSV-1 animal virulence studies have shown that the severity of ocular disease is partially due to the viral strain. In the current study, we used a recently described computational quantitative trait locus (QTL) approach in conjunction with 65 HSV-1 recombinants to identify viral single nucleotide polymorphisms (SNPs) involved in neovascularization and weight loss. Neovascularization SNPs were identified in the ICP4, VHS, UL42, VP11/12, VP13/14, VP22, gG, US3, gD, and gI genes. Further analysis revealed an epistatic interaction between the UL and US regions. MPWL-associated SNPs were detected in the UL1 (gL), UL2, UL4, VP22, UL50, and ICP4 genes. This approach will facilitate future HSV virulence studies.


Subjects
Corneal Neovascularization/pathology, Genetic Epistasis, Viral Genes, Herpes Simplex/pathology, Human Herpesvirus 1/pathogenicity, Virulence Factors/genetics, Weight Loss, Animals, Genetic Loci, Herpes Simplex/virology, Mice, Single Nucleotide Polymorphism
12.
Ann Surg ; 263(6): 1213-8, 2016 06.
Article in English | MEDLINE | ID: mdl-27167563

ABSTRACT

OBJECTIVES: To evaluate the association between multiple complications and postoperative outcomes and to assess which complications occur together in patients with multiple complications. BACKGROUND: Patients who suffer multiple complications have increased risk of prolonged hospital stay and mortality. However, little is known about what places patients at risk for multiple complications or which complications tend to occur in these patients. METHODS: Surgical patients were identified from the American College of Surgeons National Surgical Quality Improvement Program (ACS NSQIP) database from 2005 to 2011. The frequency of postoperative complications was assessed. Patients with fewer than two complications were compared with patients who had multiple complications using χ² and logistic regression analysis. Relationships among postoperative complications were explored by learning a Bayesian network model. RESULTS: The study population consisted of 470,108 general surgery patients. The overall complication rate was 15% with multiple complications in 27,032 (6%) patients. Patients with multiple complications had worse postoperative outcomes (P < 0.001). The strongest predictors for developing multiple complications were admission from chronic care facility or nursing home, dependent functional status, and higher American Society of Anesthesiologists Physical Status classification. In patients with multiple complications, the most common complication was sepsis (42%), followed by failure to wean ventilator (31%), and organ space surgical site infection (27%). We found that severe complications were most strongly associated with development of multiple complications. Using a Bayesian network, we were able to identify how strongly associated specific complications were in patients who developed multiple complications. CONCLUSIONS: Almost half (40%) of patients with complications suffer multiple complications. Patient factors such as frailty and comorbidity strongly predict the development of multiple complications. The results of our Bayesian analysis identify targets for interventions aimed at disrupting the cascade of multiple complications in high-risk patients.


Subjects
General Surgery, Postoperative Complications/epidemiology, Adolescent, Adult, Aged, Bayes Theorem, Comorbidity, Factual Databases, Female, Hospital Mortality, Humans, Length of Stay/statistics & numerical data, Male, Middle Aged, Postoperative Complications/mortality, Prognosis, Risk Factors, United States/epidemiology
13.
J Virol ; 89(14): 7214-23, 2015 Jul.
Article in English | MEDLINE | ID: mdl-25926637

ABSTRACT

Herpes simplex virus 1 (HSV-1) causes recurrent mucocutaneous ulcers and is the leading cause of infectious blindness and sporadic encephalitis in the United States. HSV-1 has been shown to be highly recombinogenic; however, to date, there has been no genome-wide analysis of recombination. To address this, we generated 40 HSV-1 recombinants derived from two parental strains, OD4 and CJ994. The 40 OD4-CJ994 HSV-1 recombinants were sequenced using the Illumina sequencing system, and recombination breakpoints were determined for each of the recombinants using the Bootscan program. Breakpoints occurring in the terminal inverted repeats were excluded from analysis to prevent double counting, resulting in a total of 272 breakpoints in the data set. By placing windows around the 272 breakpoints followed by Monte Carlo analysis comparing actual data to simulated data, we identified a recombination bias toward both high GC content and intergenic regions. A Monte Carlo analysis also suggested that recombination did not appear to be responsible for the generation of the spontaneous nucleotide mutations detected following sequencing. Additionally, kernel density estimation analysis across the genome found that the large, inverted repeats comprise a recombination hot spot. IMPORTANCE: Herpes simplex virus 1 (HSV-1) is the leading cause of sporadic encephalitis and blinding keratitis in developed countries. HSV-1 has been shown to be highly recombinogenic, and recombination itself appears to be a significant component of genome replication. To date, there has been no genome-wide analysis of recombination. Here we present the findings of the first genome-wide study of recombination performed by generating and sequencing 40 HSV-1 recombinants derived from the OD4 and CJ994 parental strains, followed by bioinformatics analysis. Recombination breakpoints were determined, yielding 272 breakpoints in the full data set.
Kernel density analysis determined that the large inverted repeats constitute a recombination hot spot. Additionally, Monte Carlo analyses found biases toward high GC content and intergenic and repetitive regions.
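
The windowed Monte Carlo comparison mentioned in this abstract can be illustrated with a small Python sketch: compute mean GC content in windows around observed breakpoints, then compare it with the same statistic for randomly placed breakpoints. The toy genome, breakpoint positions, window size, simulation count, and function names are all assumptions. The study's actual procedure and parameters are not given in the abstract.

```python
import random

def gc_fraction(seq):
    """Fraction of G or C bases in a sequence."""
    return sum(base in "GC" for base in seq) / len(seq)

def mean_window_gc(genome, positions, half_width):
    """Mean GC fraction of windows centred on each position (clipped at the ends)."""
    values = []
    for pos in positions:
        start = max(0, pos - half_width)
        end = min(len(genome), pos + half_width + 1)
        values.append(gc_fraction(genome[start:end]))
    return sum(values) / len(values)

def monte_carlo_p_value(genome, breakpoints, half_width=10, n_sim=2000, seed=0):
    """One-sided p-value: how often random placements match or exceed the observed GC."""
    rng = random.Random(seed)
    observed = mean_window_gc(genome, breakpoints, half_width)
    at_least = 0
    for _ in range(n_sim):
        simulated = [rng.randrange(len(genome)) for _ in breakpoints]
        if mean_window_gc(genome, simulated, half_width) >= observed:
            at_least += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return observed, (at_least + 1) / (n_sim + 1)

# Toy genome: AT-only background with a single GC-only block at positions 400-499.
genome = "AT" * 200 + "GC" * 50 + "AT" * 200
breakpoints = [410, 430, 450, 470, 490]  # made-up breakpoints inside the GC block

observed, p_value = monte_carlo_p_value(genome, breakpoints)
print(round(observed, 3), p_value)
```

A small p-value here means that breakpoints sit in GC-rich windows more often than random placement would explain, which is the form of evidence a GC bias claim rests on.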


Subjects
Viral DNA/genetics, Human Herpesvirus 1/genetics, Genetic Recombination, Animals, Base Composition, Chlorocebus aethiops, Viral DNA/chemistry, DNA Sequence Analysis, Vero Cells
14.
Mol Syst Biol ; 10: 759, 2014 Nov 19.
Article in English | MEDLINE | ID: mdl-25411400

ABSTRACT

Stressed cells coordinate a multi-faceted response spanning many levels of physiology. Yet knowledge of the complete stress-activated regulatory network as well as design principles for signal integration remains incomplete. We developed an experimental and computational approach to integrate available protein interaction data with gene fitness contributions, mutant transcriptome profiles, and phospho-proteome changes in cells responding to salt stress, to infer the salt-responsive signaling network in yeast. The inferred subnetwork presented many novel predictions by implicating new regulators, uncovering unrecognized crosstalk between known pathways, and pointing to previously unknown 'hubs' of signal integration. We exploited these predictions to show that Cdc14 phosphatase is a central hub in the network and that modification of RNA polymerase II coordinates induction of stress-defense genes with reduction of growth-related transcripts. We find that the orthologous human network is enriched for cancer-causing genes, underscoring the importance of the subnetwork's predictions in understanding stress biology.


Subjects
Saccharomyces cerevisiae Proteins/metabolism, Saccharomyces cerevisiae/genetics, Saccharomyces cerevisiae/metabolism, Cell Cycle Proteins/metabolism, Computational Biology/methods, Gene Expression Profiling, Fungal Gene Expression Regulation, Genetic Fitness, Protein Tyrosine Phosphatases/metabolism, RNA Polymerase II/metabolism, Saccharomyces cerevisiae Proteins/genetics, Signal Transduction, Sodium Chloride/metabolism, Physiological Stress
15.
J Gen Intern Med ; 30(5): 556-64, 2015 May.
Article in English | MEDLINE | ID: mdl-25373831

ABSTRACT

BACKGROUND: Only 38% of young adults with hypertension have controlled blood pressure. Lifestyle education is a critical initial step for hypertension control. Previous studies have not assessed the type and frequency of lifestyle education in young adults with incident hypertension. OBJECTIVE: The purpose of this study was to determine patient, provider, and visit predictors of documented lifestyle education among young adults with incident hypertension. DESIGN: We conducted a retrospective analysis of manually abstracted electronic health record data. PARTICIPANTS: A random selection of adults 18-39 years old (n = 500), managed by a large academic practice from 2008 to 2011 and who met JNC 7 clinical criteria for incident hypertension, participated in the study. MAIN MEASURES: The primary outcome was the presence of any documented lifestyle education during one year after meeting criteria for incident hypertension. Abstracted topics included documented patient education for exercise, tobacco cessation, alcohol use, stress management/stress reduction, Dietary Approaches to Stop Hypertension (DASH) diet, and weight loss. Clinic visits were categorized based upon a modified established taxonomy to characterize patients' patterns of outpatient service. We excluded patients with previous hypertension diagnoses, previous antihypertensive medications, or pregnancy. Logistic regression was used to identify predictors of documented education. KEY RESULTS: Overall, 55% (n = 275) of patients had documented lifestyle education within one year of incident hypertension. Exercise was the most frequent topic (64%). Young adult males had significantly decreased odds of receiving documented education. Patients with a previous diagnosis of hyperlipidemia or a family history of hypertension or coronary artery disease had increased odds of documented education. 
Among visit types, chronic disease visits predicted documented lifestyle education, but not acute or other/preventive visits. CONCLUSIONS: Among young adults with incident hypertension, only 55% had documented lifestyle education within one year. Knowledge of patient, provider, and visit predictors of education can help better target the development of interventions to improve young adult health education and hypertension control.


Subjects
Antihypertensive Agents/therapeutic use , Hypertension/diagnosis , Hypertension/therapy , Patient Compliance/statistics & numerical data , Patient Education as Topic/methods , Adolescent , Adult , Age Factors , Blood Pressure Determination/methods , Cohort Studies , Confidence Intervals , Diet , Electronic Health Records , Female , Follow-Up Studies , Humans , Life Style , Male , Odds Ratio , Retrospective Studies , Risk Assessment , Severity of Illness Index , Sex Factors , Treatment Outcome , Young Adult
16.
PLoS Comput Biol ; 10(5): e1003626, 2014 May.
Article in English | MEDLINE | ID: mdl-24874113

ABSTRACT

Systematic, genome-wide loss-of-function experiments can be used to identify host factors that directly or indirectly facilitate or inhibit the replication of a virus in a host cell. We present an approach that combines an integer linear program and a diffusion kernel method to infer the pathways through which those host factors modulate viral replication. The inputs to the method are a set of viral phenotypes observed in single-host-gene mutants and a background network consisting of a variety of host intracellular interactions. The output is an ensemble of subnetworks that provides a consistent explanation for the measured phenotypes, predicts which unassayed host factors modulate the virus, and predicts which host factors are the most direct interfaces with the virus. We infer host-virus interaction subnetworks using data from experiments screening the yeast genome for genes modulating the replication of two RNA viruses. Because a gold-standard network is unavailable, we assess the predicted subnetworks using both computational and qualitative analyses. We conduct a cross-validation experiment in which we predict whether held-aside test genes have an effect on viral replication. Our approach is able to make high-confidence predictions more accurately than several baselines, and about as well as the best baseline, which does not infer mechanistic pathways. We also examine two kinds of predictions made by our method: which host factors are nearest to a direct interaction with a viral component, and which unassayed host genes are likely to be involved in viral replication. Multiple predictions are supported by recent independent experimental data, or are components or functional partners of confirmed relevant complexes or pathways. Integer program code, background network data, and inferred host-virus subnetworks are available at http://www.biostat.wisc.edu/~craven/chasman_host_virus/.
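
The diffusion kernel mentioned in this abstract can be illustrated with a minimal sketch. It assumes the common graph form K = exp(-beta * L), where L is the Laplacian of the background network; the abstract does not state which kernel variant the authors use, and the function name, the `beta` value, and the three-node network below are hypothetical. The integer linear program is not shown.

```python
def diffusion_kernel(adjacency, beta=0.5, terms=30):
    """Approximate K = exp(-beta * L) for a small undirected graph.

    adjacency: symmetric list of lists with 0/1 edge indicators, no self-loops.
    beta: diffusion strength (hypothetical default, not from the paper).
    terms: number of Taylor-series terms used for the matrix exponential.
    """
    n = len(adjacency)
    # Graph Laplacian L = D - A, with node degrees on the diagonal.
    laplacian = [
        [(sum(adjacency[i]) if i == j else 0.0) - adjacency[i][j] for j in range(n)]
        for i in range(n)
    ]
    m = [[-beta * laplacian[i][j] for j in range(n)] for i in range(n)]

    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    kernel = [row[:] for row in identity]
    term = [row[:] for row in identity]
    for k in range(1, terms + 1):
        # term becomes M^k / k! by multiplying the previous term by M / k.
        term = [
            [sum(term[i][p] * m[p][j] for p in range(n)) / k for j in range(n)]
            for i in range(n)
        ]
        kernel = [[kernel[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return kernel


# Hypothetical three-node path network: A - B - C
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
kernel = diffusion_kernel(path)
```

In this toy network the kernel value between adjacent nodes (A and B) is larger than between the two end nodes (A and C), which is the sense in which such a kernel scores network proximity.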


Subjects
Cell Transformation, Viral/physiology , Fungal Proteins/metabolism , RNA Viruses/physiology , Signal Transduction/physiology , Virus Replication/physiology , Yeasts/metabolism , Yeasts/virology , Gene Expression Regulation, Fungal/physiology , Genes, Viral
17.
PLoS Comput Biol ; 9(9): e1003235, 2013.
Article in English | MEDLINE | ID: mdl-24068911

ABSTRACT

Systematic, genome-wide RNA interference (RNAi) analysis is a powerful approach to identify gene functions that support or modulate selected biological processes. An emerging challenge shared with some other genome-wide approaches is that independent RNAi studies often show limited agreement in their lists of implicated genes. To better understand this, we analyzed four genome-wide RNAi studies that identified host genes involved in influenza virus replication. These studies collectively identified and validated the roles of 614 cell genes, but pair-wise overlap among the four gene lists was only 3% to 15% (average 6.7%). However, a number of functional categories were overrepresented in multiple studies. The pair-wise overlap of these enriched-category lists was high, ∼19%, implying more agreement among studies than apparent at the gene level. Probing this further, we found that the gene lists implicated by independent studies were highly connected in interacting networks by independent functional measures such as protein-protein interactions, at rates significantly higher than predicted by chance. We also developed a general, model-based approach to gauge the effects of false-positive and false-negative factors and to estimate, from a limited number of studies, the total number of genes involved in a process. For influenza virus replication, this novel statistical approach estimates the total number of cell genes involved to be ∼2,800. This and multiple other aspects of our experimental and computational results imply that, when following good quality control practices, the low overlap between studies is primarily due to false negatives rather than false-positive gene identifications. These results and methods have implications for and applications to multiple forms of genome-wide analysis.
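
The pair-wise overlap reported above can be computed along the following lines. This is a sketch only: the abstract does not define its overlap measure, so the Jaccard index (shared genes divided by the union of both lists) is an assumption, and the study names and gene identifiers are hypothetical toy data.

```python
from itertools import combinations


def pairwise_overlap(gene_lists):
    """Return {(study_a, study_b): overlap} for every pair of studies.

    Overlap is the Jaccard index, an assumed definition for illustration.
    """
    overlaps = {}
    for (name_a, genes_a), (name_b, genes_b) in combinations(gene_lists.items(), 2):
        a, b = set(genes_a), set(genes_b)
        union = a | b
        overlaps[(name_a, name_b)] = len(a & b) / len(union) if union else 0.0
    return overlaps


# Hypothetical toy gene lists, not data from the four RNAi studies.
toy_lists = {
    "study_1": ["G1", "G2", "G3", "G4"],
    "study_2": ["G3", "G4", "G5"],
    "study_3": ["G4", "G6"],
}
overlaps = pairwise_overlap(toy_lists)
mean_overlap = sum(overlaps.values()) / len(overlaps)
```

The same function applies unchanged to lists of enriched functional categories, which is the comparison the abstract makes at the category level.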


Subjects
Genes, Viral , Orthomyxoviridae/genetics , RNA Interference , Virus Replication/genetics , False Negative Reactions , False Positive Reactions , Gene Knockdown Techniques , Likelihood Functions , Orthomyxoviridae/physiology
18.
Cell Genom ; 4(1): 100466, 2024 Jan 10.
Article in English | MEDLINE | ID: mdl-38190108

ABSTRACT

The data-intensive fields of genomics and machine learning (ML) are in an early stage of convergence. Genomics researchers increasingly seek to harness the power of ML methods to extract knowledge from their data; conversely, ML scientists recognize that genomics offers a wealth of large, complex, and well-annotated datasets that can be used as a substrate for developing biologically relevant algorithms and applications. The National Human Genome Research Institute (NHGRI) inquired with researchers working in these two fields to identify common challenges and receive recommendations to better support genomic research efforts using ML approaches. Those included increasing the amount and variety of training datasets by integrating genomic with multiomics, context-specific (e.g., by cell type), and social determinants of health datasets; reducing the inherent biases of training datasets; prioritizing transparency and interpretability of ML methods; and developing privacy-preserving technologies for research participants' data.


Subjects
Bioethics , Genomics , Humans , Algorithms , Privacy , Machine Learning
19.
mSystems ; 9(6): e0141523, 2024 Jun 18.
Article in English | MEDLINE | ID: mdl-38819130

ABSTRACT

Wastewater surveillance has emerged as a crucial public health tool for population-level pathogen surveillance. Supported by funding from the American Rescue Plan Act of 2021, the FDA's genomic epidemiology program, GenomeTrakr, was leveraged to sequence SARS-CoV-2 from wastewater sites across the United States. This initiative required the evaluation, optimization, development, and publication of new methods and analytical tools spanning sample collection through variant analyses. Version-controlled protocols for each step of the process were developed and published on protocols.io. A custom data analysis tool and a publicly accessible dashboard were built to facilitate real-time visualization of the collected data, focusing on the relative abundance of SARS-CoV-2 variants and sub-lineages across different samples and sites throughout the project. From September 2021 through June 2023, a total of 3,389 wastewater samples were collected, with 2,517 undergoing sequencing and submission to NCBI under the umbrella BioProject, PRJNA757291. Sequence data were released with explicit quality control (QC) tags on all sequence records, communicating our confidence in the quality of data. Variant analysis revealed wide circulation of Delta in the fall of 2021 and captured the sweep of Omicron and subsequent diversification of this lineage through the end of the sampling period. This project successfully achieved two important goals for the FDA's GenomeTrakr program: first, contributing timely genomic data for the SARS-CoV-2 pandemic response, and second, establishing both capacity and best practices for culture-independent, population-level environmental surveillance for other pathogens of interest to the FDA. IMPORTANCE: This paper serves two primary objectives. 
First, it summarizes the genomic and contextual data collected during a Covid-19 pandemic response project, which utilized the FDA's laboratory network, traditionally employed for sequencing foodborne pathogens, for sequencing SARS-CoV-2 from wastewater samples. Second, it outlines best practices for gathering and organizing population-level next-generation sequencing (NGS) data collected for culture-free surveillance of pathogens sourced from environmental samples.


Subjects
COVID-19 , SARS-CoV-2 , United States Food and Drug Administration , Wastewater , SARS-CoV-2/genetics , United States/epidemiology , Wastewater/virology , COVID-19/epidemiology , COVID-19/transmission , COVID-19/prevention & control , COVID-19/virology , Humans , Pandemics/prevention & control , Genome, Viral/genetics , Wastewater-Based Epidemiological Monitoring
20.
BMC Bioinformatics ; 13 Suppl 11: S5, 2012 Jun 26.
Article in English | MEDLINE | ID: mdl-22759459

ABSTRACT

BACKGROUND: Biomedical event extraction has attracted substantial attention as it can assist researchers in understanding the plethora of interactions among genes that are described in publications in molecular biology. While most recent work has focused on abstracts, the BioNLP 2011 shared task evaluated the submitted systems on both abstracts and full papers. In this article, we describe our submission to the shared task which decomposes event extraction into a set of classification tasks that can be learned either independently or jointly using the search-based structured prediction framework. Our intention is to explore how these two learning paradigms compare in the context of the shared task. RESULTS: We report that models learned using search-based structured prediction exceed the accuracy of independently learned classifiers by 8.3 points in F-score, with the gains being more pronounced on the more complex Regulation events (13.23 points). Furthermore, we show how the trade-off between recall and precision can be adjusted in both learning paradigms and that search-based structured prediction achieves better recall at all precision points. Finally, we report on experiments with a simple domain-adaptation method, resulting in the second-best performance achieved by a single system. CONCLUSIONS: We demonstrate that joint inference using the search-based structured prediction framework can achieve better performance than independently learned classifiers, thus demonstrating the potential of this learning paradigm for event extraction and other similarly complex information-extraction tasks.
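
For readers unfamiliar with the metric behind the reported gains, the sketch below shows the standard F-score computed from true-positive, false-positive, and false-negative counts. The counts are hypothetical and are not taken from the BioNLP 2011 shared task; a difference "in points" is the difference between two such scores multiplied by 100.

```python
def f_score(tp, fp, fn, beta=1.0):
    """F-score from raw counts; beta=1 gives the balanced F1 measure."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    # Weighted harmonic mean of precision and recall.
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts for two systems evaluated on the same test set.
independent_f1 = f_score(tp=60, fp=20, fn=40)   # precision 0.75, recall 0.60
joint_f1 = f_score(tp=70, fp=20, fn=30)         # precision ~0.78, recall 0.70
gain_in_points = 100 * (joint_f1 - independent_f1)
```

Raising `beta` above 1 weights recall more heavily, which is one simple way to express the recall/precision trade-off the abstract refers to.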


Subjects
Information Storage and Retrieval , Natural Language Processing , Algorithms , Information Storage and Retrieval/economics , Information Storage and Retrieval/methods , Periodicals as Topic