1.
Development ; 147(24), 2020 12 21.
Article in English | MEDLINE | ID: mdl-33144399

ABSTRACT

Sense organs acquire their distinctive shapes concomitantly with the differentiation of sensory cells and neurons necessary for their function. Although our understanding of the mechanisms controlling morphogenesis and neurogenesis in these structures has grown, how these processes are coordinated remains largely unexplored. Neurogenesis in the zebrafish olfactory epithelium requires the bHLH proneural transcription factor Neurogenin 1 (Neurog1). To address whether Neurog1 also controls morphogenesis, we analysed the migratory behaviour of early olfactory neural progenitors in neurog1 mutant embryos. Our results indicate that the oriented movements of these progenitors are disrupted in this context. Morphogenesis is similarly affected by mutations in the chemokine receptor gene, cxcr4b, suggesting it is a potential Neurog1 target gene. We find that Neurog1 directly regulates cxcr4b through an E-box cluster located just upstream of the cxcr4b transcription start site. Our results suggest that proneural transcription factors, such as Neurog1, directly couple distinct aspects of nervous system development.


Subject(s)
Basic Helix-Loop-Helix Transcription Factors/genetics , Morphogenesis/genetics , Nerve Tissue Proteins/genetics , Neurogenesis/genetics , Olfactory Mucosa/growth & development , Receptors, CXCR4/genetics , Zebrafish Proteins/genetics , Animals , E-Box Elements/genetics , Embryo, Nonmammalian , Embryonic Development/genetics , Gene Expression Regulation, Developmental/genetics , Mutation/genetics , Neurons/metabolism , Transcription Initiation Site , Zebrafish/genetics , Zebrafish/growth & development
2.
Nucleic Acids Res ; 46(18): 9299-9308, 2018 10 12.
Article in English | MEDLINE | ID: mdl-30137416

ABSTRACT

Genetic variation in cis-regulatory elements is thought to be a major driving force in morphological and physiological changes. However, identifying transcription factor binding events that code for complex traits remains a challenge, motivating novel means of detecting putatively important binding events. Using a curated set of 1154 high-quality transcription factor motifs, we demonstrate that independently eroded binding sites are enriched for independently lost traits in three distinct pairs of placental mammals. We show that these independently eroded events pinpoint the loss of hindlimbs in dolphin and manatee, degradation of vision in naked mole-rat and star-nosed mole, and the loss of external testes in white rhinoceros and Weddell seal. We additionally show that our method may also be utilized with more than two species. Our study exhibits a novel methodology to detect cis-regulatory mutations which help explain a portion of the molecular mechanism underlying complex trait formation and loss.


Subject(s)
Evolution, Molecular , Nucleotide Motifs/genetics , Regulatory Sequences, Nucleic Acid/genetics , Transcription Factors/genetics , Vision, Ocular/genetics , Animals , Binding Sites/genetics , Dolphins/genetics , Dolphins/physiology , Hindlimb/physiology , Male , Mammals/genetics , Mammals/physiology , Mole Rats/genetics , Mole Rats/physiology , Protein Binding/genetics , Testis/physiology , Trichechus/genetics , Trichechus/physiology , Vision, Ocular/physiology
3.
Genet Med ; 21(2): 464-470, 2019 02.
Article in English | MEDLINE | ID: mdl-29997393

ABSTRACT

PURPOSE: Exome sequencing and diagnosis is beginning to spread across the medical establishment. The most time-consuming part of genome-based diagnosis is the manual step of matching the potentially long list of patient candidate genes to patient phenotypes to identify the causative disease. METHODS: We introduce Phrank (for phenotype ranking), an information theory-inspired method that utilizes a Bayesian network to prioritize candidate diseases or genes, as a stand-alone module that can be run with any underlying knowledgebase and any variant filtering scheme. RESULTS: Phrank outperforms existing methods at ranking the causative disease or gene when applied to 169 real patient exomes with Mendelian diagnoses. Phrank's greatest improvement is in disease space, where across all 169 patients it ranks only 3 diseases on average ahead of the true diagnosis, whereas Phenomizer ranks 32 diseases ahead of the causal one. CONCLUSIONS: Using Phrank to rank all patient candidate genes or diseases, as they start working through a new case, will save the busy clinician much time in deriving a genetic diagnosis.


Subject(s)
Diagnosis, Computer-Assisted , Genetic Diseases, Inborn/diagnosis , Genetic Testing , Phenotype , Software , Benchmarking , Computational Biology/methods , Exome , Humans , Knowledge Bases , Pathology, Molecular/methods
4.
Am J Med Genet A ; 176(4): 1030-1036, 2018 04.
Article in English | MEDLINE | ID: mdl-29575631

ABSTRACT

Robinow syndrome (RS) is a well-recognized Mendelian disorder known to demonstrate both autosomal dominant and autosomal recessive inheritance. Typical manifestations include short stature, characteristic facies, and skeletal anomalies. Recessive inheritance has been associated with mutations in ROR2 while dominant inheritance has been observed for mutations in WNT5A, DVL1, and DVL3. Through trio whole genome sequencing, we identified a homozygous frameshifting single nucleotide deletion in WNT5A in a previously reported, deceased infant with a unique constellation of features comprising a 46,XY disorder of sex development with multiple congenital malformations including congenital diaphragmatic hernia, ambiguous genitalia, dysmorphic facies, shortened long bones, adactyly, and ventricular septal defect. The parents, who are both heterozygous for the deletion, appear clinically unaffected. In conjunction with published observations of Wnt5a double knockout mice, we provide evidence for the possibility of autosomal recessive inheritance in association with WNT5A loss-of-function mutations in RS.


Subject(s)
Alleles , Craniofacial Abnormalities/diagnosis , Craniofacial Abnormalities/genetics , Dwarfism/diagnosis , Dwarfism/genetics , Limb Deformities, Congenital/diagnosis , Limb Deformities, Congenital/genetics , Loss of Function Mutation , Phenotype , Urogenital Abnormalities/diagnosis , Urogenital Abnormalities/genetics , Wnt-5a Protein/genetics , Animals , Disease Models, Animal , Female , Frameshift Mutation , Gene Frequency , Genetic Association Studies , Homozygote , Humans , Infant , Mice , Mice, Knockout , Point Mutation , Severity of Illness Index , Symptom Assessment , Ultrasonography , Whole Genome Sequencing
5.
Brain ; 140(10): 2610-2622, 2017 Oct 01.
Article in English | MEDLINE | ID: mdl-28969385

ABSTRACT

Mutations of genes within the phosphatidylinositol-3-kinase (PI3K)-AKT-MTOR pathway are well known causes of brain overgrowth (megalencephaly) as well as segmental cortical dysplasia (such as hemimegalencephaly, focal cortical dysplasia and polymicrogyria). Mutations of the AKT3 gene have been reported in a few individuals with brain malformations, to date. Therefore, our understanding regarding the clinical and molecular spectrum associated with mutations of this critical gene is limited, with no clear genotype-phenotype correlations. We sought to further delineate this spectrum, study levels of mosaicism and identify genotype-phenotype correlations of AKT3-related disorders. We performed targeted sequencing of AKT3 on individuals with these phenotypes by molecular inversion probes and/or Sanger sequencing to determine the type and level of mosaicism of mutations. We analysed all clinical and brain imaging data of mutation-positive individuals including neuropathological analysis in one instance. We performed ex vivo kinase assays on AKT3 engineered with the patient mutations and examined the phospholipid binding profile of pleckstrin homology domain localizing mutations. We identified 14 new individuals with AKT3 mutations with several phenotypes dependent on the type of mutation and level of mosaicism. Our comprehensive clinical characterization, and review of all previously published patients, broadly segregates individuals with AKT3 mutations into two groups: patients with highly asymmetric cortical dysplasia caused by the common p.E17K mutation, and patients with constitutional AKT3 mutations exhibiting more variable phenotypes including bilateral cortical malformations, polymicrogyria, periventricular nodular heterotopia and diffuse megalencephaly without cortical dysplasia. All mutations increased kinase activity, and pleckstrin homology domain mutants exhibited enhanced phospholipid binding. 
Overall, our study shows that activating mutations of the critical AKT3 gene are associated with a wide spectrum of brain involvement ranging from focal or segmental brain malformations (such as hemimegalencephaly and polymicrogyria) predominantly due to mosaic AKT3 mutations, to diffuse bilateral cortical malformations, megalencephaly and heterotopia due to constitutional AKT3 mutations. We also provide the first detailed neuropathological examination of a child with extreme megalencephaly due to a constitutional AKT3 mutation. This child has one of the largest documented paediatric brain sizes, to our knowledge. Finally, our data show that constitutional AKT3 mutations are associated with megalencephaly, with or without autism, similar to PTEN-related disorders. Recognition of this broad clinical and molecular spectrum of AKT3 mutations is important for providing early diagnosis and appropriate management of affected individuals, and will facilitate targeted design of future human clinical trials using PI3K-AKT pathway inhibitors.


Subject(s)
Developmental Disabilities/genetics , Megalencephaly/genetics , Mutation/genetics , Proto-Oncogene Proteins c-akt/genetics , Brain/diagnostic imaging , Child , Developmental Disabilities/diagnostic imaging , Developmental Disabilities/pathology , Female , Genetic Association Studies , HEK293 Cells , Humans , Immunoprecipitation , Magnetic Resonance Imaging , Male , Megalencephaly/diagnostic imaging , Megalencephaly/pathology , Mutagenesis, Site-Directed/methods , Phosphatidylinositols/metabolism , Transfection
6.
Genome Res ; 24(9): 1504-16, 2014 Sep.
Article in English | MEDLINE | ID: mdl-24963153

ABSTRACT

Microbiota regulate intestinal physiology by modifying host gene expression along the length of the intestine, but the underlying regulatory mechanisms remain unresolved. Transcriptional specificity occurs through interactions between transcription factors (TFs) and cis-regulatory regions (CRRs) characterized by nucleosome-depleted accessible chromatin. We profiled transcriptome and accessible chromatin landscapes in intestinal epithelial cells (IECs) from mice reared in the presence or absence of microbiota. We show that regional differences in gene transcription along the intestinal tract were accompanied by major alterations in chromatin accessibility. Surprisingly, we discovered that microbiota modify host gene transcription in IECs without significantly impacting the accessible chromatin landscape. Instead, microbiota regulation of host gene transcription might be achieved by differential expression of specific TFs and enrichment of their binding sites in nucleosome-depleted CRRs near target genes. Our results suggest that the chromatin landscape in IECs is preprogrammed by the host in a region-specific manner to permit responses to microbiota through binding of open CRRs by specific TFs.


Subject(s)
Chromatin Assembly and Disassembly , Intestinal Mucosa/metabolism , Microbiota , Transcription, Genetic , Animals , Intestinal Mucosa/microbiology , Mice , Mice, Inbred C57BL , Organ Specificity , Promoter Regions, Genetic , Transcription Factors/genetics , Transcription Factors/metabolism , Transcriptome
7.
Genet Med ; 19(2): 209-214, 2017 02.
Article in English | MEDLINE | ID: mdl-27441994

ABSTRACT

PURPOSE: Clinical exome sequencing is nondiagnostic for about 75% of patients evaluated for a possible Mendelian disorder. We examined the ability of systematic reevaluation of exome data to establish additional diagnoses. METHODS: The exome and phenotypic data of 40 individuals with previously nondiagnostic clinical exomes were reanalyzed with current software and literature. RESULTS: A definitive diagnosis was identified for 4 of 40 participants (10%). In these cases the causative variant is de novo and in a relevant autosomal-dominant disease gene. The literature to tie the causative genes to the participants' phenotypes was weak, nonexistent, or not readily located at the time of the initial clinical exome reports. At the time of diagnosis by reanalysis, the supporting literature was 1 to 3 years old. CONCLUSION: Approximately 250 gene-disease and 9,200 variant-disease associations are reported annually. This increase in information necessitates regular reevaluation of nondiagnostic exomes. To be practical, systematic reanalysis requires further automation and more up-to-date variant databases. To maximize the diagnostic yield of exome sequencing, providers should periodically request reanalysis of nondiagnostic exomes. Accordingly, policies regarding reanalysis should be weighed in combination with factors such as cost and turnaround time when selecting a clinical exome laboratory.


Subject(s)
Exome Sequencing/standards , Genetic Diseases, Inborn/diagnosis , Genetic Diseases, Inborn/genetics , Genetics, Medical/standards , Child , Child, Preschool , Exome/genetics , Female , Genetic Diseases, Inborn/pathology , Humans , Infant , Male , Mutation , Pedigree , Sequence Analysis, DNA
8.
PLoS Comput Biol ; 12(2): e1004711, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26845687

ABSTRACT

Although many human diseases have a genetic component involving many loci, the majority of studies are statistically underpowered to isolate the many contributing variants, raising the question of the existence of alternate processes to identify disease mutations. To address this question, we collect ancestral transcription factor binding sites disrupted by an individual's variants and then look for their most significant congregation next to a group of functionally related genes. Strikingly, when the method is applied to five different full human genomes, the top enriched function for each is invariably reflective of their very different medical histories. For example, our method implicates "abnormal cardiac output" for a patient with a longstanding family history of heart disease, "decreased circulating sodium level" for an individual with hypertension, and other biologically appealing links for medical histories spanning narcolepsy to axonal neuropathy. Our results suggest that erosion of gene regulation by mutation load significantly contributes to observed heritable phenotypes that manifest in the medical history. The test we developed exposes a hitherto hidden layer of personal variants that promise to shed new light on human disease penetrance, expressivity and the sensitivity with which we can detect them.


Subject(s)
Binding Sites/genetics , Genome, Human/genetics , Genomics/methods , Models, Genetic , HLA-DQ beta-Chains/genetics , Humans , Linkage Disequilibrium , Models, Statistical , Mutation , Narcolepsy/genetics , Precision Medicine
9.
Genome Res ; 23(5): 889-904, 2013 May.
Article in English | MEDLINE | ID: mdl-23382538

ABSTRACT

The human genome encodes 1500-2000 different transcription factors (TFs). ChIP-seq is revealing the global binding profiles of a fraction of TFs in a fraction of their biological contexts. These data show that the majority of TFs bind directly next to a large number of context-relevant target genes, that most binding is distal, and that binding is context specific. Because of the effort and cost involved, ChIP-seq is seldom used in search of novel TF function. Such exploration is instead done using expression perturbation and genetic screens. Here we propose a comprehensive computational framework for transcription factor function prediction. We curate 332 high-quality nonredundant TF binding motifs that represent all major DNA binding domains, and improve cross-species conserved binding site prediction to obtain 3.3 million conserved, mostly distal, binding site predictions. We combine these with 2.4 million facts about all human and mouse gene functions, in a novel statistical framework, in search of enrichments of particular motifs next to groups of target genes of particular functions. Rigorous parameter tuning and a harsh null are used to minimize false positives. Our novel PRISM (predicting regulatory information from single motifs) approach obtains 2543 TF function predictions in a large variety of contexts, at a false discovery rate of 16%. The predictions are highly enriched for validated TF roles, and 45 of 67 (67%) tested binding site regions in five different contexts act as enhancers in functionally matched cells.


Subject(s)
Binding Sites/genetics , Computational Biology , Software , Transcription Factors/genetics , Algorithms , Animals , Base Sequence , DNA-Binding Proteins/genetics , Genome , Humans , Mice , Protein Binding/genetics , Regulatory Sequences, Nucleic Acid
10.
PLoS Genet ; 9(8): e1003728, 2013 Aug.
Article in English | MEDLINE | ID: mdl-24009522

ABSTRACT

Genetic studies have identified a core set of transcription factors and target genes that control the development of the neocortex, the region of the human brain responsible for higher cognition. The specific regulatory interactions between these factors, many key upstream and downstream genes, and the enhancers that mediate all these interactions remain mostly uncharacterized. We perform p300 ChIP-seq to identify over 6,600 candidate enhancers active in the dorsal cerebral wall of embryonic day 14.5 (E14.5) mice. Over 95% of the peaks we measure are conserved to human. Eight of ten (80%) candidates tested using mouse transgenesis drive activity in restricted laminar patterns within the neocortex. GREAT based computational analysis reveals highly significant correlation with genes expressed at E14.5 in key areas for neocortex development, and allows the grouping of enhancers by known biological functions and pathways for further studies. We find that multiple genes are flanked by dozens of candidate enhancers each, including well-known key neocortical genes as well as suspected and novel genes. Nearly a quarter of our candidate enhancers are conserved well beyond mammals. Human and zebrafish regions orthologous to our candidate enhancers are shown to most often function in other aspects of central nervous system development. Finally, we find strong evidence that specific interspersed repeat families have contributed potentially key developmental enhancers via co-option. Our analysis expands the methodologies available for extracting the richness of information found in genome-wide functional maps.


Subject(s)
Enhancer Elements, Genetic , Evolution, Molecular , Neocortex/growth & development , Regulatory Sequences, Nucleic Acid/genetics , Animals , Base Sequence , Conserved Sequence/genetics , Gene Expression Regulation, Developmental , Humans , Mice , Neocortex/metabolism , Oligonucleotide Array Sequence Analysis , Promoter Regions, Genetic , Transcription Factors/genetics , Zebrafish/genetics , Zebrafish/growth & development
11.
BMC Bioinformatics ; 16: 172, 2015 May 25.
Article in English | MEDLINE | ID: mdl-26003204

ABSTRACT

BACKGROUND: High-throughput technologies such as flow and mass cytometry have the potential to illuminate cellular networks. However, analyzing the data produced by these technologies is challenging. Visualization is needed to help researchers explore this data. RESULTS: We developed a web-based software program, NetworkPainter, to enable researchers to analyze dynamic cytometry data in the context of pathway diagrams. NetworkPainter provides researchers a graphical interface to draw and "paint" pathway diagrams with experimental data, producing animated diagrams which display the activity of each network node at each time point. CONCLUSION: NetworkPainter enables researchers to more fully explore multi-parameter, dynamical cytometry data.


Subject(s)
Computational Biology/methods , Flow Cytometry/instrumentation , Internet , Leukocytes, Mononuclear/metabolism , Signal Transduction , Software , Computer Simulation , Cytoplasm/metabolism , Database Management Systems , Databases, Factual , Flow Cytometry/standards , Humans
12.
Nucleic Acids Res ; 41(15): e151, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23814184

ABSTRACT

Many important model organisms for biomedical and evolutionary research have sequenced genomes, but occupy a phylogenetically isolated position, evolutionarily distant from other sequenced genomes. This phylogenetic isolation is exemplified for zebrafish, a vertebrate model for cis-regulation, development and human disease, whose evolutionary distance to all other currently sequenced fish exceeds the distance between human and chicken. Such large distances make it difficult to align genomes and use them for comparative analysis beyond gene-focused questions. In particular, detecting conserved non-genic elements (CNEs) as promising cis-regulatory elements with biological importance is challenging. Here, we develop a general comparative genomics framework to align isolated genomes and to comprehensively detect CNEs. Our approach integrates highly sensitive and quality-controlled local alignments and uses alignment transitivity and ancestral reconstruction to bridge large evolutionary distances. We apply our framework to zebrafish and demonstrate substantially improved CNE detection and quality compared with previous sets. Our zebrafish CNE set comprises 54 533 CNEs, of which 11 792 (22%) are conserved to human or mouse. Our zebrafish CNEs (http://zebrafish.stanford.edu) are highly enriched in known enhancers and extend existing experimental (ChIP-Seq) sets. The same framework can now be applied to the isolated genomes of frog, amphioxus, Caenorhabditis elegans and many others.


Subject(s)
Computational Biology/methods , Conserved Sequence , Phylogeny , Sequence Analysis, DNA/methods , Zebrafish/genetics , Animals , Base Sequence , Evolution, Molecular , Genomics/methods , Internet , Molecular Sequence Annotation , Regulatory Sequences, Nucleic Acid , Sensitivity and Specificity , Sequence Alignment , Synteny , Zebrafish/classification
13.
bioRxiv ; 2024 Jan 08.
Article in English | MEDLINE | ID: mdl-38260620

ABSTRACT

Alzheimer's disease (AD) and related dementias (ADRD) are complex diseases with multiple pathophysiological drivers that determine clinical symptomology and disease progression. These diseases develop insidiously over time, through many pathways and disease mechanisms and continue to have a huge societal impact for affected individuals and their families. While emerging blood-based biomarkers, such as plasma p-tau181 and p-tau217, accurately detect Alzheimer neuropathology and are associated with faster cognitive decline, the full extent of plasma proteomic changes in ADRD remains unknown. Earlier detection and better classification of the different subtypes may provide opportunities for earlier, more targeted interventions, and perhaps a higher likelihood of successful therapeutic development. In this study, we aim to leverage unbiased mass spectrometry proteomics to identify novel, blood-based biomarkers associated with cognitive decline. 1,786 plasma samples from 1,005 patients were collected over 12 years from participants in the Massachusetts Alzheimer's Disease Research Center Longitudinal Cohort Study. Patient metadata includes demographics, final diagnoses, and clinical dementia rating (CDR) scores taken concurrently. The Proteograph™ Product Suite (Seer, Inc.) and liquid-chromatography mass-spectrometry (LC-MS) analysis were used to process the plasma samples in this cohort and generate unbiased proteomics data. Data-independent acquisition (DIA) mass spectrometry results yielded 36,259 peptides and 4,007 protein groups. Linear mixed effects models revealed 138 differentially abundant proteins between AD and healthy controls. Machine learning classification models for AD diagnosis identified potential candidate biomarkers including MBP, BGLAP, and APOD. Cox regression models were created to determine the association of proteins with disease progression and suggest CLNS1A, CRISPLD2, and GOLPH3 as targets of further investigation as potential biomarkers.
The Proteograph workflow provided deep, unbiased coverage of the plasma proteome at a speed that enabled a cohort study of almost 1,800 samples, which is the largest, deep, unbiased proteomics study of ADRD conducted to date.

14.
Nat Commun ; 15(1): 989, 2024 Feb 02.
Article in English | MEDLINE | ID: mdl-38307861

ABSTRACT

Proteogenomics studies generate hypotheses on protein function and provide genetic evidence for drug target prioritization. Most previous work has been conducted using affinity-based proteomics approaches. These technologies face challenges, such as uncertainty regarding target identity, non-specific binding, and handling of variants that affect epitope affinity binding. Mass spectrometry-based proteomics can overcome some of these challenges. Here we report a pQTL study using the Proteograph™ Product Suite workflow (Seer, Inc.) where we quantify over 18,000 unique peptides from nearly 3000 proteins in more than 320 blood samples from a multi-ethnic cohort in a bottom-up, peptide-centric, mass spectrometry-based proteomics approach. We identify 184 protein-altering variants in 137 genes that are significantly associated with their corresponding variant peptides, confirming target specificity of co-associated affinity binders, identifying putatively causal cis-encoded proteins and providing experimental evidence for their presence in blood, including proteins that may be inaccessible to affinity-based proteomics.


Subject(s)
Proteogenomics , Proteomics , Humans , Proteomics/methods , Mass Spectrometry/methods , Proteins/analysis , Peptides/analysis , Proteogenomics/methods , Mutant Proteins
15.
bioRxiv ; 2024 Jun 01.
Article in English | MEDLINE | ID: mdl-38853852

ABSTRACT

Genome-wide association studies (GWAS) with proteomics are essential tools for drug discovery. To date, most studies have used affinity proteomics platforms, which have limited discovery to protein panels covered by the available affinity binders. Furthermore, it is not clear to which extent protein epitope changing variants interfere with the detection of protein quantitative trait loci (pQTLs). Mass spectrometry-based (MS) proteomics can overcome some of these limitations. Here we report a GWAS using the MS-based Seer Proteograph™ platform with blood samples from a discovery cohort of 1,260 American participants and a replication in 325 individuals from Asia, with diverse ethnic backgrounds. We analysed 1,980 proteins quantified in at least 80% of the samples, out of 5,753 proteins quantified across the discovery cohort. We identified 252 and replicated 90 pQTLs, where 30 of the replicated pQTLs have not been reported before. We further investigated 200 of the strongest associated cis-pQTLs previously identified using the SOMAscan and the Olink platforms and found that up to one third of the affinity proteomics pQTLs may be affected by epitope effects, while another third were confirmed by MS proteomics to be consistent with the hypothesis that genetic variants induce changes in protein expression. The present study demonstrates the complementarity of the different proteomics approaches and reports pQTLs not accessible to affinity proteomics, suggesting that many more pQTLs remain to be discovered using MS-based platforms.

16.
BMJ Open ; 12(10): e049657, 2022 10 12.
Article in English | MEDLINE | ID: mdl-36223959

ABSTRACT

OBJECTIVES: The enormous toll of the COVID-19 pandemic has heightened the urgency of collecting and analysing population-scale datasets in real time to monitor and better understand the evolving pandemic. The objectives of this study were to examine the relationship of risk factors to COVID-19 susceptibility and severity and to develop risk models to accurately predict COVID-19 outcomes using rapidly obtained self-reported data. DESIGN: A cross-sectional study. SETTING: AncestryDNA customers in the USA who consented to research. PARTICIPANTS: The AncestryDNA COVID-19 Study collected self-reported survey data on symptoms, outcomes, risk factors and exposures for over 563 000 adult individuals in the USA in just under 4 months, including over 4700 COVID-19 cases as measured by a self-reported positive test. RESULTS: We replicated previously reported associations between several risk factors and COVID-19 susceptibility and severity outcomes, and additionally found that differences in known exposures accounted for many of the susceptibility associations. A notable exception was elevated susceptibility for men even after adjusting for known exposures and age (adjusted OR=1.36, 95% CI=1.19 to 1.55). We also demonstrated that self-reported data can be used to build accurate risk models to predict individualised COVID-19 susceptibility (area under the curve (AUC)=0.84) and severity outcomes including hospitalisation and critical illness (AUC=0.87 and 0.90, respectively). The risk models achieved robust discriminative performance across different age, sex and genetic ancestry groups within the study. CONCLUSIONS: The results highlight the value of self-reported epidemiological data to rapidly provide public health insights into the evolving COVID-19 pandemic.


Subject(s)
COVID-19 , Adult , COVID-19/epidemiology , Cross-Sectional Studies , Humans , Male , Pandemics , Risk Factors , SARS-CoV-2
17.
Nat Genet ; 54(4): 374-381, 2022 04.
Article in English | MEDLINE | ID: mdl-35410379

ABSTRACT

Multiple COVID-19 genome-wide association studies (GWASs) have identified reproducible genetic associations indicating that there is a genetic component to susceptibility and severity risk. To complement these studies, we collected deep coronavirus disease 2019 (COVID-19) phenotype data from a survey of 736,723 AncestryDNA research participants. With these data, we defined eight phenotypes related to COVID-19 outcomes: four phenotypes that align with previously studied COVID-19 definitions and four 'expanded' phenotypes that focus on susceptibility given exposure, mild clinical manifestations and an aggregate score of symptom severity. We performed a replication analysis of 12 previously reported COVID-19 genetic associations with all eight phenotypes in a trans-ancestry meta-analysis of AncestryDNA research participants. In this analysis, we show distinct patterns of association at the 12 loci with the eight outcomes that we assessed. We also performed a genome-wide discovery analysis of all eight phenotypes, which did not yield new genome-wide significant loci but did suggest that three of the four 'expanded' COVID-19 phenotypes have enhanced power to capture protective genetic associations relative to the previously studied phenotypes. Thus, we conclude that continued large-scale ascertainment of deep COVID-19 phenotype data would likely represent a boon for COVID-19 therapeutic target identification.


Subject(s)
COVID-19 , Genome-Wide Association Study , COVID-19/genetics , Genetic Predisposition to Disease , Humans , Phenotype , Polymorphism, Single Nucleotide/genetics
18.
Nat Genet ; 54(4): 382-392, 2022 04.
Article in English | MEDLINE | ID: mdl-35241825

ABSTRACT

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) enters human host cells via angiotensin-converting enzyme 2 (ACE2) and causes coronavirus disease 2019 (COVID-19). Here, through a genome-wide association study, we identify a variant (rs190509934, minor allele frequency 0.2-2%) that downregulates ACE2 expression by 37% (P = 2.7 × 10⁻⁸) and reduces the risk of SARS-CoV-2 infection by 40% (odds ratio = 0.60, P = 4.5 × 10⁻¹³), providing human genetic evidence that ACE2 expression levels influence COVID-19 risk. We also replicate the associations of six previously reported risk variants, of which four were further associated with worse outcomes in individuals infected with the virus (in/near LZTFL1, MHC, DPP9 and IFNAR2). Lastly, we show that common variants define a risk score that is strongly associated with severe disease among cases and modestly improves the prediction of disease severity relative to demographic and clinical factors alone.


Subject(s)
COVID-19 , Angiotensin-Converting Enzyme 2/genetics , COVID-19/genetics , Genome-Wide Association Study , Humans , Risk Factors , SARS-CoV-2/genetics
19.
Sci Transl Med ; 12(544), 2020 05 20.
Article in English | MEDLINE | ID: mdl-32434849

ABSTRACT

The diagnosis of Mendelian disorders requires labor-intensive literature research. Trained clinicians can spend hours looking for the right publication(s) supporting a single gene that best explains a patient's disease. AMELIE (Automatic Mendelian Literature Evaluation) greatly accelerates this process. AMELIE parses all 29 million PubMed abstracts and downloads and further parses hundreds of thousands of full-text articles in search of information supporting the causality and associated phenotypes of most published genetic variants. AMELIE then prioritizes patient candidate variants for their likelihood of explaining any patient's given set of phenotypes. Diagnosis of singleton patients (without relatives' exomes) is the most time-consuming scenario, and AMELIE ranked the causative gene at the very top for 66% of 215 diagnosed singleton Mendelian patients from the Deciphering Developmental Disorders project. Evaluating only the top 11 AMELIE-scored genes of 127 (median) candidate genes per patient resulted in a rapid diagnosis in more than 90% of cases. AMELIE-based evaluation of all cases was 3 to 19 times more efficient than hand-curated database-based approaches. We replicated these results on a retrospective cohort of clinical cases from Stanford Children's Health and the Manton Center for Orphan Disease Research. An analysis web portal with our most recent update, programmatic interface, and code is available at AMELIE.stanford.edu.


Subject(s)
Exome , Child , Genotype , Humans , Phenotype , Probability , Retrospective Studies
20.
Bioinformatics ; 23(16): 2196-7, 2007 Aug 15.
Article in English | MEDLINE | ID: mdl-17545178

ABSTRACT

UNLABELLED: The BioText Search Engine is a freely available Web-based application that provides biologists with new ways to access the scientific literature. One novel feature is the ability to search and browse article figures and their captions. A grid view juxtaposes many different figures associated with the same keywords, providing new insight into the literature. An abstract/title search and list view shows at a glance many of the figures associated with each article. The interface is carefully designed according to usability principles and techniques. The search engine is a work in progress, and more functionality will be added over time. AVAILABILITY: http://biosearch.berkeley.edu.


Subject(s)
Abstracting and Indexing/methods , Artificial Intelligence , Biology/methods , Database Management Systems , Databases, Bibliographic , Information Storage and Retrieval/methods , Natural Language Processing