ABSTRACT
MOTIVATION: Cellular identity and behavior are controlled by complex gene regulatory networks. Transcription factors (TFs) bind to specific DNA sequences to regulate the transcription of their target genes. On the basis of these TF motifs in cis-regulatory elements we can model the influence of TFs on gene expression. In such models of TF motif activity the data is usually modeled assuming a linear relationship between the motif activity and the gene expression level. A commonly used method to model motif influence is based on Ridge Regression. One important assumption of linear regression is the independence between samples. However, if samples are generated from the same cell line, tissue, or other biological source, this assumption may be invalid. This same assumption of independence is also applied to different yet similar experimental conditions, which may also be inappropriate. In theory, the independence assumption between samples could lead to loss in signal detection. Here we investigate whether a Bayesian model that allows for correlations results in more accurate inference of motif activities. RESULTS: We extend the Ridge Regression to a Bayesian Linear Mixed Model, which allows us to model dependence between different samples. In a simulation study, we investigate the differences between the two model assumptions. We show that our Bayesian Linear Mixed Model implementation outperforms Ridge Regression in a simulation scenario where the noise, which is the signal that cannot be explained by TF motifs, is uncorrelated. However, we demonstrate that there is no such gain in performance if the noise has a similar covariance structure over samples as the signal that can be explained by motifs. We give a mathematical explanation of why this is the case. Using four representative real datasets we show that at most ~40% of the signal is explained by motifs using the linear model. With these data there is no advantage to using the Bayesian Linear Mixed Model, due to the similarity of the covariance structure. AVAILABILITY & IMPLEMENTATION: The project implementation is available at https://github.com/Sim19/SimGEXPwMotifs.
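As a rough illustration of the ridge-regression baseline this abstract refers to, the sketch below estimates motif activities for a single sample from a gene-by-motif score matrix by solving the penalised normal equations (MᵀM + λI)a = Mᵀe. It is not the authors' implementation (that lives in the repository linked above); the function name `ridge_motif_activity`, the toy matrix and the penalty values are hypothetical, and only the Python standard library is used.

```python
def ridge_motif_activity(motif_scores, expression, lam):
    """Ridge estimate of motif activities for one sample.

    motif_scores: list of rows, one per gene, each with one score per motif.
    expression:   list with one expression value per gene.
    lam:          ridge penalty; lam > 0 keeps the system well conditioned.
    """
    n_genes = len(motif_scores)
    n_motifs = len(motif_scores[0])

    # Build the normal equations: (M^T M + lam * I) a = M^T e.
    lhs = [[sum(motif_scores[g][i] * motif_scores[g][j] for g in range(n_genes))
            + (lam if i == j else 0.0)
            for j in range(n_motifs)]
           for i in range(n_motifs)]
    rhs = [sum(motif_scores[g][i] * expression[g] for g in range(n_genes))
           for i in range(n_motifs)]

    # Gaussian elimination with partial pivoting.
    for col in range(n_motifs):
        pivot = max(range(col, n_motifs), key=lambda r: abs(lhs[r][col]))
        lhs[col], lhs[pivot] = lhs[pivot], lhs[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for row in range(col + 1, n_motifs):
            factor = lhs[row][col] / lhs[col][col]
            for k in range(col, n_motifs):
                lhs[row][k] -= factor * lhs[col][k]
            rhs[row] -= factor * rhs[col]

    # Back substitution.
    activities = [0.0] * n_motifs
    for i in reversed(range(n_motifs)):
        known = sum(lhs[i][j] * activities[j] for j in range(i + 1, n_motifs))
        activities[i] = (rhs[i] - known) / lhs[i][i]
    return activities


# Toy data (hypothetical): 4 genes, 2 motifs, noise-free expression
# generated from activities [2, -1].
toy_scores = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
toy_expression = [2.0, -1.0, 1.0, 4.0]
print(ridge_motif_activity(toy_scores, toy_expression, 0.0))
print(ridge_motif_activity(toy_scores, toy_expression, 10.0))
```

Each sample is fitted on its own here, which is exactly the independence assumption the abstract questions; a mixed model would instead share information across correlated samples.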
Subjects
Amino Acid Motifs/genetics , Bayes Theorem , Gene Expression Regulation/genetics , Transcription Factors/genetics , Chromatin Immunoprecipitation Sequencing/methods , Computer Simulation , Gene Regulatory Networks , Linear Models , Regulatory Elements, Transcriptional/genetics , Transcription Factors/chemistry
ABSTRACT
BACKGROUND: Estimating the genetic component of a complex phenotype is a complicated problem, mainly because there are many allele effects to estimate from a limited number of phenotypes. In spite of this difficulty, linear methods with variable selection have been able to give good predictions of additive effects of individuals. However, prediction of non-additive genetic effects is challenging with the usual prediction methods. In machine learning, non-additive relations between inputs can be modeled with neural networks. We developed a novel method (NetSparse) that uses Bayesian neural networks with variable selection for the prediction of genotypic values of individuals, including non-additive genetic effects. RESULTS: We simulated several populations with different phenotypic models and compared NetSparse to genomic best linear unbiased prediction (GBLUP), BayesB, their dominance variants, and an additive by additive method. We found that when the number of QTL was relatively small (10 or 100), NetSparse had 2 to 28 percentage points higher accuracy than the reference methods. For scenarios that included dominance or epistatic effects, NetSparse had 0.0 to 3.9 percentage points higher accuracy for predicting phenotypes than the reference methods, except in scenarios with extreme overdominance, for which reference methods that explicitly model dominance had 6 percentage points higher accuracy than NetSparse. CONCLUSIONS: Bayesian neural networks with variable selection are promising for prediction of the genetic component of complex traits in animal breeding, and their performance is robust across different genetic models. However, their large computational costs can hinder their use in practice.
Subjects
Forecasting/methods , Multifactorial Inheritance/genetics , Phenotype , Algorithms , Alleles , Animals , Bayes Theorem , Gene Frequency/genetics , Genetics, Population/methods , Genomics/methods , Genotype , Humans , Models, Genetic , Neural Networks, Computer , Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci/genetics , Selection, Genetic/genetics
ABSTRACT
Remodeling of chromatin accessibility is necessary for successful reprogramming of fibroblasts to neurons. However, it is still not fully known which transcription factors can induce a neuronal chromatin accessibility profile when overexpressed in fibroblasts. To identify such transcription factors, we used ATAC-sequencing to generate differential chromatin accessibility profiles between human fibroblasts and iNeurons, an in vitro neuronal model system obtained by overexpression of Neurog2 in induced pluripotent stem cells (iPSCs). We found that the ONECUT transcription factor sequence motif was strongly associated with differential chromatin accessibility between iNeurons and fibroblasts. All three ONECUT transcription factors associated with this motif (ONECUT1, ONECUT2 and ONECUT3) induced a neuron-like morphology and expression of neuronal genes within two days of overexpression in fibroblasts. We observed widespread remodeling of chromatin accessibility; in particular, we found that chromatin regions that contain the ONECUT motif were inaccessible or lowly accessible in fibroblasts and became accessible after the overexpression of ONECUT1, ONECUT2 or ONECUT3. There was substantial overlap with iNeurons; still, many regions that gained accessibility following ONECUT overexpression were not accessible in iNeurons. Our study highlights both the potential and challenges of ONECUT-based direct neuronal reprogramming.
Subjects
Cellular Reprogramming , Chromatin/genetics , Gene Expression Regulation , Induced Pluripotent Stem Cells/metabolism , Neurons/metabolism , Onecut Transcription Factors/genetics , Cell Differentiation , Cell Line , Chromatin/metabolism , Fibroblasts/cytology , Fibroblasts/metabolism , Gene Expression Profiling , Gene Ontology , Hepatocyte Nuclear Factor 6/genetics , Hepatocyte Nuclear Factor 6/metabolism , Homeodomain Proteins , Humans , Induced Pluripotent Stem Cells/cytology , Neurons/cytology , Onecut Transcription Factors/metabolism , Transcription Factors
ABSTRACT
Cell-based small molecule screening is an effective strategy leading to new medicines. Scientists in the pharmaceutical industry as well as in academia have made tremendous progress in developing both large-scale and smaller-scale screening assays. However, an accessible and universal technology for measuring large numbers of molecular and cellular phenotypes in many samples in parallel is not available. Here we present the immuno-detection by sequencing (ID-seq) technology that combines antibody-based protein detection and DNA-sequencing via DNA-tagged antibodies. We use ID-seq to simultaneously measure 70 (phospho-)proteins in primary human epidermal stem cells to screen the effects of ~300 kinase inhibitor probes to characterise the role of 225 kinases. The results show an association between decreased mTOR signalling and increased differentiation and uncover 13 kinases potentially regulating epidermal renewal through distinct mechanisms. Taken together, our work establishes ID-seq as a flexible solution for large-scale high-dimensional phenotyping in fixed cell populations.
Subjects
Antibodies/metabolism , Immunoassay/methods , Proteome/metabolism , Proteomics/methods , Sequence Analysis, DNA/methods , Antibodies/immunology , Cell Differentiation/genetics , Cells, Cultured , Epidermal Cells/cytology , Gene Expression Profiling , Humans , Keratinocytes/cytology , Keratinocytes/metabolism , Phenotype , Proteome/genetics , Proteome/immunology , Signal Transduction/genetics , Stem Cells/metabolism
ABSTRACT
Whole-transcriptome or RNA sequencing (RNA-Seq) is a powerful and versatile tool for functional analysis of different types of RNA molecules, but sample reagent and sequencing cost can be prohibitive for hypothesis-driven studies where the aim is to quantify differential expression of a limited number of genes. Here we present an approach for quantification of differential mRNA expression by targeted resequencing of complementary DNA using single-molecule molecular inversion probes (cDNA-smMIPs) that enable highly multiplexed resequencing of cDNA target regions of ~100 nucleotides and counting of individual molecules. We show that accurate estimates of differential expression can be obtained from molecule counts for hundreds of smMIPs per reaction and that smMIPs are also suitable for quantification of relative gene expression and allele-specific expression. Compared with low-coverage RNA-Seq and a hybridization-based targeted RNA-Seq method, cDNA-smMIPs are a cost-effective high-throughput tool for hypothesis-driven expression analysis in large numbers of genes (10 to 500) and samples (hundreds to thousands).
Subjects
DNA, Complementary/genetics , Exome Sequencing/methods , Gene Expression Profiling/methods , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, RNA/methods , Cell Line, Transformed , Cell Transformation, Viral , HEK293 Cells , Humans , RNA, Messenger/biosynthesis
ABSTRACT
Neurons derived from human induced Pluripotent Stem Cells (hiPSCs) provide a promising new tool for studying neurological disorders. In the past decade, many protocols for differentiating hiPSCs into neurons have been developed. However, these protocols are often slow with high variability, low reproducibility, and low efficiency. In addition, the neurons obtained with these protocols are often immature and lack adequate functional activity both at the single-cell and network levels unless the neurons are cultured for several months. Partially due to these limitations, the functional properties of hiPSC-derived neuronal networks are still not well characterized. Here, we adapt a recently published protocol that describes production of human neurons from hiPSCs by forced expression of the transcription factor neurogenin-2. This protocol is rapid (yielding mature neurons within 3 weeks) and efficient, with nearly 100% conversion efficiency of transduced cells (>95% of DAPI-positive cells are MAP2 positive). Furthermore, the protocol yields a homogeneous population of excitatory neurons that would allow the investigation of cell-type specific contributions to neurological disorders. We modified the original protocol by generating stably transduced hiPSC cells, giving us explicit control over the total number of neurons. These cells are then used to generate hiPSC-derived neuronal networks on micro-electrode arrays. In this way, the spontaneous electrophysiological activity of hiPSC-derived neuronal networks can be measured and characterized, while retaining interexperimental consistency in terms of cell density. The presented protocol is broadly applicable, especially for mechanistic and pharmacological studies on human neuronal networks.
Subjects
Cell Differentiation , Induced Pluripotent Stem Cells/metabolism , Microarray Analysis , Neurons/metabolism , Basic Helix-Loop-Helix Transcription Factors/genetics , Basic Helix-Loop-Helix Transcription Factors/metabolism , Cell Line , Cellular Reprogramming , Fibroblasts/cytology , Genetic Vectors/genetics , Genetic Vectors/metabolism , Humans , Induced Pluripotent Stem Cells/cytology , Lentivirus/genetics , Microelectrodes , Nerve Tissue Proteins/genetics , Nerve Tissue Proteins/metabolism , Neurogenesis , Neurons/cytology , Transcription Factors/genetics , Transcription Factors/metabolism
ABSTRACT
Characterizing the multifaceted contribution of genetic and epigenetic factors to disease phenotypes is a major challenge in human genetics and medicine. We carried out high-resolution genetic, epigenetic, and transcriptomic profiling in three major human immune cell types (CD14+ monocytes, CD16+ neutrophils, and naive CD4+ T cells) from up to 197 individuals. We assess, quantitatively, the relative contribution of cis-genetic and epigenetic factors to transcription and evaluate their impact as potential sources of confounding in epigenome-wide association studies. Further, we characterize highly coordinated genetic effects on gene expression, methylation, and histone variation through quantitative trait locus (QTL) mapping and allele-specific (AS) analyses. Finally, we demonstrate colocalization of molecular trait QTLs at 345 unique immune disease loci. This expansive, high-resolution atlas of multi-omics changes yields insights into cell-type-specific correlation between diverse genomic inputs, more generalizable correlations between these inputs, and defines molecular events that may underpin complex disease risk.
Subjects
Epigenomics , Immune System Diseases/genetics , Monocytes/metabolism , Neutrophils/metabolism , T-Lymphocytes/metabolism , Transcription, Genetic , Adult , Aged , Alternative Splicing , Female , Genetic Predisposition to Disease , Hematopoietic Stem Cells/metabolism , Histone Code , Humans , Male , Middle Aged , Quantitative Trait Loci , Young Adult
ABSTRACT
BACKGROUND: During early embryonic development, one of the two X chromosomes in mammalian female cells is inactivated to compensate for a potential imbalance in transcript levels with male cells, which contain a single X chromosome. Here, we use mouse female embryonic stem cells (ESCs) with non-random X chromosome inactivation (XCI) and polymorphic X chromosomes to study the dynamics of gene silencing over the inactive X chromosome by high-resolution allele-specific RNA-seq. RESULTS: Induction of XCI by differentiation of female ESCs shows that genes proximal to the X-inactivation center are silenced earlier than distal genes, while lowly expressed genes show faster XCI dynamics than highly expressed genes. The active X chromosome shows a minor but significant increase in gene activity during differentiation, resulting in complete dosage compensation in differentiated cell types. Genes escaping XCI show little or no silencing during early propagation of XCI. Allele-specific RNA-seq of neural progenitor cells generated from the female ESCs identifies three regions distal to the X-inactivation center that escape XCI. These regions, which stably escape during propagation and maintenance of XCI, coincide with topologically associating domains (TADs) as present in the female ESCs. Also, the previously characterized gene clusters escaping XCI in human fibroblasts correlate with TADs. CONCLUSIONS: The gene silencing observed during XCI provides further insight into the establishment of the repressive complex formed by the inactive X chromosome. The association of escape regions with TADs, in mouse and human, suggests that TADs are the primary targets during propagation of XCI over the X chromosome.
Subjects
Gene Silencing , X Chromosome Inactivation , Alleles , Animals , Chromatin/chemistry , Embryoid Bodies/metabolism , Embryonic Stem Cells/metabolism , Female , Humans , Mice , Neural Stem Cells/metabolism , Sequence Analysis, RNA
ABSTRACT
OBJECTIVES: Pharmacogenetic studies of tumour necrosis factor inhibitor (TNFi) response in patients with rheumatoid arthritis (RA) have largely relied on the changes in complex disease scores, such as disease activity score 28 (DAS28), as a measure of treatment response. It is expected that the genetic architecture of such a complex score is heterogeneous and not very suitable for pharmacogenetic studies. We aimed to select the optimal phenotype for TNFi response using heritability estimates. METHODS: Using two linear mixed-modelling approaches (Bayz and GCTA), we estimated heritability, together with genomic and environmental correlations for the TNFi drug-response phenotype ΔDAS28 and its separate components: Δ swollen joint count (SJC), Δ tender joint count (TJC), Δ erythrocyte sedimentation rate (ESR) and Δ visual-analogue scale of general health (VAS-GH). For this, we used genome-wide single nucleotide polymorphism (SNP) data from 878 TNFi-treated Dutch patients with RA. Furthermore, a multivariate genome-wide association study (GWAS) approach was implemented, analysing separate DAS28 components simultaneously. RESULTS: The highest heritability estimates were found for ΔSJC (h²g = 0.76 with Bayz and 0.87 with GCTA) and ΔTJC (h²g = 0.62 with Bayz and 0.82 with GCTA); lower heritability was found for ΔDAS28 (h²g = 0.59 with Bayz and 0.71 with GCTA), while estimates for ΔESR and ΔVAS-GH were near or equal to zero. The highest genomic correlations were observed for ΔSJC and ΔTJC (0.49), and the highest environmental correlation was seen between ΔTJC and ΔVAS-GH (0.62). The multivariate GWAS did not generate an excess of low p values as compared with a univariate analysis of ΔDAS28. CONCLUSIONS: Our results indicate that multiple SNPs together explain a substantial portion of the variation in change in joint counts in TNFi-treated patients with RA. In conclusion, of the outcomes studied, the joint counts are most suitable for TNFi pharmacogenetics in RA.
Subjects
Antirheumatic Agents/therapeutic use , Arthritis, Rheumatoid/genetics , DNA/genetics , Genome-Wide Association Study , Polymorphism, Genetic , Tumor Necrosis Factor-alpha/antagonists & inhibitors , Adult , Aged , Arthritis, Rheumatoid/drug therapy , Arthritis, Rheumatoid/metabolism , Female , Genotype , Humans , Male , Middle Aged , Phenotype , Severity of Illness Index
ABSTRACT
The blood group Vel was discovered 60 years ago, but the underlying gene is unknown. Individuals negative for the Vel antigen are rare and are required for the safe transfusion of patients with antibodies to Vel. To identify the responsible gene, we sequenced the exomes of five individuals negative for the Vel antigen and found that four were homozygous and one was heterozygous for a low-frequency 17-nucleotide frameshift deletion in the gene encoding the 78-amino-acid transmembrane protein SMIM1. A follow-up study showing that 59 of 64 Vel-negative individuals were homozygous for the same deletion and expression of the Vel antigen on SMIM1-transfected cells confirm SMIM1 as the gene underlying the Vel blood group. An expression quantitative trait locus (eQTL), the common SNP rs1175550, contributes to variable expression of the Vel antigen (P = 0.003) and influences the mean hemoglobin concentration of red blood cells (RBCs; P = 8.6 × 10⁻¹⁵). In vivo, zebrafish with smim1 knockdown showed a mild reduction in the number of RBCs, identifying SMIM1 as a new regulator of RBC formation. Our findings are of immediate relevance, as the homozygous presence of the deletion allows the unequivocal identification of Vel-negative blood donors.
Subjects
Blood Group Antigens/genetics , Erythrocyte Membrane/metabolism , Erythrocytes/immunology , Gene Deletion , Homozygote , Membrane Proteins/genetics , Quantitative Trait Loci , Alleles , Animals , Biomarkers/metabolism , Blood Group Antigens/immunology , Blood Group Antigens/metabolism , Electrophoretic Mobility Shift Assay , Erythrocytes/metabolism , Erythrocytes/pathology , Exome/genetics , Female , Gene Expression Profiling , Gene Regulatory Networks , Humans , Isoantibodies/immunology , Membrane Proteins/immunology , Membrane Proteins/metabolism , Molecular Sequence Data , Oligonucleotide Array Sequence Analysis , Pregnancy , Zebrafish/genetics
ABSTRACT
Nearly three-quarters of the 143 genetic signals associated with platelet and erythrocyte phenotypes identified by meta-analyses of genome-wide association (GWA) studies are located at non-protein-coding regions. Here, we assessed the role of candidate regulatory variants associated with cell type-restricted, closely related hematological quantitative traits in biologically relevant hematopoietic cell types. We used formaldehyde-assisted isolation of regulatory elements followed by next-generation sequencing (FAIRE-seq) to map regions of open chromatin in three primary human blood cells of the myeloid lineage. In the precursors of platelets and erythrocytes, as well as in monocytes, we found that open chromatin signatures reflect the corresponding hematopoietic lineages of the studied cell types and associate with the cell type-specific gene expression patterns. Dependent on their signal strength, open chromatin regions showed correlation with promoter and enhancer histone marks, distance to the transcription start site, and ontology classes of nearby genes. Cell type-restricted regions of open chromatin were enriched in sequence variants associated with hematological indices. The majority (63.6%) of such candidate functional variants at platelet quantitative trait loci (QTLs) coincided with binding sites of five transcription factors key in regulating megakaryopoiesis. We experimentally tested 13 candidate regulatory variants at 10 platelet QTLs and found that 10 (76.9%) affected protein binding, suggesting that this is a frequent mechanism by which regulatory variants influence quantitative trait levels. Our findings demonstrate that combining large-scale GWA data with open chromatin profiles of relevant cell types can be a powerful means of dissecting the genetic architecture of closely related quantitative traits.
Subjects
Chromatin Assembly and Disassembly , Chromatin/metabolism , Genetic Variation , Quantitative Trait Loci , Quantitative Trait, Heritable , Regulatory Sequences, Nucleic Acid , Blood Platelets/metabolism , Cell Lineage/genetics , Chromosome Mapping , Cluster Analysis , Erythrocytes/metabolism , Gene Expression Regulation , Genome-Wide Association Study , Histones/metabolism , Humans , Myeloid Cells/metabolism , Nucleosomes/metabolism , Organ Specificity/genetics , Phenotype , Polymorphism, Single Nucleotide
ABSTRACT
Thrombocytopenia with absent radii (TAR) syndrome is a rare disorder combining specific skeletal abnormalities with a reduced platelet count. Rare proximal microdeletions of 1q21.1 are found in the majority of patients but are also found in unaffected parents. Recently it was shown that TAR syndrome is caused by the compound inheritance of a low-frequency noncoding SNP and a rare null allele in RBM8A, a gene encoding the exon-junction complex subunit member Y14 located in the deleted region. This finding provides new insight into the complex inheritance pattern and new clues to the molecular mechanisms underlying TAR syndrome. We discuss TAR syndrome in the context of abnormal phenotypes associated with proximal and distal 1q21.1 microdeletions and microduplications with incomplete penetrance and variable expressivity.
Subjects
Abnormalities, Multiple/genetics , Inheritance Patterns/genetics , Megalencephaly/genetics , Thrombocytopenia/genetics , Upper Extremity Deformities, Congenital/genetics , Abnormalities, Multiple/pathology , Chromosome Deletion , Chromosome Duplication , Chromosomes, Human, Pair 1/genetics , Congenital Bone Marrow Failure Syndromes , Humans , Megalencephaly/pathology , Phenotype , Polymorphism, Single Nucleotide , Radius/pathology , Thrombocytopenia/pathology , Upper Extremity Deformities, Congenital/pathology
ABSTRACT
Short insertions and deletions (indels) are the second most abundant form of human genetic variation, but our understanding of their origins and functional effects lags behind that of other types of variants. Using population-scale sequencing, we have identified a high-quality set of 1.6 million indels from 179 individuals representing three diverse human populations. We show that rates of indel mutagenesis are highly heterogeneous, with 43%-48% of indels occurring in 4.03% of the genome, whereas in the remaining 96% their prevalence is 16 times lower than SNPs. Polymerase slippage can explain upwards of three-fourths of all indels, with the remainder being mostly simple deletions in complex sequence. However, insertions do occur and are significantly associated with pseudo-palindromic sequence features compatible with the fork stalling and template switching (FoSTeS) mechanism more commonly associated with large structural variations. We introduce a quantitative model of polymerase slippage, which enables us to identify indel-hypermutagenic protein-coding genes, some of which are associated with recurrent mutations leading to disease. Accounting for mutational rate heterogeneity due to sequence context, we find that indels across functional sequence are generally subject to stronger purifying selection than SNPs. We find that indel length modulates selection strength, and that indels affecting multiple functionally constrained nucleotides undergo stronger purifying selection. We further find that indels are enriched in associations with gene expression and find evidence for a contribution of nonsense-mediated decay. Finally, we show that indels can be integrated in existing genome-wide association studies (GWAS); although we do not find direct evidence that potentially causal protein-coding indels are enriched with associations to known disease-associated SNPs, our findings suggest that the causal variant underlying some of these associations may be indels.
Subjects
Evolution, Molecular , Genome, Human , INDEL Mutation/genetics , Genetics, Population , Genome-Wide Association Study , High-Throughput Nucleotide Sequencing , Humans , Mutagenesis, Insertional , Mutation Rate , Polymorphism, Single Nucleotide
ABSTRACT
Anaemia is a chief determinant of global ill health, contributing to cognitive impairment, growth retardation and impaired physical capacity. To understand further the genetic factors influencing red blood cells, we carried out a genome-wide association study of haemoglobin concentration and related parameters in up to 135,367 individuals. Here we identify 75 independent genetic loci associated with one or more red blood cell phenotypes at P < 10⁻⁸, which together explain 4-9% of the phenotypic variance per trait. Using expression quantitative trait loci and bioinformatic strategies, we identify 121 candidate genes enriched in functions relevant to red blood cell biology. The candidate genes are expressed preferentially in red blood cell precursors, and 43 have haematopoietic phenotypes in Mus musculus or Drosophila melanogaster. Through open-chromatin and coding-variant analyses we identify potential causal genetic variants at 41 loci. Our findings provide extensive new insights into genetic mechanisms and biological pathways controlling red blood cell formation and function.
Subjects
Erythrocytes/metabolism , Genetic Loci , Genome-Wide Association Study , Phenotype , Animals , Cell Cycle/genetics , Cytokines/metabolism , Drosophila melanogaster/genetics , Erythrocytes/cytology , Female , Gene Expression Regulation/genetics , Hematopoiesis/genetics , Hemoglobins/genetics , Humans , Male , Mice , Organ Specificity , Polymorphism, Single Nucleotide/genetics , RNA Interference , Signal Transduction/genetics
ABSTRACT
Genome-sequencing studies indicate that all humans carry many genetic variants predicted to cause loss of function (LoF) of protein-coding genes, suggesting unexpected redundancy in the human genome. Here we apply stringent filters to 2951 putative LoF variants obtained from 185 human genomes to determine their true prevalence and properties. We estimate that human genomes typically contain ~100 genuine LoF variants with ~20 genes completely inactivated. We identify rare and likely deleterious LoF alleles, including 26 known and 21 predicted severe disease-causing variants, as well as common LoF variants in nonessential genes. We describe functional and evolutionary differences between LoF-tolerant and recessive disease genes and a method for using these differences to prioritize candidate genes found in clinical sequencing studies.
Subjects
Genetic Variation , Genome, Human , Proteins/genetics , Disease/genetics , Gene Expression , Gene Frequency , Humans , Phenotype , Polymorphism, Single Nucleotide , Selection, Genetic
ABSTRACT
The exon-junction complex (EJC) performs essential RNA processing tasks. Here, we describe the first human disorder, thrombocytopenia with absent radii (TAR), caused by deficiency in one of the four EJC subunits. Compound inheritance of a rare null allele and one of two low-frequency SNPs in the regulatory regions of RBM8A, encoding the Y14 subunit of EJC, causes TAR. We found that this inheritance mechanism explained 53 of 55 cases (P < 5 × 10⁻²²⁸) of the rare congenital malformation syndrome. Of the 53 cases with this inheritance pattern, 51 carried a submicroscopic deletion of 1q21.1 that has previously been associated with TAR, and two carried a truncation or frameshift null mutation in RBM8A. We show that the two regulatory SNPs result in diminished RBM8A transcription in vitro and that Y14 expression is reduced in platelets from individuals with TAR. Our data implicate Y14 insufficiency and, presumably, an EJC defect as the cause of TAR syndrome.
Subjects
Genetic Predisposition to Disease , RNA-Binding Proteins/genetics , Thrombocytopenia/genetics , Upper Extremity Deformities, Congenital/genetics , 5' Untranslated Regions/genetics , Adolescent , Adult , Amino Acid Sequence , Animals , Base Sequence , Child , Child, Preschool , Congenital Bone Marrow Failure Syndromes , Female , Genetic Variation , Humans , Infant , Infant, Newborn , Male , Mutation , Platelet Count , Polymorphism, Single Nucleotide , Radius/abnormalities , Sequence Alignment , Sequence Analysis, DNA , Thrombocytopenia/congenital , Young Adult , Zebrafish/genetics
ABSTRACT
Gray platelet syndrome (GPS) is a predominantly recessive platelet disorder that is characterized by mild thrombocytopenia with large platelets and a paucity of α-granules; these abnormalities cause mostly moderate but in rare cases severe bleeding. We sequenced the exomes of four unrelated individuals and identified NBEAL2 as the causative gene; it has no previously known function but is a member of a gene family that is involved in granule development. Silencing of nbeal2 in zebrafish abrogated thrombocyte formation.
Subjects
Blood Platelets/metabolism , Cytoplasmic Granules/metabolism , Gray Platelet Syndrome/genetics , Nerve Tissue Proteins/genetics , Secretory Vesicles/metabolism , Adult , Aged , Animals , Animals, Genetically Modified , Base Sequence , Blood Platelets/pathology , Embryo, Nonmammalian/cytology , Embryo, Nonmammalian/metabolism , Female , Gene Expression Regulation, Developmental , Humans , Male , Middle Aged , Molecular Sequence Data , Nerve Tissue Proteins/antagonists & inhibitors , Pedigree , Sequence Analysis, DNA , Sequence Homology, Nucleic Acid , Young Adult , Zebrafish/growth & development , Zebrafish/metabolism
ABSTRACT
SUMMARY: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging and comparing, and also provides a general Perl API. AVAILABILITY: http://vcftools.sourceforge.net
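To make the record layout described in this abstract concrete, the sketch below parses the eight fixed tab-separated columns of one VCF data line and labels each alternate allele as a SNP, insertion or deletion by comparing allele lengths. It is a simplified illustration in plain Python and is unrelated to the VCFtools code base or its Perl API; the function names `parse_vcf_line` and `classify_alleles` are hypothetical, and header lines, genotype columns and compressed input are deliberately left out.

```python
def parse_vcf_line(line):
    """Split one VCF data line into its eight fixed columns."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, var_id, ref, alt, qual, filt, info = fields[:8]

    # INFO is a semicolon-separated list of KEY=VALUE pairs and bare flags.
    info_dict = {}
    for item in info.split(";"):
        if item == ".":
            continue  # "." marks an empty INFO field
        key, sep, value = item.partition("=")
        info_dict[key] = value if sep else True

    return {
        "chrom": chrom,
        "pos": int(pos),  # 1-based position on the reference
        "id": var_id,
        "ref": ref,
        "alt": alt,
        "qual": qual,
        "filter": filt,
        "info": info_dict,
    }


def classify_alleles(ref, alt):
    """Label each comma-separated ALT allele by comparing it with REF."""
    kinds = []
    for allele in alt.split(","):
        # Symbolic alleles (<DEL>), breakends and placeholders are not
        # interpreted in this simplified sketch.
        if allele in (".", "*") or any(c in allele for c in "<>[]"):
            kinds.append("other")
        elif len(ref) == 1 and len(allele) == 1:
            kinds.append("SNP")
        elif len(allele) > len(ref):
            kinds.append("insertion")
        elif len(allele) < len(ref):
            kinds.append("deletion")
        else:
            kinds.append("other")
    return kinds


# Illustrative record (values are for demonstration only).
example = "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB;H2"
record = parse_vcf_line(example)
print(record["chrom"], record["pos"], classify_alleles(record["ref"], record["alt"]))
```

Real files are usually block-compressed and indexed, as the abstract notes, so production code would rely on a dedicated library or on VCFtools itself rather than on line splitting.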
Subjects
Genetic Variation , Genomics/methods , Information Storage and Retrieval/methods , Software , Alleles , Genome, Human , Genotype , Humans
ABSTRACT
Small insertions and deletions (indels) are a common and functionally important type of sequence polymorphism. Most of the focus of studies of sequence variation is on single nucleotide variants (SNVs) and large structural variants. In principle, high-throughput sequencing studies should allow identification of indels just as SNVs. However, inference of indels from next-generation sequence data is challenging, and so far methods for identifying indels lag behind methods for calling SNVs in terms of sensitivity and specificity. We propose a Bayesian method to call indels from short-read sequence data in individuals and populations by realigning reads to candidate haplotypes that represent alternative sequence to the reference. The candidate haplotypes are formed by combining candidate indels and SNVs identified by the read mapper, while allowing for known sequence variants or candidates from other methods to be included. In our probabilistic realignment model we account for base-calling errors, mapping errors, and also, importantly, for increased sequencing error indel rates in long homopolymer runs. We show that our method is sensitive and achieves low false discovery rates on simulated and real data sets, although challenges remain. The algorithm is implemented in the program Dindel, which has been used in the 1000 Genomes Project call sets.