Results 1 - 20 of 24
1.
Genet Sel Evol ; 52(1): 26, 2020 May 15.
Article in English | MEDLINE | ID: mdl-32414320

ABSTRACT

BACKGROUND: Estimating the genetic component of a complex phenotype is a complicated problem, mainly because there are many allele effects to estimate from a limited number of phenotypes. In spite of this difficulty, linear methods with variable selection have been able to give good predictions of additive effects of individuals. However, prediction of non-additive genetic effects is challenging with the usual prediction methods. In machine learning, non-additive relations between inputs can be modeled with neural networks. We developed a novel method (NetSparse) that uses Bayesian neural networks with variable selection for the prediction of genotypic values of individuals, including non-additive genetic effects. RESULTS: We simulated several populations with different phenotypic models and compared NetSparse to genomic best linear unbiased prediction (GBLUP), BayesB, their dominance variants, and an additive by additive method. We found that when the number of QTL was relatively small (10 or 100), NetSparse had 2 to 28 percentage points higher accuracy than the reference methods. For scenarios that included dominance or epistatic effects, NetSparse had 0.0 to 3.9 percentage points higher accuracy for predicting phenotypes than the reference methods, except in scenarios with extreme overdominance, for which reference methods that explicitly model dominance had 6 percentage points higher accuracy than NetSparse. CONCLUSIONS: Bayesian neural networks with variable selection are promising for prediction of the genetic component of complex traits in animal breeding, and their performance is robust across different genetic models. However, their large computational costs can hinder their use in practice.


Subject(s)
Forecasting/methods; Multifactorial Inheritance/genetics; Phenotype; Algorithms; Alleles; Animals; Bayes Theorem; Gene Frequency/genetics; Genetics, Population/methods; Genomics/methods; Genotype; Humans; Models, Genetic; Neural Networks, Computer; Polymorphism, Single Nucleotide/genetics; Quantitative Trait Loci/genetics; Selection, Genetic/genetics
2.
PLoS One ; 15(5): e0231824, 2020.
Article in English | MEDLINE | ID: mdl-32357166

ABSTRACT

MOTIVATION: Cellular identity and behavior are controlled by complex gene regulatory networks. Transcription factors (TFs) bind to specific DNA sequences to regulate the transcription of their target genes. On the basis of these TF motifs in cis-regulatory elements we can model the influence of TFs on gene expression. In such models of TF motif activity the data is usually modeled assuming a linear relationship between the motif activity and the gene expression level. A commonly used method to model motif influence is based on Ridge Regression. One important assumption of linear regression is the independence between samples. However, if samples are generated from the same cell line, tissue, or other biological source, this assumption may be invalid. This same assumption of independence is also applied to different yet similar experimental conditions, which may also be inappropriate. In theory, the independence assumption between samples could lead to loss in signal detection. Here we investigate whether a Bayesian model that allows for correlations results in more accurate inference of motif activities. RESULTS: We extend the Ridge Regression to a Bayesian Linear Mixed Model, which allows us to model dependence between different samples. In a simulation study, we investigate the differences between the two model assumptions. We show that our Bayesian Linear Mixed Model implementation outperforms Ridge Regression in a simulation scenario where the noise, which is the signal that cannot be explained by TF motifs, is uncorrelated. However, we demonstrate that there is no such gain in performance if the noise has a similar covariance structure over samples as the signal that can be explained by motifs. We give a mathematical explanation of why this is the case. Using four representative real datasets we show that at most ∼40% of the signal is explained by motifs using the linear model. With these data there is no advantage to using the Bayesian Linear Mixed Model, due to the similarity of the covariance structure. AVAILABILITY & IMPLEMENTATION: The project implementation is available at https://github.com/Sim19/SimGEXPwMotifs.
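
As an illustration of the ridge regression baseline that the abstract refers to, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical matrix `X` of motif scores (one row per gene, one column per motif) and a vector `y` of expression values for a single sample, and solves the standard ridge normal equations (XᵀX + λI)β = Xᵀy. The function name `ridge_fit`, the toy data and the penalty values are assumptions made for this example.

```python
def ridge_fit(X, y, lam):
    """Return ridge coefficients by solving (X^T X + lam * I) beta = X^T y."""
    n_features = len(X[0])
    # Normal-equation matrix A = X^T X + lam * I and right-hand side b = X^T y.
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(n_features)] for i in range(n_features)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(n_features)]
    # Forward elimination with partial pivoting.
    for col in range(n_features):
        pivot = max(range(col, n_features), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n_features):
            factor = A[r][col] / A[col][col]
            for c in range(col, n_features):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    beta = [0.0] * n_features
    for i in reversed(range(n_features)):
        s = sum(A[i][j] * beta[j] for j in range(i + 1, n_features))
        beta[i] = (b[i] - s) / A[i][i]
    return beta


# Hypothetical toy data: 3 genes, 2 motifs (values are illustrative only).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
unpenalized = ridge_fit(X, y, lam=0.0)  # ordinary least squares
penalized = ridge_fit(X, y, lam=1.0)    # coefficients shrunk toward zero
print(unpenalized, penalized)
```

The penalty shrinks each estimated motif activity toward zero. This sketch fits one sample at a time and therefore treats samples as independent, which is the assumption the abstract questions.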


Subject(s)
Amino Acid Motifs/genetics; Bayes Theorem; Gene Expression Regulation/genetics; Transcription Factors/genetics; Chromatin Immunoprecipitation Sequencing/methods; Computer Simulation; Gene Regulatory Networks; Linear Models; Regulatory Elements, Transcriptional/genetics; Transcription Factors/chemistry
3.
Nucleic Acids Res ; 47(11): 5587-5602, 2019 06 20.
Article in English | MEDLINE | ID: mdl-31049588

ABSTRACT

Remodeling of chromatin accessibility is necessary for successful reprogramming of fibroblasts to neurons. However, it is still not fully known which transcription factors can induce a neuronal chromatin accessibility profile when overexpressed in fibroblasts. To identify such transcription factors, we used ATAC-sequencing to generate differential chromatin accessibility profiles between human fibroblasts and iNeurons, an in vitro neuronal model system obtained by overexpression of Neurog2 in induced pluripotent stem cells (iPSCs). We found that the ONECUT transcription factor sequence motif was strongly associated with differential chromatin accessibility between iNeurons and fibroblasts. All three ONECUT transcription factors associated with this motif (ONECUT1, ONECUT2 and ONECUT3) induced a neuron-like morphology and expression of neuronal genes within two days of overexpression in fibroblasts. We observed widespread remodeling of chromatin accessibility; in particular, we found that chromatin regions that contain the ONECUT motif were inaccessible or lowly accessible in fibroblasts and became accessible after the overexpression of ONECUT1, ONECUT2 or ONECUT3. There was substantial overlap with iNeurons; still, many regions that gained accessibility following ONECUT overexpression were not accessible in iNeurons. Our study highlights both the potential and challenges of ONECUT-based direct neuronal reprogramming.


Subject(s)
Cellular Reprogramming; Chromatin/genetics; Gene Expression Regulation; Induced Pluripotent Stem Cells/metabolism; Neurons/metabolism; Onecut Transcription Factors/genetics; Cell Differentiation; Cell Line; Chromatin/metabolism; Fibroblasts/cytology; Fibroblasts/metabolism; Gene Expression Profiling; Gene Ontology; Hepatocyte Nuclear Factor 6/genetics; Hepatocyte Nuclear Factor 6/metabolism; Homeodomain Proteins; Humans; Induced Pluripotent Stem Cells/cytology; Neurons/cytology; Onecut Transcription Factors/metabolism; Transcription Factors
4.
Nat Commun ; 9(1): 2384, 2018 06 19.
Article in English | MEDLINE | ID: mdl-29921844

ABSTRACT

Cell-based small molecule screening is an effective strategy leading to new medicines. Scientists in the pharmaceutical industry as well as in academia have made tremendous progress in developing both large-scale and smaller-scale screening assays. However, an accessible and universal technology for measuring large numbers of molecular and cellular phenotypes in many samples in parallel is not available. Here we present the immuno-detection by sequencing (ID-seq) technology that combines antibody-based protein detection and DNA-sequencing via DNA-tagged antibodies. We use ID-seq to simultaneously measure 70 (phospho-)proteins in primary human epidermal stem cells to screen the effects of ~300 kinase inhibitor probes to characterise the role of 225 kinases. The results show an association between decreased mTOR signalling and increased differentiation and uncover 13 kinases potentially regulating epidermal renewal through distinct mechanisms. Taken together, our work establishes ID-seq as a flexible solution for large-scale high-dimensional phenotyping in fixed cell populations.


Subject(s)
Antibodies/metabolism; Immunoassay/methods; Proteome/metabolism; Proteomics/methods; Sequence Analysis, DNA/methods; Antibodies/immunology; Cell Differentiation/genetics; Cells, Cultured; Epidermal Cells/cytology; Gene Expression Profiling; Humans; Keratinocytes/cytology; Keratinocytes/metabolism; Phenotype; Proteome/genetics; Proteome/immunology; Signal Transduction/genetics; Stem Cells/metabolism
5.
Nat Commun ; 8: 15190, 2017 05 05.
Article in English | MEDLINE | ID: mdl-28474677

ABSTRACT

Whole-transcriptome or RNA sequencing (RNA-Seq) is a powerful and versatile tool for functional analysis of different types of RNA molecules, but sample reagent and sequencing cost can be prohibitive for hypothesis-driven studies where the aim is to quantify differential expression of a limited number of genes. Here we present an approach for quantification of differential mRNA expression by targeted resequencing of complementary DNA using single-molecule molecular inversion probes (cDNA-smMIPs) that enable highly multiplexed resequencing of cDNA target regions of ∼100 nucleotides and counting of individual molecules. We show that accurate estimates of differential expression can be obtained from molecule counts for hundreds of smMIPs per reaction and that smMIPs are also suitable for quantification of relative gene expression and allele-specific expression. Compared with low-coverage RNA-Seq and a hybridization-based targeted RNA-Seq method, cDNA-smMIPs are a cost-effective high-throughput tool for hypothesis-driven expression analysis in large numbers of genes (10 to 500) and samples (hundreds to thousands).
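
The idea of counting individual molecules rather than reads can be sketched as follows. This is a hedged illustration, not the published pipeline: it assumes each read has already been reduced to a hypothetical `(probe_id, molecular_tag)` pair, and that reads sharing both values are copies of the same captured molecule. The function name and the toy reads are invented for this example.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count distinct molecular tags per probe from (probe_id, tag) pairs."""
    tags_per_probe = defaultdict(set)
    for probe_id, tag in reads:
        # Duplicate reads of one molecule share the same tag and collapse here.
        tags_per_probe[probe_id].add(tag)
    return {probe: len(tags) for probe, tags in tags_per_probe.items()}


# Hypothetical reads: GENE_A has one duplicated read, so 3 reads give 2 molecules.
reads = [
    ("GENE_A", "ACGT"),
    ("GENE_A", "ACGT"),
    ("GENE_A", "TTGA"),
    ("GENE_B", "CCAT"),
]
print(count_molecules(reads))
```

Differential expression would then be estimated from these per-probe molecule counts across samples. A real workflow would also need to handle sequencing errors in the tags, which this sketch ignores.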


Subject(s)
DNA, Complementary/genetics; Exome Sequencing/methods; Gene Expression Profiling/methods; High-Throughput Nucleotide Sequencing/methods; Sequence Analysis, RNA/methods; Cell Line, Transformed; Cell Transformation, Viral; HEK293 Cells; Humans; RNA, Messenger/biosynthesis
6.
J Vis Exp ; (119)2017 01 08.
Article in English | MEDLINE | ID: mdl-28117798

ABSTRACT

Neurons derived from human induced Pluripotent Stem Cells (hiPSCs) provide a promising new tool for studying neurological disorders. In the past decade, many protocols for differentiating hiPSCs into neurons have been developed. However, these protocols are often slow with high variability, low reproducibility, and low efficiency. In addition, the neurons obtained with these protocols are often immature and lack adequate functional activity both at the single-cell and network levels unless the neurons are cultured for several months. Partially due to these limitations, the functional properties of hiPSC-derived neuronal networks are still not well characterized. Here, we adapt a recently published protocol that describes production of human neurons from hiPSCs by forced expression of the transcription factor neurogenin-2. This protocol is rapid (yielding mature neurons within 3 weeks) and efficient, with nearly 100% conversion efficiency of transduced cells (>95% of DAPI-positive cells are MAP2 positive). Furthermore, the protocol yields a homogeneous population of excitatory neurons that would allow the investigation of cell-type specific contributions to neurological disorders. We modified the original protocol by generating stably transduced hiPSC cells, giving us explicit control over the total number of neurons. These cells are then used to generate hiPSC-derived neuronal networks on micro-electrode arrays. In this way, the spontaneous electrophysiological activity of hiPSC-derived neuronal networks can be measured and characterized, while retaining interexperimental consistency in terms of cell density. The presented protocol is broadly applicable, especially for mechanistic and pharmacological studies on human neuronal networks.


Subject(s)
Cell Differentiation; Induced Pluripotent Stem Cells/metabolism; Microarray Analysis; Neurons/metabolism; Basic Helix-Loop-Helix Transcription Factors/genetics; Basic Helix-Loop-Helix Transcription Factors/metabolism; Cell Line; Cellular Reprogramming; Fibroblasts/cytology; Genetic Vectors/genetics; Genetic Vectors/metabolism; Humans; Induced Pluripotent Stem Cells/cytology; Lentivirus/genetics; Microelectrodes; Nerve Tissue Proteins/genetics; Nerve Tissue Proteins/metabolism; Neurogenesis; Neurons/cytology; Transcription Factors/genetics; Transcription Factors/metabolism
7.
Cell ; 167(5): 1398-1414.e24, 2016 11 17.
Article in English | MEDLINE | ID: mdl-27863251

ABSTRACT

Characterizing the multifaceted contribution of genetic and epigenetic factors to disease phenotypes is a major challenge in human genetics and medicine. We carried out high-resolution genetic, epigenetic, and transcriptomic profiling in three major human immune cell types (CD14+ monocytes, CD16+ neutrophils, and naive CD4+ T cells) from up to 197 individuals. We assess, quantitatively, the relative contribution of cis-genetic and epigenetic factors to transcription and evaluate their impact as potential sources of confounding in epigenome-wide association studies. Further, we characterize highly coordinated genetic effects on gene expression, methylation, and histone variation through quantitative trait locus (QTL) mapping and allele-specific (AS) analyses. Finally, we demonstrate colocalization of molecular trait QTLs at 345 unique immune disease loci. This expansive, high-resolution atlas of multi-omics changes yields insights into cell-type-specific correlation between diverse genomic inputs, more generalizable correlations between these inputs, and defines molecular events that may underpin complex disease risk.


Subject(s)
Epigenomics; Immune System Diseases/genetics; Monocytes/metabolism; Neutrophils/metabolism; T-Lymphocytes/metabolism; Transcription, Genetic; Adult; Aged; Alternative Splicing; Female; Genetic Predisposition to Disease; Hematopoietic Stem Cells/metabolism; Histone Code; Humans; Male; Middle Aged; Quantitative Trait Loci; Young Adult
9.
Genome Biol ; 16: 149, 2015 Aug 03.
Article in English | MEDLINE | ID: mdl-26235224

ABSTRACT

BACKGROUND: During early embryonic development, one of the two X chromosomes in mammalian female cells is inactivated to compensate for a potential imbalance in transcript levels with male cells, which contain a single X chromosome. Here, we use mouse female embryonic stem cells (ESCs) with non-random X chromosome inactivation (XCI) and polymorphic X chromosomes to study the dynamics of gene silencing over the inactive X chromosome by high-resolution allele-specific RNA-seq. RESULTS: Induction of XCI by differentiation of female ESCs shows that genes proximal to the X-inactivation center are silenced earlier than distal genes, while lowly expressed genes show faster XCI dynamics than highly expressed genes. The active X chromosome shows a minor but significant increase in gene activity during differentiation, resulting in complete dosage compensation in differentiated cell types. Genes escaping XCI show little or no silencing during early propagation of XCI. Allele-specific RNA-seq of neural progenitor cells generated from the female ESCs identifies three regions distal to the X-inactivation center that escape XCI. These regions, which stably escape during propagation and maintenance of XCI, coincide with topologically associating domains (TADs) as present in the female ESCs. Also, the previously characterized gene clusters escaping XCI in human fibroblasts correlate with TADs. CONCLUSIONS: The gene silencing observed during XCI provides further insight into the establishment of the repressive complex formed by the inactive X chromosome. The association of escape regions with TADs, in mouse and human, suggests that TADs are the primary targets during propagation of XCI over the X chromosome.


Subject(s)
Gene Silencing; X Chromosome Inactivation; Alleles; Animals; Chromatin/chemistry; Embryoid Bodies/metabolism; Embryonic Stem Cells/metabolism; Female; Humans; Mice; Neural Stem Cells/metabolism; Sequence Analysis, RNA
10.
Ann Rheum Dis ; 74(12): 2183-7, 2015 Dec.
Article in English | MEDLINE | ID: mdl-25114059

ABSTRACT

OBJECTIVES: Pharmacogenetic studies of tumour necrosis factor inhibitors (TNFi) response in patients with rheumatoid arthritis (RA) have largely relied on the changes in complex disease scores, such as disease activity score 28 (DAS28), as a measure of treatment response. It is expected that the genetic architecture of such a complex score is heterogeneous and not very suitable for pharmacogenetic studies. We aimed to select the optimal phenotype for TNFi response using heritability estimates. METHODS: Using two linear mixed-modelling approaches (Bayz and GCTA), we estimated heritability, together with genomic and environmental correlations for the TNFi drug-response phenotype ΔDAS28 and its separate components: Δ swollen joint count (SJC), Δ tender joint count (TJC), Δ erythrocyte sedimentation rate (ESR) and Δ visual-analogue scale of general health (VAS-GH). For this, we used genome-wide single nucleotide polymorphism (SNP) data from 878 TNFi-treated Dutch patients with RA. Furthermore, a multivariate genome-wide association study (GWAS) approach was implemented, analysing separate DAS28 components simultaneously. RESULTS: The highest heritability estimates were found for ΔSJC (h²g(Bayz)=0.76 and h²g(GCTA)=0.87) and ΔTJC (h²g(Bayz)=0.62 and h²g(GCTA)=0.82); lower heritability was found for ΔDAS28 (h²g(Bayz)=0.59 and h²g(GCTA)=0.71) while estimates for ΔESR and ΔVASGH were near or equal to zero. The highest genomic correlations were observed for ΔSJC and ΔTJC (0.49), and the highest environmental correlation was seen between ΔTJC and ΔVASGH (0.62). The multivariate GWAS did not generate an excess of low p values as compared with a univariate analysis of ΔDAS28. CONCLUSIONS: Our results indicate that multiple SNPs together explain a substantial portion of the variation in change in joint counts in TNFi-treated patients with RA. In conclusion, of the outcomes studied, the joint counts are most suitable for TNFi pharmacogenetics in RA.
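
For readers unfamiliar with the quantity being reported, SNP-based heritability in a linear mixed model is commonly defined as the share of phenotypic variance attributed to the genomic component, h²g = σ²g / (σ²g + σ²e). The sketch below only illustrates that ratio. The variance components passed in are hypothetical numbers, not values from the study, and the function does not reproduce the interface of Bayz or GCTA.

```python
def heritability(var_genomic, var_residual):
    """Return var_genomic / (var_genomic + var_residual)."""
    total = var_genomic + var_residual
    if total <= 0:
        raise ValueError("total variance must be positive")
    return var_genomic / total


# Hypothetical variance components, chosen only to show the arithmetic.
h2 = heritability(var_genomic=3.0, var_residual=1.0)
print(h2)
```

In practice the two variance components are themselves estimated from genotype and phenotype data, for example by restricted maximum likelihood or by Bayesian sampling, and carry their own uncertainty.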


Subject(s)
Antirheumatic Agents/therapeutic use; Arthritis, Rheumatoid/genetics; DNA/genetics; Genome-Wide Association Study; Polymorphism, Genetic; Tumor Necrosis Factor-alpha/antagonists & inhibitors; Adult; Aged; Arthritis, Rheumatoid/drug therapy; Arthritis, Rheumatoid/metabolism; Female; Genotype; Humans; Male; Middle Aged; Phenotype; Severity of Illness Index
11.
Nat Genet ; 45(5): 542-545, 2013 May.
Article in English | MEDLINE | ID: mdl-23563608

ABSTRACT

The blood group Vel was discovered 60 years ago, but the underlying gene is unknown. Individuals negative for the Vel antigen are rare and are required for the safe transfusion of patients with antibodies to Vel. To identify the responsible gene, we sequenced the exomes of five individuals negative for the Vel antigen and found that four were homozygous and one was heterozygous for a low-frequency 17-nucleotide frameshift deletion in the gene encoding the 78-amino-acid transmembrane protein SMIM1. A follow-up study showing that 59 of 64 Vel-negative individuals were homozygous for the same deletion and expression of the Vel antigen on SMIM1-transfected cells confirm SMIM1 as the gene underlying the Vel blood group. An expression quantitative trait locus (eQTL), the common SNP rs1175550, contributes to variable expression of the Vel antigen (P = 0.003) and influences the mean hemoglobin concentration of red blood cells (RBCs; P = 8.6 × 10⁻¹⁵). In vivo, zebrafish with smim1 knockdown showed a mild reduction in the number of RBCs, identifying SMIM1 as a new regulator of RBC formation. Our findings are of immediate relevance, as the homozygous presence of the deletion allows the unequivocal identification of Vel-negative blood donors.


Subject(s)
Blood Group Antigens/genetics; Erythrocyte Membrane/metabolism; Erythrocytes/immunology; Gene Deletion; Homozygote; Membrane Proteins/genetics; Quantitative Trait Loci; Alleles; Animals; Biomarkers/metabolism; Blood Group Antigens/immunology; Blood Group Antigens/metabolism; Electrophoretic Mobility Shift Assay; Erythrocytes/metabolism; Erythrocytes/pathology; Exome/genetics; Female; Gene Expression Profiling; Gene Regulatory Networks; Humans; Isoantibodies/immunology; Membrane Proteins/immunology; Membrane Proteins/metabolism; Molecular Sequence Data; Oligonucleotide Array Sequence Analysis; Pregnancy; Zebrafish/genetics
12.
Curr Opin Genet Dev ; 23(3): 316-23, 2013 Jun.
Article in English | MEDLINE | ID: mdl-23602329

ABSTRACT

Thrombocytopenia with absent radii (TAR) syndrome is a rare disorder combining specific skeletal abnormalities with a reduced platelet count. Rare proximal microdeletions of 1q21.1 are found in the majority of patients but are also found in unaffected parents. Recently it was shown that TAR syndrome is caused by the compound inheritance of a low-frequency noncoding SNP and a rare null allele in RBM8A, a gene encoding the exon-junction complex subunit member Y14 located in the deleted region. This finding provides new insight into the complex inheritance pattern and new clues to the molecular mechanisms underlying TAR syndrome. We discuss TAR syndrome in the context of abnormal phenotypes associated with proximal and distal 1q21.1 microdeletion and microduplications with incomplete penetrance and variable expressivity.


Subject(s)
Abnormalities, Multiple/genetics; Inheritance Patterns/genetics; Megalencephaly/genetics; Thrombocytopenia/genetics; Upper Extremity Deformities, Congenital/genetics; Abnormalities, Multiple/pathology; Chromosome Deletion; Chromosome Duplication; Chromosomes, Human, Pair 1/genetics; Congenital Bone Marrow Failure Syndromes; Humans; Megalencephaly/pathology; Phenotype; Polymorphism, Single Nucleotide; Radius/pathology; Thrombocytopenia/pathology; Upper Extremity Deformities, Congenital/pathology
13.
Genome Res ; 23(7): 1130-41, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23570689

ABSTRACT

Nearly three-quarters of the 143 genetic signals associated with platelet and erythrocyte phenotypes identified by meta-analyses of genome-wide association (GWA) studies are located at non-protein-coding regions. Here, we assessed the role of candidate regulatory variants associated with cell type-restricted, closely related hematological quantitative traits in biologically relevant hematopoietic cell types. We used formaldehyde-assisted isolation of regulatory elements followed by next-generation sequencing (FAIRE-seq) to map regions of open chromatin in three primary human blood cells of the myeloid lineage. In the precursors of platelets and erythrocytes, as well as in monocytes, we found that open chromatin signatures reflect the corresponding hematopoietic lineages of the studied cell types and associate with the cell type-specific gene expression patterns. Dependent on their signal strength, open chromatin regions showed correlation with promoter and enhancer histone marks, distance to the transcription start site, and ontology classes of nearby genes. Cell type-restricted regions of open chromatin were enriched in sequence variants associated with hematological indices. The majority (63.6%) of such candidate functional variants at platelet quantitative trait loci (QTLs) coincided with binding sites of five transcription factors key in regulating megakaryopoiesis. We experimentally tested 13 candidate regulatory variants at 10 platelet QTLs and found that 10 (76.9%) affected protein binding, suggesting that this is a frequent mechanism by which regulatory variants influence quantitative trait levels. Our findings demonstrate that combining large-scale GWA data with open chromatin profiles of relevant cell types can be a powerful means of dissecting the genetic architecture of closely related quantitative traits.


Subject(s)
Chromatin Assembly and Disassembly; Chromatin/metabolism; Genetic Variation; Quantitative Trait Loci; Quantitative Trait, Heritable; Regulatory Sequences, Nucleic Acid; Blood Platelets/metabolism; Cell Lineage/genetics; Chromosome Mapping; Cluster Analysis; Erythrocytes/metabolism; Gene Expression Regulation; Genome-Wide Association Study; Histones/metabolism; Humans; Myeloid Cells/metabolism; Nucleosomes/metabolism; Organ Specificity/genetics; Phenotype; Polymorphism, Single Nucleotide
14.
Genome Res ; 23(5): 749-61, 2013 May.
Article in English | MEDLINE | ID: mdl-23478400

ABSTRACT

Short insertions and deletions (indels) are the second most abundant form of human genetic variation, but our understanding of their origins and functional effects lags behind that of other types of variants. Using population-scale sequencing, we have identified a high-quality set of 1.6 million indels from 179 individuals representing three diverse human populations. We show that rates of indel mutagenesis are highly heterogeneous, with 43%-48% of indels occurring in 4.03% of the genome, whereas in the remaining 96% their prevalence is 16 times lower than SNPs. Polymerase slippage can explain upwards of three-fourths of all indels, with the remainder being mostly simple deletions in complex sequence. However, insertions do occur and are significantly associated with pseudo-palindromic sequence features compatible with the fork stalling and template switching (FoSTeS) mechanism more commonly associated with large structural variations. We introduce a quantitative model of polymerase slippage, which enables us to identify indel-hypermutagenic protein-coding genes, some of which are associated with recurrent mutations leading to disease. Accounting for mutational rate heterogeneity due to sequence context, we find that indels across functional sequence are generally subject to stronger purifying selection than SNPs. We find that indel length modulates selection strength, and that indels affecting multiple functionally constrained nucleotides undergo stronger purifying selection. We further find that indels are enriched in associations with gene expression and find evidence for a contribution of nonsense-mediated decay. Finally, we show that indels can be integrated in existing genome-wide association studies (GWAS); although we do not find direct evidence that potentially causal protein-coding indels are enriched with associations to known disease-associated SNPs, our findings suggest that the causal variant underlying some of these associations may be indels.


Subject(s)
Evolution, Molecular; Genome, Human; INDEL Mutation/genetics; Genetics, Population; Genome-Wide Association Study; High-Throughput Nucleotide Sequencing; Humans; Mutagenesis, Insertional; Mutation Rate; Polymorphism, Single Nucleotide
15.
Nature ; 492(7429): 369-75, 2012 Dec 20.
Article in English | MEDLINE | ID: mdl-23222517

ABSTRACT

Anaemia is a chief determinant of global ill health, contributing to cognitive impairment, growth retardation and impaired physical capacity. To understand further the genetic factors influencing red blood cells, we carried out a genome-wide association study of haemoglobin concentration and related parameters in up to 135,367 individuals. Here we identify 75 independent genetic loci associated with one or more red blood cell phenotypes at P < 10⁻⁸, which together explain 4-9% of the phenotypic variance per trait. Using expression quantitative trait loci and bioinformatic strategies, we identify 121 candidate genes enriched in functions relevant to red blood cell biology. The candidate genes are expressed preferentially in red blood cell precursors, and 43 have haematopoietic phenotypes in Mus musculus or Drosophila melanogaster. Through open-chromatin and coding-variant analyses we identify potential causal genetic variants at 41 loci. Our findings provide extensive new insights into genetic mechanisms and biological pathways controlling red blood cell formation and function.


Subject(s)
Erythrocytes/metabolism; Genetic Loci; Genome-Wide Association Study; Phenotype; Animals; Cell Cycle/genetics; Cytokines/metabolism; Drosophila melanogaster/genetics; Erythrocytes/cytology; Female; Gene Expression Regulation/genetics; Hematopoiesis/genetics; Hemoglobins/genetics; Humans; Male; Mice; Organ Specificity; Polymorphism, Single Nucleotide/genetics; RNA Interference; Signal Transduction/genetics
16.
Nat Genet ; 44(4): 435-9, S1-2, 2012 Feb 26.
Article in English | MEDLINE | ID: mdl-22366785

ABSTRACT

The exon-junction complex (EJC) performs essential RNA processing tasks. Here, we describe the first human disorder, thrombocytopenia with absent radii (TAR), caused by deficiency in one of the four EJC subunits. Compound inheritance of a rare null allele and one of two low-frequency SNPs in the regulatory regions of RBM8A, encoding the Y14 subunit of EJC, causes TAR. We found that this inheritance mechanism explained 53 of 55 cases (P < 5 × 10⁻²²⁸) of the rare congenital malformation syndrome. Of the 53 cases with this inheritance pattern, 51 carried a submicroscopic deletion of 1q21.1 that has previously been associated with TAR, and two carried a truncation or frameshift null mutation in RBM8A. We show that the two regulatory SNPs result in diminished RBM8A transcription in vitro and that Y14 expression is reduced in platelets from individuals with TAR. Our data implicate Y14 insufficiency and, presumably, an EJC defect as the cause of TAR syndrome.


Subject(s)
Genetic Predisposition to Disease; RNA-Binding Proteins/genetics; Thrombocytopenia/genetics; Upper Extremity Deformities, Congenital/genetics; 5' Untranslated Regions/genetics; Adolescent; Adult; Amino Acid Sequence; Animals; Base Sequence; Child; Child, Preschool; Congenital Bone Marrow Failure Syndromes; Female; Genetic Variation; Humans; Infant; Infant, Newborn; Male; Mutation; Platelet Count; Polymorphism, Single Nucleotide; Radius/abnormalities; Sequence Alignment; Sequence Analysis, DNA; Thrombocytopenia/congenital; Young Adult; Zebrafish/genetics
17.
Science ; 335(6070): 823-8, 2012 Feb 17.
Article in English | MEDLINE | ID: mdl-22344438

ABSTRACT

Genome-sequencing studies indicate that all humans carry many genetic variants predicted to cause loss of function (LoF) of protein-coding genes, suggesting unexpected redundancy in the human genome. Here we apply stringent filters to 2951 putative LoF variants obtained from 185 human genomes to determine their true prevalence and properties. We estimate that human genomes typically contain ~100 genuine LoF variants with ~20 genes completely inactivated. We identify rare and likely deleterious LoF alleles, including 26 known and 21 predicted severe disease-causing variants, as well as common LoF variants in nonessential genes. We describe functional and evolutionary differences between LoF-tolerant and recessive disease genes and a method for using these differences to prioritize candidate genes found in clinical sequencing studies.


Subject(s)
Genetic Variation , Human Genome , Proteins/genetics , Disease/genetics , Gene Expression , Gene Frequency , Humans , Phenotype , Single Nucleotide Polymorphism , Genetic Selection
18.
Nat Genet ; 43(8): 735-7, 2011 Jul 17.
Article in English | MEDLINE | ID: mdl-21765411

ABSTRACT

Gray platelet syndrome (GPS) is a predominantly recessive platelet disorder that is characterized by mild thrombocytopenia with large platelets and a paucity of α-granules; these abnormalities cause mostly moderate but in rare cases severe bleeding. We sequenced the exomes of four unrelated individuals and identified NBEAL2 as the causative gene; it has no previously known function but is a member of a gene family that is involved in granule development. Silencing of nbeal2 in zebrafish abrogated thrombocyte formation.


Subject(s)
Blood Platelets/metabolism , Cytoplasmic Granules/metabolism , Gray Platelet Syndrome/genetics , Nerve Tissue Proteins/genetics , Secretory Vesicles/metabolism , Adult , Aged , Animals , Genetically Modified Animals , Base Sequence , Blood Platelets/pathology , Nonmammalian Embryo/cytology , Nonmammalian Embryo/metabolism , Female , Developmental Gene Expression Regulation , Humans , Male , Middle Aged , Molecular Sequence Data , Nerve Tissue Proteins/antagonists & inhibitors , Pedigree , DNA Sequence Analysis , Nucleic Acid Sequence Homology , Young Adult , Zebrafish/growth & development , Zebrafish/metabolism
19.
Bioinformatics ; 27(15): 2156-8, 2011 Aug 01.
Article in English | MEDLINE | ID: mdl-21653522

ABSTRACT

SUMMARY: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging and comparing, and also provides a general Perl API. AVAILABILITY: http://vcftools.sourceforge.net
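
To make the record layout concrete, here is a minimal Python sketch of reading one VCF data line. It assumes the standard tab-separated fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) and ignores per-sample genotype columns. The names `VcfRecord`, `parse_vcf_line` and `variant_kind` are hypothetical helpers for illustration only; they are not part of VCFtools or its Perl API.

```python
from typing import Dict, List, NamedTuple, Union


class VcfRecord(NamedTuple):
    """Subset of one VCF data line (hypothetical container, not a VCFtools type)."""
    chrom: str
    pos: int
    variant_id: str
    ref: str
    alts: List[str]
    info: Dict[str, Union[str, bool]]


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse the fixed columns of a single tab-separated VCF data line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, variant_id, ref, alt = fields[:5]
    info: Dict[str, Union[str, bool]] = {}
    # INFO is the eighth column: semicolon-separated KEY=VALUE pairs or bare flags.
    if len(fields) > 7 and fields[7] != ".":
        for item in fields[7].split(";"):
            key, sep, value = item.partition("=")
            info[key] = value if sep else True
    # ALT may list several comma-separated alternate alleles.
    return VcfRecord(chrom, int(pos), variant_id, ref, alt.split(","), info)


def variant_kind(ref: str, alt: str) -> str:
    """Rough classification by allele length (a simplification for illustration)."""
    if alt.startswith("<"):
        return "symbolic"  # e.g. <DEL>, used for structural variants
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) < len(alt):
        return "insertion"
    if len(ref) > len(alt):
        return "deletion"
    return "other"


example = "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB;H2"
record = parse_vcf_line(example)
kinds = [variant_kind(record.ref, alt) for alt in record.alts]
```

Real files also carry `##` metadata lines and a `#CHROM` header line, which this sketch does not handle; a maintained parser is the safer choice for production use.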


Subject(s)
Genetic Variation , Genomics/methods , Information Storage and Retrieval/methods , Software , Alleles , Human Genome , Genotype , Humans
20.
Genome Res ; 21(6): 961-73, 2011 Jun.
Article in English | MEDLINE | ID: mdl-20980555

ABSTRACT

Small insertions and deletions (indels) are a common and functionally important type of sequence polymorphism. Most of the focus of studies of sequence variation is on single nucleotide variants (SNVs) and large structural variants. In principle, high-throughput sequencing studies should allow identification of indels just as SNVs. However, inference of indels from next-generation sequence data is challenging, and so far methods for identifying indels lag behind methods for calling SNVs in terms of sensitivity and specificity. We propose a Bayesian method to call indels from short-read sequence data in individuals and populations by realigning reads to candidate haplotypes that represent alternative sequence to the reference. The candidate haplotypes are formed by combining candidate indels and SNVs identified by the read mapper, while allowing for known sequence variants or candidates from other methods to be included. In our probabilistic realignment model we account for base-calling errors, mapping errors, and also, importantly, for increased sequencing error indel rates in long homopolymer runs. We show that our method is sensitive and achieves low false discovery rates on simulated and real data sets, although challenges remain. The algorithm is implemented in the program Dindel, which has been used in the 1000 Genomes Project call sets.
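
The general idea of scoring candidate haplotypes by realigning reads can be illustrated with a toy Python sketch. This is not Dindel's probabilistic model: it assumes a single uniform per-base error rate, ungapped read placement, reads no longer than the haplotypes, and no mapping-error or homopolymer terms. The function names, the sequences and the prior values are all hypothetical.

```python
import math
from typing import List


def read_log_likelihood(read: str, haplotype: str, error_rate: float = 0.01) -> float:
    """Log-likelihood of a read at its best ungapped placement on a haplotype."""
    best = -math.inf
    for start in range(len(haplotype) - len(read) + 1):
        window = haplotype[start:start + len(read)]
        ll = 0.0
        for read_base, hap_base in zip(read, window):
            # A match has probability 1 - e; a mismatch to a given base has e / 3.
            if read_base == hap_base:
                ll += math.log(1.0 - error_rate)
            else:
                ll += math.log(error_rate / 3.0)
        best = max(best, ll)
    return best


def haplotype_posteriors(reads: List[str], haplotypes: List[str],
                         priors: List[float], error_rate: float = 0.01) -> List[float]:
    """Posterior over candidate haplotypes: prior times product of read likelihoods."""
    log_post = [
        math.log(prior) + sum(read_log_likelihood(r, hap, error_rate) for r in reads)
        for hap, prior in zip(haplotypes, priors)
    ]
    # Subtract the maximum before exponentiating to avoid numerical underflow.
    top = max(log_post)
    weights = [math.exp(lp - top) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


reference = "ACGTACGTTTGCA"
deletion_hap = "ACGTACGTGCA"          # the reference with "TT" deleted (made-up example)
reads = ["TACGTGC", "CGTGCA"]         # made-up reads spanning the candidate deletion
posteriors = haplotype_posteriors(reads, [reference, deletion_hap], [0.99, 0.01])
```

Even with a prior that strongly favours the reference, reads that match only the deletion haplotype without mismatches shift the posterior toward it, which is the intuition behind realigning reads to alternative candidate sequences.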


Subject(s)
Algorithms , INDEL Mutation/genetics , Genetic Models , DNA Sequence Analysis/methods , Software , Bayes Theorem , Haplotypes/genetics , Likelihood Functions