ABSTRACT
Coordination of fetal maturation with birth timing is essential for mammalian reproduction. In humans, preterm birth is a disorder of profound global health significance. The signals initiating parturition in humans have remained elusive because of divergence in physiological mechanisms between humans and the model organisms typically studied. Because humans have a relatively large head size and a narrow birth canal cross-sectional area compared with other primates, we hypothesized that genes involved in parturition would display accelerated evolution along the human and/or higher primate phylogenetic lineages to decrease the length of gestation and promote delivery of a smaller fetus that transits the birth canal more readily. Further, we tested whether current variation in such accelerated genes contributes to preterm birth risk. Evidence from allometric scaling of gestational age suggests that human gestation has been shortened relative to other primates. Consistent with our hypothesis, many genes involved in reproduction show human acceleration in their coding or adjacent noncoding regions. We screened >8,400 SNPs in 150 human accelerated genes in 165 Finnish preterm and 163 control mothers for association with preterm birth. In this cohort, the most significant association was in FSHR, and 8 of the 10 most significant SNPs were in this gene. Further evidence for association of a linkage disequilibrium block of FSHR SNPs (rs11686474, rs11680730, rs12473870, and rs1247381) was found in African Americans. By considering human acceleration, we identified FSHR as a novel gene that may be associated with preterm birth. We anticipate that other human accelerated genes will similarly be associated with preterm birth risk and will help elucidate essential pathways for human parturition.
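The abstract does not state which per-SNP association test was used. As a minimal sketch, assume an allelic case-control test: a Pearson chi-square (1 degree of freedom, no continuity correction) on a 2x2 table of allele counts, cases versus controls. The function name and the allele counts in the usage line are hypothetical.

```python
import math

def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square (1 df, no continuity correction) on a 2x2 allele-count table.

    Rows: cases, controls. Columns: alternate allele, reference allele.
    Returns (chi-square statistic, p-value).
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total  # count expected under no association
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 df, P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: 165 cases give 330 alleles, 163 controls give 326 alleles.
print(allelic_chi_square(120, 210, 85, 241))
```

With more than 8,400 SNPs screened, any such per-SNP p-value would still need to be judged against a multiple-testing threshold.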
Subject(s)
Black or African American/genetics , Evolution, Molecular , Parturition/genetics , Polymorphism, Single Nucleotide , Premature Birth/genetics , Adult , Animals , Case-Control Studies , Cohort Studies , Female , Finland , Gene Frequency , Genome-Wide Association Study , Genotype , Humans , Linkage Disequilibrium , Models, Genetic , Receptors, FSH/genetics , Risk Factors , Young Adult
ABSTRACT
We report a targeted, cost-effective method to quantify rare single-nucleotide polymorphisms from pooled human genomic DNA using second-generation sequencing. We pooled DNA from 1,111 individuals and targeted four genes to identify rare germline variants. Our base-calling algorithm, SNPSeeker, derived from large deviation theory, detected single-nucleotide polymorphisms present at frequencies below the raw error rate of the sequencing platform.
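SNPSeeker's actual statistic is not given here. The sketch below shows only the generic large-deviation idea it alludes to, under two assumptions: sequencing errors at a position are independent, and they occur at one uniform rate. Under that model, the non-reference read count from errors alone is Binomial(n, e). The Chernoff bound P(X ≥ k) ≤ exp(−n·D(k/n ‖ e)) then shows how a small excess over the error rate becomes significant at high depth. For scale, a single heterozygous carrier among 1,111 pooled individuals contributes 1 of 2,222 alleles, about 0.045%. The function names and numbers are hypothetical.

```python
import math

def kl_bernoulli(a, p):
    """Relative entropy D(a || p) between Bernoulli(a) and Bernoulli(p), for 0 < a <= 1, 0 < p < 1."""
    if a == 1.0:
        return -math.log(p)
    return a * math.log(a / p) + (1.0 - a) * math.log((1.0 - a) / (1.0 - p))

def chernoff_upper_tail(k, n, error_rate):
    """Chernoff (large-deviation) upper bound on P(X >= k) for X ~ Binomial(n, error_rate).

    Under the assumed error-only model, a tiny bound means k non-reference reads
    out of n are unlikely to be explained by sequencing errors alone.
    """
    a = k / n
    if a <= error_rate:
        return 1.0  # the bound is uninformative at or below the expected error count
    return math.exp(-n * kl_bernoulli(a, error_rate))

# Same observed fraction (0.55% vs. a 0.5% error rate) at two depths:
print(chernoff_upper_tail(550, 100_000, 0.005))      # weak evidence
print(chernoff_upper_tail(5_500, 1_000_000, 0.005))  # strong evidence
```

The exponent grows linearly in depth n. This is why a variant whose frequency lies below the raw error rate can still be separated from noise once enough reads are pooled.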
Subject(s)
Algorithms , Chromosome Mapping/methods , DNA/genetics , Gene Frequency/genetics , Genetic Variation/genetics , Polymorphism, Single Nucleotide/genetics , Sequence Analysis, DNA/methods , Base Sequence , Molecular Sequence Data , Reproducibility of Results , Sensitivity and Specificity , Sequence Alignment/methods , Software
ABSTRACT
The abundance and identity of functional variation segregating in natural populations is paramount to dissecting the molecular basis of quantitative traits as well as human genetic diseases. Genome sequencing of multiple organisms of the same species provides an efficient means of cataloging rearrangements, insertion or deletion polymorphisms (InDels), and single-nucleotide polymorphisms (SNPs). While inbreeding depression and heterosis imply that a substantial amount of polymorphism is deleterious, distinguishing deleterious from neutral polymorphism remains a significant challenge. To identify deleterious and neutral DNA sequence variation within Saccharomyces cerevisiae, we sequenced the genomes of a vineyard strain and an oak tree strain and compared them to a reference genome. Among these three strains, 6% of the genome is variable, mostly attributable to variation in genome content that results from large InDels. Of the 88,000 polymorphisms identified, 93% are SNPs, and a small but significant fraction can be attributed to recent interspecific introgression and ectopic gene conversion. Relative to the reference genome, there is substantial evidence for functional variation in gene content and structure that results from large InDels, frame-shifts, and polymorphic start and stop codons. Comparison of polymorphism to divergence reveals scant evidence for positive selection but abundant evidence for deleterious SNPs. We estimate that 12% of coding and 7% of noncoding SNPs are deleterious. Based on divergence among 11 yeast species, we identified 1,666 nonsynonymous SNPs that disrupt conserved amino acids and 1,863 noncoding SNPs that disrupt conserved noncoding motifs. The deleterious coding SNPs include those known to affect quantitative traits, and a subset of the deleterious noncoding SNPs occurs in the promoters of genes that show allele-specific expression, implying that some cis-regulatory SNPs are deleterious.
Our results show that the genome sequences of both closely and distantly related species provide a means of identifying deleterious polymorphisms that disrupt functionally conserved coding and noncoding sequences.
Subject(s)
Polymorphism, Single Nucleotide , Saccharomyces cerevisiae/genetics , Base Sequence , Binding Sites , Gene Conversion , Gene Rearrangement , Genome, Fungal , Molecular Sequence Data , Phylogeny , Quercus/microbiology , Saccharomyces cerevisiae/classification , Saccharomyces cerevisiae/metabolism , Selection, Genetic , Sequence Alignment , Transcription Factors/genetics , Transcription Factors/metabolism , Vitis/microbiology
ABSTRACT
Cis-regulatory sequences are not always conserved across species. Divergence within cis-regulatory sequences may result from the evolution of species-specific patterns of gene expression or the flexible nature of the cis-regulatory code. The identification of functional divergence in cis-regulatory sequences is therefore important for both understanding the role of gene regulation in evolution and annotating regulatory elements. We have developed an evolutionary model to detect the loss of constraint on individual transcription factor binding sites (TFBSs). We find that a significant fraction of functionally constrained binding sites have been lost in a lineage-specific manner among three closely related yeast species. Binding site loss has previously been explained by turnover, where the concurrent gain and loss of a binding site maintains gene regulation. We estimate that nearly half of all loss events cannot be explained by binding site turnover. Recreating the mutations that led to binding site loss confirms that these sequence changes affect gene expression in some cases. We also estimate that there is a high rate of binding site gain, as more than half of experimentally identified S. cerevisiae binding sites are not conserved across species. The frequent gain and loss of TFBSs implies that cis-regulatory sequences are labile and, in the absence of turnover, may contribute to species-specific patterns of gene expression.
Subject(s)
Evolution, Molecular , Genetic Variation/genetics , Regulatory Sequences, Nucleic Acid/genetics , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae/genetics , Sequence Analysis, DNA/methods , Transcription Factors/genetics , Base Sequence , Binding Sites , Gene Frequency , Molecular Sequence Data , Protein Binding
ABSTRACT
BACKGROUND: Microarray technologies have evolved rapidly, enabling biologists to quantify genome-wide levels of gene expression, alternative splicing, and sequence variations for a variety of species. Analyzing and displaying these data present a significant challenge. Pathway-based approaches for analyzing microarray data have proven useful for presenting data and for generating testable hypotheses. RESULTS: To address the growing needs of the microarray community we have released version 2 of Gene Map Annotator and Pathway Profiler (GenMAPP), a new GenMAPP database schema, and integrated resources for pathway analysis. We have redesigned the GenMAPP database to support multiple gene annotations and species as well as custom species database creation for a potentially unlimited number of species. We have expanded our pathway resources by utilizing homology information to translate pathway content between species and extending existing pathways with data derived from conserved protein interactions and coexpression. We have implemented a new mode of data visualization to support analysis of complex data, including time-course, single nucleotide polymorphism (SNP), and splicing. GenMAPP version 2 also offers innovative ways to display and share data by incorporating HTML export of analyses for entire sets of pathways as organized web pages. CONCLUSION: GenMAPP version 2 provides a means to rapidly interrogate complex experimental data for pathway-level changes in a diverse range of organisms.
Subject(s)
Gene Expression Profiling/methods , Gene Expression/physiology , Models, Biological , Proteome/metabolism , Signal Transduction/physiology , Software , User-Computer Interface , Algorithms , Computer Graphics , Computer Simulation
ABSTRACT
Two different machine-learning algorithms have been used to predict the blood-brain barrier permeability of different classes of molecules, with the aim of developing a method to predict the ability of drug compounds to penetrate the CNS. The first algorithm is based on a multilayer perceptron neural network, and the second uses a support vector machine (SVM). Both algorithms are trained on an identical data set consisting of 179 CNS-active molecules and 145 CNS-inactive molecules. The training parameters include molecular weight, lipophilicity, hydrogen bonding, and other variables that govern the ability of a molecule to diffuse through a membrane. The results show that the SVM outperforms the neural network. Across 30 test sets, each comprising equal numbers of CNS-positive and CNS-negative molecules, the SVM correctly predicted up to 96% of the molecules and averaged 81.5%. This compares favorably with the neural network's average performance of 75.7% on the same 30 test sets. The results of the SVM algorithm are very encouraging and suggest that a classification tool like this one will prove to be a valuable prediction approach.
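The abstract does not give the SVM kernel, descriptor scaling, or training procedure. The sketch below is therefore a swapped-in, minimal linear SVM trained with Pegasos (stochastic sub-gradient descent on the L2-regularized hinge loss). It classifies molecules as CNS-active (+1) or CNS-inactive (−1) from numeric descriptors and reports accuracy on a labeled set. The descriptor values, function names, and hyperparameters are all hypothetical.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style training of a linear SVM.

    X: list of descriptor vectors (e.g. standardized molecular weight, lipophilicity).
    y: labels in {+1, -1} (CNS active / inactive).
    A constant 1.0 feature is appended so the bias is learned as an ordinary weight.
    """
    rng = random.Random(seed)                  # fixed seed: reproducible training order
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)              # Pegasos step-size schedule
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization (shrink) step
            if margin < 1.0:                   # hinge loss active: step toward this example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w

def predict(w, x):
    """Sign of the linear decision function (ties go to +1)."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1

def accuracy(w, X, y):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(predict(w, x) == label for x, label in zip(X, y)) / len(y)

# Hypothetical two-descriptor toy data, balanced like the abstract's test sets.
X = [[2, 1], [1, 2], [1.5, 1.5], [-2, -1], [-1, -2], [-1.5, -1.5]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print(accuracy(w, X, y))
```

Because each test set in the study is balanced, plain accuracy is a fair summary there. On imbalanced data, sensitivity and specificity would be worth reporting separately.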
Subject(s)
Algorithms , Blood-Brain Barrier/physiology , Central Nervous System/metabolism , Neural Networks, Computer , Pharmaceutical Preparations/metabolism , Central Nervous System/anatomy & histology , Molecular Structure , Permeability , Pharmaceutical Preparations/chemistry
ABSTRACT
BACKGROUND: The onset of birth in humans, as in other apes, differs from that of non-primate mammals in its endocrine physiology. We hypothesize that higher primate-specific gene evolution may lead to these differences and may target genes involved in human preterm birth, an area of global health significance. METHODS: We performed a comparative genomics screen of highly conserved noncoding elements and identified PLA2G4C, a phospholipase A isoform involved in prostaglandin biosynthesis, as human accelerated. To examine whether this gene demonstrating primate-specific evolution was associated with birth timing, we genotyped and analyzed 8 common single nucleotide polymorphisms (SNPs) in PLA2G4C in US Hispanic (n = 73 preterm, 292 control), US White (n = 147 preterm, 157 control) and US Black (n = 79 preterm, 166 control) mothers. RESULTS: Detailed structural and phylogenetic analysis of PLA2G4C suggested that a short genomic element within the gene was duplicated, specifically in primates, from a paralogous highly conserved element on chromosome 1. SNPs rs8110925 and rs2307276 in US Hispanics and rs11564620 in US Whites were significant after correcting for multiple tests (p < 0.006). Additionally, rs11564620 (Thr360Pro) was associated with increased metabolite levels of the prostaglandin thromboxane in healthy individuals (p = 0.02), suggesting that this variant may affect PLA2G4C activity. CONCLUSIONS: Our findings suggest that variation in PLA2G4C may influence preterm birth risk by increasing levels of prostaglandins, which are known to regulate labor.
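The abstract does not name its multiple-testing correction. The reported threshold p < 0.006 is consistent with a Bonferroni correction for the 8 genotyped SNPs at a family-wise α = 0.05. Assuming that correction, the per-test threshold works out as:

```latex
\alpha_{\text{per test}} = \frac{\alpha}{m} = \frac{0.05}{8} = 0.00625 \approx 0.006
```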
Subject(s)
Evolution, Molecular , Group IV Phospholipases A2/genetics , Mutagenesis, Insertional , Parturition/genetics , Premature Birth/genetics , Primates/genetics , Animals , Biosynthetic Pathways , Chromosomes, Human, Pair 19 , Humans , Introns , Phylogeny , Premature Birth/ethnology , Primates/physiology , Prostaglandins/biosynthesis
ABSTRACT
BACKGROUND: High-throughput mutagenesis of the mammalian genome is a powerful means to facilitate analysis of gene function. Gene trapping in embryonic stem cells (ESCs) is the most widely used form of insertional mutagenesis in mammals. However, the rules governing its efficiency are not fully understood, and the effects of vector design on the likelihood of gene-trapping events have not been tested on a genome-wide scale. METHODOLOGY/PRINCIPAL FINDINGS: In this study, we used public gene-trap data to model gene-trap likelihood. Using the association of gene length and gene expression with gene-trap likelihood, we constructed spline-based regression models that characterize which genes are susceptible and which genes are resistant to gene-trapping techniques. We report results for three classes of gene-trap vectors, showing that both length and expression are significant determinants of trap likelihood for all vectors. Using our models, we also quantitatively identified hotspots of gene-trap activity, which represent loci where the high likelihood of vector insertion is controlled by factors other than length and expression. These formalized statistical models describe a high proportion of the variance in the likelihood of a gene being trapped by expression-dependent vectors and a lower, but still significant, proportion of the variance for vectors that are predicted to be independent of endogenous gene expression. CONCLUSIONS/SIGNIFICANCE: The findings of significant expression and length effects reported here further the understanding of the determinants of vector insertion. Results from this analysis can be applied to help identify other determinants of this important biological phenomenon and could assist in planning large-scale mutagenesis efforts.
Subject(s)
Embryonic Stem Cells/physiology , Gene Expression , Models, Genetic , Mutagenesis, Insertional , Animals , Chromosome Mapping/methods , Exons/genetics , Genetic Vectors , Genome , Likelihood Functions , Mice , Oligonucleotide Array Sequence Analysis , Plasmids/genetics , Reverse Transcriptase Polymerase Chain Reaction
ABSTRACT
Comparative genomics provides a rapid means of identifying functional DNA elements by their sequence conservation between species. Transcription factor binding sites (TFBSs) may constitute a significant fraction of these conserved sequences, but the annotation of specific TFBSs is complicated by the fact that these short, degenerate sequences may frequently be conserved by chance rather than functional constraint. To identify intergenic sequences that function as TFBSs, we calculated the probability of binding site conservation between Saccharomyces cerevisiae and its two closest relatives under a neutral model of evolution. We found that this probability is <5% for 134 of 163 transcription factor binding motifs, implying that we can reliably annotate binding sites for the majority of these transcription factors by conservation alone. Although our annotation relies on a number of assumptions, mutations in five of five conserved Ume6 binding sites and three of four conserved Ndt80 binding sites show Ume6- and Ndt80-dependent effects on gene expression. We also found that three of five unconserved Ndt80 binding sites show Ndt80-dependent effects on gene expression. Together these data imply that although sequence conservation can be reliably used to predict functional TFBSs, unconserved sequences might also make a significant contribution to a species' biology.
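The neutral model used in the study is not specified in the abstract. As a minimal sketch under simplifying assumptions, take one IUPAC motif and a per-site neutral divergence d for each relative. Each motif position is assumed unchanged with probability 1 − d. Otherwise it is substituted uniformly to one of the 3 other bases, some of which may still satisfy a degenerate position. The two relatives are treated as independent (a star phylogeny), which ignores their shared branch. The motif, the divergence values, and the function names are hypothetical.

```python
# IUPAC codes mapped to the set of bases each one accepts.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "AT", "S": "CG",
         "R": "AG", "Y": "CT", "K": "GT", "M": "AC", "N": "ACGT"}

def p_position_conserved(n_allowed, d):
    """P(a motif position still matches) after neutral change with site divergence d.

    Unchanged with probability 1 - d; otherwise mutated uniformly to one of the
    3 other bases, of which n_allowed - 1 still satisfy the motif.
    """
    return (1.0 - d) + d * (n_allowed - 1) / 3.0

def p_site_conserved_by_chance(motif, divergences):
    """P(an entire site is conserved in every relative) under the assumed neutral model.

    Assumes positions and lineages evolve independently.
    """
    p = 1.0
    for d in divergences:
        for symbol in motif:
            p *= p_position_conserved(len(IUPAC[symbol]), d)
    return p

# Hypothetical 9-bp motif and neutral divergences to two close relatives.
print(p_site_conserved_by_chance("WGCCGCCGW", [0.30, 0.35]))
```

Under this kind of model, longer and less degenerate motifs have a lower chance of being conserved neutrally. This is the property that lets conservation alone flag a site as likely functional when the probability falls below a cutoff such as 5%.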
Subject(s)
DNA-Binding Proteins/metabolism , Evolution, Molecular , Gene Expression , Models, Genetic , Repressor Proteins/metabolism , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces/genetics , Transcription Factors/metabolism , Amino Acid Motifs/genetics , Base Sequence , Binding Sites/genetics , Computational Biology , Conserved Sequence/genetics , DNA, Intergenic/genetics , DNA-Binding Proteins/genetics , Genomics/methods , Molecular Sequence Data , Mutation/genetics , Repressor Proteins/genetics , Saccharomyces cerevisiae Proteins/genetics , Sequence Alignment , Species Specificity , Transcription Factors/genetics
ABSTRACT
BACKGROUND: Early transition to labor remains a major cause of infant mortality, yet the causes are largely unknown. Although several marker genes have been identified, little is known about the underlying global gene expression patterns and pathways that orchestrate these striking changes. RESULTS: We performed a detailed time-course study of over 9,000 genes in mouse myometrium at defined physiological states: non-pregnant, mid-gestation, late gestation, and postpartum. This dataset allowed us to identify distinct patterns of gene expression that correspond to phases of myometrial 'quiescence', 'term activation', and 'postpartum involution'. Using recently developed functional mapping tools (HOPACH (hierarchical ordered partitioning and collapsing hybrid) and GenMAPP 2.0), we have identified new potential transcriptional regulatory gene networks mediating the transition from quiescence to term activation. CONCLUSIONS: These results implicate the myometrium as an essential regulator of endocrine hormone (cortisol and progesterone synthesis) and signaling pathways (cyclic AMP and cyclic GMP stimulation) that direct quiescence via the transcriptional upregulation of both novel and previously associated regulators. With term activation, we observe the upregulation of cytoskeletal remodeling mediators (intermediate filaments), cell junctions, transcriptional regulators, and the coordinate downregulation of negative control checkpoints of smooth muscle contractile signaling. This analysis provides new evidence of multiple parallel mechanisms of uterine contractile regulation and presents new putative targets for regulating myometrial transformation and contraction.
Subject(s)
Gene Expression Regulation , Myometrium/metabolism , Uterine Contraction/genetics , Animals , Cluster Analysis , Female , Gene Expression Profiling , Gestational Age , Heterotrimeric GTP-Binding Proteins/metabolism , Kinetics , Mice , Parturition , Postpartum Period/genetics , Postpartum Period/metabolism , Pregnancy , RNA Processing, Post-Transcriptional , RNA, Messenger/metabolism , Signal Transduction , Transcription, Genetic , Uterine Contraction/metabolism
ABSTRACT
MAPPFinder is a tool that creates a global gene-expression profile across all areas of biology by integrating the annotations of the Gene Ontology (GO) Project with the free software package GenMAPP http://www.GenMAPP.org. The results are displayed in a searchable browser, allowing the user to rapidly identify GO terms with over-represented numbers of gene-expression changes. Clicking on a GO term generates GenMAPP graphical files in which gene relationships can be explored and annotated, and these files can be freely exchanged.
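One way to rank GO terms by over-representation is a standardized difference score. It compares the number of changed genes in a term with the number expected by chance, using the hypergeometric variance. The sketch below uses that form. Treat it as an assumption: it is commonly described for MAPPFinder but is not stated in this abstract. The function name and counts are hypothetical.

```python
import math

def go_term_z_score(r, n, R, N):
    """Standardized over-representation score for one GO term.

    N: genes measured on the array; R: genes changed overall;
    n: measured genes annotated to the term; r: changed genes in the term.
    Uses the hypergeometric variance (sampling n of N without replacement).
    Requires 1 <= n < N and 0 < R < N so the variance is positive.
    """
    frac = R / N
    expected = n * frac
    variance = n * frac * (1.0 - frac) * (1.0 - (n - 1) / (N - 1))
    return (r - expected) / math.sqrt(variance)

# Hypothetical term: 8 of its 20 genes changed, versus 100 of 1,000 genes overall.
print(go_term_z_score(8, 20, 100, 1000))
```

A large positive score flags a term with more expression changes than expected. Because the score is only approximately normal for small terms, a permutation-based or exact hypergeometric p-value is a common complement.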