ABSTRACT
Choanoflagellates are the closest known relatives of metazoans. To discover potential molecular mechanisms underlying the evolution of metazoan multicellularity, we sequenced and analysed the genome of the unicellular choanoflagellate Monosiga brevicollis. The genome contains approximately 9,200 intron-rich genes, including a number that encode cell adhesion and signalling protein domains that are otherwise restricted to metazoans. Here we show that the physical linkages among protein domains often differ between M. brevicollis and metazoans, suggesting that abundant domain shuffling followed the separation of the choanoflagellate and metazoan lineages. The completion of the M. brevicollis genome allows us to reconstruct with increasing resolution the genomic changes that accompanied the origin of metazoans.
Subject(s)
Eukaryotic Cells/metabolism , Genome/genetics , Phylogeny , Animals , Cell Adhesion , Conserved Sequence , Eukaryotic Cells/classification , Eukaryotic Cells/cytology , Molecular Evolution , Extracellular Matrix/metabolism , Gene Expression Regulation , Genetic Speciation , Hedgehog Proteins/chemistry , Hedgehog Proteins/genetics , Humans , Introns/genetics , Phosphotyrosine/metabolism , Tertiary Protein Structure/genetics , Notch Receptors/chemistry , Notch Receptors/genetics , Signal Transduction/genetics , Transcription Factors/genetics , Transcription Factors/metabolism , Genetic Transcription
ABSTRACT
The number of sequenced plant genomes and associated genomic resources is growing rapidly, driven by both an increased focus on plant genomics from funding agencies and the application of inexpensive next-generation sequencing. To interact with this increasing body of data, we have developed Phytozome (http://www.phytozome.net), a comparative hub for plant genome and gene family data and analysis. Phytozome provides a view of the evolutionary history of every plant gene at the level of sequence, gene structure, gene family and genome organization, while also providing access to the sequences and functional annotations of a growing number (currently 25) of complete plant genomes, including all the land plants and selected algae sequenced at the Joint Genome Institute, as well as selected species sequenced elsewhere. Through a comprehensive plant genome database and web portal, these data and analyses are available to the broader plant science research community, providing powerful comparative genomics tools that help to link model systems with other plants of economic and ecological importance.
Subject(s)
Nucleic Acid Databases , Plant Genes , Plant Genome , Genomics , Multigene Family , Software
ABSTRACT
Clustering is a popular technique commonly used to search for groups of similarly expressed genes using mRNA expression data. There are many different clustering algorithms, and the application of each one will usually produce different results. Without additional evaluation, it is difficult to determine which solutions are better. In this chapter we discuss methods to assess algorithms for clustering of gene expression data. In particular, we present a new method that uses two elements: an internal index of validity based on the MDL principle and an external index of validity that measures the consistency with experimental data. Each one is used to suggest an effective set of models, but it is only the combination of both that is capable of pinpointing the best model overall. Our method can be used to compare different clustering algorithms and pick the one that maximizes the correlation with functional links in gene networks while minimizing the error rate. We test our methods on several popular clustering algorithms as well as on clustering algorithms that are specially tailored to deal with noisy data. Finally, we propose methods for assessing the significance of individual clusters and study the correspondence between gene clusters and biochemical pathways.
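As a rough illustration of what an external index of validity can look like, the sketch below scores a clustering by how well its co-clustered gene pairs agree with a list of known functional links. This is a generic pairwise precision/recall measure chosen for illustration, not the index defined in the chapter, and it does not cover the MDL-based internal index. The function names, gene names, and toy data are all hypothetical.

```python
from itertools import combinations


def co_clustered_pairs(labels):
    """Return the set of unordered gene pairs assigned to the same cluster."""
    genes = sorted(labels)
    return {
        frozenset((a, b))
        for a, b in combinations(genes, 2)
        if labels[a] == labels[b]
    }


def external_index(labels, known_links):
    """Pairwise precision and recall of a clustering against known links.

    precision: fraction of co-clustered pairs that are known links
    recall:    fraction of known links that end up co-clustered
    """
    pairs = co_clustered_pairs(labels)
    links = {frozenset(p) for p in known_links}
    hits = len(pairs & links)
    precision = hits / len(pairs) if pairs else 0.0
    recall = hits / len(links) if links else 0.0
    return precision, recall


# Hypothetical toy data: two candidate clusterings of four genes and
# two experimentally supported functional links.
labels_a = {"g1": 0, "g2": 0, "g3": 1, "g4": 1}
labels_b = {"g1": 0, "g2": 1, "g3": 0, "g4": 1}
known_links = [("g1", "g2"), ("g3", "g4")]

score_a = external_index(labels_a, known_links)
score_b = external_index(labels_b, known_links)
```

Under this toy measure, the clustering that keeps linked genes together scores higher on both precision and recall, which is the kind of comparison an external index is meant to support when choosing between algorithms.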
Subject(s)
Algorithms , Computational Biology/methods , Gene Expression Profiling/methods , Multigene Family , Oligonucleotide Array Sequence Analysis/methods , Animals , Bayes Theorem , Cluster Analysis , Genetic Databases , Humans , Multigene Family/physiology , Reproducibility of Results
ABSTRACT
Common single-nucleotide polymorphisms (SNPs) at nicotinic acetylcholine receptor (nAChR) subunit genes have previously been associated with measures of nicotine dependence. We investigated the contribution of common SNPs and rare single-nucleotide variants (SNVs) in nAChR genes to Fagerström test for nicotine dependence (FTND) scores in treatment-seeking smokers. Exons of 10 genes were resequenced with next-generation sequencing technology in 448 European-American participants of a smoking cessation trial, and CHRNB2 and CHRNA4 were resequenced by Sanger technology to improve sequence coverage. A total of 214 SNP/SNVs were identified, of which 19.2% were excluded from analyses because of reduced completion rate, 73.9% had minor allele frequencies <5%, and 48.1% were novel relative to dbSNP build 129. We tested associations of 173 SNP/SNVs with the FTND score using data obtained from 430 individuals (18 were excluded because of reduced completion rate), using linear regression for common SNPs, the cohort allelic sum test and the weighted sum statistic for rare SNVs, and the multivariate distance matrix regression method for both common and rare SNP/SNVs. Association testing with common SNPs, with adjustment for correlated tests within each gene, identified a significant association with two CHRNB2 SNPs; for example, the minor allele of rs2072660 increased the mean FTND score by 0.6 units (P=0.01). We observed significant evidence for association with the FTND score of common and rare SNP/SNVs at CHRNA5 and CHRNB2, and of rare SNVs at CHRNA4. Common and/or rare SNP/SNVs from multiple nAChR subunit genes are associated with the FTND score in this sample of treatment-seeking smokers.
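To make the two styles of analysis concrete, the following sketch shows (1) an additive linear regression of a quantitative score on minor-allele count, as is commonly done for a single common SNP, and (2) a simple collapsing step that turns several rare variants into one carrier indicator, in the spirit of allelic-sum approaches. It is a simplified illustration on made-up data, not the study's pipeline: it computes no P values, applies no multiple-testing adjustment, and all function names and values are hypothetical.

```python
def ols_slope_intercept(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def collapse_rare(genotype_rows):
    """Return 1 for individuals carrying any rare minor allele, else 0.

    Each row holds one individual's minor-allele counts across the
    rare variants of a gene.
    """
    return [1 if any(g > 0 for g in row) else 0 for row in genotype_rows]


# Hypothetical data for six individuals.
scores = [4, 5, 5, 6, 6, 7]              # FTND-like quantitative scores
common_genotypes = [0, 0, 1, 1, 2, 2]    # minor-allele counts at one common SNP

# Additive model: the slope is the mean change in score per minor allele.
slope, intercept = ols_slope_intercept(common_genotypes, scores)

# Rare variants: collapse two rare sites into a single carrier indicator,
# then regress the score on carrier status.
rare_rows = [[0, 0], [0, 1], [0, 0], [1, 0], [0, 0], [0, 0]]
carriers = collapse_rare(rare_rows)
carrier_effect, _ = ols_slope_intercept(carriers, scores)
```

The slope from the additive model is directly comparable to statements such as "the minor allele increased the mean score by 0.6 units". Collapsing trades per-variant resolution for power when individual rare variants are observed too few times to test on their own.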
Subject(s)
Genetic Association Studies/methods , Single Nucleotide Polymorphism/genetics , Nicotinic Receptors/genetics , Tobacco Use Disorder/genetics , Alleles , Female , Genetic Predisposition to Disease , Genotype , Humans , Male , Middle Aged , Randomized Controlled Trials as Topic , White People/genetics
ABSTRACT
It is commonly accepted that genes with similar expression profiles are functionally related. However, there are many ways to measure the similarity of expression profiles, and it is not clear a priori which is the most effective. Moreover, so far no clear distinction has been made regarding the type of functional link between genes that microarray data suggest. Similarly expressed genes can be part of the same complex as interacting partners; they can participate in the same pathway without interacting directly; they can perform similar functions; or they can simply have similar regulatory sequences. Here we conduct a study of the notion of functional link as implied by expression data. We analyze different similarity measures of gene expression profiles and assess their usefulness and robustness in detecting biological relationships by comparing the similarity scores with results obtained from databases of interacting proteins, promoter signals and cellular pathways, as well as through sequence comparisons. We also introduce variations on similarity measures that are based on statistical analysis and better discriminate between genes that are functionally close and those that are functionally distant. Our tools can be used to assess other similarity measures for expression profiles, and are accessible at biozon.org/tools/expression/
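To illustrate why the choice of similarity measure matters, the sketch below compares two standard measures, Pearson correlation and Spearman rank correlation, on a pair of expression profiles with a monotonic but non-linear relationship. The abstract does not say which measures were analyzed, so these two are assumptions chosen for illustration, and the profiles are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


# Hypothetical expression profiles across five conditions.
gene_a = [1, 2, 3, 4, 5]
gene_b = [1, 4, 9, 16, 25]  # rises with gene_a, but not linearly

r_pearson = pearson(gene_a, gene_b)
r_spearman = spearman(gene_a, gene_b)
```

Here the rank-based measure treats the two profiles as perfectly concordant, while the linear measure scores them lower. Differences of this kind are what a benchmark against interaction, pathway, or promoter data is needed to arbitrate.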
Subject(s)
Computational Biology/methods , Gene Expression Profiling , Fungal Gene Expression Regulation , Saccharomyces cerevisiae/genetics , Algorithms , Cluster Analysis , Gene Expression Regulation , Fungal Genes , Mutation , Open Reading Frames , Automated Pattern Recognition , Sequence Alignment , DNA Sequence Analysis
ABSTRACT
In stark contrast to the rapid morphological radiation of eumetazoans during the Cambrian explosion, the simple body plan of sponges (Phylum Porifera) emerged from the Cambrian relatively unchanged. Although the genetic and developmental underpinnings of these disparate evolutionary outcomes are unknown, comparisons between modern sponges and eumetazoans promise to reveal the extent to which critical genetic factors were present in their common ancestors. Two particularly interesting classes of genes in this respect are those involved in cell signaling and adhesion. These genes help guide development and morphogenesis in modern eumetazoans, but the timing and sequence of their origins are unknown. Here, we demonstrate that the sponge Oscarella carmela, one of the earliest branching animals, expresses core components of the Wnt, transforming growth factor beta, receptor tyrosine kinase, Notch, Hedgehog, and Jak/Stat signaling pathways. Furthermore, we identify sponge homologs of nearly every major eumetazoan cell-adhesion gene family, including those that encode cell-surface receptors, cytoplasmic linkers, and extracellular-matrix proteins. From these data, we infer that key signaling and adhesion genes were in place early in animal evolution, before the divergence of sponge and eumetazoan lineages.