Results 1 - 20 of 24
1.
Genome Med; 11(1): 65, 2019 Oct 25.
Article in English | MEDLINE | ID: mdl-31653223

ABSTRACT

BACKGROUND: Neurodevelopmental disorders (NDDs) such as autism spectrum disorder, intellectual disability, developmental disability, and epilepsy are characterized by abnormal brain development that may affect cognition, learning, behavior, and motor skills. High co-occurrence (comorbidity) of NDDs indicates a shared, underlying biological mechanism. The genetic heterogeneity and overlap observed in NDDs make it difficult to identify the genetic causes of specific clinical symptoms, such as seizures. METHODS: We present a computational method, MAGI-S, to discover modules or groups of highly connected genes that together potentially perform a similar biological function. MAGI-S integrates protein-protein interaction and co-expression networks to form modules centered around the selection of a single "seed" gene, yielding modules consisting of genes that are highly co-expressed with the seed gene. We aim to dissect the epilepsy phenotype from a general NDD phenotype by providing MAGI-S with high confidence NDD seed genes with varying degrees of association with epilepsy, and we assess the enrichment of de novo mutation, NDD-associated genes, and relevant biological function of constructed modules. RESULTS: The newly identified modules account for the increased rate of de novo non-synonymous mutations in autism, intellectual disability, developmental disability, and epilepsy, and enrichment of copy number variations (CNVs) in developmental disability. We also observed that modules seeded with genes strongly associated with epilepsy tend to have a higher association with epilepsy phenotypes than modules seeded at other neurodevelopmental disorder genes. Modules seeded with genes strongly associated with epilepsy (e.g., SCN1A, GABRA1, and KCNB1) are significantly associated with synaptic transmission, long-term potentiation, and calcium signaling pathways. 
On the other hand, modules found with seed genes that are not associated or weakly associated with epilepsy are mostly involved with RNA regulation and chromatin remodeling. CONCLUSIONS: In summary, our method identifies modules enriched with de novo non-synonymous mutations and can capture specific networks that underlie the epilepsy phenotype and display distinct enrichment in relevant biological processes. MAGI-S is available at https://github.com/jchow32/magi-s.
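
The idea of a module grown around a single seed gene can be pictured with a small greedy expansion. This is only a minimal sketch under stated assumptions, not the MAGI-S algorithm itself (see the repository above for that): the function `grow_module`, the toy gene names, the co-expression values, and the 0.5 threshold are all hypothetical.

```python
def grow_module(seed, ppi, coexpr, min_coexpr=0.5, max_size=10):
    """Greedily grow a module around `seed` (illustrative only).

    A gene may join if it has a protein-protein interaction with a current
    member and its co-expression with the seed is at least `min_coexpr`.
    """
    module = [seed]
    while len(module) < max_size:
        # Candidates: PPI neighbours of current members that are not members yet.
        candidates = {n for g in module for n in ppi.get(g, ()) if n not in module}
        # Score each candidate by its co-expression with the seed gene.
        scored = [(coexpr.get(frozenset((seed, c)), 0.0), c) for c in candidates]
        scored = [(s, c) for s, c in scored if s >= min_coexpr]
        if not scored:
            break
        # Add the best-scoring candidate; ties are broken by gene name.
        module.append(max(scored)[1])
    return module


# Hypothetical toy network: SEED interacts with G1 and G2, G1 interacts with G3.
ppi = {"SEED": ["G1", "G2"], "G1": ["SEED", "G3"], "G2": ["SEED"], "G3": ["G1"]}
coexpr = {
    frozenset(("SEED", "G1")): 0.9,
    frozenset(("SEED", "G2")): 0.2,
    frozenset(("SEED", "G3")): 0.7,
}
module = grow_module("SEED", ppi, coexpr)
print(module)
```

In this toy case G2 is excluded because its co-expression with the seed falls below the assumed threshold, while G3 is reached through G1.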


Subject(s)
Epilepsy/genetics; Gene Regulatory Networks; Genetic Heterogeneity; Neurodevelopmental Disorders/genetics; Comorbidity; Databases, Factual; Epilepsy/epidemiology; Humans; Mutation; Neurodevelopmental Disorders/epidemiology; Phenotype; Prognosis
2.
Nature; 573(7772): 61-68, 2019 Sep.
Article in English | MEDLINE | ID: mdl-31435019

ABSTRACT

Elucidating the cellular architecture of the human cerebral cortex is central to understanding our cognitive abilities and susceptibility to disease. Here we used single-nucleus RNA-sequencing analysis to perform a comprehensive study of cell types in the middle temporal gyrus of human cortex. We identified a highly diverse set of excitatory and inhibitory neuron types that are mostly sparse, with excitatory types being less layer-restricted than expected. Comparison to similar mouse cortex single-cell RNA-sequencing datasets revealed a surprisingly well-conserved cellular architecture that enables matching of homologous types and predictions of properties of human cell types. Despite this general conservation, we also found extensive differences between homologous human and mouse cell types, including marked alterations in proportions, laminar distributions, gene expression and morphology. These species-specific features emphasize the importance of directly studying human brain.


Subject(s)
Astrocytes/classification; Biological Evolution; Cerebral Cortex/cytology; Cerebral Cortex/metabolism; Neurons/classification; Adolescent; Adult; Aged; Animals; Astrocytes/cytology; Female; Humans; Male; Mice; Middle Aged; Neural Inhibition; Neurons/cytology; Principal Component Analysis; RNA-Seq; Single-Cell Analysis; Species Specificity; Transcriptome/genetics; Young Adult
3.
Nature; 563(7729): 72-78, 2018 Nov.
Article in English | MEDLINE | ID: mdl-30382198

ABSTRACT

The neocortex contains a multitude of cell types that are segregated into layers and functionally distinct areas. To investigate the diversity of cell types across the mouse neocortex, here we analysed 23,822 cells from two areas at distant poles of the mouse neocortex: the primary visual cortex and the anterior lateral motor cortex. We define 133 transcriptomic cell types by deep, single-cell RNA sequencing. Nearly all types of GABA (γ-aminobutyric acid)-containing neurons are shared across both areas, whereas most types of glutamatergic neurons were found in one of the two areas. By combining single-cell RNA sequencing and retrograde labelling, we match transcriptomic types of glutamatergic neurons to their long-range projection specificity. Our study establishes a combined transcriptomic and projectional taxonomy of cortical cell types from functionally distinct areas of the adult mouse cortex.


Subject(s)
Gene Expression Profiling; Neocortex/cytology; Neocortex/metabolism; Animals; Biomarkers/analysis; Female; GABAergic Neurons/metabolism; Glutamic Acid/metabolism; Male; Mice; Motor Cortex/anatomy & histology; Motor Cortex/cytology; Motor Cortex/metabolism; Neocortex/anatomy & histology; Organ Specificity; Sequence Analysis, RNA; Single-Cell Analysis; Visual Cortex/anatomy & histology; Visual Cortex/cytology; Visual Cortex/metabolism
4.
Genome Res; 28(10): 1566-1576, 2018 Oct.
Article in English | MEDLINE | ID: mdl-30228200

ABSTRACT

Despite the importance of duplicate genes for evolutionary adaptation, accurate gene annotation is often incomplete, incorrect, or lacking in regions of segmental duplication. We developed an approach combining long-read sequencing and hybridization capture to yield full-length transcript information and confidently distinguish between nearly identical genes/paralogs. We used biotinylated probes to enrich for full-length cDNA from duplicated regions, which were then amplified, size-fractionated, and sequenced using single-molecule, long-read sequencing technology, permitting us to distinguish between highly identical genes by virtue of multiple paralogous sequence variants. We examined 19 gene families as expressed in developing and adult human brain, selected for their high sequence identity (average >99%) and overlap with human-specific segmental duplications (SDs). We characterized the transcriptional differences between related paralogs to better understand the birth-death process of duplicate genes and particularly how the process leads to gene innovation. In 48% of the cases, we find that the expressed duplicates have changed substantially from their ancestral models due to novel sites of transcription initiation, splicing, and polyadenylation, as well as fusion transcripts that connect duplication-derived exons with neighboring genes. We detect unannotated open reading frames in genes currently annotated as pseudogenes, while relegating other duplicates to nonfunctional status. Our method significantly improves gene annotation, specifically defining full-length transcripts, isoforms, and open reading frames for new genes in highly identical SDs. The approach will be more broadly applicable to genes in structurally complex regions of other genomes where the duplication process creates novel genes important for adaptive traits.


Subject(s)
Brain/metabolism; Segmental Duplications, Genomic; Sequence Analysis, DNA/methods; Sequence Analysis, RNA/methods; Evolution, Molecular; Gene Duplication; Gene Expression Profiling; Humans; Molecular Sequence Annotation; Multigene Family; Open Reading Frames; Pseudogenes
5.
Front Mol Biosci; 5: 84, 2018.
Article in English | MEDLINE | ID: mdl-30255025

ABSTRACT

[This corrects the article DOI: 10.3389/fmolb.2018.00009.].

6.
Front Mol Biosci; 5: 9, 2018.
Article in English | MEDLINE | ID: mdl-29552563

ABSTRACT

The 25 human bitter taste receptors (hT2Rs) recognize thousands of structurally and chemically diverse bitter substances. The binding modes of human bitter taste receptors hT2R10 and hT2R46, which are responsible for strychnine recognition, were previously established using site-directed mutagenesis, functional assays, and molecular modeling. Here we construct a phylogenetic tree and reconstruct ancestral sequences of the T2R10 and T2R46 clades. We next analyze the binding sites in view of experimental data to predict their ability to recognize strychnine. This analysis suggests that the common ancestor of hT2R10 and hT2R46 is unlikely to bind strychnine in the same mode as either of its two descendants. Estimation of relative divergence times shows that hT2R10 evolved earlier than hT2R46. Strychnine recognition was likely acquired first by the earliest common ancestor of the T2R10 clade before the separation of primates from other mammals, and was highly conserved within the clade. It was probably independently acquired by the common ancestor of T2R43-47 before the homo-ape speciation, lost in most T2Rs within this clade, but enhanced in the hT2R46 after humans diverged from the rest of primates. Our findings suggest hypothetical strychnine T2R receptors in several species, and serve as an experimental guide for further study. Improved understanding of how bitter taste receptors acquire the ability to be activated by particular ligands is valuable for the development of sensors for bitterness and for potential toxicity.

7.
Nat Ecol Evol; 1(3): 69, 2017.
Article in English | MEDLINE | ID: mdl-28580430

ABSTRACT

Segmental duplications contribute to human evolution, adaptation and genomic instability but are often poorly characterized. We investigate the evolution, genetic variation and coding potential of human-specific segmental duplications (HSDs). We identify 218 HSDs based on analysis of 322 deeply sequenced archaic and contemporary hominid genomes. We sequence 550 human and nonhuman primate genomic clones to reconstruct the evolution of the largest, most complex regions with protein-coding potential (n=80 genes/33 gene families). We show that HSDs are non-randomly organized, associate preferentially with ancestral ape duplications termed "core duplicons", and evolved primarily in an interspersed inverted orientation. In addition to Homo sapiens-specific gene expansions (e.g., TCAF1/2), we highlight ten gene families (e.g., ARHGAP11B and SRGAP2C) where copy number never returns to the ancestral state, there is evidence of mRNA splicing, and no common gene-disruptive mutations are observed in the general population. Such duplicates are candidates for the evolution of human-specific adaptive traits.

8.
Genome Biol; 18(1): 49, 2017 Mar 09.
Article in English | MEDLINE | ID: mdl-28279197

ABSTRACT

BACKGROUND: Gene innovation by duplication is a fundamental evolutionary process but is difficult to study in humans due to the large size, high sequence identity, and mosaic nature of segmental duplication blocks. The human-specific gene hydrocephalus-inducing 2, HYDIN2, was generated by a 364 kbp duplication of 79 internal exons of the large ciliary gene HYDIN from chromosome 16q22.2 to chromosome 1q21.1. Because the HYDIN2 locus lacks the ancestral promoter and seven terminal exons of the progenitor gene, we sought to characterize transcription at this locus by coupling reverse transcription polymerase chain reaction and long-read sequencing. RESULTS: 5' RACE indicates a transcription start site for HYDIN2 outside of the duplication and we observe fusion transcripts spanning both the 5' and 3' breakpoints. We observe extensive splicing diversity leading to the formation of altered open reading frames (ORFs) that appear to be under relaxed selection. We show that HYDIN2 adopted a new promoter that drives an altered pattern of expression, with highest levels in neural tissues. We estimate that the HYDIN duplication occurred ~3.2 million years ago and find that it is nearly fixed (99.9%) for diploid copy number in contemporary humans. Examination of 73 chromosome 1q21 rearrangement patients reveals that HYDIN2 is deleted or duplicated in most cases. CONCLUSIONS: Together, these data support a model of rapid gene innovation by fusion of incomplete segmental duplications, altered tissue expression, and potential subfunctionalization or neofunctionalization of HYDIN2 early in the evolution of the Homo lineage.


Subject(s)
Gene Duplication; Gene Fusion; Neurons/metabolism; Chromosome Aberrations; Chromosome Breakpoints; Chromosome Disorders/genetics; Chromosomes, Human, Pair 1; DNA Copy Number Variations; Evolution, Molecular; Gene Conversion; Gene Expression Profiling; Genetic Variation; Genetics, Population; Genomics/methods; Humans; Open Reading Frames; Organ Specificity/genetics; Phenotype; Selection, Genetic; Transcription, Genetic
9.
Nature; 536(7615): 205-9, 2016 Aug 11.
Article in English | MEDLINE | ID: mdl-27487209

ABSTRACT

Genetic differences that specify unique aspects of human evolution have typically been identified by comparative analyses between the genomes of humans and closely related primates, including more recently the genomes of archaic hominins. Not all regions of the genome, however, are equally amenable to such study. Recurrent copy number variation (CNV) at chromosome 16p11.2 accounts for approximately 1% of cases of autism and is mediated by a complex set of segmental duplications, many of which arose recently during human evolution. Here we reconstruct the evolutionary history of the locus and identify bolA family member 2 (BOLA2) as a gene duplicated exclusively in Homo sapiens. We estimate that a 95-kilobase-pair segment containing BOLA2 duplicated across the critical region approximately 282 thousand years ago (ka), one of the latest among a series of genomic changes that dramatically restructured the locus during hominid evolution. All humans examined carried one or more copies of the duplication, which nearly fixed early in the human lineage, a pattern unlikely to have arisen so rapidly in the absence of selection (P < 0.0097). We show that the duplication of BOLA2 led to a novel, human-specific in-frame fusion transcript and that BOLA2 copy number correlates with both RNA expression (r = 0.36) and protein level (r = 0.65), with the greatest expression difference between human and chimpanzee in experimentally derived stem cells. Analyses of 152 patients carrying a chromosome 16p11.2 rearrangement show that more than 96% of breakpoints occur within the H. sapiens-specific duplication. In summary, the duplicative transposition of BOLA2 at the root of the H. sapiens lineage about 282 ka simultaneously increased copy number of a gene associated with iron homeostasis and predisposed our species to recurrent rearrangements associated with disease.


Subject(s)
Chromosomes, Human, Pair 16/genetics; DNA Copy Number Variations/genetics; Evolution, Molecular; Genetic Predisposition to Disease; Proteins/genetics; Animals; Autistic Disorder/genetics; Chromosome Breakage; Gene Duplication; Homeostasis/genetics; Humans; Iron/metabolism; Pan troglodytes/genetics; Pongo/genetics; Proteins/analysis; Recombination, Genetic; Species Specificity; Time Factors
10.
Am J Hum Genet; 98(3): 541-552, 2016 Mar 03.
Article in English | MEDLINE | ID: mdl-26942287

ABSTRACT

Intellectual disability (ID) and autism spectrum disorders (ASD) are genetically heterogeneous, and a significant number of genes have been associated with both conditions. A few mutations in POGZ have been reported in recent exome studies; however, these studies do not provide detailed clinical information. We collected the clinical and molecular data of 25 individuals with disruptive mutations in POGZ by diagnostic whole-exome, whole-genome, or targeted sequencing of 5,223 individuals with neurodevelopmental disorders (ID primarily) or by targeted resequencing of this locus in 12,041 individuals with ASD and/or ID. The rarity of disruptive mutations among unaffected individuals (2/49,401) highlights the significance (p = 4.19 × 10^-13; odds ratio = 35.8) and penetrance (65.9%) of this genetic subtype with respect to ASD and ID. By studying the entire cohort, we defined common phenotypic features of POGZ individuals, including variable levels of developmental delay (DD) and more severe speech and language delay in comparison to the severity of motor delay and coordination issues. We also identified significant associations with vision problems, microcephaly, hyperactivity, a tendency to obesity, and feeding difficulties. Some features might be explained by the high expression of POGZ, particularly in the cerebellum and pituitary, early in fetal brain development. We conducted parallel studies in Drosophila by inducing conditional knockdown of the POGZ ortholog row, further confirming that dosage of POGZ, specifically in neurons, is essential for normal learning in a habituation paradigm. Combined, the data underscore the pathogenicity of loss-of-function mutations in POGZ and define a POGZ-related phenotype enriched in specific features.
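
The reported odds ratio can be checked with simple arithmetic on the counts quoted in the abstract. The sketch below assumes that all 25 carriers come from the pooled 5,223 + 12,041 sequenced individuals and that the comparison is a plain 2x2 table; that layout is an assumption made here for illustration, and the p-value is not reproduced.

```python
# Counts quoted in the abstract; the 2x2 layout is an assumption of this sketch.
cases_total = 5223 + 12041      # sequenced individuals with NDD, ASD and/or ID
cases_mut = 25                  # individuals with disruptive POGZ mutations
controls_total = 49401          # unaffected individuals
controls_mut = 2                # unaffected individuals with disruptive mutations


def odds_ratio(a, b, c, d):
    """Odds ratio of the 2x2 table [[a, b], [c, d]], i.e. (a/b) / (c/d)."""
    return (a * d) / (b * c)


or_estimate = odds_ratio(
    cases_mut, cases_total - cases_mut,
    controls_mut, controls_total - controls_mut,
)
print(round(or_estimate, 1))  # 35.8
```

Under this assumed layout the estimate rounds to 35.8, which agrees with the value given in the abstract.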


Subject(s)
Autism Spectrum Disorder/genetics; Intellectual Disability/genetics; Transposases/genetics; Adolescent; Adult; Animals; Autism Spectrum Disorder/diagnosis; Child; Child, Preschool; Cohort Studies; Down-Regulation; Drosophila/genetics; Drosophila Proteins/genetics; Drosophila Proteins/metabolism; Exome; Female; Gene Knockdown Techniques; Genome-Wide Association Study; Humans; Infant; Intellectual Disability/diagnosis; Language Development Disorders/diagnosis; Language Development Disorders/genetics; Linear Models; Male; Microcephaly/diagnosis; Microcephaly/genetics; Mutation; Phenotype; Transcription Factors/genetics; Transcription Factors/metabolism
11.
Protein Eng Des Sel; 28(11): 507-18, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26275856

ABSTRACT

Ancestral reconstruction is a powerful tool for studying protein evolution as well as for protein design and engineering. However, in many positions alternative predictions with relatively high marginal probabilities exist, and thus the prediction comprises an ensemble of near-ancestor sequences that relate to the historical ancestor. The ancestral phenotype should therefore be explored for the entire ensemble, rather than for the sequence comprising the most probable amino acid at all positions [the most probable ancestor (mpa)]. To this end, we constructed libraries that sample ensembles of near-ancestor sequences. Specifically, we identified positions where alternatively predicted amino acids are likely to affect the ancestor's structure and/or function. Using the serum paraoxonases (PONs) enzyme family as a test case, we constructed libraries that combinatorially sample these alternatives. We next characterized these libraries, reflecting the vertebrate and mammalian PON ancestors. We found that the mpa of vertebrate PONs represented only one out of many different enzymatic phenotypes displayed by its ensemble. The mammalian ancestral library, however, exhibited a homogeneous phenotype that was well represented by the mpa. Our library design strategy that samples near-ancestor ensembles at potentially critical positions therefore provides a systematic way of examining the robustness of inferred ancestral phenotypes.
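
The difference between the most probable ancestor and a near-ancestor ensemble can be illustrated with a few lines of code. This is a minimal sketch, not the authors' library design: the posterior values, the function names, and the 0.2 probability cutoff are hypothetical, and the structural or functional filtering of positions described above is not modelled.

```python
from itertools import product

# Hypothetical marginal posteriors for a 4-residue ancestor (illustrative only).
posteriors = [
    {"A": 0.95, "S": 0.05},
    {"L": 0.55, "I": 0.40, "V": 0.05},
    {"K": 0.99, "R": 0.01},
    {"D": 0.60, "E": 0.40},
]


def most_probable_ancestor(posteriors):
    """Take the highest-probability residue at every position (the mpa)."""
    return "".join(max(site, key=site.get) for site in posteriors)


def near_ancestor_library(posteriors, min_prob=0.2):
    """Combine every residue whose marginal probability is >= min_prob."""
    choices = [
        sorted((aa for aa, p in site.items() if p >= min_prob),
               key=lambda aa: -site[aa])
        for site in posteriors
    ]
    return ["".join(combo) for combo in product(*choices)]


mpa = most_probable_ancestor(posteriors)
library = near_ancestor_library(posteriors)
print(mpa, library)
```

Here two ambiguous positions with two plausible residues each give a library of four variants, of which the mpa is only one.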


Subject(s)
Gene Library; Models, Molecular; Phylogeny; Sequence Alignment/methods; Sequence Analysis, Protein/methods; Animals; Carboxylic Ester Hydrolases/chemistry; Carboxylic Ester Hydrolases/genetics; Humans; Mammals/genetics; Phenotype
12.
Genome Res; 25(1): 142-54, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25378250

ABSTRACT

Despite considerable genetic heterogeneity underlying neurodevelopmental diseases, there is compelling evidence that many disease genes will map to a much smaller number of biological subnetworks. We developed a computational method, termed MAGI (merging affected genes into integrated networks), that simultaneously integrates protein-protein interactions and RNA-seq expression profiles during brain development to discover "modules" enriched for de novo mutations in probands. We applied this method to recent exome sequencing of 1116 patients with autism and intellectual disability, discovering two distinct modules that differ in their properties and associated phenotypes. The first module consists of 80 genes associated with Wnt, Notch, SWI/SNF, and NCOR complexes and shows the highest expression early during embryonic development (8-16 post-conception weeks [pcw]). The second module consists of 24 genes associated with synaptic function, including long-term potentiation and calcium signaling with higher levels of postnatal expression. Patients with de novo mutations in these modules are more significantly intellectually impaired and carry more severe missense mutations when compared to probands with de novo mutations outside of these modules. We used our approach to define subsets of the network associated with higher functioning autism as well as greater severity with respect to IQ. Finally, we applied MAGI independently to epilepsy and schizophrenia exome sequencing cohorts and found significant overlap as well as expansion of these modules, suggesting a core set of integrated neurodevelopmental networks common to seemingly diverse human diseases.


Subject(s)
Autistic Disorder/diagnosis; Autistic Disorder/genetics; Gene Regulatory Networks; Algorithms; Cluster Analysis; Cohort Studies; Databases, Factual; Epilepsy/diagnosis; Epilepsy/genetics; Exome; Genetic Heterogeneity; Humans; Mutation, Missense; Phenotype; Schizophrenia/diagnosis; Schizophrenia/genetics; Sequence Analysis, RNA
13.
Cell; 158(2): 263-276, 2014 Jul 17.
Article in English | MEDLINE | ID: mdl-24998929

ABSTRACT

Autism spectrum disorder (ASD) is a heterogeneous disease in which efforts to define subtypes behaviorally have met with limited success. Hypothesizing that genetically based subtype identification may prove more productive, we resequenced the ASD-associated gene CHD8 in 3,730 children with developmental delay or ASD. We identified a total of 15 independent mutations; no truncating events were identified in 8,792 controls, including 2,289 unaffected siblings. In addition to a high likelihood of an ASD diagnosis among patients bearing CHD8 mutations, characteristics enriched in this group included macrocephaly, distinct faces, and gastrointestinal complaints. chd8 disruption in zebrafish recapitulates features of the human phenotype, including increased head size as a result of expansion of the forebrain/midbrain and impairment of gastrointestinal motility due to a reduction in postmitotic enteric neurons. Our findings indicate that CHD8 disruptions define a distinct ASD subtype and reveal unexpected comorbidities between brain development and enteric innervation.


Subject(s)
Child Development Disorders, Pervasive/genetics; Child Development Disorders, Pervasive/physiopathology; DNA-Binding Proteins/genetics; Transcription Factors/genetics; Adolescent; Amino Acid Sequence; Animals; Brain/growth & development; Brain/pathology; Child; Child Development Disorders, Pervasive/classification; Child Development Disorders, Pervasive/pathology; Child, Preschool; DNA-Binding Proteins/metabolism; Female; Gastrointestinal Tract/innervation; Gastrointestinal Tract/physiopathology; Humans; Macaca mulatta; Male; Megalencephaly/pathology; Molecular Sequence Data; Mutation; Sequence Alignment; Transcription Factors/metabolism; Zebrafish; Zebrafish Proteins/genetics; Zebrafish Proteins/metabolism
14.
PLoS One; 7(8): e41469, 2012.
Article in English | MEDLINE | ID: mdl-22870226

ABSTRACT

BACKGROUND: Polyclonal serum consists of vast collections of antibodies, products of differentiated B-cells. The spectrum of antibody specificities is dynamic and varies with age, physiology, and exposure to pathological insults. The complete repertoire of antibody specificities in blood, the IgOme, is therefore an extraordinarily rich source of information: a molecular record of previous encounters as well as a status report of current immune activity. The ability to profile antibody specificities of polyclonal serum at exceptionally high resolution has been an important and serious challenge which can now be overcome. METHODOLOGY/PRINCIPAL FINDINGS: Here we illustrate the application of Deep Panning, a method that combines the flexibility of combinatorial phage display of random peptides with the power of high-throughput deep sequencing. Deep Panning is first applied to evaluate the quality and diversity of naïve random peptide libraries. The production of very large data sets, hundreds of thousands of peptides, has revealed unexpected properties of combinatorial random peptide libraries and indicates correctives to ensure the quality of the libraries generated. Next, Deep Panning is used to analyze a model monoclonal antibody in addition to allowing one to follow the dynamics of biopanning and peptide selection. Finally, Deep Panning is applied to profile polyclonal sera derived from HIV-infected individuals. CONCLUSIONS/SIGNIFICANCE: The ability to generate and characterize hundreds of thousands of affinity-selected peptides creates an effective means towards the interrogation of the IgOme and understanding of the humoral response to disease. Deep Panning should open the door to new possibilities for serological diagnostics, vaccine design and the discovery of the correlates of immunity to emerging infectious agents.


Subject(s)
Antibodies, Monoclonal/chemistry; Antibody Specificity; Peptide Library; Antibodies, Monoclonal/genetics; Antibodies, Monoclonal/immunology; B-Lymphocytes/chemistry; B-Lymphocytes/immunology; Humans
15.
Nucleic Acids Res; 40(Web Server issue): W580-4, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22661579

ABSTRACT

Ancestral sequence reconstruction is essential to a variety of evolutionary studies. Here, we present the FastML web server, a user-friendly tool for the reconstruction of ancestral sequences. FastML implements various novel features that differentiate it from existing tools: (i) FastML uses an indel-coding method, in which each gap, possibly spanning multiple sites, is coded as binary data. FastML then reconstructs ancestral indel states assuming a continuous time Markov process. FastML provides the most likely ancestral sequences, integrating both indels and characters; (ii) FastML accounts for uncertainty in ancestral states: it provides not only the posterior probabilities for each character and indel at each sequence position, but also a sample of ancestral sequences from this posterior distribution, and a list of the k-most likely ancestral sequences; (iii) FastML implements a large array of evolutionary models, which makes it generic and applicable for nucleotide, protein and codon sequences; and (iv) a graphical representation of the results is provided, including, for example, a graphical logo of the inferred ancestral sequences. The utility of FastML is demonstrated by reconstructing ancestral sequences of the Env protein from various HIV-1 subtypes. FastML is freely available for all academic users and is available online at http://fastml.tau.ac.il/.
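
The two uncertainty outputs mentioned in point (ii), a sample from the posterior and a list of the k most likely sequences, can be illustrated as follows. This is a hedged sketch and not FastML's implementation or API: it assumes that sites are independent, uses brute-force enumeration that is only practical for tiny examples, and the posterior values and function names are invented for illustration.

```python
import heapq
import random
from itertools import product
from math import prod

# Hypothetical per-site posteriors; "-" stands for a reconstructed gap state.
site_posteriors = [
    {"A": 0.7, "G": 0.3},
    {"C": 0.6, "T": 0.4},
    {"-": 0.8, "A": 0.2},
]


def k_most_likely(posteriors, k):
    """Score every combination of states and return the k best (toy scale only)."""
    scored = (
        (prod(site[ch] for site, ch in zip(posteriors, combo)), "".join(combo))
        for combo in product(*posteriors)
    )
    return heapq.nlargest(k, scored)


def sample_ancestor(posteriors, rng):
    """Draw one state per site in proportion to its posterior probability."""
    return "".join(
        rng.choices(list(site), weights=list(site.values()))[0]
        for site in posteriors
    )


top3 = [seq for _, seq in k_most_likely(site_posteriors, 3)]
print(top3)
print(sample_ancestor(site_posteriors, random.Random(0)))
```

The first entry of the ranked list is the sequence built from the most probable state at each site, while repeated sampling would visit less probable alternatives in proportion to their joint probability.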


Subject(s)
Phylogeny; Software; Computer Graphics; INDEL Mutation; Internet; Probability; Sequence Alignment; env Gene Products, Human Immunodeficiency Virus/genetics
16.
Mol Biol Evol; 29(1): 1-5, 2012 Jan.
Article in English | MEDLINE | ID: mdl-21772063

ABSTRACT

Errors in the inferred multiple sequence alignment may lead to false prediction of positive selection. Recently, methods for detecting unreliable alignment regions were developed and were shown to accurately identify incorrectly aligned regions. While removing unreliable alignment regions is expected to increase the accuracy of positive selection inference, such filtering may also significantly decrease the power of the test, as positively selected regions are fast evolving, and those same regions are often those that are difficult to align. Here, we used realistic simulations that mimic sequence evolution of HIV-1 genes to test the hypothesis that the performance of positive selection inference using codon models can be improved by removing unreliable alignment regions. Our study shows that the benefit of removing unreliable regions exceeds the loss of power due to the removal of some of the true positively selected sites.


Subject(s)
Models, Genetic; Sequence Alignment/methods; Sequence Alignment/standards; Computer Simulation; Databases, Genetic; Evolution, Molecular; Genes, Viral; HIV-1/genetics; Phylogeny; Selection, Genetic
17.
Genome Res; 22(1): 35-50, 2012 Jan.
Article in English | MEDLINE | ID: mdl-21974994

ABSTRACT

Exon-intron architecture is one of the major features directing the splicing machinery to the short exons that are located within long flanking introns. However, the evolutionary dynamics of exon-intron architecture and its impact on splicing are largely unknown. Using a comparative genomic approach, we analyzed 17 vertebrate genomes and reconstructed the ancestral motifs of both 3' and 5' splice sites, as well as the ancestral lengths of exons and introns. Our analyses suggest that vertebrate introns increased in length from the shortest ancestral introns to the longest primate introns. An evolutionary analysis of splice sites revealed that weak splice sites act as a restrictive force keeping introns short. In contrast, strong splice sites allow recognition of exons flanked by long introns. Reconstruction of the ancestral state suggests these phenomena were not prevalent in the vertebrate ancestor, but appeared during vertebrate evolution. By calculating evolutionary rate shifts in exons, we identified cis-acting regulatory sequences that became fixed during the transition from early vertebrates to mammals. Experimental validations performed on a selection of these hexamers confirmed their regulatory function. We additionally revealed many features of exons that can discriminate alternative from constitutive exons. These features were integrated into a machine-learning approach to predict whether an exon is alternative. Our algorithm obtains very high predictive power (AUC of 0.91), and using these predictions we have identified and successfully validated novel alternatively spliced exons. Overall, we provide novel insights regarding the evolutionary constraints acting upon exons and their recognition by the splicing machinery.
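
For readers unfamiliar with the AUC figure quoted above: it is the probability that a randomly chosen positive example (here, an alternative exon) receives a higher classifier score than a randomly chosen negative one (a constitutive exon). The sketch below computes it by direct pairwise comparison; the scores and labels are made-up toy values, not data from the study.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy classifier scores; label 1 = alternative exon, 0 = constitutive exon.
toy_scores = [0.90, 0.80, 0.40, 0.35, 0.10]
toy_labels = [1, 1, 0, 1, 0]
toy_auc = auc(toy_scores, toy_labels)
print(toy_auc)
```

In this toy case five of the six positive-negative pairs are ranked correctly, so the AUC is 5/6; a value of 0.5 would correspond to random ranking and 1.0 to perfect separation.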


Subject(s)
Evolution, Molecular; Exons/physiology; Genome/physiology; Introns/physiology; RNA Splice Sites/genetics; RNA Splicing/genetics; Vertebrates/genetics; Animals; Models, Genetic
18.
Syst Biol; 59(2): 212-25, 2010 Mar.
Article in English | MEDLINE | ID: mdl-20525631

ABSTRACT

Thymidylate synthases (Thy) are key enzymes in the synthesis of deoxythymidylate, 1 of the 4 building blocks of DNA. As such, they are essential for all DNA-based forms of life and are therefore implicated in the hypothesized transition from RNA genomes to DNA genomes. Two evolutionarily unrelated Thy enzymes, ThyA and ThyX, are known to catalyze the same biochemical reaction. Both enzymes are sporadically distributed within each of the 3 domains of life in a pattern that suggests multiple nonhomologous lateral gene transfer (LGT) events. We present a phylogenetic analysis of the evolution of the 2 enzymes, aimed at unraveling their entangled evolutionary history and tracing their origin back to early life. A novel probabilistic evolutionary model was developed, which allowed us to compute the posterior probabilities and the posterior expectation of the number of LGT events. Simulation studies were performed to validate the model's ability to accurately detect LGT events that occurred throughout a large phylogeny. Applying the model to the Thy data revealed widespread nonhomologous LGT between and within all 3 domains of life. By reconstructing the ThyA and ThyX gene trees, the most likely donor of each LGT event was inferred. Finally, the role of viruses in LGT of Thy is discussed.


Subject(s)
Molecular Evolution, Horizontal Gene Transfer, Genetic Models, Phylogeny, Thymidylate Synthase/genetics, Base Composition, Base Sequence, Classification/methods, Computational Biology, Computer Simulation, Likelihood Functions
19.
Nucleic Acids Res ; 38(Web Server issue): W23-8, 2010 Jul.
Article in English | MEDLINE | ID: mdl-20497997

ABSTRACT

Evaluating the accuracy of multiple sequence alignment (MSA) is critical for virtually every comparative sequence analysis that uses an MSA as input. Here we present the GUIDANCE web-server, a user-friendly, open access tool for the identification of unreliable alignment regions. The web-server accepts as input a set of unaligned sequences. The server aligns the sequences and provides a simple graphic visualization of the confidence score of each column, residue and sequence of an alignment, using a color-coding scheme. The method is generic and the user is allowed to choose the alignment algorithm (ClustalW, MAFFT and PRANK are supported) as well as the type of molecular sequence (nucleotide, protein or codon sequences). The server implements two different algorithms for evaluating confidence scores: (i) the heads-or-tails (HoT) method, which measures alignment uncertainty due to co-optimal solutions; (ii) the GUIDANCE method, which measures the robustness of the alignment to guide-tree uncertainty. The server projects the confidence scores onto the MSA and points to columns and sequences that are unreliably aligned. These can be automatically removed in preparation for downstream analyses. GUIDANCE is freely available for use at http://guidance.tau.ac.il.
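
The column-removal step mentioned above can be pictured as a simple threshold filter. The sketch below is illustrative only and does not reproduce the server's own code: the function name `drop_unreliable_columns`, the list-of-strings alignment representation, and the `cutoff` parameter are all assumptions, and the abstract does not state which cutoff the server applies.

```python
from typing import List, Sequence


def drop_unreliable_columns(
    msa: List[str], column_scores: Sequence[float], cutoff: float
) -> List[str]:
    """Keep only the alignment columns whose confidence score is >= cutoff.

    `msa` is a list of equal-length aligned sequences (gaps as '-').
    `column_scores` holds one confidence score per alignment column.
    `cutoff` is a hypothetical, user-chosen threshold.
    """
    if not msa:
        return []
    width = len(msa[0])
    if any(len(seq) != width for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    if len(column_scores) != width:
        raise ValueError("need exactly one score per alignment column")

    # Indices of the columns that pass the confidence threshold.
    keep = [i for i, score in enumerate(column_scores) if score >= cutoff]
    return ["".join(seq[i] for i in keep) for seq in msa]
```

For example, with the alignment `["AC-GT", "ACAGT"]`, scores `[1.0, 1.0, 0.5, 0.5, 1.0]` and a cutoff of 0.9, the two middle columns are dropped and both rows become `"ACT"`.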


Subject(s)
Sequence Alignment/methods, Software, Human Immunodeficiency Virus Proteins/chemistry, Internet, Protein Sequence Analysis, Viral Regulatory and Accessory Proteins/chemistry
20.
Mol Biol Evol ; 27(8): 1759-67, 2010 Aug.
Article in English | MEDLINE | ID: mdl-20207713

ABSTRACT

Multiple sequence alignment (MSA) is the basis for a wide range of comparative sequence analyses from molecular phylogenetics to 3D structure prediction. Sophisticated algorithms have been developed for sequence alignment, but in practice, many errors can be expected and extensive portions of the MSA are unreliable. Hence, it is imperative to understand and characterize the various sources of errors in MSAs and to quantify site-specific alignment confidence. In this paper, we show that uncertainties in the guide tree used by progressive alignment methods are a major source of alignment uncertainty. We use this insight to develop a novel method for quantifying the robustness of each alignment column to guide tree uncertainty. We build on the widely used bootstrap method for perturbing the phylogenetic tree. Specifically, we generate a collection of trees and use each as a guide tree in the alignment algorithm, thus producing a set of MSAs. We next test the consistency of every column of the MSA obtained from the unperturbed guide tree with respect to the set of MSAs. We name this measure the "GUIDe tree based AligNment ConfidencE" (GUIDANCE) score. Using the Benchmark Alignment dataBASE (BAliBASE) benchmark as well as simulation studies, we show that GUIDANCE scores accurately identify errors in MSAs. Additionally, we compare our results with the previously published Heads-or-Tails score and show that the GUIDANCE score is a better predictor of unreliably aligned regions.
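
The column-consistency idea described above can be sketched as follows. This is a simplified illustration rather than the published scoring scheme: it assumes a column of the base MSA is "supported" by a perturbed MSA only when exactly the same residues (identified by their position within each sequence) share a column there, and it takes the perturbed alignments as given instead of generating bootstrap guide trees. The names `msa_columns` and `column_consistency` are hypothetical.

```python
from typing import List, Tuple

# One column = for each sequence, the index of the residue it holds (-1 = gap).
Column = Tuple[int, ...]


def msa_columns(msa: List[str]) -> List[Column]:
    """Describe every alignment column by the residue indices it contains."""
    next_residue = [0] * len(msa)  # running residue counter per sequence
    columns: List[Column] = []
    for col in range(len(msa[0])):
        entry = []
        for row, seq in enumerate(msa):
            if seq[col] == "-":
                entry.append(-1)
            else:
                entry.append(next_residue[row])
                next_residue[row] += 1
        columns.append(tuple(entry))
    return columns


def column_consistency(base_msa: List[str], alt_msas: List[List[str]]) -> List[float]:
    """Fraction of alternative MSAs that reproduce each base-MSA column exactly.

    Simplifying assumption: a column counts as reproduced only if the identical
    set of residues (and gaps) appears together in a single alternative column.
    """
    if not alt_msas:
        raise ValueError("need at least one alternative alignment")
    alt_column_sets = [set(msa_columns(alt)) for alt in alt_msas]
    return [
        sum(column in cols for cols in alt_column_sets) / len(alt_column_sets)
        for column in msa_columns(base_msa)
    ]
```

With a base alignment `["AC-GT", "ACAGT"]` and two alternatives, one identical to the base and one that places the gap a column later (`["ACG-T", "ACAGT"]`), the two columns around the gap score 0.5 and the remaining columns score 1.0. All alignments must contain the same sequences in the same row order for the residue indices to be comparable.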


Subject(s)
Algorithms, Amino Acid Sequence, Base Sequence, Sequence Alignment/methods, DNA Sequence Analysis/methods, Animals, Computer Simulation, Factual Databases, Drosophila melanogaster/genetics, Molecular Sequence Data, Phylogeny, ROC Curve, Software