Results 1 - 20 of 36
1.
Bioessays ; : e2400146, 2024 Nov 03.
Article in English | MEDLINE | ID: mdl-39491810

ABSTRACT

The genome sequencing revolution has revealed that all species possess a large number of unique genes critical for trait variation, adaptation, and evolutionary innovation. One widely used approach to identify such genes consists of detecting protein-coding sequences with no homology in other genomes, termed orphan genes. These genes have been extensively studied, under the assumption that they represent valid proxies for species-specific genes. Here, we critically evaluate taxonomic, phylogenetic, and sequence evolution evidence showing that orphan genes belong to a range of evolutionary ages and thus cannot be assigned to a single lineage. Furthermore, we show that the processes generating orphan genes are substantially more diverse than generally thought and include horizontal gene transfer, transposable element domestication, and overprinting. Thus, orphan genes represent a heterogeneous collection of genes rather than a single biological entity, making them unsuitable as a subject for meaningful investigation of gene evolution and phenotypic innovation.

2.
PLoS Genet ; 16(10): e1009076, 2020 10.
Article in English | MEDLINE | ID: mdl-33048946

ABSTRACT

Despite the fundamental role of centromeres, two different types are observed across plants and animals. Monocentric chromosomes possess a single region that functions as the centromere, while in holocentric chromosomes centromere activity is spread across the entire chromosome. Proper segregation may fail in species with monocentric chromosomes after a fusion or fission, which may lead to chromosomes with no centromere or with multiple centromeres. In contrast, species with holocentric chromosomes should still be able to segregate chromosomes safely after fusion or fission. This, along with the observation of high chromosome numbers in some holocentric clades, has led to the hypothesis that holocentricity leads to higher rates of chromosome number evolution. To test for differences in rates of chromosome number evolution between these systems, we analyzed data from 4,393 species of insects in a phylogenetic framework. We found that insect orders exhibit striking differences in rates of fissions, fusions, and polyploidy. However, across all insects we found no evidence that holocentric clades have higher rates of fissions, fusions, or polyploidy than monocentric clades. Our results suggest that holocentricity alone does not lead to higher rates of chromosome number change. Instead, we suggest that other co-evolving traits must explain the striking differences between clades.


Subject(s)
Centromere/genetics , Chromosome Segregation/genetics , Chromosomes, Insect/genetics , Evolution, Molecular , Animals , Chromosomes, Insect/classification , Insecta/genetics , Karyotype , Phylogeny , Polyploidy
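The rates compared in this study (fissions, fusions, polyploidy) can be pictured as entries of a continuous-time Markov rate matrix over chromosome numbers. The Python sketch below builds such a matrix under a simplified, chromEvol-style parameterization that is an assumption here, not the model specified in the abstract: a fission adds one chromosome, a fusion removes one, polyploidy doubles the count, and transitions that would leave the chosen range are simply dropped. The function name and parameters are hypothetical. In this framing, asking whether holocentric clades evolve faster amounts to asking whether these rates differ when fitted separately to holocentric and monocentric clades.

```python
from typing import List


def chromosome_number_q_matrix(
    min_n: int, max_n: int, fission: float, fusion: float, polyploidy: float
) -> List[List[float]]:
    """Build a rate (generator) matrix over haploid chromosome numbers.

    Hypothetical, simplified parameterization (an assumption):
      n -> n + 1 at rate `fission`     (a fission adds one chromosome)
      n -> n - 1 at rate `fusion`      (a fusion removes one chromosome)
      n -> 2n    at rate `polyploidy`  (whole-genome doubling)
    Transitions that would leave [min_n, max_n] are dropped.
    """
    states = list(range(min_n, max_n + 1))
    index = {n: i for i, n in enumerate(states)}
    size = len(states)
    q = [[0.0] * size for _ in range(size)]
    for n in states:
        i = index[n]
        for target, rate in ((n + 1, fission), (n - 1, fusion), (2 * n, polyploidy)):
            if target in index and rate > 0:
                # "+=" because two events can reach the same state (n = 1: n + 1 == 2n).
                q[i][index[target]] += rate
        # The diagonal makes every row sum to zero, as required for a rate matrix.
        q[i][i] = -sum(q[i][j] for j in range(size) if j != i)
    return q
```

For example, `chromosome_number_q_matrix(1, 8, fission=0.2, fusion=0.1, polyploidy=0.05)` yields a row for n = 1 in which fission and doubling both lead to n = 2, so their rates are summed into a single entry.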
3.
Nature ; 513(7517): 195-201, 2014 Sep 11.
Article in English | MEDLINE | ID: mdl-25209798

ABSTRACT

Gibbons are small arboreal apes that display an accelerated rate of evolutionary chromosomal rearrangement and occupy a key node in the primate phylogeny between Old World monkeys and great apes. Here we present the assembly and analysis of a northern white-cheeked gibbon (Nomascus leucogenys) genome. We describe the propensity for a gibbon-specific retrotransposon (LAVA) to insert into chromosome segregation genes and alter transcription by providing a premature termination site, suggesting a possible molecular mechanism for the genome plasticity of the gibbon lineage. We further show that the gibbon genera (Nomascus, Hylobates, Hoolock and Symphalangus) experienced a near-instantaneous radiation ∼5 million years ago, coincident with major geographical changes in southeast Asia that caused cycles of habitat compression and expansion. Finally, we identify signatures of positive selection in genes important for forelimb development (TBX5) and connective tissues (COL1A1) that may have been involved in the adaptation of gibbons to their arboreal habitat.


Subject(s)
Genome/genetics , Hylobates/classification , Hylobates/genetics , Karyotype , Phylogeny , Animals , Evolution, Molecular , Hominidae/classification , Hominidae/genetics , Humans , Molecular Sequence Data , Retroelements/genetics , Selection, Genetic , Transcription Termination, Genetic
4.
Plant J ; 2018 Jun 14.
Article in English | MEDLINE | ID: mdl-29901849

ABSTRACT

Gene duplications and gene losses are major determinants of genome evolution and phenotypic diversity. The frequency of gene turnover (gene gains and gene losses combined) is known to vary between organisms. Comparative genomic analyses of gene families can highlight such variation; however, estimates of gene turnover may be biased when using highly fragmented genome assemblies resulting in poor gene annotations. Here, we address potential biases introduced by gene annotation errors in estimates of gene turnover frequencies in a dataset including both well-annotated angiosperm genomes and the incomplete gene sets of four Pinaceae, including two pine species, Norway spruce and Douglas-fir. We show that Pinaceae experienced higher gene turnover rates than angiosperm lineages lacking recent whole-genome duplications. This finding is robust to both known major issues in Pinaceae gene sets: missing gene models and erroneous annotation of pseudogenes. A separate analysis limited to the four Pinaceae gene sets pointed to an accelerated gene turnover rate in pines compared with Norway spruce and Douglas-fir. Our results indicate that gene turnover significantly contributes to genome variation and possibly to speciation in Pinaceae, particularly in pines. Moreover, these findings indicate that reliable estimates of gene turnover frequencies can be discerned in incomplete and potentially inaccurate gene sets. Because gymnosperms are known to exhibit low overall substitution rates compared with angiosperms, our results suggest that the rate of single-base pair mutations is uncoupled from the rate of large DNA duplications and deletions associated with gene turnover in Pinaceae.
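Gene turnover as defined in this abstract (gene gains and gene losses combined) can be illustrated with a simple per-branch count. The sketch below is a deliberate simplification and its names are hypothetical: it takes gene-family sizes at the start and end of one branch and counts the size changes, whereas comparative studies typically fit a probabilistic birth-death model (for example, with the CAFE software). It also makes the annotation bias concrete: a gene model missing from a fragmented assembly shows up here as a loss.

```python
from typing import Dict, Tuple


def gene_turnover(
    parent_sizes: Dict[str, int], child_sizes: Dict[str, int], branch_length: float
) -> Tuple[int, int, float]:
    """Return (gains, losses, turnover rate) along one branch.

    Hypothetical inputs: gene-family sizes at the parent and child nodes
    (e.g., from an ancestral reconstruction). Families absent from one
    dictionary are treated as having size 0. Turnover = gains + losses,
    and the rate divides that total by the branch length.
    """
    if branch_length <= 0:
        raise ValueError("branch_length must be positive")
    gains = losses = 0
    for family in set(parent_sizes) | set(child_sizes):
        delta = child_sizes.get(family, 0) - parent_sizes.get(family, 0)
        if delta > 0:
            gains += delta
        else:
            losses -= delta  # delta <= 0, so this adds |delta|
    return gains, losses, (gains + losses) / branch_length
```

For instance, a family growing from 2 to 4 copies contributes two gains, and a single-copy family that disappears contributes one loss.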

5.
Genome Res ; 22(3): 429-35, 2012 Mar.
Article in English | MEDLINE | ID: mdl-22090377

ABSTRACT

Establishing the molecular basis of DNA mutations that cause inherited disease is of fundamental importance to understanding the origin, nature, and clinical sequelae of genetic disorders in humans. The majority of disease-associated mutations constitute single-base substitutions and short deletions and/or insertions resulting from DNA replication errors and the repair of damaged bases. However, pathological mutations can also be introduced by nonreciprocal recombination events between paralogous sequences, a phenomenon known as interlocus gene conversion (IGC). IGC events have thus far been linked to pathology in more than 20 human genes. However, the large number of duplicated gene sequences in the human genome implies that many more disease-associated mutations could originate via IGC. Here, we have used a genome-wide computational approach to identify disease-associated mutations derived from IGC events. Our approach revealed hundreds of known pathological mutations that could have been caused by IGC. Further, we identified several dozen high-confidence cases of inherited disease mutations resulting from IGC in ∼1% of all genes analyzed. About half of the donor sequences associated with such mutations are functional paralogous genes, suggesting that epistatic interactions or differential expression patterns will determine the impact upon fitness of specific substitutions between duplicated genes. In addition, we identified thousands of hitherto undescribed and potentially deleterious mutations that could arise via IGC. Our findings reveal the extent of the impact of interlocus gene conversion upon the spectrum of human inherited disease.


Subject(s)
Gene Conversion , Genetic Diseases, Inborn/genetics , Mutation , Alleles , Chromosomes, Human , Computational Biology , Genetic Loci , Humans
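The core test for a possible IGC origin of a disease mutation can be sketched as follows: the mutation is compatible with conversion when a paralog already carries the mutant base at the aligned position. The function below is a hypothetical, simplified illustration, not the authors' genome-wide pipeline. In particular, the flanking-identity threshold is an assumed stand-in for the statistical evidence of a conversion tract that a real analysis would require.

```python
def igc_compatible(
    gene_aln: str,
    paralog_aln: str,
    pos: int,
    ref: str,
    alt: str,
    flank: int = 5,
    min_identity: float = 0.9,
) -> bool:
    """Check whether a ref->alt mutation at alignment column `pos` (0-based)
    could have been copied from an aligned paralog (a hypothetical sketch).

    Requires the paralog to carry `alt` at `pos` and, as an assumed proxy for
    a conversion tract, high identity in the ungapped flanking columns.
    """
    if len(gene_aln) != len(paralog_aln):
        raise ValueError("sequences must be aligned to the same length")
    if gene_aln[pos] != ref:
        raise ValueError("reference base does not match the gene alignment")
    if ref == alt or paralog_aln[pos] != alt:
        return False  # the paralog cannot have donated the mutant base
    lo, hi = max(0, pos - flank), min(len(gene_aln), pos + flank + 1)
    columns = [
        i for i in range(lo, hi)
        if i != pos and gene_aln[i] != "-" and paralog_aln[i] != "-"
    ]
    if not columns:
        return False
    identity = sum(gene_aln[i] == paralog_aln[i] for i in columns) / len(columns)
    return identity >= min_identity
```

With this design, a mutation counts as a candidate only if both the donor base and the surrounding sequence similarity point to the same paralog.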
8.
bioRxiv ; 2024 May 10.
Article in English | MEDLINE | ID: mdl-38766115

ABSTRACT

Dendroctonus frontalis, also known as the southern pine beetle (SPB), is the most damaging forest pest in the southeastern United States. Strategies to predict, monitor and suppress SPB outbreaks have had limited success. Genomic data are critical for understanding pest biology and for identifying molecular targets to develop improved management approaches. Here, we produced a chromosome-level genome assembly of SPB using long-read sequencing data. Synteny analyses confirmed the conservation of the core coleopteran Stevens elements and validated the bona fide SPB X chromosome. Transcriptomic data were used to obtain 39,588 transcripts corresponding to 13,354 putative protein-coding loci. Comparative analyses of gene content across 14 beetles and 3 other insect species revealed several losses of conserved genes in the Dendroctonus clade, as well as gene gains in SPB and Dendroctonus that were enriched for loci encoding membrane proteins and extracellular matrix proteins. While lineage-specific gene losses contributed to the reduced gene content observed in Dendroctonus, we also showed that widespread misannotation of transposable elements is a major cause of the apparent gene expansion in several non-Dendroctonus species. Our findings uncovered distinctive features of the SPB gene complement and disentangled the roles of biological and annotation-related factors contributing to gene content variation across beetles.

9.
Mol Biol Evol ; 29(12): 3817-26, 2012 Dec.
Article in English | MEDLINE | ID: mdl-22844073

ABSTRACT

Gene duplication is a major driver of organismal adaptation and evolution and plays an important role in multiple human diseases. Whole-genome analyses have shown similar and high rates of gene duplication across a variety of eukaryotic species. Most of these studies, however, did not address the possible impact of interlocus gene conversion (IGC) on the evolution of gene duplicates. Because IGC homogenizes pairs of duplicates, widespread conversion would cause gene duplication events that happened long ago to appear more recent, resulting in artificially high estimates of duplication rates. Although the majority of genome-wide studies (including in the budding yeast Saccharomyces cerevisiae [Scer]) point to levels of IGC between paralogs ranging from 2% to 18%, Gao and Innan (Gao LZ, Innan H. 2004. Very low gene duplication rate in the yeast genome. Science 306:1367-1370.) found that gene conversion in yeast affected >80% of paralog pairs. If conversion rates really are this high, it would imply that the rate of gene duplication in eukaryotes is much lower than previously reported. In this work, we apply four different methodologies (including one approach that closely mirrors Gao and Innan's method) to estimate the level of IGC in Scer. Our analyses point to a maximum conversion level of 13% between paralogs in this species, in close agreement with most estimates of IGC in eukaryotes. We also show that the exceedingly high levels of conversion found previously derive from the application of an accurate method to an inappropriate data set. In conclusion, our work provides the most striking evidence to date supporting the reduced incidence of IGC among Scer paralogs and sets up a framework for future analyses in other eukaryotes.


Subject(s)
Evolution, Molecular , Gene Conversion/genetics , Gene Duplication/genetics , Genome, Fungal/genetics , Saccharomyces cerevisiae/genetics , Base Sequence , Computational Biology , Likelihood Functions , Models, Genetic , Molecular Sequence Data , Phylogeny , Sequence Alignment , Species Specificity
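The argument that conversion makes old duplicates look young can be written out under a simple molecular-clock assumption (an illustration, not the estimator used in the paper). If synonymous sites in both copies accumulate substitutions at rate μ per site per unit time, a pair duplicated T time units ago has synonymous divergence K_S ≈ 2μT. A conversion event at a more recent time t_c resets that divergence, so the estimated age reflects t_c instead of T:

```latex
\hat{T} = \frac{K_S}{2\mu}, \qquad
K_S \approx 2\mu\, t_c \;\Longrightarrow\; \hat{T} \approx t_c < T
```

Because duplication rates are often estimated from the number of young pairs (small \hat{T}), every old pair that appears young in this way inflates the estimated rate.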
10.
Mol Biol Evol ; 29(1): 239-47, 2012 Jan.
Article in English | MEDLINE | ID: mdl-21813467

ABSTRACT

Prolactin (PRL) is a multifunctional signaling molecule best known for its role in regulating lactation in mammals. Systemic PRL is produced by the anterior pituitary, but extrapituitary PRL has also been detected in many tissues including the human endometrium. Prolactin is essential for pregnancy in rodents and one of the most dramatically induced genes in the endometrium during human pregnancy. The promoter for human endometrial Prl is located about 5.8 kb upstream of the pituitary promoter and is derived from a transposable element called MER39. Although it has been shown that prolactin is expressed in the pregnant endometrium of a few mammals other than humans, MER39 has been described as primate specific. Thus, in an effort to understand mechanisms of prolactin regulatory evolution, we sought to determine how uterine prolactin is transcribed in species that lack MER39. Using a variety of complementary strategies, including reverse transcriptase-polymerase chain reaction, 5' rapid amplification of cDNA ends, and whole-transcriptome sequencing, we show that endometrial Prl expression is not a shared character of all placental mammals, as it is not expressed in rabbits, pigs, dogs, or armadillos. We show that in primates, mice, and elephants, prolactin mRNA is transcribed in the pregnant endometrium from alternative promoters, different from the pituitary promoter and different from each other. Moreover, we demonstrate that the spider monkey promoter derives from the long terminal repeat (LTR) element MER39 as in humans, the mouse promoter derives from the LTR element MER77, and the elephant promoter derives from the lineage-specific LINE retrotransposon L1-2_LA. We also find surprising variation of transcriptional start sites within these transposable elements and of Prl splice variants, suggesting a high degree of flexibility in the promoter architecture even among closely related species. 
Finally, the three groups shown here to express endometrial prolactin (the higher primates, the rodents, and the elephant) represent three of the four lineages that showed adaptive evolution of the Prl gene in an earlier study (Wallis M. 2000. Episodic evolution of protein hormones: molecular evolution of pituitary prolactin. J Mol Evol. 50:465-473), which supports our findings and suggests that the selective forces responsible for accelerated Prl evolution acted in the endometrium. This is the first reported case of convergent evolution of gene expression through the independent recruitment of different transposable elements, highlighting the importance of transposable elements in gene regulatory, and potentially adaptive, evolution.


Subject(s)
DNA Transposable Elements , Endometrium/metabolism , Evolution, Molecular , Prolactin/biosynthesis , Prolactin/genetics , Animals , Atelinae , Databases, Genetic , Elephants , Female , Humans , Mice , Phylogeny , Pregnancy , Promoter Regions, Genetic , Sequence Alignment , Species Specificity , Transcription, Genetic
11.
PeerJ ; 10: e12791, 2022.
Article in English | MEDLINE | ID: mdl-35127287

ABSTRACT

BACKGROUND: The recurrent evolution of the C4 photosynthetic pathway in angiosperms represents one of the most extraordinary examples of convergent evolution of a complex trait. Comparative genomic analyses have unveiled some of the molecular changes associated with the C4 pathway. For instance, several key enzymes involved in the transition from C3 to C4 photosynthesis have been found to share convergent amino acid replacements along C4 lineages. However, the extent of convergent replacements potentially associated with the emergence of C4 plants remains to be fully assessed. Here, we conducted an organelle-wide analysis to determine whether convergent evolution occurred in multiple chloroplast proteins besides the well-known case of the large RuBisCO subunit encoded by the chloroplast gene rbcL. METHODS: Our study was based on the comparative analysis of 43 C4 and 21 C3 grass species belonging to the PACMAD clade, a focal taxonomic group in many investigations of C4 evolution. We first used protein sequences of 67 orthologous chloroplast genes to build an accurate phylogeny of these species. Then, we inferred amino acid replacements along 13 C4 lineages and 9 C3 lineages using reconstructed protein sequences of their reference branches, corresponding to the branches containing the most recent common ancestors of C4-only clades and C3-only clades. Pairwise comparisons between reference branches allowed us to identify both convergent and non-convergent amino acid replacements between C4:C4, C3:C3 and C3:C4 lineages. RESULTS: The reconstructed phylogenetic tree of 64 PACMAD grasses had strong support at all nodes used for analyses of convergence. We identified 217 convergent replacements and 201 non-convergent replacements in 45/67 chloroplast proteins in both C4 and C3 reference branches. C4:C4 branches showed higher levels of convergent replacements than C3:C3 and C3:C4 branches.
Furthermore, we found that more proteins shared unique convergent replacements in C4 lineages, with both RbcL and RpoC1 (the RNA polymerase beta' subunit 1) showing a significantly higher convergent/non-convergent replacements ratio in C4 branches. Notably, more C4:C4 reference branches showed higher numbers of convergent vs. non-convergent replacements than C3:C3 and C3:C4 branches. Our results suggest that, in the PACMAD clade, C4 grasses experienced higher levels of molecular convergence than C3 species across multiple chloroplast genes. These findings have important implications for our understanding of the evolution of the C4 photosynthesis pathway.


Subject(s)
Genes, Chloroplast , Ribulose-Bisphosphate Carboxylase , Phylogeny , Ribulose-Bisphosphate Carboxylase/genetics , Poaceae , Plants/genetics , Evolution, Molecular , Chloroplast Proteins/genetics
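The pairwise comparison between reference branches can be sketched as follows. This Python example assumes one common definition that the abstract does not spell out: a site is counted only when both branches carry an amino acid replacement, and the pair is convergent if both branches end in the same amino acid and non-convergent otherwise. The inputs are hypothetical aligned, reconstructed sequences at the start (ancestral) and end (descendant) of each branch, and the function name is illustrative. The ratio of the two counts corresponds to the convergent/non-convergent ratio reported in the abstract.

```python
from typing import Dict


def classify_replacements(anc1: str, der1: str, anc2: str, der2: str) -> Dict[str, int]:
    """Count convergent vs. non-convergent replacements between two branches.

    anc*/der* are aligned protein sequences at the start and end of each
    branch (hypothetical inputs, e.g. from ancestral reconstruction).
    Assumed definition: a site counts only if BOTH branches changed it;
    same derived residue -> convergent, different -> non-convergent.
    """
    if len({len(anc1), len(der1), len(anc2), len(der2)}) != 1:
        raise ValueError("all four sequences must be aligned to the same length")
    counts = {"convergent": 0, "non_convergent": 0}
    for a1, d1, a2, d2 in zip(anc1, der1, anc2, der2):
        if "-" in (a1, d1, a2, d2):
            continue  # skip gapped alignment columns
        if a1 != d1 and a2 != d2:  # a replacement on both branches
            if d1 == d2:
                counts["convergent"] += 1
            else:
                counts["non_convergent"] += 1
    return counts
```

For example, K→R on both branches counts as convergent, while L→I on one branch and L→F on the other counts as non-convergent.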
12.
Mob DNA ; 13(1): 28, 2022 Nov 30.
Article in English | MEDLINE | ID: mdl-36451208

ABSTRACT

BACKGROUND: Transposable elements (TEs) are selfish DNA sequences capable of moving and amplifying at the expense of host cells. Despite this, an increasing number of studies have revealed that TE proteins are important contributors to the emergence of novel host proteins through molecular domestication. We previously described seven transposase-derived domesticated genes from the PIF/Harbinger DNA family of TEs in Drosophila, as well as one co-domestication. All PIF TEs known in plants and animals are distinguished from other DNA transposons by the presence of two genes. We hypothesize that the two genes from the same TE should often be co-domesticated, because the transposase (gene 1) has been described as being translocated to the nucleus by the MADF protein (gene 2). To provide support for this model of new gene origination, we searched available insect genomes for additional evidence of PIF TE domestication events and explored the co-domestication of the MADF protein from the same TE insertion. RESULTS: After an extensive search of insect genomes for hits to PIF transposases and analyses of their context and evolution, we present evidence of at least six independent domestication events of PIF transposable element proteins in insects: two co-domestications of both transposase and MADF proteins in Anopheles (Diptera), one transposase-only domestication event and one co-domestication in butterflies and moths (Lepidoptera), and two transposase-only domestication events in cockroaches (Blattodea). The predicted nuclear localization signals for many of those proteins, and dicistronic transcription in some instances, support the functional association of co-domesticated transposase and MADF proteins.
CONCLUSIONS: Our results add to a co-domestication that we previously described in fruit fly genomes and support the view that new gene origination through domestication of a PIF transposase is frequently accompanied by the co-domestication of a cognate MADF protein in insects, potentially for regulatory functions. We propose a detailed model predicting that PIF TE protein co-domestication should often occur from the same PIF TE insertion.

13.
J Exp Zool B Mol Dev Evol ; 316(5): 330-8, 2011 Jul 15.
Article in English | MEDLINE | ID: mdl-21344644

ABSTRACT

In many organisms, the specification of cell fate and the formation of embryonic axes depend on a proper distribution of maternal mRNAs during oogenesis. Asymmetrically localized determinants are required for both embryonic axis formation and germline determination in anuran amphibians. As a model system for these processes, we used a species complex of the genus Pelophylax (Rana), characterized by hybridogenetic reproduction that involves genome exclusion and endoreduplication during meiosis in both sexes. With the aim of characterizing the still largely unknown molecular events regulating Pelophylax gametogenesis, we isolated in this animal model homologues of the Deleted in AZoospermia-like (DAZl) and pumilio gene families (named RlDazl and RlPum1, respectively), which encode posttranscriptional regulators. Expression pattern analysis of these genes showed that RlDazl is exclusively expressed in gonadal tissues, whereas RlPum1 is expressed in both somatic tissues and gonads. In situ hybridization on gonads revealed that the two transcripts were asymmetrically localized along the animal-vegetal (A-V) axis of oocytes. In particular, the RlDazl transcript progressively accumulated at the vegetal pole during oogenesis, whereas RlPum1 mRNA was preferentially enriched in the animal hemisphere. In adult testes, RlDazl and RlPum1 were expressed in specific phases of spermatogenetic divisions, as shown by immunostaining with an anti-H3 phosphohistone antibody. Our results indicate that RlDazl and RlPum1 represent two early indicators of oocyte polarity in this hybridogenetic vertebrate model. Additionally, RlDazl shares a germ cell-specific expression pattern with vertebrate DAZ-like genes.


Subject(s)
Oocytes/metabolism , Oogenesis/genetics , RNA-Binding Proteins/genetics , RNA-Binding Proteins/metabolism , Spermatogenesis/genetics , Adult , Animals , Female , Gametogenesis/genetics , Gene Expression Regulation, Developmental , Genes , Humans , Male , Models, Animal , RNA, Messenger/genetics , RNA, Messenger/metabolism , Ranidae
15.
Front Genet ; 12: 661440, 2021.
Article in English | MEDLINE | ID: mdl-34140968

ABSTRACT

Drought response is coordinated through expression changes in a large suite of genes. Interspecific variation in this response is common and associated with drought-tolerant and drought-sensitive genotypes. The extent to which different genetic networks orchestrate the adjustments to water deficit in tolerant and sensitive genotypes has not been fully elucidated, particularly in non-model or woody plants. Differential expression was assessed by RNA-seq in root tissue exposed to simulated drought conditions in two loblolly pine (Pinus taeda L.) clones with contrasting tolerance to drought. Loblolly pine is the prevalent conifer in the southeastern U.S. and a major commercial forestry species worldwide. Significant changes in expression levels were found in more than 4,000 transcripts [drought-related transcripts (DRTs)]. Genotype by environment (GxE) interactions were prevalent, suggesting that different cohorts of genes are influenced by drought conditions in the tolerant vs. sensitive genotypes. Functional annotation categories and metabolic pathways associated with DRTs showed higher levels of overlap between clones, with the notable exception of GO categories in upregulated DRTs. Conversely, both differentially expressed transcription factors (TFs) and TF families were largely different between clones. Our results indicate that the response of a drought-tolerant loblolly pine genotype vs. a sensitive genotype to water limitation is remarkably different on a gene-by-gene level, although it involves similar genetic networks. Upregulated transcripts under drought conditions represent the most divergent component between genotypes, which might depend on the activation and repression of substantially different groups of TFs.

16.
Genetics ; 182(2): 615-22, 2009 Jun.
Article in English | MEDLINE | ID: mdl-19307604

ABSTRACT

Gene conversion between duplicated genes has been implicated in homogenization of gene families and reassortment of variation among paralogs. If conversion is common, this process could lead to errors in gene tree inference and subsequent overestimation of rates of gene duplication. After performing simulations to assess our power to detect gene conversion events, we determined rates of conversion among young, lineage-specific gene duplicates in four mammal species: human, rhesus macaque, mouse, and rat. Gene conversion rates (number of conversion events/number of gene pairs) among young duplicates range from 8.3% in macaque to 18.96% in rat, including a 5% false-positive rate. For all lineages, only 1-3% of the total amount of sequence examined was converted. There is no increase in GC content in conversion tracts compared to flanking regions of the same genes nor in conversion tracts compared to the same region in nonconverted gene-family members, suggesting that ectopic gene conversion does not significantly alter nucleotide composition in these duplicates. While the majority of gene duplicate pairs reside on different chromosomes in mammalian genomes, the majority of gene conversion events occur between duplicates on the same chromosome, even after controlling for divergence between duplicates. Among intrachromosomal duplicates, however, there is no correlation between the probability of conversion and physical distance between duplicates after controlling for divergence. Finally, we use a novel method to show that at most 5-10% of all gene trees involving young duplicates are likely to be incorrect due to gene conversion. We conclude that gene conversion has had only a small effect on mammalian genomes and gene duplicate evolution in general.


Subject(s)
Gene Conversion , Gene Duplication , Genome/genetics , Animals , False Negative Reactions , False Positive Reactions , Genomics , Humans , Macaca mulatta/genetics , Mice , Rats
17.
Genomics ; 93(1): 83-9, 2009 Jan.
Article in English | MEDLINE | ID: mdl-18848618

ABSTRACT

Retrogenes are processed copies of genes that are inserted into new genomic regions and that acquire new regulatory elements from the sequences in their surroundings. Here we use a comparative approach of phylogenetic footprinting and a non-comparative approach of measuring motif over-representation in retrogenes in order to describe putative elements present in cis-regulatory regions of 94 retrogenes recently described in Drosophila. The detailed examination of the motifs found in the core promoter regions of retrogenes reveals an abundance of the DNA replication-related element (DRE), the Initiator (Inr), and a new over-represented motif that we call the GCT motif. Parental genes also show an abundance of DRE and Inr motifs, but these do not seem to have been carried over with retrogenes. We also examined motifs upstream of retrogenes expressed in adult testis and identified 6 additional over-represented motifs. Comparative analyses provide data on the conservation and origin of some of these motifs and reveal 15 additional conserved motifs in these retrogenes. Some of those conserved motifs are sequences bound by known transcription factors, while others are novel motifs. In this report we provide the first genome-wide data on which specific cis-regulatory regions can be recruited by retrogenes after they are inserted into new coding regions in the genome. Future experiments are needed to determine the function and role of the new elements presented here.


Subject(s)
Amino Acid Motifs , Drosophila melanogaster/genetics , Drosophila melanogaster/metabolism , Enhancer Elements, Genetic , Gene Expression Regulation , Genes, Insect , Retroelements , Amino Acid Sequence , Animals , Base Sequence , Conserved Sequence , Male , Molecular Sequence Data , Promoter Regions, Genetic , Testis/metabolism
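The non-comparative approach, measuring motif over-representation, can be illustrated with a minimal Python sketch: expand an IUPAC consensus into a regular expression, then compare the fraction of retrogene upstream regions containing the motif with the fraction in a background set. This is a simplified stand-in, not the authors' method: it scans only the forward strand and reports fold enrichment without the statistical test a real analysis would use. The function names are hypothetical, and the DRE consensus TATCGATA in the usage example is given for illustration.

```python
import re
from typing import List

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}


def motif_regex(consensus: str) -> "re.Pattern[str]":
    """Compile an IUPAC consensus (e.g. 'TCAKTY') into a regex."""
    return re.compile("".join(IUPAC[c] for c in consensus.upper()))


def fraction_with_motif(seqs: List[str], consensus: str) -> float:
    """Fraction of sequences with at least one forward-strand match."""
    if not seqs:
        raise ValueError("need at least one sequence")
    pattern = motif_regex(consensus)
    return sum(1 for s in seqs if pattern.search(s.upper())) / len(seqs)


def fold_enrichment(target: List[str], background: List[str], consensus: str) -> float:
    """Motif frequency in target sequences relative to background."""
    bg = fraction_with_motif(background, consensus)
    if bg == 0:
        return float("inf")
    return fraction_with_motif(target, consensus) / bg
```

For example, if two of three retrogene upstream regions contain TATCGATA and only one of four background regions does, the fold enrichment is (2/3)/(1/4), about 2.67.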
18.
Front Genet ; 11: 82, 2020.
Article in English | MEDLINE | ID: mdl-32153642

ABSTRACT

Copy number variants are duplications and deletions of the genome that play an important role in phenotypic changes and human disease. Many software applications have been developed to detect copy number variants using either whole-genome sequencing or whole-exome sequencing data. However, there is poor agreement in the results from these applications. Simulated datasets containing copy number variants allow comprehensive comparisons of the operating characteristics of existing and novel copy number variant detection methods. Several software applications have been developed to simulate copy number variants and other structural variants in whole-genome sequencing data. However, none of the applications reliably simulate copy number variants in whole-exome sequencing data. We have developed and tested Simulator of Exome Copy Number Variants (SECNVs), a fast, robust and customizable software application for simulating copy number variants and whole-exome sequences from a reference genome. SECNVs is easy to install, implements a wide range of commands to customize simulations, can output multiple samples at once, and incorporates a pipeline to output rearranged genomes, short reads and BAM files in a single command. Variants generated by SECNVs are detected with high sensitivity and precision by tools commonly used to detect copy number variants. SECNVs is publicly available at https://github.com/YJulyXing/SECNVs.
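At its core, simulating copy number variants means rewriting a reference sequence so that selected intervals are deleted or repeated. The sketch below shows only that step. The function name and its input format (0-based, half-open, non-overlapping intervals with a per-haplotype copy number) are hypothetical, and it is not the SECNVs interface. A whole-exome simulator would additionally place CNVs relative to exon target regions and generate short reads and BAM files from the rearranged genome.

```python
from typing import List, Tuple


def apply_cnvs(reference: str, cnvs: List[Tuple[int, int, int]]) -> str:
    """Return `reference` with copy number variants applied (a sketch).

    Each CNV is (start, end, copy_number) on one haplotype, with 0-based,
    half-open coordinates. copy_number 0 deletes the interval, 1 leaves it
    unchanged, and n >= 2 creates n tandem copies. Intervals must not overlap.
    """
    pieces: List[str] = []
    cursor = 0
    for start, end, copies in sorted(cnvs):
        if start < cursor or end <= start or end > len(reference) or copies < 0:
            raise ValueError(f"invalid or overlapping CNV: {(start, end, copies)}")
        pieces.append(reference[cursor:start])          # unchanged sequence before the CNV
        pieces.append(reference[start:end] * copies)    # deleted, kept, or duplicated
        cursor = end
    pieces.append(reference[cursor:])                   # unchanged tail
    return "".join(pieces)
```

For example, duplicating the `CCCC` block and deleting the final `TTTT` block of `AAAACCCCGGGGTTTT` produces `AAAACCCCCCCCGGGG`.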

19.
Mol Biol Evol ; 25(1): 29-41, 2008 Jan.
Article in English | MEDLINE | ID: mdl-17940212

ABSTRACT

The mammalian centromere-associated protein B (CENP-B) shares significant sequence similarity with 3 proteins in fission yeast (Abp1, Cbh1, and Cbh2) that also bind centromeres and have essential function for chromosome segregation and centromeric heterochromatin formation. Each of these proteins displays extensive sequence similarity with pogo-like transposases, which have been previously identified in the genomes of various insects and vertebrates, in the protozoan Entamoeba and in plants. Based on this distribution, it has been proposed that the mammalian and fission yeast centromeric proteins are derived from "domesticated" pogo-like transposons. Here we took advantage of the vast amount of sequence information that has become recently available for a wide range of fungal and animal species to investigate the origin of the mammalian CENP-B and yeast CENP-B-like genes. A highly conserved ortholog of CENP-B was detected in 31 species of mammals, including opossum and platypus, but was absent from all nonmammalian species represented in the databases. Similarly, no ortholog of the fission yeast centromeric proteins was identified in any of the various fungal genomes currently available. In contrast, we discovered a plethora of novel pogo-like transposons in diverse invertebrates and vertebrates and in several filamentous fungi. Phylogenetic analysis revealed that the mammalian and fission yeast CENP-B proteins fall into 2 distinct monophyletic clades, each of which includes a different set of pogo-like transposons. These results are most parsimoniously explained by independent domestication events of pogo-like transposases into centromeric proteins in the mammalian and fission yeast lineages, a case of "convergent domestication." These findings highlight the propensity of transposases to give rise to new host proteins and the potential of transposons as sources of genetic innovation.


Subject(s)
Centromere Protein B/genetics , Chromosomal Proteins, Non-Histone/genetics , DNA-Binding Proteins/genetics , Evolution, Molecular , Schizosaccharomyces pombe Proteins/genetics , Schizosaccharomyces/genetics , Transposases/genetics , Animals , Databases, Genetic , Humans , Phylogeny , Sequence Analysis, DNA , Species Specificity
20.
J Mol Evol ; 68(6): 679-87, 2009 Jun.
Article in English | MEDLINE | ID: mdl-19471989

ABSTRACT

Previous studies have shown that recombination between allelic sequences can cause likelihood-based methods for detecting positive selection to produce many false-positive results. In this article, we use simulations to study the impact of nonallelic gene conversion on the specificity of PAML in detecting positive selection among gene duplicates. Our results show that, as expected, gene conversion leads to higher rates of false-positive results, although only moderately. These rates increase with the genetic distance between sequences and the length of converted tracts, and when no outgroup sequences are included in the analysis. We also find that branch-site models will incorrectly identify unconverted sequences as the targets of positive selection when their close paralogs are converted. Bayesian prediction of sites undergoing adaptive evolution, as implemented in PAML, is affected by conversion, albeit in a less straightforward way. Our work suggests that particular attention should be devoted to the evolutionary analysis of recent duplicates that may have experienced gene conversion, because they may provide false signals of positive selection. Fortunately, these results also imply that the cases most susceptible to false-positive results (i.e., high divergence between paralogs and long conversion tracts) are also the cases where gene conversion is easiest to detect.


Subject(s)
Evolution, Molecular , Gene Conversion , Models, Genetic , Models, Statistical , Selection, Genetic , Bayes Theorem , Computer Simulation , Databases, Genetic , Likelihood Functions , Monte Carlo Method , Recombination, Genetic , Sequence Analysis, DNA/methods , Software
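The likelihood-based tests evaluated here compare a null and an alternative model through a likelihood ratio statistic. As an illustration only, the sketch below takes two log-likelihoods, for example as reported by PAML's codeml for the branch-site null model (foreground ω fixed at 1) and the alternative model, and computes 2ΔlnL. The p-value comes either from χ² with one degree of freedom or from a 50:50 mixture of a point mass at 0 and χ²₁; which reference distribution to use is an analysis choice and an assumption here. Any inflation of 2ΔlnL, such as that caused by gene conversion, lowers the p-value, which is how false positives arise. The function name is hypothetical.

```python
import math
from typing import Tuple


def lrt_pvalue(lnl_null: float, lnl_alt: float, mixture: bool = True) -> Tuple[float, float]:
    """Likelihood ratio test for nested models differing by one parameter.

    Returns (statistic, p_value), with statistic = 2 * (lnL_alt - lnL_null).
    mixture=True uses a 50:50 mixture of 0 and chi-square(1 df);
    mixture=False uses plain chi-square(1 df), which is more conservative.
    """
    # A slightly negative statistic reflects optimizer noise; clamp it to 0.
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    if stat == 0.0:
        return stat, 1.0
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_chi2_1 = math.erfc(math.sqrt(stat / 2.0))
    return stat, (0.5 * p_chi2_1 if mixture else p_chi2_1)
```

For instance, log-likelihoods of -1000.0 (null) and -996.0 (alternative) give a statistic of 8.0, whose χ²₁ p-value is roughly 0.005. The mixture halves that value.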