Results 1 - 20 of 55
1.
Cell ; 166(2): 279-287, 2016 Jul 14.
Article in English | MEDLINE | ID: mdl-27419868

ABSTRACT

Genes encode components of coevolved and interconnected networks. The effect of genotype on phenotype therefore depends on genotypic context through gene interactions known as epistasis. Epistasis is important in predicting phenotype from genotype for an individual. It is also examined in population studies to identify genetic risk factors in complex traits and to predict evolution under selection. Paradoxically, the effects of genotypic context in individuals and populations are distinct and sometimes contradictory. We argue that predicting phenotype from genotype for individuals based on population studies is difficult and, especially in human genetics, likely to result in underestimating the effects of genotypic context.


Subjects
Epistasis, Genetic , Genotype , Animals , Genetics, Medical , Genetics, Population , Humans , Quantitative Trait, Heritable
2.
PLoS Genet ; 19(3): e1010677, 2023 03.
Article in English | MEDLINE | ID: mdl-36952570

ABSTRACT

The standard neutral model of molecular evolution has traditionally been used as the null model for population genomics. We gathered a collection of 45 genome-wide site frequency spectra from a diverse set of species, most of which display an excess of low and high frequency variants compared to the expectation of the standard neutral model, resulting in U-shaped spectra. We show that multiple merger coalescent models often provide a better fit to these observations than the standard Kingman coalescent. Hence, in many circumstances these under-utilized models may serve as the more appropriate reference for genomic analyses. We further discuss the underlying evolutionary processes that may result in the widespread U-shape of frequency spectra.
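As a rough illustration of the comparison described in this abstract: under the standard neutral (Kingman) model, the expected number of sites at which the derived allele appears i times in a sample of n chromosomes is proportional to 1/i. The sketch below computes those expected proportions and the observed-to-expected ratio per frequency class. The function names and the toy counts are hypothetical and are not taken from the paper.

```python
from fractions import Fraction


def neutral_sfs_proportions(n):
    """Expected share of segregating sites in each derived-allele count class
    i = 1..n-1 under the standard neutral model, where E[xi_i] is proportional to 1/i."""
    weights = [Fraction(1, i) for i in range(1, n)]
    total = sum(weights)
    return [float(w / total) for w in weights]


def excess_over_neutral(observed_counts):
    """Observed proportion divided by neutral expectation for each class.
    Values above 1 mark an excess; a U-shaped spectrum shows excess at both ends."""
    n = len(observed_counts) + 1
    total = sum(observed_counts)
    expected = neutral_sfs_proportions(n)
    return [(c / total) / e for c, e in zip(observed_counts, expected)]


# Hypothetical unfolded spectrum for n = 6 sampled chromosomes (made-up counts).
toy_sfs = [60, 15, 8, 7, 20]
ratios = excess_over_neutral(toy_sfs)
print([round(r, 2) for r in ratios])
```

With these made-up counts the first and last classes exceed the neutral expectation while the intermediate classes fall below it, which is the U shape the abstract refers to. Fitting multiple merger coalescent models is a separate inference problem that this sketch does not attempt.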


Subjects
Biological Evolution , Evolution, Molecular , Models, Genetic
3.
Proc Natl Acad Sci U S A ; 120(43): e2307340120, 2023 10 24.
Article in English | MEDLINE | ID: mdl-37844245

ABSTRACT

Echolocation, the detection of objects by means of sound waves, has evolved independently in diverse animals. Echolocators include not only mammals such as toothed whales and yangochiropteran and rhinolophoid bats but also Rousettus fruit bats, as well as two bird lineages, oilbirds and swiftlets. In whales and yangochiropteran and rhinolophoid bats, positive selection and molecular convergence have been documented in key hearing-related genes, such as prestin (SLC26A5), but few studies have examined these loci in other echolocators. Here, we examine patterns of selection and convergence in echolocation-related genes in echolocating birds and Rousettus bats. Fewer of these loci were under selection in Rousettus or birds compared with classically recognized echolocators, and elevated convergence (compared to outgroups) was not evident across this gene set. In certain genes, however, we detected convergent substitutions with potential functional relevance, including convergence between Rousettus and classic echolocators in prestin at a site known to affect hair cell electromotility. We also detected convergence between Yangochiroptera, Rhinolophoidea, and oilbirds in TMC1, an important mechanosensory transduction channel in vertebrate hair cells, and observed an amino acid change at the same site within the pore domain. Our results suggest that although most proteins implicated in echolocation in specialized mammals may not have been recruited in birds or Rousettus fruit bats, certain hearing-related loci may have undergone convergent functional changes. Investigating adaptations in diverse echolocators will deepen our understanding of this unusual sensory modality.


Subjects
Chiroptera , Echolocation , Animals , Chiroptera/physiology , Phylogeny , Evolution, Molecular , Mammals/genetics , Hearing/genetics , Whales/physiology , Birds/genetics , Echolocation/physiology
4.
Mol Biol Evol ; 41(1)2024 Jan 03.
Article in English | MEDLINE | ID: mdl-38069903

ABSTRACT

The increasing availability of genomic resequencing data sets and high-quality reference genomes across the tree of life presents exciting opportunities for comparative population genomic studies. However, substantial challenges prevent the simple reuse of data across different studies and species, arising from variability in variant calling pipelines, data quality, and the need for computationally intensive reanalysis. Here, we present snpArcher, a flexible and highly efficient workflow designed for the analysis of genomic resequencing data in nonmodel organisms. snpArcher provides a standardized variant calling pipeline and includes modules for variant quality control, data visualization, variant filtering, and other downstream analyses. Implemented in Snakemake, snpArcher is user-friendly, reproducible, and designed to be compatible with high-performance computing clusters and cloud environments. To demonstrate the flexibility of this pipeline, we applied snpArcher to 26 public resequencing data sets from nonmammalian vertebrates. These variant data sets are hosted publicly to enable future comparative population genomic analyses. With its extensibility and the availability of public data sets, snpArcher will contribute to a broader understanding of genetic variation across species by facilitating the rapid use and reuse of large genomic data sets.


Subjects
Metagenomics , Software , Animals , Workflow , Genomics , Sequence Analysis, DNA , High-Throughput Nucleotide Sequencing
5.
PLoS Comput Biol ; 20(4): e1011995, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38656999

ABSTRACT

Genomes contain conserved non-coding sequences that perform important biological functions, such as gene regulation. We present a phylogenetic method, PhyloAcc-C, that associates nucleotide substitution rates with changes in a continuous trait of interest. The method takes as input a multiple sequence alignment of conserved elements, continuous trait data observed in extant species, and a background phylogeny and substitution process. Gibbs sampling is used to assign rate categories (background, conserved, accelerated) to lineages and explore whether the assigned rate categories are associated with increases or decreases in the rate of trait evolution. We test our method using simulations and then illustrate its application using mammalian body size and lifespan data previously analyzed with respect to protein coding genes. Like other studies, we find processes such as tumor suppression, telomere maintenance, and p53 regulation to be related to changes in longevity and body size. In addition, we also find that skeletal genes, and developmental processes, such as sprouting angiogenesis, are relevant.


Subjects
Evolution, Molecular , Models, Genetic , Phylogeny , Animals , Longevity/genetics , Humans , Computational Biology/methods , Computer Simulation , Body Size/genetics , Nucleotides/genetics , Sequence Alignment/methods
6.
Mol Biol Evol ; 40(9)2023 09 01.
Article in English | MEDLINE | ID: mdl-37665177

ABSTRACT

An important goal of evolutionary genomics is to identify genomic regions whose substitution rates differ among lineages. For example, genomic regions experiencing accelerated molecular evolution in some lineages may provide insight into links between genotype and phenotype. Several comparative genomics methods have been developed to identify genomic accelerations between species, including a Bayesian method called PhyloAcc, which models shifts in substitution rate in multiple target lineages on a phylogeny. However, few methods consider the possibility of discordance between the trees of individual loci and the species tree due to incomplete lineage sorting, which might cause false positives. Here, we present PhyloAcc-GT, which extends PhyloAcc by modeling gene tree heterogeneity. Given a species tree, we adopt the multispecies coalescent model as the prior distribution of gene trees, use Markov chain Monte Carlo (MCMC) for inference, and design novel MCMC moves to sample gene trees efficiently. Through extensive simulations, we show that PhyloAcc-GT outperforms PhyloAcc and other methods in identifying target lineage-specific accelerations and detecting complex patterns of rate shifts, and is robust to specification of population size parameters. PhyloAcc-GT is usually more conservative than PhyloAcc in calling convergent rate shifts because it identifies more accelerations on ancestral than on terminal branches. We apply PhyloAcc-GT to two examples of convergent evolution: flightlessness in ratites and marine mammal adaptations, and show that PhyloAcc-GT is a robust tool to identify shifts in substitution rate associated with specific target lineages while accounting for incomplete lineage sorting.


Subjects
Biological Evolution , Models, Genetic , Animals , Bayes Theorem , Phylogeny , Genomics , Mammals
7.
Mol Ecol ; : e17378, 2024 May 09.
Article in English | MEDLINE | ID: mdl-38721834

ABSTRACT

Recent advances in genomic technology, including the rapid development of long-read sequencing technology and single-cell RNA-sequencing methods, are poised to significantly expand the kinds of studies that are feasible in ecological genomics. In this perspective, we review these new technologies and discuss their potential impact on gene expression studies in non-model organisms. Although traditional RNA-sequencing methods have been an extraordinarily powerful tool to apply functional genomics in an ecological context, bulk RNA-seq approaches often rely on de novo transcriptome assembly, and cannot capture expression changes in rare cell populations or distinguish shifts in cell type abundance. Advancements in genome assembly technology, particularly long-read sequencing, and improvements in the scalability of single-cell RNA-sequencing (scRNA-seq) offer unprecedented resolution in understanding cellular heterogeneity and gene regulation. We discuss the potential of these technologies to enable disentangling differential gene regulation from cell type composition differences and uncovering subtle expression patterns masked by bulk RNA-seq. The integration of these approaches provides a more nuanced understanding of the ecological and evolutionary dynamics of gene expression, paving the way for refined models and deeper insights into the generation of biodiversity.

8.
Trends Genet ; 36(10): 792-803, 2020 10.
Article in English | MEDLINE | ID: mdl-32800625

ABSTRACT

A major goal of comparative genomics research is modeling changes in DNA sequences between species to understand the evolutionary forces acting on species differences. Application of these models to a number of species over the past decade has revealed some commonalities across organisms, most notably a consistent role of positive selection in shaping the molecular evolution of the immune system. However, models of DNA sequence evolution also have important limitations that are increasingly being recognized, including issues with data quality and biases caused by simplifying assumptions. While new approaches have begun to address these challenges, ultimately, additional data, such as resequencing data from populations, will provide more power to fully understand the unique evolutionary forces acting on different species. In this review, I summarize the conclusions of recent genome-wide studies of selection, highlight some important challenges to applying these methods to large data sets, and discuss ways forward for the field.


Subjects
Computational Biology/methods , Evolution, Molecular , Genome , Polymorphism, Genetic , Selection, Genetic , Animals , Genomics , Humans
9.
Bioinformatics ; 37(23): 4431-4436, 2021 12 07.
Article in English | MEDLINE | ID: mdl-34255817

ABSTRACT

MOTIVATION: The emergence of single-cell RNA sequencing (scRNA-seq) has led to an explosion in novel methods to study biological variation among individual cells, and to classify cells into functional and biologically meaningful categories. RESULTS: Here, we present a new cell type projection tool, Hierarchical Random Forest for Information Transfer (HieRFIT), based on hierarchical random forests. HieRFIT uses a priori information about cell type relationships to improve classification accuracy, taking as input a hierarchical tree structure representing the class relationships, along with the reference data. We use an ensemble approach combining multiple random forest models, organized in a hierarchical decision tree structure. We show that our hierarchical classification approach improves accuracy and reduces incorrect predictions especially for inter-dataset tasks which reflect real-life applications. We use a scoring scheme that adjusts probability distributions for candidate class labels and resolves uncertainties while avoiding the assignment of cells to incorrect types by labeling cells at internal nodes of the hierarchy when necessary. AVAILABILITY AND IMPLEMENTATION: HieRFIT is implemented as an R package, and it is available at (https://github.com/yasinkaymaz/HieRFIT/releases/tag/v1.0.0). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Gene Expression Profiling , Software , Sequence Analysis, RNA , Single-Cell Analysis , Random Forest
10.
Bioinformatics ; 37(15): 2212-2214, 2021 08 09.
Article in English | MEDLINE | ID: mdl-33165513

ABSTRACT

MOTIVATION: One major goal of single-cell RNA sequencing (scRNAseq) experiments is to identify novel cell types. With increasingly large scRNAseq datasets, unsupervised clustering methods can now produce detailed catalogues of transcriptionally distinct groups of cells in a sample. However, the interpretation of these clusters is challenging for both technical and biological reasons. Popular clustering algorithms are sensitive to parameter choices, and can produce different clustering solutions with even small changes in the number of principal components used, the k nearest neighbor and the resolution parameters, among others. RESULTS: Here, we present a set of tools to evaluate cluster stability by subsampling, which can guide parameter choice and aid in biological interpretation. The R package scclusteval and the accompanying Snakemake workflow implement all steps of the pipeline: subsampling the cells, repeating the clustering with Seurat and estimation of cluster stability using the Jaccard similarity index and providing rich visualizations. AVAILABILITY AND IMPLEMENTATION: R package scclusteval: https://github.com/crazyhottommy/scclusteval Snakemake workflow: https://github.com/crazyhottommy/pyflow_seuratv3_parameter Tutorial: https://crazyhottommy.github.io/EvaluateSingleCellClustering/.
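The Jaccard similarity index named in this abstract is the size of the intersection of two sets divided by the size of their union. The sketch below shows one plausible way to score a single original cluster against the clusters obtained from a subsample: restrict the original cluster to the cells that were subsampled, then take the best Jaccard match. The helper names and cell identifiers are hypothetical. This is not the scclusteval API, which is an R package.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections of cell IDs."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def best_match_jaccard(original_cluster, subsample_clusters, sampled_cells):
    """Highest Jaccard index between an original cluster (restricted to the
    cells present in the subsample) and any cluster from the re-clustering."""
    kept = set(original_cluster) & set(sampled_cells)
    return max((jaccard(kept, c) for c in subsample_clusters), default=0.0)


# Hypothetical cell IDs: one original cluster and one subsampled re-clustering.
original = {"c1", "c2", "c3", "c4"}
sampled = {"c1", "c2", "c3", "c5", "c6"}
reclustered = [{"c1", "c2", "c5"}, {"c3", "c6"}]
score = best_match_jaccard(original, reclustered, sampled)
print(score)  # 0.5
```

Repeating this over many subsamples and summarizing the scores per cluster gives a stability estimate; how the scores are aggregated and thresholded is a design choice left open here.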


Subjects
Algorithms , Single-Cell Analysis , Base Sequence , Cluster Analysis , Sequence Analysis, RNA , Exome Sequencing
11.
BMC Biol ; 19(1): 41, 2021 03 10.
Article in English | MEDLINE | ID: mdl-33750380

ABSTRACT

BACKGROUND: The stable fly, Stomoxys calcitrans, is a major blood-feeding pest of livestock that has near worldwide distribution, causing an annual cost of over $2 billion for control and product loss in the USA alone. Control of these flies has been limited to increased sanitary management practices and insecticide application for suppressing larval stages. Few genetic and molecular resources are available to help in developing novel methods for controlling stable flies. RESULTS: This study examines stable fly biology by utilizing a combination of high-quality genome sequencing and RNA-Seq analyses targeting multiple developmental stages and tissues. In conjunction, 1600 genes were manually curated to characterize genetic features related to stable fly reproduction, vector host interactions, host-microbe dynamics, and putative targets for control. Most notable was characterization of genes associated with reproduction and identification of expanded gene families with functional associations to vision, chemosensation, immunity, and metabolic detoxification pathways. CONCLUSIONS: The combined sequencing, assembly, and curation of the male stable fly genome followed by RNA-Seq and downstream analyses provide insights necessary to understand the biology of this important pest. These resources and new data will provide the groundwork for expanding the tools available to control stable fly infestations. The close relationship of Stomoxys to other blood-feeding (horn flies and Glossina) and non-blood-feeding flies (house flies, medflies, Drosophila) will facilitate understanding of the evolutionary processes associated with development of blood feeding among the Cyclorrhapha.


Subjects
Genome, Insect , Host-Parasite Interactions/genetics , Insect Control , Muscidae/genetics , Animals , Reproduction/genetics
12.
BMC Bioinformatics ; 21(1): 149, 2020 Apr 19.
Article in English | MEDLINE | ID: mdl-32306895

ABSTRACT

BACKGROUND: Typical experimental design advice for expression analyses using RNA-seq generally assumes that single-end reads provide robust gene-level expression estimates in a cost-effective manner, and that the additional benefits obtained from paired-end sequencing are not worth the additional cost. However, in many cases (e.g., with Illumina NextSeq and NovaSeq instruments), shorter paired-end reads and longer single-end reads can be generated for the same cost, and it is not obvious which strategy should be preferred. Using publicly available data, we test whether short paired-end reads can achieve more robust expression estimates and differential expression results than single-end reads of approximately the same total number of sequenced bases. RESULTS: At both the transcript and gene levels, 2 × 40 paired-end reads unequivocally provide expression estimates that are more highly correlated with 2 × 125 than 1 × 75 reads; in nearly all cases, those correlations are also greater than for 1 × 125, despite the greater total number of sequenced bases for the latter. Across an array of metrics, differential expression tests based upon 2 × 40 consistently outperform those using 1 × 75. CONCLUSION: Researchers seeking a cost-effective approach for gene-level expression analysis should prefer short paired-end reads over a longer single-end strategy. Short paired-end reads will also give reasonably robust expression estimates and differential expression results at the isoform level.
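The read designs compared in this abstract can be put on a common scale by counting sequenced bases per fragment, that is, the number of reads per fragment times the read length. The short sketch below works through that arithmetic for the four designs named in the abstract. The function name is hypothetical, and the calculation ignores instrument-specific cost factors.

```python
def bases_per_fragment(reads_per_fragment, read_length):
    """Sequenced bases per fragment for a design written as reads x length."""
    return reads_per_fragment * read_length


# Designs named in the abstract, as (reads per fragment, read length in bases).
designs = {"2x40": (2, 40), "1x75": (1, 75), "1x125": (1, 125), "2x125": (2, 125)}
bases = {name: bases_per_fragment(*design) for name, design in designs.items()}
print(bases)  # {'2x40': 80, '1x75': 75, '1x125': 125, '2x125': 250}
```

This makes the abstract's framing concrete: 2 × 40 and 1 × 75 sequence roughly the same number of bases per fragment (80 versus 75), while 1 × 125 sequences more than either.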


Subjects
Gene Expression Profiling/methods , Gene Expression/genetics
13.
Mol Biol Evol ; 36(5): 1086-1100, 2019 05 01.
Article in English | MEDLINE | ID: mdl-30851112

ABSTRACT

Conservation of DNA sequence over evolutionary time is a strong indicator of function, and gain or loss of sequence conservation can be used to infer changes in function across a phylogeny. Changes in evolutionary rates on particular lineages in a phylogeny can indicate shared functional shifts, and thus can be used to detect genomic correlates of phenotypic convergence. However, existing methods do not allow easy detection of patterns of rate variation, which causes challenges for detecting convergent rate shifts or other complex evolutionary scenarios. Here we introduce PhyloAcc, a new Bayesian method to model substitution rate changes in conserved elements across a phylogeny. The method assumes several categories of substitution rate for each branch on the phylogenetic tree, estimates substitution rates per category, and detects changes of substitution rate as the posterior probability of a category switch. Simulations show that PhyloAcc can detect genomic regions with rate shifts in multiple target species better than previous methods and has a higher accuracy of reconstructing complex patterns of substitution rate changes than prevalent Bayesian relaxed clock models. We demonstrate the utility of PhyloAcc in two classic examples of convergent phenotypes: loss of flight in birds and the transition to marine life in mammals. In each case, our approach reveals numerous examples of conserved nonexonic elements with accelerations specific to the phenotypically convergent lineages. Our method is widely applicable to any set of conserved elements where multiple rate changes are expected on a phylogeny.


Subjects
Evolution, Molecular , Genetic Techniques , Models, Genetic , Phylogeny , Animals , Bayes Theorem , Birds/genetics , Computer Simulation , Mammals/genetics , Software
14.
Syst Biol ; 68(6): 937-955, 2019 11 01.
Article in English | MEDLINE | ID: mdl-31135914

ABSTRACT

Palaeognathae represent one of the two basal lineages in modern birds, and comprise the volant (flighted) tinamous and the flightless ratites. Resolving palaeognath phylogenetic relationships has historically proved difficult, and short internal branches separating major palaeognath lineages in previous molecular phylogenies suggest that extensive incomplete lineage sorting (ILS) might have accompanied a rapid ancient divergence. Here, we investigate palaeognath relationships using genome-wide data sets of three types of noncoding nuclear markers, together totaling 20,850 loci and over 41 million base pairs of aligned sequence data. We recover a fully resolved topology placing rheas as the sister to kiwi and emu + cassowary that is congruent across marker types for two species tree methods (MP-EST and ASTRAL-II). This topology is corroborated by patterns of insertions for 4274 CR1 retroelements identified from multispecies whole-genome screening, and is robustly supported by phylogenomic subsampling analyses, with MP-EST demonstrating particularly consistent performance across subsampling replicates as compared to ASTRAL. In contrast, analyses of concatenated data supermatrices recover rheas as the sister to all other nonostrich palaeognaths, an alternative that lacks retroelement support and shows inconsistent behavior under subsampling approaches. While statistically supporting the species tree topology, conflicting patterns of retroelement insertions also occur and imply high amounts of ILS across short successive internal branches, consistent with observed patterns of gene tree heterogeneity. Coalescent simulations and topology tests indicate that the majority of observed topological incongruence among gene trees is consistent with coalescent variation rather than arising from gene tree estimation error alone, and estimated branch lengths for short successive internodes in the inferred species tree fall within the theoretical range encompassing the anomaly zone. Distributions of empirical gene trees confirm that the most common gene tree topology for each marker type differs from the species tree, signifying the existence of an empirical anomaly zone in palaeognaths.


Subjects
Genome/genetics , Palaeognathae/classification , Palaeognathae/genetics , Phylogeny , Animals , Genomics
15.
Nature ; 514(7524): 646-9, 2014 Oct 30.
Article in English | MEDLINE | ID: mdl-25156254

ABSTRACT

Protein machines are multi-subunit protein complexes that orchestrate highly regulated biochemical tasks. An example is the anaphase-promoting complex/cyclosome (APC/C), a 13-subunit ubiquitin ligase that initiates the metaphase-anaphase transition and mitotic exit by targeting proteins such as securin and cyclin B1 for ubiquitin-dependent destruction by the proteasome. Because blocking mitotic exit is an effective approach for inducing tumour cell death, the APC/C represents a potential novel target for cancer therapy. APC/C activation in mitosis requires binding of Cdc20 (ref. 5), which forms a co-receptor with the APC/C to recognize substrates containing a destruction box (D-box). Here we demonstrate that we can synergistically inhibit APC/C-dependent proteolysis and mitotic exit by simultaneously disrupting two protein-protein interactions within the APC/C-Cdc20-substrate ternary complex. We identify a small molecule, called apcin (APC inhibitor), which binds to Cdc20 and competitively inhibits the ubiquitylation of D-box-containing substrates. Analysis of the crystal structure of the apcin-Cdc20 complex suggests that apcin occupies the D-box-binding pocket on the side face of the WD40-domain. The ability of apcin to block mitotic exit is synergistically amplified by co-addition of tosyl-l-arginine methyl ester, a small molecule that blocks the APC/C-Cdc20 interaction. This work suggests that simultaneous disruption of multiple, weak protein-protein interactions is an effective approach for inactivating a protein machine.


Subjects
Anaphase-Promoting Complex-Cyclosome/chemistry , Anaphase-Promoting Complex-Cyclosome/metabolism , Carbamates/pharmacology , Diamines/pharmacology , Mitosis/drug effects , Tosylarginine Methyl Ester/pharmacology , Binding Sites/drug effects , Cdc20 Proteins/chemistry , Cdc20 Proteins/metabolism , Cell Death/drug effects , Crystallography, X-Ray , Drug Synergism , Protein Binding/drug effects , Proteolysis/drug effects , Ubiquitination/drug effects
16.
Mol Biol Evol ; 34(4): 857-872, 2017 04 01.
Article in English | MEDLINE | ID: mdl-28087775

ABSTRACT

The house fly, Musca domestica, occupies an unusual diversity of potentially septic niches compared with other sequenced Dipteran insects and is a vector of numerous diseases of humans and livestock. In the present study, we apply whole-transcriptome sequencing to identify genes whose expression is regulated in adult flies upon bacterial infection. We then combine the transcriptomic data with analysis of rates of gene duplication and loss to provide insight into the evolutionary dynamics of immune-related genes. Genes up-regulated after bacterial infection are biased toward being evolutionarily recent innovations, suggesting the recruitment of novel immune components in the M. domestica or ancestral Dipteran lineages. In addition, using new models of gene family evolution, we show that several different classes of immune-related genes, particularly those involved in either pathogen recognition or pathogen killing, are duplicating at a significantly accelerated rate on the M. domestica lineage relative to other Dipterans. Taken together, these results suggest that the M. domestica immune response includes an elevated diversity of genes, perhaps as a consequence of its lifestyle in septic environments.


Subjects
Adaptive Immunity/genetics , Houseflies/genetics , Animals , Base Sequence/genetics , Gene Deletion , Gene Duplication/genetics , Gene Expression Profiling/methods , Gene Expression Regulation/genetics , Gene Expression Regulation/immunology , Genetic Variation/genetics , Molecular Sequence Annotation , Mutation Rate , Transcriptome/genetics
17.
PLoS Biol ; 13(4): e1002112, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25859758

ABSTRACT

The neutral theory of molecular evolution predicts that the amount of neutral polymorphisms within a species will increase proportionally with the census population size (Nc). However, this prediction has not been borne out in practice: while the range of Nc spans many orders of magnitude, levels of genetic diversity within species fall in a comparatively narrow range. Although theoretical arguments have invoked the increased efficacy of natural selection in larger populations to explain this discrepancy, few direct empirical tests of this hypothesis have been conducted. In this work, we provide a direct test of this hypothesis using population genomic data from a wide range of taxonomically diverse species. To do this, we relied on the fact that the impact of natural selection on linked neutral diversity depends on the local recombinational environment. In regions of relatively low recombination, selected variants affect more neutral sites through linkage, and the resulting correlation between recombination and polymorphism allows a quantitative assessment of the magnitude of the impact of selection on linked neutral diversity. By comparing whole genome polymorphism data and genetic maps using a coalescent modeling framework, we estimate the degree to which natural selection reduces linked neutral diversity for 40 species of obligately sexual eukaryotes. We then show that the magnitude of the impact of natural selection is positively correlated with Nc, based on body size and species range as proxies for census population size. These results demonstrate that natural selection removes more variation at linked neutral sites in species with large Nc than those with small Nc and provides direct empirical evidence that natural selection constrains levels of neutral genetic diversity across many species. This implies that natural selection may provide an explanation for this longstanding paradox of population genetics.


Subjects
Biodiversity , Selection, Genetic , Animals , Models, Theoretical , Plants/genetics
19.
BMC Genomics ; 17: 678, 2016 08 25.
Article in English | MEDLINE | ID: mdl-27561358

ABSTRACT

BACKGROUND: Nasonia vitripennis is an emerging insect model system with haplodiploid genetics. It holds a key position within the insect phylogeny for comparative, evolutionary and behavioral genetic studies. The draft genomes for N. vitripennis and two sibling species were published in 2010, yet a considerable amount of transcriptome data have since been produced thereby enabling improvements to the original (OGS1.2) annotated gene set. We describe and apply the EvidentialGene method used to produce an updated gene set (OGS2). We also carry out comparative analyses showcasing the usefulness of the revised annotated gene set. RESULTS: The revised annotation (OGS2) now consists of 24,388 genes with supporting evidence, compared to 18,850 for OGS1.2. Improvements include the nearly complete annotation of untranslated regions (UTR) for 97 % of the genes compared to 28 % of genes for OGS1.2. The fraction of RNA-Seq validated introns also grew from 85 to 98 % in this latest gene set. The EST and RNA-Seq expression data provide support for several non-protein coding loci and 7712 alternative transcripts for 4146 genes. Notably, we report 180 alternative transcripts for the gene lola. Nasonia now has among the most complete insect gene sets; only 27 conserved single copy orthologs in arthropods are missing from OGS2. Its genome also contains 2.1-fold more duplicated genes and 1.4-fold more single copy genes than the Drosophila melanogaster genome. The Nasonia gene count is larger than those of other sequenced hymenopteran species, owing both to improvements in the genome annotation and to unique genes in the wasp lineage. We identify 1008 genes and 171 gene families that deviate significantly from other hymenopterans in their rates of protein evolution and duplication history, respectively. We also provide an analysis of alternative splicing that reveals that genes with no annotated isoforms are characterized by shorter transcripts, fewer introns, faster protein evolution and higher probabilities of duplication than genes having alternative transcripts. CONCLUSIONS: Genome-wide expression data greatly improve the annotation of the N. vitripennis genome, by increasing the gene count, reducing the number of missing genes and providing more comprehensive data on splicing and gene structure. The improved gene set identifies lineage-specific genomic features tied to Nasonia's biology, as well as numerous novel genes. OGS2 and its associated search tools are available at http://arthropods.eugenes.org/EvidentialGene/nasonia/ , www.hymenopteragenome.org/nasonia/ and waspAtlas: www.tinyURL.com/waspAtlas . The EvidentialGene pipeline is available at https://sourceforge.net/projects/evidentialgene/ .


Subjects
Computational Biology/methods , Genome, Insect , Genomics , Wasps/genetics , Alternative Splicing , Animals , Contig Mapping , Databases, Nucleic Acid , Evolution, Molecular , Gene Expression Profiling/methods , Genes, Insect , Genome-Wide Association Study/methods , Genomics/methods , Molecular Sequence Annotation , Multigene Family , Open Reading Frames , RNA, Untranslated , Software , Web Browser
20.
Mol Biol Evol ; 31(12): 3148-63, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25158796

ABSTRACT

Native to Asia, the soft-skinned fruit pest Drosophila suzukii has recently invaded the United States and Europe. The eastern United States represents the most recent expansion of their range, and presents an opportunity to test alternative models of colonization history. Here, we investigate the genetic population structure of this invasive fruit fly, with a focus on the eastern United States. We sequenced six X-linked gene fragments from 246 individuals collected from a total of 12 populations. We examine patterns of genetic diversity within and between populations and explore alternative colonization scenarios using approximate Bayesian computation. Our results indicate high levels of nucleotide diversity in this species and suggest that the recent invasions of Europe and the continental United States are independent demographic events. More broadly speaking, our results highlight the importance of integrating population structure into demographic models, particularly when attempting to reconstruct invasion histories. Finally, our simulation results illustrate the general challenge in reconstructing invasion histories using genetic data and suggest that genome-level data are often required to distinguish among alternative demographic scenarios.


Subjects
Drosophila/genetics , Animals , Bayes Theorem , Genes, Insect , Genetic Variation , Haplotypes , Introduced Species , Male , Microsatellite Repeats , Models, Genetic , Spain , United States