ABSTRACT
The iris of the eye shows striking color variation across vertebrate species, and may play important roles in crypsis and communication. The domestic pigeon (Columba livia) has three common iris colors, orange, pearl (white), and bull (dark brown), segregating in a single species, thereby providing a unique opportunity to identify the genetic basis of iris coloration. We used comparative genomics and genetic mapping in laboratory crosses to identify two candidate genes that control variation in iris color in domestic pigeons. We identified a nonsense mutation in the solute carrier SLC2A11B that is shared among all pigeons with pearl eye color, and a locus associated with bull eye color that includes EDNRB2, a gene involved in neural crest migration and pigment development. However, bull eye is likely controlled by a heterogeneous collection of alleles across pigeon breeds. We also found that the EDNRB2 region is associated with regionalized plumage depigmentation (piebalding). Our study identifies two candidate genes for eye color variation, and establishes a genetic link between iris and plumage color, two traits that vary widely in the evolution of birds and other vertebrates.
Subjects
Columbidae, Eye Color, Alleles, Animals, Cattle, Columbidae/genetics, Eye Color/genetics, Genomics, Male, Plant Breeding
ABSTRACT
High-throughput sequencing data are increasingly being made available to the research community for secondary analyses, providing new opportunities for large-scale association studies. However, heterogeneity in target capture and sequencing technologies often introduces strong technological stratification biases that overwhelm subtle signals of association in studies of complex traits. Here, we introduce the Cross-Platform Association Toolkit, XPAT, which provides a suite of tools designed to support and conduct large-scale association studies with heterogeneous sequencing datasets. XPAT includes tools to support cross-platform aware variant calling, quality control filtering, gene-based association testing and rare variant effect size estimation. To evaluate the performance of XPAT, we conducted case-control association studies for three diseases, including 783 breast cancer cases, 272 ovarian cancer cases, 205 Crohn disease cases and 3507 shared controls (including 1722 females) using sequencing data from multiple sources. XPAT greatly reduced Type I error inflation in the case-control analyses, while replicating many previously identified disease-gene associations. We also show that association tests conducted with XPAT using cross-platform data have comparable performance to tests using matched platform data. XPAT enables new association studies that combine existing sequencing datasets to identify genetic loci associated with common diseases and other complex traits.
Subjects
Computational Biology/methods, Genome-Wide Association Study/methods, High-Throughput Nucleotide Sequencing/methods, Single Nucleotide Polymorphism, Algorithms, Breast Neoplasms/genetics, Case-Control Studies, Crohn Disease/genetics, Female, Genetic Predisposition to Disease/genetics, Humans, Male, Ovarian Neoplasms/genetics, Software
ABSTRACT
Motivation: Genetic variation that disrupts gene function by altering gene splicing between individuals can substantially influence traits and disease. In those cases, accurately predicting the effects of genetic variation on splicing can be highly valuable for investigating the mechanisms underlying those traits and diseases. While methods have been developed to generate high quality computational predictions of gene structures in reference genomes, the same methods perform poorly when used to predict the potentially deleterious effects of genetic changes that alter gene splicing between individuals. Underlying that discrepancy in predictive ability are the common assumptions by reference gene finding algorithms that genes are conserved, well-formed and produce functional proteins. Results: We describe a probabilistic approach for predicting recent changes to gene structure that may or may not conserve function. The model is applicable to both coding and non-coding genes, and can be trained on existing gene annotations without requiring curated examples of aberrant splicing. We apply this model to the problem of predicting altered splicing patterns in the genomes of individual humans, and we demonstrate that performing gene-structure prediction without relying on conserved coding features is feasible. The model predicts an unexpected abundance of variants that create de novo splice sites, an observation supported by both simulations and empirical data from RNA-seq experiments. While these de novo splice variants are commonly misinterpreted by other tools as coding or non-coding variants of little or no effect, we find that in some cases they can have large effects on splicing activity and protein products and we propose that they may commonly act as cryptic factors in disease. Availability and implementation: The software is available from geneprediction.org/SGRF. Supplementary information: Supplementary information is available at Bioinformatics online.
Subjects
Exons, RNA Splicing, Software, Humans, Molecular Sequence Annotation, RNA Sequence Analysis
ABSTRACT
MOTIVATION: The accurate interpretation of genetic variants is critical for characterizing genotype-phenotype associations. Because the effects of genetic variants can depend strongly on their local genomic context, accurate genome annotations are essential. Furthermore, as some variants have the potential to disrupt or alter gene structure, variant interpretation efforts stand to gain from the use of individualized annotations that account for differences in gene structure between individuals or strains. RESULTS: We describe a suite of software tools for identifying possible functional changes in gene structure that may result from sequence variants. ACE ('Assessing Changes to Exons') converts phased genotype calls to a collection of explicit haplotype sequences, maps transcript annotations onto them, detects gene-structure changes and their possible repercussions, and identifies several classes of possible loss of function. Novel transcripts predicted by ACE are commonly supported by spliced RNA-seq reads, and can be used to improve read alignment and transcript quantification when an individual-specific genome sequence is available. Using publicly available RNA-seq data, we show that ACE predictions confirm earlier results regarding the quantitative effects of nonsense-mediated decay, and we show that predicted loss-of-function events are highly concordant with patterns of intolerance to mutations across the human population. ACE can be readily applied to diverse species including animals and plants, making it a broadly useful tool for use in eukaryotic population-based resequencing projects, particularly for assessing the joint impact of all variants at a locus. AVAILABILITY AND IMPLEMENTATION: ACE is written in open-source C++ and Perl and is available from geneprediction.org/ACE. CONTACT: myandell@genetics.utah.edu or tim.reddy@duke.edu. SUPPLEMENTARY INFORMATION: Supplementary information is available at Bioinformatics online.
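The first step named in this abstract, turning phased genotype calls into explicit haplotype sequences, can be illustrated with a minimal Python sketch. It is not ACE's implementation (ACE is written in C++ and Perl); the `Variant` tuple layout and the function name `build_haplotypes` are hypothetical, and the sketch assumes 0-based positions, VCF-style phased genotypes such as "0|1", and non-overlapping variants.

```python
from typing import List, Tuple

# Hypothetical record layout (an assumption for illustration):
# (0-based position, reference allele, alternate allele, phased genotype like "0|1")
Variant = Tuple[int, str, str, str]


def build_haplotypes(reference: str, variants: List[Variant]) -> Tuple[str, str]:
    """Apply phased variants to a reference and return the two haplotype sequences."""
    haplotypes = []
    for hap_index in (0, 1):
        pieces = []
        cursor = 0  # next reference position not yet copied
        for pos, ref, alt, genotype in sorted(variants):
            alleles = genotype.split("|")
            if alleles[hap_index] != "1":
                continue  # this haplotype carries the reference allele here
            if pos < cursor:
                raise ValueError("overlapping variants on one haplotype")
            if reference[pos:pos + len(ref)] != ref:
                raise ValueError(f"reference allele mismatch at position {pos}")
            pieces.append(reference[cursor:pos])  # unchanged stretch before the variant
            pieces.append(alt)                    # substituted, inserted, or shortened allele
            cursor = pos + len(ref)
        pieces.append(reference[cursor:])
        haplotypes.append("".join(pieces))
    return haplotypes[0], haplotypes[1]


if __name__ == "__main__":
    ref_seq = "ACGTACGT"
    calls = [(1, "C", "T", "0|1"), (4, "AC", "A", "1|0")]
    print(build_haplotypes(ref_seq, calls))
```

Because insertions and deletions shift coordinates, a tool that then maps transcript annotations onto each haplotype would also need to track the offset introduced by every applied variant; that bookkeeping is omitted here.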
Subjects
Genomics/methods, Genetic Polymorphism, RNA Sequence Analysis/methods, Software, Animals, Eukaryota/genetics, Exons, Haplotypes, Humans, Mutation, RNA Splicing
ABSTRACT
The large size and relative complexity of many plant genomes make creation, quality control, and dissemination of high-quality gene structure annotations challenging. In response, we have developed MAKER-P, a fast and easy-to-use genome annotation engine for plants. Here, we report the use of MAKER-P to update and revise the maize (Zea mays) B73 RefGen_v3 annotation build (5b+) in less than 3 h using the iPlant Cyberinfrastructure. MAKER-P identified and annotated 4,466 additional, well-supported protein-coding genes not present in the 5b+ annotation build, added additional untranslated regions to 1,393 5b+ gene models, identified 2,647 5b+ gene models that lack any supporting evidence (despite the use of large and diverse evidence data sets), identified 104,215 pseudogene fragments, and created an additional 2,522 noncoding gene annotations. We also describe a method for de novo training of MAKER-P for the annotation of newly sequenced grass genomes. Collectively, these results lead to the 6a maize genome annotation and demonstrate the utility of MAKER-P for rapid annotation, management, and quality control of grasses and other difficult-to-annotate plant genomes.
Subjects
Plant Genes/genetics, Plant Genome/genetics, Molecular Sequence Annotation/methods, Zea mays/genetics, Genetic Databases/standards, Exons/genetics, Introns/genetics, Genetic Models, Molecular Sequence Annotation/standards, Pseudogenes/genetics, Quality Control, Untranslated RNA/genetics
ABSTRACT
MOTIVATION: Copy number variations (CNVs) are a major source of genomic variability and are especially significant in cancer. Until recently, microarray technologies have been used to characterize CNVs in genomes. However, advances in next-generation sequencing technology offer significant opportunities to deduce copy number directly from genome sequencing data. Unfortunately, cancer genomes differ from normal genomes in several aspects that make them far less amenable to copy number detection. For example, cancer genomes are often aneuploid and an admixture of diploid/non-tumor cell fractions. Also, patient-derived xenograft models can be laden with mouse contamination that strongly affects accurate assignment of copy number. Hence, there is a need to develop analytical tools that can take into account cancer-specific parameters for detecting CNVs directly from genome sequencing data. RESULTS: We have developed WaveCNV, a software package to identify copy number alterations by detecting breakpoints of CNVs using translation-invariant discrete wavelet transforms and assign digitized copy numbers to each event using next-generation sequencing data. We also assign alleles specifying the chromosomal ratio following duplication/loss. We verified copy number calls using both microarray (correlation coefficient 0.97) and quantitative polymerase chain reaction (correlation coefficient 0.94) and found them to be highly concordant. We demonstrate its utility in pancreatic primary and xenograft sequencing data. AVAILABILITY AND IMPLEMENTATION: Source code and executables are available at https://github.com/WaveCNV. The segmentation algorithm is implemented in MATLAB, and copy number assignment is implemented in Perl. CONTACT: lakshmi.muthuswamy@gmail.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
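The idea of locating CNV breakpoints with a translation-invariant (undecimated) wavelet transform can be illustrated with a deliberately simplified single-scale Haar sketch in Python. This is not WaveCNV's algorithm (its segmentation is implemented in MATLAB). The function names, the fixed `scale`, and the `threshold` are assumptions chosen for illustration, and the input is assumed to be a per-bin read-depth ratio.

```python
from typing import List


def haar_detail(signal: List[float], scale: int) -> List[float]:
    """Undecimated Haar-style detail: right-window mean minus left-window mean at every position."""
    n = len(signal)
    detail = [0.0] * n
    for i in range(scale, n - scale + 1):
        left = sum(signal[i - scale:i]) / scale
        right = sum(signal[i:i + scale]) / scale
        detail[i] = right - left
    return detail


def call_breakpoints(signal: List[float], scale: int = 4, threshold: float = 0.5) -> List[int]:
    """Report positions where the detail magnitude is a local maximum at or above the threshold."""
    detail = haar_detail(signal, scale)
    breakpoints = []
    for i in range(1, len(detail) - 1):
        magnitude = abs(detail[i])
        if (magnitude >= threshold
                and magnitude >= abs(detail[i - 1])
                and magnitude > abs(detail[i + 1])):
            breakpoints.append(i)
    return breakpoints


if __name__ == "__main__":
    # Ten bins at ratio 1.0 followed by ten bins at ratio 2.0 (a single gain event).
    depth_ratio = [1.0] * 10 + [2.0] * 10
    print(call_breakpoints(depth_ratio))
```

Because the coefficient is computed at every position rather than on a dyadic grid, shifting the input shifts the reported breakpoint by the same amount, which is the property "translation-invariant" refers to. A practical caller would combine several scales and model noise, tumor purity, and ploidy, none of which is attempted here.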
Subjects
DNA Copy Number Variations, High-Throughput Nucleotide Sequencing/methods, Neoplasms/genetics, Algorithms, Alleles, Aneuploidy, Animals, Humans, Mice, DNA Sequence Analysis, Software, Xenograft Model Antitumor Assays
ABSTRACT
We have optimized and extended the widely used annotation engine MAKER in order to better support plant genome annotation efforts. New features include better parallelization for large repeat-rich plant genomes, noncoding RNA annotation capabilities, and support for pseudogene identification. We have benchmarked the resulting software tool kit, MAKER-P, using the Arabidopsis (Arabidopsis thaliana) and maize (Zea mays) genomes. Here, we demonstrate the ability of the MAKER-P tool kit to automatically update, extend, and revise the Arabidopsis annotations in light of newly available data and to annotate pseudogenes and noncoding RNAs absent from The Arabidopsis Information Resource 10 build. Our results demonstrate that MAKER-P can be used to manage and improve the annotations of even Arabidopsis, perhaps the best-annotated plant genome. We have also installed and benchmarked MAKER-P on the Texas Advanced Computing Center. We show that this public resource can de novo annotate the entire Arabidopsis and maize genomes in less than 3 h and produce annotations of comparable quality to those of the current The Arabidopsis Information Resource 10 and maize V2 annotation builds.
Subjects
Arabidopsis/genetics, Computational Biology/methods, Plant Genome/genetics, Molecular Sequence Annotation/methods, Software, Zea mays/genetics, Alternative Splicing/genetics, Exons/genetics, Plant Genes/genetics, Pseudogenes/genetics, Repetitive Nucleic Acid Sequences/genetics, Reproducibility of Results
ABSTRACT
Leaf-cutter ants are one of the most important herbivorous insects in the Neotropics, harvesting vast quantities of fresh leaf material. The ants use leaves to cultivate a fungus that serves as the colony's primary food source. This obligate ant-fungus mutualism is one of the few occurrences of farming by non-humans and likely facilitated the formation of their massive colonies. Mature leaf-cutter ant colonies contain millions of workers ranging in size from small garden tenders to large soldiers, resulting in one of the most complex polymorphic caste systems within ants. To begin uncovering the genomic underpinnings of this system, we sequenced the genome of Atta cephalotes using 454 pyrosequencing. One prediction from this ant's lifestyle is that it has undergone genetic modifications that reflect its obligate dependence on the fungus for nutrients. Analysis of this genome sequence is consistent with this hypothesis, as we find evidence for reductions in genes related to nutrient acquisition. These include extensive reductions in serine proteases (which are likely unnecessary because proteolysis is not a primary mechanism used to process nutrients obtained from the fungus), a loss of genes involved in arginine biosynthesis (suggesting that this amino acid is obtained from the fungus), and the absence of a hexamerin (which sequesters amino acids during larval development in other insects). Following recent reports of genome sequences from other insects that engage in symbioses with beneficial microbes, the A. cephalotes genome provides new insights into the symbiotic lifestyle of this ant and advances our understanding of host-microbe symbioses.
Subjects
Ants/physiology, Insect Genome/genetics, Plant Leaves/physiology, Symbiosis, Animals, Ants/genetics, Arginine/genetics, Arginine/metabolism, Base Sequence, Fungi/genetics, Insect Proteins/genetics, Insect Proteins/metabolism, DNA Sequence Analysis, Serine Proteases/genetics, Serine Proteases/metabolism
ABSTRACT
Ants are some of the most abundant and familiar animals on Earth, and they play vital roles in most terrestrial ecosystems. Although all ants are eusocial, and display a variety of complex and fascinating behaviors, few genomic resources exist for them. Here, we report the draft genome sequence of a particularly widespread and well-studied species, the invasive Argentine ant (Linepithema humile), which was accomplished using a combination of 454 (Roche) and Illumina sequencing and community-based funding rather than federal grant support. Manual annotation of >1,000 genes from a variety of different gene families and functional classes reveals unique features of the Argentine ant's biology, as well as similarities to Apis mellifera and Nasonia vitripennis. Distinctive features of the Argentine ant genome include remarkable expansions of gustatory (116 genes) and odorant receptors (367 genes), an abundance of cytochrome P450 genes (>110), lineage-specific expansions of yellow/major royal jelly proteins and desaturases, and complete CpG DNA methylation and RNAi toolkits. The Argentine ant genome contains fewer immune genes than Drosophila and Tribolium, which may reflect the prominent role played by behavioral and chemical suppression of pathogens. Analysis of the ratio of observed to expected CpG nucleotides for genes in the reproductive development and apoptosis pathways suggests higher levels of methylation than in the genome overall. The resources provided by this genome sequence will offer an abundance of tools for researchers seeking to illuminate the fascinating biology of this emerging model organism.
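The ratio of observed to expected CpG dinucleotides mentioned in this abstract can be computed in a few lines of Python. The abstract does not say which normalization the authors used. This sketch assumes the widely used form CpG o/e = (number of CpG × sequence length) / (number of C × number of G), and the function name `cpg_observed_expected` is hypothetical.

```python
def cpg_observed_expected(seq: str) -> float:
    """Return the CpG observed/expected ratio: (#CpG * length) / (#C * #G)."""
    seq = seq.upper()
    length = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    if c_count == 0 or g_count == 0:
        return 0.0  # ratio undefined without both bases; report 0.0 by convention here
    return cpg_count * length / (c_count * g_count)


if __name__ == "__main__":
    for name, sequence in [("example_gene_a", "ACGTCGAA"), ("example_gene_b", "CCGG")]:
        print(name, cpg_observed_expected(sequence))
```

Methylated cytosines in CpG context are prone to deamination over evolutionary time, so a lower ratio for a gene set is commonly read as a signature of heavier germline methylation, which is the direction of inference the abstract describes.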
Subjects
Ants/genetics, Insect Genome/genetics, Genomics/methods, Phylogeny, Animals, Ants/physiology, Base Sequence, California, DNA Methylation, Gene Library, Population Genetics, Social Hierarchy, Molecular Sequence Data, Single Nucleotide Polymorphism/genetics, Odorant Receptors/genetics, DNA Sequence Analysis
ABSTRACT
We report the draft genome sequence of the red harvester ant, Pogonomyrmex barbatus. The genome was sequenced using 454 pyrosequencing, and the current assembly and annotation were completed in less than 1 y. Analyses of conserved gene groups (more than 1,200 manually annotated genes to date) suggest a high-quality assembly and annotation comparable to recently sequenced insect genomes using Sanger sequencing. The red harvester ant is a model for studying reproductive division of labor, phenotypic plasticity, and sociogenomics. Although the genome of P. barbatus is similar to other sequenced hymenopterans (Apis mellifera and Nasonia vitripennis) in GC content and compositional organization, and possesses a complete CpG methylation toolkit, its predicted genomic CpG content differs markedly from the other hymenopterans. Gene networks involved in generating key differences between the queen and worker castes (e.g., wings and ovaries) show signatures of increased methylation and suggest that ants and bees may have independently co-opted the same gene regulatory mechanisms for reproductive division of labor. Gene family expansions (e.g., 344 functional odorant receptors) and pseudogene accumulation in chemoreception and P450 genes compared with A. mellifera and N. vitripennis are consistent with major life-history changes during the adaptive radiation of Pogonomyrmex spp., perhaps in parallel with the development of the North American deserts.
Subjects
Ants/genetics, Gene Regulatory Networks/genetics, Insect Genome/genetics, Genomics/methods, Phylogeny, Animals, Ants/physiology, Base Sequence, Desert Climate, Social Hierarchy, Molecular Sequence Data, North America, Phenotype, Single Nucleotide Polymorphism/genetics, Odorant Receptors/genetics, DNA Sequence Analysis
ABSTRACT
Pigeons and doves (family Columbidae) are one of the most diverse extant avian lineages, and many species have served as key models for evolutionary genomics, developmental biology, physiology, and behavioral studies. Building genomic resources for columbids is essential to further many of these studies. Here, we present high-quality genome assemblies and annotations for 2 columbid species, Columba livia and Columba guinea. We simultaneously assembled C. livia and C. guinea genomes from long-read sequencing of a single F1 hybrid individual. The new C. livia genome assembly (Cliv_3) shows improved completeness and contiguity relative to Cliv_2.1, with an annotation incorporating long-read IsoSeq data for more accurate gene models. Intensive selective breeding of C. livia has given rise to hundreds of breeds with diverse morphological and behavioral characteristics, and Cliv_3 offers improved tools for mapping the genomic architecture of interesting traits. The C. guinea genome assembly is the first for this species and is a new resource for avian comparative genomics. Together, these assemblies and annotations provide improved resources for functional studies of columbids and avian comparative genomics in general.
Subjects
Columbidae, Genome, Animals, Columbidae/genetics, Guinea, Biological Evolution
ABSTRACT
Programmed DNA loss is a gene silencing mechanism that is employed by several vertebrate and nonvertebrate lineages, including all living jawless vertebrates and songbirds. Reconstructing the evolution of somatically eliminated (germline-specific) sequences in these species has proven challenging due to a high content of repeats and gene duplications in eliminated sequences and a corresponding lack of highly accurate and contiguous assemblies for these regions. Here, we present an improved assembly of the sea lamprey (Petromyzon marinus) genome that was generated using recently standardized methods that increase the contiguity and accuracy of vertebrate genome assemblies. This assembly resolves highly contiguous, somatically retained chromosomes and at least one germline-specific chromosome, permitting new analyses that reconstruct the timing, mode, and repercussions of recruitment of genes to the germline-specific fraction. These analyses reveal major roles of interchromosomal segmental duplication, intrachromosomal duplication, and positive selection for germline functions in the long-term evolution of germline-specific chromosomes.
Subjects
Petromyzon, Animals, Petromyzon/genetics, Chromosomes/genetics, DNA/genetics, Genome, Vertebrates/genetics, Germ Cells, Molecular Evolution, Phylogeny
ABSTRACT
Congenital myasthenic syndrome (CMS) is a group of 32 disorders involving genetic dysfunction at the neuromuscular junction resulting in skeletal muscle weakness that worsens with physical activity. Precise diagnosis and molecular subtype identification are critical for treatment as medication for one subtype may exacerbate disease in another (Engel et al., Lancet Neurol 14: 420 [2015]; Finsterer, Orphanet J Rare Dis 14: 57 [2019]; Prior and Ghosh, J Child Neurol 36: 610 [2021]). The SNAP25-related CMS subtype (congenital myasthenic syndrome 18, CMS18; MIM #616330) is a rare disorder characterized by muscle fatigability, delayed psychomotor development, and ataxia. Herein, we performed rapid whole-genome sequencing (rWGS) on a critically ill newborn leading to the discovery of an unreported pathogenic de novo SNAP25 c.529C>T; p.Gln177Ter variant. In this report, we present a novel case of CMS18 with complex neonatal consequence. This discovery offers unique insight into the extent of phenotypic severity in CMS18, expands the reported SNAP25 variant phenotype, and paves a foundation for personalized management for CMS18.
Subjects
Congenital Myasthenic Syndromes, Humans, Chromosome Mapping, Congenital Myasthenic Syndromes/diagnosis, Congenital Myasthenic Syndromes/genetics, Pedigree, Phenotype, Synaptosomal-Associated Protein 25/genetics, Whole Genome Sequencing
ABSTRACT
BACKGROUND: Genetic disorders contribute to significant morbidity and mortality in critically ill newborns. Despite advances in genome sequencing technologies, a majority of neonatal cases remain unsolved. Complex structural variants (SVs) often elude conventional genome sequencing variant calling pipelines and will explain a portion of these unsolved cases. METHODS: As part of the Utah NeoSeq project, we used a research-based, rapid whole-genome sequencing (WGS) protocol to investigate the genomic etiology for a newborn with a left-sided congenital diaphragmatic hernia (CDH) and cardiac malformations, whose mother also had a history of CDH and atrial septal defect. RESULTS: Using both a novel alignment-free variant caller and traditional alignment-based variant callers, we identified a maternally inherited complex SV on chromosome 8, consisting of an inversion flanked by deletions. This complex inversion, further confirmed using orthogonal molecular techniques, disrupts the ZFPM2 gene, which is associated with both CDH and various congenital heart defects. CONCLUSIONS: Our results demonstrate that complex structural events, which often are unidentifiable or not reported by clinically validated testing procedures, can be discovered and accurately characterized with conventional, short-read sequencing and underscore the utility of WGS as a first-line diagnostic tool.
Subjects
Congenital Diaphragmatic Hernias, DNA-Binding Proteins/genetics, Genomics, Congenital Diaphragmatic Hernias/genetics, Humans, Newborn Infant, Transcription Factors/genetics, Whole Genome Sequencing/methods
ABSTRACT
BACKGROUND: Second-generation sequencing technologies are precipitating major shifts with regard to what kinds of genomes are being sequenced and how they are annotated. While the first generation of genome projects focused on well-studied model organisms, many of today's projects involve exotic organisms whose genomes are largely terra incognita. This complicates their annotation, because unlike first-generation projects, there are no pre-existing 'gold-standard' gene-models with which to train gene-finders. Improvements in genome assembly and the wide availability of mRNA-seq data are also creating opportunities to update and re-annotate previously published genome annotations. Today's genome projects are thus in need of new genome annotation tools that can meet the challenges and opportunities presented by second-generation sequencing technologies. RESULTS: We present MAKER2, a genome annotation and data management tool designed for second-generation genome projects. MAKER2 is a multi-threaded, parallelized application that can process second-generation datasets of virtually any size. We show that MAKER2 can produce accurate annotations for novel genomes where training-data are limited, of low quality or even non-existent. MAKER2 also provides an easy means to use mRNA-seq data to improve annotation quality; and it can use these data to update legacy annotations, significantly improving their quality. We also show that MAKER2 can evaluate the quality of genome annotations, and identify and prioritize problematic annotations for manual review. CONCLUSIONS: MAKER2 is the first annotation engine specifically designed for second-generation genome projects. MAKER2 scales to datasets of any size, requires little in the way of training data, and can use mRNA-seq data to improve annotation quality. It can also update and manage legacy genome annotation datasets.
Subjects
Genetic Databases, High-Throughput Nucleotide Sequencing/methods, Software, Animals, Genome, Humans, Molecular Sequence Annotation, Plants/genetics
ABSTRACT
Vertebrate craniofacial morphogenesis is a highly orchestrated process that is directed by evolutionarily conserved developmental pathways [1,2]. Within species, canalized development typically produces modest morphological variation. However, as a result of millennia of artificial selection, the domestic pigeon displays radical craniofacial variation within a single species. One of the most striking cases of pigeon craniofacial variation is the short-beak phenotype, which has been selected in numerous breeds. Classical genetic experiments suggest that pigeon beak length is regulated by a small number of genetic factors, one of which is sex linked (Ku2 locus) [3-5]. However, the genetic underpinnings of pigeon craniofacial variation remain unknown. Using geometric morphometrics and quantitative trait locus (QTL) mapping on an F2 intercross between a short-beaked Old German Owl (OGO) and a medium-beaked Racing Homer (RH), we identified a single Z chromosome locus that explains a majority of the variation in beak morphology in the F2 population. Complementary comparative genomic analyses revealed that the same locus is strongly differentiated between breeds with short and medium beaks. Within the Ku2 locus, we identified an amino acid substitution in the non-canonical Wnt receptor ROR2 as a putative regulator of pigeon beak length. The non-canonical Wnt pathway serves critical roles in vertebrate neural crest cell migration and craniofacial morphogenesis [6,7]. In humans, ROR2 mutations cause Robinow syndrome, a congenital disorder characterized by skeletal abnormalities, including a widened and shortened facial skeleton [8,9]. Our results illustrate how the extraordinary craniofacial variation among pigeons can reveal genetic regulators of vertebrate craniofacial diversity.
Subjects
Craniofacial Abnormalities, Dwarfism, Congenital Limb Deformities, Urogenital Abnormalities, Animals, Columbidae/genetics, Craniofacial Abnormalities/genetics, Dwarfism/genetics, Congenital Limb Deformities/genetics, Urogenital Abnormalities/genetics
ABSTRACT
BACKGROUND: In today's age of genomic discovery, no attempt has been made to comprehensively sequence a gymnosperm genome. The largest genus in the coniferous family Pinaceae is Pinus, whose 110-120 species have extremely large genomes (c. 20-40 Gb, 2N = 24). The size and complexity of these genomes have prompted much speculation as to the feasibility of completing a conifer genome sequence. Conifer genomes are reputed to be highly repetitive, but there is little information available on the nature and identity of repetitive units in gymnosperms. The pines have extensive genetic resources, with approximately 329,000 ESTs from eleven species and genetic maps in eight species, including a dense genetic map of the twelve linkage groups in Pinus taeda. RESULTS: We present here the Sanger sequence and annotation of ten P. taeda BAC clones and Genome Analyzer II whole genome shotgun (WGS) sequences representing 7.5% of the genome. Computational annotation of ten BACs predicts three putative protein-coding genes and at least fifteen likely pseudogenes in nearly one megabase of sequence. We found three conifer-specific LTR retroelements in the BACs, and tentatively identified at least 15 others based on evidence from the distantly related angiosperms. Alignment of WGS sequences to the BACs indicates that 80% of BAC sequences have similar copies (≥75% nucleotide identity) elsewhere in the genome, but only 23% have identical copies (99% identity). The three most common repetitive elements in the genome were identified and, when combined, represent less than 5% of the genome. CONCLUSIONS: This study indicates that the majority of repeats in the P. taeda genome are 'novel' and will therefore require additional BAC or genomic sequencing for accurate characterization. The pine genome contains a very large number of diverged and probably defunct repetitive elements. This study also provides new evidence that sequencing a pine genome using a WGS approach is a feasible goal.
Subjects
Plant Genome, Pinus taeda/genetics, Repetitive Nucleic Acid Sequences, Plant DNA/chemistry, Plant Genes, Genetic Variation, Magnoliopsida/genetics, Minisatellite Repeats, Retroelements, DNA Sequence Analysis, Tandem Repeat Sequences, Terminal Repeat Sequences
ABSTRACT
BACKGROUND: The ever-increasing number of sequenced and annotated genomes has made management of their annotations a significant undertaking, especially for large eukaryotic genomes containing many thousands of genes. Typically, changes in gene and transcript numbers are used to summarize changes from release to release, but these measures say nothing about changes to individual annotations, nor do they provide any means to identify annotations in need of manual review. RESULTS: In response, we have developed a suite of quantitative measures to better characterize changes to a genome's annotations between releases, and to prioritize problematic annotations for manual review. We have applied these measures to the annotations of five eukaryotic genomes over multiple releases -- H. sapiens, M. musculus, D. melanogaster, A. gambiae, and C. elegans. CONCLUSION: Our results provide the first detailed, historical overview of how these genomes' annotations have changed over the years, and demonstrate the usefulness of these measures for genome annotation management.
Subjects
Genome, Genomics/methods, Alternative Splicing, Animals, Genetic Databases
ABSTRACT
The domestic rock pigeon (Columba livia) is among the most widely distributed and phenotypically diverse avian species. C. livia is broadly studied in ecology, genetics, physiology, behavior, and evolutionary biology, and has recently emerged as a model for understanding the molecular basis of anatomical diversity, the magnetic sense, and other key aspects of avian biology. Here we report an update to the C. livia genome reference assembly and gene annotation dataset. Greatly increased scaffold lengths in the updated reference assembly, along with an updated annotation set, provide improved tools for evolutionary and functional genetic studies of the pigeon, and for comparative avian genomics in general.