ABSTRACT
Genomic survey data now permit an unprecedented level of sensitivity in the detection of departures from canonical evolutionary models, including expansions in population size and selective sweeps. Here, we examine the effects of seemingly subtle differences among sampling distributions on goodness of fit analyses of site frequency spectra constructed from single nucleotide polymorphisms. Conditioning on the observation of exactly two alleles in a random sample results in a site frequency spectrum that is independent of the scaled rate of neutral substitution (theta). Other sampling distributions, including conditioning on a single mutational event in the sample genealogy or randomly selecting a single mutation from a genealogy with multiple mutations, have distinct site frequency spectra that show highly significant departures from the predictions of the biallelic model. Some aspects of data filtering may contribute to significant departures of site frequency spectra from expectation, apart from any violation of the standard neutral model.
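The theta-independence claim above can be checked numerically. The following is a minimal sketch, assuming the biallelic model corresponds to the Ewens sampling formula conditioned on exactly two alleles in the sample; that correspondence and the function names are assumptions made for illustration, not details given in the abstract.

```python
from fractions import Fraction


def ewens_two_allele_prob(n, i, theta):
    """Ewens sampling probability that a sample of size n contains exactly
    two alleles, with counts i and n - i (assumed model, see lead-in)."""
    rising = Fraction(1)          # theta * (theta + 1) * ... * (theta + n - 1)
    for k in range(n):
        rising *= theta + k
    n_factorial = Fraction(1)
    for k in range(2, n + 1):
        n_factorial *= k
    if 2 * i == n:
        # Both alleles have the same count j = n/2: factor 1 / (j^2 * 2!)
        weight = Fraction(1, 2 * i * i)
    else:
        weight = Fraction(1, i * (n - i))
    return n_factorial * theta ** 2 * weight / rising


def biallelic_folded_sfs(n, theta):
    """Folded site frequency spectrum conditional on exactly two alleles:
    entry i - 1 is the probability that the rarer allele has count i."""
    raw = [ewens_two_allele_prob(n, i, theta) for i in range(1, n // 2 + 1)]
    total = sum(raw)
    return [p / total for p in raw]


if __name__ == "__main__":
    for theta in (Fraction(1, 10), Fraction(5)):
        print(theta, biallelic_folded_sfs(6, theta))
```

Under this assumption, theta enters every two-allele configuration through the same factor, so it cancels when the probabilities are normalized and the printed spectra coincide for both values of theta.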
Subjects
Genome, Theoretical Models, Single Nucleotide Polymorphism, Alleles
ABSTRACT
Identifying common patterns among area cladograms that arise in historical biogeography is an important tool for biogeographical inference. We develop the first rigorous formalization of these pattern-identification problems. We develop metrics to compare area cladograms. We define the maximum agreement area cladogram (MAAC) and we develop efficient algorithms for finding the MAAC of two area cladograms, while showing that it is NP-hard to find the MAAC of several binary area cladograms. We also describe a linear-time algorithm to identify if two area cladograms are identical.
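As an illustration of the identity test mentioned above, here is a minimal sketch that compares two rooted area cladograms through a canonical string form. It is not the paper's linear-time algorithm: sorting child encodings costs more than linear time. The nested-tuple representation, with area names as leaf strings that may repeat, and the function names are assumptions made for this example.

```python
def canonical(node):
    """Return an order-independent string encoding of a rooted cladogram.

    A leaf is a string naming an area; an internal node is a tuple of
    children. Area names are assumed not to contain '(', ')' or ','.
    """
    if isinstance(node, str):
        return node
    # Sorting the child encodings removes any dependence on child order.
    return "(" + ",".join(sorted(canonical(child) for child in node)) + ")"


def same_cladogram(a, b):
    """True if the two cladograms are identical up to reordering children."""
    return canonical(a) == canonical(b)


if __name__ == "__main__":
    first = (("A", "B"), "C")
    second = ("C", ("B", "A"))
    third = (("A", "C"), "B")
    print(same_cladogram(first, second))
    print(same_cladogram(first, third))
```

Because the same area may label several leaves of an area cladogram, the comparison works on encoded subtrees rather than on sets of leaf labels.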
Subjects
Algorithms, Demography, Genetic Models, Automated Pattern Recognition/methods, Phylogeny, Population Dynamics, Computer Simulation
ABSTRACT
BACKGROUND: Parrots belong to a group of behaviorally advanced vertebrates and have an advanced ability of vocal learning relative to other vocal-learning birds. They can imitate human speech, synchronize their body movements to a rhythmic beat, and understand complex concepts involving the referential meaning of sounds. However, little is known about the genetics of these traits. Elucidating the genetic bases would require whole genome sequencing and a robust assembly of a parrot genome. FINDINGS: We present a genomic resource for the budgerigar (Melopsittacus undulatus), an Australian parakeet and the most widely studied parrot species in neuroscience and behavior. We present genomic sequence data that include over 300× raw read coverage from multiple sequencing technologies and chromosome optical maps from a single male animal. The reads and optical maps were used to create three hybrid assemblies representing some of the largest genomic scaffolds to date for a bird. Two of these assemblies were annotated based on similarities to reference sets of non-redundant human, zebra finch and chicken proteins, and budgerigar transcriptome sequence assemblies. The sequence reads for this project were in part generated and used for both the Assemblathon 2 competition and the first de novo assembly of a giga-scale vertebrate genome utilizing PacBio single-molecule sequencing. CONCLUSIONS: Across several quality metrics, these budgerigar assemblies are comparable to or better than the chicken and zebra finch genome assemblies built from traditional Sanger sequencing reads, and are sufficient to analyze regions that are difficult to sequence and assemble, including those not yet assembled in prior bird genomes, and promoter regions of genes differentially regulated in vocal learning brain regions. This work provides valuable data and material for genome technology development and for investigating the genomics of complex behavioral traits.
ABSTRACT
Song-learning birds and humans share independently evolved similarities in brain pathways for vocal learning that are essential for song and speech and are not found in most other species. Comparisons of brain transcriptomes of song-learning birds and humans relative to vocal nonlearners identified convergent gene expression specializations in specific song and speech brain regions of avian vocal learners and humans. The strongest shared profiles relate bird motor and striatal song-learning nuclei, respectively, with human laryngeal motor cortex and parts of the striatum that control speech production and learning. Most of the associated genes function in motor control and brain connectivity. Thus, convergent behavior and neural connectivity for a complex trait are associated with convergent specialized expression of multiple genes.
Subjects
Brain/physiology, Finches/genetics, Finches/physiology, Gene Expression Regulation, Learning, Speech, Transcriptome, Animal Vocalization, Adult, Animals, Birds/genetics, Birds/physiology, Brain/anatomy & histology, Brain Mapping, Corpus Striatum/anatomy & histology, Corpus Striatum/physiology, Molecular Evolution, Humans, Male, Motor Cortex/anatomy & histology, Motor Cortex/physiology, Neural Pathways, Species Specificity, Genetic Transcription
ABSTRACT
Birds are the most species-rich class of tetrapod vertebrates and have wide relevance across many research fields. We explored bird macroevolution using full genomes from 48 avian species representing all major extant clades. The avian genome is principally characterized by its constrained size, which predominantly arose because of lineage-specific erosion of repetitive elements, large segmental deletions, and gene loss. Avian genomes furthermore show a remarkably high degree of evolutionary stasis at the levels of nucleotide sequence, gene synteny, and chromosomal structure. Despite this pattern of conservation, we detected many non-neutral evolutionary changes in protein-coding genes and noncoding regions. These analyses reveal that pan-avian genomic diversity covaries with adaptations to different lifestyles and convergent evolution of traits.
Subjects
Biological Evolution, Birds/genetics, Molecular Evolution, Genome, Physiological Adaptation, Animals, Biodiversity, Birds/classification, Birds/physiology, Conserved Sequence, Diet, Female, Animal Flight, Genes, Genetic Variation, Genomics, Male, Molecular Sequence Annotation, Phylogeny, Reproduction/genetics, Genetic Selection, DNA Sequence Analysis, Synteny, Ocular Vision/genetics, Animal Vocalization
ABSTRACT
BACKGROUND: The process of generating raw genome sequence data continues to become cheaper, faster, and more accurate. However, assembly of such data into high-quality, finished genome sequences remains challenging. Many genome assembly tools are available, but they differ greatly in terms of their performance (speed, scalability, hardware requirements, acceptance of newer read technologies) and in their final output (composition of assembled sequence). More importantly, it remains largely unclear how to best assess the quality of assembled genome sequences. The Assemblathon competitions are intended to assess current state-of-the-art methods in genome assembly. RESULTS: In Assemblathon 2, we provided a variety of sequence data to be assembled for three vertebrate species (a bird, a fish, and a snake). This resulted in a total of 43 submitted assemblies from 21 participating teams. We evaluated these assemblies using a combination of optical map data, Fosmid sequences, and several statistical methods. From over 100 different metrics, we chose ten key measures by which to assess the overall quality of the assemblies. CONCLUSIONS: Many current genome assemblers produced useful assemblies, containing a significant representation of their genes and overall genome structure. However, the high degree of variability between the entries suggests that there is still much room for improvement in the field of genome assembly and that approaches that work well in assembling the genome of one species may not necessarily work well for another.
ABSTRACT
Single-molecule sequencing instruments can generate multikilobase sequences with the potential to greatly improve genome and transcriptome assembly. However, the error rates of single-molecule reads are high, which has limited their use thus far to resequencing bacteria. To address this limitation, we introduce a correction algorithm and assembly strategy that uses short, high-fidelity sequences to correct the error in single-molecule sequences. We demonstrate the utility of this approach on reads generated by a PacBio RS instrument from phage, prokaryotic and eukaryotic whole genomes, including the previously unsequenced genome of the parrot Melopsittacus undulatus, as well as for RNA-Seq reads of the corn (Zea mays) transcriptome. Our long-read correction achieves >99.9% base-call accuracy, leading to substantially better assemblies than current sequencing strategies: in the best example, the median contig size was quintupled relative to high-coverage, second-generation assemblies. Greater gains are predicted if read lengths continue to increase, including the prospect of single-contig bacterial chromosome assembly.
Subjects
Computational Biology/methods, RNA Sequence Analysis/methods, Transcriptome/genetics, Algorithms, Bacteria/genetics, Bacteriophages/genetics, RNA/genetics, Zea mays/genetics
ABSTRACT
Synthetic science promises an unparalleled ability to find new meaning in old data, extant results, or previously unconnected methods and concepts, but pursuing synthesis can be a difficult and risky endeavor. Our experience as biologists, informaticians, and educators at the National Evolutionary Synthesis Center has affirmed that synthesis can yield major insights, but also revealed that technological hurdles, prevailing academic culture, and general confusion about the nature of synthesis can hamper its progress. By presenting our view of what synthesis is, why it will continue to drive progress in evolutionary biology, and how to remove barriers to its progress, we provide a map to a future in which all scientists can engage productively in synthetic research.