Results 1 - 20 of 38
1.
Cell ; 176(4): 743-756.e17, 2019 02 07.
Article in English | MEDLINE | ID: mdl-30735633

ABSTRACT

Direct comparisons of human and non-human primate brains can reveal molecular pathways underlying remarkable specializations of the human brain. However, chimpanzee tissue is inaccessible during neocortical neurogenesis when differences in brain size first appear. To identify human-specific features of cortical development, we leveraged recent innovations that permit generating pluripotent stem cell-derived cerebral organoids from chimpanzee. Despite metabolic differences, organoid models preserve gene regulatory networks related to primary cell types and developmental processes. We further identified 261 differentially expressed genes in human compared to both chimpanzee organoids and macaque cortex, enriched for recent gene duplications, and including multiple regulators of PI3K-AKT-mTOR signaling. We observed increased activation of this pathway in human radial glia, dependent on two receptors upregulated specifically in human: INSR and ITGB8. Our findings establish a platform for systematic analysis of molecular changes contributing to human brain development and evolution.


Subjects
Cerebral Cortex/cytology , Organoids/metabolism , Animals , Biological Evolution , Brain/cytology , Cell Culture Techniques/methods , Cell Differentiation/genetics , Cerebral Cortex/metabolism , Gene Regulatory Networks/genetics , Humans , Induced Pluripotent Stem Cells/cytology , Macaca , Neurogenesis/genetics , Organoids/growth & development , Pan troglodytes , Pluripotent Stem Cells/cytology , Single-Cell Analysis , Species Specificity , Transcriptome/genetics
2.
Cell ; 171(3): 710-722.e12, 2017 Oct 19.
Article in English | MEDLINE | ID: mdl-28965761

ABSTRACT

To further our understanding of the genetic etiology of autism, we generated and analyzed genome sequence data from 516 idiopathic autism families (2,064 individuals). This resource includes >59 million single-nucleotide variants (SNVs) and 9,212 private copy number variants (CNVs), of which 133,992 and 88 are de novo mutations (DNMs), respectively. We estimate a mutation rate of ∼1.5 × 10⁻⁸ SNVs per site per generation with a significantly higher mutation rate in repetitive DNA. Comparing probands and unaffected siblings, we observe several DNM trends. Probands carry more gene-disruptive CNVs and SNVs, resulting in severe missense mutations and mapping to predicted fetal brain promoters and embryonic stem cell enhancers. These differences become more pronounced for autism genes (p = 1.8 × 10⁻³, OR = 2.2). Patients are more likely to carry multiple coding and noncoding DNMs in different genes, which are enriched for expression in striatal neurons (p = 3 × 10⁻³), suggesting a path forward for genetically characterizing more complex cases of autism.


Subjects
Autistic Disorder/genetics , DNA Copy Number Variations , Polymorphism, Single Nucleotide , Animals , DNA Mutational Analysis , Female , Genome-Wide Association Study , Humans , INDEL Mutation , Male , Mice
3.
Bioinformatics ; 40(2)2024 02 01.
Article in English | MEDLINE | ID: mdl-38269623

ABSTRACT

MOTIVATION: In diploid organisms, phasing is the problem of assigning the alleles at heterozygous variants to one of two haplotypes. Reads from PacBio HiFi sequencing provide long, accurate observations that can be used as the basis for both calling and phasing variants. HiFi reads also excel at calling larger classes of variation, such as structural or tandem repeat variants. However, current phasing tools typically only phase small variants, leaving larger variants unphased. RESULTS: We developed HiPhase, a tool that jointly phases SNVs, indels, structural, and tandem repeat variants. The main benefits of HiPhase are (i) dual mode allele assignment for detecting large variants, (ii) a novel application of the A*-algorithm to phasing, and (iii) logic allowing phase blocks to span breaks caused by alignment issues around reference gaps and homozygous deletions. In our assessment, HiPhase produced an average phase block NG50 of 480 kb with 929 switchflip errors and fully phased 93.8% of genes, improving over the current state of the art. Additionally, HiPhase jointly phases SNVs, indels, structural, and tandem repeat variants and includes innate multi-threading, statistics gathering, and concurrent phased alignment output generation. AVAILABILITY AND IMPLEMENTATION: HiPhase is available as source code and a pre-compiled Linux binary with a user guide at https://github.com/PacificBiosciences/HiPhase.
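
The switch errors mentioned in this abstract can be illustrated with a small sketch. It is not HiPhase's implementation: the function name and the 0/1 encoding of haplotype assignments are assumptions made for illustration. It counts how often a predicted phasing changes orientation relative to a truth phasing along consecutive heterozygous sites.

```python
def count_switch_errors(truth: list[int], predicted: list[int]) -> int:
    """Count switch errors between two phasings of the same heterozygous sites.

    Each list holds 0 or 1 per site: the haplotype that carries the
    alternate allele (a hypothetical encoding chosen for this sketch).
    """
    if len(truth) != len(predicted):
        raise ValueError("both phasings must cover the same sites")
    # 1 where the prediction disagrees with the truth, 0 where it agrees.
    mismatch = [t ^ p for t, p in zip(truth, predicted)]
    # Phase is relative, so only a change in agreement between
    # neighbouring sites counts as an error.
    return sum(1 for a, b in zip(mismatch, mismatch[1:]) if a != b)


truth = [0, 0, 0, 0, 0, 0]
print(count_switch_errors(truth, [0, 0, 1, 1, 1, 0]))  # two orientation changes
print(count_switch_errors(truth, [1, 1, 1, 1, 1, 1]))  # fully inverted block: none
```

In this simple count, a single mis-assigned site appears as two adjacent switches; benchmarks that report "switch/flip" errors usually tally that case separately as one flip.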


Subjects
Genome, Human , High-Throughput Nucleotide Sequencing , Humans , Sequence Analysis, DNA , Algorithms , Haplotypes , Tandem Repeat Sequences
4.
Plant Biotechnol J ; 21(6): 1240-1253, 2023 06.
Article in English | MEDLINE | ID: mdl-36807472

ABSTRACT

Rapid adaptation of weeds to herbicide applications in agriculture through resistance development is a widespread phenomenon. In particular, the grass Alopecurus myosuroides is an extremely problematic weed in cereal crops with the potential to manifest resistance in only a few generations. Target-site resistances (TSRs), with their strong phenotypic response, play an important role in this rapid adaptive response. Recently, using PacBio's long-read amplicon sequencing technology in hundreds of individuals, we were able to decipher the genomic context in which TSR mutations occur. However, sequencing individual amplicons is costly and time-consuming, and thus impractical to implement for other resistance loci or applications. Alternatively, pool-based approaches overcome these limitations and provide reliable allele frequencies, although at the expense of not preserving haplotype information. In this proof-of-concept study, we sequenced with PacBio High Fidelity (HiFi) reads long-range amplicons (13.2 kb), encompassing the entire ACCase gene in pools of over 100 individuals, and resolved them into haplotypes using the clustering algorithm PacBio amplicon analysis (pbaa), a new application for pools in plants and other organisms. From these amplicon pools, we were able to recover most haplotypes from previously sequenced individuals of the same population. In addition, we analysed new pools from a Germany-wide collection of A. myosuroides populations and found that TSR mutations originating from soft sweeps of independent origin were common. Forward-in-time simulations indicate that TSR haplotypes will persist for decades even at relatively low frequencies and without selection, highlighting the importance of accurate measurement of TSR haplotype prevalence for weed management.


Subjects
Acetyl-CoA Carboxylase , Herbicide Resistance , Poaceae , Acetyl-CoA Carboxylase/genetics , Agriculture , Gene Frequency/genetics , Haplotypes/genetics , Herbicide Resistance/genetics , Herbicides/pharmacology , Mutation , Poaceae/genetics
5.
PLoS Comput Biol ; 18(5): e1009123, 2022 05.
Article in English | MEDLINE | ID: mdl-35639788

ABSTRACT

Since its introduction in 2011, the variant call format (VCF) has been widely adopted for processing DNA and RNA variants in practically all population studies, as well as in somatic and germline mutation studies. The VCF format can represent single nucleotide variants, multi-nucleotide variants, insertions and deletions, and simple structural variants called and anchored against a reference genome. Here we present a spectrum of over 125 useful, complementary, free and open source software tools and libraries that we wrote and made available through the vcflib, bio-vcf, cyvcf2, hts-nim and slivar projects. These tools are applied for comparison, filtering, normalisation, smoothing and annotation of VCF, as well as output of statistics, visualisation, and transformations of variant files. These tools run every day in critical biomedical pipelines and countless shell scripts. Our tools are part of the wider bioinformatics ecosystem and we highlight best practices. We briefly discuss the design of VCF, lessons learnt, and how we can address more complex variation through pangenome graph formats, variation that cannot easily be represented by the VCF format.
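
As a rough illustration of the variant classes listed in this abstract, the sketch below reads the first five fixed, tab-separated columns of a VCF data line (CHROM, POS, ID, REF, ALT) and labels each alternate allele by comparing its length with the reference allele. It does not use any of the tools named above; `VcfRecord`, `parse_vcf_line` and `classify` are hypothetical names, and the labelling rule is a simplification.

```python
from typing import NamedTuple


class VcfRecord(NamedTuple):
    chrom: str
    pos: int          # 1-based position, as in the VCF specification
    ref: str
    alts: list[str]   # ALT may list several comma-separated alleles


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse the first five fixed columns of one VCF data line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt = fields[:5]
    return VcfRecord(chrom, int(pos), ref, alt.split(","))


def classify(ref: str, alt: str) -> str:
    """Label one REF/ALT pair by allele length (simplified rule)."""
    # Symbolic alleles (<DEL>), breakends, '*' and '.' carry no literal sequence.
    if alt.startswith("<") or "[" in alt or "]" in alt or alt in ("*", "."):
        return "symbolic_or_other"
    if len(ref) == len(alt):
        return "SNV" if len(ref) == 1 else "MNV"
    return "insertion" if len(alt) > len(ref) else "deletion"


record = parse_vcf_line("chr1\t100\t.\tA\tG,AT\t50\tPASS\t.")
for alt in record.alts:
    print(record.chrom, record.pos, record.ref, alt, classify(record.ref, alt))
```

A production pipeline would rely on a maintained parser, since header lines, INFO fields and genotype columns are ignored here.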


Subjects
Ecosystem , Genetic Variation , Computational Biology , Genetic Variation/genetics , Nucleotides , Software
6.
PLoS Genet ; 16(5): e1008274, 2020 05.
Article in English | MEDLINE | ID: mdl-32433666

ABSTRACT

Rock pigeons (Columba livia) display an extraordinary array of pigment pattern variation. One such pattern, Almond, is characterized by a variegated patchwork of plumage colors that are distributed in an apparently random manner. Almond is a sex-linked, semi-dominant trait controlled by the classical Stipper (St) locus. Heterozygous males (Z^St Z^+ sex chromosomes) and hemizygous Almond females (Z^St W) are favored by breeders for their attractive plumage. In contrast, homozygous Almond males (Z^St Z^St) develop severe eye defects and often lack plumage pigmentation, suggesting that higher dosage of the mutant allele is deleterious. To determine the molecular basis of Almond, we compared the genomes of Almond pigeons to non-Almond pigeons and identified a candidate St locus on the Z chromosome. We found a copy number variant (CNV) within the differentiated region that captures complete or partial coding sequences of four genes, including the melanosome maturation gene Mlana. We did not find fixed coding changes in genes within the CNV, but all genes are misexpressed in regenerating feather bud collar cells of Almond birds. Notably, six other alleles at the St locus are associated with depigmentation phenotypes, and all exhibit expansion of the same CNV. Structural variation at St is linked to diversity in plumage pigmentation and gene expression, and thus provides a potential mode of rapid phenotypic evolution in pigeons.


Subjects
Columbidae/genetics , DNA Copy Number Variations/physiology , Feathers/metabolism , Pigmentation/genetics , Alleles , Animals , Color , Columbidae/metabolism , Female , Genetic Association Studies/veterinary , Genetic Loci , Genetics, Population , Heterozygote , Male , Phenotype , Polymorphism, Single Nucleotide
7.
Syst Biol ; 70(5): 908-921, 2021 08 11.
Article in English | MEDLINE | ID: mdl-33410870

ABSTRACT

Evidence from natural systems suggests that hybridization between animal species is more common than traditionally thought, but the overall contribution of introgression to standing genetic variation within species remains unclear for most animal systems. Here, we use targeted exon capture to sequence thousands of nuclear loci and complete mitochondrial genomes from closely related chipmunk species in the Tamias quadrivittatus group that are distributed across the Great Basin and the central and southern Rocky Mountains of North America. This recent radiation includes six overlapping, ecologically distinct species (Tamias canipes, Tamias cinereicollis, Tamias dorsalis, T. quadrivittatus, Tamias rufus, and Tamias umbrinus) that show evidence for widespread introgression across species boundaries. Such evidence has historically been derived from a handful of markers, typically focused on mitochondrial loci, to describe patterns of introgression; consequently, the extent of introgression of nuclear genes is less well characterized. We conducted a series of phylogenomic and species-tree analyses to resolve the phylogeny of six species in this group. In addition, we performed several population-genomic analyses to characterize nuclear genomes and infer coancestry among individuals. Furthermore, we used emerging quartets-based approaches to simultaneously infer the species tree (SVDquartets) and identify introgression (HyDe). We found that, in spite of rampant introgression of mitochondrial genomes between some species pairs (and sometimes involving up to three species), there appears to be little to no evidence for nuclear introgression. These findings mirror other genomic results where complete mitochondrial capture has occurred between chipmunk species in the absence of appreciable nuclear gene flow. The underlying causes of recurrent massive cytonuclear discordance remain unresolved in this group but mitochondrial DNA appears highly misleading of population histories as a whole. Collectively, it appears that chipmunk species boundaries are largely impermeable to nuclear gene flow and that hybridization, while pervasive with respect to mtDNA, has likely played a relatively minor role in the evolutionary history of this group. [Cytonuclear discordance; hybridization; introgression; phylogenomics; SVDquartets; Tamias.]


Subjects
Genome, Mitochondrial , Sciuridae , Animals , DNA, Mitochondrial , Gene Flow , Humans , Phylogeny , Sciuridae/genetics
8.
Genome Res ; 28(7): 1029-1038, 2018 07.
Article in English | MEDLINE | ID: mdl-29884752

ABSTRACT

The recent introductions of low-cost, long-read, and read-cloud sequencing technologies coupled with intense efforts to develop efficient algorithms have made affordable, high-quality de novo sequence assembly a realistic proposition. The result is an explosion of new, ultracontiguous genome assemblies. To compare these genomes, we need robust methods for genome annotation. We describe the fully open source Comparative Annotation Toolkit (CAT), which provides a flexible way to simultaneously annotate entire clades and identify orthology relationships. We show that CAT can be used to improve annotations on the rat genome, annotate the great apes, annotate a diverse set of mammals, and annotate personal, diploid human genomes. We demonstrate the resulting discovery of novel genes, isoforms, and structural variants (even in genomes as well studied as rat and the great apes) and how these annotations improve cross-species RNA expression experiments.


Subjects
Genome, Human/genetics , Algorithms , Animals , High-Throughput Nucleotide Sequencing/methods , Humans , Molecular Sequence Annotation/methods , RNA/genetics , Rats
9.
Ann Hum Genet ; 84(2): 125-140, 2020 03.
Article in English | MEDLINE | ID: mdl-31711268

ABSTRACT

The sequence and assembly of human genomes using long-read sequencing technologies has revolutionized our understanding of structural variation and genome organization. We compared the accuracy, continuity, and gene annotation of genome assemblies generated from either high-fidelity (HiFi) or continuous long-read (CLR) datasets from the same complete hydatidiform mole human genome. We find that the HiFi sequence data assemble an additional 10% of duplicated regions and more accurately represent the structure of tandem repeats, as validated with orthogonal analyses. As a result, an additional 5 Mbp of pericentromeric sequences are recovered in the HiFi assembly, resulting in a 2.5-fold increase in the NG50 within 1 Mbp of the centromere (HiFi 480.6 kbp, CLR 191.5 kbp). Additionally, the HiFi genome assembly was generated in significantly less time with fewer computational resources than the CLR assembly. Although the HiFi assembly has significantly improved continuity and accuracy in many complex regions of the genome, it still falls short of the assembly of centromeric DNA and the largest regions of segmental duplication using existing assemblers. Despite these shortcomings, our results suggest that HiFi may be the most effective standalone technology for de novo assembly of human genomes.
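
For readers unfamiliar with the NG50 statistic quoted above, the following sketch shows how it is commonly computed: sort sequence lengths from longest to shortest and report the length at which the running total first reaches half of an assumed genome size. The function name and the toy lengths are illustrative and do not come from the study.

```python
def ng50(lengths: list[int], genome_size: int) -> int:
    """Return the NG50 of a set of contig (or block) lengths.

    NG50 is the length L such that sequences of length >= L together
    cover at least half of the estimated genome size. Returns 0 when the
    sequences never reach that half.
    """
    target = genome_size / 2
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if covered >= target:
            return length
    return 0


toy_contigs = [100, 80, 50, 30, 20]          # made-up lengths, arbitrary units
print(ng50(toy_contigs, genome_size=300))    # half is 150, reached at the 80 contig
print(ng50(toy_contigs, genome_size=400))    # half is 200, reached at the 50 contig

# Fold change between the two pericentromeric NG50 values quoted in the abstract.
print(round(480.6 / 191.5, 1))
```

Unlike N50, which uses the total assembled length as its denominator, NG50 uses the genome size, so assemblies of different total length remain comparable.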


Subjects
Biomarkers/analysis , Genetic Variation , Genome, Human , Haploidy , Hydatidiform Mole/genetics , Sequence Analysis, DNA/methods , Single-Cell Analysis/methods , Female , High-Throughput Nucleotide Sequencing , Humans , Molecular Sequence Annotation , Pregnancy
10.
Genome Res ; 27(5): 677-685, 2017 05.
Article in English | MEDLINE | ID: mdl-27895111

ABSTRACT

In an effort to more fully understand the spectrum of human genetic variation, we generated deep single-molecule, real-time (SMRT) sequencing data from two haploid human genomes. By using an assembly-based approach (SMRT-SV), we systematically assessed each genome independently for structural variants (SVs) and indels, resolving the sequence structure of 461,553 genetic variants from 2 bp to 28 kbp in length. We find that >89% of these variants have been missed as part of analysis of the 1000 Genomes Project even after adjusting for more common variants (MAF > 1%). We estimate that this theoretical human diploid differs by as much as ∼16 Mbp with respect to the human reference, with long-read sequencing data providing a fivefold increase in sensitivity for genetic variants ranging in size from 7 bp to 1 kbp compared with short-read sequence data. Although a large fraction of genetic variants were not detected by short-read approaches, once the alternate allele is sequence-resolved, we show that 61% of SVs can be genotyped in short-read sequence data sets with high accuracy. Uncoupling discovery from genotyping thus allows for the majority of this missed common variation to be genotyped in the human population. Interestingly, when we repeat SV detection on a pseudodiploid genome constructed in silico by merging the two haploids, we find that ∼59% of the heterozygous SVs are no longer detected by SMRT-SV. These results indicate that haploid resolution of long-read sequencing data will significantly increase sensitivity of SV detection.


Subjects
Contig Mapping/methods , Genome, Human , Genomic Structural Variation , Haploidy , Sequence Analysis, DNA/methods , Contig Mapping/standards , Human Genome Project , Humans , Sequence Analysis, DNA/standards
11.
Genome Res ; 26(11): 1453-1467, 2016 11.
Article in English | MEDLINE | ID: mdl-27803192

ABSTRACT

Recurrent rearrangements of Chromosome 8p23.1 are associated with congenital heart defects and developmental delay. The complexity of this region has led to inconsistencies in the current reference assembly, confounding studies of genetic variation. Using comparative sequence-based approaches, we generated a high-quality 6.3-Mbp alternate reference assembly of an inverted Chromosome 8p23.1 haplotype. Comparison with nonhuman primates reveals a 746-kbp duplicative transposition and two separate inversion events that arose in the last million years of human evolution. The breakpoints associated with these rearrangements map to an ape-specific interchromosomal core duplicon that clusters at sites of evolutionary inversion (P = 7.8 × 10⁻⁵). Refinement of microdeletion breakpoints identifies a subgroup of patients that map to the same interchromosomal core involved in the evolutionary formation of the duplication blocks. Our results define a higher-order genomic instability element that has shaped the structure of specific chromosomes during primate evolution contributing to rearrangements associated with inversion and disease.


Subjects
Evolution, Molecular , Genetic Predisposition to Disease , Genomic Instability , Segmental Duplications, Genomic , Animals , Chromosome Breakpoints , Chromosome Deletion , Chromosomes, Human, Pair 8/genetics , Humans , Primates/genetics
12.
PLoS Genet ; 12(5): e1006063, 2016 05.
Article in English | MEDLINE | ID: mdl-27203426

ABSTRACT

Lactoferrin is a multifunctional mammalian immunity protein that limits microbial growth through sequestration of nutrient iron. Additionally, lactoferrin possesses cationic protein domains that directly bind and inhibit diverse microbes. The implications for these dual functions on lactoferrin evolution and genetic conflicts with microbes remain unclear. Here we show that lactoferrin has been subject to recurrent episodes of positive selection during primate divergence predominately at antimicrobial peptide surfaces consistent with long-term antagonism by bacteria. An abundant lactoferrin polymorphism in human populations and Neanderthals also exhibits signatures of positive selection across primates, linking ancient host-microbe conflicts to modern human genetic variation. Rapidly evolving sites in lactoferrin further correspond to molecular interfaces with opportunistic bacterial pathogens causing meningitis, pneumonia, and sepsis. Because microbes actively target lactoferrin to acquire iron, we propose that the emergence of antimicrobial activity provided a pivotal mechanism of adaptation sparking evolutionary conflicts via acquisition of new protein functions.


Subjects
Bacteria/immunology , Iron/metabolism , Lactoferrin/genetics , Selection, Genetic/genetics , Amino Acid Sequence , Animals , Antimicrobial Cationic Peptides/genetics , Antimicrobial Cationic Peptides/immunology , Bacteria/pathogenicity , Genetics, Population , Humans , Iron/immunology , Lactoferrin/immunology , Neanderthals/genetics , Neanderthals/immunology , Polymorphism, Genetic , Primates/genetics , Primates/immunology
13.
J Virol ; 91(4)2017 02 15.
Article in English | MEDLINE | ID: mdl-27928012

ABSTRACT

Viruses are under relentless selective pressure from host immune defenses. To study how poxviruses adapt to innate immune detection pathways, we performed serial vaccinia virus infections in primary human cells. Independent courses of experimental evolution with a recombinant strain lacking E3L revealed several high-frequency point mutations in conserved poxvirus genes, suggesting important roles for essential poxvirus proteins in innate immune subversion. Two distinct mutations were identified in the viral RNA polymerase gene A24R, which seem to act through different mechanisms to increase virus replication. Specifically, a Leu18Phe substitution encoded within A24R conferred fitness trade-offs, including increased activation of the antiviral factor protein kinase R (PKR). Intriguingly, this A24R variant underwent a drastic selective sweep during passaging, despite enhanced PKR activity. We showed that the sweep of this variant could be accelerated by the presence of copy number variation (CNV) at the K3L locus, which in multiple copies strongly reduced PKR activation. Therefore, adaptive cases of CNV can facilitate the accumulation of point mutations separate from the expanded locus. This study reveals how rapid bouts of gene copy number amplification during accrual of distant point mutations can potently facilitate poxvirus adaptation to host defenses. IMPORTANCE: Viruses can evolve quickly to defeat host immune functions. For poxviruses, little is known about how multiple adaptive mutations emerge in populations at the same time. In this study, we uncovered a means of vaccinia virus adaptation involving the accumulation of distinct genetic variants within a single population. We identified adaptive point mutations in the viral RNA polymerase gene A24R and, surprisingly, found that one of these mutations activates the nucleic acid sensing factor PKR. We also found that gene copy number variation (CNV) can provide dual benefits to evolving virus populations, including evidence that CNV facilitates the accumulation of a point mutation distant from the expanded locus. Our data suggest that transient CNV can accelerate the fixation of mutations conferring modest benefits, or even fitness trade-offs, and highlight how structural variation might aid poxvirus adaptation through both direct and indirect actions.


Subjects
Biological Evolution , DNA Copy Number Variations , DNA-Directed RNA Polymerases/genetics , Gene Amplification , Vaccinia virus/physiology , Adaptation, Biological , Alleles , DNA-Directed RNA Polymerases/metabolism , Evolution, Molecular , Fibroblasts , Gene Frequency , Genetic Fitness , Genome, Viral , Humans , Mutation Rate , Open Reading Frames , Point Mutation , RNA, Viral , Virus Replication
15.
PLoS Comput Biol ; 11(12): e1004572, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26625158

ABSTRACT

Existing methods for identifying structural variants (SVs) from short read datasets are inaccurate. This complicates disease-gene identification and efforts to understand the consequences of genetic variation. In response, we have created Wham (Whole-genome Alignment Metrics) to provide a single, integrated framework for both structural variant calling and association testing, thereby bypassing many of the difficulties that currently frustrate attempts to employ SVs in association testing. Here we describe Wham, benchmark it against three other widely used SV identification tools (Lumpy, Delly and SoftSearch), and demonstrate Wham's ability to identify and associate SVs with phenotypes using data from humans, domestic pigeons, and vaccinia virus. Wham and all associated software are covered under the MIT License and can be freely downloaded from GitHub (https://github.com/zeeev/wham), with documentation on a wiki (http://zeeev.github.io/wham/). For community support please post questions to https://www.biostars.org/.


Subjects
Algorithms , Chromosome Mapping/methods , Genetic Association Studies/methods , Genetic Variation/genetics , Genome, Human/genetics , Genomic Structural Variation/genetics , Base Sequence , Humans , Molecular Sequence Data , Software
16.
PLoS Genet ; 9(4): e1003470, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23637635

ABSTRACT

Advances in vertebrate genomics have uncovered thousands of loci encoding long noncoding RNAs (lncRNAs). While progress has been made in elucidating the regulatory functions of lncRNAs, little is known about their origins and evolution. Here we explore the contribution of transposable elements (TEs) to the makeup and regulation of lncRNAs in human, mouse, and zebrafish. Surprisingly, TEs occur in more than two thirds of mature lncRNA transcripts and account for a substantial portion of total lncRNA sequence (~30% in human), whereas they seldom occur in protein-coding transcripts. While TEs contribute less to lncRNA exons than expected, several TE families are strongly enriched in lncRNAs. There is also substantial interspecific variation in the coverage and types of TEs embedded in lncRNAs, partially reflecting differences in the TE landscapes of the genomes surveyed. In human, TE sequences in lncRNAs evolve under greater evolutionary constraint than their non-TE sequences, than their intronic TEs, or than random DNA. Consistent with functional constraint, we found that TEs contribute signals essential for the biogenesis of many lncRNAs, including ~30,000 unique sites for transcription initiation, splicing, or polyadenylation in human. In addition, we identified ~35,000 TEs marked as open chromatin located within 10 kb upstream of lncRNA genes. The density of these marks in one cell type correlates with elevated expression of the downstream lncRNA in the same cell type, suggesting that these TEs contribute to cis-regulation. These global trends are recapitulated in several lncRNAs with established functions. Finally, a subset of TEs embedded in lncRNAs are subject to RNA editing and predicted to form secondary structures likely important for function. In conclusion, TEs are nearly ubiquitous in lncRNAs and have played an important role in the lineage-specific diversification of vertebrate lncRNA repertoires.


Subjects
DNA Transposable Elements , RNA, Long Noncoding , Animals , Exons , Humans , Introns , RNA, Long Noncoding/genetics , Vertebrates/genetics
17.
G3 (Bethesda) ; 14(2)2024 Feb 07.
Article in English | MEDLINE | ID: mdl-38066578

ABSTRACT

Pigeons and doves (family Columbidae) are one of the most diverse extant avian lineages, and many species have served as key models for evolutionary genomics, developmental biology, physiology, and behavioral studies. Building genomic resources for columbids is essential to further many of these studies. Here, we present high-quality genome assemblies and annotations for 2 columbid species, Columba livia and Columba guinea. We simultaneously assembled C. livia and C. guinea genomes from long-read sequencing of a single F1 hybrid individual. The new C. livia genome assembly (Cliv_3) shows improved completeness and contiguity relative to Cliv_2.1, with an annotation incorporating long-read IsoSeq data for more accurate gene models. Intensive selective breeding of C. livia has given rise to hundreds of breeds with diverse morphological and behavioral characteristics, and Cliv_3 offers improved tools for mapping the genomic architecture of interesting traits. The C. guinea genome assembly is the first for this species and is a new resource for avian comparative genomics. Together, these assemblies and annotations provide improved resources for functional studies of columbids and avian comparative genomics in general.


Subjects
Columbidae , Genome , Animals , Columbidae/genetics , Guinea , Biological Evolution
18.
medRxiv ; 2024 Mar 18.
Article in English | MEDLINE | ID: mdl-38562723

ABSTRACT

Comprehending the mechanism behind human diseases with an established heritable component represents the forefront of personalized medicine. Nevertheless, numerous medically important genes are inaccurately represented in short-read sequencing data analysis due to their complexity and repetitiveness or the so-called 'dark regions' of the human genome. The advent of PacBio as a long-read platform has provided new insights, yet HiFi whole-genome sequencing (WGS) cost remains frequently prohibitive. We introduce a targeted sequencing and analysis framework, Twist Alliance Dark Genes Panel (TADGP), designed to offer phased variants across 389 medically important yet complex autosomal genes. We highlight TADGP accuracy across eleven control samples and compare it to WGS. This demonstrates that TADGP achieves variant calling accuracy comparable to HiFi-WGS data, but at a fraction of the cost. This enables scalability and broad applicability for studying rare diseases or complementing previously sequenced samples to gain insights into these complex genes. TADGP revealed several candidate variants across all cases and provided insight into LPA diversity when tested on samples from rare disease and cardiovascular disease cohorts. In both cohorts, we identified novel variants affecting individual disease-associated genes (e.g., IKZF1, KCNE1). Nevertheless, the annotation of the variants across these 389 medically important genes remains challenging due to their underrepresentation in ClinVar and gnomAD. Consequently, we also offer an annotation resource to enhance the evaluation and prioritization of these variants. Overall, we can demonstrate that TADGP offers a cost-efficient and scalable approach to routinely assess the dark regions of the human genome with clinical relevance.

19.
Nat Biotechnol ; 42(10): 1606-1614, 2024 Oct.
Article in English | MEDLINE | ID: mdl-38168995

ABSTRACT

Tandem repeat (TR) variation is associated with gene expression changes and numerous rare monogenic diseases. Although long-read sequencing provides accurate full-length sequences and methylation of TRs, there is still a need for computational methods to profile TRs across the genome. Here we introduce the Tandem Repeat Genotyping Tool (TRGT) and an accompanying TR database. TRGT determines the consensus sequences and methylation levels of specified TRs from PacBio HiFi sequencing data. It also reports reads that support each repeat allele. These reads can be subsequently visualized with a companion TR visualization tool. Assessing 937,122 TRs, TRGT showed a Mendelian concordance of 98.38%, allowing a single repeat unit difference. In six samples with known repeat expansions, TRGT detected all expansions while also identifying methylation signals and mosaicism and providing finer repeat length resolution than existing methods. Additionally, we released a database with allele sequences and methylation levels for 937,122 TRs across 100 genomes.
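
The phrase "Mendelian concordance ... allowing a single repeat unit difference" can be made concrete with a sketch. It is not TRGT's method: it assumes an autosomal locus, represents each allele as a repeat-unit count, and treats a child genotype as consistent when one allele matches a maternal allele and the other a paternal allele within a tolerance. All names are hypothetical.

```python
Genotype = tuple[int, int]  # two alleles, each as a count of repeat units


def is_mendelian_consistent(child: Genotype, mother: Genotype,
                            father: Genotype, tolerance: int = 1) -> bool:
    """Check whether a child genotype can be explained by its parents.

    An allele is considered inherited from a parent if it lies within
    `tolerance` repeat units of either allele of that parent.
    """
    def from_parent(allele: int, parent: Genotype) -> bool:
        return any(abs(allele - p) <= tolerance for p in parent)

    first, second = child
    return ((from_parent(first, mother) and from_parent(second, father))
            or (from_parent(first, father) and from_parent(second, mother)))


def concordance(trios: list[tuple[Genotype, Genotype, Genotype]]) -> float:
    """Fraction of (child, mother, father) trios that are consistent."""
    consistent = sum(is_mendelian_consistent(c, m, f) for c, m, f in trios)
    return consistent / len(trios)


trios = [
    ((10, 12), (10, 15), (11, 20)),  # 10 from mother, 12 within one unit of 11
    ((10, 30), (10, 15), (11, 20)),  # no parent carries an allele near 30
]
print(concordance(trios))
```

Setting `tolerance=0` requires exact allele-length matches, which typically lowers the reported concordance at loci where repeat lengths are hard to call.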


Subjects
DNA Methylation , Tandem Repeat Sequences , Tandem Repeat Sequences/genetics , Humans , DNA Methylation/genetics , Genome, Human/genetics , Alleles , Sequence Analysis, DNA/methods , Software , Databases, Genetic
20.
bioRxiv ; 2024 Aug 05.
Article in English | MEDLINE | ID: mdl-39149261

ABSTRACT

Using five complementary short- and long-read sequencing technologies, we phased and assembled >95% of each diploid human genome in a four-generation, 28-member family (CEPH 1463) allowing us to systematically assess de novo mutations (DNMs) and recombination. From this family, we estimate an average of 192 DNMs per generation, including 75.5 de novo single-nucleotide variants (SNVs), 7.4 non-tandem repeat indels, 79.6 de novo indels or structural variants (SVs) originating from tandem repeats, 7.7 centromeric de novo SVs and SNVs, and 12.4 de novo Y chromosome events per generation. STRs and VNTRs are the most mutable with 32 loci exhibiting recurrent mutation through the generations. We accurately assemble 288 centromeres and six Y chromosomes across the generations, documenting de novo SVs, and demonstrate that the DNM rate varies by an order of magnitude depending on repeat content, length, and sequence identity. We show a strong paternal bias (75-81%) for all forms of germline DNM, yet we estimate that 17% of de novo SNVs are postzygotic in origin with no paternal bias. We place all this variation in the context of a high-resolution recombination map (~3.5 kbp breakpoint resolution). We observe a strong maternal recombination bias (1.36 maternal:paternal ratio) with a consistent reduction in the number of crossovers with increasing paternal (r=0.85) and maternal (r=0.65) age. However, we observe no correlation between meiotic crossover locations and de novo SVs, arguing against non-allelic homologous recombination as a predominant mechanism. The use of multiple orthogonal technologies, near-telomere-to-telomere phased genome assemblies, and a multi-generation family to assess transmission has created the most comprehensive, publicly available "truth set" of all classes of genomic variants. The resource can be used to test and benchmark new algorithms and technologies to understand the most fundamental processes underlying human genetic variation.
