Results 1 - 20 of 126
1.
Nature ; 629(8010): 136-145, 2024 May.
Article in English | MEDLINE | ID: mdl-38570684

ABSTRACT

Human centromeres have been traditionally very difficult to sequence and assemble owing to their repetitive nature and large size1. As a result, patterns of human centromeric variation and models for their evolution and function remain incomplete, despite centromeres being among the most rapidly mutating regions2,3. Here, using long-read sequencing, we completely sequenced and assembled all centromeres from a second human genome and compared it to the finished reference genome4,5. We find that the two sets of centromeres show at least a 4.1-fold increase in single-nucleotide variation when compared with their unique flanks and vary up to 3-fold in size. Moreover, we find that 45.8% of centromeric sequence cannot be reliably aligned using standard methods owing to the emergence of new α-satellite higher-order repeats (HORs). DNA methylation and CENP-A chromatin immunoprecipitation experiments show that 26% of the centromeres differ in their kinetochore position by >500 kb. To understand evolutionary change, we selected six chromosomes and sequenced and assembled 31 orthologous centromeres from the common chimpanzee, orangutan and macaque genomes. Comparative analyses reveal a nearly complete turnover of α-satellite HORs, with characteristic idiosyncratic changes in α-satellite HORs for each species. Phylogenetic reconstruction of human haplotypes supports limited to no recombination between the short (p) and long (q) arms across centromeres and reveals that novel α-satellite HORs share a monophyletic origin, providing a strategy to estimate the rate of saltatory amplification and mutation of human centromeric DNA.


Subject(s)
Centromere , Molecular Evolution , Genetic Variation , Animals , Humans , Centromere/genetics , Centromere/metabolism , Centromere Protein A/metabolism , DNA Methylation/genetics , Satellite DNA/genetics , Kinetochores/metabolism , Macaca/genetics , Pan troglodytes/genetics , Single Nucleotide Polymorphism/genetics , Pongo/genetics , Male , Female , Reference Standards , Chromatin Immunoprecipitation , Haplotypes , Mutation , Gene Amplification , Sequence Alignment , Chromatin/genetics , Chromatin/metabolism , Species Specificity
2.
Nature ; 617(7960): 335-343, 2023 05.
Article in English | MEDLINE | ID: mdl-37165241

ABSTRACT

The short arms of the human acrocentric chromosomes 13, 14, 15, 21 and 22 (SAACs) share large homologous regions, including ribosomal DNA repeats and extended segmental duplications1,2. Although the resolution of these regions in the first complete assembly of a human genome, the Telomere-to-Telomere Consortium's CHM13 assembly (T2T-CHM13), provided a model of their homology3, it remained unclear whether these patterns were ancestral or maintained by ongoing recombination exchange. Here we show that acrocentric chromosomes contain pseudo-homologous regions (PHRs) indicative of recombination between non-homologous sequences. Utilizing an all-to-all comparison of the human pangenome from the Human Pangenome Reference Consortium4 (HPRC), we find that contigs from all of the SAACs form a community. A variation graph5 constructed from centromere-spanning acrocentric contigs indicates the presence of regions in which most contigs appear nearly identical between heterologous acrocentric chromosomes in T2T-CHM13. Except on chromosome 15, we observe faster decay of linkage disequilibrium in the pseudo-homologous regions than in the corresponding short and long arms, indicating higher rates of recombination6,7. The pseudo-homologous regions include sequences that have previously been shown to lie at the breakpoint of Robertsonian translocations8, and their arrangement is compatible with crossover in inverted duplications on chromosomes 13, 14 and 21. The ubiquity of signals of recombination between heterologous acrocentric chromosomes seen in the HPRC draft pangenome suggests that these shared sequences form the basis for recurrent Robertsonian translocations, providing sequence and population-based confirmation of hypotheses first developed from cytogenetic studies 50 years ago9.


Subject(s)
Centromere , Human Chromosomes , Genetic Recombination , Humans , Centromere/genetics , Human Chromosomes/genetics , Ribosomal DNA/genetics , Genetic Recombination/genetics , Genetic Translocation/genetics , Cytogenetics , Telomere/genetics
3.
Nature ; 621(7978): 344-354, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37612512

ABSTRACT

The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure that includes long palindromes, tandem repeats and segmental duplications1-3. As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence and it remains the last human chromosome to be finished4,5. Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029-base-pair sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, showing the complete ampliconic structures of gene families TSPY, DAZ and RBMY; 41 additional protein-coding genes, mostly from the TSPY family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region. We have combined T2T-Y with a previous assembly of the CHM13 genome4 and mapped available population variation, clinical variants and functional genomics data to produce a complete and comprehensive reference sequence for all 24 human chromosomes.


Subject(s)
Human Y Chromosomes , Genomics , DNA Sequence Analysis , Humans , Base Sequence , Human Y Chromosomes/genetics , Satellite DNA/genetics , Genetic Variation/genetics , Population Genetics , Genomics/methods , Genomics/standards , Heterochromatin/genetics , Multigene Family/genetics , Reference Standards , Genomic Segmental Duplications/genetics , DNA Sequence Analysis/standards , Tandem Repeat Sequences/genetics , Telomere/genetics
4.
Nature ; 611(7936): 519-531, 2022 Nov.
Article in English | MEDLINE | ID: mdl-36261518

ABSTRACT

The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society1,2. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals3,4. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome5. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity6. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yield the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent-child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.


Subject(s)
Chromosome Mapping , Diploidy , Human Genome , Genomics , Humans , Chromosome Mapping/standards , Human Genome/genetics , Haplotypes/genetics , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/standards , DNA Sequence Analysis/methods , DNA Sequence Analysis/standards , Reference Standards , Genomics/methods , Genomics/standards , Human Chromosomes/genetics , Genetic Variation/genetics
5.
Genome Res ; 34(3): 454-468, 2024 04 25.
Article in English | MEDLINE | ID: mdl-38627094

ABSTRACT

Reference-free genome phasing is vital for understanding allele inheritance and the impact of single-molecule DNA variation on phenotypes. To achieve thorough phasing across homozygous or repetitive regions of the genome, long-read sequencing technologies are often used to perform phased de novo assembly. As a step toward reducing the cost and complexity of this type of analysis, we describe new methods for accurately phasing Oxford Nanopore Technologies (ONT) sequence data with the Shasta genome assembler and a modular tool for extending phasing to the chromosome scale called GFAse. We test using new variants of ONT PromethION sequencing, including those using proximity ligation, and show that newer, higher accuracy ONT reads substantially improve assembly quality.


Subject(s)
Nanopores , Humans , DNA Sequence Analysis/methods , Nanopore Sequencing/methods , High-Throughput Nucleotide Sequencing/methods , Software , Genomics/methods
6.
Genome Res ; 34(3): 498-513, 2024 04 25.
Article in English | MEDLINE | ID: mdl-38508693

ABSTRACT

Hydractinia is a colonial marine hydroid that shows remarkable biological properties, including the capacity to regenerate its entire body throughout its lifetime, a process made possible by its adult migratory stem cells, known as i-cells. Here, we provide an in-depth characterization of the genomic structure and gene content of two Hydractinia species, Hydractinia symbiolongicarpus and Hydractinia echinata, placing them in a comparative evolutionary framework with other cnidarian genomes. We also generated and annotated a single-cell transcriptomic atlas for adult male H. symbiolongicarpus and identified cell-type markers for all major cell types, including key i-cell markers. Orthology analyses based on the markers revealed that Hydractinia's i-cells are highly enriched in genes that are widely shared amongst animals, a striking finding given that Hydractinia has a higher proportion of phylum-specific genes than any of the other 41 animals in our orthology analysis. These results indicate that Hydractinia's stem cells and early progenitor cells may use a toolkit shared with all animals, making it a promising model organism for future exploration of stem cell biology and regenerative medicine. The genomic and transcriptomic resources for Hydractinia presented here will enable further studies of their regenerative capacity, colonial morphology, and ability to distinguish self from nonself.


Subject(s)
Genome , Hydrozoa , Animals , Hydrozoa/genetics , Molecular Evolution , Transcriptome , Stem Cells/metabolism , Male , Phylogeny , Single-Cell Analysis/methods
7.
Nat Methods ; 21(6): 967-970, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38730258

ABSTRACT

Despite advances in long-read sequencing technologies, constructing a near telomere-to-telomere assembly is still computationally demanding. Here we present hifiasm (UL), an efficient de novo assembly algorithm combining multiple sequencing technologies to scale up population-wide near telomere-to-telomere assemblies. Applied to 22 human and two plant genomes, our algorithm produces better diploid assemblies at a cost of an order of magnitude lower than existing methods, and it also works with polyploid genomes.


Subject(s)
Algorithms , Diploidy , Polyploidy , Telomere , Humans , Telomere/genetics , Plant Genome , Human Genome , DNA Sequence Analysis/methods , High-Throughput Nucleotide Sequencing/methods
8.
Nat Methods ; 21(1): 41-49, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38036856

ABSTRACT

Complete, telomere-to-telomere (T2T) genome assemblies promise improved analyses and the discovery of new variants, but many essential genomic resources remain associated with older reference genomes. Thus, there is a need to translate genomic features and read alignments between references. Here we describe a method called levioSAM2 that performs fast and accurate lift-over between assemblies using a whole-genome map. In addition to enabling the use of several references, we demonstrate that aligning reads to a high-quality reference (for example, T2T-CHM13) and lifting to an older reference (for example, Genome reference Consortium (GRC)h38) improves the accuracy of the resulting variant calls on the old reference. By leveraging the quality improvements of T2T-CHM13, levioSAM2 reduces small and structural variant calling errors compared with GRC-based mapping using real short- and long-read datasets. Performance is especially improved for a set of complex medically relevant genes, where the GRC references are lower quality.


Subject(s)
Genome , Genomics , DNA Sequence Analysis/methods , Genomics/methods , Chromosome Mapping , High-Throughput Nucleotide Sequencing
9.
Nature ; 593(7857): 101-107, 2021 05.
Article in English | MEDLINE | ID: mdl-33828295

ABSTRACT

The complete assembly of each human chromosome is essential for understanding human biology and evolution1,2. Here we use complementary long-read sequencing technologies to complete the linear assembly of human chromosome 8. Our assembly resolves the sequence of five previously long-standing gaps, including a 2.08-Mb centromeric α-satellite array, a 644-kb copy number polymorphism in the β-defensin gene cluster that is important for disease risk, and an 863-kb variable number tandem repeat at chromosome 8q21.2 that can function as a neocentromere. We show that the centromeric α-satellite array is generally methylated except for a 73-kb hypomethylated region of diverse higher-order α-satellites enriched with CENP-A nucleosomes, consistent with the location of the kinetochore. In addition, we confirm the overall organization and methylation pattern of the centromere in a diploid human genome. Using a dual long-read sequencing approach, we complete high-quality draft assemblies of the orthologous centromere from chromosome 8 in chimpanzee, orangutan and macaque to reconstruct its evolutionary history. Comparative and phylogenetic analyses show that the higher-order α-satellite structure evolved in the great ape ancestor with a layered symmetry, in which more ancient higher-order repeats locate peripherally to monomeric α-satellites. We estimate that the mutation rate of centromeric satellite DNA is accelerated by more than 2.2-fold compared to the unique portions of the genome, and this acceleration extends into the flanking sequence.


Subject(s)
Human Chromosome Pair 8/chemistry , Human Chromosome Pair 8/genetics , Molecular Evolution , Animals , Cell Line , Centromere/chemistry , Centromere/genetics , Centromere/metabolism , Human Chromosome Pair 8/physiology , DNA Methylation , Satellite DNA/genetics , Genetic Epigenesis , Female , Humans , Macaca mulatta/genetics , Male , Minisatellite Repeats/genetics , Pan troglodytes/genetics , Phylogeny , Pongo abelii/genetics , Telomere/chemistry , Telomere/genetics , Telomere/metabolism
10.
Nature ; 594(7862): 227-233, 2021 06.
Article in English | MEDLINE | ID: mdl-33910227

ABSTRACT

The accurate and complete assembly of both haplotype sequences of a diploid organism is essential to understanding the role of variation in genome functions, phenotypes and diseases1. Here, using a trio-binning approach, we present a high-quality, diploid reference genome, with both haplotypes assembled independently at the chromosome level, for the common marmoset (Callithrix jacchus), a primate model system that is widely used in biomedical research2,3. The full spectrum of heterozygosity between the two haplotypes involves 1.36% of the genome, much higher than the 0.13% indicated by the standard estimation based on single-nucleotide heterozygosity alone. The de novo mutation rate is 0.43 × 10⁻⁸ per site per generation, and the paternally inherited genome acquired twice as many mutations as the maternal. Our diploid assembly enabled us to discover a recent expansion of the sex-differentiation region and unique evolutionary changes in the marmoset Y chromosome. In addition, we identified many genes with signatures of positive selection that might have contributed to the evolution of Callithrix biological features. Brain-related genes were highly conserved between marmosets and humans, although several genes experienced lineage-specific copy number variations or diversifying selection, with implications for the use of marmosets as a model system.


Subject(s)
Callithrix/genetics , Diploidy , Molecular Evolution , Genome/genetics , Genomics/standards , Animals , Biomedical Research , DNA Copy Number Variations , Female , Germ-Line Mutation/genetics , Haplotypes/genetics , Heterozygote , Humans , INDEL Mutation/genetics , Male , Reference Standards , Genetic Selection , Sex Differentiation/genetics , Y Chromosome/genetics
11.
Nature ; 592(7856): 737-746, 2021 04.
Article in English | MEDLINE | ID: mdl-33911273

ABSTRACT

High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species1-4. To address this issue, the international Genome 10K (G10K) consortium5,6 has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help to enable a new era of discovery across the life sciences.


Subject(s)
Genome , Genomics/methods , Vertebrates/genetics , Animals , Birds , Gene Library , Genome Size , Mitochondrial Genome , Haplotypes , High-Throughput Nucleotide Sequencing , Molecular Sequence Annotation , Sequence Alignment , DNA Sequence Analysis , Sex Chromosomes/genetics
12.
Nature ; 585(7823): 79-84, 2020 09.
Article in English | MEDLINE | ID: mdl-32663838

ABSTRACT

After two decades of improvements, the current human reference genome (GRCh38) is the most accurate and complete vertebrate genome ever produced. However, no single chromosome has been finished end to end, and hundreds of unresolved gaps persist1,2. Here we present a human genome assembly that surpasses the continuity of GRCh382, along with a gapless, telomere-to-telomere assembly of a human chromosome. This was enabled by high-coverage, ultra-long-read nanopore sequencing of the complete hydatidiform mole CHM13 genome, combined with complementary technologies for quality improvement and validation. Focusing our efforts on the human X chromosome3, we reconstructed the centromeric satellite DNA array (approximately 3.1 Mb) and closed the 29 remaining gaps in the current reference, including new sequences from the human pseudoautosomal regions and from cancer-testis ampliconic gene families (CT-X and GAGE). These sequences will be integrated into future human reference genome releases. In addition, the complete chromosome X, combined with the ultra-long nanopore data, allowed us to map methylation patterns across complex tandem repeats and satellite arrays. Our results demonstrate that finishing the entire human genome is now within reach, and the data presented here will facilitate ongoing efforts to complete the other human chromosomes.


Subject(s)
Human X Chromosomes/genetics , Human Genome/genetics , Telomere/genetics , Centromere/genetics , CpG Islands/genetics , DNA Methylation , Satellite DNA/genetics , Female , Humans , Hydatidiform Mole/genetics , Male , Pregnancy , Reproducibility of Results , Testis/metabolism
13.
Nat Methods ; 19(6): 705-710, 2022 06.
Article in English | MEDLINE | ID: mdl-35365778

ABSTRACT

Approximately 5-10% of the human genome remains inaccessible due to the presence of repetitive sequences such as segmental duplications and tandem repeat arrays. We show that existing long-read mappers often yield incorrect alignments and variant calls within long, near-identical repeats, as they remain vulnerable to allelic bias. In the presence of a nonreference allele within a repeat, a read sampled from that region could be mapped to an incorrect repeat copy. To address this limitation, we developed a new long-read mapping method, Winnowmap2, by using minimal confidently alignable substrings. Winnowmap2 computes each read mapping through a collection of confident subalignments. This approach is more tolerant of structural variation and more sensitive to paralog-specific variants within repeats. Our experiments highlight that Winnowmap2 successfully addresses the issue of allelic bias, enabling more accurate downstream variant calls in repetitive sequences.


Subject(s)
Human Genome , Nucleic Acid Repetitive Sequences , Alleles , Humans , Nucleic Acid Repetitive Sequences/genetics , Genomic Segmental Duplications , DNA Sequence Analysis , Tandem Repeat Sequences
14.
Nat Methods ; 19(6): 696-704, 2022 06.
Article in English | MEDLINE | ID: mdl-35361932

ABSTRACT

Variant calling has been widely used for genotyping and for improving the consensus accuracy of long-read assemblies. Variant calls are commonly hard-filtered with user-defined cutoffs. However, it is impossible to define a single set of optimal cutoffs, as the calls heavily depend on the quality of the reads, the variant caller of choice and the quality of the unpolished assembly. Here, we introduce Merfin, a k-mer based variant-filtering algorithm for improved accuracy in genotyping and genome assembly polishing. Merfin evaluates each variant based on the expected k-mer multiplicity in the reads, independently of the quality of the read alignment and variant caller's internal score. Merfin increased the precision of genotyped calls in several benchmarks, improved consensus accuracy and reduced frameshift errors when applied to human and nonhuman assemblies built from Pacific Biosciences HiFi and continuous long reads or Oxford Nanopore reads, including the first complete human genome. Moreover, we introduce assembly quality and completeness metrics that account for the expected genomic copy numbers.
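
The idea of judging a variant by the k-mers it would introduce can be illustrated with a toy sketch. This is not Merfin's scoring: the function names, the tiny read set, and the "minimum k-mer support" criterion are assumptions made only for illustration, and real tools rely on dedicated k-mer counters and expected-multiplicity models.

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every k-mer occurring in a collection of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def min_kmer_support(sequence, read_kmers, k):
    """Return the lowest read count among the k-mers of `sequence`.

    A value of 0 means at least one k-mer of the sequence is never
    seen in the reads (hypothetical criterion for this sketch).
    """
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    return min((read_kmers[kmer] for kmer in kmers), default=0)

# Toy data (assumed): three overlapping reads from one locus.
reads = ["ACGTACGGA", "CGTACGGAT", "GTACGGATC"]
read_kmers = count_kmers(reads, k=4)

supported_allele = "ACGTACGGATC"    # consistent with the reads
unsupported_allele = "ACGTTCGGATC"  # A>T change creates unseen k-mers

support_ok = min_kmer_support(supported_allele, read_kmers, k=4)
support_bad = min_kmer_support(unsupported_allele, read_kmers, k=4)
```

In this sketch a candidate whose k-mers are absent from the reads would be filtered out regardless of the alignment or caller score attached to it.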


Subject(s)
High-Throughput Nucleotide Sequencing , Nanopores , Genome , Genomics , Humans , DNA Sequence Analysis
15.
Nat Methods ; 19(6): 687-695, 2022 06.
Article in English | MEDLINE | ID: mdl-35361931

ABSTRACT

Advances in long-read sequencing technologies and genome assembly methods have enabled the recent completion of the first telomere-to-telomere human genome assembly, which resolves complex segmental duplications and large tandem repeats, including centromeric satellite arrays in a complete hydatidiform mole (CHM13). Although derived from highly accurate sequences, evaluation revealed evidence of small errors and structural misassemblies in the initial draft assembly. To correct these errors, we designed a new repeat-aware polishing strategy that made accurate assembly corrections in large repeats without overcorrection, ultimately fixing 51% of the existing errors and improving the assembly quality value from 70.2 to 73.9 measured from PacBio high-fidelity and Illumina k-mers. By comparing our results to standard automated polishing tools, we outline common polishing errors and offer practical suggestions for genome projects with limited resources. We also show how sequencing biases in both high-fidelity and Oxford Nanopore Technologies reads cause signature assembly errors that can be corrected with a diverse panel of sequencing technologies.


Subject(s)
High-Throughput Nucleotide Sequencing , Nanopores , Female , Human Genome , High-Throughput Nucleotide Sequencing/methods , Humans , Pregnancy , DNA Sequence Analysis/methods , Telomere/genetics
16.
Proc Natl Acad Sci U S A ; 119(40): e2207374119, 2022 10 04.
Article in English | MEDLINE | ID: mdl-36161920

ABSTRACT

Most colonial marine invertebrates are capable of allorecognition, the ability to distinguish between themselves and conspecifics. One long-standing question is whether invertebrate allorecognition genes are homologous to vertebrate histocompatibility genes. In the cnidarian Hydractinia symbiolongicarpus, allorecognition is controlled by at least two genes, Allorecognition 1 (Alr1) and Allorecognition 2 (Alr2), which encode highly polymorphic cell-surface proteins that serve as markers of self. Here, we show that Alr1 and Alr2 are part of a family of 41 Alr genes, all of which reside in a single genomic interval called the Allorecognition Complex (ARC). Using sensitive homology searches and highly accurate structural predictions, we demonstrate that the Alr proteins are members of the immunoglobulin superfamily (IgSF) with V-set and I-set Ig domains unlike any previously identified in animals. Specifically, their primary amino acid sequences lack many of the motifs considered diagnostic for V-set and I-set domains, yet they adopt secondary and tertiary structures nearly identical to canonical Ig domains. Thus, the V-set domain, which played a central role in the evolution of vertebrate adaptive immunity, was present in the last common ancestor of cnidarians and bilaterians. Unexpectedly, several Alr proteins also have immunoreceptor tyrosine-based activation motifs and immunoreceptor tyrosine-based inhibitory motifs in their cytoplasmic tails, suggesting they could participate in pathways homologous to those that regulate immunity in humans and flies. This work expands our definition of the IgSF with the addition of a family of unusual members, several of which play a role in invertebrate histocompatibility.


Subject(s)
Hydrozoa , Immunoglobulins , Major Histocompatibility Complex , Animals , Hydrozoa/genetics , Hydrozoa/immunology , Immunoglobulins/chemistry , Immunoglobulins/genetics , Major Histocompatibility Complex/genetics , Membrane Proteins/chemistry , Membrane Proteins/genetics , Protein Domains , Tyrosine/chemistry , Tyrosine/genetics
17.
Nature ; 563(7732): 501-507, 2018 11.
Article in English | MEDLINE | ID: mdl-30429615

ABSTRACT

Female Aedes aegypti mosquitoes infect more than 400 million people each year with dangerous viral pathogens including dengue, yellow fever, Zika and chikungunya. Progress in understanding the biology of mosquitoes and developing the tools to fight them has been slowed by the lack of a high-quality genome assembly. Here we combine diverse technologies to produce the markedly improved, fully re-annotated AaegL5 genome assembly, and demonstrate how it accelerates mosquito science. We anchored physical and cytogenetic maps, doubled the number of known chemosensory ionotropic receptors that guide mosquitoes to human hosts and egg-laying sites, provided further insight into the size and composition of the sex-determining M locus, and revealed copy-number variation among glutathione S-transferase genes that are important for insecticide resistance. Using high-resolution quantitative trait locus and population genomic analyses, we mapped new candidates for dengue vector competence and insecticide resistance. AaegL5 will catalyse new biological insights and intervention strategies to fight this deadly disease vector.


Subject(s)
Aedes/genetics , Arbovirus Infections/virology , Arboviruses , Insect Genome/genetics , Genomics/standards , Insect Control , Mosquito Vectors/genetics , Mosquito Vectors/virology , Aedes/virology , Animals , Arbovirus Infections/transmission , Arboviruses/isolation & purification , DNA Copy Number Variations/genetics , Dengue Virus/isolation & purification , Female , Genetic Variation/genetics , Population Genetics , Glutathione Transferase/genetics , Insecticide Resistance/drug effects , Male , Molecular Sequence Annotation , Multigene Family/genetics , Pyrethrins/pharmacology , Reference Standards , Sex Determination Processes/genetics
18.
BMC Biol ; 21(1): 67, 2023 04 03.
Article in English | MEDLINE | ID: mdl-37013528

ABSTRACT

BACKGROUND: Channel catfish and blue catfish are the most important aquacultured species in the USA. The species do not readily intermate naturally but F1 hybrids can be produced through artificial spawning. F1 hybrids produced by mating channel catfish female with blue catfish male exhibit heterosis and provide an ideal system to study reproductive isolation and hybrid vigor. The purpose of the study was to generate high-quality chromosome level reference genome sequences and to determine their genomic similarities and differences. RESULTS: We present high-quality reference genome sequences for both channel catfish and blue catfish, containing only 67 and 139 total gaps, respectively. We also report three pericentric chromosome inversions between the two genomes, as evidenced by long reads across the inversion junctions from distinct individuals, genetic linkage mapping, and PCR amplicons across the inversion junctions. Recombination rates within the inversional segments, detected as double crossovers, are extremely low among backcross progenies (progenies of channel catfish female × F1 hybrid male), suggesting that the pericentric inversions interrupt postzygotic recombination or survival of recombinants. Identification of channel catfish- and blue catfish-specific genes, along with expansions of immunoglobulin genes and centromeric Xba elements, provides insights into genomic hallmarks of these species. CONCLUSIONS: We generated high-quality reference genome sequences for both blue catfish and channel catfish and identified major chromosomal inversions on chromosomes 6, 11, and 24. These pericentric inversions were validated by additional sequencing analysis, genetic linkage mapping, and PCR analysis across the inversion junctions. The reference genome sequences, as well as the contrasted chromosomal architecture should provide guidance for the interspecific breeding programs.


Subject(s)
Ictaluridae , Humans , Animals , Male , Female , Ictaluridae/genetics , Chromosome Inversion , Genetic Linkage , Genome , Chromosome Mapping
19.
Genome Res ; 30(9): 1291-1305, 2020 09.
Article in English | MEDLINE | ID: mdl-32801147

ABSTRACT

Complete and accurate genome assemblies form the basis of most downstream genomic analyses and are of critical importance. Recent genome assembly projects have relied on a combination of noisy long-read sequencing and accurate short-read sequencing, with the former offering greater assembly continuity and the latter providing higher consensus accuracy. The recently introduced Pacific Biosciences (PacBio) HiFi sequencing technology bridges this divide by delivering long reads (>10 kbp) with high per-base accuracy (>99.9%). Here we present HiCanu, a modification of the Canu assembler designed to leverage the full potential of HiFi reads via homopolymer compression, overlap-based error correction, and aggressive false overlap filtering. We benchmark HiCanu with a focus on the recovery of haplotype diversity, major histocompatibility complex (MHC) variants, satellite DNAs, and segmental duplications. For diploid human genomes sequenced to 30× HiFi coverage, HiCanu achieved superior accuracy and allele recovery compared to the current state of the art. On the effectively haploid CHM13 human cell line, HiCanu achieved an NG50 contig size of 77 Mbp with a per-base consensus accuracy of 99.999% (QV50), surpassing recent assemblies of high-coverage, ultralong Oxford Nanopore Technologies (ONT) reads in terms of both accuracy and continuity. This HiCanu assembly correctly resolves 337 out of 341 validation BACs sampled from known segmental duplications and provides the first preliminary assemblies of nine complete human centromeric regions. Although gaps and errors still remain within the most challenging regions of the genome, these results represent a significant advance toward the complete assembly of human genomes.
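
Three quantities named in this abstract (homopolymer compression, the QV scale, and NG50) have short standard definitions, sketched below. This is a minimal illustration under stated assumptions, not HiCanu's code: the function names and example inputs are hypothetical, QV is taken as the usual Phred-style value -10·log10(error rate), and NG50 as the contig length at which the cumulative length of contigs, sorted longest first, reaches half of an assumed genome size.

```python
import math
from itertools import groupby

def homopolymer_compress(seq):
    """Collapse each run of identical bases to one base (AAACC -> AC)."""
    return "".join(base for base, _run in groupby(seq))

def accuracy_to_qv(accuracy):
    """Phred-style quality value for a per-base accuracy in [0, 1)."""
    return -10.0 * math.log10(1.0 - accuracy)

def ng50(contig_lengths, genome_size):
    """Contig length at which the running total of contigs, longest
    first, reaches half of the (estimated) genome size; 0 if the
    assembly never reaches that total."""
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= genome_size / 2:
            return length
    return 0

compressed = homopolymer_compress("AAACCGTTTT")
qv = accuracy_to_qv(0.99999)  # the accuracy the abstract reports as QV50
toy_ng50 = ng50([50, 30, 20, 10], genome_size=120)
```

Compressing homopolymer runs before overlapping reads sidesteps run-length errors, and the QV conversion shows why a per-base accuracy of 99.999% is quoted as QV50.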


Subject(s)
Genetic Variation , High-Throughput Nucleotide Sequencing/methods , DNA Sequence Analysis/methods , Alleles , Animals , Cell Line , Chromosome Duplication , Neoplasm DNA , Satellite DNA , Drosophila/genetics , Human Genome , Haplotypes , Humans , Reproducibility of Results , Software
20.
Brief Bioinform ; 20(4): 1140-1150, 2019 07 19.
Article in English | MEDLINE | ID: mdl-28968737

ABSTRACT

Metagenomic samples are snapshots of complex ecosystems at work. They comprise hundreds of known and unknown species, contain multiple strain variants and vary greatly within and across environments. Many microbes found in microbial communities are not easily grown in culture, making their DNA sequence our only clue to their evolutionary history and biological function. Metagenomic assembly is a computational process aimed at reconstructing genes and genomes from metagenomic mixtures. Current methods have made significant strides in reconstructing DNA segments comprising operons, tandem gene arrays and syntenic blocks. Shorter, higher-throughput sequencing technologies have become the de facto standard in the field. Sequencers are now able to generate billions of short reads in only a few days. Multiple metagenomic assembly strategies, pipelines and assemblers have appeared in recent years. Owing to the inherent complexity of metagenome assembly, regardless of the assembly algorithm and sequencing method, metagenome assemblies contain errors. Recent developments in assembly validation tools have played a pivotal role in improving metagenomics assemblers. Here, we survey recent progress in the field of metagenomic assembly, provide an overview of key approaches for genomic and metagenomic assembly validation and demonstrate the insights that can be derived from assemblies through the use of assembly validation strategies. We also discuss the potential impact of long-read technologies in metagenomics. We conclude with a discussion of future challenges and opportunities in the field of metagenomic assembly and validation.


Subject(s)
Metagenome , Metagenomics/methods , Microbiota/genetics , Algorithms , Computational Biology , Genetic Databases/statistics & numerical data , High-Throughput Nucleotide Sequencing/statistics & numerical data , Metagenomics/statistics & numerical data , Metagenomics/trends , Software