Results 1 - 11 of 11
1.
Nat Rev Genet ; 21(4): 243-254, 2020 04.
Article in English | MEDLINE | ID: mdl-32034321

ABSTRACT

Since the early days of the genome era, the scientific community has relied on a single 'reference' genome for each species, which is used as the basis for a wide range of genetic analyses, including studies of variation within and across species. As sequencing costs have dropped, thousands of new genomes have been sequenced, and scientists have come to realize that a single reference genome is inadequate for many purposes. By sampling a diverse set of individuals, one can begin to assemble a pan-genome: a collection of all the DNA sequences that occur in a species. Here we review efforts to create pan-genomes for a range of species, from bacteria to humans, and we further consider the computational methods that have been proposed in order to capture, interpret and compare pan-genome data. As scientists continue to survey and catalogue the genomic variation across human populations and begin to assemble a human pan-genome, these efforts will increase our power to connect variation to human diversity, disease and beyond.


Subject(s)
Human Genome , Genomics , Bacterial Genome , Plant Genome , Humans
2.
RNA ; 28(4): 478-492, 2022 04.
Article in English | MEDLINE | ID: mdl-35110373

ABSTRACT

Polymorphism drives survival under stress and provides adaptability. Genetic polymorphism of ribosomal RNA (rRNA) genes derives from internal repeat variation of this multicopy gene, and from interindividual variation. A considerable amount of rRNA sequence heterogeneity has been proposed but has been challenging to estimate given the scarcity of accurate reference sequences. We identified four rDNA copies on chromosome 21 (GRCh38) with 99% similarity to recently introduced reference sequence KY962518.1. We customized a GATK bioinformatics pipeline using the four rDNA loci, spanning a total of 145 kb, for variant calling and used high-coverage whole-genome sequencing (WGS) data from the 1000 Genomes Project to analyze variants in 2504 individuals from 26 populations. We identified a total of 3791 variant positions. The variants were positioned nonrandomly on the rRNA gene. Invariant regions included the promoter, early 5' ETS, most of 18S, 5.8S, ITS1, and large areas of the intragenic spacer. A total of 470 variant positions were observed on 28S rRNA. The majority of the 28S rRNA variants were located on highly flexible human-expanded rRNA helical folds ES7L and ES27L, suggesting that these represent positions of diversity and are potentially under continuous evolution. Several variants were validated based on RNA-seq analyses. Population analyses showed remarkable ancestry-linked genetic variance and the presence of both high penetrance and frequent variants in the 5' ETS, ITS2, and 28S regions segregating according to the continental populations. These findings provide a genetic view of rRNA gene array heterogeneity and raise the need to functionally assess how the 28S rRNA variants affect ribosome functions.


Subject(s)
Genetic Heterogeneity , Genome , Ribosomal DNA/genetics , rRNA Genes/genetics , Humans , Ribosomal RNA/genetics , 18S Ribosomal RNA , 28S Ribosomal RNA/genetics
3.
Genome Res ; 30(9): 1258-1273, 2020 09.
Article in English | MEDLINE | ID: mdl-32887686

ABSTRACT

Improved identification of structural variants (SVs) in cancer can lead to more targeted and effective treatment options as well as advance our basic understanding of the disease and its progression. We performed whole-genome sequencing of the SKBR3 breast cancer cell line and patient-derived tumor and normal organoids from two breast cancer patients using Illumina/10x Genomics, Pacific Biosciences (PacBio), and Oxford Nanopore Technologies (ONT) sequencing. We then inferred SVs and large-scale allele-specific copy number variants (CNVs) using an ensemble of methods. Our findings show that long-read sequencing allows for substantially more accurate and sensitive SV detection, with between 90% and 95% of variants supported by each long-read technology also supported by the other. We also report high accuracy for long reads even at relatively low coverage (25×-30×). Furthermore, we integrated SV and CNV data into a unifying karyotype-graph structure to present a more accurate representation of the mutated cancer genomes. We find hundreds of variants within known cancer-related genes detectable only through long-read sequencing. These findings highlight the need for long-read sequencing of cancer genomes for the precise analysis of their genetic instability.


Subject(s)
Breast Neoplasms/genetics , Genomic Structural Variation , Whole Genome Sequencing/methods , Tumor Cell Line , DNA Copy Number Variations , DNA Methylation , Neoplasm DNA , Female , Humans , Nanopores , Organoids , RNA-Seq
4.
Elife ; 10, 2021 09 16.
Article in English | MEDLINE | ID: mdl-34528508

ABSTRACT

Large genomic insertions and deletions are a potent source of functional variation, but are challenging to resolve with short-read sequencing, limiting knowledge of the role of such structural variants (SVs) in human evolution. Here, we used a graph-based method to genotype long-read-discovered SVs in short-read data from diverse human genomes. We then applied an admixture-aware method to identify 220 SVs exhibiting extreme patterns of frequency differentiation - a signature of local adaptation. The top two variants traced to the immunoglobulin heavy chain locus, tagging a haplotype that swept to near fixation in certain southeast Asian populations, but is rare in other global populations. Further investigation revealed evidence that the haplotype traces to gene flow from Neanderthals, corroborating the role of immune-related genes as prominent targets of adaptive introgression. Our study demonstrates how recent technical advances can help resolve signatures of key evolutionary events that remained obscured within technically challenging regions of the genome.


Subject(s)
Physiological Adaptation/genetics , Molecular Evolution , Human Genome , Genotype , Animals , Asian People , Gene Flow , Genomics , Haplotypes/genetics , Humans , Linkage Disequilibrium , Neanderthals/genetics , Genetic Selection
5.
Article in English | MEDLINE | ID: mdl-33547152

ABSTRACT

OBJECTIVE: To identify the clinical phenotypes and infectious triggers in the 2019 Peruvian Guillain-Barré syndrome (GBS) outbreak. METHODS: We prospectively collected clinical and neurophysiologic data of patients with GBS admitted to a tertiary hospital in Lima, Peru, between May and August 2019. Molecular, immunologic, and microbiological methods were used to identify causative infectious agents. Sera from 41 controls were compared with cases for antibodies to Campylobacter jejuni and gangliosides. Genomic analysis was performed on 4 C jejuni isolates. RESULTS: The 49 included patients had a median age of 44 years (interquartile range [IQR] 30-54 years), and 28 (57%) were male. Thirty-two (65%) had symptoms of a preceding infection: 24 (49%) diarrhea and 13 (27%) upper respiratory tract infection. The median time between infectious and neurologic symptoms was 3 days (IQR 2-9 days). Eighty percent had a pure motor form of GBS, 21 (43%) had the axonal electrophysiologic subtype, and 18% the demyelinating subtype. Evidence of recent C jejuni infection was found in 28/43 (65%). No evidence of recent arbovirus infection was found. Twenty-three cases vs 11 controls (OR 3.3, 95% confidence interval [CI] 1.2-9.2, p < 0.01) had IgM and/or IgA antibodies against C jejuni. Anti-GM1:phosphatidylserine and/or anti-GT1a:GM1 heteromeric complex antibodies were strongly positive in cases (92.9% sensitivity and 68.3% specificity). Genomic analysis showed that the C jejuni strains were closely related and had the Asn51 polymorphism in the cstII gene. CONCLUSIONS: Our study indicates that the 2019 Peruvian GBS outbreak was associated with C jejuni infection and that the C jejuni strains linked to GBS circulate widely in different parts of the world.


Subject(s)
Campylobacter Infections/diagnosis , Campylobacter Infections/epidemiology , Campylobacter jejuni/isolation & purification , Disease Outbreaks , Guillain-Barre Syndrome/diagnosis , Guillain-Barre Syndrome/epidemiology , Adult , Campylobacter Infections/blood , Case-Control Studies , Female , Guillain-Barre Syndrome/blood , Humans , Male , Middle Aged , Peru/epidemiology
6.
Genome Biol ; 21(1): 129, 2020 06 02.
Article in English | MEDLINE | ID: mdl-32487205

ABSTRACT

BACKGROUND: Thousands of experiments and studies use the human reference genome as a resource each year. This single reference genome, GRCh38, is a mosaic created from a small number of individuals, representing a very small sample of the human population. There is a need for reference genomes from multiple human populations to avoid potential biases. RESULTS: Here, we describe the assembly and annotation of the genome of an Ashkenazi individual and the creation of a new, population-specific human reference genome. This genome is more contiguous and more complete than GRCh38, the latest version of the human reference genome, and is annotated with highly similar gene content. The Ashkenazi reference genome, Ash1, contains 2,973,118,650 nucleotides as compared to 2,937,639,212 in GRCh38. Annotation identified 20,157 protein-coding genes, of which 19,563 are >99% identical to their counterparts on GRCh38. Most of the remaining genes have small differences. Forty of the protein-coding genes in GRCh38 are missing from Ash1; however, all of these genes are members of multi-gene families for which Ash1 contains other copies. Eleven genes appear on different chromosomes from their homologs in GRCh38. Alignment of DNA sequences from an unrelated Ashkenazi individual to Ash1 identified ~1 million fewer homozygous SNPs than alignment of those same sequences to the more-distant GRCh38 genome, illustrating one of the benefits of population-specific reference genomes. CONCLUSIONS: The Ash1 genome is presented as a reference for any genetic studies involving Ashkenazi Jewish individuals.
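
The size and gene-content figures quoted in this abstract can be put side by side with a few lines of arithmetic. The sketch below is only an illustrative check, not part of the authors' analysis: the four constants are copied from the abstract, while the function and variable names are hypothetical.

```python
# Figures copied from the abstract above; all names are illustrative.
ASH1_BASES = 2_973_118_650        # nucleotides in Ash1
GRCH38_BASES = 2_937_639_212      # nucleotides in GRCh38, as quoted
ASH1_PROTEIN_CODING = 20_157      # protein-coding genes annotated on Ash1
NEAR_IDENTICAL_GENES = 19_563     # genes >99% identical to GRCh38 counterparts


def size_difference(new_bases: int, ref_bases: int) -> tuple[int, float]:
    """Return the extra bases and that excess as a percentage of the reference."""
    extra = new_bases - ref_bases
    return extra, 100.0 * extra / ref_bases


extra_bases, extra_pct = size_difference(ASH1_BASES, GRCH38_BASES)
near_identical_pct = 100.0 * NEAR_IDENTICAL_GENES / ASH1_PROTEIN_CODING

print(f"Ash1 has {extra_bases:,} more nucleotides than GRCh38 ({extra_pct:.2f}%)")
print(f"{near_identical_pct:.2f}% of Ash1 protein-coding genes are >99% identical")
```

Under these figures, Ash1 is roughly 1.2% larger than the quoted GRCh38 count, and about 97% of its protein-coding genes are near-identical to their GRCh38 counterparts.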


Subject(s)
Human Genome , Humans , Molecular Sequence Annotation , Reference Values , Genetic Translocation
7.
Genome Biol ; 20(1): 291, 2019 12 19.
Article in English | MEDLINE | ID: mdl-31856913

ABSTRACT

Accurate detection and genotyping of structural variations (SVs) from short-read data is a long-standing area of development in genomics research and clinical sequencing pipelines. We introduce Paragraph, an accurate genotyper that models SVs using sequence graphs and SV annotations. We demonstrate the accuracy of Paragraph on whole-genome sequence data from three samples using long-read SV calls as the truth set, and then apply Paragraph at scale to a cohort of 100 short-read sequenced samples of diverse ancestry. Our analysis shows that Paragraph has better accuracy than other existing genotypers and can be applied to population-scale studies.


Subject(s)
Genomic Structural Variation , Genotyping Techniques , Human Genome , Humans
8.
9.
Nat Genet ; 51(1): 30-35, 2019 01.
Article in English | MEDLINE | ID: mdl-30455414

ABSTRACT

We used a deeply sequenced dataset of 910 individuals, all of African descent, to construct a set of DNA sequences that is present in these individuals but missing from the reference human genome. We aligned 1.19 trillion reads from the 910 individuals to the reference genome (GRCh38), collected all reads that failed to align, and assembled these reads into contiguous sequences (contigs). We then compared all contigs to one another to identify a set of unique sequences representing regions of the African pan-genome missing from the reference genome. Our analysis revealed 296,485,284 bp in 125,715 distinct contigs present in the populations of African descent, demonstrating that the African pan-genome contains ~10% more DNA than the current human reference genome. Although the functional significance of nearly all of this sequence is unknown, 387 of the novel contigs fall within 315 distinct protein-coding genes, and the rest appear to be intergenic.
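
As a rough sanity check on the "~10% more DNA" statement, the quoted totals can be combined as below. This is a sketch under a stated assumption: the abstract does not say which reference size was used as the denominator, so the GRCh38 nucleotide count quoted in the Ash1 record (item 6 in this list) is borrowed here. All names are hypothetical.

```python
# Figures copied from the abstract above; all names are illustrative.
NOVEL_BP = 296_485_284            # total novel sequence, in bp
NOVEL_CONTIGS = 125_715           # number of distinct novel contigs
GENIC_CONTIGS = 387               # novel contigs inside protein-coding genes

# ASSUMPTION: denominator taken from the GRCh38 size quoted in item 6 of
# this list; the abstract itself does not state the reference size used.
ASSUMED_REFERENCE_BP = 2_937_639_212

mean_contig_bp = NOVEL_BP / NOVEL_CONTIGS
novel_pct_of_reference = 100.0 * NOVEL_BP / ASSUMED_REFERENCE_BP
genic_contig_pct = 100.0 * GENIC_CONTIGS / NOVEL_CONTIGS

print(f"Mean novel contig length: {mean_contig_bp:,.0f} bp")
print(f"Novel sequence relative to assumed reference: {novel_pct_of_reference:.1f}%")
print(f"Novel contigs within protein-coding genes: {genic_contig_pct:.2f}%")
```

With that assumed denominator, the novel sequence comes to about 10% of the reference, consistent with the abstract, and the average novel contig is a little under 2.4 kb.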


Subject(s)
Black People/genetics , Human Genome/genetics , High-Throughput Nucleotide Sequencing/methods , Humans , DNA Sequence Analysis/methods
10.
Sci Rep ; 5: 18054, 2015 Dec 10.
Article in English | MEDLINE | ID: mdl-26656258

ABSTRACT

The role of amino acid-RNA nucleobase interactions in the evolution of RNA translation and protein-mRNA autoregulation remains an open area of research. We describe the inference of pairwise amino acid-RNA nucleobase interaction preferences using structural data from known RNA-protein complexes. We observed significant matching between an amino acid's nucleobase affinity and corresponding codon content in both the standard genetic code and mitochondrial variants. Furthermore, we showed that knowledge of nucleobase preferences allows statistically significant prediction of protein primary sequence from mRNA using purely physicochemical information. Interestingly, ribosomal primary sequences were more accurately predicted than non-ribosomal sequences, suggesting a potential role for direct amino acid-nucleobase interactions in the genesis of amino acid-based ribosomal components. Finally, we observed matching between amino acid-nucleobase affinities and corresponding mRNA sequences in 35 evolutionarily diverse proteomes. We believe these results have important implications for the study of the evolutionary origins of the genetic code and protein-mRNA cross-regulation.


Subject(s)
Amino Acids/metabolism , Nucleic Acids/metabolism , Ribosomes/metabolism , Codon/genetics , Codon/metabolism , Genetic Code/genetics , Humans , Proteins/metabolism , RNA/genetics , RNA/metabolism , Messenger RNA/metabolism