Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 20 de 61
Filtrar
1.
GigaByte ; 2024: gigabyte129, 2024.
Artículo en Inglés | MEDLINE | ID: mdl-38962390

RESUMEN

Nanopore direct RNA sequencing (DRS) enables measurements of RNA modifications. Modification-free transcripts are a practical and targeted control for DRS, providing a baseline measurement for canonical nucleotides within a matched and biologically-derived sequence context. However, these controls can be challenging to generate and carry nanopore-specific nuances that can impact analyses. We produced DRS datasets using modification-free transcripts from in vitro transcription of cDNA from six immortalized human cell lines. We characterized variation across cell lines and demonstrated how these may be interpreted. These data will serve as a versatile control and resource to the community for RNA modification analyses of human transcripts.

2.
Nat Commun ; 15(1): 5907, 2024 Jul 13.
Artículo en Inglés | MEDLINE | ID: mdl-39003259

RESUMEN

Long-read sequencing technology has enabled variant detection in difficult-to-map regions of the genome and enabled rapid genetic diagnosis in clinical settings. Rapidly evolving third-generation sequencing platforms like Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) are introducing newer platforms and data types. It has been demonstrated that variant calling methods based on deep neural networks can use local haplotyping information with long-reads to improve the genotyping accuracy. However, using local haplotype information creates an overhead as variant calling needs to be performed multiple times which ultimately makes it difficult to extend to new data types and platforms as they get introduced. In this work, we have developed a local haplotype approximate method that enables state-of-the-art variant calling performance with multiple sequencing platforms including PacBio Revio system, ONT R10.4 simplex and duplex data. This addition of local haplotype approximation simplifies long-read variant calling with DeepVariant.


Asunto(s)
Haplotipos , Secuenciación de Nucleótidos de Alto Rendimiento , Haplotipos/genética , Humanos , Secuenciación de Nucleótidos de Alto Rendimiento/métodos , Análisis de Secuencia de ADN/métodos , Polimorfismo de Nucleótido Simple , Genoma Humano , Algoritmos , Variación Genética , Redes Neurales de la Computación
3.
bioRxiv ; 2024 May 30.
Artículo en Inglés | MEDLINE | ID: mdl-38766185

RESUMEN

Pseudouridine (psi) is one of the most abundant human mRNA modifications generated from the isomerization of uridine via psi synthases, including TRUB1 and PUS7. Nanopore direct RNA sequencing combined with our recent tool, Mod-p ID, enables psi mapping, transcriptome-wide, without chemical derivatization of the input RNA and/or conversion to cDNA. This method is sensitive for detecting changes in positional psi occupancies across cell types, which can inform our understanding of the impact on gene expression. We sequenced, mapped, and compared the positional psi occupancy across six immortalized human cell lines derived from diverse tissue types. We found that lung-derived cells have the highest proportion of psi, while liver-derived cells have the lowest. Further, among a list of highly conserved sites across cell types, most are TRUB1 substrates and fall within the coding sequence. We find that these conserved psi positions correspond to higher levels of protein expression than expected, suggesting translation regulation. Interestingly, we identify cell type-specific sites of psi modification in ubiquitously expressed genes. We validate these sites by ruling out single-nucleotide variants, analyzing current traces, and performing enzymatic knockdowns of psi synthases. Finally, we characterize sites with multiple psi modifications on the same transcript (hypermodification type II) and found that these can be conserved or cell type specific. Among these, we discovered examples of multiple psi modifications within the same k-mer for the first time and analyzed the effect on current distribution. Our data support the hypothesis that motif sequence and the presence of psi synthase are insufficient to drive modifications, that psi modifications contribute to regulating translation and that cell type-specific trans-acting factors play a major role in driving pseudouridylation.

4.
Genome Res ; 34(3): 454-468, 2024 04 25.
Artículo en Inglés | MEDLINE | ID: mdl-38627094

RESUMEN

Reference-free genome phasing is vital for understanding allele inheritance and the impact of single-molecule DNA variation on phenotypes. To achieve thorough phasing across homozygous or repetitive regions of the genome, long-read sequencing technologies are often used to perform phased de novo assembly. As a step toward reducing the cost and complexity of this type of analysis, we describe new methods for accurately phasing Oxford Nanopore Technologies (ONT) sequence data with the Shasta genome assembler and a modular tool for extending phasing to the chromosome scale called GFAse. We test using new variants of ONT PromethION sequencing, including those using proximity ligation, and show that newer, higher accuracy ONT reads substantially improve assembly quality.


Asunto(s)
Nanoporos , Humanos , Análisis de Secuencia de ADN/métodos , Secuenciación de Nanoporos/métodos , Secuenciación de Nucleótidos de Alto Rendimiento/métodos , Programas Informáticos , Genómica/métodos
5.
bioRxiv ; 2024 Apr 11.
Artículo en Inglés | MEDLINE | ID: mdl-38585714

RESUMEN

Chemical modifications in mRNAs such as pseudouridine (psi) can regulate gene expression, although our understanding of the functional impact of individual psi modifications, especially in neuronal cells, is limited. We apply nanopore direct RNA sequencing to investigate psi dynamics under cellular perturbations in SH-SY5Y cells. We assign sites to psi synthases using siRNA-based knockdown. A steady-state enzyme-substrate model reveals a strong correlation between psi synthase and mRNA substrate levels and psi modification frequencies. Next, we performed either differentiation or lead-exposure to SH-SY5Y cells and found that, upon lead exposure, not differentiation, the modification frequency is less dependent on enzyme levels suggesting translational control. Finally, we compared the plasticity of psi sites across cellular states and found that plastic sites can be condition-dependent or condition-independent; several of these sites fall within transcripts encoding proteins involved in neuronal processes. Our psi analysis and validation enable investigations into the dynamics and plasticity of RNA modifications.

6.
medRxiv ; 2024 Mar 07.
Artículo en Inglés | MEDLINE | ID: mdl-38496498

RESUMEN

Less than half of individuals with a suspected Mendelian condition receive a precise molecular diagnosis after comprehensive clinical genetic testing. Improvements in data quality and costs have heightened interest in using long-read sequencing (LRS) to streamline clinical genomic testing, but the absence of control datasets for variant filtering and prioritization has made tertiary analysis of LRS data challenging. To address this, the 1000 Genomes Project ONT Sequencing Consortium aims to generate LRS data from at least 800 of the 1000 Genomes Project samples. Our goal is to use LRS to identify a broader spectrum of variation so we may improve our understanding of normal patterns of human variation. Here, we present data from analysis of the first 100 samples, representing all 5 superpopulations and 19 subpopulations. These samples, sequenced to an average depth of coverage of 37x and sequence read N50 of 54 kbp, have high concordance with previous studies for identifying single nucleotide and indel variants outside of homopolymer regions. Using multiple structural variant (SV) callers, we identify an average of 24,543 high-confidence SVs per genome, including shared and private SVs likely to disrupt gene function as well as pathogenic expansions within disease-associated repeats that were not detected using short reads. Evaluation of methylation signatures revealed expected patterns at known imprinted loci, samples with skewed X-inactivation patterns, and novel differentially methylated regions. All raw sequencing data, processed data, and summary statistics are publicly available, providing a valuable resource for the clinical genetics community to discover pathogenic SVs.

7.
bioRxiv ; 2024 Mar 01.
Artículo en Inglés | MEDLINE | ID: mdl-38464144

RESUMEN

DNA methylation most commonly occurs as 5-methylcytosine (5-mC) in the human genome and has been associated with human diseases. Recent developments in single-molecule sequencing technologies (Oxford Nanopore Technologies (ONT) and Pacific Biosciences) have enabled readouts of long, native DNA molecules, including cytosine methylation. ONT recently upgraded their Nanopore sequencing chemistry and kits from R9 to the R10 version, which yielded increased accuracy and sequencing throughput. However the effects on methylation detection have not yet been documented. Here we performed a series of computational analyses to characterize differences in Nanopore-based 5mC detection between the ONT R9 and R10 chemistries. We compared 5mC calls in R9 and R10 for three human genome datasets: a cell line, a frontal cortex brain sample, and a blood sample. We performed an in-depth analysis on CpG islands and homopolymer regions, and documented high concordance for methylation detection among sequencing technologies. The strongest correlation was observed between Nanopore R10 and Illumina bisulfite technologies for cell line-derived datasets. Subtle differences in methylation datasets between technologies can impact analysis tools such as differential methylation calling software. Our findings show that comparisons can be drawn between methylation data from different Nanopore chemistries using guided hypotheses. This work will facilitate comparison among Nanopore data cohorts derived using different chemistries from large scale sequencing efforts, such as the NIH CARD Long Read Initiative.

8.
bioRxiv ; 2023 Sep 12.
Artículo en Inglés | MEDLINE | ID: mdl-37745389

RESUMEN

Long-read sequencing technology has enabled variant detection in difficult-to-map regions of the genome and enabled rapid genetic diagnosis in clinical settings. Rapidly evolving third-generation sequencing platforms like Pacific Biosciences (PacBio) and Oxford nanopore technologies (ONT) are introducing newer platforms and data types. It has been demonstrated that variant calling methods based on deep neural networks can use local haplotyping information with long-reads to improve the genotyping accuracy. However, using local haplotype information creates an overhead as variant calling needs to be performed multiple times which ultimately makes it difficult to extend to new data types and platforms as they get introduced. In this work, we have developed a local haplotype approximate method that enables state-of-the-art variant calling performance with multiple sequencing platforms including PacBio Revio system, ONT R10.4 simplex and duplex data. This addition of local haplotype approximation makes DeepVariant a universal variant calling solution for long-read sequencing platforms.

9.
Nat Methods ; 20(10): 1483-1492, 2023 10.
Artículo en Inglés | MEDLINE | ID: mdl-37710018

RESUMEN

Long-read sequencing technologies substantially overcome the limitations of short-reads but have not been considered as a feasible replacement for population-scale projects, being a combination of too expensive, not scalable enough or too error-prone. Here we develop an efficient and scalable wet lab and computational protocol, Napu, for Oxford Nanopore Technologies long-read sequencing that seeks to address those limitations. We applied our protocol to cell lines and brain tissue samples as part of a pilot project for the National Institutes of Health Center for Alzheimer's and Related Dementias. Using a single PromethION flow cell, we can detect single nucleotide polymorphisms with F1-score comparable to Illumina short-read sequencing. Small indel calling remains difficult within homopolymers and tandem repeats, but achieves good concordance to Illumina indel calls elsewhere. Further, we can discover structural variants with F1-score on par with state-of-the-art de novo assembly methods. Our protocol phases small and structural variants at megabase scales and produces highly accurate, haplotype-specific methylation calls.


Asunto(s)
Genoma Humano , Secuenciación de Nanoporos , Humanos , Análisis de Secuencia de ADN/métodos , Haplotipos , Metilación , Proyectos Piloto , Secuenciación de Nucleótidos de Alto Rendimiento/métodos
10.
Nature ; 621(7978): 344-354, 2023 Sep.
Artículo en Inglés | MEDLINE | ID: mdl-37612512

RESUMEN

The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure that includes long palindromes, tandem repeats and segmental duplications1-3. As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence and it remains the last human chromosome to be finished4,5. Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029-base-pair sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, showing the complete ampliconic structures of gene families TSPY, DAZ and RBMY; 41 additional protein-coding genes, mostly from the TSPY family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region. We have combined T2T-Y with a previous assembly of the CHM13 genome4 and mapped available population variation, clinical variants and functional genomics data to produce a complete and comprehensive reference sequence for all 24 human chromosomes.


Asunto(s)
Cromosomas Humanos Y , Genómica , Análisis de Secuencia de ADN , Humanos , Secuencia de Bases , Cromosomas Humanos Y/genética , ADN Satélite/genética , Variación Genética/genética , Genética de Población , Genómica/métodos , Genómica/normas , Heterocromatina/genética , Familia de Multigenes/genética , Estándares de Referencia , Duplicaciones Segmentarias en el Genoma/genética , Análisis de Secuencia de ADN/normas , Secuencias Repetidas en Tándem/genética , Telómero/genética
11.
Nat Biomed Eng ; 7(12): 1627-1635, 2023 Dec.
Artículo en Inglés | MEDLINE | ID: mdl-37652985

RESUMEN

Liquid biopsies provide a means for the profiling of cell-free RNAs secreted by cells throughout the body. Although well-annotated coding and non-coding transcripts in blood are readily detectable and can serve as biomarkers of disease, the overall diagnostic utility of the cell-free transcriptome remains unclear. Here we show that RNAs derived from transposable elements and other repeat elements are enriched in the cell-free transcriptome of patients with cancer, and that they serve as signatures for the accurate classification of the disease. We used repeat-element-aware liquid-biopsy technology and single-molecule nanopore sequencing to profile the cell-free transcriptome in plasma from patients with cancer and to examine millions of genomic features comprising all annotated genes and repeat elements throughout the genome. By aggregating individual repeat elements to the subfamily level, we found that samples with pancreatic cancer are enriched with specific Alu subfamilies, whereas other cancers have their own characteristic cell-free RNA profile. Our findings show that repetitive RNA sequences are abundant in blood and can be used as disease-specific diagnostic biomarkers.


Asunto(s)
Neoplasias , ARN , Humanos , ARN/genética , Secuencia de Bases , Elementos Transponibles de ADN , Plasma , Neoplasias/diagnóstico , Neoplasias/genética , Biomarcadores
12.
medRxiv ; 2023 Dec 12.
Artículo en Inglés | MEDLINE | ID: mdl-37205357

RESUMEN

GC-rich tandem repeat expansions (TREs) are often associated with DNA methylation, gene silencing and folate-sensitive fragile sites and underlie several congenital and late-onset disorders. Through a combination of DNA methylation profiling and tandem repeat genotyping, we identified 24 methylated TREs and investigated their effects on human traits using PheWAS in 168,641 individuals from the UK Biobank, identifying 156 significant TRE:trait associations involving 17 different TREs. Of these, a GCC expansion in the promoter of AFF3 was linked with a 2.4-fold reduced probability of completing secondary education, an effect size comparable to several recurrent pathogenic microdeletions. In a cohort of 6,371 probands with neurodevelopmental problems of suspected genetic etiology, we observed a significant enrichment of AFF3 expansions compared to controls. With a population prevalence that is at least 5-fold higher than the TRE that causes fragile X syndrome, AFF3 expansions represent a significant cause of neurodevelopmental delay.

13.
Nature ; 617(7960): 312-324, 2023 05.
Artículo en Inglés | MEDLINE | ID: mdl-37165242

RESUMEN

Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals1. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.


Asunto(s)
Genoma Humano , Genómica , Humanos , Diploidia , Genoma Humano/genética , Haplotipos/genética , Análisis de Secuencia de ADN , Genómica/normas , Estándares de Referencia , Estudios de Cohortes , Alelos , Variación Genética
14.
bioRxiv ; 2023 Apr 06.
Artículo en Inglés | MEDLINE | ID: mdl-37066160

RESUMEN

Nanopore direct RNA sequencing (DRS) enables measurements of native RNA modifications. Modification-free transcripts are an important control for DRS. Additionally, it is advantageous to have canonical transcripts from multiple cell lines to better account for human transcriptome variation. Here we generated and analyzed Nanopore DRS datasets for five human cell lines using in vitro transcribed (IVT) RNA. We compared performance statistics amongst biological replicates. We also documented nucleotide and ionic current level variation across cell lines. These data will serve as a resource to the community for RNA modification analysis.

15.
bioRxiv ; 2023 Feb 22.
Artículo en Inglés | MEDLINE | ID: mdl-36865218

RESUMEN

As a step towards simplifying and reducing the cost of haplotype resolved de novo assembly, we describe new methods for accurately phasing nanopore data with the Shasta genome assembler and a modular tool for extending phasing to the chromosome scale called GFAse. We test using new variants of Oxford Nanopore Technologies' (ONT) PromethION sequencing, including those using proximity ligation and show that newer, higher accuracy ONT reads substantially improve assembly quality.

16.
Biofilm ; 5: 100108, 2023 Dec.
Artículo en Inglés | MEDLINE | ID: mdl-36938359

RESUMEN

Urine, humidity condensate, and other sources of non-potable water are processed onboard the International Space Station (ISS) by the Water Recovery System (WRS) yielding potable water. While some means of microbial control are in place, including a phosphoric acid/hexavalent chromium urine pretreatment solution, many areas within the WRS are not available for routine microbial monitoring. Due to refurbishment needs, two flex lines from the Urine Processor Assembly (UPA) within the WRS were removed and returned to Earth. The water from within these lines, as well as flush water, was microbially evaluated. Culture and culture-independent analysis revealed the presence of Burkholderia, Paraburkholderia, and Leifsonia. Fungal culture also identified Fusarium and Lecythophora. Hybrid de novo genome analysis of the five distinct Burkholderia isolates identified them as B. contaminans, while the two Paraburkholderia isolates were identified as P. fungorum. Chromate-resistance gene clusters were identified through pangenomic analysis that differentiated these genomes from previously studied isolates recovered from the point-of-use potable water dispenser and/or current NCBI references, indicating that unique populations exist within distinct niches in the WRS. Beyond genomic analysis, fixed samples directly from the lines were imaged by environmental scanning electron microscopy, which detailed networks of fungal-bacterial biofilms. This is the first evidence of biofilm formation within flex lines from the UPA onboard the ISS. For all bacteria isolated, biofilm potential was further characterized, with the B. contaminans isolates demonstrating the most considerable biofilm formation. Moreover, the genomes of the B. contaminans revealed secondary metabolite gene clusters associated with quorum sensing, biofilm formation, antifungal compounds, and hemolysins. The potential production of these gene cluster metabolites was phenotypically evaluated through biofilm, bacterial-fungal interaction, and hemolytic assays. Collectively, these data identify the UPA flex lines as a unique ecological niche and novel area of biofilm growth within the WRS. Further investigation of these organisms and their resistance profiles will enable engineering controls directed toward biofilm prevention in future space station water systems.

17.
bioRxiv ; 2023 Apr 05.
Artículo en Inglés | MEDLINE | ID: mdl-36711673

RESUMEN

Long-read sequencing technologies substantially overcome the limitations of short-reads but to date have not been considered as feasible replacement at scale due to a combination of being too expensive, not scalable enough, or too error-prone. Here, we develop an efficient and scalable wet lab and computational protocol for Oxford Nanopore Technologies (ONT) long-read sequencing that seeks to provide a genuine alternative to short-reads for large-scale genomics projects. We applied our protocol to cell lines and brain tissue samples as part of a pilot project for the NIH Center for Alzheimer's and Related Dementias (CARD). Using a single PromethION flow cell, we can detect SNPs with F1-score better than Illumina short-read sequencing. Small indel calling remains to be difficult inside homopolymers and tandem repeats, but is comparable to Illumina calls elsewhere. Further, we can discover structural variants with F1-score comparable to state-of the-art methods involving Pacific Biosciences HiFi sequencing and trio information (but at a lower cost and greater throughput). Using ONT based phasing, we can then combine and phase small and structural variants at megabase scales. Our protocol also produces highly accurate, haplotype-specific methylation calls. Overall, this makes large-scale long-read sequencing projects feasible; the protocol is currently being used to sequence thousands of brain-based genomes as a part of the NIH CARD initiative. We provide the protocol and software as open-source integrated pipelines for generating phased variant calls and assemblies.

18.
Nature ; 611(7936): 519-531, 2022 Nov.
Artículo en Inglés | MEDLINE | ID: mdl-36261518

RESUMEN

The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society1,2. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals3,4. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome5. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity6. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yield the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent-child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.


Asunto(s)
Mapeo Cromosómico , Diploidia , Genoma Humano , Genómica , Humanos , Mapeo Cromosómico/normas , Genoma Humano/genética , Haplotipos/genética , Secuenciación de Nucleótidos de Alto Rendimiento/métodos , Secuenciación de Nucleótidos de Alto Rendimiento/normas , Análisis de Secuencia de ADN/métodos , Análisis de Secuencia de ADN/normas , Estándares de Referencia , Genómica/métodos , Genómica/normas , Cromosomas Humanos/genética , Variación Genética/genética
20.
Cell Genom ; 2(5)2022 May 11.
Artículo en Inglés | MEDLINE | ID: mdl-35720974

RESUMEN

The precisionFDA Truth Challenge V2 aimed to assess the state of the art of variant calling in challenging genomic regions. Starting with FASTQs, 20 challenge participants applied their variant-calling pipelines and submitted 64 variant call sets for one or more sequencing technologies (Illumina, PacBio HiFi, and Oxford Nanopore Technologies). Submissions were evaluated following best practices for benchmarking small variants with updated Genome in a Bottle benchmark sets and genome stratifications. Challenge submissions included numerous innovative methods, with graph-based and machine learning methods scoring best for short-read and long-read datasets, respectively. With machine learning approaches, combining multiple sequencing technologies performed particularly well. Recent developments in sequencing and variant calling have enabled benchmarking variants in challenging genomic regions, paving the way for the identification of previously unknown clinically relevant variants.

SELECCIÓN DE REFERENCIAS
DETALLE DE LA BÚSQUEDA