1.
Nat Methods ; 18(11): 1322-1332, 2021 Nov.
Article in English | MEDLINE | ID: mdl-34725481

ABSTRACT

Long-read sequencing has the potential to transform variant detection by reaching currently difficult-to-map regions and routinely linking together adjacent variations to enable read-based phasing. Third-generation nanopore sequence data have demonstrated a long read length, but current interpretation methods for their novel pore-based signal have unique error profiles, making accurate analysis challenging. Here, we introduce a haplotype-aware variant calling pipeline, PEPPER-Margin-DeepVariant, that produces state-of-the-art variant calling results with nanopore data. We show that our nanopore-based method outperforms the short-read-based single-nucleotide-variant identification method at the whole-genome scale and produces high-quality single-nucleotide variants in segmental duplications and low-mappability regions where short-read-based genotyping fails. We show that our pipeline can provide highly contiguous phase blocks across the genome with nanopore reads, contiguously spanning between 85% and 92% of annotated genes across six samples. We also extend PEPPER-Margin-DeepVariant to PacBio HiFi data, providing an efficient solution with superior performance over the current WhatsHap-DeepVariant standard. Finally, we demonstrate de novo assembly polishing methods that use nanopore and PacBio HiFi reads to produce diploid assemblies with high accuracy (Q35+ nanopore-polished and Q40+ PacBio HiFi-polished).


Subjects
Genes; Haplotypes; High-Throughput Nucleotide Sequencing/methods; Nanopores; Polymorphism, Single Nucleotide; Sequence Analysis, DNA/methods; Software; Genome, Human; Humans; Molecular Sequence Annotation
2.
BMC Bioinformatics ; 24(1): 197, 2023 May 12.
Article in English | MEDLINE | ID: mdl-37173615

ABSTRACT

Large-scale population variant data is often used to filter and aid interpretation of variant calls in a single sample. These approaches do not incorporate population information directly into the process of variant calling, and are often limited to filtering, which trades recall for precision. In this study, we develop population-aware DeepVariant models with a new channel encoding allele frequencies from the 1000 Genomes Project. This model reduces variant calling errors, improving both precision and recall in single samples, and reduces rare homozygous and pathogenic ClinVar calls cohort-wide. We assess the use of population-specific or diverse reference panels, finding the greatest accuracy with diverse panels, suggesting that large, diverse panels are preferable to individual populations, even when the population matches sample ancestry. Finally, we show that this benefit generalizes to samples with different ancestry from the training data even when the ancestry is also excluded from the reference panel.


Subjects
Deep Learning; Humans; Gene Frequency; Whole Genome Sequencing; Genome-Wide Association Study; Genome, Human; Polymorphism, Single Nucleotide; High-Throughput Nucleotide Sequencing
4.
bioRxiv ; 2024 Sep 19.
Article in English | MEDLINE | ID: mdl-39345401

ABSTRACT

Accurate genome assemblies are essential for biological research, but even the highest quality assemblies retain errors caused by the technologies used to construct them. Base-level errors are typically fixed with an additional polishing step that uses reads aligned to the draft assembly to identify necessary edits. However, current methods struggle to find a balance between over- and under-polishing. Here, we present an encoder-only transformer model for assembly polishing called DeepPolisher, which predicts corrections to the underlying sequence using PacBio HiFi read alignments to a diploid assembly. Our pipeline introduces a method, PHARAOH (Phasing Reads in Areas Of Homozygosity), which uses ultra-long ONT data to ensure alignments are accurately phased and to correctly introduce heterozygous edits in falsely homozygous regions. We demonstrate that the DeepPolisher pipeline can reduce assembly errors by half, with a greater than 70% reduction in indel errors. We have applied our DeepPolisher-based pipeline to 180 assemblies from the next Human Pangenome Reference Consortium (HPRC) data release, producing an average predicted Quality Value (QV) improvement of 3.4 (54% error reduction) for the majority of the genome.
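
The pairing of a 3.4-point QV improvement with a 54% error reduction in the abstract above is consistent with the Phred scale, where QV = -10 * log10(error rate). The following is a minimal sketch of that arithmetic, assuming the standard Phred definition; the function names are illustrative and are not taken from the DeepPolisher software.

```python
def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled quality value to a per-base error rate."""
    return 10.0 ** (-qv / 10.0)


def error_reduction_from_qv_gain(delta_qv: float) -> float:
    """Fraction of errors removed when the quality value rises by delta_qv.

    Because QV is logarithmic, the result depends only on the gain,
    not on the starting quality value.
    """
    return 1.0 - 10.0 ** (-delta_qv / 10.0)


if __name__ == "__main__":
    # A gain of 3.4 QV corresponds to roughly a 54% reduction in errors.
    print(f"{error_reduction_from_qv_gain(3.4):.1%}")
```

The same relation explains why "reducing errors by half" corresponds to a gain of about 3 QV points.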

5.
Nat Commun ; 15(1): 5907, 2024 Jul 13.
Article in English | MEDLINE | ID: mdl-39003259

ABSTRACT

Long-read sequencing technology has enabled variant detection in difficult-to-map regions of the genome and enabled rapid genetic diagnosis in clinical settings. Rapidly evolving third-generation sequencing platforms like Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) are introducing newer platforms and data types. It has been demonstrated that variant calling methods based on deep neural networks can use local haplotyping information with long reads to improve the genotyping accuracy. However, using local haplotype information creates an overhead, as variant calling needs to be performed multiple times, which ultimately makes it difficult to extend to new data types and platforms as they get introduced. In this work, we have developed a local haplotype approximation method that enables state-of-the-art variant calling performance with multiple sequencing platforms, including the PacBio Revio system and ONT R10.4 simplex and duplex data. This addition of local haplotype approximation simplifies long-read variant calling with DeepVariant.


Subjects
Haplotypes; High-Throughput Nucleotide Sequencing; Haplotypes/genetics; Humans; High-Throughput Nucleotide Sequencing/methods; Sequence Analysis, DNA/methods; Polymorphism, Single Nucleotide; Genome, Human; Algorithms; Genetic Variation; Neural Networks, Computer
6.
Sci Data ; 11(1): 20, 2024 Jan 03.
Article in English | MEDLINE | ID: mdl-38172163

ABSTRACT

X-ray coronary angiography is the most common tool for the diagnosis and treatment of coronary artery disease. It involves the injection of contrast agents into coronary vessels using a catheter to highlight the coronary vessel structure. Typically, multiple 2D X-ray projections are recorded from different angles to improve visualization. Recent advances in the development of deep-learning-based tools promise significant improvement in diagnosing and treating coronary artery disease. However, the limited public availability of annotated X-ray coronary angiography image datasets presents a challenge for objective assessment and comparison of existing tools and the development of novel methods. To address this challenge, we introduce a novel ARCADE dataset with 2 objectives: coronary vessel classification and stenosis detection. Each objective contains 1500 expert-labeled X-ray coronary angiography images representing: i) coronary artery segments; and ii) the locations of stenotic plaques. These datasets will serve as a benchmark for developing new methods and assessing existing approaches for the automated diagnosis and risk assessment of coronary artery disease.


Subjects
Coronary Artery Disease; Humans; Catheters; Contrast Media; Coronary Angiography/methods; Coronary Artery Disease/diagnostic imaging; X-Rays
7.
bioRxiv ; 2024 Aug 19.
Article in English | MEDLINE | ID: mdl-39229187

ABSTRACT

Somatic variant detection is an integral part of cancer genomics analysis. While most methods have focused on short-read sequencing, long-read technologies now offer potential advantages in terms of repeat mapping and variant phasing. We present DeepSomatic, a deep learning method for detecting somatic SNVs and insertions and deletions (indels) from both short-read and long-read data, with modes for whole-genome and exome sequencing, that can run on tumor-normal, tumor-only, and FFPE-prepared samples. To help address the dearth of publicly available training and benchmarking data for somatic variant detection, we generated and make openly available a dataset of five matched tumor-normal cell line pairs sequenced with Illumina, PacBio HiFi, and Oxford Nanopore Technologies, along with benchmark variant sets. Across samples and technologies (short-read and long-read), DeepSomatic consistently outperforms existing callers, particularly for indels.

8.
medRxiv ; 2024 Mar 26.
Article in English | MEDLINE | ID: mdl-38585974

ABSTRACT

Most current studies rely on short-read sequencing to detect somatic structural variation (SV) in cancer genomes. Long-read sequencing offers the advantage of better mappability and long-range phasing, which results in substantial improvements in germline SV detection. However, current long-read SV detection methods do not generalize well to the analysis of somatic SVs in tumor genomes with complex rearrangements, heterogeneity, and aneuploidy. Here, we present Severus: a method for the accurate detection of different types of somatic SVs using a phased breakpoint graph approach. To benchmark various short- and long-read SV detection methods, we sequenced five tumor/normal cell line pairs with Illumina, Nanopore, and PacBio sequencing platforms; on this benchmark Severus showed the highest F1 scores (harmonic mean of the precision and recall) as compared to long-read and short-read methods. We then applied Severus to three clinical cases of pediatric cancer, demonstrating concordance with known genetic findings as well as revealing clinically relevant cryptic rearrangements missed by standard genomic panels.
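
The F1 score mentioned in the abstract above is the harmonic mean of precision and recall. As a minimal sketch, it can be computed from counts of true positive, false positive, and false negative calls against a benchmark set; the function name and the example counts below are hypothetical and are not results from the Severus benchmark.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from call counts against a truth set.

    tp: calls that match a benchmark variant
    fp: calls with no matching benchmark variant
    fn: benchmark variants that were not called
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # Harmonic mean of precision and recall.
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
    print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

Because the harmonic mean is dominated by the smaller of its two inputs, a caller cannot reach a high F1 by maximizing precision at the expense of recall or the reverse.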

9.
bioRxiv ; 2023 Sep 12.
Article in English | MEDLINE | ID: mdl-37745389

ABSTRACT

Long-read sequencing technology has enabled variant detection in difficult-to-map regions of the genome and enabled rapid genetic diagnosis in clinical settings. Rapidly evolving third-generation sequencing platforms like Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) are introducing newer platforms and data types. It has been demonstrated that variant calling methods based on deep neural networks can use local haplotyping information with long reads to improve the genotyping accuracy. However, using local haplotype information creates an overhead, as variant calling needs to be performed multiple times, which ultimately makes it difficult to extend to new data types and platforms as they get introduced. In this work, we have developed a local haplotype approximation method that enables state-of-the-art variant calling performance with multiple sequencing platforms, including the PacBio Revio system and ONT R10.4 simplex and duplex data. This addition of local haplotype approximation makes DeepVariant a universal variant calling solution for long-read sequencing platforms.

10.
Nat Biotechnol ; 41(2): 232-238, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36050551

ABSTRACT

Circular consensus sequencing with Pacific Biosciences (PacBio) technology generates long (10-25 kilobases), accurate 'HiFi' reads by combining serial observations of a DNA molecule into a consensus sequence. The standard approach to consensus generation, pbccs, uses a hidden Markov model. We introduce DeepConsensus, which uses an alignment-based loss to train a gap-aware transformer-encoder for sequence correction. Compared to pbccs, DeepConsensus reduces read errors by 42%. This increases the yield of PacBio HiFi reads at Q20 by 9%, at Q30 by 27% and at Q40 by 90%. With two SMRT Cells of HG003, reads from DeepConsensus improve hifiasm assembly contiguity (NG50 4.9 megabases (Mb) to 17.2 Mb), increase gene completeness (94% to 97%), reduce the false gene duplication rate (1.1% to 0.5%), improve assembly base accuracy (Q43 to Q45) and reduce variant-calling errors by 24%. DeepConsensus models could be trained to the general problem of analyzing the alignment of other types of sequences, such as unique molecular identifiers or genome assemblies.


Subjects
High-Throughput Nucleotide Sequencing; Sequence Analysis, DNA
11.
Addict Behav ; 135: 107440, 2022 Dec.
Article in English | MEDLINE | ID: mdl-35973384

ABSTRACT

BACKGROUND: In 2020, the British Government initiated a review about whether to introduce stricter controls on gambling marketing. We examine: (i) what proportion of regular sports bettors and emerging adult gamblers report that marketing has prompted unplanned spend; and (ii) what factors are associated with reporting that marketing had prompted unplanned spend. METHODS: Data are from two British non-probability online surveys with: (i) emerging adults (16-24 years; n = 3,549; July/August 2019) and (ii) regular sports bettors (18+; n = 3,195; November 2020). Among current gamblers, logistic regressions examined whether reporting that gambling marketing had prompted unplanned spend (vs never) was associated with past-month marketing awareness, past-month receipt of direct marketing (e.g., e-mails), following gambling brands on social media, and problem gambling classification. RESULTS: Almost a third of current gamblers reported that marketing had prompted unplanned gambling spend (sports bettors: 31.2%; emerging adults: 29.5%). Escalated severity of problem gambling was associated with reporting that marketing had prompted unplanned spend in both samples, in particular those experiencing gambling problems compared to those experiencing no problems (sports bettors: ORAdj = 17.01, 95% CI: 10.61-27.27; emerging adults: ORAdj = 11.67, 95% CI: 6.43-21.12). Receipt of at least one form of direct marketing in the past month and following a gambling brand on at least one social media platform were also associated with unplanned spend among sports bettors and emerging adults. CONCLUSION: Among emerging adults and regular sports bettors, increased severity of gambling problems, receiving direct marketing, and following gambling brands on social media are associated with reporting that marketing has prompted unplanned spend.


Subjects
Gambling; Sports; Adult; Cross-Sectional Studies; Gambling/epidemiology; Humans; Marketing; Surveys and Questionnaires
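
The adjusted odds ratios in the abstract above come from logistic regression, and their 95% confidence intervals can be sanity-checked on the log-odds scale. The sketch below assumes the intervals are Wald intervals, symmetric around the log odds ratio, which the abstract does not state; the function names are illustrative.

```python
import math


def ci_geometric_midpoint(ci_low: float, ci_high: float) -> float:
    """Point estimate implied by an interval that is symmetric on the log scale."""
    return math.sqrt(ci_low * ci_high)


def log_or_standard_error(ci_low: float, ci_high: float, z: float = 1.959964) -> float:
    """Standard error of the log odds ratio implied by a Wald interval.

    z is the normal quantile for the interval (about 1.96 for 95%).
    """
    return (math.log(ci_high) - math.log(ci_low)) / (2.0 * z)


if __name__ == "__main__":
    # Intervals reported in the abstract for the two samples.
    for label, low, high in [("sports bettors", 10.61, 27.27),
                             ("emerging adults", 6.43, 21.12)]:
        mid = ci_geometric_midpoint(low, high)
        se = log_or_standard_error(low, high)
        print(f"{label}: implied OR={mid:.2f}, SE(log OR)={se:.3f}")
```

Under that assumption, the geometric midpoints land close to the reported estimates of 17.01 and 11.67, which is what a symmetric interval on the log scale would produce.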
12.
Nat Biotechnol ; 40(7): 1035-1041, 2022 Jul.
Article in English | MEDLINE | ID: mdl-35347328

ABSTRACT

Whole-genome sequencing (WGS) can identify variants that cause genetic disease, but the time required for sequencing and analysis has been a barrier to its use in acutely ill patients. In the present study, we develop an approach for ultra-rapid nanopore WGS that combines an optimized sample preparation protocol, distributed sequencing over 48 flow cells, near real-time base calling and alignment, accelerated variant calling and fast variant filtration for efficient manual review. Application to two example clinical cases identified a candidate variant in <8 h from sample preparation to variant identification. We show that this framework provides accurate variant calls and efficient prioritization, and accelerates diagnostic clinical genome sequencing twofold compared with previous approaches.


Subjects
Nanopore Sequencing; Nanopores; Chromosome Mapping; High-Throughput Nucleotide Sequencing/methods; Humans; Whole Genome Sequencing/methods
13.
Cell Genom ; 2(5), 2022 May 11.
Article in English | MEDLINE | ID: mdl-35720974

ABSTRACT

The precisionFDA Truth Challenge V2 aimed to assess the state of the art of variant calling in challenging genomic regions. Starting with FASTQs, 20 challenge participants applied their variant-calling pipelines and submitted 64 variant call sets for one or more sequencing technologies (Illumina, PacBio HiFi, and Oxford Nanopore Technologies). Submissions were evaluated following best practices for benchmarking small variants with updated Genome in a Bottle benchmark sets and genome stratifications. Challenge submissions included numerous innovative methods, with graph-based and machine learning methods scoring best for short-read and long-read datasets, respectively. With machine learning approaches, combining multiple sequencing technologies performed particularly well. Recent developments in sequencing and variant calling have enabled benchmarking variants in challenging genomic regions, paving the way for the identification of previously unknown clinically relevant variants.

14.
Commun Biol ; 4(1): 1269, 2021 Nov 05.
Article in English | MEDLINE | ID: mdl-34741098

ABSTRACT

There is currently a dearth of accessible whole genome sequencing (WGS) data for individuals residing in the Americas with Sub-Saharan African ancestry. We generated whole genome sequencing data at intermediate (15×) coverage for 2,294 individuals with large amounts of Sub-Saharan African ancestry, predominantly Atlantic African admixed with varying amounts of European and American ancestry. We performed extensive comparisons of variant callers, phasing algorithms, and variant filtration on these data to construct a high quality imputation panel containing data from 2,269 unrelated individuals. With the exception of the TOPMed imputation server (which notably cannot be downloaded), our panel substantially outperformed other available panels when imputing African American individuals. The raw sequencing data, variant calls and imputation panel for this cohort are all freely available via dbGaP and should prove an invaluable resource for further study of admixed African genetics.


Subjects
Genome, Human; Genotype; Adult; Black or African American; Aged; Aged, 80 and over; Humans; Middle Aged; United States; Whole Genome Sequencing; Young Adult
15.
JACC Case Rep ; 2(2): 312-313, 2020 Feb.
Article in English | MEDLINE | ID: mdl-34317230

ABSTRACT

We report the case of a young woman with chest pain and recurrent abortion. The patient was found to have Takayasu arteritis. Drug therapy was started, and emergency bypass surgery was performed. The case showed the possible clinical manifestation of vasculitis as a recurrent abortion, followed by total occlusion of the left main coronary artery. (Level of Difficulty: Intermediate.).

16.
Nat Biotechnol ; 37(10): 1155-1162, 2019 Oct.
Article in English | MEDLINE | ID: mdl-31406327

ABSTRACT

The DNA sequencing technologies in use today produce either highly accurate short reads or less-accurate long reads. We report the optimization of circular consensus sequencing (CCS) to improve the accuracy of single-molecule real-time (SMRT) sequencing (PacBio) and generate highly accurate (99.8%) long high-fidelity (HiFi) reads with an average length of 13.5 kilobases (kb). We applied our approach to sequence the well-characterized human HG002/NA24385 genome and obtained precision and recall rates of at least 99.91% for single-nucleotide variants (SNVs), 95.98% for insertions and deletions <50 bp (indels) and 95.99% for structural variants. Our CCS method matches or exceeds the ability of short-read sequencing to detect small variants and structural variants. We estimate that 2,434 discordances are correctable mistakes in the 'genome in a bottle' (GIAB) benchmark set. Nearly all (99.64%) variants can be phased into haplotypes, further improving variant detection. De novo genome assembly using CCS reads alone produced a contiguous and accurate genome with a contig N50 of >15 megabases (Mb) and concordance of 99.997%, substantially outperforming assembly with less-accurate long reads.


Subjects
DNA, Circular/genetics; Genome, Human; High-Throughput Nucleotide Sequencing/methods; Sequence Analysis, DNA/methods; Base Sequence; Genetic Variation; Haplotypes; Humans
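
The contig N50 reported in the abstract above is a standard contiguity statistic: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. The sketch below is a minimal illustration of that definition; the function name, the optional `reference_size` parameter (which gives the NG50 variant, measured against an assumed genome size instead of the assembly size), and the example lengths are all hypothetical.

```python
from typing import Iterable, Optional


def nx50(lengths: Iterable[int], reference_size: Optional[int] = None) -> int:
    """Return N50 of the contig lengths, or NG50 if reference_size is given.

    Returns 0 when the contigs never reach half of the target size
    (for example, an empty assembly).
    """
    ordered = sorted(lengths, reverse=True)
    total = reference_size if reference_size is not None else sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first contig that brings the running sum to half the target.
        if 2 * running >= total:
            return length
    return 0


if __name__ == "__main__":
    contigs = [2, 3, 4, 5, 6]  # hypothetical contig lengths
    print(nx50(contigs))                     # N50 against the assembly size
    print(nx50(contigs, reference_size=30))  # NG50 against an assumed genome size
```

N50 depends on the assembly's own total length, so it can rise when short contigs are discarded; NG50 avoids this by fixing the denominator to the expected genome size.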