Results 1 - 20 of 503
1.
BMC Bioinformatics ; 22(1): 20, 2021 Jan 07.
Article in English | MEDLINE | ID: mdl-33413082

ABSTRACT

BACKGROUND: Next generation sequencing (NGS) technologies have improved the study of hereditary diseases. Since the evaluation of bioinformatics pipelines is not straightforward, NGS demands effective strategies to analyze data that is of paramount relevance for decision making under a clinical scenario. According to the benchmarking framework of the Global Alliance for Genomics and Health (GA4GH), we implemented a new simple and user-friendly set-theory based method to assess variant callers using a gold standard variant set and high confidence regions. As a model, we used TruSight Cardio kit sequencing data of the reference genome NA12878. This targeted sequencing kit is used to identify variants in key genes related to Inherited Cardiac Conditions (ICCs), a group of cardiovascular diseases with high rates of morbidity and mortality. RESULTS: We implemented and compared three variant calling pipelines (Isaac, Freebayes, and VarScan). Performance metrics using our set-theory approach showed high-resolution pipelines and revealed: (1) a perfect recall of 1.000 for all three pipelines, (2) very high precision values, i.e. 0.987 for Freebayes, 0.928 for VarScan, and 1.000 for Isaac, when compared with the reference material, and (3) a ROC curve analysis with AUC > 0.94 for all cases. Moreover, significant differences were obtained between the three pipelines. In general, results indicate that the three pipelines were able to recognize the expected variants in the gold standard data set. CONCLUSIONS: Our set-theory approach to calculate metrics was able to identify the expected ICC-related variants by the three selected pipelines, but results were completely dependent on the algorithms. We emphasize the importance of assessing pipelines using gold standard materials to achieve the most reliable results for clinical application.
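
As a rough illustration of the set-based metrics this abstract describes, the Python sketch below derives precision and recall from a call set and a gold-standard set using plain set operations. It is not the authors' implementation: the function name `benchmark_calls`, the `(chrom, pos, ref, alt)` tuple layout, and the example variants are hypothetical, and the restriction to high-confidence regions is left out.

```python
def benchmark_calls(called, truth):
    """Compare a set of called variants with a gold-standard set."""
    true_pos = called & truth    # called and present in the gold standard
    false_pos = called - truth   # called but absent from the gold standard
    false_neg = truth - called   # in the gold standard but not called
    precision = len(true_pos) / len(called) if called else 0.0
    recall = len(true_pos) / len(truth) if truth else 0.0
    return {
        "tp": len(true_pos),
        "fp": len(false_pos),
        "fn": len(false_neg),
        "precision": precision,
        "recall": recall,
    }


# Hypothetical variants keyed as (chrom, pos, ref, alt); not real data.
truth = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"), ("chr2", 75, "G", "A")}
called = truth | {("chr2", 900, "T", "C")}  # one extra, unsupported call

metrics = benchmark_calls(called, truth)
print(metrics)
```

In this toy case every gold-standard variant is recovered, so recall is perfect while the single extra call lowers precision, which mirrors the kind of pattern the abstract reports.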


Subjects
Computational Biology , High-Throughput Nucleotide Sequencing , Benchmarking , Computational Biology/methods , Computational Biology/standards , Databases, Genetic , Genetic Predisposition to Disease/genetics , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/standards , Humans , Software
2.
PLoS Genet ; 16(12): e1009242, 2020 12.
Article in English | MEDLINE | ID: mdl-33315859

ABSTRACT

Deletions and duplications in mitochondrial DNA (mtDNA) cause mitochondrial disease and accumulate in conditions such as cancer and age-related disorders, but validated high-throughput methodology that can readily detect and discriminate between these two types of events is lacking. Here we establish a computational method, MitoSAlt, for accurate identification, quantification and visualization of mtDNA deletions and duplications from genomic sequencing data. Our method was tested on simulated sequencing reads and human patient samples with single deletions and duplications to verify its accuracy. Application to mouse models of mtDNA maintenance disease demonstrated the ability to detect deletions and duplications even at low levels of heteroplasmy.


Subjects
DNA, Mitochondrial/genetics , Gene Deletion , Gene Duplication , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Animals , DNA, Mitochondrial/chemistry , High-Throughput Nucleotide Sequencing/standards , Mice , Reproducibility of Results , Sequence Analysis, DNA/standards
3.
Nat Commun ; 11(1): 5040, 2020 10 07.
Article in English | MEDLINE | ID: mdl-33028839

ABSTRACT

Bringing together cancer genomes from different projects increases power and allows the investigation of pan-cancer, molecular mechanisms. However, working with whole genomes sequenced over several years in different sequencing centres requires a framework to compare the quality of these sequences. We used the Pan-Cancer Analysis of Whole Genomes cohort as a test case to construct such a framework. This cohort contains whole cancer genomes of 2832 donors from 18 sequencing centres. We developed a non-redundant set of five quality control (QC) measurements to establish a star rating system. These QC measures reflect known differences in sequencing protocol and provide a guide to downstream analyses and allow for exclusion of samples of poor quality. We have found that this is an effective framework of quality measures. The implementation of the framework is available at: https://dockstore.org/containers/quay.io/jwerner_dkfz/pancanqc:1.2.2 .


Subjects
Genome, Human/genetics , Genomics/standards , Neoplasms/genetics , Quality Control , Chromosome Mapping/standards , Chromosomes, Human/genetics , DNA Mutational Analysis/standards , Female , Genomics/methods , High-Throughput Nucleotide Sequencing/standards , Humans , Male , Mutation , Software , Whole Genome Sequencing/standards
4.
BMC Bioinformatics ; 21(1): 429, 2020 Oct 01.
Article in English | MEDLINE | ID: mdl-33004007

ABSTRACT

BACKGROUND: PacBio sequencing is an incredibly valuable third-generation DNA sequencing method due to very long read lengths, ability to detect methylated bases, and its real-time sequencing methodology. Yet, hitherto no tool was available for analyzing the quality of, subsampling, and filtering PacBio data. RESULTS: Here we present SequelTools, a command-line program containing three tools: Quality Control, Read Subsampling, and Read Filtering. The Quality Control tool quickly processes PacBio Sequel raw sequence data from multiple SMRTcells producing multiple statistics and publication-quality plots describing the quality of the data including N50, read length and count statistics, PSR, and ZOR. The Read Subsampling tool allows the user to subsample reads by one or more of the following criteria: longest subreads per CLR or random CLR selection. The Read Filtering tool provides options for normalizing data by filtering out certain low-quality scraps reads and/or by minimum CLR length. SequelTools is implemented in bash, R, and Python using only standard libraries and packages and is platform independent. CONCLUSIONS: SequelTools is a program that provides the only free, fast, and easy-to-use quality control tool, and the only program providing this kind of read subsampling and read filtering for PacBio Sequel raw sequence data, and is available at https://github.com/ISUgenomics/SequelTools .
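
Of the statistics listed in this abstract, N50 is the easiest to show concretely. The Python sketch below uses the common definition (the length L such that reads of length L or longer contain at least half of all sequenced bases). It is an independent illustration, not code from SequelTools, and the read lengths are made up.

```python
def n50(lengths):
    """Return the N50 of a collection of read lengths (0 if empty)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the cumulative sum reaches half
        # of the total bases is the N50.
        if running >= half_total:
            return length
    return 0


# Hypothetical read lengths: total is 20 bases, so half is 10.
read_lengths = [2, 3, 4, 5, 6]
read_n50 = n50(read_lengths)
print(read_n50)
```

Sorting in descending order keeps the function short; for very large read sets a streaming or histogram-based approach may be preferable.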


Subjects
High-Throughput Nucleotide Sequencing/methods , Software , Arabidopsis/genetics , Benchmarking , High-Throughput Nucleotide Sequencing/standards , Quality Control
5.
Genes (Basel) ; 11(8)2020 08 17.
Article in English | MEDLINE | ID: mdl-32824573

ABSTRACT

The COVID-19 pandemic has spread very fast around the world. A few days after the first detected case in South Africa, an infection started in a large hospital outbreak in Durban, KwaZulu-Natal (KZN). Phylogenetic analysis of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomes can be used to trace the path of transmission within a hospital. It can also identify the source of the outbreak and provide lessons to improve infection prevention and control strategies. This manuscript outlines the obstacles encountered in order to genotype SARS-CoV-2 in near-real time during an urgent outbreak investigation. This included problems with the length of the original genotyping protocol, unavailability of reagents, and sample degradation and storage. Despite this, three different library preparation methods for Illumina sequencing were set up, and the hands-on library preparation time was decreased from twelve to three hours, which enabled the outbreak investigation to be completed in just a few weeks. Furthermore, the new protocols increased the success rate of sequencing whole viral genomes. A simple bioinformatics workflow for the assembly of high-quality genomes in near-real time was also fine-tuned. In order to allow other laboratories to learn from our experience, all of the library preparation and bioinformatics protocols are publicly available at protocols.io and distributed to other laboratories of the Network for Genomics Surveillance in South Africa (NGS-SA) consortium.


Subjects
Betacoronavirus/genetics , Coronavirus Infections/diagnosis , Genome, Viral , High-Throughput Nucleotide Sequencing/methods , Molecular Diagnostic Techniques/methods , Pneumonia, Viral/diagnosis , Whole Genome Sequencing/methods , Betacoronavirus/pathogenicity , Coronavirus Infections/virology , High-Throughput Nucleotide Sequencing/standards , Humans , Molecular Diagnostic Techniques/standards , Pandemics , Pneumonia, Viral/virology , Reproducibility of Results , Sensitivity and Specificity , Whole Genome Sequencing/standards
6.
Nat Biotechnol ; 38(9): 1044-1053, 2020 09.
Article in English | MEDLINE | ID: mdl-32686750

ABSTRACT

De novo assembly of a human genome using nanopore long-read sequences has been reported, but it used more than 150,000 CPU hours and weeks of wall-clock time. To enable rapid human genome assembly, we present Shasta, a de novo long-read assembler, and polishing algorithms named MarginPolish and HELEN. Using a single PromethION nanopore sequencer and our toolkit, we assembled 11 highly contiguous human genomes de novo in 9 d. We achieved roughly 63× coverage, 42-kb read N50 values and 6.5× coverage in reads >100 kb using three flow cells per sample. Shasta produced a complete haploid human genome assembly in under 6 h on a single commercial compute node. MarginPolish and HELEN polished haploid assemblies to more than 99.9% identity (Phred quality score QV = 30) with nanopore reads alone. Addition of proximity-ligation sequencing enabled near chromosome-level scaffolds for all 11 genomes. We compare our assembly performance to existing methods for diploid, haploid and trio-binned human samples and report superior accuracy and speed.
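
The pairing of 99.9% identity with QV = 30 in this abstract follows from the standard Phred definition, QV = -10 * log10(error rate). The short Python sketch below converts in both directions; the function names are hypothetical and the code is unrelated to the Shasta, MarginPolish, or HELEN software.

```python
import math


def phred_from_identity(identity):
    """Phred-scaled quality for a fractional identity, e.g. 0.999."""
    error = 1.0 - identity
    if error <= 0:
        return math.inf  # a perfect match has no finite Phred score
    return -10.0 * math.log10(error)


def identity_from_phred(qv):
    """Fractional identity implied by a Phred-scaled quality value."""
    return 1.0 - 10 ** (-qv / 10.0)


qv = phred_from_identity(0.999)
identity = identity_from_phred(30)
print(round(qv, 3), round(identity, 4))
```

Each additional 10 QV points corresponds to a tenfold reduction in the per-base error rate.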


Subjects
Genome, Human/genetics , High-Throughput Nucleotide Sequencing/methods , Nanopore Sequencing , Sequence Analysis, DNA/methods , Algorithms , Benchmarking , Chromosomes, Human/genetics , Deep Learning , Genomics , HLA Antigens/genetics , Haploidy , High-Throughput Nucleotide Sequencing/standards , Humans , Sequence Analysis, DNA/standards
7.
Virulence ; 11(1): 964-967, 2020 12.
Article in English | MEDLINE | ID: mdl-32726172

ABSTRACT

Currently, testing for coronavirus is performed with time- and personnel-consuming PCR assays. The aim of this study was to evaluate the sensitivity, specificity and capacity of a fully automated, random access high-throughput real-time PCR-based diagnostic platform for the detection of SARS-CoV-2. The NeuMoDx N96 system displayed an equal or better detection rate for SARS-CoV-2 compared with the LightCycler 480II system and showed a specificity of 100%. The median PCR run time for all 28 PCR runs was 91 (IQR 84-97) minutes. The capacity of the NeuMoDx N96 could easily surpass the capacity of most currently used molecular test systems and significantly reduce the turn-around time.


Subjects
Betacoronavirus/isolation & purification , High-Throughput Nucleotide Sequencing/methods , RNA, Viral/isolation & purification , Real-Time Polymerase Chain Reaction/methods , Betacoronavirus/genetics , High-Throughput Nucleotide Sequencing/instrumentation , High-Throughput Nucleotide Sequencing/standards , Humans , RNA, Viral/genetics , Real-Time Polymerase Chain Reaction/instrumentation , Real-Time Polymerase Chain Reaction/standards , Reproducibility of Results , Sensitivity and Specificity , Time Factors
8.
PLoS One ; 15(7): e0235861, 2020.
Article in English | MEDLINE | ID: mdl-32706774

ABSTRACT

BACKGROUND: To support the rising need for testing and to standardize tumor DNA sequencing practices within the U.S. Department of Veterans Affairs (VA)'s Veterans Health Administration (VHA), the National Precision Oncology Program (NPOP) was launched in 2016. We sought to assess oncologists' practices, concerns, and perceptions regarding Next-Generation Sequencing (NGS) and the NPOP. MATERIALS AND METHODS: Using a purposive total sampling approach, oncologists who had previously ordered NGS for at least one tumor sample through the NPOP were invited to participate in semi-structured interviews. Questions assessed the following: expectations for the NPOP, procedural requirements, applicability of testing results, and the summative utility of the NPOP. Interviews were assessed using an open coding approach. Thematic analysis was conducted to evaluate the completed codebook. Themes were defined deductively by reviewing the direct responses to interview questions as well as inductively by identifying emerging patterns of data. RESULTS: Of the 105 medical oncologists who were invited to participate, 20 (19%) were interviewed from 19 different VA medical centers in 14 states. Five recurrent themes were observed: (1) Educational Efforts Regarding Tumor DNA Sequencing Should be Undertaken, (2) Pathology Departments Share a Critical Role in Facilitating Test Completion, (3) Tumor DNA Sequencing via NGS Serves as the Most Comprehensive Testing Modality within Precision Oncology, (4) The Availability of the NPOP Has Expanded Options for Select Patients, and (5) The Completion of Tumor DNA Sequencing through the NPOP Could Help Improve Research Efforts within VHA Oncology Practices. CONCLUSION: Medical oncologists believe that the availability of tumor DNA sequencing through the NPOP could potentially lead to an improvement in outcomes for veterans with metastatic solid tumors. Efforts should be directed toward improving oncologists' understanding of sequencing, strengthening collaborative relationships between oncologists and pathologists, and assessing the role of comprehensive NGS panels within the battery of precision tests.


Subjects
Health Knowledge, Attitudes, Practice , High-Throughput Nucleotide Sequencing/standards , Neoplasms/genetics , Oncologists/psychology , Sequence Analysis, DNA/standards , United States Department of Veterans Affairs , Adult , Early Detection of Cancer/standards , Female , Genetic Testing/standards , Humans , Male , Middle Aged , Neoplasms/diagnosis , Precision Medicine/standards , State Health Plans , Surveys and Questionnaires , United States
9.
Nat Commun ; 11(1): 2704, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32483174

ABSTRACT

Index hopping is the main cause of incorrect sample assignment of sequencing reads in multiplexed pooled libraries. We introduce a statistical model for estimating the sample index-hopping rate in multiplexed droplet-based single-cell RNA-seq data and for probabilistic inference of the true sample of origin of hopped reads. We analyze several datasets and estimate the sample index hopping probability to range between 0.003 and 0.009, a small number that counter-intuitively gives rise to a large fraction of phantom molecules: the fraction of phantom molecules exceeds 8% in more than 25% of samples and reaches as high as 85% in low-complexity samples. Phantom molecules lead to widespread complications in downstream analyses, including transcriptome mixing across cells, emergence of phantom copies of cells from other samples, and misclassification of empty droplets as cells. We demonstrate that our approach can correct for these artifacts by accurately purging the majority of phantom molecules from the data.


Subjects
Algorithms , Artifacts , High-Throughput Nucleotide Sequencing/methods , Models, Statistical , RNA/analysis , Single-Cell Analysis/methods , Computer Simulation , High-Throughput Nucleotide Sequencing/standards , Humans , RNA/genetics , Reproducibility of Results , Single-Cell Analysis/standards
10.
J Neuropathol Exp Neurol ; 79(7): 763-766, 2020 07 01.
Article in English | MEDLINE | ID: mdl-32483596

ABSTRACT

The power and widespread use of next-generation sequencing (NGS) in surgical neuropathology has raised questions as to whether NGS might someday fully supplant histologic-based examination. We therefore sought to determine the feasibility of relying on NGS alone for diagnosing infiltrating gliomas. A total of 171 brain lesions in adults, all of which had been analyzed by GlioSeq NGS, comprised the study cohort. Each case was separately diagnosed by 6 reviewers, based solely on age, sex, tumor location, and NGS results. Results were compared with the final integrated diagnoses and scored on the following scale: 0 = either wrong tumor type or correct tumor type but off by 2+ grades; 1 = off by 1 grade; 2 = exactly correct. Histology alone was treated as a seventh reviewer. Overall reviewer accuracy ranged from 81.6% to 94.2%, while histology alone scored 87.1%. For glioblastomas, NGS was more accurate than histology alone (93.8%-97.9% vs 87.5%). The NGS accuracy for grade II and III astrocytoma and oligodendroglioma was only 54.3%-84.8% and 34.4%-87.5%, respectively. Most uncommon gliomas, including BRAF-driven tumors, could not be accurately classified just by NGS. These data indicate that, even in this era of advanced molecular diagnostics, histologic evaluation is still an essential part of optimal patient care.
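
The three-level scoring scale in this abstract can be written as a small function. The Python sketch below is only an illustration under stated assumptions: tumor grades are encoded as integers (for example 2, 3, 4), tumor types as plain strings, and the function name `score_diagnosis` is hypothetical. How the study turned these scores into accuracy percentages is not stated in the abstract and is not modeled here.

```python
def score_diagnosis(pred_type, pred_grade, final_type, final_grade):
    """Score a reviewer diagnosis against the final integrated diagnosis.

    0 = wrong tumor type, or correct type but off by 2 or more grades
    1 = correct tumor type, off by 1 grade
    2 = exactly correct
    """
    if pred_type != final_type:
        return 0
    grade_gap = abs(pred_grade - final_grade)
    if grade_gap == 0:
        return 2
    if grade_gap == 1:
        return 1
    return 0


# Hypothetical cases, with grades encoded as integers.
scores = [
    score_diagnosis("astrocytoma", 3, "astrocytoma", 3),        # exact match
    score_diagnosis("astrocytoma", 2, "astrocytoma", 3),        # off by 1 grade
    score_diagnosis("astrocytoma", 2, "astrocytoma", 4),        # off by 2 grades
    score_diagnosis("oligodendroglioma", 3, "astrocytoma", 3),  # wrong type
]
print(scores)
```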


Subjects
Brain Neoplasms/diagnosis , Brain Neoplasms/genetics , Glioma/diagnosis , Glioma/genetics , High-Throughput Nucleotide Sequencing/methods , Adolescent , Adult , Cohort Studies , Female , High-Throughput Nucleotide Sequencing/standards , Humans , Male , Young Adult
11.
Leukemia ; 34(11): 2934-2950, 2020 11.
Article in English | MEDLINE | ID: mdl-32404973

ABSTRACT

Drug combinations that target critical pathways are a mainstay of cancer care. To improve current approaches to combination treatment of chronic lymphocytic leukemia (CLL) and gain insights into the underlying biology, we studied the effect of 352 drug combination pairs in multiple concentrations by analysing ex vivo drug response of 52 primary CLL samples, which were characterized by "omics" profiling. Known synergistic interactions were confirmed for B-cell receptor (BCR) inhibitors with Bcl-2 inhibitors and with chemotherapeutic drugs, suggesting that this approach can identify clinically useful combinations. Moreover, we uncovered synergistic interactions between BCR inhibitors and afatinib, which we attribute to BCR activation by afatinib through BLK upstream of BTK and PI3K. Combinations of multiple inhibitors of BCR components (e.g., BTK, PI3K, SYK) had effects similar to the single agents. While PI3K and BTK inhibitors produced overall similar effects in combinations with other drugs, we uncovered a larger response heterogeneity of combinations including PI3K inhibitors, predominantly in CLL with mutated IGHV, which we attribute to the target's position within the BCR-signaling pathway. Taken together, our study shows that drug combination effects can be effectively queried in primary cancer cells, which could aid discovery, triage and clinical development of drug combinations.


Subjects
Antineoplastic Agents/pharmacology , Drug Evaluation, Preclinical , Drug Resistance, Neoplasm/genetics , Leukemia, Lymphocytic, Chronic, B-Cell/genetics , Antineoplastic Agents/administration & dosage , Antineoplastic Agents/adverse effects , Antineoplastic Combined Chemotherapy Protocols/adverse effects , Antineoplastic Combined Chemotherapy Protocols/therapeutic use , Biomarkers, Tumor , Cell Line, Tumor , Dose-Response Relationship, Drug , Drug Evaluation, Preclinical/methods , Drug Evaluation, Preclinical/standards , Drug Synergism , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/standards , Humans , Leukemia, Lymphocytic, Chronic, B-Cell/diagnosis , Leukemia, Lymphocytic, Chronic, B-Cell/drug therapy , Primary Cell Culture , Protein Kinase Inhibitors/pharmacology , Protein Kinase Inhibitors/therapeutic use , Proto-Oncogene Proteins c-bcl-2/antagonists & inhibitors , Proto-Oncogene Proteins c-bcl-2/genetics , Proto-Oncogene Proteins c-bcl-2/metabolism , Receptors, Antigen, B-Cell/antagonists & inhibitors , Receptors, Antigen, B-Cell/genetics , Receptors, Antigen, B-Cell/metabolism , Reproducibility of Results
12.
Anesth Analg ; 131(2): 450-463, 2020 08.
Article in English | MEDLINE | ID: mdl-32371742

ABSTRACT

Perioperative medicine is changing from a "protocol-based" approach to a progressively personalized care model. New molecular techniques and comprehensive perioperative medical records allow for detection of patient-specific phenotypes that may better explain, or even predict, a patient's response to perioperative stress and anesthetic care. Basic science technology has significantly evolved in recent years with the advent of powerful approaches that have translational relevance. It is incumbent on us as a primarily clinical specialty to have an in-depth understanding of rapidly evolving underlying basic science techniques to incorporate such approaches into our own research, critically interpret the literature, and improve future anesthesia patient care. This review focuses on 3 important and most likely practice-changing basic science techniques: next-generation sequencing (NGS), clustered regularly interspaced short palindromic repeat (CRISPR) modulations, and inducible pluripotent stem cells (iPSCs). Each technique will be described, potential advantages and limitations discussed, open questions and challenges addressed, and future developments outlined. We hope to provide insight for practicing physicians when confronted with basic science articles and encourage investigators to apply "state-of-the-art" technology to their future experiments.


Subjects
Anesthesiology/trends , Biomedical Research/trends , Practice Guidelines as Topic , Research Design/trends , Anesthesiology/standards , Biomedical Research/standards , Clustered Regularly Interspaced Short Palindromic Repeats , High-Throughput Nucleotide Sequencing/standards , High-Throughput Nucleotide Sequencing/trends , Humans , Induced Pluripotent Stem Cells/transplantation , Practice Guidelines as Topic/standards
13.
PLoS One ; 15(4): e0230594, 2020.
Article in English | MEDLINE | ID: mdl-32271772

ABSTRACT

Normalization is crucial in RNA-seq data analyses. Due to the existence of excessive zeros and a large number of small measures, it is challenging to find reliable linear rescaling normalization parameters. We propose a Zipf plot based normalization method (ZN) assuming that all gene profiles have similar upper tail behaviors in their expression distributions. The new normalization method uses global information of all genes in the same profile without gene-level expression alteration. It doesn't require the majority of genes to be not differentially expressed (DE), and can be applied to data where the majority of genes are weakly or not expressed. Two normalization schemes are implemented with ZN: a linear rescaling scheme and a non-linear transformation scheme. The linear rescaling scheme can be applied alone or together with the non-linear normalization scheme. The performance of ZN is benchmarked against five popular linear normalization methods for RNA-seq data. Results show that the linear rescaling normalization scheme by itself works well and is robust. The non-linear normalization scheme can further improve the normalization outcomes and is optional if the Zipf plots show parallel patterns.


Subjects
Computational Biology/standards , Gene Expression Profiling/standards , RNA-Seq/standards , Sequence Analysis, RNA/standards , Software Design , Cell Line , Computational Biology/methods , Databases, Genetic , Gene Expression Profiling/methods , Gene Expression Regulation , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/standards , Humans , RNA-Seq/methods , Reference Standards , Reference Values , Sequence Analysis, RNA/methods , Software , Transcriptome , Whole Exome Sequencing/methods , Whole Exome Sequencing/standards
14.
Virus Genes ; 56(3): 288-297, 2020 Jun.
Article in English | MEDLINE | ID: mdl-32193781

ABSTRACT

The capability of high-throughput sequencing (HTS) to detect known and unknown viruses in a timely manner makes it a powerful tool for public health emergency response. Third-generation sequencing (TGS) offers advantages in speed and length of detection over second-generation sequencing (SGS). Here, we present end-to-end workflows for both Oxford Nanopore MinION and PacBio Sequel on a viral disease emergency event, along with Ion Torrent PGM as a reference. A specific pipeline for comparative analysis of the viral genomes recovered by each platform was assembled, given the high base-calling error rates of TGS platforms. All three platforms successfully identified and recovered at least 85% Norovirus GII genomes. Oxford Nanopore MinION had the shortest sample-to-answer turnaround time, with relatively low but sufficient accuracy for taxonomy classification. PacBio Sequel recovered the most accurate viral genome but took the longest time. Overall, Nanopore metagenomics can rapidly characterize viruses, and PacBio Sequel can accurately recover viruses. This study provides a framework for designing appropriate experiments that are likely to lead to accurate and rapid virus emergency response.


Subjects
Emergencies , High-Throughput Nucleotide Sequencing , Public Health , Virus Diseases/epidemiology , Virus Diseases/virology , Viruses/classification , Viruses/genetics , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/standards , Humans , Phylogeny , Public Health Surveillance
15.
BMC Biol ; 18(1): 24, 2020 03 02.
Article in English | MEDLINE | ID: mdl-32122347

ABSTRACT

BACKGROUND: Contaminant DNA is a well-known confounding factor in molecular biology and in genomic repositories. Strikingly, analysis workflows for whole-genome sequencing (WGS) data commonly do not account for errors potentially introduced by contamination, which could lead to the wrong assessment of allele frequency both in basic and clinical research. RESULTS: We used a taxonomic filter to remove contaminant reads from more than 4000 bacterial samples from 20 different studies and performed a comprehensive evaluation of the extent and impact of contaminant DNA in WGS. We found that contamination is pervasive and can introduce large biases in variant analysis. We showed that these biases can result in hundreds of false positive and negative SNPs, even for samples with slight contamination. Studies investigating complex biological traits from sequencing data can be completely biased if contamination is neglected during the bioinformatic analysis, and we demonstrate that removing contaminant reads with a taxonomic classifier permits more accurate variant calling. We used both real and simulated data to evaluate and implement reliable, contamination-aware analysis pipelines. CONCLUSION: As sequencing technologies consolidate as precision tools that are increasingly adopted in the research and clinical context, our results call for the implementation of contamination-aware analysis pipelines. Taxonomic classifiers are a powerful tool to implement such pipelines.


Subjects
Bacteria/genetics , DNA Contamination , Genetic Variation , High-Throughput Nucleotide Sequencing/standards , Mycobacterium tuberculosis/genetics , Whole Genome Sequencing/standards , High-Throughput Nucleotide Sequencing/instrumentation , Polymorphism, Single Nucleotide
16.
PLoS One ; 15(3): e0229763, 2020.
Article in English | MEDLINE | ID: mdl-32155174

ABSTRACT

INTRODUCTION: Meta-analysis is a powerful means for leveraging the hundreds of experiments being run worldwide into more statistically powerful analyses. This is also true for the analysis of omic data, including genome-wide DNA methylation. In particular, thousands of DNA methylation profiles generated using the Illumina 450k are stored in the publicly accessible Gene Expression Omnibus (GEO) repository. Often, however, the intensity values produced by the BeadChip (raw data) are not deposited, therefore only pre-processed values (obtained after computational manipulation) are available. Pre-processing is possibly different among studies and may then affect meta-analysis by introducing non-biological sources of variability. MATERIAL AND METHODS: To systematically investigate the effect of pre-processing on meta-analysis, we analysed four different collections of DNA methylation samples (datasets), each composed of two subsets, for which raw data from controls (i.e. healthy subjects) and cases (i.e. patients) are available. We pre-processed the data from each dataset with nine of the most common pipelines found in the literature. Moreover, we evaluated the performance of regRCPqn, a modification of the RCP algorithm that aims to improve data consistency. For each combination of pre-processing (9 × 9), we first evaluated the between-sample variability among control subjects and, then, we identified genomic positions that are differentially methylated between cases and controls (differential analysis). RESULTS AND CONCLUSION: The pre-processing of DNA methylation data affects both the between-sample variability and the loci identified as differentially methylated, and the effects of pre-processing are strongly dataset-dependent. By contrast, application of our renormalization algorithm regRCPqn: (i) reduces variability and (ii) increases agreement between meta-analysed datasets, both critical components of data harmonization.


Subjects
DNA Methylation , High-Throughput Nucleotide Sequencing/standards , Meta-Analysis as Topic , Sequence Analysis, DNA/standards , Animals , High-Throughput Nucleotide Sequencing/methods , Humans , Sequence Analysis, DNA/methods , Software/standards
17.
PLoS Comput Biol ; 16(3): e1007531, 2020 03.
Article in English | MEDLINE | ID: mdl-32214318

ABSTRACT

Life scientists are increasingly turning to high-throughput sequencing technologies in their research programs, owing to the enormous potential of these methods. In parallel, the number of core facilities that provide bioinformatics support is also increasing. Notably, the generation of complex large datasets has necessitated the development of bioinformatics support core facilities that aid laboratory scientists with cost-effective and efficient data management, analysis, and interpretation. In this article, we address the challenges (related to communication, good laboratory practice, and data handling) that may be encountered in core support facilities when providing bioinformatics support, drawing on our own experiences working as support bioinformaticians on multidisciplinary research projects. Most importantly, the article proposes a list of guidelines that outline how these challenges can be preemptively avoided and effectively managed to increase the value of outputs to the end user, covering the entire research project lifecycle, including experimental design, data analysis, and management (i.e., sharing and storage). In addition, we highlight the importance of clear and transparent communication, comprehensive preparation, appropriate handling of samples and data using monitoring systems, and the employment of appropriate tools and standard operating procedures to provide effective bioinformatics support.


Subjects
Computational Biology/economics , Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , Biomedical Research/economics , Biomedical Research/methods , Communication , Computational Biology/standards , High-Throughput Nucleotide Sequencing/economics , High-Throughput Nucleotide Sequencing/standards , Humans , Research Design/standards
18.
PLoS One ; 15(3): e0230301, 2020.
Article in English | MEDLINE | ID: mdl-32176719

ABSTRACT

The MGISEQ-2000 developed by MGI Tech Co. Ltd. (a subsidiary of the BGI Group) is a new competitor of such next-generation sequencing platforms as NovaSeq and HiSeq (Illumina). Its sequencing principle is based on the DNB and the cPAS technologies, which were also used in the previous version of the BGISEQ-500 device. However, the reagents for MGISEQ-2000 have been refined and the platform utilizes updated software. The cPAS method is an advanced technology based on the cPAL previously created by Complete Genomics. In this paper, we compare the results of the whole-genome sequencing of a DNA sample from a Russian female donor performed on MGISEQ-2000 and Illumina HiSeq 2500 (both PE150). The two platforms were compared in terms of sequencing quality, number of errors and performance. Additionally, we performed variant calling using four different software packages: Samtools mpileup, Strelka2, Sentieon, and GATK. The accuracy of SNP detection was similar in the data generated by MGISEQ-2000 and HiSeq 2500, which was used as a reference. At the same time, a separate indel analysis of the overall error rate revealed similar FPR values and lower sensitivity. It may be concluded with confidence that the data generated by the analyzed sequencing systems is characterized by comparable magnitudes of error and that MGISEQ-2000 and HiSeq 2500 can be used interchangeably for similar tasks like whole genome sequencing.


Subjects
High-Throughput Nucleotide Sequencing , Whole Genome Sequencing , Databases, Genetic , Genome, Human , High-Throughput Nucleotide Sequencing/standards , Humans , Polymorphism, Single Nucleotide/genetics , Quality Control , Whole Genome Sequencing/standards
19.
Zhonghua Yi Xue Yi Chuan Xue Za Zhi ; 37(3): 334-338, 2020 Mar 10.
Article in Chinese | MEDLINE | ID: mdl-32128754

ABSTRACT

Pre-testing preparation is the basis and starting point of genetic testing. The process includes collection of clinical information, formulation of a testing scheme, genetic counseling before testing, and completion of informed consent and testing authorization. Effectively identifying genetic diseases in the clinic can greatly improve the diagnostic rate of next generation sequencing (NGS), thereby reducing medical cost and improving clinical efficacy. The analysis of NGS results relies, to a large extent, on the understanding of genotype-phenotype correlations, so it is particularly important to collect and evaluate clinical phenotypes and describe them in uniform standard terms. Different types of genetic diseases or mutations may require specific testing techniques, and choosing the appropriate one makes testing considerably more efficient. Pre-testing genetic counseling can help patients and their families to understand the significance of relevant genetic testing, formulate individualized testing strategies, and lay a foundation for follow-up.


Subjects
Genetic Diseases, Inborn/diagnosis , Genetic Testing/standards , High-Throughput Nucleotide Sequencing/standards , Consensus , Genetic Association Studies , Genetic Counseling , Humans , Mutation
20.
Zhonghua Yi Xue Yi Chuan Xue Za Zhi ; 37(3): 339-344, 2020 Mar 10.
Article in Chinese | MEDLINE | ID: mdl-32128755

ABSTRACT

With high accuracy and precision, next generation sequencing (NGS) has provided a powerful tool for clinical testing of genetic diseases. Following a standardized experimental procedure is a prerequisite for obtaining stable, reliable, and effective NGS data to assist the diagnosis and/or screening of genetic diseases. At a genetic testing industry conference held in Shanghai in May 2019, physicians engaged in the diagnosis and treatment of genetic diseases, experts engaged in clinical laboratory testing of genetic diseases, and experts from third-party genetic testing companies discussed in depth the standardization of NGS procedures for the testing of genetic diseases. Experts from different backgrounds provided opinions on the operation and implementation of NGS testing procedures, including sample collection, reception, preservation, library construction, sequencing, and data quality control. Based on the discussion, a consensus on the standardization of testing procedures in NGS laboratories was developed, with the aim of standardizing NGS testing and accelerating the implementation of NGS in clinical settings across China.


Subjects
Genetic Diseases, Inborn/diagnosis , Genetic Testing/standards , High-Throughput Nucleotide Sequencing/standards , China , Consensus , Humans