Results 1 - 20 of 637
1.
PLoS One ; 15(7): e0235861, 2020.
Article in English | MEDLINE | ID: mdl-32706774

ABSTRACT

BACKGROUND: To support the rising need for testing and to standardize tumor DNA sequencing practices within the U.S. Department of Veterans Affairs (VA)'s Veterans Health Administration (VHA), the National Precision Oncology Program (NPOP) was launched in 2016. We sought to assess oncologists' practices, concerns, and perceptions regarding Next-Generation Sequencing (NGS) and the NPOP. MATERIALS AND METHODS: Using a purposive total sampling approach, oncologists who had previously ordered NGS for at least one tumor sample through the NPOP were invited to participate in semi-structured interviews. Questions assessed the following: expectations for the NPOP, procedural requirements, applicability of testing results, and the summative utility of the NPOP. Interviews were assessed using an open coding approach. Thematic analysis was conducted to evaluate the completed codebook. Themes were defined deductively by reviewing the direct responses to interview questions as well as inductively by identifying emerging patterns of data. RESULTS: Of the 105 medical oncologists who were invited to participate, 20 (19%) were interviewed from 19 different VA medical centers in 14 states. Five recurrent themes were observed: (1) Educational Efforts Regarding Tumor DNA Sequencing Should be Undertaken, (2) Pathology Departments Share a Critical Role in Facilitating Test Completion, (3) Tumor DNA Sequencing via NGS Serves as the Most Comprehensive Testing Modality within Precision Oncology, (4) The Availability of the NPOP Has Expanded Options for Select Patients, and (5) The Completion of Tumor DNA Sequencing through the NPOP Could Help Improve Research Efforts within VHA Oncology Practices. CONCLUSION: Medical oncologists believe that the availability of tumor DNA sequencing through the NPOP could potentially lead to an improvement in outcomes for veterans with metastatic solid tumors. 
Efforts should be directed toward improving oncologists' understanding of sequencing, strengthening collaborative relationships between oncologists and pathologists, and assessing the role of comprehensive NGS panels within the battery of precision tests.


Subjects
Health Knowledge, Attitudes, Practice , High-Throughput Nucleotide Sequencing/standards , Neoplasms/genetics , Oncologists/psychology , Sequence Analysis, DNA/standards , United States Department of Veterans Affairs , Adult , Early Detection of Cancer/standards , Female , Genetic Testing/standards , Humans , Male , Middle Aged , Neoplasms/diagnosis , Precision Medicine/standards , State Health Plans , Surveys and Questionnaires , United States
2.
Proc Natl Acad Sci U S A ; 117(29): 16961-16968, 2020 07 21.
Article in English | MEDLINE | ID: mdl-32641514

ABSTRACT

Alignment-free classification tools have enabled high-throughput processing of sequencing data in many bioinformatics analysis pipelines primarily due to their computational efficiency. Originally k-mer based, such tools often lack sensitivity when faced with sequencing errors and polymorphisms. In response, some tools have been augmented with spaced seeds, which are capable of tolerating mismatches. However, spaced seeds have seen little practical use in classification because they bring increased computational and memory costs compared to methods that use k-mers. These limitations have also caused the design and length of practical spaced seeds to be constrained, since storing spaced seeds can be costly. To address these challenges, we have designed a probabilistic data structure called a multiindex Bloom Filter (miBF), which can store multiple spaced seed sequences with a low memory cost that remains static regardless of seed length or seed design. We formalize how to minimize the false-positive rate of miBFs when classifying sequences from multiple targets or references. Available within BioBloom Tools, we illustrate the utility of miBF in two use cases: read-binning for targeted assembly, and taxonomic read assignment. In our benchmarks, an analysis pipeline based on miBF shows higher sensitivity and specificity for read-binning than sequence alignment-based methods, also executing in less time. Similarly, for taxonomic classification, miBF enables higher sensitivity than a conventional spaced seed-based approach, while using half the memory and an order of magnitude less computational time.
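
To make the mismatch tolerance of spaced seeds concrete, here is a minimal sketch. It is not the miBF data structure or the BioBloom Tools API: a plain Python set stands in for the Bloom filter, and the seed pattern, sequences, and function names are hypothetical choices for illustration only.

```python
def apply_seed(kmer, seed):
    """Keep bases at '1' positions of the seed; mask '0' positions with '*'."""
    if len(kmer) != len(seed):
        raise ValueError("k-mer and seed must have the same length")
    return "".join(base if flag == "1" else "*" for base, flag in zip(kmer, seed))


def seed_index(reference, seed):
    """Collect the masked word of every seed-length window of the reference."""
    k = len(seed)
    return {apply_seed(reference[i:i + k], seed)
            for i in range(len(reference) - k + 1)}


def count_hits(read, index, seed):
    """Count the windows of the read whose masked word occurs in the index."""
    k = len(seed)
    return sum(apply_seed(read[i:i + k], seed) in index
               for i in range(len(read) - k + 1))


# Hypothetical toy data: 1 = position must match, 0 = wildcard position.
reference = "ACGTACGTTAGC"
seed = "1101011"
index = seed_index(reference, seed)

# The read differs from reference[0:7] ("ACGTACG") at position 2,
# which falls on a wildcard position of the seed.
read = "ACTTACG"
print("exact 7-mer present:", read in reference)
print("spaced-seed hits:", count_hits(read, index, seed))
```

An exact 7-mer lookup misses this read, while the spaced seed still reports a hit because the mismatch lies on a masked position.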


Subjects
Sequence Analysis, DNA/methods , Software , Animals , Base Pair Mismatch , Humans , Phylogeny , Sequence Alignment , Sequence Analysis, DNA/standards
3.
Cell Syst ; 11(2): 131-144.e6, 2020 08 26.
Article in English | MEDLINE | ID: mdl-32721383

ABSTRACT

We present a combinatorial machine learning method to evaluate and optimize peptide vaccine formulations for SARS-CoV-2. Our approach optimizes the presentation likelihood of a diverse set of vaccine peptides conditioned on a target human-population HLA haplotype distribution and expected epitope drift. Our proposed SARS-CoV-2 MHC class I vaccine formulations provide 93.21% predicted population coverage with at least five vaccine peptide-HLA average hits per person (≥ 1 peptide: 99.91%) with all vaccine peptides perfectly conserved across 4,690 geographically sampled SARS-CoV-2 genomes. Our proposed MHC class II vaccine formulations provide 97.21% predicted coverage with at least five vaccine peptide-HLA average hits per person with all peptides having an observed mutation probability of ≤ 0.001. We provide an open-source implementation of our design methods (OptiVax), vaccine evaluation tool (EvalVax), as well as the data used in our design efforts here: https://github.com/gifford-lab/optivax.


Subjects
Betacoronavirus/immunology , Haplotypes , Histocompatibility Antigens Class II/genetics , Histocompatibility Antigens Class I/genetics , Sequence Analysis, DNA/methods , Vaccines, Subunit/immunology , Viral Vaccines/immunology , Betacoronavirus/genetics , Coronavirus Infections/genetics , Coronavirus Infections/immunology , Coronavirus Infections/prevention & control , Epitopes/chemistry , Epitopes/genetics , Epitopes/immunology , Histocompatibility Antigens Class I/chemistry , Histocompatibility Antigens Class I/immunology , Histocompatibility Antigens Class II/chemistry , Histocompatibility Antigens Class II/immunology , Humans , Machine Learning , Sequence Analysis, DNA/standards , Vaccines, Subunit/chemistry , Vaccines, Subunit/genetics , Viral Vaccines/chemistry , Viral Vaccines/genetics
4.
J Med Microbiol ; 69(5): 712-720, 2020 May.
Article in English | MEDLINE | ID: mdl-32368996

ABSTRACT

Introduction. Given the limited number of candidaemia studies in Iran, the profile of yeast species causing bloodstream infections (BSIs), especially in adults, remains poorly defined. Although biochemical assays are widely used in developing countries, they produce erroneous results, especially for rare yeast species. Aim. We aimed to assess the profile of yeast species causing BSIs and to compare the accuracy of the Vitek 2 system and 21-plex PCR. Methodology. Yeast blood isolates were retrospectively collected from patients recruited from two tertiary care training hospitals in Tehran from 2015 to 2017. Relevant clinical data were mined. Identification was performed by automated Vitek 2, 21-plex PCR and sequencing of the internal transcribed spacer region (ITS1-5.8S-ITS2). Results. In total, 137 yeast isolates were recovered from 107 patients. The overall all-cause 30-day mortality rate was 47.7 %. Fluconazole was the most widely used systemic antifungal. Candida albicans (58/137, 42.3 %), Candida glabrata (30/137, 21.9 %), Candida parapsilosis sensu stricto (23/137, 16.8 %), Candida tropicalis (10/137, 7.3 %) and Pichia kudriavzevii (Candida krusei) (4/137, 2.9 %) constituted almost 90 % of the isolates and 10 % of the species detected were rare yeast species (12/137; 8.7 %). The 21-plex PCR method correctly identified 97.1 % of the isolates, a higher percentage than the Vitek 2 showed (87.6 %). Conclusion. C. albicans was the main cause of yeast-derived fungaemia in this study. Future prospective studies are warranted to closely monitor the epidemiological landscape of yeast species causing BSIs in Iran. The superiority of 21-plex PCR over automated Vitek 2 indicates its potential clinical utility as an alternative identification tool for use in developing countries.


Subjects
Fungemia/diagnosis , Fungemia/epidemiology , Fungemia/microbiology , Multiplex Polymerase Chain Reaction , Sequence Analysis, DNA , Yeasts/classification , Yeasts/genetics , Aged , Aged, 80 and over , DNA, Intergenic , Female , Fungemia/history , History, 21st Century , Humans , Iran/epidemiology , Male , Middle Aged , Multiplex Polymerase Chain Reaction/methods , Multiplex Polymerase Chain Reaction/standards , Sequence Analysis, DNA/methods , Sequence Analysis, DNA/standards
5.
PLoS One ; 15(3): e0229763, 2020.
Article in English | MEDLINE | ID: mdl-32155174

ABSTRACT

INTRODUCTION: Meta-analysis is a powerful means for leveraging the hundreds of experiments being run worldwide into more statistically powerful analyses. This is also true for the analysis of omic data, including genome-wide DNA methylation. In particular, thousands of DNA methylation profiles generated using the Illumina 450k are stored in the publicly accessible Gene Expression Omnibus (GEO) repository. Often, however, the intensity values produced by the BeadChip (raw data) are not deposited; therefore, only pre-processed values (obtained after computational manipulation) are available. Pre-processing may differ among studies and may therefore affect meta-analysis by introducing non-biological sources of variability. MATERIAL AND METHODS: To systematically investigate the effect of pre-processing on meta-analysis, we analysed four different collections of DNA methylation samples (datasets), each composed of two subsets, for which raw data from controls (i.e. healthy subjects) and cases (i.e. patients) are available. We pre-processed the data from each dataset with nine of the most common pipelines found in the literature. Moreover, we evaluated the performance of regRCPqn, a modification of the RCP algorithm that aims to improve data consistency. For each combination of pre-processing (9 × 9), we first evaluated the between-sample variability among control subjects and, then, we identified genomic positions that are differentially methylated between cases and controls (differential analysis). RESULTS AND CONCLUSION: The pre-processing of DNA methylation data affects both the between-sample variability and the loci identified as differentially methylated, and the effects of pre-processing are strongly dataset-dependent. By contrast, application of our renormalization algorithm regRCPqn: (i) reduces variability and (ii) increases agreement between meta-analysed datasets, both critical components of data harmonization.


Subjects
DNA Methylation , High-Throughput Nucleotide Sequencing/standards , Meta-Analysis as Topic , Sequence Analysis, DNA/standards , Animals , High-Throughput Nucleotide Sequencing/methods , Humans , Sequence Analysis, DNA/methods , Software/standards
6.
PLoS One ; 15(2): e0228899, 2020.
Article in English | MEDLINE | ID: mdl-32053657

ABSTRACT

Microorganisms are ubiquitous in the biosphere, playing a crucial role in both biogeochemistry of the planet and human health. However, identifying these microorganisms and defining their function are challenging. Widely used approaches in comparative metagenomics, 16S amplicon sequencing and whole genome shotgun sequencing (WGS), have provided access to DNA sequencing analysis to identify microorganisms and evaluate diversity and abundance in various environments. However, advances in parallel high-throughput DNA sequencing in the past decade have introduced major hurdles, namely standardization of methods, data storage, reproducible interoperability of results, and data sharing. The National Ecological Observatory Network (NEON), established by the National Science Foundation, enables all researchers to address queries on a regional to continental scale around a variety of environmental challenges and provides high-quality, integrated, and standardized data from field sites across the U.S. As the amount of metagenomic data continues to grow, standardized procedures that allow results across projects to be assessed and compared are becoming increasingly important in the field of metagenomics. We demonstrate the feasibility of using publicly available NEON soil metagenomic sequencing datasets in combination with the open access Metagenomics Rapid Annotation using the Subsystem Technology (MG-RAST) server to illustrate advantages of WGS compared to 16S amplicon sequencing. Four WGS and four 16S amplicon sequence datasets, from surface soil samples prepared by NEON investigators using standardized protocols and collected at the same locations in Colorado between April and July 2014, were selected for comparison. The dominant bacterial phyla detected across samples agreed between sequencing methodologies. 
However, WGS yielded greater microbial resolution, increased accuracy, and allowed identification of more genera of bacteria, archaea, viruses, and eukaryota, and putative functional genes that would have gone undetected using 16S amplicon sequencing. NEON open data will be useful for future studies characterizing and quantifying complex ecological processes associated with changing aquatic and terrestrial ecosystems.


Subjects
High-Throughput Nucleotide Sequencing/standards , Metagenomics/methods , Sequence Analysis, DNA/methods , Archaea/genetics , Bacteria/genetics , DNA, Bacterial/genetics , Databases, Genetic/standards , High-Throughput Nucleotide Sequencing/methods , Metagenome , RNA, Ribosomal, 16S/genetics , Sequence Analysis, DNA/standards , Sequence Analysis, RNA/methods , Sequence Analysis, RNA/standards , Soil , Whole Genome Sequencing/methods , Whole Genome Sequencing/standards
7.
PLoS One ; 15(1): e0227275, 2020.
Article in English | MEDLINE | ID: mdl-31923209

ABSTRACT

The aim of this work was to determine the cagA gene EPIYA motifs currently present in Colombian Helicobacter pylori isolates using a fast and reliable molecular test. DNA from eighty-five cagA-positive Helicobacter pylori strains was analyzed. Strains were obtained from patients diagnosed with functional dyspepsia at Clínica Fundadores in Bogotá. The 3' region of the cagA gene was amplified through conventional polymerase chain reaction (PCR). The amplicons obtained were sequenced using the Sanger method and analyzed with bioinformatics tools. Additionally, a significant Spearman correlation was found between the patients' age and the number of EPIYA-C repeats, with p values < 0.05 considered significant. Estimates were obtained using a 95% CI. The 3' variable region of the cagA gene was amplified, and PCR products of the following sizes corresponded to the following EPIYA motifs: 400 bp: EPIYA AB, 500 bp: EPIYA ABC, 600 bp: EPIYA ABCC and 700 bp: EPIYA ABCCC. A single PCR band was observed for 58 out of 85 Helicobacter pylori isolates, with an EPIYA motif distribution as follows: 7/85 AB (8.2%), 34/85 ABC (40%), 26/85 ABCC (30.6%) and 18/85 ABCCC (21.2%). However, in 27 out of 85 Helicobacter pylori isolates, two or more bands were observed, where the predominant cagA genotypes were ABC-ABCC (26%, 7/27) and ABCC-ABCCC (22.2%, 6/27). A direct proportionality between the number of EPIYA-C repeats and an increase in the patients' age was observed, with a greater number of EPIYA ABCC and ABCCC repeats found in the population over 50 years old. All isolates were of the Western cagA type and 51.8% of them were found to have multiple EPIYA-C repeats. This standardized molecular test made it possible to identify the number of EPIYA-C motifs based on band size.
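
The band-size-to-motif correspondence reported in this abstract can be sketched as a simple lookup. The band sizes and motifs come from the abstract; the matching tolerance and the function names are assumptions added for illustration and are not part of the published protocol.

```python
# Expected PCR product sizes (bp) and EPIYA motifs, as listed in the abstract.
BAND_TO_MOTIF = {400: "AB", 500: "ABC", 600: "ABCC", 700: "ABCCC"}


def motif_for_band(size_bp, tolerance_bp=25):
    """Return the motif with the nearest expected band size, or None.

    tolerance_bp is a hypothetical allowance for gel-reading error.
    """
    nearest = min(BAND_TO_MOTIF, key=lambda expected: abs(expected - size_bp))
    if abs(nearest - size_bp) > tolerance_bp:
        return None
    return BAND_TO_MOTIF[nearest]


def epiya_c_repeats(motif):
    """Count the EPIYA-C segments in a motif string such as 'ABCC'."""
    return motif.count("C")


# An isolate showing two bands would map to a mixed genotype such as ABC-ABCC.
observed_bands = [505, 590]
motifs = [motif_for_band(band) for band in observed_bands]
print("-".join(motifs), [epiya_c_repeats(m) for m in motifs])
```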


Subjects
Amino Acid Motifs/genetics , Antigens, Bacterial/genetics , Bacterial Proteins/genetics , Diagnostic Tests, Routine/standards , Genes, Bacterial/genetics , Helicobacter Infections/diagnosis , Helicobacter Infections/epidemiology , Helicobacter pylori/genetics , Adult , Aged , Colombia/epidemiology , DNA, Bacterial/genetics , Dyspepsia/microbiology , Female , Genotype , Helicobacter Infections/microbiology , Humans , Male , Middle Aged , Polymerase Chain Reaction/standards , Repetitive Sequences, Nucleic Acid/genetics , Sequence Analysis, DNA/standards
8.
Immunogenetics ; 71(10): 647-663, 2019 11.
Article in English | MEDLINE | ID: mdl-31761978

ABSTRACT

The classical class I and class II molecules of the major histocompatibility complex (MHC) play crucial roles in immune responses to infectious pathogens and vaccines as well as being important for autoimmunity, allergy, cancer and reproduction. These classical MHC genes are the most polymorphic known, with roughly 10,000 alleles in humans. In chickens, the MHC (also known as the BF-BL region) determines decisive resistance and susceptibility to infectious pathogens, but relatively few MHC alleles and haplotypes have been described in any detail. We describe a typing protocol for classical chicken class I (BF) and class II B (BLB) genes based on a hybridization method called reference strand-mediated conformational analysis (RSCA). We optimize the various steps, validate the analysis using well-characterized chicken MHC haplotypes, apply the system to type some experimental lines and discover a new chicken class I allele. This work establishes a basis for typing the MHC genes of chickens worldwide and provides an opportunity to correlate with microsatellite and with single nucleotide polymorphism (SNP) typing for approaches involving imputation.


Subjects
Genes, MHC Class II/genetics , Genes, MHC Class I/genetics , Nucleic Acid Hybridization/methods , Polymorphism, Genetic , Sequence Analysis, DNA/standards , Animals , Chickens , Polymorphism, Single-Stranded Conformational , Reference Standards , Sequence Analysis, DNA/methods
9.
Virol J ; 16(1): 140, 2019 11 21.
Article in English | MEDLINE | ID: mdl-31752912

ABSTRACT

BACKGROUND: Next generation sequencing (NGS) is becoming widely used among diagnostic and research laboratories, and nowadays it is applied to a variety of disciplines, including veterinary virology. The NGS workflow comprises several steps, namely sample processing, library preparation, sequencing and primary/secondary/tertiary bioinformatics (BI) analyses. The latter is a complex process that is extremely difficult to standardize, due to the variety of tools and metrics available. Thus, it is of the utmost importance to assess the comparability of results obtained through different methods and in different laboratories. To achieve this goal, we have organized a proficiency test focused on the bioinformatics components for the generation of complete genome sequences of salmonid rhabdoviruses. METHODS: Three partners that performed virus sequencing using different commercial library preparation kits and NGS platforms shared with each other 75 raw datasets, which were analyzed separately by the participants to produce a consensus sequence according to their own bioinformatics pipeline. Results were then compared to highlight discrepancies, and a subset of inconsistencies was investigated in more detail. RESULTS: In total, we observed 526 discrepancies, of which 39.5% were located at genome termini, 14.1% at intergenic regions and 46.4% at coding regions. Among these, 10 SNPs and 99 indels caused changes in the protein products. Overall reproducibility was 99.94%. Based on the analysis of a subset of inconsistencies investigated in more depth, manual curation appeared to be the most critical step affecting sequence comparability, suggesting that the harmonization of this phase is crucial to obtain comparable results. The analysis of a calibrator sample allowed BI accuracy to be assessed at 99.983%. 
CONCLUSIONS: We demonstrated the applicability and the usefulness of BI proficiency testing to assure the quality of NGS data, and recommend a wider implementation of such exercises to guarantee sequence data uniformity among different virology laboratories.


Subjects
Computational Biology/methods , Computational Biology/standards , High-Throughput Nucleotide Sequencing/standards , Infectious hematopoietic necrosis virus/genetics , Novirhabdovirus/genetics , Sequence Analysis, DNA/standards , Animals , Fishes , High-Throughput Nucleotide Sequencing/methods , Quality Control , Reproducibility of Results , Sequence Analysis, DNA/methods
10.
Adv Exp Med Biol ; 1168: 103-115, 2019.
Article in English | MEDLINE | ID: mdl-31713167

ABSTRACT

The past two decades have seen unprecedented advances in the field of oncogenomics. The ongoing characterization of neoplastic tissues through genomic techniques has transformed many aspects of cancer research, diagnosis, and treatment. However, identifying sequence variants with biological and clinical significance is a challenging endeavor. In order to accomplish this task, variants must be annotated and interpreted using various online resources. Data on protein structure, functional prediction, variant frequency in relevant populations, and multiple other factors have been compiled in useful databases for this purpose. Thus, understanding the available online resources for the annotation and interpretation of sequence variants is critical to aid molecular pathologists and researchers working in this space.


Subjects
Databases, Genetic , Genetic Privacy , Neoplasms , Pharmacogenetics , Genetic Privacy/trends , Genetic Variation , Health Resources , Humans , Internet , Neoplasms/physiopathology , Neoplasms/therapy , Sequence Analysis, DNA/standards , Sequence Analysis, DNA/trends
11.
Pediatrics ; 144(6)2019 12.
Article in English | MEDLINE | ID: mdl-31719124

ABSTRACT

The BabySeq Project is a study funded by the National Institutes of Health and aimed at exploring the medical, behavioral, and economic impacts of integrating genomic sequencing into the care of both healthy newborns and newborns who are sick. Infants were randomly assigned to receive standard of care or standard of care plus sequencing. The protocol and consent specified that only childhood-onset conditions would be returned. When 1 child was found to carry a BRCA2 mutation despite a negative family history, the research team experienced moral distress about nondisclosure and sought institutional review board permission to disclose. The protocol was then modified to require participants to agree to receive results for adult-onset-only conditions as a precondition to study enrollment. The BabySeq team asserted that their new protocol was in the child's best interest because having one's parents alive and well provides both an individual child benefit and a "family benefit." We begin with a short description of BabySeq and the controversy regarding predictive genetic testing of children for adult-onset conditions. We then examine the ethical problems with (1) the revised BabySeq protocol and (2) the concept of family benefit as a justification for the return of adult-onset-only conditions. We reject family benefit as a moral reason to expand genomic sequencing of children beyond conditions that present in childhood. We also argue that researchers should design their pediatric studies to avoid, when possible, identifying adult-onset-only genetic variants and that parents should not be offered the return of this information if discovered unless relevant for the child's current or imminent health.


Subjects
Genetic Testing/ethics , Neonatal Screening/ethics , Neonatal Screening/psychology , Parents/psychology , Whole Exome Sequencing/ethics , Genetic Testing/standards , Humans , Infant, Newborn , Neonatal Screening/standards , Sequence Analysis, DNA/ethics , Sequence Analysis, DNA/standards , Whole Exome Sequencing/standards
12.
BMC Bioinformatics ; 20(1): 474, 2019 Sep 14.
Article in English | MEDLINE | ID: mdl-31521109

ABSTRACT

BACKGROUND: In most mammals, a vast array of genes coding for chemosensory receptors mediates olfaction. Odorant receptor (OR) genes generally constitute the largest multifamily (> 1100 intact members in the mouse). From the whole pool, each olfactory neuron expresses a single OR allele following poorly characterized mechanisms termed OR gene choice. OR genes are found in genomic aggregations known as clusters. Nearby enhancers, named elements, are crucial regulators of OR gene choice. Despite their importance, searching for new elements is burdensome. Other chemosensory receptor genes responsible for smell adhere to expression modalities resembling OR gene choice, and are arranged in genomic clusters - often with chromosomal linkage to OR genes. Still, no elements are known for them. RESULTS: Here we present an inexpensive framework aimed at predicting elements. We redefine cluster identity by focusing on multiple receptor gene families at once, and exemplify thirty - not necessarily OR-exclusive - novel candidate enhancers. CONCLUSIONS: The pipeline we introduce could guide future in vivo work aimed at discovering/validating new elements. In addition, our study provides an updated and comprehensive classification of all genomic loci responsible for the transduction of olfactory signals in mammals.


Subjects
Algorithms , Enhancer Elements, Genetic , Genomics/methods , Receptors, Odorant/genetics , Sequence Analysis, DNA/standards , Animals , Humans , Mice , Rats
13.
DNA Res ; 26(5): 391-398, 2019 Oct 01.
Article in English | MEDLINE | ID: mdl-31364694

ABSTRACT

In bacterial genome and metagenome sequencing, Illumina sequencers are most frequently used due to their high throughput capacity, and multiple library preparation kits have been developed for Illumina platforms. Here, we systematically analysed and compared the sequencing bias generated by currently available library preparation kits for Illumina sequencing. Our analyses revealed that a strong sequencing bias is introduced in low-GC regions by the Nextera XT kit. The level of bias introduced is dependent on the level of GC content; stronger bias is generated as the GC content decreases. Other analysed kits did not introduce this strong sequencing bias. The GC content-associated sequencing bias introduced by Nextera XT was more remarkable in metagenome sequencing of a mock bacterial community and seriously affected estimation of the relative abundance of low-GC species. The results of our analyses highlight the importance of selecting proper library preparation kits according to the purposes and targets of sequencing, particularly in metagenome sequencing, where a wide range of microbial species with various degrees of GC content is present. Our data also indicate that special attention should be paid to which library preparation kit was used when analysing and interpreting publicly available metagenomic data.
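
Because the bias described in this abstract depends on local GC content, a short sketch of how GC content can be computed per window may help. This is a generic calculation, not the authors' analysis pipeline; the window size, step, and toy sequence are arbitrary assumptions.

```python
def gc_content(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / counted


def windowed_gc(seq, window, step):
    """GC content of each full window, as (start position, fraction) pairs."""
    return [(start, gc_content(seq[start:start + window]))
            for start in range(0, len(seq) - window + 1, step)]


# Hypothetical toy sequence with an AT-rich half and a GC-rich half.
toy = "ATATATATTAGCGCGCGGCC"
for start, gc in windowed_gc(toy, window=10, step=10):
    print(start, gc)
```

Comparing read depth against such per-window values is one common way to look for GC-associated coverage bias.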


Subjects
Bacteria/genetics , Genome, Bacterial , High-Throughput Nucleotide Sequencing/standards , Metagenome , Sequence Analysis, DNA/standards , Base Composition , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods
14.
Genes (Basel) ; 10(9)2019 08 28.
Article in English | MEDLINE | ID: mdl-31466333

ABSTRACT

Genotype imputation, where missing genotypes can be computationally imputed, is an essential tool in genomic analysis ranging from genome-wide associations to phenotype prediction. Traditional genotype imputation methods are typically based on haplotype-clustering algorithms, hidden Markov models (HMMs), and statistical inference. Deep learning-based methods have recently been reported to suitably address missing data problems in various fields. To explore the performance of deep learning for genotype imputation, in this study, we propose a deep model called a sparse convolutional denoising autoencoder (SCDA) to impute missing genotypes. We constructed the SCDA model using a convolutional layer that can extract various correlation or linkage patterns in the genotype data, and applied a sparse weight matrix resulting from L1 regularization to handle high-dimensional data. We comprehensively evaluated the performance of the SCDA model in different scenarios for genotype imputation on yeast and human genotype data. Our results showed that SCDA has strong robustness and significantly outperforms popular reference-free imputation methods. This study thus points to another novel application of deep learning models for missing data imputation in genomic studies.


Subjects
Genotyping Techniques/methods , Sequence Analysis, DNA/methods , Software , Genome, Fungal , Genome, Human , Genotype , Genotyping Techniques/standards , Humans , Saccharomyces cerevisiae , Sequence Analysis, DNA/standards , Signal-To-Noise Ratio
15.
Genes (Basel) ; 10(8)2019 07 25.
Article in English | MEDLINE | ID: mdl-31349684

ABSTRACT

Current high-throughput sequencing technologies can generate sequence data and provide information on the genetic composition of samples at very high coverage. Deep sequencing approaches enable the detection of rare variants in heterogeneous samples, such as viral quasi-species, but also have the undesired effect of amplifying sequencing errors and artefacts. Distinguishing real variants from such noise is not straightforward. Variant callers that can handle pooled samples can struggle at extremely high read depths, while at lower depths sensitivity is often sacrificed to specificity. In this paper, we propose SiNPle (Simplified Inference of Novel Polymorphisms from Large coveragE), fast and effective software for variant calling. SiNPle is based on a simplified Bayesian approach to compute the posterior probability that a variant is not generated by sequencing errors or PCR artefacts. The Bayesian model takes into consideration individual base qualities as well as their distribution, the baseline error rates during both the sequencing and the PCR stage, the prior distribution of variant frequencies and their strandedness. Our approach leads to an approximate but extremely fast computation of posterior probabilities even for very high coverage data, since the expression for the posterior distribution is a simple analytical formula in terms of summary statistics for the variants appearing at each site in the genome. These statistics can be used to filter out putative SNPs and indels according to the required level of sensitivity. We tested SiNPle on several simulated and real-life viral datasets to show that it is faster and more sensitive than existing methods. The source code for SiNPle is freely available to download and compile, or as a Conda/Bioconda package.
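
The general idea of a posterior probability that a variant is real rather than a sequencing error can be illustrated with a deliberately simplified two-hypothesis model. This is not SiNPle's formula: it ignores base qualities, PCR error, and strandedness, and the error rate, variant frequency, and prior below are arbitrary assumptions.

```python
import math


def log_binom_pmf(k, n, p):
    """Log of the binomial probability of k successes in n trials."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log1p(-p)


def variant_posterior(alt_reads, depth, error_rate=0.001,
                      variant_freq=0.01, prior=0.001):
    """Posterior probability of 'real variant' versus 'errors only'.

    Both hypotheses model the alternate-read count as binomial; they differ
    only in the per-read probability of seeing the alternate base.
    """
    log_err = math.log1p(-prior) + log_binom_pmf(alt_reads, depth, error_rate)
    log_var = math.log(prior) + log_binom_pmf(alt_reads, depth, variant_freq)
    diff = log_err - log_var
    if diff > 700:  # avoid overflow in exp; the posterior is effectively zero
        return 0.0
    return 1.0 / (1.0 + math.exp(diff))


# At depth 10,000, 10 alternate reads match the assumed error rate,
# whereas 100 alternate reads match the assumed variant frequency.
print(variant_posterior(10, 10_000))
print(variant_posterior(100, 10_000))
```

Working in log space keeps the calculation stable at the very high read depths the abstract is concerned with.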


Subjects
Genotyping Techniques/methods , High-Throughput Nucleotide Sequencing/methods , Polymorphism, Single Nucleotide , Sequence Analysis, DNA/methods , Software , DNA, Viral/genetics , Genotyping Techniques/standards , High-Throughput Nucleotide Sequencing/standards , Sensitivity and Specificity , Sequence Analysis, DNA/standards
16.
Gene Ther ; 26(7-8): 338-346, 2019 08.
Article in English | MEDLINE | ID: mdl-31296934

ABSTRACT

Gene doping confers health risks for athletes and is a threat to fair competition in sports. Therefore, the anti-doping community has focused attention on its detection. Previously published polymerase chain reaction-based methodologies for gene doping detection target exon-exon junctions in the intron-less transgene. However, because these junctions are known, it would be relatively easy to evade detection by tampering with the copyDNA sequences. We have developed a targeted next-generation sequencing-based assay for the detection of all exon-exon junctions of the potential doping genes, EPO, IGF1, IGF2, GH1, and GH2, which is resistant to tampering. Using this assay, all exon-exon junctions of copyDNA of doping genes could be detected with a sensitivity of 1296 copyDNA copies in 1000 ng of genomic DNA. In addition, promoter regions and plasmid-derived sequences are readily detectable in our sequence data. While we show the reliability of our method for a selection of genes, expanding the panel to detect other genes would be straightforward. As we were able to detect plasmid-derived sequences, we expect that genes with manipulated junctions, promoter regions, and plasmid or virus-derived sequences will also be readily detected.
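
Why an exon-exon junction signals an intron-less transgene can be shown with a toy sketch: in genomic DNA consecutive exons are separated by introns, so a read that spans the end of one exon and the start of the next points to copyDNA. This is not the published assay. The exon strings are made up, and exact substring matching is a simplification that would not by itself tolerate altered junctions.

```python
def junction_sequences(exons, flank=10):
    """Join the last `flank` bases of each exon to the first `flank` of the next."""
    return [exons[i][-flank:] + exons[i + 1][:flank]
            for i in range(len(exons) - 1)]


def junctions_detected(reads, exons, flank=10):
    """For each exon-exon junction, report whether any read contains it."""
    return [any(junction in read for read in reads)
            for junction in junction_sequences(exons, flank)]


# Hypothetical toy exons and reads (not real EPO or IGF1 sequence).
exons = ["AAAACCCC", "GGGGTTTT", "ACACACAC"]
reads = [
    "ACCCCGGGGT",  # spans the exon 1 / exon 2 junction
    "GGTTTTGTAA",  # exon 2 followed by intron-like sequence
]
print(junctions_detected(reads, exons, flank=4))
```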


Subjects
Doping in Sports/methods , Genetic Testing/methods , High-Throughput Nucleotide Sequencing/methods , Plasmids/genetics , Sequence Analysis, DNA/methods , Transgenes , Erythropoietin/genetics , Erythropoietin/metabolism , Exons , Genetic Testing/standards , Genome, Human , High-Throughput Nucleotide Sequencing/standards , Humans , Intercellular Signaling Peptides and Proteins/genetics , Intercellular Signaling Peptides and Proteins/metabolism , Plasmids/metabolism , Reproducibility of Results , Sensitivity and Specificity , Sequence Analysis, DNA/standards
17.
Gigascience ; 8(7)2019 07 01.
Article in English | MEDLINE | ID: mdl-31289836

ABSTRACT

BACKGROUND: Mammalian X and Y chromosomes share a common evolutionary origin and retain regions of high sequence similarity. Similar sequence content can confound the mapping of short next-generation sequencing reads to a reference genome. It is therefore possible that the presence of both sex chromosomes in a reference genome can cause technical artifacts in genomic data and affect downstream analyses and applications. Understanding this problem is critical for medical genomics and population genomic inference. RESULTS: Here, we characterize how sequence homology can affect analyses on the sex chromosomes and present XYalign, a new tool that (1) facilitates the inference of sex chromosome complement from next-generation sequencing data; (2) corrects erroneous read mapping on the sex chromosomes; and (3) tabulates and visualizes important metrics for quality control such as mapping quality, sequencing depth, and allele balance. We find that sequence homology affects read mapping on the sex chromosomes and this has downstream effects on variant calling. However, we show that XYalign can correct mismapping, resulting in more accurate variant calling. We also show how metrics output by XYalign can be used to identify XX and XY individuals across diverse sequencing experiments, including low- and high-coverage whole-genome sequencing, and exome sequencing. Finally, we discuss how the flexibility of the XYalign framework can be leveraged for other uses including the identification of aneuploidy on the autosomes. XYalign is available open source under the GNU General Public License (version 3). CONCLUSIONS: Sex chromosome sequence homology causes the mismapping of short reads, which in turn affects downstream analyses. XYalign provides a reproducible framework to correct mismapping and improve variant calling on the sex chromosomes.
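As a rough illustration of how sequencing depth can indicate sex chromosome complement, the Python sketch below compares mean depth on chrX and chrY with mean autosomal depth. It does not use XYalign's interface or its actual decision rules: the function name, the input dictionary keys, and the cutoffs `x_cutoff` and `y_cutoff` are hypothetical values chosen only for the example.

```python
def infer_sex_complement(mean_depth, x_cutoff=0.75, y_cutoff=0.1):
    """Classify a sample as XX, XY, or ambiguous from normalized depth.

    mean_depth: dict with mean read depth for "autosomes", "chrX", "chrY".
    x_cutoff, y_cutoff: illustrative thresholds on the depth ratios
                        (assumptions, not values taken from XYalign).
    """
    autosomal = mean_depth["autosomes"]
    if autosomal <= 0:
        raise ValueError("autosomal depth must be positive")
    # Normalize by autosomal depth: XX samples are expected near X = 1.0 and
    # Y = 0.0, XY samples near X = 0.5 and Y = 0.5.
    x_ratio = mean_depth["chrX"] / autosomal
    y_ratio = mean_depth["chrY"] / autosomal
    if x_ratio >= x_cutoff and y_ratio < y_cutoff:
        call = "XX"
    elif x_ratio < x_cutoff and y_ratio >= y_cutoff:
        call = "XY"
    else:
        # Other combinations may reflect aneuploidy, mismapping, or low
        # coverage and need closer inspection.
        call = "ambiguous"
    return call, x_ratio, y_ratio
```

Reads mismapped between homologous X and Y regions can inflate the chrY ratio in XX samples, which is why the cutoff on chrY is set above zero in this sketch.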


Subjects
Chromosomes, Human, X/genetics , Chromosomes, Human, Y/genetics , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Sequence Homology, Nucleic Acid , Artifacts , Contig Mapping/methods , Contig Mapping/standards , Female , High-Throughput Nucleotide Sequencing/standards , Humans , Male , Sequence Alignment/methods , Sequence Alignment/standards , Sequence Analysis, DNA/standards
18.
Gigascience ; 8(7)2019 07 01.
Article in English | MEDLINE | ID: mdl-31251324

ABSTRACT

Biclustering is a technique for discovering local similarities within data. For many years, the complexity of the methods and parallelization issues limited its application to big data problems. With the development of novel scalable methods, biclustering has finally started to close this gap. In this paper, we discuss the caveats of biclustering and present its current challenges and guidelines for practitioners. We also try to explain why biclustering may soon become one of the standards for big data analytics.


Subjects
Big Data , Genomics/methods , Sequence Analysis, DNA/methods , Cluster Analysis , Data Mining/methods , Genome, Human , Genomics/standards , Humans , Sequence Alignment/methods , Sequence Alignment/standards , Sequence Analysis, DNA/standards , Software
19.
Hum Genet ; 138(7): 757-769, 2019 Jul.
Article in English | MEDLINE | ID: mdl-31168775

ABSTRACT

An ethnicity is characterized by genomic fragments, single nucleotide polymorphisms (SNPs), and structural variations specific to it. However, the widely used 'standard human reference genome' GRCh37/38 is based on Caucasians. Therefore, de novo-assembled reference genomes for specific ethnicities would have advantages for genetics and precision medicine applications, especially with the long-read sequencing techniques that facilitate genome assembly. In this study, we assessed the de novo-assembled Chinese Han reference genome HX1 vis-à-vis the standard GRCh38 for improving the quality of assembly and for ethnicity-specific applications. Surprisingly, all genomic sequencing datasets mapped better to GRCh38 than to HX1, even the datasets from the Chinese Han population. This gap was mainly due to the massive structural misassembly of the HX1 reference genome rather than the SNPs between the ethnicities, and this misassembly could not be corrected by short-read whole-genome sequencing (WGS). For example, HX1 and the other de novo-assembled personal genomes failed to assemble the mitochondrial genome as a contig. We mapped 97.1% of dbSNP, 98.8% of ClinVar, and 97.2% of COSMIC variants to HX1. HX1-absent, non-synonymous ClinVar SNPs were involved in 140 genes and many important functions in various diseases, and most of them were absent because essential exons failed to assemble. In contrast, the HX1-specific regions were scarcely expressed, as shown in the cell lines and clinical samples of Chinese patients. Our results demonstrated that a de novo-assembled individual genome such as HX1 did not have advantages over the standard GRCh38 genome due to insufficient assembly quality, and that it is, therefore, not recommended for common use.


Subjects
Asian Continental Ancestry Group/genetics , Ethnic Groups/genetics , Genome, Human , Genomics/standards , High-Throughput Nucleotide Sequencing/methods , Reference Standards , Sequence Analysis, DNA/standards , Algorithms , Contig Mapping , Genetics, Population , Humans , Polymorphism, Single Nucleotide , Transcriptome
20.
Western Pac Surveill Response J ; 10(1): 32-38, 2019.
Article in English | MEDLINE | ID: mdl-31110840

ABSTRACT

Introduction: Two reverse transcription polymerase chain reaction (RT-PCR) methods are commonly used to detect influenza infections: conventional and real-time RT-PCR. From December 2017 to March 2018, several missed diagnoses of influenza A(H1)pdm09 using real-time RT-PCR were reported in northern Viet Nam. This study investigated how these missed detections occurred to determine their effect on the surveillance of influenza. Methods: The haemagglutinin (HA) segments of A(H1N1)pdm09 from both real-time RT-PCR positive and negative samples were isolated and sequenced. The primer and probe sets in the HA gene were checked for mismatches, and phylogenetic analyses were performed to determine the molecular epidemiology of these viruses. Results: There were 86 positive influenza A samples; 32 were A(H1)pdm09 positive by conventional RT-PCR but were negative by real-time RT-PCR. Sequencing was conducted on 23 influenza A(H1N1)pdm09 isolates that were recovered from positive samples. Eight of these were negative for A(H1)pdm09 by real-time RT-PCR. There were two different mismatches in the probe target sites of the HA gene sequences of all isolates (n = 23), with additional mismatches only at position 7 (template binding site) identified for all eight negative real-time RT-PCR isolates. The primer target sites had no mismatches. Phylogenetic analysis of the HA gene showed that both the positive and negative real-time RT-PCR isolates were grouped in clade 6B.1; however, the real-time RT-PCR negative viruses were located in a subgroup defined by the substitution I295V. Conclusion: Constant monitoring of genetic changes in the circulating influenza A(H1N1)pdm09 viruses is important for maintaining the sensitivity of molecular detection assays.
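Checking a primer or probe against its target site amounts to a position-by-position comparison. The Python sketch below shows one simple way to do this; the function name and the example sequences are hypothetical (they are not the assay's real oligonucleotides), and both sequences are assumed to be given in the same orientation and already aligned to the same length.

```python
def mismatch_positions(oligo, target_site):
    """Return the 1-based positions where an oligo differs from its target.

    oligo:       primer or probe sequence (hypothetical in this example).
    target_site: the corresponding stretch of the sequenced gene, written in
                 the same orientation and with the same length as the oligo.
    """
    if len(oligo) != len(target_site):
        raise ValueError("oligo and target site must have the same length")
    pairs = zip(oligo.upper(), target_site.upper())
    return [pos for pos, (a, b) in enumerate(pairs, start=1) if a != b]

# Made-up 10-nucleotide probe and target site that differ at one base.
probe = "ACGTACGTAC"
site = "ACGTACATAC"
print(mismatch_positions(probe, site))
```

For these made-up sequences the only difference is at position 7. A real check would also need to handle degenerate bases and reverse-complemented oligos, which this sketch leaves out.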


Subjects
Delayed Diagnosis/trends , Influenza, Human/diagnosis , Sequence Analysis, DNA/standards , Hemagglutination Tests/methods , Hemagglutinins/analysis , Hemagglutinins/genetics , Humans , Influenza A Virus, H1N1 Subtype/genetics , Influenza A Virus, H1N1 Subtype/pathogenicity , Influenza, Human/epidemiology , Influenza, Human/mortality , Mutation/genetics , Phylogeny , Reverse Transcriptase Polymerase Chain Reaction/methods , Sequence Analysis, DNA/methods , Sequence Analysis, DNA/trends , Vietnam