Results 1 - 11 of 11
1.
J Biol Chem ; 293(30): 11687-11708, 2018 07 27.
Article in English | MEDLINE | ID: mdl-29773649

ABSTRACT

HIV-1 subtype C (HIV-1C) may duplicate longer amino acid stretches in the p6 Gag protein, leading to the creation of an additional Pro-Thr/Ser-Ala-Pro (PTAP) motif necessary for viral packaging. However, the biological significance of a duplication of the PTAP motif for HIV-1 replication and pathogenesis has not been experimentally validated. In a longitudinal study of two different clinical cohorts of select HIV-1 seropositive, drug-naive individuals from India, we found that 8 of 50 of these individuals harbored a mixed infection of viral strains discordant for the PTAP duplication. Conventional and next-generation sequencing of six primary viral quasispecies at multiple time points disclosed that in a mixed infection, the viral strains containing the PTAP duplication dominated the infection. The dominance of the double-PTAP viral strains over a genetically similar single-PTAP viral clone was confirmed in viral proliferation and pairwise competition assays. Of note, in the proximity ligation assay, double-PTAP Gag proteins exhibited a significantly enhanced interaction with the host protein tumor susceptibility gene 101 (Tsg101). Moreover, Tsg101 overexpression resulted in a biphasic effect on HIV-1C proliferation, an enhanced effect at low concentration and an inhibitory effect only at higher concentrations, unlike a uniformly inhibitory effect on subtype B strains. In summary, our results indicate that the duplication of the PTAP motif in the p6 Gag protein enhances the replication fitness of HIV-1C by engaging the Tsg101 host protein with a higher affinity. Our results have implications for HIV-1 pathogenesis, especially of HIV-1C.


Subject(s)
DNA-Binding Proteins/metabolism , Endosomal Sorting Complexes Required for Transport/metabolism , HIV Infections/metabolism , HIV Infections/virology , HIV-1/physiology , Transcription Factors/metabolism , Virus Replication , gag Gene Products, Human Immunodeficiency Virus/metabolism , Adult , Amino Acid Motifs , Cells, Cultured , DNA-Binding Proteins/genetics , Endosomal Sorting Complexes Required for Transport/genetics , Female , HIV Infections/genetics , HIV-1/chemistry , HIV-1/genetics , Host-Pathogen Interactions , Humans , Longitudinal Studies , Male , Middle Aged , Protein Interaction Maps , Transcription Factors/genetics , gag Gene Products, Human Immunodeficiency Virus/chemistry , gag Gene Products, Human Immunodeficiency Virus/genetics
2.
BMC Bioinformatics ; 16: 17, 2015 Jan 28.
Article in English | MEDLINE | ID: mdl-25626454

ABSTRACT

BACKGROUND: Next-generation sequencing (NGS) is rapidly becoming common practice in clinical diagnostics and cancer research. In addition to the detection of single nucleotide variants (SNVs), information on copy number variants (CNVs) is of great interest. Several algorithms exist to detect CNVs by analyzing whole genome sequencing data or data from samples enriched by hybridization-capture. PCR-enriched amplicon-sequencing data have special characteristics that have been taken into account by only one publicly available algorithm so far. RESULTS: We describe a new algorithm named quandico to detect copy number differences based on NGS data generated following PCR-enrichment. A weighted t-test statistic was applied to calculate probabilities (p-values) of copy number changes. We assessed the performance of the method using sequencing reads generated from reference DNA with known CNVs, and we were able to detect these variants with 98.6% sensitivity and 98.5% specificity which is significantly better than another recently described method for amplicon sequencing. The source code (R-package) of quandico is licensed under the GPLv3 and it is available at https://github.com/reineckef/quandico . CONCLUSION: We demonstrated that our new algorithm is suitable to call copy number changes using data from PCR-enriched samples with high sensitivity and specificity even for single copy differences.
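
The weighted t-test mentioned in the abstract can be illustrated with a minimal sketch. This is one common textbook formulation (reliability weights with an effective sample size), not necessarily the statistic that quandico implements; the function name, the example log2 ratios, and the coverage-based weights are all assumptions made for illustration.

```python
import math


def weighted_t_statistic(values, weights, mu0=0.0):
    """Weighted one-sample t statistic (illustrative formulation only).

    values  : per-amplicon log2 coverage ratios (sample vs. reference)
    weights : non-negative reliability weights, e.g. derived from read depth
    mu0     : value expected under "no copy number change"
    Returns (t, degrees_of_freedom).
    """
    if len(values) != len(weights) or len(values) < 2:
        raise ValueError("need at least two values with matching weights")
    v1 = sum(weights)
    v2 = sum(w * w for w in weights)
    mean = sum(w * x for w, x in zip(weights, values)) / v1
    # Unbiased weighted variance for reliability weights.
    var = sum(w * (x - mean) ** 2 for w, x in zip(weights, values)) / (v1 - v2 / v1)
    # Effective sample size; equals len(values) when all weights are equal.
    n_eff = v1 * v1 / v2
    se = math.sqrt(var / n_eff)
    if se == 0:
        raise ValueError("zero variance: t statistic is undefined")
    return (mean - mu0) / se, n_eff - 1


if __name__ == "__main__":
    # Hypothetical log2 ratios for four amplicons of one gene.
    ratios = [0.55, 0.61, 0.49, 0.58]
    depth_weights = [100, 400, 250, 300]
    print(weighted_t_statistic(ratios, depth_weights))
```

With equal weights the statistic reduces to the ordinary one-sample t statistic; a p-value would then be read from Student's t distribution with the returned degrees of freedom.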


Subject(s)
Algorithms , High-Throughput Nucleotide Sequencing/methods , Polymerase Chain Reaction/methods , Sequence Analysis, DNA/methods , Case-Control Studies , DNA Copy Number Variations , Humans , Sensitivity and Specificity
3.
BMC Genomics ; 15: 1073, 2014 Dec 05.
Article in English | MEDLINE | ID: mdl-25480444

ABSTRACT

BACKGROUND: Analysis of targeted amplicon sequencing data presents some unique challenges in comparison to the analysis of random fragment sequencing data. Whereas reads from randomly fragmented DNA have arbitrary start positions, the reads from amplicon sequencing have fixed start positions that coincide with the amplicon boundaries. As a result, any variants near the amplicon boundaries can cause misalignments of multiple reads that can ultimately lead to false-positive or false-negative variant calls. RESULTS: We show that amplicon boundaries are variant calling blind spots where the variant calls are highly inaccurate. We propose that an effective strategy to avoid these blind spots is to incorporate the primer bases in obtaining read alignments and post-processing of the alignments, thereby effectively moving these blind spots into the primer binding regions (which are not used for variant calling). Targeted sequencing data analysis pipelines can provide better variant calling accuracy when primer bases are retained and sequenced. CONCLUSIONS: Read bases beyond the variant site are necessary for analysis of amplicon sequencing data. Enzymatic primer digestion, if used in the target enrichment process, should leave at least a few primer bases to ensure that these bases are available during data analysis. The primer bases should only be removed immediately before the variant calling step to ensure that the variants can be called irrespective of where they occur within the amplicon insert region.
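
The recommendation to keep primer bases during alignment and remove them only just before variant calling can be sketched as a post-alignment clipping step. The sketch below assumes an ungapped alignment described by a 0-based reference start and the read sequence; the function name and coordinates are hypothetical, and a real pipeline would operate on CIGAR strings in SAM/BAM records instead.

```python
def clip_to_insert(read_start, read_seq, insert_start, insert_end):
    """Keep only the aligned bases that fall inside the amplicon insert.

    read_start   : 0-based reference position of the first read base
    read_seq     : read bases, assumed aligned without gaps (simplification)
    insert_start : first reference position after the forward primer
    insert_end   : first reference position of the reverse primer (exclusive)
    Returns (new_start, clipped_sequence); the sequence is empty when the
    read does not overlap the insert at all.
    """
    read_end = read_start + len(read_seq)
    keep_from = max(read_start, insert_start)
    keep_to = min(read_end, insert_end)
    if keep_from >= keep_to:
        return keep_from, ""
    return keep_from, read_seq[keep_from - read_start:keep_to - read_start]


if __name__ == "__main__":
    # Hypothetical read: 4 primer bases, 8 insert bases, 4 primer bases.
    print(clip_to_insert(100, "AAAACCCCGGGGTTTT", 104, 112))
```

The point of the ordering is that the aligner sees the full read, primer bases included, so a variant next to the primer boundary is still anchored on both sides when the alignment is computed.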


Subject(s)
Computational Biology/methods , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA/methods , Computer Simulation , DNA Primers , Polymerase Chain Reaction/methods , Reproducibility of Results
4.
BMC Genomics ; 15: 244, 2014 Mar 28.
Article in English | MEDLINE | ID: mdl-24678773

ABSTRACT

BACKGROUND: High-throughput sequencing is rapidly becoming common practice in clinical diagnosis and cancer research. Many algorithms have been developed for somatic single nucleotide variant (SNV) detection in matched tumor-normal DNA sequencing. Although numerous studies have compared the performance of various algorithms on exome data, there has not yet been a systematic evaluation using PCR-enriched amplicon data with a range of variant allele fractions. The recently developed gold standard variant set for the reference individual NA12878 by the NIST-led "Genome in a Bottle" Consortium (NIST-GIAB) provides a good resource to evaluate admixtures with various SNV fractions. RESULTS: Using the NIST-GIAB gold standard, we compared the performance of five popular somatic SNV calling algorithms (GATK UnifiedGenotyper followed by simple subtraction, MuTect, Strelka, SomaticSniper and VarScan2) for matched tumor-normal amplicon and exome sequencing data. CONCLUSIONS: We demonstrated that the five commonly used somatic SNV calling methods are applicable to both targeted amplicon and exome sequencing data. However, the sensitivities of these methods vary based on the allelic fraction of the mutation in the tumor sample. Our analysis can assist researchers in choosing a somatic SNV calling method suitable for their specific needs.


Subject(s)
Computational Biology/methods , Exome , High-Throughput Nucleotide Sequencing , Mutation , Software , Algorithms , Databases, Nucleic Acid , Genomics/methods , Humans , Point Mutation , ROC Curve , Sensitivity and Specificity
5.
Nucleic Acids Res ; 40(16): e127, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22584625

ABSTRACT

Accurate estimation of expression levels from RNA-Seq data entails precise mapping of the sequence reads to a reference genome. Because the standard reference genome contains only one allele at any given locus, reads overlapping polymorphic loci that carry a non-reference allele are at least one mismatch away from the reference and, hence, are less likely to be mapped. This bias in read mapping leads to inaccurate estimates of allele-specific expression (ASE). To address this read-mapping bias, we propose the construction of an enhanced reference genome that includes the alternative alleles at known polymorphic loci. We show that mapping to this enhanced reference reduced the read-mapping biases, leading to more reliable estimates of ASE. Experiments on simulated data show that the proposed strategy reduced the number of loci with mapping bias by ≥ 63% when compared with a previous approach that relies on masking the polymorphic loci and by ≥ 18% when compared with the standard approach that uses an unaltered reference. When we applied our strategy to actual RNA-Seq data, we found that it mapped up to 15% more reads than the previous approaches and identified many seemingly incorrect inferences made by them.
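
One way to picture an enhanced reference is as the standard reference plus short extra sequences that carry the alternative allele at each known polymorphic locus. The sketch below is an illustration under that assumption only: the function name, the naming scheme for the extra sequences, and the fixed flank length are hypothetical, and the published method may handle details such as neighbouring SNPs differently.

```python
def alt_allele_contigs(reference, snps, flank):
    """Build short sequences that carry the alternative allele of each SNP.

    reference : reference sequence as a string
    snps      : iterable of (position, ref_base, alt_base), 0-based positions
    flank     : number of reference bases kept on each side of the SNP
    Returns a dict mapping a hypothetical contig name to its sequence.
    """
    contigs = {}
    for pos, ref, alt in snps:
        if reference[pos] != ref:
            raise ValueError(f"reference base mismatch at position {pos}")
        start = max(0, pos - flank)
        end = min(len(reference), pos + flank + 1)
        name = f"alt_{pos}_{ref}>{alt}"
        contigs[name] = reference[start:pos] + alt + reference[pos + 1:end]
    return contigs


if __name__ == "__main__":
    # Toy reference with one known A>G polymorphism at position 4.
    print(alt_allele_contigs("ACGTACGTAC", [(4, "A", "G")], flank=2))
```

Reads carrying the non-reference allele can then match one of these extra sequences exactly instead of being one mismatch away from every candidate location.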


Subject(s)
Alleles , Chromosome Mapping/methods , Gene Expression Profiling , Sequence Analysis, RNA/methods , Chromosome Mapping/standards , Genetic Loci , Genome, Human , High-Throughput Nucleotide Sequencing , Humans , Polymorphism, Single Nucleotide , Reference Standards
6.
Bioinformatics ; 22(14): e514-22, 2006 Jul 15.
Article in English | MEDLINE | ID: mdl-16873515

ABSTRACT

MOTIVATION: We explore the problem of constructing near-perfect phylogenies on bi-allelic haplotypes, where the deviation from perfect phylogeny is entirely due to homoplasy events. We present polynomial-time algorithms for restricted versions of the problem. We show that these algorithms can be extended to genotype data, in which case the problem is called the near-perfect phylogeny haplotyping (NPPH) problem. We present a near-optimal algorithm for the H1-NPPH problem, which is to determine if a given set of genotypes admits a phylogeny with a single homoplasy event. The time-complexity of our algorithm for the H1-NPPH problem is O(m^2(n + m)), where n is the number of genotypes and m is the number of SNP sites. This is a significant improvement over the earlier O(n^4) algorithm. We also introduce generalized versions of the problem. The H(1, q)-NPPH problem is to determine if a given set of genotypes admits a phylogeny with q homoplasy events, so that all the homoplasy events occur in a single site. We present an O(m^(q+1)(n + m)) algorithm for the H(1, q)-NPPH problem. RESULTS: We present results on simulated data, which demonstrate that the accuracy of our algorithm for the H1-NPPH problem is comparable to that of the existing methods, while being orders of magnitude faster. AVAILABILITY: The implementation of our algorithm for the H1-NPPH problem is available upon request.
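
For context, the perfect-phylogeny condition that the near-perfect variants relax can be checked on bi-allelic haplotypes with the classical four-gamete test: a set of binary haplotypes admits an (unrooted) perfect phylogeny exactly when no pair of sites shows all four combinations 00, 01, 10 and 11. The sketch below implements only that baseline test, not the H1-NPPH algorithm of the abstract; the function name and input encoding are assumptions.

```python
from itertools import combinations


def admits_perfect_phylogeny(haplotypes):
    """Four-gamete test on bi-allelic haplotypes.

    haplotypes : list of equal-length strings over {"0", "1"},
                 one string per haplotype, one character per SNP site
    Returns True when no pair of sites exhibits all four gametes.
    """
    if not haplotypes:
        return True
    n_sites = len(haplotypes[0])
    for i, j in combinations(range(n_sites), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            # Sites i and j conflict: some site must have mutated twice
            # (homoplasy) or recombination must have occurred.
            return False
    return True


if __name__ == "__main__":
    print(admits_perfect_phylogeny(["00", "01", "10"]))
    print(admits_perfect_phylogeny(["00", "01", "10", "11"]))
```

This direct check examines every pair of the m sites across all n haplotypes, so it takes O(n m^2) time.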


Subject(s)
Biological Evolution , Chromosome Mapping/methods , DNA Mutational Analysis/methods , Linkage Disequilibrium/genetics , Polymorphism, Single Nucleotide/genetics , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Algorithms , Base Sequence , Genome, Human/genetics , Haplotypes/genetics , Humans , Molecular Sequence Data , Phylogeny
7.
Clin Cancer Res ; 23(18): 5648-5656, 2017 Sep 15.
Article in English | MEDLINE | ID: mdl-28536309

ABSTRACT

Purpose: Tumor-derived cell-free DNA (cfDNA) in plasma can be used for molecular testing and provides an attractive alternative to tumor tissue. Commonly used PCR-based technologies can test for only a limited number of alterations at a time. Therefore, novel ultrasensitive technologies capable of testing for a broad spectrum of molecular alterations are needed to further personalized cancer therapy. Experimental Design: We developed a highly sensitive ultradeep next-generation sequencing (NGS) assay using reagents from TruSeqNano library preparation and NexteraRapid Capture target enrichment kits to generate plasma cfDNA sequencing libraries for mutational analysis in 61 cancer-related genes using common bioinformatics tools. The results were retrospectively compared with molecular testing of archival primary or metastatic tumor tissue obtained at different points of clinical care. Results: In a study of 55 patients with advanced cancer, the ultradeep NGS assay detected 82% (complete detection) to 87% (complete and partial detection) of the aberrations identified in discordantly collected corresponding archival tumor tissue. Patients with a low variant allele frequency (VAF) of mutant cfDNA survived longer than those with a high VAF did (P = 0.018). In patients undergoing systemic therapy, radiological response was positively associated with changes in cfDNA VAF (P = 0.02), and compared with unchanged/increased mutant cfDNA VAF, decreased cfDNA VAF was associated with longer time to treatment failure (TTF; P = 0.03). Conclusions: Ultradeep NGS assay has good sensitivity compared with conventional clinical mutation testing of archival specimens. A high VAF in mutant cfDNA corresponded with shorter survival. Changes in VAF of mutated cfDNA were associated with TTF. Clin Cancer Res; 23(18); 5648-56. ©2017 AACR.
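
The variant allele frequency used throughout the abstract is the fraction of reads at a position that support the mutant allele. A minimal sketch follows; the function names, the example read counts, and the tolerance used to call a VAF "unchanged" are assumptions for illustration and do not come from the study.

```python
def variant_allele_frequency(alt_reads, total_reads):
    """VAF = reads supporting the variant / all reads covering the position."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must lie between 0 and total_reads")
    return alt_reads / total_reads


def vaf_direction(baseline_vaf, follow_up_vaf, tolerance=0.0):
    """Label the change between two time points.

    tolerance is a hypothetical absolute threshold below which the
    VAF is treated as unchanged.
    """
    delta = follow_up_vaf - baseline_vaf
    if abs(delta) <= tolerance:
        return "unchanged"
    return "decreased" if delta < 0 else "increased"


if __name__ == "__main__":
    # Hypothetical ultradeep coverage: 30 mutant reads out of 10,000.
    before = variant_allele_frequency(30, 10_000)
    after = variant_allele_frequency(12, 10_000)
    print(before, after, vaf_direction(before, after))
```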


Subject(s)
Biomarkers, Tumor , Circulating Tumor DNA , High-Throughput Nucleotide Sequencing , Neoplasms/diagnosis , Neoplasms/genetics , Adult , Aged , Aged, 80 and over , Female , Genetic Testing , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/standards , Humans , Male , Middle Aged , Mutation , Neoplasms/mortality , Prognosis , Reproducibility of Results , Sensitivity and Specificity
8.
Microbiome ; 2: 31, 2014.
Article in English | MEDLINE | ID: mdl-25228989

ABSTRACT

BACKGROUND: Sample storage conditions, extraction methods, PCR primers, and parameters are major factors that affect metagenomics analysis based on microbial 16S rRNA gene sequencing. Most published studies were limited to the comparison of only one or two types of these factors. Systematic multi-factor explorations are needed to evaluate the conditions that may impact the validity of a microbiome analysis. This study aimed to improve methodological options to facilitate the best technical approaches in the design of a microbiome study. Three readily available mock bacterial community materials and two commercial extraction techniques, Qiagen DNeasy and MO BIO PowerSoil DNA purification methods, were used to assess procedures for 16S ribosomal DNA amplification and pyrosequencing-based analysis. Primers were chosen for 16S rDNA quantitative PCR and amplification of region V3 to V1. Swabs spiked with mock bacterial community cells and clinical oropharyngeal swabs were incubated at respective temperatures of -80°C, -20°C, 4°C, and 37°C for 4 weeks, then extracted with the two methods, and subjected to pyrosequencing and taxonomic and statistical analyses to investigate microbiome profile stability. RESULTS: The bacterial compositions for the mock community DNA samples determined in this study were consistent with the projected levels and agreed with the literature. The quantitation accuracy of abundances for several genera was improved with changes made to the standard Human Microbiome Project (HMP) procedure. The data for the samples purified with DNeasy and PowerSoil methods were statistically distinct; however, both results were reproducible and in good agreement with each other. The temperature effect on storage stability was investigated by using mock community cells and showed that the microbial community profiles were altered with the increase in incubation temperature. However, this phenomenon was not detected when clinical oropharyngeal swabs were used in the experiment. CONCLUSIONS: Mock community materials originating from the HMP study are valuable controls in developing 16S metagenomics analysis procedures. Long-term exposure to a high temperature may introduce variation into analysis for oropharyngeal swabs, suggesting storage at 4°C or lower. The observed variations due to sample storage temperature are in a similar range as the intrapersonal variability among different clinical oropharyngeal swab samples.

9.
PLoS One ; 6(3): e17469, 2011 Mar 07.
Article in English | MEDLINE | ID: mdl-21408217

ABSTRACT

BACKGROUND: The annotation of genomes from next-generation sequencing platforms needs to be rapid, high-throughput, and fully integrated and automated. Although a few Web-based annotation services have recently become available, they may not be the best solution for researchers that need to annotate a large number of genomes, possibly including proprietary data, and store them locally for further analysis. To address this need, we developed a standalone software application, the Annotation of microbial Genome Sequences (AGeS) system, which incorporates publicly available and in-house-developed bioinformatics tools and databases, many of which are parallelized for high-throughput performance. METHODOLOGY: The AGeS system supports three main capabilities. The first is the storage of input contig sequences and the resulting annotation data in a central, customized database. The second is the annotation of microbial genomes using an integrated software pipeline, which first analyzes contigs from high-throughput sequencing by locating genomic regions that code for proteins, RNA, and other genomic elements through the Do-It-Yourself Annotation (DIYA) framework. The identified protein-coding regions are then functionally annotated using the in-house-developed Pipeline for Protein Annotation (PIPA). The third capability is the visualization of annotated sequences using GBrowse. To date, we have implemented these capabilities for bacterial genomes. AGeS was evaluated by comparing its genome annotations with those provided by three other methods. Our results indicate that the software tools integrated into AGeS provide annotations that are in general agreement with those provided by the compared methods. This is demonstrated by a >94% overlap in the number of identified genes, a significant number of identical annotated features, and a >90% agreement in enzyme function predictions.


Subject(s)
Genome, Bacterial/genetics , Molecular Sequence Annotation/methods , Software , Base Sequence , Genes, Bacterial/genetics , Reproducibility of Results
10.
Article in English | MEDLINE | ID: mdl-18989047

ABSTRACT

The incomplete perfect phylogeny (IPP) problem and the incomplete perfect phylogeny haplotyping (IPPH) problem deal with constructing a phylogeny for a given set of haplotypes or genotypes with missing entries. The earlier approaches for both of these problems dealt with restricted versions of the problems, where the root is either available or can be trivially re-constructed from the data, or certain assumptions were made about the data. In this paper, we deal with the unrestricted versions of the problems, where the root of the phylogeny is neither available nor trivially recoverable from the data. Both IPP and IPPH problems have previously been proven to be NP-complete. Here, we present efficient enumerative algorithms that can handle practical instances of the problem. Empirical analysis on simulated data shows that the algorithms perform very well both in terms of speed and in terms of accuracy of the recovered data.


Subject(s)
Algorithms , Biological Evolution , Chromosome Mapping/methods , Evolution, Molecular , Haplotypes/genetics , Phylogeny , Polymorphism, Single Nucleotide/genetics , Sequence Analysis, DNA/methods
11.
Article in English | MEDLINE | ID: mdl-16452805

ABSTRACT

Codon optimization enhances the efficiency of DNA expression vectors used in DNA vaccination and gene therapy by increasing protein expression. Additionally, certain nucleotide motifs have experimentally been shown to be immuno-stimulatory while certain others are immuno-suppressive. In this paper, we present algorithms to locate a given set of immuno-modulatory motifs in the DNA expression vectors corresponding to a given amino acid sequence and maximize or minimize the number and the context of the immuno-modulatory motifs in the DNA expression vectors. The main contribution is to use multiple pattern matching algorithms to synthesize a DNA sequence for a given amino acid sequence and a graph theoretic approach for finding the longest weighted path in a directed graph that will maximize or minimize certain motifs. This is achieved using O(n^2) time, where n is the length of the amino acid sequence. Based on this, we develop a software tool.
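
The longest-weighted-path idea can be sketched for a deliberately simplified case: a single dinucleotide motif (CG) and a layered graph with one layer per amino acid, one node per synonymous codon, and edge weights equal to the number of CG occurrences gained inside the codon or across the codon junction. This is an illustration of the general technique under those assumptions, not the algorithm from the paper; the partial codon table and all function names are hypothetical choices made to keep the example short.

```python
# Partial standard codon table, restricted to a few amino acids for brevity.
CODONS = {
    "M": ["ATG"],
    "D": ["GAT", "GAC"],
    "A": ["GCT", "GCC", "GCA", "GCG"],
    "T": ["ACT", "ACC", "ACA", "ACG"],
    "R": ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"],
}


def count_cg(seq):
    """Number of CG dinucleotides in seq."""
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")


def optimize_cg(protein, maximize=True):
    """Choose one codon per residue to maximize (or minimize) CG count.

    Dynamic programming over a layered DAG: best[c] holds the best signed
    score and DNA string of any path that ends with codon c.
    Returns (cg_count, dna_sequence).
    """
    if not protein:
        return 0, ""
    sign = 1 if maximize else -1
    best = {c: (sign * count_cg(c), c) for c in CODONS[protein[0]]}
    for aa in protein[1:]:
        nxt = {}
        for c in CODONS[aa]:
            inside = count_cg(c)
            candidates = []
            for prev, (score, dna) in best.items():
                # A CG can also form across the junction of two codons.
                junction = 1 if prev[-1] == "C" and c[0] == "G" else 0
                candidates.append((score + sign * (inside + junction), dna + c))
            nxt[c] = max(candidates)
        best = nxt
    score, dna = max(best.values())
    return sign * score, dna


if __name__ == "__main__":
    print(optimize_cg("MDAR", maximize=True))
    print(optimize_cg("MDAR", maximize=False))
```

Because each residue has at most six synonymous codons, this simplified version runs in time linear in the protein length; motifs longer than a dinucleotide can span several codons and require a richer state than the last codon alone.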


Subject(s)
Algorithms , Codon/genetics , CpG Islands/genetics , Genetic Engineering/methods , Genetic Vectors/genetics , Pattern Recognition, Automated/methods , Sequence Analysis, DNA/methods , Amino Acid Motifs , Artificial Intelligence , Gene Expression/genetics , Software , Vaccines, DNA/genetics