1.
Breast Cancer Res ; 21(1): 87, 2019 08 05.
Article in English | MEDLINE | ID: mdl-31383035

ABSTRACT

BACKGROUND: Approximately two thirds of patients with localized triple-negative breast cancer (TNBC) harbor residual disease (RD) after neoadjuvant chemotherapy (NAC) and have a high risk-of-recurrence. Targeted therapeutic development for TNBC is of primary significance as no targeted therapies are clinically indicated for this aggressive subset. In view of this, we conducted a comprehensive molecular analysis and correlated molecular features of chemorefractory RD tumors with recurrence for the purpose of guiding downstream therapeutic development. METHODS: We assembled DNA and RNA sequencing data from RD tumors as well as pre-operative biopsies, lymphocytic infiltrate, and survival data as part of a molecular correlative to a phase II post-neoadjuvant clinical trial. Matched somatic mutation, gene expression, and lymphocytic infiltrate were assessed before and after chemotherapy to understand how tumors evolve during chemotherapy. Kaplan-Meier survival analyses were conducted categorizing cancers with TP53 mutations by the degree of loss as well as by the copy number of a locus of 18q corresponding to the SMAD2, SMAD4, and SMAD7 genes. RESULTS: Analysis of matched somatic genomes pre-/post-NAC revealed chaotic acquisition of copy gains and losses including amplification of prominent oncogenes. In contrast, significant gains in deleterious point mutations and insertion/deletions were not observed. No trends between clonal evolution and recurrence were identified. Gene expression data from paired biopsies revealed enrichment of actionable regulators of stem cell-like behavior and depletion of immune signaling, which was corroborated by total lymphocytic infiltrate, but was not associated with recurrence. Novel characterization of TP53 mutation revealed prognostically relevant subgroups, which were linked to MYC-driven transcriptional amplification. 
Finally, somatic gains in 18q were associated with poor prognosis, likely driven by putative upregulation of TGFβ signaling through the signal transducer SMAD2. CONCLUSIONS: We conclude TNBCs are dynamic during chemotherapy, demonstrating complex plasticity in subclonal diversity, stem-like qualities, and immune depletion, but somatic alterations of TP53/MYC and TGFβ signaling in RD samples are prominent drivers of recurrence, representing high-yield targets for additional interrogation.


Subjects
Biomarkers, Tumor; Drug Resistance, Neoplasm/genetics; Gene Expression Regulation, Neoplastic; Triple Negative Breast Neoplasms/genetics; Triple Negative Breast Neoplasms/pathology; Antineoplastic Combined Chemotherapy Protocols/adverse effects; Antineoplastic Combined Chemotherapy Protocols/therapeutic use; DNA Copy Number Variations; Female; Genomics/methods; High-Throughput Nucleotide Sequencing; Humans; Kaplan-Meier Estimate; Mutation; Neoadjuvant Therapy; Neoplasm Recurrence, Local; Neoplasm, Residual; Neoplastic Stem Cells/metabolism; Prognosis; Signal Transduction; Treatment Outcome; Triple Negative Breast Neoplasms/drug therapy; Triple Negative Breast Neoplasms/mortality; Tumor Suppressor Protein p53/genetics
2.
NPJ Breast Cancer ; 3: 24, 2017.
Article in English | MEDLINE | ID: mdl-28685160

ABSTRACT

Next-generation sequencing to detect circulating tumor DNA is a minimally invasive method for tumor genotyping and monitoring therapeutic response. The majority of studies have focused on detecting circulating tumor DNA from patients with metastatic disease. Herein, we tested whether circulating tumor DNA could be used as a biomarker to predict relapse in triple-negative breast cancer patients with residual disease after neoadjuvant chemotherapy. In this study, we analyzed samples from 38 early-stage triple-negative breast cancer patients with matched tumor, blood, and plasma. Extracted DNA underwent library preparation and amplification using the Oncomine Research Panel consisting of 134 cancer genes, followed by high-coverage sequencing and bioinformatics. We detected high-quality somatic mutations from primary tumors in 33 of 38 patients. TP53 mutations were the most prevalent (82%) followed by PIK3CA (16%). Of the 33 patients who had a mutation identified in their primary tumor, we were able to detect circulating tumor DNA mutations in the plasma of four patients (three TP53 mutations, one AKT1 mutation, one CDKN2A mutation). All four patients had recurrence of their disease (100% specificity), but sensitivity was limited to detecting only 4 of 13 patients who clinically relapsed (31% sensitivity). Notably, all four patients had a rapid recurrence (0.3, 4.0, 5.3, and 8.9 months). Patients with detectable circulating tumor DNA had an inferior disease-free survival (p < 0.0001; median disease-free survival: 4.6 months vs. not reached; hazard ratio = 12.6, 95% confidence interval: 3.06-52.2). Our study shows that next-generation circulating tumor DNA sequencing of triple-negative breast cancer patients with residual disease after neoadjuvant chemotherapy can predict recurrence with high specificity, but moderate sensitivity. For those patients in whom circulating tumor DNA is detected, recurrence is rapid.
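
The sensitivity and specificity quoted in this abstract follow from a standard 2x2 table. The sketch below is only illustrative: the counts of 4 detected and 13 relapsed patients come from the abstract, but the number of non-relapsing patients (`tn`) is an assumption, since the abstract does not state it. With zero false positives, specificity is 100% for any positive `tn`.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of relapsing patients that the test flagged."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of non-relapsing patients that the test left unflagged."""
    return tn / (tn + fp)


tp = 4        # ctDNA detected and patient relapsed (from the abstract)
fn = 13 - 4   # relapsed, but ctDNA not detected (from the abstract)
fp = 0        # ctDNA detected without relapse: none reported
tn = 20       # ASSUMPTION: hypothetical count of non-relapsing patients

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
```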

3.
BMC Genomics ; 15 Suppl 5: S7, 2014.
Article in English | MEDLINE | ID: mdl-25082147

ABSTRACT

BACKGROUND: High-throughput RNA sequencing (RNA-Seq) can generate whole transcriptome information at the single transcript level, providing a powerful tool with multiple interrelated applications including transcriptome reconstruction and quantification. The sequences of novel transcripts can be reconstructed from deep RNA-Seq data, but this is computationally challenging due to sequencing errors, uneven coverage of expressed transcripts, and the need to distinguish between highly similar transcripts produced by alternative splicing. Another challenge in transcriptomic analysis comes from the ambiguities in mapping reads to transcripts. RESULTS: We present MaLTA, a method for simultaneous transcriptome assembly and quantification from Ion Torrent RNA-Seq data. Our approach explores transcriptome structure and incorporates a maximum likelihood model into the assembly and quantification procedure. A new version of the IsoEM algorithm suitable for Ion Torrent RNA-Seq reads is used to accurately estimate transcript expression levels. The MaLTA-IsoEM tool is publicly available at: http://alan.cs.gsu.edu/NGS/?q=malta CONCLUSIONS: Experimental results on both synthetic and real datasets show that Ion Torrent RNA-Seq data can be successfully used for transcriptome analyses. Experimental results suggest increased transcriptome assembly and quantification accuracy of the MaLTA-IsoEM solution compared to existing state-of-the-art approaches.


Subjects
High-Throughput Nucleotide Sequencing/methods; Sequence Analysis, RNA/methods; Transcriptome; Algorithms; Alternative Splicing; Humans; Likelihood Functions; Sequence Alignment; Software
4.
PLoS One ; 6(4): e18565, 2011 Apr 12.
Article in English | MEDLINE | ID: mdl-21533272

ABSTRACT

Filamentous marine cyanobacteria are extraordinarily rich sources of structurally novel, biomedically relevant natural products. To understand their biosynthetic origins as well as produce increased supplies and analog molecules, access to the clustered biosynthetic genes that encode the assembly enzymes is necessary. Complicating these efforts is the universal presence of heterotrophic bacteria in the cell wall and sheath material of cyanobacteria obtained from the environment and those grown in uni-cyanobacterial culture. Moreover, the high similarity in genetic elements across disparate secondary metabolite biosynthetic pathways renders current gene cluster targeting strategies imprecise and contributes sequence complexity resulting in partial genome coverage. Thus, it was necessary to use a dual-method approach of single-cell genomic sequencing based on multiple displacement amplification (MDA) and metagenomic library screening. Here, we report the identification of the putative apratoxin A biosynthetic gene cluster; apratoxin A is a potent cancer cell cytotoxin with promise for medicinal applications. The roughly 58 kb biosynthetic gene cluster is composed of 12 open reading frames, has a type I modular mixed polyketide synthase/nonribosomal peptide synthetase (PKS/NRPS) organization, and features loading and off-loading domain architecture never previously described. Moreover, this work represents the first successful isolation of a complete biosynthetic gene cluster from Lyngbya bouillonii, a tropical marine cyanobacterium renowned for its production of diverse bioactive secondary metabolites.


Subjects
Bacterial Toxins/biosynthesis; Cyanobacteria/metabolism; Single-Cell Analysis; Cyanobacteria/genetics; Genome, Bacterial; Multigene Family; Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization
5.
Bioinformatics ; 26(22): 2856-62, 2010 Nov 15.
Article in English | MEDLINE | ID: mdl-20871107

ABSTRACT

MOTIVATION: In complex disorders, independently evolving locus pairs might interact to confer disease susceptibility, with only a modest effect at each locus. With genome-wide association studies on large cohorts, testing all pairs for interaction confers a heavy computational burden, and a loss of power due to large Bonferroni-like corrections. Similarly, limiting the tests to pairs that show a marginal effect at either locus also reduces power. Here, we describe an algorithm that discovers interacting locus pairs without explicitly testing all pairs, or requiring a marginal effect at each locus. The central idea is a mathematical transformation that maps 'statistical correlation between locus pairs' to 'distance between two points in a Euclidean space'. This enables the use of geometric properties to identify proximal points (correlated locus pairs), without testing each pair explicitly. For large datasets (~10^6 SNPs), this reduces the number of tests from 10^12 to 10^6, significantly reducing the computational burden, without loss of power. The speed of the test allows for correction using permutation-based tests. The algorithm is encoded in a tool called RAPID (RApid Pair IDentification) for identifying paired interactions in case-control GWAS. RESULTS: We validated RAPID with extensive tests on simulated and real datasets. On simulated models of interaction, RAPID easily identified pairs with small marginal effects. On the benchmark disease datasets from the Wellcome Trust Case Control Consortium, RAPID ran in about 1 CPU-hour per dataset, and identified many significant interactions. In many cases, the interacting loci were known to be important for the disease, but were not individually associated in the genome-wide scan. AVAILABILITY: http://bix.ucsd.edu/projects/rapid.


Subjects
Genome-Wide Association Study/methods; Genomics/methods; Software; Databases, Genetic; Gene Expression; Polymorphism, Single Nucleotide
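
The RAPID abstract above does not spell out its transformation, so the following is not RAPID's mapping. It is only a sketch of the textbook identity that makes such a mapping possible: once two vectors are centered and scaled to unit length, their squared Euclidean distance equals 2 - 2r, where r is their Pearson correlation. Strongly correlated vectors therefore become nearby points. The genotype vectors are hypothetical.

```python
import math


def standardize(v):
    """Center v and scale it to unit Euclidean norm."""
    mean = sum(v) / len(v)
    centered = [x - mean for x in v]
    norm = math.sqrt(sum(c * c for c in centered))
    return [c / norm for c in centered]


def pearson(a, b):
    """Pearson correlation as the dot product of standardized vectors."""
    return sum(x * y for x, y in zip(standardize(a), standardize(b)))


def squared_distance(a, b):
    """Squared Euclidean distance between the standardized vectors."""
    return sum((x - y) ** 2 for x, y in zip(standardize(a), standardize(b)))


# Hypothetical minor-allele counts (0/1/2) for two SNPs across 8 individuals.
snp_a = [0, 1, 2, 1, 0, 2, 1, 0]
snp_b = [0, 1, 2, 2, 0, 1, 1, 0]

r = pearson(snp_a, snp_b)
d2 = squared_distance(snp_a, snp_b)
print(f"r = {r:.4f}, squared distance = {d2:.4f}, 2 - 2r = {2 - 2 * r:.4f}")
```

Because distance decreases monotonically as correlation increases, a nearest-neighbor search over points can stand in for an all-pairs correlation scan.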
6.
Genome Res ; 19(2): 336-46, 2009 Feb.
Article in English | MEDLINE | ID: mdl-19056694

ABSTRACT

Increasing read length is currently viewed as the crucial condition for fragment assembly with next-generation sequencing technologies. However, introducing mate-paired reads (separated by a gap of length, GapLength) opens a possibility to transform short mate-pairs into long mate-reads of length approximately GapLength, and thus raises the question as to whether the read length (as opposed to GapLength) even matters. We describe a new tool, EULER-USR, for assembling mate-paired short reads and use it to analyze the question of whether the read length matters. We further complement the ongoing experimental efforts to maximize read length by a new computational approach for increasing the effective read length. While the common practice is to trim the error-prone tails of the reads, we present an approach that substitutes trimming with error correction using repeat graphs. An important and counterintuitive implication of this result is that one may extend sequencing reactions that degrade with length "past their prime" to where the error rate grows above what is normally acceptable for fragment assembly.


Subjects
Base Sequence/physiology; Chromosome Mapping/methods; Sequence Alignment/methods; Sequence Analysis, DNA/methods; Algorithms; Base Pairing/physiology; Bias; Computational Biology/methods; Escherichia coli/genetics; Genetic Markers; Genome, Bacterial; Models, Biological
7.
Article in English | MEDLINE | ID: mdl-18451440

ABSTRACT

Emerging microarray technologies allow affordable typing of very long genome sequences. A key challenge in analyzing such a huge amount of data is scalable and accurate computational inference of haplotypes (i.e., splitting each genotype into a pair of corresponding haplotypes). In this paper, we first phase genotypes consisting of only two SNPs using genotype frequencies adjusted to the random mating model and then extend phasing of two-SNP genotypes to phasing of complete genotypes using maximum spanning trees. The runtime of the proposed 2SNP algorithm is O(nm(n + log m)), where n and m are the numbers of genotypes and SNPs, respectively, and it can handle genotypes spanning entire chromosomes in a matter of hours. On datasets across 23 chromosomal regions from HapMap [11], 2SNP is several orders of magnitude faster than GERBIL and PHASE while matching them in quality measured by the number of correctly phased genotypes, single-site and switching errors. For example, the 2SNP software phases an entire chromosome (10^5 SNPs from HapMap) for 30 individuals in 2 hours with an average switching error of 7.7%. We have also enhanced the 2SNP algorithm to phase family trio data and compared it with four other well-known phasing methods on simulated data from [15]. 2SNP is much faster than all of them while losing in quality only to PHASE. The 2SNP software is publicly available at http://alla.cs.gsu.edu/~software/2SNP.


Subjects
Algorithms; Polymorphism, Single Nucleotide; Computational Biology; Databases, Nucleic Acid; Female; Genotype; Haplotypes; Humans; Male; Models, Genetic; Oligonucleotide Array Sequence Analysis/statistics & numerical data; Software
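
The 2SNP abstract above says that pairwise phasing is extended to whole genotypes with maximum spanning trees, but it does not give the edge weights or the construction. As a generic illustration only, here is Kruskal's algorithm run on edges sorted by descending weight. Treating nodes as SNPs and weights as a pairwise phasing-confidence score is an assumption, and the numbers are invented.

```python
def maximum_spanning_tree(n_nodes, edges):
    """Kruskal's algorithm, taking the heaviest edges first.

    edges is a list of (weight, u, v) tuples; returns the chosen
    edges as (u, v, weight) in the order they were added.
    """
    parent = list(range(n_nodes))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for weight, u, v in sorted(edges, reverse=True):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # edge joins two components, so keep it
            parent[root_u] = root_v
            tree.append((u, v, weight))
    return tree


# Hypothetical confidence scores between four SNPs (nodes 0..3).
edges = [(0.9, 0, 1), (0.4, 0, 2), (0.8, 1, 2), (0.7, 2, 3), (0.2, 1, 3)]
tree = maximum_spanning_tree(4, edges)
print(tree)
```

A spanning tree keeps one path between any two SNPs, so pairwise decisions along the heaviest edges can be chained into a single consistent assignment.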
8.
J Comput Biol ; 15(1): 81-90, 2008.
Article in English | MEDLINE | ID: mdl-18199025

ABSTRACT

Accessibility of high-throughput genotyping technology allows genome-wide association studies for common complex diseases. This paper addresses two challenges commonly facing such studies: (i) searching an enormous number of possible gene interactions and (ii) finding reproducible associations. These challenges have traditionally been addressed in statistics, while here we apply computational approaches: optimization and cross-validation. A complex risk factor is modeled as a subset of single nucleotide polymorphisms (SNPs) with specified alleles, and the optimization formulation asks for the one with the maximum odds ratio. To measure and compare the ability of search methods to find reproducible risk factors, we propose to apply a cross-validation scheme usually used for prediction validation. We have applied and cross-validated known search methods with proposed enhancements on real case-control studies for several diseases (Crohn's disease, autoimmune disorder, tick-borne encephalitis, lung cancer, and rheumatoid arthritis). The proposed methods compare favorably to exhaustive search: they are faster, find statistically significant risk factors more frequently, and have a significantly higher leave-half-out cross-validation rate.


Subjects
Case-Control Studies; Computational Biology/methods; Genetic Predisposition to Disease; Confidence Intervals; Databases, Genetic; Humans; Mutation/genetics; Odds Ratio; Polymorphism, Single Nucleotide/genetics; Reproducibility of Results; Software
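
The abstract above models a risk factor as a set of SNPs with specified alleles and scores it by its odds ratio. A minimal sketch of that scoring step follows. The data layout (genotype tuples, a dict from SNP index to required value) and the toy data are assumptions made for illustration, not the paper's implementation.

```python
import math


def odds_ratio(cases, controls, factor):
    """Odds ratio of carrying `factor` among cases versus controls.

    cases and controls are lists of genotype tuples; factor maps a
    SNP index to the value required at that SNP (hypothetical encoding).
    """
    def carries(genotype):
        return all(genotype[i] == value for i, value in factor.items())

    a = sum(carries(g) for g in cases)     # cases with the factor
    b = len(cases) - a                     # cases without it
    c = sum(carries(g) for g in controls)  # controls with the factor
    d = len(controls) - c                  # controls without it

    if b * c == 0:
        # Undefined when the numerator is also zero, unbounded otherwise.
        return math.inf if a * d > 0 else math.nan
    return (a * d) / (b * c)


# Toy data: three SNPs per individual.
cases = [(1, 2, 0), (1, 2, 1), (1, 2, 2), (0, 2, 0)]
controls = [(1, 2, 0), (0, 0, 1), (0, 1, 1), (1, 0, 2)]
factor = {0: 1, 1: 2}  # SNP 0 must equal 1 and SNP 1 must equal 2

ratio = odds_ratio(cases, controls, factor)
print(ratio)
```

A search method would evaluate this score over many candidate factors, which is why the abstract pairs it with cross-validation: the highest ratio on one sample can easily be a chance finding.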
9.
Genet Epidemiol ; 31 Suppl 1: S51-60, 2007.
Article in English | MEDLINE | ID: mdl-18046765

ABSTRACT

Genome-wide association studies using thousands to hundreds of thousands of single nucleotide polymorphism (SNP) markers and region-wide association studies using a dense panel of SNPs are already in use to identify disease susceptibility genes and to predict disease risk in individuals. Because these tasks become increasingly important, three different data sets were provided for the Genetic Analysis Workshop 15, thus allowing examination of various novel and existing data mining methods for both classification and identification of disease susceptibility genes, gene by gene or gene by environment interaction. The approach most often applied in this presentation group was random forests because of its simplicity, elegance, and robustness. It was used for prediction and for screening for interesting SNPs in a first step. The logistic tree with unbiased selection approach appeared to be an interesting alternative to efficiently select interesting SNPs. Machine learning, specifically ensemble methods, might be useful as pre-screening tools for large-scale association studies because they can be less prone to overfitting, can be less computer processor time intensive, can easily include pair-wise and higher-order interactions compared with standard statistical approaches and can also have a high capability for classification. However, improved implementations that are able to deal with hundreds of thousands of SNPs at a time are required.


Subjects
Neural Networks, Computer; Polymorphism, Single Nucleotide; Genetic Predisposition to Disease; Genome, Human; Humans; Regression Analysis
10.
BMC Proc ; 1 Suppl 1: S129, 2007.
Article in English | MEDLINE | ID: mdl-18466471

ABSTRACT

We propose two new haplotype-sharing methods for identifying disease loci: the haplotype sharing statistic (HSS), which compares length of shared haplotypes between cases and controls, and the CROSS test, which tests whether a case and a control haplotype show less sharing than two random haplotypes. The significance of the HSS is determined using a variance estimate from the theory of U-statistics, whereas the significance of the CROSS test is estimated from a sequential randomization procedure. Both methods are fast and hence practical, even for whole-genome screens with high marker densities. We analyzed data sets of Problems 2 and 3 of Genetic Analysis Workshop 15 and compared HSS and CROSS to conventional association methods. Problem 2 provided a data set of 2300 single-nucleotide polymorphisms (SNPs) in a 10-Mb region of chromosome 18q, which had shown linkage evidence for rheumatoid arthritis. The CROSS test detected a significant association at approximately position 4407 kb. This was supported by single-marker association and HSS. The CROSS test outperformed them both with respect to significance level and signal-to-noise ratio. A 20-kb candidate region could be identified. Problem 3 provided a simulated 10 k SNP data set covering the whole genome. Three known candidate regions for rheumatoid arthritis were detected. Again, the CROSS test gave the most significant results. Furthermore, both the HSS and the CROSS showed better fine-mapping accuracy than straightforward haplotype association. In conclusion, haplotype sharing methods, particularly the CROSS test, show great promise for identifying disease gene loci.

11.
Bioinformatics ; 22(3): 371-3, 2006 Feb 01.
Article in English | MEDLINE | ID: mdl-16287933

ABSTRACT

The 2SNP software package implements a new, very fast, scalable algorithm for haplotype inference based on genotype statistics collected only for pairs of SNPs. This software can be used for comparatively accurate phasing of a large number of long genome sequences, e.g. those obtained from DNA arrays. As input, 2SNP takes a genotype matrix and outputs the corresponding haplotype matrix. On datasets across 79 regions from HapMap, 2SNP is several orders of magnitude faster than GERBIL and PHASE while matching them in quality measured by the number of correctly phased genotypes, single-site and switching errors. For example, 2SNP requires 41 s on a Pentium 4 2 GHz processor to phase 30 genotypes with 1381 SNPs (ENm010.7p15:2 data from HapMap), whereas GERBIL and PHASE require more than a week and make no fewer errors than 2SNP.


Subjects
Chromosome Mapping/methods; DNA Mutational Analysis/methods; Haplotypes/genetics; Polymorphism, Single Nucleotide/genetics; Sequence Alignment/methods; Sequence Analysis, DNA/methods; Software; Algorithms
12.
Conf Proc IEEE Eng Med Biol Soc ; 2006: 5802-5, 2006.
Article in English | MEDLINE | ID: mdl-17946721

ABSTRACT

Recent improvements in the accessibility of high-throughput genotyping have brought a great deal of attention to genome-wide association studies for common complex diseases. Although such diseases can be caused by multi-loci interactions, locus-by-locus studies prevail. Recently, two-loci analysis has been shown to be promising (Marchini et al, 2005), and multi-loci analysis is expected to find even deeper disease-associated interactions. Unfortunately, an exhaustive search among all possible corresponding multi-markers can be infeasible even for a small number of SNPs, let alone the complete genome. In this paper we first propose to extract informative (indexing) SNPs that can be used to reconstruct all SNPs almost without loss (He and Zelikovsky, 2006). In the reduced set of SNPs, we then propose to apply a novel combinatorial method for finding disease-associated multi-SNP combinations (MSCs). Our experimental study shows that the proposed methods are able to find MSCs whose disease association is statistically significant even after multiple testing adjustment. For the (Daly et al, 2001) data we found a few unphased MSCs associated with Crohn's disease with a multiple-testing-adjusted p-value below 0.05, while no single SNP or pair of SNPs shows any significant association. For the (Ueda et al, 2003) data we found a few new unphased and phased MSCs associated with autoimmune disorder.


Subjects
Computational Biology/methods; Genetic Diseases, Inborn/genetics; Polymorphism, Single Nucleotide; Alleles; Chromosomes; Genome, Human; Genotype; Haplotypes; Humans; Models, Statistical; Software
13.
Article in English | MEDLINE | ID: mdl-17282153

ABSTRACT

Recent improvements in the accessibility of high-throughput genotyping have brought a great deal of attention to disease association and susceptibility studies. This paper explores the possibility of applying combinatorial methods to disease susceptibility prediction. The proposed combinatorial methods as well as standard statistical methods are applied to publicly available genotype data on Crohn's disease and autoimmune disorders for predicting susceptibility to these diseases. The quality of a susceptibility prediction algorithm is assessed using leave-one-out and leave-many-out tests: the disease status of one or several individuals, initially hidden from the algorithm, is predicted and compared to their actual disease status. The best prediction rate achieved by the proposed algorithms is 77.78% for Crohn's disease and 64.99% for autoimmune disorders.
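
The leave-one-out test described in this abstract can be sketched as follows. The predictor used here, a nearest neighbor by Hamming distance, is a stand-in chosen for brevity and is not one of the paper's combinatorial methods. The genotypes and disease labels are invented toy data.

```python
def hamming(a, b):
    """Number of SNP positions at which two genotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def predict_nearest(train, query):
    """Return the status of the training genotype closest to query."""
    return min(train, key=lambda item: hamming(item[0], query))[1]


def leave_one_out_accuracy(samples):
    """Hide each individual in turn, predict its status from the rest,
    and return the fraction of correct predictions."""
    correct = 0
    for i, (genotype, status) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]
        if predict_nearest(train, genotype) == status:
            correct += 1
    return correct / len(samples)


# Toy (genotype, disease status) pairs; 1 = case, 0 = control.
samples = [
    ((0, 0, 1, 1), 1),
    ((0, 0, 1, 0), 1),
    ((2, 2, 0, 0), 0),
    ((2, 1, 0, 0), 0),
    ((0, 1, 1, 1), 1),
    ((2, 2, 0, 1), 0),
]

accuracy = leave_one_out_accuracy(samples)
print(f"leave-one-out accuracy: {accuracy:.2%}")
```

A leave-many-out variant would hide several individuals per round instead of one; the bookkeeping is otherwise the same.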

14.
Int J Bioinform Res Appl ; 1(2): 221-9, 2005.
Article in English | MEDLINE | ID: mdl-18048132

ABSTRACT

Although many phasing methods exist for unrelated adults or pedigrees, phasing and missing data recovery for family trio data lag behind. This paper is an attempt to fill this gap by considering the following problem. Given a set of genotypes partitioned into family trios, find for each trio a quartet of parent/offspring haplotypes explaining the trio without recombinations and recovering the SNP values missing from the given genotype data. Our contributions include: formulating the pure-parsimony trio phasing without recombinations and the trio missing data recovery problems; proposing new greedy and integer linear programming based solution methods; and extensive experimental validation of the proposed methods showing an advantage over previously known methods.


Subjects
Genotype; Haplotypes; Humans; Models, Genetic; Pedigree