Results 1 - 20 of 22
1.
Hum Genomics ; 17(1): 64, 2023 Jul 15.
Article in English | MEDLINE | ID: mdl-37454130

ABSTRACT

BACKGROUND: Female breast cancer remains the second leading cause of cancer-related death in the USA. Heterogeneity in tumor morphology, both across the cohort and within patients, can lead to unpredictable therapy resistance, metastasis, and clinical outcome. Hence, supplementing classic pathological markers with intrinsic tumor molecular markers can help identify novel molecular subtypes and discover actionable biomarkers. METHODS: We conducted a large multi-institutional genomic analysis of paired normal and tumor samples from breast cancer patients to profile the complex genomic architecture of breast tumors. Long-term patient follow-up, therapeutic regimens, and treatment response for this cohort are documented using the Breast Cancer Collaborative Registry. The majority of the patients in this study were at tumor stage 1 (51.4%) or stage 2 (36.3%) at the time of diagnosis. Whole-exome sequencing data from 554 patients were used for mutational profiling and identifying cancer drivers. RESULTS: We identified 54 tumors with at least 1000 mutations and 185 tumors with fewer than 100 mutations. Tumor mutational burden varied across the classified subtypes, and the top ten mutated genes were MUC4, MUC16, PIK3CA, TTN, TP53, NBPF10, NBPF1, CDC27, AHNAK2, and MUC2. Patients were classified based on seven biological and tumor-specific parameters (grade, stage, hormone receptor status, histological subtype, Ki67 expression, lymph node status, and race), and mutational profiles were compared across the different subtypes. Mutual exclusion of mutations in PIK3CA and TP53 was pronounced across different tumor grades. Cancer drivers specific to each subtype include TP53, PIK3CA, CDC27, CDH1, STK39, CBFB, MAP3K1, and GATA3, and mutations associated with patient survival were identified in our cohort.
CONCLUSIONS: This extensive study has revealed tumor burden, driver genes, co-occurrence, mutual exclusivity, and survival effects of mutations on a US Midwestern breast cancer cohort, paving the way for developing personalized therapeutic strategies.


Subjects
Breast Neoplasms , Female , Humans , Breast Neoplasms/genetics , Breast Neoplasms/pathology , Prognosis , Mutation , Biomarkers, Tumor/genetics , Class I Phosphatidylinositol 3-Kinases/genetics
2.
Genomics ; 112(6): 3943-3950, 2020 11.
Article in English | MEDLINE | ID: mdl-32621856

ABSTRACT

Following Hardy-Weinberg disequilibrium (HWD), which occurs at a single locus, and linkage disequilibrium (LD), which occurs between two loci across generations, we propose a third genetic disequilibrium in a population: recombination disequilibrium (RD). RD is a measure of crossover interference among multiple loci in a random-mating population. In natural populations, besides recombination interference, RD may also be due to selection, mutation, gene conversion, drift, and/or migration. Therefore, like LD, RD will also reflect the history of natural selection and mutation. In breeding populations, RD results purely from recombination interference and hence can be used to build, or to evaluate and correct, a linkage map. Practical examples from F2, testcross, and human populations demonstrate that RD is useful for measuring recombination interference between two short intervals and for evaluating linkage maps. As with LD, RD will be important for studying genetic mapping, association of haplotypes with disease, plant breeding, and population history.
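
For orientation, the two-locus LD that this abstract takes as its starting point can be computed as sketched below. This is only the textbook coefficient D = p_AB - p_A * p_B and its normalized form r^2. It is not the RD statistic, which the abstract does not define. The function name and the example frequencies are hypothetical.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, r_squared) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    # D measures the departure of the haplotype frequency from independence.
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    # r^2 rescales D^2 to the range [0, 1].
    return d, d * d / denom


# Hypothetical frequencies, chosen only for illustration.
d, r2 = linkage_disequilibrium(p_ab=0.4, p_a=0.5, p_b=0.5)
print(f"D = {d:.3f}, r^2 = {r2:.3f}")
```

When the haplotype frequency equals the product of the allele frequencies, D is zero and the two loci are in linkage equilibrium.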


Subjects
Recombination, Genetic , Genome, Human , Humans , Linkage Disequilibrium , Selection, Genetic
3.
BMC Bioinformatics ; 18(Suppl 11): 404, 2017 Oct 03.
Article in English | MEDLINE | ID: mdl-28984187

ABSTRACT

BACKGROUND: Dominant markers in an F2 population or a hybrid population carry much less linkage information in repulsion phase than in coupling phase. Linkage analysis produces two separate complementary marker linkage maps that are of little use in disease association analysis and breeding. There is a need to develop efficient statistical methods and computational algorithms to construct or merge complete dominant-marker linkage maps. The key to doing so is to efficiently estimate recombination fractions between dominant markers in repulsion phase. RESULTS: We proposed an expectation least square (ELS) algorithm and binomial analysis of three-point gametes (BAT) for estimating gamete frequencies from F2 dominant and codominant marker data, respectively. The results obtained from simulated and real genotype datasets showed that the ELS algorithm accurately estimated gamete frequencies and outperformed the EM algorithm in estimating recombination fractions between dominant loci and in recovering true linkage maps of 6 dominant loci in coupling and unknown linkage phases. Our BAT method also had smaller variances in estimation of two-point recombination fractions than the EM algorithm. CONCLUSION: ELS is a powerful method for accurate estimation of gamete frequencies in a dominant three-locus system in an F2 population, and BAT is a computationally efficient and fast method for estimating frequencies of three-point codominant gametes.


Subjects
Crosses, Genetic , Recombination, Genetic , Statistics as Topic/methods , Algorithms , Animals , Computer Simulation , Female , Genes, Dominant , Genetic Linkage , Genetic Loci , Genetic Markers , Least-Squares Analysis , Male , Mice , Models, Genetic
4.
Nucleic Acids Res ; 43(15): e96, 2015 Sep 03.
Article in English | MEDLINE | ID: mdl-25953852

ABSTRACT

Most mammalian genes have mRNA variants due to alternative promoter usage, alternative splicing, and alternative cleavage and polyadenylation. Expression of alternative RNA isoforms has been found to be associated with tumorigenesis, proliferation and differentiation. Detection of condition-associated transcription variation requires association methods. Traditional association methods such as Pearson chi-square test and Fisher Exact test are single test methods and do not work on count data with replicates. Although the Cochran Mantel Haenszel (CMH) approach can handle replicated count data, our simulations showed that multiple CMH tests still had very low power. To identify condition-associated variation of transcription, we here proposed a ranking analysis of chi-squares (RAX2) for large-scale association analysis. RAX2 is a nonparametric method and has accurate and conservative estimation of FDR profile. Simulations demonstrated that RAX2 performs well in finding condition-associated transcription variants. We applied RAX2 to primary T-cell transcriptomic data and identified 1610 (16.3%) tags associated in transcription with immune stimulation at FDR < 0.05. Most of these tags also had differential expression. Analysis of two and three tags within genes revealed that under immune stimulation short RNA isoforms were preferably used.


Subjects
Alternative Splicing , Gene Expression Profiling/methods , Polyadenylation , CD4-Positive T-Lymphocytes/metabolism , Cell Line , Chi-Square Distribution , Genetic Variation , Genomics/methods , Humans , RNA Isoforms/chemistry , RNA Isoforms/metabolism , Statistics, Nonparametric , Transcription, Genetic
5.
Bioinformatics ; 30(14): 2018-25, 2014 Jul 15.
Article in English | MEDLINE | ID: mdl-24632499

ABSTRACT

The 'omic' data, such as genomic, transcriptomic, proteomic, and single nucleotide polymorphism data, have been growing rapidly. Omic data are large-scale, high-throughput data. Such data challenge traditional statistical methodologies and require multiple tests. Several multiple-testing procedures, such as the Bonferroni procedure, the Benjamini-Hochberg (BH) procedure, and the Westfall-Young procedure, have been developed; some control the family-wise error rate and the others control the false discovery rate (FDR). These procedures are valid in some cases but cannot be applied to all types of large-scale data. To address this statistically challenging problem in the analysis of omic data, we propose a general method for generating a set of multiple-testing procedures. This method is based on the BH theorems. By choosing a C-value, one can realize a specific multiple-testing procedure. For example, by setting C = 1.22, our method produces the BH procedure. With C < 1.22, our method generates procedures that weakly control FDR, and with C > 1.22, the procedures strongly control FDR. Those with C = G (number of genes or tests) and C = 0 are, respectively, the Bonferroni procedure and the single-testing procedure. These are the two extreme procedures in this family. To let one choose an appropriate multiple-testing procedure in practice, we develop an algorithm by which FDR can be correctly and reliably estimated. Simulated results show that our method works well for an accurate estimation of FDR in various scenarios, and we illustrate the applications of our method with three real datasets. AVAILABILITY AND IMPLEMENTATION: Our program is implemented in Matlab and is available upon request.
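
The two classical procedures named in this abstract can be sketched as follows. This is a minimal illustration of the standard Bonferroni correction and the standard BH step-up rule. It does not implement the authors' C-value family or their FDR estimation algorithm, neither of which is specified here. Function names and the example p-values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H_i when p_i <= alpha / m (controls the family-wise error rate)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Standard BH step-up rule (controls the false discovery rate).

    Returns a list of booleans in the original order of p_values.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, chosen only for illustration.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print("Bonferroni rejections:", sum(bonferroni(p_values)))
print("BH rejections:", sum(benjamini_hochberg(p_values)))
```

On the same p-values, BH rejects at least as many hypotheses as Bonferroni, which matches the abstract's description of Bonferroni as an extreme member of the family.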


Subjects
Algorithms , Gene Expression Profiling/methods , Arabidopsis/genetics , Data Interpretation, Statistical , Humans , Leukemia/genetics , Oligonucleotide Array Sequence Analysis
6.
Int J Biostat ; 19(1): 1-19, 2023 05 01.
Article in English | MEDLINE | ID: mdl-35749155

ABSTRACT

It has been reported that about half of biological discoveries are irreproducible. These irreproducible discoveries were partially attributed to poor statistical power. Poor power is mainly due to small sample sizes. However, in molecular biology and medicine, because biological resources and budgets are limited, most molecular biological experiments have been conducted with small samples. The two-sample t-test controls bias by using a degree of freedom. However, this also implies that the t-test has low power in small samples. A discovery found with low statistical power is likely to have poor reproducibility. So, increasing statistical power is not a feasible way to enhance reproducibility in small-sample experiments. An alternative is to reduce the type I error rate. To do so, a so-called tα-test was developed. Both theoretical analysis and a simulation study demonstrate that the tα-test substantially outperforms the t-test. However, the tα-test reduces to the t-test when sample sizes are over 15. Large-scale simulation studies and real experimental data show that the tα-test significantly reduced the type I error rate compared to the t-test and the Wilcoxon test in small-sample experiments. The tα-test had almost the same empirical power as the t-test. The null p-value density distribution explains why the tα-test had a much lower type I error rate than the t-test. One real experimental dataset provides a typical example showing that the tα-test outperforms the t-test, and a microarray dataset showed that the tα-test had the best performance among five statistical methods. In addition, the density distribution and cumulative probability function of the tα-statistic are derived mathematically, and the theoretical and observed distributions match well.


Subjects
Models, Statistical , Reproducibility of Results , Computer Simulation , Likelihood Functions , Sample Size
7.
Front Genet ; 14: 1295327, 2023.
Article in English | MEDLINE | ID: mdl-38292437

ABSTRACT

Haplotype-based association analysis has several advantages over single-SNP association analysis. However, to date, no haplotype-disease association analysis has excluded recombination interference among multiple loci, and hence some results might be confounded by recombination interference. A method for testing association of sister haplotypes with a complex disease, based on recombination disequilibrium (RD), is presented. Sister haplotypes can be determined by translating the notation of DNA base haplotypes into the notation of genetic genotypes. Sister haplotypes provide haplotype pairs available for haplotype-disease association analysis. After performing RD tests in control and case cohorts, a two-by-two contingency table can be constructed from the sister haplotype pair and the case-control pair. With this standard two-by-two table, one can perform a classical chi-square test to detect statistical haplotype-disease association. Applying this method to a haplotype dataset of Alzheimer disease (AD), association of sister haplotypes containing ApoE3/4 with risk for AD was identified under no RD. Haplotypes within gene IL-13 were not associated with risk for breast cancer in the case of no RD, and no association of haplotypes in gene IL-17A with risk for coronary artery disease was detected without RD. The previously reported associations of haplotypes within these genes with risk for these diseases might be due to strong RD and/or inappropriate haplotype pairs.
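
The final step described in this abstract, a classical chi-square test on a two-by-two table, can be sketched as follows. This is a minimal illustration of the standard Pearson chi-square test without continuity correction. It does not cover the RD tests or the construction of sister haplotype pairs. The function name, the table layout (rows = the two sister haplotypes, columns = cases and controls), and the counts are all hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 degree of freedom) for the table [[a, b], [c, d]].

    Returns (statistic, p_value). No continuity correction is applied.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    # Shortcut formula for a 2x2 table: n * (ad - bc)^2 / (product of margins).
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: haplotype 1 seen in 30 cases and 10 controls,
# its sister haplotype seen in 20 cases and 40 controls.
stat, p = chi_square_2x2(30, 10, 20, 40)
print(f"chi-square = {stat:.3f}, p = {p:.2e}")
```

The closed-form p-value relies on the fact that a chi-square variable with one degree of freedom is the square of a standard normal variable, so no statistics library is needed.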

8.
Genomics ; 97(1): 58-68, 2011 Jan.
Article in English | MEDLINE | ID: mdl-20888900

ABSTRACT

Development of statistical methods has become necessary for large-scale correlation analysis of current "omic" data. We propose ranking analysis of correlation coefficients (RAC), which transforms the correlation matrix into a correlation vector and applies a "locally ranking" strategy that significantly reduces computational complexity and load. RAC gives an estimate of the null correlation distribution and an estimator of the false discovery rate (FDR) for finding gene pairs with correlated expression, obtained by comparing the ranked observed correlation coefficients with the ranked estimated ones at a given threshold level. The simulated and real data show that the estimated null correlation distribution is exactly the same as the true one and that the FDR estimator works well in various scenarios. When RAC was applied to the null dataset, no gene pairs were found, but in the human cancer dataset, 837 gene pairs were found to have positively correlated expression variations at FDR≤5%. RAC performs well in multiple conditions (classes), each with 3 or more replicate observations.
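
The first step mentioned in this abstract, turning a correlation matrix into a ranked correlation vector, can be sketched as follows. This is only an illustration of flattening the G(G-1)/2 unique gene pairs and sorting them by Pearson correlation. It does not implement the "locally ranking" strategy, the null distribution estimate, or the FDR estimator, none of which are specified here. Function names and the toy expression profiles are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("profiles must not be constant")
    return sxy / math.sqrt(sxx * syy)


def correlation_vector(expr):
    """Flatten the upper triangle of the gene-by-gene correlation matrix.

    expr: list of per-gene expression profiles of equal length.
    Returns a list of (gene_i, gene_j, r) tuples with i < j,
    sorted from the largest to the smallest correlation.
    """
    pairs = []
    for i in range(len(expr)):
        for j in range(i + 1, len(expr)):
            pairs.append((i, j, pearson(expr[i], expr[j])))
    pairs.sort(key=lambda t: t[2], reverse=True)
    return pairs


# Toy profiles for three genes across four samples (illustrative only).
expr = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
for i, j, r in correlation_vector(expr):
    print(f"genes ({i}, {j}): r = {r:+.2f}")
```

Working with the vector of unique pairs avoids storing both halves of a symmetric matrix, which matters when the number of genes is large.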


Subjects
Computational Biology/methods , Genes, Neoplasm , Genomics/methods , Gene Expression , Genome, Human , Humans , Neoplasms
9.
Genomics ; 98(5): 390-9, 2011 Nov.
Article in English | MEDLINE | ID: mdl-21741470

ABSTRACT

Receiver operating characteristic (ROC) analysis has been widely used to evaluate statistical methods, but a critical limitation is that ROC cannot evaluate a method's estimation of the false discovery rate (FDR), and hence the area under the curve, used as a criterion, cannot tell us whether a statistical method is conservative. To address this issue, we propose an alternative criterion, work efficiency. Work efficiency is defined as the product of the power and the degree of conservativeness of a statistical method. We conducted large-scale simulation comparisons among the optimizing discovery procedure (ODP), the Bonferroni (B-) procedure, Local FDR (Localfdr), ranking analysis of the F-statistics (RAF), the Benjamini-Hochberg (BH-) procedure, and significance analysis of microarray data (SAM). The results show that ODP, SAM, and the B-procedure perform with low efficiencies, while the BH-procedure, RAF, and Localfdr work with higher efficiency. ODP and SAM have the same ROC curves, but their efficiencies are significantly different.


Subjects
Data Interpretation, Statistical , Gene Expression Profiling/methods , Models, Statistical , Algorithms , Area Under Curve , Computer Simulation , Gene Expression Regulation , Oligonucleotide Array Sequence Analysis , ROC Curve
10.
Sci Rep ; 12(1): 12833, 2022 07 27.
Article in English | MEDLINE | ID: mdl-35896555

ABSTRACT

Rapid development of transcriptome sequencing technologies has resulted in a data revolution and the emergence of new approaches to study transcriptomic regulation, such as alternative splicing, alternative polyadenylation, and CRISPR knockout screening, in addition to regular gene expression. A full characterization of the transcriptional landscape of different groups of cells or tissues holds enormous potential for both basic science and clinical applications. Although many methods have been developed for differential gene expression analysis, they are all geared toward a particular type of sequencing data and fail to perform well when applied to other types of transcriptomic data. To fill this gap, we offer a negative beta binomial t-test (NBBt-test). The NBBt-test provides multiple functions to perform differential analyses of alternative splicing, polyadenylation, CRISPR knockout screening, and gene expression datasets. Both real and large-scale simulation data show superior performance of the NBBt-test, with higher efficiency and lower type I error rate and FDR, in identifying differential isoforms, differentially expressed genes, and differential CRISPR knockout screening genes at different sample sizes when compared against currently popular statistical methods. An R package implementing the NBBt-test is available for download from CRAN (https://CRAN.R-project.org/package=NBBttest).


Subjects
Alternative Splicing , Transcriptome , Gene Expression Profiling/methods , RNA-Seq , Sequence Analysis, RNA/methods , Software , Transcriptome/genetics , Exome Sequencing