1.
J Exp Clin Cancer Res ; 42(1): 231, 2023 Sep 05.
Article in English | MEDLINE | ID: mdl-37670323

ABSTRACT

BACKGROUND: Acute lymphoblastic leukemia (ALL) is the most common pediatric hematological malignancy, with ETV6::RUNX1 being the most prevalent translocation whose exact pathogenesis remains unclear. IGF2BP1 (Insulin-like Growth Factor 2 Binding Protein 1) is an oncofetal RNA binding protein seen to be specifically overexpressed in ETV6::RUNX1 positive B-ALL. In this study, we have studied the mechanistic role of IGF2BP1 in leukemogenesis and its synergism with the ETV6::RUNX1 fusion protein. METHODS: Gene expression was analyzed from patient bone marrow RNA using Real Time RT-qPCR. Knockout cell lines were created using CRISPR-Cas9 based lentiviral vectors. RNA-Seq and RNA Immunoprecipitation sequencing (RIP-Seq) after IGF2BP1 pulldown were performed using the Illumina platform. Mouse experiments were done by retroviral overexpression of donor HSCs followed by lethal irradiation of recipients using a bone marrow transplant model. RESULTS: We observed specific overexpression of IGF2BP1 in ETV6::RUNX1 positive patients in an Indian cohort of pediatric ALL (n=167) with a positive correlation with prednisolone resistance. IGF2BP1 expression was essential for tumor cell survival in multiple ETV6::RUNX1 positive B-ALL cell lines. Integrated analysis of transcriptome sequencing after IGF2BP1 knockout and RIP-Seq after IGF2BP1 pulldown in Reh cell line revealed that IGF2BP1 targets encompass multiple pro-oncogenic signalling pathways including TNFα/NFκB and PI3K-Akt pathways. These pathways were also dysregulated in primary ETV6::RUNX1 positive B-ALL patient samples from our center as well as in public B-ALL patient datasets. IGF2BP1 showed binding and stabilization of the ETV6::RUNX1 fusion transcript itself. This positive feedback loop led to constitutive dysregulation of several oncogenic pathways. 
Enforced co-expression of ETV6::RUNX1 and IGF2BP1 in mouse bone marrow resulted in marrow hypercellularity, which was characterized by multi-lineage progenitor expansion and strong Ki67 positivity. This pre-leukemic phenotype confirmed their synergism in vivo. Clonal expansion of cells overexpressing both ETV6::RUNX1 and IGF2BP1 was clearly observed. These mice also developed splenomegaly, indicating extramedullary hematopoiesis. CONCLUSION: Our data suggest a combined impact of the ETV6::RUNX1 fusion protein and the RNA binding protein IGF2BP1 in activating multiple oncogenic pathways in B-ALL, which makes IGF2BP1 and these pathways attractive therapeutic targets and biomarkers.


Subject(s)
Precursor B-Cell Lymphoblastic Leukemia-Lymphoma , Precursor Cell Lymphoblastic Leukemia-Lymphoma , Animals , Mice , Core Binding Factor Alpha 2 Subunit , Mice, Knockout , Phosphatidylinositol 3-Kinases , ETS Translocation Variant 6 Protein
2.
Hum Genomics ; 17(1): 64, 2023 Jul 15.
Article in English | MEDLINE | ID: mdl-37454130

ABSTRACT

BACKGROUND: Female breast cancer remains the second leading cause of cancer-related death in the USA. The heterogeneity in the tumor morphology across the cohort and within patients can lead to unpredictable therapy resistance, metastasis, and clinical outcome. Hence, supplementing classic pathological markers with intrinsic tumor molecular markers can help identify novel molecular subtypes and the discovery of actionable biomarkers. METHODS: We conducted a large multi-institutional genomic analysis of paired normal and tumor samples from breast cancer patients to profile the complex genomic architecture of breast tumors. Long-term patient follow-up, therapeutic regimens, and treatment response for this cohort are documented using the Breast Cancer Collaborative Registry. The majority of the patients in this study were at tumor stage 1 (51.4%) and stage 2 (36.3%) at the time of diagnosis. Whole-exome sequencing data from 554 patients were used for mutational profiling and identifying cancer drivers. RESULTS: We identified 54 tumors having at least 1000 mutations and 185 tumors with less than 100 mutations. Tumor mutational burden varied across the classified subtypes, and the top ten mutated genes include MUC4, MUC16, PIK3CA, TTN, TP53, NBPF10, NBPF1, CDC27, AHNAK2, and MUC2. Patients were classified based on seven biological and tumor-specific parameters, including grade, stage, hormone receptor status, histological subtype, Ki67 expression, lymph node status, race, and mutational profiles compared across different subtypes. Mutual exclusion of mutations in PIK3CA and TP53 was pronounced across different tumor grades. Cancer drivers specific to each subtype include TP53, PIK3CA, CDC27, CDH1, STK39, CBFB, MAP3K1, and GATA3, and mutations associated with patient survival were identified in our cohort. 
CONCLUSIONS: This extensive study has revealed tumor burden, driver genes, co-occurrence, mutual exclusivity, and survival effects of mutations on a US Midwestern breast cancer cohort, paving the way for developing personalized therapeutic strategies.


Subject(s)
Breast Neoplasms , Female , Humans , Breast Neoplasms/genetics , Breast Neoplasms/pathology , Prognosis , Mutation , Biomarkers, Tumor/genetics , Class I Phosphatidylinositol 3-Kinases/genetics
3.
Int J Biostat ; 19(1): 1-19, 2023 05 01.
Article in English | MEDLINE | ID: mdl-35749155

ABSTRACT

It has been reported that about half of biological discoveries are irreproducible. This irreproducibility has been partially attributed to poor statistical power, which is largely due to small sample sizes. However, in molecular biology and medicine, because biological resources and budgets are limited, most experiments are conducted with small samples. The two-sample t-test controls bias by using degrees of freedom, but this also implies that the t-test has low power in small samples. A discovery made with low statistical power is likely to have poor reproducibility. Increasing statistical power is therefore not a feasible way to enhance reproducibility in small-sample experiments. An alternative is to reduce the type I error rate. To do so, a so-called tα-test was developed. Both theoretical analysis and a simulation study demonstrate that the tα-test substantially outperforms the t-test, although the tα-test reduces to the t-test when sample sizes exceed 15. Large-scale simulation studies and real experimental data show that the tα-test significantly reduced the type I error rate compared with the t-test and the Wilcoxon test in small-sample experiments, while having almost the same empirical power as the t-test. The density distribution of null p-values explains why the tα-test had a much lower type I error rate than the t-test. One real experimental dataset provides a typical example in which the tα-test outperforms the t-test, and a microarray dataset showed that the tα-test had the best performance among five statistical methods. In addition, the density function and cumulative distribution function of the tα-statistic are derived mathematically, and the theoretical and observed distributions match well.


Subject(s)
Models, Statistical , Reproducibility of Results , Computer Simulation , Likelihood Functions , Sample Size
4.
Front Genet ; 14: 1295327, 2023.
Article in English | MEDLINE | ID: mdl-38292437

ABSTRACT

Haplotype-based association analysis has several advantages over single-SNP association analysis. However, to date no haplotype-disease association study has excluded recombination interference among multiple loci, and hence some results might be confounded by recombination interference. An approach for testing the association of sister haplotypes with a complex disease, based on recombination disequilibrium (RD), is presented. Sister haplotypes can be determined by translating the notation of DNA base haplotypes into the notation of genetic genotypes. Sister haplotypes provide the haplotype pairs available for haplotype-disease association analysis. After performing RD tests in the control and case cohorts, a two-by-two contingency table can be constructed from the sister haplotype pair and the case-control pair. With this standard two-by-two table, one can perform the classical chi-square test to detect a statistical haplotype-disease association. Applying this method to a haplotype dataset of Alzheimer disease (AD), an association of sister haplotypes containing ApoE3/4 with risk for AD was identified under no RD. Haplotypes within the gene IL-13 were not associated with risk for breast cancer under no RD, and no association of haplotypes in the gene IL-17A with risk for coronary artery disease was detected without RD. The previously reported associations of haplotypes within these genes with risk for these diseases might be due to strong RD and/or inappropriate haplotype pairs.
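
The two-by-two chi-square step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `chi_square_2x2` and the counts are hypothetical, the RD tests that precede the table are not reproduced, and the statistic is the plain Pearson chi-square with 1 degree of freedom and no continuity correction.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 df, no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical counts: rows are cases and controls,
# columns are a haplotype and its sister haplotype.
stat, p_value = chi_square_2x2(30, 10, 20, 40)
print(f"chi-square = {stat:.3f}, p = {p_value:.2e}")
```

A small p-value here would only indicate association in the made-up table; the abstract's conclusions depend on first establishing that no RD is present in the case and control cohorts.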

5.
Sci Rep ; 12(1): 12833, 2022 07 27.
Article in English | MEDLINE | ID: mdl-35896555

ABSTRACT

Rapid development of transcriptome sequencing technologies has resulted in a data revolution and the emergence of new approaches to studying transcriptomic regulation, such as alternative splicing, alternative polyadenylation, and CRISPR knockout screening, in addition to regular gene expression. A full characterization of the transcriptional landscape of different groups of cells or tissues holds enormous potential for both basic science and clinical applications. Although many methods have been developed for differential gene expression analysis, they are all geared towards a particular type of sequencing data and fail to perform well when applied to other types of transcriptomic data. To fill this gap, we offer a negative beta binomial t-test (NBBt-test). NBBt-test provides multiple functions to perform differential analyses of alternative splicing, polyadenylation, CRISPR knockout screening, and gene expression datasets. Both real and large-scale simulated data show that NBBt-test performs better, with higher efficiency and lower type I error rate and FDR, than the currently most popular statistical methods in identifying differential isoforms, differentially expressed genes, and differential CRISPR knockout screening genes across different sample sizes. An R package implementing NBBt-test is available for download from CRAN ( https://CRAN.R-project.org/package=NBBttest ).


Subject(s)
Alternative Splicing , Transcriptome , Gene Expression Profiling/methods , RNA-Seq , Sequence Analysis, RNA/methods , Software , Transcriptome/genetics , Exome Sequencing
6.
AIMS Microbiol ; 7(2): 216-237, 2021.
Article in English | MEDLINE | ID: mdl-34250376

ABSTRACT

Gastrointestinal microflora is a key component in the maintenance of health and longevity across many species. In humans and mice, nonpathogenic viruses present in the gastrointestinal tract enhance the effects of the native bacterial microbiota. However, it is unclear whether nonpathogenic gastrointestinal viruses, such as Nora virus that infects Drosophila melanogaster, lead to similar observations. Longevity analysis of Nora virus infected (NV+) and uninfected (NV-) D. melanogaster in relation to the presence (B+) or absence (B-) of the native gut bacteria was conducted using four treatment groups: NV+/B+, NV+/B-, NV-/B+, and NV-/B-. The longevity data were tested via Kaplan-Meier analysis and demonstrated that Nora virus can be detrimental to the longevity of the organism, whereas bacterial presence is beneficial. These data led to the hypothesis that gastrointestinal bacterial composition differs between NV+ and NV- flies. To test this, NV+ and NV- virgin female flies were collected and aged for 4 days. Surface sterilization was followed by dissection of the fat body and the gastrointestinal tract, divided into crop (foregut), midgut, and hindgut. Ribosomal 16S DNA samples were sequenced to determine the bacterial communities that comprise the microflora in the gastrointestinal tract of NV+ and NV- D. melanogaster. Analysis of operational taxonomic units (OTUs) demonstrated that the NV+ samples contained more OTUs than the NV- samples and were both richer and more diverse in OTUs. When whole-body samples were compared with specific organs and organ sections, the whole fly was more diverse in OTUs, whereas the crop was the richest. These novel data are pertinent to describing where Nora virus infection may be occurring within the gastrointestinal tract, and to the continuing discussion of the relationship between persistent viral infection and bacterial interaction.
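
For readers unfamiliar with the Kaplan-Meier analysis mentioned in this abstract, the estimator can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `kaplan_meier` and the survival times are hypothetical, and the abstract does not say which software or group-comparison test the authors used.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with at least one death.

    times  -- observed time for each individual (e.g. days)
    events -- 1 if death was observed at that time, 0 if the record is censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all individuals that share the same observed time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored individuals both leave the risk set.
        at_risk -= removed
    return curve

# Hypothetical lifespans in days; the third and fifth flies are censored.
curve = kaplan_meier([5, 8, 8, 12, 15], [1, 1, 0, 1, 0])
for day, s in curve:
    print(day, round(s, 3))
```

Comparing treatment groups such as NV+/B+ and NV-/B- would additionally need a test on the resulting curves, which this sketch does not include.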

7.
Sci Rep ; 11(1): 3596, 2021 02 12.
Article in English | MEDLINE | ID: mdl-33580150

ABSTRACT

Lung cancer is the leading cause of cancer death worldwide. In particular, non-small cell lung cancer (NSCLC) has a higher mortality rate than other cancers. The high mortality rate is partially due to a lack of efficient biomarkers for detection, diagnosis and prognosis. To find highly efficient biomarkers for clinical diagnosis of NSCLC patients, we used differential gene expression and gene ontology (GO) to define a set of 26 tumor suppressor (TS) genes. The 26 TS genes were down-expressed in tumor samples in cohorts GSE18842, GSE40419, and GSE21933 and at stages 2 and 3 in GSE19804, and 15 TS genes were significantly down-expressed in stage 1 tumor samples. We used S-scores and N-scores defined in correlation networks to evaluate the positive and negative influences of these 26 TS genes on the expression of other functional genes in the four independent cohorts. We found that SASH1, STARD13, CBFA2T3 and RECK were strong TS genes, with strong accordant/discordant effects and network effects that globally impact the expression of other genes, and hence can be used as specific biomarkers for diagnosis of NSCLC. The weak TS genes EXT1, PTCH1, KLK10 and APC, which are functionally associated with only a few genes or work in a special pathway, were not detected as differentially expressed and had very small S-scores and N-scores in all collected datasets; they can be used as sensitive biomarkers for diagnosis of early cancer. Our findings are consistent with the functions of these TS genes. GSEA found that these 26 TS genes as a gene set had high enrichment scores at stages 1, 2, 3 and all stages combined.


Subject(s)
Biomarkers, Tumor/genetics , Carcinoma, Non-Small-Cell Lung/diagnosis , Carcinoma, Non-Small-Cell Lung/genetics , Genes, Tumor Suppressor , Lung Neoplasms/diagnosis , Lung Neoplasms/genetics , Adenomatous Polyposis Coli Protein/genetics , Carcinoma, Non-Small-Cell Lung/pathology , Cohort Studies , Down-Regulation/genetics , Early Detection of Cancer , Gene Expression/genetics , Humans , Kallikreins/genetics , Lung Neoplasms/pathology , N-Acetylglucosaminyltransferases/genetics , Neoplasm Staging , Patched-1 Receptor/genetics
8.
Genomics ; 112(6): 3943-3950, 2020 11.
Article in English | MEDLINE | ID: mdl-32621856

ABSTRACT

Following Hardy-Weinberg disequilibrium (HWD), which occurs at a single locus, and linkage disequilibrium (LD), which occurs between two loci across generations, we here propose a third genetic disequilibrium in a population: recombination disequilibrium (RD). RD is a measure of crossover interference among multiple loci in a random mating population. In natural populations, besides recombination interference, RD may also be due to selection, mutation, gene conversion, drift and/or migration. Therefore, like LD, RD will also reflect the history of natural selection and mutation. In breeding populations, RD results purely from recombination interference and hence can be used to build, or to evaluate and correct, a linkage map. Practical examples from F2, testcross and human populations demonstrate that RD is useful for measuring recombination interference between two short intervals and for evaluating linkage maps. As with LD, RD will be important for studying genetic mapping, association of haplotypes with disease, plant breeding and population history.
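
As background for the two established disequilibria that this abstract builds on, the standard textbook quantities can be sketched as below. The function names and the input counts are hypothetical, the loci are assumed to be biallelic and polymorphic, and RD itself is not implemented because the abstract does not give its formula.

```python
import math

def hardy_weinberg_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) of genotype counts against Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, math.erfc(math.sqrt(stat / 2.0))

def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, signed D', r^2) from one haplotype frequency and two allele frequencies."""
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2

# Hypothetical genotype counts with a deficit of heterozygotes.
print(hardy_weinberg_chi_square(40, 20, 40))
# Hypothetical frequencies: haplotype AB at 0.5, alleles A and B each at 0.5.
print(linkage_disequilibrium(0.5, 0.5, 0.5))
```

HWD concerns one locus and LD two loci; by the abstract's definition RD involves multiple loci, so it would need multi-locus haplotype or gamete frequencies rather than the inputs used here.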


Subject(s)
Recombination, Genetic , Genome, Human , Humans , Linkage Disequilibrium , Selection, Genetic
9.
Sci Rep ; 10(1): 9208, 2020 06 08.
Article in English | MEDLINE | ID: mdl-32514076

ABSTRACT

Selecting a set of valid genetic variants is critical for Mendelian randomization (MR) to correctly infer risk factors causing a disease. We here developed a method for selecting genetic variants as valid instrumental variables for inferring risk factors causing coronary artery disease (CAD). Using this method, we selected two sets of single-nucleotide-polymorphism (SNP) genetic variants (SNP338 and SNP363) associated with each of three potential risk factors for CAD, namely low density lipoprotein cholesterol (LDL-c), high density lipoprotein cholesterol (HDL-c) and triglycerides (TG), from two independent GWAS datasets. We performed in-depth multivariate MR (MVMR) analyses, and the results from both datasets consistently showed that LDL-c was strongly associated with increased risk for CAD (β = 0.396, OR = 1.486 per 1 SD (equivalent to 38 mg/dL), 95%CI = (1.38, 1.59) in SNP338; and β = 0.424, OR = 1.528 per 1 SD, 95%CI = (1.42, 1.65) in SNP363); HDL-c was strongly associated with reduced risk for CAD (β = -0.315, OR = 0.729 per 1 SD (equivalent to 16 mg/dL), 95%CI = (0.68, 0.78) in SNP338; and β = -0.319, OR = 0.726 per 1 SD, 95%CI = (0.66, 0.80) in SNP363). For TG, an increased risk for CAD was observed when using the full datasets (β = 0.184, OR = 1.2 per 1 SD (equivalent to 89 mg/dL), 95%CI = (1.12, 1.28) in SNP338; and β = 0.207, OR = 1.222 per 1 SD, 95%CI = (1.10, 1.36) in SNP363), whereas partial datasets containing shared and unique SNPs showed that TG is not a risk factor for CAD. From these results, it can be inferred that TG itself is not a causal risk factor for CAD, but appears to be one because of pleiotropic effects associated with LDL-c and HDL-c SNPs. Large-scale simulation experiments without pleiotropic effects also corroborated these results.
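
The relation between the β coefficients and odds ratios quoted in this abstract can be checked with a short calculation, assuming the usual logistic-scale convention OR = exp(β). The helper names and the standard error below are hypothetical; the abstract does not report standard errors. Under that assumption the LDL-c figures reproduce to three decimals, while some of the other reported ORs differ in the last digit, presumably because the published β values are rounded.

```python
import math

def odds_ratio(beta):
    """Convert a log-odds coefficient (per 1 SD of the exposure) into an odds ratio."""
    return math.exp(beta)

def odds_ratio_ci(beta, se, z=1.959964):
    """Wald interval for the odds ratio; z defaults to the 95% normal quantile."""
    return math.exp(beta - z * se), math.exp(beta + z * se)

# LDL-c coefficients quoted in the abstract.
ldl_or_snp338 = round(odds_ratio(0.396), 3)
ldl_or_snp363 = round(odds_ratio(0.424), 3)
print(ldl_or_snp338, ldl_or_snp363)

# Hypothetical standard error, used only to show how an interval is formed.
lower, upper = odds_ratio_ci(0.396, se=0.036)
print(round(lower, 2), round(upper, 2))
```

A negative β, as reported for HDL-c, gives an odds ratio below 1, which is why the abstract describes HDL-c as associated with reduced risk.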


Subject(s)
Coronary Artery Disease/etiology , Coronary Artery Disease/genetics , Polymorphism, Single Nucleotide/genetics , Cholesterol, HDL/genetics , Cholesterol, LDL/genetics , Female , Humans , Male , Mendelian Randomization Analysis , Risk Factors , Triglycerides/genetics
10.
BMC Bioinformatics ; 18(Suppl 11): 404, 2017 Oct 03.
Article in English | MEDLINE | ID: mdl-28984187

ABSTRACT

BACKGROUND: Dominant markers in an F2 population or a hybrid population carry much less linkage information in repulsion phase than in coupling phase. Linkage analysis therefore produces two separate complementary marker linkage maps that are of little use in disease association analysis and breeding. There is a need to develop efficient statistical methods and computational algorithms to construct or merge complete dominant-marker linkage maps. The key to doing so is to efficiently estimate recombination fractions between dominant markers in repulsion phase. RESULT: We proposed an expectation least square (ELS) algorithm and binomial analysis of three-point gametes (BAT) for estimating gamete frequencies from F2 dominant and codominant marker data, respectively. The results obtained from simulated and real genotype datasets showed that the ELS algorithm was able to accurately estimate gamete frequencies and outperformed the EM algorithm in estimating recombination fractions between dominant loci and in recovering the true linkage maps of 6 dominant loci in coupling and unknown linkage phases. Our BAT method also had smaller variances in the estimation of two-point recombination fractions than the EM algorithm. CONCLUSION: ELS is a powerful method for accurate estimation of gamete frequencies in a dominant three-locus system in an F2 population, and BAT is a computationally efficient and fast method for estimating frequencies of three-point codominant gametes.


Subject(s)
Crosses, Genetic , Recombination, Genetic , Statistics as Topic/methods , Algorithms , Animals , Computer Simulation , Female , Genes, Dominant , Genetic Linkage , Genetic Loci , Genetic Markers , Least-Squares Analysis , Male , Mice , Models, Genetic
11.
Nucleic Acids Res ; 43(15): e96, 2015 Sep 03.
Article in English | MEDLINE | ID: mdl-25953852

ABSTRACT

Most mammalian genes have mRNA variants due to alternative promoter usage, alternative splicing, and alternative cleavage and polyadenylation. Expression of alternative RNA isoforms has been found to be associated with tumorigenesis, proliferation and differentiation. Detection of condition-associated transcription variation requires association methods. Traditional association methods such as Pearson chi-square test and Fisher Exact test are single test methods and do not work on count data with replicates. Although the Cochran Mantel Haenszel (CMH) approach can handle replicated count data, our simulations showed that multiple CMH tests still had very low power. To identify condition-associated variation of transcription, we here proposed a ranking analysis of chi-squares (RAX2) for large-scale association analysis. RAX2 is a nonparametric method and has accurate and conservative estimation of FDR profile. Simulations demonstrated that RAX2 performs well in finding condition-associated transcription variants. We applied RAX2 to primary T-cell transcriptomic data and identified 1610 (16.3%) tags associated in transcription with immune stimulation at FDR < 0.05. Most of these tags also had differential expression. Analysis of two and three tags within genes revealed that under immune stimulation short RNA isoforms were preferably used.


Subject(s)
Alternative Splicing , Gene Expression Profiling/methods , Polyadenylation , CD4-Positive T-Lymphocytes/metabolism , Cell Line , Chi-Square Distribution , Genetic Variation , Genomics/methods , Humans , RNA Isoforms/chemistry , RNA Isoforms/metabolism , Statistics, Nonparametric , Transcription, Genetic
12.
J Diabetes Investig ; 6(3): 295-301, 2015 May.
Article in English | MEDLINE | ID: mdl-25969714

ABSTRACT

AIMS/INTRODUCTION: Variants in the cell cycle regulation genes CDKAL1 and CDKN2A/2B have been suggested to be associated with type 2 diabetes, and also to play a role in insulin processing in non-diabetic European individuals. Rs7754580 in CDKAL1 and rs7020996 in CDKN2A/2B were found to be associated with gestational diabetes in Chinese individuals. In order to understand the metabolic mechanism of the greatly upregulated maternal insulin signaling during pregnancy and the pathogenesis of gestational diabetes, we investigated the impact of rs7754580 and rs7020996 on gestational insulin regulation and processing. MATERIALS AND METHODS: We recruited 1,146 unrelated, non-diabetic, pregnant Han Chinese women (age 28.5 ± 4.1 years, body mass index 21.4 ± 2.6 kg/m²), and gave them oral glucose tolerance tests. The indices of insulin sensitivity, insulin disposition, insulin release and proinsulin to insulin conversion were calculated. Rs7754580 in the CDKAL1 gene and rs7020996 in the CDKN2A/2B gene were genotyped. Under an additive model, we analyzed the associations between the variants and gestational insulin indices using logistic regression. RESULTS: After adjusting for maternal age, body mass index and the related interactions, the CDKAL1 rs7754580 risk allele C was found to be associated with increased insulin sensitivity (P = 0.011), decreased insulin disposition (P = 0.0002) and 2-h proinsulin conversion (P = 0.017). The CDKN2A/2B rs7020996 risk allele T was found to be related to decreased insulin sensitivity (P = 0.002) and increased insulin disposition (P = 0.0001). CONCLUSIONS: The study showed that cell cycle regulating genes might have a distinctive effect on gestational insulin sensitivity, β-cell function and proinsulin conversion in pregnant Han Chinese women.

13.
Reprod Sci ; 22(11): 1421-8, 2015 Nov.
Article in English | MEDLINE | ID: mdl-25878199

ABSTRACT

We investigated the impact of genetic variants on transiently upregulated gestational insulin signaling. We recruited 1152 unrelated nondiabetic pregnant Han Chinese women (age 28.5 ± 4.1 years; body mass index [BMI] 21.4 ± 2.6 kg/m²) and gave them oral glucose tolerance tests. The Matsuda index of insulin sensitivity, homeostatic model assessment of insulin resistance, indices of insulin disposition, early-phase insulin release, and proinsulin to insulin conversion in the fasting state and from 0 to 120 minutes were used to characterize insulin physiology. Several variants related to β-cell function were genotyped. The genetic impacts were analyzed using logistic regression under an additive model. After adjusting for maternal age, BMI, and the related interactions, the genetic variants in ABCC8, CDKAL1, CDKN2A, HNF1B, KCNJ11, and MTNR1B were found to affect gestational insulin signaling through heterogeneous mechanisms; however, compared with nonpregnant metabolism, the genetic effects appear to be heavily influenced by maternal age and BMI, indicating possible distinct mechanisms underlying gestational metabolism and diabetic pathogenesis.


Subject(s)
Diabetes Mellitus, Type 2/genetics , Diabetes, Gestational/genetics , Gene-Environment Interaction , Genetic Loci , Genetic Variation , Insulin-Secreting Cells/metabolism , Insulin/metabolism , Signal Transduction , Adult , Asian People/genetics , Blood Glucose/genetics , Blood Glucose/metabolism , Body Mass Index , China , Diabetes Mellitus, Type 2/ethnology , Diabetes, Gestational/ethnology , Female , Genetic Markers , Genetic Predisposition to Disease , Glucose Tolerance Test , Humans , Insulin/blood , Insulin Resistance/ethnology , Insulin Resistance/genetics , Logistic Models , Maternal Age , Phenotype , Pregnancy , Young Adult
14.
PLoS One ; 10(4): e0123658, 2015.
Article in English | MEDLINE | ID: mdl-25894390

ABSTRACT

Next generation sequencing (NGS) is increasingly being used for transcriptome-wide analysis of differential gene expression. The NGS data are multidimensional count data. Therefore, most of the statistical methods developed well for microarray data analysis are not applicable to transcriptomic data. For this reason, a variety of new statistical methods based on count data of transcript reads have been correspondingly proposed. But due to high cost and limitation of biological resources, current NGS data are still generated from a few replicate libraries. Some of these existing methods do not always have desirable performances on count data. We here developed a very powerful and robust statistical method based on beta and binomial distributions. Our method (mBeta t-test) is specifically applicable to sequence count data from small samples. Both simulated and real transcriptomic data showed mBeta t-test significantly outperformed the existing top statistical methods chosen in all 12 given scenarios and performed with high efficiency and high stability. The differentially expressed genes found by our method from real transcriptomic data were validated by qPCR experiments. Our method shows high power in finding truly differential expression, conservatively estimating FDR and high stability in RNA sequence count data derived from small samples. Our method can also be extended to genome-wide detection of differential splicing events.


Subject(s)
Gene Expression Profiling/methods , Statistics as Topic , Animals , Computer Simulation , Databases, Genetic , Humans , Jurkat Cells , Mice , RNA, Messenger/genetics , RNA, Messenger/metabolism , ROC Curve , Real-Time Polymerase Chain Reaction , Reproducibility of Results
15.
Bioinformatics ; 30(14): 2018-25, 2014 Jul 15.
Article in English | MEDLINE | ID: mdl-24632499

ABSTRACT

'Omic' data, such as genomic, transcriptomic, proteomic and single nucleotide polymorphism data, have been growing rapidly. Omic data are large-scale, high-throughput data. Such data challenge traditional statistical methodologies and require multiple tests. Several multiple-testing procedures, such as the Bonferroni procedure, the Benjamini-Hochberg (BH) procedure and the Westfall-Young procedure, have been developed; some control the family-wise error rate and the others control the false discovery rate (FDR). These procedures are valid in some cases but cannot be applied to all types of large-scale data. To address this statistically challenging problem in the analysis of omic data, we propose a general method for generating a set of multiple-testing procedures. This method is based on the BH theorems. By choosing a C-value, one can realize a specific multiple-testing procedure. For example, by setting C = 1.22, our method produces the BH procedure. With C < 1.22, our method generates procedures that weakly control FDR, and with C > 1.22, the procedures strongly control FDR. Those with C = G (the number of genes or tests) and C = 0 are, respectively, the Bonferroni procedure and the single-testing procedure. These are the two extreme procedures in this family. To help one choose an appropriate multiple-testing procedure in practice, we develop an algorithm by which FDR can be correctly and reliably estimated. Simulated results show that our method works well for accurate estimation of FDR in various scenarios, and we illustrate the applications of our method with three real datasets. AVAILABILITY AND IMPLEMENTATION: Our program is implemented in Matlab and is available upon request.
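
Two of the procedures named in this abstract, Bonferroni and Benjamini-Hochberg, can be sketched in their standard textbook form as follows. This is not the authors' Matlab program, and it does not implement their C-value family, whose definition is not given in the abstract; the function names and p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Standard BH step-up procedure; returns True where the hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    # Find the largest rank k whose ordered p-value satisfies p_(k) <= k * alpha / m.
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected

def bonferroni(p_values, alpha=0.05):
    """Bonferroni procedure: reject when p <= alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

# Hypothetical p-values from four tests.
p_values = [0.01, 0.04, 0.03, 0.045]
print(benjamini_hochberg(p_values))
print(bonferroni(p_values))
```

With these made-up values the step-up rule rejects all four hypotheses because the largest p-value meets its threshold, whereas Bonferroni rejects only the smallest, which illustrates the difference between controlling FDR and controlling the family-wise error rate.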


Subject(s)
Algorithms , Gene Expression Profiling/methods , Arabidopsis/genetics , Data Interpretation, Statistical , Humans , Leukemia/genetics , Oligonucleotide Array Sequence Analysis
16.
PLoS One ; 7(11): e48619, 2012.
Article in English | MEDLINE | ID: mdl-23144912

ABSTRACT

DNA damage and repair are hallmarks of cellular responses to ionizing radiation. We hypothesized that monitoring the expression of DNA repair-associated genes would enhance the detection of individuals exposed to radiation versus other forms of physiological stress. We employed the human blood ex vivo radiation model to investigate the expression responses of DNA repair genes in repeated blood samples from healthy, non-smoking men and women exposed to 2 Gy of X-rays in the context of inflammation stress mimicked by the bacterial endotoxin lipopolysaccharide (LPS). Radiation exposure significantly modulated the transcript expression of 12 genes of 40 tested (2.2E-06

Subject(s)
Biomarkers/blood , Cell Cycle/radiation effects , DNA Repair/radiation effects , Inflammation/blood , Stress, Physiological/radiation effects , Adult , Cell Cycle/drug effects , Cell Cycle/genetics , DNA Repair/drug effects , DNA Repair/genetics , Dose-Response Relationship, Radiation , Female , Gene Expression Regulation/drug effects , Gene Expression Regulation/radiation effects , Humans , Inflammation/genetics , Lipopolysaccharides/pharmacology , Male , Middle Aged , Phosphorylation/drug effects , Phosphorylation/radiation effects , Predictive Value of Tests , RNA, Messenger/genetics , RNA, Messenger/metabolism , Reproducibility of Results , Stress, Physiological/drug effects , Stress, Physiological/genetics , Time Factors , Transcription, Genetic/drug effects , Transcription, Genetic/radiation effects , X-Rays , Young Adult
17.
PLoS One ; 7(7): e40113, 2012.
Article in English | MEDLINE | ID: mdl-22768333

ABSTRACT

BACKGROUND: This study aimed to explore the association of MTNR1B genetic variants with gestational plasma glucose homeostasis in pregnant Chinese women. METHODS: A total of 1,985 pregnant Han Chinese women were recruited and evaluated for gestational glucose tolerance status with a two-step approach. The four MTNR1B variants rs10830963, rs1387153, rs1447352, and rs2166706, which had been reported to be associated with glucose levels in general non-pregnant populations, were genotyped in these women. Using an additive model adjusted for age and body mass index (BMI), the association of these variants with gestational fasting and postprandial plasma glucose (FPG and PPG) levels was analyzed by multiple linear regression; the relative risk of developing gestational glucose intolerance was calculated by logistic regression. Hardy-Weinberg equilibrium was tested by the chi-square test, and linkage disequilibrium (LD) between these variants was estimated by the measures D' and r^2. RESULTS: In the pregnant Chinese women, the MTNR1B variants rs10830963, rs1387153, rs2166706, and rs1447352 were associated with an increased 1-hour PPG level (p = 8.04 × 10^-10, 5.49 × 10^-6, 1.89 × 10^-5, and 0.02, respectively). These alleles were also associated with gestational glucose intolerance, with odds ratios (OR) of 1.64 (p = 8.03 × 10^-11), 1.43 (p = 1.94 × 10^-6), 1.38 (p = 1.63 × 10^-5), and 1.24 (p = 0.007), respectively. MTNR1B rs1387153 and rs2166706 were associated with gestational FPG levels (p = 0.04). Our data also suggested that the LD pattern of these variants in the studied women conformed to that in general populations: rs1387153 and rs2166706 were in high LD and were moderately linked with rs10830963, but might not be linked with rs1447352; rs10830963 might not be linked with rs1447352 either. In addition, the MTNR1B variants were not found to be associated with any other traits tested.
CONCLUSIONS: MTNR1B is likely to be involved in the regulation of glucose homeostasis during pregnancy.
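
The abstract mentions a chi-square test of Hardy-Weinberg equilibrium and the LD measures D' and r^2. As a minimal sketch of how such quantities are commonly computed for biallelic variants (the function names and the example counts and frequencies are illustrative assumptions, not taken from the study):

```python
from math import erfc, sqrt


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg equilibrium from genotype counts.

    Returns (chi2, p_value) using 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


def ld_measures(p_ab, p_a, p_b):
    """Return (|D'|, r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A (locus 1) and B (locus 2)
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d_prime, r2


# Hypothetical inputs, for illustration only.
print(hwe_chi_square(25, 50, 25))
print(ld_measures(0.5, 0.5, 0.5))
```

Here `ld_measures` takes haplotype frequencies as given; estimating them from unphased genotypes is a separate step that this sketch does not cover.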


Subject(s)
Blood Glucose/genetics , Glucose Intolerance/genetics , Polymorphism, Genetic , Pregnancy Complications/genetics , Pregnancy Trimester, Second/genetics , Receptor, Melatonin, MT1/genetics , Adult , Asian People , Blood Glucose/metabolism , China , Female , Glucose Intolerance/metabolism , Humans , Pregnancy , Pregnancy Complications/metabolism , Pregnancy Trimester, Second/blood , Receptor, Melatonin, MT1/metabolism , Receptor, Melatonin, MT2
18.
Genomics ; 98(5): 390-9, 2011 Nov.
Article in English | MEDLINE | ID: mdl-21741470

ABSTRACT

Receiver operating characteristic (ROC) analysis has been widely used to evaluate statistical methods, but a serious limitation is that ROC cannot assess how well a method estimates the false discovery rate (FDR); hence the area under the curve, used as a criterion, cannot tell us whether a statistical method is conservative. To address this issue, we propose an alternative criterion, work efficiency. Work efficiency is defined as the product of the power and the degree of conservativeness of a statistical method. We conducted large-scale simulation comparisons among the optimizing discovery procedure (ODP), the Bonferroni (B-) procedure, Local FDR (Localfdr), ranking analysis of the F-statistics (RAF), the Benjamini-Hochberg (BH-) procedure, and significance analysis of microarray data (SAM). The results show that ODP, SAM, and the B-procedure perform with low efficiency, while the BH-procedure, RAF, and Localfdr work with higher efficiency. ODP and SAM have the same ROC curves, but their efficiencies are significantly different.


Subject(s)
Data Interpretation, Statistical , Gene Expression Profiling/methods , Models, Statistical , Algorithms , Area Under Curve , Computer Simulation , Gene Expression Regulation , Oligonucleotide Array Sequence Analysis , ROC Curve
19.
Genomics ; 97(1): 58-68, 2011 Jan.
Article in English | MEDLINE | ID: mdl-20888900

ABSTRACT

Development of statistical methods has become necessary for large-scale correlation analysis of current "omic" data. We propose ranking analysis of correlation coefficients (RAC), based on transforming the correlation matrix into a correlation vector and applying a "locally ranking" strategy that significantly reduces computational complexity and load. RAC provides an estimate of the null correlation distribution and an estimator of the false discovery rate (FDR) for finding gene pairs with correlated expression, obtained by comparing the ranked observed correlation coefficients with the ranked estimated ones at a given threshold level. Simulated and real data show that the estimated null correlation distribution is exactly the same as the true one and that the FDR estimator works well in various scenarios. Applying RAC, no gene pairs were found in the null dataset, whereas in the human cancer dataset 837 gene pairs were found to have positively correlated expression variations at FDR≤5%. RAC performs well in multiple conditions (classes), each with 3 or more replicate observations.
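
The abstract describes transforming a correlation matrix into a correlation vector and ranking it. The sketch below illustrates only that first step, under the assumption that Pearson correlation is used and that each unordered gene pair appears once; it does not reproduce the "locally ranking" strategy, the null estimation, or the FDR estimator of RAC, none of which are specified in the abstract. Gene names and values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def ranked_correlation_vector(expr):
    """Flatten the upper triangle of the gene-gene correlation matrix.

    expr: dict mapping gene name -> list of expression values.
    Returns a list of (gene_i, gene_j, r) sorted by descending r.
    """
    pairs = [
        (g1, g2, pearson(expr[g1], expr[g2]))
        for g1, g2 in combinations(expr, 2)
    ]
    return sorted(pairs, key=lambda t: t[2], reverse=True)


# Hypothetical expression profiles, for illustration only.
expr = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 4, 6, 8],
    "g3": [4, 3, 2, 1],
}
for g1, g2, r in ranked_correlation_vector(expr):
    print(g1, g2, round(r, 3))
```

For G genes the vector has G(G-1)/2 entries, so memory rather than arithmetic tends to become the constraint as G grows.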


Subject(s)
Computational Biology/methods , Genes, Neoplasm , Genomics/methods , Gene Expression , Genome, Human , Humans , Neoplasms
20.
Bioinformation ; 7(8): 400-4, 2011.
Article in English | MEDLINE | ID: mdl-22347782

ABSTRACT

Identification of genes differentially expressed across multiple conditions has become an important statistical problem in analyzing large-scale microarray data. Many statistical methods have been developed to address this challenging problem. An extensive comparison among these methods is therefore important for experimental scientists choosing a valid method for their data analysis. In this study, we conducted simulation studies to compare six statistical methods for identifying differentially expressed genes: the Bonferroni (B-) procedure, the Benjamini and Hochberg (BH-) procedure, the Local false discovery rate (Localfdr) method, the Optimal Discovery Procedure (ODP), the Ranking Analysis of F-statistics (RAF), and the Significance Analysis of Microarray data (SAM). We demonstrated that the strength of the treatment effect, the sample size, the proportion of differentially expressed genes, and the variance of gene expression significantly affect the performance of the different methods. The simulated results show that ODP exhibits extremely high power in identifying differentially expressed genes but significantly underestimates the False Discovery Rate (FDR) in all data scenarios. SAM performs poorly when the sample size is small but is among the best-performing methods when the sample size is large. The B-procedure is stringent and thus has low power in all data scenarios. Localfdr and RAF show statistical behavior comparable to that of the BH-procedure, with favorable power and conservativeness of FDR estimation. RAF performs best when the proportion of differentially expressed genes is small and the treatment effect is weak, but Localfdr is better than RAF when the proportion of differentially expressed genes is large.
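
To make the contrast between the stringent B-procedure and the BH-procedure concrete, the following is a minimal sketch of the two textbook procedures applied to a list of p-values. The p-values are hypothetical and the code is not the simulation pipeline used in the study.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject H0_i when p_i <= alpha / m (controls the family-wise error rate)."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls the FDR at level alpha).

    Finds the largest rank k with p_(k) <= k * alpha / m and rejects the
    k hypotheses with the smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k_max = rank
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected


# Hypothetical p-values, for illustration only.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(sum(bonferroni(pvals)), "rejections with Bonferroni")
print(sum(benjamini_hochberg(pvals)), "rejections with Benjamini-Hochberg")
```

On these inputs Bonferroni rejects one hypothesis and BH rejects two, which is consistent with the lower power the abstract reports for the B-procedure.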

...