1.
Hum Genomics ; 17(1): 64, 2023 Jul 15.
Article in English | MEDLINE | ID: mdl-37454130

ABSTRACT

BACKGROUND: Female breast cancer remains the second leading cause of cancer-related death in the USA. Heterogeneity in tumor morphology, both across the cohort and within patients, can lead to unpredictable therapy resistance, metastasis, and clinical outcome. Hence, supplementing classic pathological markers with intrinsic tumor molecular markers can help identify novel molecular subtypes and discover actionable biomarkers. METHODS: We conducted a large multi-institutional genomic analysis of paired normal and tumor samples from breast cancer patients to profile the complex genomic architecture of breast tumors. Long-term patient follow-up, therapeutic regimens, and treatment response for this cohort are documented using the Breast Cancer Collaborative Registry. The majority of the patients in this study were at tumor stage 1 (51.4%) and stage 2 (36.3%) at the time of diagnosis. Whole-exome sequencing data from 554 patients were used for mutational profiling and identifying cancer drivers. RESULTS: We identified 54 tumors having at least 1000 mutations and 185 tumors with fewer than 100 mutations. Tumor mutational burden varied across the classified subtypes, and the top ten mutated genes include MUC4, MUC16, PIK3CA, TTN, TP53, NBPF10, NBPF1, CDC27, AHNAK2, and MUC2. Patients were classified based on seven biological and tumor-specific parameters (grade, stage, hormone receptor status, histological subtype, Ki67 expression, lymph node status, and race), and mutational profiles were compared across the different subtypes. Mutual exclusion of mutations in PIK3CA and TP53 was pronounced across different tumor grades. Cancer drivers specific to each subtype include TP53, PIK3CA, CDC27, CDH1, STK39, CBFB, MAP3K1, and GATA3, and mutations associated with patient survival were identified in our cohort. CONCLUSIONS: This extensive study has revealed tumor burden, driver genes, co-occurrence, mutual exclusivity, and survival effects of mutations in a US Midwestern breast cancer cohort, paving the way for developing personalized therapeutic strategies.


Subjects
Breast Neoplasms , Female , Humans , Breast Neoplasms/genetics , Breast Neoplasms/pathology , Prognosis , Mutation , Biomarkers, Tumor/genetics , Class I Phosphatidylinositol 3-Kinases/genetics
2.
Genomics ; 112(6): 3943-3950, 2020 11.
Article in English | MEDLINE | ID: mdl-32621856

ABSTRACT

Following Hardy-Weinberg disequilibrium (HWD), which occurs at a single locus, and linkage disequilibrium (LD), which occurs between two loci across generations, we here propose a third genetic disequilibrium in a population: recombination disequilibrium (RD). RD is a measure of crossover interference among multiple loci in a random mating population. In natural populations, besides recombination interference, RD may also be due to selection, mutation, gene conversion, drift and/or migration. Therefore, similarly to LD, RD will also reflect the history of natural selection and mutation. In breeding populations, RD results purely from recombination interference and hence can be used to build, or to evaluate and correct, a linkage map. Practical examples from F2, testcross and human populations demonstrate that RD is useful for measuring recombination interference between two short intervals and for evaluating linkage maps. As with LD, RD will be important for studying genetic mapping, association of haplotypes with disease, plant breeding and population history.
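
For orientation, the two classical disequilibrium measures named in this abstract can be computed directly from frequencies. The sketch below uses the textbook definitions, D_A = P(AA) - p_A^2 for HWD and D_AB = P(AB) - p_A * p_B for LD. The function names and example frequencies are illustrative assumptions, and RD itself is not shown because the abstract does not state its formula.

```python
def hw_disequilibrium(p_AA: float, p_Aa: float) -> float:
    """Textbook HWD coefficient at one biallelic locus: D_A = P(AA) - p_A^2."""
    # Allele frequency of A recovered from genotype frequencies
    p_A = p_AA + 0.5 * p_Aa
    return p_AA - p_A ** 2


def linkage_disequilibrium(p_AB: float, p_A: float, p_B: float) -> float:
    """Textbook LD coefficient between two loci: D_AB = P(AB) - p_A * p_B."""
    return p_AB - p_A * p_B


# Hypothetical frequencies, for illustration only
d_hw = hw_disequilibrium(p_AA=0.25, p_Aa=0.50)
d_ld = linkage_disequilibrium(p_AB=0.40, p_A=0.50, p_B=0.50)
```

Both coefficients are zero at equilibrium, so any departure from zero is the quantity a disequilibrium test would examine.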


Subjects
Recombination, Genetic , Genome, Human , Humans , Linkage Disequilibrium , Selection, Genetic
3.
BMC Bioinformatics ; 18(Suppl 11): 404, 2017 Oct 03.
Article in English | MEDLINE | ID: mdl-28984187

ABSTRACT

BACKGROUND: Dominant markers in an F2 population or a hybrid population have much less linkage information in repulsion phase than in coupling phase. Linkage analysis produces two separate complementary marker linkage maps that have little use in disease association analysis and breeding. There is a need to develop efficient statistical methods and computational algorithms to construct or merge complete dominant-marker linkage maps. The key to doing so is to efficiently estimate recombination fractions between dominant markers in repulsion phase. RESULT: We proposed an expectation least square (ELS) algorithm and binomial analysis of three-point gametes (BAT) for estimating gamete frequencies from F2 dominant and codominant marker data, respectively. The results obtained from simulated and real genotype datasets showed that the ELS algorithm was able to accurately estimate frequencies of gametes and outperformed the EM algorithm in estimating recombination fractions between dominant loci and recovering true linkage maps of 6 dominant loci in coupling and unknown linkage phases. Our BAT method also had smaller variances in estimation of two-point recombination fractions than the EM algorithm. CONCLUSION: ELS is a powerful method for accurate estimation of gamete frequencies in a dominant three-locus system in an F2 population, and BAT is a computationally efficient and fast method for estimating frequencies of three-point codominant gametes.


Subjects
Crosses, Genetic , Recombination, Genetic , Statistics as Topic/methods , Algorithms , Animals , Computer Simulation , Female , Genes, Dominant , Genetic Linkage , Genetic Loci , Genetic Markers , Least-Squares Analysis , Male , Mice , Models, Genetic
4.
Nucleic Acids Res ; 43(15): e96, 2015 Sep 03.
Article in English | MEDLINE | ID: mdl-25953852

ABSTRACT

Most mammalian genes have mRNA variants due to alternative promoter usage, alternative splicing, and alternative cleavage and polyadenylation. Expression of alternative RNA isoforms has been found to be associated with tumorigenesis, proliferation and differentiation. Detection of condition-associated transcription variation requires association methods. Traditional association methods such as Pearson chi-square test and Fisher Exact test are single test methods and do not work on count data with replicates. Although the Cochran Mantel Haenszel (CMH) approach can handle replicated count data, our simulations showed that multiple CMH tests still had very low power. To identify condition-associated variation of transcription, we here proposed a ranking analysis of chi-squares (RAX2) for large-scale association analysis. RAX2 is a nonparametric method and has accurate and conservative estimation of FDR profile. Simulations demonstrated that RAX2 performs well in finding condition-associated transcription variants. We applied RAX2 to primary T-cell transcriptomic data and identified 1610 (16.3%) tags associated in transcription with immune stimulation at FDR < 0.05. Most of these tags also had differential expression. Analysis of two and three tags within genes revealed that under immune stimulation short RNA isoforms were preferably used.


Subjects
Alternative Splicing , Gene Expression Profiling/methods , Polyadenylation , CD4-Positive T-Lymphocytes/metabolism , Cell Line , Chi-Square Distribution , Genetic Variation , Genomics/methods , Humans , RNA Isoforms/chemistry , RNA Isoforms/metabolism , Statistics, Nonparametric , Transcription, Genetic
5.
Bioinformatics ; 30(14): 2018-25, 2014 Jul 15.
Article in English | MEDLINE | ID: mdl-24632499

ABSTRACT

UNLABELLED: The 'omic' data such as genomic data, transcriptomic data, proteomic data and single nucleotide polymorphism data have been rapidly growing. The omic data are large-scale and high-throughput data. Such data challenge traditional statistical methodologies and require multiple tests. Several multiple-testing procedures such as Bonferroni procedure, Benjamini-Hochberg (BH) procedure and Westfall-Young procedure have been developed, among which some control family-wise error rate and the others control false discovery rate (FDR). These procedures are valid in some cases and cannot be applied to all types of large-scale data. To address this statistically challenging problem in the analysis of the omic data, we propose a general method for generating a set of multiple-testing procedures. This method is based on the BH theorems. By choosing a C-value, one can realize a specific multiple-testing procedure. For example, by setting C = 1.22, our method produces the BH procedure. With C < 1.22, our method generates procedures of weakly controlling FDR, and with C > 1.22, the procedures strongly control FDR. Those with C = G (number of genes or tests) and C = 0 are, respectively, the Bonferroni procedure and the single-testing procedure. These are the two extreme procedures in this family. To let one choose an appropriate multiple-testing procedure in practice, we develop an algorithm by which FDR can be correctly and reliably estimated. Simulated results show that our method works well for an accurate estimation of FDR in various scenarios, and we illustrate the applications of our method with three real datasets. AVAILABILITY AND IMPLEMENTATION: Our program is implemented in Matlab and is available upon request.
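
To make the two reference procedures in this abstract concrete, here is a minimal sketch of the standard Benjamini-Hochberg step-up rule and the Bonferroni rule. It is not the authors' C-value family, whose construction the abstract does not spell out; the function names and the example p-values are assumptions for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where BH rejects the null hypothesis."""
    m = len(p_values)
    # Indices of the tests ordered by ascending p-value
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= k * alpha / m
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


def bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where p <= alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


# Hypothetical p-values from ten tests
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
bh_hits = sum(benjamini_hochberg(p))
bonf_hits = sum(bonferroni(p))
```

On these values BH rejects more hypotheses than Bonferroni, which is the usual trade-off between controlling FDR and controlling the family-wise error rate.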


Subjects
Algorithms , Gene Expression Profiling/methods , Arabidopsis/genetics , Data Interpretation, Statistical , Humans , Leukemia/genetics , Oligonucleotide Array Sequence Analysis
6.
Int J Biostat ; 19(1): 1-19, 2023 05 01.
Article in English | MEDLINE | ID: mdl-35749155

ABSTRACT

It has been reported that about half of biological discoveries are irreproducible. These irreproducible discoveries were partially attributed to poor statistical power, which is largely owing to small sample sizes. However, in molecular biology and medicine, due to limited biological resources and budgets, most molecular biological experiments have been conducted with small samples. The two-sample t-test controls bias by using degrees of freedom. However, this also implies that the t-test has low power in small samples. A discovery made with low statistical power is likely to have poor reproducibility. So, promotion of statistical power is not a feasible way to enhance reproducibility in small-sample experiments. An alternative way is to reduce the type I error rate. To do so, a so-called tα-test was developed. Both theoretical analysis and a simulation study demonstrate that the tα-test substantially outperforms the t-test. However, the tα-test reduces to the t-test when sample sizes are over 15. Large-scale simulation studies and real experimental data show that the tα-test significantly reduced the type I error rate compared to the t-test and the Wilcoxon test in small-sample experiments, while having almost the same empirical power as the t-test. The null p-value density distribution explains why the tα-test had a much lower type I error rate than the t-test. One real experimental dataset provides a typical example showing that the tα-test outperforms the t-test, and a microarray dataset showed that the tα-test had the best performance among five statistical methods. In addition, the density distribution and cumulative probability function of the tα-statistic are given mathematically, and the theoretical and observed distributions are well matched.
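
As a point of reference for the baseline discussed here, the sketch below computes the classical equal-variance two-sample t statistic and its degrees of freedom. It does not implement the tα-test, which the abstract names but does not define; the function name and the sample values are illustrative assumptions.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(x, y):
    """Return (t, df) for the classical equal-variance two-sample t-test."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    # Pooled sample variance, weighting each group by its degrees of freedom
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean(x) - mean(y)) / se, df


# Hypothetical small-sample experiment with three replicates per group
t_stat, dof = pooled_t_statistic([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

With three replicates per group only four degrees of freedom remain, which illustrates why the critical value is large and the power is low in small samples.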


Subjects
Models, Statistical , Reproducibility of Results , Computer Simulation , Likelihood Functions , Sample Size
7.
Front Genet ; 14: 1295327, 2023.
Article in English | MEDLINE | ID: mdl-38292437

ABSTRACT

Haplotype-based association analysis has several advantages over single-SNP association analysis. However, to date, haplotype-disease association analyses have not excluded recombination interference among multiple loci, and hence some results might be confounded by recombination interference. Here, association of sister haplotypes with a complex disease, based on recombination disequilibrium (RD), is presented. Sister haplotypes can be determined by translating the notation of DNA base haplotypes to the notation of genetic genotypes. Sister haplotypes provide haplotype pairs available for haplotype-disease association analysis. After performing RD tests in control and case cohorts, a two-by-two contingency table can be constructed using the sister haplotype pair and the case-control pair. With this standard two-by-two table, one can perform the classical chi-square test to find statistical haplotype-disease association. Applying this method to a haplotype dataset of Alzheimer disease (AD), association of sister haplotypes containing ApoE3/4 with risk for AD was identified under no RD. Haplotypes within gene IL-13 were not associated with risk for breast cancer in the case of no RD, and no association of haplotypes in gene IL-17A with risk for coronary artery disease was detected without RD. The previously reported associations of haplotypes within these genes with risk for these diseases might be due to strong RD and/or inappropriate haplotype pairs.
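
The final testing step described in this abstract, a classical chi-square test on a two-by-two table, can be sketched as follows. Rows are taken to be the two sister haplotypes and columns to be cases and controls; the counts are hypothetical, and the RD tests that precede this step in the paper are not shown.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom.
    Assumes every row and column total is positive.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts: haplotype 1 (30 cases, 10 controls), haplotype 2 (10 cases, 30 controls)
stat, p_value = chi_square_2x2(30, 10, 10, 30)
```

A small p-value here would indicate that the haplotype pair and disease status are not independent in the sampled cohorts.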

8.
J Exp Clin Cancer Res ; 42(1): 231, 2023 Sep 05.
Article in English | MEDLINE | ID: mdl-37670323

ABSTRACT

BACKGROUND: Acute lymphoblastic leukemia (ALL) is the most common pediatric hematological malignancy, with ETV6::RUNX1 being the most prevalent translocation whose exact pathogenesis remains unclear. IGF2BP1 (Insulin-like Growth Factor 2 Binding Protein 1) is an oncofetal RNA binding protein seen to be specifically overexpressed in ETV6::RUNX1 positive B-ALL. In this study, we investigated the mechanistic role of IGF2BP1 in leukemogenesis and its synergism with the ETV6::RUNX1 fusion protein. METHODS: Gene expression was analyzed from patient bone marrow RNA using Real Time RT-qPCR. Knockout cell lines were created using CRISPR-Cas9 based lentiviral vectors. RNA-Seq and RNA Immunoprecipitation sequencing (RIP-Seq) after IGF2BP1 pulldown were performed using the Illumina platform. Mouse experiments were done by retroviral overexpression of donor HSCs followed by lethal irradiation of recipients using a bone marrow transplant model. RESULTS: We observed specific overexpression of IGF2BP1 in ETV6::RUNX1 positive patients in an Indian cohort of pediatric ALL (n=167) with a positive correlation with prednisolone resistance. IGF2BP1 expression was essential for tumor cell survival in multiple ETV6::RUNX1 positive B-ALL cell lines. Integrated analysis of transcriptome sequencing after IGF2BP1 knockout and RIP-Seq after IGF2BP1 pulldown in the Reh cell line revealed that IGF2BP1 targets encompass multiple pro-oncogenic signalling pathways including the TNFα/NFκB and PI3K-Akt pathways. These pathways were also dysregulated in primary ETV6::RUNX1 positive B-ALL patient samples from our center as well as in public B-ALL patient datasets. IGF2BP1 showed binding and stabilization of the ETV6::RUNX1 fusion transcript itself. This positive feedback loop led to constitutive dysregulation of several oncogenic pathways. Enforced co-expression of ETV6::RUNX1 and IGF2BP1 in mouse bone marrow resulted in marrow hypercellularity, which was characterized by multi-lineage progenitor expansion and strong Ki67 positivity. This pre-leukemic phenotype confirmed their synergism in vivo. Clonal expansion of cells overexpressing both ETV6::RUNX1 and IGF2BP1 was clearly observed. These mice also developed splenomegaly, indicating extramedullary hematopoiesis. CONCLUSION: Our data suggest a combined impact of the ETV6::RUNX1 fusion protein and the RNA binding protein IGF2BP1 in activating multiple oncogenic pathways in B-ALL, which makes IGF2BP1 and these pathways attractive therapeutic targets and biomarkers.


Subjects
Precursor B-Cell Lymphoblastic Leukemia-Lymphoma , Precursor Cell Lymphoblastic Leukemia-Lymphoma , Animals , Mice , Core Binding Factor Alpha 2 Subunit , Mice, Knockout , Phosphatidylinositol 3-Kinases , ETS Translocation Variant 6 Protein
9.
Genomics ; 97(1): 58-68, 2011 Jan.
Article in English | MEDLINE | ID: mdl-20888900

ABSTRACT

Statistical methods for large-scale correlation analysis of current "omic" data are much needed. We propose ranking analysis of correlation coefficients (RAC), which is based on transforming the correlation matrix into a correlation vector and applying a "locally ranking" strategy that significantly reduces computational complexity and load. RAC provides an estimate of the null correlation distribution and an estimator of the false discovery rate (FDR) for finding gene pairs with correlated expression, obtained by comparing the ranked observed correlation coefficients with the ranked estimated ones at a given threshold level. The simulated and real data show that the estimated null correlation distribution is exactly the same as the true one and that the FDR estimator works well in various scenarios. Applying RAC, no gene pairs were found in the null dataset, whereas in the human cancer dataset 837 gene pairs were found to have positively correlated expression variations at FDR≤5%. RAC performs well in multiple conditions (classes), each with 3 or more replicate observations.
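
The first step named in this abstract, turning a correlation matrix into a ranked correlation vector, might be sketched as below. This covers only that flattening and ranking step; the "locally ranking" strategy, the null-distribution estimate and the FDR estimator are not reproduced, and the function names and toy expression values are assumptions.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranked_correlation_vector(expr):
    """expr maps gene name -> list of expression values (same length for all genes).

    Returns [((gene_i, gene_j), r)] for every unordered gene pair, i.e. the upper
    triangle of the correlation matrix, sorted from largest to smallest r.
    """
    pairs = [((g1, g2), pearson(expr[g1], expr[g2]))
             for g1, g2 in combinations(sorted(expr), 2)]
    return sorted(pairs, key=lambda item: item[1], reverse=True)


# Toy expression profiles for three hypothetical genes
toy = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [4, 3, 2, 1]}
ranked = ranked_correlation_vector(toy)
```

For G genes the vector has G(G-1)/2 entries, so working with the vector rather than the full symmetric matrix roughly halves what has to be stored and ranked.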


Subjects
Computational Biology/methods , Genes, Neoplasm , Genomics/methods , Gene Expression , Genome, Human , Humans , Neoplasms
10.
Genomics ; 98(5): 390-9, 2011 Nov.
Article in English | MEDLINE | ID: mdl-21741470

ABSTRACT

Receiver operating characteristic (ROC) has been widely used to evaluate statistical methods, but a fatal problem is that ROC cannot evaluate estimation of the false discovery rate (FDR) of a statistical method, and hence the area under the curve as a criterion cannot tell us whether a statistical method is conservative. To address this issue, we propose an alternative criterion, work efficiency. Work efficiency is defined as the product of the power and degree of conservativeness of a statistical method. We conducted large-scale simulation comparisons among the optimizing discovery procedure (ODP), the Bonferroni (B-) procedure, Local FDR (Localfdr), ranking analysis of the F-statistics (RAF), the Benjamini-Hochberg (BH-) procedure, and significance analysis of microarray data (SAM). The results show that ODP, SAM, and the B-procedure perform with low efficiencies while the BH-procedure, RAF, and Localfdr work with higher efficiency. ODP and SAM have the same ROC curves but their efficiencies are significantly different.


Subjects
Data Interpretation, Statistical , Gene Expression Profiling/methods , Models, Statistical , Algorithms , Area Under Curve , Computer Simulation , Gene Expression Regulation , Oligonucleotide Array Sequence Analysis , ROC Curve
11.
Sci Rep ; 12(1): 12833, 2022 07 27.
Article in English | MEDLINE | ID: mdl-35896555

ABSTRACT

Rapid development of transcriptome sequencing technologies has resulted in a data revolution and the emergence of new approaches to study transcriptomic regulation, such as alternative splicing, alternative polyadenylation, and CRISPR knockout screening, in addition to regular gene expression. A full characterization of the transcriptional landscape of different groups of cells or tissues holds enormous potential for both basic science and clinical applications. Although many methods have been developed in the realm of differential gene expression analysis, they are all geared towards a particular type of sequencing data and fail to perform well when applied to different types of transcriptomic data. To fill this gap, we offer a negative beta binomial t-test (NBBt-test). NBBt-test provides multiple functions to perform differential analyses of alternative splicing, polyadenylation, CRISPR knockout screening, and gene expression datasets. Both real and large-scale simulation data show superior performance of NBBt-test, with higher efficiency and lower type I error rate and FDR, in identifying differential isoforms, differentially expressed genes, and differential CRISPR knockout screening genes with different sample sizes when compared against the currently most popular statistical methods. An R package implementing NBBt-test is available for download from CRAN ( https://CRAN.R-project.org/package=NBBttest ).


Subjects
Alternative Splicing , Transcriptome , Gene Expression Profiling/methods , RNA-Seq , Sequence Analysis, RNA/methods , Software , Transcriptome/genetics , Exome Sequencing
12.
AIMS Microbiol ; 7(2): 216-237, 2021.
Article in English | MEDLINE | ID: mdl-34250376

ABSTRACT

Gastrointestinal microflora is a key component in the maintenance of health and longevity across many species. In humans and mice, nonpathogenic viruses present in the gastrointestinal tract enhance the effects of the native bacterial microbiota. However, it is unclear whether nonpathogenic gastrointestinal viruses, such as Nora virus that infects Drosophila melanogaster, lead to similar observations. Longevity analysis of Nora virus infected (NV+) and uninfected (NV-) D. melanogaster in relation to the presence (B+) or absence (B-) of the native gut bacteria was conducted using four different treatment groups: NV+/B+, NV+/B-, NV-/B+, and NV-/B-. Data from the longevity results were tested via Kaplan-Meier analysis and demonstrated that Nora virus can be detrimental to the longevity of the organism, whereas bacterial presence is beneficial. These data led to the hypothesis that gastrointestinal bacterial composition varies between NV+ and NV- flies. To test this, NV+ and NV- virgin female flies were collected and aged for 4 days. Surface sterilization followed by dissections of the fat body and the gastrointestinal tract, divided into crop (foregut), midgut, and hindgut, were performed. Ribosomal 16S DNA samples were sequenced to determine the bacterial communities that comprise the microflora in the gastrointestinal tract of NV+ and NV- D. melanogaster. When analyzing operational taxonomic units (OTUs), the data demonstrate that the NV+ samples consist of more OTUs than NV- samples. The NV+ samples were both richer and more diverse in OTUs compared to NV-. When comparing whole body samples to specific organs and organ sections, the whole fly was more diverse in OTUs, whereas the crop was the richest. These novel data are pertinent in describing where Nora virus infection may be occurring within the gastrointestinal tract, as well as in continuing the discussion of the relationship between persistent viral and bacterial interactions.
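
Since the longevity data in this abstract were analyzed with the Kaplan-Meier method, a minimal sketch of the standard product-limit estimator may help. The function name and the survival times are hypothetical; the study's actual data and any group comparison test are not reproduced.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time where at least one death occurred.

    times  : observed time for each subject (e.g. days survived)
    events : 1 if death was observed at that time, 0 if the subject was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects that share the same observed time
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical lifespans in days for five flies; 0 marks a censored fly
curve = kaplan_meier([5, 10, 10, 15, 20], [1, 1, 0, 1, 0])
```

Running the estimator separately on each treatment group (for example NV+/B+ versus NV-/B+) would give the curves that a longevity comparison starts from.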

13.
Sci Rep ; 11(1): 3596, 2021 02 12.
Article in English | MEDLINE | ID: mdl-33580150

ABSTRACT

Lung cancer is the leading cause of cancer death worldwide. In particular, non-small cell lung cancer (NSCLC) has a higher mortality rate than other cancers. The high mortality rate is partially due to a lack of efficient biomarkers for detection, diagnosis and prognosis. To find highly efficient biomarkers for clinical diagnosis of NSCLC patients, we used gene differential expression and gene ontology (GO) to define a set of 26 tumor suppressor (TS) genes. The 26 TS genes were down-expressed in tumor samples in cohorts GSE18842, GSE40419, and GSE21933 and at stages 2 and 3 in GSE19804, and 15 TS genes were significantly down-expressed in tumor samples of stage 1. We used S-scores and N-scores defined in correlation networks to evaluate positive and negative influences of these 26 TS genes on expression of other functional genes in the four independent cohorts and found that SASH1, STARD13, CBFA2T3 and RECK were strong TS genes that have strong accordant/discordant effects and network effects globally impacting the expression of other genes, and hence can be used as specific biomarkers for diagnosis of NSCLC. Weak TS genes EXT1, PTCH1, KLK10 and APC, which are functionally associated with only a few genes or work in a special pathway, were not detected to be differentially expressed and had very small S-scores and N-scores in all collected datasets, and can be used as sensitive biomarkers for diagnosis of early cancer. Our findings are consistent with the functions of these TS genes. GSEA analysis found that these 26 TS genes as a gene set had high enrichment scores at stages 1, 2, 3 and all stages.


Subjects
Biomarkers, Tumor/genetics , Carcinoma, Non-Small-Cell Lung/diagnosis , Carcinoma, Non-Small-Cell Lung/genetics , Genes, Tumor Suppressor , Lung Neoplasms/diagnosis , Lung Neoplasms/genetics , Adenomatous Polyposis Coli Protein/genetics , Carcinoma, Non-Small-Cell Lung/pathology , Cohort Studies , Down-Regulation/genetics , Early Detection of Cancer , Gene Expression/genetics , Humans , Kallikreins/genetics , Lung Neoplasms/pathology , N-Acetylglucosaminyltransferases/genetics , Neoplasm Staging , Patched-1 Receptor/genetics
14.
Sci Rep ; 10(1): 9208, 2020 06 08.
Article in English | MEDLINE | ID: mdl-32514076

ABSTRACT

Selecting a set of valid genetic variants is critical for Mendelian randomization (MR) to correctly infer risk factors causing a disease. We here developed a method for selecting genetic variants as valid instrumental variables for inferring risk factors causing coronary artery disease (CAD). Using this method, we selected two sets of single-nucleotide-polymorphism (SNP) genetic variants (SNP338 and SNP363) associated with each of the three potential risk factors for CAD, including low density lipoprotein cholesterol (LDL-c), high density lipoprotein cholesterol (HDL-c) and triglycerides (TG), from two independent GWAS datasets. We performed in-depth multivariate MR (MVMR) analyses and the results from both datasets consistently showed that LDL-c was strongly associated with increased risk for CAD (β = 0.396, OR = 1.486 per 1 SD (equivalent to 38 mg/dL), 95% CI = (1.38, 1.59) in SNP338; and β = 0.424, OR = 1.528 per 1 SD, 95% CI = (1.42, 1.65) in SNP363); HDL-c was strongly associated with reduced risk for CAD (β = -0.315, OR = 0.729 per 1 SD (equivalent to 16 mg/dL), 95% CI = (0.68, 0.78) in SNP338; and β = -0.319, OR = 0.726 per 1 SD, 95% CI = (0.66, 0.80) in SNP363). In the case of TG, when using the full datasets, an increased risk for CAD (β = 0.184, OR = 1.2 per 1 SD (equivalent to 89 mg/dL), 95% CI = (1.12, 1.28) in SNP338; and β = 0.207, OR = 1.222 per 1 SD, 95% CI = (1.10, 1.36) in SNP363) was observed, while using partial datasets that contain shared and unique SNPs showed that TG is not a risk factor for CAD. From these results, it can be inferred that TG itself is not a causal risk factor for CAD, but it appears as a risk factor due to pleiotropic effects associated with LDL-c and HDL-c SNPs. Large-scale simulation experiments without pleiotropic effects also corroborated these results.
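
To show how per-SNP summary statistics turn into a causal effect and an odds ratio, here is a minimal sketch of the standard fixed-effect inverse-variance-weighted (IVW) estimator for a single exposure. This is a simpler stand-in for the multivariable MR used in the study, not the authors' method; the function names and the three-SNP summary statistics are hypothetical.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate (log-odds per unit exposure) and its standard error.

    beta_exposure : SNP-exposure effects
    beta_outcome  : SNP-outcome effects (log odds)
    se_outcome    : standard errors of the SNP-outcome effects
    """
    weights = [1.0 / se ** 2 for se in se_outcome]
    num = sum(w * bx * by for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    den = sum(w * bx ** 2 for w, bx in zip(weights, beta_exposure))
    return num / den, math.sqrt(1.0 / den)


def odds_ratio(beta):
    """Convert a log-odds effect into an odds ratio."""
    return math.exp(beta)


# Hypothetical summary statistics for three SNPs
beta_hat, se_hat = ivw_estimate([0.1, 0.2, 0.3], [0.04, 0.08, 0.12], [0.01, 0.01, 0.01])
or_hat = odds_ratio(beta_hat)
```

The same exponential conversion links the reported effect sizes and odds ratios: for example, exp(0.396) rounds to 1.486, the LDL-c odds ratio quoted for SNP338.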


Subjects
Coronary Artery Disease/etiology , Coronary Artery Disease/genetics , Polymorphism, Single Nucleotide/genetics , Cholesterol, HDL/genetics , Cholesterol, LDL/genetics , Female , Humans , Male , Mendelian Randomization Analysis , Risk Factors , Triglycerides/genetics
15.
BMC Bioinformatics ; 9: 142, 2008 Mar 06.
Article in English | MEDLINE | ID: mdl-18325100

ABSTRACT

BACKGROUND: Microarray technology provides an efficient means for globally exploring physiological processes governed by the coordinated expression of multiple genes. However, identification of genes differentially expressed in microarray experiments is challenging because of their potentially high type I error rate. Methods for large-scale statistical analyses have been developed but most of them are applicable to two-sample or two-condition data. RESULTS: We developed a large-scale multiple-group F-test based method, named ranking analysis of F-statistics (RAF), which is an extension of ranking analysis of microarray data (RAM) for two-sample t-test. In this method, we proposed a novel random splitting approach to generate the null distribution instead of using permutation, which may not be appropriate for microarray data. We also implemented a two-simulation strategy to estimate the false discovery rate. Simulation results suggested that it has higher efficiency in finding differentially expressed genes among multiple classes at a lower false discovery rate than some commonly used methods. By applying our method to the experimental data, we found 107 genes having significantly differential expression among 4 treatments at <0.7% FDR, of which 31 belong to the expressed sequence tags (ESTs) and 76 are unique genes that have known functions in the brain or central nervous system and belong to six major functional groups. CONCLUSION: Our method is suitable for identifying differentially expressed genes among multiple groups, in particular when sample size is small.
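
The quantity that RAF ranks is a multiple-group F-statistic computed per gene. The sketch below computes the classical one-way ANOVA F-statistic for one gene; the random-splitting null distribution and the two-simulation FDR estimate described in the abstract are not reproduced, and the function name and expression values are assumptions.

```python
def f_statistic(groups):
    """One-way ANOVA F statistic for a list of groups (each a list of numbers).

    Assumes at least two groups, more observations than groups, and
    non-zero within-group variation.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Hypothetical expression of one gene under three treatments, three replicates each
f_value = f_statistic([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

Computing this value for every gene and sorting the results gives the ranked list of F-statistics that a ranking-based method would then compare against a null distribution.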


Subjects
Algorithms , Gene Expression Profiling/methods , Models, Genetic , Multigene Family/physiology , Oligonucleotide Array Sequence Analysis/methods , Proteome/metabolism , Computer Simulation , Data Interpretation, Statistical , Models, Statistical
16.
Genetics ; 175(2): 923-31, 2007 Feb.
Article in English | MEDLINE | ID: mdl-17057236

ABSTRACT

Although most high-density linkage maps have been constructed from codominant markers such as single-nucleotide polymorphisms (SNPs) and microsatellites due to their high linkage information, dominant markers can be expected to be even more significant as proteomic technique becomes widely applicable to generate protein polymorphism data from large samples. However, for dominant markers, two possible linkage phases between a pair of markers complicate the estimation of recombination fractions between markers and consequently the construction of linkage maps. The low linkage information of the repulsion phase and high linkage information of coupling phase have led geneticists to construct two separate but related linkage maps. To circumvent this problem, we proposed a new method for estimating the recombination fraction between markers, which greatly improves the accuracy of estimation through distinction between the coupling phase and the repulsion phase of the linked loci. The results obtained from both real and simulated F2 dominant marker data indicate that the recombination fractions estimated by the new method contain a large amount of linkage information for constructing a complete linkage map. In addition, the new method is also applicable to data with mixed types of markers (dominant and codominant) with unknown linkage phase.
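
To illustrate why linkage phase matters for dominant markers, the sketch below uses the classical textbook F2 model, not the new method proposed in the paper. With two dominant markers the four phenotype classes have probabilities (2 + θ)/4, (1 - θ)/4, (1 - θ)/4 and θ/4, where θ = (1 - r)² in coupling and θ = r² in repulsion, assuming equal recombination in both parents. Function names and counts are hypothetical.

```python
import math


def theta_mle(n_AB, n_Ab, n_aB, n_ab):
    """MLE of theta for class probabilities (2+theta)/4, (1-theta)/4, (1-theta)/4, theta/4.

    Setting the derivative of the log-likelihood to zero gives the quadratic
    n*theta^2 - b*theta - 2*n_ab = 0 with b = n_AB - 2*n_Ab - 2*n_aB - n_ab.
    """
    n = n_AB + n_Ab + n_aB + n_ab
    b = n_AB - 2 * n_Ab - 2 * n_aB - n_ab
    # Positive root of the quadratic
    return (b + math.sqrt(b * b + 8 * n * n_ab)) / (2 * n)


def recombination_fraction(n_AB, n_Ab, n_aB, n_ab, phase="coupling"):
    """Recombination fraction r from theta, given the assumed linkage phase."""
    root = math.sqrt(theta_mle(n_AB, n_Ab, n_aB, n_ab))
    # coupling: theta = (1 - r)^2 ; repulsion: theta = r^2
    return 1.0 - root if phase == "coupling" else root


# Hypothetical F2 phenotype counts (A_B_, A_bb, aaB_, aabb)
r_coupling = recombination_fraction(66, 9, 9, 16, phase="coupling")
r_repulsion = recombination_fraction(51, 24, 24, 1, phase="repulsion")
```

In repulsion the estimate rests almost entirely on the rare double-recessive class (one individual out of 100 in the example), which is one way to see the low linkage information the abstract refers to.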


Subjects
Chromosome Mapping/methods , Genes, Dominant , Genetic Markers , Recombination, Genetic/genetics , Algorithms , Animals , Computer Simulation , Germ Cells , Mice , Models, Genetic
17.
Am J Hypertens ; 21(9): 1028-33, 2008 Sep.
Article in English | MEDLINE | ID: mdl-18583984

ABSTRACT

BACKGROUND: Prostasin, a serine protease, is suggested to be a novel mechanism regulating the epithelial sodium channel (ENaC) expressed in the distal nephron. This study aimed to evaluate whether the human prostasin gene is a novel candidate gene underlying blood pressure (BP) elevation. METHODS: In a sample of healthy African-American (AA) and European-American (EA) twin subjects aged 17.6 ± 3.3 years (n = 920, 45% AAs), race-specific tagging single-nucleotide polymorphisms (tSNPs) were identified to tag all the available SNPs ± 2 kb up- and downstream of the prostasin gene from HapMap at r² of 0.8-1.0. Selection yielded four tSNPs in AAs and one in EAs, with one tSNP (rs12597511: C to T) present in both AAs and EAs. RESULTS: For rs12597511, CT and TT genotypes exhibited higher systolic BP (SBP) than CC genotype (115.9 ± 1.1 mm Hg vs. 113.7 ± 0.6 mm Hg, P = 0.025 (AAs); and 110.7 ± 0.5 mm Hg vs. 109.6 ± 0.6 mm Hg, P = 0.115 (EAs)). CT and TT genotypes compared with CC genotype showed a significant increase in diastolic BP (DBP) in both racial groups (62.5 ± 0.7 mm Hg vs. 60.4 ± 0.4 mm Hg, P = 0.003 (AAs); and 58.2 ± 0.3 mm Hg vs. 56.7 ± 0.4 mm Hg, P = 0.007 (EAs)). Furthermore, there was an increase in radial pulse wave velocity (PWV) in subjects with CT and TT genotype as compared with those with CC genotype (6.5 ± 0.1 vs. 6.1 ± 0.1 m/s, P < 0.0001 (EAs); and 6.7 ± 0.1 vs. 6.6 ± 0.1 m/s, P = 0.354 (AAs)). Analyses combining AAs and EAs consistently demonstrated a statistically significant effect of rs12597511 on all the phenotypes including SBP/DBP and PWV. CONCLUSION: Genetic variation of the prostasin gene may be implicated in the development of hypertension in youths.


Subjects
Hypertension/genetics , Serine Endopeptidases/genetics , Adolescent , Alleles , Female , Genetic Variation , Genotype , Humans , Male , Racial Groups
18.
Twin Res Hum Genet ; 11(5): 517-23, 2008 Oct.
Article in English | MEDLINE | ID: mdl-18828734

ABSTRACT

BACKGROUND: Our research group recently reported that aorto-radial (radial) and aorto-dorsalis-pedis (foot) pulse wave velocity (PWV) as proxies of arterial stiffness are substantially heritable in healthy youth. This article aimed at uncovering the genetic contributions of adhesion molecules, key members in the inflammatory process, to PWV in these young individuals. METHODS: Radial and foot PWV were noninvasively measured with applanation tonometry in 702 black and white subjects (42% blacks, mean age 17.7 +/- 3.3 years) from the Georgia Cardiovascular Twin Study. Eight functional polymorphisms from genes for E-selectin (SELE), P-selectin (SELP), intercellular adhesion molecule-1 (ICAM1), and vascular cell adhesion molecule-1 (VCAM1) were genotyped. RESULTS: Youth with Ser290Asn or Asn290Asn genotype (SELP) compared to those with Ser290Ser had an increase in both radial and foot PWV (6.61 +/- 0.07 vs. 6.41 +/- 0.05 m/s, p = .026; 7.22 +/- 0.05 vs. 7.04 +/- 0.04 m/s, p = .007). TT homozygotes of rs2244529 (SELP) had higher foot PWV (7.28 +/- 0.07 vs. 7.06 +/- 0.03 m/s, p = .002) than CT heterozygotes and CC homozygotes. There appeared to be a decrease in foot PWV in youth with the 241Arg allele (ICAM1) as compared to those without (6.96 +/- 0.08 vs. 7.14 +/- 0.03 m/s, p = .005). For the Asp693Asp (C to T) polymorphism (VCAM1), CC genotype had higher foot PWV than CT and TT genotypes (7.18 +/- 0.04 vs. 6.95 +/- 0.06 m/s, p < .0001). There was an epistatic interaction between Ser290Asn, Gly241Arg, and Asp693Asp on foot PWV (p = .017), explaining 3.6% of the variance in foot PWV. CONCLUSION: Genetic variation of adhesion molecules may be implicated in the development of arterial stiffness. Screening for adhesion molecule polymorphisms may help identify high-risk youth.


Subjects
Blood Flow Velocity/genetics , Cell Adhesion Molecules/genetics , Twins, Dizygotic/genetics , Twins, Monozygotic/genetics , Adolescent , Alleles , Female , Gene Frequency , Genetic Predisposition to Disease , Genotype , Humans , Male , Polymorphism, Genetic , Twins, Dizygotic/ethnology , Twins, Monozygotic/ethnology , Young Adult
19.
Genetics ; 173(4): 2383-90, 2006 Aug.
Article in English | MEDLINE | ID: mdl-16783016

ABSTRACT

The goal of linkage mapping is to find the true order of loci on a chromosome. Since the number of possible orders is large even for a modest number of loci, finding the optimal order is an NP-hard problem akin to the traveling salesman problem (TSP). Although a number of algorithms are available, many either recover the true order of loci with low accuracy or require tremendous amounts of computational resources, making them difficult to use for reconstructing a large-scale map. In this article we developed a novel method called unidirectional growth (UG) to help solve this problem. The UG algorithm sequentially constructs the linkage map on the basis of novel results about additive distance. It is not only fast but also highly accurate in recovering the true order of loci according to our simulation studies. Since the UG method requires n-1 cycles to estimate the ordering of n loci, it is particularly useful for estimating linkage maps consisting of hundreds or even thousands of linked codominant loci on a chromosome.
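
The abstract does not give the steps of the UG algorithm, so the following is only an illustrative sketch of the general idea of building a locus order sequentially from pairwise distances. It is a plain greedy chain-extension heuristic, not the published UG method; the function name `greedy_order` and the toy positions are hypothetical. With exactly additive distances (loci on a line, no estimation noise) this heuristic recovers the true order up to orientation.

```python
from itertools import combinations


def greedy_order(dist):
    """Order loci by growing a chain outward from the closest pair.

    dist: symmetric matrix (list of lists) of pairwise map distances.
    Returns a list of locus indices in chain order.
    NOTE: a generic greedy heuristic for illustration, not the UG algorithm.
    """
    n = len(dist)
    if n < 2:
        return list(range(n))
    # Seed the chain with the two closest loci.
    a, b = min(combinations(range(n), 2), key=lambda p: dist[p[0]][p[1]])
    chain = [a, b]
    unplaced = set(range(n)) - {a, b}
    while unplaced:
        # Pick the unplaced locus nearest to either end of the chain.
        locus, at_left = min(
            ((k, left) for k in unplaced for left in (True, False)),
            key=lambda c: dist[c[0]][chain[0] if c[1] else chain[-1]],
        )
        if at_left:
            chain.insert(0, locus)
        else:
            chain.append(locus)
        unplaced.remove(locus)
    return chain


# Toy data (hypothetical): five loci at known positions, listed in shuffled order.
positions = [30.0, 0.0, 50.0, 10.0, 25.0]
dist = [[abs(p - q) for q in positions] for p in positions]
order = greedy_order(dist)
recovered = [positions[i] for i in order]
```

Each pass of the loop places one locus, so the chain is complete after n-2 extensions of the seed pair. With real recombination estimates the distances are noisy and not exactly additive, and a greedy choice like this can go wrong, which is the difficulty the abstract describes.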


Subjects
Algorithms , Chromosome Mapping , Chromosomes/genetics , Genetic Linkage , Models, Genetic , Quantitative Trait Loci/genetics , Chromosome Mapping/methods , Computer Simulation , Genes, Dominant/genetics
20.
Yi Chuan Xue Bao ; 33(12): 1132-40, 2006 Dec.
Article in English | MEDLINE | ID: mdl-17185174

ABSTRACT

Because of the high operating costs involved in microarray experiments, determining the number of replicates required to detect a significantly differentially expressed gene under a given multiple-testing procedure is of considerable importance. Calculation of power/replicate numbers required in multiple-testing procedures provides design guidance for microarray experiments. Based on this model and by choice of a multiple-testing procedure, expression noises based on permutation resampling can be considerably minimized. The method for mixture distribution model is suitable to various microarray data types obtained from single noise sources, or from multiple noise sources. With this method, the biological replicate number required for a given power, the power to detect a significantly differentially expressed gene given the sample size, or the best multiple-testing method can be determined. As an example, a single-distribution model of t-statistic was fitted to an observed microarray dataset of 3,000 genes responsive to stroke in rat, and then used to calculate powers of four popular multiple-testing procedures to detect a gene of an expression change D. The results show that the B-procedure had the lowest power to detect a gene of small change among the multiple-testing procedures, whereas the BH-procedure had the highest power. However, all multiple-testing procedures had the same power to identify a gene having the largest change. Similar to a single test, the power of the BH-procedure to detect a small change does not vary as the number of genes increases, but powers of the other three multiple-testing procedures decline as the number of genes increases.
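
To make the contrast between multiple-testing procedures concrete, here is a minimal sketch that assumes the "B-procedure" denotes the Bonferroni correction and the "BH-procedure" the Benjamini-Hochberg step-up procedure; the abstract does not spell out either name. The function names and the p-values are hypothetical and are not taken from the study.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject H0 for each p-value at or below alpha / m (m = number of tests)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the false discovery rate.

    Finds the largest rank k with p_(k) <= k * alpha / m and rejects the
    k smallest p-values.
    """
    m = len(pvals)
    ranked = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    k_max = 0
    for rank, idx in enumerate(ranked, start=1):
        if pvals[idx] <= rank * alpha / m:
            k_max = rank
    reject = [False] * m
    for idx in ranked[:k_max]:
        reject[idx] = True
    return reject


# Hypothetical p-values for eight genes.
pvals = [0.001, 0.008, 0.012, 0.03, 0.2, 0.5, 0.7, 0.9]
n_bonf = sum(bonferroni(pvals))
n_bh = sum(benjamini_hochberg(pvals))
```

On these toy values Bonferroni rejects one hypothesis and Benjamini-Hochberg rejects three. The Bonferroni threshold alpha / m shrinks as the number of genes m grows, which is in line with the abstract's observation that the power of the more conservative procedures declines as the number of genes increases.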


Subjects
Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Models, Genetic