Results 1 - 20 of 35
1.
Pharm Stat ; 17(2): 105-116, 2018 03.
Article in English | MEDLINE | ID: mdl-29297979

Abstract

For survival endpoints in subgroup selection, a score conversion model is often used to convert each patient's set of biomarkers into a univariate score, and the median of the univariate scores is then used to divide patients into biomarker-positive and biomarker-negative subgroups. However, this may lead to bias in patient subgroup identification because of two issues: (1) the treatment may be equally effective for all patients and/or there may be no subgroup difference; (2) the median of the univariate scores may be an inappropriate cutoff if the sizes of the two subgroups differ substantially. We use a univariate composite score method to convert each patient's set of candidate biomarkers into a univariate response score. To address the first issue, we propose applying the likelihood ratio test (LRT) to assess the homogeneity of the sampled patients. In the context of identifying the subgroup of responders in an adaptive design to demonstrate improved treatment efficacy (adaptive power), we suggest that subgroup selection be carried out if the LRT is significant. For the second issue, we use a likelihood-based change-point algorithm to find an optimal cutoff. Our simulation study shows that the type I error is generally controlled, while performing the LRT sacrifices approximately 4.5% of the overall adaptive power to detect treatment effects for the simulation designs considered; furthermore, the change-point algorithm considerably outperforms the median cutoff when the subgroup sizes differ substantially.
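As a rough illustration of why the cutoff matters, the sketch below compares the median with a single change point found by maximum likelihood. It is not the authors' algorithm: it assumes the sorted composite scores follow a two-segment Gaussian model with a shared variance, under which maximizing the profile likelihood over the split point reduces to minimizing the pooled within-segment sum of squares. The names `changepoint_cutoff` and `min_size` are hypothetical.

```python
import statistics


def _sse(values):
    """Sum of squared deviations of a segment from its own mean."""
    m = statistics.fmean(values)
    return sum((v - m) ** 2 for v in values)


def changepoint_cutoff(scores, min_size=2):
    """Return (cutoff, n_negative) for a single Gaussian mean-shift change point.

    Assumes the sorted scores form two segments with different means and a
    shared variance; the maximum-likelihood split then minimizes the pooled
    within-segment sum of squares.
    """
    x = sorted(scores)
    n = len(x)
    if n < 2 * min_size:
        raise ValueError("need at least 2 * min_size scores")
    best_k = min(range(min_size, n - min_size + 1),
                 key=lambda k: _sse(x[:k]) + _sse(x[k:]))
    # Cut midway between the last "biomarker-negative" and first "positive" score.
    return (x[best_k - 1] + x[best_k]) / 2, best_k
```

With eight low scores and two high ones, `statistics.median` places the cutoff inside the low cluster, whereas the change point falls between the two clusters, which mirrors the unequal-subgroup situation described above.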


Subjects
Patient Selection , Precision Medicine/mortality , Precision Medicine/methods , Databases, Factual/trends , Humans , Likelihood Functions , Lung Neoplasms/diagnosis , Lung Neoplasms/mortality , Lung Neoplasms/therapy , Precision Medicine/trends , Survival Rate/trends , Treatment Outcome
2.
BMC Bioinformatics ; 17: 74, 2016 Feb 06.
Article in English | MEDLINE | ID: mdl-26852017

Abstract

BACKGROUND: Gene set analysis (GSA) aims to evaluate the association between the expression of biological pathways, or a priori defined gene sets, and a particular phenotype. Numerous GSA methods have been proposed to assess the enrichment of sets of genes. However, most methods are developed for a specific alternative scenario, such as a differential mean pattern or differential coexpression. Moreover, very few methods can handle binary, categorical, and continuous phenotypes alike. In this paper, we develop two novel GSA tests, called SDRs, based on the sufficient dimension reduction technique, which aims to capture sufficient information about the relationship between genes and the phenotype. The advantages of the proposed methods are that they allow for categorical and continuous phenotypes and can identify a variety of enriched gene sets. RESULTS: Through simulation studies, we compared the type I error and power of SDRs with those of existing GSA methods for binary, three-category, and continuous phenotypes. We found that SDR methods adequately control the type I error rate at the prespecified nominal level, and they have satisfactory power to detect gene sets with differential coexpression and to test non-linear associations between gene sets and a continuous phenotype. In addition, the SDR methods were compared with seven widely used GSA methods on two real microarray datasets for illustration. CONCLUSIONS: We concluded that the SDR methods outperform the others because of their flexibility in handling different kinds of phenotypes and their power to detect a wide range of alternative scenarios. Our real data analysis highlights the differences among GSA methods in detecting enriched gene sets.


Subjects
Computational Biology/methods , Computer Simulation , Gene Expression Profiling/methods , Gene Regulatory Networks , Oligonucleotide Array Sequence Analysis/methods , Prostatic Neoplasms/genetics , Tumor Suppressor Protein p53/genetics , Black or African American/genetics , Genotype , Humans , Male , Phenotype , Prostatic Neoplasms/ethnology
3.
Stat Med ; 33(19): 3300-17, 2014 Aug 30.
Article in English | MEDLINE | ID: mdl-24771655

Abstract

The approval of generic drugs requires evidence of average bioequivalence (ABE) for both the area under the concentration-time curve and the peak concentration, Cmax. The bioequivalence (BE) hypothesis can be decomposed into the non-inferiority (NI) and non-superiority (NS) hypotheses. Most regulatory agencies employ the two one-sided tests (TOST) procedure to test ABE between two formulations. Because it is based on the intersection-union principle, the TOST procedure is conservative in terms of the type I error rate. However, the type II error rate is the sum of the type II error rates with respect to each of the NI and NS null hypotheses. When the difference in population means between the two treatments is not 0, no closed-form solution for the sample size for the BE hypothesis is available. Current methods provide sample sizes with either insufficient power or unnecessarily excessive power. We propose an approximate method for sample size determination that also provides the type II error rate for each of the NI and NS hypotheses. In addition, the proposed method can be flexibly extended from one pharmacokinetic (PK) response to determining the sample size required for multiple PK responses. We report the results of a numerical study. R code is provided to calculate the sample size for BE testing based on the proposed methods.
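The relationship between the overall type II error and its NI and NS components can be sketched with a normal approximation. This is not the authors' method (the abstract does not give its formulas), and because it uses normal rather than t quantiles it tends to understate the required size. It assumes a 2x2 crossover analysed on the log scale with `n` subjects in total and within-subject standard deviation `sigma`, so that the standard error of the estimated difference is `sigma * sqrt(2 / n)`, and it approximates the overall type II error by the sum of the two one-sided type II errors. The function names are hypothetical.

```python
import math
from statistics import NormalDist

THETA = math.log(1.25)  # BE margin on the log scale (80-125% limits)


def type2_errors(n, delta, sigma, alpha=0.05, theta=THETA):
    """Approximate type II error rates (beta_NI, beta_NS) of the two one-sided tests.

    Normal approximation; assumes SE(difference) = sigma * sqrt(2 / n) for a
    2x2 crossover with n subjects in total and true log-scale difference delta.
    """
    std_normal = NormalDist()
    se = sigma * math.sqrt(2 / n)
    z_alpha = std_normal.inv_cdf(1 - alpha)
    beta_ni = 1 - std_normal.cdf((theta + delta) / se - z_alpha)  # H0: delta <= -theta
    beta_ns = 1 - std_normal.cdf((theta - delta) / se - z_alpha)  # H0: delta >= theta
    return beta_ni, beta_ns


def sample_size(delta, sigma, power=0.8, alpha=0.05, n_max=10_000):
    """Smallest total n whose summed NI + NS type II error is at most 1 - power."""
    for n in range(2, n_max + 1):
        if sum(type2_errors(n, delta, sigma, alpha)) <= 1 - power:
            return n
    raise ValueError("no n <= n_max reaches the target power")
```

For `delta = 0` the two components are equal; with a positive `delta` the NS component is the larger one, which is the kind of split that reporting the NI and NS type II rates separately makes visible.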


Subjects
Therapeutic Equivalency , Biostatistics , Chemistry, Pharmaceutical , Cross-Over Studies , Delayed-Action Preparations , Drugs, Generic/pharmacokinetics , Humans , Models, Statistical , Randomized Controlled Trials as Topic/statistics & numerical data , Sample Size , Theophylline/administration & dosage , Theophylline/pharmacokinetics
4.
BMC Infect Dis ; 14: 80, 2014 Feb 12.
Article in English | MEDLINE | ID: mdl-24520993

Abstract

BACKGROUND: Studies indicate that asymptomatic infections occur frequently in both seasonal and pandemic influenza, accounting for about one-third of influenza infections. Studies carried out during the 2009 pH1N1 pandemic found significant antibody responses against the seasonal H1N1 and H3N2 vaccine strains in schoolchildren who had received only the pandemic H1N1 monovalent vaccine, yet these children reported either no symptoms or only mild symptoms. METHODS: Serum samples from 255 schoolchildren who had not been vaccinated and had pre-season hemagglutination inhibition antibody (HI Ab) titers <40 were collected from urban and rural areas and an isolated island in Taiwan during the 2005-2006 influenza season. Their HI Ab titers against the 2005 A/New Caledonia/20/99 (H1N1) vaccine strain were measured pre-season and post-season to determine the symptoms most highly correlated with infection, defined as a 4-fold rise in HI titer. We estimate the asymptomatic ratio, i.e., the proportion of infections that are asymptomatic, for schoolchildren during the 2005-2006 influenza season, when this vaccine strain was found to be antigenically related to the circulating H1N1 strain. RESULTS: Fever had the highest correlation with 2005-2006 seasonal influenza A(H1N1) infection, followed by headache, cough, vomiting, and sore throat. The asymptomatic ratio for schoolchildren ranged from 55.6% (95% CI: 44.7-66.4%) to 77.9% (95% CI: 68.8-87.0%), depending on the set of predictive symptoms used. Moreover, the asymptomatic ratio was 66.9% (95% CI: 56.6-77.2%) under the US CDC criterion of fever + (cough or sore throat), and 73.0% (95% CI: 63.3-82.8%) under the Taiwan CDC definition of fever + (cough or sore throat or nose) + (headache or pain or fatigue). CONCLUSIONS: The asymptomatic ratio for children is substantially higher than that reported for the general population in the literature.
By providing a reasonable quantification of asymptomatic infected children who spread pathogens to others in a seasonal epidemic or a pandemic, our estimates of the asymptomatic ratio of infected children have important clinical and public health implications.
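The estimate described above can be sketched as follows: every child with at least a 4-fold rise in HI titer counts as infected, and every infected child who does not meet a given symptom case definition counts as asymptomatic. This is a minimal sketch; the record layout, the function names, and the use of a Wald interval are assumptions (the abstract does not state how its confidence intervals were computed).

```python
import math
from statistics import NormalDist


def infected(pre_titer, post_titer):
    """Serologic infection: at least a 4-fold rise in HI titer."""
    return post_titer >= 4 * pre_titer


def asymptomatic_ratio(records, case_definition, level=0.95):
    """Proportion of infected children not meeting the symptom case definition.

    `records` holds (pre_titer, post_titer, symptoms) tuples, where `symptoms`
    is a set of strings. Returns (estimate, lower, upper) using a Wald interval
    clipped to [0, 1]; the interval choice is an assumption.
    """
    cases = [s for pre, post, s in records if infected(pre, post)]
    n = len(cases)
    if n == 0:
        raise ValueError("no serologically confirmed infections")
    p = sum(not case_definition(s) for s in cases) / n
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half), min(1.0, p + half)


def us_cdc_ili(symptoms):
    """Fever plus (cough or sore throat)."""
    return "fever" in symptoms and bool({"cough", "sore throat"} & symptoms)
```

Swapping `us_cdc_ili` for another predicate (for example, one encoding the Taiwan CDC definition) changes which infected children count as asymptomatic, which is why the estimates above vary with the set of predictive symptoms.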


Subjects
Influenza A Virus, H1N1 Subtype , Influenza, Human/epidemiology , Antibodies, Viral/blood , Child , Communicable Disease Control , Cough/epidemiology , Epidemics , Female , Fever/epidemiology , Hemagglutination Inhibition Tests , Humans , Immunization Programs , Influenza Vaccines/therapeutic use , Logistic Models , Male , Multivariate Analysis , ROC Curve , Rural Population , Seasons , Taiwan , Urban Population
5.
Genes (Basel) ; 14(3)2023 02 25.
Article in English | MEDLINE | ID: mdl-36980852

Abstract

In medical data mining, class prediction models have been widely used to address various kinds of data classification problems. Classification models for high-dimensional gene expression datasets in particular have attracted many researchers seeking to identify marker genes that distinguish any type of cancer cell from its corresponding normal cells. However, skewed class distributions often occur in medical datasets, in which at least one class has a relatively small number of observations. A classifier induced from such an imbalanced dataset typically has high accuracy for the majority class and poor prediction for the minority class. In this study, we focus on an SVM classifier with a Gaussian radial basis function kernel for a binary classification problem. To take advantage of SVMs and achieve the best generalization ability for improved classification performance, we address two important problems: class imbalance and parameter selection during SVM parameter optimization. First, we propose a novel adjustment method, called b-SVM, for adjusting the cutoff threshold of the SVM. Second, we propose a fast and simple approach, called Min-max gamma selection, to optimize the model parameters of SVMs without carrying out extensive k-fold cross-validation. An extensive comparison with a standard SVM and well-known existing methods is carried out to evaluate the performance of the proposed algorithms on simulated and real datasets. The experimental results show that our proposed algorithms outperform over-sampling techniques and existing SVM-based solutions. This study also shows that, based on the average running time on six real datasets, the proposed Min-max gamma selection is at least 10 times faster than cross-validation selection.
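The abstract does not describe how b-SVM moves the cutoff, so the sketch below shows a generic threshold-moving alternative, not the authors' method: given decision values from an already trained RBF-kernel SVM on held-out data, it replaces the default cutoff of 0 with the candidate that maximizes the geometric mean (G-mean) of sensitivity and specificity. Labels are assumed to be +1 (minority) and -1 (majority), and the function names are hypothetical.

```python
import math


def g_mean(scores, labels, threshold):
    """Geometric mean of sensitivity and specificity when predicting +1 for score > threshold."""
    tp = sum(s > threshold and y == 1 for s, y in zip(scores, labels))
    tn = sum(s <= threshold and y == -1 for s, y in zip(scores, labels))
    pos = sum(y == 1 for y in labels)
    neg = len(labels) - pos
    return math.sqrt((tp / pos) * (tn / neg))


def adjust_threshold(scores, labels):
    """Pick the cutoff on SVM decision values that maximizes the G-mean.

    Candidates are the default SVM cutoff 0 plus midpoints between
    consecutive distinct sorted decision values.
    """
    s = sorted(set(scores))
    candidates = [0.0] + [(a + b) / 2 for a, b in zip(s, s[1:])]
    return max(candidates, key=lambda t: g_mean(scores, labels, t))
```

On an imbalanced example in which two of three minority cases receive slightly negative decision values, the default cutoff gives a G-mean of about 0.58, while the adjusted cutoff separates the classes completely.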


Subjects
Algorithms , Support Vector Machine , Data Mining , Research Design
6.
BMC Nephrol ; 13: 97, 2012 Aug 30.
Article in English | MEDLINE | ID: mdl-22935561

Abstract

BACKGROUND: The immunocompromised status of patients with end-stage renal disease (ESRD) is well recognized. As described recently, this acquired immune dysfunction in the uremic milieu may be one of the main pathogenic factors for mortality in ESRD. The aim of this study was to determine the relationship between the immune response following hepatitis B vaccination (HBV vaccination) and the survival of maintenance dialysis patients. METHODS: A total of 156 patients (103 on hemodialysis and 53 on continuous ambulatory peritoneal dialysis) were recruited. After receiving a full dose of HBV vaccination, all patients were followed up for up to 5 years to evaluate the association among patient survival, cause of mortality, and immune response. RESULTS: The response rate to hepatitis B vaccination was 70.5%. There was no significant association between the immune response and the 5-year survival rate (p = 0.600) or between post-vaccination anti-HBs titers and the 5-year survival rate (p = 0.201). A logistic prediction model with non-response following HBV vaccination, diabetes mellitus, old age, and low albumin level as covariates significantly predicted infection-caused mortality (sensitivity = 0.842, specificity = 0.937). CONCLUSION: There was no significant association between the immune response to HBV vaccination and the 5-year survival rate. However, non-response following HBV vaccination might be associated with infection-caused mortality in dialysis patients.


Subjects
Hepatitis B Vaccines/immunology , Hepatitis B Vaccines/therapeutic use , Immune System Diseases/mortality , Renal Dialysis/mortality , Renal Insufficiency, Chronic/mortality , Renal Insufficiency, Chronic/rehabilitation , Vaccination/mortality , Causality , Comorbidity , Female , Humans , Incidence , Male , Middle Aged , Risk Factors , Survival Analysis , Survival Rate , Taiwan/epidemiology , Treatment Outcome
7.
Brief Bioinform ; 10(5): 537-46, 2009 Sep.
Article in English | MEDLINE | ID: mdl-19346320

Abstract

Recent development of high-throughput technology has accelerated interest in developing molecular biomarker classifiers for safety assessment, disease diagnostics and prognostics, and prediction of response for patient assignment. This article reviews and evaluates some important aspects and key issues in the development of biomarker classifiers. Developing a biomarker classifier for high-throughput data involves two components: (i) model building and (ii) performance assessment. This article focuses on feature selection in model building and on cross-validation for performance assessment. A 'frequency' approach to feature selection is presented and compared with the 'conventional' approach in terms of predictive accuracy and the stability of the selected feature set. The two approaches are compared on four biomarker classifiers, each combining a different feature selection method with a well-known classification algorithm. For each of the four classifiers, the feature set selected by the frequency approach is more stable than that selected by the conventional approach.
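A minimal sketch of the 'frequency' idea, assuming it means keeping the features that are selected most often across cross-validation folds or resamples; the abstract does not give the exact rule, so the 50% threshold and the function names are hypothetical. A Jaccard index is included as one simple way to quantify the stability of a selected feature set.

```python
from collections import Counter


def frequency_selection(selected_per_resample, min_freq=0.5):
    """Keep features chosen in at least `min_freq` of the resamples.

    `selected_per_resample` holds one collection of feature names per
    cross-validation fold or bootstrap sample, from any selection method.
    """
    counts = Counter(f for fs in selected_per_resample for f in set(fs))
    b = len(selected_per_resample)
    return {f for f, c in counts.items() if c / b >= min_freq}


def jaccard(a, b):
    """Jaccard similarity between two feature sets (1.0 when both are empty)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0
```

Comparing `jaccard` between feature sets obtained from different data splits is one way to check a stability claim like the one above: a frequency-selected set would be expected to change less from split to split than a set chosen in a single run.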


Subjects
Algorithms , Biomarkers , Computational Biology , Models, Biological , Computational Biology/classification , Computational Biology/methods , Databases, Genetic , Mathematics , Reproducibility of Results
8.
J Clin Psychopharmacol ; 31(3): 369-74, 2011 Jun.
Article in English | MEDLINE | ID: mdl-21508860

Abstract

BACKGROUND: Several lines of evidence implicate glutamatergic neurotransmission in the pathophysiology of obsessive-compulsive disorder (OCD). Sarcosine is an endogenous antagonist of glycine transporter-1. By blocking glycine uptake, sarcosine may increase the availability of synaptic glycine and enhance N-methyl-D-aspartate (NMDA) subtype glutamatergic neurotransmission. In this 10-week open-label trial, we examined the potential benefit of sarcosine treatment in OCD patients. METHOD: Twenty-six outpatients with OCD and baseline Yale-Brown Obsessive Compulsive Scale (Y-BOCS) scores higher than 16 were enrolled. Drug-naive subjects (group 1, n = 8) and those who had discontinued serotonin reuptake inhibitors for at least 8 weeks at study entry (group 2, n = 6) received sarcosine monotherapy. The remaining subjects (group 3, n = 12) received sarcosine as adjunctive treatment. A flexible dosage schedule of sarcosine 500 to 2000 mg/d was applied. The primary outcome measures were the Y-BOCS and the Hamilton Anxiety Inventory, rated at weeks 0, 2, 4, 6, 8, and 10. Results were analyzed by repeated-measures analysis of variance. RESULTS: Data from 25 subjects were eligible for analysis. The mean ± SD Y-BOCS score decreased from 27.6 ± 5.8 to 22.7 ± 8.7, a mean decrease of 19.8% ± 21.7% (P = 0.0035). Eight subjects (32%) were regarded as responders, with a greater than 35% reduction in Y-BOCS score. Five of the responders achieved a good response early, by week 4. Although the difference was not statistically significant, drug-naive subjects (group 1) had more profound and sustained improvement and more responders than subjects who had received treatment before (groups 2 and 3). Sarcosine was well tolerated; only one subject withdrew, owing to transient headache. CONCLUSION: Sarcosine treatment can achieve a fast therapeutic effect in some OCD patients, particularly those who are treatment naive.
The study supports glycine transporter-1 as a novel target for developing new OCD treatments. Large-scale placebo-controlled, double-blind studies are recommended.
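The outcome summaries above involve two different averages: the mean of per-subject percent reductions (19.8% in this study) and the percent reduction of the group means, which from the reported means is (27.6 - 22.7) / 27.6 ≈ 17.8%; the two need not agree. The sketch below computes both, plus the responder count under the >35% criterion. The input format and function names are hypothetical.

```python
def percent_reduction(baseline, endpoint):
    """Percent decrease from a baseline score (positive = improvement)."""
    return 100 * (baseline - endpoint) / baseline


def summarize(pairs, responder_cut=35.0):
    """Return (mean of per-subject % reductions, % reduction of group means, responders).

    `pairs` is a list of (baseline, endpoint) Y-BOCS scores, one per subject.
    """
    reductions = [percent_reduction(b, e) for b, e in pairs]
    mean_of_pct = sum(reductions) / len(reductions)
    mean_baseline = sum(b for b, _ in pairs) / len(pairs)
    mean_endpoint = sum(e for _, e in pairs) / len(pairs)
    pct_of_means = percent_reduction(mean_baseline, mean_endpoint)
    responders = sum(r > responder_cut for r in reductions)
    return mean_of_pct, pct_of_means, responders
```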


Subjects
Glycine Plasma Membrane Transport Proteins/antagonists & inhibitors , Obsessive-Compulsive Disorder/drug therapy , Psychotropic Drugs/therapeutic use , Sarcosine/therapeutic use , Adult , Drug Therapy, Combination/statistics & numerical data , Female , Humans , Male , Middle Aged , Psychiatric Status Rating Scales , Psychotropic Drugs/administration & dosage , Sarcosine/administration & dosage
9.
Bioinformatics ; 25(7): 897-903, 2009 Apr 01.
Article in English | MEDLINE | ID: mdl-19254923

Abstract

MOTIVATION: Gene class testing (GCT), or gene set analysis (GSA), is a statistical approach to determine whether functionally predefined sets of genes are expressed differently under different experimental conditions. Shortcomings of Fisher's exact test for overrepresentation analysis are illustrated by an example. Most alternative GSA methods are developed for data collected under two experimental conditions, and most are based on a univariate gene-by-gene test statistic or assume independence among the genes in the gene set. A multivariate analysis of variance (MANOVA) approach is proposed for studies with two or more experimental conditions. RESULTS: When the number of genes in the gene set is greater than the number of samples, the sample covariance matrix is singular and ill-conditioned, and the use of standard multivariate methods can bias the analysis. The proposed MANOVA test uses a shrinkage estimator in place of the sample covariance matrix. The MANOVA test and six other published GSA methods (principal component analysis, SAM-GS, analysis of covariance, Global, GSEA, and MaxMean) are evaluated by simulation. The MANOVA test appears to perform best in terms of type I error control and power under the models considered in the simulation. Several publicly available microarray datasets with two and three experimental conditions are analyzed to illustrate GSA. Most methods, except GSEA and MaxMean, are generally comparable in their power to identify significant gene sets. AVAILABILITY: Free R code to perform the MANOVA test is available at http://mail.cmu.edu.tw/~catsai/research.htm. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
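Why a shrinkage estimator helps when genes outnumber samples can be sketched in a few lines. This assumes the simplest form, a fixed-intensity convex combination of the sample covariance and its diagonal; the abstract does not specify the estimator or how its intensity is chosen, and the MANOVA statistic itself is not implemented here. Function names are hypothetical.

```python
def sample_cov(data):
    """Unbiased sample covariance; rows are samples, columns are genes."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in data) / (n - 1)
             for j in range(p)] for i in range(p)]


def shrink_to_diagonal(cov, lam):
    """lam * diag(S) + (1 - lam) * S: variances kept, covariances shrunk toward 0."""
    p = len(cov)
    return [[cov[i][j] if i == j else (1 - lam) * cov[i][j] for j in range(p)]
            for i in range(p)]


def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    d = 1.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[piv][c]) < 1e-12:
            return 0.0  # numerically singular
        if piv != c:
            a[c], a[piv] = a[piv], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d
```

With two samples and three genes, the sample covariance has rank 1 and determinant 0, while the matrix shrunk with `lam = 0.5` is positive definite and can therefore be inverted.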


Subjects
Gene Expression Profiling/methods , Analysis of Variance , Computer Simulation , Oligonucleotide Array Sequence Analysis/methods , Oligonucleotide Array Sequence Analysis/statistics & numerical data
10.
Bioinformatics ; 23(16): 2104-12, 2007 Aug 15.
Article in English | MEDLINE | ID: mdl-17553853

Abstract

MOTIVATION: Gene class testing (GCT) is a statistical approach to determine whether functionally predefined classes of genes are expressed differently under two experimental conditions. GCT computes the P-value of each gene class from a null distribution, and the gene classes are ranked by importance according to their P-values. Currently, two null hypotheses have been considered: the Q1 hypothesis tests the relative strength of association with the phenotypes among the gene classes, and the Q2 hypothesis assesses statistical significance. These two hypotheses are related but not equivalent. METHOD: We investigate three one-sided and two two-sided test statistics under Q1 and Q2. The null distributions of gene classes under Q1 are generated by permuting gene labels, and the null distributions under Q2 are generated by permuting samples. RESULTS: We applied the five statistics to a diabetes dataset with 143 gene classes and to a breast cancer dataset with 508 GO (Gene Ontology) terms. For each statistic, the null distributions of the gene classes under Q1 differ from those under Q2 in both datasets, and the resulting rankings can also differ. We clarify the one-sided and two-sided hypotheses and discuss some issues regarding the Q1 and Q2 hypotheses for gene class ranking in GCT. Because Q1 does not account for correlations among genes, we prefer tests based on Q2. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
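The difference between the two nulls can be sketched for a single gene set. Following the abstract, the Q1 null is generated by permuting gene labels (here, by drawing random gene sets of the same size) and the Q2 null by permuting sample labels. The per-gene Welch t statistic, the mean |t| set score, the add-one p-value, and the function names are assumptions made for illustration.

```python
import random
import statistics


def t_stat(x, y):
    """Welch two-sample t statistic."""
    vx, vy = statistics.variance(x), statistics.variance(y)
    return (statistics.fmean(x) - statistics.fmean(y)) / (vx / len(x) + vy / len(y)) ** 0.5


def gene_stats(expr, labels):
    """Per-gene t statistics; expr maps gene -> values aligned with 0/1 labels."""
    out = {}
    for g, v in expr.items():
        x = [a for a, lab in zip(v, labels) if lab == 1]
        y = [a for a, lab in zip(v, labels) if lab == 0]
        out[g] = t_stat(x, y)
    return out


def set_score(stats, gene_set):
    """Gene-set statistic: mean absolute t over member genes (two-sided)."""
    return statistics.fmean(abs(stats[g]) for g in gene_set)


def q1_pvalue(expr, labels, gene_set, n_perm=1000, seed=0):
    """Q1 null: gene-label permutation, i.e. random gene sets of the same size."""
    rng = random.Random(seed)
    stats = gene_stats(expr, labels)
    obs = set_score(stats, gene_set)
    genes = list(expr)
    hits = sum(set_score(stats, rng.sample(genes, len(gene_set))) >= obs
               for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)


def q2_pvalue(expr, labels, gene_set, n_perm=1000, seed=0):
    """Q2 null: sample-label permutation, keeping the set's genes together."""
    rng = random.Random(seed)
    sub = {g: expr[g] for g in gene_set}
    obs = set_score(gene_stats(sub, labels), gene_set)
    hits = 0
    for _ in range(n_perm):
        perm = labels[:]
        rng.shuffle(perm)
        hits += set_score(gene_stats(sub, perm), gene_set) >= obs
    return (hits + 1) / (n_perm + 1)
```

Because `q1_pvalue` assembles random genes into pseudo-sets, it breaks the correlation among the genes of the real set, whereas `q2_pvalue` keeps those genes together and shuffles only the samples; this is the property behind the preference for Q2 stated above.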


Subjects
Algorithms , Gene Expression Profiling/methods , Models, Biological , Multigene Family/physiology , Proteome/metabolism , Signal Transduction/physiology , Cell Differentiation , Computer Simulation , Data Interpretation, Statistical
11.
Mutat Res ; 640(1-2): 54-73, 2008 Apr 02.
Article in English | MEDLINE | ID: mdl-18206960

Abstract

The tumor suppressor protein p53 is a key regulatory element in the cell and is regarded as the "guardian of the genome". Much of the present knowledge of p53 function has come from studies of transgenic mice in which the p53 gene has undergone a targeted deletion. To provide additional insight into how loss of this gene affects the associated cellular regulatory networks, microarray technology was used to assess gene expression in tissues from both p53(-/-) and p53(+/-) mice. Six male mice of each genotype (p53(+/+), p53(+/-), and p53(-/-)) were humanely killed, and their tissues were processed for microarray analysis. The initial studies were performed in the liver, for which the Dunnett test revealed 1406 genes to be differentially expressed between p53(+/+) and p53(+/-) mice or between p53(+/+) and p53(-/-) mice at the level of p ≤ 0.05. Genes with both increased and decreased expression were identified in p53(+/-) and in p53(-/-) mice. Most notable in the gene list derived from the p53(+/-) mice was the significant reduction in p53 mRNA. In the p53(-/-) mice, not only was there reduced expression of the p53 genes on the array, but genes associated with DNA repair, apoptosis, and cell proliferation were also differentially expressed, as expected. However, altered expression was also noted for many genes in the Cdc42-GTPase pathways that influence cell proliferation. This may indicate that alternate pathways are brought into play in the unperturbed liver when p53 is lost or its levels are reduced.


Subjects
Gene Expression Profiling , Genes, p53 , Liver , Animals , Genotype , Heterozygote , Male , Mice , Mice, Knockout , Multigene Family , Oligonucleotide Array Sequence Analysis , Polymerase Chain Reaction
12.
J Biopharm Stat ; 18(5): 869-82, 2008.
Article in English | MEDLINE | ID: mdl-18781522

Abstract

An important objective in mass spectrometry (MS) is to identify, from tens or hundreds of spectra, a set of biomarkers that can potentially distinguish patients under distinct treatments (or conditions). A common two-step approach involving peak extraction and quantification is employed to identify the features of scientific interest. The selected features are then used for further investigation of the underlying biological mechanisms of individual proteins or for the development of genomic biomarkers for early diagnosis. However, the use of inadequate or ineffective peak detection and peak alignment algorithms in the peak extraction step may lead to a high rate of false positives, and reducing the false positive rate is crucial when detecting biomarkers from tens or hundreds of spectra. Here, a new procedure for feature extraction from mass spectrometry data is introduced that extends the continuous wavelet transform-based (CWT-based) algorithm to multiple spectra. The proposed multispectra CWT-based algorithm (MCWT) not only performs peak detection for multiple spectra but also carries out peak alignment at the same time. The author's MCWT algorithm constructs a reference that integrates information from multiple raw spectra for feature extraction. The algorithm is applied to a SELDI-TOF mass spectra data set provided by CAMDA 2006 with known polypeptide m/z positions. This new approach is easy to implement, and it outperforms the existing peak extraction method from the Bioconductor PROcess package.
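A much-simplified version of the multi-spectrum idea can be sketched as follows: average spectra that share a common m/z grid into a reference, compute one scale of a continuous wavelet transform with a Ricker (Mexican hat) wavelet, and report local maxima above a threshold. This is an assumption-laden illustration, not MCWT: the abstract does not say how the reference is built, and CWT-based peak detectors usually track ridges across many scales rather than using a single one. The scale, the threshold rule, and the function names are hypothetical.

```python
import math


def ricker(width, a):
    """Ricker (Mexican hat) wavelet with scale `a`, sampled at `width` points."""
    amp = 2 / (math.sqrt(3 * a) * math.pi ** 0.25)
    half = (width - 1) / 2
    return [amp * (1 - ((i - half) / a) ** 2) * math.exp(-((i - half) / a) ** 2 / 2)
            for i in range(width)]


def cwt_row(signal, a):
    """One CWT scale: correlate the signal with a Ricker wavelet (zero padding)."""
    width = 2 * int(5 * a) + 1  # odd window covering about +/- 5 scales
    w = ricker(width, a)
    half = width // 2
    n = len(signal)
    return [sum(w[k] * signal[i + k - half]
                for k in range(width) if 0 <= i + k - half < n)
            for i in range(n)]


def detect_peaks(spectra, scale=3.0, min_coef=None):
    """Average spectra into a reference, then keep local maxima of its CWT row."""
    n = len(spectra[0])
    reference = [sum(s[i] for s in spectra) / len(spectra) for i in range(n)]
    coef = cwt_row(reference, scale)
    if min_coef is None:
        min_coef = 0.5 * max(coef)  # hypothetical threshold rule
    return [i for i in range(1, n - 1)
            if coef[i] >= coef[i - 1] and coef[i] > coef[i + 1] and coef[i] >= min_coef]
```

Because every spectrum contributes to the reference, a peak found there yields one shared position for all spectra, which illustrates how detection and alignment can be combined in a single step.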


Subjects
Pattern Recognition, Automated/methods , Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization/methods , Algorithms , Humans , Peptide Mapping , Sensitivity and Specificity , Signal Processing, Computer-Assisted
13.
BMC Bioinformatics ; 8: 74, 2007 Mar 05.
Article in English | MEDLINE | ID: mdl-17338815

Abstract

BACKGROUND: A microarray study may select different sets of differentially expressed genes depending on the selection criteria. For example, fold-change and p-value are two commonly used criteria for selecting differentially expressed genes under two experimental conditions, and these two criteria often yield incompatible gene sets. Also, in a two-factor experiment (say, treatment by time), the investigator may be interested in one gene list that responds to both treatment and time effects. RESULTS: We propose three layer ranking algorithms, point-admissible, line-admissible (convex), and Pareto, to provide a preference gene list from multiple gene lists generated by different ranking criteria. Using the public colon data as an example, the layer ranking algorithms are applied to three univariate ranking criteria: fold-change, p-value, and frequency of selection by the SVM-RFE classifier. A simulation experiment shows that for experiments with small or moderate sample sizes (fewer than 20 per group) and for detecting a 4-fold change or less, the two-dimensional (p-value and fold-change) convex layer ranking selects differentially expressed genes with generally lower FDR and higher power than the standard p-value ranking. Three applications are presented. The first illustrates the use of layer rankings to potentially improve predictive accuracy. The second illustrates an application to a two-factor experiment involving two dose levels and two time points, in which the layer rankings are applied to select differentially expressed genes related to the dose and time effects. In the third, the layer rankings are applied to a benchmark data set consisting of three dilution concentrations to provide a ranking system for a long list of differentially expressed genes generated from the three dilution concentrations.
CONCLUSION: The layer ranking algorithms can help investigators select the most promising genes from multiple gene lists generated by different filtering, normalization, or analysis methods for various objectives.
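The Pareto variant of layer ranking can be sketched as repeated non-dominated sorting on two criteria. Encoding each gene as a (p-value, |log fold-change|) pair, with smaller p and larger |fold change| preferred, is an assumption for this illustration; the point-admissible and convex layers are not shown, and the function names are hypothetical.

```python
def dominates(a, b):
    """True if gene a is at least as good as b on both criteria and better on one.

    Each gene is (p_value, abs_log_fold_change): smaller p and larger |FC| are better.
    """
    at_least_as_good = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return at_least_as_good and strictly_better


def pareto_layers(genes):
    """Peel off successive non-dominated (Pareto) layers of a {name: criteria} dict."""
    remaining = dict(genes)
    layers = []
    while remaining:
        front = [g for g, v in remaining.items()
                 if not any(dominates(w, v) for h, w in remaining.items() if h != g)]
        layers.append(sorted(front))
        for g in front:
            del remaining[g]
    return layers
```

Genes in the first layer are those that no other gene beats on both criteria at once; later layers are found by removing earlier ones and repeating, so a gene with a modest p-value but a large fold change can still rank in the top layer.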


Subjects
Databases, Genetic , Gene Expression Profiling/methods , Models, Genetic , Algorithms , Colonic Neoplasms/genetics , Humans , Oligonucleotide Array Sequence Analysis/methods
14.
BMC Bioinformatics ; 8: 412, 2007 Oct 25.
Article in English | MEDLINE | ID: mdl-17961233

Abstract

BACKGROUND: Many researchers are concerned with the comparability and reliability of microarray gene expression data. The recent completion of the MicroArray Quality Control (MAQC) project provides a unique opportunity to assess reproducibility across multiple sites and comparability across multiple platforms. The analysis MAQC presented to support its conclusions on the inter- and intra-platform comparability and reproducibility of microarray gene expression measurements is inadequate. We evaluate the reproducibility and comparability of the MAQC data for 12,901 common genes in four titration samples generated on five high-density one-color microarray platforms and by the TaqMan technology. We discuss some of the problems with MAQC's use of the correlation coefficient as a metric for evaluating inter- and intra-platform reproducibility and of the percent of overlapping genes (POG) as a measure for evaluating a gene selection procedure. RESULTS: A total of 293 arrays were used in the intra- and inter-platform analysis. A hierarchical cluster analysis shows distinct differences in the measured intensities among the five platforms. A number of genes show a small fold-change on one platform and a large fold-change on another, even though the correlations between platforms are high. An analysis of variance shows that the expression of thirty percent of the genes in the samples follows inconsistent patterns across the five platforms. We illustrate that POG does not reflect the accuracy of a selected gene list: a non-overlapping gene can be truly differentially expressed under a stringent cutoff, and an overlapping gene can be non-differentially expressed under a non-stringent cutoff. In addition, POG is an unusable selection criterion: POG can increase or decrease irregularly as the cutoff changes, and there is no criterion for choosing a cutoff that optimizes POG.
CONCLUSION: Using various statistical methods, we demonstrate that there are differences in the intensities measured by different platforms and by different sites within a platform. Within each platform, the patterns of expression are generally consistent, but there is site-by-site variability. Evaluation of data analysis methods for use in regulatory decisions should take the no-treatment-effect scenario into consideration: when there is no treatment effect, "a fold-change cutoff with a non-stringent p-value cutoff" could result in every selected gene being a false positive.
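The irregular behavior of POG noted above can be reproduced on a toy example. The sketch assumes POG is the number of genes shared by two equal-length top-k lists divided by the list length; the ranking statistic, the gene values, and the function names are hypothetical.

```python
def top_k(stats, k):
    """Names of the k genes with the largest |statistic| (e.g. |log fold-change|)."""
    return set(sorted(stats, key=lambda g: abs(stats[g]), reverse=True)[:k])


def pog(list_a, list_b):
    """Percent of overlapping genes between two equal-length selected lists."""
    if len(list_a) != len(list_b):
        raise ValueError("this sketch assumes lists of equal length")
    return 100 * len(set(list_a) & set(list_b)) / len(list_a)


# Hypothetical |log fold-changes| for the same genes measured at two sites.
site1 = {"g1": 3.0, "g2": 2.5, "g3": 2.0, "g4": 1.0}
site2 = {"g1": 3.0, "g2": 1.1, "g3": 2.6, "g4": 2.0}
print([round(pog(top_k(site1, k), top_k(site2, k)), 1) for k in (1, 2, 3)])
# prints [100.0, 50.0, 66.7]
```

As the list length grows from 1 to 3, POG drops from 100% to 50% and then rises to about 67%, so optimizing POG does not single out a cutoff.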


Subjects
Oligonucleotide Array Sequence Analysis/methods , Protein Array Analysis/methods , Reproducibility of Results , Cluster Analysis , DNA Probes/analysis , Databases, Genetic , Equipment Failure Analysis , False Positive Reactions , Gene Expression Profiling/instrumentation , Gene Expression Profiling/methods , Gene Expression Profiling/standards , Oligonucleotide Array Sequence Analysis/instrumentation , Oligonucleotide Array Sequence Analysis/standards , Protein Array Analysis/instrumentation , Protein Array Analysis/standards , Quality Control , Reference Standards , Regression Analysis , Sensitivity and Specificity
15.
J Comput Biol ; 24(12): 1254-1264, 2017 Dec.
Article in English | MEDLINE | ID: mdl-29099245

Abstract

Genome-wide association studies (GWAS) have been a powerful tool for exploring potential relationships between single-nucleotide polymorphisms (SNPs) and biological traits. To screen for important genetic variants, it is desirable to perform an exhaustive scan over the whole genome. However, this is usually a challenging and daunting computational task, mainly because of the large number of SNPs in GWAS. In this article, we propose a computationally efficient algorithm for highly homozygous genomes. Pseudo standard error (PSE) is known to be a highly efficient and robust estimator of the standard deviation of a quantitative trait. We thus develop a PSE-based statistical testing procedure for determining significant SNP main effects and SNP × SNP interactions associated with a quantitative trait. A simulation study is first conducted to evaluate its empirical size and power. It is shown that the proposed PSE-based method can generally keep the empirical size sufficiently close to the nominal significance level. However, the power investigation indicates that the PSE-based method might lack power to identify significant effects of low-frequency variants if their true effect sizes are not large enough. Software implementing the proposed algorithm is provided, and its computational efficiency is evaluated in another simulation study. An exhaustive scan is usually completed within a very reasonable runtime, and a rice genome data set is analyzed with the software.
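The pseudo standard error mentioned above is most commonly defined following Lenth (1989) for effect estimates from unreplicated experiments; the sketch below implements that classic definition. How the authors adapt PSE to SNP main effects and SNP × SNP interactions is not described in the abstract, so treat this as background rather than their procedure; the function name is hypothetical.

```python
import statistics


def pseudo_standard_error(effects):
    """Lenth's pseudo standard error (PSE) of a set of effect estimates.

    s0 = 1.5 * median(|c|); effects with |c| >= 2.5 * s0 are treated as likely
    active and dropped; PSE = 1.5 * median of the remaining |c|.
    """
    abs_c = [abs(c) for c in effects]
    s0 = 1.5 * statistics.median(abs_c)
    trimmed = [c for c in abs_c if c < 2.5 * s0]
    if not trimmed:  # e.g. all effects exactly zero
        return 0.0
    return 1.5 * statistics.median(trimmed)
```

Dividing each effect by the PSE gives a t-like ratio. For effects `[1, -2, 3, -1.5, 20]` the PSE is 2.625, so the effect of 20 stands out at a ratio of about 7.6, while the trimming step keeps it from inflating the error estimate.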


Subjects
Genome-Wide Association Study , Genome , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Software , Algorithms , Epistasis, Genetic , Genetic Variation , Humans , Phenotype
16.
J Toxicol Environ Health A ; 69(16): 1527-40, 2006 Aug.
Article in English | MEDLINE | ID: mdl-16854783

ABSTRACT

The percentages of active (A) and inactive (I) chemicals in a database can directly affect the sensitivity (% of active chemicals predicted correctly) and specificity (% of inactive chemicals predicted correctly) of structure-activity relationship (SAR) analyses. Subdividing the National Center for Toxicological Research (NCTR) liver cancer database (NCTRlcdb) into data sets with A/I ratios ranging from 0.2 to 5.5 produced sensitivity/specificity ratios ranging from 0.1 to 6.5. As the percentage of active chemicals increased (increasing A/I ratio), sensitivity rose, specificity decreased, and concordance (% of all chemicals predicted correctly) remained fairly constant. The number of chemicals in the various data sets ranged from 187 to 999 and appeared to have no effect on any of the three measures: sensitivity, specificity, or concordance.
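The three measures defined above can be computed as in the following sketch. The function name `sar_metrics` and the toy labels are hypothetical and are not taken from NCTRlcdb.

```python
def sar_metrics(actual, predicted):
    """Return (sensitivity, specificity, concordance) in percent.

    `actual` and `predicted` are equal-length sequences of booleans (True = active);
    `actual` must contain at least one active and one inactive chemical.
    """
    tp = sum(a and p for a, p in zip(actual, predicted))              # actives predicted active
    tn = sum((not a) and (not p) for a, p in zip(actual, predicted))  # inactives predicted inactive
    n_active = sum(actual)
    n_inactive = len(actual) - n_active
    sensitivity = 100.0 * tp / n_active
    specificity = 100.0 * tn / n_inactive
    concordance = 100.0 * (tp + tn) / len(actual)
    return sensitivity, specificity, concordance


# Hypothetical data set: 3 actives, 5 inactives (A/I ratio = 0.6).
actual = [True, True, True, False, False, False, False, False]
predicted = [True, True, False, False, False, False, True, False]
print(sar_metrics(actual, predicted))
```

Concordance equals the weighted average (A * sensitivity + I * specificity) / (A + I), which is one way to see how it can stay fairly constant while sensitivity and specificity move in opposite directions as the A/I ratio changes.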


Subjects
Carcinogens/toxicity , Databases, Factual , Liver Neoplasms/chemically induced , Xenobiotics/toxicity , Animals , Forecasting , Humans , Sensitivity and Specificity , Structure-Activity Relationship
17.
Nucleic Acids Res ; 31(9): e52, 2003 May 01.
Article in English | MEDLINE | ID: mdl-12711697

ABSTRACT

This paper uses Monte Carlo simulations to compare the type I error and power of one- and two-sample t-tests and one- and two-sample permutation tests for detecting differences in gene expression between two microarray samples with replicates. When data are generated from a normal distribution, the type I error and power of the one-sample parametric t-test and the one-sample permutation test are very close, as are those of the two-sample t-test and the two-sample permutation test, provided that the number of replicates is adequate. When data are generated from a t-distribution, the permutation tests outperform the corresponding parametric tests if the number of replicates is at least five. For data from a two-color dye-swap experiment, the one-sample test appears to perform better than the two-sample test because the expression measurements for control and treatment samples from the same spot are correlated. For data from independent samples, such as one-channel arrays or two-channel array experiments using a reference design, the two-sample t-tests appear more powerful than the one-sample t-tests.
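The two kinds of permutation test compared above differ in what is permuted: a one-sample test flips the signs of paired differences (for example, per-array log-ratios), while a two-sample test reshuffles group labels. A minimal exact-enumeration sketch, assuming the mean difference as the test statistic and two-sided p-values; the function names and data are hypothetical, and this is not necessarily the paper's implementation.

```python
import itertools
import statistics


def one_sample_perm_pvalue(diffs):
    """Exact two-sided sign-flip permutation test of H0: mean difference = 0."""
    observed = abs(statistics.fmean(diffs))
    hits = total = 0
    # Under H0 each difference is symmetric about 0, so every sign pattern is equally likely.
    for signs in itertools.product((1, -1), repeat=len(diffs)):
        flipped = [s * d for s, d in zip(signs, diffs)]
        hits += abs(statistics.fmean(flipped)) >= observed - 1e-12
        total += 1
    return hits / total


def two_sample_perm_pvalue(x, y):
    """Exact two-sided label-permutation test of H0: equal group means."""
    pooled = list(x) + list(y)
    n = len(pooled)
    observed = abs(statistics.fmean(x) - statistics.fmean(y))
    hits = total = 0
    # Under H0 the labels are exchangeable, so every split of the pooled values is equally likely.
    for idx in itertools.combinations(range(n), len(x)):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [pooled[i] for i in range(n) if i not in chosen]
        hits += abs(statistics.fmean(gx) - statistics.fmean(gy)) >= observed - 1e-12
        total += 1
    return hits / total


if __name__ == "__main__":
    print(one_sample_perm_pvalue([1.0, 2.0, 1.5, 0.5, 1.2]))  # 2 of 32 sign patterns: 0.0625
    print(two_sample_perm_pvalue([5, 6, 7], [1, 2, 3]))       # 2 of 20 splits: 0.1
```

With only three replicates per group, the smallest attainable two-sided p-value of the exact two-sample test is 2/20 = 0.1, which illustrates why the number of replicates matters for permutation tests.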


Subjects
Gene Expression Profiling/statistics & numerical data , Monte Carlo Method , Oligonucleotide Array Sequence Analysis/standards , DNA, Complementary/genetics , DNA, Complementary/metabolism , Oligonucleotide Array Sequence Analysis/methods
18.
PLoS One ; 11(4): e0153525, 2016.
Article in English | MEDLINE | ID: mdl-27120450

ABSTRACT

A key feature of precision medicine is that it takes individual variability at the genetic or molecular level into account in determining the best treatment for patients diagnosed with diseases detected by recently developed biotechnologies. The enrichment design is an efficient design that enrolls only patients who test positive for specific molecular targets and randomly assigns them to the targeted treatment or the concurrent control. However, no diagnostic device detects molecular targets with perfect accuracy and precision. In particular, the positive predictive value (PPV) can be quite low for rare diseases with low prevalence. Under the enrichment design, some patients who test positive for specific molecular targets may not actually have them, so the efficacy of the targeted therapy may be underestimated for the patients who do. To address the loss of efficiency due to misclassification error, we apply the discrete mixture modeling for time-to-event data proposed by Eng and Hanlon [8] to develop an inferential procedure, based on the Cox proportional hazards model, for the effect of the targeted treatment in true-positive patients with the molecular targets. Our proposed procedure incorporates both the inaccuracy of diagnostic devices and the uncertainty of the estimated accuracy measures. We employed the expectation-maximization algorithm in conjunction with the bootstrap technique to estimate the hazard ratio and its variance. We report the results of simulation studies that empirically investigated the performance of the proposed method, which is illustrated by a numerical example.
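Why the PPV drops for rare targets follows directly from Bayes' theorem. The sketch below assumes a hypothetical assay with 95% sensitivity and 95% specificity; the function name and the numbers are illustrative, not from the paper.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(target present | test positive), by Bayes' theorem."""
    true_pos = sensitivity * prevalence                   # P(test positive and target present)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)  # P(test positive and target absent)
    return true_pos / (true_pos + false_pos)


# Hypothetical assay accuracy; prevalence varies from common to rare.
for prev in (0.5, 0.1, 0.01):
    print(prev, round(positive_predictive_value(0.95, 0.95, prev), 3))
```

With these inputs the PPV falls from 0.95 at 50% prevalence to about 0.16 at 1% prevalence, so in an enrichment trial roughly 84% of the test-positive enrollees would lack the target, which is the misclassification the mixture model is meant to account for.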


Subjects
Molecular Targeted Therapy/methods , Precision Medicine/methods , Proportional Hazards Models , Algorithms , Clinical Trials as Topic , Computer Simulation , Humans , Models, Statistical
19.
Environ Mol Mutagen ; 45(2-3): 188-205, 2005.
Article in English | MEDLINE | ID: mdl-15657912

ABSTRACT

Identifying genes that are differentially expressed in response to DNA damage may help elucidate markers of genetic damage and provide insight into cellular responses to specific genotoxic agents. We used cDNA microarrays to develop gene expression profiles for human lymphoblastoid TK6 cells exposed to ionizing radiation. To relate changes in the expression profiles to biological responses, the effects of ionizing radiation on cell viability, cloning efficiency, and micronucleus formation were measured. TK6 cells were exposed to 0.5, 1, 5, 10, and 20 Gy of ionizing radiation and cultured for 4 or 24 hr. A significant (P < 0.0001) decrease in cloning efficiency was observed at all doses at 4 and 24 hr after exposure. Flow cytometry revealed significant decreases in cell viability at 24 hr in cells exposed to 5 (P < 0.001), 10 (P < 0.0001), and 20 Gy (P < 0.0001). Micronucleus frequency increased at both 4 and 24 hr at 0.5 and 1 Gy; however, too few binucleated cells were present for analysis at the higher doses. Gene expression profiles were developed from mRNA isolated from cells exposed to 5, 10, and 20 Gy using a 350-gene human cDNA array platform. Overall, more genes were differentially expressed at the 24-hr than at the 4-hr time point. The genes upregulated (> 1.5-fold) or downregulated (< 0.67-fold) at 4 hr were primarily involved in cell cycle arrest, cellular detoxification pathways, DNA repair, and apoptosis. At 24 hr, glutathione-associated genes were induced in addition to genes involved in apoptosis. Genes involved in cell cycle progression and mitosis were downregulated at 24 hr. Real-time quantitative PCR was used to confirm the microarray results and to evaluate expression levels of selected genes at the low doses (0.5 and 1.0 Gy).
The expression profiles reflect the cellular and molecular responses to ionizing radiation related to the recognition of DNA damage, a halt in progression through the cell cycle, activation of DNA-repair pathways, and the promotion of apoptosis.
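The fold-change cutoffs quoted above (> 1.5-fold up, < 0.67-fold down; 0.67 is roughly 1/1.5) amount to a simple three-way rule on the treated/control expression ratio. A minimal sketch, assuming already-normalized intensities; the function name and gene values are hypothetical, and the paper may have applied additional criteria.

```python
def classify_fold_change(treated, control, up=1.5, down=0.67):
    """Label a gene by its treated/control ratio using the abstract's cutoffs."""
    ratio = treated / control
    if ratio > up:
        return "up"
    if ratio < down:
        return "down"
    return "unchanged"


# Hypothetical normalized intensities (treated, control); not data from the study.
genes = {"geneA": (3.0, 1.0), "geneB": (0.5, 1.0), "geneC": (1.2, 1.0)}
for name, (t, c) in genes.items():
    print(name, classify_fold_change(t, c))
```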


Subjects
DNA Damage , Gene Expression Profiling , Gene Expression Regulation/radiation effects , Thymidine Kinase/genetics , Analysis of Variance , Cell Cycle/radiation effects , Cell Survival/radiation effects , DNA Primers , Dose-Response Relationship, Radiation , Flow Cytometry , Humans , Micronucleus Tests , Oligonucleotide Array Sequence Analysis , Radiation, Ionizing , Reverse Transcriptase Polymerase Chain Reaction , Time Factors , Tumor Cells, Cultured
20.
Math Biosci ; 193(1): 79-100, 2005 Jan.
Article in English | MEDLINE | ID: mdl-15681277

ABSTRACT

DNA microarray technology provides tools for studying the expression profiles of a large number of distinct genes simultaneously. This technology has been applied to sample clustering and sample prediction. Because of the large number of genes measured, many of the genes in the original data set are irrelevant to the analysis, so the selection of discriminatory genes is critical to the accuracy of clustering and prediction. This paper considers a statistical significance testing approach to selecting discriminatory gene sets for multi-class clustering and prediction of experimental samples. A toxicogenomic data set with nine treatments (a control and eight metals: As, Cd, Ni, Cr, Sb, Pb, Cu, and AsV; 55 samples in total) is used to illustrate a general framework for the approach. Among four selected gene sets, the gene set omega(I), formed by intersecting the F-test set with the union of the one-versus-all t-test sets, performs best for both clustering and prediction. Hierarchical clustering and two modified partitioning (k-means) methods all show that omega(I) groups the 55 samples into seven clusters reasonably well, with the As and AsV samples forming one cluster, as do the Cd and Cu samples. For prediction, the overall accuracy of the nearest-neighbor algorithm using gene set omega(I) to assign the 55 samples to one of the nine treatments is 85%.
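The construction of omega(I) and a nearest-neighbor evaluation can be sketched as follows. This assumes per-gene p-values are already available, a significance cutoff of 0.05, a 1-nearest-neighbor rule with Euclidean distance, and leave-one-out evaluation; none of these details is stated in the abstract, and all names and data are hypothetical.

```python
import math


def select_omega_I(f_pvalues, ova_pvalues, alpha=0.05):
    """Intersect the F-test gene set with the union of one-versus-all t-test gene sets."""
    f_set = {g for g, p in f_pvalues.items() if p < alpha}
    t_union = set()
    for per_gene in ova_pvalues.values():  # one dict of p-values per treatment
        t_union |= {g for g, p in per_gene.items() if p < alpha}
    return f_set & t_union


def loo_1nn_accuracy(samples, labels, genes):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier (Euclidean distance)."""
    def profile(sample):
        return [sample[g] for g in genes]

    correct = 0
    for i, sample in enumerate(samples):
        others = [j for j in range(len(samples)) if j != i]
        nearest = min(others, key=lambda j: math.dist(profile(sample), profile(samples[j])))
        correct += labels[nearest] == labels[i]
    return correct / len(samples)


# Hypothetical p-values for four genes and two one-versus-all comparisons.
f_p = {"g1": 0.01, "g2": 0.2, "g3": 0.001, "g4": 0.03}
ova_p = {"As": {"g1": 0.04, "g2": 0.01, "g3": 0.5, "g4": 0.6},
         "Cd": {"g1": 0.3, "g2": 0.2, "g3": 0.02, "g4": 0.9}}
omega = sorted(select_omega_I(f_p, ova_p))
print(omega)  # ['g1', 'g3']
```

Here g4 passes the F-test but no one-versus-all test, and g2 passes a one-versus-all test but not the F-test, so only g1 and g3 survive the intersection.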


Subjects
Algorithms , Cluster Analysis , Gene Expression Profiling/statistics & numerical data , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Discriminant Analysis , Fibroblasts/drug effects , Fibroblasts/metabolism , Gene Expression/drug effects , Humans , Metals/pharmacology , Toxicogenetics/statistics & numerical data