Search | Biblioteca Virtual em Saúde (Virtual Health Library)

Comparison of RNA-seq and microarray-based models for clinical endpoint prediction.

Zhang, Wenqian; Yu, Ying; Hertwig, Falk; Thierry-Mieg, Jean; Zhang, Wenwei; Thierry-Mieg, Danielle; Wang, Jian; Furlanello, Cesare; Devanarayan, Viswanath; Cheng, Jie; Deng, Youping; Hero, Barbara; Hong, Huixiao; Jia, Meiwen; Li, Li; Lin, Simon M; Nikolsky, Yuri; Oberthuer, André; Qing, Tao; Su, Zhenqiang; Volland, Ruth; Wang, Charles; Wang, May D; Ai, Junmei; Albanese, Davide; Asgharzadeh, Shahab; Avigad, Smadar; Bao, Wenjun; Bessarabova, Marina; Brilliant, Murray H; Brors, Benedikt; Chierici, Marco; Chu, Tzu-Ming; Zhang, Jibin; Grundy, Richard G; He, Min Max; Hebbring, Scott; Kaufman, Howard L; Lababidi, Samir; Lancashire, Lee J; Li, Yan; Lu, Xin X; Luo, Heng; Ma, Xiwen; Ning, Baitang; Noguera, Rosa; Peifer, Martin; Phan, John H; Roels, Frederik; Rosswog, Carolina.

Genome Biol ; 16: 133, 2015 Jun 25.

Article in English | MEDLINE | ID: mdl-26109056

ABSTRACT

BACKGROUND: Gene expression profiling is being widely applied in cancer research to identify biomarkers for clinical endpoint prediction. Since RNA-seq provides a powerful tool for transcriptome-based applications beyond the limitations of microarrays, we sought to systematically evaluate the performance of RNA-seq-based and microarray-based classifiers in this MAQC-III/SEQC study for clinical endpoint prediction using neuroblastoma as a model. RESULTS: We generate gene expression profiles from 498 primary neuroblastomas using both RNA-seq and 44 k microarrays. Characterization of the neuroblastoma transcriptome by RNA-seq reveals that more than 48,000 genes and 200,000 transcripts are being expressed in this malignancy. We also find that RNA-seq provides much more detailed information on specific transcript expression patterns in clinico-genetic neuroblastoma subgroups than microarrays. To systematically compare the power of RNA-seq and microarray-based models in predicting clinical endpoints, we divide the cohort randomly into training and validation sets and develop 360 predictive models on six clinical endpoints of varying predictability. Evaluation of factors potentially affecting model performances reveals that prediction accuracies are most strongly influenced by the nature of the clinical endpoint, whereas technological platforms (RNA-seq vs. microarrays), RNA-seq data analysis pipelines, and feature levels (gene vs. transcript vs. exon-junction level) do not significantly affect performances of the models. CONCLUSIONS: We demonstrate that RNA-seq outperforms microarrays in determining the transcriptomic characteristics of cancer, while RNA-seq and microarray-based models perform similarly in clinical endpoint prediction. Our findings may be valuable to guide future studies on the development of gene expression-based predictive models and their implementation in clinical practice.

Subjects

Gene Expression Profiling; Neuroblastoma/genetics; Oligonucleotide Array Sequence Analysis; Sequence Analysis, RNA; Adolescent; Adult; Child; Child, Preschool; Endpoint Determination; Female; Humans; Infant; Infant, Newborn; Male; Models, Genetic; Neuroblastoma/classification; Neuroblastoma/diagnosis; Tumor Cells, Cultured; Young Adult

The concordance between RNA-seq and microarray data depends on chemical treatment and transcript abundance.

Wang, Charles; Gong, Binsheng; Bushel, Pierre R; Thierry-Mieg, Jean; Thierry-Mieg, Danielle; Xu, Joshua; Fang, Hong; Hong, Huixiao; Shen, Jie; Su, Zhenqiang; Meehan, Joe; Li, Xiaojin; Yang, Lu; Li, Haiqing; Labaj, Pawel P; Kreil, David P; Megherbi, Dalila; Gaj, Stan; Caiment, Florian; van Delft, Joost; Kleinjans, Jos; Scherer, Andreas; Devanarayan, Viswanath; Wang, Jian; Yang, Yong; Qian, Hui-Rong; Lancashire, Lee J; Bessarabova, Marina; Nikolsky, Yuri; Furlanello, Cesare; Chierici, Marco; Albanese, Davide; Jurman, Giuseppe; Riccadonna, Samantha; Filosi, Michele; Visintainer, Roberto; Zhang, Ke K; Li, Jianying; Hsieh, Jui-Hua; Svoboda, Daniel L; Fuscoe, James C; Deng, Youping; Shi, Leming; Paules, Richard S; Auerbach, Scott S; Tong, Weida.

Nat Biotechnol ; 32(9): 926-32, 2014 Sep.

Article in English | MEDLINE | ID: mdl-25150839

ABSTRACT

The concordance of RNA-sequencing (RNA-seq) with microarrays for genome-wide analysis of differential gene expression has not been rigorously assessed using a range of chemical treatment conditions. Here we use a comprehensive study design to generate Illumina RNA-seq and Affymetrix microarray data from the same liver samples of rats exposed in triplicate to varying degrees of perturbation by 27 chemicals representing multiple modes of action (MOAs). The cross-platform concordance in terms of differentially expressed genes (DEGs) or enriched pathways is linearly correlated with treatment effect size (R² ≈ 0.8). Furthermore, the concordance is also affected by transcript abundance and biological complexity of the MOA. RNA-seq outperforms microarray (93% versus 75%) in DEG verification as assessed by quantitative PCR, with the gain mainly due to its improved accuracy for low-abundance transcripts. Nonetheless, classifiers to predict MOAs perform similarly when developed using data from either platform. Therefore, the endpoint studied and its biological complexity, transcript abundance and the genomic application are important factors in transcriptomic research and for clinical and regulatory decision making.

Subjects

Oligonucleotide Array Sequence Analysis; RNA, Messenger/genetics; Sequence Analysis, RNA; Animals; Rats

An investigation of biomarkers derived from legacy microarray data for their utility in the RNA-seq era.

Su, Zhenqiang; Fang, Hong; Hong, Huixiao; Shi, Leming; Zhang, Wenqian; Zhang, Wenwei; Zhang, Yanyan; Dong, Zirui; Lancashire, Lee J; Bessarabova, Marina; Yang, Xi; Ning, Baitang; Gong, Binsheng; Meehan, Joe; Xu, Joshua; Ge, Weigong; Perkins, Roger; Fischer, Matthias; Tong, Weida.

Genome Biol ; 15(12): 523, 2014 Dec 03.

Article in English | MEDLINE | ID: mdl-25633159

ABSTRACT

BACKGROUND: Gene expression microarray has been the primary biomarker platform ubiquitously applied in biomedical research, resulting in enormous data, predictive models, and biomarkers accrued. Recently, RNA-seq has looked likely to replace microarrays, but there will be a period where both technologies co-exist. This raises two important questions: Can microarray-based models and biomarkers be directly applied to RNA-seq data? Can future RNA-seq-based predictive models and biomarkers be applied to microarray data to leverage past investment? RESULTS: We systematically evaluated the transferability of predictive models and signature genes between microarray and RNA-seq using two large clinical data sets. The complexity of cross-platform sequence correspondence was considered in the analysis and examined using three human and two rat data sets, and three levels of mapping complexity were revealed. Three algorithms representing different modeling complexity were applied to the three levels of mappings for each of the eight binary endpoints and Cox regression was used to model survival times with expression data. In total, 240,096 predictive models were examined. CONCLUSIONS: Signature genes of predictive models are reciprocally transferable between microarray and RNA-seq data for model development, and microarray-based models can accurately predict RNA-seq-profiled samples; while RNA-seq-based models are less accurate in predicting microarray-profiled samples and are affected both by the choice of modeling algorithm and the gene mapping complexity. The results suggest continued usefulness of legacy microarray data and established microarray biomarkers and predictive models in the forthcoming RNA-seq era.

Subjects

Gene Expression Profiling/methods; Genetic Markers; RNA/analysis; Sequence Analysis, RNA; Algorithms; Animals; Computational Biology/methods; Humans; Models, Genetic; Oligonucleotide Array Sequence Analysis; Rats

Statistical considerations of optimal study design for human plasma proteomics and biomarker discovery.

Zhou, Cong; Simpson, Kathryn L; Lancashire, Lee J; Walker, Michael J; Dawson, Martin J; Unwin, Richard D; Rembielak, Agata; Price, Patricia; West, Catharine; Dive, Caroline; Whetton, Anthony D.

J Proteome Res ; 11(4): 2103-13, 2012 Apr 06.

Article in English | MEDLINE | ID: mdl-22338609

ABSTRACT

A mass spectrometry-based plasma biomarker discovery workflow was developed to facilitate biomarker discovery. Plasma from either healthy volunteers or patients with pancreatic cancer was 8-plex iTRAQ labeled, fractionated by 2-dimensional reversed phase chromatography and subjected to MALDI ToF/ToF mass spectrometry. Data were processed using a q-value based statistical approach to maximize protein quantification and identification. Technical (between duplicate samples) and biological variance (between and within individuals) were calculated and power analysis was thereby enabled. An a priori power analysis was carried out using samples from healthy volunteers to define sample sizes required for robust biomarker identification. The result was subsequently validated with a post hoc power analysis using a real clinical setting involving pancreatic cancer patients. This demonstrated that six samples per group (e.g., pre- vs post-treatment) may provide sufficient statistical power for most proteins with changes >2-fold. A reference standard allowed direct comparison of protein expression changes between multiple experiments. Analysis of patient plasma prior to treatment identified 29 proteins with significant changes within individual patients. Changes in Peroxiredoxin II levels were confirmed by Western blot. This q-value based statistical approach in combination with reference standard samples can be applied with confidence in the design and execution of clinical studies for predictive, prognostic, and/or pharmacodynamic biomarker discovery. The power analysis provides information required prior to study initiation.
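For readers unfamiliar with a priori power analysis, the idea of deriving a per-group sample size from a target fold change and an estimated variance can be sketched as below. This is only an illustration under stated assumptions, not the procedure used in the study: it uses the textbook normal-approximation formula for a two-sided, two-sample comparison on log2-transformed abundances, and the function name `samples_per_group` and the standard deviation passed to it are hypothetical.

```python
import math
from statistics import NormalDist


def samples_per_group(log2_fold_change, sd_log2, alpha=0.05, power=0.80):
    """Approximate samples per group for a two-sided, two-sample comparison.

    Normal-approximation formula: n = 2 * ((z_{1-alpha/2} + z_{power}) * sd / delta)^2.
    `log2_fold_change` is the effect to detect on the log2 scale (a 2-fold
    change is 1.0); `sd_log2` is the assumed per-protein standard deviation
    on the same scale (a hypothetical input, not a value from the abstract).
    """
    if log2_fold_change == 0:
        raise ValueError("effect size must be non-zero")
    if sd_log2 <= 0:
        raise ValueError("standard deviation must be positive")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided significance
    z_power = standard_normal.inv_cdf(power)          # desired power
    n = 2 * ((z_alpha + z_power) * sd_log2 / abs(log2_fold_change)) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical variances, chosen only to show how n grows with variability.
    for sd in (0.4, 0.5, 0.6, 0.8):
        print(f"sd={sd}: n per group = {samples_per_group(1.0, sd)}")
```

Because the normal approximation ignores the extra uncertainty of estimating variance from small groups, it tends to understate the sample size a t-test-based calculation would give, and it does not account for multiple-testing control such as the q-value approach described in the abstract.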

Subjects

Biomarkers, Tumor/blood; Blood Proteins/analysis; Neoplasm Proteins/blood; Proteomics/methods; Blood Proteins/chemistry; Case-Control Studies; Factor XIII; Humans; Neoplasm Proteins/chemistry; Pancreatic Neoplasms/blood; Peroxiredoxins; Proteome/analysis; Proteome/chemistry; Reproducibility of Results; Statistics as Topic

The development of composite circulating biomarker models for use in anticancer drug clinical development.

Lancashire, Lee J; Roberts, Darren L; Dive, Caroline; Renehan, Andrew G.

Int J Cancer ; 128(8): 1843-51, 2011 Apr 15.

Article in English | MEDLINE | ID: mdl-20549702

ABSTRACT

The development of informative composite circulating biomarkers predicting cancer presence or therapy response is clinically attractive but optimal approaches to modeling are as yet unclear. This study investigated multidimensional relationships within an example panel of serum insulin-like growth factor (IGF) peptides using logistic regression (LR), fractional polynomial (FP) regression, artificial neural networks (ANNs) and support vector machines (SVMs) to derive predictive models for colorectal cancer (CRC). Two phase 2 biomarker validation analyses were performed: controls were ambulant adults (n = 722); cases were: (i) CRC patients (n = 100) and (ii) patients with acromegaly (n = 52), the latter as "positive" discriminators. Serum IGF-I, IGF-II, IGF binding protein (IGFBP)-2 and -3 were measured. Discriminatory characteristics were compared within and between models. For the LR, FP and ANN models, and to a lesser extent SVMs, the addition of covariates at several steps improved discrimination characteristics. The optimum biomarker combination discriminating CRC vs. controls was achieved using ANN models [sensitivity, 94%; specificity, 90%; accuracy, 0.975 (95% CIs: 0.948, 1.000)]. ANN modeling significantly outperformed LR, FP and SVM in terms of discrimination (p < 0.0001) and calibration. The acromegaly analysis demonstrated expected high performance characteristics in the ANN model [accuracy, 0.993 (95% CIs: 0.977, 1.000)]. Curved decision surfaces generated from the ANNs revealed the potential clinical utility. This example demonstrated improved discriminatory characteristics within the composite biomarker ANN model and a final model that outperformed the three other models. This modeling approach forms the basis to evaluate composite biomarkers as pharmacological and predictive biomarkers in future clinical trials.

Subjects

Biomarkers, Tumor/blood; Colorectal Neoplasms/diagnosis; Insulin-Like Growth Factor Binding Protein 2/blood; Insulin-Like Growth Factor Binding Protein 3/blood; Insulin-Like Growth Factor II/metabolism; Insulin-Like Growth Factor I/metabolism; Models, Statistical; Adult; Aged; Case-Control Studies; Colorectal Neoplasms/blood; Female; Humans; Male; Radioimmunoassay; Retrospective Studies

An introduction to artificial neural networks in bioinformatics--application to complex microarray and mass spectrometry datasets in cancer studies.

Lancashire, Lee J; Lemetre, Christophe; Ball, Graham R.

Brief Bioinform ; 10(3): 315-29, 2009 May.

Article in English | MEDLINE | ID: mdl-19307287

ABSTRACT

Applications of genomic and proteomic technologies have seen a major increase, resulting in an explosion in the amount of highly dimensional and complex data being generated. Subsequently this has increased the effort by the bioinformatics community to develop novel computational approaches that allow for meaningful information to be extracted. This information must be of biological relevance and thus correlate to disease phenotypes of interest. Artificial neural networks are a form of machine learning from the field of artificial intelligence with proven pattern recognition capabilities and have been utilized in many areas of bioinformatics. This is due to their ability to cope with highly dimensional complex datasets such as those developed by protein mass spectrometry and DNA microarray experiments. As such, neural networks have been applied to problems such as disease classification and identification of biomarkers. This review introduces and describes the concepts related to neural networks, the advantages and caveats to their use, examples of their applications in mass spectrometry and microarray research (with a particular focus on cancer studies), and illustrations from recent literature showing where neural networks have performed well in comparison to other machine learning methods. This should form the necessary background knowledge and information enabling researchers with an interest in these methodologies, but not necessarily from a machine learning background, to apply the concepts to their own datasets, thus maximizing the information gain from these complex biological systems.

Subjects

Computational Biology/methods; Databases, Genetic; Databases, Protein; Mass Spectrometry; Microarray Analysis; Neoplasms; Neural Networks, Computer; Artificial Intelligence; Bayes Theorem; Genomics; Humans; Neoplasms/genetics; Neoplasms/metabolism; Proteomics; Reproducibility of Results

Clinical evaluation of M30 and M65 ELISA cell death assays as circulating biomarkers in a drug-sensitive tumor, testicular cancer.

de Haas, Esther C; di Pietro, Alessandra; Simpson, Kathryn L; Meijer, Coby; Suurmeijer, Albert J H; Lancashire, Lee J; Cummings, J; de Jong, Steven; de Vries, Elisabeth G E; Dive, Caroline; Gietema, Jourik A.

Neoplasia ; 10(10): 1041-8, 2008 Oct.

Article in English | MEDLINE | ID: mdl-18813353

ABSTRACT

Circulating full-length and caspase-cleaved cytokeratin 18 (CK18) are considered biomarkers of chemotherapy-induced cell death measured using a combination of the M30 and M65 ELISAs. M30 measures caspase-cleaved CK18 produced during apoptosis and M65 measures the levels of both caspase-cleaved and intact CK18, the latter of which is released from cells undergoing necrosis. Previous studies have highlighted their potential as prognostic, predictive, and pharmacological tools in the treatment of cancer. Disseminated testicular germ cell cancer (TC) is a paradigm for a chemosensitive solid malignancy of epithelial origin and has a cure rate of 80% to 90%. We conducted M30/M65 analyses on 34 patients with TC before and during treatment with bleomycin, etoposide, and cisplatin and showed that prechemotherapy serum levels of M65 and M30 antigens are correlated with established TC tumor markers lactate dehydrogenase, alpha-fetoprotein, and beta-human chorionic gonadotropin, probably reflecting tumor load. Cumulative percentage change of M65 and M30 from baseline to end of study was highest in poor prognosis patients (P < .05). Moreover, area under the curve profiles of M65 and M30 during chemotherapy mirrored dynamic profiles for lactate dehydrogenase, alpha-fetoprotein, and beta-human chorionic gonadotropin. Consequently, M65 and M30 levels appear to reflect chemotherapy-induced changes that correlate with changes in markers routinely used in the clinic for management of patients with TC. This is the first clinical study where M65 and M30 antigen levels correlate with established prognostic markers and provides impetus for their exploration in other epithelial cancers where there is a pressing need for informative circulating biomarkers.

Subjects

Biomarkers, Tumor/analysis; Keratin-18/analysis; Neoplasms, Germ Cell and Embryonal/blood; Neoplasms, Germ Cell and Embryonal/diagnosis; Testicular Neoplasms/blood; Testicular Neoplasms/diagnosis; Adult; Antigens, Neoplasm/analysis; Antigens, Neoplasm/blood; Antigens, Neoplasm/metabolism; Antineoplastic Agents/pharmacology; Antineoplastic Agents/therapeutic use; Apoptosis/drug effects; Biomarkers, Tumor/blood; Caspases/metabolism; Cell Death/drug effects; Enzyme-Linked Immunosorbent Assay/methods; Humans; Keratin-18/blood; Keratin-18/immunology; Keratin-18/metabolism; Male; Middle Aged; Neoplasms, Germ Cell and Embryonal/drug therapy; Prognosis; Testicular Neoplasms/drug therapy; Treatment Outcome

Identification of gene transcript signatures predictive for estrogen receptor and lymph node status using a stepwise forward selection artificial neural network modelling approach.

Lancashire, Lee J; Rees, Robert C; Ball, Graham R.

Artif Intell Med ; 43(2): 99-111, 2008 Jun.

Article in English | MEDLINE | ID: mdl-18420392

ABSTRACT

OBJECTIVE: The advent of microarrays has attracted considerable interest from biologists due to the potential for high throughput analysis of hundreds of thousands of gene transcripts. Subsequent analysis of the data may identify specific features which correspond to characteristics of interest within the population, for example, analysis of gene expression profiles in cancer patients to identify molecular signatures corresponding with prognostic outcome. These high throughput technologies have resulted in an unprecedented rate of data generation, often of high complexity, highlighting the need for novel data analysis methodologies that will cope with data of this nature. METHODS: Stepwise methods using artificial neural networks (ANNs) have been developed to identify an optimal subset of predictive gene transcripts from highly dimensional microarray data. Here these methods have been applied to a gene microarray dataset to identify and validate gene signatures corresponding with estrogen receptor and lymph node status in breast cancer. RESULTS: Many gene transcripts were identified whose expression could differentiate patients to very high accuracies based upon firstly whether they were positive or negative for estrogen receptor, and secondly whether metastasis to the axillary lymph node had occurred. A number of these genes had been previously reported to have a role in cancer. Significantly fewer genes were used compared to other previous studies. The models using the optimal gene subsets were internally validated using an extensive random sample cross-validation procedure and externally validated using a follow up dataset from a different cohort of patients on a newer array chip containing the same and additional probe sets. Here, the models retained high accuracies, emphasising the potential power of this approach in analysing complex systems. 
CONCLUSIONS: These findings show how the proposed method allows for the rapid analysis and subsequent detailed interrogation of gene expression signatures to provide a further understanding of the underlying molecular mechanisms that could be important in determining novel prognostic markers associated with cancer.

Subjects

Breast Neoplasms/genetics; Breast Neoplasms/pathology; Neural Networks, Computer; Receptors, Estrogen/physiology; Transcription, Genetic/physiology; Female; Gene Expression Profiling; Humans; Lymphatic Metastasis; Oligonucleotide Array Sequence Analysis; Predictive Value of Tests; Reproducibility of Results
