Search | Secretaria de Estado da Saúde

1.

exRNA Atlas Analysis Reveals Distinct Extracellular RNA Cargo Types and Their Carriers Present across Human Biofluids.

Murillo, Oscar D; Thistlethwaite, William; Rozowsky, Joel; Subramanian, Sai Lakshmi; Lucero, Rocco; Shah, Neethu; Jackson, Andrew R; Srinivasan, Srimeenakshi; Chung, Allen; Laurent, Clara D; Kitchen, Robert R; Galeev, Timur; Warrell, Jonathan; Diao, James A; Welsh, Joshua A; Hanspers, Kristina; Riutta, Anders; Burgstaller-Muehlbacher, Sebastian; Shah, Ravi V; Yeri, Ashish; Jenkins, Lisa M; Ahsen, Mehmet E; Cordon-Cardo, Carlos; Dogra, Navneet; Gifford, Stacey M; Smith, Joshua T; Stolovitzky, Gustavo; Tewari, Ashutosh K; Wunsch, Benjamin H; Yadav, Kamlesh K; Danielson, Kirsty M; Filant, Justyna; Moeller, Courtney; Nejad, Parham; Paul, Anu; Simonson, Bridget; Wong, David K; Zhang, Xuan; Balaj, Leonora; Gandhi, Roopali; Sood, Anil K; Alexander, Roger P; Wang, Liang; Wu, Chunlei; Wong, David T W; Galas, David J; Van Keuren-Jensen, Kendall; Patel, Tushar; Jones, Jennifer C; Das, Saumya.

Cell ; 177(2): 463-477.e15, 2019 04 04.

Article in English | MEDLINE | ID: mdl-30951672

ABSTRACT

To develop a map of cell-cell communication mediated by extracellular RNA (exRNA), the NIH Extracellular RNA Communication Consortium created the exRNA Atlas resource (https://exrna-atlas.org). The Atlas version 4P1 hosts 5,309 exRNA-seq and exRNA qPCR profiles from 19 studies and a suite of analysis and visualization tools. To analyze variation between profiles, we apply computational deconvolution. The analysis leads to a model with six exRNA cargo types (CT1, CT2, CT3A, CT3B, CT3C, CT4), each detectable in multiple biofluids (serum, plasma, CSF, saliva, urine). Five of the cargo types associate with known vesicular and non-vesicular (lipoprotein and ribonucleoprotein) exRNA carriers. To validate utility of this model, we re-analyze an exercise response study by deconvolution to identify physiologically relevant response pathways that were not detected previously. To enable wide application of this model, as part of the exRNA Atlas resource, we provide tools for deconvolution and analysis of user-provided case-control studies.

Subjects

Cell Communication/physiology , RNA/metabolism , Adult , Body Fluids/chemistry , Cell-Free Nucleic Acids/metabolism , Circulating MicroRNA/metabolism , Extracellular Vesicles/metabolism , Female , Humans , Male , Reproducibility of Results , RNA Sequence Analysis/methods , Software

2.

The Fermi-Dirac distribution provides a calibrated probabilistic output for binary classifiers.

Kim, Sung-Cheol; Arun, Adith S; Ahsen, Mehmet Eren; Vogel, Robert; Stolovitzky, Gustavo.

Proc Natl Acad Sci U S A ; 118(34), 2021 08 24.

Article in English | MEDLINE | ID: mdl-34413191

ABSTRACT

Binary classification is one of the central problems in machine-learning research and, as such, investigations of its general statistical properties are of interest. We studied the ranking statistics of items in binary classification problems and observed that there is a formal and surprising relationship between the probability of a sample belonging to one of the two classes and the Fermi-Dirac distribution determining the probability that a fermion occupies a given single-particle quantum state in a physical system of noninteracting fermions. Using this equivalence, it is possible to compute a calibrated probabilistic output for binary classifiers. We show that the area under the receiver operating characteristics curve (AUC) in a classification problem is related to the temperature of an equivalent physical system. In a similar manner, the optimal decision threshold between the two classes is associated with the chemical potential of an equivalent physical system. Using our framework, we also derive a closed-form expression to calculate the variance for the AUC of a classifier. Finally, we introduce FiDEL (Fermi-Dirac-based ensemble learning), an ensemble learning algorithm that uses the calibrated nature of the classifier's output probability to combine possibly very different classifiers.
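To make the analogy in the abstract above concrete, the following is a minimal sketch of a Fermi-Dirac-shaped probability over classifier ranks. The parameterization is an assumption for illustration, not the paper's exact formulation: items are assumed to be sorted so that rank 1 is the most confident positive, `beta` plays the role of an inverse temperature (sharper separation, and hence higher AUC, for larger values), and `mu` plays the role of the chemical potential, that is, the rank at which the positive-class probability equals 0.5. The function names and example values are hypothetical.

```python
import math


def fermi_dirac(rank, beta, mu):
    """Occupation-style probability that the item at `rank` is a positive.

    Assumed form: 1 / (1 + exp(beta * (rank - mu))).
    `beta` and `mu` are illustrative parameters, not fitted values.
    """
    return 1.0 / (1.0 + math.exp(beta * (rank - mu)))


def calibrated_probabilities(n_items, beta, mu):
    """Return the positive-class probability for ranks 1..n_items."""
    return [fermi_dirac(r, beta, mu) for r in range(1, n_items + 1)]


# Hypothetical example: 10 ranked items with the decision threshold
# placed between ranks 5 and 6.
probs = calibrated_probabilities(n_items=10, beta=0.8, mu=5.5)

for rank, p in enumerate(probs, start=1):
    print(f"rank {rank:2d}: P(positive) = {p:.3f}")
```

In this sketch the probabilities decrease monotonically with rank and cross 0.5 at `mu`. Fitting `beta` and `mu` to a real classifier's ranking statistics is outside the scope of the example.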

3.

Assessment of network module identification across complex diseases.

Choobdar, Sarvenaz; Ahsen, Mehmet E; Crawford, Jake; Tomasoni, Mattia; Fang, Tao; Lamparter, David; Lin, Junyuan; Hescott, Benjamin; Hu, Xiaozhe; Mercer, Johnathan; Natoli, Ted; Narayan, Rajiv; Subramanian, Aravind; Zhang, Jitao D; Stolovitzky, Gustavo; Kutalik, Zoltán; Lage, Kasper; Slonim, Donna K; Saez-Rodriguez, Julio; Cowen, Lenore J; Bergmann, Sven; Marbach, Daniel.

Nat Methods ; 16(9): 843-852, 2019 09.

Article in English | MEDLINE | ID: mdl-31471613

ABSTRACT

Many bioinformatics methods have been proposed for reducing the complexity of large gene or protein networks into relevant subnetworks or modules. Yet, how such methods compare to each other in terms of their ability to identify disease-relevant modules in different types of network remains poorly understood. We launched the 'Disease Module Identification DREAM Challenge', an open competition to comprehensively assess module identification methods across diverse protein-protein interaction, signaling, gene co-expression, homology and cancer-gene networks. Predicted network modules were tested for association with complex traits and diseases using a unique collection of 180 genome-wide association studies. Our robust assessment of 75 module identification methods reveals top-performing algorithms, which recover complementary trait-associated modules. We find that most of these modules correspond to core disease-relevant pathways, which often comprise therapeutic targets. This community challenge establishes biologically interpretable benchmarks, tools and guidelines for molecular network analysis to study human disease biology.

Subjects

Computational Biology/methods , Disease/genetics , Gene Regulatory Networks , Genome-Wide Association Study , Biological Models , Single Nucleotide Polymorphism , Quantitative Trait Loci , Algorithms , Gene Expression Profiling , Humans , Phenotype , Protein Interaction Maps

4.

COSIFER: a Python package for the consensus inference of molecular interaction networks.

Manica, Matteo; Bunne, Charlotte; Mathis, Roland; Cadow, Joris; Ahsen, Mehmet Eren; Stolovitzky, Gustavo A; Martínez, María Rodríguez.

Bioinformatics ; 37(14): 2070-2072, 2021 08 04.

Article in English | MEDLINE | ID: mdl-33241320

ABSTRACT

SUMMARY: The advent of high-throughput technologies has provided researchers with measurements of thousands of molecular entities and enabled the investigation of the internal regulatory apparatus of the cell. However, network inference from high-throughput data is far from being a solved problem. While a plethora of different inference methods have been proposed, they often lead to non-overlapping predictions, and many of them lack user-friendly implementations to enable their broad utilization. Here, we present Consensus Interaction Network Inference Service (COSIFER), a package and a companion web-based platform to infer molecular networks from expression data using state-of-the-art consensus approaches. COSIFER includes a selection of state-of-the-art methodologies for network inference and different consensus strategies to integrate the predictions of individual methods and generate robust networks. AVAILABILITY AND IMPLEMENTATION: COSIFER Python source code is available at https://github.com/PhosphorylatedRabbits/cosifer. The web service is accessible at https://ibm.biz/cosifer-aas. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Software , Consensus
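The idea of integrating the predictions of several inference methods into one consensus network, as described in the abstract of this record, can be sketched in a few lines. The example below does not use the COSIFER API; it is a generic mean-rank (Borda-style) aggregation written with the Python standard library only, and the method names, gene names, and edge scores are all hypothetical.

```python
from collections import defaultdict


def rank_edges(scores):
    """Map each edge to its rank (1 = strongest) within one method's output."""
    ordered = sorted(scores, key=lambda edge: (-scores[edge], edge))
    return {edge: position + 1 for position, edge in enumerate(ordered)}


def consensus_by_mean_rank(method_scores):
    """Combine per-method edge scores into a list sorted by mean rank.

    Edges that a method did not report receive the worst possible rank,
    so an edge must be supported by several methods to rank highly.
    """
    all_edges = set().union(*(scores.keys() for scores in method_scores.values()))
    worst_rank = len(all_edges)
    totals = defaultdict(float)
    for scores in method_scores.values():
        ranks = rank_edges(scores)
        for edge in all_edges:
            totals[edge] += ranks.get(edge, worst_rank)
    n_methods = len(method_scores)
    mean_ranks = [(edge, totals[edge] / n_methods) for edge in all_edges]
    return sorted(mean_ranks, key=lambda item: (item[1], item[0]))


# Hypothetical edge scores from three hypothetical inference methods.
method_scores = {
    "method_a": {("g1", "g2"): 0.9, ("g2", "g3"): 0.4, ("g1", "g3"): 0.1},
    "method_b": {("g1", "g2"): 0.7, ("g1", "g3"): 0.6},
    "method_c": {("g2", "g3"): 0.8, ("g1", "g2"): 0.5, ("g1", "g3"): 0.2},
}

consensus = consensus_by_mean_rank(method_scores)
for edge, mean_rank in consensus:
    print(edge, round(mean_rank, 3))
```

Rank-based aggregation is used here because raw scores from different methods are rarely on a common scale. Other consensus strategies, such as weighted voting, would fit the same interface.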

5.

Crowdsourcing biomedical research: leveraging communities as innovation engines.

Saez-Rodriguez, Julio; Costello, James C; Friend, Stephen H; Kellen, Michael R; Mangravite, Lara; Meyer, Pablo; Norman, Thea; Stolovitzky, Gustavo.

Nat Rev Genet ; 17(8): 470-86, 2016 07 15.

Article in English | MEDLINE | ID: mdl-27418159

ABSTRACT

The generation of large-scale biomedical data is creating unprecedented opportunities for basic and translational science. Typically, the data producers perform initial analyses, but it is very likely that the most informative methods may reside with other groups. Crowdsourcing the analysis of complex and massive data has emerged as a framework to find robust methodologies. When the crowdsourcing is done in the form of collaborative scientific competitions, known as Challenges, the validation of the methods is inherently addressed. Challenges also encourage open innovation, create collaborative communities to solve diverse and important biomedical problems, and foster the creation and dissemination of well-curated data repositories.

Subjects

Biomedical Research/organization & administration , Crowdsourcing , Translational Biomedical Research/organization & administration , Animals , Cooperative Behavior , Humans , Interdisciplinary Communication , Organizational Innovation

6.

Unannotated small RNA clusters associated with circulating extracellular vesicles detect early stage liver cancer.

von Felden, Johann; Garcia-Lezana, Teresa; Dogra, Navneet; Gonzalez-Kozlova, Edgar; Ahsen, Mehmet Eren; Craig, Amanda; Gifford, Stacey; Wunsch, Benjamin; Smith, Joshua T; Kim, Sungcheol; Diaz, Jennifer E L; Chen, Xintong; Labgaa, Ismail; Haber, Philipp; Olsen, Reena; Han, Dan; Restrepo, Paula; D'Avola, Delia; Hernandez-Meza, Gabriela; Allette, Kimaada; Sebra, Robert; Saberi, Behnam; Tabrizian, Parissa; Asgharpour, Amon; Dieterich, Douglas; Llovet, Josep M; Cordon-Cardo, Carlos; Tewari, Ash; Schwartz, Myron; Stolovitzky, Gustavo; Losic, Bojan; Villanueva, Augusto.

Gut ; 2021 Jul 28.

Article in English | MEDLINE | ID: mdl-34321221

ABSTRACT

OBJECTIVE: Surveillance tools for early cancer detection, including detection of hepatocellular carcinoma (HCC), are suboptimal, and biomarkers are urgently needed. Extracellular vesicles (EVs) have gained increasing scientific interest due to their involvement in tumour initiation and metastasis; however, most extracellular RNA (exRNA) blood-based biomarker studies are limited to annotated genomic regions. DESIGN: EVs were isolated with differential ultracentrifugation and integrated nanoscale deterministic lateral displacement arrays (nanoDLD) and quality assessed by electron microscopy, immunoblotting, nanoparticle tracking and deconvolution analysis. Genome-wide sequencing of the largely unexplored small exRNA landscape, including unannotated transcripts, identified and reproducibly quantified small RNA clusters (smRCs). Their key genomic features were delineated across biospecimens and EV isolation techniques in prostate cancer and HCC. Three independent exRNA cancer datasets with a total of 479 samples from 375 patients, including longitudinal samples, were used for this study. RESULTS: ExRNA smRCs were dominated by uncharacterised, unannotated small RNA with a consensus sequence of 20 nt. An unannotated 3-smRC signature was significantly overexpressed in plasma exRNA of patients with HCC (p<0.01, n=157). An independent validation in a phase 2 biomarker case-control study revealed 86% sensitivity and 91% specificity for the detection of early HCC from controls at risk (n=209) (area under the receiver operating characteristic curve (AUC): 0.87). The 3-smRC signature was independent of alpha-fetoprotein (p<0.0001) and a composite model yielded an increased AUC of 0.93. CONCLUSION: These findings directly lead to the prospect of a minimally invasive, blood-only, operator-independent clinical tool for HCC surveillance, thus highlighting the potential of unannotated smRCs for biomarker research in cancer.

7.

Broken flow symmetry explains the dynamics of small particles in deterministic lateral displacement arrays.

Kim, Sung-Cheol; Wunsch, Benjamin H; Hu, Huan; Smith, Joshua T; Austin, Robert H; Stolovitzky, Gustavo.

Proc Natl Acad Sci U S A ; 114(26): E5034-E5041, 2017 06 27.

Article in English | MEDLINE | ID: mdl-28607075

ABSTRACT

Deterministic lateral displacement (DLD) is a technique for size fractionation of particles in continuous flow that has shown great potential for biological applications. Several theoretical models have been proposed, but experimental evidence has demonstrated that a rich class of intermediate migration behavior exists, which is not predicted. We present a unified theoretical framework to infer the path of particles in the whole array on the basis of trajectories in a unit cell. This framework explains many of the unexpected particle trajectories reported and can be used to design arrays for even nanoscale particle fractionation. We performed experiments that verify these predictions and used our model to develop a condenser array that achieves full particle separation with a single fluidic input.

8.

Inferring causal molecular networks: empirical assessment through a community-based effort.

Hill, Steven M; Heiser, Laura M; Cokelaer, Thomas; Unger, Michael; Nesser, Nicole K; Carlin, Daniel E; Zhang, Yang; Sokolov, Artem; Paull, Evan O; Wong, Chris K; Graim, Kiley; Bivol, Adrian; Wang, Haizhou; Zhu, Fan; Afsari, Bahman; Danilova, Ludmila V; Favorov, Alexander V; Lee, Wai Shing; Taylor, Dane; Hu, Chenyue W; Long, Byron L; Noren, David P; Bisberg, Alexander J; Mills, Gordon B; Gray, Joe W; Kellen, Michael; Norman, Thea; Friend, Stephen; Qutub, Amina A; Fertig, Elana J; Guan, Yuanfang; Song, Mingzhou; Stuart, Joshua M; Spellman, Paul T; Koeppl, Heinz; Stolovitzky, Gustavo; Saez-Rodriguez, Julio; Mukherjee, Sach.

Nat Methods ; 13(4): 310-8, 2016 Apr.

Article in English | MEDLINE | ID: mdl-26901648

ABSTRACT

It remains unclear whether causal, rather than merely correlational, relationships in molecular networks can be inferred in complex biological settings. Here we describe the HPN-DREAM network inference challenge, which focused on learning causal influences in signaling networks. We used phosphoprotein data from cancer cell lines as well as in silico data from a nonlinear dynamical model. Using the phosphoprotein data, we scored more than 2,000 networks submitted by challenge participants. The networks spanned 32 biological contexts and were scored in terms of causal validity with respect to unseen interventional data. A number of approaches were effective, and incorporating known biology was generally advantageous. Additional sub-challenges considered time-course prediction and visualization. Our results suggest that learning causal relationships may be feasible in complex settings such as disease states. Furthermore, our scoring approach provides a practical way to empirically assess inferred molecular networks in a causal sense.

Subjects

Causality , Gene Regulatory Networks , Neoplasms/genetics , Protein Interaction Mapping/methods , Software , Systems Biology , Algorithms , Computational Biology , Computer Simulation , Gene Expression Profiling , Humans , Biological Models , Signal Transduction , Cultured Tumor Cells

9.

Combining tumor genome simulation with crowdsourcing to benchmark somatic single-nucleotide-variant detection.

Ewing, Adam D; Houlahan, Kathleen E; Hu, Yin; Ellrott, Kyle; Caloian, Cristian; Yamaguchi, Takafumi N; Bare, J Christopher; P'ng, Christine; Waggott, Daryl; Sabelnykova, Veronica Y; Kellen, Michael R; Norman, Thea C; Haussler, David; Friend, Stephen H; Stolovitzky, Gustavo; Margolin, Adam A; Stuart, Joshua M; Boutros, Paul C.

Nat Methods ; 12(7): 623-30, 2015 Jul.

Article in English | MEDLINE | ID: mdl-25984700

ABSTRACT

The detection of somatic mutations from cancer genome sequences is key to understanding the genetic basis of disease progression, patient survival and response to therapy. Benchmarking is needed for tool assessment and improvement but is complicated by a lack of gold standards, by extensive resource requirements and by difficulties in sharing personal genomic information. To resolve these issues, we launched the ICGC-TCGA DREAM Somatic Mutation Calling Challenge, a crowdsourced benchmark of somatic mutation detection algorithms. Here we report the BAMSurgeon tool for simulating cancer genomes and the results of 248 analyses of three in silico tumors created with it. Different algorithms exhibit characteristic error profiles, and, intriguingly, false positives show a trinucleotide profile very similar to one found in human tumors. Although the three simulated tumors differ in sequence contamination (deviation from normal cell sequence) and in subclonality, an ensemble of pipelines outperforms the best individual pipeline in all cases. BAMSurgeon is available at https://github.com/adamewing/bamsurgeon/.

Subjects

Benchmarking , Crowdsourcing , Genome , Neoplasms/genetics , Single Nucleotide Polymorphism , Algorithms , Humans
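The abstract of this record reports that an ensemble of pipelines outperformed the best individual pipeline. One simple way such an ensemble can be formed is majority voting over variant calls, sketched below. This is an illustrative assumption, not the ensemble method used in the Challenge, and it does not use BAMSurgeon. The variants, the caller outputs, and the truth set are toy data constructed so that each caller makes a different error; they say nothing about real pipeline performance.

```python
from collections import Counter


def majority_vote(call_sets, min_votes=None):
    """Keep variants reported by at least `min_votes` callers.

    By default a strict majority of the callers is required.
    """
    if min_votes is None:
        min_votes = len(call_sets) // 2 + 1
    votes = Counter(variant for calls in call_sets for variant in set(calls))
    return {variant for variant, count in votes.items() if count >= min_votes}


def precision_recall(calls, truth):
    """Return (precision, recall) of a call set against a known truth set."""
    true_positives = len(calls & truth)
    precision = true_positives / len(calls) if calls else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Toy variants as (chromosome, position, ref, alt); all values are hypothetical.
A = ("chr1", 1000, "A", "T")
B = ("chr1", 2000, "G", "C")
C = ("chr2", 1500, "C", "T")
D = ("chr3", 3000, "T", "G")
X = ("chr4", 4000, "A", "G")  # false positive unique to caller 1
Y = ("chr5", 5000, "C", "A")  # false positive unique to caller 2
Z = ("chr6", 6000, "G", "T")  # false positive unique to caller 3

truth = {A, B, C, D}
caller_1 = {A, B, C, X}
caller_2 = {A, B, D, Y}
caller_3 = {A, C, D, Z}

ensemble = majority_vote([caller_1, caller_2, caller_3])

print("caller 1:", precision_recall(caller_1, truth))
print("ensemble:", precision_recall(ensemble, truth))
```

Voting helps in this toy case because the callers' errors do not overlap. When callers share systematic false positives, a plain majority vote would retain them.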

10.

Prediction of overall survival for patients with metastatic castration-resistant prostate cancer: development of a prognostic model through a crowdsourced challenge with open clinical trial data.

Guinney, Justin; Wang, Tao; Laajala, Teemu D; Winner, Kimberly Kanigel; Bare, J Christopher; Neto, Elias Chaibub; Khan, Suleiman A; Peddinti, Gopal; Airola, Antti; Pahikkala, Tapio; Mirtti, Tuomas; Yu, Thomas; Bot, Brian M; Shen, Liji; Abdallah, Kald; Norman, Thea; Friend, Stephen; Stolovitzky, Gustavo; Soule, Howard; Sweeney, Christopher J; Ryan, Charles J; Scher, Howard I; Sartor, Oliver; Xie, Yang; Aittokallio, Tero; Zhou, Fang Liz; Costello, James C.

Lancet Oncol ; 18(1): 132-142, 2017 01.

Article in English | MEDLINE | ID: mdl-27864015

ABSTRACT

BACKGROUND: Improvements to prognostic models in metastatic castration-resistant prostate cancer have the potential to augment clinical trial design and guide treatment strategies. In partnership with Project Data Sphere, a not-for-profit initiative allowing data from cancer clinical trials to be shared broadly with researchers, we designed an open-data, crowdsourced, DREAM (Dialogue for Reverse Engineering Assessments and Methods) challenge to not only identify a better prognostic model for prediction of survival in patients with metastatic castration-resistant prostate cancer but also engage a community of international data scientists to study this disease. METHODS: Data from the comparator arms of four phase 3 clinical trials in first-line metastatic castration-resistant prostate cancer were obtained from Project Data Sphere, comprising 476 patients treated with docetaxel and prednisone from the ASCENT2 trial, 526 patients treated with docetaxel, prednisone, and placebo in the MAINSAIL trial, 598 patients treated with docetaxel, prednisone or prednisolone, and placebo in the VENICE trial, and 470 patients treated with docetaxel and placebo in the ENTHUSE 33 trial. Datasets consisting of more than 150 clinical variables were curated centrally, including demographics, laboratory values, medical history, lesion sites, and previous treatments. Data from ASCENT2, MAINSAIL, and VENICE were released publicly to be used as training data to predict the outcome of interest, namely overall survival. Clinical data were also released for ENTHUSE 33, but data for outcome variables (overall survival and event status) were hidden from the challenge participants so that ENTHUSE 33 could be used for independent validation. Methods were evaluated using the integrated time-dependent area under the curve (iAUC). The reference model, based on eight clinical variables and a penalised Cox proportional-hazards model, was used to compare method performance.
Further validation was done using data from a fifth trial, ENTHUSE M1, in which 266 patients with metastatic castration-resistant prostate cancer were treated with placebo alone. FINDINGS: 50 independent methods were developed to predict overall survival and were evaluated through the DREAM challenge. The top performer was based on an ensemble of penalised Cox regression models (ePCR), which uniquely identified predictive interaction effects with immune biomarkers and markers of hepatic and renal function. Overall, ePCR outperformed all other methods (iAUC 0·791; Bayes factor >5) and surpassed the reference model (iAUC 0·743; Bayes factor >20). Both the ePCR model and the reference model stratified patients in the ENTHUSE 33 trial into high-risk and low-risk groups with significantly different overall survival (ePCR: hazard ratio 3·32, 95% CI 2·39-4·62, p<0·0001; reference model: 2·56, 1·85-3·53, p<0·0001). The new model was validated further on the ENTHUSE M1 cohort with similarly high performance (iAUC 0·768). Meta-analysis across all methods confirmed previously identified predictive clinical variables and revealed aspartate aminotransferase as an important, albeit previously under-reported, prognostic biomarker. INTERPRETATION: Novel prognostic factors were delineated, and the assessment of 50 methods developed by independent international teams establishes a benchmark for development of methods in the future. The results of this effort show that data-sharing, when combined with a crowdsourced challenge, is a robust and powerful framework to develop new prognostic models in advanced prostate cancer. FUNDING: Sanofi US Services, Project Data Sphere.

Subjects

Antineoplastic Combined Chemotherapy Protocols/therapeutic use , Statistical Models , Nomograms , Castration-Resistant Prostatic Neoplasms/mortality , Adolescent , Adult , Aged , Bayes Theorem , Crowdsourcing , Docetaxel , Follow-Up Studies , Humans , Male , Middle Aged , Neoplasm Staging , Prednisone/administration & dosage , Prognosis , Castration-Resistant Prostatic Neoplasms/drug therapy , Castration-Resistant Prostatic Neoplasms/secondary , Survival Rate , Taxoids/administration & dosage , Young Adult

11.

A Crowdsourcing Approach to Developing and Assessing Prediction Algorithms for AML Prognosis.

Noren, David P; Long, Byron L; Norel, Raquel; Rhrissorrakrai, Kahn; Hess, Kenneth; Hu, Chenyue Wendy; Bisberg, Alex J; Schultz, Andre; Engquist, Erik; Liu, Li; Lin, Xihui; Chen, Gregory M; Xie, Honglei; Hunter, Geoffrey A M; Boutros, Paul C; Stepanov, Oleg; Norman, Thea; Friend, Stephen H; Stolovitzky, Gustavo; Kornblau, Steven; Qutub, Amina A.

PLoS Comput Biol ; 12(6): e1004890, 2016 06.

Article in English | MEDLINE | ID: mdl-27351836

ABSTRACT

Acute Myeloid Leukemia (AML) is a fatal hematological cancer. The genetic abnormalities underlying AML are extremely heterogeneous among patients, making prognosis and treatment selection very difficult. While clinical proteomics data has the potential to improve prognosis accuracy, thus far, the quantitative means to do so have yet to be developed. Here we report the results and insights gained from the DREAM 9 Acute Myeloid Leukemia Outcome Prediction Challenge (AML-OPC), a crowdsourcing effort designed to promote the development of quantitative methods for AML prognosis prediction. We identify the most accurate and robust models in predicting patient response to therapy, remission duration, and overall survival. We further investigate patient response to therapy, a clinically actionable prediction, and find that patients that are classified as resistant to therapy are harder to predict than responsive patients across the 31 models submitted to the challenge. The top two performing models, which held a high sensitivity to these patients, substantially utilized the proteomics data to make predictions. Using these models, we also identify which signaling proteins were useful in predicting patient therapeutic response.

Subjects

Algorithms , Amyotrophic Lateral Sclerosis/diagnosis , Amyotrophic Lateral Sclerosis/therapy , Crowdsourcing/methods , Health Care Outcome and Process Assessment/methods , Proteome/metabolism , Amyotrophic Lateral Sclerosis/metabolism , Biomarkers/metabolism , Humans , Reproducibility of Results , Risk Assessment , Sensitivity and Specificity , Treatment Outcome

12.

A computational method for designing diverse linear epitopes including citrullinated peptides with desired binding affinities to intravenous immunoglobulin.

Patro, Rob; Norel, Raquel; Prill, Robert J; Saez-Rodriguez, Julio; Lorenz, Peter; Steinbeck, Felix; Ziems, Bjoern; Lustrek, Mitja; Barbarini, Nicola; Tiengo, Alessandra; Bellazzi, Riccardo; Thiesen, Hans-Jürgen; Stolovitzky, Gustavo; Kingsford, Carl.

BMC Bioinformatics ; 17: 155, 2016 Apr 08.

Article in English | MEDLINE | ID: mdl-27059896

ABSTRACT

BACKGROUND: Understanding the interactions between antibodies and the linear epitopes that they recognize is an important task in the study of immunological diseases. We present a novel computational method for the design of linear epitopes of specified binding affinity to Intravenous Immunoglobulin (IVIg). RESULTS: We show that the method, called Pythia-design, can accurately design peptides with both high binding affinity and low binding affinity to IVIg. To show this, we experimentally constructed and tested the computationally constructed designs. We further show experimentally that these designed peptides are more accurate than those produced by a recent method for the same task. Pythia-design is based on combining random walks with an ensemble of probabilistic support vector machine (SVM) classifiers, and we show that it produces a diverse set of designed peptides, an important property to develop robust sets of candidates for construction. We show that by combining Pythia-design and the method of (PloS ONE 6(8):23616, 2011), we are able to produce an even more accurate collection of designed peptides. Analysis of the experimental validation of Pythia-design peptides indicates that binding of IVIg is favored by epitopes that contain tryptophan and cysteine. CONCLUSIONS: Our method, Pythia-design, is able to generate a diverse set of binding and non-binding peptides, and its designs have been experimentally shown to be accurate.

Subjects

Computational Biology/methods , Epitopes/chemistry , Intravenous Immunoglobulins/chemistry , Cyclic Peptides/chemistry , Citrulline/chemistry , Cysteine/chemistry , Humans , Molecular Models , Reproducibility of Results , Support Vector Machine , Tryptophan/chemistry

13.

Inferring gene expression from ribosomal promoter sequences, a crowdsourcing approach.

Meyer, Pablo; Siwo, Geoffrey; Zeevi, Danny; Sharon, Eilon; Norel, Raquel; Segal, Eran; Stolovitzky, Gustavo.

Genome Res ; 23(11): 1928-37, 2013 Nov.

Article in English | MEDLINE | ID: mdl-23950146

ABSTRACT

The Gene Promoter Expression Prediction challenge consisted of predicting gene expression from promoter sequences in a previously unknown experimentally generated data set. The challenge was presented to the community in the framework of the sixth Dialogue for Reverse Engineering Assessments and Methods (DREAM6), a community effort to evaluate the status of systems biology modeling methodologies. Nucleotide-specific promoter activity was obtained by measuring fluorescence from promoter sequences fused upstream of a gene for yellow fluorescent protein and inserted in the same genomic site of yeast Saccharomyces cerevisiae. Twenty-one teams submitted results predicting the expression levels of 53 different promoters from yeast ribosomal protein genes. Analysis of participant predictions shows that accurate values for low-expressed and mutated promoters were difficult to obtain, although in the latter case, only when the mutation induced a large change in promoter activity compared to the wild-type sequence. As in previous DREAM challenges, we found that aggregation of participant predictions provided robust results, but did not fare better than the three best algorithms. Finally, this study not only provides a benchmark for the assessment of methods predicting activity of a specific set of promoters from their sequence, but it also shows that the top performing algorithm, which used machine-learning approaches, can be improved by the addition of biological features such as transcription factor binding sites.

Subjects

Crowdsourcing , Gene Expression , Genetic Promoter Regions , Ribosomal Proteins/genetics , Ribosomes/genetics , Saccharomyces cerevisiae/genetics , Algorithms , Binding Sites/genetics , Gene Expression Profiling , Fungal Gene Expression Regulation , Gene Regulatory Networks , Fungal Genes , Genetic Models , Mutation , Transcriptional Regulatory Elements , Ribosomes/metabolism , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism , Systems Biology

14.

Understanding the limits of animal models as predictors of human biology: lessons learned from the sbv IMPROVER Species Translation Challenge.

Rhrissorrakrai, Kahn; Belcastro, Vincenzo; Bilal, Erhan; Norel, Raquel; Poussin, Carine; Mathis, Carole; Dulize, Rémi H J; Ivanov, Nikolai V; Alexopoulos, Leonidas; Rice, J Jeremy; Peitsch, Manuel C; Stolovitzky, Gustavo; Meyer, Pablo; Hoeng, Julia.

Bioinformatics ; 31(4): 471-83, 2015 Feb 15.

Article in English | MEDLINE | ID: mdl-25236459

ABSTRACT

MOTIVATION: Inferring how humans respond to external cues such as drugs, chemicals, viruses or hormones is an essential question in biomedicine. Very often, however, this question cannot be addressed because it is not possible to perform experiments in humans. A reasonable alternative consists of generating responses in animal models and 'translating' those results to humans. The limitations of such translation, however, are far from clear, and systematic assessments of its actual potential are urgently needed. sbv IMPROVER (systems biology verification for Industrial Methodology for PROcess VErification in Research) was designed as a series of challenges to address translatability between humans and rodents. This collaborative crowd-sourcing initiative invited scientists from around the world to apply their own computational methodologies on a multilayer systems biology dataset composed of phosphoproteomics, transcriptomics and cytokine data derived from normal human and rat bronchial epithelial cells exposed in parallel to 52 different stimuli under identical conditions. Our aim was to understand the limits of species-to-species translatability at different levels of biological organization: signaling, transcriptional and release of secreted factors (such as cytokines). Participating teams submitted 49 different solutions across the sub-challenges, two-thirds of which were statistically significantly better than random. Additionally, similar computational methods were found to range widely in their performance within the same challenge, and no single method emerged as a clear winner across all sub-challenges. Finally, computational methods were able to effectively translate some specific stimuli and biological processes in the lung epithelial system, such as DNA synthesis, cytoskeleton and extracellular matrix, translation, immune/inflammation and growth factor/proliferation pathways, better than the expected response similarity between species. 
CONTACT: pmeyerr@us.ibm.com or Julia.Hoeng@pmi.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Algorithms , Cytokines/metabolism , Gene Expression Profiling , Animal Models , Phosphoproteins/metabolism , Software , Systems Biology/methods , Animals , Bronchi/cytology , Bronchi/metabolism , Cultured Cells , Factual Databases , Epithelial Cells/cytology , Epithelial Cells/metabolism , Gene Expression Regulation , Gene Regulatory Networks , Humans , Oligonucleotide Array Sequence Analysis , Phosphorylation , Rats , Species Specificity , Translational Biomedical Research

15.

A crowd-sourcing approach for the construction of species-specific cell signaling networks.

Bilal, Erhan; Sakellaropoulos, Theodore; Melas, Ioannis N; Messinis, Dimitris E; Belcastro, Vincenzo; Rhrissorrakrai, Kahn; Meyer, Pablo; Norel, Raquel; Iskandar, Anita; Blaese, Elise; Rice, John J; Peitsch, Manuel C; Hoeng, Julia; Stolovitzky, Gustavo; Alexopoulos, Leonidas G; Poussin, Carine.

Bioinformatics ; 31(4): 484-91, 2015 Feb 15.

Article in English | MEDLINE | ID: mdl-25294919

ABSTRACT

MOTIVATION: Animal models are important tools in drug discovery and for understanding human biology in general. However, many drugs that initially show promising results in rodents fail in later stages of clinical trials. Understanding the commonalities and differences between human and rat cell signaling networks can lead to better experimental designs, improved allocation of resources and ultimately better drugs. RESULTS: The sbv IMPROVER Species-Specific Network Inference challenge was designed to use the power of the crowds to build two species-specific cell signaling networks given phosphoproteomics, transcriptomics and cytokine data generated from NHBE and NRBE cells exposed to various stimuli. A common literature-inspired reference network with 220 nodes and 501 edges was also provided as prior knowledge from which challenge participants could add or remove edges but not nodes. Such a large network inference challenge not based on synthetic simulations but on real data presented unique difficulties in scoring and interpreting the results. Because any prior knowledge about the networks was already provided to the participants for reference, novel ways for scoring and aggregating the results were developed. Two human and rat consensus networks were obtained by combining all the inferred networks. Further analysis showed that major signaling pathways were conserved between the two species with only isolated components diverging, as in the case of ribosomal S6 kinase RPS6KA1. Overall, the consensus between inferred edges was relatively high with the exception of the downstream targets of transcription factors, which seemed more difficult to predict. CONTACT: ebilal@us.ibm.com or gustavo@us.ibm.com. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Algorithms; Crowdsourcing; Cytokines/metabolism; Gene Expression Profiling; Gene Regulatory Networks; Phosphoproteins/metabolism; Software; Systems Biology/methods; Animals; Bronchi/cytology; Bronchi/metabolism; Cell Communication; Cells, Cultured; Databases, Factual; Epithelial Cells/cytology; Epithelial Cells/metabolism; Gene Expression Regulation; Humans; Models, Animal; Oligonucleotide Array Sequence Analysis; Phosphorylation; Rats; Signal Transduction; Species Specificity

16.

Summary of the DREAM8 Parameter Estimation Challenge: Toward Parameter Identification for Whole-Cell Models.

Karr, Jonathan R; Williams, Alex H; Zucker, Jeremy D; Raue, Andreas; Steiert, Bernhard; Timmer, Jens; Kreutz, Clemens; Wilkinson, Simon; Allgood, Brandon A; Bot, Brian M; Hoff, Bruce R; Kellen, Michael R; Covert, Markus W; Stolovitzky, Gustavo A; Meyer, Pablo.

PLoS Comput Biol ; 11(5): e1004096, 2015 May.

Article in English | MEDLINE | ID: mdl-26020786

ABSTRACT

Whole-cell models that explicitly represent all cellular components at the molecular level have the potential to predict phenotype from genotype. However, even for simple bacteria, whole-cell models will contain thousands of parameters, many of which are poorly characterized or unknown. New algorithms are needed to estimate these parameters and enable researchers to build increasingly comprehensive models. We organized the Dialogue for Reverse Engineering Assessments and Methods (DREAM) 8 Whole-Cell Parameter Estimation Challenge to develop new parameter estimation algorithms for whole-cell models. We asked participants to identify a subset of parameters of a whole-cell model given the model's structure and in silico "experimental" data. Here we describe the challenge, the best performing methods, and new insights into the identifiability of whole-cell models. We also describe several valuable lessons we learned toward improving future challenges. Going forward, we believe that collaborative efforts supported by inexpensive cloud computing have the potential to solve whole-cell model parameter estimation.

Subjects

Cells/metabolism; Models, Biological; Algorithms; Bacteria/genetics; Bacteria/metabolism; Bioengineering; Cloud Computing; Computational Biology; Computer Simulation; Genetic Association Studies/statistics & numerical data; Mutation; Mycoplasma genitalium/genetics; Mycoplasma genitalium/metabolism

17.

The inconvenience of data of convenience: computational research beyond post-mortem analyses.

Azencott, Chloé-Agathe; Aittokallio, Tero; Roy, Sushmita; Norman, Thea; Friend, Stephen; Stolovitzky, Gustavo; Goldenberg, Anna.

Nat Methods ; 14(10): 937-938, 2017 Sep 29.

Article in English | MEDLINE | ID: mdl-28960198

Subjects

Computational Biology/methods; Computational Biology/standards; Data Analysis; Research Design/standards; Antirheumatic Agents/therapeutic use; Arthritis, Rheumatoid/drug therapy; Arthritis, Rheumatoid/genetics; Computational Biology/statistics & numerical data; Genome-Wide Association Study; Humans; Polymorphism, Single Nucleotide; Predictive Value of Tests; Research Design/statistics & numerical data

18.

Wisdom of crowds for robust gene network inference.

Marbach, Daniel; Costello, James C; Küffner, Robert; Vega, Nicole M; Prill, Robert J; Camacho, Diogo M; Allison, Kyle R; Kellis, Manolis; Collins, James J; Stolovitzky, Gustavo.

Nat Methods ; 9(8): 796-804, 2012 Jul 15.

Article in English | MEDLINE | ID: mdl-22796662

ABSTRACT

Reconstructing gene regulatory networks from high-throughput data is a long-standing challenge. Through the Dialogue on Reverse Engineering Assessment and Methods (DREAM) project, we performed a comprehensive blind assessment of over 30 network inference methods on Escherichia coli, Staphylococcus aureus, Saccharomyces cerevisiae and in silico microarray data. We characterize the performance, data requirements and inherent biases of different inference approaches, and we provide guidelines for algorithm application and development. We observed that no single inference method performs optimally across all data sets. In contrast, integration of predictions from multiple inference methods shows robust and high performance across diverse data sets. We thereby constructed high-confidence networks for E. coli and S. aureus, each comprising ~1,700 transcriptional interactions at a precision of ~50%. We experimentally tested 53 previously unobserved regulatory interactions in E. coli, of which 23 (43%) were supported. Our results establish community-based methods as a powerful and robust tool for the inference of transcriptional gene regulatory networks.

Subjects

Computational Biology; Gene Expression Regulation, Bacterial/genetics; Gene Regulatory Networks; Oligonucleotide Array Sequence Analysis; Algorithms; Escherichia coli/genetics; Saccharomyces cerevisiae/genetics; Software; Staphylococcus aureus/genetics; Transcription, Genetic/genetics

19.

Quantitative modeling of the terminal differentiation of B cells and mechanisms of lymphomagenesis.

Martínez, María Rodríguez; Corradin, Alberto; Klein, Ulf; Álvarez, Mariano Javier; Toffolo, Gianna M; di Camillo, Barbara; Califano, Andrea; Stolovitzky, Gustavo A.

Proc Natl Acad Sci U S A ; 109(7): 2672-7, 2012 Feb 14.

Article in English | MEDLINE | ID: mdl-22308355

ABSTRACT

Mature B-cell exit from germinal centers is controlled by a transcriptional regulatory module that integrates antigen and T-cell signals and, ultimately, leads to terminal differentiation into memory B cells or plasma cells. Despite a compact structure, the module dynamics are highly complex because of the presence of several feedback loops and self-regulatory interactions, and understanding its dysregulation, frequently associated with lymphomagenesis, requires robust dynamical modeling techniques. We present a quantitative kinetic model of three key gene regulators, BCL6, IRF4, and BLIMP, and use gene expression profile data from mature human B cells to determine appropriate model parameters. The model predicts the existence of two different hysteresis cycles that direct B cells through an irreversible transition toward a differentiated cellular state. By synthetically perturbing the interactions in this network, we can elucidate known mechanisms of lymphomagenesis and suggest candidate tumorigenic alterations, indicating that the model is a valuable quantitative tool to simulate B-cell exit from the germinal center under a variety of physiological and pathological conditions.

Subjects

B-Lymphocytes/cytology; Cell Differentiation; Lymphoma/pathology; B-Lymphocytes/immunology; Gene Expression Profiling; Humans; Immunologic Memory; Lymphoma/genetics

20.

Strengths and limitations of microarray-based phenotype prediction: lessons learned from the IMPROVER Diagnostic Signature Challenge.

Tarca, Adi L; Lauria, Mario; Unger, Michael; Bilal, Erhan; Boue, Stephanie; Kumar Dey, Kushal; Hoeng, Julia; Koeppl, Heinz; Martin, Florian; Meyer, Pablo; Nandy, Preetam; Norel, Raquel; Peitsch, Manuel; Rice, Jeremy J; Romero, Roberto; Stolovitzky, Gustavo; Talikka, Marja; Xiang, Yang; Zechner, Christoph.

Bioinformatics ; 29(22): 2892-9, 2013 Nov 15.

Article in English | MEDLINE | ID: mdl-23966112

ABSTRACT

MOTIVATION: After more than a decade since microarrays were used to predict phenotype of biological samples, real-life applications for disease screening and identification of patients who would best benefit from treatment are still emerging. The interest of the scientific community in identifying best approaches to develop such prediction models was reaffirmed in a competition style international collaboration called IMPROVER Diagnostic Signature Challenge whose results we describe herein. RESULTS: Fifty-four teams used public data to develop prediction models in four disease areas including multiple sclerosis, lung cancer, psoriasis and chronic obstructive pulmonary disease, and made predictions on blinded new data that we generated. Teams were scored using three metrics that captured various aspects of the quality of predictions, and best performers were awarded. This article presents the challenge results and introduces to the community the approaches of the best overall three performers, as well as an R package that implements the approach of the best overall team. The analyses of model performance data submitted in the challenge as well as additional simulations that we have performed revealed that (i) the quality of predictions depends more on the disease endpoint than on the particular approaches used in the challenge; (ii) the most important modeling factor (e.g. data preprocessing, feature selection and classifier type) is problem dependent; and (iii) for optimal results datasets and methods have to be carefully matched. Biomedical factors such as the disease severity and confidence in diagnostic were found to be associated with the misclassification rates across the different teams. AVAILABILITY: The lung cancer dataset is available from Gene Expression Omnibus (accession, GSE43580). The maPredictDSC R package implementing the approach of the best overall team is available at www.bioconductor.org or http://bioinformaticsprb.med.wayne.edu/.

Subjects

Gene Expression Profiling/methods; Molecular Diagnostic Techniques; Oligonucleotide Array Sequence Analysis/methods; Phenotype; Disease/genetics; Humans; Lung Neoplasms/diagnosis; Lung Neoplasms/genetics; Multiple Sclerosis/diagnosis; Multiple Sclerosis/genetics; Psoriasis/diagnosis; Psoriasis/genetics; Pulmonary Disease, Chronic Obstructive/diagnosis; Pulmonary Disease, Chronic Obstructive/genetics
