Search | Biblioteca Virtual em Saúde (Virtual Health Library)

1.

A novel estimator for the two-way partial AUC.

Chaibub Neto, Elias; Yadav, Vijay; Sieberts, Solveig K; Omberg, Larsson.

BMC Med Inform Decis Mak ; 24(1): 57, 2024 Feb 20.

Article in English | MEDLINE | ID: mdl-38378636

ABSTRACT

BACKGROUND: The two-way partial AUC has been recently proposed as a way to directly quantify partial area under the ROC curve with simultaneous restrictions on the sensitivity and specificity ranges of diagnostic tests or classifiers. The metric, as originally implemented in the tpAUC R package, is estimated using a nonparametric estimator based on a trimmed Mann-Whitney U-statistic, which becomes computationally expensive for large sample sizes. (Its computational complexity is of order [Formula: see text], where [Formula: see text] and [Formula: see text] represent the number of positive and negative cases, respectively.) This is problematic since the statistical methodology for comparing estimates generated from alternative diagnostic tests/classifiers relies on bootstrap resampling and requires repeated computations of the estimator on a large number of bootstrap samples. METHODS: By leveraging the graphical and probabilistic representations of the AUC, partial AUCs, and two-way partial AUC, we derive a novel estimator for the two-way partial AUC, which can be directly computed from the output of any software able to compute AUC and partial AUCs. We implemented our estimator using the computationally efficient pROC R package, which leverages a nonparametric approach using the trapezoidal rule for the computation of AUC and partial AUC scores. (Its computational complexity is of order [Formula: see text], where [Formula: see text].) We compare the empirical bias and computation time of the proposed estimator against the original estimator provided in the tpAUC package in a series of simulation studies and on two real datasets. RESULTS: Our estimator tended to be less biased than the original estimator based on the trimmed Mann-Whitney U-statistic across all experiments (and showed considerably less bias in the experiments based on small sample sizes). But, most importantly, because the computational complexity of the proposed estimator is of order [Formula: see text], rather than [Formula: see text], it is much faster to compute when sample sizes are large. CONCLUSIONS: The proposed estimator provides an improvement for the computation of two-way partial AUC, and allows the comparison of diagnostic tests/machine learning classifiers in large datasets where repeated computations of the original estimator on bootstrap samples become too expensive to compute.
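
The idea of obtaining a two-way partial AUC from an AUC and two one-way partial AUCs can be illustrated geometrically. The Python sketch below is not the authors' R code and does not use tpAUC or pROC; all function names are hypothetical. It integrates an empirical ROC curve with the trapezoidal rule and then checks a simple inclusion-exclusion identity between the areas. The identity is an assumption of this sketch, valid only when the ROC curve at the false-positive bound lies at or above the sensitivity bound, and it may differ in form from the estimator derived in the paper.

```python
def roc_points(scores, labels):
    """Empirical ROC curve as (fpr, tpr) points from (0, 0) to (1, 1).

    labels are 0/1; larger scores indicate the positive class.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points, tp, fp = [(0.0, 0.0)], 0, 0
    for i, (score, label) in enumerate(ranked):
        tp += label
        fp += 1 - label
        # Emit a point only after the last item of a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points


def two_way_pauc(points, max_fpr=1.0, min_tpr=0.0):
    """Area between the ROC curve and the line TPR = min_tpr, for FPR <= max_fpr.

    With the defaults this is the ordinary trapezoidal AUC.
    """
    area = 0.0
    for (f0, t0), (f1, t1) in zip(points, points[1:]):
        if f1 <= f0 or f0 >= max_fpr:
            continue  # vertical segment, or entirely right of the FPR bound
        if f1 > max_fpr:  # truncate the segment at the FPR bound
            t1 = t0 + (t1 - t0) * (max_fpr - f0) / (f1 - f0)
            f1 = max_fpr
        h0, h1 = t0 - min_tpr, t1 - min_tpr
        if h1 <= 0:
            continue  # segment lies on or below the TPR bound
        if h0 < 0:  # segment crosses the TPR bound: keep only the part above it
            f0 += (f1 - f0) * (-h0) / (h1 - h0)
            h0 = 0.0
        area += 0.5 * (h0 + h1) * (f1 - f0)
    return area


def two_way_pauc_from_parts(points, max_fpr, min_tpr):
    """Inclusion-exclusion on areas (assumes ROC(max_fpr) >= min_tpr)."""
    auc = two_way_pauc(points)
    pauc_fpr = two_way_pauc(points, max_fpr=max_fpr)  # FPR-restricted area
    pauc_tpr = two_way_pauc(points, min_tpr=min_tpr)  # TPR-restricted area
    return pauc_fpr + pauc_tpr - auc + min_tpr * (1.0 - max_fpr)


points = roc_points([0.9, 0.8, 0.7, 0.6, 0.4, 0.3], [1, 1, 0, 1, 0, 0])
direct = two_way_pauc(points, max_fpr=0.5, min_tpr=0.5)
from_parts = two_way_pauc_from_parts(points, max_fpr=0.5, min_tpr=0.5)
```

On this toy input the direct integration and the combination of parts agree, which is the property that lets a two-way partial AUC be assembled from quantities that standard ROC software already reports.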

Subjects

Area Under Curve, Humans, Computer Simulation

2.

Disentangling personalized treatment effects from "time-of-the-day" confounding in mobile health studies.

Chaibub Neto, Elias; Perumal, Thanneer M; Pratap, Abhishek; Tediarjo, Aryton; Bot, Brian M; Mangravite, Lara; Omberg, Larsson.

PLoS One ; 17(8): e0271766, 2022.

Article in English | MEDLINE | ID: mdl-35925980

ABSTRACT

Ideally, a patient's response to medication can be monitored by measuring changes in performance of some activity. In observational studies, however, any detected association between treatment ("on-medication" vs "off-medication") and the outcome (performance in the activity) might be due to confounders. In particular, causal inferences at the personalized level are especially vulnerable to confounding effects that arise in a cyclic fashion. For quick acting medications, effects can be confounded by circadian rhythms and daily routines. Using the time-of-the-day as a surrogate for these confounders and the performance measurements as captured on a smartphone, we propose a personalized statistical approach to disentangle putative treatment and "time-of-the-day" effects, that leverages conditional independence relations spanned by causal graphical models involving the treatment, time-of-the-day, and outcome variables. Our approach is based on conditional independence tests implemented via standard and temporal linear regression models. Using synthetic data, we investigate when and how residual autocorrelation can affect the standard tests, and how time series modeling (namely, ARIMA and robust regression via HAC covariance matrix estimators) can remedy these issues. In particular, our simulations illustrate that when patients perform their activities in a paired fashion, positive autocorrelation can lead to conservative results for the standard regression approach (i.e., lead to deflated true positive detection), whereas negative autocorrelation can lead to anticonservative behavior (i.e., lead to inflated false positive detection). The adoption of time series methods, on the other hand, leads to well controlled type I error rates. We illustrate the application of our methodology with data from a Parkinson's disease mobile health study.

Subjects

Precision Medicine, Telemedicine, Causality, Humans, Linear Models, Smartphone

3.

Remote smartphone monitoring of Parkinson's disease and individual response to therapy.

Omberg, Larsson; Chaibub Neto, Elias; Perumal, Thanneer M; Pratap, Abhishek; Tediarjo, Aryton; Adams, Jamie; Bloem, Bastiaan R; Bot, Brian M; Elson, Molly; Goldman, Samuel M; Kellen, Michael R; Kieburtz, Karl; Klein, Arno; Little, Max A; Schneider, Ruth; Suver, Christine; Tarolli, Christopher; Tanner, Caroline M; Trister, Andrew D; Wilbanks, John; Dorsey, E Ray; Mangravite, Lara M.

Nat Biotechnol ; 40(4): 480-487, 2022 04.

Article in English | MEDLINE | ID: mdl-34373643

ABSTRACT

Remote health assessments that gather real-world data (RWD) outside clinic settings require a clear understanding of appropriate methods for data collection, quality assessment, analysis and interpretation. Here we examine the performance and limitations of smartphones in collecting RWD in the remote mPower observational study of Parkinson's disease (PD). Within the first 6 months of study commencement, 960 participants had enrolled and performed at least five self-administered active PD symptom assessments (speeded tapping, gait/balance, phonation or memory). Task performance, especially speeded tapping, was predictive of self-reported PD status (area under the receiver operating characteristic curve (AUC) = 0.8) and correlated with in-clinic evaluation of disease severity (r = 0.71; P < 1.8 × 10⁻⁶) when compared with motor Movement Disorder Society-Unified Parkinson's Disease Rating Scale (MDS-UPDRS). Although remote assessment requires careful consideration for accurate interpretation of RWD, our results support the use of smartphones and wearables in objective and personalized disease assessments.

Subjects

Parkinson Disease, Smartphone, Gait, Humans, Movement, Parkinson Disease/diagnosis, Severity of Illness Index

4.

Crowdsourcing digital health measures to predict Parkinson's disease severity: the Parkinson's Disease Digital Biomarker DREAM Challenge.

Sieberts, Solveig K; Schaff, Jennifer; Duda, Marlena; Pataki, Bálint Ármin; Sun, Ming; Snyder, Phil; Daneault, Jean-Francois; Parisi, Federico; Costante, Gianluca; Rubin, Udi; Banda, Peter; Chae, Yooree; Chaibub Neto, Elias; Dorsey, E Ray; Aydin, Zafer; Chen, Aipeng; Elo, Laura L; Espino, Carlos; Glaab, Enrico; Goan, Ethan; Golabchi, Fatemeh Noushin; Görmez, Yasin; Jaakkola, Maria K; Jonnagaddala, Jitendra; Klén, Riku; Li, Dongmei; McDaniel, Christian; Perrin, Dimitri; Perumal, Thanneer M; Rad, Nastaran Mohammadian; Rainaldi, Erin; Sapienza, Stefano; Schwab, Patrick; Shokhirev, Nikolai; Venäläinen, Mikko S; Vergara-Diaz, Gloria; Zhang, Yuqian; Wang, Yuanjia; Guan, Yuanfang; Brunner, Daniela; Bonato, Paolo; Mangravite, Lara M; Omberg, Larsson.

NPJ Digit Med ; 4(1): 53, 2021 Mar 19.

Article in English | MEDLINE | ID: mdl-33742069

ABSTRACT

Consumer wearables and sensors are a rich source of data about patients' daily disease and symptom burden, particularly in the case of movement disorders like Parkinson's disease (PD). However, interpreting these complex data into so-called digital biomarkers requires complicated analytical approaches, and validating these biomarkers requires sufficient data and unbiased evaluation methods. Here we describe the use of crowdsourcing to specifically evaluate and benchmark features derived from accelerometer and gyroscope data in two different datasets to predict the presence of PD and severity of three PD symptoms: tremor, dyskinesia, and bradykinesia. Forty teams from around the world submitted features, and achieved drastically improved predictive performance for PD status (best AUROC = 0.87), as well as tremor- (best AUPR = 0.75), dyskinesia- (best AUPR = 0.48) and bradykinesia-severity (best AUPR = 0.95).

5.

Evaluation of Combined Artificial Intelligence and Radiologist Assessment to Interpret Screening Mammograms.

Schaffter, Thomas; Buist, Diana S M; Lee, Christoph I; Nikulin, Yaroslav; Ribli, Dezso; Guan, Yuanfang; Lotter, William; Jie, Zequn; Du, Hao; Wang, Sijia; Feng, Jiashi; Feng, Mengling; Kim, Hyo-Eun; Albiol, Francisco; Albiol, Alberto; Morrell, Stephen; Wojna, Zbigniew; Ahsen, Mehmet Eren; Asif, Umar; Jimeno Yepes, Antonio; Yohanandan, Shivanthan; Rabinovici-Cohen, Simona; Yi, Darvin; Hoff, Bruce; Yu, Thomas; Chaibub Neto, Elias; Rubin, Daniel L; Lindholm, Peter; Margolies, Laurie R; McBride, Russell Bailey; Rothstein, Joseph H; Sieh, Weiva; Ben-Ari, Rami; Harrer, Stefan; Trister, Andrew; Friend, Stephen; Norman, Thea; Sahiner, Berkman; Strand, Fredrik; Guinney, Justin; Stolovitzky, Gustavo; Mackey, Lester; Cahoon, Joyce; Shen, Li; Sohn, Jae Ho; Trivedi, Hari; Shen, Yiqiu; Buturovic, Ljubomir; Pereira, Jose Costa; Cardoso, Jaime S.

JAMA Netw Open ; 3(3): e200265, 2020 03 02.

Article in English | MEDLINE | ID: mdl-32119094

ABSTRACT

Importance: Mammography screening currently relies on subjective human interpretation. Artificial intelligence (AI) advances could be used to increase mammography screening accuracy by reducing missed cancers and false positives. Objective: To evaluate whether AI can overcome human mammography interpretation limitations with a rigorous, unbiased evaluation of machine learning algorithms. Design, Setting, and Participants: In this diagnostic accuracy study conducted between September 2016 and November 2017, an international, crowdsourced challenge was hosted to foster AI algorithm development focused on interpreting screening mammography. More than 1100 participants comprising 126 teams from 44 countries participated. Analysis began November 18, 2016. Main Outcomes and Measurements: Algorithms used images alone (challenge 1) or combined images, previous examinations (if available), and clinical and demographic risk factor data (challenge 2) and output a score that translated to cancer yes/no within 12 months. Algorithm accuracy for breast cancer detection was evaluated using area under the curve and algorithm specificity compared with radiologists' specificity with radiologists' sensitivity set at 85.9% (United States) and 83.9% (Sweden). An ensemble method aggregating top-performing AI algorithms and radiologists' recall assessment was developed and evaluated. Results: Overall, 144,231 screening mammograms from 85,580 US women (952 cancer positive ≤12 months from screening) were used for algorithm training and validation. A second independent validation cohort included 166,578 examinations from 68,008 Swedish women (780 cancer positive). The top-performing algorithm achieved an area under the curve of 0.858 (United States) and 0.903 (Sweden) and 66.2% (United States) and 81.2% (Sweden) specificity at the radiologists' sensitivity, lower than community-practice radiologists' specificity of 90.5% (United States) and 98.5% (Sweden). Combining top-performing algorithms and US radiologist assessments resulted in a higher area under the curve of 0.942 and achieved a significantly improved specificity (92.0%) at the same sensitivity. Conclusions and Relevance: While no single AI algorithm outperformed radiologists, an ensemble of AI algorithms combined with radiologist assessment in a single-reader screening environment improved overall accuracy. This study underscores the potential of using machine learning methods for enhancing mammography screening interpretation.

Subjects

Breast Neoplasms/diagnostic imaging, Deep Learning, Computer-Assisted Image Interpretation/methods, Mammography/methods, Radiologists, Adult, Aged, Algorithms, Artificial Intelligence, Early Detection of Cancer, Female, Humans, Middle Aged, Radiology, Sensitivity and Specificity, Sweden, United States

6.

Multiple Myeloma DREAM Challenge reveals epigenetic regulator PHF19 as marker of aggressive disease.

Mason, Mike J; Schinke, Carolina; Eng, Christine L P; Towfic, Fadi; Gruber, Fred; Dervan, Andrew; White, Brian S; Pratapa, Aditya; Guan, Yuanfang; Chen, Hongjie; Cui, Yi; Li, Bailiang; Yu, Thomas; Chaibub Neto, Elias; Mavrommatis, Konstantinos; Ortiz, Maria; Lyzogubov, Valeriy; Bisht, Kamlesh; Dai, Hongyue Y; Schmitz, Frank; Flynt, Erin; Danziger, Samuel A; Ratushny, Alexander; Dalton, William S; Goldschmidt, Hartmut; Avet-Loiseau, Herve; Samur, Mehmet; Hayete, Boris; Sonneveld, Pieter; Shain, Kenneth H; Munshi, Nikhil; Auclair, Daniel; Hose, Dirk; Morgan, Gareth; Trotter, Matthew; Bassett, Douglas; Goke, Jonathan; Walker, Brian A; Thakurta, Anjan; Guinney, Justin.

Leukemia ; 34(7): 1866-1874, 2020 07.

Article in English | MEDLINE | ID: mdl-32060406

ABSTRACT

While the past decade has seen meaningful improvements in clinical outcomes for multiple myeloma patients, a subset of patients does not benefit from current therapeutics for unclear reasons. Many gene expression-based models of risk have been developed, but each model uses a different combination of genes and often involves assaying many genes, making them difficult to implement. We organized the Multiple Myeloma DREAM Challenge, a crowdsourced effort to develop models of rapid progression in newly diagnosed myeloma patients and to benchmark these against previously published models. This effort led to more robust predictors and found that incorporating specific demographic and clinical features improved gene expression-based models of high risk. Furthermore, post-challenge analysis identified a novel expression-based risk marker, PHF19, which has recently been found to have an important biological role in multiple myeloma. Lastly, we show that a simple four-feature predictor composed of age, ISS, and expression of PHF19 and MMSET performs similarly to more complex models with many more gene expression features included.

Subjects

Tumor Biomarkers/metabolism, Clinical Trials as Topic/statistics & numerical data, DNA-Binding Proteins/metabolism, Genetic Epigenesis, Neoplastic Gene Expression Regulation, Statistical Models, Multiple Myeloma/pathology, Transcription Factors/metabolism, Tumor Biomarkers/genetics, Cell Cycle, Cell Proliferation, DNA-Binding Proteins/genetics, Factual Databases, Datasets as Topic, Humans, Multiple Myeloma/genetics, Multiple Myeloma/metabolism, Transcription Factors/genetics, Cultured Tumor Cells

7.

Data Science Approaches for Effective Use of Mobile Device-Based Collection of Real-World Data.

Omberg, Larsson; Chaibub Neto, Elias; Mangravite, Lara M.

Clin Pharmacol Ther ; 107(4): 719-721, 2020 04.

Article in English | MEDLINE | ID: mdl-32036612

Subjects

Cell Phone, Data Collection/methods, Data Science/methods, Wearable Electronic Devices, Humans, Machine Learning, Postmarketing Product Surveillance/methods

8.

Detecting the impact of subject characteristics on machine learning-based diagnostic applications.

Chaibub Neto, Elias; Pratap, Abhishek; Perumal, Thanneer M; Tummalacherla, Meghasyam; Snyder, Phil; Bot, Brian M; Trister, Andrew D; Friend, Stephen H; Mangravite, Lara; Omberg, Larsson.

NPJ Digit Med ; 2: 99, 2019.

Article in English | MEDLINE | ID: mdl-31633058

ABSTRACT

Collection of high-dimensional, longitudinal digital health data has the potential to support a wide variety of research and clinical applications including diagnostics and longitudinal health tracking. Algorithms that process these data and inform digital diagnostics are typically developed using training and test sets generated from multiple repeated measures collected across a set of individuals. However, the inclusion of repeated measurements is not always appropriately taken into account in the analytical evaluations of predictive performance. The assignment of repeated measurements from each individual to both the training and the test sets ("record-wise" data split) is a common practice and can lead to massive underestimation of the prediction error due to the presence of "identity confounding." In essence, these models learn to identify subjects, in addition to diagnostic signal. Here, we present a method that can be used to effectively calculate the amount of identity confounding learned by classifiers developed using a record-wise data split. By applying this method to several real datasets, we demonstrate that identity confounding is a serious issue in digital health studies and that record-wise data splits for machine learning-based applications need to be avoided.
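
The contrast between a record-wise and a subject-wise split can be made concrete with a small sketch. The Python code below is illustrative only: the record layout, the `subject_id` field, and the function names are assumptions, and it does not implement the paper's method for quantifying identity confounding. It only shows how a subject-wise split keeps every individual on one side of the train/test boundary, while a record-wise split generally does not.

```python
import random


def subject_wise_split(records, test_fraction=0.3, seed=0):
    """Split so that each subject appears in exactly one of the two sets."""
    subjects = sorted({r["subject_id"] for r in records})
    random.Random(seed).shuffle(subjects)
    n_test = max(1, round(test_fraction * len(subjects)))
    test_subjects = set(subjects[:n_test])
    train = [r for r in records if r["subject_id"] not in test_subjects]
    test = [r for r in records if r["subject_id"] in test_subjects]
    return train, test


def record_wise_split(records, test_fraction=0.3, seed=0):
    """Split individual records at random, ignoring who produced them."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(test_fraction * len(shuffled)))
    return shuffled[n_test:], shuffled[:n_test]


def shared_subjects(train, test):
    """Subjects whose records appear in both the training and the test set."""
    return {r["subject_id"] for r in train} & {r["subject_id"] for r in test}


# Hypothetical data: 10 subjects with 20 repeated measurements each.
records = [{"subject_id": s, "record": k} for s in range(10) for k in range(20)]

sw_train, sw_test = subject_wise_split(records)
rw_train, rw_test = record_wise_split(records)
overlap_subject_wise = shared_subjects(sw_train, sw_test)  # empty by construction
overlap_record_wise = shared_subjects(rw_train, rw_test)   # typically non-empty
```

Any subject present in both sets gives a classifier the chance to score well by recognizing the individual rather than the diagnostic signal, which is the leakage the abstract describes.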

9.

Mindboggling morphometry of human brains.

Klein, Arno; Ghosh, Satrajit S; Bao, Forrest S; Giard, Joachim; Häme, Yrjö; Stavsky, Eliezer; Lee, Noah; Rossa, Brian; Reuter, Martin; Chaibub Neto, Elias; Keshavan, Anisha.

PLoS Comput Biol ; 13(2): e1005350, 2017 02.

Article in English | MEDLINE | ID: mdl-28231282

ABSTRACT

Mindboggle (http://mindboggle.info) is an open source brain morphometry platform that takes in preprocessed T1-weighted MRI data and outputs volume, surface, and tabular data containing label, feature, and shape information for further analysis. In this article, we document the software and demonstrate its use in studies of shape variation in healthy and diseased humans. The number of different shape measures and the size of the populations make this the largest and most detailed shape analysis of human brains ever conducted. Brain image morphometry shows great potential for providing much-needed biological markers for diagnosing, tracking, and predicting progression of mental health disorders. Very few software algorithms provide more than measures of volume and cortical thickness, while more subtle shape measures may provide more sensitive and specific biomarkers. Mindboggle computes a variety of (primarily surface-based) shapes: area, volume, thickness, curvature, depth, Laplace-Beltrami spectra, Zernike moments, etc. We evaluate Mindboggle's algorithms using the largest set of manually labeled, publicly available brain images in the world and compare them against state-of-the-art algorithms where they exist. All data, code, and results of these evaluations are publicly available.

Subjects

Algorithms, Anatomic Landmarks/diagnostic imaging, Brain Diseases/diagnostic imaging, Brain Diseases/pathology, Brain/pathology, Diffusion Magnetic Resonance Imaging, Female, Humans, Computer-Assisted Image Interpretation, Three-Dimensional Imaging, Male, Organ Size, Automated Pattern Recognition, Reproducibility of Results, Sensitivity and Specificity, Software, Subtraction Technique

10.

Using instrumental variables to disentangle treatment and placebo effects in blinded and unblinded randomized clinical trials influenced by unmeasured confounders.

Chaibub Neto, Elias.

Sci Rep ; 6: 37154, 2016 11 21.

Article in English | MEDLINE | ID: mdl-27869205

ABSTRACT

Clinical trials traditionally employ blinding as a design mechanism to reduce the influence of placebo effects. In practice, however, it can be difficult or impossible to blind study participants and unblinded trials are common in medical research. Here we show how instrumental variables can be used to quantify and disentangle treatment and placebo effects in randomized clinical trials comparing control and active treatments in the presence of confounders. The key idea is to use randomization to separately manipulate treatment assignment and psychological encouragement conversations/interactions that increase the participants' desire for improved symptoms. The proposed approach is able to improve the estimation of treatment effects in blinded studies and, most importantly, opens the doors to account for placebo effects in unblinded trials.
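
For readers unfamiliar with instrumental variables, the basic mechanism can be sketched with the standard ratio (Wald) estimator, cov(Z, Y) / cov(Z, X), on simulated data. This Python sketch is a generic single-instrument illustration under assumed data-generating equations; it does not reproduce the paper's design, which separately randomizes treatment assignment and encouragement in order to separate treatment from placebo effects. All names and parameter values are hypothetical.

```python
import random


def covariance(a, b):
    """Sample covariance of two equally long sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def iv_estimate(instrument, exposure, outcome):
    """Ratio (Wald) instrumental-variable estimate: cov(Z, Y) / cov(Z, X)."""
    return covariance(instrument, outcome) / covariance(instrument, exposure)


def simulate(n=20000, true_effect=2.0, seed=0):
    """Assumed model: a randomized Z shifts X; an unmeasured U affects X and Y."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.randint(0, 1)          # randomized assignment (the instrument)
        ui = rng.gauss(0, 1)            # unmeasured confounder
        xi = zi + ui + rng.gauss(0, 1)  # exposure actually received
        yi = true_effect * xi + ui + rng.gauss(0, 1)
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y


z, x, y = simulate()
naive_slope = covariance(x, y) / covariance(x, x)  # biased upward by U
iv_slope = iv_estimate(z, x, y)                    # close to the true effect 2.0
```

Because the instrument is randomized, it is independent of the unmeasured confounder, so the ratio recovers the effect of the exposure even though the naive regression slope does not.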

Subjects

Placebo Effect, Randomized Controlled Trials as Topic, Research Design, Statistical Data Interpretation, Double-Blind Method, Humans

11.

Crowdsourced estimation of cognitive decline and resilience in Alzheimer's disease.

Allen, Genevera I; Amoroso, Nicola; Anghel, Catalina; Balagurusamy, Venkat; Bare, Christopher J; Beaton, Derek; Bellotti, Roberto; Bennett, David A; Boehme, Kevin L; Boutros, Paul C; Caberlotto, Laura; Caloian, Cristian; Campbell, Frederick; Chaibub Neto, Elias; Chang, Yu-Chuan; Chen, Beibei; Chen, Chien-Yu; Chien, Ting-Ying; Clark, Tim; Das, Sudeshna; Davatzikos, Christos; Deng, Jieyao; Dillenberger, Donna; Dobson, Richard J B; Dong, Qilin; Doshi, Jimit; Duma, Denise; Errico, Rosangela; Erus, Guray; Everett, Evan; Fardo, David W; Friend, Stephen H; Fröhlich, Holger; Gan, Jessica; St George-Hyslop, Peter; Ghosh, Satrajit S; Glaab, Enrico; Green, Robert C; Guan, Yuanfang; Hong, Ming-Yi; Huang, Chao; Hwang, Jinseub; Ibrahim, Joseph; Inglese, Paolo; Iyappan, Anandhi; Jiang, Qijia; Katsumata, Yuriko; Kauwe, John S K; Klein, Arno; Kong, Dehan.

Alzheimers Dement ; 12(6): 645-53, 2016 06.

Article in English | MEDLINE | ID: mdl-27079753

ABSTRACT

Identifying accurate biomarkers of cognitive decline is essential for advancing early diagnosis and prevention therapies in Alzheimer's disease. The Alzheimer's disease DREAM Challenge was designed as a computational crowdsourced project to benchmark the current state-of-the-art in predicting cognitive outcomes in Alzheimer's disease based on high dimensional, publicly available genetic and structural imaging data. This meta-analysis failed to identify a meaningful predictor developed from either data modality, suggesting that alternate approaches should be considered for prediction of cognitive performance.

Subjects

Alzheimer Disease/complications, Cognition Disorders/diagnosis, Cognition Disorders/etiology, Alzheimer Disease/genetics, Apolipoproteins E/genetics, Biomarkers, Cognition Disorders/genetics, Computational Biology, Bibliographic Databases/statistics & numerical data, Humans, Predictive Value of Tests

12.

PERSONALIZED HYPOTHESIS TESTS FOR DETECTING MEDICATION RESPONSE IN PARKINSON DISEASE PATIENTS USING iPHONE SENSOR DATA.

Chaibub Neto, Elias; Bot, Brian M; Perumal, Thanneer; Omberg, Larsson; Guinney, Justin; Kellen, Mike; Klein, Arno; Friend, Stephen H; Trister, Andrew D.

Pac Symp Biocomput ; 21: 273-84, 2016.

Article in English | MEDLINE | ID: mdl-26776193

ABSTRACT

We propose hypothesis tests for detecting dopaminergic medication response in Parkinson disease patients, using longitudinal sensor data collected by smartphones. The processed data is composed of multiple features extracted from active tapping tasks performed by the participant on a daily basis, before and after medication, over several months. Each extracted feature corresponds to a time series of measurements annotated according to whether the measurement was taken before or after the patient has taken his/her medication. Even though the data is longitudinal in nature, we show that simple hypothesis tests for detecting medication response, which ignore the serial correlation structure of the data, are still statistically valid, showing type I error rates at the nominal level. We propose two distinct personalized testing approaches. In the first, we combine multiple feature-specific tests into a single union-intersection test. In the second, we construct personalized classifiers of the before/after medication labels using all the extracted features of a given participant, and test the null hypothesis that the area under the receiver operating characteristic curve of the classifier is equal to 1/2. We compare the statistical power of the personalized classifier tests and personalized union-intersection tests in a simulation study, and illustrate the performance of the proposed tests using data from the mPower Parkinson's disease study, recently launched as part of Apple's ResearchKit mobile platform. Our results suggest that the personalized tests, which ignore the longitudinal aspect of the data, can perform well in real data analyses, suggesting they might be used as a sound baseline approach, to which more sophisticated methods can be compared.
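
One way to test the null hypothesis that a classifier's AUC equals 1/2 is a label-permutation test. The abstract does not say which test the authors use, so the Python sketch below is an assumed stand-in rather than their procedure. It treats the before/after medication labels as exchangeable under the null, and the function names and data are hypothetical.

```python
import random


def auc(scores, labels):
    """Mann-Whitney estimate of the AUC; ties between classes count one half."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def auc_permutation_pvalue(scores, labels, n_perm=2000, seed=0):
    """Two-sided permutation p-value for H0: AUC = 1/2."""
    rng = random.Random(seed)
    observed = abs(auc(scores, labels) - 0.5)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between scores and labels
        if abs(auc(scores, shuffled) - 0.5) >= observed:
            hits += 1
    # Add one to numerator and denominator so the p-value is never exactly 0.
    return (hits + 1) / (n_perm + 1)


# Hypothetical classifier scores for one participant:
# label 0 = recorded before medication, label 1 = recorded after medication.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 11, 12, 13, 14, 15, 16, 17, 18]
labels = [0] * 8 + [1] * 8
p_value = auc_permutation_pvalue(scores, labels)
```

In practice the scores would come from a classifier evaluated on held-out measurements of the same participant; here they are hard-coded so that the two groups separate perfectly and the p-value is small.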

Subjects

Drug Monitoring/methods, Parkinson Disease/drug therapy, Precision Medicine/methods, Remote Sensing Technology/methods, Algorithms, Cell Phone, Computational Biology/methods, Computer Simulation, Statistical Data Interpretation, Dopamine Agents/therapeutic use, Drug Monitoring/statistics & numerical data, Humans, Statistical Models, Precision Medicine/statistics & numerical data, Remote Sensing Technology/statistics & numerical data

13.

Identifying robust communities and multi-community nodes by combining top-down and bottom-up approaches to clustering.

Gaiteri, Chris; Chen, Mingming; Szymanski, Boleslaw; Kuzmin, Konstantin; Xie, Jierui; Lee, Changkyu; Blanche, Timothy; Chaibub Neto, Elias; Huang, Su-Chun; Grabowski, Thomas; Madhyastha, Tara; Komashko, Vitalina.

Sci Rep ; 5: 16361, 2015 Nov 09.

Article in English | MEDLINE | ID: mdl-26549511

ABSTRACT

Biological functions are carried out by groups of interacting molecules, cells or tissues, known as communities. Membership in these communities may overlap when biological components are involved in multiple functions. However, traditional clustering methods detect non-overlapping communities. These detected communities may also be unstable and difficult to replicate, because traditional methods are sensitive to noise and parameter settings. These aspects of traditional clustering methods limit our ability to detect biological communities, and therefore our ability to understand biological functions. To address these limitations and detect robust overlapping biological communities, we propose an unorthodox clustering method called SpeakEasy which identifies communities using top-down and bottom-up approaches simultaneously. Specifically, nodes join communities based on their local connections, as well as global information about the network structure. This method can quantify the stability of each community, automatically identify the number of communities, and quickly cluster networks with hundreds of thousands of nodes. SpeakEasy shows top performance on synthetic clustering benchmarks and accurately identifies meaningful biological communities in a range of datasets, including: gene microarrays, protein interactions, sorted cell populations, electrophysiology and fMRI brain imaging.

Subjects

Cluster Analysis, Theoretical Models, Algorithms

14.

Speeding Up Non-Parametric Bootstrap Computations for Statistics Based on Sample Moments in Small/Moderate Sample Size Applications.

Chaibub Neto, Elias.

PLoS One ; 10(6): e0131333, 2015.

Article in English | MEDLINE | ID: mdl-26125965

ABSTRACT

In this paper we propose a vectorized implementation of the non-parametric bootstrap for statistics based on sample moments. Basically, we adopt the multinomial sampling formulation of the non-parametric bootstrap, and compute bootstrap replications of sample moment statistics by simply weighting the observed data according to multinomial counts instead of evaluating the statistic on a resampled version of the observed data. Using this formulation we can generate a matrix of bootstrap weights and compute the entire vector of bootstrap replications with a few matrix multiplications. Vectorization is particularly important for matrix-oriented programming languages such as R, where matrix/vector calculations tend to be faster than scalar operations implemented in a loop. We illustrate the application of the vectorized implementation in real and simulated data sets, when bootstrapping Pearson's sample correlation coefficient, and compare its performance against two state-of-the-art R implementations of the non-parametric bootstrap, as well as a straightforward one based on a for loop. Our investigations spanned varying sample sizes and number of bootstrap replications. The vectorized bootstrap compared favorably against the state-of-the-art implementations in all cases tested, and was remarkably/considerably faster for small/moderate sample sizes. The same results were observed in the comparison with the straightforward implementation, except for large sample sizes, where the vectorized bootstrap was slightly slower than the straightforward implementation due to increased time expenditures in the generation of weight matrices via multinomial sampling.
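
The multinomial-weight formulation described above can be sketched in a few lines. The paper's implementation is in R; the version below is a NumPy translation written for illustration, with hypothetical function names, and it is not the authors' code. Each row of the weight matrix holds multinomial counts divided by n, and every weighted moment needed for Pearson's correlation is obtained by one matrix-vector product.

```python
import numpy as np


def weighted_correlations(x, y, weights):
    """Pearson correlation for each row of `weights` (each row sums to 1)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mean_x, mean_y = weights @ x, weights @ y
    var_x = weights @ (x * x) - mean_x**2
    var_y = weights @ (y * y) - mean_y**2
    cov_xy = weights @ (x * y) - mean_x * mean_y
    return cov_xy / np.sqrt(var_x * var_y)


def bootstrap_correlations(x, y, n_boot=1000, seed=0):
    """All bootstrap replications at once, without resampling the data."""
    n = len(x)
    rng = np.random.default_rng(seed)
    # Row b holds how many times each observation is drawn in replication b.
    counts = rng.multinomial(n, np.full(n, 1.0 / n), size=n_boot)
    return weighted_correlations(x, y, counts / n)


# Simulated example data (assumed, for illustration only).
rng = np.random.default_rng(1)
x = rng.normal(size=50)
y = x + rng.normal(size=50)
replications = bootstrap_correlations(x, y)
standard_error = replications.std(ddof=1)
```

With uniform weights of 1/n in a single row, `weighted_correlations` reduces to the ordinary sample correlation, which is a convenient sanity check on the weighting formulas.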

Subjects

Statistical Models, Nonparametric Statistics, Algorithms, Sample Size

15.

Bayesian network reconstruction using systems genetics data: comparison of MCMC methods.

Tasaki, Shinya; Sauerwine, Ben; Hoff, Bruce; Toyoshiba, Hiroyoshi; Gaiteri, Chris; Chaibub Neto, Elias.

Genetics ; 199(4): 973-89, 2015 Apr.

Article in English | MEDLINE | ID: mdl-25631319

ABSTRACT

Reconstructing biological networks using high-throughput technologies has the potential to produce condition-specific interactomes. But are these reconstructed networks a reliable source of biological interactions? Do some network inference methods offer dramatically improved performance on certain types of networks? To facilitate the use of network inference methods in systems biology, we report a large-scale simulation study comparing the ability of Markov chain Monte Carlo (MCMC) samplers to reverse engineer Bayesian networks. The MCMC samplers we investigated included foundational and state-of-the-art Metropolis-Hastings and Gibbs sampling approaches, as well as novel samplers we have designed. To enable a comprehensive comparison, we simulated gene expression and genetics data from known network structures under a range of biologically plausible scenarios. We examine the overall quality of network inference via different methods, as well as how their performance is affected by network characteristics. Our simulations reveal that network size, edge density, and strength of gene-to-gene signaling are major parameters that differentiate the performance of various samplers. Specifically, more recent samplers including our novel methods outperform traditional samplers for highly interconnected large networks with strong gene-to-gene signaling. Our newly developed samplers show comparable or superior performance to the top existing methods. Moreover, this performance gain is strongest in networks with biologically oriented topology, which indicates that our novel samplers are suitable for inferring biological networks. The performance of MCMC samplers in this simulation framework can guide the choice of methods for network reconstruction using systems genetics data.

Subjects

Algorithms, Gene Regulatory Networks, Genetic Models, Bayes Theorem, Markov Chains, Monte Carlo Method

16.

DREAMTools: a Python package for scoring collaborative challenges.

Cokelaer, Thomas; Bansal, Mukesh; Bare, Christopher; Bilal, Erhan; Bot, Brian M; Chaibub Neto, Elias; Eduati, Federica; de la Fuente, Alberto; Gönen, Mehmet; Hill, Steven M; Hoff, Bruce; Karr, Jonathan R; Küffner, Robert; Menden, Michael P; Meyer, Pablo; Norel, Raquel; Pratap, Abhishek; Prill, Robert J; Weirauch, Matthew T; Costello, James C; Stolovitzky, Gustavo; Saez-Rodriguez, Julio.

F1000Res ; 4: 1030, 2015.

Article in English | MEDLINE | ID: mdl-27134723

ABSTRACT

DREAM challenges are community competitions designed to advance computational methods and address fundamental questions in systems biology and translational medicine. Each challenge asks participants to develop and apply computational methods to either predict unobserved outcomes or to identify unknown model parameters given a set of training data. Computational methods are evaluated using an automated scoring metric, scores are posted to a public leaderboard, and methods are published to facilitate community discussions on how to build improved methods. By engaging participants from a wide range of science and engineering backgrounds, DREAM challenges can comparatively evaluate a wide range of statistical, machine learning, and biophysical methods. Here, we describe DREAMTools, a Python package for evaluating DREAM challenge scoring metrics. DREAMTools provides a command line interface that enables researchers to test new methods on past challenges, as well as a framework for scoring new challenges. As of March 2016, DREAMTools includes more than 80% of completed DREAM challenges. DREAMTools complements the data, metadata, and software tools available at the DREAM website http://dreamchallenges.org and on the Synapse platform at https://www.synapse.org. AVAILABILITY: DREAMTools is a Python package. Releases and documentation are available at http://pypi.python.org/pypi/dreamtools. The source code is available at http://github.com/dreamtools/dreamtools.

17.

Simulation studies as designed experiments: the comparison of penalized regression models in the "large p, small n" setting.

Chaibub Neto, Elias; Bare, J Christopher; Margolin, Adam A.

PLoS One ; 9(10): e107957, 2014.

Article in English | MEDLINE | ID: mdl-25289666

ABSTRACT

New algorithms are continuously proposed in computational biology. Performance evaluation of novel methods is important in practice. Nonetheless, the field experiences a lack of rigorous methodology aimed to systematically and objectively evaluate competing approaches. Simulation studies are frequently used to show that a particular method outperforms another. Oftentimes, however, simulation studies are not well designed, and it is hard to characterize the particular conditions under which different methods perform better. In this paper we propose the adoption of well-established techniques in the design of computer and physical experiments for developing effective simulation studies. By following best practices in planning of experiments we are better able to understand the strengths and weaknesses of competing algorithms, leading to more informed decisions about which method to use for a particular task. We illustrate the application of our proposed simulation framework with a detailed comparison of the ridge-regression, lasso and elastic-net algorithms in a large-scale study investigating the effects on predictive performance of sample size, number of features, true model sparsity, signal-to-noise ratio, and feature correlation, in situations where the number of covariates is usually much larger than sample size. Analysis of data sets containing tens of thousands of features but only a few hundred samples is nowadays routine in computational biology, where "omics" features such as gene expression, copy number variation and sequence data are frequently used in the predictive modeling of complex phenotypes such as anticancer drug response. The penalized regression approaches investigated in this study are popular choices in this setting and our simulations corroborate well-established results concerning the conditions under which each one of these methods is expected to perform best while providing several novel insights.
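To make the idea of treating a simulation study as a designed experiment concrete, here is a minimal sketch that lays out the five factors named in the abstract as a replicated, randomized full factorial design. The abstract does not say which design the authors used, so the full factorial layout is only one classical option chosen for illustration. The factor levels in `FACTORS` and the helper names `full_factorial` and `randomized_runs` are likewise hypothetical.

```python
import itertools
import random

# Hypothetical levels: the abstract names these factors but gives no values.
FACTORS = {
    "n_samples": [100, 300],
    "n_features": [1000, 10000],
    "sparsity": [0.01, 0.1],          # fraction of features with a nonzero effect
    "signal_to_noise": [0.5, 2.0],
    "feature_correlation": [0.0, 0.5],
}


def full_factorial(factors):
    """Return one dict per combination of factor levels."""
    names = list(factors)
    level_lists = (factors[name] for name in names)
    return [dict(zip(names, levels)) for levels in itertools.product(*level_lists)]


def randomized_runs(design, replicates, seed=0):
    """Replicate every design point and shuffle the run order."""
    rng = random.Random(seed)
    runs = [dict(point, replicate=r) for point in design for r in range(replicates)]
    rng.shuffle(runs)
    return runs


design = full_factorial(FACTORS)
runs = randomized_runs(design, replicates=3)
print(len(design), len(runs))  # 32 design points, 96 runs
```

Each entry in `runs` would then parameterize one simulated data set on which the competing methods are fitted and scored. With many factors or levels, a space-filling or fractional design keeps the number of runs manageable.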

Subjects

Computer Simulation; Regression, Psychology; Research Design; Humans

18.

A combination of hand-held models and computer imaging programs helps students answer oral questions about molecular structure and function: a controlled investigation of student learning.

Harris, Michelle A; Peck, Ronald F; Colton, Shannon; Morris, Jennifer; Chaibub Neto, Elias; Kallio, Julie.

CBE Life Sci Educ ; 8(1): 29-43, 2009.

Article in English | MEDLINE | ID: mdl-19255134

ABSTRACT

We conducted a controlled investigation to examine whether a combination of computer imagery and tactile tools helps introductory cell biology laboratory undergraduate students better learn about protein structure/function relationships as compared with computer imagery alone. In all five laboratory sections, students used the molecular imaging program, Protein Explorer (PE). In the three experimental sections, three-dimensional physical models were made available to the students, in addition to PE. Student learning was assessed via oral and written research summaries and videotaped interviews. Differences between the experimental and control group students were not found in our typical course assessments such as research papers, but rather were revealed during one-on-one interviews with students at the end of the semester. A subset of students in the experimental group produced superior answers to some higher-order interview questions as compared with students in the control group. During the interview, students in both groups preferred to use either the hand-held models alone or in combination with the PE imaging program. Students typically did not use any tools when answering knowledge (lower-level thinking) questions, but when challenged with higher-level thinking questions, students in both the control and experimental groups elected to use the models.

Subjects

Computer-Assisted Instruction/methods; Models, Molecular; Molecular Biology/education; Software; Databases, Factual; Female; Humans; Internet; Learning; Male; Program Development; Students

19.

Inferring causal phenotype networks from segregating populations.

Chaibub Neto, Elias; Ferrara, Christine T; Attie, Alan D; Yandell, Brian S.

Genetics ; 179(2): 1089-100, 2008 Jun.

Article in English | MEDLINE | ID: mdl-18505877

ABSTRACT

A major goal in the study of complex traits is to decipher the causal interrelationships among correlated phenotypes. Current methods mostly yield undirected networks that connect phenotypes without causal orientation. Some of these connections may be spurious due to partial correlation that is not causal. We show how to build causal direction into an undirected network of phenotypes by including causal QTL for each phenotype. We evaluate causal direction for each edge connecting two phenotypes, using a LOD score. This new approach can be applied to many different population structures, including inbred and outbred crosses as well as natural populations, and can accommodate feedback loops. We assess its performance in simulation studies and show that our method recovers network edges and infers causal direction correctly at a high rate. Finally, we illustrate our method with an example involving gene expression and metabolite traits from experimental crosses.
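A LOD score is, in general, the base-10 logarithm of a likelihood ratio between two models. The sketch below shows that generic definition applied to two competing causal directions for one edge. The abstract does not give the exact models or likelihoods used, so the function names and the example log-likelihood values are hypothetical.

```python
import math


def lod_score(loglik_model_1, loglik_model_2):
    """Base-10 log likelihood ratio, computed from natural-log likelihoods."""
    return (loglik_model_1 - loglik_model_2) / math.log(10)


def preferred_direction(loglik_a_to_b, loglik_b_to_a):
    """Hypothetical helper: the sign of the LOD picks the better-supported direction."""
    lod = lod_score(loglik_a_to_b, loglik_b_to_a)
    if lod > 0:
        return "A -> B", lod
    if lod < 0:
        return "B -> A", lod
    return "undetermined", lod


# Placeholder log-likelihoods for the two directed models of one edge.
direction, lod = preferred_direction(-120.3, -127.2)
print(direction, round(lod, 2))
```

In practice a threshold on the absolute LOD, rather than its sign alone, would usually decide whether a direction is called at all.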

Subjects

Genetics, Population/statistics & numerical data; Phenotype; Quantitative Trait Loci; Algorithms; Animals; Biometry; Crosses, Genetic; Female; Gene Expression; Inbreeding; Lod Score; Male; Metabolism; Mice; Mice, Inbred C57BL; Mice, Obese; Models, Genetic

20.

A gene expression network model of type 2 diabetes links cell cycle regulation in islets with diabetes susceptibility.

Keller, Mark P; Choi, YounJeong; Wang, Ping; Davis, Dawn Belt; Rabaglia, Mary E; Oler, Angie T; Stapleton, Donald S; Argmann, Carmen; Schueler, Kathy L; Edwards, Steve; Steinberg, H Adam; Chaibub Neto, Elias; Kleinhanz, Robert; Turner, Scott; Hellerstein, Marc K; Schadt, Eric E; Yandell, Brian S; Kendziorski, Christina; Attie, Alan D.

Genome Res ; 18(5): 706-16, 2008 May.

Article in English | MEDLINE | ID: mdl-18347327

ABSTRACT

Insulin resistance is necessary but not sufficient for the development of type 2 diabetes. Diabetes results when pancreatic beta-cells fail to compensate for insulin resistance by increasing insulin production through an expansion of beta-cell mass or increased insulin secretion. Communication between insulin target tissues and beta-cells may initiate this compensatory response. Correlated changes in gene expression between tissues can provide evidence for such intercellular communication. We profiled gene expression in six tissues of mice from an obesity-induced diabetes-resistant and a diabetes-susceptible strain before and after the onset of diabetes. We studied the correlation structure of mRNA abundance and identified 105 co-expression gene modules. We provide an interactive gene network model showing the correlation structure between the expression modules within and among the six tissues. This resource also provides a searchable database of gene expression profiles for all genes in six tissues in lean and obese diabetes-resistant and diabetes-susceptible mice, at 4 and 10 wk of age. A cell cycle regulatory module in islets predicts diabetes susceptibility. The module predicts islet replication; we found a strong correlation between ²H₂O incorporation into islet DNA in vivo and the expression pattern of the cell cycle module. This pattern is highly correlated with that of several individual genes in insulin target tissues, including Igf2, which has been shown to promote beta-cell proliferation, suggesting that these genes may provide a link between insulin resistance and beta-cell proliferation.

Subjects

Cell Cycle; Diabetes Mellitus, Type 2/genetics; Diabetes Mellitus, Type 2/pathology; Gene Expression Regulation; Genetic Predisposition to Disease; Islets of Langerhans/pathology; Adipose Tissue/cytology; Aging; Animals; Cell Proliferation; Diabetes Mellitus, Type 2/metabolism; Glucose/metabolism; Insulin/metabolism; Insulin-Secreting Cells/cytology; Insulin-Secreting Cells/pathology; Male; Mice; Models, Genetic; Obesity/pathology; RNA, Messenger/genetics; RNA, Messenger/metabolism; Transcription, Genetic
