Search | Secretaria de Estado da Saúde

Distributionally robust learning-to-rank under the Wasserstein metric.

Sotudian, Shahabeddin; Chen, Ruidi; Paschalidis, Ioannis Ch.

PLoS One ; 18(3): e0283574, 2023.

Article in English | MEDLINE | ID: mdl-36996130

ABSTRACT

Despite their satisfactory performance, most existing listwise Learning-To-Rank (LTR) models do not consider the crucial issue of robustness. A data set can be contaminated in various ways, including human error in labeling or annotation, distributional data shift, and malicious adversaries who wish to degrade the algorithm's performance. It has been shown that Distributionally Robust Optimization (DRO) is resilient against various types of noise and perturbations. To fill this gap, we introduce a new listwise LTR model called Distributionally Robust Multi-output Regression Ranking (DRMRR). Different from existing methods, the scoring function of DRMRR was designed as a multivariate mapping from a feature vector to a vector of deviation scores, which captures local context information and cross-document interactions. In this way, we are able to incorporate the LTR metrics into our model. DRMRR uses a Wasserstein DRO framework to minimize a multi-output loss function under the most adverse distributions in the neighborhood of the empirical data distribution defined by a Wasserstein ball. We present a compact and computationally solvable reformulation of the min-max formulation of DRMRR. Our experiments were conducted on two real-world applications: medical document retrieval and drug response prediction, showing that DRMRR notably outperforms state-of-the-art LTR models. We also conducted an extensive analysis to examine the resilience of DRMRR against various types of noise: Gaussian noise, adversarial perturbations, and label poisoning. Accordingly, DRMRR is not only able to achieve significantly better performance than other baselines, but it can maintain a relatively stable performance as more noise is added to the data.

Subjects

Learning , Noise , Humans , Multivariate Analysis

Profile of Folate in Breast Milk from Chinese Women over 1-400 Days Postpartum.

Su, Yanyan; Mao, Yingyi; Tian, Fang; Cai, Xiaokun; Chen, Ruidi; Li, Na; Qian, Changli; Li, Xiang; Zhao, Yanrong; Wang, Yu.

Nutrients ; 14(14), 2022 Jul 20.

Article in English | MEDLINE | ID: mdl-35889919

ABSTRACT

Folate is an essential nutrient for growth in early life. This study aimed to determine the levels and composition of folate in Chinese breast milk samples. It was part of the Maternal Nutrition and Infant Investigation (MUAI) study. A total of 205 healthy mothers at 1-400 days postpartum were randomly recruited in Chengdu. Five species of folate, namely tetrahydrofolate (THF), 5-methyl-THF, 5,10-methenyl-THF, 5-formyl-THF and unmetabolized folic acid (UMFA), were measured by liquid chromatography-tandem mass spectrometry (LC-MS). The median levels of total folate ranged from 12.86 to 56.77 ng/mL in the breast milk of mothers at 1-400 days postpartum, gradually increasing throughout the lactation period. The median levels of 5-methyl-THF, minor reduced folate (the sum of THF, 5,10-methenyl-THF and 5-formyl-THF) and UMFA were in the ranges of 8.52-40.65 ng/mL, 3.48-16.15 ng/mL and 0.00-1.24 ng/mL during 1-400 days postpartum, respectively. 5-Methyl-THF accounted for more than 65% of the total folate in all breast milk samples. UMFA levels in mature breast milk samples were higher in supplement users than in nonusers (p < 0.05), but this was not the case for colostrum and transitional milk samples. In conclusion, the level of total folate in breast milk changed as lactation progressed, but 5-methyl-THF remained the dominant species of folate in the breast milk of Chinese populations across the entire lactation period.

Subjects

Folic Acid , Human Milk , China , Dietary Supplements , Female , Folic Acid/analysis , Humans , Infant , Lactation , Leucovorin , Human Milk/chemistry

Detection of unwarranted CT radiation exposure from patient and imaging protocol meta-data using regularized regression.

Chen, Ruidi; Paschalidis, Ioannis Ch; Hatabu, Hiroto; Valtchinov, Vladimir I; Siegelman, Jenifer.

Eur J Radiol Open ; 6: 206-211, 2019.

Article in English | MEDLINE | ID: mdl-31194104

ABSTRACT

BACKGROUND: Variability in radiation exposure from CT scans can be appropriate and driven by patient features such as body habitus. Quantitative analysis may be performed to discover instances of unwarranted radiation exposure and to reduce the probability of such occurrences in future patient visits. No universal process for identifying outliers is widely available, and access to expertise and resources is variable. OBJECTIVE: The goal of this study is to develop an automated outlier detection procedure to identify all scans with an unexpectedly high radiation exposure, given the characteristics of the patient and the type of exam. MATERIALS AND METHODS: This Institutional Review Board-approved retrospective cohort study covered June 30, 2012 to December 31, 2013 at a quaternary academic medical center. The de-identified dataset contained 28 fields for 189,959 CT exams. We applied the variable selection method Least Absolute Shrinkage and Selection Operator (LASSO) to select important variables for predicting CT radiation dose. We then employed a regression approach that is robust to outliers to learn from the data a predictive model of CT radiation dose, given the important variables identified by LASSO. Patient visits whose predicted radiation dose was statistically different from the radiation dose actually received were identified as outliers. RESULTS: Our methodology identified 1% of CT exams as outliers. The top-5 predictors discovered by LASSO and strongly correlated with radiation dose were Tube Current, kVp, Weight, Width of collimator, and Reference milliampere-seconds. A human expert validation of the outlier detection algorithm yielded a specificity of 0.85 [95% CI 0.78-0.92] and a sensitivity of 0.91 [95% CI 0.85-0.97] (PPV = 0.84, NPV = 0.92). These values substantially outperform the alternative methods we tested (F1 score 0.88 for our method against 0.51 for the alternatives). CONCLUSION: The study developed and tested a novel, automated method for processing CT scanner meta-data to identify CT exams where patients received an unwarranted amount of radiation. Radiation safety and protocol review committees may use this technique to uncover systemic issues and reduce future incidents.
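
As a quick consistency check on the validation figures quoted in this abstract, the F1 score can be recomputed from the reported PPV (precision) and sensitivity (recall). This is only an arithmetic sketch using the rounded values printed above; the function name `f1_score` is an illustrative helper, not something from the paper.

```python
# Rounded values as reported in the abstract.
ppv = 0.84          # positive predictive value (precision)
sensitivity = 0.91  # recall


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


f1 = f1_score(ppv, sensitivity)
print(round(f1, 3))  # 0.874
```

The result, about 0.87, is close to the reported 0.88; the small gap is presumably due to rounding of the published PPV and sensitivity.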

A Robust Learning Approach for Regression Models Based on Distributionally Robust Optimization.

Chen, Ruidi; Paschalidis, Ioannis Ch.

J Mach Learn Res ; 19(1): 517-564, 2018 Jan.

Article in English | MEDLINE | ID: mdl-34421397

ABSTRACT

We present a Distributionally Robust Optimization (DRO) approach to estimate a robustified regression plane in a linear regression setting, when the observed samples are potentially contaminated with adversarially corrupted outliers. Our approach mitigates the impact of outliers by hedging against a family of probability distributions on the observed data, some of which assign very low probabilities to the outliers. The distributions under consideration are close to the empirical distribution in the sense of the Wasserstein metric. We show that this DRO formulation can be relaxed to a convex optimization problem which encompasses a class of models. By selecting proper norm spaces for the Wasserstein metric, we are able to recover several commonly used regularized regression models. We provide new insights into the regularization term and give guidance on the selection of the regularization coefficient from the standpoint of a confidence region. We establish two types of performance guarantees for the solution to our formulation under mild conditions. One is related to its out-of-sample behavior (prediction bias), and the other concerns the discrepancy between the estimated and true regression planes (estimation bias). Extensive numerical results demonstrate the superiority of our approach over a host of regression models, in terms of prediction and estimation accuracy. We also consider the application of our robust learning procedure to outlier detection, and show that our approach achieves a much higher AUC (Area Under the ROC Curve) than M-estimation (Huber, 1964, 1973).
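
The abstract does not restate the relaxed convex problem, so the following is only a sketch under stated assumptions: an absolute-error loss and an l2 norm on the data space, in which case a Wasserstein DRO relaxation takes the form of the mean absolute residual plus `eps` times the norm of the vector (-beta, 1). The solver (plain subgradient descent), the function names, the synthetic data and the value of `eps` are all illustrative choices, not the authors' implementation.

```python
import numpy as np


def dro_objective(beta, X, y, eps):
    """Mean absolute residual plus eps * ||(-beta, 1)||_2 (assumed relaxed form)."""
    residual = np.abs(y - X @ beta).mean()
    penalty = eps * np.linalg.norm(np.append(-beta, 1.0))
    return residual + penalty


def fit_dro_regression(X, y, eps=0.1, steps=2000, lr=0.5):
    """Subgradient descent that returns the best iterate seen."""
    beta = np.zeros(X.shape[1])
    best_val, best_beta = dro_objective(beta, X, y, eps), beta.copy()
    for t in range(1, steps + 1):
        r = y - X @ beta
        # Subgradient of the mean absolute residual.
        g_loss = -(X * np.sign(r)[:, None]).mean(axis=0)
        # Gradient of eps * sqrt(||beta||^2 + 1).
        g_pen = eps * beta / np.linalg.norm(np.append(-beta, 1.0))
        beta = beta - (lr / np.sqrt(t)) * (g_loss + g_pen)
        val = dro_objective(beta, X, y, eps)
        if val < best_val:
            best_val, best_beta = val, beta.copy()
    return best_beta, best_val


# Synthetic data with a few grossly corrupted responses (hypothetical example).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=200)
y[:10] += 20.0  # outliers

beta_hat, value = fit_dro_regression(X, y, eps=0.1)
print(beta_hat, value)
```

Swapping the l2 norm in the penalty for another norm changes which regularizer is recovered, which is the sense in which the formulation covers a class of regularized regression models.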

Federated learning of predictive models from federated Electronic Health Records.

Brisimi, Theodora S; Chen, Ruidi; Mela, Theofanie; Olshevsky, Alex; Paschalidis, Ioannis Ch; Shi, Wei.

Int J Med Inform ; 112: 59-67, 2018 Apr.

Article in English | MEDLINE | ID: mdl-29500022

ABSTRACT

BACKGROUND: In an era of "big data," computationally efficient and privacy-aware solutions for large-scale machine learning problems become crucial, especially in the healthcare domain, where large amounts of data are stored in different locations and owned by different entities. Past research has focused on centralized algorithms, which assume the existence of a central data repository (database) that stores and can process the data from all participants. Such an architecture, however, can be impractical when data are not centrally located, does not scale well to very large datasets, and introduces single-point-of-failure risks which could compromise the integrity and privacy of the data. Given the large amounts of data spread widely across hospitals and individuals, a decentralized, computationally scalable methodology is much needed. OBJECTIVE: We aim to solve a binary supervised classification problem to predict hospitalizations for cardiac events using a distributed algorithm. We seek to develop a general decentralized optimization framework enabling multiple data holders to collaborate and converge to a common predictive model, without explicitly exchanging raw data. METHODS: We focus on the soft-margin l1-regularized sparse Support Vector Machine (sSVM) classifier. We develop an iterative cluster Primal Dual Splitting (cPDS) algorithm for solving the large-scale sSVM problem in a decentralized fashion. Such a distributed learning scheme is relevant for multi-institutional collaborations or peer-to-peer applications, allowing the data holders to collaborate while keeping every participant's data private. RESULTS: We test cPDS on the problem of predicting hospitalizations due to heart diseases within a calendar year based on information in the patients' Electronic Health Records prior to that year. cPDS converges faster than centralized methods at the cost of some communication between agents. It also converges faster and with less communication overhead compared to an alternative distributed algorithm. In both cases, it achieves similar prediction accuracy measured by the Area Under the Receiver Operating Characteristic Curve (AUC) of the classifier. We extract important features discovered by the algorithm that are predictive of future hospitalizations, thus providing a way to interpret the classification results and inform prevention efforts.
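
To make the idea of fitting a common model without exchanging raw data concrete, here is a minimal sketch. It is not cPDS: it is plain subgradient descent on an assumed objective (mean hinge loss plus an l1 penalty), in which each data holder computes a subgradient on its own partition and only those vectors are summed. The function names, the three-way split, the synthetic data and the parameters `lam`, `steps` and `lr` are all hypothetical.

```python
import numpy as np


def local_subgradient(X, y, beta, b):
    """Subgradient of the summed hinge loss on one data holder's samples."""
    margins = y * (X @ beta + b)
    active = margins < 1.0  # samples that violate the margin
    g_beta = -(X[active] * y[active][:, None]).sum(axis=0)
    g_b = -y[active].sum()
    return g_beta, g_b


def objective(parts, beta, b, lam):
    """Mean hinge loss over all partitions plus lam * ||beta||_1."""
    n = sum(len(y) for _, y in parts)
    hinge = sum(np.maximum(0.0, 1.0 - y * (X @ beta + b)).sum() for X, y in parts)
    return hinge / n + lam * np.abs(beta).sum()


def fit_sparse_svm(parts, lam=0.01, steps=500, lr=0.1):
    """Each step shares only per-holder subgradients, never the raw (X, y)."""
    d = parts[0][0].shape[1]
    n = sum(len(y) for _, y in parts)
    beta, b = np.zeros(d), 0.0
    best = (objective(parts, beta, b, lam), beta.copy(), b)
    for t in range(1, steps + 1):
        grads = [local_subgradient(X, y, beta, b) for X, y in parts]
        g_beta = sum(g for g, _ in grads) / n + lam * np.sign(beta)
        g_b = sum(g for _, g in grads) / n
        step = lr / np.sqrt(t)
        beta, b = beta - step * g_beta, b - step * g_b
        val = objective(parts, beta, b, lam)
        if val < best[0]:
            best = (val, beta.copy(), b)
    return best


# Synthetic data split across three hypothetical data holders.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
w_true = np.array([2.0, -1.0, 0.0, 0.0, 0.0])
y = np.sign(X @ w_true + 0.1 * rng.normal(size=300))
parts = list(zip(np.array_split(X, 3), np.array_split(y, 3)))

val, beta_hat, b_hat = fit_sparse_svm(parts)
print(val, beta_hat, b_hat)
```

Because the summed local subgradients equal the subgradient on the pooled data, this sketch reproduces centralized subgradient descent; the communication pattern and convergence behavior of cPDS described in the abstract are not modeled here.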

Subjects

Algorithms , Electronic Health Records , Hospitalization/statistics & numerical data , Machine Learning , Factual Databases , Humans , ROC Curve , Support Vector Machine
