Search | BVS (Virtual Health Library) Search Portal

Long-distance disorder-disorder relation extraction with bootstrapped noisy data.

Lin, Yucong; Li, Yang; Lu, Keming; Ma, Cheng; Zhao, Peng; Gao, Daiqi; Fan, Zihao; Cheng, Zijie; Wang, Zheyu; Yu, Sheng.

J Biomed Inform ; 109: 103529, 2020 09.

Article in English | MEDLINE | ID: mdl-32771539

ABSTRACT

OBJECTIVE: Artificial intelligence in healthcare increasingly relies on relations in knowledge graphs for algorithm development. However, many important relations are not well covered in existing knowledge graphs. We aim to develop a novel long-distance relation extraction algorithm that leverages the article section structure and is trained with bootstrapped noisy data to identify important relations for diagnosis, including may cause, may be caused by, and differential diagnosis.

METHODS: Known relations were extracted from semistructured web pages and a relational database and were paired with sentences containing corresponding medical concepts to form training data. The sentence form was extended to allow one concept to be in the title. An attention mechanism was applied to reduce the effect of noisily labeled sentences. Section structure embedding was added to provide additional context for relation expressions. Graph information was further incorporated into the model to differentiate the target relations whose expressions were often similar and interwoven.

RESULTS: The extended sentence form allowed 1.75 times as many relations and 2.17 times as many sentences to be found compared to the conventional form. The various components of the proposed model all added to the accuracy. Overall, the positive sample accuracy of the proposed model was 9 percentage points higher than baseline deep learning models and 13 percentage points higher than naïve Bayes and support vector machines.

CONCLUSION: Our bootstrap data preparation method and the extended sentence form could form a large training dataset to enable algorithm development and data mining efforts. Section structure embedding and graph information significantly increased prediction accuracy.
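The pairing of known relations with sentences, including the extended form in which one concept appears only in the title, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `pair_sentences`, the toy relations, and the naive case-insensitive substring matching are all assumptions made for illustration, and a real system would use proper medical concept recognition.

```python
from typing import Dict, List, Tuple

# Hypothetical store of known relations: (head, tail) -> relation label.
KnownRelations = Dict[Tuple[str, str], str]


def pair_sentences(
    known: KnownRelations,
    title: str,
    sentences: List[str],
    allow_title: bool = True,
) -> List[Tuple[str, str, str, str]]:
    """Label sentences by distant supervision.

    Conventional form: both concepts occur in the sentence.
    Extended form (allow_title=True): one concept may occur only in
    the article or section title, the other in the sentence.
    Matching is naive substring matching (an assumption of this sketch).
    """
    examples = []
    title_lc = title.lower()
    for sent in sentences:
        sent_lc = sent.lower()
        for (head, tail), label in known.items():
            head_in_sent = head.lower() in sent_lc
            tail_in_sent = tail.lower() in sent_lc
            conventional = head_in_sent and tail_in_sent
            extended = allow_title and (
                (head.lower() in title_lc and tail_in_sent)
                or (tail.lower() in title_lc and head_in_sent)
            )
            if conventional or extended:
                # The label is noisy: the sentence may not express it.
                examples.append((head, tail, label, sent))
    return examples


if __name__ == "__main__":
    known = {
        ("pneumonia", "cough"): "may cause",
        ("asthma", "wheezing"): "may cause",
    }
    title = "Pneumonia"
    sentences = [
        "Patients often present with cough and fever.",
        "Asthma commonly produces wheezing.",
    ]
    print(len(pair_sentences(known, title, sentences, allow_title=False)))
    print(len(pair_sentences(known, title, sentences, allow_title=True)))
```

In this toy input the first sentence never names pneumonia, so it is only captured when the title is allowed to supply one concept. Every pair produced this way carries a noisy label, which is the problem the abstract's attention mechanism is meant to reduce.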

Subject(s)

Artificial Intelligence; Data Mining; Algorithms; Bayes Theorem; Databases, Factual

Non-asymptotic Properties of Individualized Treatment Rules from Sequentially Rule-Adaptive Trials.

Gao, Daiqi; Liu, Yufeng; Zeng, Donglin.

J Mach Learn Res ; 23(250)2022.

Article in English | MEDLINE | ID: mdl-37576335

ABSTRACT

Learning optimal individualized treatment rules (ITRs) has become increasingly important in the modern era of precision medicine. Many statistical and machine learning methods for learning optimal ITRs have been developed in the literature. However, most existing methods are based on data collected from traditional randomized controlled trials and thus cannot take advantage of the accumulating evidence when patients enter the trials sequentially. It is also ethically important that future patients have a high probability of being treated optimally based on the knowledge updated so far. In this work, we propose a new design called sequentially rule-adaptive trials to learn optimal ITRs based on the contextual bandit framework, in contrast to the response-adaptive design in traditional adaptive trials. In our design, each entering patient is allocated with high probability to the current best treatment for that patient, which is estimated from the past data using a machine learning algorithm (for example, outcome weighted learning in our implementation). We explore the tradeoff between training and test values of the estimated ITR in single-stage problems by proving theoretically that, for a higher probability of following the estimated ITR, the training value converges to the optimal value at a faster rate, while the test value converges at a slower rate. This problem differs from traditional decision problems in that the training data are generated sequentially and are dependent. We also develop a tool that combines martingale and empirical process techniques to tackle this problem, which cannot be solved by previous techniques for i.i.d. data. We show by numerical examples that, without much loss of the test value, our proposed algorithm can improve the training value significantly compared to existing methods. Finally, we use a real data study to illustrate the performance of the proposed method.
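The allocation scheme described above, where each entering patient receives the currently estimated best treatment with high probability, can be sketched as a small simulation. This is an illustration under stated assumptions rather than the authors' method: a per-arm simple linear regression is swapped in for outcome weighted learning, and the outcome model, the alternating burn-in, and the names `fit_line` and `simulate_trial` are hypothetical.

```python
import random
from statistics import fmean


def fit_line(xs, ys):
    """Simple least-squares fit of y ~ intercept + slope * x."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my - slope * mx, slope


def simulate_trial(n_patients=500, follow_prob=0.9, burn_in=20, seed=0):
    """Return the average observed outcome of trial participants.

    Assumed outcome model (hypothetical): covariate x ~ U(-1, 1);
    treatment 1 has mean outcome x, treatment 0 has mean outcome -x,
    so treatment 1 is optimal exactly when x > 0.
    """
    rng = random.Random(seed)
    xs = {0: [], 1: []}  # covariates observed under each arm
    ys = {0: [], 1: []}  # outcomes observed under each arm
    outcomes = []
    for t in range(n_patients):
        x = rng.uniform(-1.0, 1.0)
        if t < burn_in:
            # Alternate arms so both have data before any rule is fitted.
            arm = t % 2
        else:
            # Re-estimate the rule from all past data (regression stand-in).
            preds = []
            for a in (0, 1):
                intercept, slope = fit_line(xs[a], ys[a])
                preds.append(intercept + slope * x)
            best = 1 if preds[1] > preds[0] else 0
            # Follow the estimated rule with probability follow_prob.
            arm = best if rng.random() < follow_prob else 1 - best
        y = (x if arm == 1 else -x) + rng.gauss(0.0, 0.5)
        xs[arm].append(x)
        ys[arm].append(y)
        outcomes.append(y)
    return fmean(outcomes)


if __name__ == "__main__":
    for p in (0.5, 0.7, 0.9):
        print(p, round(simulate_trial(follow_prob=p), 3))
```

In this sketch `follow_prob=0.5` amounts to pure randomization, while larger values favor in-trial patients. The simulation only tracks the average in-trial outcome, which corresponds to the training value, and does not reproduce the paper's theoretical results on the test value.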
