Results 1 - 13 of 13
1.
Commun Biol ; 7(1): 516, 2024 Apr 30.
Article in English | MEDLINE | ID: mdl-38693292

ABSTRACT

The success of deep learning in various applications depends on task-specific architecture design choices, including the types, hyperparameters, and number of layers. In computational biology, there is no consensus on the optimal architecture design, and decisions are often made using insights from more well-established fields such as computer vision. These may not consider the domain-specific characteristics of genome sequences, potentially limiting performance. Here, we present GenomeNet-Architect, a neural architecture design framework that automatically optimizes deep learning models for genome sequence data. It optimizes the overall layout of the architecture, with a search space specifically designed for genomics. Additionally, it optimizes hyperparameters of individual layers and the model training procedure. On a viral classification task, GenomeNet-Architect reduced the read-level misclassification rate by 19%, with 67% faster inference and 83% fewer parameters, and achieved similar contig-level accuracy with ~100 times fewer parameters compared to the best-performing deep learning baselines.


Subjects
Deep Learning , Genomics , Genomics/methods , Computational Biology/methods , Humans , Neural Networks, Computer
2.
J Med Imaging (Bellingham) ; 10(5): 054501, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37818179

ABSTRACT

Purpose: Deep supervised learning provides an effective approach for developing robust models for various computer-aided diagnosis tasks. However, there is often an underlying assumption that the frequencies of the samples between the different classes of the training dataset are either similar or balanced. In real-world medical data, the samples of positive classes often occur too infrequently to satisfy this assumption. Thus, there is an unmet need for deep-learning systems that can automatically identify and adapt to the real-world conditions of imbalanced data. Approach: We propose a deep Bayesian ensemble learning framework to address the representation learning problem of long-tailed and out-of-distribution (OOD) samples when training from medical images. By estimating the relative uncertainties of the input data, our framework can adapt to imbalanced data for learning generalizable classifiers. We trained and tested our framework on four public medical imaging datasets with various imbalance ratios and imaging modalities across three different learning tasks: semantic medical image segmentation, OOD detection, and in-domain generalization. We compared the performance of our framework with those of state-of-the-art comparator methods. Results: Our proposed framework outperformed the comparator models significantly across all performance metrics (pairwise t-test: p<0.01) in the semantic segmentation of high-resolution CT and MR images as well as in the detection of OOD samples (p<0.01), thereby showing significant improvement in handling the associated long-tailed data distribution. The results of the in-domain generalization also indicated that our framework can enhance the prediction of retinal glaucoma, contributing to clinical decision-making processes. 
Conclusions: Training of the proposed deep Bayesian ensemble learning framework with dynamic Monte-Carlo dropout and a combination of losses yielded the best generalization to unseen samples from imbalanced medical imaging datasets across different learning tasks.

3.
Commun Biol ; 6(1): 928, 2023 Sep 11.
Article in English | MEDLINE | ID: mdl-37696966

ABSTRACT

Deep learning in bioinformatics is often limited to problems where extensive amounts of labeled data are available for supervised classification. By exploiting unlabeled data, self-supervised learning techniques can improve the performance of machine learning models in the presence of limited labeled data. Although many self-supervised learning methods have been suggested before, they have failed to exploit the unique characteristics of genomic data. Therefore, we introduce Self-GenomeNet, a self-supervised learning technique that is custom-tailored for genomic data. Self-GenomeNet leverages reverse-complement sequences and effectively learns short- and long-term dependencies by predicting targets of different lengths. Self-GenomeNet performs better than other self-supervised methods in data-scarce genomic tasks and outperforms standard supervised training with ~10 times fewer labeled training data. Furthermore, the learned representations generalize well to new datasets and tasks. These findings suggest that Self-GenomeNet is well suited for large-scale, unlabeled genomic datasets and could substantially improve the performance of genomic models.


Subjects
Deep Learning , Genomics , Computational Biology , Machine Learning
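The abstract of entry 3 says that Self-GenomeNet leverages reverse-complement sequences but does not say how. As a minimal sketch of the underlying operation only, the following Python computes a reverse complement and pairs each sequence with it. The function names are hypothetical and are not taken from the Self-GenomeNet code.

```python
# Hypothetical helpers for illustration; not part of Self-GenomeNet.
# Translation table mapping each nucleotide to its complement (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def with_reverse_complements(seqs):
    """Return each sequence followed by its reverse complement."""
    out = []
    for s in seqs:
        out.append(s)
        out.append(reverse_complement(s))
    return out
```

Applying `reverse_complement` twice returns the original sequence, which is a convenient sanity check when such a step is added to a data pipeline.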
4.
Invest Radiol ; 58(12): 874-881, 2023 Dec 01.
Article in English | MEDLINE | ID: mdl-37504498

ABSTRACT

OBJECTIVES: Optimizing a machine learning (ML) pipeline for radiomics analysis involves numerous choices in data set composition, preprocessing, and model selection. Objective identification of the optimal setup is complicated by correlated features, interdependency structures, and a multitude of available ML algorithms. Therefore, we present a radiomics-based benchmarking framework to optimize a comprehensive ML pipeline for the prediction of overall survival. This study is conducted on an image set of patients with hepatic metastases of colorectal cancer, for which radiomics features of the whole liver and of metastases from computed tomography images were calculated. A mixed model approach was used to find the optimal pipeline configuration and to identify the added prognostic value of radiomics features. MATERIALS AND METHODS: In this study, a large-scale ML benchmark pipeline consisting of preprocessing, feature selection, dimensionality reduction, hyperparameter optimization, and training of different models was developed for radiomics-based survival analysis. Portal-venous computed tomography imaging data from a previous prospective randomized trial evaluating radioembolization of liver metastases of colorectal cancer were quantitatively accessible through a radiomics approach. One thousand two hundred eighteen radiomics features of hepatic metastases and the whole liver were calculated, and 19 clinical parameters (age, sex, laboratory values, and treatment) were available for each patient. Three ML algorithms, namely a regression model with elastic net regularization (glmnet), a random survival forest (RSF), and a gradient tree-boosting technique (xgboost), were evaluated for 5 combinations of clinical data, tumor radiomics, and whole-liver features. Hyperparameter optimization and model evaluation targeted the integrated Brier score as the performance metric, via nested cross-validation. To address dependency structures in the benchmark setup, a mixed-model approach was developed to compare ML and data configurations and to identify the best-performing model.
RESULTS: Within our radiomics-based benchmark experiment, 60 ML pipeline variations were evaluated on clinical data and radiomics features from 491 patients. Descriptive analysis of the benchmark results showed a preference for RSF-based pipelines, especially for the combination of clinical data with radiomics features. This observation was supported by the quantitative analysis via a linear mixed model approach, computed to differentiate the effect of data sets and pipeline configurations on the resulting performance. This revealed the RSF pipelines to consistently perform similarly to or better than glmnet and xgboost. Further, for the RSF, there was no significantly better-performing pipeline composition regarding the type of preprocessing or hyperparameter optimization. CONCLUSIONS: Our study introduces a benchmark framework for radiomics-based survival analysis, aimed at identifying the optimal settings with respect to different radiomics data sources and various ML pipeline variations, including preprocessing techniques and learning algorithms. A suitable analysis tool for the benchmark results is provided via a mixed model approach, which showed, for our study on patients with intrahepatic liver metastases, that radiomics features captured the patients' clinical situation in a manner comparable to the information provided solely by clinical parameters. However, we did not observe a relevant additional prognostic value obtained by these radiomics features.


Subjects
Colorectal Neoplasms , Liver Neoplasms , Humans , Benchmarking , Liver Neoplasms/diagnostic imaging , Tomography, X-Ray Computed , Machine Learning , Survival Analysis , Colorectal Neoplasms/diagnostic imaging , Retrospective Studies
5.
Big Data ; 11(3): 181-198, 2023 Jun.
Article in English | MEDLINE | ID: mdl-34978896

ABSTRACT

The use of machine learning (ML) allows us to automate and scale decision-making processes. The key to this automation is the development of ML models that generalize from training data to unseen data. Such models can become extremely versatile and powerful, which makes democratization of artificial intelligence (AI) possible, that is, providing ML to non-ML experts such as software engineers or domain experts. Typically, automated ML (AutoML) is referred to as a key step toward this goal. However, we believe that democratizing the verification process of ML systems is a larger and even more crucial challenge in achieving the democratization of AI. Currently, the process of ensuring that an ML model works as intended is unstructured. It is largely based on experience and domain knowledge that cannot be automated. Current approaches such as cross-validation or explainable AI are not enough to overcome the real challenges and are discussed extensively in this article. Arguing for structured verification approaches, we discuss a set of guidelines to verify models, code, and data in each step of the ML lifecycle. These guidelines can help to reliably measure and select an optimal solution, while minimizing the risk of bugs and undesired behavior in edge cases.


Subjects
Artificial Intelligence , Machine Learning , Automation , Research Design , Software
6.
Chaos ; 31(5): 053121, 2021 May.
Article in English | MEDLINE | ID: mdl-34240952

ABSTRACT

We present an approach to construct structure-preserving emulators for Hamiltonian flow maps and Poincaré maps based directly on orbit data. Intended applications are in moderate-dimensional systems, in particular, long-term tracing of fast charged particles in accelerators and magnetic plasma confinement configurations. The method is based on multi-output Gaussian process (GP) regression on scattered training data. To obtain long-term stability, the symplectic property is enforced via the choice of the matrix-valued covariance function. Based on earlier work on spline interpolation, we observe derivatives of the generating function of a canonical transformation. A product kernel produces an accurate implicit method, whereas a sum kernel results in a fast explicit method from this approach. Both are related to symplectic Euler methods in terms of numerical integration but fulfill a complementary purpose. The developed methods are first tested on the pendulum and the Hénon-Heiles system and results compared to spectral regression of the flow map with orthogonal polynomials. Chaotic behavior is studied on the standard map. Finally, the application to magnetic field line tracing in a perturbed tokamak configuration is demonstrated. As an additional feature, in the limit of small mapping times, the Hamiltonian function can be identified with a part of the generating function and thereby learned from observed time-series data of the system's evolution. For implicit GP methods, we demonstrate regression performance comparable to spectral bases and artificial neural networks for symplectic flow maps, applicability to Poincaré maps, and correct representation of chaotic diffusion as well as a substantial increase in performance for learning the Hamiltonian function compared to existing approaches.
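The abstract of entry 6 relates its Gaussian process methods to symplectic Euler integration. The sketch below shows only that classical integrator, applied to a pendulum, and not the GP emulator itself. The Hamiltonian H(q, p) = p^2/2 - cos(q) with unit mass, length, and gravity is an assumption chosen for illustration, as are the function names.

```python
import math


def pendulum_energy(q: float, p: float) -> float:
    """Hamiltonian of a pendulum with unit mass, length, and gravity (assumed)."""
    return 0.5 * p * p - math.cos(q)


def symplectic_euler(q: float, p: float, h: float, n_steps: int):
    """Advance (q, p) by n_steps of size h with the symplectic Euler scheme.

    The momentum is updated first using the force at the old position,
    and the position is then updated using the new momentum.
    """
    for _ in range(n_steps):
        p = p - h * math.sin(q)  # dp/dt = -dH/dq = -sin(q)
        q = q + h * p            # dq/dt =  dH/dp = p
    return q, p
```

Because the update is a symplectic map, the energy error stays bounded over long runs instead of drifting steadily, which is the long-term stability property the abstract aims to preserve in the learned maps.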

7.
Bioinformatics ; 37(17): 2789-2791, 2021 Sep 09.
Article in English | MEDLINE | ID: mdl-33523131

ABSTRACT

SUMMARY: As machine learning has become increasingly popular over the last few decades, so too has the number of machine-learning interfaces for implementing these models. Whilst many R libraries exist for machine learning, very few offer extended support for survival analysis. This is problematic considering its importance in fields like medicine, bioinformatics, economics, engineering and more. mlr3proba provides a comprehensive machine-learning interface for survival analysis and connects with mlr3's general model tuning and benchmarking facilities to provide a systematic infrastructure for survival modelling and evaluation. AVAILABILITY AND IMPLEMENTATION: mlr3proba is available under an LGPL-3 licence on CRAN and at https://github.com/mlr-org/mlr3proba, with further documentation at https://mlr3book.mlr-org.com/survival.html.

8.
Proc Natl Acad Sci U S A ; 117(30): 17680-17687, 2020 Jul 28.
Article in English | MEDLINE | ID: mdl-32665436

ABSTRACT

Smartphones enjoy high adoption rates around the globe. Rarely more than an arm's length away, these sensor-rich devices can easily be repurposed to collect rich and extensive records of their users' behaviors (e.g., location, communication, media consumption), posing serious threats to individual privacy. Here we examine the extent to which individuals' Big Five personality dimensions can be predicted on the basis of six different classes of behavioral information collected via sensor and log data harvested from smartphones. Taking a machine-learning approach, we predict personality at broad domain ([Formula: see text] = 0.37) and narrow facet levels ([Formula: see text] = 0.40) based on behavioral data collected from 624 volunteers over 30 consecutive days (25,347,089 logging events). Our cross-validated results reveal that specific patterns in behaviors in the domains of 1) communication and social behavior, 2) music consumption, 3) app usage, 4) mobility, 5) overall phone activity, and 6) day- and night-time activity are distinctively predictive of the Big Five personality traits. The accuracy of these predictions is similar to that found for predictions based on digital footprints from social media platforms and demonstrates the possibility of obtaining information about individuals' private traits from behavioral patterns passively collected from their smartphones. Overall, our results point to both the benefits (e.g., in research settings) and dangers (e.g., privacy implications, psychological targeting) presented by the widespread collection and modeling of behavioral data obtained from smartphones.


Subjects
Machine Learning , Personality , Smartphone , Social Behavior , Humans , Models, Theoretical , Privacy , Quantitative Trait, Heritable , Reproducibility of Results
9.
Sci Rep ; 10(1): 5860, 2020 Apr 03.
Article in English | MEDLINE | ID: mdl-32246097

ABSTRACT

Patients with advanced Parkinson's disease regularly experience unstable motor states. Objective and reliable monitoring of these fluctuations is an unmet need. We used deep learning to classify motion data from a single wrist-worn IMU sensor recording in unscripted environments. For validation purposes, patients were accompanied by a movement disorder expert, and their motor state was passively evaluated every minute. We acquired a dataset of 8,661 minutes of IMU data from 30 patients, with annotations about the motor state (OFF, ON, DYSKINETIC) based on the MDS-UPDRS global bradykinesia item and the AIMS upper limb dyskinesia item. Using a 1-minute window size as an input for a convolutional neural network trained on data from a subset of patients, we achieved a three-class balanced accuracy of 0.654 on data from previously unseen subjects. This corresponds to detecting the OFF, ON, or DYSKINETIC motor state at a sensitivity/specificity of 0.64/0.89, 0.67/0.67 and 0.64/0.89, respectively. On average, the model outputs were highly correlated with the annotation on a per-subject scale (r = 0.83/0.84; p < 0.0001), and remained so for the highly resolved time windows of 1 minute (r = 0.64/0.70; p < 0.0001). Thus, we demonstrate the feasibility of long-term motor-state detection in a free-living setting with deep learning using motion data from a single IMU.


Subjects
Movement/physiology , Neural Networks, Computer , Parkinson Disease/diagnosis , Aged , Deep Learning , Dyskinesias/diagnosis , Dyskinesias/physiopathology , Female , Humans , Male , Models, Statistical , Parkinson Disease/physiopathology , Reproducibility of Results
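The balanced accuracy reported in entry 9 can be related to the per-class sensitivities. Under the common definition of balanced accuracy as the mean of the per-class sensitivities (assumed here; the abstract does not define it), the rounded values give (0.64 + 0.67 + 0.64) / 3 = 0.65, consistent with the reported 0.654 up to rounding. The sketch below computes the same quantity from a confusion matrix. The function name is hypothetical, and the example matrix in the usage note is made up for illustration, not taken from the study.

```python
def balanced_accuracy(confusion):
    """Mean per-class sensitivity (recall) from a square confusion matrix.

    confusion[i][j] is the number of samples of true class i that were
    predicted as class j. Classes with no samples are skipped.
    """
    recalls = []
    for i, row in enumerate(confusion):
        total = sum(row)
        if total > 0:
            recalls.append(row[i] / total)
    return sum(recalls) / len(recalls)
```

For an illustrative matrix `[[8, 2, 0], [1, 6, 3], [0, 1, 9]]` the per-class sensitivities are 0.8, 0.6, and 0.9. Unlike plain accuracy, each class contributes equally regardless of how many samples it has.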
10.
Eur Arch Psychiatry Clin Neurosci ; 270(2): 153-168, 2020 Mar.
Article in English | MEDLINE | ID: mdl-30542818

ABSTRACT

The intentional distortion of test results presents a fundamental problem to self-report-based psychiatric assessment, such as screening for depressive symptoms. The first objective of the study was to clarify whether depressed patients, like healthy controls, possess both the cognitive ability and the motivation to deliberately influence results of commonly used screening measures. The second objective was the construction of a method derived directly from within the test takers' responses to systematically detect faking behavior. Supervised machine learning algorithms have the potential to empirically learn the implicit interconnections between responses, which shape detectable faking patterns. In a standardized design, faking bad and faking good were experimentally induced in a matched sample of 150 depressed and 150 healthy subjects. Participants completed commonly used questionnaires to detect depressive and associated symptoms. Group differences throughout experimental conditions were evaluated using linear mixed models. Machine learning algorithms were trained on the test results and compared regarding their capacity to systematically predict distortions in response behavior in two scenarios: (1) differentiation of authentic patient responses from simulated responses of healthy participants; (2) differentiation of authentic patient responses from dissimulated patient responses. Statistically significant convergence of the test scores in both faking conditions suggests that both depressive patients and healthy controls have the cognitive ability as well as the motivational compliance to alter their test results. Evaluation of the algorithmic capability to detect faking behavior yielded ideal predictive accuracies of up to 89%. Implications of the findings, as well as future research objectives, are discussed. Trial Registration: The study was pre-registered at the German registry for clinical trials (Deutsches Register klinischer Studien, DRKS; DRKS00007708).


Subjects
Deception , Depression/diagnosis , Malingering/diagnosis , Psychometrics , Supervised Machine Learning , Adult , Female , Humans , Male , Predictive Value of Tests , Young Adult
11.
Comput Math Methods Med ; 2018: 2430438, 2018.
Article in English | MEDLINE | ID: mdl-30073029

ABSTRACT

[This corrects the article DOI: 10.1155/2017/1421409.].

12.
Comput Math Methods Med ; 2017: 1421409, 2017.
Article in English | MEDLINE | ID: mdl-28831289

ABSTRACT

We present a new variable selection method based on model-based gradient boosting and randomly permuted variables. Model-based boosting is a tool to fit a statistical model while performing variable selection at the same time. A drawback of the fitting lies in the need for multiple model fits on slightly altered data (e.g., cross-validation or bootstrap) to find the optimal number of boosting iterations and prevent overfitting. In our proposed approach, we augment the data set with randomly permuted versions of the true variables, so-called shadow variables, and stop the stepwise fitting as soon as such a variable would be added to the model. This allows variable selection in a single fit of the model without requiring further parameter tuning. We show that our probing approach can compete with state-of-the-art selection methods like stability selection in a high-dimensional classification benchmark and apply it to three gene expression data sets.


Subjects
Gene Expression Profiling/methods , Models, Statistical , Data Interpretation, Statistical
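The stopping rule described in entry 12 can be sketched in a few lines. The version below is a simplified illustration and not the authors' implementation: it assumes componentwise least-squares (L2) boosting with one linear base learner per column, a fixed step length, and a single permuted shadow copy of every variable. The function name and all parameter defaults are hypothetical.

```python
import random


def _center(values):
    """Subtract the mean so that no intercept term is needed."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def probing_boost(columns, y, step=0.1, max_iter=500, seed=0):
    """Componentwise L2 boosting that stops when a shadow variable wins.

    columns: list of feature columns (each a list of floats).
    Returns the sorted indices of the real features selected before the
    first shadow variable would have been added.
    """
    rng = random.Random(seed)
    real = [_center(c) for c in columns]
    shadows = []
    for c in real:
        s = c[:]
        rng.shuffle(s)  # permutation breaks any link to the outcome
        shadows.append(s)
    candidates = real + shadows
    n_real = len(real)
    resid = _center(y)
    selected = set()
    for _ in range(max_iter):
        best_j, best_gain, best_beta = None, 0.0, 0.0
        for j, c in enumerate(candidates):
            ss = sum(v * v for v in c)
            if ss == 0.0:
                continue
            dot = sum(v * r for v, r in zip(c, resid))
            gain = dot * dot / ss  # drop in residual sum of squares
            if gain > best_gain:
                best_j, best_gain, best_beta = j, gain, dot / ss
        if best_j is None or best_j >= n_real:
            break  # a shadow variable fits best: stop without adding it
        selected.add(best_j)
        resid = [r - step * best_beta * v
                 for r, v in zip(resid, candidates[best_j])]
    return sorted(selected)
```

The appeal of the rule is that a single fit yields both the stopping iteration and the selected variables, so no cross-validation or bootstrap loop is needed to choose the number of boosting iterations.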
13.
Biometrics ; 72(2): 392-401, 2016 Jun.
Article in English | MEDLINE | ID: mdl-26676377

ABSTRACT

It is agreed among biostatisticians that prediction models for binary outcomes should satisfy two essential criteria: first, a prediction model should have a high discriminatory power, implying that it is able to clearly separate cases from controls. Second, the model should be well calibrated, meaning that the predicted risks should closely agree with the relative frequencies observed in the data. The focus of this work is on the predictiveness curve, which has been proposed by Huang et al. (Biometrics 63, 2007) as a graphical tool to assess the aforementioned criteria. By conducting a detailed analysis of its properties, we review the role of the predictiveness curve in the performance assessment of biomedical prediction models. In particular, we demonstrate that marker comparisons should not be based solely on the predictiveness curve, as it is not possible to consistently visualize the added predictive value of a new marker by comparing the predictiveness curves obtained from competing models. Based on our analysis, we propose the "residual-based predictiveness curve" (RBP curve), which addresses the aforementioned issue and which extends the original method to settings where the evaluation of a prediction model on independent test data is of particular interest. Similar to the predictiveness curve, the RBP curve reflects both the calibration and the discriminatory power of a prediction model. In addition, the curve can be conveniently used to conduct valid performance checks and marker comparisons.


Subjects
Biomarkers/analysis , Data Interpretation, Statistical , Models, Statistical , Prognosis , Biometry/methods , Diagnosis, Computer-Assisted/methods , Humans , Sensitivity and Specificity
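For orientation on entry 13, the original predictiveness curve plots the predicted risk R(v) at the v-th quantile of the risk distribution against v. The sketch below computes the empirical version of that curve from a list of predicted risks. It does not implement the residual-based (RBP) curve, whose definition is not given in the abstract, and the function names are hypothetical.

```python
def predictiveness_curve(risks):
    """Return (v, R(v)) pairs for the empirical predictiveness curve.

    v is the fraction of subjects whose predicted risk is at most R(v).
    """
    ordered = sorted(risks)
    n = len(ordered)
    return [((i + 1) / n, r) for i, r in enumerate(ordered)]


def fraction_above(risks, threshold):
    """Fraction of subjects whose predicted risk exceeds the threshold."""
    return sum(r > threshold for r in risks) / len(risks)
```

A steep curve indicates that the model separates low-risk from high-risk subjects, while a flat curve near the prevalence indicates little discrimination.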