Results 1 - 20 of 47
1.
Neuroimage ; 261: 119504, 2022 11 01.
Article in English | MEDLINE | ID: mdl-35882272

ABSTRACT

Brain-age (BA) estimates based on deep learning are increasingly used as a neuroimaging biomarker for brain health; however, the underlying neural features have remained unclear. We combined ensembles of convolutional neural networks with Layer-wise Relevance Propagation (LRP) to detect which brain features contribute to BA. Trained on magnetic resonance imaging (MRI) data of a population-based study (n = 2637, 18-82 years), our models estimated age accurately based on single and multiple modalities, regionally restricted and whole-brain images (mean absolute errors 3.37-3.86 years). We find that BA estimates capture both small- and large-scale age-related changes, revealing gross enlargements of ventricles and subarachnoid spaces, as well as white matter lesions and atrophies that appear throughout the brain. Divergence from expected ageing reflected cardiovascular risk factors, and accelerated ageing was more pronounced in the frontal lobe. Applying LRP, our study demonstrates how high-performing deep learning models detect brain ageing in healthy and at-risk individuals throughout adulthood.
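The abstract names Layer-wise Relevance Propagation (LRP) as its attribution method. Purely as an illustration, the minimal NumPy sketch below applies the LRP-epsilon rule to a small fully connected ReLU network with random weights; it is not the authors' CNN ensemble, and the helper names `forward` and `lrp_epsilon` are hypothetical. With zero biases and a small epsilon, the input relevances approximately sum to the network output (relevance conservation).

```python
import numpy as np

def forward(x, weights, biases):
    """Run a small MLP (ReLU hidden layers, linear output) and keep all activations."""
    activations = [x]
    for i, (W, b) in enumerate(zip(weights, biases)):
        z = activations[-1] @ W + b
        is_last = i == len(weights) - 1
        activations.append(z if is_last else np.maximum(z, 0.0))
    return activations

def lrp_epsilon(x, weights, biases, eps=1e-6):
    """Redistribute the network output onto the input features with the LRP-epsilon rule."""
    activations = forward(x, weights, biases)
    relevance = activations[-1]  # start from the output score
    # Walk the layers backwards; a is the input to the layer with weights W.
    for W, b, a in zip(reversed(weights), reversed(biases), reversed(activations[:-1])):
        z = a @ W + b
        z = z + eps * np.where(z >= 0, 1.0, -1.0)  # stabilizer keeps |z| away from 0
        s = relevance / z
        relevance = a * (W @ s)  # R_j = a_j * sum_k w_jk * R_k / z_k
    return relevance

# Usage: random two-layer network with zero biases.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 8)), rng.normal(size=(8, 1))]
biases = [np.zeros(8), np.zeros(1)]
x = rng.normal(size=4)
R = lrp_epsilon(x, weights, biases)
print(R, R.sum(), forward(x, weights, biases)[-1][0])  # last two values nearly equal
```

Non-zero biases and larger epsilon values absorb part of the relevance, so conservation then only holds approximately.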


Subjects
Deep Learning , Adult , Aging/pathology , Brain/diagnostic imaging , Brain/pathology , Child, Preschool , Humans , Magnetic Resonance Imaging/methods , Neuroimaging/methods
2.
Bioinformatics ; 36(8): 2401-2409, 2020 04 15.
Article in English | MEDLINE | ID: mdl-31913448

ABSTRACT

MOTIVATION: Inferring the properties of a protein from its amino acid sequence is one of the key problems in bioinformatics. Most state-of-the-art approaches for protein classification are tailored to single classification tasks and rely on handcrafted features, such as position-specific-scoring matrices from expensive database searches. We argue that this level of performance can be reached or even be surpassed by learning a task-agnostic representation once, using self-supervised language modeling, and transferring it to specific tasks by a simple fine-tuning step. RESULTS: We put forward a universal deep sequence model that is pre-trained on unlabeled protein sequences from Swiss-Prot and fine-tuned on protein classification tasks. We apply it to three prototypical tasks, namely enzyme class prediction, gene ontology prediction and remote homology and fold detection. The proposed method performs on par with state-of-the-art algorithms that were tailored to these specific tasks or, for two out of three tasks, even outperforms them. These results stress the possibility of inferring protein properties from the sequence alone and, on more general grounds, the prospects of modern natural language processing methods in omics. Moreover, we illustrate the prospects for explainable machine learning methods in this field by selected case studies. AVAILABILITY AND IMPLEMENTATION: Source code is available under https://github.com/nstrodt/UDSMProt. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Proteins , Amino Acid Sequence , Databases, Protein , Proteins/genetics , Software
3.
J Med Syst ; 45(12): 105, 2021 Nov 02.
Article in English | MEDLINE | ID: mdl-34729675

ABSTRACT

Developers proposing new machine learning for health (ML4H) tools often pledge to match or even surpass the performance of existing tools, yet the reality is usually more complicated. Reliable deployment of ML4H to the real world is challenging, as examples from diabetic retinopathy or Covid-19 screening show. We envision an integrated framework of algorithm auditing and quality control that provides a path towards the effective and reliable application of ML systems in healthcare. In this editorial, we give a summary of ongoing work towards that vision and announce a call for participation to the special issue "Machine Learning for Health: Algorithm Auditing & Quality Control" in this journal to advance the practice of ML4H auditing.


Subjects
Algorithms , Machine Learning , Quality Control , Humans
4.
BMC Bioinformatics ; 21(1): 279, 2020 Jul 02.
Article in English | MEDLINE | ID: mdl-32615972

ABSTRACT

BACKGROUND: Immunotherapy is a promising route towards personalized cancer treatment. A key algorithmic challenge in this process is to decide whether a given peptide (neoepitope) binds to the major histocompatibility complex (MHC). This is an active area of research, and there are many MHC binding prediction algorithms that can predict the MHC binding affinity of a given peptide with a high degree of accuracy. However, most of the state-of-the-art approaches make use of complicated training and model selection procedures, are restricted to peptides of a certain length and/or rely on heuristics. RESULTS: We put forward USMPep, a simple recurrent neural network that reaches the performance of state-of-the-art approaches on MHC class I binding prediction with a single, generic architecture and even a single set of hyperparameters, both on IEDB benchmark datasets and on the very recent HPV dataset. Moreover, the algorithm is already competitive as a single model trained from scratch, while ensembling multiple regressors and language model pretraining can still slightly improve the performance. The direct application of the approach to MHC class II binding prediction shows solid performance despite limited training data. CONCLUSIONS: We demonstrate that competitive performance in MHC binding affinity prediction can be reached with a standard architecture and training procedure without relying on any heuristics.


Subjects
Algorithms , Histocompatibility Antigens Class II/metabolism , Histocompatibility Antigens Class I/metabolism , Models, Genetic , Alleles , Area Under Curve , Base Sequence , Databases, Genetic , Humans , Peptides/metabolism , Protein Binding , ROC Curve
5.
Entropy (Basel) ; 21(7)2019 Jul 03.
Article in English | MEDLINE | ID: mdl-33267367

ABSTRACT

The birth of Information Theory, right after the pioneering work of Claude Shannon and his celebrated publication of the paper "A mathematical theory of Communication" [...].

6.
Neuroimage ; 174: 352-363, 2018 07 01.
Article in English | MEDLINE | ID: mdl-29421325

ABSTRACT

We propose a new method for the localization of nonlinear cross-frequency coupling in EEG and MEG data analysis, based on the estimation of bicoherences at the source level. While for the analysis of rhythmic brain activity source directions are commonly chosen to maximize power, we suggest maximizing bicoherence instead. The resulting nonlinear cost function can be minimized effectively using a gradient approach. We argue that bicoherence is also a generally useful tool to analyze phase-amplitude coupling (PAC), by deriving formal relations between PAC and bispectra. This is illustrated in simulated and empirical LFP data. The localization method is applied to EEG resting-state data, where the most prominent bicoherence signatures originate from the occipital alpha rhythm and the mu rhythm. While the latter is hardly visible using power analysis, we observe clear bicoherence peaks in the high alpha range of sensorimotor areas. We additionally apply our method to resting-state data of subjects with schizophrenia and healthy controls and observe significant bicoherence differences in motor areas which could not be found by analyzing power differences.
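For readers unfamiliar with the quantity, a minimal NumPy sketch of a segment-averaged bicoherence estimate at a single pair of frequency bins follows. It assumes non-overlapping Hann-windowed segments and a common Cauchy-Schwarz normalization that bounds the result to [0, 1]; the paper's source-level estimator and its exact normalization may differ, and `bicoherence` is a hypothetical helper.

```python
import numpy as np

def bicoherence(x, seg_len, f1, f2):
    """Estimate the bicoherence of signal x at FFT bins (f1, f2).

    x is split into non-overlapping segments of seg_len samples; each segment is
    demeaned, Hann-windowed and Fourier transformed. Requires f1 + f2 <= seg_len // 2.
    The Cauchy-Schwarz normalization keeps the result in [0, 1].
    """
    n_seg = len(x) // seg_len
    segs = x[: n_seg * seg_len].reshape(n_seg, seg_len)
    segs = segs - segs.mean(axis=1, keepdims=True)
    X = np.fft.rfft(segs * np.hanning(seg_len), axis=1)
    prod = X[:, f1] * X[:, f2]
    triple = prod * np.conj(X[:, f1 + f2])  # per-segment bispectrum estimate
    num = np.abs(triple.mean())
    den = np.sqrt(np.mean(np.abs(prod) ** 2) * np.mean(np.abs(X[:, f1 + f2]) ** 2))
    return num / den
```

A value near 1 indicates that, across segments, the phase at bin f1 + f2 stays locked to the sum of the phases at f1 and f2 (quadratic phase coupling); values near 0 indicate no such coupling.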


Subjects
Alpha Rhythm , Brain/physiology , Electroencephalography/methods , Magnetoencephalography/methods , Brain/physiopathology , Humans , Models, Neurological , Schizophrenia/physiopathology , Signal Processing, Computer-Assisted
7.
Neuroimage ; 141: 291-303, 2016 Nov 01.
Article in English | MEDLINE | ID: mdl-27402598

ABSTRACT

Ongoing neuronal oscillations are pivotal in brain functioning and are known to influence subjects' performance. This modulation is usually studied on short time scales, whilst multiple time scales are rarely considered. In our study, we show that Long-Range Temporal Correlations (LRTCs) estimated from the amplitude of EEG oscillations over a range of time scales predict performance in a complex sensorimotor task based on Brain-Computer Interfacing (BCI). Our paradigm involved eighty subjects generating covert motor responses to dynamically changing visual cues and thus controlling a computer program through the modulation of neuronal oscillations. The neuronal dynamics were estimated with multichannel EEG. Our results show that (a) BCI task accuracy may be predicted on the basis of LRTCs measured during the preceding training session, and (b) this result was not due to the signal-to-noise ratio of the ongoing neuronal oscillations. Our results provide direct empirical evidence, complementing previous theoretical work, that scale-free neuronal dynamics are important for optimal brain functioning.
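The abstract does not name its LRTC estimator; detrended fluctuation analysis (DFA) of the oscillation amplitude envelope is one widely used choice. The NumPy sketch below assumes DFA and an envelope that has already been extracted (for example from band-pass filtered EEG); `dfa_exponent` is a hypothetical helper. An exponent near 0.5 indicates uncorrelated fluctuations, while values between 0.5 and 1 indicate long-range temporal correlations.

```python
import numpy as np

def dfa_exponent(signal, window_sizes):
    """Detrended fluctuation analysis: return the scaling exponent alpha."""
    profile = np.cumsum(signal - np.mean(signal))  # integrated, demeaned signal
    fluctuations = []
    for n in window_sizes:
        n_win = len(profile) // n
        windows = profile[: n_win * n].reshape(n_win, n)
        t = np.arange(n)
        residuals = []
        for w in windows:
            coeffs = np.polyfit(t, w, 1)  # remove the local linear trend
            residuals.append(w - np.polyval(coeffs, t))
        fluctuations.append(np.sqrt(np.mean(np.square(residuals))))  # RMS fluctuation F(n)
    # alpha is the slope of log F(n) against log n
    alpha, _ = np.polyfit(np.log(window_sizes), np.log(fluctuations), 1)
    return alpha

rng = np.random.default_rng(0)
sizes = np.unique(np.logspace(1, 3, 12).astype(int))
white = rng.normal(size=20000)
print(dfa_exponent(white, sizes))             # near 0.5: no long-range correlations
print(dfa_exponent(np.cumsum(white), sizes))  # near 1.5: random walk
```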


Subjects
Alpha Rhythm/physiology , Brain-Computer Interfaces , Cerebral Cortex/physiology , Imagination/physiology , Movement/physiology , Psychomotor Performance/physiology , Visual Perception/physiology , Adult , Brain Mapping/methods , Electroencephalography/methods , Female , Humans , Male , Prognosis , Reproducibility of Results , Sensitivity and Specificity , Time Factors
8.
Neural Comput ; 26(2): 349-76, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24206388

ABSTRACT

Electroencephalographic signals are known to be nonstationary and easily affected by artifacts; therefore, their analysis requires methods that can deal with noise. In this work, we present a way to robustify the popular common spatial patterns (CSP) algorithm under a maxmin approach. In contrast to standard CSP, which maximizes the variance ratio between two conditions based on a single estimate of the class covariance matrices, we propose to robustly compute spatial filters by maximizing the minimum variance ratio within a prefixed set of covariance matrices called the tolerance set. We show that this kind of maxmin optimization makes CSP robust to outliers and reduces its tendency to overfit. We also present a data-driven approach to construct a tolerance set that captures the variability of the covariance matrices over time, and we show its ability to reduce the nonstationarity of the extracted features and significantly improve classification accuracy. We test the spatial filters derived with this approach and compare them to standard CSP and a state-of-the-art method on a real-world brain-computer interface (BCI) dataset in which we expect substantial fluctuations caused by environmental differences. Finally, we investigate the advantages and limitations of the maxmin approach with simulations.
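As a point of reference, standard (non-robust) CSP filters can be obtained from the two class covariance matrices by solving a generalized eigenvalue problem. The NumPy sketch below shows that baseline, plus a hypothetical `worst_case_ratio` helper that evaluates the maxmin objective of a given filter over a finite tolerance set of covariance pairs. The paper's construction of tolerance sets and its maxmin optimization procedure are not reproduced here.

```python
import numpy as np

def csp_filters(cov_a, cov_b):
    """Standard CSP: solve cov_a w = lam (cov_a + cov_b) w by whitening.

    Returns filters as columns, sorted by descending lam; the first column
    maximizes var_a / (var_a + var_b), the last one minimizes it.
    """
    evals, evecs = np.linalg.eigh(cov_a + cov_b)
    whitening = evecs @ np.diag(evals ** -0.5) @ evecs.T  # (C_a + C_b)^(-1/2)
    lam, u = np.linalg.eigh(whitening @ cov_a @ whitening)
    order = np.argsort(lam)[::-1]
    return whitening @ u[:, order], lam[order]

def worst_case_ratio(w, tolerance_set):
    """Maxmin objective of filter w: the smallest ratio var_a / var_b over a
    finite (hypothetical) tolerance set of (cov_a, cov_b) pairs."""
    return min((w @ ca @ w) / (w @ cb @ w) for ca, cb in tolerance_set)
```

A robust filter in the maxmin sense is one whose `worst_case_ratio` over the tolerance set is as large as possible, rather than one that is optimal for a single covariance estimate.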


Subjects
Brain-Computer Interfaces/standards , Electroencephalography/standards , Models, Neurological , Electroencephalography/methods , Humans
9.
JMIR Public Health Surveill ; 10: e48060, 2024 Apr 09.
Article in English | MEDLINE | ID: mdl-38592761

ABSTRACT

BACKGROUND: The decline in global child mortality is an important public health achievement, yet child mortality remains disproportionately high in many low-income countries like Guinea-Bissau. The persisting high mortality rates necessitate targeted research to identify vulnerable subgroups of children and formulate effective interventions. OBJECTIVE: This study aimed to discover subgroups of children at an elevated risk of mortality in the urban setting of Bissau, Guinea-Bissau, West Africa. By identifying these groups, we intend to provide a foundation for developing targeted health interventions and inform public health policy. METHODS: We used data from the health and demographic surveillance site, Bandim Health Project, covering 2003 to 2019. We identified baseline variables recorded before children reached the age of 6 weeks. The focus was on determining factors consistently linked with increased mortality up to the age of 3 years. Our multifaceted methodological approach incorporated spatial analysis for visualizing geographical variations in mortality risk, causally adjusted regression analysis to single out specific risk factors, and machine learning techniques for identifying clusters of multifactorial risk factors. To ensure robustness and validity, we divided the data set temporally, assessing the persistence of identified subgroups over different periods. The reassessment of mortality risk used the targeted maximum likelihood estimation (TMLE) method to achieve more robust causal modeling. RESULTS: We analyzed data from 21,005 children. The mortality risk (6 weeks to 3 years of age) was 5.2% (95% CI 4.8%-5.6%) for children born between 2003 and 2011, and 2.9% (95% CI 2.5%-3.3%) for children born between 2012 and 2016. Our findings revealed 3 distinct high-risk subgroups with notably higher mortality rates: children residing in a specific urban area (adjusted mortality risk difference of 3.4%, 95% CI 0.3%-6.5%), children born to mothers with no prenatal consultations (adjusted mortality risk difference of 5.8%, 95% CI 2.6%-8.9%), and children from polygamous families born during the dry season (adjusted mortality risk difference of 1.7%, 95% CI 0.4%-2.9%). These subgroups, though small, showed a consistent pattern of higher mortality risk over time. Common social and economic factors were linked to a larger share of the total child deaths. CONCLUSIONS: The study's results underscore the need for targeted interventions to address the specific risks faced by these identified high-risk subgroups. These interventions should be designed to complement broader public health strategies, creating a comprehensive approach to reducing child mortality. We suggest that future research focus on developing, testing, and comparing targeted intervention strategies that examine the hypotheses proposed in this study. The ultimate aim is to optimize health outcomes for all children in high-mortality settings, leveraging a strategic mix of targeted and general health interventions to address the varied needs of different child subgroups.


Subjects
Machine Learning , Public Health , Child , Humans , Infant , Child, Preschool , Guinea-Bissau/epidemiology , Cohort Studies , Geography
10.
IEEE Trans Neural Netw Learn Syst ; 34(9): 5531-5543, 2023 Sep.
Article in English | MEDLINE | ID: mdl-34851838

ABSTRACT

Federated distillation (FD) is a popular novel algorithmic paradigm for federated learning (FL) that achieves training performance competitive with prior parameter averaging-based methods, while additionally allowing the clients to train different model architectures, by distilling the client predictions on an unlabeled auxiliary set of data into a student model. In this work, we propose FedAUX, an extension to FD which, under the same set of assumptions, drastically improves performance by deriving maximum utility from the unlabeled auxiliary data. FedAUX modifies the FD training procedure in two ways: first, unsupervised pretraining on the auxiliary data is performed to find a suitable model initialization for the distributed training; second, (ε, δ)-differentially private certainty scoring is used to weight the ensemble predictions on the auxiliary data according to the certainty of each client model. Experiments on large-scale convolutional neural networks (CNNs) and transformer models demonstrate that our proposed method achieves remarkable performance improvements over state-of-the-art FL methods, without adding appreciable computation, communication, or privacy cost. For instance, when training ResNet8 on non-independent and identically distributed (non-i.i.d.) subsets of CIFAR10, FedAUX raises the maximum achieved validation accuracy from 30.4% to 78.1%, further closing the gap to centralized training performance. Code is available at https://github.com/fedl-repo/fedaux.
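The second FedAUX modification weights the clients' predictions on the auxiliary data by per-client certainty. A minimal NumPy sketch of such certainty-weighted aggregation follows, assuming the certainty scores are already given; how FedAUX computes and privatizes those scores is not shown, and `certainty_weighted_ensemble` is a hypothetical name.

```python
import numpy as np

def certainty_weighted_ensemble(client_probs, certainty):
    """Average client soft labels on auxiliary data, weighted by certainty.

    client_probs: array (n_clients, n_samples, n_classes) of predicted probabilities.
    certainty:    array (n_clients, n_samples) of non-negative scores; every sample
                  needs at least one positive score.
    Returns an (n_samples, n_classes) array of aggregated soft labels.
    """
    weights = certainty / certainty.sum(axis=0, keepdims=True)  # normalize over clients
    return np.einsum("cs,csk->sk", weights, client_probs)
```

With equal scores this reduces to the plain ensemble mean used in basic federated distillation; the aggregated soft labels would then serve as distillation targets for the student model.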

11.
IEEE Trans Neural Netw Learn Syst ; 34(10): 7675-7688, 2023 Oct.
Article in English | MEDLINE | ID: mdl-35133968

ABSTRACT

Domain translation is the task of finding correspondence between two domains. Several deep neural network (DNN) models, e.g., CycleGAN and cross-lingual language models, have shown remarkable successes on this task under the unsupervised setting, where the mappings between the domains are learned from two independent sets of training data in both domains (without paired samples). However, those methods typically do not perform well on a significant proportion of test samples. In this article, we hypothesize that many of these unsuccessful samples lie at the fringe (relatively low-density areas) of the data distribution, where the DNN was not trained very well, and propose to perform Langevin dynamics to bring such fringe samples toward high-density areas. We demonstrate qualitatively and quantitatively that our strategy, called Langevin cooling (L-Cool), enhances state-of-the-art methods in image translation and language translation tasks.
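The core idea of moving a fringe sample toward high density can be illustrated with plain (unadjusted) Langevin dynamics. The NumPy sketch below assumes the score, i.e. the gradient of log p(x), is available as a function; here it is the analytic score of a standard Gaussian, whereas in practice it would come from a learned model. The step size, temperature and the name `langevin_cool` are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def langevin_cool(x, score, n_steps=100, step=0.05, temperature=0.1, rng=None):
    """Drift a sample toward high-density regions with unadjusted Langevin dynamics.

    score(x) returns the gradient of log p(x). For small steps the update targets
    a density proportional to p(x) ** (1 / temperature); a temperature below 1
    ("cooling") damps the injected noise so that the drift toward high density dominates.
    """
    rng = np.random.default_rng() if rng is None else rng
    for _ in range(n_steps):
        noise = rng.normal(size=x.shape)
        x = x + 0.5 * step * score(x) + np.sqrt(step * temperature) * noise
    return x

x_fringe = np.array([8.0, 8.0])  # far out in the tail of a standard Gaussian
x_cooled = langevin_cool(x_fringe, score=lambda x: -x, rng=np.random.default_rng(0))
print(np.linalg.norm(x_fringe), np.linalg.norm(x_cooled))  # the norm shrinks markedly
```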

12.
Sci Rep ; 13(1): 9940, 2023 Jun 19.
Article in English | MEDLINE | ID: mdl-37336995

ABSTRACT

The goal of pollution forecasting models is to allow the prediction and control of air quality. Non-linear data-driven approaches based on deep neural networks have been increasingly used in such contexts, showing significant improvements with respect to more conventional approaches like regression models and mechanistic approaches. While such deep learning models were long deemed black boxes, recent advances in eXplainable AI (XAI) make it possible to look into the model's decision-making process, providing insights into the decisive input features responsible for the model's prediction. One XAI technique for explaining the predictions of neural networks that has proven useful in various domains is Layer-wise Relevance Propagation (LRP). In this work, we extend the LRP technique to a sequence-to-sequence neural network model with GRU layers. The explanation heatmaps provided by LRP allow us to identify important meteorological and temporal features responsible for the accumulation of four major pollutants in the air ([Formula: see text], [Formula: see text], [Formula: see text], [Formula: see text]), and our findings can be backed up with prior knowledge in environmental and pollution research. This illustrates the appropriateness of XAI for understanding pollution forecasts and opens up new avenues for controlling and mitigating the pollutants' load in the air.

13.
Article in English | MEDLINE | ID: mdl-35797317

ABSTRACT

A recent trend in machine learning has been to enrich learned models with the ability to explain their own predictions. The emerging field of explainable AI (XAI) has so far mainly focused on supervised learning, in particular, deep neural network classifiers. In many practical problems, however, the label information is not given and the goal is instead to discover the underlying structure of the data, for example, its clusters. While powerful methods exist for extracting the cluster structure in data, they typically do not answer the question of why a certain data point has been assigned to a given cluster. We propose a new framework that can, for the first time, explain cluster assignments in terms of input features in an efficient and reliable manner. It is based on the novel insight that clustering models can be rewritten as neural networks, or "neuralized." Cluster predictions of the obtained networks can then be quickly and accurately attributed to the input features. Several showcases demonstrate the ability of our method to assess the quality of learned clusters and to extract novel insights from the analyzed data and representations.
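As one instance of neuralization, consider k-means with centroids mu_1, ..., mu_K. Because (|x - mu_k|^2 - |x - mu_c|^2) / 2 = (mu_c - mu_k)^T x + (|mu_k|^2 - |mu_c|^2) / 2 is linear in x, the evidence for cluster c can be written as a linear layer followed by min-pooling over the competitors k != c, and it is positive exactly when x is assigned to c. The NumPy sketch below shows only this rewriting step under those assumptions; the attribution of the score to input features is not shown, and `neuralized_kmeans_logit` is a hypothetical name.

```python
import numpy as np

def neuralized_kmeans_logit(x, centroids, c):
    """Evidence for assigning x to cluster c, written as a two-layer network.

    Layer 1 (linear): h_k = w_k . x + b_k for every competitor k != c, with
        w_k = mu_c - mu_k and b_k = (|mu_k|^2 - |mu_c|^2) / 2,
        so that h_k = (|x - mu_k|^2 - |x - mu_c|^2) / 2.
    Layer 2 (min-pooling): f_c = min_k h_k, which is positive exactly when
        mu_c is the nearest centroid.
    """
    mu_c = centroids[c]
    others = np.delete(centroids, c, axis=0)
    W = mu_c - others                                          # (K - 1, d)
    b = (np.sum(others ** 2, axis=1) - np.sum(mu_c ** 2)) / 2  # (K - 1,)
    return float(np.min(W @ x + b))
```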

14.
Med Phys ; 49(11): 7262-7277, 2022 Nov.
Article in English | MEDLINE | ID: mdl-35861655

ABSTRACT

PURPOSE: The coronary artery calcification (CAC) score is an independent marker for the risk of cardiovascular events. Automatic methods for quantifying CAC could reduce workload and assist radiologists in clinical decision-making. However, achieving very good model performance requires large annotated training datasets, whose creation is expensive and requires expert knowledge. The amount of training data required can be reduced in an active learning scenario, in which only the most informative samples need to be labeled. Multitask learning techniques can improve model performance by joint learning of multiple related tasks and extraction of shared informative features. METHODS: We propose an uncertainty-weighted multitask learning model for coronary calcium scoring in electrocardiogram-gated (ECG-gated), noncontrast-enhanced cardiac calcium scoring CT. The model was trained to solve the two tasks of coronary artery region segmentation (weak labels) and coronary artery calcification segmentation (strong labels) simultaneously in an active learning scenario to improve model performance and reduce the number of samples needed for training. We compared our model with a single-task U-Net and a sequential-task model as well as other state-of-the-art methods. The model was evaluated on 1275 individual patients in three different datasets (DISCHARGE, CADMAN, orCaScore), and the relationship between model performance and various influencing factors (image noise, metal artifacts, motion artifacts, image quality) was analyzed. RESULTS: Joint learning of multiclass coronary artery region segmentation and binary coronary calcium segmentation improved calcium scoring performance. Since shared information can be learned from both tasks for complementary purposes, the model reached optimal performance with only 12% of the training data and one-third of the labeling time in an active learning scenario. We identified image noise as one of the most important factors influencing model performance, along with anatomical abnormalities and metal artifacts. CONCLUSIONS: Our multitask learning approach with uncertainty-weighted loss improves calcium scoring performance by joint learning of shared features and reduces labeling costs when trained in an active learning scenario.
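The abstract does not give the loss formula. A widely used homoscedastic-uncertainty weighting scheme assigns each task i a learnable log-variance s_i and minimizes the sum over tasks of exp(-s_i) * L_i + s_i. The NumPy sketch below assumes that form (the paper's exact variant may differ) and adds a closed-form check of its optimum; both function names are hypothetical.

```python
import numpy as np

def uncertainty_weighted_loss(task_losses, log_vars):
    """Combine task losses with learned uncertainty weights.

    Each task i contributes exp(-s_i) * L_i + s_i, where s_i = log(sigma_i^2) is
    trainable: a noisy task (large s_i) is down-weighted, and the + s_i term keeps
    every s_i from growing without bound.
    """
    task_losses = np.asarray(task_losses, dtype=float)
    log_vars = np.asarray(log_vars, dtype=float)
    return float(np.sum(np.exp(-log_vars) * task_losses + log_vars))

def optimal_log_vars(task_losses):
    """For fixed losses, d/ds_i [exp(-s_i) L_i + s_i] = 0 gives s_i = log(L_i)."""
    return np.log(np.asarray(task_losses, dtype=float))
```

In training, the s_i would be updated by gradient descent together with the network weights (here, the segmentation and calcification tasks would each get one s_i); the closed-form optimum only shows which balance the weighting drives toward.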


Subjects
Calcium , Vascular Calcification , Humans
15.
PLoS One ; 17(10): e0274291, 2022.
Article in English | MEDLINE | ID: mdl-36256665

ABSTRACT

There is an increasing number of medical use cases where classification algorithms based on deep neural networks reach performance levels that are competitive with human medical experts. To alleviate the challenges of small dataset sizes, these systems often rely on pretraining. In this work, we aim to assess the broader implications of these approaches in order to better understand what type of pretraining works reliably (with respect to performance, robustness, learned representation, etc.) in practice and what type of pretraining dataset is best suited to achieve good performance in small target dataset size scenarios. Considering diabetic retinopathy grading as an exemplary use case, we compare the impact of different training procedures, including recently established self-supervised pretraining methods based on contrastive learning. To this end, we investigate different aspects such as quantitative performance, statistics of the learned feature representations, interpretability and robustness to image distortions. Our results indicate that models initialized from ImageNet pretraining show a significant increase in performance, generalization and robustness to image distortions. In particular, self-supervised models show further benefits over supervised models. Self-supervised models with initialization from ImageNet pretraining not only show higher performance, they also reduce overfitting to large lesions and better take into account minute lesions indicative of the progression of the disease. Understanding the effects of pretraining in a broader sense that goes beyond simple performance comparisons is of crucial importance for the broader medical imaging community beyond the use case considered in this work.
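The abstract refers to self-supervised pretraining based on contrastive learning without naming a specific loss. As a representative example only, the NumPy sketch below implements a SimCLR-style NT-Xent loss on precomputed embeddings of two augmented views; whether the study used this exact objective is an assumption, and `nt_xent_loss` is a hypothetical name.

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """SimCLR-style NT-Xent contrastive loss for two augmented views of a batch.

    z1[i] and z2[i] embed two views of the same image (a positive pair); all other
    2N - 2 embeddings in the combined batch act as negatives for that pair.
    """
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit vectors -> cosine similarity
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)                     # never contrast a sample with itself
    n = len(z1)
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    row_max = sim.max(axis=1, keepdims=True)           # for a numerically stable log-softmax
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(2 * n), positives].mean())
```

Minimizing this loss pulls the two views of each image together and pushes other images apart; the resulting encoder would then be fine-tuned on the small labeled target dataset.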


Subjects
Diabetes Mellitus , Diabetic Retinopathy , Humans , Neural Networks, Computer , Algorithms , Systems Analysis
16.
Int J Epidemiol ; 51(5): 1622-1636, 2022 10 13.
Article in English | MEDLINE | ID: mdl-35526156

ABSTRACT

Nearly all diseases are caused by different combinations of exposures. Yet, most epidemiological studies focus on estimating the effect of a single exposure on a health outcome. We present the Causes of Outcome Learning approach (CoOL), which seeks to discover combinations of exposures that lead to an increased risk of a specific outcome in parts of the population. The approach allows for exposures acting alone and in synergy with others. The road map of CoOL involves (i) a pre-computational phase used to define a causal model; (ii) a computational phase with three steps, namely (a) fitting a non-negative model on an additive scale, (b) decomposing risk contributions and (c) clustering individuals based on the risk contributions into subgroups; and (iii) a post-computational phase on hypothesis development, validation and triangulation using new data before eventually updating the causal model. The computational phase uses a tailored neural network for the non-negative model on an additive scale and layer-wise relevance propagation for the risk decomposition through this model. We demonstrate the approach on simulated and real-life data using the R package 'CoOL'. The presentation focuses on binary exposures and outcomes but can also be extended to other measurement types. This approach encourages and enables researchers to identify combinations of exposures as potential causes of the health outcome of interest. Expanding our ability to discover complex causes could eventually result in more effective, targeted and informed interventions prioritized for their public health impact.


Subjects
Machine Learning , Public Health , Causality , Humans , Outcome Assessment, Health Care
17.
Sci Rep ; 12(1): 18991, 2022 11 08.
Article in English | MEDLINE | ID: mdl-36347879

ABSTRACT

Histological sections of the lymphatic system are usually the basis of static (2D) morphological investigations. Here, we performed a dynamic (4D) analysis of human reactive lymphoid tissue using confocal fluorescent laser microscopy in combination with machine learning. Based on tracks for T-cells (CD3), B-cells (CD20), follicular T-helper cells (PD1) and optical flow of follicular dendritic cells (CD35), we put forward the first quantitative analysis of movement-related and morphological parameters within human lymphoid tissue. We identified correlations between follicular dendritic cell movement and the behavior of lymphocytes in the microenvironment. In addition, we investigated the value of movement and/or morphological parameters for a precise definition of cell types (CD clusters). CD clusters could be determined based on movement and/or morphology. Differentiating between CD3- and CD20-positive cells is most challenging, and long-term movement characteristics are indispensable for it. We propose morphological and movement-related prototypes of cell entities by applying machine learning models. Finally, we define new subgroups within lymphocyte entities, beyond CD clusters, based on long-term movement characteristics. In conclusion, we show that the combination of 4D imaging and machine learning is able to define characteristics of lymphocytes not visible in 2D histology.


Subjects
Dendritic Cells, Follicular , Lymphoid Tissue , Humans , Lymphoid Tissue/pathology , Dendritic Cells, Follicular/metabolism , T-Lymphocytes, Helper-Inducer , Lymphocytes , Machine Learning
18.
IEEE J Biomed Health Inform ; 25(5): 1519-1528, 2021 05.
Article in English | MEDLINE | ID: mdl-32903191

ABSTRACT

Electrocardiography (ECG) is a very common, non-invasive diagnostic procedure, and its interpretation is increasingly supported by algorithms. Progress in the field of automatic ECG analysis has so far been hampered by a lack of appropriate datasets for training as well as a lack of well-defined evaluation procedures to ensure comparability of different algorithms. To alleviate these issues, we put forward the first benchmarking results for the recently published, freely accessible clinical 12-lead ECG dataset PTB-XL, covering a variety of tasks, from different ECG statement prediction tasks to age and sex prediction. Among the investigated deep-learning-based time series classification algorithms, we find that convolutional neural networks, in particular ResNet- and Inception-based architectures, show the strongest performance across all tasks. We find consistent results on the ICBEB2018 challenge ECG dataset and discuss prospects of transfer learning using classifiers pretrained on PTB-XL. These benchmarking results are complemented by deeper insights into the classification algorithm in terms of hidden stratification, model uncertainty and an exploratory interpretability analysis, which provide connecting points for future research on the dataset. Our results emphasize the prospects of deep-learning-based algorithms in the field of ECG analysis, not only in terms of quantitative accuracy but also in terms of further quality metrics that are equally important clinically, such as uncertainty quantification and interpretability. With this resource, we aim to establish the PTB-XL dataset as a resource for structured benchmarking of ECG analysis algorithms and encourage other researchers in the field to join these efforts.


Subjects
Benchmarking , Deep Learning , Electrocardiography , Algorithms , Humans , Neural Networks, Computer
19.
IEEE Trans Neural Netw Learn Syst ; 32(8): 3710-3722, 2021 08.
Article in English | MEDLINE | ID: mdl-32833654

ABSTRACT

Federated learning (FL) is currently the most widely adopted framework for collaborative training of (deep) machine learning models under privacy constraints. Despite its popularity, it has been observed that FL yields suboptimal results if the local clients' data distributions diverge. To address this issue, we present clustered FL (CFL), a novel federated multitask learning (FMTL) framework, which exploits geometric properties of the FL loss surface to group the client population into clusters with jointly trainable data distributions. In contrast to existing FMTL approaches, CFL does not require any modifications to the FL communication protocol, is applicable to general nonconvex objectives (in particular, deep neural networks), does not require the number of clusters to be known a priori, and comes with strong mathematical guarantees on the clustering quality. CFL is flexible enough to handle client populations that vary over time and can be implemented in a privacy-preserving way. As clustering is only performed after FL has converged to a stationary point, CFL can be viewed as a postprocessing method that will always achieve performance greater than or equal to that of conventional FL by allowing clients to arrive at more specialized models. We verify our theoretical analysis in experiments with deep convolutional and recurrent neural networks on commonly used FL datasets.
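The geometric idea behind CFL can be illustrated as follows, under our reading of the method: once FL has reached a stationary point, clients are compared through the cosine similarity of their weight updates and split into two groups so that the largest cross-group similarity is as small as possible. The NumPy sketch below checks that bipartition criterion by brute force, which is only practical for a handful of clients; the function names are hypothetical, and the decision of when to split, as well as the privacy-preserving variant, are not shown.

```python
import itertools
import numpy as np

def cosine_similarity_matrix(updates):
    """Pairwise cosine similarities between flattened client weight updates."""
    U = np.asarray(updates, dtype=float)
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    return U @ U.T

def best_bipartition(sim):
    """Split clients into two groups minimizing the largest cross-group similarity.

    Brute force over all bipartitions, so only suitable for a few clients.
    """
    m = len(sim)
    best, best_score = None, np.inf
    for r in range(1, m // 2 + 1):
        for group in itertools.combinations(range(m), r):
            rest = [i for i in range(m) if i not in group]
            score = sim[np.ix_(list(group), rest)].max()
            if score < best_score:
                best, best_score = (set(group), set(rest)), score
    return best, best_score
```

Clients whose data distributions pull the shared model in opposite directions end up with negatively correlated updates and are therefore separated first.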

20.
Neural Netw ; 137: 1-17, 2021 May.
Article in English | MEDLINE | ID: mdl-33515855

ABSTRACT

Adversarial attacks on deep learning models have compromised their performance considerably. As remedies, a number of defense methods were proposed, which, however, have been circumvented by newer and more sophisticated attacking strategies. In the midst of this ensuing arms race, robustness against adversarial attacks still remains a challenging problem. This paper proposes a novel, simple yet effective defense strategy where off-manifold adversarial samples are driven towards high-density regions of the data-generating distribution of the (unknown) target class by the Metropolis-adjusted Langevin algorithm (MALA) with the perceptual boundary taken into account. To achieve this, we introduce a generative model of the conditional distribution of the inputs given labels that can be learned through a supervised Denoising Autoencoder (sDAE) in alignment with a discriminative classifier. Our algorithm, called MALA for DEfense (MALADE), is equipped with significant dispersion: the projection is distributed broadly. This prevents white-box attacks from accurately aligning the input to create an adversarial sample effectively. MALADE is applicable to any existing classifier, providing robust defense as well as off-manifold sample detection. In our experiments, MALADE exhibited state-of-the-art performance against various elaborate attacking strategies.
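For reference, a minimal NumPy sketch of the Metropolis-adjusted Langevin algorithm itself follows. It assumes an explicit (possibly unnormalized) log density and its gradient, whereas MALADE obtains the gradient from a supervised denoising autoencoder; the function name and arguments are illustrative.

```python
import numpy as np

def mala(log_p, grad_log_p, x0, n_steps, step, rng):
    """Metropolis-adjusted Langevin algorithm (MALA); returns the sampled chain.

    log_p may be unnormalized. Each iteration proposes a Langevin step and
    accepts it with the Metropolis-Hastings probability, which corrects the
    discretization error of plain Langevin dynamics.
    """
    def log_q(x_to, x_from):
        # Log density (up to a constant) of the Gaussian Langevin proposal.
        mean = x_from + 0.5 * step * grad_log_p(x_from)
        return -np.sum((x_to - mean) ** 2) / (2.0 * step)

    x = np.asarray(x0, dtype=float)
    chain = []
    for _ in range(n_steps):
        proposal = x + 0.5 * step * grad_log_p(x) + np.sqrt(step) * rng.normal(size=x.shape)
        log_alpha = (log_p(proposal) + log_q(x, proposal)) - (log_p(x) + log_q(proposal, x))
        if np.log(rng.uniform()) < log_alpha:
            x = proposal
        chain.append(x.copy())
    return np.array(chain)
```

For a standard Gaussian target, pass `log_p=lambda x: -0.5 * np.sum(x ** 2)` and `grad_log_p=lambda x: -x`; after a short burn-in the chain's mean and variance approach 0 and 1.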


Subjects
Computer Security , Deep Learning/standards