ABSTRACT
Sudden cardiac death from arrhythmia is a major cause of mortality worldwide. Here, we develop a novel deep learning (DL) approach that blends neural networks and survival analysis to predict patient-specific survival curves from contrast-enhanced cardiac magnetic resonance images and clinical covariates for patients with ischemic heart disease. The DL-predicted survival curves offer accurate predictions at times up to 10 years and allow for estimation of uncertainty in predictions. The performance of this learning architecture was evaluated on multi-center internal validation data and tested on an independent test set, achieving concordance indices of 0.83 and 0.74 and 10-year integrated Brier scores of 0.12 and 0.14, respectively. We demonstrate that our DL approach with only raw cardiac images as input outperforms standard survival models constructed using clinical covariates. This technology has the potential to transform clinical decision-making by offering accurate and generalizable predictions of patient-specific survival probabilities of arrhythmic death over time.
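As a rough illustration of the concordance index reported above, the following Python sketch computes Harrell's C for right-censored data. The function name and the toy data are hypothetical, and the estimator actually used in the study may differ; this is only meant to show what the metric measures.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    times:       observed follow-up times
    events:      1 if the event was observed, 0 if censored
    risk_scores: higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # earlier event had higher risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # ties count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): four patients, the third one censored.
times = [1.0, 2.0, 3.0, 4.0]
events = [1, 1, 0, 1]
risk = [0.9, 0.7, 0.4, 0.1]
c_index = concordance_index(times, events, risk)
```

A value of 1.0 means every comparable pair is ranked correctly, and 0.5 corresponds to random ranking.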
ABSTRACT
Background: Visualizing fibrosis on cardiac magnetic resonance (CMR) imaging with contrast enhancement (late gadolinium enhancement; LGE) is paramount in characterizing disease progression and identifying arrhythmia substrates. Segmentation and fibrosis quantification from LGE-CMR is intensive, manual, and prone to interobserver variability. There is an unmet need for automated LGE-CMR image segmentation that ensures anatomical accuracy and seamless extraction of clinical features. Objective: This study aimed to develop a novel deep learning solution for analysis of contrast-enhanced CMR images that produces anatomically accurate myocardium and scar/fibrosis segmentations and uses these to calculate features of clinical interest. Methods: Data sources were 155 2-dimensional LGE-CMR patient scans (1124 slices) and 246 synthetic "LGE-like" scans (1360 slices) obtained from cine CMR using a novel style-transfer algorithm. We trained and tested a 3-stage neural network that identified the left ventricle (LV) region of interest (ROI), segmented the ROI into viable myocardium and regions of enhancement, and postprocessed the segmentation results to enforce conformity with anatomical constraints. The segmentations were used to directly compute clinical features, such as LV volume and scar burden. Results: Predicted LV and scar segmentations achieved 96% and 75% balanced accuracy, respectively, and Dice coefficients of 0.93 and 0.57 when compared to trained expert segmentations. The mean scar burden difference between manual and predicted segmentations was 2%. Conclusion: We developed and validated a deep neural network for automatic, anatomically accurate expert-level LGE-CMR myocardium and scar/fibrosis segmentation, allowing direct calculation of clinical measures. Given the training set heterogeneity, our approach could be extended to multiple imaging modalities and patient pathologies.
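The evaluation metrics named in the Results can be sketched on binary masks as follows. This is a minimal Python example, not the study's code: the function names are hypothetical, and the definition of scar burden as scar pixels divided by all myocardial pixels (viable plus scar) is an assumption.

```python
import numpy as np


def dice(pred, truth):
    """Dice coefficient: 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    pred, truth = np.asarray(pred, bool), np.asarray(truth, bool)
    denom = pred.sum() + truth.sum()
    return 2.0 * np.logical_and(pred, truth).sum() / denom if denom else 1.0


def balanced_accuracy(pred, truth):
    """Mean of sensitivity and specificity (both classes must be present)."""
    pred, truth = np.asarray(pred, bool), np.asarray(truth, bool)
    sensitivity = np.logical_and(pred, truth).sum() / truth.sum()
    specificity = np.logical_and(~pred, ~truth).sum() / (~truth).sum()
    return 0.5 * (sensitivity + specificity)


def scar_burden(scar_mask, myocardium_mask):
    """Assumed definition: scar pixels as a fraction of all myocardial pixels."""
    scar, myo = np.asarray(scar_mask, bool), np.asarray(myocardium_mask, bool)
    return np.logical_and(scar, myo).sum() / myo.sum()


# Toy 2x2 masks (hypothetical).
pred = [[1, 1], [0, 0]]
truth = [[1, 0], [0, 0]]
d = dice(pred, truth)                    # 2*1 / (2+1)
ba = balanced_accuracy(pred, truth)      # (1 + 2/3) / 2
sb = scar_burden(truth, [[1, 1], [1, 1]])
```

Balanced accuracy is less sensitive than plain accuracy to the large number of background pixels, which is one plausible reason to report it alongside Dice for small structures such as scar.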
ABSTRACT
To solve key biomedical problems, experimentalists now routinely measure millions or billions of features (dimensions) per sample, with the hope that data science techniques will be able to build accurate data-driven inferences. Because sample sizes are typically orders of magnitude smaller than the dimensionality of these data, valid inferences require finding a low-dimensional representation that preserves the discriminating information (e.g., whether the individual suffers from a particular disease). There is a lack of interpretable supervised dimensionality reduction methods that scale to millions of dimensions with strong statistical theoretical guarantees. We introduce an approach to extending principal components analysis by incorporating class-conditional moment estimates into the low-dimensional projection. The simplest version, Linear Optimal Low-Rank Projection, incorporates the class-conditional means. We prove, and substantiate with both synthetic and real data benchmarks, that Linear Optimal Low-Rank Projection and its generalizations lead to improved data representations for subsequent classification, while maintaining computational efficiency and scalability. Using multiple brain imaging datasets consisting of more than 150 million features, and several genomics datasets with more than 500,000 features, Linear Optimal Low-Rank Projection outperforms other scalable linear dimensionality reduction techniques in terms of accuracy, while only requiring a few minutes on a standard desktop computer.
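The idea of adding class-conditional means to a PCA-style projection can be sketched for two classes as below. This is a simplified Python illustration under stated assumptions, not the published implementation: the function name is hypothetical, the first projection direction is taken to be the normalized difference of class means, and the remaining directions are the top principal directions of the class-centered data.

```python
import numpy as np


def lol_projection(X, y, d):
    """Return a (d, p) projection matrix for two-class data X of shape (n, p)."""
    X, y = np.asarray(X, float), np.asarray(y)
    classes = np.unique(y)
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    mu0 = X[y == classes[0]].mean(axis=0)
    mu1 = X[y == classes[1]].mean(axis=0)
    delta = mu1 - mu0
    delta = delta / np.linalg.norm(delta)     # direction between class means
    Xc = X.copy()
    Xc[y == classes[0]] -= mu0                # center each sample by its own
    Xc[y == classes[1]] -= mu1                # class mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return np.vstack([delta, Vt[: d - 1]])    # mean direction + top d-1 PCs


# Synthetic example (hypothetical): 40 samples, 200 features, 2 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 200))
y = np.repeat([0, 1], 20)
X[y == 1, :5] += 2.0                          # class signal in 5 features
A = lol_projection(X, y, d=3)
Z = X @ A.T                                   # low-dimensional representation
```

Because the SVD is applied to an n-by-p matrix with n much smaller than p, the cost scales with the number of samples rather than with the square of the dimension, which is consistent with the scalability claim in the abstract.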
ABSTRACT
Particle- and agent-based systems are a ubiquitous modeling tool in many disciplines. We consider the fundamental problem of inferring the governing structure, i.e. interaction kernels, in a nonparametric fashion, from observations of agent-based dynamical systems. In particular, we are interested in collective dynamical systems exhibiting emergent behaviors with complicated interaction kernels, and for kernels which are parameterized by a single unknown parameter. This work extends the estimators introduced in Lu et al. (2019), which are based on suitably regularized least squares estimators, to these larger classes of systems. We provide extensive numerical evidence that the estimators provide faithful approximations to the interaction kernels, and provide accurate predictions for trajectories started at new initial conditions, both throughout the "training" time interval in which the observations were made, and often much beyond. We demonstrate these features on prototypical systems displaying collective behaviors, ranging from opinion dynamics, flocking dynamics, self-propelling particle dynamics, synchronized oscillator dynamics, to a gravitational system. Our experiments also suggest that our estimated systems can display the same emergent behaviors as the observed systems, including those that occur at larger timescales than those in the training data. Finally, in the case of families of systems governed by a parametric family of interaction kernels, we introduce novel estimators that estimate the parametric family of kernels, splitting it into a common interaction kernel and the action of parameters. We demonstrate this in the case of gravity, by learning both the "common component" 1/r² and the dependency on mass, without any a priori knowledge of either one, from observations of planetary motions in our solar system.
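A least squares estimator of an interaction kernel can be sketched as follows. The setup is an assumption made for illustration and is not taken from the abstract: a first-order system dx_i/dt = (1/N) Σ_j φ(|x_j − x_i|)(x_j − x_i), velocities observed exactly, φ approximated by a piecewise-constant function on distance bins, and no regularization. All function names are hypothetical.

```python
import numpy as np


def true_velocity(x, phi):
    """Velocities of N agents at positions x (N, d) under kernel phi."""
    diff = x[None, :, :] - x[:, None, :]          # diff[i, j] = x_j - x_i
    r = np.linalg.norm(diff, axis=2)
    return (phi(r)[:, :, None] * diff).sum(axis=1) / x.shape[0]


def fit_kernel(X, V, edges):
    """Least squares fit of a piecewise-constant kernel on bins `edges`.

    X, V: positions and velocities, arrays of shape (M, N, d).
    Returns one coefficient per bin.
    """
    M, N, d = X.shape
    K = len(edges) - 1
    blocks = []
    for m in range(M):
        diff = X[m][None, :, :] - X[m][:, None, :]
        r = np.linalg.norm(diff, axis=2)
        bins = np.digitize(r, edges) - 1          # bin index of each pair
        B = np.zeros((N, d, K))
        for k in range(K):
            mask = bins == k
            # contribution of basis function k to each agent's velocity
            B[:, :, k] = (diff * mask[:, :, None]).sum(axis=1) / N
        blocks.append(B.reshape(N * d, K))
    A = np.vstack(blocks)
    coef, *_ = np.linalg.lstsq(A, V.reshape(-1), rcond=None)
    return coef


# Synthetic check (hypothetical): kernel equal to 2 for r < 1, else 0.5.
phi = lambda r: np.where(r < 1.0, 2.0, 0.5)
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.5, size=(20, 10, 2))       # 20 snapshots, 10 agents, 2-D
V = np.stack([true_velocity(x, phi) for x in X])
coef = fit_kernel(X, V, edges=np.array([0.0, 1.0, 3.0]))
```

With noisy or finite-difference velocities one would typically add regularization and choose the number of bins according to the amount of data, in the spirit of the "suitably regularized" estimators mentioned above.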
ABSTRACT
BACKGROUND: Transition zones between healthy myocardium and scar form a spatially complex substrate that may give rise to reentrant ventricular arrhythmias (VAs). We sought to assess the utility of a novel machine learning approach for quantifying 3-dimensional spatial complexity of grayscale patterns on late gadolinium enhanced cardiac magnetic resonance images to predict VAs in patients with ischemic cardiomyopathy. METHODS: One hundred twenty-two consecutive ischemic cardiomyopathy patients with left ventricular ejection fraction ≤35% without prior history of VAs underwent late gadolinium enhanced cardiac magnetic resonance imaging. From raw grayscale data, we generated graphs encoding the 3-dimensional geometry of the left ventricle. A novel technique, adapted to these graphs, assessed global regularity of signal intensity patterns using Fourier-like analysis and generated a substrate spatial complexity profile for each patient. A machine learning statistical algorithm was employed to discern which substrate spatial complexity profiles correlated with VA events (appropriate implantable cardioverter-defibrillator firings and arrhythmic sudden cardiac death) at 5 years of follow-up. From the statistical machine learning results, a complexity score ranging from 0 to 1 was calculated for each patient and tested using multivariable Cox regression models. RESULTS: At 5 years of follow-up, 40 patients had VA events. The machine learning algorithm classified with 81% overall accuracy and correctly classified 86% of those without VAs. Overall negative predictive value was 91%. Average complexity score was significantly higher in patients with VA events versus those without (0.5±0.5 versus 0.1±0.2; P<0.0001) and was independently associated with VA events in a multivariable model (hazard ratio, 1.5 [1.2-2.0]; P=0.002).
CONCLUSIONS: Substrate spatial complexity analysis of late gadolinium enhanced cardiac magnetic resonance images may be helpful in refining VA risk in patients with ischemic cardiomyopathy, particularly to identify low-risk patients who may not benefit from prophylactic implantable cardioverter-defibrillator therapy. Visual Overview: A visual overview is available for this article.
Subjects
Cardiac Arrhythmias/etiology, Cardiomyopathies/diagnostic imaging, Computer-Assisted Diagnosis, Machine Learning, Magnetic Resonance Imaging, Myocardial Ischemia/complications, Action Potentials, Aged, Cardiac Arrhythmias/diagnosis, Cardiac Arrhythmias/physiopathology, Cardiomyopathies/complications, Cardiomyopathies/physiopathology, Contrast Media/administration & dosage, Sudden Cardiac Death/etiology, Female, Fourier Analysis, Gadolinium DTPA/administration & dosage, Heart Rate, Humans, Three-Dimensional Imaging, Male, Middle Aged, Myocardial Ischemia/diagnostic imaging, Myocardial Ischemia/physiopathology, Predictive Value of Tests, Prognosis, Registries, Retrospective Studies, Risk Assessment, Risk Factors, Stroke Volume, United States, Left Ventricular Function
ABSTRACT
Inferring the laws of interaction in agent-based systems from observational data is a fundamental challenge in a wide variety of disciplines. We propose a nonparametric statistical learning approach for distance-based interactions, with no reference or assumption on their analytical form, given data consisting of sampled trajectories of interacting agents. We demonstrate the effectiveness of our estimators both by providing theoretical guarantees that avoid the curse of dimensionality and by testing them on a variety of prototypical systems used in various disciplines. These systems include homogeneous and heterogeneous agent systems, ranging from particle systems in fundamental physics to agent-based systems that model opinion dynamics under the social influence, prey-predator dynamics, flocking and swarming, and phototaxis in cell dynamics.
Subjects
Data Science/methods, Datasets as Topic, Statistical Data Interpretation, Nonparametric Statistics
ABSTRACT
Understanding the relationships between different properties of data, such as whether a genome or connectome has information about disease status, is increasingly important. While existing approaches can test whether two properties are related, they may require unfeasibly large sample sizes and often are not interpretable. Our approach, 'Multiscale Graph Correlation' (MGC), is a dependence test that juxtaposes disparate data science techniques, including k-nearest neighbors, kernel methods, and multiscale analysis. Other methods may require double or triple the number of samples to achieve the same statistical power as MGC in a benchmark suite including high-dimensional and nonlinear relationships, with dimensionality ranging from 1 to 1000. Moreover, MGC uniquely characterizes the latent geometry underlying the relationship, while maintaining computational efficiency. In real data, including brain imaging and cancer genetics, MGC detects the presence of a dependency and provides guidance for the next experiments to conduct.
Subjects
Algorithms, Data Analysis, Tumor Biomarkers/metabolism, Brain/diagnostic imaging, Brain/physiology, Computer Simulation, Humans, Neoplasms/metabolism, Sample Size
ABSTRACT
The largest gaps in the human genome assembly correspond to multi-megabase heterochromatic regions composed primarily of two related families of tandem repeats, Human Satellites 2 and 3 (HSat2,3). The abundance of repetitive DNA in these regions challenges standard mapping and assembly algorithms, and as a result, the sequence composition and potential biological functions of these regions remain largely unexplored. Furthermore, existing genomic tools designed to predict consensus-based descriptions of repeat families cannot be readily applied to complex satellite repeats such as HSat2,3, which lack a consistent repeat unit reference sequence. Here we present an alignment-free method to characterize complex satellites using whole-genome shotgun read datasets. Utilizing this approach, we classify HSat2,3 sequences into fourteen subfamilies and predict their chromosomal distributions, resulting in a comprehensive satellite reference database to further enable genomic studies of heterochromatic regions. We also identify 1.3 Mb of non-repetitive sequence interspersed with HSat2,3 across 17 unmapped assembly scaffolds, including eight annotated gene predictions. Finally, we apply our satellite reference database to high-throughput sequence data from 396 males to estimate array size variation of the predominant HSat3 array on the Y chromosome, confirming that satellite array sizes can vary between individuals over an order of magnitude (7 to 98 Mb) and further demonstrating that array sizes are distributed differently within distinct Y haplogroups. In summary, we present a novel framework for generating initial reference databases for unassembled genomic regions enriched with complex satellite DNA, and we further demonstrate the utility of these reference databases for studying patterns of sequence variation within human populations.
Subjects
Chromosome Mapping/methods, Human Y Chromosomes/genetics, Satellite DNA/genetics, Human Genome/genetics, Heterochromatin/genetics, DNA Sequence Analysis/methods, Base Sequence, Humans, Molecular Sequence Data
ABSTRACT
Recently, we have described a strong association of branched-chain amino acids (BCAA) and aromatic amino acids (AAA) with obesity and insulin resistance. In the current study, we have investigated the potential impact of BCAA on behavioral functions. We demonstrate that supplementation of either a high-sucrose or a high-fat diet with BCAA induces anxiety-like behavior in rats compared with control groups fed on unsupplemented diets. These behavioral changes are associated with a significant decrease in the concentration of tryptophan (Trp) in brain tissues and a consequent decrease in serotonin but no difference in indices of serotonin synaptic function. The anxiety-like behaviors and decreased levels of Trp in the brain of BCAA-fed rats were reversed by supplementation of Trp in the drinking water but not by administration of fluoxetine, a selective serotonin reuptake inhibitor, suggesting that the behavioral changes are independent of the serotonergic pathway of Trp metabolism. Instead, BCAA supplementation lowers the brain levels of another Trp-derived metabolite, kynurenic acid, and these levels are normalized by Trp supplementation. We conclude that supplementation of high-energy diets with BCAA causes neurobehavioral impairment. Since BCAA are elevated spontaneously in human obesity, our studies suggest a potential mechanism for explaining the strong association of obesity and mood disorders.
Subjects
Branched-Chain Amino Acids/adverse effects, Anxiety/etiology, Brain/metabolism, Diet/adverse effects, Neurons/metabolism, Branched-Chain Amino Acids/blood, Animals, Anxiety/metabolism, Anxiety/physiopathology, Anxiety/prevention & control, Animal Behavior, Brain/physiopathology, High-Fat Diet/adverse effects, Dietary Sucrose/adverse effects, Exploratory Behavior, Kynurenic Acid/metabolism, Male, Mood Disorders/etiology, Obesity/etiology, Obesity/psychology, Rats, Wistar Rats, Serotonin/metabolism, Tryptophan/metabolism, Tryptophan/therapeutic use, Weight Gain
ABSTRACT
We present a multiscale method for the determination of collective reaction coordinates for macromolecular dynamics based on two recently developed mathematical techniques: diffusion map and the determination of local intrinsic dimensionality of large datasets. Our method accounts for the local variation of molecular configuration space, and the resulting global coordinates are correlated with the time scales of the molecular motion. To illustrate the approach, we present results for two model systems: all-atom alanine dipeptide and coarse-grained src homology 3 protein domain. We provide clear physical interpretation for the emerging coordinates and use them to calculate transition rates. The technique is general enough to be applied to any system for which a Boltzmann-sampled set of molecular configurations is available.
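A basic diffusion map construction can be sketched in Python as below. This is a generic illustration rather than the authors' multiscale method: the function name is hypothetical, and the kernel form exp(−d²/(2 ε_i ε_j)) with per-point scales ε_i is an assumption used here to suggest how local scales could enter. Passing a single number for `eps` gives the standard single-scale kernel.

```python
import numpy as np


def diffusion_map(X, eps):
    """Eigenvalues and eigenvectors of the diffusion (Markov) matrix.

    X:   configurations, array of shape (n, p)
    eps: a single scale or one scale per point (array of length n)
    Returns eigenvalues in decreasing order and the matching right
    eigenvectors; column 0 is the trivial constant mode.
    """
    X = np.asarray(X, float)
    n = X.shape[0]
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    eps = np.broadcast_to(np.asarray(eps, float), (n,))
    K = np.exp(-sq_dist / (2.0 * np.outer(eps, eps)))
    deg = K.sum(axis=1)
    # Symmetric matrix with the same spectrum as the Markov matrix D^-1 K.
    S = K / np.sqrt(np.outer(deg, deg))
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    psi = vecs / np.sqrt(deg)[:, None]        # right eigenvectors of D^-1 K
    return vals, psi


# Toy data (hypothetical): two clusters of 1-D configurations.
a = np.linspace(-0.2, 0.2, 10)
X = np.concatenate([a, 3.0 + a])[:, None]
vals, psi = diffusion_map(X, eps=1.0)
slow_coordinate = psi[:, 1]                   # first nontrivial coordinate
```

In this toy example the first nontrivial coordinate takes opposite signs on the two clusters, which is the sense in which the leading diffusion coordinates track the slowest transitions in the data.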
ABSTRACT
A recent study on the dynamics of polymer reversal inside a nanopore by Huang and Makarov [J. Chem. Phys. 128, 114903 (2008)] demonstrated that the reaction rate cannot be reproduced by projecting the dynamics onto a single empirical reaction coordinate, a result suggesting the dynamics of this system cannot be correctly described by using a single collective coordinate. To further investigate this possibility we have applied our recently developed multiscale framework, locally scaled diffusion map (LSDMap), to obtain collective reaction coordinates for this system. Using a single diffusion coordinate, we obtain a reversal rate via Kramers expression that is in good agreement with the exact rate obtained from the simulations. Our mathematically rigorous approach accounts for the local heterogeneity of molecular configuration space in constructing a diffusion map, from which collective coordinates emerge. We believe this approach can be applied in general to characterize complex macromolecular dynamics by providing an accurate definition of the collective coordinates associated with processes at different time scales.
Subjects
Physical Chemistry/methods, Molecular Dynamics Simulation, Polymers/chemistry, Algorithms, Diffusion, Kinetics, Molecular Conformation, Nanopores, Thermodynamics, Time Factors
ABSTRACT
We use heat kernels or eigenfunctions of the Laplacian to construct local coordinates on large classes of Euclidean domains and Riemannian manifolds (not necessarily smooth, e.g., with C^α metric). These coordinates are bi-Lipschitz on large neighborhoods of the domain or manifold, with constants controlling the distortion and the size of the neighborhoods that depend only on natural geometric properties of the domain or manifold. The proof of these results relies on novel estimates, from above and below, for the heat kernel and its gradient, as well as for the eigenfunctions of the Laplacian and their gradient, that hold in the non-smooth category, and are stable with respect to perturbations within this category. Finally, these coordinate systems are intrinsic and efficiently computable, and are of value in applications.
ABSTRACT
Harmonic analysis on manifolds and graphs has recently led to mathematical developments in the field of data analysis. The resulting new tools can be used to compress and analyze large and complex data sets, such as those derived from sensor networks or neuronal activity datasets, obtained in the laboratory or through computer modeling. The nature of the algorithms (based on diffusion maps and connectivity strengths on graphs) possesses a certain analogy with neural information processing, and has the potential to provide inspiration for modeling and understanding biological organization in perception and memory formation.
Subjects
Neurological Models, Theoretical Models, Computer Neural Networks, Algorithms
ABSTRACT
Recurrence of hepatitis B impairs the outcome of liver transplantation (OLT). In serum hepatitis B virus (HBV)-DNA-positive recipients, prophylaxis using lamivudine and immunoglobulins (HBIg) reduces the risk of recurrence, but it is undefined whether this regimen also protects candidates with YMDD mutants. Seventeen OLT viraemic candidates received pre-emptive lamivudine followed by post-OLT prophylaxis with lamivudine and HBIg. Both sera and liver biopsies were prospectively collected and a highly sensitive polymerase chain reaction (PCR) assay was applied for HBV-DNA detection. Finally, the presence of YMDD mutants was explored in all PCR-positive samples. All patients remained hepatitis B recurrence-free after a mean follow-up of 32 months. By PCR, serum HBV-DNA was detectable in 64.3% of cases at OLT-baseline, in 64.7% under combined prophylaxis and in 58.8% of the patients (70.5% of the total) with a minimum follow-up of 24 months. At OLT-baseline, YMDD mutants were found in 44.4% of patients. After OLT, mutants were present in 50% of patients but only in 16.6% of cases in the long period. Although 41% of the native livers and 42.8% of the analysed grafts harboured HBV-DNA, YMDD mutants were detected in 57% of the native positive livers. YMDD mutants were largely detected both at OLT-baseline and post-OLT, but their presence decreased over time. Regardless of the presence of YMDD mutants, no hepatitis B recurrence was observed in our OLT recipients using pre-emptive lamivudine followed by continuous prophylaxis with lamivudine and HBIg.
Subjects
Antiviral Agents/administration & dosage, Hepatitis B/prevention & control, Liver Transplantation/adverse effects, Adult, Viral DNA/blood, Viral DNA/genetics, Viral Drug Resistance/genetics, Combination Drug Therapy, Hepatitis B/etiology, Hepatitis B/virology, Hepatitis B Antibodies/administration & dosage, Hepatitis B virus/drug effects, Hepatitis B virus/genetics, Humans, Immunosuppressive Agents/therapeutic use, Lamivudine/administration & dosage, Male, Middle Aged, Mutation, Prospective Studies, Recurrence
ABSTRACT
An approximating neural model, called hierarchical radial basis function (HRBF) network, is presented here. This is a self-organizing (by growing) multiscale version of a radial basis function (RBF) network. It is constituted of hierarchical layers, each containing a Gaussian grid at a decreasing scale. The grids are not completely filled, but units are inserted only where the local error is over threshold. This guarantees a uniform residual error and the allocation of more units with smaller scales where the data contain higher frequencies. Only local operations, which do not require any iteration on the data, are required; this allows the network to be constructed in quasi-real time. Through harmonic analysis, it is demonstrated that, although an HRBF cannot be reduced to a traditional wavelet-based multiresolution analysis (MRA), it does employ Riesz bases and enjoys asymptotic approximation properties for a very large class of functions. HRBF networks have been extensively applied to the reconstruction of three-dimensional (3-D) models from noisy range data. The results illustrate their power in denoising the original data, obtaining an effective multiscale reconstruction of better quality than that obtained by MRA.