Results 1 - 20 of 27
1.
Bioinformatics ; 38(19): 4554-4561, 2022 09 30.
Article in English | MEDLINE | ID: mdl-35929808

ABSTRACT

MOTIVATION: In many biomedical studies, there arises the need to integrate data from multiple directly or indirectly related sources. Collective matrix factorization (CMF) and its variants are models designed to collectively learn from arbitrary collections of matrices. The latent factors learnt are rich integrative representations that can be used in downstream tasks, such as clustering or relation prediction with standard machine-learning models. Previous CMF-based methods have numerous modeling limitations. They do not adequately capture complex non-linear interactions and do not explicitly model varying sparsity and noise levels in the inputs, and some cannot model inputs with multiple datatypes. These inadequacies limit their use on many biomedical datasets. RESULTS: To address these limitations, we develop Neural Collective Matrix Factorization (NCMF), the first fully neural approach to CMF. We evaluate NCMF on relation prediction tasks of gene-disease association prediction and adverse drug event prediction, using multiple datasets. In each case, data are obtained from heterogeneous publicly available databases and used to learn representations to build predictive models. NCMF is found to outperform previous CMF-based methods and several state-of-the-art graph embedding methods for representation learning in our experiments. Our experiments illustrate the versatility and efficacy of NCMF in representation learning for seamless integration of heterogeneous data. AVAILABILITY AND IMPLEMENTATION: https://github.com/ajayago/NCMF_bioinformatics. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
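For orientation, the sketch below shows the classical (non-neural) collective matrix factorization setup that NCMF generalizes: two toy relation matrices share the latent factors of one entity and are fit jointly by gradient descent. All names, dimensions, and data here are illustrative assumptions, not the NCMF implementation from the paper.

```python
# Minimal sketch of classical collective matrix factorization (CMF) on toy data.
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_diseases, n_drugs, k = 50, 30, 20, 8
X_gd = rng.random((n_genes, n_diseases))   # toy gene-disease association matrix
X_dr = rng.random((n_diseases, n_drugs))   # toy disease-drug relation matrix

U_gene = rng.normal(scale=0.1, size=(n_genes, k))
U_dis = rng.normal(scale=0.1, size=(n_diseases, k))   # shared by both reconstructions
U_drug = rng.normal(scale=0.1, size=(n_drugs, k))

lr = 0.01
for _ in range(1000):
    R_gd = U_gene @ U_dis.T - X_gd          # residual of each reconstruction
    R_dr = U_dis @ U_drug.T - X_dr
    # gradients of 0.5*||R_gd||^2 + 0.5*||R_dr||^2
    g_gene = R_gd @ U_dis
    g_drug = R_dr.T @ U_dis
    g_dis = R_gd.T @ U_gene + R_dr @ U_drug
    U_gene -= lr * g_gene
    U_drug -= lr * g_drug
    U_dis -= lr * g_dis

loss = 0.5 * ((U_gene @ U_dis.T - X_gd) ** 2).sum() + 0.5 * ((U_dis @ U_drug.T - X_dr) ** 2).sum()
print("joint reconstruction loss:", loss)
```

The shared factor matrix U_dis is what makes the factorization "collective": the same disease representations must explain both relation matrices.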


Subjects
Machine Learning, Factual Databases
2.
Bioinformatics ; 37(8): 1083-1092, 2021 05 23.
Article in English | MEDLINE | ID: mdl-33135733

ABSTRACT

MOTIVATION: The study of the evolutionary history of biological networks enables deep functional understanding of various bio-molecular processes. Network growth models, such as the Duplication-Mutation with Complementarity (DMC) model, provide a principled approach to characterizing the evolution of protein-protein interactions (PPIs) based on duplication and divergence. Current methods for model-based ancestral network reconstruction primarily use greedy heuristics and yield sub-optimal solutions. RESULTS: We present a new Integer Linear Programming (ILP) solution for maximum likelihood reconstruction of ancestral PPI networks using the DMC model. We prove the correctness of our formulation, which is guaranteed to find the optimal solution. It can also use efficient heuristics from general-purpose ILP solvers to obtain multiple optimal and near-optimal solutions that may be useful in many applications. Experiments on synthetic data show that our ILP obtains solutions with higher likelihood than those from previous methods, and is robust to noise and model mismatch. We evaluate our algorithm on two real PPI networks, with proteins from the families of bZIP transcription factors and the Commander complex. On both networks, solutions from our ILP have higher likelihood and are in better agreement with independent biological evidence from other studies. AVAILABILITY AND IMPLEMENTATION: A Python implementation is available at https://bitbucket.org/cdal/network-reconstruction. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms, Linear Programming, Probability, Proteins
3.
Bioinformatics ; 36(Suppl_1): i3-i11, 2020 07 01.
Article in English | MEDLINE | ID: mdl-32657364

ABSTRACT

MOTIVATION: Metagenomics studies have provided key insights into the composition and structure of microbial communities found in different environments. Among the techniques used to analyse metagenomic data, binning is considered a crucial step to characterize the different species of micro-organisms present. The use of short-read data in most binning tools poses several limitations, such as insufficient species-specific signal; the emergence of long-read sequencing technologies offers opportunities to surmount them. However, most current metagenomic binning tools have been developed for short reads. The few tools that can process long reads either do not scale with increasing input size or require a database of reference genomes that are often unknown. In this article, we present MetaBCC-LR, a scalable reference-free binning method which clusters long reads directly based on their k-mer coverage histograms and oligonucleotide composition. RESULTS: We evaluate MetaBCC-LR on multiple simulated and real metagenomic long-read datasets with varying coverages and error rates. Our experiments demonstrate that MetaBCC-LR substantially outperforms state-of-the-art reference-free binning tools, achieving ∼13% improvement in F1-score and ∼30% improvement in ARI compared to the best previous tools. Moreover, we show that using MetaBCC-LR before long-read assembly helps to enhance the assembly quality while significantly reducing the assembly cost in terms of time and memory usage. The efficiency and accuracy of MetaBCC-LR pave the way for more effective long-read-based metagenomics analyses to support a wide range of applications. AVAILABILITY AND IMPLEMENTATION: The source code is freely available at: https://github.com/anuradhawick/MetaBCC-LR. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
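As a rough illustration of composition-based binning (one of the two signals MetaBCC-LR uses), the toy sketch below computes tetranucleotide frequency profiles per read and clusters them with k-means; the coverage-histogram step and the actual MetaBCC-LR pipeline are not reproduced here.

```python
# Toy composition-based binning: tetranucleotide profiles + k-means.
from itertools import product
import numpy as np
from sklearn.cluster import KMeans

KMERS = ["".join(p) for p in product("ACGT", repeat=4)]
IDX = {k: i for i, k in enumerate(KMERS)}

def composition_vector(read: str) -> np.ndarray:
    """Normalized tetranucleotide frequency profile of one read."""
    v = np.zeros(len(KMERS))
    for i in range(len(read) - 3):
        kmer = read[i:i + 4]
        if kmer in IDX:               # skip k-mers containing N, etc.
            v[IDX[kmer]] += 1
    total = v.sum()
    return v / total if total > 0 else v

reads = ["ACGTACGTGGCCAATT" * 10, "TTTTAAAACCCCGGGG" * 10, "ACGTACGTGGCCAATT" * 10]
X = np.vstack([composition_vector(r) for r in reads])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)   # reads with similar composition land in the same bin
```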


Subjects
Algorithms, Metagenomics, Metagenome, DNA Sequence Analysis, Software
4.
Bioinformatics ; 36(2): 621-628, 2020 01 15.
Article in English | MEDLINE | ID: mdl-31368480

ABSTRACT

MOTIVATION: The identification of sub-populations of patients with similar characteristics, called patient subtyping, is important for realizing the goals of precision medicine. Accurate subtyping is crucial for tailoring therapeutic strategies that can potentially lead to reduced mortality and morbidity. Model-based clustering, such as Gaussian mixture models, provides a principled and interpretable methodology that is widely used to identify subtypes. However, such models impose identical marginal distributions on each variable; this assumption restricts their modeling flexibility and degrades clustering performance. RESULTS: In this paper, we use the statistical framework of copulas to decouple the modeling of marginals from the dependencies between them. Current copula-based methods cannot scale to high dimensions due to challenges in parameter inference. We develop HD-GMCM, which addresses these challenges and, to our knowledge, is the first copula-based clustering method that can fit high-dimensional data. Our experiments on real high-dimensional gene-expression and clinical datasets show that HD-GMCM outperforms state-of-the-art model-based clustering methods, by virtue of modeling non-Gaussian data and being robust to outliers through the use of Gaussian mixture copulas. We present a case study on lung cancer data from TCGA. Clusters obtained from HD-GMCM can be interpreted based on the dependencies they model, which offers a new way of characterizing subtypes. Empirically, such modeling not only uncovers latent structure that leads to better clustering but also reveals meaningful clinical subtypes in terms of survival rates of patients. AVAILABILITY AND IMPLEMENTATION: An implementation of HD-GMCM in R is available at: https://bitbucket.org/cdal/hdgmcm/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
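The copula idea of decoupling marginals from dependence can be approximated in a few lines: rank-transform each variable through its empirical CDF, map to normal scores, and fit a Gaussian mixture on the transformed data. This is only a sketch under those assumptions, not HD-GMCM's parameter inference.

```python
# Rank-transform marginals, then fit a Gaussian mixture on the transformed data.
import numpy as np
from scipy.stats import norm, rankdata
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# toy data with non-Gaussian (lognormal) marginals and two subpopulations
X = np.vstack([rng.lognormal(0.0, 1.0, (100, 5)),
               rng.lognormal(1.5, 1.0, (100, 5))])

# empirical-CDF transform per column, mapped through the standard normal quantile
U = rankdata(X, axis=0) / (X.shape[0] + 1)
Z = norm.ppf(U)

gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(Z)
labels = gmm.predict(Z)
print(np.bincount(labels))   # cluster sizes on the copula-transformed data
```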


Subjects
Biometry, Precision Medicine, Algorithms, Cluster Analysis, Humans, Normal Distribution
5.
Bioinformatics ; 36(7): 2209-2216, 2020 04 01.
Article in English | MEDLINE | ID: mdl-31782759

ABSTRACT

MOTIVATION: A synthetic lethal (SL) interaction is a relationship between two functional entities where the loss of either one of the entities is viable but the loss of both entities is lethal to the cell. Such pairs can be used as drug targets in targeted anticancer therapies, and so many methods have been developed to identify potential candidate SL pairs. However, these methods use only a subset of the data available from multiple platforms, at genomic, epigenomic and transcriptomic levels, and hence are limited in their ability to learn from complex associations in heterogeneous data sources. RESULTS: In this article, we develop techniques that can seamlessly integrate multiple heterogeneous data sources to predict SL interactions. Our approach obtains latent representations by collective matrix factorization-based techniques, which in turn are used for prediction through matrix completion. Our experiments, on a variety of biological datasets, illustrate the efficacy and versatility of our approach, which outperforms state-of-the-art methods for predicting SL interactions and can be used with heterogeneous data sources with minimal feature engineering. AVAILABILITY AND IMPLEMENTATION: Software available at https://github.com/lianyh. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
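A minimal sketch of the matrix-completion step mentioned above, assuming a single partially observed pairwise matrix and a plain low-rank factorization fit on observed entries only (the paper's collective, multi-source setup is richer):

```python
# Low-rank matrix completion on a toy partially observed gene-gene matrix.
import numpy as np

rng = np.random.default_rng(6)
n_genes, k = 40, 5
true = rng.random((n_genes, k)) @ rng.random((k, n_genes))   # hidden low-rank signal
mask = rng.random((n_genes, n_genes)) < 0.2                  # ~20% of pairs observed
X = np.where(mask, true, 0.0)

U = rng.normal(scale=0.1, size=(n_genes, k))
V = rng.normal(scale=0.1, size=(n_genes, k))
lr = 0.02
for _ in range(2000):
    R = mask * (U @ V.T - X)        # residuals on observed entries only
    gU, gV = R @ V, R.T @ U
    U -= lr * gU
    V -= lr * gV

scores = U @ V.T                     # predicted strengths for all pairs, observed or not
print("train RMSE:", np.sqrt((R ** 2).sum() / mask.sum()))
```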


Subjects
Genomics, Software, Information Storage and Retrieval
6.
Mol Biol Evol ; 30(3): 689-712, 2013 Mar.
Article in English | MEDLINE | ID: mdl-23193120

ABSTRACT

Inaccurate inference of positional homologies in multiple sequence alignments and systematic errors introduced by alignment heuristics obfuscate phylogenetic inference. Alignment masking, the elimination of phylogenetically uninformative or misleading sites from an alignment before phylogenetic analysis, is common practice. Although masking is often done manually, automated methods are necessary to handle the much larger data sets being prepared today. In this study, we introduce the concept of subsplits and demonstrate their use in extracting phylogenetic signal from alignments. We design a clustering approach for alignment masking in which each cluster contains similar columns, with similarity defined on the basis of compatible subsplits; our approach then identifies noisy clusters and eliminates them. Trees inferred from the columns in the retained clusters are found to be topologically closer to the reference trees. We test our method on numerous standard benchmarks (both synthetic and biological data sets) and compare its performance with other methods of alignment masking. We find that our method can eliminate sites more accurately than other methods, particularly on divergent data, and can improve the topologies of the inferred trees in likelihood-based analyses. Software is available upon request from the author.
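For readers unfamiliar with split compatibility, the toy snippet below checks the standard criterion for two splits induced by binary alignment columns (some pair of their sides must be disjoint); the paper's subsplit-based similarity is more general and is not implemented here.

```python
# Standard pairwise split-compatibility check for binary characters.
def split(column, taxa):
    """Bipartition of taxa induced by a binary character column."""
    a = frozenset(t for t, c in zip(taxa, column) if c == "0")
    b = frozenset(t for t, c in zip(taxa, column) if c == "1")
    return a, b

def compatible(s1, s2):
    """Two splits of the same taxon set are compatible iff some pair of sides is disjoint."""
    (a, b), (c, d) = s1, s2
    return any(len(x & y) == 0 for x in (a, b) for y in (c, d))

taxa = ["t1", "t2", "t3", "t4"]
col1, col2, col3 = "0011", "0001", "0110"
print(compatible(split(col1, taxa), split(col2, taxa)))  # True: both fit on one tree
print(compatible(split(col1, taxa), split(col3, taxa)))  # False: conflicting signal
```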


Subjects
Phylogeny, Sequence Alignment/methods, Algorithms, Animals, Arthropods/genetics, Cluster Analysis, Escherichia coli/genetics, Insect Proteins/genetics, Likelihood Functions, Mitochondrial Proteins/genetics, Molecular Sequence Data, Sequence Homology
7.
IEEE J Biomed Health Inform ; 28(7): 4269-4280, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38662559

ABSTRACT

Explainable Artificial Intelligence (XAI) techniques generate explanations for predictions from AI models. These explanations can be evaluated for (i) faithfulness to the prediction, i.e., whether the explanation correctly reflects the reasons for the prediction, and (ii) usefulness to the user. While there are metrics to evaluate faithfulness, to our knowledge, there are no automated metrics to evaluate the usefulness of explanations in the clinical context. Our objective is to develop a new metric to evaluate the usefulness of AI explanations to clinicians. Usefulness evaluation needs to consider both (a) how humans generally process explanations and (b) clinicians' specific requirements of explanations presented by clinical decision support systems (CDSS). Our new scoring method can evaluate the usefulness of explanations generated by any XAI method that provides importance values for the input features of the prediction model. Our method draws on theories from social science to gauge usefulness, and uses literature-derived biomedical knowledge graphs to quantify support for the explanations from the clinical literature. We evaluate our method in a case study on predicting onset of sepsis in intensive care units. Our analysis shows that the scores obtained using our method are corroborated by independent evidence from the clinical literature and have the qualities expected of such a metric. Thus, our method can be used to evaluate and select useful explanations from a diverse set of XAI techniques in clinical contexts, making it a fundamental tool for future research in the design of AI-driven CDSS.
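Purely as a hypothetical illustration of the kind of scoring described (not the paper's metric), the snippet below weights each feature's importance by the amount of support the feature-outcome link has in a toy knowledge graph; the function name, the cap of 10 supporting papers, and the example features are all made up.

```python
# Hypothetical usefulness-style score: importance weighted by literature support.
def usefulness_score(importances, kg_support, outcome="sepsis"):
    """importances: {feature: importance}; kg_support: {(feature, outcome): n_papers}."""
    total = sum(abs(v) for v in importances.values()) or 1.0
    score = 0.0
    for feat, imp in importances.items():
        support = kg_support.get((feat, outcome), 0)
        score += (abs(imp) / total) * min(1.0, support / 10.0)  # cap support at 10 papers
    return score

importances = {"lactate": 0.45, "heart_rate": 0.30, "sodium": 0.05}
kg_support = {("lactate", "sepsis"): 25, ("heart_rate", "sepsis"): 8}
print(round(usefulness_score(importances, kg_support), 3))
```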


Subjects
Algorithms, Artificial Intelligence, Clinical Decision-Making, Clinical Decision Support Systems, Social Sciences, Humans, Clinical Decision-Making/methods, Social Sciences/methods, Sepsis/diagnosis
8.
IEEE J Biomed Health Inform ; 28(3): 1785-1796, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38227408

ABSTRACT

A Synthetic Lethal (SL) interaction is a functional relationship between two genes or functional entities where the loss of either entity is viable but the loss of both is lethal. Such pairs can be used to develop targeted anticancer therapies with fewer side effects and reduced overtreatment. However, finding clinically relevant SL interactions remains challenging. Leveraging unified gene expression data of both disease-free and cancerous samples, we design a new technique based on statistical hypothesis testing, called ASTER, to identify SL pairs. We empirically find that the patterns of mutual exclusivity that ASTER finds using genomic and transcriptomic data provide a strong signal of synthetic lethality. For large-scale multiple hypothesis testing, we develop an extension called ASTER++ that can utilize additional input gene features within the hypothesis testing framework. Our computational and functional experiments demonstrate the efficacy of ASTER in identifying SL pairs with potential therapeutic benefits.
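A standard way to test mutual exclusivity for one candidate pair, shown below with Fisher's exact test on binary mutation calls, gives a feel for the signal ASTER exploits; ASTER's actual statistic, which also uses expression data, is different.

```python
# Mutual-exclusivity check for one gene pair via Fisher's exact test.
import numpy as np
from scipy.stats import fisher_exact

rng = np.random.default_rng(2)
n = 500
gene_a = rng.random(n) < 0.3
# make gene_b mutations tend to avoid samples where gene_a is mutated
gene_b = (rng.random(n) < 0.3) & ~gene_a

both = np.sum(gene_a & gene_b)
a_only = np.sum(gene_a & ~gene_b)
b_only = np.sum(~gene_a & gene_b)
neither = np.sum(~gene_a & ~gene_b)

odds, p = fisher_exact([[both, a_only], [b_only, neither]], alternative="less")
print(f"odds ratio={odds:.3f}, one-sided p={p:.3g}")  # small p suggests exclusivity
```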


Subjects
Genomics, Neoplasms, Humans, Neoplasms/genetics, Neoplasms/drug therapy, Gene Expression Profiling
9.
Bioinformatics ; 28(24): 3324-5, 2012 Dec 15.
Article in English | MEDLINE | ID: mdl-23060619

ABSTRACT

TIBA is a tool to reconstruct phylogenetic trees from rearrangement data that consist of ordered lists of synteny blocks (or genes), where each synteny block is shared with all of its homologues in the input genomes. The evolution of these synteny blocks, through rearrangement operations, is modelled by the uniform Double-Cut-and-Join model. Using a true distance estimate under this model and simple distance-based methods, TIBA reconstructs a phylogeny of the input genomes. Unlike any previous tool for inferring phylogenies from rearrangement data, TIBA uses novel methods of robustness estimation to provide support values for the edges in the inferred tree.


Subjects
Phylogeny, Software, Molecular Evolution, Genome, Synteny
10.
IEEE J Biomed Health Inform ; 27(10): 5076-5086, 2023 10.
Article in English | MEDLINE | ID: mdl-37819834

ABSTRACT

Risk models play a crucial role in disease prevention, particularly in intensive care units (ICUs). Diseases often have complex manifestations with heterogeneous subpopulations, or subtypes, that exhibit distinct clinical characteristics. Risk models that explicitly model subtypes have high predictive accuracy and facilitate subtype-specific personalization. Existing models combine clustering and classification methods but do not effectively utilize the inferred subtypes in risk modeling. Their limitations include a tendency to obtain degenerate clusters and cluster-specific data scarcity, which leads to insufficient training data for the corresponding classifiers. In this article, we develop a new deep learning model for simultaneous clustering and classification, ExpertNet, with novel loss terms and network training strategies that address these limitations. The performance of ExpertNet is evaluated on the tasks of predicting risk of (i) sepsis and (ii) acute respiratory distress syndrome (ARDS), using two large electronic medical records datasets from ICUs. Our extensive experiments show that, in comparison to state-of-the-art baselines for combined clustering and classification, ExpertNet achieves superior accuracy in risk prediction for both ARDS and sepsis, with comparable clustering performance. Visual analysis of the clusters further demonstrates that the clusters obtained are clinically meaningful, and a knowledge-distilled model shows significant differences in risk factors across the subtypes. By addressing technical challenges in training neural networks for simultaneous clustering and classification, ExpertNet lays the algorithmic foundation for the future development of subtype-aware risk models.
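The sketch below gives one plausible PyTorch rendering of the simultaneous clustering and classification idea (an encoder, soft cluster assignments, and one linear "expert" per cluster); the loss terms, dimensions, and training loop are illustrative assumptions, not the ExpertNet losses or schedule.

```python
# Toy joint clustering + classification network with per-cluster experts.
import torch
import torch.nn as nn

class JointModel(nn.Module):
    def __init__(self, d_in, d_emb=16, n_clusters=3):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(d_in, 64), nn.ReLU(), nn.Linear(64, d_emb))
        self.centroids = nn.Parameter(torch.randn(n_clusters, d_emb))
        self.experts = nn.ModuleList([nn.Linear(d_emb, 1) for _ in range(n_clusters)])

    def forward(self, x):
        z = self.encoder(x)                                   # patient embedding
        dist = torch.cdist(z, self.centroids)                 # distances to cluster centroids
        q = torch.softmax(-dist, dim=1)                       # soft cluster assignment
        logits = torch.cat([e(z) for e in self.experts], dim=1)
        risk = (q * torch.sigmoid(logits)).sum(dim=1)         # assignment-weighted risk
        return risk, q, dist

x = torch.randn(128, 20)
y = (torch.rand(128) < 0.3).float()
model = JointModel(d_in=20)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):
    risk, q, dist = model(x)
    clf_loss = nn.functional.binary_cross_entropy(risk.clamp(1e-6, 1 - 1e-6), y)
    cluster_loss = (q * dist).sum(dim=1).mean()               # pull points toward centroids
    loss = clf_loss + 0.1 * cluster_loss
    opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```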


Subjects
Deep Learning, Respiratory Distress Syndrome, Sepsis, Humans, Neural Networks (Computer), Intensive Care Units, Sepsis/diagnosis
11.
Sci Rep ; 13(1): 19164, 2023 Nov 06.
Article in English | MEDLINE | ID: mdl-37932317

ABSTRACT

Clustering is a fundamental tool for exploratory data analysis, and is ubiquitous across scientific disciplines. Gaussian Mixture Model (GMM) is a popular probabilistic and interpretable model for clustering. In many practical settings, the true data distribution, which is unknown, may be non-Gaussian and may be contaminated by noise or outliers. In such cases, clustering may still be done with a misspecified GMM. However, this may lead to incorrect classification of the underlying subpopulations. In this paper, we identify and characterize the problem of inferior clustering solutions. Similar to well-known spurious solutions, these inferior solutions have high likelihood and poor cluster interpretation; however, they differ from spurious solutions in other characteristics, such as asymmetry in the fitted components. We theoretically analyze this asymmetry and its relation to misspecification. We propose a new penalty term that is designed to avoid both inferior and spurious solutions. Using this penalty term, we develop a new model selection criterion and a new GMM-based clustering algorithm, SIA. We empirically demonstrate that, in cases of misspecification, SIA avoids inferior solutions and outperforms previous GMM-based clustering methods.
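For context, the standard BIC-based recipe for choosing the number of GMM components with scikit-learn is shown below; the SIA criterion proposed in the paper replaces this with a penalty designed to avoid inferior and spurious solutions.

```python
# Baseline model selection for a GMM: pick the number of components by BIC.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(4, 1, (200, 2))])

best_k, best_bic = None, np.inf
for k in range(1, 6):
    gmm = GaussianMixture(n_components=k, n_init=5, random_state=0).fit(X)
    bic = gmm.bic(X)
    if bic < best_bic:
        best_k, best_bic = k, bic
print("selected K =", best_k)
```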

12.
Nat Commun ; 14(1): 384, 2023 01 24.
Article in English | MEDLINE | ID: mdl-36693837

ABSTRACT

Single cell data integration methods aim to integrate cells across data batches and modalities. Data integration tasks can be categorized into horizontal, vertical, diagonal, and mosaic integration, where mosaic integration is the most general and challenging case, with few methods developed for it. We propose scMoMaT, a method that is able to integrate single cell multi-omics data under the mosaic integration scenario using matrix tri-factorization. During integration, scMoMaT is also able to uncover cluster-specific bio-markers across modalities. These multi-modal bio-markers are used to interpret and annotate the clusters to cell types. Moreover, scMoMaT can integrate cell batches with unequal cell type compositions. Applying scMoMaT to multiple real and simulated datasets demonstrated these features and showed that scMoMaT has superior performance compared to existing methods. Specifically, we show that the integrated cell embedding combined with learned bio-markers leads to cell type annotations of higher quality or resolution compared to the original annotations.
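A bare-bones illustration of matrix tri-factorization, X ≈ U S V^T fit by gradient descent on toy data, is given below; scMoMaT's actual formulation (multi-batch, multi-modal, with shared and data-specific factors) is substantially richer.

```python
# Toy matrix tri-factorization X ~ U S V^T by gradient descent.
import numpy as np

rng = np.random.default_rng(4)
n_cells, n_genes, k1, k2 = 100, 60, 6, 5
X = rng.random((n_cells, n_genes))

U = rng.normal(scale=0.1, size=(n_cells, k1))   # cell factors
S = rng.normal(scale=0.1, size=(k1, k2))        # association/scaling factors
V = rng.normal(scale=0.1, size=(n_genes, k2))   # gene factors

lr = 0.01
for _ in range(2000):
    R = U @ S @ V.T - X              # reconstruction residual
    gU = R @ V @ S.T                 # gradients of 0.5*||R||^2
    gS = U.T @ R @ V
    gV = R.T @ U @ S
    U -= lr * gU
    S -= lr * gS
    V -= lr * gV
print("loss:", 0.5 * ((U @ S @ V.T - X) ** 2).sum())
```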


Subjects
Multiomics, Software
13.
IEEE J Biomed Health Inform ; 26(6): 2830-2838, 2022 06.
Article in English | MEDLINE | ID: mdl-34990373

ABSTRACT

Study of pairwise genetic interactions, such as mutually exclusive mutations, has led to understanding of underlying mechanisms in cancer. Investigation of various combinatorial motifs within networks of such interactions can lead to deeper insights into the mutational landscape of cancer and inform therapy development. One such motif, called the Between-Pathway Model (BPM), represents redundant or compensatory pathways that can be therapeutically exploited. Finding such BPM motifs is challenging since most formulations require solving variants of the NP-complete maximum weight bipartite subgraph problem. In this paper, we design an algorithm based on Integer Linear Programming (ILP) to solve this problem. In our experiments, our approach outperforms the best previous method for mining BPM motifs. Further, our ILP-based approach allows us to easily model additional application-specific constraints. We illustrate this advantage through a new application of BPM motifs that can potentially aid in finding combination therapies to combat cancer.
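To show the flavor of an ILP for this kind of motif (not the paper's full BPM formulation), the toy model below uses PuLP to pick at most two genes on each side so that the total weight of between-side interactions is maximized; gene names and weights are made up.

```python
# Toy maximum-weight between-side subgraph ILP with PuLP (CBC solver).
from pulp import LpProblem, LpMaximize, LpVariable, lpSum

left = ["g1", "g2", "g3"]
right = ["h1", "h2", "h3"]
w = {("g1", "h1"): 3.0, ("g1", "h2"): 1.0, ("g2", "h2"): 2.5, ("g3", "h3"): 0.5}

prob = LpProblem("bpm_like_motif", LpMaximize)
x = {g: LpVariable(f"x_{g}", cat="Binary") for g in left}
y = {h: LpVariable(f"y_{h}", cat="Binary") for h in right}
e = {uv: LpVariable(f"e_{uv[0]}_{uv[1]}", cat="Binary") for uv in w}

prob += lpSum(w[uv] * e[uv] for uv in w)   # total weight of selected interactions
for (u, v), var in e.items():              # an edge counts only if both endpoints are chosen
    prob += var <= x[u]
    prob += var <= y[v]
prob += lpSum(x.values()) <= 2              # side-size constraints
prob += lpSum(y.values()) <= 2

prob.solve()
print([g for g in left if x[g].value() == 1], [h for h in right if y[h].value() == 1])
```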


Subjects
Algorithms, Neoplasms, Genetic Epistasis, Humans, Neoplasms/genetics, Neoplasms/therapy
14.
JMIR Med Inform ; 10(1): e28842, 2022 Jan 20.
Article in English | MEDLINE | ID: mdl-35049514

ABSTRACT

BACKGROUND: Patient representation learning aims to learn features, also called representations, from input sources automatically, often in an unsupervised manner, for use in predictive models. This obviates the need for cumbersome, time- and resource-intensive manual feature engineering, especially from unstructured data such as text, images, or graphs. Most previous techniques have used neural network-based autoencoders to learn patient representations, primarily from clinical notes in electronic medical records (EMRs). Knowledge graphs (KGs), with clinical entities as nodes and their relations as edges, can be extracted automatically from biomedical literature and provide complementary information to EMR data that have been found to provide valuable predictive signals. OBJECTIVE: This study aims to evaluate the efficacy of collective matrix factorization (CMF), both the classical variant and a recent neural architecture called deep CMF (DCMF), in integrating heterogeneous data sources from EMR and KG to obtain patient representations for clinical decision support tasks. METHODS: Using a recent formulation for obtaining graph representations through matrix factorization within the context of CMF, we infused auxiliary information during patient representation learning. We also extended the DCMF architecture to create a task-specific end-to-end model that learns to simultaneously find effective patient representations and predictions. We compared the efficacy of such a model to that of first learning unsupervised representations and then independently learning a predictive model. We evaluated patient representation learning using CMF-based methods and autoencoders for 2 clinical decision support tasks on a large EMR data set. RESULTS: Our experiments show that DCMF provides a seamless way for integrating multiple sources of data to obtain patient representations, both in unsupervised and supervised settings. Its performance in single-source settings is comparable with that of previous autoencoder-based representation learning methods. When DCMF is used to obtain representations from a combination of EMR and KG, where most previous autoencoder-based methods cannot be used directly, its performance is superior to that of previous nonneural methods for CMF. Infusing information from KGs into patient representations using DCMF was found to improve downstream predictive performance. CONCLUSIONS: Our experiments indicate that DCMF is a versatile model that can be used to obtain representations from single and multiple data sources and combine information from EMR data and KGs. Furthermore, DCMF can be used to learn representations in both supervised and unsupervised settings. Thus, DCMF offers an effective way of integrating heterogeneous data sources and infusing auxiliary knowledge into patient representations.

15.
JMIR Med Inform ; 9(10): e32730, 2021 Oct 25.
Article in English | MEDLINE | ID: mdl-34694230

ABSTRACT

BACKGROUND: Adverse drug events (ADEs) are unintended side effects of drugs that cause substantial clinical and economic burdens globally. Not all ADEs are discovered during clinical trials; therefore, postmarketing surveillance, called pharmacovigilance, is routinely conducted to find unknown ADEs. A wealth of information, which facilitates ADE discovery, lies in the growing body of biomedical literature. Knowledge graphs (KGs) encode information from the literature, where the vertices and the edges represent clinical concepts and their relations, respectively. The scale and unstructured form of the literature necessitate the use of natural language processing (NLP) to automatically create such KGs. Previous studies have demonstrated the utility of such literature-derived KGs in ADE prediction. Through unsupervised learning of the representations (features) of clinical concepts from the KG, which are used in machine learning models, state-of-the-art results for ADE prediction were obtained on benchmark data sets. OBJECTIVE: Due to the use of NLP to infer literature-derived KGs, there is noise in the form of false positive (erroneous) and false negative (absent) nodes and edges. Previous representation learning methods do not account for such inaccuracies in the graph. NLP algorithms can quantify the confidence in their inference of extracted concepts and relations from the literature. Our hypothesis, which motivates this work, is that by using such confidence scores during representation learning, the learned embeddings would yield better features for ADE prediction models. METHODS: We developed methods to use these confidence scores on two well-known representation learning methods, DeepWalk and Translating Embeddings for Modeling Multi-relational Data (TransE), to develop their weighted versions: Weighted DeepWalk and Weighted TransE. These methods were used to learn representations from a large literature-derived KG, the Semantic MEDLINE Database, which contains more than 93 million clinical relations. They were compared with Embedding of Semantic Predications, which, to our knowledge, is the best reported representation learning method using the Semantic MEDLINE Database with state-of-the-art results for ADE prediction. Representations learned from different methods were used (separately) as features of drugs and diseases to build classification models for ADE prediction using benchmark data sets. The methods were compared rigorously over multiple cross-validation settings. RESULTS: The weighted versions we designed were able to learn representations that yielded more accurate predictive models than the corresponding unweighted versions of both DeepWalk and TransE, as well as Embedding of Semantic Predications, in our experiments. There were performance improvements of up to 5.75% in the F1-score and 8.4% in the area under the receiver operating characteristic curve, thus advancing the state of the art in ADE prediction from literature-derived KGs. CONCLUSIONS: Our classification models can be used to aid pharmacovigilance teams in detecting potentially new ADEs. Our experiments demonstrate the importance of modeling inaccuracies in the inferred KGs for representation learning.
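A compact sketch of confidence-weighted TransE training is given below: each triple's margin-ranking loss is scaled by its extraction-confidence score. Entity counts, dimensions, and triples are toy assumptions, and the paper's Weighted TransE and Weighted DeepWalk implementations differ in detail.

```python
# Toy confidence-weighted TransE: weight each triple's margin loss by its confidence.
import torch
import torch.nn as nn

n_ent, n_rel, dim, margin = 100, 10, 32, 1.0
ent = nn.Embedding(n_ent, dim)
rel = nn.Embedding(n_rel, dim)
opt = torch.optim.Adam(list(ent.parameters()) + list(rel.parameters()), lr=1e-2)

# toy triples (head, relation, tail) with extraction-confidence scores in [0, 1]
triples = torch.tensor([[0, 1, 2], [3, 1, 4], [5, 2, 6]])
conf = torch.tensor([0.9, 0.4, 0.7])

for _ in range(100):
    h, r, t = ent(triples[:, 0]), rel(triples[:, 1]), ent(triples[:, 2])
    neg_t = ent(torch.randint(0, n_ent, (len(triples),)))     # corrupt the tail entity
    pos = torch.norm(h + r - t, p=2, dim=1)
    neg = torch.norm(h + r - neg_t, p=2, dim=1)
    loss = (conf * torch.relu(margin + pos - neg)).mean()     # confidence-weighted margin loss
    opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```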

16.
BMC Bioinformatics ; 11 Suppl 1: S54, 2010 Jan 18.
Article in English | MEDLINE | ID: mdl-20122229

ABSTRACT

BACKGROUND: The rapidly increasing availability of whole-genome sequences has enabled the study of whole-genome evolution. Evolutionary mechanisms based on genome rearrangements have attracted much attention and given rise to many models; somewhat independently, the mechanisms of gene duplication and loss have seen much work. However, the two are not independent and thus require a unified treatment, which remains missing to date. Moreover, existing rearrangement models do not fit the dichotomy between most prokaryotic genomes (one circular chromosome) and most eukaryotic genomes (multiple linear chromosomes). RESULTS: To handle rearrangements, gene duplications and losses, we propose a new evolutionary model and the corresponding method for estimating true evolutionary distance. Our model, inspired by the DCJ model, is simple and the first to respect the prokaryotic/eukaryotic structural dichotomy. Experimental results on a wide variety of genome structures demonstrate the very high accuracy and robustness of our distance estimator. CONCLUSION: We give the first robust, statistically based estimate of genomic pairwise distances based on rearrangements, duplications and losses, under a model that respects the structural dichotomy between prokaryotic and eukaryotic genomes. Accurate and robust estimates of true evolutionary distances should translate into much better phylogenetic reconstructions as well as more accurate genomic alignments, while our new model of genome rearrangements provides another refinement in simplicity and verisimilitude.


Subjects
Molecular Evolution, Gene Duplication, Gene Rearrangement/genetics, Genomics/methods, Genome, Phylogeny
17.
BMC Bioinformatics ; 11 Suppl 1: S30, 2010 Jan 18.
Article in English | MEDLINE | ID: mdl-20122203

ABSTRACT

BACKGROUND: The study of genome rearrangements has become a mainstay of phylogenetics and comparative genomics. Fundamental in such a study is the median problem: given three genomes, find a fourth that minimizes the sum of the evolutionary distances between itself and the given three. Many exact algorithms and heuristics have been developed for the inversion median problem, of which the best known is MGR. RESULTS: We present a unifying framework for median heuristics, which enables us to clarify existing strategies and to place them in a partial ordering. Analysis of this framework leads to a new insight: the best strategies continue to refer to the input data rather than reducing the problem to smaller instances. Using this insight, we develop a new heuristic for inversion medians that uses input data to the end of its computation and leverages our previous work with DCJ medians. Finally, we present the results of extensive experimentation showing that our new heuristic outperforms all others in accuracy and, especially, in running time: the heuristic typically returns solutions within 1% of optimal and runs in seconds to minutes even on genomes with 25,000 genes; in contrast, MGR can take days on instances of 200 genes and cannot be used beyond 1,000 genes. CONCLUSION: Finding good rearrangement medians, in particular inversion medians, had long been regarded as the computational bottleneck in whole-genome studies. Our new heuristic for inversion medians, ASM, which dominates all others in our framework, puts that issue to rest by providing near-optimal solutions within seconds to minutes on even the largest genomes.


Subjects
Algorithms, Genome, Genomics/methods, Molecular Evolution, Gene Rearrangement, Phylogeny
18.
JMIR Mhealth Uhealth ; 7(1): e11098, 2019 01 16.
Article in English | MEDLINE | ID: mdl-30664474

ABSTRACT

BACKGROUND: Fitness devices have spurred the development of apps that aim to motivate users, through interventions, to increase their physical activity (PA). Personalization in the interventions is essential as the target users are diverse with respect to their activity levels, requirements, preferences, and behavior. OBJECTIVE: This review aimed to (1) identify different kinds of personalization in interventions for promoting PA among any type of user group, (2) identify user models used for providing personalization, and (3) identify gaps in the current literature and suggest future research directions. METHODS: A scoping review was undertaken by searching the databases PsycINFO, PubMed, Scopus, and Web of Science. The main inclusion criteria were (1) studies that aimed to promote PA; (2) studies that had personalization, with the intention of promoting PA through technology-based interventions; and (3) studies that described user models for personalization. RESULTS: The literature search resulted in 49 eligible studies. Of these, 67% (33/49) of the studies focused solely on increasing PA, whereas the remaining studies had other objectives, such as maintaining healthy lifestyle (8 studies), weight loss management (6 studies), and rehabilitation (2 studies). The reviewed studies provide personalization in 6 categories: goal recommendation, activity recommendation, fitness partner recommendation, educational content, motivational content, and intervention timing. With respect to the mode of generation, interventions were found to be semiautomated or automatic. Of these, the automatic interventions were either knowledge-based or data-driven or both. User models in the studies were constructed with parameters from 5 categories: PA profile, demographics, medical data, behavior change technique (BCT) parameters, and contextual information. Only 27 of the eligible studies evaluated the interventions for improvement in PA, and 16 of these concluded that the interventions to increase PA are more effective when they are personalized. CONCLUSIONS: This review investigates personalization in the form of recommendations or feedback for increasing PA. On the basis of the review and gaps identified, research directions for improving the efficacy of personalized interventions are proposed. First, data-driven prediction techniques can facilitate effective personalization. Second, use of BCTs in automated interventions, and in combination with PA guidelines, is yet to be explored, and preliminary studies in this direction are promising. Third, systems with automated interventions also need to be suitably adapted to serve specific needs of patients with clinical conditions. Fourth, previous user models focus on single metric evaluations of PA instead of a potentially more effective, holistic, and multidimensional view. Fifth, with the widespread adoption of activity monitoring devices and mobile phones, personalized and dynamic user models can be created using available user data, including users' social profile. Finally, the long-term effects of such interventions as well as the technology medium used for the interventions need to be evaluated rigorously.


Subjects
Feedback, Fitness Trackers/trends, Precision Medicine/methods, Exercise/psychology, Fitness Trackers/standards, Health Promotion/methods, Humans, Mobile Applications/trends, Precision Medicine/instrumentation, Precision Medicine/trends, Singapore
19.
PLoS One ; 13(2): e0193259, 2018.
Article in English | MEDLINE | ID: mdl-29474481

ABSTRACT

An Acute Hypotensive Episode (AHE) is the sudden onset of a sustained period of low blood pressure and is among the most critical conditions in Intensive Care Units (ICUs). Without timely medical care, it can lead to irreversible organ damage and death. Identifying patients at risk of AHE early enables adequate medical intervention that can save lives and improve patient outcomes. In this paper, we design a novel dual-boundary classification based approach for identifying patients at risk for AHE. Our algorithm uses only simple summary statistics of past blood pressure measurements and can be used in an online environment, facilitating real-time updates and prediction. We perform extensive experiments with more than 4,500 patient records and demonstrate that our method outperforms the previous best approaches to AHE prediction. Our method can identify AHE patients two hours in advance of onset, giving sufficient time for appropriate clinical intervention, with nearly 80% sensitivity at 95% specificity and thus very few false positives.
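To illustrate the feature idea (summary statistics of recent blood-pressure measurements feeding a classifier), the sketch below builds toy windows and fits a logistic regression; the dual-boundary classifier itself is not reproduced, and all numbers are invented.

```python
# Toy pipeline: summary statistics of a blood-pressure window -> classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)

def summarize(bp_window: np.ndarray) -> np.ndarray:
    """Simple summary statistics of a window of mean arterial pressure values."""
    return np.array([bp_window.mean(), bp_window.std(),
                     bp_window.min(), bp_window[-1] - bp_window[0]])

# toy cohort: stable patients vs. patients drifting toward hypotension
stable = [rng.normal(85, 5, 60) for _ in range(200)]
drifting = [rng.normal(85, 5, 60) - np.linspace(0, 20, 60) for _ in range(200)]
X = np.vstack([summarize(w) for w in stable + drifting])
y = np.array([0] * 200 + [1] * 200)

clf = LogisticRegression(max_iter=1000).fit(X, y)
print("training accuracy:", clf.score(X, y))
```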


Subjects
Blood Pressure, Critical Care/methods, Hypotension, Computerized Medical Record Systems, Cardiovascular Models, Female, Humans, Hypotension/diagnosis, Hypotension/physiopathology, Male, Predictive Value of Tests
20.
Annu Int Conf IEEE Eng Med Biol Soc ; 2017: 3660-3663, 2017 Jul.
Article in English | MEDLINE | ID: mdl-29060692

ABSTRACT

Clinical time series, comprising repeated clinical measurements, provide valuable information about the trajectory of a patient's condition. Linear dynamical systems (LDS) are used extensively in science and engineering for modeling time series data. The observation and state variables in an LDS are assumed to be uniformly sampled in time with a fixed sampling rate. The observation sequence in clinical time series is often irregularly sampled, and LDS do not model such data well. In this paper, we develop two LDS-based models for irregularly sampled data. The key idea is to incorporate a temporal difference variable within the state equations of the LDS, whose parameters are estimated using observed data. Our models are evaluated on prediction and imputation tasks using real irregularly sampled clinical time series data and are found to outperform state-of-the-art techniques.
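One common way to let an LDS handle irregular sampling, sketched below, is to make the state-transition matrix a function of the elapsed time dt between observations (here a local level-plus-trend Kalman filter with fixed noise parameters); the paper's two models and their parameter estimation are not reproduced.

```python
# Kalman filter for irregularly sampled 1-D measurements with a dt-dependent transition.
import numpy as np

def kalman_irregular(times, obs, q=0.01, r=1.0):
    """Filter a 1-D series observed at irregular times with a level+trend state."""
    x = np.array([obs[0], 0.0])          # state: [level, trend]
    P = np.eye(2)
    H = np.array([[1.0, 0.0]])           # we observe the level only
    filtered = [x[0]]
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        A = np.array([[1.0, dt], [0.0, 1.0]])                          # transition over gap dt
        Q = q * np.array([[dt ** 3 / 3, dt ** 2 / 2], [dt ** 2 / 2, dt]])
        x, P = A @ x, A @ P @ A.T + Q                                   # predict over the gap
        S = H @ P @ H.T + r
        K = P @ H.T / S                                                 # Kalman gain (2x1)
        x = x + (K * (obs[k] - H @ x)).ravel()
        P = (np.eye(2) - K @ H) @ P
        filtered.append(x[0])
    return np.array(filtered)

t = np.array([0.0, 0.5, 2.0, 2.2, 5.0])                                 # irregular time stamps
y = np.array([80.0, 81.0, 85.0, 86.0, 95.0])
print(kalman_irregular(t, y).round(2))
```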


Subjects
Linear Models