Results 1 - 13 of 13
1.
Brief Bioinform ; 24(4), 2023 07 20.
Article in English | MEDLINE | ID: mdl-37287133

ABSTRACT

MicroRNAs (miRNAs) are a family of non-coding RNA molecules with vital roles in regulating gene expression. Although researchers have recognized the importance of miRNAs in the development of human diseases, it is very resource-consuming to use experimental methods for identifying which dysregulated miRNA is associated with a specific disease. To reduce the cost of human effort, a growing body of studies has leveraged computational methods for predicting the potential miRNA-disease associations. However, the extant computational methods usually ignore the crucial mediating role of genes and suffer from the data sparsity problem. To address this limitation, we introduce the multi-task learning technique and develop a new model called MTLMDA (Multi-Task Learning model for predicting potential MicroRNA-Disease Associations). Different from existing models that only learn from the miRNA-disease network, our MTLMDA model exploits both miRNA-disease and gene-disease networks for improving the identification of miRNA-disease associations. To evaluate model performance, we compare our model with competitive baselines on a real-world dataset of experimentally supported miRNA-disease associations. Empirical results show that our model performs best using various performance metrics. We also examine the effectiveness of model components via ablation study and further showcase the predictive power of our model for six types of common cancers. The data and source code are available from https://github.com/qwslle/MTLMDA.


Subjects
MicroRNAs , Neoplasms , Humans , MicroRNAs/genetics , MicroRNAs/metabolism , Algorithms , Computational Biology/methods , Neoplasms/genetics , Software
2.
Sensors (Basel) ; 23(8), 2023 Apr 20.
Article in English | MEDLINE | ID: mdl-37112479

ABSTRACT

A personalized point-of-interest (POI) recommender system is of great significance to facilitate the daily life of users. However, it suffers from some challenges, such as trustworthiness and data sparsity problems. Existing models consider only the influence of trusted users and ignore the role of trusted locations. Furthermore, they fail to refine the influence of context factors and the fusion of the user preference and context models. To address the trustworthiness problem, we propose a novel bidirectional trust-enhanced collaborative filtering model, which investigates the trust filtering from the views of users and locations. To tackle the data sparsity problem, we introduce a temporal factor into the trust filtering of users as well as geographical and textual content factors into the trust filtering of locations. To further alleviate the sparsity of user-POI rating matrices, we employ a weighted matrix factorization fused with the POI category factor to learn the user preference. To integrate the trust filtering models and the user preference model, we develop a fused framework with two kinds of integrating methods in relation to the different impacts of factors on the POIs that users have visited and the POIs that users have not visited. Finally, we conduct extensive experiments on Gowalla and Foursquare datasets to evaluate our proposed POI recommendation model, and the results show that it improves on the state-of-the-art model by 13.87% in precision@5 and 10.36% in recall@5.
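
The abstract reports its gains as precision@5 and recall@5. As a reading aid, here is a minimal sketch of how these two ranking metrics are commonly computed for a single user. The function name, the POI ids, and the per-user evaluation are illustrative assumptions; the paper's own evaluation protocol is not described in the abstract.

```python
def precision_recall_at_k(recommended, relevant, k=5):
    """Return (precision@k, recall@k) for one user.

    recommended: ranked list of item ids, best first.
    relevant: set of item ids the user actually visited in the held-out data.
    """
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k if k > 0 else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical POI ids, for illustration only.
ranked = ["cafe", "museum", "park", "gym", "library", "mall"]
visited = {"museum", "library", "harbor"}
p5, r5 = precision_recall_at_k(ranked, visited, k=5)
print(p5, r5)
```

In practice the two values are averaged over all test users; a relative improvement such as the one quoted above compares those averages between models.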

3.
J Nutr ; 152(7): 1773-1782, 2022 07 06.
Article in English | MEDLINE | ID: mdl-35349691

ABSTRACT

BACKGROUND: Monitoring countries' progress toward the achievement of their nutrition targets is an important task, but data sparsity makes monitoring trends challenging. Childhood stunting and overweight data in the European region over the last 30 y have had low coverage and frequency, with most data only covering a portion of the complete age interval of 0-59 mo. OBJECTIVES: We implemented a statistical method to extract useful information on child malnutrition trends from sparse longitudinal data for these indicators. METHODS: Heteroscedastic penalized longitudinal mixed models were used to accommodate data sparsity and predict region-wide, country-level trends over time. We leveraged prevalence estimates stratified by sex and partial age intervals (i.e., intervals that do not cover the complete 0-59 mo), which expanded the available data (for stunting: from 84 sources and 428 prevalence estimates to 99 sources and 1786 estimates), improving the robustness of our analysis. RESULTS: Results indicated a generally decreasing trend in stunting and a stable, slightly diminishing rate for overweight, with large differences in trends between low- and middle-income countries compared with high-income countries. No differences were found between age groups and between sexes. Cross-validation results indicated that both stunting and overweight models were robust in estimating the indicators for our data (root mean squared error: 0.061 and 0.056; median absolute deviation: 0.045 and 0.042; for stunting and overweight, respectively). CONCLUSIONS: These statistical methods can provide useful and robust information on child malnutrition trends over time, even when data are sparse.
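
The cross-validation figures above are a root mean squared error and a median absolute deviation. The sketch below shows how such summaries can be computed from held-out prevalence values. It assumes that "median absolute deviation" refers to the median of the absolute prediction errors; the abstract does not define it, and all numbers are made up for illustration.

```python
import math
import statistics


def rmse(observed, predicted):
    """Root mean squared error between paired observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def median_absolute_error(observed, predicted):
    """Median of |observed - predicted| (assumed reading of the abstract's metric)."""
    return statistics.median(abs(o - p) for o, p in zip(observed, predicted))


# Hypothetical held-out prevalence values (proportions), not data from the study.
observed = [0.10, 0.20, 0.30, 0.40]
predicted = [0.12, 0.18, 0.33, 0.36]
print(rmse(observed, predicted), median_absolute_error(observed, predicted))
```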


Subjects
Child Nutrition Disorders , Malnutrition , Child , Child Nutrition Disorders/epidemiology , Growth Disorders/epidemiology , Humans , Income , Malnutrition/epidemiology , Nutritional Status , Overweight/epidemiology , Prevalence
4.
Entropy (Basel) ; 24(4), 2022 Apr 02.
Article in English | MEDLINE | ID: mdl-35455166

ABSTRACT

In the current era of online information overload, recommendation systems are very useful for helping users locate content that may be of interest to them. A personalized recommendation system presents content based on information such as a user's browsing history and the videos watched. However, information filtering-based recommendation systems are vulnerable to data sparsity and cold-start problems. Additionally, existing recommendation systems suffer from the large overhead incurred in learning regression models used for preference prediction or in selecting groups of similar users. In this study, we propose a preference-tree-based real-time recommendation system that uses various tree models to predict user preferences with a fast runtime. The proposed system predicts preferences based on two balance constants and one similarity threshold to recommend content with a high accuracy while balancing generalized and personalized preferences. The results of comparative experiments and ablation studies confirm that the proposed system can accurately recommend content to users. Specifically, we confirmed that the accuracy and novelty of the recommended content were, respectively, improved by 12.1% and 27.2% compared to existing systems. Furthermore, we verified that the proposed system satisfies real-time requirements and mitigates both cold-start and overfitting problems.

5.
Appl Math Model ; 72: 537-552, 2019 Aug.
Article in English | MEDLINE | ID: mdl-31379403

ABSTRACT

Efficient farm management can be aided by the identification of zones in the landscape. These zones can be informed from different measured variables by ensuring a sense of spatial coherence. Forming spatially coherent zones is an established method in the literature, but has been found to perform poorly when data are sparse. In this paper, we describe the different types of data sparsity and investigate how this impacts the performance of established methods. We introduce a set of methodological advances that address these shortcomings to provide a method for forming spatially coherent zones under data sparsity.

6.
Sensors (Basel) ; 18(5), 2018 May 14.
Article in English | MEDLINE | ID: mdl-29757995

ABSTRACT

With the rapid development of cyber-physical systems (CPS), building cyber-physical systems with high quality of service (QoS) has become an urgent requirement in both academia and industry. During the procedure of building cyber-physical systems, it has been found that a large number of functionally equivalent services exist, so it becomes an urgent task to recommend suitable services from the large number of services available in CPS. However, since it is time-consuming, and even impractical, for a single user to invoke all of the services in CPS to experience their QoS, a robust QoS prediction method is needed to predict unknown QoS values. A commonly used method in QoS prediction is collaborative filtering; however, it struggles with the data sparsity and cold-start problems, and most existing methods ignore the data credibility issue. To solve these problems, we design a framework of QoS prediction for CPS services and propose a personalized QoS prediction approach based on reputation and location-aware collaborative filtering. Our approach first calculates the reputation of users by using the Dirichlet probability distribution, so as to identify untrusted users and process their unreliable data, and then it identifies the geographic neighborhood at three levels to improve the similarity calculation of users and services. Finally, the data from geographical neighbors of users and services are fused to predict the unknown QoS values. The experiments using real datasets show that our proposed approach outperforms other existing methods in terms of accuracy, efficiency, and robustness.

7.
Sci Rep ; 14(1): 7816, 2024 Apr 03.
Article in English | MEDLINE | ID: mdl-38570539

ABSTRACT

Given the challenges of inter-domain information fusion and data sparsity in collaborative filtering algorithms, this paper proposes a cross-domain information fusion matrix decomposition algorithm to enhance the accuracy of personalized recommendations in artificial intelligence recommendation systems. The study begins by collecting Douban movie rating data and social network information. To ensure data integrity, Levenshtein distance detection is employed to remove duplicate scores, while natural language processing technology is utilized to extract keywords and topic information from social texts. Additionally, graph convolutional networks are utilized to convert user relationships into feature vectors, and one-hot encoding is used to convert discrete user and movie information into binary matrices. To prevent overfitting, the Ridge regularization method is introduced to gradually optimize latent feature vectors. Weighted average and feature connection techniques are then applied to integrate features from different fields. Moreover, the paper combines the item-based collaborative filtering algorithm with merged user characteristics to generate personalized recommendation lists. In the experimental stage, the paper conducts cross-domain information fusion optimization on four mainstream mathematical matrix decomposition algorithms: alternating least squares method, non-negative matrix decomposition, singular value decomposition, and latent factor model (LFM). It compares these algorithms with the non-fused approach. The results indicate a significant improvement in score accuracy, with mean absolute error and root mean squared error reduced by 12.8% and 13.2% respectively across the four algorithms. Additionally, when k = 10, the average F1 score reaches 0.97, and the ranking accuracy coverage of the LFM algorithm increases by 54.2%.
Overall, the mathematical matrix decomposition algorithm combined with cross-domain information fusion demonstrates clear advantages in accuracy, prediction performance, recommendation diversity, and ranking quality, and improves the accuracy and diversity of the recommendation system. By effectively addressing collaborative filtering challenges through the integration of diverse techniques, it significantly surpasses traditional models in recommendation accuracy and variety.
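
Two of the preprocessing steps named in this abstract, Levenshtein distance for duplicate detection and the conversion of discrete user and movie attributes into binary matrices, can be illustrated with a short sketch. It assumes the binary-matrix step is ordinary one-hot encoding; the function names and sample values are hypothetical, and the paper's actual pipeline may differ.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def one_hot(values):
    """Map each discrete value to a binary row with a single 1."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = [[1 if index[v] == j else 0 for j in range(len(categories))]
            for v in values]
    return categories, rows


# Near-identical records (hypothetical) have a small edit distance.
print(levenshtein("The Godfather (1972)", "The Godfather 1972"))
print(one_hot(["drama", "comedy", "drama"]))
```

A deduplication pass would then treat two records as duplicates when their distance falls below a chosen threshold; that threshold is a tuning decision the abstract does not specify.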

8.
JMIR Med Inform ; 12: e50209, 2024 Jun 19.
Article in English | MEDLINE | ID: mdl-38896468

ABSTRACT

BACKGROUND: Diagnostic errors pose significant health risks and contribute to patient mortality. With the growing accessibility of electronic health records, machine learning models offer a promising avenue for enhancing diagnosis quality. Current research has primarily focused on a limited set of diseases with ample training data, neglecting diagnostic scenarios with limited data availability. OBJECTIVE: This study aims to develop an information retrieval (IR)-based framework that accommodates data sparsity to facilitate broader diagnostic decision support. METHODS: We introduced an IR-based diagnostic decision support framework called CliniqIR. It uses clinical text records, the Unified Medical Language System Metathesaurus, and 33 million PubMed abstracts to classify a broad spectrum of diagnoses independent of training data availability. CliniqIR is designed to be compatible with any IR framework. Therefore, we implemented it using both dense and sparse retrieval approaches. We compared CliniqIR's performance to that of pretrained clinical transformer models such as Clinical Bidirectional Encoder Representations from Transformers (ClinicalBERT) in supervised and zero-shot settings. Subsequently, we combined the strength of supervised fine-tuned ClinicalBERT and CliniqIR to build an ensemble framework that delivers state-of-the-art diagnostic predictions. RESULTS: On a complex diagnosis data set (DC3) without any training data, CliniqIR models returned the correct diagnosis within their top 3 predictions. On the Medical Information Mart for Intensive Care III data set, CliniqIR models surpassed ClinicalBERT in predicting diagnoses with <5 training samples by an average difference in mean reciprocal rank of 0.10. In a zero-shot setting where models received no disease-specific training, CliniqIR still outperformed the pretrained transformer models with a greater mean reciprocal rank of at least 0.10. 
Furthermore, in most conditions, our ensemble framework surpassed the performance of its individual components, demonstrating its enhanced ability to make precise diagnostic predictions. CONCLUSIONS: Our experiments highlight the importance of IR in leveraging unstructured knowledge resources to identify infrequently encountered diagnoses. In addition, our ensemble framework benefits from combining the complementary strengths of the supervised and retrieval-based models to diagnose a broad spectrum of diseases.
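
The results above are expressed as differences in mean reciprocal rank (MRR). A minimal sketch of that metric follows, assuming each case has exactly one correct diagnosis and that a case whose diagnosis is absent from the ranked list contributes zero. The function name and the diagnosis labels are hypothetical.

```python
def mean_reciprocal_rank(ranked_predictions, gold_labels):
    """Average of 1/rank of the correct label; 0 when it is not retrieved."""
    total = 0.0
    for preds, gold in zip(ranked_predictions, gold_labels):
        for rank, label in enumerate(preds, start=1):
            if label == gold:
                total += 1.0 / rank
                break
    return total / len(gold_labels)


# Hypothetical ranked diagnosis lists for three cases.
predictions = [
    ["influenza", "pneumonia", "bronchitis"],  # correct answer at rank 2
    ["asthma", "copd"],                        # correct answer at rank 1
    ["gout", "lupus"],                         # correct answer not retrieved
]
gold = ["pneumonia", "asthma", "rheumatoid arthritis"]
mrr = mean_reciprocal_rank(predictions, gold)
print(mrr)
```

Under this definition, an MRR difference of 0.10 means the correct diagnosis sits noticeably higher in the ranked list on average.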

9.
Front Artif Intell ; 6: 1167735, 2023.
Article in English | MEDLINE | ID: mdl-37293239

ABSTRACT

The current recommendation system predominantly relies on evidential factors such as behavioral outcomes and purchasing history. However, limited research has been conducted to explore the use of psychological data in these algorithms, such as consumers' self-perceived identities. Based on the gap identified and the growing significance of leveraging non-purchasing data, this study presents a methodology to quantify consumers' self-identities to help examine the relationship between these psychological cues and decision-making in an e-commerce context, focusing on the projective self, which has been overlooked in previous research. This research is expected to contribute to a better understanding of the cause of inconsistency in similar studies and provide a basis for further exploration of the impact of self-concepts on consumer behavior. The coding method in grounded theory, in conjunction with the synthesis of literature analysis, was employed to generate the final approach and solution in this study as they provide a robust and rigorous basis for the findings and recommendations presented in this study.

10.
Big Data ; 9(3): 203-218, 2021 06.
Article in English | MEDLINE | ID: mdl-33739861

ABSTRACT

The recommendation system relies on feedback and personal information collected from users for effective recommendation. The success of a recommendation system is highly dependent on storing and managing sensitive customer information. Users refrain from using the application if there is a threat to user privacy. Several works that were performed to protect user privacy have paid little attention to utility. Hence, there is a need for a robust recommendation system with high accuracy and privacy. Model-based approaches are more prevalent and commonly used in recommendation. The proposed work improves the existing private model-based collaborative filtering algorithm with high privacy and utility. We identified that data sparsity is the primary reason for most of the threats in a recommender framework through an extensive literature survey. Hence, our approach combines the l-injection for imputing the missing ratings, which are deemed low, with differential privacy. We additionally introduce a random differential privacy approach to alternating least squares (ALS) for improved utility. Experimental results on benchmark datasets confirm that our private noisy Random ALS algorithm outperforms the non-noisy ALS for all datasets.


Subjects
Algorithms , Privacy
11.
Mol Inform ; 39(10): e2000086, 2020 10.
Article in English | MEDLINE | ID: mdl-32558335

ABSTRACT

In the present report we evaluate the possible utility of the Generative Adversarial Networks (GANs) in mapping the chemical structural space for molecular property profiles, with the goal of subsequently yielding synthetic (artificial) samples for ligand-based molecular modeling. Two case studies are considered: BACE-1 (β-Secretase 1) and DENV (Dengue Virus) inhibitory activities, with the former focused on data populating and the latter on data balancing tasks. We train GANs using subsamples extracted from datasets for each bioactivity endpoint, and apply the trained networks in generating synthetic examples from the respective bioactivity chemical spaces. Original and synthetic samples are pooled together and employed to build BACE-1 and DENV inhibitory activity classifiers, and their performance is evaluated over tenfold external validation sets. In both case studies, the obtained classifiers demonstrate satisfactory predictivity with the former yielding accuracy (ACC) and Matthews correlation coefficient (MCC) values of 0.80 and 0.59, while the latter produces balanced accuracy (BACC) and MCC values of 0.81 and 0.70, respectively. Moreover, the statistics of these classifiers are compared with those of other models in the literature demonstrating comparable to better performance. These results suggest that GANs may be useful in mapping the chemical space for molecular property profiles of interest, and thus allow for the extraction of synthetic examples for computational modeling.
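
The classifiers above are scored with accuracy (ACC), balanced accuracy (BACC), and the Matthews correlation coefficient (MCC). The sketch below computes all three from the counts of a binary confusion matrix using the standard definitions. The function name and the counts are illustrative assumptions, and the code assumes both classes are present in the validation set.

```python
import math


def classification_scores(tp, fp, tn, fn):
    """ACC, BACC, and MCC from binary confusion-matrix counts."""
    acc = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)  # recall on the active class
    specificity = tn / (tn + fp)  # recall on the inactive class
    bacc = (sensitivity + specificity) / 2
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "BACC": bacc, "MCC": mcc}


# Hypothetical counts for an imbalanced validation fold.
scores = classification_scores(tp=40, fp=10, tn=45, fn=5)
print(scores)
```

BACC and MCC are less sensitive to class imbalance than plain accuracy, which may be why the data-balancing case study reports BACC rather than ACC.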


Subjects
Amyloid Precursor Protein Secretases/chemistry , Aspartic Acid Endopeptidases/chemistry , Computational Biology/methods , Dengue Virus/drug effects , Small Molecule Libraries/pharmacology , Amyloid Precursor Protein Secretases/antagonists & inhibitors , Antiviral Agents/chemistry , Antiviral Agents/pharmacology , Aspartic Acid Endopeptidases/antagonists & inhibitors , Computer Simulation , Drug Evaluation, Preclinical , Enzyme Inhibitors/chemistry , Enzyme Inhibitors/pharmacology , Humans , Models, Molecular , Neural Networks, Computer , Small Molecule Libraries/chemistry , Support Vector Machine
12.
J R Soc Interface ; 16(157): 20190141, 2019 08 30.
Article in English | MEDLINE | ID: mdl-31455165

ABSTRACT

Cutaneous leishmaniasis (CL) is a neglected tropical disease transmitted by species of Phlebotominae sand flies. CL is responsible for more than 1000 reported cases per year in Ecuador. Vector collection studies in Ecuador suggest that there is a strong association between the ecological diversity of an ecosystem, the presence of potential alternative or reservoir hosts and the abundance of sand fly species. Data collected from a coastal community in Ecuador showed that Leishmania parasites may be circulating in diverse hosts, including mammalian and potentially avian species, and these hosts may serve as potential hosts for the parasite. There has been limited reporting of CL cases in Ecuador because the disease is non-fatal and its surveillance system is passive. Hence, the actual incidence of CL is unknown. In this study, an epidemic model was developed and analysed to understand the complexity of CL transmission dynamics with potential non-human hosts in the coastal ecosystem and to estimate critical epidemiological quantities for Ecuador. The model is fitted to the 2010 CL outbreak in the town of Valle Hermoso in the Santo Domingo de los Tsachilas province of Ecuador and parameters such as CL transmission rates in different types of hosts (primary and alternative), and levels of case reporting in the town are estimated. The results suggest that the current surveillance in this region fails to capture 38% (with 95% CI (29%, 47%)) of the actual number of cases under the assumption that alternative hosts are dead-end hosts and that the mean CL reproduction number in the town is 3.9. This means that on the average 3.9 new human CL cases were generated by a single infectious human in the town during the initial period of the 2010 outbreak. Moreover, major outbreaks of CL in Ecuador in coastal settings are unavoidable until reporting through the surveillance system is improved and alternative hosts are managed properly. 
The estimated infection transmission probabilities from alternative hosts to sand flies, and sand flies to alternative hosts are 27% and 32%, respectively. The analysis highlights that vector control and alternative host management are two effective programmes for Ecuador but need to be implemented concurrently to avoid future major outbreaks.
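
The under-reporting estimate in this abstract (38% of cases missed, 95% CI 29% to 47%) can be turned into a rough correction factor for reported counts. The sketch below shows only that arithmetic: if a fraction m of cases is missed, the actual count is approximately reported / (1 - m). The reported count of 62 is hypothetical, and the paper's fitted transmission model is not reproduced here.

```python
def estimated_actual_cases(reported_cases, missed_fraction):
    """Scale a reported count up by the fraction of cases surveillance misses."""
    if not 0 <= missed_fraction < 1:
        raise ValueError("missed_fraction must be in [0, 1)")
    return reported_cases / (1.0 - missed_fraction)


reported = 62  # hypothetical number of reported cases, not from the study
for missed in (0.29, 0.38, 0.47):  # lower CI bound, point estimate, upper CI bound
    print(missed, round(estimated_actual_cases(reported, missed), 1))
```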


Subjects
Ecosystem , Insect Vectors/physiology , Leishmaniasis, Cutaneous/epidemiology , Models, Biological , Psychodidae/physiology , Animals , Birds/parasitology , Ecuador/epidemiology , Humans , Leishmania/isolation & purification , Psychodidae/parasitology , Zoonoses
13.
Int J Med Inform ; 126: 147-155, 2019 06.
Article in English | MEDLINE | ID: mdl-31029256

ABSTRACT

INTRODUCTION: The clinical course of chronic obstructive pulmonary disease (COPD) is marked by acute exacerbation events that increase hospitalization rates and healthcare spending. The early identification of future high-cost patients with COPD may decrease healthcare spending by informing individualized interventions that prevent exacerbation events and decelerate disease progression. Existing studies of cost prediction of other chronic diseases have applied regression and machine-learning methods that cannot capture the complex causal relationships between COPD factors. Thus, the exploration of these factors through nonlinear, high-dimensional but explainable modeling is greatly needed. OBJECTIVES: We aimed to develop a machine-learning model to identify future high-cost patients with COPD. Such a model should incorporate expert knowledge about causal relationships, and the method for estimating the model could provide more accurate predictions than other machine-learning methods. METHODS: We used the 2011-2013 medical insurance data of patients with COPD in a large city. The data set included demographic information and admission records. Leveraging developments in graphical modeling methods, we proposed a smooth Bayesian network (SBN) model for the prediction of high-cost individuals using medical insurance data. The modeling method incorporated some expert knowledge about causal relationships (i.e., about the Bayesian network structure). We employed a smoothing kernel based on the weighted nearest neighborhood method in the SBN model to address overfitting, case-mix effect, and data sparsity (i.e., using data about "similar patients"). RESULTS: The proposed SBN achieved an area under the curve (AUC) of 0.80 and showed considerable improvement over the baseline machine-learning methods.
Besides confirming the known factors from the literature, we found "region" (i.e., a suburban or urban area) to be a significant factor, and that in a 3-tier system with primary, secondary and tertiary hospitals, COPD patients who had been admitted to primary hospitals were more likely to develop into future high-cost patients than patients who had been admitted to tertiary hospitals. CONCLUSION: The proposed SBN model not only obtained higher prediction accuracy and stronger generalizability than a number of benchmark machine-learning methods, but also used the Bayesian network to capture the complex causal relationships between different predictors by incorporating expert knowledge. Furthermore, a framework was developed to establish the relationships between exposure to historical trajectory and future outcome, which can also be applied to other temporal data to model different trajectory information and predict other outcomes.


Subjects
Bayes Theorem , Cost of Illness , Machine Learning , Pulmonary Disease, Chronic Obstructive/economics , Pulmonary Disease, Chronic Obstructive/therapy , Aged , Disease Progression , Female , Hospitalization , Humans , Male , Middle Aged , Pulmonary Disease, Chronic Obstructive/pathology , Risk Assessment , Support Vector Machine