Results 1 - 8 of 8
1.
Bioinformatics ; 40(3)2024 Mar 04.
Article in English | MEDLINE | ID: mdl-38449285

ABSTRACT

MOTIVATION: Drug-target interaction (DTI) prediction aims to identify interactions between drugs and protein targets. Deep learning can automatically learn discriminative features from drug and protein target representations for DTI prediction, but challenges remain, and the problem is still open. Existing approaches encode drugs and targets into features using deep learning models, but they often lack explanations for the underlying interactions. Moreover, the limited number of labeled DTIs in the chemical space can hinder model generalization. RESULTS: We propose an interpretable nested graph neural network for DTI prediction (iNGNN-DTI) using pre-trained molecule and protein models. The analysis is conducted on graph data representing drugs and targets using a specific type of nested graph neural network, in which the target graphs are created from 3D structures predicted by AlphaFold2. This architecture is highly expressive in capturing substructures of the graph data. We use a cross-attention module to capture interaction information between the substructures of drugs and targets. To improve feature representations, we integrate features learned by models that are pre-trained on large unlabeled small-molecule and protein datasets, respectively. We evaluate our model on three benchmark datasets, and it consistently improves on all baseline models across all datasets. We also run an experiment with previously unseen drugs or targets in the test set, and our model outperforms all of the baselines. Furthermore, iNGNN-DTI can provide more insight into the interaction by visualizing the weights learned by the cross-attention module. AVAILABILITY AND IMPLEMENTATION: The source code of the algorithm is available at https://github.com/syan1992/iNGNN-DTI.
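The cross-attention idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the iNGNN-DTI implementation: it is plain scaled dot-product attention without learned projection matrices, and the function names and toy vectors are assumptions made for illustration. Drug substructure embeddings act as queries, and target substructure embeddings act as keys and values; the returned weight matrix is the kind of quantity one might visualize.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention (no learned projections).

    queries: list of drug-substructure vectors (hypothetical embeddings)
    keys, values: lists of target-substructure vectors
    Returns (outputs, weights); weights[i][j] is the attention that
    query i places on key j.
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted average of the value vectors.
        outputs.append(
            [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))]
        )
    return outputs, weights
```

Each row of `weights` sums to 1, so it can be read as a distribution over target substructures for one drug substructure.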


Subjects
Algorithms , Neural Networks, Computer , Drug Interactions , Benchmarking , Drug Delivery Systems
2.
J Biomed Inform ; 152: 104629, 2024 04.
Article in English | MEDLINE | ID: mdl-38552994

ABSTRACT

BACKGROUND: In health research, multimodal omics data analysis is widely used to address important clinical and biological questions. Statistical methods such as hypothesis testing and differential expression analysis are commonly used in omics analysis, but traditional statistical methods rely on strong distributional assumptions. Deep learning, on the other hand, is an advanced computer science technique that is powerful in mining high-dimensional omics data for prediction tasks. Recently, integrative frameworks and methods that combine statistical models and deep learning algorithms have been developed for omics studies. METHODS AND RESULTS: The aim of these integrative frameworks is to combine the strengths of statistical methods and deep learning algorithms to improve prediction accuracy while also providing interpretability and explainability. This review discusses the current state-of-the-art integrative frameworks, their limitations, and potential future directions in survival and time-to-event longitudinal analysis, dimension reduction and clustering, regression and classification, feature selection, and causal and transfer learning.


Subjects
Deep Learning , Genomics , Genomics/methods , Computational Biology/methods , Algorithms , Models, Statistical
3.
BMC Med Res Methodol ; 22(1): 165, 2022 06 08.
Article in English | MEDLINE | ID: mdl-35676621

ABSTRACT

BACKGROUND: Network analysis, a technique for describing relationships, can provide insights into patterns of co-occurring chronic health conditions. The effect that co-occurrence measurement has on disease network structure and resulting inferences has not been well studied. The purpose of the study was to compare structural differences among multimorbidity networks constructed using different co-occurrence measures. METHODS: A retrospective cohort study was conducted using four fiscal years of administrative health data (2015/16 - 2018/19) from the province of Manitoba, Canada (population 1.5 million). Chronic conditions were identified using diagnosis codes from electronic records of physician visits, surgeries, and inpatient hospitalizations, and grouped into categories using the Johns Hopkins Adjusted Clinical Group (ACG) System. Pairwise disease networks were separately constructed using each of seven co-occurrence measures: lift, relative risk, phi, Jaccard, cosine, Kulczynski, and joint prevalence. Centrality analysis was limited to the top 20 central nodes, with degree centrality used to identify potentially influential chronic conditions. Community detection was used to identify disease clusters. Similarity in community structure between networks was measured using the adjusted Rand index (ARI). Network edges were described using disease prevalence categorized as low (< 1%), moderate (1 to < 7%), and high (≥ 7%). Network complexity was measured using network density and frequencies of nodes and edges. RESULTS: Relative risk and lift highlighted co-occurrences between pairs of low-prevalence health conditions. Kulczynski emphasized relationships between high- and low-prevalence conditions. Joint prevalence focused on highly prevalent conditions. Phi, Jaccard, and cosine emphasized associations involving moderately prevalent conditions. Co-occurrence measurement differences significantly affected the number and structure of identified disease clusters. When the number of edges was limited to produce visually interpretable graphs, networks differed significantly in the percentage of co-occurrence relationships they shared and in their selection of the highest-degree nodes. CONCLUSIONS: Multimorbidity network analyses are sensitive to disease co-occurrence measurement. Co-occurrence measures should be selected considering their intrinsic properties, research objectives, and the health condition prevalence relationships of greatest interest. Researchers should consider conducting sensitivity analyses using different co-occurrence measures.
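To make the seven co-occurrence measures concrete, here is a minimal sketch that computes them for one pair of conditions from simple counts. The abstract does not give the formulas, so the definitions below are common textbook versions and are assumptions, as are the function name and the example counts; the study's own definitions (for instance of relative risk) may differ.

```python
import math


def cooccurrence_measures(n_total, n_a, n_b, n_ab):
    """Co-occurrence measures for conditions A and B.

    n_total: cohort size; n_a, n_b: people with A, with B;
    n_ab: people with both. Assumes 0 < n_a < n_total,
    0 < n_b < n_total and n_ab < n_b so no denominator is zero.
    """
    p_a, p_b, p_ab = n_a / n_total, n_b / n_total, n_ab / n_total
    return {
        # Share of the cohort with both conditions.
        "joint_prevalence": p_ab,
        # Observed joint prevalence relative to independence.
        "lift": p_ab / (p_a * p_b),
        # Risk of B among those with A vs. those without A.
        "relative_risk": (n_ab / n_a) / ((n_b - n_ab) / (n_total - n_a)),
        # Pearson correlation of the two binary indicators.
        "phi": (n_total * n_ab - n_a * n_b)
        / math.sqrt(n_a * n_b * (n_total - n_a) * (n_total - n_b)),
        # Overlap divided by union.
        "jaccard": n_ab / (n_a + n_b - n_ab),
        # Geometric mean of P(B|A) and P(A|B).
        "cosine": n_ab / math.sqrt(n_a * n_b),
        # Arithmetic mean of P(B|A) and P(A|B).
        "kulczynski": 0.5 * (n_ab / n_a + n_ab / n_b),
    }


# Hypothetical counts: 1000 people, 100 with A, 50 with B, 20 with both.
measures = cooccurrence_measures(1000, 100, 50, 20)
```

Because lift and relative risk divide by marginal prevalence, rare pairs can score highly on them, whereas joint prevalence can only be large when both conditions are common. This is in line with the pattern the abstract reports.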


Subjects
Multimorbidity , Canada/epidemiology , Chronic Disease , Humans , Prevalence , Retrospective Studies
4.
BMC Genomics ; 21(Suppl 2): 252, 2020 Apr 16.
Article in English | MEDLINE | ID: mdl-32299351

ABSTRACT

BACKGROUND: In bacterial genomes, rRNA and tRNA genes are often organized into operons, i.e. segments of closely located genes that share a single promoter and are transcribed as a single unit. Analyzing how these genes and operons evolve can help us understand which evolutionary events most commonly affect them and give us a better picture of ancestral codon usage and protein synthesis. RESULTS: We introduce BOPAL, a new approach for the inference of evolutionary histories of rRNA and tRNA genes in bacteria, which is based on the identification of orthologous operons. Since operons can move around in the genome but are rarely transformed (e.g. rarely broken into different parts), this approach allows for a better inference of orthologous genes in genomes that have been affected by many rearrangements, which in turn helps with the inference of more realistic evolutionary scenarios and ancestors. CONCLUSIONS: From our comparisons of BOPAL with other gene order alignment programs using simulated data, we have found that BOPAL infers evolutionary events and ancestral gene orders more accurately than other alignment-based methods. An analysis of 12 Bacillus genomes also showed that BOPAL performs just as well as other programs at building ancestral histories with a minimal number of events.


Subjects
Bacteria/genetics , Genomics/methods , Operon/genetics , RNA, Ribosomal/genetics , RNA, Transfer/genetics , Algorithms , Bacillus/genetics , Databases, Genetic , Evolution, Molecular , Gene Duplication , Gene Order , Genome, Bacterial , Models, Genetic , Phylogeny , Research Design , Sequence Deletion
5.
Sensors (Basel) ; 19(6)2019 Mar 18.
Article in English | MEDLINE | ID: mdl-30889840

ABSTRACT

In recent years, artificial intelligence (AI) and its subarea of deep learning have drawn the attention of many researchers. At the same time, advances in technology enable the generation or collection of large amounts of valuable data (e.g., sensor data) from various sources in different applications, such as those for the Internet of Things (IoT), which in turn supports the development of smart cities. With the availability of sensor data from various sources, sensor information fusion is in demand for effective integration of big data. In this article, we present an AI-based sensor information fusion system for supporting deep supervised learning of transportation data generated and collected from various types of sensors, including remotely sensed imagery for the geographic information system (GIS), accelerometers, as well as sensors for the global navigation satellite system (GNSS) and global positioning system (GPS). The knowledge and information discovered by our system provide analysts with a clearer understanding of the trajectories or mobility of citizens, which in turn helps to develop better transportation models to achieve the ultimate goal of smarter cities. Evaluation results show the effectiveness and practicality of our AI-based sensor information fusion system for supporting deep supervised learning of big transportation data.

6.
Bioinform Adv ; 3(1): vbad059, 2023.
Article in English | MEDLINE | ID: mdl-37228387

ABSTRACT

Motivation: The human microbiome is complex and highly dynamic. Dynamic patterns of the microbiome can capture more information than single-time-point inference because they contain information about temporal changes. However, dynamic information about the human microbiome can be hard to capture because longitudinal data are difficult to obtain and often contain a large volume of missing data, which, together with heterogeneity, poses a challenge for data analysis. Results: We propose an efficient hybrid deep learning architecture, a convolutional neural network-long short-term memory (CNN-LSTM) model combined with self-knowledge distillation, to create highly accurate models that analyze longitudinal microbiome profiles to predict disease outcomes. Using our proposed models, we analyzed datasets from the Predicting Response to Standardized Pediatric Colitis Therapy (PROTECT) study and the DIABIMMUNE study. We showed a significant improvement in area under the receiver operating characteristic curve (AUROC) scores, achieving 0.889 and 0.798 on the PROTECT and DIABIMMUNE studies, respectively, compared with state-of-the-art temporal deep learning models. Our findings provide an effective artificial intelligence-based tool to predict disease outcomes using longitudinal microbiome profiles collected from patients. Availability and implementation: The data and source code can be accessed at https://github.com/darylfung96/UC-disease-TL.
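The reported metric, area under the receiver operating characteristic curve, can be read as the probability that a randomly chosen positive case is scored higher than a randomly chosen negative case. The sketch below computes it that way from pairwise comparisons; the function name, labels and scores are hypothetical and unrelated to the PROTECT or DIABIMMUNE data.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 0/1 outcomes; scores: predicted risk for each sample.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: three of the four positive/negative pairs
# are ordered correctly.
example = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration; rank-based formulations give the same value more efficiently.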

7.
Int J Popul Data Sci ; 7(1): 1757, 2022.
Article in English | MEDLINE | ID: mdl-37670734

ABSTRACT

Introduction: Unstructured text data (UTD) are increasingly found in many databases that were never intended to be used for research, including electronic medical record (EMR) databases. Data quality can impact the usefulness of UTD for research. UTD are typically prepared for analysis (i.e., preprocessed) and analyzed using natural language processing (NLP) techniques. Different NLP methods are used to preprocess UTD and may affect data quality. Objective: Our objective was to systematically document current research and practices about NLP preprocessing methods to describe or improve the quality of UTD, including UTD found in EMR databases. Methods: A scoping review was undertaken of peer-reviewed studies published between December 2002 and January 2021. Scopus, Web of Science, ProQuest, and EBSCOhost were searched for literature relevant to the study objective. Information extracted from the studies included article characteristics (i.e., year of publication, journal discipline), data characteristics, types of preprocessing methods, and data quality topics. Study data were presented using a narrative synthesis. Results: A total of 41 articles were included in the scoping review; over 50% were published between 2016 and 2021. Almost 20% of the articles were published in health science journals. Common preprocessing methods included removal of extraneous text elements such as stop words, punctuation, and numbers, word tokenization, and parts of speech tagging. Data quality topics for articles about EMR data included misspelled words, security (i.e., de-identification), word variability, sources of noise, quality of annotations, and ambiguity of abbreviations. Conclusions: Multiple NLP techniques have been proposed to preprocess UTD, with some differences in techniques applied to EMR data. There are similarities in the data quality dimensions used to characterize structured data and UTD. 
A few general-purpose measures of data quality do not require external data; most of these focus on the measurement of noise.
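The common preprocessing steps named in the abstract (removing stop words, punctuation and numbers, and tokenizing into words) can be sketched in a few lines. The stop-word list, function name and sample note below are illustrative assumptions, not taken from any reviewed study; real pipelines typically use a fuller stop-word list and a dedicated NLP library.

```python
import re

# A deliberately tiny, hypothetical stop-word list for illustration.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "was", "with", "in", "for"}


def preprocess(text):
    """Lowercase, tokenize into alphabetic words, and drop stop words.

    Keeping only runs of letters removes punctuation and numbers
    in the same step as tokenization.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


# Hypothetical clinical note fragment.
tokens = preprocess("The patient was prescribed 20 mg of Atorvastatin, daily.")
# tokens == ["patient", "prescribed", "mg", "atorvastatin", "daily"]
```

Note that dropping numbers also discards the dose in this example, which illustrates how a preprocessing choice can affect the quality of the resulting data.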


Subjects
Data Accuracy , Electronic Health Records , Databases, Factual , Narration , Natural Language Processing
8.
Procedia Comput Sci ; 176: 3009-3018, 2020.
Article in English | MEDLINE | ID: mdl-33042316

ABSTRACT

In the current era of big data, huge quantities of valuable data, which may be of different levels of veracity, are being generated at a rapid rate. Embedded in these big data are implicit, previously unknown and potentially useful information and valuable knowledge that can be discovered by data science solutions, which apply techniques like data mining. More and more collections of these big data have been made openly available by science, government and non-profit organizations so that people can collaboratively study and analyze these open big data. In this article, we focus on open big data for public transit because public transit (e.g., bus) as a means of transportation is a vital part of many people's lives. As time is a precious resource, bus delays can negatively affect commuters' plans. Unfortunately, delays are inevitable. Hence, many existing works have focused on predicting bus delays. However, predicting on-time or early buses is also important. For instance, commuters who come to a bus stop on time may still miss their buses if the buses leave early. So, in this article, we examine open big data about bus performance (e.g., early, on-time, and late stops). We analyze the data with frequent pattern mining and make predictions with decision-tree-based classification. For illustration, we perform predictive analytics on real-life open big data about bus performance from Winnipeg Transit, available on the Winnipeg Open Data Portal. The results show the benefits of predictive analytics on open big data for supporting smart transportation services.
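As a rough illustration of frequent pattern mining on bus-performance records, the sketch below counts how often each item and each pair of items appears together and keeps those that meet a minimum support count. It is a brute-force enumeration rather than the algorithm used in the article, and the function name, record fields and records themselves are hypothetical, not drawn from the Winnipeg data.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return itemsets (as sorted tuples) occurring in at least
    min_support transactions, considering sets of up to max_size items."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Hypothetical stop records: (day type, time of day, performance).
records = [
    ("weekday", "peak", "late"),
    ("weekday", "peak", "late"),
    ("weekend", "offpeak", "early"),
    ("weekday", "offpeak", "on-time"),
]
patterns = frequent_itemsets(records, min_support=2)
# ("late", "peak") appears in 2 records, so it is reported as frequent.
```

Patterns such as a frequent ("late", "peak") pair are the kind of association that could then inform features for a decision-tree classifier.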
