Results 1 - 14 of 14
1.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36736370

ABSTRACT

As the number of protein sequences increases in biological databases, computational methods are required to provide accurate functional annotation with high coverage. Although several machine learning methods have been proposed for this purpose, there are still two main issues: (i) construction of reliable positive and negative training and validation datasets, and (ii) fair evaluation of their performances based on predefined experimental settings. To address these issues, we have developed ProFAB: Open Protein Functional Annotation Benchmark, which is a platform providing an infrastructure for a fair comparison of protein function prediction methods. ProFAB provides filtered and preprocessed protein annotation datasets and enables the training and evaluation of function prediction methods via several options. We believe that ProFAB will be useful for both computational and experimental researchers by enabling the utilization of ready-to-use datasets and machine learning algorithms for protein function prediction based on Gene Ontology terms and Enzyme Commission numbers. ProFAB is available at https://github.com/kansil/ProFAB and https://profab.kansil.org.


Subject(s)
Benchmarking , Software , Molecular Sequence Annotation , Algorithms , Proteins/metabolism , Computational Biology/methods
2.
Brief Bioinform ; 23(5)2022 09 20.
Article in English | MEDLINE | ID: mdl-36007229

ABSTRACT

Statistical and machine learning techniques based on relative abundances have been used to predict health conditions and to identify microbial biomarkers. However, high dimensionality, sparsity and the compositional nature of microbiome data represent statistical challenges. On the other hand, taxon grouping allows summarizing microbiome abundance with a coarser resolution in a lower dimension, but it presents new challenges when correlating taxa with a disease. In this work, we present a novel approach that groups Operational Taxonomic Units (OTUs) based only on relative abundances as an alternative to taxon grouping. The proposed procedure acknowledges the compositional nature of the data by making use of principal balances. The identified groups are called Principal Microbial Groups (PMGs). The procedure reduces the need for user-defined aggregation of OTUs and offers the possibility of working with coarse groups of OTUs that are not present in a phylogenetic tree. PMGs can be used for two different goals: (1) as a dimensionality reduction method for compositional data, and (2) as an aggregation procedure that provides an alternative to taxon grouping for the construction of microbial balances afterward used for disease prediction. We illustrate the procedure with data from a cirrhosis study. PMGs provide a coherent data analysis for the search of biomarkers in human microbiota. The source code and demo data for PMGs are available at: https://github.com/asliboyraz/PMGs.


Subject(s)
Microbiota , Data Analysis , Humans , Microbiota/genetics , Phylogeny
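The principal balances mentioned in the abstract of record 2 build on the notion of a balance between two groups of parts of a composition: a normalized log-ratio of the groups' geometric means. The following is a minimal Python sketch of that standard formula only. It is not the PMGs implementation; the function name `balance` and the toy abundance vector are assumptions made for illustration, and zero abundances would need to be replaced before taking logarithms, which the sketch does not do.

```python
import math

def balance(x, num_idx, den_idx):
    """Balance between two groups of parts of a composition x.

    b = sqrt(r*s / (r+s)) * ln( gm(x[num]) / gm(x[den]) ),
    where r and s are the group sizes and gm is the geometric mean.
    All parts involved must be strictly positive.
    """
    r, s = len(num_idx), len(den_idx)
    if r == 0 or s == 0:
        raise ValueError("both groups must be non-empty")
    if set(num_idx) & set(den_idx):
        raise ValueError("groups must not overlap")
    # The log of a geometric mean is the arithmetic mean of the logs.
    mean_log_num = sum(math.log(x[i]) for i in num_idx) / r
    mean_log_den = sum(math.log(x[i]) for i in den_idx) / s
    return math.sqrt(r * s / (r + s)) * (mean_log_num - mean_log_den)

# Hypothetical relative abundances of four OTUs (illustrative values only).
otu_abundances = [0.50, 0.25, 0.15, 0.10]
b = balance(otu_abundances, num_idx=[0, 1], den_idx=[2, 3])
```

Because only ratios enter the formula, multiplying every abundance by the same constant leaves the balance unchanged, which is why it suits relative-abundance data.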
3.
Bioinformatics ; 39(39 Suppl 1): i103-i110, 2023 06 30.
Article in English | MEDLINE | ID: mdl-37387156

ABSTRACT

MOTIVATION: Utilizing AI-driven approaches for drug-target interaction (DTI) prediction requires large volumes of training data, which are not available for the majority of target proteins. In this study, we investigate the use of deep transfer learning for the prediction of interactions between drug candidate compounds and understudied target proteins with scarce training data. The idea here is to first train a deep neural network classifier with a generalized source training dataset of large size and then to reuse this pre-trained neural network as an initial configuration for re-training/fine-tuning purposes with a small-sized specialized target training dataset. To explore this idea, we selected six protein families that have critical importance in biomedicine: kinases, G-protein-coupled receptors (GPCRs), ion channels, nuclear receptors, proteases, and transporters. In two independent experiments, the protein families of transporters and nuclear receptors were individually set as the target datasets, while the remaining five families were used as the source datasets. Several size-based target family training datasets were formed in a controlled manner to assess the benefit provided by the transfer learning approach. RESULTS: Here, we present a systematic evaluation of our approach by pre-training a feed-forward neural network with source training datasets and applying different modes of transfer learning from the pre-trained source network to a target dataset. The performance of deep transfer learning is evaluated and compared with that of training the same deep neural network from scratch. We found that when the training dataset contains fewer than 100 compounds, transfer learning outperforms the conventional strategy of training the system from scratch, suggesting that transfer learning is advantageous for predicting binders to under-studied targets.
AVAILABILITY AND IMPLEMENTATION: The source code and datasets are available at https://github.com/cansyl/TransferLearning4DTI. Our web-based service containing the ready-to-use pre-trained models is accessible at https://tl4dti.kansil.org.


Subject(s)
Neural Networks, Computer , Peptide Hydrolases , Software , Machine Learning
4.
Int J Mol Sci ; 23(21)2022 Nov 06.
Article in English | MEDLINE | ID: mdl-36362402

ABSTRACT

Lamina-associated polypeptide 1 (LAP1) is a ubiquitously expressed inner nuclear membrane protein encoded by TOR1AIP1, and presents as two isoforms in humans, LAP1B and LAP1C. While loss of both isoforms results in a multisystemic progeroid-like syndrome, specific loss of LAP1B causes muscular dystrophy and cardiomyopathy, suggesting that LAP1B has a critical role in striated muscle. To gain more insight into the molecular pathophysiology underlying muscular dystrophy caused by LAP1B, we established a patient-derived fibroblast line that was transdifferentiated into myogenic cells using inducible MyoD expression. Compared to the controls, we observed strongly reduced myogenic differentiation and fusion potentials. Similar defects were observed in the C2C12 murine myoblasts carrying loss-of-function LAP1A/B mutations. Using RNA sequencing, we found that, despite MyoD overexpression and efficient cell cycle exit, transcriptional reprogramming of the LAP1B-deficient cells into the myogenic lineage is impaired with delayed activation of MYOG and muscle-specific genes. Gene set enrichment analyses suggested dysregulations of protein metabolism, extracellular matrix, and chromosome organization. Finally, we found that the LAP1B-deficient cells exhibit nuclear deformations, such as an increased number of micronuclei and altered morphometric parameters. This study uncovers the phenotypic and transcriptomic changes occurring during myoconversion of patient-derived LAP1B-deficient fibroblasts and provides a useful resource to gain insights into the mechanisms implicated in LAP1B-associated nuclear envelopathies.


Subject(s)
Muscular Dystrophies , Nuclear Envelope , Animals , Humans , Mice , Cell Differentiation/genetics , Fibroblasts/metabolism , Membrane Proteins/genetics , Membrane Proteins/metabolism , Muscle Development/genetics , Muscular Dystrophies/metabolism , MyoD Protein/genetics , MyoD Protein/metabolism , Nuclear Envelope/metabolism , Protein Isoforms/metabolism
5.
Bioinformatics ; 36(14): 4227-4230, 2020 08 15.
Article in English | MEDLINE | ID: mdl-32407491

ABSTRACT

SUMMARY: iBioProVis is an interactive tool for visual analysis of the compound bioactivity space in the context of target proteins, drugs and drug candidate compounds. The iBioProVis tool takes target protein identifiers and, optionally, compound SMILES as input, and uses the state-of-the-art non-linear dimensionality reduction method t-Distributed Stochastic Neighbor Embedding (t-SNE) to plot the distribution of compounds embedded in a 2D map, based on the similarity of structural properties of compounds and in the context of compounds' cognate targets. Similar compounds, which are embedded to proximate points on the 2D map, may bind the same or similar target proteins. Thus, iBioProVis can be used to easily observe the structural distribution of one or two target proteins' known ligands on the 2D compound space, and to infer new binders to the same protein, or to infer new potential target(s) for a compound of interest, based on this distribution. A principal component analysis (PCA) projection of the input compounds is also provided; hence, the user can interactively observe the same compound, or a group of selected compounds, both projected by PCA and embedded by t-SNE. iBioProVis also provides detailed information about drugs and drug candidate compounds through cross-references to widely used and well-known databases, in the form of linked table views. Two use-case studies were demonstrated, one being on the angiotensin-converting enzyme 2 (ACE2) protein, which is the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) Spike protein receptor. ACE2-binding compounds and seven antiviral drugs were closely embedded, two of which have been under clinical trial for coronavirus disease 2019 (COVID-19). AVAILABILITY AND IMPLEMENTATION: iBioProVis and its carefully filtered dataset are available at https://ibpv.kansil.org/ for public use. CONTACT: vatalay@metu.edu.tr. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Models, Molecular , Peptidyl-Dipeptidase A/chemistry , Software , Spike Glycoprotein, Coronavirus/chemistry , Angiotensin-Converting Enzyme 2 , Angiotensin-Converting Enzyme Inhibitors/chemistry , Antiviral Agents/chemistry , Betacoronavirus , COVID-19 , Coronavirus Infections , Humans , Internet , Pandemics , Pneumonia, Viral , Principal Component Analysis , Receptors, Adrenergic, beta-2/chemistry , Receptors, Adrenergic, beta-3/chemistry , SARS-CoV-2 , User-Computer Interface
6.
Turk J Med Sci ; 51(1): 16-27, 2021 02 26.
Article in English | MEDLINE | ID: mdl-32530587

ABSTRACT

Background/aim: The COVID-19 pandemic originated in Wuhan, China, in December 2019 and became one of the worst global health crises ever. While struggling with the unknown nature of this novel coronavirus, many researchers and groups attempted to project the progress of the pandemic using empirical or mechanistic models, each one having its drawbacks. The first confirmed cases were announced early in March, and since then, serious containment measures have taken place in Turkey. Materials and methods: Here, we present a different approach, a Bayesian negative binomial multilevel model with mixed effects, for the projection of the COVID-19 pandemic and we apply this model to the Turkish case. The model source code is available at https://github.com/kansil/covid-19. We predicted the confirmed daily cases and cumulative numbers from June 6th to June 26th with 80%, 95%, and 99% prediction intervals (PI). Results: Our projections showed that if we continued to comply with the measures and no drastic changes were seen in diagnosis or management protocols, the epidemic curve would tend to decrease in this time interval. Also, the predictive validity analysis suggests that the proposed model projections should have a PI around 95% for the first 12 days of the projections. Conclusion: We expect that drastic changes in the course of COVID-19 in Turkey will cause the model to suffer in predictive validity, and this can be used to monitor the epidemic. We hope that the discussion on these projections and the limitations of epidemiological forecasting will be beneficial to the medical community and policy makers.


Subject(s)
COVID-19/epidemiology , Pandemics/statistics & numerical data , Bayes Theorem , Epidemiologic Methods , Forecasting , Humans , Models, Statistical , Probability , Turkey/epidemiology
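The abstract of record 6 reports 80%, 95%, and 99% prediction intervals for daily case counts under a negative binomial model. As a rough illustration of what such an interval means for a single day, the Python sketch below computes central intervals from a negative binomial distribution with a fixed mean and dispersion. This is a deliberately simplified, non-Bayesian stand-in: the paper's multilevel model, its priors, and its posterior sampling are not reproduced, and the values of `mu` and `phi` are hypothetical.

```python
import math

def nb_pmf(k, mu, phi):
    """Negative binomial pmf with mean mu and dispersion phi.

    Parameterized so that the variance is mu + mu**2 / phi.
    """
    log_p = (math.lgamma(k + phi) - math.lgamma(phi) - math.lgamma(k + 1)
             + phi * math.log(phi / (phi + mu))
             + k * math.log(mu / (phi + mu)))
    return math.exp(log_p)

def nb_quantile(q, mu, phi):
    """Smallest count k whose cumulative probability is at least q."""
    k, cdf = 0, nb_pmf(0, mu, phi)
    while cdf < q:
        k += 1
        cdf += nb_pmf(k, mu, phi)
    return k

def prediction_interval(mu, phi, level):
    """Central prediction interval at the given level (e.g. 0.95)."""
    tail = (1.0 - level) / 2.0
    return nb_quantile(tail, mu, phi), nb_quantile(1.0 - tail, mu, phi)

# Hypothetical expected daily case count and dispersion (not from the paper).
mu, phi = 900.0, 20.0
intervals = {level: prediction_interval(mu, phi, level)
             for level in (0.80, 0.95, 0.99)}
```

The three intervals are nested: a higher level always gives an interval at least as wide, which is why a 99% band encloses the 95% and 80% bands in projection plots.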
7.
J Digit Imaging ; 33(3): 763-775, 2020 06.
Article in English | MEDLINE | ID: mdl-31974686

ABSTRACT

Malaria is a serious public health problem in many parts of the world. Early diagnosis and prompt effective treatment are required to avoid anemia, organ failure, and malaria-associated deaths. Microscopic analysis of blood samples is the preferred method for diagnosis. However, manual microscopic examination is very laborious and requires skilled health personnel, of which there is a critical shortage in the developing world, such as in sub-Saharan Africa. Critical shortages of trained health personnel and the inability to cope with the workload to examine malaria slides are among the main limitations of malaria microscopy, especially in low-resource and high disease burden areas. We present a low-cost alternative and complementary solution for rapid malaria screening for low-resource settings to potentially reduce the dependence on manual microscopic examination. We develop an image processing pipeline using a modified YOLOv3 detection algorithm to run in real time on low-cost devices. We test the performance of our solution on two datasets. Our model achieved 99.07% accuracy on the dataset collected using a microscope camera and 97.46% accuracy on the dataset collected using a mobile phone camera. While the mean average precision of our model is on par with human experts at an object level, we are several orders of magnitude faster than human experts as we can detect parasites in images as well as videos in real time.


Subject(s)
Malaria , Parasites , Algorithms , Animals , Humans , Image Processing, Computer-Assisted , Malaria/diagnosis , Microscopy
9.
Comput Biol Med ; 169: 107810, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38134749

ABSTRACT

Non-silent single nucleotide genetic variants, like nonsense changes and insertion-deletion variants, that affect protein function and length substantially are prevalent and are frequently misclassified. The low sensitivity and specificity of existing variant effect predictors for nonsense and indel variations restrict their use in clinical applications. We propose the Pathogenic Mutation Prediction (PMPred) method to predict the pathogenicity of single nucleotide variations, which impair protein function by prematurely terminating a protein's elongation during its synthesis. The prediction starts by monitoring functional effects (Gene Ontology annotation changes) of the change in sequence, using an existing ensemble machine learning model (UniGOPred). This, in turn, reveals the mutations that significantly deviate functionally from the wild-type sequence. We have identified novel harmful mutations in patient data and present them as motivating case studies. We also show that our method has increased sensitivity and specificity compared to state-of-the-art methods, especially in single nucleotide variations that produce large functional changes in the final protein. As further validation, we have done a comparative docking study on such a variation that is misclassified by existing methods and, using the altered binding affinities, show how PMPred can correctly predict the pathogenicity when other tools miss it. PMPred is freely accessible as a web service at https://pmpred.kansil.org/, and the related code is available at https://github.com/kansil/PMPred.


Subject(s)
Exome , Knowledge Discovery , Humans , Exome Sequencing , Mutation , Nucleotides , Computational Biology/methods
10.
Nucleic Acids Res ; 39(Database issue): D170-80, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21177657

ABSTRACT

microRNA expression and sequence analysis database (http://konulab.fen.bilkent.edu.tr/mirna/) (mESAdb) is a regularly updated database for the multivariate analysis of sequences and expression of microRNAs from multiple taxa. mESAdb is modular and has a user interface implemented in PHP and JavaScript and coupled with statistical analysis and visualization packages written for the R language. The database primarily comprises mature microRNA sequences and their target data, along with selected human, mouse and zebrafish expression data sets. mESAdb analysis modules allow (i) mining of microRNA expression data sets for subsets of microRNAs selected manually or by motif; (ii) pair-wise multivariate analysis of expression data sets within and between taxa; and (iii) association of microRNA subsets with annotation databases, HUGE Navigator, KEGG and GO. The use of existing and customized R packages facilitates future addition of data sets and analysis tools. Furthermore, the ability to upload and analyze user-specified data sets makes mESAdb an interactive and expandable analysis tool for microRNA sequence and expression data.


Subject(s)
Databases, Nucleic Acid , MicroRNAs/chemistry , MicroRNAs/metabolism , Animals , Humans , Mice , Sequence Analysis, RNA , Software , Systems Integration , User-Computer Interface
11.
Comput Methods Programs Biomed ; 208: 106256, 2021 Sep.
Article in English | MEDLINE | ID: mdl-34242864

ABSTRACT

OBJECTIVE: The maximum diameter measurement of an abdominal aortic aneurysm (AAA), which depends on orthogonal and axial cross-sections or maximally inscribed spheres within the AAA, plays a significant role in the clinical decision-making process. This study aims to build a total of 21 morphological parameters from longitudinal CT scans and analyze their correlations. Furthermore, this work explores the existence of a "master curve" of AAA growth, and tests which parameters serve to enhance its predictability for clinical use. METHODS: 106 CT scan images from 25 Korean AAA patients were retrospectively obtained. We subsequently computed morphological parameters, growth rates, and pair-wise correlations, and attempted to enhance the predictability of the growth for high-risk aneurysms using non-linear curve fitting and least-squares minimization. RESULTS: An exponential AAA growth model was fitted to the maximum spherical diameter, as the best representative of the growth among all parameters (R-squared: 0.94), and correctly predicted 15 of 16 validation scans based on a 95% confidence interval. AAA volume expansion rates were highly correlated (r=0.75) with thrombus accumulation rates. CONCLUSIONS: The exponential growth model using spherical diameter provides useful information about progression of aneurysm size and enables AAA growth rate extrapolation during a given surveillance period.


Subject(s)
Aortic Aneurysm, Abdominal , Thrombosis , Aortic Aneurysm, Abdominal/diagnostic imaging , Disease Progression , Humans , Retrospective Studies , Tomography, X-Ray Computed
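The abstract of record 11 fits an exponential growth model to aneurysm diameters. A minimal Python sketch of such a fit follows, assuming the common form d(t) = d0 * exp(k * t). Note the substitution: the paper reports non-linear curve fitting, whereas this sketch uses ordinary least squares on the logarithm of the diameter, which is simpler but weights errors differently. The function names and the follow-up measurements are hypothetical.

```python
import math

def fit_exponential(times, diameters):
    """Fit d(t) = d0 * exp(k * t) by ordinary least squares on ln(d)."""
    n = len(times)
    if n < 2 or n != len(diameters):
        raise ValueError("need at least two (time, diameter) pairs")
    logs = [math.log(d) for d in diameters]
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("scan times must not all be equal")
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    k = sxy / sxx                       # growth rate per unit time
    d0 = math.exp(y_mean - k * t_mean)  # fitted diameter at t = 0
    return d0, k

def predict(d0, k, t):
    """Extrapolate the fitted curve to time t."""
    return d0 * math.exp(k * t)

# Hypothetical follow-up data: years since first scan, diameter in mm.
times = [0.0, 1.0, 2.0, 3.5]
diameters = [40.0, 42.1, 44.2, 47.8]
d0, k = fit_exponential(times, diameters)
next_diameter = predict(d0, k, 4.5)
```

Extrapolating with `predict` over a surveillance interval mirrors the use described in the conclusions, though any clinical use would need the validated model rather than this sketch.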
12.
J Gastrointest Cancer ; 52(4): 1266-1276, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34910274

ABSTRACT

PURPOSE: Computational approaches have been used at different stages of drug development with the purpose of decreasing the time and cost of conventional experimental procedures. Lately, techniques mainly developed and applied in the field of artificial intelligence (AI) have been transferred to different application domains such as biomedicine. METHODS: In this study, we conducted an investigative analysis via data-driven evaluation of potential hepatocellular carcinoma (HCC) therapeutics in the context of AI-assisted drug discovery/repurposing. First, we discussed basic concepts, computational approaches, databases, modeling approaches, and featurization techniques in drug discovery/repurposing. In the analysis part, we automatically integrated HCC-related biological entities such as genes/proteins, pathways, phenotypes, drugs/compounds, and other diseases with similar implications, and represented these heterogeneous relationships via a knowledge graph using the CROssBAR system. RESULTS: Following the system-level evaluation and selection of critical genes/proteins and pathways to target, our deep learning-based drug/compound-target protein interaction predictors DEEPScreen and MDeePred have been employed for predicting new bioactive drugs and compounds for these critical targets. Finally, we embedded ligands of selected HCC-associated proteins which had a significant enrichment with the CROssBAR system into a 2-D space to identify and repurpose small molecule inhibitors as potential drug candidates based on their molecular similarities to known HCC drugs. CONCLUSIONS: We expect that this series of data-driven analyses can be used as a roadmap to propose early-stage potential inhibitors (from database-scale sets of compounds) to both HCC and other complex diseases, which may subsequently be analyzed with more targeted in silico and experimental approaches.


Subject(s)
Antineoplastic Agents/pharmacology , Artificial Intelligence , Carcinoma, Hepatocellular/drug therapy , Drug Development/methods , Liver Neoplasms/drug therapy , Carcinoma, Hepatocellular/pathology , Computational Biology , Humans , Liver Neoplasms/pathology , Molecular Targeted Therapy
13.
IEEE/ACM Trans Comput Biol Bioinform ; 17(5): 1810-1821, 2020.
Article in English | MEDLINE | ID: mdl-30835228

ABSTRACT

We motivate and describe the application of Hierarchical Dirichlet Process (HDP) models to the "soft" biclustering of gene expression data, in which we obtain modules (biclusters) where the affiliation of genes and samples with the modules is weighted, instead of being a hard membership. As a distinct contribution, we propose a method in which HDP is informed with prior beliefs, significantly increasing the quality of the biclustering in terms of both the correctness of the number of modules inferred and the precision of these modules, especially when evidence is sparse. We outline two such informed priors: one based on co-expression relationships inherent in the data, the other based on an externally provided regulatory network. We validate these results and compare the performance of our approach to Weighted Gene Correlation Network Analysis (WGCNA), another model that features weighted modules. We have, to this end, performed experiments on semi-synthetic data. The results show that HDP, with the addition of a well-informed prior, is able to capture the correct number of modules with increased accuracy. Furthermore, the model becomes robust to changes in the strength of the prior. We conclude by discussing these results and the benefits provided by our approach for gene expression analysis and network validation.


Subject(s)
Computational Biology/methods , Gene Expression Profiling/methods , Gene Regulatory Networks/genetics , Transcriptome/genetics , Algorithms , Bayes Theorem , Cluster Analysis , Humans , Neoplasms/genetics
14.
Comput Biol Med ; 117: 103620, 2020 02.
Article in English | MEDLINE | ID: mdl-32072970

ABSTRACT

OBJECTIVE: For small abdominal aortic aneurysms (AAAs), a regular follow-up examination is recommended every 12 months for AAAs of 30-39 mm and every six months for AAAs of 40-55 mm. Follow-up diameters can determine if a patient follows the common growth model of the population. However, the rapid expansion of an AAA, often associated with higher rupture risk, may be overlooked even though it requires surgical intervention. Therefore, the prognosis of abdominal aortic aneurysm growth is clinically important for planning treatment. This study aims to build enhanced Bayesian inference methods to predict maximum aneurysm diameter. METHODS: 106 CT scans from 25 Korean AAA patients were retrospectively obtained. A two-step approach based on Bayesian calibration was used, and an exponential abdominal aortic aneurysm growth model (population-based) was specified according to each individual patient's growth (patient-specific) and morphologic characteristics of the aneurysm sac (enhanced). The distribution estimates were obtained using a Markov Chain Monte Carlo (MCMC) sampler. RESULTS: The follow-up diameters were predicted satisfactorily (i.e. the true follow-up diameter was in the 95% prediction interval) for 79% of the scans using the population-based growth model, and 83% of the scans using the patient-specific growth model. Among the evaluated geometric measurements, centerline tortuosity was a significant (p = 0.0002) predictor of growth for AAAs with accelerated and stable expansion rates. Using the enhanced prediction model, 86% of follow-up scans were predicted satisfactorily. The average prediction errors of population-based, patient-specific, and enhanced models were ±2.67, ±2.61 and ±2.79 mm, respectively. CONCLUSION: A computational framework using patient-oriented growth models provides useful tools for per-patient basis treatment and enables better prediction of AAA growth.


Subject(s)
Aortic Aneurysm, Abdominal , Aortic Aneurysm, Abdominal/diagnostic imaging , Bayes Theorem , Humans , Retrospective Studies , Risk Factors , Time Factors , Tomography, X-Ray Computed
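The results in record 14 count a follow-up scan as predicted satisfactorily when the true diameter falls inside the 95% prediction interval. The Python sketch below shows one way such a coverage figure could be computed from sampler output, assuming each scan comes with a list of posterior predictive draws. The percentile method, the function names, and the draws themselves are illustrative assumptions; the paper's own evaluation code is not reproduced here.

```python
def percentile(sorted_values, q):
    """Linear-interpolation percentile, q in [0, 100], of a pre-sorted list."""
    if not sorted_values:
        raise ValueError("empty sample")
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac

def interval_coverage(observed, predictive_samples, level=95.0):
    """Fraction of observations inside the central prediction interval."""
    if not observed or len(observed) != len(predictive_samples):
        raise ValueError("need one non-empty sample list per observation")
    tail = (100.0 - level) / 2.0
    hits = 0
    for y, samples in zip(observed, predictive_samples):
        s = sorted(samples)
        if percentile(s, tail) <= y <= percentile(s, 100.0 - tail):
            hits += 1
    return hits / len(observed)

# Hypothetical posterior predictive draws (mm) for two follow-up scans.
draws = [
    [45 + 0.1 * i for i in range(-50, 51)],
    [50 + 0.1 * i for i in range(-50, 51)],
]
observed = [46.2, 56.0]
coverage = interval_coverage(observed, draws)
```

Coverage measures calibration rather than sharpness, which is consistent with the abstract reporting average prediction errors alongside the coverage percentages.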