Results 1 - 20 of 89
1.
Methods ; 230: 119-128, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39168294

ABSTRACT

Promoters, which are short (50-1500 base-pair) DNA regions, play a critical role in the regulation of gene transcription. Numerous serious diseases, such as cancer, cardiovascular disease, and inflammatory bowel disease, are caused by genetic variations in promoters. Consequently, the correct identification and characterization of promoters are significant for drug discovery. However, experimental approaches to recognizing promoters and their strengths are challenging in terms of cost, time, and resources. Therefore, computational techniques are highly desirable for the correct characterization of promoters from unannotated genomic data. Here, we designed a bi-layer deep-learning-based predictor named "PROCABLES", which classifies DNA samples as promoters or non-promoters in the first phase and as strong or weak promoters in the second phase. The proposed method uses five distinct feature types (word2vec, k-spaced nucleotide pairs, trinucleotide propensity-based features, trinucleotide composition, and electron-ion interaction pseudopotentials) to extract hidden patterns from the DNA sequence. Afterwards, a stacked framework is formed by integrating a convolutional neural network (CNN) with bidirectional long short-term memory (LSTM) using multi-view attributes to train the proposed model. Under ten-fold cross-validation, the PROCABLES model achieved accuracies of 0.971 and 0.920 and MCCs of 0.940 and 0.840 for the first and second layers, respectively. The results indicate that PROCABLES outperformed existing advanced computational predictors of promoters and their types. In summary, this research will provide useful hints for the large-scale recognition of promoters in particular and for other DNA problems in general.
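One of the feature types named in this abstract, k-spaced nucleotide pairs, can be sketched as follows. This is a minimal illustration under stated assumptions, not the PROCABLES implementation: the function name `kspaced_pair_composition`, the normalization by the number of valid pairs, and the skipping of non-ACGT characters are all choices made here for illustration.

```python
from itertools import product

NUCLEOTIDES = "ACGT"


def kspaced_pair_composition(seq, k):
    """Return the frequency of each ordered nucleotide pair whose two
    members are separated by exactly k intervening positions.

    The result is a 16-element list ordered AA, AC, AG, AT, CA, ..., TT.
    Pairs containing characters outside ACGT are skipped (an assumption).
    """
    seq = seq.upper()
    pairs = ["".join(p) for p in product(NUCLEOTIDES, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = 0
    for i in range(len(seq) - k - 1):
        pair = seq[i] + seq[i + k + 1]
        if pair in counts:
            counts[pair] += 1
            total += 1
    # Normalize so that the vector sums to 1 when at least one pair exists.
    return [counts[p] / total if total else 0.0 for p in pairs]


if __name__ == "__main__":
    # With k=0 this reduces to ordinary dinucleotide composition.
    print(kspaced_pair_composition("ACGTACGT", 0))
```

Concatenating the vectors for several values of k (for example k = 0 to 5) gives a fixed-length encoding regardless of sequence length, which is why descriptors of this kind are convenient inputs for a classifier.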


Subject(s)
Deep Learning ; Promoter Regions, Genetic ; Humans ; Neural Networks, Computer ; Computational Biology/methods ; DNA/genetics ; DNA/chemistry
2.
Methods ; 230: 129-139, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39173785

ABSTRACT

Host defense peptides, or antimicrobial peptides (AMPs), are promising candidates for protecting the host against microbial pathogens such as bacteria, viruses, fungi, and yeast. Defensins are a type of AMP that act as potential therapeutic agents and play a vital role in various biological processes. Conventional experiments to identify defensin peptides (DPs) are time-consuming and expensive. Thus, computational methods are used to address the shortcomings of wet-lab experiments and to accurately predict the functional types of DPs. In this paper, we propose a novel multi-class ensemble-based prediction model called StackDPPred for identifying the properties of DPs. The peptide sequences are encoded using split amino acid composition (SAAC), segmented position-specific scoring matrix (SegPSSM), histogram of oriented gradients-based PSSM (HOGPSSM), and feature extraction based graphical and statistical (FEGS) descriptors. Next, principal component analysis (PCA) is used to select the best subset of attributes. After that, the optimized features are fed into single machine learning and stacking-based ensemble classifiers. Furthermore, an ablation study demonstrates the robustness and efficacy of the stacking approach using reduced features for predicting DPs and their families. The proposed StackDPPred method improves overall accuracy by 13.41% and 7.62% compared to the existing DP predictors iDPF-PseRAAC and iDEF-PseRAAC, respectively, on the validation test. Additionally, we applied the local interpretable model-agnostic explanations (LIME) algorithm to understand the contribution of selected features to the overall prediction. We believe StackDPPred could serve as a valuable tool for accelerating the large-scale screening of DPs and the peptide-based drug discovery process.
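The split amino acid composition (SAAC) encoding mentioned in this abstract can be illustrated with a short sketch. The segment lengths (`n_term=5`, `c_term=5`) and the function names are assumptions chosen for illustration; the abstract does not state the values used by StackDPPred.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(segment):
    """Fraction of each of the 20 standard amino acids in a segment."""
    n = len(segment)
    return [segment.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def split_aac(seq, n_term=5, c_term=5):
    """Split a peptide into N-terminal, middle, and C-terminal segments and
    concatenate the amino acid composition of each (3 x 20 = 60 values).

    n_term and c_term are hypothetical segment lengths. For peptides shorter
    than n_term + c_term the terminal segments overlap and the middle
    segment is empty.
    """
    seq = seq.upper()
    head = seq[:n_term]
    tail = seq[-c_term:] if c_term else ""
    middle = seq[n_term:len(seq) - c_term]
    return composition(head) + composition(middle) + composition(tail)


if __name__ == "__main__":
    features = split_aac("AAAAACCCCCDDDDD")
    print(len(features))  # 60 features per peptide
```

Computing composition per segment, rather than over the whole peptide, keeps some positional information (for example, residues enriched near the termini) that a single global composition vector would average away.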


Subject(s)
Defensins ; Machine Learning ; Defensins/chemistry ; Computational Biology/methods ; Principal Component Analysis ; Amino Acid Sequence ; Algorithms ; Position-Specific Scoring Matrices
3.
BMC Bioinformatics ; 25(1): 145, 2024 Apr 05.
Article in English | MEDLINE | ID: mdl-38580921

ABSTRACT

BACKGROUND: Drug targets in living beings play pivotal roles in the discovery of potential drugs. Conventional wet-lab characterization of drug targets is accurate but generally expensive, slow, and resource-intensive. Therefore, computational methods are highly desirable as an alternative to expedite the large-scale identification of druggable proteins (DPs); however, the performance of existing in silico predictors is still not satisfactory. METHODS: In this study, we developed a novel deep learning-based model, DPI_CDF, for predicting DPs based on protein sequence only. DPI_CDF uses evolutionary-based (i.e., histograms of oriented gradients for position-specific scoring matrix), physicochemical-based (i.e., component protein sequence representation), and compositional-based (i.e., normalized qualitative characteristic) properties of the protein sequence to generate features. A hierarchical deep forest model then fuses these three encoding schemes to build the proposed model DPI_CDF. RESULTS: The empirical outcomes on 10-fold cross-validation demonstrate that the proposed model achieved 99.13% accuracy and a Matthews correlation coefficient (MCC) of 0.982 on the training dataset. The generalization power of the trained model was further examined on an independent dataset, where it achieved a maximum accuracy of 95.01% and an MCC of 0.900. Compared to current state-of-the-art methods, DPI_CDF improves accuracy by 4.27% and 4.31% on the training and testing datasets, respectively. We believe DPI_CDF will help the research community identify druggable proteins and accelerate the drug discovery process. AVAILABILITY: The benchmark datasets and source code are available on GitHub: http://github.com/Muhammad-Arif-NUST/DPI_CDF .
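Several abstracts in this listing report both accuracy and the Matthews correlation coefficient (MCC). For a binary classifier, MCC is computed from the confusion-matrix counts as (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below is a generic illustration with made-up counts, not data from any of the listed studies; returning 0.0 when the denominator is zero is a common convention adopted here as an assumption.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (chance level) to +1
    (perfect prediction). Returns 0.0 when any marginal total is zero.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts for an imbalanced test set (illustration only):
    # accuracy looks high while MCC exposes the weak minority-class recall.
    tp, tn, fp, fn = 5, 90, 0, 5
    print(accuracy(tp, tn, fp, fn), matthews_corrcoef(tp, tn, fp, fn))
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the classes are imbalanced, which is a likely reason it is reported alongside accuracy.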


Subject(s)
Proteins ; Software ; Amino Acid Sequence ; Position-Specific Scoring Matrices ; Biological Evolution ; Computational Biology/methods
4.
BMC Genomics ; 25(1): 151, 2024 Feb 07.
Article in English | MEDLINE | ID: mdl-38326777

ABSTRACT

BACKGROUND: mRNA subcellular localization has a substantial impact on the regulation of gene expression, cellular migration, and adaptation. However, the methods employed for experimental determination of this localization are arduous, time-intensive, and costly. METHODS: In this research article, we tackle the challenge of predicting the subcellular location of messenger RNAs (mRNAs) through the Unified mRNA Subcellular Localization Predictor (UMSLP), a machine learning (ML)-based approach. We adopt an in silico strategy that incorporates four distinct feature sets (k-mer, pseudo k-tuple nucleotide composition, nucleotide physicochemical attributes, and a 3D sequence representation obtained via Z-curve transformation) to predict subcellular localization in a benchmark dataset across five subcellular locales: nucleus, cytoplasm, extracellular region (ExR), mitochondria, and endoplasmic reticulum (ER). RESULTS: The proposed ML model UMSLP attains state-of-the-art results in predicting mRNA subcellular localization. On an independent testing dataset, UMSLP achieved over 87% precision, 94% specificity, and 94% accuracy. Compared to existing tools, UMSLP outperformed mRNALocator, mRNALoc, and SubLocEP by 11%, 21%, and 32%, respectively, in average prediction accuracy across all five locales. SHapley Additive exPlanations analysis highlights the dominance of k-mer features in predicting cytoplasm, nucleus, ER, and ExR localizations, while Z-curve-based features play pivotal roles in detecting mitochondrial localization. AVAILABILITY: We have shared the datasets, code, and Docker API for users on GitHub at: https://github.com/smusleh/UMSLP .
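The k-mer feature set named in this abstract can be illustrated with a minimal sketch of normalized k-mer frequencies. The function name, the default `k=3`, the RNA alphabet, and the conversion of T to U are assumptions for illustration; the UMSLP repository should be consulted for the actual feature definitions.

```python
from itertools import product


def kmer_frequencies(seq, k=3, alphabet="ACGU"):
    """Normalized frequency of every possible k-mer over the alphabet.

    DNA-style input is accepted by converting T to U (an assumption).
    Windows containing characters outside the alphabet are skipped.
    """
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    windows = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
            windows += 1
    return {m: (counts[m] / windows if windows else 0.0) for m in kmers}


if __name__ == "__main__":
    freqs = kmer_frequencies("AUGAUG", k=3)
    print(freqs["AUG"], freqs["UGA"], freqs["GAU"])
```

The vector has 4^k entries, so small values of k keep the feature space manageable while still capturing short sequence motifs.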


Subject(s)
Endoplasmic Reticulum ; Mitochondria ; RNA, Messenger/genetics ; Mitochondria/genetics ; Computational Biology/methods ; Machine Learning ; Nucleotides
5.
BMC Med Inform Decis Mak ; 24(1): 144, 2024 May 29.
Article in English | MEDLINE | ID: mdl-38811939

ABSTRACT

BACKGROUND: Diabetes is a chronic condition that can result in many long-term physiological, metabolic, and neurological complications. Therefore, early detection of diabetes would help to determine a proper diagnosis and treatment plan. METHODS: In this study, we conducted a machine learning (ML)-based case-control study on a diabetes cohort of 1000 participants from Qatar Biobank to predict diabetes using clinical and bone health indicators from Dual Energy X-ray Absorptiometry (DXA) machines. ML models were used to distinguish diabetes groups from non-diabetes controls. Recursive feature elimination (RFE) was used to identify a subset of features that improves model performance. SHAP-based analysis was used to assess feature importance and to support the explainability of the proposed model. RESULTS: The ensemble-based models XGBoost and RF achieved over 84% accuracy for detecting diabetes. After applying RFE, we selected only 20 features, which improved the model accuracy to 87.2%. From a clinical standpoint, higher HDL-Cholesterol and Neutrophil levels were observed in the diabetic group, along with lower vitamin B12 and testosterone levels. Lower sodium levels were found in diabetics, potentially stemming from clinical factors including specific medications, hormonal imbalances, and unmanaged diabetes. We believe Dapagliflozin prescriptions in Qatar were associated with decreased Gamma Glutamyltransferase and Aspartate Aminotransferase enzyme levels, confirming prior research. We observed that bone area, bone mineral content, and bone mineral density were slightly lower in the Diabetes group across almost all body parts, but the difference from the control group was not statistically significant except in the T12, troch, and trunk areas. No significant negative impact of diabetes progression on bone health was observed over a period of 5-15 yrs in the cohort. CONCLUSION: This study recommends the inclusion of an ML model that combines both DXA and clinical data for the early diagnosis of diabetes.


Subject(s)
Absorptiometry, Photon ; Diabetes Mellitus, Type 2 ; Machine Learning ; Humans ; Middle Aged ; Male ; Case-Control Studies ; Female ; Qatar ; Adult ; Aged ; Bone Density
6.
Semin Cancer Biol ; 86(Pt 3): 325-345, 2022 11.
Article in English | MEDLINE | ID: mdl-35643221

ABSTRACT

Understanding the complex and specific roles played by non-coding RNAs (ncRNAs), which comprise the bulk of the genome, is important for understanding virtually every hallmark of cancer. This large group of molecules plays pivotal roles in key regulatory mechanisms in various cellular processes. Regulatory mechanisms mediated by long non-coding RNA (lncRNA) and RNA-binding protein (RBP) interactions are well documented in several types of cancer. Their effects are enabled through networks affecting lncRNA and RBP stability, RNA metabolism including N6-methyladenosine (m6A) and alternative splicing, subcellular localization, and numerous other mechanisms involved in cancer. In this review, we discuss the reciprocal interplay between lncRNAs and RBPs and their involvement in epigenetic regulation via histone modifications, as well as their key role in resistance to cancer therapy. Other aspects of RBPs, including their structural domains, provide deeper knowledge of how lncRNAs and RBPs interact and exert their biological functions. In addition, current state-of-the-art knowledge, facilitated by machine and deep learning approaches, unravels such interactions in greater detail to further enhance our understanding of the field, and the potential to harness RNA-based therapeutics as an alternative treatment modality for cancer is discussed.


Subject(s)
Neoplasms ; RNA, Long Noncoding ; Humans ; RNA, Long Noncoding/genetics ; Epigenesis, Genetic ; RNA-Binding Proteins/genetics ; RNA-Binding Proteins/metabolism ; Neoplasms/genetics ; Machine Learning
7.
BMC Bioinformatics ; 24(1): 109, 2023 Mar 22.
Article in English | MEDLINE | ID: mdl-36949389

ABSTRACT

BACKGROUND: Subcellular localization of messenger RNAs (mRNAs) plays a pivotal role in the regulation of gene expression, cell migration, and cellular adaptation. Experimental techniques for pinpointing the subcellular localization of mRNAs are laborious, time-consuming, and expensive. Therefore, in silico approaches for this purpose are attracting great attention in the RNA community. METHODS: In this article, we propose MSLP, a machine learning-based method to predict the subcellular localization of mRNA. We propose a novel combination of four types of features, representing k-mer, pseudo k-tuple nucleotide composition (PseKNC), physicochemical properties of nucleotides, and a 3D representation of sequences based on Z-curve transformation, to feed into a machine learning algorithm that predicts the subcellular localization of mRNAs. RESULTS: Using the combination of the above-mentioned features, ensemble-based models achieved state-of-the-art results in mRNA subcellular localization prediction tasks on multiple benchmark datasets. We evaluated the performance of our method in ten subcellular locations: cytoplasm, nucleus, endoplasmic reticulum (ER), extracellular region (ExR), mitochondria, cytosol, pseudopodium, posterior, exosome, and ribosome. An ablation study highlighted k-mer and PseKNC as more dominant than other features for predicting cytoplasm, nucleus, and ER localizations. On the other hand, physicochemical properties and Z-curve-based features contributed the most to ExR and mitochondria detection. SHAP-based analysis revealed the relative importance of features to provide better insights into the proposed approach. AVAILABILITY: We have implemented a Docker container and API for end users to run their sequences on our model. The datasets, the API code, and the Docker container are shared with the community on GitHub at: https://github.com/smusleh/MSLP .
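The Z-curve transformation mentioned in this abstract maps a nucleotide sequence to a path in three dimensions: x tracks purines (A, G) versus pyrimidines (C, T/U), y tracks amino bases (A, C) versus keto bases (G, T/U), and z tracks weak hydrogen-bond bases (A, T/U) versus strong ones (C, G). The sketch below computes the cumulative curve; it is a generic illustration, and the abstract does not say which Z-curve variant or derived parameters MSLP actually uses. The function name and the handling of unknown characters are assumptions.

```python
# Per-base step along the (x, y, z) axes of the Z-curve.
Z_STEPS = {
    "A": (1, 1, 1),     # purine, amino, weak H-bond
    "C": (-1, 1, -1),   # pyrimidine, amino, strong H-bond
    "G": (1, -1, -1),   # purine, keto, strong H-bond
    "T": (-1, -1, 1),   # pyrimidine, keto, weak H-bond
}


def z_curve(seq):
    """Return the cumulative Z-curve coordinates, one (x, y, z) per base.

    U is treated as T so RNA sequences are accepted. Characters outside
    ACGT contribute a zero step (an assumption made for this sketch).
    """
    seq = seq.upper().replace("U", "T")
    x = y = z = 0
    points = []
    for base in seq:
        dx, dy, dz = Z_STEPS.get(base, (0, 0, 0))
        x, y, z = x + dx, y + dy, z + dz
        points.append((x, y, z))
    return points


if __name__ == "__main__":
    for point in z_curve("AUGGCC"):
        print(point)
```

Because the curve has one point per base, a fixed-length feature vector for a classifier would need a further summary step (for example, the final coordinates normalized by sequence length), which is left out here.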


Subject(s)
Algorithms ; Cell Nucleus ; RNA, Messenger/genetics ; Ribosomes ; Machine Learning ; Computational Biology/methods
8.
Genome Res ; 30(7): 951-961, 2020 07.
Article in English | MEDLINE | ID: mdl-32718981

ABSTRACT

Gene expression profiles in homologous tissues have been observed to be different between species, which may be due to differences between species in the gene expression program in each cell type, but may also reflect differences in cell type composition of each tissue in different species. Here, we compare expression profiles in matching primary cells in human, mouse, rat, dog, and chicken using Cap Analysis Gene Expression (CAGE) and short RNA (sRNA) sequencing data from FANTOM5. While we find that expression profiles of orthologous genes in different species are highly correlated across cell types, in each cell type many genes were differentially expressed between species. Expression of genes with products involved in transcription, RNA processing, and transcriptional regulation was more likely to be conserved, while expression of genes encoding proteins involved in intercellular communication was more likely to have diverged during evolution. Conservation of expression correlated positively with the evolutionary age of genes, suggesting that divergence in expression levels of genes critical for cell function was restricted during evolution. Motif activity analysis showed that both promoters and enhancers are activated by the same transcription factors in different species. An analysis of expression levels of mature miRNAs and of primary miRNAs identified by CAGE revealed that evolutionary old miRNAs are more likely to have conserved expression patterns than young miRNAs. We conclude that key aspects of the regulatory network are conserved, while differential expression of genes involved in cell-to-cell communication may contribute greatly to phenotypic differences between species.


Subject(s)
Evolution, Molecular ; Transcriptome ; Animals ; Chickens/genetics ; Dogs ; Gene Expression Profiling ; Gene Regulatory Networks ; Humans ; Mice ; MicroRNAs/metabolism ; Nucleotide Motifs ; Principal Component Analysis ; Promoter Regions, Genetic ; Rats ; Species Specificity ; Transcription Factors/metabolism
9.
Nature ; 543(7644): 199-204, 2017 03 09.
Article in English | MEDLINE | ID: mdl-28241135

ABSTRACT

Long non-coding RNAs (lncRNAs) are largely heterogeneous and functionally uncharacterized. Here, using FANTOM5 cap analysis of gene expression (CAGE) data, we integrate multiple transcript collections to generate a comprehensive atlas of 27,919 human lncRNA genes with high-confidence 5' ends and expression profiles across 1,829 samples from the major human primary cell types and tissues. Genomic and epigenomic classification of these lncRNAs reveals that most intergenic lncRNAs originate from enhancers rather than from promoters. Incorporating genetic and expression data, we show that lncRNAs overlapping trait-associated single nucleotide polymorphisms are specifically expressed in cell types relevant to the traits, implicating these lncRNAs in multiple diseases. We further demonstrate that lncRNAs overlapping expression quantitative trait loci (eQTL)-associated single nucleotide polymorphisms of messenger RNAs are co-expressed with the corresponding messenger RNAs, suggesting their potential roles in transcriptional regulation. Combining these findings with conservation data, we identify 19,175 potentially functional lncRNAs in the human genome.


Subject(s)
Databases, Genetic ; RNA, Long Noncoding/chemistry ; RNA, Long Noncoding/genetics ; Transcriptome/genetics ; Cells, Cultured ; Conserved Sequence/genetics ; Datasets as Topic ; Enhancer Elements, Genetic/genetics ; Epigenesis, Genetic ; Gene Expression Profiling ; Gene Expression Regulation ; Genome, Human/genetics ; Genome-Wide Association Study ; Genomics ; Humans ; Internet ; Molecular Sequence Annotation ; Organ Specificity/genetics ; Polymorphism, Single Nucleotide ; Promoter Regions, Genetic/genetics ; Quantitative Trait Loci/genetics ; RNA Stability ; RNA, Messenger/genetics
10.
Sensors (Basel) ; 23(19)2023 Sep 27.
Article in English | MEDLINE | ID: mdl-37836936

ABSTRACT

The primary goal of this study is to develop a deep neural network for action recognition that enhances accuracy and minimizes computational costs. In this regard, we propose a modified EMO-MoviNet-A2* architecture that integrates Evolving Normalization (EvoNorm), Mish activation, and optimal frame selection to improve the accuracy and efficiency of action recognition tasks in videos. The asterisk notation indicates that this model also incorporates the stream buffer concept. The Mobile Video Network (MoviNet) is a member of the memory-efficient architectures discovered through Neural Architecture Search (NAS), which balances accuracy and efficiency by integrating spatial, temporal, and spatio-temporal operations. Our research implements the MoviNet model, pre-trained on the Kinetics dataset, on the UCF101 and HMDB51 datasets. Upon implementation on the UCF101 dataset, a generalization gap was observed, with the model performing better on the training set than on the testing set. To address this issue, we replaced batch normalization with EvoNorm, which unifies normalization and activation functions. Another area that required improvement was key-frame selection. We also developed a novel technique called Optimal Frame Selection (OFS) to identify key-frames within videos more effectively than random or dense frame selection methods. Combining OFS with Mish nonlinearity resulted in a 0.8-1% improvement in accuracy in our UCF101 20-classes experiment. The EMO-MoviNet-A2* model consumes 86% fewer FLOPs and approximately 90% fewer parameters on the UCF101 dataset, with a trade-off of 1-2% accuracy. Additionally, it achieves 5-7% higher accuracy on the HMDB51 dataset while requiring seven times fewer FLOPs and ten times fewer parameters compared to the reference model, Motion-Augmented RGB Stream (MARS).

11.
Sensors (Basel) ; 22(18)2022 Sep 15.
Article in English | MEDLINE | ID: mdl-36146323

ABSTRACT

Background: Brain traumas, mental disorders, and vocal abuse can result in permanent or temporary speech impairment, significantly impairing one's quality of life and occasionally resulting in social isolation. Brain-computer interfaces (BCI) can help people who have speech difficulties or who have been paralyzed to communicate with their surroundings via brain signals. EEG signal-based BCI has therefore received significant attention in the last two decades for multiple reasons: (i) clinical research has yielded detailed knowledge of EEG signals, (ii) EEG devices are inexpensive, and (iii) it has applications in medical and social fields. Objective: This study explores the existing literature and summarizes EEG data acquisition, feature extraction, and artificial intelligence (AI) techniques for decoding speech from brain signals. Method: We followed the PRISMA-ScR guidelines to conduct this scoping review. We searched six electronic databases: PubMed, IEEE Xplore, the ACM Digital Library, Scopus, arXiv, and Google Scholar. We carefully selected search terms based on target intervention (i.e., imagined speech and AI) and target data (EEG signals), and some of the search terms were derived from previous reviews. The study selection process was carried out in three phases: study identification, study selection, and data extraction. Two reviewers independently carried out study selection and data extraction. A narrative approach was adopted to synthesize the extracted data. Results: A total of 263 studies were evaluated, of which 34 met the eligibility criteria for inclusion in this review. We found 64-electrode EEG signal devices to be the most widely used in the included studies. The most common signal normalization and feature extraction methods in the included studies were the bandpass filter and wavelet-based feature extraction. We categorized the studies based on AI techniques, such as machine learning and deep learning. The most prominent ML algorithm was the support vector machine, and the most prominent DL algorithm was the convolutional neural network. Conclusions: EEG signal-based BCI is a viable technology that can enable people with severe or temporary voice impairment to communicate with the world directly from their brain. However, the development of BCI technology is still in its infancy.


Subject(s)
Brain-Computer Interfaces ; Algorithms ; Artificial Intelligence ; Electroencephalography/methods ; Humans ; Quality of Life ; Speech
12.
Sensors (Basel) ; 22(12)2022 Jun 07.
Article in English | MEDLINE | ID: mdl-35746092

ABSTRACT

Cardiovascular diseases (CVD) are the leading cause of death worldwide. People affected by CVDs may go undiagnosed until the occurrence of a serious heart failure event such as stroke, heart attack, or myocardial infarction. In Qatar, there is a lack of studies focusing on CVD diagnosis based on non-invasive methods such as retinal imaging or dual-energy X-ray absorptiometry (DXA). In this study, we aimed to diagnose CVD using a novel approach that integrates information from retinal images and DXA data. We considered an adult Qatari cohort of 500 participants from Qatar Biobank (QBB) with an equal number of participants from the CVD and control groups. We designed a case-control study with a novel multi-modal approach (combining data from multiple modalities, DXA and retinal images) to propose a deep learning (DL)-based technique that distinguishes the CVD group from the control group. Uni-modal models based on retinal images and DXA data achieved 75.6% and 77.4% accuracy, respectively. The multi-modal model showed an improved accuracy of 78.3% in classifying the CVD group and the control group. We used gradient class activation map (GradCAM) to highlight the areas of interest in the retinal images that most influenced the decisions of the proposed DL model. It was observed that the model focused mostly on the centre of the retinal images, where signs of CVD such as hemorrhages were present. This indicates that our model can identify and make use of certain prognosis markers for hypertension and ischemic heart disease. From DXA data, we found higher values for bone mineral density, fat content, muscle mass, and bone area across the majority of body parts in the CVD group compared to the control group, indicating better bone health in the Qatari CVD cohort. This seminal method based on DXA scans and retinal images demonstrates major potential for the early detection of CVD in a fast and relatively non-invasive manner.


Subject(s)
Cardiovascular Diseases ; Deep Learning ; Absorptiometry, Photon/methods ; Adult ; Bone Density ; Cardiovascular Diseases/diagnostic imaging ; Case-Control Studies ; Humans
13.
Sensors (Basel) ; 22(5)2022 Feb 24.
Article in English | MEDLINE | ID: mdl-35270938

ABSTRACT

Diabetes mellitus (DM) can lead to plantar ulcers, amputation and death. Plantar foot thermogram images acquired using an infrared camera have been shown to detect changes in temperature distribution associated with a higher risk of foot ulceration. Machine learning approaches applied to such infrared images may have utility in the early diagnosis of diabetic foot complications. In this work, a publicly available dataset was categorized into different classes, which were corroborated by domain experts, based on a temperature distribution parameter, the thermal change index (TCI). We then explored different machine-learning approaches for classifying thermograms of the TCI-labeled dataset. Classical machine learning algorithms with feature engineering and the convolutional neural network (CNN) with image enhancement techniques were extensively investigated to identify the best performing network for classifying thermograms. The multilayer perceptron (MLP) classifier along with the features extracted from thermogram images showed an accuracy of 90.1% in multi-class classification, which outperformed the literature-reported performance metrics on this dataset.


Subject(s)
Diabetes Mellitus ; Diabetic Foot ; Algorithms ; Diabetic Foot/diagnostic imaging ; Humans ; Machine Learning ; Neural Networks, Computer ; Thermography
14.
Bioinformatics ; 36(4): 1121-1128, 2020 02 15.
Article in English | MEDLINE | ID: mdl-31584626

ABSTRACT

MOTIVATION: Leucine-aspartic acid (LD) motifs are short linear interaction motifs (SLiMs) that link paxillin family proteins to factors controlling cell adhesion, motility and survival. The existence and importance of LD motifs beyond the paxillin family is poorly understood. RESULTS: To enable a proteome-wide assessment of LD motifs, we developed an active learning based framework (LD motif finder; LDMF) that iteratively integrates computational predictions with experimental validation. Our analysis of the human proteome revealed a dozen new proteins containing LD motifs. We found that LD motif signalling evolved in unicellular eukaryotes more than 800 Myr ago, with paxillin and vinculin as core constituents, and nuclear export signal as a likely source of de novo LD motifs. We show that LD motif proteins form a functionally homogenous group, all being involved in cell morphogenesis and adhesion. This functional focus is recapitulated in cells by GFP-fused LD motifs, suggesting that it is intrinsic to the LD motif sequence, possibly through their effect on binding partners. Our approach elucidated the origin and dynamic adaptations of an ancestral SLiM, and can serve as a guide for the identification of other SLiMs for which only few representatives are known. AVAILABILITY AND IMPLEMENTATION: LDMF is freely available online at www.cbrc.kaust.edu.sa/ldmf; Source code is available at https://github.com/tanviralambd/LD/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Proteome ; Amino Acid Motifs ; Aspartic Acid ; Humans ; Leucine ; Prevalence
15.
J Med Internet Res ; 23(11): e29749, 2021 11 19.
Article in English | MEDLINE | ID: mdl-34806996

ABSTRACT

BACKGROUND: Bipolar disorder (BD) is the 10th most common cause of frailty in young individuals and has triggered morbidity and mortality worldwide. Patients with BD have a life expectancy 9 to 17 years lower than that of the general population. BD is a predominant mental disorder, but it can be misdiagnosed as depressive disorder, which leads to difficulties in treating affected patients. Approximately 60% of patients with BD are treated for depression. However, machine learning provides advanced skills and techniques for better diagnosis of BD. OBJECTIVE: This review aims to explore the machine learning algorithms used for the detection and diagnosis of bipolar disorder and its subtypes. METHODS: The study protocol adopted the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) guidelines. We explored 3 databases, namely Google Scholar, ScienceDirect, and PubMed. To enhance the search, we performed backward screening of all the references of the included studies. Based on the predefined selection criteria, 2 levels of screening were performed: title and abstract review, and full review of the articles that met the inclusion criteria. Data extraction was performed independently by all investigators. To synthesize the extracted data, a narrative synthesis approach was followed. RESULTS: We retrieved 573 potential articles from the 3 databases. After preprocessing and screening, only 33 articles that met our inclusion criteria were identified. The most commonly used data belonged to the clinical category (19, 58%). We identified different machine learning models used in the selected studies, including classification models (18, 55%), regression models (5, 16%), model-based clustering methods (2, 6%), natural language processing (1, 3%), clustering algorithms (1, 3%), and deep learning-based models (3, 9%). Magnetic resonance imaging data were most commonly used for classifying bipolar patients compared to other groups (11, 34%), whereas microarray expression data sets and genomic data were the least commonly used. The highest reported accuracy was 98%, whereas the lowest was 64%. CONCLUSIONS: This scoping review provides an overview of recent studies based on machine learning models used to diagnose patients with BD regardless of their demographics or if they were compared to patients with psychiatric diagnoses. Further research can be conducted to provide clinical decision support in the health industry.


Subject(s)
Bipolar Disorder ; Algorithms ; Bipolar Disorder/diagnosis ; Data Management ; Humans ; Machine Learning ; Natural Language Processing
16.
J Med Internet Res ; 23(3): e23703, 2021 03 08.
Article in English | MEDLINE | ID: mdl-33600346

ABSTRACT

BACKGROUND: Shortly after the emergence of COVID-19, researchers rapidly mobilized to study numerous aspects of the disease such as its evolution, clinical manifestations, effects, treatments, and vaccinations. This led to a rapid increase in the number of COVID-19-related publications. Identifying trends and areas of interest using traditional review methods (e.g., scoping and systematic reviews) for such a large domain is challenging. OBJECTIVE: We aimed to conduct an extensive bibliometric analysis to provide a comprehensive overview of the COVID-19 literature. METHODS: We used the COVID-19 Open Research Dataset (CORD-19), which consists of a large number of research articles related to all coronaviruses. We used a machine learning-based method to analyze the most relevant COVID-19-related articles and extracted the most prominent topics. Specifically, we used a clustering algorithm to group published articles based on the similarity of their abstracts to identify research hotspots and current research directions. We have made our software accessible to the community via GitHub. RESULTS: Of the 196,630 publications retrieved from the database, we included 28,904 in our analysis. The mean number of weekly publications was 990 (SD 789.3). The country that published the highest number of COVID-19-related articles was China (2950/17,270, 17.08%). The highest number of articles was published in bioRxiv. Lei Liu, affiliated with the Southern University of Science and Technology in China, published the highest number of articles (n=46). Based on titles and abstracts alone, we were able to identify 1515 surveys, 733 systematic reviews, 512 cohort studies, 480 meta-analyses, and 362 randomized controlled trials. We identified 19 different topics covered among the publications reviewed. The most dominant topic was public health response, followed by clinical care practices during the COVID-19 pandemic, clinical characteristics and risk factors, and epidemic models for its spread. CONCLUSIONS: We provide an overview of the COVID-19 literature and have identified current hotspots and research directions. Our findings can be useful for the research community to help prioritize research needs and recognize leading COVID-19 researchers, institutes, countries, and publishers. Our study shows that an AI-based bibliometric analysis has the potential to rapidly explore a large corpus of academic publications during a public health crisis. We believe that this work can be used to analyze other eHealth-related literature to help clinicians, administrators, and policy makers obtain a holistic view of the literature and categorize different topics of the existing research for further analyses. It can be further scaled (for instance, in time) to clinical summary documentation. Publishers should avoid noise in the data by developing a way to trace the evolution of individual publications and unique authors.
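
The abstract above does not name the clustering algorithm it used, so the following is only a minimal sketch of the general idea of grouping articles by the similarity of their abstracts. It assumes TF-IDF term weights, cosine similarity, and a simple threshold-based grouping (connected components). The function names, the threshold value, and the toy abstracts are all hypothetical and are not taken from the authors' GitHub software.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        # Smoothed inverse document frequency (an assumed weighting scheme).
        vectors.append({
            term: count * (math.log((1 + n) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cluster_by_similarity(docs, threshold=0.3):
    """Group documents whose pairwise cosine similarity reaches the threshold.

    Documents linked directly or through a chain of similar documents end up
    in the same group. Returns sorted lists of document indices.
    """
    vectors = tfidf_vectors(docs)
    parent = list(range(len(docs)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical toy abstracts, for illustration only.
toy_abstracts = [
    "vaccine trial efficacy in adults",
    "vaccine trial safety in children",
    "epidemic model of virus spread",
    "epidemic model for transmission spread",
]
print(cluster_by_similarity(toy_abstracts))  # [[0, 1], [2, 3]]
```

The pairwise comparison is quadratic in the number of documents, so a corpus of the size reported in the abstract would need a scalable clustering method rather than this sketch.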


Subject(s)
Bibliometrics, COVID-19/epidemiology, Machine Learning, COVID-19/virology, Humans, Research Design, SARS-CoV-2/isolation & purification
17.
Nucleic Acids Res ; 45(D1): D145-D150, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27789689

ABSTRACT

Transcription factors (TFs) play a pivotal role in transcriptional regulation, making them crucial for cell survival and important biological functions. For the regulation of transcription, interactions of different regulatory proteins known as transcription co-factors (TcoFs) and TFs are essential in forming necessary protein complexes. Although TcoFs themselves do not bind DNA directly, their influence on transcriptional regulation and initiation, although indirect, has been shown to be significant, with the functionality of TFs strongly influenced by the presence of TcoFs. In the TcoF-DB v2 database, we collect information on TcoFs. In this article, we describe updates and improvements implemented in TcoF-DB v2. TcoF-DB v2 provides several new features that enable exploration of the roles of TcoFs. The content of the database has significantly expanded and is enriched with information from Gene Ontology, biological pathways, diseases and molecular signatures. TcoF-DB v2 now includes many more TFs, has substantially increased the number of human TcoFs to 958, and now includes information on mouse (418 new TcoFs). TcoF-DB v2 enables the exploration of information on TcoFs and allows investigations into their influence on transcriptional regulation in humans and mice. TcoF-DB v2 can be accessed at http://tcofdb.org/.


Subject(s)
Carrier Proteins, Genetic Databases, Gene Expression Regulation, Transcription Factors, Animals, Carrier Proteins/metabolism, Humans, Mice, Protein Binding, Transcription Factors/metabolism
18.
Nucleic Acids Res ; 45(5): 2838-2848, 2017 03 17.
Article in English | MEDLINE | ID: mdl-27924038

ABSTRACT

Non-coding RNA (ncRNA) genes play a major role in the control of heterogeneous cellular behavior. Yet, their functions are largely uncharacterized. Currently available databases lack in-depth information on ncRNA functions across the spectrum of cells and tissues. Here, we present FARNA, a knowledgebase of inferred functions of 10,289 human ncRNA transcripts (2,734 microRNA and 7,555 long ncRNA) in 119 human tissues and 177 human primary cells. Since transcription factors (TFs) and TF co-factors (TcoFs) are crucial components of the regulatory machinery for activation of gene transcription, the cellular processes and diseases in which TFs and TcoFs are involved suggest functions of the transcripts they regulate. In FARNA, functions of a transcript are inferred from TFs and TcoFs whose genes co-express with the transcript controlled by these TFs and TcoFs in a considered cell/tissue. Transcripts were annotated using statistically enriched GO terms, pathways and diseases across cells/tissues based on the guilt-by-association principle. Expression profiles across cells/tissues based on Cap Analysis of Gene Expression (CAGE) are provided. FARNA, having the most comprehensive function annotation of the considered ncRNAs across the widest spectrum of human cells/tissues, has the potential to greatly contribute to our understanding of ncRNA roles and their regulatory mechanisms in humans. FARNA can be accessed at: http://cbrc.kaust.edu.sa/farna.
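
The abstract above mentions "statistically enriched" terms under the guilt-by-association principle but does not state which statistical test was used. As a hedged illustration only, the sketch below assumes a one-sided hypergeometric test: a transcript inherits a term when that term is over-represented among the regulators (TFs and TcoFs) associated with it. The function names, the `alpha` cutoff, and the toy data are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """Upper-tail hypergeometric p-value P(X >= hits).

    population: number of regulators in the background set
    annotated:  regulators in the background that carry the term
    sample:     regulators associated with the transcript
    hits:       associated regulators that carry the term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


def enriched_terms(terms_by_regulator, partners, alpha=0.05):
    """Return {term: p_value} for terms enriched among a transcript's partners.

    terms_by_regulator: dict mapping each background regulator to a set of terms
    partners:           regulators associated with the transcript (hypothetical input)
    """
    population = len(terms_by_regulator)
    sample = len(partners)
    all_terms = set().union(*terms_by_regulator.values())
    result = {}
    for term in sorted(all_terms):
        annotated = sum(term in terms for terms in terms_by_regulator.values())
        hits = sum(term in terms_by_regulator[r] for r in partners)
        p_value = hypergeom_enrichment_p(population, annotated, sample, hits)
        if p_value <= alpha:
            result[term] = p_value
    return result


# Hypothetical toy background: 20 regulators, 5 of which carry "term_A".
toy_background = {f"R{i}": ({"term_A"} if i < 5 else set()) for i in range(20)}
# Hypothetical transcript associated with 4 regulators, 3 of which carry "term_A".
toy_partners = ["R0", "R1", "R2", "R5"]
print(enriched_terms(toy_background, toy_partners))
```

With these toy numbers the p-value for `term_A` is 155/4845 (about 0.032), so the term passes the assumed 0.05 cutoff.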


Subject(s)
Nucleic Acid Databases, Knowledge Bases, MicroRNAs/physiology, Long Noncoding RNA/physiology, Humans, MicroRNAs/metabolism, Long Noncoding RNA/metabolism, Transcription Factors/metabolism
20.
RNA Biol ; 14(7): 963-971, 2017 07 03.
Article in English | MEDLINE | ID: mdl-28387604

ABSTRACT

Noncoding RNAs (ncRNAs), particularly microRNAs (miRNAs) and long ncRNAs (lncRNAs), are important players in diseases and are emerging as novel drug targets. Thus, unraveling the relationships between ncRNAs and other biomedical entities in cells is critical for better understanding ncRNA roles, which may eventually help develop their use in medicine. To support ncRNA research and facilitate retrieval of relevant information regarding miRNAs and lncRNAs from the plethora of published ncRNA-related research, we developed DES-ncRNA ( www.cbrc.kaust.edu.sa/des_ncrna ). DES-ncRNA is a knowledgebase containing text- and data-mined information from public scientific literature and other public resources. Exploration of mined information is enabled through terms and pairs of terms from 19 topic-specific dictionaries including, for example, antibiotics, toxins, drugs, enzymes, mutations, pathways, human genes and proteins, drug indications and side effects, diseases, etc. DES-ncRNA contains approximately 878,000 associations of terms from these dictionaries, of which 36,222 concern miRNAs and 5,373 concern lncRNAs. We provide users with several ways to explore information regarding ncRNAs, including controlled generation of association networks as well as hypothesis generation. We show an example of how DES-ncRNA can aid research on Alzheimer disease and suggest a potential therapeutic role for Fasudil. DES-ncRNA is a powerful tool that can be used on its own or as a complement to the existing resources, to support research in human ncRNA. To our knowledge, this is the only knowledgebase dedicated to human miRNAs and lncRNAs derived primarily through literature-mining enabling exploration of a broad spectrum of associated biomedical entities, not paralleled by any other resource.
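
The abstract above does not describe how term associations are derived, so the following is only a rough sketch of one common approach: counting how often terms from different dictionaries co-occur in the same document. The function name, the naive substring matching, and the placeholder terms and documents are all hypothetical and are not taken from DES-ncRNA itself.

```python
from collections import Counter
from itertools import combinations


def term_pair_associations(documents, dictionaries):
    """Count documents in which two terms from different dictionaries co-occur.

    documents:    list of text strings
    dictionaries: dict mapping a dictionary name to a list of its terms
    Returns a Counter keyed by alphabetically ordered (term_a, term_b) pairs.
    """
    # Map each lowercased term to the dictionary it belongs to.
    term_to_dictionary = {
        term.lower(): name
        for name, terms in dictionaries.items()
        for term in terms
    }
    counts = Counter()
    for text in documents:
        lowered = text.lower()
        # Naive substring matching (an assumption; real text mining would
        # need tokenization and synonym handling).
        found = sorted(t for t in term_to_dictionary if t in lowered)
        for a, b in combinations(found, 2):
            if term_to_dictionary[a] != term_to_dictionary[b]:
                counts[(a, b)] += 1
    return counts


# Placeholder dictionaries and documents, for illustration only.
toy_dictionaries = {
    "ncRNA": ["mirna-a"],
    "drug": ["drug-x"],
    "disease": ["disease-y"],
}
toy_documents = [
    "miRNA-A and disease-Y were mentioned together.",
    "drug-X was tested in disease-Y.",
    "miRNA-A and disease-Y appear again.",
]
for pair, count in sorted(term_pair_associations(toy_documents, toy_dictionaries).items()):
    print(pair, count)
```

Pairs that share a third term, such as the placeholder ncRNA and drug both linked to the same disease, are the kind of indirect connection that hypothesis generation could start from.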


Subject(s)
Data Mining, Knowledge Bases, MicroRNAs/genetics, Long Noncoding RNA/genetics, Software, 1-(5-Isoquinolinesulfonyl)-2-Methylpiperazine/analogs & derivatives, 1-(5-Isoquinolinesulfonyl)-2-Methylpiperazine/therapeutic use, Alzheimer Disease/drug therapy, Alzheimer Disease/genetics, Dictionaries as Topic, Disease Progression, Gene Ontology, Humans, MicroRNAs/metabolism, Long Noncoding RNA/metabolism