Search | Portal de Pesquisa da BVS

1.

DPI_CDF: druggable protein identifier using cascade deep forest.

Arif, Muhammad; Fang, Ge; Ghulam, Ali; Musleh, Saleh; Alam, Tanvir.

BMC Bioinformatics ; 25(1): 145, 2024 Apr 05.

Article in English | MEDLINE | ID: mdl-38580921

ABSTRACT

BACKGROUND: Drug targets in living beings play pivotal roles in the discovery of potential drugs. Conventional wet-lab characterization of drug targets, although accurate, is generally expensive, slow, and resource intensive. Therefore, computational methods are highly desirable as an alternative to expedite the large-scale identification of druggable proteins (DPs); however, the performance of existing in silico predictors is still not satisfactory. METHODS: In this study, we developed a novel deep learning-based model, DPI_CDF, for predicting DPs based on protein sequence only. DPI_CDF utilizes evolutionary-based (i.e., histograms of oriented gradients for position-specific scoring matrix), physicochemical-based (i.e., component protein sequence representation), and compositional-based (i.e., normalized qualitative characteristic) properties of the protein sequence to generate features. Then a hierarchical deep forest model fuses these three encoding schemes to build the proposed model DPI_CDF. RESULTS: The empirical outcomes on 10-fold cross-validation demonstrate that the proposed model achieved 99.13% accuracy and a Matthews correlation coefficient (MCC) of 0.982 on the training dataset. The generalization power of the trained model was further examined on an independent dataset, where it achieved a maximum accuracy of 95.01% and an MCC of 0.900. When compared to current state-of-the-art methods, DPI_CDF improves accuracy by 4.27% and 4.31% on the training and testing datasets, respectively. We believe DPI_CDF will support the research community in identifying druggable proteins and accelerate the drug discovery process. AVAILABILITY: The benchmark datasets and source codes are available on GitHub: http://github.com/Muhammad-Arif-NUST/DPI_CDF .
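The two metrics reported in this abstract, accuracy and the Matthews correlation coefficient (MCC), can both be computed from a binary confusion matrix. The following is a minimal sketch using the standard textbook definitions; the function names and the example counts are illustrative assumptions and are not taken from the DPI_CDF code base or its datasets.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix.

    Returns 0.0 when any marginal total is zero (the usual convention,
    since the coefficient is undefined in that case).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Hypothetical counts, chosen only to illustrate the calculation.
print(accuracy(45, 45, 5, 5), mcc(45, 45, 5, 5))
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy when the two classes may be imbalanced.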

Subjects

Proteins, Software, Amino Acid Sequence, Position-Specific Scoring Matrices, Biological Evolution, Computational Biology/methods

2.

Unified mRNA Subcellular Localization Predictor based on machine learning techniques.

Musleh, Saleh; Arif, Muhammad; Alajez, Nehad M; Alam, Tanvir.

BMC Genomics ; 25(1): 151, 2024 Feb 07.

Article in English | MEDLINE | ID: mdl-38326777

ABSTRACT

BACKGROUND: mRNA subcellular localization has a substantial impact on the regulation of gene expression, cellular migration, and adaptation. However, the methods employed for experimental determination of this localization are arduous, time-intensive, and costly. METHODS: In this research article, we tackle the essential challenge of predicting the subcellular location of messenger RNAs (mRNAs) through the Unified mRNA Subcellular Localization Predictor (UMSLP), a machine learning (ML) based approach. We adopt an in silico strategy that incorporates four distinct feature sets: k-mer, pseudo k-tuple nucleotide composition, nucleotide physicochemical attributes, and the 3D sequence depiction achieved via Z-curve transformation, for predicting subcellular localization in a benchmark dataset across five distinct subcellular locales: nucleus, cytoplasm, extracellular region (ExR), mitochondria, and endoplasmic reticulum (ER). RESULTS: The proposed ML model UMSLP attains state-of-the-art results in predicting mRNA subcellular localization. On an independent testing dataset, UMSLP achieved over 87% precision, 94% specificity, and 94% accuracy. Compared to other existing tools, UMSLP outperformed mRNALocator, mRNALoc, and SubLocEP by 11%, 21%, and 32%, respectively, in average prediction accuracy across all five locales. SHapley Additive exPlanations analysis highlights the dominance of k-mer features in predicting cytoplasm, nucleus, ER, and ExR localizations, while Z-curve based features play pivotal roles in detecting mitochondrial localization. AVAILABILITY: We have shared the datasets, code, and Docker API for users on GitHub at: https://github.com/smusleh/UMSLP .
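The Z-curve transformation named among the feature sets maps a nucleotide sequence to a curve in three dimensions. The sketch below follows the standard cumulative Z-curve definition (purine vs. pyrimidine, amino vs. keto, weak vs. strong hydrogen bonding). The abstract does not say how UMSLP reduces this curve to a fixed-length feature vector, so that step is left out, and the function name is an illustrative assumption.

```python
def z_curve(seq: str) -> list[tuple[int, int, int]]:
    """Return the cumulative Z-curve coordinates (x, y, z) for each position.

    x_n = (A_n + G_n) - (C_n + T_n)   purine vs. pyrimidine
    y_n = (A_n + C_n) - (G_n + T_n)   amino vs. keto
    z_n = (A_n + T_n) - (C_n + G_n)   weak vs. strong hydrogen bonding
    where A_n, C_n, G_n, T_n are base counts in the first n positions.
    """
    # Treat RNA (U) and DNA (T) spellings alike.
    seq = seq.upper().replace("U", "T")
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    points = []
    for base in seq:
        if base not in counts:
            raise ValueError(f"unexpected symbol: {base!r}")
        counts[base] += 1
        a, c, g, t = counts["A"], counts["C"], counts["G"], counts["T"]
        points.append(((a + g) - (c + t), (a + c) - (g + t), (a + t) - (c + g)))
    return points


print(z_curve("AUGC"))  # [(1, 1, 1), (0, 0, 2), (1, -1, 1), (0, 0, 0)]
```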

Subjects

Endoplasmic Reticulum, Mitochondria, Messenger RNA/genetics, Mitochondria/genetics, Computational Biology/methods, Machine Learning, Nucleotides

3.

An ensemble-based machine learning model for predicting type 2 diabetes and its effect on bone health.

Alsadi, Belqes; Musleh, Saleh; Al-Absi, Hamada R H; Refaee, Mahmoud; Qureshi, Rizwan; El Hajj, Nady; Alam, Tanvir.

BMC Med Inform Decis Mak ; 24(1): 144, 2024 May 29.

Article in English | MEDLINE | ID: mdl-38811939

ABSTRACT

BACKGROUND: Diabetes is a chronic condition that can result in many long-term physiological, metabolic, and neurological complications. Therefore, early detection of diabetes would help to determine a proper diagnosis and treatment plan. METHODS: In this study, we conducted a machine learning (ML) based case-control study on a diabetic cohort of 1000 participants from Qatar Biobank to predict diabetes using clinical and bone health indicators from Dual Energy X-ray Absorptiometry (DXA) machines. ML models were utilized to distinguish diabetes groups from non-diabetes controls. Recursive feature elimination (RFE) was leveraged to identify a subset of features that improves model performance. SHAP-based analysis was used to assess feature importance and support the explainability of the proposed model. RESULTS: The ensemble-based models XGBoost and RF achieved over 84% accuracy for detecting diabetes. After applying RFE, we selected only 20 features, which improved the model accuracy to 87.2%. From a clinical standpoint, higher HDL-Cholesterol and Neutrophil levels were observed in the diabetic group, along with lower vitamin B12 and testosterone levels. Lower sodium levels were found in diabetics, potentially stemming from clinical factors including specific medications, hormonal imbalances, and unmanaged diabetes. We believe Dapagliflozin prescriptions in Qatar were associated with decreased Gamma Glutamyltransferase and Aspartate Aminotransferase enzyme levels, confirming prior research. We observed that bone area, bone mineral content, and bone mineral density were slightly lower in the Diabetes group across almost all body parts, but the difference against the control group was not statistically significant except in T12, troch and trunk area. No significant negative impact of diabetes progression on bone health was observed over a period of 5-15 years in the cohort. CONCLUSION: This study recommends the inclusion of an ML model that combines both DXA and clinical data for the early diagnosis of diabetes.

Subjects

Photon Absorptiometry, Type 2 Diabetes Mellitus, Machine Learning, Humans, Middle Aged, Male, Case-Control Studies, Female, Qatar, Adult, Aged, Bone Density

4.

Long non-coding RNA and RNA-binding protein interactions in cancer: Experimental and machine learning approaches.

Shaath, Hibah; Vishnubalaji, Radhakrishnan; Elango, Ramesh; Kardousha, Ahmed; Islam, Zeyaul; Qureshi, Rizwan; Alam, Tanvir; Kolatkar, Prasanna R; Alajez, Nehad M.

Semin Cancer Biol ; 86(Pt 3): 325-345, 2022 11.

Article in English | MEDLINE | ID: mdl-35643221

ABSTRACT

Understanding the complex and specific roles played by non-coding RNAs (ncRNAs), which comprise the bulk of the genome, is important for understanding virtually every hallmark of cancer. This large group of molecules plays pivotal roles in key regulatory mechanisms in various cellular processes. Regulatory mechanisms, mediated by long non-coding RNA (lncRNA) and RNA-binding protein (RBP) interactions, are well documented in several types of cancer. Their effects are enabled through networks affecting lncRNA and RBP stability, RNA metabolism including N6-methyladenosine (m6A) and alternative splicing, subcellular localization, and numerous other mechanisms involved in cancer. In this review, we discuss the reciprocal interplay between lncRNAs and RBPs and their involvement in epigenetic regulation via histone modifications, as well as their key role in resistance to cancer therapy. Other aspects of RBPs including their structural domains, provide a deeper knowledge on how lncRNAs and RBPs interact and exert their biological functions. In addition, current state-of-the-art knowledge, facilitated by machine and deep learning approaches, unravels such interactions in better details to further enhance our understanding of the field, and the potential to harness RNA-based therapeutics as an alternative treatment modality for cancer are discussed.

Subjects

Neoplasms, Long Noncoding RNA, Humans, Long Noncoding RNA/genetics, Genetic Epigenesis, RNA-Binding Proteins/genetics, RNA-Binding Proteins/metabolism, Neoplasms/genetics, Machine Learning

5.

MSLP: mRNA subcellular localization predictor based on machine learning techniques.

Musleh, Saleh; Islam, Mohammad Tariqul; Qureshi, Rizwan; Alajez, Nehad M; Alam, Tanvir.

BMC Bioinformatics ; 24(1): 109, 2023 Mar 22.

Article in English | MEDLINE | ID: mdl-36949389

ABSTRACT

BACKGROUND: Subcellular localization of messenger RNAs (mRNAs) plays a pivotal role in the regulation of gene expression, cell migration, and cellular adaptation. Experimental techniques for pinpointing the subcellular localization of mRNAs are laborious, time-consuming, and expensive. Therefore, in silico approaches for this purpose are attracting great attention in the RNA community. METHODS: In this article, we propose MSLP, a machine learning-based method to predict the subcellular localization of mRNA. We propose a novel combination of four types of features, representing k-mer, pseudo k-tuple nucleotide composition (PseKNC), physicochemical properties of nucleotides, and 3D representation of sequences based on Z-curve transformation, to feed into machine learning algorithms that predict the subcellular localization of mRNAs. RESULTS: Considering the combination of the above-mentioned features, ensemble-based models achieved state-of-the-art results in mRNA subcellular localization prediction tasks for multiple benchmark datasets. We evaluated the performance of our method in ten subcellular locations, covering cytoplasm, nucleus, endoplasmic reticulum (ER), extracellular region (ExR), mitochondria, cytosol, pseudopodium, posterior, exosome, and the ribosome. An ablation study highlighted k-mer and PseKNC as more dominant than other features for predicting cytoplasm, nucleus, and ER localizations. On the other hand, physicochemical properties and Z-curve based features contributed the most to ExR and mitochondria detection. SHAP-based analysis revealed the relative importance of features to provide better insights into the proposed approach. AVAILABILITY: We have implemented a Docker container and API for end users to run their sequences on our model. The datasets, the API code, and the Docker container are shared with the community on GitHub at: https://github.com/smusleh/MSLP .
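The k-mer feature type mentioned in this abstract is commonly computed as the normalized frequency of every length-k substring of the sequence. The sketch below shows that common formulation only; the value of k, the normalization, and the function name are assumptions made for illustration, and MSLP's own feature extraction may differ.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq: str, k: int = 3, alphabet: str = "ACGU") -> dict[str, float]:
    """Return the normalized frequency of every possible k-mer over `alphabet`.

    The result always has len(alphabet) ** k entries, so sequences of
    different lengths map to feature vectors of the same size.
    """
    # Accept DNA-style input by mapping T to U.
    seq = seq.upper().replace("T", "U")
    all_kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    # Count overlapping windows of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Windows containing symbols outside the alphabet are ignored.
    total = sum(counts[m] for m in all_kmers)
    return {m: (counts[m] / total if total else 0.0) for m in all_kmers}


features = kmer_frequencies("AUGAUG", k=2)
print(features["AU"], features["UG"], features["GA"])  # 0.4 0.4 0.2
```

Keeping all possible k-mers in the output, including those with zero frequency, gives every sequence the same feature layout, which is what a downstream classifier expects.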

Subjects

Algorithms, Cell Nucleus, Messenger RNA/genetics, Ribosomes, Machine Learning, Computational Biology/methods

6.

Comparative transcriptomics of primary cells in vertebrates.

Alam, Tanvir; Agrawal, Saumya; Severin, Jessica; Young, Robert S; Andersson, Robin; Arner, Erik; Hasegawa, Akira; Lizio, Marina; Ramilowski, Jordan A; Abugessaisa, Imad; Ishizu, Yuri; Noma, Shohei; Tarui, Hiroshi; Taylor, Martin S; Lassmann, Timo; Itoh, Masayoshi; Kasukawa, Takeya; Kawaji, Hideya; Marchionni, Luigi; Sheng, Guojun; R R Forrest, Alistair; Khachigian, Levon M; Hayashizaki, Yoshihide; Carninci, Piero; de Hoon, Michiel J L.

Genome Res ; 30(7): 951-961, 2020 07.

Article in English | MEDLINE | ID: mdl-32718981

ABSTRACT

Gene expression profiles in homologous tissues have been observed to be different between species, which may be due to differences between species in the gene expression program in each cell type, but may also reflect differences in cell type composition of each tissue in different species. Here, we compare expression profiles in matching primary cells in human, mouse, rat, dog, and chicken using Cap Analysis Gene Expression (CAGE) and short RNA (sRNA) sequencing data from FANTOM5. While we find that expression profiles of orthologous genes in different species are highly correlated across cell types, in each cell type many genes were differentially expressed between species. Expression of genes with products involved in transcription, RNA processing, and transcriptional regulation was more likely to be conserved, while expression of genes encoding proteins involved in intercellular communication was more likely to have diverged during evolution. Conservation of expression correlated positively with the evolutionary age of genes, suggesting that divergence in expression levels of genes critical for cell function was restricted during evolution. Motif activity analysis showed that both promoters and enhancers are activated by the same transcription factors in different species. An analysis of expression levels of mature miRNAs and of primary miRNAs identified by CAGE revealed that evolutionary old miRNAs are more likely to have conserved expression patterns than young miRNAs. We conclude that key aspects of the regulatory network are conserved, while differential expression of genes involved in cell-to-cell communication may contribute greatly to phenotypic differences between species.

Subjects

Molecular Evolution, Transcriptome, Animals, Chickens/genetics, Dogs, Gene Expression Profiling, Gene Regulatory Networks, Humans, Mice, MicroRNAs/metabolism, Nucleotide Motifs, Principal Component Analysis, Genetic Promoter Regions, Rats, Species Specificity, Transcription Factors/metabolism

7.

An atlas of human long non-coding RNAs with accurate 5' ends.

Hon, Chung-Chau; Ramilowski, Jordan A; Harshbarger, Jayson; Bertin, Nicolas; Rackham, Owen J L; Gough, Julian; Denisenko, Elena; Schmeier, Sebastian; Poulsen, Thomas M; Severin, Jessica; Lizio, Marina; Kawaji, Hideya; Kasukawa, Takeya; Itoh, Masayoshi; Burroughs, A Maxwell; Noma, Shohei; Djebali, Sarah; Alam, Tanvir; Medvedeva, Yulia A; Testa, Alison C; Lipovich, Leonard; Yip, Chi-Wai; Abugessaisa, Imad; Mendez, Mickaël; Hasegawa, Akira; Tang, Dave; Lassmann, Timo; Heutink, Peter; Babina, Magda; Wells, Christine A; Kojima, Soichi; Nakamura, Yukio; Suzuki, Harukazu; Daub, Carsten O; de Hoon, Michiel J L; Arner, Erik; Hayashizaki, Yoshihide; Carninci, Piero; Forrest, Alistair R R.

Nature ; 543(7644): 199-204, 2017 03 09.

Article in English | MEDLINE | ID: mdl-28241135

ABSTRACT

Long non-coding RNAs (lncRNAs) are largely heterogeneous and functionally uncharacterized. Here, using FANTOM5 cap analysis of gene expression (CAGE) data, we integrate multiple transcript collections to generate a comprehensive atlas of 27,919 human lncRNA genes with high-confidence 5' ends and expression profiles across 1,829 samples from the major human primary cell types and tissues. Genomic and epigenomic classification of these lncRNAs reveals that most intergenic lncRNAs originate from enhancers rather than from promoters. Incorporating genetic and expression data, we show that lncRNAs overlapping trait-associated single nucleotide polymorphisms are specifically expressed in cell types relevant to the traits, implicating these lncRNAs in multiple diseases. We further demonstrate that lncRNAs overlapping expression quantitative trait loci (eQTL)-associated single nucleotide polymorphisms of messenger RNAs are co-expressed with the corresponding messenger RNAs, suggesting their potential roles in transcriptional regulation. Combining these findings with conservation data, we identify 19,175 potentially functional lncRNAs in the human genome.

Subjects

Genetic Databases, Long Noncoding RNA/chemistry, Long Noncoding RNA/genetics, Transcriptome/genetics, Cultured Cells, Conserved Sequence/genetics, Datasets as Topic, Genetic Enhancer Elements/genetics, Genetic Epigenesis, Gene Expression Profiling, Gene Expression Regulation, Human Genome/genetics, Genome-Wide Association Study, Genomics, Humans, Internet, Molecular Sequence Annotation, Organ Specificity/genetics, Single Nucleotide Polymorphism, Genetic Promoter Regions/genetics, Quantitative Trait Loci/genetics, RNA Stability, Messenger RNA/genetics

8.

EMO-MoviNet: Enhancing Action Recognition in Videos with EvoNorm, Mish Activation, and Optimal Frame Selection for Efficient Mobile Deployment.

Hussain, Tarique; Memon, Zulfiqar Ali; Qureshi, Rizwan; Alam, Tanvir.

Sensors (Basel) ; 23(19)2023 Sep 27.

Article in English | MEDLINE | ID: mdl-37836936

ABSTRACT

The primary goal of this study is to develop a deep neural network for action recognition that enhances accuracy and minimizes computational costs. In this regard, we propose a modified EMO-MoviNet-A2* architecture that integrates Evolving Normalization (EvoNorm), Mish activation, and optimal frame selection to improve the accuracy and efficiency of action recognition tasks in videos. The asterisk notation indicates that this model also incorporates the stream buffer concept. The Mobile Video Network (MoviNet) is a member of the memory-efficient architectures discovered through Neural Architecture Search (NAS), which balances accuracy and efficiency by integrating spatial, temporal, and spatio-temporal operations. Our research implements the MoviNet model, pre-trained on the Kinetics dataset, on the UCF101 and HMDB51 datasets. Upon implementation on the UCF101 dataset, a generalization gap was observed, with the model performing better on the training set than on the testing set. To address this issue, we replaced batch normalization with EvoNorm, which unifies normalization and activation functions. Another area that required improvement was key-frame selection. We also developed a novel technique called Optimal Frame Selection (OFS) to identify key-frames within videos more effectively than random or dense frame selection methods. Combining OFS with Mish nonlinearity resulted in a 0.8-1% improvement in accuracy in our UCF101 20-classes experiment. The EMO-MoviNet-A2* model consumes 86% fewer FLOPs and approximately 90% fewer parameters on the UCF101 dataset, with a trade-off of 1-2% accuracy. Additionally, it achieves 5-7% higher accuracy on the HMDB51 dataset while requiring seven times fewer FLOPs and ten times fewer parameters compared to the reference model, Motion-Augmented RGB Stream (MARS).
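Of the components listed in this abstract, the Mish activation has a simple closed form: mish(x) = x * tanh(softplus(x)), with softplus(x) = ln(1 + e^x). The scalar sketch below illustrates that formula only; it is not the authors' implementation, and EvoNorm and OFS are omitted because the abstract does not give enough detail to reproduce them.

```python
import math


def softplus(x: float) -> float:
    """ln(1 + e^x), written in a form that does not overflow for large |x|."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


for value in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(value, round(mish(value), 4))
```

Mish is smooth and lets small negative values pass through instead of clipping them to zero as ReLU does; for large positive inputs it approaches the identity.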

9.

The Role of Artificial Intelligence in Decoding Speech from EEG Signals: A Scoping Review.

Shah, Uzair; Alzubaidi, Mahmood; Mohsen, Farida; Abd-Alrazaq, Alaa; Alam, Tanvir; Househ, Mowafa.

Sensors (Basel) ; 22(18)2022 Sep 15.

Article in English | MEDLINE | ID: mdl-36146323

ABSTRACT

Background: Brain traumas, mental disorders, and vocal abuse can result in permanent or temporary speech impairment, significantly impairing one's quality of life and occasionally resulting in social isolation. Brain-computer interfaces (BCI) can help people who have issues with their speech or who have been paralyzed to communicate with their surroundings via brain signals. EEG signal-based BCI has therefore received significant attention in the last two decades for multiple reasons: (i) clinical research has yielded detailed knowledge of EEG signals, (ii) EEG devices are inexpensive, and (iii) it has applications in medical and social fields. Objective: This study explores the existing literature and summarizes EEG data acquisition, feature extraction, and artificial intelligence (AI) techniques for decoding speech from brain signals. Method: We followed the PRISMA-ScR guidelines to conduct this scoping review. We searched six electronic databases: PubMed, IEEE Xplore, the ACM Digital Library, Scopus, arXiv, and Google Scholar. We carefully selected search terms based on target intervention (i.e., imagined speech and AI) and target data (EEG signals), and some of the search terms were derived from previous reviews. The study selection process was carried out in three phases: study identification, study selection, and data extraction. Two reviewers independently carried out study selection and data extraction. A narrative approach was adopted to synthesize the extracted data. Results: A total of 263 studies were evaluated; however, only 34 met the eligibility criteria for inclusion in this review. We found 64-electrode EEG signal devices to be the most widely used in the included studies. The most common signal normalization and feature extraction methods in the included studies were bandpass filtering and wavelet-based feature extraction. We categorized the studies based on AI techniques, such as machine learning and deep learning. The most prominent ML algorithm was the support vector machine, and the most prominent DL algorithm was the convolutional neural network. Conclusions: EEG signal-based BCI is a viable technology that can enable people with severe or temporary voice impairment to communicate with the world directly from their brain. However, the development of BCI technology is still in its infancy.

Subjects

Brain-Computer Interfaces, Algorithms, Artificial Intelligence, Electroencephalography/methods, Humans, Quality of Life, Speech

10.

Cardiovascular Disease Diagnosis from DXA Scan and Retinal Images Using Deep Learning.

Al-Absi, Hamada R H; Islam, Mohammad Tariqul; Refaee, Mahmoud Ahmed; Chowdhury, Muhammad E H; Alam, Tanvir.

Sensors (Basel) ; 22(12)2022 Jun 07.

Article in English | MEDLINE | ID: mdl-35746092

ABSTRACT

Cardiovascular diseases (CVD) are the leading cause of death worldwide. People affected by CVDs may go undiagnosed until the occurrence of a serious heart failure event such as stroke, heart attack, and myocardial infarction. In Qatar, there is a lack of studies focusing on CVD diagnosis based on non-invasive methods such as retinal imaging or dual-energy X-ray absorptiometry (DXA). In this study, we aimed at diagnosing CVD using a novel approach integrating information from retinal images and DXA data. We considered an adult Qatari cohort of 500 participants from Qatar Biobank (QBB) with an equal number of participants from the CVD and the control groups. We designed a case-control study with a novel multi-modal approach (combining data from multiple modalities, namely DXA and retinal images) to propose a deep learning (DL)-based technique to distinguish the CVD group from the control group. Uni-modal models based on retinal images and DXA data achieved 75.6% and 77.4% accuracy, respectively. The multi-modal model showed an improved accuracy of 78.3% in classifying the CVD group and the control group. We used gradient-weighted class activation mapping (Grad-CAM) to highlight the areas of interest in the retinal images that most influenced the decisions of the proposed DL model. It was observed that the model focused mostly on the centre of the retinal images, where signs of CVD such as hemorrhages were present. This indicates that our model can identify and make use of certain prognosis markers for hypertension and ischemic heart disease. From DXA data, we found higher values for bone mineral density, fat content, muscle mass and bone area across the majority of the body parts in the CVD group compared to the control group, indicating better bone health in the Qatari CVD cohort. This seminal method based on DXA scans and retinal images demonstrates major potential for the early detection of CVD in a fast and relatively non-invasive manner.

Subjects

Cardiovascular Diseases, Deep Learning, Photon Absorptiometry/methods, Adult, Bone Density, Cardiovascular Diseases/diagnostic imaging, Case-Control Studies, Humans

11.

Thermal Change Index-Based Diabetic Foot Thermogram Image Classification Using Machine Learning Techniques.

Khandakar, Amith; Chowdhury, Muhammad E H; Reaz, Mamun Bin Ibne; Ali, Sawal Hamid Md; Abbas, Tariq O; Alam, Tanvir; Ayari, Mohamed Arselene; Mahbub, Zaid B; Habib, Rumana; Rahman, Tawsifur; Tahir, Anas M; Bakar, Ahmad Ashrif A; Malik, Rayaz A.

Sensors (Basel) ; 22(5)2022 Feb 24.

Article in English | MEDLINE | ID: mdl-35270938

ABSTRACT

Diabetes mellitus (DM) can lead to plantar ulcers, amputation and death. Plantar foot thermogram images acquired using an infrared camera have been shown to detect changes in temperature distribution associated with a higher risk of foot ulceration. Machine learning approaches applied to such infrared images may have utility in the early diagnosis of diabetic foot complications. In this work, a publicly available dataset was categorized into different classes, which were corroborated by domain experts, based on a temperature distribution parameter, the thermal change index (TCI). We then explored different machine-learning approaches for classifying thermograms of the TCI-labeled dataset. Classical machine learning algorithms with feature engineering and the convolutional neural network (CNN) with image enhancement techniques were extensively investigated to identify the best performing network for classifying thermograms. The multilayer perceptron (MLP) classifier along with the features extracted from thermogram images showed an accuracy of 90.1% in multi-class classification, which outperformed the literature-reported performance metrics on this dataset.

Subjects

Diabetes Mellitus, Diabetic Foot, Algorithms, Diabetic Foot/diagnostic imaging, Humans, Machine Learning, Computer Neural Networks, Thermography

12.

Proteome-level assessment of origin, prevalence and function of leucine-aspartic acid (LD) motifs.

Alam, Tanvir; Alazmi, Meshari; Naser, Rayan; Huser, Franceline; Momin, Afaque A; Astro, Veronica; Hong, SeungBeom; Walkiewicz, Katarzyna W; Canlas, Christian G; Huser, Raphaël; Ali, Amal J; Merzaban, Jasmeen; Adamo, Antonio; Jaremko, Mariusz; Jaremko, Lukasz; Bajic, Vladimir B; Gao, Xin; Arold, Stefan T.

Bioinformatics ; 36(4): 1121-1128, 2020 02 15.

Article in English | MEDLINE | ID: mdl-31584626

ABSTRACT

MOTIVATION: Leucine-aspartic acid (LD) motifs are short linear interaction motifs (SLiMs) that link paxillin family proteins to factors controlling cell adhesion, motility and survival. The existence and importance of LD motifs beyond the paxillin family is poorly understood. RESULTS: To enable a proteome-wide assessment of LD motifs, we developed an active learning based framework (LD motif finder; LDMF) that iteratively integrates computational predictions with experimental validation. Our analysis of the human proteome revealed a dozen new proteins containing LD motifs. We found that LD motif signalling evolved in unicellular eukaryotes more than 800 Myr ago, with paxillin and vinculin as core constituents, and nuclear export signal as a likely source of de novo LD motifs. We show that LD motif proteins form a functionally homogenous group, all being involved in cell morphogenesis and adhesion. This functional focus is recapitulated in cells by GFP-fused LD motifs, suggesting that it is intrinsic to the LD motif sequence, possibly through their effect on binding partners. Our approach elucidated the origin and dynamic adaptations of an ancestral SLiM, and can serve as a guide for the identification of other SLiMs for which only few representatives are known. AVAILABILITY AND IMPLEMENTATION: LDMF is freely available online at www.cbrc.kaust.edu.sa/ldmf; Source code is available at https://github.com/tanviralambd/LD/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Proteome, Amino Acid Motifs, Aspartic Acid, Humans, Leucine, Prevalence

13.

A Comprehensive Overview of the COVID-19 Literature: Machine Learning-Based Bibliometric Analysis.

Abd-Alrazaq, Alaa; Schneider, Jens; Mifsud, Borbala; Alam, Tanvir; Househ, Mowafa; Hamdi, Mounir; Shah, Zubair.

J Med Internet Res ; 23(3): e23703, 2021 03 08.

Article in English | MEDLINE | ID: mdl-33600346

ABSTRACT

BACKGROUND: Shortly after the emergence of COVID-19, researchers rapidly mobilized to study numerous aspects of the disease such as its evolution, clinical manifestations, effects, treatments, and vaccinations. This led to a rapid increase in the number of COVID-19-related publications. Identifying trends and areas of interest using traditional review methods (eg, scoping and systematic reviews) for such a large domain area is challenging. OBJECTIVE: We aimed to conduct an extensive bibliometric analysis to provide a comprehensive overview of the COVID-19 literature. METHODS: We used the COVID-19 Open Research Dataset (CORD-19) that consists of a large number of research articles related to all coronaviruses. We used a machine learning-based method to analyze the most relevant COVID-19-related articles and extracted the most prominent topics. Specifically, we used a clustering algorithm to group published articles based on the similarity of their abstracts to identify research hotspots and current research directions. We have made our software accessible to the community via GitHub. RESULTS: Of the 196,630 publications retrieved from the database, we included 28,904 in our analysis. The mean number of weekly publications was 990 (SD 789.3). The country that published the highest number of COVID-19-related articles was China (2950/17,270, 17.08%). The highest number of articles were published in bioRxiv. Lei Liu affiliated with the Southern University of Science and Technology in China published the highest number of articles (n=46). Based on titles and abstracts alone, we were able to identify 1515 surveys, 733 systematic reviews, 512 cohort studies, 480 meta-analyses, and 362 randomized control trials. We identified 19 different topics covered among the publications reviewed. 
The most dominant topic was public health response, followed by clinical care practices during the COVID-19 pandemic, clinical characteristics and risk factors, and epidemic models for its spread. CONCLUSIONS: We provide an overview of the COVID-19 literature and have identified current hotspots and research directions. Our findings can be useful for the research community to help prioritize research needs and recognize leading COVID-19 researchers, institutes, countries, and publishers. Our study shows that an AI-based bibliometric analysis has the potential to rapidly explore a large corpus of academic publications during a public health crisis. We believe that this work can be used to analyze other eHealth-related literature to help clinicians, administrators, and policy makers to obtain a holistic view of the literature and be able to categorize different topics of the existing research for further analyses. It can be further scaled (for instance, in time) to clinical summary documentation. Publishers should avoid noise in the data by developing a way to trace the evolution of individual publications and unique authors.
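The METHODS section describes grouping articles by the similarity of their abstracts but does not name the clustering algorithm or the text representation. As a rough illustration of the general idea only, the sketch below uses bag-of-words cosine similarity with a simple greedy threshold grouping; the technique, the threshold, the function names, and the sample texts are all assumptions and should not be read as the authors' method.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_by_similarity(texts: list[str], threshold: float = 0.5) -> list[list[int]]:
    """Greedily assign each text to the first group whose seed text is similar enough."""
    vectors = [bag_of_words(t) for t in texts]
    groups: list[list[int]] = []
    for index, vector in enumerate(vectors):
        for group in groups:
            # Compare against the first member (the seed) of each group.
            if cosine(vector, vectors[group[0]]) >= threshold:
                group.append(index)
                break
        else:
            groups.append([index])
    return groups


# Hypothetical mini-corpus, for illustration only.
sample = ["vaccine trial results", "vaccine trial outcomes", "epidemic spread model"]
print(group_by_similarity(sample))  # [[0, 1], [2]]
```

A production pipeline would more likely use TF-IDF or learned embeddings together with an established clustering algorithm; the greedy grouping here is kept only because it fits in a few lines.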

Subjects

Bibliometrics, COVID-19/epidemiology, Machine Learning, COVID-19/virology, Humans, Research Design, SARS-CoV-2/isolation & purification

14.

The Role of Machine Learning in Diagnosing Bipolar Disorder: Scoping Review.

Jan, Zainab; Ai-Ansari, Noor; Mousa, Osama; Abd-Alrazaq, Alaa; Ahmed, Arfan; Alam, Tanvir; Househ, Mowafa.

J Med Internet Res ; 23(11): e29749, 2021 11 19.

Article in English | MEDLINE | ID: mdl-34806996

ABSTRACT

BACKGROUND: Bipolar disorder (BD) is the 10th most common cause of frailty in young individuals and has triggered morbidity and mortality worldwide. Patients with BD have a life expectancy 9 to 17 years lower than that of normal people. BD is a predominant mental disorder, but it can be misdiagnosed as depressive disorder, which leads to difficulties in treating affected patients. Approximately 60% of patients with BD are treated for depression. However, machine learning provides advanced skills and techniques for better diagnosis of BD. OBJECTIVE: This review aims to explore the machine learning algorithms used for the detection and diagnosis of bipolar disorder and its subtypes. METHODS: The study protocol adopted the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) guidelines. We explored 3 databases, namely Google Scholar, ScienceDirect, and PubMed. To enhance the search, we performed backward screening of all the references of the included studies. Based on the predefined selection criteria, 2 levels of screening were performed: title and abstract review, and full review of the articles that met the inclusion criteria. Data extraction was performed independently by all investigators. To synthesize the extracted data, a narrative synthesis approach was followed. RESULTS: We retrieved 573 potential articles from the 3 databases. After preprocessing and screening, only 33 articles that met our inclusion criteria were identified. The most commonly used data belonged to the clinical category (19, 58%). We identified different machine learning models used in the selected studies, including classification models (18, 55%), regression models (5, 16%), model-based clustering methods (2, 6%), natural language processing (1, 3%), clustering algorithms (1, 3%), and deep learning-based models (3, 9%). Magnetic resonance imaging data were most commonly used for classifying bipolar patients compared to other groups (11, 34%), whereas microarray expression data sets and genomic data were the least commonly used. The maximum reported accuracy was 98%, whereas the minimum was 64%. CONCLUSIONS: This scoping review provides an overview of recent studies based on machine learning models used to diagnose patients with BD regardless of their demographics or if they were compared to patients with psychiatric diagnoses. Further research can be conducted to provide clinical decision support in the health industry.

Subjects

Bipolar Disorder , Algorithms , Bipolar Disorder/diagnosis , Data Management , Humans , Machine Learning , Natural Language Processing

15.

TcoF-DB v2: update of the database of human and mouse transcription co-factors and transcription factor interactions.

Schmeier, Sebastian; Alam, Tanvir; Essack, Magbubah; Bajic, Vladimir B.

Nucleic Acids Res ; 45(D1): D145-D150, 2017 Jan 04.

Article in English | MEDLINE | ID: mdl-27789689

ABSTRACT

Transcription factors (TFs) play a pivotal role in transcriptional regulation, making them crucial for cell survival and important biological functions. For the regulation of transcription, interactions of different regulatory proteins known as transcription co-factors (TcoFs) and TFs are essential in forming necessary protein complexes. Although TcoFs themselves do not bind DNA directly, their indirect influence on transcriptional regulation and initiation has been shown to be significant, with the functionality of TFs strongly influenced by the presence of TcoFs. In the TcoF-DB v2 database, we collect information on TcoFs. In this article, we describe updates and improvements implemented in TcoF-DB v2. TcoF-DB v2 provides several new features that enable exploration of the roles of TcoFs. The content of the database has significantly expanded and is enriched with information from Gene Ontology, biological pathways, diseases and molecular signatures. TcoF-DB v2 now includes many more TFs, has substantially increased the number of human TcoFs to 958, and now includes information on mouse (418 new TcoFs). TcoF-DB v2 enables the exploration of information on TcoFs and allows investigations into their influence on transcriptional regulation in humans and mice. TcoF-DB v2 can be accessed at http://tcofdb.org/.

Subjects

Carrier Proteins , Databases, Genetic , Gene Expression Regulation , Transcription Factors , Animals , Carrier Proteins/metabolism , Humans , Mice , Protein Binding , Transcription Factors/metabolism

16.

FARNA: knowledgebase of inferred functions of non-coding RNA transcripts.

Alam, Tanvir; Uludag, Mahmut; Essack, Magbubah; Salhi, Adil; Ashoor, Haitham; Hanks, John B; Kapfer, Craig; Mineta, Katsuhiko; Gojobori, Takashi; Bajic, Vladimir B.

Nucleic Acids Res ; 45(5): 2838-2848, 2017 Mar 17.

Article in English | MEDLINE | ID: mdl-27924038

ABSTRACT

Non-coding RNA (ncRNA) genes play a major role in the control of heterogeneous cellular behavior. Yet, their functions are largely uncharacterized. Currently available databases lack in-depth information on ncRNA functions across the spectrum of various cells/tissues. Here, we present FARNA, a knowledgebase of inferred functions of 10,289 human ncRNA transcripts (2,734 microRNA and 7,555 long ncRNA) in 119 human tissues and 177 human primary cells. Since transcription factors (TFs) and TF co-factors (TcoFs) are crucial components of the regulatory machinery for activation of gene transcription, the cellular processes and diseases in which TFs and TcoFs are involved suggest functions of the transcripts they regulate. In FARNA, functions of a transcript are inferred from TFs and TcoFs whose genes co-express with the transcript controlled by these TFs and TcoFs in a considered cell/tissue. Transcripts were annotated using statistically enriched GO terms, pathways and diseases across cells/tissues based on the guilt-by-association principle. Expression profiles across cells/tissues based on Cap Analysis of Gene Expression (CAGE) are provided. FARNA, having the most comprehensive function annotation of the considered ncRNAs across the widest spectrum of human cells/tissues, has the potential to greatly contribute to our understanding of ncRNA roles and their regulatory mechanisms in humans. FARNA can be accessed at: http://cbrc.kaust.edu.sa/farna.

Subjects

Databases, Nucleic Acid , Knowledge Bases , MicroRNAs/physiology , RNA, Long Noncoding/physiology , Humans , MicroRNAs/metabolism , RNA, Long Noncoding/metabolism , Transcription Factors/metabolism

17.

Correction: MSLP: mRNA subcellular localization predictor based on machine learning techniques.

Musleh, Saleh; Islam, Mohammad Tariqul; Qureshi, Rizwan; Alajez, Nehad M; Alam, Tanvir.

BMC Bioinformatics ; 24(1): 156, 2023 Apr 18.

Article in English | MEDLINE | ID: mdl-37072697

18.

DES-ncRNA: A knowledgebase for exploring information about human micro and long noncoding RNAs based on literature-mining.

Salhi, Adil; Essack, Magbubah; Alam, Tanvir; Bajic, Vladan P; Ma, Lina; Radovanovic, Aleksandar; Marchand, Benoit; Schmeier, Sebastian; Zhang, Zhang; Bajic, Vladimir B.

RNA Biol ; 14(7): 963-971, 2017 Jul 03.

Article in English | MEDLINE | ID: mdl-28387604

ABSTRACT

Noncoding RNAs (ncRNAs), particularly microRNAs (miRNAs) and long ncRNAs (lncRNAs), are important players in diseases and are emerging as novel drug targets. Thus, unraveling the relationships between ncRNAs and other biomedical entities in cells is critical for better understanding ncRNA roles, which may eventually help develop their use in medicine. To support ncRNA research and facilitate retrieval of relevant information regarding miRNAs and lncRNAs from the plethora of published ncRNA-related research, we developed DES-ncRNA ( www.cbrc.kaust.edu.sa/des_ncrna ). DES-ncRNA is a knowledgebase containing text- and data-mined information from public scientific literature and other public resources. Exploration of mined information is enabled through terms and pairs of terms from 19 topic-specific dictionaries including, for example, antibiotics, toxins, drugs, enzymes, mutations, pathways, human genes and proteins, drug indications and side effects, and diseases. DES-ncRNA contains approximately 878,000 associations of terms from these dictionaries, of which 36,222 (5,373) are with regard to miRNAs (lncRNAs). We provide users with several ways to explore information regarding ncRNAs, including controlled generation of association networks as well as hypothesis generation. We show an example of how DES-ncRNA can aid research on Alzheimer disease and suggest a potential therapeutic role for Fasudil. DES-ncRNA is a powerful tool that can be used on its own or as a complement to the existing resources to support research in human ncRNA. To our knowledge, this is the only knowledgebase dedicated to human miRNAs and lncRNAs derived primarily through literature-mining, enabling exploration of a broad spectrum of associated biomedical entities not paralleled by any other resource.

Subjects

Data Mining , Knowledge Bases , MicroRNAs/genetics , RNA, Long Noncoding/genetics , Software , 1-(5-Isoquinolinesulfonyl)-2-Methylpiperazine/analogs & derivatives , 1-(5-Isoquinolinesulfonyl)-2-Methylpiperazine/therapeutic use , Alzheimer Disease/drug therapy , Alzheimer Disease/genetics , Dictionaries as Topic , Disease Progression , Gene Ontology , Humans , MicroRNAs/metabolism , RNA, Long Noncoding/metabolism

19.

Redefining the transcriptional regulatory dynamics of classically and alternatively activated macrophages by deepCAGE transcriptomics.

Roy, Sugata; Schmeier, Sebastian; Arner, Erik; Alam, Tanvir; Parihar, Suraj P; Ozturk, Mumin; Tamgue, Ousman; Kawaji, Hideya; de Hoon, Michiel J L; Itoh, Masayoshi; Lassmann, Timo; Carninci, Piero; Hayashizaki, Yoshihide; Forrest, Alistair R R; Bajic, Vladimir B; Guler, Reto; Brombacher, Frank; Suzuki, Harukazu.

Nucleic Acids Res ; 43(14): 6969-82, 2015 Aug 18.

Article in English | MEDLINE | ID: mdl-26117544

ABSTRACT

Classically or alternatively activated macrophages (M1 and M2, respectively) play distinct and important roles in microbicidal activity, regulation of inflammation and tissue homeostasis. Despite this, their transcriptional regulatory dynamics are poorly understood. Using promoter-level expression profiling by non-biased deepCAGE, we have studied the transcriptional dynamics of classically and alternatively activated macrophages. Transcription factor (TF) binding motif activity analysis revealed four motifs, NFKB1_REL_RELA, IRF1,2, IRF7 and TBP, that are commonly activated but have distinct activity dynamics in M1 and M2 activation. We observe matching changes in the expression profiles of the corresponding TFs and show that only a restricted set of TFs change expression. There is an overall drastic and transient up-regulation in M1 and a weaker and more sustained up-regulation in M2. Novel TFs, such as Thap6 and Maff (M1) and Hivep1, Nfil3 and Prdm1 (M2), among others, were suggested to be involved in the activation processes. Additionally, 52 (M1) and 67 (M2) novel differentially expressed genes and, for the first time, several differentially expressed long non-coding RNA (lncRNA) transcriptome markers were identified. In conclusion, the finding of novel motifs, TFs, and protein-coding and lncRNA genes is an important step toward fully understanding the transcriptional machinery of macrophage activation.

Subjects

Gene Expression Regulation , Macrophage Activation/genetics , Macrophages/metabolism , Transcriptome , Animals , Cells, Cultured , DNA/chemistry , Gene Expression Profiling , High-Throughput Nucleotide Sequencing , Interferon-gamma/pharmacology , Interleukin-13/pharmacology , Interleukin-4/pharmacology , Macrophages/drug effects , Male , Mice, Inbred BALB C , Nucleotide Motifs , Promoter Regions, Genetic , Sequence Analysis, DNA , Transcription Factors/metabolism

20.

How to find a leucine in a haystack? Structure, ligand recognition and regulation of leucine-aspartic acid (LD) motifs.

Alam, Tanvir; Alazmi, Meshari; Gao, Xin; Arold, Stefan T.

Biochem J ; 460(3): 317-29, 2014 Jun 15.

Article in English | MEDLINE | ID: mdl-24870021

ABSTRACT

LD motifs (leucine-aspartic acid motifs) are short helical protein-protein interaction motifs that have emerged as key players in connecting cell adhesion with cell motility and survival. LD motifs are required for embryogenesis, wound healing and the evolution of multicellularity. LD motifs also play roles in disease, such as in cancer metastasis or viral infection. First described in the paxillin family of scaffolding proteins, LD motifs and similar acidic LXXLL interaction motifs have been discovered in several other proteins, whereas 16 proteins have been reported to contain LDBDs (LD motif-binding domains). Collectively, structural and functional analyses have revealed a surprising multivalency in LD motif interactions and a wide diversity in LDBD architectures. In the present review, we summarize the molecular basis for function, regulation and selectivity of LD motif interactions that has emerged from more than a decade of research. This overview highlights the intricate multi-level regulation and the inherently noisy and heterogeneous nature of signalling through short protein-protein interaction motifs.

Subjects

Amino Acid Motifs/physiology , Aspartic Acid/metabolism , Leucine/metabolism , Adaptor Proteins, Signal Transducing/physiology , Apoptosis Regulatory Proteins/physiology , Cell Cycle Proteins/physiology , Focal Adhesion Kinase 2/chemistry , Humans , Ligands , Membrane Proteins/physiology , Microfilament Proteins/metabolism , Paxillin/chemistry , Poly(A)-Binding Protein I/metabolism , Protein Structure, Tertiary , Proto-Oncogene Proteins/physiology , Proto-Oncogene Proteins c-bcl-2/physiology , Vinculin/physiology
