2.
Med Biol Eng Comput ; 2024 May 03.
Article in English | MEDLINE | ID: mdl-38700613

ABSTRACT

Neurodegenerative diseases often exhibit a strong link with sleep disruption, highlighting the importance of effective sleep stage monitoring. In this light, automatic sleep stage classification (ASSC) plays a pivotal role, now more streamlined than ever due to the advancements in deep learning (DL). However, the opaque nature of DL models can be a barrier in their clinical adoption, due to trust concerns among medical practitioners. To bridge this gap, we introduce SleepBoost, a transparent multi-level tree-based ensemble model specifically designed for ASSC. Our approach includes a crafted feature engineering block (FEB) that extracts 41 time and frequency domain features, out of which 23 are selected based on their high mutual information score (> 0.23). Uniquely, SleepBoost integrates three fundamental linear models into a cohesive multi-level tree structure, further enhanced by a novel reward-based adaptive weight allocation mechanism. Tested on the Sleep-EDF-20 dataset, SleepBoost demonstrates superior performance with an accuracy of 86.3%, F1-score of 80.9%, and Cohen kappa score of 0.807, outperforming leading DL models in ASSC. An ablation study underscores the critical role of our selective feature extraction in enhancing model accuracy and interpretability, crucial for clinical settings. This innovative approach not only offers a more transparent alternative to traditional DL models but also extends potential implications for monitoring and understanding sleep patterns in the context of neurodegenerative disorders. The open-source availability of SleepBoost's implementation at https://github.com/akibzaman/SleepBoost can further facilitate its accessibility and potential for widespread clinical adoption.
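The feature-selection step described above (keeping the features whose mutual information with the sleep-stage label exceeds 0.23) can be illustrated with a minimal sketch. It assumes the features have already been discretized into bins and that mutual information is measured in nats; the abstract does not state the estimator or the units, and the function names and toy data below are hypothetical rather than taken from the SleepBoost repository.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def select_features(columns, labels, threshold=0.23):
    """Return the names of features whose MI with the labels exceeds threshold."""
    return [name for name, values in columns.items()
            if mutual_information(values, labels) > threshold]


# Hypothetical toy data: six epochs, three sleep stages.
labels = ["W", "W", "N1", "N1", "N2", "N2"]
columns = {
    "informative": [0, 0, 1, 1, 2, 2],  # fully determines the label
    "noise":       [0, 1, 0, 1, 0, 1],  # independent of the label
}
selected = select_features(columns, labels)
```

On real EEG features, which are continuous, a library estimator would normally replace the counting-based estimate used here.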

3.
Data Brief ; 52: 109938, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38173982

ABSTRACT

Alongside traditional publishing channels, news agencies now share news over the internet, since many people prefer reading news online. News media also maintain YouTube channels to publish video stories, and viewers share their opinions in comments below each news item. These news items and comments are a rich source of information for research, yet little such research exists in the Bengali news context. This article presents a dataset containing 762,678 public comments and replies from 16,016 news videos published from 2017 to 2023 on a renowned Bengali news YouTube channel. The data contain 15 properties: video URL, title, likes, views, date of publishing, hashtags, description, comment author, comment time, comment, likes on the comment, reply author, reply time, reply, and likes on the reply. To ensure privacy, commenters' names are encoded in the dataset. The dataset is open to researchers at https://data.mendeley.com/datasets/3c3j3bkxvn/4. A translated file for the raw dataset is also included. These data may help scholars identify patterns in public opinion and analyze how public opinion changes over time.

4.
PeerJ ; 12: e16762, 2024.
Article in English | MEDLINE | ID: mdl-38274328

ABSTRACT

Background: Global prevalence of neurodegenerative diseases such as Alzheimer's disease and Parkinson's disease is increasing gradually, whereas approvals of successful therapeutics for central nervous system disorders are inadequate. Accumulating evidence suggests pivotal roles of the receptor-interacting serine/threonine-protein kinase 1 (RIPK1) in modulating neuroinflammation and necroptosis. Discoveries of potent small molecule inhibitors for RIPK1 with favorable pharmacokinetic properties could thus address the unmet medical needs in treating neurodegeneration. Methods: In a structure-based virtual screening, we performed site-specific molecular docking of 4,858 flavonoids against the kinase domain of RIPK1 using AutoDock Vina. We predicted physicochemical descriptors of the top ligands using the SwissADME webserver. Binding interactions of the best ligands and the reference ligand L8D were validated using replicated 500-ns Gromacs molecular dynamics simulations and free energy calculations. Results: From Vina docking, we shortlisted the top 20 flavonoids with the highest binding affinities, ranging from -11.7 to -10.6 kcal/mol. Pharmacokinetic profiling narrowed down the list to three orally bioavailable and blood-brain-barrier penetrant flavonoids: Nitiducarpin, Pinocembrin 7-O-benzoate, and Paratocarpin J. Next, trajectories of molecular dynamics simulations of the top protein-ligand complexes were analyzed for binding interactions. The root-mean-square deviation (RMSD) was 1.191 Å (±0.498 Å), 1.725 Å (±0.828 Å), 1.923 Å (±0.942 Å), and 0.972 Å (±0.155 Å) for Nitiducarpin, Pinocembrin 7-O-benzoate, Paratocarpin J, and L8D, respectively. The radius of gyration (Rg) was 2.034 nm (±0.015 nm), 2.039 nm (±0.025 nm), 2.053 nm (±0.021 nm), and 2.037 nm (±0.016 nm) for Nitiducarpin, Pinocembrin 7-O-benzoate, Paratocarpin J, and L8D, respectively. The solvent accessible surface area (SASA) was 159.477 nm² (±3.021 nm²), 159.661 nm² (±3.707 nm²), 160.755 nm² (±4.252 nm²), and 156.630 nm² (±3.521 nm²) for the Nitiducarpin, Pinocembrin 7-O-benzoate, Paratocarpin J, and L8D complexes, respectively. Therefore, lower RMSD, Rg, and SASA values demonstrated that Nitiducarpin formed the most stable complex with the target protein among the best three ligands. Finally, 2D protein-ligand interaction analysis revealed persistent hydrophobic interactions of Nitiducarpin with the critical residues of RIPK1, including the catalytic triads and the activation loop residues, implicated in the kinase activity and ligand binding. Conclusion: Our target-based virtual screening identified three flavonoids as strong RIPK1 inhibitors, with Nitiducarpin exhibiting the most potent inhibitory potential. Future in vitro and in vivo studies with these ligands could offer new hope for developing effective therapeutics and improving the quality of life for individuals affected by neurodegeneration.


Subjects
Flavonoids; Quality of Life; Humans; Molecular Docking Simulation; Flavonoids/pharmacology; Ligands; Benzoates; Receptor-Interacting Protein Serine-Threonine Kinases
5.
Sci Data ; 10(1): 521, 2023 08 05.
Article in English | MEDLINE | ID: mdl-37543626

ABSTRACT

Digital radiography is one of the most common and cost-effective standards for the diagnosis of bone fractures. For such diagnoses expert intervention is required which is time-consuming and demands rigorous training. With the recent growth of computer vision algorithms, there is a surge of interest in computer-aided diagnosis. The development of algorithms demands large datasets with proper annotations. Existing X-Ray datasets are either small or lack proper annotation, which hinders the development of machine-learning algorithms and evaluation of the relative performance of algorithms for classification, localization, and segmentation. We present FracAtlas, a new dataset of X-Ray scans curated from the images collected from 3 major hospitals in Bangladesh. Our dataset includes 4,083 images that have been manually annotated for bone fracture classification, localization, and segmentation with the help of 2 expert radiologists and an orthopedist using the open-source labeling platform, makesense.ai. There are 717 images with 922 instances of fractures. Each of the fracture instances has its own mask and bounding box, whereas the scans also have global labels for classification tasks. We believe the dataset will be a valuable resource for researchers interested in developing and evaluating machine learning algorithms for bone fracture diagnosis.


Subjects
Algorithms; Fractures, Bone; Humans; Diagnosis, Computer-Assisted/methods; Fractures, Bone/diagnostic imaging; Machine Learning; Radiography
6.
Heliyon ; 9(4): e15163, 2023 Apr.
Article in English | MEDLINE | ID: mdl-37095970

ABSTRACT

Early purchase prediction plays a vital role for an e-commerce website. It enables online sellers to shortlist consumers for product suggestions, discount offers, and many other interventions. Several studies have used session logs to analyze customer behavior and determine whether a customer will purchase a product. In most cases, it is difficult to identify such customers and offer them a discount once their session has ended. In this paper, we propose a purchase intention prediction model with which sellers can detect a customer's intent earlier. First, we apply a feature selection technique to select the best features. The selected features are then used to train supervised learning models. Several classifiers, including support vector machine (SVM), random forest (RF), multilayer perceptron (MLP), decision tree (DT), and XGBoost, are applied along with an oversampling method for balancing the dataset. The experiments were performed on a standard benchmark dataset. Experimental results show that the XGBoost classifier with feature selection and oversampling achieves significantly higher area under the ROC curve (auROC) and area under the precision-recall curve (auPR) scores, 0.937 and 0.754 respectively. The accuracies achieved by XGBoost and the decision tree are also significantly improved, at 90.65% and 90.54% respectively. The overall performance of the gradient boosting method is significantly improved compared with other classifiers and state-of-the-art methods. In addition, a method for explainable analysis of the problem is outlined.
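The auROC score reported above can be read as the probability that a randomly chosen purchasing session receives a higher score than a randomly chosen non-purchasing one. The sketch below computes it that way on hypothetical labels and scores. It is an illustration of the metric only, not the authors' evaluation code, and the pairwise loop is quadratic, so it suits small examples.

```python
def auroc(labels, scores):
    """Area under the ROC curve as a pairwise ranking probability.

    labels: 1 for the positive class (purchase), 0 for the negative class.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for five sessions.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.1]
value = auroc(labels, scores)  # 5 of the 6 positive/negative pairs are ordered correctly
```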

7.
Gene ; 853: 147045, 2023 Feb 15.
Article in English | MEDLINE | ID: mdl-36503892

ABSTRACT

DNA-binding proteins play a vital role in biological activities including DNA replication, DNA packing, and DNA repair. DNA-binding proteins can be classified into single-stranded DNA-binding proteins (SSBs) or double-stranded DNA-binding proteins (DSBs). Determining whether a protein is a DSB or an SSB helps determine the protein's function. Therefore, many studies have been conducted in recent years to accurately identify DSBs and SSBs. Despite the efforts made so far, DSB and SSB prediction performance remains limited. In this study, we propose a new method called CNN-Pred to accurately predict DSBs and SSBs. To build CNN-Pred, we first extract evolutionary-based features in the form of mono-gram and bi-gram profiles using the position-specific scoring matrix (PSSM). We then apply a 1D convolutional neural network (CNN) as the classifier to the extracted features. Our results demonstrate that CNN-Pred enhances DSB and SSB prediction accuracies by more than 4% on the independent test set compared to previous studies in the literature. CNN-Pred as a standalone tool and all its source code are publicly available at: https://github.com/MLBC-lab/CNN-Pred.
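The mono-gram and bi-gram profiles mentioned above are commonly defined on an L x 20 PSSM as the column-wise mean and as the sum over consecutive positions of P[i][m] * P[i+1][n]. The sketch below follows that common definition. The normalization used by CNN-Pred is not given in the abstract, so the details are assumptions, and the toy matrix uses three columns instead of twenty for readability.

```python
def monogram(pssm):
    """Column-wise mean of an L x W PSSM, giving W features."""
    length = len(pssm)
    width = len(pssm[0])
    return [sum(row[j] for row in pssm) / length for j in range(width)]


def bigram(pssm):
    """B[m][n] = sum over i of P[i][m] * P[i+1][n], giving W x W features."""
    width = len(pssm[0])
    out = [[0.0] * width for _ in range(width)]
    for cur, nxt in zip(pssm, pssm[1:]):  # consecutive sequence positions
        for m in range(width):
            for n in range(width):
                out[m][n] += cur[m] * nxt[n]
    return out


# Hypothetical 3-residue protein with a 3-column "PSSM".
toy_pssm = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
mono = monogram(toy_pssm)
bi = bigram(toy_pssm)
```

With a real 20-column PSSM this yields 20 mono-gram and 400 bi-gram values per protein, independent of sequence length, which is what makes the profiles usable as fixed-size classifier input.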


Subjects
DNA; Neural Networks, Computer; DNA/metabolism; DNA-Binding Proteins/genetics; DNA-Binding Proteins/metabolism; DNA Replication; Software
8.
Gene ; 851: 146993, 2023 Jan 30.
Article in English | MEDLINE | ID: mdl-36272653

ABSTRACT

Post-translational modification (PTM) is a biological process involving a protein's enzymatic changes after its translation by the ribosome. Phosphorylation is one of the most critical PTMs and occurs when a phosphate group interacts with an amino acid residue along the protein sequence. It contributes to cell communication, DNA repair, and gene regulation. Predicting microbial phosphorylation sites can provide a better understanding of host-pathogen interaction and the development of anti-microbial agents. Experimental methods such as mass spectrometry are time-consuming, laborious, and expensive. This paper proposes a new approach, called RotPhoPred, for predicting phospho-serine (pS), phospho-threonine (pT), and phospho-tyrosine (pY) sites in microbial organisms by integrating the evolutionary bigram profile with structural information and using Rotation Forest as the classification technique. To the best of our knowledge, our extracted features and employed classifier have never been utilized for this task. Comparative results demonstrate that RotPhoPred surpasses its peers in terms of different metrics such as sensitivity (90.0%, 75.4%, and 78.2%), specificity (92.1%, 97.2%, and 94.7%), accuracy (91.0%, 86.3%, and 86.4%), and MCC (0.82, 0.74, and 0.74) for pS, pT, and pY site predictions, respectively. RotPhoPred as a standalone predictor and all its source code are publicly available at: https://github.com/faisalahm3d/RotPredPho.
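The four metrics quoted above are all functions of the binary confusion matrix. The sketch below shows the standard formulas. The counts are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, accuracy, and MCC from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally reported as 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts: 100 true sites and 100 non-sites.
m = binary_metrics(tp=90, fp=8, tn=92, fn=10)
```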


Subjects
Computational Biology; Protein Processing, Post-Translational; Computational Biology/methods; Phosphorylation; Amino Acid Sequence; Software; Threonine/metabolism; Serine/metabolism
9.
Sci Rep ; 12(1): 11451, 2022 07 06.
Article in English | MEDLINE | ID: mdl-35794165

ABSTRACT

AMPylation is an emerging post-translational modification that occurs on the hydroxyl group of threonine, serine, or tyrosine via a phosphodiester bond. AMPylators catalyze this process as the covalent attachment of adenosine monophosphate to the amino acid side chain of a peptide. Recent studies have shown that this post-translational modification is directly responsible for the regulation of neurodevelopment and neurodegeneration and is also involved in many physiological processes. Despite the importance of this post-translational modification, there is no peptide sequence dataset available for conducting computational analysis. Consequently, no computational approach has so far been proposed for predicting AMPylation. In this study, we introduce a new dataset of this distinct post-translational modification and develop a new machine learning tool, called DeepAmp, that uses a deep convolutional neural network to predict AMPylation sites in proteins. DeepAmp achieves 77.7%, 79.1%, 76.8%, 0.55, and 0.85 in terms of Accuracy, Sensitivity, Specificity, Matthews Correlation Coefficient, and Area Under Curve, respectively, for the AMPylation site prediction task. As the first machine learning model for this task, DeepAmp demonstrates promising results that highlight its potential to solve this problem. Our dataset and DeepAmp as a standalone predictor are publicly available at https://github.com/MehediAzim/DeepAmp.


Subjects
Machine Learning; Neural Networks, Computer; Amino Acid Sequence; Amino Acids; Protein Processing, Post-Translational
10.
Methods Mol Biol ; 2499: 125-134, 2022.
Article in English | MEDLINE | ID: mdl-35696077

ABSTRACT

Posttranslational modification (PTM) is an important biological mechanism that promotes functional diversity among proteins. So far, a wide range of PTMs has been identified. Among them, glycation is considered one of the most important. Glycation is associated with different neurological disorders, including Parkinson's and Alzheimer's disease. It has also been shown to be responsible for other diseases, including vascular complications of diabetes mellitus. Despite the efforts made so far, the performance of computational methods for predicting glycation sites remains limited. Here we present a newly developed machine learning tool called iProtGly-SS that utilizes sequential and structural information together with a Support Vector Machine (SVM) classifier to enhance lysine glycation site prediction accuracy. The performance of iProtGly-SS was investigated using the three most popular benchmarks for this task. Our results demonstrate that iProtGly-SS achieves 81.61%, 93.62%, and 92.95% prediction accuracies on these benchmarks, which are significantly better than the results reported in previous studies. iProtGly-SS is implemented as a web-based tool that is publicly available at http://brl.uiu.ac.bd/iprotgly-ss/.


Subjects
Computational Biology; Proteins; Computational Biology/methods; Glycosylation; Lysine/metabolism; Protein Processing, Post-Translational; Proteins/chemistry; Support Vector Machine
11.
Bioinformatics ; 38(15): 3717-3724, 2022 08 02.
Article in English | MEDLINE | ID: mdl-35731219

ABSTRACT

MOTIVATION: Advances in sequencing technologies have led to the sequencing of genomes of a multitude of organisms. However, draft genomes of many of these organisms contain a large number of gaps due to repeats in genomes, low sequencing coverage, and limitations in sequencing technologies. Although there exist several tools for filling gaps, many of these do not utilize all information relevant to gap filling. RESULTS: Here, we present a probabilistic method for filling gaps in draft genome assemblies using second-generation reads, based on a generative model for sequencing that takes into account information on insert sizes and sequencing errors. Our method is based on the expectation-maximization algorithm, unlike the graph-based methods adopted in the literature. Experiments on real biological datasets show that this novel approach can fill large portions of gaps with a small number of errors and misassemblies compared to other state-of-the-art gap-filling tools. AVAILABILITY AND IMPLEMENTATION: The method is implemented in C++ in a software tool named 'Filling Gaps by Iterative Read Distribution (Figbird)', which is available at https://github.com/SumitTarafder/Figbird. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
High-Throughput Nucleotide Sequencing; Software; Sequence Analysis, DNA/methods; High-Throughput Nucleotide Sequencing/methods; Algorithms; Genome
12.
Behav Sci (Basel) ; 12(4)2022 Mar 23.
Article in English | MEDLINE | ID: mdl-35447659

ABSTRACT

Social media have become an indispensable part of people's daily lives. Research suggests that interactions on social media partly exhibit individuals' personality, sentiment, and behavior. In this study, we examine the association between students' mental health and psychological attributes derived from social media interactions and academic performance. We build a classification model in which students' psychological attributes and mental health issues are predicted from their social media interactions. Students' academic performance is then predicted from the psychological attributes and mental health issues predicted at the previous level. Firstly, we select samples using a judgmental sampling technique and collect the textual content from students' Facebook news feeds. Then, we derive feature vectors using MPNet (Masked and Permuted Pre-training for Language Understanding), which is one of the latest pre-trained sentence transformer models. Secondly, we find two different levels of correlations: (i) between users' social media usage and their psychological attributes and mental health status and (ii) between users' psychological attributes and mental health status and their academic performance. Thirdly, we build a two-level hybrid model to predict academic performance (i.e., Grade Point Average (GPA)) from students' Facebook posts: (1) from Facebook posts to mental health and psychological attributes using a regression model (SM-MP model) and (2) from psychological and mental attributes to academic performance using a classifier model (MP-AP model). Later, we conduct an evaluation study using real-life samples to validate the performance of the model and compare it with baseline models (i.e., Linguistic Inquiry and Word Count (LIWC) and Empath). Our model shows strong performance, with a micro-average F-score of 0.94 and an AUC-ROC score of 0.95. Finally, we build an ensemble model by combining both the psychological attributes and the mental health models and find that our combined model outperforms the independent models.

13.
Data Brief ; 42: 108091, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35392615

ABSTRACT

A speech emotion recognition system determines a speaker's emotional state by analyzing his or her speech audio signal. It is an essential and, at the same time, challenging task in human-computer interaction systems and is one of the most demanding areas of research using artificial intelligence and deep machine learning architectures. Despite being the world's seventh most widely spoken language, Bangla is still classified as a low-resource language for speech emotion recognition tasks because of the inadequate availability of data. There is an apparent lack of speech emotion recognition datasets for this type of research in the Bangla language. This article presents BanglaSER, a Bangla language-based speech emotion recognition dataset, to address this problem. It consists of speech-audio data from 34 participating speakers from diverse age groups between 19 and 47 years, with a balanced 17 male and 17 female nonprofessional participating actors. The dataset contains 1467 Bangla speech-audio recordings of five rudimentary human emotional states, namely angry, happy, neutral, sad, and surprise. Three trials are conducted for each emotional state. Hence, the total number of recordings involves 3 statements × 3 repetitions × 4 emotional states (angry, happy, sad, and surprise) × 34 participating speakers = 1224 recordings, plus 3 statements × 3 repetitions × 1 emotional state (neutral) × 27 participating speakers = 243 recordings, for a total of 1467 recordings. The BanglaSER dataset was created by recording speech audio through smartphones and laptops, has a balanced number of recordings in each category with evenly distributed male and female participating actors, and would serve as an essential training dataset for Bangla speech emotion recognition models in terms of generalization. BanglaSER is compatible with various deep learning architectures such as convolutional neural networks, long short-term memory, gated recurrent units, Transformers, etc. The dataset is available at https://data.mendeley.com/datasets/t9h6p943xy/5 and can be used for research purposes.
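The recording count given in the abstract can be checked with a few lines of arithmetic. The variable names below are illustrative; the factors are the ones stated in the text.

```python
statements = 3
repetitions = 3

# Angry, happy, sad, and surprise: recorded by all 34 speakers.
non_neutral = statements * repetitions * 4 * 34

# Neutral: recorded by 27 speakers.
neutral = statements * repetitions * 1 * 27

total = non_neutral + neutral
per_non_neutral_emotion = non_neutral // 4

print(non_neutral, neutral, total)  # 1224 243 1467
```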

14.
Genomics ; 114(2): 110264, 2022 03.
Article in English | MEDLINE | ID: mdl-34998929

ABSTRACT

Cancer is one of the major causes of human death each year. In recent years, cancer identification and classification using machine learning have gained momentum due to the availability of high-throughput sequencing data. Cancer research using RNA-seq is growing rapidly, and new insights into cancer and related treatments are coming to light. In this paper, we propose PanClassif, a method that requires only a few effective genes to detect cancer from RNA-seq data and provides a performance gain across a wide range of machine learning classifiers. We have taken 22 types of cancer samples from The Cancer Genome Atlas (TCGA), comprising 8287 cancer samples and 680 normal samples. Firstly, PanClassif uses k-Nearest Neighbour (k-NN) smoothing to smooth the samples and handle noise in the data. Then effective genes are selected by an ANOVA-based test. For balancing the training data, PanClassif applies an oversampling method, SMOTE. We have performed comprehensive experiments on the datasets using several classification algorithms. Experimental results show that PanClassif outperforms existing state-of-the-art methods and shows consistent performance on two single-cell RNA-seq datasets taken from Gene Expression Omnibus (GEO). PanClassif improves the performance of a wide variety of classifiers for both binary cancer prediction and multi-class cancer classification. PanClassif is available as a Python package (https://pypi.org/project/panclassif/). All the source code and materials of PanClassif are available at https://github.com/Zwei-inc/panclassif.
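The oversampling step named above, SMOTE, creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below illustrates that core idea only. The function name, the choice of k, and the toy points are hypothetical, and PanClassif's actual settings are not given in the abstract; in practice a library implementation such as the one in imbalanced-learn would be used.

```python
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by SMOTE-style interpolation.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        # Sort the remaining minority samples by squared Euclidean distance.
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        u = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + u * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-class samples (e.g. normal tissue) in two dimensions.
minority = [[1.0, 1.0], [2.0, 1.5], [1.5, 2.0], [3.0, 3.0]]
new_points = smote_like(minority, n_new=5)
```

Because every synthetic point is a convex combination of two existing minority samples, the new points stay inside the region already covered by the minority class.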


Subjects
Machine Learning; Neoplasms; Algorithms; Gene Expression; Gene Expression Profiling; Humans; Neoplasms/diagnosis; Neoplasms/genetics; RNA-Seq; Sequence Analysis, RNA/methods; Software
15.
Sci Rep ; 11(1): 23676, 2021 12 08.
Article in English | MEDLINE | ID: mdl-34880291

ABSTRACT

Although advancing the therapeutic alternatives for treating deadly cancers has gained much attention globally, primary methods such as chemotherapy still have significant downsides and low specificity. Most recently, anticancer peptides (ACPs) have emerged as a potential therapeutic alternative with far fewer negative side effects. However, the identification of ACPs through wet-lab experiments is expensive and time-consuming. Hence, computational methods have emerged as viable alternatives. During the past few years, several computational ACP identification techniques using hand-engineered features have been proposed to solve this problem. In this study, we propose a new multi-headed deep convolutional neural network model called ACP-MHCNN for extracting and combining discriminative features from different information sources in an interactive way. Our model extracts sequence, physicochemical, and evolutionary-based features for ACP identification using different numerical peptide representations while restraining parameter overhead. Rigorous experiments using cross-validation and an independent dataset show that ACP-MHCNN outperforms other models for anticancer peptide identification by a substantial margin on our employed benchmarks. ACP-MHCNN outperforms the state-of-the-art model by 6.3%, 8.6%, 3.7%, 4.0%, and 0.20 in terms of accuracy, sensitivity, specificity, precision, and MCC, respectively. ACP-MHCNN and its relevant code and datasets are publicly available at: https://github.com/mrzResearchArena/Anticancer-Peptides-CNN. ACP-MHCNN is also publicly available as an online predictor at: https://anticancer.pythonanywhere.com/.


Subjects
Antineoplastic Agents/chemistry; Antineoplastic Agents/pharmacology; Computational Biology/methods; Deep Learning; Drug Discovery/methods; Neural Networks, Computer; Peptides/chemistry; Peptides/pharmacology; Algorithms; Amino Acid Sequence; Chemical Phenomena; Humans; ROC Curve; Reproducibility of Results
16.
Comput Biol Chem ; 92: 107502, 2021 Jun.
Article in English | MEDLINE | ID: mdl-33962169

ABSTRACT

DNA replication plays the most crucial part in biological inheritance, ensuring an even flow of genetic information from parent to offspring. The beginning site of DNA replication, called the Origin of Replication (ORI), plays a significant role in understanding the molecular mechanisms and genomic analysis of DNA. Hence, it is paramount to accurately identify the origin of replication to gain a more accurate understanding of the biochemical and genomic properties of DNA. In this paper, we propose a new approach named OriC-ENS that uses sequence-based feature extraction techniques (K-mer, K-gapped Mono-Di, and K-gapped Di-Mono) and an ensemble classification technique based on majority voting for the identification of the Origin of Replication. We use three SVM classifiers, one for the K-mer features and two more for the K-gapped Mono-Di and K-gapped Di-Mono features. Finally, we use majority voting to combine the predictions of the individual predictors. Experimental results on the S. cerevisiae dataset show that our method achieves an accuracy of 91.62%, which outperforms other state-of-the-art methods by a significant margin. We have also tested our method using other evaluation metrics, namely Matthews Correlation Coefficient (MCC), Area Under Curve (AUC), Sensitivity, and Specificity, where it achieved scores of 0.83, 0.98, 0.90, and 0.92, respectively. We have further evaluated our model on an independent test set collected from OriDB, consisting of sequences of Schizosaccharomyces pombe, where our model predicts the origin of replication efficiently and with great precision. We have made our Python-based source code available at https://github.com/MehediAzim/OriC-ENS.
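Two of the building blocks described above, K-mer features and majority voting, can be sketched in a few lines. The example assumes normalized K-mer frequencies and three binary votes; the K-gapped features and the SVM training used by OriC-ENS are not shown, and the function names are hypothetical.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=2, alphabet="ACGT"):
    """Normalized frequency of every possible k-mer, in lexicographic order."""
    total = len(seq) - k + 1
    if total <= 0:
        raise ValueError("sequence is shorter than k")
    counts = Counter(seq[i:i + k] for i in range(total))
    return [counts["".join(p)] / total for p in product(alphabet, repeat=k)]


def majority_vote(predictions):
    """Return the label predicted by the largest number of classifiers."""
    return Counter(predictions).most_common(1)[0][0]


features = kmer_frequencies("ACGTAC", k=2)  # 16 dinucleotide frequencies
label = majority_vote([1, 0, 1])            # votes from three classifiers
```

Using an odd number of binary classifiers, as in the three-SVM design described in the abstract, guarantees that the vote never ties.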


Subjects
Genes, Fungal/genetics; Saccharomyces cerevisiae/genetics; Sequence Analysis, DNA; DNA Replication/genetics; Databases, Genetic
17.
Comput Biol Chem ; 92: 107489, 2021 Jun.
Article in English | MEDLINE | ID: mdl-33932779

ABSTRACT

The information of a cell is primarily contained in deoxyribonucleic acid (DNA). DNA information flows to protein sequences via ribonucleic acids (RNA) through transcription and translation. These entities are vital for the genetic process. Recent developments in epigenetics also show the importance of the genetic material and of knowledge of its attributes and functions. However, the growth in the known features and functionalities of these entities is still slow because in vitro experimental methods are time-consuming and expensive. In this paper, we propose an ensemble classification algorithm called SubFeat to predict the functionalities of biological entities from different types of datasets. Our model uses a novel feature subspace-based ensemble method. It divides the feature space into subspaces, which are then used to train individual classifier models. The ensemble is built on these base classifiers using a weighted majority voting mechanism. SubFeat was tested on four datasets, comprising two DNA datasets, one RNA dataset, and one protein dataset, and it outperformed all the existing single classifiers and ensemble classifiers. SubFeat is available as a Python-based tool, and the package is provided online along with a user manual. It is freely accessible at https://github.com/fazlulhaquejony/SubFeat.
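The two ideas named above, splitting the feature space into subspaces and combining base classifiers by weighted majority voting, can be sketched as follows. The abstract does not say how SubFeat forms its subspaces or sets its weights, so the contiguous split and the weights below are assumptions made purely for illustration.

```python
from collections import defaultdict


def split_into_subspaces(n_features, n_subspaces):
    """Partition feature indices into contiguous, near-equal subspaces."""
    size, extra = divmod(n_features, n_subspaces)
    subspaces, start = [], 0
    for i in range(n_subspaces):
        end = start + size + (1 if i < extra else 0)
        subspaces.append(list(range(start, end)))
        start = end
    return subspaces


def weighted_majority_vote(predictions, weights):
    """Return the label with the largest total weight across classifiers."""
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


# Ten features split across three hypothetical base classifiers.
subspaces = split_into_subspaces(10, 3)

# One prediction per base classifier, weighted for example by validation accuracy.
decision = weighted_majority_vote(["pos", "neg", "neg"], [0.9, 0.4, 0.3])
```

Here the single high-weight classifier overrides the two low-weight ones, which is the difference between weighted and plain majority voting.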


Subjects
Algorithms; DNA/analysis; Proteins/analysis; RNA/analysis; Humans; Sequence Analysis, DNA; Sequence Analysis, Protein; Sequence Analysis, RNA
18.
Sci Rep ; 11(1): 10357, 2021 05 14.
Article in English | MEDLINE | ID: mdl-33990665

ABSTRACT

DNA N6-methylation (6mA) of the adenine nucleotide is a post-replication modification responsible for many biological functions. Automated and accurate computational methods can help identify 6mA sites in long genomes, saving significant time and money. Our study develops i6mA-CNN, a convolutional neural network (CNN) based tool capable of identifying 6mA sites in the rice genome. Our model combines multiple types of features, such as a customized feature vector inspired by PseAAC (Pseudo Amino Acid Composition), multiple one-hot representations, and dinucleotide physicochemical properties. It achieves an auROC (area under the Receiver Operating Characteristic curve) score of 0.98 with an overall accuracy of 93.97% using fivefold cross-validation on a benchmark dataset. Finally, we evaluate our model on three other plant genome 6mA site identification test datasets. Results suggest that our proposed tool is able to generalize its 6mA site identification ability to plant genomes irrespective of plant species. An algorithm for potential motif extraction and a feature importance analysis procedure are two by-products of this research. The web tool for this research can be found at: https://cutt.ly/dgp3QTR.
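Of the feature types listed above, the one-hot representation is the simplest to show. The sketch below encodes a DNA sequence as an L x 4 matrix. Mapping unknown bases such as N to an all-zero row is an assumption made here, not something stated in the abstract, and the other feature types used by i6mA-CNN are not covered.

```python
def one_hot(seq, alphabet="ACGT"):
    """Encode a DNA sequence as a list of one-hot rows, one row per base."""
    index = {nt: i for i, nt in enumerate(alphabet)}
    encoded = []
    for nt in seq.upper():
        row = [0] * len(alphabet)
        if nt in index:  # unknown bases (e.g. N) stay all-zero
            row[index[nt]] = 1
        encoded.append(row)
    return encoded


matrix = one_hot("ACGTN")
```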


Subjects
Epigenomics/methods; Genome, Plant; Neural Networks, Computer; Oryza/genetics; Adenine/analogs & derivatives; Adenine/analysis; Adenine/metabolism; Amino Acid Motifs/genetics; DNA Methylation; Epigenesis, Genetic
19.
Comput Biol Chem ; 92: 107494, 2021 Jun.
Article in English | MEDLINE | ID: mdl-33930742

ABSTRACT

Proteins are among the most important molecules that govern cellular processes in most living organisms. Understanding the various functions of proteins is of paramount importance for understanding the basics of life. Several supervised learning approaches have been applied in this field to predict the functionality of proteins. In this paper, we propose ProtConv, a convolutional neural network based approach that predicts the functionality of proteins by converting amino acid sequences to a two-dimensional image. We use a protein embedding technique based on transfer learning to generate the feature vector. The feature vector is then converted into a square, single-channel image to be fed into a convolutional network. The neural network architecture combines convolutional filters and average pooling layers, followed by dense fully connected layers, to predict a binary function. We have performed experiments on standard benchmark datasets taken from two very important protein function prediction tasks: proinflammatory cytokines and anticancer peptides. Our experiments show that the proposed method, ProtConv, achieves state-of-the-art performance on both datasets. All necessary implementation details, with source code and datasets, are available at: https://github.com/swakkhar/ProtConv.
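The step of turning a feature vector into a square, single-channel image can be sketched as a reshape with padding. The abstract does not give the embedding length or say whether padding is needed, so the zero padding and the row-major layout below are assumptions for illustration only.

```python
import math


def to_square_image(vector, pad_value=0.0):
    """Reshape a flat feature vector into the smallest square grid that fits it.

    Values are laid out row by row; unused cells are filled with pad_value.
    """
    side = math.isqrt(len(vector))
    if side * side < len(vector):
        side += 1
    padded = list(vector) + [pad_value] * (side * side - len(vector))
    return [padded[r * side:(r + 1) * side] for r in range(side)]


# A hypothetical 5-dimensional embedding becomes a 3 x 3 image.
image = to_square_image([1, 2, 3, 4, 5])
```

An embedding whose length is already a perfect square, such as 1024, maps to a 32 x 32 image with no padding at all.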


Subjects
Deep Learning; Neural Networks, Computer; Proteins/chemistry; Amino Acid Sequence; Databases, Protein; Proteins/metabolism
20.
Comput Struct Biotechnol J ; 18: 3528-3538, 2020.
Article in English | MEDLINE | ID: mdl-33304452

ABSTRACT

RNA modification is an essential step towards the generation of new RNA structures. Such modification can potentially alter RNA function or stability. Among different modifications, 5-Hydroxymethylcytosine (5hmC) modification of RNA exhibits significant potential for a series of biological processes. Understanding the distribution of 5hmC in RNA is essential to determine its biological functionality. Although conventional sequencing techniques allow broad identification of 5hmC, they are both time-consuming and resource-intensive. In this study, we propose a new computational tool called iRNA5hmC-PS to tackle this problem. To build iRNA5hmC-PS, we extract a set of novel sequence-based features called Position-Specific Gapped k-mer (PSG k-mer) to obtain maximum sequential information. Our feature analysis shows that the proposed PSG k-mer features contain vital information for the identification of 5hmC sites. We also use a group-wise feature importance calculation strategy to select a small subset of features containing maximum discriminative information. Our experimental results demonstrate that iRNA5hmC-PS enhances the prediction performance dramatically. iRNA5hmC-PS achieves 78.3% prediction performance, which is 12.8% better than that reported in previous studies. iRNA5hmC-PS is publicly available as an online tool at http://103.109.52.8:81/iRNA5hmC-PS. Its benchmark dataset, source code, and documentation are available at https://github.com/zahid6454/iRNA5hmC-PS.
