Results 1 - 20 of 62
1.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38742520

ABSTRACT

The evolution of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is primarily driven by mutations in its genetic sequence, culminating in the emergence of variants with increased capability to evade host immune responses. Accurate prediction of such mutations is fundamental in mitigating pandemic spread and developing effective control measures. This study introduces a robust and interpretable deep-learning approach called PRIEST. This model leverages time-series viral sequences to foresee potential viral mutations. Our experimental evaluations underscore PRIEST's proficiency in accurately predicting immune-evading mutations. Our work represents a substantial step in utilizing deep-learning methodologies for anticipatory viral mutation analysis and pandemic response.


Subjects
COVID-19 , Immune Evasion , Mutation , SARS-CoV-2 , SARS-CoV-2/genetics , SARS-CoV-2/immunology , Humans , COVID-19/virology , COVID-19/immunology , COVID-19/genetics , Immune Evasion/genetics , Deep Learning , Evolution, Molecular , Pandemics
2.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36460620

ABSTRACT

Lysine succinylation is a kind of post-translational modification (PTM) that plays a crucial role in regulating cellular processes. Aberrant succinylation may cause inflammation, cancers, metabolic diseases and nervous system diseases. The experimental methods to detect succinylation sites are time-consuming and costly. This calls for computational models with high efficacy, and attention has been given in the literature to developing such models, albeit with only moderate success across different evaluation metrics. One crucial aspect in this context is the biochemical and physicochemical properties of amino acids, which appear to be useful as features for such computational predictors. However, some of the existing computational models did not use the biochemical and physicochemical properties of amino acids, while others used them without considering the inter-dependency among the properties. The combinations of biochemical and physicochemical properties derived through our optimization process achieve better results than those achieved by combining all the properties. We propose three deep learning architectures: CNN+Bi-LSTM (CBL), Bi-LSTM+CNN (BLC) and their combination (CBL_BLC). We find that CBL_BLC outperforms the other two. Ensembling of different models successfully improves the results. Notably, tuning the threshold of the ensemble classifiers further improves the results. Upon comparing our work with other existing works on two datasets, we successfully achieve better sensitivity and specificity by varying the threshold value.
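The threshold tuning mentioned at the end of this abstract can be illustrated with a minimal sketch. It assumes (this is not stated in the abstract) that the ensemble averages the per-model probabilities and that a site is called positive when the averaged score reaches the threshold; the function names `ensemble_scores` and `sensitivity_specificity` are hypothetical.

```python
def ensemble_scores(model_scores):
    """Average the scores of several models, one list of scores per model.

    Assumption: simple unweighted averaging; the paper's ensembling
    scheme may differ.
    """
    return [sum(column) / len(column) for column in zip(*model_scores)]


def sensitivity_specificity(scores, labels, threshold):
    """Return (sensitivity, specificity) when predicting 1 if score >= threshold."""
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    # Toy scores and labels, made up purely for illustration.
    scores = [0.8, 0.3, 0.5, 0.3]
    labels = [1, 0, 1, 0]
    for threshold in (0.25, 0.5, 0.6):
        print(threshold, sensitivity_specificity(scores, labels, threshold))
```

Lowering the threshold trades specificity for sensitivity and raising it does the opposite, which is why the reported operating point depends on the threshold chosen.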


Subjects
Algorithms , Lysine , Lysine/metabolism , Amino Acids/chemistry , Sensitivity and Specificity , Protein Processing, Post-Translational
3.
Brief Bioinform ; 22(4)2021 07 20.
Article in English | MEDLINE | ID: mdl-33003198

ABSTRACT

Despite impressive improvements in next-generation sequencing technology, reliable detection of indels is still a difficult endeavour. Recognition of true indels is of prime importance in many applications, such as personalized health care, disease genomics and population genetics. Recently, advanced machine learning techniques have been successfully applied to classification problems with large-scale data. In this paper, we present SICaRiO, a gradient boosting classifier for the reliable detection of true indels, trained with the gold-standard dataset from the 'Genome in a Bottle' (GIAB) consortium. Our filtering scheme significantly improves the performance of each variant calling pipeline used in GIAB and beyond. SICaRiO uses genomic features that can be computed from publicly available resources, i.e. it does not require sequencing pipeline-specific information (e.g. read depth). This study also sheds light on prior genomic contexts responsible for the erroneous calling of indels made by sequencing pipelines. We have compared prediction difficulty for three categories of indels over different sequencing pipelines. We have also ranked genomic features according to their predictivity in determining false positives.


Subjects
Databases, Nucleic Acid , High-Throughput Nucleotide Sequencing , INDEL Mutation , Machine Learning , Software
4.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-34058749

ABSTRACT

BACKGROUND: Genomic Islands (GIs) are clusters of genes that are mobilized through horizontal gene transfer. GIs play a pivotal role in bacterial evolution as a mechanism of diversification and adaptation to different niches. Therefore, identification and characterization of GIs in bacterial genomes is important for understanding bacterial evolution. However, quantifying GIs is inherently difficult, and the existing methods suffer from low prediction accuracy and a precision-recall trade-off. Moreover, several of them are supervised in nature, and thus their application to newly sequenced genomes is hampered by their dependency on the functional annotation of existing genomes. RESULTS: We present SSG-LUGIA, a completely automated and unsupervised approach for identifying GIs and horizontally transferred genes. SSG-LUGIA is a novel method based on an unsupervised anomaly detection technique, accompanied by further refinement using cues from the signal processing literature. SSG-LUGIA leverages the atypical compositional biases of the alien genes to localize GIs in prokaryotic genomes. SSG-LUGIA was assessed on a large benchmark dataset, 'IslandPick', and on a set of 15 well-studied genomes in the literature, followed by a thorough analysis on the well-understood Salmonella typhi CT18 genome. Furthermore, the efficacy of SSG-LUGIA in identifying horizontally transferred genes was evaluated on two additional bacterial genomes, namely, those of Corynebacterium diphtheriae NCTC13129 and Pseudomonas aeruginosa LESB58. SSG-LUGIA was examined on draft genomes and was demonstrated to be efficient as an ensemble method. CONCLUSIONS: Our results indicate that SSG-LUGIA achieved superior performance in comparison to frequently used existing methods. Importantly, it yielded a better trade-off between precision and recall than the existing methods. Its nondependency on the functional annotation of genomes makes it suitable for analyzing newly sequenced, yet uncharacterized genomes. Thus, our study is a significant advance in identification of GIs and horizontally transferred genes. SSG-LUGIA is available as open source software at https://nibtehaz.github.io/SSG-LUGIA/.


Subjects
Algorithms , Bacteria/genetics , Computational Biology , Gene Transfer, Horizontal , Genome, Bacterial , Genomic Islands
5.
PLoS Comput Biol ; 18(3): e1009911, 2022 03.
Article in English | MEDLINE | ID: mdl-35275927

ABSTRACT

All proteomes contain both proteins and polypeptide segments that don't form a defined three-dimensional structure yet are biologically active, called intrinsically disordered proteins and regions (IDPs and IDRs). Most of these IDPs/IDRs lack useful functional annotation, limiting our understanding of their importance for organism fitness. Here we characterized IDRs using protein sequence annotations of functional sites and regions available in the UniProt knowledgebase ("UniProt features": active site, ligand-binding pocket, regions mediating protein-protein interactions, etc.). By measuring the statistical enrichment of twenty-five UniProt features in 981 IDRs of 561 human proteins, we identified eight features that are commonly located in IDRs. We then collected the genetic variant data from the general population and patient-based databases and evaluated the prevalence of population and pathogenic variations in IDPs/IDRs. We observed that some IDRs tolerate 2 to 12 times more single amino acid-substituting missense mutations than synonymous changes in the general population. However, we also found that 37% of all germline pathogenic mutations are located in disordered regions of 96 proteins. Based on the observed-to-expected frequency of mutations, we categorized 34 IDRs in 20 proteins (DDX3X, KIT, RB1, etc.) as intolerant to mutation. Finally, using statistical analysis and a machine learning approach, we demonstrate that mutation-intolerant IDRs carry a distinct signature of functional features. Our study presents a novel approach to assign functional importance to IDRs by leveraging the wealth of available genetic data, which will aid in a deeper understanding of the role of IDRs in biological processes and disease mechanisms.


Subjects
Intrinsically Disordered Proteins , Amino Acid Sequence , Genetic Variation/genetics , Humans , Intrinsically Disordered Proteins/chemistry , Protein Conformation , Proteome/genetics
6.
Proc Natl Acad Sci U S A ; 117(45): 28201-28211, 2020 11 10.
Article in English | MEDLINE | ID: mdl-33106425

ABSTRACT

Interpretation of the colossal number of genetic variants identified from sequencing applications is one of the major bottlenecks in clinical genetics, with the inference of the effect of amino acid-substituting missense variations on protein structure and function being especially challenging. Here we characterize the three-dimensional (3D) amino acid positions affected in pathogenic and population variants from 1,330 disease-associated genes using over 14,000 experimentally solved human protein structures. By measuring the statistical burden of variations (i.e., point mutations) from all genes on 40 3D protein features, accounting for the structural, chemical, and functional context of the variations' positions, we identify features that are generally associated with pathogenic and population missense variants. We then perform the same amino acid-level analysis individually for 24 protein functional classes, which reveals unique characteristics of the positions of the altered amino acids: We observe up to 46% divergence of the class-specific features from the general characteristics obtained by the analysis on all genes, which is consistent with the structural diversity of essential regions across different protein classes. We demonstrate that the function-specific 3D features of the variants match the readouts of mutagenesis experiments for BRCA1 and PTEN, and positively correlate with an independent set of clinically interpreted pathogenic and benign missense variants. Finally, we make our results available through a web server to foster accessibility and downstream research. Our findings represent a crucial step toward translational genetics, from highlighting the impact of mutations on protein structure to rationalizing the variants' pathogenicity in terms of the perturbed molecular mechanisms.


Subjects
Mutation, Missense/genetics , Proteins/chemistry , Proteins/genetics , Amino Acid Sequence , BRCA1 Protein/chemistry , BRCA1 Protein/genetics , Computational Biology/methods , Humans , Machine Learning , Models, Molecular , Mutation, Missense/physiology , PTEN Phosphohydrolase/chemistry , PTEN Phosphohydrolase/genetics , Protein Conformation , Proteins/physiology
7.
Bioinformatics ; 37(10): 1468-1470, 2021 06 16.
Article in English | MEDLINE | ID: mdl-33016997

ABSTRACT

MOTIVATION: Researchers and practitioners use a number of popular sequence comparison tools that rely on alignment-based techniques. Due to high time and space complexity and length-related restrictions, researchers often seek alignment-free tools. Recently, some interesting ideas, namely, Minimal Absent Words (MAW) and Relative Absent Words (RAW), have received much interest in the scientific community as distance measures that can give us alignment-free alternatives. This drives us to structure a framework for analysing biological sequences in an alignment-free manner. RESULTS: In this application note, we present Alignment-free Dissimilarity Analysis & Comparison Tool (ADACT), a simple web-based tool that computes the analogy among sequences using a varied number of indexes through the distance matrix, species relation list and phylogenetic tree. This tool combines absent word (MAW or RAW) computation, dissimilarity measures and species relationship, and thus brings all required software into one platform for the ease of researchers and practitioners alike in the field of bioinformatics. We have also developed a RESTful API. AVAILABILITY AND IMPLEMENTATION: ADACT has been hosted at http://research.buet.ac.bd/ADACT/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
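To make the notion of a minimal absent word concrete, here is a brute-force sketch. A word w is a minimal absent word of a sequence if w does not occur in it but both w without its first letter and w without its last letter do. The Jaccard distance over MAW sets is used here only as one plausible dissimilarity index; the abstract does not say which indexes ADACT implements, and the function names are hypothetical. The enumeration is quadratic in the sequence length, so it is suitable only for short toy inputs.

```python
def factors(seq):
    """Return the set of all non-empty substrings (factors) of seq."""
    return {seq[i:j] for i in range(len(seq)) for j in range(i + 1, len(seq) + 1)}


def minimal_absent_words(seq):
    """Return the minimal absent words of seq over the letters occurring in seq.

    Brute force: extend every factor u by one letter c; u + c is a MAW when
    it is absent while its longest proper suffix is present (its longest
    proper prefix u is present by construction).
    """
    present = factors(seq)
    alphabet = set(seq)
    maws = set()
    for u in present:
        for c in alphabet:
            w = u + c
            if w not in present and w[1:] in present:
                maws.add(w)
    return maws


def jaccard_distance(a, b):
    """Jaccard distance between two sets (0.0 when both are empty)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


if __name__ == "__main__":
    x, y = "abaab", "abbab"
    print(sorted(minimal_absent_words(x)))
    print(jaccard_distance(minimal_absent_words(x), minimal_absent_words(y)))
```

Practical tools compute MAWs with suffix-array or suffix-automaton based algorithms rather than by enumerating all substrings.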


Subjects
Algorithms , Nucleotides , Phylogeny , Sequence Alignment , Sequence Analysis, DNA , Software
8.
Nucleic Acids Res ; 48(W1): W132-W139, 2020 07 02.
Article in English | MEDLINE | ID: mdl-32402084

ABSTRACT

Human genome sequencing efforts have greatly expanded, and a plethora of missense variants identified both in patients and in the general population is now publicly accessible. Interpretation of the molecular-level effect of missense variants, however, remains challenging and requires a particular investigation of amino acid substitutions in the context of protein structure and function. Answers to questions like 'Is a variant perturbing a site involved in key macromolecular interactions and/or cellular signaling?', or 'Is a variant changing an amino acid located at the protein core or part of a cluster of known pathogenic mutations in 3D?' are crucial. Motivated by these needs, we developed MISCAST (missense variant to protein structure analysis web suite; http://miscast.broadinstitute.org/). MISCAST is an interactive and user-friendly web server to visualize and analyze missense variants in protein sequence and structure space. Additionally, a comprehensive set of protein structural and functional features have been aggregated in MISCAST from multiple databases, and displayed on structures alongside the variants to provide users with the biological context of the variant location in an integrated platform. We further made the annotated data and protein structures readily downloadable from MISCAST to foster advanced offline analysis of missense variants by a wide biological community.


Subjects
Mutation, Missense , Protein Conformation , Software , Humans , Internet , Proteins/chemistry , Proteins/genetics
9.
Sensors (Basel) ; 22(3)2022 Jan 25.
Article in English | MEDLINE | ID: mdl-35161664

ABSTRACT

Cardiovascular diseases are the most common causes of death around the world. To detect and treat heart-related diseases, continuous blood pressure (BP) monitoring, along with many other parameters, is required. Several invasive and non-invasive methods have been developed for this purpose. Most existing methods used in hospitals for continuous monitoring of BP are invasive. In contrast, cuff-based BP monitoring methods, which can predict systolic blood pressure (SBP) and diastolic blood pressure (DBP), cannot be used for continuous monitoring. Several studies attempted to predict BP from non-invasively collectible signals such as photoplethysmograms (PPG) and electrocardiograms (ECG), which can be used for continuous monitoring. In this study, we explored the applicability of autoencoders in predicting BP from PPG and ECG signals. The investigation was carried out on 12,000 instances of 942 patients of the MIMIC-II dataset, and it was found that a very shallow, one-dimensional autoencoder can extract the relevant features to predict the SBP and DBP with state-of-the-art performance on a very large dataset. An independent test set from a portion of the MIMIC-II dataset provided a mean absolute error (MAE) of 2.333 and 0.713 for SBP and DBP, respectively. On an external dataset of 40 subjects, the model trained on the MIMIC-II dataset provided an MAE of 2.728 and 1.166 for SBP and DBP, respectively. In both cases, the results met British Hypertension Society (BHS) Grade A and surpassed the studies from the current literature.
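The two evaluation criteria named in this abstract can be sketched as follows. The mean absolute error is the average of the absolute differences between predicted and reference pressures. The BHS Grade A check uses the commonly cited cumulative-error cut-offs (at least 60%, 85% and 95% of readings within 5, 10 and 15 mmHg); the function names and the toy values are assumptions for illustration and are not taken from the paper.

```python
def absolute_errors(predicted, reference):
    """Return per-reading absolute errors, validating the inputs."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return [abs(p - r) for p, r in zip(predicted, reference)]


def mean_absolute_error(predicted, reference):
    """Mean absolute error between predicted and reference values."""
    errors = absolute_errors(predicted, reference)
    return sum(errors) / len(errors)


def meets_bhs_grade_a(predicted, reference):
    """Check the BHS Grade A cumulative-error criteria (values in mmHg)."""
    errors = absolute_errors(predicted, reference)

    def percent_within(limit):
        return 100.0 * sum(1 for e in errors if e <= limit) / len(errors)

    return (percent_within(5) >= 60
            and percent_within(10) >= 85
            and percent_within(15) >= 95)


if __name__ == "__main__":
    # Made-up systolic readings, purely for illustration.
    predicted = [120, 130, 110]
    reference = [118, 133, 110]
    print(mean_absolute_error(predicted, reference))
    print(meets_bhs_grade_a(predicted, reference))
```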


Subjects
Hypertension , Photoplethysmography , Blood Pressure , Blood Pressure Determination , Electrocardiography , Humans , Hypertension/diagnosis
10.
BMC Med Imaging ; 21(1): 15, 2021 01 28.
Article in English | MEDLINE | ID: mdl-33509110

ABSTRACT

BACKGROUND: Segmentation of nuclei in cervical cytology pap smear images is a crucial stage in automated cervical cancer screening. The task itself is challenging due to the presence of cervical cells with spurious edges, overlapping cells, neutrophils, and artifacts. METHODS: After the initial preprocessing steps of adaptive thresholding, in our approach, the image passes through a convolution filter to filter out some noise. Then, contours from the resultant image are filtered by their distinctive contour properties followed by a nucleus size recovery procedure based on contour average intensity value. RESULTS: We evaluate our method on a public (benchmark) dataset collected from ISBI and also a private real dataset. The results show that our algorithm outperforms other state-of-the-art methods in nucleus segmentation on the ISBI dataset with a precision of 0.978 and recall of 0.933. A promising precision of 0.770 and a formidable recall of 0.886 on the private real dataset indicate that our algorithm can effectively detect and segment nuclei on real cervical cytology images. Tuning various parameters, the precision could be increased to as high as 0.949 with an acceptable decrease of recall to 0.759. Our method also managed an Aggregated Jaccard Index of 0.681 outperforming other state-of-the-art methods on the real dataset. CONCLUSION: We have proposed a contour property-based approach for segmentation of nuclei. Our algorithm has several tunable parameters and is flexible enough to adapt to real practical scenarios and requirements.


Subjects
Cervix Uteri/pathology , Early Detection of Cancer/methods , Image Processing, Computer-Assisted/methods , Papanicolaou Test/methods , Uterine Cervical Neoplasms/diagnosis , Uterine Cervical Neoplasms/pathology , Algorithms , Cell Nucleus , Female , Humans
11.
BMC Bioinformatics ; 21(1): 223, 2020 Jun 01.
Article in English | MEDLINE | ID: mdl-32487025

ABSTRACT

BACKGROUND: The latest works on CRISPR genome editing tools mainly employ deep learning techniques. However, deep learning models lack explainability and are harder to reproduce. We were motivated to build an accurate genome editing tool using sequence-based features and traditional machine learning that can compete with deep learning models. RESULTS: In this paper, we present CRISPRpred(SEQ), a method for sgRNA on-target activity prediction that leverages only traditional machine learning techniques and hand-crafted features extracted from sgRNA sequences. We compare the results of CRISPRpred(SEQ) with those of DeepCRISPR, the current state-of-the-art, which uses a deep learning pipeline. Despite using only traditional machine learning methods, we have been able to beat DeepCRISPR convincingly for three out of four cell lines in the benchmark dataset (2.174%, 6.905% and 8.119% improvement for the three cell lines). CONCLUSION: CRISPRpred(SEQ) has been able to convincingly beat DeepCRISPR in 3 out of 4 cell lines. We believe that by exploring further, one can design better features using only the sgRNA sequences and can come up with a better method leveraging only traditional machine learning algorithms that can fully beat the deep learning models.


Subjects
CRISPR-Cas Systems/genetics , Gene Editing/methods , Machine Learning , RNA, Guide, Kinetoplastida/genetics , Algorithms , Area Under Curve , Base Sequence , Databases, Genetic , Deep Learning , HEK293 Cells , Humans , ROC Curve , Reproducibility of Results
12.
J Theor Biol ; 452: 22-34, 2018 09 07.
Article in English | MEDLINE | ID: mdl-29753757

ABSTRACT

A DNA-binding protein (DNA-BP) is a protein that can bind and interact with DNA. Identification of DNA-BPs using experimental methods is expensive as well as time-consuming. As such, fast and accurate computational methods are sought for predicting whether a protein can bind with DNA or not. In this paper, we focus on building a new computational model to identify DNA-BPs in an efficient and accurate way. Our model extracts meaningful information directly from the protein sequences, without any dependence on functional domain or structural information. After feature extraction, we employed a Random Forest (RF) model to rank the features. Afterwards, we used the Recursive Feature Elimination (RFE) method to extract an optimal set of features and trained a prediction model using a Support Vector Machine (SVM) with a linear kernel. Our proposed method, named DNA-binding Protein Prediction model using Chou's general PseAAC (DPP-PseAAC), demonstrates superior performance compared to the state-of-the-art predictors on a standard benchmark dataset. DPP-PseAAC achieves accuracy values of 93.21%, 95.91% and 77.42% for the 10-fold cross-validation test, jackknife test and independent test, respectively. The source code of DPP-PseAAC, along with the relevant dataset and detailed experimental results, can be found at https://github.com/srautonu/DNABinding. A publicly accessible web interface has also been established at: http://77.68.43.135:8080/DPP-PseAAC/.


Subjects
Algorithms , Computational Biology/methods , DNA-Binding Proteins/metabolism , Support Vector Machine , Amino Acid Sequence , Amino Acids/chemistry , Amino Acids/genetics , Amino Acids/metabolism , DNA/chemistry , DNA/genetics , DNA/metabolism , DNA-Binding Proteins/chemistry , DNA-Binding Proteins/genetics , Databases, Protein , Models, Molecular , Nucleic Acid Conformation , Protein Domains , Reproducibility of Results
13.
Malar J ; 16(1): 432, 2017 10 27.
Article in English | MEDLINE | ID: mdl-29078771

ABSTRACT

BACKGROUND: Malaria, a mosquito-borne infectious disease, is still one of the most devastating global health issues. The malaria vector Anopheles vagus is widely distributed in Asia and is a dominant vector in Bandarban, Bangladesh. However, despite its wide distribution, no agent-based model (ABM) of An. vagus has yet been developed. Additionally, its response to combined vector control interventions has not been examined. METHODS: A spatial ABM, denoted as ABM[Formula: see text], was designed and implemented based on the biological attributes of An. vagus by modifying an established, existing ABM of Anopheles gambiae. Environmental factors such as temperature and rainfall were incorporated into ABM[Formula: see text] using daily weather profiles. Real-life field data of Bandarban were used to generate landscapes which were used in the simulations. ABM[Formula: see text] was verified and validated using several standard techniques and against real-life field data. Using artificial landscapes, the individual and combined efficacies of existing vector control interventions are modeled, applied, and examined. RESULTS: Simulated female abundance curves generated by ABM[Formula: see text] closely follow the patterns observed in the field. Due to the use of daily temperature and rainfall data, ABM[Formula: see text] was able to generate seasonal patterns for a particular area. When two interventions were applied with parameters set to mid-ranges, ITNs/LLINs with IRS produced better results compared to the other cases. Moreover, any intervention combined with ITNs/LLINs yielded better results. Not surprisingly, three interventions applied in combination generate the best results compared to any two interventions applied in combination. CONCLUSIONS: Output of ABM[Formula: see text] showed high sensitivity to real-life field data of the environmental factors and the landscape of a particular area. Hence, it is recommended to use the model for a given area in connection with its local field data. For applying combined interventions, three interventions altogether are highly recommended whenever possible. It is also suggested that ITNs/LLINs with IRS can be applied when three interventions are not available.


Subjects
Anopheles/physiology , Malaria/transmission , Mosquito Control/methods , Mosquito Vectors/physiology , Animals , Bangladesh/epidemiology , Female , Humans , Malaria/epidemiology , Male , Models, Theoretical , Population Dynamics
14.
ScientificWorldJournal ; 2014: 743431, 2014.
Article in English | MEDLINE | ID: mdl-25045745

ABSTRACT

A Hamiltonian path in a graph is a path involving all the vertices of the graph. In this paper, we revisit the famous Hamiltonian path problem and present new sufficient conditions for the existence of a Hamiltonian path in a graph.
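As a concrete illustration of the problem, the sketch below decides by exhaustive backtracking whether an undirected graph has a Hamiltonian path, and separately checks one classical sufficient condition of Ore type: if deg(u) + deg(v) >= n - 1 for every pair of non-adjacent vertices, a Hamiltonian path exists. This classical condition is shown only for context; it is not one of the new conditions the paper presents, and the adjacency-dictionary representation and function names are assumptions. The search is exponential in the worst case.

```python
from itertools import combinations


def has_hamiltonian_path(adj):
    """Backtracking search for a path visiting every vertex exactly once.

    adj maps each vertex to a list of its neighbours (undirected graph).
    """
    n = len(adj)

    def extend(v, visited):
        if len(visited) == n:
            return True
        for w in adj[v]:
            if w not in visited:
                visited.add(w)
                if extend(w, visited):
                    return True
                visited.remove(w)
        return False

    return any(extend(start, {start}) for start in adj)


def satisfies_ore_type_condition(adj):
    """True if deg(u) + deg(v) >= n - 1 for all non-adjacent pairs u, v."""
    n = len(adj)
    return all(len(adj[u]) + len(adj[v]) >= n - 1
               for u, v in combinations(adj, 2)
               if v not in adj[u])


if __name__ == "__main__":
    path_graph = {0: [1], 1: [0, 2], 2: [1]}
    star_graph = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
    print(has_hamiltonian_path(path_graph), satisfies_ore_type_condition(path_graph))
    print(has_hamiltonian_path(star_graph), satisfies_ore_type_condition(star_graph))
```

The condition is sufficient but not necessary, so a graph can fail the check and still contain a Hamiltonian path.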

15.
J Comput Biol ; 30(3): 245-249, 2023 03.
Article in English | MEDLINE | ID: mdl-36706434

ABSTRACT

Motivation: Phylogenetic trees are often inferred from a multiple sequence alignment (MSA) where the tree accuracy is heavily impacted by the nature of estimated alignment. Carefully equipping an MSA tool with multiple application-aware objectives positively impacts its capability to yield better trees. Results: We introduce Multiobjective Application-aware Multiple Sequence Alignment and Maximum Likelihood Ensemble (MAMMLE), a framework for inferring better phylogenetic trees from unaligned sequences by hybridizing two MSA tools [i.e., Multiple Sequence Comparison by Log-Expectation (MUSCLE) and Multiple Alignment using Fast Fourier Transform (MAFFT)] with multiobjective optimization strategy and leveraging multiple maximum likelihood hypotheses. In our experiments, MAMMLE exhibits 5.57% (4.77%) median improvement (deterioration) over MUSCLE on 50.34% (37.41%) of instances.


Subjects
Algorithms , Software , Phylogeny , Sequence Alignment
16.
iScience ; 26(2): 105945, 2023 Feb 17.
Article in English | MEDLINE | ID: mdl-36866046

ABSTRACT

The bendability of genomic DNA impacts chromatin packaging and protein-DNA binding. However, we do not have a comprehensive understanding of the motifs influencing DNA bendability. Recent high-throughput technologies such as Loop-Seq offer an opportunity to address this gap but the lack of accurate and interpretable machine learning models still remains. Here we introduce DeepBend, a convolutional neural network model with convolutions designed to directly capture the motifs underlying DNA bendability and their periodic occurrences or relative arrangements that modulate bendability. DeepBend consistently performs on par with alternative models while giving an extra edge through mechanistic interpretations. Besides confirming the known motifs of DNA bendability, DeepBend also revealed several novel motifs and showed how the spatial patterns of motif occurrences influence bendability. DeepBend's genome-wide prediction of bendability further showed how bendability is linked to chromatin conformation and revealed the motifs controlling the bendability of topologically associated domains and their boundaries.

17.
Protein J ; 42(2): 135-146, 2023 04.
Article in English | MEDLINE | ID: mdl-36977849

ABSTRACT

The inception of next-generation sequencing technologies has exponentially increased the volume of biological sequence data. Protein sequences, often described as the 'language of life', have been analyzed for a multitude of applications and inferences. Owing to the rapid development of deep learning, in recent years there have been a number of breakthroughs in the domain of Natural Language Processing. Since these methods are capable of performing different tasks when trained with a sufficient amount of data, off-the-shelf models are used to perform various biological applications. In this study, we investigated the applicability of the popular Skip-gram model for protein sequence analysis and made an attempt to incorporate some biological insights into it. We propose a novel k-mer embedding scheme, Align-gram, which is capable of mapping similar k-mers close to each other in a vector space. Furthermore, we experiment with other sequence-based protein representations and observe that the embeddings derived from Align-gram aid the modeling and training of deep learning models better. Our experiments with a simple baseline LSTM model and the much more complex CNN model of DeepGoPlus show the potential of Align-gram in performing different types of deep learning applications for protein sequence analysis.
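For readers unfamiliar with how a Skip-gram model is applied to sequences, the sketch below shows the usual preprocessing: a protein sequence is split into overlapping k-mers that play the role of words, and each k-mer is paired with its neighbours inside a context window to form training examples. This illustrates the standard Skip-gram setup only, not Align-gram itself, whose modifications are not described in the abstract; the function names and the toy sequence are hypothetical.

```python
def kmers(sequence, k):
    """Split a sequence into overlapping k-mers (stride 1)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def skipgram_pairs(tokens, window):
    """Return (center, context) pairs for every token within the window."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


if __name__ == "__main__":
    tokens = kmers("MKTAY", 3)   # toy peptide, for illustration only
    print(tokens)
    print(skipgram_pairs(tokens, window=1))
```

An embedding model would then be trained to predict each context k-mer from its center k-mer, so that k-mers appearing in similar contexts receive nearby vectors.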


Subjects
Proteins , Sequence Analysis, Protein , Amino Acid Sequence
18.
Comput Biol Med ; 152: 106372, 2023 01.
Article in English | MEDLINE | ID: mdl-36516574

ABSTRACT

Uncontrolled proliferation of B-lymphoblast cells is a common characteristic of Acute Lymphoblastic Leukemia (ALL). B-lymphoblasts are found in large numbers in peripheral blood in malignant cases. Early detection of the cell in bone marrow is essential, as the disease progresses rapidly if left untreated. However, automated classification of the cell is challenging, owing to its fine-grained variability with B-lymphoid precursor cells and imbalanced data points. Deep learning algorithms demonstrate potential for such fine-grained classification but also suffer from the imbalanced class problem. In this paper, we explore different deep learning-based State-Of-The-Art (SOTA) approaches to tackle imbalanced classification problems. Our experiment includes input-based, GAN (Generative Adversarial Network)-based, and loss-based methods to mitigate the issue of class imbalance on the challenging C-NMC and ALLIDB-2 datasets for leukemia detection. We have shown empirical evidence that loss-based methods outperform GAN-based and input-based methods in imbalanced classification scenarios.


Subjects
Algorithms , Precursor Cell Lymphoblastic Leukemia-Lymphoma , Humans , Precursor Cell Lymphoblastic Leukemia-Lymphoma/diagnosis , Precursor Cell Lymphoblastic Leukemia-Lymphoma/pathology
19.
Front Public Health ; 11: 1125917, 2023.
Article in English | MEDLINE | ID: mdl-36950105

ABSTRACT

COVID-19 has taken a huge toll on our lives over the last 3 years. Global initiatives put forward by all stakeholders are still in place to combat this pandemic and help us learn lessons for future ones. While the vaccine rollout was not able to curb the spread of the disease for all strains, the research community is still trying to develop effective therapeutics for COVID-19. Although Paxlovid and remdesivir have been approved by the FDA against COVID-19, they are not free of side effects. Therefore, the search for a therapeutic solution with high efficacy continues in the research community. To support this effort, in this latest version (v3) of COVID-19Base, we have summarized the biomedical entities linked to COVID-19 that have been highlighted in the scientific literature after the vaccine rollout. Eight different topic-specific dictionaries, i.e., gene, miRNA, lncRNA, PDB entries, disease, alternative medicines registered under clinical trials, drugs, and the side effects of drugs, were used to build this knowledgebase. We have introduced a BLSTM-based deep-learning model to predict the drug-disease associations that outperforms the existing model for the same purpose proposed in the earlier version of COVID-19Base. For the very first time, we have incorporated disease-gene, disease-miRNA, disease-lncRNA, and drug-PDB associations covering the largest number of biomedical entities related to COVID-19. We have provided examples of and insights into different biomedical entities covered in COVID-19Base to support the research community by incorporating all of these entities under a single platform to provide evidence-based support from the literature. COVID-19Base v3 can be accessed from: https://covidbase-v3.vercel.app/. The GitHub repository for the source code and data dictionaries is available to the community from: https://github.com/91Abdullah/covidbasev3.0.


Subjects
COVID-19 , MicroRNAs , RNA, Long Noncoding , Humans , SARS-CoV-2 , Knowledge Bases
20.
Comput Biol Med ; 144: 105385, 2022 05.
Article in English | MEDLINE | ID: mdl-35299044

ABSTRACT

Lung cancer is a leading cause of death throughout the world. Because the prompt diagnosis of tumors allows oncologists to discern their nature, type, and mode of treatment, tumor detection and segmentation from CT scan images is a crucial field of study. This paper investigates lung tumor segmentation via a two-dimensional Discrete Wavelet Transform (DWT) on the LOTUS dataset (31,247 training, and 4458 testing samples) and a Deeply Supervised MultiResUNet model. Coupling the DWT, which is used to achieve a more meticulous textural analysis while integrating information from neighboring CT slices, with the deep supervision of the model architecture results in an improved dice coefficient of 0.8472. A key characteristic of our approach is its avoidance of 3D kernels (despite being used for a 3D segmentation task), thereby making it quite lightweight.
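The Dice coefficient reported in this abstract measures the overlap between a predicted mask and the ground-truth mask as twice the number of shared foreground pixels divided by the total foreground pixels in both masks. The sketch below computes it on flattened binary masks; the function name, the flattening, and the convention of returning 1.0 for two empty masks are assumptions made for illustration.

```python
def dice_coefficient(predicted_mask, truth_mask):
    """Dice coefficient of two equal-length binary masks (flattened)."""
    if len(predicted_mask) != len(truth_mask):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(predicted_mask, truth_mask) if p and t)
    total = sum(1 for p in predicted_mask if p) + sum(1 for t in truth_mask if t)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total


if __name__ == "__main__":
    # Toy 2x2 masks flattened row by row, for illustration only.
    print(dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0]))
```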


Subjects
Image Processing, Computer-Assisted , Lung Neoplasms , Humans , Image Processing, Computer-Assisted/methods , Lung Neoplasms/diagnostic imaging , Tomography, X-Ray Computed , Wavelet Analysis