Results 1 - 20 of 33
1.
PeerJ ; 12: e17010, 2024.
Article in English | MEDLINE | ID: mdl-38495766

ABSTRACT

Proteins are indispensable to an organism's viability, reproduction, and other fundamental physiological functions. Identifying essential proteins with conventional biological assays is slow, labor-intensive, and expensive, so computational methods are widely regarded as the fastest and most effective alternative. Although deep learning (DL) is a popular choice in machine learning (ML) applications, it is not recommended for this sequence-feature-based study because high-quality training sets of positive and negative samples are scarce. Some recent DL work does address limited data, and that direction is left to future work. Conventional ML techniques are therefore used here, as they outperformed DL methods in this setting. We propose EPI-SF, a technique that uses ML to identify essential proteins within a protein-protein interaction network (PPIN). Because the protein sequence is the primary determinant of protein structure and function, relevant sequence features are first extracted from the proteins in the PPIN. These features are then used as input to several ML models: XGBoost classifier, AdaBoost classifier, logistic regression (LR), support vector machine classification (SVM), decision tree (DT), random forest (RF), and naïve Bayes (NB). The primary investigation compared the performance of these models on the yeast PPIN.
Among these models, RF was the most effective, with precision, recall, F1-score, and AUC values of 0.703, 0.720, 0.711, and 0.745, respectively. It also outperforms other state-of-the-art methods, both those based on traditional centrality measures such as betweenness centrality (BC) and closeness centrality (CC) and deep learning methods such as DeepEP, as detailed in the results section. Given this favorable performance, EPI-SF is then used to predict novel essential proteins in the human PPIN. Because viruses tend to target essential proteins involved in disease transmission within the human PPIN, the possible involvement of these proteins in COVID-19 and other related severe diseases is also investigated.
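
As a reading aid (not code from the paper), the reported F1-score can be checked against the reported precision and recall, since F1 is their harmonic mean. The function names below are illustrative assumptions.

```python
def precision_recall(tp: int, fp: int, fn: int):
    """Return (precision, recall) from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported for the RF model in the abstract.
print(round(f1_score(0.703, 0.720), 3))  # prints 0.711
```

The result agrees with the F1-score of 0.711 quoted in the abstract.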


Subjects
Protein Interaction Maps; Saccharomyces cerevisiae; Humans; Bayes Theorem; Proteins/chemistry; Machine Learning
2.
Brief Funct Genomics ; 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-38183212

ABSTRACT

The traditional method of drug reuse or repurposing has significantly contributed to the identification of new antiviral compounds and therapeutic targets, enabling rapid response to developing infectious illnesses. This article presents an overview of how modern computational methods are used in drug repurposing for the treatment of viral infectious diseases. These methods utilize data sets that include reviewed information on the host's response to pathogens and drugs, as well as various connections such as gene expression patterns and protein-protein interaction networks. We assess the potential benefits and limitations of these methods by examining monkeypox as a specific example, but the knowledge acquired can be applied to other comparable disease scenarios.

3.
4.
PLoS One ; 18(6): e0286862, 2023.
Article in English | MEDLINE | ID: mdl-37352172

ABSTRACT

Robust semantic segmentation of tumour micro-environment is one of the major open challenges in machine learning enabled computational pathology. Though deep learning based systems have made significant progress, their task agnostic data driven approach often lacks the contextual grounding necessary in biomedical applications. We present a novel fuzzy water flow scheme that takes the coarse segmentation output of a base deep learning framework to then provide a more fine-grained and instance level robust segmentation output. Our two stage synergistic segmentation method, Deep-Fuzz, works especially well for overlapping objects, and achieves state-of-the-art performance in four public cell nuclei segmentation datasets. We also show through visual examples how our final output is better aligned with pathological insights, and thus more clinically interpretable.


Subjects
Deep Learning; Cell Nucleus; Machine Learning; Water; Image Processing, Computer-Assisted
5.
Vaccines (Basel) ; 11(3)2023 Feb 25.
Article in English | MEDLINE | ID: mdl-36992133

ABSTRACT

SARS-CoV-2 is a novel coronavirus that replicates by interacting with host proteins. Identifying virus-host protein-protein interactions could therefore help researchers better understand the virus's disease transmission behavior and identify possible COVID-19 drugs. The International Committee on Taxonomy of Viruses has determined that nCoV is ~89% genetically similar to SARS-CoV, the virus behind the 2003 epidemic. This paper assesses the host-pathogen protein interaction affinity of the coronavirus family, which has 44 different variants. To this end, a GO-semantic scoring function based on Gene Ontology (GO) graphs is provided for determining the binding affinity of any two proteins at the organism level. Based on the availability of GO annotations for their proteins, 11 of the 44 viral variants are considered: SARS-CoV-2, SARS, MERS, Bat coronavirus HKU3, Bat coronavirus Rp3/2004, Bat coronavirus HKU5, Murine coronavirus, Bovine coronavirus, Rat coronavirus, Bat coronavirus HKU4, and Bat coronavirus 133/2005. The fuzzy scoring function is applied to the entire host-pathogen network, with ~180 million potential interactions generated from 19,281 host proteins and around 242 viral proteins. Using the estimated interaction affinity threshold, ~4.5 million potential level-one host-pathogen interactions are computed. The resulting host-pathogen interactome is also validated against state-of-the-art experimental networks. The study is further extended toward drug repurposing by analyzing the FDA-listed COVID drugs.

6.
Comput Biol Med ; 152: 106329, 2023 01.
Article in English | MEDLINE | ID: mdl-36473342

ABSTRACT

In the present work, we explore the potential of a copula-based ensemble of convolutional neural networks (CNNs) over individual classifiers for malignancy identification in histopathology and cytology images. We propose a copula-based model that integrates three of the best-performing CNN architectures, namely DenseNet-161/201, ResNet-101/34, and InceptionNet-V3. The limitation of small datasets is circumvented using a fuzzy-template-based data augmentation technique that intelligently selects multiple regions of interest (ROIs) from an image. Combined with the ensemble technique, the proposed data augmentation framework surpasses the performance of the individual CNNs in malignancy prediction on breast cytology and histopathology datasets. The proposed method achieves accuracies of 84.37%, 97.32%, and 91.67% on the JUCYT, BreakHis, and BI datasets, respectively. This automated technique can serve as a useful guide to pathologists in reaching the appropriate diagnostic decision with less time and effort. The code for the proposed ensemble model is publicly available on GitHub.


Subjects
Breast Neoplasms; Humans; Female; Breast Neoplasms/diagnostic imaging; Breast Neoplasms/pathology; Neural Networks, Computer; Breast/diagnostic imaging; Breast/pathology
7.
Front Genet ; 13: 969915, 2022.
Article in English | MEDLINE | ID: mdl-36246645

ABSTRACT

Protein function prediction is gradually emerging as an essential field in biological and computational studies. Although computational approaches have gained a significant foothold, information gathered from multiple sources has been observed to be more influential than information derived from a single source. With this in mind, we propose PFP-GO, a methodology in which heterogeneous sources (protein sequence, protein domain, and protein-protein interaction network) are processed separately to rank each individual functional GO term. Based on this ranking, GO terms are propagated to the target proteins. During GO ranking, the protein sequence contributes sequence-based information, while protein domains and protein-protein interaction networks contribute structural/functional and topological information, respectively. PFP-GO is evaluated using Precision, Recall, and F-Score, and performs reasonably better than existing state-of-the-art methods, achieving an overall Precision, Recall, and F-Score of 0.67, 0.58, and 0.62, respectively. Furthermore, we examine some of the top-ranked GO terms predicted by PFP-GO through multilayer network propagation that affect the 3D structure of the genome. The complete source code of PFP-GO is freely available at https://sites.google.com/view/pfp-go/.

8.
Vaccines (Basel) ; 10(10)2022 Sep 30.
Article in English | MEDLINE | ID: mdl-36298508

ABSTRACT

Recent research has highlighted that a large share of druggable protein targets in the human interactome remains unexplored for various diseases. Exploring them could support drug repurposing studies and help in the in-silico prediction of new drug-human protein target interactions. The same applies to the COVID-19 pandemic. Identifying potential human drug targets for COVID-19 with a machine learning approach is highly desirable because it saves time and labor compared to traditional experimental methods. Structure-based drug discovery, in which druggability is determined by molecular docking, is appropriate only for proteins whose three-dimensional structures are available. Machine learning algorithms that learn features differentiating targets from non-targets can also be applied to proteins whose 3-D structures are unavailable. In this research, a Machine Learning-based Drug Target Discovery (ML-DTD) approach is proposed. A machine learning model is first built and tested on a curated dataset of COVID-19 human drug targets and non-targets, formed using the Therapeutic Target Database (TTD) and the human interactome, with several classifiers: XGBoost, AdaBoost, Logistic Regression, Support Vector Classification, Decision Tree, Random Forest, Naive Bayes, and K-Nearest Neighbour (KNN). The protein features include Gene Set Enrichment Analysis (GSEA) ranking, properties derived from the protein sequence, and encoded protein network centrality-based measures. Among these classifiers, the XGBoost, KNN, and Random Forest models perform satisfactorily and consistently. The model is then used to predict novel COVID-19 human drug targets, which are further validated by target pathway analysis, the emergence of allied repurposed drugs, and a subsequent docking study.

9.
Cells ; 11(17)2022 08 25.
Article in English | MEDLINE | ID: mdl-36078056

ABSTRACT

Proteins are vital for the significant cellular activities of living organisms, but not all of them are essential. Identifying essential proteins through biological experiments is more laborious and time-consuming than the computational approaches used in recent times. However, conventional methods are sometimes hard to apply in practice because they perform poorly in specific scenarios. Thus, more developed and efficient computational prediction models are required for essential protein identification. This research proposes an effective methodology capable of predicting essential proteins in a refined yeast protein-protein interaction network (PPIN). The rule-based refinement uses protein complex and local interaction density information derived from the neighborhood properties of proteins in the network. Identifying and pruning non-essential proteins is equally crucial. In the first phase, node and edge weights are applied to identify and discard non-essential proteins from the interaction network, with three cut-off levels considered for each node and edge weight. Once the PPIN has been filtered, the second phase applies two centrality-based approaches in succession to identify the essential proteins in the yeast PPIN: (1) local interaction density (LID) and (2) local interaction density with protein complex (LIDC). The proposed methodology performs better than existing state-of-the-art techniques.
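
The abstract does not give the formula for local interaction density, so the following is only a sketch of one plausible neighborhood-density score: the number of interactions among a protein's neighbors divided by the number of neighbors. The function names, the formula, and the toy network are all assumptions for illustration, not the paper's LID or LIDC definitions.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map, ignoring self-interactions."""
    adj = {}
    for a, b in edges:
        if a == b:
            continue
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def local_density(adj, node):
    """Assumed score: edges among the node's neighbors / number of neighbors."""
    neighbors = adj.get(node, set())
    if not neighbors:
        return 0.0
    links = sum(
        1 for a, b in combinations(sorted(neighbors), 2) if b in adj.get(a, set())
    )
    return links / len(neighbors)


# Hypothetical toy PPIN: P1, P2, P3 form a triangle; P4 hangs off P1.
edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P1", "P4")]
adj = build_adjacency(edges)

# Rank proteins by descending score, breaking ties by name.
ranking = sorted(adj, key=lambda n: (-local_density(adj, n), n))
print(ranking)
```

A ranking of this kind would typically be cut at a chosen top-k to nominate candidate essential proteins; where the cut falls is a separate design decision.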


Subjects
Protein Interaction Maps; Saccharomyces cerevisiae; Proteins/metabolism; Saccharomyces cerevisiae/metabolism
10.
Article in English | MEDLINE | ID: mdl-32750875

ABSTRACT

Over the years, several methods have been proposed for computational PPI prediction, each with a different performance evaluation strategy. When benchmarking performance scores, most of these methods suffer from ill-treated cross-validation strategies, ad hoc selection of positive/negative samples, and similar problems. To address these issues, our proposed multi-level feature-based PPI prediction approach (JUPPI), which uses sequence, domain, and GO information as features, introduces a refined evaluation strategy. During evaluation, we first extract high-quality negative data using three-stage filtering, and then introduce a pair-input-based cross-validation strategy with three difficulty levels for test-set predictions. This evaluation strategy reduces the component-level overlap issue in test sets. JUPPI is compared with state-of-the-art approaches in this domain and tested on six independent PPI datasets. On almost all the datasets, JUPPI outperforms the state of the art, not only for PPI prediction at the human proteome level but also for predicting interactors of intrinsically disordered human proteins. The JUPPI tool and the developed datasets (JUPPId) are available in the public domain for academic use (https://figshare.com/projects/JUPPI_A_Multi-level_Feature_Based_Method_for_PPI_Prediction_and_a_Refined_Strategy_for_Performance_Assessment/81656), along with supplementary materials, which can be found on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TCBB.2020.3004970.
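
A common way to define three difficulty levels for pair-input test sets is by how many of a test pair's two proteins also occur in the training pairs. The sketch below assumes that reading; the abstract does not spell out JUPPI's exact levels, and the function and label names are illustrative.

```python
def pair_difficulty(test_pair, train_pairs):
    """Label a test pair by how many of its proteins were seen in training.

    Assumed levels: both proteins seen (easiest), one seen, none seen (hardest).
    """
    seen = {protein for pair in train_pairs for protein in pair}
    shared = sum(1 for protein in test_pair if protein in seen)
    return {2: "both_seen", 1: "one_seen", 0: "none_seen"}[shared]


def split_by_difficulty(test_pairs, train_pairs):
    """Group test pairs into the three assumed difficulty levels."""
    groups = {"both_seen": [], "one_seen": [], "none_seen": []}
    for pair in test_pairs:
        groups[pair_difficulty(pair, train_pairs)].append(pair)
    return groups


# Hypothetical protein identifiers.
train = [("A", "B"), ("C", "D")]
test = [("A", "C"), ("A", "X"), ("X", "Y")]
print(split_by_difficulty(test, train))
```

Reporting scores separately per group makes it visible when a model's performance depends on having already seen the component proteins during training.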


Subjects
Computational Biology; Proteins; Humans
11.
Methods ; 203: 564-574, 2022 07.
Article in English | MEDLINE | ID: mdl-34455072

ABSTRACT

With the gradual increase in the COVID-19 mortality rate, there is an urgent need for an effective drug or vaccine. Several drugs, including Remdesivir, Azithromycin, Favipiravir, Ritonavir, and Darunavir, are under evaluation in more than 300 clinical trials to treat COVID-19. Several vaccines, including Pfizer-BioNTech, Moderna, Johnson & Johnson's Janssen, Sputnik V, Covishield, and Covaxin, have also emerged from research. While a few of them have already been approved, others show encouraging results and are still under assessment. In parallel, there has been significant progress in new drug development. However, since the approval of new molecules takes substantial time, drug repurposing studies have also gained considerable momentum. The primary agent of COVID-19 disease progression is SARS-CoV-2/nCoV, which is believed to have ~89% genetic resemblance to SARS-CoV, the coronavirus responsible for the massive outbreak in 2003. With this hypothesis, Human-SARS-CoV protein interactions are used to develop an in-silico Human-nCoV network by identifying potential COVID-19 human spreader proteins using the SIS model and fuzzy thresholding, validated against the targets of possible COVID-19 FDA drugs. First, the complete list of FDA drugs is identified for the level-1 and level-2 spreader proteins in this network, followed by a drug consensus scoring strategy. The same consensus strategy is used in a second analysis, but on a curated overlapping set of key genes/proteins identified from COVID-19 symptoms. Validation by a subsequent docking study has also been performed on potential COVID-19 drugs with the available major COVID-19 crystal structures, whose PDB IDs are 6LU7, 6M2Q, 6W9C, 6M0J, 6M71, and 6VXX. Our computational study and docking results suggest that Fostamatinib (with R406 as its active metabolite) may also be considered a potential candidate for further clinical trials in the effort to counter the spread of COVID-19.


Subjects
COVID-19 Drug Treatment; Drug Repositioning; Aminopyridines; Antiviral Agents/pharmacology; Antiviral Agents/therapeutic use; ChAdOx1 nCoV-19; Drug Repositioning/methods; Humans; Molecular Docking Simulation; Morpholines; Pyrimidines; RNA, Viral; SARS-CoV-2
12.
Methods ; 203: 488-497, 2022 07.
Article in English | MEDLINE | ID: mdl-34902553

ABSTRACT

The novel coronavirus (SARS-CoV-2) replicates in the host cell by interacting with host proteins. Identifying virus-host protein-protein interactions could therefore help in understanding the disease transmission behavior of the virus and in identifying potential COVID-19 drugs. The International Committee on Taxonomy of Viruses (ICTV) has declared that nCoV is highly genetically similar (∼89% similarity) to SARS-CoV, which caused the 2003 epidemic. With this hypothesis, the present work develops a computational model for the nCoV-Human protein interaction network, using the experimentally validated SARS-CoV-Human protein interactions. First, level-1 and level-2 human spreader proteins are identified in the SARS-CoV-Human interaction network using the Susceptible-Infected-Susceptible (SIS) model. These proteins are considered potential human targets for nCoV bait proteins. A gene-ontology-based fuzzy affinity function is then used to construct the nCoV-Human protein interaction network at a ∼99.98% specificity threshold. This identifies 37 level-1 human spreaders for COVID-19 in the human protein-interaction network, and 2474 level-2 human spreaders are subsequently identified using the SIS model. The derived host-pathogen interaction network is finally validated using six potential FDA-listed drugs for COVID-19, with significant overlap between the known drug target proteins and the identified spreader proteins.
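
To illustrate the Susceptible-Infected-Susceptible (SIS) dynamics mentioned above, here is a minimal discrete-time sketch on a toy network. It is a generic textbook-style SIS simulation, not the paper's implementation; the parameter names `beta` (infection probability per contact) and `gamma` (recovery probability per step) and the toy node names are assumptions.

```python
import random


def simulate_sis(adj, seeds, beta, gamma, steps, rng):
    """Run a discrete-time SIS process and return (final infected set, size history)."""
    infected = set(seeds)
    history = [len(infected)]
    for _ in range(steps):
        newly_infected = set()
        # Each infected node tries to infect each susceptible neighbor.
        for node in sorted(infected):
            for neighbor in sorted(adj.get(node, ())):
                if neighbor not in infected and rng.random() < beta:
                    newly_infected.add(neighbor)
        # Infected nodes recover and become susceptible again.
        recovered = {node for node in sorted(infected) if rng.random() < gamma}
        infected = (infected - recovered) | newly_infected
        history.append(len(infected))
    return infected, history


# Hypothetical toy network: viral protein V1 touches H1 (level 1),
# which in turn touches H2 and H3 (level 2).
toy_network = {"V1": {"H1"}, "H1": {"V1", "H2", "H3"}, "H2": {"H1"}, "H3": {"H1"}}
final, history = simulate_sis(
    toy_network, {"V1"}, beta=1.0, gamma=0.0, steps=2, rng=random.Random(0)
)
print(sorted(final), history)
```

With `beta=1.0` and `gamma=0.0` the infection reaches the level-1 neighbor after one step and the level-2 neighbors after two, which mirrors the level-1/level-2 spreader terminology; realistic runs would use intermediate probabilities and many repetitions.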


Subjects
COVID-19; SARS-CoV-2; Computer Simulation; Humans; Protein Interaction Maps/genetics; Proteins; RNA, Viral; SARS-CoV-2/genetics
13.
PeerJ ; 9: e12117, 2021.
Article in English | MEDLINE | ID: mdl-34567845

ABSTRACT

The entire world is witnessing the coronavirus pandemic (COVID-19), caused by a novel coronavirus (n-CoV) known as Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). SARS-CoV-2 promotes fatal chronic respiratory disease followed by multiple organ failure, which can ultimately lead to death. The International Committee on Taxonomy of Viruses (ICTV) has reached a consensus that SARS-CoV-2 is highly genetically similar (up to 89%) to the Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV), which had an outbreak in 2003. With this hypothesis, the current work focuses on identifying the spreader nodes in the SARS-CoV-human protein-protein interaction network (PPIN) to find a possible link with the disease propagation pattern of the current pandemic. Various PPIN characteristics, such as edge ratio, neighborhood density, and node weight, are explored to define a new feature, the spreadability index, by which spreader proteins and protein-protein interactions (in the form of network edges) are identified. Top spreader nodes with a high spreadability index are validated with the Susceptible-Infected-Susceptible (SIS) disease model, first on a synthetic PPIN and then on a SARS-CoV-human PPIN. The ranked edges highlight the path of disease propagation from SARS-CoV to the human PPIN (up to the level-2 neighborhood). The developed network attribute, the spreadability index, together with the generated SIS model, performs better than existing state-of-the-art network centrality-based methodologies.

14.
Int J Mol Sci ; 22(18)2021 Sep 14.
Article in English | MEDLINE | ID: mdl-34576064

ABSTRACT

S-palmitoylation is a reversible covalent post-translational modification of the cysteine thiol side chain by palmitic acid. It plays a critical role in a variety of biological processes and is implicated in several human diseases. Identifying the specific sites of this modification is therefore crucial for understanding its functional consequences in physiology and pathology. We present a random forest (RF) classifier-based consensus strategy (RFCM-PALM) for predicting palmitoylated cysteine sites on synaptic proteins from male/female mouse data. To design the prediction model, we introduce a heuristic strategy for selecting the optimum set of physicochemical features from the AAIndex dataset using (a) K-Best (KB) features, (b) a genetic algorithm (GA), and (c) a union (UN) of KB- and GA-based features. Decisions from the best-trained KB-, GA-, and UN-based classifiers are then combined through a three-star quality consensus strategy to further refine and enhance the scores of the individual models. The experiment is carried out on three categorized synaptic protein datasets (male mouse, female mouse, and combined male + female); in each group, weighted data is used for training and knock-out data is used as the hold-out set for performance evaluation and comparison. RFCM-PALM shows ~80% area under curve (AUC) score in all three categories of datasets and achieves a 10% average accuracy improvement (male: 15%, female: 15%, combined: 7%) on the hold-out set compared to the state-of-the-art approaches. To summarize, our method, with efficient feature selection and a novel consensus strategy, shows significant performance gains in predicting S-palmitoylation sites in mouse datasets.
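
The abstract does not say how the three-star consensus assigns stars. One simple reading, assumed here purely for illustration, is one star per classifier (KB, GA, UN) that predicts a site as palmitoylated; the site labels and function names below are hypothetical.

```python
def star_rating(votes):
    """Count how many of the classifiers voted positive for a site (assumed: one star each)."""
    return sum(1 for predicted_positive in votes.values() if predicted_positive)


def consensus_sites(predictions, min_stars=2):
    """Return the sites whose star rating reaches the chosen threshold, sorted by name."""
    return sorted(
        site for site, votes in predictions.items() if star_rating(votes) >= min_stars
    )


# Hypothetical per-site votes from the KB-, GA-, and UN-based classifiers.
predictions = {
    "C12": {"KB": True, "GA": True, "UN": True},
    "C45": {"KB": True, "GA": False, "UN": True},
    "C78": {"KB": False, "GA": False, "UN": True},
}
print(consensus_sites(predictions, min_stars=2))  # prints ['C12', 'C45']
```

Raising `min_stars` to 3 keeps only unanimous predictions, trading recall for precision.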


Subjects
Algorithms; Computer Simulation; Lipoylation; Nerve Tissue Proteins/metabolism; Synapses/metabolism; Animals; Databases, Protein; Female; Male; Mice
15.
IEEE Trans Med Imaging ; 40(12): 3919-3931, 2021 12.
Article in English | MEDLINE | ID: mdl-34329158

ABSTRACT

This paper proposes a novel local feature descriptor, the local instant-and-center-symmetric neighbor-based pattern of the extrema-images (LINPE), to detect breast abnormalities in thermal breast images. It is a hybrid descriptor that combines two feature descriptors: the inverse-probability difference extrema (IpDE) and the local instant and center-symmetric neighbor-based pattern (LICsNP). IpDE computes an intensity-inhomogeneity-invariant feature-based image of the breast thermogram, and LICsNP captures the local microstructure pattern information in the IpDE image. A new paradigm, the Broad Learning (BL) network, is introduced as a classifier to efficiently differentiate healthy and sick breast thermograms. The efficacy of the proposed system is quantitatively validated on images from the DMR-IR and DBT-TU-JU databases. Extensive experimentation on these databases, with average accuracies of 96.90% and 94%, respectively, demonstrates the proposed system's superiority over related state-of-the-art methods in differentiating healthy and sick breast thermograms. The proposed system also performs consistently in the presence of noise and rotational changes.


Subjects
Breast; Thermography; Breast/diagnostic imaging; Databases, Factual
16.
J Med Syst ; 45(2): 25, 2021 Jan 16.
Article in English | MEDLINE | ID: mdl-33452582

ABSTRACT

The microscope is one of the most widely used pieces of pathological equipment for analyzing body fluids such as blood and sputum at a granular level. To reduce the workload on pathologists and strengthen telehealth services, an automatic self-focusing microscope with a mechanism for collecting images from different fields is required. This work discusses the conversion of a compound microscope into a fully digital, self-focusing automatic microscope with an intelligent field image collection mechanism. The method uses a passive autofocusing technique: the most informative regions are identified on the basis of texture information, and features from these regions are used to autofocus the microscope. The system can collect multiple snapshots from different regions of smear sample slides. A smear has non-uniform thickness across the glass slide, so some regions have a very thick layer and others a very thin layer; in general, neither is considered for pathological analysis. The proposed system can detect the regions of the smear slide that are suitable for collecting snapshot images, using a soft computing approach to detect the desired regions of the sample. A Raspberry Pi is used for the control section, and multi-threaded parallel programming is used to optimize I/O execution and waiting time. The performance of the proposed system is satisfactory: the average peak signal-to-noise ratio (PSNR) is about 33 in comparison with manual focusing by a domain expert. In terms of computation time, measured on a benchmark microscopic image dataset, the system performs better than other learning-based methods. Autofocusing of a pathological microscope with an intelligent field image collection mechanism is highly useful in the remote healthcare domain.
This work describes a mechanism for converting a conventional compound microscope into a telehealth-compatible (IoT-enabled) microscope. The system is highly suitable for developing countries, where an overall change of existing infrastructure is difficult for economic reasons.
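
For readers unfamiliar with the PSNR figure quoted above, the standard definition is 10·log10(MAX² / MSE), where MSE is the mean squared error between a reference image and a test image. The sketch below assumes 8-bit grayscale images stored as nested lists, and that the reference is the manually focused image; neither detail is stated in the abstract.

```python
import math


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized grayscale images."""
    if len(reference) != len(test) or any(
        len(r) != len(t) for r, t in zip(reference, test)
    ):
        raise ValueError("images must have the same shape")
    pixel_count = sum(len(row) for row in reference)
    squared_error = sum(
        (a - b) ** 2 for r, t in zip(reference, test) for a, b in zip(r, t)
    )
    mse = squared_error / pixel_count
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)


# Tiny hypothetical 2x2 images differing in one pixel.
manual_focus = [[0, 0], [0, 0]]
auto_focus = [[0, 0], [0, 10]]
print(psnr(manual_focus, auto_focus))
```

Higher values mean the autofocused image is closer to the reference.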


Subjects
Image Processing, Computer-Assisted; Microscopy; Humans; Software; Sputum
17.
J Bioinform Comput Biol ; 17(4): 1950025, 2019 08.
Article in English | MEDLINE | ID: mdl-31617461

ABSTRACT

Computational prediction of the functional annotation of proteins is an uphill task. There is an ever-increasing gap between the functional characterization of protein sequences and the deluge of protein sequences generated by large-scale sequencing projects. The functions of proteins can be inferred from their interactions with each other, which are dynamic in nature and are mostly influenced by changes of state or stimuli. In this work, we use a dynamic protein-protein interaction network (PPIN), time-course gene expression data, and protein sequence information to predict the functional annotation of proteins. During the progression of a particular function, not all proteins are active at all time points. For unannotated active proteins, our proposed methodology explores the dynamic PPIN consisting of level-1 and level-2 neighboring proteins at different time points, filtered by the Damerau-Levenshtein edit distance, which estimates the similarity between two protein sequences, and by coefficient of variation methods, which assess the strength of an edge in the network. Finally, from the filtered dynamic PPIN, at each time point, functional annotations of the level-2 proteins are assigned to the unknown and unannotated active proteins through the level-1 neighbors, following a bottom-up strategy. Our proposed methodology achieves an average precision, recall, and F-Score of 0.59, 0.76, and 0.61, respectively, which is significantly higher than the reported state-of-the-art methods.
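
As an illustration of the edit distance named above, the sketch below implements the optimal string alignment (restricted) variant of Damerau-Levenshtein distance, which counts insertions, deletions, substitutions, and adjacent transpositions. Which variant the paper uses is not stated, and the normalized `similarity` helper is an assumption added for illustration.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "KT" <-> "TK".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 minus distance divided by the longer length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - osa_distance(a, b) / longest


print(osa_distance("MKT", "MTK"))  # prints 1 (one adjacent transposition)
```

The dynamic-programming table costs O(len(a) × len(b)) time and memory, so long protein sequences may call for a banded or two-row variant.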


Subjects
Gene Expression; Protein Interaction Mapping/methods; Proteins/metabolism; Computational Biology/methods; Databases, Genetic; Fungal Proteins/genetics; Fungal Proteins/metabolism; Proteins/genetics; Yeasts/genetics
18.
PeerJ ; 7: e6830, 2019.
Article in English | MEDLINE | ID: mdl-31198622

ABSTRACT

Proteins are the most versatile macromolecules in living systems and perform crucial biological functions. In the post-genomic era, next-generation sequencing is done routinely at the population scale for a variety of species. The challenge is to determine, at scale, the functions of proteins that have not yet been characterized by detailed experimental studies. Identifying protein functions experimentally is a laborious, time-consuming, and resource-intensive task. We therefore propose an automated protein function prediction methodology using in silico algorithms trained on carefully curated experimental datasets. We present the improved protein function prediction tool FunPred 3.0, an extended version of our previous methodology FunPred 2, which exploits neighborhood properties in the protein-protein interaction network (PPIN) and physicochemical properties of amino acids. Our method is validated using the available functional annotations in the PPIN of Saccharomyces cerevisiae in the latest Munich Information Center for Protein Sequences (MIPS) dataset. The PPIN data of S. cerevisiae in the MIPS dataset include 4,554 unique proteins in 13,528 protein-protein interactions after the elimination of the self-replicating and self-interacting protein pairs. Using FunPred 3.0, we achieve mean precision, recall, and F-score values of 0.55, 0.82, and 0.66, respectively. FunPred 3.0 is then used to predict the functions of unpredicted protein pairs (those with incomplete and missing functional annotations) in the MIPS dataset of S. cerevisiae. The method can also predict the subcellular localization of proteins along with their corresponding functions. The code and the complete prediction results are freely available at: https://github.com/SovanSaha/FunPred-3.0.git.

19.
IEEE/ACM Trans Comput Biol Bioinform ; 16(6): 1773-1784, 2019.
Article in English | MEDLINE | ID: mdl-29993556

ABSTRACT

We present a human protein cluster analysis that combines 1) n-gram-based amino acid frequency features, 2) optimal feature selection, 3) hierarchical clustering, and 4) advanced partitioning techniques. Our method qualitatively and quantitatively groups proteins with increasing sequence similarity into similar clusters by calculating the frequency model of amino acids using n-grams. We experiment with n = 1 (unigrams), n = 2 (bigrams), and n = 3 (trigrams) for optimal selection of features to design the 3gClust algorithm. Benchmarking results on 20,105 manually curated human proteins show that 3gClust ensures better cluster compactness for proteins with similar functional groups, biological processes, structural alignment, and shared domains (e.g., aquaporins, keratins). Quantitative analysis of non-singleton clusters shows significant improvement in their compactness compared to other state-of-the-art methodologies. 3gClust and its datasets are available at https://sites.google.com/site/bioinfoju/projects/3gclust for academic use, along with supplementary materials, which can be found on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TCBB.2018.2840996.
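
The n-gram frequency features described above can be sketched as follows. This is a minimal illustration of counting overlapping n-grams in an amino acid sequence and normalizing them to relative frequencies; it is not the 3gClust code, and the function names and the toy sequence are assumptions.

```python
from collections import Counter


def ngram_frequencies(sequence: str, n: int) -> dict:
    """Relative frequency of each overlapping n-gram in a sequence."""
    if n < 1:
        raise ValueError("n must be at least 1")
    grams = [sequence[i:i + n] for i in range(len(sequence) - n + 1)]
    if not grams:
        return {}  # sequence shorter than n
    counts = Counter(grams)
    return {gram: count / len(grams) for gram, count in counts.items()}


def feature_vector(sequence: str, max_n: int = 3) -> dict:
    """Merge unigram, bigram, and trigram frequencies into one feature map."""
    features = {}
    for n in range(1, max_n + 1):
        # Keys of different lengths cannot collide, so a plain update is safe.
        features.update(ngram_frequencies(sequence, n))
    return features


print(ngram_frequencies("AAB", 2))  # prints {'AA': 0.5, 'AB': 0.5}
```

Feature maps like these can be turned into fixed-length vectors over a shared n-gram vocabulary before feature selection and clustering.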


Subjects
Cluster Analysis; Computational Biology/methods; Proteins/chemistry; Transferases/chemistry; Algorithms; Cell Membrane/metabolism; Computer Simulation; Databases, Protein; Humans; Machine Learning; Phylogeny; Protein Conformation; Sequence Alignment
20.
IEEE Trans Med Imaging ; 38(2): 572-584, 2019 02.
Article in English | MEDLINE | ID: mdl-30176582

ABSTRACT

Segmentation of suspicious regions (SRs) of a thermal breast image (TBI) is a significant and challenging problem in the identification of breast cancer. In this work, we propose an active contour model for segmenting the SRs in a TBI. The proposed segmentation method combines three steps. First, a novel method, called smaller-peaks corresponding to the high-intensity-pixels and the centroid-knowledge of SRs (SCH-CS), approximately locates the SRs, whose contours are later used as the initial evolving curves of the level set method (LSM). Second, a new energy functional, called different local priorities embedded (DLPE), is proposed for the level set function. DLPE is then minimized using interleaved level set evolution to segment the potential SRs in the TBI more accurately. Finally, a new stopping criterion is incorporated into the proposed LSM. The proposed LSM not only increases the segmentation speed but also improves the segmentation accuracy. The performance of our SR segmentation method was evaluated on two TBI databases, DMR-IR and DBT-TU-JU, and the average segmentation accuracies obtained on these databases are 72.18% and 71.26%, respectively, which are better than those of other state-of-the-art methods. In addition, a novel framework for analyzing TBIs is proposed for differentiating abnormal and normal breasts on the basis of the segmented SRs. We also show experimentally that investigating only the SRs instead of the whole breast is more effective in differentiating abnormal and normal breasts.


Subjects
Breast/diagnostic imaging; Image Processing, Computer-Assisted/methods; Thermography/methods; Algorithms; Breast Neoplasms/diagnostic imaging; Databases, Factual; Female; Humans