Results 1 - 14 of 14
1.
Front Genet ; 14: 1095330, 2023.
Article in English | MEDLINE | ID: mdl-36865387

ABSTRACT

In the current era, handling biomedical big data is a challenging task. In particular, integrating multi-modal data and then mining significant features (gene signature detection) is a daunting task. With this in mind, we propose a novel framework, namely, three-factor penalized, non-negative matrix factorization-based multiple kernel learning with soft margin hinge loss (3PNMF-MKL), for multi-modal data integration followed by gene signature detection. In brief, limma, which employs empirical Bayes statistics, was first applied to each individual molecular profile to extract the statistically significant features; the three-factor penalized non-negative matrix factorization method was then used for data/matrix fusion on the reduced feature sets. Multiple kernel learning models with soft margin hinge loss were deployed to estimate average accuracy scores and the area under the curve (AUC). Gene modules were identified by consecutive application of average linkage clustering and dynamic tree cut. The module with the highest correlation was considered the potential gene signature. We used an acute myeloid leukemia dataset from The Cancer Genome Atlas (TCGA) repository containing five molecular profiles. Our algorithm generated a 50-gene signature that achieved a high classification AUC score (viz., 0.827). We explored the functions of the signature genes using pathway and Gene Ontology (GO) databases. Our method outperformed state-of-the-art methods in terms of AUC. Furthermore, we included comparative studies with other related methods to support the acceptability of our method. Finally, it should be noted that our algorithm can be applied to any multi-modal dataset for data integration followed by gene module discovery.
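
The abstract above evaluates classifiers trained with a soft margin hinge loss by their AUC. As a minimal sketch of those two quantities only (not the 3PNMF-MKL pipeline itself), the following pure-Python code computes a mean hinge loss and a pairwise-comparison AUC. The function names `hinge_loss` and `auc_score` and the toy numbers are assumptions made for illustration; they do not come from the paper.

```python
def hinge_loss(labels, decision_values):
    """Mean soft-margin hinge loss; labels are expected to be +1 or -1."""
    losses = [max(0.0, 1.0 - y * f) for y, f in zip(labels, decision_values)]
    return sum(losses) / len(losses)


def auc_score(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    This equals the fraction of (positive, negative) pairs in which the
    positive sample gets the higher score; ties count as half a win.
    Labels greater than zero are treated as the positive class.
    """
    positives = [s for y, s in zip(labels, scores) if y > 0]
    negatives = [s for y, s in zip(labels, scores) if y <= 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical toy labels and classifier scores, for illustration only.
    toy_labels = [1, 1, -1, -1]
    toy_scores = [0.9, 0.4, 0.5, 0.1]
    print(auc_score(toy_labels, toy_scores))
    print(hinge_loss([1, -1], [2.0, 0.5]))
```

The pairwise form is quadratic in the number of samples, which is acceptable for a small illustration; a rank-based computation would be the usual choice for larger datasets.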

2.
SN Comput Sci ; 3(5): 352, 2022.
Article in English | MEDLINE | ID: mdl-35789572

ABSTRACT

Probabilistic regression is a statistical technique and a central problem in machine learning: it uses a set of machine learning methods to forecast a continuous target variable from the values of one or more predictor variables. COVID-19 is caused by a virulent virus that has brought the whole world to a standstill. The virus's capacity for human-to-human transmission makes the world a dangerous place. This article forecasts the upcoming course of the coronavirus outbreak, with the aim of helping to subdue it. We performed conditional GAN (CGAN) regression to forecast subsequent COVID-19 cases in five countries. The CGAN variant of the GAN is used to design the model and predict COVID-19 cases 3 months ahead with the least error for the dataset provided. Each country is examined individually because of differences in population size, tradition, medical management and preventive measures. The analysis is based on confirmed data provided by the World Health Organization. This paper investigates how conditional Generative Adversarial Networks (GANs) can be used to accurately model intricate conditional distributions. GANs have achieved spectacular success in producing complex high-dimensional data, but little work has been done on their use for regression problems. This paper shows how conditional GANs can be employed in probabilistic regression. It is shown that conditional GANs can estimate a wide range of distributions and be competitive with existing probabilistic regression models.

3.
Transbound Emerg Dis ; 69(6): 3896-3905, 2022 Nov.
Article in English | MEDLINE | ID: mdl-36379049

ABSTRACT

RNA sequence data from SARS-CoV-2 patients help to construct a gene network related to this disease. A detailed analysis of the human host response to SARS-CoV-2, with expression profiling by high-throughput sequencing, has been carried out with primary human lung epithelial cell lines. Using these data, clustered gene annotation and gene network construction are performed with the help of the String database. Among the four clusters identified, only one, with 44 genes, could be annotated. Interestingly, this cluster corresponded to basal cells with p = 1.37e-05, which is relevant for respiratory tract infection. Functional enrichment analysis of the genes present in the gene network has been completed using the String database and the Network Analyst tool. Among three types of cell-cell communication, only the anchoring junction between the basal cell membrane and the basal lamina in the host cell is involved in virus transmission. At this junction point, a hemidesmosome structure plays a vital role in virus spread from one cell to the basal lamina in the respiratory tract. In this protein complex structure, different integrin protein molecules of the host cell are used to promote the spread of virus infection into the extracellular matrix. So, small-molecule blockers of different anchoring junction proteins, such as integrin alpha 3 and integrin beta 1, can provide efficient protection against this deadly viral disease. ORF8 from the SARS-CoV-2 virus can interact with both integrin proteins of the human host. Using a molecular docking technique, a ternary complex of these three proteins is modelled. Several oligopeptides are predicted as modulators for this ternary complex. In silico analysis of these modulators is very important for developing novel therapeutics for the treatment of SARS-CoV-2.


Subjects
COVID-19, Severe Acute Respiratory Syndrome, Humans, Animals, COVID-19/veterinary, SARS-CoV-2/genetics, Molecular Docking Simulation, Severe Acute Respiratory Syndrome/veterinary, Cell Communication, Integrins
4.
Phys Eng Sci Med ; 45(2): 601-612, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35575961

ABSTRACT

Finding components in multi-channel EEG signals for localizing and detecting the onset of seizures is a new approach in biomedical signal analysis. In recent works, tensor-based approaches are used to fit the components into multi-dimensional arrays. We first decompose the EEG signals into the Beta band using the discrete wavelet transform (DWT). We compare patient templates with a normal template in a cross-wavelet analysis to obtain the Wavelet cross spectrum (WCS) and Wavelet cross coherence coefficients. Next, we apply parallel factorization (PARAFAC) modeling, a three-way tensor-based representation in the channel, frequency and time-point dimensions, to the features. Finally, we use an ensemble classifier to detect the seizure-free, onset and seizure classes. The clinical dataset for this work comprises 5 normal subjects and 6 epileptiform patients. For seizure detection, the Ensemble Bagged-Trees classifier achieves 82.21% accuracy with WCS features on the PARAFAC model and a higher 84.76% accuracy with Wavelet Coherence features. The results have been compared with the well-known Fine Gaussian SVM, Weighted KNN and Ensemble Subspace KNN classifiers. The aim is to analyze data over three dimensions, namely time, frequency and space (channels). This EEG-based analysis is significant and effective as an automatic method for detecting a seizure before its actual manifestation.
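
The first step in the abstract above is a DWT decomposition of each EEG channel into a frequency band. The abstract does not name the mother wavelet or the sampling rate, so the following sketch assumes the Haar wavelet purely because it is the simplest to write out; the function names `haar_dwt_level` and `haar_dwt` are illustrative. Each decomposition level halves the frequency range of the approximation, so with an assumed sampling rate of 256 Hz the level-3 detail coefficients would cover roughly 16 to 32 Hz, which overlaps the Beta band.

```python
import math


def haar_dwt_level(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    scale = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    return approx, detail


def haar_dwt(signal, levels):
    """Multi-level Haar DWT.

    Returns the final approximation and a list of detail-coefficient lists,
    ordered from level 1 (highest frequencies) to the deepest level.
    """
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_dwt_level(approx)
        details.append(detail)
    return approx, details


if __name__ == "__main__":
    # Hypothetical 8-sample signal standing in for one EEG channel.
    samples = [4.0, 2.0, 6.0, 6.0, 1.0, 3.0, 5.0, 7.0]
    final_approx, detail_levels = haar_dwt(samples, levels=3)
    print(final_approx)
    print(detail_levels)
```

Because the Haar transform is orthonormal, the sum of squared coefficients across the approximation and all detail levels matches the energy of the input signal, which is a convenient sanity check.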


Subjects
Electroencephalography, Epilepsy, Algorithms, Electroencephalography/methods, Epilepsy/diagnostic imaging, Humans, Seizures/diagnostic imaging, Wavelet Analysis
5.
BMC Bioinformatics ; 12 Suppl 9: S18, 2011 Oct 05.
Article in English | MEDLINE | ID: mdl-22151970

ABSTRACT

BACKGROUND: Uncovering the relationship between conserved chromosomal segments and the functional relatedness of elements within these segments is an important question in computational genomics. We build upon the series of works on gene teams and homology teams. RESULTS: Our primary contribution is a local sliding-window SYNS (SYNtenic teamS) algorithm that refines an existing family structure into orthologous sub-families by analyzing the neighborhoods around the members of a given family with a locally sliding window. The neighborhood analysis is done by computing conserved gene clusters. We evaluate our algorithm on the existing homologous families from the Genolevures database over five genomes of the Hemiascomycete phylum. CONCLUSIONS: The result is an efficient algorithm that works on multiple genomes, considers paralogous copies of genes and is able to uncover orthologous clusters even in distant genomes. The resulting orthologous clusters are comparable to those obtained by manual curation.
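
The abstract above describes comparing the neighborhoods around members of a gene family with a sliding window. The sketch below is a deliberately simplified illustration of that idea, not the SYNS algorithm itself. It assumes each genome is an ordered list of family labels, and it pairs two members of the same family across genomes when their windows share at least `min_shared` other families. The names `neighborhood`, `syntenic_pairs`, `window` and `min_shared`, and the toy genomes, are all hypothetical.

```python
def neighborhood(genome, index, window):
    """Families of the genes within `window` positions of genome[index].

    The gene at `index` itself is excluded.
    """
    lo = max(0, index - window)
    hi = min(len(genome), index + window + 1)
    return {genome[j] for j in range(lo, hi) if j != index}


def syntenic_pairs(genome_a, genome_b, family, window=2, min_shared=2):
    """Pair members of `family` across two genomes by shared neighborhood.

    Returns (index_in_a, index_in_b) tuples for members whose windows have
    at least `min_shared` families in common.
    """
    pairs = []
    for i, fam_a in enumerate(genome_a):
        if fam_a != family:
            continue
        near_a = neighborhood(genome_a, i, window)
        for j, fam_b in enumerate(genome_b):
            if fam_b != family:
                continue
            shared = near_a & neighborhood(genome_b, j, window)
            if len(shared) >= min_shared:
                pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    # Hypothetical gene orders; each entry is the family label of one gene.
    genome_a = ["F1", "F2", "X", "F3", "F9", "X", "F7"]
    genome_b = ["F3", "X", "F2", "F8", "F7", "X"]
    print(syntenic_pairs(genome_a, genome_b, "X", window=1, min_shared=2))
```

In this toy input the two copies of family X in each genome are paralogs, and only the pair whose flanking genes agree is reported, which is the intuition behind splitting a family into orthologous sub-families.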


Subjects
Algorithms, Genomics/methods, Multigene Family, Synteny, ATP-Binding Cassette Transporters/genetics, Chromosomes, Fungal Genome, Nucleic Acid Sequence Homology, Yeasts/genetics
6.
Comput Biol Med ; 131: 104244, 2021 04.
Article in English | MEDLINE | ID: mdl-33550016

ABSTRACT

Breast cancer is the second leading cancer type among females. MicroRNAs (miRNAs) play an important role in it by regulating gene expression at the post-transcriptional phase. However, identifying the most influential miRNAs in breast cancer subtypes is a challenging task, although recent advances in Next Generation Sequencing (NGS) techniques allow high-throughput miRNA expression data to be analyzed. We have therefore conducted this research using NGS data of breast cancer in order to identify the most significant miRNA biomarkers. The selected miRNA biomarkers are highly associated with multiple breast cancer subtypes. For this purpose, a two-phase technique, called Machine Learning Integrated Ensemble of Feature Selection Methods, followed by survival analysis, is proposed. In the first phase, we select the best among seven machine learning techniques based on classification accuracy using the entire set of features (in this case miRNAs). Subsequently, eight different feature selection methods are used separately to rank the features, and each set of top features is validated using the selected machine learning technique on a multi-class classification task over the breast cancer subtypes. In the second phase, based on the classification accuracy values, the top features from each feature selection method are combined into an ensemble that further categorizes the miRNAs as 8*, 7* down to 1*. The 8* miRNAs provide the highest average classification accuracy of 86% after 10-fold cross-validation. Thereafter, 27 miRNAs are identified from the 8* to 4* miRNAs based on their importance in survival for breast cancer subtypes, using Cox regression-based survival analysis. Moreover, expression analysis, regulatory network analysis, protein-protein interaction analysis, and KEGG pathway and gene ontology enrichment analysis are performed to validate the biological significance of the proposed solution. Additionally, we have prepared a miRNA-protein-drug interaction network to identify possible drugs for the selected miRNAs. Thus, our findings may be considered during a clinical trial for the treatment of breast cancer patients.
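
The second phase described above combines the top features of eight selection methods into star categories. One plausible reading, which is an assumption here because the abstract does not spell out the rule, is that an n* miRNA is one that appears in the top list of n methods. The sketch below implements only that counting step; `star_ratings`, `features_with_stars` and the miRNA identifiers are hypothetical.

```python
from collections import Counter


def star_ratings(top_features_per_method):
    """Count how many selection methods include each feature in their top list.

    `top_features_per_method` holds one iterable of feature names per method.
    A feature is counted at most once per method.
    """
    counts = Counter()
    for top_features in top_features_per_method:
        counts.update(set(top_features))
    return dict(counts)


def features_with_stars(ratings, min_stars):
    """Names of features selected by at least `min_stars` methods, sorted."""
    return sorted(name for name, stars in ratings.items() if stars >= min_stars)


if __name__ == "__main__":
    # Hypothetical top lists from three selection methods (the paper uses eight).
    top_lists = [
        ["mir_A", "mir_B"],
        ["mir_A", "mir_C"],
        ["mir_A", "mir_B"],
    ]
    ratings = star_ratings(top_lists)
    print(ratings)
    print(features_with_stars(ratings, min_stars=2))
```

With eight methods, calling `features_with_stars(ratings, 4)` would correspond to the 8* to 4* range that the abstract passes on to the survival analysis, under the counting assumption stated above.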


Subjects
Breast Neoplasms, MicroRNAs, Tumor Biomarkers/genetics, Breast Neoplasms/genetics, Female, Humans, Machine Learning, MicroRNAs/genetics, Survival Analysis
7.
Sci Rep ; 10(1): 17699, 2020 10 19.
Article in English | MEDLINE | ID: mdl-33077836

ABSTRACT

Angiotensin converting enzyme 2 (ACE2) (EC:3.4.17.23) is a transmembrane protein that is considered a receptor for spike protein binding of the novel coronavirus (SARS-CoV-2). Since no specific medication is available to treat COVID-19, designing new drugs is important and essential. In this regard, in silico methods play an important role, as they are rapid and cost-effective compared with trial-and-error methods using experimental studies. Natural products are safe and easily available to treat coronavirus-affected patients in the present alarming situation. In this paper, five phytochemicals, which belong to the flavonoid and anthraquinone subclasses, have been selected as small molecules in a molecular docking study of the spike protein of SARS-CoV-2 with its human receptor, the ACE2 molecule. Their molecular binding sites on the spike protein bound structure with its receptor have been analyzed. From this analysis, hesperidin, emodin and chrysin are selected as competent natural products, from both Indian and Chinese medicinal plants, to treat COVID-19. Among them, the phytochemical hesperidin can bind noncompetitively with the ACE2 protein and with the bound structure of the ACE2 protein and the spike protein of SARS-CoV-2. The binding sites of the ACE2 protein for the spike protein and for hesperidin are located in different parts of the ACE2 protein. The ligand spike protein causes a conformational change in the three-dimensional structure of the ACE2 protein, which is confirmed by molecular docking and molecular dynamics studies. Hesperidin modulates the binding energy of the bound structure of ACE2 and the spike protein. This result indicates that, owing to the presence of hesperidin, the bound structure of ACE2 and the spike protein fragment becomes unstable. As a result, this natural product can impart antiviral activity in SARS-CoV-2 infection. The antiviral activity of these five natural compounds is further experimentally validated with a QSAR study.


Subjects
Betacoronavirus/metabolism, Peptidyl-Dipeptidase A/metabolism, Coronavirus Spike Glycoprotein/metabolism, Allosteric Regulation, Amino Acid Sequence, Angiotensin-Converting Enzyme 2, Anthraquinones/chemistry, Anthraquinones/metabolism, Betacoronavirus/isolation & purification, Binding Sites, COVID-19, Coronavirus Infections/pathology, Coronavirus Infections/virology, Emodin/chemistry, Emodin/metabolism, Humans, Molecular Docking Simulation, Pandemics, Peptidyl-Dipeptidase A/chemistry, Viral Pneumonia/pathology, Viral Pneumonia/virology, Protein Binding, Secondary Protein Structure, Tertiary Protein Structure, SARS-CoV-2, Coronavirus Spike Glycoprotein/chemistry
8.
Database (Oxford) ; 2020, 2020 01 01.
Article in English | MEDLINE | ID: mdl-32016318

ABSTRACT

Development of new computational methods and testing of their performance have to be carried out using experimental data. Only in comparison to existing knowledge can method performance be assessed. For that purpose, benchmark datasets with known and verified outcomes are needed. High-quality benchmark datasets are valuable and may be difficult, laborious and time-consuming to generate. VariBench and VariSNP are the two existing databases for sharing variation benchmark datasets used mainly for variation interpretation. They have been used for training and benchmarking predictors for various types of variations and their effects. VariBench was updated with 419 new datasets from 109 papers containing altogether 329 014 152 variants; however, there is plenty of redundancy between the datasets. VariBench is freely available at http://structure.bmc.lu.se/VariBench/. The contents of the datasets vary depending on the information in the original source. The available datasets have been categorized into 20 groups and subgroups. There are datasets for insertions and deletions, substitutions in coding and non-coding regions, and structure-mapped, synonymous and benign variants. Effect-specific datasets include DNA regulatory elements, RNA splicing, and protein properties for aggregation, binding free energy, disorder and stability. There are also several datasets for molecule-specific and disease-specific applications, as well as one dataset for variation phenotype effects. Variants are often described at three molecular levels (DNA, RNA and protein) and sometimes also at the protein structural level, including relevant cross-references and variant descriptions. The updated VariBench facilitates the development and testing of new methods and the comparison of obtained performances to previously published methods. We compared the performance of the pathogenicity/tolerance predictor PON-P2 to several benchmark studies and show that such comparisons are feasible and useful; however, there may be limitations due to a lack of provided details and shared data. Database URL: http://structure.bmc.lu.se/VariBench.


Subjects
Genetic Databases, Benchmarking, Computational Biology, Database Management Systems, Genetic Databases/standards, Genetic Databases/statistics & numerical data, Protein Databases, Humans, Information Dissemination/methods
9.
Int J Data Min Bioinform ; 11(3): 277-300, 2015.
Article in English | MEDLINE | ID: mdl-26333263

ABSTRACT

Identification of co-expressed genes is the central goal in microarray gene expression analysis. Point-symmetry-based clustering is an important unsupervised learning technique for recognising symmetrical convex- or non-convex-shaped clusters. To enable fast clustering of large microarray data, we propose a distributed, time-efficient, scalable approach for the point-symmetry-based K-Means algorithm. A natural basis for analysing gene expression data using a symmetry-based algorithm is to group together genes with similar symmetrical expression patterns. This new parallel implementation also achieves linear speedup in timing without sacrificing the quality of the clustering solution on large microarray data sets. The parallel point-symmetry-based K-Means algorithm is compared with another new parallel symmetry-based K-Means and the existing parallel K-Means over eight artificial and benchmark microarray data sets, to demonstrate its superiority in both timing and validity. A statistical analysis is also performed to establish the significance of this message-passing-interface-based point-symmetry K-Means implementation. We also analysed the biological relevance of the clustering solutions.
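
Point-symmetry-based clustering, as named in the abstract above, judges how well a point fits a cluster by reflecting it through the cluster center and checking whether the data contain a point near that mirror image. The abstract does not give the exact distance used, so the sketch below is a simplified serial variant written for illustration: it takes the distance from the mirrored point to its single nearest data point, and it does not attempt the parallel (message-passing) implementation. The names `reflect`, `symmetry_distance` and `assign` are assumptions.

```python
import math


def reflect(point, center):
    """Mirror image of `point` through `center`: 2 * center - point."""
    return tuple(2.0 * c - p for p, c in zip(point, center))


def symmetry_distance(point, center, data):
    """Distance from the mirrored point to its nearest neighbor in `data`.

    A small value means the data set is roughly symmetric about `center`
    as far as `point` is concerned.
    """
    mirrored = reflect(point, center)
    return min(math.dist(mirrored, other) for other in data)


def assign(point, centers, data):
    """Index of the center about which `point` is most symmetric."""
    distances = [symmetry_distance(point, c, data) for c in centers]
    return distances.index(min(distances))


if __name__ == "__main__":
    # Hypothetical 2-D points arranged symmetrically around (1, 0).
    data = [(0.0, 0.0), (2.0, 0.0), (1.0, 1.0), (1.0, -1.0)]
    centers = [(1.0, 0.0), (5.0, 5.0)]
    print(symmetry_distance((0.0, 0.0), centers[0], data))
    print(assign((0.0, 0.0), centers, data))
```

The nearest-neighbor search over all data points for every assignment is the expensive part of such a scheme, which is consistent with the abstract's motivation for distributing the computation.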


Subjects
Computational Biology/methods, Gene Expression Profiling/methods, Oligonucleotide Array Sequence Analysis/methods, Algorithms, Cluster Analysis
10.
IEEE Trans Nanobioscience ; 14(4): 360-367, 2015 Jun.
Article in English | MEDLINE | ID: mdl-25935042

ABSTRACT

Identification of coexpressed genes is the central goal in microarray gene expression data analysis. Point symmetry-based clustering is an important unsupervised learning technique for recognizing symmetrical convex or non-convex shaped clusters. To enable fast automatic clustering of large microarray data, this article proposes a distributed, time-efficient, scalable, parallel rough set-based hybrid approach for the point symmetry-based clustering algorithm. A natural basis for analyzing gene expression data using the symmetry-based algorithm is to group together genes with similar symmetrical patterns of expression. Rough-set theory helps in faster convergence and initial automatic optimal classification, thereby solving the problem of the unknown number of clusters in microarray data. This new parallel implementation with the K-means algorithm also achieves linear speedup in timing on large microarray datasets. The proposed algorithm is compared with another parallel symmetry-based K-means and a parallel version of the existing K-means over four artificial and benchmark microarray datasets. We have also experimented over three skewed cancer gene expression datasets. Statistical analyses are also performed to establish the significance of this new implementation. The biological relevance of the clustering solutions is also analyzed.

11.
Biotechnol Biofuels ; 7: 66, 2014.
Article in English | MEDLINE | ID: mdl-24834124

ABSTRACT

BACKGROUND: The industrially important yeast Blastobotrys (Arxula) adeninivorans is an asexual hemiascomycete phylogenetically very distant from Saccharomyces cerevisiae. Its unusual metabolic flexibility allows it to use a wide range of carbon and nitrogen sources, while being thermotolerant, xerotolerant and osmotolerant. RESULTS: The sequencing of strain LS3 revealed that the nuclear genome of A. adeninivorans is 11.8 Mb long and consists of four chromosomes with regional centromeres. Its closest sequenced relative is Yarrowia lipolytica, although mean conservation of orthologs is low. With 914 introns within 6116 genes, A. adeninivorans is one of the most intron-rich hemiascomycetes sequenced to date. Several large species-specific families appear to result from multiple rounds of segmental duplications of tandem gene arrays, a novel mechanism not yet described in yeasts. An analysis of the genome and its transcriptome revealed enzymes with biotechnological potential, such as two extracellular tannases (Atan1p and Atan2p) of the tannic-acid catabolic route, and a new pathway for the assimilation of n-butanol via butyric aldehyde and butyric acid. CONCLUSIONS: The high-quality genome of this species that diverged early in Saccharomycotina will allow further fundamental studies on comparative genomics, evolution and phylogenetics. Protein components of different pathways for carbon and nitrogen source utilization were identified, which so far has remained unexplored in yeast, offering clues for further biotechnological developments. In the course of identifying alternative microorganisms for biotechnological interest, A. adeninivorans has already proved its strengthened competitiveness as a promising cell factory for many more applications.

12.
PLoS One ; 8(2): e46468, 2013.
Article in English | MEDLINE | ID: mdl-23457439

ABSTRACT

Remote homology detection among proteins utilizing only the unlabelled sequences is a central problem in comparative genomics. The existing cluster kernel methods based on neighborhoods and profiles and the Markov clustering algorithms are currently the most popular methods for protein family recognition. The deviation from random walks with inflation or dependency on hard threshold in similarity measure in those methods requires an enhancement for homology detection among multi-domain proteins. We propose to combine spectral clustering with neighborhood kernels in Markov similarity for enhancing sensitivity in detecting homology independent of "recent" paralogs. The spectral clustering approach with new combined local alignment kernels more effectively exploits the unsupervised protein sequences globally reducing inter-cluster walks. When combined with the corrections based on modified symmetry based proximity norm deemphasizing outliers, the technique proposed in this article outperforms other state-of-the-art cluster kernels among all twelve implemented kernels. The comparison with the state-of-the-art string and mismatch kernels also shows the superior performance scores provided by the proposed kernels. A similar performance improvement is also found over an existing large dataset. Therefore, the proposed spectral clustering framework over combined local alignment kernels with modified symmetry-based correction achieves superior performance for unsupervised remote homolog detection even in multi-domain and promiscuous domain proteins from Genolevures database families with better biological relevance. Source code available upon request. CONTACT: sarkar@labri.fr.


Subjects
Proteins/chemistry, Protein Sequence Analysis/methods, Algorithms, Amino Acid Sequence, Artificial Intelligence, Cluster Analysis, Markov Chains, Sequence Alignment/methods, Software
13.
Int J Bioinform Res Appl ; 9(2): 121-33, 2013.
Article in English | MEDLINE | ID: mdl-23467059

ABSTRACT

In this paper, we propose an automatic protein family expansion approach for recruiting new members among the protein-coding genes in newly sequenced genomes. The criteria for adding a new member to a family depend on the structure of each individual family rather than being globally uniform. The detection of a threshold in the ROC space of all sorted iterative profile sets defines the alignment selection criteria for each family. Furthermore, the statistical estimation of the most frequent optimal sorting criteria generates the optimal filtering strategy in a learning-parameter set for profile-based homology search.


Subjects
Proteins/chemistry, Algorithms, Computational Biology, Genome, Proteins/genetics, ROC Curve
14.
G3 (Bethesda) ; 2(2): 299-311, 2012 Feb.
Article in English | MEDLINE | ID: mdl-22384408

ABSTRACT

Polyploidization is an important process in the evolution of eukaryotic genomes, but ensuing molecular mechanisms remain to be clarified. Autopolyploidization or whole-genome duplication events frequently are resolved in resulting lineages by the loss of single genes from most duplicated pairs, causing transient gene dosage imbalance and accelerating speciation through meiotic infertility. Allopolyploidization or formation of interspecies hybrids raises the problem of genetic incompatibility (Bateson-Dobzhansky-Muller effect) and may be resolved by the accumulation of mutational changes in resulting lineages. In this article, we show that an osmotolerant yeast species, Pichia sorbitophila, recently isolated in a concentrated sorbitol solution in industry, illustrates this last situation. Its genome is a mosaic of homologous and homeologous chromosomes, or parts thereof, that corresponds to a recently formed hybrid in the process of evolution. The respective parental contributions to this genome were characterized using existing variations in GC content. The genomic changes that occurred during the short period since hybrid formation were identified (e.g., loss of heterozygosity, unilateral loss of rDNA, reciprocal exchange) and distinguished from those undergone by the two parental genomes after separation from their common ancestor (i.e., NUMT (NUclear sequences of MiTochondrial origin) insertions, gene acquisitions, gene location movements, reciprocal translocation). We found that the physiological characteristics of this new yeast species are determined by specific but unequal contributions of its two parents, one of which could be identified as very closely related to an extant Pichia farinosa strain.
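
The abstract above states that the two parental contributions to the hybrid genome were characterized using variations in GC content. A minimal sketch of the underlying measurement, assuming a plain DNA string and fixed-size windows, is given below. The window and step sizes, the function names `gc_content` and `windowed_gc`, and the toy sequence are illustrative choices, not values from the paper.

```python
def gc_content(sequence):
    """Fraction of G and C among the unambiguous bases (A, C, G, T).

    Ambiguous symbols such as N are ignored; returns 0.0 if none remain.
    """
    bases = [b for b in sequence.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return sum(1 for b in bases if b in "GC") / len(bases)


def windowed_gc(sequence, window, step):
    """GC content of consecutive windows as (start_position, gc) tuples."""
    return [
        (start, gc_content(sequence[start:start + window]))
        for start in range(0, len(sequence) - window + 1, step)
    ]


if __name__ == "__main__":
    # Hypothetical sequence with a GC-rich half followed by an AT-rich half.
    toy_sequence = "GGCCGCGCATATTATA"
    for start, gc in windowed_gc(toy_sequence, window=8, step=8):
        print(start, gc)
```

Plotting such per-window values along each chromosome is one simple way to see segments with distinct base composition, which is the kind of signal the abstract describes using to separate the two parental origins.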
