Search | BVS Search Portal

1.

scDeepInsight: a supervised cell-type identification method for scRNA-seq data with deep learning.

Jia, Shangru; Lysenko, Artem; Boroevich, Keith A; Sharma, Alok; Tsunoda, Tatsuhiko.

Brief Bioinform ; 24(5)2023 09 20.

Article in English | MEDLINE | ID: mdl-37523217

ABSTRACT

Annotation of cell-types is a critical step in the analysis of single-cell RNA sequencing (scRNA-seq) data that allows the study of heterogeneity across multiple cell populations. Currently, this is most commonly done using unsupervised clustering algorithms, which project single-cell expression data into a lower dimensional space and then cluster cells based on their distances from each other. However, as these methods do not use reference datasets, they can only achieve a rough classification of cell-types, and it is difficult to improve the recognition accuracy further. To effectively solve this issue, we propose a novel supervised annotation method, scDeepInsight. The scDeepInsight method is capable of performing manifold assignments. It is competent in executing data integration through batch normalization, performing supervised training on the reference dataset, doing outlier detection and annotating cell-types on query datasets. Moreover, it can help identify active genes or marker genes related to cell-types. The training of the scDeepInsight model is performed in a unique way. Tabular scRNA-seq data are first converted to corresponding images through the DeepInsight methodology. DeepInsight can create a trainable image transformer to convert non-image RNA data to images by comprehensively comparing interrelationships among multiple genes. Subsequently, the converted images are fed into convolutional neural networks such as EfficientNet-b3. This enables automatic feature extraction to identify the cell-types of scRNA-seq samples. We benchmarked scDeepInsight with six other mainstream cell annotation methods. The average accuracy rate of scDeepInsight reached 87.5%, which is more than 7% higher compared with the state-of-the-art methods.

Subjects

Deep Learning , Single-Cell Gene Expression Analysis , Algorithms , Benchmarking , Cluster Analysis , RNA Sequence Analysis , Gene Expression Profiling
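
The table-to-image conversion described in the scDeepInsight abstract can be illustrated with a minimal sketch. It assumes each gene already has a 2D coordinate (in DeepInsight these come from a dimensionality reduction over the features, which is not reproduced here) and shows only how one cell's expression values might be placed on a pixel grid, averaging genes that land on the same pixel. The function names `coords_to_pixels` and `sample_to_image`, the grid size, and the toy numbers are hypothetical and are not taken from the scDeepInsight code.

```python
from collections import defaultdict


def coords_to_pixels(coords, size):
    """Scale 2D feature coordinates to integer (row, col) positions on a size x size grid."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    x_min, y_min = min(xs), min(ys)
    # Guard against a zero span when all features share one coordinate.
    x_span = (max(xs) - x_min) or 1.0
    y_span = (max(ys) - y_min) or 1.0
    pixels = []
    for x, y in coords:
        col = round((x - x_min) / x_span * (size - 1))
        row = round((y - y_min) / y_span * (size - 1))
        pixels.append((row, col))
    return pixels


def sample_to_image(values, pixels, size):
    """Place one sample's feature values on the grid, averaging features that share a pixel."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, pixel in zip(values, pixels):
        sums[pixel] += value
        counts[pixel] += 1
    image = [[0.0] * size for _ in range(size)]
    for (row, col), total in sums.items():
        image[row][col] = total / counts[(row, col)]
    return image


# Toy example: four hypothetical genes, two of which map to the same pixel.
gene_coords = [(0.0, 0.0), (1.0, 1.0), (0.98, 1.0), (0.0, 1.0)]
gene_pixels = coords_to_pixels(gene_coords, size=3)
cell_image = sample_to_image([1.0, 2.0, 4.0, 5.0], gene_pixels, size=3)
for image_row in cell_image:
    print(image_row)
```

The pixel positions are computed once for the whole dataset and reused for every cell, so that the same gene always occupies the same location in each image passed to the convolutional network.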

2.

Advances in AI and machine learning for predictive medicine.

Sharma, Alok; Lysenko, Artem; Jia, Shangru; Boroevich, Keith A; Tsunoda, Tatsuhiko.

J Hum Genet ; 2024 Feb 29.

Article in English | MEDLINE | ID: mdl-38424184

ABSTRACT

The field of omics, driven by advances in high-throughput sequencing, faces a data explosion. This abundance of data offers unprecedented opportunities for predictive modeling in precision medicine, but also presents formidable challenges in data analysis and interpretation. Traditional machine learning (ML) techniques have been partly successful in generating predictive models for omics analysis but exhibit limitations in handling potential relationships within the data for more accurate prediction. This review explores a revolutionary shift in predictive modeling through the application of deep learning (DL), specifically convolutional neural networks (CNNs). Using transformation methods such as DeepInsight, omics data with independent variables in tabular (table-like, including vector) form can be turned into image-like representations, enabling CNNs to capture latent features effectively. This approach not only enhances predictive power but also leverages transfer learning, reducing computational time, and improving performance. However, integrating CNNs in predictive omics data analysis is not without challenges, including issues related to model interpretability, data heterogeneity, and data size. Addressing these challenges requires a multidisciplinary approach, involving collaborations between ML experts, bioinformatics researchers, biologists, and medical doctors. This review illuminates these complexities and charts a course for future research to unlock the full predictive potential of CNNs in omics data analysis and related fields.

3.

DeepFeature: feature selection in nonimage data using convolutional neural network.

Sharma, Alok; Lysenko, Artem; Boroevich, Keith A; Vans, Edwin; Tsunoda, Tatsuhiko.

Brief Bioinform ; 22(6)2021 11 05.

Article in English | MEDLINE | ID: mdl-34368836

ABSTRACT

Artificial intelligence methods offer exciting new capabilities for the discovery of biological mechanisms from raw data because they are able to detect vastly more complex patterns of association that cannot be captured by classical statistical tests. Among these methods, deep neural networks are currently among the most advanced approaches and, in particular, convolutional neural networks (CNNs) have been shown to perform excellently for a variety of difficult tasks. Despite that, the application of this type of network to high-dimensional omics data and, most importantly, the meaningful interpretation of the results returned from such models in a biomedical context remain an open problem. Here we present an approach applying a CNN to nonimage data for feature selection. Our pipeline, DeepFeature, can both successfully transform omics data into a form that is optimal for fitting a CNN model and return sets of the most important genes used internally for computing predictions. Within the framework, the Snowfall compression algorithm is introduced to enable more elements in the fixed pixel framework, and region accumulation and an element decoder are developed to find elements or genes from the class activation maps. In comparative tests on a cancer type prediction task, DeepFeature simultaneously achieved superior predictive performance and a better ability to discover key pathways and biological processes meaningful for this context. Capabilities offered by the proposed framework can enable the effective use of powerful deep learning methods to facilitate the discovery of causal mechanisms in high-dimensional biomedical data.

Subjects

Deep Learning , Computer Neural Networks , Algorithms , Humans

4.

Navigating the disease landscape: knowledge representations for contextualizing molecular signatures.

Saqi, Mansoor; Lysenko, Artem; Guo, Yi-Ke; Tsunoda, Tatsuhiko; Auffray, Charles.

Brief Bioinform ; 20(2): 609-623, 2019 03 25.

Article in English | MEDLINE | ID: mdl-29684165

ABSTRACT

Large amounts of data emerging from experiments in molecular medicine are leading to the identification of molecular signatures associated with disease subtypes. The contextualization of these patterns is important for obtaining mechanistic insight into the aberrant processes associated with a disease, and this typically involves the integration of multiple heterogeneous types of data. In this review, we discuss knowledge representations that can be useful to explore the biological context of molecular signatures, in particular three main approaches, namely, pathway mapping approaches, molecular network centric approaches and approaches that represent biological statements as knowledge graphs. We discuss the utility of each of these paradigms, illustrate how they can be leveraged with selected practical examples and identify ongoing challenges for this field of research.

Subjects

Computational Biology , Molecular Medicine , Humans , Precision Medicine

5.

HseSUMO: Sumoylation site prediction using half-sphere exposures of amino acids residues.

Sharma, Alok; Lysenko, Artem; López, Yosvany; Dehzangi, Abdollah; Sharma, Ronesh; Reddy, Hamendra; Sattar, Abdul; Tsunoda, Tatsuhiko.

BMC Genomics ; 19(Suppl 9): 982, 2019 Apr 18.

Article in English | MEDLINE | ID: mdl-30999862

ABSTRACT

BACKGROUND: Post-translational modifications are viewed as an important mechanism for controlling protein function and are believed to be involved in multiple important diseases. However, their profiling using laboratory-based techniques remains challenging. Therefore, the development of accurate computational methods to predict post-translational modifications is particularly important for making progress in this area of research. RESULTS: This work explores the use of four half-sphere exposure-based features for computational prediction of sumoylation sites. Unlike most of the previously proposed approaches, which focused on patterns of amino acid co-occurrence, we were able to demonstrate that protein structure-based features could be sufficiently informative to achieve good predictive performance. The evaluation of our method has demonstrated high sensitivity (0.9), accuracy (0.89) and Matthews correlation coefficient (0.78-0.79). We have compared these results to the recently released pSumo-CD method and were able to demonstrate better performance of our method on the same evaluation dataset. CONCLUSIONS: The proposed predictor HseSUMO uses half-sphere exposures of amino acids to predict sumoylation sites. It has shown promising results on a benchmark dataset when compared with the state-of-the-art method. The extracted data of this study can be accessed at https://github.com/YosvanyLopez/HseSUMO .

Subjects

Algorithms , Amino Acids/chemistry , Computational Biology/methods , Proteins/chemistry , Proteins/metabolism , Sumoylation , Binding Sites , Humans , Support Vector Machine

6.

Recon2Neo4j: applying graph database technologies for managing comprehensive genome-scale networks.

Balaur, Irina; Mazein, Alexander; Saqi, Mansoor; Lysenko, Artem; Rawlings, Christopher J; Auffray, Charles.

Bioinformatics ; 33(7): 1096-1098, 2017 04 01.

Article in English | MEDLINE | ID: mdl-27993779

ABSTRACT

Summary: The goal of this work is to offer a computational framework for exploring data from the Recon2 human metabolic reconstruction model. Advanced user access features have been developed using the Neo4j graph database technology and this paper describes key features such as efficient management of the network data, examples of the network querying for addressing particular tasks, and how query results are converted back to the Systems Biology Markup Language (SBML) standard format. The Neo4j-based metabolic framework facilitates exploration of highly connected and comprehensive human metabolic data and identification of metabolic subnetworks of interest. A Java-based parser component has been developed to convert query results (available in the JSON format) into SBML and SIF formats in order to facilitate further results exploration, enhancement or network sharing. Availability and Implementation: The Neo4j-based metabolic framework is freely available from: https://diseaseknowledgebase.etriks.org/metabolic/browser/ . The Java code files developed for this work are available from the following URL: https://github.com/ibalaur/MetabolicFramework . Contact: ibalaur@eisbm.org. Supplementary information: Supplementary data are available at Bioinformatics online.

Subjects

Metabolic Networks and Pathways , Software , Computer Graphics , Database Management Systems , Factual Databases , Genome , Humans , Metabolic Networks and Pathways/genetics , Biological Models

7.

Genetical and comparative genomics of Brassica under altered Ca supply identifies Arabidopsis Ca-transporter orthologs.

Graham, Neil S; Hammond, John P; Lysenko, Artem; Mayes, Sean; O Lochlainn, Seosamh; Blasco, Bego; Bowen, Helen C; Rawlings, Chris J; Rios, Juan J; Welham, Susan; Carion, Pierre W C; Dupuy, Lionel X; King, Graham J; White, Philip J; Broadley, Martin R.

Plant Cell ; 26(7): 2818-30, 2014 Jul.

Article in English | MEDLINE | ID: mdl-25082855

ABSTRACT

Although Ca transport in plants is highly complex, the overexpression of vacuolar Ca(2+) transporters in crops is a promising new technology to improve dietary Ca supplies through biofortification. Here, we sought to identify novel targets for increasing plant Ca accumulation using genetical and comparative genomics. Expression quantitative trait loci (eQTLs) mapping to 1895 cis- and 8015 trans-loci were identified in shoots of an inbred mapping population of Brassica rapa (IMB211 × R500); 23 cis- and 948 trans-eQTLs responded specifically to altered Ca supply. eQTLs were screened for functional significance using a large database of shoot Ca concentration phenotypes of Arabidopsis thaliana. From 31 Arabidopsis gene identifiers tagged to robust shoot Ca concentration phenotypes, 21 mapped to 27 B. rapa eQTLs, including orthologs of the Ca(2+) transporters At-CAX1 and At-ACA8. Two of three independent missense mutants of BraA.cax1a, isolated previously by targeting induced local lesions in genomes, have allele-specific shoot Ca concentration phenotypes compared with their segregating wild types. BraA.CAX1a is a promising target for altering the Ca composition of Brassica, consistent with prior knowledge from Arabidopsis. We conclude that multiple-environment eQTL analysis of complex crop genomes combined with comparative genomics is a powerful technique for novel gene identification/prioritization.

Subjects

Arabidopsis/genetics , Brassica/genetics , Calcium/metabolism , Cation Transport Proteins/genetics , Plant Genome/genetics , Genomics/methods , Arabidopsis/metabolism , Brassica/metabolism , Cation Transport Proteins/metabolism , Chromosome Mapping , Agricultural Crops , Plant Gene Expression Regulation , Gene-Environment Interaction , Missense Mutation , Phenotype , Plant Leaves/genetics , Plant Leaves/metabolism , Plant Proteins/genetics , Plant Shoots/genetics , Plant Shoots/metabolism , Genetically Modified Plants , Quantitative Trait Loci/genetics , Vacuoles/metabolism

8.

Transcriptome and metabolite profiling of the infection cycle of Zymoseptoria tritici on wheat reveals a biphasic interaction with plant immunity involving differential pathogen chromosomal contributions and a variation on the hemibiotrophic lifestyle definition.

Rudd, Jason J; Kanyuka, Kostya; Hassani-Pak, Keywan; Derbyshire, Mark; Andongabo, Ambrose; Devonshire, Jean; Lysenko, Artem; Saqi, Mansoor; Desai, Nalini M; Powers, Stephen J; Hooper, Juliet; Ambroso, Linda; Bharti, Arvind; Farmer, Andrew; Hammond-Kosack, Kim E; Dietrich, Robert A; Courbot, Mikael.

Plant Physiol ; 167(3): 1158-85, 2015 Mar.

Article in English | MEDLINE | ID: mdl-25596183

ABSTRACT

The hemibiotrophic fungus Zymoseptoria tritici causes Septoria tritici blotch disease of wheat (Triticum aestivum). Pathogen reproduction on wheat occurs without cell penetration, suggesting that dynamic and intimate intercellular communication occurs between fungus and plant throughout the disease cycle. We used deep RNA sequencing and metabolomics to investigate the physiology of plant and pathogen throughout an asexual reproductive cycle of Z. tritici on wheat leaves. Over 3,000 pathogen genes, more than 7,000 wheat genes, and more than 300 metabolites were differentially regulated. Intriguingly, individual fungal chromosomes contributed unequally to the overall gene expression changes. Early transcriptional down-regulation of putative host defense genes was detected in inoculated leaves. There was little evidence for fungal nutrient acquisition from the plant throughout symptomless colonization by Z. tritici, which may instead be utilizing lipid and fatty acid stores for growth. However, the fungus then subsequently manipulated specific plant carbohydrates, including fructan metabolites, during the switch to necrotrophic growth and reproduction. This switch coincided with increased expression of jasmonic acid biosynthesis genes and large-scale activation of other plant defense responses. Fungal genes encoding putative secondary metabolite clusters and secreted effector proteins were identified with distinct infection phase-specific expression patterns, although functional analysis suggested that many have overlapping/redundant functions in virulence. The pathogenic lifestyle of Z. tritici on wheat revealed through this study, involving initial defense suppression by a slow-growing extracellular and nutritionally limited pathogen followed by defense (hyper) activation during reproduction, reveals a subtle modification of the conceptual definition of hemibiotrophic plant infection.

Subjects

Ascomycota/metabolism , Fungal Chromosomes/genetics , Metabolome/genetics , Plant Immunity , Transcriptome/genetics , Triticum/immunology , Triticum/microbiology , Ascomycota/genetics , Ascomycota/growth & development , Disease Progression , Fructans/metabolism , Gene Expression Profiling , Fungal Gene Expression Regulation , Fungal Genes , Hexoses/metabolism , Multigene Family , Nitrates/metabolism , Plant Diseases/immunology , Plant Diseases/microbiology , Plant Leaves/microbiology , Asexual Reproduction , Salicylic Acid/metabolism , RNA Sequence Analysis , Time Factors

9.

A novel approach to identify genes that determine grain protein deviation in cereals.

Mosleth, Ellen F; Wan, Yongfang; Lysenko, Artem; Chope, Gemma A; Penson, Simon P; Shewry, Peter R; Hawkesford, Malcolm J.

Plant Biotechnol J ; 13(5): 625-35, 2015 Jun.

Article in English | MEDLINE | ID: mdl-25400203

ABSTRACT

Grain yield and protein content were determined for six wheat cultivars grown over 3 years at multiple sites and at multiple nitrogen (N) fertilizer inputs. Although grain protein content was negatively correlated with yield, some grain samples had higher protein contents than expected based on their yields, a trait referred to as grain protein deviation (GPD). We used novel statistical approaches to identify gene transcripts significantly related to GPD across environments. The yield and protein content were initially adjusted for nitrogen fertilizer inputs and then adjusted for yield (to remove the negative correlation with protein content), resulting in a parameter termed corrected GPD. Significant genetic variation in corrected GPD was observed for six cultivars grown over a range of environmental conditions (a total of 584 samples). Gene transcript profiles were determined in a subset of 161 samples of developing grain to identify transcripts contributing to GPD. Principal component analysis (PCA), analysis of variance (ANOVA) and means of scores regression (MSR) were used to identify individual principal components (PCs) correlating with GPD alone. Scores of the selected PCs, which were significantly related to GPD and protein content but not to the yield and significantly affected by cultivar, were identified as reflecting a multivariate pattern of gene expression related to genetic variation in GPD. Transcripts with consistent variation along the selected PCs were identified by an approach hereby called one-block means of scores regression (one-block MSR).

Subjects

Edible Grain/genetics , Genetic Variation , Nitrogen/metabolism , Seed Storage Proteins/metabolism , Triticum/genetics , Edible Grain/metabolism , Environment , Phenotype , Seed Storage Proteins/genetics , Transcriptome , Triticum/metabolism
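
The idea of grain protein deviation in the abstract above, protein content higher or lower than expected for a given yield, can be illustrated as the residual from a straight-line fit of protein content on yield. This is only a sketch under simplifying assumptions: the abstract does not give the authors' exact adjustment procedure (which also corrects for nitrogen input first), and the function name `residuals_from_line` and the numbers below are invented for illustration.

```python
def residuals_from_line(xs, ys):
    """Fit ys = intercept + slope * xs by ordinary least squares and return the residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # A positive residual means more protein than the yield alone would predict.
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Hypothetical yields (t/ha) and grain protein contents (%), not data from the study.
grain_yield = [4.0, 5.0, 6.0, 7.0]
grain_protein = [14.0, 13.5, 12.0, 11.0]
deviation = residuals_from_line(grain_yield, grain_protein)
for y, d in zip(grain_yield, deviation):
    print(f"yield={y:.1f}  deviation={d:+.2f}")
```

Because the residuals are uncorrelated with yield by construction, a quantity defined this way removes the negative yield-protein relationship that the abstract describes.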

10.

Enhanced analysis of tabular data through Multi-representation DeepInsight.

Sharma, Alok; López, Yosvany; Jia, Shangru; Lysenko, Artem; Boroevich, Keith A; Tsunoda, Tatsuhiko.

Sci Rep ; 14(1): 12851, 2024 Jun 04.

Article in English | MEDLINE | ID: mdl-38834670

ABSTRACT

Tabular data analysis is a critical task in various domains, enabling us to uncover valuable insights from structured datasets. While traditional machine learning methods can be used for feature engineering and dimensionality reduction, they often struggle to capture the intricate relationships and dependencies within real-world datasets. In this paper, we present Multi-representation DeepInsight (MRep-DeepInsight), a novel extension of the DeepInsight method designed to enhance the analysis of tabular data. By generating multiple representations of samples using diverse feature extraction techniques, our approach is able to capture a broader range of features and reveal deeper insights. We demonstrate the effectiveness of MRep-DeepInsight on single-cell datasets, Alzheimer's data, and artificial data, showcasing an improved accuracy over the original DeepInsight approach and machine learning methods like random forest, XGBoost, LightGBM, FT-Transformer and L2-regularized logistic regression. Our results highlight the value of incorporating multiple representations for robust and accurate tabular data analysis. By leveraging the power of diverse representations, MRep-DeepInsight offers a promising new avenue for advancing decision-making and scientific discovery across a wide range of fields.

11.

DeepInsight-3D architecture for anti-cancer drug response prediction with deep-learning on multi-omics.

Sharma, Alok; Lysenko, Artem; Boroevich, Keith A; Tsunoda, Tatsuhiko.

Sci Rep ; 13(1): 2483, 2023 02 11.

Article in English | MEDLINE | ID: mdl-36774402

ABSTRACT

Modern oncology offers a wide range of treatments, and therefore choosing the best option for a particular patient is very important for an optimal outcome. Multi-omics profiling in combination with AI-based predictive models has great potential for streamlining these treatment decisions. However, these encouraging developments continue to be hampered by the very high dimensionality of the datasets in combination with insufficiently large numbers of annotated samples. Here we propose a novel deep learning-based method to predict patient-specific anticancer drug response from three types of multi-omics data. The proposed DeepInsight-3D approach relies on structured data-to-image conversion that then allows the use of convolutional neural networks, which are particularly robust to high dimensionality of the inputs while retaining capabilities to model highly complex relationships between variables. Of particular note, we demonstrate that in this formalism additional channels of an image can be effectively used to accommodate data from different omics layers while implicitly encoding the connection between them. DeepInsight-3D was able to outperform other state-of-the-art methods applied to this task. The proposed improvements can facilitate the development of better personalized treatment strategies for different cancers in the future.

Subjects

Antineoplastic Agents , Deep Learning , Neoplasms , Humans , Multiomics , Neoplasms/drug therapy , Computer Neural Networks , Antineoplastic Agents/pharmacology , Antineoplastic Agents/therapeutic use

12.

Immune subtypes and neoantigen-related immune evasion in advanced colorectal cancer.

Sugawara, Toshitaka; Miya, Fuyuki; Ishikawa, Toshiaki; Lysenko, Artem; Nishino, Jo; Kamatani, Takashi; Takemoto, Akira; Boroevich, Keith A; Kakimi, Kazuhiro; Kinugasa, Yusuke; Tanabe, Minoru; Tsunoda, Tatsuhiko.

iScience ; 25(2): 103740, 2022 Feb 18.

Article in English | MEDLINE | ID: mdl-35128352

ABSTRACT

Elimination of cancerous cells by the immune system is an important mechanism of protection from cancer; however, its effectiveness can be reduced owing to the development of resistance and evasion. To understand the systemic immune response in advanced untreated primary colorectal cancer, we analyze immune subtypes and immune evasion via neoantigen-related mechanisms. We identify a distinctive cancer subtype characterized by immune evasion and very poor overall survival. This subtype has fewer clonal highly expressed neoantigens and high chromosomal instability, resulting in adaptive immune resistance mediated by the immune checkpoint molecules and neoantigen presentation disorders. We also observe that neoantigen depletion caused by immunoediting and high clonal neoantigen load are correlated with a good overall survival. Our results indicate that the status of the tumor microenvironment and neoantigen composition are promising new prognostic biomarkers with potential relevance for treatment plan decisions in advanced colorectal cancer (CRC).

13.

Assessing the functional coherence of modules found in multiple-evidence networks from Arabidopsis.

Lysenko, Artem; Defoin-Platel, Michael; Hassani-Pak, Keywan; Taubert, Jan; Hodgman, Charlie; Rawlings, Christopher J; Saqi, Mansoor.

BMC Bioinformatics ; 12: 203, 2011 May 25.

Article in English | MEDLINE | ID: mdl-21612636

ABSTRACT

BACKGROUND: Combining multiple evidence-types from different information sources has the potential to reveal new relationships in biological systems. The integrated information can be represented as a relationship network, and clustering the network can suggest possible functional modules. The value of such modules for gaining insight into the underlying biological processes depends on their functional coherence. The challenges that we wish to address are to define and quantify the functional coherence of modules in relationship networks, so that they can be used to infer function of as yet unannotated proteins, to discover previously unknown roles of proteins in diseases as well as for better understanding of the regulation and interrelationship between different elements of complex biological systems. RESULTS: We have defined the functional coherence of modules with respect to the Gene Ontology (GO) by considering two complementary aspects: (i) the fragmentation of the GO functional categories into the different modules and (ii) the most representative functions of the modules. We have proposed a set of metrics to evaluate these two aspects and demonstrated their utility in Arabidopsis thaliana. We selected 2355 proteins for which experimentally established protein-protein interaction (PPI) data were available. From these we have constructed five relationship networks, four based on single types of data: PPI, co-expression, co-occurrence of protein names in scientific literature abstracts and sequence similarity and a fifth one combining these four evidence types. The ability of these networks to suggest biologically meaningful grouping of proteins was explored by applying Markov clustering and then by measuring the functional coherence of the clusters. CONCLUSIONS: Relationship networks integrating multiple evidence-types are biologically informative and allow more proteins to be assigned to a putative functional module. Using additional evidence types concentrates the functional annotations in a smaller number of modules without unduly compromising their consistency. These results indicate that integration of more data sources improves the ability to uncover functional association between proteins, both by allowing more proteins to be linked and producing a network where modular structure more closely reflects the hierarchy in the gene ontology.

Subjects

Arabidopsis Proteins/metabolism , Arabidopsis/genetics , Arabidopsis/metabolism , Metabolomics/methods , Algorithms , Arabidopsis Proteins/genetics , Cluster Analysis , Genetic Databases , Markov Chains , Metabolic Networks and Pathways
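
The Markov clustering step mentioned in the abstract above can be sketched as alternating expansion (squaring a column-stochastic matrix) and inflation (raising entries to a power and renormalizing columns) until the matrix stops changing. The version below is a minimal pure-Python illustration under assumed defaults (self-loops added, inflation of 2.0, clusters read off the non-zero rows); it is not the implementation used in the study, and the function names and the toy network are hypothetical.

```python
def normalize_columns(matrix):
    """Scale each column in place so that it sums to 1 (column-stochastic matrix)."""
    n = len(matrix)
    for j in range(n):
        total = sum(matrix[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                matrix[i][j] /= total
    return matrix


def markov_cluster(adjacency, inflation=2.0, max_iterations=50, tol=1e-6):
    """Cluster an undirected graph given as a square 0/1 (or weighted) adjacency matrix."""
    n = len(adjacency)
    # Add self-loops, a common MCL convention, then make columns stochastic.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    normalize_columns(m)
    for _ in range(max_iterations):
        # Expansion: one matrix squaring spreads flow along longer paths.
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
                    for i in range(n)]
        # Inflation: elementwise power strengthens strong flows and weakens weak ones.
        inflated = normalize_columns([[v ** inflation for v in row] for row in expanded])
        change = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if change < tol:
            break
    # Each row that retains flow lists the members of one cluster.
    clusters = {frozenset(j for j in range(n) if m[i][j] > tol) for i in range(n)}
    clusters.discard(frozenset())
    return sorted(sorted(c) for c in clusters)


# Toy network: two triangles of hypothetical proteins with no edges between them.
toy_network = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(markov_cluster(toy_network))
```

Higher inflation values generally produce more, smaller clusters, which is the main parameter to tune when the resulting modules are later scored for functional coherence.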

14.

AIGO: towards a unified framework for the analysis and the inter-comparison of GO functional annotations.

Defoin-Platel, Michael; Hindle, Matthew M; Lysenko, Artem; Powers, Stephen J; Habash, Dimah Z; Rawlings, Christopher J; Saqi, Mansoor.

BMC Bioinformatics ; 12: 431, 2011 Nov 03.

Article in English | MEDLINE | ID: mdl-22054122

ABSTRACT

BACKGROUND: In response to the rapid growth of available genome sequences, efforts have been made to develop automatic inference methods to functionally characterize them. Pipelines that infer functional annotation are now routinely used to produce new annotations at a genome scale and for a broad variety of species. These pipelines differ widely in their inference algorithms, confidence thresholds and data sources for reasoning. This heterogeneity makes a comparison of the relative merits of each approach extremely complex. The evaluation of the quality of the resultant annotations is also challenging given there is often no existing gold-standard against which to evaluate precision and recall. RESULTS: In this paper, we present a pragmatic approach to the study of functional annotations. An ensemble of 12 metrics, describing various aspects of functional annotations, is defined and implemented in a unified framework, which facilitates their systematic analysis and inter-comparison. The use of this framework is demonstrated on three illustrative examples: analysing the outputs of state-of-the-art inference pipelines, comparing electronic versus manual annotation methods, and monitoring the evolution of publicly available functional annotations. The framework is part of the AIGO library (http://code.google.com/p/aigo) for the Analysis and the Inter-comparison of the products of Gene Ontology (GO) annotation pipelines. The AIGO library also provides functionalities to easily load, analyse, manipulate and compare functional annotations and also to plot and export the results of the analysis in various formats. CONCLUSIONS: This work is a step toward developing a unified framework for the systematic study of GO functional annotations. This framework has been designed so that new metrics on GO functional annotations can be added in a very straightforward way.

Subjects

Cattle/genetics , Genomics/methods , Molecular Sequence Annotation , Controlled Vocabulary , Algorithms , Animals , Chromosome Mapping , Genetic Databases , Genome , Humans

15.

Data integration for plant genomics--exemplars from the integration of Arabidopsis thaliana databases.

Lysenko, Artem; Hindle, Matthew Morritt; Taubert, Jan; Saqi, Mansoor; Rawlings, Christopher John.

Brief Bioinform ; 10(6): 676-93, 2009 Nov.

Article in English | MEDLINE | ID: mdl-19933213

ABSTRACT

The development of a systems based approach to problems in plant sciences requires integration of existing information resources. However, the available information is currently often incomplete and dispersed across many sources and the syntactic and semantic heterogeneity of the data is a challenge for integration. In this article, we discuss strategies for data integration and we use a graph based integration method (Ondex) to illustrate some of these challenges with reference to two example problems concerning integration of (i) metabolic pathway and (ii) protein interaction data for Arabidopsis thaliana. We quantify the degree of overlap for three commonly used pathway and protein interaction information sources. For pathways, we find that the AraCyc database contains the widest coverage of enzyme reactions and for protein interactions we find that the IntAct database provides the largest unique contribution to the integrated dataset. For both examples, however, we observe a relatively small amount of data common to all three sources. Analysis and visual exploration of the integrated networks was used to identify a number of practical issues relating to the interpretation of these datasets. We demonstrate the utility of these approaches to the analysis of groups of coexpressed genes from an individual microarray experiment, in the context of pathway information and for the combination of coexpression data with an integrated protein interaction network.

Subjects

Arabidopsis Proteins/genetics , Arabidopsis/genetics , Chromosome Mapping/methods , Database Management Systems , Genetic Databases , Plant Genome/genetics , Information Storage and Retrieval/methods , Protein Interaction Mapping/methods , Systems Integration

16.

Cerebrospinal fluid proteome shows disrupted neuronal development in multiple sclerosis.

Mosleth, Ellen F; Vedeler, Christian Alexander; Liland, Kristian Hovde; McLeod, Anette; Bringeland, Gerd Haga; Kroondijk, Liesbeth; Berven, Frode Steingrimsen; Lysenko, Artem; Rawlings, Christopher J; Eid, Karim El-Hajj; Opsahl, Jill Anette; Gjertsen, Bjørn Tore; Myhr, Kjell-Morten; Gavasso, Sonia.

Sci Rep ; 11(1): 4087, 2021 02 18.

Article in English | MEDLINE | ID: mdl-33602999

ABSTRACT

Despite intensive research, the aetiology of multiple sclerosis (MS) remains unknown. Cerebrospinal fluid proteomics has the potential to reveal mechanisms of MS pathogenesis, but analyses must account for disease heterogeneity. We previously reported explorative multivariate analysis by hierarchical clustering of proteomics data of MS patients and controls, which resulted in two groups of individuals. Grouping reflected increased levels of intrathecal inflammatory response proteins and decreased levels of proteins involved in neural development in one group relative to the other group. MS patients and controls were present in both groups. Here we reanalysed these data and we also reanalysed data from an independent cohort of patients diagnosed with clinically isolated syndrome (CIS), who have symptoms of MS without evidence of dissemination in space and/or time. Some, but not all, CIS patients had intrathecal inflammation. The analyses reported here identified a common protein signature of MS/CIS that was not linked to elevated intrathecal inflammation. The signature included low levels of complement proteins, semaphorin-7A, reelin, neural cell adhesion molecules, inter-alpha-trypsin inhibitor heavy chain H2, transforming growth factor beta 1, follistatin-related protein 1, malate dehydrogenase 1 cytoplasmic, plasma retinol-binding protein, biotinidase, and transferrin, all known to play roles in neural development. Low levels of these proteins suggest that MS/CIS patients suffer from abnormally low oxidative capacity that results in disrupted neural development from an early stage of the disease.

Subjects

Cerebrospinal Fluid Proteins/analysis , Multiple Sclerosis/cerebrospinal fluid , Proteome/analysis , Adolescent , Adult , Biomarkers/cerebrospinal fluid , Case-Control Studies , Female , Humans , Male , Middle Aged , Multiple Sclerosis/pathology , Young Adult

17.

PHI-Nets: A Network Resource for Ascomycete Fungal Pathogens to Annotate and Identify Putative Virulence Interacting Proteins and siRNA Targets.

Janowska-Sejda, Elzbieta I; Lysenko, Artem; Urban, Martin; Rawlings, Chris; Tsoka, Sophia; Hammond-Kosack, Kim E.

Front Microbiol ; 10: 2721, 2019.

Article in English | MEDLINE | ID: mdl-31866958

ABSTRACT

Interactions between proteins underlie all aspects of complex biological mechanisms. Therefore, methodologies based on complex network analyses can facilitate identification of promising candidate genes involved in phenotypes of interest and put this information into appropriate contexts. To facilitate discovery and gain additional insights into globally important pathogenic fungi, we have reconstructed computationally inferred interactomes using an interolog and domain-based approach for 15 diverse Ascomycete fungal species, across nine orders, specifically Aspergillus fumigatus, Bipolaris sorokiniana, Blumeria graminis f. sp. hordei, Botrytis cinerea, Colletotrichum gloeosporioides, Colletotrichum graminicola, Fusarium graminearum, Fusarium oxysporum f. sp. lycopersici, Fusarium verticillioides, Leptosphaeria maculans, Magnaporthe oryzae, Saccharomyces cerevisiae, Sclerotinia sclerotiorum, Verticillium dahliae, and Zymoseptoria tritici. Network cartography analysis was associated with functional patterns of annotated genes linked to the disease-causing ability of each pathogen. In addition, for the best annotated organism, namely F. graminearum, the distribution of annotated genes with respect to network structure was profiled using a random walk with restart algorithm, which suggested possible co-location of virulence-related genes in the protein-protein interaction network. In a second 'use case' study involving two networks, namely B. cinerea and F. graminearum, previously identified small silencing plant RNAs were mapped to their targets. The F. graminearum phenotypic network analysis implicates eight B. cinerea targets and 35 F. graminearum predicted interacting proteins as prime candidate virulence genes for further testing. All 15 networks have been made accessible for download at www.phi-base.org providing a rich resource for major crop plant pathogens.
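
The random walk with restart mentioned in the abstract above can be sketched in a few lines. This is a generic textbook formulation under stated assumptions (an undirected, unweighted network, a uniform restart vector over the seed nodes, and a restart probability of 0.3), not the PHI-Nets implementation; the function name, parameters, and the toy network are hypothetical.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Score every node by its proximity to the seed nodes.

    `adjacency` maps each node to the set of its neighbours (undirected,
    so every edge must appear in both directions).
    """
    if not seeds:
        raise ValueError("at least one seed node is required")
    nodes = sorted(adjacency)
    # Restart distribution: uniform over the seed nodes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # With probability `restart` the walker jumps back to a seed.
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            neighbours = adjacency[u]
            if not neighbours:
                # Isolated node: send its probability mass back to the seeds.
                for n in nodes:
                    nxt[n] += (1 - restart) * p[u] * p0[n]
                continue
            # Otherwise the walker moves to a uniformly chosen neighbour.
            share = (1 - restart) * p[u] / len(neighbours)
            for v in neighbours:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy path network a - b - c - d with a single seed; not real interaction data.
toy_network = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"c"},
}
scores = random_walk_with_restart(toy_network, seeds={"a"})
```

On this toy path the scores decrease with distance from the seed, which is the property that makes the method useful for asking whether annotated genes sit close to one another in a network.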

18.

An integrative machine learning approach for prediction of toxicity-related drug safety.

Lysenko, Artem; Sharma, Alok; Boroevich, Keith A; Tsunoda, Tatsuhiko.

Life Sci Alliance ; 1(6): e201800098, 2018 Dec.

Article in English | MEDLINE | ID: mdl-30515477

ABSTRACT

Recent trends in drug development have been marked by diminishing returns caused by escalating costs and falling rates of new drug approval. Unacceptable drug toxicity is a substantial cause of drug failure during clinical trials and the leading cause of drug withdrawals after release to the market. Computational methods capable of predicting these failures can reduce the waste of resources and time devoted to the investigation of compounds that ultimately fail. We propose an original machine learning method that leverages the identity of drug targets and off-targets, a functional impact score computed from Gene Ontology annotations, and biological network data to predict drug toxicity. We demonstrate that our method (TargeTox) can distinguish potentially idiosyncratically toxic drugs from safe drugs and is also suitable for speculative evaluation of different target sets to support the design of optimal low-toxicity combinations.

19.

Arete - candidate gene prioritization using biological network topology with additional evidence types.

Lysenko, Artem; Boroevich, Keith Anthony; Tsunoda, Tatsuhiko.

BioData Min ; 10: 22, 2017.

Article in English | MEDLINE | ID: mdl-28694847

ABSTRACT

BACKGROUND: Refinement of candidate gene lists to select the most promising candidates for further experimental verification remains an essential step between high-throughput exploratory analysis and the discovery of specific causal genes. Given the qualitative and semantic complexity of biological data, successfully addressing this challenge requires development of flexible and interoperable solutions for making the best possible use of the largest possible fraction of all available data. RESULTS: We have developed an easily accessible framework that links two established network-based gene prioritization approaches with a supporting isolation forest-based integrative ranking method. The defining feature of the method is that both topological information of the biological networks and additional sources of evidence can be considered at the same time. The implementation was realized as an app extension for the Cytoscape graph analysis suite, and therefore can further benefit from the synergy with other analysis methods available as part of this system. CONCLUSIONS: We provide efficient reference implementations of two popular gene prioritization algorithms - DIAMOnD and random walk with restart for the Cytoscape system. An extension of those methods was also developed that allows outputs of these algorithms to be combined with additional data. To demonstrate the utility of our software, we present two example disease gene prioritization application cases and show how our tool can be used to evaluate these different approaches.
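
The connectivity-significance idea behind DIAMOnD, one of the two algorithms named in the abstract above, can be sketched as follows. This is a simplified illustration, not the Arete or Cytoscape implementation: it assumes an undirected, unweighted network, scores each candidate with a hypergeometric tail probability for its number of links to the current module, and adds the most significant candidate at each step. The names `connectivity_pvalue`, `diamond_like_ranking`, and `build_adjacency` and the toy network are hypothetical.

```python
from math import comb


def build_adjacency(edges):
    """Build an undirected adjacency map (node -> set of neighbours)."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def connectivity_pvalue(n_total, n_seeds, degree, links_to_seeds):
    """P(X >= links_to_seeds) for a node of the given degree, assuming its
    links land on the n_total nodes uniformly at random (hypergeometric)."""
    upper = min(degree, n_seeds)
    favourable = sum(
        comb(n_seeds, i) * comb(n_total - n_seeds, degree - i)
        for i in range(links_to_seeds, upper + 1)
    )
    return favourable / comb(n_total, degree)


def diamond_like_ranking(adjacency, seeds, n_add):
    """Iteratively add the candidate most significantly connected to the module."""
    module = set(seeds)
    ranked = []
    n_total = len(adjacency)
    for _ in range(n_add):
        best = None
        for node in sorted(adjacency):
            if node in module:
                continue
            neighbours = adjacency[node]
            links = len(neighbours & module)
            if links == 0:
                continue  # only nodes touching the module are candidates
            p = connectivity_pvalue(n_total, len(module), len(neighbours), links)
            if best is None or p < best[0]:
                best = (p, node)
        if best is None:
            break  # no remaining node is connected to the module
        module.add(best[1])
        ranked.append(best[1])
    return ranked


# Toy network: "x" links to both seeds; "hub" links to one seed and four others.
toy_network = build_adjacency([
    ("s1", "x"), ("s2", "x"), ("s1", "hub"),
    ("hub", "h1"), ("hub", "h2"), ("hub", "h3"), ("hub", "h4"),
])
ranking = diamond_like_ranking(toy_network, {"s1", "s2"}, n_add=2)
```

The toy case shows why significance is used in place of a raw link count: the low-degree node that touches both seeds ranks ahead of the high-degree hub, whose single seed link is unsurprising given its degree.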

20.

EpiGeNet: A Graph Database of Interdependencies Between Genetic and Epigenetic Events in Colorectal Cancer.

Balaur, Irina; Saqi, Mansoor; Barat, Ana; Lysenko, Artem; Mazein, Alexander; Rawlings, Christopher J; Ruskin, Heather J; Auffray, Charles.

J Comput Biol ; 24(10): 969-980, 2017 Oct.

Article in English | MEDLINE | ID: mdl-27627442

ABSTRACT

The development of colorectal cancer (CRC), the third most common cancer type, has been associated with deregulation of cellular mechanisms stimulated by both genetic and epigenetic events. StatEpigen is a manually curated and annotated database that contains information on interdependencies between genetic and epigenetic signals and is currently specialized for CRC research. Although StatEpigen provides a well-developed graphical user interface for information retrieval, advanced queries involving associations between multiple concepts can benefit from a more detailed graph representation of the integrated data. This can be achieved by using a graph database (NoSQL) approach. Data were extracted from StatEpigen and imported to our newly developed EpiGeNet, a graph database for storage and querying of conditional relationships between molecular (genetic and epigenetic) events observed at different stages of colorectal oncogenesis. We illustrate the enhanced capability of EpiGeNet for exploration of different queries related to colorectal tumor progression; specifically, we demonstrate the query process for (i) stage-specific molecular events, (ii) most frequently observed genetic and epigenetic interdependencies in colon adenoma, and (iii) paths connecting key genes reported in CRC and associated events. The EpiGeNet framework offers improved capability for management and visualization of data on molecular events specific to CRC initiation and progression.

Subjects

Colorectal Neoplasms/genetics , Computational Biology/methods , Computer Graphics , Epigenesis, Genetic , Gene Regulatory Networks , Software , Databases, Factual , Humans
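
The third query type in the abstract above, paths connecting key genes and associated events, is a natural fit for a graph representation. The sketch below illustrates the idea with a breadth-first search over labelled edges held in memory. It is not EpiGeNet's schema or query language: the node names, relationship labels, and the choice to traverse edges in both directions are assumptions made purely for illustration.

```python
from collections import deque


def shortest_path(edges, start, goal):
    """Return the shortest chain of (node, relation, node) hops from start
    to goal, or None if the two nodes are not connected.

    `edges` is an iterable of (source, relation, target) triples. Edges are
    traversed in both directions, so a hop may run against the direction in
    which the triple was stored.
    """
    graph = {}
    for a, rel, b in edges:
        graph.setdefault(a, []).append((rel, b))
        graph.setdefault(b, []).append((rel, a))

    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, hops = queue.popleft()
        if node == goal:
            return hops
        for rel, nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + [(node, rel, nxt)]))
    return None


# Placeholder nodes and relationship labels; not data from StatEpigen or EpiGeNet.
toy_edges = [
    ("gene_A", "co_occurs_with", "event_1"),
    ("event_1", "precedes", "event_2"),
    ("event_2", "co_occurs_with", "gene_B"),
    ("gene_A", "co_occurs_with", "event_3"),
    ("event_3", "co_occurs_with", "event_4"),
    ("event_4", "precedes", "event_2"),
]
path = shortest_path(toy_edges, "gene_A", "gene_B")
```

A dedicated graph database expresses the same traversal declaratively and scales it to large datasets; the in-memory version only shows what such a path query returns.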
